qazal
1fce864a6d
delete multi output support ( #8822 )
...
* delete multioutput for now
* test_schedule
* test_assign too
* linter
* 515 for sd
* update tests and ctx
* update that assign check
2025-01-30 22:45:50 -05:00
George Hotz
a9d9f98d05
hotfix: those tests fail locally on mac due to buffer count
2025-01-27 07:53:48 +09:00
George Hotz
b4bf6a7dea
switch backward to use gradient [pr] ( #8235 )
...
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
2025-01-26 09:12:16 +09:00
geohotstan
dd82b4c913
make onnx runner a class ( #8647 )
...
* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half of the pylint issues; the other half is importing from onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
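The reshape this commit describes (a callable OnnxRunner that parses the model once and is then invoked through __call__, with the LIMIT handling moved out of the call path) has roughly the shape below. This is a hedged sketch, not the actual tinygrad ONNX code; the attribute names are assumptions.

```python
# Rough sketch only: parse the ONNX model once, reuse it on every call.
import onnx

class OnnxRunner:
  def __init__(self, model: onnx.ModelProto):
    self.graph = model.graph                                     # parsed once, reused across calls
    self.initializers = {t.name: t for t in self.graph.initializer}
  def __call__(self, inputs: dict) -> dict:
    for node in self.graph.node:
      ...                                                        # placeholder: a real runner dispatches on node.op_type
    return {}                                                    # map of output name -> value
```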
George Hotz
f29d6f54b8
support multilb gradient [pr] ( #8624 )
2025-01-14 18:33:33 -08:00
Francis Lata
5755ac1f72
Fix FC layer ResNet load_from_pretrained error ( #8387 )
...
* validate that FC exists before loading pretrained weights
* add test case for ResNet pretrained model without FC layer
* remove extra newline
* rename test case
* reraise exception if not handled by check
2024-12-26 18:11:27 -05:00
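The guard described above (skip missing fc.* weights when the model was built without a final FC layer, re-raise everything else) looks roughly like the following. This is a hedged sketch, not the real ResNet.load_from_pretrained; only get_state_dict is a known tinygrad helper, the rest is assumed.

```python
# Sketch of the check the commit describes (assumed structure, not the real implementation).
from tinygrad.nn.state import get_state_dict

def load_pretrained(model, weights: dict):
  state = get_state_dict(model)
  for name, w in weights.items():
    if name not in state:
      # a ResNet built without the final FC layer has no fc.* entries: skip those,
      # but re-raise anything else so real mismatches are not silently dropped
      if name.startswith("fc."): continue
      raise KeyError(name)
    state[name].assign(w)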
geohotstan
78cb47dfc5
docs and tests clean ups ( #8383 )
2024-12-23 11:12:13 -05:00
George Hotz
bd9c015b09
tests from grad uop path [pr] ( #8313 )
2024-12-18 09:25:05 -08:00
George Hotz
aa3b094334
changes from delete lazy [pr] ( #8146 )
...
* changes from delete lazy [pr]
* test tweak
2024-12-10 11:06:17 -08:00
chenyu
aa51f3c14e
update kernel counts in test_real_world ( #7960 )
...
the test was useless because it was looking at the JIT graph counts. wrap with JIT=2 for now (a rough wrapper is sketched after this entry).
if it's stable we could consider making the kernel count strict, which helps changes like #7940
2024-11-29 11:14:54 -05:00
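Wrapping a kernel-count assertion in JIT=2 presumably looks like the snippet below. Context, GlobalCounters, and the JIT ContextVar are tinygrad.helpers names; the workload and the asserted bound are placeholders, so treat this as a sketch rather than the actual test_real_world code.

```python
# Hedged sketch: force JIT=2 around the measured region so the asserted count comes from
# the jitted graph rather than whatever the ambient JIT setting happens to be.
import unittest
from tinygrad import Tensor
from tinygrad.helpers import Context, GlobalCounters

class TestKernelCount(unittest.TestCase):
  def test_add_kernel_count(self):
    with Context(JIT=2):
      GlobalCounters.reset()
      (Tensor.ones(16) + 1).realize()                          # placeholder workload
      self.assertLessEqual(GlobalCounters.kernel_count, 4)     # placeholder bound
```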
George Hotz
205befa788
move is_dtype_supported to device [pr] ( #7575 )
2024-11-07 20:38:03 +08:00
George Hotz
4fe1945df6
llvm if load ( #7345 )
...
* llvm if load
* unneeded line
* local llvm CI
2024-10-29 11:33:22 +08:00
George Hotz
4013c9848c
don't use tons of memory for tests non CI [pr] ( #7209 )
...
* don't use tons of memory for tests
* fix import and clean up pre-commit
* use pathlib
* no shm on windows
* Revert "use pathlib"
This reverts commit 7c38489820.
* run pre-commit hooks in test
* ugh, fix later
2024-10-22 15:04:51 +08:00
George Hotz
be64ac417e
move GGUF test to its own file [pr] ( #7208 )
...
* move GGUF test to its own file [pr]
* skip tests if modules aren't installed
2024-10-22 13:24:55 +08:00
George Hotz
5ae2de9845
UOp.variable ( #7010 )
...
* UOp.variable [pr]
* fix tests
* clean
* improve name rendering
* last bug
2024-10-12 18:20:44 +08:00
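For reference, UOp.variable appears to be the factory for bounded symbolic integers that this PR names. The exact signature and the .bind call below are assumptions based on how symbolic values are used elsewhere in tinygrad, not confirmed API documentation.

```python
# Hedged sketch of the factory named in the title; treat the signature as an assumption.
from tinygrad.ops import UOp

i = UOp.variable("i", 1, 10)   # a symbolic int constrained to [1, 10], rendered with the name "i"
bi = i.bind(3)                 # bind a concrete value for one particular run
```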
kormann
f5dd25d376
enable whisper batch for long sequences ( #6458 )
...
* long batch +test
* long batch +test
* cleanup
* rollback syntactic changes
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-17 00:42:10 -04:00
George Hotz
d3b098299d
add failing regression test for image ( #5540 )
...
* add failing regression test for image
* tg type
* simpler test
* don't realize image-to-image casts (caused issue)
* simple pad
2024-07-17 17:27:18 -07:00
wozeparrot
90f0e2fc49
db in wal mode ( #5388 )
2024-07-12 20:43:36 -07:00
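"db in wal mode" presumably refers to switching the on-disk cache database to SQLite's write-ahead-log journal. The snippet below is the generic SQLite incantation, not tinygrad's actual cache-handling code, and the path is a placeholder.

```python
# Generic SQLite sketch of what WAL mode means for an on-disk cache db.
import sqlite3

conn = sqlite3.connect("/tmp/tinygrad_cache.db")   # placeholder path
conn.execute("PRAGMA journal_mode=WAL")            # readers no longer block the writer, and vice versa
```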
chenyu
a0dbe20dbd
skip some redundant and slow tests in ci ( #5416 )
2024-07-12 14:43:13 -04:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively made gpt2 default jit
* fix test_gpt2
2024-07-09 15:04:43 -04:00
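The change described above swaps a per-example getenv("JIT") read for the shared helpers.JIT value. A minimal before/after sketch; the ContextVar default and the call-site shape are assumptions.

```python
# Before: each example read its own environment variable.
from tinygrad.helpers import getenv
use_jit = bool(getenv("JIT", 0))

# After (as the commit describes it): a shared value in tinygrad.helpers, so gpt2 gets the
# JIT by default and it can still be overridden per run.
from tinygrad.helpers import JIT
use_jit = bool(JIT)
```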
chenyu
9a2a82a77f
test stable diffusion unet in ci ( #5268 )
...
unet is parameterized now, so a smaller one can be tested in CI
2024-07-02 21:37:52 -04:00
Tobias Fischer
9a25ee0b9a
fixed unet call params ( #5262 )
2024-07-02 12:40:27 -04:00
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Separate Files ( #5253 )
...
* pulled clip and unet into separate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
e2c5054bdd
update resnet.load_from_pretrained ( #5040 )
2024-06-18 16:29:22 -04:00
chenyu
6bbbeb93ac
skip a few clang tests that took > 30 seconds in CI ( #4126 )
...
* skip slow CLANG test test_train_cifar
* skip those too
* and that
* only CI
* one more
2024-04-10 02:00:34 -04:00
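These skips typically take the form of a skipIf guarded on CI and the CLANG backend. The sketch below uses tinygrad's CI helper and Device.DEFAULT; the test body is omitted, so treat it as illustrative rather than the actual test file.

```python
# Illustrative only: skip a known-slow test when running on the CLANG backend in CI.
import unittest
from tinygrad import Device
from tinygrad.helpers import CI

class TestCifar(unittest.TestCase):
  @unittest.skipIf(CI and Device.DEFAULT == "CLANG", "took > 30 seconds in CI")
  def test_train_cifar(self):
    pass  # the real training-step body lives in the cifar test; omitted here
```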
George Hotz
f916aadaea
external that test
2024-03-29 19:35:50 -07:00
reddyn12
9b5e15db6e
Mamba Implementation ( #3456 )
...
* first commit
* state back to orig
* mamba comparisons
* rm file
* rename file
* use Tensor.einsum and make default model 370M
* Cleaned code and made a comparison test
* Simplify pull request. Only has 1 mamba implementation now.
* Update prompt
* rm whitespaces
* last space
* remove Einops dependency
* rm unused code
* add tests
* rm print statement
* rm imports
* skip CLANG
* Update skipIf description
* skip model test in CI and add CLANG fix
* rm Device import
* don't be stupid
* Fix conv assign
When the prompt is too short, the conv_state assign logic breaks. This is fixed by padding the tokenized array to a minimum length of 4 (sketched after this entry). Padding uses the empty-string token, but it's unclear whether proper practice is to use the PAD token
* fix p1
* temp
* fix jit import
---------
Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-28 17:49:12 -07:00
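The conv_state workaround described in the bullets (pad the tokenized prompt to a minimum length of 4) is just list padding. The pad-token choice below is an assumption, as the commit itself notes, and the function name is made up for illustration.

```python
# Hedged sketch of the workaround: prepend pad tokens until the prompt has at least 4 tokens,
# so the conv_state assign always sees enough context. pad_id=0 is an assumption.
MIN_PROMPT_LEN = 4

def pad_prompt(tokens: list[int], pad_id: int = 0) -> list[int]:
  return [pad_id] * max(0, MIN_PROMPT_LEN - len(tokens)) + tokens
```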
George Hotz
150ea2eb76
create engine folder and move code ( #3948 )
...
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
chenyu
a2d3cf64a5
move is_dtype_supported to test.helpers ( #3762 )
...
* move is_dtype_supported to test.helpers
updated all places that check if float16 is supported
* fix tests
2024-03-15 14:33:26 -04:00
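After the move, a typical call site checks the dtype against the default device before running a half-precision test. The signature below is inferred from how the tests use the helper, so treat it as an assumption.

```python
# Assumed call-site shape after the helper moved to test.helpers.
import unittest
from test.helpers import is_dtype_supported
from tinygrad import Device, dtypes

class TestHalfOps(unittest.TestCase):
  def test_half_op(self):
    if not is_dtype_supported(dtypes.float16, Device.DEFAULT):
      self.skipTest("float16 is not supported on this device")
    # ... the actual float16 assertions would go here
```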
chenyu
922f8319cb
Run test_real_world in METAL test ( #3760 )
...
* clean up test_real_world
* skip that
* JIT=2 for metal
* all device
2024-03-15 13:56:52 -04:00
George Hotz
41f0a25b53
lazy.py: cache consts ( #3577 )
...
* lazy.py: cache consts
* add regression test
* always always cache const
* bump by 1
2024-03-02 03:50:05 -08:00
xarkes
28a8b72024
Remove Interpreted device & remaining CPU/TORCH ref ( #3423 )
...
* Remove Interpreted device & remaining CPU/TORCH ref
* Oops
* supports_device was useful
* Fix doc wording
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
George Hotz
93eceef727
remove cpu prereqs ( #3410 )
2024-02-15 13:45:06 +01:00
George Hotz
41efaa848c
move graph.py and jit.py into features ( #3376 )
...
* move graph.py into features
* move jit into features
* fix quickstart
2024-02-12 17:34:34 +01:00
George Hotz
9e17378b60
Fix metal tests ( #3266 )
...
* small fixes for tests on mac
* remove device from TensorCore
2024-01-27 18:09:42 -08:00
chenyu
f88506e630
move gpt2/llama sampling inside the model call ( #3013 )
...
* move gpt2/llama sampling inside the model call
* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
Yixiang Gao
8a63f26a0f
make LR scheduler work with multigpu ( #3011 )
...
* add a failing test for LR scheduler when using multigpu
* fix calculation order and an unnecessary tensor created for a float
* min_lr is no longer tensor
2024-01-04 12:10:56 -08:00
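"min_lr is no longer tensor" suggests the scheduler keeps min_lr as a plain Python float and only writes the final value into the optimizer's (possibly sharded) lr Tensor. The class below is a hedged sketch of that shape, not the actual extra.lr_scheduler code; the name and factor logic are invented for illustration.

```python
# Hedged sketch: compute the new learning rate in Python, then assign it into the existing
# lr Tensor, so no stray un-sharded Tensor is created for min_lr on multi-GPU runs.
class PlateauLikeScheduler:
  def __init__(self, opt, factor: float = 0.5, min_lr: float = 1e-5):
    self.opt, self.factor, self.min_lr = opt, factor, min_lr   # min_lr stays a float
  def step(self):
    new_lr = max(float(self.opt.lr.item()) * self.factor, self.min_lr)
    self.opt.lr.assign(self.opt.lr * 0 + new_lr).realize()     # write into the existing Tensor
```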
chenyu
ae112c9dbe
fix some long lines in tests ( #3006 )
...
* fix some long lines in tests
* better
2024-01-03 23:53:33 -05:00
Yixiang Gao
84eb6dd32a
skip GPU because OpenCL on Intel can't compile half
2024-01-03 07:07:21 -08:00
Yixiang Gao
73879b50ad
only need to check the min_lr for the nan bug
2024-01-03 07:00:50 -08:00
Yixiang Gao
99f8740c60
running half in CI CPU is slow
2024-01-02 18:44:35 -08:00
Yixiang Gao
781690fd99
how long it takes on CI CPU without the lr scheduler
2024-01-02 18:33:48 -08:00
Yixiang Gao
dd00bcb9c0
fix whitespace
2024-01-02 18:16:33 -08:00
Yixiang Gao
841487cad9
add half test with using hyp from benchmarks
2024-01-02 18:14:30 -08:00
George Hotz
a280cfe169
move dtypes to dtype.py ( #2964 )
...
* move dtypes to dtype.py
* fix urllib
2024-01-01 14:58:48 -08:00
chenyu
1fb815e77e
hotfix: fix coder. RMSNorm cannot have float16 input ( #2932 )
...
* hotfix: fix coder. RMSNorm cannot have float16 input
* update real world test due to new kernels
* more type casts
2023-12-25 02:28:11 -05:00
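The idea behind the fix (RMSNorm's mean/rsqrt should not run in float16) amounts to upcasting inside the norm and casting back afterwards. The class below is a sketch under that assumption, not the model's actual RMSNorm.

```python
# Hedged sketch: do the reduction in float32 even when the activations are float16.
from tinygrad import Tensor, dtypes

class RMSNorm:
  def __init__(self, dim: int, eps: float = 1e-6):
    self.eps, self.weight = eps, Tensor.ones(dim)
  def __call__(self, x: Tensor) -> Tensor:
    xf = x.cast(dtypes.float32)
    normed = xf * (xf.square().mean(axis=-1, keepdim=True) + self.eps).rsqrt()
    return (normed * self.weight).cast(x.dtype)   # cast back to the caller's dtype
```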
chenyu
50cfb1fb3a
update onnx model links ( #2908 )
...
updated in https://github.com/onnx/models/pull/644
2023-12-22 00:19:41 -05:00
chenyu
73cadfbb3c
Remove pytest markers ( #2831 )
...
* remove pytest marker
* fix some, skip some
* tweak
* fix
* skip slow
* skip more
2023-12-18 18:53:28 -05:00
chenyu
0723f26c80
dtypes.default_float and dtypes.default_int ( #2824 )
2023-12-18 12:21:44 -05:00
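These two values control what Python floats and ints become when a Tensor is built from host data. The assignment-style usage below is an assumption (in practice they are usually set via environment overrides), so treat it as a sketch.

```python
# Hedged usage sketch of the two defaults named in the title.
from tinygrad import Tensor, dtypes

dtypes.default_float = dtypes.float16
dtypes.default_int = dtypes.int64
assert Tensor([1.0, 2.0]).dtype == dtypes.float16   # python floats pick up default_float
assert Tensor([1, 2]).dtype == dtypes.int64         # python ints pick up default_int
```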
George Hotz
877c78b4ce
lazy tests ( #2796 )
...
* tests
* mini sd is very mini
2023-12-16 08:24:21 -08:00