Commit Graph

136 Commits

Author SHA1 Message Date
qazal
1fce864a6d delete multi output support (#8822)
* delete multioutput for now

* test_schedule

* test_assign too

* linter

* 515 for sd

* update tests and ctx

* update that assign check
2025-01-30 22:45:50 -05:00
George Hotz
a9d9f98d05 hotfix: those tests fail locally on mac due to buffer count 2025-01-27 07:53:48 +09:00
George Hotz
b4bf6a7dea switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]

* set device correctly, dedup

* why does that fail?

* add noop cast

* simple backward

* fix beautiful_mnist

* touchups

* set in compute_gradient

* uop_count

* uop_count was wrong

* collections

* no note

* skip that test

* update sched kernel counts

* train mnist is 65

* fix metadata and gc

* fixes

* materialize_grads

* no pathlib stuff

* add contiguous_backward, fix bugs

* add some realize

* fix multi
2025-01-26 09:12:16 +09:00
geohotstan
dd82b4c913 make onnx runner a class (#8647)
* this

* clean up

* more clean ups and improve debug msg

* more correct training toggler

* remove manual training toggling

* change some variable names

* actually just add the training toggle for LIMIT envvar too

* more refinement

* __call__ and OnnxRunner

* fix half of pylint; the other half is from importing from onnx while this file is onnx.py, figure out later

* ahhhh found another mistake

* remove limit from __call__

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
George Hotz
f29d6f54b8 support multilb gradient [pr] (#8624) 2025-01-14 18:33:33 -08:00
Francis Lata
5755ac1f72 Fix FC layer ResNet load_from_pretrained error (#8387)
* validate that FC exists before loading pretrained weights

* add test case for ResNet pretrained model without FC layer

* remove extra newline

* rename test case

* reraise exception if not handled by check
2024-12-26 18:11:27 -05:00
geohotstan
78cb47dfc5 docs and tests clean ups (#8383) 2024-12-23 11:12:13 -05:00
George Hotz
bd9c015b09 tests from grad uop path [pr] (#8313) 2024-12-18 09:25:05 -08:00
George Hotz
aa3b094334 changes from delete lazy [pr] (#8146)
* changes from delete lazy [pr]

* test tweak
2024-12-10 11:06:17 -08:00
chenyu
aa51f3c14e update kernel counts in test_real_world (#7960)
the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now.

if it's stable we could consider making the kernel count strict, which helps changes like #7940
2024-11-29 11:14:54 -05:00
George Hotz
205befa788 move is_dtype_supported to device [pr] (#7575) 2024-11-07 20:38:03 +08:00
George Hotz
4fe1945df6 llvm if load (#7345)
* llvm if load

* unneeded line

* local llvm CI
2024-10-29 11:33:22 +08:00
George Hotz
4013c9848c don't use tons of memory for tests non CI [pr] (#7209)
* don't use tons of memory for tests

* fix import and clean up pre-commit

* use pathlib

* no shm on windows

* Revert "use pathlib"

This reverts commit 7c38489820.

* run pre-commit hooks in test

* ugh, fix later
2024-10-22 15:04:51 +08:00
George Hotz
be64ac417e move GGUF test to its own file [pr] (#7208)
* move GGUF test to its own file [pr]

* skip tests if modules aren't installed
2024-10-22 13:24:55 +08:00
George Hotz
5ae2de9845 UOp.variable (#7010)
* UOp.variable [pr]

* fix tests

* clean

* improve name rendering

* last bug
2024-10-12 18:20:44 +08:00
kormann
f5dd25d376 enable whisper batch for long sequences (#6458)
* long batch +test

* long batch +test

* cleanup

* rollback syntactic changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-17 00:42:10 -04:00
George Hotz
d3b098299d add failing regression test for image (#5540)
* add failing regression test for image

* tg type

* simpler test

* don't realize image to image casts caused issue

* simple pad
2024-07-17 17:27:18 -07:00
wozeparrot
90f0e2fc49 db in wal mode (#5388) 2024-07-12 20:43:36 -07:00
chenyu
a0dbe20dbd skip some redundant and slow tests in ci (#5416) 2024-07-12 14:43:13 -04:00
chenyu
322c37e621 use helpers.JIT in llama and gpt2 examples (#5350)
* use helpers.JIT in llama and gpt2 examples

replaced getenv("JIT"), effectively making gpt2 default to JIT

* fix test_gpt2
2024-07-09 15:04:43 -04:00
chenyu
9a2a82a77f test stable diffusion unet in ci (#5268)
unet is parameterized now, so we can test a smaller one in CI
2024-07-02 21:37:52 -04:00
Tobias Fischer
9a25ee0b9a fixed unet call params (#5262) 2024-07-02 12:40:27 -04:00
Tobias Fischer
8c9c1cf62f Pulled CLIP and UNet into Separate Files (#5253)
* pulled clip and unet into separate files

* reference cleanup, lru cache fix

* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
e2c5054bdd update resnet.load_from_pretrained (#5040) 2024-06-18 16:29:22 -04:00
chenyu
6bbbeb93ac skip a few clang test that took > 30 seconds in CI (#4126)
* skip slow CLANG test test_train_cifar

* skip those too

* and that

* only CI

* one more
2024-04-10 02:00:34 -04:00
George Hotz
f916aadaea external that test 2024-03-29 19:35:50 -07:00
reddyn12
9b5e15db6e Mamba Implementation (#3456)
* first commit

* state back to orig

* mamba comparisons

* rm file

* rename file

* use Tensor.einsum and make default model 370M

* Cleaned code and made a comparison test

* Simplify pull request. Only has 1 mamba implementation now.

* Update prompt

* rm whitespaces

* last space

* remove Einops dependency

* rm unused code

* add tests

* rm print statement

* rm imports

* skip CLANG

* Update skipIf description

* skip model test in CI and add CLANG fix

* rm Device import

* don't be stupid

* Fix conv assign

When the prompt is too short, the logic for the conv_state assign breaks. This can be fixed by padding the tokenized array to a minimum length of 4. I padded using the empty string token, but I don't know if the proper practice is to use the PAD token

* fix p1

* temp

* fix jit import

---------

Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-28 17:49:12 -07:00
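The conv_state padding workaround described in the Mamba commit above can be sketched in a few lines. This is a hypothetical illustration, not code from the PR: the function name, the pad-token id, and treating the minimum length of 4 as the conv window are assumptions.

```python
# Hypothetical sketch of the workaround: if the tokenized prompt is shorter
# than the minimum length (4, per the commit message), left-pad it with a
# pad token so the conv_state assignment has enough tokens to work with.
MIN_PROMPT_LEN = 4   # minimum length cited in the commit message
PAD_TOKEN_ID = 0     # assumed id of the empty-string / PAD token

def pad_prompt(token_ids: list[int], min_len: int = MIN_PROMPT_LEN, pad_id: int = PAD_TOKEN_ID) -> list[int]:
  # left-pad so the real prompt tokens stay at the end of the window
  if len(token_ids) >= min_len: return token_ids
  return [pad_id] * (min_len - len(token_ids)) + token_ids

# usage (tokenizer is illustrative): tokens = pad_prompt(tokenizer.encode(prompt))
```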
George Hotz
150ea2eb76 create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
chenyu
a2d3cf64a5 move is_dtype_supported to test.helpers (#3762)
* move is_dtype_supported to test.helpers

updated all places that check if float16 is supported

* fix tests
2024-03-15 14:33:26 -04:00
chenyu
922f8319cb Run test_real_world in METAL test (#3760)
* clean up test_real_world

* skip that

* JIT=2 for metal

* all device
2024-03-15 13:56:52 -04:00
George Hotz
41f0a25b53 lazy.py: cache consts (#3577)
* lazy.py: cache consts

* add regression test

* always always cache const

* bump by 1
2024-03-02 03:50:05 -08:00
xarkes
28a8b72024 Remove Interpreted device & remaining CPU/TORCH ref (#3423)
* Remove Interpreted device & remaining CPU/TORCH ref

* Oops

* supports_device was useful

* Fix doc wording

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
George Hotz
93eceef727 remove cpu prereqs (#3410) 2024-02-15 13:45:06 +01:00
George Hotz
41efaa848c move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
George Hotz
9e17378b60 Fix metal tests (#3266)
* small fixes for tests on mac

* remove device from TensorCore
2024-01-27 18:09:42 -08:00
chenyu
f88506e630 move gpt2/llama sampling inside the model call (#3013)
* move gpt2/llama sampling inside the model call

* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
Yixiang Gao
8a63f26a0f make LR scheduler work with multigpu (#3011)
* add a failing test for LR scheduler when using multigpu

* fix calculation order and unnecessary tensor created for float

* min_lr is no longer tensor
2024-01-04 12:10:56 -08:00
chenyu
ae112c9dbe fix some long lines in tests (#3006)
* fix some long lines in tests

* better
2024-01-03 23:53:33 -05:00
Yixiang Gao
84eb6dd32a skip GPU because OpenCL on Intel can't compile half 2024-01-03 07:07:21 -08:00
Yixiang Gao
73879b50ad only need to check the min_lr for the nan bug 2024-01-03 07:00:50 -08:00
Yixiang Gao
99f8740c60 running half in CI CPU is slow 2024-01-02 18:44:35 -08:00
Yixiang Gao
781690fd99 how long it takes on CI CPU without the lr scheduler 2024-01-02 18:33:48 -08:00
Yixiang Gao
dd00bcb9c0 fix whitespace 2024-01-02 18:16:33 -08:00
Yixiang Gao
841487cad9 add half test with using hyp from benchmarks 2024-01-02 18:14:30 -08:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
chenyu
1fb815e77e hotfix: fix coder. RMSNorm cannot have float16 input (#2932)
* hotfix: fix coder. RMSNorm cannot have float16 input

* update real world test due to new kernels

* more type casts
2023-12-25 02:28:11 -05:00
chenyu
50cfb1fb3a update onnx model links (#2908)
updated in https://github.com/onnx/models/pull/644
2023-12-22 00:19:41 -05:00
chenyu
73cadfbb3c Remove pytest markers (#2831)
* remove pytest marker

* fix some, skip some

* tweak

* fix

* skip slow

* skip more
2023-12-18 18:53:28 -05:00
chenyu
0723f26c80 dtypes.default_float and dtypes.default_int (#2824) 2023-12-18 12:21:44 -05:00
George Hotz
877c78b4ce lazy tests (#2796)
* tests

* mini sd is very mini
2023-12-16 08:24:21 -08:00