George Hotz
3dbde178c1
mark slow tests as slow instead of as CI ( #13736 )
...
* mark slow tests as slow instead of as CI
* CI shouldn't have different behavior
* more skips / CI
* slow
2025-12-17 10:29:57 -04:00
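For context, a minimal sketch of the pattern this commit moves toward: gating slow tests on an explicit opt-in flag instead of on the CI env var. The RUN_SLOW name is illustrative, not the repo's actual flag.

```python
# hypothetical sketch: gate slow tests on an explicit flag instead of CI
import os, unittest

SLOW = os.getenv("RUN_SLOW", "") != ""  # illustrative opt-in flag

class TestTraining(unittest.TestCase):
  @unittest.skipUnless(SLOW, "slow test, set RUN_SLOW=1 to run")
  def test_train_cifar(self):
    ...  # same behavior everywhere, CI or not
```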
George Hotz
9015a22523
make tests faster ( #13734 )
2025-12-17 09:39:44 -04:00
chenyu
42f6cf3a90
tighter test_real_world mem and kernel count bounds ( #13573 )
...
also check that actual usage is within 20% of the set limit; the old limits were too loose to be useful
2025-12-04 13:35:39 -05:00
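A sketch of the tightness check described above; GlobalCounters.mem_used is real tinygrad, the limit value is invented.

```python
from tinygrad.helpers import GlobalCounters

MEM_LIMIT = 500e6  # hypothetical per-test byte limit
used = GlobalCounters.mem_used
assert used <= MEM_LIMIT, f"used {used} bytes, over the {MEM_LIMIT} limit"
# a limit far above actual usage catches no regressions, so also require
# the limit to be within 20% of what the test really uses
assert used >= 0.8 * MEM_LIMIT, f"limit too loose: only {used} of {MEM_LIMIT} used"
```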
George Hotz
b0da173f2f
add unique to const, fix longstanding bug ( #12965 )
...
* add unique to const, fix longstanding bug
* _force_unique=True
* fix tests
* fix more tests
2025-10-28 15:11:37 +08:00
George Hotz
3b0b3a2e64
fast RANGEIFY ( #12504 )
...
* rtoposort is fast, can replace rangeify with this
* fast rangeify
* work
* fast rangeify works for mnist
* should work
* progress
* pad fix
* FAST
* tests passing
* don't delete those shape ops
* put in rangeify map
* ending ranges fix
* tests
* mstack/mselect no hacks
* move to indexing.py
* touch up tests + add comments
* disable failing test
* actually make the file readable
* failing
* error
2025-10-08 19:38:06 +08:00
qazal
9448924d9e
update gpt2 kernel count tests in CI=0 ( #12523 )
2025-10-08 14:29:11 +03:00
chenyu
b087663c35
RANGEIFY test_bert uses more RAM somehow ( #12443 )
2025-10-03 04:38:53 -04:00
chenyu
f203d8b221
update RANGEIFY kernel count and test_masked_select ( #12435 )
2025-10-03 00:41:34 -04:00
George Hotz
7129419500
fix cifar training in RANGEIFY ( #12355 )
...
* fix cifar training in RANGEIFY
* even more wino fuse
* bugfix
* test to show issue
2025-09-30 15:59:19 +08:00
chenyu
647965fb09
test_train cleanup ( #12140 )
...
* test_train cleanup
remove skipIf due to buffer sizes, runs locally
* those are slow
2025-09-12 13:21:30 -04:00
chenyu
20cd7177de
delete test_bert_fuse_arange ( #12121 )
...
* delete test_bert_fuse_arange
it's the default now and we are not interested in the FUSE_ARANGE=0 version
* remove -v
2025-09-11 12:35:51 -04:00
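For reference, a sketch of the flag the deleted test toggled; Context and FUSE_ARANGE are real tinygrad context vars, the computation is illustrative.

```python
from tinygrad import Tensor
from tinygrad.helpers import Context

with Context(FUSE_ARANGE=0):  # the non-default path the test used to cover
  t, idx = Tensor.rand(16, 4), Tensor([3, 1])
  out = t[idx].realize()  # tensor indexing builds an arange; the flag controls fusing it
```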
chenyu
0e266f376c
ops_gpu -> ops_cl ( #12103 )
2025-09-10 15:15:48 -04:00
nimlgen
1c6c42715f
unify cpu and llvm ( #11982 )
...
* try unify cpu and llvm
* fixes
* fix
* ops
* no llvm
* fix
* rm
* llvm is out
* oops
* override
* no llvm
* ignore
* skip llvm
* ooops
2025-09-09 13:54:44 +03:00
chenyu
bfa87f3490
clean up binary_crossentropy_logits ( #10958 )
2025-06-24 12:23:40 -04:00
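For reference, binary_crossentropy_logits is a real Tensor method; the manual line below is the standard numerically stable form it corresponds to, not necessarily the exact internal code.

```python
from tinygrad import Tensor

logits, y = Tensor([2.0, -1.0]), Tensor([1.0, 0.0])
loss = logits.binary_crossentropy_logits(y)
# standard stable form: max(x,0) - x*y + log(1 + exp(-|x|)), then mean
manual = (logits.maximum(0) - logits*y + (1 + (-logits.abs()).exp()).log()).mean()
```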
George Hotz
81b9c04574
move high level stuff to unit tests [pr] ( #10708 )
...
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
2025-06-08 14:05:56 -07:00
qazal
95c6a736a9
fix FUSE_ARANGE=1 for bert ( #10255 )
2025-05-12 14:44:05 +03:00
chenyu
70c797b107
train bert tests ( #10248 )
...
added a working bert tiny test, and a failing bert FUSE_ARANGE test
2025-05-11 08:42:08 -04:00
George Hotz
b6d2effaf5
assign is contiguous ( #10066 )
...
* assign is contiguous
* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] ( #9210 )
...
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
chenyu
2e7c2780a9
CLANG -> CPU ( #9189 )
2025-02-20 18:03:09 -05:00
qazal
1fce864a6d
delete multi output support ( #8822 )
...
* delete multioutput for now
* test_schedule
* test_assign too
* linter
* 515 for sd
* update tests and ctx
* update that assign check
2025-01-30 22:45:50 -05:00
George Hotz
b4bf6a7dea
switch backward to use gradient [pr] ( #8235 )
...
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
2025-01-26 09:12:16 +09:00
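A sketch of the gradient-first API backward now sits on; Tensor.gradient is real in current tinygrad, the toy loss is illustrative.

```python
from tinygrad import Tensor

w = Tensor([1.0, 2.0], requires_grad=True)
loss = (w * w).sum()
dw, = loss.gradient(w)  # explicit gradients w.r.t. the given tensors
loss.backward()         # backward is now built on the same machinery, fills w.grad
```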
George Hotz
f29d6f54b8
support multilb gradient [pr] ( #8624 )
2025-01-14 18:33:33 -08:00
George Hotz
aa3b094334
changes from delete lazy [pr] ( #8146 )
...
* changes from delete lazy [pr]
* test tweak
2024-12-10 11:06:17 -08:00
chenyu
aa51f3c14e
update kernel counts in test_real_world ( #7960 )
...
the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now.
if it's stable we could consider making the kernel count strict, which helps changes like #7940
2024-11-29 11:14:54 -05:00
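A sketch of the measurement this makes meaningful; GlobalCounters is real, the workload is invented. JIT graph batching can hide how many kernels actually ran, hence the JIT=2 wrapper.

```python
from tinygrad import Tensor
from tinygrad.helpers import GlobalCounters

GlobalCounters.reset()
(Tensor.rand(16, 16) @ Tensor.rand(16, 16)).relu().realize()
print(GlobalCounters.kernel_count)  # the number a strict bound would pin down
```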
George Hotz
205befa788
move is_dtype_supported to device [pr] ( #7575 )
2024-11-07 20:38:03 +08:00
George Hotz
4013c9848c
don't use tons of memory for tests outside CI [pr] ( #7209 )
...
* don't use tons of memory for tests
* fix import and clean up pre-commit
* use pathlib
* no shm on windows
* Revert "use pathlib"
This reverts commit 7c38489820.
* run pre-commit hooks in test
* ugh, fix later
2024-10-22 15:04:51 +08:00
George Hotz
5ae2de9845
UOp.variable ( #7010 )
...
* UOp.variable [pr]
* fix tests
* clean
* improve name rendering
* last bug
2024-10-12 18:20:44 +08:00
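A sketch of the symbolic-variable API this adds a UOp constructor for; Variable and .bind are real tinygrad, the shapes are illustrative.

```python
from tinygrad import Tensor, Variable

v = Variable("seq_len", 1, 128).bind(32)  # symbolic dim bound to a runtime value
t = Tensor.ones(4, 32).reshape(4, v)      # one compiled kernel serves any seq_len
```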
chenyu
a0dbe20dbd
skip some redundant and slow tests in CI ( #5416 )
2024-07-12 14:43:13 -04:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively making gpt2 jit by default
* fix test_gpt2
2024-07-09 15:04:43 -04:00
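A hedged sketch of the change, assuming helpers.JIT is a ContextVar-style flag like DEBUG; exact semantics may differ.

```python
# hedged sketch: a shared flag replaces per-example getenv("JIT") checks
from tinygrad.helpers import JIT, getenv

old_jit = getenv("JIT", 1)  # the per-example pattern this PR removes (default shown is illustrative)
if JIT: print("running the model under TinyJit")
```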
chenyu
9a2a82a77f
test stable diffusion unet in CI ( #5268 )
...
unet is parameterized now so we can test a smaller one in CI
2024-07-02 21:37:52 -04:00
Tobias Fischer
9a25ee0b9a
fixed unet call params ( #5262 )
2024-07-02 12:40:27 -04:00
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Separate Files ( #5253 )
...
* pulled clip and unet into separate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
6bbbeb93ac
skip a few CLANG tests that took > 30 seconds in CI ( #4126 )
...
* skip slow CLANG test test_train_cifar
* skip those too
* and that
* only CI
* one more
2024-04-10 02:00:34 -04:00
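The pattern these commits add, sketched; CI is a real tinygrad.helpers constant, the test is illustrative. (The first entry in this log later reverses this in favor of marking tests slow.)

```python
import unittest
from tinygrad.helpers import CI

class TestCifar(unittest.TestCase):
  @unittest.skipIf(CI, "took > 30 seconds in CI")
  def test_train_cifar(self):
    ...  # still runs in full local sweeps
```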
George Hotz
150ea2eb76
create engine folder and move code ( #3948 )
...
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
chenyu
a2d3cf64a5
move is_dtype_supported to test.helpers ( #3762 )
...
* move is_dtype_supported to test.helpers
updated all places that check if float16 is supported
* fix tests
2024-03-15 14:33:26 -04:00
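A sketch of the helper's use after the move (it later migrates to device, per the #7575 entry above); is_dtype_supported is real, the test is illustrative.

```python
import unittest
from tinygrad import dtypes
from test.helpers import is_dtype_supported

@unittest.skipUnless(is_dtype_supported(dtypes.float16), "no float16 on this device")
class TestHalf(unittest.TestCase):
  def test_add(self): ...
```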
chenyu
922f8319cb
Run test_real_world in METAL test ( #3760 )
...
* clean up test_real_world
* skip that
* JIT=2 for metal
* all device
2024-03-15 13:56:52 -04:00
George Hotz
41f0a25b53
lazy.py: cache consts ( #3577 )
...
* lazy.py: cache consts
* add regression test
* always always cache const
* bump by 1
2024-03-02 03:50:05 -08:00
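The idea in miniature, not the actual lazy.py code: identical consts come back as the same cached object, so graphs dedup them.

```python
# illustrative const cache keyed by (value, dtype); a stand-in for the real thing
const_cache: dict = {}
def make_const(val, dtype):
  if (key := (val, dtype)) not in const_cache:
    const_cache[key] = ("CONST", val, dtype)  # placeholder node
  return const_cache[key]

assert make_const(1.0, "float") is make_const(1.0, "float")  # shared: "cache consts"
```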
xarkes
28a8b72024
Remove Interpreted device & remaining CPU/TORCH ref ( #3423 )
...
* Remove Interpreted device & remaining CPU/TORCH ref
* Oops
* supports_device was useful
* Fix doc wording
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
George Hotz
41efaa848c
move graph.py and jit.py into features ( #3376 )
...
* move graph.py into features
* move jit into features
* fix quickstart
2024-02-12 17:34:34 +01:00
George Hotz
9e17378b60
Fix metal tests ( #3266 )
...
* small fixes for tests on mac
* remove device from TensorCore
2024-01-27 18:09:42 -08:00
chenyu
f88506e630
move gpt2/llama sampling inside the model call ( #3013 )
...
* move gpt2/llama sampling inside the model call
* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
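A structural sketch of the move; the model is a stand-in, the point is that sampling lives inside the jittable call.

```python
from tinygrad import Tensor

class Model:  # hypothetical stand-in for the gpt2/llama transformer
  def __call__(self, tokens: Tensor) -> Tensor:
    logits = tokens.float()  # stand-in for the real forward pass
    return logits.argmax(-1)  # sampling now inside the model call ("one more kernel")
```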
Yixiang Gao
8a63f26a0f
make LR scheduler work with multigpu ( #3011 )
...
* add a failing test for LR scheduler when using multigpu
* fix calculation order and an unnecessary tensor created for float
* min_lr is no longer a tensor
2024-01-04 12:10:56 -08:00
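A sketch of the fix's shape, assuming a cosine-style scheduler; the class and names are invented.

```python
import math

class CosineLR:  # hypothetical scheduler fragment
  def __init__(self, max_lr: float, min_lr: float, total: int):
    # min_lr stays a plain python float: no stray tensor lands on one
    # device of a multigpu setup
    self.max_lr, self.min_lr, self.total, self.t = max_lr, min_lr, total, 0
  def get_lr(self) -> float:
    self.t += 1
    return self.min_lr + 0.5*(self.max_lr-self.min_lr)*(1+math.cos(math.pi*self.t/self.total))
```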
Yixiang Gao
84eb6dd32a
skip GPU because OpenCL on Intel can't compile half
2024-01-03 07:07:21 -08:00
Yixiang Gao
73879b50ad
only need to check the min_lr for the nan bug
2024-01-03 07:00:50 -08:00
Yixiang Gao
99f8740c60
running half on CI CPU is slow
2024-01-02 18:44:35 -08:00
Yixiang Gao
781690fd99
how long it takes on CI CPU without the lr scheduler
2024-01-02 18:33:48 -08:00
Yixiang Gao
dd00bcb9c0
fix whitespace
2024-01-02 18:16:33 -08:00
Yixiang Gao
841487cad9
add half test with using hyp from benchmarks
2024-01-02 18:14:30 -08:00
George Hotz
a280cfe169
move dtypes to dtype.py ( #2964 )
...
* move dtypes to dtype.py
* fix urllib
2024-01-01 14:58:48 -08:00