chenyu
b087663c35
RANGEIFY test_bert uses more RAM somehow ( #12443 )
2025-10-03 04:38:53 -04:00
chenyu
f203d8b221
update RANGEIFY kernel count and test_masked_select ( #12435 )
2025-10-03 00:41:34 -04:00
George Hotz
7129419500
fix cifar training in RANGEIFY ( #12355 )
...
* fix cifar training in RANGEIFY
* even more wino fuse
* bugfix
* test to show issue
2025-09-30 15:59:19 +08:00
chenyu
647965fb09
test_train cleanup ( #12140 )
...
* test_train cleanup
remove skipIf due to buffer sizes, runs locally
* those are slow
2025-09-12 13:21:30 -04:00
chenyu
20cd7177de
delete test_bert_fuse_arange ( #12121 )
...
* delete test_bert_fuse_arange
it's the default now and we are not interested in the FUSE_ARANGE=0 version
* remove -v
2025-09-11 12:35:51 -04:00
chenyu
0e266f376c
ops_gpu -> ops_cl ( #12103 )
2025-09-10 15:15:48 -04:00
nimlgen
1c6c42715f
unify cpu and llvm ( #11982 )
...
* try unify cpu and llvm
* fixes
* fix
* ops
* no llvm
* fix
* rm
* llvm is out
* oops
* override
* no llvm
* ignore
* skip llvm
* ooops
2025-09-09 13:54:44 +03:00
chenyu
bfa87f3490
clean up binary_crossentropy_logits ( #10958 )
2025-06-24 12:23:40 -04:00
George Hotz
81b9c04574
move high level stuff to unit tests [pr] ( #10708 )
...
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
2025-06-08 14:05:56 -07:00
qazal
95c6a736a9
fix FUSE_ARANGE=1 for bert ( #10255 )
2025-05-12 14:44:05 +03:00
chenyu
70c797b107
train bert tests ( #10248 )
...
added a working bert tiny test, and a failing bert FUSE_ARANGE test
2025-05-11 08:42:08 -04:00
George Hotz
b6d2effaf5
assign is contiguous ( #10066 )
...
* assign is contiguous
* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] ( #9210 )
...
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
chenyu
2e7c2780a9
CLANG -> CPU ( #9189 )
2025-02-20 18:03:09 -05:00
qazal
1fce864a6d
delete multi output support ( #8822 )
...
* delete multioutput for now
* test_schedule
* test_assign too
* linter
* 515 for sd
* update tests and ctx
* update that assign check
2025-01-30 22:45:50 -05:00
George Hotz
b4bf6a7dea
switch backward to use gradient [pr] ( #8235 )
...
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
2025-01-26 09:12:16 +09:00
George Hotz
f29d6f54b8
support multilb gradient [pr] ( #8624 )
2025-01-14 18:33:33 -08:00
George Hotz
aa3b094334
changes from delete lazy [pr] ( #8146 )
...
* changes from delete lazy [pr]
* test tweak
2024-12-10 11:06:17 -08:00
chenyu
aa51f3c14e
update kernel counts in test_real_world ( #7960 )
...
the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now.
if it's stable we could consider making kernel count strict, which helps changes like #7940
2024-11-29 11:14:54 -05:00
George Hotz
205befa788
move is_dtype_supported to device [pr] ( #7575 )
2024-11-07 20:38:03 +08:00
George Hotz
4013c9848c
don't use tons of memory for tests outside CI [pr] ( #7209 )
...
* don't use tons of memory for tests
* fix import and clean up pre-commit
* use pathlib
* no shm on windows
* Revert "use pathlib"
This reverts commit 7c38489820.
* run pre-commit hooks in test
* ugh, fix later
2024-10-22 15:04:51 +08:00
George Hotz
5ae2de9845
UOp.variable ( #7010 )
...
* UOp.variable [pr]
* fix tests
* clean
* improve name rendering
* last bug
2024-10-12 18:20:44 +08:00
chenyu
a0dbe20dbd
skip some redundant and slow tests in ci ( #5416 )
2024-07-12 14:43:13 -04:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively making jit the default for gpt2
* fix test_gpt2
2024-07-09 15:04:43 -04:00
chenyu
9a2a82a77f
test stable diffusion unet in ci ( #5268 )
...
unet is parameterized now so we can test a smaller one in ci
2024-07-02 21:37:52 -04:00
Tobias Fischer
9a25ee0b9a
fixed unet call params ( #5262 )
2024-07-02 12:40:27 -04:00
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Separate Files ( #5253 )
...
* pulled clip and unet into separate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
6bbbeb93ac
skip a few clang tests that took > 30 seconds in CI ( #4126 )
...
* skip slow CLANG test test_train_cifar
* skip those too
* and that
* only CI
* one more
2024-04-10 02:00:34 -04:00
George Hotz
150ea2eb76
create engine folder and move code ( #3948 )
...
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
chenyu
a2d3cf64a5
move is_dtype_supported to test.helpers ( #3762 )
...
* move is_dtype_supported to test.helpers
updated all places that check if float16 is supported
* fix tests
2024-03-15 14:33:26 -04:00
chenyu
922f8319cb
Run test_real_world in METAL test ( #3760 )
...
* clean up test_real_world
* skip that
* JIT=2 for metal
* all device
2024-03-15 13:56:52 -04:00
George Hotz
41f0a25b53
lazy.py: cache consts ( #3577 )
...
* lazy.py: cache consts
* add regression test
* always always cache const
* bump by 1
2024-03-02 03:50:05 -08:00
xarkes
28a8b72024
Remove Interpreted device & remaining CPU/TORCH ref ( #3423 )
...
* Remove Interpreted device & remaining CPU/TORCH ref
* Oops
* supports_device was useful
* Fix doc wording
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
George Hotz
41efaa848c
move graph.py and jit.py into features ( #3376 )
...
* move graph.py into features
* move jit into features
* fix quickstart
2024-02-12 17:34:34 +01:00
George Hotz
9e17378b60
Fix metal tests ( #3266 )
...
* small fixes for tests on mac
* remove device from TensorCore
2024-01-27 18:09:42 -08:00
chenyu
f88506e630
move gpt2/llama sampling inside the model call ( #3013 )
...
* move gpt2/llama sampling inside the model call
* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
Yixiang Gao
8a63f26a0f
make LR scheduler work with multigpu ( #3011 )
...
* add a failing test for LR scheduler when using multigpu
* fix calculation order and remove an unnecessary tensor created for float
* min_lr is no longer a tensor
2024-01-04 12:10:56 -08:00
Yixiang Gao
84eb6dd32a
skip GPU because opencl on intel can't compile half
2024-01-03 07:07:21 -08:00
Yixiang Gao
73879b50ad
only need to check the min_lr for the nan bug
2024-01-03 07:00:50 -08:00
Yixiang Gao
99f8740c60
running half in CI CPU is slow
2024-01-02 18:44:35 -08:00
Yixiang Gao
781690fd99
how long it takes on CI CPU without the lr scheduler
2024-01-02 18:33:48 -08:00
Yixiang Gao
dd00bcb9c0
fix whitespace
2024-01-02 18:16:33 -08:00
Yixiang Gao
841487cad9
add half test using hyp from benchmarks
2024-01-02 18:14:30 -08:00
George Hotz
a280cfe169
move dtypes to dtype.py ( #2964 )
...
* move dtypes to dtype.py
* fix urllib
2024-01-01 14:58:48 -08:00
chenyu
1fb815e77e
hotfix fix coder. RMSNorm cannot have float16 input ( #2932 )
...
* hotfix fix coder. RMSNorm cannot have float16 input
* update real world test due to new kernels
* more type casts
2023-12-25 02:28:11 -05:00
chenyu
0723f26c80
dtypes.default_float and dtypes.default_int ( #2824 )
2023-12-18 12:21:44 -05:00
George Hotz
877c78b4ce
lazy tests ( #2796 )
...
* tests
* mini sd is very mini
2023-12-16 08:24:21 -08:00
George Hotz
c6eb618013
tests from new lazy branch ( #2774 )
...
* tests from new lazy branch
* fix lin 11
* that was needed
* doesn't fail
* mark
* meant that
* llvm passes
2023-12-14 23:06:39 -08:00
George Hotz
6d6eb9302d
ruff checks the max line length is 150 ( #2734 )
...
* ruff checks the max line length is 150
* fix tensor.py
* a lot more
* done
2023-12-12 17:34:47 -08:00
George Hotz
4164d0ebbd
multitensor start ( #2676 )
...
* multitensor work
* early gen fixes the tests
* atol for flaky test
2023-12-07 17:07:05 -08:00