Commit Graph

4433 Commits

Author SHA1 Message Date
chenyu
c9225d22ce only disable flaky test_jit_multidev_xfer (#11523) 2025-08-05 22:17:25 -04:00
George Hotz
07b0df0d86 hotfix: test tensor dims start at 1 2025-08-05 15:40:24 -07:00
George Hotz
4dabdf7c6d Revert "optimize in rewrite (#11516)" (#11517)
This reverts commit 3b777a9e05.
2025-08-05 15:39:07 -07:00
George Hotz
3b777a9e05 optimize in rewrite (#11516)
* changes

* fix test uops

* dim shouldn't be 0

* huh, why did that one not save
2025-08-05 15:33:26 -07:00
nimlgen
fc4e713d1c jit graph split tests (#11507)
* jit graph split tests

* fix

* one more test

* more tests

* fix

* xm

* remote
2025-08-05 21:32:37 +03:00
chenyu
ace8e9a706 fix test_conv2d_winograd (#11511) 2025-08-05 12:15:46 -04:00
chenyu
223aaa0492 clean up more conv tests (#11510) 2025-08-05 12:15:30 -04:00
Garret Castro
76e62a1c23 extract conv layer test logic (#11488)
* refactor: extract conv layer test logic

* tuple is unnecessary

* integrate _test_conv logic into all conv tests

* fix linter, forgot dilation

* undo winograd extraction

adds too many if statements for a single case
2025-08-05 11:15:54 -04:00
uuuvn
011ef8fa9d Fix incorrect jit current batch devs reset (#11505)
`current_batch_devs = []` (in `flush_batch()`) happens between
`new_batched_devs = ...` and `current_batch_devs = new_batched_devs`, so it
doesn't actually reset anything, leading to things not jitting properly.

This doubles remote bert step time (and should have similar effects on any
non-hcq backend).
2025-08-05 08:16:16 +03:00
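Note on 011ef8fa9d: the ordering problem described in that message can be reproduced in a toy form. The sketch below is not the tinygrad JIT code; the `state` dict and the `step_buggy` / `step_reordered` helpers are hypothetical names, used only to show how a reset placed between computing a value and assigning it gets overwritten.

```python
def flush_batch(state):
    # Hypothetical stand-in: flushing is supposed to clear the device list.
    state["current_batch_devs"] = []

def step_buggy(state, dev):
    # The new list is computed from the old one before the flush...
    new_batched_devs = state["current_batch_devs"] + [dev]
    flush_batch(state)  # ...the reset happens here...
    # ...and is immediately overwritten by the stale value.
    state["current_batch_devs"] = new_batched_devs

def step_reordered(state, dev):
    # One possible reordering (an assumption, not the actual fix):
    # flush first, then build the new list from the cleared state.
    flush_batch(state)
    state["current_batch_devs"] = state["current_batch_devs"] + [dev]

buggy = {"current_batch_devs": ["dev0"]}
step_buggy(buggy, "dev1")

reordered = {"current_batch_devs": ["dev0"]}
step_reordered(reordered, "dev1")
```

In the buggy variant the list still contains `dev0` after the step, so the reset had no effect; in the reordered variant only `dev1` remains.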
chenyu
f02720ca2d fix fuse gate_contiguous unique (#11504) 2025-08-04 23:43:31 -04:00
qazal
846a2826ab viz: remove TracingKey.fmt (#11482)
* viz: remove TracingKey.fmt

* remove from test too
2025-08-05 00:00:03 +03:00
leopf
4f0ee4e982 BPE tokenizer (#11415)
* BPE works

* refactor tok

* oops

* basic tests

* fix eval

* smaller diff

* fix error

* proper vocab decoding

* use regex for splitting

* escape ucatrange

* full compat

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-04 09:52:38 -07:00
b1tg
06af9f9236 fix double exception + add name,loc in error msg (#11487)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-04 13:41:23 +03:00
chenyu
e0106b6b25 1/(x*c) -> (1/c)*(1/x) (#11491)
example: 2*(2*a).reciprocal() -> a.reciprocal()

# TODO: bounds for reciprocal
# TODO: should z3 work?
2025-08-03 23:35:46 -04:00
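Note on e0106b6b25: the rewrite rests on the identity 1/(x*c) = (1/c)*(1/x) for nonzero x and c. The sketch below checks it with exact rational arithmetic; it does not use tinygrad, and the two helper functions are hypothetical names for the two sides of the identity.

```python
from fractions import Fraction

def reciprocal_of_scaled(x, c):
    # Left-hand side of the rewrite: 1/(x*c).
    return 1 / (x * c)

def split_reciprocal(x, c):
    # Right-hand side of the rewrite: (1/c)*(1/x).
    return (1 / c) * (1 / x)

a = Fraction(7, 3)  # arbitrary nonzero value standing in for the tensor element
c = Fraction(2)     # the constant factor

lhs = reciprocal_of_scaled(a, c)
rhs = split_reciprocal(a, c)

# The commit's example: 2*(2*a).reciprocal() reduces to a.reciprocal(),
# because the outer 2 cancels the (1/2) pulled out of the reciprocal.
example = 2 * split_reciprocal(a, c)
```

Exact fractions are used here so the comparison is not affected by floating-point rounding; with floats the two sides may differ in the last bits.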
chenyu
dbc7807c61 enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
chenyu
66be747908 few more dtype cast convenience methods (#11480) 2025-08-02 15:47:09 -04:00
chenyu
e22e5da9a5 move some test_dtype tests to unit (#11479) 2025-08-02 15:25:00 -04:00
nimlgen
da0b955be4 hcq: cpu can be graphed (#11474)
* hcq: cpu can be graphed

* ops

* new jit decisions

* fix test

* fix remote

* cleaner

* fix
2025-08-02 21:01:19 +03:00
kevvz
ef7e01cadf Fix SVD shape bug + Fix batched SVD bug (#11477)
* failing test case

* fix

* better test

* space
2025-08-02 09:47:41 -07:00
qazal
fa66d9772d viz: show const node when it's root (#11456) 2025-08-01 01:01:58 +03:00
Eitan Turok
cba3655de5 Add Test for Setitem (#10559)
* init

* update

* better

* failing test

* works

* Delete test file

* clean

* lint

* simplify variable name

* rm contiguous, rm int dtype, and add assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:03:41 -04:00
chenyu
4ca430e5bf fix search dedup (#11439)
it should check against pre real_axis axis in actions, not real_axis.
2025-07-30 17:24:16 -04:00
chenyu
d5fc6af4a2 remove unused ShapeTracker.consecutive [pr] (#11426) 2025-07-29 18:36:19 -04:00
George Hotz
49a2583584 real new lowerer (#11419)
* real new lowerer

* fix group for reduce

* skip missing ranges

* fix wmma and unroll/contract

* real fix for wmma

* disable that test

* fix if gate

* simpler

* flash attention fusion works

* no end barriers

* still broken

* flash attention finally works
2025-07-29 15:35:51 -07:00
chenyu
0e5d8d5c3c remove tests that used .to_uop() (#11425)
* remove tests that used .to_uop()

* import
2025-07-29 15:52:16 -04:00
nimlgen
d38d285489 ci: add h machines (#11416)
* ci: add h machines

* more

* fix names

* names not collide

* 20

* 10
2025-07-29 19:21:51 +03:00
George Hotz
8c10085459 assert shape on lowerer store [pr] (#11395)
* assert shape on lowerer store [pr]

* fix ptx
2025-07-27 10:41:57 -07:00
George Hotz
dfeee63d30 uop matmul work (#11388)
* uop matmul work

* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
3923e78061 no_vectorized_acc keeps single DEFINE_REG (#11387)
* no_vectorized_acc keeps single DEFINE_REG

* fix ptx, skip flaky test
2025-07-26 11:44:09 -07:00
George Hotz
466ab5a3f2 store/load not pass through index (#11381)
* noop

* fix noop

* store cat is NOOP

* store dtype is void

* stores aren't passed through anymore

* meh, skip those for ptx

* correct ptx skip

* hl runs
2025-07-25 21:01:47 -07:00
chenyu
88c338bfcc add kernelize to keccak for each data block (#11370)
* add kernelize to keccak for each data block

test_long works now. This prevents internal uops from growing proportionally to data length and eventually becoming too deep.

* this?

* hash stuff

* gate test

* mv
2025-07-25 16:07:20 -04:00
nimlgen
3b3de8df61 hcq: graphed copies (#11302)
* fast copies p2

* upd and fix

* graph supports

* fixes

* fixes

* fixes

* fix

* fix

* fix mockgpu

* fix alignment

* smaller in ci
2025-07-24 17:36:19 +03:00
nimlgen
bf12041910 hcq: mapping of cpu to all hcq devices (#11354)
* hcq: mapping of cpu to all hcq devices

* fix kfd

* nv

* simpler

* cleaner

* correct skip

* fix ifaces

* system fixes

* mypy
2025-07-24 12:52:38 +03:00
chenyu
82e6de7fc6 more keccak reference tests (#11329) 2025-07-23 22:06:39 -04:00
chenyu
5b570196e4 support DEV= to specify device (#11351) 2025-07-23 17:40:55 -04:00
George Hotz
7f0a41df4d move optional out of devectorize [pr] (#11350)
* move optional out of devectorize [pr]

* fast idiv
2025-07-23 11:26:05 -07:00
chenyu
960da9319d Remove StrEnum in onnx for python 3.10 (#11345)
some training tests failed looks like parsing error?
2025-07-23 11:52:25 -04:00
nimlgen
304eb9cecb allocate less memory in am tests (#11342) 2025-07-23 11:11:26 +03:00
George Hotz
e14b4fefa5 ranges on store (#11334)
* ranges on store

* fix store spec

* fix that

* fix gates

* fix tests

* fix ptx
2025-07-22 21:00:50 -07:00
George Hotz
53339e62f7 no gate store anymore (#11338)
* no gate store anymore

* fix up spec
2025-07-22 18:41:15 -07:00
George Hotz
09431d4ad1 make DEFINE_REG behave like the others (#11273)
* simpler define reg

* cast

* PTRCAT define_acc

* cleanups

* fix uops stats

* fix linearizer tests

* llvm

* define reg sets const

* define reg sets const

* no assign

* collapse that

* fix test_max_pool2d_bigger_stride_dilation

* use index, fix webgpu

* devec

* fix tests

* fix webgpu

* fix llvm

* threads for python

* fix ops_python

* only for reg

* acc_half is real now in the emulator

* fix llvm

* fix webgpu init

* fix wgpu test

* fix some tests

* fix ptx

* fix ptx bool acc

* cleanups

* broken, meh. will fix with ENDRANGE

* line count
2025-07-22 13:53:56 -07:00
chenyu
4535908679 update keccak test_long (#11331)
it should compare with arg "shake_128"
2025-07-22 16:08:01 -04:00
George Hotz
affd83961c small changes from define_reg (#11327)
* small changes from define_reg

* fix webgpu
2025-07-22 11:11:48 -07:00
chenyu
2d7c28de6a clean up dup lambdas in helper_test_exception (#11325) 2025-07-22 12:21:57 -04:00
chenyu
c6aa8e58ca fix TestDropoutProbabilityEdgeCases (#11322) 2025-07-22 11:13:56 -04:00
chenyu
fb42c84365 merge TestRollEdgeCases into test_ops (#11321) 2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c movementop only Tensor.roll (#11317)
* movementop only Tensor.roll

* fixed
2025-07-22 10:34:15 -04:00
chenyu
a41140241b truncate unsigned const in cstyle (#11318)
it can be a warning or a hard error in clang

PTX and PYTHON also need fix, skipping for now
2025-07-22 08:02:12 -04:00
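Note on a41140241b: a constant that does not fit its unsigned type can be reduced modulo 2**bits before it is written into generated C source, which avoids the clang diagnostic mentioned in the message. The sketch below shows that general technique only; `truncate_unsigned` is a hypothetical helper, not the function used in the cstyle renderer.

```python
def truncate_unsigned(value: int, bits: int) -> int:
    # Keep only the low `bits` bits, i.e. reduce modulo 2**bits.
    # Python's & on a negative int acts on its two's-complement form,
    # so negative inputs wrap the same way a C unsigned conversion does.
    return value & ((1 << bits) - 1)

fits = truncate_unsigned(200, 8)        # already in range, unchanged
wraps = truncate_unsigned(256, 8)       # one past the 8-bit maximum
negative = truncate_unsigned(-1, 32)    # wraps to the 32-bit maximum
```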
qazal
6668d6d241 fix word_wrap with newlines in input string [pr] (#11319) 2025-07-22 12:03:13 +03:00
George Hotz
3b674df34b generic changes from define_reg_2 (#11315)
* generic changes from define_reg_2

* fix for ptx

* ugh, that one
2025-07-21 15:14:06 -07:00