chenyu
dab07bcad9
use next instead of full list in UOp._device [pr] ( #11369 )
...
prevents exponential fan out
2025-07-25 10:04:29 -04:00
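The commit body above attributes the change to avoiding exponential fan out. A minimal sketch of that general effect, assuming a toy DAG whose nodes are shared between parents: `Node`, `device_full_list`, `device_next`, and `make_chain` are hypothetical names for illustration and are not tinygrad's `UOp` code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a graph node; NOT tinygrad's UOp.
    device: Optional[str] = None
    src: Tuple["Node", ...] = ()


def device_full_list(n: Node, counter: List[int]) -> Optional[str]:
    """Recurse into every source and build the full list before picking one."""
    counter[0] += 1
    if n.device is not None:
        return n.device
    found = [d for d in [device_full_list(s, counter) for s in n.src] if d is not None]
    return found[0] if found else None


def device_next(n: Node, counter: List[int]) -> Optional[str]:
    """Stop at the first source that yields a device (lazy generator + next)."""
    counter[0] += 1
    if n.device is not None:
        return n.device
    return next((d for s in n.src if (d := device_next(s, counter)) is not None), None)


def make_chain(depth: int) -> Node:
    """Each level points at the previous level twice, so paths double per level."""
    node = Node(device="CPU")
    for _ in range(depth):
        node = Node(src=(node, node))
    return node


def count_calls(fn, root: Node):
    counter = [0]
    return fn(root, counter), counter[0]
```

In this toy graph the eager version visits every path to the leaf, while the lazy version follows a single path and stops; whether the real code had the same shape is an assumption.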
nimlgen
1bb1f1aee8
hcq: fix race in _at_profile_finalize ( #11368 )
2025-07-25 14:14:02 +03:00
George Hotz
490a93902c
define reg doesn't have init anymore ( #11365 )
...
* define reg doesn't have init anymore
* remove that
* no special logic for dr
* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
9da3f72495
identity store for DEFINE_REG ( #11363 )
...
* identity store for DEFINE_REG
* identity store for DEFINE_REG
* noop continue
2025-07-24 16:41:29 -07:00
chenyu
cc795c6656
simplify keccak pad mask code ( #11362 )
2025-07-24 19:24:10 -04:00
chenyu
c0c4bc9d7c
use int32 for keccak reorder_indexes ( #11360 )
...
it's used for tensor indexing, so int32 instead of uint64 is slightly faster
2025-07-24 15:54:50 -04:00
George Hotz
0602b22086
kernel spec ( #11359 )
...
* kernel spec
* ops.VIEW
* work
2025-07-24 12:45:38 -07:00
qazal
519f1d13cc
viz: generic stuff from gpu counters ui ( #11358 )
...
* viz: generic stuff from gpu counters ui
* move pointer
* pre fetch
* move timeout
2025-07-24 20:29:24 +03:00
nimlgen
3b3de8df61
hcq: graphed copies ( #11302 )
...
* fast copies p2
* upd and fix
* graph supports
* fixes
* fixes
* fixes
* fix
* fix
* fix mockgpu
* fix alignment
* smaller in ci
2025-07-24 17:36:19 +03:00
nimlgen
3046ead6e8
jit: graph reports ei support ( #11356 )
2025-07-24 16:35:10 +03:00
nimlgen
bf12041910
hcq: mapping of cpu to all hcq devices ( #11354 )
...
* hcq: mapping of cpu to all hcq devices
* fix kfd
* nv
* simpler
* cleaner
* correct skip
* fix ifaces
* system fixes
* mypy
2025-07-24 12:52:38 +03:00
chenyu
82e6de7fc6
more keccak reference tests ( #11329 )
2025-07-23 22:06:39 -04:00
George Hotz
b0dc97d1f7
write out kernel 3 in uops ( #11352 )
...
* write out kernel 3 in uops
* matmul is correct
* gemm passes spec
* bugfix to match speed
* cleanups
2025-07-23 17:32:38 -07:00
chenyu
5b570196e4
support DEV= to specify device ( #11351 )
2025-07-23 17:40:55 -04:00
uuuvn
76a2ddbd78
Move remote tests out of onnx ( #11310 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-23 13:25:55 -07:00
George Hotz
7f0a41df4d
move optional out of devectorize [pr] ( #11350 )
...
* move optional out of devectorize [pr]
* fast idiv
2025-07-23 11:26:05 -07:00
nimlgen
0f374e10d2
cpu: use mmap for allocations ( #11349 )
...
* cpu: use mmap for allocations
* ops
* fix mypy
2025-07-23 20:30:18 +03:00
George Hotz
ae07a93814
simple block barrier ( #11341 )
...
* simple block barrier
* simple
2025-07-23 10:14:11 -07:00
chenyu
86e7504111
mypy check extra/onnx.py ( #11348 )
...
instead of running the test with 3.10, add onnx to mypy, which would have caught the StrEnum regression. Several type annotations fail mypy now; they do not affect running the code and are skipped for now
2025-07-23 12:42:59 -04:00
chenyu
960da9319d
Remove StrEnum in onnx for python 3.10 ( #11345 )
...
some training tests failed; looks like a parsing error?
2025-07-23 11:52:25 -04:00
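For context on the two commits above: `enum.StrEnum` was added in Python 3.11, so importing it fails on 3.10. One common 3.10-compatible pattern is mixing `str` into `Enum`, sketched below. `OpType` and its members are hypothetical, and this is not necessarily what the commit did in `extra/onnx.py`.

```python
from enum import Enum


class OpType(str, Enum):
    # Hypothetical enum for illustration; works on Python 3.10 and later.
    ADD = "Add"
    MUL = "Mul"


def parse_op(name: str) -> OpType:
    """Look a member up by its string value; raises ValueError if unknown."""
    return OpType(name)
```

Members of such a class compare equal to their plain string values, which covers many `StrEnum` use cases. Note that `str()` and `format()` on members do not behave identically to `StrEnum`, so code relying on those should use `.value` explicitly.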
qazal
478a355325
gate PRINT_MATCH_STATS behind graph_rewrite tracking ( #11344 )
2025-07-23 16:32:43 +03:00
nimlgen
ca09c180dc
cpu: remove del spam ( #11343 )
...
* cpu: remove del spam
* fix
2025-07-23 12:02:37 +03:00
nimlgen
304eb9cecb
allocate less memory in am tests ( #11342 )
2025-07-23 11:11:26 +03:00
George Hotz
e14b4fefa5
ranges on store ( #11334 )
...
* ranges on store
* fix store spec
* fix that
* fix gates
* fix tests
* fix ptx
2025-07-22 21:00:50 -07:00
George Hotz
c65b5aab62
small things from endrange ( #11339 )
...
* small things from endrange
* store
2025-07-22 19:45:37 -07:00
George Hotz
53339e62f7
no gate store anymore ( #11338 )
...
* no gate store anymore
* fix up spec
2025-07-22 18:41:15 -07:00
chenyu
7a9a5cfd28
isolate test/external/external_test_am.py ( #11335 )
...
seems to be the one crashing, also remove -n=auto for that
2025-07-22 19:02:20 -04:00
George Hotz
fcbd0e4de3
assigns are no longer used [pr] ( #11333 )
2025-07-22 15:35:07 -07:00
George Hotz
09431d4ad1
make DEFINE_REG behave like the others ( #11273 )
...
* simpler define reg
* cast
* PTRCAT define_acc
* cleanups
* fix uops stats
* fix linearizer tests
* llvm
* define reg sets const
* define reg sets const
* no assign
* collapse that
* fix test_max_pool2d_bigger_stride_dilation
* use index, fix webgpu
* devec
* fix tests
* fix webgpu
* fix llvm
* threads for python
* fix ops_python
* only for reg
* acc_half is real now in the emulator
* fix llvm
* fix webgpu init
* fix wgpu test
* fix some tests
* fix ptx
* fix ptx bool acc
* cleanups
* broken, meh. will fix with ENDRANGE
* line count
2025-07-22 13:53:56 -07:00
chenyu
4535908679
update keccak test_long ( #11331 )
...
it should compare with arg "shake_128"
2025-07-22 16:08:01 -04:00
nimlgen
3faa352dcc
am: bump version after mm changes ( #11328 )
2025-07-22 21:54:10 +03:00
George Hotz
affd83961c
small changes from define_reg ( #11327 )
...
* small changes from define_reg
* fix webgpu
2025-07-22 11:11:48 -07:00
nimlgen
53b3d87456
am: use 4-lvl pdir ( #11326 )
2025-07-22 20:58:15 +03:00
chenyu
2d7c28de6a
clean up dup lambdas in helper_test_exception ( #11325 )
2025-07-22 12:21:57 -04:00
chenyu
c6aa8e58ca
fix TestDropoutProbabilityEdgeCases ( #11322 )
2025-07-22 11:13:56 -04:00
chenyu
fb42c84365
merge TestRollEdgeCases into test_ops ( #11321 )
2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c
movementop only Tensor.roll ( #11317 )
...
* movementop only Tensor.roll
* fixed
2025-07-22 10:34:15 -04:00
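A roll can be expressed purely as data movement, with no arithmetic on the elements. The sketch below shows one such formulation on a plain Python list: repeat the sequence once, then take a shifted window. This is an assumption about the general idea, not the implementation in the commit above, and `roll_1d` is a hypothetical helper.

```python
from typing import List, TypeVar

T = TypeVar("T")


def roll_1d(xs: List[T], shift: int) -> List[T]:
    """Roll a 1-D sequence using only repeat and slice (no element arithmetic)."""
    n = len(xs)
    if n == 0:
        return []
    s = shift % n            # normalize negative and oversized shifts
    doubled = xs + xs        # "repeat" step
    return doubled[n - s : 2 * n - s]  # "shrink" step: a window of length n
```

Positive shifts move elements toward higher indices, matching the convention of `torch.roll` and `numpy.roll`.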
chenyu
a41140241b
truncate unsigned const in cstyle ( #11318 )
...
it can be a warning or a hard error in clang
PTX and PYTHON also need a fix, skipping for now
2025-07-22 08:02:12 -04:00
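The note above says an out-of-range unsigned constant can be a warning or a hard error in clang. A minimal sketch of the truncation idea, assuming the renderer emits C literals with a `u` suffix: `truncate_unsigned` and `render_unsigned_const` are hypothetical helpers, not tinygrad's cstyle renderer.

```python
def truncate_unsigned(value: int, bits: int) -> int:
    """Wrap an arbitrary Python int into the range [0, 2**bits)."""
    return value & ((1 << bits) - 1)


def render_unsigned_const(value: int, bits: int) -> str:
    """Render a C-style unsigned literal that always fits in `bits` bits."""
    return f"{truncate_unsigned(value, bits)}u"
```

Masking before rendering keeps the emitted literal within the range of the target type, so the C compiler never sees a constant that overflows it.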
qazal
6668d6d241
fix word_wrap with newlines in input string [pr] ( #11319 )
2025-07-22 12:03:13 +03:00
qazal
0c4e19f270
hotfix: disable process replay in REMOTE=1 tests ( #11320 )
...
* hotfix: disable process replay in REMOTE=1 tests
* comment
2025-07-22 10:41:58 +03:00
George Hotz
3b674df34b
generic changes from define_reg_2 ( #11315 )
...
* generic changes from define_reg_2
* fix for ptx
* ugh, that one
2025-07-21 15:14:06 -07:00
chenyu
6e9506e6fd
Tensor.roll supports dims=None ( #11313 )
2025-07-21 17:29:23 -04:00
George Hotz
108aac8af4
use AddrSpace instead of local ( #11314 )
...
* use AddrSpace instead of local
* addrspace in test
2025-07-21 14:00:06 -07:00
chenyu
d3a93185a6
clean up test_roll ( #11312 )
2025-07-21 16:00:50 -04:00
George Hotz
532b52fcef
store has a dtype, like assign ( #11309 )
...
* store has a dtype, like assign
* fix upat
* fix test
2025-07-21 12:50:01 -07:00
geohotstan
445ff8de56
ONNX onnx_parser and buffer_parse clean up ( #11000 )
...
* start
* remove onnx.load from compile4 and move np to dropout
* clean up and enable test
* clean up
* move WebGPU ONNX test into MacOS (WebGPU)
* leave test in ONNX (CPU)
* fix raw_data init None, and simplify onnx_runner test a little?
* THESE TESTS ARE SO UGLY UGHH
* need to really think about how to structure the test
* wow LLMs are quite something
* not always on disk now
* also add external data loading test
* cleaner tests
* minimize diff and add const folding tests
* add external data loading too
* whoops add webgpu back.. but why was it not needed in the first place?
* better comment
* move webgpu test to macos(webgpu)?
* llm english so much better than me wow
* trigger CI to check flakiness
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-21 15:10:25 -04:00
George Hotz
842184a1ab
rename kernelize to schedule, try 2 ( #11305 )
2025-07-21 11:18:36 -07:00
George Hotz
7e8f5dde74
matmul style is still reshape ( #11308 )
2025-07-21 11:14:57 -07:00
George Hotz
41de76a7fd
put assign and store next to each other [pr] ( #11306 )
2025-07-21 11:07:35 -07:00
nimlgen
de2df92551
hcq: use devices instead of ids in HCQGraph ( #11303 )
...
* hcq: use devices instead of ids in HCQGraph
* fix
2025-07-21 20:03:12 +03:00