Commit Graph

10633 Commits

Author SHA1 Message Date
chenyu
dab07bcad9 use next instead of full list in UOp._device [pr] (#11369)
prevents exponential fan-out
2025-07-25 10:04:29 -04:00
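The following is a minimal sketch of why taking the first match with `next` avoids the blow-up. It uses a hypothetical `Node` DAG, not tinygrad's `UOp`, in which a node without a device inherits one from its sources. The eager version builds a full list at every level, so on a chain of diamonds (each node pointing twice at the same child) the work doubles per layer. The lazy version stops at the first source that yields a device.

```python
from typing import Optional

class Node:
  """Hypothetical DAG node for illustration; not tinygrad's UOp."""
  def __init__(self, device: Optional[str] = None, src: tuple = ()):
    self.device, self.src = device, src

def device_eager(n: Node, counter: list) -> Optional[str]:
  counter[0] += 1
  if n.device is not None: return n.device
  # the list comprehension consumes every source, recursively
  devs = [d for d in (device_eager(s, counter) for s in n.src) if d is not None]
  return devs[0] if devs else None

def device_lazy(n: Node, counter: list) -> Optional[str]:
  counter[0] += 1
  if n.device is not None: return n.device
  # next() stops at the first source that yields a device
  return next((d for d in (device_lazy(s, counter) for s in n.src) if d is not None), None)

def diamond_chain(depth: int) -> Node:
  n = Node(device="CPU")
  for _ in range(depth):
    n = Node(src=(n, n))  # two edges to the same child: one diamond per layer
  return n

root = diamond_chain(16)
eager, lazy = [0], [0]
device_eager(root, eager)  # eager[0] == 2**17 - 1 == 131071 calls
device_lazy(root, lazy)    # lazy[0] == 17 calls
```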
nimlgen
1bb1f1aee8 hcq: fix race in _at_profile_finalize (#11368) 2025-07-25 14:14:02 +03:00
George Hotz
490a93902c define reg doesn't have init anymore (#11365)
* define reg doesn't have init anymore

* remove that

* no special logic for dr

* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
9da3f72495 identity store for DEFINE_REG (#11363)
* identity store for DEFINE_REG

* identity store for DEFINE_REG

* noop continue
2025-07-24 16:41:29 -07:00
chenyu
cc795c6656 simplify keccak pad mask code (#11362) 2025-07-24 19:24:10 -04:00
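For reference, the padding that a Keccak pad mask has to produce can be sketched in plain Python. The sketch follows the standard FIPS 202 `pad10*1` rule with a domain-separation suffix byte: 0x01 for original Keccak, 0x06 for SHA3, and 0x1F for SHAKE. The `keccak_pad` name and signature are hypothetical; this is not tinygrad's tensor-based code.

```python
def keccak_pad(msg: bytes, rate: int, suffix: int = 0x06) -> bytes:
  """Pad msg to a multiple of `rate` bytes using pad10*1 with a domain suffix."""
  # the number of padding bytes is always in [1, rate]
  pad_len = rate - (len(msg) % rate)
  pad = bytearray(pad_len)
  pad[0] ^= suffix   # domain bits plus the first '1' bit of pad10*1
  pad[-1] ^= 0x80    # final '1' bit in the last byte of the block
  return msg + bytes(pad)
```

When only one padding byte fits, both bits land in the same byte (0x86 for SHA3). The rate is 136 bytes for SHA3-256 and 168 bytes for SHAKE128.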
chenyu
c0c4bc9d7c use int32 for keccak reorder_indexes (#11360)
it's used for tensor indexing, so int32 instead of uint64 is slightly faster
2025-07-24 15:54:50 -04:00
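The following sketch illustrates the kind of permutation table involved. It writes the Keccak pi-step lane permutation as flat gather indices stored as 32-bit ints. This is an assumption about what a reorder table looks like: tinygrad's actual `reorder_indexes` may also fold in other steps or use a different layout. The only point is that a 25-entry index table fits easily in int32, so 64-bit indices buy nothing.

```python
from array import array

# Keccak pi step (FIPS 202): A'[x, y] = A[(x + 3*y) % 5, x], lanes stored flat at x + 5*y.
def pi_gather_indexes() -> array:
  idx = array("i")  # 'i' is a C int, 4 bytes on common platforms
  for y in range(5):
    for x in range(5):
      idx.append(((x + 3 * y) % 5) + 5 * x)
  return idx

def apply_gather(lanes: list, idx: array) -> list:
  # out[i] = lanes[idx[i]], the same shape of operation as tensor fancy indexing
  return [lanes[i] for i in idx]
```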
George Hotz
0602b22086 kernel spec (#11359)
* kernel spec

* ops.VIEW

* work
2025-07-24 12:45:38 -07:00
qazal
519f1d13cc viz: generic stuff from gpu counters ui (#11358)
* viz: generic stuff from gpu counters ui

* move pointer

* pre fetch

* move timeout
2025-07-24 20:29:24 +03:00
nimlgen
3b3de8df61 hcq: graphed copies (#11302)
* fast copies p2

* upd and fix

* graph supports

* fixes

* fixes

* fixes

* fix

* fix

* fix mockgpu

* fix alignment

* smaller in ci
2025-07-24 17:36:19 +03:00
nimlgen
3046ead6e8 jit: graph reports ei support (#11356) 2025-07-24 16:35:10 +03:00
nimlgen
bf12041910 hcq: mapping of cpu to all hcq devices (#11354)
* hcq: mapping of cpu to all hcq devices

* fix kfd

* nv

* simpler

* cleaner

* correct skip

* fix ifaces

* system fixes

* mypy
2025-07-24 12:52:38 +03:00
chenyu
82e6de7fc6 more keccak reference tests (#11329) 2025-07-23 22:06:39 -04:00
George Hotz
b0dc97d1f7 write out kernel 3 in uops (#11352)
* write out kernel 3 in uops

* matmul is correct

* gemm passes spec

* bugfix to match speed

* cleanups
2025-07-23 17:32:38 -07:00
chenyu
5b570196e4 support DEV= to specify device (#11351) 2025-07-23 17:40:55 -04:00
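The following is a minimal sketch of how a `DEV=` variable can take precedence over per-backend flags such as `CPU=1`. The function name, the backend list, and the fallback order are all hypothetical; tinygrad's `Device` code defines the real resolution rules.

```python
import os
from typing import Mapping

# Hypothetical backend priority list; tinygrad's real list and order may differ.
BACKENDS = ("METAL", "AMD", "NV", "CPU")

def pick_device(env: Mapping[str, str], default: str = "CPU") -> str:
  # an explicit DEV=<name> takes precedence over everything else
  dev = env.get("DEV")
  if dev: return dev
  # otherwise fall back to per-backend NAME=1 flags
  for name in BACKENDS:
    if env.get(name) == "1": return name
  return default

if __name__ == "__main__":
  print(pick_device(os.environ))
```

Under these assumptions, `DEV=AMD python3 script.py` would select AMD even if another backend flag is also set.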
uuuvn
76a2ddbd78 Move remote tests out of onnx (#11310)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-23 13:25:55 -07:00
George Hotz
7f0a41df4d move optional out of devectorize [pr] (#11350)
* move optional out of devectorize [pr]

* fast idiv
2025-07-23 11:26:05 -07:00
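A common way to make integer division by a constant fast is to replace it with a multiply and a shift, using the Granlund-Montgomery "magic number" method. The following sketch shows the unsigned version. Whether tinygrad's `fast idiv` uses exactly this formulation is an assumption.

```python
def magic_for(d: int, bits: int = 32) -> tuple:
  """Return (m, s) such that x // d == (x * m) >> s for all 0 <= x < 2**bits."""
  assert d > 0
  l = (d - 1).bit_length()   # ceil(log2(d))
  s = bits + l
  m = -(-(1 << s) // d)      # ceil(2**s / d)
  return m, s

def fast_udiv(x: int, m: int, s: int) -> int:
  return (x * m) >> s
```

The method works because `m * d` overshoots `2**s` by less than `d`, so the error term stays below `1/d` and never changes the floor. Note that `m` can be up to `bits + 1` bits wide, so for 32-bit inputs the product may not fit a 64-bit register. Production code handles that case separately; Python's big ints hide it here.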
nimlgen
0f374e10d2 cpu: use mmap for allocations (#11349)
* cpu: use mmap for allocations

* ops

* fix mypy
2025-07-23 20:30:18 +03:00
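The following shows what an anonymous, page-backed allocation looks like with Python's standard `mmap` module. It is a generic sketch of the technique, not tinygrad's CPU allocator, and the helper names are hypothetical.

```python
import mmap

def cpu_alloc(size: int) -> mmap.mmap:
  # fileno -1 requests anonymous memory: zero-filled, page-aligned pages from the OS
  return mmap.mmap(-1, size)

def cpu_free(buf: mmap.mmap) -> None:
  buf.close()  # unmaps the region

buf = cpu_alloc(mmap.PAGESIZE)
buf[:4] = b"\x01\x02\x03\x04"
```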
George Hotz
ae07a93814 simple block barrier (#11341)
* simple block barrier

* simple
2025-07-23 10:14:11 -07:00
chenyu
86e7504111 mypy check extra/onnx.py (#11348)
Instead of running tests with Python 3.10, add onnx to mypy, which would have caught the StrEnum regression. Several type annotations that now fail mypy do not affect running the code and are skipped for now.
2025-07-23 12:42:59 -04:00
chenyu
960da9319d Remove StrEnum in onnx for python 3.10 (#11345)
Some training tests failed; it looks like a parsing error?
2025-07-23 11:52:25 -04:00
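`enum.StrEnum` was added in Python 3.11, so importing it fails on 3.10. A common 3.10-compatible replacement is a `str`/`Enum` mixin. The class and member names below are hypothetical, not the ones in `extra/onnx.py`.

```python
from enum import Enum

class AttrType(str, Enum):  # hypothetical example enum
  FLOAT = "FLOAT"
  INTS = "INTS"

def parse_attr_type(name: str) -> AttrType:
  # lookup by value works the same way as with StrEnum
  return AttrType(name)
```

One difference to keep in mind: `str()` of a mixin member may not return the bare value as it does with `StrEnum`, so use `.value` when the plain string is needed.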
qazal
478a355325 gate PRINT_MATCH_STATS behind graph_rewrite tracking (#11344) 2025-07-23 16:32:43 +03:00
nimlgen
ca09c180dc cpu: remove del spam (#11343)
* cpu: remove del spam

* fix
2025-07-23 12:02:37 +03:00
nimlgen
304eb9cecb allocate less memory in am tests (#11342) 2025-07-23 11:11:26 +03:00
George Hotz
e14b4fefa5 ranges on store (#11334)
* ranges on store

* fix store spec

* fix that

* fix gates

* fix tests

* fix ptx
2025-07-22 21:00:50 -07:00
George Hotz
c65b5aab62 small things from endrange (#11339)
* small things from endrange

* store
2025-07-22 19:45:37 -07:00
George Hotz
53339e62f7 no gate store anymore (#11338)
* no gate store anymore

* fix up spec
2025-07-22 18:41:15 -07:00
chenyu
7a9a5cfd28 isolate test/external/external_test_am.py (#11335)
It seems to be the one crashing; also remove -n=auto for it.
2025-07-22 19:02:20 -04:00
George Hotz
fcbd0e4de3 assigns are no longer used [pr] (#11333) 2025-07-22 15:35:07 -07:00
George Hotz
09431d4ad1 make DEFINE_REG behave like the others (#11273)
* simpler define reg

* cast

* PTRCAT define_acc

* cleanups

* fix uops stats

* fix linearizer tests

* llvm

* define reg sets const

* define reg sets const

* no assign

* collapse that

* fix test_max_pool2d_bigger_stride_dilation

* use index, fix webgpu

* devec

* fix tests

* fix webgpu

* fix llvm

* threads for python

* fix ops_python

* only for reg

* acc_half is real now in the emulator

* fix llvm

* fix webgpu init

* fix wgpu test

* fix some tests

* fix ptx

* fix ptx bool acc

* cleanups

* broken, meh. will fix with ENDRANGE

* line count
2025-07-22 13:53:56 -07:00
chenyu
4535908679 update keccak test_long (#11331)
it should compare with arg "shake_128"
2025-07-22 16:08:01 -04:00
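SHAKE128 is an extendable-output function, so a reference comparison has to fix an output length. The following is a minimal sketch using Python's `hashlib`. The helper name and the 32-byte default length are arbitrary choices for illustration.

```python
import hashlib

def shake128_ref(data: bytes, out_len: int = 32) -> bytes:
  # hashlib's SHAKE objects take the output length at digest time
  return hashlib.shake_128(data).digest(out_len)
```

Shorter outputs are prefixes of longer ones. Comparing against a different function, such as `sha3_256`, yields an unrelated digest, which is why the reference must name `shake_128` explicitly.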
nimlgen
3faa352dcc am: bump version after mm changes (#11328) 2025-07-22 21:54:10 +03:00
George Hotz
affd83961c small changes from define_reg (#11327)
* small changes from define_reg

* fix webgpu
2025-07-22 11:11:48 -07:00
nimlgen
53b3d87456 am: use 4-lvl pdir (#11326) 2025-07-22 20:58:15 +03:00
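With 4 KiB pages and 512-entry tables, a 4-level walk splits a 48-bit virtual address into four 9-bit table indices plus a 12-bit page offset. The following sketch assumes that x86-64-style layout for illustration; the exact field widths in the AMD GPU page tables used by the `am` driver may differ.

```python
PAGE_SHIFT = 12   # 4 KiB pages (assumed)
LEVEL_BITS = 9    # 512 entries per table (assumed)
LEVELS = 4

def split_vaddr(va: int) -> tuple:
  """Return ([top-level index, ..., last-level index], page_offset)."""
  offset = va & ((1 << PAGE_SHIFT) - 1)
  idxs = []
  for lvl in reversed(range(LEVELS)):  # walk from the top level down
    shift = PAGE_SHIFT + lvl * LEVEL_BITS
    idxs.append((va >> shift) & ((1 << LEVEL_BITS) - 1))
  return idxs, offset

def join_vaddr(idxs: list, offset: int) -> int:
  va = offset
  for lvl, idx in zip(reversed(range(LEVELS)), idxs):
    va |= idx << (PAGE_SHIFT + lvl * LEVEL_BITS)
  return va
```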
chenyu
2d7c28de6a clean up dup lambdas in helper_test_exception (#11325) 2025-07-22 12:21:57 -04:00
chenyu
c6aa8e58ca fix TestDropoutProbabilityEdgeCases (#11322) 2025-07-22 11:13:56 -04:00
chenyu
fb42c84365 merge TestRollEdgeCases into test_ops (#11321) 2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c movementop only Tensor.roll (#11317)
* movementop only Tensor.roll

* fixed
2025-07-22 10:34:15 -04:00
chenyu
a41140241b truncate unsigned const in cstyle (#11318)
it can be a warning or a hard error in clang

PTX and PYTHON also need a fix; skipping for now.
2025-07-22 08:02:12 -04:00
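When a renderer emits an unsigned constant whose value does not fit the target type, clang may warn or reject it outright. The following is a minimal sketch of the truncation. The `render_uint_const` helper and its C-style suffixes are hypothetical.

```python
def render_uint_const(value: int, bits: int) -> str:
  # wrap into the type's range, matching C's modular conversion to unsigned
  v = value & ((1 << bits) - 1)
  suffix = "ull" if bits == 64 else "u"
  return f"{v}{suffix}"
```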
qazal
6668d6d241 fix word_wrap with newlines in input string [pr] (#11319) 2025-07-22 12:03:13 +03:00
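The following is a minimal sketch of newline-aware wrapping. It splits on existing newlines first and wraps each line on its own by character count, so a newline in the input is never counted as ordinary text. The `word_wrap` name and signature here are hypothetical, and this version ignores ANSI escape codes.

```python
def word_wrap(text: str, width: int = 80) -> str:
  out = []
  for line in text.split("\n"):
    # chop each original line into width-sized pieces; an empty line stays one empty line
    chunks = [line[i:i + width] for i in range(0, len(line), width)] or [""]
    out.extend(chunks)
  return "\n".join(out)
```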
qazal
0c4e19f270 hotfix: disable process replay in REMOTE=1 tests (#11320)
* hotfix: disable process replay in REMOTE=1 tests

* comment
2025-07-22 10:41:58 +03:00
George Hotz
3b674df34b generic changes from define_reg_2 (#11315)
* generic changes from define_reg_2

* fix for ptx

* ugh, that one
2025-07-21 15:14:06 -07:00
chenyu
6e9506e6fd Tensor.roll supports dims=None (#11313) 2025-07-21 17:29:23 -04:00
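With `dims=None`, roll conventionally flattens the tensor, shifts the flat sequence, and restores the original shape; this is how `torch.roll` documents it. The following is a pure-Python sketch on a nested 2-D list, where the shift itself is two slices concatenated, i.e. only movement-style operations. The helper names are hypothetical.

```python
def roll_flat(xs: list, shift: int) -> list:
  if not xs: return list(xs)
  k = shift % len(xs)
  # elements move right by k; the last k wrap around to the front
  return xs[-k:] + xs[:-k] if k else list(xs)

def roll_none_2d(rows: list, shift: int) -> list:
  ncols = len(rows[0])
  flat = [v for r in rows for v in r]                            # flatten
  out = roll_flat(flat, shift)                                   # roll the flat sequence
  return [out[i:i + ncols] for i in range(0, len(out), ncols)]  # restore shape
```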
George Hotz
108aac8af4 use AddrSpace instead of local (#11314)
* use AddrSpace instead of local

* addrspace in test
2025-07-21 14:00:06 -07:00
chenyu
d3a93185a6 clean up test_roll (#11312) 2025-07-21 16:00:50 -04:00
George Hotz
532b52fcef store has a dtype, like assign (#11309)
* store has a dtype, like assign

* fix upat

* fix test
2025-07-21 12:50:01 -07:00
geohotstan
445ff8de56 ONNX onnx_parser and buffer_parse clean up (#11000)
* start

* remove onnx.load from compile4 and move np to dropout

* clean up and enable test

* clean up

* move WebGPU ONNX test into MacOS (WebGPU)

* leave test in ONNX (CPU)

* fix raw_data init None, and simplify onnx_runner test a little?

* THESE TESTS ARE SO UGLY UGHH

* need to really think about how to structure the test

* wow LLMs are quite something

* not always on disk now

* also add external data loading test

* cleaner tests

* minimize diff and add const folding tests

* add external data loading too

* whoops add webgpu back.. but why was it not needed in the first place?

* better comment

* move webgpu test to macos(webgpu)?

* llm english so much better than me wow

* trigger CI to check flakiness

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-21 15:10:25 -04:00
George Hotz
842184a1ab rename kernelize to schedule, try 2 (#11305) 2025-07-21 11:18:36 -07:00
George Hotz
7e8f5dde74 matmul style is still reshape (#11308) 2025-07-21 11:14:57 -07:00
George Hotz
41de76a7fd put assign and store next to each other [pr] (#11306) 2025-07-21 11:07:35 -07:00
nimlgen
de2df92551 hcq: use devices instead of ids in HCQGraph (#11303)
* hcq: use devices instead of ids in HCQGraph

* fix
2025-07-21 20:03:12 +03:00