Commit Graph

10417 Commits

Author SHA1 Message Date
nimlgen
0f374e10d2 cpu: use mmap for allocations (#11349)
* cpu: use mmap for allocations

* ops

* fix mypy
2025-07-23 20:30:18 +03:00
George Hotz
ae07a93814 simple block barrier (#11341)
* simple block barrier

* simple
2025-07-23 10:14:11 -07:00
chenyu
86e7504111 mypy check extra/onnx.py (#11348)
instead of running the tests with Python 3.10, add onnx to mypy, which would have caught the StrEnum regression. Several type annotations now fail mypy; they do not affect running the code and were skipped for now
2025-07-23 12:42:59 -04:00
chenyu
960da9319d Remove StrEnum in onnx for python 3.10 (#11345)
some training tests failed; looks like a parsing error?
2025-07-23 11:52:25 -04:00
qazal
478a355325 gate PRINT_MATCH_STATS behind graph_rewrite tracking (#11344) 2025-07-23 16:32:43 +03:00
nimlgen
ca09c180dc cpu: remove del spam (#11343)
* cpu: remove del spam

* fix
2025-07-23 12:02:37 +03:00
nimlgen
304eb9cecb allocate less memory in am tests (#11342) 2025-07-23 11:11:26 +03:00
George Hotz
e14b4fefa5 ranges on store (#11334)
* ranges on store

* fix store spec

* fix that

* fix gates

* fix tests

* fix ptx
2025-07-22 21:00:50 -07:00
George Hotz
c65b5aab62 small things from endrange (#11339)
* small things from endrange

* store
2025-07-22 19:45:37 -07:00
George Hotz
53339e62f7 no gate store anymore (#11338)
* no gate store anymore

* fix up spec
2025-07-22 18:41:15 -07:00
chenyu
7a9a5cfd28 isolate test/external/external_test_am.py (#11335)
seems to be the one crashing; also remove -n=auto for it
2025-07-22 19:02:20 -04:00
George Hotz
fcbd0e4de3 assigns are no longer used [pr] (#11333) 2025-07-22 15:35:07 -07:00
George Hotz
09431d4ad1 make DEFINE_REG behave like the others (#11273)
* simpler define reg

* cast

* PTRCAT define_acc

* cleanups

* fix uops stats

* fix linearizer tests

* llvm

* define reg sets const

* define reg sets const

* no assign

* collapse that

* fix test_max_pool2d_bigger_stride_dilation

* use index, fix webgpu

* devec

* fix tests

* fix webgpu

* fix llvm

* threads for python

* fix ops_python

* only for reg

* acc_half is real now in the emulator

* fix llvm

* fix webgpu init

* fix wgpu test

* fix some tests

* fix ptx

* fix ptx bool acc

* cleanups

* broken, meh. will fix with ENDRANGE

* line count
2025-07-22 13:53:56 -07:00
chenyu
4535908679 update keccak test_long (#11331)
it should compare with arg "shake_128"
2025-07-22 16:08:01 -04:00
nimlgen
3faa352dcc am: bump version after mm changes (#11328) 2025-07-22 21:54:10 +03:00
George Hotz
affd83961c small changes from define_reg (#11327)
* small changes from define_reg

* fix webgpu
2025-07-22 11:11:48 -07:00
nimlgen
53b3d87456 am: use 4-lvl pdir (#11326) 2025-07-22 20:58:15 +03:00
chenyu
2d7c28de6a clean up dup lambdas in helper_test_exception (#11325) 2025-07-22 12:21:57 -04:00
chenyu
c6aa8e58ca fix TestDropoutProbabilityEdgeCases (#11322) 2025-07-22 11:13:56 -04:00
chenyu
fb42c84365 merge TestRollEdgeCases into test_ops (#11321) 2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c movementop only Tensor.roll (#11317)
* movementop only Tensor.roll

* fixed
2025-07-22 10:34:15 -04:00
chenyu
a41140241b truncate unsigned const in cstyle (#11318)
it can be a warning or a hard error in clang

PTX and PYTHON also need a fix, skipping for now
2025-07-22 08:02:12 -04:00
qazal
6668d6d241 fix word_wrap with newlines in input string [pr] (#11319) 2025-07-22 12:03:13 +03:00
qazal
0c4e19f270 hotfix: disable process replay in REMOTE=1 tests (#11320)
* hotfix: disable process replay in REMOTE=1 tests

* comment
2025-07-22 10:41:58 +03:00
George Hotz
3b674df34b generic changes from define_reg_2 (#11315)
* generic changes from define_reg_2

* fix for ptx

* ugh, that one
2025-07-21 15:14:06 -07:00
chenyu
6e9506e6fd Tensor.roll supports dims=None (#11313) 2025-07-21 17:29:23 -04:00
George Hotz
108aac8af4 use AddrSpace instead of local (#11314)
* use AddrSpace instead of local

* addrspace in test
2025-07-21 14:00:06 -07:00
chenyu
d3a93185a6 clean up test_roll (#11312) 2025-07-21 16:00:50 -04:00
George Hotz
532b52fcef store has a dtype, like assign (#11309)
* store has a dtype, like assign

* fix upat

* fix test
2025-07-21 12:50:01 -07:00
geohotstan
445ff8de56 ONNX onnx_parser and buffer_parse clean up (#11000)
* start

* remove onnx.load from compile4 and move np to dropout

* clean up and enable test

* clean up

* move WebGPU ONNX test into MacOS (WebGPU)

* leave test in ONNX (CPU)

* fix raw_data init None, and simplify onnx_runner test a little?

* THESE TESTS ARE SO UGLY UGHH

* need to really think about how to structure the test

* wow LLMs are quite something

* not always on disk now

* also add external data loading test

* cleaner tests

* minimize diff and add const folding tests

* add external data loading too

* whoops add webgpu back.. but why was it not needed in the first place?

* better comment

* move webgpu test to macos(webgpu)?

* llm english so much better than me wow

* trigger CI to check flakiness

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-21 15:10:25 -04:00
George Hotz
842184a1ab rename kernelize to schedule, try 2 (#11305) 2025-07-21 11:18:36 -07:00
George Hotz
7e8f5dde74 matmul style is still reshape (#11308) 2025-07-21 11:14:57 -07:00
George Hotz
41de76a7fd put assign and store next to each other [pr] (#11306) 2025-07-21 11:07:35 -07:00
nimlgen
de2df92551 hcq: use devices instead of ids in HCQGraph (#11303)
* hcq: use devices instead of ids in HCQGraph

* fix
2025-07-21 20:03:12 +03:00
wozeparrot
30ce16a424 feat: failing test for long keccak (#11292) 2025-07-21 12:49:23 -04:00
uuuvn
178dbf3f66 Remote scheduler changes (#11177) 2025-07-21 09:29:44 -07:00
वेदांत
e368628736 Add amin support to Tensor operations in Torch backend (#11290)
* intiger div mod fix

* Revert "intiger div mod fix"

This reverts commit d5d2f201bf.

* feat arg_min support

* tests update

* test fix
2025-07-21 09:14:08 -04:00
qazal
5eb54e2499 viz: close event streams before profiler render (#11300) 2025-07-21 15:42:31 +03:00
nimlgen
cc3c1e4c14 hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
nimlgen
816c01c2d4 hcq: default copy_queue_t=None (#11297) 2025-07-21 14:45:20 +03:00
qazal
6520a7fcb6 viz: factorize event stream (#11298) 2025-07-21 14:42:00 +03:00
nimlgen
9c533e5c38 hcq: cpu prereq (#11296) 2025-07-21 13:35:18 +03:00
nimlgen
e87a42e243 hcq: prepare for windows (#11293)
* hcq: prepare for windows

* comments
2025-07-21 13:08:56 +03:00
nimlgen
df3ba0a7c0 autogen: fix imports in libusb (#11294) 2025-07-21 13:04:27 +03:00
nimlgen
dd6a2d432f hcq: default timestamp metrics is ns (#11295) 2025-07-21 12:56:30 +03:00
wozeparrot
53345ef4e2 feat: make ops_disk work on block devices (#11291) 2025-07-20 14:39:50 -07:00
qazal
3002c63b1e process replay: optionally pass tinygrad import error (#11289)
* process replay: optionally pass tinygrad import error

* gate all tinygrad internals

* s/getenv/os.getenv pre import

* diff
2025-07-20 22:57:56 +03:00
chenyu
9e3a593313 minor kernel.py cleanups [pr] (#11286) 2025-07-20 10:15:31 -04:00
quortus
5f17927a87 Shorten UOp.load method (#11285) 2025-07-20 13:48:04 +03:00
chenyu
54924f9969 type remove Union and Optional [pr] (#11283)
use `|` for consistency
2025-07-19 14:05:52 -04:00