chenyu
e22e5da9a5
move some test_dtype tests to unit (#11479)
2025-08-02 15:25:00 -04:00
nimlgen
da0b955be4
hcq: cpu can be graphed (#11474)
* hcq: cpu can be graphed
* ops
* new jit decisions
* fix test
* fix remote
* cleaner
* fix
2025-08-02 21:01:19 +03:00
kevvz
ef7e01cadf
Fix SVD shape bug + Fix batched SVD bug (#11477)
* failing test case
* fix
* better test
* space
2025-08-02 09:47:41 -07:00
qazal
fa66d9772d
viz: show const node when it's root (#11456)
2025-08-01 01:01:58 +03:00
Eitan Turok
cba3655de5
Add Test for Setitem (#10559)
* init
* update
* better
* failing test
* works
* Delete test file
* clean
* lint
* simplify variable name
* rm contiguous, rm int dtype, and add assertEqual
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:03:41 -04:00
chenyu
4ca430e5bf
fix search dedup (#11439)
it should check against the pre-real_axis axis in actions, not real_axis.
2025-07-30 17:24:16 -04:00
chenyu
d5fc6af4a2
remove unused ShapeTracker.consecutive [pr] (#11426)
2025-07-29 18:36:19 -04:00
George Hotz
49a2583584
real new lowerer (#11419)
* real new lowerer
* fix group for reduce
* skip missing ranges
* fix wmma and unroll/contract
* real fix for wmma
* disable that test
* fix if gate
* simpler
* flash attention fusion works
* no end barriers
* still broken
* flash attention finally works
2025-07-29 15:35:51 -07:00
chenyu
0e5d8d5c3c
remove tests that used .to_uop() (#11425)
* remove tests that used .to_uop()
* import
2025-07-29 15:52:16 -04:00
nimlgen
d38d285489
ci: add h machines (#11416)
* ci: add h machines
* more
* fix names
* names not collide
* 20
* 10
2025-07-29 19:21:51 +03:00
George Hotz
8c10085459
assert shape on lowerer store [pr] (#11395)
* assert shape on lowerer store [pr]
* fix ptx
2025-07-27 10:41:57 -07:00
George Hotz
dfeee63d30
uop matmul work (#11388)
* uop matmul work
* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
3923e78061
no_vectorized_acc keeps single DEFINE_REG (#11387)
* no_vectorized_acc keeps single DEFINE_REG
* fix ptx, skip flaky test
2025-07-26 11:44:09 -07:00
George Hotz
466ab5a3f2
store/load not pass through index (#11381)
* noop
* fix noop
* store cat is NOOP
* store dtype is void
* stores aren't passed through anymore
* meh, skip those for ptx
* correct ptx skip
* hl runs
2025-07-25 21:01:47 -07:00
chenyu
88c338bfcc
add kernelize to keccak for each data block (#11370)
* add kernelize to keccak for each data block
test_long works now. this prevents internal uops from growing proportionally to data length and eventually becoming too deep
* this?
* hash stuff
* gate test
* mv
2025-07-25 16:07:20 -04:00
nimlgen
3b3de8df61
hcq: graphed copies (#11302)
* fast copies p2
* upd and fix
* graph supports
* fixes
* fixes
* fixes
* fix
* fix
* fix mockgpu
* fix alignment
* smaller in ci
2025-07-24 17:36:19 +03:00
nimlgen
bf12041910
hcq: mapping of cpu to all hcq devices (#11354)
* hcq: mapping of cpu to all hcq devices
* fix kfd
* nv
* simpler
* cleaner
* correct skip
* fix ifaces
* system fixes
* mypy
2025-07-24 12:52:38 +03:00
chenyu
82e6de7fc6
more keccak reference tests (#11329)
2025-07-23 22:06:39 -04:00
chenyu
5b570196e4
support DEV= to specify device (#11351)
2025-07-23 17:40:55 -04:00
George Hotz
7f0a41df4d
move optional out of devectorize [pr] (#11350)
* move optional out of devectorize [pr]
* fast idiv
2025-07-23 11:26:05 -07:00
chenyu
960da9319d
Remove StrEnum in onnx for python 3.10 (#11345)
some training tests failed, looks like a parsing error?
2025-07-23 11:52:25 -04:00
nimlgen
304eb9cecb
allocate less memory in am tests (#11342)
2025-07-23 11:11:26 +03:00
George Hotz
e14b4fefa5
ranges on store (#11334)
* ranges on store
* fix store spec
* fix that
* fix gates
* fix tests
* fix ptx
2025-07-22 21:00:50 -07:00
George Hotz
53339e62f7
no gate store anymore (#11338)
* no gate store anymore
* fix up spec
2025-07-22 18:41:15 -07:00
George Hotz
09431d4ad1
make DEFINE_REG behave like the others (#11273)
* simpler define reg
* cast
* PTRCAT define_acc
* cleanups
* fix uops stats
* fix linearizer tests
* llvm
* define reg sets const
* define reg sets const
* no assign
* collapse that
* fix test_max_pool2d_bigger_stride_dilation
* use index, fix webgpu
* devec
* fix tests
* fix webgpu
* fix llvm
* threads for python
* fix ops_python
* only for reg
* acc_half is real now in the emulator
* fix llvm
* fix webgpu init
* fix wgpu test
* fix some tests
* fix ptx
* fix ptx bool acc
* cleanups
* broken, meh. will fix with ENDRANGE
* line count
2025-07-22 13:53:56 -07:00
chenyu
4535908679
update keccak test_long (#11331)
it should compare with arg "shake_128"
2025-07-22 16:08:01 -04:00
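The body of #11331 says test_long should compare against "shake_128". A minimal sketch of how reference digests can be produced with Python's standard hashlib, assuming the Keccak tests compare against the standard library; the helper name `reference_digest` and its 16-byte default output length are hypothetical, not taken from the test.

```python
import hashlib

def reference_digest(data: bytes, variant: str = "shake_128", out_len: int = 16) -> bytes:
  # hypothetical helper: pick the hashlib reference that matches the Keccak variant under test
  h = hashlib.new(variant, data)
  # SHAKE variants are extendable-output functions, so digest() needs an explicit length
  if variant.startswith("shake_"):
    return h.digest(out_len)
  return h.digest()

# a long input that spans many absorb blocks (SHAKE-128 absorbs 168-byte blocks)
long_data = bytes(range(256)) * 64
print(reference_digest(long_data).hex())
print(reference_digest(long_data, "sha3_256").hex())
```

Comparing a SHAKE-128 result against a SHA3 reference cannot match, since the variants differ in rate, domain padding and output length.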
George Hotz
affd83961c
small changes from define_reg (#11327)
* small changes from define_reg
* fix webgpu
2025-07-22 11:11:48 -07:00
chenyu
2d7c28de6a
clean up dup lambdas in helper_test_exception (#11325)
2025-07-22 12:21:57 -04:00
chenyu
c6aa8e58ca
fix TestDropoutProbabilityEdgeCases (#11322)
2025-07-22 11:13:56 -04:00
chenyu
fb42c84365
merge TestRollEdgeCases into test_ops (#11321)
2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c
movementop only Tensor.roll (#11317)
* movementop only Tensor.roll
* fixed
2025-07-22 10:34:15 -04:00
chenyu
a41140241b
truncate unsigned const in cstyle (#11318)
it can be a warning or a hard error in clang
PTX and PYTHON also need a fix, skipping for now
2025-07-22 08:02:12 -04:00
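C converts an out-of-range value to an unsigned type by reducing it modulo 2^N, so truncating a constant before rendering keeps the literal in range for its type. A minimal sketch under that assumption; `truncate_unsigned` and `render_uint_const` are hypothetical names, not tinygrad's renderer API, and the `u`/`ull` suffix choice is illustrative.

```python
def truncate_unsigned(value: int, bits: int) -> int:
  # reduce modulo 2**bits, matching C's conversion rule for unsigned integer types
  return value & ((1 << bits) - 1)

def render_uint_const(value: int, bits: int) -> str:
  # hypothetical renderer: emit an in-range literal with an unsigned suffix
  v = truncate_unsigned(value, bits)
  return f"{v}ull" if bits == 64 else f"{v}u"

print(render_uint_const(-1, 32))   # 4294967295u
print(render_uint_const(300, 8))   # 44u
```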
qazal
6668d6d241
fix word_wrap with newlines in input string [pr] (#11319)
2025-07-22 12:03:13 +03:00
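One way to make wrapping respect newlines already present in the input is to wrap each line on its own, so a newline resets the column count. This is a hypothetical re-implementation for illustration (`word_wrap_sketch`), not tinygrad's `word_wrap`; it hard-breaks at a fixed character count rather than at word boundaries.

```python
def word_wrap_sketch(x: str, wrap: int = 80) -> str:
  if wrap <= 0:
    raise ValueError("wrap must be positive")
  out: list[str] = []
  # wrap every input line independently so existing newlines reset the column
  for line in x.split("\n"):
    # hard-break lines longer than `wrap` characters
    while len(line) > wrap:
      out.append(line[:wrap])
      line = line[wrap:]
    out.append(line)
  return "\n".join(out)

print(word_wrap_sketch("ab\ncdefg", wrap=3))
```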
George Hotz
3b674df34b
generic changes from define_reg_2 (#11315)
* generic changes from define_reg_2
* fix for ptx
* ugh, that one
2025-07-21 15:14:06 -07:00
chenyu
6e9506e6fd
Tensor.roll supports dims=None (#11313)
2025-07-21 17:29:23 -04:00
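Assuming `dims=None` follows the same convention as `torch.roll` (flatten, shift, then restore the original shape), the behaviour can be sketched on plain nested lists. `roll_flat` is a hypothetical helper for a 2-D input, not tinygrad code.

```python
def roll_flat(rows: list[list[int]], shift: int) -> list[list[int]]:
  # dims=None semantics: treat the whole input as one flat sequence
  ncols = len(rows[0]) if rows else 0
  flat = [v for r in rows for v in r]
  if not flat:
    return [list(r) for r in rows]
  s = shift % len(flat)  # normalize negative and oversized shifts
  rolled = flat[-s:] + flat[:-s] if s else flat
  # restore the original 2-D shape
  return [rolled[i * ncols:(i + 1) * ncols] for i in range(len(rows))]

print(roll_flat([[1, 2], [3, 4]], 1))  # [[4, 1], [2, 3]]
```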
George Hotz
108aac8af4
use AddrSpace instead of local (#11314)
* use AddrSpace instead of local
* addrspace in test
2025-07-21 14:00:06 -07:00
chenyu
d3a93185a6
clean up test_roll (#11312)
2025-07-21 16:00:50 -04:00
George Hotz
532b52fcef
store has a dtype, like assign (#11309)
* store has a dtype, like assign
* fix upat
* fix test
2025-07-21 12:50:01 -07:00
geohotstan
445ff8de56
ONNX onnx_parser and buffer_parse clean up (#11000)
* start
* remove onnx.load from compile4 and move np to dropout
* clean up and enable test
* clean up
* move WebGPU ONNX test into MacOS (WebGPU)
* leave test in ONNX (CPU)
* fix raw_data init None, and simplify onnx_runner test a little?
* THESE TESTS ARE SO UGLY UGHH
* need to really think about how to structure the test
* wow LLMs are quite something
* not always on disk now
* also add external data loading test
* cleaner tests
* minimize diff and add const folding tests
* add external data loading too
* whoops add webgpu back.. but why was it not needed in the first place?
* better comment
* move webgpu test to macos(webgpu)?
* llm english so much better than me wow
* trigger CI to check flakiness
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-21 15:10:25 -04:00
George Hotz
842184a1ab
rename kernelize to schedule, try 2 (#11305)
2025-07-21 11:18:36 -07:00
wozeparrot
30ce16a424
feat: failing test for long keccak (#11292)
2025-07-21 12:49:23 -04:00
uuuvn
178dbf3f66
Remote scheduler changes (#11177)
2025-07-21 09:29:44 -07:00
nimlgen
cc3c1e4c14
hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq
* import time
* upd
* fix
* windows support
* hm
* cleaner
* fix timer
* fix timing
* std is ns
* skip profiler
* mypy
* cleaner
* cleanups
* after merge
* default is back
2025-07-21 15:10:38 +03:00
qazal
3002c63b1e
process replay: optionally pass tinygrad import error (#11289)
* process replay: optionally pass tinygrad import error
* gate all tinygrad internals
* s/getenv/os.getenv pre import
* diff
2025-07-20 22:57:56 +03:00
chenyu
54924f9969
type remove Union and Optional [pr] (#11283)
use `|` for consistency
2025-07-19 14:05:52 -04:00
nimlgen
188ed38315
replace from_mv with lightweight mv_address (#11280)
2025-07-19 13:50:51 +03:00
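A lightweight way to get a raw pointer for a memoryview is to overlay a single `ctypes.c_char` on its buffer and take that object's address, rather than building a ctypes array covering the whole view. This is a sketch assuming the view is writable, C-contiguous and at least one byte long (as `from_buffer` requires); it is not necessarily tinygrad's exact implementation.

```python
import ctypes

def mv_address(mv: memoryview) -> int:
  # overlay one c_char on the first byte; from_buffer needs a writable, contiguous buffer
  return ctypes.addressof(ctypes.c_char.from_buffer(mv))

buf = bytearray(b"tinygrad")
mv = memoryview(buf)
addr = mv_address(mv)
print(ctypes.string_at(addr, len(buf)))  # reads the bytes back through the raw address
```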
chenyu
ec3efd2919
move upcast before reduce (#11250)
* move upcast before reduce
upcast goes to end of global+local+upcast
* r_196_32_4_24_8
2025-07-18 14:42:15 -04:00
nimlgen
9a88bd841c
hcq: refactor into peer_groups (#11277)
* hcq: refactor into peer_groups
* fix fors
* fixes
* ooops
* mypy
* tiny fixes
2025-07-18 16:34:18 +03:00
chenyu
c5a5d74642
Revert "image_dot of 2 half inputs returns half (#11007)" (#11274)
This reverts commit fa8e08f922.
2025-07-17 17:34:18 -04:00
Utkarsh Gill
fa8e08f922
image_dot of 2 half inputs returns half (#11007)
* cast after sum
* comment out skipif
* minor fix
* only test IMAGE
* IMAGE is supported now
* simpler
* simplerr
* only cast if dtype is None
* don't need to change base_image_type
* only cast when dtype is half
* add explicit test
* actually no, workflow seems better
* actually, keep both
* move test
* fix indent
---------
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-07-17 13:47:22 -07:00