829 Commits

Author SHA1 Message Date
chenyu
2bd1fff79c ci GPU misc cleanups (#12078) 2025-09-08 16:47:29 -04:00
chenyu
1781d5bced remove PYTHONPATH in test.yml (#12077)
set globally already
2025-09-08 15:41:47 -04:00
chenyu
11213398b9 reorder amdremote in test yml (#12073) 2025-09-08 13:43:04 -04:00
nimlgen
10ac427aaa cpu threading (#11951)
* start cpu threading

* fix

* fix2

* fix

* hacks?

* threads

* minor

* no dsp

* dsp 2

* n

* more

* test

* xm

* cleaner

* readable

* f

* reorder

* when no threads

* rangeify

* typos

* not needed

* reapply

* remoev this

* linter

* fixed cpu count in ci

* fix

* fixes

* rm

* typo

* sort based on speed

* test if test works in ci

* Revert "test if test works in ci"

This reverts commit 1f05edb531.

* do not pad thread
2025-09-06 16:13:43 +03:00
Jordan Chalupka
48ec5efad9 only run autogen tests on change (#12049)
* only run autogen tests on change

* example change

* rm example change
2025-09-05 23:53:01 -07:00
George Hotz
0123c394e5 early simplfy_merge_adjacent (#12045)
* do simplify_merge_adjacent before schedule

* do simplify_merge_adjacent before schedule

* disable that slow test
2025-09-05 16:39:20 -07:00
George Hotz
38dcadf07b delete kernel.py (#12040)
* delete kernel.py

* delete that file

* rip and tear

* don't test search

* imports

* fix torch frontend

* not a part of regen
2025-09-05 15:52:07 -07:00
George Hotz
433581f8ed make POSTOPT=2 the default (#12034)
* make POSTOPT=2 the default

* more matching tc

* fix winograd

* fix that test

* add matvec to Scheduler

* flip tc sort order

* similar speed

* fix beam on image

* disable slow tests

* slow
2025-09-05 14:34:05 -07:00
chenyu
a340723bf1 SKIP_SLOW_TEST=1 for nv CI (#12031) 2025-09-05 11:52:02 -04:00
chenyu
ce7163e9b4 clean up skip slow tests in PYTHON (#12028)
skip with SKIP_SLOW_TEST and decorators
2025-09-05 11:35:26 -04:00
chenyu
5dcc4c7f1b skip test_linalg in windows unit test (#12030) 2025-09-05 11:28:40 -04:00
chenyu
677220ae7e test_tesnor_data to unit/ (#12013) 2025-09-04 19:58:27 -04:00
George Hotz
560df206cc split tc test (#12003)
* split tc test

* split hand coded opts

* remove some skipped tests

* skips on emulated
2025-09-04 11:47:56 -07:00
George Hotz
9dee724fc4 make EMULATE a context var (#12002)
* make EMULATE a context var

* fix test amx
2025-09-04 11:15:43 -07:00
chenyu
ca7574cb2d ci set PYTHONPATH for all (#11997) 2025-09-04 10:06:04 -04:00
George Hotz
5cf42dc4db add Scheduler to replace Kernel with POSTOPT=2 (#11924)
* ** simple kernel to replace Kernel for postopt

* support old

* fix beam

* beaming

* beam on old

* bring tensor cores back

* raise

* postbeam

* test ops passes on mac

* skip that

* postopt default

* gate that

* fix tensor cores

* a few test fixes

* dsp fix

* tc fix

* loop

* support swap

* test_gemv

* fix beam for variable

* test opts from high level stuff

* range annoying

* compile slow

* metal slow

* better beam

* no POSTBEAM

* fix nolocals

* hc opt mostly works

* put that back

* lil

* some work

* fix that

* POSTOPT 2

* fix tests

* no postopt 2

* work

* back

* padded tensors cores

* shift_to

* postopt 0 passes?

* write PADTO

* fix padded tensor cores

* compare hcopt

* 18000 lines

* should pass tests

* fix rangeify

* put types back
2025-09-03 19:23:30 -07:00
chenyu
e921fb44ee clean up testnvidia env (#11969) 2025-09-02 18:29:00 -04:00
Jordan Chalupka
4785cd959a [TYPED=1] cvar should allow dtype as a tuple (#11770)
* cvar dtype:DType|tuple[DType, ...]|None=None

* fmt

* add a test

* list typeguard as a dep for CI

* extra step to install mypy

* fix venv

* ci fixes

* mv typeguard to testing install group

* simpler TYPED=1 test

* add typeguard to lint group
2025-08-26 12:49:51 -04:00
George Hotz
66e9d54eed RANGEIFY=2 is partial contig (#11777) 2025-08-21 16:53:58 -07:00
George Hotz
5954a0975f fix some assigns on rangeify (#11774)
* fix some assigns

* llvm test

* more tests

* upd test
2025-08-21 15:15:54 -07:00
George Hotz
d6f9606e93 small cleanups to rangeify (#11769) 2025-08-21 11:15:09 -07:00
chenyu
5276fbc9c5 fix gather with inf values (#11760)
(mask * x) is wrong because 0*inf is nan. i feel we have a lot of those still...
2025-08-20 20:35:40 -04:00
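Note on 5276fbc9c5: the commit body's point, that 0*inf is nan under IEEE 754 arithmetic, can be reproduced in plain Python. The sketch below is only an illustration of that arithmetic and is not tinygrad's gather code. The function names `gather_by_mask_mul` and `gather_by_select` are hypothetical, and the select-based variant is one assumed way to avoid the problem, not necessarily what the commit does.

```python
import math

def gather_by_mask_mul(values, index):
    # Hypothetical "mask * x" pattern: multiply each value by a 0/1 mask, then sum.
    # Any masked-out inf becomes 0.0 * inf == nan and poisons the sum.
    return sum((1.0 if i == index else 0.0) * v for i, v in enumerate(values))

def gather_by_select(values, index):
    # Hypothetical alternative: select instead of multiply, so masked-out
    # entries contribute a literal 0.0 and are never multiplied.
    return sum(v if i == index else 0.0 for i, v in enumerate(values))

values = [1.0, math.inf, 3.0]

buggy = gather_by_mask_mul(values, 0)  # 1.0*1.0 + 0.0*inf + 0.0*3.0
fixed = gather_by_select(values, 0)    # 1.0 + 0.0 + 0.0

print(math.isnan(buggy))  # True: the inf at position 1 leaked in as nan
print(fixed)              # 1.0
```

The same reasoning applies to any masked reduction over data that may contain inf: selecting keeps the unselected values out of the arithmetic entirely, while multiplying by zero does not.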
George Hotz
9635592141 ** rangeify, try 3 (#11683)
* ** rangeify, try 3

* bring that over

* bufferize, don't use contig tag

* work

* ish

* fix rangeify

* flash attention is back

* fix rangeify tests

* stuff passes

* fix test_log_softmax

* more stuff passes

* progress children

* new endrange solution

* progress

* progress counter

* basic assign

* contigs only

* symbolic in schedule

* unbind_kernel

* late children

* ops fixed

* beautiful mnist is close

* that seems to work

* mnist works

* improve names

* fix bmnist

* no pcontig

* testing backward

* work

* clone movement ops

* new_range helper

* MBLOCK/MERGE

* ops tests pass

* revert mblock stuff

* cleanups...but it breaks ops

* remove reindex

* hack for relu

* disable the hacks

* more hacks

* upd

* mostly works with cleanups disabled

* ndr

* ops tests pass

* terrible hacks for indexing to work

* context mismatch

* pcontig

* split pcontig v contig

* z3 trunc

* null

* no fuse in rangeify

* ops test passes

* lnorm

* fix assign

* nd rangeify

* both should work

* tests for rangeify

* cleanups

* stores pass the pointer through

* disable pcontig for now

* PARTIAL_CONTIG is a flag
2025-08-20 14:22:44 -07:00
George Hotz
8af8808c61 cleanup tests, bump caches (#11746) 2025-08-19 21:21:07 -07:00
George Hotz
1d307f568c move device tests to test/device + test cleanups (#11735)
* move device tests to test/device

* test speedups

* test device

* linalg to unit

* upd

* so pytest just works

* more divide and skip

* speed

* test devectorize

* add pillow
2025-08-19 16:02:20 -07:00
George Hotz
2ea54d7337 improve syntax of UPats using f [pr] (#11717)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-18 20:49:45 -04:00
George Hotz
4afa0b86bb hotfix: ls -lh on wheel size 2025-08-18 11:52:59 -07:00
chenyu
c10e4c4e20 print wheel build size (#11714) 2025-08-18 14:29:47 -04:00
chenyu
d0d39885c3 onnx in tinygrad (#11675) 2025-08-14 19:57:21 -04:00
chenyu
48c4033ae1 fix pylint for onnx (#11673)
* fix pylint for onnx

* too long
2025-08-14 18:48:02 -04:00
ttomsa
ae0c3cfff6 change clang -march flag to -mcpu on arm (#10970)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-11 13:38:48 -04:00
nimlgen
5403a4aeaf null dev: support offset on buffers (#11606)
* null dev: support offset on buffers

* nolimit
2025-08-10 21:58:37 +03:00
chenyu
dd3d2eb36c add training llama3 test in ci (#11599) 2025-08-09 22:35:39 -04:00
chenyu
7ee3770961 FUSE_ARANGE=1 (#11427)
* FUSE_ARANGE=1

* fix test

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-07 13:32:34 -04:00
George Hotz
21570545d3 move view pushing to codegen, try 2 (#11534)
* move view pushing to codegen, try 2

* fix up some linearizer tests

* fix test search

* fix test schedule

* delete that test

* fix test arange

* fix a few tests

* update tests

* push views

* ebs cleanup

* fix local/reg

* test and lint

* fix more tests

* test cleanups

* skipped that one
2025-08-06 15:58:38 -07:00
George Hotz
4fe11725c6 pass through sink arg, update linearizer test (#11536)
* pass through sink arg, update linearizer test

* get_program help

* bump line count

* use new api
2025-08-06 09:48:48 -07:00
geohotstan
1163292759 move onnx_parser into onnx (#11530) 2025-08-06 10:46:27 -04:00
chenyu
c9225d22ce only disable flaky test_jit_multidev_xfer (#11523) 2025-08-05 22:17:25 -04:00
George Hotz
f58fd3143d cleanup fix_kernel (#11520)
* cleanup fix_kernel

* early load buffer

* early meta ops

* move those to fix_kernel_ops

* fix tests

* remote metal was flaky

* Revert "fix tests"

This reverts commit a27019383d.

* that hack broke things

* fine for ptx
2025-08-05 18:38:43 -07:00
uuuvn
052191eae4 Remote multihost (p2p with infiniband verbs) (#9746)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-27 14:44:32 -07:00
uuuvn
76a2ddbd78 Move remote tests out of onnx (#11310)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-23 13:25:55 -07:00
chenyu
86e7504111 mypy check extra/onnx.py (#11348)
instead of running test with 3.10, add onnx to mypy which would have caught StrEnum regression. Several type annotation failed mypy now that does not affect running the code and were skipped for now
2025-07-23 12:42:59 -04:00
chenyu
960da9319d Remove StrEnum in onnx for python 3.10 (#11345)
some training tests failed looks like parsing error?
2025-07-23 11:52:25 -04:00
chenyu
7a9a5cfd28 isolate test/external/external_test_am.py (#11335)
seems to be the one crashing, also remove -n=auto for that
2025-07-22 19:02:20 -04:00
George Hotz
09431d4ad1 make DEFINE_REG behave like the others (#11273)
* simpler define reg

* cast

* PTRCAT define_acc

* cleanups

* fix uops stats

* fix linearizer tests

* llvm

* define reg sets const

* define reg sets const

* no assign

* collapse that

* fix test_max_pool2d_bigger_stride_dilation

* use index, fix webgpu

* devec

* fix tests

* fix webgpu

* fix llvm

* threads for python

* fix ops_python

* only for reg

* acc_half is real now in the emulator

* fix llvm

* fix webgpu init

* fix wgpu test

* fix some tests

* fix ptx

* fix ptx bool acc

* cleanups

* broken, meh. will fix with ENDRANGE

* line count
2025-07-22 13:53:56 -07:00
George Hotz
affd83961c small changes from define_reg (#11327)
* small changes from define_reg

* fix webgpu
2025-07-22 11:11:48 -07:00
qazal
0c4e19f270 hotfix: disable process replay in REMOTE=1 tests (#11320)
* hotfix: disable process replay in REMOTE=1 tests

* comment
2025-07-22 10:41:58 +03:00
geohotstan
445ff8de56 ONNX onnx_parser and buffer_parse clean up (#11000)
* start

* remove onnx.load from compile4 and move np to dropout

* clean up and enable test

* clean up

* move WebGPU ONNX test into MacOS (WebGPU)

* leave test in ONNX (CPU)

* fix raw_data init None, and simplify onnx_runner test a little?

* THESE TESTS ARE SO UGLY UGHH

* need to really think about how to structure the test

* wow LLMs are quite something

* not always on disk now

* also add external data loading test

* cleaner tests

* minimize diff and add const folding tests

* add external data loading too

* whoops add webgpu back.. but why was it not needed in the first place?

* better comment

* move webgpu test to macos(webgpu)?

* llm english so much better than me wow

* trigger CI to check flakiness

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-21 15:10:25 -04:00
chenyu
be2f4336e6 use onnx 1.18.0 in DSP test (#11279) 2025-07-18 14:09:23 -04:00
chenyu
c5a5d74642 Revert "image_dot of 2 half inputs returns half (#11007)" (#11274)
This reverts commit fa8e08f922.
2025-07-17 17:34:18 -04:00