Commit Graph

932 Commits

Author SHA1 Message Date
George Hotz
45bac1eee8 fix test amx 2025-09-04 11:07:17 -07:00
George Hotz
758a1888d6 make EMULATE a context var 2025-09-04 11:03:32 -07:00
chenyu
ca7574cb2d ci set PYTHONPATH for all (#11997) 2025-09-04 10:06:04 -04:00
George Hotz
5cf42dc4db add Scheduler to replace Kernel with POSTOPT=2 (#11924)
* ** simple kernel to replace Kernel for postopt

* support old

* fix beam

* beaming

* beam on old

* bring tensor cores back

* raise

* postbeam

* test ops passes on mac

* skip that

* postopt default

* gate that

* fix tensor cores

* a few test fixes

* dsp fix

* tc fix

* loop

* support swap

* test_gemv

* fix beam for variable

* test opts from high level stuff

* range annoying

* compile slow

* metal slow

* better beam

* no POSTBEAM

* fix nolocals

* hc opt mostly works

* put that back

* lil

* some work

* fix that

* POSTOPT 2

* fix tests

* no postopt 2

* work

* back

* padded tensor cores

* shift_to

* postopt 0 passes?

* write PADTO

* fix padded tensor cores

* compare hcopt

* 18000 lines

* should pass tests

* fix rangeify

* put types back
2025-09-03 19:23:30 -07:00
chenyu
e921fb44ee clean up testnvidia env (#11969) 2025-09-02 18:29:00 -04:00
nimlgen
897254ad6c ci: add dev<->cpu copy speeds (#11959) 2025-09-02 15:22:44 +03:00
nimlgen
a4f05ebd1a ci: rebuild gpuocelot with boost libs (#11920) 2025-08-30 17:24:19 +03:00
nimlgen
cf9d8c8142 ci: pin boost for macos runners (#11910) 2025-08-30 01:38:06 +03:00
nimlgen
e8289c75b1 ci: do not reinstall existing pkgs in macos (#11900) 2025-08-28 21:20:15 +03:00
chenyu
134cf56904 update cache name for gpuocelot (#11896) 2025-08-28 13:11:10 -04:00
Jordan Chalupka
4785cd959a [TYPED=1] cvar should allow dtype as a tuple (#11770)
* cvar dtype:DType|tuple[DType, ...]|None=None

* fmt

* add a test

* list typeguard as a dep for CI

* extra step to install mypy

* fix venv

* ci fixes

* mv typeguard to testing install group

* simpler TYPED=1 test

* add typeguard to lint group
2025-08-26 12:49:51 -04:00
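The widened signature in the first bullet, `dtype:DType|tuple[DType, ...]|None=None`, reads naturally as a three-way rule. None places no constraint on the dtype. A single DType must match exactly. A tuple matches any of its members. The sketch below shows that rule in minimal form, assuming these are the intended semantics. `DType` is a hypothetical stand-in and `dtype_matches` is a hypothetical helper. Neither is tinygrad's actual API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DType:  # hypothetical stand-in for tinygrad's DType
    name: str


# Same shape as the annotation in the commit: DType | tuple[DType, ...] | None
DTypeSpec = Union[DType, tuple[DType, ...], None]


def dtype_matches(actual: DType, spec: DTypeSpec) -> bool:
    """Hypothetical helper: does `actual` satisfy the dtype constraint `spec`?"""
    if spec is None:  # no constraint, so any dtype matches
        return True
    if isinstance(spec, tuple):  # a tuple matches if actual is any listed dtype
        return actual in spec
    return actual == spec  # a single DType must match exactly
```

Accepting a tuple lets a single pattern cover several dtypes. Without it, the caller would have to write one pattern per dtype.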
George Hotz
66e9d54eed RANGEIFY=2 is partial contig (#11777) 2025-08-21 16:53:58 -07:00
George Hotz
5954a0975f fix some assigns on rangeify (#11774)
* fix some assigns

* llvm test

* more tests

* upd test
2025-08-21 15:15:54 -07:00
George Hotz
d6f9606e93 small cleanups to rangeify (#11769) 2025-08-21 11:15:09 -07:00
chenyu
5276fbc9c5 fix gather with inf values (#11760)
(mask * x) is wrong because 0*inf is nan. i feel we have a lot of those still...
2025-08-20 20:35:40 -04:00
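The failure mode described in this commit can be reproduced with plain Python floats. Suppose a gather is written as a one-hot mask that is multiplied by x and then summed. The unselected elements are multiplied by 0, and under IEEE 754, 0 * inf is nan. A single inf anywhere in x therefore poisons the result. Selecting with a conditional instead of multiplying never evaluates 0 * inf. This is an illustrative sketch only. `gather_multiply` and `gather_select` are hypothetical names, not tinygrad's gather implementation.

```python
import math


def gather_multiply(x, idx):
    # Hypothetical: gather as sum(mask * x); unselected slots compute 0 * x[i]
    mask = [1 if i == idx else 0 for i in range(len(x))]
    return sum(m * v for m, v in zip(mask, x))


def gather_select(x, idx):
    # Hypothetical: gather as sum(where(mask, x, 0)); unselected slots become 0.0
    # directly, so no element is ever multiplied by 0
    mask = [i == idx for i in range(len(x))]
    return sum(v if m else 0.0 for m, v in zip(mask, x))


x = [1.0, float("inf"), 3.0]
print(math.isnan(gather_multiply(x, 2)))  # True: 0 * inf contributed a nan
print(gather_select(x, 2))                # 3.0
```

Selecting the inf element itself still works with either version. The problem only appears when an inf sits in a slot that should have been masked out.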
George Hotz
9635592141 ** rangeify, try 3 (#11683)
* ** rangeify, try 3

* bring that over

* bufferize, don't use contig tag

* work

* ish

* fix rangeify

* flash attention is back

* fix rangeify tests

* stuff passes

* fix test_log_softmax

* more stuff passes

* progress children

* new endrange solution

* progress

* progress counter

* basic assign

* contigs only

* symbolic in schedule

* unbind_kernel

* late children

* ops fixed

* beautiful mnist is close

* that seems to work

* mnist works

* improve names

* fix bmnist

* no pcontig

* testing backward

* work

* clone movement ops

* new_range helper

* MBLOCK/MERGE

* ops tests pass

* revert mblock stuff

* cleanups...but it breaks ops

* remove reindex

* hack for relu

* disable the hacks

* more hacks

* upd

* mostly works with cleanups disabled

* ndr

* ops tests pass

* terrible hacks for indexing to work

* context mismatch

* pcontig

* split pcontig v contig

* z3 trunc

* null

* no fuse in rangeify

* ops test passes

* lnorm

* fix assign

* nd rangeify

* both should work

* tests for rangeify

* cleanups

* stores pass the pointer through

* disable pcontig for now

* PARTIAL_CONTIG is a flag
2025-08-20 14:22:44 -07:00
George Hotz
8af8808c61 cleanup tests, bump caches (#11746) 2025-08-19 21:21:07 -07:00
George Hotz
1d307f568c move device tests to test/device + test cleanups (#11735)
* move device tests to test/device

* test speedups

* test device

* linalg to unit

* upd

* so pytest just works

* more divide and skip

* speed

* test devectorize

* add pillow
2025-08-19 16:02:20 -07:00
George Hotz
2ea54d7337 improve syntax of UPats using f [pr] (#11717)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-18 20:49:45 -04:00
George Hotz
4afa0b86bb hotfix: ls -lh on wheel size 2025-08-18 11:52:59 -07:00
chenyu
c10e4c4e20 print wheel build size (#11714) 2025-08-18 14:29:47 -04:00
chenyu
d0d39885c3 onnx in tinygrad (#11675) 2025-08-14 19:57:21 -04:00
wozeparrot
71260a5ea4 feat: only bench openpilot 0.9.9 models (#11664) 2025-08-14 19:27:18 -04:00
chenyu
48c4033ae1 fix pylint for onnx (#11673)
* fix pylint for onnx

* too long
2025-08-14 18:48:02 -04:00
geohotstan
1e904155e3 Add Onnx Huggingface to test/models/test_onnx.py (#11468)
* BOOM

* cache extra/huggingface/models/

* why max buffer size is not 0

* override MAX_BUFFER_SIZE

* less models

* remove more models and change cache dir to already cached dir

* only metal

* less is more?

* remove check ops

* why is this not setting the ENVVAR

* ughhhhh just test in models

* only cpu and gpu

* only cpu actually

* just override it idk

* final

* move extra dependencies up top

* simplification

* fix print

* make README better

* revert ops_disk fix for now

* clean up test_onnx

* remove testing fashion clip model cuz sloooowwwwww

* actually let METAL run this

* fix comment mistake

* fix download path in run_models

* does this work?

* cleanup setup and teardown

* contextvar like this?

* prove model is cached

* do I need to increment DOWNLOAD_CACHE_VERSION?

* see if cached with incremented DOWNLOAD_CACHE_VERSION

* use warnings to see if the model exists

* revert DOWNLOAD_CACHE_VERSION stuff and clean up

* add retry to download

* nit
2025-08-14 11:16:41 -04:00
ttomsa
ae0c3cfff6 change clang -march flag to -mcpu on arm (#10970)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-11 13:38:48 -04:00
nimlgen
5403a4aeaf null dev: support offset on buffers (#11606)
* null dev: support offset on buffers

* nolimit
2025-08-10 21:58:37 +03:00
chenyu
dd3d2eb36c add training llama3 test in ci (#11599) 2025-08-09 22:35:39 -04:00
chenyu
b232c60def benchmark openpilot 0.9.9 (#11575)
* benchmark openpilot 0.9.9

not sure what to do with the 0.9.7 ones with IMAGE=2 and validate

* name
2025-08-08 01:26:14 -04:00
chenyu
702e38dc19 remove FUSE_ARANGE_UINT (#11567)
also add IGNORE_OOB=1 to bert runs. lowered BS on tinybox to 90 since 96 oom during eval without reset
2025-08-07 16:49:06 -04:00
chenyu
594cbdc66f skip AM ResNet50 benchmark (#11565)
hanging with FUSE_ARANGE?
2025-08-07 14:07:01 -04:00
chenyu
7ee3770961 FUSE_ARANGE=1 (#11427)
* FUSE_ARANGE=1

* fix test

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-07 13:32:34 -04:00
George Hotz
21570545d3 move view pushing to codegen, try 2 (#11534)
* move view pushing to codegen, try 2

* fix up some linearizer tests

* fix test search

* fix test schedule

* delete that test

* fix test arange

* fix a few tests

* update tests

* push views

* ebs cleanup

* fix local/reg

* test and lint

* fix more tests

* test cleanups

* skipped that one
2025-08-06 15:58:38 -07:00
George Hotz
4fe11725c6 pass through sink arg, update linearizer test (#11536)
* pass through sink arg, update linearizer test

* get_program help

* bump line count

* use new api
2025-08-06 09:48:48 -07:00
geohotstan
1163292759 move onnx_parser into onnx (#11530) 2025-08-06 10:46:27 -04:00
nimlgen
1afb290027 ci: fix runner in nv (#11527) 2025-08-06 10:38:04 +03:00
chenyu
c9225d22ce only disable flaky test_jit_multidev_xfer (#11523) 2025-08-05 22:17:25 -04:00
George Hotz
f58fd3143d cleanup fix_kernel (#11520)
* cleanup fix_kernel

* early load buffer

* early meta ops

* move those to fix_kernel_ops

* fix tests

* remote metal was flaky

* Revert "fix tests"

This reverts commit a27019383d.

* that hack broke things

* fine for ptx
2025-08-05 18:38:43 -07:00
chenyu
3f742a5a7c comma space lab models benchmark (#11461) 2025-07-31 19:06:18 -04:00
wozeparrot
d3da20eca6 feat: bump mlperf workflow timeout to 6 hours (#11440) 2025-07-30 14:12:12 -07:00
nimlgen
5fc5bb5237 ci: clear processes (#11434)
* unified hcq_smi for management

* fix

* fix

* no reset for amd
2025-07-30 22:15:18 +03:00
nimlgen
4b4ba5454c ci: move driver start higher (#11431) 2025-07-30 10:48:38 +03:00
chenyu
204da24cfc increase driverbenchmark timeout-minutes to 15 (#11428) 2025-07-29 19:45:05 -04:00
nimlgen
c88e401d0e ci: fix typos in h machine benchmarks (#11423) 2025-07-29 22:11:47 +03:00
George Hotz
1f1f99c287 hotfix: add DEBUG=3 to driver CI 2025-07-29 11:03:47 -07:00
nimlgen
d38d285489 ci: add h machines (#11416)
* ci: add h machines

* more

* fix names

* names not collide

* 20

* 10
2025-07-29 19:21:51 +03:00
Tom Clesius
2568bc0d99 ci: add caching for apt packages (#11162)
* add caching for apt packages

* remove 'inputs' from apt cache key, use outputs instead of env

* remove unnecessary mkdir for partial

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-29 09:04:56 -07:00
uuuvn
052191eae4 Remote multihost (p2p with infiniband verbs) (#9746)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-27 14:44:32 -07:00
uuuvn
76a2ddbd78 Move remote tests out of onnx (#11310)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-23 13:25:55 -07:00
chenyu
86e7504111 mypy check extra/onnx.py (#11348)
instead of running tests with Python 3.10, add onnx to mypy, which would have caught the StrEnum regression. Several type annotations that now fail mypy do not affect running the code and were skipped for now
2025-07-23 12:42:59 -04:00