Commit Graph

870 Commits

Author SHA1 Message Date
George Hotz
ba97fd0b9c hotfix: add test/external/external_benchmark_disk_raw 2025-03-02 02:32:15 +00:00
geohotstan
d9ec05cea6 Test Onnx quantization behavior (#9301)
* add DynamicDequantizeLinear and corresponding tests

* wow qlinearops are round away from zero

* this passes locally...

* again

* try

* try separate test

* round to even again

* also add QLinearMul

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-01 19:21:58 -05:00
chenyu
38d7aae3b7 onnx fmod (#9307) 2025-02-28 14:09:22 -05:00
chenyu
4342300eff lower test_gemm_8192 amd to 70 (#9277)
flaky
2025-02-26 16:32:08 -05:00
chenyu
aaf0a8069f xor -> bitwise_xor (#9264) 2025-02-26 10:21:14 -05:00
geohotstan
f0b24d230c add test_onnx_ops.py (#8569)
* boom

* fix webgpu

* use exact variable names in test so that AI can read easier

* add tag for specific test name like test a specific dtype

* fix ruff

* astype everything

* dtype in array creation

* just arange

* is 67% considered fixed?

* move test up

* small cleanups

* share function

* add qgemm as well

* add qgemm too

* make sure qgemm comes out as int

* take out qgemm for now

* fixed test

* add correct qgemm

* addressing feedback here too, early naive fix for now

* simplify bias and c to be minimalistic enough to test correctness

* refactored qlinearops

* maybe these asserts aren't the best..

* fix test

* updated tests to cover new ops

* try to add to CI

* move test_onnx_ops into testextra/

* more attention tests

* qlinear_add atol=1

* attention still not fullllllly correct

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
George Hotz
c9493e41a6 reorder expand (#9051)
* reorder expand

* symbolic ops needs resolve here

* s/arg/st + whitespace

* viz

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
George Hotz
a4dab3ec3f add name uop (#9149)
* add name uop, TODO: refactor renderer to use

* renderer uses name uop

* fix tests

* render

* ptx
2025-02-18 15:26:58 +08:00
George Hotz
df3b320f46 rewriter -> devectorizer [pr] (#9147) 2025-02-18 12:42:08 +08:00
Josh Moore
1f9d2442b9 Add Tensor.scatter_reduce (#8947)
* pytorch scatter -> scatter_reduce

* WIP scatter_reduce implementation

* _pre_scatter return type hint

* split out src, mask to satisfy linter

* Add src cast back in

* dict of lambdas instead of ifs

* sum and prod reduction ops with include_self

* add reduce arg error message

* add amax and amin reduction ops

* Fix include_self for higher dims

* Simplify

* Simplify amax and amin too

* Pull include_self logic out into _inv_mask function

* reduce arg cannot be None for scatter_reduce

* Fix self-mask issue

* Add mean reduce op

* Add tests

* any() not needed here

* remove comment

* End support for Tensor src with reduce arg in tinygrad scatter

* Process index, dim inside actual functions

* Add scatter_reduce to onnx

* Add excluded onnx ScatterElements reduction tests back in

* Save 2 lines on the mask helpers

* Update docs

* Add include_self=False tests

* cleanup

* Remove unneeded helper function

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-13 09:08:54 -05:00
chenyu
f4f56d7c15 move time_linearizer to extra.optimization.helpers [pr] (#9048)
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
nimlgen
e5a3f60fc2 am: remove libpciaccess dep (#8980)
* am: remove libpciaccess dep

* offset in mockhwiface

* op

* fake regions
2025-02-09 16:06:55 +03:00
George Hotz
ae45826758 hotfix: GRAPH_ONE_KERNEL + fix timing 2025-02-06 17:52:20 +08:00
George Hotz
1c53e8bf27 Revert "objc fast msg (#8922)" (#8926)
This reverts commit c3f99a727e.
2025-02-06 17:50:49 +08:00
George Hotz
c3f99a727e objc fast msg (#8922)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* new objc message style [pr]

* without sync

* no div 0

* lru cache that

* no sync in the profile

* fix

* update all to new style

* remove comment

* graph one kernel

* fix graph one kernel

* remove that sync
2025-02-06 17:49:06 +08:00
George Hotz
a8e54df363 benchmark single kernel launch (#8921)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* without sync

* no div 0

* lru cache that

* no sync in the profile
2025-02-06 13:35:34 +08:00
qazal
6f0cc2e9c5 rename to KernelContext and move the linearize_sched comment [pr] (#8899)
* rename to KernelContext and move that comment [pr]

* 500
2025-02-05 07:49:58 +01:00
qazal
6a0da51ed0 truncate process replay logs [pr] (#8891)
* truncate process replay logs [pr]

* work

* max_lines

* bump to 1K
2025-02-04 20:26:48 +01:00
qazal
acf0baefee process replay from tensor uops to kernel ast (#8883)
* process replay from tensor uops to kernel ast

* this dedups

* switch back to string key
2025-02-04 18:09:20 +01:00
George Hotz
56fa5c1191 dsp simulator (#8869)
* dsp simulator

* progress

* fix

* close on test tiny

* working

* less waste

* line savings

* Device DSP compiler

* mock DSP at the bottom

* DSP tests

* docker caching

* test update

* need load

* skip that test for CI DSP

* last touch

* ugh
2025-02-04 09:45:04 +08:00
nimlgen
7841852870 hcq pci signal fuzzer (#8854)
* hcq pci signal fuzzer

* kk

* correct
2025-02-01 23:42:27 +03:00
qazal
dc34a4146f better process_replay context print [pr] (#8856)
* better process_replay context print [pr]

* test: revert push cast

* Revert "test: revert push cast"

This reverts commit 38a2aef6f8.
2025-02-01 21:50:23 +02:00
chenyu
73ee2d74c0 raise RuntimeError for int base pow (#8852)
current implementation is not precise and blocking other simplification change
2025-02-01 12:11:57 -05:00
qazal
72e1f41f8e add unbind_vars pattern matcher (#8851)
* add unbind_vars pattern matcher [pr]

* this can be cvar

* this is empty
2025-02-01 18:25:44 +02:00
Ankit Avinash
7647cd8428 [bounty] Stride is flip (#8792)
* replace stride with flip

* Complete replacing stride with flip

clean flip function in view.py
fix tests

* fix tests for multi shapetracker

* fix tests for fuzz shapetracker

* fix tests for fuzz shapetracker

* debug

* debug

* fix

* fix

* fix

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-31 11:34:10 +09:00
chenyu
0513b0c17d lower green test_gemm_8192 tflops to 125 [pr] (#8820)
flaky
2025-01-30 17:30:08 -05:00
nimlgen
a2faa5e49b am: fix pt free (#8810) 2025-01-30 15:14:55 +03:00
Ignacio Sica
260df1a17f tc_select noop (#8801)
* tc_select noop

* revert changes in test
2025-01-29 13:53:23 -05:00
qazal
ed672881b0 remove additions/deletion in pr + check uops are equal [pr] (#8779)
* use warnings there [pr]

* remove those + move assert_diff [pr]

* warn after log

* remove

* back
2025-01-28 08:57:34 +02:00
Ignacio Sica
b240f12593 [TIP-9] rename Opt's amt to arg 2 (#8770)
* rename Opt amt to arg

* ignore_beam_cache for test_tiny

* move ignore_beam_cache to test_tiny

* move to separate pr

* revert space change

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-27 14:19:04 -05:00
George Hotz
3ed146a5ff Revert "rename Opt amt to arg (#8767)" (#8769)
This reverts commit bf041659a5.
2025-01-27 23:46:37 +09:00
Ignacio Sica
bf041659a5 rename Opt amt to arg (#8767) 2025-01-27 23:36:47 +09:00
nimlgen
dc10187fc0 am: add am_smi (#8739)
* am: start monitor

* cleanups

* fixes

* hmm

* progress

* cleanup
2025-01-24 20:16:19 +03:00
nimlgen
e4512baea4 am: cleanup mm (#8730)
* am: cleanup mm

* cle

* ops

* entries
2025-01-23 15:49:37 +03:00
nimlgen
93fb50ce77 allreduce: add flags (#8713) 2025-01-22 17:44:31 +03:00
qazal
2dae467b75 scheduler + process_replay import cleanup (#8711) 2025-01-22 12:44:07 +02:00
nimlgen
c5e46c5eee am: recover from any boot interrupt (#8703)
* am: recover from any load interrupt

* add fuzzer

* nu
2025-01-21 22:22:23 +03:00
geohotstan
dd82b4c913 make onnx runner a class (#8647)
* this

* clean up

* more clean ups and improve debug msg

* more correct training toggler

* remove manual training toggling

* change some variable names

* actually just add the training toggle for LIMIT envvar too

* more refinement

* __call__ and OnnxRunner

* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later

* ahhhh found another mistake

* remove limit from __call__

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
nimlgen
08ca871d77 am: remove pm block (#8688)
* am: remove pm block

* hm

* oops
2025-01-20 18:05:22 +03:00
nimlgen
9d3c40601f am: fast memory manager (#8654)
* start

* progress

* fixes

* smth

* mini fixes

* fix2

* ugh, need this for now

* faster

* cleanups

* tiny linters

* make mypy happier

* test & free pts

* ops

* linter

* cleanup vm

* fix

* remove map_from

* tiny fixes

* add test to ci
2025-01-20 16:58:22 +03:00
George Hotz
98d01a059d rename uopgraph to rewriter [pr] (#8682) 2025-01-19 17:03:12 -08:00
eliotgolding
0289fbb1c2 limit real_size to the size of first View of ShapeTracker (#8628)
* fix real_size

* add fuzzer; typing

* spacing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-16 16:27:39 -05:00
chenyu
52e7003414 Revert "make kits19 dataset samples have small sizes (#8591)" (#8610)
This reverts commit 76a03e950a.
2025-01-14 12:24:27 -05:00
Francis Lata
76a03e950a make kits19 dataset samples have small sizes (#8591) 2025-01-14 08:27:45 -08:00
qazal
863abc7140 scheduling graph_rewrite prereqs for BLOCK in ASSIGN (#8598)
* remove the BUF_LIMIT assert

* skip the base one

* work

* work

* good error

* ok comment

* shorter check
2025-01-14 03:01:59 -05:00
qazal
ae2229d727 assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert

* skip the base one
2025-01-13 16:32:07 -05:00
geohotstan
4abe631b56 fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?

* move test up

* share function

* add qgemm too

* make sure qgemm comes out as int

* actually that note is not right

* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
George Hotz
d19c1c7f03 bump 75 -> 73 for test failure 2025-01-13 09:18:38 -08:00
qazal
79738d768c do not require PYTHONPATH=. for process replay [pr] (#8567) 2025-01-11 09:45:34 -05:00