nimlgen
c2504357af
am: lock to access dev ( #8594 )
...
* am: lock to access dev
* wording
* just works
* disable
2025-01-13 23:53:13 +03:00
geohotstan
4abe631b56
fix onnx mobilenetv2-7-quantized.onnx ( #8574 )
...
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
George Hotz
d19c1c7f03
bump 75 -> 73 for test failure
2025-01-13 09:18:38 -08:00
Francis Lata
c25d5d3101
improve isin checks ( #8589 )
2025-01-13 12:12:31 -05:00
nimlgen
74b83c4c41
am in ci ( #8532 )
...
* try am in ci
* no sudo
* temp
* run more am test
* run half on am
* insert amdgpu
* other machine as well
2025-01-13 19:55:17 +03:00
nimlgen
d224d0ed7f
nv: fix fault info ( #8587 )
...
* nv: fix fault info
* and emu for amd
* skip if not mock
2025-01-13 14:38:43 +03:00
qazal
586e730d32
use UOp.st for kernel reduce axes ( #8499 )
...
* use UOp.st for kernel reduce axes [pr]
* do not return dict
2025-01-13 06:24:11 -05:00
qazal
7562cc0399
better test for reduce swizzle + don't use double dtype [pr] ( #8586 )
...
* better test_permute_rewrite
* use float32
2025-01-13 05:02:21 -05:00
George Hotz
df59b072db
rename to top_down_rewrite [pr] ( #8583 )
2025-01-12 18:36:38 -08:00
chenyu
994944920b
simpler batch_load_train_bert [pr] ( #8582 )
...
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
05e5de6a91
ugh, remove that binary blob
2025-01-12 17:02:28 -08:00
George Hotz
4ac4c1415a
free intermediate buffers in the jit [pr] ( #8581 )
...
* free intermediate buffers in the jit [pr]
* intermediates_freed
* deallocate if not allocated
* self._first_run is simpler
2025-01-12 15:41:41 -08:00
George Hotz
d817dc10db
start on test rewrite map [pr] ( #8432 )
...
* start on test rewrite map [pr]
* chatgpt writes dumb tests
* comment out failing
* fix that test
* fix gc issue
* oh, frame 2
* remove uop mutability
* map is only the map
* simpler + more tests
* test tiny passes
* tests that need to pass
* parent test passes
* child test passes
* remove uop mutability [pr]
* test fixups
* most tests pass
* more tests pass
* lil test fixups
* them too
* fix test
* unneeded
* err, that
* fix test_hcq
* fix test failures
* fix that test
* tensor universe
* does this pass test
* Revert "does this pass test"
This reverts commit ed516b3169.
* Revert "tensor universe"
This reverts commit c21301852a.
* test_mutate_add passes
* this can pass
* Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map"
This reverts commit 657822dcdc, reversing
changes made to 2a126c145b.
* Revert "test_mutate_add passes"
This reverts commit ab4fc4c78e.
* correct enough
* remove test_rewrite_map_schedule.py
* viz
* uops are immutable
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-12 13:13:51 -05:00
qazal
2f71a00236
remove PYTHONPATH=. from mypy ci [pr] ( #8578 )
2025-01-12 09:52:03 -08:00
qazal
cde18fddce
fix DEBUG=2 output for copy runners [pr] ( #8579 )
...
* fix DEBUG=2 output for copy runners [pr]
* itemsize is constant
2025-01-12 12:03:01 -05:00
eliotgolding
867004fbeb
use unravel in views_to_indexed_uops [pr] ( #8560 )
...
* use unravel in shape
* make process replay work
* earlier View.minify()
* fix
* fix tests
* mypy
* get rid of early minify
* fix
* linter
* clean and add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
nimlgen
38b5ac4d4a
mypy for mockgpu/cuda & dsp/run ( #8575 )
2025-01-12 18:25:39 +03:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] ( #8576 )
...
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
qazal
ae241e96db
fix half4 on qcom and gpu ( #8573 )
...
* add test_setitem_half
* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038
add SINK folding from the tensor_map branch [pr] ( #8562 )
...
* delete is_constant from the scheduler
* add sink folding
* always give BUFFER uops Buffers [pr]
* spec for view, var (bind) and const
* add test_buffer_only_after_realize
* work
* 3 lines
* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0
always give BUFFER uops Buffers [pr] ( #8572 )
...
* always give BUFFER uops Buffers [pr]
* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
98c9e23560
remove global PYTHONPATH setting in CI (test.yml) [pr] ( #8568 )
...
* remove global PYTHONPATH setting in CI [pr]
* only run mypy in tinygrad/
* still needed for benchmarks
2025-01-11 12:47:50 -05:00
geohotstan
815c505e1d
fixes from adapting tvm tests ( #8570 )
2025-01-11 11:38:36 -05:00
qazal
79738d768c
do not require PYTHONPATH=. for process replay [pr] ( #8567 )
2025-01-11 09:45:34 -05:00
qazal
a70d1bf439
move print_diff to process replay [pr] ( #8566 )
...
* move print_diff to process replay [pr]
* ruff rightfully complains
2025-01-11 09:28:45 -05:00
nimlgen
2f0856c1e2
qcom: use hwinterface ( #8565 )
...
* qcom: use hwinterface
* ops
* not needed anymore
2025-01-11 17:11:23 +03:00
qazal
60503c8621
use CAPTURE_PROCESS_REPLAY=1 in CI [pr] ( #8564 )
2025-01-11 06:03:48 -05:00
nimlgen
61665a63c9
am logs to debug2 ( #8563 )
2025-01-11 13:33:18 +03:00
George Hotz
c7acd40574
more aggressive onnx const creation [pr] ( #8561 )
2025-01-10 17:38:32 -08:00
ignaciosica
8891495996
minor arg spec check on wmma ( #8525 )
2025-01-10 15:42:56 -08:00
chenyu
d09897c2aa
allow double copy [pr] ( #8559 )
...
fixed the ring allreduce pattern and recovered most of the bert step time regression (10% faster), will double check all benchmarks
2025-01-10 18:21:01 -05:00
George Hotz
70fa65cd95
viz fixups + scheduler option [pr] ( #8557 )
2025-01-10 15:09:31 -08:00
nimlgen
f457cb64d6
am: do not reload fw each run ( #8466 )
...
* am do not reload fw each run
* works
* comment this
* clean + comment
* warn message
* linter
* move out pci en master
* useless
* more correct
* oops
* oops
2025-01-10 23:33:38 +03:00
nimlgen
337328e409
am: fini gpu after use ( #8556 )
...
* am: fini gpu after use
* mypy
2025-01-10 21:02:34 +03:00
chenyu
6a7f971fa0
hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] ( #8553 )
2025-01-10 12:57:44 -05:00
George Hotz
cd4edc5206
hotfix: pylint ignores runtime for speed
2025-01-10 09:07:18 -08:00
nimlgen
92b59c9b7a
test_hcq limits for mockgpu not (only) ci ( #8555 )
...
* test_hcq limits for mockgpu not (only) ci
* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] ( #8552 )
...
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working ( #8550 )
2025-01-09 18:40:08 -08:00
chenyu
2cbb34535c
simpler allreduce script [pr] ( #8551 )
...
time everything on tensor level and get time from GlobalCounters.time_sum_s
2025-01-09 21:38:13 -05:00
chenyu
23c56817d8
update and clean up allreduce script [pr] ( #8549 )
...
make `run` able to run with ring only
2025-01-09 19:35:28 -05:00
George Hotz
5720871903
onnx consts are const [pr] ( #8548 )
2025-01-09 16:09:22 -08:00
chenyu
88661cd96f
fix checking DiskBuffer is opened [pr] ( #8547 )
...
`assert self.device.mem is not None` did not assert because `.mem` triggers AttributeError first
2025-01-09 18:58:36 -05:00
George Hotz
62447c253d
viz cleanups [pr] ( #8498 )
...
* viz cleanups [pr]
* Update serve.py
2025-01-09 15:46:48 -08:00
geohotstan
299d333806
Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx ( #8478 )
...
* QLinearEverything
* ok ort verify passes
* this should be int instead
* cast to int then char to do wraparound
* cleaner
* move contrib ops to microsoft ops
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
qazal
2fd068ffc0
delete empty op ( #8544 )
...
* simple delete EMPTY op
* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2
start tests for putting the tensor graph in a single kernel [pr] ( #8542 )
...
* start tests for putting the tensor graph in a single kernel [pr]
* parallel actually
* better view_left test
* test a softmax
* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
83a8217cbf
hotfix: TRACK_MATCH_STATS=2 should not launch viz [pr] ( #8543 )
2025-01-09 11:10:15 -05:00
qazal
1efb1188d8
support pickling a realized BUFFER uop [pr] ( #8541 )
...
* try 2 at this diff
* process replay
* delete uops from buffer
* free buffers
* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
qazal
7595352dfc
refactor buffer_view op structure [pr] ( #8540 )
...
* refactor buffer_view op [pr]
* only empty now
* same st
* empty shape is fine
2025-01-09 03:07:46 -05:00