nimlgen
d224d0ed7f
nv: fix fault info (#8587)
* nv: fix fault info
* and emu for amd
* skip if not mock
2025-01-13 14:38:43 +03:00
qazal
586e730d32
use UOp.st for kernel reduce axes (#8499)
* use UOp.st for kernel reduce axes [pr]
* do not return dict
2025-01-13 06:24:11 -05:00
qazal
7562cc0399
better test for reduce swizzle + don't use double dtype [pr] (#8586)
* better test_permute_rewrite
* use float32
2025-01-13 05:02:21 -05:00
George Hotz
df59b072db
rename to top_down_rewrite [pr] (#8583)
2025-01-12 18:36:38 -08:00
chenyu
994944920b
simpler batch_load_train_bert [pr] (#8582)
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
05e5de6a91
ugh, remove that binary blob
2025-01-12 17:02:28 -08:00
George Hotz
4ac4c1415a
free intermediate buffers in the jit [pr] (#8581)
* free intermediate buffers in the jit [pr]
* intermediates_freed
* deallocate if not allocated
* self._first_run is simpler
2025-01-12 15:41:41 -08:00
George Hotz
d817dc10db
start on test rewrite map [pr] (#8432)
* start on test rewrite map [pr]
* chatgpt writes dumb tests
* comment out failing
* fix that test
* fix gc issue
* oh, frame 2
* remove uop mutability
* map is only the map
* simpler + more tests
* test tiny passes
* tests that need to pass
* parent test passes
* child test passes
* remove uop mutability [pr]
* test fixups
* most tests pass
* more tests pass
* lil test fixups
* them too
* fix test
* unneeded
* err, that
* fix test_hcq
* fix test failures
* fix that test
* tensor universe
* does this pass test
* Revert "does this pass test"
This reverts commit ed516b3169.
* Revert "tensor universe"
This reverts commit c21301852a.
* test_mutate_add passes
* this can pass
* Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map"
This reverts commit 657822dcdc, reversing
changes made to 2a126c145b.
* Revert "test_mutate_add passes"
This reverts commit ab4fc4c78e.
* correct enough
* remove test_rewrite_map_schedule.py
* viz
* uops are immutable
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-12 13:13:51 -05:00
qazal
2f71a00236
remove PYTHONPATH=. from mypy ci [pr] (#8578)
2025-01-12 09:52:03 -08:00
qazal
cde18fddce
fix DEBUG=2 output for copy runners [pr] (#8579)
* fix DEBUG=2 output for copy runners [pr]
* itemsize is constant
2025-01-12 12:03:01 -05:00
eliotgolding
867004fbeb
use unravel in views_to_indexed_uops [pr] (#8560)
* use unravel in shape
* make process replay work
* earlier View.minify()
* fix
* fix tests
* mypy
* get rid of early minify
* fix
* linter
* clean and add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
nimlgen
38b5ac4d4a
mypy for mockgpu/cuda & dsp/run (#8575)
2025-01-12 18:25:39 +03:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] (#8576)
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
qazal
ae241e96db
fix half4 on qcom and gpu (#8573)
* add test_setitem_half
* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038
add SINK folding from the tensor_map branch [pr] (#8562)
* delete is_constant from the scheduler
* add sink folding
* always give BUFFER uops Buffers [pr]
* spec for view, var (bind) and const
* add test_buffer_only_after_realize
* work
* 3 lines
* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0
always give BUFFER uops Buffers [pr] (#8572)
* always give BUFFER uops Buffers [pr]
* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
98c9e23560
remove global PYTHONPATH setting in CI (test.yml) [pr] (#8568)
* remove global PYTHONPATH setting in CI [pr]
* only run mypy in tinygrad/
* still needed for benchmarks
2025-01-11 12:47:50 -05:00
geohotstan
815c505e1d
fixes from adapting tvm tests (#8570)
2025-01-11 11:38:36 -05:00
qazal
79738d768c
do not require PYTHONPATH=. for process replay [pr] (#8567)
2025-01-11 09:45:34 -05:00
qazal
a70d1bf439
move print_diff to process replay [pr] (#8566)
* move print_diff to process replay [pr]
* ruff rightfully complains
2025-01-11 09:28:45 -05:00
nimlgen
2f0856c1e2
qcom: use hwinterface (#8565)
* qcom: use hwinterface
* ops
* not needed anymore
2025-01-11 17:11:23 +03:00
qazal
60503c8621
use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564)
2025-01-11 06:03:48 -05:00
nimlgen
61665a63c9
am logs to debug2 (#8563)
2025-01-11 13:33:18 +03:00
George Hotz
c7acd40574
more aggressive onnx const creation [pr] (#8561)
2025-01-10 17:38:32 -08:00
ignaciosica
8891495996
minor arg spec check on wmma (#8525)
2025-01-10 15:42:56 -08:00
chenyu
d09897c2aa
allow double copy [pr] (#8559)
fixed ring allreduce pattern and recovered most of the bert step time regression (10% faster), will double check all benchmarks
2025-01-10 18:21:01 -05:00
George Hotz
70fa65cd95
viz fixups + scheduler option [pr] (#8557)
2025-01-10 15:09:31 -08:00
nimlgen
f457cb64d6
am: do not reload fw each run (#8466)
* am do not reload fw each run
* works
* comment this
* clean + comment
* warn message
* linter
* move out pci en master
* useless
* more correct
* oops
* oops
2025-01-10 23:33:38 +03:00
nimlgen
337328e409
am: fini gpu after use (#8556)
* am: fini gpu after use
* mypy
2025-01-10 21:02:34 +03:00
chenyu
6a7f971fa0
hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553)
2025-01-10 12:57:44 -05:00
George Hotz
cd4edc5206
hotfix: pylint ignores runtime for speed
2025-01-10 09:07:18 -08:00
nimlgen
92b59c9b7a
test_hcq limits for mockgpu not (only) ci (#8555)
* test_hcq limits for mockgpu not (only) ci
* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working (#8550)
2025-01-09 18:40:08 -08:00
chenyu
2cbb34535c
simpler allreduce script [pr] (#8551)
time everything on tensor level and get time from GlobalCounters.time_sum_s
2025-01-09 21:38:13 -05:00
chenyu
23c56817d8
update and clean up allreduce script [pr] (#8549)
make `run` able to run with ring only
2025-01-09 19:35:28 -05:00
George Hotz
5720871903
onnx consts are const [pr] (#8548)
2025-01-09 16:09:22 -08:00
chenyu
88661cd96f
fix checking DiskBuffer is opened [pr] (#8547)
`assert self.device.mem is not None` did not assert because `.mem` triggers AttributeError first
2025-01-09 18:58:36 -05:00
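The gotcha behind that fix is a general Python one: `assert x.attr is not None` never actually asserts if looking up `attr` itself raises. A minimal sketch of the failure mode (the `DiskDevice` class here is illustrative, not tinygrad's actual code):

```python
class DiskDevice:
    def __init__(self):
        # .mem is only assigned later, when the device is actually opened,
        # so an unopened device has no .mem attribute at all
        self.opened = False

dev = DiskDevice()
try:
    # intended as an "is the device open?" check, but the attribute
    # lookup raises before the assert condition can ever evaluate
    assert dev.mem is not None
    raised = None
except AttributeError:
    raised = "AttributeError"
except AssertionError:
    raised = "AssertionError"

print(raised)  # AttributeError
```

Checking `hasattr(dev, "mem")` (or initializing `self.mem = None` up front) makes the assert behave as intended.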
George Hotz
62447c253d
viz cleanups [pr] (#8498)
* viz cleanups [pr]
* Update serve.py
2025-01-09 15:46:48 -08:00
geohotstan
299d333806
Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478)
* QLinearEverything
* ok ort verify passes
* this should be int instead
* cast to int then char to do wraparound
* cleaner
* move contrib ops to microsoft ops
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
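The "cast to int then char" bullet above refers to the usual trick for emulating int8 wraparound in quantized ops: compute in a wide integer, then keep only the low byte reinterpreted as signed. A hedged pure-Python sketch of the idea (an illustration of the technique, not tinygrad's implementation):

```python
def wrap_to_int8(x: int) -> int:
    # keep the low 8 bits ("cast to char"), then reinterpret
    # them as a signed byte so values wrap mod 256
    b = x & 0xFF
    return b - 256 if b >= 128 else b

print(wrap_to_int8(300))   # 44   (300 - 256)
print(wrap_to_int8(-300))  # -44
print(wrap_to_int8(128))   # -128
```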
qazal
2fd068ffc0
delete empty op (#8544)
* simple delete EMPTY op
* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2
start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]
* parallel actually
* better view_left test
* test a softmax
* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
83a8217cbf
hotfix: TRACK_MATCH_STATS=2 should not launch viz [pr] (#8543)
2025-01-09 11:10:15 -05:00
qazal
1efb1188d8
support pickling a realized BUFFER uop [pr] (#8541)
* try 2 at this diff
* process replay
* delete uops from buffer
* free buffers
* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
qazal
7595352dfc
refactor buffer_view op structure [pr] (#8540)
* refactor buffer_view op [pr]
* only empty now
* same st
* empty shape is fine
2025-01-09 03:07:46 -05:00
eliotgolding
4c5c32ff5f
Small bug in _reshape_mask (#8538)
2025-01-08 22:11:24 -05:00
nimlgen
aa3d612df2
add script to install amd mockgpu on macOS (#8536)
* upload artifact every time
* hm
* sh script
* hm
* hm2
* hm2
* hm2
* no sudo
* def paths
* small comments
* text
* try auth for bigger limits
2025-01-09 01:29:25 +03:00
nimlgen
31fcfe764d
adjust hcq test for ci macos (#8534)
2025-01-08 16:18:31 +03:00
qazal
49abe6d3a6
little more compact tensor_uop_spec [pr] (#8533)
* little more compact tensor_uop_spec [pr]
* space
* fix
2025-01-08 08:01:53 -05:00
patrini32
21c7d7c71a
MOCKGPU amd test on OSX (#8505)
* add tests
* Refactor
* cache only amd/comgr/build (saves a lot of space)
* fix
* silence warning and add check for cache hit before installing cmake
* run only pytest
* use actions/cache
* lower timeout-minutes and add Device.DEFAULT step
* add nvidia to Device.DEFAULT check
* typo
* fix
* Check only for amd and run only 2 tests
2025-01-08 14:27:56 +03:00