7491 Commits

Author SHA1 Message Date
George Hotz
df59b072db rename to top_down_rewrite [pr] (#8583) 2025-01-12 18:36:38 -08:00
chenyu
994944920b simpler batch_load_train_bert [pr] (#8582)
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
05e5de6a91 ugh, remove that binary blob 2025-01-12 17:02:28 -08:00
George Hotz
4ac4c1415a free intermediate buffers in the jit [pr] (#8581)
* free intermediate buffers in the jit [pr]

* intermediates_freed

* deallocate if not allocated

* self._first_run is simpler
2025-01-12 15:41:41 -08:00
George Hotz
d817dc10db start on test rewrite map [pr] (#8432)
* start on test rewrite map [pr]

* chatgpt writes dumb tests

* comment out failing

* fix that test

* fix gc issue

* oh, frame 2

* remove uop mutability

* map is only the map

* simpler + more tests

* test tiny passes

* tests that need to pass

* parent test passes

* child test passes

* remove uop mutability [pr]

* test fixups

* most tests pass

* more tests pass

* lil test fixups

* them too

* fix test

* unneeded

* err, that

* fix test_hcq

* fix test failures

* fix that test

* tensor universe

* does this pass test

* Revert "does this pass test"

This reverts commit ed516b3169.

* Revert "tensor universe"

This reverts commit c21301852a.

* test_mutate_add passes

* this can pass

* Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map"

This reverts commit 657822dcdc, reversing
changes made to 2a126c145b.

* Revert "test_mutate_add passes"

This reverts commit ab4fc4c78e.

* correct enough

* remove test_rewrite_map_schedule.py

* viz

* uops are immutable

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-12 13:13:51 -05:00
qazal
2f71a00236 remove PYTHONPATH=. from mypy ci [pr] (#8578) 2025-01-12 09:52:03 -08:00
qazal
cde18fddce fix DEBUG=2 output for copy runners [pr] (#8579)
* fix DEBUG=2 output for copy runners [pr]

* itemsize is constant
2025-01-12 12:03:01 -05:00
eliotgolding
867004fbeb use unravel in views_to_indexed_uops [pr] (#8560)
* use unravel in shape

* make process replay work

* earlier View.minify()

* fix

* fix tests

* mypy

* get rid of early minify

* fix

* linter

* clean and add test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
nimlgen
38b5ac4d4a mypy for mockgpu/cuda & dsp/run (#8575) 2025-01-12 18:25:39 +03:00
chenyu
def90b22f6 EVAL_BS=36 for bert [pr] (#8576)
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
qazal
ae241e96db fix half4 on qcom and gpu (#8573)
* add test_setitem_half

* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038 add SINK folding from the tensor_map branch [pr] (#8562)
* delete is_constant from the scheduler

* add sink folding

* always give BUFFER uops Buffers [pr]

* spec for view, var (bind) and const

* add test_buffer_only_after_realize

* work

* 3 lines

* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0 always give BUFFER uops Buffers [pr] (#8572)
* always give BUFFER uops Buffers [pr]

* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
98c9e23560 remove global PYTHONPATH setting in CI (test.yml) [pr] (#8568)
* remove global PYTHONPATH setting in CI [pr]

* only run mypy in tinygrad/

* still needed for benchmarks
2025-01-11 12:47:50 -05:00
geohotstan
815c505e1d fixes from adapting tvm tests (#8570) 2025-01-11 11:38:36 -05:00
qazal
79738d768c do not require PYTHONPATH=. for process replay [pr] (#8567) 2025-01-11 09:45:34 -05:00
qazal
a70d1bf439 move print_diff to process replay [pr] (#8566)
* move print_diff to process replay [pr]

* ruff rightfully complains
2025-01-11 09:28:45 -05:00
nimlgen
2f0856c1e2 qcom: use hwinterface (#8565)
* qcom: use hwinterface

* ops

* not needed anymore
2025-01-11 17:11:23 +03:00
qazal
60503c8621 use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564) 2025-01-11 06:03:48 -05:00
nimlgen
61665a63c9 am logs to debug2 (#8563) 2025-01-11 13:33:18 +03:00
George Hotz
c7acd40574 more aggressive onnx const creation [pr] (#8561) 2025-01-10 17:38:32 -08:00
ignaciosica
8891495996 minor arg spec check on wmma (#8525) 2025-01-10 15:42:56 -08:00
chenyu
d09897c2aa allow double copy [pr] (#8559)
fixed ring allreduce pattern and recovered most of the bert step time regression (10% faster), will double check all benchmarks
2025-01-10 18:21:01 -05:00
George Hotz
70fa65cd95 viz fixups + scheduler option [pr] (#8557) 2025-01-10 15:09:31 -08:00
nimlgen
f457cb64d6 am: do not reload fw each run (#8466)
* am do not reload fw each run

* works

* comment this

* clean + comment

* warn message

* linter

* move out pci en master

* useless

* more correct

* oops

* oops
2025-01-10 23:33:38 +03:00
nimlgen
337328e409 am: fini gpu after use (#8556)
* am: fini gpu after use

* mypy
2025-01-10 21:02:34 +03:00
chenyu
6a7f971fa0 hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553) 2025-01-10 12:57:44 -05:00
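One plausible reading of the hotfix in 6a7f971fa0, sketched under assumptions: `max()` returns one of its arguments unchanged, so passing a comparable wrapper object can hand back the wrapper itself rather than a plain integer. The `Var` class below is hypothetical and only stands in for an object with a `.value` attribute; it is not the project's actual `DEBUG` implementation.

```python
class Var:
    """Hypothetical wrapper around an int: comparable, but not itself an int."""
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return self.value < other
    def __gt__(self, other):
        return self.value > other

v = Var(3)
wrapped = max(v, 2)      # max() returns the winning argument as-is: the Var object
plain = max(v.value, 2)  # unwrapping first always yields a plain int
```

Unwrapping with `.value` before calling `max()` keeps the result type predictable regardless of which argument is larger.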
George Hotz
cd4edc5206 hotfix: pylint ignores runtime for speed 2025-01-10 09:07:18 -08:00
nimlgen
92b59c9b7a test_hcq limits for mockgpu not (only) ci (#8555)
* test_hcq limits for mockgpu not (only) ci

* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8 more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]

* working quantization

* static quant

* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0 more working (#8550) 2025-01-09 18:40:08 -08:00
chenyu
2cbb34535c simpler allreduce script [pr] (#8551)
time everything on tensor level and get time from GlobalCounters.time_sum_s
2025-01-09 21:38:13 -05:00
chenyu
23c56817d8 update and clean up allreduce script [pr] (#8549)
make `run` able to run with ring only
2025-01-09 19:35:28 -05:00
George Hotz
5720871903 onnx consts are const [pr] (#8548) 2025-01-09 16:09:22 -08:00
chenyu
88661cd96f fix checking DiskBuffer is opened [pr] (#8547)
`assert self.device.mem is not None` did not assert because `.mem` triggers AttributeError first
2025-01-09 18:58:36 -05:00
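The failure mode described in 88661cd96f is general Python behavior: the expression inside an `assert` is evaluated first, so reading an attribute that was never set raises `AttributeError` before the assertion can fail. A minimal sketch, using a hypothetical `FakeDevice` class rather than the real disk device, with `hasattr` as one possible guard (the commit's actual fix is not shown in this log):

```python
class FakeDevice:
    """Hypothetical stand-in: `mem` only exists after open() is called."""
    def open(self):
        self.mem = bytearray(16)

def check_broken(device):
    # Evaluating `device.mem` raises AttributeError before assert sees a value.
    assert device.mem is not None, "device is not opened"

def check_guarded(device):
    # hasattr() turns the missing attribute into a plain False.
    assert hasattr(device, "mem"), "device is not opened"

def error_name(check, device):
    """Return the name of the exception a check raises, or None if it passes."""
    try:
        check(device)
    except Exception as e:
        return type(e).__name__
    return None
```

Calling `error_name(check_broken, FakeDevice())` reports an `AttributeError`, while the guarded version reports the intended `AssertionError`.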
George Hotz
62447c253d viz cleanups [pr] (#8498)
* viz cleanups [pr]

* Update serve.py
2025-01-09 15:46:48 -08:00
geohotstan
299d333806 Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478)
* QLinearEverything

* ok ort verify passes

* this should be int instead

* cast to int then char to do wraparound

* cleaner

* move contrib ops to microsoft ops

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
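The "cast to int then char to do wraparound" step in 299d333806 appears to rely on out-of-range values wrapping into a narrow integer type instead of being clamped. A pure-Python sketch of that difference, assuming "char" means a signed 8-bit integer; the function names are illustrative and do not come from the ONNX code:

```python
def wrap_int8(x: int) -> int:
    """Reduce an integer to the signed 8-bit range [-128, 127] by wraparound."""
    return ((x + 128) % 256) - 128

def saturate_int8(x: int) -> int:
    """Clamp an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))

values = [127, 128, 130, -129, 256]
wrapped = [wrap_int8(v) for v in values]        # out-of-range values wrap around
saturated = [saturate_int8(v) for v in values]  # out-of-range values stick to the limits
```

Values already inside the range are unchanged by both functions; they differ only once the input leaves [-128, 127].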
qazal
2fd068ffc0 delete empty op (#8544)
* simple delete EMPTY op

* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2 start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]

* parallel actually

* better view_left test

* test a softmax

* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
83a8217cbf hotfix: TRACK_MATCH_STATS=2 should not launch viz [pr] (#8543) 2025-01-09 11:10:15 -05:00
qazal
1efb1188d8 support pickling a realized BUFFER uop [pr] (#8541)
* try 2 at this diff

* process replay

* delete uops from buffer

* free buffers

* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
qazal
7595352dfc refactor buffer_view op structure [pr] (#8540)
* refactor buffer_view op [pr]

* only empty now

* same st

* empty shape is fine
2025-01-09 03:07:46 -05:00
eliotgolding
4c5c32ff5f Small bug in _reshape_mask (#8538) 2025-01-08 22:11:24 -05:00
nimlgen
aa3d612df2 add script to install amd mockgpu on macOS (#8536)
* upload artifact every time

* hm

* sh script

* hm

* hm2

* hm2

* hm2

* no sudo

* def paths

* small comments

* text

* try auth for bigger limits
2025-01-09 01:29:25 +03:00
nimlgen
31fcfe764d adjust hcq test for ci macos (#8534) 2025-01-08 16:18:31 +03:00
qazal
49abe6d3a6 little more compact tensor_uop_spec [pr] (#8533)
* little more compact tensor_uop_spec [pr]

* space

* fix
2025-01-08 08:01:53 -05:00
patrini32
21c7d7c71a MOCKGPU amd test on OSX (#8505)
* add tests

* Refactor

* cache only amd/comgr/build (saves a lot of space)

* fix

* silence warning and add check for cache hit before installing cmake

* run only pytest

* use actions/cache

* lower timeout-minutes and add Device.DEFAULT step

* add nvidia to Device.DEFAULT check

* typo

* fix

* Check only for amd and run only 2 tests
2025-01-08 14:27:56 +03:00
nimlgen
2f530adb04 hwiface: close fd when valid (#8530) 2025-01-08 10:43:59 +03:00
qazal
947de23cac add VIEW(DEVICE) to tensor variable [pr] (#8529)
* add VIEW(DEVICE) to tensor variable [pr]

* bind 2

* restrict shapetracker

* move var and bind closer

* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710 restrict tensor const ShapeTracker in spec [pr] (#8447)
* restrict tensor const ShapeTracker in spec [pr]

* pass sink srcs

* reject if any of the specs disagree

* deceive mypy

* viz

* default to float

* just check the view

* create_schedule is gone

* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00