eliotgolding
867004fbeb
use unravel in views_to_indexed_uops [pr] ( #8560 )
...
* use unravel in shape
* make process replay work
* earlier View.minify()
* fix
* fix tests
* mypy
* get rid of early minify
* fix
* linter
* clean and add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
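A minimal pure-Python sketch of the unravel idea referenced in the commit above: turning a flat index into per-dimension coordinates for a given shape, with the same semantics as np.unravel_index (the names here are illustrative, not tinygrad's exact API).

    def unravel(shape, idx):
        coords = []
        for dim in reversed(shape):      # peel off the fastest-moving axis first
            coords.append(idx % dim)
            idx //= dim
        return tuple(reversed(coords))

    assert unravel((2, 3, 4), 11) == (0, 2, 3)   # 0*12 + 2*4 + 3 == 11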
nimlgen
38b5ac4d4a
mypy for mockgpu/cuda & dsp/run ( #8575 )
2025-01-12 18:25:39 +03:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] ( #8576 )
...
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
qazal
ae241e96db
fix half4 on qcom and gpu ( #8573 )
...
* add test_setitem_half
* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038
add SINK folding from the tensor_map branch [pr] ( #8562 )
...
* delete is_constant from the scheduler
* add sink folding
* always give BUFFER uops Buffers [pr]
* spec for view, var (bind) and const
* add test_buffer_only_after_realize
* work
* 3 lines
* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0
always give BUFFER uops Buffers [pr] ( #8572 )
...
* always give BUFFER uops Buffers [pr]
* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
98c9e23560
remove global PYTHONPATH setting in CI (test.yml) [pr] ( #8568 )
...
* remove global PYTHONPATH setting in CI [pr]
* only run mypy in tinygrad/
* still needed for benchmarks
2025-01-11 12:47:50 -05:00
geohotstan
815c505e1d
fixes from adapting tvm tests ( #8570 )
2025-01-11 11:38:36 -05:00
qazal
79738d768c
do not require PYTHONPATH=. for process replay [pr] ( #8567 )
2025-01-11 09:45:34 -05:00
qazal
a70d1bf439
move print_diff to process replay [pr] ( #8566 )
...
* move print_diff to process replay [pr]
* ruff rightfully complains
2025-01-11 09:28:45 -05:00
nimlgen
2f0856c1e2
qcom: use hwinterface ( #8565 )
...
* qcom: use hwinterface
* ops
* not needed anymore
2025-01-11 17:11:23 +03:00
qazal
60503c8621
use CAPTURE_PROCESS_REPLAY=1 in CI [pr] ( #8564 )
2025-01-11 06:03:48 -05:00
nimlgen
61665a63c9
am logs to debug2 ( #8563 )
2025-01-11 13:33:18 +03:00
George Hotz
c7acd40574
more aggressive onnx const creation [pr] ( #8561 )
2025-01-10 17:38:32 -08:00
ignaciosica
8891495996
minor arg spec check on wmma ( #8525 )
2025-01-10 15:42:56 -08:00
chenyu
d09897c2aa
allow double copy [pr] ( #8559 )
...
fixed the ring allreduce pattern and recovered most of the bert step time regression (10% faster); will double-check all benchmarks
2025-01-10 18:21:01 -05:00
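For context on the ring allreduce pattern mentioned above, here is a pure-Python simulation of the standard reduce-scatter + allgather schedule over N ranks; it is illustrative only, not the tinygrad copy path touched by this commit.

    def ring_allreduce(data):  # data[rank] = list of N chunks (numbers)
        n = len(data)
        # reduce-scatter: after n-1 steps rank r owns the fully reduced chunk (r+1) % n
        for step in range(n - 1):
            sends = [(r, (r - step) % n, data[r][(r - step) % n]) for r in range(n)]
            for src, chunk, val in sends:
                data[(src + 1) % n][chunk] += val
        # allgather: circulate each fully reduced chunk once around the ring
        for step in range(n - 1):
            sends = [(r, (r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
            for src, chunk, val in sends:
                data[(src + 1) % n][chunk] = val
        return data

    ranks = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]])
    assert all(r == [1111, 2222, 3333, 4444] for r in ranks)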
George Hotz
70fa65cd95
viz fixups + scheduler option [pr] ( #8557 )
2025-01-10 15:09:31 -08:00
nimlgen
f457cb64d6
am: do not reload fw each run ( #8466 )
...
* am do not reload fw each run
* works
* comment this
* clean + comment
* warn message
* linter
* move out pci en master
* useless
* more correct
* oops
* oops
2025-01-10 23:33:38 +03:00
nimlgen
337328e409
am: fini gpu after use ( #8556 )
...
* am: fini gpu after use
* mypy
2025-01-10 21:02:34 +03:00
chenyu
6a7f971fa0
hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] ( #8553 )
2025-01-10 12:57:44 -05:00
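A minimal sketch of why the hotfix above is needed, using a hypothetical IntVar wrapper standing in for an env-var holder like DEBUG (the class is made up for illustration): max over the wrapper can return the wrapper object itself rather than an int, so the comparison has to be done on .value.

    class IntVar:
        def __init__(self, value): self.value = value
        def __gt__(self, other): return self.value > int(other)
        def __lt__(self, other): return self.value < int(other)

    DEBUG = IntVar(4)
    assert isinstance(max(DEBUG, 2), IntVar)   # returns the wrapper, not 4
    assert max(DEBUG.value, 2) == 4            # the hotfix: compare the plain int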
George Hotz
cd4edc5206
hotfix: pylint ignores runtime for speed
2025-01-10 09:07:18 -08:00
nimlgen
92b59c9b7a
test_hcq limits for mockgpu not (only) ci ( #8555 )
...
* test_hcq limits for mockgpu not (only) ci
* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] ( #8552 )
...
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working ( #8550 )
2025-01-09 18:40:08 -08:00
chenyu
2cbb34535c
simpler allreduce script [pr] ( #8551 )
...
time everything at the tensor level and get the time from GlobalCounters.time_sum_s
2025-01-09 21:38:13 -05:00
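A minimal sketch of the timing approach this commit describes: run the op at the tensor level, then read the accumulated kernel time from GlobalCounters.time_sum_s. Kernel timings are only collected when DEBUG >= 2, so the Context below is an assumption about how the script runs; this is not the benchmark script itself.

    from tinygrad import Tensor
    from tinygrad.helpers import Context, GlobalCounters

    with Context(DEBUG=2):
        GlobalCounters.reset()
        out = (Tensor.rand(1024, 1024) @ Tensor.rand(1024, 1024)).realize()
        print(f"kernel time: {GlobalCounters.time_sum_s * 1e3:.3f} ms")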
chenyu
23c56817d8
update and clean up allreduce script [pr] ( #8549 )
...
make `run` able to run with ring only
2025-01-09 19:35:28 -05:00
George Hotz
5720871903
onnx consts are const [pr] ( #8548 )
2025-01-09 16:09:22 -08:00
chenyu
88661cd96f
fix checking DiskBuffer is opened [pr] ( #8547 )
...
`assert self.device.mem is not None` never triggered because accessing `.mem` raises AttributeError first
2025-01-09 18:58:36 -05:00
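A small self-contained sketch of the bug described above (the class is hypothetical, not tinygrad's DiskBuffer): when the attribute only exists after the device is opened, the attribute lookup inside the assert raises AttributeError before the `is not None` check can ever fail, so the assert never does its job.

    class DiskDevice:
        def open(self):
            self.mem = bytearray(16)       # .mem only exists after open()

    dev = DiskDevice()                     # not opened yet
    try:
        assert dev.mem is not None         # never reaches AssertionError...
    except AttributeError:
        pass                               # ...because looking up .mem raises first
    assert not hasattr(dev, "mem")         # a hasattr-style check fires as intended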
George Hotz
62447c253d
viz cleanups [pr] ( #8498 )
...
* viz cleanups [pr]
* Update serve.py
2025-01-09 15:46:48 -08:00
geohotstan
299d333806
Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx ( #8478 )
...
* QLinearEverything
* ok ort verify passes
* this should be int instead
* cast to int then char to do wraparound
* cleaner
* move contrib ops to microsoft ops
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
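The "cast to int then char to do wraparound" step in the commit body refers to the usual quantization trick of accumulating in a wide integer and then narrowing to int8 so out-of-range values wrap modulo 256; a small pure-Python illustration (not the onnx QLinear* implementation itself):

    def wrap_to_int8(x):
        return ((x + 128) % 256) - 128     # two's-complement wrap into [-128, 127]

    assert wrap_to_int8(130) == -126
    assert wrap_to_int8(-200) == 56
    assert wrap_to_int8(300) == 44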
qazal
2fd068ffc0
delete empty op ( #8544 )
...
* simple delete EMPTY op
* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2
start tests for putting the tensor graph in a single kernel [pr] ( #8542 )
...
* start tests for putting the tensor graph in a single kernel [pr]
* parallel actually
* better view_left test
* test a softmax
* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
83a8217cbf
hotfix: TRACK_MATCH_STATS=2 should not launch viz [pr] ( #8543 )
2025-01-09 11:10:15 -05:00
qazal
1efb1188d8
support pickling a realized BUFFER uop [pr] ( #8541 )
...
* try 2 at this diff
* process replay
* delete uops from buffer
* free buffers
* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
qazal
7595352dfc
refactor buffer_view op structure [pr] ( #8540 )
...
* refactor buffer_view op [pr]
* only empty now
* same st
* empty shape is fine
2025-01-09 03:07:46 -05:00
eliotgolding
4c5c32ff5f
Small bug in _reshape_mask ( #8538 )
2025-01-08 22:11:24 -05:00
nimlgen
aa3d612df2
add script to install amd mockgpu on macOS ( #8536 )
...
* upload artifact every time
* hm
* sh script
* hm
* hm2
* hm2
* hm2
* no sudo
* def paths
* small comments
* text
* try auth for bigger limits
2025-01-09 01:29:25 +03:00
nimlgen
31fcfe764d
adjust hcq test for ci macos ( #8534 )
2025-01-08 16:18:31 +03:00
qazal
49abe6d3a6
little more compact tensor_uop_spec [pr] ( #8533 )
...
* little more compact tensor_uop_spec [pr]
* space
* fix
2025-01-08 08:01:53 -05:00
patrini32
21c7d7c71a
MOCKGPU amd test on OSX ( #8505 )
...
* add tests
* Refactor
* cache only amd/comgr/build (saves a lot of space)
* fix
* silence warning and add check for cache hit before installing cmake
* run only pytest
* use actions/cache
* lower timeout-minutes and add Device.DEFAULT step
* add nvidia to Device.DEFAULT check
* typo
* fix
* Check only for amd and run only 2 test
2025-01-08 14:27:56 +03:00
nimlgen
2f530adb04
hwiface: close fd when valid ( #8530 )
2025-01-08 10:43:59 +03:00
qazal
947de23cac
add VIEW(DEVICE) to tensor variable [pr] ( #8529 )
...
* add VIEW(DEVICE) to tensor variable [pr]
* bind 2
* restrict shapetracker
* move var and bind closer
* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710
restrict tensor const ShapeTracker in spec [pr] ( #8447 )
...
* restrict tensor const ShapeTracker in spec [pr]
* pass sink srcs
* reject if any of the specs disagree
* deceive mypy
* viz
* default to float
* just check the view
* create_schedule is gone
* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
patrini32
afef69a37d
MOCKGPU on mac os ( #8520 )
...
* tweaks for macos
* fix
* fix
* typo
* remove nvidia changes
* remove nv related changes
* change address back
2025-01-07 20:27:43 +03:00
nimlgen
ab3ac2b58d
hw interface abstraction ( #8524 )
...
* use HWInterface in autogen
* mockgpu
* HWInterface
* more HWInterface
* fix
* fix
* old code
* fix
* implicit field definition
* add offset check to mockgpu too
* refactor
* forgot to pass flags + read rewrite
* test
* play with vfio
* nv: this should be kept
* try this
* vfio
* rm overwrite=True
* linter
* do not reinit kfd
* minor
* mypy
* mock
* init them once
---------
Co-authored-by: patrini32 <patrini23@proton.me>
2025-01-07 18:18:28 +03:00
qazal
0e97f807e0
test fixup prereqs for delete_buffer_view [pr] ( #8523 )
2025-01-07 11:52:18 +02:00
chenyu
85a4397f27
fix create_schedule_with_vars usage in allreduce benchmark [pr] ( #8522 )
...
* fix create_schedule_with_vars usage in allreduce benchmark [pr]
because I didn't know how to use it...
* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447
fix benchmark allreduce and add to ci [pr] ( #8521 )
2025-01-07 00:37:59 -05:00
geohotstan
c69f459c96
Add checking variable dimension to onnx ( #8518 )
...
* validate variable dims and fix buffer_parse to not use numpy
* fix var_dim parsing
* gah float16
* revert buffer_parse stuff
* revert that revert
* correct some error messages
* add some more debug msgs I find helpful
* tensor init noop
* add an assert just for the sake of it.
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-07 00:30:35 -05:00
nimlgen
5cb9443ebb
PROFILE is enabled when VIZ is enabled ( #8516 )
2025-01-06 19:47:16 +03:00