Commit Graph

3206 Commits

Author SHA1 Message Date
Francis Lata
76a03e950a make kits19 dataset samples have small sizes (#8591) 2025-01-14 08:27:45 -08:00
qazal
5aab2806f0 rename to test_tensor_uop + use upats for asserting [pr] (#8604)
* rename to test_tensor_uop + use upats for asserting [pr]

* fix pr
2025-01-14 05:09:56 -05:00
qazal
863abc7140 scheduling graph_rewrite prereqs for BLOCK in ASSIGN (#8598)
* remove the BUF_LIMIT assert

* skip the base one

* work

* work

* good error

* ok comment

* shorter check
2025-01-14 03:01:59 -05:00
chenyu
d443e91d82 remove custom splits in Tensor.shard [pr] (#8602)
towards even split only
2025-01-13 21:29:13 -05:00
chenyu
c4e33048c6 test Tensor.clone has a different lazydata [pr] (#8600) 2025-01-13 20:13:44 -05:00
qazal
ae2229d727 assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert

* skip the base one
2025-01-13 16:32:07 -05:00
geohotstan
4abe631b56 fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?

* move test up

* share function

* add qgemm too

* make sure qgemm comes out as int

* actually that note is not right

* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
George Hotz
d19c1c7f03 bump 75 -> 73 for test failure 2025-01-13 09:18:38 -08:00
nimlgen
d224d0ed7f nv: fix fault info (#8587)
* nv: fix fault info

* and emu for amd

* skip if not mock
2025-01-13 14:38:43 +03:00
qazal
586e730d32 use UOp.st for kernel reduce axes (#8499)
* use UOp.st for kernel reduce axes [pr]

* do not return dict
2025-01-13 06:24:11 -05:00
qazal
7562cc0399 better test for reduce swizzle + don't use double dtype [pr] (#8586)
* better test_permute_rewrite

* use float32
2025-01-13 05:02:21 -05:00
George Hotz
4ac4c1415a free intermediate buffers in the jit [pr] (#8581)
* free intermediate buffers in the jit [pr]

* intermediates_freed

* deallocate if not allocated

* self._first_run is simpler
2025-01-12 15:41:41 -08:00
George Hotz
d817dc10db start on test rewrite map [pr] (#8432)
* start on test rewrite map [pr]

* chatgpt writes dumb tests

* comment out failing

* fix that test

* fix gc issue

* oh, frame 2

* remove uop mutability

* map is only the map

* simpler + more tests

* test tiny passes

* tests that need to pass

* parent test passes

* child test passes

* remove uop mutability [pr]

* test fixups

* most tests pass

* more tests pass

* lil test fixups

* them too

* fix test

* unneeded

* err, that

* fix test_hcq

* fix test failures

* fix that test

* tensor universe

* does this pass test

* Revert "does this pass test"

This reverts commit ed516b3169.

* Revert "tensor universe"

This reverts commit c21301852a.

* test_mutate_add passes

* this can pass

* Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map"

This reverts commit 657822dcdc, reversing
changes made to 2a126c145b.

* Revert "test_mutate_add passes"

This reverts commit ab4fc4c78e.

* correct enough

* remove test_rewrite_map_schedule.py

* viz

* uops are immutable

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-12 13:13:51 -05:00
qazal
cde18fddce fix DEBUG=2 output for copy runners [pr] (#8579)
* fix DEBUG=2 output for copy runners [pr]

* itemsize is constant
2025-01-12 12:03:01 -05:00
eliotgolding
867004fbeb use unravel in views_to_indexed_uops [pr] (#8560)
* use unravel in shape

* make process replay work

* earlier View.minify()

* fix

* fix tests

* mypy

* get rid of early minify

* fix

* linter

* clean and add test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
nimlgen
38b5ac4d4a mypy for mockgpu/cuda & dsp/run (#8575) 2025-01-12 18:25:39 +03:00
qazal
ae241e96db fix half4 on qcom and gpu (#8573)
* add test_setitem_half

* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038 add SINK folding from the tensor_map branch [pr] (#8562)
* delete is_constant from the scheduler

* add sink folding

* always give BUFFER uops Buffers [pr]

* spec for view, var (bind) and const

* add test_buffer_only_after_realize

* work

* 3 lines

* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0 always give BUFFER uops Buffers [pr] (#8572)
* always give BUFFER uops Buffers [pr]

* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
79738d768c do not require PYTHONPATH=. for process replay [pr] (#8567) 2025-01-11 09:45:34 -05:00
qazal
a70d1bf439 move print_diff to process replay [pr] (#8566)
* move print_diff to process replay [pr]

* ruff rightfully complains
2025-01-11 09:28:45 -05:00
qazal
60503c8621 use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564) 2025-01-11 06:03:48 -05:00
chenyu
d09897c2aa allow double copy [pr] (#8559)
fixed ring allreduce pattern and recovered most of the bert step time regression (10% faster), will double check all benchmarks
2025-01-10 18:21:01 -05:00
chenyu
6a7f971fa0 hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553) 2025-01-10 12:57:44 -05:00
nimlgen
92b59c9b7a test_hcq limits for mockgpu not (only) ci (#8555)
* test_hcq limits for mockgpu not (only) ci

* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8 more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]

* working quantization

* static quant

* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
chenyu
2cbb34535c simpler allreduce script [pr] (#8551)
time everything on tensor level and get time from GlobalCounters.time_sum_s
2025-01-09 21:38:13 -05:00
chenyu
23c56817d8 update and clean up allreduce script [pr] (#8549)
make `run` able to run with ring only
2025-01-09 19:35:28 -05:00
geohotstan
299d333806 Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478)
* QLinearEverything

* ok ort verify passes

* this should be int instead

* cast to int then char to do wraparound

* cleaner

* move contrib ops to microsoft ops

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
qazal
2fd068ffc0 delete empty op (#8544)
* simple delete EMPTY op

* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2 start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]

* parallel actually

* better view_left test

* test a softmax

* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
1efb1188d8 support pickling a realized BUFFER uop [pr] (#8541)
* try 2 at this diff

* process replay

* delete uops from buffer

* free buffers

* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
eliotgolding
4c5c32ff5f Small bug in _reshape_mask (#8538) 2025-01-08 22:11:24 -05:00
nimlgen
aa3d612df2 add script to install amd mockgpu on macOS (#8536)
* upload artifact every time

* hm

* sh script

* hm

* hm2

* hm2

* hm2

* no sudo

* def paths

* small comments

* text

* try auth for bigger limits
2025-01-09 01:29:25 +03:00
nimlgen
31fcfe764d adjust hcq test for ci macos (#8534) 2025-01-08 16:18:31 +03:00
qazal
947de23cac add VIEW(DEVICE) to tensor variable [pr] (#8529)
* add VIEW(DEVICE) to tensor variable [pr]

* bind 2

* restrict shapetracker

* move var and bind closer

* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710 restrict tensor const ShapeTracker in spec [pr] (#8447)
* restrict tensor const ShapeTracker in spec [pr]

* pass sink srcs

* reject if any of the specs disagree

* deceive mypy

* viz

* default to float

* just check the view

* create_schedule is gone

* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
patrini32
afef69a37d MOCKGPU on mac os (#8520)
* tweaks for macos

* fix

* fix

* typo

* remove nvidia changes

* remove nv related changes

* change address back
2025-01-07 20:27:43 +03:00
nimlgen
ab3ac2b58d hw interface abstraction (#8524)
* use HWInterface in autogen

* mockgpu

* HWInterface

* more HWInterface

* fix

* fix

* old code

* fix

* implicit field definition

* add offset check to mockgpu too

* refactor

* forgot to pass flags + read rewrite

* test

* play with vfio

* nv: this should be kept

* try this

* vfio

* rm overwrite=True

* linter

* do not reinit kfd

* minor

* mypy

* mock

* init them once

---------

Co-authored-by: patrini32 <patrini23@proton.me>
2025-01-07 18:18:28 +03:00
qazal
0e97f807e0 test fixup prereqs for delete_buffer_view [pr] (#8523) 2025-01-07 11:52:18 +02:00
chenyu
85a4397f27 fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]

because I didn't know how to use it...

* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447 fix benchmark allreduce and add to ci [pr] (#8521) 2025-01-07 00:37:59 -05:00
qazal
ed618a72e7 do not use subbuffer for bitcast (#8514)
* do not use subbuffer for bitcast

* edit that test

* explicit test for ptx

* ptx
2025-01-06 18:40:46 +02:00
qazal
547fd5078f cleanups for COPY uop implementation and spec [pr] (#8513) 2025-01-06 11:39:12 +02:00
qazal
ed121d235c spec for CAST_BEFORE_VIEW=1 [pr] (#8512) 2025-01-06 10:43:58 +02:00
qazal
eb7df92136 dedup COPY UOp [pr] (#8506) 2025-01-06 10:37:20 +02:00
geohotstan
9229867fec Support asymmetrical pads for all pooling functions (#8109)
* implemented in tensor

* apply onnx tests to asymmetrical pads

* better onnx op ordering

* correct ceil_mode asymmetrical

* fix onnx_ops comments

* a few more TODOs and fix some stupidity

* fix some typing

* fix test

* mypy still a little messed up

* refactor out pad struct transformation

* add simple docs for now

* add whatever tests possible

* add tests for _resolve_pool_pads

* better err msg

* whoops didn't mean to include this

* retry CI

* enable asymmetric pads onnx tests

* better docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-05 16:01:08 -05:00
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* fixes

* disable broken tests

* linter

* these fail as well

* pylint

* mypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
qazal
036efa9157 use UOp.substitute for VIZ=1 [pr] (#8497)
* use UOp.substitute for VIZ=1 [pr]

* more acceptable
2025-01-04 20:00:29 +02:00
geohotstan
3dfc8e1706 Share a _resolve_pool_pads function for pool ops in Tensor (#8485)
* _padding2d -> _resolve_pool_pads

* rephrase err msg

* even better error msg

* check asymmetric first so people don't hit error twice

* test against torch
2025-01-03 23:54:11 -05:00