Commit Graph

7464 Commits

Author SHA1 Message Date
George Hotz
cd4edc5206 hotfix: pylint ignores runtime for speed 2025-01-10 09:07:18 -08:00
nimlgen
92b59c9b7a test_hcq limits for mockgpu not (only) ci (#8555)
* test_hcq limits for mockgpu not (only) ci

* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8 more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]

* working quantization

* static quant

* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0 more working (#8550) 2025-01-09 18:40:08 -08:00
chenyu
2cbb34535c simpler allreduce script [pr] (#8551)
time everything at the tensor level and get the time from GlobalCounters.time_sum_s (a minimal sketch of this pattern follows this entry)
2025-01-09 21:38:13 -05:00
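A minimal sketch of that timing pattern, using an illustrative matmul workload rather than the benchmark's actual allreduce, and assuming kernel times are being recorded (e.g. with DEBUG=2):

```python
# Sketch only: diff GlobalCounters.time_sum_s around a realized tensor op.
# The matmul is a stand-in workload, not the allreduce from the script.
from tinygrad import Tensor
from tinygrad.helpers import GlobalCounters

start = GlobalCounters.time_sum_s
out = (Tensor.rand(1024, 1024) @ Tensor.rand(1024, 1024)).realize()
print(f"kernel time: {GlobalCounters.time_sum_s - start:.6f}s")
```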
chenyu
23c56817d8 update and clean up allreduce script [pr] (#8549)
make `run` able to run with ring only
2025-01-09 19:35:28 -05:00
George Hotz
5720871903 onnx consts are const [pr] (#8548) 2025-01-09 16:09:22 -08:00
chenyu
88661cd96f fix checking DiskBuffer is opened [pr] (#8547)
`assert self.device.mem is not None` never fired because accessing `.mem` raises AttributeError first (see the sketch after this entry)
2025-01-09 18:58:36 -05:00
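A minimal sketch of the pitfall, with hypothetical FakeDevice/FakeBuffer stand-ins for the real classes:

```python
# Hypothetical names; illustrates why the assert never fired:
# the attribute access raises AttributeError before the assert can evaluate.
class FakeDevice:
    pass  # .mem is only assigned once the device is actually opened

class FakeBuffer:
    def __init__(self, device):
        self.device = device

buf = FakeBuffer(FakeDevice())
try:
    assert buf.device.mem is not None  # AttributeError, never AssertionError
except AttributeError as e:
    print("assert never fired:", e)

# a hasattr check does fire before the device is opened
if not hasattr(buf.device, "mem"):
    print("device is not opened")
```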
George Hotz
62447c253d viz cleanups [pr] (#8498)
* viz cleanups [pr]

* Update serve.py
2025-01-09 15:46:48 -08:00
geohotstan
299d333806 Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478)
* QLinearEverything

* ok ort verify passes

* this should be int instead

* cast to int then char to do wraparound

* cleaner

* move contrib ops to microsoft ops

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
qazal
2fd068ffc0 delete empty op (#8544)
* simple delete EMPTY op

* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2 start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]

* parallel actually

* better view_left test

* test a softmax

* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
83a8217cbf hotfix: TRACK_MATCH_STATS=2 should not launch viz [pr] (#8543) 2025-01-09 11:10:15 -05:00
qazal
1efb1188d8 support pickling a realized BUFFER uop [pr] (#8541)
* try 2 at this diff

* process replay

* delete uops from buffer

* free buffers

* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
qazal
7595352dfc refactor buffer_view op structure [pr] (#8540)
* refactor buffer_view op [pr]

* only empty now

* same st

* empty shape is fine
2025-01-09 03:07:46 -05:00
eliotgolding
4c5c32ff5f Small bug in _reshape_mask (#8538) 2025-01-08 22:11:24 -05:00
nimlgen
aa3d612df2 add script to install amd mockgpu on macOS (#8536)
* upload artifact every time

* hm

* sh script

* hm

* hm2

* hm2

* hm2

* no sudo

* def paths

* small comments

* text

* try auth for bigger limits
2025-01-09 01:29:25 +03:00
nimlgen
31fcfe764d adjust hcq test for ci macos (#8534) 2025-01-08 16:18:31 +03:00
qazal
49abe6d3a6 little more compact tensor_uop_spec [pr] (#8533)
* little more compact tensor_uop_spec [pr]

* space

* fix
2025-01-08 08:01:53 -05:00
patrini32
21c7d7c71a MOCKGPU amd test on OSX (#8505)
* add tests

* Refactor

* cache only amd/comgr/build (saves a lot of space)

* fix

* silence warning and add check for cache hit before installing cmake

* run only pytest

* use actions/cache

* lower timeout-minutes and add Device.DEFAULT step

* add nvidia to Device.DEFAULT check

* typo

* fix

* Check only for amd and run only 2 test
2025-01-08 14:27:56 +03:00
nimlgen
2f530adb04 hwiface: close fd when valid (#8530) 2025-01-08 10:43:59 +03:00
qazal
947de23cac add VIEW(DEVICE) to tensor variable [pr] (#8529)
* add VIEW(DEVICE) to tensor variable [pr]

* bind 2

* restrict shapetracker

* move var and bind closer

* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710 restrict tensor const ShapeTracker in spec [pr] (#8447)
* restrict tensor const ShapeTracker in spec [pr]

* pass sink srcs

* reject if any of the specs disagree

* deceive mypy

* viz

* default to float

* just check the view

* create_schedule is gone

* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
patrini32
afef69a37d MOCKGPU on mac os (#8520)
* tweaks for macos

* fix

* fix

* typo

* remove nvidia changes

* remove nv related changes

* change address back
2025-01-07 20:27:43 +03:00
nimlgen
ab3ac2b58d hw interface abstraction (#8524)
* use HWInterface in autogen

* mockgpu

* HWInterface

* more HWInterface

* fix

* fix

* old code

* fix

* implicit field definition

* add offset check to mockgpu too

* refactor

* forgot to pass flags + read rewrite

* test

* play with vfio

* nv: this should be kept

* try this

* vfio

* rm overwrite=True

* linter

* do not reinit kfd

* minor

* mypy

* mock

* init them once

---------

Co-authored-by: patrini32 <patrini23@proton.me>
2025-01-07 18:18:28 +03:00
qazal
0e97f807e0 test fixup prereqs for delete_buffer_view [pr] (#8523) 2025-01-07 11:52:18 +02:00
chenyu
85a4397f27 fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]

because I didn't know how to use it...

* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447 fix benchmark allreduce and add to ci [pr] (#8521) 2025-01-07 00:37:59 -05:00
geohotstan
c69f459c96 Add checking variable dimension to onnx (#8518)
* validate variable dims and fix buffer_parse to not use numpy

* fix var_dim parsing

* gah float16

* revert buffer_parse stuff

* revert that revert

* correct some err msgs

* add some more debug msgs I find helpful

* tensor init noop

* add an assert just for the sake of it.

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-07 00:30:35 -05:00
nimlgen
5cb9443ebb PROFILE is enabled when VIZ is enabled (#8516) 2025-01-06 19:47:16 +03:00
qazal
ed618a72e7 do not use subbuffer for bitcast (#8514)
* do not use subbuffer for bitcast

* edit that test

* explicit test for ptx

* ptx
2025-01-06 18:40:46 +02:00
nimlgen
280143467b am: tune all sleep timings to match kernel (#8515)
* am: tune all sleep timings to match kernel

* rm
2025-01-06 18:03:57 +03:00
qazal
547fd5078f cleanups for COPY uop implementation and spec [pr] (#8513) 2025-01-06 11:39:12 +02:00
qazal
ed121d235c spec for CAST_BEFORE_VIEW=1 [pr] (#8512) 2025-01-06 10:43:58 +02:00
qazal
eb7df92136 dedup COPY UOp [pr] (#8506) 2025-01-06 10:37:20 +02:00
chenyu
76a138cdb6 simpler UOp.st [pr] (#8510) 2025-01-05 22:08:14 -05:00
chenyu
b6be407bc6 fix handcode_opt bert [pr] (#8509)
* fix handcode_opt bert [pr]

* too slow
2025-01-05 19:14:12 -05:00
geohotstan
9229867fec Support asymmetrical pads for all pooling functions (#8109)
* implemented in tensor

* apply onnx tests to asymmetrical pads

* better onnx op ordering

* correct ceil_mode asymmetrical

* fix onnx_ops comments

* a few more TODOs and fix some stupidity

* fix some typing

* fix test

* mypy still a little messed up

* refactor out pad struct transformation

* add simple docs for now

* add whatever tests possible

* add tests for _resolve_pool_pads

* better err msg

* whoops didn't mean to include this

* retry CI

* enable asymmetric pads onnx tests

* better docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-05 16:01:08 -05:00
uuuvn
c9c7f1be46 Remove unused R_AARCH64_CALL26 relocation (#8508)
The first iteration of the AMX fix used a symbol lookup + trampoline
approach, which required this relocation. I later replaced it by marking
the AMX function `static`, assuming the relocation would still be used
when the callee wasn't inlined. That turned out not to be the case:
a `static` callee can't be moved around by the linker at link time and
can't be overridden by other symbols (`static` means priority + local
visibility).
2025-01-06 00:00:21 +03:00
nimlgen
b4f4a3ac12 am: minor parts (#8507) 2025-01-05 23:05:21 +03:00
qazal
0e0cba2cfc move llvm_bf16_cast to the renderer [pr] (#8502)
* move llvm_bf16_cast to the renderer [pr]

* cast to half is fine too

* delete the old one

* wish i could just cast the ptr
2025-01-05 13:02:41 +02:00
chenyu
4143f6a7d9 unused from __future__ import annotations [pr] (#8504) 2025-01-04 23:11:01 -05:00
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* sfixes

* disable broken tests

* linter

* these fails as well

* pylint

* mypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
George Hotz
ddad4d55da add typing to tqdm [pr] (#8500) 2025-01-04 13:55:52 -05:00
qazal
036efa9157 use UOp.substitute for VIZ=1 [pr] (#8497)
* use UOp.substitute for VIZ=1 [pr]

* more acceptable
2025-01-04 20:00:29 +02:00
uuuvn
615d5276b1 Suppress 'X warnings generated.' in MTLCompiler (#8489)
'-fno-caret-diagnostics' is what clang-tidy uses when the user passes --quiet
2025-01-04 10:22:37 -05:00
nimlgen
5df213d51e am: remove alloc frags logic (#8491) 2025-01-04 12:25:20 +03:00
geohotstan
3dfc8e1706 Share a _resolve_pool_pads function for pool ops in Tensor (#8485)
* _padding2d -> _resolve_pool_pads

* rephrase err msg

* even better error msg

* check asymmetric first so people don't hit the error twice

* test against torch
2025-01-03 23:54:11 -05:00
chenyu
6c639dee5c more informative kernel opt error messages [pr] (#8487) 2025-01-03 14:29:36 -05:00
uuuvn
5ffc50d58c Clang JIT (#8481)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-03 11:12:55 -05:00