Commit Graph

4433 Commits

Author SHA1 Message Date
chenyu
6a7f971fa0 hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553) 2025-01-10 12:57:44 -05:00
nimlgen
92b59c9b7a test_hcq limits for mockgpu not (only) ci (#8555)
* test_hcq limits for mockgpu not (only) ci

* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8 more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]

* working quantization

* static quant

* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
chenyu
2cbb34535c simpler allreduce script [pr] (#8551)
time everything on tensor level and get time from GlobalCounters.time_sum_s
2025-01-09 21:38:13 -05:00
chenyu
23c56817d8 update and clean up allreduce script [pr] (#8549)
make `run` to able to run with ring only
2025-01-09 19:35:28 -05:00
geohotstan
299d333806 Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478)
* QLinearEverything

* ok ort verify passes

* this should be int instead

* cast to int then char to do wraparound

* cleaner

* move contrib ops to microsoft ops

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-09 15:08:53 -08:00
qazal
2fd068ffc0 delete empty op (#8544)
* simple delete EMPTY op

* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2 start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]

* parallel actually

* better view_left test

* test a softmax

* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
1efb1188d8 support pickling a realized BUFFER uop [pr] (#8541)
* try 2 at this diff

* process replay

* delete uops from buffer

* free buffers

* test_pickle_buffer_uop
2025-01-09 06:37:22 -05:00
eliotgolding
4c5c32ff5f Small bug in _reshape_mask (#8538) 2025-01-08 22:11:24 -05:00
nimlgen
aa3d612df2 add script to install amd mockgpu on macOS (#8536)
* upload artifact every time

* hm

* sh script

* hm

* hm2

* hm2

* hm2

* no sudo

* def paths

* small comments

* text

* try auth for bigger limits
2025-01-09 01:29:25 +03:00
nimlgen
31fcfe764d adjust hcq test for ci macos (#8534) 2025-01-08 16:18:31 +03:00
qazal
947de23cac add VIEW(DEVICE) to tensor variable [pr] (#8529)
* add VIEW(DEVICE) to tensor variable [pr]

* bind 2

* restrict shapetracker

* move var and bind closer

* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710 restrict tensor const ShapeTracker in spec [pr] (#8447)
* restrict tensor const ShapeTracker in spec [pr]

* pass sink srcs

* reject if any of the specs disagree

* deceive mypy

* viz

* default to float

* just check the view

* create_schedule is gone

* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
patrini32
afef69a37d MOCKGPU on mac os (#8520)
* tweaks for macos

* fix

* fix

* typo

* remove nvidia changes

* remove nv related changes

* change address back
2025-01-07 20:27:43 +03:00
nimlgen
ab3ac2b58d hw interface abstraction (#8524)
* use HWInterface in autogen

* mockgpu

* HWInterface

* more HWInterface

* fix

* fix

* old code

* fix

* implicit field definition

* add offset check to mockgpu too

* refactor

* forgot to pass flags + read rewrite

* test

* play with vfio

* nv: this should be kept

* try this

* vfio

* rm overwrite=True

* linetr

* do not reinit kfd

* minor

* mypy

* mock

* init them once

---------

Co-authored-by: patrini32 <patrini23@proton.me>
2025-01-07 18:18:28 +03:00
qazal
0e97f807e0 test fixup prereqs for delete_buffer_view [pr] (#8523) 2025-01-07 11:52:18 +02:00
chenyu
85a4397f27 fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]

because i didn't know how to use it...

* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447 fix benchmark allreduce and add to ci [pr] (#8521) 2025-01-07 00:37:59 -05:00
qazal
ed618a72e7 do not use subbuffer for bitcast (#8514)
* do not use subbuffer for bitcast

* edit that test

* explicit test for ptx

* ptx
2025-01-06 18:40:46 +02:00
qazal
547fd5078f cleanups for COPY uop implementation and spec [pr] (#8513) 2025-01-06 11:39:12 +02:00
qazal
ed121d235c spec for CAST_BEFORE_VIEW=1 [pr] (#8512) 2025-01-06 10:43:58 +02:00
qazal
eb7df92136 dedup COPY UOp [pr] (#8506) 2025-01-06 10:37:20 +02:00
geohotstan
9229867fec Support asymmetrical pads for all pooling functions (#8109)
* implemented in tensor

* apply onnx tests to asymmetrical pads

* better onnx op ordering

* correct ceil_mode asymmetrical

* fix onnx_ops comments

* a few more TODOs and fix some stupidity

* fix some typing

* fix test

* mypy still a little messed up

* refactor out pad struct transformation

* add simple docs for now

* add whatever tests possible

* add tests for _resolve_pool_pads

* better err msg

* whoops didn't mean to include this

* retry CI

* enable asymmetric pads onnx tests

* better docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-05 16:01:08 -05:00
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* sfixes

* disable broken tests

* linter

* these fails as well

* pylint

* myypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
qazal
036efa9157 use UOp.substitute for VIZ=1 [pr] (#8497)
* use UOp.substitute for VIZ=1 [pr]

* more acceptable
2025-01-04 20:00:29 +02:00
geohotstan
3dfc8e1706 Share a _resolve_pool_pads function for pool ops in Tensor (#8485)
* _padding2d -> _resolve_pool_pads

* rephrase err msg

* even better error msg

* check asymmetric first os people don't hit error twice

* test against torch
2025-01-03 23:54:11 -05:00
qazal
12fa4340b3 pickle ContextVars in process replay [pr] (#8484)
* pickle ContextVars in process replay

* add test_pickle_context_var [pr]

* more realistic
2025-01-03 23:11:54 +08:00
qazal
bd4d7dc4eb return becomes_map from the scheduler (#8483)
* return becomes_map from the scheduler

* fix test_schedule

* fix abstractions2

* s/becomes/becomes_map
2025-01-03 22:47:21 +08:00
qazal
0d33391038 delete unused allow_buffer_view=True arg from bitcast [pr] (#8462) 2025-01-03 22:20:46 +08:00
uuuvn
048643e7f9 Skip test that counts Ops.LOAD on CLANG+AMX (upcasts up to float16) (#8475)
This test assumes that float4 is the max upcast and tests that 8 float
loads are upcasted to 2 float4 loads, however on CLANG+AMX upcasts
can be up to float16 and in this test we get one float8 load instead.

The @unittest.skipIf line is copied from test_linearizer.py where
a bunch of tests make similar assumptions about upcasts.
2025-01-02 17:17:49 -05:00
geohotstan
de306c615b [fixed] onnx pool cleanup (#8474)
* pool janitor duty

* actually conv allows asymmetric pads

* a little prettier
2025-01-02 16:56:10 -05:00
qazal
08c9d980dc use const_like in uop zero folding [pr] (#8470) 2025-01-03 01:05:09 +08:00
chenyu
6fa38367bf Revert "onnx pool ops clean up (#8471)" (#8472)
This reverts commit 241db29ede.
2025-01-02 11:04:34 -05:00
uuuvn
e7c6282dd6 Fix uop.st for CLANG+AMX (#8460) 2025-01-02 18:01:41 +02:00
geohotstan
241db29ede onnx pool ops clean up (#8471) 2025-01-02 10:45:30 -05:00
geohotstan
c4b13e2f6d add onnx DequantizeLinear (#8468)
* is this right?

* small changes

* dont support float8

* mergeable?
2025-01-02 09:52:49 -05:00
qazal
f2bee34197 tests for symbolic_simple failing tensor const spec [pr] (#8469)
* tests for symbolic_simple failing tensor const spec [pr]

* mul is correct
2025-01-02 19:13:16 +08:00
chenyu
e5c85ec684 type annotation of resolve [pr] (#8467)
it takes UOp|bool
2025-01-01 10:21:59 -05:00
nimlgen
c18307e749 AM driver (#6923)
* connect to gpu

* rlc init?

* gfx comp start init

* early init is hardoded, some progress with fw

* gart

* progress, next mqd

* ring setup, still does not execute anything

* ugh write correct reg

* pci2: vm

* pci2: start psp

* vm seems to work

* pci2: gfx start

* pci2: fix psp ring resp

* pci2: try ring

* pci2: mes and some fixes

* pci2: some progress

* pci2: progress

* pci2: mm

* pci2: discovery

* pci2: correct apertures

* pci2: b

* pci2: i

* pci2: l

* pci2: o

* pci2: cmu

* pci2: mes_kiq works

* pci2: mes

* pci2: kcq does not work(

* pci2: unhalt gfx

* ops_am

* minor

* check if amdgpu is there, or we will crash

* bring back graph, it just works

* less prints

* do not init mes (not used)

* remove unused files

* ops_am: start move into core

* ops_am: works

* clcks, but still slower

* faster + no mes_kiq

* vm frags + remove mes

* cleanup fw

* gmc tiny cleanup

* move to ops_amd

* comment out what we dont really need

* driverless

* close in speed

* am clean most of ips

* gmc to ips

* cleaner

* new vm walker

* comment old one

* remove unsued autogens

* last write ups

* remove psp hardcoded values

* more

* add logs

* ih

* p2p and sdma

* vfio hal and interrupts

* smth

* amd dev iface

* minor after rebase

* bind for sdma

* Revert "bind for sdma"

This reverts commit a90766514d.

* tmp

* debug new mm

* ugh, allreduce hangs fixed

* p1

* works

* no pci.py

* cleaner a bit

* smth

* tiny cleanups

* cleaner a bit

* pciiface

* linter

* linter 2

* linter 3

* linter

* pylint

* reverted unrelated changes

* unrelated

* cmp tool

* ugh wrong fw

* clockgating

* unrelated

* alloc smaller chunks

* this

* opt sigs

* collect stat

* ops

* upd

* proclogs

* proclogs2

* vfio

* ruff

* linter pylint

* oops

* mypy p1

* mem fix

* mypy p2

* mypy p3

* mypy p4

* correct

* minor

* more tests

* linter in tests

* pci_regs header

* minor write up

* setup

* do not require libs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-31 23:06:17 +03:00
chenyu
f3fdec940d Tensor.mod (#8458)
it's a python style mod. possibily can be cleaner with a floor div

relaxed the vmin for MOD slightly for cstyle negatives mod, it's more correct and might fix other bugs
2024-12-31 11:31:42 -05:00
George Hotz
4c94726bac remove uop mutability [pr] (#8441)
* remove uop mutability [pr]

* test fixups

* most tests pass

* more tests pass

* lil test fixups

* them too

* fix test

* unneeded

* err, that

* fix test_hcq

* fix test failures

* fix that test

* tensor universe

* does this pass test

* Revert "does this pass test"

This reverts commit ed516b3169.

* Revert "tensor universe"

This reverts commit c21301852a.

* proper spidering for uops

* cleanups

* all tensors

* all tensors

* slow but correct

* fast

* no WeakSet

* faster

* no need for list

* revert that
2024-12-31 00:29:56 -05:00
George Hotz
e276b6eecd use Tensor.replace [pr] (#8455) 2024-12-30 23:20:46 -05:00
qazal
c7ec0ab674 delete unused View lt support (2) (#8451)
* delete lt on view (2)

* the scheduler uses symbolic_simple
2024-12-31 07:01:25 +08:00
qazal
866dfa1f23 create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
George Hotz
180916257d add children tracking to uop [pr] (#8448) 2024-12-30 10:58:20 -05:00
George Hotz
29c14f1cbf hotfix: update tests for no uop mut 2024-12-30 10:05:37 -05:00
qazal
7499139239 scheduler renames from the buffer_shape branch [pr] (#8444)
* scheduler refactors and renames from the buffer_shape branch [pr]

* all unmasked sts are allowed here

* only renames
2024-12-30 16:33:38 +08:00
George Hotz
b71c51191b tests from remove uop mutability [pr] (#8442)
* tests from remove uop mutability [pr]

* more test fix

* simpler test fix

* remove that
2024-12-29 12:14:10 -05:00
qazal
34987a03af const copy folding spec + multi.py behavior [pr] (#8436)
* const copy folding spec + multi behavior [pr]

* copy from clang, move multi test
2024-12-29 23:12:13 +08:00