Commit Graph

10633 Commits

Author SHA1 Message Date
nimlgen
280143467b am: tune all sleep timings to match kernel (#8515)
* am: tune all sleep timings to match kernel

* rm
2025-01-06 18:03:57 +03:00
qazal
547fd5078f cleanups for COPY uop implementation and spec [pr] (#8513) 2025-01-06 11:39:12 +02:00
qazal
ed121d235c spec for CAST_BEFORE_VIEW=1 [pr] (#8512) 2025-01-06 10:43:58 +02:00
qazal
eb7df92136 dedup COPY UOp [pr] (#8506) 2025-01-06 10:37:20 +02:00
chenyu
76a138cdb6 simpler UOp.st [pr] (#8510) 2025-01-05 22:08:14 -05:00
chenyu
b6be407bc6 fix handcode_opt bert [pr] (#8509)
* fix handcode_opt bert [pr]

* too slow
2025-01-05 19:14:12 -05:00
geohotstan
9229867fec Support asymmetrical pads for all pooling functions (#8109)
* implemented in tensor

* apply onnx tests to asymmetrical pads

* better onnx op ordering

* correct ceil_mode asymmetrical

* fix onnx_ops comments

* a few more TODOs and fix some stupidity

* fix some typing

* fix test

* mypy still a little messed up

* refactor out pad struct transformation

* add simple docs for now

* add whatever tests possible

* add tests for _resolve_pool_pads

* better err msg

* whoops didn't mean to include this

* retry CI

* enable asymmetric pads onnx tests

* better docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-05 16:01:08 -05:00
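A hypothetical usage sketch of the feature above; a minimal sketch only, assuming the pooling functions accept a per-side padding sequence (the exact ordering and keyword names here are assumptions — see the docs added in this PR for the actual convention):

```python
from tinygrad import Tensor

x = Tensor.randn(1, 1, 5, 5)
# symmetric padding: a single int pads every side equally
print(x.max_pool2d(kernel_size=(2, 2), padding=1).shape)
# asymmetric padding: a per-side sequence (ordering assumed here) pads each edge independently
print(x.max_pool2d(kernel_size=(2, 2), padding=(0, 1, 0, 1)).shape)
```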
uuuvn
c9c7f1be46 Remove unused R_AARCH64_CALL26 relocation (#8508)
The first iteration of the AMX fix used a symbol lookup + trampoline
approach, which required this relocation. I later replaced it by marking
the amx function `static` and assumed the relocation was still used when
the callee wasn't inlined. That turned out not to be the case, because the
callee can't be moved around by the linker at link time and can't be
overridden by other symbols (`static` means priority + local visibility).
2025-01-06 00:00:21 +03:00
nimlgen
b4f4a3ac12 am: minor parts (#8507) 2025-01-05 23:05:21 +03:00
qazal
0e0cba2cfc move llvm_bf16_cast to the renderer [pr] (#8502)
* move llvm_bf16_cast to the renderer [pr]

* cast to half is fine too

* delete the old one

* wish i could just cast the ptr
2025-01-05 13:02:41 +02:00
chenyu
4143f6a7d9 unused from __future__ import annotations [pr] (#8504) 2025-01-04 23:11:01 -05:00
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* sfixes

* disable broken tests

* linter

* these fails as well

* pylint

* myypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
George Hotz
ddad4d55da add typing to tqdm [pr] (#8500) 2025-01-04 13:55:52 -05:00
qazal
036efa9157 use UOp.substitute for VIZ=1 [pr] (#8497)
* use UOp.substitute for VIZ=1 [pr]

* more acceptable
2025-01-04 20:00:29 +02:00
uuuvn
615d5276b1 Suppress 'X warnings generated.' in MTLCompiler (#8489)
'-fno-caret-diagnostics' is what clang-tidy uses when the user passes --quiet
2025-01-04 10:22:37 -05:00
nimlgen
5df213d51e am: remove alloc frags logic (#8491) 2025-01-04 12:25:20 +03:00
geohotstan
3dfc8e1706 Share a _resolve_pool_pads function for pool ops in Tensor (#8485)
* _padding2d -> _resolve_pool_pads

* rephrase err msg

* even better error msg

* check asymmetric first so people don't hit the error twice

* test against torch
2025-01-03 23:54:11 -05:00
chenyu
6c639dee5c more informative kernel opt error messages [pr] (#8487) 2025-01-03 14:29:36 -05:00
uuuvn
5ffc50d58c Clang JIT (#8481)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-03 11:12:55 -05:00
qazal
12fa4340b3 pickle ContextVars in process replay [pr] (#8484)
* pickle ContextVars in process replay

* add test_pickle_context_var [pr]

* more realistic
2025-01-03 23:11:54 +08:00
qazal
bd4d7dc4eb return becomes_map from the scheduler (#8483)
* return becomes_map from the scheduler

* fix test_schedule

* fix abstractions2

* s/becomes/becomes_map
2025-01-03 22:47:21 +08:00
qazal
c163b2c5f0 give copy a device: COPY(device, copyin) (#8482) 2025-01-03 22:34:38 +08:00
qazal
0d33391038 delete unused allow_buffer_view=True arg from bitcast [pr] (#8462) 2025-01-03 22:20:46 +08:00
nimlgen
5d37d33fc5 update typing.Optional to 3.10 for hcq (#8479) 2025-01-03 16:20:49 +03:00
uuuvn
048643e7f9 Skip test that counts Ops.LOAD on CLANG+AMX (upcasts up to float16) (#8475)
This test assumes that float4 is the max upcast and checks that 8 float
loads are upcasted to 2 float4 loads; however, on CLANG+AMX upcasts can
go up to float16, and in this test we get one float8 load instead.

The @unittest.skipIf line is copied from test_linearizer.py, where
a bunch of tests make similar assumptions about upcasts.
2025-01-02 17:17:49 -05:00
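As a rough illustration of the skip pattern described above (a minimal sketch only; `HAS_AMX` is a hypothetical placeholder, and the real condition is the one copied from test_linearizer.py):

```python
import unittest
from tinygrad import Device  # assumption: the default backend name is available as Device.DEFAULT

# Placeholder for the real AMX detection copied from test_linearizer.py.
HAS_AMX = False

@unittest.skipIf(Device.DEFAULT == "CLANG" and HAS_AMX,
                 "CLANG+AMX can upcast past float4, changing the Ops.LOAD count")
class TestFloat4Upcast(unittest.TestCase):
  def test_eight_loads_become_two_float4_loads(self):
    ...  # count Ops.LOAD in the lowered kernel and assert there are 2 float4 loads
```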
geohotstan
de306c615b [fixed] onnx pool cleanup (#8474)
* pool janitor duty

* actually conv allows asymmetric pads

* a little prettier
2025-01-02 16:56:10 -05:00
qazal
08c9d980dc use const_like in uop zero folding [pr] (#8470) 2025-01-03 01:05:09 +08:00
chenyu
6fa38367bf Revert "onnx pool ops clean up (#8471)" (#8472)
This reverts commit 241db29ede.
2025-01-02 11:04:34 -05:00
uuuvn
e7c6282dd6 Fix uop.st for CLANG+AMX (#8460) 2025-01-02 18:01:41 +02:00
geohotstan
241db29ede onnx pool ops clean up (#8471) 2025-01-02 10:45:30 -05:00
geohotstan
c4b13e2f6d add onnx DequantizeLinear (#8468)
* is this right?

* small changes

* dont support float8

* mergeable?
2025-01-02 09:52:49 -05:00
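For reference, ONNX DequantizeLinear computes y = (x - x_zero_point) * x_scale; below is a minimal NumPy sketch of that formula (illustrative only, not the actual onnx_ops implementation, and the per-axis broadcasting is simplified):

```python
import numpy as np

def dequantize_linear(x: np.ndarray, x_scale, x_zero_point=None, axis: int = 1) -> np.ndarray:
  """ONNX spec: y = (x - x_zero_point) * x_scale, computed in floating point."""
  scale = np.asarray(x_scale, dtype=np.float32)
  zp = np.zeros_like(scale) if x_zero_point is None else np.asarray(x_zero_point, dtype=np.float32)
  if scale.ndim == 1 and x.ndim > 1:  # per-axis quantization: broadcast scale/zero_point along `axis`
    shape = [1] * x.ndim
    shape[axis] = -1
    scale, zp = scale.reshape(shape), zp.reshape(shape)
  return (x.astype(np.float32) - zp) * scale

# example: dequantize uint8 values with scale 0.5 and zero point 128 -> [-1.  0.  1.]
print(dequantize_linear(np.array([126, 128, 130], dtype=np.uint8), 0.5, 128))
```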
qazal
f2bee34197 tests for symbolic_simple failing tensor const spec [pr] (#8469)
* tests for symbolic_simple failing tensor const spec [pr]

* mul is correct
2025-01-02 19:13:16 +08:00
Kyunghyun Park
dc9af4e2fc [VIZ] fix hljs.highlightElement to correctly target <code/> (#8465)
* hljs.highlightElement target code not pre

* createPre

* no style change

* real no style change

* remove unnecessary scroll bar

* horizontal scrollbar appears only when scrolled all the way to the bottom

* misc
2025-01-02 14:50:51 +08:00
chenyu
e5c85ec684 type annotation of resolve [pr] (#8467)
it takes UOp|bool
2025-01-01 10:21:59 -05:00
George Hotz
e3c9cfad80 am driver: print on that assert (#8463) 2024-12-31 18:01:59 -05:00
nimlgen
c18307e749 AM driver (#6923)
* connect to gpu

* rlc init?

* gfx comp start init

* early init is hardoded, some progress with fw

* gart

* progress, next mqd

* ring setup, still does not execute anything

* ugh write correct reg

* pci2: vm

* pci2: start psp

* vm seems to work

* pci2: gfx start

* pci2: fix psp ring resp

* pci2: try ring

* pci2: mes and some fixes

* pci2: some progress

* pci2: progress

* pci2: mm

* pci2: discovery

* pci2: correct apertures

* pci2: b

* pci2: i

* pci2: l

* pci2: o

* pci2: cmu

* pci2: mes_kiq works

* pci2: mes

* pci2: kcq does not work(

* pci2: unhalt gfx

* ops_am

* minor

* check if amdgpu is there, or we will crash

* bring back graph, it just works

* less prints

* do not init mes (not used)

* remove unused files

* ops_am: start move into core

* ops_am: works

* clcks, but still slower

* faster + no mes_kiq

* vm frags + remove mes

* cleanup fw

* gmc tiny cleanup

* move to ops_amd

* comment out what we dont really need

* driverless

* close in speed

* am clean most of ips

* gmc to ips

* cleaner

* new vm walker

* comment old one

* remove unused autogens

* last write ups

* remove psp hardcoded values

* more

* add logs

* ih

* p2p and sdma

* vfio hal and interrupts

* smth

* amd dev iface

* minor after rebase

* bind for sdma

* Revert "bind for sdma"

This reverts commit a90766514d.

* tmp

* debug new mm

* ugh, allreduce hangs fixed

* p1

* works

* no pci.py

* cleaner a bit

* smth

* tiny cleanups

* cleaner a bit

* pciiface

* linter

* linter 2

* linter 3

* linter

* pylint

* reverted unrelated changes

* unrelated

* cmp tool

* ugh wrong fw

* clockgating

* unrelated

* alloc smaller chunks

* this

* opt sigs

* collect stat

* ops

* upd

* proclogs

* proclogs2

* vfio

* ruff

* linter pylint

* oops

* mypy p1

* mem fix

* mypy p2

* mypy p3

* mypy p4

* correct

* minor

* more tests

* linter in tests

* pci_regs header

* minor write up

* setup

* do not require libs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-31 23:06:17 +03:00
George Hotz
d4a1d5211e bring back the DSP runtime 2024-12-31 12:01:42 -05:00
George Hotz
24de25b52f example to benchmark onnx [pr] (#8459)
* example to benchmark onnx [pr]

* reset global count
2024-12-31 11:38:33 -05:00
chenyu
f3fdec940d Tensor.mod (#8458)
It's a Python-style mod; it could possibly be cleaner with a floor div.

Also relaxed the vmin for MOD slightly for C-style negative mod; it's more correct and might fix other bugs.
2024-12-31 11:31:42 -05:00
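For context, "Python-style mod" means the result takes the sign of the divisor (i.e. floored division), while C-style mod truncates toward zero and keeps the sign of the dividend; a quick plain-Python illustration:

```python
import math

a, b = -7, 3
# Python-style (floored) mod: result has the sign of the divisor b
print(a % b)                      # 2, since -7 == -3*3 + 2
print(a - b * math.floor(a / b))  # 2, the equivalent floor-div formulation
# C-style (truncated) mod: result has the sign of the dividend a
print(math.fmod(a, b))            # -1.0, since -7 == -2*3 - 1
```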
qazal
ae00fa3b28 delete (slow) viz prepickle [pr] (#8456) 2024-12-31 20:26:18 +08:00
George Hotz
4c94726bac remove uop mutability [pr] (#8441)
* remove uop mutability [pr]

* test fixups

* most tests pass

* more tests pass

* lil test fixups

* them too

* fix test

* unneeded

* err, that

* fix test_hcq

* fix test failures

* fix that test

* tensor universe

* does this pass test

* Revert "does this pass test"

This reverts commit ed516b3169.

* Revert "tensor universe"

This reverts commit c21301852a.

* proper spidering for uops

* cleanups

* all tensors

* all tensors

* slow but correct

* fast

* no WeakSet

* faster

* no need for list

* revert that
2024-12-31 00:29:56 -05:00
George Hotz
e276b6eecd use Tensor.replace [pr] (#8455) 2024-12-30 23:20:46 -05:00
chenyu
19a54ae0b4 add Tensor.roll and Tensor.rearrange to doc (#8454)
also moved rearrange in tensor.py to high level movement
2024-12-30 20:25:50 -05:00
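A hypothetical usage sketch of the two methods being documented; this assumes Tensor.rearrange takes einops-style pattern strings and Tensor.roll follows the usual (shifts, dims) signature, so check the docs from this PR for the authoritative API:

```python
from tinygrad import Tensor

x = Tensor.arange(6).reshape(2, 3)
print(x.rearrange("h w -> w h").shape)  # (3, 2), assuming einops-style patterns
print(x.roll(1, 0).numpy())             # rows shifted by one along dim 0 (assumed signature)
```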
Alessandro Benetti
12cccd8bc5 fix rearrange docs (#8453)
* fix rearrange docs

* just the typo
2024-12-30 20:04:06 -05:00
qazal
c7ec0ab674 delete unused View lt support (2) (#8451)
* delete lt on view (2)

* the scheduler uses symbolic_simple
2024-12-31 07:01:25 +08:00
George Hotz
803a47494e Revert "Clang JIT (#8312)" (#8452)
This reverts commit b6266c8e41.
2024-12-30 17:49:35 -05:00
uuuvn
b6266c8e41 Clang JIT (#8312)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-30 17:37:53 -05:00
qazal
d157b20027 delete create_schedule, only create_schedule_with_vars [pr] (#8450) 2024-12-31 04:20:53 +08:00
qazal
866dfa1f23 create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
George Hotz
0addbad36d Happy New Year! Let's get AM merged 2024-12-30 13:15:10 -05:00