8704 Commits

Author SHA1 Message Date
nimlgen
b8fb0f11ff hcq: parametrize signal allocation size (#10192) 2025-05-07 15:50:43 +03:00
nimlgen
685d5c46df usbgpu: send pci write in batches (#10190)
* usbgpu: send pci write in batches

* mock
2025-05-07 14:41:56 +03:00
qazal
3a32fa228c refactor merge_views matcher [pr] (#10188) 2025-05-07 19:22:06 +08:00
qazal
94e07725a6 only reorder expand if it can fuse with input (#10186)
* failing test

* only reorder expand if it can fuse with input

* (16,) is reshaped to (4, 4)
2025-05-07 18:14:31 +08:00
qazal
4ea3e373aa decode lds ops in remu (#10184) 2025-05-07 16:44:18 +08:00
uuuvn
dba073e5c0 Less messy broken graph on paravirtualized metal workaround (#10182)
* Less messy broken graph on paravirtualized metal workaround

GitHub CI macOS runners use paravirtualized Metal, which is broken with
graph (some comments say that ICB in particular is broken, but in my
testing it was fine sometimes and at other times hit an assert inside
Metal's code related to resources, so not sure).

> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.

This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own virtualization framework.

* unused import
2025-05-06 20:41:02 +03:00
nimlgen
59c03e8904 usbgpu: tiny changes in setup pci bars to match spec (#10181)
* usbgpu: tiny changes in setup pci bars to match spec

* unused
2025-05-06 20:39:03 +03:00
Ignacio Sica
74c25bdc8b add support for ds_load_u8 in remu (#10180)
* add support for ds_load_u8 in remu

* add test for ds_load_u8
2025-05-06 20:31:00 +03:00
nimlgen
10f115fdb0 usbgpu: USB_RESCAN_BUS envvar (#10177) 2025-05-06 17:09:36 +03:00
nimlgen
781fd8c1eb usbgpu: some tlp error info (#10176)
* usbgpu: some tlp error info

* oops
2025-05-06 17:01:10 +03:00
nimlgen
aea1f77225 amd: uppercase amd_iface vals (#10175) 2025-05-06 15:12:50 +03:00
nimlgen
34d55857cf usbgpu: more devs in scan_pci (#10171) 2025-05-06 11:55:34 +03:00
nimlgen
37a7a99adb metal: fix graph when unrelated input buffers are not metal buffers (#10170)
* metal: fix graph when unrelated input buffers are not metal buffers

* tinier test
2025-05-06 11:37:16 +03:00
George Hotz
603c03bef2 fix tests for rewrite [pr] (#10167)
* fix tests for rewrite [pr]

* cleaner

* delete linearize_uop

* clean up the rest
2025-05-05 19:19:49 -07:00
wozeparrot
10437904cd refactor: ops_cloud -> ops_remote [pr] (#10166) 2025-05-05 15:59:51 -07:00
uuuvn
b4dfb3ba78 A bit less convoluted jit capability detection (#9923)
Required for cloud multi stuff to know if graph supports transfers and
if graph is on the same host (multi-host graphs are not supported)
2025-05-05 09:16:18 -07:00
Sieds Lykles
338f33efae Fast mod (#10055)
* Enable fast mod

* Add test
2025-05-05 09:15:43 -07:00
Kevin Buhler
363481e2fb correct mispelled words (#10165) 2025-05-05 08:12:41 -07:00
nimlgen
98f4a831c8 am: do not alloc cwsr buffer (#10163) 2025-05-05 15:40:23 +03:00
qazal
ed552e99f6 cleaner llvmir masked load render [pr] (#10161) 2025-05-05 15:22:01 +08:00
qazal
62e86bc5ec insert Ops.FUSE for arange (#10140)
* insert Ops.FUSE for arange

* reshape does not collapse

* do not fuse reshapes

* add children

* fixups

* work

* add Ops.WHERE support to z3

* fix fuse for cast

* diff

* ugh

* don't need this anymore

* contiguous

* add always_contiguous

* there too
2025-05-05 08:32:12 +03:00
qazal
cf626e23cb rename to ALWAYS_CONTIGUOUS [pr] (#10160) 2025-05-05 08:14:56 +03:00
George Hotz
b68f036551 default on OSX is llvm 19 (#10159) 2025-05-04 18:13:50 -07:00
George Hotz
e07d8b147a hotfix: don't OOM in the osx unit test 2025-05-04 17:53:55 -07:00
George Hotz
19cda7eb3a bitcast isn't needed for llvm pointers (#10158)
* bitcast isn't needed for llvm pointers

* downgrade to llvm 19

* Revert "downgrade to llvm 19"

This reverts commit 3777801b4b.

* fix llvm 20 bug

* Revert "fix llvm 20 bug"

This reverts commit edc1e053fa.
2025-05-04 17:30:45 -07:00
George Hotz
a0240d8c2b lil work on llvm speed (#10157)
* lil work on llvm speed

* llvm failing test

* 1e-4

* simpler failing test

* once is fine

* gpt suggests this syntax change

* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
36ccaa88a6 move merge views [pr] (#10156)
* move merge views [pr]

* move flow to __init__ [pr]
2025-05-04 14:41:47 -07:00
George Hotz
5f3f162606 cache rewrites for renderer [pr] (#10155)
* add caching to rewrites for renderer [pr]

* remove that

* update ebs
2025-05-04 13:45:15 -07:00
George Hotz
fe0724eebf prebuild all rewrites [pr] (#10154)
* prebuild all rewrites [pr]

* fix that

* tests pass with linearizer
2025-05-04 13:01:18 -07:00
George Hotz
2b055cb59c hotfix: add BLOCKFINAL op 2025-05-04 12:26:29 -07:00
George Hotz
c64fb31bb7 simple reduce w/o devectorize [pr] (#10153)
* simple reduce w/o devectorize [pr]

* useful upgrades

* refactor to horizontal

* or_broadcasted
2025-05-04 08:42:02 -07:00
quortus
b38be2588f Graph support for LLVM (#10029)
* Graph support for LLVM

* Always inline functions

* Specify renderer class only once

* Improve formatting

* Fix indexing

* Rollback parameter name change

* Force CI rerun

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-04 06:52:55 -07:00
Sieds Lykles
848c7783a4 Sign check in div const div pattern (#10150)
* Add rule

* Relax the condition

* Add test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-03 18:04:34 -04:00
nimlgen
cda72c1de6 amd: print full trace when AMD_IFACE is set (#10149) 2025-05-03 22:46:04 +03:00
George Hotz
7c33924a50 don't use real_size for mem_bytes [pr] (#10147) 2025-05-03 09:41:21 -04:00
qazal
42cbf7aed4 more viz cleanups + notes [pr] (#10145) 2025-05-03 15:57:56 +08:00
qazal
d8cc1fd2f8 viz minor fixups (#10143)
* properly scroll sub rewrites into view

* fix resizer for right sidebar
2025-05-03 03:50:14 +03:00
qazal
230a369708 remove some IGNORE_OOB [pr] (#10142)
* remove some IGNORE_OOB

* remove fuzz_schedule stuff

* test with global

* add for amd ci
2025-05-03 01:16:14 +03:00
qazal
1ed5d733bd disable TRACK_MATCH_STATS for type_verify (#10141) 2025-05-02 20:59:19 +03:00
nimlgen
993f0a0e87 am: a bit faster alloc (#10138)
* am: a bit faster allocs

* am: faster allocs
2025-05-02 16:03:42 +03:00
nimlgen
81410befc2 am: remove sleep from wait_reg (#10139)
* am: remove sleep from wait_reg

* fst

* ooops
2025-05-02 15:46:29 +03:00
nimlgen
45bf7c5b81 am: add allocation bench (#10135)
* init allocation bench

* sorryg

* betetr
2025-05-02 13:51:07 +03:00
nimlgen
6a845c2de2 amd: fix sigs on xcc path (#10137) 2025-05-02 13:50:56 +03:00
nimlgen
bdd4dd9238 am: do not expect aligned size in valloc (#10136) 2025-05-02 12:19:59 +03:00
Ignacio Sica
8f79492c75 fix test_tensor_cores_codegen for ptx renderer (#10119) 2025-05-01 21:52:36 -03:00
nimlgen
30bd6a619f usb gpu (#8766)
* start gpu

* progress

* fixes

* read correct

* libusb

* libusb works

* support asm24

* hmm

* one access file

* fix extra

* start AMBar

* works on am

* back to usb

* patch fw

* full fast write into a bar

* ugh, minus one gpus, next please

* mute libusb for now

* usb for asm24

* 63

* hmm

* ops

* rescan

* and gpu shoudl be there

* enumerate them?

* usbgpu bus 4, 100% reliable (draft)

* lil

* works

* comments

* add DEBUG

* cleaner

* simplest

* Revert "simplest"

This reverts commit 1d00354c16.

* Revert "cleaner"

This reverts commit c5662de956.

* assert we find gpu

* that's simpler

* this back

* simpler?

* correcT

* work

* nonsense

* works with more checks

* this works

* the 6s in the right place

* reliable now

* fix after reboot

* set config

* 1s timeouts

* close to fw loading

* streams

* usbhub works

* endpoints

* fix

* want to test tiny10

* move to tiny 10

* fix gpu

* ugly speed

* smth

* mostly broken, but signals and dmas

* do not reset gpu every time

* changes to run kernels

* ugh, not working

* t10

* pg and sc files

* some prog

* um?

* somehow it works

* patched for 24

* some tries

* minimal

* moving

* back to working

* so sloooooow

* move to controller

* usb.py rewrite

* rework

* cleaner 1

* cleaner 2

* cleaner 3

* new abstractions

* aft merge

* init controller

* cleaner 4

* cleaner 5

* patcher + tiny changes

* ignore that

* cleaner 6

* after rebase

* cleaner 7

* bring it back

* start linter war

* linter 2

* autogen was missing

* fix autogen

* typing

* better?

* mypy

* extra/legacy rename and cleaner

* shuffle

* better printing

* tiny changes and tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-01 18:03:47 +03:00
nimlgen
7573c0ef4e amd,nv: use .cpu_view() in bind (#10131) 2025-05-01 17:46:12 +03:00
nimlgen
16e5376ae8 line limit 12800 for usb (#10130) 2025-05-01 16:57:44 +03:00
qazal
0c59c6b8c7 remove replace from Tensor assign [pr] (#10127)
* remove replace from Tensor assign

* assign is contiguous

* allow chaining view

* only assert axis
2025-05-01 19:37:55 +08:00
nimlgen
9caceda79a amd: comgr is not required (#10128) 2025-05-01 13:41:44 +03:00