Commit Graph

8888 Commits

Author SHA1 Message Date
Sieds Lykles
338f33efae Fast mod (#10055)
* Enable fast mod

* Add test
2025-05-05 09:15:43 -07:00
Kevin Buhler
363481e2fb correct mispelled words (#10165)
2025-05-05 08:12:41 -07:00
nimlgen
98f4a831c8 am: do not alloc cwsr buffer (#10163)
2025-05-05 15:40:23 +03:00
qazal
ed552e99f6 cleaner llvmir masked load render [pr] (#10161)
2025-05-05 15:22:01 +08:00
qazal
62e86bc5ec insert Ops.FUSE for arange (#10140)
* insert Ops.FUSE for arange

* reshape does not collapse

* do not fuse reshapes

* add children

* fixups

* work

* add Ops.WHERE support to z3

* fix fuse for cast

* diff

* ugh

* don't need this anymore

* contiguous

* add always_contiguous

* there too
2025-05-05 08:32:12 +03:00
qazal
cf626e23cb rename to ALWAYS_CONTIGUOUS [pr] (#10160)
2025-05-05 08:14:56 +03:00
George Hotz
b68f036551 default on OSX is llvm 19 (#10159)
2025-05-04 18:13:50 -07:00
George Hotz
e07d8b147a hotfix: don't OOM in the osx unit test
2025-05-04 17:53:55 -07:00
George Hotz
19cda7eb3a bitcast isn't needed for llvm pointers (#10158)
* bitcast isn't needed for llvm pointers

* downgrade to llvm 19

* Revert "downgrade to llvm 19"

This reverts commit 3777801b4b.

* fix llvm 20 bug

* Revert "fix llvm 20 bug"

This reverts commit edc1e053fa.
2025-05-04 17:30:45 -07:00
George Hotz
a0240d8c2b lil work on llvm speed (#10157)
* lil work on llvm speed

* llvm failing test

* 1e-4

* simpler failing test

* once is fine

* gpt suggests this syntax change

* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
36ccaa88a6 move merge views [pr] (#10156)
* move merge views [pr]

* move flow to __init__ [pr]
2025-05-04 14:41:47 -07:00
George Hotz
5f3f162606 cache rewrites for renderer [pr] (#10155)
* add caching to rewrites for renderer [pr]

* remove that

* update ebs
2025-05-04 13:45:15 -07:00
George Hotz
fe0724eebf prebuild all rewrites [pr] (#10154)
* prebuild all rewrites [pr]

* fix that

* tests pass with linearizer
2025-05-04 13:01:18 -07:00
George Hotz
2b055cb59c hotfix: add BLOCKFINAL op
2025-05-04 12:26:29 -07:00
George Hotz
c64fb31bb7 simple reduce w/o devectorize [pr] (#10153)
* simple reduce w/o devectorize [pr]

* useful upgrades

* refactor to horizontal

* or_broadcasted
2025-05-04 08:42:02 -07:00
quortus
b38be2588f Graph support for LLVM (#10029)
* Graph support for LLVM

* Always inline functions

* Specify renderer class only once

* Improve formatting

* Fix indexing

* Rollback parameter name change

* Force CI rerun

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-04 06:52:55 -07:00
Sieds Lykles
848c7783a4 Sign check in div const div pattern (#10150)
* Add rule

* Relax the condition

* Add test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-03 18:04:34 -04:00
nimlgen
cda72c1de6 amd: print full trace when AMD_IFACE is set (#10149)
2025-05-03 22:46:04 +03:00
George Hotz
7c33924a50 don't use real_size for mem_bytes [pr] (#10147)
2025-05-03 09:41:21 -04:00
qazal
42cbf7aed4 more viz cleanups + notes [pr] (#10145)
2025-05-03 15:57:56 +08:00
qazal
d8cc1fd2f8 viz minor fixups (#10143)
* properly scroll sub rewrites into view

* fix resizer for right sidebar
2025-05-03 03:50:14 +03:00
qazal
230a369708 remove some IGNORE_OOB [pr] (#10142)
* remove some IGNORE_OOB

* remove fuzz_schedule stuff

* test with global

* add for amd ci
2025-05-03 01:16:14 +03:00
qazal
1ed5d733bd disable TRACK_MATCH_STATS for type_verify (#10141)
2025-05-02 20:59:19 +03:00
nimlgen
993f0a0e87 am: a bit faster alloc (#10138)
* am: a bit faster allocs

* am: faster allocs
2025-05-02 16:03:42 +03:00
nimlgen
81410befc2 am: remove sleep from wait_reg (#10139)
* am: remove sleep from wait_reg

* fst

* ooops
2025-05-02 15:46:29 +03:00
nimlgen
45bf7c5b81 am: add allocation bench (#10135)
* init allocation bench

* sorryg

* betetr
2025-05-02 13:51:07 +03:00
nimlgen
6a845c2de2 amd: fix sigs on xcc path (#10137)
2025-05-02 13:50:56 +03:00
nimlgen
bdd4dd9238 am: do not expect aligned size in valloc (#10136)
2025-05-02 12:19:59 +03:00
Ignacio Sica
8f79492c75 fix test_tensor_cores_codegen for ptx renderer (#10119)
2025-05-01 21:52:36 -03:00
nimlgen
30bd6a619f usb gpu (#8766)
* start gpu

* progress

* fixes

* read correct

* libusb

* libusb works

* support asm24

* hmm

* one access file

* fix extra

* start AMBar

* works on am

* back to usb

* patch fw

* full fast write into a bar

* ugh, minus one gpus, next please

* mute libusb for now

* usb for asm24

* 63

* hmm

* ops

* rescan

* and gpu shoudl be there

* enumerate them?

* usbgpu bus 4, 100% reliable (draft)

* lil

* works

* comments

* add DEBUG

* cleaner

* simplest

* Revert "simplest"

This reverts commit 1d00354c16.

* Revert "cleaner"

This reverts commit c5662de956.

* assert we find gpu

* that's simpler

* this back

* simpler?

* correcT

* work

* nonsense

* works with more checks

* this works

* the 6s in the right place

* reliable now

* fix after reboot

* set config

* 1s timeouts

* close to fw loading

* streams

* usbhub works

* endpoints

* fix

* want to test tiny10

* move to tiny 10

* fix gpu

* ugly speed

* smth

* mostly broken, but signals and dmas

* do not reset gpu every time

* changes to run kernels

* ugh, not working

* t10

* pg and sc files

* some prog

* um?

* somehow it works

* patched for 24

* some tries

* minimal

* moving

* back to working

* so sloooooow

* move to controller

* usb.py rewrite

* rework

* cleaner 1

* cleaner 2

* cleaner 3

* new abstractions

* aft merge

* init controller

* cleaner 4

* cleaner 5

* patcher + tiny changes

* ignore that

* cleaner 6

* after rebase

* cleaner 7

* bring it back

* start linter war

* linter 2

* autogen was missing

* fix autogen

* typing

* better?

* mypy

* extra/legacy rename and cleaner

* shuffle

* better printing

* tiny changes and tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-01 18:03:47 +03:00
nimlgen
7573c0ef4e amd,nv: use .cpu_view() in bind (#10131)
2025-05-01 17:46:12 +03:00
nimlgen
16e5376ae8 line limit 12800 for usb (#10130)
2025-05-01 16:57:44 +03:00
qazal
0c59c6b8c7 remove replace from Tensor assign [pr] (#10127)
* remove replace from Tensor assign

* assign is contiguous

* allow chaining view

* only assert axis
2025-05-01 19:37:55 +08:00
nimlgen
9caceda79a amd: comgr is not required (#10128)
2025-05-01 13:41:44 +03:00
nimlgen
c3d2e4a6e1 amd: use sdma to copy program (#10126)
* amd: use sdma to copy program

* rm

* ensure prog is copies

* match nv style
2025-05-01 13:04:22 +03:00
nimlgen
09f5be9bcb amd: finalize device in case of failures (#10124) 2025-05-01 10:41:15 +03:00
George Hotz
ef011ff5f9 flip Ops.COPY order [pr] (#10122)
* flip Ops.COPY order [pr]

* fix copy and support multi device copy in _device
2025-05-01 00:26:24 -04:00
chenyu
145e51247a split CAST and BITCAST in PYTHON [pr] (#10123)
CAST only needs truncate and does not require dtype fmt. added bfloat16 tests can run locally
2025-04-30 23:27:35 -04:00
Ignacio Sica
bf5fb97498 fix AMD_LLVM bf16 tc for gfx1100 (#10102)
* fix amd_llvm bf16 tc

* cleanup pattern
2025-04-30 20:06:38 -03:00
George Hotz
dd0070daab Revert "flip Ops.COPY order [pr] (#10120)" (#10121)
This reverts commit 984f09ac74.
2025-04-30 17:25:21 -04:00
George Hotz
984f09ac74 flip Ops.COPY order [pr] (#10120)
2025-04-30 16:50:18 -04:00
chenyu
17d4d258ea simple symbolic slice in llama [pr] (#10112)
support slice that has step None and stop > start
2025-04-30 14:36:35 -04:00
nimlgen
b583ece8f3 amd: replace AMD_DRIVERLESS with AMD_IFACE (#10116)
* amd: replace AMD_DRIVERLESS with AMD_IFACE

* docs

* print direct err for amd_iface

* print for all
2025-04-30 20:22:02 +03:00
nimlgen
0e1beaf44f nv: align copies + better test (#10118)
2025-04-30 20:09:53 +03:00
Ignacio Sica
2941537250 cast is noop if src has dtypes.void (#10110)
2025-04-30 13:55:41 -03:00
nimlgen
fcdda4fc09 am: move boot memory to vram start (#10115)
2025-04-30 19:12:19 +03:00
nimlgen
844d5577d8 hcq: make copy_bufs and kernargs_size params configurable per device (#10114)
2025-04-30 18:43:50 +03:00
nimlgen
2ec3b722e2 nv: fix copies larger than 4g (#10117)
2025-04-30 18:43:17 +03:00
George Hotz
d81acbeef6 multi: move shrink after copy (#10109)
* multi: move shrink after copy

* passing now
2025-04-30 10:29:51 -04:00
qazal
67bd8489ad grouper cleanups [pr] (#10113)
2025-04-30 18:54:47 +08:00