qazal
cf626e23cb
rename to ALWAYS_CONTIGUOUS [pr] ( #10160 )
2025-05-05 08:14:56 +03:00
George Hotz
b68f036551
default on OSX is llvm 19 ( #10159 )
2025-05-04 18:13:50 -07:00
George Hotz
e07d8b147a
hotfix: don't OOM in the osx unit test
2025-05-04 17:53:55 -07:00
George Hotz
19cda7eb3a
bitcast isn't needed for llvm pointers ( #10158 )
...
* bitcast isn't needed for llvm pointers
* downgrade to llvm 19
* Revert "downgrade to llvm 19"
This reverts commit 3777801b4b.
* fix llvm 20 bug
* Revert "fix llvm 20 bug"
This reverts commit edc1e053fa.
2025-05-04 17:30:45 -07:00
George Hotz
a0240d8c2b
lil work on llvm speed ( #10157 )
...
* lil work on llvm speed
* llvm failing test
* 1e-4
* simpler failing test
* once is fine
* gpt suggests this syntax change
* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
36ccaa88a6
move merge views [pr] ( #10156 )
...
* move merge views [pr]
* move flow to __init__ [pr]
2025-05-04 14:41:47 -07:00
George Hotz
5f3f162606
cache rewrites for renderer [pr] ( #10155 )
...
* add caching to rewrites for renderer [pr]
* remove that
* update ebs
2025-05-04 13:45:15 -07:00
George Hotz
fe0724eebf
prebuild all rewrites [pr] ( #10154 )
...
* prebuild all rewrites [pr]
* fix that
* tests pass with linearizer
2025-05-04 13:01:18 -07:00
George Hotz
2b055cb59c
hotfix: add BLOCKFINAL op
2025-05-04 12:26:29 -07:00
George Hotz
c64fb31bb7
simple reduce w/o devectorize [pr] ( #10153 )
...
* simple reduce w/o devectorize [pr]
* useful upgrades
* refactor to horizontal
* or_broadcasted
2025-05-04 08:42:02 -07:00
quortus
b38be2588f
Graph support for LLVM ( #10029 )
...
* Graph support for LLVM
* Always inline functions
* Specify renderer class only once
* Improve formatting
* Fix indexing
* Rollback parameter name change
* Force CI rerun
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-04 06:52:55 -07:00
Sieds Lykles
848c7783a4
Sign check in div const div pattern ( #10150 )
...
* Add rule
* Relax the condition
* Add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-03 18:04:34 -04:00
nimlgen
cda72c1de6
amd: print full trace when AMD_IFACE is set ( #10149 )
2025-05-03 22:46:04 +03:00
George Hotz
7c33924a50
don't use real_size for mem_bytes [pr] ( #10147 )
2025-05-03 09:41:21 -04:00
qazal
42cbf7aed4
more viz cleanups + notes [pr] ( #10145 )
2025-05-03 15:57:56 +08:00
qazal
d8cc1fd2f8
viz minor fixups ( #10143 )
...
* properly scroll sub rewrites into view
* fix resizer for right sidebar
2025-05-03 03:50:14 +03:00
qazal
230a369708
remove some IGNORE_OOB [pr] ( #10142 )
...
* remove some IGNORE_OOB
* remove fuzz_schedule stuff
* test with global
* add for amd ci
2025-05-03 01:16:14 +03:00
qazal
1ed5d733bd
disable TRACK_MATCH_STATS for type_verify ( #10141 )
2025-05-02 20:59:19 +03:00
nimlgen
993f0a0e87
am: a bit faster alloc ( #10138 )
...
* am: a bit faster allocs
* am: faster allocs
2025-05-02 16:03:42 +03:00
nimlgen
81410befc2
am: remove sleep from wait_reg ( #10139 )
...
* am: remove sleep from wait_reg
* fst
* ooops
2025-05-02 15:46:29 +03:00
nimlgen
45bf7c5b81
am: add allocation bench ( #10135 )
...
* init allocation bench
* sorryg
* betetr
2025-05-02 13:51:07 +03:00
nimlgen
6a845c2de2
amd: fix sigs on xcc path ( #10137 )
2025-05-02 13:50:56 +03:00
nimlgen
bdd4dd9238
am: do not expect aligned size in valloc ( #10136 )
2025-05-02 12:19:59 +03:00
Ignacio Sica
8f79492c75
fix test_tensor_cores_codegen for ptx renderer ( #10119 )
2025-05-01 21:52:36 -03:00
nimlgen
30bd6a619f
usb gpu ( #8766 )
...
* start gpu
* progress
* fixes
* read correct
* libusb
* libusb works
* support asm24
* hmm
* one access file
* fix extra
* start AMBar
* works on am
* back to usb
* patch fw
* full fast write into a bar
* ugh, minus one gpus, next please
* mute libusb for now
* usb for asm24
* 63
* hmm
* ops
* rescan
* and gpu shoudl be there
* enumerate them?
* usbgpu bus 4, 100% reliable (draft)
* lil
* works
* comments
* add DEBUG
* cleaner
* simplest
* Revert "simplest"
This reverts commit 1d00354c16.
* Revert "cleaner"
This reverts commit c5662de956.
* assert we find gpu
* that's simpler
* this back
* simpler?
* correcT
* work
* nonsense
* works with more checks
* this works
* the 6s in the right place
* reliable now
* fix after reboot
* set config
* 1s timeouts
* close to fw loading
* streams
* usbhub works
* endpoints
* fix
* want to test tiny10
* move to tiny 10
* fix gpu
* ugly speed
* smth
* mostly broken, but signals and dmas
* do not reset gpu every time
* changes to run kernels
* ugh, not working
* t10
* pg and sc files
* some prog
* um?
* somehow it works
* patched for 24
* some tries
* minimal
* moving
* back to working
* so sloooooow
* move to controller
* usb.py rewrite
* rework
* cleaner 1
* cleaner 2
* cleaner 3
* new abstractions
* aft merge
* init controller
* cleaner 4
* cleaner 5
* patcher + tiny changes
* ignore that
* cleaner 6
* after rebase
* cleaner 7
* bring it back
* start linter war
* linter 2
* autogen was missing
* fix autogen
* typing
* better?
* mypy
* extra/legacy rename and cleaner
* shuffle
* better printing
* tiny changes and tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-01 18:03:47 +03:00
nimlgen
7573c0ef4e
amd,nv: use .cpu_view() in bind ( #10131 )
2025-05-01 17:46:12 +03:00
nimlgen
16e5376ae8
line limit 12800 for usb ( #10130 )
2025-05-01 16:57:44 +03:00
qazal
0c59c6b8c7
remove replace from Tensor assign [pr] ( #10127 )
...
* remove replace from Tensor assign
* assign is contiguous
* allow chaining view
* only assert axis
2025-05-01 19:37:55 +08:00
nimlgen
9caceda79a
amd: comgr is not required ( #10128 )
2025-05-01 13:41:44 +03:00
nimlgen
c3d2e4a6e1
amd: use sdma to copy program ( #10126 )
...
* amd: use sdma to copy program
* rm
* ensure prog is copies
* match nv style
2025-05-01 13:04:22 +03:00
nimlgen
09f5be9bcb
amd: finalize device in case of failures ( #10124 )
2025-05-01 10:41:15 +03:00
George Hotz
ef011ff5f9
flip Ops.COPY order [pr] ( #10122 )
...
* flip Ops.COPY order [pr]
* fix copy and support multi device copy in _device
2025-05-01 00:26:24 -04:00
chenyu
145e51247a
split CAST and BITCAST in PYTHON [pr] ( #10123 )
...
CAST only needs truncate and does not require dtype fmt. added bfloat16 tests can run locally
2025-04-30 23:27:35 -04:00
Ignacio Sica
bf5fb97498
fix AMD_LLVM bf16 tc for gfx1100 ( #10102 )
...
* fix amd_llvm bf16 tc
* cleanup pattern
2025-04-30 20:06:38 -03:00
George Hotz
dd0070daab
Revert "flip Ops.COPY order [pr] ( #10120 )" ( #10121 )
...
This reverts commit 984f09ac74.
2025-04-30 17:25:21 -04:00
George Hotz
984f09ac74
flip Ops.COPY order [pr] ( #10120 )
2025-04-30 16:50:18 -04:00
chenyu
17d4d258ea
simple symbolic slice in llama [pr] ( #10112 )
...
support slice that has step None and stop > start
2025-04-30 14:36:35 -04:00
nimlgen
b583ece8f3
amd: replace AMD_DRIVERLESS with AMD_IFACE ( #10116 )
...
* amd: replace AMD_DRIVERLESS with AMD_IFACE
* docs
* print direct err for amd_iface
* print for all
2025-04-30 20:22:02 +03:00
nimlgen
0e1beaf44f
nv: align copies + better test ( #10118 )
2025-04-30 20:09:53 +03:00
Ignacio Sica
2941537250
cast is noop if src has dtypes.void ( #10110 )
2025-04-30 13:55:41 -03:00
nimlgen
fcdda4fc09
am: move boot memory to vram start ( #10115 )
2025-04-30 19:12:19 +03:00
nimlgen
844d5577d8
hcq: make copy_bufs and kernargs_size params configurable per device ( #10114 )
2025-04-30 18:43:50 +03:00
nimlgen
2ec3b722e2
nv: fix copies larger than 4g ( #10117 )
2025-04-30 18:43:17 +03:00
George Hotz
d81acbeef6
multi: move shrink after copy ( #10109 )
...
* multi: move shrink after copy
* passing now
2025-04-30 10:29:51 -04:00
qazal
67bd8489ad
grouper cleanups [pr] ( #10113 )
2025-04-30 18:54:47 +08:00
nimlgen
b4c9a3d8f4
hcq: use mmio iface in copies ( #10111 )
...
* hcq: use mmio iface in copies
* linter
* fix_am
* am
2025-04-30 11:05:13 +03:00
nimlgen
5c7d004da5
hcq: refactor int ptrs to hcqbuffers ( #10105 )
...
* hcq: refactor int ptrs to hcqbuffers
* more refactors
* linter
* use in allocator
* test fiz
* fx
* ops
* final?
* simpler
* keep this for now
2025-04-30 00:12:18 +03:00
chenyu
573bbb9746
Revert "remove TransformerBlock contiguous in llama ( #10104 )" ( #10108 )
...
This reverts commit b8d07dcc54.
2025-04-29 15:28:38 -04:00
chenyu
4a04098389
fix llama3 with nf4 quantize ( #10107 )
...
also int8 outputs is wrong
2025-04-29 15:14:36 -04:00
George Hotz
9c1b80499f
names for graph rewrites + null device supports exp and friends ( #10106 )
2025-04-29 14:28:20 -04:00