nimlgen
5a7f6b4d8e
am: fix launch on rdna4 ( #10206 )
2025-05-08 09:46:12 +03:00
George Hotz
8d4c563c01
all COPY can be clone ( #10205 )
...
* match old behavior
* simple
* it means the naive thing before the multi
* fix
2025-05-07 20:31:39 -07:00
hooved
8e76c40aea
Refactor test: Enable generality in testing UOp alu expressions ( #10200 )
...
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
* isolate test refactor
2025-05-07 19:39:44 -07:00
George Hotz
83efc5d5bb
lil changes from multi [pr] ( #10202 )
2025-05-07 14:42:30 -07:00
Rory Clear
9f2931ae67
Fix yolo load failing silently ( #10046 )
...
* wait for js before loading model
* use f32
* revert html changes, try both cameras and remove f16 req
* clean
2025-05-07 11:46:09 -07:00
uuuvn
10c9ede6b7
Cloud graph ( #9876 )
2025-05-07 11:41:41 -07:00
Sieds Lykles
2891892834
Fold constant variable ( #10196 )
...
* Add rule
* add test and comment
* merge rule
2025-05-07 11:39:44 -07:00
Sieds Lykles
8386527bb9
Take neg out of idiv ( #10164 )
...
* Add rules
* Fix tests
* Move rules lower to prevent recursion
2025-05-07 11:39:08 -07:00
qazal
e6c80a9e40
hotfix: early kwargs.pop('err') ( #10197 )
...
* hotfix: early kwargs.pop('err')
* err, no container
2025-05-07 23:53:26 +08:00
qazal
3bc72f02d9
better error message for linearizer failures in viz [pr] ( #10195 )
2025-05-07 23:11:44 +08:00
Sieds Lykles
09544d4556
Add rule and test ( #10189 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-07 10:15:55 -04:00
nimlgen
c603b86d69
usbgpu: move queues to controller ( #10194 )
2025-05-07 16:41:16 +03:00
nimlgen
0fbe494c6b
usb: cache writes into 0xa000 ( #10191 )
...
* usb: cache writes into 0xa000
* mock
* match parent spec
* ugh
2025-05-07 16:03:35 +03:00
nimlgen
b8fb0f11ff
hcq: parametrize signal allocation size ( #10192 )
2025-05-07 15:50:43 +03:00
nimlgen
685d5c46df
usbgpu: send pci write in batches ( #10190 )
...
* usbgpu: send pci write in batches
* mock
2025-05-07 14:41:56 +03:00
qazal
3a32fa228c
refactor merge_views matcher [pr] ( #10188 )
2025-05-07 19:22:06 +08:00
qazal
94e07725a6
only reorder expand if it can fuse with input ( #10186 )
...
* failing test
* only reorder expand if it can fuse with input
* (16,) is reshaped to (4, 4)
2025-05-07 18:14:31 +08:00
qazal
4ea3e373aa
decode lds ops in remu ( #10184 )
2025-05-07 16:44:18 +08:00
uuuvn
dba073e5c0
Less messy broken graph on paravirtualized metal workaround ( #10182 )
...
* Less messy broken graph on paravirtualized metal workaround
GitHub CI macOS runners use paravirtualized Metal, which is broken with
graph (some comments say that ICB in particular is broken, but in my
testing it was sometimes fine and other times hit an assert inside
Metal's code related to resources, so not sure).
> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.
This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own virtualization framework.
* unused import
2025-05-06 20:41:02 +03:00
nimlgen
59c03e8904
usbgpu: tiny changes in setup pci bars to match spec ( #10181 )
...
* usbgpu: tiny changes in setup pci bars to match spec
* unused
2025-05-06 20:39:03 +03:00
Ignacio Sica
74c25bdc8b
add support for ds_load_u8 in remu ( #10180 )
...
* add support for ds_load_u8 in remu
* add test for ds_load_u8
2025-05-06 20:31:00 +03:00
nimlgen
10f115fdb0
usbgpu: USB_RESCAN_BUS envvar ( #10177 )
2025-05-06 17:09:36 +03:00
nimlgen
781fd8c1eb
usbgpu: some tlp error info ( #10176 )
...
* usbgpu: some tlp error info
* oops
2025-05-06 17:01:10 +03:00
nimlgen
aea1f77225
amd: uppercase amd_iface vals ( #10175 )
2025-05-06 15:12:50 +03:00
nimlgen
34d55857cf
usbgpu: more devs in scan_pci ( #10171 )
2025-05-06 11:55:34 +03:00
nimlgen
37a7a99adb
metal: fix graph when unrelated input buffers are not metal buffers ( #10170 )
...
* metal: fix graph when unrelated input buffers are not metal buffers
* tinier test
2025-05-06 11:37:16 +03:00
George Hotz
603c03bef2
fix tests for rewrite [pr] ( #10167 )
...
* fix tests for rewrite [pr]
* cleaner
* delete linearize_uop
* clean up the rest
2025-05-05 19:19:49 -07:00
wozeparrot
10437904cd
refactor: ops_cloud -> ops_remote [pr] ( #10166 )
2025-05-05 15:59:51 -07:00
uuuvn
b4dfb3ba78
A bit less convoluted jit capability detection ( #9923 )
...
Required for cloud multi stuff to know if graph supports transfers and
if graph is on the same host (multi-host graphs are not supported)
2025-05-05 09:16:18 -07:00
Sieds Lykles
338f33efae
Fast mod ( #10055 )
...
* Enable fast mod
* Add test
2025-05-05 09:15:43 -07:00
Kevin Buhler
363481e2fb
correct mispelled words ( #10165 )
2025-05-05 08:12:41 -07:00
nimlgen
98f4a831c8
am: do not alloc cwsr buffer ( #10163 )
2025-05-05 15:40:23 +03:00
qazal
ed552e99f6
cleaner llvmir masked load render [pr] ( #10161 )
2025-05-05 15:22:01 +08:00
qazal
62e86bc5ec
insert Ops.FUSE for arange ( #10140 )
...
* insert Ops.FUSE for arange
* reshape does not collapse
* do not fuse reshapes
* add children
* fixups
* work
* add Ops.WHERE support to z3
* fix fuse for cast
* diff
* ugh
* don't need this anymore
* contiguous
* add always_contiguous
* there too
2025-05-05 08:32:12 +03:00
qazal
cf626e23cb
rename to ALWAYS_CONTIGUOUS [pr] ( #10160 )
2025-05-05 08:14:56 +03:00
George Hotz
b68f036551
default on OSX is llvm 19 ( #10159 )
2025-05-04 18:13:50 -07:00
George Hotz
e07d8b147a
hotfix: don't OOM in the osx unit test
2025-05-04 17:53:55 -07:00
George Hotz
19cda7eb3a
bitcast isn't needed for llvm pointers ( #10158 )
...
* bitcast isn't needed for llvm pointers
* downgrade to llvm 19
* Revert "downgrade to llvm 19"
This reverts commit 3777801b4b.
* fix llvm 20 bug
* Revert "fix llvm 20 bug"
This reverts commit edc1e053fa.
2025-05-04 17:30:45 -07:00
George Hotz
a0240d8c2b
lil work on llvm speed ( #10157 )
...
* lil work on llvm speed
* llvm failing test
* 1e-4
* simpler failing test
* once is fine
* gpt suggests this syntax change
* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
36ccaa88a6
move merge views [pr] ( #10156 )
...
* move merge views [pr]
* move flow to __init__ [pr]
2025-05-04 14:41:47 -07:00
George Hotz
5f3f162606
cache rewrites for renderer [pr] ( #10155 )
...
* add caching to rewrites for renderer [pr]
* remove that
* update ebs
2025-05-04 13:45:15 -07:00
George Hotz
fe0724eebf
prebuild all rewrites [pr] ( #10154 )
...
* prebuild all rewrites [pr]
* fix that
* tests pass with linearizer
2025-05-04 13:01:18 -07:00
George Hotz
2b055cb59c
hotfix: add BLOCKFINAL op
2025-05-04 12:26:29 -07:00
George Hotz
c64fb31bb7
simple reduce w/o devectorize [pr] ( #10153 )
...
* simple reduce w/o devectorize [pr]
* useful upgrades
* refactor to horizontal
* or_broadcasted
2025-05-04 08:42:02 -07:00
quortus
b38be2588f
Graph support for LLVM ( #10029 )
...
* Graph support for LLVM
* Always inline functions
* Specify renderer class only once
* Improve formatting
* Fix indexing
* Rollback parameter name change
* Force CI rerun
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-04 06:52:55 -07:00
Sieds Lykles
848c7783a4
Sign check in div const div pattern ( #10150 )
...
* Add rule
* Relax the condition
* Add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-03 18:04:34 -04:00
nimlgen
cda72c1de6
amd: print full trace when AMD_IFACE is set ( #10149 )
2025-05-03 22:46:04 +03:00
George Hotz
7c33924a50
don't use real_size for mem_bytes [pr] ( #10147 )
2025-05-03 09:41:21 -04:00
qazal
42cbf7aed4
more viz cleanups + notes [pr] ( #10145 )
2025-05-03 15:57:56 +08:00
qazal
d8cc1fd2f8
viz minor fixups ( #10143 )
...
* properly scroll sub rewrites into view
* fix resizer for right sidebar
2025-05-03 03:50:14 +03:00