Commit Graph

10490 Commits

Author SHA1 Message Date
nimlgen
d08ce62553 hcq: do not reread signal in wait (#10232) 2025-05-09 14:38:36 +03:00
nimlgen
0464a31000 usbgpu: no overrun check needed (#10231) 2025-05-09 14:20:24 +03:00
nimlgen
116390083f nvme speed write example (#10230) 2025-05-09 14:20:01 +03:00
chenyu
9846435c2e fix test_div_numerator_negative (#10229)
the simplification was wrong with negative const_factor
2025-05-09 06:19:59 -04:00
chenyu
cba508c8c3 update uop symbolic tests (#10228)
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319 better bound for mod negative number (#10227) 2025-05-09 01:19:47 -04:00
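The sketch below shows one way to bound a remainder when the dividend may be negative. It is not tinygrad's code. It assumes C-style truncating semantics, where the remainder takes the sign of the dividend, and a positive constant divisor. `c_mod` and `mod_bounds` are hypothetical helpers. For a negative `x`, the result is `-(|x| % c)`, which is never below `x` and never below `-(c-1)`. That gives a range tighter than the naive `[-(c-1), c-1]` whenever the input range is narrow.

```python
def c_mod(x: int, c: int) -> int:
    """Truncating (C-style) remainder: the result takes the sign of the dividend."""
    r = abs(x) % abs(c)
    return -r if x < 0 else r

def mod_bounds(vmin: int, vmax: int, c: int) -> tuple[int, int]:
    """Sound [lo, hi] for c_mod(x, c) given vmin <= x <= vmax and a constant c > 0.

    Hypothetical helper: |result| <= min(|x|, c - 1), and the sign follows x.
    """
    assert c > 0 and vmin <= vmax
    lo = max(vmin, -(c - 1)) if vmin < 0 else 0
    hi = min(vmax, c - 1) if vmax > 0 else 0
    return lo, hi

# brute-force check that every remainder falls inside the computed range
for vmin in range(-10, 11):
    for vmax in range(vmin, 11):
        for c in range(1, 6):
            lo, hi = mod_bounds(vmin, vmax, c)
            assert all(lo <= c_mod(x, c) <= hi for x in range(vmin, vmax + 1))

print(mod_bounds(-7, 3, 4))  # (-3, 3)
print(mod_bounds(-2, 3, 8))  # (-2, 3), tighter than (-7, 7)
```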
chenyu
99f6d89dfb tighter idiv bound for symbolic denominator (#10226) 2025-05-08 22:38:56 -04:00
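The commit does not spell out how its idiv bound works. As a generic illustration only, here is one way to bound `x // d` when the denominator is itself a range. The sketch assumes truncating (C-style) division and a strictly positive denominator range. `c_idiv` and `idiv_bounds` are hypothetical helpers, not tinygrad's implementation. For a fixed positive `d`, the quotient is non-decreasing in `x`. For a fixed `x`, it is monotone in `d`. So the exact extremes over the box sit at its four corners.

```python
from itertools import product

def c_idiv(x: int, d: int) -> int:
    """Truncating (C-style) integer division: rounds toward zero."""
    q = abs(x) // abs(d)
    return -q if (x < 0) != (d < 0) else q

def idiv_bounds(xmin: int, xmax: int, dmin: int, dmax: int) -> tuple[int, int]:
    """Exact [lo, hi] of c_idiv(x, d) for xmin <= x <= xmax and 0 < dmin <= d <= dmax.

    Hypothetical helper: with a positive divisor the quotient is monotone in x,
    and for fixed x monotone in d, so min and max are attained at the corners.
    """
    assert 0 < dmin <= dmax and xmin <= xmax
    corners = [c_idiv(x, d) for x, d in product((xmin, xmax), (dmin, dmax))]
    return min(corners), max(corners)

# brute-force check on small ranges
for xmin, xmax, dmin, dmax in [(-10, 10, 1, 4), (-9, -2, 2, 5), (3, 17, 2, 3)]:
    vals = [c_idiv(x, d) for x in range(xmin, xmax + 1) for d in range(dmin, dmax + 1)]
    assert idiv_bounds(xmin, xmax, dmin, dmax) == (min(vals), max(vals))

print(idiv_bounds(-9, -2, 2, 5))  # (-4, 0)
```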
uuuvn
82a6160ff7 Detect metal paravirtualization bug via device name instead of CI (#10225) 2025-05-08 19:31:47 -07:00
Xingyu
a21369d039 Enhance tensor random functions with dtype support (#10214)
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py

* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
2025-05-08 20:48:07 -04:00
qazal
b6904bbf83 Revert "split grouper into insert and finalize stages [pr] (#10222)" (#10224)
This reverts commit 2594e4db15.
2025-05-09 03:02:38 +03:00
qazal
2594e4db15 split grouper into insert and finalize stages [pr] (#10222) 2025-05-09 02:36:22 +03:00
George Hotz
0b7e3e86d0 single device copy [pr] (#10221)
* single device copy [pr]

* simpler
2025-05-08 15:23:22 -07:00
qazal
1d0f239df7 use Tensor.train() in schedule test + typo [pr] (#10220) 2025-05-08 23:46:42 +03:00
qazal
ff2aa6d0b2 buffer in create_kernel is optional [pr] (#10218)
* buffer in create_kernel is optional [pr]

* pylint
2025-05-08 22:35:55 +03:00
qazal
40560e77c2 minor grouper + viz fixup [pr] (#10217)
* minor grouper + viz fixup [pr]

* gitignore mypy_cache

* reorder create_kernels

* replace with realized

* use tensor_map + viz before spec

* lint

* add that back
2025-05-08 21:39:44 +03:00
George Hotz
0411b09763 small changes from new multi [pr] (#10213) 2025-05-08 07:04:27 -07:00
Sieds Lykles
a0580e8d3c Cleanup in div_and_mod_folding [pr] (#10178)
* Refactor binary var simplification

* Simplify the congruence logic

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-08 06:25:32 -07:00
nimlgen
267ba9b592 usbgpu: better names in copy speed benchmark (#10212) 2025-05-08 16:12:37 +03:00
hooved
7b4f05fd00 Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
e24fe1c746 usbgpu: pci cache (#10207) 2025-05-08 14:31:01 +03:00
nimlgen
7d6ed1b1e9 hotfix: mac ci (#10210)
* fixed?

* cmnt
2025-05-08 14:13:23 +03:00
nimlgen
ba52fce4b2 usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark

* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
qazal
d0e3449992 remove view_supported_devices, check allocator instead [pr] (#10209) 2025-05-08 11:45:02 +03:00
nimlgen
5a7f6b4d8e am: fix launch on rdna4 (#10206) 2025-05-08 09:46:12 +03:00
George Hotz
8d4c563c01 all COPY can be clone (#10205)
* match old behavior

* simple

* it means the naive thing before the multi

* fix
2025-05-07 20:31:39 -07:00
hooved
8e76c40aea Refactor test: Enable generality in testing UOp alu expressions (#10200)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test

* isolate test refactor
2025-05-07 19:39:44 -07:00
George Hotz
83efc5d5bb lil changes from multi [pr] (#10202) 2025-05-07 14:42:30 -07:00
Rory Clear
9f2931ae67 Fix yolo load failing silently (#10046)
* wait for js before loading model

* use f32

* revert html changes, try both cameras and remove f16 req

* clean
2025-05-07 11:46:09 -07:00
uuuvn
10c9ede6b7 Cloud graph (#9876) 2025-05-07 11:41:41 -07:00
Sieds Lykles
2891892834 Fold constant variable (#10196)
* Add rule

* add test and comment

* merge rule
2025-05-07 11:39:44 -07:00
Sieds Lykles
8386527bb9 Take neg out of idiv (#10164)
* Add rules

* Fix tests

* Move rules lower to prevent recursion
2025-05-07 11:39:08 -07:00
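A rewrite that pulls a negation out of an integer division is only valid under some rounding conventions. The check below illustrates this. It assumes the IR's integer division truncates toward zero, as in C; that is an assumption, not something stated in the commit. Under that assumption, `(-x) // c == -(x // c)` holds for every nonzero `c`. Python's built-in floor division is shown as a counterexample. `c_idiv` is a hypothetical helper.

```python
def c_idiv(x: int, c: int) -> int:
    """Truncating (C-style) integer division: rounds toward zero."""
    q = abs(x) // abs(c)
    return -q if (x < 0) != (c < 0) else q

# under truncation, negation commutes with division: (-x) / c == -(x / c)
assert all(c_idiv(-x, c) == -c_idiv(x, c)
           for x in range(-20, 21) for c in (-5, -3, -1, 1, 2, 7))

# Python's floor division rounds toward -inf, so the same rewrite would be wrong there
assert (-7) // 2 == -4 and -(7 // 2) == -3
```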
qazal
e6c80a9e40 hotfix: early kwargs.pop('err') (#10197)
* hotfix: early kwargs.pop('err')

* err, no container
2025-05-07 23:53:26 +08:00
qazal
3bc72f02d9 better error message for linearizer failures in viz [pr] (#10195) 2025-05-07 23:11:44 +08:00
Sieds Lykles
09544d4556 Add rule and test (#10189)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-07 10:15:55 -04:00
nimlgen
c603b86d69 usbgpu: move queues to controller (#10194) 2025-05-07 16:41:16 +03:00
nimlgen
0fbe494c6b usb: cache writes into 0xa000 (#10191)
* usb: cache writes into 0xa000

* mock

* match parent spec

* ugh
2025-05-07 16:03:35 +03:00
nimlgen
b8fb0f11ff hcq: parametrize signal allocation size (#10192) 2025-05-07 15:50:43 +03:00
nimlgen
685d5c46df usbgpu: send pci write in batches (#10190)
* usbgpu: send pci write in batches

* mock
2025-05-07 14:41:56 +03:00
qazal
3a32fa228c refactor merge_views matcher [pr] (#10188) 2025-05-07 19:22:06 +08:00
qazal
94e07725a6 only reorder expand if it can fuse with input (#10186)
* failing test

* only reorder expand if it can fuse with input

* (16,) is reshaped to (4, 4)
2025-05-07 18:14:31 +08:00
qazal
4ea3e373aa decode lds ops in remu (#10184) 2025-05-07 16:44:18 +08:00
uuuvn
dba073e5c0 Less messy broken graph on paravirtualized metal workaround (#10182)
* Less messy broken graph on paravirtualized metal workaround

GitHub CI macOS runners use paravirtualized metal which is broken with
graph (some comments say that ICB in particular is broken but in my
testing it was fine sometimes, but other times hitting an assert inside
metal's code related to resources, so not sure).

> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.

This can be reproduced locally with any virtualization software (like utm)
that can create macOS VMs with apple's own virtualization framework.

* unused import
2025-05-06 20:41:02 +03:00
nimlgen
59c03e8904 usbgpu: tiny changes in setup pci bars to match spec (#10181)
* usbgpu: tiny changes in setup pci bars to match spec

* unused
2025-05-06 20:39:03 +03:00
Ignacio Sica
74c25bdc8b add support for ds_load_u8 in remu (#10180)
* add support for ds_load_u8 in remu

* add test for ds_load_u8
2025-05-06 20:31:00 +03:00
nimlgen
10f115fdb0 usbgpu: USB_RESCAN_BUS envvar (#10177) 2025-05-06 17:09:36 +03:00
nimlgen
781fd8c1eb usbgpu: some tlp error info (#10176)
* usbgpu: some tlp error info

* oops
2025-05-06 17:01:10 +03:00
nimlgen
aea1f77225 amd: uppercase amd_iface vals (#10175) 2025-05-06 15:12:50 +03:00
nimlgen
34d55857cf usbgpu: more devs in scan_pci (#10171) 2025-05-06 11:55:34 +03:00
nimlgen
37a7a99adb metal: fix graph when unrelated input buffers are not metal buffers (#10170)
* metal: fix graph when unrelated input buffers are not metal buffers

* tinier test
2025-05-06 11:37:16 +03:00
George Hotz
603c03bef2 fix tests for rewrite [pr] (#10167)
* fix tests for rewrite [pr]

* cleaner

* delete linearize_uop

* clean up the rest
2025-05-05 19:19:49 -07:00