8752 Commits

Author SHA1 Message Date
qazal
9210280811 add v_fmac_f16 vop3 instruction to remu (#10247)
* fmac vop3

* from the box
2025-05-10 23:48:25 +03:00
George Hotz
697259a8a1 amd_comgr_action_info_set_options was deprecated [pr] (#10245)
* amd_comgr_action_info_set_options was deprecated [pr]

* more standard
2025-05-10 11:59:04 -07:00
Kevin Buhler
2e0990c4e9 even spacing in viz nodes (#10168)
* even spacing in viz nodes

* precise dy value

* dominant-baseline text-after-edge

* add STROKE_WIDTH constant, delete dominant_baseline attr

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-10 10:35:10 +03:00
chenyu
d0e9b74f40 minor div_and_mod_folding cleanup [pr] (#10243)
remove type ignore and one walrus
2025-05-09 22:42:01 -04:00
Adam Van Ymeren
a28ca0680f update dead link (#10242)
2025-05-09 19:59:52 -04:00
nimlgen
2145bce3f9 usbgpu: copyin size is 16k (#10240)
* usbgpu: copyin size is 16k

* ush
2025-05-09 22:12:54 +03:00
Sieds Lykles
74e40aafa0 use cdiv in div and mod folding (#10216)
* use cdiv

* use cdiv and cmod there as well

* Add tests

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-09 12:37:24 -04:00
Sieds Lykles
8da9c070ca take gcd out of trunc div (#10238)
2025-05-09 12:08:10 -04:00
qazal
e2292f6663 TRACEMETA>=2 displays UOp metadata in VIZ (#10237)
2025-05-09 17:42:00 +03:00
qazal
d5686f33a9 delete KernelContext dataclass [pr] (#10236)
2025-05-09 17:36:21 +03:00
qazal
467daf8d4c remap UOp metadata in graph_rewrite_map [pr] (#10234)
* remap metadata in graph_rewrite_map [pr]

* fix

* merge loops

* UOp.metadata returns Metadata|None

* shorter
2025-05-09 17:20:53 +03:00
nimlgen
4c75b124b6 usb: copy into mv is faster (#10233)
* usb: copy into mv is faster

* missing

* bytes
2025-05-09 14:53:36 +03:00
nimlgen
d08ce62553 hcq: do not reread signal in wait (#10232)
2025-05-09 14:38:36 +03:00
nimlgen
0464a31000 usbgpu: no overrun check needed (#10231)
2025-05-09 14:20:24 +03:00
nimlgen
116390083f nvme speed write example (#10230)
2025-05-09 14:20:01 +03:00
chenyu
9846435c2e fix test_div_numerator_negative (#10229)
the simplification was wrong with negative const_factor
2025-05-09 06:19:59 -04:00
chenyu
cba508c8c3 update uop symbolic tests (#10228)
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319 better bound for mod negative number (#10227)
2025-05-09 01:19:47 -04:00
chenyu
99f6d89dfb tighter idiv bound for symbolic denominator (#10226)
2025-05-08 22:38:56 -04:00
uuuvn
82a6160ff7 Detect metal paravirtualization bug via device name instead of CI (#10225)
2025-05-08 19:31:47 -07:00
Xingyu
a21369d039 Enhance tensor random functions with dtype support (#10214)
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py

* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
2025-05-08 20:48:07 -04:00
qazal
b6904bbf83 Revert "split grouper into insert and finalize stages [pr] (#10222)" (#10224)
This reverts commit 2594e4db15.
2025-05-09 03:02:38 +03:00
qazal
2594e4db15 split grouper into insert and finalize stages [pr] (#10222)
2025-05-09 02:36:22 +03:00
George Hotz
0b7e3e86d0 single device copy [pr] (#10221)
* single device copy [pr]

* simpler
2025-05-08 15:23:22 -07:00
qazal
1d0f239df7 use Tensor.train() in schedule test + typo [pr] (#10220)
2025-05-08 23:46:42 +03:00
qazal
ff2aa6d0b2 buffer in create_kernel is optional [pr] (#10218)
* buffer in create_kernel is optional [pr]

* pylint
2025-05-08 22:35:55 +03:00
qazal
40560e77c2 minor grouper + viz fixup [pr] (#10217)
* minor grouper + viz fixup [pr]

* gitignore mypy_cache

* reorder create_kernels

* replace with realized

* use tensor_map + viz before spec

* lint

* add that back
2025-05-08 21:39:44 +03:00
George Hotz
0411b09763 small changes from new multi [pr] (#10213)
2025-05-08 07:04:27 -07:00
Sieds Lykles
a0580e8d3c Cleanup in div_and_mod_folding [pr] (#10178)
* Refactor binary var simplification

* Simplify the congruence logic

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-08 06:25:32 -07:00
nimlgen
267ba9b592 usbgpu: better names in copy speed benchmark (#10212)
2025-05-08 16:12:37 +03:00
hooved
7b4f05fd00 Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
e24fe1c746 usbgpu: pci cache (#10207)
2025-05-08 14:31:01 +03:00
nimlgen
7d6ed1b1e9 hotfix: mac ci (#10210)
* fixed?

* cmnt
2025-05-08 14:13:23 +03:00
nimlgen
ba52fce4b2 usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark

* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
qazal
d0e3449992 remove view_supported_devices, check allocator instead [pr] (#10209)
2025-05-08 11:45:02 +03:00
nimlgen
5a7f6b4d8e am: fix launch on rdna4 (#10206)
2025-05-08 09:46:12 +03:00
George Hotz
8d4c563c01 all COPY can be clone (#10205)
* match old behavior

* simple

* it means the naive thing before the multi

* fix
2025-05-07 20:31:39 -07:00
hooved
8e76c40aea Refactor test: Enable generality in testing UOp alu expressions (#10200)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test

* isolate test refactor
2025-05-07 19:39:44 -07:00
George Hotz
83efc5d5bb lil changes from multi [pr] (#10202)
2025-05-07 14:42:30 -07:00
Rory Clear
9f2931ae67 Fix yolo load failing silently (#10046)
* wait for js before loading model

* use f32

* revert html changes, try both cameras and remove f16 req

* clean
2025-05-07 11:46:09 -07:00
uuuvn
10c9ede6b7 Cloud graph (#9876)
2025-05-07 11:41:41 -07:00
Sieds Lykles
2891892834 Fold constant variable (#10196)
* Add rule

* add test and comment

* merge rule
2025-05-07 11:39:44 -07:00
Sieds Lykles
8386527bb9 Take neg out of idiv (#10164)
* Add rules

* Fix tests

* Move rules lower to prevent recursion
2025-05-07 11:39:08 -07:00
qazal
e6c80a9e40 hotfix: early kwargs.pop('err') (#10197)
* hotfix: early kwargs.pop('err')

* err, no container
2025-05-07 23:53:26 +08:00
qazal
3bc72f02d9 better error message for linearizer failures in viz [pr] (#10195)
2025-05-07 23:11:44 +08:00
Sieds Lykles
09544d4556 Add rule and test (#10189)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-07 10:15:55 -04:00
nimlgen
c603b86d69 usbgpu: move queues to controller (#10194)
2025-05-07 16:41:16 +03:00
nimlgen
0fbe494c6b usb: cache writes into 0xa000 (#10191)
* usb: cache writes into 0xa000

* mock

* match parent spec

* ugh
2025-05-07 16:03:35 +03:00
nimlgen
b8fb0f11ff hcq: parametrize signal allocation size (#10192)
2025-05-07 15:50:43 +03:00
nimlgen
685d5c46df usbgpu: send pci write in batches (#10190)
* usbgpu: send pci write in batches

* mock
2025-05-07 14:41:56 +03:00