qazal
9210280811
add v_fmac_f16 vop3 instruction to remu ( #10247 )
...
* fmac vop3
* from the box
2025-05-10 23:48:25 +03:00
George Hotz
697259a8a1
amd_comgr_action_info_set_options was deprecated [pr] ( #10245 )
...
* amd_comgr_action_info_set_options was deprecated [pr]
* more standard
2025-05-10 11:59:04 -07:00
Kevin Buhler
2e0990c4e9
even spacing in viz nodes ( #10168 )
...
* even spacing in viz nodes
* precise dy value
* dominant-baseline text-after-edge
* add STROKE_WIDTH constant, delete dominant_baseline attr
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-10 10:35:10 +03:00
chenyu
d0e9b74f40
minor div_and_mod_folding cleanup [pr] ( #10243 )
...
remove type ignore and one walrus
2025-05-09 22:42:01 -04:00
Adam Van Ymeren
a28ca0680f
update dead link ( #10242 )
2025-05-09 19:59:52 -04:00
nimlgen
2145bce3f9
usbgpu: copyin size is 16k ( #10240 )
...
* usbgpu: copyin size is 16k
* ush
2025-05-09 22:12:54 +03:00
Sieds Lykles
74e40aafa0
use cdiv in div and mod folding ( #10216 )
...
* use cdiv
* use cdiv and cmod there as well
* Add tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-09 12:37:24 -04:00
Sieds Lykles
8da9c070ca
take gcd out of trunc div ( #10238 )
2025-05-09 12:08:10 -04:00
qazal
e2292f6663
TRACEMETA>=2 displays UOp metadata in VIZ ( #10237 )
2025-05-09 17:42:00 +03:00
qazal
d5686f33a9
delete KernelContext dataclass [pr] ( #10236 )
2025-05-09 17:36:21 +03:00
qazal
467daf8d4c
remap UOp metadata in graph_rewrite_map [pr] ( #10234 )
...
* remap metadata in graph_rewrite_map [pr]
* fix
* merge loops
* UOp.metadata returns Metadata|None
* shorter
2025-05-09 17:20:53 +03:00
nimlgen
4c75b124b6
usb: copy into mv is faster ( #10233 )
...
* usb: copy into mv is faster
* missing
* bytes
2025-05-09 14:53:36 +03:00
nimlgen
d08ce62553
hcq: do not reread signal in wait ( #10232 )
2025-05-09 14:38:36 +03:00
nimlgen
0464a31000
usbgpu: no overrun check needed ( #10231 )
2025-05-09 14:20:24 +03:00
nimlgen
116390083f
nvme speed write example ( #10230 )
2025-05-09 14:20:01 +03:00
chenyu
9846435c2e
fix test_div_numerator_negative ( #10229 )
...
the simplification was wrong with negative const_factor
2025-05-09 06:19:59 -04:00
chenyu
cba508c8c3
update uop symbolic tests ( #10228 )
...
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319
better bound for mod negative number ( #10227 )
2025-05-09 01:19:47 -04:00
chenyu
99f6d89dfb
tighter idiv bound for symbolic denominator ( #10226 )
2025-05-08 22:38:56 -04:00
uuuvn
82a6160ff7
Detect metal paravirtualization bug via device name instead of CI ( #10225 )
2025-05-08 19:31:47 -07:00
Xingyu
a21369d039
Enhance tensor random functions with dtype support ( #10214 )
...
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py
* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
2025-05-08 20:48:07 -04:00
qazal
b6904bbf83
Revert "split grouper into insert and finalize stages [pr] ( #10222 )" ( #10224 )
...
This reverts commit 2594e4db15.
2025-05-09 03:02:38 +03:00
qazal
2594e4db15
split grouper into insert and finalize stages [pr] ( #10222 )
2025-05-09 02:36:22 +03:00
George Hotz
0b7e3e86d0
single device copy [pr] ( #10221 )
...
* single device copy [pr]
* simpler
2025-05-08 15:23:22 -07:00
qazal
1d0f239df7
use Tensor.train() in schedule test + typo [pr] ( #10220 )
2025-05-08 23:46:42 +03:00
qazal
ff2aa6d0b2
buffer in create_kernel is optional [pr] ( #10218 )
...
* buffer in create_kernel is optional [pr]
* pylint
2025-05-08 22:35:55 +03:00
qazal
40560e77c2
minor grouper + viz fixup [pr] ( #10217 )
...
* minor grouper + viz fixup [pr]
* gitignore mypy_cache
* reorder create_kernels
* replace with realized
* use tensor_map + viz before spec
* lint
* add that back
2025-05-08 21:39:44 +03:00
George Hotz
0411b09763
small changes from new multi [pr] ( #10213 )
2025-05-08 07:04:27 -07:00
Sieds Lykles
a0580e8d3c
Cleanup in div_and_mod_folding [pr] ( #10178 )
...
* Refactor binary var simplification
* Simplify the congruence logic
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-08 06:25:32 -07:00
nimlgen
267ba9b592
usbgpu: better names in copy speed benchmark ( #10212 )
2025-05-08 16:12:37 +03:00
hooved
7b4f05fd00
Add test for correctness of Infinity in WebGPU ( #10201 )
...
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
e24fe1c746
usbgpu: pci cache ( #10207 )
2025-05-08 14:31:01 +03:00
nimlgen
7d6ed1b1e9
hotfix: mac ci ( #10210 )
...
* fixed?
* cmnt
2025-05-08 14:13:23 +03:00
nimlgen
ba52fce4b2
usbgpu: benchmark in ci ( #10208 )
...
* usbgpu: benchmark
* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
qazal
d0e3449992
remove view_supported_devices, check allocator instead [pr] ( #10209 )
2025-05-08 11:45:02 +03:00
nimlgen
5a7f6b4d8e
am: fix launch on rdna4 ( #10206 )
2025-05-08 09:46:12 +03:00
George Hotz
8d4c563c01
all COPY can be clone ( #10205 )
...
* match old behavior
* simple
* it means the naive thing before the multi
* fix
2025-05-07 20:31:39 -07:00
hooved
8e76c40aea
Refactor test: Enable generality in testing UOp alu expressions ( #10200 )
...
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
* isolate test refactor
2025-05-07 19:39:44 -07:00
George Hotz
83efc5d5bb
lil changes from multi [pr] ( #10202 )
2025-05-07 14:42:30 -07:00
Rory Clear
9f2931ae67
Fix yolo load failing silently ( #10046 )
...
* wait for js before loading model
* use f32
* revert html changes, try both cameras and remove f16 req
* clean
2025-05-07 11:46:09 -07:00
uuuvn
10c9ede6b7
Cloud graph ( #9876 )
2025-05-07 11:41:41 -07:00
Sieds Lykles
2891892834
Fold constant variable ( #10196 )
...
* Add rule
* add test and comment
* merge rule
2025-05-07 11:39:44 -07:00
Sieds Lykles
8386527bb9
Take neg out of idiv ( #10164 )
...
* Add rules
* Fix tests
* Move rules lower to prevent recursion
2025-05-07 11:39:08 -07:00
qazal
e6c80a9e40
hotfix: early kwargs.pop('err') ( #10197 )
...
* hotfix: early kwargs.pop('err')
* err, no container
2025-05-07 23:53:26 +08:00
qazal
3bc72f02d9
better error message for linearizer failures in viz [pr] ( #10195 )
2025-05-07 23:11:44 +08:00
Sieds Lykles
09544d4556
Add rule and test ( #10189 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-07 10:15:55 -04:00
nimlgen
c603b86d69
usbgpu: move queues to controller ( #10194 )
2025-05-07 16:41:16 +03:00
nimlgen
0fbe494c6b
usb: cache writes into 0xa000 ( #10191 )
...
* usb: cache writes into 0xa000
* mock
* match parent spec
* ugh
2025-05-07 16:03:35 +03:00
nimlgen
b8fb0f11ff
hcq: parametrize signal allocation size ( #10192 )
2025-05-07 15:50:43 +03:00
nimlgen
685d5c46df
usbgpu: send pci write in batches ( #10190 )
...
* usbgpu: send pci write in batches
* mock
2025-05-07 14:41:56 +03:00