* fix cuda on windows
* fix linter errors
* test github action install cuda-toolkit
* Revert "test github action install cuda-toolkit"
This reverts commit c18ad6f937.
* Revert "fix linter errors"
This reverts commit 00aa943e91.
* Revert "fix cuda on windows"
This reverts commit 7aea5256b1.
* fix windows, where sysconfig.get_config_var("MULTIARCH") is None
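A minimal sketch of the kind of guard this implies, not the actual change: `sysconfig.get_config_var` returns `None` for variables the current build does not define, so callers need a fallback before using the value as a string. The helper name `multiarch_or_default` and its empty-string default are hypothetical.

```python
import sysconfig

def multiarch_or_default(default: str = "") -> str:
    # Hypothetical helper: get_config_var returns None when the variable
    # is not defined for this interpreter build (e.g. on Windows).
    value = sysconfig.get_config_var("MULTIARCH")
    return value if value is not None else default

# Safe to use in string operations on every platform.
print(repr(multiarch_or_default()))
```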
* fix indentation in UOp pretty_print for repeated references
When a UOp was referenced multiple times, the walrus operator notation
(e.g., x0:=) was correctly used for the first occurrence, but subsequent
references had misaligned indentation due to an extra space character.
Fix indentation misalignment in pretty_print() when UOps are referenced
multiple times.
* add simple unit tests for UOp repr
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* speed up rdna3 unit tests
* add test to CI
* faster and simpler
* speedups
* bugfixes
* use helper
* fix CI maybe
* test fixes
* llvm-21 on 24.04
* upd
* llvm-21
* fix test
* bring that back
* merge gen into lib
* test generators
* write python emulator from RDNA3 pseudocode in pdf
* emu2
* more emu
* working
* more pseudocode
* progress
* cleanups
* delete junk
* delete stale files
* just emu
* work
* emu compare
* bemu
* cleanups and more failures
* revert bench emu
* fix emu cmp
* four tests fail
* bugfixes
* dsl
* ext
* refactor
* dsl
* div scale fix
* test_emu
* fix emu tests
* pcode
* test pcode
* top imports
* fix test_emu to use run_asm
* emu tests on real hardware
* more tests
* more emu tests
* more
* work
* work
* bug fix
* bugfixes
* fix fp16 gemm
* all ops tests pass in emulator
* fix llvm tests
* fix a few more tests
* fix mockgpu timeout
`UOp.render()` crashed with `IndexError: tuple index out of range` when
the UOp graph contained a `VECTORIZE` with empty `src=()`. This occurs
when reshaping to scalar shape `()`, e.g., `Tensor.ones(4).sum()`.
The bug was in the renderer's VECTORIZE pattern: `all_same(())` returns
`True` (vacuous truth), causing the code to access `x.src[0]` on an
empty tuple.
- Fix `IndexError` when calling `UOp.render()` on graphs containing
empty `VECTORIZE` nodes.
- Add test for empty `VECTORIZE` rendering.
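The vacuous-truth failure described above can be reproduced outside the renderer. This is a sketch under assumptions: `all_same` here is a stand-in written to match the described behavior, and `first_if_uniform_unguarded` / `first_if_uniform` are hypothetical names, not the renderer's actual pattern. The length check is one possible guard; the real fix may differ.

```python
def all_same(items):
    # Stand-in helper (assumption): True when every element equals the first.
    # For an empty tuple the generator yields nothing, so all() returns True
    # and items[0] is never evaluated.
    return all(x == items[0] for x in items)

def first_if_uniform_unguarded(src):
    # Mirrors the described bug: trusts all_same() and then indexes src[0].
    if all_same(src):
        return src[0]
    return None

def first_if_uniform(src):
    # Guarded variant: require a non-empty tuple before indexing.
    if len(src) > 0 and all_same(src):
        return src[0]
    return None

try:
    first_if_uniform_unguarded(())
except IndexError as e:
    print("unguarded:", e)

print("guarded:", first_if_uniform(()))
```

The guarded variant returns `None` for an empty tuple instead of raising, while non-empty uniform tuples still return their first element.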
* more llvm asm tests
* roundtrip test
* work
* more handwritten
* more handwritten
* work
* tests pass
* dual mov
* all tests pass
* all tests pass fast