An integrated environment for AMD GPU assembly and emulation.

Test with:

`PYTHONPATH="." pytest -n12 extra/assembly/amd/`

`AMD_LLVM=1 PYTHONPATH="." pytest -n12 extra/assembly/amd/`

* pdf.py -- extracts the assembly format and the instruction pseudocode from the AMD PDF
* dsl.py -- helpers for the autogenerated instruction classes in `__init__.py`; together with `__init__.py` it should be standalone
* pcode.py -- pseudocode execution environment; the pseudocode should be transformed as little as possible
* asm.py -- asm/disasm functions to convert to and from AMD assembly syntax
* emu.py -- an emulator for RDNA that runs in tinygrad with `AMD=1 MOCKGPU=1 PYTHON_REMU=1`

The code should be as readable and deduplicated as possible. asm and emu shouldn't be required for dsl.

test_emu.py has a good set of instruction tests for the emulator; with `USE_HW=1` it compares the results against real hardware. Whenever an instruction is fixed, regression tests should be added here and confirmed on real hardware.

test_llvm.py runs asm/disasm on the LLVM tests, confirming that they behave the same as LLVM.

tinygrad's dtype tests should pass with and without LLVM. They run in about 12 seconds.

`PYTHONPATH="." AMD=1 PYTHON_REMU=1 MOCKGPU=1 AMD_LLVM=0 pytest -n=12 test/test_dtype_alu.py test/test_dtype.py`

`PYTHONPATH="." AMD=1 PYTHON_REMU=1 MOCKGPU=1 AMD_LLVM=1 pytest -n=12 test/test_dtype_alu.py test/test_dtype.py`

The ops tests also pass, but they are very slow, so you should run them one at a time.

`SKIP_SLOW_TEST=1 PYTHONPATH="." AMD=1 PYTHON_REMU=1 MOCKGPU=1 AMD_LLVM=0 pytest -n=12 test/test_ops.py`

`SKIP_SLOW_TEST=1 PYTHONPATH="." AMD=1 PYTHON_REMU=1 MOCKGPU=1 AMD_LLVM=1 pytest -n=12 test/test_ops.py`

When something is caught by the main tinygrad tests, a local regression test should be added to `extra/assembly/amd/test`.

While working with tinygrad, you can dump the assembly with `DEBUG=7`.
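To illustrate the pcode.py goal of running the PDF pseudocode with as little transformation as possible, here is a minimal sketch. It assumes a hypothetical `TypedReg` class exposing `.u32`, `.i32` and `.f32` views of 32-bit storage, mimicking the `D0.u32 = S0.u32 + S1.u32` style of the AMD ISA pseudocode, and a hypothetical `run_pcode` helper; neither is the actual tinygrad code. With typed views in place, a statement can be executed by Python almost verbatim.

```python
import struct

class TypedReg:
  """Hypothetical 32-bit register exposing typed views (.u32, .i32, .f32),
  mimicking the D0.u32 / S0.f32 notation of the AMD ISA pseudocode."""
  def __init__(self, bits: int = 0) -> None:
    self.bits = bits & 0xFFFFFFFF

  @property
  def u32(self) -> int: return self.bits
  @u32.setter
  def u32(self, v: int) -> None: self.bits = v & 0xFFFFFFFF  # wrap modulo 2**32

  @property
  def i32(self) -> int: return self.bits - (1 << 32) if self.bits & 0x80000000 else self.bits
  @i32.setter
  def i32(self, v: int) -> None: self.bits = v & 0xFFFFFFFF  # store the two's complement bits

  @property
  def f32(self) -> float: return struct.unpack("<f", struct.pack("<I", self.bits))[0]
  @f32.setter
  def f32(self, v: float) -> None: self.bits = struct.unpack("<I", struct.pack("<f", v))[0]

def run_pcode(pcode: str, **regs: TypedReg) -> None:
  """Run pseudocode almost verbatim: the only rewrite is dropping each line's trailing ';'."""
  src = "\n".join(line.strip().rstrip(";") for line in pcode.strip().splitlines())
  exec(src, {}, dict(regs))

# usage: an unsigned 32-bit add wraps modulo 2**32
S0, S1, D0 = TypedReg(0xFFFFFFFF), TypedReg(2), TypedReg()
run_pcode("D0.u32 = S0.u32 + S1.u32;", S0=S0, S1=S1, D0=D0)
print(hex(D0.u32))  # 0x1
```

The design choice here is to push the semantics (width, signedness, float reinterpretation) into the register object, so the pseudocode text itself barely needs to change.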
The tinygrad tests above all pass on real hardware, so if a test fails with `AMD=1 PYTHON_REMU=1 MOCKGPU=1`, the likely cause is an incorrectly emulated instruction. You can run the test without `MOCKGPU=1` to test on real hardware; if it passes there, the bug is in the emulator.

Currently, only RDNA3 is well supported, but when finished, this will support RDNA3, RDNA4 and CDNA in ~2000 lines.
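The debugging rule above is differential testing: real hardware is the reference, and any disagreement points at the emulator. A minimal sketch of the idea follows. All names (`emu_v_max_i32`, `hw_v_max_i32`, `diff_test`) are hypothetical and not part of tinygrad. The "hardware" function is only a stand-in computing signed 32-bit max, and the emulator has a deliberately planted bug (comparing raw bit patterns as unsigned).

```python
from typing import Callable, Iterable

MASK32 = 0xFFFFFFFF

def to_i32(x: int) -> int:
  # reinterpret a 32-bit pattern as a signed two's complement integer
  return x - (1 << 32) if x & 0x80000000 else x

def emu_v_max_i32(a: int, b: int) -> int:
  # hypothetical emulator with a bug: compares raw bit patterns as unsigned
  return max(a, b)

def hw_v_max_i32(a: int, b: int) -> int:
  # stand-in for the real-hardware result: signed max, returned as a 32-bit pattern
  return max(to_i32(a), to_i32(b)) & MASK32

def diff_test(emulated: Callable[[int, int], int], reference: Callable[[int, int], int],
              inputs: Iterable[tuple[int, int]]) -> list[tuple[int, int, int, int]]:
  """Return (a, b, emulated_result, reference_result) for every input pair that disagrees."""
  mismatches = []
  for a, b in inputs:
    got, want = emulated(a, b), reference(a, b)
    if got != want: mismatches.append((a, b, got, want))
  return mismatches

# boundary values are where sign and wraparound bugs tend to show up
EDGES = [0, 1, 2, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
pairs = [(a, b) for a in EDGES for b in EDGES]
for a, b, got, want in diff_test(emu_v_max_i32, hw_v_max_i32, pairs)[:3]:
  print(f"v_max_i32({a:#x}, {b:#x}): emu={got:#x} hw={want:#x}")
```

In the workflow described above, the reference would be an actual hardware run (dropping `MOCKGPU=1`, or `USE_HW=1` in test_emu.py), and each mismatch found this way is a candidate regression test.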