Fix cuda tc emu test (#5663)

* fix acc folding for NV tensor cores

* fix correctness of reduce_before_expand

* fix test emulated CUDA tensor cores

* test_gemm_fp16 on some devices
This commit is contained in:
George Hotz
2024-07-23 15:04:25 -07:00
committed by GitHub
parent c34f9db0f7
commit e3f00ac77d
2 changed files with 6 additions and 3 deletions

View File

@@ -43,7 +43,7 @@ jobs:
PYTHONPATH=. DEBUG=2 EMULATE_AMD=1 FORWARD_ONLY=1 PYTHON=1 N=16 HALF=1 ACC_HALF=1 python3 ./extra/gemm/simple_matmul.py
PYTHONPATH=. DEBUG=2 EMULATE_AMD=1 FORWARD_ONLY=1 PYTHON=1 N=64 HALF=1 ACC_HALF=1 python3 ./extra/gemm/simple_matmul.py
- name: Test emulated CUDA tensor cores
run: DEBUG=2 EMULATE_CUDA=1 FORWARD_ONLY=1 PYTHON=1 python3 test/test_ops.py TestOps.test_gemm
run: DEBUG=2 EMULATE_CUDA=1 FORWARD_ONLY=1 PYTHON=1 python3 test/test_ops.py TestOps.test_gemm_fp16
- name: Full test tensor cores
run: |
PYTHONPATH=. DEBUG=2 EMULATE_METAL=1 FORWARD_ONLY=1 PYTHON=1 python3 ./test/test_linearizer.py TestLinearizer.test_tensor_cores