b1tg 45e2f916a3 add quantize fp8 in llama3 (#12893)
* add quantize fp8 in llama3

* don't truncate fp8 alu result

* cast to float32 before matmul

* --model weights/LLaMA-3/8B-SF-DPO/

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-27 10:22:57 -04:00