mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-01-22 13:28:06 -05:00
still upcast before softmax, but faster because intermediate buffer can be stored in half (as long as qk is within half range).