15 Commits

Author SHA1 Message Date
wozeparrot
f73468d516 fa: block skipping for fa kv bwd (#14569) 2026-02-05 16:13:53 -08:00
wozeparrot
c1ea6687e5 fa: simpler is faster (#14548) 2026-02-05 01:13:17 -08:00
wozeparrot
bbcd3d67a3 fa: faster (#14453) 2026-02-02 21:34:17 -08:00
wozeparrot
c2fb8b208f fa: 32 block size (#14416) 2026-01-29 13:59:13 -08:00
wozeparrot
d74587f16d fa multi fix 2 (#14314) 2026-01-23 23:35:02 -08:00
wozeparrot
76a9242a66 fa: merge kv bwd into one kernel (#14277) 2026-01-21 15:24:41 -08:00
wozeparrot
1f89eaf790 tk: fa bert mask fix + some numerical stability improvements (#14214) 2026-01-19 19:18:07 -08:00
wozeparrot
a879b54234 tk: fa jit fix (#14170) 2026-01-16 16:38:45 -08:00
wozeparrot
7e5687f6a3 more fa multi fix (#14152) 2026-01-14 13:57:11 -08:00
wozeparrot
a92778aa0c tk: fa multi fix (#14134) 2026-01-13 17:22:15 -08:00
wozeparrot
9f082e8e25 fa: split kv bwd into 2 kernels (#13981) 2026-01-02 18:45:51 -08:00
wozeparrot
ecbac8a338 tk: fa cleanups + causal test (#13963) 2026-01-01 18:05:00 -08:00
chenyu
80b84f5267 ruff lint tinykitten (#13762) 2025-12-19 14:31:00 -05:00
deleted used import and double spaces. a few ignore to not change the real code
wozeparrot
99e667bdcd tk fa bwd (#13480) 2025-12-17 23:56:37 -08:00
wozeparrot
93f1baca77 feat: tk fa in tensor (#13580) 2025-12-05 14:36:29 -08:00