tinygrad/tinygrad
chenyu 1891ebb655 make ring allreduce chunks a multiple of 2^n if possible (#4302)
In resnet, instead of chunking as [43691, 43691, 43691, 43691, 43690, 43690], chunk as [43712, 43712, 43680, 43680, 43680, 43680]. Every new chunk size is a multiple of 32, so those kernels can use a local size of 32.

More than 2x faster for the applicable kernels, and 1% faster overall for resnet.
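
The chunk sizes quoted above can be reproduced with a short sketch. This is an illustration only, not necessarily the code in the commit: the function names `chunk_lengths` and `naive_chunk_lengths` and the candidate factor list are assumptions, and the total of 262144 elements is simply the sum of the chunk sizes in the message.

```python
def naive_chunk_lengths(dim: int, n_chunks: int) -> list[int]:
    # Hypothetical helper: split dim elements as evenly as possible,
    # giving the first `left` chunks one extra element.
    base, left = divmod(dim, n_chunks)
    return [base + 1 if i < left else base for i in range(n_chunks)]


def chunk_lengths(dim: int, n_chunks: int, factors=(32, 16, 8, 4, 2, 1)) -> list[int]:
    # Hypothetical helper: pick the largest candidate power of two that
    # divides dim, then split evenly in units of that factor so every
    # chunk length is a multiple of it.
    factor = max(f for f in factors if dim % f == 0)
    base, left = divmod(dim // factor, n_chunks)
    return [(base + 1) * factor if i < left else base * factor for i in range(n_chunks)]


# 262144 elements split across 6 devices, as in the resnet example.
print(naive_chunk_lengths(262144, 6))  # [43691, 43691, 43691, 43691, 43690, 43690]
print(chunk_lengths(262144, 6))        # [43712, 43712, 43680, 43680, 43680, 43680]
```

Both splits cover the same 262144 elements. The second trades a slightly less even split (a spread of 32 elements instead of 1) for chunk lengths that are all divisible by 32.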
2024-04-25 23:45:28 -04:00