chenyu
ff05bff221
put bert data shard inside jit (#9160)
python time 45ms -> 9ms; it was spending the time scheduling the shard
also init bert data on CLANG since it comes from numpy, so we don't create the tensor on the default device and then shard it onto the GPUs
2025-02-18 10:36:54 -05:00
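The commit message above attributes the drop in Python time to no longer scheduling the shard on every step. The following is a toy model of that idea only, assuming a capture-and-replay style of JIT. `ToyJit`, `schedule`, `shard`, `step`, and `run` are hypothetical names invented for illustration. None of them are tinygrad APIs, and the counts are not measurements from the real training script.

```python
# Toy model only: none of these names are tinygrad APIs.

class ToyJit:
    """Runs the wrapped function once to capture its plan, then replays the cached plan."""

    def __init__(self, fn):
        self.fn = fn
        self.plan = None

    def __call__(self, counter, batch):
        if self.plan is None:
            # First call: the wrapped function runs and pays the scheduling cost.
            self.plan = self.fn(counter, batch)
        # Later calls: the cached plan is reused, so nothing is scheduled again.
        return self.plan


def schedule(counter, op):
    """Stand-in for Python-side scheduling work; counts how often it runs."""
    counter["scheduled"] += 1
    return op


def shard(counter, batch):
    """Hypothetical data shard step: one scheduled op."""
    return [schedule(counter, "shard")]


def step(counter, batch):
    """Hypothetical training step: two scheduled ops."""
    return [schedule(counter, "forward"), schedule(counter, "backward")]


def run(steps, shard_inside_jit):
    """Return how many times scheduling ran over `steps` iterations."""
    counter = {"scheduled": 0}
    if shard_inside_jit:
        jitted = ToyJit(lambda c, b: shard(c, b) + step(c, b))
        for i in range(steps):
            jitted(counter, i)
    else:
        jitted = ToyJit(step)
        for i in range(steps):
            shard(counter, i)  # scheduled again on every iteration
            jitted(counter, i)
    return counter["scheduled"]


if __name__ == "__main__":
    print("shard outside jit:", run(10, shard_inside_jit=False))
    print("shard inside jit: ", run(10, shard_inside_jit=True))
```

In this model the per-step scheduling cost disappears once the shard is captured together with the rest of the step, which is the effect the commit message describes. The real numbers depend on the framework and hardware.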