nimlgen
|
14c88abf27
|
add some options to allreduce bench (#9348)
|
2025-03-04 23:46:36 +03:00 |
|
nimlgen
|
93fb50ce77
|
allreduce: add flags (#8713)
|
2025-01-22 17:44:31 +03:00 |
|
chenyu
|
6a7f971fa0
|
hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553)
|
2025-01-10 12:57:44 -05:00 |
|
chenyu
|
2cbb34535c
|
simpler allreduce script [pr] (#8551)
time everything on tensor level and get time from GlobalCounters.time_sum_s
|
2025-01-09 21:38:13 -05:00 |
|
chenyu
|
23c56817d8
|
update and clean up allreduce script [pr] (#8549)
make `run` able to run with ring only
|
2025-01-09 19:35:28 -05:00 |
|
chenyu
|
85a4397f27
|
fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]
because i didn't know how to use it...
* increase time limit because tiny17 is slow
|
2025-01-07 01:30:01 -05:00 |
|
chenyu
|
0061dc7447
|
fix benchmark allreduce and add to ci [pr] (#8521)
|
2025-01-07 00:37:59 -05:00 |
|
George Hotz
|
8a04a3a77a
|
rename LazyBuffer -> UOp [pr] (#8169)
* rename LazyBuffer -> UOp [pr]
* fix docs
|
2024-12-11 16:15:52 -08:00 |
|
qazal
|
e84d089ef1
|
delete ReduceOps, only use REDUCE_AXIS (#7667)
|
2024-11-13 19:04:27 +08:00 |
|
George Hotz
|
4df5c7a4ef
|
move lazy to engine [pr] (#6886)
* move lazy to engine [pr]
* engine.lazy
|
2024-10-04 23:19:26 +08:00 |
|
nimlgen
|
f768935be8
|
add RING_ALLREDUCE_THRESHOLD (#5835)
* add RING_ALLREDUCE_THRESHOLD
* benchmark
* fixes
* fix n_gpus
* unused import
* remove debug=2
|
2024-07-31 16:13:09 +03:00 |
|
George Hotz
|
5ba611787d
|
move image into tensor.py. delete features (#4603)
* move image into tensor.py
* change setup.py
* openpilot tests need pythonpath now
|
2024-05-15 10:50:25 -07:00 |
|
chenyu
|
c71627fee6
|
move GlobalCounter to helpers (#4002)
break circular import between ops and buffer
|
2024-03-30 00:30:30 -04:00 |
|
George Hotz
|
68ca4d4276
|
split to schedule.py (#3949)
* split to schedule.py
* split
|
2024-03-26 21:02:46 -07:00 |
|
George Hotz
|
150ea2eb76
|
create engine folder and move code (#3948)
* retry
* older tf
* that
|
2024-03-26 20:38:03 -07:00 |
|
uuuvn
|
6729f20aab
|
Ring allreduce try 2 (#3852)
* Ring allreduce v3
* Configurable size, number of gpus and jit in benchmark
* ScheduleBarrier v0
* GB/s that make sense
* ScheduleBarrier v0.1
* Fallback on 2 GPUs
* ScheduleBarrier v0.2
* ScheduleBarrier v0.3
* ScheduleBarrier v0.3.1
* ScheduleBarrier v0.3.2
* Replace ScheduleBarrier with automatic optimization
* unused import
* fix comment
* typing
* better fallback
* python 3.8
* RING=2 and use ContextVar
* DEBUG >= 2 and change name
* linter
* type
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
|
2024-03-21 19:17:51 -04:00 |
|
George Hotz
|
8cb5215885
|
Revert "Ring allreduce in multitensor (#3000)" (#3840)
This reverts commit c5bf9e4c96.
|
2024-03-20 11:41:49 -07:00 |
|
uuuvn
|
c5bf9e4c96
|
Ring allreduce in multitensor (#3000)
* Ring allreduce v3
* Configurable size, number of gpus and jit in benchmark
* ScheduleBarrier v0
* GB/s that make sense
* ScheduleBarrier v0.1
* Fallback on 2 GPUs
* ScheduleBarrier v0.2
* ScheduleBarrier v0.3
* ScheduleBarrier v0.3.1
* ScheduleBarrier v0.3.2
* Replace ScheduleBarrier with automatic optimization
* unused import
* fix comment
* typing
* better fallback
* python 3.8
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
|
2024-03-20 11:20:01 -07:00 |
|