Commit Graph

1087 Commits

George Hotz
577a0b4cfa openpilot compile4 (wip) (#10407)
* openpilot compile4

* add copies

* remove junk
2025-05-22 10:47:34 -07:00
chenyu
67d1364106 update LOGMLPERF in red resnet run_and_time (#10416) 2025-05-19 13:23:33 -04:00
qazal
90eb3c0e5d add MobileNetV2 benchmark to comma CI (#10250)
* add MobileNetV2 to comma CI

* symlink imagenet

* also the signature

* comment that out

* need imagenetmock

* same train and test set

* quantize on CPU=1

* verbose

* need __hexagon_divsf3

* 0x858d6c15

* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
chenyu
485e80da69 run_and_time for resnet ci (#10405) 2025-05-18 23:39:57 -04:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
George Hotz
0b733ba75e multi device training with GPT2 [pr] (#10375)
* multi device training with GPT2 [pr]

* Update grouper.py
2025-05-17 15:33:56 -07:00
wozeparrot
12a1ccc680 clean: double import (#10345) 2025-05-15 20:15:09 -07:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
George Hotz
568d6d96e7 small changes from new multi [pr] (#10318) 2025-05-14 20:50:59 -07:00
George Hotz
bfc30fa6ea hotfix: typo in shm_name 2025-05-14 19:34:52 -07:00
George Hotz
2bc54b3e22 manually handle OSX 2025-05-14 19:17:51 -07:00
George Hotz
ab460486d7 Revert "resnet dataloader osx (#10316)"
This reverts commit aef336930a.
2025-05-14 19:15:07 -07:00
George Hotz
aef336930a resnet dataloader osx (#10316)
* mlperf dataloader on mac

* resnet dataloader [pr]

* simple should work
2025-05-14 18:31:26 -07:00
chenyu
fbaa26247a randn_like in minrf (#10298)
tested that it trains to a similar loss
2025-05-14 07:59:50 -04:00
George Hotz
98c84a711d min rectified flow example [pr] (#10252)
* work on minrf example

* more

* jit sample

* t is tensor not const

* fixes

* more convs

* fix dropout

* don't print

* 504

* big patch

* onehot

* touch

* use embeddings

* dumb uses final layer

* act

* non fl

* match

* tp

* 3

* of

* ppsz

* normal

* add adln

* no t

* weird transformer

* weird transformer

* contig

* actual speed fix

* dumb

* cb

* 0

* t is 0

* mort-t

* args

* dumb days are over

* readable

* contig

* no more t mask

* mask_t

* init to zero

* clean

* steps

* work

* tt

* t

* solid
2025-05-11 18:36:44 -07:00
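
For context on what the minrf example implements: rectified flow trains a model to predict the straight-line velocity between data and noise. A minimal sketch of that objective (illustrative only, not the actual example's code; `model` is an assumed callable taking the noised batch and the time):

```
from tinygrad import Tensor

def rf_loss(model, x0: Tensor) -> Tensor:
  x1 = Tensor.randn(*x0.shape)            # noise endpoint of the path
  t = Tensor.rand(x0.shape[0], 1, 1, 1)   # per-sample time in [0, 1)
  xt = (1 - t) * x0 + t * x1              # straight-line interpolation between data and noise
  v = model(xt, t)                        # model predicts the velocity field
  return ((v - (x1 - x0)) ** 2).mean()    # regress onto the true velocity x1 - x0
```
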
Adam Van Ymeren
a28ca0680f update dead link (#10242) 2025-05-09 19:59:52 -04:00
Rory Clear
9f2931ae67 Fix yolo load failing silently (#10046)
* wait for js before loading model

* use f32

* revert html changes, try both cameras and remove f16 req

* clean
2025-05-07 11:46:09 -07:00
Kevin Buhler
363481e2fb correct misspelled words (#10165) 2025-05-05 08:12:41 -07:00
chenyu
4a04098389 fix llama3 with nf4 quantize (#10107)
also, int8 outputs are wrong
2025-04-29 15:14:36 -04:00
qazal
a59d18da21 hack for VIZ=1 with examples/llama (#10103)
* hack for VIZ=1 with examples/llama

* move it alongside BEAM=0
2025-04-29 23:42:17 +08:00
chenyu
3eba3d6ee9 don't pass model in convert_from_huggingface and convert_from_gguf (#10094)
it only needs n_layers
2025-04-28 20:11:19 -04:00
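
The pattern here is narrowing a parameter from a whole object to the one field it reads. A hedged sketch of the change described above (hypothetical signatures; only the model -> n_layers swap is stated in the commit message):

```
# Hypothetical before/after; the exact tinygrad signatures may differ.
# Before: the converter took the whole model just to count its layers.
def convert_from_huggingface(weights, model, n_heads, n_kv_heads):
  n_layers = len(model.layers)  # the only thing `model` was used for
  ...

# After: pass the single value that is actually needed.
def convert_from_huggingface(weights, n_layers, n_heads, n_kv_heads):
  ...
```
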
chenyu
610ee79b22 cherry pick mlperf5.0 branch to master (#10089) 2025-04-28 15:36:56 -04:00
George Hotz
b341296304 hotfix: save sdxl ram 2025-04-27 12:09:45 -04:00
George Hotz
68c5f7ba80 load fast in sdxl (#10072)
* load fast in sdxl

* back to that with the ret

* no context
2025-04-27 11:58:51 -04:00
George Hotz
4b8ef6ce78 hotfix: sdxl corealize 2025-04-27 10:41:46 -04:00
George Hotz
1253819151 make beautiful indexing use a Variable (#10063)
* make beautiful indexing use a Variable

* stunning test

* better color

* training is broken

* fix tests

* fix variable indexing

* fix test

* no contiguous

* revert that

* revert that too

* indexing two bind

* skip for webgpu

* make not slow
2025-04-27 08:22:38 -04:00
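
The Variable mechanism this commit applies to indexing is a symbolic value with declared bounds that gets bound to a concrete value per call, so one kernel is compiled and reused instead of one per index. A sketch of that mechanism (illustrative; that plain `t[i]` accepts a bound Variable directly is an assumption here):

```
from tinygrad import Tensor, Variable

t = Tensor.arange(10)
i = Variable("i", 0, 9).bind(3)   # symbolic index with declared bounds, bound to 3
print(t[i].item())                # expected 3; one kernel serves every binding of i
```
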
Rory Clear
a13a43c4fe yolo 416 to 640 res (#10047) 2025-04-26 20:45:58 -04:00
George Hotz
ea5dddc537 reduce collapse generic (#10045)
* reduce collapse generic

* new arange folder

* new range folding

* correct with sym

* all tests pass

* indexing ops passes

* failing tests

* fix tests, remove unused

* revert that

* torch indexing is fast

* skip on webgpu

* touchups

* comments
2025-04-26 09:13:24 -04:00
Rory Clear
3a189fa561 More yolo processing in tinygrad (#9928)
* more tg less np

* update webgpu html for new compile

* resize boxes

* remove text

* add back note

* fix indentation

* fix indentation

* remove magic num

* remove now unused funcs

* back to numpy nms

* no loop

* fix iou suppression

* update test

* dont suppress other classes

* add working scale

* fix expected value, rounded up 0.24 was being counted

* add postprocess bool for onnx test

* fix indents

* clean

* clean

* fix indent

* remove print

* fix indent

* remove unused import

* remove hardcoded 0.25

* space

* spacing

* clean label_predictions func

* remove single item lists

* space

* use postprocess output in test

* space

* clean

* clean

* remove redundant threshold

* remove redundant threshold

* clean

* rename var

* move loop into func

* unhardcode iou_threshold

* remove unused values

* clean

* add note

* clean

* keep const

* move back funcs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
chenyu
74c6cf8be3 lint mlperf model_train (#10038) 2025-04-24 16:19:44 -04:00
chenyu
a25abf55e3 retinanet only call postprocess_detections with RUNMLPERF (#10017)
during setup we only need to compile `_eval_step().numpy()`
2025-04-23 20:45:38 -04:00
chenyu
65faa1d94b explicit device in mlperf scripts (#10015) 2025-04-23 17:11:52 -04:00
chenyu
a3f938dbee remove retinanet INITMLPERF from beam script (#10011)
it only controls logging; whether real data is loaded is solely controlled by RUNMLPERF
2025-04-23 14:32:54 -04:00
Francis Lata
5542aeb0e4 RetinaNet MLPerf flag updates (#10009)
* add RUNMLPERF and update INITMLPERF usage

* update scripts to use RUNMLPERF
2025-04-23 13:00:34 -04:00
George Hotz
de0504276b pop 0 is slow [pr] (#10007) 2025-04-23 17:00:59 +01:00
chenyu
d3a8d5c128 print postprocess_detections time in retinanet eval (#10005)
`BS=96 BASEDIR="/raid/datasets/openimages" MODEL=retinanet python examples/mlperf/model_eval.py`

```
...
loaded dataset             @  8.64s
loaded initial data        @ 12.57s
******  619.97 ms to enqueue, 46042.13 ms to realize ( 116.22 ms fetching, 45399.58 ms postprocess_detections).     0.09 examples/sec.  0.83 TFLOPS  @ 59.23s
******  147.49 ms to enqueue, 37362.16 ms to realize ( 146.96 ms fetching, 36618.84 ms postprocess_detections).     0.11 examples/sec.  1.03 TFLOPS  @ 96.74s
******  152.85 ms to enqueue, 37244.08 ms to realize ( 120.67 ms fetching, 36235.19 ms postprocess_detections).     0.11 examples/sec.  1.04 TFLOPS  @ 134.14s
******  146.39 ms to enqueue, 37279.85 ms to realize (  65.07 ms fetching, 36233.56 ms postprocess_detections).     0.11 examples/sec.  1.04 TFLOPS  @ 171.56s
******  152.41 ms to enqueue, 37264.04 ms to realize ( 127.08 ms fetching, 36196.10 ms postprocess_detections).     0.11 examples/sec.  1.04 TFLOPS  @ 208.98s
******  151.29 ms to enqueue, 36868.08 ms to realize ( 142.73 ms fetching, 36153.07 ms postprocess_detections).     0.11 examples/sec.  1.05 TFLOPS  @ 246.00s
******  136.41 ms to enqueue, 37325.04 ms to realize (  90.29 ms fetching, 36573.38 ms postprocess_detections).     0.11 examples/sec.  1.04 TFLOPS  @ 283.46s
```
2025-04-23 11:39:56 -04:00
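
The enqueue/realize split in those numbers follows from tinygrad's lazy execution: building and scheduling the graph is separate from running it. A generic way to get such a breakdown (a sketch, not the actual model_eval.py code; `_eval_step` and `batch` are assumed names):

```
import time

st = time.perf_counter()
out = _eval_step(batch)                        # lazy: builds and enqueues the graph
enqueue_ms = (time.perf_counter() - st) * 1e3
st = time.perf_counter()
result = out.numpy()                           # realize: runs the kernels and copies back
realize_ms = (time.perf_counter() - st) * 1e3
print(f"{enqueue_ms:7.2f} ms to enqueue, {realize_ms:8.2f} ms to realize")
```
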
chenyu
c39128133c retinanet green scripts (#9996)
also removed realize in data_get and used empty for fake data, with a slightly larger lr. https://wandb.ai/chenyuxyz/MLPerf-RetinaNet/runs/8skid0e8?nw=nwuserchenyuxyz
2025-04-23 08:28:03 -04:00
chenyu
fb89d9a584 retinanet eval combine output on GPUS[0] (#9966)
eval 35 sec -> 20 sec; it was spending 13 seconds assembling the output tensor on the CPU backend. GPUS[0] seems to have enough memory; otherwise we can lower EVAL_BS
2025-04-22 07:43:51 -04:00
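
The fix is about where the shards get assembled: concatenate per-device outputs on a GPU instead of round-tripping each shard through the CPU backend. A hedged sketch of the idea (`per_device_outputs` and the device list are hypothetical names):

```
from tinygrad import Device, Tensor

GPUS = [f"{Device.DEFAULT}:{i}" for i in range(2)]  # illustrative device list

# per_device_outputs: one output shard per device (hypothetical name).
# Move every shard to GPUS[0] and concatenate there; only the final
# combined tensor crosses back to the host.
outs = [o.to(GPUS[0]) for o in per_device_outputs]
combined = outs[0].cat(*outs[1:], dim=0)
```
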
chenyu
5294c32279 dev scripts for retinanet (#9968)
also BASE_DIR -> BASEDIR for consistency, and moved wandb up a bit for more accurate timing
2025-04-21 17:54:56 -04:00
Francis Lata
defa1e77f6 get the proper dataset count (#9962) 2025-04-21 12:11:37 -04:00
Francis Lata
d7e247f329 RetinaNet INITMLPERF support (#9950)
* fixes to make fake data work

* fix eval beam

* fix merge issue
2025-04-21 10:32:05 -04:00
Francis Lata
ea4cb2c715 small cleanups (#9947) 2025-04-20 20:33:20 -04:00
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
the new API looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
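
Since the heuristic now returns a `list[Opt]` rather than mutating the kernel, the opts can be inspected or filtered before being applied. A sketch of the call pattern quoted above (`k` is assumed to be an already-built Kernel; imports are omitted since the module path isn't given in the log):

```
opts = hand_coded_optimizations(k)  # pure function: returns a list[Opt], no kernel mutation
print(opts)                         # the opts can be inspected or filtered first
k.apply_opts(opts)                  # equivalent to k.apply_opts(hand_coded_optimizations(k))
```
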
chenyu
3fdba48fc7 update bert green and README (#9934)
submission candidate
2025-04-18 21:21:28 -04:00
chenyu
617b45748f fuse embedding for bert on red (#9925)
also updated the BEAM params and used the AMD driver for the actual run. 535 ms/step
2025-04-18 07:20:25 -04:00
chenyu
e2ed673c94 FUSE_ARANGE_UINT to not fuse uint (#9915)
a hack to bypass rand; allows FUSE_ARANGE on green for 6 ms per step
2025-04-16 18:49:38 -04:00
chenyu
e8024c8281 faster bert global_norm (#9901)
tinyamd is 2% faster. also updated beam params, which is another 2-3% faster.

updated the mlperf doc and steps too
2025-04-15 18:24:44 -04:00
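
For reference, the global norm here is the usual gradient-clipping quantity: the 2-norm over all gradients taken together. A generic sketch of that formulation (not the commit's faster version):

```
from tinygrad import Tensor

def global_norm(grads: list[Tensor]) -> Tensor:
  # sqrt of the summed squared 2-norms of every gradient tensor
  return Tensor.stack(*[(g * g).sum() for g in grads]).sum().sqrt()
```
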
Sieds Lykles
91ccf1c343 Off by one error in start_pos (#9792)
Variable upper bound is inclusive
2025-04-15 15:07:13 -04:00
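
The gotcha behind this fix: `Variable(name, min, max)` treats `max` as inclusive, so bounding a position by the buffer length admits an index one past the end. A sketch (MAX_CONTEXT is an illustrative value):

```
from tinygrad import Variable

MAX_CONTEXT = 1024  # illustrative value
# The upper bound is INCLUSIVE: bounding by MAX_CONTEXT would admit
# start_pos == MAX_CONTEXT, one slot past the end of the cache.
start_pos = Variable("start_pos", 0, MAX_CONTEXT - 1)
```
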
Francis Lata
31483050c0 add eval_freq flag (#9894) 2025-04-15 06:42:40 -04:00
chenyu
43d3a75d6c increase bert max train_steps (#9883) 2025-04-14 08:53:44 -04:00