George Hotz
0d39bb5de1
rename to get_kernelize_map ( #10465 )
2025-05-22 11:44:44 -07:00
George Hotz
577a0b4cfa
openpilot compile4 (wip) ( #10407 )
...
* openpilot compile4
* add copies
* remove junk
2025-05-22 10:47:34 -07:00
chenyu
67d1364106
update LOGMLPERF in red resnet run_and_time ( #10416 )
2025-05-19 13:23:33 -04:00
qazal
90eb3c0e5d
add MobileNetV2 benchmark to comma CI ( #10250 )
...
* add MobileNetV2 to comma CI
* symlink imagenet
* also the signature
* comment that out
* need imagenetmock
* same train and test set
* quantize on CPU=1
* verbose
* need __hexagon_divsf3
* 0x858d6c15
* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
chenyu
485e80da69
run_and_time for resnet ci ( #10405 )
2025-05-18 23:39:57 -04:00
George Hotz
411392dfb7
move files into uop dir ( #10399 )
...
* move files into uop dir [pr]
* tinygrad.uop is a thing
* fix uop docs, no pr
* fix viz
2025-05-18 11:38:28 -07:00
George Hotz
0b733ba75e
multi device training with GPT2 [pr] ( #10375 )
...
* multi device training with GPT2 [pr]
* Update grouper.py
2025-05-17 15:33:56 -07:00
wozeparrot
12a1ccc680
clean: double import ( #10345 )
2025-05-15 20:15:09 -07:00
wozeparrot
1ed04f993b
move benchmark stat tracking to influxdb ( #10185 )
2025-05-15 16:14:56 -07:00
George Hotz
568d6d96e7
small changes from new multi [pr] ( #10318 )
2025-05-14 20:50:59 -07:00
George Hotz
bfc30fa6ea
hotfix: typo in shm_name
2025-05-14 19:34:52 -07:00
George Hotz
2bc54b3e22
manually handle OSX
2025-05-14 19:17:51 -07:00
George Hotz
ab460486d7
Revert "resnet dataloader osx ( #10316 )"
...
This reverts commit aef336930a.
2025-05-14 19:15:07 -07:00
George Hotz
aef336930a
resnet dataloader osx ( #10316 )
...
* mlperf dataloader on mac
* resnet dataloader [pr]
* simple should work
2025-05-14 18:31:26 -07:00
chenyu
fbaa26247a
randn_like in minrf ( #10298 )
...
tested that it trains to similar loss
2025-05-14 07:59:50 -04:00
George Hotz
98c84a711d
min rectified flow example [pr] ( #10252 )
...
* work on minrf example
* more
* jit sample
* t is tensor not const
* fixes
* more convs
* fix dropout
* don't print
* 504
* big patch
* onehot
* touch
* use embeddings
* dumb uses final layer
* act
* non fl
* match
* tp
* 3
* of
* ppsz
* normal
* add adln
* no t
* weird transformer
* weird transformer
* contig
* actual speed fix
* dumb
* cb
* 0
* t is 0
* mort-t
* args
* dumb days are over
* readable
* contig
* no more t mask
* mask_t
* init to zero
* clean
* steps
* work
* tt
* t
* solid
2025-05-11 18:36:44 -07:00
Adam Van Ymeren
a28ca0680f
update dead link ( #10242 )
2025-05-09 19:59:52 -04:00
Rory Clear
9f2931ae67
Fix yolo load failing silently ( #10046 )
...
* wait for js before loading model
* use f32
* revert html changes, try both cameras and remove f16 req
* clean
2025-05-07 11:46:09 -07:00
Kevin Buhler
363481e2fb
correct misspelled words ( #10165 )
2025-05-05 08:12:41 -07:00
chenyu
4a04098389
fix llama3 with nf4 quantize ( #10107 )
...
also int8 outputs are wrong
2025-04-29 15:14:36 -04:00
qazal
a59d18da21
hack for VIZ=1 with examples/llama ( #10103 )
...
* hack for VIZ=1 with examples/llama
* move it alongside BEAM=0
2025-04-29 23:42:17 +08:00
chenyu
3eba3d6ee9
don't pass model in convert_from_huggingface and convert_from_gguf ( #10094 )
...
it only needs n_layers
2025-04-28 20:11:19 -04:00
chenyu
610ee79b22
cherry pick mlperf5.0 branch to master ( #10089 )
2025-04-28 15:36:56 -04:00
George Hotz
b341296304
hotfix: save sdxl ram
2025-04-27 12:09:45 -04:00
George Hotz
68c5f7ba80
load fast in sdxl ( #10072 )
...
* load fast in sdxl
* back to that with the ret
* no context
2025-04-27 11:58:51 -04:00
George Hotz
4b8ef6ce78
hotfix: sdxl corealize
2025-04-27 10:41:46 -04:00
George Hotz
1253819151
make beautiful indexing use a Variable ( #10063 )
...
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow
2025-04-27 08:22:38 -04:00
Rory Clear
a13a43c4fe
yolo 416 to 640 res ( #10047 )
2025-04-26 20:45:58 -04:00
George Hotz
ea5dddc537
reduce collapse generic ( #10045 )
...
* reduce collapse generic
* new arange folder
* new range folding
* correct with sym
* all tests pass
* indexing ops passes
* failing tests
* fix tests, remove unused
* revert that
* torch indexing is fast
* skip on webgpu
* touchups
* comments
2025-04-26 09:13:24 -04:00
Rory Clear
3a189fa561
More yolo processing in tinygrad ( #9928 )
...
* more tg less np
* update webgpu html for new compile
* resize boxes
* remove text
* add back note
* fix indentation
* fix indentation
* remove magic num
* remove now unused funcs
* back to numpy nms
* no loop
* fix iou suppression
* update test
* dont suppress other classes
* add working scale
* fix expected value, rounded up 0.24 was being counted
* add postprocess bool for onnx test
* fix indents
* clean
* clean
* fix indent
* remove print
* fix indent
* remove unused import
* remove hardcoded 0.25
* space
* spacing
* clean label_predictions func
* remove single item lists
* space
* use postprocess output in test
* space
* clean
* clean
* remove redundant threshold
* remove redundant threshold
* clean
* rename var
* move loop into func
* unhardcode iou_threshold
* remove unused values
* clean
* add note
* clean
* keep const
* move back funcs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
chenyu
74c6cf8be3
lint mlperf model_train ( #10038 )
2025-04-24 16:19:44 -04:00
chenyu
a25abf55e3
retinanet only call postprocess_detections with RUNMLPERF ( #10017 )
...
during setup only need to compile `_eval_step().numpy()`
2025-04-23 20:45:38 -04:00
chenyu
65faa1d94b
explicit device in mlperf scripts ( #10015 )
2025-04-23 17:11:52 -04:00
chenyu
a3f938dbee
remove retinanet INITMLPERF from beam script ( #10011 )
...
it only controls logging; whether real data is loaded is solely controlled by RUNMLPERF
2025-04-23 14:32:54 -04:00
Francis Lata
5542aeb0e4
RetinaNet MLPerf flag updates ( #10009 )
...
* add RUNMLPERF and update INITMLPERF usage
* update scripts to use RUNMLPERF
2025-04-23 13:00:34 -04:00
George Hotz
de0504276b
pop 0 is slow [pr] ( #10007 )
2025-04-23 17:00:59 +01:00
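The `pop 0 is slow` change above points at a general Python pitfall: `list.pop(0)` shifts every remaining element left on each call, so draining a list from the front is O(n²) overall, while `collections.deque.popleft()` pops in O(1). A minimal generic sketch (plain Python, not the tinygrad code from the PR):

```python
from collections import deque

def drain_front_list(items: list) -> list:
    # list.pop(0) shifts all remaining elements left each call: O(n) per pop
    out = []
    while items:
        out.append(items.pop(0))
    return out

def drain_front_deque(items: list) -> list:
    # deque.popleft() removes from the front in O(1) per pop
    dq = deque(items)
    out = []
    while dq:
        out.append(dq.popleft())
    return out

print(drain_front_list([3, 1, 2]))   # [3, 1, 2]
print(drain_front_deque([3, 1, 2]))  # [3, 1, 2]
```

Both drain in the same order; only the asymptotic cost differs, which is why replacing front-pops on a list is a pure win.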
chenyu
d3a8d5c128
print postprocess_detections time in retinanet eval ( #10005 )
...
`BS=96 BASEDIR="/raid/datasets/openimages" MODEL=retinanet python examples/mlperf/model_eval.py`
```
...
loaded dataset @ 8.64s
loaded initial data @ 12.57s
****** 619.97 ms to enqueue, 46042.13 ms to realize ( 116.22 ms fetching, 45399.58 ms postprocess_detections). 0.09 examples/sec. 0.83 TFLOPS @ 59.23s
****** 147.49 ms to enqueue, 37362.16 ms to realize ( 146.96 ms fetching, 36618.84 ms postprocess_detections). 0.11 examples/sec. 1.03 TFLOPS @ 96.74s
****** 152.85 ms to enqueue, 37244.08 ms to realize ( 120.67 ms fetching, 36235.19 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 134.14s
****** 146.39 ms to enqueue, 37279.85 ms to realize ( 65.07 ms fetching, 36233.56 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 171.56s
****** 152.41 ms to enqueue, 37264.04 ms to realize ( 127.08 ms fetching, 36196.10 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 208.98s
****** 151.29 ms to enqueue, 36868.08 ms to realize ( 142.73 ms fetching, 36153.07 ms postprocess_detections). 0.11 examples/sec. 1.05 TFLOPS @ 246.00s
****** 136.41 ms to enqueue, 37325.04 ms to realize ( 90.29 ms fetching, 36573.38 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 283.46s
```
2025-04-23 11:39:56 -04:00
chenyu
c39128133c
retinanet green scripts ( #9996 )
...
also removed realize in data_get and used empty for fake data. slightly bigger lr. https://wandb.ai/chenyuxyz/MLPerf-RetinaNet/runs/8skid0e8?nw=nwuserchenyuxyz
2025-04-23 08:28:03 -04:00
chenyu
fb89d9a584
retinanet eval combine output on GPUS[0] ( #9966 )
...
eval 35 sec -> 20 sec. it was spending 13 seconds assembling output tensor on CPU backend. GPUS[0] seems to have enough memory, otherwise we can lower EVAL_BS
2025-04-22 07:43:51 -04:00
chenyu
5294c32279
dev scripts for retinanet ( #9968 )
...
also BASE_DIR -> BASEDIR for consistency, and move wandb up a bit for more accurate timing
2025-04-21 17:54:56 -04:00
Francis Lata
defa1e77f6
get the proper dataset count ( #9962 )
2025-04-21 12:11:37 -04:00
Francis Lata
d7e247f329
RetinaNet INITMLPERF support ( #9950 )
...
* fixes to make fake data work
* fix eval beam
* fix merge issue
2025-04-21 10:32:05 -04:00
Francis Lata
ea4cb2c715
small cleanups ( #9947 )
2025-04-20 20:33:20 -04:00
chenyu
6c30948df6
hand_coded_optimizations returns list[Opt] [pr] ( #9938 )
...
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
chenyu
3fdba48fc7
update bert green and README ( #9934 )
...
submission candidate
2025-04-18 21:21:28 -04:00
chenyu
617b45748f
fuse embedding for bert on red ( #9925 )
...
also updated BEAM param and use AMD driver for actual run. 535ms step
2025-04-18 07:20:25 -04:00
chenyu
e2ed673c94
FUSE_ARANGE_UINT to not fuse uint ( #9915 )
...
hack to bypass rand, can FUSE_ARANGE on green for 6ms per step
2025-04-16 18:49:38 -04:00
chenyu
e8024c8281
faster bert global_norm ( #9901 )
...
tinyamd 2% faster. also updated beam params, which is another 2-3% faster.
updated mlperf doc and steps too
2025-04-15 18:24:44 -04:00
Sieds Lykles
91ccf1c343
Off by one error in start_pos ( #9792 )
...
Variable upper bound is inclusive
2025-04-15 15:07:13 -04:00
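The start_pos fix above hinges on the convention named in the commit body: when a symbolic variable's upper bound is inclusive, the bound itself is a legal value, so there are `bound + 1` legal values, and treating the bound as exclusive drops one. A hypothetical sketch in plain Python (not tinygrad's `Variable` API):

```python
def legal_values(vmin: int, vmax_inclusive: int) -> list:
    # with an inclusive upper bound, vmax itself is legal, so Python's
    # exclusive range() must extend one past it
    return list(range(vmin, vmax_inclusive + 1))

# treating the inclusive bound as exclusive would yield 4 values, not 5:
# that is the off-by-one
print(len(legal_values(0, 4)))  # 5
```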
Francis Lata
31483050c0
add eval_freq flag ( #9894 )
2025-04-15 06:42:40 -04:00