Francis Lata
cd511384e2
move anchors as part of dataloader
2025-01-20 13:13:16 -08:00
Francis Lata
575c748d94
fix wandb resuming feature
2025-01-20 07:22:16 -08:00
Francis Lata
a90a6e624d
add wandb
2025-01-20 07:07:51 -08:00
Francis Lata
9402872d90
Merge branch 'master' into retinanet_mlperf
2025-01-20 06:51:12 -08:00
chenyu
c49e0fca60
GlobalCounters.reset() in sdxl step [pr] (#8664)
2025-01-17 21:10:28 -05:00
Francis Lata
4bc762120e
Merge branch 'master' into retinanet_mlperf
2025-01-15 02:45:21 -08:00
chenyu
930728c069
bert BS 72->66 [pr] (#8621)
72 does not fit now
2025-01-14 18:41:41 -05:00
Francis Lata
b957b023fc
Merge branch 'master' into retinanet_mlperf
2025-01-13 09:33:38 -08:00
geohotstan
4abe631b56
fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
Francis Lata
aeecce1d18
Merge branch 'master' into retinanet_mlperf
2025-01-13 07:02:19 -08:00
chenyu
994944920b
simpler batch_load_train_bert [pr] (#8582)
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
4ac4c1415a
free intermediate buffers in the jit [pr] (#8581)
* free intermediate buffers in the jit [pr]
* intermediates_freed
* deallocate if not allocated
* self._first_run is simpler
2025-01-12 15:41:41 -08:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] (#8576)
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
Francis Lata
f7537e4db2
Merge branch 'master' into retinanet_mlperf
2025-01-10 05:46:04 -08:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working (#8550)
2025-01-09 18:40:08 -08:00
chenyu
b6be407bc6
fix handcode_opt bert [pr] (#8509)
* fix handcode_opt bert [pr]
* too slow
2025-01-05 19:14:12 -05:00
George Hotz
24de25b52f
example to benchmark onnx [pr] (#8459)
* example to benchmark onnx [pr]
* reset global count
2024-12-31 11:38:33 -05:00
qazal
866dfa1f23
create_schedule([x.lazydata]) -> x.schedule() in tests (#8449)
2024-12-31 03:15:52 +08:00
Francis Lata
40d6752854
adjust regression loss to mask after L1 loss is calculated
2024-12-27 17:41:12 +00:00
Francis Lata
cc4a673aa9
Merge branch 'master' into retinanet_mlperf
2024-12-26 21:11:55 +00:00
Calum
d8b08790b9
Fix examples/conversation.py (#8425)
* fix: conversation example
* remove slice func
* remove unused import
* use Tensor.split
2024-12-26 12:45:19 -05:00
chenyu
4712847766
make self_tokenize output more like a python file (#8411)
use comment for file name and join with newline instead of null byte when export to file
2024-12-25 14:16:30 -05:00
chenyu
a35eef8d58
optionally output to file in self_tokenize.py (#8399)
can paste the whole tinygrad in gemini this way
2024-12-24 21:09:26 -05:00
Francis Lata
c1a18e13ef
make training work
2024-12-23 21:48:55 +00:00
Francis Lata
d1627d0b1b
start re-enabling training step
2024-12-23 19:43:20 +00:00
Francis Lata
44abfbcacb
Merge branch 'master' into retinanet_mlperf
2024-12-23 05:36:37 +00:00
Francis Lata
8defd337d8
fixes after helper refactor cleanup
2024-12-23 05:30:03 +00:00
Francis Lata
fb689f7097
move BoxCoder to MLPerf helpers
2024-12-23 05:12:51 +00:00
Francis Lata
d57f7cc209
fix regression loss
2024-12-21 09:34:10 +00:00
Francis Lata
630267914f
implement regression loss
2024-12-20 23:38:25 +00:00
Francis Lata
971d10361f
revert anchors to use np
2024-12-20 16:56:50 +00:00
Francis Lata
67da37e390
simplify anchors batching
2024-12-20 15:47:41 +00:00
Harald Schäfer
7059459648
Openpilot compile: fix for openpilot use (#8338)
* compile3 changes
* merge conflict
* merge conflict
* give dm npy for now
* Revert "give dm npy for now"
This reverts commit bfd980da7d2c2bab5b073127442c361922032ba1.
* updates
* Always float32 floats
* Update compile3.py
* Update compile3.py
---------
Co-authored-by: ZwX1616 <zwx1616@gmail.com>
2024-12-19 19:43:15 -05:00
Francis Lata
759e1d6cbc
make anchors use Tensors
2024-12-19 22:20:03 +00:00
George Hotz
8f95b578f6
use Estimates class [pr] (#8319)
* use Estimates class [pr]
* frozen dataclass
2024-12-18 10:19:32 -08:00
George Hotz
37fa38d272
Revert "switch beautiful_mnist to use new optimizer [pr] (#8231)" (#8233)
This reverts commit e9ee39df22.
2024-12-13 19:07:09 -08:00
George Hotz
e9ee39df22
switch beautiful_mnist to use new optimizer [pr] (#8231)
* switch beautiful_mnist to use new optimizer [pr]
* fix abstractions3 + docs
* fix OptimizerGroup with schedule_step api
2024-12-13 18:27:16 -08:00
Francis Lata
43e1f33d33
make ClassificationHead loss work
2024-12-13 20:20:27 +00:00
Francis Lata
bf9a0609dc
remove masking support for sigmoid_focal_loss
2024-12-13 16:19:47 +00:00
Ahmed Harmouche
651f72442c
encapsulate the exported webgpu model (#8203)
2024-12-13 10:55:37 +01:00
chenyu
64a917b7eb
remove LAZYCACHE ContextVar [pr] (#8175)
also removed from resnet latest script
2024-12-11 22:02:52 -05:00
Francis Lata
2214d13b3d
add missing test and cleanup focal loss
2024-12-11 23:00:48 +00:00
chenyu
26e049ab40
add ALLOWED_READ_IMAGE=2131 to openpilot (#8166)
added as exact number check now as it's not clear if more/less than allowed is any better
2024-12-11 12:14:17 -08:00
Maxim Zakharov
e53a5bf0c3
StableDiffusion UI - convenient send via Enter (#8160)
2024-12-11 19:05:24 +01:00
Francis Lata
827b2114e2
update focal loss to support masking
2024-12-10 23:00:32 +00:00
Francis Lata
ddca00d17b
Merge branch 'master' into retinanet_mlperf
2024-12-10 16:35:00 +00:00
Francis Lata
e5bc0c0485
start some work on classification loss
2024-12-10 16:33:16 +00:00
George Hotz
f83d715f41
move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]
* test_vs_onnx
* test v torch works
* float16 won't compile on compile3
* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
a773c5a571
hotfix: default llama3 is 1B with download_model
2024-12-09 07:23:35 -08:00