Commit Graph

1033 Commits

Francis Lata
335d11281c add multi device support to retinanet eval 2025-01-29 10:00:46 -08:00
Francis Lata
fc957e7377 update validation dataloader and more cleanups 2025-01-28 03:02:17 -08:00
Francis Lata
3b9e5a3ed4 debug test 2025-01-27 08:35:36 -08:00
Francis Lata
2177053076 Merge branch 'master' into retinanet_mlperf 2025-01-27 08:07:19 -08:00
Francis Lata
e91733baae refactor training loop and start work on the val loop 2025-01-27 07:27:19 -08:00
George Hotz
e82ba1454b MultiLazyBuffer is UOp [pr] (#8662)
* MultiLazyBuffer is UOp [pr]

* this is new mlb

* this is the idea

* progress

* multitensor works

* more movement ops

* this

* MultiLazyBuffer is UOp

* cleanups

* multi axis

* fix more tests

* work

* not that

* add multi grad and move shard to ops

* mops not views

* no double contig

* sweet, all mt tests passing

* port old logic

* remove lbs

* fix realized

* whitespace

* assign tweak

* test_assign_kv_cache_multi passes

* fix is_realized

* fix JIT for multi

* just a few more lines i'll pay them back soon i swear please bro just a few more

* no split reduceop for multi
2025-01-24 13:28:55 +09:00
chenyu
eb77488f85 update llama3 70B to use R1 (#8733) 2025-01-23 19:06:05 -05:00
chenyu
af65331b76 update beam params for bert green [pr] (#8726)
increase BEAM_UPCAST_MAX and BEAM_LOCAL_MAX to the defaults, matching red. 3% faster per step
2025-01-22 22:00:05 -05:00
Francis Lata
6fdcaa178b add checkpointing and training resume capabilities 2025-01-22 14:20:17 -08:00
Francis Lata
95cdbbf237 add jit to the training loop 2025-01-22 12:31:29 -08:00
Francis Lata
efe64ebeaf enable lr scheduler and fix benchmark timing 2025-01-22 09:56:38 -08:00
chenyu
9a9079118e envvar BERT_LAYERS [pr] (#8709)
default is 24 for large
2025-01-21 22:49:19 -05:00
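
A minimal sketch of how an envvar knob like this typically reads in tinygrad (the surrounding code is illustrative, not the commit's exact diff):

    # illustrative: read BERT_LAYERS with tinygrad's getenv helper, default 24 (BERT-large)
    from tinygrad.helpers import getenv

    BERT_LAYERS = getenv("BERT_LAYERS", 24)
    print(f"building a model with {BERT_LAYERS} encoder layers")

Running with BERT_LAYERS=4 in the environment then gives a shallower model, e.g. for debugging.
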
chenyu
9f6d545a16 bert log global_norm in training step [pr] (#8708)
* bert log global_norm in training step [pr]

and minor cleanups

* .item()
2025-01-21 20:36:27 -05:00
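
A hedged sketch of the logged quantity; the gradient list below is a stand-in for the model's real gradients:

    # sketch: global gradient norm, pulled to the host with .item() for logging
    from tinygrad import Tensor

    grads = [Tensor.randn(128, 128), Tensor.randn(128)]   # stand-ins for real grads
    global_norm = sum(g.square().sum() for g in grads).sqrt()
    print(f"global_norm: {global_norm.item():.4f}")       # .item() -> python float
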
Francis Lata
d1bc4aef94 do not realize when sharding model weights 2025-01-21 13:45:35 -08:00
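
A minimal sketch of the idea, assuming two devices (the weight is a stand-in): shard lazily and let first use realize the tensor.

    # sketch: shard a weight across devices without forcing a .realize()
    from tinygrad import Tensor, Device

    GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))
    weight = Tensor.empty(1024, 1024)   # stand-in for a model weight
    weight.shard_(GPUS, axis=0)         # in-place shard; no .realize() here
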
Francis Lata
7f331d8836 fix dataloader script 2025-01-21 13:43:59 -08:00
Francis Lata
1bf5ee286b Revert "debug dataset test failure"
This reverts commit 1b2f9d7f50.
2025-01-21 13:30:12 -08:00
Francis Lata
1b2f9d7f50 debug dataset test failure 2025-01-21 13:23:50 -08:00
Francis Lata
7815d3ddff Merge branch 'master' into retinanet_mlperf 2025-01-21 13:06:04 -08:00
chenyu
1e283c33d3 remove realize in bert model init [pr] (#8707) 2025-01-21 14:11:03 -05:00
Francis Lata
bf36006ff0 set seed 2025-01-20 22:54:54 -08:00
Francis Lata
5d9a604963 add support for BENCHMARK 2025-01-20 22:47:23 -08:00
Francis Lata
be2e97260d fix dtype for anchor inside dataloader and fix horizontal flip transformation 2025-01-20 22:45:25 -08:00
Francis Lata
cd511384e2 move anchors as part of dataloader 2025-01-20 13:13:16 -08:00
geohotstan
dd82b4c913 make onnx runner a class (#8647)
* this

* clean up

* more clean ups and improve debug msg

* more correct training toggler

* remove manual training toggling

* change some variable names

* actually just add the training toggle for LIMIT envvar too

* more refinement

* __call__ and OnnxRunner

* fix half of the pylint issues; the other half is importing from onnx while this file is onnx.py, figure out later

* ahhhh found another mistake

* remove limit from __call__

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
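
A hedged usage sketch of the class-based runner; the import path and constructor shape here are assumptions, check extra/onnx.py in this tree for the actual API:

    # assumed import path and constructor; __call__ runs inference (per this PR)
    import onnx
    from extra.onnx import OnnxRunner
    from tinygrad import Tensor

    runner = OnnxRunner(onnx.load("model.onnx"))           # build the graph once
    out = runner({"input": Tensor.ones(1, 3, 224, 224)})   # reuse per call
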
Francis Lata
575c748d94 fix wandb resuming feature 2025-01-20 07:22:16 -08:00
Francis Lata
a90a6e624d add wandb 2025-01-20 07:07:51 -08:00
Francis Lata
9402872d90 Merge branch 'master' into retinanet_mlperf 2025-01-20 06:51:12 -08:00
chenyu
c49e0fca60 GlobalCounters.reset() in sdxl step [pr] (#8664) 2025-01-17 21:10:28 -05:00
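
What the reset buys, as a minimal sketch (the step function is a placeholder):

    # sketch: reset counters so each step's op/memory totals are measured alone
    from tinygrad.helpers import GlobalCounters

    def run_step(): ...   # placeholder for one sdxl step

    GlobalCounters.reset()
    run_step()
    print(f"{GlobalCounters.global_ops*1e-9:.2f} GOPS, {GlobalCounters.global_mem*1e-9:.2f} GB")
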
Francis Lata
4bc762120e Merge branch 'master' into retinanet_mlperf 2025-01-15 02:45:21 -08:00
chenyu
930728c069 bert BS 72->66 [pr] (#8621)
72 does not fit now
2025-01-14 18:41:41 -05:00
Francis Lata
b957b023fc Merge branch 'master' into retinanet_mlperf 2025-01-13 09:33:38 -08:00
geohotstan
4abe631b56 fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?

* move test up

* share function

* add qgemm too

* make sure qgemm comes out as int

* actually that note is not right

* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
Francis Lata
aeecce1d18 Merge branch 'master' into retinanet_mlperf 2025-01-13 07:02:19 -08:00
chenyu
994944920b simpler batch_load_train_bert [pr] (#8582)
that buffer doesn't seem beneficial: without it, data_time is 5% faster and each step is 1ms faster.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
4ac4c1415a free intermediate buffers in the jit [pr] (#8581)
* free intermediate buffers in the jit [pr]

* intermediates_freed

* deallocate if not allocated

* self._first_run is simpler
2025-01-12 15:41:41 -08:00
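
For context, a minimal TinyJit sketch (shapes illustrative); h below is the kind of intermediate buffer this change lets the JIT free:

    from tinygrad import Tensor, TinyJit

    @TinyJit
    def f(x: Tensor) -> Tensor:
      h = (x @ x.T).relu()     # intermediate: only alive between kernels
      return h.sum().realize()

    x = Tensor.rand(64, 64).realize()
    for _ in range(3): f(x)    # early calls capture, later calls replay
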
chenyu
def90b22f6 EVAL_BS=36 for bert [pr] (#8576)
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
Francis Lata
f7537e4db2 Merge branch 'master' into retinanet_mlperf 2025-01-10 05:46:04 -08:00
George Hotz
9833fe83d8 more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]

* working quantization

* static quant

* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0 more working (#8550) 2025-01-09 18:40:08 -08:00
chenyu
b6be407bc6 fix handcode_opt bert [pr] (#8509)
* fix handcode_opt bert [pr]

* too slow
2025-01-05 19:14:12 -05:00
George Hotz
24de25b52f example to benchmark onnx [pr] (#8459)
* example to benchmark onnx [pr]

* reset global count
2024-12-31 11:38:33 -05:00
qazal
866dfa1f23 create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
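
The migration in miniature (old call shown in a comment):

    from tinygrad import Tensor

    x = Tensor.ones(4) + 1
    # before: create_schedule([x.lazydata])
    sched = x.schedule()   # the list of ScheduleItems needed to realize x
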
Francis Lata
40d6752854 adjust regression loss to mask after L1 loss is calculated 2024-12-27 17:41:12 +00:00
Francis Lata
cc4a673aa9 Merge branch 'master' into retinanet_mlperf 2024-12-26 21:11:55 +00:00
Calum
d8b08790b9 Fix examples/conversation.py (#8425)
* fix: conversation example

* remove slice func

* remove unused import

* use Tensor.split
2024-12-26 12:45:19 -05:00
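
A hedged sketch of the Tensor.split swap (sizes illustrative):

    from tinygrad import Tensor

    t = Tensor.arange(10)
    a, b = t.split([6, 4])   # replaces hand-rolled slicing like t[:6], t[6:]
    print(a.numpy(), b.numpy())
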
chenyu
4712847766 make self_tokenize output more like a python file (#8411)
use a comment for the file name and join with newlines instead of null bytes when exporting to a file
2024-12-25 14:16:30 -05:00
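
A minimal sketch of the described output format; the function and names are illustrative, not the script's actual code:

    # illustrative: comment header per file, newline-joined (no NUL separators)
    def bundle(files: dict[str, str]) -> str:
      return "\n".join(f"# {name}\n{src}" for name, src in files.items())

    print(bundle({"tinygrad/tensor.py": "class Tensor: ...", "tinygrad/ops.py": "..."}))
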
chenyu
a35eef8d58 optionally output to file in self_tokenize.py (#8399)
the whole tinygrad repo can be pasted into Gemini this way
2024-12-24 21:09:26 -05:00
Francis Lata
c1a18e13ef make training work 2024-12-23 21:48:55 +00:00
Francis Lata
d1627d0b1b start re-enabling training step 2024-12-23 19:43:20 +00:00
Francis Lata
44abfbcacb Merge branch 'master' into retinanet_mlperf 2024-12-23 05:36:37 +00:00