chenyu
81597ddd96
increase lr for bert ( #9098 )
...
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
chenyu
b58e7b1898
zero out the weight in bert init run ( #9076 )
...
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer OOMs. I think the buffers of randomly initialized weights caused the OOM.
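A minimal sketch of the idea, not the training script's actual code (names and shapes are illustrative): under the commit's hypothesis, zero-init avoids materializing random-fill buffers that would otherwise stay resident until the checkpoint load overwrites them.

```python
# Illustrative sketch only: zeros instead of a random fill, so no random-init
# buffer has to survive until the real weights are loaded.
from tinygrad import Tensor

# instead of, e.g., Tensor.normal(1024, 1024) at model init:
weight = Tensor.zeros(1024, 1024)  # shape is hypothetical
```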
2025-02-14 08:40:41 -05:00
chenyu
9e91898941
bert eval at the end of training ( #9070 )
...
always eval at the last epoch
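A hedged sketch of the control flow (all names and values are illustrative stand-ins, not the script's real ones):

```python
# Sketch: eval on the usual cadence, and unconditionally on the final step.
EVAL_FREQ, train_steps = 100, 1000  # illustrative values
def train_step(): pass              # placeholder
def evaluate(): pass                # placeholder

for step in range(train_steps):
    train_step()
    if (step + 1) % EVAL_FREQ == 0 or (step + 1) == train_steps:
        evaluate()
```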
2025-02-13 16:29:44 -05:00
chenyu
f4f56d7c15
move time_linearizer to extra.optimization.helpers [pr] ( #9048 )
...
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
chenyu
7b5ac2c15e
free_intermediates in bert ( #9040 )
...
also re-enable dropout and update EVAL_BS
2025-02-12 10:00:39 -05:00
Ahmed Harmouche
916d5e7f08
WebGPU f16 support (f16 bounty part 2) ( #8653 )
...
* WebGPU f16 support
* Don't enable f16 yet
* dtype tests passing after bitcast fix
* Maybe all WebGPU green?
* Require shader-f16 in examples
* Minor wgsl touchup
* 1 line shorter
* Simpler
* Add transcendental support
* log2 NaN location mismatch on Vulkan
* NaN skips
2025-02-12 19:46:53 +08:00
George Hotz
0568720a68
delete revectorize ( #9000 )
...
* delete revectorize
* test vectorized LLVM/CLANG
* idk about that
* was that the segfault?
2025-02-10 18:32:35 +08:00
George Hotz
4de084a835
cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] ( #8952 )
...
* cleanup ci [pr]
* testing_minimal
* add hypothesis to minimal
* fail tiktoken import okay
* add LLVM speed test
* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
chenyu
a092b6395d
Tuple -> tuple, List -> list [pr] ( #8936 )
2025-02-06 14:21:19 -05:00
George Hotz
8b16c65bca
add compile3 benchmark [pr] ( #8929 )
2025-02-06 22:49:31 +08:00
geohotstan
6fb0e5751b
hotfix test_onnx_imagenet ( #8897 )
...
* start
* log severity
* only change this
* change abstraction so it's more usable for huggingface
* WHOOPS
* actually this is more correct
2025-02-05 14:39:55 +08:00
geohotstan
057c70b05f
add onnx_helpers to extra and add ort validate to benchmark_onnx ( #8890 )
...
* start
* log severity
* only change this
* change abstraction so it's more usable for huggingface
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 16:36:01 -05:00
George Hotz
f484db0e63
dsp cleanups [pr] ( #8866 )
2025-02-03 15:18:53 +08:00
George Hotz
42d7c800a1
hotfix: add missing tinychat fonts + other assets
2025-02-01 09:34:44 +08:00
chenyu
c7ca7959e6
set DISABLE_DROPOUT=1 in bert script for now ( #8799 )
2025-01-29 10:51:29 -05:00
chenyu
c99ae81f63
update default resnet LOSS_SCALER to 256 [pr] ( #8774 )
2025-01-27 16:59:05 -05:00
George Hotz
e82ba1454b
MultiLazyBuffer is UOp [pr] ( #8662 )
...
* MultiLazyBuffer is UOp [pr]
* this is new mlb
* this is the idea
* progress
* multitensor works
* more movement ops
* this
* MultiLazyBuffer is UOp
* cleanups
* multi axis
* fix more tests
* work
* not that
* add multi grad and move shard to ops
* mops not views
* no double contig
* sweet, all mt tests passing
* port old logic
* remove lbs
* fix realized
* whitespace
* assign tweak
* test_assign_kv_cache_multi passes
* fix is_realized
* fix JIT for multi
* just a few more lines i'll pay them back soon i swear please bro just a few more
* no split reduceop for multi
2025-01-24 13:28:55 +09:00
chenyu
eb77488f85
update llama3 70B to use R1 ( #8733 )
2025-01-23 19:06:05 -05:00
chenyu
af65331b76
update beam params for bert green [pr] ( #8726 )
...
increase BEAM_UPCAST_MAX and BEAM_LOCAL_MAX to default, matching red. 3% faster step
2025-01-22 22:00:05 -05:00
chenyu
9a9079118e
envvar BERT_LAYERS [pr] ( #8709 )
...
default is 24 for BERT-large
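Such knobs are typically read with tinygrad's `getenv`; a sketch of the likely shape (only the envvar name comes from the commit, the surrounding code is illustrative):

```python
from tinygrad.helpers import getenv

BERT_LAYERS = getenv("BERT_LAYERS", 24)  # 24 matches BERT-large
```

e.g. something like `BERT_LAYERS=2 MODEL=bert python3 examples/mlperf/model_train.py` would then give a shallow model for quick smoke tests.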
2025-01-21 22:49:19 -05:00
chenyu
9f6d545a16
bert log global_norm in training step [pr] ( #8708 )
...
* bert log global_norm in training step [pr]
and minor cleanups
* .item()
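A hedged sketch of such a metric (the helper body is illustrative; only the global_norm name and `.item()` come from the commit):

```python
from tinygrad import Tensor

def global_norm(params: list[Tensor]) -> Tensor:
    # sqrt of the summed squared gradient elements across every parameter
    return sum(((p.grad * p.grad).sum() for p in params), start=Tensor(0.0)).sqrt()

# .item() realizes the scalar so it can go to a logger as a plain float:
# log({"global_norm": global_norm(params).item()})
```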
2025-01-21 20:36:27 -05:00
chenyu
1e283c33d3
remove realize in bert model init [pr] ( #8707 )
2025-01-21 14:11:03 -05:00
geohotstan
dd82b4c913
make onnx runner a class ( #8647 )
...
* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half of pylint; the other half is importing from onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
chenyu
c49e0fca60
GlobalCounters.reset() in sdxl step [pr] ( #8664 )
2025-01-17 21:10:28 -05:00
chenyu
930728c069
bert BS 72->66 [pr] ( #8621 )
...
BS=72 no longer fits in memory
2025-01-14 18:41:41 -05:00
geohotstan
4abe631b56
fix onnx mobilenetv2-7-quantized.onnx ( #8574 )
...
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
chenyu
994944920b
simpler batch_load_train_bert [pr] ( #8582 )
...
don't think that buffer was really beneficial; removing it gives 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
4ac4c1415a
free intermediate buffers in the jit [pr] ( #8581 )
...
* free intermediate buffers in the jit [pr]
* intermediates_freed
* deallocate if not allocated
* self._first_run is simpler
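A self-contained toy of the idea, not tinygrad's actual JIT internals (every name here is hypothetical): after capture, buffers that are neither JIT inputs nor outputs can be freed between runs, with a guard so nothing unallocated gets deallocated.

```python
# Toy illustration only; not tinygrad code.
class Buffer:
    def __init__(self, size): self.size, self.data = size, None
    def allocate(self): self.data = bytearray(self.size); return self
    def deallocate(self): self.data = None
    @property
    def allocated(self): return self.data is not None

def free_intermediates(captured: list, inputs: set, outputs: set):
    for buf in captured:
        # the .allocated check covers the "deallocate if not allocated" case
        if buf not in inputs and buf not in outputs and buf.allocated:
            buf.deallocate()
```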
2025-01-12 15:41:41 -08:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] ( #8576 )
...
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] ( #8552 )
...
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working ( #8550 )
2025-01-09 18:40:08 -08:00
chenyu
b6be407bc6
fix handcode_opt bert [pr] ( #8509 )
...
* fix handcode_opt bert [pr]
* too slow
2025-01-05 19:14:12 -05:00
George Hotz
24de25b52f
example to benchmark onnx [pr] ( #8459 )
...
* example to benchmark onnx [pr]
* reset global count
2024-12-31 11:38:33 -05:00
qazal
866dfa1f23
create_schedule([x.lazydata]) -> x.schedule() in tests ( #8449 )
2024-12-31 03:15:52 +08:00
Calum
d8b08790b9
Fix examples/conversation.py ( #8425 )
...
* fix: conversation example
* remove slice func
* remove unused import
* use Tensor.split
2024-12-26 12:45:19 -05:00
chenyu
4712847766
make self_tokenize output more like a python file ( #8411 )
...
use a comment for the file name and join with newlines instead of null bytes when exporting to a file
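A hedged sketch of the described output format (the function body is illustrative):

```python
def self_tokenize(paths: list[str]) -> str:
    # prefix each file with a comment naming it, then join with newlines
    # (previously a null byte) so the result reads like one big .py file
    chunks = []
    for p in paths:
        with open(p) as f:
            chunks.append(f"# {p}\n{f.read()}")
    return "\n".join(chunks)
```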
2024-12-25 14:16:30 -05:00
chenyu
a35eef8d58
optionally output to file in self_tokenize.py ( #8399 )
...
can paste the whole tinygrad repo into Gemini this way
2024-12-24 21:09:26 -05:00
Harald Schäfer
7059459648
Openpilot compile: fix for openpilot use ( #8338 )
...
* compile3 changes
* merge conflict
* merge conflict
* give dm npy for now
* Revert "give dm npy for now"
This reverts commit bfd980da7d2c2bab5b073127442c361922032ba1.
* updates
* Always float32 floats
* Update compile3.py
* Update compile3.py
---------
Co-authored-by: ZwX1616 <zwx1616@gmail.com>
2024-12-19 19:43:15 -05:00
George Hotz
8f95b578f6
use Estimates class [pr] ( #8319 )
...
* use Estimates class [pr]
* frozen dataclass
2024-12-18 10:19:32 -08:00
George Hotz
37fa38d272
Revert "switch beautiful_mnist to use new optimizer [pr] ( #8231 )" ( #8233 )
...
This reverts commit e9ee39df22.
2024-12-13 19:07:09 -08:00
George Hotz
e9ee39df22
switch beautiful_mnist to use new optimizer [pr] ( #8231 )
...
* switch beautiful_mnist to use new optimizer [pr]
* fix abstractions3 + docs
* fix OptimizerGroup with schedule_step api
2024-12-13 18:27:16 -08:00
Ahmed Harmouche
651f72442c
encapsulate the exported webgpu model ( #8203 )
2024-12-13 10:55:37 +01:00
chenyu
64a917b7eb
remove LAZYCACHE ContextVar [pr] ( #8175 )
...
also removed it from the latest resnet script
2024-12-11 22:02:52 -05:00
chenyu
26e049ab40
add ALLOWED_READ_IMAGE=2131 to openpilot ( #8166 )
...
added as an exact-number check for now, since it's not clear whether more or fewer than allowed is any better
2024-12-11 12:14:17 -08:00
Maxim Zakharov
e53a5bf0c3
Stable Diffusion UI - convenient send via Enter ( #8160 )
2024-12-11 19:05:24 +01:00
George Hotz
f83d715f41
move checks into compile3, delete compile2 [pr] ( #8127 )
...
* move checks into compile3 [pr]
* test_vs_onnx
* test v torch works
* float16 won't compile on compile3
* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
a773c5a571
hotfix: default llama3 is 1B with download_model
2024-12-09 07:23:35 -08:00
Ahmed Harmouche
c6277fce09
Remove f16 decompression lib from SD compile.py ( #8121 )
...
* Remove f16-to-f32-gpu lib, use tinygrad exported decompression
* No need to create new instance
2024-12-09 14:09:00 +01:00
George Hotz
00ac0db9d4
np tensors have the memory from numpy in compile3 [pr] ( #8098 )
2024-12-07 14:01:51 +08:00
George Hotz
22feb3a2f1
move copy into the JIT for openpilot compile3 ( #7937 )
...
* move copy into the JIT, test fails
* ahh, prune was the issue
2024-12-07 13:26:26 +08:00