Commit Graph

1117 Commits

Author SHA1 Message Date
chenyu
975c318dbc bert use int32 for input ids (#9173)
original data was int32 for these. float might have caused precision issues
2025-02-19 08:17:27 -05:00
chenyu
ff05bff221 put bert data shard inside jit (#9160)
python time 45ms -> 9ms, it was spending time to schedule the shard

also init bert data on CLANG since it's from numpy, so we don't create the tensor on default device then shard into GPUS
2025-02-18 10:36:54 -05:00
chenyu
5dc1257ce0 clean up bert fake data iterator [pr] (#9145)
reuse the same get_data_bert path in setup and real run
2025-02-17 20:03:38 -05:00
George Hotz
7eea9b639d hotfix: add replay_pkl debugging env 2025-02-17 17:34:58 +08:00
George Hotz
4672d9af73 actual tests for the dsp backend [pr] (#9102)
* actual tests for the dsp backend [pr]

* fix name
2025-02-15 15:17:56 +08:00
chenyu
81597ddd96 increase lr for bert (#9098)
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
chenyu
b58e7b1898 zero out the weight in bert init run (#9076)
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer oom. I think the buffer of random init weights caused the oom.
2025-02-14 08:40:41 -05:00
chenyu
9e91898941 bert eval at the end of training (#9070)
always eval at the last epoch
2025-02-13 16:29:44 -05:00
chenyu
f4f56d7c15 move time_linearizer to extra.optimization.helpers [pr] (#9048)
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
chenyu
7b5ac2c15e free_intermediates in bert (#9040)
also re-enable dropout and update EVAL_BS
2025-02-12 10:00:39 -05:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendetal support

* log2 nan location mismatch on Vulkan

* Nan skips
2025-02-12 19:46:53 +08:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
George Hotz
4de084a835 cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952)
* cleanup ci [pr]

* testing_minimal

* add hypothesis to minimal

* fail tiktoken import okay

* add LLVM speed test

* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
chenyu
a092b6395d Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00
George Hotz
8b16c65bca add compile3 benchmark [pr] (#8929) 2025-02-06 22:49:31 +08:00
geohotstan
6fb0e5751b hotfix test_onnx_imagenet (#8897)
* start

* log severity

* only change this

* change abstraction so it's more usable for huggingface

* WHOOPS

* actually this is more correct
2025-02-05 14:39:55 +08:00
geohotstan
057c70b05f add onnx_helpers to extra and add ort validate to benchmark_onnx (#8890)
* start

* log severity

* only change this

* change abstraction so it's more usable for huggingface

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 16:36:01 -05:00
George Hotz
f484db0e63 dsp cleanups [pr] (#8866) 2025-02-03 15:18:53 +08:00
George Hotz
42d7c800a1 hotfix: add missing tinychat fonts + other assets 2025-02-01 09:34:44 +08:00
chenyu
c7ca7959e6 set DISABLE_DROPOUT=1 in bert script for now (#8799) 2025-01-29 10:51:29 -05:00
chenyu
c99ae81f63 update default resnet LOSS_SCALER to 256 [pr] (#8774) 2025-01-27 16:59:05 -05:00
George Hotz
e82ba1454b MultiLazyBuffer is UOp [pr] (#8662)
* MultiLazyBuffer is UOp [pr]

* this is new mlb

* this is the idea

* progress

* multitensor works

* more movement ops

* this

* MultiLazyBuffer is UOp

* cleanups

* multi axis

* fix more tests

* work

* not that

* add multi grad and move shard to ops

* mops not views

* no double contig

* sweet, all mt tests passing

* port old logic

* remove lbs

* fix realized

* whitespace

* assign tweak

* test_assign_kv_cache_multi passes

* fix is_realized

* fix JIT for multi

* just a few more lines i'll pay them back soon i swear please bro just a few more

* no split reduceop for multi
2025-01-24 13:28:55 +09:00
chenyu
eb77488f85 update llama3 70B to use R1 (#8733) 2025-01-23 19:06:05 -05:00
chenyu
af65331b76 update beam params for bert green [pr] (#8726)
increase BEAM_UPCAST_MAX and BEAM_LOCAL_MAX to default and matched red. 3% faster step
2025-01-22 22:00:05 -05:00
chenyu
9a9079118e envvar BERT_LAYERS [pr] (#8709)
default is 24 for large
2025-01-21 22:49:19 -05:00
chenyu
9f6d545a16 bert log global_norm in training step [pr] (#8708)
* bert log global_norm in training step [pr]

and minor cleanups

* .item()
2025-01-21 20:36:27 -05:00
chenyu
1e283c33d3 remove realize in bert model init [pr] (#8707) 2025-01-21 14:11:03 -05:00
geohotstan
dd82b4c913 make onnx runner a class (#8647)
* this

* clean up

* more clean ups and improve debug msg

* more correct training toggler

* remove manual training toggling

* change some variable names

* actually just add the training toggle for LIMIT envvar too

* more refinement

* __call__ and OnnxRunner

* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later

* ahhhh found another mistake

* remove limit from __call__

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
chenyu
c49e0fca60 GlobalCounters.reset() in sdxl step [pr] (#8664) 2025-01-17 21:10:28 -05:00
chenyu
930728c069 bert BS 72->66 [pr] (#8621)
72 does not fit now
2025-01-14 18:41:41 -05:00
geohotstan
4abe631b56 fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?

* move test up

* share function

* add qgemm too

* make sure qgemm comes out as int

* actually that note is not right

* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
chenyu
994944920b simpler batch_load_train_bert [pr] (#8582)
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
4ac4c1415a free intermediate buffers in the jit [pr] (#8581)
* free intermediate buffers in the jit [pr]

* intermediates_freed

* deallocate if not allocated

* self._first_run is simpler
2025-01-12 15:41:41 -08:00
chenyu
def90b22f6 EVAL_BS=36 for bert [pr] (#8576)
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
George Hotz
9833fe83d8 more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]

* working quantization

* static quant

* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0 more working (#8550) 2025-01-09 18:40:08 -08:00
chenyu
b6be407bc6 fix handcode_opt bert [pr] (#8509)
* fix handcode_opt bert [pr]

* too slow
2025-01-05 19:14:12 -05:00
George Hotz
24de25b52f example to benchmark onnx [pr] (#8459)
* example to benchmark onnx [pr]

* reset global count
2024-12-31 11:38:33 -05:00
qazal
866dfa1f23 create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
Calum
d8b08790b9 Fix examples/conversation.py (#8425)
* fix: conversation example

* remove slice func

* remove unused import

* use Tensor.split
2024-12-26 12:45:19 -05:00
chenyu
4712847766 make self_tokenize output more like a python file (#8411)
use comment for file name and join with newline instead of null byte when export to file
2024-12-25 14:16:30 -05:00
chenyu
a35eef8d58 optionally output to file in self_tokenize.py (#8399)
can paste the whole tinygrad in gemini this way
2024-12-24 21:09:26 -05:00
Harald Schäfer
7059459648 Openpilot compile: fix for openpilot use (#8338)
* compile3 changes

* merge conflict

* merge conflict

* give dm npy for now

* Revert "give dm npy for now"

This reverts commit bfd980da7d2c2bab5b073127442c361922032ba1.

* updates

* Always float32 floats

* Update compile3.py

* Update compile3.py

---------

Co-authored-by: ZwX1616 <zwx1616@gmail.com>
2024-12-19 19:43:15 -05:00
George Hotz
8f95b578f6 use Estimates class [pr] (#8319)
* use Estimates class [pr]

* frozen dataclass
2024-12-18 10:19:32 -08:00
George Hotz
37fa38d272 Revert "switch beautiful_mnist to use new optimizer [pr] (#8231)" (#8233)
This reverts commit e9ee39df22.
2024-12-13 19:07:09 -08:00
George Hotz
e9ee39df22 switch beautiful_mnist to use new optimizer [pr] (#8231)
* switch beautiful_mnist to use new optimizer [pr]

* fix abstractions3 + docs

* fix OptimizerGroup with schedule_step api
2024-12-13 18:27:16 -08:00
Ahmed Harmouche
651f72442c encapsulate the exported webgpu model (#8203) 2024-12-13 10:55:37 +01:00
chenyu
64a917b7eb remove LAZYCACHE ContextVar [pr] (#8175)
also removed from resnet latest script
2024-12-11 22:02:52 -05:00
chenyu
26e049ab40 add ALLOWED_READ_IMAGE=2131 to openpilot (#8166)
added as exact number check now as it's not clear if more/less than allowed is any better
2024-12-11 12:14:17 -08:00
Maxim Zakharov
e53a5bf0c3 StableDdiffusion UI - convenient send via Enter (#8160) 2024-12-11 19:05:24 +01:00