chenyu
975c318dbc
bert use int32 for input ids ( #9173 )
original data was int32 for these. float might have caused precision issues
2025-02-19 08:17:27 -05:00
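An aside on the precision claim in this commit: float32 has a 24-bit significand, so integers above 2**24 are no longer all exactly representable, which is why integer ids stored as floats can be silently corrupted. A stdlib-only sketch (illustrative, not from the commit):

```python
import struct

def to_f32(x: float) -> float:
    # Round-trip a Python float through IEEE-754 single precision.
    return struct.unpack("f", struct.pack("f", float(x)))[0]

# float32 has a 24-bit significand, so not every integer above 2**24 survives.
print(to_f32(2**24))      # 16777216.0 -- exact
print(to_f32(2**24 + 1))  # 16777216.0 -- rounded: the id is silently corrupted
```

int32 holds every token id exactly, so it avoids this class of bug entirely.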
chenyu
ff05bff221
put bert data shard inside jit ( #9160 )
python time 45ms -> 9ms; it was spending time scheduling the shard
also init bert data on CLANG since it's from numpy, so we don't create the tensor on the default device and then shard it across GPUS
2025-02-18 10:36:54 -05:00
chenyu
5dc1257ce0
clean up bert fake data iterator [pr] ( #9145 )
reuse the same get_data_bert path in setup and real run
2025-02-17 20:03:38 -05:00
George Hotz
7eea9b639d
hotfix: add replay_pkl debugging env
2025-02-17 17:34:58 +08:00
George Hotz
4672d9af73
actual tests for the dsp backend [pr] ( #9102 )
* actual tests for the dsp backend [pr]
* fix name
2025-02-15 15:17:56 +08:00
chenyu
81597ddd96
increase lr for bert ( #9098 )
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
chenyu
b58e7b1898
zero out the weight in bert init run ( #9076 )
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer oom. I think the buffer of random init weights caused the oom.
2025-02-14 08:40:41 -05:00
chenyu
9e91898941
bert eval at the end of training ( #9070 )
always eval at the last epoch
2025-02-13 16:29:44 -05:00
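The "always eval at the last epoch" logic is a one-line condition; a minimal sketch with hypothetical names (the real training script's variables differ):

```python
def should_eval(epoch: int, num_epochs: int, eval_interval: int) -> bool:
    # evaluate on the usual periodic schedule, and always on the final epoch,
    # so a run never finishes without a last accuracy measurement
    return epoch % eval_interval == 0 or epoch == num_epochs - 1

print(should_eval(5, 10, 4))  # False -- off-schedule, not the last epoch
print(should_eval(9, 10, 4))  # True  -- last epoch forces an eval
```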
chenyu
f4f56d7c15
move time_linearizer to extra.optimization.helpers [pr] ( #9048 )
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
chenyu
7b5ac2c15e
free_intermediates in bert ( #9040 )
also re-enable dropout and update EVAL_BS
2025-02-12 10:00:39 -05:00
Ahmed Harmouche
916d5e7f08
WebGPU f16 support (f16 bounty part 2) ( #8653 )
* WebGPU f16 support
* Don't enable f16 yet
* dtype tests passing after bitcast fix
* Maybe all WebGPU green?
* Require shader-f16 in examples
* Minor wgsl touchup
* 1 line shorter
* Simpler
* Add transcendental support
* log2 nan location mismatch on Vulkan
* Nan skips
2025-02-12 19:46:53 +08:00
George Hotz
0568720a68
delete revectorize ( #9000 )
* delete revectorize
* test vectorized LLVM/CLANG
* idk about that
* was that the segfault?
2025-02-10 18:32:35 +08:00
George Hotz
4de084a835
cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] ( #8952 )
* cleanup ci [pr]
* testing_minimal
* add hypothesis to minimal
* fail tiktoken import okay
* add LLVM speed test
* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
chenyu
a092b6395d
Tuple -> tuple, List -> list [pr] ( #8936 )
2025-02-06 14:21:19 -05:00
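The `Tuple -> tuple, List -> list` change is the PEP 585 modernization: since Python 3.9, builtin collection types are usable as generics directly, so `from typing import Tuple, List` imports can be dropped. An illustrative example (not from the diff):

```python
# Since Python 3.9 (PEP 585), builtin types parameterize directly;
# `typing.Tuple`/`typing.List` are deprecated aliases.

def split_pairs(pairs: list[tuple[str, int]]) -> dict[str, int]:
    # Annotations use builtin generics; runtime behavior is identical.
    return {k: v for k, v in pairs}

print(split_pairs([("a", 1), ("b", 2)]))  # {'a': 1, 'b': 2}
```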
George Hotz
8b16c65bca
add compile3 benchmark [pr] ( #8929 )
2025-02-06 22:49:31 +08:00
geohotstan
6fb0e5751b
hotfix test_onnx_imagenet ( #8897 )
* start
* log severity
* only change this
* change abstraction so it's more usable for huggingface
* WHOOPS
* actually this is more correct
2025-02-05 14:39:55 +08:00
geohotstan
057c70b05f
add onnx_helpers to extra and add ort validate to benchmark_onnx ( #8890 )
* start
* log severity
* only change this
* change abstraction so it's more usable for huggingface
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 16:36:01 -05:00
George Hotz
f484db0e63
dsp cleanups [pr] ( #8866 )
2025-02-03 15:18:53 +08:00
George Hotz
42d7c800a1
hotfix: add missing tinychat fonts + other assets
2025-02-01 09:34:44 +08:00
chenyu
c7ca7959e6
set DISABLE_DROPOUT=1 in bert script for now ( #8799 )
2025-01-29 10:51:29 -05:00
chenyu
c99ae81f63
update default resnet LOSS_SCALER to 256 [pr] ( #8774 )
2025-01-27 16:59:05 -05:00
George Hotz
e82ba1454b
MultiLazyBuffer is UOp [pr] ( #8662 )
* MultiLazyBuffer is UOp [pr]
* this is new mlb
* this is the idea
* progress
* multitensor works
* more movement ops
* this
* MultiLazyBuffer is UOp
* cleanups
* multi axis
* fix more tests
* work
* not that
* add multi grad and move shard to ops
* mops not views
* no double contig
* sweet, all mt tests passing
* port old logic
* remove lbs
* fix realized
* whitespace
* assign tweak
* test_assign_kv_cache_multi passes
* fix is_realized
* fix JIT for multi
* just a few more lines i'll pay them back soon i swear please bro just a few more
* no split reduceop for multi
2025-01-24 13:28:55 +09:00
chenyu
eb77488f85
update llama3 70B to use R1 ( #8733 )
2025-01-23 19:06:05 -05:00
chenyu
af65331b76
update beam params for bert green [pr] ( #8726 )
increased BEAM_UPCAST_MAX and BEAM_LOCAL_MAX defaults to match red; 3% faster step
2025-01-22 22:00:05 -05:00
chenyu
9a9079118e
envvar BERT_LAYERS [pr] ( #8709 )
default is 24 for large
2025-01-21 22:49:19 -05:00
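The envvar pattern behind this commit: an integer knob read from the environment with a default of 24 (BERT-large), so shortened models can be run for quick tests. A stdlib sketch (tinygrad uses its own `getenv` helper; this is the equivalent idea):

```python
import os

# Hypothetical stand-in for tinygrad's getenv helper: read an integer
# from the environment, falling back to 24 (the BERT-large layer count).
BERT_LAYERS = int(os.environ.get("BERT_LAYERS", "24"))
print(BERT_LAYERS)
```

Running with `BERT_LAYERS=2 python ...` would then build a 2-layer model for fast iteration.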
chenyu
9f6d545a16
bert log global_norm in training step [pr] ( #8708 )
* bert log global_norm in training step [pr]
and minor cleanups
* .item()
2025-01-21 20:36:27 -05:00
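The global norm logged here is the L2 norm taken over all gradient elements across every parameter tensor (the quantity gradient clipping is based on); in the training step it would be computed on device and pulled out with `.item()` for logging. A plain-Python sketch with hypothetical names:

```python
import math

def global_norm(grads: list[list[float]]) -> float:
    # L2 norm over all elements of all gradient tensors:
    # sqrt of the sum of squares across the whole parameter set.
    return math.sqrt(sum(g * g for grad in grads for g in grad))

print(global_norm([[3.0], [4.0]]))  # 5.0
```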
chenyu
1e283c33d3
remove realize in bert model init [pr] ( #8707 )
2025-01-21 14:11:03 -05:00
geohotstan
dd82b4c913
make onnx runner a class ( #8647 )
* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
chenyu
c49e0fca60
GlobalCounters.reset() in sdxl step [pr] ( #8664 )
2025-01-17 21:10:28 -05:00
chenyu
930728c069
bert BS 72->66 [pr] ( #8621 )
72 does not fit now
2025-01-14 18:41:41 -05:00
geohotstan
4abe631b56
fix onnx mobilenetv2-7-quantized.onnx ( #8574 )
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
chenyu
994944920b
simpler batch_load_train_bert [pr] ( #8582 )
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
4ac4c1415a
free intermediate buffers in the jit [pr] ( #8581 )
* free intermediate buffers in the jit [pr]
* intermediates_freed
* deallocate if not allocated
* self._first_run is simpler
2025-01-12 15:41:41 -08:00
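The idea in this commit, as a toy sketch (hypothetical classes, not tinygrad's implementation): buffers that only hold intermediate results inside a captured JIT can be released between steps and lazily re-allocated on the next call, and deallocation must be skipped for buffers that were never allocated:

```python
class Buffer:
    def __init__(self, size: int):
        self.size, self.data = size, None
    def ensure_allocated(self) -> "Buffer":
        if self.data is None: self.data = bytearray(self.size)
        return self
    def deallocate(self) -> None:
        # only free if actually allocated (mirrors "deallocate if not allocated" fix)
        if self.data is not None: self.data = None

class ToyJit:
    def __init__(self, intermediates: list[Buffer]):
        self.intermediates = intermediates
    def __call__(self) -> None:
        for buf in self.intermediates: buf.ensure_allocated()
        # ... captured kernels would run here ...
    def free_intermediates(self) -> None:
        for buf in self.intermediates: buf.deallocate()

jit = ToyJit([Buffer(1 << 20)])
jit()
jit.free_intermediates()          # memory held between steps is returned
print(jit.intermediates[0].data)  # None
```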
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] ( #8576 )
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] ( #8552 )
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working ( #8550 )
2025-01-09 18:40:08 -08:00
chenyu
b6be407bc6
fix handcode_opt bert [pr] ( #8509 )
* fix handcode_opt bert [pr]
* too slow
2025-01-05 19:14:12 -05:00
George Hotz
24de25b52f
example to benchmark onnx [pr] ( #8459 )
* example to benchmark onnx [pr]
* reset global count
2024-12-31 11:38:33 -05:00
qazal
866dfa1f23
create_schedule([x.lazydata]) -> x.schedule() in tests ( #8449 )
2024-12-31 03:15:52 +08:00
Calum
d8b08790b9
Fix examples/conversation.py ( #8425 )
* fix: conversation example
* remove slice func
* remove unused import
* use Tensor.split
2024-12-26 12:45:19 -05:00
chenyu
4712847766
make self_tokenize output more like a python file ( #8411 )
use comment for file name and join with newline instead of null byte when export to file
2024-12-25 14:16:30 -05:00
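The output format described here can be sketched in a few lines (hypothetical helper, not the actual self_tokenize.py code): each file's source is prefixed with a comment naming it, and files are joined with newlines rather than null bytes, so the export reads like one large Python file:

```python
def pack_files(files: dict[str, str]) -> str:
    # "# <filename>" comment header before each file, newline-joined
    return "\n".join(f"# {name}\n{src}" for name, src in files.items())

print(pack_files({"a.py": "x = 1", "b.py": "y = 2"}))
```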
chenyu
a35eef8d58
optionally output to file in self_tokenize.py ( #8399 )
the whole tinygrad repo can be pasted into Gemini this way
2024-12-24 21:09:26 -05:00
Harald Schäfer
7059459648
Openpilot compile: fix for openpilot use ( #8338 )
* compile3 changes
* merge conflict
* merge conflict
* give dm npy for now
* Revert "give dm npy for now"
This reverts commit bfd980da7d2c2bab5b073127442c361922032ba1.
* updates
* Always float32 floats
* Update compile3.py
* Update compile3.py
---------
Co-authored-by: ZwX1616 <zwx1616@gmail.com>
2024-12-19 19:43:15 -05:00
George Hotz
8f95b578f6
use Estimates class [pr] ( #8319 )
* use Estimates class [pr]
* frozen dataclass
2024-12-18 10:19:32 -08:00
George Hotz
37fa38d272
Revert "switch beautiful_mnist to use new optimizer [pr] ( #8231 )" ( #8233 )
This reverts commit e9ee39df22.
2024-12-13 19:07:09 -08:00
George Hotz
e9ee39df22
switch beautiful_mnist to use new optimizer [pr] ( #8231 )
* switch beautiful_mnist to use new optimizer [pr]
* fix abstractions3 + docs
* fix OptimizerGroup with schedule_step api
2024-12-13 18:27:16 -08:00
Ahmed Harmouche
651f72442c
encapsulate the exported webgpu model ( #8203 )
2024-12-13 10:55:37 +01:00
chenyu
64a917b7eb
remove LAZYCACHE ContextVar [pr] ( #8175 )
also removed from resnet latest script
2024-12-11 22:02:52 -05:00
chenyu
26e049ab40
add ALLOWED_READ_IMAGE=2131 to openpilot ( #8166 )
added as an exact-number check for now, since it's not clear whether more or fewer reads than allowed is any better
2024-12-11 12:14:17 -08:00
Maxim Zakharov
e53a5bf0c3
Stable Diffusion UI - convenient send via Enter ( #8160 )
2024-12-11 19:05:24 +01:00