chenyu
b072be0e2d
hotfix whisper main script ( #11184 )
2025-07-11 12:34:00 -04:00
Nino Risteski
bc15e98f5c
clean up unused imports in examples and update CI linting ( #11024 )
...
* clean up unused imports in examples
* enable unused import checking in examples
* lint
* ignore F541 and F841 - focus on unused imports only
* clean up
* restore tinygrad.frontend.torch for TINY_BACKEND
* tiny change
2025-06-30 08:21:27 -07:00
chenyu
c14c9a8eff
llama3 grad clip ( #11003 )
2025-06-27 19:14:12 -04:00
chenyu
f2548afeb5
bert grad clipping start with const 0 ( #11008 )
...
saved the init kernels
2025-06-27 18:02:23 -04:00
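The two clipping commits above ( #11003, #11008 ) add gradient clipping to the llama3 and bert training scripts. As a rough illustration only (names and structure are assumptions, not the code from these PRs), global-norm clipping scales every gradient so the combined L2 norm stays under a limit:

from tinygrad import Tensor

def clip_grad_norm(grads: list[Tensor], max_norm: float = 1.0) -> Tensor:
  # global L2 norm across all gradient tensors, accumulated from a const 0 start
  total = Tensor.zeros(1)
  for g in grads: total = total + g.float().square().sum()
  global_norm = total.sqrt()
  # only shrink gradients, never grow them
  scale = (max_norm / (global_norm + 1e-6)).minimum(1.0)
  for g in grads: g.assign(g * scale)
  return global_norm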
chenyu
6ab5a5cb6c
llama3 mlperf train ( #10983 )
...
Work in progress. It can now overfit small examples and VRAM usage roughly matches.
2025-06-26 20:24:27 -04:00
geohotstan
50936b4a18
ONNX real float16 ( #10694 )
...
* squash commits
* temp fix for const tensor
* actually realizing float16 can only happen in raw_data
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
chenyu
8751d47985
CosineAnnealingLRWithWarmup ( #10981 )
2025-06-25 17:45:21 -04:00
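CosineAnnealingLRWithWarmup combines a linear warmup phase with cosine decay of the learning rate. A minimal sketch of that schedule, using assumed argument names rather than the actual class interface added in this PR:

import math

def lr_at_step(step: int, base_lr: float, warmup_steps: int, total_steps: int, end_lr: float = 0.0) -> float:
  if step < warmup_steps:
    return base_lr * (step + 1) / warmup_steps                      # linear warmup to base_lr
  progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
  return end_lr + (base_lr - end_lr) * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay to end_lr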
chenyu
efad567ebd
ruff check whole examples/mlperf/ ( #10979 )
2025-06-25 12:57:48 -04:00
Alexey Zaytsev
230ad3a460
[bounty] Don't use numpy inside hlb_cifar10 training loop ( #10777 )
...
* Don't use numpy inside hlb_cifar10 training loop
* Lint it
* jit it
* Drop the last half-batch
* Use gather for random_crop and reuse perms
* Wrap train_cifar in FUSE_ARANGE context
* No need to pass FUSE_ARANGE=1 to hlb_cifar10.py
* Add cutmix to jittable augmentations
* Remove .contiguous() from fetch_batches
* Fix indexing boundary
---------
Co-authored-by: Irwin1138 <irwin1139@gmail.com>
2025-06-23 17:24:56 -07:00
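One of the bullets above wraps train_cifar in a FUSE_ARANGE context so arange-based indexing stays fused on device. A minimal sketch of that pattern, with a placeholder training function standing in for the real loop:

from tinygrad.helpers import Context

def train_cifar():
  ...  # placeholder for the actual hlb_cifar10 training loop

with Context(FUSE_ARANGE=1):
  train_cifar()  # indexing built from arange gets fused instead of materialized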
chenyu
3699d1d3ba
hotfix llama3 temperature is float ( #10938 )
2025-06-23 15:20:56 -04:00
chenyu
0480139def
log_perplexity metrics ( #10912 )
2025-06-21 10:44:47 -04:00
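Log perplexity is the mean negative log-likelihood of the target tokens (its exp is the perplexity). A hedged sketch of the metric; the tensor names and shapes here are assumptions, not the helper added in this PR:

from tinygrad import Tensor

def log_perplexity(logits: Tensor, targets: Tensor) -> Tensor:
  # logits: (batch, seq_len, vocab), targets: (batch, seq_len) of token ids
  log_probs = logits.log_softmax(axis=-1)
  nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)    # per-token negative log-likelihood
  return nll.mean()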
George Hotz
b41e0563a3
move stuff to kernelize folder ( #10902 )
...
* move stuff to kernelize folder
* oops, forgot that
2025-06-20 16:10:20 -07:00
George Hotz
92678e59ee
move kernel to opt ( #10899 )
2025-06-20 15:22:28 -07:00
chenyu
62a540066e
remove DEBUG=2 in mi300x bert setup ( #10886 )
...
seems fine now, not sure what the issue was
2025-06-19 13:28:53 -04:00
Nino Risteski
5a56710ff4
small fix replacing download_file with fetch ( #10877 )
...
* imported the missing os module and replaced download_file with fetch from tinygrad helpers
* use fetch directly
* Remove if not os.path.isfile
2025-06-19 12:12:09 -04:00
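For reference, the pattern this commit moves to: tinygrad.helpers.fetch downloads to a local cache and returns the path, which makes the manual os.path.isfile check unnecessary. The URL below is a placeholder, not the one in the example:

from tinygrad.helpers import fetch

weights_path = fetch("https://example.com/model_weights.bin")  # cached after the first call
print(weights_path)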
chenyu
8d721a4ead
add 405B params to llama3.py ( #10884 )
...
tested with `python examples/llama3.py --model /raid/weights/llama31_405b/ --size 405B --shard 8 --benchmark` on tinyamd2
2025-06-19 11:45:37 -04:00
chenyu
f377cc19cd
use AM for bert ( #10882 )
...
have trained 3 runs and all seem fine
2025-06-19 09:48:54 -04:00
chenyu
b70c7d3631
bert grad accumulation ( #10863 )
...
* bert grad accumulation
* realize grad
2025-06-18 12:17:07 -04:00
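Gradient accumulation runs several micro-batches through backward before a single optimizer step, so the effective global_batch_size = grad_acc_steps * batch_size (as the global_batch_size commit below spells out). A rough sketch under assumed names, not the bert training code itself:

from tinygrad import Tensor
from tinygrad.nn.optim import SGD

def accumulated_step(model, opt: SGD, micro_batches: list[tuple[Tensor, Tensor]]):
  with Tensor.train():
    opt.zero_grad()
    for x, y in micro_batches:                    # len(micro_batches) == grad_acc_steps
      loss = model(x).sparse_categorical_crossentropy(y) / len(micro_batches)
      loss.backward()                             # grads accumulate across micro-batches
    opt.step()                                    # one optimizer step per global batch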
George Hotz
cba6e15937
split grouper and kernelize [pr] ( #10854 )
2025-06-17 17:54:20 -07:00
chenyu
075a74cf25
add global_batch_size to mlperf bert ( #10852 )
...
global_batch_size = grad_acc_steps * batch_size. no-op change to prep grad acc for bert
2025-06-17 17:54:15 -04:00
chenyu
7d5c769c6b
fix compile4 ( #10797 )
2025-06-12 22:28:56 -04:00
chenyu
81e296d7b8
remove Tensor.test() in retinanet ( #10770 )
...
test was removed
2025-06-10 22:14:57 -04:00
George Hotz
acf72872b3
move view left to the outer graph prereqs + testing ( #10725 )
...
* move view left to the outer graph
* global view right
* dont need that one
* remove comment
* test kernelize
* simple
* split onnx, test sdxl null
* fix testing
* ugh, wrong one
* Update test.yml
2025-06-09 20:43:25 -07:00
b1tg
24d328e313
onnx parser ( #10435 )
...
* onnx parser
* fix compile, lint
* onnx.load -> onnx_load
* compatible with ModelProto
* fix test external_test_onnx_ops.py
* fix tests
* fix signed int
* reduce to 261 lines
* fix TypeProto.Optional
* debug for _parse_message, add TypeProto.Sequence, cleanup
* onnx_load from Tensor
* remove BufferedReader
* 174 lines and reduce tensor copy
* cleanup
* use onnx_load in external_model_benchmark.py
* fix qcom test
* [onnx] parser support external data
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
Sieds Lykles
cfa65bea05
Subtract 1 from Variable upper bound ( #10715 )
2025-06-09 09:25:53 -07:00
George Hotz
32e9949052
rename lazydata to uop ( #10698 )
2025-06-08 08:42:22 -07:00
chenyu
e88fe41d37
update vits vctk model to use download from huggingface ( #10688 )
...
Google Drive points to a warning page that does not work
2025-06-07 20:47:28 -04:00
Sieds Lykles
c29a56dd51
Fix whisper OOB ( #10685 )
...
* fix whisper and test
* remove import
2025-06-07 20:23:50 -04:00
Sieds Lykles
2f605eadf7
fix oob ( #10666 )
2025-06-07 11:32:03 -04:00
wozeparrot
0d86f8d375
fix failed threefry ( #10646 )
2025-06-05 17:17:42 -07:00
chenyu
4ab3391e6f
set -o pipefail for mlperf run_and_time ( #10577 )
...
also run the 5.1 script in ci cron job
2025-05-30 16:36:44 -04:00
chenyu
baf482d314
copy mlperf stuff to 5.1 ( #10576 )
...
5.0 is finalized, new changes go to 5.1
2025-05-30 16:12:39 -04:00
George Hotz
b3b43a82c4
remove Tensor.no_grad, it's meaningless now [pr] ( #10556 )
2025-05-28 22:20:02 -07:00
George Hotz
e4e7b5d7e1
continue work on beautiful cifar ( #10555 )
2025-05-28 21:42:01 -07:00
George Hotz
871df1436a
more beautiful cifar ( #10551 )
...
* enumerate cases of Tensors in the JIT
* optional fused optimizers
* add fused optimizer test
* move that there
* ugh
* work on beautiful_cifar
* speed close to hlb_cifar
* schedule to corealize all
* one line sched step
* less lines
2025-05-28 20:48:20 -07:00
chenyu
74cf5dbd9e
mlperf system updates ( #10550 )
...
standardized processor and accelerator names
2025-05-28 16:15:46 -04:00
chenyu
51dc7eedb0
correct use AM for resnet run_and_time ( #10524 )
2025-05-26 15:33:11 -04:00
chenyu
c1919ad55f
use AM for resnet run_and_time ( #10523 )
2025-05-26 14:50:49 -04:00
chenyu
2d50efb92b
set -e on mlperf run_and_time scripts ( #10519 )
2025-05-26 09:22:30 -04:00
chenyu
dc6309242d
WallTimeEvent for mlperf ci ( #10506 )
2025-05-24 10:56:03 -04:00
George Hotz
0d39bb5de1
rename to get_kernelize_map ( #10465 )
2025-05-22 11:44:44 -07:00
George Hotz
577a0b4cfa
openpilot compile4 (wip) ( #10407 )
...
* openpilot compile4
* add copies
* remove junk
2025-05-22 10:47:34 -07:00
chenyu
67d1364106
update LOGMLPERF in red resnet run_and_time ( #10416 )
2025-05-19 13:23:33 -04:00
qazal
90eb3c0e5d
add MobileNetV2 benchmark to comma CI ( #10250 )
...
* add MobileNetV2 to comma CI
* symlink imagenet
* also the signature
* comment that out
* need imagenetmock
* same train and test set
* quantize on CPU=1
* verbose
* need __hexagon_divsf3
* 0x858d6c15
* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
chenyu
485e80da69
run_and_time for resnet ci ( #10405 )
2025-05-18 23:39:57 -04:00
George Hotz
411392dfb7
move files into uop dir ( #10399 )
...
* move files into uop dir [pr]
* tinygrad.uop is a thing
* fix uop docs, no pr
* fix viz
2025-05-18 11:38:28 -07:00
George Hotz
0b733ba75e
multi device training with GPT2 [pr] ( #10375 )
...
* multi device training with GPT2 [pr]
* Update grouper.py
2025-05-17 15:33:56 -07:00
wozeparrot
12a1ccc680
clean: double import ( #10345 )
2025-05-15 20:15:09 -07:00
wozeparrot
1ed04f993b
move benchmark stat tracking to influxdb ( #10185 )
2025-05-15 16:14:56 -07:00
George Hotz
568d6d96e7
small changes from new multi [pr] ( #10318 )
2025-05-14 20:50:59 -07:00