1234 Commits

Each entry lists the author, the commit SHA-1, the message (with its body, where present), and the date.
wozeparrot
825b6a2505 feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
George Hotz
842184a1ab rename kernelize to schedule, try 2 (#11305) 2025-07-21 11:18:36 -07:00
nimlgen
cc3c1e4c14 hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
chenyu
85ddd72038 simpler grouptop in hcopt (#11219)
* simpler grouptop in hcopt

keep only the perf-relevant conditions; the rest is handled by try/except

* update openpilot read image count
2025-07-13 16:06:09 -04:00
chenyu
a0438012af remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
geohotstan
5ce278b245 OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__

* modelproto_to_onnxrunner -> modelproto_to_runner

* whoops, fix import

* oh flakiness again, is it because it's getting gc-ed?

* small changes

* CI flaky so just move compile4 fix in

* copy typing of onnx_load

* actually can just import onnx_load instead of onnx.load

* fix external_benchmark_openpilot

* fix onnx_runner test to use onnx_helper

* rerun CI

* try run_modelproto

* spam CI a few times

* revert run_modelproto since that's flaky also

* no external onnx_load usage except onnx.py

* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?

* model_benchmark 193s -> 80s, add OnnxRunner.to()...

* minimize diff and clean up

* device can be None, weird but eh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
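
A minimal usage sketch of the change in #10789, assuming the runner lives at tinygrad.frontend.onnx and that passing a file path now does the parsing inside OnnxRunner.__init__; the input name and shape are illustrative:

    from tinygrad import Tensor
    from tinygrad.frontend.onnx import OnnxRunner  # import path is an assumption

    # previously the caller had to load a ModelProto first; per the commit,
    # a file path is now enough and parsing happens in __init__
    runner = OnnxRunner("model.onnx")
    outputs = runner({"input": Tensor.zeros(1, 3, 224, 224)})  # illustrative input name/shape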
chenyu
b072be0e2d hotfix whisper main script (#11184) 2025-07-11 12:34:00 -04:00
Nino Risteski
bc15e98f5c clean up unused imports in examples and update CI linting (#11024)
* clean up unused imports in examples

* enable unused import checking in examples

* lint

* ignore F541 and F841 - focus on unused imports only

* clean up

* restore tinygrad.frontend.torch for TINY_BACKEND

* tiny change
2025-06-30 08:21:27 -07:00
chenyu
c14c9a8eff llama3 grad clip (#11003) 2025-06-27 19:14:12 -04:00
chenyu
f2548afeb5 bert grad clipping start with const 0 (#11008)
saved the init kernels
2025-06-27 18:02:23 -04:00
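
A rough sketch of global-norm gradient clipping in the spirit of the two entries above, with the squared-norm accumulation starting from a constant 0 tensor; the function name and epsilon are illustrative, not the actual mlperf code:

    from tinygrad import Tensor

    def clip_grad_norm_(params, max_norm: float = 1.0, eps: float = 1e-6) -> Tensor:
      global_norm_sq = Tensor(0.0)          # start the accumulation with a const 0
      for p in params:
        global_norm_sq = global_norm_sq + (p.grad * p.grad).sum()
      global_norm = global_norm_sq.sqrt()
      scale = (max_norm / (global_norm + eps)).minimum(1.0)
      for p in params:
        p.grad = p.grad * scale             # rescale so the optimizer sees clipped grads
      return global_norm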
chenyu
6ab5a5cb6c llama3 mlperf train (#10983)
Work in progress: it can now overfit small examples and VRAM usage roughly matches.
2025-06-26 20:24:27 -04:00
geohotstan
50936b4a18 ONNX real float16 (#10694)
* squash commits

* temp fix for const tensor

* actually realizing float16 can only happen in raw_data

* .float -> cast(float) to rerun CI

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
chenyu
8751d47985 CosineAnnealingLRWithWarmup (#10981) 2025-06-25 17:45:21 -04:00
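
A minimal sketch of cosine annealing with a linear warmup, in the spirit of the scheduler added in #10981; the function and argument names are illustrative, not the actual class:

    import math

    def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int, end_lr: float = 0.0) -> float:
      if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps                   # linear warmup to base_lr
      progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
      return end_lr + 0.5 * (base_lr - end_lr) * (1 + math.cos(math.pi * progress))  # cosine decay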
chenyu
efad567ebd ruff check whole examples/mlperf/ (#10979) 2025-06-25 12:57:48 -04:00
Alexey Zaytsev
230ad3a460 [bounty] Don't use numpy inside hlb_cifar10 training loop (#10777)
* Don't use numpy inside hlb_cifar10 training loop

* Lint it

* jit it

* Drop the last half-batch

* Use gather for random_crop and reuse perms

* Wrap train_cifar in FUSE_ARANGE context

* No need to pass FUSE_ARANGE=1 to hlb_cifar10.py

* Add cutmix to jittable augmentations

* Remove .contiguous() from fetch_batches

* Fix indexing boundary

---------

Co-authored-by: Irwin1138 <irwin1139@gmail.com>
2025-06-23 17:24:56 -07:00
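
The "Wrap train_cifar in FUSE_ARANGE context" item above could look roughly like this, assuming the Context manager and FUSE_ARANGE context variable from tinygrad.helpers:

    from tinygrad.helpers import Context
    from examples.hlb_cifar10 import train_cifar  # entry point named in the bullets above

    # enable arange fusion for the whole run instead of requiring FUSE_ARANGE=1
    # on the command line (see "No need to pass FUSE_ARANGE=1" above)
    with Context(FUSE_ARANGE=1):
      train_cifar()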
chenyu
3699d1d3ba hotfix llama3 temperature is float (#10938) 2025-06-23 15:20:56 -04:00
chenyu
0480139def log_perplexity metrics (#10912) 2025-06-21 10:44:47 -04:00
George Hotz
b41e0563a3 move stuff to kernelize folder (#10902)
* move stuff to kernelize folder

* oops, forgot that
2025-06-20 16:10:20 -07:00
George Hotz
92678e59ee move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
chenyu
62a540066e remove DEBUG=2 in mi300x bert setup (#10886)
seems fine now, not sure what the issue was
2025-06-19 13:28:53 -04:00
Nino Risteski
5a56710ff4 small fix replacing download_file with fetch (#10877)
* imported the missing os module and replaced download_file with fetch from tinygrad helpers

* use fetch directly

* Remove if not os.path.isfile
2025-06-19 12:12:09 -04:00
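
The fetch referred to in #10877 is the tinygrad.helpers.fetch download helper; a minimal sketch with a placeholder URL:

    from tinygrad.helpers import fetch

    # fetch downloads into a local cache and skips the download if the file is
    # already there, which is why the explicit os.path.isfile check could go
    weights_path = fetch("https://example.com/model_weights.bin")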
chenyu
8d721a4ead add 405B params to llama3.py (#10884)
tested with `python examples/llama3.py --model /raid/weights/llama31_405b/ --size 405B --shard 8 --benchmark` on tinyamd2
2025-06-19 11:45:37 -04:00
chenyu
f377cc19cd use AM for bert (#10882)
have trained 3 runs and all seem fine
2025-06-19 09:48:54 -04:00
chenyu
b70c7d3631 bert grad accumulation (#10863)
* bert grad accumulation

* realize grad
2025-06-18 12:17:07 -04:00
George Hotz
cba6e15937 split grouper and kernelize [pr] (#10854) 2025-06-17 17:54:20 -07:00
chenyu
075a74cf25 add global_batch_size to mlperf bert (#10852)
global_batch_size = grad_acc_steps * batch_size. no-op change to prep grad acc for bert
2025-06-17 17:54:15 -04:00
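
A rough sketch of the relationship stated in #10852 together with the accumulation from #10863; model, optim, and batches are illustrative stand-ins, not the actual bert training code:

    def train_global_batch(model, optim, batches, global_bs: int, bs: int):
      # global_batch_size = grad_acc_steps * batch_size (per the commit message)
      grad_acc_steps = global_bs // bs
      optim.zero_grad()
      for _ in range(grad_acc_steps):
        x, y = next(batches)                   # each micro-batch holds bs samples
        loss = model(x).sparse_categorical_crossentropy(y) / grad_acc_steps
        loss.backward()                        # grads accumulate across micro-batches
      optim.step()                             # one optimizer step per global batch
      return loss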
chenyu
7d5c769c6b fix compile4 (#10797) 2025-06-12 22:28:56 -04:00
chenyu
81e296d7b8 remove Tensor.test() in retinanet (#10770)
test was removed
2025-06-10 22:14:57 -04:00
George Hotz
acf72872b3 move view left to the outer graph prereqs + testing (#10725)
* move view left to the outer graph

* global view right

* dont need that one

* remove comment

* test kernelize

* simple

* split onnx, test sdxl null

* fix testing

* ugh, wrong one

* Update test.yml
2025-06-09 20:43:25 -07:00
b1tg
24d328e313 onnx parser (#10435)
* onnx parser

* fix compile, lint

* onnx.load -> onnx_load

* compatible with ModelProto

* fix test external_test_onnx_ops.py

* fix tests

* fix signed int

* reduce to 261 lines

* fix TypeProto.Optional

* debug for _parse_message, add TypeProto.Sequence, cleanup

* onnx_load from Tensor

* remove BufferedReader

* 174 lines and reduce tensor copy

* cleanup

* use onnx_load in external_model_benchmark.py

* fix qcom test

* [onnx] parser support external data

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
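
A minimal sketch of the swap described in #10435, assuming onnx_load is importable from tinygrad.frontend.onnx:

    from tinygrad.frontend.onnx import onnx_load  # import path is an assumption

    # drop-in replacement for onnx.load: a small built-in protobuf parser that,
    # per the commits above, can also read the model out of a Tensor and
    # supports external data
    model_proto = onnx_load("model.onnx")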
Sieds Lykles
cfa65bea05 Subtract 1 from Variable upper bound (#10715) 2025-06-09 09:25:53 -07:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
chenyu
e88fe41d37 update vits vctk model to use download from huggingface (#10688)
the Google Drive link points to a warning page that does not work
2025-06-07 20:47:28 -04:00
Sieds Lykles
c29a56dd51 Fix whisper OOB (#10685)
* fix whisper and test

* remove import
2025-06-07 20:23:50 -04:00
Sieds Lykles
2f605eadf7 fix oob (#10666) 2025-06-07 11:32:03 -04:00
wozeparrot
0d86f8d375 fix failed threefry (#10646) 2025-06-05 17:17:42 -07:00
chenyu
4ab3391e6f set -o pipefail for mlperf run_and_time (#10577)
also run the 5.1 script in the CI cron job
2025-05-30 16:36:44 -04:00
chenyu
baf482d314 copy mlperf stuff to 5.1 (#10576)
5.0 is finalized, new changes go to 5.1
2025-05-30 16:12:39 -04:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
George Hotz
e4e7b5d7e1 continue work on beautiful cifar (#10555) 2025-05-28 21:42:01 -07:00
George Hotz
871df1436a more beautiful cifar (#10551)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh

* work on beautiful_cifar

* speed close to hlb_cifar

* schedule to corealize all

* one line sched step

* less lines
2025-05-28 20:48:20 -07:00
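
"One line sched step" above likely refers to realizing the loss together with the optimizer's scheduled update in a single call; a hedged sketch using tinygrad's Optimizer.schedule_step, with model and opt as illustrative names:

    from tinygrad import Tensor

    def step(model, opt, x: Tensor, y: Tensor) -> Tensor:
      opt.zero_grad()
      loss = model(x).sparse_categorical_crossentropy(y)
      loss.backward()
      # schedule the optimizer update alongside the loss and realize everything at once
      Tensor.realize(loss, *opt.schedule_step())
      return loss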
chenyu
74cf5dbd9e mlperf system updates (#10550)
standardized processor and accelerator names
2025-05-28 16:15:46 -04:00
chenyu
51dc7eedb0 correct use AM for resnet run_and_time (#10524) 2025-05-26 15:33:11 -04:00
chenyu
c1919ad55f use AM for resnet run_and_time (#10523) 2025-05-26 14:50:49 -04:00
chenyu
2d50efb92b set -e on mlperf run_and_time scripts (#10519) 2025-05-26 09:22:30 -04:00
chenyu
dc6309242d WallTimeEvent for mlperf ci (#10506) 2025-05-24 10:56:03 -04:00
George Hotz
0d39bb5de1 rename to get_kernelize_map (#10465) 2025-05-22 11:44:44 -07:00
George Hotz
577a0b4cfa openpilot compile4 (wip) (#10407)
* openpilot compile4

* add copies

* remove junk
2025-05-22 10:47:34 -07:00
chenyu
67d1364106 update LOGMLPERF in red resnet run_and_time (#10416) 2025-05-19 13:23:33 -04:00
qazal
90eb3c0e5d add MobileNetV2 benchmark to comma CI (#10250)
* add MobileNetV2 to comma CI

* symlink imagenet

* also the signature

* comment that out

* need imagenetmock

* same train and test set

* quantize on CPU=1

* verbose

* need __hexagon_divsf3

* 0x858d6c15

* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00