Commit Graph

852 Commits

nimlgen
b6981404ed memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00
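The idea behind the page-shift change, sketched generically (names here are assumptions, not tinygrad's actual memory-manager code): with a power-of-two page size, division and modulo become shift and mask.

```python
# Illustrative only: "page shifts" replace division/modulo by the page size
# with shift/mask arithmetic. PAGE_SHIFT is an assumed name.
PAGE_SHIFT = 12                     # 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT

def page_index(addr: int) -> int:
  return addr >> PAGE_SHIFT         # same as addr // PAGE_SIZE

def page_offset(addr: int) -> int:
  return addr & (PAGE_SIZE - 1)     # same as addr % PAGE_SIZE
```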
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
chenyu
dada3f5bf3 skip some new onnx tests (#11135)
these fail on master with the latest onnx
2025-07-08 16:12:48 -04:00
George Hotz
f7d4638e05 start LLM app, tons of clean up required. target is 200 line ollama (#11068)
* start LLM app, tons of clean up required. target is 200 line ollama

* kind of works

* simpler

* add k/v cache

* with SYM=1, it loops

* no rope cache

* simpler

* more cleanups

* cleanups

* works

* argparse and comments

* from gguf

* generate is a function

* no copy from cpu

* fix max context pass in

* test

* improve test

* ai2_arc

* fix 8B, use less ram

* 136 lines
2025-07-07 17:09:46 -07:00
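The k/v cache mentioned in the bullets above, as a minimal generic sketch (pure NumPy; the class and field names are hypothetical, not the app's code): each decode step appends one key/value row so earlier projections are never recomputed.

```python
import numpy as np

class KVCache:
  """Hypothetical illustration of a per-layer key/value cache."""
  def __init__(self, max_context: int, dim: int):
    self.k = np.zeros((max_context, dim), dtype=np.float32)
    self.v = np.zeros((max_context, dim), dtype=np.float32)
    self.length = 0

  def append(self, k_t: np.ndarray, v_t: np.ndarray) -> None:
    # one new token's projections per decode step
    self.k[self.length], self.v[self.length] = k_t, v_t
    self.length += 1

  def view(self) -> tuple[np.ndarray, np.ndarray]:
    # everything cached so far, for attention over previous tokens
    return self.k[:self.length], self.v[:self.length]
```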
nimlgen
01f3c4f44d memory: simpler paddr allocation logic (#11090)
* memory: new paddr allocation logic

* am fix

* am refactors

* fix

* mypy

* use it

* am
2025-07-04 17:00:36 +03:00
qazal
ad155f5454 print inputs to get_program in process replay [pr] (#11051)
* print inputs to get_program in process replay [pr]

* colors

* keep dataclass default escapes

* Revert "keep dataclass default escapes"

This reverts commit c6db7e8a7a.

* note for ast_repr

* add that back
2025-07-02 20:20:01 +03:00
qazal
452b22c9b6 fix process replay diff in PYTHON device [pr] (#11052)
* fix process replay diff in PYTHON device [pr]

The PYTHON backend pickles and encodes UOps; the encoded binary can't be
directly diffed in process replay.

* note
2025-07-02 11:06:46 +03:00
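A sketch of the problem and the likely shape of the fix (assumed names, not the actual process-replay code): pickled blobs are opaque bytes, so they must be unpickled back to objects before a readable diff is possible.

```python
import pickle, difflib

def diff_pickled(a_blob: bytes, b_blob: bytes) -> str:
  a, b = pickle.loads(a_blob), pickle.loads(b_blob)   # decode before diffing
  return "\n".join(difflib.unified_diff(repr(a).splitlines(),
                                        repr(b).splitlines(), lineterm=""))
```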
geohotstan
8ebf0abaae ONNX external_test_onnx_backend use PYTHON device for model (#10915)
* try

* ruff check --fix

* no skip test

* hmmmmmmm I don't get this D:

* run CI again

* why is PYTHON device faster than CPU?

* run ci again and fix lint

* actually doesn't PYTHON device make sense here?

* see cpu speed again

* Revert "see cpu speed again"

This reverts commit 1e366f2256.

* trigger CI

* pretty good

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-01 12:11:17 -04:00
qazal
712980e167 fix extract_dataset + add tests to CI (#10995)
* fix extract_dataset + tests

* add CI

* sops.gz itself is same as master

* yml + gzip -c + ge

* don't commit that

* bump limit to 1000

* axis=7

* test_tiny
2025-06-27 01:51:36 +03:00
geohotstan
50936b4a18 ONNX real float16 (#10694)
* squash commits

* temp fix for const tensor

* actually realizing float16 can only happen in raw_data

* .float -> cast(float) to rerun CI

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
chenyu
8751d47985 CosineAnnealingLRWithWarmup (#10981) 2025-06-25 17:45:21 -04:00
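The schedule the class name describes, as a minimal sketch (the standard warmup-plus-cosine formula; not necessarily the exact signature added in the commit): linear warmup to the base learning rate, then a cosine anneal down to the minimum.

```python
import math

def lr_at(step: int, warmup: int, total: int, base_lr: float, min_lr: float = 0.0) -> float:
  if step < warmup:
    return base_lr * (step + 1) / warmup          # linear warmup
  t = (step - warmup) / max(1, total - warmup)    # progress through the anneal
  return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```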
Ignacio Sica
21f1c4cc09 remove some linearize calls from tests [pr] (#10978)
* remove some linearize calls from tests

speed_compare_cuda_ptx
test_uop_spec
test_linearizer
test_uops
test_winograd

* more clear assert message
2025-06-25 12:37:17 -07:00
qazal
de4b9bf53b add opts_to_apply option to AST KernelInfo (#10950)
* proposal: add option to override opts in the get_program API

* update test_linearizer_rewrite

* state in uops

* update process_replay and names

* empty isn't none

* fix process replay
2025-06-24 18:55:39 +03:00
qazal
7a5e4e0bf1 fix unittests process replay [pr] (#10947) 2025-06-24 10:30:23 +03:00
George Hotz
ae4d2d71b4 bump line count to 14500 2025-06-23 15:32:27 -07:00
George Hotz
e15754db28 remove (some) kernelize from llama and test schedule speed (#10939)
* remove kernelize from llama

* 405B

* space
2025-06-23 15:07:31 -07:00
chenyu
42b1c9625b skip test TestKiTS19Dataset::test_training_set (#10936)
flaky
2025-06-23 14:27:24 -04:00
patrini32
9e9fd44987 refactor test/external/external_llama_eval.py (#10567)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-23 10:43:20 -07:00
qazal
7820aeca8e update codegen process replay to use get_program [pr] (#10921)
* update codegen process replay to get_program [pr]

* precommit

* try str replace

* +to_function_name

* fixup tc

* local2.sh

* fix openpilot NOLOCALS

* new local.sh

* correct merge

* beam cache

* back

* revert beam thing

* adding opts_override and name_override makes output of get_program
reproducible

* min diff
2025-06-23 17:31:41 +03:00
alpharush
22f9696522 Fix/hcqfuzz harness bug (#10923)
* update command so extra module is found

* fix empty range in randrange errors

* lint
2025-06-23 11:22:30 +03:00
geohotstan
4ab7d792cc ONNX improve dtype fallback (#10800)
* fix

* add early verbose demo test

* is this how to write tests :s

* is definition drift even a thing? gemini says it is

* clean up

* better

* even better

* try add to CI

* doesn't work quite yet

* much more work to be done

* whoops

* partition the test heh

* skipif

* some nits for better names

* add webgpu test for onnxrunner

* fix reference links

* flush for now
2025-06-21 19:29:45 -04:00
chenyu
0480139def log_perplexity metrics (#10912) 2025-06-21 10:44:47 -04:00
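The metric itself, for reference (the generic definition, not the commit's exact function): log-perplexity is the mean negative log-likelihood of the target tokens, and perplexity is its exponential.

```python
import math

def log_perplexity(token_log_probs: list[float]) -> float:
  return -sum(token_log_probs) / len(token_log_probs)

# perplexity is then math.exp(log_perplexity(token_log_probs))
```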
nimlgen
0e7bd9fd03 factor out generic MemoryManager (#10910)
* allocator -> memory

* just moveout it

* mm is abstracted

* need entry abstraction

* fix

* mypy
2025-06-21 16:18:33 +03:00
George Hotz
7636d2cdc5 flip order of get_program args (#10905) 2025-06-20 17:23:23 -07:00
George Hotz
b41e0563a3 move stuff to kernelize folder (#10902)
* move stuff to kernelize folder

* oops, forgot that
2025-06-20 16:10:20 -07:00
George Hotz
92678e59ee move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
chenyu
a3dae51085 lower test_gemm_8192 on red (#10883) 2025-06-19 10:01:25 -04:00
George Hotz
18593c9800 one less rewrite on schedule [pr] (#10872)
* one less rewrite on schedule [pr]

* verify in ebs
2025-06-18 17:06:17 -07:00
wozeparrot
bdbf121285 fix: contigous -> contiguous (#10868) 2025-06-18 13:09:51 -07:00
George Hotz
cba6e15937 split grouper and kernelize [pr] (#10854) 2025-06-17 17:54:20 -07:00
uuuvn
a51f18f8f9 CI flakiness (#10851)
https://github.com/tinygrad/tinygrad/actions/runs/15718103629/job/44292845140?pr=10753#step:4:161
2025-06-17 14:46:30 -07:00
nimlgen
c0329148c7 am: check va is aligned to page size (#10815)
* am: check va is aligned to page size

* swap them

* is this faster
2025-06-15 22:51:09 +03:00
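When the page size is a power of two, the check this commit describes is a single mask test (illustrative names, not the am driver's code):

```python
def check_va_aligned(va: int, page_size: int) -> None:
  # page_size must be a power of two for the mask trick to be valid
  assert va & (page_size - 1) == 0, f"va {va:#x} not aligned to {page_size:#x}"
```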
George Hotz
5dc1bc6070 switch get_kernel -> get_program [pr] (#10817)
* switch get_kernel -> get_program [pr]

* fix tests
2025-06-15 12:26:50 -07:00
wozeparrot
eb739bb96a hotfix: lower threshold (#10786) 2025-06-11 19:36:20 -04:00
chenyu
612cdf5146 move fuzz_shape_ops to run with other fuzzer (#10767)
* move fuzz_shape_ops to run with other fuzzer

* don't skip CPU
2025-06-10 17:43:04 -04:00
b1tg
52c49dd4f3 fix onnx ci (#10762)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-10 14:28:40 -04:00
George Hotz
f84c320548 better external_benchmark_schedule [pr] (#10722) 2025-06-09 10:26:11 -07:00
b1tg
24d328e313 onnx parser (#10435)
* onnx parser

* fix compile, lint

* onnx.load -> onnx_load

* compatible with ModelProto

* fix test external_test_onnx_ops.py

* fix tests

* fix signed int

* reduce to 261 lines

* fix TypeProto.Optional

* debug for _parse_message, add TypeProto.Sequence, cleanup

* onnx_load from Tensor

* remove BufferedReader

* 174 lines and reduce tensor copy

* cleanup

* use onnx_load in external_model_benchmark.py

* fix qcom test

* [onnx] parser support external data

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
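The parser above hand-decodes the protobuf wire format. Its core primitive is the base-128 varint; a minimal decoder looks like this (generic protobuf decoding, not tinygrad's exact helper):

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
  """Decode one varint starting at pos; return (value, next position)."""
  result, shift = 0, 0
  while True:
    b = buf[pos]; pos += 1
    result |= (b & 0x7f) << shift     # low 7 bits carry payload
    if not (b & 0x80):                # a clear high bit ends the varint
      return result, pos
    shift += 7
```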
George Hotz
81b9c04574 move high level stuff to unit tests [pr] (#10708)
* move high level stuff to unit tests [pr]

* process replay on unit tests

* fix pr, less compute

* set omp num threads

* set 200MB buffer size limit

* delete junk

* fix tests

* faster

* move test_indexing to unit

* faster
2025-06-08 14:05:56 -07:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
leopf
eb7305e6a4 Tensor.keccak("sha3_256") (#7186)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-06 15:24:05 -07:00
wozeparrot
0d86f8d375 fix failed threefry (#10646) 2025-06-05 17:17:42 -07:00
chenyu
46811d0d3c minor external_model_benchmark cleanup (#10644) 2025-06-05 14:13:28 -04:00
chenyu
80ebce421d remove metal buffer limit in external_model_benchmark [pr] (#10642)
not needed anymore
2025-06-05 13:00:51 -04:00
wozeparrot
4d1686f767 clean: becnhmark -> benchmark (#10620) 2025-06-03 19:28:18 -07:00
qazal
910cabb081 add kernel count to grouper process replay differ [pr] (#10611) 2025-06-03 15:21:27 +03:00
qazal
3cc73a0172 simpler process replay main loop [pr] (#10588)
* simpler process replay main loop [pr]

* use logging

* default to 1
2025-06-01 15:03:21 +03:00
qazal
dc882d3d7d merge process replay and viz captures [pr] (#10581)
* refactoring

* test script

* work

* more work

* diff

* repr splits lines correctly

* that

* add location

* add location

* also don't need name_override

* k.copy

* [pr]

* name_override 2

* err
2025-06-01 12:30:10 +03:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
Sieds Lykles
ae02a1e232 [bounty] Z3 symbolic fuzzer [pr] (#10514)
* First version, caught a bug?

* Nicely print failure to reproduce

* Remove that

* Put the assert back

* Change fuzzing to use testing_unit so it has z3

* Test key to match

* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* add condition

* add test

* update comment

* use Variable

* fuzzer using z3_renderer

* Cleaned up printing and debugging

* working new fuzzer

* change some comments and printing

* more formatting

* fuzz failures in separate file

* fix fstring

* more tests

* naming

* remove added line

* remove comment

* print number of skipped expressions

* use self.assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-28 16:28:37 -04:00
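The heart of a Z3-backed fuzzer like this one: render two forms of an expression to Z3 and ask for a counterexample to their equality (a sketch of the technique, not the bounty implementation).

```python
import z3

def equivalent(lhs, rhs) -> bool:
  s = z3.Solver()
  s.add(lhs != rhs)                  # search for a counterexample
  return s.check() == z3.unsat       # unsat: none exists, the forms agree

x = z3.Int("x")
assert equivalent((x + x) / 2, x)    # e.g. a rewrite the fuzzer might validate
```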