chenyu
6283d50224
DEPRECATED_linearize -> to_program [pr] (#11198)
2025-07-12 13:46:20 -04:00
nimlgen
b6981404ed
memory: use page shifts in memory manager (#11149)
...
* memory: use page shifts in memory manager
* fix
2025-07-09 22:05:00 +03:00
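The page-shift change above uses a standard trick: with power-of-two pages, page numbers and offsets come from shifts and masks instead of division. A minimal illustrative sketch (not tinygrad's actual MemoryManager code; the 4 KiB page size is an assumption):

```python
PAGE_SHIFT = 12             # hypothetical 4 KiB pages: 1 << 12 == 4096
PAGE_SIZE = 1 << PAGE_SHIFT

def page_number(addr: int) -> int:
  # equivalent to addr // PAGE_SIZE, expressed as a bit shift
  return addr >> PAGE_SHIFT

def page_offset(addr: int) -> int:
  # equivalent to addr % PAGE_SIZE, expressed as a bit mask
  return addr & (PAGE_SIZE - 1)
```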
George Hotz
2893feb9f6
cleanups for kernel.py (#11143)
...
* cleanups for kernel.py
* fixups
2025-07-08 18:10:25 -07:00
chenyu
dada3f5bf3
skip some new onnx tests (#11135)
...
these fail on master with latest onnx
2025-07-08 16:12:48 -04:00
George Hotz
f7d4638e05
start LLM app, tons of clean up required. target is 200 line ollama (#11068)
...
* start LLM app, tons of clean up required. target is 200 line ollama
* kind of works
* simpler
* add k/v cache
* with SYM=1, it loops
* no rope cache
* simpler
* more cleanups
* cleanups
* works
* argparse and comments
* from gguf
* generate is a function
* no copy from cpu
* fix max context pass in
* test
* improve test
* ai2_arc
* fix 8B, use less ram
* 136 lines
2025-07-07 17:09:46 -07:00
nimlgen
01f3c4f44d
memory: simpler paddr allocation logic (#11090)
...
* memory: new paddr allocation logic
* am fix
* am refactors
* fix
* mypy
* use it
* am
2025-07-04 17:00:36 +03:00
qazal
ad155f5454
print inputs to get_program in process replay [pr] (#11051)
...
* print inputs to get_program in process replay [pr]
* colors
* keep dataclass default escapes
* Revert "keep dataclass default escapes"
This reverts commit c6db7e8a7a.
* note for ast_repr
* add that back
2025-07-02 20:20:01 +03:00
qazal
452b22c9b6
fix process replay diff in PYTHON device [pr] (#11052)
...
* fix process replay diff in PYTHON device [pr]
The PYTHON backend pickles and encodes UOps, the encoded binary can't be
directly diffed in process replay.
* note
2025-07-02 11:06:46 +03:00
geohotstan
8ebf0abaae
ONNX external_test_onnx_backend use PYTHON device for model (#10915)
...
* try
* ruff check --fix
* no skip test
* hmmmmmmm I don't get this D:
* run CI again
* why is PYTHON device faster than CPU?
* run ci again and fix lint
* actually doesn't PYTHON device make sense here?
* see cpu speed again
* Revert "see cpu speed again"
This reverts commit 1e366f2256.
* trigger CI
* pretty good
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-01 12:11:17 -04:00
qazal
712980e167
fix extract_dataset + add tests to CI (#10995)
...
* fix extract_dataset + tests
* add CI
* sops.gz itself is same as master
* yml + gzip -c + ge
* don't commit that
* bump limit to 1000
* axis=7
* test_tiny
2025-06-27 01:51:36 +03:00
geohotstan
50936b4a18
ONNX real float16 (#10694)
...
* squash commits
* temp fix for const tensor
* actually realizing float16 can only happen in raw_data
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
chenyu
8751d47985
CosineAnnealingLRWithWarmup (#10981)
2025-06-25 17:45:21 -04:00
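The schedule named above combines two well-known pieces: a linear warmup to the base learning rate, then cosine annealing down to a floor. A hedged sketch of the general recipe (function name and arguments here are illustrative assumptions, not tinygrad's API):

```python
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int,
          end_lr: float = 0.0) -> float:
  """Cosine annealing with linear warmup (illustrative sketch)."""
  if step < warmup_steps:
    # linear warmup: ramps from base_lr/warmup_steps up to base_lr
    return base_lr * (step + 1) / warmup_steps
  # cosine decay from base_lr at progress=0 down to end_lr at progress=1
  progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
  return end_lr + 0.5 * (base_lr - end_lr) * (1 + math.cos(math.pi * progress))
```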
Ignacio Sica
21f1c4cc09
remove some linearize calls from tests [pr] (#10978)
...
* remove some linearize calls from tests
speed_compare_cuda_ptx
test_uop_spec
test_linearizer
test_uops
test_winograd
* more clear assert message
2025-06-25 12:37:17 -07:00
qazal
de4b9bf53b
add opts_to_apply option to AST KernelInfo (#10950)
...
* proposal: add option to override opts in the get_program API
* update test_linearizer_rewrite
* state in uops
* update process_replay and names
* empty isn't none
* fix process replay
2025-06-24 18:55:39 +03:00
qazal
7a5e4e0bf1
fix unittests process replay [pr] (#10947)
2025-06-24 10:30:23 +03:00
George Hotz
ae4d2d71b4
bump line count to 14500
2025-06-23 15:32:27 -07:00
George Hotz
e15754db28
remove (some) kernelize from llama and test schedule speed (#10939)
...
* remove kernelize from llama
* 405B
* space
2025-06-23 15:07:31 -07:00
chenyu
42b1c9625b
skip test TestKiTS19Dataset::test_training_set (#10936)
...
flaky
2025-06-23 14:27:24 -04:00
patrini32
9e9fd44987
refactor test/external/external_llama_eval.py (#10567)
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-23 10:43:20 -07:00
qazal
7820aeca8e
update codegen process replay to use get_program [pr] (#10921)
...
* update codegen process replay to get_program [pr]
* precommit
* try str replace
* +to_function_name
* fixup tc
* local2.sh
* fix openpilot NOLOCALS
* new local.sh
* correct merge
* beam cache
* back
* revert beam thing
* adding opts_override and name_override makes output of get_program
reproducible
* min diff
2025-06-23 17:31:41 +03:00
alpharush
22f9696522
Fix/hcqfuzz harness bug (#10923)
...
* update command so extra module is found
* fix empty range in randrange errors
* lint
2025-06-23 11:22:30 +03:00
geohotstan
4ab7d792cc
ONNX improve dtype fallback (#10800)
...
* fix
* add early verbose demo test
* is this how to write tests :s
* is definition drift even a thing? gemini says it is
* clean up
* better
* even better
* try add to CI
* doesn't work quite yet
* much more work to be done
* whoops
* partition the test heh
* skipif
* some nits for better names
* add webgpu test for onnxrunner
* fix reference links
* flush for now
2025-06-21 19:29:45 -04:00
chenyu
0480139def
log_perplexity metrics (#10912)
2025-06-21 10:44:47 -04:00
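Log perplexity is the standard language-model metric: the mean negative log-probability the model assigns to each correct next token (perplexity itself is its exponential). A minimal sketch of the textbook definition (illustrative only, not the exact code from the PR):

```python
import math

def log_perplexity(token_probs: list[float]) -> float:
  """Mean negative log-probability of the target tokens (illustrative sketch).

  token_probs: probability the model assigned to each correct next token.
  """
  return -sum(math.log(p) for p in token_probs) / len(token_probs)
```

Perplexity follows as `math.exp(log_perplexity(...))`; a perfect model (probability 1 on every target) scores 0.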
nimlgen
0e7bd9fd03
factor out generic MemoryManager (#10910)
...
* allocator -> memory
* just moveout it
* mm is abstracted
* need entry abstraction
* fix
* mypy
2025-06-21 16:18:33 +03:00
George Hotz
7636d2cdc5
flip order of get_program args (#10905)
2025-06-20 17:23:23 -07:00
George Hotz
b41e0563a3
move stuff to kernelize folder (#10902)
...
* move stuff to kernelize folder
* oops, forgot that
2025-06-20 16:10:20 -07:00
George Hotz
92678e59ee
move kernel to opt (#10899)
2025-06-20 15:22:28 -07:00
chenyu
a3dae51085
lower test_gemm_8192 on red (#10883)
2025-06-19 10:01:25 -04:00
George Hotz
18593c9800
one less rewrite on schedule [pr] (#10872)
...
* one less rewrite on schedule [pr]
* verify in ebs
2025-06-18 17:06:17 -07:00
wozeparrot
bdbf121285
fix: contigous -> contiguous (#10868)
2025-06-18 13:09:51 -07:00
George Hotz
cba6e15937
split grouper and kernelize [pr] (#10854)
2025-06-17 17:54:20 -07:00
uuuvn
a51f18f8f9
CI flakiness (#10851)
...
https://github.com/tinygrad/tinygrad/actions/runs/15718103629/job/44292845140?pr=10753#step:4:161
2025-06-17 14:46:30 -07:00
nimlgen
c0329148c7
am: check va is aligned to page size (#10815)
...
* am: check va is aligned to page size
* swap them
* is this faster
2025-06-15 22:51:09 +03:00
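The alignment check referenced above is usually a single mask test: for a power-of-two page size, an address is page-aligned iff its low offset bits are all zero. An illustrative sketch (the actual am driver code may differ):

```python
def is_page_aligned(va: int, page_size: int = 4096) -> bool:
  """True iff va is a multiple of page_size (page_size must be a power of two)."""
  assert page_size > 0 and page_size & (page_size - 1) == 0, \
    "page_size must be a power of two"
  # va % page_size == 0, expressed as a bit mask
  return va & (page_size - 1) == 0
```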
George Hotz
5dc1bc6070
switch get_kernel -> get_program [pr] (#10817)
...
* switch get_kernel -> get_program [pr]
* fix tests
2025-06-15 12:26:50 -07:00
wozeparrot
eb739bb96a
hotfix: lower threshold (#10786)
2025-06-11 19:36:20 -04:00
chenyu
612cdf5146
move fuzz_shape_ops to run with other fuzzer (#10767)
...
* move fuzz_shape_ops to run with other fuzzer
* don't skip CPU
2025-06-10 17:43:04 -04:00
b1tg
52c49dd4f3
fix onnx ci (#10762)
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-10 14:28:40 -04:00
George Hotz
f84c320548
better external_benchmark_schedule [pr] (#10722)
2025-06-09 10:26:11 -07:00
b1tg
24d328e313
onnx parser (#10435)
...
* onnx parser
* fix compile, lint
* onnx.load -> onnx_load
* compatible with ModelProto
* fix test external_test_onnx_ops.py
* fix tests
* fix signed int
* reduce to 261 lines
* fix TypeProto.Optional
* debug for _parse_message, add TypeProto.Sequence, cleanup
* onnx_load from Tensor
* remove BufferedReader
* 174 lines and reduce tensor copy
* cleanup
* use onnx_load in external_model_benchmark.py
* fix qcom test
* [onnx] parser support external data
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
81b9c04574
move high level stuff to unit tests [pr] (#10708)
...
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
2025-06-08 14:05:56 -07:00
George Hotz
32e9949052
rename lazydata to uop (#10698)
2025-06-08 08:42:22 -07:00
leopf
eb7305e6a4
Tensor.keccak("sha3_256") (#7186)
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-06 15:24:05 -07:00
wozeparrot
0d86f8d375
fix failed threefry (#10646)
2025-06-05 17:17:42 -07:00
chenyu
46811d0d3c
minor external_model_benchmark cleanup (#10644)
2025-06-05 14:13:28 -04:00
chenyu
80ebce421d
remove metal buffer limit in external_model_benchmark [pr] (#10642)
...
not needed anymore
2025-06-05 13:00:51 -04:00
wozeparrot
4d1686f767
clean: becnhmark -> benchmark (#10620)
2025-06-03 19:28:18 -07:00
qazal
910cabb081
add kernel count to grouper process replay differ [pr] (#10611)
2025-06-03 15:21:27 +03:00
qazal
3cc73a0172
simpler process replay main loop [pr] (#10588)
...
* simpler process replay main loop [pr]
* use logging
* default to 1
2025-06-01 15:03:21 +03:00
qazal
dc882d3d7d
merge process replay and viz captures [pr] (#10581)
...
* refactoring
* test script
* work
* more work
* diff
* repr splits lines correctly
* that
* add location
* add location
* also don't need name_override
* k.copy
* [pr]
* name_override 2
* err
2025-06-01 12:30:10 +03:00
George Hotz
b3b43a82c4
remove Tensor.no_grad, it's meaningless now [pr] (#10556)
2025-05-28 22:20:02 -07:00