George Hotz
7636d2cdc5
flip order of get_program args ( #10905 )
2025-06-20 17:23:23 -07:00
George Hotz
b41e0563a3
move stuff to kernelize folder ( #10902 )
* move stuff to kernelize folder
* oops, forgot that
2025-06-20 16:10:20 -07:00
George Hotz
92678e59ee
move kernel to opt ( #10899 )
2025-06-20 15:22:28 -07:00
chenyu
a3dae51085
lower test_gemm_8192 on red ( #10883 )
2025-06-19 10:01:25 -04:00
George Hotz
18593c9800
one less rewrite on schedule [pr] ( #10872 )
* one less rewrite on schedule [pr]
* verify in ebs
2025-06-18 17:06:17 -07:00
wozeparrot
bdbf121285
fix: contigous -> contiguous ( #10868 )
2025-06-18 13:09:51 -07:00
George Hotz
cba6e15937
split grouper and kernelize [pr] ( #10854 )
2025-06-17 17:54:20 -07:00
uuuvn
a51f18f8f9
CI flakiness ( #10851 )
https://github.com/tinygrad/tinygrad/actions/runs/15718103629/job/44292845140?pr=10753#step:4:161
2025-06-17 14:46:30 -07:00
nimlgen
c0329148c7
am: check va is aligned to page size ( #10815 )
* am: check va is aligned to page size
* swap them
* is this faster
2025-06-15 22:51:09 +03:00
George Hotz
5dc1bc6070
switch get_kernel -> get_program [pr] ( #10817 )
* switch get_kernel -> get_program [pr]
* fix tests
2025-06-15 12:26:50 -07:00
wozeparrot
eb739bb96a
hotfix: lower threshold ( #10786 )
2025-06-11 19:36:20 -04:00
chenyu
612cdf5146
move fuzz_shape_ops to run with other fuzzer ( #10767 )
* move fuzz_shape_ops to run with other fuzzer
* don't skip CPU
2025-06-10 17:43:04 -04:00
b1tg
52c49dd4f3
fix onnx ci ( #10762 )
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-10 14:28:40 -04:00
George Hotz
f84c320548
better external_benchmark_schedule [pr] ( #10722 )
2025-06-09 10:26:11 -07:00
b1tg
24d328e313
onnx parser ( #10435 )
* onnx parser
* fix compile, lint
* onnx.load -> onnx_load
* compatible with ModelProto
* fix test external_test_onnx_ops.py
* fix tests
* fix signed int
* reduce to 261 lines
* fix TypeProto.Optional
* debug for _parse_message, add TypeProto.Sequence, cleanup
* onnx_load from Tensor
* remove BufferedReader
* 174 lines and reduce tensor copy
* cleanup
* use onnx_load in external_model_benchmark.py
* fix qcom test
* [onnx] parser support external data
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
81b9c04574
move high level stuff to unit tests [pr] ( #10708 )
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
2025-06-08 14:05:56 -07:00
George Hotz
32e9949052
rename lazydata to uop ( #10698 )
2025-06-08 08:42:22 -07:00
leopf
eb7305e6a4
Tensor.keccak("sha3_256") ( #7186 )
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-06 15:24:05 -07:00
wozeparrot
0d86f8d375
fix failed threefry ( #10646 )
2025-06-05 17:17:42 -07:00
chenyu
46811d0d3c
minor external_model_benchmark cleanup ( #10644 )
2025-06-05 14:13:28 -04:00
chenyu
80ebce421d
remove metal buffer limit in external_model_benchmark [pr] ( #10642 )
not needed anymore
2025-06-05 13:00:51 -04:00
wozeparrot
4d1686f767
clean: becnhmark -> benchmark ( #10620 )
2025-06-03 19:28:18 -07:00
qazal
910cabb081
add kernel count to grouper process replay differ [pr] ( #10611 )
2025-06-03 15:21:27 +03:00
qazal
3cc73a0172
simpler process replay main loop [pr] ( #10588 )
* simpler process replay main loop [pr]
* use logging
* default to 1
2025-06-01 15:03:21 +03:00
qazal
dc882d3d7d
merge process replay and viz captures [pr] ( #10581 )
* refactoring
* test script
* work
* more work
* diff
* repr splits lines correctly
* that
* add location
* add location
* also don't need name_override
* k.copy
* [pr]
* name_override 2
* err
2025-06-01 12:30:10 +03:00
George Hotz
b3b43a82c4
remove Tensor.no_grad, it's meaningless now [pr] ( #10556 )
2025-05-28 22:20:02 -07:00
Sieds Lykles
ae02a1e232
[bounty] Z3 symbolic fuzzer [pr] ( #10514 )
* First version, caught a bug?
* Nicely print failure to reproduce
* Remove that
* Put the assert back
* Change fuzzing to use testing_unit so it has z3
* Test key to match
* Add rule
* Add test
* Add test for edge case 0
* Merge patterns
* update comment
* consistent whitespace
* whitespace
* add condition
* add test
* update comment
* use Variable
* fuzzer using z3_renderer
* Cleaned up printing and debugging
* working new fuzzer
* change some comments and printing
* more formatting
* fuzz failures in seperate file
* fix fstring
* more tests
* naming
* remove added line
* remove comment
* print number of skipped expressions
* use self.assertEqual
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-28 16:28:37 -04:00
geohotstan
fd9f236a82
move test over ( #10508 )
2025-05-25 21:51:51 -04:00
George Hotz
0d39bb5de1
rename to get_kernelize_map ( #10465 )
2025-05-22 11:44:44 -07:00
qazal
df4cbb69e9
move fuzz_schedule.py to extra [pr] ( #10444 )
2025-05-21 10:07:24 +03:00
chenyu
29624af872
skip commavq in external_model_benchmark ( #10439 )
precision issue with different onnxruntime version
2025-05-21 01:45:33 -04:00
nimlgen
2895198c36
am: download regs ( #10419 )
* am: download regs
* x
* linter
* mypy
* after merge
* raise
* fixed name
* fix
* xx
* remove
* missing reg
* missing reg
* move to online
* ops
2025-05-20 18:59:56 +03:00
George Hotz
b06291077c
no amdgpu kernel driver ( #10408 )
* no amdgpu kernel driver
* don't test hip
* lower req
2025-05-18 20:52:39 -07:00
George Hotz
411392dfb7
move files into uop dir ( #10399 )
* move files into uop dir [pr]
* tinygrad.uop is a thing
* fix uop docs, no pr
* fix viz
2025-05-18 11:38:28 -07:00
qazal
9e2089dcd4
don't raise Exception in process replay [pr] ( #10392 )
* don't raise Exception in process replay [pr]
* continue generating diffs unless [pr] is set, exit(1) otherwise
* change
* works
2025-05-18 11:23:23 +03:00
qazal
e9e5b54e43
grouper cleanups and merge with insert_kernels [pr] ( #10349 )
* grouper cleanups and merge with insert_kernels [pr]
* remove that
2025-05-16 14:39:56 +03:00
wozeparrot
1ed04f993b
move benchmark stat tracking to influxdb ( #10185 )
2025-05-15 16:14:56 -07:00
qazal
1770e00c41
only CAPTURE_PROCESS_REPLAY=1 + add filterwarnings back [pr] ( #10292 )
2025-05-14 11:58:42 +03:00
qazal
1c97338be5
enable process replay assert for schedule [pr] ( #10280 )
* enable process replay assert for schedule
* start at unique+1
2025-05-14 11:10:47 +03:00
uuuvn
7bc4864bc4
Make dev a property of Allocator ( #10286 )
* Make `dev` a property of `Allocator`
(this is a prereq refactor for #10285)
At least `BufferXfer.copy` accesses it assuming it's always present;
currently most devices just add this property on their own, repeating
the same code over and over again.
This is also a bit of a footgun: see `RemoteAllocator`, which named this
property `device` instead of `dev`. I could obviously just change that
in one place, but doing it globally seems like a better solution (and it
reduces code duplication too).
`MallocAllocator` is a bit special, but passing `None` works just fine.
* typing
* ignore type instead of cast
2025-05-13 17:01:01 -07:00
nimlgen
6f42bf8b54
usbgpu: 10 steps in benchmark to hit cache ( #10273 )
2025-05-13 17:06:50 +03:00
geohotstan
1c4ab6b991
ONNX add tests against ORT ( #10270 )
* start
* clean up
* indicate file location too
2025-05-13 04:03:52 -04:00
nimlgen
2145bce3f9
usbgpu: copyin size is 16k ( #10240 )
* usbgpu: copyin size is 16k
* ush
2025-05-09 22:12:54 +03:00
nimlgen
267ba9b592
usbgpu: better names in copy speed benchmark ( #10212 )
2025-05-08 16:12:37 +03:00
nimlgen
ba52fce4b2
usbgpu: benchmark in ci ( #10208 )
* usbgpu: benchmark
* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
wozeparrot
10437904cd
refactor: ops_cloud -> ops_remote [pr] ( #10166 )
2025-05-05 15:59:51 -07:00
George Hotz
a0240d8c2b
lil work on llvm speed ( #10157 )
* lil work on llvm speed
* llvm failing test
* 1e-4
* simpler failing test
* once is fine
* gpt suggests this syntax change
* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
36ccaa88a6
move merge views [pr] ( #10156 )
* move merge views [pr]
* move flow to __init__ [pr]
2025-05-04 14:41:47 -07:00
George Hotz
5f3f162606
cache rewrites for renderer [pr] ( #10155 )
* add caching to rewrites for renderer [pr]
* remove that
* update ebs
2025-05-04 13:45:15 -07:00
nimlgen
45bf7c5b81
am: add allocation bench ( #10135 )
* init allocation bench
* sorryg
* betetr
2025-05-02 13:51:07 +03:00