870 Commits

Author SHA1 Message Date
George Hotz
5dc1bc6070 switch get_kernel -> get_program [pr] (#10817)
* switch get_kernel -> get_program [pr]

* fix tests
2025-06-15 12:26:50 -07:00
wozeparrot
eb739bb96a hotfix: lower threshold (#10786) 2025-06-11 19:36:20 -04:00
chenyu
612cdf5146 move fuzz_shape_ops to run with other fuzzer (#10767)
* move fuzz_shape_ops to run with other fuzzer

* don't skip CPU
2025-06-10 17:43:04 -04:00
b1tg
52c49dd4f3 fix onnx ci (#10762)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-10 14:28:40 -04:00
George Hotz
f84c320548 better external_benchmark_schedule [pr] (#10722) 2025-06-09 10:26:11 -07:00
b1tg
24d328e313 onnx parser (#10435)
* onnx parser

* fix compile, lint

* onnx.load -> onnx_load

* compatible with ModelProto

* fix test external_test_onnx_ops.py

* fix tests

* fix signed int

* reduce to 261 lines

* fix TypeProto.Optional

* debug for _parse_message, add TypeProto.Sequence, cleanup

* onnx_load from Tensor

* remove BufferedReader

* 174 lines and reduce tensor copy

* cleanup

* use onnx_load in external_model_benchmark.py

* fix qcom test

* [onnx] parser support external data

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
81b9c04574 move high level stuff to unit tests [pr] (#10708)
* move high level stuff to unit tests [pr]

* process replay on unit tests

* fix pr, less compute

* set omp num threads

* set 200MB buffer size limit

* delete junk

* fix tests

* faster

* move test_indexing to unit

* faster
2025-06-08 14:05:56 -07:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
leopf
eb7305e6a4 Tensor.keccak("sha3_256") (#7186)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-06 15:24:05 -07:00
wozeparrot
0d86f8d375 fix failed threefry (#10646) 2025-06-05 17:17:42 -07:00
chenyu
46811d0d3c minor external_model_benchmark cleanup (#10644) 2025-06-05 14:13:28 -04:00
chenyu
80ebce421d remove metal buffer limit in external_model_benchmark [pr] (#10642)
not needed anymore
2025-06-05 13:00:51 -04:00
wozeparrot
4d1686f767 clean: becnhmark -> benchmark (#10620) 2025-06-03 19:28:18 -07:00
qazal
910cabb081 add kernel count to grouper process replay differ [pr] (#10611) 2025-06-03 15:21:27 +03:00
qazal
3cc73a0172 simpler process replay main loop [pr] (#10588)
* simpler process replay main loop [pr]

* use logging

* default to 1
2025-06-01 15:03:21 +03:00
qazal
dc882d3d7d merge process replay and viz captures [pr] (#10581)
* refactoring

* test script

* work

* more work

* diff

* repr splits lines correctly

* that

* add location

* add location

* also don't need name_override

* k.copy

* [pr]

* name_override 2

* err
2025-06-01 12:30:10 +03:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
Sieds Lykles
ae02a1e232 [bounty] Z3 symbolic fuzzer [pr] (#10514)
* First version, caught a bug?

* Nicely print failure to reproduce

* Remove that

* Put the assert back

* Change fuzzing to use testing_unit so it has z3

* Test key to match

* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* add condition

* add test

* update comment

* use Variable

* fuzzer using z3_renderer

* Cleaned up printing and debugging

* working new fuzzer

* change some comments and printing

* more formatting

* fuzz failures in separate file

* fix fstring

* more tests

* naming

* remove added line

* remove comment

* print number of skipped expressions

* use self.assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-28 16:28:37 -04:00
geohotstan
fd9f236a82 move test over (#10508) 2025-05-25 21:51:51 -04:00
George Hotz
0d39bb5de1 rename to get_kernelize_map (#10465) 2025-05-22 11:44:44 -07:00
qazal
df4cbb69e9 move fuzz_schedule.py to extra [pr] (#10444) 2025-05-21 10:07:24 +03:00
chenyu
29624af872 skip commavq in external_model_benchmark (#10439)
precision issue with different onnxruntime version
2025-05-21 01:45:33 -04:00
nimlgen
2895198c36 am: download regs (#10419)
* am: download regs

* x

* linter

* mypy

* after merge

* raise

* fixed name

* fix

* xx

* remove

* missing reg

* missing reg

* move to online

* ops
2025-05-20 18:59:56 +03:00
George Hotz
b06291077c no amdgpu kernel driver (#10408)
* no amdgpu kernel driver

* don't test hip

* lower req
2025-05-18 20:52:39 -07:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
qazal
9e2089dcd4 don't raise Exception in process replay [pr] (#10392)
* don't raise Exception in process replay [pr]

* continue generating diffs unless [pr] is set, exit(1) otherwise

* change

* works
2025-05-18 11:23:23 +03:00
qazal
e9e5b54e43 grouper cleanups and merge with insert_kernels [pr] (#10349)
* grouper cleanups and merge with insert_kernels [pr]

* remove that
2025-05-16 14:39:56 +03:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
qazal
1770e00c41 only CAPTURE_PROCESS_REPLAY=1 + add filterwarnings back [pr] (#10292) 2025-05-14 11:58:42 +03:00
qazal
1c97338be5 enable process replay assert for schedule [pr] (#10280)
* enable process replay assert for schedule

* start at unique+1
2025-05-14 11:10:47 +03:00
uuuvn
7bc4864bc4 Make dev a property of Allocator (#10286)
* Make `dev` a property of `Allocator`

(this is a prereq refactor for #10285)

At least `BufferXfer.copy` accesses it assuming it's always present, yet
currently most devices add this property on their own, repeating
the same code over and over again.

This is also a bit of a footgun: see `RemoteAllocator`, which named this
property `device` instead of `dev`. I could obviously just change that
in one place, but doing it globally seems like a better solution (and it
reduces code duplication too).

`MallocAllocator` is a bit special, but passing `None` works just fine.

* typing

* ignore type instead of cast
2025-05-13 17:01:01 -07:00
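
The refactor described in the commit message above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not tinygrad's actual code: `FakeDevice`, `FakeDeviceAllocator`, `HostAllocator`, and `describe_copy` are hypothetical names invented for illustration, and only the idea of a base `Allocator` that always carries a `dev` attribute (possibly `None`) is taken from the message.

```python
from typing import Generic, Optional, TypeVar

DeviceType = TypeVar("DeviceType")


class Allocator(Generic[DeviceType]):
    """Base class that owns the `dev` attribute, so subclasses need not redefine it."""

    def __init__(self, dev: DeviceType):
        # Stored once here, under one name, instead of in every device-specific subclass.
        self.dev = dev


class FakeDevice:
    """Hypothetical stand-in for a real device object."""

    def __init__(self, name: str):
        self.name = name


class FakeDeviceAllocator(Allocator[FakeDevice]):
    """Hypothetical device allocator: inherits `dev` from the base class."""

    def alloc(self, size: int) -> bytearray:
        return bytearray(size)


class HostAllocator(Allocator[None]):
    """Hypothetical host-memory allocator with no device, so it passes None."""

    def __init__(self) -> None:
        super().__init__(None)

    def alloc(self, size: int) -> bytearray:
        return bytearray(size)


def describe_copy(dest: Allocator, src: Allocator) -> str:
    """Generic code can rely on `.dev` existing on every allocator."""

    def label(allocator: Allocator) -> str:
        dev: Optional[FakeDevice] = allocator.dev
        return dev.name if dev is not None else "host"

    return f"copy from {label(src)} to {label(dest)}"
```

With this layout, a subclass that names the attribute differently (the `device` versus `dev` mismatch mentioned in the message) cannot drift from the rest, because the attribute is defined in exactly one place.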
nimlgen
6f42bf8b54 usbgpu: 10 steps in benchmark to hit cache (#10273) 2025-05-13 17:06:50 +03:00
geohotstan
1c4ab6b991 ONNX add tests against ORT (#10270)
* start

* clean up

* indicate file location too
2025-05-13 04:03:52 -04:00
nimlgen
2145bce3f9 usbgpu: copyin size is 16k (#10240)
* usbgpu: copyin size is 16k

* ush
2025-05-09 22:12:54 +03:00
nimlgen
267ba9b592 usbgpu: better names in copy speed benchmark (#10212) 2025-05-08 16:12:37 +03:00
nimlgen
ba52fce4b2 usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark

* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
wozeparrot
10437904cd refactor: ops_cloud -> ops_remote [pr] (#10166) 2025-05-05 15:59:51 -07:00
George Hotz
a0240d8c2b lil work on llvm speed (#10157)
* lil work on llvm speed

* llvm failing test

* 1e-4

* simpler failing test

* once is fine

* gpt suggests this syntax change

* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
36ccaa88a6 move merge views [pr] (#10156)
* move merge views [pr]

* move flow to __init__ [pr]
2025-05-04 14:41:47 -07:00
George Hotz
5f3f162606 cache rewrites for renderer [pr] (#10155)
* add caching to rewrites for renderer [pr]

* remove that

* update ebs
2025-05-04 13:45:15 -07:00
nimlgen
45bf7c5b81 am: add allocation bench (#10135)
* init allocation bench

* sorryg

* better
2025-05-02 13:51:07 +03:00
nimlgen
30bd6a619f usb gpu (#8766)
* start gpu

* progress

* fixes

* read correct

* libusb

* libusb works

* support asm24

* hmm

* one access file

* fix extra

* start AMBar

* works on am

* back to usb

* patch fw

* full fast write into a bar

* ugh, minus one gpus, next please

* mute libusb for now

* usb for asm24

* 63

* hmm

* ops

* rescan

* and gpu should be there

* enumerate them?

* usbgpu bus 4, 100% reliable (draft)

* lil

* works

* comments

* add DEBUG

* cleaner

* simplest

* Revert "simplest"

This reverts commit 1d00354c16.

* Revert "cleaner"

This reverts commit c5662de956.

* assert we find gpu

* that's simpler

* this back

* simpler?

* correcT

* work

* nonsense

* works with more checks

* this works

* the 6s in the right place

* reliable now

* fix after reboot

* set config

* 1s timeouts

* close to fw loading

* streams

* usbhub works

* endpoints

* fix

* want to test tiny10

* move to tiny 10

* fix gpu

* ugly speed

* smth

* mostly broken, but signals and dmas

* do not reset gpu every time

* changes to run kernels

* ugh, not working

* t10

* pg and sc files

* some prog

* um?

* somehow it works

* patched for 24

* some tries

* minimal

* moving

* back to working

* so sloooooow

* move to controller

* usb.py rewrite

* rework

* cleaner 1

* cleaner 2

* cleaner 3

* new abstractions

* aft merge

* init controller

* cleaner 4

* cleaner 5

* patcher + tiny changes

* ignore that

* cleaner 6

* after rebase

* cleaner 7

* bring it back

* start linter war

* linter 2

* autogen was missing

* fix autogen

* typing

* better?

* mypy

* extra/legacy rename and cleaner

* shuffle

* better printing

* tiny changes and tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-01 18:03:47 +03:00
qazal
93bf8764f2 do not open devices in lowering (#10101)
* do not open devices in lowering [pr]

* ctx=opts

* ctx

* fuzz test
2025-04-29 23:18:16 +08:00
George Hotz
427471550a hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff 2025-04-29 09:02:27 -04:00
George Hotz
73c2f6602f test sdxl softmax (#10096) 2025-04-28 21:55:50 -04:00
Ignacio Sica
bda116d773 fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores

* add use_tensor_core to arg in test and search

* bugfix

* get TC val from ContextVar in search

* revert minor space change

* add tc emulation test to ci and benchmark

* revert

* revert whitespace change

* remove test for ptx

* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
qazal
d13c100981 don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]

* 1 can exist with n

* put process_replay warn last

* assert shape is the same

* bring that back
2025-04-26 23:24:30 +08:00
quortus
5cdc96409e Update outdated renderer.render calls (#10044) 2025-04-26 07:35:19 -04:00
nimlgen
0fc85a2b0a hcqfuzz: init (#10049)
* hcqfuzz: init

* fix fuzz

* linter

* graph

* that test

* update readme
2025-04-25 23:19:21 +03:00
Rory Clear
3a189fa561 More yolo processing in tinygrad (#9928)
* more tg less np

* update webgpu html for new compile

* resize boxes

* remove text

* add back note

* fix indentation

* fix indentation

* remove magic num

* remove now unused funcs

* back to numpy nms

* no loop

* fix iou suppression

* update test

* dont suppress other classes

* add working scale

* fix expected value, rounded up 0.24 was being counted

* add postprocess bool for onnx test

* fix indents

* clean

* clean

* fix indent

* remove print

* fix indent

* remove unused import

* remove hardcoded 0.25

* space

* spacing

* clean label_predictions func

* remove single item lists

* space

* use postprocess output in test

* space

* clean

* clean

* remove redundant threshold

* remove redundant threshold

* clean

* rename var

* move loop into func

* unhardcode iou_threshold

* remove unused values

* clean

* add note

* clean

* keep const

* move back funcs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00