nimlgen
2145bce3f9
usbgpu: copyin size is 16k (#10240)
* usbgpu: copyin size is 16k
* ush
2025-05-09 22:12:54 +03:00
nimlgen
267ba9b592
usbgpu: better names in copy speed benchmark (#10212)
2025-05-08 16:12:37 +03:00
nimlgen
ba52fce4b2
usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark
* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
wozeparrot
10437904cd
refactor: ops_cloud -> ops_remote [pr] (#10166)
2025-05-05 15:59:51 -07:00
George Hotz
a0240d8c2b
lil work on llvm speed (#10157)
* lil work on llvm speed
* llvm failing test
* 1e-4
* simpler failing test
* once is fine
* gpt suggests this syntax change
* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
36ccaa88a6
move merge views [pr] (#10156)
* move merge views [pr]
* move flow to __init__ [pr]
2025-05-04 14:41:47 -07:00
George Hotz
5f3f162606
cache rewrites for renderer [pr] (#10155)
* add caching to rewrites for renderer [pr]
* remove that
* update ebs
2025-05-04 13:45:15 -07:00
nimlgen
45bf7c5b81
am: add allocation bench (#10135)
* init allocation bench
* sorryg
* better
2025-05-02 13:51:07 +03:00
nimlgen
30bd6a619f
usb gpu (#8766)
* start gpu
* progress
* fixes
* read correct
* libusb
* libusb works
* support asm24
* hmm
* one access file
* fix extra
* start AMBar
* works on am
* back to usb
* patch fw
* full fast write into a bar
* ugh, minus one gpus, next please
* mute libusb for now
* usb for asm24
* 63
* hmm
* ops
* rescan
* and gpu should be there
* enumerate them?
* usbgpu bus 4, 100% reliable (draft)
* lil
* works
* comments
* add DEBUG
* cleaner
* simplest
* Revert "simplest"
This reverts commit 1d00354c16.
* Revert "cleaner"
This reverts commit c5662de956.
* assert we find gpu
* that's simpler
* this back
* simpler?
* correct
* work
* nonsense
* works with more checks
* this works
* the 6s in the right place
* reliable now
* fix after reboot
* set config
* 1s timeouts
* close to fw loading
* streams
* usbhub works
* endpoints
* fix
* want to test tiny10
* move to tiny 10
* fix gpu
* ugly speed
* smth
* mostly broken, but signals and dmas
* do not reset gpu every time
* changes to run kernels
* ugh, not working
* t10
* pg and sc files
* some prog
* um?
* somehow it works
* patched for 24
* some tries
* minimal
* moving
* back to working
* so sloooooow
* move to controller
* usb.py rewrite
* rework
* cleaner 1
* cleaner 2
* cleaner 3
* new abstractions
* aft merge
* init controller
* cleaner 4
* cleaner 5
* patcher + tiny changes
* ignore that
* cleaner 6
* after rebase
* cleaner 7
* bring it back
* start linter war
* linter 2
* autogen was missing
* fix autogen
* typing
* better?
* mypy
* extra/legacy rename and cleaner
* shuffle
* better printing
* tiny changes and tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-01 18:03:47 +03:00
qazal
93bf8764f2
do not open devices in lowering (#10101)
* do not open devices in lowering [pr]
* ctx=opts
* ctx
* fuzz test
2025-04-29 23:18:16 +08:00
George Hotz
427471550a
hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff
2025-04-29 09:02:27 -04:00
George Hotz
73c2f6602f
test sdxl softmax (#10096)
2025-04-28 21:55:50 -04:00
Ignacio Sica
bda116d773
fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores
* add use_tensor_core to arg in test and search
* bugfix
* get TC val from ContextVar in search
* revert minor space change
* add tc emulation test to ci and benchmark
* revert
* revert whitespace change
* remove test for ptx
* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
qazal
d13c100981
don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]
* 1 can exist with n
* put process_replay warn last
* assert shape is the same
* bring that back
2025-04-26 23:24:30 +08:00
quortus
5cdc96409e
Update outdated renderer.render calls (#10044)
2025-04-26 07:35:19 -04:00
nimlgen
0fc85a2b0a
hcqfuzz: init (#10049)
* hcqfuzz: init
* fix fuzz
* linter
* graph
* that test
* update readme
2025-04-25 23:19:21 +03:00
Rory Clear
3a189fa561
More yolo processing in tinygrad (#9928)
* more tg less np
* update webgpu html for new compile
* resize boxes
* remove text
* add back note
* fix indentation
* fix indentation
* remove magic num
* remove now unused funcs
* back to numpy nms
* no loop
* fix iou suppression
* update test
* dont suppress other classes
* add working scale
* fix expected value, rounded up 0.24 was being counted
* add postprocess bool for onnx test
* fix indents
* clean
* clean
* fix indent
* remove print
* fix indent
* remove unused import
* remove hardcoded 0.25
* space
* spacing
* clean label_predictions func
* remove single item lists
* space
* use postprocess output in test
* space
* clean
* clean
* remove redundant threshold
* remove redundant threshold
* clean
* rename var
* move loop into func
* unhardcode iou_threshold
* remove unused values
* clean
* add note
* clean
* keep const
* move back funcs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
nimlgen
1c5e353249
am: use mmio iface (#10012)
* am: use mmio iface
* linters
* fixes
* fixes + cleanups
* mute
* mypy
* style
2025-04-24 00:27:04 +03:00
George Hotz
2ed3acd767
toposort is a function [pr] (#10004)
2025-04-23 16:25:03 +01:00
George Hotz
71ecc7fa1a
use a pattern matcher for upcast [pr] (#10000)
2025-04-23 14:24:23 +01:00
George Hotz
cc1087d2ec
move simplify into views_to_indexed_uops (#9999)
* move simplify into views_to_indexed_uops
* cache that
2025-04-23 13:50:27 +01:00
George Hotz
d1f6701eb7
hotfix: lower amd threshold + improve block reorder test
2025-04-22 20:44:29 +01:00
qazal
1d90be2cff
match kernelize API in process replay (#9948)
2025-04-21 05:23:41 +08:00
chenyu
6c30948df6
hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
chenyu
720f20865b
remove required_optimizations (#9848)
2025-04-19 16:51:16 -04:00
qazal
b58decac0c
fix diamond assigns before mapping tensors UOps to assigns (#9855)
* keep tensor_map until diamond assign fixup
* ctx
2025-04-18 14:17:43 +03:00
George Hotz
aa98aff4cd
don't use ops name, just keep sink (#9922)
* don't use ops name, just keep sink
* fix test
* endif sink
2025-04-18 08:59:18 +01:00
chenyu
f5256e0020
Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]
updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimizations
* not you yet
2025-04-17 08:00:56 -04:00
geohotstan
4e8f25109a
Revert "ONNX add output shape validation (#9720)" (#9904)
This reverts commit ac713e04db.
2025-04-16 03:15:56 -04:00
nimlgen
83ae83d871
compare amd and am to cpu as well (#9896)
2025-04-15 13:32:18 +03:00
nimlgen
23a95dd84d
script to compare amd and am kerns (#9889)
* script to compare amd and am kerns
* tool
* is it used???
2025-04-15 00:11:22 +03:00
qazal
e201bc3e93
process replay kernel asts in toposort order [pr] (#9869)
* process replay kernel asts in toposort order [pr]
* use HEAD replay
2025-04-13 17:20:34 +08:00
Alexey Zaytsev
7dda6aae7d
Skip CLOUD in external_test_example (#9857)
Closes #9814
2025-04-12 10:17:44 +08:00
chenyu
8c6299bced
move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]
also folded all long lines
* make a copy and rename self -> k
* fix test
2025-04-10 23:40:16 -04:00
qazal
fbc6aa53d4
script for local process_replay + fix viz name [pr] (#9837)
2025-04-11 00:39:18 +08:00
qazal
16afe04f45
move process replay to grouper (#9830)
* simpler
* sched
2025-04-10 18:27:42 +08:00
chenyu
c462162db8
update benchmark bert scripts with BS and ACC_DTYPE (#9826)
BS=16, ACC_DTYPE=half for tinybox, BS=128, ACC_DTYPE=float for mi300x
2025-04-10 02:06:02 -04:00
George Hotz
fefee5d3ab
single kernel softmax (#9776)
* real single kernel softmax
* cleanup
* fix blockend insertion
* add to bert test
2025-04-08 12:35:48 +08:00
George Hotz
db22094d35
hotfix: update softmax fusion test
2025-04-08 11:23:19 +08:00
Sieds Lykles
07d1aefaf4
fast idiv (#9755)
* fast idiv with tests and fuzzer
* Add todo comment
* Add env variable to toggle fast_idiv
* Move env check
* Add fuzz fast_idiv to ci
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-07 08:32:24 -04:00
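For context on the technique named in the commit above: "fast idiv" replaces `x // d` for a constant divisor with a multiply by a precomputed magic number plus a right shift. Below is a minimal sketch of the classic construction, purely illustrative; the helper name `magic` is made up here and tinygrad's actual rewrite, fuzzer, and env toggle live in the PR.

```python
# Sketch of fast integer division by a constant: pick (m, s) so that
# (x * m) >> s == x // d for every x in [0, 2**bits).
# Illustrative helper, not tinygrad's code.
def magic(d: int, bits: int = 8) -> tuple[int, int]:
    for extra in range(bits + 1):
        s = bits + extra
        m = -(-(1 << s) // d)                 # ceil(2**s / d)
        # the pair is exact when the rounding error m*d - 2**s is <= 2**extra
        if m * d - (1 << s) <= (1 << extra):
            return m, s
    raise AssertionError("a valid pair always exists by extra == bits")

m, s = magic(7)
assert all((x * m) >> s == x // 7 for x in range(256))  # exact for all 8-bit x
```

A multiply plus a shift is far cheaper than a hardware divide, which is why such a rewrite (guarded by a toggle, as the commit adds) can pay off in generated kernels.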
chenyu
b190d85ad7
benchmark script bert softmax (#9759)
2025-04-07 00:31:18 -04:00
chenyu
43e4565148
weighted linear in external_benchmark_bert_matmuls (#9757)
include the linear to get qkv, and permute so that stride matches with the real run
2025-04-06 23:35:42 -04:00
chenyu
8a585dc5c1
benchmark script for matmuls in bert (#9752)
2 main matmuls in the bert layers. getting these to be fast makes bert fast
2025-04-06 19:34:25 +08:00
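For readers unfamiliar with which two matmuls dominate a BERT layer: attention computes scores = Q Kᵀ and then softmax(scores) V. A hedged NumPy sketch with made-up illustrative shapes follows; it is not the benchmark script's code, and the real run adds the qkv linear and permutes for matching strides as the commits above note.

```python
import numpy as np

# Illustrative shapes only: batch, heads, sequence length, head dim.
B, H, S, D = 2, 12, 128, 64
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((B, H, S, D)).astype(np.float32) for _ in range(3))

# First big matmul: attention scores (B, H, S, S), scaled by sqrt(head dim).
scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(D)
# Numerically stable softmax over the key axis.
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
# Second big matmul: weighted sum of values, back to (B, H, S, D).
out = weights @ v
```

Both matmuls are O(S²·D) per head, so at long sequence lengths they dominate the layer, which is why the benchmark isolates them.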
George Hotz
926b0bcc57
cache folded upcast [pr] (#9733)
2025-04-04 11:23:19 +08:00
geohotstan
ac713e04db
ONNX add output shape validation (#9720)
* add output shape validation and remove support for sequence_type
* nit better err msg
* add sequence_type back
* improve err msg
* Revert "improve err msg"
This reverts commit dc9eaea4bb.
* Revert "add sequence_type back"
This reverts commit 288170b2d9.
* do explicit shape equality
* small nit
2025-04-03 05:44:53 -04:00
George Hotz
49dafe6d43
add gc tests [pr] (#9718)
* add gc tests [pr]
* del
* more gc tests
* add NullGraph
2025-04-03 14:08:32 +08:00
geohotstan
e1d7e47cca
fix ONNX IsInf unintended dtype promotion (#9711)
* add IsInf
* add corresponding test
* that float16 is kinda silly
2025-04-02 22:46:15 -04:00
qazal
bb94f13e58
add RECORD_TRACEBACKS=1 option to process replay (#9679)
* add RECORD_TRACEBACKS=1 option to process replay
* stack
2025-04-02 11:58:27 +08:00
chenyu
c672716b38
improve vmin/vmax for IDIV (#9678)
2025-04-01 23:16:01 -04:00
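The vmin/vmax machinery referenced above bounds the value range of an op so later rewrites can simplify. For floor division by a positive constant the core idea is simple, because `//` is monotonic in the dividend; here is a hedged sketch of just that one rule (the function name is made up, and tinygrad's actual rule covers more cases):

```python
# Interval (vmin/vmax) propagation through floor division by a positive
# constant d: since x // d is monotonic non-decreasing in x for d > 0,
# the bounds of x // d are simply the endpoints' quotients.
def idiv_bounds(vmin: int, vmax: int, d: int) -> tuple[int, int]:
    assert d > 0 and vmin <= vmax
    return vmin // d, vmax // d

lo, hi = idiv_bounds(-7, 9, 4)
assert (lo, hi) == (-2, 2)
# brute-force check: every x in [-7, 9] lands inside [lo, hi]
assert all(lo <= x // 4 <= hi for x in range(-7, 10))
```

Tighter ranges like this let the simplifier prove, for example, that an index stays in bounds or that a division folds to a constant.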
geohotstan
d52e91db7b
ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests
* clippy
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00