Commit Graph

4667 Commits

Author SHA1 Message Date
George Hotz
984f09ac74 flip Ops.COPY order [pr] (#10120) 2025-04-30 16:50:18 -04:00
chenyu
17d4d258ea simple symbolic slice in llama [pr] (#10112)
support slice that has step None and stop > start
2025-04-30 14:36:35 -04:00
nimlgen
0e1beaf44f nv: align copies + better test (#10118) 2025-04-30 20:09:53 +03:00
nimlgen
2ec3b722e2 nv: fix copies larger than 4g (#10117) 2025-04-30 18:43:17 +03:00
George Hotz
d81acbeef6 multi: move shrink after copy (#10109)
* multi: move shrink after copy

* passing now
2025-04-30 10:29:51 -04:00
nimlgen
5c7d004da5 hcq: refactor int ptrs to hcqbuffers (#10105)
* hcq: refactor int ptrs to hcqbuffers

* more refactors

* linter

* use in allocator

* test fix

* fx

* ops

* final?

* simpler

* keep this for now
2025-04-30 00:12:18 +03:00
qazal
93bf8764f2 do not open devices in lowering (#10101)
* do not open devices in lowering [pr]

* ctx=opts

* ctx

* fuzz test
2025-04-29 23:18:16 +08:00
George Hotz
c3ff308abb range has only one src now [pr] (#10100)
* range has only one op now

* fix z3 checker

* ci fix

* needs shell

* try pip ensure update

* that ensurepip is useless

* upgrade pip before cache

* windows happy?
2025-04-29 10:31:05 -04:00
George Hotz
427471550a hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff 2025-04-29 09:02:27 -04:00
qazal
ad7546c931 assert in test_indexing_two_bind instead of silent fail (#10099)
* assert in test_indexing_two_bind instead of silent fail

* debuggable

* skip test_simple_train
2025-04-29 20:23:25 +08:00
qazal
cbf7347cd6 display viz rewrites with tabbing if they are subrewrites (#10097)
* display viz rewrites with tabbing if they are subrewrites

* update viz api
2025-04-29 17:57:21 +08:00
George Hotz
73c2f6602f test sdxl softmax (#10096) 2025-04-28 21:55:50 -04:00
George Hotz
eaceafecae do fusion locally (#10095)
* do fusion locally

* oops, that's the right way

* explicit delete closure
2025-04-28 20:45:37 -04:00
George Hotz
a2d0684fc1 test_attention_simple_view (#10092)
* test_attention_simple_view

* correct comment
2025-04-28 20:01:22 -04:00
Ignacio Sica
bda116d773 fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores

* add use_tensor_core to arg in test and search

* bugfix

* get TC val from ContextVar in search

* revert minor space change

* add tc emulation test to ci and benchmark

* revert

* revert whitespace change

* remove test for ptx

* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
George Hotz
d32f5e9f3a improve rendering of shapes in viz + investigate symbolic [pr] (#10091) 2025-04-28 16:44:09 -04:00
Sieds Lykles
dbb7aee02e Split constant in div with negative x (#10088)
* add rule

* change test

* lower complexity limit

* remove offset in fold_unrolled_divs

* remove import

* add one more condition
2025-04-28 16:24:14 -04:00
George Hotz
ecff82a698 fixing single kernel softmax: resolve (#10086)
* fixing single kernel softmax: resolve

* add failing lin test
2025-04-28 13:46:20 -04:00
George Hotz
4c242b0483 hotfix: tests all pass on metal local 2025-04-28 12:09:00 -04:00
George Hotz
690dac79b5 don't modify the ranges on reduce rewrite (#10062)
* bug in div range folding

* simpler

* oh, this is right for indexing, but the div mod folding needs to be fixed

* reenable

* Passing test_complexity_w_unroll2 (#10068)

* Passing

* remove non_folded_divs

* Add check for negative term in div folding

* Add test

* bump that limit

* fix casted

---------

Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-04-28 12:01:19 -04:00
George Hotz
129bddde74 lin failure from SINGLE_KERNEL_SOFTMAX (#10073)
* lin failure from SINGLE_KERNEL_SOFTMAX

* fix lin issue

* more pure diff
2025-04-27 13:02:10 -04:00
George Hotz
68c5f7ba80 load fast in sdxl (#10072)
* load fast in sdxl

* back to that with the ret

* no context
2025-04-27 11:58:51 -04:00
George Hotz
b6d2effaf5 assign is contiguous (#10066)
* assign is contiguous

* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
George Hotz
1253819151 make beautiful indexing use a Variable (#10063)
* make beautiful indexing use a Variable

* stunning test

* better color

* training is broken

* fix tests

* fix variable indexing

* fix test

* no contiguous

* revert that

* revert that too

* indexing two bind

* skip for webgpu

* make not slow
2025-04-27 08:22:38 -04:00
chenyu
4c1ce1a299 don't simplify if div folding resulted in negative numerator (#10064)
* don't simplify if div folding resulted in negative numerator

* test
2025-04-26 17:01:18 -04:00
George Hotz
1805403821 fix rand arange folding (#10060)
* test rand range

* --amend

* fix rand arange folding

* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
qazal
d13c100981 don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]

* 1 can exist with n

* put process_replay warn last

* assert shape is the same

* bring that back
2025-04-26 23:24:30 +08:00
George Hotz
11113c9d07 reduce_unparented (#10056) 2025-04-26 09:48:16 -04:00
George Hotz
ea5dddc537 reduce collapse generic (#10045)
* reduce collapse generic

* new arange folder

* new range folding

* correct with sym

* all tests pass

* indexing ops passes

* failing tests

* fix tests, remove unused

* revert that

* torch indexing is fast

* skip on webgpu

* touchups

* comments
2025-04-26 09:13:24 -04:00
quortus
5cdc96409e Update outdated renderer.render calls (#10044) 2025-04-26 07:35:19 -04:00
nimlgen
0fc85a2b0a hcqfuzz: init (#10049)
* hcqfuzz: init

* fix fuzz

* linter

* graph

* that test

* update readme
2025-04-25 23:19:21 +03:00
Ignacio Sica
76a86735c0 hotfix amd bf16 is supported case (#10039)
* hotfix amd and amd_llvm

* bf16 not supported in ci

* hotfix amd_llvm is not a device

* remove default

* dont gate on ci and amd_llvm

* minor cleanup

* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe fix helper_tc_allclose (#9606)
* fix helper_tc_allclose

* cleanup

* hotfix

* cleanup

* cleanup

* check real buffer and add cast for bf16

* cleanup

* fix padded for ops_python

* avoid assert on amd emulated tc

* swap dimensions

* revert, should have nothing to do with padded

* revert fix, should not go in this pr

* remove skip
2025-04-24 18:36:40 -03:00
Rory Clear
3a189fa561 More yolo processing in tinygrad (#9928)
* more tg less np

* update webgpu html for new compile

* resize boxes

* remove text

* add back note

* fix indentation

* fix indentation

* remove magic num

* remove now unused funcs

* back to numpy nms

* no loop

* fix iou suppression

* update test

* dont suppress other classes

* add working scale

* fix expected value, rounded up 0.24 was being counted

* add postprocess bool for onnx test

* fix indents

* clean

* clean

* fix indent

* remove print

* fix indent

* remove unused import

* remove hardcoded 0.25

* space

* spacing

* clean label_predictions func

* remove single item lists

* space

* use postprocess output in test

* space

* clean

* clean

* remove redundant threshold

* remove redundant threshold

* clean

* rename var

* move loop into func

* unhardcode iou_threshold

* remove unused values

* clean

* add note

* clean

* keep const

* move back funcs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
Ignacio Sica
51ca19d061 set test_tensor_cores_padded_amd to expectedFailure (#10036)
* init

* add expected failure to correctly track progress

* hotfix

* skip for amd_llvm as well

* add skip

* add pr number

* move comment to amd test

* change reason
2025-04-24 17:11:40 -03:00
uuuvn
779aa1e2e9 Enable image tests on cloud if clouddev supports image (#9903)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:30:12 -04:00
Ignacio Sica
373ca59b7f use is_dtype_supported to check dtype support in tc tests (#10035) 2025-04-24 14:59:14 -03:00
uuuvn
754d789f51 Fix and enable jit tests on CLOUD (#10031) 2025-04-24 18:39:31 +03:00
George Hotz
aec75f51ef fixup some slow CI tests [pr] (#10027)
* fixup some slow CI tests [pr]

* shrink test index
2025-04-24 09:20:49 -04:00
qazal
c990aac2b1 skip flaky test_transcribe_file1_OOB (#10026) 2025-04-24 21:09:43 +08:00
Sieds Lykles
e75be6eafc [bounty] [pr] index validation with z3 (#9981)
* index validation with z3

* Change comment

* toposort -> toposort()

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
quortus
9e49721c47 CPUGraph support for clang (#10014)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 07:52:35 -04:00
Park Jun
c3ad7b2a84 create randperm and support pytorch backend (#10019) 2025-04-24 07:29:02 -04:00
nimlgen
1c5e353249 am: use mmio iface (#10012)
* am: use mmio iface

* linters

* fixes

* fixes + cleanups

* mute

* mypy

* style
2025-04-24 00:27:04 +03:00
George Hotz
2ed3acd767 toposort is a function [pr] (#10004) 2025-04-23 16:25:03 +01:00
uuuvn
0730ff0e50 Skip test that requires lru if device's allocator isn't lru (#10003) 2025-04-23 16:12:56 +01:00
uuuvn
9de73ccc22 Skip test that requires python 3.12 on older versions (#10001)
`out.cast(it.dtype.fmt).tolist()` fails with `ValueError: memoryview: destination format must be a native single character format prefixed with an optional '@'`
2025-04-23 10:09:26 -04:00
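Note on 9de73ccc22: the `ValueError` quoted above is raised by `memoryview.cast` whenever the destination format is not one of the native single-character codes the running interpreter accepts. The sketch below is not taken from the commit; it reproduces the same error with a byte-order-prefixed format, which `cast` rejects on any version. The helper name `try_cast` is hypothetical. The half-float `"e"` line is an assumption about which format the skipped test depends on (presumably one that older interpreters reject), so its output is left unstated.

```python
import struct

def try_cast(raw: bytes, fmt: str):
    """Return memoryview(raw).cast(fmt).tolist(), or the ValueError text on failure."""
    try:
        return memoryview(raw).cast(fmt).tolist()
    except ValueError as exc:
        return str(exc)

# Two native-endian float32 values packed into 8 bytes.
raw = struct.pack("@2f", 1.0, 2.0)

# A native single-character format is accepted.
print(try_cast(raw, "f"))

# A byte-order prefix other than '@' is rejected with the error quoted in the commit.
print(try_cast(raw, ">f"))

# Assumption: half-float ("e") may be the format the skipped test needs;
# whether this cast succeeds depends on the Python version in use.
half = struct.pack("@e", 1.5)
print(try_cast(half, "e"))
```

A version guard such as `unittest.skipIf(sys.version_info < (3, 12), ...)` is one common way to express the skip the commit title describes; the actual mechanism used in the commit is not shown here.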
George Hotz
71ecc7fa1a use a pattern matcher for upcast [pr] (#10000) 2025-04-23 14:24:23 +01:00
George Hotz
cc1087d2ec move simplify into views_to_indexed_uops (#9999)
* move simplify into views_to_indexed_uops

* cache that
2025-04-23 13:50:27 +01:00
pkotzbach
dbbd755cba FP8s truncate (#9937)
* truncate fp8

* fix

* maybe like that?

* fix linters

* ruff

* move from extra and add ml_types to tests

* minor changes

* str to dtypes and nan support

---------

Co-authored-by: pkotzbach <pawkotz@gmail.com>
2025-04-22 19:12:49 -04:00