George Hotz
984f09ac74
flip Ops.COPY order [pr] ( #10120 )
2025-04-30 16:50:18 -04:00
chenyu
17d4d258ea
simple symbolic slice in llama [pr] ( #10112 )
...
support slice that has step None and stop > start
2025-04-30 14:36:35 -04:00
nimlgen
0e1beaf44f
nv: align copies + better test ( #10118 )
2025-04-30 20:09:53 +03:00
nimlgen
2ec3b722e2
nv: fix copies larger than 4g ( #10117 )
2025-04-30 18:43:17 +03:00
George Hotz
d81acbeef6
multi: move shrink after copy ( #10109 )
...
* multi: move shrink after copy
* passing now
2025-04-30 10:29:51 -04:00
nimlgen
5c7d004da5
hcq: refactor int ptrs to hcqbuffers ( #10105 )
...
* hcq: refactor int ptrs to hcqbuffers
* more refactors
* linter
* use in allocator
* test fix
* fix
* ops
* final?
* simpler
* keep this for now
2025-04-30 00:12:18 +03:00
qazal
93bf8764f2
do not open devices in lowering ( #10101 )
...
* do not open devices in lowering [pr]
* ctx=opts
* ctx
* fuzz test
2025-04-29 23:18:16 +08:00
George Hotz
c3ff308abb
range has only one src now [pr] ( #10100 )
...
* range has only one op now
* fix z3 checker
* ci fix
* needs shell
* try pip ensure update
* that ensurepip is useless
* upgrade pip before cache
* windows happy?
2025-04-29 10:31:05 -04:00
George Hotz
427471550a
hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff
2025-04-29 09:02:27 -04:00
qazal
ad7546c931
assert in test_indexing_two_bind instead of silent fail ( #10099 )
...
* assert in test_indexing_two_bind instead of silent fail
* debuggable
* skip test_simple_train
2025-04-29 20:23:25 +08:00
qazal
cbf7347cd6
display viz rewrites with tabbing if they are subrewrites ( #10097 )
...
* display viz rewrites with tabbing if they are subrewrites
* update viz api
2025-04-29 17:57:21 +08:00
George Hotz
73c2f6602f
test sdxl softmax ( #10096 )
2025-04-28 21:55:50 -04:00
George Hotz
eaceafecae
do fusion locally ( #10095 )
...
* do fusion locally
* oops, that's the right way
* explicit delete closure
2025-04-28 20:45:37 -04:00
George Hotz
a2d0684fc1
test_attention_simple_view ( #10092 )
...
* test_attention_simple_view
* correct comment
2025-04-28 20:01:22 -04:00
Ignacio Sica
bda116d773
fix use_tensor_cores propagation ( #10048 )
...
* propagate use_tensor_cores
* add use_tensor_core to arg in test and search
* bugfix
* get TC val from ContextVar in search
* revert minor space change
* add tc emulation test to ci and benchmark
* revert
* revert whitespace change
* remove test for ptx
* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
George Hotz
d32f5e9f3a
improve rendering of shapes in viz + investigate symbolic [pr] ( #10091 )
2025-04-28 16:44:09 -04:00
Sieds Lykles
dbb7aee02e
Split constant in div with negative x ( #10088 )
...
* add rule
* change test
* lower complexity limit
* remove offset in fold_unrolled_divs
* remove import
* add one more condition
2025-04-28 16:24:14 -04:00
George Hotz
ecff82a698
fixing single kernel softmax: resolve ( #10086 )
...
* fixing single kernel softmax: resolve
* add failing lin test
2025-04-28 13:46:20 -04:00
George Hotz
4c242b0483
hotfix: tests all pass on metal local
2025-04-28 12:09:00 -04:00
George Hotz
690dac79b5
don't modify the ranges on reduce rewrite ( #10062 )
...
* bug in div range folding
* simpler
* oh, this is right for indexing, but the div mod folding needs to be fixed
* reenable
* Passing test_complexity_w_unroll2 (#10068 )
* Passing
* remove non_folded_divs
* Add check for negative term in div folding
* Add test
* bump that limit
* fix casted
---------
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-04-28 12:01:19 -04:00
George Hotz
129bddde74
lin failure from SINGLE_KERNEL_SOFTMAX ( #10073 )
...
* lin failure from SINGLE_KERNEL_SOFTMAX
* fix lin issue
* more pure diff
2025-04-27 13:02:10 -04:00
George Hotz
68c5f7ba80
load fast in sdxl ( #10072 )
...
* load fast in sdxl
* back to that with the ret
* no context
2025-04-27 11:58:51 -04:00
George Hotz
b6d2effaf5
assign is contiguous ( #10066 )
...
* assign is contiguous
* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
George Hotz
1253819151
make beautiful indexing use a Variable ( #10063 )
...
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow
2025-04-27 08:22:38 -04:00
chenyu
4c1ce1a299
don't simplify if div folding resulted in negative numerator ( #10064 )
...
* don't simplify if div folding resulted in negative numerator
* test
2025-04-26 17:01:18 -04:00
George Hotz
1805403821
fix rand arange folding ( #10060 )
...
* test rand range
* --amend
* fix rand arange folding
* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
qazal
d13c100981
don't sort dims in verify_sink_dims [pr] ( #10059 )
...
* don't sort dims in verify_sink_dims [pr]
* 1 can exist with n
* put process_replay warn last
* assert shape is the same
* bring that back
2025-04-26 23:24:30 +08:00
George Hotz
11113c9d07
reduce_unparented ( #10056 )
2025-04-26 09:48:16 -04:00
George Hotz
ea5dddc537
reduce collapse generic ( #10045 )
...
* reduce collapse generic
* new arange folder
* new range folding
* correct with sym
* all tests pass
* indexing ops passes
* failing tests
* fix tests, remove unused
* revert that
* torch indexing is fast
* skip on webgpu
* touchups
* comments
2025-04-26 09:13:24 -04:00
quortus
5cdc96409e
Update outdated renderer.render calls ( #10044 )
2025-04-26 07:35:19 -04:00
nimlgen
0fc85a2b0a
hcqfuzz: init ( #10049 )
...
* hcqfuzz: init
* fix fuzz
* linter
* graph
* that test
* update readme
2025-04-25 23:19:21 +03:00
Ignacio Sica
76a86735c0
hotfix amd bf16 is supported case ( #10039 )
...
* hotfix amd and amd_llvm
* bf16 not supported in ci
* hotfix amd_llvm is not a device
* remove default
* dont gate on ci and amd_llvm
* minor cleanup
* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe
fix helper_tc_allclose ( #9606 )
...
* fix helper_tc_allclose
* cleanup
* hotfix
* cleanup
* cleanup
* check real buffer and add cast for bf16
* cleanup
* fix padded for ops_python
* avoid assert on amd emulated tc
* swap dimensions
* revert, should have nothing to do with padded
* revert fix, should not go in this pr
* remove skip
2025-04-24 18:36:40 -03:00
Rory Clear
3a189fa561
More yolo processing in tinygrad ( #9928 )
...
* more tg less np
* update webgpu html for new compile
* resize boxes
* remove text
* add back note
* fix indentation
* fix indentation
* remove magic num
* remove now unused funcs
* back to numpy nms
* no loop
* fix iou suppression
* update test
* dont suppress other classes
* add working scale
* fix expected value, rounded up 0.24 was being counted
* add postprocess bool for onnx test
* fix indents
* clean
* clean
* fix indent
* remove print
* fix indent
* remove unused import
* remove hardcoded 0.25
* space
* spacing
* clean label_predictions func
* remove single item lists
* space
* use postprocess output in test
* space
* clean
* clean
* remove redundant threshold
* remove redundant threshold
* clean
* rename var
* move loop into func
* unhardcode iou_threshold
* remove unused values
* clean
* add note
* clean
* keep const
* move back funcs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
Ignacio Sica
51ca19d061
set test_tensor_cores_padded_amd to expectedFailure ( #10036 )
...
* init
* add expected failure to correctly track progress
* hotfix
* skip for amd_llvm as well
* add skip
* add pr number
* move comment to amd test
* change reason
2025-04-24 17:11:40 -03:00
uuuvn
779aa1e2e9
Enable image tests on cloud if clouddev supports image ( #9903 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:30:12 -04:00
Ignacio Sica
373ca59b7f
use is_dtype_supported to check dtype support in tc tests ( #10035 )
2025-04-24 14:59:14 -03:00
uuuvn
754d789f51
Fix and enable jit tests on CLOUD ( #10031 )
2025-04-24 18:39:31 +03:00
George Hotz
aec75f51ef
fixup some slow CI tests [pr] ( #10027 )
...
* fixup some slow CI tests [pr]
* shrink test index
2025-04-24 09:20:49 -04:00
qazal
c990aac2b1
skip flaky test_transcribe_file1_OOB ( #10026 )
2025-04-24 21:09:43 +08:00
Sieds Lykles
e75be6eafc
[bounty] [pr] index validation with z3 ( #9981 )
...
* index validation with z3
* Change comment
* toposort -> toposort()
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
quortus
9e49721c47
CPUGraph support for clang ( #10014 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 07:52:35 -04:00
Park Jun
c3ad7b2a84
create randperm and support pytorch backend ( #10019 )
2025-04-24 07:29:02 -04:00
nimlgen
1c5e353249
am: use mmio iface ( #10012 )
...
* am: use mmio iface
* linters
* fixes
* fixes + cleanups
* mute
* mypy
* style
2025-04-24 00:27:04 +03:00
George Hotz
2ed3acd767
toposort is a function [pr] ( #10004 )
2025-04-23 16:25:03 +01:00
uuuvn
0730ff0e50
Skip test that requires lru if device's allocator isn't lru ( #10003 )
2025-04-23 16:12:56 +01:00
uuuvn
9de73ccc22
Skip test that requires python 3.12 on older versions ( #10001 )
...
`out.cast(it.dtype.fmt).tolist()` fails with `ValueError: memoryview: destination format must be a native single character format prefixed with an optional '@'`
2025-04-23 10:09:26 -04:00
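Note on #10001: the `ValueError` quoted in that commit comes from `memoryview.cast`, which only accepts format codes the running interpreter supports natively. The commit does not say which format triggered it. The sketch below assumes the half-float code `"e"`, which `memoryview` gained in Python 3.12, as one plausible case. The helper name `cast_to_list` is hypothetical and is not part of tinygrad.

```python
import struct

def cast_to_list(raw: bytes, fmt: str):
    """Reinterpret raw bytes as items of struct format `fmt`.

    Returns None when this interpreter's memoryview cannot cast to `fmt`
    (hypothetical helper, for illustration only).
    """
    try:
        return memoryview(raw).cast(fmt).tolist()
    except ValueError:
        # Raised when the destination format is not supported natively.
        return None

# struct has supported "e" (half float) for longer than memoryview has,
# so packing works even on interpreters where the cast does not.
half_bytes = struct.pack("4e", 1.0, 2.0, 3.0, 4.0)
half = cast_to_list(half_bytes, "e")

# "f" (single-precision float) is a long-standing native format.
floats = cast_to_list(struct.pack("2f", 1.0, 2.0), "f")

print("half-float cast:", half)
print("float cast:", floats)
```

A test that depends on such a cast can be skipped with `unittest.skipIf(sys.version_info < (3, 12), ...)`, which matches the intent of the commit title. The exact guard used in the PR is not shown here.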
George Hotz
71ecc7fa1a
use a pattern matcher for upcast [pr] ( #10000 )
2025-04-23 14:24:23 +01:00
George Hotz
cc1087d2ec
move simplify into views_to_indexed_uops ( #9999 )
...
* move simplify into views_to_indexed_uops
* cache that
2025-04-23 13:50:27 +01:00
pkotzbach
dbbd755cba
FP8s truncate ( #9937 )
...
* truncate fp8
* fix
* maybe like that?
* fix linters
* ruff
* move from extra and add ml_types to tests
* minor changes
* str to dtypes and nan support
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
2025-04-22 19:12:49 -04:00