quortus
5cdc96409e
Update outdated renderer.render calls ( #10044 )
2025-04-26 07:35:19 -04:00
nimlgen
0fc85a2b0a
hcqfuzz: init ( #10049 )
...
* hcqfuzz: init
* fix fuzz
* linter
* graph
* that test
* update readme
2025-04-25 23:19:21 +03:00
Ignacio Sica
76a86735c0
hotfix amd bf16 is supported case ( #10039 )
...
* hotfix amd and amd_llvm
* bf16 not supported in ci
* hotfix amd_llvm is not a device
* remove default
* dont gate on ci and amd_llvm
* minor cleanup
* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe
fix helper_tc_allclose ( #9606 )
...
* fix helper_tc_allclose
* cleanup
* hotfix
* cleanup
* cleanup
* check real buffer and add cast for bf16
* cleanup
* fix padded for ops_python
* avoid assert on amd emulated tc
* swap dimensions
* revert, should have nothing to do with padded
* revert fix, should not go in this pr
* remove skip
2025-04-24 18:36:40 -03:00
Rory Clear
3a189fa561
More yolo processing in tinygrad ( #9928 )
...
* more tg less np
* update webgpu html for new compile
* resize boxes
* remove text
* add back note
* fix indentation
* fix indentation
* remove magic num
* remove now unused funcs
* back to numpy nms
* no loop
* fix iou suppression
* update test
* dont suppress other classes
* add working scale
* fix expected value, rounded up 0.24 was being counted
* add postprocess bool for onnx test
* fix indents
* clean
* clean
* fix indent
* remove print
* fix indent
* remove unused import
* remove hardcoded 0.25
* space
* spacing
* clean label_predictions func
* remove single item lists
* space
* use postprocess output in test
* space
* clean
* clean
* remove redundant threshold
* remove redundant threshold
* clean
* rename var
* move loop into func
* unhardcode iou_threshold
* remove unused values
* clean
* add note
* clean
* keep const
* move back funcs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
Ignacio Sica
51ca19d061
set test_tensor_cores_padded_amd to expectedFailure ( #10036 )
...
* init
* add expected failure to correctly track progress
* hotfix
* skip for amd_llvm as well
* add skip
* add pr number
* move comment to amd test
* change reason
2025-04-24 17:11:40 -03:00
uuuvn
779aa1e2e9
Enable image tests on cloud if clouddev supports image ( #9903 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:30:12 -04:00
Ignacio Sica
373ca59b7f
use is_dtype_supported to check dtype support in tc tests ( #10035 )
2025-04-24 14:59:14 -03:00
uuuvn
754d789f51
Fix and enable jit tests on CLOUD ( #10031 )
2025-04-24 18:39:31 +03:00
George Hotz
aec75f51ef
fixup some slow CI tests [pr] ( #10027 )
...
* fixup some slow CI tests [pr]
* shrink test index
2025-04-24 09:20:49 -04:00
qazal
c990aac2b1
skip flaky test_transcribe_file1_OOB ( #10026 )
2025-04-24 21:09:43 +08:00
Sieds Lykles
e75be6eafc
[bounty] [pr] index validation with z3 ( #9981 )
...
* index validation with z3
* Change comment
* toposort -> toposort()
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
quortus
9e49721c47
CPUGraph support for clang ( #10014 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 07:52:35 -04:00
Park Jun
c3ad7b2a84
create randperm and support pytorch backend ( #10019 )
2025-04-24 07:29:02 -04:00
nimlgen
1c5e353249
am: use mmio iface ( #10012 )
...
* am: use mmio iface
* linters
* fixes
* fixes + cleanups
* mute
* mypy
* style
2025-04-24 00:27:04 +03:00
George Hotz
2ed3acd767
toposort is a function [pr] ( #10004 )
2025-04-23 16:25:03 +01:00
uuuvn
0730ff0e50
Skip test that requires lru if device's allocator isn't lru ( #10003 )
2025-04-23 16:12:56 +01:00
uuuvn
9de73ccc22
Skip test that requires python 3.12 on older versions ( #10001 )
...
`out.cast(it.dtype.fmt).tolist()` fails with `ValueError: memoryview: destination format must be a native single character format prefixed with an optional '@'`
2025-04-23 10:09:26 -04:00
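The `ValueError` quoted in the commit above (#10001) comes from `memoryview.cast`, which only accepts native single-character format codes. The sketch below reproduces that message with the standard library alone. It assumes the failing format in the test was the half-precision code `"e"`, which `memoryview` gained in Python 3.12; the commit does not say which format was involved, and `try_cast` is a hypothetical helper, not part of tinygrad.

```python
import struct
import sys

def try_cast(raw: bytes, fmt: str):
    """Return the decoded values, or the error message if the cast is rejected."""
    try:
        return memoryview(raw).cast(fmt).tolist()
    except ValueError as exc:
        return str(exc)

# Two bytes holding the half-precision float 1.5 (struct has supported "e" since 3.6).
raw = struct.pack("e", 1.5)

# Assumption: "e" is the format that tripped the test on older interpreters.
result = try_cast(raw, "e")
if sys.version_info >= (3, 12):
    print("half-float cast works:", result)
else:
    print("half-float cast rejected:", result)

# A byte-order prefix such as "<" is never a native single-character format,
# so this cast is rejected regardless of the interpreter version.
print(try_cast(b"\x00" * 4, "<i"))
```

Guarding the test on `sys.version_info` mirrors what the commit title describes: skip on interpreters that cannot perform the cast.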
George Hotz
71ecc7fa1a
use a pattern matcher for upcast [pr] ( #10000 )
2025-04-23 14:24:23 +01:00
George Hotz
cc1087d2ec
move simplify into views_to_indexed_uops ( #9999 )
...
* move simplify into views_to_indexed_uops
* cache that
2025-04-23 13:50:27 +01:00
pkotzbach
dbbd755cba
FP8s truncate ( #9937 )
...
* truncate fp8
* fix
* maybe like that?
* fix linters
* ruff
* move from extra and add ml_types to tests
* minor changes
* str to dtypes and nan support
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
2025-04-22 19:12:49 -04:00
qazal
f4ec57baff
new schedule linearizer enqueues KERNEL UOps [pr] ( #9993 )
...
* new schedule linearizer enqueues kernels [pr]
* no defaultdict
* diff
* minor
2025-04-23 05:17:58 +08:00
George Hotz
d1f6701eb7
hotfix: lower amd threshold + improve block reorder test
2025-04-22 20:44:29 +01:00
nimlgen
db51133537
rename HWInterface -> FileIOInterface ( #9989 )
...
* rename HWInterface -> FileIOInterface
* ugh
2025-04-22 22:18:57 +03:00
George Hotz
c1539b0319
putting add first orders loads as expected ( #9991 )
2025-04-22 20:12:05 +01:00
nimlgen
bd580d8ea4
hcq: use mmio interface in nv ( #9986 )
...
* hcq: start mmio interface
* allow double cast
* revert
* faster?
* simpler, not needed more now
* dd
* types
* fix
2025-04-22 21:58:12 +03:00
George Hotz
feee6986c9
faster block reorder ( #9990 )
...
* faster block reorder [pr]
* that shouldn't change order
* key just in sorted
* ind
2025-04-22 19:18:57 +01:00
qazal
6cb2d18c03
refactor schedule linearize to defaultdict [pr] ( #9984 )
...
* refactor schedule linearize to defaultdict [pr]
* skip that
* don't need .get
2025-04-23 00:00:23 +08:00
chenyu
9e5e371999
make DISABLE_COMPILER_CACHE a ContextVar [pr] ( #9983 )
2025-04-22 10:32:54 -04:00
qazal
bbc324f5dc
remove CAST_AFTER_EXPAND ( #9980 )
2025-04-22 21:06:11 +08:00
George Hotz
c519b553db
non recursive toposort is 2x+ faster ( #9979 )
...
* non recursive toposort is 2x+ faster
* don't change the order
2025-04-22 13:59:38 +01:00
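For readers unfamiliar with the idea behind the commit above (#9979), a non-recursive topological sort replaces the call stack with an explicit one. The following is a generic sketch of that technique, not tinygrad's implementation; the `toposort` signature and the `children` callback are assumptions made for illustration.

```python
def toposort(root, children):
    """Iterative post-order DFS over a DAG: each node appears after all of its children."""
    order, seen = [], set()
    stack = [(root, False)]  # (node, whether its children were already pushed)
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)  # all children have been emitted by now
            continue
        if node in seen:
            continue  # reached earlier through another parent
        seen.add(node)
        stack.append((node, True))  # revisit after the children
        for child in reversed(children(node)):
            if child not in seen:
                stack.append((child, False))
    return order

# Diamond-shaped graph: a depends on b and c, both of which depend on d.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(toposort("a", lambda n: graph[n]))  # ['d', 'b', 'c', 'a']
```

Besides avoiding function-call overhead, the explicit stack means graph depth is not limited by Python's recursion limit.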
qazal
7b55846e08
prep STORE UOp creation for multi output [pr] ( #9975 )
...
* prep STORE UOp creation for multi output [pr]
* test_multioutput_ast
2025-04-22 19:34:52 +08:00
George Hotz
e358e0a0c6
move metadata set to tensor [pr] ( #9976 )
...
* move metadata set to tensor [pr]
* only track that in tensor.py
2025-04-22 12:30:35 +01:00
George Hotz
f5dc70c624
microbenchmarks + micro speed ups ( #9972 )
...
* microbenchmarks
* forgot the ubenchs
* clean up type verify
2025-04-22 11:30:46 +01:00
qazal
1cf4e24ca5
fix kernelize usage with pm_gradient ( #9953 )
...
* fix kernelize usage with pm_gradient
* remove that
2025-04-22 17:26:05 +08:00
qazal
36ed3c3253
fix kernelize with VIEW children ( #9961 )
2025-04-21 23:38:46 +08:00
qazal
e8910540f6
Kernelize can be called multiple times on a Tensor ( #9949 )
...
* Kernelize can be called multiple times on a Tensor
* add (failing) test_kernelize_bw
2025-04-21 06:28:47 +08:00
qazal
1d90be2cff
match kernelize API in process replay ( #9948 )
2025-04-21 05:23:41 +08:00
qazal
e20ef7196a
Tensor.kernelize ( #9845 )
...
* add kernelize
* remove that
* kernelize returns self
* update abstractions2.py
* kernelize in test_schedule
* temp: assert BUFFER_VIEW's existence
* ASSIGN must have a buffer or subbuffer target
* assert and shrink
* fix
* padded setitem
* var
* toposort once
* extra
* base_buffer
* end with BUFFER_VIEW
* setitem for disk
* test_setitem_becomes_subbuffer
* mul slice test
* torch backend fix 1
* non-deterministic
* keep subbuffer
2025-04-20 20:53:49 +08:00
qazal
dd16087f62
fold double ASSIGN to same target ( #9941 )
2025-04-20 19:06:38 +08:00
qazal
9a9aba4cd5
setitem tests (some failing) from kernelize ( #9940 )
2025-04-20 18:47:55 +08:00
chenyu
6c30948df6
hand_coded_optimizations returns list[Opt] [pr] ( #9938 )
...
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
chenyu
720f20865b
remove required_optimizations ( #9848 )
2025-04-19 16:51:16 -04:00
Ignacio Sica
023b1c28a2
test_tensor_cores_padded refactor (#9724 )
...
* set pad to 3 for amd padded tc test
* change pad for amd regardless CI
* test tc padded uops and correctness separately
* add test_tensor_cores_padded_uops test to ci
* remove redundant check for amd device
* cleanup
2025-04-18 17:05:54 -03:00
qazal
b58decac0c
fix diamond assigns before mapping tensors UOps to assigns ( #9855 )
...
* keep tensor_map until diamond assign fixup
* ctx
2025-04-18 14:17:43 +03:00
George Hotz
aa98aff4cd
don't use ops name, just keep sink ( #9922 )
...
* don't use ops name, just keep sink
* fix test
* endif sink
2025-04-18 08:59:18 +01:00
George Hotz
8919370c76
hotfix: fix test_save_all_dtypes on METAL
2025-04-18 08:42:31 +01:00
qazal
16dfe0a902
upstream remu ( #9921 )
2025-04-18 01:57:36 +03:00
chenyu
f5256e0020
Kernel.apply_opts [pr] ( #9917 )
...
* Kernel.apply_opts [pr]
updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimization
* not you yet
2025-04-17 08:00:56 -04:00
Eitan Turok
2c7c205bc5
Fix dtype comparisons in vectorized transcendental + tests ( #9794 )
...
* init test
* cleanup
* init
* update
* fix
* fix python runtime for vectorized code
* awesome helper
* update
* update
* cleanup
* more cleaning
* cleanup more
* fix tests
* more cleaning
* cleanup more
* fix
* even cleaner
* failing tests is sad
* cleanup
* better name
* make tests pass
* remove vec from python runtime
* remove vec from eval_uop
* remove expected failures
* better name
2025-04-16 08:06:12 -04:00