George Hotz
b341296304
hotfix: save sdxl ram
2025-04-27 12:09:45 -04:00
George Hotz
68c5f7ba80
load fast in sdxl ( #10072 )
...
* load fast in sdxl
* back to that with the ret
* no context
2025-04-27 11:58:51 -04:00
George Hotz
768eb94c3e
disable debug for load_state_dict [pr] ( #10070 )
2025-04-27 11:11:56 -04:00
George Hotz
4b8ef6ce78
hotfix: sdxl corealize
2025-04-27 10:41:46 -04:00
George Hotz
b6d2effaf5
assign is contiguous ( #10066 )
...
* assign is contiguous
* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
George Hotz
1253819151
make beautiful indexing use a Variable ( #10063 )
...
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow
2025-04-27 08:22:38 -04:00
Rory Clear
a13a43c4fe
yolo 416 to 640 res ( #10047 )
2025-04-26 20:45:58 -04:00
chenyu
4c1ce1a299
don't simplify if div folding resulted in negative numerator ( #10064 )
...
* don't simplify if div folding resulted in negative numerator
* test
2025-04-26 17:01:18 -04:00
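For context on the title above (a general floor-division fact, not code from this PR): floor division rounds toward negative infinity, so a folding rule that is valid for a non-negative numerator can change the result once the folded numerator goes negative.
```python
# General floor-division behavior, not tinygrad code: the fold is only safe
# while the numerator stays non-negative.
print(7 // 4)        # 1
print(-1 // 4)       # -1 with floor division; truncating division would give 0
print((8 - 9) // 4)  # -1: if folding produces the negative numerator (8 - 9),
                     # simplifying as if it were non-negative gives the wrong value
```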
George Hotz
1805403821
fix rand arange folding ( #10060 )
...
* test rand range
* --amend
* fix rand arange folding
* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
qazal
d13c100981
don't sort dims in verify_sink_dims [pr] ( #10059 )
...
* don't sort dims in verify_sink_dims [pr]
* 1 can exist with n
* put process_replay warn last
* assert shape is the same
* bring that back
2025-04-26 23:24:30 +08:00
George Hotz
c80fe6d5fc
handle some fancier reduces ( #10057 )
...
* reduce_unparented
* handle fancier reduces
* fold more
* bugfix
2025-04-26 11:20:15 -04:00
nimlgen
e08270c1ba
nv: fix program init for no-args kernels ( #10058 )
2025-04-26 18:08:53 +03:00
George Hotz
11113c9d07
reduce_unparented ( #10056 )
2025-04-26 09:48:16 -04:00
George Hotz
ea5dddc537
reduce collapse generic ( #10045 )
...
* reduce collapse generic
* new arange folder
* new range folding
* correct with sym
* all tests pass
* indexing ops passes
* failing tests
* fix tests, remove unused
* revert that
* torch indexing is fast
* skip on webgpu
* touchups
* comments
2025-04-26 09:13:24 -04:00
quortus
5cdc96409e
Update outdated renderer.render calls ( #10044 )
2025-04-26 07:35:19 -04:00
nimlgen
e055b9422f
am: fix mmap failures ( #10054 )
2025-04-26 14:21:28 +03:00
qazal
e1d2b64e92
remu new instructions ( #10050 )
...
* remu new instructions
* test_ds_store_half
* test_v_mul_f16
2025-04-26 02:04:12 +03:00
qazal
bba5d0a3e4
remu refactors ( #10028 )
...
* remu refactors
* scc is sgpr 253
* remove that
* rename to vcc_lo
* run cargo test in CI
* llvm-mc
* meh
* work
* work_group work 1
* seeded_lanes is dumb
* better than seeded_lanes
* does not need to be address
* 128 sgpr per wave
* scc is sgpr, we don't know which one
* null_src once more
* derive clone, wave init is cleaner
* init comes first
2025-04-26 04:31:10 +08:00
nimlgen
0fc85a2b0a
hcqfuzz: init ( #10049 )
...
* hcqfuzz: init
* fix fuzz
* linter
* graph
* that test

* update readme
2025-04-25 23:19:21 +03:00
qazal
b30050e287
fix amdgpu_disassemble on mac [pr] ( #10042 )
2025-04-25 15:23:11 +08:00
George Hotz
a197aa4ef3
upat reduce syntax [pr] ( #10040 )
...
* upat reduce syntax [pr]
* switch z3 to graph_rewrite
2025-04-24 22:05:28 -04:00
Ignacio Sica
76a86735c0
hotfix amd bf16 is supported case ( #10039 )
...
* hotfix amd and amd_llvm
* bf16 not supported in ci
* hotfix amd_llvm is not a device
* remove default
* dont gate on ci and amd_llvm
* minor cleanup
* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe
fix helper_tc_allclose ( #9606 )
...
* fix helper_tc_allclose
* cleanup
* hotfix
* cleanup
* cleanup
* check real buffer and add cast for bf16
* cleanup
* fix padded for ops_python
* avoid assert on amd emulated tc
* swap dimensions
* revert, should have nothing to do with padded
* revert fix, should not go in this pr
* remove skip
2025-04-24 18:36:40 -03:00
Rory Clear
3a189fa561
More yolo processing in tinygrad ( #9928 )
...
* more tg less np
* update webgpu html for new compile
* resize boxes
* remove text
* add back note
* fix indentation
* fix indentation
* remove magic num
* remove now unused funcs
* back to numpy nms
* no loop
* fix iou suppression
* update test
* dont suppress other classes
* add working scale
* fix expected value, rounded up 0.24 was being counted
* add postprocess bool for onnx test
* fix indents
* clean
* clean
* fix indent
* remove print
* fix indent
* remove unused import
* remove hardcoded 0.25
* space
* spacing
* clean label_predictions func
* remove single item lists
* space
* use postprocess output in test
* space
* clean
* clean
* remove redundant threshold
* remove redundant threshold
* clean
* rename var
* move loop into func
* unhardcode iou_threshold
* remove unused values
* clean
* add note
* clean
* keep const
* move back funcs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
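The bullets above ("back to numpy nms", "fix iou suppression", "dont suppress other classes", "unhardcode iou_threshold") all touch the suppression step of YOLO post-processing. Below is a generic numpy sketch of class-aware, IoU-based NMS for readers who want the idea in code; it is not the PR's implementation, and the function names and default threshold are illustrative.
```python
import numpy as np

def iou(box, boxes):
    # boxes are (x1, y1, x2, y2); returns the IoU of `box` against each row of `boxes`
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, classes, iou_threshold=0.5):
    # class-aware NMS: a kept box only suppresses lower-scoring boxes of the same class
    order, keep = np.argsort(scores)[::-1], []
    while order.size:
        i, order = order[0], order[1:]
        keep.append(int(i))
        suppress = (classes[order] == classes[i]) & (iou(boxes[i], boxes[order]) > iou_threshold)
        order = order[~suppress]
    return keep
```
The class-awareness is the `same_class` part of the `suppress` mask: a high-scoring box only removes overlapping boxes of its own class, which is the behavior the "dont suppress other classes" bullet refers to.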
chenyu
74c6cf8be3
lint mlperf model_train ( #10038 )
2025-04-24 16:19:44 -04:00
Ignacio Sica
51ca19d061
set test_tensor_cores_padded_amd to expectedFailure ( #10036 )
...
* init
* add expected failure to correctly track progress
* hotfix
* skip for amd_llvm as well
* add skip
* add pr number
* move comment to amd test
* change reason
2025-04-24 17:11:40 -03:00
b1tg
914d89fa0b
fix tensor cores for gfx1201 ( #9838 )
...
* fix tensor cores for gfx1201
* fix typo
* fix python wmma
* AMDLLVMRenderer with arch + AMDLLVM tensor_cores
* fix ci
* clean up
* more tensor cores for RDNA4
* fix half/half, bfloat16/float, bfloat16/bfloat16 for amd_llvm
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:57:41 -04:00
uuuvn
779aa1e2e9
Enable image tests on cloud if clouddev supports image ( #9903 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:30:12 -04:00
uuuvn
29a12b19ea
Add macos CLOUD tests ( #10033 )
...
A lot more work is required to enable all of them and move them into the osxtests
matrix; for now I created a separate runner for them (copied from WebGPU).
Will add test/test_graph.py to those tests in #9876
2025-04-24 14:14:13 -04:00
Nishant Rajadhyaksha
55942a8d8e
[Bounty] moved index_tensor off cpu in torch_backend ( #9916 )
...
* moved index tensor off cpu in torch_backend
* added support for None based indexing
* fix_to_pass_tests
* fix segfault tests
2025-04-24 14:12:37 -04:00
Ignacio Sica
373ca59b7f
use is_dtype_supported to check dtype support in tc tests ( #10035 )
2025-04-24 14:59:14 -03:00
Ignacio Sica
93a1e9eeb9
improve bf16 case for is_dtype_supported [pr] ( #10034 )
...
* fix is_dtype_supported for bf16
* hotfix
* add llvm and amd_llvm
* gate on machine
* separate gpu vs cpu cases
* add arm case
2025-04-24 14:03:57 -03:00
uuuvn
754d789f51
Fix and enable jit tests on CLOUD ( #10031 )
2025-04-24 18:39:31 +03:00
qazal
0b482fb824
add RDNA3 parser to remu ( #10025 )
...
* llvm ref
* work
* all of them
* salu
* cleaner
* start
* vector ops
* done
* replace SMEM
* vopd
* sop1
* SOPC
* null stays null_src
* sopp
* SOPK
* sop2
* vop1
* vop2
* remove allow(dead_code)
* vopc
2025-04-24 21:34:07 +08:00
uuuvn
0d903c9495
Print clouddev instead of clouddev's renderer ( #10023 )
...
This is kind of a bug: currently with DEBUG>=1 it says the remote has a device
and then prints an array of renderer props instead of the real device name,
which doesn't make sense:
```
127.0.0.1 - - [24/Apr/2025 16:50:44] "GET /properties HTTP/1.1" 200 -
remote has device ['tinygrad.renderer.cstyle', 'MetalRenderer', []]
opened device CLOUD from pid:20210
```
Now it actually prints the name of the device behind CLOUD:
```
127.0.0.1 - - [24/Apr/2025 16:56:29] "GET /properties HTTP/1.1" 200 -
remote has device METAL
opened device CLOUD from pid:20315
```
2025-04-24 09:32:08 -04:00
George Hotz
aec75f51ef
fixup some slow CI tests [pr] ( #10027 )
...
* fixup some slow CI tests [pr]
* shrink test index
2025-04-24 09:20:49 -04:00
qazal
c990aac2b1
skip flaky test_transcribe_file1_OOB ( #10026 )
2025-04-24 21:09:43 +08:00
George Hotz
4e2ccfddc6
ci refactor to split AMD/NVIDIA [pr] ( #10024 )
...
* ci refactor to split AMD [pr]
* split
* split amd tests
* explicit 0
2025-04-24 08:59:54 -04:00
uuuvn
0c68e44d6f
Cloud properties ( #10021 )
2025-04-24 08:17:01 -04:00
George Hotz
db00d88415
hotfix: handle bad z3 install like z3 import fail
2025-04-24 08:09:40 -04:00
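A minimal sketch of the pattern the hotfix title describes, under the assumption that a module-level availability flag is involved (the real code may look different): a broken z3 install can raise something other than ImportError at import time, so the guard needs to be wider than a plain ImportError check.
```python
# Hypothetical guard, not the repository's actual code: a corrupt z3 install
# can fail at import time with errors other than ImportError, so treat any
# failure the same as z3 being absent.
try:
    import z3  # noqa: F401
    Z3_AVAILABLE = True
except Exception:
    Z3_AVAILABLE = False
```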
Sieds Lykles
e75be6eafc
[bounty] [pr] index validation with z3 ( #9981 )
...
* index validation with z3
* Change comment
* toposort -> toposort()
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
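As a standalone illustration of what "index validation with z3" means (the index expression, variable ranges, and buffer size below are invented; this is not the PR's code): encode the loop-variable ranges and ask the solver whether the index can ever leave the buffer, where `unsat` means the access is always in bounds.
```python
from z3 import Ints, Solver, And, Or, unsat

i, j = Ints("i j")
idx = i * 4 + j                              # index expression to validate
ranges = And(0 <= i, i < 8, 0 <= j, j < 4)   # ranges of the loop variables
oob = Or(idx < 0, idx >= 32)                 # outside a buffer of 32 elements

s = Solver()
s.add(ranges, oob)
# unsat: no assignment inside the ranges can index out of bounds
print("index always valid" if s.check() == unsat else "possible out-of-bounds access")
```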
quortus
9e49721c47
CPUGraph support for clang ( #10014 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 07:52:35 -04:00
Park Jun
c3ad7b2a84
create randperm and support pytorch backend ( #10019 )
2025-04-24 07:29:02 -04:00
Matthew Daiter
b545338e59
isin_Tensor_out added ( #10018 )
2025-04-24 07:26:51 -04:00
chenyu
a25abf55e3
retinanet only call postprocess_detections with RUNMLPERF ( #10017 )
...
during setup we only need to compile `_eval_step().numpy()`
2025-04-23 20:45:38 -04:00
nimlgen
7f53e80db9
hotfix: amd mmio on mi300 ( #10016 )
...
* hotfix: amd mmio on mi300
* fix
* ops
2025-04-24 01:08:18 +03:00
nimlgen
1c5e353249
am: use mmio iface ( #10012 )
...
* am: use mmio iface
* linters
* fixes
* fixes + cleanups
* mute
* mypy
* style
2025-04-24 00:27:04 +03:00
chenyu
65faa1d94b
explicit device in mlperf scripts ( #10015 )
2025-04-23 17:11:52 -04:00
chenyu
a3f938dbee
remove retinanet INITMLPERF from beam script ( #10011 )
...
it only controls logging; whether real data is loaded is controlled solely by RUNMLPERF
2025-04-23 14:32:54 -04:00
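A minimal sketch of the split that this entry and the retinanet entry above describe, using the `getenv` helper from `tinygrad.helpers`; the branch bodies are placeholders rather than the scripts' actual logic: RUNMLPERF decides whether real data is loaded and postprocessing runs, while INITMLPERF only affects logging.
```python
# Sketch only: the real mlperf scripts are more involved.
from tinygrad.helpers import getenv  # reads an integer env var, default 0

RUNMLPERF, INITMLPERF = getenv("RUNMLPERF"), getenv("INITMLPERF")

if RUNMLPERF:
    pass  # load real data, run eval, and call postprocess_detections
else:
    pass  # setup/BEAM path: only compile _eval_step().numpy() on dummy data

if INITMLPERF:
    pass  # extra logging only; it no longer gates data loading
```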
nimlgen
cc52b9c528
am: add entry() to PT ( #10010 )
2025-04-23 20:45:52 +03:00