8599 Commits

Author SHA1 Message Date
George Hotz
b341296304 hotfix: save sdxl ram 2025-04-27 12:09:45 -04:00
George Hotz
68c5f7ba80 load fast in sdxl (#10072)
* load fast in sdxl

* back to that with the ret

* no context
2025-04-27 11:58:51 -04:00
George Hotz
768eb94c3e disable debug for load_state_dict [pr] (#10070) 2025-04-27 11:11:56 -04:00
George Hotz
4b8ef6ce78 hotfix: sdxl corealize 2025-04-27 10:41:46 -04:00
George Hotz
b6d2effaf5 assign is contiguous (#10066)
* assign is contiguous

* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
George Hotz
1253819151 make beautiful indexing use a Variable (#10063)
* make beautiful indexing use a Variable

* stunning test

* better color

* training is broken

* fix tests

* fix variable indexing

* fix test

* no contiguous

* revert that

* revert that too

* indexing two bind

* skip for webgpu

* make not slow
2025-04-27 08:22:38 -04:00
Rory Clear
a13a43c4fe yolo 416 to 640 res (#10047) 2025-04-26 20:45:58 -04:00
chenyu
4c1ce1a299 don't simplify if div folding resulted in negative numerator (#10064)
* don't simplify if div folding resulted in negative numerator

* test
2025-04-26 17:01:18 -04:00
George Hotz
1805403821 fix rand arange folding (#10060)
* test rand range

* --amend

* fix rand arange folding

* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
qazal
d13c100981 don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]

* 1 can exist with n

* put process_replay warn last

* assert shape is the same

* bring that back
2025-04-26 23:24:30 +08:00
George Hotz
c80fe6d5fc handle some fancier reduces (#10057)
* reduce_unparented

* handle fancier reduces

* fold more

* bugfix
2025-04-26 11:20:15 -04:00
nimlgen
e08270c1ba nv: fix program init for no-args kernels (#10058) 2025-04-26 18:08:53 +03:00
George Hotz
11113c9d07 reduce_unparented (#10056) 2025-04-26 09:48:16 -04:00
George Hotz
ea5dddc537 reduce collapse generic (#10045)
* reduce collapse generic

* new arange folder

* new range folding

* correct with sym

* all tests pass

* indexing ops passes

* failing tests

* fix tests, remove unused

* revert that

* torch indexing is fast

* skip on webgpu

* touchups

* comments
2025-04-26 09:13:24 -04:00
quortus
5cdc96409e Update outdated renderer.render calls (#10044) 2025-04-26 07:35:19 -04:00
nimlgen
e055b9422f am: fix mmap failures (#10054) 2025-04-26 14:21:28 +03:00
qazal
e1d2b64e92 remu new instructions (#10050)
* remu new instructions

* test_ds_store_half

* test_v_mul_f16
2025-04-26 02:04:12 +03:00
qazal
bba5d0a3e4 remu refactors (#10028)
* remu refactors

* scc is sgpr 253

* remove that

* rename to vcc_lo

* run cargo test in CI

* llvm-mc

* meh

* work

* work_group work 1

* seeded_lanes is dumb

* better than seeded_lanes

* does not need to be address

* 128 sgpr per wave

* scc is sgpr, we don't know which one

* null_src once more

* derive clone, wave init is cleaner

* init comes first
2025-04-26 04:31:10 +08:00
nimlgen
0fc85a2b0a hcqfuzz: init (#10049)
* hcqfuzz: init

* fix fuzz

* linter

* graph

* that test

* update readme
2025-04-25 23:19:21 +03:00
qazal
b30050e287 fix amdgpu_disassemble on mac [pr] (#10042) 2025-04-25 15:23:11 +08:00
George Hotz
a197aa4ef3 upat reduce syntax [pr] (#10040)
* upat reduce syntax [pr]

* switch z3 to graph_rewrite
2025-04-24 22:05:28 -04:00
Ignacio Sica
76a86735c0 hotfix amd bf16 is supported case (#10039)
* hotfix amd and amd_llvm

* bf16 not supported in ci

* hotfix amd_llvm is not a device

* remove default

* dont gate on ci and amd_llvm

* minor cleanup

* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe fix helper_tc_allclose (#9606)
* fix helper_tc_allclose

* cleanup

* hotfix

* cleanup

* cleanup

* check real buffer and add cast for bf16

* cleanup

* fix padded for ops_python

* avoid assert on amd emulated tc

* swap dimensions

* revert, should have nothing to do with padded

* revert fix, should not go in this pr

* remove skip
2025-04-24 18:36:40 -03:00
Rory Clear
3a189fa561 More yolo processing in tinygrad (#9928)
* more tg less np

* update webgpu html for new compile

* resize boxes

* remove text

* add back note

* fix indentation

* fix indentation

* remove magic num

* remove now unused funcs

* back to numpy nms

* no loop

* fix iou suppression

* update test

* dont suppress other classes

* add working scale

* fix expected value, rounded up 0.24 was being counted

* add postprocess bool for onnx test

* fix indents

* clean

* clean

* fix indent

* remove print

* fix indent

* remove unused import

* remove hardcoded 0.25

* space

* spacing

* clean label_predictions func

* remove single item lists

* space

* use postprocess output in test

* space

* clean

* clean

* remove redundant threshold

* remove redundant threshold

* clean

* rename var

* move loop into func

* unhardcode iou_threshold

* remove unused values

* clean

* add note

* clean

* keep const

* move back funcs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
chenyu
74c6cf8be3 lint mlperf model_train (#10038) 2025-04-24 16:19:44 -04:00
Ignacio Sica
51ca19d061 set test_tensor_cores_padded_amd to expectedFailure (#10036)
* init

* add expected failure to correctly track progress

* hotfix

* skip for amd_llvm as well

* add skip

* add pr number

* move comment to amd test

* change reason
2025-04-24 17:11:40 -03:00
b1tg
914d89fa0b fix tensor cores for gfx1201 (#9838)
* fix tensor cores for gfx1201

* fix typo

* fix python wmma

* AMDLLVMRenderer with arch + AMDLLVM tensor_cores

* fix ci

* clean up

* more tensor cores for RDNA4

* fix half/half, bfloat16/float, bfloat16/bfloat16 for amd_llvm

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:57:41 -04:00
uuuvn
779aa1e2e9 Enable image tests on cloud if clouddev supports image (#9903)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:30:12 -04:00
uuuvn
29a12b19ea Add macos CLOUD tests (#10033)
A lot more work is required to enable all of them and move them into the
osxtests matrix; for now I created a separate runner for them (copied from WebGPU).

test/test_graph.py will be added to those tests in #9876
2025-04-24 14:14:13 -04:00
Nishant Rajadhyaksha
55942a8d8e [Bounty] moved index_tensor off cpu in torch_backend (#9916)
* moved index tensor off cpu in torch_backend

* added support for None based indexing

* fix_to_pass_tests

* fix segfault tests
2025-04-24 14:12:37 -04:00
Ignacio Sica
373ca59b7f use is_dtype_supported to check dtype support in tc tests (#10035) 2025-04-24 14:59:14 -03:00
Ignacio Sica
93a1e9eeb9 improve bf16 case for is_dtype_supported [pr] (#10034)
* fix is_dtype_supported for bf16

* hotfix

* add llvm and amd_llvm

* gate on machine

* separate gpu vs cpu cases

* add arm case
2025-04-24 14:03:57 -03:00
uuuvn
754d789f51 Fix and enable jit tests on CLOUD (#10031) 2025-04-24 18:39:31 +03:00
qazal
0b482fb824 add RDNA3 parser to remu (#10025)
* llvm ref

* work

* all of them

* salu

* cleaner

* start

* vector ops

* done

* replace SMEM

* vopd

* sop1

* SOPC

* null stays null_src

* sopp

* SOPK

* sop2

* vop1

* vop2

* remove allow(dead_code)

* vopc
2025-04-24 21:34:07 +08:00
uuuvn
0d903c9495 Print clouddev instead of clouddev's renderer (#10023)
This is arguably a bug: currently, with DEBUG>=1, the log says
"remote has device" followed by an array of renderer props instead of a real
device name, which doesn't make sense:

```
127.0.0.1 - - [24/Apr/2025 16:50:44] "GET /properties HTTP/1.1" 200 -
remote has device ['tinygrad.renderer.cstyle', 'MetalRenderer', []]
opened device CLOUD from pid:20210
```

Now it prints the name of the device behind the cloud:

```
127.0.0.1 - - [24/Apr/2025 16:56:29] "GET /properties HTTP/1.1" 200 -
remote has device METAL
opened device CLOUD from pid:20315
```
2025-04-24 09:32:08 -04:00
George Hotz
aec75f51ef fixup some slow CI tests [pr] (#10027)
* fixup some slow CI tests [pr]

* shrink test index
2025-04-24 09:20:49 -04:00
qazal
c990aac2b1 skip flaky test_transcribe_file1_OOB (#10026) 2025-04-24 21:09:43 +08:00
George Hotz
4e2ccfddc6 ci refactor to split AMD/NVIDIA [pr] (#10024)
* ci refactor to split AMD [pr]

* split

* split amd tests

* explicit 0
2025-04-24 08:59:54 -04:00
uuuvn
0c68e44d6f Cloud properties (#10021) 2025-04-24 08:17:01 -04:00
George Hotz
db00d88415 hotfix: handle bad z3 install like z3 import fail 2025-04-24 08:09:40 -04:00
Sieds Lykles
e75be6eafc [bounty] [pr] index validation with z3 (#9981)
* index validation with z3

* Change comment

* toposort -> toposort()

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
quortus
9e49721c47 CPUGraph support for clang (#10014)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 07:52:35 -04:00
Park Jun
c3ad7b2a84 create randperm and support pytorch backend (#10019) 2025-04-24 07:29:02 -04:00
Matthew Daiter
b545338e59 isin_Tensor_out added (#10018) 2025-04-24 07:26:51 -04:00
chenyu
a25abf55e3 retinanet only call postprocess_detections with RUNMLPERF (#10017)
during setup, only `_eval_step().numpy()` needs to be compiled
2025-04-23 20:45:38 -04:00
nimlgen
7f53e80db9 hotfix: amd mmio on mi300 (#10016)
* hotfix: amd mmio on mi300

* fix

* ops
2025-04-24 01:08:18 +03:00
nimlgen
1c5e353249 am: use mmio iface (#10012)
* am: use mmio iface

* linters

* fixes

* fixes + cleanups

* mute

* mypy

* style
2025-04-24 00:27:04 +03:00
chenyu
65faa1d94b explicit device in mlperf scripts (#10015) 2025-04-23 17:11:52 -04:00
chenyu
a3f938dbee remove retinanet INITMLPERF from beam script (#10011)
it only controls logging; whether real data is loaded is controlled solely by RUNMLPERF
2025-04-23 14:32:54 -04:00
nimlgen
cc52b9c528 am: add entry() to PT (#10010) 2025-04-23 20:45:52 +03:00