Commit Graph

1134 Commits

Author SHA1 Message Date
George Hotz
92678e59ee move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
chenyu
3f29c7edda minor onnx dropout cleanup (#10891)
we should consider removing numpy random and testing it similarly to test_randomness, unless how the seed works is part of the spec?
2025-06-20 10:18:34 -04:00
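A minimal sketch of the seed-free testing style suggested in the commit body: instead of pinning outputs to a specific numpy seed, assert statistical properties of the dropout mask. This is only an illustration of the idea, not tinygrad's test code.

```python
import numpy as np

# Assert a distributional property rather than seed-determined exact values:
# a dropout mask with drop probability p should keep roughly (1 - p) of elements.
rng = np.random.default_rng()
p = 0.5                                   # drop probability (illustrative)
mask = rng.random(10000) >= p             # True = kept element
keep_ratio = mask.mean()
assert abs(keep_ratio - (1 - p)) < 0.05   # ~50% kept, well within sampling noise
```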
qazal
000eb30f04 viz: remove prev profiler file (#10888)
The new profiler is integrated in the main VIZ tab.

Will also delete perfetto.html after matching [final features](https://github.com/tinygrad/tinygrad/pull/10763#issuecomment-2980543715) soon.
2025-06-19 23:05:46 +03:00
chenyu
7d5c769c6b fix compile4 (#10797) 2025-06-12 22:28:56 -04:00
geohotstan
806b68c2b3 Add fallback dtype to ONNX (#10788)
* start

* still need the float16 workaround in

* tiny nit for correctness

* idk hacks, I need to understand this device stuff better

* no-op?

* remove that assert for true nooooooop

* add fallback_context
2025-06-12 20:39:21 -04:00
chenyu
5e7ad70aae don't run linearize().uop tests in get_action_space test (#10766)
* don't run linearize().uop tests in get_action_space test

this part takes 2 minutes in CI and has nothing to do with action space. also not sure if the "for some reason" comment is still relevant

* -n=auto test/models
2025-06-10 17:23:53 -04:00
nimlgen
800d1796d5 am_smi: kill process group (#10750) 2025-06-10 15:23:39 +03:00
b1tg
24d328e313 onnx parser (#10435)
* onnx parser

* fix compile, lint

* onnx.load -> onnx_load

* compatible with ModelProto

* fix test external_test_onnx_ops.py

* fix tests

* fix signed int

* reduce to 261 lines

* fix TypeProto.Optional

* debug for _parse_message, add TypeProto.Sequence, cleanup

* onnx_load from Tensor

* remove BufferedReader

* 174 lines and reduce tensor copy

* cleanup

* use onnx_load in external_model_benchmark.py

* fix qcom test

* [onnx] parser support external data

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
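ONNX models are protobuf messages, so any hand-rolled parser like the one this PR adds has to start with base-128 varint decoding. Below is a generic varint decoder as a hedged sketch of that first step; it is not the code from the PR.

```python
def decode_varint(buf: bytes, pos: int = 0):
    """Decode one protobuf varint starting at `pos`; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift   # low 7 bits are payload, little-endian groups
        if not (b & 0x80):              # high bit clear marks the last byte
            return result, pos
        shift += 7

print(decode_varint(bytes([0x96, 0x01])))  # (150, 2), the classic protobuf example
```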
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
George Hotz
3ece2e4bb5 hotfix: remove accel from extra 2025-06-08 08:20:34 -07:00
geohotstan
dedff0e96c fix run huggingface onnx debug (#10679) 2025-06-08 00:59:20 -04:00
nimlgen
85cea23557 nv: original bw qmd (#10672)
* nv: original bw qmd

* forgot
2025-06-07 01:43:22 +03:00
Sidharth N. Babu
ef14dfb277 compile fixes (#10442) 2025-06-06 18:38:37 -04:00
chenyu
4a6d84c4c3 hotfix llama start_pos vmax is max_context-1 (#10659)
* hotfix llama start_pos vmax is max_context-1

fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`

* hotfix: multitensor transformer test tests kv cache

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
Xingyu
7a1bfb668d Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend (#10612)
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend

* Add unit test for linalg.eigh function in TestTorchBackend

This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
2025-06-04 07:59:50 -04:00
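A sketch of what the added unit test verifies, using numpy's reference `eigh` rather than the torch-backend implementation from the commit: for a symmetric matrix, the eigenvectors and eigenvalues reconstruct the original.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # symmetric 2x2 input
w, v = np.linalg.eigh(A)              # eigenvalues ascending, eigenvectors in columns
recon = v @ np.diag(w) @ v.T          # for symmetric A: A = V diag(w) V^T
assert np.allclose(w, [1.0, 3.0])
assert np.allclose(recon, A)
```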
nimlgen
883bb4541c am: reserve address space (#10564)
* am: reserve address space

* f

* cc

* errno

* fix

* always has cpu mapping
2025-05-30 19:31:03 +03:00
qazal
5b59728c75 refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541)
* changes to core tinygrad

* fixups pt1

TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule

* more tests

* green now

* images stay images
2025-05-30 14:27:58 +03:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
George Hotz
871df1436a more beautiful cifar (#10551)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh

* work on beautiful_cifar

* speed close to hlb_cifar

* schedule to corealize all

* one line sched step

* less lines
2025-05-28 20:48:20 -07:00
nimlgen
d1d9e729fd am_smi: mem usage (#10547) 2025-05-28 16:53:31 +03:00
chenyu
76eb130d8c hotfix: BenchEvent MLPERF_RUN is mlperf_run (#10526) 2025-05-26 20:19:37 -04:00
geohotstan
602a145f8f Add Tensor.unfold (#10518)
* yoinked 10272

* eitanturok's fixes

* hmmm should size be sint?

* add test
2025-05-26 11:15:44 -04:00
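`Tensor.unfold` follows torch-style sliding-window semantics. A hedged numpy illustration of that behavior (not the tinygrad code): windows of length `size` taken every `step` elements along a dimension.

```python
import numpy as np

def unfold(a: np.ndarray, dim: int, size: int, step: int) -> np.ndarray:
    n = (a.shape[dim] - size) // step + 1            # number of windows
    idx = np.arange(n)[:, None] * step + np.arange(size)[None, :]
    return np.take(a, idx, axis=dim)                 # windows become a new axis

print(unfold(np.arange(8), 0, 3, 2))
# [[0 1 2]
#  [2 3 4]
#  [4 5 6]]
```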
nimlgen
deb369417c am_smi: print device usage (#10520)
* am_smi: print device usage

* tiny comments
2025-05-26 17:17:56 +03:00
geohotstan
fd9f236a82 move test over (#10508) 2025-05-25 21:51:51 -04:00
George Hotz
941cbd3471 hotfix: amd works on arch linux w/o rocm 2025-05-24 16:47:13 -07:00
nimlgen
d90ddcc365 nv: blackwell support (#10487)
* nv: blackwell support

* fixes

* hm

* h

* fixes

* mypy

* xx

* yy

* arr

* revert

* oops

* unrelated
2025-05-24 18:23:53 +03:00
chenyu
dc6309242d WallTimeEvent for mlperf ci (#10506) 2025-05-24 10:56:03 -04:00
Panagiotis Kourouklidis
e21836952d mmapeak implementation for 7900 XTX (#10417)
* Add mmapeak implementation for 7900 XTX

* Change indentation

* Use a template instead of multiple assembly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires fewer VGPRs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-23 16:26:12 -07:00
George Hotz
0a313d98a0 add rocm 6.4 support (#10491)
* add rocm 6.4 support

* update to newer amdcomgr, assert lang is right

* fix aux-triple
2025-05-23 16:20:54 -07:00
Xingyu
1e0a59aca4 fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend (#10464) 2025-05-22 10:54:13 -07:00
George Hotz
577a0b4cfa openpilot compile4 (wip) (#10407)
* openpilot compile4

* add copies

* remove junk
2025-05-22 10:47:34 -07:00
qazal
7720c1aef1 hotfix: remove viz_sz.py [pr] (#10446) 2025-05-21 14:17:42 +03:00
qazal
df4cbb69e9 move fuzz_schedule.py to extra [pr] (#10444) 2025-05-21 10:07:24 +03:00
qazal
8a6fb37560 move viz /prof to extra [pr] (#10401) 2025-05-18 23:25:59 +03:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
qazal
17f0f5e764 add v_rcp_f32_e64 to remu (#10393)
* tests from the box

* add v_rcp_f32_e64 to remu

* f32::from_bits utils

* v_cndmask_b32 tests
2025-05-18 17:08:21 +03:00
Xingyu
286b0f4051 Add equal function implementation and corresponding test (#10351)
- Implemented a new function `equal` in the torch backend to compare two tensors for equality.
- Added unit tests for the `equal` function to verify its correctness with different tensor inputs.
2025-05-16 23:39:49 -07:00
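A minimal sketch of `torch.equal`-style semantics that the commit implements for the backend: true only when the shapes match exactly and every element agrees. numpy stands in here purely for illustration.

```python
import numpy as np

def equal(a: np.ndarray, b: np.ndarray) -> bool:
    # shape must match exactly before comparing elementwise
    return a.shape == b.shape and bool((a == b).all())

assert equal(np.array([1, 2]), np.array([1, 2]))
assert not equal(np.array([1, 2]), np.array([[1, 2]]))  # same data, different shape
```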
Ignacio Sica
a54fd745c3 simpler barrier match in remu (#10339)
* s_barrier

* remove s_barrier from syncs
2025-05-16 14:40:58 +03:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
Ignacio Sica
3c453e96a9 add ds_load_b96 and ds_store_b96 instructions (#10338) 2025-05-15 18:11:08 +03:00
qazal
be8202b293 add s_abs_i32 instruction to remu (#10334) 2025-05-15 16:47:58 +03:00
nimlgen
e00679dc92 am_smi: fix layout with sleep mode (#10300) 2025-05-14 15:44:42 +03:00
nimlgen
0788659d08 usbgpu: fast cold boot (#10260)
* usbgpu: fast cold boot

* cleaner

* assert

* xx

* compat

* fix

* fix
2025-05-14 14:58:55 +03:00
geohotstan
1c4ab6b991 ONNX add tests against ORT (#10270)
* start

* clean up

* indicate file location too
2025-05-13 04:03:52 -04:00
nimlgen
bb31cc4582 usbgpu: check hash in patcher (#10266) 2025-05-12 21:08:53 +03:00
George Hotz
8864ff894b hotfix: that repeat_kv belongs outside the if 2025-05-11 18:43:01 -07:00
George Hotz
98c84a711d min rectified flow example [pr] (#10252)
* work on minrf example

* more

* jit sample

* t is tensor not const

* fixes

* more convs

* fix dropout

* don't print

* 504

* big patch

* onehot

* touch

* use embeddings

* dumb uses final layer

* act

* non fl

* match

* tp

* 3

* of

* ppsz

* normal

* add adln

* no t

* weird transformer

* weird transformer

* contig

* actual speed fix

* dumb

* cb

* 0

* t is 0

* mort-t

* args

* dumb days are over

* readable

* contig

* no more t mask

* mask_t

* init to zero

* clean

* steps

* work

* tt

* t

* solid
2025-05-11 18:36:44 -07:00
qazal
9210280811 add v_fmac_f16 vop3 instruction to remu (#10247)
* fmac vop3

* from the box
2025-05-10 23:48:25 +03:00
nimlgen
116390083f nvme speed write example (#10230) 2025-05-09 14:20:01 +03:00
Xingyu
a21369d039 Enhance tensor random functions with dtype support (#10214)
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py

* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
2025-05-08 20:48:07 -04:00
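A hedged sketch of the behavior under test: random fills that land in a requested dtype. numpy stands in for the `aten.uniform_`/`aten.normal_` paths touched by the commit.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=(2, 3)).astype(np.float16)  # uniform in [0, 1)
n = rng.normal(0.0, 1.0, size=(2, 3)).astype(np.float32)   # standard normal
assert u.dtype == np.float16 and (0 <= u).all() and (u <= 1).all()
assert n.dtype == np.float32
```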