Commit Graph

1242 Commits

Author SHA1 Message Date
chenyu
49bba2f0a0 improve test_nll_loss (#10986)
build target and weight tensors outside so it tests backward too.
2025-06-26 02:46:55 -04:00
nimlgen
1c45b9f7fb start nvpci (#10521)
* start nvpci

* talk to fsp

* boot args

* riscv core booted

* q

* agen

* got gsp init msg

* some fixes

* set registry, stuck aft lockdown(

* start ga/ad port

* gsp init on ada

* more classes allocated

* more

* mm

* fixes and progress

* no huge pages for now

* mm seems working, but switch to 512mb page for simplicity

* working state

* not cleaned

* cleaned

* nvd=1

* start gr ctx

* compute

* clean 1

* cleanup 2

* cleanup 3

* cleaner 4

* cleaner 6

* add iface to nv

* save before reboot

* merged into NV

* moveout mm

* post merge

* cleaner 7

* merge and rebase

* pciiface abstraction + reset

* download fw from web

* print logs

* minor changes + p2p

* cleaner 8

* cleaner 9

* cleaner 10

* delete

* delete this as well

* linter 1

* oops

* priv_client -> priv_root

* fix mypy

* mypy?

* mypy?

* small changes

* shorter

* ops

* remove this

* do not allocate paddr for reserve

* nodiff

* unified script

* ops

* dif ver

* add lock

* setup
2025-06-25 00:37:34 +03:00
chenyu
ffb032e31d test_diagonal touchup (#10962) 2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632 Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945)
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend

* cleanup

* generic fix

* tests

* cmp with diagonal too

* oops

* move tests

* fix test

* remove unnecessary import

* fix assert

* compare against numpy

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
chenyu
18e264a449 Tensor.logsigmoid (#10955) 2025-06-24 11:16:14 -04:00
George Hotz
e15754db28 remove (some) kernelize from llama and test schedule speed (#10939)
* remove kernelize from llama

* 405B

* space
2025-06-23 15:07:31 -07:00
alpharush
22f9696522 Fix/hcqfuzz harness bug (#10923)
* update command so extra module is found

* fix empty range in randrange errors

* lint
2025-06-23 11:22:30 +03:00
geohotstan
4ab7d792cc ONNX improve dtype fallback (#10800)
* fix

* add early verbose demo test

* is this how to write tests :s

* is definition drift even a thing? gemini says it is

* clean up

* better

* even better

* try add to CI

* doesn't work quite yet

* much more work to be done

* whoops

* partition the test heh

* skipif

* some nits for better names

* add webgpu test for onnxrunner

* fix reference links

* flush for now
2025-06-21 19:29:45 -04:00
George Hotz
92678e59ee move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
chenyu
3f29c7edda minor onnx dropout cleanup (#10891)
we should consider removing numpy random and testing it like test_randomness, unless how the seed works is part of the spec?
2025-06-20 10:18:34 -04:00
qazal
000eb30f04 viz: remove prev profiler file (#10888)
The new profiler is integrated in the main VIZ tab.

Will also delete perfetto.html after matching [final features](https://github.com/tinygrad/tinygrad/pull/10763#issuecomment-2980543715) soon.
2025-06-19 23:05:46 +03:00
chenyu
7d5c769c6b fix compile4 (#10797) 2025-06-12 22:28:56 -04:00
geohotstan
806b68c2b3 Add fallback dtype to ONNX (#10788)
* start

* still need the float16 workaround in

* tiny nit for correctness

* idk hacks, I need to understand this device stuff better

* no-op?

* remove that assert for true nooooooop

* add fallback_context
2025-06-12 20:39:21 -04:00
chenyu
5e7ad70aae don't run linearize().uop tests in get_action_space test (#10766)
* don't run linearize().uop tests in get_action_space test

this part takes 2 minutes in CI and has nothing to do with action space. also not sure if the "for some reason" comment is still relevant

* -n=auto test/models
2025-06-10 17:23:53 -04:00
nimlgen
800d1796d5 am_smi: kill process group (#10750) 2025-06-10 15:23:39 +03:00
b1tg
24d328e313 onnx parser (#10435)
* onnx parser

* fix compile, lint

* onnx.load -> onnx_load

* compatible with ModelProto

* fix test external_test_onnx_ops.py

* fix tests

* fix signed int

* reduce to 261 lines

* fix TypeProto.Optional

* debug for _parse_message, add TypeProto.Sequence, cleanup

* onnx_load from Tensor

* remove BufferedReader

* 174 lines and reduce tensor copy

* cleanup

* use onnx_load in external_model_benchmark.py

* fix qcom test

* [onnx] parser support external data

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
George Hotz
3ece2e4bb5 hotfix: remove accel from extra 2025-06-08 08:20:34 -07:00
geohotstan
dedff0e96c fix run huggingface onnx debug (#10679) 2025-06-08 00:59:20 -04:00
nimlgen
85cea23557 nv: original bw qmd (#10672)
* nv: original bw qmd

* forgot
2025-06-07 01:43:22 +03:00
Sidharth N. Babu
ef14dfb277 compile fixes (#10442) 2025-06-06 18:38:37 -04:00
chenyu
4a6d84c4c3 hotfix llama start_pos vmax is max_context-1 (#10659)
* hotfix llama start_pos vmax is max_context-1

fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`

* hotfix: multitensor transformer test tests kv cache

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
Xingyu
7a1bfb668d Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend (#10612)
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend

* Add unit test for linalg.eigh function in TestTorchBackend

This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
2025-06-04 07:59:50 -04:00
nimlgen
883bb4541c am: reserve address space (#10564)
* am: reserve address space

* f

* cc

* errno

* fix

* always has cpu mapping
2025-05-30 19:31:03 +03:00
qazal
5b59728c75 refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541)
* changes to core tinygrad

* fixups pt1

TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule

* more tests

* green now

* images stay images
2025-05-30 14:27:58 +03:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
George Hotz
871df1436a more beautiful cifar (#10551)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh

* work on beautiful_cifar

* speed close to hlb_cifar

* schedule to corealize all

* one line sched step

* less lines
2025-05-28 20:48:20 -07:00
nimlgen
d1d9e729fd am_smi: mem usage (#10547) 2025-05-28 16:53:31 +03:00
chenyu
76eb130d8c hotfix: BenchEvent MLPERF_RUN is mlperf_run (#10526) 2025-05-26 20:19:37 -04:00
geohotstan
602a145f8f Add Tensor.unfold (#10518)
* yoinked 10272

* eitanturok's fixes

* hmmm should size be sint?

* add test
2025-05-26 11:15:44 -04:00
nimlgen
deb369417c am_smi: print device usage (#10520)
* am_smi: print device usage

* tiny comments
2025-05-26 17:17:56 +03:00
geohotstan
fd9f236a82 move test over (#10508) 2025-05-25 21:51:51 -04:00
George Hotz
941cbd3471 hotfix: amd works on arch linux w/o rocm 2025-05-24 16:47:13 -07:00
nimlgen
d90ddcc365 nv: blackwell support (#10487)
* nv: blackwell support

* fixes

* hm

* h

* fixes

* mypy

* xx

* yy

* arr

* revert

* oops

* unrelated
2025-05-24 18:23:53 +03:00
chenyu
dc6309242d WallTimeEvent for mlperf ci (#10506) 2025-05-24 10:56:03 -04:00
Panagiotis Kourouklidis
e21836952d mmapeak implementation for 7900 XTX (#10417)
* Add mmapeak implementation for 7900 XTX

* Change indentation

* Use a template instead of multiple assembly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires fewer VGPRs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-23 16:26:12 -07:00
George Hotz
0a313d98a0 add rocm 6.4 support (#10491)
* add rocm 6.4 support

* update to newer amdcomgr, assert lang is right

* fix aux-triple
2025-05-23 16:20:54 -07:00
Xingyu
1e0a59aca4 fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend (#10464) 2025-05-22 10:54:13 -07:00
George Hotz
577a0b4cfa openpilot compile4 (wip) (#10407)
* openpilot compile4

* add copies

* remove junk
2025-05-22 10:47:34 -07:00
qazal
7720c1aef1 hotfix: remove viz_sz.py [pr] (#10446) 2025-05-21 14:17:42 +03:00
qazal
df4cbb69e9 move fuzz_schedule.py to extra [pr] (#10444) 2025-05-21 10:07:24 +03:00
qazal
8a6fb37560 move viz /prof to extra [pr] (#10401) 2025-05-18 23:25:59 +03:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
qazal
17f0f5e764 add v_rcp_f32_e64 to remu (#10393)
* tests from the box

* add v_rcp_f32_e64 to remu

* f32::from_bits utils

* v_cndmask_b32 tests
2025-05-18 17:08:21 +03:00
Xingyu
286b0f4051 Add equal function implementation and corresponding test (#10351)
- Implemented a new function `equal` in the torch backend to compare two tensors for equality.
- Added unit tests for the `equal` function to verify its correctness with different tensor inputs.
2025-05-16 23:39:49 -07:00
Ignacio Sica
a54fd745c3 simpler barrier match in remu (#10339)
* s_barrier

* remove s_barrier from syncs
2025-05-16 14:40:58 +03:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
Ignacio Sica
3c453e96a9 add ds_load_b96 and ds_store_b96 instructions (#10338) 2025-05-15 18:11:08 +03:00
qazal
be8202b293 add s_abs_i32 instruction to remu (#10334) 2025-05-15 16:47:58 +03:00
nimlgen
e00679dc92 am_smi: fix layout with sleep mode (#10300) 2025-05-14 15:44:42 +03:00