George Hotz
be53ef4f0a
rename DEFINE_ACC -> DEFINE_REG ( #11006 )
...
* rename DEFINE_ACC -> DEFINE_REG
* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
George Hotz
b4eb876d5a
kernel.py no longer permutes reduce axis [pr] ( #10968 )
...
* kernel.py no longer permutes reduce axis [pr]
* delete tests that handcode uops
* regen of sops is broken...
* put import back
* just remove that
* disable those tests
2025-06-26 17:44:58 -07:00
George Hotz
856759c79c
add halide example ( #10980 )
...
* add halide example
* upd halide gemm
* partial works
* touchups
2025-06-26 16:14:57 -07:00
qazal
1127302c46
move perfetto to extra ( #10994 )
...
* move perfetto to extra
* update TestViz and fix tests
* remove perfetto.html from viz directory
* work
* mypy
2025-06-27 01:53:54 +03:00
qazal
712980e167
fix extract_dataset + add tests to CI ( #10995 )
...
* fix extract_dataset + tests
* add CI
* sops.gz itself is same as master
* yml + gzip -c + ge
* don't commit that
* bump limit to 1000
* axis=7
* test_tiny
2025-06-27 01:51:36 +03:00
geohotstan
50936b4a18
ONNX real float16 ( #10694 )
...
* squash commits
* temp fix for const tensor
* actually realizing float16 can only happen in raw_data
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
chenyu
49bba2f0a0
improve test_nll_loss ( #10986 )
...
build target and weight tensors outside so it tests backward too.
2025-06-26 02:46:55 -04:00
nimlgen
1c45b9f7fb
start nvpci ( #10521 )
...
* start nvpci
* talk to fsp
* boot args
* riscv core booted
* q
* agen
* got gsp init msg
* some fixes
* set registry, stuck aft lockdown(
* start ga/ad port
* gsp init on ada
* more classes allocated
* more
* mm
* fixes and progress
* no huge pages for now
* mm seems working, but switch to 512mb page for simplicity
* working state
* not cleaned
* cleaned
* nvd=1
* start gr ctx
* compute
* clean 1
* cleanup 2
* cleanup 3
* cleaner 4
* cleaner 6
* add iface to nv
* save before reboot
* merged into NV
* moveout mm
* post merge
* cleaner 7
* merge and rebase
* pciiface abstraction + reset
* download fw from web
* print logs
* minor changes + p2p
* cleaner 8
* cleaner 9
* cleaner 10
* delete
* delete this as well
* linter 1
* oops
* priv_client -> priv_root
* fix mypy
* mypy?
* mypy?
* small changes
* shorter
* ops
* remove this
* do not allocate paddr for reserve
* nodiff
* unified script
* ops
* dif ver
* add lock
* setup
2025-06-25 00:37:34 +03:00
chenyu
ffb032e31d
test_diagonal touchup ( #10962 )
2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632
Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops ( #10945 )
...
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend
* cleanup
* generic fix
* tests
* cmp with diagonal too
* oops
* move tests
* fix test
* remove unnecessary import
* fix assert
* compare against numpy
---------
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
chenyu
18e264a449
Tensor.logsigmoid ( #10955 )
2025-06-24 11:16:14 -04:00
George Hotz
e15754db28
remove (some) kernelize from llama and test schedule speed ( #10939 )
...
* remove kernelize from llama
* 405B
* space
2025-06-23 15:07:31 -07:00
alpharush
22f9696522
Fix/hcqfuzz harness bug ( #10923 )
...
* update command so extra module is found
* fix empty range in randrange errors
* lint
2025-06-23 11:22:30 +03:00
geohotstan
4ab7d792cc
ONNX improve dtype fallback ( #10800 )
...
* fix
* add early verbose demo test
* is this how to write tests :s
* is definition drift even a thing? gemini says it is
* clean up
* better
* even better
* try add to CI
* doesn't work quite yet
* much more work to be done
* whoops
* partition the test heh
* skipif
* some nits for better names
* add webgpu test for onnxrunner
* fix reference links
* flush for now
2025-06-21 19:29:45 -04:00
George Hotz
92678e59ee
move kernel to opt ( #10899 )
2025-06-20 15:22:28 -07:00
chenyu
3f29c7edda
minor onnx dropout cleanup ( #10891 )
...
we should consider removing numpy random and test it similar to test_randomness, unless how seed works is part of spec?
2025-06-20 10:18:34 -04:00
qazal
000eb30f04
viz: remove prev profiler file ( #10888 )
...
The new profiler is integrated in the main VIZ tab.
Will also delete perfetto.html after matching [final features](https://github.com/tinygrad/tinygrad/pull/10763#issuecomment-2980543715) soon.
2025-06-19 23:05:46 +03:00
chenyu
7d5c769c6b
fix compile4 ( #10797 )
2025-06-12 22:28:56 -04:00
geohotstan
806b68c2b3
Add fallback dtype to ONNX ( #10788 )
...
* start
* still need the float16 workaround in
* tiny nit for correctness
* idk hacks, I need to understand this device stuff better
* no-op?
* remove that assert for true nooooooop
* add fallback_context
2025-06-12 20:39:21 -04:00
chenyu
5e7ad70aae
don't run linearize().uop tests in get_action_space test ( #10766 )
...
* don't run linearize().uop tests in get_action_space test
this part takes 2 minutes in CI and has nothing to do with action space. also not sure if the "for some reason" comment is still relevant
* -n=auto test/models
2025-06-10 17:23:53 -04:00
nimlgen
800d1796d5
am_smi: kill process group ( #10750 )
2025-06-10 15:23:39 +03:00
b1tg
24d328e313
onnx parser ( #10435 )
...
* onnx parser
* fix compile, lint
* onnx.load -> onnx_load
* compatible with ModelProto
* fix test external_test_onnx_ops.py
* fix tests
* fix signed int
* reduce to 261 lines
* fix TypeProto.Optional
* debug for _parse_message, add TypeProto.Sequence, cleanup
* onnx_load from Tensor
* remove BufferedReader
* 174 lines and reduce tensor copy
* cleanup
* use onnx_load in external_model_benchmark.py
* fix qcom test
* [onnx] parser support external data
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
32e9949052
rename lazydata to uop ( #10698 )
2025-06-08 08:42:22 -07:00
George Hotz
3ece2e4bb5
hotfix: remove accel from extra
2025-06-08 08:20:34 -07:00
geohotstan
dedff0e96c
fix run huggingface onnx debug ( #10679 )
2025-06-08 00:59:20 -04:00
nimlgen
85cea23557
nv: original bw qmd ( #10672 )
...
* nv: original bw qmd
* forgot
2025-06-07 01:43:22 +03:00
Sidharth N. Babu
ef14dfb277
compile fixes ( #10442 )
2025-06-06 18:38:37 -04:00
chenyu
4a6d84c4c3
hotfix llama start_pos vmax is max_context-1 ( #10659 )
...
* hotfix llama start_pos vmax is max_context-1
fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`
* hotfix: multitensor transformer test tests kv cache
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
Xingyu
7a1bfb668d
Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend ( #10612 )
...
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend
* Add unit test for linalg.eigh function in TestTorchBackend
This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
2025-06-04 07:59:50 -04:00
nimlgen
883bb4541c
am: reserve address space ( #10564 )
...
* am: reserve address space
* f
* cc
* errno
* fix
* always has cpu mapping
2025-05-30 19:31:03 +03:00
qazal
5b59728c75
refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) ( #10541 )
...
* changes to core tinygrad
* fixups pt1
TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule
* more tests
* green now
* images stay images
2025-05-30 14:27:58 +03:00
George Hotz
b3b43a82c4
remove Tensor.no_grad, it's meaningless now [pr] ( #10556 )
2025-05-28 22:20:02 -07:00
George Hotz
871df1436a
more beautiful cifar ( #10551 )
...
* enumerate cases of Tensors in the JIT
* optional fused optimizers
* add fused optimizer test
* move that there
* ugh
* work on beautiful_cifar
* speed close to hlb_cifar
* schedule to corealize all
* one line sched step
* less lines
2025-05-28 20:48:20 -07:00
nimlgen
d1d9e729fd
am_smi: mem usage ( #10547 )
2025-05-28 16:53:31 +03:00
chenyu
76eb130d8c
hotfix: BenchEvent MLPERF_RUN is mlperf_run ( #10526 )
2025-05-26 20:19:37 -04:00
geohotstan
602a145f8f
Add Tensor.unfold ( #10518 )
...
* yoinked 10272
* eitanturok's fixes
* hmmm should size be sint?
* add test
2025-05-26 11:15:44 -04:00
nimlgen
deb369417c
am_smi: print device usage ( #10520 )
...
* am_smi: print device usage
* tiny comments
2025-05-26 17:17:56 +03:00
geohotstan
fd9f236a82
move test over ( #10508 )
2025-05-25 21:51:51 -04:00
George Hotz
941cbd3471
hotfix: amd works on arch linux w/o rocm
2025-05-24 16:47:13 -07:00
nimlgen
d90ddcc365
nv: blackwell support ( #10487 )
...
* nv: blackwell support
* fixes
* hm
* h
* fixes
* mypy
* xx
* yy
* arr
* revert
* oops
* unrelated
2025-05-24 18:23:53 +03:00
chenyu
dc6309242d
WallTimeEvent for mlperf ci ( #10506 )
2025-05-24 10:56:03 -04:00
Panagiotis Kourouklidis
e21836952d
mmapeak implementation for 7900 XTX ( #10417 )
...
* Add mmapeak implementation for 7900 XTX
* Change indentation
* Use a template instead of multiple assembly files
* Fix output formatting
* Reduce register file bank conflicts
* More accurate measurement for quick instructions
* Add support for gfx1201
* RDNA4 wmma requires fewer VGPRs
* RDNA4 does not have s_cmpk instructions
* Add v_wmma_i32_16x16x32_iu4 for gfx1201
* Add sparse wmma instructions
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-23 16:26:12 -07:00
George Hotz
0a313d98a0
add rocm 6.4 support ( #10491 )
...
* add rocm 6.4 support
* update to newer amdcomgr, assert lang is right
* fix aux-triple
2025-05-23 16:20:54 -07:00
Xingyu
1e0a59aca4
fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend ( #10464 )
2025-05-22 10:54:13 -07:00
George Hotz
577a0b4cfa
openpilot compile4 (wip) ( #10407 )
...
* openpilot compile4
* add copies
* remove junk
2025-05-22 10:47:34 -07:00
qazal
7720c1aef1
hotfix: remove viz_sz.py [pr] ( #10446 )
2025-05-21 14:17:42 +03:00
qazal
df4cbb69e9
move fuzz_schedule.py to extra [pr] ( #10444 )
2025-05-21 10:07:24 +03:00
qazal
8a6fb37560
move viz /prof to extra [pr] ( #10401 )
2025-05-18 23:25:59 +03:00
George Hotz
411392dfb7
move files into uop dir ( #10399 )
...
* move files into uop dir [pr]
* tinygrad.uop is a thing
* fix uop docs, no pr
* fix viz
2025-05-18 11:38:28 -07:00
qazal
17f0f5e764
add v_rcp_f32_e64 to remu ( #10393 )
...
* tests from the box
* add v_rcp_f32_e64 to remu
* f32::from_bits utils
* v_cndmask_b32 tests
2025-05-18 17:08:21 +03:00