George Hotz
2893feb9f6
cleanups for kernel.py ( #11143 )
...
* cleanups for kernel.py
* fixups
2025-07-08 18:10:25 -07:00
chenyu
7ce9e45474
mypy onnx_parser ( #11141 )
2025-07-08 19:50:28 -04:00
chenyu
ffcc557986
lint onnx and onnx_parser ( #11134 )
2025-07-08 15:28:35 -04:00
qazal
3dfc0ff887
move cpu_profile and shared ProfileEvents from device.py to helpers [pr] ( #11126 )
...
* move cpu_profile and shared ProfileEvents to helpers [pr]
* TestProfiler.test_cpu_profile
* update test_viz.py
* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
nimlgen
71377cd233
nv: parse falcon app descs ( #11118 )
2025-07-07 18:14:14 +03:00
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend ( #11113 )
...
* clean tests, set full_matrices false
* add more shape asserts
2025-07-06 13:55:49 -04:00
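For context on the `full_matrices=False` change above: in the numpy/torch SVD convention, the reduced form returns thin factors keyed to `k = min(m, n)`. A minimal stdlib-only sketch of the expected shapes (an illustration of the convention, not tinygrad's test code):

```python
def svd_shapes(m: int, n: int, full_matrices: bool = False):
    # Shapes of (U, S, Vh) for the SVD of an (m, n) matrix, numpy/torch convention.
    # full_matrices=True:  U is (m, m), Vh is (n, n).
    # full_matrices=False: thin factors, U is (m, k), Vh is (k, n), k = min(m, n).
    k = min(m, n)
    if full_matrices: return ((m, m), (k,), (n, n))
    return ((m, k), (k,), (k, n))
```

Shape asserts like the ones added in this commit can compare against these tuples.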
chenyu
ba88ec3ad0
pipe linalg svd to torch ( #11109 )
...
and found a bug in svd
2025-07-06 08:37:25 -04:00
nimlgen
4dccb2ea49
am_smi: increase kill retries ( #11099 )
2025-07-05 16:23:50 +03:00
0xSG
17119b0f23
hip_ioctl: platform.machine added ( #11084 )
2025-07-04 17:20:24 +03:00
nimlgen
2d138c6cf1
am: factor out init_sw ( #11070 )
2025-07-03 11:01:17 +03:00
chenyu
425d5f55c4
generate kernel dataset and upload artifact ( #11063 )
2025-07-02 17:21:25 -04:00
chenyu
4626e9c172
is_numpy_ndarray helper [pr] ( #11050 )
2025-07-02 09:12:53 -04:00
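A common way such a helper detects numpy arrays without paying for `import numpy` is to inspect the type's module and name. A hedged sketch of that pattern (not necessarily tinygrad's exact code):

```python
def is_numpy_ndarray(x) -> bool:
    # Detect a numpy array without importing numpy: check the defining
    # module and class name of the value's type instead of isinstance.
    t = type(x)
    return t.__module__ == "numpy" and t.__name__ == "ndarray"
```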
chenyu
126fcf4129
clean up AMD_LLVM in tests ( #11021 )
2025-06-28 22:45:47 -04:00
chenyu
a6485d00c8
very tiny generate_dataset ( #11013 )
...
one minute to gen on my mac
2025-06-27 17:10:45 -04:00
George Hotz
be53ef4f0a
rename DEFINE_ACC -> DEFINE_REG ( #11006 )
...
* rename DEFINE_ACC -> DEFINE_REG
* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
George Hotz
b4eb876d5a
kernel.py no longer permutes reduce axis [pr] ( #10968 )
...
* kernel.py no longer permutes reduce axis [pr]
* delete tests that handcode uops
* regen of sops is broken...
* put import back
* just remove that
* disable those tests
2025-06-26 17:44:58 -07:00
George Hotz
856759c79c
add halide example ( #10980 )
...
* add halide example
* upd halide gemm
* partial works
* touchups
2025-06-26 16:14:57 -07:00
qazal
1127302c46
move perfetto to extra ( #10994 )
...
* move perfetto to extra
* update TestViz and fix tests
* remove perfetto.html from viz directory
* work
* mypy
2025-06-27 01:53:54 +03:00
qazal
712980e167
fix extract_dataset + add tests to CI ( #10995 )
...
* fix extract_dataset + tests
* add CI
* sops.gz itself is same as master
* yml + gzip -c + ge
* don't commit that
* bump limit to 1000
* axis=7
* test_tiny
2025-06-27 01:51:36 +03:00
geohotstan
50936b4a18
ONNX real float16 ( #10694 )
...
* squash commits
* temp fix for const tensor
* actually realizing float16 can only happen in raw_data
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
chenyu
49bba2f0a0
improve test_nll_loss ( #10986 )
...
build target and weight tensors outside so it tests backward too.
2025-06-26 02:46:55 -04:00
nimlgen
1c45b9f7fb
start nvpci ( #10521 )
...
* start nvpci
* talk to fsp
* boot args
* riscv core booted

* q
* agen
* got gsp init msg
* some fixes
* set registry, stuck after lockdown(
* start ga/ad port
* gsp init on ada
* more classes allocated
* more
* mm
* fixes and progress
* no huge pages for now
* mm seems working, but switch to 512MB page for simplicity
* working state
* not cleaned
* cleaned
* nvd=1
* start gr ctx
* compute
* clean 1
* cleanup 2
* cleanup 3
* cleaner 4
* cleaner 6
* add iface to nv
* save before reboot
* merged into NV
* moveout mm
* post merge
* cleaner 7
* merge and rebase
* pciiface abstraction + reset
* download fw from web
* print logs
* minor changes + p2p
* cleaner 8
* cleaner 9
* cleaner 10
* delete
* delete this as well
* linter 1
* oops
* priv_client -> priv_root
* fix mypy
* mypy?
* mypy?
* small changes
* shorter
* ops
* remove this
* do not allocate paddr for reserve
* nodiff
* unified script
* ops
* dif ver
* add lock
* setup
2025-06-25 00:37:34 +03:00
chenyu
ffb032e31d
test_diagonal touchup ( #10962 )
2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632
Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops ( #10945 )
...
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend
* cleanup
* generic fix
* tests
* cmp with diagonal too
* oops
* move tests
* fix test
* remove unnecessary import
* fix assert
* compare against numpy
---------
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
chenyu
18e264a449
Tensor.logsigmoid ( #10955 )
2025-06-24 11:16:14 -04:00
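`Tensor.logsigmoid` computes `log(sigmoid(x))`, which is numerically stable when written as `-softplus(-x)` rather than taking the log of a computed sigmoid. A plain-Python sketch of the math (an illustration, not tinygrad's implementation):

```python
import math

def logsigmoid(x: float) -> float:
    # log(sigmoid(x)) = -log(1 + exp(-x)).
    # Computing sigmoid(x) first underflows to 0 for very negative x, making
    # the log blow up; branching on the sign keeps exp() arguments <= 0 and
    # log1p keeps precision near zero.
    if x >= 0: return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))
```

For large negative `x` this correctly approaches `x`, and for large positive `x` it approaches 0 from below.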
George Hotz
e15754db28
remove (some) kernelize from llama and test schedule speed ( #10939 )
...
* remove kernelize from llama
* 405B
* space
2025-06-23 15:07:31 -07:00
alpharush
22f9696522
Fix/hcqfuzz harness bug ( #10923 )
...
* update command so extra module is found
* fix empty range in randrange errors
* lint
2025-06-23 11:22:30 +03:00
geohotstan
4ab7d792cc
ONNX improve dtype fallback ( #10800 )
...
* fix
* add early verbose demo test
* is this how to write tests :s
* is definition drift even a thing? gemini says it is
* clean up
* better
* even better
* try add to CI
* doesn't work quite yet
* much more work to be done
* whoops
* partition the test heh
* skipif
* some nits for better names
* add webgpu test for onnxrunner
* fix reference links
* flush for now
2025-06-21 19:29:45 -04:00
George Hotz
92678e59ee
move kernel to opt ( #10899 )
2025-06-20 15:22:28 -07:00
chenyu
3f29c7edda
minor onnx dropout cleanup ( #10891 )
...
we should consider removing numpy random and testing it similarly to test_randomness, unless how the seed works is part of the spec?
2025-06-20 10:18:34 -04:00
qazal
000eb30f04
viz: remove prev profiler file ( #10888 )
...
The new profiler is integrated in the main VIZ tab.
Will also delete perfetto.html after matching [final features](https://github.com/tinygrad/tinygrad/pull/10763#issuecomment-2980543715) soon.
2025-06-19 23:05:46 +03:00
chenyu
7d5c769c6b
fix compile4 ( #10797 )
2025-06-12 22:28:56 -04:00
geohotstan
806b68c2b3
Add fallback dtype to ONNX ( #10788 )
...
* start
* still need the float16 workaround in
* tiny nit for correctness
* idk hacks, I need to understand this device stuff better
* no-op?
* remove that assert for true nooooooop
* add fallback_context
2025-06-12 20:39:21 -04:00
chenyu
5e7ad70aae
don't run linearize().uop tests in get_action_space test ( #10766 )
...
* don't run linearize().uop tests in get_action_space test
this part takes 2 minutes in CI and has nothing to do with action space. also not sure if the "for some reason" comment is still relevant
* -n=auto test/models
2025-06-10 17:23:53 -04:00
nimlgen
800d1796d5
am_smi: kill process group ( #10750 )
2025-06-10 15:23:39 +03:00
b1tg
24d328e313
onnx parser ( #10435 )
...
* onnx parser
* fix compile, lint
* onnx.load -> onnx_load
* compatible with ModelProto
* fix test external_test_onnx_ops.py
* fix tests
* fix signed int
* reduce to 261 lines
* fix TypeProto.Optional
* debug for _parse_message, add TypeProto.Sequence, cleanup
* onnx_load from Tensor
* remove BufferedReader
* 174 lines and reduce tensor copy
* cleanup
* use onnx_load in external_model_benchmark.py
* fix qcom test
* [onnx] parser support external data
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
32e9949052
rename lazydata to uop ( #10698 )
2025-06-08 08:42:22 -07:00
George Hotz
3ece2e4bb5
hotfix: remove accel from extra
2025-06-08 08:20:34 -07:00
geohotstan
dedff0e96c
fix run huggingface onnx debug ( #10679 )
2025-06-08 00:59:20 -04:00
nimlgen
85cea23557
nv: original bw qmd ( #10672 )
...
* nv: original bw qmd
* forgot
2025-06-07 01:43:22 +03:00
Sidharth N. Babu
ef14dfb277
compile fixes ( #10442 )
2025-06-06 18:38:37 -04:00
chenyu
4a6d84c4c3
hotfix llama start_pos vmax is max_context-1 ( #10659 )
...
* hotfix llama start_pos vmax is max_context-1
fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`
* hotfix: multitensor transformer test tests kv cache
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
Xingyu
7a1bfb668d
Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend ( #10612 )
...
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend
* Add unit test for linalg.eigh function in TestTorchBackend
This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
2025-06-04 07:59:50 -04:00
nimlgen
883bb4541c
am: reserve address space ( #10564 )
...
* am: reserve address space
* f
* cc
* errno
* fix
* always has cpu mapping
2025-05-30 19:31:03 +03:00
qazal
5b59728c75
refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) ( #10541 )
...
* changes to core tinygrad
* fixups pt1
TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule
* more tests
* green now
* images stay images
2025-05-30 14:27:58 +03:00
George Hotz
b3b43a82c4
remove Tensor.no_grad, it's meaningless now [pr] ( #10556 )
2025-05-28 22:20:02 -07:00
George Hotz
871df1436a
more beautiful cifar ( #10551 )
...
* enumerate cases of Tensors in the JIT
* optional fused optimizers
* add fused optimizer test
* move that there
* ugh
* work on beautiful_cifar
* speed close to hlb_cifar
* schedule to corealize all
* one line sched step
* less lines
2025-05-28 20:48:20 -07:00
nimlgen
d1d9e729fd
am_smi: mem usage ( #10547 )
2025-05-28 16:53:31 +03:00
chenyu
76eb130d8c
hotfix: BenchEvent MLPERF_RUN is mlperf_run ( #10526 )
2025-05-26 20:19:37 -04:00
geohotstan
602a145f8f
Add Tensor.unfold ( #10518 )
...
* yoinked 10272
* eitanturok's fixes
* hmmm should size be sint?
* add test
2025-05-26 11:15:44 -04:00
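Assuming `Tensor.unfold` follows the torch convention (an assumption based on the name), it extracts sliding windows of a given size, advancing by `step` along a dimension. A 1-D plain-Python sketch of those semantics:

```python
def unfold(xs: list, size: int, step: int = 1) -> list:
    # Sliding windows of length `size`, advancing by `step`; torch-style
    # count: (len(xs) - size) // step + 1 windows, each of length `size`.
    if size > len(xs): raise ValueError("size must not exceed input length")
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, step)]
```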