Commit Graph

1184 Commits

Author SHA1 Message Date
George Hotz
dfeee63d30 uop matmul work (#11388)
* uop matmul work

* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
2c70eaf18c fix load / barrier (#11386)
* fix load / barrier

* cleanups

* fix CI
2025-07-26 10:27:37 -07:00
George Hotz
466ab5a3f2 store/load not pass through index (#11381)
* noop

* fix noop

* store cat is NOOP

* store dtype is void

* stores aren't passed through anymore

* meh, skip those for ptx

* correct ptx skip

* hl runs
2025-07-25 21:01:47 -07:00
chenyu
3d68feb67d minor onnx Gather cleanup (#11375)
removed a type ignore and one error code skip
2025-07-25 21:08:08 -04:00
George Hotz
490a93902c define reg doesn't have init anymore (#11365)
* define reg doesn't have init anymore

* remove that

* no special logic for dr

* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
0602b22086 kernel spec (#11359)
* kernel spec

* ops.VIEW

* work
2025-07-24 12:45:38 -07:00
George Hotz
b0dc97d1f7 write out kernel 3 in uops (#11352)
* write out kernel 3 in uops

* matmul is correct

* gemm passes spec

* bugfix to match speed

* cleanups
2025-07-23 17:32:38 -07:00
chenyu
86e7504111 mypy check extra/onnx.py (#11348)
instead of running test with 3.10, add onnx to mypy which would have caught StrEnum regression. Several type annotation failed mypy now that does not affect running the code and were skipped for now
2025-07-23 12:42:59 -04:00
chenyu
960da9319d Remove StrEnum in onnx for python 3.10 (#11345)
some training tests failed looks like parsing error?
2025-07-23 11:52:25 -04:00
George Hotz
108aac8af4 use AddrSpace instead of local (#11314)
* use AddrSpace instead of local

* addrspace in test
2025-07-21 14:00:06 -07:00
geohotstan
445ff8de56 ONNX onnx_parser and buffer_parse clean up (#11000)
* start

* remove onnx.load from compile4 and move np to dropout

* clean up and enable test

* clean up

* move WebGPU ONNX test into MacOS (WebGPU)

* leave test in ONNX (CPU)

* fix raw_data init None, and simplify onnx_runner test a little?

* THESE TESTS ARE SO UGLY UGHH

* need to really think about how to structure the test

* wow LLMs are quite something

* not always on disk now

* also add external data loading test

* cleaner tests

* minimize diff and add const folding tests

* add external data loading too

* whoops add webgpu back.. but why was it not needed in the first place?

* better comment

* move webgpu test to macos(webgpu)?

* llm english so much better than me wow

* trigger CI to check flakiness

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-21 15:10:25 -04:00
George Hotz
842184a1ab rename kernelize to schedule, try 2 (#11305) 2025-07-21 11:18:36 -07:00
वेदांत
e368628736 Add amin support to Tensor operations in Torch backend (#11290)
* intiger div mod fix

* Revert "intiger div mod fix"

This reverts commit d5d2f201bf.

* feat arg_min support

* tets update

* test fix
2025-07-21 09:14:08 -04:00
nimlgen
cc3c1e4c14 hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
nimlgen
2f72be5055 nv_smi: init basic insmod/rmmod/reset cmds (#11282) 2025-07-19 15:43:03 +03:00
qazal
577e581943 fix typo in sqtt/readme (#11281) 2025-07-19 15:10:24 +03:00
geohotstan
536b254df4 Bump onnx to 1.18.0 (#11266)
* bump

* thou hast implement functions

* hacked in domain support

* some clean ups

* hack quantize_onnx_test too

* add helper lol, why onnx tests why

* better dispatcher, but need tests and better naming

* flaky ci

* change some names

* small clean ups

* make it easier to clean up tests once ORT supports 1.18.0

* nits

* fix bug of Softmax_1 being registered in onnx_ops

* need a default value

* resolve_const is better name

* fix OnnxRunner.to

* use proper domain names
2025-07-17 15:35:41 -04:00
chenyu
c8e5c4d7c3 insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
chenyu
a0438012af remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
George Hotz
d67c8e7b42 local metal on metal in uop syntax (#11185)
* local metal on metal in uop syntax

* TODO: just put the axis_info in the kernelinfo

* local

* amd_matmul works @ 28 TFLOPS

* clean up matmul

* kernel8 works

* remove that

* locals

* axistype innovation

* work

* cleanup

* kernel3 regs

* cleanup kernel3

* work

* why is it broken

* no beam

* reenable

* permutes
2025-07-12 16:31:19 -07:00
geohotstan
5ce278b245 OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__

* modelproto_to_onnxrunner -> modelproto_to_runner

* whoops, fix import

* oh flakiness again, is it because it's getting gc-ed?

* small changes

* CI flaky so just move compile4 fix in

* copy typing of onnx_load

* actually can just import onnx_load instead of onnx.load

* fix external_benchmark_openpilot

* fix onnx_runner test to use onnx_helper

* rerun CI

* try run_modelproto

* spam CI a few times

* revert run_modelproto since that's flaky also

* no external onnx_load usage except onnx.py

* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?

* model_benchmark 193s -> 80s, add OnnxRunner.to()...

* minimize diff and clean up

* device can be None, weird but eh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
chenyu
6283d50224 DEPRECATED_linearize -> to_program [pr] (#11198) 2025-07-12 13:46:20 -04:00
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
chenyu
7ce9e45474 mypy onnx_parser (#11141) 2025-07-08 19:50:28 -04:00
chenyu
ffcc557986 lint onnx and onnx_parser (#11134) 2025-07-08 15:28:35 -04:00
qazal
3dfc0ff887 move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
nimlgen
71377cd233 nv: parse falcon app descs (#11118) 2025-07-07 18:14:14 +03:00
kevvz
b7af9cf849 clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false

* add more shape asserts
2025-07-06 13:55:49 -04:00
chenyu
ba88ec3ad0 pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
nimlgen
4dccb2ea49 am_smi: increase kill retries (#11099) 2025-07-05 16:23:50 +03:00
0xSG
17119b0f23 hip_ioctl: platform.machine added (#11084) 2025-07-04 17:20:24 +03:00
nimlgen
2d138c6cf1 am: factor out init_sw (#11070) 2025-07-03 11:01:17 +03:00
chenyu
425d5f55c4 generate kernel dataset and upload artifact (#11063) 2025-07-02 17:21:25 -04:00
chenyu
4626e9c172 is_numpy_ndarray helper [pr] (#11050) 2025-07-02 09:12:53 -04:00
chenyu
126fcf4129 clean up AMD_LLVM in tests (#11021) 2025-06-28 22:45:47 -04:00
chenyu
a6485d00c8 very tiny generate_dataset (#11013)
one minute to gen on my mac
2025-06-27 17:10:45 -04:00
George Hotz
be53ef4f0a rename DEFINE_ACC -> DEFINE_REG (#11006)
* rename DEFINE_ACC -> DEFINE_REG

* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
George Hotz
b4eb876d5a kernel.py no longer permutes reduce axis [pr] (#10968)
* kernel.py no longer permutes reduce axis [pr]

* delete tests that handcode uops

* regen of sops is broken...

* put import back

* just remove that

* disable those tests
2025-06-26 17:44:58 -07:00
George Hotz
856759c79c add halide example (#10980)
* add halide example

* upd halide gemm

* partial works

* touchups
2025-06-26 16:14:57 -07:00
qazal
1127302c46 move perfetto to extra (#10994)
* move perfetto to extra

* update TestViz and fix tests

* remove perfetto.html from viz directory

* work

* mypy
2025-06-27 01:53:54 +03:00
qazal
712980e167 fix extract_dataset + add tests to CI (#10995)
* fix extract_dataset + tests

* add CI

* sops.gz itself is same as master

* yml + gzip -c + ge

* don't commit that

* bump limit to 1000

* axis=7

* test_tiny
2025-06-27 01:51:36 +03:00
geohotstan
50936b4a18 ONNX real float16 (#10694)
* squash commits

* temp fix for const tensor

* actually realizing float16 can only happen in raw_data

* .float -> cast(float) to rerun CI

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
chenyu
49bba2f0a0 improve test_nll_loss (#10986)
build target and weight tensors outside so it tests backward too.
2025-06-26 02:46:55 -04:00
nimlgen
1c45b9f7fb start nvpci (#10521)
* start nvpci

* talk to fsp

* boot args

* riscv core bootted

* q

* agen

* got gsp init msg

* some fixes

* set registry, stuck aft lockdown(

* start ga/ad port

* gsp init on ada

* more classes allocated

* more

* mm

* fixes and progress

* no huge pages for now

* mm seems workin, but switch to 512mb page for simplicity

* working state

* not cleaned

* claned

* nvd=1

* start gr ctx

* compute

* clean 1

* cleanup 2

* cleanup 3

* cleaner 4

* cleaner 6

* add iface to nv

* save before reboot

* merged into NV

* moveout mm

* post merge

* cleaner 7

* merge and rebase

* pciiface abstraction + reset

* download fw from web

* print logs

* minor changes + p2p

* cleaner 8

* cleaner 9

* cleaner 10

* delete

* delete this as well

* linter 1

* oops

* priv_client -> priv_root

* fix mypy

* mypy?

* mypy?

* small changes

* shorter

* ops

* remove this

* do not allocate paddr for reserve

* nodiff

* unified script

* ops

* dif ver

* add lock

* setup
2025-06-25 00:37:34 +03:00
chenyu
ffb032e31d test_diagonal touchup (#10962) 2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632 Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945)
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend

* cleanup

* generic fix

* tests

* cmp with diagonal too

* oops

* move tests

* fix test

* remove unnecessary import

* fix assert

* compare against numpy

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
chenyu
18e264a449 Tensor.logsigmoid (#10955) 2025-06-24 11:16:14 -04:00
George Hotz
e15754db28 remove (some) kernelize from llama and test schedule speed (#10939)
* remove kernelize from llama

* 405B

* space
2025-06-23 15:07:31 -07:00
alpharush
22f9696522 Fix/hcqfuzz harnesss bug (#10923)
* update command so extra module is found

* fix empty range in randrange errors

* lint
2025-06-23 11:22:30 +03:00
geohotstan
4ab7d792cc ONNX improve dtype fallback (#10800)
* fix

* add early verbose demo test

* is this how to write tests :s

* is definition drift even a thing? gemini says it is

* clean up

* better

* even better

* try add to CI

* doesn't work quite yet

* much more work to be done

* whoops

* partition the test heh

* skipif

* some nits for better names

* add webgpu test for onnxrunner

* fix reference links

* flush for now
2025-06-21 19:29:45 -04:00