Commit Graph

1263 Commits

Author SHA1 Message Date
George Hotz
4a151e7533 make xcode signing happy, waiting for entitlement (#12712) 2025-10-16 10:20:34 +08:00
Daniel
d65bd669f8 update tiny torch backend hook (#12575)
* update the backend to fix torch deprecation warning

* use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients

* fix indentation

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-15 14:02:33 -04:00
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
nimlgen
aa81bde150 amd: usb4/thunderbolt on macs (#12641)
* tbgpu

* works

* cleaner

* this

* zero size

* h

* fix

* simpler

* prio over usb

* c

* not needed

* linter

* this way

* mappings

* mypy

* mypy

* mypy 2

* nn
2025-10-15 13:02:01 +08:00
wozeparrot
f228c03f9f fetch raid from cloud (#10799)
* feat: initial tinyfs device

* feat: don't allow compute on tinyfs device

* feat: tensor helpers to load and store

* feat: bufferview for tinyfs

* fix: keep copy sizes correct

* fix: recv large

* clean: unneeded

* feat: comment

* clean: unneeded

* clean: remove

* clean: remove

* feat: get request tag

* feat: rename to cloud

* feat: send request_id

* feat: start computing tree

* feat: compute store tree on this side

* feat: jank chunked load

* feat: more debugging

* feat: rename to just load and store

* feat: correct chunk count

* fix: fix load for < 1mb

* feat: comments

* feat: don't truncate on block devices

* feat: better way of testing block device

* feat: don't need to pad that much

* feat: connect to nodes directly on load

* feat: cache connections

* feat: don't hard code chunk size

* feat: close mmap when closing file handle

* feat: don't overwrite stuff on disk if storing from disk

* clean: debug print

* fix: close mmap

* feat: await workers

* feat: fast copy from tinyfs to disk

* feat: don't copy to device on last

* feat: use single socket per device

* feat: raid in tinyfs

* clean: remove import

* clean: type

* feat: maintain single event loop

* feat: lower worker count

* feat: use connection pool

* feat: fetch mapping in its own process

* fix: release lock

* feat: don't fetch if exists

* feat: req id only on stores

* feat: always fetch

* fix: rangeify

* feat: allow specifying raid root

* fix: dealloc buffer

* feat: start support non 0 offset

* clean: use cleaner

* feat: don't pass to threadpool

* clean: typing
2025-10-14 07:53:55 -07:00
George Hotz
fb61f3519f remove assign contiguous hack (#12659)
* remove assign contiguous hack

* remove bad contiguous usage in torch backend

* assign
2025-10-14 16:42:14 +08:00
qazal
cd6aeebfee sqtt: osx decoder installer (#12637) 2025-10-13 17:26:12 +08:00
nimlgen
89be3590aa amd: sqtt on gfx12 (#12564)
* amd: sqtt on gfx12

* cleaner

* thi

* and this

* ops

* ugh

* back

* rm this

* rm
2025-10-10 17:54:14 +08:00
wozeparrot
f12e2a75db feat: add thunderkittens (#12590) 2025-10-10 00:32:33 -07:00
nimlgen
1309cea247 rocprof parser in extra (#12569)
* rocprof parser

* viewer

* vw

* skip
2025-10-10 14:56:42 +08:00
chenyu
c8dfd10257 ShapeTracker.real_strides -> is_expanded [pr] (#12579)
only keep the used part
2025-10-09 22:52:45 -04:00
George Hotz
9b66c2b0b7 fix weekly commits table (i didn't know we linted extra) 2025-10-10 09:23:33 +08:00
George Hotz
658b96cbfb weekly commits table 2025-10-10 09:15:41 +08:00
nimlgen
a11b686c71 amd: sqtt for all gfx11 (#12546)
* amd: general sqtt for gfx11

* target

* ops

* no gfx12 here
2025-10-09 17:04:06 +08:00
chenyu
ae51bdd06a remove trivial use of RANGEIFY flag (#12550)
some tests need update still
2025-10-09 02:29:38 -04:00
George Hotz
2653147cb7 delete the lowerer (#12526) 2025-10-08 21:58:18 +08:00
chenyu
e701106a64 remove FUSE_ARANGE (#12511)
it was the default already
2025-10-08 04:54:07 -04:00
nimlgen
4a756a37d8 amd: support rocm7 (#12502)
* amd: support rocm7

* mock
2025-10-08 14:30:39 +08:00
George Hotz
514d2a0774 merge tagless reshapes (#12474)
* merge tagless reshapes

* cleanup
2025-10-07 13:57:58 +08:00
George Hotz
b4509fba31 thundermittens (#12471)
* thundermittens

* give device a type
2025-10-07 11:47:39 +08:00
George Hotz
0f25b4b289 move frontend dir to nn [pr] (#12470) 2025-10-07 10:42:22 +08:00
hooved
0f804c9a83 Stable Diffusion model init for mlperf (#12314)
* include clip pr diff

* updated unet and sd init

* dehardcode default device

* revert beam hang workaround

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-02 02:28:41 -04:00
qazal
e8c595c29e remu: add new instructions introduced in RANGEIFY (#12363)
* add v_mad_i64_i32 for test_output_padded_conv_transpose2d

* run amd test_ops

* skip test_masked_select
2025-09-30 12:36:29 +03:00
hooved
c2689c505e Clip model updates for Stable Diffusion mlperf training (#12313)
* stable diffusion mlperf clip changes

* add clip tests

* set gelu as attribute

* add more tests

* factor out GPUS

* rerun CI

* add imports to if blocks

* remove unneeded axis

* add clip tests to CI

* move clip tests

* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
Sieds Lykles
cc038b31b6 Shrink instead of reshape to unregister symbolic (#12241)
* Slice to unbind symbolic

* use vmax for now

* assert shape in reshape is valid

* update test_symbolic_ops to use shrink instead of reshape

* remove infer_with_bound_values for npw

* symbolic output doesnt have symbolic strides

* symbolic jit tests use shrink to unregister symbolic

* update test

* update more tests

* wrap vmax in int()

* only create a new st if the store is not an assigne

* unwrap st

* comments
2025-09-19 06:04:35 +02:00
qazal
a388d2cb1a remove PROFILE=1 option, it's just VIZ=1 [pr] (#12176)
* remove PROFILE=1 option, it's just VIZ=1 [pr]

* sqtt

* sqtt 2

* return last

* rename
2025-09-15 12:51:50 +03:00
hooved
e1fef895b1 don't hardcode weights path (#12171) 2025-09-15 00:33:47 -04:00
chenyu
12a910f1d2 update torch 2.8 (#12172)
support _reshape_alias. something is wrong with one case of unfold
2025-09-14 15:19:03 -04:00
chenyu
0e266f376c ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
nimlgen
fb96394ff5 auto-select available compilers (#12094)
* device: auto select compilers

* fix

* metal+opencl

* nv/cuda

* test without ptx

* ptx

* fix tests

* fix

* fix test

* rename

* test + cleaner

* xx

* ops

* better test

* win?

* um?

* types

* debug

* win??

* sep rung

* wtf?

* debug

* skip win

* revert this

* types
2025-09-10 19:52:01 +03:00
nimlgen
9182948951 remove llvm_bf16_cast (#12075) 2025-09-08 20:51:15 +03:00
nimlgen
10ac427aaa cpu threading (#11951)
* start cpu threading

* fix

* fix2

* fix

* hacks?

* threads

* minor

* no dsp

* dsp 2

* n

* more

* test

* xm

* cleaner

* readable

* f

* reorder

* when no threads

* rangeify

* typos

* not needed

* reapply

* remoev this

* linter

* fixed cpu count in ci

* fix

* fixes

* rm

* typo

* sort based on speed

* test if test works in ci

* Revert "test if test works in ci"

This reverts commit 1f05edb531.

* do not pad thread
2025-09-06 16:13:43 +03:00
Sieds Lykles
c6c16b2946 var_vals uses str for var (#12011)
* var_vals is str,int

* remove imports

* remove print

* fix test

* change var_vals in hcq

* update test_hcq

* fix multitensor _device_num var

* fix syminfer test

* shorten line

* p.vars stays list[Variable]

* shorten line

* vars is back to tuple[Variable, ...]

* change var_vals in extra

* change var_vals from shapetracker

* var_vals is str:int

* fix signature
2025-09-06 04:16:12 +02:00
George Hotz
38dcadf07b delete kernel.py (#12040)
* delete kernel.py

* delete that file

* rip and tear

* don't test search

* imports

* fix torch frontend

* not a part of regen
2025-09-05 15:52:07 -07:00
George Hotz
870f63d9cc add WARP axistype, fix postopt bugs (#12033)
* postopt is 83% match

* warp is bright CYAN

* beautiful mnist beam works

* fix shutdown bug
2025-09-05 10:36:55 -07:00
George Hotz
f8e2dd4dd1 investigate opts mismatches (#12020) 2025-09-05 07:40:29 -07:00
George Hotz
431666da74 POSTOPT=2 work (#12012)
* POSTOPT=2 work

* bugfixes

* add chain in one place

* tensor cores match

* better hcopt check

* match from old

* Change POSTOPT ContextVar value to 0

* we didn't need to check that
2025-09-04 16:55:56 -07:00
George Hotz
70ce29b630 test pyrender (#12005)
* test pyrender

* make them print

* switch to pyrendered
2025-09-04 11:48:40 -07:00
Sieds Lykles
572a3c15c6 Move Ops.SPECIAL arg to src (#11918)
* initial moving bound to src

* arg to src

* remove import

* fixup linearizer

* arg to src

* fix test_uop_graph

* fix more tests

* fix python renderer

* get const value from const uop

* ssimplify uop estimates

* fix webgpu locals

* fix old test

* gate Ops.SPECIAL in linearizer

* use ssimplify() for local/global_size

* remove toposort gate_parents_instead_of_self

* fix rendering in comment

* cleanup

* rename and add comments

* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
George Hotz
5cf42dc4db add Scheduler to replace Kernel with POSTOPT=2 (#11924)
* ** simple kernel to replace Kernel for postopt

* support old

* fix beam

* beaming

* beam on old

* bring tensor cores back

* raise

* postbeam

* test ops passes on mac

* skip that

* postopt default

* gate that

* fix tensor cores

* a few test fixes

* dsp fix

* tc fix

* loop

* support swap

* test_gemv

* fix beam for variable

* test opts from high level stuff

* range annoying

* compile slow

* metal slow

* better beam

* no POSTBEAM

* fix nolocals

* hc opt mostly works

* put that back

* lil

* some work

* fix that

* POSTOPT 2

* fix tests

* no postopt 2

* work

* back

* padded tensors cores

* shift_to

* postopt 0 passes?

* write PADTO

* fix padded tensor cores

* compare hcopt

* 18000 lines

* should pass tests

* fix rangeify

* put types back
2025-09-03 19:23:30 -07:00
qazal
c7bb561ef9 remu: add v_rsq_f32_e32 instruction (#11947)
https://github.com/tinygrad/tinygrad/pull/11936 introduces a change to
the AMD LLVM renderer that outputs this instruction. Adding both 32 and
64 bit variants.
2025-09-01 11:29:31 +03:00
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
George Hotz
394c2d1db1 update Kernel API in tests + move optimize_local_size (#11907) 2025-08-28 15:12:47 -07:00
Ben Waldron
ea1be2e4cd [bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape

* Refactor symbolic reshape

* fix small error

* much cleaner + fix more tests

* Can remove this now

* Update test_symbolic_ops and test_tiny

* Couple more tests

* Unused import

* More tests and add EXPAND to Tensor.empty

* Fix test beam search

* all int

* Fix rangeify by adding shrink

* Remove OOB check and so fix test_symbolic_jit

* test_symbolic_jit doesn't need OOB Context anymore either

* Should remove that test now

* Cleanups part 1

* fix linters

* Final cleanups

* Don't reassign inside for loop

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
nimlgen
874c1db4af am: init support for aql (#11888) 2025-08-28 18:41:46 +03:00
George Hotz
27701ef823 add locals support to rangeify (#11826) 2025-08-24 14:03:12 -07:00
chenyu
fb8ee02424 Tensor.logaddexp (#11793) 2025-08-23 09:15:00 -04:00
chenyu
d0d39885c3 onnx in tinygrad (#11675) 2025-08-14 19:57:21 -04:00
chenyu
48c4033ae1 fix pylint for onnx (#11673)
* fix pylint for onnx

* too long
2025-08-14 18:48:02 -04:00
nimlgen
4176b24264 amd: support xcc in regs (#11670)
* amd: support xcc in regs

* mockamd

* typong
2025-08-14 21:20:11 +03:00