4538 Commits

Author SHA1 Message Date
George Hotz
203a93363c Revert "after clean up of locals (#12813)" (#12814)
This reverts commit 5d0d3d7aac.
2025-10-20 19:33:35 +08:00
George Hotz
5d0d3d7aac after clean up of locals (#12813) 2025-10-20 19:24:24 +08:00
Sieds Lykles
a8e4614436 remove REAL_SUBSTITUTE=0 and make it fast (#12809)
* fast REAL_substitute

* remove REAL_SUBSTITUTE=0
2025-10-20 12:44:20 +02:00
George Hotz
2e9082e0bc after op (#12801)
* after op

* fix tests
2025-10-20 12:27:56 +08:00
George Hotz
ba593f7b98 don't render index (#12796)
* don't render index

* update to ignore_indexing

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-10-20 09:48:36 +08:00
chenyu
63a23dfe80 test step 0 in TestTrainingOnnxOps (#12790)
and tighter rtol
2025-10-19 09:15:49 -04:00
chenyu
e8158afd4b update test_qlinear_add_round_half_to_even (#12789)
this does not pass locally
2025-10-19 08:47:27 -04:00
Sieds Lykles
fd6ef4801c rangeify uses symbolic_flat (#12786)
* symbolic_simple -> symbolic_flat

* remove expected failures
2025-10-19 12:27:14 +02:00
qazal
c8ef4b60f6 viz: share match tracing and TINY device profiler (#12783)
* set a default name for the traces

* set profile_matches + renames

* profile_matches test

* traces 4 steps total
2025-10-19 14:30:07 +08:00
chenyu
30ff84d050 update test_conv2d_ceildiv_edge_case (#12779) 2025-10-18 16:43:32 -04:00
nimlgen
442218266d qcom: fix profiler (#12778)
* qcom: fix profiler

* this way
2025-10-19 01:27:59 +08:00
wozeparrot
82f10cfe2e feat: assert on bufferview math (#12772) 2025-10-17 14:20:08 -07:00
chenyu
fcdf4ab37e remove a contiguous in LARS (#12770) 2025-10-17 17:07:30 -04:00
George Hotz
062a6d68d7 test flash attention backward (#12762)
* test flash attention backward

* TODO: fix pcontig

* end ranges

* render colors

* very big

* multiout at every level

* reset ending ranges

* fix tests

* ugh
2025-10-17 23:15:59 +08:00
George Hotz
c9a3464f76 those decimals never mattered (#12760)
* those decimals never mattered

* this

* improve debug

* real substitute fixes pcontig

* locals are different buffers
2025-10-17 17:16:24 +08:00
qazal
0160f034d6 viz: show display name for copy runners (#12761)
* viz: show display name for copy runners

* more u32
2025-10-17 16:59:51 +08:00
qazal
253d32b065 viz: add metadata to buffer user list (#12758)
* simple failing test

* encodings

* test passing

* key is deduped
2025-10-17 16:28:54 +08:00
George Hotz
935a60db72 bring back partial contig and flash attention (#12756)
* bring back partial contig and flash attention

* why not 2

* work

* that

* fix pcontig
2025-10-17 16:19:05 +08:00
qazal
dfb8f9fc9e viz: annotate buffer mutability in the memory graph (#12750) 2025-10-17 11:53:02 +08:00
chenyu
9561803cb0 fix assert in test_schedule (#12745)
* fix assert in test_schedule

updated kernel counts and some old tests

* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64 delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
98239f1156 few shapetracker cleanups (#12741) 2025-10-16 12:43:27 -04:00
George Hotz
8be7844b2e use apply uop for assign to fix assign metadata (#12732)
* use apply uop for assign

* fix metadata for assign

* fix backward metadata

* those aren't real tests
2025-10-16 20:34:12 +08:00
qazal
533f18b22c viz: add trace data for inflight buffers (#12728)
* viz: add trace data for inflight buffers

* add test_inflight_buf

* temp stores the keys

* update tests / use Tensor.ones
2025-10-16 19:15:03 +08:00
George Hotz
af4479c169 faster stable diffusion load (#12725)
* faster stable diffusion load

* failing tests
2025-10-16 18:31:59 +08:00
George Hotz
1d1e1d9d88 delete the ShapeTracker (#12720)
* delete the ShapeTracker

* fix tests

* fix more

* fix gc test
2025-10-16 15:36:22 +08:00
George Hotz
592e86f6f5 remove UOp.st (#12716)
* remove UOp.st

* fix tests

* torch backend disable
2025-10-16 14:44:09 +08:00
George Hotz
7c19db00f1 remove st from jit/split_reduceop (#12713)
* remove st from jit

* fix by merging reshapes

* no st usage in rangeify

* hmm, stop early works

* fix speed regressions
2025-10-16 12:50:58 +08:00
qazal
069177c1be trace buffer producer and consumers (#12639)
* trace buffer producer and consumers

* work

* generic colored util

* fix batched

* basic clicking works

* generic javascript that works for producer and consumers

* keep focused shape

* idle time

* timings for producer and consumers dedup

* from sd test

* tiny cleanups

* timeline

* work

* up to here

* assert

* list it

* work
2025-10-16 11:11:31 +08:00
chenyu
c3278e5622 clean up old tests (#12708) 2025-10-15 17:53:17 -04:00
nimlgen
3ab23af829 nv: copy prog with copyin (#12701)
* nv: copy prog with copyin

* to bytes

* fix test
2025-10-15 22:48:01 +08:00
chenyu
312c622d35 support None in pad_to and shrink_to (#12700) 2025-10-15 09:25:31 -04:00
George Hotz
612e3d6143 replace mop arg with vectorized index (#12695)
* replace mop arg with vectorized index

* tests passing

* better viz

* no compile4
2025-10-15 20:50:06 +08:00
qazal
768dc952de viz ui cleanups / renaming (#12691)
* better viz names

* delete unused

* don't use opacity, it's multiplicative

* keep styles

* scrollbar coloring

* pyrender doesn't work here

beautiful_mnist r_64_16_32_36@lower all index dtypes
2025-10-15 18:40:22 +08:00
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
qazal
f0268d13f6 cleanup viz server (#12688) 2025-10-15 15:58:36 +08:00
George Hotz
a59439d013 use UOp.shape property instead of UOp.st (#12664)
* work on shape property

* reshape causing issues

* more mops

* all mops

* need to cache it

* _shape is like _device

* mostly works

* shape is good

* const uses _shape

* fix tests

* size doesn't use st

* close

* test is broken

* one less st

* hack for 3 op assign

* oops, i didn't mean to change that

* support emulate in the NullDevice

* reproed failure in emulation

* fix wmma
2025-10-15 10:01:34 +08:00
chenyu
d25ceffe8d update padto opts tests (#12679) 2025-10-14 17:00:42 -04:00
George Hotz
db4a359374 fix up some slow tests that launch python (#12672)
* fix up some slow tests that launch python

* svd nonfull in parallel

* split test_advancedindex
2025-10-14 19:13:55 +08:00
George Hotz
fb61f3519f remove assign contiguous hack (#12659)
* remove assign contiguous hack

* remove bad contiguous usage in torch backend

* assign
2025-10-14 16:42:14 +08:00
Sieds Lykles
e06cbfcb8a combine pm_drop_and_clauses (#12660)
* combine those

* wino kernels decreased
2025-10-14 10:09:41 +02:00
George Hotz
84d4589ed4 remove pylint from pre-commit and CI (#12658)
* remove pylint from pre-commit and CI

* multidevice test is fast

* faster pre-commit

* 8 is faster than 4

* better name

* how did that typecheck?
2025-10-14 15:39:59 +08:00
George Hotz
b9eb5b5d49 clean up the LLM tokenizer (#12653)
* clean up the LLM tokenizer

* simple tokenizer is actually simple

* ugh write good code
2025-10-14 14:22:01 +08:00
wozeparrot
47e0c43976 feat: Tensor.{load, store} (#12629) 2025-10-13 08:04:41 -07:00
Sieds Lykles
e0139fafc1 UOp symbolic tests use eval to check against string (#12643) 2025-10-13 14:19:42 +02:00
Sieds Lykles
e537e895b1 drop unused invalid conditions (#12635)
* drop where conditions if the ranges are not used inside the index

* remove allow_any_len
2025-10-13 10:52:21 +02:00
qazal
fd51ecf983 process_replay for get_rangeify_map (#12624) 2025-10-12 15:14:40 +03:00
qazal
b5afa3848e viz: fix memory graph total nbytes (#12622)
* viz: fix memory graph total nbytes

* post increment

* simple regression test

* loop with markers + slightly off text baseline

* cpu events clear
2025-10-12 14:32:46 +03:00
Sieds Lykles
772a8dfe31 reshape uses valid when simplifying (#12597)
* reshape uses valid when simplifying

* try with IGNORE_OOB=0

* is it this test?

* skipif gpuocelot
2025-10-11 17:02:54 +02:00
Sieds Lykles
a2ae56674a uop_given_valid try multiple clauses (#12615)
* uop_given_valid uses less simplify

* enable test

* try all expressions together

* enable test
2025-10-11 11:53:42 +02:00