Commit Graph

899 Commits

Author SHA1 Message Date
Sieds Lykles
9f39f6391c shared_codegen_spec and fix index spec (#12967)
* split shared_codegen_spec and fix index

* add VCONST to program_spec and move index to shared_codegen_spec

* working ignore_oob=0

* cleanup

* fix spec

* undo that

* move barrier and special earlier

* fix more spec issues

* more updates

* remove special from program_spec

* cleanup and fixes

* move more to shared

* special is not in shared_spec

* some comments

* dont do bounds check there
2025-10-29 09:14:11 +01:00
chenyu
e18922f111 limit AND const min max to ints [pr] (#12918) 2025-10-25 16:07:52 -04:00
George Hotz
b4f6a2c7a3 add kernel spec (#12911)
* add kernel spec

* fix kernel spec
2025-10-25 11:49:20 +08:00
George Hotz
8a941d95a4 SPEC=2 is full spec, SPEC=1 is default (#12910)
* SPEC=1 passes all tests

* just use SPEC, not __debug__
2025-10-25 11:10:43 +08:00
wozeparrot
9dac505565 variable bs keccak (#10731) 2025-10-23 14:10:21 -07:00
George Hotz
7762b3558b clean up the spec (#12868)
* tighten up the spec

* move validate into a different file

* that moved to validate

* after(barr)
2025-10-22 19:50:42 +08:00
George Hotz
d711a4b933 delete old linearizer (#12834)
* new linearizer with early endrange

* cleanups

* second stage removal

* not store

* do that later

* end cleanup

* fix globals

* end

* multi end

* fix ends earlier

* work

* do_merge_ends

* mini change

* range_gate

* fix cpu

* test fixups

* ranges on index

* not for ptx

* delete linearizer

* remove more junk

* delete that test

* we insert endif

* all ends
2025-10-21 17:52:18 +08:00
George Hotz
c780cd9abb new linearizer with early endrange (#12823)
* new linearizer with early endrange

* cleanups

* second stage removal

* not store

* do that later

* end cleanup

* fix globals

* end

* multi end

* fix ends earlier

* work

* do_merge_ends

* mini change

* range_gate

* fix cpu

* test fixups

* ranges on index

* not for ptx
2025-10-21 17:37:48 +08:00
qazal
32af1ff84b viz graph drawing small cleanups (#12830)
* viz graph drawing small cleanups

* str literal
2025-10-21 15:51:32 +08:00
George Hotz
2e9082e0bc after op (#12801)
* after op

* fix tests
2025-10-20 12:27:56 +08:00
George Hotz
ba593f7b98 don't render index (#12796)
* don't render index

* update to ignore_indexing

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-10-20 09:48:36 +08:00
qazal
c8ef4b60f6 viz: share match tracing and TINY device profiler (#12783)
* set a default name for the traces

* set profile_matches + renames

* profile_matches test

* traces 4 steps total
2025-10-19 14:30:07 +08:00
wozeparrot
82f10cfe2e feat: assert on bufferview math (#12772) 2025-10-17 14:20:08 -07:00
qazal
0160f034d6 viz: show display name for copy runners (#12761)
* viz: show display name for copy runners

* more u32
2025-10-17 16:59:51 +08:00
qazal
253d32b065 viz: add metadata to buffer user list (#12758)
* simple failing test

* encodings

* test passing

* key is deduped
2025-10-17 16:28:54 +08:00
qazal
dfb8f9fc9e viz: annotate buffer mutability in the memory graph (#12750) 2025-10-17 11:53:02 +08:00
qazal
533f18b22c viz: add trace data for inflight buffers (#12728)
* viz: add trace data for inflight buffers

* add test_inflight_buf

* temp stores the keys

* update tests / use Tensor.ones
2025-10-16 19:15:03 +08:00
George Hotz
af4479c169 faster stable diffusion load (#12725)
* faster stable diffusion load

* failing tests
2025-10-16 18:31:59 +08:00
George Hotz
1d1e1d9d88 delete the ShapeTracker (#12720)
* delete the ShapeTracker

* fix tests

* fix more

* fix gc test
2025-10-16 15:36:22 +08:00
George Hotz
592e86f6f5 remove UOp.st (#12716)
* remove UOp.st

* fix tests

* torch backend disable
2025-10-16 14:44:09 +08:00
George Hotz
7c19db00f1 remove st from jit/split_reduceop (#12713)
* remove st from jit

* fix by merging reshapes

* no st usage in rangeify

* hmm, stop early works

* fix speed regressions
2025-10-16 12:50:58 +08:00
qazal
069177c1be trace buffer producer and consumers (#12639)
* trace buffer producer and consumers

* work

* generic colored util

* fix batched

* basic clicking works

* generic javascript that works for producer and consumers

* keep focused shape

* idle time

* timings for producer and consumers dedup

* from sd test

* tiny cleanups

* timeline

* work

* up to here

* assert

* list it

* work
2025-10-16 11:11:31 +08:00
George Hotz
612e3d6143 replace mop arg with vectorized index (#12695)
* replace mop arg with vectorized index

* tests passing

* better viz

* no compile4
2025-10-15 20:50:06 +08:00
qazal
768dc952de viz ui cleanups / renaming (#12691)
* better viz names

* delete unused

* don't use opacity, it's multiplicative

* keep styles

* scrollbar coloring

* pyrender doesn't work here

beautiful_mnist r_64_16_32_36@lower all index dtypes
2025-10-15 18:40:22 +08:00
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
qazal
f0268d13f6 cleanup viz server (#12688) 2025-10-15 15:58:36 +08:00
George Hotz
db4a359374 fix up some slow tests that launch python (#12672)
* fix up some slow tests that launch python

* svd nonfull in parallel

* split test_advancedindex
2025-10-14 19:13:55 +08:00
Sieds Lykles
e06cbfcb8a combine pm_drop_and_clauses (#12660)
* combine those

* wino kernels decreased
2025-10-14 10:09:41 +02:00
George Hotz
b9eb5b5d49 clean up the LLM tokenizer (#12653)
* clean up the LLM tokenizer

* simple tokenizer is actually simple

* ugh write good code
2025-10-14 14:22:01 +08:00
wozeparrot
47e0c43976 feat: Tensor.{load, store} (#12629) 2025-10-13 08:04:41 -07:00
Sieds Lykles
e0139fafc1 UOp symbolic tests use eval to check against string (#12643) 2025-10-13 14:19:42 +02:00
qazal
b5afa3848e viz: fix memory graph total nbytes (#12622)
* viz: fix memory graph total nbytes

* post increment

* simple regression test

* loop with markers + slightly off text baseline

* cpu events clear
2025-10-12 14:32:46 +03:00
Sieds Lykles
772a8dfe31 reshape uses valid when simplifying (#12597)
* reshape uses valid when simplifying

* try with IGNORE_OOB=0

* is it this test?

* skipif gpuocelot
2025-10-11 17:02:54 +02:00
Sieds Lykles
a2ae56674a uop_given_valid try multiple clauses (#12615)
* uop_given_valid uses less simplify

* enable test

* try all expressions together

* enable test
2025-10-11 11:53:42 +02:00
chenyu
03ef5197fc move get_contraction to helpers [pr] (#12594) 2025-10-10 04:28:57 -04:00
chenyu
af90dc00de remove some View add logic [pr] (#12584)
no longer simplify the case of v0+v1 where v0 has a mask
2025-10-10 03:47:56 -04:00
chenyu
c8dfd10257 ShapeTracker.real_strides -> is_expanded [pr] (#12579)
only keep the used part
2025-10-09 22:52:45 -04:00
chenyu
678f83e41b delete ShapeTracker to_valid_uop and substitute [pr] (#12563) 2025-10-09 05:06:10 -04:00
chenyu
cf8232ec6a clean up more RANGEIFY flag (#12556) 2025-10-09 03:06:48 -04:00
chenyu
250f05a776 run some hashing test only on METAL (#12554)
quite slow on CPU
2025-10-09 02:39:49 -04:00
chenyu
ae51bdd06a remove trivial use of RANGEIFY flag (#12550)
some tests need update still
2025-10-09 02:29:38 -04:00
chenyu
43bce1f39f delete View minify [pr] (#12538) 2025-10-08 23:25:53 -04:00
chenyu
20d98b19c3 delete more unused ShapeTracker stuff (#12536) 2025-10-08 23:09:44 -04:00
George Hotz
0774575442 delete the old rangeify path and all the children stuff (#12524)
* delete the old rangeify path and all the children stuff

* remove the on_stack stuff and any retries

* don't use the p word

* Revert "remove the on_stack stuff and any retries"

This reverts commit 49a2b328b9.
2025-10-08 21:24:04 +08:00
qazal
b6835f4134 remove Ops.VIEW and related UOp methods (#12522)
* remove Ops.VIEW and related UOp methods

* update abstractions2.py

* no ShapeTrackers in abstractions2.py

* it's a size 1
2025-10-08 14:47:02 +03:00
George Hotz
3b0b3a2e64 fast RANGEIFY (#12504)
* rtoposort is fast, can replace rangeify with this

* fast rangeify

* work

* fast rangeify works for mnist

* should work

* progress

* pad fix

* FAST

* tests passing

* don't delete those shape ops

* put in rangeify map

* ending ranges fix

* tests

* mstack/mselect no hacks

* move to indexing.py

* touch up tests + add comments

* disable failing test

* actually make the file readable

* failing

* error
2025-10-08 19:38:06 +08:00
chenyu
ee0382ad99 remove ShapeTracker.invert (#12520) 2025-10-08 18:37:34 +08:00
chenyu
d5058427ea remove ShapeTracker.real_size (#12519) 2025-10-08 06:15:29 -04:00
qazal
2e19354c1c viz: reorder timeline graphs (#12498)
* viz: reorder timeline graphs

* update test_viz with the new order
2025-10-08 07:10:23 +03:00
Sieds Lykles
b465c17b56 Revert "UOp.factor and add chain sorting (#12413)" (#12492)
This reverts commit e74be4a140.
2025-10-08 03:20:23 +02:00