43 Commits

Author SHA1 Message Date
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
George Hotz
cf0c28d5ae all tests pass on strix halo (#13728) 2025-12-16 19:35:50 -04:00
George Hotz
ffb9e8396f fix indexing bug with convs
* minimal difference for ONE_POOL=1

* fix indexing bug

* improve indexing debugger

* more debugger improvements

* always for reshape
2025-11-07 16:45:19 -08:00
George Hotz
819592ee67 hotfix: disable DoubleMatmul for PTX 2025-10-29 16:37:17 +08:00
George Hotz
30ca3f2af8 all double matmul (#12993)
* fix more double matmuls

* a few more

* all double matmul passes

* opts for flash attention

* fix spec

* comment
2025-10-29 16:25:27 +08:00
George Hotz
1c362736aa fix more double matmuls (#12991)
* fix more double matmuls

* a few more
2025-10-29 16:09:48 +08:00
George Hotz
8c47cf4323 pcontig double matmul works (#12899)
* pcontig double matmul works

* tests

* contract

* closer

* works-ish

* add that broadcast

* 2 more work

* something

* disable broken ones

* llvm

* align 16
2025-10-29 13:06:43 +08:00
George Hotz
0bde87d8d7 cleanups from flash attention branch (#12897) 2025-10-24 14:14:56 +08:00
George Hotz
ff68a6263b move locals into codegen (dedup works) (#12885)
* move locals into codegen (dedup works)

* move in optimize
2025-10-23 17:07:39 +08:00
George Hotz
ddb53d1d48 PCONTIG=3 both saves ram and flops (#12884)
* PCONTIG=3 both saves ram and flops

* group

* gate locals

* should be correct
2025-10-23 16:37:26 +08:00
George Hotz
20a232f1c5 bugfixes from multioutput + PCONTIG=3 for fa bw memory fix (#12837)
* bugfixes from multioutput

* PCONTIG=3 fixes fa memory usage

* that's base
2025-10-21 19:21:02 +08:00
George Hotz
c780cd9abb new linearizer with early endrange (#12823)
* new linearizer with early endrange

* cleanups

* second stage removal

* not store

* do that later

* end cleanup

* fix globals

* end

* multi end

* fix ends earlier

* work

* do_merge_ends

* mini change

* range_gate

* fix cpu

* test fixups

* ranges on index

* not for ptx
2025-10-21 17:37:48 +08:00
Sieds Lykles
a8e4614436 remove REAL_SUBSTITUTE=0 and make it fast (#12809)
* fast REAL_substitute

* remove REAL_SUBSTITUTE=0
2025-10-20 12:44:20 +02:00
George Hotz
062a6d68d7 test flash attention backward (#12762)
* test flash attention backward

* TODO: fix pcontig

* end ranges

* render colors

* very big

* multiout at every level

* reset ending ranges

* fix tests

* ugh
2025-10-17 23:15:59 +08:00
George Hotz
c9a3464f76 those decimals never mattered (#12760)
* those decimals never mattered

* this

* improve debug

* real substitute fixes pcontig

* locals are different buffers
2025-10-17 17:16:24 +08:00
George Hotz
935a60db72 bring back partial contig and flash attention (#12756)
* bring back partial contig and flash attention

* why not 2

* work

* that

* fix pcontig
2025-10-17 16:19:05 +08:00
George Hotz
8be7844b2e use apply uop for assign to fix assign metadata (#12732)
* use apply uop for assign

* fix metadata for assign

* fix backward metadata

* those aren't real tests
2025-10-16 20:34:12 +08:00
George Hotz
5977df267f outerworld uses expand (#12578) 2025-10-10 10:25:25 +08:00
chenyu
f2c3a72b0c remove RANGEIFY flag [pr] (#12577) 2025-10-09 21:52:54 -04:00
chenyu
ae51bdd06a remove trivial use of RANGEIFY flag (#12550)
some tests need update still
2025-10-09 02:29:38 -04:00
George Hotz
0774575442 delete the old rangeify path and all the children stuff (#12524)
* delete the old rangeify path and all the children stuff

* remove the on_stack stuff and any retries

* don't use the p word

* Revert "remove the on_stack stuff and any retries"

This reverts commit 49a2b328b9.
2025-10-08 21:24:04 +08:00
George Hotz
12c4963489 add more rangeify pm tests (#12488) 2025-10-07 05:45:38 -04:00
George Hotz
75ce11593c test_reshape_match should match (#12479) 2025-10-07 16:07:21 +08:00
George Hotz
ea7672931f fix test_matmul_relu_cat (#12478) 2025-10-07 02:32:23 -04:00
chenyu
7b48f3cc45 failed test case repro for openpilot model (#12475)
* failed test case repro for openpilot model

* assertEqual
2025-10-07 13:46:43 +08:00
George Hotz
583553f467 split ranges (#12411)
* split ranges

* simpler

* split ranges

* range str

* fix test

* oops

* faster

* no group 2

* tests

* dont_sub_ranges_for_image

* revert that
2025-10-02 12:57:22 +08:00
George Hotz
9ef319f349 bad conv in rangeify (#12373)
* bad conv with broken rangeify

* no maxpool needed

* add empty_like

* typo

* no self

* issue remains for test
2025-10-01 08:56:22 +08:00
George Hotz
a83f219253 fix bad range merges (#12368)
* fix bad range merges

* fix rng

* fix uop gc
2025-09-30 19:30:21 +08:00
George Hotz
7129419500 fix cifar training in RANGEIFY (#12355)
* fix cifar training in RANGEIFY

* even more wino fuse

* bugfix

* test to show issue
2025-09-30 15:59:19 +08:00
George Hotz
f522e83a02 fix rangeify elu fusion for openpilot (#12341)
* fix rangeify elu fusion for openpilot

* flip the metadata

* copy over permuted contiguous support

* this is correct

* update that
2025-09-30 11:41:52 +08:00
George Hotz
3291e00df7 fix efficientnet slowness on rangeify (#12332) 2025-09-29 18:01:01 +08:00
qazal
250cb10e8f rangeify permuted assign (#12299)
* enable RANGEIFY=1 test_assign

* work

* rangeify=0 asserts this ast

* remove that

* beta test, it's correct though

* skip multi

* matches torch/np output

* memcopy without memcopy

* can remove this

* rangeify isn't silently wrong anymore

* diff cleanup

* use UOp toposort instead of global tags

* actual assert TestRangeifyAssign

* step

* work

* this isn't optimizing away now

* some todos

* test fusion schedule

* typo

* dedup idxs

* cleaner

* pre

* work

* diff
2025-09-29 07:27:57 +03:00
George Hotz
bcafa72b7f use tags instead of graph_rewrite_map in rangeify (#12110)
* use tags instead of graph_rewrite_map in rangeify

* new style, add realize

* metadata works

* simple failure

* fix

* loops

* stuff becomes a NOOP when you remove it

* stuff becomes a NOOP when you remove it

* tags on bufferize

* bmnist works

* locals don't work

* shippable

* fix some tests

* simpler map_realize

* remove const hack

* debuggable test

* broke

* assign test

* straight up bug

* wooo it passes

* sink shouldn't be there

* fix ops

* bmnist

* kv cache ish

* Set RANGEIFY context variable to 0

* should work normal

* better

* types

* hacks to fix test_symbolic

* pm_add_buffers

* tests should pass
2025-09-14 11:39:01 +08:00
George Hotz
3ef0e5e01e rangeify: use Ops.REALIZE and not Ops.CONTIGUOUS if it's added by system (#12111)
* rangeify: use Ops.REALIZE and not Ops.CONTIGUOUS if it's added by system

* fix contig + BufferizeOpts

* no outerworld
2025-09-11 11:56:59 +08:00
George Hotz
d4eba5800d rangeify cost function infrastructure (#12091)
* one call to hc opt

* does that pass?

* add cost function to rangeify

* test

* more test

* gate thread

* bufferize has shape

* ish

* match old behavior

* no ci there
2025-09-11 07:19:53 +08:00
George Hotz
5cf42dc4db add Scheduler to replace Kernel with POSTOPT=2 (#11924)
* ** simple kernel to replace Kernel for postopt

* support old

* fix beam

* beaming

* beam on old

* bring tensor cores back

* raise

* postbeam

* test ops passes on mac

* skip that

* postopt default

* gate that

* fix tensor cores

* a few test fixes

* dsp fix

* tc fix

* loop

* support swap

* test_gemv

* fix beam for variable

* test opts from high level stuff

* range annoying

* compile slow

* metal slow

* better beam

* no POSTBEAM

* fix nolocals

* hc opt mostly works

* put that back

* lil

* some work

* fix that

* POSTOPT 2

* fix tests

* no postopt 2

* work

* back

* padded tensors cores

* shift_to

* postopt 0 passes?

* write PADTO

* fix padded tensor cores

* compare hcopt

* 18000 lines

* should pass tests

* fix rangeify

* put types back
2025-09-03 19:23:30 -07:00
George Hotz
0dfca4e74b add failing test for rangeify setitem (#11954) 2025-09-01 16:24:35 -07:00
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
George Hotz
b9b438c516 small updates from postopt (#11903)
* tests from postopt

* modernize

* skip lin tests

* that's fixed?

* skip, not failure
2025-08-28 12:34:52 -07:00
George Hotz
b268755d51 small changes from postopt (#11854) 2025-08-26 11:56:16 -07:00
George Hotz
9832599c9e test_vmap + permute isn't a sint (#11783)
* test_vmap + permute isn't a sint

* order
2025-08-21 22:39:35 -07:00
George Hotz
bb8de51e5f remove unused early cleanups + contig w range [pr] (#11780)
* remove unused early cleanups [pr]

* contiguous with range

* woah, this works
2025-08-21 20:04:45 -07:00
George Hotz
9635592141 ** rangeify, try 3 (#11683)
* ** rangeify, try 3

* bring that over

* bufferize, don't use contig tag

* work

* ish

* fix rangeify

* flash attention is back

* fix rangeify tests

* stuff passes

* fix test_log_softmax

* more stuff passes

* progress children

* new endrange solution

* progress

* progress counter

* basic assign

* contigs only

* symbolic in schedule

* unbind_kernel

* late children

* ops fixed

* beautiful mnist is close

* that seems to work

* mnist works

* improve names

* fix bmnist

* no pcontig

* testing backward

* work

* clone movement ops

* new_range helper

* MBLOCK/MERGE

* ops tests pass

* revert mblock stuff

* cleanups...but it breaks ops

* remove reindex

* hack for relu

* disable the hacks

* more hacks

* upd

* mostly works with cleanups disabled

* ndr

* ops tests pass

* terrible hacks for indexing to work

* context mismatch

* pcontig

* split pcontig v contig

* z3 trunc

* null

* no fuse in rangeify

* ops test passes

* lnorm

* fix assign

* nd rangeify

* both should work

* tests for rangeify

* cleanups

* stores pass the pointer through

* disable pcontig for now

* PARTIAL_CONTIG is a flag
2025-08-20 14:22:44 -07:00