Commit Graph

69 Commits

George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
qazal
366badaa68 require renderer argument in get_program, removes device opening in process replay [pr] (#13524) 2025-12-03 02:05:31 +08:00
chenyu
8f0e747b3a Tensor._tri with arange (#13297) 2025-11-16 10:21:16 -05:00
chenyu
e8844853ed Tensor.eye with arange (#13287)
with rangify we can write these with arange
2025-11-15 12:32:27 -05:00
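The two commits above (Tensor._tri and Tensor.eye with arange) express structured matrices as broadcast comparisons of aranges instead of dedicated kernels. A minimal sketch of the idea using the public tinygrad Tensor API, not necessarily matching the exact implementation in those commits:

```python
from tinygrad import Tensor

def eye_with_arange(n: int) -> Tensor:
  rows = Tensor.arange(n).reshape(n, 1)  # column of row indices, shape (n, 1)
  cols = Tensor.arange(n).reshape(1, n)  # row of column indices, shape (1, n)
  return (rows == cols).float()          # 1.0 on the diagonal, 0.0 elsewhere

def tril_with_arange(n: int) -> Tensor:
  rows = Tensor.arange(n).reshape(n, 1)
  cols = Tensor.arange(n).reshape(1, n)
  return (rows >= cols).float()          # lower-triangular mask of ones

print(eye_with_arange(3).numpy())
print(tril_with_arange(3).numpy())
```

The point of writing these via arange is that, with rangify, the comparison folds into index arithmetic inside the kernel rather than materializing a separate arange buffer.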
chenyu
c3278e5622 clean up old tests (#12708) 2025-10-15 17:53:17 -04:00
chenyu
ae51bdd06a remove trivial use of RANGEIFY flag (#12550)
some tests still need updating
2025-10-09 02:29:38 -04:00
chenyu
e701106a64 remove FUSE_ARANGE (#12511)
it was the default already
2025-10-08 04:54:07 -04:00
chenyu
5b12764b83 add arange cat arange test (#12217)
simple test case to catch wrong reduce const folding. also clean up the old arange complexity test
2025-09-16 17:12:32 -04:00
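The "arange cat arange" test above guards against the reduce const-folder producing a wrong constant when two ranges are concatenated. A hedged sketch of that kind of check, using the public Tensor API rather than the literal test from test_arange.py:

```python
from tinygrad import Tensor

a = Tensor.arange(4)     # [0, 1, 2, 3]
b = Tensor.arange(4, 8)  # [4, 5, 6, 7]
out = a.cat(b).sum()     # reduce over the concatenation
# if the reduce over either branch were const-folded incorrectly, this would not be 28
assert out.item() == sum(range(8))
```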
George Hotz
9789337722 early reduce simplify (#12046)
* early reduce simplify

* min changes

* need that

* that goes in simplify

* no more arange reduce opt
2025-09-10 21:02:46 +08:00
nimlgen
551560b87c do not use getenv('PTX') in tests (#12095)
* test without ptx

* fix tests

* fix test

* linters
2025-09-10 14:04:07 +03:00
George Hotz
ee4f696086 delete more tests (#12043)
* delete more tests

* delete and simplify

* flaky on windows

* a few more, those remained
2025-09-05 15:31:30 -07:00
chenyu
ed2f45712b remove skip PTX in test_arange (#11991)
all tests pass now
2025-09-03 20:45:19 -04:00
George Hotz
82be8abfd2 move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00
George Hotz
21570545d3 move view pushing to codegen, try 2 (#11534)
* move view pushing to codegen, try 2

* fix up some linearizer tests

* fix test search

* fix test schedule

* delete that test

* fix test arange

* fix a few tests

* update tests

* push views

* ebs cleanup

* fix local/reg

* test and lint

* fix more tests

* test cleanups

* skipped that one
2025-08-06 15:58:38 -07:00
George Hotz
80d9cced07 more test cleanups (#11544)
* more test cleanups

* revert that
2025-08-06 15:05:21 -07:00
chenyu
a0438012af remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
George Hotz
92678e59ee move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
Ahmed Harmouche
35eb4d357a [webgpu] Fix atomic shared mem load inside loop (#10530)
* Disable shared mem atomics on webgpu

* allow_any_len in load pattern matcher to fix temp load inside loop
2025-05-31 09:29:02 -04:00
Ahmed Harmouche
bbb6deff53 Increase op limit in test_index_mnist to pass on webgpu (#10504)
* Increase op limit to enable mnist indexing on webgpu

* Only relax op_limit on WebGPU
2025-05-24 09:37:31 -04:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
Sieds Lykles
dbb7aee02e Split constant in div with negative x (#10088)
* add rule

* change test

* lower complexity limit

* remove offset in fold_unrolled_divs

* remove import

* add one more condition
2025-04-28 16:24:14 -04:00
George Hotz
690dac79b5 don't modify the ranges on reduce rewrite (#10062)
* bug in div range folding

* simpler

* oh, this is right for indexing, but the div mod folding needs to be fixed

* reenable

* Passing test_complexity_w_unroll2 (#10068)

* Passing

* remove non_folded_divs

* Add check for negative term in div folding

* Add test

* bump that limit

* fix casted

---------

Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-04-28 12:01:19 -04:00
George Hotz
1253819151 make beautiful indexing use a Variable (#10063)
* make beautiful indexing use a Variable

* stunning test

* better color

* training is broken

* fix tests

* fix variable indexing

* fix test

* no contiguous

* revert that

* revert that too

* indexing two bind

* skip for webgpu

* make not slow
2025-04-27 08:22:38 -04:00
George Hotz
1805403821 fix rand arange folding (#10060)
* test rand range

* --amend

* fix rand arange folding

* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
George Hotz
ea5dddc537 reduce collapse generic (#10045)
* reduce collapse generic

* new arange folder

* new range folding

* correct with sym

* all tests pass

* indexing ops passes

* failing tests

* fix tests, remove unused

* revert that

* torch indexing is fast

* skip on webgpu

* touchups

* comments
2025-04-26 09:13:24 -04:00
George Hotz
aec75f51ef fixup some slow CI tests [pr] (#10027)
* fixup some slow CI tests [pr]

* shrink test index
2025-04-24 09:20:49 -04:00
George Hotz
cac8bcf8b5 use Ops.REDUCE (#9721)
* decrease bert python time [pr]

* order copies

* Revert "order copies"

This reverts commit 3f62c8693b.

* rewrite count

* Ops.REDUCE

* acc first in the add chain

* Fix tensor core acc

* arange patterns look good

* fix multireduce gate

* reduce rewrite rule

* bump that to 15 minutes

* multiwmma isn't fusing

* gep through wmma is gep pushing

* bump that timeout too, it's all env setup

* add failing test
2025-04-04 10:14:34 +08:00
chenyu
f8976dd2eb enable more webgpu tests (#9502)
OSX has a larger buffer count limit, and it supports fp16 now
2025-03-18 23:03:54 -04:00
chenyu
99b0287e4e add GROUP and GROUPTOP to test_arange (#9432)
it does not grow quadratically, but it's not 0 ops now
2025-03-13 11:28:38 -04:00
George Hotz
2cc4cb74f0 reorder binops (#9328)
* reorder binops

* test improvements + fix string tests

* ugh, okay this
2025-03-03 14:58:18 +08:00
chenyu
49abc09f77 remove the reshapes in test_arange_2_reduce [pr] (#9063) 2025-02-13 12:33:25 -05:00
Ignacio Sica
b240f12593 [TIP-9] rename Opt's amt to arg 2 (#8770)
* rename Opt amt to arg

* ignore_beam_cache for test_tiny

* move ignore_beam_cache to test_tiny

* move to separate pr

* revert space change

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-27 14:19:04 -05:00
George Hotz
3ed146a5ff Revert "rename Opt amt to arg (#8767)" (#8769)
This reverts commit bf041659a5.
2025-01-27 23:46:37 +09:00
Ignacio Sica
bf041659a5 rename Opt amt to arg (#8767) 2025-01-27 23:36:47 +09:00
George Hotz
b4bf6a7dea switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]

* set device correctly, dedup

* why does that fail?

* add noop cast

* simple backward

* fix beautiful_mnist

* touchups

* set in compute_gradient

* uop_count

* uop_count was wrong

* collections

* no note

* skip that test

* update sched kernel counts

* train mnist is 65

* fix metadata and gc

* fixes

* materialize_grads

* no pathlib stuff

* add contiguous_backward, fix bugs

* add some realize

* fix multi
2025-01-26 09:12:16 +09:00
George Hotz
46a8c5e1e5 delete forced_realize (#8615)
* delete forced_realize

* put that back

* expectedFailures

* cleaner create_subbuffer

* more comments

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
George Hotz
8f95b578f6 use Estimates class [pr] (#8319)
* use Estimates class [pr]

* frozen dataclass
2024-12-18 10:19:32 -08:00
qazal
9044b0746a delete lazy [pr] (#7801)
* LazyBuffer = UOp

* try 4 at this diff

* skip optimization tests p1

* raise kernel count expectations

* BIND isn't the _only_ uop that can become a tensor

* fix test_ones_sum on symbolic

* bump openpilot, correctness first

* offset on assign is fine

* uop is immutable

* what if this was higher

* more optimization skips

* instant fold const copy

* test_multitensor shouldn't expect buffer for unrealized

* move copy folder to upats

* start BUFFER_VIEW

* kinda BUFFER_VIEW

* Revert "kinda BUFFER_VIEW"

This reverts commit 94b4fe3040.

* BUFFER_VIEW try 2

* linter and missed _device

* pylint

* keep Ops.CONTIGUOUS

* always BUFFER_VIEW disk

* test

* cpu isn't a real device

* buffer references after del

* add that back

* start bringing some of these back

* more test updates

* simpler simplify copy

* subbufer everything

* this is fine with buffer view

* cleanup the diff in test/ 1

* copy is one thing

* diff pruning

* diff pruning 2

* oh bind unbinds way too early

* extra

* more diff pruning

* more const folding

* experiment with symbolic here

* Revert "experiment with symbolic here"

This reverts commit cb87d61f7a.

* Revert "more const folding"

This reverts commit 2a7d258a2b.

* Revert VALID early folding

This reverts commit 4074f52317.

* storing const is fine

* fix test_prefer_half_buffer

* iterate on test_real_world

* this fixes test_train_mnist memory, breaks everything else

* Revert "this fixes test_train_mnist memory, breaks everything else"

This reverts commit dccfcbe068.

* always expect buffer to exist here

* temp debug: something is mutating lazydata in compile3

* Revert "temp debug: something is mutating lazydata in compile3"

This reverts commit 71400f0d55.

* everything back to normal

* compile3

* compile3 test

* start captured jit work, that test passes

* finalized memory skip set

* linter err

* back to base here

* tiny metaop cleanup

* print tensor

* 4th type this unbind got me

* green pickle

* tensor_variable sanity

* cast sanity

* link from the reds

* COPY sanity + minor repr change

* you can exist

* enable test_winograd

* bye bye nbytes

* danger, uop is mutating

* real become

* delete those from uop init

* put it in buffer init

* buffer inits with so much stuff

* buffer pickle try 2

* toposort can't be a cached property

* fix test_schedule_gc_with_inputs

* remove all @unittest.skip(gc)

* Revert "remove all @unittest.skip(gc)"

This reverts commit 9d8d92dd85.

* reenable real world + test_schedule_gc

* test: RUN_PROCESS_REPLAY=0

* fix pickle jit

* test changes

* reenable test_lru_alloc and TestTrain

* fix imagedtype

* bring pr back

* reenable 3 gc tests

* test_schedule better diff

* disable SPLIT_REDUCEOP

* test_save_all_dtypes looks fixed

* fix metadata

* skip that one

* fix viz by not pickling buffers

* simple test for const folding

* bring split reduceop back

* add simplify_alu

* simplify_binop fixes a test

* fix cast folding

* disable that test

* that test looks fine

* changes from delete_lazy pruning p1

* cast folding and children base

* test: cast folding from pruning branch

* green test_sgd_4convs_fuse_conv_bw

* enable some indexing folding

* test_complex_backward is fixed

* prune more, 295 -> 233

* fix test_multi_const_folding_literal

* fix double copy

* early become test

* ooooops

* clean up ctx in all big_graph

* fix openpilot 208 kernels

* train_cifar is fine now

* fix CAST_BEFORE_VIEW

* ever faker const

* back to 13

* mark expectedFailure

* fine don't create them

* test_multi_const_folding_tensor

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-12 05:05:19 +08:00
Ahmed Harmouche
2d11765295 Fix WebGPU atomic store (#7954) 2024-11-29 19:31:25 +08:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger than u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnist now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
chenyu
3b26e51fce Tensor.cummax (#7854)
generalized the existing cumsum to take Ops.MAX in addition to Ops.ADD
2024-11-22 15:55:02 -05:00
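A small usage sketch of cummax next to cumsum; per the commit message above, cummax reuses the cumsum machinery with Ops.MAX as the combining op instead of Ops.ADD:

```python
from tinygrad import Tensor

t = Tensor([1, 3, 2, 5, 4])
print(t.cumsum().numpy())  # [ 1  4  6 11 15]  running sum (Ops.ADD)
print(t.cummax().numpy())  # [1 3 3 5 5]       running max (Ops.MAX)
```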
wozeparrot
c100f3d406 default threefry (#6116) 2024-09-25 17:45:13 +08:00
chenyu
a37e92081a fix unrolled arange folding (#6606)
* fix unrolled arange folding

also added a flop test to test_arange to make sure it's 0 flops

* skip PTX
2024-09-19 09:03:01 -04:00
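A hedged sketch of the "0 flops" style of check mentioned above, assuming tinygrad's GlobalCounters helper; the actual assertion in test_arange may count ops differently:

```python
from tinygrad import Tensor
from tinygrad.helpers import GlobalCounters

GlobalCounters.reset()
Tensor.arange(256).realize()
# if arange folding works, the range is pure index arithmetic and no
# arithmetic ops are counted for the kernel
print(GlobalCounters.global_ops)  # expected: 0
```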
chenyu
15c4d4f406 fold unrolled arange div pattern (#6465) 2024-09-10 22:35:52 -04:00
Roelof van Dijk
ad4b3b457f bump limit for test_llama_embedding_opt (#6332) 2024-08-31 10:03:43 -04:00
George Hotz
16f420f7a7 split full_graph_rewrite and linearize_uop [run_process_replay] (#6215)
* split full_graph_rewrite and linearize_uop

* fix tests

* graph rewrite in test uops

* add types
2024-08-20 20:12:33 -07:00
George Hotz
a5d79688db fix indexing out of bounds (#6208)
* fix indexing out of bounds

* 5 ops per access is fine
2024-08-20 11:34:56 -07:00