Commit Graph

7215 Commits

Author SHA1 Message Date
qazal
1824cbd72c s/lazybufs/tensor_uops [pr] (#8207) 2024-12-13 19:20:02 +08:00
qazal
6d6c34eb1e scheduler local graph_rewrite cleanups [pr] (#8206)
* scheduler local graph_rewrite cleanups [pr]

* extra merge
2024-12-13 19:07:09 +08:00
qazal
4a617c84e1 cleanup ctx usage in scheduler upats [pr] (#8205) 2024-12-13 18:01:13 +08:00
qazal
55b8c4e8bf apply_swizzle can apply to any views [pr] (#8204) 2024-12-13 17:58:35 +08:00
Ahmed Harmouche
651f72442c encapsulate the exported webgpu model (#8203) 2024-12-13 10:55:37 +01:00
qazal
5864627abe process replay filter warnings [pr] (#8199) 2024-12-13 17:43:43 +08:00
qazal
c5c0d0277d flatten buffer args, delete dtype [pr] (#8202) 2024-12-13 16:43:47 +08:00
Ahmed Harmouche
5198415bfb No unpack_map in wgsl (#8200) 2024-12-13 08:10:31 +01:00
leopf
fe68dbdb23 GroupOp.Idempotent (#8198) 2024-12-12 20:44:04 -05:00
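The commit title gives no detail, but assuming `Idempotent` has its usual meaning (`x op x == x`, as for max/and/or), a hypothetical sketch of the fold such an op group enables in a rewriter; the names here are illustrative, not tinygrad's:

```python
import operator

# Hypothetical "idempotent" op group: for these binary ops, x op x == x,
# so a rewriter can fold the expression to a single operand.
IDEMPOTENT = {"max": max, "and": operator.and_, "or": operator.or_}

def fold_idempotent(op: str, a, b):
    if op in IDEMPOTENT and a == b:
        return a  # folded: the op is never executed
    return IDEMPOTENT[op](a, b)

print(fold_idempotent("max", 7, 7))  # 7
print(fold_idempotent("or", 4, 1))   # 5
```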
chenyu
ce41e6572d unit test merge_dim [pr] (#8195)
looking for better ways to write this. first adding some tests
2024-12-12 17:55:52 -05:00
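A sketch of the usual dim-merging rule (an assumption about what `merge_dim` tests, not tinygrad's exact implementation): two adjacent dims of a strided view collapse into one when the outer stride equals inner size times inner stride, i.e. memory is contiguous across the pair:

```python
def merge_dims(shape, strides):
    # Hypothetical sketch: walk (size, stride) pairs and merge an adjacent
    # pair when outer_stride == inner_size * inner_stride.
    out = [(shape[0], strides[0])]
    for s, st in zip(shape[1:], strides[1:]):
        prev_s, prev_st = out[-1]
        if prev_st == s * st:
            out[-1] = (prev_s * s, st)  # contiguous across the pair: merge
        else:
            out.append((s, st))
    return tuple(out)

# a fully contiguous (2, 3, 4) view merges down to a single dim of 24
print(merge_dims((2, 3, 4), (12, 4, 1)))  # ((24, 1),)
```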
chenyu
d47530c0d4 fix device canonicalize for :0 in middle [pr] (#8193)
`replace` is wrong because it does not check whether the `:0` is at the end; use `re.sub` instead
2024-12-12 16:32:36 -05:00
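The fix above can be sketched as follows; `strip_default_index` is a hypothetical stand-in for tinygrad's canonicalize logic, but it shows why an end-anchored `re.sub` is needed where `str.replace` is not:

```python
import re

def strip_default_index(device: str) -> str:
    # Strip ":0" only at the end of the device string. A plain
    # device.replace(":0", "") would also delete a ":0" that appears
    # in the middle, e.g. in "DISK:0:weights".
    return re.sub(r":0$", "", device)

print(strip_default_index("CUDA:0"))          # CUDA
print(strip_default_index("DISK:0:weights"))  # DISK:0:weights
```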
chenyu
40a4c603b9 remove more test skip for webgpu [pr] (#8192) 2024-12-12 14:06:35 -05:00
chenyu
d586c7e108 remove had_counter from rand (#8191) 2024-12-12 13:35:39 -05:00
chenyu
2fe98e44cd unneeded isinstance(size, int) in alloc [pr] (#8189) 2024-12-12 13:05:02 -05:00
chenyu
72ff631f8d remove unreachable tensor dtype assert (#8190)
it would have failed in `to_dtype`. added some tests for it too
2024-12-12 13:04:49 -05:00
chenyu
2e4c7d4cfb add "tinygrad" to be part of cache_dir [pr] (#8188)
instead of having sqlite / http download / metal compile each add "tinygrad" separately. Also make it non-private since it's used in metal
2024-12-12 12:09:44 -05:00
Ahmed Harmouche
db76586780 Cast pattern touchup in AMDRenderer [pr] (#8185) 2024-12-12 15:12:14 +01:00
nimlgen
bf7d1fcd2c tiny import fixes in hcq graph (#8184) 2024-12-12 16:30:06 +03:00
Ahmed Harmouche
2f2b1e792c wgsl and ops_webgpu simplifications [pr] (#8182)
Simplify wgsl and ops_webgpu
2024-12-12 14:21:58 +01:00
George Hotz
d9a0880d33 delete fuzz uops (not tested) [pr] (#8181) 2024-12-12 01:41:27 -08:00
George Hotz
c77cb57454 remove untested BEAM_COMPARE=1 [pr] (#8180) 2024-12-12 01:35:27 -08:00
Ahmed Harmouche
1b94cc095a Bump back wgpu to latest (#8179) 2024-12-12 09:40:52 +01:00
chenyu
97aaa50f3a remove duplicated UOp in Tensor init types [pr] (#8177)
and a small comment
2024-12-11 22:59:35 -05:00
chenyu
d240bdd172 remove upcast_in_mid_reduce_axes [pr] (#8176) 2024-12-11 22:14:28 -05:00
chenyu
64a917b7eb remove LAZYCACHE ContextVar [pr] (#8175)
also removed from resnet latest script
2024-12-11 22:02:52 -05:00
chenyu
7047ffd27d tiny gguf_load cleanup [pr] (#8174)
round_up helper
2024-12-11 21:32:52 -05:00
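The helper named in this message is a common alignment utility; assuming the usual definition (round an integer up to the next multiple of another), a minimal sketch:

```python
def round_up(num: int, amt: int) -> int:
    # round num up to the next multiple of amt using ceiling division
    return (num + amt - 1) // amt * amt

print(round_up(10, 8))  # 16
```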
George Hotz
151ac5f5a2 remove UPCASTMID [pr] (#8173) 2024-12-11 17:29:01 -08:00
George Hotz
f86e0014b7 delete CAPTURE_BEAM, this should use PR or VIZ infrastructure instead [pr] (#8172) 2024-12-11 16:29:03 -08:00
George Hotz
8a04a3a77a rename LazyBuffer -> UOp [pr] (#8169)
* rename LazyBuffer -> UOp [pr]

* fix docs
2024-12-11 16:15:52 -08:00
George Hotz
e0fe867c74 delete beam compare 2 [pr] (#8168) 2024-12-11 16:10:01 -08:00
chenyu
aaa3cc235d unused from __future__ import annotations (#8171) 2024-12-11 19:05:04 -05:00
George Hotz
aae2f4da8d fix process replay [pr] (#8170)
* empty change [pr]

* store the context in PROCESS_REPLAY_CAPTURE
2024-12-11 15:58:42 -08:00
qazal
9044b0746a delete lazy [pr] (#7801)
* LazyBuffer = UOp

* try 4 at this diff

* skip optimization tests p1

* raise kernel count expectations

* BIND isn't the _only_ uop that can become a tensor

* fix test_ones_sum on symbolic

* bump openpilot, correctness first

* offset on assign is fine

* uop is immutable

* what if this was higher

* more optimization skips

* instant fold const copy

* test_multitensor shouldn't expect buffer for unrealized

* move copy folder to upats

* start BUFFER_VIEW

* kinda BUFFER_VIEW

* Revert "kinda BUFFER_VIEW"

This reverts commit 94b4fe3040.

* BUFFER_VIEW try 2

* linter and missed _device

* pylint

* keep Ops.CONTIGUOUS

* always BUFFER_VIEW disk

* test

* cpu isn't a real device

* buffer references after del

* add that back

* start bringing some of these back

* more test updates

* simpler simplify copy

* subbuffer everything

* this is fine with buffer view

* cleanup the diff in test/ 1

* copy is one thing

* diff pruning

* diff pruning 2

* oh bind unbinds way too early

* extra

* more diff pruning

* more const folding

* experiment with symbolic here

* Revert "experiment with symbolic here"

This reverts commit cb87d61f7a.

* Revert "more const folding"

This reverts commit 2a7d258a2b.

* Revert VALID early folding

This reverts commit 4074f52317.

* storing const is fine

* fix test_prefer_half_buffer

* iterate on test_real_world

* this fixes test_train_mnist memory, breaks everything else

* Revert "this fixes test_train_mnist memory, breaks everything else"

This reverts commit dccfcbe068.

* always expect buffer to exist here

* temp debug: something is mutating lazydata in compile3

* Revert "temp debug: something is mutating lazydata in compile3"

This reverts commit 71400f0d55.

* everything back to normal

* compile3

* compile3 test

* start captured jit work, that test passes

* finalized memory skip set

* linter err

* back to base here

* tiny metaop cleanup

* print tensor

* 4th type this unbind got me

* green pickle

* tensor_variable sanity

* cast sanity

* link from the reds

* COPY sanity + minor repr change

* you can exist

* enable test_winograd

* bye bye nbytes

* danger, uop is mutating

* real become

* delete those from uop init

* put it in buffer init

* buffer inits with so much stuff

* buffer pickle try 2

* toposort can't be a cached property

* fix test_schedule_gc_with_inputs

* remove all @unittest.skip(gc)

* Revert "remove all @unittest.skip(gc)"

This reverts commit 9d8d92dd85.

* reenable real world + test_schedule_gc

* test: RUN_PROCESS_REPLAY=0

* fix pickle jit

* test changes

* reenable test_lru_alloc and TestTrain

* fix imagedtype

* bring pr back

* reenable 3 gc tests

* test_schedule better diff

* disable SPLIT_REDUCEOP

* test_save_all_dtypes looks fixed

* fix metadata

* skip that one

* fix viz by not pickling buffers

* simple test for const folding

* bring split reduceop back

* add simplify_alu

* simplify_binop fixes a test

* fix cast folding

* disable that test

* that test looks fine

* changes from delete_lazy pruning p1

* cast folding and children base

* test: cast folding from pruning branch

* green test_sgd_4convs_fuse_conv_bw

* enable some indexing folding

* test_complex_backward is fixed

* prune more, 295 -> 233

* fix test_multi_const_folding_literal

* fix double copy

* early become test

* ooooops

* clean up ctx in all big_graph

* fix openpilot 208 kernels

* train_cifar is fine now

* fix CAST_BEFORE_VIEW

* ever faker const

* back to 13

* mark expectedFailure

* fine don't create them

* test_multi_const_folding_tensor

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-12 05:05:19 +08:00
chenyu
26e049ab40 add ALLOWED_READ_IMAGE=2131 to openpilot (#8166)
added as an exact-number check for now, as it's not clear whether more or fewer than allowed is any better
2024-12-11 12:14:17 -08:00
chenyu
0e57152dbb clean up test_uop_symbolic [pr] (#8165)
removed old `Node` references
2024-12-11 14:13:19 -05:00
chenyu
5eadae204b test multi device rand with manual_seed (#8164) 2024-12-11 13:11:31 -05:00
Maxim Zakharov
e53a5bf0c3 StableDiffusion UI - convenient send via Enter (#8160) 2024-12-11 19:05:24 +01:00
qazal
047a6dabc3 prereq for scheduler contiguous_child [pr] (#8163)
* the whole context is fine here [pr]

* fix that
2024-12-12 02:02:22 +08:00
ignaciosica
3a8e8ac6c2 remove dead code (#8161) 2024-12-11 12:07:19 -05:00
George Hotz
8f4299fcc8 hotfix: suppress shutdown errors in CLProgram 2024-12-11 08:08:32 -08:00
Ahmed Harmouche
a73e3677d0 Test linearizer on webgpu (#8159)
* Test linearizer on wgpu

* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
qazal
b894657aa7 assert the same things without mutating or accessing internal ops state [pr] (#8157)
* don't mutate internal state in test_lazybuffer

* fix test_schedule internals

* save time

* third si

* fine sometimes buffer_view isn't there
2024-12-11 22:01:27 +08:00
qazal
63de8f2208 late scheduler context builder [pr] (#8155) 2024-12-11 19:59:39 +08:00
chenyu
d462f8ace0 use HALF in cifar wino benchmarks (#8153)
more representative as it hits tensor cores on tinyboxes
2024-12-10 20:21:00 -05:00
George Hotz
c8e7707a7e hotfix: disable flaky move tensor test 2024-12-10 17:11:21 -08:00
chenyu
155f7df599 lower test_gemm_4096 expectation on green (#8152)
getting 119 sometimes, so lowered to 115
2024-12-10 18:05:12 -05:00
chenyu
c4be1529cf update test for Tensor.softplus (#8150)
test beta and extreme inputs.
to pass big inputs, it needs to support `threshold`, which needs a fix on backward that we punt on until the new gradient api
2024-12-10 17:48:02 -05:00
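The `threshold` behavior mentioned here mirrors PyTorch's softplus definition (switch to identity once `beta * x` exceeds `threshold`, since `log1p(exp(...))` would overflow there anyway); a minimal numeric sketch, not tinygrad's actual implementation:

```python
import math

def softplus(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    # softplus(x) = log(1 + exp(beta*x)) / beta; above `threshold`,
    # the result is numerically equal to x, so return x directly to
    # avoid overflowing exp() on big inputs
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta

print(round(softplus(1.0), 4))  # 1.3133
```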
Ahmed Harmouche
a8cfdc70ed Run more webgpu tests (#8142) 2024-12-10 23:20:04 +01:00
Ahmed Harmouche
ed7318a3f5 Fix puppeteer install (#8148)
Clean npm cache before puppeteer install
2024-12-10 23:06:33 +01:00
George Hotz
a1b3724ff8 prepickle process replay [pr] (#8147) 2024-12-10 11:46:36 -08:00