Commit Graph

6994 Commits

qazal
8780818d04 Revert "schedule sink folding with graph_rewrite [pr] (#7963)" (#7965)
This reverts commit 4529c5d0da.
2024-11-30 19:02:06 +08:00
qazal
4529c5d0da schedule sink folding with graph_rewrite [pr] (#7963)
* schedule sink folding with graph_rewrite [pr]

* x is reserved, use u

* match lazy const folding
2024-11-30 18:30:41 +08:00
nimlgen
10f431b96d hcq replace update with sint (#7899)
* try sym hcq

* start with amd

* move to nv

* nv works

* cache and qcom

* fixes

* signals

* fix nv

* qcom fixes

* linter

* linter

* cache + typings

* fixes

* tiny fixes

* linter

* linter

* linter

* ugh

* comments
2024-11-29 20:08:13 +03:00
chenyu
aa51f3c14e update kernel counts in test_real_world (#7960)
the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now.

if it's stable we could consider making kernel count strict, which helps changes like #7940
2024-11-29 11:14:54 -05:00
nimlgen
d3660ccc51 prereqs for hcq updates removal (#7959)
* hcq signals touch ups

* hcq compiled has device id

* helpers

* prereq hcq api

* oops
2024-11-29 18:20:07 +03:00
geohotstan
e1a85c262c no type-tracker getitem refactor (#6917)
* newest newer than new refactor of getitem

* hmmm

* hmmmmmmmmmmmmmmmmm

* bro.

* ???

* small improvements

* cleaner, but why u gotta do this to me mypy

* fix, but still dunno about mypy

* even better

* try again? Passes locally

* use match

* fix mypy

* better

* broooooo check this out

* fix mypy

* bug fix

* fixed

* polish
2024-11-29 10:18:02 -05:00
Sieds Lykles
d267a2d9eb Div mod recombine test for issue (#7957)
* Add test for failing div_mod recombine

* Add test case when there is gcd in div/mod
2024-11-29 08:47:50 -05:00
qazal
e54ff0d3af conceptual uop st cleanup [pr] (#7956)
* conceptual uop st cleanup [pr]

* unwrap is fine here, better than arg
2024-11-29 19:35:46 +08:00
Ahmed Harmouche
2d11765295 Fix WebGPU atomic store (#7954) 2024-11-29 19:31:25 +08:00
nimlgen
309dcb1044 hcq signal add sleep (#7955)
* hcqsignal sleep

* fixes

* typing

* time ms is int
2024-11-29 14:04:45 +03:00
qazal
30f0e95fbd don't lru_cache is_scheduled [pr] (#7953) 2024-11-29 17:03:55 +08:00
qazal
f044271898 big graph do_realize cleanup and renames [pr] (#7952)
* scheduler do_realize cleanup and renames [pr]

* big graph is the better name

* more language

* append_kernel -> append_realize
2024-11-29 14:58:45 +08:00
ignaciosica
6e47dc8921 true tc swizzle [pr] (#7951)
* true tc swizzle

* cleanup

* fix linter
2024-11-29 14:39:46 +08:00
geohotstan
765096fe7d fix Tensor._pool edge case (#7581)
* split into another branch

* polish

* try this

* Revert "try this"

This reverts commit 84f711b13e.

* try

* Revert "try"

This reverts commit 89c7a7649b.

* idk anymore

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-28 23:17:13 -05:00
chenyu
70f052d2b8 flip IF and RANGE order (#7947)
this is the rest of #7919 prereqs for new block lin
2024-11-28 13:35:30 -05:00
chenyu
bb23469f93 lower conv threshold on red (#7948) 2024-11-28 13:31:06 -05:00
chenyu
e243e709a7 BLOCK ops in Ops (#7945)
did this break conv speed?
2024-11-28 12:44:22 -05:00
qazal
f39e9b4288 match lazy movement ops in uop [pr] (#7944) 2024-11-28 23:03:43 +08:00
chenyu
f54508549f don't search conv weight init in speed_v_theoretical (#7943) 2024-11-28 10:03:18 -05:00
chenyu
3c8c98253a BEAM_DEBUG=1 in speed_v_theoretical (#7942)
* DEBUG=3 in speed_v_theoretical

* BEAM_DEBUG=1
2024-11-28 08:30:55 -05:00
qazal
aa7e16744e allow sinking childless consts and fold them [pr] (#7941) 2024-11-28 20:23:37 +08:00
qazal
3ab67d45b2 init changes from the global buffer branch [pr] (#7939) 2024-11-28 19:38:58 +08:00
nimlgen
81d415be03 amd pkt3 refactor (#7923)
* amd pkt3 refactor

* replace this

* linter

* fix

* comment

* fast

* simpler

* linter

* smth

* missing
2024-11-28 11:06:37 +03:00
qazal
e3fe7023b0 move all VIEW -> LOAD rules to big graph rewrite [pr] (#7936)
* move all VIEW -> LOAD rules to big graph rewrite [pr]

* comments
2024-11-28 14:02:29 +08:00
qazal
e2eccdab43 swizzle upat consistency + assert it's base [pr] (#7935) 2024-11-28 13:35:55 +08:00
George Hotz
c5c3b05b5a block lin: only the test changes (#7933) 2024-11-28 13:19:00 +08:00
George Hotz
32dbab945c Revert "add block uops and modify tests (#7931)" (#7932)
This reverts commit 6f4519ff45.
2024-11-28 13:15:41 +08:00
George Hotz
6f4519ff45 add block uops and modify tests (#7931) 2024-11-28 13:11:18 +08:00
chenyu
336a9b6bf3 remove dtype from llama precompute_freqs_cis (#7930)
do the cast based on input in first forward call instead
2024-11-27 22:28:40 -05:00
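For context, a minimal sketch of the deferred-cast pattern this commit describes, using the standard llama-style frequency formula; names like `RotaryCache` are hypothetical and this is not tinygrad's actual llama code:

```python
import numpy as np

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> np.ndarray:
  # standard llama-style rotary frequencies, computed in a default float32
  freqs = 1.0 / (theta ** (np.arange(0, dim, 2, dtype=np.float32) / dim))
  angles = np.outer(np.arange(end, dtype=np.float32), freqs)
  return np.stack([np.cos(angles), np.sin(angles)], axis=-1)

class RotaryCache:  # hypothetical wrapper, for illustration only
  def __init__(self, dim: int, end: int):
    self.freqs_cis = precompute_freqs_cis(dim, end)  # no dtype argument anymore

  def __call__(self, x: np.ndarray) -> np.ndarray:
    # cast lazily, based on the dtype of the first input actually seen
    if self.freqs_cis.dtype != x.dtype:
      self.freqs_cis = self.freqs_cis.astype(x.dtype)
    return self.freqs_cis
```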
chenyu
3e2430f822 use tqdm tqdm in mlperf training (#7929)
issue in benchmark dashboard logging, revert back to tqdm tqdm for now
2024-11-27 21:57:05 -05:00
Sieds Lykles
864758423e Don't take const in gcd and change the "nothing_changed" condition (#7926)
* Don't take const in gcd and change the "nothing_changed" condition

The biggest difference is probably that I forgot to check whether the gcd
changed when nothing else changed.
The TODO was fixed by not including the const in the gcd and then factoring
it out

* Fix more tests
2024-11-27 18:07:36 -05:00
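The arithmetic identity behind factoring a gcd out of a mod, as a hedged spot-check (the real rewrite lives in tinygrad's symbolic simplification; this is just the identity):

```python
import math, random

# for g > 0 and b > 0: (g*a) % (g*b) == g * (a % b), so an expression whose
# non-const terms share a gcd with the modulus can be rewritten with a
# smaller modulus, which tightens the resulting bounds
for _ in range(1000):
  g, a, b = random.randint(1, 9), random.randint(-99, 99), random.randint(1, 9)
  assert (g * a) % (g * b) == g * (a % b)
```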
chenyu
988d64900b add TODO case to test_mod_congruence (#7925)
same alu count but better bounds
2024-11-27 15:23:21 -05:00
chenyu
57262c8e34 update Tensor.scatter doc examples (#7924)
same example from torch, i think it's much more useful
2024-11-27 11:42:36 -05:00
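For reference, the dim=0 rule that torch's doc example (and now this one) illustrates is `out[index[i][j]][j] = src[i][j]`; a pure-Python sketch of the semantics, not the tinygrad implementation:

```python
def scatter_dim0(out, index, src):
  # out[index[i][j]][j] = src[i][j] for every position in index
  for i, row in enumerate(index):
    for j, tgt in enumerate(row):
      out[tgt][j] = src[i][j]
  return out

out = [[0] * 5 for _ in range(3)]
src = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
index = [[0, 1, 2, 0]]
print(scatter_dim0(out, index, src))
# [[1, 0, 0, 4, 0], [0, 2, 0, 0, 0], [0, 0, 3, 0, 0]]
```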
geohotstan
cea5853cfa add Tensor.scatter (#7737)
* working I think

* where are my onnx scatter tests??

* forward_only for now

* try if nan hack fix NV

* looks like issue is different... CUDA WHY

* oops that was wrong. Try if this fixes CUDA

* simpler multiply

* actually finish this up tmrw morning :x

* fix tests?

* improve tests

* improve test and implementation

* fix ruff

* complete but lots of expected failure...

* reviewed tests

* add onnx tests

* is this a processing op?

* add return type to indicate that it's not in-place

* final cleanups

* use or and improve tests a little

* add masked_index_select

* call it masked_setitem instead

* try

* FIXED

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:52:04 -05:00
JaSpa99
38f34ca0cb prepare mypy==1.13.0: legacy cast (#7866)
* use helper to narrow literal type

* narrow with asserts instead of cast

* remove parentheses

* tensor.item() calls tensor.data()

* no copy

* proper indexing
2024-11-27 10:33:35 -05:00
geohotstan
753f07e193 add circular pad mode to Tensor.pad (#7918)
* start

* send it

* no more neg circular pads

* quick fix onnx too

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:30:51 -05:00
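Circular padding fills the new border by wrapping values around from the opposite edge, so the data tiles periodically; a minimal numpy illustration of the behavior (tinygrad's `Tensor.pad` signature may differ):

```python
import numpy as np

x = np.arange(1, 7).reshape(2, 3)                  # [[1 2 3], [4 5 6]]
padded = np.pad(x, ((0, 0), (1, 1)), mode="wrap")  # wrap one column on each side
print(padded)
# [[3 1 2 3 1]
#  [6 4 5 6 4]]
```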
chenyu
a58e289d77 Revert "prereqs for new block lin so PR works (#7919)" (#7921)
This reverts commit c53261b541.
2024-11-27 08:41:09 -05:00
George Hotz
c53261b541 prereqs for new block lin so PR works (#7919) 2024-11-27 15:07:54 +08:00
chenyu
a6171cbe71 add stable diffusion v2 to mac benchmark (#7917)
this caught #7902
2024-11-26 22:09:43 -05:00
Sieds Lykles
d318867776 Factoring gcd out of mod (#7916)
* Factoring gcd out of mod

Curious if this will be faster/better

* Update bounds on test
2024-11-26 21:17:22 -05:00
nimlgen
84f96e48a1 hcq signal tiny refactor (#7913)
* hcq signal tiny refactor

* no mv

* fix

* fix2

* fix3
2024-11-26 21:48:38 +03:00
qazal
345457f518 webgpu cache packages (#7911)
* webgpu -n=auto

* fix webgpu ci cache
2024-11-27 00:17:36 +08:00
qazal
6102e3159c webgpu -n=auto (#7910) 2024-11-26 21:13:12 +08:00
qazal
cab461c2b5 match lazy view in uop try 2 (#7905)
* match lazy view in uop

* reswizzle

* p2

* assert count

* empty

* smaller diff
2024-11-26 20:31:50 +08:00
qazal
ea57c52b99 base uop is always contiguous (#7907)
* base is always contiguous

* add test_late_fusion_post_permute_simpler

* Revert "swizzle tc [pr] (#7633)"

This reverts commit f02462c5cb.

* Revert "Revert "swizzle tc [pr] (#7633)""

This reverts commit a26b577d86.

* yay

* minimal diff
2024-11-26 20:13:29 +08:00
qazal
ceda43ce75 always swizzle load st in wmma [pr] (#7908) 2024-11-26 20:00:58 +08:00
George Hotz
4e5bf9dc7a test assignment in jit (#7906)
* test assignment in jit

* don't waste lines

* skip broken test in webgpu
2024-11-26 17:37:00 +08:00
mesozoic-egg
0cd1cc29dc PTX simplify: use a dict matcher for prefix [pr] (#7890)
* use a dict matcher for prefix

* simplify tuple unpack

* simplify tuple unpack

* debug pr

* Revert "debug pr"

This reverts commit 3aa9f77517.

* define_acc boolean case

* remove commented lines

* wip

* no need for .scalar in define_acc

* indentation

* linter fix

* add keys to matcher from GroupOps directly

* put dtype in tuple directly

* cast, line too long fix

* check ptrdtype with isinstance

* dtype is always ptr for define_global

wip

* blank commit to trigger CI

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
2024-11-26 17:32:48 +08:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu; efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger-than-u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnist now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
chenyu
ff3f2a9c1a Revert "move attention upcast (#7830)" (#7903)
This reverts commit c07daf40e7.
2024-11-25 18:59:51 -05:00