Commit Graph

4433 Commits

Sieds Lykles
864758423e Don't take const in gcd and change the "nothing_changed" condition (#7926)
* Don't take const in gcd and change the "nothing_changed" condition

The biggest difference is probably that I forgot to check whether the gcd changed when nothing else changed.
The TODO was fixed by not using the const in the gcd, and then taking it out.

* Fix more tests
2024-11-27 18:07:36 -05:00
chenyu
988d64900b add TODO case to test_mod_congruence (#7925)
same alu count but better bounds
2024-11-27 15:23:21 -05:00
geohotstan
cea5853cfa add Tensor.scatter (#7737)
* working I think

* where are my onnx scatter tests??

* forward_only for now

* try if nan hack fix NV

* looks like issue is different... CUDA WHY

* oops that was wrong. Try if this fixes CUDA

* simpler multiply

* actually finish this up tmrw morning :x

* fix tests?

* improve tests

* improve test and implementation

* fix ruff

* complete but lots of expected failure...

* reviewed tests

* add onnx tests

* is this a processing op?

* add return type to indicate that it's not in-place

* final cleanups

* use or and improve tests a little

* add masked_index_select

* call it masked_setitem instead

* try

* FIXED

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:52:04 -05:00
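
A minimal usage sketch of the new Tensor.scatter. The semantics are assumed to mirror torch.Tensor.scatter; the out-of-place return is what the "add return type to indicate that it's not in-place" bullet points at.

    from tinygrad import Tensor

    # for dim=0 the torch-style rule is: out[index[i][j]][j] = src[i][j]
    t = Tensor.zeros(3, 3)
    index = Tensor([[0, 1, 2]])
    src = Tensor([[1.0, 2.0, 3.0]])
    out = t.scatter(0, index, src)  # returns a new tensor, t is untouched
    print(out.numpy())
    # [[1. 0. 0.]
    #  [0. 2. 0.]
    #  [0. 0. 3.]]
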
geohotstan
753f07e193 add circular pad mode to Tensor.pad (#7918)
* start

* send it

* no more neg circular pads

* quick fix onnx too

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:30:51 -05:00
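
A quick sketch of the new mode, assuming the keyword follows torch.nn.functional.pad's "circular" semantics; per the "no more neg circular pads" bullet, negative pad amounts are not supported with this mode.

    from tinygrad import Tensor

    x = Tensor([1, 2, 3])
    # circular padding wraps values around instead of filling with a constant
    print(x.pad((1, 1), mode="circular").numpy())  # [3 1 2 3 1]
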
chenyu
a58e289d77 Revert "prereqs for new block lin so PR works (#7919)" (#7921)
This reverts commit c53261b541.
2024-11-27 08:41:09 -05:00
George Hotz
c53261b541 prereqs for new block lin so PR works (#7919) 2024-11-27 15:07:54 +08:00
Sieds Lykles
d318867776 Factoring gcd out of mod (#7916)
* Factoring gcd out of mod

Curious if this will be faster/better

* Update bounds on test
2024-11-26 21:17:22 -05:00
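
The rewrite being described: when every term of a sum shares a gcd with the modulus, the gcd can be pulled out before folding the mod. A numeric spot-check of that identity (illustrative only, not the symbolic-rewrite code):

    # gcd(6, 9, 12) == 3, so (6*x + 9) % 12 folds to 3 * ((2*x + 3) % 4)
    for x in range(1000):
        assert (6 * x + 9) % 12 == 3 * ((2 * x + 3) % 4)
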
qazal
ea57c52b99 base uop is always contiguous (#7907)
* base is always contiguous

* add test_late_fusion_post_permute_simpler

* Revert "swizzle tc [pr] (#7633)"

This reverts commit f02462c5cb.

* Revert "Revert "swizzle tc [pr] (#7633)""

This reverts commit a26b577d86.

* yay

* minimal diff
2024-11-26 20:13:29 +08:00
George Hotz
4e5bf9dc7a test assignment in jit (#7906)
* test assignment in jit

* don't waste lines

* skip broken test in webgpu
2024-11-26 17:37:00 +08:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger-than-u32 u32 literal

* Why was skip copies added here?

* Python 3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnist now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
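
The "packed load/store" bullets are about sub-32-bit dtypes (char, uchar, short, ushort) on a backend whose storage buffers are word-addressed: base WGSL has no 8-bit storage type, so four uint8 values share one u32 word. A Python model of the shift-and-mask arithmetic (the in-tree version is WGSL generated by pattern-matcher rules, and a real GPU store also has to avoid write races within a word):

    def load_u8(words: list[int], idx: int) -> int:
        # pick the byte out of the u32 word that holds element idx
        return (words[idx // 4] >> (8 * (idx % 4))) & 0xFF

    def store_u8(words: list[int], idx: int, val: int) -> None:
        shift = 8 * (idx % 4)
        # clear the target byte, then OR in the new value
        words[idx // 4] = (words[idx // 4] & ~(0xFF << shift)) | ((val & 0xFF) << shift)

    words = [0]
    store_u8(words, 2, 0xAB)
    assert load_u8(words, 2) == 0xAB
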
chenyu
ff3f2a9c1a Revert "move attention upcast (#7830)" (#7903)
This reverts commit c07daf40e7.
2024-11-25 18:59:51 -05:00
chenyu
a49ca0c2ff clean up fully_flatten [pr] (#7885)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-25 06:53:18 -05:00
qazal
e823de3828 viz with bottom_up=True (#7894)
* add failing test

* single pass it

* linter
2024-11-25 17:56:48 +08:00
George Hotz
9d0038bccb small changes from block linearizer [pr] (#7888)
* small changes from block linearizer [pr]

* fix test_gc
2024-11-25 15:27:04 +08:00
Sieds Lykles
a49a7c4784 Improved mod folding (#7887)
* Remove unnecessary if statement

In all paths where something_changed was set to True, remainder is
appended, so the list can't be empty

* Working version of improved mod folding

* Fix offset calculation

Passing fuzz_symbolic.py to 130_000 so far
Added an extra test

* Cleaner offset calculation
2024-11-24 22:21:34 -05:00
leopf
5d92efb121 [BUGFIX] Tensor([]).data() (#7884)
* added test, fix

* fix only for (0,) shape

* Revert "fix only for (0,) shape"

* test_data_empty_multi_dim
2024-11-24 16:42:57 -05:00
geohotstan
ad9df26fba add test for inconsistent behavior in float to int casting (#7870)
* found teeny bug

* no healthcheck

* change function name
2024-11-24 07:31:34 -05:00
qazal
5aee78a0a6 fix uop swizzle on BUFFER, new tests (#7875)
* fix uop swizzle on BUFFER, new tests

* can have view of view
2024-11-24 17:11:09 +08:00
George Hotz
8c3d3181dd bottom up rewrite fixes substitute [pr] (#7862)
* single pass rewrite fixes substitute [pr]

* caching for single_pass_rewrite

* allow multiple rewrites

* a simple test

* bottom_up_rewrite is fully flexible
2024-11-23 20:53:37 +08:00
George Hotz
144e9f00df viz is local, new test, and new quantize [pr] (#7859)
* viz is local, new test, and new quantize [pr]

* fix mime types

* remove font

* after index
2024-11-23 14:27:10 +08:00
chenyu
c07daf40e7 move attention upcast (#7830)
still upcasts before softmax, but faster because the intermediate buffer can be stored in half (as long as qk is within half range).
2024-11-22 17:10:51 -05:00
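
A sketch of the layout this describes (shapes illustrative, not the actual attention code): the large qk intermediate stays in half, and the cast to float happens only around the softmax.

    from tinygrad import Tensor, dtypes

    q, k, v = [Tensor.randn(1, 8, 64, 32, dtype=dtypes.half) for _ in range(3)]
    qk = q @ k.transpose(-2, -1) * (1.0 / 32**0.5)  # big intermediate, stored in half
    attn = qk.cast(dtypes.float).softmax(-1).cast(dtypes.half)  # upcast only for softmax
    print((attn @ v).shape)  # (1, 8, 64, 32)
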
chenyu
5c5b1b994c less flaky benchmarks (#7855)
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
3b26e51fce Tensor.cummax (#7854)
generalizes the existing cumsum to take Ops.MAX in addition to Ops.ADD
2024-11-22 15:55:02 -05:00
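
Usage sketch, assuming cummax takes an axis argument like the cumsum it generalizes:

    from tinygrad import Tensor

    t = Tensor([1, 3, 2, 5, 4])
    print(t.cumsum(0).numpy())  # [ 1  4  6 11 15]
    print(t.cummax(0).numpy())  # running maximum: [1 3 3 5 5]
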
chenyu
40d7535eeb clean up DTYPES_DICT [pr] (#7845) 2024-11-22 10:01:34 -05:00
qazal
9828277c03 view doesn't have buffer, fix the tests [pr] (#7841)
* view doesn't have buffer, fix the tests [pr]

* need assigns
2024-11-22 20:41:55 +08:00
chenyu
69e382216d fix wino conv output dtype for half inputs (#7829) 2024-11-21 12:13:54 -05:00
geohotstan
cf1ec90ad4 add inverse trig functions to Tensor (#7805)
* implement inverse trig functions

* guess we should still test nans?

* magnitude as variable name :D

* reorder onnx_ops ops

* approximation -> x for consistency

* address feedback

* simpler acos

* improvement?

* actually just have asin depend on atan

* actually this is nicer

* remove a comment

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-21 09:13:36 -05:00
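
The "actually just have asin depend on atan" bullet points at the standard identity asin(x) = atan(x / sqrt(1 - x^2)), with acos(x) = pi/2 - asin(x) on top of it. A spot-check of the identity, plus the new Tensor methods:

    import math
    from tinygrad import Tensor

    # the identity asin is derived from, checked in plain Python
    assert abs(math.atan(0.5 / math.sqrt(1 - 0.5**2)) - math.asin(0.5)) < 1e-12

    x = Tensor([-0.9, 0.0, 0.5])
    print(x.asin().numpy(), x.acos().numpy(), x.atan().numpy())
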
qazal
5399ff6d06 add UOp.const_with_shape [pr] (#7825)
* add UOp.const_with_shape [pr]

* lines
2024-11-21 21:13:23 +08:00
qazal
e378aeb94e assert view degrade to const tests post scheduler graph_rewrite [pr] (#7822)
* assert view degrade to const tests post scheduler graph_rewrite [pr]

* low pri, probably tricky, todo
2024-11-21 19:00:41 +08:00
qazal
75c082b883 move CONST/BIND -> VALID to matchers (#7818)
* delete special const

* move CONST/BIND -> VALID to matchers

* unittests

* fix FUSE_ARANGE=1

* split into two upats

* the right way to access view
2024-11-21 16:07:01 +08:00
George Hotz
e9ae2ccd09 _prg to match _buf [pr] (#7816) 2024-11-21 12:44:48 +08:00
George Hotz
c5d458ce02 BufferSpec and ProgramSpec [pr] (#7814)
* BufferSpec and ProgramSpec [pr]

* delete preallocate, it's unused

* Revert "delete preallocate, it's unused"

This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
9df5a62c5e unify to HWQueue [pr] (#7812)
* unify to HWCommandQueue [pr]

* all is HWQueue
2024-11-21 10:33:08 +08:00
chenyu
11cea00090 lower vs_theoretical conv tflops threshold for nv (#7811)
less flaky
2024-11-20 20:03:49 -05:00
ignaciosica
fc3154a7b3 metal bf16 tc support [pr] (#7408)
* add bf16 tc for metal

* hotfix: spacing

* fix tolerance and skip metal bf16 in ci

* hotfix: check for dtype_out

* hotfix: add check for tc.dtype_out is bf16 back

* hotfix: add parens
2024-11-20 14:39:08 -05:00
geohotstan
66a069ee25 add replicate mode to Tensor.pad (#7802)
* base implementation

* add tests

* actually remove the assertionerror test

* good
2024-11-20 08:39:58 -05:00
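
A sketch of the new mode, assuming torch-style "replicate" semantics where the edge value is repeated outward:

    from tinygrad import Tensor

    x = Tensor([1, 2, 3])
    print(x.pad((2, 2), mode="replicate").numpy())  # [1 1 1 2 3 3 3]
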
George Hotz
eb0bb7dc0b final dname to device [pr] (#7806)
* final dname to device [pr]

* oops, fix nv
2024-11-20 20:20:28 +08:00
George Hotz
bc977fec53 dname -> device [pr] (#7804)
* dname -> device [pr]

* a few more

* only one left
2024-11-20 17:57:14 +08:00
ttomsa
9adeb1041c fix advanced setitem with 1 in shape (#7797)
* fix advanced setitem with 1 in shape

* linter
2024-11-19 20:04:59 -05:00
ttomsa
170ece6605 fix advanced setitem overlap with 0 (#7793)
* fix advanced setitem overlap with 0

* fix comment
2024-11-19 16:03:55 -05:00
Gaétan Lepage
159c0bf25e test_kernel_cache_in_action: fix test (#7792) 2024-11-19 13:34:56 -05:00
Eitan Turok
56017c52a0 Raise error when model architecture does not match state dict (#7772)
* init

* style

* style

* style

* fix test
2024-11-20 00:11:54 +08:00
George Hotz
d71fe7faa5 rename allocator methods to not conflict [pr] (#7788)
* rename allocator methods to not conflict [pr]

* forgot those

* transfer + offset
2024-11-20 00:10:29 +08:00
geohotstan
aeaf574a05 add failure test for setitem bug (#7786)
* add failure test

* rename

* improve tests

* improve tests and no need numpy
2024-11-19 08:54:21 -05:00
qazal
1e31b5ba6b hotfix: ctx doesn't impact process replay [pr] (#7785) 2024-11-19 20:17:01 +08:00
chenyu
26200574dc load_state_dict test cases when model and data shard differently (#7774)
current behavior is weird... when the model is sharded and the state_dict is not, load shards the state_dict and the model's shard axis does not change.
but if the model and the state_dict are sharded differently, the model's shard axis becomes the state_dict's axis after load.

it should either always use the model's shard axis or always use the state_dict's shard axis
2024-11-18 16:08:24 -05:00
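
A sketch of the ambiguous case the commit describes (device names and shapes are illustrative; this reproduces the inputs, it is not a fix):

    from tinygrad import Tensor, Device
    from tinygrad.nn.state import load_state_dict

    devices = (f"{Device.DEFAULT}:0", f"{Device.DEFAULT}:1")

    class Model:
        def __init__(self):
            self.w = Tensor.ones(4, 4).shard(devices, axis=0)  # model sharded on axis 0

    m = Model()
    sd = {"w": Tensor.zeros(4, 4).shard(devices, axis=1)}  # state_dict sharded on axis 1
    load_state_dict(m, sd)
    # per the commit: m.w ends up sharded on axis 1, the state_dict's axis
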
Francis Lata
a1c1b9547f Context manager support for tqdm (#7770)
* add context manager support

* add test case for context manager usage
2024-11-18 14:12:03 -05:00
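
Minimal usage of the new context-manager form, assuming tinygrad's tqdm clone (tinygrad.helpers.tqdm) keeps the familiar total/desc/update interface:

    from tinygrad.helpers import tqdm

    with tqdm(total=100, desc="work") as pbar:  # __exit__ finishes the bar
        for _ in range(100):
            pbar.update(1)
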
geohotstan
8100109c9d Add replicate mode to Tensor.pad (#7608)
* base implementation

* add tests

* actually remove the assertionerror test

* actually only have reflect for this pr

* change the 4 if-else one liner

* maybe use a lambda

* fix

* maybe a lil cleaner

* fix tests

* complete

* small change

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-18 10:55:38 -05:00
chenyu
66d7d5af50 fix Tensor(MultiLazyBuffer) with different dtype should fail (#7757)
similar to Tensor(LazyBuffer) as we don't cast implicitly
2024-11-17 21:05:45 -05:00
chenyu
df817297b6 fix passing acc_dtype="" to Tensor.prod should fail (#7750)
similar to sum
2024-11-17 11:38:13 -05:00