Commit Graph

7952 Commits

Author · SHA1 · Message · Date
nimlgen
f986e12f91 metal: choose compile spec based on macos (#9188)
* metal: choose compile spec based on macos

* correction
2025-02-21 00:43:39 +03:00
chenyu
3e22747799 run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]

* testing_unit + windows
2025-02-20 14:40:41 -05:00
chenyu
287de4ecc6 use torch in test_gradient (#9186)
used torch.autograd.grad, but not sure if it can be a template like jax
2025-02-20 12:26:11 -05:00
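
For context on the commit above, a minimal sketch of the torch.autograd.grad pattern used as a reference gradient in such a test; the tensors and function below are illustrative only, not the actual test case.

```python
import torch

# Reference gradient via torch.autograd.grad: for y = sum(x*x), dy/dx = 2x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x * x).sum()
(gx,) = torch.autograd.grad(y, [x])
print(gx)  # tensor([2., 4., 6.])
```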
qazal
574a905291 Fix running VIZ=1 after package installation + test (#9183)
* test running viz from pip install

* add pkg

* do 10 connection attempts

* include assets in package_data

* quiet curl

* better print
2025-02-20 15:02:00 +01:00
chenyu
1692087db5 _one_hot_along_dim input needs to be int (#9179)
* _one_hot_along_dim input needs to be int

indexing and onehot compare with arange, and non-int dtype is likely a bug
2025-02-20 09:00:43 -05:00
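
As a rough illustration of the reasoning in the commit above (not the actual _one_hot_along_dim helper): one-hot encoding is built by comparing integer indices against an arange, and with a float index tensor that equality comparison can silently misbehave, which is why the input needs to be int.

```python
from tinygrad import Tensor, dtypes

# Illustrative one-hot via an arange comparison; indices must be an integer dtype.
idx = Tensor([2, 0, 1], dtype=dtypes.int32)
num_classes = 4
one_hot = (idx.unsqueeze(-1) == Tensor.arange(num_classes)).where(1, 0)
print(one_hot.numpy())
# [[0 0 1 0]
#  [1 0 0 0]
#  [0 1 0 0]]
```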
George Hotz
bf36967883 cuda hooking (#9180)
* cuda hooking

* progress

* more hook cuda

* fix params

* compile + cuMemHostAlloc hook

* work

* revert that
2025-02-20 19:20:01 +08:00
chenyu
3b37cc898b add bert tiny config (#9177)
set with BERT_SIZE=tiny. easier to study embedding and fusion
2025-02-19 14:57:03 -05:00
qazal
5662c898f1 correctly step through bottom_up_rewrites in viz [pr] (#9176) 2025-02-19 19:20:57 +01:00
peppingdore
b1ddb2a1a6 fix win32 CPUProgram missing cache flush (#9171)
* win32: fix missing inst cache flush, rename ptr->self.mem for consistency with posix code

* fix types, remove assert

* fix memory leak

* rm whitespace
2025-02-19 21:38:51 +08:00
qazal
1bb9d78c7a hotfix: add output buffer back to kernel parents + comment [pr] (#9174) 2025-02-19 14:22:01 +01:00
chenyu
975c318dbc bert use int32 for input ids (#9173)
original data was int32 for these. float might have caused precision issues
2025-02-19 08:17:27 -05:00
qazal
e4a8bf28ea scheduler cleanups + better cycle assert [pr] (#9172)
* scheduler cleanups + better cycle assert [pr]

* type_verify after assign fixup

* don't need base

* always realize sink parents
2025-02-19 13:30:58 +01:00
qazal
cf315d544b rename can_pad arg to cache [pr] (#9170) 2025-02-19 12:24:59 +01:00
qazal
2fc8bf115d remove support for VIEW with two sources in ops [pr] (#9168)
* only 1 src views can exist [pr]

* views can still exist without a base, this is a separate project
2025-02-19 11:10:18 +01:00
Ahmed Harmouche
a2afa523a0 Only add enable f16 directive if ShaderF16 is supported (#9163)
* F16 in check in wgsl renderer

* Default in renderer to fix pickle

* Refactor f16 check
2025-02-19 17:20:03 +08:00
Ahmed Harmouche
0f94b98646 Force WebGPU backend type [pr] (#9164)
* Force webgpu backend type

* Mypy fix

* Rename to WEBGPU_BACKEND

* Add it to env_vars docs

* Remove link
2025-02-19 17:19:39 +08:00
qazal
4bc708a9b0 do not create buffers we never realize in scheduler (#9165)
* work

* delete

* fix

* works

* FUSE_CONV_BW

* FUSE_ARANGE

* becomes_map

* fix assign p1

* fix assign (diamond) - 2

* fix test_assign_double_diamond_reduce

* fix subbuffer

* faster rewrite

* fix simple_pads

* start metadata work

* do some diff cleanups

* make things that can't be images not images

* openpilot fix

* fix linter

* diff

* minimal diff

* more work on the diff

* metadata
2025-02-19 10:11:47 +01:00
George Hotz
1c4e9bc363 image fixup tensor map [pr] (#8611)
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-19 10:11:06 +02:00
qazal
2a5fe3e700 whitespace changes from the map_tensors branch [pr] (#9167) 2025-02-19 09:52:59 +02:00
qazal
a773ff73e3 match image cast folding on the cast itself [pr] (#9166) 2025-02-19 09:31:34 +02:00
qazal
9a20063837 create subbuffer immediately before constructing ScheduleItem [pr] (#9162) 2025-02-18 21:07:52 +01:00
qazal
1c92534bff hotfix: viz should show if there's a rewrite [pr] (#9161) 2025-02-18 19:11:03 +01:00
George Hotz
a330f3338c save applied opts in ProgramSpec [pr] (#9150) 2025-02-19 00:40:03 +08:00
chenyu
ff05bff221 put bert data shard inside jit (#9160)
python time 45ms -> 9ms, it was spending time to schedule the shard

also init bert data on CLANG since it's from numpy, so we don't create the tensor on default device then shard into GPUS
2025-02-18 10:36:54 -05:00
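
A minimal sketch of the device pattern described in the commit above: the numpy-backed batch is created on the CPU backend (CLANG) first and then sharded across the GPUs, instead of being materialized on the default device and sharded afterwards. The shapes and device list below are placeholders, not the real training setup.

```python
import numpy as np
from tinygrad import Tensor

GPUS = tuple(f"GPU:{i}" for i in range(2))      # placeholder device list
batch_np = np.zeros((8, 128), dtype=np.int32)   # stand-in for a BERT input-id batch

# create the tensor on the CPU backend, then split the batch dim across the GPUs
x = Tensor(batch_np, device="CLANG").shard(GPUS, axis=0)
```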
qazal
679291e26a assert only base maps to buffer [pr] (#9159) 2025-02-18 15:46:47 +01:00
qazal
4f592eeea6 hotfix: remove extra matcher for copy/buffer_view [pr] (#9157) 2025-02-18 13:21:24 +01:00
George Hotz
ff9b985d9f hotfix: View Base AST 2025-02-18 18:48:34 +08:00
George Hotz
30f470eaa3 UNIQUE UOp for buffer instead of arg (#9156)
* UNIQUE UOp for buffer instead of arg

* factor out buffer spec
2025-02-18 16:59:59 +08:00
qazal
38f5ea2132 increment writable buffers refcount from the kernel graph [pr] (#9153) 2025-02-18 10:20:02 +02:00
George Hotz
ddddcc165b colors back in DEBUG=2 [pr] (#9155) 2025-02-18 16:17:57 +08:00
George Hotz
6d62966bf7 add support for named rewrites [pr] (#9152) 2025-02-18 16:07:04 +08:00
George Hotz
caee42e8a6 Revert "name from uops [pr] (#9151)" (#9154)
This reverts commit 28897be9a2.
2025-02-18 16:06:44 +08:00
George Hotz
28897be9a2 name from uops [pr] (#9151) 2025-02-18 15:52:03 +08:00
George Hotz
a4dab3ec3f add name uop (#9149)
* add name uop, TODO: refactor renderer to use

* renderer uses name uop

* fix tests

* render

* ptx
2025-02-18 15:26:58 +08:00
George Hotz
2db8b4046a minor linearizer refactor to finalize in rewrite [pr] (#9148) 2025-02-18 12:42:22 +08:00
George Hotz
df3b320f46 rewriter -> devectorizer [pr] (#9147) 2025-02-18 12:42:08 +08:00
chenyu
5dc1257ce0 clean up bert fake data iterator [pr] (#9145)
reuse the same get_data_bert path in setup and real run
2025-02-17 20:03:38 -05:00
qazal
751c517b6c cancel viz request after the kernel clicked away [pr] (#9144) 2025-02-17 20:19:09 +01:00
chenyu
465421b525 fix Tensor.isclose (#9143)
many corner cases around inf and nan
2025-02-17 12:03:12 -05:00
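
For reference on the corner cases mentioned above, a scalar sketch of the standard isclose semantics (as in numpy/torch, which Tensor.isclose follows): NaN is never close to anything unless equal_nan is set, and an infinity is close only to an equal infinity. This is not tinygrad's implementation.

```python
import math

def isclose_ref(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8, equal_nan: bool = False) -> bool:
    # NaN compares unequal unless equal_nan=True and both sides are NaN.
    if math.isnan(a) or math.isnan(b):
        return equal_nan and math.isnan(a) and math.isnan(b)
    # Infinities: only an identical infinity is close (the tolerance formula would produce inf/nan).
    if math.isinf(a) or math.isinf(b):
        return a == b
    return abs(a - b) <= atol + rtol * abs(b)
```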
qazal
36741cbbc1 enable real_size assert for test_conv_2x2_backward_one_view [pr] (#9142) 2025-02-17 17:53:44 +01:00
qazal
e9ff4ef4f7 s/ScheduleContext/GrouperContext [pr] (#9141)
* refactor to kernel context [pr]

* s/ScheduleContext/GrouperContext [pr]
2025-02-17 17:14:17 +01:00
qazal
96cc9f59e0 refactor to kernel context [pr] (#9140) 2025-02-17 16:57:14 +01:00
qazal
df6781332e remove var_vals from the scheduler context [pr] (#9139)
* remove var_vals from the scheduler context [pr]

* maps to int
2025-02-17 16:43:50 +01:00
Ali Ladjevardi
35e9c4657b Use proper units when printing beam time (#9103)
* use proper units when printing beam time

* refactor DEBUG=2
2025-02-17 23:41:38 +08:00
Clément Verrier
a7f91224eb add Tensor.isclose() (#8844)
* add `Tensor.isclose()`

* support `equal_nan`

so as to match PyTorch's behavior

* update unit tests

* remove some tests temporarily

* re-enable one test

* re-enable other test

* try to fix failing tests during CI

* save one line of code

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 10:11:40 -05:00
qazal
2b787c3b17 hotfix: lower ul.disabled opacity for viz [pr] (#9138) 2025-02-17 15:16:48 +01:00
qazal
660c034da6 KERNEL op try 3 (#9061)
* work

* tolerate shape, maybe this is ASSIGN(RESHAPE(BUF), KERNEL)

* err, it's not ASSIGN(BUF, KERNEL), it's ASSIGN(VIEW(BUF), KERNEL)

* burn the boats

* assign slightly works

* assign works

* cleanup + var_vals can exist

* fine image + fix metadata

* metadata, without making everything 30% slower

* diff pruning

* faster assign schedule

* add_buffer_ops stage

* add kernel_spec back

* add viz display

* more strict kernel_spec
2025-02-17 14:47:54 +01:00
qazal
ec80df5115 add PROGRAM renderer to viz [pr] (#9137) 2025-02-17 14:46:08 +01:00
qazal
7b09a72682 don't display void dtype in viz nodes [pr] (#9136)
* don't display void dtype in viz nodes [pr]

* extra
2025-02-17 13:49:36 +01:00
George Hotz
4dd10d03b7 move is_increasing to ops [pr] (#9134) 2025-02-17 19:27:48 +08:00