Commit Graph

7688 Commits

Author SHA1 Message Date
chenyu
73ee2d74c0 raise RuntimeError for int base pow (#8852)
current implementation is not precise and blocking other simplification change
2025-02-01 12:11:57 -05:00
qazal
72e1f41f8e add unbind_vars pattern matcher (#8851)
* add unbind_vars pattern matcher [pr]

* this can be cvar

* this is empty
2025-02-01 18:25:44 +02:00
nimlgen
b3fa76419a am: move queues to gpus (#8848)
* am: fix

* add flsg for thos

* do not depend on host parameter,
2025-02-01 18:02:52 +03:00
George Hotz
42d7c800a1 hotfix: add missing tinychat fonts + other assets 2025-02-01 09:34:44 +08:00
George Hotz
431a86615d fix multi Ops.CONTIGUOUS_BACKWARD [pr] (#8843) 2025-02-01 09:21:31 +08:00
Ahmed Harmouche
07d3676019 weights_only=False (#8839) 2025-01-31 17:16:47 -05:00
nimlgen
741bbc900d Revert "am: queues allocated on gpus (#8836)" (#8837)
This reverts commit 7bbb568dec.
2025-01-31 22:53:41 +03:00
nimlgen
7bbb568dec am: queues allocated on gpus (#8836)
* am: fix

* add flsg for thos
2025-01-31 22:14:43 +03:00
chenyu
1f730ae8f8 remove retain_graph in Tensor.backward [pr] (#8835)
not used. gradient accumulation works directly
2025-01-31 13:41:26 -05:00
chenyu
0a59db936a raise RuntimeError in schedule_step if not Tensor.training [pr] (#8834) 2025-01-31 12:03:04 -05:00
qazal
af4f9d1aa9 use matchers to verify AST shape [pr] (#8828)
* use matchers to verify kernel AST [pr]

* work

* use swizzle_cnt

* add comment

* imports

* modified_ast comment

* brief
2025-01-31 09:17:42 +02:00
George Hotz
643c09a6c6 tensor uop spec should be in spec.py [pr] (#8827)
* tensor uop spec should be in spec.py [pr]

* err, spec.py

* print uops can stay
2025-01-31 13:54:04 +08:00
qazal
a78f0f85d3 remove support for checking tensor uops in FUSE_ARANGE [pr] (#8829) 2025-01-31 07:48:28 +02:00
qazal
2a33750e4c simpler group_realizes + ScheduleItem construction [pr] (#8825) 2025-01-31 06:34:53 +02:00
George Hotz
e63d160376 hotfix: sched comment 2025-01-31 12:10:04 +08:00
qazal
1fce864a6d delete multi output support (#8822)
* delete multioutput for now

* test_schedule

* test_assign too

* linter

* 515 for sd

* update tests and ctx

* update that assign check
2025-01-30 22:45:50 -05:00
Ankit Avinash
7647cd8428 [bounty] Stride is flip (#8792)
* replace stride with flip

* Complete replacing stride with flip

clean flip function in view.py
fix tests

* fix tests for multi shapetracker

* fix tests for fuzz shapetracker

* fix tests for fuzz shapetracker

* debug

* debug

* fix

* fix

* fix

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-31 11:34:10 +09:00
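The idea named in the commit title above ("Stride is flip") can be illustrated without any tensor library. The sketch below is not the code from that commit: it uses plain Python lists, and the helper names `flip` and `stride_via_flip` are hypothetical. It only shows that a negative-step slice gives the same result as a reversal followed by a positive-step slice.

```python
def flip(xs):
    """Reverse a sequence (stand-in for a flip movement op)."""
    return xs[::-1]


def stride_via_flip(xs, step):
    """Emulate xs[::step] using only flip plus a positive step.

    Hypothetical helper for illustration; assumes step != 0.
    """
    if step < 0:
        # A negative stride is a flip followed by the matching positive stride.
        return flip(xs)[::-step]
    return xs[::step]


data = list(range(10))
for step in (-3, -2, -1, 1, 2, 3):
    # The flip-based version agrees with native slicing for every step tried.
    assert stride_via_flip(data, step) == data[::step]
```

Under this reading, a separate "negative stride" primitive is redundant once a flip is available, which is consistent with the commit replacing one with the other.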
chenyu
0513b0c17d lower green test_gemm_8192 tflops to 125 [pr] (#8820)
flaky
2025-01-30 17:30:08 -05:00
Ignacio Sica
f0924e0857 fix and test (#8814)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-30 16:35:53 -05:00
qazal
f5da275f46 simpler remove_movement_ops [pr] (#8818) 2025-01-30 23:32:52 +02:00
qazal
c8d878a5c1 remove r.lazydata.buf_uop_view [pr] (#8817) 2025-01-30 23:14:36 +02:00
qazal
530961f7d5 realized only exists on base (#8815)
* realized only exists on base [pr]

* shorter

* update that too
2025-01-30 23:02:25 +02:00
Sieds Lykles
7cdc607544 add max as associative (#8816) 2025-01-30 16:01:42 -05:00
qazal
5643429c17 give BUFFER UOp a ShapeTracker [pr] (#8811)
* give BUFFER UOp a ShapeTracker [pr]

* move that

* update contiguous

* test_advancedindex should use movement ops
2025-01-30 22:33:32 +02:00
chenyu
5527f86a8f skip tests in test_indexing that set stride with lazydata.view [pr] (#8813) 2025-01-30 15:17:35 -05:00
nimlgen
a2faa5e49b am: fix pt free (#8810) 2025-01-30 15:14:55 +03:00
qazal
9df8e34160 prereqs for giving BUFFER UOps a ShapeTracker [pr] (#8809) 2025-01-30 13:30:24 +02:00
Sieds Lykles
78c0455c7a Better stable sigmoid (#8806)
Uses `1/(x*x) -> 1/x * 1/x` together with `x/(1+x) -> 1-1/(1+x)` to
rewrite sigmoid instead of `x/((x+1)(x+1)) -> 1/(x+1)*(1-1/(x+1))`

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-29 16:08:53 -05:00
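The rewrite rules quoted in the commit message above are plain algebraic identities, and a quick numeric check makes them concrete. The sketch below is an illustration in pure Python, not the pattern-matcher code from the commit; the function names are hypothetical. It also shows one plausible reason such rewrites help stability: when the intermediate value overflows to infinity, the quotient form yields NaN while the reciprocal form yields 0.

```python
import math


def quotient_form(u):
    """u / ((u+1)(u+1)), the left-hand side of the third rule."""
    return u / ((u + 1.0) * (u + 1.0))


def reciprocal_form(u):
    """1/(u+1) * (1 - 1/(u+1)), the right-hand side of the third rule."""
    r = 1.0 / (u + 1.0)
    return r * (1.0 - r)


for x in (0.5, 1.0, 3.0, 1e6):
    # 1/(x*x) == 1/x * 1/x
    assert math.isclose(1.0 / (x * x), (1.0 / x) * (1.0 / x), rel_tol=1e-9)
    # x/(1+x) == 1 - 1/(1+x)
    assert math.isclose(x / (1.0 + x), 1.0 - 1.0 / (1.0 + x), rel_tol=1e-9)
    # x/((x+1)(x+1)) == 1/(x+1) * (1 - 1/(x+1))
    assert math.isclose(quotient_form(x), reciprocal_form(x), rel_tol=1e-9)

# With an overflowed input, inf/inf is NaN, but the reciprocal form stays finite.
assert math.isnan(quotient_form(math.inf))
assert reciprocal_form(math.inf) == 0.0
```

With `u = exp(-x)`, the third expression is the derivative of the sigmoid, so the infinite-input case corresponds to very negative `x`, where the true gradient tends to 0.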
chenyu
cac2b4e8b6 bump db version (#8805)
OptOp spec changed
2025-01-29 15:34:43 -05:00
nimlgen
a2d55fb644 am: unset CGCG override (#8804) 2025-01-29 22:28:06 +03:00
chenyu
7f606fbde4 remove DEBUG=5 in windows ci test [pr] (#8803)
DEBUG=5 prints a lot of info that's slow, and is not visible if test passed on CI.
also skip two tests that took 3 minutes in python backend
2025-01-29 14:18:17 -05:00
Ignacio Sica
260df1a17f tc_select noop (#8801)
* tc_select noop

* revert changes in test
2025-01-29 13:53:23 -05:00
FICTURE7
ec120ce6b9 Fix allocator memory alignment (#8800)
* Fix allocator memory alignment

* Run `test_ops.py` using LLVM and CLANG on Windows
2025-01-29 21:03:17 +03:00
nimlgen
50ba2bb642 am: move ring to host mem (#8802) 2025-01-29 20:56:11 +03:00
chenyu
c7ca7959e6 set DISABLE_DROPOUT=1 in bert script for now (#8799) 2025-01-29 10:51:29 -05:00
qazal
199a36d079 add pagination to viz [pr] (#8794)
* add pagination to viz [pr]

* work

* lint
2025-01-29 04:21:53 +02:00
qazal
ba17786068 do not construct unmasked VALID (#8759)
* new lines that exist in codegen/ops

* update tests

* update sops.gz (13071 -> 13070 asts)

* fix viz too

* remove that TODO

* diff pruning

* mask assert + device

* work

* diff pruning

* re: fix viz too

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 20:51:21 +02:00
qazal
3417bc1814 fix ShapeTracker spec for const [pr] (#8791) 2025-01-28 19:53:36 +02:00
nimlgen
801ec9e697 am: no hardcoded clocks (#8788)
* am: no hardcoded clocks

* better
2025-01-28 20:18:46 +03:00
qazal
e724af74d7 allow VIEW source in DEFINE_VAR spec [pr] (#8790) 2025-01-28 17:42:14 +02:00
b1tg
da464d039f fix windows ci cache (#8787)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-01-28 13:22:15 +02:00
qazal
e8be8a5835 support lowering CONST(VIEW) in lowerer (#8785) 2025-01-28 12:04:41 +02:00
George Hotz
80089536e5 Revert "move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)" (#8786)
This reverts commit af0452f116.
2025-01-28 18:59:02 +09:00
b1tg
5d62aa28dc Support CLANG backend on Windows (#8768)
* Support CLANG on Windows

* Put both backends in a windows ci

* remove coff loader

* use memmove

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 18:19:34 +09:00
mesozoic-egg
af0452f116 move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)
* handle bf16 via bitcasting for CLANG and LLVM

* On LLVM, skip float16 cast

* float32 on llvm lite, float32 elsewhere

* code format

* trigger pr

* move to rewriter

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 18:16:43 +09:00
nimlgen
d66680b17e hotfix: am: fix hang (#8783) 2025-01-28 11:54:19 +03:00
qazal
a65d2917cb remove unused fields from viz uop_to_json [pr] (#8782) 2025-01-28 10:50:11 +02:00
qazal
aefbc2637f test fixups from unmasked valid deletion [pr] (#8776) 2025-01-28 09:23:30 +02:00
qazal
ed672881b0 remove additions/deletion in pr + check uops are equal [pr] (#8779)
* use warnings there [pr]

* remove those + move assert_diff [pr]

* warn after log

* remove

* back
2025-01-28 08:57:34 +02:00
Ignacio Sica
2c71c60719 opt arg is int or tuple (#8780) 2025-01-28 11:02:32 +09:00