Commit Graph

4433 Commits

Author SHA1 Message Date
qazal
79fb5c6470 hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] (#8928) 2025-02-06 16:27:59 +02:00
George Hotz
ae45826758 hotfix: GRAPH_ONE_KERNEL + fix timing 2025-02-06 17:52:20 +08:00
George Hotz
1c53e8bf27 Revert "objc fast msg (#8922)" (#8926)
This reverts commit c3f99a727e.
2025-02-06 17:50:49 +08:00
George Hotz
c3f99a727e objc fast msg (#8922)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* new objc message style [pr]

* without sync

* no div 0

* lru cache that

* no sync in the profile

* fix

* update all to new style

* remove comment

* graph one kernel

* fix graph one kernel

* remove that sync
2025-02-06 17:49:06 +08:00
George Hotz
a8e54df363 benchmark single kernel launch (#8921)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* without sync

* no div 0

* lru cache that

* no sync in the profile
2025-02-06 13:35:34 +08:00
Josh Moore
44e0eab8fd Fix AttributeError occurring after ValueError in _apply_uop (#8905)
* Fix AttributeError occurring after ValueError in _apply_uop

* Update tensor.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-06 10:56:29 +08:00
chenyu
30695da256 remove Tensor._to_const_val (#8917)
* remove Tensor._to_const_val

added a TODO for advance indexing on const, which was the last place that checks const in Tensor

* that is not folding now

* one more
2025-02-05 21:44:39 -05:00
uuuvn
09ec33a578 Better errors when relocating against undefined symbol (#8902) 2025-02-06 10:13:44 +08:00
chenyu
488200f16c move more pow const to rewrite (#8916)
* move more pow const to rewrite

one less use of _to_const_val

* fix
2025-02-05 20:30:12 -05:00
chenyu
76671381aa move positive const ** t to a rewrite rule (#8914)
* move positive const ** t to a rewrite rule

* one more test
2025-02-05 19:30:12 -05:00
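For a positive constant base c, c ** t equals exp2(t * log2(c)). That is the usual identity a rewrite like the one in #8914 would target, but the log does not show the exact rule, so the sketch below is only an illustration under that assumption. The function names are hypothetical. It also checks the analytic gradient d/dt c**t = c**t * ln(c) against a central finite difference.

```python
import math

def const_pow(c: float, t: float) -> float:
    """c ** t for c > 0 via exp2(t * log2(c)); assumed lowering, for illustration only."""
    if c <= 0:
        raise ValueError("identity only holds for a positive base")
    return 2.0 ** (t * math.log2(c))

def const_pow_grad(c: float, t: float) -> float:
    """Analytic derivative d/dt c**t = c**t * ln(c)."""
    return const_pow(c, t) * math.log(c)

# compare against Python's built-in pow and a central finite difference
c, t, h = 3.0, 2.5, 1e-6
numeric = (const_pow(c, t + h) - const_pow(c, t - h)) / (2 * h)
print(const_pow(c, t), c ** t)
print(const_pow_grad(c, t), numeric)
```

The positive-base restriction is what makes this a clean rewrite: log2(c) is undefined for c <= 0, which is why negative and zero bases are handled separately in the neighbouring commits.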
chenyu
189bfa164e enable backward test for pow(neg const ** x) (#8912)
backward works now. 0**x still does not work because it's a special case fixed in transcendental
2025-02-05 15:35:21 -05:00
Ignacio Sica
aec3b8d515 add regression test: test_get_kernel_actions_preserves_actions_state (#8907)
* test_get_kernel_actions_preserves_actions_state

* simplify

* simplify

* refactor assert message
2025-02-05 14:13:01 -05:00
Ignacio Sica
15f94ac964 TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793)
* squash search over search

* refactor assert

* init benchmark

* cleaner get_kernel_actions

* cleaner get_kernel_actions

* add comment
2025-02-05 11:03:46 -05:00
qazal
6f0cc2e9c5 rename to KernelContext and move the linearize_sched comment [pr] (#8899)
* rename to KernelContext and move that comment [pr]

* 500
2025-02-05 07:49:58 +01:00
George Hotz
c1c5227acb preserve size in dtype ptr [pr] (#8898) 2025-02-05 14:38:57 +08:00
eliotgolding
bb5ded85cc Don't rewrite idiv to rshift when numerator is negative (#8885)
* more conditions for shift rewrite mul/idiv

* make ptx test uint so the new condition is true

* delete idiv test

* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division

* mul/div by 2**(large count) is unsupported anyway
2025-02-05 07:47:33 +08:00
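Why the idiv-to-shift rewrite in #8885 needs a sign condition: assuming IDIV truncates toward zero as in C, x / 2**k and the arithmetic shift x >> k (which rounds toward negative infinity) agree for non-negative x but can differ for negative x. Python's own // floors, so the sketch defines a hypothetical helper `c_idiv` to model truncating division.

```python
def c_idiv(a: int, b: int) -> int:
    """Integer division truncating toward zero (C semantics); hypothetical helper."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def shift_matches_idiv(x: int, k: int) -> bool:
    """True when rewriting x / 2**k into x >> k would preserve the result."""
    return c_idiv(x, 2 ** k) == (x >> k)

# non-negative numerators: the rewrite is always safe
assert all(shift_matches_idiv(x, k) for x in range(0, 1024) for k in range(0, 8))
# negative numerator: truncation rounds toward 0, the shift rounds toward -inf
print(c_idiv(-7, 2), -7 >> 1)   # -3 -4
```

Negative numerators that are exact multiples of 2**k (for example -8 >> 1) still match, but a rewrite rule cannot generally know that, so gating on a non-negative numerator is the simple safe choice.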
chenyu
48349efdc1 copy is already contiguous (#8886) 2025-02-04 17:53:33 -05:00
qazal
6a0da51ed0 truncate process replay logs [pr] (#8891)
* truncate process replay logs [pr]

* work

* max_lines

* bump to 1K
2025-02-04 20:26:48 +01:00
qazal
acf0baefee process replay from tensor uops to kernel ast (#8883)
* process replay from tensor uops to kernel ast

* this dedups

* switch back to string key
2025-02-04 18:09:20 +01:00
George Hotz
56fa5c1191 dsp simulator (#8869)
* dsp simulator

* progress

* fix

* close on test tiny

* working

* less waste

* line savings

* Device DSP compiler

* mock DSP at the bottom

* DSP tests

* docker caching

* test update

* need load

* skip that test for CI DSP

* last touch

* ugh
2025-02-04 09:45:04 +08:00
chenyu
836cf42c2e fix rand_like for multi (#8880) 2025-02-03 19:00:14 -05:00
chenyu
746d899dbd move multi axis to property (#8879)
also updated tests so that axis is known prior to realize
2025-02-03 16:02:09 -05:00
chenyu
cce26009f0 simplify pow to not call cos (#8877)
use %2 instead of cos to detect even numbers
2025-02-03 12:54:18 -05:00
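A power with a negative base and an integer exponent is negative exactly when the exponent is odd, which is the parity that #8877 detects with %2. The sketch below is an illustration of that sign rule, not tinygrad's code; the name `neg_base_pow` and the use of abs(base) ** exp for the magnitude are assumptions.

```python
def neg_base_pow(base: float, exp: float) -> float:
    """base ** exp for base < 0 and integer-valued exp, sign chosen by parity (sketch)."""
    if exp != int(exp):
        raise ValueError("a negative base needs an integer exponent for a real result")
    magnitude = abs(base) ** exp
    # Python's % is floored, so (-3.0) % 2 == 1.0 and negative odd exponents are caught too
    return -magnitude if exp % 2 == 1 else magnitude

for e in (-3.0, -2.0, 2.0, 3.0):
    print(e, neg_base_pow(-2.0, e), (-2.0) ** e)
```

Note that the floored modulo is what makes `exp % 2 == 1` work for negative exponents here; in a language or backend where % truncates (C's fmod, for instance), -3 % 2 is -1 and the check would need to compare the absolute value.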
George Hotz
af2c2837f6 hotfix: skip broken test, add KERNEL Op 2025-02-03 14:02:55 +08:00
qazal
83a904aaad just schedule in test_recursive_pad [pr] (#8860) 2025-02-02 15:01:24 +02:00
FICTURE7
66306b5321 Fix disk tensor assignment (#8855)
* Add test for disk tensor assignment failure

* Fix disk tensor assignment

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-02 13:50:34 +02:00
Ali Ladjevardi
6e523e4d17 Remove size arg from DEFINE_LOCAL [pr] (#8845)
* remove size arg from DEFINE_LOCAL

* make mypy happy

* whitespace

* don't change code in extra

* revert to temp1 to pass pr
2025-02-02 19:47:32 +08:00
nimlgen
7841852870 hcq pci signal fuzzer (#8854)
* hcq pci signal fuzzer

* kk

* correct
2025-02-01 23:42:27 +03:00
qazal
dc34a4146f better process_replay context print [pr] (#8856)
* better process_replay context print [pr]

* test: revert push cast

* Revert "test: revert push cast"

This reverts commit 38a2aef6f8.
2025-02-01 21:50:23 +02:00
chenyu
5b1fc4dcb2 push cast to branches in UOp where (#8850) 2025-02-01 13:55:24 -05:00
chenyu
73ee2d74c0 raise RuntimeError for int base pow (#8852)
current implementation is not precise and blocking other simplification change
2025-02-01 12:11:57 -05:00
qazal
72e1f41f8e add unbind_vars pattern matcher (#8851)
* add unbind_vars pattern matcher [pr]

* this can be cvar

* this is empty
2025-02-01 18:25:44 +02:00
George Hotz
431a86615d fix multi Ops.CONTIGUOUS_BACKWARD [pr] (#8843) 2025-02-01 09:21:31 +08:00
Ahmed Harmouche
07d3676019 weights_only=False (#8839) 2025-01-31 17:16:47 -05:00
chenyu
1f730ae8f8 remove retain_graph in Tensor.backward [pr] (#8835)
not used. gradient accumulation works directly
2025-01-31 13:41:26 -05:00
chenyu
0a59db936a raise RuntimeError in schedule_step if not Tensor.training [pr] (#8834) 2025-01-31 12:03:04 -05:00
qazal
af4f9d1aa9 use matchers to verify AST shape [pr] (#8828)
* use matchers to verify kernel AST [pr]

* work

* use swizzle_cnt

* add comment

* imports

* modified_ast comment

* brief
2025-01-31 09:17:42 +02:00
George Hotz
643c09a6c6 tensor uop spec should be in spec.py [pr] (#8827)
* tensor uop spec should be in spec.py [pr]

* err, spec.py

* print uops can stay
2025-01-31 13:54:04 +08:00
qazal
a78f0f85d3 remove support for checking tensor uops in FUSE_ARANGE [pr] (#8829) 2025-01-31 07:48:28 +02:00
qazal
1fce864a6d delete multi output support (#8822)
* delete multioutput for now

* test_schedule

* test_assign too

* linter

* 515 for sd

* update tests and ctx

* update that assign check
2025-01-30 22:45:50 -05:00
Ankit Avinash
7647cd8428 [bounty] Stride is flip (#8792)
* replace stride with flip

* Complete replacing stride with flip

clean flip function in view.py
fix tests

* fix tests for multi shapetracker

* fix tests for fuzz shapetracker

* fix tests for fuzz shapetracker

* debug

* debug

* fix

* fix

* fix

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-31 11:34:10 +09:00
chenyu
0513b0c17d lower green test_gemm_8192 tflops to 125 [pr] (#8820)
flaky
2025-01-30 17:30:08 -05:00
Ignacio Sica
f0924e0857 fix and test (#8814)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-30 16:35:53 -05:00
qazal
530961f7d5 realized only exists on base (#8815)
* realized only exists on base [pr]

* shorter

* update that too
2025-01-30 23:02:25 +02:00
Sieds Lykles
7cdc607544 add max as associative (#8816) 2025-01-30 16:01:42 -05:00
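Treating max as associative lets a simplifier regroup max(max(x, c1), c2) as max(x, max(c1, c2)) and fold the two constants into one. The toy tuple-based expression form and the `fold_max` function below are hypothetical, a sketch of the idea rather than tinygrad's pattern matcher.

```python
from typing import Union

# Toy expression: an int constant, a str variable name, or ("max", lhs, rhs)
Expr = Union[int, str, tuple]

def fold_max(e: Expr) -> Expr:
    """Fold constants through nested max using associativity (hypothetical rewrite)."""
    if not (isinstance(e, tuple) and e[0] == "max"):
        return e
    a, b = fold_max(e[1]), fold_max(e[2])
    if isinstance(a, int) and isinstance(b, int):
        return max(a, b)                      # both constant: evaluate
    if isinstance(a, tuple) and isinstance(a[2], int) and isinstance(b, int):
        return ("max", a[1], max(a[2], b))    # max(max(x, c1), c2) -> max(x, max(c1, c2))
    return ("max", a, b)

print(fold_max(("max", ("max", "x", 3), 5)))  # ('max', 'x', 5)
```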
qazal
5643429c17 give BUFFER UOp a ShapeTracker [pr] (#8811)
* give BUFFER UOp a ShapeTracker [pr]

* move that

* update contiguous

* test_advancedindex should use movement ops
2025-01-30 22:33:32 +02:00
chenyu
5527f86a8f skip tests in test_indexing that set stride with lazydata.view [pr] (#8813) 2025-01-30 15:17:35 -05:00
nimlgen
a2faa5e49b am: fix pt free (#8810) 2025-01-30 15:14:55 +03:00
Sieds Lykles
78c0455c7a Better stable sigmoid (#8806)
Uses `1/(x*x) -> 1/x * 1/x`  together with `x/(1+x) -> 1-1/(1+x)` to
rewrite sigmoid instead of `x/((x+1)(x+1)) -> 1/(x+1)*(1-1/(x+1))`

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-29 16:08:53 -05:00
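The two general rules in #8806 reach the same factored form as the old special-case rule. With u = exp(-x), u/((1+u)(1+u)) is the derivative of sigmoid(x); splitting the square gives [u/(1+u)] * 1/(1+u), and x/(1+x) -> 1-1/(1+x) turns that into (1 - 1/(1+u)) * 1/(1+u). The sketch below, with hypothetical function names, shows why the factored form is more stable: for large u the naive denominator overflows to inf and the result collapses to 0.0.

```python
import math

def naive(u: float) -> float:
    """u / ((1 + u) * (1 + u)), evaluated as written."""
    return u / ((1 + u) * (1 + u))

def rewritten(u: float) -> float:
    """Same value after 1/(x*x) -> 1/x * 1/x and x/(1+x) -> 1 - 1/(1+x)."""
    r = 1 / (1 + u)
    return (1 - r) * r

# moderate input: both forms agree
print(naive(2.0), rewritten(2.0))
# huge input: (1 + u) * (1 + u) overflows to inf in the naive form, so it returns 0.0
print(naive(1e200), rewritten(1e200))
```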
Ignacio Sica
260df1a17f tc_select noop (#8801)
* tc_select noop

* revert changes in test
2025-01-29 13:53:23 -05:00