qazal
79fb5c6470
hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] ( #8928 )
2025-02-06 16:27:59 +02:00
George Hotz
ae45826758
hotfix: GRAPH_ONE_KERNEL + fix timing
2025-02-06 17:52:20 +08:00
George Hotz
1c53e8bf27
Revert "objc fast msg ( #8922 )" ( #8926 )
...
This reverts commit c3f99a727e.
2025-02-06 17:50:49 +08:00
George Hotz
c3f99a727e
objc fast msg ( #8922 )
...
* benchmark kernel launch
* don't realize unneeded
* faster
* faster metal
* fix mypy
* new objc message style [pr]
* without sync
* no div 0
* lru cache that
* no sync in the profile
* fix
* update all to new style
* remove comment
* graph one kernel
* fix graph one kernel
* remove that sync
2025-02-06 17:49:06 +08:00
George Hotz
a8e54df363
benchmark single kernel launch ( #8921 )
...
* benchmark kernel launch
* don't realize unneeded
* faster
* faster metal
* fix mypy
* without sync
* no div 0
* lru cache that
* no sync in the profile
2025-02-06 13:35:34 +08:00
Josh Moore
44e0eab8fd
Fix AttributeError occurring after ValueError in _apply_uop ( #8905 )
...
* Fix AttributeError occurring after ValueError in _apply_uop
* Update tensor.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-06 10:56:29 +08:00
chenyu
30695da256
remove Tensor._to_const_val ( #8917 )
...
* remove Tensor._to_const_val
added a TODO for advanced indexing on const, which was the last place that checks const in Tensor
* that is not folding now
* one more
2025-02-05 21:44:39 -05:00
uuuvn
09ec33a578
Better errors when relocating against undefined symbol ( #8902 )
2025-02-06 10:13:44 +08:00
chenyu
488200f16c
move more pow const to rewrite ( #8916 )
...
* move more pow const to rewrite
one less use of _to_const_val
* fix
2025-02-05 20:30:12 -05:00
chenyu
76671381aa
move positive const ** t to a rewrite rule ( #8914 )
...
* move positive const ** t to a rewrite rule
* one more test
2025-02-05 19:30:12 -05:00
chenyu
189bfa164e
enable backward test for pow(neg const ** x) ( #8912 )
...
backward works now. 0**x still does not work because it's a special case fixed in transcendental
2025-02-05 15:35:21 -05:00
Ignacio Sica
aec3b8d515
add regression test: test_get_kernel_actions_preserves_actions_state ( #8907 )
...
* test_get_kernel_actions_preserves_actions_state
* simplify
* simplify
* refactor assert message
2025-02-05 14:13:01 -05:00
Ignacio Sica
15f94ac964
TC_SEARCH_OVER_SHAPE to search multiple TC shapes ( #8793 )
...
* squash search over search
* refactor assert
* init benchmark
* cleaner get_kernel_actions
* cleaner get_kernel_actions
* add comment
2025-02-05 11:03:46 -05:00
qazal
6f0cc2e9c5
rename to KernelContext and move the linearize_sched comment [pr] ( #8899 )
...
* rename to KernelContext and move that comment [pr]
* 500
2025-02-05 07:49:58 +01:00
George Hotz
c1c5227acb
preserve size in dtype ptr [pr] ( #8898 )
2025-02-05 14:38:57 +08:00
eliotgolding
bb5ded85cc
Don't rewrite idiv to rshift when numerator is negative ( #8885 )
...
* more conditions for shift rewrite mul/idiv
* make ptx test uint so the new condition is true
* delete idiv test
* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division
* mul/div by 2**(large count) is unsupported anyway
2025-02-05 07:47:33 +08:00
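
The restriction in the commit above can be illustrated with a small sketch. It assumes idiv truncates toward zero as C integer division does, which this log does not state; `c_idiv` and `shift_div` are hypothetical helpers written for illustration, not tinygrad functions.

```python
def c_idiv(a: int, b: int) -> int:
    """Integer division truncating toward zero (assumed C-style semantics)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def shift_div(a: int, k: int) -> int:
    """Arithmetic right shift by k bits; rounds toward negative infinity."""
    return a >> k

# The two agree for non-negative numerators and can differ for negative ones.
for a in (7, -7, -8):
    print(a, c_idiv(a, 4), shift_div(a, 2))
```

Under that assumption the results differ whenever a negative numerator is not an exact multiple of the divisor, so replacing the division with a shift is only safe when the numerator is known to be non-negative.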
chenyu
48349efdc1
copy is already contiguous ( #8886 )
2025-02-04 17:53:33 -05:00
qazal
6a0da51ed0
truncate process replay logs [pr] ( #8891 )
...
* truncate process replay logs [pr]
* work
* max_lines
* bump to 1K
2025-02-04 20:26:48 +01:00
qazal
acf0baefee
process replay from tensor uops to kernel ast ( #8883 )
...
* process replay from tensor uops to kernel ast
* this dedups
* switch back to string key
2025-02-04 18:09:20 +01:00
George Hotz
56fa5c1191
dsp simulator ( #8869 )
...
* dsp simulator
* progress
* fix
* close on test tiny
* working
* less waste
* line savings
* Device DSP compiler
* mock DSP at the bottom
* DSP tests
* docker caching
* test update
* need load
* skip that test for CI DSP
* last touch
* ugh
2025-02-04 09:45:04 +08:00
chenyu
836cf42c2e
fix rand_like for multi ( #8880 )
2025-02-03 19:00:14 -05:00
chenyu
746d899dbd
move multi axis to property ( #8879 )
...
also updated tests so that axis is known prior to realize
2025-02-03 16:02:09 -05:00
chenyu
cce26009f0
simplify pow to not call cos ( #8877 )
...
use %2 instead of cos to detect even numbers
2025-02-03 12:54:18 -05:00
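
As a rough illustration of the idea in the commit above, the sign of a negative base raised to an integer exponent can be found either from cos(pi * n) or from the parity of n. This is a plain-Python sketch, not tinygrad's implementation; the function names are hypothetical.

```python
import math

def sign_via_cos(n: int) -> float:
    """cos(pi * n) is +1 for even n and -1 for odd n, up to rounding error."""
    return math.cos(math.pi * n)

def sign_via_mod(n: int) -> int:
    """Same sign from parity: n % 2 is 0 for even n and 1 for odd n."""
    return 1 - 2 * (n % 2)

def neg_base_pow(base: float, n: int) -> float:
    """Power of a negative base and integer exponent using the parity-based sign."""
    return sign_via_mod(n) * abs(base) ** n
```

The parity version avoids a transcendental call and returns an exact sign, whereas the cosine version is only approximately +1 or -1 in floating point.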
George Hotz
af2c2837f6
hotfix: skip broken test, add KERNEL Op
2025-02-03 14:02:55 +08:00
qazal
83a904aaad
just schedule in test_recursive_pad [pr] ( #8860 )
2025-02-02 15:01:24 +02:00
FICTURE7
66306b5321
Fix disk tensor assignment ( #8855 )
...
* Add test for disk tensor assignment failure
* Fix disk tensor assignment
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-02 13:50:34 +02:00
Ali Ladjevardi
6e523e4d17
Remove size arg from DEFINE_LOCAL [pr] ( #8845 )
...
* remove size arg from DEFINE_LOCAL
* make mypy happy
* whitespace
* don't change code in extra
* revert to temp1 to pass pr
2025-02-02 19:47:32 +08:00
nimlgen
7841852870
hcq pci signal fuzzer ( #8854 )
...
* hcq pci signal fuzzer
* kk
* correct
2025-02-01 23:42:27 +03:00
qazal
dc34a4146f
better process_replay context print [pr] ( #8856 )
...
* better process_replay context print [pr]
* test: revert push cast
* Revert "test: revert push cast"
This reverts commit 38a2aef6f8.
2025-02-01 21:50:23 +02:00
chenyu
5b1fc4dcb2
push cast to branches in UOp where ( #8850 )
2025-02-01 13:55:24 -05:00
chenyu
73ee2d74c0
raise RuntimeError for int base pow ( #8852 )
...
current implementation is not precise and blocking other simplification change
2025-02-01 12:11:57 -05:00
qazal
72e1f41f8e
add unbind_vars pattern matcher ( #8851 )
...
* add unbind_vars pattern matcher [pr]
* this can be cvar
* this is empty
2025-02-01 18:25:44 +02:00
George Hotz
431a86615d
fix multi Ops.CONTIGUOUS_BACKWARD [pr] ( #8843 )
2025-02-01 09:21:31 +08:00
Ahmed Harmouche
07d3676019
weights_only=False ( #8839 )
2025-01-31 17:16:47 -05:00
chenyu
1f730ae8f8
remove retain_graph in Tensor.backward [pr] ( #8835 )
...
not used. gradient accumulation works directly
2025-01-31 13:41:26 -05:00
chenyu
0a59db936a
raise RuntimeError in schedule_step if not Tensor.training [pr] ( #8834 )
2025-01-31 12:03:04 -05:00
qazal
af4f9d1aa9
use matchers to verify AST shape [pr] ( #8828 )
...
* use matchers to verify kernel AST [pr]
* work
* use swizzle_cnt
* add comment
* imports
* modified_ast comment
* brief
2025-01-31 09:17:42 +02:00
George Hotz
643c09a6c6
tensor uop spec should be in spec.py [pr] ( #8827 )
...
* tensor uop spec should be in spec.py [pr]
* err, spec.py
* print uops can stay
2025-01-31 13:54:04 +08:00
qazal
a78f0f85d3
remove support for checking tensor uops in FUSE_ARANGE [pr] ( #8829 )
2025-01-31 07:48:28 +02:00
qazal
1fce864a6d
delete multi output support ( #8822 )
...
* delete multioutput for now
* test_schedule
* test_assign too
* linter
* 515 for sd
* update tests and ctx
* update that assign check
2025-01-30 22:45:50 -05:00
Ankit Avinash
7647cd8428
[bounty] Stride is flip ( #8792 )
...
* replace stride with flip
* Complete replacing stride with flip
clean flip function in view.py
fix tests
* fix tests for multi shapetracker
* fix tests for fuzz shapetracker
* fix tests for fuzz shapetracker
* debug
* debug
* fix
* fix
* fix
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-31 11:34:10 +09:00
chenyu
0513b0c17d
lower green test_gemm_8192 tflops to 125 [pr] ( #8820 )
...
flaky
2025-01-30 17:30:08 -05:00
Ignacio Sica
f0924e0857
fix and test ( #8814 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-30 16:35:53 -05:00
qazal
530961f7d5
realized only exists on base ( #8815 )
...
* realized only exists on base [pr]
* shorter
* update that too
2025-01-30 23:02:25 +02:00
Sieds Lykles
7cdc607544
add max as associative ( #8816 )
2025-01-30 16:01:42 -05:00
qazal
5643429c17
give BUFFER UOp a ShapeTracker [pr] ( #8811 )
...
* give BUFFER UOp a ShapeTracker [pr]
* move that
* update contiguous
* test_advancedindex should use movement ops
2025-01-30 22:33:32 +02:00
chenyu
5527f86a8f
skip tests in test_indexing that set stride with lazydata.view [pr] ( #8813 )
2025-01-30 15:17:35 -05:00
nimlgen
a2faa5e49b
am: fix pt free ( #8810 )
2025-01-30 15:14:55 +03:00
Sieds Lykles
78c0455c7a
Better stable sigmoid ( #8806 )
...
Uses `1/(x*x) -> 1/x * 1/x` together with `x/(1+x) -> 1-1/(1+x)` to
rewrite sigmoid instead of `x/((x+1)(x+1)) -> 1/(x+1)*(1-1/(x+1))`
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-29 16:08:53 -05:00
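
The two forms named in the commit above can be compared numerically. The sketch below uses plain Python floats and assumes x stands for a value that can overflow to infinity (for example an exponential in the sigmoid gradient); that reading is an assumption, and the function names are hypothetical.

```python
import math

def naive(x: float) -> float:
    """x / ((x+1)(x+1)): evaluates to inf / inf = nan once x overflows."""
    return x / ((x + 1.0) * (x + 1.0))

def rewritten(x: float) -> float:
    """1/(x+1) * (1 - 1/(x+1)): the same value with every factor bounded."""
    r = 1.0 / (x + 1.0)
    return r * (1.0 - r)

print(naive(3.0), rewritten(3.0))                    # both forms agree for finite x
print(naive(float("inf")), rewritten(float("inf")))  # nan versus 0.0
```

The rewritten form stays finite because each factor lies between 0 and 1 for non-negative x, while the naive form divides two quantities that both overflow.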
Ignacio Sica
260df1a17f
tc_select noop ( #8801 )
...
* tc_select noop
* revert changes in test
2025-01-29 13:53:23 -05:00