qazal
c7c279a6bd
unbind ShapeTrackers without maintaining a cache [pr] (#8889)
* replace with a try [pr]
* check vars
* ahaa
2025-02-04 19:43:41 +01:00
chenyu
61de654efa
minor shard cleanup [pr] (#8888)
2025-02-04 13:22:31 -05:00
qazal
6ec7f1b00f
replace UPat(name="x") with UPat.var("x") [pr] (#8887)
* replace UPat(name="x") with UPat.var("x") [pr]
* a few more
2025-02-04 19:12:40 +01:00
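A minimal before/after sketch of the change above, assuming the tinygrad import path of this period; UPat.var("x") is the existing shorthand constructor for a named wildcard pattern:

    from tinygrad.ops import UPat

    before = UPat(name="x")   # verbose form: matches any UOp, binds it as "x"
    after  = UPat.var("x")    # equivalent shorthand now used throughout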
qazal
c26b06eaeb
delete fold_img_cast [pr] (#8875)
2025-02-04 18:43:45 +01:00
qazal
acf0baefee
process replay from tensor uops to kernel ast (#8883)
* process replay from tensor uops to kernel ast
* this dedups
* switch back to string key
2025-02-04 18:09:20 +01:00
Ignacio Sica
dcf104ee68
ptx wmma render refactor (#8873)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 11:01:23 -05:00
qazal
b92f36179d
don't use set in schedule + add GroupOp.All [pr] (#8882)
* don't use set in schedule + add GroupOp.All [pr]
* update that
2025-02-04 08:19:27 +01:00
George Hotz
56fa5c1191
dsp simulator (#8869)
* dsp simulator
* progress
* fix
* close on test tiny
* working
* less waste
* line savings
* Device DSP compiler
* mock DSP at the bottom
* DSP tests
* docker caching
* test update
* need load
* skip that test for CI DSP
* last touch
* ugh
2025-02-04 09:45:04 +08:00
chenyu
836cf42c2e
fix rand_like for multi (#8880)
2025-02-03 19:00:14 -05:00
chenyu
746d899dbd
move multi axis to property (#8879)
also updated tests so that axis is known prior to realize
2025-02-03 16:02:09 -05:00
nimlgen
fa90079370
amd: reallocate scratch (#8872)
* amd: reallocate scratch
* use it
* oops
* allocate default
* mypy
* ops
* address realloc from none better
* types correct
* this better
* ops
* rm
2025-02-03 23:21:37 +03:00
chenyu
ec447a31e7
factor out get_axis in multi [pr] (#8878)
ALU/REDUCE_AXIS/RESHAPE/PERMUTE can change axis. prereq to move this logic to ops.py
2025-02-03 14:39:08 -05:00
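To illustrate the bookkeeping get_axis centralizes, a standalone sketch (a hypothetical helper, not tinygrad's actual code) of how a shard axis moves under a PERMUTE:

    def axis_after_permute(axis: int, perm: tuple[int, ...]) -> int:
      # hypothetical: the dim that was at `axis` ends up wherever perm places it
      return perm.index(axis)

    assert axis_after_permute(0, (2, 0, 1)) == 1  # old dim 0 is now at position 1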
chenyu
cce26009f0
simplify pow to not call cos (#8877)
use %2 instead of cos to detect even numbers
2025-02-03 12:54:18 -05:00
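The trick in isolation (a standalone sketch, not the actual UOp rewrite): for an integer exponent n, the sign of x**n with negative x depends only on the parity of n, which n % 2 yields directly where the old path went through cos(pi*n):

    import math

    # sign of x**n for x < 0: +1 for even n, -1 for odd n
    for n in range(8):
      via_cos = math.cos(math.pi * n)          # old path: ±1 via a transcendental
      via_mod = 1.0 if n % 2 == 0 else -1.0    # new path: a cheap mod
      assert abs(via_cos - via_mod) < 1e-9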
geohotstan
d1aa9f30bc
copy onnx_ops into onnx (#8876)
* just copy it over
* make OnnxOps a global var
* some small style stuff
* rerun CI but also some small clean up
* some comments
2025-02-03 12:15:07 -05:00
Ali Ladjevardi
73c75d6ee1
DEFINE_LOCAL variable names start from temp0, not temp1 (#8870)
2025-02-03 22:50:38 +08:00
qazal
b6c617272a
New schedule.py Order [pr] (#8874)
2025-02-03 14:59:11 +02:00
George Hotz
b075aefc12
hotfix: revert llvm host_arch
2025-02-03 16:46:19 +08:00
George Hotz
a5753095dc
llvm cleanups [pr] (#8867)
2025-02-03 15:32:41 +08:00
George Hotz
f484db0e63
dsp cleanups [pr] (#8866)
2025-02-03 15:18:53 +08:00
George Hotz
af2c2837f6
hotfix: skip broken test, add KERNEL Op
2025-02-03 14:02:55 +08:00
qazal
565c37c681
start simplifying the scheduler context [pr] (#8830)
2025-02-02 18:11:36 +02:00
qazal
d64af3c884
reorder simplifier and grouper logic in scheduler [pr] (#8861)
2025-02-02 17:19:52 +02:00
qazal
83a904aaad
just schedule in test_recursive_pad [pr] (#8860)
2025-02-02 15:01:24 +02:00
uuuvn
6dadb60c93
LLVM JIT (+autogen llvm instead of llvmlite) (#8486)
* LLVM JIT
* Autogen LLVM
* Update autogen
* Move things around
* even more non-determinism
* windows
* more autogen weirdness
* more windows stuff
* blind windows development try 2
* more blind windows development
* even more blind windows development
* maybe i should just set up a windows vm...
* why can't everyone just use sysv abi?
* cleanup debugging stuff
* unused import
* icache flushing isn't required on x86
* merge jit_nt and jit_unix
* more
* Temporary hack to not segfault
* better error
* bad conflict resolution
* Attempt to simplify support/llvm.py
* More refactoring
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-02 19:52:42 +08:00
FICTURE7
66306b5321
Fix disk tensor assignment (#8855)
* Add test for disk tensor assignment failure
* Fix disk tensor assignment
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-02 13:50:34 +02:00
Ali Ladjevardi
6e523e4d17
Remove size arg from DEFINE_LOCAL [pr] (#8845)
* remove size arg from DEFINE_LOCAL
* make mypy happy
* whitespace
* don't change code in extra
* revert to temp1 to pass pr
2025-02-02 19:47:32 +08:00
nimlgen
7841852870
hcq pci signal fuzzer (#8854)
* hcq pci signal fuzzer
* kk
* correct
2025-02-01 23:42:27 +03:00
qazal
dc34a4146f
better process_replay context print [pr] (#8856)
* better process_replay context print [pr]
* test: revert push cast
* Revert "test: revert push cast"
This reverts commit 38a2aef6f8.
2025-02-01 21:50:23 +02:00
chenyu
5b1fc4dcb2
push cast to branches in UOp where (#8850)
2025-02-01 13:55:24 -05:00
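The identity behind the rewrite, sketched with plain Python rather than the actual UOp pattern: casting the result of a select equals selecting between cast branches, which lets later simplifications see through the cast:

    def where(c: bool, a, b): return a if c else b

    # cast(where(c, a, b)) == where(c, cast(a), cast(b))
    for c in (True, False):
      assert float(where(c, 1, 2)) == where(c, float(1), float(2))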
chenyu
73ee2d74c0
raise RuntimeError for int base pow (#8852)
current implementation is not precise and is blocking other simplification changes
2025-02-01 12:11:57 -05:00
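The imprecision at issue, illustrated standalone under the assumption that pow is lowered through exp/log:

    import math

    exact  = 2 ** 3                       # 8
    approx = math.exp(3 * math.log(2.0))  # ~7.999999999999998 on a typical libm
    print(exact, approx, exact == approx)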
qazal
72e1f41f8e
add unbind_vars pattern matcher (#8851)
* add unbind_vars pattern matcher [pr]
* this can be cvar
* this is empty
2025-02-01 18:25:44 +02:00
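A toy sketch of the unbind idea on a generic expression tree (hypothetical Node class, not tinygrad's UOp or PatternMatcher API): strip each BIND node down to its variable and collect the bound value on the side:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Node:
      op: str                       # e.g. "BIND", "VAR", "ADD", "CONST"
      src: tuple["Node", ...] = ()
      arg: object = None

    def unbind(n: Node, var_vals: dict) -> Node:
      if n.op == "BIND":            # BIND(var, value) -> var, value recorded
        var, val = n.src
        var_vals[var] = val.arg
        return var
      return Node(n.op, tuple(unbind(s, var_vals) for s in n.src), n.arg)

    v = Node("VAR", arg="i")
    expr = Node("ADD", (Node("BIND", (v, Node("CONST", arg=4))), Node("CONST", arg=1)))
    vals: dict = {}
    unbound = unbind(expr, vals)
    assert vals == {v: 4}           # BIND stripped; binding collected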
nimlgen
b3fa76419a
am: move queues to gpus (#8848)
* am: fix
* add flags for those
* do not depend on host parameter
2025-02-01 18:02:52 +03:00
George Hotz
42d7c800a1
hotfix: add missing tinychat fonts + other assets
2025-02-01 09:34:44 +08:00
George Hotz
431a86615d
fix multi Ops.CONTIGUOUS_BACKWARD [pr] (#8843)
2025-02-01 09:21:31 +08:00
Ahmed Harmouche
07d3676019
weights_only=False (#8839)
2025-01-31 17:16:47 -05:00
nimlgen
741bbc900d
Revert "am: queues allocated on gpus ( #8836 )" ( #8837 )
...
This reverts commit 7bbb568dec .
2025-01-31 22:53:41 +03:00
nimlgen
7bbb568dec
am: queues allocated on gpus (#8836)
* am: fix
* add flags for those
2025-01-31 22:14:43 +03:00
chenyu
1f730ae8f8
remove retain_graph in Tensor.backward [pr] (#8835)
not used. gradient accumulation works directly
2025-01-31 13:41:26 -05:00
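The pattern the note refers to, as a hedged example assuming the Tensor API of this period: each backward() call accumulates into .grad, so micro-batch losses add up without any retain_graph flag:

    from tinygrad import Tensor

    w = Tensor([2.0], requires_grad=True)
    for xb in ([1.0], [3.0]):             # two "micro-batches"
      loss = (w * Tensor(xb)).sum()
      loss.backward()                     # no retain_graph needed
    print(w.grad.tolist())                # grads accumulated: [4.0]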
chenyu
0a59db936a
raise RuntimeError in schedule_step if not Tensor.training [pr] (#8834)
2025-01-31 12:03:04 -05:00
qazal
af4f9d1aa9
use matchers to verify AST shape [pr] (#8828)
* use matchers to verify kernel AST [pr]
* work
* use swizzle_cnt
* add comment
* imports
* modified_ast comment
* brief
2025-01-31 09:17:42 +02:00
George Hotz
643c09a6c6
tensor uop spec should be in spec.py [pr] (#8827)
* tensor uop spec should be in spec.py [pr]
* err, spec.py
* print uops can stay
2025-01-31 13:54:04 +08:00
qazal
a78f0f85d3
remove support for checking tensor uops in FUSE_ARANGE [pr] (#8829)
2025-01-31 07:48:28 +02:00
qazal
2a33750e4c
simpler group_realizes + ScheduleItem construction [pr] (#8825)
2025-01-31 06:34:53 +02:00
George Hotz
e63d160376
hotfix: sched comment
2025-01-31 12:10:04 +08:00
qazal
1fce864a6d
delete multi output support (#8822)
* delete multioutput for now
* test_schedule
* test_assign too
* linter
* 515 for sd
* update tests and ctx
* update that assign check
2025-01-30 22:45:50 -05:00
Ankit Avinash
7647cd8428
[bounty] Stride is flip (#8792)
* replace stride with flip
* Complete replacing stride with flip
clean flip function in view.py
fix tests
* fix tests for multi shapetracker
* fix tests for fuzz shapetracker
* fix tests for fuzz shapetracker
* debug
* debug
* fix
* fix
* fix
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-31 11:34:10 +09:00
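The identity behind the bounty, shown with plain Python lists rather than view.py internals: a stride of -1 along an axis carries exactly one bit of information, flipped or not, so a per-dim bool can replace the stride argument:

    row = [0, 1, 2, 3]
    assert row[::-1] == list(reversed(row))  # 1-D: stride -1 == flip

    grid = [[0, 1], [2, 3]]
    assert grid[::-1] == [[2, 3], [0, 1]]    # flip along axis 0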
chenyu
0513b0c17d
lower green test_gemm_8192 tflops to 125 [pr] (#8820)
flaky
2025-01-30 17:30:08 -05:00
Ignacio Sica
f0924e0857
fix and test (#8814)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-30 16:35:53 -05:00
qazal
f5da275f46
simpler remove_movement_ops [pr] (#8818)
2025-01-30 23:32:52 +02:00
qazal
c8d878a5c1
remove r.lazydata.buf_uop_view [pr] (#8817)
2025-01-30 23:14:36 +02:00