George Hotz
d1a7279605
indexing fold with casted bool ( #5551 )
...
* cast bool is where
* universal transform is wrong
2024-07-18 10:02:29 -07:00
qazal
fdfc0015a7
[run_process_replay] for opencl/openpilot ( #5009 )
...
* lil reset script
* find the prg
* use lower_schedule_item
* add process replay back
* cleanups
2024-07-18 19:42:33 +03:00
kormann
2c4add6844
pretty print lazy op per default ( #5505 )
...
* pretty lop
* min diff
* walrus
* fix
* min diff
* simplify
* pretty helper function
* ws
* pretty uop upat
* tests
* stricter tests
* test passes
* ws
* stronger upat test
* delete print_tree
* min diff
* stricter exp test
* fix merge
* stronger uops eval test
* +readable and deep upat test
* +readable and deep upat test
* sort inv fix
* fix
* revert allowed_len
2024-07-18 09:34:08 -07:00
nimlgen
c30092e56d
amd remove useless barrier ( #5550 )
2024-07-18 18:05:33 +03:00
nimlgen
4e9d2b1615
nv memory_barrier command ( #5548 )
2024-07-18 16:23:11 +03:00
qazal
6d7cd34250
more save_schedule tooling ( #5547 )
2024-07-18 15:59:53 +03:00
qazal
0ad1672d5f
fuse indexing (LazyOp creation) ( #5506 )
...
* bring FUSE_AS_ONE_KERNEL back
* operands need reshape?
* fused but arange didnt fold
* something deeply wrong
* yay, fused
* derive broadcasts
* s/input/reduce_input
* _fixup_ones proved a point
* this is what it takes
* down to 3 required reshapes:
1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape
* start real reshapes
* resolve shape in the edges pre lazyop
* outputs are the same shape
* rewrite1: just the reduce
* more correct
* fuse_as_one_kernel
* closer
* this passes
* dont rerun info
* dont need these
* not needed
2024-07-18 14:09:17 +03:00
wozeparrot
6ccb2390c3
feat: update_benchmark_staging ( #5529 )
2024-07-17 20:40:57 -07:00
chenyu
e569c927cf
remove Kernel.shape_offsets [run_process_replay] ( #5544 )
...
the only use case now can be further simplified
2024-07-17 23:16:47 -04:00
George Hotz
fa7e734b49
MetaOps.KERNEL ( #5543 )
2024-07-17 19:41:23 -07:00
George Hotz
d3b098299d
add failing regression test for image ( #5540 )
...
* add failing regression test for image
* tg type
* simpler test
* don't realize image to image casts caused issue
* simple pad
2024-07-17 17:27:18 -07:00
wozeparrot
218e157f00
benchmark on update_benchmark_staging ( #5541 )
2024-07-17 17:11:52 -07:00
wozeparrot
8845a5dbfd
feat: begin immediate ( #5539 )
2024-07-17 16:11:21 -07:00
George Hotz
a6e70f8a71
clean up expand function [run_process_replay] ( #5538 )
...
* clean up expand function [run_process_replay]
* lil cleaner
* add a type
2024-07-17 15:02:00 -07:00
qazal
61ee02e93d
start multireduce lowerer work (var/std) ( #5537 )
...
* multireduce no-opts works
* passed test_var_multireduce
* cleanup
* double reduce
* extra check for range_group
* more checking for range_groups
* cleaning up debug prints
* cleanup diff
* linters
* revert kernel changes
* these are uops toposort
---------
Co-authored-by: timmy <timmy0x@proton.me>
2024-07-17 23:43:46 +03:00
qazal
67ea4af01f
depth first recurse_reduceops ( #5536 )
...
* early recurse
p2
* yea cache shouldnt be there
2024-07-17 23:27:53 +03:00
Francis Lam
c4eb30a04c
test/test_linearizer_failures: add a new beautiful_mnist one ( #5531 )
...
* test/test_linearizer_failures: add a new beautiful_mnist one
this one is from a DEPTH=2 fuzz_linearizer search
* add GPU to test_failure_40
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-17 16:27:04 -04:00
qazal
0259d76183
use Context only in replaying Kernel [run_process_replay] ( #5535 )
2024-07-18 03:46:14 +08:00
George Hotz
1a68854766
PatternMatcher add ( #5532 )
...
* PatternMatcher add [run_process_replay]
* f4 dynamic
* test_failure_36 is fixed
* fix PTX
2024-07-17 12:44:42 -07:00
qazal
d3c137d478
utility for computing reduceop output_shape ( #5534 )
...
* refactor to reduce_st
* update lazy
2024-07-17 22:40:07 +03:00
qazal
0a7872a62f
use exec_alu in uops flop counting ( #5511 )
...
* use exec_alu for uops flop counting
* deal with sint
2024-07-17 22:39:27 +03:00
qazal
a7706e05f9
option to [skip_process_replay] ( #5533 )
2024-07-17 22:30:46 +03:00
chenyu
4193095f67
fix handcode_opt.py with DEBUG=2 ( #5530 )
...
only one ast per kernel now
2024-07-17 14:50:47 -04:00
chenyu
466555cd17
touchup Tensor.interpolate ( #5525 )
...
* touchup Tensor.interpolate and Tensor.lerp
rewrite lerp to save one sub and thus flops.
use Tensor.lerp for interpolate and some minor cleanups
* revert lerp change
2024-07-17 13:35:57 -04:00
George Hotz
1242b302fa
expand UOps with rewrite rules ( #5501 )
...
* expand UOps with rewrite rules [run_process_replay]
* progress
* much closer
* close, way less bugs
* bunch of expander tests
* fix contract
* ops tests pass
* fix barrier
* mostly passing
* bitcast in expanded ops
* support more expand merges
* all tests pass maybe
* fix empty EXPAND
* fix LIN fuzzing
* add ALL_SAME assert
* all same
* all same work
* raise CompileError
* pass fuzz linearizer
* revert whitespace
* fix nv tensor core test
* fix mypy
* bug fix
* fuzzer passes
* put tests back
* expand arg to idx
2024-07-17 10:17:50 -07:00
George Hotz
158221b36b
expand tests from uop_expander [run_process_replay] ( #5524 )
...
* expand tests from uop_expander
* more changes from the branch
2024-07-17 09:22:36 -07:00
George Hotz
42c25cc961
fix fixup_ast ( #5523 )
...
* fix fixup_ast
* these lin failures are fixed
2024-07-17 08:52:21 -07:00
qazal
fbe0233be3
infra for multi reduce asts ( #5522 )
...
* add reduce_info
* _recurse_reduceops base
* derive output shape
* refactor
* delete reduce_for_op
* save lines
* more line saving
2024-07-17 17:23:46 +03:00
nimlgen
dcd462860f
elf loader ( #5508 )
...
* elf loader
* cleanup
* cleaner
* cleaner
* fixes
* revert this
* fix div 0
* fix nv
* amd fix
* fix mockgpu
* amd better?
* restore relocs for <12.4
* linter
* this is fixed now
* revert this
* process cdefines as function
* cleaner
* align
* save lines
* revert this change
2024-07-17 17:09:34 +03:00
nimlgen
661da32aff
nv do not map regions twice ( #5521 )
2024-07-17 11:20:02 +03:00
Francis Lam
2d53abb04a
test/external/fuzz_linearizer: fix for new AST changes ( #5519 )
...
* test/external/fuzz_linearizer: fix for new AST changes
also add beautiful_mnist failures
* add CLANG and LLVM to test_failure_35 failed_platforms
* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
Tobias Fischer
85d4ca7caa
FID Inception Model ( #5516 )
...
* added model impl
* minor cleanups
* extracted weights loading into from_pretrained
* reorganized model for better weight loading
* removed lru cache for state dict loading
2024-07-16 23:12:03 -04:00
chenyu
4ad83d032e
remove Kernel.lazyops [run_process_replay] ( #5517 )
...
always use Kernel.ast.lazyops
2024-07-16 19:47:42 -04:00
wozeparrot
1c1d6d3a4a
feat: show caller when tracemeta >= 2 ( #5514 )
2024-07-16 15:06:02 -07:00
chenyu
5aad043522
cleanup fixup_ast local shape long line [run_process_replay] ( #5513 )
2024-07-16 17:29:38 -04:00
chenyu
6e405b0a2b
add 0d tensor to trunc/floor/ceil/round tests ( #5512 )
...
existing trunc test passes backward but its backward is incorrect in general. added tests that would fail
2024-07-16 16:48:25 -04:00
chenyu
0afcbfae84
docs: add Tensor.interpolate to doc page ( #5510 )
2024-07-16 14:17:19 -04:00
Tobias Fischer
87a2ef2bc2
Add Interpolate Function ( #5482 )
...
* add interpolate function
* fixed linter issue
* reduced sizes in test
---------
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-16 09:44:01 -07:00
gswangg
203161c75d
refactor VECTORIZE/GEP rules ( #5507 )
2024-07-16 09:41:23 -07:00
qazal
173064c69c
(re)start multireduce in codegen/* ( #5391 )
...
* test_var_multireduce
* run verify_lazyop
* test_var_multireduce
* assert lazyop
* add test_indexing_multireduce
* arange fuses (crude)
* note: extra reshape
* start readable
* test_arange_simple
* test_arange_expanded
* test_indexing_multireduce
* cleanups
* skip ptx
* skip nv and amd ci
* skip arange expanded too
* GPU=1 is slow too in CI
2024-07-16 14:20:48 +03:00
chenyu
07ff4b7d24
test_failure_33 ast that has UOps.UNMUL after linearize ( #5504 )
...
* test_failure_33 ast that has UOps.UNMUL after linearize
* smaller
2024-07-15 22:54:23 -04:00
chenyu
1ccd987e6a
simpler tc permaxis in fixup_ast.fix_st [run_process_replay] ( #5502 )
2024-07-15 21:35:32 -04:00
George Hotz
9d4c3c553c
prepare expand to support multiexpand [run_process_replay] ( #5503 )
2024-07-15 18:21:24 -07:00
chenyu
fd43d33b7d
shave some lines from transcend math [run_process_replay] ( #5500 )
...
* shave some lines from transcend math [run_process_replay]
* put input_dtype back
2024-07-15 21:02:24 -04:00
chenyu
63990705b5
test kernel opts case for 4 local and 4 groups ( #5499 )
...
make sure local grouped dim is correct
2024-07-15 20:09:38 -04:00
Alessandro Benetti
13e200b437
add strict mkdocs check ( #5497 )
2024-07-15 14:21:37 -07:00
nimlgen
8dfd11c1d8
docs: hcq add types ( #5495 )
...
* docs: hcq add types
* linter
2024-07-15 22:14:48 +03:00
George Hotz
aab1e8c6dc
uniform init to match torch ( #5494 )
2024-07-15 12:07:44 -07:00
George Hotz
338b7590b9
hotfix: docs for BatchNorm
2024-07-15 12:04:17 -07:00
nimlgen
c9ec7ce070
start hcq docs ( #5411 )
...
* start hcq docs
* more hcq docs
* docs
* docs
* linter
* correct args
* linter
* ts returns int
2024-07-15 21:31:11 +03:00