chenyu
c2ffcf6887
remove the wrong mod UOp pattern ( #5847 )
...
don't think we are hitting it because of the stride construction, and it's wrong and not needed
2024-07-31 16:24:25 -04:00
qazal
8174c438a3
pad test_failure_45 ( #5846 )
2024-07-31 23:08:48 +03:00
George Hotz
8672a9db3f
add test to validate lazyops dims ( #5845 )
2024-07-31 12:59:38 -07:00
chenyu
4fe5b95568
fix UOp ALU bound ( #5844 )
...
* fix UOp ALU bound
root cause of resnet bug, the ALU bound is only correct for scalar, not vectorized
* it can be nan...
2024-07-31 15:19:31 -04:00
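A minimal sketch of why a scalar-only bound is unsound for a vectorized ALU op; the helper below is illustrative interval arithmetic, not tinygrad's actual vmin/vmax code:

```python
# Illustrative only: per-lane interval bounds for a vectorized ADD
# (hypothetical helper, not tinygrad's actual implementation).
def vector_add_bounds(lanes_a, lanes_b):
  # lanes_* are lists of (lo, hi) intervals, one per lane
  los = [la + lb for (la, _), (lb, _) in zip(lanes_a, lanes_b)]
  his = [ha + hb for (_, ha), (_, hb) in zip(lanes_a, lanes_b)]
  # a sound bound for the whole vector has to cover every lane,
  # not just lane 0 the way a scalar-only rule would
  return min(los), max(his)

print(vector_add_bounds([(0, 1), (10, 20)], [(0, 0), (5, 5)]))  # (0, 25), not (0, 1)
```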
nimlgen
f768935be8
add RING_ALLREDUCE_THRESHOLD ( #5835 )
...
* add RING_ALLREDUCE_THRESHOLD
* benchmark
* fixes
* fix n_gpus
* unused import
* remove debug=2
2024-07-31 16:13:09 +03:00
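RING_ALLREDUCE_THRESHOLD is the env var added here; a rough sketch of the kind of decision such a threshold typically gates, where the default value, the byte units, and the strategy names are assumptions, not tinygrad's actual code:

```python
import os

# Hypothetical sketch only: pick an allreduce strategy based on buffer size.
def pick_allreduce_strategy(nbytes: int) -> str:
  threshold = int(os.getenv("RING_ALLREDUCE_THRESHOLD", str(256 * 1024)))  # assumed default
  # small buffers: a naive broadcast/copy has lower latency
  # large buffers: a chunked ring amortizes bandwidth across all links
  return "naive" if nbytes < threshold else "ring"

print(pick_allreduce_strategy(4096))      # "naive" under the assumed default
print(pick_allreduce_strategy(64 << 20))  # "ring"
```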
chenyu
2e087ca8e4
UOp bound for div negative number ( #5808 )
2024-07-31 02:10:23 -04:00
qazal
bcbd925001
hcopts failing test for fused arange kernel ( #5815 )
...
* add failure_43
* n 45
2024-07-31 09:02:44 +03:00
qazal
ed556c260e
UOps.IF rules more tests ( #5831 )
...
* init tests
* split tests
* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
David Hou
492a696d14
allow specifying splits in shard, handle multiple different splits in MLB.e ( #5599 )
...
* allow specifying splits in shard, handle multiple different splits in MLB.e
* line width
* linter
* don't use Device in docstring
* specify size of shards instead of boundaries
* adjust docstring for specify size of shards instead of boundaries
* don't allow splits on symbolic axis?
* just allow sint in splits_to_bounds
* add message for assert
* bounds instead of splits to save lines
* fix types
* reduce diff
* fix
* tuple
* golf :(
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com >
2024-07-30 19:33:04 -07:00
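The bullets settle on storing bounds rather than split sizes; a minimal sketch of that conversion, where only the name splits_to_bounds comes from the bullets and the body is an assumption:

```python
from itertools import accumulate

# Illustrative sizes -> bounds conversion in the spirit of the splits_to_bounds
# mentioned above (the real signature and types may differ).
def splits_to_bounds(splits):
  offsets = (0, *accumulate(splits))
  return tuple(zip(offsets[:-1], offsets[1:]))

# sharding an axis of size 10 across 3 devices with uneven shard sizes
print(splits_to_bounds((2, 3, 5)))  # ((0, 2), (2, 5), (5, 10))
```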
chenyu
c3da458bc3
UOp if min==max folds to CONST ( #5828 )
...
* UOp if min==max folds to CONST
* fix test
2024-07-30 22:14:22 -04:00
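A minimal sketch of the fold described in the title (illustrative, not the actual pattern-matcher rule):

```python
# Illustrative: if range analysis pins a UOp to a single value, replace it with a CONST.
def fold_to_const(vmin, vmax):
  return ("CONST", vmin) if vmin == vmax else None

print(fold_to_const(3, 3))  # ('CONST', 3) -> the expression becomes a constant
print(fold_to_const(0, 7))  # None -> no fold
```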
George Hotz
e6879035a0
work to make GEMV fast ( #5824 )
...
* work to make GEMV fast
* half8 cast
* align struct
* fix amd
* float8 is a later problem
2024-07-30 17:41:40 -07:00
chenyu
02f0be03f2
tests on UOp div negative number and arange opts ( #5825 )
2024-07-30 20:06:57 -04:00
George Hotz
693990a346
swap src[2] and src[3] in load [run_process_replay] ( #5821 )
...
* swap src[2] and src[3] in load [run_process_replay]
* cleanups + bugfix
* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412
new style load/store folder ( #5784 )
...
* remove old index reorder
* new style folder
* works better
* dedup
* one failure
* this is fine now...
* expander_rewrite
* images broken, but all else should work
* cleanups
* make tests work with old
* fix images
* cleanups + bugfix
* minor fixes
* fix gated store folding
* flip gate_creator and expander
* fix gated store
* remove unneeded rules
* lines getting close
* line count good
2024-07-30 13:17:20 -07:00
qazal
03d866b84f
UOps.IF with rewrite rules ( #5812 )
...
* expand merge
* merge barriers
* gate_folder
* test_linearizer_failures
* this can be here
* bring the new repr back
* gate_folder2
* gate_creator is better
* gate_folder
* dedup conditions
* early gate folding
* dedup barrier
* fold noop conditions
* all consts can go away
* free lines
2024-07-30 20:50:56 +03:00
chenyu
defd89e8e0
unify negative shape creation to raise ValueError ( #5817 )
...
[run_process_replay]
2024-07-30 13:42:59 -04:00
P4ssenger
6742a4789a
Add check for negative dimension in view ( #5790 )
...
* add check for negative dimension in view
* add negative dim tests
* move check to tensor level
* fix error message
* move check to view create
---------
Co-authored-by: chenyu <chenyu@fastmail.com >
2024-07-30 13:26:27 -04:00
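After the two changes above, creating a tensor or view with a negative dimension should raise ValueError; a usage sketch assuming the check fires at construction time:

```python
from tinygrad import Tensor

# Expected behavior after the checks above (exact message may differ):
try:
  Tensor.empty(2, -3)
except ValueError as e:
  print("rejected:", e)
```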
Francis Lata
ce61be16f1
clean up how preprocessed folder is defined ( #5813 )
2024-07-30 12:35:26 -04:00
qazal
5e827e51d2
add llama3 BEAM=2 failures to test_linearizer_failures ( #5553 )
...
* skips
* opts.device
* benchmarks
* add to test_linearizer_failures
* remove hardcoded ones
* linter
* skip cpu
2024-07-30 00:37:32 +03:00
samm393
573e0f9a48
remove float division from idiv in python_alu ( #5777 )
...
* removes float division from idiv in python_alu
* add test
* cleaner logic
* pass clang unsigned literals correctly
* suffix ULL instead of U
---------
Co-authored-by: chenyu <chenyu@fastmail.com >
2024-07-29 12:14:12 -04:00
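A plain-Python illustration of why float division is a hazard inside an integer idiv (not the python_alu code itself): emulating it via `/` silently rounds once values exceed 2**53.

```python
# Plain Python: int(a / b) goes through a float and loses exactness for large
# integers, while true integer division stays exact.
a, b = (1 << 60) + 1, 1
print(int(a / b))  # 1152921504606846976 -> wrong, rounded through a 53-bit float mantissa
print(a // b)      # 1152921504606846977 -> exact
```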
samm393
2c94316bd2
ull literal support and test ( #5789 )
...
* ull literal support and test
* missing .numpy()
2024-07-29 11:50:49 -04:00
nimlgen
ab3839a80a
cleanup nv/cuda compilers ( #5767 )
...
* cleanup nv/cuda compilers
* destroy prog
* small test
* fix test
* nv ptx rewrite key
* jitlink free
* ptx is part of cuda
2024-07-29 13:50:03 +03:00
chenyu
e7a14f398e
more uop_symbolic tests for divmod pairs ( #5785 )
2024-07-28 21:27:06 -04:00
George Hotz
76d191ab94
move consts to end of add ( #5783 )
...
* move consts to end of add
* better
* fix infinite loop
2024-07-28 17:38:57 -07:00
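A sketch of the canonicalization the title describes, assuming the usual form: constants move to the right of commutative adds so later folding rules only have to match one shape.

```python
# Illustrative: normalize commutative adds so constants sit on the right,
# letting later folding rules match the single shape (x + c).
def canonicalize_add(lhs, rhs):
  is_const = lambda t: isinstance(t, (int, float))
  return (rhs, lhs) if is_const(lhs) and not is_const(rhs) else (lhs, rhs)

print(canonicalize_add(1, "x"))  # ('x', 1)
print(canonicalize_add("x", 1))  # ('x', 1) -- already canonical
```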
chenyu
71a64d8252
UOps.MUL bound when one is negative ( #5781 )
...
* UOps.MUL bound when one is negative
also one more distribute_mul rule
* don't always expand
2024-07-28 19:02:47 -04:00
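A worked sketch of the interval bound this fixes: when a factor can be negative, the product's min/max come from the extreme corner products rather than (min*min, max*max). The helper is illustrative, not the actual vmin/vmax code:

```python
# Illustrative interval multiply: sound even when an operand spans negative values.
def mul_bounds(amin, amax, bmin, bmax):
  corners = [amin*bmin, amin*bmax, amax*bmin, amax*bmax]
  return min(corners), max(corners)

# x in [-2, 3], y in [-5, 4]: the naive (min*min, max*max) pair would give (10, 12);
# the true range is:
print(mul_bounds(-2, 3, -5, 4))  # (-15, 12)
```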
qazal
b775db6b60
high-level benchmark timing diff ( #5776 )
...
* high level timings
benchmark times
fix defs
* use the name map
* skip last task
2024-07-28 23:42:57 +03:00
chenyu
600a39771d
fix Tensor.arange if (stop-start) and step have different signs ( #5775 )
2024-07-28 14:34:10 -04:00
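A worked check of the sign condition, assuming numpy-style arange semantics (length is ceil((stop-start)/step), clamped at zero):

```python
import math

# length of arange(start, stop, step): ceil((stop - start) / step), clamped at 0
def arange_len(start, stop, step):
  return max(0, math.ceil((stop - start) / step))

print(arange_len(0, 10, 2))   # 5
print(arange_len(10, 0, 2))   # 0 -> signs of (stop - start) and step differ: empty range
print(arange_len(10, 0, -3))  # 4
```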
David González Martínez
d0fd84e617
feat: allow passing gradient to .backward() to compute vjp ( #5771 )
...
* feat: allow passing gradient to .backward() to compute vjp
* fix
* refactor
* fix trailing whitespace
2024-07-28 11:13:18 -07:00
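A usage sketch of the new argument, assuming it mirrors torch's gradient argument for non-scalar outputs:

```python
from tinygrad import Tensor

x = Tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                    # non-scalar output
v = Tensor([1.0, 0.0, 1.0])  # the vector in the vector-Jacobian product
y.backward(v)                # accumulates v . J(y wrt x) into x.grad
print(x.grad.numpy())        # expected [2., 0., 6.] under these assumptions
```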
qazal
e0e7293b0a
make process replay unique in retries [run_process_replay] ( #5773 )
2024-07-28 20:44:15 +03:00
qazal
95dda8dadf
more unmatching vectorize/gep asserts [run_process_replay] ( #5760 )
...
* merge vectorize/gep rules [run_process_replay]
* assert dtypes
* src=
* float2=(float4.x,float4.y)
2024-07-28 15:08:54 +08:00
chenyu
bfbd7c5461
more generic UOp mul mod folding ( #5765 )
2024-07-27 20:20:35 -04:00
chenyu
80c6475757
update test_uop_symbolic to test UOp min and max ( #5764 )
...
covers #5750 , #5748 , #5741
2024-07-27 19:53:21 -04:00
nimlgen
ed1d784077
test profiler timer sync across devs ( #5751 )
...
* test profiler timer sync across devs
* more correct
* typo
2024-07-27 16:47:37 +03:00
qazal
3e49d86c01
process replay diffs 3 things now ( #5731 )
...
* github api infra
* process replay is 3 parts now
* parse benchmarks
* add gh_token
* complete diff
* move process replay tests
* last successful run
* add tempdir
* skip master
2024-07-27 12:52:20 +03:00
qazal
57b4a8e98d
assert process replay asserts ( #5737 )
...
* assert process replay asserts
* one ci job is fine
* test: Revert "separate process replay main loop (#5734 )"
This reverts commit 94d578396f .
* mac sed needs that
* Revert "test: Revert "separate process replay main loop (#5734 )""
This reverts commit e4ad7684d5 .
* disable process replay capture
* save time
* amd is tiny
* send to /dev/null
2024-07-27 12:07:50 +03:00
George Hotz
f8972ace38
test flops (and allow wide ALU in UOps) [run_process_replay] ( #5749 )
...
* flops test in external_test_speed_theoretical.py
* test speed theo
* min SZMAX
* allow wide ALU for things that support it
* needed for mypy
2024-07-26 21:07:28 -07:00
George Hotz
2fde2d2914
hotfix: external_test_speed_theoretical works on 24GB
2024-07-26 18:41:52 -07:00
George Hotz
829262a5ee
add external_test_speed_theoretical
2024-07-26 17:45:22 -07:00
kormann
a5ede535ef
NOp field name [run_process_replay] ( #5742 )
...
* rm def name
* add field name
2024-07-26 18:45:59 -04:00
George Hotz
c50e374bb6
multiple locals + get_kernel_modifier + fix valid ( #5739 )
...
* multiple locals + get_kernel_modifier + fix valid
* fix test pattern matcher
2024-07-26 15:10:10 -07:00
chenyu
dc7483ee6f
UOp simple div folding ( #5740 )
...
made UOp.divides return the Optional[quotient] and used it for simple div folding
2024-07-26 17:14:32 -04:00
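A minimal sketch of the helper shape described above, operating on plain ints instead of UOps (illustrative only):

```python
from typing import Optional

# Illustrative stand-in for the described UOp.divides: return the quotient when the
# division is exact, else None, so a caller can fold the whole div away.
def divides(value: int, divisor: int) -> Optional[int]:
  return value // divisor if divisor != 0 and value % divisor == 0 else None

print(divides(12, 4))  # 3 -> fold x // 4 to the quotient
print(divides(12, 5))  # None -> leave the div alone
```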
chenyu
671259417f
reuse UOp __repr__ for NOp ( #5738 )
2024-07-26 16:59:55 -04:00
kormann
b0c1dba299
named UOp class "NOP" [run_process_replay] ( #5728 )
...
* NOP
* fix const + simplify compile
* rm VAR for NOOP
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com >
2024-07-26 13:25:53 -07:00
George Hotz
4df46eac67
clean up tensor cores [run_process_replay] ( #5736 )
...
* clean up tensor cores [run_process_replay]
* remove tuple(wmma_sz), self.opts.device
* remove tls, leave DEVICE
2024-07-26 13:21:23 -07:00
qazal
94d578396f
separate process replay main loop ( #5734 )
...
* separate process replay main loop
* [run_process_replay]
* add kernel_changed
* test with [run_process_replay]
* revert temp [run_process_replay]
2024-07-26 21:43:08 +03:00
chenyu
a4e9ebc68a
update test_uop_symbolic ( #5733 )
...
enabled more passed tests
2024-07-26 13:46:09 -04:00
chenyu
2cc55a3095
UOp simple mul add div fold ( #5726 )
2024-07-25 22:00:30 -04:00
chenyu
5521b6d437
UOp simple mul-add-lt fold ( #5721 )
2024-07-25 20:49:38 -04:00
qazal
1b53207b4f
revert isolated dags scheduling ( #5724 )
2024-07-25 19:45:12 -04:00
chenyu
845b0d1c9d
UOp more generic div folding ( #5722 )
...
old: `x // c` can fold if `0 <= x.vmin <= x.vmax < c`
new: `x // c` can fold if `0 < c and x.vmin // c == x.vmax // c`
2024-07-25 17:49:14 -04:00
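A worked comparison of the two rules, with plain ints standing in for `x.vmin`/`x.vmax`:

```python
# old: x // c folds (to 0) only if 0 <= x.vmin <= x.vmax < c
# new: x // c folds if 0 < c and x.vmin // c == x.vmax // c
def folds_old(vmin, vmax, c): return 0 <= vmin <= vmax < c
def folds_new(vmin, vmax, c): return 0 < c and vmin // c == vmax // c

# x in [10, 14], c = 8: every value of x // 8 is 1, so the new rule folds to 1,
# while the old rule misses it because vmax >= c.
print(folds_old(10, 14, 8), folds_new(10, 14, 8))  # False True
```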