chenyu
d072e628da
UOp bounds for max ( #5820 )
2024-07-30 17:54:44 -04:00
George Hotz
3630208a01
lil transcendental folding cleanup [run_process_replay] ( #5822 )
...
* lil transcendental folding cleanup [run_process_replay]
* idk why function isn't Callable
2024-07-30 14:10:17 -07:00
George Hotz
693990a346
swap src[2] and src[3] in load [run_process_replay] ( #5821 )
...
* swap src[2] and src[3] in load [run_process_replay]
* cleanups + bugfix
* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412
new style load/store folder ( #5784 )
...
* remove old index reorder
* new style folder
* works better
* dedup
* one failure
* this is fine now...
* expander_rewrite
* images broken, but all else should work
* cleanups
* make tests work with old
* fix images
* cleanups + bugfix
* minor fixes
* fix gated store folding
* flip gate_creator and expander
* fix gated store
* remove unneeded rules
* lines getting close
* line count good
2024-07-30 13:17:20 -07:00
chenyu
e8a42b945c
simpler src variables in UOp._min_max [run_process_replay] ( #5819 )
...
s0,s1 instead of self.src[0] and self.src[1]
2024-07-30 15:18:42 -04:00
Francis Lata
a0baff7a3d
update dataloader script example ( #5818 )
2024-07-30 15:18:29 -04:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark ( #5806 )
2024-07-30 12:05:36 -07:00
qazal
03d866b84f
UOps.IF with rewrite rules ( #5812 )
...
* expand merge
* merge barriers
* gate_folder
* test_linearizer_failures
* this can be here
* bring the new repr back
* gate_folder2
* gate_creator is better
* gate_folder
* dedup conditions
* early gate folding
* dedup barrier
* fold noop conditions
* all consts can go away
* free lines
2024-07-30 20:50:56 +03:00
chenyu
defd89e8e0
unify negative shape creation to raise ValueError ( #5817 )
...
[run_process_replay]
2024-07-30 13:42:59 -04:00
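A minimal sketch of the behavior #5817 unifies (the concrete shape below is illustrative, not from the PR): creating a tensor with a negative dimension should raise ValueError.

```python
from tinygrad import Tensor

# Per #5817, a negative dimension anywhere in a shape should raise ValueError
# (the shape used here is just an example).
try:
  Tensor.empty(2, -3)
except ValueError as e:
  print("negative shape rejected:", e)
```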
P4ssenger
6742a4789a
Add check for negative dimension in view ( #5790 )
...
* add check for negative dimension in view
* add negative dim tests
* move check to tensor level
* fix error message
* move check to view create
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-30 13:26:27 -04:00
P4ssenger
2b7b7591d2
rename upcast_axis into plural ( #5788 )
2024-07-30 10:07:35 -07:00
Francis Lata
ce61be16f1
clean up how preprocessed folder is defined ( #5813 )
2024-07-30 12:35:26 -04:00
nimlgen
ca674c31f9
nv remove some type ignores ( #5811 )
2024-07-30 17:47:29 +03:00
wozeparrot
639af3f823
llama3 temperature flag ( #5803 )
2024-07-29 16:33:51 -07:00
chenyu
22e7289fe0
s/self.shape_len - self.upcasted/self.first_upcast ( #5802 )
...
missed the one with spaces.
[run_process_replay]
2024-07-29 18:23:42 -04:00
chenyu
1a19751902
s/self.shape_len-self.upcasted/self.first_upcast ( #5801 )
...
[run_process_replay]
2024-07-29 17:54:10 -04:00
qazal
5e827e51d2
add llama3 BEAM=2 failures to test_linearizer_failures ( #5553 )
...
* skips
* opts.device
* benchmarks
* add to test_linearizer_failures
* remove hardcoded ones
* linter
* skip cpu
2024-07-30 00:37:32 +03:00
chenyu
cb6718347f
python -m mkdocs build --strict in CI ( #5800 )
2024-07-29 16:46:30 -04:00
nimlgen
a25e1a1c90
nv open correct device ( #5796 )
2024-07-29 23:40:52 +03:00
chenyu
be3899d211
hotfix increase ci timeout to 20 minutes ( #5799 )
...
when the cache is cleared it takes time to repopulate it
2024-07-29 16:25:27 -04:00
chenyu
fc393d710d
LazyBuffer.const type check cleanup [run_process_replay] ( #5795 )
2024-07-29 16:17:14 -04:00
chenyu
2cadf21684
include "mkdocs" in setup docs ( #5798 )
2024-07-29 15:54:52 -04:00
chenyu
471b188d79
fix mypy errors in latest mypy ( #5794 )
...
* fix mypy errors in latest mypy
mypy has stricter partial and api arg checks now
* PYTHONPATH="."
2024-07-29 14:53:30 -04:00
samm393
573e0f9a48
remove float division from idiv in python_alu ( #5777 )
...
* removes float division from idiv in python_alu
* add test
* cleaner logic
* pass clang unsigned literals correctly
* suffix ULL instead of U
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
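One likely motivation for dropping float division here (an assumption on my part, not stated in the PR): Python floats cannot represent large integers exactly, so emulating integer division via `/` can silently return the wrong quotient.

```python
# Plain-Python illustration of why float division is unsafe for idiv:
a, b = 2**60 + 1, 3
print(a // b)      # 384307168202282325, exact integer division
print(int(a / b))  # float division rounds a/b first and loses the low bits
```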
samm393
2c94316bd2
ull literal support and test ( #5789 )
...
* ull literal support and test
* missing .numpy()
2024-07-29 11:50:49 -04:00
nimlgen
71e1472290
hcq more types ( #5791 )
...
* hcq more types
* linter
* pylint
* docs: bind
2024-07-29 18:03:23 +03:00
P4ssenger
9c80f9adf9
fix bug in assert message ( #5787 )
2024-07-29 15:46:23 +03:00
nimlgen
ab3839a80a
cleanup nv/cuda compilers ( #5767 )
...
* cleanup nv/cuda compilers
* destroy prog
* small test
* fix test
* nv ptx rewrite key
* jitlink free
* ptx is part of cuda
2024-07-29 13:50:03 +03:00
chenyu
76840fd65a
minor ops cleanup [run_process_replay] ( #5786 )
2024-07-29 02:30:38 -04:00
chenyu
e7a14f398e
more uop_symbolic tests for divmod pairs ( #5785 )
2024-07-28 21:27:06 -04:00
George Hotz
76d191ab94
move consts to end of add ( #5783 )
...
* move consts to end of add
* better
* fix infinite loop
2024-07-28 17:38:57 -07:00
George Hotz
5b84a7db1a
hotfix: ptx threads match cuda threads
2024-07-28 16:53:24 -07:00
chenyu
460b120d62
apply more .alu syntactic sugar [run_process_replay] ( #5782 )
2024-07-28 19:43:48 -04:00
George Hotz
0392123e6e
TC=2 still sets tensor cores (and TC=3 support for locals) ( #5780 )
...
* TC=2 still sets tensor cores
* add TC=3 support for using locals
* bugfix
* lines + TC=3 tests
* CUDA can use threads, fix fuzz linearizer
2024-07-28 16:16:53 -07:00
chenyu
71a64d8252
UOps.MUL bound when one is negative ( #5781 )
...
* UOps.MUL bound when one is negative
also one more distribute_mul rule
* don't always expand
2024-07-28 19:02:47 -04:00
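For context on the bound rule (generic interval arithmetic, not the exact UOp code from the PR): multiplying a bounded value by a negative constant flips which endpoint gives the minimum and which gives the maximum.

```python
# Generic sketch: bounds of x*c given x in [vmin, vmax] and a constant c.
def mul_bounds(vmin: int, vmax: int, c: int) -> tuple[int, int]:
  lo, hi = vmin * c, vmax * c
  return (hi, lo) if c < 0 else (lo, hi)

print(mul_bounds(0, 10, -3))  # (-30, 0): the old max now yields the min
```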
qazal
b775db6b60
high-level benchmark timing diff ( #5776 )
...
* high level timings
benchmark times
fix defs
* use the name map
* skip last task
2024-07-28 23:42:57 +03:00
chenyu
600a39771d
fix Tensor.arange if (stop-start) and step have different signs ( #5775 )
2024-07-28 14:34:10 -04:00
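A small illustration of the case this touches; the expected result assumes the fix makes Tensor.arange match numpy, which returns an empty array when the step points away from stop.

```python
import numpy as np
from tinygrad import Tensor

# step is positive but stop < start, so there is nothing to generate
print(np.arange(5, 0, 1))              # []
print(Tensor.arange(5, 0, 1).numpy())  # assumed to be [] as well after #5775
```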
David González Martínez
d0fd84e617
feat: allow passing gradient to .backward() to compute vjp ( #5771 )
...
* feat: allow passing gradient to .backward() to compute vjp
* fix
* refactor
* fix trailing whitespace
2024-07-28 11:13:18 -07:00
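A hedged sketch of the usage this enables (going only by the PR title, exact parameter name aside): pass an upstream gradient to .backward() on a non-scalar output to get a vector-Jacobian product.

```python
from tinygrad import Tensor

x = Tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                     # non-scalar output
v = Tensor([1.0, 0.5, 0.25])  # upstream gradient (the "vector" in vjp)
y.backward(v)                 # per #5771; x.grad should become v * dy/dx
print(x.grad.numpy())         # expected: [2.0, 2.0, 1.5]
```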
qazal
e0e7293b0a
make process replay unique in retries [run_process_replay] ( #5773 )
2024-07-28 20:44:15 +03:00
nimlgen
ea27ec4cd0
nv switch classlist_v2 to classlist ( #5763 )
...
* nv switch classlist_v2 to classlist
* support in mockgpu
* fix mockgpu
2024-07-28 20:24:42 +03:00
nimlgen
73fda023d3
amd better comments for ENABLE_SGPR_DISPATCH_PTR ( #5768 )
...
* amd better comments for ENABLE_SGPR_DISPATCH_PTR
* fix linter
2024-07-28 16:23:38 +03:00
qazal
95dda8dadf
more unmatching vectorize/gep asserts [run_process_replay] ( #5760 )
...
* merge vectorize/gep rules [run_process_replay]
* assert dtypes
* src=
* float2=(float4.x,float4.y)
2024-07-28 15:08:54 +08:00
chenyu
bfbd7c5461
more generic UOp mul mod folding ( #5765 )
2024-07-27 20:20:35 -04:00
chenyu
80c6475757
update test_uop_symbolic to test UOp min and max ( #5764 )
...
covers #5750, #5748, #5741
2024-07-27 19:53:21 -04:00
nimlgen
1903542c2d
nv/cuda compilers touchup ( #5759 )
...
* nv/cuda compilers touchup
* fix cuda check + move nv disasm
* remove includes
* fix nvrtc_check
2024-07-28 00:15:28 +03:00
chenyu
3c79faaf77
remove redundant UOps max folding [run_process_replay] ( #5762 )
...
all covered by generic max folding
2024-07-27 16:46:51 -04:00
chenyu
05748e5a84
fix vmax of UOps.RANGE off by 1 ( #5750 )
...
with this, several redundant max folding rules can be removed; done separately to check the kernel diff
2024-07-27 16:30:46 -04:00
nimlgen
fff19b961b
docs: user runtime docs ( #5756 )
2024-07-27 23:21:54 +03:00
nimlgen
5d53fa491b
amd autogened kfd ioctls ( #5757 )
...
* amd autogened kfd ioctls
* unused import
* linter
2024-07-27 22:49:48 +03:00
nimlgen
ed1d784077
test profiler timer sync across devs ( #5751 )
...
* test profiler timer sync across devs
* more correct
* typo
2024-07-27 16:47:37 +03:00