Commit Graph

5375 Commits

chenyu
d072e628da UOp bounds for max (#5820) 2024-07-30 17:54:44 -04:00
George Hotz
3630208a01 lil transcendental folding cleanup [run_process_replay] (#5822)
* lil transcendental folding cleanup [run_process_replay]

* idk why function isn't Callable
2024-07-30 14:10:17 -07:00
George Hotz
693990a346 swap src[2] and src[3] in load [run_process_replay] (#5821)
* swap src[2] and src[3] in load [run_process_replay]

* cleanups + bugfix

* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412 new style load/store folder (#5784)
* remove old index reorder

* new style folder

* works better

* dedup

* one failure

* this is fine now...

* expander_rewrite

* images broken, but all else should work

* cleanups

* make tests work with old

* fix images

* cleanups + bugfix

* minor fixes

* fix gated store folding

* flip gate_creator and expander

* fix gated store

* remove unneeded rules

* lines getting close

* line count good
2024-07-30 13:17:20 -07:00
chenyu
e8a42b945c simpler src variables in UOp._min_max [run_process_replay] (#5819)
s0,s1 instead of self.src[0] and self.src[1]
2024-07-30 15:18:42 -04:00
Francis Lata
a0baff7a3d update dataloader script example (#5818) 2024-07-30 15:18:29 -04:00
wozeparrot
eebb1b9922 feat: temperature 0 llama3 benchmark (#5806) 2024-07-30 12:05:36 -07:00
qazal
03d866b84f UOps.IF with rewrite rules (#5812)
* expand merge

* merge barriers

* gate_folder

* test_linearizer_failures

* this can be here

* bring the new repr back

* gate_folder2

* gate_creator is better

* gate_folder

* dedup conditions

* early gate folding

* dedup barrier

* fold noop conditions

* all consts can go away

* free lines
2024-07-30 20:50:56 +03:00
chenyu
defd89e8e0 unify negative shape creation to raise ValueError (#5817)
[run_process_replay]
2024-07-30 13:42:59 -04:00
P4ssenger
6742a4789a Add check for negative dimension in view (#5790)
* add check for negative dimension in view

* add negative dim tests

* move check to tensor level

* fix error message

* move check to view create

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-30 13:26:27 -04:00
P4ssenger
2b7b7591d2 rename upcast_axis into plural (#5788) 2024-07-30 10:07:35 -07:00
Francis Lata
ce61be16f1 clean up how preprocessed folder is defined (#5813) 2024-07-30 12:35:26 -04:00
nimlgen
ca674c31f9 nv remove some type ignores (#5811) 2024-07-30 17:47:29 +03:00
wozeparrot
639af3f823 llama3 temperature flag (#5803) 2024-07-29 16:33:51 -07:00
chenyu
22e7289fe0 s/self.shape_len - self.upcasted/self.first_upcast (#5802)
missed the one with spaces.
[run_process_replay]
2024-07-29 18:23:42 -04:00
chenyu
1a19751902 s/self.shape_len-self.upcasted/self.first_upcast (#5801)
[run_process_replay]
2024-07-29 17:54:10 -04:00
qazal
5e827e51d2 add llama3 BEAM=2 failures to test_linearizer_failures (#5553)
* skips

* opts.device

* benchmarks

* add to test_linearizer_failures

* remove hardcoded ones

* linter

* skip cpu
2024-07-30 00:37:32 +03:00
chenyu
cb6718347f python -m mkdocs build --strict in CI (#5800) 2024-07-29 16:46:30 -04:00
nimlgen
a25e1a1c90 nv open correct device (#5796) 2024-07-29 23:40:52 +03:00
chenyu
be3899d211 hotfix increase ci timeout to 20 mintues (#5799)
when the cache is cleared it takes time to repopulate
2024-07-29 16:25:27 -04:00
chenyu
fc393d710d LazyBuffer.const type check cleanup [run_process_replay] (#5795) 2024-07-29 16:17:14 -04:00
chenyu
2cadf21684 include "mkdocs" in setup docs (#5798) 2024-07-29 15:54:52 -04:00
chenyu
471b188d79 fix mypy errors in latest mypy (#5794)
* fix mypy errors in latest mypy

mypy has stricter partial and api arg checks now

* PYTHONPATH="."
2024-07-29 14:53:30 -04:00
samm393
573e0f9a48 remove float division from idiv in python_alu (#5777)
* removes float division from idiv in python_alu

* add test

* cleaner logic

* pass clang unsigned literals correctly

* suffix ULL instead of U

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
samm393
2c94316bd2 ull literal support and test (#5789)
* ull literal support and test

* missing .numpy()
2024-07-29 11:50:49 -04:00
nimlgen
71e1472290 hcq more types (#5791)
* hcq more types

* linter

* pylint

* docs: bind
2024-07-29 18:03:23 +03:00
P4ssenger
9c80f9adf9 fix bug in assert message (#5787) 2024-07-29 15:46:23 +03:00
nimlgen
ab3839a80a cleanup nv/cuda compilers (#5767)
* cleanup nv/cuda compilers

* destroy prog

* small test

* fix test

* nv ptx rewrite key

* jitlink free

* ptx is part of cuda
2024-07-29 13:50:03 +03:00
chenyu
76840fd65a minor ops cleanup [run_process_replay] (#5786) 2024-07-29 02:30:38 -04:00
chenyu
e7a14f398e more uop_symbolic tests for divmod pairs (#5785) 2024-07-28 21:27:06 -04:00
George Hotz
76d191ab94 move consts to end of add (#5783)
* move consts to end of add

* better

* fix infinite loop
2024-07-28 17:38:57 -07:00
George Hotz
5b84a7db1a hotfix: ptx threads match cuda threads 2024-07-28 16:53:24 -07:00
chenyu
460b120d62 apply more .alu syntactic sugar [run_process_replay] (#5782) 2024-07-28 19:43:48 -04:00
George Hotz
0392123e6e TC=2 still sets tensor cores (and TC=3 support for locals) (#5780)
* TC=2 still sets tensor cores

* add TC=3 support for using locals

* bugfix

* lines + TC=3 tests

* CUDA can use threads, fix fuzz linearizer
2024-07-28 16:16:53 -07:00
chenyu
71a64d8252 UOps.MUL bound when one is negative (#5781)
* UOps.MUL bound when one is negative

also one more distribute_mul rule

* don't always expand
2024-07-28 19:02:47 -04:00
qazal
b775db6b60 high-level benchmark timing diff (#5776)
* high level timings

benchmark times

fix defs

* use the name map

* skip last task
2024-07-28 23:42:57 +03:00
chenyu
600a39771d fix Tensor.arange if (stop-start) and step have different signs (#5775) 2024-07-28 14:34:10 -04:00
David González Martínez
d0fd84e617 feat: allow passing gradient to .backward() to compute vjp (#5771)
* feat: allow passing gradient to .backward() to compute vjp

* fix

* refactor

* fix trailing whitespace
2024-07-28 11:13:18 -07:00
qazal
e0e7293b0a make process replay unique in retries [run_process_replay] (#5773) 2024-07-28 20:44:15 +03:00
nimlgen
ea27ec4cd0 nv switch classlist_v2 to classlist (#5763)
* nv switch classlist_v2 to classlist

* support in mockgpu

* fix mockgpu
2024-07-28 20:24:42 +03:00
nimlgen
73fda023d3 amd better comments for ENABLE_SGPR_DISPATCH_PTR (#5768)
* amd better comments for ENABLE_SGPR_DISPATCH_PTR

* fix linter
2024-07-28 16:23:38 +03:00
qazal
95dda8dadf more unmatching vectorize/gep asserts [run_process_replay] (#5760)
* merge vectorize/gep rules [run_process_replay]

* assert dtypes

* src=

* float2=(float4.x,float4.y)
2024-07-28 15:08:54 +08:00
chenyu
bfbd7c5461 more generic UOp mul mod folding (#5765) 2024-07-27 20:20:35 -04:00
chenyu
80c6475757 update test_uop_symbolic to test UOp min and max (#5764)
covers #5750, #5748, #5741
2024-07-27 19:53:21 -04:00
nimlgen
1903542c2d nv/cuda compilers touchup (#5759)
* nv/cuda compilers touchup

* fix cuda check + move nv disasm

* remove includes

* fix nvrtc_check
2024-07-28 00:15:28 +03:00
chenyu
3c79faaf77 remove redundant UOps max folding [run_process_replay] (#5762)
all covered by generic max folding
2024-07-27 16:46:51 -04:00
chenyu
05748e5a84 fix vmax of Uop.RANGE off by 1 (#5750)
with this, can remove several redundant max folding rules, do it separately to check kernel diff
2024-07-27 16:30:46 -04:00
nimlgen
fff19b961b docs: user runtime docs (#5756) 2024-07-27 23:21:54 +03:00
nimlgen
5d53fa491b amd autogened kfd ioctls (#5757)
* amd autogened kio

* unused import

* linter
2024-07-27 22:49:48 +03:00
nimlgen
ed1d784077 test profiler timer sync across devs (#5751)
* test profiler timer sync across devs

* more correct

* typo
2024-07-27 16:47:37 +03:00