George Hotz
8672a9db3f
add test to validate lazyops dims ( #5845 )
2024-07-31 12:59:38 -07:00
chenyu
4fe5b95568
fix UOp ALU bound ( #5844 )
...
* fix UOp ALU bound
root cause of resnet bug, the ALU bound is only correct for scalar, not vectorized
* it can be nan...
2024-07-31 15:19:31 -04:00
George Hotz
5eedd9e3ad
raise the line ceiling to 8600. USE LINES CAREFULLY
2024-07-31 09:56:39 -07:00
nimlgen
f768935be8
add RING_ALLREDUCE_THRESHOLD ( #5835 )
...
* add RING_ALLREDUCE_THRESHOLD
* benchmark
* fixes
* fix n_gpus
* unused import
* remove debug=2
2024-07-31 16:13:09 +03:00
nimlgen
431749dc21
hcq fix timestamp around kernel ( #5837 )
2024-07-31 16:12:27 +03:00
chenyu
2e087ca8e4
UOp bound for div negative number ( #5808 )
2024-07-31 02:10:23 -04:00
qazal
bcbd925001
hcopts failing test for fused arange kernel ( #5815 )
...
* add failure_43
* n 45
2024-07-31 09:02:44 +03:00
chenyu
93c5989c84
add UOp bound for BinaryOps.CMPLT ( #5833 )
...
and remove the redundant lt folding rule
2024-07-31 01:46:48 -04:00
chenyu
5560bda509
remove redundant mod 1 pattern [run_process_replay] ( #5832 )
...
it's folded because min==max
2024-07-31 01:12:05 -04:00
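Note on #5832 above: a minimal sketch of why a dedicated `mod 1` rule becomes redundant once min==max expressions fold to a constant. It assumes a simple interval analysis over non-negative integers. The helper names `mod_bounds` and `fold_if_const` are hypothetical and are not tinygrad's API; the actual UOp bound logic may differ.

```python
from typing import Optional, Tuple

def mod_bounds(x_min: int, x_max: int, c: int) -> Tuple[int, int]:
    """Conservative bounds of x % c for 0 <= x_min <= x <= x_max and a constant c > 0."""
    assert 0 <= x_min <= x_max and c > 0
    # If the whole input range is already below c, the mod changes nothing.
    if x_max < c:
        return (x_min, x_max)
    # Otherwise the result can be anywhere in [0, c - 1].
    return (0, c - 1)

def fold_if_const(bounds: Tuple[int, int]) -> Optional[int]:
    """Return the constant value when min == max, else None (hypothetical helper)."""
    lo, hi = bounds
    return lo if lo == hi else None

# x % 1 always has bounds (0, 0), so the generic min == max fold handles it.
mod1 = fold_if_const(mod_bounds(0, 100, 1))
mod4 = fold_if_const(mod_bounds(0, 100, 4))
```

Here `mod1` folds to the constant 0 without any special-case pattern, while `mod4` stays unfolded because its bounds differ.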
qazal
ed556c260e
UOps.IF rules more tests ( #5831 )
...
* init tests
* split tests
* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
Vyacheslav Pachkov
610e454132
fix opencl_ioctl on comma ( #5814 )
...
- remove unused code
- add CP_REG_TO_MEM opcode
- fixed parse_cmd_buf for more than 1 command object by correcting
an offset
- fixed memory mappings for cases when memory was allocated with
KGSL_MEMFLAGS_USE_CPU_MAP.
KGSL_MEMFLAGS_USE_CPU_MAP: If set on call and return, the returned GPU
address will be 0. Calling mmap() will set the GPU address.
So there are no IOCTL_KGSL_GPUOBJ_INFO ioctls for that type of memory,
which resulted in a crash right after get_mem.
2024-07-30 20:44:06 -07:00
David Hou
9a485f36e4
shard kvcache ( #5830 )
2024-07-30 20:29:54 -07:00
David Hou
492a696d14
allow specify splits in shard, handle multiple different splits in MLB.e ( #5599 )
...
* allow specify splits in shard, handle multiple different splits in MLB.e
* line width
* linter
* don't use Device in docstring
* specify size of shards instead of boundaries
* adjust docstring for specify size of shards instead of boundaries
* don't allow splits on symbolic axis?
* just allow sint in splits_to_bounds
* add message for assert
* bounds instead of splits to save lines
* fix types
* reduce diff
* fix
* tuple
* golf :(
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-30 19:33:04 -07:00
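Note on #5599 above: the commit body mentions specifying shard sizes instead of boundaries. A minimal sketch of that conversion, assuming plain integer sizes along one axis. `sizes_to_bounds` is a hypothetical name chosen for illustration; it is not claimed to match the function in the commit.

```python
from itertools import accumulate
from typing import Sequence, Tuple

def sizes_to_bounds(sizes: Sequence[int]) -> Tuple[Tuple[int, int], ...]:
    """Turn per-shard sizes into (start, end) bounds along the sharded axis."""
    assert all(s >= 0 for s in sizes), "shard sizes must be non-negative"
    ends = list(accumulate(sizes))   # running total gives each shard's end
    starts = [0] + ends[:-1]         # each shard starts where the previous one ended
    return tuple(zip(starts, ends))

bounds = sizes_to_bounds((2, 3, 1))
```

With sizes (2, 3, 1) on an axis of length 6, the shards cover (0, 2), (2, 5) and (5, 6). Sizes are easier to write by hand than bounds, and the bounds can always be derived from them.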
George Hotz
3f6a43ba12
less lines [run_process_replay] ( #5829 )
2024-07-30 19:32:54 -07:00
chenyu
c3da458bc3
UOp if min==max folds to CONST ( #5828 )
...
* UOp if min==max folds to CONST
* fix test
2024-07-30 22:14:22 -04:00
George Hotz
4e89d45513
hotfix: put contiguous back in llama
2024-07-30 18:43:48 -07:00
George Hotz
21c5e8e1b7
extreme llama speed, 57.34 tok/s ( #5827 )
...
* extreme llama speed
* mergable
2024-07-30 18:32:09 -07:00
George Hotz
e6879035a0
work to make GEMV fast ( #5824 )
...
* work to make GEMV fast
* half8 cast
* align struct
* fix amd
* float8 is a later problem
2024-07-30 17:41:40 -07:00
chenyu
2d90b7a103
remove redundant max boolean pattern ( #5826 )
...
covered by generic max folding [run_process_replay]
2024-07-30 20:27:54 -04:00
chenyu
02f0be03f2
tests on UOp div negative number and arange opts ( #5825 )
2024-07-30 20:06:57 -04:00
George Hotz
4dd24dc439
use decimal for timestamps for more precision [run_process_replay] ( #5823 )
...
* use decimal for timestamps for more precision
* err, didn't get saved
* fix types + 38 -> 40
2024-07-30 15:06:14 -07:00
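Note on #5823 above: a small illustration of why `decimal` can hold timestamps more precisely than `float`. The values are made up for the example and are not taken from the commit; the units are left unspecified on purpose.

```python
from decimal import Decimal

base = 10**15  # a large timestamp value (illustrative only)

# A float64 near 1e15 has a spacing of 0.125, so adding 0.001 is lost to rounding.
float_delta = (float(base) + 0.001) - float(base)

# Decimal keeps the fractional part exactly (default context precision is 28 digits).
decimal_delta = (Decimal(base) + Decimal("0.001")) - Decimal(base)
```

`float_delta` comes out as 0.0, while `decimal_delta` is exactly `Decimal("0.001")`.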
chenyu
d072e628da
UOp bounds for max ( #5820 )
2024-07-30 17:54:44 -04:00
George Hotz
3630208a01
lil transcendental folding cleanup [run_process_replay] ( #5822 )
...
* lil transcendental folding cleanup [run_process_replay]
* idk why function isn't Callable
2024-07-30 14:10:17 -07:00
George Hotz
693990a346
swap src[2] and src[3] in load [run_process_replay] ( #5821 )
...
* swap src[2] and src[3] in load [run_process_replay]
* cleanups + bugfix
* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412
new style load/store folder ( #5784 )
...
* remove old index reorder
* new style folder
* works better
* dedup
* one failure
* this is fine now...
* expander_rewrite
* images broken, but all else should work
* cleanups
* make tests work with old
* fix images
* cleanups + bugfix
* minor fixes
* fix gated store folding
* flip gate_creator and expander
* fix gated store
* remove unneeded rules
* lines getting close
* line count good
2024-07-30 13:17:20 -07:00
chenyu
e8a42b945c
simpler src variables in UOp._min_max [run_process_replay] ( #5819 )
...
s0,s1 instead of self.src[0] and self.src[1]
2024-07-30 15:18:42 -04:00
Francis Lata
a0baff7a3d
update dataloader script example ( #5818 )
2024-07-30 15:18:29 -04:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark ( #5806 )
2024-07-30 12:05:36 -07:00
qazal
03d866b84f
UOps.IF with rewrite rules ( #5812 )
...
* expand merge
* merge barriers
* gate_folder
* test_linearizer_failures
* this can be here
* bring the new repr back
* gate_folder2
* gate_creator is better
* gate_folder
* dedup conditions
* early gate folding
* dedup barrier
* fold noop conditions
* all consts can go away
* free lines
2024-07-30 20:50:56 +03:00
chenyu
defd89e8e0
unify negative shape creation to raise ValueError ( #5817 )
...
[run_process_replay]
2024-07-30 13:42:59 -04:00
P4ssenger
6742a4789a
Add check for negative dimension in view ( #5790 )
...
* add check for negative dimension in view
* add negative dim tests
* move check to tensor level
* fix error message
* move check to view create
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-30 13:26:27 -04:00
P4ssenger
2b7b7591d2
rename upcast_axis into plural ( #5788 )
2024-07-30 10:07:35 -07:00
Francis Lata
ce61be16f1
clean up how preprocessed folder is defined ( #5813 )
2024-07-30 12:35:26 -04:00
nimlgen
ca674c31f9
nv remove some type ignores ( #5811 )
2024-07-30 17:47:29 +03:00
wozeparrot
639af3f823
llama3 temperature flag ( #5803 )
2024-07-29 16:33:51 -07:00
chenyu
22e7289fe0
s/self.shape_len - self.upcasted/self.first_upcast ( #5802 )
...
missed the one with spaces.
[run_process_replay]
2024-07-29 18:23:42 -04:00
chenyu
1a19751902
s/self.shape_len-self.upcasted/self.first_upcast ( #5801 )
...
[run_process_replay]
2024-07-29 17:54:10 -04:00
qazal
5e827e51d2
add llama3 BEAM=2 failures to test_linearizer_failures ( #5553 )
...
* skips
* opts.device
* benchmarks
* add to test_linearizer_failures
* remove hardcoded ones
* linter
* skip cpu
2024-07-30 00:37:32 +03:00
chenyu
cb6718347f
python -m mkdocs build --strict in CI ( #5800 )
2024-07-29 16:46:30 -04:00
nimlgen
a25e1a1c90
nv open correct device ( #5796 )
2024-07-29 23:40:52 +03:00
chenyu
be3899d211
hotfix increase ci timeout to 20 minutes ( #5799 )
...
when cache is clear it takes time to populate cache
2024-07-29 16:25:27 -04:00
chenyu
fc393d710d
LazyBuffer.const type check cleanup [run_process_replay] ( #5795 )
2024-07-29 16:17:14 -04:00
chenyu
2cadf21684
include "mkdocs" in setup docs ( #5798 )
2024-07-29 15:54:52 -04:00
chenyu
471b188d79
fix mypy errors in latest mypy ( #5794 )
...
* fix mypy errors in latest mypy
mypy has stricter partial and api arg checks now
* PYTHONPATH="."
2024-07-29 14:53:30 -04:00
samm393
573e0f9a48
remove float division from idiv in python_alu ( #5777 )
...
* removes float division from idiv in python_alu
* add test
* cleaner logic
* pass clang unsigned literals correctly
* suffix ULL instead of U
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
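Note on #5777 above: a minimal sketch of integer division that truncates toward zero (C-style) using only integer arithmetic, and of why going through float division is lossy for large values. `idiv_trunc` is a hypothetical name; this is not claimed to be the code in `python_alu`, and division by zero is not handled here.

```python
def idiv_trunc(x: int, y: int) -> int:
    """Integer division truncating toward zero, without any float intermediate."""
    q = abs(x) // abs(y)                       # magnitude of the quotient
    return q if (x < 0) == (y < 0) else -q     # negative when the signs differ

big = 2**63 - 1                # not exactly representable as a float64
via_int = idiv_trunc(big, 1)   # stays exact
via_float = int(big / 1)       # rounds to 2**63 on the way through float
```

Python's `//` floors instead of truncating, so `-7 // 2` is -4, whereas `idiv_trunc(-7, 2)` is -3. The float route gives the wrong answer for `big` because `big / 1` rounds to the nearest float64 first.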
samm393
2c94316bd2
ull literal support and test ( #5789 )
...
* ull literal support and test
* missing .numpy()
2024-07-29 11:50:49 -04:00
nimlgen
71e1472290
hcq more types ( #5791 )
...
* hcq more types
* linter
* pylint
* docs: bind
2024-07-29 18:03:23 +03:00
P4ssenger
9c80f9adf9
fix bug in assert message ( #5787 )
2024-07-29 15:46:23 +03:00
nimlgen
ab3839a80a
cleanup nv/cuda compilers ( #5767 )
...
* cleanup nv/cuda compilers
* destroy prog
* small test
* fix test
* nv ptx rewrite key
* jitlink free
* ptx is part of cuda
2024-07-29 13:50:03 +03:00
chenyu
76840fd65a
minor ops cleanup [run_process_replay] ( #5786 )
2024-07-29 02:30:38 -04:00