Commit Graph

5398 Commits

Author SHA1 Message Date
chenyu
c2ffcf6887 remove the wrong mod UOp pattern (#5847)
don't think we are hitting it because of the stride construction, and it's wrong and not needed
2024-07-31 16:24:25 -04:00
qazal
8174c438a3 pad test_failure_45 (#5846) 2024-07-31 23:08:48 +03:00
George Hotz
8672a9db3f add test to validate lazyops dims (#5845) 2024-07-31 12:59:38 -07:00
chenyu
4fe5b95568 fix UOp ALU bound (#5844)
* fix UOp ALU bound

root cause of resnet bug, the ALU bound is only correct for scalar, not vectorized

* it can be nan...
2024-07-31 15:19:31 -04:00
George Hotz
5eedd9e3ad raise the line ceiling to 8600. USE LINES CAREFULLY 2024-07-31 09:56:39 -07:00
nimlgen
f768935be8 add RING_ALLREDUCE_THRESHOLD (#5835)
* add RING_ALLREDUCE_THRESHOLD

* benchmark

* fixes

* fix n_gpus

* unused import

* remove debug=2
2024-07-31 16:13:09 +03:00
nimlgen
431749dc21 hcq fix timestamp around kernel (#5837) 2024-07-31 16:12:27 +03:00
chenyu
2e087ca8e4 UOp bound for div negative number (#5808) 2024-07-31 02:10:23 -04:00
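
To illustrate what a bound for division by a negative number involves, here is a minimal sketch of interval bounds for truncating integer division by a nonzero constant. It is not the repository's code: `idiv` and `idiv_bounds` are hypothetical names, and the sketch assumes C-style division that rounds toward zero. That operation is non-decreasing in x for a positive divisor and non-increasing for a negative one, so the endpoints swap when the divisor is negative.

```python
def idiv(x: int, c: int) -> int:
    """Truncating (round-toward-zero) integer division, as in C."""
    q = abs(x) // abs(c)
    return -q if (x < 0) != (c < 0) else q

def idiv_bounds(lo: int, hi: int, c: int) -> tuple[int, int]:
    """Bounds of idiv(x, c) for every integer x in [lo, hi] (hypothetical helper)."""
    if c == 0:
        raise ValueError("divisor must be nonzero")
    a, b = idiv(lo, c), idiv(hi, c)
    # A negative divisor reverses the ordering, so the endpoints swap.
    return (a, b) if c > 0 else (b, a)

print(idiv_bounds(-7, 5, -2))  # (-2, 3)
```

The brute-force check in the test section compares these bounds against every value in small ranges.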
qazal
bcbd925001 hcopts failing test for fused arange kernel (#5815)
* add failure_43

* n 45
2024-07-31 09:02:44 +03:00
chenyu
93c5989c84 add UOp bound for BinaryOps.CMPLT (#5833)
and remove the redundant lt folding rule
2024-07-31 01:46:48 -04:00
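
A bound on a less-than comparison can make a dedicated folding rule unnecessary: once the comparison's range collapses to a single value, a generic constant fold can take over. The following is a sketch under that assumption, not the repository's implementation; `cmplt_bounds` is a hypothetical name and ranges are plain inclusive `(min, max)` tuples.

```python
def cmplt_bounds(x: tuple[int, int], y: tuple[int, int]) -> tuple[bool, bool]:
    """Bounds of the boolean (x < y), given inclusive ranges for x and y."""
    (xlo, xhi), (ylo, yhi) = x, y
    if xhi < ylo:
        return (True, True)    # every x is below every y
    if xlo >= yhi:
        return (False, False)  # no x is below any y
    return (False, True)       # the ranges overlap, so the result is undetermined

print(cmplt_bounds((0, 3), (4, 9)))  # (True, True)
print(cmplt_bounds((4, 9), (0, 4)))  # (False, False)
print(cmplt_bounds((0, 5), (3, 9)))  # (False, True)
```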
chenyu
5560bda509 remove redundant mod 1 pattern [run_process_replay] (#5832)
it's folded because min==max
2024-07-31 01:12:05 -04:00
qazal
ed556c260e UOps.IF rules more tests (#5831)
* init tests

* split tests

* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
Vyacheslav Pachkov
610e454132 fix opencl_ioctl on comma (#5814)
- remove unused code
- add CP_REG_TO_MEM opcode
- fixed parse_cmd_buf for more than 1 command object by correcting
an offset
- fixed memory mappings for cases when memory was allocated with
KGSL_MEMFLAGS_USE_CPU_MAP.
KGSL_MEMFLAGS_USE_CPU_MAP: If set on call and return, the returned GPU
address will be 0. Calling mmap() will set the GPU address.
So there are no IOCTL_KGSL_GPUOBJ_INFO ioctls for that type of memory
and it resulted in a crash right after get_mem.
2024-07-30 20:44:06 -07:00
David Hou
9a485f36e4 shard kvcache (#5830) 2024-07-30 20:29:54 -07:00
David Hou
492a696d14 allow specify splits in shard, handle multiple different splits in MLB.e (#5599)
* allow specify splits in shard, handle multiple different splits in MLB.e

* line width

* linter

* don't use Device in docstring

* specify size of shards instead of boundaries

* adjust docstring for specify size of shards instead of boundaries

* don't allow splits on symbolic axis?

* just allow sint in splits_to_bounds

* add message for assert

* bounds instead of splits to save lines

* fix types

* reduce diff

* fix

* tuple

* golf :(

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-30 19:33:04 -07:00
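
The change from boundaries to shard sizes in the commit above can be pictured with a short sketch. This is an illustration only: `sizes_to_bounds` is a hypothetical helper, not the `splits_to_bounds` function the commit mentions, whose real signature is not shown in this log. It assumes the per-shard sizes are non-negative integers and turns them into half-open `(start, end)` pairs along the sharded axis.

```python
from itertools import accumulate

def sizes_to_bounds(sizes: tuple[int, ...]) -> tuple[tuple[int, int], ...]:
    """Convert per-shard sizes into (start, end) boundaries (hypothetical helper)."""
    if any(s < 0 for s in sizes):
        raise ValueError(f"shard sizes must be non-negative, got {sizes}")
    ends = list(accumulate(sizes))  # running total gives each shard's end
    starts = [0] + ends[:-1]        # each shard starts where the previous one ended
    return tuple(zip(starts, ends))

print(sizes_to_bounds((2, 3, 5)))  # ((0, 2), (2, 5), (5, 10))
```

Sizes are harder to get wrong than boundaries, since they need no ordering or contiguity checks beyond being non-negative.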
George Hotz
3f6a43ba12 less lines [run_process_replay] (#5829) 2024-07-30 19:32:54 -07:00
chenyu
c3da458bc3 UOp if min==max folds to CONST (#5828)
* UOp if min==max folds to CONST

* fix test
2024-07-30 22:14:22 -04:00
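
The idea of folding to a constant when min equals max can be sketched on a toy expression type. Nothing here is the repository's UOp class: `Const`, `Var`, `Mod`, `bounds`, and `fold` are hypothetical names, and the modulus rule assumes a non-negative operand and a positive constant modulus. The sketch also shows why a separate "mod 1" pattern becomes redundant: the range of `x % 1` is `[0, 0]`.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str
    vmin: int
    vmax: int

@dataclass(frozen=True)
class Mod:
    x: "Expr"
    m: int  # assumed to be a positive constant

Expr = Union[Const, Var, Mod]

def bounds(e: Expr) -> tuple[int, int]:
    """Inclusive (min, max) range of an expression."""
    if isinstance(e, Const):
        return e.value, e.value
    if isinstance(e, Var):
        return e.vmin, e.vmax
    lo, hi = bounds(e.x)
    assert lo >= 0 and e.m > 0, "sketch only handles non-negative x and positive m"
    # If x never reaches m, the modulus is a no-op; otherwise the result is in [0, m-1].
    return (lo, hi) if hi < e.m else (0, e.m - 1)

def fold(e: Expr) -> Expr:
    """Replace an expression with a constant when its range is a single value."""
    lo, hi = bounds(e)
    return Const(lo) if lo == hi else e

i = Var("i", 0, 9)
print(fold(Mod(i, 1)))  # Const(value=0)
```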
George Hotz
4e89d45513 hotfix: put contiguous back in llama 2024-07-30 18:43:48 -07:00
George Hotz
21c5e8e1b7 extreme llama speed, 57.34 tok/s (#5827)
* extreme llama speed

* mergeable
2024-07-30 18:32:09 -07:00
George Hotz
e6879035a0 work to make GEMV fast (#5824)
* work to make GEMV fast

* half8 cast

* align struct

* fix amd

* float8 is a later problem
2024-07-30 17:41:40 -07:00
chenyu
2d90b7a103 remove redundant max boolean pattern (#5826)
covered by generic max folding [run_process_replay]
2024-07-30 20:27:54 -04:00
chenyu
02f0be03f2 tests on UOp div negative number and arange opts (#5825) 2024-07-30 20:06:57 -04:00
George Hotz
4dd24dc439 use decimal for timestamps for more precision [run_process_replay] (#5823)
* use decimal for timestamps for more precision

* err, didn't get saved

* fix types + 38 -> 40
2024-07-30 15:06:14 -07:00
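
For context on why a decimal type can hold timestamps more precisely than a float, here is a small generic illustration. The sample value and its unit (microseconds since the epoch, with a fractional part) are assumptions made for the example and do not come from the repository.

```python
from decimal import Decimal

# Hypothetical timestamp: microseconds since the epoch plus a sub-microsecond fraction.
ts_text = "1722376374123456.789"

as_float = float(ts_text)      # a 64-bit float keeps about 15 to 17 significant digits
as_decimal = Decimal(ts_text)  # Decimal keeps every digit given in the text

whole = Decimal("1722376374123456")
print(as_decimal - whole)                   # 0.789
print(Decimal(as_float) == as_decimal)      # False: the float rounded the fraction
```

At this magnitude adjacent floats are a sizeable fraction of a microsecond apart, so differences between nearby timestamps lose detail unless an exact type is used.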
chenyu
d072e628da UOp bounds for max (#5820) 2024-07-30 17:54:44 -04:00
George Hotz
3630208a01 lil transcendental folding cleanup [run_process_replay] (#5822)
* lil transcendental folding cleanup [run_process_replay]

* idk why function isn't Callable
2024-07-30 14:10:17 -07:00
George Hotz
693990a346 swap src[2] and src[3] in load [run_process_replay] (#5821)
* swap src[2] and src[3] in load [run_process_replay]

* cleanups + bugfix

* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412 new style load/store folder (#5784)
* remove old index reorder

* new style folder

* works better

* dedup

* one failure

* this is fine now...

* expander_rewrite

* images broken, but all else should work

* cleanups

* make tests work with old

* fix images

* cleanups + bugfix

* minor fixes

* fix gated store folding

* flip gate_creator and expander

* fix gated store

* remove unneeded rules

* lines getting close

* line count good
2024-07-30 13:17:20 -07:00
chenyu
e8a42b945c simpler src variables in UOp._min_max [run_process_replay] (#5819)
s0,s1 instead of self.src[0] and self.src[1]
2024-07-30 15:18:42 -04:00
Francis Lata
a0baff7a3d update dataloader script example (#5818) 2024-07-30 15:18:29 -04:00
wozeparrot
eebb1b9922 feat: temperature 0 llama3 benchmark (#5806) 2024-07-30 12:05:36 -07:00
qazal
03d866b84f UOps.IF with rewrite rules (#5812)
* expand merge

* merge barriers

* gate_folder

* test_linearizer_failures

* this can be here

* bring the new repr back

* gate_folder2

* gate_creator is better

* gate_folder

* dedup conditions

* early gate folding

* dedup barrier

* fold noop conditions

* all consts can go away

* free lines
2024-07-30 20:50:56 +03:00
chenyu
defd89e8e0 unify negative shape creation to raise ValueError (#5817)
[run_process_replay]
2024-07-30 13:42:59 -04:00
P4ssenger
6742a4789a Add check for negative dimension in view (#5790)
* add check for negative dimension in view

* add negative dim tests

* move check to tensor level

* fix error message

* move check to view create

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-30 13:26:27 -04:00
P4ssenger
2b7b7591d2 rename upcast_axis into plural (#5788) 2024-07-30 10:07:35 -07:00
Francis Lata
ce61be16f1 clean up how preprocessed folder is defined (#5813) 2024-07-30 12:35:26 -04:00
nimlgen
ca674c31f9 nv remove some type ignores (#5811) 2024-07-30 17:47:29 +03:00
wozeparrot
639af3f823 llama3 temperature flag (#5803) 2024-07-29 16:33:51 -07:00
chenyu
22e7289fe0 s/self.shape_len - self.upcasted/self.first_upcast (#5802)
missed the one with spaces.
[run_process_replay]
2024-07-29 18:23:42 -04:00
chenyu
1a19751902 s/self.shape_len-self.upcasted/self.first_upcast (#5801)
[run_process_replay]
2024-07-29 17:54:10 -04:00
qazal
5e827e51d2 add llama3 BEAM=2 failures to test_linearizer_failures (#5553)
* skips

* opts.device

* benchmarks

* add to test_linearizer_failures

* remove hardcoded ones

* linter

* skip cpu
2024-07-30 00:37:32 +03:00
chenyu
cb6718347f python -m mkdocs build --strict in CI (#5800) 2024-07-29 16:46:30 -04:00
nimlgen
a25e1a1c90 nv open correct device (#5796) 2024-07-29 23:40:52 +03:00
chenyu
be3899d211 hotfix increase ci timeout to 20 minutes (#5799)
when cache is clear it takes time to populate cache
2024-07-29 16:25:27 -04:00
chenyu
fc393d710d LazyBuffer.const type check cleanup [run_process_replay] (#5795) 2024-07-29 16:17:14 -04:00
chenyu
2cadf21684 include "mkdocs" in setup docs (#5798) 2024-07-29 15:54:52 -04:00
chenyu
471b188d79 fix mypy errors in latest mypy (#5794)
* fix mypy errors in latest mypy

mypy has stricter partial and api arg checks now

* PYTHONPATH="."
2024-07-29 14:53:30 -04:00
samm393
573e0f9a48 remove float division from idiv in python_alu (#5777)
* removes float division from idiv in python_alu

* add test

* cleaner logic

* pass clang unsigned literals correctly

* suffix ULL instead of U

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
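
The motivation for avoiding float division in an integer divide is easy to show in plain Python. The sketch below is not the repository's `python_alu` code; `idiv_trunc` is a hypothetical name, and it assumes the target semantics are C-style truncation toward zero. Python's `//` floors instead of truncating, and `int(x / y)` goes through a 64-bit float, which cannot represent every large integer exactly.

```python
def idiv_trunc(x: int, y: int) -> int:
    """Integer division that truncates toward zero, using only integer arithmetic."""
    if y == 0:
        raise ZeroDivisionError("integer division by zero")
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

big = 2**63 - 1
print(idiv_trunc(-7, 2), -7 // 2)        # -3 -4  (truncation vs. floor)
print(idiv_trunc(big, 1) == big)         # True
print(int(big / 1) == big)               # False: the float result was rounded
```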
samm393
2c94316bd2 ull literal support and test (#5789)
* ull literal support and test

* missing .numpy()
2024-07-29 11:50:49 -04:00
nimlgen
71e1472290 hcq more types (#5791)
* hcq more types

* linter

* pylint

* docs: bind
2024-07-29 18:03:23 +03:00
P4ssenger
9c80f9adf9 fix bug in assert message (#5787) 2024-07-29 15:46:23 +03:00