chenyu
5c8de2d044
revert a mod pattern ( #5864 )
...
fixed UOP_IS_SYMBOLIC=1 linearizer failure 47
2024-08-01 17:24:26 -04:00
nimlgen
34168a64e3
optimize nv profiler ( #5856 )
...
* nv profiler fix
* cleanup hcq a bit
* fixes
* fix
* typo
* all signals put timestamp
* a bit cleaner
* merge fields
* type
* import
* tiny fix
2024-08-01 23:57:45 +03:00
George Hotz
2d3c7e4d4e
some TestPickleJIT tests ( #5860 )
...
* some TestPickleJIT tests
* hotfix: print which opencl device we are using
2024-08-01 12:39:59 -07:00
George Hotz
e347f10d33
hotfix: print which opencl device we are using
2024-08-01 12:39:46 -07:00
chenyu
0c8d202348
revert some UOp IDIV bound ( #5863 )
...
* revert some UOp IDIV bound
breaks conv with UOP_IS_SYMBOLIC, added some conv tests in CI
* those are correct
* skip slow ones
2024-08-01 15:09:06 -04:00
George Hotz
53fcac9e80
hotfix: increase time on flaky NV test
2024-08-01 10:20:07 -07:00
qazal
cedf459843
infra for multi view reduce_info [run_process_replay] ( #5861 )
2024-08-01 19:46:55 +03:00
qazal
26d0265d66
test schedule of LazyBuffers [run_process_replay] ( #5859 )
2024-08-01 19:06:29 +03:00
George Hotz
0e34d83777
hotfix: don't include the old input_rawbuffers in all_resources
2024-08-01 09:00:11 -07:00
chenyu
d609206a4a
move UOp patterns around [run_process_replay] ( #5857 )
...
group lt / div / mod together and minor cleanups
2024-08-01 11:32:08 -04:00
qazal
3e95e2bb0b
mutate reduceop shapes pre ast creation [run_process_replay] ( #5855 )
2024-08-01 15:00:05 +03:00
qazal
ba0a0008aa
early update the reduceop axis [run_process_replay] ( #5854 )
2024-08-01 14:08:40 +03:00
David Hou
eb91423cb4
MLB support reshape for uneven shards ( #5804 )
...
* cleaner uneven reshape
* update test
2024-08-01 02:36:03 -07:00
David González Martínez
0f09b94c43
add failing test for second order derivatives ( #5772 )
...
* add failing test
* fix lint
* fix bad merge
* fix again
* fix test
* more minimal
2024-08-01 02:34:47 -07:00
George Hotz
9d05dfb6f4
move JIT graphing into CapturedJit ( #5852 )
...
* move JIT graphing into CapturedJit
* better
* _jit_cache
* clear inputs cleanup
* test_pickle_jit with graph + cleanup
* 0 is fine to start
* support None in bufs
* alloc real buffers
* cleaner
2024-07-31 20:48:17 -07:00
chenyu
0ec732b494
test lin fail 47 for UOP_IS_SYMBOLIC ( #5853 )
...
failed arange example with UOP_IS_SYMBOLIC
2024-07-31 23:09:22 -04:00
George Hotz
c6a8395f1b
CapturedJit is fun to pickle [run_process_replay] ( #5851 )
...
* CapturedJit is fun to pickle
* export input replace
2024-07-31 17:23:01 -07:00
George Hotz
5ff3e46718
diff symbolic with uops [run_process_replay] ( #5841 )
...
* diff symbolic with uops
* mergeable symbolic diff
2024-07-31 15:15:01 -07:00
George Hotz
72621d9e7c
count the specials in uops [run_process_replay] ( #5848 )
...
* count the specials in uops [run_process_replay]
* cleanups
2024-07-31 14:53:18 -07:00
chenyu
c2ffcf6887
remove the wrong mod UOp pattern ( #5847 )
...
don't think we are hitting it because of the stride construction; it's wrong and not needed
2024-07-31 16:24:25 -04:00
qazal
8174c438a3
pad test_failure_45 ( #5846 )
2024-07-31 23:08:48 +03:00
George Hotz
8672a9db3f
add test to validate lazyops dims ( #5845 )
2024-07-31 12:59:38 -07:00
chenyu
4fe5b95568
fix UOp ALU bound ( #5844 )
...
* fix UOp ALU bound
root cause of resnet bug, the ALU bound is only correct for scalar, not vectorized
* it can be nan...
2024-07-31 15:19:31 -04:00
George Hotz
5eedd9e3ad
raise the line ceiling to 8600. USE LINES CAREFULLY
2024-07-31 09:56:39 -07:00
nimlgen
f768935be8
add RING_ALLREDUCE_THRESHOLD ( #5835 )
...
* add RING_ALLREDUCE_THRESHOLD
* benchmark
* fixes
* fix n_gpus
* unused import
* remove debug=2
2024-07-31 16:13:09 +03:00
nimlgen
431749dc21
hcq fix timestamp around kernel ( #5837 )
2024-07-31 16:12:27 +03:00
chenyu
2e087ca8e4
UOp bound for div negative number ( #5808 )
2024-07-31 02:10:23 -04:00
qazal
bcbd925001
hcopts failing test for fused arange kernel ( #5815 )
...
* add failure_43
* n 45
2024-07-31 09:02:44 +03:00
chenyu
93c5989c84
add UOp bound for BinaryOps.CMPLT ( #5833 )
...
and remove the redundant lt folding rule
2024-07-31 01:46:48 -04:00
chenyu
5560bda509
remove redundant mod 1 pattern [run_process_replay] ( #5832 )
...
it's folded because min==max
2024-07-31 01:12:05 -04:00
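Note on "it's folded because min==max": the reasoning can be sketched with plain interval bounds. This is a minimal illustration, not the project's code; `Bounded`, `mod_const`, and `fold` are hypothetical names, and the sketch assumes a non-negative operand and a positive constant divisor.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Bounded:
    """Hypothetical stand-in for an integer expression with inclusive bounds."""
    vmin: int
    vmax: int

def mod_const(x: Bounded, c: int) -> Bounded:
    """Conservative bounds for x % c, assuming x >= 0 and c > 0."""
    assert x.vmin >= 0 and c > 0
    # If x never crosses a multiple of c, the remainder grows with x.
    if x.vmin // c == x.vmax // c:
        return Bounded(x.vmin % c, x.vmax % c)
    # Otherwise the remainder can be anything in [0, c - 1].
    return Bounded(0, c - 1)

def fold(x: Bounded) -> Optional[int]:
    """Return the constant value when the bounds pin it down, else None."""
    return x.vmin if x.vmin == x.vmax else None

idx = Bounded(0, 15)
print(fold(mod_const(idx, 1)))  # bounds collapse to (0, 0), so this folds
print(fold(mod_const(idx, 4)))  # bounds are (0, 3), so nothing folds
```

Under these assumptions a dedicated "x % 1" rule adds nothing, because the generic min==max fold already covers it.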
qazal
ed556c260e
UOps.IF rules more tests ( #5831 )
...
* init tests
* split tests
* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
Vyacheslav Pachkov
610e454132
fix opencl_ioctl on comma ( #5814 )
...
- remove unused code
- add CP_REG_TO_MEM opcode
- fixed parse_cmd_buf for more than 1 command object by correcting an offset
- fixed memory mappings for cases when memory was allocated with KGSL_MEMFLAGS_USE_CPU_MAP.
KGSL_MEMFLAGS_USE_CPU_MAP: If set on call and return, the returned GPU address will be 0. Calling mmap() will set the GPU address.
So there are no IOCTL_KGSL_GPUOBJ_INFO ioctls for that type of memory, and this resulted in a crash right after get_mem.
2024-07-30 20:44:06 -07:00
David Hou
9a485f36e4
shard kvcache ( #5830 )
2024-07-30 20:29:54 -07:00
David Hou
492a696d14
allow specify splits in shard, handle multiple different splits in MLB.e ( #5599 )
...
* allow specify splits in shard, handle multiple different splits in MLB.e
* line width
* linter
* don't use Device in docstring
* specify size of shards instead of boundaries
* adjust docstring for specify size of shards instead of boundaries
* don't allow splits on symbolic axis?
* just allow sint in splits_to_bounds
* add message for assert
* bounds instead of splits to save lines
* fix types
* reduce diff
* fix
* tuple
* golf :(
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-30 19:33:04 -07:00
George Hotz
3f6a43ba12
less lines [run_process_replay] ( #5829 )
2024-07-30 19:32:54 -07:00
chenyu
c3da458bc3
UOp if min==max folds to CONST ( #5828 )
...
* UOp if min==max folds to CONST
* fix test
2024-07-30 22:14:22 -04:00
George Hotz
4e89d45513
hotfix: put contiguous back in llama
2024-07-30 18:43:48 -07:00
George Hotz
21c5e8e1b7
extreme llama speed, 57.34 tok/s ( #5827 )
...
* extreme llama speed
* mergeable
2024-07-30 18:32:09 -07:00
George Hotz
e6879035a0
work to make GEMV fast ( #5824 )
...
* work to make GEMV fast
* half8 cast
* align struct
* fix amd
* float8 is a later problem
2024-07-30 17:41:40 -07:00
chenyu
2d90b7a103
remove redundant max boolean pattern ( #5826 )
...
covered by generic max folding [run_process_replay]
2024-07-30 20:27:54 -04:00
chenyu
02f0be03f2
tests on UOp div negative number and arange opts ( #5825 )
2024-07-30 20:06:57 -04:00
George Hotz
4dd24dc439
use decimal for timestamps for more precision [run_process_replay] ( #5823 )
...
* use decimal for timestamps for more precision
* err, didn't get saved
* fix types + 38 -> 40
2024-07-30 15:06:14 -07:00
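Note on "use decimal for timestamps for more precision": the sketch below shows the general float64 limitation that likely motivates such a change. It uses only the Python standard library; the timestamp value and the microsecond unit are assumptions made for illustration, not values taken from the commit.

```python
from decimal import Decimal

# Assumed example: a wall-clock timestamp in microseconds (about 1.7e15).
base_us = 1722460000000000

# float64 has a 53-bit significand; at this magnitude adjacent floats are
# 0.25 apart, so a 0.1 microsecond offset is rounded away entirely.
as_float = float(base_us) + 0.1
print(as_float == float(base_us))

# Decimal keeps the fractional part (default context precision is 28 digits).
as_decimal = Decimal(base_us) + Decimal("0.1")
print(as_decimal)
```

The trade-off is speed: Decimal arithmetic is slower than float arithmetic, which matters little for timestamps that are only collected and subtracted.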
chenyu
d072e628da
UOp bounds for max ( #5820 )
2024-07-30 17:54:44 -04:00
George Hotz
3630208a01
lil transcendental folding cleanup [run_process_replay] ( #5822 )
...
* lil transcendental folding cleanup [run_process_replay]
* idk why function isn't Callable
2024-07-30 14:10:17 -07:00
George Hotz
693990a346
swap src[2] and src[3] in load [run_process_replay] ( #5821 )
...
* swap src[2] and src[3] in load [run_process_replay]
* cleanups + bugfix
* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412
new style load/store folder ( #5784 )
...
* remove old index reorder
* new style folder
* works better
* dedup
* one failure
* this is fine now...
* expander_rewrite
* images broken, but all else should work
* cleanups
* make tests work with old
* fix images
* cleanups + bugfix
* minor fixes
* fix gated store folding
* flip gate_creator and expander
* fix gated store
* remove unneeded rules
* lines getting close
* line count good
2024-07-30 13:17:20 -07:00
chenyu
e8a42b945c
simpler src variables in UOp._min_max [run_process_replay] ( #5819 )
...
s0,s1 instead of self.src[0] and self.src[1]
2024-07-30 15:18:42 -04:00
Francis Lata
a0baff7a3d
update dataloader script example ( #5818 )
2024-07-30 15:18:29 -04:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark ( #5806 )
2024-07-30 12:05:36 -07:00
qazal
03d866b84f
UOps.IF with rewrite rules ( #5812 )
...
* expand merge
* merge barriers
* gate_folder
* test_linearizer_failures
* this can be here
* bring the new repr back
* gate_folder2
* gate_creator is better
* gate_folder
* dedup conditions
* early gate folding
* dedup barrier
* fold noop conditions
* all consts can go away
* free lines
2024-07-30 20:50:56 +03:00