5310 Commits

Author SHA1 Message Date
chenyu
0d7d4dd731 UOp._min_max for MUL and MOD (#5741) 2024-07-26 18:38:10 -04:00
George Hotz
c50e374bb6 multiple locals + get_kernel_modifier + fix valid (#5739)
* multiple locals + get_kernel_modifier + fix valid

* fix test pattern matcher
2024-07-26 15:10:10 -07:00
nimlgen
f6c0e17a2c optimize symbolic-related updates in graphs (#5727)
* try

* faster

* cleaner

* better?

* better?

* cleaner

* fixes

* unused

* mypy

* fix clang

* remove comment

* better var names

* rename

* fix cuda

* rename
2024-07-27 00:57:59 +03:00
chenyu
dc7483ee6f UOp simple div folding (#5740)
made UOp.divides return the Optional[quotient] and used it for simple div folding
2024-07-26 17:14:32 -04:00
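
The idea in the commit message above (have a divisibility check return the quotient instead of a boolean, then reuse it for folding) can be sketched on plain integers. This is a hypothetical illustration, not tinygrad's `UOp.divides`: the standalone functions `divides` and `fold_scaled_div` and their integer arguments are assumptions made for the example.

```python
from typing import Optional

def divides(value: int, c: int) -> Optional[int]:
    """Return value // c if c divides value exactly, otherwise None (hypothetical helper)."""
    if c == 0 or value % c != 0:
        return None
    return value // c

def fold_scaled_div(coeff: int, c: int) -> Optional[int]:
    """For the expression (coeff * x) // c, return k such that it equals k * x.

    Returns None when c does not divide coeff, meaning this simple fold does not apply.
    """
    return divides(coeff, c)

# (12 * x) // 4 folds to 3 * x; (13 * x) // 4 is left alone.
print(fold_scaled_div(12, 4))
print(fold_scaled_div(13, 4))
```

Because a valid quotient can be `0`, callers of such a helper should test the result with `is not None` rather than relying on truthiness.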
chenyu
671259417f reuse UOp __repr__ for NOp (#5738) 2024-07-26 16:59:55 -04:00
kormann
b0c1dba299 named UOp class "NOP" [run_process_replay] (#5728)
* NOP

* fix const + simplify compile

* rm VAR for NOOP

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-26 13:25:53 -07:00
George Hotz
4df46eac67 clean up tensor cores [run_process_replay] (#5736)
* clean up tensor cores [run_process_replay]

* remove tuple(wmma_sz), self.opts.device

* remove tls, leave DEVICE
2024-07-26 13:21:23 -07:00
qazal
94d578396f separate process replay main loop (#5734)
* separate process replay main loop

* [run_process_replay]

* add kernel_changed

* test with [run_process_replay]

* revert temp [run_process_replay]
2024-07-26 21:43:08 +03:00
chenyu
9838c1a6ff update import style in runtime (#5735) 2024-07-26 14:00:23 -04:00
chenyu
a4e9ebc68a update test_uop_symbolic (#5733)
enabled more passed tests
2024-07-26 13:46:09 -04:00
George Hotz
5c688560bc move CUDA/HIP compilers to their own files [run_process_replay] (#5732) 2024-07-26 10:00:15 -07:00
chenyu
2cc55a3095 UOp simple mul add div fold (#5726) 2024-07-25 22:00:30 -04:00
chenyu
78f75aa80d remove redundant symbolic mod rule [run_process_replay] (#5725) 2024-07-25 21:21:02 -04:00
chenyu
5521b6d437 UOp simple mul-add-lt fold (#5721) 2024-07-25 20:49:38 -04:00
qazal
1b53207b4f revert isolated dags scheduling (#5724) 2024-07-25 19:45:12 -04:00
chenyu
845b0d1c9d UOp more generic div folding (#5722)
old: `x // c` can fold if `0 <= x.vmin <= x.vmax < c`
new: `x // c` can fold if `0 < c and x.vmin // c == x.vmax // c`
2024-07-25 17:49:14 -04:00
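
The old and new folding conditions quoted in the commit message above can be compared with a small sketch. It assumes Python floor-division semantics for `//` and treats `vmin`/`vmax` as inclusive integer bounds on `x`; the function names are hypothetical and this is not the code from the commit.

```python
from typing import Optional

def can_fold_div_old(vmin: int, vmax: int, c: int) -> bool:
    """Old rule: x // c folds only when the whole range of x lies in [0, c)."""
    return 0 <= vmin <= vmax < c

def can_fold_div(vmin: int, vmax: int, c: int) -> bool:
    """New rule: x // c folds when c > 0 and both bounds give the same quotient.

    Floor division by a positive c is monotone in x, so if the two endpoints
    agree, every x in [vmin, vmax] produces that same quotient.
    """
    return 0 < c and vmin // c == vmax // c

def folded_value(vmin: int, vmax: int, c: int) -> Optional[int]:
    """Return the constant that x // c folds to, or None if the rule does not apply."""
    return vmin // c if can_fold_div(vmin, vmax, c) else None

# x in [8, 11], c = 4: the old rule rejects this range, the new rule folds it.
print(can_fold_div_old(8, 11, 4), can_fold_div(8, 11, 4), folded_value(8, 11, 4))
```

Any range accepted by the old rule is also accepted by the new one (both bounds then divide to 0), so the new condition strictly generalizes the old.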
nimlgen
fb8148077e hcq do not update the same signal (#5719)
* hcq do not update the same signal

* import them
2024-07-26 00:24:45 +03:00
nimlgen
6ec9ea9ddd hcq update_exec with optional params (#5708) 2024-07-26 00:04:57 +03:00
George Hotz
8b34ee2f52 remove global_size and local_size from Kernel class [run_process_replay] (#5720)
* remove global_size and local_size from Kernel class [run_process_replay]

* sizes from the prg
2024-07-25 13:55:08 -07:00
George Hotz
142b7fb22f faster beam [run_process_replay] (#5718) 2024-07-25 11:58:41 -07:00
chenyu
eff7c5fd2c halve kernel counts in metal Fuzz Test linearizer (#5716)
the test time has increased to 3 minutes
2024-07-25 14:35:11 -04:00
George Hotz
e877ed9688 cleaner uop expand [run_process_replay] (#5715)
* cleaner uop expand [run_process_replay]

* comments
2024-07-25 11:29:53 -07:00
chenyu
a82815262c more test_pattern_matcher fixups (#5714) 2024-07-25 14:12:21 -04:00
George Hotz
b8b5411845 move Function to Developer section of docs 2024-07-25 11:05:23 -07:00
qazal
f02124ffa0 rename to realize_reduceop (#5713)
* rename to realize_reduceop

* shorter comment
2024-07-25 20:57:33 +03:00
chenyu
05e02ddfb3 fixup test_pattern_matcher (#5712) 2024-07-25 13:48:52 -04:00
qazal
9ceb3a3d1f beautiful_mnist -4.3% kernels (#5709)
* add is_complete

* partially delete forced_realized

* p2

* start

* refactor to can_group

* remove steps

* _get_inputs is nicer

* fix the cache

* cache is dict now

* rename to group
2024-07-25 20:30:49 +03:00
kormann
92eefab4b0 method alu (#5711)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-25 13:25:38 -04:00
qazal
76877df518 map groupable children (#5710)
* map groupable children

* remove setitem
2024-07-25 19:27:48 +03:00
kormann
1e2eac755d Fix repr upat (#5705)
* test

* fix

* x fix

* simpler

* rm extra space
2024-07-25 12:05:48 -04:00
qazal
1c992de257 hotfix: compare_schedule defaults to false (#5707) 2024-07-25 17:08:28 +03:00
qazal
489cda827a more scheduler process replay tooling (#5706)
* more scheduler process replay tooling

* refactor to compare_schedule
2024-07-25 15:47:18 +03:00
qazal
4e070a2c89 start work on indexing fusion (#5590)
* start base

* the views add up

base reduceop st:
ShapeTracker(views=(View(shape=(60000, 1), strides=(1, 0), offset=0, mask=None, contiguous=True),))

top st:

ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False)))

merged buf.st+st:
ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False)))

* p1

* some cleanups

* more cleanups

* one kernel

* more

* late fuse arange

* less lines

* more work

* fix st strides 1

* update test_schedule, start argmax

* test_tiny_argmax

* add FUSE_ARANGE

* more cleanup

* add utils

* reduce merging

* fix axis and fold if needed

* more fusion

* need to figure this out

* now fixing all of these

* todos+save a line

* ready for p1
2024-07-25 13:23:38 +03:00
nimlgen
08f47d7dc3 more info on failure 41 (#5704) 2024-07-25 12:14:28 +03:00
nimlgen
69d4f474d8 amd resnet pf (#5703) 2024-07-25 11:21:22 +03:00
nimlgen
1038482a66 enable hip tc (#5702) 2024-07-25 11:12:11 +03:00
qazal
5b38ff8679 shorter llvm and ptx rendering [run_process_replay] (#5686)
* src_dtype

* that's a upat

* the assert in vectorize is in type_verify

* uops asserts vectorizing a vectorize

* assert this

* for internal casts it's fine
2024-07-25 10:42:25 +03:00
chenyu
46e1151c02 UOp more generic mul -> mod folding (#5698) 2024-07-24 21:41:25 -04:00
chenyu
66a9c372af UOp mod reduction (#5697) 2024-07-24 20:36:00 -04:00
George Hotz
489a5b99a5 hotfix: triton_nv_matmul touchups 2024-07-24 23:24:29 +00:00
chenyu
8648fb2636 UOp vmin/vmax on ADD (#5689) 2024-07-24 19:09:42 -04:00
qazal
e2e70bd90b bring unbind back in Varaible const (#5687)
* bring unbind back in Varaible const

* this shows my experience with symbolic
2024-07-24 18:37:00 -04:00
nimlgen
b026312a31 nv ptx print log (#5691) 2024-07-24 21:40:58 +03:00
George Hotz
bf24be4c8c triton gets 163 TFLOPS on 4090 2024-07-24 18:32:29 +00:00
chenyu
85710e86cb UOps div folding (#5690)
#5689, with just div folding and new test cases
2024-07-24 14:21:44 -04:00
chenyu
fb1b51811b unify UOp min/max default [run_process_replay] (#5688)
* unify UOp min/max default [run_process_replay]

* fix that
2024-07-24 13:05:26 -04:00
George Hotz
33d44f00ae first fold, then expand (#5673)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-24 09:43:09 -07:00
qazal
b7b4c7844f shorter BufferOps.LOAD creation (#5685) 2024-07-24 18:53:07 +03:00
qazal
365e7afd4d make fusion deterministic (#5684)
* make fusion deterministic

* not this one yet

* line saving
2024-07-24 18:37:31 +03:00
nimlgen
2ea54176e2 docs: add more info on HCQProgram (#5683)
* docs: add more info on HCQProgram

* linter

* linter2

* one more type
2024-07-24 17:20:18 +03:00