Commit Graph

2250 Commits

Author SHA1 Message Date
George Hotz
d7387d31bf remove useless reduce cases [run_process_replay] (#5907)
* remove useless reduce cases [run_process_replay]

* do_reduce cleanup

* more cleanups + no longer supported tests

* Revert "more cleanups + no longer supported tests"

This reverts commit e9f2f6ba70.

* no longer supported tests

* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
George Hotz
be8958e26b use CONTRACT before REDUCE (#5903)
* use CONTRACT before REDUCE [run_process_replay]

* support half expand

* EXPAND GEP
2024-08-04 16:17:33 -07:00
chenyu
4a65010de8 remove CUDACPU flag in tests [run_process_replay] (#5902)
no longer used
2024-08-04 16:06:38 -04:00
qazal
aad9234e52 test fused precompute_freqs_cis (#5900)
* test_precompute_freqs_cis

* tiny for ci
2024-08-04 21:01:05 +03:00
chenyu
c67e9887f7 support using str to specify dtype (#5897)
* support using str to specify dtype

in Tensor creation and args into `cast` and `bitcast`, and acc_dtype

* more tests
2024-08-04 12:56:28 -04:00
qazal
4c5ef2cc4f setitem with arange fusion 1 (#5898) 2024-08-04 16:09:21 +03:00
chenyu
da61dea1b2 simple failed UOp sub symbolic test case (#5894) 2024-08-03 14:27:23 -04:00
qazal
56ef9e453e pad reduceops to the max of each dimension (#5889)
* early verify

* pad reduceops to the max of each dim

* remove the function
2024-08-03 14:03:30 +03:00
qazal
65fa86901a indexing fusion 2 (#5888)
* arange fusion

* kernels that fuse

* tests
2024-08-03 13:13:39 +03:00
qazal
af59b2eea9 tests from the indexing fusion branch (#5886) 2024-08-03 11:56:48 +03:00
chenyu
d5de44340e UOp add mod folding (#5862)
* UOp add mod folding

* that passes now
2024-08-02 18:31:46 -04:00
chenyu
41bbd3f4c1 update UOp mod reduction patterns (#5883)
prepare generic mod folding, also some test changes from mod folding pr
2024-08-02 17:43:40 -04:00
wozeparrot
acadccf344 comma benchmark (#5518) 2024-08-02 14:36:54 -07:00
Elias Wahl
4a114756f6 New BERT dataloader (#5881)
* One file == One topic

* update test

* new dataloader

* update train script

* get index is faster
2024-08-02 15:12:23 -04:00
nimlgen
2777784b91 add dependency viewer to hcq profiler (#5874)
* hcq profiler support deps

* clean up

* cleaner

* cleanup

* revert this

* linter

* mypy

* add test

* sync is strange, need to take the end

* linter + test
2024-08-02 22:07:01 +03:00
George Hotz
23e8c39288 get program fields in __post_init__ [run_process_replay] (#5878)
* get program fields in __post_init__ [run_process_replay]

* remove print
2024-08-02 09:57:12 -07:00
qazal
8611fa6c99 apply opts.extra_matcher in process replay [run_process_replay] (#5877) 2024-08-02 18:07:58 +03:00
qazal
2a791f7924 fuzz uops is simpler with List[UOp] [run_process_replay] (#5875)
* remove from fuzz_uops

* update fuzz_uops.py

* add to realize.py
2024-08-02 17:28:15 +03:00
George Hotz
877e0b4ba0 define global only has the index [run_process_replay] (#5869)
* define global only has the index [run_process_replay]

* fix that linearizer test

* fix ptx

* stupid ptx fix
2024-08-01 19:01:15 -07:00
chenyu
f27f949a5d Revert "revert some UOp IDIV bound (#5863)" (#5871)
This reverts commit 0c8d202348.
2024-08-01 21:38:31 -04:00
chenyu
df138bc558 Revert "revert a mod pattern (#5864)" (#5870)
This reverts commit 5c8de2d044.
2024-08-01 20:44:26 -04:00
chenyu
1b0314d9ef Revert "remove one more UOp mod pattern (#5865)" (#5868)
This reverts commit b03b8e18c2.
2024-08-01 20:28:35 -04:00
George Hotz
d73bc85ba9 UOpGraph not in renderer or Program [run_process_replay] (#5867)
* UOpGraph not in renderer or Program [run_process_replay]

* fix some tests

* fix ptx
2024-08-01 16:20:30 -07:00
chenyu
b392b8edc3 increase atol and rtol test_gemm_fp16 (#5866)
* increase atol and rtol test_gemm_fp16

made it pass with NOOPT which has larger accumulated error

* revert that
2024-08-01 19:09:58 -04:00
chenyu
b03b8e18c2 remove one more UOp mod pattern (#5865)
fixed UOP_IS_SYMBOLIC=1 test_failure_40
2024-08-01 18:29:04 -04:00
chenyu
5c8de2d044 revert a mod pattern (#5864)
fixed UOP_IS_SYMBOLIC=1 linearizer failure 47
2024-08-01 17:24:26 -04:00
George Hotz
2d3c7e4d4e some TestPickleJIT tests (#5860)
* some TestPickleJIT tests

* hotfix: print which opencl device we are using
2024-08-01 12:39:59 -07:00
chenyu
0c8d202348 revert some UOp IDIV bound (#5863)
* revert some UOp IDIV bound

breaks conv with UOP_IS_SYMBOLIC, added some conv tests in CI

* those are correct

* skip slow ones
2024-08-01 15:09:06 -04:00
George Hotz
53fcac9e80 hotfix: increase time on flaky NV test 2024-08-01 10:20:07 -07:00
qazal
26d0265d66 test schedule of LazyBuffers [run_process_replay] (#5859) 2024-08-01 19:06:29 +03:00
David Hou
eb91423cb4 MLB support reshape for uneven shards (#5804)
* cleaner uneven reshape

* update test
2024-08-01 02:36:03 -07:00
David González Martínez
0f09b94c43 add failing test for second order derivatives (#5772)
* add failing test

* fix lint

* fix bad merge

* fix again

* fix test

* more minimal
2024-08-01 02:34:47 -07:00
George Hotz
9d05dfb6f4 move JIT graphing into CapturedJit (#5852)
* move JIT graphing into CapturedJit

* better

* _jit_cache

* clear inputs cleanup

* test_pickle_jit with graph + cleanup

* 0 is fine to start

* support None in bufs

* alloc real buffers

* cleaner
2024-07-31 20:48:17 -07:00
chenyu
0ec732b494 test lin fail 47 for UOP_IS_SYMBOLIC (#5853)
failed arange example with UOP_IS_SYMBOLIC
2024-07-31 23:09:22 -04:00
George Hotz
c6a8395f1b CapturedJit is fun to pickle [run_process_replay] (#5851)
* CapturedJit is fun to pickle

* export input replace
2024-07-31 17:23:01 -07:00
George Hotz
72621d9e7c count the specials in uops [run_process_replay] (#5848)
* count the specials in uops [run_process_replay]

* cleanups
2024-07-31 14:53:18 -07:00
chenyu
c2ffcf6887 remove the wrong mod UOp pattern (#5847)
don't think we are hitting it because of the stride construction, and it's wrong and not needed
2024-07-31 16:24:25 -04:00
qazal
8174c438a3 pad test_failure_45 (#5846) 2024-07-31 23:08:48 +03:00
George Hotz
8672a9db3f add test to validate lazyops dims (#5845) 2024-07-31 12:59:38 -07:00
chenyu
4fe5b95568 fix UOp ALU bound (#5844)
* fix UOp ALU bound

root cause of resnet bug, the ALU bound is only correct for scalar, not vectorized

* it can be nan...
2024-07-31 15:19:31 -04:00
nimlgen
f768935be8 add RING_ALLREDUCE_THRESHOLD (#5835)
* add RING_ALLREDUCE_THRESHOLD

* benchmark

* fixes

* fix n_gpus

* unused import

* remove debug=2
2024-07-31 16:13:09 +03:00
chenyu
2e087ca8e4 UOp bound for div negative number (#5808) 2024-07-31 02:10:23 -04:00
qazal
bcbd925001 hcopts failing test for fused arange kernel (#5815)
* add failure_43

* n 45
2024-07-31 09:02:44 +03:00
qazal
ed556c260e UOps.IF rules more tests (#5831)
* init tests

* split tests

* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
David Hou
492a696d14 allow specify splits in shard, handle multiple different splits in MLB.e (#5599)
* allow specify splits in shard, handle multiple different splits in MLB.e

* line width

* linter

* don't use Device in docstring

* specify size of shards instead of boundaries

* adjust docstring for specify size of shards instead of boundaries

* don't allow splits on symbolic axis?

* just allow sint in splits_to_bounds

* add message for assert

* bounds instead of splits to save lines

* fix types

* reduce diff

* fix

* tuple

* golf :(

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-30 19:33:04 -07:00
chenyu
c3da458bc3 UOp if min==max folds to CONST (#5828)
* UOp if min==max folds to CONST

* fix test
2024-07-30 22:14:22 -04:00
George Hotz
e6879035a0 work to make GEMV fast (#5824)
* work to make GEMV fast

* half8 cast

* align struct

* fix amd

* float8 is a later problem
2024-07-30 17:41:40 -07:00
chenyu
02f0be03f2 tests on UOp div negative number and arange opts (#5825) 2024-07-30 20:06:57 -04:00
George Hotz
693990a346 swap src[2] and src[3] in load [run_process_replay] (#5821)
* swap src[2] and src[3] in load [run_process_replay]

* cleanups + bugfix

* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412 new style load/store folder (#5784)
* remove old index reorder

* new style folder

* works better

* dedup

* one failure

* this is fine now...

* expander_rewrite

* images broken, but all else should work

* cleanups

* make tests work with old

* fix images

* cleanups + bugfix

* minor fixes

* fix gated store folding

* flip gate_creator and expander

* fix gated store

* remove unneeded rules

* lines getting close

* line count good
2024-07-30 13:17:20 -07:00