George Hotz
73d4d51845
add LBScheduleItem type [run_process_replay] ( #5944 )
...
* add LBScheduleItem type [run_process_replay]
* minor cleanups
* fix
* fix fuzz tests
* add group cache type
2024-08-06 14:49:40 -07:00
qazal
7b6496f2e6
fix the reduceops cache breaking beautiful_mnist ( #5938 )
...
* fix the reduceops cache breaking beautiful_mnist
* test_sparse_categorical_crossentropy_simple
* starting tests
* atol from test_nn
* test_sparse_categorical_crossentropy_alt
* dont use torch
2024-08-07 00:02:54 +03:00
George Hotz
1417cc8df1
can reenable that test now ( #5914 )
2024-08-06 13:38:21 -07:00
chenyu
489575c3be
more UOp sum div with gcd tests ( #5936 )
...
* more UOp sum div with gcd tests
* one more
2024-08-06 12:50:10 -04:00
ignaciosica
81ae9fadc8
Float4 support for CLANG ( #5915 )
...
* float4 support on clang
* skip linearizer tests that require locals
* add aligned attribute
2024-08-06 07:50:12 -07:00
qazal
a7db4c3ee9
show timings for DIFF_ARANGE=1 ( #5935 )
...
* show timings for DIFF_ARANGE=1
* always with DEBUG=2
2024-08-06 17:20:38 +03:00
qazal
102a8c184b
diff fused arange schedules with ARANGE_DIFF=1 ( #5934 )
...
* diff fused arange schedules with ARANGE_DIFF=1
* better llama diff
2024-08-06 16:52:26 +03:00
qazal
3d4742dd2e
override output shape in fused assign ( #5930 )
...
* override output shape in fused assign
This makes
```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.
* merge asserts
2024-08-06 13:28:50 +03:00
chenyu
09b7722637
UOp generic div folding ( #5896 )
2024-08-05 21:38:43 -04:00
George Hotz
3e1336957d
test arange with all opts ( #5923 )
...
* test arange with all opts
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
2024-08-05 18:38:25 -07:00
George Hotz
5d17f54e3c
fast mnist indexing ( #5921 )
...
* fast mnist indexing
* more tests
* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00
George Hotz
e81c18f494
make the arange test check correctness [run_process_replay] ( #5920 )
2024-08-05 13:41:06 -07:00
George Hotz
8d1c884e78
capture the const pattern in both directions ( #5919 )
...
* capture the const pattern in both directions
* add regression test
2024-08-05 12:15:38 -07:00
George Hotz
42f599870c
unroll arange is broken ( #5918 )
...
* unroll arange is broken
* fix unrolled arange
* one more test
2024-08-05 12:15:07 -07:00
qazal
70949ea7e6
test cstyle compile error for max with inline const ( #5838 )
...
* test_failure_46
* GPU=1 fails too
* add test_renderer
* add failing platforms
* nv too
* assert return value
2024-08-05 19:02:16 +03:00
qazal
e0c6520138
check arange fusing with VIEW and COPY ( #5912 )
...
* check arange fusing with VIEW and COPY
* gpu and clang
2024-08-05 17:09:21 +03:00
nimlgen
590b9ebb34
hcq copy queue is optional ( #5909 )
...
* hcq copy queue is optional
* one more
* this
2024-08-05 14:03:25 +03:00
George Hotz
159ac06b5b
remove unused reduce rules + improve unparented ( #5908 )
...
* remove unused reduce rules [run_process_replay]
* this work
* those tests are meaningless now
2024-08-04 18:18:27 -07:00
George Hotz
d7387d31bf
remove useless reduce cases [run_process_replay] ( #5907 )
...
* remove useless reduce cases [run_process_replay]
* do_reduce cleanup
* more cleanups + no longer supported tests
* Revert "more cleanups + no longer supported tests"
This reverts commit e9f2f6ba70 .
* no longer supported tests
* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
George Hotz
be8958e26b
use CONTRACT before REDUCE ( #5903 )
...
* use CONTRACT before REDUCE [run_process_replay]
* support half expand
* EXPAND GEP
2024-08-04 16:17:33 -07:00
chenyu
4a65010de8
remove CUDACPU flag in tests [run_process_replay] ( #5902 )
...
no longer used
2024-08-04 16:06:38 -04:00
qazal
aad9234e52
test fused precompute_freqs_cis ( #5900 )
...
* test_precompute_freqs_cis
* tiny for ci
2024-08-04 21:01:05 +03:00
chenyu
c67e9887f7
support using str to specify dtype ( #5897 )
...
* support using str to specify dtype
in Tensor creation and args into `cast` and `bitcast`, and acc_dtype
* more tests
2024-08-04 12:56:28 -04:00
qazal
4c5ef2cc4f
setitem with arange fusion 1 ( #5898 )
2024-08-04 16:09:21 +03:00
chenyu
da61dea1b2
simple failed UOp sub symbolic test case ( #5894 )
2024-08-03 14:27:23 -04:00
qazal
56ef9e453e
pad reduceops to the max of each dimension ( #5889 )
...
* early verify
* pad reduceops to the max of each dim
* remove the function
2024-08-03 14:03:30 +03:00
qazal
65fa86901a
indexing fusion 2 ( #5888 )
...
* arange fusion
* kernels that fuse
* tests
2024-08-03 13:13:39 +03:00
qazal
af59b2eea9
tests from the indexing fusion branch ( #5886 )
2024-08-03 11:56:48 +03:00
chenyu
d5de44340e
UOp add mod folding ( #5862 )
...
* UOp add mod folding
* that passes now
2024-08-02 18:31:46 -04:00
chenyu
41bbd3f4c1
update UOp mod reduction patterns ( #5883 )
...
prepare generic mod folding, also some test changes from mod folding pr
2024-08-02 17:43:40 -04:00
wozeparrot
acadccf344
comma benchmark ( #5518 )
2024-08-02 14:36:54 -07:00
Elias Wahl
4a114756f6
New BERT dataloader ( #5881 )
...
* One file == One topic
* update test
* new dataloader
* update train script
* get index is faster
2024-08-02 15:12:23 -04:00
nimlgen
2777784b91
add dependency viewer to hcq profiler ( #5874 )
...
* hcq profiler support deps
* clean up
* cleaner
* cleanup
* revert this
* linter
* mypy
* add test
* sync is strange, need to take the end
* linter + test
2024-08-02 22:07:01 +03:00
George Hotz
23e8c39288
get program fields in __post_init__ [run_process_replay] ( #5878 )
...
* get program fields in __post_init__ [run_process_replay]
* remove print
2024-08-02 09:57:12 -07:00
qazal
8611fa6c99
apply opts.extra_matcher in process replay [run_process_replay] ( #5877 )
2024-08-02 18:07:58 +03:00
qazal
2a791f7924
fuzz uops is simpler with List[UOp] [run_process_replay] ( #5875 )
...
* remove from fuzz_uops
* update fuzz_uops.py
* add to realize.py
2024-08-02 17:28:15 +03:00
George Hotz
877e0b4ba0
define global only has the index [run_process_replay] ( #5869 )
...
* define global only has the index [run_process_replay]
* fix that linearizer test
* fix ptx
* stupid ptx fix
2024-08-01 19:01:15 -07:00
chenyu
f27f949a5d
Revert "revert some UOp IDIV bound ( #5863 )" ( #5871 )
...
This reverts commit 0c8d202348 .
2024-08-01 21:38:31 -04:00
chenyu
df138bc558
Revert "revert a mod pattern ( #5864 )" ( #5870 )
...
This reverts commit 5c8de2d044 .
2024-08-01 20:44:26 -04:00
chenyu
1b0314d9ef
Revert "remove one more UOp mod pattern ( #5865 )" ( #5868 )
...
This reverts commit b03b8e18c2 .
2024-08-01 20:28:35 -04:00
George Hotz
d73bc85ba9
UOpGraph not in renderer or Program [run_process_replay] ( #5867 )
...
* UOpGraph not in renderer or Program [run_process_replay]
* fix some tests
* fix ptx
2024-08-01 16:20:30 -07:00
chenyu
b392b8edc3
increase atol and rtol test_gemm_fp16 ( #5866 )
...
* increase atol and rtol test_gemm_fp16
made it pass with NOOPT which has larger accumulated error
* revert that
2024-08-01 19:09:58 -04:00
chenyu
b03b8e18c2
remove one more UOp mod pattern ( #5865 )
...
fixed UOP_IS_SYMBOLIC=1 test_failure_40
2024-08-01 18:29:04 -04:00
chenyu
5c8de2d044
revert a mod pattern ( #5864 )
...
fixed UOP_IS_SYMBOLIC=1 linearizer failure 47
2024-08-01 17:24:26 -04:00
George Hotz
2d3c7e4d4e
some TestPickleJIT tests ( #5860 )
...
* some TestPickleJIT tests
* hotfix: print which opencl device we are using
2024-08-01 12:39:59 -07:00
chenyu
0c8d202348
revert some UOp IDIV bound ( #5863 )
...
* revert some UOp IDIV bound
breaks conv with UOP_IS_SYMBOLIC, added some conv tests in CI
* those are correct
* skip slow ones
2024-08-01 15:09:06 -04:00
George Hotz
53fcac9e80
hotfix: increase time on flaky NV test
2024-08-01 10:20:07 -07:00
qazal
26d0265d66
test schedule of LazyBuffers [run_process_replay] ( #5859 )
2024-08-01 19:06:29 +03:00
David Hou
eb91423cb4
MLB support reshape for uneven shards ( #5804 )
...
* cleaner uneven reshape
* update test
2024-08-01 02:36:03 -07:00
David González Martínez
0f09b94c43
add failing test for second order derivatives ( #5772 )
...
* add failing test
* fix lint
* fix bad merge
* fix again
* fix test
* more minimal
2024-08-01 02:34:47 -07:00