chenyu
aee737bd9e
divide by gcd in UOp div folding (#5949)
* divide by gcd in UOp div folding
`(6x+6y)//16 -> (3x+3y)//8` etc
simpler version
* only factor out const
* don't apply for unsigned
* don't need that if
* space
2024-08-06 20:00:57 -04:00
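The gcd folding above (`(6x+6y)//16 -> (3x+3y)//8`) can be sketched on plain coefficient lists; a simplified illustration of the idea, not tinygrad's actual UOp rewrite:

```python
from math import gcd

def fold_div(coeffs, divisor):
    # divide the sum's integer coefficients and the divisor by their
    # common gcd, e.g. (6x+6y)//16 -> (3x+3y)//8
    g = gcd(divisor, *coeffs)
    if g <= 1:
        return coeffs, divisor
    return [c // g for c in coeffs], divisor // g
```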
George Hotz
6d1fdcfce2
don't reduce the same thing in a vector (#5950)
* don't reduce the same thing over and over
* cleaner way to write it that doesn't loop
2024-08-06 16:59:15 -07:00
qazal
d5d7f4e7b8
more TestIndexing correctness asserts [run_process_replay] (#5948)
* use torch in test_mnist_val
* more asserts
2024-08-07 01:50:42 +03:00
qazal
7f062929e8
start all cached scheduler functions with buf, st [run_process_replay] (#5946)
* start all cached scheduler functions with buf, st
- [x] _recursive_group
- [x] _recursive_lazyop
- [x] _recurse_reduceops
* use dict [run_process_replay]
2024-08-07 01:24:22 +03:00
chenyu
794796256c
UOp.const_factor [run_process_replay] (#5945)
* UOp.const_factor [run_process_replay]
simplify mod and div folding
* test does not work now
2024-08-06 18:18:29 -04:00
Elias Wahl
c9862e17d4
MLPERF BERT submission scripts (#5931)
* green
* red
* fix benchmark
* log
* count train samples
* oops. 4.0 -> 4.1
* note to todo
* no pillow
2024-08-06 18:09:18 -04:00
George Hotz
73d4d51845
add LBScheduleItem type [run_process_replay] (#5944)
* add LBScheduleItem type [run_process_replay]
* minor cleanups
* fix
* fix fuzz tests
* add group cache type
2024-08-06 14:49:40 -07:00
chenyu
1dab75ae37
clean up mlperf dataloader import (#5940)
use tinygrad tqdm for dataset, and PIL Image is only needed for resnet
2024-08-06 17:10:08 -04:00
qazal
7b6496f2e6
fix the reduceops cache breaking beautiful_mnist (#5938)
* fix the reduceops cache breaking beautiful_mnist
* test_sparse_categorical_crossentropy_simple
* starting tests
* atol from test_nn
* test_sparse_categorical_crossentropy_alt
* dont use torch
2024-08-07 00:02:54 +03:00
George Hotz
1417cc8df1
can reenable that test now (#5914)
2024-08-06 13:38:21 -07:00
George Hotz
75154d7ae2
add some types to the scheduler [run_process_replay] (#5941)
* add some types to the scheduler [run_process_replay]
* set -> dedup
2024-08-06 12:23:54 -07:00
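The `set -> dedup` swap above matters because a Python set loses insertion order; an order-preserving dedup can be written with `dict.fromkeys` (a common idiom, not necessarily tinygrad's exact helper):

```python
def dedup(xs):
    # dict.fromkeys preserves insertion order, so duplicates are dropped
    # without reordering, unlike list(set(xs))
    return list(dict.fromkeys(xs))
```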
George Hotz
e077bc7baf
move memory planner to realize (#5937)
2024-08-06 10:41:29 -07:00
chenyu
489575c3be
more UOp sum div with gcd tests (#5936)
* more UOp sum div with gcd tests
* one more
2024-08-06 12:50:10 -04:00
ignaciosica
81ae9fadc8
Float4 support for CLANG (#5915)
* float4 support on clang
* skip linearizer tests that require locals
* add aligned attribute
2024-08-06 07:50:12 -07:00
qazal
a7db4c3ee9
show timings for DIFF_ARANGE=1 (#5935)
* show timings for DIFF_ARANGE=1
* always with DEBUG=2
2024-08-06 17:20:38 +03:00
qazal
102a8c184b
diff fused arange schedules with ARANGE_DIFF=1 (#5934)
* diff fused arange schedules with ARANGE_DIFF=1
* better llama diff
2024-08-06 16:52:26 +03:00
qazal
f7761245aa
save_schedule pre toposort [run_process_replay] (#5933)
2024-08-06 15:10:01 +03:00
nimlgen
895e062723
nv remove useless init (#5932)
2024-08-06 14:41:40 +03:00
qazal
3d4742dd2e
override output shape in fused assign (#5930)
* override output shape in fused assign
This makes
```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.
* merge asserts
2024-08-06 13:28:50 +03:00
nimlgen
341c394c89
amd save exec offsets (#5928)
* amd save exec offsets
* fix
* better
* ugh
2024-08-06 12:11:46 +03:00
wozeparrot
5808e8a30f
mockgpu remu changes (#5925)
2024-08-05 19:26:58 -07:00
chenyu
09b7722637
UOp generic div folding (#5896)
2024-08-05 21:38:43 -04:00
George Hotz
3e1336957d
test arange with all opts (#5923)
* test arange with all opts
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
2024-08-05 18:38:25 -07:00
George Hotz
2e7adb529f
don't run kernels with 1000x more compute (fix BEAM with FUSE_ARANGE) (#5924)
2024-08-05 16:28:09 -07:00
George Hotz
5d17f54e3c
fast mnist indexing (#5921)
* fast mnist indexing
* more tests
* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00
George Hotz
e81c18f494
make the arange test check correctness [run_process_replay] (#5920)
2024-08-05 13:41:06 -07:00
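Checking arange correctness is cheap because the fused-arange trick computes `arange(n)[i]` as a reduction over a comparison mask; a toy model of that idea (an assumption about the kernel's shape, not tinygrad's code):

```python
def arange_via_reduce(n):
    # each output element i sums a 0/1 mask over j, counting j < i;
    # this mirrors how a fused arange becomes a masked reduce
    return [sum(1 for j in range(n) if j < i) for i in range(n)]
```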
George Hotz
8d1c884e78
capture the const pattern in both directions (#5919)
* capture the const pattern in both directions
* add regression test
2024-08-05 12:15:38 -07:00
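Capturing the const pattern "in both directions" means a commutative pattern like `x*c` must match whether the constant is the left or the right operand; a toy matcher illustrating the idea (not tinygrad's PatternMatcher API):

```python
def match_var_const(lhs, rhs):
    # return (variable, constant) regardless of operand order,
    # or None if neither side is a constant
    if isinstance(lhs, int) and not isinstance(rhs, int):
        return rhs, lhs
    if isinstance(rhs, int) and not isinstance(lhs, int):
        return lhs, rhs
    return None
```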
George Hotz
42f599870c
unroll arange is broken (#5918)
* unroll arange is broken
* fix unrolled arange
* one more test
2024-08-05 12:15:07 -07:00
wozeparrot
6740a0a6a0
hip_ioctl changes ( #5917 )
2024-08-05 11:58:38 -07:00
qazal
70949ea7e6
test cstyle compile error for max with inline const (#5838)
* test_failure_46
* GPU=1 fails too
* add test_renderer
* add failing platforms
* nv too
* assert return value
2024-08-05 19:02:16 +03:00
nimlgen
98df648a79
metal sync queues in transfer (#5308)
* metal sync queues
* cleaner
* need this
* oops
2024-08-05 18:43:22 +03:00
qazal
6a70c69167
hotfix: TC renders nv_bfloat16 (#5913)
* fix wmma bfloat16
* cleanup
2024-08-05 18:40:31 +03:00
P4ssenger
8ce9e6e693
Fix vectorized dtype rendering bug in CLANG (#5911)
* fix vectorized types rendering for clang
* fix bug in fix
* fix bug 2 in fix 2
2024-08-05 17:43:26 +03:00
qazal
e0c6520138
check arange fusing with VIEW and COPY (#5912)
* check arange fusing with VIEW and COPY
* gpu and clang
2024-08-05 17:09:21 +03:00
nimlgen
590b9ebb34
hcq copy queue is optional (#5909)
* hcq copy queue is optional
* one more
* this
2024-08-05 14:03:25 +03:00
George Hotz
159ac06b5b
remove unused reduce rules + improve unparented (#5908)
* remove unused reduce rules [run_process_replay]
* this work
* those tests are meaningless now
2024-08-04 18:18:27 -07:00
George Hotz
d7387d31bf
remove useless reduce cases [run_process_replay] (#5907)
* remove useless reduce cases [run_process_replay]
* do_reduce cleanup
* more cleanups + no longer supported tests
* Revert "more cleanups + no longer supported tests"
This reverts commit e9f2f6ba70.
* no longer supported tests
* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
wozeparrot
94917521ee
fix: sqlite on pypy (#5906)
2024-08-04 16:40:59 -07:00
George Hotz
be8958e26b
use CONTRACT before REDUCE (#5903)
* use CONTRACT before REDUCE [run_process_replay]
* support half expand
* EXPAND GEP
2024-08-04 16:17:33 -07:00
wozeparrot
f33950f454
tracemeta fixups ( #5904 )
2024-08-04 16:15:06 -07:00
chenyu
adba5efc64
enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256
2024-08-04 18:48:46 -04:00
chenyu
4a65010de8
remove CUDACPU flag in tests [run_process_replay] (#5902)
no longer used
2024-08-04 16:06:38 -04:00
chenyu
996ff0c135
pow(2) -> square in RMSNorm [run_process_replay] (#5901)
reads nicer in metadata
2024-08-04 14:21:31 -04:00
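For reference, RMSNorm is `x / sqrt(mean(x^2) + eps)`; writing the elementwise step as `x*x` (square) instead of `pow(x, 2)` is numerically identical and only changes how the op reads in metadata. A plain-Python sketch of the formula, not tinygrad's RMSNorm class:

```python
import math

def rms_norm(xs, eps=1e-6):
    # square via x*x rather than pow(x, 2); same math, nicer metadata
    mean_sq = sum(x * x for x in xs) / len(xs)
    return [x / math.sqrt(mean_sq + eps) for x in xs]
```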
qazal
aad9234e52
test fused precompute_freqs_cis (#5900)
* test_precompute_freqs_cis
* tiny for ci
2024-08-04 21:01:05 +03:00
chenyu
c67e9887f7
support using str to specify dtype (#5897)
* support using str to specify dtype
in Tensor creation and args into `cast` and `bitcast`, and acc_dtype
* more tests
2024-08-04 12:56:28 -04:00
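Accepting a string where a dtype object is expected is a small normalization step; a toy sketch with a hypothetical name table (tinygrad resolves names onto its own DType objects):

```python
# hypothetical name table for illustration only
DTYPES = {"float32": float, "int32": int, "bool": bool}

def to_dtype(d):
    # pass dtype objects through; look strings up by name
    if isinstance(d, str):
        if d not in DTYPES:
            raise ValueError(f"unknown dtype: {d}")
        return DTYPES[d]
    return d
```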
nimlgen
4f9221e8dd
remove useless _ensure_shared_time_base ( #5899 )
2024-08-04 17:01:54 +03:00
qazal
4c5ef2cc4f
setitem with arange fusion 1 (#5898)
2024-08-04 16:09:21 +03:00
chenyu
59315ffc78
minor cleanup to UOp mod folding [run_process_replay] (#5895)
some walrus
2024-08-03 21:38:44 -04:00
nimlgen
dad8e72ee9
hcq graph refactor (#5887)
* cleanup
* prof
* cleaner
* comments
* more types
2024-08-03 23:35:33 +03:00
chenyu
da61dea1b2
simple failed UOp sub symbolic test case (#5894)
2024-08-03 14:27:23 -04:00