George Hotz
bc55c8a30e
pmatmul example + GB/s bugfix [run_process_replay] (#5974)
* pmatmul example + bugfix
* improve pmatmul
* Update real_pmatmul.py
2024-08-07 22:32:11 -07:00
George Hotz
c5baa3d66b
hotfix: don't run OOM test in CI
2024-08-07 22:19:29 -07:00
chenyu
859d0e4709
UOp simplify (x+c0)*c1 -> x*c1+c0*c1 (#5973)
2024-08-07 21:25:22 -04:00
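The rule in #5973 above is plain distribution, which lets c0*c1 fold to a single constant. A quick plain-Python check of the identity (illustrative ints, not UOps; the real rewrite fires in tinygrad's pattern matcher):
```
# (x+c0)*c1 -> x*c1 + c0*c1: after distributing, c0*c1 is a compile-time const
c0, c1 = 3, 5
for x in range(-16, 17):
    assert (x + c0) * c1 == x * c1 + (c0 * c1)
```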
wozeparrot
97d708252a
remove realize from threefry (#5969)
2024-08-07 15:08:49 -07:00
George Hotz
bf8ec23b00
hotfix: contiguous on precompute_freqs_cis
2024-08-07 14:40:56 -07:00
wozeparrot
d3e427c8d9
fix sqlite3 locks (#5971)
2024-08-07 14:38:19 -07:00
nimlgen
cc37c99ae4
tiny hcq touchups (#5964)
2024-08-07 21:03:20 +03:00
nimlgen
8d8704af2d
fix amd exec_update for locals (#5966)
2024-08-07 21:02:56 +03:00
ignaciosica
0ddcd005f5
fix priority width and give more space for src (#5509)
2024-08-07 10:48:18 -07:00
tyoc213
0c4e9dbe71
retrieve defined opencl error codes (#5792)
2024-08-07 10:46:24 -07:00
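For context on #5792 above: OpenCL APIs report failures as negative status codes defined in cl.h. A sketch of mapping a few well-known codes to readable names (the table subset and the cl_strerror helper are illustrative, not tinygrad's actual code):
```
# a handful of well-known OpenCL status codes from cl.h
CL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

def cl_strerror(code: int) -> str:
    # hypothetical helper: fall back to the raw code when unknown
    return CL_ERRORS.get(code, f"UNKNOWN_CL_ERROR_{code}")

assert cl_strerror(-5) == "CL_OUT_OF_RESOURCES"
```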
ignaciosica
4b48f166ec
Refactor render_kernel for NV [run_process_replay] (#5965)
* start working on it
* blind test with process replay
* remove noqa:E501 while refactoring make_cuda_dtype
* refactor even more but with known bug
* fix known bug with duplicated includes
* working locally
* add noqa:e501
* remove comment and move map
* address qazal's review comments
* remove comment
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-07 20:36:04 +03:00
qazal
d6f4a61c42
graph LBScheduleItem [run_process_replay] (#5960)
* add toposort key to LBScheduleItem
* use dedup
* graph LBScheduleItem
* make that comment beautiful again
* diff_schedule utils
* update fuzz_schedule
2024-08-07 19:59:11 +03:00
George Hotz
0a8668cf30
improvements to docs
2024-08-07 09:57:24 -07:00
qazal
7677361d90
test pushing through different expands in 1 kernel (#5963)
* test pushing through different expands in 1 kernel
* realize eye
* back to test_example_matmul
2024-08-07 19:33:18 +03:00
nimlgen
564a352194
nv unify _gpu_free (#5961)
* nv unify _gpu_free
* revert this
2024-08-07 18:18:17 +03:00
Eitan Turok
39c8c9c00a
Add docs (#5942)
* init commit
* finish writing
* add to docs
* fix docs
* fix typo
* delete new line
* rename to tensor properties
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-07 07:38:51 -07:00
qazal
39dda3d042
rename prescheduled items to lsi [run_process_replay] (#5959)
* rename to lsi
* fuzz_schedule more typings
* rename fuzz_schedule
2024-08-07 14:31:50 +03:00
qazal
728b7e189e
diff_schedule tests [run_process_replay] (#5958)
* diff_schedule tests [run_process_replay]
* ok to run serial
2024-08-07 13:50:27 +03:00
chenyu
a7163b80d8
lower test_transcendental fuzz test threshold for sin float64 (#5956)
2024-08-07 02:04:37 -04:00
chenyu
fa3a36e576
fancier UOp div gcd folding (#5953)
combine and cancel the remaining const based on the gcd of the other terms, like SumNode does.
2024-08-07 02:04:25 -04:00
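A numeric check of the folding #5953 above describes: when every non-const term shares a gcd, the const splits against it and the leftover remainder cannot change the quotient (plain ints here; the actual rule rewrites UOp graphs):
```
# gcd of the x/y terms is 6, so +7 splits into 6 + 1 and the +1 is dropped:
# (6x + 6y + 7)//6 == x + y + 1
for x in range(-10, 11):
    for y in range(-10, 11):
        assert (6*x + 6*y + 7) // 6 == x + y + 1
```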
chenyu
aa7fd7ef74
Use (-self).lt(-x+1) for UOp.ge (#5955)
matches symbolic behavior and fixes arange folding under UOP_IS_SYMBOLIC=1
2024-08-07 01:31:27 -04:00
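The integer identity behind #5955 above, checked in plain Python (UOps express comparisons with lt, so ge gets rewritten into it):
```
# a >= b  <=>  -a <= -b  <=>  -a < -b + 1  (integers only)
for a in range(-6, 7):
    for b in range(-6, 7):
        assert (-a < -b + 1) == (a >= b)
```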
George Hotz
3d445039c2
hotfix: 8800 lines for AMX+intel tc
2024-08-06 17:50:26 -07:00
George Hotz
658d58784b
embedding doesn't cast (#5952)
* embedding doesn't cast
* test the right thing
* that test was too annoying
2024-08-06 17:49:14 -07:00
wozeparrot
30d0cb2a82
fix: fix transcendental flakiness on exp float with 9.96875 (#5951)
2024-08-06 17:32:13 -07:00
George Hotz
3a0515ea22
hotfix: process_replay/diff_schedule.py to LBScheduleItem
2024-08-06 17:01:05 -07:00
chenyu
aee737bd9e
divide by gcd in UOp div folding (#5949)
* divide by gcd in UOp div folding
`(6x+6y)//16 -> (3x+3y)//8` etc
simpler version
* only factor out const
* don't apply for unsigned
* don't need that if
* space
2024-08-06 20:00:57 -04:00
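A quick check of the example from #5949 above: dividing the coefficients and the denominator by their common gcd preserves floor division (plain ints; per the commit, the rule is skipped for unsigned types):
```
# gcd(6, 6, 16) == 2, so (6x + 6y)//16 == (3x + 3y)//8
for x in range(-8, 9):
    for y in range(-8, 9):
        assert (6*x + 6*y) // 16 == (3*x + 3*y) // 8
```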
George Hotz
6d1fdcfce2
don't reduce the same thing in a vector (#5950)
* don't reduce the same thing over and over
* cleaner way to write it that doesn't loop
2024-08-06 16:59:15 -07:00
qazal
d5d7f4e7b8
more TestIndexing correctness asserts [run_process_replay] (#5948)
* use torch in test_mnist_val
* more asserts
2024-08-07 01:50:42 +03:00
qazal
7f062929e8
start all cached scheduler functions with buf, st [run_process_replay] (#5946)
* start all cached scheduler functions with buf, st
- [x] _recursive_group
- [x] _recursive_lazyop
- [x] _recurse_reduceops
* use dict [run_process_replay]
2024-08-07 01:24:22 +03:00
chenyu
794796256c
UOp.const_factor [run_process_replay] (#5945)
* UOp.const_factor [run_process_replay]
simplify mod and div folding
* test does not work now
2024-08-06 18:18:29 -04:00
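A hedged sketch of what a const_factor helper could compute for #5945 above (the name comes from the commit; this body is an assumption): the largest constant dividing every term of a sum, letting mod/div folding pre-divide.
```
import math

def const_factor(coeffs: list[int]) -> int:
    # illustrative stand-in: gcd of the constant coefficients of a sum
    return math.gcd(*coeffs) if coeffs else 1

# e.g. 4x + 8y has const_factor 4, so (4x + 8y) % 4 folds to 0
assert const_factor([4, 8]) == 4
assert const_factor([6, 9, 12]) == 3
```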
Elias Wahl
c9862e17d4
MLPERF BERT submission scripts (#5931)
* green
* red
* fix benchmark
* log
* count train samples
* oops. 4.0 -> 4.1
* note to todo
* no pillow
2024-08-06 18:09:18 -04:00
George Hotz
73d4d51845
add LBScheduleItem type [run_process_replay] (#5944)
* add LBScheduleItem type [run_process_replay]
* minor cleanups
* fix
* fix fuzz tests
* add group cache type
2024-08-06 14:49:40 -07:00
chenyu
1dab75ae37
clean up mlperf dataloader import (#5940)
use tinygrad's tqdm for the dataset; PIL Image is only needed for resnet
2024-08-06 17:10:08 -04:00
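The cleanup in #5940 above implies a deferred-import pattern; a minimal sketch (the function name and structure are illustrative, not the actual dataloader):
```
def load_image_for_resnet(path: str):
    # PIL is imported lazily so non-resnet dataloaders never need pillow
    from PIL import Image
    return Image.open(path).convert("RGB")
```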
qazal
7b6496f2e6
fix the reduceops cache breaking beautiful_mnist (#5938)
* fix the reduceops cache breaking beautiful_mnist
* test_sparse_categorical_crossentropy_simple
* starting tests
* atol from test_nn
* test_sparse_categorical_crossentropy_alt
* dont use torch
2024-08-07 00:02:54 +03:00
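The tests in #5938 above exercise tinygrad's sparse_categorical_crossentropy; a minimal usage sketch (shapes and values are illustrative):
```
from tinygrad import Tensor

logits = Tensor.randn(4, 10)   # batch of 4, 10 classes
labels = Tensor([1, 3, 0, 7])  # integer class targets
loss = logits.sparse_categorical_crossentropy(labels)
print(loss.item())
```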
George Hotz
1417cc8df1
can reenable that test now (#5914)
2024-08-06 13:38:21 -07:00
George Hotz
75154d7ae2
add some types to the scheduler [run_process_replay] (#5941)
* add some types to the scheduler [run_process_replay]
* set -> dedup
2024-08-06 12:23:54 -07:00
George Hotz
e077bc7baf
move memory planner to realize (#5937)
2024-08-06 10:41:29 -07:00
chenyu
489575c3be
more UOp sum div with gcd tests (#5936)
* more UOp sum div with gcd tests
* one more
2024-08-06 12:50:10 -04:00
ignaciosica
81ae9fadc8
Float4 support for CLANG (#5915)
* float4 support on clang
* skip linearizer tests that require locals
* add aligned attribute
2024-08-06 07:50:12 -07:00
qazal
a7db4c3ee9
show timings for DIFF_ARANGE=1 (#5935)
* show timings for DIFF_ARANGE=1
* always with DEBUG=2
2024-08-06 17:20:38 +03:00
qazal
102a8c184b
diff fused arange schedules with ARANGE_DIFF=1 (#5934)
* diff fused arange schedules with ARANGE_DIFF=1
* better llama diff
2024-08-06 16:52:26 +03:00
qazal
f7761245aa
save_schedule pre toposort [run_process_replay] (#5933)
2024-08-06 15:10:01 +03:00
nimlgen
895e062723
nv remove useless init (#5932)
2024-08-06 14:41:40 +03:00
qazal
3d4742dd2e
override output shape in fused assign (#5930)
* override output shape in fused assign
This makes
```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.
* merge asserts
2024-08-06 13:28:50 +03:00
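A minimal sketch of the invariant #5930 above calls out, that ASSIGN must not change shape (names are illustrative, not scheduler internals):
```
def check_assign_shapes(target_shape: tuple, src_shape: tuple) -> None:
    # sketch of the suggested check: the assigned buffer keeps its shape
    assert target_shape == src_shape, f"ASSIGN changes shape: {target_shape} vs {src_shape}"
```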
nimlgen
341c394c89
amd save exec offsets (#5928)
* amd save exec offsets
* fix
* better
* ugh
2024-08-06 12:11:46 +03:00
wozeparrot
5808e8a30f
mockgpu remu changes ( #5925 )
2024-08-05 19:26:58 -07:00
chenyu
09b7722637
UOp generic div folding (#5896)
2024-08-05 21:38:43 -04:00
George Hotz
3e1336957d
test arange with all opts (#5923)
* test arange with all opts
* Update test_arange.py (×5)
2024-08-05 18:38:25 -07:00
George Hotz
2e7adb529f
don't run kernels with 1000x more compute (fix BEAM with FUSE_ARANGE) (#5924)
2024-08-05 16:28:09 -07:00
George Hotz
5d17f54e3c
fast mnist indexing (#5921)
* fast mnist indexing
* more tests
* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00