Commit Graph

5505 Commits

George Hotz
0a8668cf30 improvements to docs 2024-08-07 09:57:24 -07:00
qazal
7677361d90 test pushing through different expands in 1 kernel (#5963)
* test pushing through different expands in 1 kernel

* realize eye

* back to test_example_matmul
2024-08-07 19:33:18 +03:00
nimlgen
564a352194 nv unify _gpu_free (#5961)
* nv unify _gpu_free

* revert this
2024-08-07 18:18:17 +03:00
Eitan Turok
39c8c9c00a Add docs (#5942)
* init commit

* finish writing

* add to docs

* fix docs

* fix typo

* delete new line

* rename to tensor properties

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-07 07:38:51 -07:00
qazal
39dda3d042 rename prescheduled items to lsi [run_process_replay] (#5959)
* rename to lsi

* fuzz_schedule more typings

* rename fuzz_schedule
2024-08-07 14:31:50 +03:00
qazal
728b7e189e diff_schedule tests [run_process_replay] (#5958)
* diff_schedule tests [run_process_replay]

* ok to run serial
2024-08-07 13:50:27 +03:00
chenyu
a7163b80d8 lower test_transcendental fuzz test threshold for sin float64 (#5956) 2024-08-07 02:04:37 -04:00
chenyu
fa3a36e576 fancier UOp div gcd folding (#5953)
combine and cancel the remaining const based on the gcd of the other terms, as SumNode did.
2024-08-07 02:04:25 -04:00
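
A hedged illustration of the rule with plain ints, assuming nonnegative terms (not tinygrad's actual rewrite code): when the non-const terms and the divisor share a gcd `g`, a remaining const smaller than `g` cannot move the floor and cancels.

```python
# (2x + 1)//4: the non-const term and the divisor share gcd 2, and the
# leftover const 1 < 2 cannot affect the floor, so the whole expression
# folds to x//2.
for x in range(1000):
    assert (2 * x + 1) // 4 == x // 2
```
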
chenyu
aa7fd7ef74 Use (-self).lt(-x+1) for UOp.ge (#5955)
this matches symbolic and fixes UOP_IS_SYMBOLIC=1 arange folding
2024-08-07 01:31:27 -04:00
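
The rewrite rests on an integer identity: `x >= c` is equivalent to `-x <= -c`, which for integers is `-x < -c + 1`. A plain-Python sanity check of the identity (not the UOp API):

```python
# Integer identity behind expressing ge via lt: x >= c  <=>  -x < -c + 1.
for x in range(-16, 16):
    for c in range(-16, 16):
        assert (x >= c) == (-x < -c + 1)
```
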
George Hotz
3d445039c2 hotfix: 8800 lines for AMX+intel tc 2024-08-06 17:50:26 -07:00
George Hotz
658d58784b embedding doesn't cast (#5952)
* embedding doesn't cast

* test the right thing

* too annoying with that test
2024-08-06 17:49:14 -07:00
wozeparrot
30d0cb2a82 fix: fix transcendental flakiness on exp float with 9.96875 (#5951) 2024-08-06 17:32:13 -07:00
George Hotz
3a0515ea22 hotfix: process_replay/diff_schedule.py to LBScheduleItem 2024-08-06 17:01:05 -07:00
chenyu
aee737bd9e divide by gcd in UOp div folding (#5949)
* divide by gcd in UOp div folding

`(6x+6y)//16 -> (3x+3y)//8` etc
simpler version

* only factor out const

* don't apply for unsigned

* don't need that if

* space
2024-08-06 20:00:57 -04:00
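
A minimal sketch of the factoring step with plain integer coefficients (the function name and representation are hypothetical, not tinygrad's UOp code):

```python
import math

def fold_div(coeffs: list[int], divisor: int) -> tuple[list[int], int]:
    # Divide every coefficient and the divisor by their common gcd,
    # e.g. (6x + 6y)//16 -> (3x + 3y)//8 as in the commit message.
    g = math.gcd(divisor, *coeffs)
    return [c // g for c in coeffs], divisor // g

assert fold_div([6, 6], 16) == ([3, 3], 8)
```
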
George Hotz
6d1fdcfce2 don't reduce the same thing in a vector (#5950)
* don't reduce the same thing over and over

* cleaner way to write it that doesn't loop
2024-08-06 16:59:15 -07:00
qazal
d5d7f4e7b8 more TestIndexing correctness asserts [run_process_replay] (#5948)
* use torch in test_mnist_val

* more asserts
2024-08-07 01:50:42 +03:00
qazal
7f062929e8 start all cached scheduler functions with buf, st [run_process_replay] (#5946)
* start all cached scheduler functions with buf, st

- [x] _recursive_group
- [x] _recursive_lazyop
- [x] _recurse_reduceops

* use dict [run_process_replay]
2024-08-07 01:24:22 +03:00
chenyu
794796256c UOp.const_factor [run_process_replay] (#5945)
* UOp.const_factor [run_process_replay]

simplify mod and div folding

* test does not work now
2024-08-06 18:18:29 -04:00
Elias Wahl
c9862e17d4 MLPERF BERT submission scripts (#5931)
* green

* red

* fix benchmark

* log

* count train samples

* oops. 4.0 -> 4.1

* note to todo

* no pillow
2024-08-06 18:09:18 -04:00
George Hotz
73d4d51845 add LBScheduleItem type [run_process_replay] (#5944)
* add LBScheduleItem type [run_process_replay]

* minor cleanups

* fix

* fix fuzz tests

* add group cache type
2024-08-06 14:49:40 -07:00
chenyu
1dab75ae37 clean up mlperf dataloader import (#5940)
use tinygrad's tqdm for the dataset; PIL Image is only needed for resnet
2024-08-06 17:10:08 -04:00
qazal
7b6496f2e6 fix the reduceops cache breaking beautiful_mnist (#5938)
* fix the reduceops cache breaking beautiful_mnist

* test_sparse_categorical_crossentropy_simple

* starting tests

* atol from test_nn

* test_sparse_categorical_crossentropy_alt

* dont use torch
2024-08-07 00:02:54 +03:00
George Hotz
1417cc8df1 can reenable that test now (#5914) 2024-08-06 13:38:21 -07:00
George Hotz
75154d7ae2 add some types to the scheduler [run_process_replay] (#5941)
* add some types to the scheduler [run_process_replay]

* set -> dedup
2024-08-06 12:23:54 -07:00
George Hotz
e077bc7baf move memory planner to realize (#5937) 2024-08-06 10:41:29 -07:00
chenyu
489575c3be more UOp sum div with gcd tests (#5936)
* more UOp sum div with gcd tests

* one more
2024-08-06 12:50:10 -04:00
ignaciosica
81ae9fadc8 Float4 support for CLANG (#5915)
* float4 support on clang

* skip linearizer tests that require locals

* add aligned attribute
2024-08-06 07:50:12 -07:00
qazal
a7db4c3ee9 show timings for DIFF_ARANGE=1 (#5935)
* show timings for DIFF_ARANGE=1

* always with DEBUG=2
2024-08-06 17:20:38 +03:00
qazal
102a8c184b diff fused arange schedules with ARANGE_DIFF=1 (#5934)
* diff fused arange schedules with ARANGE_DIFF=1

* better llama diff
2024-08-06 16:52:26 +03:00
qazal
f7761245aa save_schedule pre toposort [run_process_replay] (#5933) 2024-08-06 15:10:01 +03:00
nimlgen
895e062723 nv remove useless init (#5932) 2024-08-06 14:41:40 +03:00
qazal
3d4742dd2e override output shape in fused assign (#5930)
* override output shape in fused assign

This makes

```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.

* merge asserts
2024-08-06 13:28:50 +03:00
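
A minimal sketch of the invariant stated in the commit, with hypothetical names: fusion may override the scheduled output shape, but an ASSIGN's target and source must still agree.

```python
def assert_assign_keeps_shape(target_shape: tuple[int, ...], src_shape: tuple[int, ...]) -> None:
    # ASSIGN writes a buffer in place, so fusing it must not change its shape.
    assert target_shape == src_shape, f"ASSIGN changes shape {src_shape} -> {target_shape}"

assert_assign_keeps_shape((4, 4), (4, 4))  # ok; a mismatch would raise
```
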
nimlgen
341c394c89 amd save exec offsets (#5928)
* amd save exec offsets

* fix

* better

* ugh
2024-08-06 12:11:46 +03:00
wozeparrot
5808e8a30f mockgpu remu changes (#5925) 2024-08-05 19:26:58 -07:00
chenyu
09b7722637 UOp generic div folding (#5896) 2024-08-05 21:38:43 -04:00
George Hotz
3e1336957d test arange with all opts (#5923)
* test arange with all opts

* Update test_arange.py

* Update test_arange.py

* Update test_arange.py

* Update test_arange.py

* Update test_arange.py
2024-08-05 18:38:25 -07:00
George Hotz
2e7adb529f don't run kernels with 1000x more compute (fix BEAM with FUSE_ARANGE) (#5924) 2024-08-05 16:28:09 -07:00
George Hotz
5d17f54e3c fast mnist indexing (#5921)
* fast mnist indexing

* more tests

* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00
George Hotz
e81c18f494 make the arange test check correctness [run_process_replay] (#5920) 2024-08-05 13:41:06 -07:00
George Hotz
8d1c884e78 capture the const pattern in both directions (#5919)
* capture the const pattern in both directions

* add regression test
2024-08-05 12:15:38 -07:00
George Hotz
42f599870c unroll arange is broken (#5918)
* unroll arange is broken

* fix unrolled arange

* one more test
2024-08-05 12:15:07 -07:00
wozeparrot
6740a0a6a0 hip_ioctl changes (#5917) 2024-08-05 11:58:38 -07:00
qazal
70949ea7e6 test cstyle compile error for max with inline const (#5838)
* test_failure_46

* GPU=1 fails too

* add test_renderer

* add failing platforms

* nv too

* assert return value
2024-08-05 19:02:16 +03:00
nimlgen
98df648a79 metal sync queues in transfer (#5308)
* metal sync queues

* cleaner

* need this

* oops
2024-08-05 18:43:22 +03:00
qazal
6a70c69167 hotfix: TC renders nv_bfloat16 (#5913)
* fix wmma bfloat16

* cleanup
2024-08-05 18:40:31 +03:00
P4ssenger
8ce9e6e693 Fix vectorized dtype rendering bug in CLANG (#5911)
* fix vectorized types rendering for clang

* fix bug in fix

* fix bug 2 in fix 2
2024-08-05 17:43:26 +03:00
qazal
e0c6520138 check arange fusing with VIEW and COPY (#5912)
* check arange fusing with VIEW and COPY

* gpu and clang
2024-08-05 17:09:21 +03:00
nimlgen
590b9ebb34 hcq copy queue is optional (#5909)
* hcq copy queue is optional

* one more

* this
2024-08-05 14:03:25 +03:00
George Hotz
159ac06b5b remove unused reduce rules + improve unparented (#5908)
* remove unused reduce rules [run_process_replay]

* this works

* those tests are meaningless now
2024-08-04 18:18:27 -07:00
George Hotz
d7387d31bf remove useless reduce cases [run_process_replay] (#5907)
* remove useless reduce cases [run_process_replay]

* do_reduce cleanup

* more cleanups + no longer supported tests

* Revert "more cleanups + no longer supported tests"

This reverts commit e9f2f6ba70.

* no longer supported tests

* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00