10633 Commits

Author SHA1 Message Date
George Hotz
1417cc8df1 can reenable that test now (#5914) 2024-08-06 13:38:21 -07:00
George Hotz
75154d7ae2 add some types to the scheduler [run_process_replay] (#5941)
* add some types to the scheduler [run_process_replay]

* set -> dedup
2024-08-06 12:23:54 -07:00
George Hotz
e077bc7baf move memory planner to realize (#5937) 2024-08-06 10:41:29 -07:00
chenyu
489575c3be more UOp sum div with gcd tests (#5936)
* more UOp sum div with gcd tests

* one more
2024-08-06 12:50:10 -04:00
ignaciosica
81ae9fadc8 Float4 support for CLANG (#5915)
* float4 support on clang

* skip linearizer tests that require locals

* add aligned attribute
2024-08-06 07:50:12 -07:00
qazal
a7db4c3ee9 show timings for DIFF_ARANGE=1 (#5935)
* show timings for DIFF_ARANGE=1

* always with DEBUG=2
2024-08-06 17:20:38 +03:00
qazal
102a8c184b diff fused arange schedules with ARANGE_DIFF=1 (#5934)
* diff fused arange schedules with ARANGE_DIFF=1

* better llama diff
2024-08-06 16:52:26 +03:00
qazal
f7761245aa save_schedule pre toposort [run_process_replay] (#5933) 2024-08-06 15:10:01 +03:00
nimlgen
895e062723 nv remove useless init (#5932) 2024-08-06 14:41:40 +03:00
qazal
3d4742dd2e override output shape in fused assign (#5930)
* override output shape in fused assign

This makes

```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.

* merge asserts
2024-08-06 13:28:50 +03:00
nimlgen
341c394c89 amd save exec offsets (#5928)
* amd save exec offsets

* fix

* better

* ugh
2024-08-06 12:11:46 +03:00
wozeparrot
5808e8a30f mockgpu remu changes (#5925) 2024-08-05 19:26:58 -07:00
chenyu
09b7722637 UOp generic div folding (#5896) 2024-08-05 21:38:43 -04:00
George Hotz
3e1336957d test arange with all opts (#5923)
* test arange with all opts

* Update test_arange.py

* Update test_arange.py

* Update test_arange.py

* Update test_arange.py

* Update test_arange.py
2024-08-05 18:38:25 -07:00
George Hotz
2e7adb529f don't run kernels with 1000x more compute (fix BEAM with FUSE_ARANGE) (#5924) 2024-08-05 16:28:09 -07:00
George Hotz
5d17f54e3c fast mnist indexing (#5921)
* fast mnist indexing

* more tests

* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00
George Hotz
e81c18f494 make the arange test check correctness [run_process_replay] (#5920) 2024-08-05 13:41:06 -07:00
George Hotz
8d1c884e78 capture the const pattern in both directions (#5919)
* capture the const pattern in both directions

* add regression test
2024-08-05 12:15:38 -07:00
George Hotz
42f599870c unroll arange is broken (#5918)
* unroll arange is broken

* fix unrolled arange

* one more test
2024-08-05 12:15:07 -07:00
wozeparrot
6740a0a6a0 hip_ioctl changes (#5917) 2024-08-05 11:58:38 -07:00
qazal
70949ea7e6 test cstyle compile error for max with inline const (#5838)
* test_failure_46

* GPU=1 fails too

* add test_renderer

* add failing platforms

* nv too

* assert return value
2024-08-05 19:02:16 +03:00
nimlgen
98df648a79 metal sync queues in transfer (#5308)
* metal sync queues

* cleaner

* need this

* oops
2024-08-05 18:43:22 +03:00
qazal
6a70c69167 hotfix: TC renders nv_bfloat16 (#5913)
* fix wmma bfloat16

* cleanup
2024-08-05 18:40:31 +03:00
P4ssenger
8ce9e6e693 Fix vectorized dtype rendering bug in CLANG (#5911)
* fix vectorized types rendering for clang

* fix bug in fix

* fix bug 2 in fix 2
2024-08-05 17:43:26 +03:00
qazal
e0c6520138 check arange fusing with VIEW and COPY (#5912)
* check arange fusing with VIEW and COPY

* gpu and clang
2024-08-05 17:09:21 +03:00
nimlgen
590b9ebb34 hcq copy queue is optional (#5909)
* hcq copy queue is optional

* one more

* this
2024-08-05 14:03:25 +03:00
George Hotz
159ac06b5b remove unused reduce rules + improve unparented (#5908)
* remove unused reduce rules [run_process_replay]

* this work

* those tests are meaningless now
2024-08-04 18:18:27 -07:00
George Hotz
d7387d31bf remove useless reduce cases [run_process_replay] (#5907)
* remove useless reduce cases [run_process_replay]

* do_reduce cleanup

* more cleanups + no longer supported tests

* Revert "more cleanups + no longer supported tests"

This reverts commit e9f2f6ba70.

* no longer supported tests

* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
wozeparrot
94917521ee fix: sqlite on pypy (#5906) 2024-08-04 16:40:59 -07:00
George Hotz
be8958e26b use CONTRACT before REDUCE (#5903)
* use CONTRACT before REDUCE [run_process_replay]

* support half expand

* EXPAND GEP
2024-08-04 16:17:33 -07:00
wozeparrot
f33950f454 tracemeta fixups (#5904) 2024-08-04 16:15:06 -07:00
chenyu
adba5efc64 enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256
2024-08-04 18:48:46 -04:00
chenyu
4a65010de8 remove CUDACPU flag in tests [run_process_replay] (#5902)
no longer used
2024-08-04 16:06:38 -04:00
chenyu
996ff0c135 pow(2) -> square in RMSNorm [run_process_replay] (#5901)
reads nicer in metadata
2024-08-04 14:21:31 -04:00
qazal
aad9234e52 test fused precompute_freqs_cis (#5900)
* test_precompute_freqs_cis

* tiny for ci
2024-08-04 21:01:05 +03:00
chenyu
c67e9887f7 support using str to specify dtype (#5897)
* support using str to specify dtype

in Tensor creation and args into `cast` and `bitcast`, and acc_dtype

* more tests
2024-08-04 12:56:28 -04:00
nimlgen
4f9221e8dd remove useless _ensure_shared_time_base (#5899) 2024-08-04 17:01:54 +03:00
qazal
4c5ef2cc4f setitem with arange fusion 1 (#5898) 2024-08-04 16:09:21 +03:00
chenyu
59315ffc78 minor cleanup to UOp mod folding [run_process_replay] (#5895)
some walrus
2024-08-03 21:38:44 -04:00
nimlgen
dad8e72ee9 hcq graph refactor (#5887)
* cleanup

* prof

* cleaner

* comments

* more types
2024-08-03 23:35:33 +03:00
chenyu
da61dea1b2 simple failed UOp sub symbolic test case (#5894) 2024-08-03 14:27:23 -04:00
Elias Wahl
937bf5fe12 better hparam (#5891) 2024-08-03 12:38:53 -04:00
qazal
37cc87ea75 save lines in the scheduler [run_process_replay] (#5890) 2024-08-03 14:20:11 +03:00
qazal
56ef9e453e pad reduceops to the max of each dimension (#5889)
* early verify

* pad reduceops to the max of each dim

* remove the function
2024-08-03 14:03:30 +03:00
qazal
65fa86901a indexing fusion 2 (#5888)
* arange fusion

* kernels that fuse

* tests
2024-08-03 13:13:39 +03:00
qazal
af59b2eea9 tests from the indexing fusion branch (#5886) 2024-08-03 11:56:48 +03:00
chenyu
a77eab89ca UOp mod folding cleanup (#5885)
move patterns around and update comments
2024-08-02 22:56:32 -04:00
chenyu
d5de44340e UOp add mod folding (#5862)
* UOp add mod folding

* that passes now
2024-08-02 18:31:46 -04:00
George Hotz
714d00f325 hotfix: median > mean for sampling clock jitter 2024-08-02 22:07:58 +00:00
George Hotz
7348c40d9d sampling time sync (8700 lines) (#5843)
* sampling time sync

* jitter matrix

* comment

* pass mypy

* line count
2024-08-02 14:44:35 -07:00