George Hotz
|
1417cc8df1
|
can reenable that test now (#5914)
|
2024-08-06 13:38:21 -07:00 |
|
George Hotz
|
75154d7ae2
|
add some types to the scheduler [run_process_replay] (#5941)
* add some types to the scheduler [run_process_replay]
* set -> dedup
|
2024-08-06 12:23:54 -07:00 |
|
George Hotz
|
e077bc7baf
|
move memory planner to realize (#5937)
|
2024-08-06 10:41:29 -07:00 |
|
chenyu
|
489575c3be
|
more UOp sum div with gcd tests (#5936)
* more UOp sum div with gcd tests
* one more
|
2024-08-06 12:50:10 -04:00 |
|
ignaciosica
|
81ae9fadc8
|
Float4 support for CLANG (#5915)
* float4 support on clang
* skip linearizer tests that require locals
* add aligned attribute
|
2024-08-06 07:50:12 -07:00 |
|
qazal
|
a7db4c3ee9
|
show timings for DIFF_ARANGE=1 (#5935)
* show timings for DIFF_ARANGE=1
* always with DEBUG=2
|
2024-08-06 17:20:38 +03:00 |
|
qazal
|
102a8c184b
|
diff fused arange schedules with ARANGE_DIFF=1 (#5934)
* diff fused arange schedules with ARANGE_DIFF=1
* better llama diff
|
2024-08-06 16:52:26 +03:00 |
|
qazal
|
f7761245aa
|
save_schedule pre toposort [run_process_replay] (#5933)
|
2024-08-06 15:10:01 +03:00 |
|
nimlgen
|
895e062723
|
nv remove useless init (#5932)
|
2024-08-06 14:41:40 +03:00 |
|
qazal
|
3d4742dd2e
|
override output shape in fused assign (#5930)
* override output shape in fused assign
This makes
```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.
* merge asserts
|
2024-08-06 13:28:50 +03:00 |
|
nimlgen
|
341c394c89
|
amd save exec offsets (#5928)
* amd save exec offsets
* fix
* better
* ugh
|
2024-08-06 12:11:46 +03:00 |
|
wozeparrot
|
5808e8a30f
|
mockgpu remu changes (#5925)
|
2024-08-05 19:26:58 -07:00 |
|
chenyu
|
09b7722637
|
UOp generic div folding (#5896)
|
2024-08-05 21:38:43 -04:00 |
|
George Hotz
|
3e1336957d
|
test arange with all opts (#5923)
* test arange with all opts
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
|
2024-08-05 18:38:25 -07:00 |
|
George Hotz
|
2e7adb529f
|
don't run kernels with 1000x more compute (fix BEAM with FUSE_ARANGE) (#5924)
|
2024-08-05 16:28:09 -07:00 |
|
George Hotz
|
5d17f54e3c
|
fast mnist indexing (#5921)
* fast mnist indexing
* more tests
* remove those tests, new indexing rule
|
2024-08-05 13:55:15 -07:00 |
|
George Hotz
|
e81c18f494
|
make the arange test check correctness [run_process_replay] (#5920)
|
2024-08-05 13:41:06 -07:00 |
|
George Hotz
|
8d1c884e78
|
capture the const pattern in both directions (#5919)
* capture the const pattern in both directions
* add regression test
|
2024-08-05 12:15:38 -07:00 |
|
George Hotz
|
42f599870c
|
unroll arange is broken (#5918)
* unroll arange is broken
* fix unrolled arange
* one more test
|
2024-08-05 12:15:07 -07:00 |
|
wozeparrot
|
6740a0a6a0
|
hip_ioctl changes (#5917)
|
2024-08-05 11:58:38 -07:00 |
|
qazal
|
70949ea7e6
|
test cstyle compile error for max with inline const (#5838)
* test_failure_46
* GPU=1 fails too
* add test_renderer
* add failing platforms
* nv too
* assert return value
|
2024-08-05 19:02:16 +03:00 |
|
nimlgen
|
98df648a79
|
metal sync queues in transfer (#5308)
* metal sync queues
* cleaner
* need this
* oops
|
2024-08-05 18:43:22 +03:00 |
|
qazal
|
6a70c69167
|
hotfix: TC renders nv_bfloat16 (#5913)
* fix wmma bfloat16
* cleanup
|
2024-08-05 18:40:31 +03:00 |
|
P4ssenger
|
8ce9e6e693
|
Fix vectorized dtype rendering bug in CLANG (#5911)
* fix vectorized types rendering for clang
* fix bug in fix
* fix bug 2 in fix 2
|
2024-08-05 17:43:26 +03:00 |
|
qazal
|
e0c6520138
|
check arange fusing with VIEW and COPY (#5912)
* check arange fusing with VIEW and COPY
* gpu and clang
|
2024-08-05 17:09:21 +03:00 |
|
nimlgen
|
590b9ebb34
|
hcq copy queue is optional (#5909)
* hcq copy queue is optional
* one more
* this
|
2024-08-05 14:03:25 +03:00 |
|
George Hotz
|
159ac06b5b
|
remove unused reduce rules + improve unparented (#5908)
* remove unused reduce rules [run_process_replay]
* this work
* those tests are meaningless now
|
2024-08-04 18:18:27 -07:00 |
|
George Hotz
|
d7387d31bf
|
remove useless reduce cases [run_process_replay] (#5907)
* remove useless reduce cases [run_process_replay]
* do_reduce cleanup
* more cleanups + no longer supported tests
* Revert "more cleanups + no longer supported tests"
This reverts commit e9f2f6ba70.
* no longer supported tests
* switch ReduceOps.SUM -> BinaryOps.ADD
|
2024-08-04 17:11:08 -07:00 |
|
wozeparrot
|
94917521ee
|
fix: sqlite on pypy (#5906)
|
2024-08-04 16:40:59 -07:00 |
|
George Hotz
|
be8958e26b
|
use CONTRACT before REDUCE (#5903)
* use CONTRACT before REDUCE [run_process_replay]
* support half expand
* EXPAND GEP
|
2024-08-04 16:17:33 -07:00 |
|
wozeparrot
|
f33950f454
|
tracemeta fixups (#5904)
|
2024-08-04 16:15:06 -07:00 |
|
chenyu
|
adba5efc64
|
enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256
|
2024-08-04 18:48:46 -04:00 |
|
chenyu
|
4a65010de8
|
remove CUDACPU flag in tests [run_process_replay] (#5902)
no longer used
|
2024-08-04 16:06:38 -04:00 |
|
chenyu
|
996ff0c135
|
pow(2) -> square in RMSNorm [run_process_replay] (#5901)
reads nicer in metadata
|
2024-08-04 14:21:31 -04:00 |
|
qazal
|
aad9234e52
|
test fused precompute_freqs_cis (#5900)
* test_precompute_freqs_cis
* tiny for ci
|
2024-08-04 21:01:05 +03:00 |
|
chenyu
|
c67e9887f7
|
support using str to specify dtype (#5897)
* support using str to specify dtype
in Tensor creation and args into `cast` and `bitcast`, and acc_dtype
* more tests
|
2024-08-04 12:56:28 -04:00 |
|
nimlgen
|
4f9221e8dd
|
remove useless _ensure_shared_time_base (#5899)
|
2024-08-04 17:01:54 +03:00 |
|
qazal
|
4c5ef2cc4f
|
setitem with arange fusion 1 (#5898)
|
2024-08-04 16:09:21 +03:00 |
|
chenyu
|
59315ffc78
|
minor cleanup to UOp mod folding [run_process_replay] (#5895)
some walrus
|
2024-08-03 21:38:44 -04:00 |
|
nimlgen
|
dad8e72ee9
|
hcq graph refactor (#5887)
* cleanup
* prof
* cleaner
* comments
* more types
|
2024-08-03 23:35:33 +03:00 |
|
chenyu
|
da61dea1b2
|
simple failed UOp sub symbolic test case (#5894)
|
2024-08-03 14:27:23 -04:00 |
|
Elias Wahl
|
937bf5fe12
|
better hparam (#5891)
|
2024-08-03 12:38:53 -04:00 |
|
qazal
|
37cc87ea75
|
save lines in the scheduler [run_process_replay] (#5890)
|
2024-08-03 14:20:11 +03:00 |
|
qazal
|
56ef9e453e
|
pad reduceops to the max of each dimension (#5889)
* early verify
* pad reduceops to the max of each dim
* remove the function
|
2024-08-03 14:03:30 +03:00 |
|
qazal
|
65fa86901a
|
indexing fusion 2 (#5888)
* arange fusion
* kernels that fuse
* tests
|
2024-08-03 13:13:39 +03:00 |
|
qazal
|
af59b2eea9
|
tests from the indexing fusion branch (#5886)
|
2024-08-03 11:56:48 +03:00 |
|
chenyu
|
a77eab89ca
|
UOp mod folding cleanup (#5885)
move patterns around and update comments
|
2024-08-02 22:56:32 -04:00 |
|
chenyu
|
d5de44340e
|
UOp add mod folding (#5862)
* UOp add mod folding
* that passes now
|
2024-08-02 18:31:46 -04:00 |
|
George Hotz
|
714d00f325
|
hotfix: median > mean for sampling clock jitter
|
2024-08-02 22:07:58 +00:00 |
|
George Hotz
|
7348c40d9d
|
sampling time sync (8700 lines) (#5843)
* sampling time sync
* jitter matrix
* comment
* pass mypy
* line count
|
2024-08-02 14:44:35 -07:00 |
|