Commit Graph

6609 Commits

Author SHA1 Message Date
qazal
a0bd385448 late uop_bufs [pr] (#7438) 2024-10-31 17:30:32 +08:00
qazal
7916d1f6ab shorter UOps.BUFFER init [pr] (#7436) 2024-10-31 17:14:19 +08:00
George Hotz
2e3048fc57 Revert "improve full_graph_rewrite matchers for speed (#7431)" (#7434)
This reverts commit 996152d2de.
2024-10-31 16:16:47 +08:00
George Hotz
996152d2de improve full_graph_rewrite matchers for speed (#7431)
* remove finalize [pr]

* early transcendental

* fix tests

* load store indexing runs with devectorize

* move delete_redundant_gates

* ptx has to wait for the mask to move
2024-10-31 16:13:11 +08:00
qazal
5f49651360 verify assign pre astfying [pr] (#7417) 2024-10-31 16:02:07 +08:00
George Hotz
17c9a9fde4 pm_render [pr] (#7430)
* pm_render [pr]

* test fixes

* use gep, not src

* ptx only symbolic, not sym

* move cast rules
2024-10-31 15:04:50 +08:00
George Hotz
8fff8fc3e7 replace REDUCE and clean up arange (#7429)
* break apart arange [pr]

* fix missing

* cleanups to add/mul

* UOps.VECTORIZE

* don't vectorize const
2024-10-31 14:02:20 +08:00
George Hotz
fe2bc4c613 clean up arange/indexing matchers [pr] (#7427)
* clean up arange/indexing matchers [pr]

* syntax for assign
2024-10-31 12:12:44 +08:00
George Hotz
e446e95974 enforce ctx is called ctx [pr] (#7424)
* enforce ctx is called ctx [pr]

* fix bug and use has_ctx

* inspect signature

* assert

* no slow asserts

* now we can support contextual reduce
2024-10-31 11:39:19 +08:00
chenyu
9b08bb4c3e fold the +x term in sine inside sin_poly (#7425) 2024-10-30 23:13:08 -04:00
chenyu
0739895b4d tiny clean up pow2if and payne_hanek_reduction (#7423) 2024-10-30 22:22:48 -04:00
chenyu
118dd7721f clean up transcendental.rintk [pr] (#7422)
added unit tests and updated the comment. it's rounding away from 0 for negatives
2024-10-30 20:37:28 -04:00
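The rintk note says ties round away from zero, including for negatives. A minimal Python sketch of that rounding rule follows. It assumes the common add-plus-or-minus-0.5-then-truncate construction; `rint_half_away` is a hypothetical name, not tinygrad's code.

```python
def rint_half_away(x: float) -> int:
    # Shift by +0.5 for non-negative inputs and -0.5 for negative ones,
    # then truncate toward zero with int(). Exact .5 ties therefore round
    # away from zero: 2.5 -> 3 and -2.5 -> -3.
    return int(x + (-0.5 if x < 0.0 else 0.5))

# Compare with Python's built-in round(), which rounds ties to even.
for v in (2.5, -2.5, 1.4, -1.6):
    print(v, rint_half_away(v), round(v))
```

The built-in `round` gives 2 and -2 for the two tie cases, so it is not a drop-in substitute for this rule.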
chenyu
fb694a63eb Tensor.erf (#7419)
the same one used in onnx and the one in bert.
2024-10-30 18:12:28 -04:00
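The commit message does not name the approximation behind Tensor.erf. The sketch below assumes Abramowitz and Stegun formula 7.1.26, a widely used polynomial-times-exponential approximation with a maximum absolute error of about 1.5e-7. It is written in plain Python and is not taken from the tinygrad source. `erf_approx` and `horner` are illustrative names.

```python
import math

# Abramowitz & Stegun 7.1.26 constants (assumed; not confirmed by the log).
P = 0.3275911
# Coefficients a5..a1, highest degree first, for Horner evaluation.
A = [1.061405429, -1.453152027, 1.421413741, -0.284496736, 0.254829592]

def horner(t: float, coeffs: list) -> float:
    # Evaluate coeffs[0]*t**(n-1) + ... + coeffs[-1] with Horner's rule.
    acc = 0.0
    for c in coeffs:
        acc = acc * t + c
    return acc

def erf_approx(x: float) -> float:
    # erf(|x|) ~= 1 - t*(a1 + a2 t + ... + a5 t^4) * exp(-x^2),
    # with t = 1 / (1 + p|x|). The sign is restored with copysign
    # because erf is an odd function.
    t = 1.0 / (1.0 + P * abs(x))
    y = 1.0 - t * horner(t, A) * math.exp(-x * x)
    return math.copysign(y, x)

print(erf_approx(0.5), math.erf(0.5))
```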
qazal
e955aa1bee hotfix: process replay (#7418) 2024-10-30 22:45:40 +02:00
qazal
4c0ee32ef2 delete metadata from schedule ctx [pr] (#7415) 2024-10-31 01:49:49 +08:00
George Hotz
b4410545d8 hotfix: INDEX is yellow-green 2024-10-31 01:42:54 +08:00
qazal
d81e07e4fc compare schedule len against group count [pr] (#7414) 2024-10-31 01:42:10 +08:00
qazal
1a2ee37dd3 hotfix: remove redundant test_schedules [pr] (#7412) 2024-10-31 01:10:31 +08:00
George Hotz
7039fba406 move indexing first (#7409)
* move indexing first [pr]

* no create gate

* fix create_gate

* fix load/store folding

* fix index folding

* remove comment, no process replay
2024-10-31 00:50:35 +08:00
George Hotz
133fe81cc5 Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)" (#7407)
* Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)"

This reverts commit ea5654a9bc.

* test padded in emulation too

* bring back early folding
2024-10-30 23:25:45 +08:00
chenyu
ea5654a9bc Revert "move up migrate + new gated fold (#7403)" (#7406)
This reverts commit adccfade7f.
2024-10-30 23:02:18 +08:00
George Hotz
adccfade7f move up migrate + new gated fold (#7403)
* move up migrate + new gated fold [pr]

* vcount for const ptr

* move those rules there

* fix openpilot
2024-10-30 22:14:01 +08:00
chenyu
16e60d25b9 move polyN to helper [pr] (#7405)
also move `eval_uop` to `test.helpers`
2024-10-30 10:09:57 -04:00
George Hotz
f3bd5cbf78 simplest migration of indexing [pr] (#7402)
* simplest migration of indexing [pr]

* fix locals/barrier
2024-10-30 20:58:18 +08:00
George Hotz
ee9ef93617 delete old rules [pr] (#7400) 2024-10-30 19:45:04 +08:00
vinzentbeer
573a848229 fix small typo (#7399)
"We use with Tensor.train() set the internal flag" -> "We use with Tensor.train() *to* set the internal flag"
2024-10-30 19:20:28 +08:00
George Hotz
d39f21da8f scalar image is image [pr] (#7398)
* scalar image is image [pr]

* base property
2024-10-30 18:51:47 +08:00
George Hotz
76a41a1083 don't compare with pointer dtype (#7394)
* don't compare with pointer dtype

* more cleanup

* images are pointers

* handle IMAGE better

* cleaner test_image

* this work

* pr match

* cleanup
2024-10-30 17:48:27 +08:00
qazal
95390df02a save lines [pr] (#7373) 2024-10-30 17:34:00 +08:00
George Hotz
4e2895f8d2 safe changes from new dtype branch [pr] (#7397)
* safe changes from new dtype branch [pr]

* only image test on GPU
2024-10-30 17:18:48 +08:00
George Hotz
0ca241693b viz loads nothing by default [pr] (#7395) 2024-10-30 15:40:08 +08:00
qazal
5e2e5b2cdc finally big graph (#7293)
* real big graph

* extra lines
2024-10-30 13:58:09 +08:00
George Hotz
27995a2a04 vcount + cleanups (#7393)
* Revert "Revert "Restore vcount [pr] (#7390)" (#7392)"

This reverts commit 4ca53db604.

* ugh bugfix [pr]

* uops_to_dtypes function

* fixups

* varnames

* fix mypy

* just 4,8

* tests
2024-10-30 12:50:15 +08:00
George Hotz
32dd2dcba5 minor cleanups of cstyle [pr] (#7391)
* minor cleanups of cstyle [pr]

* work
2024-10-30 11:59:27 +08:00
George Hotz
4ca53db604 Revert "Restore vcount [pr] (#7390)" (#7392)
This reverts commit 1058f9c9ff.
2024-10-30 11:40:25 +08:00
George Hotz
1058f9c9ff Restore vcount [pr] (#7390)
* Revert "Revert "add vcount to PtrDtype (#7388)""

This reverts commit 399a5219dd.

* Revert "Revert "add tests to vcount stuff [pr] (#7389)""

This reverts commit cc8d6dbdf3.

* no ptr
2024-10-30 11:27:55 +08:00
George Hotz
399a5219dd Revert "add vcount to PtrDtype (#7388)"
This reverts commit b086584d64.
2024-10-30 10:56:52 +08:00
George Hotz
cc8d6dbdf3 Revert "add tests to vcount stuff [pr] (#7389)"
This reverts commit 1b7084899b.
2024-10-30 10:56:49 +08:00
George Hotz
1b7084899b add tests to vcount stuff [pr] (#7389) 2024-10-30 10:54:54 +08:00
George Hotz
b086584d64 add vcount to PtrDtype (#7388) 2024-10-30 10:43:54 +08:00
uuuvn
06a8700bfa Replace sqrtl (long double) with sqrt (double) for double (#7366) 2024-10-30 10:20:41 +08:00
gonutz
e7cbc6dc23 Fix ValueError in Yolo 8 example (#7387)
Calling

    python3 examples/yolov8.py ./test/models/efficientnet/Chicken.jpg

used to result in this error

    ValueError: Calling nonzero on 0d arrays is not allowed.

Using np.atleast_1d makes sure we avoid a zero-dimension array.

Co-authored-by: gonutz <gonutz@fake.mail>
2024-10-30 10:18:39 +08:00
chenyu
f389e1a8a0 test more special values for sin/cos/tan [pr] (#7386) 2024-10-29 21:13:37 -04:00
chenyu
33acbaeb24 reuse polyN in trig_poly float64 (#7385)
similar speed, less alu (151 vs. 154 per sine) and simpler, the power of 2 thing should probably be done in polyN if needed
2024-10-29 20:45:56 -04:00
chenyu
6bf38c35e5 clean up transcendental frexp [pr] (#7384)
also added some unit tests for frexp
2024-10-29 18:51:37 -04:00
chenyu
99b82f5708 minor cleanup payne_hanek_reduction [pr] (#7383) 2024-10-29 17:59:18 -04:00
chenyu
f6abde95fa clean up Tensor._reduce (#7382)
use make_tuple and self.ndim
2024-10-29 17:23:57 -04:00
nimlgen
4ed2c40d48 qcom a bit cleaner (#7380) 2024-10-29 23:50:28 +03:00
chenyu
07ad6d20ed simpler commutative flipping condition (#7377)
`x.src[1].tuplize < x.src[0].tuplize` implies `x.src[0] is not x.src[1]`

also renamed cc -> op
2024-10-29 13:51:24 -04:00
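The condition in #7377 works because a strict `<` is never true when both sides are the same object. The sketch below illustrates this with a commutative-operand canonicalizer. It assumes a toy `Node` class whose `tuplize` is a structural sort key. `Node`, its fields, and `canonicalize` are hypothetical and do not mirror tinygrad's UOp.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str
    src: tuple = ()
    arg: object = None

    @property
    def tuplize(self):
        # Hypothetical structural key: op name, arg, then the children's keys.
        return (self.op, repr(self.arg), tuple(s.tuplize for s in self.src))

def canonicalize(x: Node) -> Node:
    # For a commutative op, put the sources in key order. If src[1] sorts
    # strictly before src[0], the two cannot be the same object (a key is
    # never strictly less than itself), so a separate identity check adds
    # nothing before swapping.
    if x.op in ("ADD", "MUL") and x.src[1].tuplize < x.src[0].tuplize:
        return Node(x.op, (x.src[1], x.src[0]), x.arg)
    return x

a, b = Node("CONST", arg=1), Node("CONST", arg=2)
print(canonicalize(Node("ADD", (b, a))).src == (a, b))
```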
chenyu
d3c192b056 Device method cleanup [pr] (#7375) 2024-10-29 12:49:47 -04:00