Commit Graph

10490 Commits

chenyu
72c9b22833 sort vars in jit when building expected input args (#4990)
* sort vars in jit when building expected input args

fixed symbolic jit bugs with two variables.

* sort in clanggraph

* space

* one more
2024-06-16 15:55:51 -04:00
qazal
71aad183fd check Program from HEAD [run_process_replay] (#4996)
* use the same prg [run_process_replay]

* put var back
2024-06-16 20:12:30 +03:00
chenyu
2b07847f2b matmul returns in acc_dtype if specified (#4994)
more flexible to not automatically downcast; this can fix bert mixed precision training
2024-06-16 12:56:15 -04:00
George Hotz
1d6f1a15e1 add lt and ge uop methods [run_process_replay] (#4995)
* add lt and ge uop methods [run_process_replay]

* more correct (should still run process replay)
2024-06-16 09:33:53 -07:00
uuuvn
1b3f27565a Boring UOps to UPat compiler [run_process_replay] (#4991)
* Boring UOps to UPat compiler

* ruff

* weirdness

* dtype fix

* Revert "weirdness"

This reverts commit 4bc213a157.

* weirdness

* end weirdness?

* a bunch more rules

* more patterns
2024-06-16 09:03:41 -07:00
George Hotz
dac96f177e ignore indexing in the flopcounter (#4993) 2024-06-16 08:59:55 -07:00
Timmy
01b26756d6 Multireduce Scheduler Tests (#4972)
* scheduler tests

* linters

* cleaning up tests

* fixing tests

* syntax

* fixing metal
2024-06-16 16:30:22 +03:00
chenyu
5eb8001514 minor cleanup in jit (#4989)
found a non-deterministic bug in jit with multiple variables, but first clean up some variable names.
[run_process_replay]
2024-06-15 23:43:17 -04:00
chenyu
44dfa37c70 use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
chenyu
20b50d8d64 doc: manual_seed (#4987)
there was a docstring, just not linked to the doc page. also updated the example to show re-seeding instead of an internal variable
2024-06-15 19:57:26 -04:00
wozeparrot
ce1ed374c9 more tinychat fixes (#4971) 2024-06-15 16:29:39 -07:00
chenyu
50bc14d186 re-enable test that loads torch pkl format (#4986) 2024-06-15 14:11:30 -04:00
qazal
ff8e9eefc3 hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]

* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06 Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title

* idk how this thing works btw

* test if it will work

* try 2

* Revert "idk how this thing works btw"

This reverts commit 580da51b07.

* Revert "try 2"

This reverts commit 7ff1e86d5d.

* test if it works

* meh

* Reapply "idk how this thing works btw"

This reverts commit dd33ad7c14.

* revert
2024-06-15 16:21:00 +03:00
uuuvn
033fb53f9e Incomplete/buggy rule breaks process replay on #4976 (#4978)
* Incomplete/buggy rule breaks process replay on #4976

* test passes

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-15 15:18:35 +03:00
qazal
d91f0ee85b add regression test for the neg folding pattern (#4979) 2024-06-15 15:08:28 +03:00
nimlgen
dfadf82e10 hcq optimize enqueue time (#4973)
* hcq optimize enqueue time

* linter
2024-06-15 10:47:25 +03:00
chenyu
5f7dd74655 docs: update wording for unflatten (#4974)
it was using `Expands`, the same wording as the torch doc, but we also have `expand`, so it's confusing
2024-06-14 23:12:41 -04:00
Cyril Roumégous
efbf4fca05 perf: graph_rewrite line reduction and make it a little bit faster [run_process_replay] (#4958) 2024-06-14 16:37:27 -07:00
wozeparrot
8209cd3c55 easier llama3 + fetch subdir (#4938) 2024-06-14 13:47:27 -07:00
chenyu
64cda3c481 raise TypeError calling len() on a 0-d tensor (#4970)
matched numpy and torch
2024-06-14 16:34:27 -04:00
chenyu
67e8df4969 remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py.

after this, the only numpy usages are (1) Tensor(np.ndarray), (2) constructing .numpy() output, (3) the numpy random buffer
2024-06-14 15:38:45 -04:00
wozeparrot
62dc36d371 autogen _try_dlopen (#4949) 2024-06-14 12:12:18 -07:00
qazal
3e297d8216 delete Linearizer.const [run_process_replay] (#4967) 2024-06-14 21:51:37 +03:00
chenyu
118c9fe468 Tensor._fromcpu -> Tensor._fromnp (#4966)
and moved it to the constructor path for np.ndarray
2024-06-14 14:33:43 -04:00
wozeparrot
2a974ff257 fix: no readablestream await of, too new (#4965) 2024-06-14 11:22:19 -07:00
nimlgen
9436cd4551 hcq add memory_barrier (#4963)
* hcq add memory_barrier

* fix nv
2024-06-14 21:02:55 +03:00
chenyu
dae1c8abe2 create Tensor from bytes without numpy (#4964) 2024-06-14 13:37:27 -04:00
chenyu
5eee974b2a construct Tensor from python list/tuple directly (#4947)
* construct Tensor from python list/tuple directly

no numpy. annoying that half memoryview is a 3.12 feature...

* simpler, and test

* flat already

* simpler

* cute

* 10% faster

* 5%
2024-06-14 11:36:05 -04:00
geohotstan
90332eb529 Getitem pin None dimension (#4960)
* fix

* remove torch out of bounds test

* 1 more test case
2024-06-14 10:48:59 -04:00
qazal
2eeddf1a46 IF ends with STORE, RANGE ends with PHI [run_process_replay] (#4953) 2024-06-14 16:00:32 +03:00
George Hotz
d5a92b9b83 sort the axis in reduce op [run_process_replay] (#4956) 2024-06-14 05:16:05 -07:00
George Hotz
14189bca68 graph_dedup function [run_process_replay] (#4955) 2024-06-14 04:24:37 -07:00
George Hotz
63a8add2c2 move uops add logic to linearize (#4952)
* move logic to linearize

* idk how this should work

* empty
2024-06-14 03:52:37 -07:00
qazal
7e32b8c930 refactor generic UOps.END* insertion (#4951)
* merge loops children

* rename to scope_children

* refactor ends

* merge with ends [run_process_replay]
2024-06-14 13:42:41 +03:00
George Hotz
9823752397 make uops.add private (#4950)
* make uops.add private

* modernize all tests
2024-06-14 03:23:25 -07:00
Jhenner Tigreros
dc9e9e4363 Convert BinaryOps.DIV to UnaryOps.RECIP and BinaryOps.IDIV (#4887)
* Create UnaryOps.RECIP and BinaryOps.IDIV, changing uses of BinaryOps.DIV

* Delete unused import

* Add cstyle renderer

* Fix formatting text

* Fix test error due to bad implementation of renderer

* Add PTX support

* Add RECIP to LLVMIR

* Remove BinaryOps.DIV from symbolic test

* Change some tests and fix C floor division

* Change references to DIV to RECIP or IDIV

* Add mimic idiv for symbolic test

* Restore floor

* Mimic idiv

* cast to int

* Fix some tests and the renderer

* Remove DIV for render nodes

* Resolve issue with div

* Add TestRenderer

* Fix test

* fix error

* Fix PAD test

* Fix div implementation

* Remove DIV

* Add upcast to rshift, due to use of MUL and RECIP on DIV

* Fix linter

* Completely remove BinaryOps.DIV

* Fix lint

* Fix some tests

* Revert mul modification

* Fix tests

* Fix CLANG for uops

* Revert IDIV function

* Minor fix

* modify pattern matching rule to support nan

* Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP

* Remove const folding for IDIV and fix PTX

* Completely remove IDIV from extra

* Remove test_div from TestFloatUOps due to test on recip

* Fix linearizer

* fix

* Fix test_22

* Fix llvm

* Apply trunc function for llvmlite

* use floor instead of trunc

* Use correct type

* Generate new fuzz db

* Fix rshift, do not cast to float to support idiv

* Return upcast=false to rshift

* Add BinaryOps.IDIV to unsafepad

* Remove RECIP override for CUDA

* add atol / rtol for the test

* Remove cast to int on IDIV

* Regenerate sops

* delete sops.gz

* regenerate

* regenerate

* regenerate

* Reduce margins

* pass atol and rtol as parameters for _test_metrics

* regenerated dataset

* Regenerate

* Remove duplicated

* Revert changes on extra

* Remove changes extra and NOQA for test

* Remove E501

* Remove and change line

* Remove E501

* Fix atan2

* Revert import and E501

* Remove E501

* Add hrcp to half ops

* Remove 1 of hrcp

* Remove last DIV and add type check on uops for IDIV

* Fix new tests

* Fix tests and custom function

* Regenerate dataset

* Regenerate dataset

* Revert dataset

* Change generate dataset script

* Remove line

* Change IDIV: type checker validates that x, y and z are int

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-14 02:43:46 -07:00
SnakeOnex
f87ba6016a tqdm total=0 fix (#4939)
* fixes

* fixes

* removed auto loop closing

* one line shorter
2024-06-14 02:31:59 -07:00
nimlgen
225f792330 amd hdp flush regs are on seg2 (#4925) 2024-06-14 01:42:23 +03:00
nimlgen
4bfd1904f6 nv do not modify prg's qmd (#4948) 2024-06-14 01:15:40 +03:00
chenyu
845c10bc28 add Node to _broadcasted type annotation (#4946) 2024-06-13 14:10:56 -04:00
chenyu
287d3c3b84 support list, tuple input in dtypes.from_py (#4945)
* support list, tuple input in dtypes.from_py

and used it to infer dtype from python lists and tuples in the Tensor constructor.

* fix tests
2024-06-13 13:38:06 -04:00
chenyu
7aecea4f56 support creating Tensor from python tuple (#4944)
added a small fuzzer to test data with mixed tuples and lists of numbers, matched against numpy
2024-06-13 12:18:37 -04:00
chenyu
74586bc339 fix getitem with leading None (#4943)
i think all None handling can be unified, removing the calc_dim in advanced indexing
2024-06-13 11:23:40 -04:00
George Hotz
e63701fbd4 RDNA3 assembly support (#3637)
* amazing that i can use comgr for this

* compile empty kernel

* cleanups

* tiny_add compiles

* ugh

* more work

* put that in extra
2024-06-13 09:09:24 +02:00
nimlgen
fd071ba27e amd mockgpu correct timer resolution (#4942)
* amd mockgpu correct timer resolution

* test it
2024-06-13 10:07:34 +03:00
chenyu
fae08c4d48 fix Tensor.triu / Tensor.tril with boolean input (#4941)
`where(self, 0)` incorrectly upcast the output. `where(self, False)` is correct but looks unnatural, so added a cast at the end; the pattern matcher can fold the cast into the where branches
2024-06-12 20:16:13 -04:00
chenyu
cc90b3ef9f simpler Tensor.gather (#4940)
get rid of some confusing transpose and permute calls, and the if condition on dim. Saved a kernel for each dim != 0 case in tests by removing the dangling transpose at the end
2024-06-12 19:42:40 -04:00
George Hotz
fa00ef66fd Update README.md 2024-06-13 00:29:19 +02:00
chenyu
eb0f5b5660 failed test case for getitem with leading Nones (#4936)
* failed test case for getitem with leading Nones

torch matched numpy, so tinygrad is incorrect.
another repro:
```
import numpy as np
import torch
from tinygrad import Tensor

# index with leading Nones then an integer-array index; all three should agree
t = np.arange(12).reshape((3, 4))
print(t[None, None, np.array([1, 2])])

t = torch.arange(12).reshape((3, 4))
print(t[None, None, torch.tensor([1, 2])].numpy())

t = Tensor.arange(12).reshape(3, 4)
print(t[None, None, Tensor([1, 2])].numpy())
```

* # noqa
2024-06-12 16:19:42 -04:00