Commit Graph

4766 Commits

kormann
7c3b877216 rename uop [run_process_replay] (#5031)
* rename

* fix unittests

* rename vin

* fix test

* fix type [run_process_replay]

* rm pre commit hook change
2024-06-18 21:34:05 +03:00
chenyu
dc942bf1f6 jit sampling function in test_randomness.test_multinomial (#5034)
* jit sampling function in test_randomness.test_multinomial

`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec

* skip that
2024-06-18 14:21:05 -04:00
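
A minimal sketch of the change this describes (the function name and probabilities are illustrative, not the test's exact code): wrapping the sampling step in TinyJit so repeated draws replay captured kernels instead of re-lowering them each time.

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def sample(probs: Tensor) -> Tensor:
  # after a couple of warm-up calls, TinyJit replays the captured kernels
  return probs.multinomial(num_samples=1, replacement=True).realize()

probs = Tensor([[0.1, 0.2, 0.7]])
for _ in range(5): sample(probs)
```
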
Elias Wahl
f31ef11537 Better default hparams for large BS (#5030)
* better default hparams for large BS

* bf16 too

* use tuple
2024-06-18 11:13:06 -04:00
Francis Lam
8d33998e0d [run_process_replay] linearizer: fix get_grouping_dims to respect global/local max (#4855)
* linearizer: fix get_grouping_dims to respect global/local max

* fix lidx variable index offset and unrestrict clang/llvm global len

* test reverse variable indexing when reverse_dims is true

* change the collapse axis to be the right most if reversed
2024-06-18 16:51:27 +03:00
joeshmoe0112358
7842559952 simplification of exp2 (#5023) 2024-06-18 06:51:16 -07:00
kormann
acc8f5e30e print_tree for uops (#5028) 2024-06-18 06:36:14 -07:00
Junjun Dong
c8cd6e725c Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset (#4977)
* feat: remove BinaryOps.SUB

* remove SUB in test_early_end_local

* regenerate dataset. remove SUB in test_linearizer_*

* reenable overflow tests

* simplify tensor.sub function by returning a+(-b)

* remove whitespaces

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-18 09:06:13 -04:00
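
As the bullets above describe, subtraction now lowers to add-plus-negate; a minimal sketch of the identity (not the exact tinygrad source):

```python
from tinygrad import Tensor

def sub(a: Tensor, b: Tensor) -> Tensor:
  # no dedicated SUB op needed: a - b == a + (-b)
  return a + (-b)

assert (sub(Tensor([3.0, 5.0]), Tensor([1.0, 2.0])).numpy() == [2.0, 3.0]).all()
```
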
chenyu
620fa6e5a2 check Tensor.reshape can have at most one -1 (#5026)
raise RuntimeError to match torch. on master it throws weird errors from shapetracker
2024-06-18 08:17:12 -04:00
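
Sketch of the enforced behavior, assuming the current Tensor.reshape API:

```python
from tinygrad import Tensor

t = Tensor.ones(2, 3, 4)
assert t.reshape(6, -1).shape == (6, 4)  # a single -1 is inferred
try:
  t.reshape(-1, -1, 4)  # two -1s are ambiguous
except RuntimeError:
  pass  # raised eagerly, matching torch
```
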
Elias Wahl
7bfa9101c0 Float in scaled dot product attention (#4985)
* Monkeypatch scaled-dot-product-attention

* Use dot instead of matmul

* new api

* imports

* least_upper_dtype
2024-06-18 08:16:41 -04:00
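
For reference, a small usage sketch of the attention entry point touched here (shapes are illustrative):

```python
from tinygrad import Tensor

q, k, v = [Tensor.rand(1, 4, 8, 16) for _ in range(3)]
# replaces hand-written softmax(q @ k.T / sqrt(d)) @ v in model code
out = q.scaled_dot_product_attention(k, v, is_causal=True)
assert out.shape == (1, 4, 8, 16)
```
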
nimlgen
194a168630 hcq signal scheduler (#5016)
* faster hcq

* fix nv

* linter

* cleaner

* fix sync

* cleaner

* a bit cleaner
2024-06-18 14:02:21 +03:00
chenyu
e9c6a36894 remove CACHELEVEL=0 in llama3 benchmark (#5025) 2024-06-17 22:43:16 -04:00
chenyu
acaf9a490d RECIP(-0.0) should be -inf (#5024)
* RECIP(-0.0) should be -inf

added test_dtype_alu for PYTHON backend

* catch that

* fix those two
2024-06-17 22:26:58 -04:00
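
The IEEE-754 rule being enforced, sketched in plain Python (the real change lives in the PYTHON backend's ALU table):

```python
import math

def recip(x: float) -> float:
  # IEEE-754: the reciprocal of a signed zero is an infinity with the same sign
  return math.copysign(math.inf, x) if x == 0.0 else 1.0 / x

assert recip(-0.0) == -math.inf and recip(0.0) == math.inf
```
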
GabrielZCode
66760ae558 graph display floats rounded (#5021)
Co-authored-by: gabrielsouza <gabriel.martins@perdcomp.com.br>
2024-06-17 18:22:55 -07:00
chenyu
03b367c014 handle float16 overflow in PYTHON (#5022)
* handle float16 overflow in PYTHON

use `truncate` when constructing tensor from list to make sure all values are packable (might be slow, but should be correct). add truncate_fp16 to cast overflowed values to inf/-inf.

* all valid fmt supports truncate
2024-06-17 21:12:52 -04:00
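
A minimal sketch of the truncation idea (illustrative, not the exact truncate_fp16 from this commit): round-trip through a real half-precision pack and send out-of-range values to signed infinity.

```python
import math, struct

def truncate_fp16(x: float) -> float:
  try:
    # "e" is the IEEE half-precision format; in-range values round-trip
    return struct.unpack("e", struct.pack("e", x))[0]
  except OverflowError:
    # beyond the fp16 range (max 65504): overflow to +/-inf
    return math.copysign(math.inf, x)

assert truncate_fp16(1e6) == math.inf
```
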
chenyu
c0139b05d8 python_alu sin(inf) is nan (#5020)
* python_alu sin(inf) is nan

without special handling, it throws ValueError: math domain error

* skip CUDACPU
2024-06-17 19:47:30 -04:00
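
The guard described above, sketched in plain Python:

```python
import math

def safe_sin(x: float) -> float:
  # math.sin(inf) raises "ValueError: math domain error"; IEEE wants NaN
  return math.nan if math.isinf(x) else math.sin(x)

assert math.isnan(safe_sin(math.inf))
```
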
chenyu
4296507021 Tensor.sum returns in acc_dtype if specified (#5012)
* Tensor.sum returns in acc_dtype if specified

* skip PYTHON for now

* revert that

* relax that
2024-06-17 16:35:52 -04:00
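
Sketch of the behavior change, assuming the acc_dtype keyword on Tensor.sum:

```python
from tinygrad import Tensor, dtypes

t = Tensor([1, 2, 3], dtype=dtypes.float16)
assert t.sum().dtype == dtypes.float16  # default: result follows input dtype
# with acc_dtype specified, the result now stays in the accumulation dtype
assert t.sum(acc_dtype=dtypes.float32).dtype == dtypes.float32
```
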
chenyu
013c73c3b3 minor refactor overflow handling in python backend (#5015)
made it clear that it's only handling int now. need to handle float inf next
2024-06-17 12:18:38 -04:00
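
An illustrative sketch of the kind of fixed-width int wrapping the PYTHON backend needs (the helper name and widths here are hypothetical):

```python
def truncate_int(x: int, bits: int = 32, signed: bool = True) -> int:
  # hypothetical helper: wrap Python's arbitrary-precision ints to two's complement
  x &= (1 << bits) - 1
  if signed and x >= (1 << (bits - 1)): x -= 1 << bits
  return x

assert truncate_int(2**31) == -(2**31)  # int32 wraparound
```
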
Ray
1ad3b25461 fix einsum output str (#4998)
* fix einsum output str

* new line to satisfy linter

* removed redundant cast (satisfy linter)
2024-06-17 12:18:14 -04:00
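
For reference, a small einsum usage sketch (the string to the right of '->' is the output spec this commit fixes):

```python
from tinygrad import Tensor

x, y = Tensor.rand(2, 3), Tensor.rand(3, 4)
out = Tensor.einsum("ij,jk->ik", x, y)  # explicit output subscripts
assert out.shape == (2, 4)
```
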
nimlgen
794acefbf3 hcq update waits and signals in place (#4984)
* hcq update waits and signals in place

* start amd

* amd works

* prettier

* test

* normal messages

* linter

* linter 2
2024-06-17 17:19:07 +03:00
qazal
603a4a0ce1 process replay contributor docs (#5010) 2024-06-17 09:38:59 -04:00
qazal
026c59543c allow keyword args in UOp.store [run_process_replay] (#5008)
* allow keyword args in UOp.store [run_process_replay]

* same for load

* typing can stay
2024-06-17 15:42:27 +03:00
uuuvn
f1de8cd8cf Convert a bunch more rules [run_process_replay] (#5007)
* Convert a bunch more rules [run_process_replay]

* more rules, narrow down CMPLT rule

* smart linter cut two lines

* nope, the linter is dumb

* make dumb linter shut up

* revert two rules

* Revert "revert two rules"

This reverts commit 585688da17.

* fix
2024-06-17 15:16:31 +03:00
chenyu
c52352bd9a fix yolov8 example (#5003)
it was creating a Tensor from a list of numpy arrays, which is not supported now that creating from a list no longer goes through numpy.
2024-06-16 20:47:29 -04:00
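
A sketch of the workaround implied here (array contents illustrative): merge the numpy arrays into a single ndarray before constructing the Tensor.

```python
import numpy as np
from tinygrad import Tensor

arrs = [np.zeros(3, dtype=np.float32), np.ones(3, dtype=np.float32)]
# Tensor(list_of_ndarrays) is no longer supported; stack into one ndarray first
t = Tensor(np.stack(arrs))
assert t.shape == (2, 3)
```
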
nimlgen
8bc0cbf67b nv tiny cleanups (#5001)
* nv tiny cleanups

* gpfifo rework

* return type
2024-06-17 00:43:44 +03:00
qazal
04feeb37e6 look for unsafe pad ops in multiview ShapeTrackers (#5002) 2024-06-17 00:28:12 +03:00
George Hotz
bee8fc29ee add GPT2 half/half+beam to AMD (#5000)
* add GPT2 half/half+beam to AMD

* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
72c9b22833 sort vars in jit when building expected input args (#4990)
* sort vars in jit when building expected input args

fixed symbolic jit bugs with two variables.

* sort in clanggraph

* space

* one more
2024-06-16 15:55:51 -04:00
qazal
71aad183fd check Program from HEAD [run_process_replay] (#4996)
* use the same prg [run_process_replay]

* put var back
2024-06-16 20:12:30 +03:00
chenyu
2b07847f2b matmul returns in acc_dtype if specified (#4994)
more flexible to not automatically downcast, can fix bert mixed precision training with this
2024-06-16 12:56:15 -04:00
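
Same convention as Tensor.sum above, sketched under the assumption of an acc_dtype keyword on matmul: the result is no longer downcast back to the input dtype.

```python
from tinygrad import Tensor, dtypes

a, b = Tensor.rand(4, 8, dtype=dtypes.half), Tensor.rand(8, 2, dtype=dtypes.half)
out = a.matmul(b, acc_dtype=dtypes.float32)  # accumulate and return in fp32
assert out.dtype == dtypes.float32
```
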
George Hotz
1d6f1a15e1 add lt and ge uop methods [run_process_replay] (#4995)
* add lt and ge uop methods [run_process_replay]

* more correct (should still run process replay)
2024-06-16 09:33:53 -07:00
uuuvn
1b3f27565a Boring UOps to UPat compiler [run_process_replay] (#4991)
* Boring UOps to UPat compiler

* ruff

* weirdness

* dtype fix

* Revert "weirdness"

This reverts commit 4bc213a157.

* weirdness

* end weirdness?

* a bunch more rules

* more patterns
2024-06-16 09:03:41 -07:00
George Hotz
dac96f177e ignore indexing in the flopcounter (#4993) 2024-06-16 08:59:55 -07:00
Timmy
01b26756d6 Multireduce Scheduler Tests (#4972)
* scheduler tests

* linters

* cleaning up tests

* fixing tests

* syntax

* fixing metal
2024-06-16 16:30:22 +03:00
chenyu
5eb8001514 minor cleanup in jit (#4989)
found a non-deterministic bug in jit with multiple variables. but first clean up some variable names.
[run_process_replay]
2024-06-15 23:43:17 -04:00
chenyu
44dfa37c70 use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
chenyu
20b50d8d64 doc: manual_seed (#4987)
there was a docstring just not linked to the doc page. also updated the example to show re-seed instead of an internal variable
2024-06-15 19:57:26 -04:00
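
Roughly the re-seed example the docstring now shows (a sketch):

```python
from tinygrad import Tensor

Tensor.manual_seed(1337)
a = Tensor.rand(2).numpy()
Tensor.manual_seed(1337)  # re-seeding reproduces the same stream
b = Tensor.rand(2).numpy()
assert (a == b).all()
```
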
wozeparrot
ce1ed374c9 more tinychat fixes (#4971) 2024-06-15 16:29:39 -07:00
chenyu
50bc14d186 re-enable test that loads torch pkl format (#4986) 2024-06-15 14:11:30 -04:00
qazal
ff8e9eefc3 hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]

* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06 Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title

* idk how this thing works btw

* test if it will work

* try 2

* Revert "idk how this thing works btw"

This reverts commit 580da51b07.

* Revert "try 2"

This reverts commit 7ff1e86d5d.

* test if it works

* meh

* Reapply "idk how this thing works btw"

This reverts commit dd33ad7c14.

* revert
2024-06-15 16:21:00 +03:00
uuuvn
033fb53f9e Incomplete/buggy rule breaks process replay on #4976 (#4978)
* Incomplete/buggy rule breaks process replay on #4976

* test passes

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-15 15:18:35 +03:00
qazal
d91f0ee85b add regression test for the neg folding pattern (#4979) 2024-06-15 15:08:28 +03:00
nimlgen
dfadf82e10 hcq optimize enqueue time (#4973)
* hcq optimize enqueue time

* linter
2024-06-15 10:47:25 +03:00
chenyu
5f7dd74655 docs: update wording for unflatten (#4974)
it was using `Expands`, the same wording as the torch doc, but we also have expand so it's confusing
2024-06-14 23:12:41 -04:00
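
For reference, what unflatten actually does (it splits one dimension; it does not broadcast like expand):

```python
from tinygrad import Tensor

t = Tensor.rand(6, 4)
# dim 0 of size 6 splits into (2, 3); the element count is unchanged
assert t.unflatten(0, (2, 3)).shape == (2, 3, 4)
```
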
Cyril Roumégous
efbf4fca05 perf: graph_rewrite line reduction and make it a little bit faster [run_process_replay] (#4958) 2024-06-14 16:37:27 -07:00
wozeparrot
8209cd3c55 easier llama3 + fetch subdir (#4938) 2024-06-14 13:47:27 -07:00
chenyu
64cda3c481 raise TypeError calling len() on a 0-d tensor (#4970)
matched numpy and torch
2024-06-14 16:34:27 -04:00
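
Sketch of the new behavior:

```python
from tinygrad import Tensor

assert len(Tensor([1, 2, 3])) == 3
try:
  len(Tensor(3.0))  # a 0-d tensor has no length
except TypeError:
  pass  # matches numpy and torch
```
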
chenyu
67e8df4969 remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py.

after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
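
An illustrative stand-in for the boundary helper (not the exact _to_np_dtype; the mapping shown is hypothetical and partial):

```python
import numpy as np
from tinygrad import dtypes

def to_np_dtype(dtype):  # hypothetical simplified version of _to_np_dtype
  # numpy is consulted only at the Tensor <-> ndarray boundary
  return {dtypes.float32: np.float32, dtypes.float16: np.float16,
          dtypes.int32: np.int32}.get(dtype)

assert to_np_dtype(dtypes.float32) is np.float32
```
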
wozeparrot
62dc36d371 autogen _try_dlopen (#4949) 2024-06-14 12:12:18 -07:00
qazal
3e297d8216 delete Linearizer.const [run_process_replay] (#4967) 2024-06-14 21:51:37 +03:00