kormann
7c3b877216
rename uop [run_process_replay] (#5031)
* rename
* fix unittests
* rename vin
* fix test
* fix type [run_process_replay]
* rm pre-commit hook change
2024-06-18 21:34:05 +03:00
chenyu
dc942bf1f6
jit sampling function in test_randomness.test_multinomial (#5034)
* jit sampling function in test_randomness.test_multinomial
`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec
* skip that
2024-06-18 14:21:05 -04:00
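The speedup comes from wrapping the sampling step in TinyJit so repeated draws replay captured kernels instead of rebuilding the graph each call. A minimal sketch of the pattern (illustrative weights and shapes, not the actual test code):

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def sample(weights: Tensor) -> Tensor:
  # one draw per call; after warm-up, calls replay the captured kernels
  return weights.multinomial(num_samples=1).realize()

w = Tensor([[0.1, 0.3, 0.6]])
for _ in range(10): sample(w)
```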
Elias Wahl
f31ef11537
Better default hparams for large BS (#5030)
* better default hparams for large BS
* bf16 too
* use tuple
2024-06-18 11:13:06 -04:00
Francis Lam
8d33998e0d
[run_process_replay] linearizer: fix get_grouping_dims to respect global/local max (#4855)
* linearizer: fix get_grouping_dims to respect global/local max
* fix lidx variable index offset and unrestrict clang/llvm global len
* test reverse variable indexing when reverse_dims is true
* change the collapse axis to be the rightmost if reversed
2024-06-18 16:51:27 +03:00
joeshmoe0112358
7842559952
simplification of exp2 (#5023)
2024-06-18 06:51:16 -07:00
kormann
acc8f5e30e
print_tree for uops (#5028)
2024-06-18 06:36:14 -07:00
Junjun Dong
c8cd6e725c
Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset (#4977)
* feat: remove BinaryOps.SUB
* remove SUB in test_early_end_local
* regenerate dataset. remove SUB in test_linearizer_*
* reenable overflow tests
* simplify tensor.sub function by returning a+(-b) (see the sketch after this entry)
* remove whitespace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-18 09:06:13 -04:00
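The identity that makes the removal safe: a - b == a + (-b), so SUB lowers to the remaining ADD and NEG ops. A sketch of the tensor-level simplification mentioned above (the real Tensor.sub method has more to it):

```python
from tinygrad import Tensor

def sub(a: Tensor, b: Tensor) -> Tensor:
  # a - b == a + (-b): subtraction via ADD and NEG only
  return a + (-b)

print(sub(Tensor([3.0]), Tensor([1.0])).numpy())  # [2.]
```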
chenyu
620fa6e5a2
check Tensor.reshape can have at most one -1 (#5026)
raise RuntimeError to match torch; on master it throws weird errors from the shapetracker (see the sketch below)
2024-06-18 08:17:12 -04:00
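Roughly what the new check enforces (a sketch; the exact error message may differ):

```python
from tinygrad import Tensor

t = Tensor.ones(2, 3, 4)
print(t.reshape(6, -1).shape)       # (6, 4): a single -1 is inferred
try: t.reshape(-1, -1, 4)           # two -1s are ambiguous
except RuntimeError as e: print(e)  # now a clean RuntimeError, matching torch
```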
Elias Wahl
7bfa9101c0
Float in scaled dot product attention (#4985)
* Monkeypatch scaled-dot-product-attention
* Use dot instead of matmul
* new api
* imports
* least_upper_dtype
2024-06-18 08:16:41 -04:00
nimlgen
194a168630
hcq signal scheduler (#5016)
* faster hcq
* fix nv
* linter
* cleaner
* fix sync
* cleaner
* a bit cleaner
2024-06-18 14:02:21 +03:00
chenyu
e9c6a36894
remove CACHELEVEL=0 in llama3 benchmark (#5025)
2024-06-17 22:43:16 -04:00
chenyu
acaf9a490d
RECIP(-0.0) should be -inf (#5024)
* RECIP(-0.0) should be -inf
added test_dtype_alu for PYTHON backend
* catch that
* fix those two
2024-06-17 22:26:58 -04:00
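Plain Python raises ZeroDivisionError on 1.0 / -0.0, so the PYTHON backend needs an explicit special case to produce the IEEE-754 result. A sketch of the idea (not the exact python_alu code):

```python
import math

def recip(x: float) -> float:
  # IEEE-754: recip(-0.0) -> -inf, recip(+0.0) -> +inf
  if x == 0.0: return math.copysign(math.inf, x)
  return 1.0 / x

print(recip(-0.0))  # -inf
```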
GabrielZCode
66760ae558
graph display floats rounded (#5021)
Co-authored-by: gabrielsouza <gabriel.martins@perdcomp.com.br>
2024-06-17 18:22:55 -07:00
chenyu
03b367c014
handle float16 overflow in PYTHON (#5022)
* handle float16 overflow in PYTHON
use `truncate` when constructing a tensor from a list to make sure all values are packable (might be slow, but should be correct; see the sketch after this entry). add truncate_fp16 to cast overflowed values to inf/-inf.
* all valid fmt supports truncate
2024-06-17 21:12:52 -04:00
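The truncate idea can be sketched as a round-trip through a packed half-float: in-range values get rounded to fp16 precision, and out-of-range values, which make struct raise, map to +/-inf (close to, but not necessarily identical to, the actual helper):

```python
import math, struct

def truncate_fp16(x: float) -> float:
  try:
    # round-tripping through a packed half rounds the value to fp16
    return struct.unpack("@e", struct.pack("@e", x))[0]
  except OverflowError:
    # beyond the fp16 max (65504), map to +/-inf instead of raising
    return math.copysign(math.inf, x)

print(truncate_fp16(1e5))  # inf
```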
chenyu
c0139b05d8
python_alu sin(inf) is nan (#5020)
* python_alu sin(inf) is nan
without special handling, it throws ValueError: math domain error
* skip CUDACPU
2024-06-17 19:47:30 -04:00
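Python's math.sin raises for infinite inputs rather than returning nan, hence the special handling. A sketch:

```python
import math

def safe_sin(x: float) -> float:
  # math.sin(inf) raises "ValueError: math domain error"; return nan instead
  return math.nan if math.isinf(x) else math.sin(x)

print(safe_sin(math.inf))  # nan
```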
chenyu
4296507021
Tensor.sum returns in acc_dtype if specified (#5012)
* Tensor.sum returns in acc_dtype if specified
* skip PYTHON for now
* revert that
* relax that
2024-06-17 16:35:52 -04:00
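After this change, a given acc_dtype is also the output dtype instead of being cast back to the input dtype. A usage sketch:

```python
from tinygrad import Tensor, dtypes

x = Tensor([1, 2, 3], dtype=dtypes.float16)
# accumulate and return in float32; previously the result was cast back to float16
print(x.sum(acc_dtype=dtypes.float32).dtype)
```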
chenyu
013c73c3b3
minor refactor overflow handling in python backend (#5015)
made it clear that it's only handling int now. need to handle float inf next
2024-06-17 12:18:38 -04:00
Ray
1ad3b25461
fix einsum output str (#4998)
* fix einsum output str
* new line to satisfy linter
* removed redundant cast (satisfy linter)
2024-06-17 12:18:14 -04:00
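For context, the output side of an einsum formula determines the axis order of the result; a usage sketch (illustrative shapes):

```python
from tinygrad import Tensor

x, y = Tensor.ones(2, 3), Tensor.ones(3, 4)
# the "->ki" output subscripts transpose the usual matmul result
print(Tensor.einsum("ij,jk->ki", x, y).shape)  # (4, 2)
```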
nimlgen
794acefbf3
hcq update waits and signals in place (#4984)
* hcq update waits and signals in place
* start amd
* amd works
* prettier
* test
* normal messages
* linter
* linter 2
2024-06-17 17:19:07 +03:00
qazal
603a4a0ce1
process replay contributor docs (#5010)
2024-06-17 09:38:59 -04:00
qazal
026c59543c
allow keyword args in UOp.store [run_process_replay] (#5008)
* allow keyword args in UOp.store [run_process_replay]
* same for load
* typing can stay
2024-06-17 15:42:27 +03:00
uuuvn
f1de8cd8cf
Convert a bunch more rules [run_process_replay] (#5007)
* Convert a bunch more rules [run_process_replay]
* more rules, narrow down CMPLT rule
* smart linter cut two lines
* nope, the linter is dumb
* make dumb linter shut up
* revert two rules
* Revert "revert two rules"
This reverts commit 585688da17.
* fix
2024-06-17 15:16:31 +03:00
chenyu
c52352bd9a
fix yolov8 example (#5003)
it was creating a Tensor from a list of numpy arrays, which is no longer supported now that creating from a list does not go through numpy (see the sketch after this entry).
2024-06-16 20:47:29 -04:00
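The workaround pattern for such callers: collapse the list of arrays into a single ndarray first (a sketch with made-up shapes):

```python
import numpy as np
from tinygrad import Tensor

arrs = [np.ones(3, dtype=np.float32), np.zeros(3, dtype=np.float32)]
t = Tensor(np.stack(arrs))  # stack first; Tensor(list_of_ndarrays) is unsupported
```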
nimlgen
8bc0cbf67b
nv tiny cleanups (#5001)
* nv tiny cleanups
* gpfifo rework
* return type
2024-06-17 00:43:44 +03:00
qazal
04feeb37e6
look for unsafe pad ops in multiview ShapeTrackers (#5002)
2024-06-17 00:28:12 +03:00
George Hotz
bee8fc29ee
add GPT2 half/half+beam to AMD (#5000)
* add GPT2 half/half+beam to AMD
* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
72c9b22833
sort vars in jit when building expected input args (#4990)
* sort vars in jit when building expected input args
fixed symbolic jit bugs with two variables.
* sort in clanggraph
* space
* one more
2024-06-16 15:55:51 -04:00
qazal
71aad183fd
check Program from HEAD [run_process_replay] (#4996)
* use the same prg [run_process_replay]
* put var back
2024-06-16 20:12:30 +03:00
chenyu
2b07847f2b
matmul returns in acc_dtype if specified (#4994)
more flexible to not automatically downcast; this can fix bert mixed-precision training (see the sketch below)
2024-06-16 12:56:15 -04:00
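Same idea as the Tensor.sum change above: a given acc_dtype is kept as the output dtype. A sketch:

```python
from tinygrad import Tensor, dtypes

a = Tensor.ones(4, 8, dtype=dtypes.float16)
b = Tensor.ones(8, 4, dtype=dtypes.float16)
# accumulate and return in float32; no automatic downcast to float16
print(a.matmul(b, acc_dtype=dtypes.float32).dtype)
```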
George Hotz
1d6f1a15e1
add lt and ge uop methods [run_process_replay] (#4995)
* add lt and ge uop methods [run_process_replay]
* more correct (should still run process replay)
2024-06-16 09:33:53 -07:00
uuuvn
1b3f27565a
Boring UOps to UPat compiler [run_process_replay] (#4991)
* Boring UOps to UPat compiler
* ruff
* weirdness
* dtype fix
* Revert "weirdness"
This reverts commit 4bc213a157.
* weirdness
* end weirdness?
* a bunch more rules
* more patterns
2024-06-16 09:03:41 -07:00
George Hotz
dac96f177e
ignore indexing in the flopcounter (#4993)
2024-06-16 08:59:55 -07:00
Timmy
01b26756d6
Multireduce Scheduler Tests (#4972)
* scheduler tests
* linters
* cleaning up tests
* fixing tests
* syntax
* fixing metal
2024-06-16 16:30:22 +03:00
chenyu
5eb8001514
minor cleanup in jit (#4989)
found a non-deterministic bug in jit with multiple variables. but first, clean up some variable names.
[run_process_replay]
2024-06-15 23:43:17 -04:00
chenyu
44dfa37c70
use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
chenyu
20b50d8d64
doc: manual_seed (#4987)
there was a docstring, just not linked to the doc page. also updated the example to show re-seeding instead of an internal variable (see the sketch below)
2024-06-15 19:57:26 -04:00
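The documented pattern is re-seeding to reproduce a stream (a sketch):

```python
from tinygrad import Tensor

Tensor.manual_seed(42)
a = Tensor.rand(2).numpy()
Tensor.manual_seed(42)  # re-seed: the next draws repeat the same stream
b = Tensor.rand(2).numpy()
assert (a == b).all()
```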
wozeparrot
ce1ed374c9
more tinychat fixes (#4971)
2024-06-15 16:29:39 -07:00
chenyu
50bc14d186
re-enable test that loads torch pkl format (#4986)
2024-06-15 14:11:30 -04:00
qazal
ff8e9eefc3
hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]
* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06
Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title
* idk how this thing works btw
* test if it will work
* try 2
* Revert "idk how this thing works btw"
This reverts commit 580da51b07.
* Revert "try 2"
This reverts commit 7ff1e86d5d.
* test if it works
* meh
* Reapply "idk how this thing works btw"
This reverts commit dd33ad7c14.
* revert
2024-06-15 16:21:00 +03:00
uuuvn
033fb53f9e
Incomplete/buggy rule breaks process replay on #4976 (#4978)
* Incomplete/buggy rule breaks process replay on #4976
* test passes
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-15 15:18:35 +03:00
qazal
d91f0ee85b
add regression test for the neg folding pattern (#4979)
2024-06-15 15:08:28 +03:00
nimlgen
dfadf82e10
hcq optimize enqueue time (#4973)
* hcq optimize enqueue time
* linter
2024-06-15 10:47:25 +03:00
chenyu
5f7dd74655
docs: update wording for unflatten (#4974)
it was using `Expands`, the same wording as the torch doc, but we also have expand, so it was confusing
2024-06-14 23:12:41 -04:00
Cyril Roumégous
efbf4fca05
perf: graph_rewrite line reduction and make it a little bit faster [run_process_replay] (#4958)
2024-06-14 16:37:27 -07:00
wozeparrot
8209cd3c55
easier llama3 + fetch subdir (#4938)
2024-06-14 13:47:27 -07:00
chenyu
64cda3c481
raise TypeError calling len() on a 0-d tensor (#4970)
matched numpy and torch
2024-06-14 16:34:27 -04:00
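The new behavior, matching numpy and torch (a sketch; the exact message may differ):

```python
from tinygrad import Tensor

t = Tensor(3.0)  # 0-d tensor, shape ()
try: len(t)
except TypeError as e: print(e)  # len() of a 0-d tensor is now a TypeError
```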
chenyu
67e8df4969
remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py (see the sketch after this entry).
after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
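The replacement helper maps a tinygrad dtype to its numpy equivalent only where numpy interop is actually needed; roughly (a sketch keyed off the dtype's struct format char):

```python
import numpy as np

def _to_np_dtype(dtype) -> type | None:
  # dtype.fmt is the struct format char, e.g. 'f' for float32, 'e' for float16
  return np.dtype(dtype.fmt).type if dtype.fmt is not None else None
```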
wozeparrot
62dc36d371
autogen _try_dlopen (#4949)
2024-06-14 12:12:18 -07:00
qazal
3e297d8216
delete Linearizer.const [run_process_replay] (#4967)
2024-06-14 21:51:37 +03:00