George Hotz
b69afc67d8
tinybox docs typo
2024-06-20 17:58:40 -07:00
George Hotz
6bc5e5f41c
start tinybox docs
2024-06-20 17:04:45 -07:00
chenyu
f6d6760f71
don't cast tuple to list before creating Tensor ( #5071 )
...
Tensor constructor now supports creating from a tuple
2024-06-20 13:32:56 -04:00
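A minimal sketch of what #5071 enables (assuming the standard `tinygrad` import path); both forms should now build the same Tensor, without casting the tuple to a list first:

```python
from tinygrad import Tensor

a = Tensor([1, 2, 3])   # from a list, as before
b = Tensor((1, 2, 3))   # from a tuple, no intermediate list copy
print(a.tolist(), b.tolist())  # [1, 2, 3] [1, 2, 3]
```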
qazal
97f1347dd9
fix check_process_replay for special characters ( #5072 )
...
* 'test' [run_process_replay] [no_assert]
* test with ( ) { } '' " "
* remove the log [run_process_replay] '' () { } '{
* helpful echos [run_process_replay] [no_assert] () ''
* test [run_process_replay] [no_assert]
* test2 [run_process_replay] [no_assert]
* test3 [run_process_replay] [no_assert]
* it's also correct this way [run_process_replay] [no_assert]
* remove extras [run_process_replay]
2024-06-20 20:23:29 +03:00
George Hotz
6f6b3b10c9
import from uops, not linearizer ( #5064 )
2024-06-20 08:08:44 -07:00
chenyu
50700171ef
minor cleanup to reshape arg handling ( #5070 )
...
moved None handling to be with argfix, and only resolve -1 if there's a -1
2024-06-20 10:27:27 -04:00
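A rough sketch of the reshape semantics #5070 touches, as I read them (None keeps a dimension, a lone -1 is inferred); the exact behavior is my understanding of tinygrad's reshape, not taken from the diff:

```python
from tinygrad import Tensor

t = Tensor.empty(2, 3, 4)
print(t.reshape(6, -1).shape)     # (6, 4): the single -1 is resolved from the other dims
print(t.reshape(None, 12).shape)  # (2, 12): None keeps that dimension as-is
```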
chenyu
f4355d0f1b
check Tensor.permute input arg is a valid permutation ( #5069 )
...
also added support for negative axes
2024-06-20 10:01:28 -04:00
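A small illustration of the checks #5069 describes (invalid permutations rejected, negative axes accepted); the exact exception type is an assumption here:

```python
from tinygrad import Tensor

t = Tensor.empty(2, 3, 4)
print(t.permute(2, 0, 1).shape)   # (4, 2, 3)
print(t.permute(-1, 0, 1).shape)  # (4, 2, 3): negative axes resolve to the same order
try:
  t.permute(0, 0, 1)              # not a permutation of (0, 1, 2), should now be rejected
except (RuntimeError, ValueError) as e:
  print("rejected:", e)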
qazal
24c89a2a33
move assert_equiv_uops to helpers + use == for dtypes ( #5067 )
...
* dtypes should use ==
* use TestUOps
* should use assertIs
2024-06-20 16:39:34 +03:00
chenyu
e8f39fcaaa
check arg to Tensor.flip can appear only once ( #5068 )
...
* check arg to Tensor.flip can appear only once
raise RuntimeError if there are multiple
* fix test
2024-06-20 09:33:42 -04:00
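A sketch of the check #5068 adds, using the RuntimeError the commit text mentions for a repeated axis:

```python
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
print(t.flip(0).tolist())  # [[3, 4], [1, 2]]
try:
  t.flip(0, 0)             # axis 0 appears twice -> RuntimeError per this change
except RuntimeError as e:
  print("rejected:", e)
```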
qazal
55e02cdd84
generic gate folding ( #5061 )
...
* add assert
* fold truthy gates [run_process_replay]
* fold falsy gates [run_process_replay] [no_assert]
* redo asserts
* check both barriers
* spec start
* spec end
* assert srcs
* make test_fold_gated_load_local better
* [run_process_replay] [no_assert]
2024-06-20 16:10:08 +03:00
kormann
bdca2da2be
typannos ( #5059 )
2024-06-20 09:02:31 -04:00
chenyu
5f7edc7a46
minor cleanup getting output shape of _pool ( #5065 )
...
math.ceil makes the intent clear, and the same formula works in both cases [run_process_replay]
2024-06-20 09:00:48 -04:00
qazal
a6a5dba637
Revert "UPat for has_valid in load/store ( #5052 )" ( #5056 )
...
* manually insert in the Linearizer
* fix process replay
2024-06-19 20:53:36 +03:00
qazal
ee01e464e3
use process replay as a diff creator ( #4903 )
...
* add no_assert option [run_process_replay] [no_assert]
* test [run_process_replay] [no_assert]
* [run_process_replay]
* back to normal [run_process_replay]
* remove the log
2024-06-19 18:17:31 +03:00
qazal
99fc275c27
UPat line savings [run_process_replay] ( #5053 )
...
* line savings
* move to new style
2024-06-19 12:43:20 +03:00
qazal
71194df1da
UPat for has_valid in load/store [run_process_replay] ( #5052 )
...
* fold gated load/store [run_process_replay]
* handle temp loads
* direct store
2024-06-19 12:20:22 +03:00
chenyu
996788358d
minor change to gelu ( #5048 )
...
used `math.sqrt(2 / math.pi)` instead of `0.7978845608`, and moved one mul by self inside the parentheses. this matches the paper and llm.c
2024-06-18 22:26:56 -04:00
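For reference, the tanh approximation of GELU that #5048 refers to, written with the symbolic constant; this is a sketch of the general formula, and the exact grouping in tinygrad's implementation may differ:

```python
import math
from tinygrad import Tensor

def gelu_tanh(x: Tensor) -> Tensor:
  # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
  return 0.5 * x * (1 + (math.sqrt(2 / math.pi) * (x + 0.044715 * x * x * x)).tanh())
```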
chenyu
4c7e316ded
update pylint for ops_python ( #5046 )
...
the two errors (cell-var-from-loop and arguments-out-of-order) do not apply as we use them as intended.
2024-06-18 20:15:34 -04:00
wozeparrot
acb715c64c
fix: llama3 special tokens ( #5045 )
2024-06-18 17:08:44 -07:00
chenyu
a8e9307e0b
pylint runtime/ and shape/ ( #5044 )
...
as pointed out by #4877, `__init__.py` needs to be added to trigger pylint. fixed some errors, except ops_python (will do in a separate pr, it has a lot of errors) and the sub-folders in runtime
2024-06-18 19:48:18 -04:00
chenyu
cc2be9064f
fix out of bound python list into numpy array ( #5043 )
...
numpy 2.0 does not allow out-of-bound python consts and recommends writing them as `np.array(value).astype(dtype)`
2024-06-18 18:05:21 -04:00
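A concrete illustration of the numpy 2.0 behavior described in #5043: passing an out-of-range Python int together with the dtype raises, while the recommended two-step form wraps the value instead:

```python
import numpy as np

# np.array(300, dtype=np.uint8) raises OverflowError under numpy 2.0
print(np.array(300).astype(np.uint8))  # 44, i.e. 300 % 256
```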
chenyu
4e5add4d01
move test_tqdm to test/unit/ ( #5042 )
2024-06-18 17:41:39 -04:00
chenyu
2b2488f2e2
revert creating Tensor from a list without numpy ( #5041 )
...
the change was incomplete and broke creating a Tensor from a list of np arrays
2024-06-18 17:31:22 -04:00
chenyu
e2c5054bdd
update resnet.load_from_pretrained ( #5040 )
2024-06-18 16:29:22 -04:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples ( #5038 )
...
* use tinytqdm in active tests and examples
stress test this before 0.9.1
* no set_description
2024-06-18 16:01:19 -04:00
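A minimal usage sketch of the tinytqdm progress bar #5038 switches to, assuming it keeps tqdm's basic call shape and lives in `tinygrad.helpers`:

```python
from tinygrad.helpers import tqdm

for _ in tqdm(range(1000), desc="steps"):
  pass  # tqdm-style progress bar without the external dependency
```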
kormann
fe332464d2
src->vin [run_process_replay] ( #5036 )
2024-06-18 22:23:49 +03:00
reddyn12
f171006ded
Should this symbolic test fail? ( #4501 )
...
* add test
* skip test
* use expected failure decorator
---------
Co-authored-by: schlimeszn <schlimeszn@gmail.com >
Co-authored-by: reddyn <nikidsniper@gmail.com >
2024-06-18 15:21:26 -04:00
kormann
7c3b877216
rename uop [run_process_replay] ( #5031 )
...
* rename
* fix unittests
* rename vin
* fix test
* fix type [run_process_replay]
* rm pre commit hook change
2024-06-18 21:34:05 +03:00
chenyu
dc942bf1f6
jit sampling function in test_randomness.test_multinomial ( #5034 )
...
* jit sampling function in test_randomness.test_multinomial
`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec
* skip that
2024-06-18 14:21:05 -04:00
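A sketch of the kind of jitting #5034 describes, assuming `TinyJit` and `Tensor.multinomial` as exported by tinygrad; the actual test wraps its own sampling helper:

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def sample(probs: Tensor) -> Tensor:
  # after a couple of warm-up calls the kernels are captured and replayed
  return probs.multinomial(num_samples=1).realize()

probs = Tensor([0.1, 0.2, 0.7])
for _ in range(4): print(sample(probs).tolist())
```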
Elias Wahl
f31ef11537
Better default hparams for large BS ( #5030 )
...
* better default hparams for large BS
* bf16 too
* use tuple
2024-06-18 11:13:06 -04:00
Francis Lam
8d33998e0d
[run_process_replay] linearizer: fix get_grouping_dims to respect global/local max ( #4855 )
...
* linearizer: fix get_grouping_dims to respect global/local max
* fix lidx variable index offset and unrestrict clang/llvm global len
* test reverse variable indexing when reverse_dims is true
* change the collapse axis to be the right most if reversed
2024-06-18 16:51:27 +03:00
joeshmoe0112358
7842559952
simplification of exp2 ( #5023 )
2024-06-18 06:51:16 -07:00
kormann
acc8f5e30e
print_tree for uops ( #5028 )
2024-06-18 06:36:14 -07:00
Junjun Dong
c8cd6e725c
Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset ( #4977 )
...
* feat: remove BinaryOps.SUB
* remove SUB in test_early_end_local
* regenerate dataset. remove SUB in test_linearizer_*
* reenable overflow tests
* simplify tensor.sub function by returning a+(-b)
* remove whitespaces
---------
Co-authored-by: chenyu <chenyu@fastmail.com >
2024-06-18 09:06:13 -04:00
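The bullet about tensor.sub in #4977 is the whole idea; a one-line sketch of subtraction expressed with ADD and NEG only:

```python
from tinygrad import Tensor

def sub(a: Tensor, b: Tensor) -> Tensor:
  return a + (-b)  # a - b without a dedicated BinaryOps.SUB

print(sub(Tensor([3.0, 5.0]), Tensor([1.0, 2.0])).tolist())  # [2.0, 3.0]
```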
chenyu
620fa6e5a2
check Tensor.reshape can have at most one -1 ( #5026 )
...
raise RuntimeError to match torch. on master it throws weird errors from shapetracker
2024-06-18 08:17:12 -04:00
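A short illustration of the check #5026 adds, using the RuntimeError the commit mentions:

```python
from tinygrad import Tensor

t = Tensor.empty(2, 3, 4)
print(t.reshape(2, -1).shape)  # (2, 12): a single -1 is fine
try:
  t.reshape(-1, -1)            # more than one -1 is ambiguous -> RuntimeError, matching torch
except RuntimeError as e:
  print("rejected:", e)
```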
Elias Wahl
7bfa9101c0
Float in scaled dot product attention ( #4985 )
...
* Monkeypatch scaled-dot-product-attention
* Use dot instead of matmul
* new api
* imports
* least_upper_dtype
2024-06-18 08:16:41 -04:00
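For context, the computation #4985 adjusts is standard scaled dot-product attention; a generic sketch only (the dtype promotion via least_upper_dtype that the commit is actually about is omitted):

```python
import math
from tinygrad import Tensor

def sdpa(q: Tensor, k: Tensor, v: Tensor) -> Tensor:
  # softmax(q @ k^T / sqrt(d)) @ v
  return (q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])).softmax(-1) @ v
```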
nimlgen
194a168630
hcq signal scheduler ( #5016 )
...
* faster hcq
* fix nv
* linter
* cleaner
* fix sync
* cleaner
* a bit cleaner
2024-06-18 14:02:21 +03:00
chenyu
e9c6a36894
remove CACHELEVEL=0 in llama3 benchmark ( #5025 )
2024-06-17 22:43:16 -04:00
chenyu
acaf9a490d
RECIP(-0.0) should be -inf ( #5024 )
...
* RECIP(-0.0) should be -inf
added test_dtype_alu for PYTHON backend
* catcht that
* fix those two
2024-06-17 22:26:58 -04:00
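A sketch of the IEEE-754 semantics #5024 wants for the Python backend (the reciprocal of a signed zero is a signed infinity); the helper name is illustrative, not the actual python_alu code:

```python
import math

def recip(x: float) -> float:
  return math.copysign(math.inf, x) if x == 0 else 1.0 / x

print(recip(-0.0), recip(0.0))  # -inf inf
```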
GabrielZCode
66760ae558
graph display floats rounded ( #5021 )
...
Co-authored-by: gabrielsouza <gabriel.martins@perdcomp.com.br >
2024-06-17 18:22:55 -07:00
chenyu
03b367c014
handle float16 overflow in PYTHON ( #5022 )
...
* handle float16 overflow in PYTHON
use `truncate` when constructing tensor from list to make sure all values are packable (might be slow, but should be correct). add truncate_fp16 to cast overflowed values to inf/-inf.
* all valid fmt supports truncate
2024-06-17 21:12:52 -04:00
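A hedged sketch of what a truncate_fp16-style helper can look like: round-trip through an IEEE half so in-range values keep their fp16 value, and out-of-range values become signed infinity instead of raising when packed:

```python
import math, struct

def truncate_fp16(x: float) -> float:
  try:
    return struct.unpack("@e", struct.pack("@e", x))[0]  # "e" is the IEEE half format
  except OverflowError:
    return math.copysign(math.inf, x)

print(truncate_fp16(65504.0))  # 65504.0, the largest finite fp16
print(truncate_fp16(1e5))      # inf
```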
chenyu
c0139b05d8
python_alu sin(inf) is nan ( #5020 )
...
* python_alu sin(inf) is nan
without special handling, it throws ValueError: math domain error
* skip CUDACPU
2024-06-17 19:47:30 -04:00
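A minimal sketch of the special-casing #5020 describes: `math.sin(math.inf)` raises "ValueError: math domain error", so non-finite inputs are mapped to nan explicitly:

```python
import math

def safe_sin(x: float) -> float:
  return math.sin(x) if math.isfinite(x) else math.nan

print(safe_sin(math.pi / 2))  # 1.0
print(safe_sin(math.inf))     # nan
```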
chenyu
4296507021
Tensor.sum returns in acc_dtype if specified ( #5012 )
...
* Tensor.sum returns in acc_dtype if specified
* skip PYTHON for now
* revert that
* relax that
2024-06-17 16:35:52 -04:00
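A short illustration of the behavior #5012 describes, assuming the `acc_dtype` keyword of Tensor.sum and the `dtypes` namespace exported by tinygrad:

```python
from tinygrad import Tensor, dtypes

t = Tensor([1, 2, 3], dtype=dtypes.float16)
# per this change, specifying acc_dtype also determines the result dtype
print(t.sum(acc_dtype=dtypes.float32).dtype)  # float32
```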
chenyu
013c73c3b3
minor refactor overflow handling in python backend ( #5015 )
...
made it clear that it's only handling int now. need to handle float inf next
2024-06-17 12:18:38 -04:00
Ray
1ad3b25461
fix einsum output str ( #4998 )
...
* fix einsum output str
* new line to satisfy linter
* removed redundant cast (satisfy linter)
2024-06-17 12:18:14 -04:00
nimlgen
794acefbf3
hcq update waits and signals in place ( #4984 )
...
* hcq update waits and signals in place
* start amd
* amd works
* prettier
* test
* normal messages
* linter
* linter 2
2024-06-17 17:19:07 +03:00
qazal
603a4a0ce1
process replay contributor docs ( #5010 )
2024-06-17 09:38:59 -04:00
qazal
026c59543c
allow keyword args in UOp.store [run_process_replay] ( #5008 )
...
* allow keyword args in UOp.store [run_process_replay]
* same for load
* typing can stay
2024-06-17 15:42:27 +03:00
uuuvn
f1de8cd8cf
Convert a bunch more rules [run_process_replay] ( #5007 )
...
* Convert a bunch more rules [run_process_replay]
* more rules, narrow down CMPLT rule
* smart linter cut two lines
* nope, the linter is dumb
* make dumb linter shut up
* revert two rules
* Revert "revert two rules"
This reverts commit 585688da17 .
* fix
2024-06-17 15:16:31 +03:00
chenyu
c52352bd9a
fix yolov8 example ( #5003 )
...
it was creating a Tensor from a list of numpy arrays, which is not supported after Tensor creation from a list was moved off numpy.
2024-06-16 20:47:29 -04:00