Commit Graph

3395 Commits

Author SHA1 Message Date
George Hotz
e4528543fa remove LLVMOPT 2024-01-15 16:01:09 -08:00
George Hotz
a464909d79 fast resnet eval (#3135)
* fast resnet eval

* fix HIP multidevice graph

* neater expression for devices

* lines

* add decorator test
2024-01-15 14:15:18 -08:00
Jyotirmaya Mahanta
b7b494e9b8 no numpy (#3134) 2024-01-15 13:09:05 -08:00
Paul Gustafson
6bb65cd02e fix off-by-one error in st_equal (#3131)
* fix off by one error

* whitespace
2024-01-15 11:32:13 -08:00
George Hotz
44c05919c1 dtype fmt (#3132)
* dtype fmt

* three ways to access
2024-01-15 11:31:54 -08:00
nimlgen
5ec66938de remove np from metal graph (#3129) 2024-01-15 11:44:35 -05:00
Jyotirmaya Mahanta
2ef09ca641 update test_ptr_ne (#3130) 2024-01-15 11:36:29 -05:00
chenyu
e39cd3e7f2 update env_vars.md (#3127)
mostly removed deprecated ones. not clear how to maintain this especially for extra/examples
2024-01-15 01:06:56 -05:00
chenyu
537fb8b0b8 separate try except blocks in onnx2torch in model benchmark (#3126)
exceptions can be raised from either model conversion or an individual backend failing. openpilot on torch mps works, but does not work with torch cpu.
separate the exception blocks so that the benchmark can include torch mps for openpilot.
2024-01-15 00:39:33 -05:00
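The fix above is the generic pattern of splitting one broad try/except into one per stage, so a conversion failure no longer masks a backend that only fails at run time (and vice versa). A minimal sketch of the split, with `convert` and `run` as stand-ins rather than the benchmark's real helpers:

```python
# sketch only: `convert` and `run` stand in for onnx2torch conversion and the
# per-backend run; the real benchmark code in the repo differs
def benchmark_backend(convert, run, backend):
    try:
        model = convert()              # stage 1: model conversion
    except Exception as e:
        return f"conversion failed: {e}"
    try:
        return run(model, backend)     # stage 2: run on this backend (e.g. torch mps vs torch cpu)
    except Exception as e:
        return f"{backend} failed: {e}"
```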
Guy Leroy
0dba34b81c Fix backward fn for < and == (#3037)
* fix no grad fn for < and ==

* remove 2 line breaks

* Remove deprecated autograd variable

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-14 20:39:52 -08:00
chenyu
db965a0c74 remove numpy from ops_torch (#3124)
updated the mnist test to cast labels to int8 to avoid hacking around the cast issue with torch uint8
2024-01-14 22:46:57 -05:00
George Hotz
1f9aee8b6f remove numpy from device (#3123)
* remove numpy from device

* fix tests

* np item

* cleanups

* simplify with as_buffer

* no toCPU

* tinygradic

* cast to scalar
2024-01-14 19:36:05 -08:00
George Hotz
ea5824657d move fromcpu out of lazy.py (#3122)
* move fromcpu out of lazy.py

* fix abstractions2
2024-01-14 18:21:08 -08:00
George Hotz
96345061d3 hotfix: ptrdtype compare was broken 2024-01-14 18:08:22 -08:00
Jyotirmaya Mahanta
26e0faf656 make DType a dataclass (#3111)
* remove np from DType

* convert to dataclass

* remove dunder hash, eq, ne overrides from ImageDType

* is dataclass required for PtrDType?

* fix GPU tests

* reduce lines

* revert changes to np

* minor cleanup
2024-01-14 17:15:59 -08:00
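For context, the change above is essentially the standard frozen-dataclass pattern: declared fields with auto-generated `__eq__`/`__hash__`, which is why the hand-written dunder overrides could be dropped. A rough sketch with illustrative field names (not tinygrad's exact definition):

```python
from dataclasses import dataclass
from typing import Optional

# illustrative sketch of the frozen-dataclass pattern; the real DType fields differ
@dataclass(frozen=True)
class DType:
    priority: int
    itemsize: int
    name: str
    fmt: Optional[str] = None

f32 = DType(priority=4, itemsize=4, name="float", fmt="f")
assert f32 == DType(4, 4, "float", "f")              # __eq__ is generated by the dataclass
assert hash(f32) == hash(DType(4, 4, "float", "f"))  # __hash__ too, because frozen=True
```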
Yixiang Gao
c13d51da1d add device options for tests in multigpu (#3121) 2024-01-14 15:17:47 -08:00
chenyu
79f4627fbc fix conversation: llama generates token not prob now (#3120) 2024-01-14 13:10:01 -05:00
chenyu
152ef7fc79 minor cleanups of onnx_ops (#3116) 2024-01-14 02:15:24 -05:00
chenyu
fb3f8f7597 move sample inside jit for beautiful_mnist (#3115)
also removed .realize() for jit functions since jit does it automatically now. a little more beautiful
2024-01-14 01:36:30 -05:00
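The realize() removal above follows from TinyJit behavior: the jitted function realizes its outputs, so explicit calls are redundant, and moving sampling inside the jit keeps that step in the captured graph. A hedged sketch of the pattern (assuming the top-level `TinyJit` export; not the literal beautiful_mnist code):

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def train_step(x: Tensor, y: Tensor) -> Tensor:
    loss = ((x - y) ** 2).mean()      # stand-in for the real forward/backward/step
    return loss                       # no explicit .realize(): the JIT realizes outputs

@TinyJit
def sample(logits: Tensor) -> Tensor:
    return logits.argmax(axis=-1)     # sampling inside the jit, captured with the rest
```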
chenyu
a313e63a9b add Tensor.var (#3114)
also updated MeanVarianceNormalization and made test_ops test tensors of var and std smaller
2024-01-14 01:11:08 -05:00
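Tensor.var above is the usual sample variance, sum((x - mean)^2) / (n - correction). A small usage sketch (assuming numpy-style axis and correction arguments):

```python
from tinygrad import Tensor

x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.var(axis=1).numpy())                # per-row sample variance -> [1. 1.]
print(x.var(axis=1, correction=0).numpy())  # population variance -> [0.6667 0.6667]
print(x.std(axis=1).numpy())                # std is the square root of var -> [1. 1.]
```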
chenyu
c658aa4fbf minor cleanup of test_disk_tensor (#3112) 2024-01-13 20:54:58 -05:00
chenyu
9c73d2724f cleanup ops_disk type annotation and redundant str cast (#3110) 2024-01-13 16:56:48 -05:00
chenyu
a300fea2a4 failed test case due to cast resets shapetracker (#3109)
cast implicitly resets shapetracker and makes it contiguous (for disk tensor), which fails for Interpreted backend if inputs contain non-contiguous st.
2024-01-13 12:46:51 -05:00
nimlgen
cf1d0a6704 no exceptions in __del__ when module creation is failed in hip/cuda (#3107) 2024-01-13 12:03:55 -05:00
chenyu
12f28ac9d4 catch runtime error in search._time_program (#3106)
return inf if search encountered runtime errors.
2024-01-12 21:53:13 -05:00
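The behavior above amounts to treating a crashing candidate kernel as infinitely slow rather than letting it abort the search. A minimal sketch of that guard (not the actual search.py code):

```python
import math
from typing import Callable, List

# sketch of the guard, not the actual search._time_program
def time_candidate(run: Callable[[], float], tries: int = 3) -> List[float]:
    try:
        return [run() for _ in range(tries)]   # normal path: measured runtimes
    except RuntimeError:
        return [math.inf] * tries              # crashed candidates rank last instead of killing the search
```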
chenyu
f018a55ea1 update NumNode.__hash__ to be hash(self.b) (#3105)
with this, `a:=NumNode(x) == b` implies `hash(a) == hash(b)`
2024-01-12 19:46:21 -05:00
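The implication noted above is the standard Python contract: keys that compare equal must hash equal, or dict/set lookups that mix wrapped and plain values break. A toy illustration (not tinygrad's symbolic code):

```python
# toy illustration of the hash/eq contract, not tinygrad's symbolic.py
class Num:
    def __init__(self, b: int): self.b = b
    def __eq__(self, other): return self.b == other   # a Num equals the plain int it wraps
    def __hash__(self): return hash(self.b)           # so the hash must agree with that equality

cache = {4: "cached"}
assert Num(4) == 4 and hash(Num(4)) == hash(4)
print(cache[Num(4)])   # "cached": the lookup works because equal keys hash equally
```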
chenyu
c3c35f9142 flag to profile mixtral - 1.7 tok/s now (#3104) 2024-01-12 18:54:27 -05:00
chenyu
e078e2d060 add half @ half to mac benchmark (#3103) 2024-01-12 16:38:41 -05:00
Francis Lam
ddbdb52f77 wmma: enable METAL half tensor cores and clean up cstyle (#3095)
* wmma: enable METAL half tensor cores and clean up cstyle

* revert simple_matmul rand changes and break line in tensor

* added metal fp16->fp32 tensor core
2024-01-12 16:25:28 -05:00
chenyu
f96fc6e9d4 fix gpt2 with empty prompt take 2 (#3102)
logits would be empty, so they need to be replaced with ones before sampling; also cannot reshape with -1 when there's a 0 in the other axes
2024-01-12 14:46:36 -05:00
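Both constraints above are general: -1 in a reshape is inferred by dividing the total element count by the product of the other dims, which is ambiguous once a 0-size axis makes that count 0, and sampling from zero rows of logits has nothing to pick from. A numpy illustration (50257 is GPT-2's vocab size):

```python
import numpy as np

logits = np.zeros((0, 50257))          # empty prompt -> zero rows of logits
try:
    logits.reshape(0, -1)              # -1 can't be inferred: total size is 0, so any width "fits"
except ValueError as e:
    print(e)
# replace empty logits with ones before sampling, as the fix above does
padded = np.ones((1, logits.shape[1])) if logits.shape[0] == 0 else logits
print(padded.shape)                    # (1, 50257)
```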
chenyu
ca46d3541b Revert "fix gpt2 with empty prompt" (#3101) 2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d fix gpt2 with empty prompt (#3100)
logits would be empty, so they need to be replaced with ones before sampling; also cannot reshape with -1 when there's a 0 in the other axes
2024-01-12 14:18:17 -05:00
SnakeOnex
0c49d38ba7 replace with tensor op (#3099) 2024-01-12 14:13:40 -05:00
chenyu
f3a50b4e40 fix broadcasted logic if there's 0 in shapes (#3097)
* fix broadcasted logic if there's 0 in shapes

a size-1 dim should always expand into 0, not the other way around. fixed matmul with 0 in input shapes.
forward only for now; backward is more involved and would need changes to the 0-size shortcuts

* fix tests
2024-01-12 13:32:43 -05:00
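The rule in the fix above matches numpy semantics: when a size-1 dim is broadcast against a size-0 dim, the result has size 0 (the 1 expands into the 0), so a matmul with an empty operand yields an empty result rather than an error. For example:

```python
import numpy as np

print(np.broadcast_shapes((1, 4), (0, 4)))            # (0, 4): the 1 expands into the 0
print((np.zeros((0, 3)) @ np.zeros((3, 5))).shape)    # (0, 5): empty batch in, empty result out
```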
SnakeOnex
025fbf4e80 One hot in tensor.py (#3093)
* onehot in Tensor.py

* one_hot tests

* works for all shapes, not just 1

* pylint

* not a static method

* moved around, num_classes mandatory

* pylint

* pylint

* space & moving

* formatting

* moved tests
2024-01-12 13:31:18 -05:00
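A quick usage sketch of the method added above (assuming the `Tensor.one_hot(num_classes)` form with num_classes mandatory; output dtype may differ):

```python
from tinygrad import Tensor

labels = Tensor([0, 2, 1])
print(labels.one_hot(3).numpy())
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]
```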
chenyu
7086d77db1 bugfix do not reset shapetracker of 0 size lazybuffer (#3096)
it might be coming from an expand, and resetting it results in an incorrect stride. caught by the interpreted backend
2024-01-11 23:22:52 -05:00
Yixiang Gao
13e872b53f add multigpu support for llama attention (#3064)
* add llama attention test for multigpu

* test fails

* kv cache trying to shrink on sharded axis

* mask None works for scaled dot product

* kv cache seems to be working but scale dot product breaks

* scaled dot product works, but the last linear layer failed

* running into the reshape case where it could be wrong for multigpu

* making sure it was the reshape

* adding contiguous doesn't solve

* need to shard more properly

* remove reshape test

* minor adjustment to scaled dot product attention test

* weights are sharded wrong

* continue fix new weight sharding

* clean up

* fix attention when start_pos is 0

* remove print

* add TODOs for the best multigpu interface
2024-01-11 16:31:02 -08:00
chenyu
dcf7ecaaff update jit type annotation post lazy rewrite (#3091) 2024-01-11 15:49:30 -05:00
chenyu
0fe6904351 use device from LinearizerOptions in kernel search (#3090)
* use device from LinearizerOptions in kernel search

removed all Device.DEFAULT in search.py

* pass device string for parallel pickle

* device for interpreted backends in LinearizerOptions
2024-01-11 14:46:03 -05:00
chenyu
93e3f952aa use BEAM=2 instead of BEAM=4 in cuda ci gpt2 (#3089)
BEAM=2 is faster and takes less search time. investigating why BEAM=2 + BEAM=4 is slower than BEAM=2 alone
2024-01-11 13:21:06 -05:00
chenyu
f502c9b08f minor cleanup of View.reshape (#3088)
* minor cleanup of View.reshape

removed some redundant logic

* new_strides

* revert that
2024-01-11 13:05:54 -05:00
chenyu
f40299c3fe remove the third merging state in view._merge_dims (#3085)
no logic depends on state == 0 or state == 2
2024-01-11 12:07:43 -05:00
chenyu
7f9590d357 hotfix disable flaky mac runner wino cifar (#3087) 2024-01-11 11:57:05 -05:00
Yixiang Gao
adcc844755 cat works (#3086) 2024-01-11 08:25:20 -08:00
chenyu
cdeab9ad97 mem_estimate is always int, not symbolic (#3083)
* mem_estimate is always int, not symbolic

op_estimate can be symbolic, but mem_estimate is always int, thus we don't need to sym_infer it.
fixed some long lines too. update_stats is a very big function

* operator does not need underscores
2024-01-10 23:39:51 -05:00
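For context on the distinction above: op_estimate can contain symbolic Variables (e.g. a dynamic sequence length) and is resolved with sym_infer when stats are reported, while mem_estimate is already a plain int. A hedged sketch assuming tinygrad's `Variable` and `sym_infer` from the symbolic module:

```python
from tinygrad.shape.symbolic import Variable, sym_infer

# assumed API: Variable(name, min, max) and sym_infer(expr, var_vals)
seqlen = Variable("seqlen", 1, 512)
op_estimate = seqlen * 1000      # FLOPs depend on the runtime value of seqlen
mem_estimate = 4096              # bytes moved: a concrete int, nothing to infer

print(sym_infer(op_estimate, {seqlen: 128}))   # 128000
print(mem_estimate)                            # 4096
```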
Francis Lam
162fa61a32 wmma: clean up device specific tensor core code (#3081) 2024-01-10 21:03:09 -05:00
chenyu
d218d13885 minor cleanups of lazy.py (#3080) 2024-01-10 20:17:56 -05:00
chenyu
56dda33fc6 Tensor.expand resolves the new_shape before shortcut return (#3078)
similar to how reshape is done. also updated the shrink shortcut criteria to read similarly to pad
2024-01-10 14:29:15 -05:00
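Resolving new_shape first matters because expand accepts -1 as "keep this dimension", so the no-op shortcut has to compare the resolved shape, not the raw arguments. A small usage sketch:

```python
from tinygrad import Tensor

x = Tensor.ones(3, 1)
print(x.expand(-1, 4).shape)   # (3, 4): -1 resolves to the existing size 3
print(x.expand(3, 1).shape)    # (3, 1): resolved shape equals the current one -> shortcut no-op
```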
Yixiang Gao
6842476ca6 better test demonstration (#3077)
* a better test demonstration

* fix white space
2024-01-10 10:50:52 -08:00
chenyu
507e0afba0 fix onehot and jit in examples/transformer (#3073)
trained to 0.999 in < 6 seconds on M1 Max consistently
2024-01-10 02:22:41 -05:00