Commit Graph

3386 Commits

Guy Leroy
0dba34b81c Fix backward fn for < and == (#3037)
* fix no grad fn for < and ==

* remove 2 line breaks

* Remove deprecated autograd variable

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-14 20:39:52 -08:00
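
A quick illustration of why `<` and `==` need no backward function: comparisons are piecewise-constant in their inputs, so their gradient is zero almost everywhere. This is a numpy finite-difference sketch of that fact, not the tinygrad change itself.

```python
import numpy as np

# Comparison ops are piecewise-constant, so their derivative is zero
# (almost everywhere) and no backward fn is needed.
def finite_diff(f, x, eps=1e-3):
  return (f(x + eps) - f(x - eps)) / (2 * eps)

x = np.array([0.5, 1.9, 3.5])  # points away from the threshold
print(finite_diff(lambda v: (v < 2.0).astype(np.float32), x))   # [0. 0. 0.]
print(finite_diff(lambda v: (v == 2.0).astype(np.float32), x))  # [0. 0. 0.]
```
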
chenyu
db965a0c74 remove numpy from ops_torch (#3124)
updated the mnist test to cast labels to int8, avoiding the hack around torch's uint8 cast issue
2024-01-14 22:46:57 -05:00
George Hotz
1f9aee8b6f remove numpy from device (#3123)
* remove numpy from device

* fix tests

* np item

* cleanups

* simplify with as_buffer

* no toCPU

* tinygradic

* cast to scalar
2024-01-14 19:36:05 -08:00
George Hotz
ea5824657d move fromcpu out of lazy.py (#3122)
* move fromcpu out of lazy.py

* fix abstractions2
2024-01-14 18:21:08 -08:00
George Hotz
96345061d3 hotfix: ptrdtype compare was broken 2024-01-14 18:08:22 -08:00
Jyotirmaya Mahanta
26e0faf656 make DType a dataclass (#3111)
* remove np from DType

* convert to dataclass

* remove dunder hash, eq, ne overrides from ImageDType

* is dataclass required for PtrDType?

* fix GPU tests

* reduce lines

* revert changes to np

* minor cleanup
2024-01-14 17:15:59 -08:00
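
Roughly what the dataclass conversion buys (an illustrative sketch, not the actual tinygrad `DType` definition; field names beyond priority/itemsize/name are assumptions): a frozen dataclass supplies value-based `__eq__`, `__hash__`, and `__repr__`, so the hand-written dunder overrides can go.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch only, not the real tinygrad DType.
@dataclass(frozen=True, order=True)
class DTypeSketch:
  priority: int            # decides upcast order
  itemsize: int            # bytes per element
  name: str
  fmt: Optional[str] = None

f32 = DTypeSketch(11, 4, "float", "f")
assert f32 == DTypeSketch(11, 4, "float", "f")             # auto __eq__
assert len({f32, DTypeSketch(11, 4, "float", "f")}) == 1   # auto __hash__ (frozen)
```
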
Yixiang Gao
c13d51da1d add device options for tests in multigpu (#3121) 2024-01-14 15:17:47 -08:00
chenyu
79f4627fbc fix conversation: llama generates a token, not a prob, now (#3120) 2024-01-14 13:10:01 -05:00
chenyu
152ef7fc79 minor cleanups of onnx_ops (#3116) 2024-01-14 02:15:24 -05:00
chenyu
fb3f8f7597 move sample inside jit for beautiful_mnist (#3115)
also removed .realize() for jit functions since jit does it automatically now. a little more beautiful
2024-01-14 01:36:30 -05:00
chenyu
a313e63a9b add Tensor.var (#3114)
also updated MeanVarianceNormalization and made test_ops test tensors of var and std smaller
2024-01-14 01:11:08 -05:00
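
A minimal usage sketch for the new `Tensor.var`, assuming a torch-like signature (`axis`, `keepdim`, and Bessel's correction on by default); the exact keyword names are an assumption here.

```python
from tinygrad.tensor import Tensor

x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.var().numpy())                        # sample variance over all elements
print(x.var(axis=1).numpy())                  # per-row variance, shape (2,)
print(x.var(axis=1, correction=0).numpy())    # population variance (divide by N)
```
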
chenyu
c658aa4fbf minor cleanup of test_disk_tensor (#3112) 2024-01-13 20:54:58 -05:00
chenyu
9c73d2724f cleanup ops_disk type annotation and redundant str cast (#3110) 2024-01-13 16:56:48 -05:00
chenyu
a300fea2a4 failed test case due to cast resets shapetracker (#3109)
cast implicitly resets shapetracker and makes it contiguous (for disk tensor), which fails for Interpreted backend if inputs contain non-contiguous st.
2024-01-13 12:46:51 -05:00
nimlgen
cf1d0a6704 no exceptions in __del__ when module creation fails in hip/cuda (#3107) 2024-01-13 12:03:55 -05:00
chenyu
12f28ac9d4 catch runtime error in search._time_program (#3106)
return inf if the search encounters runtime errors.
2024-01-12 21:53:13 -05:00
chenyu
f018a55ea1 update NumNode.__hash__ to be hash(self.b) (#3105)
with this, for `a := NumNode(x)`, `a == b` implies `hash(a) == hash(b)`
2024-01-12 19:46:21 -05:00
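
The point of the change is Python's hash/eq contract: anything a `NumNode` compares equal to must also hash equal, otherwise dict/set lookups that rely on equality silently miss. A minimal sketch of the contract (not the real `NumNode`):

```python
class NumNodeSketch:
  def __init__(self, b: int): self.b = b
  def __eq__(self, other): return self.b == getattr(other, "b", other)
  def __hash__(self): return hash(self.b)  # the fix: hash what __eq__ compares

a = NumNodeSketch(4)
assert a == NumNodeSketch(4) and hash(a) == hash(NumNodeSketch(4))
assert a == 4 and hash(a) == hash(4)  # also consistent with plain ints
```
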
chenyu
c3c35f9142 flag to profile mixtral - 1.7 tok/s now (#3104) 2024-01-12 18:54:27 -05:00
chenyu
e078e2d060 add half @ half to mac benchmark (#3103) 2024-01-12 16:38:41 -05:00
Francis Lam
ddbdb52f77 wmma: enable METAL half tensor cores and clean up cstyle (#3095)
* wmma: enable METAL half tensor cores and clean up cstyle

* revert simple_matmul rand changes and break line in tensor

* added metal fp16->fp32 tensor core
2024-01-12 16:25:28 -05:00
chenyu
f96fc6e9d4 fix gpt2 with empty prompt take 2 (#3102)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when there's a 0 in the other axes
2024-01-12 14:46:36 -05:00
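
A numpy sketch of the reshape constraint mentioned here: with zero total elements and a 0 among the other target axes, a `-1` dimension cannot be inferred, which is why the empty logits are replaced with ones before sampling.

```python
import numpy as np

logits = np.empty((0, 8))            # stand-in for logits of an empty prompt
print(logits.reshape(4, -1).shape)   # (4, 0): the non-zero dims pin -1 to 0
try:
  logits.reshape(0, -1)              # ambiguous: -1 could be anything
except ValueError as e:
  print(e)
```
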
chenyu
ca46d3541b Revert "fix gpt2 with empty prompt" (#3101) 2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d fix gpt2 with empty prompt (#3100)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when there's a 0 in the other axes
2024-01-12 14:18:17 -05:00
SnakeOnex
0c49d38ba7 replace with tensor op (#3099) 2024-01-12 14:13:40 -05:00
chenyu
f3a50b4e40 fix broadcasted logic if there's 0 in shapes (#3097)
* fix broadcasted logic if there's 0 in shapes

a size-1 dim should always expand into 0, not the other way around. fixed matmul with 0 in input shapes.
forward only for now; backward is more involved and would need changes to the 0-size shortcuts

* fix tests
2024-01-12 13:32:43 -05:00
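
The rule the fix enforces matches numpy broadcasting: a size-1 axis expands into a size-0 axis (giving 0), never the reverse. A numpy sketch of the intended semantics:

```python
import numpy as np

a = np.ones((1, 3))
b = np.ones((0, 3))
print((a + b).shape)                               # (0, 3): the 1 expands into 0
print((np.ones((0, 4)) @ np.ones((4, 5))).shape)   # (0, 5): matmul with a 0 dim
```
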
SnakeOnex
025fbf4e80 One hot in tensor.py (#3093)
* onehot in Tensor.py

* one_hot tests

* works for all shapes, not just 1

* pylint

* not a static method

* moved around, num_classes mandatory

* pylint

* pylint

* space & moving

* formatting

* moved tests
2024-01-12 13:31:18 -05:00
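
A minimal usage sketch for the new `Tensor.one_hot` (per this PR, `num_classes` is mandatory and any input shape works, gaining a trailing `num_classes` axis); the exact output dtype is not assumed here.

```python
from tinygrad.tensor import Tensor

labels = Tensor([[0, 2], [1, 2]])
onehot = labels.one_hot(3)     # shape (2, 2, 3)
print(onehot.numpy())          # e.g. labels[0,1] == 2 -> [0, 0, 1]
```
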
chenyu
7086d77db1 bugfix do not reset shapetracker of 0 size lazybuffer (#3096)
it might be coming from an expand, and resetting results in an incorrect stride. caught by the interpreted backend
2024-01-11 23:22:52 -05:00
Yixiang Gao
13e872b53f add multigpu support for llama attention (#3064)
* add llama attention test for multigpu

* test fails

* kv cache trying to shrink on sharded axis

* mask None works for scale dot product

* kv cache seems to be working but scale dot product breaks

* scaled dot product works, but the last linear layer failed

* running into the reshape case where it could be wrong for multigpu

* making sure it was the reshape

* adding contiguous doesn't solve

* need to shard more properly

* remove reshape test

* minor adjustment to scale dot product attention test

* weights are sharded wrong

* continue fix new weight sharding

* clean up

* fix attention when start_pos is 0

* remove print

* add TODOs for the best multigpu interface
2024-01-11 16:31:02 -08:00
chenyu
dcf7ecaaff update jit type annotation post lazy rewrite (#3091) 2024-01-11 15:49:30 -05:00
chenyu
0fe6904351 use device from LinearizerOptions in kernel search (#3090)
* use device from LinearizerOptions in kernel search

removed all Device.DEFAULT in search.py

* pass device string for parallel pickle

* device for interpreted backends in LinearizerOptions
2024-01-11 14:46:03 -05:00
chenyu
93e3f952aa use BEAM=2 instead of BEAM=4 in cuda ci gpt2 (#3089)
BEAM=2 is faster and takes less search time. investigating why BEAM2+BEAM4 is slower than BEAM2 alone
2024-01-11 13:21:06 -05:00
chenyu
f502c9b08f minor cleanup of View.reshape (#3088)
* minor cleanup of View.reshape

removed some redundant logic

* new_strides

* revert that
2024-01-11 13:05:54 -05:00
chenyu
f40299c3fe remove the third merging state in view._merge_dims (#3085)
no logic depends on state == 0 or state == 2
2024-01-11 12:07:43 -05:00
chenyu
7f9590d357 hotfix disable flaky mac runner wino cifar (#3087) 2024-01-11 11:57:05 -05:00
Yixiang Gao
adcc844755 cat works (#3086) 2024-01-11 08:25:20 -08:00
chenyu
cdeab9ad97 mem_estimate is always int, not symbolic (#3083)
* mem_estimate is always int, not symbolic

op_estimate can be symbolic, but mem_estimate is always int, thus we don't need to sym_infer it.
fixed some long lines too. update_stats is a very big function

* operator does not need underscores
2024-01-10 23:39:51 -05:00
Francis Lam
162fa61a32 wmma: clean up device specific tensor core code (#3081) 2024-01-10 21:03:09 -05:00
chenyu
d218d13885 minor cleanups of lazy.py (#3080) 2024-01-10 20:17:56 -05:00
chenyu
56dda33fc6 Tensor.expand resolves the new_shape before shortcut return (#3078)
similar to how reshape is done. also updated the shrink shortcut criteria to read similarly to pad
2024-01-10 14:29:15 -05:00
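
A sketch of what resolving new_shape before the shortcut means in practice, assuming -1 in expand keeps the existing axis (as in reshape); the exact semantics at this commit are an assumption.

```python
from tinygrad.tensor import Tensor

x = Tensor.ones(2, 1, 4)
print(x.expand(-1, 3, -1).shape)   # -1 resolved first, giving (2, 3, 4)
print(x.expand(2, 1, 4).shape)     # resolved shape equals self.shape -> shortcut no-op
```
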
Yixiang Gao
6842476ca6 better test demonstration (#3077)
* a better test demonstration

* fix white space
2024-01-10 10:50:52 -08:00
chenyu
507e0afba0 fix onehot and jit in examples/transformer (#3073)
trained to 0.999 in < 6 seconds on M1 Max consistently
2024-01-10 02:22:41 -05:00
chenyu
4342fccc83 filter_strides -> canonicalize_strides (#3072) 2024-01-10 01:06:48 -05:00
chenyu
023f5df0e9 simpler idxs_to_idx (#3071) 2024-01-10 00:30:10 -05:00
George Hotz
2495ca95c7 early gate the graph (#3070) 2024-01-09 20:17:13 -08:00
George Hotz
ff0d6e4551 jit autorealizes output (#3069) 2024-01-09 20:10:22 -08:00
George Hotz
ae83733431 hotfix: examples/transformer.py 2024-01-09 19:28:09 -08:00
chenyu
145718a90f unbind view or shapetracker also returns var_val (#3067)
* unbind view or shapetracker also returns var_val

4% faster for llama compile time

* one line less

* unbound_views
2024-01-09 21:45:05 -05:00
jxdv
ef3aa6d7fb update gh actions (#3033)
* update checkout actions

* update upload artifact

* update setup python

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-09 17:52:22 -08:00
George Hotz
3f80c1a098 speedtweaks3: apply shouldn't use the tensor constructor (#3065)
* speedtweaks3: apply shouldn't use the tensor constructor

* replace 0 size with CONST, not 0 in shape
2024-01-09 17:42:33 -08:00
George Hotz
0abe72b677 hotfix: use is for enum compare, a few more 2024-01-09 16:53:13 -08:00