Commit Graph

4147 Commits

Author SHA1 Message Date
chenyu
7c80b78be9 cleanup gpt2 build function (#3018) 2024-01-04 23:14:53 -05:00
chenyu
55e52abeba minor cleanup of matvec in hand_coded_optimizations (#3015)
remove noop isinstance check and fix long lines
2024-01-04 19:43:49 -05:00
chenyu
f88506e630 move gpt2/llama sampling inside the model call (#3013)
* move gpt2/llama sampling inside the model call

* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
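A minimal sketch of the idea in #3013, assuming tinygrad's `Tensor` with `argmax`, `softmax`, and `multinomial`; the helper name `sample` and the temperature handling are illustrative, not the exact code:

```python
from tinygrad import Tensor

def sample(logits: Tensor, temperature: float = 0.0) -> Tensor:
    # keeping this inside the model call means only the chosen token index
    # ever leaves the device, not the full logits
    if temperature == 0.0:
        return logits.argmax(-1)  # greedy path; argmax costs one extra kernel
    return (logits / temperature).softmax().multinomial()
```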
George Hotz
c2a044ed83 disk_read_speed example 2024-01-04 13:59:43 -08:00
Yixiang Gao
8a63f26a0f make LR scheduler work with multigpu (#3011)
* add a failing test for LR scheduler when using multigpu

* fix calculation order and unnecessary tensor created for float

* min_lr is no longer tensor
2024-01-04 12:10:56 -08:00
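A hedged sketch of the shape of this fix (names and values are placeholders): doing the schedule arithmetic in plain Python floats, with `min_lr` no longer a `Tensor`, avoids materializing a stray tensor on one device while training runs on several.

```python
base_lr, decay, min_lr = 0.01, 0.9, 1e-4  # placeholder hyperparameters
for epoch in range(3):
    # the whole calculation stays in Python floats; min_lr is a float, not a Tensor
    new_lr = max(base_lr * decay ** epoch, min_lr)
    print(epoch, new_lr)
```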
chenyu
8524493748 minor gpt2 cleanup (#3012) 2024-01-04 13:53:18 -05:00
chenyu
2b6670d2ea separate entry for HALF hlb_cifar10 in benchmark (#3010) 2024-01-04 13:24:10 -05:00
chenyu
5337211058 llvm CMPEQ 2024-01-04 13:12:22 -05:00
chenyu
b8c30eb358 no midcast MULACC for llvm 2024-01-04 13:12:22 -05:00
chenyu
91665ef143 rewrite MUL CAST SUM to CAST MULACC 2024-01-04 13:12:22 -05:00
chenyu
ab7dfd637b use float for acc dtype for half tensor sum
we previously only upcast uint and int, so half tensors accumulated in half.
change to accumulating in float for precision, but cast the result back to half to match torch/jax output dtype
2024-01-04 13:12:22 -05:00
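A small example of the behavior this commit describes, assuming current-era imports (`from tinygrad import Tensor, dtypes`; the exact module path may differ in this tree):

```python
from tinygrad import Tensor, dtypes

x = Tensor.ones(4096, dtype=dtypes.half)
s = x.sum()      # accumulates in float32 internally for precision
print(s.dtype)   # dtypes.half: result is cast back to match torch/jax
print(s.item())  # 4096.0; a half accumulator would get stuck at 2048,
                 # since 2048 + 1 rounds back to 2048 in fp16
```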
chenyu
6fa285b943 touchup onnx xor and not (#3008) 2024-01-04 02:02:42 -05:00
geohotstan
57817028bb removed redundant dtype hacks in onnx_ops (#2939)
* updated most dtype hacks in onnx_ops

* temporarily revert dequantizelinear change

* I think this is right...

* MORE FIXES WOOOO NEW DTYPE IS AWESOME

* ok

* oops missed a print

* half -> float32 for CI

* is npdtype

* some more

* fix if ordering

* more clean ups

* final cleanups

* casting to half not allowed

* k nvm

* revert ArgMax change

* only GPU

* llvm begone

* teeny tiny change

* fix: attempt to add cast tests

* try this

* fix dequantizelinear

* revert some stuff

* tests pass pls

* less lines in onnx_tests

* oops missed string tensor tests

* clean up

* try: revert default behavior changes

* fix: disabled Cast and Castlike tests

* docs: small changes

* fix: fixed isNaN op and enabled associated tests

* fix: forgot about float16

* done

* update disabled test

* gah missed another float16

* disable rest of failing tests

* rm extra line

* try...

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-01-04 01:45:24 -05:00
chenyu
9f39165188 correct (dtype, device) in test_dtype.is_dtype_supported (#3007)
corrected dtypes for TORCH and float64 support
2024-01-04 00:25:37 -05:00
chenyu
ae112c9dbe fix some long lines in tests (#3006)
* fix some long lines in tests

* better
2024-01-03 23:53:33 -05:00
George Hotz
7e191fbb86 hotfix: don't jitcache with 1 kernel. improvements to hip sniffer 2024-01-03 19:17:08 -08:00
George Hotz
bcc1aa21ac make disk simpler (#3002)
* make disk simpler

* upd ops_disk

* works on osx too

* revert ops_hip
2024-01-03 17:46:21 -08:00
George Hotz
9699c8c90b don't alloc for InterpretedASTRunner (#2999) 2024-01-03 17:05:53 -08:00
chenyu
bca0b95ee3 bump shapetracker simplify message to DEBUG >= 5 (#2998) 2024-01-03 20:00:36 -05:00
chenyu
74a30431b4 replace d[a] if a in d else b with d.get(a, b) (#2997) 2024-01-03 18:10:25 -05:00
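The refactor in #2997 is the standard dict idiom; for example:

```python
d = {"GPU": 1}
# before: d["CPU"] if "CPU" in d else 0
# after:
d.get("CPU", 0)  # one lookup, same result
```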
chenyu
74cc6fd3c2 remove AndNode.__floordiv__ special case (#2996)
* remove AndNode.__floordiv__

AndNode produces a Node whose min/max is bounded by [0, 1], so `//` on top of it is almost always 0.
we don't really use it either

* keep the test
2024-01-03 17:44:55 -05:00
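A tiny illustration of the bound argument, in plain Python standing in for the symbolic nodes:

```python
# an AndNode evaluates to 0 or 1, so floor-dividing it by any b >= 2 gives 0;
# no dedicated __floordiv__ special case is needed
for truth_value in (0, 1):
    assert truth_value // 2 == 0
```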
George Hotz
a0c7cb2564 hotfix: create weights dir in local tg checkout 2024-01-03 14:14:33 -08:00
George Hotz
fc36a7d669 tinygrad weights 2024-01-03 14:09:28 -08:00
chenyu
1ac4d27869 remove VariableOrNum from Node.substitute arg (#2995)
having NumNode in var_vals does not change the substitute output
2024-01-03 17:02:25 -05:00
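A sketch of why the narrowing is safe, assuming the `tinygrad.shape.symbolic` module of this era (`Variable`, `NumNode`, `Node.substitute`):

```python
from tinygrad.shape.symbolic import Variable, NumNode

a = Variable("a", 0, 10)
expr = a * 2 + 1
print(expr.substitute({a: NumNode(3)}))  # renders as 7
# a NumNode substitutes to itself, so a NumNode key in var_vals can never
# change the output; restricting keys to Variable loses nothing
```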
George Hotz
65dc3700b7 hip device is default on supported platforms (#2993) 2024-01-03 13:42:13 -08:00
George Hotz
77c98a1543 hotfix: remove weights directory 2024-01-03 13:40:39 -08:00
George Hotz
0be0f2f745 remove stable diffusion test on tinymac 2024-01-03 13:18:24 -08:00
George Hotz
a354ec9dad Revert "hotfix: HIP is the default device on HIP platforms"
This reverts commit b748b569f5.
2024-01-03 13:16:54 -08:00
George Hotz
b748b569f5 hotfix: HIP is the default device on HIP platforms 2024-01-03 13:13:52 -08:00
George Hotz
753a7ecc05 Hip driver (#2992)
* start hip driver

* fix hip llama

* make HIP default if we can

* don't change those
2024-01-03 12:53:47 -08:00
George Hotz
f290ca3924 hotfix: save lines in graph 2024-01-03 12:03:42 -08:00
Yixiang Gao
bc4b6e758b Merge pull request #2981 from g1y5x3/cifar_fp16
adjust div factor to avoid underflow for cifar in fp16
2024-01-03 11:15:42 -08:00
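The general technique behind this kind of fix, as a hedged sketch (the scale value and names are placeholders, not the PR's actual factor): scale the loss so fp16 gradients stay out of the underflow range, then undo the scale before the optimizer step.

```python
from tinygrad import Tensor

w = Tensor.ones(4, requires_grad=True)
loss_scale = 2048.0                            # placeholder value
loss = (w * Tensor.ones(4)).sum() * loss_scale  # scale up before backward
loss.backward()
w.grad = w.grad / loss_scale                   # undo the scale before stepping
```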
George Hotz
d7d5a487ad hotfix: all device canonicalize should be done in Tensor 2024-01-03 10:48:04 -08:00
Yixiang Gao
ea3bc2f509 remove wino benchmark for now 2024-01-03 10:46:43 -08:00
Yixiang Gao
5663dd46b6 Merge branch 'master' of github.com:tinygrad/tinygrad into cifar_fp16 2024-01-03 10:11:46 -08:00
chenyu
81b97cd2c6 canonicalize device in LazyBuffer constructor (#2991)
fixed the multitensor +1 then sum bug
2024-01-03 12:55:25 -05:00
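A hedged illustration of the failure mode (`canonicalize` here is a hypothetical stand-in for the real helper): two spellings of the same device must compare equal before any multitensor bookkeeping keys on them.

```python
def canonicalize(device: str) -> str:
    # hypothetical: strip the redundant ':0' so 'GPU' and 'GPU:0' unify
    return device[:-2] if device.endswith(":0") else device

assert canonicalize("GPU:0") == canonicalize("GPU")  # same key after the fix
```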
chenyu
db525cf8c2 multitensor failed test case with +1 then sum on DEVICE:0 (#2990) 2024-01-03 12:17:11 -05:00
Yixiang Gao
7f1802cd50 update benchmark 2024-01-03 09:09:34 -08:00
George Hotz
5dbaaa7061 hotfix: make multitensor shard contiguous 2024-01-03 08:48:30 -08:00
chenyu
590268fa03 out_tokens -> grouped in linearizer (#2989)
no more token now
2024-01-03 11:45:28 -05:00
Yixiang Gao
8e1fd6ae9d test works 2024-01-03 07:22:01 -08:00
Yixiang Gao
4f89f8b73a make sure the old hyp breaks the test 2024-01-03 07:13:54 -08:00
Yixiang Gao
84eb6dd32a skip GPU because opencl on intel can't compile half 2024-01-03 07:07:21 -08:00
Yixiang Gao
73879b50ad only need to check the min_lr for the nan bug 2024-01-03 07:00:50 -08:00
Yixiang Gao
99f8740c60 running half in CI CPU is slow 2024-01-02 18:44:35 -08:00
Yixiang Gao
781690fd99 how long it takes on CI CPU without the lr scheduler 2024-01-02 18:33:48 -08:00
Yixiang Gao
dd00bcb9c0 fix whitespace 2024-01-02 18:16:33 -08:00
Yixiang Gao
841487cad9 add half test with using hyp from benchmarks 2024-01-02 18:14:30 -08:00
George Hotz
f494b9d463 simple multitensor API (#2903)
* simple multitensor API

* test multitensor

* mt work

* new api

* copies

* all but data parallel

* allreduce there

* works, but axis sharded

* fix all mt tests

* features/multi

* work

* backprop

* fix tests

* tests passing

* mt progress

* cleanups

* less lines

* tensor cleanup

* save more lines

* mypy passes

* fix tests

* skip for cuda too

* bump download cache
2024-01-02 17:49:44 -08:00
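A minimal sketch of how the API from #2903 might be used (`shard` per the PR; the device names are placeholders and two available devices are assumed):

```python
from tinygrad import Tensor

t = Tensor.ones(256, 256)
# split along axis 0 across two devices; each shard lives on its own device
st = t.shard(("GPU:0", "GPU:1"), axis=0)
# elementwise ops run per-shard; the sum triggers a cross-device reduce
print((st + 1).sum().item())
```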
George Hotz
5522ba234b simplify image functions (#2987)
* simplify image functions

* line in tensor
2024-01-02 17:35:08 -08:00