George Hotz
8cbcd1b342
Remove webgpu, back to 5k lines ( #3040 )
* remove webgpu
* max 5000 lines
2024-01-08 09:10:07 -08:00
George Hotz
cf2eea961c
more beautiful_cartpole with exposed hparams
2024-01-07 17:41:09 -08:00
Yixiang Gao
44618427f1
add bf16 type_map for both cuda and hip ( #3036 )
...
* add typemap bfloat16 for cuda and hip
* add render_dtype
* add def in CStyleLanguage
* fix def
* save one line
* add header file for cuda bf16
2024-01-07 14:26:55 -08:00
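The bullets above name a bf16 type_map entry and a header file for CUDA; a minimal sketch of what such a mapping could look like, assuming the standard vendor type names (the actual CStyleLanguage field names are not shown in this log):

```python
from tinygrad import dtypes  # import path may differ by version

# hypothetical dicts, not the tinygrad source: CUDA's bf16 type needs '#include <cuda_bf16.h>'
# in the kernel prelude, while HIP ships a built-in hip_bfloat16 struct.
CUDA_TYPE_MAP = {dtypes.bfloat16: "__nv_bfloat16"}
HIP_TYPE_MAP  = {dtypes.bfloat16: "hip_bfloat16"}
```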
chenyu
ef5f545fd8
add more Tensor.clip test cases ( #3034 )
* add more Tensor.clip test cases
add cases for same low/high, both negative, etc.
* case min > max
2024-01-07 13:08:59 -05:00
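The cases named in the commit body, sketched as a standalone comparison against numpy (the real tests live in test_ops; the values here are illustrative):

```python
import numpy as np
from tinygrad import Tensor

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0], dtype=np.float32)
t = Tensor(x)
for lo, hi in [(-1, -1),   # same low/high
               (-3, -1),   # both bounds negative
               (1, -1)]:   # min > max (the upper bound is applied last, as in numpy)
  np.testing.assert_allclose(t.clip(lo, hi).numpy(), np.clip(x, lo, hi))
```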
chenyu
c9371f0d31
hotfix llama conversation mode ( #3031 )
without contiguous on the keys and values, it runs but the cache update is incorrect
2024-01-06 16:57:07 -05:00
chenyu
fa707c81e5
move beautiful cartpole action sampling inside jit ( #3028 )
tested by getting 3 full scores in a row
2024-01-06 00:39:55 -05:00
George Hotz
ebb81e8f11
hotfix: st.size() -> st.size in llama
2024-01-05 20:18:52 -08:00
George Hotz
a8ba1ac08f
track size in shapetracker ( #3026 )
* track size in shapetracker
* shapetracker adapter
* size is an int
* create Buffer with st.size
* only compare the views for the jit
* fix webgpu
2024-01-05 20:15:53 -08:00
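A sketch of the call-site change implied by these bullets and by the st.size() -> st.size hotfix above (the Buffer signature is inferred from the bullet "create Buffer with st.size", not copied from the diff):

```python
from tinygrad.shape.shapetracker import ShapeTracker

# as of this PR the element count is tracked on the ShapeTracker itself as a plain int,
# so call sites read st.size where they previously called st.size()
st = ShapeTracker.from_shape((4, 8))
assert st.size == 32
# hypothetical allocation site:  buf = Buffer(device, st.size, dtype)
```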
chenyu
138c17c094
enable argmax tests for METAL/WEBGPU in CI ( #3027 )
not sure why it was skipped, but it works now in CI
2024-01-05 21:43:00 -05:00
George Hotz
2a2d3233d2
add test that the compiler isn't used ( #3025 )
* add test that the compiler isn't used
* one print_tree
* improve speed with st size cache
* switch to gpt-2
2024-01-05 17:24:01 -08:00
chenyu
520406cf3a
add Tensor.unflatten and Tensor.flatten(end_dim) ( #3023 )
simplified the cases of splitting a dim or merging dims in a prefix
2024-01-05 17:55:29 -05:00
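A usage sketch of the two additions, assuming the signatures mirror PyTorch's flatten(start_dim, end_dim) and unflatten(dim, sizes), as the names suggest:

```python
from tinygrad import Tensor

t = Tensor.ones(2, 3, 4, 5)
print(t.flatten(start_dim=1, end_dim=2).shape)          # (2, 12, 5): merge dims 1..2
print(t.flatten(end_dim=1).unflatten(0, (2, 3)).shape)  # (2, 3, 4, 5): split dim 0 back apart
```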
George Hotz
f432ec9c33
Bitcast hip fix + fix mixtral ( #3022 )
* fix bitcast in hip
* wrong dtype for precast, double COPY
2024-01-05 14:51:25 -08:00
chenyu
eda43767de
use Scalar = Union[float, int, bool] in tensor.py ( #3021 )
unify the type spec for Tensor creation functions and broadcasted elementwise ops that take a Python scalar
2024-01-05 13:56:26 -05:00
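The alias from the commit title, with hypothetical signatures showing the kind of place it applies (not the actual tensor.py annotations):

```python
from typing import Union

Scalar = Union[float, int, bool]

# a creation function and a broadcasted elementwise op, both accepting a Python scalar
def full(shape: tuple, fill_value: Scalar): ...
def add(x: "Tensor", y: Union["Tensor", Scalar]): ...
```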
George Hotz
60abc62a3f
fast hip read ( #3014 )
* fast hip read
* hip read faster
* fix tests
* to_mv
* simplify
* bump to 6k lines
2024-01-05 10:33:13 -08:00
chenyu
4465ef28c5
add test_softmax to test_ops ( #3020 )
* add test_softmax to test_ops
somehow it was not tested
* too many buffers in softmax backward for WEBGPU
2024-01-05 11:19:49 -05:00
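A standalone sketch of the kind of check test_ops performs here (the real test goes through the helper_test_op machinery and also checks gradients; the shape and tolerance are illustrative):

```python
import numpy as np, torch
from tinygrad import Tensor

x = np.random.randn(45, 65).astype(np.float32)
np.testing.assert_allclose(Tensor(x).softmax(axis=-1).numpy(),
                           torch.from_numpy(x).softmax(dim=-1).numpy(), atol=1e-6)
```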
chenyu
7c80b78be9
cleanup gpt2 build function ( #3018 )
2024-01-04 23:14:53 -05:00
chenyu
55e52abeba
minor cleanup of matvec in hand_coded_optimizations ( #3015 )
remove a no-op isinstance check and fix long lines
2024-01-04 19:43:49 -05:00
chenyu
f88506e630
move gpt2/llama sampling inside the model call ( #3013 )
* move gpt2/llama sampling inside the model call
* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
George Hotz
c2a044ed83
disk_read_speed example
2024-01-04 13:59:43 -08:00
Yixiang Gao
8a63f26a0f
make LR scheduler work with multigpu ( #3011 )
* add a failing test for LR scheduler when using multigpu
* fix calculation order and unnecessary tensor created for float
* min_lr is no longer tensor
2024-01-04 12:10:56 -08:00
chenyu
8524493748
minor gpt2 cleanup ( #3012 )
2024-01-04 13:53:18 -05:00
chenyu
2b6670d2ea
separate entry for HALF hlb_cifar10 in benchmark ( #3010 )
2024-01-04 13:24:10 -05:00
chenyu
5337211058
llvm CMPEQ
2024-01-04 13:12:22 -05:00
chenyu
b8c30eb358
no midcast MULACC for llvm
2024-01-04 13:12:22 -05:00
chenyu
91665ef143
rewrite MUL CAST SUM to CAST MULACC
2024-01-04 13:12:22 -05:00
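The rewrite in the commit above reads as moving the cast from the product onto the inputs, so the fused multiply-accumulate runs in the wider accumulator dtype. A plain-numpy sketch of why that cast placement matters for half inputs (illustrative values, not tinygrad code):

```python
import numpy as np

a, b = np.float16(1e-4), np.float16(1e-4)
print(np.float32(a * b))              # multiply in half, then cast: the product underflows to 0.0
print(np.float32(a) * np.float32(b))  # cast first, then multiply in float: ~1e-8
```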
chenyu
ab7dfd637b
use float for acc dtype for half tensor sum
we previously only upcast uint and int, so half tensors accumulated in half.
change to accumulating in float for precision, but cast the result back to half to match the torch/jax output dtype
2024-01-04 13:12:22 -05:00
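A behavioral sketch of the change (not the implementation): the accumulator is float, but the returned dtype stays half.

```python
from tinygrad import Tensor, dtypes

out = Tensor.ones(4096, dtype=dtypes.half).sum()
print(out.dtype)    # dtypes.half: the output dtype still matches torch/jax...
print(out.numpy())  # ...and is 4096.0; a half accumulator would have stalled near 2048,
                    # since above 2048 float16 can no longer represent n + 1 exactly
```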
chenyu
6fa285b943
touchup onnx xor and not ( #3008 )
2024-01-04 02:02:42 -05:00
geohotstan
57817028bb
removed redundant dtype hacks in onnx_ops ( #2939 )
* updated most dtype hacks in onnx_ops
* temporarily revert dequantizelinear change
* I think this is right...
* MORE FIXES WOOOO NEW DTYPE IS AWESOME
* ok
* oops missed a print
* half -> float32 for CI
* is npdtype
* some more
* fix if ordering
* more clean ups
* final cleanups
* casting to half not allowed
* k nvm
* revert ArgMax change
* only GPU
* llvm begone
* teeny tiny change
* fix: attempt to add cast tests
* try this
* fix dequantizelinear
* revert some stuff
* tests pass pls
* less lines in onnx_tests
* oops missed string tensor tests
* clean up
* try: revert default behavior changes
* fix: disabled Cast and Castlike tests
* docs: small changes
* fix: fixed isNaN op and enabled associated tests
* fix: forgot about float16
* done
* update disabled test
* gah missed another float16
* disable rest of failing tests
* rm extra line
* try...
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-01-04 01:45:24 -05:00
chenyu
9f39165188
correct (dtype, device) in test_dtype.is_dtype_supported ( #3007 )
corrected dtypes for TORCH and float64 support
2024-01-04 00:25:37 -05:00
chenyu
ae112c9dbe
fix some long lines in tests ( #3006 )
* fix some long lines in tests
* better
2024-01-03 23:53:33 -05:00
George Hotz
7e191fbb86
hotfix: don't jitcache with 1 kernel. improvements to hip sniffer
2024-01-03 19:17:08 -08:00
George Hotz
bcc1aa21ac
make disk simpler ( #3002 )
* make disk simpler
* upd ops_disk
* works on osx too
* revert ops_hip
2024-01-03 17:46:21 -08:00
George Hotz
9699c8c90b
don't alloc for InterpretedASTRunner ( #2999 )
2024-01-03 17:05:53 -08:00
chenyu
bca0b95ee3
bump shapetracker simplify message to DEBUG >= 5 ( #2998 )
2024-01-03 20:00:36 -05:00
chenyu
74a30431b4
replace d[a] if a in d else b with d.get(a, b) ( #2997 )
2024-01-03 18:10:25 -05:00
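The refactor from the title, in isolation:

```python
d = {"a": 1}
v = d["a"] if "a" in d else 0   # before: two lookups when the key is present
v = d.get("a", 0)               # after: one lookup, same result
```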
chenyu
74cc6fd3c2
remove AndNode.__floordiv__ special case ( #2996 )
* remove AndNode.__floordiv__
AndNode produces a Node whose min/max is bounded by [0, 1], so `//` on top of that is almost always 0.
we don't really use it either
* keep the test
2024-01-03 17:44:55 -05:00
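The bound from the commit body, spelled out on plain ints (an illustration of the reasoning, not symbolic-node code):

```python
# an AndNode evaluates to some n with 0 <= n <= 1, so floordiv by any b >= 2 gives 0
for n in (0, 1):
  for b in (2, 3, 100):
    assert n // b == 0
```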
George Hotz
a0c7cb2564
hotfix: create weights dir in local tg checkout
2024-01-03 14:14:33 -08:00
George Hotz
fc36a7d669
tinygrad weights
2024-01-03 14:09:28 -08:00
chenyu
1ac4d27869
remove VariableOrNum from Node.substitute arg ( #2995 )
having NumNode in var_vals does not change the substitute output
2024-01-03 17:02:25 -05:00
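A sketch of the point in the commit body using the symbolic API (import path as of early 2024; the equality check is illustrative):

```python
from tinygrad.shape.symbolic import Variable, NumNode

x = Variable("x", 0, 10)
expr = x * 2 + 4
# substitute replaces Variables with values; a NumNode is already a literal,
# so a NumNode key in var_vals could never change the result
assert expr.substitute({x: NumNode(3)}) == NumNode(10)
```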
George Hotz
65dc3700b7
hip device is default on supported platforms ( #2993 )
2024-01-03 13:42:13 -08:00
George Hotz
77c98a1543
hotfix: remove weights directory
2024-01-03 13:40:39 -08:00
George Hotz
0be0f2f745
remove stable diffusion test on tinymac
2024-01-03 13:18:24 -08:00
George Hotz
a354ec9dad
Revert "hotfix: HIP is the default device on HIP platforms"
This reverts commit b748b569f5.
2024-01-03 13:16:54 -08:00
George Hotz
b748b569f5
hotfix: HIP is the default device on HIP platforms
2024-01-03 13:13:52 -08:00
George Hotz
753a7ecc05
Hip driver ( #2992 )
* start hip driver
* fix hip llama
* make HIP default if we can
* don't change those
2024-01-03 12:53:47 -08:00
George Hotz
f290ca3924
hotfix: save lines in graph
2024-01-03 12:03:42 -08:00
Yixiang Gao
bc4b6e758b
Merge pull request #2981 from g1y5x3/cifar_fp16
adjust div factor to avoid underflow for cifar in fp16
2024-01-03 11:15:42 -08:00
George Hotz
d7d5a487ad
hotfix: all device canonicalize should be done in Tensor
2024-01-03 10:48:04 -08:00
Yixiang Gao
ea3bc2f509
remove wino benchmark for now
2024-01-03 10:46:43 -08:00
Yixiang Gao
5663dd46b6
Merge branch 'master' of github.com:tinygrad/tinygrad into cifar_fp16
2024-01-03 10:11:46 -08:00