Commit Graph

1843 Commits

Author SHA1 Message Date
George Hotz
cb7c22beeb fix mypy 2023-05-06 19:18:54 +00:00
George Hotz
5190037cbc rocm: disassembler for shader 2023-05-06 19:07:52 +00:00
George Hotz
7fbf96b992 jit: TODO, use abstractions 2023-05-05 22:51:30 -07:00
George Hotz
0cd3feb452 jit oops. should add that to commit tests 2023-05-05 22:01:13 -07:00
George Hotz
5b2ae262db assertions for jit 2023-05-05 21:56:32 -07:00
George Hotz
42256c0d9d rocm sniffer dumps code 2023-05-05 18:36:53 +00:00
George Hotz
81aa3e546b exclude GPU on tiny (#766) 2023-05-05 10:07:23 -07:00
George Hotz
f2a964f447 nocopy (#764) 2023-05-05 09:32:06 -07:00
George Hotz
466ffeb04f fast cifar on AMD 2023-05-05 02:10:50 +00:00
George Hotz
3a2011ab2d rocm sniffer 2023-05-04 22:22:39 +00:00
George Hotz
a55c4f5000 better rocm build scripts 2023-05-04 09:14:05 +00:00
George Hotz
987b1aaf96 rocm build scripts 2023-05-04 08:45:23 +00:00
George Hotz
f28df9900f multidevice works (#763)
* basic multigpu working

* better multigpu test

* upper

* touchups

* cl sync
2023-05-04 01:04:58 -07:00
George Hotz
4f6d674ec0 use CPU tests in pre-commit 2023-05-03 19:46:16 +00:00
George Hotz
ed33a89d52 no werror in archprobe 2023-05-03 19:34:17 +00:00
George Hotz
7ecf4dff68 multi cl_queue (#762)
* multi cl_queue

* only platforms 1

* gpus first, then cpus

* put device on underlying buffer

* cl_queue array
2023-05-03 12:15:28 -07:00
Rylan Justice
7757f5fed2 Fixed package description (#761)
* Updated LICENSE year

* Fixed package description
2023-05-03 10:21:05 -07:00
George Hotz
3b933b0a2f rocm setup script 2023-05-03 16:01:17 +00:00
Rylan Justice
9628a3f190 Updated LICENSE year (#760) 2023-05-01 15:35:23 -07:00
Joqsan
0b9d4126d0 Add Tensor.stack() and Tensor.repeat() (...trying to make einops work with tinygrad) (#758)
* add stack() and repeat() methods

* make stack a static method
2023-05-01 09:37:46 -07:00
George Hotz
59d0d168cd FLOAT16 off works 2023-04-19 15:34:56 -07:00
George Hotz
3d15769a8f 50 TFLOPS cuda matmul 2023-04-19 14:38:24 -07:00
George Hotz
03b38864db fix batchnorm at training (#753)
* e2e testing

* min failure

* no affine on bn, still fails

* why did i think i could detach that?

* allow more kernels for bn

* some test issue i don't understand
2023-04-19 08:01:04 -07:00
George Hotz
1aa0648d6a fix path linter issue 2023-04-18 19:17:41 -07:00
George Hotz
cbe2564b7b oops, no hip yet 2023-04-18 19:10:36 -07:00
George Hotz
e4db0c820f hlb_cifar10 init from torch weights 2023-04-18 19:09:13 -07:00
George Hotz
a6b9733256 GB/s can be higher 2023-04-18 17:51:03 -07:00
George Hotz
9fb3f9ace3 Revert "move t.grad realize on SGD"
This reverts commit ccdc0290d6.
2023-04-18 17:50:08 -07:00
George Hotz
e93e04ed6e Revert "huh...this is faster"
This reverts commit aedd4685fa.
2023-04-18 17:50:07 -07:00
George Hotz
aedd4685fa huh...this is faster 2023-04-18 17:36:31 -07:00
George Hotz
dbc99c243b why did that test break? 2023-04-18 17:08:38 -07:00
George Hotz
ccdc0290d6 move t.grad realize on SGD 2023-04-18 16:47:51 -07:00
George Hotz
8b7ecd63bb Remove Zeroview (#748)
* no zeroview start

* closer

* stride mask

* st tests pass, delete ZeroView

* byebye zv

* close to working

* not contiguous with mask

* subtract, don't add

* mask on view

* ugh, that shouldn't have been in there

* shape merge

* bugfixes

* fuzzer + 4 fuzzer failures

* fuzzer for symbolic

* more fuzzing and nothing

* that fuzzer doesn't hit either

* fixes padding...ugh

* no more offsets

* working

* rewrite load and store

* all checks

* fix idxs

* progress

* bugfix

* float4_axis

* works

* cleanups

* complex valids_okay
2023-04-17 08:21:46 -07:00
Jan Henrik Høiland
4e17d27d09 Fix cuda errors when running llama example (#749) 2023-04-16 13:52:10 -07:00
George Hotz
0b5a0b9ba4 winograd comment 2023-04-16 03:36:51 -07:00
George Hotz
8b777af571 metal_conv gets over 10.4 TFLOPS... 2023-04-15 03:31:22 -07:00
George Hotz
d66e682205 metal matmul from tcores branch 2023-04-14 23:29:29 -07:00
George Hotz
732884653c osx in hlb_cifar10_torch 2023-04-14 13:12:08 -07:00
George Hotz
17e37157b6 fix backward convs (#746)
* fix backward convs

* no pushing in reduce

* late cout

* test_fold_4convs_sgd
2023-04-14 10:42:11 -07:00
George Hotz
f7f416d6f4 back to 6 for test_fold_conv_sgd 2023-04-14 07:34:00 -07:00
George Hotz
133521e730 relu UnaryOp is back 2023-04-14 07:12:53 -07:00
George Hotz
584ee6f616 don't graph consts 2023-04-14 03:32:20 -07:00
George Hotz
9a39ebefde hlb_cifar10_torch gets 80% 2023-04-14 02:47:03 -07:00
worldwalker2000
552a048a33 make maximum split the grad like torch when equal (#738)
* make maximum split grad

* added test for maximum split grad when equal

* minor expr simplification

* (2-eq)/2 only once

* update test bc one more sum output child stays
2023-04-14 00:17:46 -07:00
Jacky Lee
06ed958abd Fix train_resnet example (#744)
* Fix ResNet example

* Scientific notation
2023-04-12 13:48:39 +05:30
Sohaib
70b9072663 add Pad onnx operator and rework _padding (#740) 2023-04-06 17:07:36 +05:30
jintuzhang
8e40ff8c8d Do not specify errors when trying to load devices. (#741) 2023-04-06 17:05:36 +05:30
Jacky Lee
7a45b989a1 Device: make GPU default and METAL/CUDA if possible (#732)
* Make GPU the default device

* Compile EfficientNet with CPU

* don't print device

* use METAL and CUDA if possible

* Revert some changes to workflow

* Fix import error when checking device availability

* device lookup is now optional

* hopefully fix linter and tests

* fix workflow

* Skip device if not available

* don't change default if CPU=1

* simplify device selection

* Default to CPU if no GPU

* don't print device name...

* No need to change default in llama

* run github workflow

* Fix logic to select default

* pass if an error occurs

* use separate function for try except
2023-04-04 09:41:52 +05:30
George Hotz
94e2c49c35 test_cacheline_size that works in both places 2023-03-30 06:47:20 +04:00
George Hotz
b05c2828f7 better cacheline test 2023-03-30 06:08:54 +04:00