George Hotz
81a11d891d
ops rdna
2023-05-21 11:45:38 -07:00
George Hotz
ed038ba129
Contract float4 ALU operations (#780)
* wrong expand
* tests passing
* pass lint
2023-05-16 19:03:49 -07:00
George Hotz
90fff82c8a
Rdna (#776)
* assembler maybe
* custom asm
* rdna3 on quiet
* trigger crashes
* fixed notes
* non-fatal rdna2 crash
* Crash4
* improve rdna sniffer
* comments
* improve sniffer
* asm
* 131 TFLOPS RDNA3
* opt simple matmul
* todos
2023-05-16 05:33:57 -07:00
George Hotz
89b8b39d9c
fix mypy
2023-05-13 21:25:36 -07:00
George Hotz
e0b2035023
fast imagenet eval, gets 76.14% across the set
2023-05-13 21:18:31 -07:00
Jacky Lee
c552f6f92b
Inference test: add tests for ResNet50 (#773)
* Add ResNet inference test and cannon
* Test with ResNet50
* test_car works with resnet fix
2023-05-13 21:18:15 -07:00
Rabia Eda Yılmaz
e5b4b36cba
add std to tensor.py (#767)
* add std
* delete comment
* edit: one liner std, add: test
* adjust
* fix: shape mismatch
* set unbiased to False
* added unbiased option
* fix unbiased option in test and clean code
* better
* generalize axis
* holly coffee molly
* generalize axes without unbiased opt.
* hopefully done
* complete unbiased true for axes
* Update test_ops.py
* fixed
* std completed without bessels correction
* fix comment
* ups
2023-05-13 12:20:44 -07:00
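The commit thread above (#767) walks through adding a `std` method to `tensor.py`, including an `unbiased` option and the question of Bessel's correction. As a plain-Python sketch of the semantics being discussed (not tinygrad's actual implementation; the `unbiased` flag name is taken from the commit messages and torch convention):

```python
import math

def std(xs, unbiased=True):
    # Sample standard deviation. Bessel's correction divides by (n - 1)
    # when unbiased=True, matching the option debated in the commits;
    # unbiased=False divides by n (population std).
    n = len(xs)
    mean = sum(xs) / n
    denom = (n - 1) if unbiased else n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / denom)

print(std([1.0, 2.0, 3.0, 4.0]))                  # unbiased (Bessel)
print(std([1.0, 2.0, 3.0, 4.0], unbiased=False))  # biased
```

The back-and-forth about "generalize axis" in the commits refers to computing this per-axis on an n-d tensor rather than over a flat list, which the sketch above does not attempt.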
George Hotz
b705510d5c
getting 77% on imagenet eval
2023-05-13 07:46:27 -07:00
George Hotz
810f03dafa
conv3d + unet3d (#772)
* conv3d, needs test
* test passes, padding wrong on unet
* unet3d
* no conv3d on images
2023-05-12 13:54:07 -07:00
George Hotz
46d419060b
start on mlperf models
2023-05-10 16:30:49 -07:00
Jacky Lee
d13629cb26
ResNet: match implementation with Nvidia and PyTorch (#770)
* Match ResNet implementation with pytorch and nvidia
* Reduce number of Epochs
2023-05-10 09:01:22 -07:00
Jacky Lee
b80cf9220c
Statistics test: check if distributions match torch (#769)
* Check if tensor values match torch
* Clean up randomness tests and remove dependency
* Remove kaiming uniform test
2023-05-07 21:43:23 -07:00
George Hotz
cb7c22beeb
fix mypy
2023-05-06 19:18:54 +00:00
George Hotz
5190037cbc
rocm: disassembler for shader
2023-05-06 19:07:52 +00:00
George Hotz
7fbf96b992
jit: TODO, use abstractions
2023-05-05 22:51:30 -07:00
George Hotz
0cd3feb452
jit oops. should add that to commit tests
2023-05-05 22:01:13 -07:00
George Hotz
5b2ae262db
assertions for jit
2023-05-05 21:56:32 -07:00
George Hotz
42256c0d9d
rocm sniffer dumps code
2023-05-05 18:36:53 +00:00
George Hotz
81aa3e546b
exclude GPU on tiny (#766)
2023-05-05 10:07:23 -07:00
George Hotz
f2a964f447
nocopy (#764)
2023-05-05 09:32:06 -07:00
George Hotz
466ffeb04f
fast cifar on AMD
2023-05-05 02:10:50 +00:00
George Hotz
3a2011ab2d
rocm sniffer
2023-05-04 22:22:39 +00:00
George Hotz
a55c4f5000
better rocm build scripts
2023-05-04 09:14:05 +00:00
George Hotz
987b1aaf96
rocm build scripts
2023-05-04 08:45:23 +00:00
George Hotz
f28df9900f
multidevice works (#763)
* basic multigpu working
* better multigpu test
* upper
* touchups
* cl sync
2023-05-04 01:04:58 -07:00
George Hotz
4f6d674ec0
use CPU tests in pre-commit
2023-05-03 19:46:16 +00:00
George Hotz
ed33a89d52
no werror in archprobe
2023-05-03 19:34:17 +00:00
George Hotz
7ecf4dff68
multi cl_queue (#762)
* multi cl_queue
* only platforms 1
* gpus first, then cpus
* put device on underlying buffer
* cl_queue array
2023-05-03 12:15:28 -07:00
Rylan Justice
7757f5fed2
Fixed package description (#761)
* Updated LICENSE year
* Fixed package description
2023-05-03 10:21:05 -07:00
George Hotz
3b933b0a2f
rocm setup script
2023-05-03 16:01:17 +00:00
Rylan Justice
9628a3f190
Updated LICENSE year (#760)
2023-05-01 15:35:23 -07:00
Joqsan
0b9d4126d0
Add Tensor.stack() and Tensor.repeat() (...trying to make einops work with tinygrad) (#758)
* add stack() and repeat() methods
* make stack a static method
2023-05-01 09:37:46 -07:00
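Commit #758 above adds `Tensor.stack()` and `Tensor.repeat()`. A minimal plain-Python sketch of the intended semantics (assuming torch-style behavior, as the einops motivation suggests; these helpers are illustrative, not tinygrad's actual code):

```python
def stack(tensors):
    # Join equally-shaped 1-D "tensors" (plain lists here) along a new
    # leading axis: k lists of length n become a k x n nested list.
    assert all(len(t) == len(tensors[0]) for t in tensors)
    return [list(t) for t in tensors]

def repeat(t, times):
    # Tile a 1-D list end-to-end, torch-style repeat
    # (not numpy's element-wise repeat).
    return list(t) * times

print(stack([[1, 2], [3, 4]]))  # [[1, 2], [3, 4]]  -- shape (2, 2)
print(repeat([1, 2], 3))        # [1, 2, 1, 2, 1, 2]
```

The "make stack a static method" bullet matches this shape: stack takes a collection of tensors rather than operating on a single receiver.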
George Hotz
59d0d168cd
FLOAT16 off works
2023-04-19 15:34:56 -07:00
George Hotz
3d15769a8f
50 TFLOPS cuda matmul
2023-04-19 14:38:24 -07:00
George Hotz
03b38864db
fix batchnorm at training (#753)
* e2e testing
* min failure
* no affine on bn, still fails
* why did i think i could detach that?
* allow more kernels for bn
* some test issue i don't understand
2023-04-19 08:01:04 -07:00
George Hotz
1aa0648d6a
fix path linter issue
2023-04-18 19:17:41 -07:00
George Hotz
cbe2564b7b
oops, no hip yet
2023-04-18 19:10:36 -07:00
George Hotz
e4db0c820f
hlb_cifar10 init from torch weights
2023-04-18 19:09:13 -07:00
George Hotz
a6b9733256
GB/s can be higher
2023-04-18 17:51:03 -07:00
George Hotz
9fb3f9ace3
Revert "move t.grad realize on SGD"
This reverts commit ccdc0290d6.
2023-04-18 17:50:08 -07:00
George Hotz
e93e04ed6e
Revert "huh...this is faster"
This reverts commit aedd4685fa.
2023-04-18 17:50:07 -07:00
George Hotz
aedd4685fa
huh...this is faster
2023-04-18 17:36:31 -07:00
George Hotz
dbc99c243b
why did that test break?
2023-04-18 17:08:38 -07:00
George Hotz
ccdc0290d6
move t.grad realize on SGD
2023-04-18 16:47:51 -07:00
George Hotz
8b7ecd63bb
Remove Zeroview (#748)
* no zeroview start
* closer
* stride mask
* st tests pass, delete ZeroView
* byebye zv
* close to working
* not contiguous with mask
* subtract, don't add
* mask on view
* ugh, that shouldn't have been in there
* shape merge
* bugfixes
* fuzzer + 4 fuzzer failures
* fuzzer for symbolic
* more fuzzing and nothing
* that fuzzer doesn't hit either
* fixes padding...ugh
* no more offsets
* working
* rewrite load and store
* all checks
* fix idxs
* progress
* bugfix
* float4_axis
* works
* cleanups
* complex valids_okay
2023-04-17 08:21:46 -07:00
Jan Henrik Høiland
4e17d27d09
Fix cuda errors when running llama example (#749)
2023-04-16 13:52:10 -07:00
George Hotz
0b5a0b9ba4
winograd comment
2023-04-16 03:36:51 -07:00
George Hotz
8b777af571
metal_conv gets over 10.4 TFLOPS...
2023-04-15 03:31:22 -07:00
George Hotz
d66e682205
metal matmul from tcores branch
2023-04-14 23:29:29 -07:00
George Hotz
732884653c
osx in hlb_cifar10_torch
2023-04-14 13:12:08 -07:00