George Hotz
fbf17f0031
intel benchmark matmul gets 60 TFLOPS?
2023-06-04 17:01:50 +00:00
Steven Anderson
657e642e3a
Fixed test suite for Clip (#912)
* Fixed test suite for Clip
* fixed issue with clip when taking large negative numbers as min
* Remove typings
2023-06-04 09:01:01 -07:00
George Hotz
afd0be8a9c
intel example
2023-06-04 06:43:09 +00:00
George Hotz
ed1963b899
Fast DiskTensor to other Tensor (#916)
* make disktensors fast
* loading
* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz
791530045d
Refactor LoadOps (#910)
* test
* work
* upd test
* loadops
* cleanups
* real ones
* remove LazyNumpyArray
* fix assign test
* remove range
* np.require
* llama uses arange kernels
* no caching consts
* fix enet
* torch load support
* tests cleanup
* fix shufflenet
* fix image
* fix torch_load test
2023-06-03 09:40:43 -07:00
Steven Anderson
513aeb2f66
Fixed all ConstantOfShape test suites (#907)
2023-06-02 11:26:40 -07:00
Steven Anderson
301f7b54c6
ConstantOfShape ONNX test fixed. (#890)
* ConstantOfShape ONNX test fixed.
* removed redundant if statement
* value is optional and should default to a float32 tensor with value of 0
* fixed: default parameters are created at function definition, bad for mutable objects.
2023-06-02 07:34:25 -07:00
kposborne2
ae83e9844c
add output_padding to transposed conv (#875)
2023-06-01 00:03:22 -07:00
Friedrich Carl Eichenroth
740304ef9d
Small Onnx Parser Improvements (#885)
* wip
* rename onnx_version to onnx_model_versioN
* add type
* add types
* small cleanup
* revert some changes from before
* add todo
* dumb fix
2023-06-01 00:01:01 -07:00
Marcello Fuschi
3924aae8ed
Fix ONNX dropout and unify the implementation (#857)
* Fix ONNX dropout and unify the implementation
* Use tensor rand method for dropout
* Change approach for RNG in ONNX Dropout
* Fix style
* Test legacy RNG seeding
* Remove the necessity for legacy RNG in Tensor class
2023-05-31 07:40:47 -07:00
skobsman
2e393f7ef2
InstanceNormalization ONNX test fixed. (#870)
2023-05-30 16:07:44 -07:00
Friedrich Carl Eichenroth
f91f28d9e2
fix a bunch of tests (#856)
2023-05-29 17:48:26 -07:00
zk-tarts
174c65b7d9
add onnx Binarizer op (#850)
Co-authored-by: zk-tarts <>
2023-05-29 13:15:50 -07:00
M4tthewDE
4408c25e9a
Add Onnx op Shrink (#851)
* Add onnx Shrink operation
* Fix soft/hard shrink onnx test
2023-05-29 13:15:39 -07:00
Friedrich Carl Eichenroth
6f2b3755ca
set axis default to 0 (#854)
2023-05-29 13:15:28 -07:00
Friedrich Carl Eichenroth
3b158f7a5f
fix onnx versions greater than or equal to 10 (#853)
2023-05-29 13:04:06 -07:00
Diogo
1a5d72f812
Onnx ops And, Or, Xor, Not (#847)
* onnx and, or, xor, not
* added bool type to llvm and clang
* removed float conversion
* switched where op to use tensor func
2023-05-29 11:09:20 -07:00
SnakeOnex
844e6d0753
conv1d & conv3d onnx tests (#835)
* conv1d onnx
* [Work in progress] conv1d + enforcing full padding tuple length
* make ONNX padding reorder not hardcoded, works for 1D and 3D convs now
* conv2d interprets padding based on the input tensor dimensions
2023-05-29 10:16:45 -07:00
Marcello Fuschi
6d49925a26
Add max_pool2d dilation (#833)
2023-05-28 15:16:48 -07:00
cheeetoo
21d27d31a9
Fix a couple pad tests (#827)
* fix pad bug
* float type hint for value
* convert pads to list
* update Pad type signature
* Change | to Union since not supported in < python 3.10
2023-05-28 12:06:46 -07:00
Mattis Megevand
606b841d3f
LR Schedulers (#755)
* lr schedulers + test
* lr scheduler test moved + integration test
* integration test for all lr scheduler
* lr scheduler test now deterministic
* changed optimizer + parameters for lr sched test
2023-05-27 07:47:49 -07:00
George Hotz
87fa5af70a
ptx example
2023-05-26 19:28:51 -07:00
George Hotz
26014a0fa1
add convtranspose (#809)
* add convtranspose
* onnx convtranspose
2023-05-26 12:35:03 -07:00
wozeparrot
7351eb4b61
feat: put temporary file in the same directory as the destination file (#805)
2023-05-25 20:46:02 -07:00
Diogo
c19ef0fcce
Add sin/cos/tan (#794)
* added sin/cos/tan
* fix lint
* added onnx ops support
2023-05-25 09:04:56 -07:00
George Hotz
0400315078
Revert "ops rdna"
This reverts commit 81a11d891d.
2023-05-21 13:02:18 -07:00
George Hotz
325a3bf2cf
Revert "writing 2"
This reverts commit dddd6c42f0.
2023-05-21 13:02:17 -07:00
George Hotz
dddd6c42f0
writing 2
2023-05-21 12:52:36 -07:00
George Hotz
81a11d891d
ops rdna
2023-05-21 11:45:38 -07:00
George Hotz
90fff82c8a
Rdna (#776)
* assembler maybe
* custom asm
* rdna3 on quiet
* trigger crashes
* fixed notes
* non-fatal rdna2 crash
* Crash4
* improve rdna sniffer
* comments
* improve sniffer
* asm
* 131 TFLOPS RDNA3
* opt simple matmul
* todos
2023-05-16 05:33:57 -07:00
George Hotz
89b8b39d9c
fix mypy
2023-05-13 21:25:36 -07:00
George Hotz
e0b2035023
fast imagenet eval, gets 76.14% across the set
2023-05-13 21:18:31 -07:00
George Hotz
46d419060b
start on mlperf models
2023-05-10 16:30:49 -07:00
George Hotz
cb7c22beeb
fix mypy
2023-05-06 19:18:54 +00:00
George Hotz
5190037cbc
rocm: disassembler for shader
2023-05-06 19:07:52 +00:00
George Hotz
42256c0d9d
rocm sniffer dumps code
2023-05-05 18:36:53 +00:00
George Hotz
f2a964f447
nocopy (#764)
2023-05-05 09:32:06 -07:00
George Hotz
3a2011ab2d
rocm sniffer
2023-05-04 22:22:39 +00:00
George Hotz
a55c4f5000
better rocm build scripts
2023-05-04 09:14:05 +00:00
George Hotz
987b1aaf96
rocm build scripts
2023-05-04 08:45:23 +00:00
George Hotz
ed33a89d52
no werror in archprobe
2023-05-03 19:34:17 +00:00
George Hotz
7ecf4dff68
multi cl_queue (#762)
* multi cl_queue
* only platforms 1
* gpus first, then cpus
* put device on underlying buffer
* cl_queue array
2023-05-03 12:15:28 -07:00
George Hotz
3b933b0a2f
rocm setup script
2023-05-03 16:01:17 +00:00
George Hotz
59d0d168cd
FLOAT16 off works
2023-04-19 15:34:56 -07:00
George Hotz
3d15769a8f
50 TFLOPS cuda matmul
2023-04-19 14:38:24 -07:00
George Hotz
0b5a0b9ba4
winograd comment
2023-04-16 03:36:51 -07:00
George Hotz
8b777af571
metal_conv gets over 10.4 TFLOPS...
2023-04-15 03:31:22 -07:00
George Hotz
d66e682205
metal matmul from tcores branch
2023-04-14 23:29:29 -07:00
Sohaib
70b9072663
add Pad onnx operator and rework _padding (#740)
2023-04-06 17:07:36 +05:30
George Hotz
94e2c49c35
test_cacheline_size that works in both places
2023-03-30 06:47:20 +04:00