George Hotz
dca084f227
minor == to is touchups
2023-06-15 17:11:12 -07:00
blake
041d96083c
clang rt for msvc ( #986 )
...
* added platform config for clang runtime and tempfile dir for xplatform /tmp
* flake8 lint
* mypy lint
* pythonic?
* python?
* return darwin cflags
* <lines
* lint;
2023-06-15 17:06:44 -07:00
George Hotz
039f0d372f
delete ltypes ( #984 )
...
* delete ltypes
* only upcast float types
* test dtype on mac passes
* ugh, these upcasts
2023-06-15 16:24:45 -07:00
Yahya Lmallas
804c45b5fc
FIX: Can't pickle local object ( #979 )
...
_early_exec_process is a local function defined within the scope of another function; it should be global
2023-06-14 12:32:17 -07:00
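
The pickling failure described in the commit above can be reproduced with a minimal sketch. The function names here (make_local_worker, _local_worker, global_worker, can_pickle) are hypothetical stand-ins, not the project's code; the assumption is only that pickle stores functions by qualified name and so cannot look up one nested inside another function.

```python
import pickle


def make_local_worker():
    # Hypothetical stand-in for a function defined inside another function.
    def _local_worker(x):
        return x + 1
    return _local_worker


def global_worker(x):
    # Module-level function: pickle can record it by its qualified name.
    return x + 1


def can_pickle(obj):
    # The exact exception type for a local function depends on the Python
    # version, so both common ones are caught here.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False
```

Calling can_pickle(make_local_worker()) returns False, which is the "Can't pickle local object" situation; moving the function to module level, as with global_worker, is the usual remedy when the function is handed to a separate process.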
Rayan Hatout
2d567ef688
Optimizations in tensor.py ( #974 )
...
* optimizations in tensor.py
* make mypy happy
* revert split of Function class
2023-06-14 08:44:35 -07:00
Diogo
0629791cbd
F64 support ( #976 )
...
* initial commit
* added osx check for opencl
* added llvm f64 conversions
* typo in llvmir
* more tests and modified unsupported error
* fixed linting error
* added pragma fp64
* simplified exclusion for OSX
* fixed device check and also added it to cast func
* added ifdef check for fp16 in ops_gpu
* Revert "added ifdef check for fp16 in ops_gpu"
This reverts commit 92de754d48 .
* f64 prekernel signature match f16
* moved condition to buffer init
2023-06-13 21:31:31 -07:00
John Moore
45bc040a63
Fix typo ( #978 )
2023-06-13 15:15:45 -07:00
George Hotz
80e665bddb
a couple new tests
2023-06-13 12:36:05 -07:00
George Hotz
ba4eadb04c
PTX assembly support ( #977 )
...
* ptx assembly
* all ops tests pass
* fix tests
2023-06-13 12:31:42 -07:00
Rayan Hatout
727416201f
Shapetracker optimizations ( #966 )
...
* optimizations in shapetracker.py
* revert micro-optimizations in assertions
* make mypy happy
* list comp instead of map in get_unsafe_resize_offset
* list comp instead of map in get_unsafe_resize_offset
2023-06-12 18:13:21 -07:00
cloud11665
5f13e7c3cf
cuda: fix fp16, uint8, int64, half4 codegen ( #968 )
...
* cuda: add uchar, int64 typedefs
* cuda: fix float16 codegen
* fuck it, half4 stub. llama time!
* inline fp16 half4, revert changes to CStyleLanguage
* add inline just in case
* remove half4 operators
* use dict
2023-06-12 11:15:44 -07:00
Steven Anderson
e54b6c5e7f
One hot ( #972 )
...
* passing with 1d indices
* passing all tests
* cleanup
* using safe_numpy for scalar
2023-06-12 10:13:29 -07:00
Diogo
613c74ca9f
maintain input tensor dtype ( #969 )
2023-06-12 10:12:47 -07:00
Diogo
2d4370b487
Adds tril & triu support ( #936 )
...
* triu & tril support
* lint and kernel count error
* switched shape indices
* larger shape tests
* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
George Hotz
48e9461197
broken tests for #862 and #942
2023-06-09 22:02:59 -07:00
George Hotz
c62c64f0b7
remove GeNode ( #965 )
2023-06-09 21:48:56 -07:00
George Hotz
2c324d0685
fix metal uaf ( #964 )
2023-06-09 21:28:06 -07:00
Steven Anderson
c0e558b77c
Test nllloss ( #958 )
...
* works but slow
* works with NC and NCd1, but it is still slow
* refactor
* support for k dimensions
* without numpy
2023-06-09 09:00:29 -07:00
Diogo
6b1280f01c
fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement ( #956 )
...
* prelu and where casting
* typing for safe_numpy
* optional
* get rid of tracing in ci
* cleanup and resolved layernorm issues
* removed debug print
2023-06-08 16:09:19 -07:00
Nicklas Boman
5c7248c72d
imagenet download and prepare ( #928 )
...
Replaced the if-not-exists check with the exist_ok=True parameter and added a variable that controls whether the training data is also downloaded
Added the variable to env_vars.md
2023-06-08 12:55:33 -07:00
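
As a rough illustration of the exist_ok change in the commit above, the sketch below compares the two patterns using the standard library. The directory layout ("imagenet/train" under a temporary directory) is an assumption made for the example, not the path the project uses.

```python
import os
import tempfile

# Hypothetical target directory inside a throwaway temp dir.
base = tempfile.mkdtemp()
target = os.path.join(base, "imagenet", "train")

# Older pattern: check first, then create.
if not os.path.exists(target):
    os.makedirs(target)

# Single call that does not raise if the directory already exists,
# so it is safe to run repeatedly.
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)
```

The single call also avoids the window between the existence check and the creation in which another process could create the directory.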
George Hotz
df40a9c238
EXP+LOG -> EXP2+LOG2 ( #954 )
...
* EXP+LOG -> EXP2+LOG2
* update docs
2023-06-08 10:57:31 -07:00
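
The EXP+LOG to EXP2+LOG2 change above rests on standard base-conversion identities: exp(x) = 2^(x * log2(e)) and ln(x) = log2(x) * ln(2). The sketch below only checks those identities numerically; the helper names are hypothetical and it makes no claim about how the commit itself rewrites the ops.

```python
import math

LOG2_E = math.log2(math.e)  # log2(e), scales the exponent
LN_2 = math.log(2.0)        # ln(2), scales the base-2 logarithm


def exp_via_exp2(x):
    # e**x written as a base-2 exponential
    return 2.0 ** (x * LOG2_E)


def log_via_log2(x):
    # natural log written in terms of a base-2 logarithm
    return math.log2(x) * LN_2
```

For example, exp_via_exp2(1.5) agrees with math.exp(1.5) and log_via_log2(7.0) agrees with math.log(7.0) up to floating-point rounding.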
Diogo
666d151f8a
Onnx slice fixups ( #952 )
...
* resolved some slice test errors and added some more debugging logs
* use same device in cumsum
* increased float priority
* onnx debug output match input
2023-06-07 19:44:30 -07:00
cloud11665
e8a23d4331
there is a better way to do that! ( #950 )
2023-06-06 15:23:30 -07:00
SnakeOnex
990fc40219
made seed None by default -> numpy picks a random seed ( #946 )
...
* made seed None by default -> numpy picks a random seed
* fixed _seed type
* set the seed to unix timestamp
* make filetype int only
2023-06-06 13:06:23 -07:00
Timothy Lindblom
a149f12a5b
Replaced broken link to /tests with /test ( #939 )
2023-06-06 10:29:09 -07:00
M4tthewDE
664d6cc7e5
Implement onnx MeanVarianceNormalization ( #943 )
2023-06-06 10:28:19 -07:00
Diogo
3bb38c3518
limit split to 1 due to windows path containing : ( #944 )
2023-06-06 10:27:54 -07:00
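
Why the split limit in the commit above matters can be seen with a small sketch. The "prefix:path" string below is a made-up example, assumed only to show a Windows drive letter introducing a second colon; it is not taken from the project.

```python
# Hypothetical "prefix:path" string where the path is a Windows path.
spec = "disk:C:\\models\\weights.bin"

# Unlimited split also breaks on the colon after the drive letter.
unlimited = spec.split(":")

# Limiting the split to 1 keeps the drive letter attached to the path.
prefix, path = spec.split(":", 1)
```

Here unlimited has three parts, while the limited split yields the prefix "disk" and the intact path.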
Steven Anderson
079ea217a3
fix test_pow_type - autocasting for Pow with inputs of diff type ( #937 )
2023-06-05 15:22:35 -07:00
George Hotz
76ab379f9b
readme updates
2023-06-05 12:20:14 -07:00
cloud11665
43ea1614b0
fix inf/nan codegen ( #935 )
...
* fix inf/nan codegen
* remove nasty oneliner, fix -inf
* inf/nan const mul/div tests
2023-06-05 11:24:09 -07:00
Filip Dimitrovski
78460034ff
Initial ellipsis support when slicing Tensors ( #843 )
...
* Initial ellipsis support when slicing Tensors
* Better comments in ellipsis slicing
* Formatting
2023-06-05 07:52:49 -07:00
M4tthewDE
70f12fdb57
Fix wrong op version being used if versions equal ( #934 )
2023-06-05 07:45:10 -07:00
Steven Anderson
79613eb83e
Test min ( #932 )
...
* fix __neg__ defaulting to float32 due to 0.0
* fixed __neg__ always defaulting to float32
* fixed openpilot (OpenCL) Test
2023-06-05 00:03:30 -07:00
kposborne2
00360da05b
Update broken docs/abstractions.py for changed ops, and add to CI ( #930 )
...
* fix and add to ci
* still have those
* ocd
* update other doc
2023-06-04 19:21:20 -07:00
Tom Edwards
5bbcbd145c
Add cumsum with n-dim inputs ( #922 )
...
* add cumsum with n-dim inputs, over arbitrary axis + relevant tests
* increased rtol for cumsum test
* move test_cumsum into test_ops
* skip arange test for images as relies on cumsum
* Fix typo
* rewrite cumsum to work with images
2023-06-04 16:55:23 -07:00
wozeparrot
091bd65a68
feat: quick doc fixups ( #923 )
2023-06-04 11:03:57 -07:00
George Hotz
fbf17f0031
intel benchmark matmul gets 60 TFLOPS?
2023-06-04 17:01:50 +00:00
Steven Anderson
657e642e3a
Fixed test suite for Clip ( #912 )
...
* Fixed test suite for Clip
* fixed issue with clip when taking large negative numbers as min
* Remove typings
2023-06-04 09:01:01 -07:00
kposborne2
0b88c5f923
Eliminate LoadOps.FROMCPU ( #920 )
...
* Add fromCPU method to init LazyBuffer to eliminate LoadOps.FROMCPU
* squish
* remove failing test
* seems logical
* Revert "seems logical"
This reverts commit bbdcdc8713 .
* inline and remove assertion
* fromCPU staticmethod, defer non-cpu device to loadop
* restore test
2023-06-04 08:55:50 -07:00
George Hotz
3e0b37f050
randn slow
2023-06-04 08:52:13 -07:00
wozeparrot
e9c1ae3825
Add a quick start guide ( #900 )
...
* feat: initial quick start guide
* fix: fix link
* feat: add note about jit
* feat: add note about load/store ops
* feat: add link to discord
* feat: add note about saving and loading models
* fix: correct code for saving and loading
* feat: overhaul docs
* fix: fix link
* feat: wording
* feat: add link to discord
* feat: contributing guidelines
* feat: make contributing section more doc focused
* feat: add link to env_vars from readme
* fix: wording
* feat: move community to bottom
* feat: showcase
* feat: linebreak
* feat: redesigned header
* feat: tweaks
* feat: tweaks
* feat: badge for lines of code
* feat: move installation instructions to repo readme
* feat: readme overhaul number 2
* feat: move visualization to quick start guide
* feat: readme 2 electric boogaloo
* fix: grammar
* fix: formatting
* feat: no ugly line
* feat: add line back
* feat: new load method
* feat: split adding accelerator docs out
* feat: showcase whisper
* feat: smaller tweaks
* feat: bring back oneliner
2023-06-04 08:51:20 -07:00
Alexey Zaytsev
d429553730
Allow Tensor(tuple) ( #911 )
2023-06-03 23:48:19 -07:00
George Hotz
afd0be8a9c
intel example
2023-06-04 06:43:09 +00:00
MohammedAlkhrashi
2b4baa97e9
exclude string type from external_test_onnx_backend.py ( #918 )
2023-06-03 19:10:52 -07:00
George Hotz
b78addf2f8
Whisper ( #919 )
...
* no whispering yet
* whispering
* live whisper
* small support
2023-06-03 18:55:14 -07:00
George Hotz
ed1963b899
Fast DiskTensor to other Tensor ( #916 )
...
* make disktensors fast
* loading
* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz
791530045d
Refactor LoadOps ( #910 )
...
* test
* work
* upd test
* loadops
* cleanups
* real ones
* remove LazyNumpyArray
* fix assign test
* remove range
* np.require
* llama uses arange kernels
* no caching consts
* fix enet
* torch load support
* tests cleanup
* fix shufflenet
* fix image
* fix torch_load test
2023-06-03 09:40:43 -07:00
George Hotz
d58586bb17
safetensors! ( #903 )
...
* safetensors test
* safe_save
* load back with real safetensors
* bugfix in device name. add simple torch_load
* it works for llama, but it's slower...
* mmap
* no intermediate
* load mmaped
* readinto speed
* not ready yet
* revert that
2023-06-02 13:41:09 -07:00
Steven Anderson
513aeb2f66
Fixed all ConstantOfShape test suite ( #907 )
2023-06-02 11:26:40 -07:00
Steven Anderson
301f7b54c6
ConstantOfShape ONNX test fixed. ( #890 )
...
* ConstantOfShape ONNX test fixed.
* removed redundant if statement
* value is optional and should default to a float32 tensor with value of 0
* fixed: default parameters are created at function definition, bad for mutable objects.
2023-06-02 07:34:25 -07:00
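
The last bullet of the commit above refers to a general Python behaviour: default argument values are evaluated once, when the function is defined, so a mutable default is shared across calls. A minimal sketch with hypothetical function names (append_shared, append_fresh), unrelated to the ONNX code itself:

```python
def append_shared(item, bucket=[]):
    # The default list is created once at definition time,
    # so every call without an explicit bucket mutates the same list.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Using None as a sentinel creates a new list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling append_shared(1) and then append_shared(2) returns a list that still contains the 1 from the first call, whereas append_fresh starts from an empty list each time.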