* add cumsum for n-dim inputs over an arbitrary axis, plus relevant tests
* increased rtol for cumsum test
* move test_cumsum into test_ops
* skip arange test for images, as it relies on cumsum
* Fix typo
* rewrite cumsum to work with images
* safetensors test
* safe_save
* load back with real safetensors
* fix bug in device name; add simple torch_load
* it works for llama, but it's slower...
* mmap
* no intermediate
* load mmaped
* readinto speed
* not ready yet
* revert that
* add and reorganize test_slice_* tests
* refactor Tensor.__getitem__()
* preliminary tests for 1) 0D tensors and 2) varargs for Tensor.zeros and Tensor.ones
* always compare shapes of the numpy arrays obtained from tinygrad and torch tensors
* add more tests for 0D support
* remove test_tensor.test_slicing(); all slicing tests now live in test/test_ops.py
* add zero-dim support
* make test_end2end.py consistent with 0dim support
* add test for tensor with zero in shape
* don't simplify ones if shape is ()
* skip tests that need zero-size tensor support.
- zero-size tensor support is not related to 0-dim tensors.
* add tests for __getitem__() supporting strides >= 1
* refactor __getitem__: support for strides >= 1
* minor refactors and add comments to __getitem__
* add tests for slices with negative steps
* add support for slices with negative strides
* added a few missing return type hints to tensor.py
* added a Tensor.numel() test for an empty tensor
* fixed missing numel call in test_numel
---------
Co-authored-by: deefi <dee7ine@gmail.com>
* added metal int64 and some simple tests
* removed bool return type def
* typo in test
* also missing in clang and gpu runtimes
* switched order for opencl
* increased atol and removed newline in kernel prefix
* added kaiming_uniform init for conv2d and linear layers
* fix: set getattr
* up
* fix: set getattr
* fix comments
* better does not mean it is good
* more nonlinearities
* added test
checks the distribution for the default relu option
* prettier
* fix kernel size
* edit distribution of returned tensor
* complete tests and fix fan_mode
* added higher dim test
* prettier test
* fix silly blank
* just leaky_relu mode
* default fan in and leaky relu
* update params
* fix test
* shorter
* generalize Tensor.uniform and adjust kaiming init
- added low and high parameters to Tensor.uniform so it can sample from a specific range (default is 0 to 1)
- adjusted return line of kaiming_uniform
* range from -1 to 1
* delete comment
* adjusted test_uniform
* fixed
* delete comment
* use tensor dtype for zeros_like()
* add tests for zeros_like dtype
* iterate over dtypes
* remove space
* remove print
* fix test, iterate over a list
* feat: int8 support
* feat: uint8 support
* feat: int8 tests
* fix: fix uint8 on clang
* feat: test casting between int8/uint8/float16/float32
* clean: way cleaner dtype tests
* feat: preprocess_imagenet using the correct dtype
* feat: add test for overflow between uint8 and int8
* Add ResNet inference test and cannon
* Test with ResNet50
* test_car works with resnet fix
* Add KiTS19 dataset
* KiTS19: Implement iterate
* No batch load for this dataset
* Save results on iterate
* Implement dice score
* Add data prep and eval functions
* Resolve shape issue
* Conversion works but gives wrong values
* Segfaults when load_from_pretrained is called
* Fix segfault and assign properly
* Final result generated, though very slow
* Store and load final result to save time
* Fix typo in finalize
* Score computes
* More bug fixes, dice score is very low
* Working broken code
* Assign output values to result
* Getting a much higher score now
* Fix dataset preprocessing
* Mean DICE score of 88.5
* Ugh, typo
* Attempt to reimplement model
* Rename layers
* Tiny model works, kinda
* Accuracy? gone
* Implement InstanceNorm and match torch
* Test instance norm 2d and 3d
* Combined input block with downsample block
* Tiny model works, support strided convtranspose
* Commands to download dataset
* Clean up a bit
* unet3d_v2 -> unet3d
* Remove duplicated code
* Oops, put tests back
* lr schedulers + test
* lr scheduler test moved + integration test
* integration test for all lr schedulers
* lr scheduler test now deterministic
* changed optimizer + parameters for lr sched test
* optimizations in symbolic.py
* fix infinite recursion when expanding sums
* add test case to make sure NumNodes are hoisted up in cases where MulNodes cancel each other out
* Don't collapse dimensions during batched matmul (fix #799)
* Avoid reshaping tensor to the same shape
* Skip batched matrix multiply when IMAGE is set
* feat: promote Embedding to nn
* fix: fix failing test
* feat: add test with jit
* feat: rewrite embedding to no longer need stacked for loops
* clean+fix: don't know how that happened
* feat: initial rnn-t
* feat: working with BS>1
* feat: add lstm test
* feat: test passing hidden
* clean: cleanup
* feat: specify start
* feat: way faster lstm & model
* fix: default batch size
* feat: optimization
* fix: fix metrics
* fix: fix feature splicing
* feat: cleaner stacktime
* clean: remove unused import
* clean: remove extra prints
* fix: fix tests and make llvm happy
* feat: have the librispeech dataset in its own dir
* clean: unused variable
* feat: no longer need numpy for the embedding + slightly more memory efficient lstm
* fix: forgot to remove something that broke tests
* feat: use relative paths
* feat: even faster
* feat: remove pointless transposes in StackTime
* fix: correct forward
* feat: switch to soundfile for loading and fix some leaks
* feat: add comment about initial dataset setup
* feat: jit more things
* feat: default batch size back to 1
batch sizes larger than 1 are broken again :(
and even the reference implementation gives worse results with them