* initial commit
* added osx check for opencl
* added llvm f64 conversions
* typo in llvmir
* more tests and modified unsupported error
* fixed linting error
* added pragma fp64
* simplified exclusion for OSX
* fixed device check and also added it to cast func
* added ifdef check for fp16 in ops_gpu
* Revert "added ifdef check for fp16 in ops_gpu"
This reverts commit 92de754d48.
* f64 prekernel signature match f16
* moved condition to buffer init
* resolved some slice test errors and added some more debugging logs
* use same device in cumsum
* increased float priority
* onnx debug output match input
* add cumsum with n-dim inputs, over arbitrary axis + relevant tests
* increased rtol for cumsum test
* move test_cumsum into test_ops
* skip arange test for images as relies on cumsum
* Fix typo
* rewrite cumsum to work with images
* safetensors test
* safe_save
* load back with real safetensors
* bugfix in device name. add simple torch_load
* it works for llama, but it's slower...
* mmap
* no intermediate
* load mmapped
* readinto speed
* not ready yet
* revert that
* add and reorganize test_slice_* tests
* refactor Tensor.__getitem__()
* preliminary tests for 1) 0D tensors and 2) varargs for Tensor.zeros and Tensor.ones
* always compare shapes of the numpy arrays obtained from tinygrad and torch tensors
* add more tests for 0D support
* remove test_tensor.test_slicing(). All slicing tests at test/test_ops.py
* add zero-dim support
* make test_end2end.py consistent with 0dim support
* add test for tensor with zero in shape
* don't simplify ones if shape is ()
* skip tests that need zero-size tensor support.
- zero-size tensor support not related to 0dim tensors.
* add tests for __getitem__() supporting strides >= 1
* refactor __getitem__: support for strides >= 1
* minor refactors and add comments to __getitem__
* add tests for slices with negative steps
* add support for slices with negative strides
* Added a few missing return type hints for tensor.py
* added test for empty tensor for Tensor.numel()
* fixed missing numel call in test_numel
---------
Co-authored-by: deefi <dee7ine@gmail.com>
* added metal int64 and some simple tests
* removed bool return type def
* typo in test
* also missing in clang and gpu runtimes
* switched order for opencl
* increased atol and removed new line in kernel prefix
* added kaiming_uniform init for conv2d and linear layers
* fix: set getattr
* up
* fix: set getattr
* fix comments
* better does not mean it is good
* more nonlinearities
* added test
checks the distribution of default relu option
* prettier
* fix kernel size
* edit distribution of returned tensor
* complete tests and fix fan_mode
* added higher dim test
* prettier test
* fix silly blank
* just leaky_relu mode
* default fan in and leaky relu
* update params
* fix test
* shorter
* generalize Tensor.uniform and adjust kaiming init
- added low and high parameters to Tensor.uniform function, so it can have a specific range (default is 0 to 1)
- adjusted return line of kaiming_uniform
* range from -1 to 1
* delete comment
* adjusted test_uniform
* fixed
* delete comment
* use tensor dtype for zeros_like()
* add tests for zeros_like dtype
* iterate over dtypes
* remove space
* remove print
* fix test, iterate over a list
* feat: int8 support
* feat: uint8 support
* feat: int8 tests
* fix: fix uint8 on clang
* feat: test casting between int8/uint8/float16/float32
* clean: way cleaner dtype tests
* feat: preprocess_imagenet using the correct dtype
* feat: add test for overflow between uint8 and int8
* Add ResNet inference test and cannon
* Test with ResNet50
* test_car works with resnet fix
* Add KiTS19 dataset
* KiTS19: Implement iterate
* No batch load for this dataset
* Save results on iterate
* Implement dice score
* Add data prep and eval functions
* Resolve shape issue
* Conversion works but wrong values
* Segfaults when load_from_pretrained is called
* Fix segfault and assign properly
* Final result generated, though very slow
* Store and load final result to save time
* Fix typo in finalize
* Score computes
* More bug fixes, dice score is very low
* Working broken code
* Assign output values to result
* Getting a much higher score now
* Fix dataset preprocessing
* Mean DICE score of 88.5
* Ugh, typo
* Attempt to reimplement model
* Rename layers
* Tiny model works, kinda
* Accuracy? gone
* Implement InstanceNorm and match torch
* Test instance norm 2d and 3d
* Combined input block with downsample block
* Tiny model works, support strided convtranspose
* Commands to download dataset
* Clean up a bit
* unet3d_v2 -> unet3d
* Remove duplicated code
* Oops, put tests back
* lr schedulers + test
* lr scheduler test moved + integration test
* integration test for all lr schedulers
* lr scheduler test now deterministic
* changed optimizer + parameters for lr sched test
* optimizations in symbolic.py
* fix infinite recursion when expanding sums
* add test case to make sure NumNodes are hoisted up in cases where MulNodes cancel each other out
* Don't collapse dimensions during batched matmul (FIX#799)
* Avoid reshaping tensor to the same shape
* Skip batched matrix multiply when IMAGE is set