Commit Graph

1961 Commits

Author SHA1 Message Date
Friedrich Carl Eichenroth
6f2b3755ca set axis default to 0 (#854) 2023-05-29 13:15:28 -07:00
Friedrich Carl Eichenroth
3b158f7a5f fix onnx versions greater or equal 10 (#853) 2023-05-29 13:04:06 -07:00
Ubaidullah Khan
0e89c3f456 zeros_like use dtype if specified else default to tensor’s dtype (#848) 2023-05-29 11:38:34 -07:00
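
A minimal usage sketch of the behavior this commit describes: zeros_like keeps the source tensor's dtype unless one is passed explicitly. The import paths and the staticmethod call form are assumptions, not lines taken from the diff.

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumed location of the dtypes namespace

x = Tensor([[1, 2], [3, 4]], dtype=dtypes.int8)
a = Tensor.zeros_like(x)                        # inherits int8 from x
b = Tensor.zeros_like(x, dtype=dtypes.float32)  # an explicit dtype overrides it
print(a.dtype, b.dtype)
```
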
Diogo
1a5d72f812 Onnx ops And, Or, Xor, Not (#847)
* onnx and, or, xor, not

* added bool type to llvm and clang

* removed float conversion

* switched where op to use tensor func
2023-05-29 11:09:20 -07:00
SnakeOnex
844e6d0753 conv1d & conv3d onnx tests (#835)
* conv1d onnx

* [Work in progress] conv1d + enforcing full padding tuple length

* make ONNX padding reorder not hardcoded, works for 1D and 3D convs now

* conv2d interprets padding based on the input tensor dimensions
2023-05-29 10:16:45 -07:00
George Hotz
ae204e40c8 move line counter to python 2023-05-29 09:21:40 -07:00
wozeparrot
8c6085a715 Rewrite Adam/W as functions of LAMB (#839)
* feat: rewrite adam/w as functions of lamb

* feat: use adam style adam update + comment

* fix: nvm need to use the lamb adam update
2023-05-29 09:21:35 -07:00
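
A hedged sketch of the refactor described above: Adam and AdamW expressed as thin wrappers over LAMB with the trust-ratio step disabled via a flag. The `adam=True` keyword and the exact LAMB signature are assumptions based on the commit messages, not the merged code.

```python
from tinygrad.nn.optim import LAMB  # assumed location of the LAMB optimizer

def AdamW(params, lr=0.001, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
  # AdamW is LAMB with weight decay but the per-layer trust ratio turned off
  return LAMB(params, lr=lr, b1=b1, b2=b2, eps=eps, wd=wd, adam=True)

def Adam(params, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
  # plain Adam is the same with zero weight decay
  return LAMB(params, lr=lr, b1=b1, b2=b2, eps=eps, wd=0.0, adam=True)
```
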
George Hotz
ddc9dafe62 tighten up the kernel count tests 2023-05-29 08:48:54 -07:00
vaibhav
9a18dade4b Fix ResNet Kernel Fusion (from bounty) (#825)
* Fix ResNet

* Revert resnet to original

* Change fix merge elementwise_op with reduce_op
2023-05-29 08:46:36 -07:00
JudeDavis1
f3168ee69b default transformer dropout to 0 (#828)
* default mha dropout to 0

* simplify assert

* reform

* default to 0.1
2023-05-29 08:06:16 -07:00
Ubaidullah Khan
c825cc4774 use tensor dtype for zeros_like() (#842)
* use tensor dtype for zeros_like()

* add tests for zeros_like dtype

* iterate over dtypes

* remove space

* remove print

* fix test, iterate over a list
2023-05-29 08:05:50 -07:00
crthilakraj
7925fa58d9 Fix cuda (#836)
* disabled float4 ALU ops for CUDA, small fix to add half_prekernel before kernel_prefix

* added supports_float4_alu option, and disabled for ops_cuda
2023-05-29 07:59:36 -07:00
Marcello Fuschi
6ea5df19b2 Fix conv_transpose2d asymmetric padding (#840) 2023-05-29 07:57:06 -07:00
wozeparrot
2fd2fb6380 int8/uint8 support (#837)
* feat: int8 support

* feat: uint8 support

* feat: int8 tests

* fix: fix uint8 on clang

* feat: test casting between int8/uint8/float16/float32

* clean: way cleaner dtype tests

* feat: preprocess_imagenet using the correct dtype

* feat: add test for overflow between uint8 and int8
2023-05-28 23:15:06 -07:00
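
A short sketch of the new integer dtypes in use, assuming the dtypes namespace exposes int8/uint8 and Tensor.cast converts between them as the added tests describe.

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumed location of the dtypes namespace

a = Tensor([1, 2, 3], dtype=dtypes.int8)
b = a.cast(dtypes.uint8)          # cast between the new integer dtypes
c = a.cast(dtypes.float32) * 2.0  # and up to float for arithmetic
print(b.numpy(), c.numpy())
```
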
Ali Benkassou
2939e40b98 Count Python tokens (#817)
* Count Python tokens

* Minor change

* Dumb syntax

* 2 spaces indentations + descending order
2023-05-28 21:26:49 -07:00
Jacky Lee
5d212864b5 Add MLPerf UNet3D model (#775)
* Add ResNet inference test and cannon

* Test with ResNet50

* test_car works with resnet fix

* Add KiTS19 dataset

* KiTS19: Implement iterate

* No batch load for this dataset

* Save results on iterate

* Implement dice score

* Add data prep and eval functions

* Resolve shape issue

* Conversion works but wrong values

* Segfaults when load_from_pretrained is called

* Fix segfault and assign properly

* Final result generated, though very slow

* Store and load final result to save time

* Fix typo in finalize

* Score computes

* More bug fixes, dice score is very low

* Working broken code

* Assign output values to result

* Getting a much higher score now

* Fix dataset preprocessing

* Mean DICE score of 88.5

* Ugh, typo

* Attempt to reimplement model

* Rename layers

* Tiny model works, kinda

* Accuracy? gone

* Implement InstanceNorm and match torch

* Test instance norm 2d and 3d

* Combined input block with downsample block

* Tiny model works, support strided convtranspose

* Commands to download dataset

* Clean up a bit

* unet3d_v2 -> unet3d

* Remove duplicated code

* Oops, put tests back
2023-05-28 20:38:19 -07:00
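
One of the building blocks this PR mentions is InstanceNorm matched against torch. The snippet below is only a minimal illustration of instance normalization with plain tensor ops (normalize each sample/channel slice over its spatial axes), not the layer actually added in the PR.

```python
from tinygrad.tensor import Tensor

def instance_norm(x: Tensor, eps: float = 1e-5) -> Tensor:
  # reduce over the spatial axes only, keeping (N, C) separate
  axes = tuple(range(2, len(x.shape)))
  mean = x.mean(axis=axes, keepdim=True)
  var = ((x - mean) ** 2).mean(axis=axes, keepdim=True)
  return (x - mean) / (var + eps).sqrt()
```
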
Sohaib
65d09031f2 add retinanet with resnet backbone (#813)
* add retinanet with resnet backbone

* adds resnext to support loading retinanet pretrained on openimages
* object detection post processing with numpy
* data is downloaded and converted to coco format with fiftyone
* data loading and mAP evaluation with pycocotools

* remove fiftyone dep

* eval freq

* fix model timing

* del jit for last batch

* faster accumulate
2023-05-28 20:20:16 -07:00
George Hotz
46327f7420 bugfix for stable diffusion 2023-05-29 00:03:09 +00:00
George Hotz
59f9bcd4a4 Disktensors! (#819)
* make empty a real thing

* start ops_disk

* disk tensor works

* interpreted cleanup

* slice write to disk

* preprocess imagenet

* fix custom function
2023-05-28 15:40:37 -07:00
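
A hedged sketch of what the disk tensors enable per the bullets above (empty tensors backed by a file, slice reads and writes). The "disk:<path>" device string and the assign-on-a-slice call form are assumptions rather than confirmed API.

```python
from tinygrad.tensor import Tensor

t = Tensor.empty(1000, 1000, device="disk:/tmp/weights.bin")  # file-backed tensor
chunk = t[0:10].numpy()                 # read a slice from disk
t[0:10].assign(Tensor.ones(10, 1000))   # write a slice back to disk
```
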
Marcello Fuschi
6d49925a26 Add max_pool2d dilation (#833) 2023-05-28 15:16:48 -07:00
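
A minimal usage sketch for the new dilation argument on max_pool2d; the keyword names follow the PyTorch-style convention and are assumed rather than taken from the diff.

```python
from tinygrad.tensor import Tensor

x = Tensor.randn(1, 1, 8, 8)
y = x.max_pool2d(kernel_size=(2, 2), dilation=2)  # dilated 2x2 max pooling
print(y.shape)
```
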
wozeparrot
7460bd9b02 Add LAMB optimizer (#821)
* feat: initial lamb optimizer

* feat: correctly match tf impl and add test
2023-05-28 15:09:05 -07:00
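
A sketch of the update rule a LAMB optimizer implements (an Adam-style step rescaled by a per-layer trust ratio), written in plain NumPy for illustration; this is not the tinygrad implementation added here.

```python
import numpy as np

def lamb_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.0):
  m = b1 * m + (1 - b1) * g        # first moment
  v = b2 * v + (1 - b2) * g * g    # second moment
  m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
  update = m_hat / (np.sqrt(v_hat) + eps) + wd * w
  w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
  trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
  return w - lr * trust * update, m, v
```
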
SnakeOnex
1b337b5533 ONNX tests exclude all unsupported filetype tests (#832) 2023-05-28 13:31:20 -07:00
cheeetoo
21d27d31a9 Fix a couple pad tests (#827)
* fix pad bug

* float type hint for value

* convert pads to list

* update Pad type signature

* Change | to Union since not supported in < python 3.10
2023-05-28 12:06:46 -07:00
Kirill R
081b3ab639 Tensor.where method (#830) 2023-05-28 10:20:33 -07:00
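
A hedged example of the Tensor.where method, assuming the condition.where(x, y) calling convention (pick x where the condition holds, otherwise y).

```python
from tinygrad.tensor import Tensor

x = Tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
y = (x > 0).where(x, 0.0)   # elementwise select, equivalent to a ReLU here
print(y.numpy())
```
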
George Hotz
eea3542975 remove other install method 2023-05-28 08:36:21 -07:00
Kirill R
0c0c7380af Add Tensor.where (#826)
* Add Tensor.where

* fix linter

* fix mypy
2023-05-28 08:04:56 -07:00
kposborne2
2163a1b049 Add shrink step to fix strided conv_transpose2d, and add to nn (#823)
* implement conv transpose 2d

* don't inherit, remove old assert

---------

Co-authored-by: Kyle <kposborne@gmail.com>
2023-05-28 07:52:45 -07:00
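
A usage sketch for the transposed convolution layer this adds to nn; the constructor keywords mirror the Conv2d-style API and are assumptions rather than lines from the diff.

```python
from tinygrad.tensor import Tensor
from tinygrad import nn

layer = nn.ConvTranspose2d(8, 4, kernel_size=3, stride=2, padding=1)
out = layer(Tensor.randn(1, 8, 16, 16))   # upsamples the spatial dims
print(out.shape)
```
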
crthilakraj
01daa74f9b fixed TestCustomFunction (#820) 2023-05-27 18:27:46 -07:00
wozeparrot
67de3aa1de Add mlperf bert model (#803)
* feat: add mlperf bert model

* feat: switch to nn.Embedding

* clean+fix: fix formatting

* feat: add simple downloader

* feat: metrics

* feat: don't actually need exact match

* feat: doing a run

* feat: set eps on the layernorms

* clean+fix: cleaner impl + hopefully fixed

* feat: move dataset initialization into iterate

* feat: move tokenizer out of iterate

* clean+fix: cleaner + working

* clean: cleanup

* fix: fix metrics

* feat: need to use original bert gelu + download vocab

* feat: make directory if it doesn't exist yet

* feat: jit go brrr
2023-05-27 14:53:32 -07:00
George Hotz
1e56aced05 add changeable DEBUG (#816) 2023-05-27 13:28:25 -07:00
George Hotz
a3feee29c5 make tests faster + add onnx (#815)
* search one dir, disable slow

* onnx tests

* fast rnnt test
2023-05-27 08:53:32 -07:00
Mattis Megevand
606b841d3f LR Schedulers (#755)
* lr schedulers + test

* lr scheduler test moved + integration test

* integration test for all lr scheduler

* lr scheduler test now deterministic

* changed optimizer + parameters for lr sched test
2023-05-27 07:47:49 -07:00
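
A hedged illustration of the pattern such a scheduler test exercises: the scheduler wraps the optimizer and rewrites its learning rate each step. The CosineAnnealingLR name and the mutable optimizer.lr attribute are assumptions, not details from the PR.

```python
import math

class CosineAnnealingLR:
  def __init__(self, optimizer, T_max, eta_min=0.0):
    self.optimizer, self.T_max, self.eta_min = optimizer, T_max, eta_min
    self.base_lr, self.t = optimizer.lr, 0
  def step(self):
    # cosine decay from base_lr down to eta_min over T_max steps
    self.t += 1
    cos = (1 + math.cos(math.pi * self.t / self.T_max)) / 2
    self.optimizer.lr = self.eta_min + (self.base_lr - self.eta_min) * cos
```
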
George Hotz
87fa5af70a ptx example 2023-05-26 19:28:51 -07:00
George Hotz
fd296ce444 have kernels wait on DEBUG=1 2023-05-26 22:51:16 +00:00
Rayan Hatout
8b2c2d6896 Optimizations in symbolic.py (#796)
* optimizations in symbolic.py

* fix infinite recursion when expanding sums

* add test case to make sure NumNodes are hoisted up in cases where MulNodes cancel each other out
2023-05-26 12:59:53 -07:00
George Hotz
26014a0fa1 add convtranspose (#809)
* add convtranspose

* onnx convtranspose
2023-05-26 12:35:03 -07:00
symlon
04284414db Batchnorm2d fixed running_var (#807)
* BatchNorm2d match pytorch

* removed comment

* Batchnorm2d test multiple sizes
2023-05-26 12:32:49 -07:00
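
A sketch of the PyTorch behavior a running_var fix like this typically targets: normalization uses the biased batch variance, while the running estimate is updated with the unbiased (Bessel-corrected) variance. The helper below is illustrative only, not the merged change.

```python
def update_running_stats(running_mean, running_var, batch_mean, batch_var, n, momentum=0.1):
  unbiased_var = batch_var * n / (n - 1)  # Bessel correction over the batch
  running_mean = (1 - momentum) * running_mean + momentum * batch_mean
  running_var = (1 - momentum) * running_var + momentum * unbiased_var
  return running_mean, running_var
```
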
George Hotz
65d63f5b40 support folding multiple of 4 into float4 (#808) 2023-05-26 12:17:48 -07:00
Aneesh Durg
6d4a728f62 Don't collapse dimensions during batched matmul (FIX #799) (#800)
* Don't collapse dimensions during batched matmul (FIX #799)

* Avoid reshaping tensor to the same shape

* Skip batched matrix multiply when IMAGE is set
2023-05-26 11:15:34 -07:00
George Hotz
803587b8b4 update readme 2023-05-26 06:11:05 +00:00
wozeparrot
7351eb4b61 feat: put temporary file in the same directory as the destination file (#805) 2023-05-25 20:46:02 -07:00
George Hotz
3ddcb5c36f Half4 load (#804)
* support half4 load

* cast to float4

* dead assert
2023-05-25 20:21:15 -07:00
George Hotz
ee2c8423c7 disable that test on LLVM. i have to stop pushing to master 2023-05-26 03:11:03 +00:00
George Hotz
ea3194f68e test touchups 2023-05-26 02:39:42 +00:00
wozeparrot
0dc333cfab Promote Embedding to nn (#798)
* feat: promote Embedding to nn

* fix: fix failing test

* feat: add test with jit

* feat: rewrite embedding to no longer need stacked for loops

* clean+fix: don't know how that happened
2023-05-25 18:39:45 -07:00
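
A minimal usage sketch for the promoted nn.Embedding, assuming the usual (vocab_size, embed_size) constructor and integer-index input.

```python
from tinygrad.tensor import Tensor
from tinygrad import nn

emb = nn.Embedding(vocab_size=1000, embed_size=64)
tokens = Tensor([[1, 5, 9]])
out = emb(tokens)       # expected shape (1, 3, 64)
print(out.shape)
```
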
George Hotz
f4f23dc9a3 version bump v0.6.0 2023-05-26 00:51:25 +00:00
George Hotz
faf80418b7 pyopencl by default since GPU is default (#802) 2023-05-25 17:48:18 -07:00
wozeparrot
fca5028d78 feat: ability to exclude cl devices from being used (#801) 2023-05-25 17:31:29 -07:00
Benedikt
3c465470f2 pip installation one liner (#793) 2023-05-25 16:43:42 -07:00
George Hotz
a968c4c3a4 Cleanup mlperf (#797)
* improve factorization

* cleanups
2023-05-25 11:36:43 -07:00