638 Commits

Author SHA1 Message Date
Francesco Castelli
579f4615a0 Add assert for wrong matmul/dot shapes (#1438) 2023-08-04 18:16:56 -04:00
Umut Zengin
52db7d7435 inf, -inf support for pad (#1436) 2023-08-04 15:05:25 -04:00
Umut Zengin
8889821547 Const pad support to pad2d and slice (#1392)
* slice to pad2d migrate

* Gain line

* Mypy happy

* Mypy happy

* Revert

* whitespace
2023-08-02 08:58:52 -07:00
Diogo
ba5e3818a0 Limit dims based on max size (#1390)
* working

* whitespace

* changed defaults to None

* linter

* last linter error
2023-07-31 19:18:19 -07:00
Umut Zengin
0de5f20970 Re-open constant pad support to Tensor.pad (#1388)
* Added const padding support to .pad

* Linter
2023-07-31 17:08:57 -07:00
wozeparrot
32d1afa4b5 feat: correct case when base is 0 (#1360) 2023-07-27 13:53:38 -04:00
wozeparrot
c22e77abfd Match torch on fractional negative base pow (#1352)
* feat: match torch on fractional negative base pow

* feat: tests for trunc
2023-07-26 19:14:54 -07:00
Umut Zengin
d4ebadf2da Small Tensor.cat optimization and reformatting (#1347) 2023-07-26 18:01:12 -04:00
geohotstan
4056f97187 Gather (#1329) 2023-07-25 15:05:41 -04:00
waifairer
d89fb729e5 flake8 (#1323)
* flake8: Ignore frequent violations, correct infrequent ones

* Ignore some rules in test

* Reorder test ignores

* Lint test + main

* EOF indent

* Include all E71,E72 errors

* Test the failing case in CI

* Revert "Test the failing case in CI"

This reverts commit 110add0a70.

* Push to test!
This reverts commit f317532779.

* ok back to passing
This reverts commit ba5052685f.

* Prove that CI fails when formatting is incorrect.

* Fix formatting

* Remove duplicate E117 rule

* Use flake8 config for precommit

---------

Co-authored-by: waifairer <waifairer@gmail.com>
2023-07-24 11:19:58 -04:00
George Hotz
086382b64e Revert "Fix max nan (#1298)" (#1334)
This reverts commit 50774470b2.
2023-07-23 20:41:28 -07:00
uncommonSensor
50774470b2 Fix max nan (#1298)
* Fix max nan

* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests

* Fix max nan

* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests
* Turned off due to the need for granularity
2023-07-23 19:39:44 -07:00
cheeetoo
a0965ee198 CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache too slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and duration on all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
madt2709
d2c1e8409a Update arange to be (start, stop, step) (#1308) 2023-07-21 00:27:23 -04:00
Umut Zengin
74e63fe4ee Added test_chunk and fixed (#1283) 2023-07-19 22:21:26 -04:00
Umut Zengin
fde9f0e60d Slice migrated in Eye op (#1281)
* Migrated from slice to pad and shrink, made cleaner

* Changed repeat with reshape and expand
2023-07-19 09:08:38 -07:00
Umut Zengin
fa0265b173 Fix: AssertionError Transpose/Permute when WHERE Op in LB (#1266) 2023-07-18 16:09:19 -04:00
Stan
ed472bffea Fix: negative axis in tensor.cumsum (#1261) 2023-07-17 16:16:38 -07:00
Adrian Kretz
5a8ad57163 Add WHERE ternary (or trinary?) op (#1196)
* Rename FusedOps to TernaryOps

* Support ternary broadcast

* Add where llop and mlop

* Make where op work in cstyle codegen

* Don't skip test_inf_where

* Add backward path to where op

* Use bool in cstyle codegen

* Add LLVM where op

* Add numpy where op

* Add torch where op

* Simplify where mlop

* Update documentation

* Forgot a rename

* Merged relevant changes from PR #1195 onto PR #1196

* Add test to cover changes to linearizer.ast_parse for WHERE op

Without this METAL will try to use ternary op on float4 and fail

* Make where op work in wgsl backend

* Allow ternary ops to be merged

* Make mypy happy

---------

Co-authored-by: Francis Lam <flam@alum.mit.edu>
2023-07-16 00:31:55 -07:00
Stan
264d467f2b Added tensor.squeeze and support for testing exceptions (#1241)
* WIP: `tensor.squeeze` function

* Added `test_except` param to `helper_test_op` to avoid false positives

* Extracted new method `helper_test_exception` for testing exceptions

* Made `squeeze` not throw IndexError when ndim == 0 and dim <= 0 to match PyTorch
2023-07-15 00:33:24 -07:00
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* Fix efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and parallel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
madt2709
bb316a42af Fix pow to work with negative tensors (#1191) 2023-07-09 17:33:04 -07:00
George Hotz
43385c7dbf remove contiguous on full (#1212) 2023-07-09 17:31:15 -07:00
George Hotz
67e34b356a good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
George Hotz
7151382364 Refactor load/store before tensor cores (#1193)
* minor cleanups

* render_const

* now that's a nice refactor

* clean up vload/vstore

* clean up render_load

* debugs there

* dumb

* err, this?

* const float4

* what's failing

* bugfix

* statement includes semicolon

* bugfix
2023-07-08 15:54:58 -07:00
Eli Frigo
801564f31b Remove POW llop and add SQRT llop (#1104)
* fixed division by zero for fast operations

* made et closer to 0

* replace POW llop with SQRT

* updated mlops to swap SQRT and POW llops

* updated hlops to swap POW and SQRT

* added sqrt llop to cpu runtime

* added sqrt llop to cstyle codegen

* added POW llop to llvm ir codegen

* added SQRT llop to torch runtime

* moved pow from mlops to hlops

* found a better way to do reverse pow

* fixed indentation

* added SQRT llop to triton

* update docs to match new llops

* removed POW operator from assembly codegen

* added sqrt and rsqrt to pow hlop

* rewrote pow function in tensor.py

* Adjust tolerance

* Adjust for adamw

* Reduce for Adam too

* removed accidental leftover code

* removed all of accidental code

* added rsqrt test

* removed pow from mlops again

it was added back when resolving merge conflicts

---------

Co-authored-by: Jacky Lee <jla524@sfu.ca>
2023-07-05 18:07:58 -07:00
George Hotz
793a670187 from tensor cores + lb touchup (#1127) 2023-07-04 15:45:20 -07:00
George Hotz
c709dec8b5 gelu: weird test was broken for metal 2023-07-04 00:43:54 -07:00
George Hotz
daf8e1942f sigmoid: test large positive also and add note 2023-07-04 00:18:31 -07:00
Kunwar Raj Singh
9e6067378f Broken Sigmoid backward: Add test and mlop for Sigmoid (#1113)
* Add failing sigmoid test

* update more tests

* add mlop for sigmoid

* add back test

* math.log(math.e) = 1

* remove divides

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-04 00:14:22 -07:00
geohotstan
575f75f613 hello (#1084) 2023-07-01 01:29:35 -07:00
Jacky Lee
754e54ebb9 Fix Tensor ceil and floor for whole numbers (#1071)
* Works on non-special numbers

* Test different cases
2023-06-27 23:22:17 -07:00
George Hotz
d16c16ec28 new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
George Hotz
3e33befc1d realize hotspots (#1059)
* realize hotspots

* no str check

* minor changes

* make this an assert

* faster and more readable

* nicer self.buffers

* tests for weak op + LAZYCACHE=0
2023-06-26 18:31:18 -07:00
Kunwar Raj Singh
5d3310ce56 MaskRCNN Inference (#884)
* MaskRCNN weights loading

* backbone maybe works

* backbone works, but resnet body atol 1e-3

* RPN Call, but very wrong output

* fixed topk

* RPN maybe works, not sure about nms

* Fix cursed modules

* add back editorconfig

* Full call, wrong output

* Full call works

* fix mask

* use NMS from retinanet

* Removing extra funcs

* refactor

* readable

* Add example to run model

* remove filter

* Fix split, batched inference is worse

* Fix image sizes

* Matching reference

* merge master

* add filter on top detections

* cuda backend fixed

* add model eval and spec

* convert images to rgb

* fix eval

* simplify examples code

* remove extra code

* meshgrid using tinygrad

* removing numpy

* roi align, floor, ceil

* remove numpy from level_mapper

* remove numpy from pooler

* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"

This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.

* roi align gather

* fix master merge

* revert to old floor, ceil as ints present in domain

* use log2 op

* fix indexes

* weird bug with ints and gpu

* weird bug with ints and gpu

* refactors, add env var for gather

* floor with contiguous, where

* refactor topk, sort

* remove staticmethod

* refactor stride

* remove log2 mlop

* realize -> contiguous

* refactor forward

* remove num_classes, stride_in_1x1 from state

* refactor forward

* refactoring

* flake8

* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk

* keep using tinygrad for smaller gathers

* fix empty tensors

* comms

* move from tensor.py

* resnet test passing

* add coco dataset back

* fix spaces

* add test for log2

* no need to create Tensors

* no need to create Tensors

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-06-25 15:37:51 -07:00
Francesco Castelli
6ff720103e Reduce tensor dot line count and fixed 1d tensor dot (#1045)
* fixed tensor.dot

* no 1d dot for image=1

* shorter lines

* add 3d dot tests
2023-06-25 10:32:45 -07:00
George Hotz
18892242b0 global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
George Hotz
0d4c4f4e9e metal ci attempt (#1010)
* metal ci attempt

* skip failing ops tests

* skip in the ops test

* no dtype test
2023-06-19 09:23:55 -07:00
George Hotz
5428b5d774 good changes from tensor_cores branch (#1005)
* good changes from tensor_cores branch

* touchups

* real_strides fixup

* refactor merge_views
2023-06-18 20:28:06 -07:00
Diogo
d2b837c1d9 Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
George Hotz
ba56ee6020 RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdna3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
George Hotz
80e665bddb a couple new tests 2023-06-13 12:36:05 -07:00
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
George Hotz
48e9461197 broken tests for #862 and #942 2023-06-09 22:02:59 -07:00
cloud11665
43ea1614b0 fix inf/nan codegen (#935)
* fix inf/nan codegen

* remove nasty oneliner, fix -inf

* inf/nan const mul/div tests
2023-06-05 11:24:09 -07:00
Filip Dimitrovski
78460034ff Initial ellipsis support when slicing Tensors (#843)
* Initial ellipsis support when slicing Tensors

* Better comments in ellipsis slicing

* Formatting
2023-06-05 07:52:49 -07:00
Tom Edwards
5bbcbd145c Add cumsum with n-dim inputs (#922)
* add cumsum with n-dim inputs, over arbitrary axis + relevant tests

* increased rtol for cumsum test

* move test_cumsum into test_ops

* skip arange test for images as relies on cumsum

* Fix typo

* rewrite cumsum to work with images
2023-06-04 16:55:23 -07:00
Alexey Zaytsev
5feee9c94b Fix .std() tests on torch=1.13 (#904) 2023-06-02 07:33:51 -07:00
SnakeOnex
67a7674787 added conv1d tests -> simple, padding, stride, asymmetric padding (#896)
* added conv1d tests -> simple, padding, stride, asymmetric padding

* fixed linting

* skip conv1d tests for images
2023-06-01 13:10:37 -07:00
Joqsan
ef129bcb85 Zero dim Tensor support (#777)
* add and reorganize test_slice_* tests

* refactor Tensor.__getitem__()

* preliminary tests for 1) 0D tensors and 2) varargs for Tensor.zeros and Tensor.ones

* always compare shapes of the numpy arrays obtained from tinygrad and torch tensors

* add more tests for 0D support

* remove test_tensor.test_slicing(). All slicing tests at test/test_ops.py

* add zero-dim support

* make test_end2end.py consistent with 0dim support

* add test for tensor with zero in shape

* don't simplify ones if shape is ()

* skip tests that need zero-size tensor support.

- zero-size tensor support not related to 0dim tensors.

* add tests for __getitem__() supporting strides >= 1

* refactor __getitem__: support for strides >= 1

* minor refactors and add comments to __getitem__

* add tests for slices with negative steps

* add support for slices with negative strides
2023-06-01 11:32:02 -07:00