41 Commits

Author SHA1 Message Date
Yixiang Gao
4d54afb6df sparse cat cross entropy (#1597)
* add sparse cat cross entropy

* minor fix

* add log_softmax into loss function

* add test

* update docs

* fix training loss

* add device
2023-08-21 14:14:54 -07:00
George Hotz
2e60920317 Revert "sparse cat cross entropy (#1591)" (#1596)
This reverts commit f0ee850e98.
2023-08-21 10:04:26 -07:00
Yixiang Gao
f0ee850e98 sparse cat cross entropy (#1591)
* add sparse cat cross entropy

* minor fix

* add log_softmax into loss function

* add test

* update docs
2023-08-21 09:56:41 -07:00
Yixiang Gao
8d6662a741 .cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()

* restore ops_torch

* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Diogo
ba5e3818a0 Limit dims based on max size (#1390)
* working

* whitespace

* changed defaults to None

* linter

* last linter error
2023-07-31 19:18:19 -07:00
cheeetoo
a0965ee198 CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache to slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and durationon all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
Stan
872e2198fe Added nn.ConvTranspose1d (#1243)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-15 00:42:42 -07:00
Stan
a8f3b3f4ed Added test for nn.Conv1d (#1242) 2023-07-15 00:30:50 -07:00
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* FIx efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and paralel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_trasnpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* seperate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
Diogo
0dab8edc97 support Int64 type in cstyle gen (#860)
* added metal int64 and some simple tests

* removed bool return type def

* typo in test

* also missing in clang and gpu runtimes

* switched order for opencl

* increased atol and removed new line in kernel prefix
2023-05-30 16:04:46 -07:00
Jacky Lee
5d212864b5 Add MLPerf UNet3D model (#775)
* Add ResNet inference test and cannon

* Test with ResNet50

* test_car works with resnet fix

* Add KiTS19 dataset

* KiTS19: Implement iterate

* No batch load for this dataset

* Save results on iterate

* Implement dice score

* Add data prep and eval functions

* Resolve shape issue

* Conversion works but wrong values

* Segfaults when load_from_pretrained is called

* Fix segfault and assign properly

* Final result generated, though very slow

* Store and load final result to save time

* Fix typo in finalize

* Score computes

* More bug fixes, dice score is very low

* Working broken code

* Assign output values to result

* Getting a much higher score now

* Fix dataset preprocessing

* Mean DICE score of 88.5

* Ugh, typo

* Attempt to reimplement model

* Rename layers

* Tiny model works, kinda

* Accuracy? gone

* Implement InstanceNorm and match torch

* Test instance norm 2d and 3d

* Combined input block with downsample block

* Tiny model works, support strided convtranspose

* Commands to download dataset

* Clean up a bit

* unet3d_v2 -> unet3d

* Remove duplicated code

* Oops, put tests back
2023-05-28 20:38:19 -07:00
kposborne2
2163a1b049 Add shrink step to fix strided conv_transpose2d, and add to nn (#823)
* implement conv transpose 2d

* don't inherit, remove old assert

---------

Co-authored-by: Kyle <kposborne@gmail.com>
2023-05-28 07:52:45 -07:00
symlon
04284414db Batchnorm2d fixed running_var (#807)
* BatchNorm2d match pytorch

* removed comment

* Batchnorm2d test multiple sizes
2023-05-26 12:32:49 -07:00
wozeparrot
0dc333cfab Promote Embedding to nn (#798)
* feat: promote Embedding to nn

* fix: fix failing test

* feat: add test with jit

* feat: rewrite embedding to no longer need stacked for loops

* clean+fix: don't know how that happened
2023-05-25 18:39:45 -07:00
George Hotz
d6f4219952 LayerNorm2d for 2 lines 2023-03-20 16:58:43 -07:00
George Hotz
8c475ea86a relax atol, merge_view 2023-03-03 07:48:44 -08:00
George Hotz
9d539b8ebb more intuitive output shape from _pool 2023-02-28 11:41:48 -08:00
George Hotz
758515dcc0 conv2d is an hlop (#589)
* conv2d is an hlop

* shorter conv

* KOPT=-1

* alt imp

* MULACC

* smarter mulacc

* pop conv

* 7x7 -> 5x5

* didn't fix, that's not going to work

* this is faster and matches old behavior

* oh, non lazy just won't work with mulacc

* mulacc in torch

* bool types were creeping in

* optimizer is actually better with hlop conv

* fix pushing permutes issue

* refactor einsum_mulacc

* fix up readme

* update readme

* _image_conv2d

* fix bias addition location

* pushing permutes gets back to 200 kernels

* conv cleanup

* disable hlop conv

* don't hide that in helpers
2023-02-23 17:52:31 -08:00
George Hotz
628ce067a1 add tests to mypy 2023-02-22 07:07:38 -08:00
Kirill
7944cfdadc Remove Tensor.data (#565) 2023-02-18 16:36:12 -08:00
Jacky Lee
e172f0087a BatchNorm2D -> BatchNorm2d (#558)
* BatchNorm2D -> BatchNorm2d

* Fix typo
2023-02-16 12:31:49 -08:00
Jacky Lee
486f023e81 Rename Normalize and move to nn (#513)
* Rename Normalize and move to nn

* Match PyTorch for dim>1
2023-02-01 11:55:03 -08:00
George Hotz
487685919b Revert "Rename Normalize and move to nn (#415)" (#474)
This reverts commit d768acb6a9.
2023-01-25 07:50:04 -08:00
Jacky Lee
d768acb6a9 Rename Normalize and move to nn (#415)
* Rename Normalize and move to nn

* Fix comparison to None error

* Add test for GroupNorm

* Rename test case

* Flip parameters to match PyTorch

* Increase error tolerance

* Fix elementwise_affine on channels

* Match arguments with PyTorch

* Initialize weight and bias only when affine is true

* Is this it?

* A bit cleaner

* Handle case where weight or bias is None
2023-01-25 07:47:59 -08:00
George Hotz
392e57aea7 ugh, why did that fail 2022-10-01 13:38:43 -04:00
George Hotz
ecc1a0470d add Linear to tinygrad.nn 2022-09-07 07:40:48 -07:00
George Hotz
cff297ef9d w/e, that's a later prob 2022-07-17 12:32:50 -07:00
George Hotz
bf299802f8 fixup tests 2022-07-17 12:11:53 -07:00
George Hotz
8ba3d1f803 fix bn test, affine is True 2022-01-15 19:52:15 -08:00
George Hotz
bd21304e3c linear takes in weight and bias 2021-11-30 00:38:47 -05:00
George Hotz
dca076dbf1 remove dumb nn ops 2021-11-29 18:05:31 -05:00
George Hotz
ce3d198bb7 less lines and fix default device 2021-11-27 11:18:49 -05:00
Guglielmo Camporese
2b7589db64 Added ResNet-{18, 34, 50, 101, 152} (#271)
* added resnets

* fix minor

* fix minor

* resnet in models

* added resnet test

* added resnet train test

* added linear, conv2d nn tests

* fix minor in extra/training

* resnet in models

* fix minor

* fix tolerance for linear in nn test

* fix eval, this causes cpu and gpu UT failing

* revert transformer test

* fix minor for CPU test

* improved model get_params for sequential layer

* fix minor for params counting

* commented broken ops tests

* improved train for resnet
2021-06-21 09:37:24 -07:00
Skosh
78aa147b39 [WIP] YOLO working on tinygrad! (#245)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thign

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…
2021-04-25 18:06:52 -07:00
Liam
ebd72ff437 Test split (#231)
* Split tests

Split tests into "Test CPU" and "Test GPU".

Add test flag "TEST_DEVICES" which is a comma separated list of devices:
CPU,GPU,ANE

* Run tests based on provided TEST_DEVICES flag

By default will run all "CPU,GPU,ANE"

* fix bad quote

* Revert changes and use GPU=1

This is done through setting the default Tensor Device to Device.CPU of
GPU=1 is set.

Run GPU tests: GPU=1 pytest -s -v
2021-01-01 09:19:03 -05:00
George Hotz
2f1b2c0a3b add transpose, start on transformer 2020-12-27 16:59:12 -05:00
Liam
bcf1518309 All devices are equal! (#196)
* Update all devices to be tested

ANE, CPU and OCL all now support all tests.

However tests are not currently passing on GPU and I cannot test on CPU.

Failing GPU test are not an issue caused by this update. Tests have not
been passing due to a missing "six" required installation.

OpenCL Tests have not been run since commit: 1a1c63a08b

devices have 3 types and are handle by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this current setup allows for keyword
argument defaults: `device=DeviceType.CPU`)

All references to Tensor.GPU/CPU/ANE as been converted to the
corresponding `DeviceTypes` enum.

Refactor of the conversion code to allow for any device to any device
conversion.

* Add six dependency in requirements.txt

* Resolve failure to run tests

Move six into gpu required installs. Remove six from standard
installation.

* Remove repeated data conversion

* Refactor method names

Also reduce code with .to and .to_

* Dynamic device handlers

* Refactor DeviceTypes -> Device

* Add mem copy profiling back

* test_backward_pass_diamond_model passing

* Resolve Sum issue on GPU

* Revert batchnorm2d tests

* Update README with upadated API

* ANE testing with

* Last minute line gains
2020-12-15 23:44:08 -08:00
George Hotz
ffb96b2d0b batchnorm by marcelbischoff 2020-12-09 03:23:04 -08:00
George Hotz
17659f7dd7 gpu speedup, tests work on M1 2020-12-06 09:05:49 -08:00
George Hotz
c14473f87d unit test for batchnorm2d 2020-10-30 08:19:58 -07:00