86 Commits

Author SHA1 Message Date
chenyu
b5d700adae update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4

* update tests for the new model

* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
643cbdfd50 make embedding and GPT-2 fast (#1631)
* make embedding fast

* jit more, variable shape support

* print mem bw
2023-08-22 15:14:38 -07:00
George Hotz
718ced296c move state to nn/state (#1619) 2023-08-22 07:36:24 -07:00
Yixiang Gao
8d6662a741 .cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()

* restore ops_torch

* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
chenyu
ae39cf84ab Symbolic Shape JIT main PR (#1353)
* Symbolic Shape JIT

update tests

2 variables symbolic ops, adding more tests

test passing

cleanup

* more test cases

* single flag

* review update

* jit attention one piece

* realize

* symbolic_jit test for cuda

* old artifact

* works with cuda gpu but failed ci

* CUDACPU
2023-08-18 14:39:55 -07:00
nimlgen
bd111411bf init allocator for compiled backends (#1467)
* init allocator for compiled backends

* Update ops_webgpu.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-17 10:33:32 -07:00
George Hotz
1e1d48b4e6 single model (#1560) 2023-08-16 13:22:19 -07:00
JaSpa99
491e85597a Run onnx commavq model (#1537)
* try to run commavq

* fix 0 dim, start implementing new ops

- Implement EmbedLayerNormalization
- Implement Attention

* SkipLayerNormalization and FastGelu

* use original torch model, cast inputs

* fix some ops:

- properly do Cast
- Attention: bi- and unidirectional
- FastGelu: add bias before gelu

* cleanup onnx_ops.py

* add validation option to benchmark

* cleanup imports

* add checks incase onnx2torch implements ops in future

* run onnx instead of original torch

* just skip gpu on m1

* reactivate the other models

* check for strange params & squash whitespace

* cleanup

* fix causal mask Attention

* Range doesn't need int cast

* embedding vocab_counter same dtype as input

* no need to cast

* always validate, fix PosixPath ort

---------

Co-authored-by: George Hotz <george@comma.ai>
2023-08-16 12:24:40 -07:00
Steven Anderson
93a36c3659 Arm (#1421)
* testing new memops

* better debugging

* testing padded conv

* branching with load

* refactoring a bit

* first try

* fixing bugs

* fixing some

* eq

* eq2

* do not use x's

* working

* fixing imm

* getting things working

* refactor

* pow not working

* working except one

* refactor: one store mem

* refactor: global load

* refactor: imm

* refactor: cleaning

* fixing big offsets

* refactor with ci

* try ci

* typo

* another typo

* ubuntu default

* forgot git

* do i need git?

* missing packages

* adding python-dev

* with cache?

* buildx action

* buildx name issue?

* maybe now?

* python3

* newline warning

* maybe now

* i actually need this

* ci should work now

* improved caching

* fixing cache

* maybe now it will cache

* this

* testing cache

* trying again

* load

* missing platform

* caching gha

* testing cache

* full testing

* typo

* now?

* why

* adding checkout back

* bad formatting

* fixing convention issues

* supporting python

* adding CI flag

* testing all

* better comments

* adding debugging

* takes 12x longer

* does it output progress now?

* ignore models for speed

* fixing merge

* excluding conv_transpose2d

* only 2 test cuz is to slow

* another approach

* let's see

* faster duh

* my bad

* T_T

* typo

* sup

* with output?

* comment test

* comment test

* comment test

* :?

* no comment

* with cache

* back to normal

* testing that ci works

* back to passing

* trying again

* does it create another entry

* does it create another entry?

* build local

* hey

* Revert "excluding conv_transpose2d"

This reverts commit cc7348de03.

* does it cache if done before?

* does it cache?

* done

* adding test ops

* bad formatting

* no need for this

* working static mem

* sum 1d

* add ndim

* better reg import

* fix stack

* back to np

* working except for softmax

* 5 failing

* no pogress

* remove keystone

* remove keystone

* testops passing

* cleanups

* more cleanup

* typo

* ci

* ci2

* cond import

* ci3

* ci4

* ci4

* ci5

* ci5

* ci6

* aligment

* test all

* correct test

* err read_unmapped

* passing test

* ignore for speed

* ignore for speed

* ci7

* cleanup

* remove docker

* fixing merge

* fixing bugs

* add skipload for const ops

* comments

* First merge to master: Renderer

* fix emulation

* passing all tests arm64

* cleaning

* fix handcoded binary

* cleaning

* fix errs

* fix runtime arg binary

* clean git diff

* fix and clean

* fixing metal test

* cleaning

* fix metal test

* ci ~8 min

* fix pylint and clang

* cache the files in ops_clang

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-14 19:29:30 -07:00
wozeparrot
29d5801387 distributed collectives (#1519)
* feat: world

* feat: tests

* feat: no more backwards

* feat: recv into

* feat: whoops

* feat: test in ci

* feat: some debug logging

* feat: workflow naming

* feat: need to set pythonpath

* feat: just send to same device

* feat: allreduce

* feat: test

* feat: need contiguous

* feat: test in ci

* feat: exit with correct code

* feat: don't need that

* feat: opencl wait_for just doesn't work

* feat: synchronize on out

* feat: try?

* feat: try again?

* feat: add extra realizes

* feat: print

* feat: seed

* feat: tol

* feat: test ones and zeros

* feat: remove print

* feat: are you just flaky

* feat: seperate scatter and gather?

* feat: just try synchronizing

* feat: remove print again

* feat: bring back difference

* feat: no sync

* feat: revert that

* feat: back to wait_for

* fix: typo
2023-08-11 10:22:07 -07:00
wozeparrot
7e7c9001e9 distributed world (#1481)
* feat: world

* feat: tests

* feat: no more backwards

* feat: recv into

* feat: whoops

* feat: test in ci

* feat: some debug logging

* feat: workflow naming

* feat: need to set pythonpath

* feat: just send to same device
2023-08-10 10:00:51 -07:00
nimlgen
046fd7437a use fake buffer for external_test_speed_llama.py (#1478) 2023-08-07 22:05:44 -04:00
George Hotz
2ab282bfec run on update_benchmark too (#1460)
* run on update_benchmark too

* amd inference test

* name it better

* add 10 CIFAR training steps
2023-08-06 08:58:37 -07:00
George Hotz
bf21aec81f do benchmarking (#1451)
* do benchmarking

* system

* artifact

* go

* name artifact
2023-08-05 23:35:01 -07:00
nimlgen
1ba8ae62a1 Match Torch speed for sum reduction (#1387)
Co-authored-by: Alexander Edwards <alex@alexedw.com>
2023-08-05 22:27:33 -07:00
George Hotz
7fa730b506 external model benchmark test 2023-08-05 22:10:48 -07:00
George Hotz
84c430355e fix backends for new style (#1443)
* fix backends for new style

* fix method cache

* fix fakeless

* llvm blacklist

* fix kernel optimizer
2023-08-05 11:07:04 -07:00
Felix
97a6029cf7 Corrected a few misspelled words (#1435) 2023-08-04 16:51:08 -07:00
George Hotz
37fa7e96fb Revert "update editorconfig, enforce via CI (#1343)" (#1380)
This reverts commit da2efecbe2.
2023-07-31 10:35:50 -07:00
Pavol Rusnak
da2efecbe2 update editorconfig, enforce via CI (#1343)
* update editorconfig to set unix-style newlines and trim whitespace

* add editorconfig github action to the CI

* fix whitespace
2023-07-30 18:44:30 -07:00
chenyu
1fdf560fb1 simplify get_contraction (#1373) 2023-07-30 18:35:22 -07:00
Francis Lam
9d142430cb Add option in llama.py to quantize weights to int8 at runtime (#1289)
* Add option in llama.py to quantize weights to int8 at runtime

Also added lm-eval to external

* Add support for llama-2 evaluation
2023-07-24 17:22:38 -07:00
Pavol Rusnak
cd60b8561c Add LLaMA-2 support (#1284)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-07-24 17:12:02 -04:00
waifairer
d89fb729e5 flake8 (#1323)
* flake8: Ignore frequent violations, correct infrequent ones

* Ignore some rules in test

* Reorder test ignores

* Lint test + main

* EOF indent

* Include all E71,E72 errors

* Test the failing case in CI

* Revert "Test the failing case in CI"

This reverts commit 110add0a70.

* Push to test!
This reverts commit f317532779.

* ok back to passing
This reverts commit ba5052685f.

* Prove that CI fails when formatting is incorrect.

* Fix formatting

* Remove duplicitous E117 rule

* Use flake8 config for precommit

---------

Co-authored-by: waifairer <waifairer@gmail.com>
2023-07-24 11:19:58 -04:00
cheeetoo
a0965ee198 CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache to slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and durationon all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
chenyu
aa05495620 symbolic stride (#1326) 2023-07-23 12:41:22 -07:00
George Hotz
45ecae1ab3 Revert "Match Torch speed for sum reduction on M1 (#1187)" (#1286)
This reverts commit 59af9b81c5.
2023-07-19 13:39:16 -07:00
Alexander Edwards
59af9b81c5 Match Torch speed for sum reduction on M1 (#1187)
* Add additional kernel when reducing multiple dimensions at once.

* Faster for smaller inputs

* Whitespace and naming

* Cleaner, guard for Metal only, and max 1 split rather than N

* Draft of different approach

* One additional kernel call for this test (as expected)
2023-07-19 09:18:58 -07:00
chenyu
a5f5330d91 Add Fuzz Test symbolic / shapetracker to CI. (#1278)
* Fuzz test symbolic and shapetracker

This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8.

* mess again

* no tail

* test shapetracker too

* Revert mess and enable all tests

* removed leftover
2023-07-19 09:05:45 -07:00
terafo
aa60feda48 Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Stan
9b6e57eccd helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread

* Made `_early_exec_process` catch any Exception

 Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output`

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE cleaned that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, mybad

* Extracted a class for tests of early exec

* Normalize line endings, windows uses /r/n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
Rayan Hatout
9975f24452 Fold expand preceding reduce if the reduction is on the same axis as the expansion (#1134)
* fold expands that precede a reduce if the reduction is on the same axis as the expansion

* add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization

* add a test case to make sure we don't fold reduce-expand-reduce on different axes
2023-07-06 13:41:05 -07:00
Reza Rezvan
8ae9a054ae Refactor nn.optim (#1091)
* Refactor: nn.optim.py

* Refactor: nn.optim.py; Fix all tests

* Refactor: Replace all optim.get_parameters()

* Refactor: Revert list comp.

* Refactor: Replace optim.get_state_dict

* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
George Hotz
d16c16ec28 new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
Roelof van Dijk
8c65f9324c refactor: print formatting for llama timing (#1050)
* refactor: print formatting for llama timing, report median and individual runs

* feat: back to mean

* fix: whitespace

* fix: add mean to print

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 09:49:31 -07:00
George Hotz
0f281e7b18 touchups 2023-06-25 15:24:26 -07:00
George Hotz
c8fbdeb48e test speed llama (#1046)
* test speed llama

* oops, put it back

* uses the real device codegen

* just do it on the mac

* pp

* is faster?

* Revert "is faster?"

This reverts commit 42db542010.

* disable docker again for less load on CI
2023-06-25 15:22:56 -07:00
Casey Primozic
aab9ee0fca Add RDNA3 assembler UOps.CAST partial support + other fixes/improvements (#1012)
* Add support for one case of `UOps.CAST` for RDNA3 assembler

 * Adds support for casting from `bool` -> `float32`.  Seems like a very common operation that is required in many places.
 * Fix bool register definition for vector operations
   * Use `vcc_lo` instead of `vcc` which seems to be required since it's configured to use wavefront_size=32
 * Add vector support for some places that were scalar only in register definition and comparison ops
 * Fix some issues in what seems to be defunct `external_test_image.py`
   * Some tests still don't pass for other reasons, but it at least runs now and one broken test is now fixed

* Refactor RDNA3 assembler register definition

 * Unify multi-registor code between dtypes and combine with single-register allocation since they're all untyped registers at the end of the day
2023-06-20 11:34:10 -07:00
George Hotz
0ac84d5e94 exclude a few more onnx tests 2023-06-19 08:51:29 -07:00
George Hotz
0fd648dff4 exclude more dumb onnx tests 2023-06-19 08:51:29 -07:00
Diogo
d2b837c1d9 Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
sehaj
775287ed91 Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* spf changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fixx

* added darknet (backbone)

* yolov8 neck, and intialize bias function while detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked

* added letterbox and pre_tranform for pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixes bounding box thickness AND added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifiying arguments

* tests complete, but lots on things are printed on screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indicies

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
George Hotz
2c324d0685 fix metal uaf (#964) 2023-06-09 21:28:06 -07:00
Diogo
666d151f8a Onnx slice fixups (#952)
* resolved some slice test errors and added some more debugging logs

* use same device in cumsum

* increased float priority

* onnx debug ouput match input
2023-06-07 19:44:30 -07:00
MohammedAlkhrashi
2b4baa97e9 exclude string type from external_test_onnx_backend.py (#918) 2023-06-03 19:10:52 -07:00
George Hotz
791530045d Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
Diogo
1a5d72f812 Onnx ops And, Or, Xor, Not (#847)
* onnx and, or, xor, not

* added bool type to llvm and clang

* removed float conversion

* switched where op to use tensor func
2023-05-29 11:09:20 -07:00
George Hotz
ddc9dafe62 tighten up the kernel count tests 2023-05-29 08:48:54 -07:00