Commit Graph

268 Commits

Author SHA1 Message Date
chenyu
175cdbe815 fix pad None will value (#2308) 2023-11-14 23:57:05 -05:00
George Hotz
78623ba204 two simple tests 2023-11-10 16:16:06 -08:00
George Hotz
85d26ddc36 uops loop removal (#2262)
* remove the loop

* cleanups

* tests failing still

* global_loop_ctx wasn't needed

* replace_op is cleaner

* minor opt

* cast opt was wrong

* uop_num

* uop num was dumb

* tuplize_uops

* torch tests

* fix test_uops
2023-11-10 15:24:47 -08:00
George Hotz
38b7f5a7fd less phi, proper phi (#2241)
* less phi, proper phi

* disable flaky whisper test
2023-11-08 16:13:43 -08:00
chenyu
719a97b337 fix IMAGE=2 failed with NOOPT=1 (#2209)
* IMAGE=2 failed with NOOPT=1

* fix it
2023-11-05 13:16:37 -08:00
chenyu
f582ec56d5 Replace (getenv("CI", "") != "") with helpers.CI (#2213) 2023-11-03 15:20:44 -07:00
George Hotz
b245f1307e add exp2 (#2192) 2023-10-31 17:48:42 -07:00
George Hotz
87b714b8cb split test_conv2d 2023-10-18 14:00:50 -07:00
George Hotz
15da96f393 print test durations and add speed (#2107)
* print test durations

* decrease sizes to increase speed

* faster

* GPU/CLANG onnx in seperate runner

* test split, move ONNX CPU CI

* simpler tests

* simpler uops test

* faster

* less cuda apt

* running ninja install

* apt install

* split fancy indexing
2023-10-18 13:46:42 -07:00
George Hotz
c5edb3c374 train value net, improve API, add BCE (#2047)
* api cleanups, BCE losses

* valuenet

* fixup examples

* learning okay

* add valuenet runner

* net improvements

* net improvements

* 40% win rate
2023-10-12 07:56:38 -07:00
geohotstan
8d6cecb25c Torch eq fix (#1562)
* init

* Revert "init"

This reverts commit 682bf2073a.

* kids dont do drugs

* one way to fix

* resolve merge conflict

* no more or

* clean up
2023-10-11 12:57:11 -07:00
George Hotz
adab724caa schedule2, keep the tests working with small changes (#1932)
* lazy cleanups

* ast functions take in LazyOps

* op instead of self.op

* _base for mops

* fix contiguous

* start schedule

* test_schedule

* fix openpilot

* more tests

* bugfix and test skip

* work

* make sure things get freed

* fix zerosized tensors

* fix failing test

* fix ceil and friends

* fix openpilot

* disable training

* disable test collectives
2023-09-28 09:14:43 -07:00
geohotstan
e36148b1ce Make __getitem__ TINYer (#1661) 2023-09-02 23:01:01 -04:00
George Hotz
cd844ec4b2 remove Token class (#1723)
* no fusion

* no float4 grouping

* mulacc fusion is fine. remove uop_alu

* fully remove get_grouped_maybe_float4

* removed that test

* that's not float4 anymore

* disable failing arm64

* metal ops pass tokenless

* fix wmma

* update test_uops with new style

* fix gep

* fix float4 store

* fix float4 store more

* cuda tests pass

* disable broadcast pow

* fix ptx

* reenable arm64

* bring cse back

* don't cache the acc

* fix ptx bug
2023-09-01 12:53:07 -07:00
George Hotz
458eb89463 minor changes from prerender (#1734) 2023-09-01 10:04:47 -07:00
George Hotz
e3a062ad17 real matvec test 2023-08-31 17:27:25 -07:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b do replace div->mul for non-floats (#1644) 2023-08-23 07:34:31 -07:00
George Hotz
db8344ab83 add noalias to llvm (#1622) 2023-08-22 09:26:01 -07:00
George Hotz
c64c47a6ae test arange simple 2023-08-21 20:16:17 -07:00
Umut Zengin
35bf21276f Argmax/Argmin Feature (#1576)
* implemented argmax and argmin

* lint

* lint

* match torch behaviour

* format

* removed flip
2023-08-20 18:46:46 -07:00
geohotstan
a293c18d34 Gather bugfix (#1561) 2023-08-16 19:53:14 -04:00
geohotstan
8763037f0e Fancy indexing is fancy wow and gather thing (#1399) 2023-08-16 18:35:49 -04:00
nimlgen
b6937acb7e fix casting behavior for interpreted buffers (#1525) 2023-08-13 19:21:37 -07:00
George Hotz
38fe84d92b cleanup mlops (#1521)
* cleanup mlops

* that line belongs there
2023-08-10 19:53:28 -07:00
geohotstan
07b79f210f llvmir support for bool <-> float casting (#1492) 2023-08-09 13:12:52 -04:00
Jacky Lee
ef5f648e2f Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502)
* Implement scaled_dot_product_attention and test

* Support attn_mask

* Support is_causal too

* Use in llama

* Don't forget to reshape

* Set requires_grad=False for causal

* Remove staticmethod

* Remove extra spaces
2023-08-08 23:27:13 -07:00
George Hotz
d24f936501 just cmplt (#1493)
* just cmplt

* fix maximum

* don't save, there's no backward

* ugh, no slot either

* eq is a scam
2023-08-08 13:58:10 -07:00
nimlgen
932dad1a2b fix cast bool->float in llvmir (#1480)
Closes #1479
2023-08-07 21:30:51 -07:00
Diogo
d7d1011f1e Add WEBGPU tests to CI (#1463)
* webgpu tests

* assert device is webgpu

* missed env set

* exclude failing ci tests

* ignore test file

* changed acc for adam test
2023-08-06 10:32:01 -07:00
Francesco Castelli
579f4615a0 Add assert for wrong matmul/dot shapes (#1438) 2023-08-04 18:16:56 -04:00
Umut Zengin
52db7d7435 inf, -inf support for pad (#1436) 2023-08-04 15:05:25 -04:00
Umut Zengin
8889821547 Const pad support to pad2d and slice (#1392)
* slice to pad2d migrate

* Gain line

* Mypy happy

* Mypy happy

* Revert

* whitespace
2023-08-02 08:58:52 -07:00
Diogo
ba5e3818a0 Limit dims based on max size (#1390)
* working

* whitespace

* changed defaults to None

* linter

* last linter error
2023-07-31 19:18:19 -07:00
Umut Zengin
0de5f20970 Re-open constant pad support to Tensor.pad (#1388)
* Added const padding support to .pad

* Linter
2023-07-31 17:08:57 -07:00
wozeparrot
32d1afa4b5 feat: correct case when base is 0 (#1360) 2023-07-27 13:53:38 -04:00
wozeparrot
c22e77abfd Match torch on fractional negative base pow (#1352)
* feat: match torch on fractional negative base pow

* feat: tests for trunc
2023-07-26 19:14:54 -07:00
Umut Zengin
d4ebadf2da Small Tensor.cat optimization and reformating (#1347) 2023-07-26 18:01:12 -04:00
geohotstan
4056f97187 Gather (#1329) 2023-07-25 15:05:41 -04:00
waifairer
d89fb729e5 flake8 (#1323)
* flake8: Ignore frequent violations, correct infrequent ones

* Ignore some rules in test

* Reorder test ignores

* Lint test + main

* EOF indent

* Include all E71,E72 errors

* Test the failing case in CI

* Revert "Test the failing case in CI"

This reverts commit 110add0a70.

* Push to test!
This reverts commit f317532779.

* ok back to passing
This reverts commit ba5052685f.

* Prove that CI fails when formatting is incorrect.

* Fix formatting

* Remove duplicitous E117 rule

* Use flake8 config for precommit

---------

Co-authored-by: waifairer <waifairer@gmail.com>
2023-07-24 11:19:58 -04:00
George Hotz
086382b64e Revert "Fix max nan (#1298)" (#1334)
This reverts commit 50774470b2.
2023-07-23 20:41:28 -07:00
uncommonSensor
50774470b2 Fix max nan (#1298)
* Fix max nan

* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests

* Fix max nan

* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests
* Turned off due to the need for granularity
2023-07-23 19:39:44 -07:00
cheeetoo
a0965ee198 CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache to slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and durationon all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
madt2709
d2c1e8409a Update arange to be (start, stop, step) (#1308) 2023-07-21 00:27:23 -04:00
Umut Zengin
74e63fe4ee Added test_chunk and fixed (#1283) 2023-07-19 22:21:26 -04:00
Umut Zengin
fde9f0e60d Slice migrated in Eye op (#1281)
* Migrated from slice to pad and shrink, made cleaner

* Changed repeat with reshape and expand
2023-07-19 09:08:38 -07:00
Umut Zengin
fa0265b173 Fix: AssertionError Transpose/Permute when WHERE Op in LB (#1266) 2023-07-18 16:09:19 -04:00
Stan
ed472bffea Fix: negative axis in tensor.cumsum (#1261) 2023-07-17 16:16:38 -07:00
Adrian Kretz
5a8ad57163 Add WHERE ternary (or trinary?) op (#1196)
* Rename FusedOps to TernaryOps

* Support ternary broadcast

* Add where llop and mlop

* Make where op work in cstyle codegen

* Don't skip test_inf_where

* Add backward path to where op

* Use bool in cstyle codegen

* Add LLVM where op

* Add numpy where op

* Add torch where op

* Simplify where mlop

* Update documentation

* Forgot a rename

* Merged relevant changes from PR #1195 onto PR #1196

* Add test to cover changes to linearizer.ast_parse for WHERE op

Without this METAL will try to use ternary op on float4 and fail

* Make where op work in wgsl backend

* Allow ternary ops to be merged

* Make mypy happy

---------

Co-authored-by: Francis Lam <flam@alum.mit.edu>
2023-07-16 00:31:55 -07:00
Stan
264d467f2b Added tensor.squeeze and support for testing exceptions (#1241)
* WIP: `tensor.squeeze` function

* Added `test_except` param to `helper_test_op` to avoid false positives

* Extracted new method `helper_test_exception` for testing exceptions

* Made `squeeze` not throw IndexError when ndim == 0 and dim <= 0 to match PyTorch
2023-07-15 00:33:24 -07:00