Commit Graph

4433 Commits

Author SHA1 Message Date
George Hotz
b245f1307e add exp2 (#2192) 2023-10-31 17:48:42 -07:00
qazal
e2428b63a6 external (#2191) 2023-10-31 13:57:24 -07:00
nimlgen
8c07c73a9b Fix cl map buffer (#2190)
* fix gpu enqueue_map_buffer out of space

* add test
2023-10-31 12:02:46 -07:00
qazal
be5f185ac0 Higher test coverage for dtypes (#2156)
* refactor unit tests for dtypes

* add missing dtypes in llvmir.py and lib.py

* skip torch tests

* webgpu

* cleaner skips

* fix llvm bool casting issue using compare

* llvm 100% passing

* llvm segfault

* TEMP decrease timeout mins to 11

debug

* add bf16 to setup

* skip half tests in cuda cpu

* check for CUDACPU insetad

* add int16 to triton dtypes

* u16 for triton

* remove debug - diff is still hard to read

* derive from base class TestDType

* enhance test_upcast and downcast by running on every possible version

* dummy commit to rerun the flakey test

* skip the correct tests for CUDA

* bf16 should be skipped in the common TestDType cases

* re-enable bf16

* more consistent structure

* tiny changes to is_dtype_supported 1

* tiny changes 2

add reason

* fuzz

* fuzzer p2

* run fp32 twice

* remove duplicate fp32 run

* clang: use stdbool

* skip triton on bool casts

* merge and resolve conflicts
2023-10-30 22:38:42 -07:00
Akshay Kashyap
018bd29e37 Enable Multi-Output Export (#2179)
* Enable Multi-Output Export

* Add test

* Update examples and lint

* fix padding

* test ops

* dummy commit to rerun test

* revert cuda lint

* Enforce tuple/list of tensors

* subscripted generics

* put back webgpu test

* Re-enable WebGPU Efficientnet test
2023-10-30 18:42:26 -07:00
qazal
a7439af786 Fix llvm int->bool cast (#2164)
* add to ir

* add test case

* minimize diff

* todo

* enable fast math

* added both False and True case
2023-10-30 15:28:23 -07:00
chenyu
3c88af5071 use unique table name for each disk_cache test (#2184) 2023-10-30 13:49:49 -07:00
George Hotz
194e4ad6f8 Revert "optimizer: simplify GROUP and LOCAL to have one of each (#2162)" (#2182)
This reverts commit 8cf0bb9351.
2023-10-30 10:22:26 -07:00
Francis Lam
8cf0bb9351 optimizer: simplify GROUP and LOCAL to have one of each (#2162)
* optimizer: simplify GROUP and LOCAL to have one of each

Now that tensor cores only use LASTLOCAL, we can simplify to use
only that op everywhere.

The only use of GROUP is in matvec hand-coded opts and it doesn't
make a performance difference so switching to use only the top
behavior.

Also adds additional asserts to prevent tensor core dims from
being altered which causes bad kernels to be generated.

* search: remove duplicated actions
2023-10-27 11:37:44 -10:00
George Hotz
e0201922e3 Q network for pruning BEAM / uops deduping / BEAM_ESTIMATE (#2142)
* stable diffusion < 324ms

* revert swap action

* fix tests due to more sum splitting

* REDUCEOP_SPLIT_THRESHOLD env var

* added from unaligned np test (#2134)

* align cpu buffer before copy into cl buffer (#2135)

* remove shelve from handcode_resnet50_opt.py (#2139)

* Add dictionary keys to reduce db size (#2131)

* work

* ignore beam cache

* dictionary keys are generic

* minor db cleanups

* fix baseline and extract dataset

* fix training

* log likelihood

* more lin to feats

* sts

* training policynet

* net sort of works

* dedup

* refactor, stupid new actions

* fix uops deduping

* BEAM_ESTIMATE

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: imaolo <56898718+imaolo@users.noreply.github.com>
2023-10-27 10:53:06 -10:00
chenyu
9215bccb41 Tensor.uniform set default to standard uniform (#2158)
* Tensor.uniform set default to standard uniform

* clean up test to reuse function
2023-10-27 16:15:30 -04:00
Roelof van Dijk
36ab04ae35 perf: lazyop as dataclass (#1603)
* perf: lazyop as dataclass

fix: linter

fix: restore eq

* use builtin methods, buffers to property to allow freezing

* fix: reduce diff

* fix: can't freeze due to KOPT tests, mypy

* fix: explicit hash

* can freeze if tests are fixed

* fix: typo

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-25 17:54:30 -04:00
Francis Lam
bf3490cdf9 wmma: refactor tensor cores using existing local dims (#2097)
* wmma: refactor tensor cores using existing local dims

* optimizer: fix bad rebase and break after one late local

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-25 13:10:46 -04:00
wozeparrot
c29653605e hip multigpu training (#1878)
* feat: move to hip

* feat: special path for RawBufferTransfer

* feat: initial rawbuffertransfer

* feat: hip ipc

* feat: working hip ipc

* feat: need to base device without args

* feat: close mem handle

* feat: modified test

* feat: more multihip stuff

* clean: cleanup

* feat: cleaner

* feat: don't crash

* feat: test more

* clean: way cleaner hip wrapper

* feat: barrier

* feat: barrier

* feat: this breaks stuff

* feat: we can use empty here

* feat: maybe fix tests

* feat: maybe fix tests again?

* fix: probably fix tests

* feat: no waiting here

* feat: wait here

* feat: much larger test

* feat: need to sync here

* feat: make this async

* feat: no waiting!

* feat: cut here

* feat: sync copy

* feat: random imports

* feat: much cleaner world

* feat: restore this

* feat: restore this

* clean: cleanup

* feat: set this
2023-10-24 17:35:53 -04:00
nimlgen
2e89fd264f Refactor hipgraph (#2141)
* refactor hip graph

* linter happy

* happy liner
2023-10-24 15:45:56 -04:00
George Hotz
cea2bc7964 Add dictionary keys to reduce db size (#2131)
* work

* ignore beam cache

* dictionary keys are generic

* minor db cleanups

* fix baseline and extract dataset

* fix training

* log likelihood
2023-10-24 10:49:22 -04:00
imaolo
6ee0435263 added from unaligned np test (#2134) 2023-10-23 11:38:57 -04:00
George Hotz
6dc8eb5bfd universal disk cache (#2130)
* caching infra for tinygrad

* nons tr key

* fix linter

* no shelve in beam search

* beam search caching

* check tensor cores with beam too

* pretty print

* LATEBEAM in stable diffusion
2023-10-22 10:56:57 -07:00
Francis Lam
ace6b2a151 optimizer: add test for correctness of opts (#2124)
* optimizer: add test for correctness of opts

Also added OptOps.UPCASTMID to constrain valid axes for opts with
group_for_reduce.

* llvm: fix LinearizerOptions to correctly not has_shared

* optimizer: remove premature test scaffold for TC opts

* search: fix the action space
2023-10-22 08:02:22 -07:00
George Hotz
cb508e6923 uops graphing + phi (#2120)
* uops graphing

* add_phi_node

* less phi nodes

* where graph uops should live

* naming

* move it to external

* fix triton yolo

* fix clang and preserve behavior
2023-10-19 22:26:28 -07:00
20kdc
bedd028061 waifu2x vgg7: testcase, auto-RGBA->RGB, function to grab pretrained models, training "fix" (#2117) 2023-10-19 22:07:15 -07:00
qazal
36d4001b4f add test coverage for search (#2104)
* add test coverage for search

* only in compiled backends

* dont use device.default in decorator

* time_til is the other way around xd
2023-10-19 17:06:47 -07:00
David Hou
95e17ff0d4 fix wino mask upcast calculation (#2057)
* fix wino mask upcast calculation

* add tests for wino upcast hcopt

* add info to note

* real world wino hcopt test

* wino backward test

* whitespace
2023-10-18 16:54:48 -07:00
George Hotz
87b714b8cb split test_conv2d 2023-10-18 14:00:50 -07:00
George Hotz
15da96f393 print test durations and add speed (#2107)
* print test durations

* decrease sizes to increase speed

* faster

* GPU/CLANG onnx in seperate runner

* test split, move ONNX CPU CI

* simpler tests

* simpler uops test

* faster

* less cuda apt

* running ninja install

* apt install

* split fancy indexing
2023-10-18 13:46:42 -07:00
George Hotz
881fd7c141 add mops to graph, refactor IMAGE (#2100)
* add mops to graph, refactor IMAGE

* no reshape pushing

* add todo

* fix openpilot model alt

* push reshapes reduces kernels in new op

* IMAGE=2 is a first class citizen now
2023-10-17 21:27:51 -07:00
Umut Zengin
01b98b7f42 MulNode.__lt__ rule (#2086)
* Added the rule

* Added tests

* flake8

* self.b == -1 shortcut
2023-10-17 13:18:35 -07:00
Szymon Ożóg
4bef1591f0 Disable ocelot cache + fix matvec in triton (#2010)
* Revert "disable flaky triton test"

This reverts commit 1e15fdaee7.

* Update test.yml

* check if has shared for matvec

* disable ocelot cache for triton

* disable ocelot cache

* disable ocelot cache

* pass shared to triton uops tests

* temporary debugs for CI crash

* Revert "temporary debugs for CI crash"

This reverts commit fee3ea96c8.

* Revert "triton isn't tested, and allows this refactor (#2007)"

This reverts commit dea8bb0938.

* add runtime_args to every renderer, move triton local size override to runtime args

* Add binary to args, correct type returned

* update to new loops

* Update test.yml
2023-10-17 10:33:32 -07:00
geohotstan
5ed630204b Add ONNX to CI for other backends (#2069)
* some cleanup

* move continue back

* more more more

* added to CI

* try

* try intentionally break some tests

* wtf

* del True for test

* yay tests broke, now pls no break

* try AGAIN

* gahy

* lol

* try

* move over constant

* moved over MORE

* move shrink over

* trailing lines

* try CUDA CI

* try again

* boom

* oops

* improved comments

* try: disable some flags and disable CUDA

* try breaking tests

* traceback has too much info so add --tb=no

* revert forced CI failure

* add comments and del unused imports

* oooooooo using regular debug try enable tb

* intentionally break tests

* added tb back. Maybe not too verbose

* strip whitespcae

* missed something

* Shape op int32 -> int64

* oops missed something

* add some types

* get rid of crazy 1 liners in pad op

* actually test Split this time LOL

* strip that whitespace
2023-10-17 09:33:54 -07:00
George Hotz
5a4a62ecae Disable logging in early compile2 and lower kernel counts (#2090)
* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.

* gate behind OPT >= 4

* disable_logging in schedule

* simple

* from master

* more images

* revert that

* 206 kernels
2023-10-16 20:15:24 -07:00
George Hotz
1bf4aef0f5 fix image dtype cmp (#2089)
* fix image dtype cmp

* print that with debug 3
2023-10-16 17:52:38 -07:00
George Hotz
e4846771b2 Revert "limit metal buffers and revert the 207 fix (try 2) (#2088)"
This reverts commit 5e24dc5a95.
2023-10-16 17:50:11 -07:00
George Hotz
d0aaf7d83b Revert "Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)""
This reverts commit f22a7cf656.
2023-10-16 17:47:00 -07:00
George Hotz
5e24dc5a95 limit metal buffers and revert the 207 fix (try 2) (#2088)
* limit metal buffers

* look at the base, not the srcs

* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.

* add a test for that
2023-10-16 14:52:16 -07:00
George Hotz
e8fcd2f3db Revert "limit metal buffers and revert the 207 fix (#2087)"
This reverts commit 2fb10f6a19.
2023-10-16 14:32:22 -07:00
George Hotz
2fb10f6a19 limit metal buffers and revert the 207 fix (#2087)
* limit metal buffers

* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.
2023-10-16 14:26:32 -07:00
George Hotz
c36d306606 KOPT is over, BEAM is upstream (#2071)
* create cache for q learning

* make linter happy

* global beam

* where it belongs

* bugfix

* ditch the kopt, use the beam

* faster lin and DEBUG=2 okay

* remove kopt, move search to features
2023-10-16 09:46:03 -07:00
Umut Zengin
776605f2fc O(1) VALIDHACKS (#2072)
* first refactoring

* O(1) validhacks

* O(1) validhacks

* Some cleaning

* mypy

* flake8

* Trim trim

* flake8

* clean

* less chaotic

* less chaotic

* flake8

* Symbolic, SumNode include mulnode for gcd

* fix tests

* smal optim

* revert

* clean

* clean

* flake8

* small fix

* Add symbolic test
2023-10-15 11:26:41 -07:00
mmmkkaaayy
91168a28c4 whisper: make file transcription work, add basic CI test (#2042) 2023-10-13 17:13:35 -07:00
George Hotz
924ecc4d6a Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)
This reverts commit 63869c62fc.
2023-10-13 12:01:55 -07:00
Amrit Sahu
63869c62fc openpilot kernel fix from 209 to 207 (#2006)
* Fix openpilot kernel from 209 to 206

1. Use push_movement_ops conditions in _movement_op. Don't push
PAD or check if the ops are safe to be pushed with PAD

2. Don't push if all the op.buffers are realized

* change ALLOWED_KERNEL_COUNT to 206 for openpilot

* don't push through sourceless buffers

* change the tests to adjust kernel counts for new behaviour

* restore pushing of movement ops through childless buffer

* don't push EXPAND, causes OOM

* allow push of intermediate movement ops

* adding new test behaviour

* modifying external_test_opt for new behaviour

* restore old tests

* Reenable push of EXPAND and introduce new tests

I was wrong intially thinking EXPAND can cause OOM and hence I had
disabled it. Since it is 0 stride and doesn't allocate memory its cool

* Don't push EXPAND above LoadOps LB. This is causing OOM

* Push should be decided on movement root of bufs

To check if ast.op.buffers is sourceless/ realized go the the movement
root and then decide if pushing should be done or not

* refactor for readability

* use .base instead

* don't push expand, bad memory/compute consumption

* restrict push of reshape, seeing improvement

* push reshape if unary without further check

* disable PAD solves convnext kernel count increase

* reenable test_cache_binaryop_transpose

* small nit
2023-10-13 11:59:15 -07:00
George Hotz
90c777d815 remove apply_auto_opt (#2063) 2023-10-13 07:44:14 -07:00
nimlgen
bd42fa0b73 kernel cache (#2035)
* init compiled cache

* clang not compile to stdout

* use kwrags in compile

* remove some useless lines

* slimmer

* fix

* tabs

* retry

* remove decorator

* no race in hip

* smaller hip

* unused import

* unused pathlib

* path to str

* add test

* fix linter

* less lines?

* decorator is back

* update tests

* no hip version

* better comments

* a bit better test

* linter

* work wo decorator

* linter happy

* simpler return type

* more tests

* better comment

* readable

* readable

* readable

* compile returns bytes

* no ununsed imports

* readable
2023-10-13 06:32:01 -07:00
Umut Zengin
6b7ac5c431 ModNode __mod__ rule (#2039)
* Implement mod rule

* mypy

* feat: New test added
2023-10-12 11:30:10 -07:00
George Hotz
c5edb3c374 train value net, improve API, add BCE (#2047)
* api cleanups, BCE losses

* valuenet

* fixup examples

* learning okay

* add valuenet runner

* net improvements

* net improvements

* 40% win rate
2023-10-12 07:56:38 -07:00
geohotstan
8d6cecb25c Torch eq fix (#1562)
* init

* Revert "init"

This reverts commit 682bf2073a.

* kids dont do drugs

* one way to fix

* resolve merge conflict

* no more or

* clean up
2023-10-11 12:57:11 -07:00
George Hotz
41bfeb2c1e start work on auto opt (#2034)
* start work on auto opt

* lin failure

* not beating hcopt

* greedy

* timing is fast

* codegen.search

* greedy search in handcode_opt

* track running gflops

* clean up those files

* no failure
2023-10-11 12:54:53 -07:00
Francis Lam
81c7d750db test: fix test_linearizer.test_tensor_core test (#2036)
must use apply_tensor_core instead of hand_coded_optimizations
2023-10-10 14:48:28 -07:00
chenyu
e2b83f1b42 Variable.bind newer (#2017)
* Variable.bind attempt 2

* ShapeTracker.unbind

* fix llama

* fix types

* test case

* View.vars cleanup

* include mask in symbolic source

* mask can be sint

* st.unbind in bufferops

* assert ast contain free Variable only

* cleanup

* conservative unbinding reduce op arg

* move reduceop unbind

* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
qazal
e40f141203 Refactor and add more unit tests for disktensors (#2022)
* testing with the test_ops pattern

* add assign test

* flake8 complaining about single line fn

* slice 2d and minor cleanup

* make assign_slice a one-liner

* we dont need to repeat the same lambda twice, default tinygrad_fxn to be np_fxn

* back assign fn for np array

* implement __setitem__ in tensor.py

* dont re-slice the ret tesnsor

* one liner assign

* drop the permute test
2023-10-09 18:46:29 -07:00