Commit Graph

10417 Commits

Author SHA1 Message Date
Mehmet Kuzucu
c3173ff281 Add return statement to the train function (#1135)
add a return statement to the train function in order to provide access to the losses and accuracies lists
2023-07-05 08:13:38 -07:00
wozeparrot
981d4980c4 feat: reword contributing (#1131) 2023-07-04 22:17:47 -07:00
George Hotz
793a670187 from tensor cores + lb touchup (#1127) 2023-07-04 15:45:20 -07:00
George Hotz
2f968f8547 ignore cloudpickle type for local mypy 2023-07-04 13:51:20 -07:00
George Hotz
87d21ea979 examples: simple conv bn 2023-07-04 13:50:26 -07:00
Reza Rezvan
535224ac20 Remove float64 (#1101)
* Refactor: Remove float64

* Refactor: Remove unused imports

* Refactor: Remove float64

* Refactor: Remove float64

* Refactor: Exclude float64 onnx backend

* Add: Skip jacobian and gradcheck tests
2023-07-04 08:40:51 -07:00
Daniel Hipke
b4ce23e4b8 Make cross_process use cloudpickle (#1118)
* fix syntax issues in imagenet_download.py

* use cloudpickle in cross_process to make it work in Python 3.9+

* add cross_process test

* prevent unpickling on every function call

* add cloudpickle to setup.py

* add support for args/kwargs
2023-07-04 00:47:34 -07:00
George Hotz
c709dec8b5 gelu: weird test was broken for metal 2023-07-04 00:43:54 -07:00
George Hotz
daf8e1942f sigmoid: test large positive also and add note 2023-07-04 00:18:31 -07:00
Kunwar Raj Singh
9e6067378f Broken Sigmoid backward: Add test and mlop for Sigmoid (#1113)
* Add failing sigmoid test

* update more tests

* add mlop for sigmoid

* add back test

* math.log(math.e) = 1

* remove divides

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-04 00:14:22 -07:00
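For reference, here is the math a sigmoid mlop encodes. The forward pass is σ(x) = 1/(1 + e^(-x)). The backward pass can reuse the forward output, because σ'(x) = σ(x)(1 - σ(x)).

The sketch below is scalar Python using the `math` module, not tinygrad's mlop machinery. The sign-branching forward is one common way to keep large-magnitude inputs of either sign from overflowing `math.exp`. It is an assumption for illustration, not necessarily what the commit does.

```python
import math

def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp() only ever sees a non-positive argument,
    # which avoids OverflowError for large-magnitude inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_backward(x: float, grad_out: float) -> float:
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)): reuse the forward value
    # instead of differentiating through exp, add and div separately.
    s = sigmoid(x)
    return grad_out * s * (1.0 - s)

print(sigmoid(0.0), sigmoid_backward(0.0, 1.0))  # 0.5 0.25
print(sigmoid(1000.0), sigmoid(-1000.0))         # 1.0 0.0
```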
Daniel Hipke
d58a9603ab Create COCO data directory if it doesn't exist. (#1114)
* Create COCO data directory if it doesn't exist.

* update paths to support windows
2023-07-03 18:15:53 -07:00
Anselm Coogan
a22aad7d32 Use generators instead of lists in anys and alls (#1111)
* Use generators in any(..) instead of lists for better best-case

* Use generators in all(...) instead of lists

* enable R1729 in .pylintrc

* revert import sorting

---------

Co-authored-by: Anselm Coogan <anselm@scandit.com>
2023-07-03 16:06:06 -07:00
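The "better best-case" in #1111 comes from short-circuiting. `any()` stops at the first truthy item, but a list comprehension has already evaluated every element before `any()` sees it. A generator lets `any()` stop early.

This minimal sketch counts predicate calls for each form. The predicate and data are hypothetical.

```python
calls = 0

def is_negative(x: int) -> bool:
    # Hypothetical predicate that records how often it is evaluated.
    global calls
    calls += 1
    return x < 0

data = [-1] + list(range(1000))  # the very first element already satisfies the predicate

calls = 0
any([is_negative(x) for x in data])  # the list is fully built first
list_calls = calls

calls = 0
any(is_negative(x) for x in data)    # the generator stops at the first True
gen_calls = calls

print(list_calls, gen_calls)  # 1001 1
```

The same reasoning applies to `all()`, which stops at the first falsy item.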
tricky-labyrinth
fd98f6cffa Small fix to abstractions.py so it runs on Windows without throwing an AttributeError (#1109)
Co-authored-by: Tricky Labyrinth <trickylabyrinth@gmail.com>
2023-07-03 13:44:49 -07:00
Mike Ovyan
651d080594 [perf] Replace more list comprehension with * (#1106)
* [perf] Replace more list comprehension with *

* comeback

* final fix?

* blind me

* kill me

* ?

* rev

* [none]
2023-07-03 10:49:23 -07:00
Frank Pinnola
2071e53da8 Handle broadcast flag on gemm (#1103) 2023-07-02 22:15:07 -07:00
Taras Tsugrii
cbb5c655e5 [tensor][perf] Replace list comprehension with *. (#1102)
It's more concise, idiomatic and faster:
```
In [8]: %timeit [1 for _ in range(100)]
2.12 µs ± 26.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [9]: %timeit [1] * 100
515 ns ± 5.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
2023-07-02 18:34:23 -07:00
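Keep one caveat in mind when applying this rewrite. `[x] * n` repeats a reference to the *same* object n times, so it is only a drop-in replacement when the element is immutable, like the ints above. A small illustration:

```python
# For immutable elements, both forms build equal lists.
a = [1 for _ in range(100)]
b = [1] * 100
assert a == b

# For mutable elements, * shares a single object across every slot.
rows_shared = [[0]] * 3
rows_shared[0].append(1)
print(rows_shared)    # [[0, 1], [0, 1], [0, 1]]

# A comprehension creates a fresh object per slot.
rows_separate = [[0] for _ in range(3)]
rows_separate[0].append(1)
print(rows_separate)  # [[0, 1], [0], [0]]
```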
David Hou
363fbfc2e4 do not emit loop end code for global+local loops in assembly kernel (#1100) 2023-07-02 18:33:57 -07:00
Reza Rezvan
8ae9a054ae Refactor nn.optim (#1091)
* Refactor: nn.optim.py

* Refactor: nn.optim.py; Fix all tests

* Refactor: Replace all optim.get_parameters()

* Refactor: Revert list comp.

* Refactor: Replace optim.get_state_dict

* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
Eli Frigo
10f1aeb144 fixed broken link (#1097) 2023-07-02 15:06:59 -07:00
Rob Grossman
c8ddc34368 include missing queue in thneed load (#1095) 2023-07-02 12:33:59 -07:00
nmarwell26
12ce68c1ee Renamed examples/yolo to examples/vgg7_helpers because that directory contains no yolo-related code and only helper code for vgg7. This was confusing to a new user when trying to understand the examples. (#1086) 2023-07-01 12:04:28 -07:00
Rob Grossman
2533a992e7 remove unused imports in models (#1088) 2023-07-01 12:04:19 -07:00
geohotstan
575f75f613 hello (#1084) 2023-07-01 01:29:35 -07:00
foreign-sub
574cbda979 Quickstart (#1015)
* fix quickstart md

* add quickstart to ci
2023-06-29 13:26:58 -07:00
Roelof van Dijk
542b2d93a5 Perf/cache string ops (#1078)
* perf: remove extra function, include in cached getitem

* perf: only calculate hash once per node

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-29 13:23:11 -07:00
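"Only calculate hash once per node" follows a standard Python pattern. Compute the hash in the constructor, then return the stored value from `__hash__`. This is safe only as long as the fields the hash depends on are never mutated.

The `Node` class below is a hypothetical illustration, not tinygrad's symbolic `Node`.

```python
class Node:
    # Hypothetical immutable expression node that hashes its key once.
    def __init__(self, op: str, *args: int) -> None:
        self.op, self.args = op, args
        self._hash = hash((op, args))  # computed once, at construction

    def __hash__(self) -> int:
        return self._hash              # no recomputation on dict/set lookups

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Node) and (self.op, self.args) == (other.op, other.args)

cache = {Node("add", 1, 2): "1+2"}
print(cache[Node("add", 1, 2)])  # 1+2
```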
George Hotz
e234bf2298 hip matmul: add K support 2023-06-28 19:54:33 +00:00
George Hotz
0e93b9642a hip matmul 2023-06-28 19:21:01 +00:00
Jacky Lee
754e54ebb9 Fix Tensor ceil and floor for whole numbers (#1071)
* Works on non-special numbers

* Test different cases
2023-06-27 23:22:17 -07:00
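The commit does not spell out the bug. A typical pitfall when deriving ceil and floor from truncation is applying the +1 or -1 unconditionally, which is wrong when the input is already a whole number.

The scalar functions below are a hypothetical illustration of the corrected logic, not tinygrad's `Tensor.ceil`/`Tensor.floor`.

```python
import math

def naive_ceil(x: float) -> float:
    # Hypothetical buggy version: naive_ceil(2.0) == 3.0 (and it is wrong for negatives too).
    return float(int(x)) + 1.0

def floor_via_trunc(x: float) -> float:
    t = float(int(x))                  # int() truncates toward zero
    return t - 1.0 if t > x else t     # only negative non-whole inputs need the -1

def ceil_via_trunc(x: float) -> float:
    t = float(int(x))
    return t + 1.0 if t < x else t     # whole numbers satisfy t == x, so no +1

for v in (-2.5, -2.0, 2.0, 2.5):
    print(v, floor_via_trunc(v), ceil_via_trunc(v), math.floor(v), math.ceil(v))
```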
George Hotz
1f5d45ca8c imagenet loader minor cleanups 2023-06-28 05:08:09 +00:00
George Hotz
6ec0a24706 imagenet eval in 1 min 28 sec 2023-06-28 04:23:26 +00:00
George Hotz
9fabdbd054 speed (#1070) 2023-06-27 20:28:57 -07:00
George Hotz
d16c16ec28 new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
ernie
4d703be6d7 fix typo (#1065) 2023-06-27 10:56:54 -07:00
George Hotz
70c07dfea5 5k line max (#1064) 2023-06-27 10:53:18 -07:00
George Hotz
c8d87eb8d4 strip whitespace 2023-06-27 10:11:43 -07:00
Rayan Hatout
23648538fa fix folding of float4 add/mul (#1060) 2023-06-26 20:59:29 -07:00
George Hotz
a98e361da0 torch speed test, add add 2023-06-26 18:55:27 -07:00
George Hotz
3e33befc1d realize hotspots (#1059)
* realize hotspots

* no str check

* minor changes

* make this an assert

* faster and more readable

* nicer self.buffers

* tests for weak op + LAZYCACHE=0
2023-06-26 18:31:18 -07:00
George Hotz
2977fb17f6 various touchups (#1058)
* op isn't optional

* barrier + named local buffers

* end global and local loop together to avoid useless if statement

* better comments
2023-06-26 15:41:23 -07:00
George Hotz
f265e8523a movement ops aren't really ops (#1056) 2023-06-26 15:01:28 -07:00
Rayan Hatout
65cbaa3429 no need to slice A and B twice in LLaMa complex multiplication (#1054) 2023-06-26 14:42:58 -07:00
George Hotz
571089f10e Back off minor speed stuff for simplicity (#1053)
* passing in buffers doesn't increase speed

* functools.reduce

* no more get_buffers
2023-06-26 14:42:17 -07:00
Rayan Hatout
dedbd970aa Optimizations in lazy.py (#987)
* optimizations in lazy.py

* make mypy happy with stubs and fix the graph import hack

* merge conflict in helpers.py
2023-06-26 13:55:42 -07:00
Roelof van Dijk
8bea6b6d35 perf/refactor_weakops (#1052)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 10:13:33 -07:00
Roelof van Dijk
8c65f9324c refactor: print formatting for llama timing (#1050)
* refactor: print formatting for llama timing, report median and individual runs

* feat: back to mean

* fix: whitespace

* fix: add mean to print

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 09:49:31 -07:00
Roelof van Dijk
c604ef4beb symbolic.py: faster Node.sum, faster SumNode.div (#1014)
* refactor: replace isinstance with class check where possible

* refactor: faster partition

* fix: flake8

* feat: rework node.sum, correct list typing

* fix: typo

* feat: refactor sum

* fix: pylint

* refactor: simpler sum and factorize

* feat: clean up sumnode div, all cpu tests pass

* feat: simplify floordiv, cache factorization

* don't factor numnodes at all

* python 3.8 functools does not yet have @cache

* fix: restore assert

* refactor, fix failing tests

* fix: address review comments

* feat: rework, add specialization, remove cache

* fix: remove specialization

* feat: no tuple conversion, faster loop

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 09:47:17 -07:00
Casey Primozic
52b7105f87 Dedup params in Optimizer (#1047)
* Dedup params in optimizer

 * Passing the same tensor multiple times in the set of learnable params passed to optimizers can result in models completely failing to learn, but no errors are produced.  This dedups tensors to avoid the problem.

* Fix types

* Use new variable to satisfy linter

* Use `helpers.dedup` instead of `set()` to dedup params

* Add test for duped params in optimizers
2023-06-26 00:49:23 -07:00
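#1047 removes repeated tensors from the optimizer's parameter list, and it uses `helpers.dedup` rather than `set()`. An order-preserving dedup keeps the parameter order deterministic, whereas a `set` of identity-hashed objects has no guaranteed order.

The sketch below *assumes* `dedup` behaves like `list(dict.fromkeys(...))`; the helper's actual body is not shown in this log. `Param` is a hypothetical stand-in for a learnable tensor.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def dedup(items: Iterable[T]) -> List[T]:
    # Assumed behavior: dict keys keep insertion order (Python 3.7+), so
    # repeats are dropped while the first-seen order is preserved.
    return list(dict.fromkeys(items))

class Param:
    # Hypothetical stand-in for a learnable tensor; hashes by identity.
    def __init__(self, name: str) -> None:
        self.name = name

w, b = Param("w"), Param("b")
params = [w, b, w]  # w passed twice, e.g. a weight shared between layers
unique = dedup(params)
print([p.name for p in unique])  # ['w', 'b']
```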
Kunwar Raj Singh
5d3310ce56 MaskRCNN Inference (#884)
* MaskRCNN weights loading

* backbone maybe works

* backbone works, but resnet body atol 1e-3

* RPN Call, but very wrong output

* fixed topk

* RPN maybe works, not sure about nms

* Fix cursed modules

* add back editorconfig

* Full call, wrong output

* Full call works

* fix mask

* use NMS from retinanet

* Removing extra funcs

* refactor

* readable

* Add example to run model

* remove filter

* Fix split, batched inference is worse

* Fix image sizes

* Matching reference

* merge master

* add filter on top detections

* cuda backend fixed

* add model eval and spec

* convert images to rgb

* fix eval

* simplify examples code

* remove extra code

* meshgrid using tinygrad

* removing numpy

* roi align, floor, ceil

* remove numpy from level_mapper

* remove numpy from pooler

* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"

This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.

* roi align gather

* fix master merge

* revert to old floor, ceil as ints present in domain

* use log2 op

* fix indexes

* weird bug with ints and gpu

* weird bug with ints and gpu

* refactors, add env var for gather

* floor with contiguous, where

* refactor topk, sort

* remove staticmethod

* refactor stride

* remove log2 mlop

* realize -> contiguous

* refactor forward

* remove num_classes, stride_in_1x1 from state

* refactor forward

* refactoring

* flake8

* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk

* keep using tinygrad for smaller gathers

* fix empty tensors

* comms

* move from tensor.py

* resnet test passing

* add coco dataset back

* fix spaces

* add test for log2

* no need to create Tensors

* no need to create Tensors

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-06-25 15:37:51 -07:00
George Hotz
0f281e7b18 touchups 2023-06-25 15:24:26 -07:00
George Hotz
c8fbdeb48e test speed llama (#1046)
* test speed llama

* oops, put it back

* uses the real device codegen

* just do it on the mac

* pp

* is faster?

* Revert "is faster?"

This reverts commit 42db542010.

* disable docker again for less load on CI
2023-06-25 15:22:56 -07:00