Commit Graph

2555 Commits

Author SHA1 Message Date
Francis Lam
df86672bd4 Fix LazyBuffer SHUFFLE_PAD_OPS to prevent invalid pad movement (#1223)
In addition to div, any ops that will generate non-zero outputs from
zero inputs need to be guarded.
2023-07-11 15:30:35 -07:00
chenyu
ab645317c9 Fix constant folding for Tensor([3]) (#1227)
* Fix constant folding for Tensor([3])

* Remove duplicated prod import

* load in the same device

* better numpy

* add constant fold shape test cases

* improve tests
2023-07-11 14:01:32 -07:00
madt2709
bb316a42af Fix pow to work with negative tensors (#1191) 2023-07-09 17:33:04 -07:00
George Hotz
43385c7dbf remove contiguous on full (#1212) 2023-07-09 17:31:15 -07:00
George Hotz
67e34b356a good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
George Hotz
7151382364 Refactor load/store before tensor cores (#1193)
* minor cleanups

* render_const

* now that's a nice refactor

* clean up vload/vstore

* clean up render_load

* debugs there

* dumb

* err, this?

* const float4

* what's failing

* bugfix

* statement includes semicolon

* bugfix
2023-07-08 15:54:58 -07:00
fluffy χατγιρλ
628ee46627 Fix bug where Tensor.randn returns inf (#1192)
* fix randn inf bug

* add test

* more compact test

* clarify test purpose
2023-07-08 12:03:46 -07:00
George Hotz
0ad99038ef Revert "Revert "Fix ShapeTracker mismatch in LazyBuffer.fromCPU (#1156)" (#1181)" + add test
This reverts commit a374b62bfe.
2023-07-07 18:37:04 -07:00
George Hotz
a374b62bfe Revert "Fix ShapeTracker mismatch in LazyBuffer.fromCPU (#1156)" (#1181)
This reverts commit 8ff7184b1b.
2023-07-07 18:29:05 -07:00
fluffy χατγιρλ
8ff7184b1b Fix ShapeTracker mismatch in LazyBuffer.fromCPU (#1156)
* init shape tracker with strides to fix mismatch

Author:    sekstini <sekstinilol@gmail.com>

* fix whitespace

* add tests
2023-07-07 18:28:21 -07:00
Stan
69d33cab0d Fix: auto create parent dir when downloading file (#1173)
* Fix: auto create parent dir when downloading file

also removed duplicate import `os`

* Added test for auto parent dir creation when downloading file
2023-07-07 13:40:29 -07:00
terafo
aa60feda48 Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Stan
9b6e57eccd helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread

* Made `_early_exec_process` catch any Exception

 Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output`

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE cleaned that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, mybad

* Extracted a class for tests of early exec

* Normalize line endings, windows uses /r/n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
Kunwar Raj Singh
8391648822 Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval

* 82.27

* 82.64

* 82.79, reproducable

* add lr sched, 85.26

* 87.42

* 87.94

* 87.42

* tta with flip

* training flip aug

* refactor

* using Tensor for LR is faster

* 89.5

* refactor, flip only train set

* 90.01

* 90.64

* eval jit

* refactor

* only JIT model

* fix eval JIT

* fix eval JIT

* 90.82

* STEPS=900 reaches 90.22

* TTA envvar

* TTA default 0

* fully jit training

* refactor optim

* fix sched

* add label smoothing

* param changes

* patial gelu

* OneCycle with pause

* gelu maybe works

* 90.12

* remove pause lr

* maybe fix lr schedulers

* scheduler test passing

* comments

* try mixup

* shuffle!

* add back the missing last eval

* fix shuffle bugs

* add mixup prob

* fix mixup prob

* 90.19

* correct mixup

* correct mixup

* correct mixup

* 90.24

* 90.33

* refactor, add type hints

* add gradient clipping

* maybe fix test

* full JIT

* back to relu for now

* pass mixup prob as param

* add typehints

* maybe CI works

* try erf gelu

* CI, types

* remove useless import/

* refactor optim

* refactor optim

* try leakyrelu

* try celu

* gelu

* 90.67

* remove grad clip

* remove grad clip tests

* revert params

* add test for OneCycleLR

* 90.62

* fix eval timing

* fix eval timing again

* so where i calculate mixup_prob matters

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
Rayan Hatout
9975f24452 Fold expand preceding reduce if the reduction is on the same axis as the expansion (#1134)
* fold expands that precede a reduce if the reduction is on the same axis as the expansion

* add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization

* add a test case to make sure we don't fold reduce-expand-reduce on different axes
2023-07-06 13:41:05 -07:00
Eli Frigo
801564f31b Remove POW llop and add SQRT llop (#1104)
* fixed division by zero for fast operations

* made et closer to 0

* replace POW llop with SQRT

* updated mlops to swap SQRT and POW llops

* updated hlops to swap POW and SQRT

* added sqrt llop to cpu runtime

* added sqrt llop to cstyle codegen

* added POW llop to llvm ir codegen

* added SQRT llop to torch runtime

* moved pow from mlops to hlops

* found a better way to do reverse pow

* fixed indentation

* added SQRT llop to triton

* update docs to match new llops

* removed POW operator from assembly codegen

* added sqrt and rsqrt to pow hlop

* rewrote pow function in tensor.py

* Adjust tolerance

* Adjust for adamw

* Reduce for Adam too

* removed accidental leftover code

* removed all of accidental code

* added rsqrt test

* removed pow from mlops again

it was added back when resolving merge conflicts

---------

Co-authored-by: Jacky Lee <jla524@sfu.ca>
2023-07-05 18:07:58 -07:00
Reza Rezvan
d1356cac27 Fix: Jacobian tests [WIP] (#1126)
* Fix: Jacobian tests; num_jacobian either bugged or not accurate enough;

* Fix: Jacobian tests;

* Fix: Gradcheck;
2023-07-05 15:36:22 -07:00
George Hotz
793a670187 from tensor cores + lb touchup (#1127) 2023-07-04 15:45:20 -07:00
Reza Rezvan
535224ac20 Remove float64 (#1101)
* Refactor: Remove float64

* Refactor: Remove unused imports

* Refactor: Remove float64

* Refactor: Remove float64

* Refactor: Exclude float64 onnx backend

* Add: Skip jacobian and gradcheck tests;
2023-07-04 08:40:51 -07:00
Daniel Hipke
b4ce23e4b8 Make cross_process use cloudpickle (#1118)
* fix syntax issues in imagenet_download.py

* use cloudpickle in cross_process to make it work in Python 3.9+

* add cross_process test

* prevent unpickling on every function call

* add cloudpickle to setup.py

* add support for args/kwargs
2023-07-04 00:47:34 -07:00
George Hotz
c709dec8b5 gelu: weird test was broken for metal 2023-07-04 00:43:54 -07:00
George Hotz
daf8e1942f sigmoid: test large postive also and add note 2023-07-04 00:18:31 -07:00
Kunwar Raj Singh
9e6067378f Broken Sigmoid backward: Add test and mlop for Sigmoid (#1113)
* Add failing sigmoid test

* update more tests

* add mlop for sigmoid

* add back test

* math.log(math.e) = 1

* remove divides

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-04 00:14:22 -07:00
Reza Rezvan
8ae9a054ae Refactor nn.optim (#1091)
* Refactor: nn.optim.py

* Refactor: nn.optim.py; Fix all tests

* Refactor: Replace all optim.get_parameters()

* Refactor: Revert list comp.

* Refactor: Replace optim.get_state_dict

* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
geohotstan
575f75f613 hello (#1084) 2023-07-01 01:29:35 -07:00
Jacky Lee
754e54ebb9 Fix Tensor ceil and floor for whole numbers (#1071)
* Works on non-special numbers

* Test different cases
2023-06-27 23:22:17 -07:00
George Hotz
d16c16ec28 new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
George Hotz
a98e361da0 torch speed test, add add 2023-06-26 18:55:27 -07:00
George Hotz
3e33befc1d realize hotspots (#1059)
* realize hotspots

* no str check

* minor changes

* make this an assert

* faster and more readable

* nicer self.buffers

* tests for weak op + LAZYCACHE=0
2023-06-26 18:31:18 -07:00
George Hotz
2977fb17f6 various touchups (#1058)
* op isn't optional

* barrier + named local buffers

* end global and local loop together to avoid useless if statement

* better comments
2023-06-26 15:41:23 -07:00
Rayan Hatout
dedbd970aa Optimizations in lazy.py (#987)
* optimizations in lazy.py

* make mypy happy with stubs and fix the graph import hack

* merge conflict in helpers.py
2023-06-26 13:55:42 -07:00
Roelof van Dijk
8c65f9324c refactor: print formatting for llama timing (#1050)
* refactor: print formatting for llama timing, report median and individual runs

* feat: back to mean

* fix: whitespace

* fix: add mean to print

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 09:49:31 -07:00
Casey Primozic
52b7105f87 Dedup params in Optimizer (#1047)
* Dedup params in optimizer

 * Passing the same tensor multiple times in the set of learnable params passed to optimizers can result in models completely failing to learn, but no errors are produced.  This dedups tensors to avoid the problem.

* Fix types

* Use new variable to satisfy linter

* Use `helpers.dedup` instead of `set()` to dedup params

* Add test for duped params in optimizers
2023-06-26 00:49:23 -07:00
Kunwar Raj Singh
5d3310ce56 MaskRCNN Inference (#884)
* MaskRCNN weights loading

* backbone maybe works

* backbone works, but resnet body atol 1e-3

* RPN Call, but veryy wrong output

* fixed topk

* RPN maybe works, not sure about nms

* Fix cursed modules

* add back editorconfig

* Full call, wrong output

* Full call works

* fix mask

* use NMS from retinanet

* Removing extra funcs

* refactor

* readable

* Add example to run model

* remove filter

* Fix split, batched inference is worse

* Fix image sizes

* Matching reference

* merge master

* add filter on top detections

* cuda backend fixed

* add model eval and spec

* convert images to rgb

* fix eval

* simplify examples code

* remove extra code

* meshgrid using tinygrad

* removing numpy

* roi align, floor, ceil

* remove numpy from level_mapper

* remove numpy from pooler

* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"

This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.

* roi align gather

* fix master merge

* revert to old floor, ceil as ints present in domain

* use log2 op

* fix indexes

* weird bug with ints and gpu

* weird bug with ints and gpu

* refactors, add env var for gather

* floor with contiguous, where

* refactor topk, sort

* remove staticmethod

* refactor stride

* remove log2 mlop

* realize -> contiguous

* refactor forward

* remove num_classes, stride_in_1x1 from state

* refactor forward

* refactoring

* flake8

* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk

* keep using tinygrad for smaller gathers

* fix empty tensors

* comms

* move from tensor.py

* resnet test passing

* add coco dataset back

* fix spaces

* add test for log2

* no need to create Tensors

* no need to create Tensors

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-06-25 15:37:51 -07:00
George Hotz
0f281e7b18 touchups 2023-06-25 15:24:26 -07:00
George Hotz
c8fbdeb48e test speed llama (#1046)
* test speed llama

* oops, put it back

* uses the real device codegen

* just do it on the mac

* pp

* is faster?

* Revert "is faster?"

This reverts commit 42db542010.

* disable docker again for less load on CI
2023-06-25 15:22:56 -07:00
Jacky Lee
5d16cc283f Docker fix (#1039)
* Docker test

* Remove extra installs

* Don't run full test

* No need for testing dependencies
2023-06-25 10:38:58 -07:00
Francesco Castelli
6ff720103e Reduce tensor dot line count and fixed 1d tensor dot (#1045)
* fixed tensor.dot

* no 1d dot for image=1

* shorter lines

* add 3d dot tests
2023-06-25 10:32:45 -07:00
cloud11665
2407690d82 add cuda on cpu tests (#1020) 2023-06-22 14:15:50 -07:00
George Hotz
18892242b0 global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
Casey Primozic
aab9ee0fca Add RDNA3 assembler UOps.CAST partial support + other fixes/improvements (#1012)
* Add support for one case of `UOps.CAST` for RDNA3 assembler

 * Adds support for casting from `bool` -> `float32`.  Seems like a very common operation that is required in many places.
 * Fix bool register definition for vector operations
   * Use `vcc_lo` instead of `vcc` which seems to be required since it's configured to use wavefront_size=32
 * Add vector support for some places that were scalar only in register definition and comparison ops
 * Fix some issues in what seems to be defunct `external_test_image.py`
   * Some tests still don't pass for other reasons, but it at least runs now and one broken test is now fixed

* Refactor RDNA3 assembler register definition

 * Unify multi-registor code between dtypes and combine with single-register allocation since they're all untyped registers at the end of the day
2023-06-20 11:34:10 -07:00
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* seperate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
George Hotz
0d4c4f4e9e metal ci attempt (#1010)
* metal ci attempt

* skip failing ops tests

* skip in the ops test

* no dtype test
2023-06-19 09:23:55 -07:00
George Hotz
0ac84d5e94 exclude a few more onnx tests 2023-06-19 08:51:29 -07:00
George Hotz
0fd648dff4 exclude more dumb onnx tests 2023-06-19 08:51:29 -07:00
George Hotz
5428b5d774 good changes from tensor_cores branch (#1005)
* good changes from tensor_cores branch

* touchups

* real_strides fixup

* refactor merge_views
2023-06-18 20:28:06 -07:00
Diogo
d2b837c1d9 Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
sehaj
775287ed91 Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* spf changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fixx

* added darknet (backbone)

* yolov8 neck, and intialize bias function while detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked

* added letterbox and pre_tranform for pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixes bounding box thickness AND added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifiying arguments

* tests complete, but lots on things are printed on screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
George Hotz
ba56ee6020 RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdan3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
George Hotz
039f0d372f delete ltypes (#984)
* delete ltypes

* only upcast float types

* test dtype on mac passes

* ugh, these upcasts
2023-06-15 16:24:45 -07:00