10417 Commits

Author SHA1 Message Date
Jacky Lee
5d16cc283f Docker fix (#1039)
* Docker test

* Remove extra installs

* Don't run full test

* No need for testing dependencies
2023-06-25 10:38:58 -07:00
Francesco Castelli
6ff720103e Reduce tensor dot line count and fixed 1d tensor dot (#1045)
* fixed tensor.dot

* no 1d dot for image=1

* shorter lines

* add 3d dot tests
2023-06-25 10:32:45 -07:00
George Hotz
9c6e507518 move accel into extra 2023-06-23 16:38:15 -07:00
Yair Lifshitz
7f73d6a4da Fix input path in examples/compile_efficientnet.py, examples/efficientnet.py. (#1034) 2023-06-23 16:34:33 -07:00
兰天游
0222ee7bd2 feat: fix shell alias on readme (#1022)
* feat: fix shell alias on readme

* feat: edit the install command
2023-06-23 00:00:34 -07:00
cloud11665
264b1e5f48 cache gpuocelot build in cuda CI (#1032) 2023-06-22 17:42:12 -07:00
cloud11665
2407690d82 add cuda on cpu tests (#1020) 2023-06-22 14:15:50 -07:00
Eli Frigo
e09219df0f fixed division by zero for fast kernels (#1021)
* fixed division by zero for fast operations

* made et closer to 0
2023-06-22 14:02:53 -07:00
George Hotz
18892242b0 global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
Casey Primozic
aab9ee0fca Add RDNA3 assembler UOps.CAST partial support + other fixes/improvements (#1012)
* Add support for one case of `UOps.CAST` for RDNA3 assembler

 * Adds support for casting from `bool` -> `float32`.  Seems like a very common operation that is required in many places.
 * Fix bool register definition for vector operations
   * Use `vcc_lo` instead of `vcc` which seems to be required since it's configured to use wavefront_size=32
 * Add vector support for some places that were scalar only in register definition and comparison ops
 * Fix some issues in what seems to be defunct `external_test_image.py`
   * Some tests still don't pass for other reasons, but it at least runs now and one broken test is now fixed

* Refactor RDNA3 assembler register definition

 * Unify multi-register code between dtypes and combine with single-register allocation since they're all untyped registers at the end of the day
2023-06-20 11:34:10 -07:00
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* separate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
George Hotz
0d4c4f4e9e metal ci attempt (#1010)
* metal ci attempt

* skip failing ops tests

* skip in the ops test

* no dtype test
2023-06-19 09:23:55 -07:00
George Hotz
0ac84d5e94 exclude a few more onnx tests 2023-06-19 08:51:29 -07:00
George Hotz
0fd648dff4 exclude more dumb onnx tests 2023-06-19 08:51:29 -07:00
Pasan Perera
b6102ba4ac added CUDA and PTX to env_vars.md (#1009) 2023-06-19 08:47:44 -07:00
Sayantan Das
e829e0e718 Update CONTRIBUTING.md (#1008) 2023-06-18 22:09:03 -07:00
George Hotz
d84c600e5d contributing 2023-06-18 21:48:18 -07:00
Casey Primozic
651d6ea457 Minor improvements + cleanup to ops_gpu.py (#1006)
* Minor improvements + cleanup to `ops_gpu.py`

 * Add some previously undocumented environment variables from `ops_gpu.py` to `env_vars.md`
 * Update debug print for OpenCL to print the devices that will be used post-filtering with `CL_EXCLUDE`
 * Remove a couple unused or superfluous variables and assignments
 * Use `fromimport` shorthand to shave off a couple precious LOC
 * Couple small whitespace changes to clean things up

* Revert change to ordering of OpenCL devices

* Small refactor for OpenCL context creation
2023-06-18 21:26:40 -07:00
George Hotz
5428b5d774 good changes from tensor_cores branch (#1005)
* good changes from tensor_cores branch

* touchups

* real_strides fixup

* refactor merge_views
2023-06-18 20:28:06 -07:00
Yann Huynh
ccb51ff5b0 "Fixed argument passing in example yolov8" (#1004)
"Fixed argument passing in example yolov8"
2023-06-18 14:29:39 -07:00
George Hotz
b14b7bc749 don't make HIP the default...it's slower 2023-06-18 19:11:39 +00:00
Alex Wang
3d63c71e27 HIP backend (#750)
* llama works for HIP backend

* Use hipMemcpyAsync; Less lines of code

* Remove unused code

* Refactor

* Add comments; hipDeviceSynchronize

* HIP over GPU; Remove PyHIP dependency

* Cleanups

* Fix mypy check

* Merge master; Dump assembly code
2023-06-18 11:35:57 -07:00
Casey Primozic
805eef10dd Add tensorflow GEMM benchmark script (#1000)
* Modelled closely after the existing torch benchmark script but just adapted slightly for tensorflow
2023-06-18 10:57:45 -07:00
George Hotz
c690eeaca9 flip mulacc to save a line (#997) 2023-06-17 16:47:55 -07:00
Diogo
d2b837c1d9 Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
sehaj
775287ed91 Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* spf changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fix

* added darknet (backbone)

* yolov8 neck, and initialize bias function while detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked

* added letterbox and pre_transform for pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixes bounding box thickness AND added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifying arguments

* tests complete, but lots of things are printed on screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
George Hotz
fe71282ba1 faster RDNA assembly backend (#990)
* fast asm

* torch gemm
2023-06-16 12:06:38 -07:00
George Hotz
ba56ee6020 RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdna3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
George Hotz
dca084f227 minor == to is touchups 2023-06-15 17:11:12 -07:00
blake
041d96083c clang rt for msvc (#986)
* added platform config for clang runtime and tempfile dir for xplatform /tmp

* flake8 lint

* mypy lint

* pythonic?

* python?

* return darwin cflags

* <lines

* lint;
2023-06-15 17:06:44 -07:00
George Hotz
039f0d372f delete ltypes (#984)
* delete ltypes

* only upcast float types

* test dtype on mac passes

* ugh, these upcasts
2023-06-15 16:24:45 -07:00
Yahya Lmallas
804c45b5fc FIX: Can't pickle local object (#979)
_early_exec_process is a local function defined within the scope of another function; it should be global
2023-06-14 12:32:17 -07:00
Rayan Hatout
2d567ef688 Optimizations in tensor.py (#974)
* optimizations in tensor.py

* make mypy happy

* revert split of Function class
2023-06-14 08:44:35 -07:00
Diogo
0629791cbd F64 support (#976)
* initial commit

* added osx check for opencl

* added llvm f64 conversions

* typo in llvmir

* more tests and modified unsupported error

* fixed linting error

* added pragma fp64

* simplified exclusion for OSX

* fixed device check and also added it to cast func

* added ifdef check for fp16 in ops_gpu

* Revert "added ifdef check for fp16 in ops_gpu"

This reverts commit 92de754d48.

* f64 prekernel signature match f16

* moved condition to buffer init
2023-06-13 21:31:31 -07:00
John Moore
45bc040a63 Fix typo (#978) 2023-06-13 15:15:45 -07:00
George Hotz
80e665bddb a couple new tests 2023-06-13 12:36:05 -07:00
George Hotz
ba4eadb04c PTX assembly support (#977)
* ptx assembly

* all ops tests pass

* fix tests
2023-06-13 12:31:42 -07:00
Rayan Hatout
727416201f Shapetracker optimizations (#966)
* optimizations in shapetracker.py

* revert micro-optimizations in assertions

* make mypy happy

* list comp instead of map in get_unsafe_resize_offset

* list comp instead of map in get_unsafe_resize_offset
2023-06-12 18:13:21 -07:00
cloud11665
5f13e7c3cf cuda: fix fp16, uint8, int64, half4 codegen (#968)
* cuda: add uchar, int64 typedefs

* cuda: fix float16 codegen

* fuck it, half4 stub. llama time!

* inline fp16 half4, revert changes to CStyleLanguage

* add inline just in case

* remove half4 operators

* use dict
2023-06-12 11:15:44 -07:00
Steven Anderson
e54b6c5e7f One hot (#972)
* passing with 1d indices

* passing all test

* cleanup

* using safe_numpy for scalar
2023-06-12 10:13:29 -07:00
Diogo
613c74ca9f maintain input tensor dtype (#969) 2023-06-12 10:12:47 -07:00
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
George Hotz
48e9461197 broken tests for #862 and #942 2023-06-09 22:02:59 -07:00
George Hotz
c62c64f0b7 remove GeNode (#965) 2023-06-09 21:48:56 -07:00
George Hotz
2c324d0685 fix metal uaf (#964) 2023-06-09 21:28:06 -07:00
Steven Anderson
c0e558b77c Test nllloss (#958)
* works but slow

* works with NC and NCd1, but it is still slow

* refactor

* support for k dimensions

* without numpy
2023-06-09 09:00:29 -07:00
Diogo
6b1280f01c fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956)
* prelu and where casting

* typing for safe_numpy

* optional

* get rid of tracing in ci

* cleanup and resolved layernorm issues

* removed debug print
2023-06-08 16:09:19 -07:00
Nicklas Boman
5c7248c72d imagenet download and prepare (#928)
Changed the "if not exist" check to the exist_ok=True parameter and added a variable check for whether to also download training data
Added the variable to env_vars.md
2023-06-08 12:55:33 -07:00
George Hotz
df40a9c238 EXP+LOG -> EXP2+LOG2 (#954)
* EXP+LOG -> EXP2+LOG2

* update docs
2023-06-08 10:57:31 -07:00
Diogo
666d151f8a Onnx slice fixups (#952)
* resolved some slice test errors and added some more debugging logs

* use same device in cumsum

* increased float priority

* onnx debug output match input
2023-06-07 19:44:30 -07:00