127 Commits

Author SHA1 Message Date
qazal
27af37f2ad misc: remove unused env vars (#3963)
* remove unused env vars

* delete CPU
2024-03-27 16:08:15 -04:00
George Hotz
68ca4d4276 split to schedule.py (#3949)
* split to schedule.py

* split
2024-03-26 21:02:46 -07:00
George Hotz
150ea2eb76 create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
George Hotz
629cbc5587 only abstractions 2 (#3947)
2024-03-26 20:02:18 -07:00
chenyu
83f39a8ceb env var to change default float (#3902)
* env var to change default float to fp16 or bf16

looking for standard names for these. we have FLOAT16 that does something to IMAGE and HALF to convert weights.

working on default bf16 too.
```
RuntimeError: compile failed: <null>(6): error: identifier "__bf16" is undefined
    __bf16 cast0 = (nv_bfloat16)(val0);
```

remove that in cifar

* DEFAULT_FLOAT

* default of default

* unit test

* don't check default

* tests work on linux
2024-03-24 20:33:57 -04:00
nimlgen
3fb13ff892 HIP -> HSA in docs/env_vars (#3824)
2024-03-19 22:53:33 +03:00
qazal
337cd53444 multioutput ScheduleItem (#3699)
* refactor realize.py

* update docs

* update test_sched

* update runners and devices

* update openpilot and unit tests

* cleanup runner lowering

* update more tests
2024-03-13 08:59:38 -07:00
qazal
aec4c4f01b linearizer ast as a tuple of lazyops (#3689)
* multi store op linearizer

* currently we do only one output per kernel

* named opts
2024-03-11 15:39:04 -07:00
Jungwan Woo
e5ee6bb2bd fix outdated url in showcase doc (#3624)
2024-03-05 14:44:40 -08:00
geohotstan
9268a8b154 remove MULACC (#3459)
* init

* removed mulacc

* is uoptimize the problem?

* lol hax make work temporarily fix l8er

* revert extra/ changes

* clean up

* flaky metal tests?

* add back mulacc for metal

* revert last commit

* try skipping linearizer_failure tests

* skip flammit tests... cuz tests all work locally

* try narrow down exact linearizer failure test

* try 2

* try 4

* generated code is the exact same wtf why CI fails

* code for 15 and 17 are exact same with or without mulacc, this should pass

* try only 1 failure

* try garbage collecting lol...

* try del variables lol

* try gcing after del lol...

* is diskcache the problem???

* try disabling opts cache idk

* try remove hack

* try disable github metal cache...

* try CACHELEVEL=0 :D idk anymore

* try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas...

* revert

* actually not a HACK

* oops
2024-02-29 07:40:40 -05:00
Caleb Bunch
b41761488d change specific string 'CLANG' to DEVICE variable in abstractions2.py (#3488)
2024-02-24 07:51:39 -05:00
qazal
7864fb69d1 delete MovementOps (#3434)
* delete MovementOps

* keep extra/to_movement_ops.py
2024-02-19 23:21:44 +01:00
Daniel Yeh
0a4029c519 fix path to models folder (#3442)
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
2024-02-19 13:35:57 +01:00
xarkes
28a8b72024 Remove Interpreted device & remaining CPU/TORCH ref (#3423)
* Remove Interpreted device & remaining CPU/TORCH ref

* Oops

* supports_device was useful

* Fix doc wording

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
George Hotz
b1c0d8c99d remove cpu and torch backends (#3399)
* remove cpu and torch backends

* don't copy to cpu

* use clang instead of cpu

* multitensor gathers on the first device

* clang is cpu + use default

* fixup

* bugfix
2024-02-15 16:55:39 +01:00
George Hotz
a40df14fef ops_ext to replace cpu import (#3409)
* ops_ext to replace cpu import

* don't allow zero copy with as buffer

* memoryview(bytearray

* reenable test

* fix jit issue
2024-02-15 13:03:42 +01:00
George Hotz
6356474d6d Revert "ops_ext to replace cpu import (#3406)" (#3408)
This reverts commit 91eb93f85a.
2024-02-15 12:16:10 +01:00
George Hotz
91eb93f85a ops_ext to replace cpu import (#3406)
* ops_ext to replace cpu import

* don't allow zero copy with as buffer

* memoryview(bytearray

* reenable test
2024-02-15 12:14:58 +01:00
George Hotz
ce1f9f5556 hotfix: new linearizer docs
2024-02-12 18:56:30 +01:00
George Hotz
2e60012bcf move create schedule and delete old API (#3377)
* move create schedule and delete old API

* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz
41efaa848c move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
Mason Mahaffey
3ebf7a3e38 reflect changes to shapetracker in doc printouts (#3349)
2024-02-08 16:20:30 +01:00
George Hotz
3c728d1082 compiler support (#3260)
* compiler support

* revert that

* fix tests
2024-01-26 23:36:40 -08:00
chenyu
1b508e0f71 fix fuzz_linearizer toCPU to as_buffer (#3158)
2024-01-17 13:18:46 -05:00
George Hotz
e4528543fa remove LLVMOPT
2024-01-15 16:01:09 -08:00
chenyu
e39cd3e7f2 update env_vars.md (#3127)
mostly removed deprecated ones. not clear how to maintain this especially for extra/examples
2024-01-15 01:06:56 -05:00
George Hotz
1f9aee8b6f remove numpy from device (#3123)
* remove numpy from device

* fix tests

* np item

* cleanups

* simplify with as_buffer

* no toCPU

* tinygradic

* cast to scalar
2024-01-14 19:36:05 -08:00
George Hotz
ea5824657d move fromcpu out of lazy.py (#3122)
* move fromcpu out of lazy.py

* fix abstractions2
2024-01-14 18:21:08 -08:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
chenyu
3ba591c3fd less outdated abstraction.py (#2917)
removed some old terms and updated types and code pointers
2023-12-22 15:31:02 -05:00
chenyu
50927defad s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
George Hotz
1765849937 new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
George Hotz
00d9eda961 FROM -> COPY, move vars_from_ast (#2675)
2023-12-07 16:32:30 -08:00
chenyu
9996f1adf9 no document prs (#2622)
2023-12-05 13:05:36 -05:00
Amrit Sahu
e8d6a6ef2e view.reshape without symbolic (#2218)
* handle reshape of contiguous subparts with explicit mask

* remove the add/remove ones logic in reshape

* accommodate ones in accumulate logic

* make multiply commutative

* fix linting

* make mypy happy

* add test for commutative mul

* merge dimensions in shape_strides for 1 range masks

* add offsets for merging

* fix linting

* add back explicit 1 reshapes

* fix mypy errors

* fix accumulate by including state

* include non-zero stride dimension in acc

* small cleanup

* more compact to_shape_strides

* more logical cleanup

* compress more

* compress reshape mask

* adding some comments

* small bug fix

* improve test coverage

* remove explicit add remove ones

* small bug in test

* enable test_reshape_splitting_combining

* small fix

* 10 lines less to_shape_strides

* shorten reshape mask

* some more cleanup

* more cleanup

* introduce some symbols for compactness

* more symbols

* more cleaner

* lessen symbols, it became less readable

* remove merge_views from view.reshape

* change to_shape_strides to _merge_dims

* improve readability

* fix corner case

* cleanup

* better handling of 1 <= Variable('i',1,10) & new_dim = Variable('i',1,10)

* rewrite _reshape_mask for readability

* fix white space

* add comment

* nice shorthands for readability

* add proof in docs

* small nit

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-04 12:46:53 -05:00
George Hotz
d6b404ac11 No dtype alloc (#2570)
* fix all allocs

* improve docs

* ugh fix fake alloc
2023-12-02 13:29:40 -08:00
George Hotz
5068e99d18 refactor to remove extra kernel params (#2563)
* refactor to have compiled kernel

* bugfixes

* docs/beautiful.py

* revert that

* fix tests
2023-12-02 00:32:25 -08:00
George Hotz
6733425095 lower schedule (#2559)
* lower schedule

* remove RAND, and don't put load in the JIT yet

* better fix for that test
2023-12-01 19:17:46 -08:00
wozeparrot
28183c7438 feat: reword (#2549)
2023-12-01 10:56:18 -08:00
chenyu
7fec966b5e bye bye NOOP (#2534)
* bye bye NOOP

* SIN

* NEG
2023-11-30 23:10:35 -08:00
George Hotz
2c363b5f0b new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
Yingbo Ma
d43485ae9e Fix graph_uops (#2457)
* Load networkx when we need to graph uops

* Document GRAPHUOPS

* import nx in `graph_uops`
2023-11-27 18:42:48 -08:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
chenyu
c4dfde761e remove the commented import (#2463)
2023-11-27 11:50:41 -05:00
George Hotz
4da2ddea6e Interpreted cleanups (#2312)
* move the compiler out of ops

* don't return realized

* var_vals filter, fix custom

* typing
2023-11-15 09:02:23 -08:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
George Hotz
0c9b4ab885 no to_underlying (#2222)
* no to_underlying

* context is no longer used

* no more optimizing

* update docs
2023-11-05 21:34:20 -08:00
George Hotz
f17bc16f46 simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz
03cf0afa4f move all to compile api (#2203)
* move metal+clang to compile api

* all to the new style

* remove binary arg

* fix triton

* fixup tests

* fix clang

* diskcache is generic

* __wrapped__

* compile_gpu

* fix thneed

* keep the src in the ASTRunner

* lib

* move compile_gpu

* compile_gpu in device

* put compiler in astrunner

* test reverts

* triton compiler

* ugh, that too
2023-11-01 23:01:32 -07:00
chenyu
5d5921d2c8 small doc env update (#2112)
2023-10-18 14:49:25 -07:00