Commit Graph

4147 Commits

Author SHA1 Message Date
George Hotz
4447188051 gate METAL_FAST_LOAD 2023-12-01 15:28:40 -08:00
chenyu
e9426f4fe4 simpler get_contraction (#2552)
* simpler get_contraction

* and test
2023-12-01 18:02:52 -05:00
George Hotz
eb595588bb device.py cleanups for -1 line 2023-12-01 14:59:33 -08:00
George Hotz
f9b1de598f hotfix: metal fastpath on sonoma 2023-12-01 14:55:34 -08:00
George Hotz
f5de21e753 fast path for copy (#2548)
* fast copy

* ruff first

* flat_mv on malloc

* order + webgpu test
2023-12-01 11:34:47 -08:00
wozeparrot
28183c7438 feat: reword (#2549) 2023-12-01 10:56:18 -08:00
George Hotz
4c984bba7e bump version to 0.8.0, clean CI, remove requests (#2545)
* bump version to 0.8.0, clean CI, remove requests

* why was that even there
2023-12-01 10:42:50 -08:00
nimlgen
ff47be3a01 ruff check whitespaces (#2547) 2023-12-01 10:42:20 -08:00
George Hotz
8fd8399437 remove flake8 (#2544) 2023-12-01 09:48:41 -08:00
George Hotz
d8175a4380 simple fix (#2543) 2023-12-01 09:42:15 -08:00
qazal
04483f8187 refactor llvm consts (#2537) 2023-12-01 09:39:40 -08:00
nimlgen
badc97f824 hip & cuda to gpuctypes (#2539)
* cuda with gpuctypes

* hip gpuctypes

* graphs

* rename + linter happy

* use cpu_time_execution

* no ji in build_kernel_node_params

* remove hip_wrapper

* hip fix

* no arc

* smalle changes

* no clean moduke in cudacpu
2023-12-01 09:25:27 -08:00
qazal
0fb4ff30c8 share duplicate renders with cstyle (#2538) 2023-12-01 08:10:36 -08:00
chenyu
7fec966b5e bye bye NOOP (#2534)
* bye bye NOOP

* SIN

* NEG
2023-11-30 23:10:35 -08:00
Joe Donovan
fa549d198d Remove type: ignore comments (#2533)
* remove some type ignore comments and fix errors

* remove unnecessary get_args import

* revert triton changes

* remove changes not in tinygrad
2023-11-30 22:15:55 -08:00
George Hotz
12fa846122 zero copy (#2531)
* zero copy

* zero copy test

* loads coder in milliseconds

* zero copy for cpu and torch

* src_from_buffer is None

* SLOW_METAL_COPY there
2023-11-30 18:38:41 -08:00
Matthias Kronberg
5394a05b9d Fix: Get item from ndarray before casting to int (#2525)
Directly casting is deprecated and will error in the future.
2023-11-30 18:34:31 -08:00
George Hotz
2c363b5f0b new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
chenyu
e56511b59a more type annotation for tensor and lazy (#2528)
* more type annotation for tensor and lazy

* don't need that
2023-11-30 17:50:22 -05:00
Davi Silva
ddeec24fa8 Cleanup & fix llama.py (#2524)
* docs, cleanup crap

* comma AI

* fix 70B

* this is why lexical scope exists
2023-11-30 16:00:17 -05:00
chenyu
7d26452305 call ruff with --preview (#2522)
some checks are ignored without --preview
2023-11-30 13:59:00 -05:00
chenyu
5db0cdfbd3 support list of ints (or other Tensorable) in tensor indices (#2520)
* support list of ints (or other Tensorable) in tensor indices

* enable some index test cases
2023-11-30 12:46:33 -05:00
chenyu
bd941a0df1 first version of test_indexing (#2515)
* first version of test_indexing

* move to test/imported
2023-11-30 00:03:59 -05:00
chenyu
d210f6a786 minor device.py cleanups (#2510) 2023-11-29 18:16:25 -05:00
qazal
370cfbb957 Cleanup vectorized hip renders (#2497)
* add typedefs and make_dtypen functions

use ext_vector_type for half16 kernels

* remove the old test_render because we just use whatever cstyle has

* align vectors
2023-11-29 14:02:12 -08:00
George Hotz
abfc99187d cleanup realize (#2505)
* delete reallocs

* cleaner

* that's real

* less lines
2023-11-29 11:38:38 -08:00
George Hotz
3dedeaae74 rebalance tests (#2504)
* rebalance

* balance

* parallel apt-get for all

* .local/lib/python3.11/site-packages

* what is user doing

* is that path right

* Update test.yml

* okay where are you

* site-packages
2023-11-29 11:18:22 -08:00
George Hotz
065aff747e make webgpu test reliable (#2502)
* remove retry that doesn't work

* fix cleanup

* process exit in cleanup

* add space
2023-11-29 10:02:24 -08:00
George Hotz
6707f2588e use copyin (#2500)
* it's always copyin

* all RawBuffer are RawBufferCopyIn

* cleanups

* this fixes it

* requirements='C'

* more correct
2023-11-29 09:34:00 -08:00
George Hotz
947711a532 split metal and webgpu tests (#2501) 2023-11-29 09:32:09 -08:00
chenyu
3eb3c74675 metal ci tests everything (#2499)
* metal ci tests everything

* pretty good

* METAL
2023-11-29 12:04:37 -05:00
George Hotz
889acefe85 Support weird loads in Image (#2498)
* image support weird loads

* umm, that was always wrong

* openpilot compile fails with a weird error

* image test passes

* we have valids now

* clean that up

* no more required opts

* add fastvits test, fix bug

* minor cleanups
2023-11-29 08:30:46 -08:00
George Hotz
e333672675 realize cleanup (#2496)
* move that logic

* revert that change

* clean up transfer and asserts

* what's that junk
2023-11-28 21:08:39 -08:00
George Hotz
5629fc368c Use Buffer.STORE at the end of ASTs (#2494)
* work

* store broken

* interpreteds work

* this passes

* symbolic cpu

* fix tests

* fix opt tests

* images fail

* fix InterpretedFlopCounter

* stupid hack for images
2023-11-28 20:11:37 -08:00
Liam
cf0c9096a9 Removing METAL Skips as CI works (#2488)
* Test metal CI

* remove metal and CI restrictions

* enable dtype tests for metal ci
2023-11-28 19:46:59 -08:00
Jake
5588922884 Update cuda_matmul.py (#2495) 2023-11-28 19:46:01 -08:00
George Hotz
cdc3b95729 if you don't appreciate a 15 second timeout, you get a 10 second timeout 2023-11-28 17:44:09 -08:00
George Hotz
d87a246439 move to new cached fetch (#2493)
* move to new cached fetch

* extra.utils is over

* loads

* bump download cache

* bump timeout
2023-11-28 17:36:55 -08:00
George Hotz
ab5d14d4ba MEM -> LOAD (#2492)
* MEM -> LOAD

* keep legacy working
2023-11-28 16:46:37 -08:00
chenyu
a739c6646e fp16 in gpt2 attention (#2491)
* fp16 in gpt2 attention

* HALF
2023-11-28 19:27:03 -05:00
chenyu
847f0a02b1 non-simplifiable mod should result in ModNode (#2490)
* non-simplifiable mod should result in ModNode

* space
2023-11-28 16:52:19 -05:00
George Hotz
3f137b134a jax parallel matmul example 2023-11-28 13:48:11 -08:00
mmmkkaaayy
ddb6a33ae5 improve test assertions for jit cache len with graph executor (#2476)
* improve test assertions for jit cache len with graph executor

* delete newline

* unused import

* another unused import
2023-11-27 23:02:45 -08:00
chenyu
28a67106ca enable symbolic ops tests for hip (#2485) 2023-11-27 22:33:41 -08:00
Christopher Mauri Milan
7f01dd04f0 Apply ruff linting rules to tests (#2473)
* everything except F821

* enable F821 with noqa

* dumb fix

* fix remaining imports and (former) lambdas

* replace _ with noqa to avoid gc
2023-11-27 21:24:06 -08:00
Davi Silva
136dbd8b36 HIP CI that compiles (to RDNA3) but doesn't have to run (#2482)
* hip amd compilation

* gate the test properly

* cleanup unused import

* remove superfluous numpy conversion

* add SpeedyNet tests (f32 [passes] & f16 [fails])

* make CI verbose (error log from hip compiler)

* test the real ops_hip

* Merge branch 'tinygrad:master' into ci/hip-compilation

* fix CI

* cleanup

* really fix CI

* Fix CI Three: the refixening

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-27 21:17:06 -08:00
George Hotz
756b01f46f why were these ever called buffer (#2483) 2023-11-27 21:02:07 -08:00
George Hotz
acbe6d1b53 Revert "HIP compilation on CI targeting RDNA3 (#2459)" (#2481)
This reverts commit d275ff930a.
2023-11-27 20:41:21 -08:00
qtkite
cb507a9389 Remove the toCPU copy (#2445)
* Remove the rawbuffer copy in runtime/lib.py on line 44

* remove buffer view

* added metadata back, oops

* delayed cpu testcase

* whitespace

* whitespace

* buffer behavior as is

* Update test_jit.py
2023-11-27 20:37:13 -08:00
Davi Silva
d275ff930a HIP compilation on CI targeting RDNA3 (#2459)
* hip amd compilation

* gate the test properly

* cleanup unused import

* remove superfluous numpy conversion

* add SpeedyNet tests (f32 [passes] & f16 [fails])

* make CI verbose (error log from hip compiler)

* test the real ops_hip

* Merge branch 'tinygrad:master' into ci/hip-compilation

* fix CI

* cleanup

* really fix CI
2023-11-27 20:33:11 -08:00