4539 Commits

Author SHA1 Message Date
qazal
c7b1d802f1 delete duplicate tests in test_linearizer (#4723)
* delete duplicate test

test_simplify_uop isn't needed

max works

* ci

* remove skip

* add skip back
2024-05-26 08:11:42 +03:00
nimlgen
c87b066b66 optimize nv sync (#4729)
* optimize nv sync

* sdma signal without wfi

* nv mockgpu support

* sep change
2024-05-25 23:10:41 +03:00
chenyu
8415b14978 pow cleanup part 3 (#4731)
fast pow for int or (int+0.5) const exponent, and more comments
2024-05-25 15:48:52 -04:00
Szymon Ożóg
de5c69c4c9 Unify test_dtype naming conventions (#4730) 2024-05-25 10:12:40 -04:00
chenyu
7e90026eb0 pow cleanup part 2 (#4727)
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
chenyu
85e57223bd pow cleanup part 1 (#4726)
use _broadcasted to convert 3 cases into 1. const simplification should be handled by const folding.
2024-05-25 03:24:10 -04:00
Szymon Ożóg
f7201b6852 Remove deprecated code (#4724) 2024-05-25 03:02:12 -04:00
wozeparrot
5f503226de finish tensor docs (#4722) 2024-05-24 15:57:43 -07:00
chenyu
edf27470c1 docs: fix stack and add dtype.DType (#4721) 2024-05-24 18:23:01 -04:00
chenyu
a16d2572a0 docs: clean up mentions of mlops (#4720) 2024-05-24 17:49:32 -04:00
chenyu
31358cbea5 change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
chenyu
ba116ff630 docs: fix mnist type and fixed seed and loss (#4717) 2024-05-24 16:18:30 -04:00
Szymon Ożóg
212025b53c Int mulacc for ptx (#4680)
* IntMulacc

* don't mov const

* Don't do int mulacc on ocelot

* Workaround for ocelot

* Remove ocelot workaround

* Fix tests that merged into mulacc

* fix uop count after merging into mulacc
2024-05-24 15:20:48 -04:00
chenyu
0ac761716a docs: logo and favicon (#4716) 2024-05-24 14:33:12 -04:00
Szymon Ożóg
a4de81e9a6 Update ocelot version (#4715) 2024-05-24 14:32:53 -04:00
chenyu
a894209bf7 docs: add ConstType to dtypes, limit function to its member (#4714) 2024-05-24 14:22:34 -04:00
chenyu
a41701ce71 docs: elementwise ops (broadcasted) and update examples (#4713)
* docs: elementwise ops (broadcasted) and update examples

* fix where

* space
2024-05-24 13:19:21 -04:00
qazal
c170ddceaf fix commavq benchmark (#4712)
* fix _slice and assert explicit device

* with _slice
2024-05-24 19:40:57 +03:00
Szymon Ożóg
84255069e7 Fix int8 and uint8 on PTX (#4711)
* Fix mem type for uchar

* Bring tests back
2024-05-24 11:08:52 -04:00
chenyu
a921f3317f docs: move down tinygrad op and add missing methods (#4710) 2024-05-24 00:11:12 -04:00
chenyu
12ec02d6a3 docs: example formatting, multi examples, activation inputs (#4709) 2024-05-23 23:39:02 -04:00
chenyu
4398cc3654 update test_linearizer.py (#4707)
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
chenyu
8aee3f5a9a docs: split, chunk, pad2d, flatten, unflatten (#4706) 2024-05-23 20:34:40 -04:00
wozeparrot
2c56aa7fe0 activation function docs (#4705) 2024-05-23 17:12:16 -07:00
nimlgen
27abbd5b2b signal pool for nv/amd (#4701)
* signal pool

* useless
2024-05-24 02:09:52 +03:00
Francis Lam
49225522aa wmma: chain unrolled WMMAs and phi only at the end (#4703)
* wmma: chain unrolled WMMAs and phi only at the end

* fix linter and tests

* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d fix UOps.CAST noop for vectorized dtypes (#4704)
* ==

* add test

* not lazyop

* use str comparison for PtrDType

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
Szymon Ożóg
00bc2b738c Fix tensor cores in PTX (#4698) 2024-05-23 16:27:51 -04:00
chenyu
38bc38cdff fix llama example quantize (#4699)
* fix llama example quantize

import quantize layers from new example llama3

add to mac benchmark

* fix that

* save the files
2024-05-23 15:35:26 -04:00
qazal
532c9e08e3 proposal: PHI nodes in TC shouldn't have children inside the loop (#4694)
* expectations from UOpGraph

* one with children

* minimal repro

* replace
2024-05-23 15:11:26 -04:00
chenyu
afb426acaf docs: gather, cat, stack, repeat, squeeze, unsqueeze (#4697)
* docs: gather, cat, stack, repeat, squeeze, unsqueeze

repeat can take separate args now to match torch

* new style for multi examples
2024-05-23 14:20:19 -04:00
chenyu
ce46a7e83f raise CompileError in metal if newLibraryWithSource_options_error_ fails (#4695) 2024-05-23 12:52:46 -04:00
Timmy
871a3292f4 Refactors linearizer acc to a Dict (#4675)
* dict accs refactor

* bug

* linters

* fix line length limit

* renaming do_reduce to reduce_acc b/c it's the acc for whatever reduce we are doing

* reduce_acc is None

* x.op and reduce_acc is not None

* delete extra check

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 19:05:23 +03:00
chenyu
72560e30fe add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM

* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
Yury Zhuravlev
af56f0e68a fix HSA/KFD load for system-wide installation (#4218)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
nimlgen
12339f6564 disable cuda test in ci (#4630)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:23:32 -04:00
Szymon Ożóg
9a9963ba7b Remove uops deepcopy from PTX (#4671)
* Remove uops deepcopy from PTX

* Update test

* Fix test

* fix for non-ptx

* Clean

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
chenyu
47aba47f64 update Torch.gather api (#4692)
* update Torch.gather api

gather(self, dim, index) to match torch

* fix that
2024-05-22 21:54:06 -04:00
chenyu
792a494eb8 fix various examples (#4691)
* fix examples that used ax1 and ax2 for transpose

* fix that

* update those
2024-05-22 20:43:21 -04:00
wozeparrot
30b07f3c5d reduce ops (#4690) 2024-05-22 16:20:56 -07:00
chenyu
a46be6cfef docs for transpose (#4689)
* docs for transpose

change the arg from ax1, ax2 to dim0, dim1 too

* too clever
2024-05-22 18:44:33 -04:00
chenyu
86da83f86d move movement op docs (#4688) 2024-05-22 18:09:14 -04:00
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707 isolated test for BEAM=2 llama wrong uops toposort (#4687)
* add ast

* skip test in CI
2024-05-23 00:47:37 +03:00
wozeparrot
6020595eb0 more tensor.py docs (#4686)
wow much docs
2024-05-22 21:28:26 +00:00
Francis Lam
721f9f6acf test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685)
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
chenyu
f8f97562e0 remove File Specific Variables from env_vars.md (#4684) 2024-05-22 17:00:14 -04:00
chenyu
225dcab3be prepend _ to broadcast_shape and deepwalk (#4683)
* prepend `_` to broadcast_shape and deepwalk

internal only

* that too
2024-05-22 16:39:05 -04:00
qazal
c5f5755328 correctness test for multireduce nested locals (#4682)
* nested locals test

* move st
2024-05-22 19:35:35 +03:00
chenyu
bc9be39dec set timeout in search _try_compile_linearized_w_idx (#4677) 2024-05-22 12:30:31 -04:00