qazal
c7b1d802f1
delete duplicate tests in test_linearizer ( #4723 )
...
* delete duplicate test
test_simplify_uop isn't needed
max works
* ci
* remove skip
* add skip back
2024-05-26 08:11:42 +03:00
nimlgen
c87b066b66
optimize nv sync ( #4729 )
...
* optimize nv sync
* sdma signal without wfi
* nv mockgpu support
* sep change
2024-05-25 23:10:41 +03:00
chenyu
8415b14978
pow cleanup part 3 ( #4731 )
...
fast pow for int or (int+0.5) const exponent. and more comments
2024-05-25 15:48:52 -04:00
Szymon Ożóg
de5c69c4c9
Unify test_dtype naming conventions ( #4730 )
2024-05-25 10:12:40 -04:00
chenyu
7e90026eb0
pow cleanup part 2 ( #4727 )
...
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
chenyu
85e57223bd
pow cleanup part 1 ( #4726 )
...
use _broadcasted to convert 3 cases into 1. const simplification should be handled by const folding.
2024-05-25 03:24:10 -04:00
Szymon Ożóg
f7201b6852
Remove deprecated code ( #4724 )
2024-05-25 03:02:12 -04:00
wozeparrot
5f503226de
finish tensor docs ( #4722 )
2024-05-24 15:57:43 -07:00
chenyu
edf27470c1
docs: fix stack and add dtype.DType ( #4721 )
2024-05-24 18:23:01 -04:00
chenyu
a16d2572a0
docs: clean up mentions of mlops ( #4720 )
2024-05-24 17:49:32 -04:00
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
chenyu
ba116ff630
docs: fix mnist type and fixed seed and loss ( #4717 )
2024-05-24 16:18:30 -04:00
Szymon Ożóg
212025b53c
Int mulacc for ptx ( #4680 )
...
* IntMulacc
* don't mov const
* Don't do int mulacc on ocelot
* Workaround for ocelot
* Remove ocelot workaround
* Fix tests that merged into mulacc
* fix uop count after merging to mulacc
2024-05-24 15:20:48 -04:00
chenyu
0ac761716a
docs: logo and favicon ( #4716 )
2024-05-24 14:33:12 -04:00
Szymon Ożóg
a4de81e9a6
Update ocelot version ( #4715 )
2024-05-24 14:32:53 -04:00
chenyu
a894209bf7
docs: add ConstType to dtypes, limit function to its member ( #4714 )
2024-05-24 14:22:34 -04:00
chenyu
a41701ce71
docs: elementwise ops (broadcasted) and update examples ( #4713 )
...
* docs: elementwise ops (broadcasted) and update examples
* fix where
* space
2024-05-24 13:19:21 -04:00
qazal
c170ddceaf
fix commavq benchmark ( #4712 )
...
* fix _slice and assert explicit device
* with _slice
2024-05-24 19:40:57 +03:00
Szymon Ożóg
84255069e7
Fix int8 and uint8 on PTX ( #4711 )
...
* Fix mem type for uchar
* Bring tests back
2024-05-24 11:08:52 -04:00
chenyu
a921f3317f
docs: move down tinygrad op and add missing methods ( #4710 )
2024-05-24 00:11:12 -04:00
chenyu
12ec02d6a3
docs: example formatting, multi examples, activation inputs ( #4709 )
2024-05-23 23:39:02 -04:00
chenyu
4398cc3654
update test_linearizer.py ( #4707 )
...
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
chenyu
8aee3f5a9a
docs: split, chunk, pad2d, flatten, unflatten ( #4706 )
2024-05-23 20:34:40 -04:00
wozeparrot
2c56aa7fe0
activation function docs ( #4705 )
2024-05-23 17:12:16 -07:00
nimlgen
27abbd5b2b
signal pool for nv/amd ( #4701 )
...
* signal pool
* useless
2024-05-24 02:09:52 +03:00
Francis Lam
49225522aa
wmma: chain unrolled WMMAs and phi only at the end ( #4703 )
...
* wmma: chain unrolled WMMAs and phi only at the end
* fix linter and tests
* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d
fix UOps.CAST noop for vectorized dtypes ( #4704 )
...
* ==
* add test
* not lazyop
* use str comparison for PtrDType
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
Szymon Ożóg
00bc2b738c
Fix tensor cores in PTX ( #4698 )
2024-05-23 16:27:51 -04:00
chenyu
38bc38cdff
fix llama example quantize ( #4699 )
...
* fix llama example quantize
import quantize layers from new example llama3
add to mac benchmark
* fix that
* save the files
2024-05-23 15:35:26 -04:00
qazal
532c9e08e3
proposal: PHI nodes in TC shouldn't have children inside the loop ( #4694 )
...
* expectations from UOpGraph
* one with children
* minimal repro
* replace
2024-05-23 15:11:26 -04:00
chenyu
afb426acaf
docs: gather, cat, stack, repeat, squeeze, unsqueeze ( #4697 )
...
* docs: gather, cat, stack, repeat, squeeze, unsqueeze
repeat can take separate args now to match torch
* new style for multi examples
2024-05-23 14:20:19 -04:00
chenyu
ce46a7e83f
raise CompileError in metal if newLibraryWithSource_options_error_ fails ( #4695 )
2024-05-23 12:52:46 -04:00
Timmy
871a3292f4
Refactors linearizer acc to a Dict ( #4675 )
...
* dict accs refactor
* bug
* linters
* fix line length limit
* renaming do_reduce to reduce_acc b/c it's the acc for whatever reduce we are doing
* reduce_acc is None
* x.op and reduce_acc is not None
* delete extra check
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 19:05:23 +03:00
chenyu
72560e30fe
add CACHELEVEL=0 to tinybox green GEMM BEAM ( #4693 )
...
* add CACHELEVEL=0 to tinybox green GEMM BEAM
* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
Yury Zhuravlev
af56f0e68a
fix HSA/KFD load for system-wide installation ( #4218 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
nimlgen
12339f6564
disable cuda test in ci ( #4630 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:23:32 -04:00
Szymon Ożóg
9a9963ba7b
Remove uops deepcopy from PTX ( #4671 )
...
* Remove uops deepcopy from PTX
* Update test
* Fix test
* fix for non-ptx
* Clean
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
chenyu
47aba47f64
update Torch.gather api ( #4692 )
...
* update Torch.gather api
gather(self, dim, index) to match torch
* fix that
2024-05-22 21:54:06 -04:00
chenyu
792a494eb8
fix various examples ( #4691 )
...
* fix examples that used ax1 and ax2 for transpose
* fix that
* update those
2024-05-22 20:43:21 -04:00
wozeparrot
30b07f3c5d
reduce ops ( #4690 )
2024-05-22 16:20:56 -07:00
chenyu
a46be6cfef
docs for transpose ( #4689 )
...
* docs for transpose
change the arg from ax1, ax2 to dim0, dim1 too
* too clever
2024-05-22 18:44:33 -04:00
chenyu
86da83f86d
move movement op docs ( #4688 )
2024-05-22 18:09:14 -04:00
qazal
498cf3e7e0
fuzzer path search for DEFINE_ACC ( #4656 )
...
* insert acc
* add test_ops
* find toposorts
* todo - not yet ready
* remove the import
* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707
isolated test for BEAM=2 llama wrong uops toposort ( #4687 )
...
* add ast
* skip test in CI
2024-05-23 00:47:37 +03:00
wozeparrot
6020595eb0
more tensor.py docs ( #4686 )
...
wow much docs
2024-05-22 21:28:26 +00:00
Francis Lam
721f9f6acf
test/external/verify_kernel: fix LOGKERNS variable name in comments ( #4685 )
...
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
chenyu
f8f97562e0
remove File Specific Variables from env_vars.md ( #4684 )
2024-05-22 17:00:14 -04:00
chenyu
225dcab3be
prepend _ to broadcast_shape and deepwalk ( #4683 )
...
* prepend `_` to broadcast_shape and deepwalk
internal only
* that too
2024-05-22 16:39:05 -04:00
qazal
c5f5755328
correctness test for multireduce nested locals ( #4682 )
...
* nested locals test
* move st
2024-05-22 19:35:35 +03:00
chenyu
bc9be39dec
set timeout in search _try_compile_linearized_w_idx ( #4677 )
2024-05-22 12:30:31 -04:00