4522 Commits

Author SHA1 Message Date
qazal
c170ddceaf fix commavq benchmark (#4712)
* fix _slice and assert explicit device

* with _slice
2024-05-24 19:40:57 +03:00
Szymon Ożóg
84255069e7 Fix int8 and uint8 on PTX (#4711)
* Fix mem type for uchar

* Bring tests back
2024-05-24 11:08:52 -04:00
chenyu
a921f3317f docs: move down tinygrad op and add missing methods (#4710) 2024-05-24 00:11:12 -04:00
chenyu
12ec02d6a3 docs: example formatting, multi examples, activation inputs (#4709) 2024-05-23 23:39:02 -04:00
chenyu
4398cc3654 update test_linearizer.py (#4707)
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
chenyu
8aee3f5a9a docs: split, chunk, pad2d, flatten, unflatten (#4706) 2024-05-23 20:34:40 -04:00
wozeparrot
2c56aa7fe0 activation function docs (#4705) 2024-05-23 17:12:16 -07:00
nimlgen
27abbd5b2b signal pool for nv/amd (#4701)
* signal pool

* useless
2024-05-24 02:09:52 +03:00
Francis Lam
49225522aa wmma: chain unrolled WMMAs and phi only at the end (#4703)
* wmma: chain unrolled WMMAs and phi only at the end

* fix linter and tests

* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d fix UOps.CAST noop for vectorized dtypes (#4704)
* ==

* add test

* not lazyop

* use str comparison for PtrDType

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
Szymon Ożóg
00bc2b738c Fix tensor cores in PTX (#4698) 2024-05-23 16:27:51 -04:00
chenyu
38bc38cdff fix llama example quantize (#4699)
* fix llama example quantize

import quantize layers from new example llama3

add to mac benchmark

* fix that

* save the files
2024-05-23 15:35:26 -04:00
qazal
532c9e08e3 proposal: PHI nodes in TC shouldn't have children inside the loop (#4694)
* expectations from UOpGraph

* one with children

* minimal repro

* replace
2024-05-23 15:11:26 -04:00
chenyu
afb426acaf docs: gather, cat, stack, repeat, squeeze, unsqueeze (#4697)
* docs: gather, cat, stack, repeat, squeeze, unsqueeze

repeat can take separate args now to match torch

* new style for multi examples
2024-05-23 14:20:19 -04:00
chenyu
ce46a7e83f raise CompileError in metal if newLibraryWithSource_options_error_ fails (#4695) 2024-05-23 12:52:46 -04:00
Timmy
871a3292f4 Refactors linearizer acc to a Dict (#4675)
* dict accs refactor

* bug

* linters

* fix line length limit

* renaming do_reduce to reduce_acc b/c it's the acc for whatever reduce we are doing

* reduce_acc is None

* x.op and reduce_acc is not None

* delete extra check

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 19:05:23 +03:00
chenyu
72560e30fe add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM

* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
Yury Zhuravlev
af56f0e68a fix HSA/KFD load for system-wide installation (#4218)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
nimlgen
12339f6564 disable cuda test in ci (#4630)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:23:32 -04:00
Szymon Ożóg
9a9963ba7b Remove uops deepcopy from PTX (#4671)
* Remove uops deepcopy from PTX

* Update test

* Fix test

* fix for non-ptx

* Clean

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
chenyu
47aba47f64 update Torch.gather api (#4692)
* update Torch.gather api

gather(self, dim, index) to match torch

* fix that
2024-05-22 21:54:06 -04:00
chenyu
792a494eb8 fix various examples (#4691)
* fix examples that used ax1 and ax2 for transpose

* fix that

* update those
2024-05-22 20:43:21 -04:00
wozeparrot
30b07f3c5d reduce ops (#4690) 2024-05-22 16:20:56 -07:00
chenyu
a46be6cfef docs for transpose (#4689)
* docs for transpose

change the arg from ax1, ax2 to dim0, dim1 too

* too clever
2024-05-22 18:44:33 -04:00
chenyu
86da83f86d move movement op docs (#4688) 2024-05-22 18:09:14 -04:00
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707 isolated test for BEAM=2 llama wrong uops toposort (#4687)
* add ast

* skip test in CI
2024-05-23 00:47:37 +03:00
wozeparrot
6020595eb0 more tensor.py docs (#4686)
wow much docs
2024-05-22 21:28:26 +00:00
Francis Lam
721f9f6acf test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685)
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
chenyu
f8f97562e0 remove File Specific Variables from env_vars.md (#4684) 2024-05-22 17:00:14 -04:00
chenyu
225dcab3be prepend _ to broadcast_shape and deepwalk (#4683)
* prepend `_` to broadcast_shape and deepwalk

internal only

* that too
2024-05-22 16:39:05 -04:00
qazal
c5f5755328 correctness test for multireduce nested locals (#4682)
* nested locals test

* move st
2024-05-22 19:35:35 +03:00
chenyu
bc9be39dec set timeout in search _try_compile_linearized_w_idx (#4677) 2024-05-22 12:30:31 -04:00
qazal
d12d412e8b revert uops dtype in pattern matcher (#4681)
This reverts commit 5f84cbb5df.
2024-05-22 14:45:51 +03:00
Elias Wahl
acc0039cfc Resume fix + scheduler for non weight decay params (#4679)
* move ckpt dir

* fix resume. Add scheduler group
2024-05-21 19:38:13 -04:00
chenyu
0f21aa0416 example kernel that triggers Memory access fault for resnet on red (#4678) 2024-05-21 18:59:36 -04:00
qazal
5f84cbb5df keep UOps.CAST in PHI-GEP fold for unmatching dtypes (#4674)
* these should be val.dtype

* cast float4 and float2 to root

* document tests

* 2 args

* fix assert

* match dtype

* no extra lines

* better fix
2024-05-21 14:59:49 -04:00
qazal
458a3961eb catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
wozeparrot
00432496d7 feat: tinyboxgreen (#4366)
* feat: tinyboxgreen

* feat: tinyboxgreenv2

* fix symlink weights

* fix: remove llama 2 70b for now

* feat: naming

* fix: remove extra cifar steps

* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
Timmy
de733d73cf Multireduce Linearizer Tests (#4665)
* updated tests

* make sure the upcasting tests actually cause the problem

* diff cleanup

* use UOpGraph utils

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-21 02:43:25 +03:00
chenyu
5e3fbbb33e llama3 example add manual seed and log seed (#4667) 2024-05-20 19:09:57 -04:00
chenyu
8c99cc17f5 remove link to old adding_new_accelerators.md (#4666)
fix #4657
2024-05-20 19:05:23 -04:00
chenyu
c4089d169f update BEAM_LOCAL_MAX to 1024 (#4664)
we used 1024 for mlperf submission and result steps time is 20% faster. the default should not be worse
2024-05-20 18:06:32 -04:00
chenyu
704cb1d8a0 fix conversation.py quantize (#4663)
it used to be true for int8, now it's a string for int8 or nf4
2024-05-20 17:36:37 -04:00
chenyu
ae861325ce update llama sample for mac 32 input buffer limit (#4662)
set default sampling params to function call to 0, and top k in llama3 to 25.
2024-05-20 17:23:39 -04:00
Elias Wahl
993091adfa loss scaler + nan fixes (#4661) 2024-05-20 17:08:35 -04:00
qazal
b33c827aed UOps.RANGE toposort spec (#4660)
* use iterator

* nested loops and outer loads

* uop after phi
2024-05-20 23:38:20 +03:00
qazal
0d9e623d83 consolidate uops tests (#4659)
* merge uoptimize

* move tests

* fix skip message
2024-05-20 21:42:31 +03:00
Szymon Ożóg
1e7b7b2c3c Fix flop counting for mulacc (#4640)
* Fix flop counting for mulacc

* add test_simple_mulacc

* Update test_uops_stats.py

* Update test_uops_stats.py

* revert test_mulacc

* Test for MULACC vs MUL+ADD
2024-05-20 12:06:00 -04:00
wozeparrot
b144d4b460 new llama3 example (#4576) 2024-05-19 22:42:23 -07:00