Commit Graph

4517 Commits

Author SHA1 Message Date
chenyu
8aee3f5a9a docs: split, chunk, pad2d, flatten, unflatten (#4706) 2024-05-23 20:34:40 -04:00
wozeparrot
2c56aa7fe0 activation function docs (#4705) 2024-05-23 17:12:16 -07:00
nimlgen
27abbd5b2b signal pool for nv/amd (#4701)
* signal pool

* useless
2024-05-24 02:09:52 +03:00
Francis Lam
49225522aa wmma: chain unrolled WMMAs and phi only at the end (#4703)
* wmma: chain unrolled WMMAs and phi only at the end

* fix linter and tests

* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d fix UOps.CAST noop for vectorized dtypes (#4704)
* ==

* add test

* not lazyop

* use str comparison for PtrDType

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
Szymon Ożóg
00bc2b738c Fix tensor cores in PTX (#4698) 2024-05-23 16:27:51 -04:00
chenyu
38bc38cdff fix llama example quantize (#4699)
* fix llama example quantize

import quantize layers from new example llama3

add to mac benchmark

* fix that

* save the files
2024-05-23 15:35:26 -04:00
qazal
532c9e08e3 proposal: PHI nodes in TC shouldn't have children inside the loop (#4694)
* expectations from UOpGraph

* one with children

* minimal repro

* replace
2024-05-23 15:11:26 -04:00
chenyu
afb426acaf docs: gather, cat, stack, repeat, squeeze, unsqueeze (#4697)
* docs: gather, cat, stack, repeat, squeeze, unsqueeze

repeat can take separate args now to match torch

* new style for multi examples
2024-05-23 14:20:19 -04:00
chenyu
ce46a7e83f raise CompileError in metal if newLibraryWithSource_options_error_ fails (#4695) 2024-05-23 12:52:46 -04:00
Timmy
871a3292f4 Refactors linearizer acc to a Dict (#4675)
* dict accs refactor

* bug

* linters

* fix line length limit

* renaming do_reduce to reduce_acc b/c it's the acc for whatever reduce we are doing

* reduce_acc is None

* x.op and reduce_acc is not None

* delete extra check

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 19:05:23 +03:00
chenyu
72560e30fe add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM

* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
Yury Zhuravlev
af56f0e68a fix HSA/KFD load for system-wide installation (#4218)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
nimlgen
12339f6564 disable cuda test in ci (#4630)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:23:32 -04:00
Szymon Ożóg
9a9963ba7b Remove uops deepcopy from PTX (#4671)
* Remove uops deepcopy from PTX

* Update test

* Fix test

* fix for non-ptx

* Clean

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
chenyu
47aba47f64 update Tensor.gather api (#4692)
* update Tensor.gather api

gather(self, dim, index) to match torch

* fix that
2024-05-22 21:54:06 -04:00
chenyu
792a494eb8 fix various examples (#4691)
* fix examples that used ax1 and ax2 for transpose

* fix that

* update those
2024-05-22 20:43:21 -04:00
wozeparrot
30b07f3c5d reduce ops (#4690) 2024-05-22 16:20:56 -07:00
chenyu
a46be6cfef docs for transpose (#4689)
* docs for transpose

change the arg from ax1, ax2 to dim0, dim1 too

* too clever
2024-05-22 18:44:33 -04:00
chenyu
86da83f86d move movement op docs (#4688) 2024-05-22 18:09:14 -04:00
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707 isolated test for BEAM=2 llama wrong uops toposort (#4687)
* add ast

* skip test in CI
2024-05-23 00:47:37 +03:00
wozeparrot
6020595eb0 more tensor.py docs (#4686)
wow much docs
2024-05-22 21:28:26 +00:00
Francis Lam
721f9f6acf test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685)
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
chenyu
f8f97562e0 remove File Specific Variables from env_vars.md (#4684) 2024-05-22 17:00:14 -04:00
chenyu
225dcab3be prepend _ to broadcast_shape and deepwalk (#4683)
* prepend `_` to broadcast_shape and deepwalk

internal only

* that too
2024-05-22 16:39:05 -04:00
qazal
c5f5755328 correctness test for multireduce nested locals (#4682)
* nested locals test

* move st
2024-05-22 19:35:35 +03:00
chenyu
bc9be39dec set timeout in search _try_compile_linearized_w_idx (#4677) 2024-05-22 12:30:31 -04:00
qazal
d12d412e8b revert uops dtype in pattern matcher (#4681)
This reverts commit 5f84cbb5df.
2024-05-22 14:45:51 +03:00
Elias Wahl
acc0039cfc Resume fix + scheduler for non weight decay params (#4679)
* move ckpt dir

* fix resume. Add scheduler group
2024-05-21 19:38:13 -04:00
chenyu
0f21aa0416 example kernel that triggers Memory access fault for resnet on red (#4678) 2024-05-21 18:59:36 -04:00
qazal
5f84cbb5df keep UOps.CAST in PHI-GEP fold for unmatching dtypes (#4674)
* these should be val.dtype

* cast float4 and float2 to root

* document tests

* 2 args

* fix assert

* match dtype

* no extra lines

* better fix
2024-05-21 14:59:49 -04:00
qazal
458a3961eb catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
wozeparrot
00432496d7 feat: tinyboxgreen (#4366)
* feat: tinyboxgreen

* feat: tinyboxgreenv2

* fix symlink weights

* fix: remove llama 2 70b for now

* feat: naming

* fix: remove extra cifar steps

* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
Timmy
de733d73cf Multireduce Linearizer Tests (#4665)
* updated tests

* make sure the upcasting tests actually cause the problem

* diff cleanup

* use UOpGraph utils

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-21 02:43:25 +03:00
chenyu
5e3fbbb33e llama3 example add manual seed and log seed (#4667) 2024-05-20 19:09:57 -04:00
chenyu
8c99cc17f5 remove link to old adding_new_accelerators.md (#4666)
fix #4657
2024-05-20 19:05:23 -04:00
chenyu
c4089d169f update BEAM_LOCAL_MAX to 1024 (#4664)
we used 1024 for the mlperf submission and the resulting step time was 20% faster. the default should not be worse
2024-05-20 18:06:32 -04:00
chenyu
704cb1d8a0 fix conversation.py quantize (#4663)
it used to be true for int8, now it's a string for int8 or nf4
2024-05-20 17:36:37 -04:00
chenyu
ae861325ce update llama sample for mac 32 input buffer limit (#4662)
set the default sampling params in the function call to 0, and top k in llama3 to 25.
2024-05-20 17:23:39 -04:00
Elias Wahl
993091adfa loss scaler + nan fixes (#4661) 2024-05-20 17:08:35 -04:00
qazal
b33c827aed UOps.RANGE toposort spec (#4660)
* use iterator

* nested loops and outer loads

* uop after phi
2024-05-20 23:38:20 +03:00
qazal
0d9e623d83 consolidate uops tests (#4659)
* merge uoptimize

* move tests

* fix skip message
2024-05-20 21:42:31 +03:00
Szymon Ożóg
1e7b7b2c3c Fix flop counting for mulacc (#4640)
* Fix flop counting for mulacc

* add test_simple_mulacc

* Update test_uops_stats.py

* Update test_uops_stats.py

* revert test_mulacc

* Test for MULACC vs MUL+ADD
2024-05-20 12:06:00 -04:00
wozeparrot
b144d4b460 new llama3 example (#4576) 2024-05-19 22:42:23 -07:00
nimlgen
c9f7f2da70 nv hcq bind api (#4629)
* hcq bind api for nv

* linter

* linter

* add test

* small comment
2024-05-19 23:17:10 +03:00
qazal
d308f4fa9a correctly insert UOps.END* in fuzz result (#4653) 2024-05-19 21:10:28 +03:00
chenyu
456aa0b656 update test_search kernel count (#4652)
integration test that beaming 1 kernel increments kernel count by 1, and moved existing test_kernel_count to TestTimeLinearizer
2024-05-19 13:54:52 -04:00
qazal
954718e6bf reorder DEFINE_GLOBAL in fuzz_uops (#4651)
* globals base

* test: opt out of DEFINE_GLOBAL

* do it like ExecItem
2024-05-19 20:51:31 +03:00
Léo
967e35f8b8 fix(beam): GlobalCounters kernel count increasing when clearing l2 (#4598)
* fix(beam): GlobalCounters kernel count increasing when clearing l2

* fix: removed the NOSTATS var by adding do_update_stats to Tensor.realize()

* test(search): regression test for _time_program, should not increment kernel_count

* fix(test_search): unused var and now properly checking when l2 is cleared

* fix(test_search): added assert message

* fix(test_search): now testing public beam api for kcount

* ruff fixes

---------

Co-authored-by: Léo Paillé <leo.paille@enseirb-matmeca.fr>
2024-05-19 10:03:47 -07:00