tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 23:38:58 -05:00

Author	SHA1	Message	Date
qazal	c170ddceaf	fix commavq benchmark (#4712 ) * fix _slice and assert explicit device * with _slice	2024-05-24 19:40:57 +03:00
Szymon Ożóg	84255069e7	Fix int8 and uint8 on PTX (#4711 ) * Fix mem type for uchar * Bring tests back	2024-05-24 11:08:52 -04:00
chenyu	a921f3317f	docs: move down tinygrad op and add missing methods (#4710 )	2024-05-24 00:11:12 -04:00
chenyu	12ec02d6a3	docs: example formatting, multi examples, activation inputs (#4709 )	2024-05-23 23:39:02 -04:00
chenyu	4398cc3654	update test_linearizer.py (#4707 ) tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc	2024-05-23 22:41:22 -04:00
chenyu	8aee3f5a9a	docs: split, chunk, pad2d, flatten, unflatten (#4706 )	2024-05-23 20:34:40 -04:00
wozeparrot	2c56aa7fe0	activation function docs (#4705 )	2024-05-23 17:12:16 -07:00
nimlgen	27abbd5b2b	signal pool for nv/amd (#4701 ) * signal pool * useless	2024-05-24 02:09:52 +03:00
Francis Lam	49225522aa	wmma: chain unrolled WMMAs and phi only at the end (#4703 ) * wmma: chain unrolled WMMAs and phi only at the end * fix linter and tests * reduce lines	2024-05-23 17:50:18 -04:00
chenyu	eb714a600d	fix UOps.CAST noop for vectorized dtypes (#4704 ) * == * add test * not lazyop * use str comparison for PtrDType --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-05-23 17:33:29 -04:00
Szymon Ożóg	00bc2b738c	Fix tensor cores in PTX (#4698 )	2024-05-23 16:27:51 -04:00
chenyu	38bc38cdff	fix llama example quantize (#4699 ) * fix llama example quantize import quantize layers from new example llama3 add to mac benchmark * fix that * save the files	2024-05-23 15:35:26 -04:00
qazal	532c9e08e3	proposal: PHI nodes in TC shouldn't have children inside the loop (#4694 ) * expectations from UOpGraph * one with children * minimal repro * replace	2024-05-23 15:11:26 -04:00
chenyu	afb426acaf	docs: gather, cat, stack, repeat, squeeze, unsqueeze (#4697 ) * docs: gather, cat, stack, repeat, squeeze, unsqueeze repeat can take separate args now to match torch * new style for multi examples	2024-05-23 14:20:19 -04:00
chenyu	ce46a7e83f	raise CompileError in metal if newLibraryWithSource_options_error_ fails (#4695 )	2024-05-23 12:52:46 -04:00
Timmy	871a3292f4	Refactors linearizer `acc` to a Dict (#4675 ) * dict accs refactor * bug * linters * fix line length limit * renaming do_reduce to reduce_acc b/c it's the acc for whatever reduce we are doing * reduce_acc is None * x.op and reduce_acc is not None * delete extra check --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-05-23 19:05:23 +03:00
chenyu	72560e30fe	add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693 ) * add CACHELEVEL=0 to tinybox green GEMM BEAM * BEAM=4 is more stable	2024-05-22 23:59:50 -04:00
Yury Zhuravlev	af56f0e68a	fix HSA/KFD load for system-wide installation (#4218 ) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2024-05-22 20:33:21 -07:00
nimlgen	12339f6564	disable cuda test in ci (#4630 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-05-22 23:23:32 -04:00
Szymon Ożóg	9a9963ba7b	Remove uops deepcopy from PTX (#4671 ) * Remove uops deepcopy from PTX * Update test * Fix test * fix for non-ptx * Clean --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-05-22 23:14:17 -04:00
chenyu	47aba47f64	update Torch.gather api (#4692 ) * update Torch.gather api gather(self, dim, index) to match torch * fix that	2024-05-22 21:54:06 -04:00
chenyu	792a494eb8	fix various examples (#4691 ) * fix examples that used ax1 and ax2 for transpose * fix that * update those	2024-05-22 20:43:21 -04:00
wozeparrot	30b07f3c5d	reduce ops (#4690 )	2024-05-22 16:20:56 -07:00
chenyu	a46be6cfef	docs for transpose (#4689 ) * docs for transpose change the arg from ax1, ax2 to dim0, dim1 too * too clever	2024-05-22 18:44:33 -04:00
chenyu	86da83f86d	move movement op docs (#4688 )	2024-05-22 18:09:14 -04:00
qazal	498cf3e7e0	fuzzer path search for DEFINE_ACC (#4656 ) * insert acc * add test_ops * find toposorts * todo - not yet ready * remove the import * atol and childless children	2024-05-23 00:50:01 +03:00
qazal	f11a81f707	isolated test for BEAM=2 llama wrong uops toposort (#4687 ) * add ast * skip test in CI	2024-05-23 00:47:37 +03:00
wozeparrot	6020595eb0	more tensor.py docs (#4686 ) wow much docs	2024-05-22 21:28:26 +00:00
Francis Lam	721f9f6acf	test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685 ) should've been changed with the LOGKERN to LOGKERNS change	2024-05-22 17:08:40 -04:00
chenyu	f8f97562e0	remove File Specific Variables from env_vars.md (#4684 )	2024-05-22 17:00:14 -04:00
chenyu	225dcab3be	prepend `_` to broadcast_shape and deepwalk (#4683 ) * prepend `_` to broadcast_shape and deepwalk internal only * that too	2024-05-22 16:39:05 -04:00
qazal	c5f5755328	correctness test for multireduce nested locals (#4682 ) * nested locals test * move st	2024-05-22 19:35:35 +03:00
chenyu	bc9be39dec	set timeout in search _try_compile_linearized_w_idx (#4677 )	2024-05-22 12:30:31 -04:00
qazal	d12d412e8b	revert uops dtype in pattern matcher (#4681 ) This reverts commit `5f84cbb5df`.	2024-05-22 14:45:51 +03:00
Elias Wahl	acc0039cfc	Resume fix + scheduler for non weight decay params (#4679 ) * move ckpt dir * fix resume. Add scheduler group	2024-05-21 19:38:13 -04:00
chenyu	0f21aa0416	example kernel that triggers Memory access fault for resnet on red (#4678 )	2024-05-21 18:59:36 -04:00
qazal	5f84cbb5df	keep UOps.CAST in PHI-GEP fold for unmatching dtypes (#4674 ) * these should be val.dtype * cast float4 and float2 to root * document tests * 2 args * fix assert * match dtype * no extra lines * better fix	2024-05-21 14:59:49 -04:00
qazal	458a3961eb	catch compile errors in uops tests (#4672 ) * use helper and compile * llama beam=2 * ast length * skip float4, fix hsa * use empty tensors	2024-05-21 12:20:35 +03:00
wozeparrot	00432496d7	feat: tinyboxgreen (#4366 ) * feat: tinyboxgreen * feat: tinyboxgreenv2 * fix symlink weights * fix: remove llama 2 70b for now * feat: naming * fix: remove extra cifar steps * feat: disable mixtral on nvidia	2024-05-20 22:39:34 -04:00
Timmy	de733d73cf	Multireduce Linearizer Tests (#4665 ) * updated tests * make sure the upcasting tests actually causes the problem * diff cleanup * use UOpGraph utils --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-05-21 02:43:25 +03:00
chenyu	5e3fbbb33e	llama3 example add manual seed and log seed (#4667 )	2024-05-20 19:09:57 -04:00
chenyu	8c99cc17f5	remove link to old adding_new_accelerators.md (#4666 ) fix #4657	2024-05-20 19:05:23 -04:00
chenyu	c4089d169f	update BEAM_LOCAL_MAX to 1024 (#4664 ) we used 1024 for mlperf submission and result steps time is 20% faster. the default should not be worse	2024-05-20 18:06:32 -04:00
chenyu	704cb1d8a0	fix conversation.py quantize (#4663 ) it used to be true for int8, now it's a string for int8 or nf4	2024-05-20 17:36:37 -04:00
chenyu	ae861325ce	update llama sample for mac 32 input buffer limit (#4662 ) set default sampling params to function call to 0, and top k in llama3 to 25.	2024-05-20 17:23:39 -04:00
Elias Wahl	993091adfa	loss scaler + nan fixes (#4661 )	2024-05-20 17:08:35 -04:00
qazal	b33c827aed	UOps.RANGE toposort spec (#4660 ) * use iterator * nested loops and outer loads * uop after phi	2024-05-20 23:38:20 +03:00
qazal	0d9e623d83	consolidate uops tests (#4659 ) * merge uoptimize * move tests * fix skip message	2024-05-20 21:42:31 +03:00
Szymon Ożóg	1e7b7b2c3c	Fix flop counting for mulacc (#4640 ) * Fix flop counting for mulacc * add test_simple_mulacc * Update test_uops_stats.py * Update test_uops_stats.py * revert test_mulacc * Test for MULACC vs MUL+ADD	2024-05-20 12:06:00 -04:00
wozeparrot	b144d4b460	new llama3 example (#4576 )	2024-05-19 22:42:23 -07:00

4522 Commits