tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-22 21:38:10 -05:00

Author	SHA1	Message	Date
Roelof van Dijk	1900acda09	[READY] ci: setup venv cache (#1475 ) * ci: cache installed packages * ci: trigger jobs * ci: fix hashfiles argument --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-20 18:43:16 -07:00
Umut Zengin	3fc7e984f0	__getitem__ refactoring (#1586 ) * dene * dene * form * form * form * form * lint * small change * preserve old * revert to explicit reshape	2023-08-20 18:42:30 -07:00
George Hotz	d627349af0	teeny changes (#1589 ) * teeny changes * import order	2023-08-20 13:38:38 -07:00
George Hotz	012ee7d162	not worth the speed (#1584 ) * not worth the speed * no slots * uops comments * bump to python 3.11 for speed * add critical slots back	2023-08-20 10:24:58 -07:00
George Hotz	739f327d2d	Shorter (#1582 ) * deleting lines * remove insert dims * if statement is never hit * bug fixes	2023-08-20 08:12:16 -07:00
David Hou	4fbce972d7	CSE at uop level (#1483 ) * uop-level cse * add test * don't cache reduce alu ops * types * rename variable * fix * delete lines	2023-08-19 23:40:40 -07:00
George Hotz	b9feb1b743	fp16 support in stable diffusion	2023-08-20 05:37:21 +00:00
George Hotz	ad7d26c393	fix __launch_bounds__ and benchmark TC MATMUL (#1575 ) * fix * benchmark matmul	2023-08-19 10:54:39 -07:00
David Hou	92754e177c	cache buffer loads across multiple bufs (#1482 ) * cache loads across buffers (since they may share rawbufs) * typing * add test * fix test * small changes to test * fix test * one big cache * whitespace * golf a line? * invalid is RawBuffer(0)[0], valid 1.	2023-08-19 09:09:58 -07:00
George Hotz	e464442adf	WMMA for 7900XTX (#1563 ) * go * hip no LRU * work * works * 16 TFLOPS * 29 TFLOPS * 30 TFLOPS * never mind, it's 60 TFLOPS * fix metal WMMA * put hip alloc back	2023-08-19 09:07:23 -07:00
nimlgen	faa521bcab	fix usage of arm64 regs according to CC (#1570 )	2023-08-18 21:40:32 -07:00
corranr	68ebbd2954	for issue #1555 , int64 and int8 in CI=1 ARM64=1 CLANG=1 (#1572 ) * fixed for int8,int64, added dtype broadcasting test, passing all CI,ARM64,CLANG tests * remove shifts	2023-08-18 21:40:13 -07:00
chenyu	ae39cf84ab	Symbolic Shape JIT main PR (#1353 ) * Symbolic Shape JIT update tests 2 variables symbolic ops, adding more tests test passing cleanup * more test cases * single flag * review update * jit attention one piece * realize * symbolic_jit test for cuda * old artifact * works with cuda gpu but failed ci * CUDACPU	2023-08-18 14:39:55 -07:00
Roelof van Dijk	84e6693915	fix: apt-get to apt, no recommends, clean up (#1571 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-18 13:48:59 -07:00
wozeparrot	50decf0d45	train cifar using multigpu (#1529 ) * feat: train cifar using multigpu * feat: split eval batch across 5 * feat: cleaner allreduce * feat: 93.88% * feat: cleaner batch chunking from bert * feat: cleaner grad sync * feat: tinygrad argmax * feat: make it work with different gpu counts * feat: move some stuff into the normal __init__ * feat: autodetect gpu count * feat: move import inside	2023-08-18 09:35:44 -07:00
chenyu	be50b2fe8f	more symbolic symbolic ops (#1564 ) * more symbolic symbolic ops * handle NumNode in __mul__	2023-08-18 09:21:41 -07:00
chenyu	dfec16cc83	Support arg int for CUDA kernel (#1565 )	2023-08-18 09:19:40 -07:00
wozeparrot	15150d60c4	fix: small fix for lru on hip (#1567 )	2023-08-18 09:18:38 -07:00
wozeparrot	c65ad43a93	cleanup ops_gpu (#1566 )	2023-08-17 23:43:08 -04:00
nimlgen	bd111411bf	init allocator for compiled backends (#1467 ) * init allocator for compiled backends * Update ops_webgpu.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-17 10:33:32 -07:00
geohotstan	a293c18d34	Gather bugfix (#1561 )	2023-08-16 19:53:14 -04:00
Ethan Sorrell	cb62911f6b	PTX Reintegration and Passing Tests (#1512 ) * move assembly, assembly_ptx * successful but broken rendering of ptx asm * clear ins before render asm * slightly less broken :') * we needed thread syncs * fix float16 loading, rounding modifiers and other casting stuff, passing casts_from_half * Fix runtime_args for gpuocelot * our casts were flipped on both ends * more casting * add ternary where op * dealing with storing/loading bool * add test for casting to bool from negative * Fix args.valid on ConstOp * add to CI, TODO: fix runtime_args for test_uops * fix placement of runtime_args to work with lazy.Device * undo ci changes so I can push * fix lints * start cleanup and fix things we broke fixing lints * add checks for PTX specifc asm instructions * revert added test -- doesn't pass on llvm * skip tests for underflow,overflow * another fix for how we're setting runtime args * Less broken cleanup * add to CI * add more env variables for ci test * fix ci to install pycuda for ptx * ci: copy cuda test command * cleanup * assert to make sure we're actually running ptx in ci * remove test assert * move is_ptx arg * move assembly, assembly_ptx back to extras * fix imports * initial merge fixes * clear registers, fix UOps.LOAD with invalid value * draft merge fixes * remove prints * quick lint and merge fixes * cleanup * remove PTXProgram wrapper * final cleanup * temp change for ci rerun * ci rerun * rollback ISA version	2023-08-16 16:20:20 -07:00
geohotstan	8763037f0e	Fancy indexing is fancy wow and gather thing (#1399 )	2023-08-16 18:35:49 -04:00
chenyu	11dd9b1741	symbolic codegen and exec (#1552 ) * symbolic codegen and exec * fix and add test * no sketchy * merge_dicts type * dtypes._arg_int32	2023-08-16 14:43:41 -07:00
George Hotz	1e1d48b4e6	single model (#1560 )	2023-08-16 13:22:19 -07:00
JaSpa99	491e85597a	Run onnx commavq model (#1537 ) * try to run commavq * fix 0 dim, start implementing new ops - Implement EmbedLayerNormalization - Implement Attention * SkipLayerNormalization and FastGelu * use original torch model, cast inputs * fix some ops: - properly do Cast - Attention: bi- and unidirectional - FastGelu: add bias before gelu * cleanup onnx_ops.py * add validation option to benchmark * cleanup imports * add checks incase onnx2torch implements ops in future * run onnx instead of original torch * just skip gpu on m1 * reactivate the other models * check for strange params & squash whitespace * cleanup * fix causal mask Attention * Range doesn't need int cast * embedding vocab_counter same dtype as input * no need to cast * always validate, fix PosixPath ort --------- Co-authored-by: George Hotz <george@comma.ai>	2023-08-16 12:24:40 -07:00
wozeparrot	55d95d1658	llama 70b (#1558 ) * feat: llama 70b * feat: llama 70b but simpler	2023-08-16 11:36:12 -07:00
nimlgen	c93e63b8b5	make TestNonFloatUOps.test_mul_bool pass on all platforms (#1557 )	2023-08-16 11:34:09 -07:00
wozeparrot	074c467020	hotfix for broken ci (#1559 )	2023-08-16 13:52:03 -04:00
madt2709	962972ee68	Fix uops int32 for llvm (#1554 ) * fix-uops-int32-llvm * fix tests * Ignore mypy error	2023-08-15 23:22:32 -07:00
Sam Barani	2cde667d40	Change Any to List[Optional[RawBuffer]] in JIT (#1553 ) * Change Any to List[Optional[RawBuffer]] in JIT * remove ignore[no-redef] * remove ignore * pick different names	2023-08-15 23:21:33 -07:00
nimlgen	fa81e282c2	fix missing dtypes in is_int,is_float,is_unsigned (#1550 )	2023-08-15 21:22:29 -04:00
Diogo	d17ecccd78	Torch/LLVM/arm F64 support (#1551 )	2023-08-15 21:21:08 -04:00
YiMing Han	913263c155	add double: c_type.double for CLANG (#1549 )	2023-08-15 13:19:33 -07:00
George Hotz	0b5930d406	more uops testing, who isn't passing right now... (#1522 ) * more uops * llvm refactor * update test uops * rest of the nodes * ors and ands	2023-08-15 09:07:26 -07:00
George Hotz	f8109b830c	promote assembly to the main codebase (#1544 ) * promote assembly to the main codebase * not namedtuple	2023-08-14 22:47:45 -07:00
wozeparrot	666ac61070	support for p2p buffer transfers (#1523 ) * feat: RawBufferTransfer * feat: gate behind P2P * feat: gate properly * feat: raise error when not implemented	2023-08-14 22:39:57 -07:00
Steven Anderson	93a36c3659	Arm (#1421 ) * testing new memops * better debugging * testing padded conv * branching with load * refactoring a bit * first try * fixing bugs * fixing some * eq * eq2 * do not use x's * working * fixing imm * getting things working * refactor * pow not working * working except one * refactor: one store mem * refactor: global load * refactor: imm * refactor: cleaning * fixing big offsets * refactor with ci * try ci * typo * another typo * ubuntu default * forgot git * do i need git? * missing packages * adding python-dev * with cache? * buildx action * buildx name issue? * maybe now? * python3 * newline warning * maybe now * i actually need this * ci should work now * improved caching * fixing cache * maybe now it will cache * this * testing cache * trying again * load * missing platform * caching gha * testing cache * full testing * typo * now? * why * adding checkout back * bad formatting * fixing convention issues * supporting python * adding CI flag * testing all * better comments * adding debugging * takes 12x longer * does it output progress now? * ignore models for speed * fixing merge * excluding conv_transpose2d * only 2 test cuz is to slow * another approach * let's see * faster duh * my bad * T_T * typo * sup * with output? * comment test * comment test * comment test * :? * no comment * with cache * back to normal * testing that ci works * back to passing * trying again * does it create another entry * does it create another entry? * build local * hey * Revert "excluding conv_transpose2d" This reverts commit `cc7348de03`. * does it cache if done before? * does it cache? * done * adding test ops * bad formatting * no need for this * working static mem * sum 1d * add ndim * better reg import * fix stack * back to np * working except for softmax * 5 failing * no pogress * remove keystone * remove keystone * testops passing * cleanups * more cleanup * typo * ci * ci2 * cond import * ci3 * ci4 * ci4 * ci5 * ci5 * ci6 * aligment * test all * correct test * err read_unmapped * passing test * ignore for speed * ignore for speed * ci7 * cleanup * remove docker * fixing merge * fixing bugs * add skipload for const ops * comments * First merge to master: Renderer * fix emulation * passing all tests arm64 * cleaning * fix handcoded binary * cleaning * fix errs * fix runtime arg binary * clean git diff * fix and clean * fixing metal test * cleaning * fix metal test * ci ~8 min * fix pylint and clang * cache the files in ops_clang --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-14 19:29:30 -07:00
chenyu	a89142e46f	ShapeTracker.var_vals (#1540 )	2023-08-14 18:53:37 -07:00
Pavol Rusnak	a453d718a1	fix file race condition in ops_clang via pid in the filename (#1541 ) * fix file race condition in ops_clang via pid in the filename as suggested in https://github.com/tinygrad/tinygrad/pull/1458/files#r1292819054 * add explanation why a temp file is required on ops_clang	2023-08-14 18:50:10 -07:00
wozeparrot	9cb2bda34f	Revert "Better reshape (#1423 )" (#1538 )	2023-08-14 13:04:54 -04:00
Sieds Lykles	cf2bf1518d	Better reshape (#1423 ) * do reshaping without merge_views and reshape masks * added tests * properly do reshaping of zero or negative masks * replace while loop with single expression * remove old condition * add more tests and comments * remove empty file	2023-08-14 09:09:04 -07:00
YiMing Han	e00acb1eaf	fix deepwalk ctx check (#1536 )	2023-08-13 23:03:17 -07:00
JaSpa99	2fd7004980	Implementation of SoftVC VITS SVC model (#1371 ) * [WIP]: implementation of SoftVC VITS SVC model * fix typo * fix whitespace * Fully implement Generator & Synthesizer - implement SineGen & SourceHnNSF to reconstruct source signal from F0 - source signal is added during Generator - fix various typos - start loading state dict for synthesizer * Load Synthesizer weights - Fix typos in Synthesizer - Slightly modify vits::load_checkpoint to skip a specified layer - Test with Saul Goodman model because Drake weights are on mega * start work on ContentVec - implement ConvFeatureExtractionModel for ContentVec - start work on TransformerEncoder for ContentVec: - this transformer probably needs its own MultiheadAttention implementation - fix various typos in synthesizer - add helpers to mask behavior of ~ and % operator of torch * use normal and kaiming_normal * Implement ContentVec - load ContentVec weights and config from fairseq hyperparams - use MultiHeadAttention from whisper.py - TransformerSentenceEncoderLayer might still need some tweaking, will see during inference testing - redid tilde() - some cleanup * rename the file so it can be imported * forgot to lint * use float() instead of cast() * add contentvec256l9 and cleanup * Implement SoVITS fully and run it - Fully run sovits with .wav file - Drake weights need to be manually downloaded for now - Fix bugs - Add examples/sovits_helpers - Big TODO: INVALID Kernel for recordings > 4.5 secs * temp fix for longer audio recordings * Upsample no more torch * cleanup & detailed inference time measuring * Completely remove torch(audio) - Implement sinc resample in tinygrad - Load audio via Soundfile - Some cleanups * move stuff to helper files * Cleanup * fix invalid kernel * Cleanup & add more models * Metal sounds good after master merge - But Synthesizer pass became much slower * drake weights now marked save * do load/store in numpy * no commas needed here * remove extra newline * call Tensor::where on object * use Tensor::cat instead of numpy * pull out first iteration * remove Sequential, Dropout, GELU, TransposeLast * cast during loading * clean up attention * remove SamePad * Major cleanup / line reduction - Finish implementation of GroupNormMasked - Simplify parts of TransformerEncoder - Simplify parts of Generator - Move all helpers to common section - Only use repeat_expand_left for interp after SpeechEncoder - Moved SVC-specfic ContentVec impls up (canonically) - Proper annotations for get_encoder - Finished all TODOs - Squashed some whitespaces * clean up preprocess as well * more straightforward bool expr * add demo mode	2023-08-13 19:43:23 -07:00
nimlgen	b6937acb7e	fix casting behavior for interpreted buffers (#1525 )	2023-08-13 19:21:37 -07:00
David Heidelberg	13659ac6fa	examples: numpy() array returns only one value, not an array (#1534 ) Fixes issue: ``` loss_cpu = loss.detach().numpy()[0] ~~~~~~~~~~~~~~~~~~~~~^^^ IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed ``` Signed-off-by: David Heidelberg <david@ixit.cz>	2023-08-13 14:33:05 -07:00
chenyu	3e0c2d256f	symbolic shapetracker (#1506 ) * symbolic shapetracker * no need * keep only symbolic and clean up * explicit // and % Node support * NumNode * Node	2023-08-12 12:22:58 -07:00
Pavol Rusnak	875da762a8	fix file race condition in ops_clang (#1458 )	2023-08-12 09:31:46 -07:00
JaSpa99	d3d58a37e5	Bert: use Tensor.scaled_dot_product_attention (#1528 ) * use scaled attn from Tensor * add a test for bert * linter * no more tokenizer * without loading weights * remove prints * tribute to linter lords * smaller input and less runs * small bert	2023-08-12 08:46:04 -07:00
Szymon Ożóg	330fb7b1a3	Print more meaningfull hip error messages (#1530 )	2023-08-12 07:16:20 -07:00

2311 Commits