tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-09 22:26:26 -05:00

Author	SHA1	Message	Date
wozeparrot	074c467020	hotfix for broken ci (#1559 )	2023-08-16 13:52:03 -04:00
madt2709	962972ee68	Fix uops int32 for llvm (#1554 ) * fix-uops-int32-llvm * fix tests * Ignore mypy error	2023-08-15 23:22:32 -07:00
Sam Barani	2cde667d40	Change Any to List[Optional[RawBuffer]] in JIT (#1553 ) * Change Any to List[Optional[RawBuffer]] in JIT * remove ignore[no-redef] * remove ignore * pick different names	2023-08-15 23:21:33 -07:00
nimlgen	fa81e282c2	fix missing dtypes in is_int,is_float,is_unsigned (#1550 )	2023-08-15 21:22:29 -04:00
Diogo	d17ecccd78	Torch/LLVM/arm F64 support (#1551 )	2023-08-15 21:21:08 -04:00
YiMing Han	913263c155	add double: c_type.double for CLANG (#1549 )	2023-08-15 13:19:33 -07:00
George Hotz	0b5930d406	more uops testing, who isn't passing right now... (#1522 ) * more uops * llvm refactor * update test uops * rest of the nodes * ors and ands	2023-08-15 09:07:26 -07:00
George Hotz	f8109b830c	promote assembly to the main codebase (#1544 ) * promote assembly to the main codebase * not namedtuple	2023-08-14 22:47:45 -07:00
wozeparrot	666ac61070	support for p2p buffer transfers (#1523 ) * feat: RawBufferTransfer * feat: gate behind P2P * feat: gate properly * feat: raise error when not implemented	2023-08-14 22:39:57 -07:00
Steven Anderson	93a36c3659	Arm (#1421 ) * testing new memops * better debugging * testing padded conv * branching with load * refactoring a bit * first try * fixing bugs * fixing some * eq * eq2 * do not use x's * working * fixing imm * getting things working * refactor * pow not working * working except one * refactor: one store mem * refactor: global load * refactor: imm * refactor: cleaning * fixing big offsets * refactor with ci * try ci * typo * another typo * ubuntu default * forgot git * do i need git? * missing packages * adding python-dev * with cache? * buildx action * buildx name issue? * maybe now? * python3 * newline warning * maybe now * i actually need this * ci should work now * improved caching * fixing cache * maybe now it will cache * this * testing cache * trying again * load * missing platform * caching gha * testing cache * full testing * typo * now? * why * adding checkout back * bad formatting * fixing convention issues * supporting python * adding CI flag * testing all * better comments * adding debugging * takes 12x longer * does it output progress now? * ignore models for speed * fixing merge * excluding conv_transpose2d * only 2 test cuz is to slow * another approach * let's see * faster duh * my bad * T_T * typo * sup * with output? * comment test * comment test * comment test * :? * no comment * with cache * back to normal * testing that ci works * back to passing * trying again * does it create another entry * does it create another entry? * build local * hey * Revert "excluding conv_transpose2d" This reverts commit `cc7348de03`. * does it cache if done before? * does it cache? 
* done * adding test ops * bad formatting * no need for this * working static mem * sum 1d * add ndim * better reg import * fix stack * back to np * working except for softmax * 5 failing * no pogress * remove keystone * remove keystone * testops passing * cleanups * more cleanup * typo * ci * ci2 * cond import * ci3 * ci4 * ci4 * ci5 * ci5 * ci6 * aligment * test all * correct test * err read_unmapped * passing test * ignore for speed * ignore for speed * ci7 * cleanup * remove docker * fixing merge * fixing bugs * add skipload for const ops * comments * First merge to master: Renderer * fix emulation * passing all tests arm64 * cleaning * fix handcoded binary * cleaning * fix errs * fix runtime arg binary * clean git diff * fix and clean * fixing metal test * cleaning * fix metal test * ci ~8 min * fix pylint and clang * cache the files in ops_clang --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-14 19:29:30 -07:00
chenyu	a89142e46f	ShapeTracker.var_vals (#1540 )	2023-08-14 18:53:37 -07:00
Pavol Rusnak	a453d718a1	fix file race condition in ops_clang via pid in the filename (#1541 ) * fix file race condition in ops_clang via pid in the filename as suggested in https://github.com/tinygrad/tinygrad/pull/1458/files#r1292819054 * add explanation why a temp file is required on ops_clang	2023-08-14 18:50:10 -07:00
wozeparrot	9cb2bda34f	Revert "Better reshape (#1423 )" (#1538 )	2023-08-14 13:04:54 -04:00
Sieds Lykles	cf2bf1518d	Better reshape (#1423 ) * do reshaping without merge_views and reshape masks * added tests * properly do reshaping of zero or negative masks * replace while loop with single expression * remove old condition * add more tests and comments * remove empty file	2023-08-14 09:09:04 -07:00
YiMing Han	e00acb1eaf	fix deepwalk ctx check (#1536 )	2023-08-13 23:03:17 -07:00
JaSpa99	2fd7004980	Implementation of SoftVC VITS SVC model (#1371 ) * [WIP]: implementation of SoftVC VITS SVC model * fix typo * fix whitespace * Fully implement Generator & Synthesizer - implement SineGen & SourceHnNSF to reconstruct source signal from F0 - source signal is added during Generator - fix various typos - start loading state dict for synthesizer * Load Synthesizer weights - Fix typos in Synthesizer - Slightly modify vits::load_checkpoint to skip a specified layer - Test with Saul Goodman model because Drake weights are on mega * start work on ContentVec - implement ConvFeatureExtractionModel for ContentVec - start work on TransformerEncoder for ContentVec: - this transformer probably needs its own MultiheadAttention implementation - fix various typos in synthesizer - add helpers to mask behavior of ~ and % operator of torch * use normal and kaiming_normal * Implement ContentVec - load ContentVec weights and config from fairseq hyperparams - use MultiHeadAttention from whisper.py - TransformerSentenceEncoderLayer might still need some tweaking, will see during inference testing - redid tilde() - some cleanup * rename the file so it can be imported * forgot to lint * use float() instead of cast() * add contentvec256l9 and cleanup * Implement SoVITS fully and run it - Fully run sovits with .wav file - Drake weights need to be manually downloaded for now - Fix bugs - Add examples/sovits_helpers - Big TODO: INVALID Kernel for recordings > 4.5 secs * temp fix for longer audio recordings * Upsample no more torch * cleanup & detailed inference time measuring * Completely remove torch(audio) - Implement sinc resample in tinygrad - Load audio via Soundfile - Some cleanups * move stuff to helper files * Cleanup * fix invalid kernel * Cleanup & add more models * Metal sounds good after master merge - But Synthesizer pass became much slower * drake weights now marked save * do load/store in numpy * no commas needed here * remove extra newline * call Tensor::where 
on object * use Tensor::cat instead of numpy * pull out first iteration * remove Sequential, Dropout, GELU, TransposeLast * cast during loading * clean up attention * remove SamePad * Major cleanup / line reduction - Finish implementation of GroupNormMasked - Simplify parts of TransformerEncoder - Simplify parts of Generator - Move all helpers to common section - Only use repeat_expand_left for interp after SpeechEncoder - Moved SVC-specfic ContentVec impls up (canonically) - Proper annotations for get_encoder - Finished all TODOs - Squashed some whitespaces * clean up preprocess as well * more straightforward bool expr * add demo mode	2023-08-13 19:43:23 -07:00
nimlgen	b6937acb7e	fix casting behavior for interpreted buffers (#1525 )	2023-08-13 19:21:37 -07:00
David Heidelberg	13659ac6fa	examples: numpy() array returns only one value, not an array (#1534 ) Fixes issue: ``` loss_cpu = loss.detach().numpy()[0] ~~~~~~~~~~~~~~~~~~~~~^^^ IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed ``` Signed-off-by: David Heidelberg <david@ixit.cz>	2023-08-13 14:33:05 -07:00
chenyu	3e0c2d256f	symbolic shapetracker (#1506 ) * symbolic shapetracker * no need * keep only symbolic and clean up * explicit // and % Node support * NumNode * Node	2023-08-12 12:22:58 -07:00
Pavol Rusnak	875da762a8	fix file race condition in ops_clang (#1458 )	2023-08-12 09:31:46 -07:00
JaSpa99	d3d58a37e5	Bert: use Tensor.scaled_dot_product_attention (#1528 ) * use scaled attn from Tensor * add a test for bert * linter * no more tokenizer * without loading weights * remove prints * tribute to linter lords * smaller input and less runs * small bert	2023-08-12 08:46:04 -07:00
Szymon Ożóg	330fb7b1a3	Print more meaningfull hip error messages (#1530 )	2023-08-12 07:16:20 -07:00
wozeparrot	29d5801387	distributed collectives (#1519 ) * feat: world * feat: tests * feat: no more backwards * feat: recv into * feat: whoops * feat: test in ci * feat: some debug logging * feat: workflow naming * feat: need to set pythonpath * feat: just send to same device * feat: allreduce * feat: test * feat: need contiguous * feat: test in ci * feat: exit with correct code * feat: don't need that * feat: opencl wait_for just doesn't work * feat: synchronize on out * feat: try? * feat: try again? * feat: add extra realizes * feat: print * feat: seed * feat: tol * feat: test ones and zeros * feat: remove print * feat: are you just flaky * feat: seperate scatter and gather? * feat: just try synchronizing * feat: remove print again * feat: bring back difference * feat: no sync * feat: revert that * feat: back to wait_for * fix: typo	2023-08-11 10:22:07 -07:00
Jacky Lee	2e85fce068	Transformer: use Tensor.scaled_dot_product_attention (#1520 )	2023-08-11 09:00:37 -07:00
George Hotz	38fe84d92b	cleanup mlops (#1521 ) * cleanup mlops * that line belongs there	2023-08-10 19:53:28 -07:00
George Hotz	47f18f4d60	[New] SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1516 ) (#1518 ) * Refactor AttnBlock, CrossAttention, CLIPAttention to share code * Reshape and transpose in loop * Bugfix on attention mask Co-authored-by: Jacky Lee <39754370+jla524@users.noreply.github.com>	2023-08-10 15:04:18 -07:00
wozeparrot	7e7c9001e9	distributed world (#1481 ) * feat: world * feat: tests * feat: no more backwards * feat: recv into * feat: whoops * feat: test in ci * feat: some debug logging * feat: workflow naming * feat: need to set pythonpath * feat: just send to same device	2023-08-10 10:00:51 -07:00
George Hotz	e3c6c0c6db	add GPT2 example (#1511 ) (#1514 ) * add gpt2 to examples * some cleanup * fixes * argparse + scaled_dot_product_attention * add timing * add to benchmark Co-authored-by: YassineYousfi <yassine.y10@gmail.com>	2023-08-10 09:09:47 -07:00
George Hotz	c82bd59b85	Revert "SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513 )" (#1515 ) This reverts commit `85e02311a2`.	2023-08-10 09:08:51 -07:00
Jacky Lee	85e02311a2	SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513 ) * Refactor AttnBlock, CrossAttention, CLIPAttention to share code * Reshape and transpose in loop	2023-08-10 08:52:33 -07:00
geohotstan	07b79f210f	llvmir support for bool <-> float casting (#1492 )	2023-08-09 13:12:52 -04:00
wozeparrot	351684395c	dont run on fork (#1510 )	2023-08-09 13:06:45 -04:00
wozeparrot	88e2e0c8a3	Revert "don't try to run benchmark on forks" (#1508 )	2023-08-09 12:59:49 -04:00
wozeparrot	65b65b760b	don't try to run benchmark on forks (#1507 )	2023-08-09 12:59:19 -04:00
George Hotz	c417cd3c97	fast HIP gemm -> 100 TFLOPS (#1476 ) * fast HIP gemm * wmma * correct b * fix spilling * 60 TFLOPS * 64 TFLOPS * 65 TFLOPS	2023-08-09 06:54:15 -07:00
David Hou	1766f0c0cf	use ConstOp for valid.max == 0 (#1501 ) * use ConstOp for valid.max == 0 * don't render valid for invalid load cache key * Update linearizer.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-09 00:01:59 -07:00
Jacky Lee	ef5f648e2f	Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502 ) * Implement scaled_dot_product_attention and test * Support attn_mask * Support is_causal too * Use in llama * Don't forget to reshape * Set requires_grad=False for causal * Remove staticmethod * Remove extra spaces	2023-08-08 23:27:13 -07:00
nimlgen	dabfd7569a	use allclose instead of equals in test_jit (#1504 ) Closes #1503	2023-08-08 22:22:17 -07:00
chenyu	827d13e64e	correct patch JIT llama chat (#1500 )	2023-08-08 19:52:09 -04:00
Yixiang Gao	7c2ea85bb0	Raise memory limit for CIFAR test (#1499 )	2023-08-08 19:40:56 -04:00
Thiago Franco de Moraes	293a10204b	Add tinygrad.renderer to packages in setup.py (#1497 )	2023-08-08 15:51:49 -07:00
chenyu	0415a48cfc	patch JIT llama chat mode (#1496 )	2023-08-08 15:15:56 -07:00
Yixiang Gao	6480a1a180	CIFAR 94.03% (#1340 ) * add disk_tensor * fix jit * new baseline before whitening * whitening through torch * whiting done currently at 91.65% * 91.99% * clean up mixup and 92.3% * clean up 92.30% * 92.49% before searching for new hyper-parameters * fix CI * fix white space * add whitening init in test * refactor, update hyperpara, 92.72% * converting whiting to tinygrad operation * update CI kernels count for CIFAR * add pad reflect * add random crop 92.53% * update hyperpara 93% * 93.15% on docker container, need to refactor the assignment for hyper param * print out weights and bias to be separated * bias/non-bias params separated * fix whitespace * clean up * refactor hyper-param with dict * refactor lr schedular params * fix whitespace * fix cross entropy loss * fix whitespace * move opt hyp to hyp dict * minor fixup * adjust model, loss scaling * 92.74% while using half of compute as before * update hyp for cutmix * random shuffle during batches * clean up * updating the model * update ConvGroup * disable gradients for batchnorm layer weights * whitespace * 93.92% * clean up * finally 94%git add .! * rewrite whitening to remove dependency on torch * whitespace * remove dependency on torch, 93.91% * back to 94.03% * clean up * update test_real_world	2023-08-08 15:13:24 -07:00
Roelof van Dijk	aa83a9e910	ci: fix gpuocelot build cache (#1474 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-08 14:00:04 -07:00
George Hotz	d24f936501	just cmplt (#1493 ) * just cmplt * fix maximum * don't save, there's no backward * ugh, no slot either * eq is a scam	2023-08-08 13:58:10 -07:00
Roelof van Dijk	e2cf0f322e	[READY] ci: missing n=auto (#1486 ) * ci: missing n=auto * fix: add to commented test --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-08 07:37:24 -07:00
Roelof van Dijk	0ce7511110	fix: is not use with a literal (#1487 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-08 07:35:30 -07:00
nimlgen	932dad1a2b	fix cast bool->float in llvmir (#1480 ) Closes #1479	2023-08-07 21:30:51 -07:00
nimlgen	046fd7437a	use fake buffer for external_test_speed_llama.py (#1478 )	2023-08-07 22:05:44 -04:00
George Hotz	5fdd248617	don't download cifar (#1472 )	2023-08-06 21:38:59 -07:00


10633 Commits