tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-13 17:08:11 -05:00

Author	SHA1	Message	Date
George Hotz	b9feb1b743	fp16 support in stable diffusion	2023-08-20 05:37:21 +00:00
chenyu	ae39cf84ab	Symbolic Shape JIT main PR (#1353 ) * Symbolic Shape JIT update tests 2 variables symbolic ops, adding more tests test passing cleanup * more test cases * single flag * review update * jit attention one piece * realize * symbolic_jit test for cuda * old artifact * works with cuda gpu but failed ci * CUDACPU	2023-08-18 14:39:55 -07:00
wozeparrot	50decf0d45	train cifar using multigpu (#1529 ) * feat: train cifar using multigpu * feat: split eval batch across 5 * feat: cleaner allreduce * feat: 93.88% * feat: cleaner batch chunking from bert * feat: cleaner grad sync * feat: tinygrad argmax * feat: make it work with different gpu counts * feat: move some stuff into the normal __init__ * feat: autodetect gpu count * feat: move import inside	2023-08-18 09:35:44 -07:00
wozeparrot	55d95d1658	llama 70b (#1558 ) * feat: llama 70b * feat: llama 70b but simpler	2023-08-16 11:36:12 -07:00
JaSpa99	2fd7004980	Implementation of SoftVC VITS SVC model (#1371 ) * [WIP]: implementation of SoftVC VITS SVC model * fix typo * fix whitespace * Fully implement Generator & Synthesizer - implement SineGen & SourceHnNSF to reconstruct source signal from F0 - source signal is added during Generator - fix various typos - start loading state dict for synthesizer * Load Synthesizer weights - Fix typos in Synthesizer - Slightly modify vits::load_checkpoint to skip a specified layer - Test with Saul Goodman model because Drake weights are on mega * start work on ContentVec - implement ConvFeatureExtractionModel for ContentVec - start work on TransformerEncoder for ContentVec: - this transformer probably needs its own MultiheadAttention implementation - fix various typos in synthesizer - add helpers to mask behavior of ~ and % operator of torch * use normal and kaiming_normal * Implement ContentVec - load ContentVec weights and config from fairseq hyperparams - use MultiHeadAttention from whisper.py - TransformerSentenceEncoderLayer might still need some tweaking, will see during inference testing - redid tilde() - some cleanup * rename the file so it can be imported * forgot to lint * use float() instead of cast() * add contentvec256l9 and cleanup * Implement SoVITS fully and run it - Fully run sovits with .wav file - Drake weights need to be manually downloaded for now - Fix bugs - Add examples/sovits_helpers - Big TODO: INVALID Kernel for recordings > 4.5 secs * temp fix for longer audio recordings * Upsample no more torch * cleanup & detailed inference time measuring * Completely remove torch(audio) - Implement sinc resample in tinygrad - Load audio via Soundfile - Some cleanups * move stuff to helper files * Cleanup * fix invalid kernel * Cleanup & add more models * Metal sounds good after master merge - But Synthesizer pass became much slower * drake weights now marked save * do load/store in numpy * no commas needed here * remove extra newline * call Tensor::where on object * use Tensor::cat instead of numpy * pull out first iteration * remove Sequential, Dropout, GELU, TransposeLast * cast during loading * clean up attention * remove SamePad * Major cleanup / line reduction - Finish implementation of GroupNormMasked - Simplify parts of TransformerEncoder - Simplify parts of Generator - Move all helpers to common section - Only use repeat_expand_left for interp after SpeechEncoder - Moved SVC-specfic ContentVec impls up (canonically) - Proper annotations for get_encoder - Finished all TODOs - Squashed some whitespaces * clean up preprocess as well * more straightforward bool expr * add demo mode	2023-08-13 19:43:23 -07:00
David Heidelberg	13659ac6fa	examples: numpy() array returns only one value, not an array (#1534 ) Fixes issue: ``` loss_cpu = loss.detach().numpy()[0] ~~~~~~~~~~~~~~~~~~~~~^^^ IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed ``` Signed-off-by: David Heidelberg <david@ixit.cz>	2023-08-13 14:33:05 -07:00
George Hotz	47f18f4d60	[New] SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1516 ) (#1518 ) * Refactor AttnBlock, CrossAttention, CLIPAttention to share code * Reshape and transpose in loop * Bugfix on attention mask Co-authored-by: Jacky Lee <39754370+jla524@users.noreply.github.com>	2023-08-10 15:04:18 -07:00
George Hotz	e3c6c0c6db	add GPT2 example (#1511 ) (#1514 ) * add gpt2 to examples * some cleanup * fixes * argparse + scaled_dot_product_attention * add timing * add to benchmark Co-authored-by: YassineYousfi <yassine.y10@gmail.com>	2023-08-10 09:09:47 -07:00
George Hotz	c82bd59b85	Revert "SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513 )" (#1515 ) This reverts commit `85e02311a2`.	2023-08-10 09:08:51 -07:00
Jacky Lee	85e02311a2	SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513 ) * Refactor AttnBlock, CrossAttention, CLIPAttention to share code * Reshape and transpose in loop	2023-08-10 08:52:33 -07:00
Jacky Lee	ef5f648e2f	Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502 ) * Implement scaled_dot_product_attention and test * Support attn_mask * Support is_causal too * Use in llama * Don't forget to reshape * Set requires_grad=False for causal * Remove staticmethod * Remove extra spaces	2023-08-08 23:27:13 -07:00
chenyu	827d13e64e	correct patch JIT llama chat (#1500 )	2023-08-08 19:52:09 -04:00
chenyu	0415a48cfc	patch JIT llama chat mode (#1496 )	2023-08-08 15:15:56 -07:00
Yixiang Gao	6480a1a180	CIFAR 94.03% (#1340 ) * add disk_tensor * fix jit * new baseline before whitening * whitening through torch * whiting done currently at 91.65% * 91.99% * clean up mixup and 92.3% * clean up 92.30% * 92.49% before searching for new hyper-parameters * fix CI * fix white space * add whitening init in test * refactor, update hyperpara, 92.72% * converting whiting to tinygrad operation * update CI kernels count for CIFAR * add pad reflect * add random crop 92.53% * update hyperpara 93% * 93.15% on docker container, need to refactor the assignment for hyper param * print out weights and bias to be separated * bias/non-bias params separated * fix whitespace * clean up * refactor hyper-param with dict * refactor lr schedular params * fix whitespace * fix cross entropy loss * fix whitespace * move opt hyp to hyp dict * minor fixup * adjust model, loss scaling * 92.74% while using half of compute as before * update hyp for cutmix * random shuffle during batches * clean up * updating the model * update ConvGroup * disable gradients for batchnorm layer weights * whitespace * 93.92% * clean up * finally 94%git add .! * rewrite whitening to remove dependency on torch * whitespace * remove dependency on torch, 93.91% * back to 94.03% * clean up * update test_real_world	2023-08-08 15:13:24 -07:00
George Hotz	d78fb8f4ed	add stable diffusion and llama (#1471 ) * add stable diffusion and llama * pretty in CI * was CI not true * that * CI=true, wtf * pythonpath * debug=1 * oops, wrong place * uops test broken for wgpu * wgpu tests flaky	2023-08-06 21:31:51 -07:00
George Hotz	67781fcf5d	fix fail fast in CI	2023-08-05 10:24:24 -07:00
Felix	97a6029cf7	Corrected a few misspelled words (#1435 )	2023-08-04 16:51:08 -07:00
ian	c08ed1949f	Fix plt output comment (#1428 )	2023-08-03 23:35:52 -07:00
Paolo Gavazzi	9ffa1eb7e2	Removed dep of torch, torchaudio, kept librosa only (#1264 )	2023-08-02 13:52:04 -04:00
Diogo	4dc8595069	simple exporting models (#1344 ) * unified exporting * json exporting * ignore more * simplified buffer export * added dtypes * added assert * swift example * fix tests * linter * remove whitespace * fixed tests * remove swift example * remove unintended changes * allow callable models to be used * whitespace * more readable json export * name change * whitespace * whitespace	2023-08-01 09:35:48 -07:00
George Hotz	f27df835a6	delete dead stuff (#1382 ) * delete bpe from repo * remove yolo examples * Revert "remove yolo examples" This reverts commit `cd1f49d466`. * no windows	2023-07-31 11:17:49 -07:00
George Hotz	37fa7e96fb	Revert "update editorconfig, enforce via CI (#1343 )" (#1380 ) This reverts commit `da2efecbe2`.	2023-07-31 10:35:50 -07:00
Pavol Rusnak	da2efecbe2	update editorconfig, enforce via CI (#1343 ) * update editorconfig to set unix-style newlines and trim whitespace * add editorconfig github action to the CI * fix whitespace	2023-07-30 18:44:30 -07:00
Francis Lam	9d142430cb	Add option in llama.py to quantize weights to int8 at runtime (#1289 ) * Add option in llama.py to quantize weights to int8 at runtime Also added lm-eval to external * Add support for llama-2 evaluation	2023-07-24 17:22:38 -07:00
Pavol Rusnak	cd60b8561c	Add LLaMA-2 support (#1284 ) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2023-07-24 17:12:02 -04:00
Giles Bathgate	c4238b4ea0	Fix discriminator balancing in mnist_gan example (#1332 )	2023-07-23 12:43:05 -07:00
Cole Sutyak	2d4e182294	change fetch to allow for local file selection (#1309 )	2023-07-23 15:00:16 -04:00
Maxim Zakharov	48c4df1263	fix: prevent infinite "loading..." state (#1319 ) * demo somewhy doesn't work on my device and throw eror "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside setupNet func * because of that, JS halts the execution of the rest of the code below and on the screen we see "loading..." forever * added try catch here to communicate about the error in a proper way	2023-07-21 14:01:53 -07:00
Jacob Pradels	b112edd2c3	Add pylint trailing whitespace rule (#1314 )	2023-07-21 13:37:55 -04:00
George Hotz	bfbb8d3d0f	fix ones, BS=2 stable diffusion, caching optimizer (#1312 ) * fix ones, BS=2 stable diffusion * caching optimizer * print search time * minor bug fix	2023-07-21 09:55:49 -07:00
madt2709	d2c1e8409a	Update arange to be (start, stop, step) (#1308 )	2023-07-21 00:27:23 -04:00
George Hotz	f45013f0a3	stable diffusion: remove realizes we don't need	2023-07-20 19:53:07 -07:00
George Hotz	b58dd015e3	stable diffusion: remove import numpy as np	2023-07-20 19:35:44 -07:00
George Hotz	35bc46289c	stable diffusion: use new tinygrad primitives	2023-07-20 19:25:49 -07:00
Stan	0a3d4f8103	Implementation of VITS TTS model (#1188 ) * [WIP]: implementation of VITS TTS model * Implemented VITS model, moved all code to examples/vits.py * Added support for vctk model, auto download, and cleanups * Invoke tensor.realize() before measuring inference time * Added support for mmts-tts model, extracted TextMapper class, cleanups * Removed IPY dep, added argument parser, cleanups * Tiny fixes to wav writing * Simplified the code in a few places, set diff log level for some prints * Some refactoring, added support for uma_trilingual model (anime girls) * Fixed bug where embeddings are loaded with same backing tensor, oops * Added emotional embed support, added cjks + voistock models - voistock is multilingual model with over 2k anime characters - cjks is multilingual model with 24 speakers both are kinda bad for english though :c * Removed `Tensor.Training=False` (not needed and wrong oop) * Changed default model and speaker to vctk with speaker 6 * Ported rational_quadratic_spline fun to fully use tinygrad ops, no numpy * Removed accidentally pushed test/spline.py * Some slight refactors * Replaced masked_fill with tensor.where * Added y_length estimating, plus installation instructions, plus some cleanups * Fix overestimation log message. * Changed default value of `--estimate_max_y_length` to False This is only useful for larger inputs. * Removed printing of the phonemes * Changed default value of `--text_to_synthesize`	2023-07-20 17:37:14 -07:00
George Hotz	f7b0320d8b	add cifar training regression test (#1287 ) * add cifar training regression test * clean up print	2023-07-19 14:17:09 -07:00
Francis Lam	3db57d3118	Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275 )	2023-07-19 13:22:33 -04:00
Yixiang Gao	a8f2c16f8e	add contiguous (#1246 )	2023-07-15 08:36:34 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00
Hey	4f72eb823c	Outdated repository URL (#1218 ) * Update outdated repo url * Update more outdated repo url's	2023-07-11 23:14:19 -07:00
AN Long	f75de602df	fix typo in stable diffusion example (#1219 )	2023-07-11 15:26:40 -07:00
Stan	f40f8cd055	Initialise numpy arrays as float32 in DDPG (#1171 ) float64 is not supported by tinygrad	2023-07-07 12:05:31 -07:00
terafo	aa60feda48	Fix naming conflict with huggingface datasets (#1161 ) * Rename in files * Move files * Moved to extra/datasets as suggested * Changes to files * Fixed stupid mistake --------- Co-authored-by: terafo <terafo@protonmail.com>	2023-07-07 10:43:44 -07:00
Stan	9b6e57eccd	helpers.py: improved test coverage + exception handling (#1165 ) * Fixes + improved test coverage for helpers.py - added exception handling in `proc`, if an exception was thrown, the thread would hang - made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread * Made `_early_exec_process` catch any Exception Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output` * Fixed `from tinygrad.helpers import Timing` import oops, for some reason my IDE cleaned that import from extra/helpers. * Fixed import in llama.py Another one that I skipped by accident, mybad * Extracted a class for tests of early exec * Normalize line endings, windows uses /r/n * Made `cross_process` not a daemon	2023-07-07 10:26:05 -07:00
Kunwar Raj Singh	8391648822	Over 90% on CIFAR with examples/hlb_cifar10.py (#1073 ) * fix eval, lr decay, best eval * 82.27 * 82.64 * 82.79, reproducable * add lr sched, 85.26 * 87.42 * 87.94 * 87.42 * tta with flip * training flip aug * refactor * using Tensor for LR is faster * 89.5 * refactor, flip only train set * 90.01 * 90.64 * eval jit * refactor * only JIT model * fix eval JIT * fix eval JIT * 90.82 * STEPS=900 reaches 90.22 * TTA envvar * TTA default 0 * fully jit training * refactor optim * fix sched * add label smoothing * param changes * patial gelu * OneCycle with pause * gelu maybe works * 90.12 * remove pause lr * maybe fix lr schedulers * scheduler test passing * comments * try mixup * shuffle! * add back the missing last eval * fix shuffle bugs * add mixup prob * fix mixup prob * 90.19 * correct mixup * correct mixup * correct mixup * 90.24 * 90.33 * refactor, add type hints * add gradient clipping * maybe fix test * full JIT * back to relu for now * pass mixup prob as param * add typehints * maybe CI works * try erf gelu * CI, types * remove useless import/ * refactor optim * refactor optim * try leakyrelu * try celu * gelu * 90.67 * remove grad clip * remove grad clip tests * revert params * add test for OneCycleLR * 90.62 * fix eval timing * fix eval timing again * so where i calculate mixup_prob matters --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-07-06 20:46:22 -07:00
nimlgen	d363d25ee2	fix imports for examples/transformer.py (#1136 )	2023-07-05 08:15:13 -07:00
George Hotz	87d21ea979	examples: simple conv bn	2023-07-04 13:50:26 -07:00
Reza Rezvan	8ae9a054ae	Refactor nn.optim (#1091 ) * Refactor: nn.optim.py * Refactor: nn.optim.py; Fix all tests * Refactor: Replace all optim.get_parameters() * Refactor: Revert list comp. * Refactor: Replace optim.get_state_dict * Refactor: Change quickstart.md	2023-07-02 15:07:30 -07:00
Eli Frigo	10f1aeb144	fixed broken link (#1097 )	2023-07-02 15:06:59 -07:00
nmarwell26	12ce68c1ee	Renamed examples/yolo to examples/vgg7_helpers because that directory contains no yolo-related code and only helper code for vgg7. This was confusing to a new user when trying to understand the examples. (#1086 )	2023-07-01 12:04:28 -07:00

1207 Commits