tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 15:28:10 -05:00

Author	SHA1	Message	Date
George Hotz	a6d842af7a	move device to ops (#1646) * move device to ops * mlops types * 2 lines	2023-08-23 08:30:17 -07:00
George Hotz	d3c401ba3c	llama quantize: scale uses mul, not div	2023-08-22 11:48:56 -07:00
chenyu	89e13f2f04	support symbols in shrink (#1611)	2023-08-22 09:08:21 -07:00
George Hotz	718ced296c	move state to nn/state (#1619)	2023-08-22 07:36:24 -07:00
Umut Zengin	f720682beb	np.argmax to Tensor.argmax (#1608) * to tensor argmax * removed keepdim * training update	2023-08-21 15:22:29 -07:00
George Hotz	4ea00bad38	track down llama bug	2023-08-21 15:14:21 -07:00
chenyu	ae39cf84ab	Symbolic Shape JIT main PR (#1353) * Symbolic Shape JIT update tests 2 variables symbolic ops, adding more tests test passing cleanup * more test cases * single flag * review update * jit attention one piece * realize * symbolic_jit test for cuda * old artifact * works with cuda gpu but failed ci * CUDACPU	2023-08-18 14:39:55 -07:00
wozeparrot	55d95d1658	llama 70b (#1558) * feat: llama 70b * feat: llama 70b but simpler	2023-08-16 11:36:12 -07:00
Jacky Lee	ef5f648e2f	Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502) * Implement scaled_dot_product_attention and test * Support attn_mask * Support is_causal too * Use in llama * Don't forget to reshape * Set requires_grad=False for causal * Remove staticmethod * Remove extra spaces	2023-08-08 23:27:13 -07:00
chenyu	827d13e64e	correct patch JIT llama chat (#1500)	2023-08-08 19:52:09 -04:00
chenyu	0415a48cfc	patch JIT llama chat mode (#1496)	2023-08-08 15:15:56 -07:00
George Hotz	67781fcf5d	fix fail fast in CI	2023-08-05 10:24:24 -07:00
Francis Lam	9d142430cb	Add option in llama.py to quantize weights to int8 at runtime (#1289) * Add option in llama.py to quantize weights to int8 at runtime Also added lm-eval to external * Add support for llama-2 evaluation	2023-07-24 17:22:38 -07:00
Pavol Rusnak	cd60b8561c	Add LLaMA-2 support (#1284) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2023-07-24 17:12:02 -04:00
Francis Lam	3db57d3118	Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275)	2023-07-19 13:22:33 -04:00
Stan	9b6e57eccd	helpers.py: improved test coverage + exception handling (#1165) * Fixes + improved test coverage for helpers.py - added exception handling in `proc`, if an exception was thrown, the thread would hang - made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hang the thread * Made `_early_exec_process` catch any Exception Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output` * Fixed `from tinygrad.helpers import Timing` import oops, for some reason my IDE cleaned that import from extra/helpers. * Fixed import in llama.py Another one that I skipped by accident, mybad * Extracted a class for tests of early exec * Normalize line endings, windows uses /r/n * Made `cross_process` not a daemon	2023-07-07 10:26:05 -07:00
Rayan Hatout	65cbaa3429	no need to slice A and B twice in LLaMa complex multiplication (#1054)	2023-06-26 14:42:58 -07:00
Diogo	2d4370b487	Adds tril & triu support (#936) * triu & tril support * lint and kernel count error * switched shape indicies * larger shape tests * reverted numpy removal until #942 is resolved	2023-06-09 22:13:20 -07:00
George Hotz	ed1963b899	Fast DiskTensor to other Tensor (#916) * make disktensors fast * loading * loader for sd and llama	2023-06-03 12:25:41 -07:00
George Hotz	d58586bb17	safetensors! (#903) * safetensors test * safe_save * load back with real safetensors * bugfix in device name. add simple torch_load * it works for llama, but it's slower... * mmap * no intermediate * load mmaped * readinto speed * not ready yet * revert that	2023-06-02 13:41:09 -07:00
wozeparrot	0dc333cfab	Promote Embedding to `nn` (#798) * feat: promote Embedding to nn * fix: fix failing test * feat: add test with jit * feat: rewrite embedding to no longer need stacked for loops * clean+fix: don't know how that happened	2023-05-25 18:39:45 -07:00
Jacky Lee	7a45b989a1	Device: make GPU default and METAL/CUDA if possible (#732) * Make GPU the default device * Compile EfficientNet with CPU * don't print device * use METAL and CUDA if possible * Revert some changes to workflow * Fix import error when checking device availability * device lookup is now optional * hopefully fix linter and tests * fix workflow * Skip device if not available * don't change default if CPU=1 * simplify device selection * Default to CPU if no GPU * don't print device name... * No need to change default in llama * run github workflow * Fix logic to select default * pass if an error occurs * use separate function for try except	2023-04-04 09:41:52 +05:30
George Hotz	b12b60af20	fix binop, other tests failure (#723) * fix binop, other tests failure * that was a bad idea * better layernorm * inference kernel count tests * new style reshape pushing * fixup replacement * 199 kernels is okay. fix flops * push reshape through unaryops only * GRAPH=2 draws the phantom ops * found resnet issue * non working test * mul is cheaper than div * OPT inflation * SHUFFLE_PAD_OPS in OPT=2	2023-03-22 18:15:07 -07:00
Kirill	26a3888ab8	Fix llama 13B RAM usage (#710)	2023-03-18 13:50:09 -07:00
Kirill	0fe5014b1f	Use pathlib (#711) * Use pathlib in llama * Use pathlib in stablediffusion	2023-03-18 13:49:21 -07:00
Kirill	0532025b04	Fix llama 13B weights loading (#700) * Fix llama 13B weights loading * refactor more * add test * test storage offset * fix spacing * fix strides * llama 13B working? * yolo? * better test for seeks	2023-03-15 08:59:52 -07:00
George Hotz	fe0e8a306f	jittable llama	2023-03-12 14:15:04 -07:00
George Hotz	15e0b56e39	compile works (#688) * compile works * runtimes * line count * fix custom, to tg dtype * meh, that's fine with lazy import	2023-03-12 11:01:25 -07:00
George Hotz	046b3952c3	get_state_dict	2023-03-11 23:46:53 -08:00
George Hotz	803b0aef28	track memory for numpy/torch	2023-03-11 20:39:10 -08:00
George Hotz	61071f881a	fix bug, and add unit test to catch failure	2023-03-11 16:57:25 -08:00
George Hotz	3ec457248c	failing llama test	2023-03-11 16:28:10 -08:00
George Hotz	8aa63847c7	llama: up max tokens to 1000	2023-03-11 13:39:33 -08:00
George Hotz	5ea44cefcc	llama: add lexie personality	2023-03-11 10:23:33 -08:00
George Hotz	c908f911a7	llama defaults to metal on osx	2023-03-11 09:30:13 -08:00
George Hotz	5e1380df6a	profiling llama + cache is_contiguous	2023-03-11 08:23:21 -08:00
George Hotz	f3ac52aee8	Mypyc (#680) * building shapetracker * default ENABLE_METHOD_CACHE * symbolic compiles * improve types * tensor compiles * oops, that's a bug * best of both worlds * find legit typing bugs * pad2d can take list or tuple * sub 200ms when compiled	2023-03-11 07:33:30 -08:00
George Hotz	b1206bcb18	third try at torch loading (#677) * third try at torch loading * numpy fixed * fix enet compile * load_single_weight supports empty weights * oops, CPU wasn't the default * so many bugs	2023-03-10 19:11:29 -08:00
George Hotz	4780f9a6df	llama runs (slowly) in master	2023-03-10 17:36:51 -08:00

39 Commits