* Make GPU the default device
* Compile EfficientNet with CPU
* don't print device
* use METAL and CUDA if possible
* Revert some changes to workflow
* Fix import error when checking device availability
* device lookup is now optional
* hopefully fix linter and tests
* fix workflow
* Skip device if not available
* don't change default if CPU=1
* simplify device selection
* Default to CPU if no GPU
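The net effect of these device-selection commits can be sketched roughly as follows; the function and its `is_available` probe are illustrative assumptions, not tinygrad's actual API:
```python
import os
from typing import Callable

def pick_default_device(is_available: Callable[[str], bool]) -> str:
    # Respect an explicit CPU=1: don't change the default in that case.
    if os.getenv("CPU", "") == "1":
        return "CPU"
    # Prefer METAL or CUDA if possible; a failed probe just means we skip that device.
    for dev in ("METAL", "CUDA", "GPU"):
        try:
            if is_available(dev):
                return dev
        except Exception:
            pass  # "pass if an error occurs": treat probe failures as unavailable
    return "CPU"  # default to CPU if no GPU-class device is found
```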
* don't print device name...
* No need to change default in llama
* run github workflow
* Fix logic to select default
* pass if an error occurs
* use separate function for try except
* fix binop and other test failures
* that was a bad idea
* better layernorm
* inference kernel count tests
* new style reshape pushing
* fixup replacement
* 199 kernels is okay. fix flops
* push reshape through unaryops only
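The reshape-pushing idea relies on unary ops being elementwise, so a reshape can commute past them and meet other movement ops. A toy sketch under that assumption; the `Node` type here is invented for illustration and is not tinygrad's lazy-buffer graph:
```python
from dataclasses import dataclass

@dataclass
class Node:
    op: str                 # e.g. "RESHAPE", "EXP", "RECIP", "LOAD"
    src: tuple = ()
    arg: object = None      # new shape for RESHAPE

UNARY_OPS = {"EXP", "LOG", "NEG", "RECIP"}

def push_reshape_through_unary(n: Node) -> Node:
    # RESHAPE(unary(x)) -> unary(RESHAPE(x)): valid because unary ops are
    # elementwise, so they commute with pure movement ops like reshape.
    if n.op == "RESHAPE" and n.src and n.src[0].op in UNARY_OPS:
        unary = n.src[0]
        moved = Node("RESHAPE", unary.src, n.arg)
        return Node(unary.op, (moved,))
    return n
```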
* GRAPH=2 draws the phantom ops
* found resnet issue
* non working test
* mul is cheaper than div
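"mul is cheaper than div" is the usual strength reduction: dividing by a constant becomes multiplying by its reciprocal, which most hardware executes faster (at the cost of a possible last-ulp difference). A generic illustration, not the actual codegen change:
```python
import numpy as np

def scale_slow(x: np.ndarray, c: float) -> np.ndarray:
    return x / c            # one divide per element

def scale_fast(x: np.ndarray, c: float) -> np.ndarray:
    return x * (1.0 / c)    # one divide up front, then cheaper multiplies
```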
* OPT inflation
* SHUFFLE_PAD_OPS in OPT=2
* add int64 as supported dtype from numpy
Without this change, examples/transformer.py didn't run; with it, the example runs successfully.
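In practice, "supported dtype from numpy" means the numpy-to-framework dtype lookup now has an int64 entry. A hedged sketch of that mapping; the table below is illustrative, not tinygrad's actual dtypes helpers:
```python
import numpy as np

# Illustrative mapping only; the real project keeps this in its dtypes helpers.
SUPPORTED_FROM_NUMPY = {
    np.dtype(np.float32): "float32",
    np.dtype(np.int32):   "int32",
    np.dtype(np.int64):   "int64",  # newly accepted, which unblocks examples/transformer.py
}

def dtype_from_numpy(arr: np.ndarray) -> str:
    try:
        return SUPPORTED_FROM_NUMPY[arr.dtype]
    except KeyError:
        raise TypeError(f"unsupported numpy dtype: {arr.dtype}") from None
```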
* Update helpers.py
* Update transformer.py
* Update training.py
* runs one metal kernel
* conv2d works
* ops tests are passing
* const folding
* all ops work
* pre-commit always passes
* torch works
* working still
* fix graph test
* tests passing
* image almost works
* image conv works
* most images
* fix custom
* fix assignment
* fix compile enet
* clean up comments
* fix realize return value
* include shapetracker in LB repr
* copy should make a copy
* reenable method cache
* fix lna
* dtypes in graph
* forward only for IMAGE=2
* simple realize
* getting close
* fixup new api, it's good except the kernel count
* back to 197 kernels
* tests should pass
* go to a real float
* no type_on_cpu
* fix the docs
* put shapetracker back in its proper place
* building shapetracker
* default ENABLE_METHOD_CACHE
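The method cache referenced here (and re-enabled earlier) is, in spirit, memoization of the expensive compile step keyed on the kernel's AST, gated by ENABLE_METHOD_CACHE, which now defaults to on. A generic sketch under those assumptions; `compile_ast` and its key are hypothetical names:
```python
import os
from functools import lru_cache

METHOD_CACHE_ON = os.getenv("ENABLE_METHOD_CACHE", "1") == "1"  # assumed default: enabled

def _compile(ast_key: str) -> str:
    # stand-in for the real codegen + compiler invocation
    return f"compiled<{ast_key}>"

@lru_cache(maxsize=None)
def _compile_cached(ast_key: str) -> str:
    return _compile(ast_key)

def compile_ast(ast_key: str) -> str:
    # identical ASTs compile once when the cache is enabled
    return _compile_cached(ast_key) if METHOD_CACHE_ON else _compile(ast_key)
```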
* symbolic compiles
* improve types
* tensor compiles
* oops, that's a bug
* best of both worlds
* find legit typing bugs
* pad2d can take list or tuple
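One simple way to accept either container type for the padding argument, as the commit above describes; a sketch, not the actual pad2d implementation:
```python
def normalize_padding(padding):
    # accept a list or a tuple of ints and hand the rest of the code one canonical type
    if not isinstance(padding, (list, tuple)):
        raise TypeError(f"padding must be a list or tuple, got {type(padding).__name__}")
    return tuple(int(p) for p in padding)
```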
* sub 200ms when compiled
* third try at torch loading
* numpy fixed
* fix enet compile
* load_single_weight supports empty weights
* oops, CPU wasn't the default
* so many bugs
* Add AvgPool2d as a layer
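"AvgPool2d as a layer" usually means wrapping the stateless pooling call in a small class so it can sit in a model's layer list. A sketch assuming the tensor type exposes an avg_pool2d-style method (an assumption, not a confirmed signature):
```python
class AvgPool2d:
    """Stateless pooling packaged as a layer so it composes with stateful ones."""
    def __init__(self, kernel_size=(2, 2)):
        self.kernel_size = kernel_size

    def __call__(self, x):
        # assumed tensor method; the point is the layer wrapper, not the call itself
        return x.avg_pool2d(kernel_size=self.kernel_size)
```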
* Clean up a bit
* Remove stateless layers in yolo_nn
* More cleanup
* Save label for test
* Add test for YOLO
* Test without cv2
* Don't fail if cv2 not installed
* Better import
* Fix image read
* Use opencv :)
* Don't download the file
* Fix errors
* Use same version
* Set higher confidence
* Why is the confidence so low?
* Start over
* Remove stateless layers
* Remove extra lines
* Revert changes
* Save a few more lines
* add restrict qualifier for clang backend convolution inputs/outputs
see https://godbolt.org/z/Tb9jMxWfx for generated assembly
* enable more checks
* inline fmax to motivate the compiler to inline some more
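Both of these tweaks target the C that the clang backend emits rather than Python-level logic. A rough illustration of the shape of such output, rendered as a Python string template; the kernel name, signature, and body are placeholders, not the backend's real codegen (the godbolt link above shows the actual assembly):
```python
def render_example_kernel(name: str = "example_conv") -> str:
    # `restrict` tells clang the out/in/weight pointers never alias, which unlocks
    # better vectorization; a static inline fmax nudges the compiler to inline it.
    return f"""
static inline float fmax_(float a, float b) {{ return a > b ? a : b; }}

void {name}(float* restrict out, const float* restrict in,
            const float* restrict weight, int n) {{
  for (int i = 0; i < n; i++)
    out[i] = fmax_(out[i] + in[i] * weight[i], 0.0f);  /* placeholder body */
}}
"""
```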
* fix if else binding power