Commit Graph

321 Commits

Author SHA1 Message Date
George Hotz
67e34b356a good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
Jacky Lee
e0c2ae8984 Update file paths (#1179) 2023-07-07 18:41:58 -07:00
George Hotz
b8dfbba703 hip_matmul: f16 gemm 2048x2048 gets 36 TFLOPS 2023-07-08 00:35:45 +00:00
Stan
69d33cab0d Fix: auto create parent dir when downloading file (#1173)
* Fix: auto create parent dir when downloading file

also removed duplicate import `os`

* Added test for auto parent dir creation when downloading file
2023-07-07 13:40:29 -07:00
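The gist of the fix, as a minimal sketch (illustrative names; not the repo's actual download helper):

```python
import os
import urllib.request

def fetch(url: str, dest: str) -> str:
  # create the parent directory first; without this, the download fails
  # whenever dest points into a cache directory that doesn't exist yet
  parent = os.path.dirname(dest)
  if parent: os.makedirs(parent, exist_ok=True)
  urllib.request.urlretrieve(url, dest)
  return dest
```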
terafo
aa60feda48 Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Stan
9b6e57eccd helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`; previously, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception; before, if an exception was thrown before the process was started, it would hang the thread

* Made `_early_exec_process` catch any Exception

 Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example, a TypeError for an argument passed to `subprocess.check_output`.

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE cleaned that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, my bad

* Extracted a class for tests of early exec

* Normalize line endings, Windows uses \r\n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
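The hang described above is the classic worker-process pitfall: if the child raises before putting anything on the queue, the parent blocks forever on `get()`. A minimal sketch of the fix pattern (hypothetical names, not the actual `extra/helpers.py` code):

```python
from multiprocessing import Process, Queue

def _worker(q: Queue, fn, *args):
  # catch *any* Exception and report it through the queue; if the worker
  # died silently, the parent's q.get() below would block forever
  try:
    q.put(fn(*args))
  except Exception as e:
    q.put(e)

def run_in_process(fn, *args):
  q: Queue = Queue()
  p = Process(target=_worker, args=(q, fn) + args)
  p.start()
  ret = q.get()  # always receives something, even when fn raised
  p.join()
  if isinstance(ret, Exception): raise ret
  return ret
```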
Kunwar Raj Singh
8391648822 Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval

* 82.27

* 82.64

* 82.79, reproducible

* add lr sched, 85.26

* 87.42

* 87.94

* 87.42

* tta with flip

* training flip aug

* refactor

* using Tensor for LR is faster

* 89.5

* refactor, flip only train set

* 90.01

* 90.64

* eval jit

* refactor

* only JIT model

* fix eval JIT

* fix eval JIT

* 90.82

* STEPS=900 reaches 90.22

* TTA envvar

* TTA default 0

* fully jit training

* refactor optim

* fix sched

* add label smoothing

* param changes

* partial gelu

* OneCycle with pause

* gelu maybe works

* 90.12

* remove pause lr

* maybe fix lr schedulers

* scheduler test passing

* comments

* try mixup

* shuffle!

* add back the missing last eval

* fix shuffle bugs

* add mixup prob

* fix mixup prob

* 90.19

* correct mixup

* correct mixup

* correct mixup

* 90.24

* 90.33

* refactor, add type hints

* add gradient clipping

* maybe fix test

* full JIT

* back to relu for now

* pass mixup prob as param

* add typehints

* maybe CI works

* try erf gelu

* CI, types

* remove useless import

* refactor optim

* refactor optim

* try leakyrelu

* try celu

* gelu

* 90.67

* remove grad clip

* remove grad clip tests

* revert params

* add test for OneCycleLR

* 90.62

* fix eval timing

* fix eval timing again

* so where I calculate mixup_prob matters

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
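Of the tricks above, mixup is the least self-explanatory: each batch is blended with a shuffled copy of itself, and the labels are blended with the same coefficient. A generic sketch (NumPy for brevity; the PR's Tensor-based version and its `mixup_prob` gating differ):

```python
import numpy as np

def mixup(x: np.ndarray, y: np.ndarray, alpha: float = 0.2):
  # draw one blending coefficient for the batch and pair every sample
  # with a random partner; y is assumed to be one-hot (or smoothed) labels
  lam = np.random.beta(alpha, alpha)
  perm = np.random.permutation(x.shape[0])
  return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]
```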
Eli Frigo
801564f31b Remove POW llop and add SQRT llop (#1104)
* fixed division by zero for fast operations

* made et closer to 0

* replace POW llop with SQRT

* updated mlops to swap SQRT and POW llops

* updated hlops to swap POW and SQRT

* added sqrt llop to cpu runtime

* added sqrt llop to cstyle codegen

* added POW llop to llvm ir codegen

* added SQRT llop to torch runtime

* moved pow from mlops to hlops

* found a better way to do reverse pow

* fixed indentation

* added SQRT llop to triton

* update docs to match new llops

* removed POW operator from assembly codegen

* added sqrt and rsqrt to pow hlop

* rewrote pow function in tensor.py

* Adjust tolerance

* Adjust for adamw

* Reduce for Adam too

* removed accidental leftover code

* removed all of accidental code

* added rsqrt test

* removed pow from mlops again

it was added back when resolving merge conflicts

---------

Co-authored-by: Jacky Lee <jla524@sfu.ca>
2023-07-05 18:07:58 -07:00
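The motivation for the swap: a general POW can be rebuilt at the hlop level from exp2/log2, with the common exponents ±0.5 routed to the new SQRT llop. Roughly (a plain-Python sketch, ignoring negative and zero bases; not the repo's actual code):

```python
import math

def pow_hlop(x: float, y: float) -> float:
  # exponents that now map to dedicated llops
  if y == 0.5: return math.sqrt(x)         # SQRT
  if y == -0.5: return 1 / math.sqrt(x)    # rsqrt = reciprocal of SQRT
  # general case: x**y == 2**(y * log2(x)), built from EXP2/LOG2-style ops
  return 2 ** (y * math.log2(x))
```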
Reza Rezvan
d1356cac27 Fix: Jacobian tests [WIP] (#1126)
* Fix: Jacobian tests; num_jacobian is either bugged or not accurate enough;

* Fix: Jacobian tests;

* Fix: Gradcheck;
2023-07-05 15:36:22 -07:00
Mehmet Kuzucu
c3173ff281 Add return statement to the train function (#1135)
add a return statement to the train function so callers can access the losses and accuracies lists
2023-07-05 08:13:38 -07:00
George Hotz
2f968f8547 ignore cloudpickle type for local mypy 2023-07-04 13:51:20 -07:00
Daniel Hipke
b4ce23e4b8 Make cross_process use cloudpickle (#1118)
* fix syntax issues in imagenet_download.py

* use cloudpickle in cross_process to make it work in Python 3.9+

* add cross_process test

* prevent unpickling on every function call

* add cloudpickle to setup.py

* add support for args/kwargs
2023-07-04 00:47:34 -07:00
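The reason cloudpickle helps: the stdlib pickle serializes module-level functions by reference only, so lambdas and closures fail under spawn-based multiprocessing. A simplified sketch of the pattern (not the actual `extra/helpers.py` implementation):

```python
import cloudpickle
from multiprocessing import Process, Queue

def _spawn(q: Queue, fxn_bytes: bytes, *args, **kwargs):
  fxn = cloudpickle.loads(fxn_bytes)  # unpickle once per process, not per call
  q.put(fxn(*args, **kwargs))

def cross_process(fxn, *args, **kwargs):
  # cloudpickle serializes the closures/lambdas that stdlib pickle rejects
  q: Queue = Queue()
  Process(target=_spawn, args=(q, cloudpickle.dumps(fxn)) + args, kwargs=kwargs).start()
  return q.get()
```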
Anselm Coogan
a22aad7d32 Use generators instead of lists in anys and alls (#1111)
* Use generators in any(...) instead of lists for better best-case performance

* Use generators in all(...) instead of lists

* enable R1729 in .pylintrc

* revert import sorting

---------

Co-authored-by: Anselm Coogan <anselm@scandit.com>
2023-07-03 16:06:06 -07:00
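The payoff is short-circuiting: `any()` over a generator stops at the first `True`, while the list form always materializes every element first. For example:

```python
xs = range(10**6)

# list form: builds all 10**6 booleans before any() looks at them
any([x > 5 for x in xs])

# generator form: stops after the 7th element (pylint rule R1729)
any(x > 5 for x in xs)
```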
Frank Pinnola
2071e53da8 Handle broadcast flag on gemm (#1103) 2023-07-02 22:15:07 -07:00
Rob Grossman
c8ddc34368 include missing queue in thneed load (#1095) 2023-07-02 12:33:59 -07:00
George Hotz
e234bf2298 hip matmul : add K support 2023-06-28 19:54:33 +00:00
George Hotz
0e93b9642a hip matmul 2023-06-28 19:21:01 +00:00
George Hotz
6ec0a24706 imagenet eval in 1 min 28 sec 2023-06-28 04:23:26 +00:00
George Hotz
9c6e507518 move accel into extra 2023-06-23 16:38:15 -07:00
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* separate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
Alex Wang
3d63c71e27 HIP backend (#750)
* llama works for HIP backend

* Use hipMemcpyAsync; Less lines of code

* Remove unused code

* Refactor

* Add comments; hipDeviceSynchronize

* HIP over GPU; Remove PyHIP dependency

* Cleanups

* Fix mypy check

* Merge master; Dump assembly code
2023-06-18 11:35:57 -07:00
Casey Primozic
805eef10dd Add tensorflow GEMM benchmark script (#1000)
* Modelled closely after the existing torch benchmark script, but adapted slightly for TensorFlow
2023-06-18 10:57:45 -07:00
Diogo
d2b837c1d9 Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
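One common way to build floor/ceil on top of a truncating int cast, which is presumably where the "control casting in numpy" concern comes from (illustrative; not necessarily the PR's exact formulation):

```python
import numpy as np

def floor(x: np.ndarray) -> np.ndarray:
  b = x.astype(np.int32).astype(x.dtype)  # truncates toward zero
  return b - (b > x)   # correct negative non-integers down: floor(-1.5) == -2

def ceil(x: np.ndarray) -> np.ndarray:
  b = x.astype(np.int32).astype(x.dtype)
  return b + (b < x)   # correct positive non-integers up: ceil(1.5) == 2
```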
George Hotz
fe71282ba1 faster RDNA assembly backend (#990)
* fast asm

* torch gemm
2023-06-16 12:06:38 -07:00
George Hotz
ba56ee6020 RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdna3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
Yahya Lmallas
804c45b5fc FIX: Can't pickle local object (#979)
_early_exec_process is a local function defined within the scope of another function; it should be global
2023-06-14 12:32:17 -07:00
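This is a stdlib pickle limitation: pickle serializes functions by qualified name, so a function defined inside another function can't be found at load time. A minimal illustration:

```python
import pickle

def make_worker():
  def worker(x):  # local function: pickle.dumps(worker) raises
    return x + 1  # "AttributeError: Can't pickle local object"
  return worker

def global_worker(x):  # module-level: picklable by reference
  return x + 1

pickle.dumps(global_worker)    # fine
# pickle.dumps(make_worker())  # raises AttributeError
```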
Steven Anderson
e54b6c5e7f One hot (#972)
* passing with 1d indices

* passing all tests

* cleanup

* using safe_numpy for scalar
2023-06-12 10:13:29 -07:00
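The standard tensor-op formulation of one-hot is a broadcasted comparison against an arange, which naturally handles the n-dimensional index case. A NumPy sketch of the idea:

```python
import numpy as np

def one_hot(indices: np.ndarray, num_classes: int) -> np.ndarray:
  # compare each index against [0, num_classes); broadcasting appends
  # a one-hot axis and works for any number of leading dimensions
  return (indices[..., None] == np.arange(num_classes)).astype(np.float32)

one_hot(np.array([2, 0, 1]), 3)
# [[0., 0., 1.], [1., 0., 0.], [0., 1., 0.]]
```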
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
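tril/triu reduce to multiplying by a triangular mask built from two broadcasted aranges over the row and column indices. A NumPy sketch (the repo's Tensor version is analogous):

```python
import numpy as np

def tril(x: np.ndarray, k: int = 0) -> np.ndarray:
  rows, cols = x.shape[-2], x.shape[-1]
  # keep entries on or below the k-th diagonal (row >= col - k)
  return x * (np.arange(rows)[:, None] >= np.arange(cols)[None, :] - k)

def triu(x: np.ndarray, k: int = 0) -> np.ndarray:
  rows, cols = x.shape[-2], x.shape[-1]
  return x * (np.arange(rows)[:, None] <= np.arange(cols)[None, :] - k)
```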
Steven Anderson
c0e558b77c Test nllloss (#958)
* works but slow

* works with NC and NCd1, but it is still slow

* refactor

* support for k dimensions

* without numpy
2023-06-09 09:00:29 -07:00
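NLL loss without fancy indexing is another one-hot-mask trick: select the log-probability of the true class by masking, then negate and average. A sketch for the basic (N, C) case (the PR extends the same idea to the NCd1... shapes):

```python
import numpy as np

def nll_loss(log_probs: np.ndarray, target: np.ndarray) -> np.ndarray:
  # log_probs: (N, C) log-probabilities, target: (N,) integer class ids
  mask = target[:, None] == np.arange(log_probs.shape[1])
  return -(log_probs * mask).sum(axis=1).mean()
```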
Diogo
6b1280f01c fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956)
* prelu and where casting

* typing for safe_numpy

* optional

* get rid of tracing in ci

* cleanup and resolved layernorm issues

* removed debug print
2023-06-08 16:09:19 -07:00
Diogo
666d151f8a Onnx slice fixups (#952)
* resolved some slice test errors and added some more debugging logs

* use same device in cumsum

* increased float priority

* onnx debug output matches input
2023-06-07 19:44:30 -07:00
M4tthewDE
664d6cc7e5 Implement onnx MeanVarianceNormalization (#943) 2023-06-06 10:28:19 -07:00
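For reference, ONNX MeanVarianceNormalization subtracts the mean and divides by the standard deviation over a set of axes (the spec's default is (0, 2, 3) for NCHW input). A NumPy sketch, with a small epsilon added here for numerical safety:

```python
import numpy as np

def mean_variance_normalization(x: np.ndarray, axes=(0, 2, 3)) -> np.ndarray:
  mean = x.mean(axis=axes, keepdims=True)
  var = ((x - mean) ** 2).mean(axis=axes, keepdims=True)
  return (x - mean) / np.sqrt(var + 1e-9)
```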
Steven Anderson
079ea217a3 fix test_pow_type - autocasting for Pow with inputs of diff type (#937) 2023-06-05 15:22:35 -07:00
M4tthewDE
70f12fdb57 Fix wrong op version being used if versions equal (#934) 2023-06-05 07:45:10 -07:00
Steven Anderson
79613eb83e Test min (#932)
* fix __neg__ defaulting to float32 due to 0.0

* fixed __neg__ always defaulting to float32

* fixed openpilot (OpenCL) Test
2023-06-05 00:03:30 -07:00
George Hotz
fbf17f0031 intel benchmark matmul gets 60 TFLOPS? 2023-06-04 17:01:50 +00:00
Steven Anderson
657e642e3a Fixed test suite for Clip (#912)
* Fixed test suite for Clip

* fixed issue with clip when taking large negative numbers as min

* Remove typings
2023-06-04 09:01:01 -07:00
George Hotz
afd0be8a9c intel example 2023-06-04 06:43:09 +00:00
George Hotz
ed1963b899 Fast DiskTensor to other Tensor (#916)
* make disktensors fast

* loading

* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz
791530045d Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
Steven Anderson
513aeb2f66 Fixed all ConstantOfShape test suite (#907) 2023-06-02 11:26:40 -07:00
Steven Anderson
301f7b54c6 ConstantOfShape ONNX test fixed. (#890)
* ConstantOfShape ONNX test fixed.

* removed redundant if statement

* value is optional and should default to a float32 tensor with value of 0

* fixed: default parameter values are created at function definition time, which is bad for mutable objects.
2023-06-02 07:34:25 -07:00
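The mutable-default pitfall noted in the final bullet above, in miniature (the real handler defaults to a float32 tensor holding 0, per the ONNX spec; plain lists are used here for brevity):

```python
# BAD: the default is created once, at function definition time,
# so every call without `value` shares the same mutable object
def constant_of_shape_bad(shape, value=[0.0]): ...

# GOOD: use None as the sentinel and build the default per call
def constant_of_shape(shape, value=None):
  if value is None: value = [0.0]
  ...
```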
kposborne2
ae83e9844c add output_padding to transposed conv (#875) 2023-06-01 00:03:22 -07:00
Friedrich Carl Eichenroth
740304ef9d Small Onnx Parser Improvements (#885)
* wip

* rename onnx_version to onnx_model_version

* add type

* add types

* small cleanup

* revert some changes from before

* add todo

* dumb fix
2023-06-01 00:01:01 -07:00
Marcello Fuschi
3924aae8ed Fix ONNX dropout and unify the implementation (#857)
* Fix ONNX dropout and unify the implementation

* Use tensor rand method for dropout

* Change approach for RNG in ONNX Dropout

* Fix style

* Test legacy RNG seeding

* Remove the necessity for legacy RNG in Tensor class
2023-05-31 07:40:47 -07:00
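ONNX Dropout in training mode zeroes elements with probability p and rescales survivors by 1/(1-p), returning the mask alongside the output; in inference mode it is the identity. A NumPy sketch of that contract:

```python
import numpy as np

def dropout(x: np.ndarray, p: float = 0.5, training: bool = True):
  if not training or p == 0: return x, np.ones_like(x, dtype=bool)
  mask = np.random.rand(*x.shape) >= p
  return x * mask / (1 - p), mask
```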
skobsman
2e393f7ef2 InstanceNormalization ONNX test fixed. (#870) 2023-05-30 16:07:44 -07:00
Friedrich Carl Eichenroth
f91f28d9e2 fix a bunch of tests (#856) 2023-05-29 17:48:26 -07:00
zk-tarts
174c65b7d9 add onnx Binarizer op (#850)
Co-authored-by: zk-tarts <>
2023-05-29 13:15:50 -07:00
M4tthewDE
4408c25e9a Add Onnx op Shrink (#851)
* Add onnx Shrink operation

* Fix soft/hard shrink onnx test
2023-05-29 13:15:39 -07:00
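The Shrink op's definition from the ONNX spec, for reference: y = x + bias where x < -lambd, y = x - bias where x > lambd, and 0 elsewhere. A direct NumPy sketch:

```python
import numpy as np

def shrink(x: np.ndarray, bias: float = 0.0, lambd: float = 0.5) -> np.ndarray:
  return np.where(x < -lambd, x + bias, np.where(x > lambd, x - bias, 0.0))
```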
Friedrich Carl Eichenroth
6f2b3755ca set axis default to 0 (#854) 2023-05-29 13:15:28 -07:00