Commit Graph

300 Commits

Casey Primozic
805eef10dd Add tensorflow GEMM benchmark script (#1000)
* Modelled closely after the existing torch benchmark script, adapted slightly for tensorflow
2023-06-18 10:57:45 -07:00
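A minimal sketch of what such a TensorFlow GEMM benchmark can look like (an illustration, not the repo's script; the matrix size, iteration count, and use of `.numpy()` as a synchronization point are assumptions):

```python
import time
import tensorflow as tf

N, ITERS = 4096, 10
a = tf.random.uniform((N, N), dtype=tf.float32)
b = tf.random.uniform((N, N), dtype=tf.float32)

tf.matmul(a, b).numpy()           # warm-up; .numpy() forces the result to materialize
start = time.perf_counter()
for _ in range(ITERS):
    c = tf.matmul(a, b)
c.numpy()                         # sync before stopping the clock
elapsed = time.perf_counter() - start
print(f"{2 * N**3 * ITERS / elapsed / 1e12:.2f} TFLOPS")  # 2*N^3 FLOPs per matmul
```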
Diogo
d2b837c1d9 Adds floor/ceil (#989)
* floor ceil impl

* control casting in numpy
2023-06-17 10:56:21 -07:00
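A small usage sketch of the new ops (illustrative only; the exact output follows the commit's numpy casting handling):

```python
from tinygrad.tensor import Tensor

t = Tensor([-1.5, -0.5, 0.5, 1.5])
print(t.floor().numpy())  # expected: [-2., -1.,  0.,  1.]
print(t.ceil().numpy())   # expected: [-1., -0.,  1.,  2.]
```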
George Hotz
fe71282ba1 faster RDNA assembly backend (#990)
* fast asm

* torch gemm
2023-06-16 12:06:38 -07:00
George Hotz
ba56ee6020 RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdna3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
Yahya Lmallas
804c45b5fc FIX: Can't pickle local object (#979)
_early_exec_process is a local function defined within the scope of another function; it should be global
2023-06-14 12:32:17 -07:00
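The underlying Python behaviour is easy to reproduce: pickle serializes functions by their importable qualified name, so a function defined inside another function cannot be pickled, while a module-level one can. A self-contained illustration (the function names mirror the commit, but the surrounding code is hypothetical):

```python
import pickle

def make_worker():
    def _early_exec_process(x):        # local (nested) function
        return x
    return _early_exec_process

def _early_exec_process_global(x):     # module-level function
    return x

try:
    pickle.dumps(make_worker())
except AttributeError as e:
    print("local function fails:", e)  # "Can't pickle local object ..."

pickle.dumps(_early_exec_process_global)  # works: importable by module + name
```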
Steven Anderson
e54b6c5e7f One hot (#972)
* passing with 1d indices

* passing all tests

* cleanup

* using safe_numpy for scalar
2023-06-12 10:13:29 -07:00
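One-hot encoding is commonly built from a broadcasted comparison against an index range; a hedged numpy reference (not necessarily the PR's exact implementation):

```python
import numpy as np

def one_hot(indices, num_classes):
    # broadcast-compare the index array against [0, num_classes) and cast to float
    return (indices[..., None] == np.arange(num_classes)).astype(np.float32)

print(one_hot(np.array([2, 0, 1]), 3))
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```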
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
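A usage sketch of the new methods (the diagonal-offset argument is assumed to follow the numpy convention):

```python
from tinygrad.tensor import Tensor

m = Tensor.ones(3, 3)
print(m.tril().numpy())   # lower triangle kept, elements above the diagonal zeroed
print(m.triu(1).numpy())  # strictly-upper triangle (diagonal offset 1)
```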
Steven Anderson
c0e558b77c Test nllloss (#958)
* works but slow

* works with NC and NCd1 but it is still slow

* refactor

* support for k dimensions

* without numpy
2023-06-09 09:00:29 -07:00
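For reference, the basic NC case of negative log-likelihood loss picks each sample's log-probability at its target class and averages the negated values; a small numpy sketch (illustrative, not the test's code):

```python
import numpy as np

def nll_loss(log_probs, target):
    # log_probs: (N, C) log-probabilities, target: (N,) class indices
    n = np.arange(log_probs.shape[0])
    return -log_probs[n, target].mean()

lp = np.log(np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]]))
print(nll_loss(lp, np.array([0, 1])))  # (-log 0.7 - log 0.8) / 2 ~= 0.29
```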
Diogo
6b1280f01c fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956)
* prelu and where casting

* typing for safe_numpy

* optional

* get rid of tracing in ci

* cleanup and resolved layernorm issues

* removed debug print
2023-06-08 16:09:19 -07:00
Diogo
666d151f8a Onnx slice fixups (#952)
* resolved some slice test errors and added some more debugging logs

* use same device in cumsum

* increased float priority

* onnx debug output matches input
2023-06-07 19:44:30 -07:00
M4tthewDE
664d6cc7e5 Implement onnx MeanVarianceNormalization (#943) 2023-06-06 10:28:19 -07:00
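ONNX MeanVarianceNormalization standardizes the input over a set of axes (default (0, 2, 3) for NCHW, i.e. per channel); a hedged numpy reference:

```python
import numpy as np

def mean_variance_normalization(x, axes=(0, 2, 3)):
    # ONNX MVN: (X - mean) / std, reduced over the given axes
    mean = x.mean(axis=axes, keepdims=True)
    std = x.std(axis=axes, keepdims=True)
    return (x - mean) / std

x = np.random.randn(2, 3, 4, 4).astype(np.float32)
y = mean_variance_normalization(x)
print(y.mean(axis=(0, 2, 3)), y.std(axis=(0, 2, 3)))  # ~0 and ~1 per channel
```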
Steven Anderson
079ea217a3 fix test_pow_type - autocasting for Pow with inputs of diff type (#937) 2023-06-05 15:22:35 -07:00
M4tthewDE
70f12fdb57 Fix wrong op version being used if versions equal (#934) 2023-06-05 07:45:10 -07:00
Steven Anderson
79613eb83e Test min (#932)
* fix __neg__ defaulting to float32 due to 0.0

* fixed __neg__ always defaulting to float32

* fixed openpilot (OpenCL) Test
2023-06-05 00:03:30 -07:00
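The bullets above describe negation written in terms of the float literal 0.0, which promoted the result to float32 regardless of the input dtype. A sketch of the check (the `tinygrad.helpers.dtypes` import path matches the 2023-era layout and is an assumption here):

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes

x = Tensor([1, 2, 3], dtype=dtypes.int32)
# If negation is written as (0.0 - x), the float literal drags the result to
# float32; after this fix the result dtype is expected to match x.dtype.
print((-x).dtype)
```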
George Hotz
fbf17f0031 intel benchmark matmul gets 60 TFLOPS? 2023-06-04 17:01:50 +00:00
Steven Anderson
657e642e3a Fixed test suite for Clip (#912)
* Fixed test suite for Clip

* fixed issue with clip when taking large negative numbers as min

* Remove typings
2023-06-04 09:01:01 -07:00
George Hotz
afd0be8a9c intel example 2023-06-04 06:43:09 +00:00
George Hotz
ed1963b899 Fast DiskTensor to other Tensor (#916)
* make disktensors fast

* loading

* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz
791530045d Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
Steven Anderson
513aeb2f66 Fixed all ConstantOfShape test suite (#907) 2023-06-02 11:26:40 -07:00
Steven Anderson
301f7b54c6 ConstantOfShape ONNX test fixed. (#890)
* ConstantOfShape ONNX test fixed.

* removed redundant if statement

* value is optional and should default to a float32 tensor with a value of 0

* fixed: default parameters are created at function definition time, which is bad for mutable objects (see the sketch below)
2023-06-02 07:34:25 -07:00
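That last point is a general Python pitfall: default argument values are evaluated once, at function definition, so a mutable default is shared across calls. A standalone illustration:

```python
def append_bad(item, bucket=[]):      # the default list is created once, at definition
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):   # create a fresh list per call instead
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_bad(1), append_bad(2))    # [1, 2] [1, 2] -- shared state leaks across calls
print(append_good(1), append_good(2))  # [1] [2]
```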
kposborne2
ae83e9844c add output_padding to transposed conv (#875) 2023-06-01 00:03:22 -07:00
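For context, the standard transposed-convolution output size (PyTorch convention) is `(in - 1)*stride - 2*padding + dilation*(kernel - 1) + output_padding + 1`; output_padding exists because, with stride > 1, several forward-conv input sizes map to the same output size, so the inverse is ambiguous. A small worked check:

```python
def conv_transpose_out(in_size, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    # standard transposed-convolution output size (PyTorch convention)
    return (in_size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

print(conv_transpose_out(4, kernel=3, stride=2, padding=1))                    # 7
print(conv_transpose_out(4, kernel=3, stride=2, padding=1, output_padding=1))  # 8
```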
Friedrich Carl Eichenroth
740304ef9d Small Onnx Parser Improvements (#885)
* wip

* rename onnx_version to onnx_model_version

* add type

* add types

* small cleanup

* revert some changes from before

* add todo

* dumb fix
2023-06-01 00:01:01 -07:00
Marcello Fuschi
3924aae8ed Fix ONNX dropout and unify the implementation (#857)
* Fix ONNX dropout and unify the implementation

* Use tensor rand method for dropout

* Change approach for RNG in ONNX Dropout

* Fix style

* Test legacy RNG seeding

* Remove the necessity for legacy RNG in Tensor class
2023-05-31 07:40:47 -07:00
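A hedged sketch of the "tensor rand method" approach mentioned above, i.e. inverted dropout built from a random keep-mask (the PR's exact masking and scaling may differ; elementwise comparison on tensors is assumed):

```python
from tinygrad.tensor import Tensor

def dropout(x: Tensor, p: float = 0.5, training: bool = True) -> Tensor:
    if not training or p == 0.0:
        return x
    mask = (Tensor.rand(*x.shape) >= p)   # keep-mask drawn from the tensor RNG
    return x * mask * (1.0 / (1.0 - p))   # inverted-dropout scaling

out = dropout(Tensor.ones(2, 3), p=0.5)
```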
skobsman
2e393f7ef2 InstanceNormalization ONNX test fixed. (#870) 2023-05-30 16:07:44 -07:00
Friedrich Carl Eichenroth
f91f28d9e2 fix a bunch of tests (#856) 2023-05-29 17:48:26 -07:00
zk-tarts
174c65b7d9 add onnx Binarizer op (#850)
Co-authored-by: zk-tarts <>
2023-05-29 13:15:50 -07:00
M4tthewDE
4408c25e9a Add Onnx op Shrink (#851)
* Add onnx Shrink operation

* Fix soft/hard shrink onnx test
2023-05-29 13:15:39 -07:00
Friedrich Carl Eichenroth
6f2b3755ca set axis default to 0 (#854) 2023-05-29 13:15:28 -07:00
Friedrich Carl Eichenroth
3b158f7a5f fix onnx versions greater or equal 10 (#853) 2023-05-29 13:04:06 -07:00
Diogo
1a5d72f812 Onnx ops And, Or, Xor, Not (#847)
* onnx and, or, xor, not

* added bool type to llvm and clang

* removed float conversion

* switched where op to use tensor func
2023-05-29 11:09:20 -07:00
SnakeOnex
844e6d0753 conv1d & conv3d onnx tests (#835)
* conv1d onnx

* [Work in progress] conv1d + enforcing full padding tuple length

* make ONNX padding reorder not hardcoded, works for 1D and 3D convs now

* conv2d interprets padding based on the input tensor dimensions
2023-05-29 10:16:45 -07:00
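ONNX stores pads as `[x1_begin, x2_begin, ..., xn_begin, x1_end, ..., xn_end]`, while torch-style padding helpers expect (begin, end) pairs starting from the last dimension, so a reorder is needed. A hedged converter (the PR's exact ordering may differ):

```python
def onnx_pads_to_pairs(pads):
    # ONNX: [x1_begin, ..., xn_begin, x1_end, ..., xn_end]
    # return (begin, end) pairs, last spatial dim first (torch-style F.pad order)
    n = len(pads) // 2
    pairs = list(zip(pads[:n], pads[n:]))       # (begin, end) for dims 1..n
    return [p for pair in reversed(pairs) for p in pair]

print(onnx_pads_to_pairs([1, 2, 3, 4]))         # 2 dims -> [2, 4, 1, 3]
print(onnx_pads_to_pairs([1, 2, 3, 4, 5, 6]))   # 3 dims -> [3, 6, 2, 5, 1, 4]
```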
Marcello Fuschi
6d49925a26 Add max_pool2d dilation (#833) 2023-05-28 15:16:48 -07:00
cheeetoo
21d27d31a9 Fix a couple pad tests (#827)
* fix pad bug

* float type hint for value

* convert pads to list

* update Pad type signature

* Change | to Union since | is not supported in Python < 3.10
2023-05-28 12:06:46 -07:00
Mattis Megevand
606b841d3f LR Schedulers (#755)
* lr schedulers + test

* lr scheduler test moved + integration test

* integration test for all lr scheduler

* lr scheduler test now deterministic

* changed optimizer + parameters for lr sched test
2023-05-27 07:47:49 -07:00
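As an illustration of what such a schedule computes, a generic cosine-annealing formula (not necessarily the repo's scheduler API):

```python
import math

def cosine_annealing_lr(base_lr, step, total_steps, min_lr=0.0):
    # standard cosine annealing from base_lr down to min_lr over total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total_steps))

for step in (0, 25, 50, 100):
    print(step, round(cosine_annealing_lr(1e-3, step, 100), 6))
# 0 -> 0.001, 50 -> 0.0005, 100 -> 0.0
```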
George Hotz
87fa5af70a ptx example 2023-05-26 19:28:51 -07:00
George Hotz
26014a0fa1 add convtranspose (#809)
* add convtranspose

* onnx convtranspose
2023-05-26 12:35:03 -07:00
wozeparrot
7351eb4b61 feat: put temporary file in the same directory as the destination file (#805) 2023-05-25 20:46:02 -07:00
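The reason for writing the temporary file next to the destination is that a rename is only atomic when source and destination are on the same filesystem. A hedged sketch of the pattern (the helper name and error handling are illustrative):

```python
import os, tempfile

def atomic_write(path: str, data: bytes):
    # Write to a temporary file in the *same directory* as the destination so the
    # final rename stays on one filesystem and is therefore atomic.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic on POSIX when src and dst share a filesystem
    except BaseException:
        os.unlink(tmp)
        raise

atomic_write("weights.bin", b"\x00" * 16)
```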
Diogo
c19ef0fcce Add sin/cos/tan (#794)
* added sin/cos/tan

* fix lint

* added onnx ops support
2023-05-25 09:04:56 -07:00
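When only sin is available as a primitive, cos and tan are commonly derived from it via identities; a quick numpy check of those identities (illustrative, not necessarily the PR's exact formulation):

```python
import math
import numpy as np

x = np.linspace(-3, 3, 7)
cos_from_sin = np.sin(math.pi / 2 - x)   # cos(x) = sin(pi/2 - x)
tan_from_sin = np.sin(x) / cos_from_sin  # tan(x) = sin(x) / cos(x)

assert np.allclose(cos_from_sin, np.cos(x))
assert np.allclose(tan_from_sin, np.tan(x))
```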
George Hotz
0400315078 Revert "ops rdna"
This reverts commit 81a11d891d.
2023-05-21 13:02:18 -07:00
George Hotz
325a3bf2cf Revert "writing 2"
This reverts commit dddd6c42f0.
2023-05-21 13:02:17 -07:00
George Hotz
dddd6c42f0 writing 2 2023-05-21 12:52:36 -07:00
George Hotz
81a11d891d ops rdna 2023-05-21 11:45:38 -07:00
George Hotz
90fff82c8a Rdna (#776)
* assembler maybe

* custom asm

* rdna3 on quiet

* trigger crashes

* fixed notes

* non-fatal rdna2 crash

* Crash4

* improve rdna sniffer

* comments

* improve sniffer

* asm

* 131 TFLOPS RDNA3

* opt simple matmul

* todos
2023-05-16 05:33:57 -07:00
George Hotz
89b8b39d9c fix mypy 2023-05-13 21:25:36 -07:00
George Hotz
e0b2035023 fast imagenet eval, gets 76.14% across the set 2023-05-13 21:18:31 -07:00
George Hotz
46d419060b start on mlperf models 2023-05-10 16:30:49 -07:00
George Hotz
cb7c22beeb fix mypy 2023-05-06 19:18:54 +00:00
George Hotz
5190037cbc rocm: disassembler for shader 2023-05-06 19:07:52 +00:00
George Hotz
42256c0d9d rocm sniffer dumps code 2023-05-05 18:36:53 +00:00