Commit Graph

7104 Commits

Author SHA1 Message Date
George Hotz
e37bff6c19 fix bug in jit prune with copy [pr] (#8073) 2024-12-06 18:38:23 +08:00
George Hotz
aae8557ada test copy inside jit [pr] (#8072) 2024-12-06 17:51:50 +08:00
George Hotz
e2fe7f0d2f hotfix: actually fix pylint, it's a python 3.10 issue 2024-12-06 13:53:46 +08:00
George Hotz
b28d660172 update self_tokenize, fix pylint maybe 2024-12-06 13:49:41 +08:00
George Hotz
344fd4845c example: self_tokenize. someday tinygrad will be recursively self improving 2024-12-06 13:35:02 +08:00
JaSpa99
3c5d5f9414 mypy==1.13.0 (#7990)
* explicit instantiation and narrowing asserts

* explicit cast

* bump

* one line assert

* handle case for no copy_queue_t

* Revert "handle case for no copy_queue_t"

This reverts commit 38347806ca.

* more readable control flow

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-06 12:09:14 +08:00
leopf
65b6696f3b refactor safe_load (#8035)
* refactor safe_load

* cleanup
2024-12-06 12:08:21 +08:00
chenyu
e7d5fe4a32 improve idiv _min_max (#8066)
for the cases where we don't know the exact bounds, we might still know the sign. with this, we can remove some resolve calls for symbolic shapetracker
2024-12-05 23:02:16 -05:00
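(Editor's illustration, not part of the commit log: a minimal sketch of the bound-tightening idea described in #8066, assuming floor division by a positive constant. `idiv_bounds` is a hypothetical helper, not tinygrad's actual `_min_max`.)

```python
# Sketch: bounding x // c for a positive constant c. Even without exact bounds
# on x, a sign-only bound (e.g. x >= 0 with no upper bound) still pins down the
# sign of the quotient, which lets comparisons like (x // c) < 0 resolve.
from typing import Optional, Tuple

def idiv_bounds(xmin: Optional[int], xmax: Optional[int], c: int) -> Tuple[Optional[int], Optional[int]]:
  assert c > 0  # sketch only handles a positive constant divisor
  lo = xmin // c if xmin is not None else None  # floor division by c > 0 is monotonic
  hi = xmax // c if xmax is not None else None
  return lo, hi

print(idiv_bounds(0, None, 4))  # (0, None): x >= 0 implies x // 4 >= 0
print(idiv_bounds(-8, 7, 4))    # (-2, 1)
```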
chenyu
13b954f22c unify expand conditions [pr] (#8065)
same condition (check if old == new or old == 1) in tensor and view. also renamed _pad_left to _align_left because it's not really a pad
2024-12-05 21:40:14 -05:00
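(Editor's illustration, not from the repository: a sketch of the unified "old == new or old == 1" check with left alignment mentioned in #8065, assuming broadcast-style semantics; `_align_left` here is a stand-in name, not tinygrad's implementation.)

```python
# Sketch: an old shape can expand to a new shape if, after left-aligning the
# shorter shape with 1s, each old dimension already matches or is 1.
def _align_left(shape: tuple, ndim: int) -> tuple:
  return (1,) * (ndim - len(shape)) + shape

def can_expand(old: tuple, new: tuple) -> bool:
  old = _align_left(old, len(new))
  return all(o == n or o == 1 for o, n in zip(old, new))

print(can_expand((3, 1), (2, 3, 4)))  # True: (1, 3, 1) -> (2, 3, 4)
print(can_expand((3, 2), (3, 4)))     # False: 2 cannot expand to 4
```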
chenyu
aefdff4ef5 reshape mask cleanups [pr] (#8064)
don't need canonicalize_st because we always merge 1 in `_merge_dims`
2024-12-05 20:20:43 -05:00
chenyu
05dba6e4ee minor to_indexed_uops cleanup [pr] (#8063) 2024-12-05 17:15:03 -05:00
chenyu
b2dd703592 fix typing of UOp.range [pr] (#8062)
start/end should not be float or bool
2024-12-05 14:56:34 -05:00
Sieds Lykles
49c6dab74b Add pattern for div mod recombine with gcd (#8061)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-05 13:16:58 -05:00
geohotstan
707e9a9c8e add _one_hot_along_dim helper for Tensor.arange masking (#8039)
* feelsbadman

* feelsextrabadman

* make sure indices is on same device as self Tensor

* renamed to _one_hot_along_dim

* revert onnx change will do them in onnx only PRs

* address feedback

* add onnx changes here too

* make pad arg better

* revert pad arg

* maybe still keep dim

* simplify onehot onnx ops more

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-05 12:43:00 -05:00
chenyu
3c5983473a combine parentless reduce rule [pr] (#8059) 2024-12-05 11:28:35 -05:00
chenyu
87594a8153 simpler dtypes.max for int [pr] (#8058) 2024-12-05 10:31:41 -05:00
geohotstan
66b8242375 Simple onnx.py clean ups (#8054)
* start

* simplify ops

* why did this not work before

* will split buffer parse to separate pr

* flip the error order

* only this much for now

* to_python_const clean up

* minimize diff

* move tensor_methods into onnx.py

* improve some type signatures

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-05 10:31:26 -05:00
chenyu
5c6ed5dba6 lower test_conv_3x3_256_32_32_256_256 expectation (#8060)
failed https://github.com/tinygrad/tinygrad/actions/runs/12182799887/job/33982676812#step:9:210
2024-12-05 10:30:56 -05:00
Ahmed Harmouche
c6f5bb03fa YoloV8 WebGPU fixes (#8057)
* Bump up input size to 416, show if webgpu is not supported

* Minor fix in export_model
2024-12-05 16:23:45 +01:00
nimlgen
78c01a5c2b amd general _gpu_alloc (#8056)
* amd general _gpu_alloc

* hmm

* ops
2024-12-05 15:50:23 +03:00
nimlgen
8071600897 nv one _gpu_alloc (#8055) 2024-12-05 15:22:03 +03:00
Ahmed Harmouche
ff9a89f714 Proper dtypes for input/output of exported WebGPU model (#8053)
* Respect input/output dtypes in exported WebGPU model

* Add some comments about skipped dtypes
2024-12-05 10:38:05 +01:00
qazal
435a51e10c reduce folding simple tests [pr] (#8040)
* reduce folding simple tests [pr]

* test for view and realized src pattern

* realize / buffer behavior
2024-12-05 12:22:45 +08:00
George Hotz
20878be2af lower test_gemv_4096_16384 expectations 2024-12-05 12:08:26 +08:00
George Hotz
83aecbdc70 do gpuocelot copy manually [pr] (#8050) 2024-12-05 11:51:20 +08:00
George Hotz
4a208bfb28 bump download cache version 2024-12-05 11:42:34 +08:00
George Hotz
df18e7cc37 accept filename decorator [pr] (#8049)
* accept filename decorator [pr]

* add test for safe_load

* bring old tar tests back
2024-12-05 11:40:59 +08:00
Francis Lata
c3187087f7 QwQ-32B-Preview support (#7962)
* load weights with some debugging

* start running a prompt

* cleanup

* optionally permute layers and cleanup

* add validation for simple prompt

* small cleanup

* minor cleanup with formatting download links

* add a longer prompt

* add timing option

* some typings

* remove unused arg

* reset GlobalCounters

* minor cleanups
2024-12-04 21:46:37 -05:00
chenyu
b3220ca7b1 test cases of always True/False lt (#8048)
* test cases of always True/False lt

* one more
2024-12-04 20:38:40 -05:00
chenyu
8bb806888b hook_overflow -> safe_exp2 [pr] (#8047)
that's the only use case, so no need for indirection
2024-12-04 19:05:38 -05:00
chenyu
99abdc6d39 minor push_swizzle_down_through_elementwise cleanup [pr] (#8046)
use the walrus operator; and if the x's are the same, prod(x) must be the same
2024-12-04 17:22:37 -05:00
chenyu
5933ec8dc3 use argfix in smax/smin and remove if [pr] (#8045) 2024-12-04 17:06:13 -05:00
chenyu
4e518334b8 minor get_grouped_dims cleanup [pr] (#8044) 2024-12-04 16:22:51 -05:00
geohotstan
5ce8090d42 simple onnx_ops cleanups (#8003)
* simple clean ups first

* more work

* kinda have adam

* ooo momentum worked nicely

* almost there

* wow.. is the onnx test wrong

* nicer optim stuff

* just skip that test

* small comment changes

* use naming convention from other parts of codebase

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 15:33:03 -05:00
Sieds Lykles
70db1bab5c Fold nested div with const (#8010)
* Rebase nested div and with const

* Update the ordering

* return None on vectors

Fixes cpu test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 14:59:09 -05:00
chenyu
0693158d28 lower v_theoretical gemv on red (#8042)
tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209
2024-12-04 13:59:40 -05:00
chenyu
5c2b1089b2 vectorized input in div_and_mod_folding returns None [pr] (#8041) 2024-12-04 13:36:41 -05:00
qazal
ff6def9ffb simple contiguous_while_contiguous prereqs [pr] (#8038)
* simple contiguous_while_contiguous prereqs [pr]

* early realize

* fine if it's folding a non-contig buffer
2024-12-04 23:00:28 +08:00
Ahmed Harmouche
c9e7701417 Fast YoloV8 on WebGPU (#8036)
* Fast yolov8 with downscaled input

* Faster + FPS meter

* Add loader while model is downloading/compiling

* Title touchup
2024-12-04 15:23:09 +01:00
qazal
b116e1511d make device on uop optional [pr] (#8034) 2024-12-04 20:18:00 +08:00
Ahmed Harmouche
13eedd373b Run WebGPU tests on ubuntu (#8033) 2024-12-04 12:42:04 +01:00
leopf
fb89971e73 use BufferedReader (#8032) 2024-12-04 19:08:54 +08:00
George Hotz
08657cb7b0 hotfix: bump expectations in speed_v_theoretical 2024-12-04 19:00:33 +08:00
George Hotz
ea65c79ba2 hotfix: don't spam BEAM debug in speed_v_theoretical 2024-12-04 18:47:16 +08:00
George Hotz
09b00b1b04 hotfix: use kernel timings instead of python timings in speed_v_theoretical 2024-12-04 18:36:17 +08:00
George Hotz
8f65c1fafb simpler block reorder function [pr] (#8031)
* simpler block reorder function [pr]

* simpler

* block_reorder in substitute, so wasteful otherwise

* extend and count

* leave push logic for same order

* sort new ctx

* less loop

* Revert "less loop"

This reverts commit 30249d097a.
2024-12-04 17:57:35 +08:00
leopf
f0401e14e8 tar_extract with Tensors (#7853)
* initial

* USTAR, PAX and GNU support + testing

* from_bytes byteorder

* use TarInfo.frombuf

* tensor only usage

* remove contextlib.suppress

* shorter ow,pax

* more tests

* testing length + move tests

* cleanup

* new approach: RawTensorIO

* fix fetch

* enable read test

* cleanup and ignore fix

* fix for python < 3.12

* make it RawIO

* functions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 17:03:19 +08:00
George Hotz
1e06aefde7 bunch up ops for lines [pr] (#8030) 2024-12-04 17:03:01 +08:00
uuuvn
e9c5b23ba1 Use MTLCompiler directly (v2) (#7920)
* Use MTLCompiler directly (v2)

* to_block_literal and REQUEST_TYPE_COMPILE

* Rewrite command encoding

* Revert to_block_literal

* Maybe that's more readable to some people?

* Typo and comment about stdlib caching

* Update ops_metal.py

* Update ops_metal.py

* Update ops_metal.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
George Hotz
bb98bae751 local reordering in block (#8029)
* local reordering in block

* load (and parents) is highest priority

* minor loads in order

* comments

* explicit depth

* simpler

* matters less, but store early too
2024-12-04 15:11:29 +08:00