tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-19 02:44:40 -05:00

Author	SHA1	Message	Date
chenyu	a77ee72d11	clean up reshape size check [pr] (#8067) removed a resolve, and remove special case for 0 size assert since it's covered by generic size check	2024-12-06 07:51:19 -05:00
geohotstan	074a67a6eb	combine get inputs and type_parse function in onnx (#8069) * 1 is simpler than 2 * variable name * change error wording * shapes for sequence type must be homogeneous	2024-12-06 07:42:35 -05:00
nimlgen	c0240855b9	qcom has not transfer (#8075) * qcom alloc is not hcq alloc * maybe base? * test	2024-12-06 14:45:01 +03:00
Ahmed Harmouche	ce72fe1411	u32 to f16 in tinygrad (#8074) * f16 decompression in tinygrad * Typing and cleanup	2024-12-06 12:00:13 +01:00
George Hotz	e37bff6c19	fix bug in jit prune with copy [pr] (#8073)	2024-12-06 18:38:23 +08:00
George Hotz	aae8557ada	test copy inside jit [pr] (#8072)	2024-12-06 17:51:50 +08:00
chenyu	e7d5fe4a32	improve idiv _min_max (#8066) for the cases that the we don't know the exact bounds, we might still know the sign. with this, can remove some resolve for symbolic shapetracker	2024-12-05 23:02:16 -05:00
Sieds Lykles	49c6dab74b	Add pattern for div mod recombine with gcd (#8061) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-05 13:16:58 -05:00
chenyu	5c6ed5dba6	lower test_conv_3x3_256_32_32_256_256 expectation (#8060) failed https://github.com/tinygrad/tinygrad/actions/runs/12182799887/job/33982676812#step:9:210	2024-12-05 10:30:56 -05:00
Ahmed Harmouche	ff9a89f714	Proper dtypes for input/output of exported WebGPU model (#8053) * Respect input/output dtypes in exported WebGPU model * Add some comments about skipped dtypes	2024-12-05 10:38:05 +01:00
qazal	435a51e10c	reduce folding simple tests [pr] (#8040) * reduce folding simple tests [pr] * test for view and realized src pattern * realize / buffer behavior	2024-12-05 12:22:45 +08:00
George Hotz	20878be2af	lower test_gemv_4096_16384 expectations	2024-12-05 12:08:26 +08:00
George Hotz	df18e7cc37	accept filename decorator [pr] (#8049) * accept filename decorator [pr] * add test for safe_load * bring old tar tests back	2024-12-05 11:40:59 +08:00
chenyu	b3220ca7b1	test cases of always True/False lt (#8048) * test cases of always True/False lt * one more	2024-12-04 20:38:40 -05:00
geohotstan	5ce8090d42	simple onnx_ops cleanups (#8003) * simple clean ups first * more work * kinda have adam * ooo momentum worked nicely * almost there * wow.. is the onnx test wrong * nicer optim stuff * just skip that test * small comment changes * use naming convention from other parts of codebase --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-04 15:33:03 -05:00
Sieds Lykles	70db1bab5c	Fold nested div with const (#8010) * Rebase nested div and with const * Update the ordering * return None on vectors Fixes cpu test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-04 14:59:09 -05:00
chenyu	0693158d28	lower v_theoretical gemv on red (#8042) tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209	2024-12-04 13:59:40 -05:00
qazal	b116e1511d	make device on uop optional [pr] (#8034)	2024-12-04 20:18:00 +08:00
Ahmed Harmouche	13eedd373b	Run WebGPU tests on ubuntu (#8033)	2024-12-04 12:42:04 +01:00
George Hotz	08657cb7b0	hotfix: bump expectations in speed_v_theoretical	2024-12-04 19:00:33 +08:00
George Hotz	ea65c79ba2	hotfix: don't spam BEAM debug in speed_v_theoretical	2024-12-04 18:47:16 +08:00
George Hotz	09b00b1b04	hotfix: use kernel timings instead of python timings in speed_v_theoretical	2024-12-04 18:36:17 +08:00
leopf	f0401e14e8	tar_extract with Tensors (#7853) * initial * USTAR, PAX and GNU support + testing * from_bytes byteorder * use TarInfo.frombuf * tensor only usage * remove contextlib.suppress * shorter ow,pax * more tests * testing length + move tests * cleanup * new approach: RawTensorIO * fix fetch * enable read test * cleanup and ignore fix * fix for python < 3.12 * make it RawIO * functions --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-04 17:03:19 +08:00
uuuvn	e9c5b23ba1	Use MTLCompiler directly (v2) (#7920) * Use MTLCompiler directly (v2) * to_block_literal and REQUEST_TYPE_COMPILE * Rewrite command encoding * Revert to_block_literal * Maybe that's more readable to some people? * Typo and comment about stdlib caching * Update ops_metal.py * Update ops_metal.py * Update ops_metal.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-04 16:36:48 +08:00
chenyu	0c060fa040	update uop and tests to not use lt/gt/le/ge [pr] (#8023) just use dunder methods, eventually remove those from ops	2024-12-03 21:02:52 -05:00
Ahmed Harmouche	db330a3110	Remove WebGL (#8012)	2024-12-03 16:02:53 +01:00
chenyu	ef3752625b	add test case of realize_size with 0 in shape (#8011)	2024-12-03 09:19:50 -05:00
George Hotz	09eac42fd6	cache indexed uops in st [pr] (#8008) * cache indexed uops in st [pr] * remove arg from range	2024-12-03 21:27:07 +08:00
Sieds Lykles	e44183647f	Improved div folding (#7996) * First version of div_mod folding together * Working version with old div folding behaviour * Test is fixed * Fix linting * Happy mypy --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-03 08:11:25 -05:00
qazal	5441127417	assert const folding return shape matches [pr] (#8006)	2024-12-03 19:31:06 +08:00
George Hotz	dddfb494d7	don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000) * don't mutate the uop/lazybuffer, just the Buffer [pr] * fix red test * try different fix * that * that's the right fix * test for fixed behavior * bump to 3.12	2024-12-03 19:03:51 +08:00
George Hotz	b8bf5b2787	minor uop speedups [pr] (#8002) * minor uop cleaner [pr] * free uop creation speed by removing WeakValueDictionary * a lil faster * disable that test * lines * and it doesn't print non hit patterns	2024-12-03 17:04:48 +08:00
George Hotz	0905f87b68	hotfix: print only kernel time	2024-12-03 14:25:08 +08:00
chenyu	c7bc75e634	alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) (#7900) * alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) only do if at least one branch is const, so total alu won't increase * tests and interesting TODO cases	2024-12-02 17:19:27 -05:00
chenyu	b91fa24387	script to run regressed sd conv on metal (#7995) * script to run regressed sd conv on metal this and other similar `conv2d + add` kernels contributed to most of the speed regression * # ruff: noqa: E501	2024-12-02 15:34:27 -05:00
geohotstan	0a2e10be1d	add SELU to Tensor (#7993) * add selu * more clean ups	2024-12-02 10:04:01 -05:00
qazal	bb606e5bcf	process replayable ops.py changes from delete_lazy [pr] (#7994) * process replayable ops.py changes from delete_lazy [pr] * hotfix: seed tiny_jit	2024-12-02 19:38:31 +08:00
George Hotz	0c7477b108	no bool in range [pr] (#7988) * no bool in range [pr] * fix llvm * add arg to range spec * fix broken test * forgot this one * hotfix: test_tiny jit is a real test	2024-12-02 19:05:16 +08:00
Ahmed Harmouche	1ea0925744	Support packed types in smem in webgpu	2024-12-02 10:13:25 +01:00
George Hotz	275951b730	clean up a few parents -> toposort [pr] (#7984) * clean up a few parents -> toposort [pr] * rename to old_parents + sched tests * a few more * that one * second to last * final	2024-12-02 15:59:31 +08:00
George Hotz	f17af70d17	replace all sparents with toposort (#7983)	2024-12-02 15:00:30 +08:00
qazal	b797aee720	uop global buf number tracking try 2 [pr] (#7912) * uop buffer init small refactor [pr] * add early * this way it doesn't need late * buffer_num * itertools.count * count from 0 * down to 380	2024-12-02 14:45:17 +08:00
George Hotz	cbcc1c20eb	second try at block linearize (#7892) * second try at block linearize * weeee, works for lil matmul * it's so beautiful * test tiny passes * fix bugs * combine matching BLOCKENDS * wrapping * test lin failures passes * those failures were fake * flip sort order * fix ptx tests * deal with store better * dumb ptx fix * expect less * reduce lines * reduce lines * less lines and cleaner * no defaultdict * tighter * simpler block_parent_count	2024-12-02 13:43:09 +08:00
mesozoic-egg	90e2b2d577	Remove gated store, put rewrite to uopgraph [pr] (#7975) * update test for gated store * put gated store rewrite to uopgraph, rm from ptx * update test update test update test * remove gated st rewrite in llvm * lint --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-02 12:33:16 +08:00
George Hotz	d53cd92364	fix tests for delete lazy [pr] (#7980)	2024-12-02 12:00:48 +08:00
George Hotz	6c1efb9a72	hotfix: amd gemv was flaky	2024-12-02 11:08:24 +08:00
ignaciosica	509c4a573f	increase tolerance on test (#7972)	2024-11-30 11:50:10 -05:00
qazal	6f17eedaea	schedule sink folding try 2 [pr] (#7968)	2024-11-30 20:46:26 +08:00
qazal	5615e92df8	const folding tests [pr] (#7967)	2024-11-30 19:27:30 +08:00
qazal	8780818d04	Revert "schedule sink folding with graph_rewrite [pr] (#7963)" (#7965) This reverts commit `4529c5d0da`.	2024-11-30 19:02:06 +08:00
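
Commit c7bc75e634 describes the rewrite alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1), applied only when at least one branch is const so that the total ALU count does not increase. The sketch below illustrates that rule on a hypothetical toy expression tree; `Const`, `Var`, `Where`, `Alu`, `fold` and `hoist_where` are illustrative names, not tinygrad's UOp or PatternMatcher API. It also assumes "branch is const" means both operands of that branch are constants, so the branch folds to a single constant.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Hypothetical toy IR for illustration only (not tinygrad's UOp).
@dataclass(frozen=True)
class Const:
    val: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Where:  # cond ? t : f
    cond: "Expr"
    t: "Expr"
    f: "Expr"

@dataclass(frozen=True)
class Alu:  # binary ALU op
    op: str
    a: "Expr"
    b: "Expr"

Expr = Union[Const, Var, Where, Alu]
OPS = {"add": operator.add, "mul": operator.mul}

def fold(op: str, a: Expr, b: Expr) -> Expr:
    # Constant-fold when both operands are constants, else build an ALU node.
    if isinstance(a, Const) and isinstance(b, Const):
        return Const(OPS[op](a.val, b.val))
    return Alu(op, a, b)

def hoist_where(e: Expr) -> Expr:
    """alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1), only if one side folds."""
    if (isinstance(e, Alu) and isinstance(e.a, Where) and isinstance(e.b, Where)
            and e.a.cond == e.b.cond):
        tb = fold(e.op, e.a.t, e.b.t)
        fb = fold(e.op, e.a.f, e.b.f)
        # One ALU op before; at most one remains after, so the count never grows.
        if isinstance(tb, Const) or isinstance(fb, Const):
            return Where(e.a.cond, tb, fb)
    return e  # rule does not apply: leave the expression unchanged
```

For example, with `c = Var("c")` and `x = Var("x")`, `hoist_where(Alu("add", Where(c, Const(1), x), Where(c, Const(2), Const(5))))` gives `Where(c, Const(3), Alu("add", x, Const(5)))`: the true branch folds to a constant, so the rewrite still contains a single ALU op while dropping one select.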

2997 Commits