tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-29 08:48:15 -05:00

Author	SHA1	Message	Date
qazal	ff6def9ffb	simple contiguous_while_contiguous prereqs [pr] (#8038) * simple contiguous_while_contiguous prereqs [pr] * early realize * fine if it's folding a non-contig buffer	2024-12-04 23:00:28 +08:00
Ahmed Harmouche	c9e7701417	Fast YoloV8 on WebGPU (#8036) * Fast yolov8 with downscaled input * Faster + FPS meter * Add loader while model is downloading/compiling * Title touchup	2024-12-04 15:23:09 +01:00
qazal	b116e1511d	make device on uop optional [pr] (#8034)	2024-12-04 20:18:00 +08:00
Ahmed Harmouche	13eedd373b	Run WebGPU tests on ubuntu (#8033)	2024-12-04 12:42:04 +01:00
leopf	fb89971e73	use BufferedReader (#8032)	2024-12-04 19:08:54 +08:00
George Hotz	08657cb7b0	hotfix: bump expectations in speed_v_theoretical	2024-12-04 19:00:33 +08:00
George Hotz	ea65c79ba2	hotfix: don't spam BEAM debug in speed_v_theoretical	2024-12-04 18:47:16 +08:00
George Hotz	09b00b1b04	hotfix: use kernel timings instead of python timings in speed_v_theoretical	2024-12-04 18:36:17 +08:00
George Hotz	8f65c1fafb	simpler block reorder function [pr] (#8031) * simpler block reorder function [pr] * simpler * block_reorder in substitute, so wasteful otherwise * extend and count * leave push logic for same order * sort new ctx * less loop * Revert "less loop" This reverts commit `30249d097a`.	2024-12-04 17:57:35 +08:00
leopf	f0401e14e8	tar_extract with Tensors (#7853) * initial * USTAR, PAX and GNU support + testing * from_bytes byteorder * use TarInfo.frombuf * tensor only usage * remove contextlib.suppress * shorter ow,pax * more tests * testing length + move tests * cleanup * new approach: RawTensorIO * fix fetch * enable read test * cleanup and ignore fix * fix for python < 3.12 * make it RawIO * functions --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-04 17:03:19 +08:00
George Hotz	1e06aefde7	bunch up ops for lines [pr] (#8030)	2024-12-04 17:03:01 +08:00
uuuvn	e9c5b23ba1	Use MTLCompiler directly (v2) (#7920) * Use MTLCompiler directly (v2) * to_block_literal and REQUEST_TYPE_COMPILE * Rewrite command encoding * Revert to_block_literal * Maybe that's more readable to some people? * Typo and comment about stdlib caching * Update ops_metal.py * Update ops_metal.py * Update ops_metal.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-04 16:36:48 +08:00
George Hotz	bb98bae751	local reordering in block (#8029) * local reordering in block * load (and parents) is highest priority * minor loads in order * comments * explicit depth * simpler * matters less, but store early too	2024-12-04 15:11:29 +08:00
George Hotz	4cb630ac1c	hotfix: early INDEX	2024-12-04 14:47:47 +08:00
George Hotz	fdd1e56827	clean up rewrite logic + merge siblings (#8026) * clean up rewrite logic [pr] * simpler * merge sibling blocks * no PR	2024-12-04 13:26:16 +08:00
chenyu	004b2ecff5	remove lt/gt/le/ge from SimpleMathTrait [pr] (#8027) just use the dunder methods	2024-12-04 00:24:33 -05:00
chenyu	39e0fc05f5	update function to not use gt/lt [pr] (#8025) pr does not test this, but it's the same	2024-12-03 22:39:06 -05:00
chenyu	cfd4d19250	replace .lt in rewrite rules with < [pr] (#8024)	2024-12-03 21:34:47 -05:00
chenyu	0c060fa040	update uop and tests to not use lt/gt/le/ge [pr] (#8023) just use dunder methods, eventually remove those from ops	2024-12-03 21:02:52 -05:00
chenyu	03bf9c2985	unused mul add lt rule [pr] (#8022)	2024-12-03 19:38:34 -05:00
nimlgen	7fda464b08	hcq c-like args state (#8020) * hcq c-like args state * ugh * Dfix * rename * i	2024-12-03 23:53:35 +03:00
qazal	099364ed32	lazy srcs shape mistmatch assert + fix ASSIGN [pr] (#8014) * lazy srcs shape mistmatch assert [pr] * duplicate assert * base it later * keep the assert	2024-12-03 15:40:37 -05:00
ignaciosica	f14dd1488e	reduce on wmma (#8016)	2024-12-03 12:46:28 -05:00
chenyu	dacb1ff38a	minor nn cleanups (#8018) use more .numel and .ndim	2024-12-03 12:34:52 -05:00
chenyu	35c30f76f2	minor tweak in ptx asm_for_op [pr] (#8017) always compare with dtypes instead of name string	2024-12-03 12:34:22 -05:00
chenyu	a5af4e5596	clean up wgsl_matcher [pr] (#8015) use more UPat syntatic sugar and remove unneeded rules	2024-12-03 11:55:03 -05:00
Ahmed Harmouche	db330a3110	Remove WebGL (#8012)	2024-12-03 16:02:53 +01:00
chenyu	ef3752625b	add test case of realize_size with 0 in shape (#8011)	2024-12-03 09:19:50 -05:00
Ahmed Harmouche	8818046940	YoloV8 on WebGPU (#8007) Port YoloV8 to WebGPU	2024-12-03 15:10:41 +01:00
George Hotz	09eac42fd6	cache indexed uops in st [pr] (#8008) * cache indexed uops in st [pr] * remove arg from range	2024-12-03 21:27:07 +08:00
Sieds Lykles	e44183647f	Improved div folding (#7996) * First version of div_mod folding together * Working version with old div folding behaviour * Test is fixed * Fix linting * Happy mypy --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-03 08:11:25 -05:00
George Hotz	32675a8a77	sacrifice ClangGraph on the altar of lines [pr] (#8009)	2024-12-03 21:11:15 +08:00
qazal	5441127417	assert const folding return shape matches [pr] (#8006)	2024-12-03 19:31:06 +08:00
George Hotz	dddfb494d7	don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000) * don't mutate the uop/lazybuffer, just the Buffer [pr] * fix red test * try different fix * that * that's the right fix * test for fixed behavior * bump to 3.12	2024-12-03 19:03:51 +08:00
qazal	ba1183314a	const_like can return a valid [pr] (#8005) * const_like can return a valid [pr] * fixup	2024-12-03 18:42:12 +08:00
qazal	4e91533419	test: don't ref until schedule (#8004)	2024-12-03 18:06:52 +08:00
George Hotz	b8bf5b2787	minor uop speedups [pr] (#8002) * minor uop cleaner [pr] * free uop creation speed by removing WeakValueDictionary * a lil faster * disable that test * lines * and it doesn't print non hit patterns	2024-12-03 17:04:48 +08:00
George Hotz	1028b34a20	add typing to basicblocks (#7999)	2024-12-03 15:05:11 +08:00
George Hotz	0905f87b68	hotfix: print only kernel time	2024-12-03 14:25:08 +08:00
chenyu	17d5719a38	add process replay to webgpu tests (#7998)	2024-12-02 20:27:29 -05:00
chenyu	c7bc75e634	alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) (#7900) * alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) only do if at least one branch is const, so total alu won't increase * tests and interesting TODO cases	2024-12-02 17:19:27 -05:00
chenyu	b91fa24387	script to run regressed sd conv on metal (#7995) * script to run regressed sd conv on metal this and other similar `conv2d + add` kernels contributed to most of the speed regression * # ruff: noqa: E501	2024-12-02 15:34:27 -05:00
geohotstan	0a2e10be1d	add SELU to Tensor (#7993) * add selu * more clean ups	2024-12-02 10:04:01 -05:00
Ahmed Harmouche	146e1caea3	Downgrade wgpu to prevent sd segfault (#7969)	2024-12-02 15:48:44 +01:00
wozeparrot	077e7e8ed2	fix: private segment sgpr on gfx103x (#7987) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-02 20:54:50 +08:00
qazal	bb606e5bcf	process replayable ops.py changes from delete_lazy [pr] (#7994) * process replayable ops.py changes from delete_lazy [pr] * hotfix: seed tiny_jit	2024-12-02 19:38:31 +08:00
George Hotz	0c7477b108	no bool in range [pr] (#7988) * no bool in range [pr] * fix llvm * add arg to range spec * fix broken test * forgot this one * hotfix: test_tiny jit is a real test	2024-12-02 19:05:16 +08:00
Ahmed Harmouche	8909dbd82c	Remove wgpu specific checks from stable diffusion example (#7991)	2024-12-02 11:31:14 +01:00
qazal	e2916ff210	image dtype fixup refactor for delete_lazy [pr] (#7989)	2024-12-02 18:25:13 +08:00
Ahmed Harmouche	5340d3dedf	Merge pull request #7986 from tinygrad/atomics-in-smem-wgpu Support packed types in smem on webgpu	2024-12-02 10:38:19 +01:00


10417 Commits