tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-14 01:18:26 -05:00

Author	SHA1	Message	Date
Ahmed Harmouche	db330a3110	Remove WebGL (#8012 )	2024-12-03 16:02:53 +01:00
Ahmed Harmouche	8818046940	YoloV8 on WebGPU (#8007 ) Port YoloV8 to WebGPU	2024-12-03 15:10:41 +01:00
George Hotz	32675a8a77	sacrifice ClangGraph on the altar of lines [pr] (#8009 )	2024-12-03 21:11:15 +08:00
geohotstan	0a2e10be1d	add SELU to Tensor (#7993 ) * add selu * more clean ups	2024-12-02 10:04:01 -05:00
nimlgen	81d415be03	amd pkt3 refactor (#7923 ) * amd pkt3 refactor * replace this * linter * fix * cmt * fast * simpler * linter * smth * missing	2024-11-28 11:06:37 +03:00
chenyu	336a9b6bf3	remove dtype from llama precompute_freqs_cis (#7930 ) do the cast based on input in first forward call instead	2024-11-27 22:28:40 -05:00
geohotstan	cea5853cfa	add Tensor.scatter (#7737 ) * working I think * where are my onnx scatter tests?? * forward_only for now * try if nan hack fix NV * looks like issue is different... CUDA WHY * oops that was wrong. Try if this fixes CUDA * simpler multiply * actually finish this up tmrw morning :x * fix tests? * improve tests * improve test and implementation * fix ruff * complete but lots of expected failure... * reviewed tests * add onnx tests * is this a processing op? * add return type to indicate that it's not in-place * final cleanups * use or and improve tests a little * add masked_index_select * call it masked_setitem instead * try * FIXED --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-27 10:52:04 -05:00
geohotstan	753f07e193	add circular pad mode to Tensor.pad (#7918 ) * start * send it * no more neg circular pads * quick fix onnx too --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-27 10:30:51 -05:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
chenyu	3b26e51fce	Tensor.cummax (#7854 ) generalized the existing cumsum and take Ops.MAX in addition to Ops.ADD	2024-11-22 15:55:02 -05:00
George Hotz	1d6d842887	move DSP to extra (room for webgpu) [pr] (#7836 )	2024-11-22 11:32:57 +08:00
George Hotz	6fc7013463	put all DSP in dsp file [pr] (#7833 )	2024-11-22 11:22:59 +08:00
chenyu	69e382216d	fix wino conv output dtype for half inputs (#7829 )	2024-11-21 12:13:54 -05:00
geohotstan	cf1ec90ad4	add inverse trig functions to Tensor (#7805 ) * implement inverse trig functions * guess we should still test nans? * magnitude as variable name :D * reorder onnx_ops ops * approximation -> x for consistency * address feedback * simpler acos * improvement? * actually just have asin depend on atan * actually this is nicer * remove a comment --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-21 09:13:36 -05:00
George Hotz	e9ae2ccd09	_prg to match _buf [pr] (#7816 )	2024-11-21 12:44:48 +08:00
George Hotz	c5d458ce02	BufferSpec and ProgramSpec [pr] (#7814 ) * BufferSpec and ProgramSpec [pr] * delete preallocate, it's unused * Revert "delete preallocate, it's unused" This reverts commit `dcfcfaccde`.	2024-11-21 12:18:05 +08:00
George Hotz	bc977fec53	dname -> device [pr] (#7804 ) * dname -> device [pr] * a few more * only one left	2024-11-20 17:57:14 +08:00
George Hotz	d71fe7faa5	rename allocator methods to not conflict [pr] (#7788 ) * rename allocator methods to not conflict [pr] * forgot those * transfer + offset	2024-11-20 00:10:29 +08:00
geohotstan	72a41095bc	add Tensor.meshgrid (#7714 ) * initial implementation and test * some other places that can use meshgrid * revert the onnx_ops change * add to docs * revert interpolate too * update * improve edge case test * might as well test grad * add to test can improve docs --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-16 23:06:47 -05:00
ignaciosica	597a239e28	Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725 ) * remove unaryops * remove ternaryops * remove metaops * hotfix * remove binaryops * hotfix: test_pattern_matcher --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-11-16 20:56:56 +08:00
chenyu	d736ae7153	example script to show BasicTransformerBlock speed regression (#7724 )	2024-11-15 15:48:25 -05:00
geohotstan	f8056a74d6	combine pad2d with pad (#7677 ) * I have pad2d, I have pad, uuh~, pad2dpad~ * fix some small things * strategically placed cast hack * fix more * fix more more * tests * periods	2024-11-14 17:56:02 +08:00
qazal	e84d089ef1	delete ReduceOps, only use REDUCE_AXIS (#7667 )	2024-11-13 19:04:27 +08:00
qazal	e07d2d0966	skip TestBeamSearch.test_large_ast (#7652 )	2024-11-12 20:52:22 +08:00
chenyu	035e39f900	remove copied is_dtype_supported from onnx [pr] (#7646 )	2024-11-11 19:20:32 -05:00
Ahmed Harmouche	9c63c3d8ab	These casts should only happen if these are supported (#7644 )	2024-11-12 07:56:50 +08:00
nimlgen	4d81b7952a	qcom match texture/sampler descriptors to OpenCL (#7622 ) * qcom ioctl compare more regs * bug fix	2024-11-11 21:56:51 +03:00
uuuvn	94a484542b	Hook memoryview via class instead of a function (#7627 )	2024-11-11 09:07:06 +08:00
chenyu	e7b18cf5c0	fix load_worlds filter_novariable (#7564 ) filter based on "DEFINE_VAR" instead of "Variable". also added a unit test to make sure dataset includes image and variable kernels	2024-11-05 16:06:39 -05:00
chenyu	207bca6cea	set PAGE_SIZE=1 and generate new dataset (#7559 ) 13080 rows in total. both generating and loading this are pretty broken now. filters are wrong for example	2024-11-05 11:25:01 -05:00
chenyu	7581a57aac	show the actual dataset size in error message (#7557 )	2024-11-05 09:16:30 -05:00
chenyu	0db5f52b2a	check `datasets/sops.gz` size to be > 5000 (#7555 ) it has > 12000 rows now, but it depends on the backend that generates these so setting a lower but meaningful threshold	2024-11-05 09:03:19 -05:00
chenyu	e641bbc859	safe softmax trick in MCTS ucb_explored_children (#7515 ) * safe softmax trick in MCTS ucb_explored_children fixed ``` File "numpy/random/mtrand.pyx", line 971, in numpy.random.mtrand.RandomState.choice ValueError: probabilities contain NaN ``` when all ucb_explored_children are big negative numbers result in all NaN probabilities * better type	2024-11-03 15:59:31 -05:00
George Hotz	c8bf09b7d4	s/UOps/Ops (#7500 ) * s/UOps/Ops [pr] * fix	2024-11-03 11:26:10 +08:00
chenyu	fb694a63eb	Tensor.erf (#7419 ) the same one used in onnx and the one in bert.	2024-10-30 18:12:28 -04:00
eliotgolding	e920f1d663	Llama 3.2 1B load from GGUF (#7295 ) * gguf 1b-instruct * not needed	2024-10-27 09:29:02 +08:00
nimlgen	68cd2c0669	nv correct local memory based on device (#7307 ) * nv correct local memory based on device * linter * oops * oops2	2024-10-25 22:23:42 +03:00
nimlgen	ea11382087	nv fix shared_memory_size (#7239 )	2024-10-23 21:59:47 +03:00
qazal	aeeb917b6e	mask out writable bufs in runtime access_resources (#7234 )	2024-10-23 16:13:50 +03:00
George Hotz	b0a13896d7	PtrDType is dataclass [pr] (#7125 ) * PtrDType is dataclass [pr] * new dataset --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-10-18 09:40:33 -04:00
nimlgen	45db7d9045	fuzz qcom vs opencl (#7130 ) * fuzz qcom vs opencl * fix nv * bettre? * typo * open both devs	2024-10-17 18:49:08 +03:00
George Hotz	3169cb386d	remove graph [pr] (#7085 )	2024-10-16 11:40:07 +08:00
nimlgen	b025495e5c	fuzz nv vs cuda (#7066 ) * fuzz nv vs cuda * fixes * smth * um * cmp the same * dnrt * correct gpfifo scan * fix	2024-10-15 22:22:40 +03:00
qazal	8ff6514ba3	delete extra/ops.py [pr] (#7072 )	2024-10-15 22:14:21 +03:00
nimlgen	586ff4c910	nv record uvm mappings (#7059 ) * nv record uvm mappings * linteeer * smth * ooops	2024-10-15 00:12:49 +03:00
nimlgen	8094340221	nv print info about faults (#7057 ) * nv print info about faults * unrelated changes * nv_gpu.GT200_DEBUGGER in mockgpu * regen with ocrrect version * spacing	2024-10-14 21:49:38 +03:00
chenyu	bd8ecf7fd6	remove NumNode (#7035 )	2024-10-13 16:42:19 -04:00
chenyu	c4c806a210	generate new kernel dataset (#7034 ) * generate new kernel dataset pre req to remove NumNode ``` extra/optimization/generate_dataset.sh gzip -k /tmp/sops mv /tmp/sops.gz extra/datasets/ ``` * fix var range in fuzz_linearizer	2024-10-13 16:19:41 -04:00
qazal	13846930cd	hotfix: extract_dataset.py (#7029 )	2024-10-13 11:18:23 +03:00
George Hotz	a71bb09ec3	remove symbolic file [pr] (#7012 )	2024-10-12 18:44:44 +08:00

1363 Commits