tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
chenyu	cb69b7b2b2	comment out fold_where_closure (#14316)	2026-01-24 10:15:42 -05:00
wozeparrot	d74587f16d	fa multi fix 2 (#14314)	2026-01-23 23:35:02 -08:00
chenyu	d9f0ad1d87	update return type for Tensor.tolist (#14313) since sequence is incorrect since it can be list of list, use Any to avoid recursive type	2026-01-23 23:21:49 -05:00
qazal	807bc40931	assembly/amd: dsl and disasm cleanup (#14311) * rdna4 inst helper * remove dsl aliases	2026-01-24 11:36:12 +09:00
Christopher Milan	e782d44918	WEBGPU/NIR truncates ints (#14307) * WEBGPU truncates ints * nir has this bug too	2026-01-23 19:28:06 -05:00
nimlgen	26220a472e	no core_id (#14265) * no core_id * kwargs * est * linters * ugh * revert this * deps * glb * should work? * nn * line * fx * ym * z * d * um? * revert * this one? * first half * um p2 * all? * um * cleaner * um	2026-01-23 21:30:12 +03:00
chenyu	e65bc7a7c5	where closure folding (#14304)	2026-01-23 10:55:13 -05:00
chenyu	d5a3b02a9c	clean up xpow (#14295) mostly for `ret * (base < 0).where(adj, ret.const_like(1))` -> `(base < 0).where(neg_base, ret)`, since it's good for NAN neg_base but not generic	2026-01-23 10:19:47 -05:00
qazal	b913c910c5	assembly/amd: rdna4 passing test_roundtrip (#14300) * test_roundtrip on different archs * failing tests * take RDNA4 xml changes from the emu branch * work * min diff to disasm flat * test_add passes, rdna4 first * correct vgpr field for the multi dword store stuff * amdllvm * recompile in roundtrip, get sources from emulator * amdllvm, 2 * clean clean * note, don't rely on that os.environ --------- Co-authored-by: George Hotz <geohot@gmail.com>	2026-01-23 21:33:53 +09:00
qazal	f3b0e42863	remove extra sqtt pickles in gfx1200 (#14302)	2026-01-23 20:13:48 +09:00
George Hotz	d116312b1a	get cdna sqtt working (#14301) * get cdna sqtt working * cnd aprser * wavestart/waveend * names * cdna * test that	2026-01-23 18:46:15 +08:00
George Hotz	a5c4fa39d1	RDNA4 support in SQTT (#14299) * table test * cleanups * dead file * delta short * tests * delta test * work * l4 tests pass * l0 * cnda * print * reverT * wave failure * wave failure * test * encs * no l0 crap * L4 * rdna4 sqtt * notes * linter	2026-01-23 16:16:45 +08:00
wozeparrot	963c59ebdb	fix: pull fixes from gradacc branch (#14296)	2026-01-22 23:07:54 -08:00
Christopher Milan	68668b8f28	fix WEBGPU NEG (#14298) * fix WEBGPU NEG * add test * parenthesize	2026-01-23 01:44:52 -05:00
qazal	3b8a7bb8c9	use existing roc.py infra for sqtt tests (#14297) * add pc, per kernel tracing * work * remove those imports * min diff	2026-01-23 14:07:11 +09:00
chenyu	5f32f7a06b	fix winograd padding order (#14294)	2026-01-22 23:00:14 -05:00
George Hotz	52b989c6c8	don't place consts early + fixes from anthropic challenge (#14286) * don't place consts early * add anthropic challenge * with ref * do we still have to devectorize bools? * tests pass * just WHERE * fine, revert that * fine, revert * only index * z3 validator doesn't support vectorized * Revert "z3 validator doesn't support vectorized" This reverts commit `1b7930ecb3`. * z3 not for vec * no spec * VLIWRenderer * loop unrolling * better comments * cleanups * skip cast * renderer * cleanups * prints * no hack * hacks * bump to 11 * reg warning * lil clean * cleaner renderer	2026-01-23 10:48:39 +09:00
chenyu	0903782bc0	remove few dead or unneeded codes [pr] (#14275)	2026-01-22 20:05:43 -05:00
chenyu	3eb5cd7d32	stronger test_rand_is_lazy (#14293)	2026-01-22 18:58:53 -05:00
chenyu	c15b6e6709	update test_randn_finite skipped device (#14292)	2026-01-22 18:26:02 -05:00
chenyu	073c6a81b5	raise if Tensor._buffer is called during jit (#14114) * raise if Tensor._buffer is called during jit * cleaner	2026-01-22 17:30:18 -05:00
nimlgen	8cd22df2dd	amd: alive wgps (#14149) * amd: disabled wgps * l * wgp * uoops * mockgpu * drm * ad this * fi * reg	2026-01-23 00:08:45 +03:00
chenyu	a738c4bb22	test symbolic view broken with jit (#14290)	2026-01-22 13:44:47 -05:00
chenyu	f22fa6a5be	test rand is lazy (#14289)	2026-01-22 13:07:55 -05:00
chenyu	1726b884f2	update test_jit_v_nojit_random_regen (#14288) current behavior is that jit and non-jit consume random seed differently, still the random values are different	2026-01-22 12:21:47 -05:00
chenyu	fbed36fa15	jit graph handle input==output aliasing (#14287) a position that wasn't an input during capture should never become an input during execution, but graph cannot tell this by jit_cache and input_buffers only	2026-01-22 11:37:41 -05:00
chenyu	8bb61c2490	stronger test_graph_input_output_aliasing (#14282) * stronger test_graph_input_output_aliasing * confirmed failure	2026-01-22 09:59:34 -05:00
qazal	d7afa02085	clean up the extra/sqtt directory (#14284) * remove legacy test_timing stuff * remove legacy test_pmc, update active_sqtt_parse	2026-01-22 19:10:59 +09:00
qazal	dff5f361b0	support rendering assembly kernels on the NULL backend (#14283) * assembly custom kernels in DEV=NULL, use renderer arch * update mmapeak * llvm	2026-01-22 15:49:07 +09:00
qazal	dfefeddeed	add tflops to cdna gemm custom kernel (#14281)	2026-01-22 12:48:28 +09:00
qazal	18f408a35a	custom assembly kernel with variable tests (#14280) * custom assembly kernel with variable tests * different threads * sink * zeros like / flatten	2026-01-22 11:34:17 +09:00
chenyu	4de107b764	jit graph bug when input is output (#14278) * jit graph bug when input is output wrong result in llm * not just metal	2026-01-21 18:49:52 -05:00
wozeparrot	76a9242a66	fa: merge kv bwd into one kernel (#14277)	2026-01-21 15:24:41 -08:00
chenyu	6279ae4a94	remove llm generate always reset start_pos (#14276) * remove llm generate always reset start_pos by itself seems like a bug, also added a test to repro forward_jit.reset() issue * issue is jit graph, so revert that test	2026-01-21 16:54:30 -05:00
nimlgen	da1fedc3c8	working ioctls (#14272)	2026-01-21 20:29:04 +03:00
chenyu	574d171fa6	fix onnx Pad constant_value=None (#14271) also removed a dead branch in _resolve_pool_pads	2026-01-21 11:51:34 -05:00
chenyu	a18d34be1e	simpler split_store outer range check [pr] (#14273) also fixed comment	2026-01-21 11:51:14 -05:00
chenyu	e64111ad08	update all_same [pr] (#14270) add type annotation and unit test	2026-01-21 11:26:15 -05:00
chenyu	9ad3c865ac	fix bug in logsumexp keepdim=True (#14268)	2026-01-21 09:49:55 -05:00
George Hotz	41d00a046d	add device to local, fix PCONTIG=2 (#14266) * add device to local, fix PCONTIG=2 * regression test * remove the device when we render * viz slowness * no long	2026-01-21 22:12:18 +09:00
wozeparrot	c1d14ea832	llama8b train fixes (#14264)	2026-01-20 20:34:47 -08:00
qazal	549dbabfcb	move ALLOW_DEVICE_USAGE=0 to get_program [pr] (#14263)	2026-01-21 12:56:05 +09:00
qazal	78a28227c6	assembly/amd: cdna4 mfma support (#14206)	2026-01-21 09:12:05 +09:00
George Hotz	1baefed530	assembly/amd: add hw tests from ucode branch (#14259) * assembly/amd: add hw tests from ucode branch * fix is per lane	2026-01-21 08:53:54 +09:00
wozeparrot	ba90e1b52e	feat: script to run llama8b training (#14239)	2026-01-20 12:44:06 -08:00
Christopher Milan	daf9414bff	fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1 (#14256) * fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1 * ruff	2026-01-20 12:30:07 -05:00
chenyu	e04767e39e	run pre-commit in ci (#14253) * run pre-commit in ci prevents pre-commit regression * IGNORE_OOB=1 * pytest * unit test * split	2026-01-20 12:24:33 -05:00
nimlgen	22af7132cd	fix test_dev_jitter_matrix (#14255)	2026-01-20 20:07:51 +03:00
Robbe Derks	c7fbd177d4	USBGPU: debug script for comma chestnut (#14252) * initial debug script * improvements	2026-01-20 18:52:25 +03:00
C T	26f8b12e01	Whisper audio helpers (mel filters in tinygrad) (#13478) * add whisper audio helpers for stft/mel/resample * cleanup * add whisper stft test * make only stft test explicitly depend on librosa * extract sinc_window_kernel * dehardcode device * use same device argument * simplify * type annotate * ruff format audio_helpers.py * ruff format test_whisper.py * add WHISPER_NEW_STFT * rename * undo ruff format changes * use new stft and mel for whisper * remove stft test that depends on librosa * remove whitespace * add Tensor.log10 with test\test_ops.py::TestOps::test_log10 * use Tensor.log10 * fix lint * future: remove unused STFT class * future: remove resample code since it isn't used (yet) * match openai with pad_mode="reflect" * pad_to * future: cut resample leftovers * cleanup * add mel tests * future: cut stft * future: cut non-mel prep_audio changes * reduce diff * move audio_helpers.py to examples * reduce whitespace * fix imports * reduce whitespace --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2026-01-20 10:50:02 -05:00

11849 Commits