Commit Graph

10417 Commits

Author SHA1 Message Date
qazal
230a369708 remove some IGNORE_OOB [pr] (#10142)
* remove some IGNORE_OOB

* remove fuzz_schedule stuff

* test with global

* add for amd ci
2025-05-03 01:16:14 +03:00
qazal
1ed5d733bd disable TRACK_MATCH_STATS for type_verify (#10141) 2025-05-02 20:59:19 +03:00
nimlgen
993f0a0e87 am: a bit faster alloc (#10138)
* am: a bit faster allocs

* am: faster allocs
2025-05-02 16:03:42 +03:00
nimlgen
81410befc2 am: remove sleep from wait_reg (#10139)
* am: remove sleep from wait_reg

* fst

* ooops
2025-05-02 15:46:29 +03:00
nimlgen
45bf7c5b81 am: add allocation bench (#10135)
* init allocation bench

* sorryg

* better
2025-05-02 13:51:07 +03:00
nimlgen
6a845c2de2 amd: fix sigs on xcc path (#10137) 2025-05-02 13:50:56 +03:00
nimlgen
bdd4dd9238 am: do not expect aligned size in valloc (#10136) 2025-05-02 12:19:59 +03:00
Ignacio Sica
8f79492c75 fix test_tensor_cores_codegen for ptx renderer (#10119) 2025-05-01 21:52:36 -03:00
nimlgen
30bd6a619f usb gpu (#8766)
* start gpu

* progress

* fixes

* read correct

* libusb

* libusb works

* support asm24

* hmm

* one access file

* fix extra

* start AMBar

* works on am

* back to usb

* patch fw

* full fast write into a bar

* ugh, minus one gpus, next please

* mute libusb for now

* usb for asm24

* 63

* hmm

* ops

* rescan

* and gpu should be there

* enumerate them?

* usbgpu bus 4, 100% reliable (draft)

* lil

* works

* comments

* add DEBUG

* cleaner

* simplest

* Revert "simplest"

This reverts commit 1d00354c16.

* Revert "cleaner"

This reverts commit c5662de956.

* assert we find gpu

* that's simpler

* this back

* simpler?

* correct

* work

* nonsense

* works with more checks

* this works

* the 6s in the right place

* reliable now

* fix after reboot

* set config

* 1s timeouts

* close to fw loading

* streams

* usbhub works

* endpoints

* fix

* want to test tiny10

* move to tiny 10

* fix gpu

* ugly speed

* smth

* mostly broken, but signals and dmas

* do not reset gpu every time

* changes to run kernels

* ugh, not working

* t10

* pg and sc files

* some prog

* um?

* somehow it works

* patched for 24

* some tries

* minimal

* moving

* back to working

* so sloooooow

* move to controller

* usb.py rewrite

* rework

* cleaner 1

* cleaner 2

* cleaner 3

* new abstractions

* aft merge

* init controller

* cleaner 4

* cleaner 5

* patcher + tiny changes

* ignore that

* cleaner 6

* after rebase

* cleaner 7

* bring it back

* start linter war

* linter 2

* autogen was missing

* fix autogen

* typing

* better?

* mypy

* extra/legacy rename and cleaner

* shuffle

* better printing

* tiny changes and tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-01 18:03:47 +03:00
nimlgen
7573c0ef4e amd,nv: use .cpu_view() in bind (#10131) 2025-05-01 17:46:12 +03:00
nimlgen
16e5376ae8 line limit 12800 for usb (#10130) 2025-05-01 16:57:44 +03:00
qazal
0c59c6b8c7 remove replace from Tensor assign [pr] (#10127)
* remove replace from Tensor assign

* assign is contiguous

* allow chaining view

* only assert axis
2025-05-01 19:37:55 +08:00
nimlgen
9caceda79a amd: comgr is not required (#10128) 2025-05-01 13:41:44 +03:00
nimlgen
c3d2e4a6e1 amd: use sdma to copy program (#10126)
* amd: use sdma to copy program

* rm

* ensure prog is copied

* match nv style
2025-05-01 13:04:22 +03:00
nimlgen
09f5be9bcb amd: finalize device in case of failures (#10124) 2025-05-01 10:41:15 +03:00
George Hotz
ef011ff5f9 flip Ops.COPY order [pr] (#10122)
* flip Ops.COPY order [pr]

* fix copy and support multi device copy in _device
2025-05-01 00:26:24 -04:00
chenyu
145e51247a split CAST and BITCAST in PYTHON [pr] (#10123)
CAST only needs truncate and does not require dtype fmt. Added bfloat16 tests that can run locally.
2025-04-30 23:27:35 -04:00
Ignacio Sica
bf5fb97498 fix AMD_LLVM bf16 tc for gfx1100 (#10102)
* fix amd_llvm bf16 tc

* cleanup pattern
2025-04-30 20:06:38 -03:00
George Hotz
dd0070daab Revert "flip Ops.COPY order [pr] (#10120)" (#10121)
This reverts commit 984f09ac74.
2025-04-30 17:25:21 -04:00
George Hotz
984f09ac74 flip Ops.COPY order [pr] (#10120) 2025-04-30 16:50:18 -04:00
chenyu
17d4d258ea simple symbolic slice in llama [pr] (#10112)
support slices that have step None and stop > start
2025-04-30 14:36:35 -04:00
nimlgen
b583ece8f3 amd: replace AMD_DRIVERLESS with AMD_IFACE (#10116)
* amd: replace AMD_DRIVERLESS with AMD_IFACE

* docs

* print direct err for amd_iface

* print for all
2025-04-30 20:22:02 +03:00
nimlgen
0e1beaf44f nv: align copies + better test (#10118) 2025-04-30 20:09:53 +03:00
Ignacio Sica
2941537250 cast is noop if src has dtypes.void (#10110) 2025-04-30 13:55:41 -03:00
nimlgen
fcdda4fc09 am: move boot memory to vram start (#10115) 2025-04-30 19:12:19 +03:00
nimlgen
844d5577d8 hcq: make copy_bufs and kernargs_size params configurable per device (#10114) 2025-04-30 18:43:50 +03:00
nimlgen
2ec3b722e2 nv: fix copies larger than 4g (#10117) 2025-04-30 18:43:17 +03:00
George Hotz
d81acbeef6 multi: move shrink after copy (#10109)
* multi: move shrink after copy

* passing now
2025-04-30 10:29:51 -04:00
qazal
67bd8489ad grouper cleanups [pr] (#10113) 2025-04-30 18:54:47 +08:00
nimlgen
b4c9a3d8f4 hcq: use mmio iface in copies (#10111)
* hcq: use mmio iface in copies

* linter

* fix_am

* am
2025-04-30 11:05:13 +03:00
nimlgen
5c7d004da5 hcq: refactor int ptrs to hcqbuffers (#10105)
* hcq: refactor int ptrs to hcqbuffers

* more refactors

* linter

* use in allocator

* test fix

* fx

* ops

* final?

* simpler

* keep this for now
2025-04-30 00:12:18 +03:00
chenyu
573bbb9746 Revert "remove TransformerBlock contiguous in llama (#10104)" (#10108)
This reverts commit b8d07dcc54.
2025-04-29 15:28:38 -04:00
chenyu
4a04098389 fix llama3 with nf4 quantize (#10107)
also the int8 output was wrong
2025-04-29 15:14:36 -04:00
George Hotz
9c1b80499f names for graph rewrites + null device supports exp and friends (#10106) 2025-04-29 14:28:20 -04:00
chenyu
b8d07dcc54 remove TransformerBlock contiguous in llama (#10104) 2025-04-29 14:15:39 -04:00
Ignacio Sica
9d5677c12c fix ptx linearizer bug 2 [pr] (#9967)
* check for local buffer

* hotfix

* add test_tensor_cores_emulation run for ptx
2025-04-29 14:30:07 -03:00
qazal
a59d18da21 hack for VIZ=1 with examples/llama (#10103)
* hack for VIZ=1 with examples/llama

* move it alongside BEAM=0
2025-04-29 23:42:17 +08:00
qazal
93bf8764f2 do not open devices in lowering (#10101)
* do not open devices in lowering [pr]

* ctx=opts

* ctx

* fuzz test
2025-04-29 23:18:16 +08:00
George Hotz
c3ff308abb range has only one src now [pr] (#10100)
* range has only one op now

* fix z3 checker

* ci fix

* needs shell

* try pip ensure update

* that ensurepip is useless

* upgrade pip before cache

* windows happy?
2025-04-29 10:31:05 -04:00
George Hotz
427471550a hotfix: amd tflops to 74 and some external_benchmark_sdxl_softmax stuff 2025-04-29 09:02:27 -04:00
Ignacio Sica
58cf8cd493 add support for "shared_mem" for LLVM (#10093)
* init llvm shared

* add test_tensor_cores_emulation run for llvm
2025-04-29 08:56:36 -04:00
qazal
ad7546c931 assert in test_indexing_two_bind instead of silent fail (#10099)
* assert in test_indexing_two_bind instead of silent fail

* debuggable

* skip test_simple_train
2025-04-29 20:23:25 +08:00
George Hotz
cee220a1ab always expand ssa on wheres (#9697)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-04-29 20:08:41 +08:00
qazal
3b67f56c02 kernelize some llama realizes (#10098) 2025-04-29 18:39:56 +08:00
qazal
cbf7347cd6 display viz rewrites with tabbing if they are subrewrites (#10097)
* display viz rewrites with tabbing if they are subrewrites

* update viz api
2025-04-29 17:57:21 +08:00
George Hotz
73c2f6602f test sdxl softmax (#10096) 2025-04-28 21:55:50 -04:00
George Hotz
eaceafecae do fusion locally (#10095)
* do fusion locally

* oops, that's the right way

* explicit delete closure
2025-04-28 20:45:37 -04:00
chenyu
3eba3d6ee9 don't pass model in convert_from_huggingface and convert_from_gguf (#10094)
it only needs n_layers
2025-04-28 20:11:19 -04:00
George Hotz
a2d0684fc1 test_attention_simple_view (#10092)
* test_attention_simple_view

* correct comment
2025-04-28 20:01:22 -04:00
Ignacio Sica
bda116d773 fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores

* add use_tensor_core to arg in test and search

* bugfix

* get TC val from ContextVar in search

* revert minor space change

* add tc emulation test to ci and benchmark

* revert

* revert whitespace change

* remove test for ptx

* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00