tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-28 00:08:16 -05:00

Author	SHA1	Message	Date
uuuvn	7ecced7f6d	LLVM JIT prereqs (#8634 ) * LLVM JIT prereqs This commit moves jit loading, disassembling and CPUProgram logic from `ops_clang.py` to `elf.py`, `helpers.py` and `device.py` respectively I don't quite like the `helpers.py` destination for capstone_flatdump but this is where cpu_objdump is so presumably this is how it's supposed to be * Types	2025-01-15 09:47:08 -08:00
qazal	a1f70ce7d0	only use BUFFER_VIEW in disk [pr] (#8629 ) * only use BUFFER_VIEW in disk [pr] * delete can_view * BUFFER_VIEW op on DISK * remove that allow_buffer_view=False * notes * bitcast is a low-level op too * this passes on AMD and LLVM	2025-01-15 12:34:15 -05:00
ignaciosica	bae20e5043	Generic PTX wmma rendering [pr] (#8632 ) * make wmma rendering dtype size generic * use var instead of calculating multiple times * compact rendering	2025-01-15 09:31:48 -08:00
qazal	6193e279d4	isolate simple failing test for subbuffer on CONST [pr] (#8630 ) * simple failing test for subbuffer on CONST [pr] * add view_supported_devices check	2025-01-15 05:45:03 -05:00
George Hotz	e1f7c90459	gradient is a set [pr] (#8626 ) * gradient is a set [pr] * typing for deepwalk	2025-01-14 20:48:23 -08:00
chenyu	7fb1c7af61	minor multi cleanups [pr] (#8625 )	2025-01-14 22:25:23 -05:00
George Hotz	504ad08e73	hotfix: add test_example_matmul_same	2025-01-14 19:03:17 -08:00
George Hotz	f29d6f54b8	support multilb gradient [pr] (#8624 )	2025-01-14 18:33:33 -08:00
chenyu	4ee3243c93	JITBEAM=2 for LLaMA-3 8B on 4 GPUs [pr] (#8623 ) is it fast?	2025-01-14 19:52:38 -05:00
chenyu	7860a80801	simpler MultiLazyBuffer alu [pr] (#8622 )	2025-01-14 19:19:13 -05:00
chenyu	930728c069	bert BS 72->66 [pr] (#8621 ) 72 does not fit now	2025-01-14 18:41:41 -05:00
chenyu	0790d8059f	remove MultiLazyBuffer.from_sharded [pr] (#8620 ) it's equivalent to taking the lazydata from Tensor.split, then copy to devices	2025-01-14 18:00:49 -05:00
George Hotz	c85737c200	assert to prepare for grad uop [pr] (#8280 ) * assert to prepare for grad uop [pr] * fix test_nn * fix most of test_tensor * few more tests * fix multi * uniform gradient * acc_dtype * any for multi * fix typing * fix assert, CAST_BEFORE_VIEW is still the issue * explicit test for CAST_BEFORE_VIEW --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-14 13:26:56 -08:00
George Hotz	fdd46c9f28	delete view instant rule (#8616 ) * remove cast before view * greener * indexing * delete view instant rule * that passes too * openpilot too * ack * base on cast_before_view * add it as a rewrite rule * VIEW(DEVICE) is also fine * test_shard_memory depends on forced_realize removal * put that back, will go soon * UOp representations change once we don't instantly fold things * do not duplicate tests --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-14 16:15:13 -05:00
qazal	dddd4e5f9f	hotfix: remove duplicate TestTensorMutates [pr] (#8619 ) * hotfix: remove duplicate TestTensorMutates [pr] * imports	2025-01-14 16:03:17 -05:00
nimlgen	c5782e85d2	tlsf: optimize alloc (#8608 )	2025-01-14 23:48:07 +03:00
George Hotz	bfbe81df71	remove cast before view (#8613 ) * remove cast before view * greener * indexing * that passes too * openpilot too * ack --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-01-14 15:04:58 -05:00
chenyu	393eec3201	raise RuntimeError for uneven shard [pr] (#8593 ) no 7B llama on 6 GPUs skip 70B	2025-01-14 14:51:48 -05:00
ignaciosica	d5a646d492	CUDA Turing TC (#8597 ) * init turing tc * reorder tc * hotfix: remove some spaces * revert var name to x * consistent order of factors * revert order of terms to match old stuff --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-14 10:35:14 -08:00
chenyu	cbfd51f5a5	make MultiLazyBuffer.bounds a property [pr] (#8614 ) determined by lbs shapes and axis	2025-01-14 13:25:54 -05:00
chenyu	52e7003414	Revert "make kits19 dataset samples have small sizes (#8591 )" (#8610 ) This reverts commit `76a03e950a`.	2025-01-14 12:24:27 -05:00
Francis Lata	76a03e950a	make kits19 dataset samples have small sizes (#8591 )	2025-01-14 08:27:45 -08:00
ignaciosica	4057b98f7f	rename i and j into k and row/col (#8607 )	2025-01-14 08:27:05 -08:00
nimlgen	1ff6862a3d	ci: sleep a bit to let the driver unload the prev pid (#8605 )	2025-01-14 15:55:23 +03:00
qazal	97ec564b03	noop changes from the block_assign branch [pr] (#8606 )	2025-01-14 07:47:17 -05:00
qazal	5aab2806f0	rename to test_tensor_uop + use upats for asserting [pr] (#8604 ) * rename to test_tensor_uop + use upats for asserting [pr] * fix pr	2025-01-14 05:09:56 -05:00
qazal	863abc7140	scheduling graph_rewrite prereqs for BLOCK in ASSIGN (#8598 ) * remove the BUF_LIMIT assert * skip the base one * work * work * good error * ok comment * shorter check	2025-01-14 03:01:59 -05:00
chenyu	05e54f00d3	remove bounds from MultiLazyBuffer.from_sharded [pr] (#8603 ) without a custom bound, the bound is uniquely determined by shape and axis	2025-01-13 23:40:05 -05:00
chenyu	d443e91d82	remove custom splits in Tensor.shard [pr] (#8602 ) towards even split only	2025-01-13 21:29:13 -05:00
chenyu	227d96d7a3	remove unused src from metaop [pr] (#8601 )	2025-01-13 20:28:14 -05:00
chenyu	c4e33048c6	test Tensor.clone has a different lazydata [pr] (#8600 )	2025-01-13 20:13:44 -05:00
qazal	ae2229d727	assert kernel buffer limit at compile time [pr] (#8595 ) * remove the BUF_LIMIT assert * skip the base one	2025-01-13 16:32:07 -05:00
nimlgen	c2504357af	am: lock to access dev (#8594 ) * amm lock to access dev * wording * just works * disable	2025-01-13 23:53:13 +03:00
geohotstan	4abe631b56	fix onnx mobilenetv2-7-quantized.onnx (#8574 ) * is 67% considered fixed? * move test up * share function * add qgemm too * make sure qgemm comes out as int * actually that note is not right * remove qgemm (I did it wrong) and add it later lol.	2025-01-13 09:25:06 -08:00
George Hotz	d19c1c7f03	bump 75 -> 73 for test failure	2025-01-13 09:18:38 -08:00
Francis Lata	c25d5d3101	improve isin checks (#8589 )	2025-01-13 12:12:31 -05:00
nimlgen	74b83c4c41	am in ci (#8532 ) * try am in ci * no sudo * temp * run more am test * run half on am * insert amdgpu * other machine as well	2025-01-13 19:55:17 +03:00
nimlgen	d224d0ed7f	nv: fix fault info (#8587 ) * nv: fix fault info * and emu for amd * skip if not mock	2025-01-13 14:38:43 +03:00
qazal	586e730d32	use UOp.st for kernel reduce axes (#8499 ) * use UOp.st for kernel reduce axes [pr] * do not return dict	2025-01-13 06:24:11 -05:00
qazal	7562cc0399	better test for reduce swizzle + don't use double dtype [pr] (#8586 ) * better test_permute_rewrite * use float32	2025-01-13 05:02:21 -05:00
George Hotz	df59b072db	rename to top_down_rewrite [pr] (#8583 )	2025-01-12 18:36:38 -08:00
chenyu	994944920b	simpler batch_load_train_bert [pr] (#8582 ) don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step. https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview	2025-01-12 20:25:05 -05:00
George Hotz	05e5de6a91	ugh, remove that binary blob	2025-01-12 17:02:28 -08:00
George Hotz	4ac4c1415a	free intermediate buffers in the jit [pr] (#8581 ) * free intermediate buffers in the jit [pr] * intermediates_freed * deallocate if not allocated * self._first_run is simpler	2025-01-12 15:41:41 -08:00
George Hotz	d817dc10db	start on test rewrite map [pr] (#8432 ) * start on test rewrite map [pr] * chatgpt writes dumb tests * comment out failing * fix that test * fix gc issue * oh, frame 2 * remove uop mutability * map is only the map * simpler + more tests * test tiny passes * tests that need to pass * parent test passes * child test passes * remove uop mutability [pr] * test fixups * most tests pass * more tests pass * lil test fixups * them too * fix test * unneeded * err, that * fix test_hcq * fix test failures * fix that test * tensor universe * does this pass test * Revert "does this pass test" This reverts commit `ed516b3169`. * Revert "tensor universe" This reverts commit `c21301852a`. * test_mutate_add passes * this can pass * Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map" This reverts commit `657822dcdc`, reversing changes made to `2a126c145b`. * Revert "test_mutate_add passes" This reverts commit `ab4fc4c78e`. * correct enough * remove test_rewrite_map_schedule.py * viz * uops are immutable --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-01-12 13:13:51 -05:00
qazal	2f71a00236	remove PYTHONPATH=. from mypy ci [pr] (#8578 )	2025-01-12 09:52:03 -08:00
qazal	cde18fddce	fix DEBUG=2 output for copy runners [pr] (#8579 ) * fix DEBUG=2 output for copy runners [pr] * itemsize is constant	2025-01-12 12:03:01 -05:00
eliotgolding	867004fbeb	use unravel in views_to_indexed_uops [pr] (#8560 ) * use unravel in shape * make process replay work * earlier View.minify() * fix * fix tests * mypy * get rid of early minify * fix * linter * clean and add test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-12 10:25:55 -05:00
nimlgen	38b5ac4d4a	mypy for mockgpu/cuda & dsp/run (#8575 )	2025-01-12 18:25:39 +03:00
chenyu	def90b22f6	EVAL_BS=36 for bert [pr] (#8576 ) 3X faster eval compared to BS=6. green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview	2025-01-12 09:43:56 -05:00

7531 Commits