* refactor to render_block
* move rendering the reduce into its own function
* add todo and cleanups [run_process_replay]
* inplace update of idxs [run_process_replay]
* support symbolic reshape with non-contiguous
prerequisite for symbolic arange (make symbolic ones that can be folded).
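A minimal sketch of the kind of case this enables, assuming tinygrad's public Tensor/Variable API; this is illustrative, not code from this PR:

```python
from tinygrad import Tensor, Variable

i = Variable("i", 1, 10).bind(3)      # symbolic dimension bound to 3
t = Tensor.rand(4, 3).permute(1, 0)   # permuted view -> non-contiguous, shape (3, 4)
out = t.reshape(i, 4)                 # reshape with a symbolic dim on the non-contiguous view
print(out.shape)                      # symbolic shape carrying the Variable i
```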
* test cases
* typo
* shorter
* atomic load/store test
* tests for nested & unrolled
* check barriers
* linters
* cleaning up diff
* fix assert in _temp_create_multireduce_ast changes
* cleaning up the check for redundant barriers
* minor cleanups for the assert
* always seed randn, helps with debuggability
---------
Co-authored-by: qazal <qazal.software@gmail.com>
* update acc key
* refactor return type
* remove return type
* run all reduces
* set acc key [run_process_replay]
* local_idxs are copied in render_reduceop [run_process_replay]
* add input to unit tests [run_process_replay]
* add setup [run_process_replay]
* run tests [run_process_replay]
* add cuda and amd [run_process_replay]
* run everything but BEAM=2 [run_process_replay]
* skip export_model [run_process_replay]
* fix amd CI
* add concurrency back
* testing dataloader
* matching dataloader implementation for unet3d
* remove comments
* clean up dataloader
* add cookie and cleanup
* use shm_path when creating SharedMemory
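A hedged sketch of the named-SharedMemory pattern this refers to, using only the Python stdlib; the segment name, path handling, and size are assumptions for illustration, not the dataloader's actual values:

```python
import os
from multiprocessing import shared_memory

shm_name = "batch_0"                               # hypothetical segment name
shm_path = f"/dev/shm/{shm_name}"                  # on Linux, named segments appear here
if os.path.exists(shm_path): os.unlink(shm_path)   # drop any stale segment first
shm = shared_memory.SharedMemory(name=shm_name, create=True, size=64 * 1024 * 1024)
try:
  shm.buf[:4] = b"data"                            # producer/consumer share this buffer by name
finally:
  shm.close()
  shm.unlink()                                     # remove the segment when done
```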
* add support for testing resnet and unet3d dataloaders
* update dataset test to return preprocessed data directory in prep for dataloader testing
* pass preprocessed dataset directory properly
* update loader function for dataloader
* add shuffling on indices
* update shm name
* more cleanup for unet3d dataloader
* remove changes to tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* padto test
* expanded multireduce padto tests
* cuda doesn't run on CI
* moving padto_where_multireduce test to SUM so that we can check the reduce axis
* cleaning up tests some more
* add wanna_outputs
* refactor test_padto_sum_multireduce
* fix max and refactor where
* fix axis
---------
Co-authored-by: qazal <qazal.software@gmail.com>