Elias Wahl
bb248a0dd1
Optional half matmul ( #4835 )
* half linear
* move weight cast back
* oops
* matmul dtype var
* todo comment
2024-06-04 17:53:41 -04:00
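A minimal sketch of the idea in the bullets above, assuming a simple on/off var in place of the PR's actual "matmul dtype var" (names here are illustrative, not the PR's code):

```python
import numpy as np

HALF_MATMUL = True  # hypothetical stand-in for the "matmul dtype var" mentioned above

def linear(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    if HALF_MATMUL:
        # run the matmul in half precision, then cast the result back up
        return (x.astype(np.float16) @ w.astype(np.float16)).astype(np.float32)
    return x @ w
```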
Elias Wahl
04e237328b
Refactor to class style ( #4804 )
2024-06-04 14:08:31 -07:00
nimlgen
1b8bed4a26
nv check cmdq overrun ( #4824 )
* nv check cmdq overrun
* fix assert
2024-06-04 23:22:58 +03:00
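A generic sketch of what an overrun check on a command queue guards against (an illustrative ring buffer; names are hypothetical, not the NV backend's code):

```python
class CmdQueue:
    def __init__(self, size: int):
        self.size, self.wptr, self.rptr = size, 0, 0  # capacity, write/read pointers

    def push(self, n: int):
        # writing more entries than the consumer has freed would wrap over
        # unconsumed commands, so assert before advancing the write pointer
        assert self.wptr + n - self.rptr <= self.size, "command queue overrun"
        self.wptr += n
```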
David Hou
cddce0e168
don't cast before view on shape changing bitcast ( #4833 )
* don't cast before view on shape changing bitcast
* make sure cast before view triggers
2024-06-04 16:04:52 -04:00
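For context on what a shape-changing bitcast is, the numpy analogy (this shows the semantics only, not tinygrad's internals):

```python
import numpy as np

# reinterpreting the same bytes as a narrower dtype grows the last
# dimension instead of converting any values
a = np.arange(4, dtype=np.float32)  # shape (4,), 16 bytes
b = a.view(np.float16)              # shape (8,): same bytes, new dtype
assert b.shape == (8,) and b.nbytes == a.nbytes
```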
Alec Chen
0c3a996e64
Nest ifs for dtype and uop in pattern matcher ( #4834 )
2024-06-04 15:51:28 -04:00
Alec Chen
4909a0d16f
Fix arg set in pattern matcher ( #4830 )
2024-06-04 15:10:09 -04:00
Alec Chen
c96026ac65
Add arg set regression test for pattern matcher ( #4827 )
* Add arg set regression test for pattern matcher
* real regression
---------
Co-authored-by: qazalin <qazal.software@gmail.com>
2024-06-04 13:35:09 -04:00
chenyu
a70e8a80d7
test_ops test cmp with special floats ( #4826 )
prepare to fix nan; it did not work with ge and le before either
2024-06-04 12:10:21 -04:00
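The IEEE-754 behavior the test targets, shown in plain Python (these are language-level facts, not tinygrad code):

```python
nan = float("nan")
assert not (nan < 1.0) and not (nan >= 1.0)  # every ordered comparison with nan is False
assert nan != nan                            # only != is True for nan
```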
Szymon Ożóg
b6895dabaa
Remove ssa label ( #4823 )
* remove ssa label
* linting
2024-06-04 16:51:05 +02:00
George Hotz
052c928d06
hotfix: touchups from presentation
2024-06-04 16:31:03 +02:00
chenyu
1e02b4cae1
default skip all exception in beam ( #4822 )
added a flag `BEAM_STRICT_MODE` to catch compile errors or other exceptions on demand
2024-06-03 18:21:36 -04:00
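A hedged usage sketch: tinygrad reads tuning flags like BEAM from environment variables, so one way to exercise the new flag is to set it before import (the shapes here are arbitrary):

```python
import os

# BEAM enables beam search; BEAM_STRICT_MODE is the flag this commit adds
# to surface compile errors instead of silently skipping candidates
os.environ["BEAM"] = "2"
os.environ["BEAM_STRICT_MODE"] = "1"

from tinygrad import Tensor
(Tensor.rand(64, 64) @ Tensor.rand(64, 64)).realize()
```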
chenyu
3afc914617
CMPEQ -> CMPNE and make it safe to pad ( #4818 )
* CMPNE
* new dataset
2024-06-03 18:02:15 -04:00
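The pad-safety reasoning behind this change and the CMPLT commit further down, in one sketch (this states the argument, not tinygrad code): PADTO fills the padded region with zeros, so a comparison is safe to pad when applying it to two zeros yields False, the identity for the following reduce.

```python
assert (0 == 0) is True   # CMPEQ on padding injects spurious Trues -> unsafe
assert (0 != 0) is False  # CMPNE on padding contributes the identity -> safe
assert (0 < 0) is False   # same reason CMPLT is safe to pad (see #4790 below)
```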
qazal
79c7d402ee
improve augmented assign error message ( #4813 )
2024-06-03 16:57:22 -04:00
Szymon Ożóg
bb7b031c5c
Bitshift ( #4728 )
* WIP
* Cleanup
* Cleanup
* Fix variable, refactor to use set
* right shift should be signed/unsigned
* Test for bitshifts
* Allow a neg
2024-06-03 21:16:01 +02:00
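The signed/unsigned point from the bullets, in plain Python (general two's-complement facts, not tinygrad code):

```python
x = -8
assert x >> 1 == -4                         # signed: arithmetic shift preserves the sign bit
assert (x & 0xFFFFFFFF) >> 1 == 0x7FFFFFFC  # same 32 bits as unsigned: logical shift zero-fills
```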
nimlgen
e78a9bf3f2
support view in nv/amd ( #4812 )
* support view in nv/amd
* fix amd
* fix
* run test on nv/amd
2024-06-03 22:11:52 +03:00
chenyu
45083ccb43
canonicalize 0 in shape in View.create ( #4815 )
with a size-0 view, set strides to 0, offset to 0, mask to None, and contiguous to True.
2024-06-03 13:37:37 -04:00
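A minimal sketch of the canonicalization this describes, assuming an illustrative View container (not tinygrad's actual class):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class View:
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int
    mask: Optional[Tuple[Tuple[int, int], ...]]
    contiguous: bool

def create(shape, strides, offset=0, mask=None) -> View:
    if 0 in shape:
        # a zero-size view holds no elements, so collapse every such view
        # to one canonical form: zero strides/offset, no mask, contiguous
        return View(shape, (0,) * len(shape), 0, None, True)
    # (the real View.create computes contiguity etc.; elided in this sketch)
    return View(shape, strides, offset, mask, False)
```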
Szymon Ożóg
d064bf6d8c
b2 is useless ( #4814 )
2024-06-03 18:29:53 +02:00
nimlgen
65f0071c4b
amd compute queue bind api ( #4732 )
* amd hcq bind api
* revert copy queue
* revert
2024-06-03 18:36:56 +03:00
chenyu
3cc6ae0d85
layernorm backward is independent of its mean ( #4806 )
2024-06-03 09:49:59 -04:00
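A numeric check of this claim (a numpy sketch, not tinygrad code): the layernorm input gradient can be written from the normalized output y and the std alone, so the mean never appears in backward.

```python
import numpy as np

x, g = np.random.randn(8), np.random.randn(8)  # input and upstream gradient
sd = x.std()
y = (x - x.mean()) / sd                        # forward
dx = (g - g.mean() - y * (g * y).mean()) / sd  # backward: no x.mean() term
# finite-difference check of dx against L = sum(y * g)
eps, num = 1e-6, np.empty(8)
for i in range(8):
    xp = x.copy(); xp[i] += eps
    num[i] = ((((xp - xp.mean()) / xp.std()) * g).sum() - (y * g).sum()) / eps
assert np.allclose(dx, num, atol=1e-4)
```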
George Hotz
2dae657415
improve readability ( #4809 )
2024-06-03 14:57:57 +02:00
George Hotz
eecfdd2f6e
hotfix: fix dataset reading for new llm.c
2024-06-03 14:10:05 +02:00
qazal
6e0c16dfb0
cleanup render_reduceop ( #4807 )
* update acc key
* refactor return type
* remove return type
* run all reduces
* set acc key [run_process_replay]
* local_idxs are copied in render_reduceop [run_process_replay]
2024-06-03 14:39:02 +03:00
George Hotz
dd84f7d35e
touchup: show process name in multiprocess assert
2024-06-03 13:09:40 +02:00
qazal
0db9674dea
skip process replay on master ( #4808 )
2024-06-03 12:29:28 +03:00
qazal
f64fa51a64
process replay for test/* ( #4799 )
* add input to unit tests [run_process_replay]
* add setup [run_process_replay]
* run tests [run_process_replay]
* add cuda and amd [run_process_replay]
* run everything but BEAM=2 [run_process_replay]
* skip export_model [run_process_replay]
* fix amd CI
* add concurrency back
2024-06-03 12:01:58 +03:00
nimlgen
e8b5f2040d
nv faster signal on dma queue ( #4789 )
2024-06-02 21:47:24 +03:00
Francis Lata
707099487a
Multiprocessing UNet3D dataloader ( #4801 )
* testing dataloader
* matching dataloader implementation for unet3d
* remove comments
* clean up dataloader
* add cookie and cleanup
* use shm_path when creating SharedMemory
* add support for testing resnet and unet3d dataloaders
* update dataset test to return preprocessed data directory in prep for dataloader testing
* pass preprocessed dataset directory properly
* update loader function for dataloader
* add shuffling on indices
* update shm name
* more cleanup for unet3d dataloader
* remove changes to tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-02 11:30:47 -04:00
Timmy
ca32921f84
Multireduce PADTO Test ( #4785 )
* padto test
* expanded multireduce padto tests
* cuda doesn't run on ci
* moving padto_where_multireduce test to SUM so that we can check the reduce axis
* cleaning up tests some more
* add wanna_outputs
* refactor test_padto_sum_multireduce
* fix max and refactor where
* fix axis
---------
Co-authored-by: qazal <qazal.software@gmail.com >
2024-06-02 13:46:53 +03:00
qazal
231ed2c656
compute aliased buffer idxs pre reduce ( #4788 )
2024-06-01 16:46:52 -04:00
nimlgen
1b18ebb133
minor cleanups ( #4802 )
2024-06-01 20:11:43 +03:00
chenyu
1ffa5ec492
unit test ShapeTracker.consecutive ( #4800 )
2024-06-01 10:10:51 -04:00
nimlgen
7384ee08a0
amd cleanup sdma ( #4796 )
* amd cleanup sdma
* faster enqueue for sdma
* typo
* remove commented lines
* fix overrun check
* flushhdp better command
2024-06-01 17:06:44 +03:00
qazal
240d6b5bc0
process replay benchmarks ( #4668 )
2024-06-01 14:36:21 +03:00
Alec Chen
b377db7f0d
Refactor UOps pattern matcher to UPat instead of dicts ( #4791 )
2024-06-01 10:55:51 +02:00
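A toy illustration of the refactor's shape, with illustrative classes that are not tinygrad's actual UPat API: match against a pattern object with optional fields instead of nested dicts, where None means "match anything".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Op:                 # stand-in for a UOp
    op: str
    dtype: str
    src: Tuple["Op", ...] = ()

@dataclass(frozen=True)
class Pat:                # stand-in for UPat
    op: Optional[str] = None
    dtype: Optional[str] = None
    src: Optional[Tuple["Pat", ...]] = None

def match(p: Pat, u: Op) -> bool:
    if p.op is not None and p.op != u.op: return False
    if p.dtype is not None and p.dtype != u.dtype: return False
    if p.src is None: return True
    return len(p.src) == len(u.src) and all(match(pp, uu) for pp, uu in zip(p.src, u.src))

# e.g. any MUL whose second source is an int CONST
pat = Pat(op="MUL", src=(Pat(), Pat(op="CONST", dtype="int")))
assert match(pat, Op("MUL", "int", (Op("LOAD", "int"), Op("CONST", "int"))))
```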
qazal
de8c8abbd8
define indexes pre reduce ( #4795 )
2024-05-31 18:53:27 -04:00
nimlgen
bd2e7c8b31
amd registers from file ( #4778 )
* amd registers from file
* remove comments
* linter
* no off
2024-05-31 18:48:57 +03:00
chenyu
8942230b1f
minor cleanups of test_tensor and extend some cases ( #4794 )
2024-05-31 10:43:22 -04:00
qazal
637f482588
configure derandomizing CI tests ( #4793 )
2024-05-31 17:06:58 +03:00
wozeparrot
ed0a740fe4
greater chat api endpoint compat ( #4792 )
2024-05-30 22:47:31 -07:00
chenyu
7cc883ecee
CMPLT is safe to pad ( #4790 )
0 < 0 evals to False
2024-05-30 22:50:48 -04:00
chenyu
236390aafb
fix lazy r const folding with variable shape ( #4783 )
const folding with a symbolic shape is currently not supported; I think it's possible with a refactor to Tensor.from_node.
also added some failing required tests for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
c4d1283049
simplify _cumsum with _first_zero=True ( #4782 )
handle the case with 0 in the output shape of _cumsum; _cumsum now returns the correct shape with _first_zero=True
2024-05-30 13:19:33 -04:00
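A loudly hedged guess at what _first_zero means, since the commit does not spell it out: assuming it is an exclusive scan (the running sum shifted right with a leading zero), the output shape must match the input shape, which is what this fixes.

```python
import numpy as np

x = np.array([3, 1, 4])
inclusive = np.cumsum(x)                           # [3, 4, 8]
exclusive = np.concatenate(([0], inclusive[:-1]))  # [0, 3, 4] (assumption: this is _first_zero)
assert exclusive.shape == x.shape
```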
chenyu
4921de1945
fix cumsum of 0-d tensor ( #4781 )
* fix cumsum of 0-d tensor
* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f
failed test case for ellipsis in einsum ( #4779 )
from #4156
2024-05-30 11:14:42 -04:00
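What the failing case exercises, shown with numpy for the expected semantics (a fact about einsum notation, not tinygrad code): "..." broadcasts over any leading batch dimensions.

```python
import numpy as np

a, b = np.random.rand(2, 3, 4), np.random.rand(2, 4, 5)
out = np.einsum("...ij,...jk->...ik", a, b)  # batched matmul over the leading dim
assert out.shape == (2, 3, 5)
```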
Alec Chen
e89bc42cc7
Add UOps pattern matcher regression tests ( #4725 )
* add pattern matcher regression tests
* Remove test for dtype str after rebasing
* Make test uops match type spec
* leave const const, add const alu vin test
* correct uops
* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3
add fused tensor core opts tests ( #4775 )
* add fused tc opts tests
* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a
apply NOOPT in test_arange complexity ( #4774 )
with hcopt, arange(2560) uses fewer ops than arange(256)
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7
isolate the 134ms kernel in train_gpt2.py ( #4773 )
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
nimlgen
57204c4014
amd cleanup pm4 queue ( #4772 )
2024-05-29 22:59:06 +03:00
lopusz
b2c408912c
Add docs link to README ( #4768 )
2024-05-29 17:47:47 +00:00