chenyu
3cc6ae0d85
layernorm backward is independent of its mean ( #4806 )
2024-06-03 09:49:59 -04:00
George Hotz
2dae657415
improve readability ( #4809 )
2024-06-03 14:57:57 +02:00
George Hotz
eecfdd2f6e
hotfix: fix dataset reading for new llm.c
2024-06-03 14:10:05 +02:00
qazal
6e0c16dfb0
cleanup render_reduceop ( #4807 )
* update acc key
* refactor return type
* remove return type
* run all reduces
* set acc key [run_process_replay]
* local_idxs are copied in render_reduceop [run_process_replay]
2024-06-03 14:39:02 +03:00
George Hotz
dd84f7d35e
touchup: show process name in multiprocess assert
2024-06-03 13:09:40 +02:00
qazal
0db9674dea
skip process replay on master ( #4808 )
2024-06-03 12:29:28 +03:00
qazal
f64fa51a64
process replay for test/* ( #4799 )
* add input to unit tests [run_process_replay]
* add setup [run_process_replay]
* run tests [run_process_replay]
* add cuda and amd [run_process_replay]
* run everything but BEAM=2 [run_process_replay]
* skip export_model [run_process_replay]
* fix amd CI
* add concurrency back
2024-06-03 12:01:58 +03:00
nimlgen
e8b5f2040d
nv faster signal on dma queue ( #4789 )
2024-06-02 21:47:24 +03:00
Francis Lata
707099487a
Multiprocessing UNet3D dataloader ( #4801 )
* testing dataloader
* matching dataloader implementation for unet3d
* remove comments
* clean up dataloader
* add cookie and cleanup
* use shm_path when creating SharedMemory
* add support for testing resnet and unet3d dataloaders
* update dataset test to return preprocessed data directory in prep for dataloader testing
* pass preprocessed dataset directory properly
* update loader function for dataloader
* add shuffling on indices
* update shm name
* more cleanup for unet3d dataloader
* remove changes to tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-02 11:30:47 -04:00
Timmy
ca32921f84
Multireduce PADTO Test ( #4785 )
* padto test
* expanded multireduce padto tests
* cuda doesn't run on CI
* moving padto_where_multireduce test to SUM so that we can check the reduce axis
* cleaning up tests some more
* add wanna_outputs
* refactor test_padto_sum_multireduce
* fix max and refactor where
* fix axis
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
qazal
231ed2c656
compute aliased buffer idxs pre reduce ( #4788 )
2024-06-01 16:46:52 -04:00
nimlgen
1b18ebb133
minor cleanups ( #4802 )
2024-06-01 20:11:43 +03:00
chenyu
1ffa5ec492
unit test ShapeTracker.consecutive ( #4800 )
2024-06-01 10:10:51 -04:00
nimlgen
7384ee08a0
amd cleanup sdma ( #4796 )
* amd cleanup sdma
* faster enqueue for sdma
* typo
* remove commented lines
* fix overrun check
* better hdp flush command
2024-06-01 17:06:44 +03:00
qazal
240d6b5bc0
process replay benchmarks ( #4668 )
2024-06-01 14:36:21 +03:00
Alec Chen
b377db7f0d
Refactor UOps pattern matcher to UPat instead of dicts ( #4791 )
2024-06-01 10:55:51 +02:00
qazal
de8c8abbd8
define indexes pre reduce ( #4795 )
2024-05-31 18:53:27 -04:00
nimlgen
bd2e7c8b31
amd registers from file ( #4778 )
* amd registers from file
* remove comments
* linter
* no off
2024-05-31 18:48:57 +03:00
chenyu
8942230b1f
minor cleanups of test_tensor and extend some cases ( #4794 )
2024-05-31 10:43:22 -04:00
qazal
637f482588
configure derandomizing CI tests ( #4793 )
2024-05-31 17:06:58 +03:00
wozeparrot
ed0a740fe4
greater chat api endpoint compat ( #4792 )
2024-05-30 22:47:31 -07:00
chenyu
7cc883ecee
CMPLT is safe to pad ( #4790 )
0 < 0 evaluates to False
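A minimal sketch of why the pad is safe (the tensor and values here are illustrative, not from the PR): padding fills with 0, and since `0 < 0` is False, the padded region of a CMPLT output is all False and can't corrupt downstream ops.

```python
from tinygrad import Tensor

# padding fills with 0; 0 < 0 is False, so the padded tail of the
# comparison result stays False
a = Tensor([1.0, -1.0]).pad(((0, 2),))  # -> [1., -1., 0., 0.]
print((a < 0).numpy())                  # -> [False  True False False]
```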
2024-05-30 22:50:48 -04:00
chenyu
236390aafb
fix lazy r const folding with variable shape ( #4783 )
const folding with a symbolic shape is currently not supported; I think it's possible with a refactor of Tensor.from_node.
also added some failing required tests for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
c4d1283049
simplify _cumsum with _first_zero=True ( #4782 )
handles the case where the output shape of _cumsum contains a 0, and _cumsum now returns the correct shape with _first_zero=True
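A hedged illustration of the shape behavior via the public cumsum (which routes through _cumsum); the shape here is illustrative:

```python
from tinygrad import Tensor

# a tensor with a 0 in its shape should keep that shape through cumsum
t = Tensor.ones(3, 0)
print(t.cumsum(axis=1).shape)  # expected: (3, 0)
```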
2024-05-30 13:19:33 -04:00
chenyu
4921de1945
fix cumsum of 0-d tensor ( #4781 )
* fix cumsum of 0-d tensor
* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f
failed test case for ellipsis in einsum ( #4779 )
from #4156
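For context, a sketch of the numpy-style ellipsis semantics the failing test expects (shapes here are illustrative):

```python
from tinygrad import Tensor

# "..." stands in for any leading batch dims, numpy-style
a, b = Tensor.rand(2, 3, 4), Tensor.rand(2, 4, 5)
print(Tensor.einsum("...ij,...jk->...ik", a, b).shape)  # expected: (2, 3, 5)
```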
2024-05-30 11:14:42 -04:00
Alec Chen
e89bc42cc7
Add UOps pattern matcher regression tests ( #4725 )
* add pattern matcher regression tests
* Remove test for dtype str after rebasing
* Make test uops match type spec
* leave const const, add const alu vin test
* correct uops
* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3
add fused tensor core opts tests ( #4775 )
* add fused tc opts tests
* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a
apply NOOPT in test_arange complexity ( #4774 )
with hcopt, arange(2560) uses fewer ops than arange(256)
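A sketch of what applying NOOPT looks like (Context is tinygrad's scoped way to set such flags; the arange call here is illustrative, not the test itself):

```python
from tinygrad import Tensor
from tinygrad.helpers import Context

# NOOPT=1 turns off hand-coded optimizations so op counts scale predictably
with Context(NOOPT=1):
    Tensor.arange(2560).realize()
```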
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7
isolate the 134ms kernel in train_gpt2.py ( #4773 )
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
nimlgen
57204c4014
amd cleanup pm4 queue ( #4772 )
2024-05-29 22:59:06 +03:00
lopusz
b2c408912c
Add docs link to README ( #4768 )
2024-05-29 17:47:47 +00:00
chenyu
f2414c666f
fix train_gpt2.py ( #4771 )
added `with Tensor.train():`
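For reference, a minimal runnable sketch of the pattern (the tiny linear model and random data are illustrative placeholders, not train_gpt2.py's setup):

```python
from tinygrad import Tensor
from tinygrad.nn import Linear
from tinygrad.nn.optim import SGD
from tinygrad.nn.state import get_parameters

# Tensor.train() enables training mode; the optimizer's step asserts on it,
# which is why train_gpt2.py needed the context manager
model = Linear(4, 2)
opt = SGD(get_parameters(model), lr=0.01)
X, Y = Tensor.rand(8, 4), Tensor([0, 1, 0, 1, 0, 1, 0, 1])

with Tensor.train():
    loss = model(X).sparse_categorical_crossentropy(Y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```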
2024-05-29 12:01:34 -04:00
chenyu
59c6472b9f
check contiguous in View.create after canonicalizing mask and offset ( #4770 )
mask, offset, and strides can change during canonicalization, so contiguous can become True at the end
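An illustrative toy of the ordering (not tinygrad's actual View code; all names here are made up): a mask covering the whole shape canonicalizes away, after which the view can turn out contiguous after all.

```python
# toy sketch, not tinygrad's View: canonicalize first, judge contiguity last
def strides_for_shape(shape):
    strides, acc = [], 1
    for s in reversed(shape):
        strides.append(acc); acc *= s
    return tuple(reversed(strides))

def canonicalize_mask(shape, mask):
    # a mask spanning the full shape carries no information -> drop it
    return None if mask == tuple((0, s) for s in shape) else mask

shape, strides, offset = (2, 3), (3, 1), 0
mask = canonicalize_mask(shape, ((0, 2), (0, 3)))  # full mask -> None
print(offset == 0 and mask is None and strides == strides_for_shape(shape))  # True
```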
2024-05-29 11:31:13 -04:00
qazal
6e5fa5fd92
map local aliases to reduceop ( #4766 )
* map
* ugh
* save one line
* concerning, does this pass
* Revert "concerning, does this pass"
This reverts commit 64d4664f17.
* use local_alias
2024-05-28 21:11:25 -04:00
chenyu
7624ad3ddd
add --timing and --profile to llama3 example ( #4767 )
2024-05-28 16:24:44 -04:00
qazal
c235223c07
refactor tc_opt creation ( #4765 )
* move reduceop loop
* this is more mergeable code; add assert
* integrate s2
2024-05-28 23:10:27 +03:00
qazal
a88aea626d
map tensor core bufs to reduceop ( #4763 )
* tc_opts.bufs to its only map
* lint
* iterate reduceop bufs
2024-05-28 22:07:39 +03:00
wozeparrot
6fcf220b21
feat: tag 0.9.0 ( #4762 )
v0.9.0
2024-05-28 18:44:45 +00:00
chenyu
e22cdb40f3
docs: fix mkdoc warnings and link to tensor.md ( #4760 )
2024-05-28 14:24:11 -04:00
nimlgen
872827b6ae
fix usage of args struct in hcq ( #4758 )
* do not allocate empty buffer in hcq
* do not take args struct from program
2024-05-28 21:10:39 +03:00
wozeparrot
b2b49cef6f
split tensor docs ( #4754 )
2024-05-28 11:03:52 -07:00
nimlgen
fe26d3fefe
nv sync before free for binded commands ( #4759 )
* nv sync before free for binded commands
* shorter comment
2024-05-28 20:49:29 +03:00
chenyu
e614b7c696
docs: showcase remove mnist_gan and add conversation.py ( #4757 )
fixed both examples, and I think it's better to show conversation.py
2024-05-28 11:09:26 -04:00
nimlgen
019f4680e5
check dims before execution on nv ( #4756 )
* check dims before execution on nv
* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4
pre multi reduce codegen/* cleanup ( #4755 )
* refactor self.reduceop
* free lines
* fix test
2024-05-28 08:15:48 -04:00
chenyu
fd249422f5
minor cleanup example stable_diffusion ( #4753 )
2024-05-28 00:05:37 -04:00
chenyu
53b9081aab
check arg types of Tensor.randint ( #4751 )
raise TypeError if low, high, or dtype are not ints
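A hedged illustration of the new check (the exact error message is not from the PR):

```python
from tinygrad import Tensor

t = Tensor.randint(4, low=0, high=10)     # fine: int bounds
try:
    Tensor.randint(4, low=0.5, high=10)   # non-int low now raises TypeError
except TypeError as e:
    print(e)
```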
2024-05-27 20:24:10 -04:00
chenyu
16756af13c
docs: polish tensor.py ( #4750 )
* docs: polish tensor.py
* don't change that
2024-05-27 20:00:56 -04:00
Elias Wahl
c4b0acf095
Global norm + small changes ( #4749 )
* norm
* no empty
* default loss scaler in float
2024-05-27 18:35:27 -04:00