tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 06:58:11 -05:00

Author	SHA1	Message	Date
George Hotz	fa00ef66fd	Update README.md	2024-06-13 00:29:19 +02:00
chenyu	eb0f5b5660	failed test case for getitem with leading Nones (#4936 ) * failed test case for getitem with leading Nones torch matched numpy so tinygrad is incorrect. another repro
```
t = np.arange(12).reshape((3, 4))
print(t[None, None, np.array([1, 2])])
t = torch.arange(12).reshape((3, 4))
print(t[None, None, torch.tensor([1, 2])].numpy())
t = Tensor.arange(12).reshape(3, 4)
print(t[None, None, Tensor([1, 2])].numpy())
```
* # noqa	2024-06-12 16:19:42 -04:00
Elias Wahl	d2e3c391e8	Residual in MLM loss + Change default steps (#4935 ) * Residual in mlm loss * Reduce default steps to 160K * 24 * oops * comment	2024-06-12 16:09:18 -04:00
chenyu	a21ea165bc	skip linearizer test_failure_22 on llvm (#4937 ) getting flaky recently	2024-06-12 16:03:38 -04:00
chenyu	27903c5ed5	minor minor Tensor.__getitem__ cleanup (#4934 ) more consistent variable names and update comments before next minor cleanup that touches logic [run_process_replay]	2024-06-12 15:08:18 -04:00
chenyu	5e6336edda	minor Tensor.gather cleanup (#4933 ) `permarg[i]` is just `i`, and break the big return into two lines. [run_process_replay]	2024-06-12 13:57:28 -04:00
Timmy	720c700a8a	Multireduce-Kernels: Linearizer Changes and Tests (#4259 ) * basic tests * cleanup * pylint * ruff * use define acc as a proxy for rendered reductions * use define acc as a proxy for rendered reductions * recursive reduceop rendering via ast_parse * linters + cleanup * fixing late buf loading * plus linters * removing extra line * linters * does this break ci? * added tests and if add end change * typo in add_ends * linters * removing comments * allow endifs to be inserted before the end of the graph * find add ENDIF before next BARRIER * removing tests with manual ENDIF + linters * specifically the next barrier aftr the store of the local result * Revert "specifically the next barrier aftr the store of the local result" This reverts commit `b288a5c3ce`. * keeping up to date * linters + merge changes * cleaning up old bad decisions * linters and opts * mrged linearizer tests * fixing merge issues * removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions * small diff fixes * updating linearizer to work without uops.add( ... cachable) * linters * comment in multireduce tests * skipping tests without locals * full tests * linters * load_cache[key] fix for multiple accs * linters * assert only one reduceop * fix loop_scope test to actually cause an issue * self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique * updated tests * fixing merge * removing debug prints * complete merge fix * linters * diff cleanup * adding tests in * give each reduce it's own local buffer * gpu=1 changes * store and load locals with upcasting * modifying test? 
* make multireduce_netsted_local_upcast test match single reduce shapes * removing todo * cleaning up the diff * unroll test * unroll and upcast tests * fix gpu * seq and self.load_cache[key] cleaning * linters * padto works * merge fixes * fixes * add skips for amd * linters + seq * cleaning & more tests * softmax tests * linters * [run_process_replay] * add new tests back This reverts commit `19dec22e01`. * more hardcoded -1s * fix ptx * Fix name for loop in ptx * cleaning up the diff * cleaning up the uops diff * nv ci is too slow --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-06-12 13:29:43 -04:00
Nicklas Boman	6e86472cd6	fix typing for test to run in py38 (#4930 )	2024-06-12 13:22:30 -04:00
chenyu	1326f29e24	fix Tensor.gather shape checking criteria (#4932 ) it's fine if `self.shape[d] >= index.shape[d]` for all `d != dim`, not for all `d`	2024-06-12 13:10:14 -04:00
qazal	898430c004	more typing in linearizer uoping utils (#4929 ) * type check everything * idxs will be uops	2024-06-12 11:00:02 -04:00
George Hotz	828c98d5c4	add slides from code europe to docs	2024-06-12 14:35:08 +02:00
George Hotz	9a3c1e4a17	fix mul div failure (#4928 )	2024-06-12 13:58:46 +02:00
George Hotz	11a03cbbf5	don't use uops.add while constructing (#4913 ) * don't use uops.add while constructing * rebase * bugfixes * have to use BFS * prove it's late * simpler uop symbolic test (why we did this) * use dict, not set	2024-06-12 13:31:34 +02:00
qazal	d894acbb50	remove hardcoded -1s referencing late reduce (#4926 )	2024-06-12 04:50:15 -04:00
qazal	b833a112ba	allocate shared memory per block (#4924 ) * define temp * use idx * cleaner [run_process_replay]	2024-06-12 03:43:10 -04:00
George Hotz	ca4ccddcd6	docsfix: nn.Tensor -> Tensor	2024-06-12 09:18:32 +02:00
wozeparrot	3d13c23bfa	llama3 `--download_model` (#4922 )	2024-06-11 22:59:59 -07:00
chenyu	f902af4f0b	increase metal ci test timeout to 20 minutes (#4920 ) make it less annoying for now	2024-06-11 18:45:51 -04:00
chenyu	fdbb4305cb	skip unsupported dtype in fuzz_linearizer (#4917 ) resolve issues in #4887. dataset generated from ubuntu but metal does not support double	2024-06-11 18:18:21 -04:00
qazal	7f3d9e6d94	revert hsa autogen removal (#4914 ) * Revert "only install comgr in AMD CI (#4909)" This reverts commit `7f03420d05`. * rocm-llvm only removal	2024-06-11 12:55:45 -04:00
nimlgen	58cf6eaba9	add missing dir level for amd mockgpu (#4911 )	2024-06-11 18:35:04 +02:00
chenyu	b886d250fb	improve test_dropout_on_shard (#4912 ) tested some basic property, also minor formatting for a few Tensor.training setups	2024-06-11 11:36:02 -04:00
qazal	7f03420d05	only install comgr in AMD CI (#4909 ) * test * delete hsa autogen	2024-06-11 06:19:33 -04:00
George Hotz	35e53c0809	add sharded arange test (#4908 )	2024-06-11 10:58:33 +02:00
chenyu	798ea61377	widen test_ops [low, high] and more strict atol (#4906 ) default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan). dropped several explicit atol if it's unnecessarily larger than default 1e-6. tested on mac, tinybox red / green	2024-06-10 20:47:09 -04:00
chenyu	97b05f567e	revert the .detach() in layernorm (#4904 ) * revert the .detach() in layernorm it's only correct in LayerNorm where input is the data, and not correct in GroupNorm and InstanceNorm that reused layernorm. Added backward tests for weights, bias and input for these norms. * bigger atol for llvm * relax backward more	2024-06-10 18:02:05 -04:00
qazal	8b5bcf309a	process replay in all of CI (#4884 )	2024-06-10 14:49:29 -04:00
George Hotz	9715a7193a	replace set with dedup (#4901 )	2024-06-10 18:20:38 +02:00
chenyu	c8cd637236	test case for Tensor.var reducing over size = 1 axis (#4902 ) backward failed when correction >= reducing n	2024-06-10 12:11:39 -04:00
chenyu	c0fb7eee09	cleanup lazy const fold for binary (#4900 ) removed pylint: disable=possibly-used-before-assignment [run_process_replay]	2024-06-10 10:46:58 -04:00
nimlgen	5bf1f7d4d3	nv better error messages for ioctls (#4899 )	2024-06-10 16:01:50 +03:00
George Hotz	b9f26eedc9	hotfix: import datasets in nn init	2024-06-10 11:33:50 +02:00
chenyu	b56ae5606c	cosmetic changes to uop _match (#4897 ) minor cleanup before fixing two level match [run_process_replay]	2024-06-09 18:29:42 -04:00
SnakeOnex	b1db2d0094	tqdm replacement (#4846 ) * tqdm replacement almost * formatting * formatting * imports * line len * fix * removed set description :( * removed set description :( * fix * fix * green check? * rewrote as class, fixed several bugs * types spacing * removed imports * fix * iterable * typing * mypy disagreement * imports * more e2e tests vs tqdm * removed seed setting * robustness against time.sleep() flakiness * flaky fix * automatic bar closing when count==total * cleanup * clang error with tqdm * tqdm back * use os lib, print to stderr (fixes the clang bug, where the bar was leaking into the generated c program * back to shutil * unit_scale + unit_scale test * custom unit to tests * pretty * clean * removed flaky test * less test iters * empty line * remove disable	2024-06-09 23:46:03 +02:00
qazal	05d7ab774f	set tensor core opt options in Renderer (#4896 )	2024-06-09 14:12:41 -04:00
George Hotz	f42183ba28	hotfix: relax cifar to 93.2	2024-06-09 13:09:21 +02:00
qazal	1dde829e34	UOps.IF* to graph spec (#4894 )	2024-06-09 07:00:12 -04:00
George Hotz	b9afb0d577	test uop as symbolic (#4870 ) * start work * more tests passing * more tests passing * more * 34 failures * expect the failures * remove broken rule * render is fine in just the test * simplify and put in test	2024-06-09 12:15:11 +02:00
nimlgen	654a8b9ef7	retire hsa (#4885 ) * retire hsa * EMULATE_AMD	2024-06-09 11:33:03 +03:00
chenyu	e33efd6a3d	test cases for multitensor adds const (#4892 ) Tested const remained const in ast. Removed the TODO in _to_const_val too	2024-06-08 22:57:48 -04:00
chenyu	a3ec4234df	expand broadcast functions a bit (#4891 ) taking some good stuff from the #4886. I think `from_, to` is more readble than `sh, s` too [run_process_replay]	2024-06-08 20:16:54 -04:00
wozeparrot	2849d0a2a1	fix copying to clipboard on a non secure context (#4890 )	2024-06-08 16:51:47 -07:00
nimlgen	6327b50e51	amd in benchmarks (#4861 ) * amd in benchmarks * remove all hsa	2024-06-08 23:24:46 +03:00
nimlgen	d24e57c615	amd support kernel with bf16 (#4863 ) * amd support kernels with dispatch_ptr * fixes * line savings * one line * try * Revert "try" This reverts commit `5f340dfdd4`. * not used will be back when hsa is gone * gone will be back * add this as well	2024-06-08 22:52:32 +03:00
wozeparrot	6c24eda522	feat: tinychat (#4869 )	2024-06-08 12:05:45 -07:00
Brennan Kinney	9445946cae	docs: Update referenced yaml in `yolov8.py` (#4871 ) YAML files have since been relocated.	2024-06-08 15:05:00 -04:00
Roelof van Dijk	794fecf8e3	perf: faster element deletion during matching (#4882 ) * perf: faster deletion * fix: leave the tuple init	2024-06-08 15:16:35 +02:00
Roelof van Dijk	0eebb8e998	fix: _free should not return (#4880 )	2024-06-08 14:45:06 +02:00
Roelof van Dijk	1785a70e77	fix: else-return on runtime (#4881 ) * fix: add init file * fix: no else-return * fix: remove file again	2024-06-08 14:44:24 +02:00
qazal	1e3325f369	raise assert [run_process_replay] (#4879 )	2024-06-08 08:31:44 -04:00
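
Note on 1326f29e24 (fix Tensor.gather shape checking criteria): the rule stated in that commit message, `self.shape[d] >= index.shape[d]` for all `d != dim`, can be sketched in plain Python. This is only an illustration of the criterion, not tinygrad's code: `gather_shapes_ok` is a hypothetical helper, and the equal-rank requirement is an assumption that the commit message does not state.

```python
from typing import Sequence

def gather_shapes_ok(self_shape: Sequence[int], index_shape: Sequence[int], dim: int) -> bool:
    """Hypothetical helper (not part of tinygrad): check gather shapes.

    The index may be any size along `dim`; on every other axis it must
    not be larger than the source tensor.
    """
    # Assumption: source and index have the same number of dimensions.
    if len(self_shape) != len(index_shape):
        return False
    # Compare every axis except the gather axis itself.
    return all(s >= i for d, (s, i) in enumerate(zip(self_shape, index_shape)) if d != dim)

# Index is longer than the source along dim=1, which is allowed.
print(gather_shapes_ok((3, 4), (2, 9), dim=1))  # True
# Index is longer than the source along axis 0, which is not the gather axis.
print(gather_shapes_ok((3, 4), (5, 2), dim=1))  # False
```

Checking all `d`, including `dim`, would reject the first case, which is the over-strict behavior the commit message describes.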

4692 Commits