qazal
bee96a19ff
fuzz uop schedules (#5345)
* basic blocks + cleanups
* fixups
* elif is better for future me
* fuzz_schedule_max_paths
* fix linter
2024-07-09 15:24:56 +03:00
Ian Paul
d5a68ae6b3
Simple abstractions3.py fix (#5343)
* abstractions3.py fix
* Add abstractions3.py to CI tests
2024-07-09 13:48:42 +03:00
nimlgen
a2a9bfd2ec
nv correct error messages with ptx (#5341)
* nv correct error messages with ptx
* return compile error
2024-07-09 10:39:39 +03:00
George Hotz
c13da83f12
tests from lowerer branch (#5339)
* tests from lowerer branch
* Update test_image_dtype.py
* Update test_image_dtype.py
* Update test_image_dtype.py
2024-07-08 21:23:19 -07:00
chenyu
4ceab5d2b1
fix PTX match rule for gated LOAD (#5338)
* test padto sum with bool tensor and bool acc dtype
make sure bool tensor acc with gate is handled correctly
* broken in PTX
* fix ptx
2024-07-08 22:25:03 -04:00
chenyu
a80f2df1bd
fix some PTX tests (#5337)
fix broken PTX tests in test_linearizer and test_uops. some tests were skipped or broken because they ran only with CUDA=1, and we now run PTX with NV=1
2024-07-08 21:33:05 -04:00
wozeparrot
9150a6be7a
tensor metadata (#5271)
2024-07-08 17:45:40 -07:00
chenyu
7f642aa7ed
minor PTX matcher cleanup [run_process_replay] (#5336)
* minor PTX matcher cleanup [run_process_replay]
uop.cast syntactic sugar and some newline/space cleanup
* comment
2024-07-08 19:19:20 -04:00
chenyu
0f0940225a
fix Tensor.all and Tensor.any for PTX (#5335)
support boolean acc and boolean phi, and rewrite boolean max to uint8 max
2024-07-08 18:15:04 -04:00
Roelof van Dijk
053c706961
refactor: expr_view on View (#5315)
2024-07-08 11:47:34 -07:00
kormann
2349d837fb
Fix scope order in graph toposort [run_process_replay] (#5330)
* fix
* test
* nothing
2024-07-08 11:46:15 -07:00
chenyu
631bc974a0
raise line count limit to 8500 (#5331)
2024-07-08 14:00:28 -04:00
Timmy
bb7746985f
multireduce scheduler tests (#5141)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-08 20:28:55 +03:00
nimlgen
bb2222e488
nv default for ampere & ada (#5329)
2024-07-08 19:01:27 +03:00
nimlgen
51d6f372e4
nv get classes based on device (#5325)
* nv get classes
* support in mockgpu
* choose sm based on gpu
* fix
* fix
* fix arch
2024-07-08 18:25:05 +03:00
chenyu
7d049fc20c
move getting 0 and min value of a dtype to dtype.py (#5328)
clean up getting the base case for reduce ops
[run_process_replay]
2024-07-08 10:51:56 -04:00
nimlgen
b0c5c58833
nv rm_control to rmctrl type (#5327)
* nv rm_control to rmctrl type
* fix
2024-07-08 17:24:33 +03:00
Elias Wahl
73bddc44f6
Fix fake dataloader (#5326)
2024-07-08 09:07:44 -04:00
chenyu
6856f915d6
Tensor.any and Tensor.all (#5320)
does not work in PTX yet due to how boolean tensors are handled
2024-07-07 14:36:00 -04:00
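For reference, the intended any/all reduction semantics match Python's built-ins, including the empty case (a general illustration, not tinygrad code):

```python
# all is True only if no element is falsy; any is True if at least one is truthy
assert all([True, True]) is True
assert any([False, True]) is True
# empty reductions: all is vacuously True, any is False
assert all([]) is True
assert any([]) is False
```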
chenyu
2029cb7047
support passing None to Tensor.clip (#5319)
pass None for no upper bound or no lower bound
2024-07-07 13:04:22 -04:00
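A minimal plain-Python sketch of the clip-with-None semantics described above (`clip` here is a hypothetical helper, not tinygrad's implementation):

```python
def clip(x, low=None, high=None):
    # hypothetical helper: None on either side means "no bound on that side"
    if low is not None:
        x = max(x, low)
    if high is not None:
        x = min(x, high)
    return x

assert clip(5, high=3) == 3   # upper bound only
assert clip(-2, low=0) == 0   # lower bound only
assert clip(1) == 1           # no bounds: identity
```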
chenyu
296a1a36bb
update Tensor.round doc and example (#5318)
document rounding half to even and update the examples to show it
2024-07-07 12:10:39 -04:00
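Round-half-to-even (banker's rounding) is also what Python's built-in `round` does, which illustrates the behavior the doc update describes:

```python
# ties go to the nearest even integer, not always away from zero
assert round(0.5) == 0
assert round(1.5) == 2
assert round(2.5) == 2
assert round(-1.5) == -2
```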
chenyu
c1e330f302
Tensor.int and Tensor.bool (#5317)
2024-07-07 11:52:58 -04:00
nimlgen
778d1cdbee
nv allocate local memory dynamically (#5277)
* nv allocate local memory dynamically
* fix
* linter
* linter 2
* linter
* fixes
2024-07-07 17:34:49 +03:00
qazal
ae10e936e7
UOps.VECTORIZE cleanups [run_process_replay] (#5314)
* still render_cast
* one extra line ok
* these are all just vectorize
* save space
* behavior change can go in a different diff
2024-07-07 10:49:08 +03:00
greg-niemeyer
77b2ce9fc9
Add UOps.VECTORIZE [run_process_replay] (#5289)
* Add UOps.VECTORIZE to core
* Update vectorized cast tests
* Addresses code review comments
- Removes VECTORIZE from LLVMRenderer
- Add line breaks to unduly long lines
- Add noop CAST rule back
- Update asserts and add render_vectorize in
CStyleLanguage renderer
* Add missing const folding rule for VECTORIZE
Also adds corresponding test
* Fixes test_const_vectorize_fold and add assert
- Use sane types with VECTORIZE in test_const_vectorize_fold
- Add assert that sanity checks the types for VECTORIZE
* Rename test_cast_vectorized_fold
Renames test_cast_vectorized_fold to test_noop_vectorize_fold
because the test targets a very specific rule and there are
other tests for VECTORIZE.
* Revert unrelated changes
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-07 09:59:57 +03:00
qazal
2a7282c1e1
test: delete the extra cast in cstyle load [run_process_replay] [no_assert] (#5310)
* test: delete the extra cast in cstyle load [run_process_replay] [no_assert]
* assert buf_uop
* ImageDType
* ptx is actually a 64bit address
2024-07-07 09:12:49 +03:00
chenyu
cededd8eb4
minor multi cleanup (#5311)
add types, move things around, and some newlines
2024-07-06 21:55:59 -04:00
qazal
8a99514462
generalize the uops toposort spec to ptx (#5309)
* generalize spec to ptx
* redundant assert
* extra print
2024-07-07 00:06:30 +03:00
chenyu
ca0ef1700b
use precise::sin in metal (#5307)
2024-07-06 12:47:27 -04:00
qazal
5c2ca7bad4
remove UOps.SPECIAL rendering from llvm (#5306)
2024-07-06 19:28:47 +03:00
chenyu
356e5d2e54
touchup multi dtype in elementwise (#5305)
only need to check real once; also added a type annotation
2024-07-06 11:54:12 -04:00
qazal
7ddda9f9f1
hotfix: cache seen graphs in fusion (#5302)
2024-07-06 14:13:58 +03:00
qazal
11dfb19b20
track seen graphs in recursive group (#5301)
* track seen
* maybe never add realized
* ahh it needs to track sts
* delete extra check
* cache typings
* minor cleanup
2024-07-06 12:39:31 +03:00
qazal
d813617742
prescheduling refactor (#5300)
* p1
* refactor tuple
2024-07-06 12:04:03 +03:00
qazal
c1e166c08a
fix dtype mismatch for bool ops in multi (#5299)
2024-07-06 11:36:40 +03:00
chenyu
fc03fc025e
enable sin on METAL in test_dtype_alu (#5298)
2024-07-05 14:52:09 -04:00
qazal
b369e75ed0
refactor schedule creation (#5297)
2024-07-05 21:14:38 +03:00
qazal
5292d37db6
LoadOps.VIEW in the scheduler spec (#5296)
* refactor to allow_buffer_view
* tests
* fix multi
2024-07-05 19:43:50 +03:00
hikettei
1ab7a4cff0
Handling Multiple UnaryOps.BITCAST in Function for Proper Kernel Fusion [run_process_replay] (#5172)
* [Patch] added an option not to ignore view replacing when doing bitcast
* added the testcase
* [Add] reproduced in the unittest that bitcast cannot be fused into a single kernel
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-05 19:16:44 +03:00
chenyu
43c3f73fbc
handcode_bert_opt.py (#5295)
similar to handcode_resnet50_opt.py, one file to check bert kernels without a dataset.
2024-07-05 11:01:20 -04:00
nimlgen
d7835a705c
hotfix: fix metal with vars (#5294)
* hotfix: fix metal with vars
* one more place
2024-07-05 16:53:40 +03:00
nimlgen
8a548b0b6e
metal support offset (#5293)
2024-07-05 16:13:05 +03:00
qazal
1cefbb33ab
uop graph tests + type_verify cleanup (#5292)
* test_cast_alu_fold
* test_double_cast_fold + these should assert
2024-07-05 13:00:01 +03:00
qazal
341c4a29d1
hotfix: use dtype.scalar() for rendering cast [run_process_replay] [no_assert] (#5290)
2024-07-05 11:29:35 +03:00
chenyu
87d27c45ec
minor _broadcast cleanup (#5286)
`any(x==0 for x in y)` is `0 in y`.
also `get_args(ConstType)` instead of hardcoded `float, int, bool`
2024-07-04 14:25:24 -04:00
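Both simplifications mentioned above can be checked in plain Python (`ConstType` here is a stand-in for tinygrad's alias, assumed to be `Union[float, int, bool]`):

```python
from typing import Union, get_args

ys = [3, 0, 7]
# the membership test is equivalent to the explicit any(...) scan
assert any(x == 0 for x in ys) == (0 in ys)

# stand-in for tinygrad's ConstType alias (assumption, for illustration)
ConstType = Union[float, int, bool]
# get_args recovers the member types, so they need not be hardcoded
assert get_args(ConstType) == (float, int, bool)
```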
SnakeOnex
8c03816ae9
fix README example (#5284)
* fixed README example
* README test
* changed py -> python markdown code flags in README
2024-07-04 11:15:07 -04:00
nimlgen
2778b6046c
new memory scheduler (#5278)
* new memory schedule algo
* works
* fix
* fix
* linter
* tiny fixes
* do not optimize copy buffers
* more comments
* tiny cleanups
* tiny cleanups
2024-07-04 18:06:04 +03:00
nimlgen
84b3e3bb6f
hcq exec no embedded signal (#5142)
2024-07-04 13:29:21 +03:00
Tobias Fischer
0c3a35e5c2
Stable Diffusion v2 Inference (#5283)
* model implementation
* clip fix, more qol options
2024-07-03 22:47:10 -04:00
chenyu
e5ba385f03
remove first contiguous in multi from_sharded (#5121)
the second contiguous guarantees lbs are contiguous going into MultiLazyBuffer, so the first contiguous isn't needed
2024-07-03 19:42:56 -04:00