tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-02 18:54:58 -05:00

Author	SHA1	Message	Date
Roelof van Dijk	56b7fadc2f	perf: skip type verify with -O (#6319 )	2024-08-29 13:47:51 -07:00
qazal	7a08b881ed	st_fixup explicit UOp init [run_process_replay] (#6320 )	2024-08-29 23:21:10 +03:00
qazal	539654fbe1	graph_rewrite complexity tests [run_process_replay] (#6317 )	2024-08-29 22:39:08 +03:00
qazal	07942ef361	Proposal: Better UOps.SWIZZLE (#6309 ) * better UOps.SWIZZLE * test_swizzle_rewrite * add it to docs * show a diff * a lil more verbose * two teeny notes * hotfix: sink	2024-08-29 15:39:48 +03:00
qazal	8c50ef8b7c	start uop docs (#6291 ) * start uop docs * only need show_labels * sink comes first * hotfix: invalid * touchups * 2 space indent works * limit some buffer uops * better BARRIER doc, Op -> UOp when it makes sense. * make KernelInfo optional * more work relative links don't work * this can be local in multi reduce+pads * add UOps.SHAPETRACKER details * UOps.CONST both types * nit: local buffer isn't device Buffer, habit * nit2: dtype -> DType	2024-08-29 15:22:39 +03:00
qazal	dd4e5f1c8d	process replay rewrite (#6284 ) * process replay rewrite p2 * start some unittests + exceptions and exits * shebang * remove extra kernel init	2024-08-29 15:08:27 +03:00
pedro	7de4eac8f7	add support and tests for nearest modes in interpolate, adapt uint8 bilinear to torch implementation (#6308 ) * add `nearest` mode to interpolate matching pytorch `nearest` which is knowingly buggy + relevant TestOps * add `nearest-exact` mode to interpolate matching pytorch `nearest-exact` + relevant TestOps * fix uint8 bilinear interpolation by matching custom torch implementation * implement uint8 lerp with torch interpolation trick without converting it to float	2024-08-28 21:59:51 -07:00
George Hotz	638b4843da	fix for metal ICB issue on M1/M2 [run_process_replay] (#6313 ) * this is a working fix * better comment * repro	2024-08-28 21:31:14 -07:00
wozeparrot	cb61cfce24	feat: example and extra tweaks (#6310 )	2024-08-28 19:26:11 -07:00
wozeparrot	ea5b7910b7	AMD support gfx103x (#5926 )	2024-08-28 14:17:08 -07:00
gswangg	94a72d44d2	update CI tests in extra with UOp AST (#6290 )	2024-08-28 22:26:50 +03:00
Tobias Fischer	3517aa89d9	sdxl batched inference fixes (#6293 )	2024-08-28 07:44:58 -04:00
Roelof van Dijk	85591bd1ae	no need for functools here (#6303 )	2024-08-28 01:19:57 -07:00
nimlgen	b1e5343133	nv better error msg for p2p failure (#6301 ) * nv better error msg for p2p failure * linter * from * mypy	2024-08-28 01:40:45 +03:00
nimlgen	ac303146ca	nv sure qmd addr less than 40bits (#6288 )	2024-08-27 20:47:38 +03:00
George Hotz	5ed6c6ef3e	hotfix: 220V 15A -> 220V 20A	2024-08-27 10:20:43 -07:00
qazal	ec34d9ee36	start benchmarking ast graph rewrite (#6297 ) * ast_rewrite to ctx var * add external_benchmark_ast * refactor to asts * track lazybuffers * more work * record checkpoint * cleanup	2024-08-27 18:18:44 +03:00
qazal	552fbd5527	update llm.c with UOp ast [run_process_replay] (#6296 )	2024-08-27 15:04:54 +03:00
Tobias Fischer	211bfb6d8a	fixed batched clip computation (#6292 )	2024-08-26 20:48:15 -04:00
ignaciosica	3918f6eea0	refactor amd render_kernel (#6223 ) * refactor amd render_kernel * fix spacing * add half alias back * use itemsize * 8 instead of fixed values * reverting because it broke as no longer 32 was default * remove comment * remove nested tuples * hotfix: prefix.append * hotfix2: is not None * more diff cleanups * hotfix 4: spacing changes must not be in the same diff * revert wmma dtype rendering --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-08-27 00:28:36 +08:00
ignaciosica	3132449086	refactor _make_{cuda/clang}_dtype into render_vector_prefix (#6287 )	2024-08-26 09:14:44 -07:00
Max-We	ab2714423b	Add einsum tests (#6286 ) Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>	2024-08-26 09:09:25 -07:00
chenyu	b76f0c875e	lazy const fold idiv 1 (#6285 )	2024-08-26 10:29:59 -04:00
chenyu	af7c04ff57	Tensor.__floordiv__ (#6283 ) support Tensor.__floordiv__ and friends	2024-08-26 09:43:40 -04:00
qazal	d2f8eeed2e	make [compare_schedule] the default [run_process_replay] (#6273 ) * make [compare_schedule] the default * capture ctx * logging * set capture to false	2024-08-26 21:40:03 +08:00
qazal	067aeaeb2f	single arange fusion with graph rewrite (#6160 )	2024-08-26 18:18:16 +08:00
qazal	b4381e9777	uop output_st is Optional [run_process_replay] (#6282 )	2024-08-26 17:58:55 +08:00
qazal	1c0456af89	add UOps.SWIZZLE (#6271 ) * add UOps.SWIZZLE * flip swizzle init * generic st_fixup	2024-08-26 16:08:51 +08:00
CaltropHungerton	002f60b4c3	fix intel wmma flop counting, add flop counting tests for different tensor cores (#6192 ) * fix wmma flop counting on intel, add count tests * half * add half gemm * Update test.yml * one test * Update test_uops_stats.py * Update test_uops_stats.py * Update test_uops_stats.py * smaller matrix, use unittest skipUnless decorator	2024-08-25 18:37:05 -07:00
Tobias Fischer	331b0f5477	new clip gather (#6277 )	2024-08-25 19:27:24 -04:00
qazal	f0cc8ca5f2	generic st_fixup in scheduler graph rewrite [compare_schedule] (#6278 )	2024-08-25 11:02:17 +03:00
qazal	70015bd89c	move permute_reduces to uop movementops [run_process_replay] (#6272 )	2024-08-25 10:25:51 +03:00
chenyu	b86907c6c7	UOp.const(x.dtype, y) -> x.const(y) [run_process_replay] (#6276 )	2024-08-24 21:39:50 -04:00
chenyu	00282afa41	identity element of binary ops (#6275 ) helper for the number reduce acc is inited to (0 for ADD, 1 for MUL and -inf for MAX)	2024-08-24 18:10:19 -04:00
qazal	ee245b48a9	refactor reduceop swizzling (prep for UOps.SWIZZLE) [compare_schedule] (#6269 )	2024-08-24 18:17:19 +03:00
gswangg	3cf507ae7f	remove extra.ops and LazyOp support from Kernel (#6267 ) * remove extra.ops and BufferOps * remove extra.ops and LazyOp support in Kernel	2024-08-24 16:44:38 +03:00
qazal	ccb05d8baa	fixup neg tests [run_process_replay] (#6268 )	2024-08-24 16:35:43 +03:00
gswangg	ea76b93814	migrate test_linearizer_dumb.py to UOp AST (#6241 ) * add imports and update test_unmerged_ifs to UOp AST * test_max_simplify_and_cancel * test_expander_new_srcs * test_llama_embedding * test_unaligns_idxs * test_unrolled_float4_align * test_upcasted_stores_out_of_order * remove LazyOp * remove extra/ops and replace ReduceOps.SUM with BinaryOps.ADD	2024-08-24 16:27:29 +03:00
gswangg	e44653e25a	migrate test_linearizer_failures.py to UOp AST (#6240 ) * add imports and update test_failure_1 to UOp AST * update test_failure_2 with UOp AST * update test_failure_3 * test_failure_5 * test_failure_6 * test_failure_7 * test_failure_8 * test_failure_9 * test_failure_10 * test_failure_11 * test_failure_12 * test_failure_12_multireduce * uncomment skip and migrate test_failure_13 * test_failure_14 * test_failure_15 * test_failure_16 * test_failure_17 * test_failure_18 * test_failure_19 * test_failure_20 * test_failure_21 * test_failure_22 * test_failure_23 * test_failure_24 * test_failure_25 * test_failure_26 * test_failure_27 * test_failure_28 * test_failure_29 * test_failure_30 * test_failure_31 * test_failure_32 * test_failure_33 * test_failure_34 * test_failure_36 * test_failure_37 * test_failure_38 * test_update_39 * test_failure_40 * test_failure_41 * test_failure_42 * test_failure_43 * test_failure_44 * test_failure_45 * test_failure_46 * test_failure_47 * test_failure_48 * test_failure_49 * test_failure_50 * remove LazyOp * reskip test_failure_22 * remove extra/ops * replace ReduceOps with BinaryOps * fixup that import --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-08-24 16:26:58 +03:00
qazal	1b4ad982e5	share REDUCE_ALU in multi and schedule [run_process_replay] (#6266 )	2024-08-24 16:16:38 +03:00
gswangg	1dc6040877	migrate test_search.py to UOp AST (#6245 ) * add imports and update test_kernel_count with UOp AST * test_filter_global_buffer * remove LazyOp * remove extra.ops and ReduceOps --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-08-24 16:13:53 +03:00
qazal	ae23540d6e	refresh process replay schedule ref in reset.py (#6265 )	2024-08-24 16:12:51 +03:00
gswangg	7be5eede71	migrate test_linearizer_overflows.py to UOp AST (#6244 ) * add imports, remove ConstBuffer, and update test_overflow_1 with UOp AST * test_overflow_2 * test_overflow_3 * test_overflow_4 * test_overflow_5 * test_overflow_6 * test_overflow_7 * TestLinearizerOverflowAlt::test_overflow_1 * TestLinearizerOverflowAlt::test_overflow_2 * remove LazyOp * remove extra.ops * remove ReduceOps	2024-08-24 16:10:29 +03:00
chenyu	943ab97d24	fix Tensor.prod for multitensor (#6264 )	2024-08-24 08:52:24 -04:00
qazal	bcb2f1caa3	init REDUCE_AXIS with BinaryOps (#6256 ) * REDUCE_AXIS arg with BinaryOps * more work in kernel.py fixup sops.gz * fix TestGraphRewriteEfficiency	2024-08-24 11:28:41 +03:00
chenyu	da5cf11859	fix acc init value for MUL (#6263 )	2024-08-23 23:19:44 -04:00
wozeparrot	a7bf20c7cd	feat: updated tinybox docs (#6261 ) * feat: updated tinybox docs * fix: grammar	2024-08-23 18:27:46 -07:00
George Hotz	26498b322e	add BEAM to external_benchmark_schedule.py	2024-08-23 18:10:46 -07:00
George Hotz	53a73038e3	hotfix: TestGraphRewriteEfficiency.test_create_many_uops	2024-08-23 15:51:57 -07:00
George Hotz	7c3ba3fa8a	improve match stats + custom early reject [run_process_replay] (#6260 ) * improve match stats [run_process_replay] * custom_early_reject	2024-08-23 15:28:57 -07:00

10417 Commits