tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-10 23:48:01 -05:00

Author	SHA1	Message	Date
George Hotz	df3b320f46	rewriter -> devectorizer [pr] (#9147 )	2025-02-18 12:42:08 +08:00
chenyu	5dc1257ce0	clean up bert fake data iterator [pr] (#9145 ) reuse the same get_data_bert path in setup and real run	2025-02-17 20:03:38 -05:00
qazal	751c517b6c	cancel viz request after the kernel clicked away [pr] (#9144 )	2025-02-17 20:19:09 +01:00
chenyu	465421b525	fix Tensor.isclose (#9143 ) many corner cases around inf and nan	2025-02-17 12:03:12 -05:00
qazal	36741cbbc1	enable real_size assert for test_conv_2x2_backward_one_view [pr] (#9142 )	2025-02-17 17:53:44 +01:00
qazal	e9ff4ef4f7	s/ScheduleContext/GrouperContext [pr] (#9141 ) * refactor to kernel context [pr] * s/ScheduleContext/GrouperContext [pr]	2025-02-17 17:14:17 +01:00
qazal	96cc9f59e0	refactor to kernel context [pr] (#9140 )	2025-02-17 16:57:14 +01:00
qazal	df6781332e	remove var_vals from the scheduler context [pr] (#9139 ) * remove var_vals from the scheduler context [pr] * maps to int	2025-02-17 16:43:50 +01:00
Ali Ladjevardi	35e9c4657b	Use proper units when printing beam time (#9103 ) * use proper units when printing beam time * refactor DEBUG=2	2025-02-17 23:41:38 +08:00
Clément Verrier	a7f91224eb	add `Tensor.isclose()` (#8844 ) * add `Tensor.isclose()` * support `equal_nan` so as to match PyTorch's behavior * update unit tests * remove some tests temporarily * re-enable one test * re-enable other test * try to fix failing tests during CI * save one line of code --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-17 10:11:40 -05:00
qazal	2b787c3b17	hotfix: lower ul.disabled opacity for viz [pr] (#9138 )	2025-02-17 15:16:48 +01:00
qazal	660c034da6	KERNEL op try 3 (#9061 ) * work * tolerate shape, maybe this is ASSIGN(RESHAPE(BUF), KERNEL) * err, it's not ASSIGN(BUF, KERNEL), it's ASSIGN(VIEW(BUF), KERNEL) * burn the boats * assign slightly works * assign works * cleanup + var_vals can exist * fine image + fix metadata * metadata, without making everything 30% slower * diff pruning * faster assign schedule * add_buffer_ops stage * add kernel_spec back * add viz display * more strict kernel_spec	2025-02-17 14:47:54 +01:00
qazal	ec80df5115	add PROGRAM renderer to viz [pr] (#9137 )	2025-02-17 14:46:08 +01:00
qazal	7b09a72682	don't display void dtype in viz nodes [pr] (#9136 ) * don't display void dtype in viz nodes [pr] * extra	2025-02-17 13:49:36 +01:00
George Hotz	4dd10d03b7	move is_increasing to ops [pr] (#9134 )	2025-02-17 19:27:48 +08:00
qazal	22c571d3cb	add kernel axis colors to viz [pr] (#9129 ) * add kernel axis colors to viz [pr] * slightly blending with white makes this nicer * space	2025-02-17 12:21:35 +01:00
George Hotz	1bf66d62cf	symbolic gets its own file [pr] (#9132 )	2025-02-17 18:55:21 +08:00
George Hotz	bd694faf6c	factor out the expander logic [pr] (#9131 )	2025-02-17 18:09:48 +08:00
quortus	5bdf0c7951	Bitcast constant folding 2.0 (#9089 ) * Prevent const folding in test_payne_hanek_reduction * Do not use list as a default parameter * Bitcast constant folding --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-17 18:08:20 +08:00
quortus	2be4529f14	Test broken const folding wraparound behavior (#9080 ) * Test broken const folding wraparound behavior * Add repro for test_payne_hanek_reduction const folding bug --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-17 17:44:56 +08:00
George Hotz	7eea9b639d	hotfix: add replay_pkl debugging env	2025-02-17 17:34:58 +08:00
George Hotz	af9d8d39d2	dsp matchers + bump line count to 11300 (#9130 )	2025-02-17 17:31:54 +08:00
quortus	638d925e4e	Prevent const folding in test_payne_hanek_reduction (#9088 ) * Prevent const folding in test_payne_hanek_reduction * Do not use list as a default parameter	2025-02-17 17:31:10 +08:00
George Hotz	9289425170	add ast to ProgramSpec + pre matcher [pr] (#9128 ) * add ast to ProgramSpec + pre matcher [pr] * cleaner cast + test fix	2025-02-17 16:39:14 +08:00
qazal	fe260ac4d7	viz/server cleanups [pr] (#9127 ) * viz/server cleanups [pr] * space	2025-02-17 09:59:41 +02:00
George Hotz	a38b47e026	hotfix: DSP doesn't use that path	2025-02-17 10:45:29 +08:00
quortus	edf7213f34	Make bitcast to the same dtype noop (#9121 )	2025-02-16 20:28:44 -05:00
Ahmed Harmouche	59fe45f947	Solve get_grouped_dims does not split issue (#9085 ) * Solve dims too large errors on webgpu * Simplify divisor find * Test square root divisor * Fix lint * Refactor into group_dims and split_dims * Refactor * Fix lint * Add back max check in _group_dims * Prefer grouping over split --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-16 19:57:29 -05:00
Ahmed Harmouche	84dc331dd1	Refactor async (#9126 )	2025-02-16 17:47:15 -05:00
qazal	6a9e5598f9	small viz touchups [pr] (#9123 )	2025-02-16 20:07:40 +01:00
qazal	b3127f38e6	faster viz data fetching with streaming [pr] (#9122 ) * refactor to generator * yield * switch to SSE * start client side + end events * start javascript work * need to redo this whole part * more correct * diff * works * diff cleanup * more diff cleanup	2025-02-16 19:31:11 +01:00
uuuvn	8926bac00a	am: profiling working (#9119 ) ops_amd.py registers device finalization via atexit.register after finalize_profile is registered in device.py, so the AM device is closed before the profile is finalized, which leads to a hang. (atexit.register is LIFO: https://docs.python.org/3.12/library/atexit.html#atexit.register) This PR moves registering device finalization to device.py, before registering profile finalization	2025-02-16 18:51:08 +03:00
qazal	97cb9cb1ed	always viz the first graph + non blocking matches fetch [pr] (#9117 ) * always display the first graph in viz [pr] * simpler * progress indicator is the matches list style * remove extra * back * res.json is still slow	2025-02-16 13:39:51 +01:00
chenyu	1fda98d14f	fix import time_linearizer [pr] (#9118 ) only test that used it was skipped in CI due to being slow	2025-02-15 21:33:28 -05:00
chenyu	c1dfe5c00d	compact get_late_rewrite_patterns [pr] (#9116 )	2025-02-15 20:33:09 -05:00
qazal	2e97022e5e	remove extra block in viz [pr] (#9115 )	2025-02-16 02:38:09 +02:00
chenyu	fd95543ff1	use scatter_reduce in scatter [pr] (#9114 )	2025-02-15 18:21:01 -05:00
chenyu	c954419bc8	minor tweak to transcendental pow (#9112 ) also added more pow with const test cases	2025-02-15 18:03:25 -05:00
chenyu	8dfa0024f0	raise in scatter if self and src have different dtype [pr] (#9109 ) raise RuntimeError that matches torch instead of implicitly casting	2025-02-15 11:21:34 -05:00
chenyu	d129ccda4c	add RAWAST back to DEBUG=3 [pr] (#9107 )	2025-02-15 09:12:51 -05:00
qazal	2e19976d03	assert views in tensor uops [pr] (#9106 )	2025-02-15 13:27:55 +02:00
George Hotz	81f5a7af7d	improve DEBUG=3 [pr] (#9105 )	2025-02-15 18:44:56 +08:00
qazal	41d143d27c	new order to prepare for becomes_map = tensor_map [pr] (#9104 )	2025-02-15 10:37:36 +01:00
George Hotz	4672d9af73	actual tests for the dsp backend [pr] (#9102 ) * actual tests for the dsp backend [pr] * fix name	2025-02-15 15:17:56 +08:00
George Hotz	7e09057afa	fixup clang devectorize (#9099 ) * fixup clang devectorize * __builtin_convertvector is some casts * dsp fixups	2025-02-15 09:29:47 +08:00
Marcello Fuschi	8824f7e9df	Make logcumsumexp numerically stable (#9050 ) * Make logcumsumexp numerically stable * Refactor * Refactor for special case ndim=0 * Refactor * Use the correct device for mask --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-14 19:25:17 -05:00
chenyu	81597ddd96	increase lr for bert (#9098 ) had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview	2025-02-14 19:10:35 -05:00
b1tg	3ad39b247b	refactor LLVMRenderer (#9090 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-02-15 08:00:31 +08:00
b1tg	1f1362fd27	add truncate_bf16 (#9078 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-02-15 07:59:09 +08:00
Ahmed Harmouche	2dc8f1867c	Synchronize webgpu (#9093 )	2025-02-15 00:52:10 +03:00
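
Note on 8926bac00a (am: profiling working): the fix relies on atexit running handlers last-in, first-out. The sketch below shows only that ordering, using the standard library. The handler labels are placeholders taken from the commit message (finalize_device is a hypothetical name) and none of this is tinygrad code. A child interpreter is used so that the exit handlers actually run and their order can be captured.

```python
import subprocess
import sys

# Child script: registers two exit handlers in the order described as buggy
# in the commit message (profile finalization first, device finalization after).
# The printed labels are illustrative placeholders, not tinygrad functions.
CHILD = """
import atexit
atexit.register(lambda: print("finalize_profile"))  # registered first, runs last
atexit.register(lambda: print("finalize_device"))   # registered last, runs first
"""

def exit_order(script: str) -> list[str]:
    """Run the script in a fresh interpreter and return the order in which
    its atexit handlers printed their labels."""
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()

order = exit_order(CHILD)
print(order)  # ['finalize_device', 'finalize_profile']
```

Because the handler registered last runs first, the device would be torn down before the profile is finalized. Registering device finalization before profile finalization, as the commit describes, reverses that order.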

7917 Commits