George Hotz
cf66df0ea6
put load early to make pointers match (#11524)
2025-08-05 20:04:32 -07:00
George Hotz
92175626e3
prereqs: move views to codegen (#11522)
2025-08-05 19:27:58 -07:00
chenyu
c9225d22ce
only disable flaky test_jit_multidev_xfer (#11523)
2025-08-05 22:17:25 -04:00
George Hotz
f58fd3143d
cleanup fix_kernel (#11520)
* cleanup fix_kernel
* early load buffer
* early meta ops
* move those to fix_kernel_ops
* fix tests
* remote metal was flaky
* Revert "fix tests"
This reverts commit a27019383d.
* that hack broke things
* fine for ptx
2025-08-05 18:38:43 -07:00
George Hotz
067daee5be
pin torch to 2.7.1 (#11519)
2025-08-05 15:58:57 -07:00
George Hotz
b39f43c46a
optimize in rewrite, try 2 (#11518)
* changes
* fix test uops
* optimize in rewrite, try 2
2025-08-05 15:52:53 -07:00
George Hotz
07b0df0d86
hotfix: test tensor dims start at 1
2025-08-05 15:40:24 -07:00
George Hotz
4dabdf7c6d
Revert "optimize in rewrite ( #11516 )" ( #11517 )
...
This reverts commit 3b777a9e05.
2025-08-05 15:39:07 -07:00
George Hotz
3b777a9e05
optimize in rewrite (#11516)
* changes
* fix test uops
* dim shouldn't be 0
* huh, why did that one not save
2025-08-05 15:33:26 -07:00
nimlgen
ec676eddfa
nv: move base address higher (#11514)
2025-08-05 22:42:53 +03:00
qazal
7703f8b805
viz: skip flops info if estimates is symbolic (#11513)
2025-08-05 22:12:52 +03:00
nimlgen
fc4e713d1c
jit graph split tests (#11507)
* jit graph split tests
* fix
* one more test
* more tests
* fix
* xm
* remote
2025-08-05 21:32:37 +03:00
George Hotz
c57fde51f9
move swizzler to opt (#11509)
2025-08-05 11:31:30 -07:00
chenyu
ace8e9a706
fix test_conv2d_winograd (#11511)
2025-08-05 12:15:46 -04:00
chenyu
223aaa0492
clean up more conv tests (#11510)
2025-08-05 12:15:30 -04:00
Garret Castro
76e62a1c23
extract conv layer test logic (#11488)
* refactor: extract conv layer test logic
* tuple is unnecessary
* integrate _test_conv logic into all conv tests
* fix linter, forgot dilation
* undo winograd extraction
adds too many if statements for a single case
2025-08-05 11:15:54 -04:00
b1tg
8b8bd6c534
make einsum generate same kernels (#11508)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-05 11:12:52 -04:00
uuuvn
011ef8fa9d
Fix incorrect jit current batch devs reset (#11505)
`current_batch_devs = []` (in `flush_batch()`) happens between
`new_batched_devs = ...` and `current_batch_devs = new_batched_devs`, so
it doesn't actually reset anything. Things then fail to jit properly,
which roughly doubles (2x) remote bert step time (similar effects are
expected on any non-hcq backend).
2025-08-05 08:16:16 +03:00
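A minimal sketch of the ordering bug described above, with hypothetical names standing in for the actual jit batching code: the new device list is computed first, `flush_batch()` then clears the shared state, and the pre-computed value immediately clobbers that reset.

```python
# hypothetical reconstruction of the ordering bug (illustrative names, not the real jit code)
current_batch: list = []
current_batch_devs: list = []

def flush_batch():
  global current_batch, current_batch_devs
  # ... emit the batched graph exec item for current_batch here ...
  current_batch, current_batch_devs = [], []  # intended reset of the shared state

def add_item(item, dev, can_batch: bool):
  global current_batch_devs
  # computed from the *old* batch devs, before any flush happens
  new_batched_devs = current_batch_devs + ([dev] if dev not in current_batch_devs else [])
  if not can_batch:
    flush_batch()                        # clears current_batch_devs ...
  current_batch.append(item)
  current_batch_devs = new_batched_devs  # ... but this overwrites the reset with the stale
                                         # pre-flush list, so flushed devices leak into the
                                         # next batch; recomputing (or assigning) only after
                                         # the flush decision avoids the clobber
```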
chenyu
f02720ca2d
fix fuse gate_contiguous unique (#11504)
2025-08-04 23:43:31 -04:00
George Hotz
7f6acfb0d5
give define global and friends a shape (#11502)
* give define global and friends a shape
* ignore negative size
* ptx fix
2025-08-04 19:09:39 -07:00
chenyu
83385e7abc
update gradient src in ramp.py (#11499)
that's simplified now
2025-08-04 18:58:03 -04:00
qazal
846a2826ab
viz: remove TracingKey.fmt (#11482)
* viz: remove TracingKey.fmt
* remove from test too
2025-08-05 00:00:03 +03:00
chenyu
01d44e8f16
tiny reduce_gradient cleanup [pr] (#11498)
2025-08-04 16:12:53 -04:00
chenyu
8a11af01ed
remove broken paperswithcode links in doc (#11497)
2025-08-04 13:12:33 -04:00
leopf
4f0ee4e982
BPE tokenizer (#11415)
* BPE works
* refactor tok
* oops
* basic tests
* fix eval
* smaller diff
* fix error
* proper vocab decoding
* use regex for splitting
* escape ucatrange
* full compat
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-04 09:52:38 -07:00
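For context, BPE itself reduces to repeatedly applying the highest-priority learned merge to adjacent token pairs; a toy sketch of that loop (illustrative only, not the tokenizer code from this PR):

```python
# toy BPE encoder: repeatedly merge the adjacent pair with the best (lowest) merge rank
def bpe_encode(word: str, merges: dict) -> list:
  tokens = list(word)
  while len(tokens) > 1:
    ranked = [(merges.get(pair, float("inf")), i) for i, pair in enumerate(zip(tokens, tokens[1:]))]
    rank, i = min(ranked)
    if rank == float("inf"): break              # no learned merge applies anymore
    tokens[i:i+2] = [tokens[i] + tokens[i+1]]   # merge the winning pair into one token
  return tokens

# example: with merges {("l","o"): 0, ("lo","w"): 1}, "low" encodes to ["low"]
print(bpe_encode("low", {("l", "o"): 0, ("lo", "w"): 1}))
```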
b1tg
06af9f9236
fix double exception + add name, loc in error msg (#11487)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-04 13:41:23 +03:00
nimlgen
4877aa965a
ast seems to probe nv as well (#11494)
2025-08-04 11:47:07 +03:00
chenyu
e0106b6b25
1/(x*c) -> (1/c)*(1/x) (#11491)
example: 2*(2*a).reciprocal() -> a.reciprocal()
# TODO: bounds for reciprocal
# TODO: should z3 work?
2025-08-03 23:35:46 -04:00
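The rewrite moves the constant out of the reciprocal so constant folding can cancel it against an outer multiply; a quick numerical check of the identity using the public `Tensor.reciprocal` API (assuming nothing about the rewrite internals):

```python
from tinygrad import Tensor

a = Tensor([3.0, 5.0])
# 1/(x*c) -> (1/c)*(1/x): with c=2, the outer 2* cancels the folded 1/2,
# so 2*(2*a).reciprocal() should simplify to a single a.reciprocal()
print((2 * (2 * a).reciprocal()).numpy())  # ~[0.3333, 0.2]
print(a.reciprocal().numpy())              # ~[0.3333, 0.2]
```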
qazal
5870352fe1
viz: factorize llvm-mca call (#11490)
2025-08-04 00:31:23 +03:00
chenyu
dbc7807c61
enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
nimlgen
8f374ee1f7
nv: print devfmr in gsp logs (#11484)
2025-08-03 15:12:53 +03:00
chenyu
823f1a01db
move cast around expand backward to tensor.py (#11483)
2025-08-02 23:03:54 -04:00
chenyu
0ce0f51010
generic double cast folding (#11481)
b.cast(a).cast(b) -> b if a preserves all values in b
2025-08-02 19:26:37 -04:00
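Here `b.cast(a).cast(b)` means a value of dtype `b` cast to `a` and back; the fold is only valid when `a` can represent every value of `b`. A small numpy illustration of the condition (unrelated to the actual rewrite code):

```python
import numpy as np

x = np.float32(3.5)
# safe to fold: float64 preserves every float32 value, so the round trip is exact
assert np.float32(np.float64(x)) == x
# not safe: int32 does not preserve float32 values (3.5 -> 3 -> 3.0)
print(np.float32(np.int32(x)))        # 3.0, not 3.5
# not safe the other way either: float32 cannot represent every int32
i = np.int32(2**24 + 1)
print(np.int32(np.float32(i)), i)     # 16777216 vs 16777217
```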
qazal
72e0d1d0dc
viz: profile the compiler in TINY device (#11457)
* viz: profile the compiler in TINY device
* cleanup
2025-08-03 02:03:20 +03:00
chenyu
66be747908
few more dtype cast convenience methods (#11480)
2025-08-02 15:47:09 -04:00
chenyu
e22e5da9a5
move some test_dtype tests to unit (#11479)
2025-08-02 15:25:00 -04:00
nimlgen
da0b955be4
hcq: cpu can be graphed (#11474)
* hcq: cpu can be graphed
* ops
* new jit decisions
* fix test
* fix remote
* cleaner
* fix
2025-08-02 21:01:19 +03:00
chenyu
f7965f85aa
Revert "feat: faster index building ( #11462 )" ( #11478 )
...
This reverts commit 3a4deb08d2.
2025-08-02 12:50:48 -04:00
kevvz
ef7e01cadf
Fix SVD shape bug + Fix batched SVD bug (#11477)
* failing test case
* fix
* better test
* space
2025-08-02 09:47:41 -07:00
b1tg
6ecaf8e7b2
refactor: use less index and simplify reduce axes check [pr] (#11476)
* use output_shape/full_shape
* simple final_reduces check
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-02 09:44:51 -07:00
wozeparrot
3a4deb08d2
feat: faster index building (#11462)
* feat: faster index building
* feat: correct training samples
2025-08-02 11:50:18 -04:00
nimlgen
8cc2d64edb
amd: reuse create_queues for usb iface (#11473)
2025-08-02 14:40:46 +03:00
chenyu
9e8e6b45ab
grad acc train llama (#11467)
* grad acc train llama
* log step time
2025-08-01 15:54:50 -04:00
chenyu
7ad7329257
data parallel train llama (#11466)
2025-08-01 12:13:51 -04:00
nimlgen
9f2182f92f
cpu: start threading (#11324)
* cpu: threading
* syncs
* llvm
* fix
* opt
* fx
* fix
* missed sync
* one line less
* cleaner
* fix
2025-08-01 15:35:07 +03:00
qazal
c7ae1bd474
viz: more consistent border styling (#11464)
2025-08-01 09:31:06 +03:00
George Hotz
8ff03806e8
add llama layers (#11460)
* add llama layers
* add contig bw for speed
2025-07-31 16:28:04 -07:00
qazal
719827b95d
viz: add flops / mem bw to device programs (#11459)
* viz: add flops / mem bw to device programs
* better spacing style
2025-08-01 02:12:30 +03:00
chenyu
3f742a5a7c
comma space lab models benchmark (#11461)
2025-07-31 19:06:18 -04:00
George Hotz
474ee9daa5
hotfix: add contiguous_backward to llama
2025-07-31 15:07:12 -07:00