qazal
0b464df605
base change scheduling spec ( #4613 )
...
* spec and kernel cnt
* dont use half
* skip half
2024-05-16 13:30:49 +03:00
nimlgen
65f7e3b3ab
nv setup constbuf4 ( #4511 )
...
* nv correct constbuf 4
* compare results to cuda
* test fixed
* failed kernel
* repro
* revert this change
2024-05-16 10:42:35 +03:00
chenyu
04f2327ca3
fix abs of diff of uint ( #4411 )
2024-05-15 18:39:11 -04:00
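The fix above targets a classic pitfall: for unsigned integers, a naive `(a - b).abs()` wraps around instead of going negative, so the "abs" never recovers the true distance. A minimal illustration of the failure mode and the usual workaround (numpy used here purely for illustration, not tinygrad's actual code):

```python
import numpy as np

a = np.array([3], dtype=np.uint8)
b = np.array([5], dtype=np.uint8)

# Naive abs-of-difference wraps: 3 - 5 underflows to 254 in uint8,
# and abs(254) is still 254, not 2.
naive = np.abs(a - b)

# Safe pattern: subtract the smaller from the larger, so the
# difference never dips below zero in the first place.
safe = np.maximum(a, b) - np.minimum(a, b)

print(int(naive[0]))  # 254
print(int(safe[0]))   # 2
```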
chenyu
2119e0456d
redo simpler abs and sign ( #4611 )
...
moved Sign logic to function.py, and backward always returns 0 to match torch.
rewrite abs as `self * self.sign()`, so its backward also matches torch.
2024-05-15 18:19:46 -04:00
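The rewrite above works because of the product rule: with sign's backward defined as 0, d/dx [x * sign(x)] = sign(x) + x * 0 = sign(x), which is exactly torch's gradient for abs (including 0 at x = 0). A plain-Python sketch of that reasoning (not tinygrad's actual implementation):

```python
def sign(x):
    # -1, 0, or 1, via bool arithmetic
    return (x > 0) - (x < 0)

def sign_backward(x, grad_out):
    # sign's gradient is defined as 0 everywhere, matching torch
    return 0.0 * grad_out

def abs_via_sign(x):
    return x * sign(x)

def abs_backward(x, grad_out):
    # product rule: d(x * sign(x)) = sign(x)*dx + x*d(sign(x));
    # the second term vanishes because d(sign(x)) = 0
    return sign(x) * grad_out

print(abs_via_sign(-3.0))       # 3.0
print(abs_backward(-3.0, 1.0))  # -1, matching torch: d|x|/dx = -1 for x < 0
print(abs_backward(0.0, 1.0))   # 0, torch also defines d|x|/dx = 0 at x = 0
```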
nimlgen
eb9689336e
nv mockgpu ( #4600 )
...
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* not run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugn, not supported
* ci
* remove ticket from description
* better descs
2024-05-15 23:46:08 +03:00
chenyu
3c11ca452e
skip CLANG test casts between double and half for now ( #4609 )
...
started breaking after the GitHub CI image update
2024-05-15 16:17:06 -04:00
chenyu
8694eeb16d
Revert "simpler abs and sign ( #4606 )" ( #4608 )
...
This reverts commit a5e157f663.
2024-05-15 15:46:33 -04:00
chenyu
a5e157f663
simpler abs and sign ( #4606 )
2024-05-15 14:33:09 -04:00
George Hotz
5ba611787d
move image into tensor.py. delete features ( #4603 )
...
* move image into tensor.py
* change setup.py
* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
qazal
36d2ac603e
allbufs are base ( #4605 )
2024-05-15 20:46:37 +03:00
chenyu
067ff719c2
fix comment for Tensor.swish ( #4604 )
...
bad string replacement when we changed `function.` to `F.`
2024-05-15 13:32:47 -04:00
qazal
cd4d7e18c7
_recurse_lb small cleanup ( #4601 )
...
* minor cleanups
* comments
* extend env in replay
2024-05-15 19:10:42 +03:00
Ahmed Harmouche
662bca8134
Split UnaryOps.CAST into CAST and BITCAST ( #4487 )
...
* Separate cast and bitcast
* Fix lint
* No more arg[0]
* Revert "No more arg[0]"
This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964.
* CAST/BITCAST arg is the dtype only, no more tuple
* No image bitcast, regenerate dataset
* Small fixes
2024-05-15 11:43:31 -04:00
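The distinction the split above encodes: a cast converts the value into the new dtype, while a bitcast reinterprets the same raw bits as a different type. A numpy illustration (assuming IEEE-754 float32; again not tinygrad code):

```python
import numpy as np

x = np.array([1.0], dtype=np.float32)

# CAST: convert the value; the float 1.0 becomes the integer 1
casted = x.astype(np.int32)

# BITCAST: reinterpret the raw bits; the IEEE-754 encoding of 1.0
# is 0x3F800000, which reads back as the integer 1065353216
bitcasted = x.view(np.int32)

print(int(casted[0]))     # 1
print(int(bitcasted[0]))  # 1065353216
```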
George Hotz
53d082a2aa
move memory into schedule ( #4597 )
2024-05-15 07:54:20 -07:00
qazal
a4a23c40a0
test masked assign views ( #4599 )
...
* possible masked
* not contiguous mask
2024-05-15 15:06:48 +03:00
George Hotz
ff64bcab69
move graph/search to engine ( #4596 )
2024-05-14 23:12:59 -07:00
George Hotz
afa9753d39
ruff cleanup ( #4594 )
...
* check editor config
* no editorconfig, it doesn't work
* ruff cleanups
2024-05-14 21:16:14 -07:00
wozeparrot
7f009cf9fa
split arange threefry ( #4590 )
2024-05-14 21:10:22 -07:00
George Hotz
9425973bc7
docs cleanup and move ( #4593 )
...
* cleanup and move
* docs-legacy is gone
* don't update setup.py
2024-05-14 20:44:59 -07:00
George Hotz
fd02ab1e8b
move disassemblers and openpilot ( #4592 )
...
* move disassemblers and openpilot
* delete junk
* put that in pre-commit
* fixup readme
2024-05-14 19:30:02 -07:00
chenyu
2b0ee74bb6
lshift and rshift ( #4591 )
2024-05-14 19:16:31 -04:00
nimlgen
45e7400e3c
start amd cleanup ( #4583 )
2024-05-14 22:59:59 +03:00
chenyu
a65c8de735
move .half() llama freq_cis to the end of sin and cos ( #4587 )
...
otherwise arange has inf if either dim or context length exceeds half.max
2024-05-14 15:00:18 -04:00
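The overflow the reordering above avoids: float16's largest finite value is 65504, so materializing a long arange directly in half overflows to inf, while sin/cos outputs are bounded in [-1, 1] and are always safe to halve afterwards. A numpy sketch of both sides:

```python
import numpy as np

# float16 tops out at 65504; larger magnitudes overflow to inf
assert np.finfo(np.float16).max == 65504.0

# an arange cast to half overflows once values exceed that limit...
bad = np.arange(0, 100000, 1000, dtype=np.float32).astype(np.float16)
assert np.isinf(bad).any()

# ...but sin/cos outputs are bounded, so casting to half *after*
# the trig (the change above) never produces inf
good = np.sin(np.arange(0, 100000, 1000, dtype=np.float32)).astype(np.float16)
assert not np.isinf(good).any()
```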
qazal
9aa5e02229
update llmc export ( #4584 )
...
* update example
* move train to optim
* rename
* b2
2024-05-14 21:18:38 +03:00
qazal
355e1c135c
pad fusion tests ( #4570 )
...
* what breaks
* Revert "what breaks"
This reverts commit e79f679283.
* simplest case
* one unsafe op
* expand+pad, shrink+pad
* safe case
* refactor
2024-05-14 20:34:46 +03:00
chenyu
7afca52796
replace pow in LAMB by tracking b1**t and b2**t per step ( #4582 )
...
* replace pow in LAMB by tracking b1**t and b2**t per step
* remove t, add [self.b1_t, self.b2_t] to return
* adam has one less kernel
2024-05-14 13:08:22 -04:00
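The trick above: instead of recomputing b1**t and b2**t with a `pow` every optimizer step, keep running products that are updated with one multiply per step. A minimal sketch of the equivalence (illustrative only, not the LAMB implementation itself):

```python
b1, b2 = 0.9, 0.999

# running products, each updated with a single multiply per step
b1_t, b2_t = 1.0, 1.0

for t in range(1, 11):
    b1_t *= b1  # stays equal to b1 ** t
    b2_t *= b2  # stays equal to b2 ** t
    # bias correction then uses (1 - b1_t) and (1 - b2_t) directly,
    # with no pow kernel needed in the step
    assert abs(b1_t - b1 ** t) < 1e-12
    assert abs(b2_t - b2 ** t) < 1e-12
```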
nimlgen
9b02aef45a
remove rhip ( #4579 )
...
* remove rhip
* remove hip runner
2024-05-14 17:58:19 +03:00
Szymon Ożóg
5eb81ff764
Fix speed compare script ( #4581 )
...
* Fix speed compare script
* Update speed_compare_cuda_ptx.py
* Update speed_compare_cuda_ptx.py
* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen
2131556c2c
amd mockgpu ( #4535 )
...
* start mock amd gpu
* virt files
* cleaner
* init ci
* small fixes
* linter
* better?
* ugh
* linter
* fix
* diable some
* run shorter
* fixes
* add hcq test
* fix
* fix cmd revert
2024-05-14 14:28:04 +03:00
geohotstan
089eeec271
setitem in-place operator tests ( #4577 )
...
* tests and error
* rename to in-place
* add a note
* more comments
* more comments
* disable folded advanced setitem tests for now
2024-05-14 01:28:02 -04:00
chenyu
0fa57b8ce9
raise error if setitem tensors have requires_grad ( #4575 )
...
* raise error if setitem tensors have requires_grad
working on supporting this, first properly raises error
* NotImplementedError
2024-05-13 18:56:47 -04:00
Filip Brzek
f7d08bd454
feat: add acc_dtype to einsum ( #4571 )
2024-05-13 14:02:07 -04:00
Szymon Ożóg
d97d5a7689
Optimize PTX gated loads index calculation ( #4304 )
...
* WIP but working
* Cleanup
* Remove float4 pred and alt
* Cleanup
* this is somehow slowin it down
* Simplify
* add define var to ignore when optimizing gates
* Update assembly.py
* Test for optimizing gated loads
* Cleanup
* Fix NEG needed before if
* Remove unused parameters
* Update assembly.py
* Fix for cachable gone
---------
Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
qazal
c67b70ca67
small scheduler refactor ( #4569 )
...
* outputs
* consistent
* more style
* doesnt need tuple
2024-05-13 10:47:39 +03:00
qazal
77aa8659f5
use assign_targets in LazyOp creation ( #4568 )
...
* start
* correct error
* this is possible
* document it
2024-05-13 10:24:35 +03:00
qazal
b0fa97e176
assert error detail in test_assign ( #4567 )
...
* use regex assert
* that shouldnt raise
2024-05-13 09:56:05 +03:00
chenyu
25ec40ca93
cleanup dtype of tensor creation from list ( #4566 )
2024-05-13 02:47:41 -04:00
qazal
4e1135a0bc
assign buffer read/write tests ( #4565 )
...
* simple tests
* more tests
2024-05-13 09:43:36 +03:00
George Hotz
b660f60125
all uops are now cachable ( #4564 )
...
* all uops are now cachable
* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz
02327b8adf
simple stuff from new_uops branch ( #4563 )
2024-05-12 22:18:05 -07:00
ziereis
f53a23d21e
Test for optim assertion ( #4558 )
...
* add test for assertion
* whitespace
* restore state
---------
Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot
d7670f8141
quantized llama multilazybuffer fix ( #4557 )
2024-05-12 14:19:21 -07:00
ziereis
bcee4743ce
fix error message ( #4556 )
...
* fix error messgae
* typo
* add suggestion to fix error
---------
Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 12:35:51 -07:00
chenyu
01a0c1a948
slightly faster nf4 llama ( #4542 )
2024-05-12 14:24:42 -04:00
qazal
4c232dc0ae
refactor LoadOps scheduling ( #4553 )
...
* refactor
* op -> lop
2024-05-12 12:59:24 +03:00
qazal
3da152f0fe
scheduler docs 2 ( #4551 )
...
* docs
* delete cleanups
2024-05-12 12:15:39 +03:00
wozeparrot
e07c7668b3
nf4 llama ( #4540 )
2024-05-11 22:22:34 -07:00
George Hotz
7a26bdac65
move scheduleitem to schedule.py ( #4541 )
...
* move scheduleitem to schedule.py
* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666
add cpu objdump to LLVM/CLANG ( #4537 )
2024-05-11 14:28:44 -07:00
chenyu
bed70b130c
mlperf bert getenv-able EVAL_STEP_FREQ ( #4534 )
2024-05-11 14:36:56 -04:00