* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half of the pylint errors; the other half comes from importing onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
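A minimal sketch of the training-toggle pattern these commits land, assuming the OnnxRunner/__call__ naming from the titles; the TRAINING envvar and the run() stub are illustrative, not the actual tinygrad code:
```python
from tinygrad import Tensor
from tinygrad.helpers import getenv

class OnnxRunner:
  def __call__(self, inputs: dict) -> dict:
    # flip Tensor.training once here instead of toggling it manually per call site
    old, Tensor.training = Tensor.training, bool(getenv("TRAINING"))
    try:
      return self.run(inputs)
    finally:
      Tensor.training = old

  def run(self, inputs: dict) -> dict:
    raise NotImplementedError  # the real op dispatch lives here
```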
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
* QLinearEverything
* ok ort verify passes
* this should be int instead
* cast to int then char to do wraparound
* cleaner
* move contrib ops to microsoft ops
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
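"cast to int then char to do wraparound" is the usual quantization trick: do the arithmetic at full integer width, then narrow so overflow wraps like a signed 8-bit value. A framework-free sketch (helper names are made up, and scales are assumed equal so the rescale drops out):
```python
def wrap_int8(v: int) -> int:
  # emulate the C cast to signed char: wrap into [-128, 127]
  return ((v + 128) % 256) - 128

def qlinear_add(a: int, a_zp: int, b: int, b_zp: int, out_zp: int) -> int:
  # "cast to int": full-width arithmetic; "cast to char": the final wrap
  return wrap_int8((a - a_zp) + (b - b_zp) + out_zp)

assert wrap_int8(130) == -126 and wrap_int8(-130) == 126
```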
* validate variable dims and fix buffer_parse to not use numpy
* fix var_dim parsing
* gah float16
* revert buffer_parse stuff
* revert that revert
* correct some err msgs
* add some more debug msgs I find helpful
* tensor init noop
* add an assert just for the sake of it.
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
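A hedged sketch of the numpy-free buffer_parse direction, and why float16 earned a "gah": struct's half-precision format char is "e". The dtype table below is a trimmed assumption, not the full TensorProto mapping:
```python
import struct

# ONNX TensorProto data_type codes -> struct format chars (subset)
FMT = {1: "f", 6: "i", 7: "q", 10: "e"}  # FLOAT, INT32, INT64, FLOAT16

def raw_data_to_list(raw: bytes, data_type: int) -> list:
  fmt = FMT[data_type]
  n = len(raw) // struct.calcsize(fmt)
  return list(struct.unpack(f"<{n}{fmt}", raw))  # raw_data is little-endian
```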
* implemented in tensor
* apply onnx tests to asymmetrical pads
* better onnx op ordering
* correct ceil_mode for asymmetric pads
* fix onnx_ops comments
* a few more TODOs and fix some stupidity
* fix some typing
* fix test
* mypy still a little messed up
* refactor out pad struct transformation
* add simple docs for now
* add whatever tests possible
* add tests for _resolve_pool_pads
* better err msg
* whoops didn't mean to include this
* retry CI
* enable asymmetric pads onnx tests
* better docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
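The "pad struct transformation" refactored out above boils down to reshuffling ONNX's pads layout into per-axis pairs; a minimal sketch (the helper name is invented):
```python
def onnx_pads_to_pairs(pads: list[int]) -> list[tuple[int, int]]:
  # ONNX lays pads out as [x1_begin, x2_begin, ..., x1_end, x2_end, ...]
  n = len(pads) // 2
  return [(pads[i], pads[i + n]) for i in range(n)]

assert onnx_pads_to_pairs([0, 1, 2, 3]) == [(0, 2), (1, 3)]
```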
* connect to gpu
* rlc init?
* gfx comp start init
* early init is hardcoded, some progress with fw
* gart
* progress, next mqd
* ring setup, still does not execute anything
* ugh write correct reg
* pci2: vm
* pci2: start psp
* vm seems to work
* pci2: gfx start
* pci2: fix psp ring resp
* pci2: try ring
* pci2: mes and some fixes
* pci2: some progress
* pci2: progress
* pci2: mm
* pci2: discovery
* pci2: correct apertures
* pci2: b
* pci2: i
* pci2: l
* pci2: o
* pci2: cmu
* pci2: mes_kiq works
* pci2: mes
* pci2: kcq does not work :(
* pci2: unhalt gfx
* ops_am
* minor
* check if amdgpu is there, or we will crash
* bring back graph, it just works
* less prints
* do not init mes (not used)
* remove unused files
* ops_am: start move into core
* ops_am: works
* clocks, but still slower
* faster + no mes_kiq
* vm frags + remove mes
* cleanup fw
* gmc tiny cleanup
* move to ops_amd
* comment out what we don't really need
* driverless
* close in speed
* am clean most of ips
* gmc to ips
* cleaner
* new vm walker
* comment old one
* remove unused autogens
* last write ups
* remove psp hardcoded values
* more
* add logs
* ih
* p2p and sdma
* vfio hal and interrupts
* smth
* amd dev iface
* minor after rebase
* bind for sdma
* Revert "bind for sdma"
This reverts commit a90766514d.
* tmp
* debug new mm
* ugh, allreduce hangs fixed
* p1
* works
* no pci.py
* cleaner a bit
* smth
* tiny cleanups
* cleaner a bit
* pciiface
* linter
* linter 2
* linter 3
* linter
* pylint
* reverted unrelated changes
* unrelated
* cmp tool
* ugh wrong fw
* clockgating
* unrelated
* alloc smaller chunks
* this
* opt sigs
* collect stat
* ops
* upd
* proclogs
* proclogs2
* vfio
* ruff
* linter pylint
* oops
* mypy p1
* mem fix
* mypy p2
* mypy p3
* mypy p4
* correct
* minor
* more tests
* linter in tests
* pci_regs header
* minor write up
* setup
* do not require libs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
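One hedged illustration for the "vm frags" commit: AMD GPU PTEs carry a fragment field, log2 of a contiguous aligned run in 4KB pages, so the TLB can cover a whole run with one entry. A sketch of picking the largest legal fragment, with the 2MB cap (max_frag=9) as an assumption:
```python
def fragment_for(vaddr: int, paddr: int, size: int, max_frag: int = 9) -> int:
  # largest f such that a (4KB << f)-aligned span fits at both addresses
  frag = 0
  while frag < max_frag:
    span = 0x1000 << (frag + 1)
    if vaddr % span or paddr % span or size < span: break
    frag += 1
  return frag  # the mapping code encodes this into the PTE flags

assert fragment_for(0x0, 0x0, 0x200000) == 9        # aligned 2MB run
assert fragment_for(0x1000, 0x1000, 0x200000) == 0  # misaligned -> 4KB pages
```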
it's a python-style mod; possibly can be cleaner with a floor div (sketched below)
relaxed the vmin for MOD slightly for C-style negative mod; it's more correct and might fix other bugs
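Concretely: Python's % takes the sign of the divisor (floored division) while C's % takes the sign of the dividend (truncated division), which is why a cstyle backend needs MOD's vmin relaxed below zero. The floor-div form mentioned above, as a quick check:
```python
def py_mod(a: int, b: int) -> int:
  return a - (a // b) * b  # floor div form; equals Python's a % b

assert py_mod(-7, 3) == -7 % 3 == 2  # python-style: follows the divisor's sign
# a cstyle backend computes -7 % 3 == -1, i.e. the result can go negative
```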
* validate that FC exists before loading pretrained weights
* add test case for ResNet pretrained model without FC layer
* remove extra newline
* rename test case
* reraise exception if not handled by check
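A hedged sketch of the FC validation (names assumed, not the actual model-loading code): when the model was built without a classification head, drop the pretrained fc.* weights instead of failing, and let anything else propagate, matching the "reraise exception if not handled by check" commit:
```python
def filter_pretrained(model, state_dict: dict) -> dict:
  # model built without an fc layer: skip the classifier weights, keep the rest
  if getattr(model, "fc", None) is None:
    return {k: v for k, v in state_dict.items() if not k.startswith("fc.")}
  return state_dict  # any other mismatch should still raise during loading
```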
* implemented
* this implementation is now correct
* this is fine I guess
* better variable names
* finally correct gathernd
* add a note
* eh just leave it at this for now
* teeny adjustment
* move to_python_const out
* move more over
* try deleting alternative gather implementation
* Revert "try deleting alternative gather implementation"
This reverts commit d46b30b717.
* add types to onnx ops
* better debug msg
* improve some com.microsoft too
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
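For reference, the semantics "finally correct gathernd" has to match, written as a plain numpy sketch (batch_dims=0); the actual implementation builds this from Tensor ops:
```python
import numpy as np

def gather_nd(data: np.ndarray, indices: np.ndarray) -> np.ndarray:
  # each length-k row of indices picks into the first k dims of data
  k = indices.shape[-1]
  picked = [np.asarray(data[tuple(i)]) for i in indices.reshape(-1, k)]
  return np.stack(picked).reshape(*indices.shape[:-1], *data.shape[k:])

d = np.arange(8).reshape(2, 2, 2)
assert gather_nd(d, np.array([[0, 1], [1, 0]])).tolist() == [[2, 3], [4, 5]]
```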
* assign early folding [pr]
* move to to_si
* -
* fix generate_dataset
* diff too big
* no recreation, no diff
* gzip
* new sops from tiny10
* final try
* 1 is simpler than 2
* variable name
* change error wording
* shapes for sequence type must be homogeneous
* bug fix for model benchmark
* fix comments too
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
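"shapes for sequence type must be homogeneous" as a trivial sketch of the check (the error wording is approximated from the commit):
```python
def check_sequence_shapes(tensors) -> None:
  shapes = {tuple(t.shape) for t in tensors}
  if len(shapes) > 1:
    raise RuntimeError(f"shapes for sequence type must be homogeneous, got {shapes}")
```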
* feelsbadman
* feelsextrabadman
* make sure indices is on the same device as the self Tensor
* renamed to _one_hot_along_dim
* revert onnx change will do them in onnx only PRs
* address feedback
* add onnx changes here too
* make pad arg better
* revert pad arg
* maybe still keep dim
* simplify onehot onnx ops more
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
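A hedged sketch of the _one_hot_along_dim idea for the common last-dim case; the name comes from the rename commit, and the device fix is where the arange is built:
```python
from tinygrad import Tensor

def one_hot_last_dim(indices: Tensor, num_classes: int) -> Tensor:
  # build the arange on indices.device so the compare never crosses devices
  rng = Tensor.arange(num_classes, device=indices.device, requires_grad=False)
  return (indices.unsqueeze(-1) == rng).float()
```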
* start
* simplify ops
* why did this not work before
* will split buffer parse to separate pr
* flip the error order
* only this much for now
* to_python_const clean up
* minimize diff
* move tensor_methods into onnx.py
* improve some type signatures
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
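For context on the to_python_const clean up, the helper's job sketched loosely (not the actual signature): turn small Tensor inputs, e.g. ONNX attributes, into plain Python values so ops can treat them as static arguments:
```python
from tinygrad import Tensor

def to_python_const(t):
  if not isinstance(t, Tensor): return t
  return t.item() if t.numel() == 1 else t.tolist()
```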
* simple clean ups first
* more work
* kinda have adam
* ooo momentum worked nicely
* almost there
* wow.. is the onnx test wrong
* nicer optim stuff
* just skip that test
* small comment changes
* use naming convention from other parts of codebase
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
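The ONNX training ops wired up here (momentum, Adam) reduce to the standard update rules; a framework-free Adam step for reference, with hyperparameter names following the ONNX spec and state returned explicitly:
```python
def adam_step(x, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
  m = beta1 * m + (1 - beta1) * g        # first moment
  v = beta2 * v + (1 - beta2) * g * g    # second moment
  m_hat = m / (1 - beta1 ** t)           # bias correction, t starts at 1
  v_hat = v / (1 - beta2 ** t)
  return x - lr * m_hat / (v_hat ** 0.5 + eps), m, v
```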