tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 05:48:08 -05:00

Author	SHA1	Message	Date
qazal	a70d1bf439	move print_diff to process replay [pr] (#8566) * move print_diff to process replay [pr] * ruff rightfully complains	2025-01-11 09:28:45 -05:00
qazal	60503c8621	use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564)	2025-01-11 06:03:48 -05:00
chenyu	6a7f971fa0	hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] (#8553)	2025-01-10 12:57:44 -05:00
George Hotz	9833fe83d8	more work on onnx imagenet [pr] (#8552) * more work on onnx imagenet [pr] * working quantization * static quant * benchmark onnx 0 dim	2025-01-09 20:28:18 -08:00
chenyu	2cbb34535c	simpler allreduce script [pr] (#8551) time everything on tensor level and get time from GlobalCounters.time_sum_s	2025-01-09 21:38:13 -05:00
chenyu	23c56817d8	update and clean up allreduce script [pr] (#8549) make `run` able to run with ring only	2025-01-09 19:35:28 -05:00
geohotstan	299d333806	Add QLinearConv, QLinearMatMul, QLinearAdd, QLinearGlobalAveragePool to onnx (#8478) * QLinearEverything * ok ort verify passes * this should be int instead * cast to int then char to do wraparound * cleaner * move contrib ops to microsoft ops --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-09 15:08:53 -08:00
chenyu	85a4397f27	fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522) * fix create_schedule_with_vars usage in allreduce benchmark [pr] because i didn't know how to use it... * increase time limit because tiny17 is slow	2025-01-07 01:30:01 -05:00
chenyu	0061dc7447	fix benchmark allreduce and add to ci [pr] (#8521)	2025-01-07 00:37:59 -05:00
geohotstan	9229867fec	Support asymmetrical pads for all pooling functions (#8109) * implemented in tensor * apply onnx tests to asymmetrical pads * better onnx op ordering * correct ceil_mode asymmetrical * fix onnx_ops comments * a few more TODOs and fix some stupidity * fix some typing * fix test * mypy still a little messed up * refactor out pad struct transformation * add simple docs for now * add whatever tests possible * add tests for _resolve_pool_pads * better err msg * whoops didn't mean to include this * retry CI * enable asymmetric pads onnx tests * better docs --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-05 16:01:08 -05:00
qazal	12fa4340b3	pickle ContextVars in process replay [pr] (#8484) * pickle ContextVars in process replay * add test_pickle_context_var [pr] * more realistic	2025-01-03 23:11:54 +08:00
geohotstan	de306c615b	[fixed] onnx pool cleanup (#8474) * pool janitor duty * actually conv allows asymmetric pads * a little prettier	2025-01-02 16:56:10 -05:00
chenyu	6fa38367bf	Revert "onnx pool ops clean up (#8471)" (#8472) This reverts commit `241db29ede`.	2025-01-02 11:04:34 -05:00
geohotstan	241db29ede	onnx pool ops clean up (#8471)	2025-01-02 10:45:30 -05:00
geohotstan	c4b13e2f6d	add onnx DequantizeLinear (#8468) * is this right? * small changes * dont support float8 * mergeable?	2025-01-02 09:52:49 -05:00
nimlgen	c18307e749	AM driver (#6923) * connect to gpu * rlc init? * gfx comp start init * early init is hardcoded, some progress with fw * gart * progress, next mqd * ring setup, still does not execute anything * ugh write correct reg * pci2: vm * pci2: start psp * vm seems to work * pci2: gfx start * pci2: fix psp ring resp * pci2: try ring * pci2: mes and some fixes * pci2: some progress * pci2: progress * pci2: mm * pci2: discovery * pci2: correct apertures * pci2: b * pci2: i * pci2: l * pci2: o * pci2: cmu * pci2: mes_kiq works * pci2: mes * pci2: kcq does not work( * pci2: unhalt gfx * ops_am * minor * check if amdgpu is there, or we will crash * bring back graph, it just works * less prints * do not init mes (not used) * remove unused files * ops_am: start move into core * ops_am: works * clcks, but still slower * faster + no mes_kiq * vm frags + remove mes * cleanup fw * gmc tiny cleanup * move to ops_amd * comment out what we dont really need * driverless * close in speed * am clean most of ips * gmc to ips * cleaner * new vm walker * comment old one * remove unused autogens * last write ups * remove psp hardcoded values * more * add logs * ih * p2p and sdma * vfio hal and interrupts * smth * amd dev iface * minor after rebase * bind for sdma * Revert "bind for sdma" This reverts commit `a90766514d`. * tmp * debug new mm * ugh, allreduce hangs fixed * p1 * works * no pci.py * cleaner a bit * smth * tiny cleanups * cleaner a bit * pciiface * linter * linter 2 * linter 3 * linter * pylint * reverted unrelated changes * unrelated * cmp tool * ugh wrong fw * clockgating * unrelated * alloc smaller chunks * this * opt sigs * collect stat * ops * upd * proclogs * proclogs2 * vfio * ruff * linter pylint * oops * mypy p1 * mem fix * mypy p2 * mypy p3 * mypy p4 * correct * minor * more tests * linter in tests * pci_regs header * minor write up * setup * do not require libs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-31 23:06:17 +03:00
chenyu	f3fdec940d	Tensor.mod (#8458) it's a python style mod. possibly can be cleaner with a floor div. relaxed the vmin for MOD slightly for cstyle negatives mod, it's more correct and might fix other bugs	2024-12-31 11:31:42 -05:00
qazal	866dfa1f23	create_schedule([x.lazydata]) -> x.schedule() in tests (#8449)	2024-12-31 03:15:52 +08:00
chenyu	b7397c1322	more typing cleanups [pr] (#8376) List, Tuple, DefaultDict	2024-12-22 05:21:03 -05:00
chenyu	18dca3c3d7	isolate train_gpt2 slow kernels [pr] (#8358) also fixed run_linearizer with var_vals=None	2024-12-20 17:59:01 -05:00
qazal	5776ea9386	hotfix: account for all changes in process_replay early stopping [pr] (#8348)	2024-12-20 23:46:46 +08:00
geohotstan	423d823c50	add GatherND and ScatterND to onnx ops (#8241) * implemented * this implementation is now correct * this is fine I guess * better variable names * finally correct gathernd * add a note * eh just leave it at this for now * teeny adjustment	2024-12-19 00:35:04 -05:00
chenyu	9789a83064	hotfix DEBUG in speed_v_theoretical.py conv (#8266) infinite loop with manual DEBUG set `DEBUG=2 python test/external/speed_v_theoretical.py -k conv` ``` File "/Users/chenyu/code/tinygrad/tinygrad/helpers.py", line 95, in __ge__ def __ge__(self, x): return self.value >= x ^^^^^^^^^^^^^^^ [Previous line repeated 4984 more times] RecursionError: maximum recursion depth exceeded in comparison ```	2024-12-15 19:44:45 -05:00
qazal	67e66ac1ab	hotfix: schedule_uop in process replay (#8260) * hotfix: schedule_uop in process replay * notes	2024-12-15 21:24:54 +08:00
chenyu	62e19649c0	lower test_conv_3x3_256_32_32_256_256 (#8226) tiny7 is slow	2024-12-13 17:15:53 -05:00
qazal	5864627abe	process replay filter warnings [pr] (#8199)	2024-12-13 17:43:43 +08:00
George Hotz	8a04a3a77a	rename LazyBuffer -> UOp [pr] (#8169) * rename LazyBuffer -> UOp [pr] * fix docs	2024-12-11 16:15:52 -08:00
chenyu	155f7df599	lower test_gemm_4096 expectation on green (#8152) getting 119 sometimes, so lowered to 115	2024-12-10 18:05:12 -05:00
qazal	07b6d5cf63	assign early folding (#8093) * assign early folding [pr] * move to to_si * - * fix generate_dataset * diff too big * no recreation, no diff * gzip * new sops from tiny10 * final try	2024-12-07 17:02:55 +08:00
chenyu	564b3a3e1b	onnx Bitwise ops (#8095) free stuff!	2024-12-06 16:58:09 -05:00
chenyu	d000c08f04	fix return type of Tensor.pow (#8091) int to power of int should return int etc, it hints that we would like to have Ops.POW	2024-12-06 13:38:29 -05:00
geohotstan	5184410fc3	combine get inputs and type_parse function in onnx [fixed] (#8081) * 1 is simpler than 2 * variable name * change error wording * shapes for sequence type must be homogeneous * bug fix for model benchmark * fix comments too --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-06 12:34:47 -05:00
chenyu	b73d9a7d24	Revert "combine get inputs and type_parse function in onnx (#8069)" (#8079) This reverts commit `074a67a6eb`.	2024-12-06 08:04:21 -05:00
geohotstan	074a67a6eb	combine get inputs and type_parse function in onnx (#8069) * 1 is simpler than 2 * variable name * change error wording * shapes for sequence type must be homogeneous	2024-12-06 07:42:35 -05:00
chenyu	5c6ed5dba6	lower test_conv_3x3_256_32_32_256_256 expectation (#8060) failed https://github.com/tinygrad/tinygrad/actions/runs/12182799887/job/33982676812#step:9:210	2024-12-05 10:30:56 -05:00
George Hotz	20878be2af	lower test_gemv_4096_16384 expectations	2024-12-05 12:08:26 +08:00
geohotstan	5ce8090d42	simple onnx_ops cleanups (#8003) * simple clean ups first * more work * kinda have adam * ooo momentum worked nicely * almost there * wow.. is the onnx test wrong * nicer optim stuff * just skip that test * small comment changes * use naming convention from other parts of codebase --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-04 15:33:03 -05:00
chenyu	0693158d28	lower v_theoretical gemv on red (#8042) tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209	2024-12-04 13:59:40 -05:00
George Hotz	08657cb7b0	hotfix: bump expectations in speed_v_theoretical	2024-12-04 19:00:33 +08:00
George Hotz	ea65c79ba2	hotfix: don't spam BEAM debug in speed_v_theoretical	2024-12-04 18:47:16 +08:00
George Hotz	09b00b1b04	hotfix: use kernel timings instead of python timings in speed_v_theoretical	2024-12-04 18:36:17 +08:00
uuuvn	e9c5b23ba1	Use MTLCompiler directly (v2) (#7920) * Use MTLCompiler directly (v2) * to_block_literal and REQUEST_TYPE_COMPILE * Rewrite command encoding * Revert to_block_literal * Maybe that's more readable to some people? * Typo and comment about stdlib caching * Update ops_metal.py * Update ops_metal.py * Update ops_metal.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-04 16:36:48 +08:00
George Hotz	09eac42fd6	cache indexed uops in st [pr] (#8008) * cache indexed uops in st [pr] * remove arg from range	2024-12-03 21:27:07 +08:00
George Hotz	b8bf5b2787	minor uop speedups [pr] (#8002) * minor uop cleaner [pr] * free uop creation speed by removing WeakValueDictionary * a lil faster * disable that test * lines * and it doesn't print non hit patterns	2024-12-03 17:04:48 +08:00
George Hotz	0905f87b68	hotfix: print only kernel time	2024-12-03 14:25:08 +08:00
chenyu	b91fa24387	script to run regressed sd conv on metal (#7995) * script to run regressed sd conv on metal this and other similar `conv2d + add` kernels contributed to most of the speed regression * # ruff: noqa: E501	2024-12-02 15:34:27 -05:00
qazal	b797aee720	uop global buf number tracking try 2 [pr] (#7912) * uop buffer init small refactor [pr] * add early * this way it doesn't need late * buffer_num * itertools.count * count from 0 * down to 380	2024-12-02 14:45:17 +08:00
George Hotz	cbcc1c20eb	second try at block linearize (#7892) * second try at block linearize * weeee, works for lil matmul * it's so beautiful * test tiny passes * fix bugs * combine matching BLOCKENDS * wrapping * test lin failures passes * those failures were fake * flip sort order * fix ptx tests * deal with store better * dumb ptx fix * expect less * reduce lines * reduce lines * less lines and cleaner * no defaultdict * tighter * simpler block_parent_count	2024-12-02 13:43:09 +08:00
George Hotz	6c1efb9a72	hotfix: amd gemv was flaky	2024-12-02 11:08:24 +08:00
chenyu	bb23469f93	lower conv threshold on red (#7948)	2024-11-28 13:31:06 -05:00
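
The Tensor.mod commit (f3fdec940d) describes a "python style mod" and contrasts it with "cstyle" behavior for negative operands. As a rough illustration only, the plain-Python sketch below shows the difference for integer operands. The helper names `python_mod` and `c_mod` are hypothetical and are not tinygrad code: the floor version mirrors Python's own `%`, while the truncating version mirrors C and is cross-checked against `math.fmod`.

```python
import math

def python_mod(a: int, b: int) -> int:
    # Floor-division semantics (Python's %): a nonzero result has the sign of the divisor b.
    return a - b * (a // b)

def c_mod(a: int, b: int) -> int:
    # Truncating-division semantics (C's %): a nonzero result has the sign of the dividend a.
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - b * q

for a, b in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
    # Python's % uses floor semantics; math.fmod truncates like C.
    assert python_mod(a, b) == a % b
    assert c_mod(a, b) == int(math.fmod(a, b))
    print(a, b, python_mod(a, b), c_mod(a, b))
```

The two conventions agree whenever both operands share a sign and differ otherwise, e.g. -7 mod 3 is 2 with floor division but -1 with truncation.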

870 Commits