Commit Graph

10633 Commits

Author SHA1 Message Date
wozeparrot
d269bc95fa faster tinychat (#5993) 2024-08-08 19:16:26 -07:00
chenyu
1f1eb46af6 more failed simplified UOp div test case (#5992)
this speculative div was handled by "divisor" in symbolic.
2024-08-08 18:39:25 -04:00
gswangg
94dc61aa1f typeguard fixes for beautiful-mnist JIT=2 METAL=1 [run_process_replay] (#5984)
* fix 'npdtype is not a class' typeguard error

* list -> tuple to fix Tensor.shrink typeguard errors

* list -> tuple to pass global_size typeguard check

* convert src -> list to fix partition typeguard error

* add type annotation and dtype assert to do_reduce

* update partition instead of caller
2024-08-08 17:53:33 -04:00
chenyu
c3e1ae2535 add failed simplified UOp div test case (#5990)
more cases!
2024-08-08 17:37:48 -04:00
nimlgen
38d5eecc68 hcq profiler support args (#5989)
* hcq profiler support args

* bytes -> _bytes

* fix

* add test

* mypy

* not f strings

* precision
2024-08-09 00:18:36 +03:00
qazal
45b1761175 smaller test_llama_embedding + assert correctness (#5986)
* smaller test_llama_embedding in CI

* test correctness
2024-08-08 22:11:29 +03:00
Timmy
8c99bdab08 More Multireduce Tests (#5968)
* multireduce tests

* linters

* more linters

* more linters

* seeing how it works with parallel
2024-08-08 22:04:08 +03:00
chenyu
3c0924cac4 UOp int alu patterns match all int (#5987)
instead of just dtypes.int
2024-08-08 14:50:58 -04:00
gswangg
df44a4e861 Make vectorization of CONST explicit (#5322)
* remove test_const_vectorize_fold

* remove const folding UPat for VECTORIZE

* refactor cstyle render_const

* remove calls to dtype.scalar() in render_const

* add assert

* add vectorized const to UOp.const

* add UPat GEP-VECTORIZE-CONST -> CONST

* render_vectorize for DEFINE_ACC in cstyle

* add back missing render_cast in render_const

* generate vectorized consts as UOps for DEFINE_ACC

* update asserts for DEFINE_ACC with VECTORIZE src

* add UPats for PHI with VECTORIZE src

* use prev rendered vectorize in DEFINE_ACC render

* update DEFINE_ACC in python runtime

* update vectorized DEFINE_ACC in PTXRenderer

* rebase DEFINE_ACC changes on lowerer

* verbose rewrite of bad UPats

* simplify UOps.CONST implementation in ops_python

* update sum_collapse UPats for DEFINE_ACC-VECTORIZE

* revert linearizer to TOT

* fix DEFINE_ACC implementation in ops_python

* simplify DEFINE_ACC in cstyle

* Fix linter error

* support VECTORIZE in fold gated load/store UPat

* support VECTORIZE in other fold gated load UPats

* rewrite VECTORIZE in UPat for no input DEFINE_ACC

* simplify DEFINE_ACC render in cstyle

* make VECTORIZE rules more concise

* add more vectorize fold tests

* inline VECTORIZE-CONSTs in cstyle render

* revert VECTORIZE/GEP rule refactor

* revert cstyle render_const refactor

* inline VECTORIZE-CONSTs in cstyle render

* implicitly vectorized const rendering -> explicit

* WMMA VECTORIZE CONST process replay hacks

* VECTORIZE CONST NAN process_replay hacks

* more VECTORIZE CONST NAN hacks

* cleanup process_replay hacks

* isnan() -> not isfinite() cstyle VECTORIZE CONST

* tweak isnan and isfinite checks VECTORIZE CONST

* tweak for positive vs negative infinity VECTORIZE CONST

* add assert to PTX CONST render

* process_replay VECTORIZE CONST render parity for PTX STORE

* vmin/vmax for VECTORIZE'd CONST

* update WMMA folding rules

* add tests for WMMA VECTORIZE fold

* hack for cstyle half4 CONST zero process_replay parity

* revert PTX backend changes

* add back minimal DEFINE_ACC PTX change

* remove cstyle process_replay hacks

* remove dead code in PTX CONST render

* cleanup vmin/vmax logic for VECTORIZE'd CONSTs

* update vectorize fold tests to use DEFINE_VAR

* fix long line formatting in test

* remove unwanted merge artifact

* more vmin/vmax cleanup

* remove unnecessary asserts

* yet more vmin/vmax cleanup

* get rid of explicit VECTORIZE CONST logic in _min_max

* reuse CONST instead of creating a new one

* remove unneeded cast

* handle DType correctly in sconst

* improve readability of tests

* save a line

* save another line

* tuplize pats in src

* remove GEP-VECTORIZE pats

* add vec +0 fold

* HACK: fold only vec8 +0

* remove vectorized ALU fold hack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-08 20:59:05 +03:00
chenyu
62c77a2831 trim const in UOp div_folding (#5982)
simplify `(4*x+4*y+7)//16` to `(x+y+1)//4`.
fixed `GPU=1 UOP_IS_SYMBOLIC=1 IMAGE=2 python -m pytest test/test_ops.py -k conv`
2024-08-08 12:49:05 -04:00
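The fold in this commit message can be sanity-checked numerically. A minimal sketch (a brute-force check of the stated identity for non-negative integers, not tinygrad's rewrite rule; `check_div_fold` is a hypothetical name):

```python
# Brute-force check that (4*x + 4*y + 7) // 16 == (x + y + 1) // 4
# for non-negative integers -- the identity this commit's const trimming produces.
def check_div_fold(limit=50):
    for x in range(limit):
        for y in range(limit):
            if (4*x + 4*y + 7) // 16 != (x + y + 1) // 4:
                return False
    return True

print(check_div_fold())  # -> True
```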
qazal
e6d41b0ce7 hotfix: adjust test_backward_pass_diamond_model thresholds (#5981) 2024-08-09 00:20:53 +08:00
gswangg
08d22066ee simplify ALU vmin==vmax fold (#5962) 2024-08-08 11:29:16 -04:00
Elias Wahl
c9b4602854 no load in INITMLPERF (#5957) 2024-08-08 11:28:24 -04:00
nimlgen
183c4c91a3 fix non-jitted transfers in profile (#5980)
* fix transfers in profile

* fix linter

* sync to be sure everything is recorded
2024-08-08 17:58:08 +03:00
nimlgen
76eca0d27e nv fix host mem mappings (#5979) 2024-08-08 17:03:44 +03:00
nimlgen
e89eff11a6 amd raise when not supported arch (#5978) 2024-08-08 14:46:14 +03:00
George Hotz
bc55c8a30e pmatmul example + GB/s bugfix [run_process_replay] (#5974)
* pmatmul example + bugfix

* improve pmatmul

* Update real_pmatmul.py
2024-08-07 22:32:11 -07:00
George Hotz
c5baa3d66b hotfix: don't run OOM test in CI 2024-08-07 22:19:29 -07:00
chenyu
859d0e4709 UOp simplify (x+c0)*c1 -> x*c1+c0*c1 (#5973) 2024-08-07 21:25:22 -04:00
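The rewrite above is plain distributivity over the integers; moving the multiply inside the add lets the constant products fold. A quick numeric spot-check of the identity (a sketch with a hypothetical helper name, not tinygrad code):

```python
# Spot-check the rewrite (x + c0) * c1 -> x*c1 + c0*c1 over a range of ints.
def distribute_ok(vals=range(-10, 11)):
    return all((x + c0) * c1 == x * c1 + c0 * c1
               for x in vals for c0 in vals for c1 in vals)

print(distribute_ok())  # -> True
```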
wozeparrot
97d708252a remove realize from threefry (#5969) 2024-08-07 15:08:49 -07:00
George Hotz
bf8ec23b00 hotfix: contiguous on precompute_freqs_cis 2024-08-07 14:40:56 -07:00
wozeparrot
d3e427c8d9 fix sqlite3 locks (#5971) 2024-08-07 14:38:19 -07:00
nimlgen
cc37c99ae4 tiny hcq touchups (#5964) 2024-08-07 21:03:20 +03:00
nimlgen
8d8704af2d fix amd exec_update for locals (#5966) 2024-08-07 21:02:56 +03:00
ignaciosica
0ddcd005f5 fix priority width and give more space for src (#5509) 2024-08-07 10:48:18 -07:00
tyoc213
0c4e9dbe71 retrieve defined opencl error codes (#5792) 2024-08-07 10:46:24 -07:00
ignaciosica
4b48f166ec Refactor render_kernel for NV [run_process_replay] (#5965)
* start working on it

* blind test with process replay

* remove noqa:E501 refactoring make_cuda_dtype

* refactor even more but with known bug

* fix known bug with duplicated includes

* working locally

* add noqa:e501

* remove comment and move map

* fix qaz comments

* remove comment

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-07 20:36:04 +03:00
qazal
d6f4a61c42 graph LBScheduleItem [run_process_replay] (#5960)
* add toposort key to LBScheduleItem

* use dedup

* graph LBScheduleItem

* make that comment beautiful again

* diff_schedule utils

* update fuzz_schedule
2024-08-07 19:59:11 +03:00
George Hotz
0a8668cf30 improvements to docs 2024-08-07 09:57:24 -07:00
qazal
7677361d90 test pushing through different expands in 1 kernel (#5963)
* test pushing through different expands in 1 kernel

* realize eye

* back to test_example_matmul
2024-08-07 19:33:18 +03:00
nimlgen
564a352194 nv unify _gpu_free (#5961)
* nv unify _gpu_free

* revert this
2024-08-07 18:18:17 +03:00
Eitan Turok
39c8c9c00a Add docs (#5942)
* init commit

* finish writing

* add to docs

* fix docs

* fix typo

* delete new line

* rename to tensor properties

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-07 07:38:51 -07:00
qazal
39dda3d042 rename prescheduled items to lsi [run_process_replay] (#5959)
* rename to lsi

* fuzz_schedule more typings

* rename fuzz_schedule
2024-08-07 14:31:50 +03:00
qazal
728b7e189e diff_schedule tests [run_process_replay] (#5958)
* diff_schedule tests [run_process_replay]

* ok to run serial
2024-08-07 13:50:27 +03:00
chenyu
a7163b80d8 lower test_transcendental fuzz test threshold for sin float64 (#5956) 2024-08-07 02:04:37 -04:00
chenyu
fa3a36e576 fancier UOp div gcd folding (#5953)
combine and cancel the remaining const based on the gcd of the other terms, like SumNode.
2024-08-07 02:04:25 -04:00
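The commit gives no concrete example, but the shape of the fold can be illustrated: when every non-constant term in the numerator is a multiple of the divisor, the leftover const folds separately. A hedged numeric check of one such case (hypothetical example and helper name, not from the commit):

```python
# Illustrative case: for non-negative x, (4*x + 7) // 4 == x + 7 // 4,
# i.e. the remaining const 7 is folded against the divisor on its own.
def gcd_fold_ok(limit=200):
    return all((4*x + 7) // 4 == x + 7 // 4 for x in range(limit))

print(gcd_fold_ok())  # -> True
```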
chenyu
aa7fd7ef74 Use (-self).lt(-x+1) for UOp.ge (#5955)
matches symbolic and fixes UOP_IS_SYMBOLIC=1 arange folding
2024-08-07 01:31:27 -04:00
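The rewrite behind this commit relies on an integer-only identity: `a >= b` is equivalent to `(-a) < (-b) + 1`. A numeric check of the identity (a sketch with a hypothetical helper name, not the UOp implementation):

```python
# Over integers: a >= b  <=>  -a <= -b  <=>  -a < -b + 1.
def ge_rewrite_ok(vals=range(-20, 21)):
    return all((a >= b) == ((-a) < (-b) + 1) for a in vals for b in vals)

print(ge_rewrite_ok())  # -> True
```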
George Hotz
3d445039c2 hotfix: 8800 lines for AMX+intel tc 2024-08-06 17:50:26 -07:00
George Hotz
658d58784b embedding doesn't cast (#5952)
* embedding doesn't cast

* test the right thing

* too annoying with that test
2024-08-06 17:49:14 -07:00
wozeparrot
30d0cb2a82 fix: fix transcendental flakiness on exp float with 9.96875 (#5951) 2024-08-06 17:32:13 -07:00
George Hotz
3a0515ea22 hotfix: process_replay/diff_schedule.py to LBScheduleItem 2024-08-06 17:01:05 -07:00
chenyu
aee737bd9e divide by gcd in UOp div folding (#5949)
* divide by gcd in UOp div folding

`(6x+6y)//16 -> (3x+3y)//8` etc
simpler version

* only factor out const

* don't apply for unsigned

* don't need that if

* space
2024-08-06 20:00:57 -04:00
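The example in this commit message can also be verified by brute force. A minimal sketch (a numeric check of the identity for non-negative integers, matching the commit's note that the fold is not applied for unsigned/negative corner cases; `gcd_div_ok` is a hypothetical name):

```python
# Check the fold from the commit message:
# (6*x + 6*y) // 16 == (3*x + 3*y) // 8 after dividing the numerator
# terms and the divisor by their common factor 2.
def gcd_div_ok(limit=60):
    return all((6*x + 6*y) // 16 == (3*x + 3*y) // 8
               for x in range(limit) for y in range(limit))

print(gcd_div_ok())  # -> True
```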
George Hotz
6d1fdcfce2 don't reduce the same thing in a vector (#5950)
* don't reduce the same thing over and over

* cleaner way to write it that doesn't loop
2024-08-06 16:59:15 -07:00
qazal
d5d7f4e7b8 more TestIndexing correctness asserts [run_process_replay] (#5948)
* use torch in test_mnist_val

* more asserts
2024-08-07 01:50:42 +03:00
qazal
7f062929e8 start all cached scheduler functions with buf, st [run_process_replay] (#5946)
* start all cached scheduler functions with buf, st

- [x] _recursive_group
- [x] _recursive_lazyop
- [x] _recurse_reduceops

* use dict [run_process_replay]
2024-08-07 01:24:22 +03:00
chenyu
794796256c UOp.const_factor [run_process_replay] (#5945)
* UOp.const_factor [run_process_replay]

simplify mod and div folding

* test does not work now
2024-08-06 18:18:29 -04:00
Elias Wahl
c9862e17d4 MLPERF BERT submission scripts (#5931)
* green

* red

* fix benchmark

* log

* count train samples

* oops. 4.0 -> 4.1

* note to todo

* no pillow
2024-08-06 18:09:18 -04:00
George Hotz
73d4d51845 add LBScheduleItem type [run_process_replay] (#5944)
* add LBScheduleItem type [run_process_replay]

* minor cleanups

* fix

* fix fuzz tests

* add group cache type
2024-08-06 14:49:40 -07:00
chenyu
1dab75ae37 clean up mlperf dataloader import (#5940)
use tinygrad tqdm for dataset, and PIL Image is only needed for resnet
2024-08-06 17:10:08 -04:00
qazal
7b6496f2e6 fix the reduceops cache breaking beautiful_mnist (#5938)
* fix the reduceops cache breaking beautiful_mnist

* test_sparse_categorical_crossentropy_simple

* starting tests

* atol from test_nn

* test_sparse_categorical_crossentropy_alt

* dont use torch
2024-08-07 00:02:54 +03:00