Commit Graph

7848 Commits

Author SHA1 Message Date
George Hotz
74742c018f hotfix: setup_mock_nv_osx 2025-02-13 12:26:15 +08:00
JaSpa99
d2ff55e9c6 OSX GPUOcelot (#8209)
* add patches

* add osx test in ci

* macos specific uvm, gpfifo mask

* only do that for now

* Revert "add patches"

This reverts commit 80d3112a57.

* use fork for now

* workflow only one worker

* merge osxtests with tests

* Revert "merge osxtests with tests"

This reverts commit 3461c8f46c.

* macos pagesize 16384 (see the page-size note after this entry)

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 12:24:29 +08:00
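The page-size bullet above matters because Apple Silicon macOS uses 16 KiB VM pages rather than the 4 KiB common on x86, so anything that hardcodes 4096 breaks there. A minimal stdlib-only check:

```python
import mmap

# Apple Silicon macOS reports 16384 here; most x86 machines report 4096.
# Sizes handed to mmap-backed allocators must be multiples of this value.
print(mmap.PAGESIZE)
```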
chenyu
f4f56d7c15 move time_linearizer to extra.optimization.helpers [pr] (#9048)
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
chenyu
c15486cf39 remove contiguous in test_subbuffer_used [pr] (#9046)
test works without contiguous
2025-02-12 14:41:16 -05:00
rmtew
b3eab03055 Three things to get Windows CI working correctly: (#9047)
- Ensure that the selected backend environment variable is persisted to the next step via $GITHUB_ENV.
- It doesn't actually persist on Windows unless the shell is explicitly set to bash.
- Add an assertion to ensure the selected backend is actually used.
2025-02-12 14:41:00 -05:00
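For context: GitHub Actions steps export variables to later steps by appending NAME=value lines to the file whose path is in $GITHUB_ENV, and on Windows runners that only behaves as expected from an explicit bash shell. A minimal sketch in Python; the BACKEND variable name is hypothetical, not tinygrad's actual selector:

```python
import os

# Appending "NAME=value" to the file at $GITHUB_ENV makes the variable
# visible to every subsequent step of the same job.
with open(os.environ["GITHUB_ENV"], "a") as env_file:
    env_file.write("BACKEND=CLANG\n")  # hypothetical variable name
```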
chenyu
f53b819648 UOps. -> Ops. [pr] (#9044)
updated the comments and doc except extra
2025-02-12 12:53:23 -05:00
qazal
6811688d29 disallow VIEW(BUFFER) in tensor [pr] (#9041) 2025-02-12 17:27:35 +01:00
chenyu
7b5ac2c15e free_intermediates in bert (#9040)
also re-enable dropout and update EVAL_BS
2025-02-12 10:00:39 -05:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendental support

* log2 NaN location mismatch on Vulkan

* NaN skips
2025-02-12 19:46:53 +08:00
Ignacio Sica
aaed315fee add AMX support to LLVM (#8957)
* init amx support for llvm

* revert elf changes

* fix attributes for AMX asm calls

* add comments

* add llvm amx job to benchmarks

* cleanup

* cleanup

* hotfix: improve comments

* comment for aux buffers

* hotfix:

* move amx_tc to ClangRenderer

* merge master

* refactor

* add docs

* add corsix docs reference

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
Josh Moore
0c97c10814 TestOps: silence pytorch std()/var() degrees of freedom warnings (#9034) 2025-02-12 14:49:18 +08:00
Ignacio Sica
d581afd873 skipdata capstone (#9026) 2025-02-12 08:11:14 +08:00
chenyu
2845f8797a failed test cases for rsqrt at 0 and similar ones (#9035)
* failed test cases for rsqrt at 0 and similar ones

related to 0*inf

* this failed
2025-02-11 17:50:16 -05:00
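The 0*inf connection: rsqrt(0) is +inf under IEEE-754, so any rewrite that multiplies the result by the input (or by 0) yields NaN rather than inf. The pitfall in one line:

```python
inf = float("inf")

# IEEE-754: inf * 0 is NaN, so a rewrite of rsqrt(x) that introduces a
# multiply-by-x factor silently turns rsqrt(0) = inf into NaN.
print(inf * 0.0)  # nan
```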
nimlgen
101652a55c hcq: thread fence (#8991)
* amd: thread fence

* nv
2025-02-11 18:09:37 +03:00
George Hotz
45aae8a6bc hotfix: add External Benchmark Schedule to CI 2025-02-11 22:06:17 +08:00
nimlgen
17fa6e7619 disk: better error desc when not opened (#9028) 2025-02-11 16:31:04 +03:00
nimlgen
166670a2f2 nv: fill grid/block sizes (#9025) 2025-02-11 16:30:30 +03:00
qazal
c80603285e bring back some things from the fix_kernel_ops diff [pr] (#9027)
* bring fix_kernel_ops back [pr]

* fix
2025-02-11 14:20:31 +01:00
George Hotz
9209b85c91 add UOps.CAT (#9022)
* add UOps.CAT [pr]

* comment + no pr
2025-02-11 19:50:37 +08:00
George Hotz
a521260b7a dont reduce the ptr size, sz is base for unaligned [pr] (#9023) 2025-02-11 19:50:23 +08:00
George Hotz
d0d58a6771 add CUSTOM support to cstyle (#9020) 2025-02-11 18:02:58 +08:00
George Hotz
fb698920f1 revert scheduler change (#9019)
* Revert "cleanup ast rewriter [pr] (#9012)"

This reverts commit bf0bcb2d5a.

* Revert "kernel op cleanups + use ScheduleItem [pr] (#9009)"

This reverts commit c52cd2b437.

* Revert "construct the schedule sink 2 (#8925)"

This reverts commit cfd3db7862.
2025-02-11 11:34:12 +08:00
George Hotz
16e9e4db37 make llvm opt the default (#9017) 2025-02-11 10:08:45 +08:00
divinity76
bec4f59ce8 workaround f16 cast ambiguity (#8935)
For unknown reasons, without this change, trying to run "Llama 3.2 1B" produces the error below. FWIW, I don't know the performance impact of this change. I can't even get exo running, but this change lets me get further (before hitting a separate issue with VRAM allocation; story for another day, I suppose).

error: 
```
Failed to fetch completions: Error processing prompt (see logs with DEBUG>=2): Nvrtc Error 6, NVRTC_ERROR_COMPILATION <null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                 ^

<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                                ^

<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                                               ^

<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                                                              ^

4 errors detected in the compilation of "<null>".
```
2025-02-11 09:38:56 +08:00
chenyu
b741a9aae7 update doc of Tensor.tolist (#9016)
it returns a single value for a const tensor
2025-02-10 16:51:23 -05:00
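An illustration of the documented behavior (a sketch, assuming the usual tinygrad constructors):

```python
from tinygrad import Tensor

# a 0-d (const) tensor yields a bare Python scalar, not a list
assert Tensor(3).tolist() == 3
# shaped tensors yield (nested) lists as usual
assert Tensor([1, 2]).tolist() == [1, 2]
```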
Joel
04e64765c4 Minor typo in ReadMe (#9015) 2025-02-10 15:30:20 -05:00
chenyu
6c39aa4a6b adjust cuda ci test targets (#9014) 2025-02-10 15:29:59 -05:00
nimlgen
dfc9d6827f am_smi: print power state (#9013) 2025-02-10 23:07:39 +03:00
qazal
bf0bcb2d5a cleanup ast rewriter [pr] (#9012) 2025-02-10 19:07:59 +01:00
chenyu
586e48d696 a few more backward tests now pass (#9010) 2025-02-10 12:46:21 -05:00
chenyu
f9898f7554 update gpuocelot commit (#9011) 2025-02-10 12:18:44 -05:00
qazal
c52cd2b437 kernel op cleanups + use ScheduleItem [pr] (#9009) 2025-02-10 17:54:30 +01:00
chenyu
25fa5e4d5f enable backward tests in test_std_one_in_axis [pr] (#9007)
one correction=0 case is still broken

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-10 10:44:05 -05:00
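Why correction is delicate with a single element in the axis: the default correction=1 divides the squared deviations by n-1 = 0, giving NaN, while correction=0 divides by n and gives 0. A hedged sketch, assuming Tensor.std mirrors torch's correction argument as the tests suggest:

```python
from tinygrad import Tensor

t = Tensor([[1.0], [2.0]])  # one element along axis 1
print(t.std(axis=1).numpy())                # [nan nan]: divides by n-1 = 0
print(t.std(axis=1, correction=0).numpy())  # [0. 0.]: divides by n
```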
qazal
d426f1ad6e don't open devices in lowering (#9008) 2025-02-10 15:28:51 +01:00
qazal
cfd3db7862 construct the schedule sink 2 (#8925)
* work

* delete preload

* fix metadata

* this can keep existing

* assign pruning

* dedup early

* bfs

* cycle asserts

* move assign check

* once
2025-02-10 22:23:02 +08:00
nimlgen
3e005ca0c2 am: resize bar0 to max supported (#9006) 2025-02-10 16:48:44 +03:00
nimlgen
07cb7e701c am: fix gfx usage at 100% (#9003)
* am: fix gfx usage at 100%

* not need

* not needed

* fix power con

* not supported on 7600
2025-02-10 16:48:23 +03:00
nimlgen
f91409f038 am: fix proclogs (#9004) 2025-02-10 16:38:58 +03:00
qazal
cd77e51810 fix tensor realization bug in #8975 (#8984)
* fix tensor realization bug in #8975

* that's a reshape now

* work

* works

* give those tests better names

* test when multiple mops result in the same ShapeTracker

* test_become_existing_buf_complex is enough

* that too
2025-02-10 13:51:30 +01:00
qazal
b17ec42b56 remove const_arg (#9002)
* remove const_arg

* use -m pytest

* remove test_const_arg test; variable arg on CONST does not exist.

* use base in test_const_dtype
2025-02-10 12:45:11 +01:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
qazal
fd9f9ec772 realized base tensors become RESHAPE(BUFFER) [pr] (#8994) 2025-02-10 10:17:54 +01:00
George Hotz
910ae260cd dsp float4 fold + revectorize [pr] (#8995)
* dsp float4 fold [pr]

* revectorize

* fix reg issue

* no bool vectorize

* cleanups

* no need for that
2025-02-10 12:14:32 +08:00
George Hotz
e618efce22 COMMUTATIVE flipping is only for ints (#8996)
* COMMUTATIVE flipping is only for ints [pr]

* no pr

* comm fixes this
2025-02-10 12:01:28 +08:00
George Hotz
2983285315 use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993)
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]

* add quantize test to dsp

* fix tests

* older onnx

* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
chenyu
9119716761 update Tensor.maximum (#8992)
now it's just broadcast and UOp.maximum
2025-02-09 21:26:27 -05:00
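Behaviorally nothing changes; maximum remains the broadcast elementwise max. A quick sanity check:

```python
from tinygrad import Tensor

a = Tensor([1.0, 5.0, 3.0])
b = Tensor(4.0)  # scalar broadcasts against a
print(a.maximum(b).numpy())  # [4. 5. 4.]
```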
nimlgen
88add71c25 amd: increase sdma copy size (#8989)
* amd: increase sdma max copy size

* rm this

* fix

* fx

* ops
2025-02-09 20:53:35 +03:00
qazal
7eba5fb413 Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)

* eh

* add test_empty_buf

* can we unsupport this

* linter

* Revert "can we unsupport this"

This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
qazal
44479f8ad6 raise ValueError in view reshape for negative dims [pr] (#8988) 2025-02-09 17:27:15 +01:00
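A sketch of what the new check rejects, assuming -1 keeps its usual inferred-dimension meaning:

```python
from tinygrad import Tensor

t = Tensor.empty(2, 3)
t.reshape(3, 2)   # ok
t.reshape(-1)     # ok: -1 is inferred from the remaining dims
t.reshape(-2, 3)  # now raises ValueError instead of building a bogus view
```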
nimlgen
c6c2373bc0 replace libpciaccess autogen with just pci regs (#8983)
* replace libpciaccess autogen with just pci regs

* add pci.py
2025-02-09 18:40:45 +03:00