Commit Graph

7848 Commits

Author SHA1 Message Date
George Hotz
74742c018f hotfix: setup_mock_nv_osx 2025-02-13 12:26:15 +08:00
JaSpa99
d2ff55e9c6 OSX GPUOcelot (#8209)
* add patches

* add osx test in ci

* macos specific uvm, gpfifo mask

* only do that for now

* Revert "add patches"

This reverts commit 80d3112a57.

* use fork for now

* workflow only one worker

* merge osxtests with tests

* Revert "merge osxtests with tests"

This reverts commit 3461c8f46c.

* macos pagesize 16384 (see the page-size note after this entry)

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 12:24:29 +08:00
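The page-size bullet above matters because Apple Silicon macOS uses 16 KiB VM pages rather than the 4 KiB common on x86, so anything that hardcodes 4096 breaks there. A minimal stdlib-only check:

```python
import mmap

# Apple Silicon macOS reports 16384 here; most x86 machines report 4096.
# Sizes handed to mmap-backed allocators must be multiples of this value.
print(mmap.PAGESIZE)
```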
chenyu
f4f56d7c15 move time_linearizer to extra.optimization.helpers [pr] (#9048)
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
chenyu
c15486cf39 remove contiguous in test_subbuffer_used [pr] (#9046)
test works without contiguous
2025-02-12 14:41:16 -05:00
rmtew
b3eab03055 Three things to get Windows CI working correctly: (#9047)
- Ensure that the selected backend environment variable is persisted to the next step via $GITHUB_ENV.
- It doesn't actually persist on Windows unless the shell is explicitly set to bash.
- Add an assertion to ensure the selected backend is actually used.
2025-02-12 14:41:00 -05:00
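For context: GitHub Actions steps export variables to later steps by appending NAME=value lines to the file whose path is in $GITHUB_ENV, and on Windows runners that only behaves as expected from an explicit bash shell. A minimal sketch in Python; the BACKEND variable name is hypothetical, not tinygrad's actual selector:

```python
import os

# Appending "NAME=value" to the file at $GITHUB_ENV makes the variable
# visible to every subsequent step of the same job.
with open(os.environ["GITHUB_ENV"], "a") as env_file:
    env_file.write("BACKEND=CLANG\n")  # hypothetical variable name
```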
chenyu
f53b819648 UOps. -> Ops. [pr] (#9044)
updated the comments and doc except extra
2025-02-12 12:53:23 -05:00
qazal
6811688d29 disallow VIEW(BUFFER) in tensor [pr] (#9041) 2025-02-12 17:27:35 +01:00
chenyu
7b5ac2c15e free_intermediates in bert (#9040)
also re-enable dropout and update EVAL_BS
2025-02-12 10:00:39 -05:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendental support

* log2 NaN location mismatch on Vulkan

* NaN skips
2025-02-12 19:46:53 +08:00
Ignacio Sica
aaed315fee add AMX support to LLVM (#8957)
* init amx support for llvm

* revert elf changes

* fix attributes for AMX asm calls

* add comments

* add llvm amx job to benchmarks

* cleanup

* cleanup

* hotfix: improve comments

* comment for aux buffers

* hotfix:

* move amx_tc to ClangRenderer

* merge master

* refactor

* add docs

* add corsix docs reference

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
Josh Moore
0c97c10814 TestOps: silence pytorch std()/var() degrees of freedom warnings (#9034) 2025-02-12 14:49:18 +08:00
Ignacio Sica
d581afd873 skipdata capstone (#9026) 2025-02-12 08:11:14 +08:00
chenyu
2845f8797a failed test cases for rsqrt at 0 and similar ones (#9035)
* failed test cases for rsqrt at 0 and similar ones

related to 0*inf

* this failed
2025-02-11 17:50:16 -05:00
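The 0*inf connection: rsqrt(0) is +inf under IEEE-754, so any rewrite that multiplies the result by the input (or by 0) yields NaN rather than inf. The pitfall in one line:

```python
inf = float("inf")

# IEEE-754: inf * 0 is NaN, so a rewrite of rsqrt(x) that introduces a
# multiply-by-x factor silently turns rsqrt(0) = inf into NaN.
print(inf * 0.0)  # nan
```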
nimlgen
101652a55c hcq: thread fence (#8991)
* amd: thread fence

* nv
2025-02-11 18:09:37 +03:00
George Hotz
45aae8a6bc hotfix: add External Benchmark Schedule to CI 2025-02-11 22:06:17 +08:00
nimlgen
17fa6e7619 disk: better error desc when not opened (#9028) 2025-02-11 16:31:04 +03:00
nimlgen
166670a2f2 nv: fill grid/block sizes (#9025) 2025-02-11 16:30:30 +03:00
qazal
c80603285e bring back some things from the fix_kernel_ops diff [pr] (#9027)
* bring fix_kernel_ops back [pr]

* fix
2025-02-11 14:20:31 +01:00
George Hotz
9209b85c91 add UOps.CAT (#9022)
* add UOps.CAT [pr]

* comment + no pr
2025-02-11 19:50:37 +08:00
George Hotz
a521260b7a dont reduce the ptr size, sz is base for unaligned [pr] (#9023) 2025-02-11 19:50:23 +08:00
George Hotz
d0d58a6771 add CUSTOM support to cstyle (#9020) 2025-02-11 18:02:58 +08:00
George Hotz
fb698920f1 revert scheduler change (#9019)
* Revert "cleanup ast rewriter [pr] (#9012)"

This reverts commit bf0bcb2d5a.

* Revert "kernel op cleanups + use ScheduleItem [pr] (#9009)"

This reverts commit c52cd2b437.

* Revert "construct the schedule sink 2 (#8925)"

This reverts commit cfd3db7862.
2025-02-11 11:34:12 +08:00
George Hotz
16e9e4db37 make llvm opt the default (#9017) 2025-02-11 10:08:45 +08:00
divinity76
bec4f59ce8 workaround f16 cast ambiguity (#8935)
For unknown reasons, without this change, trying to run "Llama 3.2 1B" produces the error below. FWIW, I don't know the performance impact of this change. I can't even get exo running, but this change lets me get further (before hitting a separate issue with VRAM allocation; story for another day, I suppose).

error: 
```
Failed to fetch completions: Error processing prompt (see logs with DEBUG>=2): Nvrtc Error 6, NVRTC_ERROR_COMPILATION <null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                 ^

<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                                ^

<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                                               ^

<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                                                              ^

4 errors detected in the compilation of "<null>".
```
2025-02-11 09:38:56 +08:00
chenyu
b741a9aae7 update doc of Tensor.tolist (#9016)
it returns a single value for a const tensor
2025-02-10 16:51:23 -05:00
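An illustration of the documented behavior (a sketch, assuming the usual tinygrad constructors):

```python
from tinygrad import Tensor

# a 0-d (const) tensor yields a bare Python scalar, not a list
assert Tensor(3).tolist() == 3
# shaped tensors yield (nested) lists as usual
assert Tensor([1, 2]).tolist() == [1, 2]
```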
Joel
04e64765c4 Minor typo in ReadMe (#9015) 2025-02-10 15:30:20 -05:00
chenyu
6c39aa4a6b adjust cuda ci test targets (#9014) 2025-02-10 15:29:59 -05:00
nimlgen
dfc9d6827f am_smi: print power state (#9013) 2025-02-10 23:07:39 +03:00
qazal
bf0bcb2d5a cleanup ast rewriter [pr] (#9012) 2025-02-10 19:07:59 +01:00
chenyu
586e48d696 a few more backward tests now pass (#9010) 2025-02-10 12:46:21 -05:00
chenyu
f9898f7554 update gpuocelot commit (#9011) 2025-02-10 12:18:44 -05:00
qazal
c52cd2b437 kernel op cleanups + use ScheduleItem [pr] (#9009) 2025-02-10 17:54:30 +01:00
chenyu
25fa5e4d5f enable backward tests in test_std_one_in_axis [pr] (#9007)
one correction=0 case is still broken

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-10 10:44:05 -05:00
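Why correction is delicate with a single element in the axis: the default correction=1 divides the squared deviations by n-1 = 0, giving NaN, while correction=0 divides by n and gives 0. A hedged sketch, assuming Tensor.std mirrors torch's correction argument as the tests suggest:

```python
from tinygrad import Tensor

t = Tensor([[1.0], [2.0]])  # one element along axis 1
print(t.std(axis=1).numpy())                # [nan nan]: divides by n-1 = 0
print(t.std(axis=1, correction=0).numpy())  # [0. 0.]: divides by n
```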
qazal
d426f1ad6e don't open devices in lowering (#9008) 2025-02-10 15:28:51 +01:00
qazal
cfd3db7862 construct the schedule sink 2 (#8925)
* work

* delete preload

* fix metadata

* this can keep existing

* assign pruning

* dedup early

* bfs

* cycle asserts

* move assign check

* once
2025-02-10 22:23:02 +08:00
nimlgen
3e005ca0c2 am: resize bar0 to max supported (#9006) 2025-02-10 16:48:44 +03:00
nimlgen
07cb7e701c am: fix gfx usage at 100% (#9003)
* am: fix gfx usage at 100%

* not need

* not needed

* fix power con

* not supported on 7600
2025-02-10 16:48:23 +03:00
nimlgen
f91409f038 am: fix proclogs (#9004) 2025-02-10 16:38:58 +03:00
qazal
cd77e51810 fix tensor realization bug in #8975 (#8984)
* fix tensor realization bug in #8975

* that's a reshape now

* work

* works

* give those tests better names

* test when multiple mops result in the same ShapeTracker

* test_become_existing_buf_complex is enough

* that too
2025-02-10 13:51:30 +01:00
qazal
b17ec42b56 remove const_arg (#9002)
* remove const_arg

* use -m pytest

* remove test_const_arg test; variable arg on CONST does not exist.

* use base in test_const_dtype
2025-02-10 12:45:11 +01:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
qazal
fd9f9ec772 realized base tensors become RESHAPE(BUFFER) [pr] (#8994) 2025-02-10 10:17:54 +01:00
George Hotz
910ae260cd dsp float4 fold + revectorize [pr] (#8995)
* dsp float4 fold [pr]

* revectorize

* fix reg issue

* no bool vectorize

* cleanups

* no need for that
2025-02-10 12:14:32 +08:00
George Hotz
e618efce22 COMMUTATIVE flipping is only for ints (#8996)
* COMMUTATIVE flipping is only for ints [pr]

* no pr

* comm fixes this
2025-02-10 12:01:28 +08:00
George Hotz
2983285315 use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993)
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]

* add quantize test to dsp

* fix tests

* older onnx

* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
chenyu
9119716761 update Tensor.maximum (#8992)
now it's just broadcast and UOp.maximum
2025-02-09 21:26:27 -05:00
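Behaviorally nothing changes; maximum remains the broadcast elementwise max. A quick sanity check:

```python
from tinygrad import Tensor

a = Tensor([1.0, 5.0, 3.0])
b = Tensor(4.0)  # scalar broadcasts against a
print(a.maximum(b).numpy())  # [4. 5. 4.]
```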
nimlgen
88add71c25 amd: increase sdma copy size (#8989)
* amd: increase sdma max copy size

* rm this

* fix

* fx

* ops
2025-02-09 20:53:35 +03:00
qazal
7eba5fb413 Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)

* eh

* add test_empty_buf

* can we unsupport this

* linter

* Revert "can we unsupport this"

This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
qazal
44479f8ad6 raise ValueError in view reshape for negative dims [pr] (#8988) 2025-02-09 17:27:15 +01:00
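A sketch of what the new check rejects, assuming -1 keeps its usual inferred-dimension meaning:

```python
from tinygrad import Tensor

t = Tensor.empty(2, 3)
t.reshape(3, 2)   # ok
t.reshape(-1)     # ok: -1 is inferred from the remaining dims
t.reshape(-2, 3)  # now raises ValueError instead of building a bogus view
```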
nimlgen
c6c2373bc0 replace libpciaccess autogen with just pci regs (#8983)
* replace libpciaccess autogen with just pci regs

* add pci.py
2025-02-09 18:40:45 +03:00