Commit Graph

94 Commits

Author SHA1 Message Date
George Hotz
0695e322a8 fix android cpu device (#12148) 2025-09-13 15:42:04 +08:00
nimlgen
fb96394ff5 auto-select available compilers (#12094)
* device: auto select compilers

* fix

* metal+opencl

* nv/cuda

* test without ptx

* ptx

* fix tests

* fix

* fix test

* rename

* test + cleaner

* xx

* ops

* better test

* win?

* um?

* types

* debug

* win??

* sep rung

* wtf?

* debug

* skip win

* revert this

* types
2025-09-10 19:52:01 +03:00
nimlgen
21e6926a6a HostLLVMCompiler -> CPULLVMCompiler (#12096) 2025-09-10 14:04:16 +03:00
nimlgen
1c6c42715f unify cpu and llvm (#11982)
* try unify cpu and llvm

* fixes

* fix

* ops

* no llvm

* fix

* rm

* lvmm is ot

* oops

* override

* no llvm

* ignore

* skip llvm

* ooops
2025-09-09 13:54:44 +03:00
nimlgen
ebbcdd6577 cpu: use suppress_finalizing (#12071) 2025-09-08 18:28:09 +03:00
nimlgen
97187bf8b6 cleanup win and arch checks (#12060)
* cleanup win and arch checks

* stupid mypy
2025-09-06 23:08:46 +03:00
nimlgen
10ac427aaa cpu threading (#11951)
* start cpu threading

* fix

* fix2

* fix

* hacks?

* threads

* minor

* no dsp

* dsp 2

* n

* more

* test

* xm

* cleaner

* readable

* f

* reorder

* when no threads

* rangeify

* typos

* not needed

* reapply

* remoev this

* linter

* fixed cpu count in ci

* fix

* fixes

* rm

* typo

* sort based on speed

* test if test works in ci

* Revert "test if test works in ci"

This reverts commit 1f05edb531.

* do not pad thread
2025-09-06 16:13:43 +03:00
nimlgen
2b1844da27 cpu: support several threads in runtime (#12055) 2025-09-06 13:29:31 +03:00
nimlgen
e213b85810 cpu: add thread_id to worker (#11995) 2025-09-04 14:58:13 +03:00
ttomsa
ae0c3cfff6 change clang -march flag to -mcpu on arm (#10970)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-11 13:38:48 -04:00
nimlgen
da0b955be4 hcq: cpu can be graphed (#11474)
* hcq: cpu can be graphed

* ops

* new jit decisions

* fix test

* fix remote

* cleaner

* fix
2025-08-02 21:01:19 +03:00
nimlgen
9f2182f92f cpu: start threading (#11324)
* cpu: threading

* syncs

* llvm

* fix

* opt

* fx

* fix

* missed sync

* one line less

* cleaner

* fix
2025-08-01 15:35:07 +03:00
nimlgen
a5371f514b cpu: copies in profile (#11392)
* cpu: copies in profile

* fix

* rename to tiny?
2025-07-27 20:56:27 +03:00
nimlgen
0f374e10d2 cpu: use mmap for allocations (#11349)
* cpu: use mmap for allocations

* ops

* fix mypy
2025-07-23 20:30:18 +03:00
nimlgen
ca09c180dc cpu: remove del spam (#11343)
* cpu: remove del spam

* fix
2025-07-23 12:02:37 +03:00
nimlgen
cc3c1e4c14 hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
George Hotz
0f89660ce4 Revert "change clang -march flag to -mcpu on arm (#10841)" (#10942)
This reverts commit 897e42fd1b.
2025-06-23 16:48:28 -07:00
ttomsa
897e42fd1b change clang -march flag to -mcpu on arm (#10841)
* change clang -march flag to -mcpu with fp16 disassembly test

* fix

* add capstone to macos dependencies

* just check no cast in test

* rm import

* woops

* lets check

* move check

* llvm init before cpu chcek

* try this

* bump autogen llvm version

* also update libclang?

* revert

* add comment

* skip llvm test and add comment

* linter
2025-06-23 16:28:48 -07:00
George Hotz
0629e45332 remove cpu graph (#10836)
* remove cpu graph, it's different from the others

* remote was blacklisting CPUGraph

* remove cpugraph from dsp
2025-06-16 11:40:58 -07:00
George Hotz
413e223d6e Revert "remove cpu graph, it's different from the others (#10743)" (#10745)
This reverts commit 3d64a98432.
2025-06-09 22:40:48 -07:00
George Hotz
3d64a98432 remove cpu graph, it's different from the others (#10743)
* remove cpu graph, it's different from the others

* remote was blacklisting CPUGraph
2025-06-09 22:17:10 -07:00
quortus
9e49721c47 CPUGraph support for clang (#10014)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 07:52:35 -04:00
Alexey Zaytsev
3bce5ad2b4 clang should not emit the .comment section (#9859)
This section gets included in the finanl image, and we get a lot of garbage with DEBUG=7
2025-04-12 10:59:11 +08:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
George Hotz
b1c0d8c99d remove cpu and torch backends (#3399)
* remove cpu and torch backends

* don't copy to cpu

* use clang instead of cpu

* multitensor gathers on the first device

* clang is cpu + use default

* fixup

* bugfix
2024-02-15 16:55:39 +01:00
chenyu
ca7973f61c clean up einsum_mulacc (#3312)
* clean up einsum_mulacc

* push get_strides

* stride

* get_strides for ndim
2024-02-04 06:21:19 -05:00
Obada Khalili
b4ea0e18e3 Fix dot product on buffers with zero strides (#3303)
* skip matacc opt if the all src buffers of mul op are const buffers

* add noqa directive for long test

* unskip MALACC opt

* ensure that a_axes at least includes summation axes in order to perform np.einsum correctly

* add regression test for mulacc op

* compute a_slices using a_axes

* refactor  helper of  function to retrieve axes and slices for nonzero strides as well as summation axes

* include a regression test that uses  and  to test the behaviour indirectly
2024-02-04 05:15:06 -05:00
geohotstan
842053873d fix neg logical_not inconsistencies (#3222)
* try

* test: add logical_not tests

* gah im retarded, but this doesn't match types for const()

* fix: can't we jsut do this?

* big change: I don't actually know what I'm doing

* WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later

* BYE BYE noqa: E501

* fix: less lines and add test

* fix: rm 2 redundant tests

* fix: eq with False so we don't unintentionally implicit upcast, but it's bool anyways so w/e
2024-01-24 11:48:40 -05:00
George Hotz
23b084e70a add device name to device, all are constructed (#3221) 2024-01-23 20:34:56 -08:00
George Hotz
cc2969f690 simpler cstyle (#2966)
* simpler cstyle

* save lines
2024-01-01 16:20:10 -08:00
chenyu
b469fe3723 add CMPEQ (#2931)
* CMPEQ

* work

* fix onnx

* fix round

* fix webgpu

* prettier

* no PADTO in actions
2023-12-25 00:15:55 -05:00
chenyu
677ae7673d use np.less and torch.lt for CMPLT (#2899)
also removed one unused output_type
2023-12-21 14:37:24 -05:00
chenyu
1500aca43d remove output_type in ops_cpu and ops_torch (#2892)
now the input types are matched and checked in lazy, we can remove these output_type.
also remove the usage of least_upper_dtype in ops.py since we can just use the input type
2023-12-21 02:11:27 -05:00
chenyu
2d2c4980fe assert for elementwise dtypes in lazy (#2888)
* assert for elementwise dtypes in lazy

* no image hack

* check dtype of scalar for IMAGE=2
2023-12-21 01:42:32 -05:00
chenyu
959d9cfed4 clean up ops_torch and ops_cpu (#2819) 2023-12-17 19:35:19 -05:00
chenyu
91adb119b8 remove match_type in ops_torch and ops_cpu (#2817)
* remove match_type in ops_torch and ops_cpu

input dtypes are aligned and casted in mlops

* dict union only after python3.9

* fix that

* fix Sigmoid forward cast
2023-12-17 15:32:30 -05:00
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
wozeparrot
6d58c19736 binaryops xor (#2627)
* feat: initial xor

* feat: numpy xor

* feat: llvm xor

* feat: quick test for xor

* feat: slightly working xor in torch

* feat: xor in tensor

* feat: slightly better test
2023-12-05 13:21:42 -08:00
George Hotz
d6b404ac11 No dtype alloc (#2570)
* fix all allocs

* improve docs

* ugh fix fake alloc
2023-12-02 13:29:40 -08:00
George Hotz
f5de21e753 fast path for copy (#2548)
* fast copy

* ruff first

* flat_mv on malloc

* order + webgpu test
2023-12-01 11:34:47 -08:00
chenyu
7fec966b5e bye bye NOOP (#2534)
* bye bye NOOP

* SIN

* NEG
2023-11-30 23:10:35 -08:00
George Hotz
12fa846122 zero copy (#2531)
* zero copy

* zero copy test

* loads coder in milliseconds

* zero copy for cpu and torch

* src_from_buffer is None

* SLOW_METAL_COPY there
2023-11-30 18:38:41 -08:00
George Hotz
2c363b5f0b new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
George Hotz
6707f2588e use copyin (#2500)
* it's always copyin

* all RawBuffer are RawBufferCopyIn

* cleanups

* this fixes it

* requirements='C'

* more correct
2023-11-29 09:34:00 -08:00
George Hotz
5629fc368c Use Buffer.STORE at the end of ASTs (#2494)
* work

* store broken

* interpreteds work

* this passes

* symbolic cpu

* fix tests

* fix opt tests

* images fail

* fix InterpretedFlopCounter

* stupid hack for images
2023-11-28 20:11:37 -08:00
George Hotz
ab5d14d4ba MEM -> LOAD (#2492)
* MEM -> LOAD

* keep legacy working
2023-11-28 16:46:37 -08:00
George Hotz
756b01f46f why were these ever called buffer (#2483) 2023-11-27 21:02:07 -08:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
8e9cdef61f clean up the buffers (#2447)
* clean up the buffers

* remove allocate_output

* functools.lru_cache is methodcache

* add TestShapeTrackerSize

* cache_clear

* no 0 sz buffer, add _ on functions that shouldn't be imported

* fix size

* if -> while
2023-11-26 11:02:29 -08:00
George Hotz
8f89e21fca torch and numpy don't share ops anymore (#2412)
* torch and numpy don't share ops anymore

* that should be filtered out elsewhere

* still const

* graph + enet example cleanup

* hmm, we do still need it because of symbolic
2023-11-23 16:58:10 -08:00