823 Commits

Author SHA1 Message Date
nimlgen
4ed2c40d48 qcom a bit cleaner (#7380) 2024-10-29 23:50:28 +03:00
George Hotz
2cfc7b6695 Index everywhere 2 (#7363)
* indexing everywhere [pr]

* fix tests
2024-10-29 19:29:40 +08:00
George Hotz
572499c71a add indexing to ops_python (#7358)
* add indexing to ops_python

* fix image
2024-10-29 18:11:03 +08:00
chenyu
6021bf87f4 unify T = TypeVar("T") (#7342) 2024-10-28 18:43:44 -04:00
nimlgen
68cd2c0669 nv correct local memory based on device (#7307)
* nv correct local memory based on device

* linter

* oops

* oops2
2024-10-25 22:23:42 +03:00
nimlgen
98f8d0ccf9 nv limit max local memory with envvar (#7265) 2024-10-24 16:01:50 +03:00
George Hotz
de7b9d7c42 improve pre-commit [pr] (#7256)
* improve pre-commit [pr]

* mypy passes on windows
2024-10-24 15:38:47 +08:00
George Hotz
9f32a6f496 Revert "move metal tc check to renderer [pr] (#7248)" (#7251)
This reverts commit 72ddcdb4d1.
2024-10-24 10:57:09 +08:00
George Hotz
72ddcdb4d1 move metal tc check to renderer [pr] (#7248) 2024-10-24 10:38:57 +08:00
nimlgen
ea11382087 nv fix shared_memory_size (#7239) 2024-10-23 21:59:47 +03:00
qazal
aeeb917b6e mask out writable bufs in runtime access_resources (#7234) 2024-10-23 16:13:50 +03:00
nimlgen
cef7078c14 nv limit mappings debug (#7215) 2024-10-22 16:41:43 +03:00
nimlgen
21acfc39d4 qcom cleanup allocs (#7200)
* qcom cleanup allocs

* oops
2024-10-21 23:20:15 +03:00
nimlgen
81349213c0 nv min regs count is 16 (#7166) 2024-10-20 20:03:55 +03:00
chenyu
11beb67400 fix import of truncate (#7157)
truncate was moved to dtype
2024-10-18 18:41:41 -04:00
nimlgen
99fb115791 cuda correct pointer type (#7153) 2024-10-18 22:39:59 +03:00
Jacky Lee
c8b59416d0 fix: find_library can be None (#7145) 2024-10-18 20:50:52 +03:00
nimlgen
211d9753f8 nv more lc checks (#7139)
* nv more lc checks

* revert

* linter
2024-10-18 00:21:53 +03:00
George Hotz
ca0dca35f7 move ptx renderer [pr] (#7118) 2024-10-17 14:50:32 +08:00
nimlgen
d1094fce5e amd reports on hang (#7101) 2024-10-16 21:32:44 +03:00
nimlgen
83e7dbd89e nv fix reallocation local memory when oom (#7098) 2024-10-16 18:17:50 +03:00
George Hotz
cd61e81f55 beautiful mnist works on windows (#7100)
* beautiful mnist works on windows [pr]

* add comment for that (no pr)
2024-10-16 23:00:05 +08:00
nimlgen
9f00eacde5 nv tagged memory + resnet failed kernel (#7061)
* nv tagged memory

* linter

* metal fix?
2024-10-15 18:19:58 +03:00
nimlgen
586ff4c910 nv record uvm mappings (#7059)
* nv record uvm mappings

* linteeer

* smth

* ooops
2024-10-15 00:12:49 +03:00
nimlgen
8094340221 nv print info about faults (#7057)
* nv print info about faults

* unrelated changes

* nv_gpu.GT200_DEBUGGER in mockgpu

* regen with ocrrect version

* spacing
2024-10-14 21:49:38 +03:00
nimlgen
942a17109a qcom use QCOMBuffer for all allocated buffers (#7023)
* qcom use QCOMBuffer for all allocated buffers

* checks
2024-10-12 23:44:36 +03:00
George Hotz
a71bb09ec3 remove symbolic file [pr] (#7012) 2024-10-12 18:44:44 +08:00
Francis Lam
b0dd407cdd ops_cuda: add optional dynamic smem parameter (#6956)
* ops_cuda: add optional dynamic smem parameter

This is required to enable larger than 48kb shared memory usage on
a per-kernel basis.

* move setting max dynamic smem size to init
2024-10-11 21:51:06 +03:00
George Hotz
f50d0e0ee0 cloud device [pr] (#6964)
* first try at cloud device [pr]

* real separation

* we're free

* clang works

* unhappy with timeout

* better timeouts and free

* unrelated

* use http verbs + add test

* lines + better test

* fix DELETE

* shorter cloud

* split key

* fix sending renderer

* PTXRenderer serialization

* add sessions

* http.client

* minor timeout bump

* fix keep-alive

* inc server timeout

* real fix timeout

* that one too
2024-10-11 12:24:06 +08:00
nimlgen
f9d454aed5 correct kernargs alignment (#6984) 2024-10-11 00:06:28 +03:00
nimlgen
fad575ec76 qcom tiny cleanups (#6973) 2024-10-10 12:26:41 +03:00
nimlgen
f90d8493cc add HCQDEV_WAIT_TIMEOUT_MS (#6968) 2024-10-09 19:50:00 +03:00
mesozoic-egg
0e8bcda07e get readable error from wait_check (#6965)
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-10-09 17:28:58 +03:00
nimlgen
137ad5519f amd fix cwsr for gfx11 (#6950)
* amd cwsr

* ()
2024-10-08 17:44:29 +03:00
nimlgen
0d526e251e nv sync on gpu before local update (#6954) 2024-10-08 17:43:58 +03:00
vladov
20a9683403 Make self.fd Optional. (#6855)
* Make self.fd Optional.

* Fix io_uring when missing fd.

* Compress io_uring fast path code.
2024-10-08 13:25:34 +08:00
nimlgen
42609300ff hcq no timeline signals in init (#6944) 2024-10-07 23:36:19 +03:00
nimlgen
707c805a68 nv set localmem sm count to max (#6890) 2024-10-04 23:29:46 +03:00
George Hotz
6b063450df move hcq device to runtime [pr] (#6879)
* things that are only used in one place don't belong in helpers [pr]

* start moving hcq device [pr]

* fix paths
2024-10-04 22:26:50 +08:00
ignaciosica
8931f20765 CLANG fixed ops python [run_process_replay] (#6866)
* hotfix: fixed values in ops_python for AMX

* hotfix: remove unused import
2024-10-03 20:40:04 +08:00
nimlgen
8bbf6fb88c use mv_address in ops_gpu (#6856) 2024-10-02 22:31:51 +03:00
mesozoic-egg
d2e02b47e1 Construct c_ulong in blitCommandEncoder copy method (#6793)
* Construct c_ulong in blitCommandEncoder copy method

* line too long

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-10-02 11:09:37 +08:00
vladov
501cfde7e6 Fix GPT2 with OpenCL backend. (#6821)
* Fix GPT2 with OpenCL backend.

* Add test for unaligned copies into OpenCL buffers.
2024-10-01 16:57:22 +08:00
nimlgen
e213bea426 nv shorter (#6819) 2024-09-30 19:39:32 +03:00
nimlgen
b95f47784a qcom sleep when sync (#6785)
* qcom sleep when sync

* linter

* short
2024-09-27 19:14:10 +08:00
nimlgen
3c56aeee70 add Tensor.from_blob (#6765)
* draft tensor from pointer init

* some docs and types

* comment

* cleaner

* test

* malloc

* qcom cl interop

* jit example

* cleaner

* dealoc

* wording

* docs
2024-09-26 18:33:19 +08:00
mesozoic-egg
992cde05d7 Metal with CDLL instead of py-objc (#6545)
* Add CDLL interface for metal

* remove two unused functions

* Cover most of the API methods

* switch to cdll

* directly call objc message in ops_metal

* keep only obj interface

* Use direct message sending for graph

* may have found a solution to the memoryview on ctypes pointer

* buf indexing bug fixed

* fix c_int

* fix c int to bytes

* fix gpu time bug

* line savings for cdll metal core

* wip

* c int bug

* fix buf casting

* dedup for c_void_p

* dedup for c_void_p

* linter fix

* remove unused stuff

* my py fix

* more mypy error fix

* line savings

* line savings

* rename send_message to msg; add __hash__ and __eq__ for dedup

* wip

* refactor

* refactor

* remove named import from ctypes

* forgot to change variable name

* file reorg, put support.py to ops_metal

* refactor

* hash error

* remove to_ns_array

* test oom exception, fix exception change

* typevar for msg

* add back dedup

* test for compile error

* move constant to graph

* move header constant around

* get label for icb buffer

* check icb label using "in"

* wip fixing mypy reported error

* fixed mypy error

* code formatting

* all_resources dedup match previous

* code formatting

* code formatting; buffer set to objc_id

* revert changes on buf for the manual release, seems like _free is not always called

* skip unless on metal, for test_metal

* fix premature mem release causing seg fault

* test_metal check for device before importing

* Buffer should only be released under _free explicitly

* mypy fixes

* change object ownership

* test compile success

* lint fixes

* remove load_library

* wrap sel_register in cache

* simplify to_struct

* swap lines

* fix type error in to_struct

* bump line to 9800

* remove pyobjc from setup.py

* command buffer should be objc_instance and get released

* stringWithUTF8String: returns objc_instance

* Use constant for MTLPipelineOptionNone

* better explanation for [MTLBuffer contents:] return

* Use dyld_find in case the path differs

* trailing whitespace

* handle exception for methods that take error:

* load /System/Library instead of /Library

* Init c_void_p with None instead of zero for error objects

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-25 17:43:01 +08:00
nimlgen
e31552e2e0 qcom reinit queue on exec (#6728)
* qcom setup on exec as gpu=1

* linter

* gpulike

* offsets
2024-09-25 16:08:50 +08:00
nimlgen
e1caa24a92 qcom fix binded queue might be overwritten (#6712) 2024-09-25 12:45:23 +08:00
nimlgen
75b7627db7 qcom do not recreate memoryviews on updates (#6701) 2024-09-24 15:36:22 +08:00