1410 Commits

Author SHA1 Message Date
nimlgen
1216fff781 remote: raise runtimeerror in checkz (#12453) 2025-10-05 21:22:53 +08:00
wozeparrot
d2cd269e28 fix: try close mmap (#12306) 2025-09-25 20:54:27 -07:00
wozeparrot
dc4dd898b7 fix: close mmap (#12249) 2025-09-19 14:09:12 -07:00
b1tg
54c15d74a4 python float8 support (#11960)
* basic support

* alu

* nan in exec_alu

* rand_for_dtype

* inf + 0.0

* finfo

* revert rand_for_dtype

* clean

* truncate fp8s inf

* spec ok

* float_to_fp8 nan/inf

* least_upper_dtype

* clean up

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-18 09:17:09 -04:00
nimlgen
3c5b8bf50c am: bump fw to rocm7 (#12226) 2025-09-17 21:20:22 +03:00
nimlgen
53655a4ee5 cuda: cleanup old comment (#12215) 2025-09-16 23:11:32 +03:00
nimlgen
d1ae30f7ef hcq: do not spam with errors in -m device (#12150)
* hcq: do not spam with errors in -m device

* um?

* um?

* nn

* helps?

* um?

* no gc?

* fix
2025-09-14 10:56:59 +03:00
Meng Zhuo
4b7904eca9 add cpu support for riscv64 (#12136) 2025-09-14 11:40:58 +08:00
George Hotz
b2a95d32bb check clSetKernelArg (#12149) 2025-09-13 17:24:55 +08:00
George Hotz
0695e322a8 fix android cpu device (#12148) 2025-09-13 15:42:04 +08:00
nimlgen
81e33b8439 system: cpu memory mappings are uncached (#12137)
* system: cpu memory mappings is uncached

* adm amd
2025-09-12 13:28:25 +03:00
chenyu
e306650d39 remove GPUDevice (#12106) 2025-09-10 16:35:00 -04:00
chenyu
0e266f376c ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
nimlgen
fb96394ff5 auto-select available compilers (#12094)
* device: auto select compilers

* fix

* metal+opencl

* nv/cuda

* test without ptx

* ptx

* fix tests

* fix

* fix test

* rename

* test + cleaner

* xx

* ops

* better test

* win?

* um?

* types

* debug

* win??

* sep rung

* wtf?

* debug

* skip win

* revert this

* types
2025-09-10 19:52:01 +03:00
nimlgen
21e6926a6a HostLLVMCompiler -> CPULLVMCompiler (#12096) 2025-09-10 14:04:16 +03:00
nimlgen
1c6c42715f unify cpu and llvm (#11982)
* try unify cpu and llvm

* fixes

* fix

* ops

* no llvm

* fix

* rm

* lvmm is ot

* oops

* override

* no llvm

* ignore

* skip llvm

* ooops
2025-09-09 13:54:44 +03:00
nimlgen
ebbcdd6577 cpu: use suppress_finalizing (#12071) 2025-09-08 18:28:09 +03:00
nimlgen
ef71acc88a hcq: cleanup fileio iface (#12063)
* hcq: cleanup fileio iface

* typo

* _
2025-09-07 15:43:27 +03:00
nimlgen
97187bf8b6 cleanup win and arch checks (#12060)
* cleanup win and arch checks

* stupid mypy
2025-09-06 23:08:46 +03:00
nimlgen
10ac427aaa cpu threading (#11951)
* start cpu threading

* fix

* fix2

* fix

* hacks?

* threads

* minor

* no dsp

* dsp 2

* n

* more

* test

* xm

* cleaner

* readable

* f

* reorder

* when no threads

* rangeify

* typos

* not needed

* reapply

* remoev this

* linter

* fixed cpu count in ci

* fix

* fixes

* rm

* typo

* sort based on speed

* test if test works in ci

* Revert "test if test works in ci"

This reverts commit 1f05edb531.

* do not pad thread
2025-09-06 16:13:43 +03:00
nimlgen
2b1844da27 cpu: support several threads in runtime (#12055) 2025-09-06 13:29:31 +03:00
Sieds Lykles
c6c16b2946 var_vals uses str for var (#12011)
* var_vals is str,int

* remove imports

* remove print

* fix test

* change var_vals in hcq

* update test_hcq

* fix multitensor _device_num var

* fix syminfer test

* shorten line

* p.vars stays list[Variable]

* shorten line

* vars is back to tuple[Variable, ...]

* change var_vals in extra

* change var_vals from shapetracker

* var_vals is str:int

* fix signature
2025-09-06 04:16:12 +02:00
George Hotz
9dee724fc4 make EMULATE a context var (#12002)
* make EMULATE a context var

* fix test amx
2025-09-04 11:15:43 -07:00
nimlgen
e213b85810 cpu: add thread_id to worker (#11995) 2025-09-04 14:58:13 +03:00
Sieds Lykles
572a3c15c6 Move Ops.SPECIAL arg to src (#11918)
* initial moving bound to src

* arg to src

* remove import

* fixup linearizer

* arg to src

* fix test_uop_graph

* fix more tests

* fix python renderer

* get const value from const uop

* ssimplify uop estimates

* fix webgpu locals

* fix old test

* gate Ops.SPECIAL in linearizer

* use ssimplify() for local/global_size

* remove toposort gate_parents_instead_of_self

* fix rendering in comment

* cleanup

* rename and add comments

* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
nimlgen
020abe0556 hcq: finalize without synchronization when in error state (#11872)
* hcq: finalize without synchronization when in error state

* ooops

* fix

* fix

* fix
2025-08-31 18:39:13 +03:00
b1tg
75d380a77c fix transcendentals in python renderer (#11932)
* fix transcendentals in python renderer

* add test

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 09:37:17 -04:00
b1tg
b2cc06218a python bfloat16 (#11912)
* python bf16

* _to_torch_storage_type

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-29 15:18:02 -04:00
qazal
30e72d5820 multi device and copy tracing for NULL device (#11913)
* add device name to NULL programs

* trace transfers
2025-08-29 15:31:00 +03:00
qazal
d8e1e4dc61 tracing: show NULL programs (#11911) 2025-08-29 14:09:33 +03:00
nimlgen
75678b2cbe amd: retire pm4 xcc sync (#11835)
* amd: aql default when several xccs

* amd: retire om4 xcc sync

* remove more

* more

* more
2025-08-29 09:56:27 +03:00
nimlgen
bb55a3001f nv: flush reset message (#11897) 2025-08-28 22:17:20 +03:00
nimlgen
874c1db4af am: init support for aql (#11888) 2025-08-28 18:41:46 +03:00
nimlgen
60dd9a162c memory: tiny tlsf cleanup (#11887) 2025-08-28 14:07:18 +03:00
nimlgen
62df6c39af amd: correct handling of relocations (#11863)
* amd: correct handling of relocations

* ops

* add
2025-08-27 01:26:45 +03:00
George Hotz
b268755d51 small changes from postopt (#11854) 2025-08-26 11:56:16 -07:00
nimlgen
afe14ccbfa amd: aql default when several xccs (#11832) 2025-08-26 15:16:36 +03:00
nimlgen
bba088ef11 amd aql queue (#11708)
* amd aql queue

* xcc

* fiz

* aql better

* llvm

* no for aql

* wrap

* is_sql

* am support

* complete

* fix

* mypy

* minor
2025-08-24 19:53:00 +03:00
nimlgen
e19f901330 amd: rptr/wptr in create_queue (#11817) 2025-08-24 18:03:45 +03:00
nimlgen
d71444857e amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM (#11816)
* amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM

* fix
2025-08-24 17:48:40 +03:00
nimlgen
b057a90d49 memory: rename is_huge_page -> is_page (#11786) 2025-08-22 20:08:58 +03:00
nimlgen
698392334f system: message for eaccess as well (#11785) 2025-08-22 18:21:32 +03:00
uuuvn
bd4a9473b0 Multihost exception handling (#11729)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-21 13:51:49 -04:00
nimlgen
9eff7cd1d8 am: support 64bit discovery (#11768) 2025-08-21 18:28:13 +03:00
nimlgen
6589c9e643 hcq: better errors for ifaces (#11751)
* hcq: better errors for ifaces

* fix linter

* typo

* space
2025-08-20 17:50:51 +03:00
George Hotz
bf467c623d changes from rangeify + better NullRenderer (#11732)
* changes from rangeify + better NullRenderer

* fix test
2025-08-19 12:51:54 -07:00
nimlgen
9c9e337c78 amd: parse soc enums (#11727)
* amd: parse soc enums

* remove from mock

* fix

* minimal amd_gpu
2025-08-19 15:06:09 +03:00
b1tg
61884f2057 add cstyle renderer to the NULL device (#11709)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-18 09:52:22 -07:00
nimlgen
1c62a3833b am: add versioned_header to load_fw (#11702)
* am: add versioned_header to load_fw

* fix mypy
2025-08-17 20:11:57 +03:00
nimlgen
d1224a7c4a am: check both signatures (#11694)
* am: check both signatures

* fix
2025-08-16 20:01:07 +03:00