nimlgen
1216fff781
remote: raise runtimeerror in checkz ( #12453 )
2025-10-05 21:22:53 +08:00
wozeparrot
d2cd269e28
fix: try close mmap ( #12306 )
2025-09-25 20:54:27 -07:00
wozeparrot
dc4dd898b7
fix: close mmap ( #12249 )
2025-09-19 14:09:12 -07:00
b1tg
54c15d74a4
python float8 support ( #11960 )
...
* basic support
* alu
* nan in exec_alu
* rand_for_dtype
* inf + 0.0
* finfo
* revert rand_for_dtype
* clean
* truncate fp8s inf
* spec ok
* float_to_fp8 nan/inf
* least_upper_dtype
* clean up
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-18 09:17:09 -04:00
nimlgen
3c5b8bf50c
am: bump fw to rocm7 ( #12226 )
2025-09-17 21:20:22 +03:00
nimlgen
53655a4ee5
cuda: cleanup old comment ( #12215 )
2025-09-16 23:11:32 +03:00
nimlgen
d1ae30f7ef
hcq: do not spam with errors in -m device ( #12150 )
...
* hcq: do not spam with errors in -m device
* um?
* um?
* nn
* helps?
* um?
* no gc?
* fix
2025-09-14 10:56:59 +03:00
Meng Zhuo
4b7904eca9
add cpu support for riscv64 ( #12136 )
2025-09-14 11:40:58 +08:00
George Hotz
b2a95d32bb
check clSetKernelArg ( #12149 )
2025-09-13 17:24:55 +08:00
George Hotz
0695e322a8
fix android cpu device ( #12148 )
2025-09-13 15:42:04 +08:00
nimlgen
81e33b8439
system: cpu memory mappings are uncached ( #12137 )
...
* system: cpu memory mappings is uncached
* adm amd
2025-09-12 13:28:25 +03:00
chenyu
e306650d39
remove GPUDevice ( #12106 )
2025-09-10 16:35:00 -04:00
chenyu
0e266f376c
ops_gpu -> ops_cl ( #12103 )
2025-09-10 15:15:48 -04:00
nimlgen
fb96394ff5
auto-select available compilers ( #12094 )
...
* device: auto select compilers
* fix
* metal+opencl
* nv/cuda
* test without ptx
* ptx
* fix tests
* fix
* fix test
* rename
* test + cleaner
* xx
* ops
* better test
* win?
* um?
* types
* debug
* win??
* sep rung
* wtf?
* debug
* skip win
* revert this
* types
2025-09-10 19:52:01 +03:00
nimlgen
21e6926a6a
HostLLVMCompiler -> CPULLVMCompiler ( #12096 )
2025-09-10 14:04:16 +03:00
nimlgen
1c6c42715f
unify cpu and llvm ( #11982 )
...
* try unify cpu and llvm
* fixes
* fix
* ops
* no llvm
* fix
* rm
* lvmm is ot
* oops
* override
* no llvm
* ignore
* skip llvm
* ooops
2025-09-09 13:54:44 +03:00
nimlgen
ebbcdd6577
cpu: use suppress_finalizing ( #12071 )
2025-09-08 18:28:09 +03:00
nimlgen
ef71acc88a
hcq: cleanup fileio iface ( #12063 )
...
* hcq: cleanup fileio iface
* typo
* _
2025-09-07 15:43:27 +03:00
nimlgen
97187bf8b6
cleanup win and arch checks ( #12060 )
...
* cleanup win and arch checks
* stupid mypy
2025-09-06 23:08:46 +03:00
nimlgen
10ac427aaa
cpu threading ( #11951 )
...
* start cpu threading
* fix
* fix2
* fix
* hacks?
* threads
* minor
* no dsp
* dsp 2
* n
* more
* test
* xm
* cleaner
* readable
* f
* reorder
* when no threads
* rangeify
* typos
* not needed
* reapply
* remoev this
* linter
* fixed cpu count in ci
* fix
* fixes
* rm
* typo
* sort based on speed
* test if test works in ci
* Revert "test if test works in ci"
This reverts commit 1f05edb531.
* do not pad thread
2025-09-06 16:13:43 +03:00
nimlgen
2b1844da27
cpu: support several threads in runtime ( #12055 )
2025-09-06 13:29:31 +03:00
Sieds Lykles
c6c16b2946
var_vals uses str for var ( #12011 )
...
* var_vals is str,int
* remove imports
* remove print
* fix test
* change var_vals in hcq
* update test_hcq
* fix multitensor _device_num var
* fix syminfer test
* shorten line
* p.vars stays list[Variable]
* shorten line
* vars is back to tuple[Variable, ...]
* change var_vals in extra
* change var_vals from shapetracker
* var_vals is str:int
* fix signature
2025-09-06 04:16:12 +02:00
George Hotz
9dee724fc4
make EMULATE a context var ( #12002 )
...
* make EMULATE a context var
* fix test amx
2025-09-04 11:15:43 -07:00
nimlgen
e213b85810
cpu: add thread_id to worker ( #11995 )
2025-09-04 14:58:13 +03:00
Sieds Lykles
572a3c15c6
Move Ops.SPECIAL arg to src ( #11918 )
...
* initial moving bound to src
* arg to src
* remove import
* fixup linearizer
* arg to src
* fix test_uop_graph
* fix more tests
* fix python renderer
* get const value from const uop
* ssimplify uop estimates
* fix webgpu locals
* fix old test
* gate Ops.SPECIAL in linearizer
* use ssimplify() for local/global_size
* remove toposort gate_parents_instead_of_self
* fix rendering in comment
* cleanup
* rename and add comments
* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
nimlgen
020abe0556
hcq: finalize without synchronization when in error state ( #11872 )
...
* hcq: finalize without synchronization when in error state
* ooops
* fix
* fix
* fix
2025-08-31 18:39:13 +03:00
b1tg
75d380a77c
fix transcendentals in python renderer ( #11932 )
...
* fix transcendentals in python renderer
* add test
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 09:37:17 -04:00
b1tg
b2cc06218a
python bfloat16 ( #11912 )
...
* python bf16
* _to_torch_storage_type
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-29 15:18:02 -04:00
qazal
30e72d5820
multi device and copy tracing for NULL device ( #11913 )
...
* add device name to NULL programs
* trace transfers
2025-08-29 15:31:00 +03:00
qazal
d8e1e4dc61
tracing: show NULL programs ( #11911 )
2025-08-29 14:09:33 +03:00
nimlgen
75678b2cbe
amd: retire pm4 xcc sync ( #11835 )
...
* amd: aql default when several xccs
* amd: retire om4 xcc sync
* remove more
* more
* more
2025-08-29 09:56:27 +03:00
nimlgen
bb55a3001f
nv: flush reset message ( #11897 )
2025-08-28 22:17:20 +03:00
nimlgen
874c1db4af
am: init support for aql ( #11888 )
2025-08-28 18:41:46 +03:00
nimlgen
60dd9a162c
memory: tiny tlsf cleanup ( #11887 )
2025-08-28 14:07:18 +03:00
nimlgen
62df6c39af
amd: correct handling of relocations ( #11863 )
...
* amd: correct handling of relocations
* ops
* add
2025-08-27 01:26:45 +03:00
George Hotz
b268755d51
small changes from postopt ( #11854 )
2025-08-26 11:56:16 -07:00
nimlgen
afe14ccbfa
amd: aql default when several xccs ( #11832 )
2025-08-26 15:16:36 +03:00
nimlgen
bba088ef11
amd aql queue ( #11708 )
...
* amd aql queue
* xcc
* fiz
* aql better
* llvm
* no for aql
* wrap
* is_sql
* am support
* complete
* fix
* mypy
* minor
2025-08-24 19:53:00 +03:00
nimlgen
e19f901330
amd: rptr/wptr in create_queue ( #11817 )
2025-08-24 18:03:45 +03:00
nimlgen
d71444857e
amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM ( #11816 )
...
* amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM
* fix
2025-08-24 17:48:40 +03:00
nimlgen
b057a90d49
memory: rename is_huge_page -> is_page ( #11786 )
2025-08-22 20:08:58 +03:00
nimlgen
698392334f
system: message for eaccess as well ( #11785 )
2025-08-22 18:21:32 +03:00
uuuvn
bd4a9473b0
Multihost exception handling ( #11729 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-21 13:51:49 -04:00
nimlgen
9eff7cd1d8
am: support 64bit discovery ( #11768 )
2025-08-21 18:28:13 +03:00
nimlgen
6589c9e643
hcq: better errors for ifaces ( #11751 )
...
* hcq: better errors for ifaces
* fix linter
* typo
* space
2025-08-20 17:50:51 +03:00
George Hotz
bf467c623d
changes from rangeify + better NullRenderer ( #11732 )
...
* changes from rangeify + better NullRenderer
* fix test
2025-08-19 12:51:54 -07:00
nimlgen
9c9e337c78
amd: parse soc enums ( #11727 )
...
* amd: parse soc enums
* remove from mock
* fix
* minimal amd_gpu
2025-08-19 15:06:09 +03:00
b1tg
61884f2057
add cstyle renderer to the NULL device ( #11709 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-18 09:52:22 -07:00
nimlgen
1c62a3833b
am: add versioned_header to load_fw ( #11702 )
...
* am: add versioned_header to load_fw
* fix mypy
2025-08-17 20:11:57 +03:00
nimlgen
d1224a7c4a
am: check both signatures ( #11694 )
...
* am: check both signatures
* fix
2025-08-16 20:01:07 +03:00