11630 Commits

Author SHA1 Message Date
George Hotz
87e72f1540 ftz 2026-01-04 16:32:35 -08:00
George Hotz
b52ff63896 fixes 2026-01-04 15:48:31 -08:00
George Hotz
7f7f12d5b4 99% match 2026-01-04 15:05:05 -08:00
George Hotz
b10ae6958e roundtripping 2026-01-04 14:31:40 -08:00
George Hotz
10e2c47d52 don't make dtype 2026-01-04 13:49:47 -08:00
George Hotz
058816dd92 use tinygrad UOps as DSL 2026-01-04 13:40:15 -08:00
George Hotz
28846cb6c4 simpler dsl 2026-01-04 13:24:01 -08:00
George Hotz
958bfa1c5b Op 2026-01-04 13:05:38 -08:00
George Hotz
23cf30820f more correct 2026-01-04 12:52:20 -08:00
George Hotz
ea51512f90 CMPLE 2026-01-04 12:45:21 -08:00
George Hotz
e9664fdf28 dtype is uop 2026-01-04 12:31:40 -08:00
George Hotz
5d50281896 Merge remote-tracking branch 'origin/master' into asm_ucode
# Conflicts:
#	tinygrad/dtype.py
2026-01-04 12:22:52 -08:00
George Hotz
cfeeab8485 work 2026-01-04 12:22:01 -08:00
George Hotz
7abf4591ba use bitsize on dtype (#14011)
* use bitsize on dtype [pr]

* bitsize

* bitsize in js export, but might be wrong

* reverts

* revert that
2026-01-04 12:16:21 -08:00
George Hotz
2be5f8b688 work 2026-01-04 11:57:42 -08:00
George Hotz
db9140b8b7 work 2026-01-04 11:34:07 -08:00
George Hotz
63f663bd4b progress 2026-01-04 10:24:40 -08:00
chenyu
cfb8bf5814 faster image load (#13977)
sometimes image load does not need to init with NAN
2026-01-04 13:09:59 -05:00
George Hotz
e38d311f3c simpler 2026-01-04 10:02:33 -08:00
George Hotz
8e8ad423a7 post parser 2026-01-04 09:16:00 -08:00
George Hotz
7ebda28692 assembly/amd: add CDNA support to asm (#13982)
* add CDNA support

* more cdna tests

* something

* fix more stuff

* more work

* simpler

* simplier

* cdna

* disasm

* less skip

* fixes

* simpler
2026-01-04 08:53:56 -08:00
George Hotz
acad5d7b30 parser 2026-01-04 08:43:41 -08:00
George Hotz
9ef8ae3199 qcode 2026-01-04 08:33:57 -08:00
George Hotz
1f96afb1cb getting big 2026-01-04 08:05:23 -08:00
chenyu
ad041416ca delete unused rewrite rule [pr] (#14006) 2026-01-04 09:48:52 -05:00
nimlgen
bf356ae996 am: mi300 48bit address space (#14004)
* am: mi300 48bit address space

* fix
2026-01-04 15:19:25 +03:00
nimlgen
606786e152 am: do not sleep for each hive node during resets (#14003) 2026-01-04 14:02:11 +03:00
George Hotz
59144c6af6 assembly/amd: start replacing pcode with ucode 2026-01-03 23:27:41 -08:00
George Hotz
34ea053b26 assembly/amd: clean up pcode, jit pcode instead of static (#14001)
* assembly/amd: clean up pcode

* regen

* lil

* jit the pcode

* sendmsg

* cleanups

* inst prefetch lol
2026-01-03 23:06:15 -08:00
kamilisjon
280790e438 Reuse toposort in recursive_property (#13993) 2026-01-03 22:04:13 -08:00
kamilisjon
9a9564118c [pr] Delete reverse_toposort (#13987)
* Delete reverse_toposort

* Update comment and profiler name

* Update profiler name
2026-01-03 22:03:44 -08:00
George Hotz
8328511808 assembly/amd: make the emu.py code shine (#13996)
* assembly/amd: make the code shine

* lil clean

* reg back in pcode

* cleanups

* gen fma_mix

* no writelane hacks

* fn cleanup

* dead vgpr_write

* readable

* smem

* cleanup bench_emu

* speedups

* simpler and faster

* direct inst._fn

* split fxn

* Revert "simpler and faster"

This reverts commit e85f6594b3.

* move lds to wavestate

* dispatcher

* pc in dispatch

* literal isn't wavestate

* cleanups + program

* one readlane

* exec_vop3sd in exec_vop

* cleaner exec_vopd

* fully merge VOP3P

* no special paths

* no SliceProxy

* low=0

* no bigint

* failing tests

* fma on python 3.13
2026-01-03 20:33:09 -08:00
qazal
bdb421f13e process_replay: passthrough sink arg for Ops.PROGRAM input (#14000) 2026-01-04 13:09:39 +09:00
Galax
66caa9fe1d fix: library linking for fedora systems (#13999) 2026-01-03 17:40:56 -08:00
chenyu
8003db2a28 test case of NOOP store load folding (#13997) 2026-01-03 14:39:26 -05:00
chenyu
c1b8644a3f test removing expander rules [pr] (#13994) 2026-01-03 12:38:01 -05:00
Christopher Milan
35c2870b1f gate image_conv2d pitch hacks on IMAGE==1 (#13995)
* gate image_conv2d pitch hacks on IMAGE==1

* fix opencl image copies

* cleanup
2026-01-03 12:27:31 -05:00
nimlgen
a49924a0e9 hcq: _sleep report status (#13992)
* hcq: _sleep report status

* msg

* print all
2026-01-03 14:28:28 +03:00
nimlgen
3b354bc11f hcq: better queue managment (#13991) 2026-01-03 13:11:15 +03:00
nimlgen
efb2ae87c6 hcq sync aql (#13756)
* hcq sync aql

* w
2026-01-03 12:59:24 +03:00
qazal
bd55507ee4 RDNA3 fp16 assembly gemm 85 TFLOPS (#13990) 2026-01-03 18:34:23 +09:00
wozeparrot
6242a9d151 tk: no global copy and clear ranges (#13988) 2026-01-02 23:45:15 -08:00
wozeparrot
9f082e8e25 fa: split kv bwd into 2 kernels (#13981) 2026-01-02 18:45:51 -08:00
qazal
2cc64d71b0 simplify mi350x gemm / viz asm tests (#13984)
* mi350x gemm cleanup

* asm tests work

* simpler asm tests
2026-01-03 11:11:07 +09:00
chenyu
7cbafb2ef1 update hypothesis min version (#13983)
there was a local_constants perf regression that made hypothesis related tests slow
2026-01-02 21:01:57 -05:00
Christopher Milan
9dc524536f IMAGE=1 creates "dynamic" images (#13769)
* remove image from BufferSpec

* cl tiny_gemm (64) works

* mypy

* padding

* openpilot CL

* reshape properly

* remove extra qcom checks

* pad output

* mypy

* update compile test

* move undo

* TestImageCopy valid images

* TestImageRealization valid images

* TestImageDType valid images

* cleanups

* test_renderer_failures

* ruff

* mypy

* simplify ops_qcom

* bump step time

* Revert "bump step time"

This reverts commit 75a037c7d0.

* "dynamic textures" are optional

* a start

* IMAGE=1 works, no FLOAT16

* fast but wrong

* mypy

* some fixes

* better

* works

* refactor

* oops
2026-01-02 16:22:39 -05:00
Christopher Milan
61dc70f1a8 add driving_vision IMAGE=1 benchmark (#13979) 2026-01-02 13:58:27 -05:00
George Hotz
0e282025ff assembly/amd: split test_emu into hw tests (#13966)
* assmebly/amd: split test_emu into hw tests

* hw tests

* bugfixes

* more tests and fix
2026-01-02 08:04:56 -08:00
chenyu
2e2b5fed12 fix misspellings (#13976) 2026-01-02 10:37:38 -05:00
nietras
f49e4714af Fix spelling errors in README for AMD assembly (#13975) 2026-01-02 10:15:20 -05:00