Commit Graph

811 Commits

Author SHA1 Message Date
George Hotz
4b741e893f remove REMOTE=1 (#13722)
* remove REMOTE=1

* leave ibverbs
2025-12-16 15:58:10 -04:00
George Hotz
e5a66ace80 multi custom kernel support (#13716)
* multi custom kernel support

* custom kernel xform

* works

* no SPEC=2 on ck

* panic

* touchups
2025-12-16 11:36:30 -04:00
George Hotz
316da9f7ff llm: add created/model fields, non-streaming support, and tests (#13660)
* llm: add created/model fields, non-streaming support, and tests

- Add `created` timestamp and `model` fields to response (required by OpenAI spec)
- Add non-streaming mode support for /v1/chat/completions
- Add `send_data` helper to HTTPRequestHandler for responses with Content-Length
- Refactor viz/serve.py to use send_data
- Add integration tests using real OpenAI client

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* add openai to testing

* toml

* Remove 'openai' from dependencies

Removed 'openai' from the dependencies list.

* bump cache

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 14:50:36 -05:00
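The #13660 entry above describes an OpenAI-spec response (`created` and `model` fields) and a non-streaming mode for `/v1/chat/completions`. A minimal sketch of exercising that path with the stock `openai` client follows; the server address, port, and model id are placeholders, not taken from the commit.

```python
from openai import OpenAI

# Placeholder endpoint and model id; only the /v1/chat/completions path and the
# created/model response fields come from the commit message above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="placeholder-model",
    messages=[{"role": "user", "content": "Say hi"}],
    stream=False,  # the non-streaming mode this PR adds
)

# `created` (unix timestamp) and `model` are the fields added for OpenAI-spec compliance.
print(resp.created, resp.model)
print(resp.choices[0].message.content)
```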
George Hotz
f0fa9bcd98 openai api for llm (#13648)
* openai api for llm

* responds to simple request

* schedule cache needs to unbind

* stream works

* share stream code

* 20k

* one print

* cid
2025-12-12 08:25:33 -05:00
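The #13648 entry above brings up the streaming side of the same endpoint ("stream works", "share stream code"). A minimal sketch of consuming that stream with the `openai` client, again with a placeholder address and model id:

```python
from openai import OpenAI

# Placeholder endpoint and model id; streaming is the behaviour #13648 describes.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

stream = client.chat.completions.create(
    model="placeholder-model",
    messages=[{"role": "user", "content": "Stream a short reply"}],
    stream=True,
)

# Each chunk carries an incremental delta, per the OpenAI streaming format.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
print()
```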
Christopher Milan
cb3d756547 NAK compile-only test (#13621) 2025-12-08 15:53:46 -05:00
Christopher Milan
a4c3d48aa9 compile-only test for IR3 actually works (#13619) 2025-12-08 15:07:49 -05:00
Christopher Milan
1c16b6e082 Mesa: freedreno (#12746)
* ir3 init

* got a program

* 1 + 1 works

* use isa_disasm instead of shader_disasm

* wip

* matmul works

* works on py3.14

* fix const loading

* skip QCOM failing tests

* cleanup

* args actually work

* add compile-only tests

* fix typo and install tinymesa

* IR3 NULL backend

* (float32) images work

* autogen fix

* fix compile only test

* typo

* mypy happy

* compile-only uses py3.14

* bump mesa

* unify qcom disassembler

* float16 works

* disasm shows in viz

* save a line

* add real del

* variable workgroup sizes

* simplify diff

* bump line count

* properly set wgsz

* regen mesa

* no preamble

* bump lines
2025-12-08 14:02:08 -05:00
chenyu
b981b6f89e remove old llama grad_acc (#13611)
* remove old llama grad_acc

* GRADIENT_ACC_STEPS=1
2025-12-07 13:03:47 -05:00
Christopher Milan
dec2f50aee reenable process replay for lvp (#13592) 2025-12-05 12:36:35 -05:00
qazal
6d92e9ffbf hotfix: skip process replay on lvp (#13585) 2025-12-05 19:25:23 +08:00
George Hotz
24ca8eeaa7 small fixups from schedule_cache (#13557) 2025-12-03 15:41:16 -08:00
Douglas Nyberg
f5abd38132 remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS (#13555) 2025-12-03 17:48:27 -05:00
George Hotz
21184ae6b1 bump cache to 14 (#13530) 2025-12-02 08:02:19 -08:00
nimlgen
77a76d1b13 device: respect compiler ContextVars (#13523)
* device: envvars for cc

* fix

* fix

* x

* um

* fix

* remote

* em

* cleanup

* typing

* fix

* debug

* lvp?

* ugh

* singl

* rm

* lol

* fix

* ?

* this?

* why?

* rev

* mod test

* l
2025-12-02 14:42:04 +03:00
Sieds Lykles
63a931ff76 Symbolic divisor fuzzer (#13433)
* render z3 range better

* working version

* rename

* add to workflow

* factor out variable_names

* smaller expressions

* smaller

* + back
2025-11-23 20:29:32 +01:00
Roelof van Dijk
0dc2ff431d fix: revive torch backend (#13280)
* fix: revive torch backend

* as_strided view vs copy

* Revert "as_strided view vs copy"

This reverts commit 82a61223f2.

* add extra tests (move inplace, add fusion tests)

* better fusion with inplace_op

* no optimizer hooks (break mnist training fusion)

* split off fusion tests in separate file, assert on resnet fusion

fix: remove comments

* cleanup, reduce diff

* reduce diff

* better fusion and identity checks

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-19 15:26:50 -08:00
George Hotz
1a332afa76 spec test on 3.14 (#12957) 2025-11-19 00:43:04 -08:00
George Hotz
df53c62a9f bump line count 2025-11-15 08:16:20 -08:00
Christopher Milan
09f3aae169 In-tree autogen: all C libraries (#13220)
* checkout files from autogen branch

* ioctl with payload

* fix am generations

* properly fix generations

This reverts commit b2a54f4f41.

* revert discovery.h

* support pragma pack(1)

* typo

* better getter

* typo

* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE

* align support

* anon handling fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
George Hotz
263b724143 one cache and bump it (#13258) 2025-11-13 07:33:31 -08:00
chenyu
3f939f3d3c update pm_simplify_valid (#13241)
* update pm_simplify_valid

fixed openpilot conv regression

* IMAGE training is broken
2025-11-12 19:40:02 -05:00
Christopher Milan
41a098a82d In-tree autogen: libc.py (#13217)
* checkout changes from autogen branch

* parents

* pylint happy

* move sys to system in helpers.py

* typo

* typo
2025-11-11 19:13:48 -08:00
chenyu
60e55d9a2d line count 18500 (#13191) 2025-11-10 13:52:13 -05:00
chenyu
e1d46de8f8 update GROUPTOP heuristic more (#13178)
reverts #13176
2025-11-09 02:31:12 -05:00
chenyu
8e868dced8 only GROUPTOP one reduce kernel (#13176)
* only GROUPTOP one reduce kernel

* ALLOWED_GATED_READ_IMAGE=148
2025-11-08 22:38:44 -05:00
George Hotz
036ee9f84c Self type + mixins (#13056)
* use Self type

* mixin

* fix later
2025-11-02 13:30:01 +08:00
George Hotz
65a0a31475 AMD mi350x matmul from stream (#13040)
* works

* working mfma

* 120 TFLOPS

* regs

* 192 TFLOPS

* try pipelining

* something

* notes

* contract

* linter to 3.11

* that was a bug
2025-11-01 17:55:19 +08:00
nimlgen
f6786c1bfd autogen: py314 (#13038)
* autogen: py314

* bump py?
2025-11-01 04:02:19 +08:00
nimlgen
4b001ec723 amd: pmc in mockgpu (#13000)
* amd: pmc in mockgpu

* fix

* do not open in ci
2025-10-30 01:52:02 +08:00
George Hotz
5e01cc299b zero len ranges fail (#12974)
* zero len ranges fail

* fix Python backend

* fix llvm

* fix ptx

* yolo fix nir

* this works...

* always store...

* always store...

* Revert "always store..."

This reverts commit 0816cf344d.
2025-10-28 22:49:55 +08:00
George Hotz
e936aa7974 cleanups from if range branch (#12973) 2025-10-28 20:58:47 +08:00
George Hotz
2832954bcb test with IGNORE_OOB=0 (#12960) 2025-10-28 10:32:19 +08:00
George Hotz
7784cec48e pytest-split on spec (#12959) 2025-10-28 10:09:01 +08:00
George Hotz
25c2da1579 check SPEC=2 in CI (#12945)
* check SPEC=2 in CI

* split SPEC=2

* fast enough
2025-10-27 21:53:57 +08:00
George Hotz
8a941d95a4 SPEC=2 is full spec, SPEC=1 is default (#12910)
* SPEC=1 passes all tests

* just use SPEC, not __debug__
2025-10-25 11:10:43 +08:00
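Per #12910 above, SPEC=1 is the default level of spec checking and SPEC=2 enables the full spec. A sketch of opting into SPEC=2 for a quick run; setting the variable in the environment before importing tinygrad is an assumption about how the flag is read, and the Tensor snippet is only illustrative.

```python
import os

# Assumption: SPEC is picked up from the environment at import time.
os.environ["SPEC"] = "2"  # SPEC=2 = full spec checks; SPEC=1 is the default (#12910)

from tinygrad import Tensor

# Any realized computation now runs under the stricter checks.
out = (Tensor.ones(4, 4) + 1).sum().item()
print(out)  # 16 elements of 2.0 -> 32.0
```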
chenyu
4b7329001d clean up test_avg_pool3d (#12905) 2025-10-24 14:31:36 -04:00
chenyu
154b4f9f40 test FUSE_OPTIM=1 in test/test_optim.py (#12895) 2025-10-23 15:54:27 -04:00
b1tg
60d7e232f2 cuda fp8 (#12782)
* cuda fp8

* tensor core

* tc test

* clean

* clean pm
2025-10-21 15:05:25 -04:00
Harald Schäfer
587ccc0e5c compile3: make selftests opt-in (#12851) 2025-10-21 11:32:27 -07:00
Harald Schäfer
addc54b96c Simplify openpilot compile3.py (#12748)
* Simpler compile3

* tests

* remove default args

* onnx file is still fp16

* self-test FP16 too

* allow test disable

* absurd tolerance

* Just do latest

* Try simplest

* use later models

* kernel count not relevant if speed is good

* dead imports

* Revert "dead imports"

This reverts commit f68c2cd15d.

* Revert "kernel count not relevant if speed is good"

This reverts commit 0955ca4ee0.

* add back kernel count check on latest model
2025-10-18 10:12:22 -04:00
George Hotz
1d1e1d9d88 delete the ShapeTracker (#12720)
* delete the ShapeTracker

* fix tests

* fix more

* fix gc test
2025-10-16 15:36:22 +08:00
George Hotz
592e86f6f5 remove UOp.st (#12716)
* remove UOp.st

* fix tests

* torch backend disable
2025-10-16 14:44:09 +08:00
George Hotz
85a907605c hotfix: only 20 steps of beautiful_mnist_torch, some CI machines are slow 2025-10-15 22:29:34 +08:00
George Hotz
612e3d6143 replace mop arg with vectorized index (#12695)
* replace mop arg with vectorized index

* tests passing

* better viz

* no compile4
2025-10-15 20:50:06 +08:00
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
George Hotz
a59439d013 use UOp.shape property instead of UOp.st (#12664)
* work on shape property

* reshape causing issues

* more mops

* all mops

* need to cache it

* _shape is like _device

* mostly works

* shape is good

* const uses _shape

* fix tests

* size doesn't use st

* close

* test is broken

* one less st

* hack for 3 op assign

* oops, i didn't mean to change that

* support emulate in the NullDevice

* reproed failure in emulation

* fix wmma
2025-10-15 10:01:34 +08:00
George Hotz
84d4589ed4 remove pylint from pre-commit and CI (#12658)
* remove pylint from pre-commit and CI

* multidevice test is fast

* faster pre-commit

* 8 is faster than 4

* better name

* how did that typecheck?
2025-10-14 15:39:59 +08:00
Sieds Lykles
e537e895b1 drop unused invalid conditions (#12635)
* drop where conditions if the ranges are not used inside the index

* remove allow_any_len
2025-10-13 10:52:21 +02:00
chenyu
8f5f57c7d9 smaller CNT fuzz shapetracker (#12626) 2025-10-12 08:52:30 -04:00
Sieds Lykles
772a8dfe31 reshape uses valid when simplifying (#12597)
* reshape uses valid when simplifying

* try with IGNORE_OOB=0

* is it this test?

* skipif gpuocelot
2025-10-11 17:02:54 +02:00