638 Commits

Author SHA1 Message Date
chenyu
2e2b5fed12 fix misspellings (#13976) 2026-01-02 10:37:38 -05:00
George Hotz
0221b96761 assembly/amd: fix all ops tests (#13910)
* assembly/amd: fix all ops tests

* test_ops with smaller sizes

* ds store/load 2addr
2025-12-30 18:01:34 -05:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
ayanhan
47a170be2e test: enable cummax scalar IndexError test (#13625) 2025-12-09 12:25:56 -05:00
Christopher Milan
1c16b6e082 Mesa: freedreno (#12746)
* ir3 init

* got a program

* 1 + 1 works

* use isa_disasm instead of shader_disasm

* wip

* matmul works

* works on py3.14

* fix const loading

* skip QCOM failing tests

* cleanup

* args actually work

* add compile-only tests

* fix typo and install tinymesa

* IR3 NULL backend

* (float32) images work

* autogen fix

* fix compile only test

* typo

* mypy happy

* compile-only uses py3.14

* bump mesa

* unify qcom disassembler

* float16 works

* disasm shows in viz

* save a line

* add real del

* variable workgroup sizes

* simplify diff

* bump line count

* properly set wgsz

* regen mesa

* no preamble

* bump lines
2025-12-08 14:02:08 -05:00
chenyu
e8879f7e31 match torch clamp backward (#13533)
* match torch clamp backward

* fix PYTHON
2025-12-02 17:58:32 -05:00
Sieds Lykles
114bb94c55 Fix load collapse MAX to ADD (#13406)
* add Ops.ADD to pattern

* add test
2025-11-21 12:26:14 +01:00
George Hotz
4027eef264 fix test warnings (#13114)
* fix test warnings

* precommit passes

* ignore std_mean warning
2025-11-05 15:06:29 -08:00
chenyu
4b7329001d clean up test_avg_pool3d (#12905) 2025-10-24 14:31:36 -04:00
George Hotz
c780cd9abb new linearizer with early endrange (#12823)
* new linearizer with early endrange

* cleanups

* second stage removal

* not store

* do that later

* end cleanup

* fix globals

* end

* multi end

* fix ends earlier

* work

* do_merge_ends

* mini change

* range_gate

* fix cpu

* test fixups

* ranges on index

* not for ptx
2025-10-21 17:37:48 +08:00
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
George Hotz
fb61f3519f remove assign contiguous hack (#12659)
* remove assign contiguous hack

* remove bad contiguous usage in torch backend

* assign
2025-10-14 16:42:14 +08:00
chenyu
001b3710d3 enable some test_ops tests (#12607) 2025-10-10 07:23:21 -04:00
chenyu
cf8232ec6a clean up more RANGEIFY flag (#12556) 2025-10-09 03:06:48 -04:00
chenyu
ae51bdd06a remove trivial use of RANGEIFY flag (#12550)
some tests need update still
2025-10-09 02:29:38 -04:00
George Hotz
0f25b4b289 move frontend dir to nn [pr] (#12470) 2025-10-07 10:42:22 +08:00
chenyu
f203d8b221 update RANGEIFY kernel count and test_masked_select (#12435) 2025-10-03 00:41:34 -04:00
wozeparrot
a6dd5a224b skip webgpu tests (#12433) 2025-10-02 21:31:07 -07:00
chenyu
6ba8bf282f skip test_masked_select for RANGEIFY PYTHON (#12395) 2025-10-01 04:13:31 -04:00
qazal
e8c595c29e remu: add new instructions introduced in RANGEIFY (#12363)
* add v_mad_i64_i32 for test_output_padded_conv_transpose2d

* run amd test_ops

* skip test_masked_select
2025-09-30 12:36:29 +03:00
Sieds Lykles
d21e34e617 enable test_sum_twice (#12270)
* remove skip

* remove import
2025-09-23 00:57:29 +02:00
chenyu
393c6b236c test case to sum twice in different order (#12253)
* test case to sum twice in different order

fixed by #12251

* try metal
2025-09-20 10:11:57 -04:00
chenyu
edffc246ed MUL in reduce_unparented (#12223)
* MUL in reduce_unparented

* some test
2025-09-17 11:56:39 -04:00
chenyu
12a910f1d2 update torch 2.8 (#12172)
support _reshape_alias. something is wrong with one case of unfold
2025-09-14 15:19:03 -04:00
George Hotz
bcafa72b7f use tags instead of graph_rewrite_map in rangeify (#12110)
* use tags instead of graph_rewrite_map in rangeify

* new style, add realize

* metadata works

* simple failure

* fix

* loops

* stuff becomes a NOOP when you remove it

* stuff becomes a NOOP when you remove it

* tags on bufferize

* bmnist works

* locals don't work

* shippable

* fix some tests

* simpler map_realize

* remove const hack

* debuggable test

* broke

* assign test

* straight up bug

* wooo it passes

* sink shouldn't be there

* fix ops

* bmnist

* kv cache ish

* Set RANGEIFY context variable to 0

* should work normal

* better

* types

* hacks to fix test_symbolic

* pm_add_buffers

* tests should pass
2025-09-14 11:39:01 +08:00
chenyu
aac3dceaf6 merge two PYTHON backend ci job (#12143)
* merge two PYTHON backend ci job

and mark anything that takes > 10 in test_ops slow

* two more
2025-09-12 17:36:46 -04:00
chenyu
544eb2c402 clean up test_scatter_reduce (#12125) 2025-09-11 16:36:58 -04:00
chenyu
0e266f376c ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
nimlgen
1c6c42715f unify cpu and llvm (#11982)
* try unify cpu and llvm

* fixes

* fix

* ops

* no llvm

* fix

* rm

* lvmm is ot

* oops

* override

* no llvm

* ignore

* skip llvm

* ooops
2025-09-09 13:54:44 +03:00
chenyu
ce7163e9b4 clean up skip slow tests in PYTHON (#12028)
skip with SKIP_SLOW_TEST and decorators
2025-09-05 11:35:26 -04:00
chenyu
52166fd7eb smaller test_ops inputs (#12007) 2025-09-04 16:22:33 -04:00
chenyu
d0e739453e update many einsum tests (#11981)
correct the exception testing, and raise ValueError instead of assert when checking args
2025-09-03 15:40:20 -04:00
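The note on d0e739453e says argument checks now raise ValueError rather than relying on assert. A minimal sketch of why that matters for exception tests follows. The names `check_einsum_args` and `raises_value_error` are hypothetical and are not taken from the repository; the check shown (operand count against the formula) is only an assumed example of an argument check.

```python
def check_einsum_args(formula: str, num_operands: int) -> None:
    """Hypothetical argument check: operand count must match the formula."""
    inputs = formula.replace(" ", "").split("->")[0].split(",")
    if len(inputs) != num_operands:
        # An explicit raise survives `python -O`, which strips assert statements,
        # and gives tests a specific exception type to expect.
        raise ValueError(f"formula expects {len(inputs)} operands, got {num_operands}")


def raises_value_error(fn, *args) -> bool:
    """Return True only if fn(*args) raises ValueError."""
    try:
        fn(*args)
    except ValueError:
        return True
    return False


bad = raises_value_error(check_einsum_args, "ij,jk->ik", 1)
good = raises_value_error(check_einsum_args, "ij,jk->ik", 2)
```

A test that expects ValueError fails on any other exception type, including AssertionError, so the check and the test stay in agreement.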
chenyu
69dd1817d0 raise RuntimeError in merge_dicts instead of assert [pr] (#11965) 2025-09-02 17:18:44 -04:00
chenyu
7123df3928 Use Tensor.logaddexp to implement Tensor.softplus (#11796)
instead of piecewise linear, numerical is handled by logaddexp. jax does this and i think it's more elegant than torch's approach
2025-08-23 11:52:29 -04:00
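The message on 7123df3928 relies on the identity softplus(x) = log(1 + exp(x)) = logaddexp(x, 0). A scalar sketch of that identity using only the standard library is below. It is an illustration of the math, not the Tensor implementation, and the helper names are assumptions.

```python
import math


def logaddexp(a: float, b: float) -> float:
    """Stable log(exp(a) + exp(b)) for finite inputs."""
    # Factor out the larger term so the remaining exponent is never positive.
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def softplus(x: float) -> float:
    """softplus(x) = log(1 + exp(x)) = logaddexp(x, 0)."""
    return logaddexp(x, 0.0)


small = softplus(0.0)     # log(2)
large = softplus(1000.0)  # the naive log(1 + exp(1000)) would overflow
```

Because the exponent passed to `exp` is never positive, large inputs do not overflow and no separate piecewise branch is needed.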
chenyu
fb8ee02424 Tensor.logaddexp (#11793) 2025-08-23 09:15:00 -04:00
geohotstan
1e679bd789 fix max_unpool2d inf (#11784)
* start

* add regression test for maxunpool2d
2025-08-22 08:31:24 -04:00
chenyu
91a4de4ca7 fix getitem with inf in tensor (#11781) 2025-08-21 21:55:32 -04:00
chenyu
5276fbc9c5 fix gather with inf values (#11760)
(mask * x) is wrong because 0*inf is nan. i feel we have a lot of those still...
2025-08-20 20:35:40 -04:00
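The remark on 5276fbc9c5, that (mask * x) is wrong because 0*inf is nan, can be reproduced with plain floats. The sketch below contrasts masking by multiplication with masking by selection. The function names are hypothetical, and the lists stand in for tensors.

```python
import math


def mask_by_multiply(mask, xs):
    """Zero out unselected entries by multiplying; breaks on inf."""
    return [m * x for m, x in zip(mask, xs)]


def mask_by_select(mask, xs):
    """Pick x where the mask is set, otherwise a literal 0.0."""
    return [x if m else 0.0 for m, x in zip(mask, xs)]


xs = [1.0, math.inf, 3.0]
mask = [1, 0, 0]

mul_result = mask_by_multiply(mask, xs)  # the masked-out inf becomes nan
sel_result = mask_by_select(mask, xs)    # the masked-out inf becomes 0.0
```

Selection never evaluates 0 * inf, so masked-out infinities cannot leak nan into a later sum.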
chenyu
4fe19eec72 Ops.TRUNC (#11659) 2025-08-13 18:40:48 -04:00
chenyu
0c97d6de1b don't round pow output for int pow int (#11625)
also added atol=0 and big pows for the tests
2025-08-11 20:57:47 -04:00
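The two integer pow entries (0c97d6de1b and d623f6d850) concern keeping int ** int exact. A small sketch of why going through floating point and rounding is lossy for big powers follows. It uses exponentiation by squaring on Python integers as an assumed illustration, not as the method used in the repository.

```python
def int_pow(base: int, exp: int) -> int:
    """Exact integer power by repeated squaring (non-negative exponent only)."""
    if exp < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    while exp:
        if exp & 1:       # multiply in the current bit of the exponent
            result *= base
        base *= base
        exp >>= 1
    return result


exact = int_pow(3, 35)
via_float = int(3.0 ** 35)  # exceeds 2**53, so the double cannot hold every digit
```

3 ** 35 is odd, while every double above 2 ** 53 is an even integer, so the float route cannot match the exact value no matter how it is rounded. This is the kind of case a test with atol=0 would catch.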
chenyu
d623f6d850 support int Tensor pow to const non-negative int (#11624)
matches torch
2025-08-11 19:50:19 -04:00
chenyu
a67e0917c3 list indexing can normalize in python (#11609)
* list indexing can normalize in python

list index does not need to be normalized in tensor

* update those
2025-08-10 20:02:38 -04:00
chenyu
1181ec0cd2 few more tensor indexing test cases (#11608) 2025-08-10 18:56:42 -04:00
chenyu
dfb702ef33 fix sort for small dim (#11601)
* fix sort for small dim

* fixed test_sort_empty
2025-08-10 01:17:41 -04:00
chenyu
aa1a6f2132 support threshold in Tensor.softplus (#11564)
fix gradient for large input
2025-08-07 13:43:18 -04:00
chenyu
dbc7807c61 enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
chenyu
2d7c28de6a clean up dup lambdas in helper_test_exception (#11325) 2025-07-22 12:21:57 -04:00
chenyu
fb42c84365 merge TestRollEdgeCases into test_ops (#11321) 2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c movementop only Tensor.roll (#11317)
* movementop only Tensor.roll

* fixed
2025-07-22 10:34:15 -04:00
chenyu
6e9506e6fd Tensor.roll supports dims=None (#11313) 2025-07-21 17:29:23 -04:00
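The two Tensor.roll entries (1d8b3e9d1c and 6e9506e6fd) describe a roll built from movement operations and a dims=None mode. The sketch below shows the idea on Python lists: a roll is a slice plus a concatenation, and, assuming dims=None follows the torch.roll convention, the input is flattened, rolled, and restored to its original shape. The helper names are hypothetical.

```python
def roll_1d(xs, shift):
    """Roll a list to the right by shift, using only slicing and concatenation."""
    n = len(xs)
    if n == 0:
        return []
    shift %= n  # normalizes negative and oversized shifts
    return xs[n - shift:] + xs[:n - shift]


def roll_flat(rows, shift):
    """Roll a 2D list as if dims=None: flatten, roll, reshape back."""
    width = len(rows[0])
    flat = [v for row in rows for v in row]
    rolled = roll_1d(flat, shift)
    return [rolled[i:i + width] for i in range(0, len(rolled), width)]


right = roll_1d([1, 2, 3, 4], 1)
left = roll_1d([1, 2, 3, 4], -1)
flat_roll = roll_flat([[1, 2], [3, 4]], 1)
```

No arithmetic on the values is involved, only reindexing, which is what makes a movement-op-only formulation possible.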