Commit Graph

10490 Commits

Author SHA1 Message Date
eliotgolding
0289fbb1c2 limit real_size to the size of first View of ShapeTracker (#8628)
* fix real_size

* add fuzzer; typing

* spacing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-16 16:27:39 -05:00
nimlgen
f91ca508cf am: bind for sdma (#8633)
* am: bind for sdma

* fix
2025-01-16 15:22:27 +03:00
nimlgen
f671da6755 ci: add AM start time to benchmark (#8637)
* ci: add AM start time to benchmark

* am: unlock it

* add AMD

* revert this
2025-01-16 14:47:36 +03:00
qazal
81a84aa85a remove is_unrealized_unmasked_const [pr] (#8644) 2025-01-16 05:27:47 -05:00
uuuvn
00e5979897 Use full soname for libgcc_s in CPUProgram (#8642)
The number after .so is the ABI version; it is always 1 for libgcc_s.
Most Linux systems set default library versions via symlinks that are
simply followed to get the actual ELF; conda, however, does it via linker
scripts, which ctypes doesn't follow (below are the contents of libgcc_s.so):
```
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library.  */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library thinks this is the actual ELF, and
ctypes.CDLL just loads the text file as a shared library. The result
is:
```
  File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```
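A minimal sketch of the fix the title describes — passing the full soname so the dynamic linker resolves the real ELF directly and ctypes never opens conda's ld script (illustrative only; the actual call site in `device.py` may differ):
```
import ctypes

# "libgcc_s.so.1" is the real ELF's soname; the unversioned "libgcc_s.so"
# can be a GNU ld script text file on conda, which _dlopen rejects with
# "invalid ELF header". dlopen resolves a full soname without consulting
# the script, so this loads the actual shared library.
libgcc_s = ctypes.CDLL("libgcc_s.so.1")
```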
2025-01-16 12:56:52 +03:00
qazal
611208cd8a Revert "Revert "move subbuffer to a rewrite rule in the scheduler (#8639)" (…" (#8643)
This reverts commit 82ef956cb8.
2025-01-16 04:30:11 -05:00
qazal
82ef956cb8 Revert "move subbuffer to a rewrite rule in the scheduler (#8639)" (#8641)
This reverts commit d5c90da286.
2025-01-16 03:29:07 -05:00
qazal
d5c90da286 move subbuffer to a rewrite rule in the scheduler (#8639)
* delete buffer_view from tensor

* add to the scheduler

* move buffer_view to the scheduler

* gradient doesn't care.

* for/with
2025-01-16 03:14:28 +02:00
nimlgen
b3efeeb717 docs: start am docs (#8638)
* docs: init am docs

* missing
2025-01-16 00:22:35 +03:00
uuuvn
7ecced7f6d LLVM JIT prereqs (#8634)
* LLVM JIT prereqs

This commit moves JIT loading, disassembling, and CPUProgram logic from
`ops_clang.py` to `elf.py`, `helpers.py`, and `device.py` respectively.

I don't quite like the `helpers.py` destination for capstone_flatdump,
but that is where cpu_objdump lives, so presumably this is how it's
supposed to be.

* Types
2025-01-15 09:47:08 -08:00
qazal
a1f70ce7d0 only use BUFFER_VIEW in disk [pr] (#8629)
* only use BUFFER_VIEW in disk [pr]

* delete can_view

* BUFFER_VIEW op on DISK

* remove that allow_buffer_view=False

* notes

* bitcast is a low-level op too

* this passes on AMD and LLVM
2025-01-15 12:34:15 -05:00
ignaciosica
bae20e5043 Generic PTX wmma rendering [pr] (#8632)
* make wmma rendering dtype size generic

* use var instead of calculating multiple times

* compact rendering
2025-01-15 09:31:48 -08:00
qazal
6193e279d4 isolate simple failing test for subbuffer on CONST [pr] (#8630)
* simple failing test for subbuffer on CONST [pr]

* add view_supported_devices check
2025-01-15 05:45:03 -05:00
George Hotz
e1f7c90459 gradient is a set [pr] (#8626)
* gradient is a set [pr]

* typing for deepwalk
2025-01-14 20:48:23 -08:00
chenyu
7fb1c7af61 minor multi cleanups [pr] (#8625) 2025-01-14 22:25:23 -05:00
George Hotz
504ad08e73 hotfix: add test_example_matmul_same 2025-01-14 19:03:17 -08:00
George Hotz
f29d6f54b8 support multilb gradient [pr] (#8624) 2025-01-14 18:33:33 -08:00
chenyu
4ee3243c93 JITBEAM=2 for LLaMA-3 8B on 4 GPUs [pr] (#8623)
is it fast?
2025-01-14 19:52:38 -05:00
chenyu
7860a80801 simpler MultiLazyBuffer alu [pr] (#8622) 2025-01-14 19:19:13 -05:00
chenyu
930728c069 bert BS 72->66 [pr] (#8621)
72 does not fit now
2025-01-14 18:41:41 -05:00
chenyu
0790d8059f remove MultiLazyBuffer.from_sharded [pr] (#8620)
it's equivalent to taking the lazydata from Tensor.split, then copying to the devices
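A hedged pseudocode sketch of that equivalence — `copy_to_device` and the exact `split` signature are assumptions here, not necessarily the repo's API at this commit:
```
# Pseudocode: shard t across devices along axis by splitting evenly,
# then copying each piece's lazydata to its device.
def shard_via_split(t, devices, axis):
  pieces = t.split(t.shape[axis] // len(devices), dim=axis)
  return [p.lazydata.copy_to_device(d) for p, d in zip(pieces, devices)]
```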
2025-01-14 18:00:49 -05:00
George Hotz
c85737c200 assert to prepare for grad uop [pr] (#8280)
* assert to prepare for grad uop [pr]

* fix test_nn

* fix most of test_tensor

* few more tests

* fix multi

* uniform gradient

* acc_dtype

* any for multi

* fix typing

* fix assert, CAST_BEFORE_VIEW is still the issue

* explict test for CAST_BEFORE_VIEW

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 13:26:56 -08:00
George Hotz
fdd46c9f28 delete view instant rule (#8616)
* remove cast before view

* greener

* indexing

* delete view instant rule

* that passes too

* openpilot too

* ack

* base on cast_before_view

* add it as a rewrite rule

* VIEW(DEVICE) is also fine

* test_shard_memory depends on forced_realize removal

* put that back, will go soon

* UOp representations change once we don't instantly fold things

* do not duplicate tests

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 16:15:13 -05:00
qazal
dddd4e5f9f hotfix: remove duplicate TestTensorMutates [pr] (#8619)
* hotfix: remove duplicate TestTensorMutates [pr]

* imports
2025-01-14 16:03:17 -05:00
nimlgen
c5782e85d2 tlsf: optimize alloc (#8608) 2025-01-14 23:48:07 +03:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
chenyu
393eec3201 raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs

skip 70B
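
A hedged sketch of the guard this change implies (illustrative names, not the exact code in Tensor.shard):
```
# Illustrative even-split guard: the sharded axis must divide evenly
# across the devices, otherwise raise instead of silently mis-sharding.
def check_even_shard(dim_size: int, n_devices: int) -> None:
  if dim_size % n_devices != 0:
    raise RuntimeError(f"size {dim_size} does not shard evenly across {n_devices} devices")

check_even_shard(4096, 4)    # ok
# check_even_shard(4096, 6)  # raises: 4096 doesn't split across 6 GPUs
```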
2025-01-14 14:51:48 -05:00
ignaciosica
d5a646d492 CUDA Turing TC (#8597)
* init turing tc

* reorder tc

* hotfix: remove some spaces

* revert var name to x

* consistent order of factors

* revert order of terms to match old stuff

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-14 10:35:14 -08:00
chenyu
cbfd51f5a5 make MultiLazyBuffer.bounds a property [pr] (#8614)
determined by the lbs' shapes and the axis
2025-01-14 13:25:54 -05:00
chenyu
52e7003414 Revert "make kits19 dataset samples have small sizes (#8591)" (#8610)
This reverts commit 76a03e950a.
2025-01-14 12:24:27 -05:00
Francis Lata
76a03e950a make kits19 dataset samples have small sizes (#8591) 2025-01-14 08:27:45 -08:00
ignaciosica
4057b98f7f rename i and j into k and row/col (#8607) 2025-01-14 08:27:05 -08:00
nimlgen
1ff6862a3d ci: sleep a bit to let the driver unload the prev pid (#8605) 2025-01-14 15:55:23 +03:00
qazal
97ec564b03 noop changes from the block_assign branch [pr] (#8606) 2025-01-14 07:47:17 -05:00
qazal
5aab2806f0 rename to test_tensor_uop + use upats for asserting [pr] (#8604)
* rename to test_tensor_uop + use upats for asserting [pr]

* fix pr
2025-01-14 05:09:56 -05:00
qazal
863abc7140 scheduling graph_rewrite prereqs for BLOCK in ASSIGN (#8598)
* remove the BUF_LIMIT assert

* skip the base one

* work

* work

* good error

* ok comment

* shorter check
2025-01-14 03:01:59 -05:00
chenyu
05e54f00d3 remove bounds from MultiLazyBuffer.from_sharded [pr] (#8603)
without a custom bound, the bound is uniquely determined by shape and axis
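
For illustration, a small sketch of how bounds fall out of shape and axis alone under an even split (hypothetical helper, not tinygrad's internals):
```
# Half-open (start, end) bounds along the shard axis are fully determined
# by the axis size and the device count once the split is even.
def even_bounds(dim_size: int, n_devices: int) -> list[tuple[int, int]]:
  step = dim_size // n_devices
  return [(i * step, (i + 1) * step) for i in range(n_devices)]

assert even_bounds(8, 4) == [(0, 2), (2, 4), (4, 6), (6, 8)]
```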
2025-01-13 23:40:05 -05:00
chenyu
d443e91d82 remove custom splits in Tensor.shard [pr] (#8602)
towards even split only
2025-01-13 21:29:13 -05:00
chenyu
227d96d7a3 remove unused src from metaop [pr] (#8601) 2025-01-13 20:28:14 -05:00
chenyu
c4e33048c6 test Tensor.clone has a different lazydata [pr] (#8600) 2025-01-13 20:13:44 -05:00
qazal
ae2229d727 assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert

* skip the base one
2025-01-13 16:32:07 -05:00
nimlgen
c2504357af am: lock to access dev (#8594)
* am: lock to access dev

* wording

* just works

* disable
2025-01-13 23:53:13 +03:00
geohotstan
4abe631b56 fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?

* move test up

* share function

* add qgemm too

* make sure qgemm comes out as int

* actually that note is not right

* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
George Hotz
d19c1c7f03 bump 75 -> 73 for test failure 2025-01-13 09:18:38 -08:00
Francis Lata
c25d5d3101 improve isin checks (#8589) 2025-01-13 12:12:31 -05:00
nimlgen
74b83c4c41 am in ci (#8532)
* try am in ci

* no sudo

* temp

* run more am test

* run half on am

* insert amdgpu

* other machine as well
2025-01-13 19:55:17 +03:00
nimlgen
d224d0ed7f nv: fix fault info (#8587)
* nv: fix fault info

* and emu for amd

* skip if not mock
2025-01-13 14:38:43 +03:00
qazal
586e730d32 use UOp.st for kernel reduce axes (#8499)
* use UOp.st for kernel reduce axes [pr]

* do not return dict
2025-01-13 06:24:11 -05:00
qazal
7562cc0399 better test for reduce swizzle + don't use double dtype [pr] (#8586)
* better test_permute_rewrite

* use float32
2025-01-13 05:02:21 -05:00
George Hotz
df59b072db rename to top_down_rewrite [pr] (#8583) 2025-01-12 18:36:38 -08:00