7726 Commits

Author SHA1 Message Date
eliotgolding
bb5ded85cc Don't rewrite idiv to rshift when numerator is negative (#8885)
* more conditions for shift rewrite mul/idiv

* make ptx test uint so the new condition is true

* delete idiv test

* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division

* mul/div by 2**(large count) is unsupported anyway
2025-02-05 07:47:33 +08:00
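The restriction in this commit can be illustrated with a small sketch. It assumes that idiv truncates toward zero as in C, and that the shift is an arithmetic right shift (which rounds toward negative infinity); `c_idiv` is a hypothetical helper written for this illustration, not a tinygrad function.

```python
def c_idiv(a: int, b: int) -> int:
    """Integer division that truncates toward zero (C semantics, assumed here)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

# Python's >> on ints is an arithmetic shift, so it floors negative values.
# Each row is (x, x idiv 2, x >> 1).
rows = [(x, c_idiv(x, 2), x >> 1) for x in (7, 8, -7, -8, -1)]
for x, div, shift in rows:
    marker = "" if div == shift else "  <- differs"
    print(f"x={x:3d}  idiv={div:3d}  rshift={shift:3d}{marker}")
```

The two results agree for non-negative numerators and for negative numerators that divide exactly, but differ for values such as -7 and -1, which is why the rewrite is only safe when the numerator is known to be non-negative.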
pedro
666b6149bc Use full soname for libgcc_s in CPUProgram (#8642) (#8896)
The number after .so is the ABI version; it is always 1 for libgcc_s.
Most Linux systems set default library versions via symlinks, which are
simply followed to reach the actual ELF. Conda instead uses linker
scripts, which ctypes doesn't follow (contents of libgcc_s.so below):
```
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library.  */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library treats this file as the actual ELF, and
ctypes.CDLL then tries to load the text file as a shared library. The
result is:
```
  File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```

Co-authored-by: uuuvn <83587632+uuuvn@users.noreply.github.com>
2025-02-05 07:45:48 +08:00
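A minimal sketch of the failure mode described in this commit message: a GNU ld linker script is plain text and lacks the ELF magic bytes, so the dynamic loader rejects it. The names `is_elf` and `load_libgcc_s` are hypothetical helpers for illustration and are not taken from tinygrad; the soname `libgcc_s.so.1` comes from the message above.

```python
import ctypes
import os
import tempfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file

def is_elf(path: str) -> bool:
    """Return True if the file at `path` starts with the ELF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC

def load_libgcc_s() -> ctypes.CDLL:
    """Hypothetical helper: load by full soname so no linker script is involved.

    Not called here, because libgcc_s.so.1 only exists on Linux-like systems.
    """
    return ctypes.CDLL("libgcc_s.so.1")

# Recreate a linker script like the one quoted in the commit message.
LD_SCRIPT = b"/* GNU ld script */\nGROUP ( libgcc_s.so.1 -lgcc )\n"
with tempfile.NamedTemporaryFile(suffix=".so", delete=False) as tmp:
    tmp.write(LD_SCRIPT)
script_path = tmp.name

looks_like_elf = is_elf(script_path)
os.unlink(script_path)
print("linker script is ELF:", looks_like_elf)
```

Passing the full soname lets the dynamic loader resolve the real shared object directly, rather than relying on whatever file the unversioned name happens to point to.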
chenyu
48349efdc1 copy is already contiguous (#8886) 2025-02-04 17:53:33 -05:00
nimlgen
4c28235bd1 am: remove hardcodes (#8895)
* am: remove hardcodes for 7900

* h
2025-02-05 00:52:53 +03:00
geohotstan
057c70b05f add onnx_helpers to extra and add ort validate to benchmark_onnx (#8890)
* start

* log severity

* only change this

* change abstraction so it's more usable for huggingface

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 16:36:01 -05:00
chenyu
89eebd4bfb pow cleanups (#8894)
more readable
2025-02-04 15:52:57 -05:00
qazal
7a9e3247c2 simple start to the Kernel UOp [pr] (#8893)
* simple start to a kernel [pr]

* add the sched_sink and spec

* rename kernels to sinks

* pylint complains
2025-02-04 21:48:15 +01:00
qazal
b4e8878e01 remove tensor_uops tracking from ScheduleContext [pr] (#8892)
* remove tensor_uops tracking from ScheduleContext [pr]

* cleaner
2025-02-04 20:34:15 +01:00
qazal
6a0da51ed0 truncate process replay logs [pr] (#8891)
* truncate process replay logs [pr]

* work

* max_lines

* bump to 1K
2025-02-04 20:26:48 +01:00
qazal
c7c279a6bd unbind ShapeTrackers without maintaining a cache [pr] (#8889)
* replace with a try [pr]

* check vars

* ahaa
2025-02-04 19:43:41 +01:00
chenyu
61de654efa minor shard cleanup [pr] (#8888) 2025-02-04 13:22:31 -05:00
qazal
6ec7f1b00f replace UPat(name="x") with UPat.var("x") [pr] (#8887)
* replace UPat(name="x") with UPat.var("x") [pr]

* a few more
2025-02-04 19:12:40 +01:00
qazal
c26b06eaeb delete fold_img_cast [pr] (#8875) 2025-02-04 18:43:45 +01:00
qazal
acf0baefee process replay from tensor uops to kernel ast (#8883)
* process replay from tensor uops to kernel ast

* this dedups

* switch back to string key
2025-02-04 18:09:20 +01:00
Ignacio Sica
dcf104ee68 ptx wmma render refactor (#8873)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 11:01:23 -05:00
qazal
b92f36179d don't use set in schedule + add GroupOp.All [pr] (#8882)
* don't use set in schedule + add GroupOp.All [pr]

* update that
2025-02-04 08:19:27 +01:00
George Hotz
56fa5c1191 dsp simulator (#8869)
* dsp simulator

* progress

* fix

* close on test tiny

* working

* less waste

* line savings

* Device DSP compiler

* mock DSP at the bottom

* DSP tests

* docker caching

* test update

* need load

* skip that test for CI DSP

* last touch

* ugh
2025-02-04 09:45:04 +08:00
chenyu
836cf42c2e fix rand_like for multi (#8880) 2025-02-03 19:00:14 -05:00
chenyu
746d899dbd move multi axis to property (#8879)
also updated tests so that axis is known prior to realize
2025-02-03 16:02:09 -05:00
nimlgen
fa90079370 amd: reallocate scratch (#8872)
* amd: reallocate scratch

* use it

* oops

* allocate default

* mypy

* ops

* address realloc from none better

* types correct

* this better

* ops

* rm
2025-02-03 23:21:37 +03:00
chenyu
ec447a31e7 factor out get_axis in multi [pr] (#8878)
ALU/REDUCE_AXIS/RESHAPE/PERMUTE can change axis. prereq to move this logic to ops.py
2025-02-03 14:39:08 -05:00
chenyu
cce26009f0 simplify pow to not call cos (#8877)
use %2 instead of cos to detect even numbers
2025-02-03 12:54:18 -05:00
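The parity trick named in this commit can be sketched as follows. This is an assumption about the idea rather than the actual tinygrad code: for an integer exponent n, cos(pi * n) is 1 when n is even and -1 when n is odd, and the same sign can be computed from n % 2. All function names here are hypothetical.

```python
import math

def sign_via_cos(n: int) -> int:
    """Sign of (-1)**n computed with cos(pi * n): +1 for even n, -1 for odd n."""
    return round(math.cos(math.pi * n))

def sign_via_mod(n: int) -> int:
    """Same sign computed from parity alone, with no transcendental call."""
    return 1 - 2 * (n % 2)

def signed_pow(base: float, exponent: int) -> float:
    """Power with a possibly negative base: magnitude first, then fix the sign."""
    magnitude = abs(base) ** exponent
    return magnitude * sign_via_mod(exponent) if base < 0 else magnitude

for n in range(-3, 4):
    print(n, sign_via_cos(n), sign_via_mod(n))
```

The modulo form uses only integer arithmetic, so it avoids the rounding step that the cosine form needs to recover an exact sign.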
geohotstan
d1aa9f30bc copy onnx_ops into onnx (#8876)
* just copy it over

* make OnnxOps a global var

* some small style stuff

* rerun CI but also some small clean up

* some comments
2025-02-03 12:15:07 -05:00
Ali Ladjevardi
73c75d6ee1 DEFINE_LOCAL variable names start from temp0, not temp1 (#8870) 2025-02-03 22:50:38 +08:00
qazal
b6c617272a New schedule.py Order [pr] (#8874) 2025-02-03 14:59:11 +02:00
George Hotz
b075aefc12 hotfix: revert llvm host_arch 2025-02-03 16:46:19 +08:00
George Hotz
a5753095dc llvm cleanups [pr] (#8867) 2025-02-03 15:32:41 +08:00
George Hotz
f484db0e63 dsp cleanups [pr] (#8866) 2025-02-03 15:18:53 +08:00
George Hotz
af2c2837f6 hotfix: skip broken test, add KERNEL Op 2025-02-03 14:02:55 +08:00
qazal
565c37c681 start simplifying the scheduler context [pr] (#8830) 2025-02-02 18:11:36 +02:00
qazal
d64af3c884 reorder simplifier and grouper logic in scheduler [pr] (#8861) 2025-02-02 17:19:52 +02:00
qazal
83a904aaad just schedule in test_recursive_pad [pr] (#8860) 2025-02-02 15:01:24 +02:00
uuuvn
6dadb60c93 LLVM JIT (+autogen llvm instead of llvmlite) (#8486)
* LLVM JIT

* Autogen LLVM

* Update autogen

* Move things around

* even more non-determinism

* windows

* more autogen weirdness

* more windows stuff

* blind windows development try 2

* more blind windows development

* even more blind windows development

* maybe i should just set up a windows vm...

* why can't everyone just use sysv abi?

* cleanup debugging stuff

* unused import

* icache flushing isn't required on x86

* merge jit_nt and jit_unix

* more

* Temporary hack to not segfault

* better error

* bad conflict resolution

* Attempt to simplify support/llvm.py

* More refactoring

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-02 19:52:42 +08:00
FICTURE7
66306b5321 Fix disk tensor assignment (#8855)
* Add test for disk tensor assignment failure

* Fix disk tensor assignment

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-02 13:50:34 +02:00
Ali Ladjevardi
6e523e4d17 Remove size arg from DEFINE_LOCAL [pr] (#8845)
* remove size arg from DEFINE_LOCAL

* make mypy happy

* whitespace

* dont change code in extra

* revert to temp1 to pass pr
2025-02-02 19:47:32 +08:00
nimlgen
7841852870 hcq pci signal fuzzer (#8854)
* hcq pci signal fuzzer

* kk

* correct
2025-02-01 23:42:27 +03:00
qazal
dc34a4146f better process_replay context print [pr] (#8856)
* better process_replay context print [pr]

* test: revert push cast

* Revert "test: revert push cast"

This reverts commit 38a2aef6f8.
2025-02-01 21:50:23 +02:00
chenyu
5b1fc4dcb2 push cast to branches in UOp where (#8850) 2025-02-01 13:55:24 -05:00
chenyu
73ee2d74c0 raise RuntimeError for int base pow (#8852)
current implementation is not precise and blocking other simplification change
2025-02-01 12:11:57 -05:00
qazal
72e1f41f8e add unbind_vars pattern matcher (#8851)
* add unbind_vars pattern matcher [pr]

* this can be cvar

* this is empty
2025-02-01 18:25:44 +02:00
nimlgen
b3fa76419a am: move queues to gpus (#8848)
* am: fix

* add flsg for thos

* do not depend on host parameter,
2025-02-01 18:02:52 +03:00
George Hotz
42d7c800a1 hotfix: add missing tinychat fonts + other assets 2025-02-01 09:34:44 +08:00
George Hotz
431a86615d fix multi Ops.CONTIGUOUS_BACKWARD [pr] (#8843) 2025-02-01 09:21:31 +08:00
Ahmed Harmouche
07d3676019 weights_only=False (#8839) 2025-01-31 17:16:47 -05:00
nimlgen
741bbc900d Revert "am: queues allocated on gpus (#8836)" (#8837)
This reverts commit 7bbb568dec.
2025-01-31 22:53:41 +03:00
nimlgen
7bbb568dec am: queues allocated on gpus (#8836)
* am: fix

* add flsg for thos
2025-01-31 22:14:43 +03:00
chenyu
1f730ae8f8 remove retain_graph in Tensor.backward [pr] (#8835)
not used. gradient accumulation works directly
2025-01-31 13:41:26 -05:00
chenyu
0a59db936a raise RuntimeError in schedule_step if not Tensor.training [pr] (#8834) 2025-01-31 12:03:04 -05:00
qazal
af4f9d1aa9 use matchers to verify AST shape [pr] (#8828)
* use matchers to verify kernel AST [pr]

* work

* use swizzle_cnt

* add comment

* imports

* modified_ast comment

* brief
2025-01-31 09:17:42 +02:00
George Hotz
643c09a6c6 tensor uop spec should be in spec.py [pr] (#8827)
* tensor uop spec should be in spec.py [pr]

* err, spec.py

* print uops can stay
2025-01-31 13:54:04 +08:00