Commit Graph

5073 Commits

chenyu
de6ab56458 clean up transcend math with uop syntactic sugar [run_process_replay] (#5455)
* clean up transcend math with uop syntactic sugar [run_process_replay]

* that?

* maybe
2024-07-13 14:00:14 -04:00
qazal
40ec9410f9 simpler process replay (#5452)
* remove check_process_replay

* that can go to the top

* add assert back

* [run_process_replay]

* checkout code [run_process_replay]

* temp [run_process_replay]

* revert temp [run_process_replay]

* ahh this is why [run_process_replay]

* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
chenyu
d2933d3548 simplify transcend math [run_process_replay] (#5454)
there were some (x - x) terms in dfadd2_f2_f2_f2, dfmul2_f2_f2_f2, and dfdiv2_f2_f2_f2 that are removed by the pattern matcher (a sketch of such a rewrite follows below)
2024-07-13 12:43:31 -04:00
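The `(x - x)` cancellation above is the kind of rule a pattern matcher applies. A minimal sketch of such a rewrite over a toy expression IR; the `Expr` class and `simplify` function are illustrative, not tinygrad's actual UOp/matcher API:

```
# Minimal sketch of an (x - x) -> 0 rewrite over a tiny expression IR.
# Illustrative only; not tinygrad's UOp/PatternMatcher API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    op: str                  # "var", "const", or "sub"
    args: tuple = ()
    value: float = 0.0

def simplify(e: Expr) -> Expr:
    if e.op == "sub":
        lhs, rhs = (simplify(a) for a in e.args)   # simplify children first
        if lhs == rhs:                             # identical subtrees cancel: (x - x) -> 0
            return Expr("const", value=0.0)
        return Expr("sub", (lhs, rhs))
    return e

x = Expr("var")
print(simplify(Expr("sub", (x, x))))  # Expr(op='const', args=(), value=0.0)
```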
qazal
23b907efbb restore process replay runs by their id (#5453) 2024-07-13 19:32:34 +03:00
qazal
b8c9298164 verify_lazyop for WMMA and group_for_reduces (#5448)
* try passing no tc and group for reduces

* minor

* use op.arg

* group_for_reduces
2024-07-13 18:06:19 +03:00
George Hotz
955e1179fb move compile tests and merge (#5451)
* move compile tests and merge

* revert enet move, bump download cache

* oh, try setting clang
2024-07-13 08:04:46 -07:00
George Hotz
e638b0084f smaller multitensor resnet test (#5450)
* minor improvements to matcher speed [run_process_replay]

* oh, put that back

* make fake images smaller for resnet test
2024-07-13 07:31:28 -07:00
Simone Margaritelli
03c3b14cc2 docs: added JIT description to docs/env_vars.md (#5445)
* docs: added JIT description to docs/env_vars.md

* docs: rephrased JIT=2 in env_vars.md
2024-07-13 07:07:11 -07:00
qazal
bb1a9ebf78 run process replay in parallel (#5443) 2024-07-13 11:29:36 +03:00
chenyu
3ebf569f04 relax fuzz transcend math threshold a bit (#5442)
* relax fuzz transcend math threshold a bit

* fuzz more

* fuzz 50k
2024-07-13 03:31:21 -04:00
chenyu
e398734890 fuzz test transcend math (#5383)
* fuzz test transcend math

found something wrong with float64 sin reduction

```
from tinygrad import Tensor, dtypes
import numpy as np
print(Tensor([39800.0], dtype=dtypes.float64).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float32).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float16).sin().numpy())
print(np.sin(np.array([39800.0], dtype=np.float64)))
print(np.sin(np.array([39800.0], dtype=np.float32)))
print(np.sin(np.array([39800.0], dtype=np.float16)))
```

```
CLANG=1 python test.py
[0.92785633]
[0.7428573]
[-0.7705]
[0.74285722]
[0.7428572]
[-0.7705]
```

* fix test

* abs

* skip
2024-07-13 01:54:52 -04:00
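A hedged sketch of the fuzz comparison described in this commit: sample float64 inputs, evaluate sin through tinygrad and numpy, and report a rough ULP-style error. The sampling range, seed, and count are illustrative, not the test suite's actual parameters:

```
# Sketch of a transcendental fuzz loop, assuming tinygrad is installed.
# float64 support depends on the backend; parameters here are illustrative.
import numpy as np
from tinygrad import Tensor, dtypes

rng = np.random.default_rng(0)
xs = rng.uniform(-1e5, 1e5, size=1000).astype(np.float64)

got = Tensor(xs.tolist(), dtype=dtypes.float64).sin().numpy()
want = np.sin(xs)
# absolute error scaled by float spacing gives a rough ULP-style measure
ulps = np.abs(got - want) / np.spacing(np.abs(want))
print("worst input:", xs[np.argmax(ulps)], "ulps:", ulps.max())
```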
hikettei
3a7262d923 [Patch] Fixed an invalid value of fp64 xlog(DBL_MIN) (#5441)
* [Patch] Removed weird NaN Handling in xlog2 resulting in different output around 1e-203

* Patch: compare the value of xlog(x) using y, allowing x <= 1e-200

* mypy

* fuzzer tests for log2

* fix tests: use approximate dbl_min, fp64 fails on NV

* update: gradually increment the scale (if y is not inf)
2024-07-13 01:11:53 -04:00
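A small stdlib-only check of the behavior this patch targets: log2 must stay finite (not NaN) for tiny but still-normal float64 inputs such as 1e-203 down to DBL_MIN (`sys.float_info.min`):

```
# Sanity check for log2 near DBL_MIN; pure stdlib, no tinygrad assumptions.
import math, sys

dbl_min = sys.float_info.min          # ~2.2250738585072014e-308
print(math.log2(dbl_min))             # -1022.0
# values like 1e-203 sit well inside the normal range and must not produce NaN
for x in (1e-200, 1e-203, dbl_min):
    assert not math.isnan(math.log2(x)), x
    print(x, math.log2(x))
```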
wozeparrot
90f0e2fc49 db in wal mode (#5388) 2024-07-12 20:43:36 -07:00
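Putting a SQLite database in WAL (write-ahead logging) mode lets readers proceed concurrently with a writer. A minimal stdlib sketch, assuming a SQLite-backed cache; the filename and table are illustrative:

```
# Enable SQLite WAL mode so concurrent readers don't block on a writer.
# Illustrative cache schema; not tinygrad's actual cache layout.
import sqlite3

conn = sqlite3.connect("cache.db")
conn.execute("PRAGMA journal_mode=WAL")   # the mode persists in the database file
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")
conn.commit()
```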
George Hotz
414aa6ee98 minor improvements to matcher speed [run_process_replay] (#5439)
* minor improvements to matcher speed [run_process_replay]

* oh, put that back
2024-07-12 20:41:41 -07:00
chenyu
4df63da190 clean up rest of the loadop [run_process_replay] (#5440)
to metaop and filter_sink
2024-07-12 23:38:51 -04:00
hikettei
0795139f30 Fix TRANSCENDENTAL=2 fp64 sin (#5385)
* fixes on transcendental: fix for fp64 payne hanek, refactor for fp16 sin

* revert the changes on test

* refactor on xsin: removed cody_waite_reduction, always use payne_hanek

* Revert "refactor on xsin: removed cody_waite_reduction, always use payne_hanek"

This reverts commit 2fd401f251.

* still need cody_waite_reduction for the very small range

* test: added a regression test for transcendental sin

* test: found the worst case ulp 3.5 only in numpy

* give the input as a valid dtype

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-12 23:15:04 -04:00
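The commits above weigh Cody-Waite against Payne-Hanek argument reduction. In Cody-Waite, pi/2 is split into a high part and a low correction so each multiply-subtract stays nearly exact for small-to-moderate inputs; for very large |x| the quotient loses precision, which is where Payne-Hanek takes over. A hedged sketch using standard fdlibm-style split constants (tinygrad's actual constants and thresholds may differ):

```
# Cody-Waite style range reduction for sin, sketched in plain Python floats.
import math

# pi/2 split into a high part exact in float64 and a low correction term
PIO2_HI = 1.5707963267341256e+00
PIO2_LO = 6.077100506506192e-11

def reduce_arg(x: float) -> tuple[float, int]:
    q = round(x / (math.pi / 2))           # nearest multiple of pi/2
    r = (x - q * PIO2_HI) - q * PIO2_LO    # two-step subtraction limits rounding error
    return r, q & 3                        # reduced argument and quadrant

r, quad = reduce_arg(100.0)
# sin(r + q*pi/2) by quadrant: sin, cos, -sin, -cos
recon = [math.sin, math.cos, lambda t: -math.sin(t), lambda t: -math.cos(t)][quad](r)
print(recon, math.sin(100.0))              # both ~ -0.5063656411097588
```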
George Hotz
fb3011ac61 improve matcher speed [run_process_replay] (#5438)
* improve matcher speed [run_process_replay]

* don't use arg set in ptx
2024-07-12 20:02:19 -07:00
George Hotz
03c2dc8bd7 lowerer is kernel [run_process_replay] (#5437) 2024-07-12 18:50:55 -07:00
George Hotz
b8342fb085 independent lowerer [run_process_replay] (#5434)
* independent lowerer [run_process_replay]

* don't relinearize PTX

* fix ptx

* Revert "fix ptx"

This reverts commit f4e8e059c0.

* Revert "don't relinearize PTX"

This reverts commit f6c12c506c.

* parents is fine, no need for linearization

* remove loop local idxs

* recover stupid loop_idxs
2024-07-12 18:08:43 -07:00
chenyu
9a187e6102 fix handcode_opt script (#5435)
* fix handcode_opt script

* run in ci

* real run in ci

* HALF=0
2024-07-12 20:52:28 -04:00
wozeparrot
b80fd7d23c allow benchmarking forward only (#5436) 2024-07-12 17:37:49 -07:00
chenyu
00813a92a0 update Tensor.eye api to match torch (#5433)
* update Tensor.eye api to match torch

the input is n for the number of rows, with optional m for the number of columns (usage sketch below)

* space

* fix onnx
2024-07-12 20:25:12 -04:00
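A short usage sketch of the new signature; the shapes follow from the commit message, and torch.eye takes the same arguments:

```
from tinygrad import Tensor

print(Tensor.eye(3).numpy())      # 3x3 identity
print(Tensor.eye(2, 3).numpy())   # 2 rows, 3 columns: rectangular "identity"
```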
George Hotz
cddfd8e25d bugfix: group for reduce should check all dimensions (#5431) 2024-07-12 17:02:40 -07:00
George Hotz
fbaf040baf compute full_shape from LazyOp [run_process_replay] (#5429)
* compute full_shape from LazyOp

* put KernelInfo in the sink

* wrong but pass
2024-07-12 16:47:08 -07:00
George Hotz
870dc8c350 s/Linearizer/Lowerer [run_process_replay] (#5428) 2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0 scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]

* fix tests

* fix op + fuzzers

* fix mop test
2024-07-12 15:13:19 -07:00
chenyu
4cd1de038a smaller reshape_and_permute arg in shift_to (#5426)
adding tuples directly
[run_process_replay]
2024-07-12 17:46:48 -04:00
George Hotz
94599c0637 fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424)
* fixup ast in kernel to be MetaOps.SINK [run_process_replay]

* fix tests

* fix more tests
2024-07-12 14:01:03 -07:00
George Hotz
b055ece550 hotfix: bump to cache gpuocelot 2024-07-12 13:54:14 -07:00
chenyu
d37056f3b1 pass Renderer.global_max / local_max into get_grouped_dims (#5423)
[run_process_replay]
2024-07-12 16:49:27 -04:00
George Hotz
4aefb1595d MetaOps.SINK [run_process_replay] (#5422)
* s/loadops/metaops [run_process_replay]

* add metaops.sink [run_process_replay]
2024-07-12 13:37:30 -07:00
George Hotz
f6ef283e6a s/loadops/metaops [run_process_replay] (#5421) 2024-07-12 13:26:50 -07:00
nimlgen
f4944ced09 tiny amd cleanups (#5420) 2024-07-12 22:54:42 +03:00
chenyu
b17e4adb3a add -c advice.detachedHead=false to process replay git checkout (#5419)
remove the noisy `Note: switching to 'origin/master'. You are in 'detached HEAD' state. You can look around, make experimental changes...` message from the log
2024-07-12 15:13:26 -04:00
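A hedged sketch of the quieted checkout as a process-replay script might invoke it from Python; the ref comes from the quoted message, and the call site is illustrative:

```
# Check out a ref without git's multi-line detached-HEAD advice.
# "origin/master" is from the message above; where this runs is illustrative.
import subprocess

subprocess.run(
    ["git", "-c", "advice.detachedHead=false", "checkout", "origin/master"],
    check=True,
)
```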
wozeparrot
d1cbd6bb95 unify handcode_resnet_opt and handcode_bert_opt (#5418) 2024-07-12 12:05:01 -07:00
chenyu
a0dbe20dbd skip some redundant and slow tests in ci (#5416) 2024-07-12 14:43:13 -04:00
chenyu
76125c07be make some grouped_dim test work (#5415)
next we need to support a max size per dim, splitting, and a correct way to reverse or arbitrarily permute the global dims (see the sketch after this entry)
2024-07-12 14:22:50 -04:00
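A hedged sketch of the merging half of that work: collapse adjacent global dims until the count fits the backend's limit. The function name and strategy are illustrative, not tinygrad's get_grouped_dims, and splitting an oversized single dim is not shown:

```
# Collapse a list of global dims to at most max_dims entries by merging
# adjacent pairs. Illustrative sketch only.
from math import prod

def group_dims(dims: list[int], max_dims: int) -> list[int]:
    while len(dims) > max_dims:
        # merge the adjacent pair with the smallest product to keep sizes balanced
        i = min(range(len(dims) - 1), key=lambda j: dims[j] * dims[j + 1])
        dims = dims[:i] + [dims[i] * dims[i + 1]] + dims[i + 2:]
    return dims

out = group_dims([64, 8, 4, 2], 3)
assert prod(out) == prod([64, 8, 4, 2])   # total work is preserved
print(out)  # [64, 8, 8]
```

Merging the smallest adjacent pair keeps the resulting sizes balanced, which matters once per-dim maxima are enforced; only adjacent dims can be merged so that contiguous index math stays valid.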
wozeparrot
b7cc75a9df usage summary in handcode opt (#5414) 2024-07-12 11:21:18 -07:00
uuuvn
3cb94a0a15 Rename tinygrad/runtime/driver to support (#5413) 2024-07-12 11:06:42 -07:00
nimlgen
6604d2b2c3 amd/nv respect visible devs (#5409)
* nv/amd respect visible devices

* linter

* sort amd gpus

* env docs
2024-07-12 20:02:12 +03:00
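A hedged sketch of how a runtime might honor a visible-devices environment variable; the variable name and filtering rules here are illustrative, so see the repo's env docs for the actual behavior:

```
# Filter enumerated GPU indices by an env var. Sketch only; the variable
# name VISIBLE_DEVICES and the semantics are assumptions, not the repo's code.
import os

def visible_devices(all_devs: list[int]) -> list[int]:
    env = os.getenv("VISIBLE_DEVICES")
    if env is None:
        return sorted(all_devs)             # the commit also sorts AMD GPUs
    want = [int(i) for i in env.split(",") if i != ""]
    return [d for d in want if d in all_devs]

print(visible_devices([0, 1, 2, 3]))  # all devices, sorted, when env is unset
```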
Roelof van Dijk
b18aa00bba refactor: consolidate replace [run_process_replay] (#5403) 2024-07-12 07:36:57 -07:00
chenyu
497274f663 add float64 to test_dtype_alu dtypes_float (#5410)
* add float64 to test_dtype_alu dtypes_float

* CUDACPU float64 crashes

* real NV failed
2024-07-12 10:21:32 -04:00
qazal
31fcc516dc more process replay tooling (#5407)
* replays

* what's in there

* can it be up there

* sha is enough

* insert sha as the key

* fix str

* update reset utils

* that nested try/except was terrible

* github_context can go
2024-07-12 13:11:34 +03:00
Roelof van Dijk
6ec7dbc287 ci: parallelize uops tests (#5405) 2024-07-12 11:22:41 +03:00
qazal
e22b377839 generalize FUSE_AS_ONE_KERNEL in the scheduler (#5397)
* test: use const

* hotfix: base

* asserts

* dont push through reshape

* cleanup

* dont need the cache

* test_reduceop_reshape_dont_push and test_index_fused are next
2024-07-12 10:23:16 +03:00
chenyu
6e0a523078 repro slow resnet kernel with 4 global dims (#5402)
* repro slow resnet kernel with 4 global dims

* fix ruff
2024-07-11 23:31:15 -04:00
George Hotz
8390feb7b9 optim.OptimizerGroup in hlb_cifar (#5401) 2024-07-11 20:14:36 -07:00
George Hotz
01fbd18209 metal compile fail 2024-07-11 19:27:05 -07:00
George Hotz
3a2b5a75d2 improve single kernel indexing (#5398)
* improve single kernel indexing

* metadata in graph (#5399)

* indexing is O(1)

* add failing test

* ugh, that all needs to be replaced with symbolic

* broken on ptx, it's fine

---------

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-11 19:00:57 -07:00
wozeparrot
c24d495ef9 metadata in handcode_opt (#5400) 2024-07-11 17:45:34 -07:00