chenyu
fd43d33b7d
shave some lines from transcend math [run_process_replay] ( #5500 )
...
* shave some lines from transcend math [run_process_replay]
* put input_dtype back
2024-07-15 21:02:24 -04:00
chenyu
63990705b5
test kernel opts case for 4 local and 4 groups ( #5499 )
...
make sure local grouped dim is correct
2024-07-15 20:09:38 -04:00
Alessandro Benetti
13e200b437
add strict mkdocs check ( #5497 )
2024-07-15 14:21:37 -07:00
nimlgen
8dfd11c1d8
docs: hcq add types ( #5495 )
...
* docs: hcq add types
* linter
2024-07-15 22:14:48 +03:00
George Hotz
aab1e8c6dc
uniform init to match torch ( #5494 )
2024-07-15 12:07:44 -07:00
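The torch behavior being matched here is presumably nn.Linear's default init: weight and bias drawn from U(-k, k) with k = 1/sqrt(in_features). A minimal sketch of that rule using tinygrad's Tensor.uniform (the shapes are made up for illustration):
```
from tinygrad import Tensor

# torch draws nn.Linear weight/bias from U(-k, k), k = 1/sqrt(in_features)
in_features, out_features = 64, 32
bound = in_features ** -0.5
weight = Tensor.uniform(out_features, in_features, low=-bound, high=bound)
bias = Tensor.uniform(out_features, low=-bound, high=bound)
```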
George Hotz
338b7590b9
hotfix: docs for BatchNorm
2024-07-15 12:04:17 -07:00
nimlgen
c9ec7ce070
start hcq docs ( #5411 )
...
* start hcq docs
* more hcq docs
* docs
* docs
* linter
* correct args
* linter
* ts returns int
2024-07-15 21:31:11 +03:00
Edward Wang
9a7d5a148e
move colorize_float to helpers.py ( #5490 )
...
* add colorize_float to helpers.py
* update references
2024-07-15 11:29:03 -07:00
P4ssenger
a347d91e0e
remove outdated thread local aliases ( #5493 )
2024-07-15 11:28:11 -07:00
qazal
ac08f0eb00
reshape rawbufs in test_linearizer ( #5492 )
...
* reshape rawbufs in test_linearizer
* fix helper_linearizer_ast
2024-07-15 19:14:38 +03:00
qazal
ae4cb7994e
run process replay with DEBUG=0 ( #5491 )
...
* process replay with DEBUG=0
* graceful shutdown
* use and
2024-07-15 16:30:57 +03:00
Tobias Fischer
e219103677
Add Pad to Pooling ( #5488 )
2024-07-14 21:50:20 -07:00
chenyu
eef43c9f49
include dims in kernel/nv invalid err msg ( #5487 )
2024-07-14 22:51:30 -04:00
chenyu
c80801c266
len(full_shape)-ki.upcasted -> first_upcasted ( #5485 )
...
[run_process_replay]
2024-07-14 20:21:18 -04:00
Tobias Fischer
5849130cbb
gather negative dim fix ( #5486 )
2024-07-14 20:20:53 -04:00
qazal
3c378efcb6
process replay docs improvements ( #5481 )
...
* minor cleanups
* docs and logs
* shorter
* comma
* s/print/logging.info [run_process_replay]
* use logging.warn
* process name is noise
* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
chenyu
613a1dbeed
render lidx starting with 0 ( #5478 )
...
* render lidx starting with 0
changed from
```
int gidx0 = gid.x; /* 4096 */
int lidx4 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx5 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx6 = lid.z; /* 2 */
```
to
```
int gidx0 = gid.x; /* 4096 */
int lidx0 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx1 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx2 = lid.z; /* 2 */
```
the existing numbering started from the pre-limited global dims, which skip numbers when there are more than 3 global dims
* don't need start_dim
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-14 16:34:04 -04:00
qazal
671779f280
limit process replay diff to ~20% of kernels ( #5480 )
...
* render lidx starting with 0
changed from
```
int gidx0 = gid.x; /* 4096 */
int lidx4 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx5 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx6 = lid.z; /* 2 */
```
to
```
int gidx0 = gid.x; /* 4096 */
int lidx0 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx1 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx2 = lid.z; /* 2 */
```
the existing numbering started from the pre-limited global dims, which skip numbers when there are more than 3 global dims
* don't need start_dim
* add changed
* env var
* more early exit
* simpler?
* Revert "Merge branch 'lidx0' into process_replay_limit"
This reverts commit cbadcfa5e9, reversing changes made to fc9bf37ee7.
* minor cleanup
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
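A sketch of the thresholding idea behind this change, with made-up names (the actual tinygrad process-replay script differs): tolerate a small fraction of changed kernels, fail only past the limit.
```
MAX_DIFF_PCT = 20

def check_diffs(kernel_changed):
    # kernel_changed[i] is True if replaying kernel i produced different code
    changed = sum(kernel_changed)
    if changed * 100 > MAX_DIFF_PCT * len(kernel_changed):
        raise AssertionError(f"{changed}/{len(kernel_changed)} kernels changed")

check_diffs([True] + [False] * 9)  # 10% changed: tolerated, not a failure
```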
chenyu
f8a47608cc
test dtype.min and dtype.max ( #5479 )
...
compared with np.iinfo for integer dtypes
2024-07-14 15:31:37 -04:00
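The check is presumably along these lines, assuming tinygrad's dtypes.min/dtypes.max helpers (which is what the title suggests):
```
import numpy as np
from tinygrad import dtypes

# integer dtype bounds should agree with numpy's iinfo
assert dtypes.min(dtypes.int32) == np.iinfo(np.int32).min == -2**31
assert dtypes.max(dtypes.int32) == np.iinfo(np.int32).max == 2**31 - 1
assert dtypes.max(dtypes.uint8) == np.iinfo(np.uint8).max == 255
```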
George Hotz
a9f5a764dc
make BatchNorm work for 2D and 3D ( #5477 )
...
* make BatchNorm work for 2D and 3D
* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
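Presumably this makes one BatchNorm class handle both 4D (BatchNorm2d-style) and 5D (BatchNorm3d-style) inputs. A usage sketch, assuming the class is exported as tinygrad.nn.BatchNorm:
```
from tinygrad import Tensor
from tinygrad.nn import BatchNorm

bn = BatchNorm(16)                        # 16 channels, shared across input ranks
out2d = bn(Tensor.randn(4, 16, 8, 8))     # NCHW, the old BatchNorm2d case
out3d = bn(Tensor.randn(4, 16, 4, 8, 8))  # NCDHW, the 3D case
print(out2d.shape, out3d.shape)
```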
chenyu
e41ab66653
use is to compare types ( #5476 )
...
new rule in latest ruff
2024-07-14 14:26:41 -04:00
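The ruff rule in question is presumably E721, which flags equality comparisons between types; the mechanical fix:
```
x = 1
assert type(x) == int  # flagged by the new rule (E721-style type comparison)
assert type(x) is int  # preferred: types are singletons, so identity is exact
```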
George Hotz
aade18d20c
beautiful_mnist in torch
2024-07-14 11:09:58 -07:00
nimlgen
604fb60143
docs: fix link to jit in env_vars ( #5474 )
2024-07-14 16:08:16 +03:00
nimlgen
61822d1a14
nv fix timeline signal rollover on copy queue ( #5473 )
...
* hotfix: nv rollover to 32 bits
* test both queues
2024-07-14 16:06:12 +03:00
nimlgen
8835d6c49a
cleanup nv/amd program ( #5449 )
...
* cleanup nv/amd program
* fix amd
* a bit cleaner
* ugh, typo
* linter
* fix nv
* tiny thing
2024-07-14 14:08:35 +03:00
qazal
0b3a34e3b1
vectorize folding [run_process_replay] ( #5470 )
...
* test_gep_vec_fold
* remove that
* fix process replay
* lint
2024-07-14 09:41:48 +03:00
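The fold being tested (test_gep_vec_fold) is presumably the classic one: taking lane i of a freshly VECTORIZEd value is just its i-th source. In pseudo-UOp terms (illustrative tuples, not tinygrad's actual UOp API):
```
def gep(src, i):
    # fold GEP(VECTORIZE(a, b, c, d), i) -> the i-th source
    if src[0] == "VECTORIZE": return src[1][i]
    return ("GEP", src, i)

vec = ("VECTORIZE", ("a", "b", "c", "d"))
assert gep(vec, 1) == "b"
```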
George Hotz
cdf63e41bf
mnist mlx example uses compile to be fair to tinyjit
2024-07-13 18:14:45 -07:00
George Hotz
8940530290
add mlx beautiful_mnist example
2024-07-13 17:55:47 -07:00
chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] ( #5467 )
2024-07-13 20:32:22 -04:00
Francis Lata
0345577032
UNet3D dataloader shared memory fix ( #5465 )
...
* create separate SharedMemory between inputs and labels
* update path check for shared mem
* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
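A sketch of the fix's shape, using the stdlib API the bullets refer to (segment names, dtypes, and sizes here are hypothetical):
```
from multiprocessing import shared_memory
import numpy as np

# separate segments for inputs (X) and labels (Y) instead of one shared one
X_shm = shared_memory.SharedMemory(name="unet3d_X", create=True, size=4 * 128**3)
Y_shm = shared_memory.SharedMemory(name="unet3d_Y", create=True, size=128**3)
X = np.ndarray((128, 128, 128), dtype=np.float32, buffer=X_shm.buf)
Y = np.ndarray((128, 128, 128), dtype=np.uint8, buffer=Y_shm.buf)
# real code must close() and unlink() these when the dataloader shuts down
```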
Carson Powers
ef578b4de8
new UOp style patterns [run_process_replay] ( #5444 )
...
* express permute srcs in uop
* loop folding / sum collapse pats -> uop style
* UNMUL, const, phi on DEFINE_ACC pats -> uop style
* fix: cvar not const
* DEFINE_ACC w/o inputs, VECTORIZE-PHI-GEP pats -> uop style
* fix VECTORIZE-PHI-GEP pat
* contractor, reducer, float4 pats -> uop style
* arange folding .where
* one more
* revert permute expression in UOp
2024-07-13 17:21:08 -07:00
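The "uop style" these bullets move toward is, roughly, building patterns with the same operator sugar as real expressions rather than spelling out node constructors. A toy illustration of the idea (tinygrad's actual PatternMatcher machinery is more involved):
```
class Var:
    def __init__(self, name): self.name = name
    def __sub__(self, other): return ("SUB", self, other)

x = Var("x")
# the sugar form reads like the expression it matches, and builds the same node
assert (x - x) == ("SUB", x, x)
```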
George Hotz
942c58be90
BEAM_COMPARE=2 validates the correctness of BEAM kernels ( #5458 )
...
* beam compare 2
* found issue maybe
* correct, not fail
* full rand
* less numpy
* extra simplify doesn't fix it
* reorder
* no numpy
* check in reverse
* test new tensor behavior
* better error msg
2024-07-13 13:53:43 -07:00
nimlgen
6943ea5f29
nv remove copy_from_cpu command ( #5459 )
2024-07-13 23:08:49 +03:00
nimlgen
67f70cef02
amd better allocation error messages ( #5462 )
...
* amd better allocation error messages
* a bit better
2024-07-13 22:55:09 +03:00
wozeparrot
2427f149a3
threefry as pattern matcher ( #5371 )
2024-07-13 11:59:03 -07:00
qazal
487ceff825
hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist ( #5456 )
2024-07-13 21:15:40 +03:00
chenyu
de6ab56458
clean up transcend math with uop syntactic sugar [run_process_replay] ( #5455 )
...
* clean up transcend math with uop syntactic sugar [run_process_replay]
* that?
* maybe
2024-07-13 14:00:14 -04:00
qazal
40ec9410f9
simpler process replay ( #5452 )
...
* remove check_process_replay
* that can go to the top
* add assert back
* [run_process_replay]
* checkout code [run_process_replay]
* temp [run_process_replay]
* revert temp [run_process_replay]
* ahh this is why [run_process_replay]
* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
chenyu
d2933d3548
simplify transcend math [run_process_replay] ( #5454 )
...
there were some (x - x) terms in dfadd2_f2_f2_f2, dfmul2_f2_f2_f2, and dfdiv2_f2_f2_f2 that the pattern matcher removed
2024-07-13 12:43:31 -04:00
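The fold responsible is the usual algebraic identity; a toy version of what such a pattern-matcher rule does (not the actual tinygrad rule):
```
def fold_sub_self(expr):
    # toy nodes: ("SUB", lhs, rhs) or a leaf; rewrite x - x -> 0, bottom-up
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(fold_sub_self(e) for e in expr[1:])
        if expr[0] == "SUB" and expr[1] == expr[2]: return 0
    return expr

assert fold_sub_self(("SUB", "x", "x")) == 0
assert fold_sub_self(("ADD", ("SUB", "y", "y"), "z")) == ("ADD", 0, "z")
```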
qazal
23b907efbb
restore process replay runs by their id ( #5453 )
2024-07-13 19:32:34 +03:00
qazal
b8c9298164
verify_lazyop in for WMMA and group_for_reduces ( #5448 )
...
* try passing no tc and group for reduces
* minor
* use op.arg
* group_for_reduces
2024-07-13 18:06:19 +03:00
George Hotz
955e1179fb
move compile tests and merge ( #5451 )
...
* move compile tests and merge
* revert enet move, bump download cache
* oh, try setting clang
2024-07-13 08:04:46 -07:00
George Hotz
e638b0084f
smaller multitensor resnet test ( #5450 )
...
* minor improvements to matcher speed [run_process_replay]
* oh, put that back
* make fake images smaller for resnet test
2024-07-13 07:31:28 -07:00
Simone Margaritelli
03c3b14cc2
docs: added JIT description to docs/env_vars.md ( #5445 )
...
* docs: added JIT description to docs/env_vars.md
* docs: rephrased JIT=2 in env_vars.md
2024-07-13 07:07:11 -07:00
qazal
bb1a9ebf78
run process replay in parallel ( #5443 )
2024-07-13 11:29:36 +03:00
chenyu
3ebf569f04
relax fuzz transcend math threshold a bit ( #5442 )
...
* relax fuzz transcend math threshold a bit
* fuzz more
* fuzz 50k
2024-07-13 03:31:21 -04:00
chenyu
e398734890
fuzz test transcend math ( #5383 )
...
* fuzz test transcend math
found something wrong with float64 sin argument reduction
```
from tinygrad import Tensor, dtypes
import numpy as np
print(Tensor([39800.0], dtype=dtypes.float64).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float32).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float16).sin().numpy())
print(np.sin(np.array([39800.0], dtype=np.float64)))
print(np.sin(np.array([39800.0], dtype=np.float32)))
print(np.sin(np.array([39800.0], dtype=np.float16)))
```
```
CLANG=1 python test.py
[0.92785633]
[0.7428573]
[-0.7705]
[0.74285722]
[0.7428572]
[-0.7705]
```
* fix test
* abs
* skip
2024-07-13 01:54:52 -04:00
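The fuzzing approach presumably boils down to comparing tinygrad's approximation against numpy over random inputs; a minimal sketch (the range and tolerance here are made up, the real test is the fuzzer added by this PR):
```
import numpy as np
from tinygrad import Tensor, dtypes

# draw random large arguments and compare against numpy's reference sin
xs = np.random.uniform(-1e5, 1e5, size=1000)
out = Tensor(xs, dtype=dtypes.float64).sin().numpy()
np.testing.assert_allclose(out, np.sin(xs), atol=1e-9)
```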
hikettei
3a7262d923
[Patch] Fixed an invalid value of fp64 xlog(DBL_MIN) ( #5441 )
...
* [Patch] Removed weird NaN handling in xlog2 that resulted in different output around 1e-203
* Patch: compare the value of xlog(x) using y, allowing x <= 1e-200
* mypy
* fuzzer tests for log2
* fix tests: use approximate dbl_min, fp64 fails at nv
* update: gradually increment the scale (if y is not inf)
2024-07-13 01:11:53 -04:00
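A sketch of the check the fuzzer bullets describe, near the bottom of the fp64 range (the exact thresholds in the PR differ; DBL_MIN is about 2.2e-308, and the breakage was reported around 1e-203):
```
import numpy as np
from tinygrad import Tensor, dtypes

for x in (1e-200, 1e-250, 1e-300):  # tiny but representable fp64 inputs
    out = Tensor([x], dtype=dtypes.float64).log2().numpy()[0]
    np.testing.assert_allclose(out, np.log2(np.float64(x)))
```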
wozeparrot
90f0e2fc49
db in wal mode ( #5388 )
2024-07-12 20:43:36 -07:00
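WAL here is sqlite's write-ahead-log journal mode; the change is presumably a one-line pragma on the cache db connection, along these lines:
```
import sqlite3

conn = sqlite3.connect("cache.db")
# write-ahead logging: readers and the single writer no longer block each other,
# which matters when parallel processes share the compile cache
conn.execute("PRAGMA journal_mode=wal")
```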
George Hotz
414aa6ee98
minor improvements to matcher speed [run_process_replay] ( #5439 )
...
* minor improvements to matcher speed [run_process_replay]
* oh, put that back
2024-07-12 20:41:41 -07:00