George Hotz
cdf63e41bf
mnist mlx example uses compile to be fair to tinyjit
2024-07-13 18:14:45 -07:00
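For context, MLX's `mx.compile` fills the role TinyJit plays in tinygrad, so compiling the MLX step keeps the benchmark fair. A minimal sketch, with a toy step function standing in for the example's real training step:
```python
import mlx.core as mx

# toy stand-in for the mnist training step; mx.compile traces and caches the
# compute graph, analogous to wrapping a function in tinygrad's TinyJit
def step(x):
    return (x * 2).sum()

step_compiled = mx.compile(step)
print(step_compiled(mx.ones((4, 4))))
```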
George Hotz
8940530290
add mlx beautiful_mnist example
2024-07-13 17:55:47 -07:00
chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] (#5467)
2024-07-13 20:32:22 -04:00
Francis Lata
0345577032
UNet3D dataloader shared memory fix (#5465)
* create separate SharedMemory blocks for inputs and labels (see the sketch below)
* update path check for shared mem
* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
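A minimal sketch of the fix's idea using the standard library's `multiprocessing.shared_memory` (shapes here are hypothetical, not the dataloader's actual code): inputs and labels each get their own SharedMemory block instead of sharing one region.
```python
from multiprocessing import shared_memory
import numpy as np

X_SHAPE, Y_SHAPE = (1, 1, 128, 128, 128), (1, 128, 128, 128)  # hypothetical shapes

# separate blocks for inputs and labels, rather than one shared region
shm_x = shared_memory.SharedMemory(create=True, size=int(np.prod(X_SHAPE)) * 4)
shm_y = shared_memory.SharedMemory(create=True, size=int(np.prod(Y_SHAPE)) * 4)
inputs = np.ndarray(X_SHAPE, dtype=np.float32, buffer=shm_x.buf)
labels = np.ndarray(Y_SHAPE, dtype=np.float32, buffer=shm_y.buf)

# ... worker processes fill the arrays, then the parent releases views and cleans up
del inputs, labels
shm_x.close(); shm_x.unlink()
shm_y.close(); shm_y.unlink()
```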
Carson Powers
ef578b4de8
new UOp style patterns [run_process_replay] (#5444)
* express permute srcs in uop
* loop folding / sum collapse pats -> uop style
* UNMUL, const, phi on DEFINE_ACC pats -> uop style
* fix: cvar not const
* DEFINE_ACC w/o inputs, VECTORIZE-PHI-GEP pats -> uop style
* fix VECTORIZE-PHI-GEP pat
* contractor, reducer, float4 pats -> uop style
* arange folding .where
* one more
* revert permute expression in UOp
2024-07-13 17:21:08 -07:00
George Hotz
942c58be90
BEAM_COMPARE=2 validates the correctness of BEAM kernels (#5458)
* beam compare 2
* found issue maybe
* correct, not fail
* full rand
* less numpy
* extra simplify doesn't fix it
* reorder
* no numpy
* check in reverse
* test new tensor behavior
* better error msg
2024-07-13 13:53:43 -07:00
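The shape of that validation, as a hedged standalone sketch (the helper and its arguments are hypothetical, not tinygrad's API): the BEAM-searched kernel and a trusted baseline run on fully random inputs, and their outputs are compared element-wise.
```python
import numpy as np

def check_kernel(run_beam, run_baseline, shapes, atol=1e-4, rtol=1e-4):
    # "full rand" inputs; any divergence points at a miscompiled BEAM kernel
    inputs = [np.random.rand(*s).astype(np.float32) for s in shapes]
    out_beam, out_base = run_beam(*inputs), run_baseline(*inputs)
    assert np.allclose(out_beam, out_base, atol=atol, rtol=rtol), \
        f"BEAM kernel mismatch: max abs err {np.abs(out_beam - out_base).max()}"
```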
nimlgen
6943ea5f29
nv remove copy_from_cpu command (#5459)
2024-07-13 23:08:49 +03:00
nimlgen
67f70cef02
amd better allocation error messages (#5462)
* amd better allocation error messages
* a bit better
2024-07-13 22:55:09 +03:00
wozeparrot
2427f149a3
threefry as pattern matcher (#5371)
2024-07-13 11:59:03 -07:00
qazal
487ceff825
hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist (#5456)
2024-07-13 21:15:40 +03:00
chenyu
de6ab56458
clean up transcend math with uop syntactic sugar [run_process_replay] (#5455)
* clean up transcend math with uop syntactic sugar [run_process_replay]
* that?
* maybe
2024-07-13 14:00:14 -04:00
qazal
40ec9410f9
simpler process replay (#5452)
* remove check_process_replay
* that can go to the top
* add assert back
* [run_process_replay]
* checkout code [run_process_replay]
* temp [run_process_replay]
* revert temp [run_process_replay]
* ahh this is why [run_process_replay]
* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
chenyu
d2933d3548
simplify transcend math [run_process_replay] (#5454)
there are some (x - x) in dfadd2_f2_f2_f2, dfmul2_f2_f2_f2, dfdiv2_f2_f2_f2 that were removed by the pattern matcher (a generic illustration follows below)
2024-07-13 12:43:31 -04:00
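A generic illustration of that kind of rewrite (a toy term rewriter, not tinygrad's PatternMatcher API): fold `x - x` into a zero constant.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str             # "sub", "const", or "var"
    srcs: tuple = ()
    arg: object = None

def rewrite(n: Node) -> Node:
    srcs = tuple(rewrite(s) for s in n.srcs)
    # pattern: a subtraction whose two sources are identical folds to 0
    if n.op == "sub" and srcs[0] == srcs[1]:
        return Node("const", arg=0.0)
    return Node(n.op, srcs, n.arg)

x = Node("var", arg="x")
print(rewrite(Node("sub", (x, x))))  # Node(op='const', srcs=(), arg=0.0)
```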
qazal
23b907efbb
restore process replay runs by their id (#5453)
2024-07-13 19:32:34 +03:00
qazal
b8c9298164
verify_lazyop for WMMA and group_for_reduces (#5448)
* try passing no tc and group for reduces
* minor
* use op.arg
* group_for_reduces
2024-07-13 18:06:19 +03:00
George Hotz
955e1179fb
move compile tests and merge (#5451)
* move compile tests and merge
* revert enet move, bump download cache
* oh, try setting clang
2024-07-13 08:04:46 -07:00
George Hotz
e638b0084f
smaller multitensor resnet test (#5450)
* minor improvements to matcher speed [run_process_replay]
* oh, put that back
* make fake images smaller for resnet test
2024-07-13 07:31:28 -07:00
Simone Margaritelli
03c3b14cc2
docs: added JIT description to docs/env_vars.md (#5445)
* docs: added JIT description to docs/env_vars.md
* docs: rephrased JIT=2 in env_vars.md
2024-07-13 07:07:11 -07:00
qazal
bb1a9ebf78
run process replay in parallel (#5443)
2024-07-13 11:29:36 +03:00
chenyu
3ebf569f04
relax fuzz transcend math threshold a bit (#5442)
* relax fuzz transcend math threshold a bit
* fuzz more
* fuzz 50k
2024-07-13 03:31:21 -04:00
chenyu
e398734890
fuzz test transcend math (#5383)
* fuzz test transcend math
found something wrong with float64 sin reduction
```
from tinygrad import Tensor, dtypes
import numpy as np
print(Tensor([39800.0], dtype=dtypes.float64).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float32).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float16).sin().numpy())
print(np.sin(np.array([39800.0], dtype=np.float64)))
print(np.sin(np.array([39800.0], dtype=np.float32)))
print(np.sin(np.array([39800.0], dtype=np.float16)))
```
```
CLANG=1 python test.py
[0.92785633]
[0.7428573]
[-0.7705]
[0.74285722]
[0.7428572]
[-0.7705]
```
* fix test
* abs
* skip
2024-07-13 01:54:52 -04:00
hikettei
3a7262d923
[Patch] Fixed an invalid value of fp64 xlog(DBL_MIN) (#5441)
* [Patch] Removed weird NaN Handling in xlog2 resulting in different output around 1e-203
* Patch: compare the value of xlog(x) using y, allowing x <= 1e-200
* mypy
* fuzzer tests for log2
* fix tests: use approximate dbl_min, fp64 fails at nv
* update: gradually increment the scale (if y is not inf)
2024-07-13 01:11:53 -04:00
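A hedged spot-check for the regression this fixes, assuming float64 support on the backend (values down toward DBL_MIN are the interesting region):
```python
import numpy as np
from tinygrad import Tensor, dtypes

xs = [1e-200, 1e-300, 2.2250738585072014e-308]  # the last value is DBL_MIN
print(Tensor(xs, dtype=dtypes.float64).log2().numpy())
print(np.log2(np.array(xs, dtype=np.float64)))  # reference values
```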
wozeparrot
90f0e2fc49
db in wal mode (#5388)
2024-07-12 20:43:36 -07:00
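For reference, SQLite's write-ahead-log mode is switched on with a pragma; a minimal sketch (the database path is hypothetical):
```python
import sqlite3

conn = sqlite3.connect("/tmp/tinygrad_cache.db")  # hypothetical cache path
# WAL lets concurrent readers proceed while a writer appends to the log
conn.execute("PRAGMA journal_mode=WAL")
conn.close()
```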
George Hotz
414aa6ee98
minor improvements to matcher speed [run_process_replay] (#5439)
* minor improvements to matcher speed [run_process_replay]
* oh, put that back
2024-07-12 20:41:41 -07:00
chenyu
4df63da190
clean up rest of the loadop [run_process_replay] (#5440)
to metaop and filter_sink
2024-07-12 23:38:51 -04:00
hikettei
0795139f30
Fix TRANSCENDENTAL=2 fp64 sin (#5385)
* fixes on transcendental: fix for fp64 payne hanek, refactor for fp16 sin
* revert the changes on test
* refactor on xsin: removed cody_waite_reduction, always use payne_hanek
* Revert "refactor on xsin: removed cody_waite_reduction, always use payne_hanek"
This reverts commit 2fd401f251.
* still need cody_waite_reduction for the very small range (see the reduction sketch below)
* test: added a regression test for transcendental sin
* test: found the worst case ULP of 3.5 only in numpy
* give the input as a valid dtype
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-12 23:15:04 -04:00
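For intuition, a minimal sketch of Cody-Waite style range reduction (the generic technique with the classic fdlibm split of pi/2, not tinygrad's exact code): subtracting the constant in two steps keeps low bits that a single `x - k*pi/2` would lose. It is only exact while k stays small, which is why Payne-Hanek takes over for large arguments.
```python
import math

PIO2_HI = 1.57079632673412561417e+00  # high bits of pi/2 (fdlibm pio2_1)
PIO2_LO = 6.07710050650619224932e-11  # low-order correction (fdlibm pio2_1t)

def reduce_pio2(x: float) -> tuple[int, float]:
    k = round(x / (math.pi / 2))
    r = (x - k * PIO2_HI) - k * PIO2_LO  # two-step subtraction preserves precision
    return k, r  # sin(x) is then ±sin(r) or ±cos(r) depending on k % 4
```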
George Hotz
fb3011ac61
improve matcher speed [run_process_replay] (#5438)
* improve matcher speed [run_process_replay]
* don't use arg set in ptx
2024-07-12 20:02:19 -07:00
George Hotz
03c2dc8bd7
lowerer is kernel [run_process_replay] (#5437)
2024-07-12 18:50:55 -07:00
George Hotz
b8342fb085
independent lowerer [run_process_replay] (#5434)
* independent lowerer [run_process_replay]
* don't relinearize PTX
* fix ptx
* Revert "fix ptx"
This reverts commit f4e8e059c0.
* Revert "don't relinearize PTX"
This reverts commit f6c12c506c.
* parents is fine, no need for linearization
* remove loop local idxs
* recover stupid loop_idxs
2024-07-12 18:08:43 -07:00
chenyu
9a187e6102
fix handcode_opt script (#5435)
* fix handcode_opt script
* run in ci
* real run in ci
* HALF=0
2024-07-12 20:52:28 -04:00
wozeparrot
b80fd7d23c
allow benchmarking forward only (#5436)
2024-07-12 17:37:49 -07:00
chenyu
00813a92a0
update Tensor.eye api to match torch (#5433)
* update Tensor.eye api to match torch
input is n for nrows and optional m for ncols (usage sketch below)
* space
* fix onnx
2024-07-12 20:25:12 -04:00
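After this change the signature matches `torch.eye(n, m=None)`; a quick usage sketch:
```python
from tinygrad import Tensor

print(Tensor.eye(3).numpy())     # 3x3 identity
print(Tensor.eye(2, 3).numpy())  # 2 rows, 3 cols, like torch.eye(2, 3)
```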
George Hotz
cddfd8e25d
bugfix: group for reduce should check all dimensions (#5431)
2024-07-12 17:02:40 -07:00
George Hotz
fbaf040baf
compute full_shape from LazyOp [run_process_replay] (#5429)
* compute full_shape from LazyOp
* put KernelInfo in the sink
* wrong but pass
2024-07-12 16:47:08 -07:00
George Hotz
870dc8c350
s/Linearizer/Lowerer [run_process_replay] (#5428)
2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0
scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]
* fix tests
* fix op + fuzzers
* fix mop test
2024-07-12 15:13:19 -07:00
chenyu
4cd1de038a
smaller reshape_and_permute arg in shift_to (#5426)
adding tuples directly
[run_process_replay]
2024-07-12 17:46:48 -04:00
George Hotz
94599c0637
fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424)
* fixup ast in kernel to be MetaOps.SINK [run_process_replay]
* fix tests
* fix more tests
2024-07-12 14:01:03 -07:00
George Hotz
b055ece550
hotfix: bump to cache gpuocelot
2024-07-12 13:54:14 -07:00
chenyu
d37056f3b1
pass Renderer.global_max / local_max into get_grouped_dims (#5423)
[run_process_replay]
2024-07-12 16:49:27 -04:00
George Hotz
4aefb1595d
MetaOps.SINK [run_process_replay] (#5422)
* s/loadops/metaops [run_process_replay]
* add metaops.sink [run_process_replay]
2024-07-12 13:37:30 -07:00
George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] (#5421)
2024-07-12 13:26:50 -07:00
nimlgen
f4944ced09
tiny amd cleanups (#5420)
2024-07-12 22:54:42 +03:00
chenyu
b17e4adb3a
add -c advice.detachedHead=false to process replay git checkout (#5419)
remove the noisy `Note: switching to 'origin/master'. You are in 'detached HEAD' state. You can look around, make experimental changes...` message from the log
2024-07-12 15:13:26 -04:00
wozeparrot
d1cbd6bb95
unify handcode_resnet_opt and handcode_bert_opt (#5418)
2024-07-12 12:05:01 -07:00
chenyu
a0dbe20dbd
skip some redundant and slow tests in ci (#5416)
2024-07-12 14:43:13 -04:00
chenyu
76125c07be
make some grouped_dim tests work (#5415)
next: need to support max size per dim, splitting, and a correct way to reverse or arbitrarily permute global dims (a simplified grouping sketch follows)
2024-07-12 14:22:50 -04:00
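A simplified sketch of the grouping side of this (hypothetical helper; the splitting and permutation mentioned above are not handled): merge adjacent global dims until at most `max_dims` remain.
```python
def group_dims(dims: list[int], max_dims: int) -> list[int]:
    # merge the adjacent pair with the smallest product to keep sizes balanced
    dims = list(dims)
    while len(dims) > max_dims:
        i = min(range(len(dims) - 1), key=lambda j: dims[j] * dims[j + 1])
        dims[i:i + 2] = [dims[i] * dims[i + 1]]
    return dims

print(group_dims([2, 3, 4, 5], 3))  # [6, 4, 5]
```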
wozeparrot
b7cc75a9df
usage summary in handcode opt (#5414)
2024-07-12 11:21:18 -07:00
uuuvn
3cb94a0a15
Rename tinygrad/runtime/driver to support (#5413)
2024-07-12 11:06:42 -07:00
nimlgen
6604d2b2c3
amd/nv respect visible devs (#5409)
* nv/amd respect visible devices
* linter
* sort amd gpus
* env docs
2024-07-12 20:02:12 +03:00