George Hotz
b8342fb085
independent lowerer [run_process_replay] (#5434)
* independent lowerer [run_process_replay]
* don't relinearize PTX
* fix ptx
* Revert "fix ptx"
This reverts commit f4e8e059c0.
* Revert "don't relinearize PTX"
This reverts commit f6c12c506c.
* parents is fine, no need for linearization
* remove loop local idxs
* recover stupid loop_idxs
2024-07-12 18:08:43 -07:00
chenyu
9a187e6102
fix handcode_opt script (#5435)
* fix handcode_opt script
* run in ci
* real run in ci
* HALF=0
2024-07-12 20:52:28 -04:00
wozeparrot
b80fd7d23c
allow benchmarking forward only (#5436)
2024-07-12 17:37:49 -07:00
chenyu
00813a92a0
update Tensor.eye api to match torch (#5433)
* update Tensor.eye api to match torch
input is n for the number of rows and optional m for the number of columns
* space
* fix onnx
2024-07-12 20:25:12 -04:00
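The eye change above adopts the torch.eye(n, m=None) signature: n rows, optional m columns defaulting to n. A minimal pure-Python sketch of those semantics (a hypothetical helper for illustration, not the tinygrad implementation):

```python
def eye(n, m=None):
    # torch-style semantics: n rows; m columns, defaulting to n (square)
    if m is None:
        m = n
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(n)]

print(eye(2, 3))  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```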
George Hotz
cddfd8e25d
bugfix: group for reduce should check all dimensions (#5431)
2024-07-12 17:02:40 -07:00
George Hotz
fbaf040baf
compute full_shape from LazyOp [run_process_replay] (#5429)
* compute full_shape from LazyOp
* put KernelInfo in the sink
* wrong but pass
2024-07-12 16:47:08 -07:00
George Hotz
870dc8c350
s/Linearizer/Lowerer [run_process_replay] (#5428)
2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0
scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]
* fix tests
* fix op + fuzzers
* fix mop test
2024-07-12 15:13:19 -07:00
chenyu
4cd1de038a
smaller reshape_and_permute arg in shift_to (#5426)
adding tuples directly
[run_process_replay]
2024-07-12 17:46:48 -04:00
George Hotz
94599c0637
fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424)
* fixup ast in kernel to be MetaOps.SINK [run_process_replay]
* fix tests
* fix more tests
2024-07-12 14:01:03 -07:00
George Hotz
b055ece550
hotfix: bump to cache gpuocelot
2024-07-12 13:54:14 -07:00
chenyu
d37056f3b1
pass Renderer.global_max / local_max into get_grouped_dims (#5423)
[run_process_replay]
2024-07-12 16:49:27 -04:00
George Hotz
4aefb1595d
MetaOps.SINK [run_process_replay] (#5422)
* s/loadops/metaops [run_process_replay]
* add metaops.sink [run_process_replay]
2024-07-12 13:37:30 -07:00
George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] (#5421)
2024-07-12 13:26:50 -07:00
nimlgen
f4944ced09
tiny amd cleanups (#5420)
2024-07-12 22:54:42 +03:00
chenyu
b17e4adb3a
add -c advice.detachedHead=false to process replay git checkout (#5419)
removes the noisy `Note: switching to 'origin/master'. You are in 'detached HEAD' state. You can look around, make experimental changes...` from the log
2024-07-12 15:13:26 -04:00
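The flag in the change above is standard git config; silencing the detached-HEAD advice for a single invocation looks like this (a sketch of the pattern; the actual process replay script differs):

```shell
# -c sets a one-off config value: suppress the detached-HEAD advice
# for just this checkout, without touching the repo's config
git -c advice.detachedHead=false checkout origin/master
```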
wozeparrot
d1cbd6bb95
unify handcode_resnet_opt and handcode_bert_opt (#5418)
2024-07-12 12:05:01 -07:00
chenyu
a0dbe20dbd
skip some redundant and slow tests in ci (#5416)
2024-07-12 14:43:13 -04:00
chenyu
76125c07be
make some grouped_dim tests work (#5415)
next: need to support a max size per dim, splitting, and a correct way to reverse or arbitrarily permute global dims
2024-07-12 14:22:50 -04:00
wozeparrot
b7cc75a9df
usage summary in handcode opt (#5414)
2024-07-12 11:21:18 -07:00
uuuvn
3cb94a0a15
Rename tinygrad/runtime/driver to support (#5413)
2024-07-12 11:06:42 -07:00
nimlgen
6604d2b2c3
amd/nv respect visible devs (#5409)
* nv/amd respect visible devices
* linter
* sort amd gpus
* env docs
2024-07-12 20:02:12 +03:00
Roelof van Dijk
b18aa00bba
refactor: consolidate replace [run_process_replay] (#5403)
2024-07-12 07:36:57 -07:00
chenyu
497274f663
add float64 to test_dtype_alu dtypes_float (#5410)
* add float64 to test_dtype_alu dtypes_float
* CUDACPU float64 crashes
* real NV failed
2024-07-12 10:21:32 -04:00
qazal
31fcc516dc
more process replay tooling (#5407)
* replays
* what's in there
* can it be up there
* sha is enough
* insert sha as the key
* fix str
* update reset utils
* that nested try/except was terrible
* github_context can go
2024-07-12 13:11:34 +03:00
Roelof van Dijk
6ec7dbc287
ci: parallelize uops tests (#5405)
2024-07-12 11:22:41 +03:00
qazal
e22b377839
generalize FUSE_AS_ONE_KERNEL in the scheduler (#5397)
* test: use const
* hotfix: base
* asserts
* dont push through reshape
* cleanup
* dont need the cache
* test_reduceop_reshape_dont_push and test_index_fused are next
2024-07-12 10:23:16 +03:00
chenyu
6e0a523078
repro slow resnet kernel with 4 global dims (#5402)
* repro slow resnet kernel with 4 global dims
* fix ruff
2024-07-11 23:31:15 -04:00
George Hotz
8390feb7b9
optim.OptimizerGroup in hlb_cifar (#5401)
2024-07-11 20:14:36 -07:00
George Hotz
01fbd18209
metal compile fail
2024-07-11 19:27:05 -07:00
George Hotz
3a2b5a75d2
improve single kernel indexing (#5398)
* improve single kernel indexing
* metadata in graph (#5399)
* indexing is O(1)
* add failing test
* ugh, that all needs to be replaced with symbolic
* broken on ptx, it's fine
---------
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-11 19:00:57 -07:00
wozeparrot
c24d495ef9
metadata in handcode_opt (#5400)
2024-07-11 17:45:34 -07:00
wozeparrot
c60838594c
metadata in graph (#5399)
2024-07-11 17:02:12 -07:00
George Hotz
c2da4454cd
indexing getting better (#5389)
* indexing getting better [run_process_replay] [no_assert]
* fix test
* test_arange_2_reduce is a simpler test
* put that print back, NOOPT
* don't merge reduces (they could be different reduces)
* FUSE_AS_ONE_KERNEL
* fix tests
* fix test_var_multireduce
* w/e put that there
* fails on others too
* fix test, revert UNMUL change
* in case order matters
* one kernel indexing works
* one kernel indexing works (test other)
2024-07-11 16:41:51 -07:00
qazal
9712d9ffb6
pass lowering errors if not asserting process replay (#5395)
* pass lowering errors if not asserting process replay
* ProcessReplayError
2024-07-11 19:09:12 -04:00
wozeparrot
a02b38c0ac
download openimages by running it (#5396)
2024-07-11 16:06:13 -07:00
qazal
0421f5d83e
hotfix: compare test_var_multireduce against numpy (#5394)
2024-07-11 18:57:08 -04:00
qazal
b91a0ccdc3
make [run_process_replay] [no_assert] the default (#5390)
2024-07-11 22:36:59 +03:00
George Hotz
e8191479a3
add bigint type for indexing [run_process_replay] (#5387)
2024-07-11 11:37:10 -07:00
George Hotz
5232e405ce
hotfix: add BS to beautiful_mnist
2024-07-11 10:55:05 -07:00
George Hotz
3e40211e45
add UOP_IS_SYMBOLIC [run_process_replay] [no_assert] (#5386)
* cleanup a few things in uops [run_process_replay] [no_assert]
* add optional UOP_IS_SYMBOLIC
2024-07-11 10:48:45 -07:00
nimlgen
b3790b759b
nv cleanup gpfifo setup (#5382)
* nv cleanup gpfifo setup
* save lines
2024-07-11 17:50:52 +03:00
chenyu
416f838a1a
hotfix: tqdm respects total=0 if set (#5380)
if you explicitly pass total=0, it should use 0 instead of inferring the length from the iterable; matches tqdm's behavior
2024-07-11 10:30:12 -04:00
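The total=0 fix above amounts to treating an explicitly passed total, even 0, as authoritative over the iterable's length. A hedged sketch of that resolution logic (hypothetical helper, not tinygrad's actual tqdm code):

```python
def resolve_total(iterable, total=None):
    # an explicit total (including 0) wins over len(iterable)
    if total is not None:
        return total
    try:
        return len(iterable)
    except TypeError:
        return None  # unsized iterable: no total known

print(resolve_total([1, 2, 3], total=0))  # 0, not 3
```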
nimlgen
2ba96d4c29
nv use mv_address (#5381)
* nv use mv_address
* unused import
2024-07-11 16:45:03 +03:00
nimlgen
bd77efda2f
add HWCommandQueue base class for hcq devices (#5303)
* add HWCommandQueue as base queue for hcq devices
* try this
* fixes
* comments
* linter
* linter2
* linter
* linter
* fixed
* revert this
2024-07-11 16:19:13 +03:00
qazal
dc3ea78560
hotfix: faster UOps.END* insert [run_process_replay] (#5377)
* is this faster
* p2
* don't waste lines
2024-07-11 13:20:19 +03:00
qazal
004366b193
context aware process replay [run_process_replay] (#5378)
* test tc as ctx var
* remove from opts
* process replay
* pop variable
* B -> Variable
* fix re-assign
* pop temp vars
* move TRANSCENDENTAL=2
2024-07-11 13:07:28 +03:00
qazal
45e1b9d5e3
use TC options as ContextVars [run_process_replay] (#5379)
* delete from renderer
* move to ctx
2024-07-11 12:01:36 +03:00
qazal
289fd2e940
Lowerer cleanup 2 [run_process_replay] (#5376)
* test outbufs delete
* comments
* valid is bool
2024-07-11 10:56:53 +03:00
qazal
9ca2d96b6b
delete extra check in DEFINE_ACC [run_process_replay] (#5375)
2024-07-11 10:49:03 +03:00