Francis Lam
2d53abb04a
test/external/fuzz_linearizer: fix for new AST changes (#5519)
* test/external/fuzz_linearizer: fix for new AST changes
also add beautiful_mnist failures
* add CLANG and LLVM to test_failure_35 failed_platforms
* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
Edward Wang
9a7d5a148e
move colorize_float to helpers.py (#5490)
* add colorize_float to helpers.py
* update references
2024-07-15 11:29:03 -07:00
qazal
ae4cb7994e
run process replay with DEBUG=0 (#5491)
* process replay with DEBUG=0
* graceful shutdown
* use and
2024-07-15 16:30:57 +03:00
qazal
3c378efcb6
process replay docs improvements (#5481)
* minor cleanups
* docs and logs
* shorter
* comma
* s/print/logging.info [run_process_replay]
* use logging.warn
* process name is noise
* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
qazal
671779f280
limit process replay diff to ~20% of kernels (#5480)
* render lidx starting with 0
changed from
```
int gidx0 = gid.x; /* 4096 */
int lidx4 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx5 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx6 = lid.z; /* 2 */
```
to
```
int gidx0 = gid.x; /* 4096 */
int lidx0 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx1 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx2 = lid.z; /* 2 */
```
the existing numbering started from the pre-limited global dims, which skip numbers when there are more than 3 global dims
* don't need start_dim
* add changed
* env var
* more early exit
* simpler?
* Revert "Merge branch 'lidx0' into process_replay_limit"
This reverts commit cbadcfa5e9, reversing
changes made to fc9bf37ee7.
* minor cleanup
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
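The renumbering in #5480's first bullet can be sketched roughly as follows; `render_special_indices` and its arguments are hypothetical names for illustration, not tinygrad's actual renderer API:

```python
def render_special_indices(global_dims, local_dims):
    """Emit gidx/lidx declarations, numbering local indices from 0
    instead of continuing after the global-dim numbering
    (hypothetical sketch, not tinygrad's real renderer)."""
    axes = "xyz"
    # global indices keep their own numbering, gidx0..gidxN
    lines = [f"int gidx{i} = gid.{axes[i]}; /* {size} */" for i, size in enumerate(global_dims)]
    # local indices restart at lidx0 rather than continuing from the global count
    lines += [f"int lidx{i} = lid.{axes[i]}; /* {size} */" for i, size in enumerate(local_dims)]
    return lines
```

With the shapes from the listings above, `render_special_indices([4096, 7, 7], [8, 8, 2])` yields `lidx0`..`lidx2` as in the "after" snippet, instead of `lidx4`..`lidx6`.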
qazal
0b3a34e3b1
vectorize folding [run_process_replay] (#5470)
* test_gep_vec_fold
* remove that
* fix process replay
* lint
2024-07-14 09:41:48 +03:00
chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] (#5467)
2024-07-13 20:32:22 -04:00
Francis Lata
0345577032
UNet3D dataloader shared memory fix (#5465)
* create separate SharedMemory between inputs and labels
* update path check for shared mem
* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
qazal
487ceff825
hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist (#5456)
2024-07-13 21:15:40 +03:00
qazal
40ec9410f9
simpler process replay (#5452)
* remove check_process_replay
* that can go to the top
* add assert back
* [run_process_replay]
* checkout code [run_process_replay]
* temp [run_process_replay]
* revert temp [run_process_replay]
* ahh this is why [run_process_replay]
* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
qazal
23b907efbb
restore process replay runs by their id (#5453)
2024-07-13 19:32:34 +03:00
qazal
bb1a9ebf78
run process replay in parallel (#5443)
2024-07-13 11:29:36 +03:00
George Hotz
fb3011ac61
improve matcher speed [run_process_replay] (#5438)
* improve matcher speed [run_process_replay]
* don't use arg set in ptx
2024-07-12 20:02:19 -07:00
George Hotz
03c2dc8bd7
lowerer is kernel [run_process_replay] (#5437)
2024-07-12 18:50:55 -07:00
wozeparrot
b80fd7d23c
allow benchmarking forward only (#5436)
2024-07-12 17:37:49 -07:00
George Hotz
870dc8c350
s/Linearizer/Lowerer [run_process_replay] (#5428)
2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0
scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]
* fix tests
* fix op + fuzzers
* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
94599c0637
fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424)
* fixup ast in kernel to be MetaOps.SINK [run_process_replay]
* fix tests
* fix more tests
2024-07-12 14:01:03 -07:00
George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] (#5421)
2024-07-12 13:26:50 -07:00
uuuvn
3cb94a0a15
Rename tinygrad/runtime/driver to support (#5413)
2024-07-12 11:06:42 -07:00
qazal
31fcc516dc
more process replay tooling (#5407)
* replays
* what's in there
* can it be up there
* sha is enough
* insert sha as the key
* fix str
* update reset utils
* that nested try/except was terrible
* github_context can go
2024-07-12 13:11:34 +03:00
chenyu
6e0a523078
repro slow resnet kernel with 4 global dims (#5402)
* repro slow resnet kernel with 4 global dims
* fix ruff
2024-07-11 23:31:15 -04:00
George Hotz
01fbd18209
metal compile fail
2024-07-11 19:27:05 -07:00
qazal
9712d9ffb6
pass lowering errors if not asserting process replay (#5395)
* pass lowering errors if not asserting process replay
* ProcessReplayError
2024-07-11 19:09:12 -04:00
qazal
004366b193
context aware process replay [run_process_replay] (#5378)
* test tc as ctx var
* remove from opts
* process replay
* pop variable
* B -> Variable
* fix re-assign
* pop temp vars
* move TRANSCENDENTAL=2
2024-07-11 13:07:28 +03:00
George Hotz
d13654a820
move uopgraph to file [run_process_replay] (#5364)
* move uopgraph to file [run_process_replay]
* fix print tree test
2024-07-10 17:34:50 -07:00
Elias Wahl
097268fab3
Add layerwise performance bench for bert (#5349)
* add bert bench
* don't disable by default
* remove lr
* linter
2024-07-09 15:03:25 -04:00
qazal
bee96a19ff
fuzz uop schedules (#5345)
* basic blocks + cleanups
* fixups
* elif is better for future me
* fuzz_schedule_max_paths
* fix linter
2024-07-09 15:24:56 +03:00
qazal
d813617742
prescheduling refactor (#5300)
* p1
* refactor tuple
2024-07-06 12:04:03 +03:00
qazal
b369e75ed0
refactor schedule creation (#5297)
2024-07-05 21:14:38 +03:00
chenyu
f1ff65e763
remove "no-nans-fp-math"="true" for LLVM (#5282)
fixed isnan for llvm (there is still an issue with `< nan`)
2024-07-03 17:52:50 -04:00
nimlgen
7be776f9af
add _alloc_signal/_free_signal to hcq (#5264)
* add _alloc_signal/_free_signal api
* oops, revert this
* linter
2024-07-02 23:35:39 +03:00
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Separate Files (#5253)
* pulled clip and unet into separate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
Roelof van Dijk
975b811ad9
names shadowing builtins (#5179)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
Roelof van Dijk
f88f71d73a
ruff: unnecessary-comprehension (#5174)
* enable ruff C416 unnecessary-comprehension
* already a list
2024-06-27 07:45:29 -04:00
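Ruff's C416 rule, enabled above, flags comprehensions that merely copy an iterable; a minimal illustration (variable names are ours, not from the PR):

```python
pairs = [(1, "a"), (2, "b")]

# Flagged by ruff C416: the comprehension only copies the iterable element-by-element
copied = [p for p in pairs]

# Preferred form: construct the list directly
copied_fixed = list(pairs)
```

Both produce equal lists; the second bullet ("already a list") suggests some call sites could drop the copy entirely.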
qazal
6ca7b13ed1
limit pickled objects [run_process_replay] (#5154)
* limit pickled objects
* delete uop from the list
* debug metal
* need self.opts for TC
* dont need device
* [run_process_replay]
* minor
2024-06-26 13:51:32 +03:00
George Hotz
63ba2d05d1
uops dfs cleanup (#5147)
* uops dfs cleanup
* Update uops.py
2024-06-25 18:51:42 -07:00
chenyu
e356807696
tinytqdm.set_description and tinytrange (#5101)
2024-06-22 14:45:06 -04:00
chenyu
8080298739
s/tinytqdm/tqdm (#5103)
except in the unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
nimlgen
f1e758bacb
graph fuzzer (#5082)
* graph fuzzer
* more options
* mypy
* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal
8aa786232d
docs for running process replay locally (#5083)
2024-06-21 09:55:08 -04:00
George Hotz
6f6b3b10c9
import from uops, not linearizer (#5064)
2024-06-20 08:08:44 -07:00
qazal
ee01e464e3
use process replay as a diff creator (#4903)
* add no_assert option [run_process_replay] [no_assert]
* test [run_process_replay] [no_assert]
* [run_process_replay]
* back to normal [run_process_replay]
* remove the log
2024-06-19 18:17:31 +03:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples (#5038)
* use tinytqdm in active tests and examples
stress test this before 0.9.1
* no set_description
2024-06-18 16:01:19 -04:00
kormann
7c3b877216
rename uop [run_process_replay] (#5031)
* rename
* fix unittests
* rename vin
* fix test
* fix type [run_process_replay]
* rm pre commit hook change
2024-06-18 21:34:05 +03:00
nimlgen
794acefbf3
hcq update waits and signals in place (#4984)
* hcq update waits and signals in place
* start amd
* amd works
* prettier
* test
* normal messages
* linter
* linter 2
2024-06-17 17:19:07 +03:00
qazal
71aad183fd
check Program from HEAD [run_process_replay] (#4996)
* use the same prg [run_process_replay]
* put var back
2024-06-16 20:12:30 +03:00
chenyu
67e8df4969
remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py.
after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
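The refactor above replaces a `dtype.np` attribute with a lookup helper; a minimal sketch of the idea (the name-keyed mapping below is hypothetical, while the real `_to_np_dtype` in tensor.py works on tinygrad's DType objects):

```python
import numpy as np

# Hypothetical name-keyed stand-in for tensor.py's _to_np_dtype
_NP_DTYPES = {"float32": np.float32, "int32": np.int32, "uint8": np.uint8, "bool": np.bool_}

def to_np_dtype(name):
    """Return the numpy scalar type for a dtype name, or None if numpy has no equivalent."""
    return _NP_DTYPES.get(name)
```

The point of the change: only the boundaries listed above (Tensor(np.ndarray), `.numpy()` output, numpy random buffers) ask for a numpy type, instead of every dtype carrying a `.np` field.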
George Hotz
14189bca68
graph_dedup function [run_process_replay] (#4955)
2024-06-14 04:24:37 -07:00
George Hotz
63a8add2c2
move uops add logic to linearize (#4952)
* move logic to linearize
* idk how this should work
* empty
2024-06-14 03:52:37 -07:00