Commit Graph

846 Commits

Author SHA1 Message Date
George Hotz
870dc8c350 s/Linearizer/Lowerer [run_process_replay] (#5428) 2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0 scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]

* fix tests

* fix op + fuzzers

* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
94599c0637 fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424)
* fixup ast in kernel to be MetaOps.SINK [run_process_replay]

* fix tests

* fix more tests
2024-07-12 14:01:03 -07:00
George Hotz
f6ef283e6a s/loadops/metaops [run_process_replay] (#5421) 2024-07-12 13:26:50 -07:00
uuuvn
3cb94a0a15 Rename tinygrad/runtime/driver to support (#5413) 2024-07-12 11:06:42 -07:00
qazal
31fcc516dc more process replay tooling (#5407)
* replays

* what's in there

* can it be up there

* sha is enough

* insert sha as the key

* fix str

* update reset utils

* that nested try/except was terrible

* github_context can go
2024-07-12 13:11:34 +03:00
chenyu
6e0a523078 repro slow resnet kernel with 4 global dims (#5402)
* repro slow resnet kernel with 4 global dims

* fix ruff
2024-07-11 23:31:15 -04:00
George Hotz
01fbd18209 metal compile fail 2024-07-11 19:27:05 -07:00
qazal
9712d9ffb6 pass lowering errors if not asserting process replay (#5395)
* pass lowering errors if not asserting process replay

* ProcessReplayError
2024-07-11 19:09:12 -04:00
qazal
004366b193 context aware process replay [run_process_replay] (#5378)
* test tc as ctx var

* remove from opts

* process replay

* pop variable

* B -> Variable

* fix re-assign

* pop temp vars

* move TRANSCENDENTAL=2
2024-07-11 13:07:28 +03:00
George Hotz
d13654a820 move uopgraph to file [run_process_replay] (#5364)
* move uopgraph to file [run_process_replay]

* fix print tree test
2024-07-10 17:34:50 -07:00
Elias Wahl
097268fab3 Add layerwise performance bench for bert (#5349)
* add bert bench

* dont disable by defauöt

* remove lr

* linter
2024-07-09 15:03:25 -04:00
qazal
bee96a19ff fuzz uop schedules (#5345)
* basic blocks + cleanups

* fixups

* elif is better for future me

* fuzz_schedule_max_paths

* fix linter
2024-07-09 15:24:56 +03:00
qazal
d813617742 prescheduling refactor (#5300)
* p1

* refactor tuple
2024-07-06 12:04:03 +03:00
qazal
b369e75ed0 refactor schedule creation (#5297) 2024-07-05 21:14:38 +03:00
chenyu
f1ff65e763 remove "no-nans-fp-math"="true" for LLVM (#5282)
fixed isnan for llvm (still have issue with < nan)
2024-07-03 17:52:50 -04:00
nimlgen
7be776f9af add _alloc_signal/_free_signal to hcq (#5264)
* add _alloc_signal/_free_signal api

* oops, revert this

* linter
2024-07-02 23:35:39 +03:00
Tobias Fischer
8c9c1cf62f Pulled CLIP and UNet into Seperate Files (#5253)
* pulled clip and unet into seperate files

* reference cleanup, lru cache fix

* better pool indexing
2024-07-01 22:33:01 -04:00
Roelof van Dijk
975b811ad9 names shadowing builtins (#5179)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
Roelof van Dijk
f88f71d73a ruff: unnecessary-comprehension (#5174)
* enable ruff C416 unnecessary-comprehension

* already a list
2024-06-27 07:45:29 -04:00
qazal
6ca7b13ed1 limit pickled objects [run_process_replay] (#5154)
* limit pickled objects

* delete uop from the list

* debug metal

* need self.opts for TC

* dont need device

* [run_process_replay]

* minor
2024-06-26 13:51:32 +03:00
George Hotz
63ba2d05d1 uops dfs cleanup (#5147)
* uops dfs cleanup

* Update uops.py
2024-06-25 18:51:42 -07:00
chenyu
e356807696 tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
chenyu
8080298739 s/tinytqdm/tqdm (#5103)
except in unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
nimlgen
f1e758bacb graph fuzzer (#5082)
* graph fuzzer

* more options

* mypy

* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal
8aa786232d docs for running process replay locally (#5083) 2024-06-21 09:55:08 -04:00
George Hotz
6f6b3b10c9 import from uops, not linearizer (#5064) 2024-06-20 08:08:44 -07:00
qazal
ee01e464e3 use process replay as a diff creator (#4903)
* add no_assert option [run_process_replay] [no_assert]

* test [run_process_replay] [no_assert]

* [run_process_replay]

* back to normal [run_process_replay]

* remove the log
2024-06-19 18:17:31 +03:00
chenyu
a3ed4176c8 use tinytqdm in active tests and examples (#5038)
* use tinytqdm in active tests and examples

stress test this before 0.9.1

* no set_description
2024-06-18 16:01:19 -04:00
kormann
7c3b877216 rename uop [run_process_replay] (#5031)
* rename

* fix unittests

* rename vin

* fix test

* fix type [run_process_replay]

* rm pre commit hook change
2024-06-18 21:34:05 +03:00
nimlgen
794acefbf3 hcq update waits and signals in place (#4984)
* hcq update waits and signals in place

* start amd

* amd works

* prettier

* test

* normal messages

* linetr

* linter 2
2024-06-17 17:19:07 +03:00
qazal
71aad183fd check Program from HEAD [run_process_replay] (#4996)
* use the same prg [run_process_replay]

* put var back
2024-06-16 20:12:30 +03:00
chenyu
67e8df4969 remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py.

after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
George Hotz
14189bca68 graph_dedup function [run_process_replay] (#4955) 2024-06-14 04:24:37 -07:00
George Hotz
63a8add2c2 move uops add logic to linearize (#4952)
* move logic to linearize

* idk how this should work

* empty
2024-06-14 03:52:37 -07:00
Jhenner Tigreros
dc9e9e4363 Convert BinaryOps.DIV to UnaryOps.RECIP and BinaryOps.IDIV (#4887)
* Create UnaryOps.RECIP and BinaryOps.IDIV and changing uses of BinaryOps.DIV

* Delete unused import

* Add cstyle renderer

* Fix formatting text

* Fix test error due to bad implementation of renderer

* Add PTX support

* Add RECIP to LLVMIR

* Remove BinaryOps.DIV from symbolic test

* Change some test and fix C floor division

* Change references to DIV for the RECIP or IDIV

* Add mimic idiv for symbolic test

* Restore floor

* Mimic idiv

* cast to int

* Fix some test and renderer

* Remove DIV for render nodes

* Resolve issue with div

* Add TestRenderer

* Fix test

* fix error

* Fix PAD test

* Fix div implementation

* Remove DIV

* Add upcast to rshift, due to use of MUL and RECIP on DIV

* Fix linter

* Remove complete BinaryOps.DIV

* Fix lint

* Fix some test

* Revert mul modification

* Fix tests

* Fix CLANG for uops

* Revert IDIV function

* Minor fix

* modify pattern matching rule to support nan

* Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP

* Remove const folding for IDIV and fix PTX

* Complete remove IDIV from extra

* Remove test_div from TestFloatUOps due to test on recip

* Fix linearizer

* fix

* Fix test_22

* Fix llvm

* Apply trunc function for llvmlit

* use floor instead of trunc

* Use correct type

* Generate new fuzz db

* Fix rshift, do not cast to float to support idiv

* Return upcast=false to rshift

* Add to unsafepad BinaryOps.IDIV

* Remove RECIP override for CUDA

* add atol / rtol for the test

* Remove cast to int on IDIV

* Regenerate sops

* delete sops.gz

* regenerate

* regenerate

* regenerate

* Reduce margins

* pass atol and rtol as parametersg for _test_metrics

* regenerated dataset

* Regenerate

* Remove duplicated

* Revert changes on extra

* Remove changes extra and NOQA for test

* Remove E501

* Remove and change line

* Remove E501

* Fix atan2

* Revert import and E501

* Remove E501

* Add hrcp to halp ops

* Remove 1 of hrcp

* Remove last DIV and add type check on uops for IDIV

* Fix new tests

* Fix tests and custom function

* Regenerate dataset

* Regenerate dataset

* Revert dataset

* Change generate dataset script

* Remove line

* Change IDIV, type checker validate if x,y and z are int

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-14 02:43:46 -07:00
chenyu
fdbb4305cb skip unsupported dtype in fuzz_linearizer (#4917)
resolve issues in #4887. dataset generated from ubuntu but metal does not support double
2024-06-11 18:18:21 -04:00
chenyu
b886d250fb improve test_dropout_on_shard (#4912)
tested some basic property, also minor formatting for a few Tensor.training setups
2024-06-11 11:36:02 -04:00
nimlgen
d24e57c615 amd support kernel with bf16 (#4863)
* amd support kernels with dispatch_ptr

* fixes

* line savings

* one line

* try

* Revert "try"

This reverts commit 5f340dfdd4.

* not used will be back when hsa is gone

* gone will be back

* add this as well
2024-06-08 22:52:32 +03:00
qazal
1e3325f369 raise assert [run_process_replay] (#4879) 2024-06-08 08:31:44 -04:00
qazal
66dfd5e7bf faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
Francis Lam
890e7c12bb test/external/verify_kernel: add support for single pickled kernel (#4836) 2024-06-04 18:59:21 -04:00
Elias Wahl
04e237328b Refactor to class style (#4804) 2024-06-04 14:08:31 -07:00
chenyu
cde7a7cda7 isolate the 134ms kernel in train_gpt2.py (#4773)
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
chenyu
59c6472b9f check contiguous in View.create after canonicalizing mask and offset (#4770)
mask / offset / strides can change during canonicalization, and contiguous can be True at the end
2024-05-29 11:31:13 -04:00
nimlgen
019f4680e5 check dims before execution on nv (#4756)
* check dims before execution on nv

* fix linter
2024-05-28 16:57:28 +03:00
qazal
c170ddceaf fix commavq benchmark (#4712)
* fix _slice and assert explicit device

* with _slice
2024-05-24 19:40:57 +03:00
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
Francis Lam
721f9f6acf test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685)
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
nimlgen
c9f7f2da70 nv hcq bind api (#4629)
* hcq bind api for nv

* linter

* linter

* add test

* small comment
2024-05-19 23:17:10 +03:00