tinygrad/examples
chenyu ac183568be llama JIT python runtime speedup (#1633) (2023-08-30 07:51:05 -07:00)
* no JIT call in TransformerBlock

* idea

* move 2 reshapes into the jitted function

shrink inside the jitted function too, 6.3ms

remove the back reshapes, 5.5ms

isinstance -> __class__, 4.99ms
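
The isinstance -> __class__ change is a plain Python micro-optimization on the hot decode path: comparing the object's class directly skips isinstance's subclass machinery. A minimal standalone sketch (class names here are illustrative, not taken from the llama code):

```python
from timeit import timeit

class Node: pass
class Variable(Node): pass

v = Variable()

# isinstance also accepts subclasses; the identity check matches the exact class only,
# which is enough when the hot path already knows the concrete type it expects.
print(timeit(lambda: isinstance(v, Variable), number=1_000_000))
print(timeit(lambda: v.__class__ is Variable, number=1_000_000))
```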

* think

revert ops_gpu.py

revert symbolic.py too

PYOPENCL_COMPILER_OUTPUT=1

* cleanup

* fix cache shape for conversational model

only reshape if start_pos > 0
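
The conversational-model fix concerns the key/value cache across multiple generation calls: when start_pos is 0 there is no history yet, so the cache must not be reshaped as if it already held start_pos tokens. A rough numpy sketch of the control flow, with cache_k, xk, and start_pos standing in for the example's variables:

```python
import numpy as np

bsz, n_heads, head_dim = 1, 8, 64

def keys_for_attention(cache_k, xk, start_pos):
    # xk: new keys for this call, shape (bsz, seqlen, n_heads, head_dim)
    if start_pos > 0:
        # the cache already holds start_pos tokens; restore the 4D shape before concatenating
        cache_k = cache_k.reshape(bsz, start_pos, n_heads, head_dim)
        return np.concatenate([cache_k, xk], axis=1)
    # first call of the conversation: nothing cached, nothing to reshape
    return xk
```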

* small cleanup

* include var_vals.keys() in st.key
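
Folding var_vals.keys() into st.key is about the kernel cache key under symbolic shapes: two calls can share the same ShapeTracker key while being bound to different variables, and those must not reuse the same cached kernel. A hedged sketch of the idea (cache_key is a hypothetical helper; st.key and var_vals are the names from the commit message):

```python
def cache_key(st_key: str, var_vals: dict) -> tuple:
    # key on the variable *names*, not their values: the same compiled kernel is
    # reused across values of a variable, but not across different variable sets
    return (st_key, tuple(sorted(str(k) for k in var_vals.keys())))
```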

* add comments

* small llama update

* everything jitted again, similar structure to gpt2
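
"Similar structure to gpt2" means the whole per-token forward pass is wrapped in one TinyJit call rather than jitting inside each TransformerBlock. A minimal sketch of that pattern (ToyModel is illustrative, and the TinyJit import location has moved between tinygrad versions):

```python
from tinygrad.tensor import Tensor
from tinygrad.jit import TinyJit  # import path differs in newer tinygrad versions

class ToyModel:
    def __init__(self): self.w = Tensor.randn(64, 64)
    def __call__(self, x: Tensor) -> Tensor: return (x @ self.w).relu()

model = ToyModel()
# jit the entire token step: the first calls capture the kernels, later calls replay them
jitted_step = TinyJit(lambda x: model(x).realize())
for _ in range(5):
    out = jitted_step(Tensor.randn(1, 64))
```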

* fix typing

* add TODO for in-place cache update
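
The TODO about an in-place cache update presumably points at replacing concatenate-and-copy cache growth with writes into a preallocated buffer. A rough numpy sketch of that idea (MAX_CONTEXT and all names are made up for illustration):

```python
import numpy as np

MAX_CONTEXT, bsz, n_heads, head_dim = 1024, 1, 8, 64
cache_k = np.zeros((bsz, MAX_CONTEXT, n_heads, head_dim), dtype=np.float32)

def update_cache(cache_k, xk, start_pos):
    seqlen = xk.shape[1]
    cache_k[:, start_pos:start_pos + seqlen] = xk   # write in place, no reallocation
    return cache_k[:, :start_pos + seqlen]          # view of the filled prefix
```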