diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 18a01d085c..2031342b93 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -91,7 +91,7 @@ jobs:
       run: python -m mypy --strict-equality
     - name: Test Docs
       run: |
-        python docs-legacy/abstractions2.py
+        python docs/abstractions2.py
     - name: Test Quickstart
      run: awk '/```python/{flag=1;next}/```/{flag=0}flag' docs/quickstart.md > quickstart.py && PYTHONPATH=. python quickstart.py
     - name: Fuzz Test symbolic
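A note on the "Test Quickstart" step kept above: it pulls the fenced Python out of docs/quickstart.md with awk and executes the result. For readers who don't speak awk, here is a rough Python sketch of the same filter; it assumes only what the CI command itself assumes (a docs/quickstart.md relative to the repo root, fences starting at column 0), nothing else.

```python
# Rough Python equivalent of the "Test Quickstart" awk one-liner above:
#   awk '/```python/{flag=1;next}/```/{flag=0}flag' docs/quickstart.md > quickstart.py
# It concatenates every ```python block in the quickstart into one script and runs it.
import os, subprocess

capture, lines = False, []
with open("docs/quickstart.md") as f:
    for line in f:
        if line.startswith("```python"):
            capture = True       # opening fence: start capturing (the fence itself is skipped)
        elif line.startswith("```"):
            capture = False      # any other fence closes the block
        elif capture:
            lines.append(line)

with open("quickstart.py", "w") as out:
    out.writelines(lines)

# CI runs it with PYTHONPATH=. so the in-repo tinygrad package is importable
subprocess.run(["python", "quickstart.py"], check=True, env={**os.environ, "PYTHONPATH": "."})
```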
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index a1bc8fcbd3..804712aef0 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -21,7 +21,7 @@ repos:
         pass_filenames: false
       - id: docs2
         name: docs2
-        entry: python3 docs-legacy/abstractions2.py
+        entry: python3 docs/abstractions2.py
         language: system
         always_run: true
         pass_filenames: false
diff --git a/.tokeignore b/.tokeignore
deleted file mode 100644
index cb7645b24d..0000000000
--- a/.tokeignore
+++ /dev/null
@@ -1,4 +0,0 @@
-*
-!*/
-
-!tinygrad/**
diff --git a/README.md b/README.md
index d3e74900be..740c3335c9 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 <div align="center">
 
 <picture>
-  <source media="(prefers-color-scheme: light)" srcset="/docs-legacy/logo_tiny_light.svg">
-  <img alt="tiny corp logo" src="/docs-legacy/logo_tiny_dark.svg" width="50%" height="50%">
+  <source media="(prefers-color-scheme: light)" srcset="/docs/logo_tiny_light.svg">
+  <img alt="tiny corp logo" src="/docs/logo_tiny_dark.svg" width="50%" height="50%">
 </picture>
 
 tinygrad: For something between [PyTorch](https://github.com/pytorch/pytorch) and [karpathy/micrograd](https://github.com/karpathy/micrograd). Maintained by [tiny corp](https://tinygrad.org).
@@ -87,7 +87,7 @@ tinygrad already supports numerous accelerators, including:
 - [x] [HSA](tinygrad/runtime/ops_hsa.py)
 
 And it is easy to add more! Your accelerator of choice only needs to support a total of ~25 low level ops.
-More information can be found in the [documentation for adding new accelerators](/docs-legacy/adding_new_accelerators.md).
+More information can be found in the [documentation for adding new accelerators](/docs/adding_new_accelerators.md).
 
 ## Installation
diff --git a/docs-legacy/DESIGNv2.md b/docs-legacy/DESIGNv2.md
deleted file mode 100644
index 5dda08e36b..0000000000
--- a/docs-legacy/DESIGNv2.md
+++ /dev/null
@@ -1,17 +0,0 @@
-tinygrad is a bit bloated now, and there's several places where concerns should be seperated and they aren't.
-
-tensor.py and mlops.py are great code. The interface going backward here is:
-
-LazyBuffer.const (this creates a matching size buffer)
-LazyBuffer.contiguous (tbis is not exactly elementwise)
-LazyBuffer.e (elementwise)
-LazyBuffer.r (reduce)
-reshape/permute/expand/stride/shrink/pad (movement)
-
-The lazy.py reordering engine has a lot of junk to deal with movementops that should be removed.
-
-view.py is mostly great code, except it shouldn't have the rendering logic, and the int type should be parameterized to not import from symbolic.
-
-LazyOp shouldn't have LazyBuffers as sources, just LazyOp LoadOps with a tuple of Views. Then the LazyOp uniquely determines the kernel and we don't have to do any replacement.
-
-ShapeTracker probably shouldn't exist and just be a part of LazyBuffer. Most of the stuff in ShapeTracker should move to symbolic_view, which combines view and symbolic.
diff --git a/docs-legacy/OVERVIEW.md b/docs-legacy/OVERVIEW.md
deleted file mode 100644
index 6ccfb1712d..0000000000
--- a/docs-legacy/OVERVIEW.md
+++ /dev/null
@@ -1,25 +0,0 @@
-tinygrad has four pieces
-
-* frontend (Tensor -> LazyBuffer)
-  * See tensor.py, function.py, multi.py, and lazy.py
-  * The user interacts with the Tensor class
-  * This outputs LazyBuffers, which form the simple compute graph
-* scheduler (LazyBuffer -> ScheduleItem)
-  * See engine/schedule.py
-  * When a Tensor is realized, the scheduler is run to get its LazyBuffers to be computed
-  * This takes in LazyBuffers and groups them as appropriate into kernels.
-  * It returns a list of ScheduleItems + all the Variables used in the graph
-* lowering (TODO: lots of work to clean this up still)
-  * See codegen/ (ScheduleItem.ast -> UOps)
-  * ScheduleItems have an ast that's compiled into actual GPU code
-  * Many optimization choices can be made here, this contains a beam search.
-  * renderer/compiler (UOps -> machine code)
-    * UOps are tinygrad's IR, similar to LLVM IR
-    * Here we either convert them to a high level language or machine code directly
-  * engine/realize.py (ScheduleItem -> ExecItem)
-* runtime
-  * See runtime/
-  * Runtime actually interacts with the GPUs
-  * It manages Buffers, Programs, and Queues
-  * Sadly, METAL and GPU (OpenCL) don't have a compiler that can be pulled out from the device itself
-
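The OVERVIEW.md deleted above described the Tensor -> LazyBuffer -> ScheduleItem -> UOps -> runtime pipeline; docs/abstractions2.py, which this PR moves into docs/, now walks the same stack. For orientation, a minimal sketch of the user-facing end of that pipeline — only the public Tensor API is used, and DEBUG is the logging variable documented in env_vars.md:

```python
# Nothing below touches the device until realization; up to that point the frontend is
# only building the LazyBuffer graph that the scheduler, lowerer, and runtime then consume.
from tinygrad import Tensor

x = Tensor.rand(4, 4)
y = Tensor.rand(4, 4)
z = (x @ y).relu().sum()   # still unrealized: just a small compute graph

print(z.numpy())           # realize: schedule -> lower to UOps -> render/compile -> run

# Running the same script with DEBUG=2 set in the environment prints the scheduled kernels
# and their timings (see env_vars.md for the other levels).
```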
diff --git a/docs-legacy/README.md b/docs-legacy/README.md
deleted file mode 100644
index a61c1ccf99..0000000000
--- a/docs-legacy/README.md
+++ /dev/null
@@ -1,31 +0,0 @@
-# Welcome to the tinygrad documentation!
-
-Here you will find documentation for tinygrad, as well as some examples and tutorials.
-
-## Getting Started
-
-Read the quick start guide [here](/docs/quickstart.md).
-
-Or if you want to jump right in to how tinygrad works, you can read the [abstraction stack](/docs-legacy/abstractions2.py) documentation.
-
-Or if you want to see some examples, you can look at the examples in the [examples](/examples) directory.
-
-Or if you just want to see some of the things tinygrad can do, check out the [showcase](/docs/showcase.md).
-
-## API
-
-This is currently a big work in progress.
-
-## Resources
-
-### Environment Variables
-
-[env_vars.md](/docs-legacy/env_vars.md)
-
-### Adding New Accelerators
-
-[adding_new_accelerators.md](/docs-legacy/adding_new_accelerators.md)
-
-### Community
-
-[![tinygrad discord](https://discordapp.com/api/guilds/1068976834382925865/widget.png?style=banner2)](https://discord.gg/ZjZadyC7PK)
diff --git a/docs-legacy/adding_new_accelerators.md b/docs-legacy/adding_new_accelerators.md
deleted file mode 100644
index 728a308398..0000000000
--- a/docs-legacy/adding_new_accelerators.md
+++ /dev/null
@@ -1,33 +0,0 @@
-# Adding a new accelerator to tinygrad
-
-It's pretty easy to add a new accelerator to tinygrad. All you need to do is implement a total of 20 (optionally 21) low level ops. Then tinygrad takes care of the rest, handling derivatives and syntactic sugar.
-
-## llops
-
-These are the ops that you must implement for your accelerator of choice.
-```
-Buffer                                              # class of memory on this device
-unary_op   (NOOP, CAST, EXP2, LOG2, SIN, SQRT)      # A -> A
-reduce_op  (SUM, MAX)                               # A -> B (smaller size, B has 1 in shape)
-binary_op  (ADD, SUB, MUL, DIV, CMPEQ, CMPLT, MAX)  # A + A -> A (all the same size)
-load_op    (EMPTY, CONST, FROM, CONTIGUOUS, CUSTOM) # -> A (initialize data on device)
-ternary_op (WHERE)                                  # A, A, A -> A
-```
-
-## mlops
-
-These are the mid level ops that handle the derivatives.
-```
-Relu, Log, Exp, Sin                          # unary ops
-Sum, Max                                     # reduce ops (with axis argument)
-Add, Sub, Mul, Div, Eq                       # binary ops (no broadcasting, use expand)
-Expand, Reshape, Permute, Pad, Shrink, Flip  # movement ops
-Where                                        # ternary ops
-```
-These are implemented in [function.py](/tinygrad/function.py).
-
-## hlops
-
-These are the syntax sugar. They are built on top of the mlops and support most of the things that you could expect from a tensor library.
-
-These are implemented in [tensor.py](/tinygrad/tensor.py).
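The accelerator doc removed above is where the llops -> mlops -> hlops layering was spelled out. As a reminder of how that layering surfaces in practice, a small sketch follows; the Tensor calls are ordinary public API, while the op names in the comments are only the approximate mapping onto the primitives listed in the deleted doc:

```python
from tinygrad import Tensor

t = Tensor([[1.0, -2.0], [3.0, -4.0]])

a = t.exp2()               # roughly: unary_op EXP2
b = t * t                  # roughly: binary_op MUL (same-shape elementwise)
c = t.sum(axis=1)          # roughly: reduce_op SUM, with axis handling at the mlops level
d = (t > 0).where(t, 0)    # roughly: binary_op CMPLT feeding ternary_op WHERE

print(a.numpy(), b.numpy(), c.numpy(), d.numpy(), sep="\n")
```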
diff --git a/docs-legacy/linearizer_v2.md b/docs-legacy/linearizer_v2.md
deleted file mode 100644
index 52aae91a0e..0000000000
--- a/docs-legacy/linearizer_v2.md
+++ /dev/null
@@ -1,27 +0,0 @@
-At base, the Linearizer a function that takes an AST + opts -> uops
-It should be rewritten like this. The AST can't be a LazyOp, because it should be able to have multiple outputs
-
-We need a generic class to represent DAGs.
-This refactor is probably a prereq for the new linearizer, and can be used on existing uops also.
-Can this class also represent the large graph? The op graph is a subset of the large graph.
-
-Currently the Linearizer is merging many concerns:
-
-1. LocalBuffers are added. These should be added to the upper DAG, for both grouping and tensor cores. Some opts are used here. NOTE: currently reduce splitting is done in lazy.py and it shouldn't be
-2. The ShapeTrackers at the edges are collected and modified according to the other opts.
-3. The Ops are toposorted.
-4. The Ops are lowered to UOps. This requires expansion and loop assignment, potentially to global dimensions
-5. The indexes into the Tensor are computed from the shapetrackers
-
-More generically, the whole network is a DAG. Ignore the forward/backward stuff, I'm fine with starting at the LazyBuffer level.
-
-1. Is it possible to put an entire network in a single kernel? I think the answer has to be yes, but you may end up doing an absolutely crazy amount of recomputation. This should still be doable to check correctness.
-2. You can use intermediate buffers, be they local or global, to do less compute.
-
-This is a rewrite of a lot of tinygrad. I don't think continuing to support Interpreted backends is worth it, have to deal with disk in a smart way.
-
-We keep the features and nn stuff = 793 lines
-We keep the frontend (Tensor -> LazyBuffer): tensor.py + mlops.py + lazy.py + dtype.py = 1032 lines
-We keep the shapetracker/symbolic (part of the frontend): shapetracker.py + view.py + symbolic.py = 603 lines
-Codegen is all rewritten. realize.py is simpler with the new codegen
-We keep the backend (uops renderer/runtime): cstyle.py/llvmir.py + device.py + ops_*.py = 1216 lines (less when we remove interpreted)
diff --git a/docs-legacy/reshape_without_symbolic.md b/docs-legacy/reshape_without_symbolic.md
deleted file mode 100644
index c6da7d7219..0000000000
--- a/docs-legacy/reshape_without_symbolic.md
+++ /dev/null
@@ -1,70 +0,0 @@
-## ["View.reshape without symbolic"](https://github.com/tinygrad/tinygrad/pull/2218)
-
-This section contains the sketch proof of "Complete, Fast and Correct View.reshapes without using Symbolic". The goal is to reduce multi-views which cost runtime.
-
-1. **old_shape = (s1,s2,...,si,s(i+1),...,sn)**
-2. **old_stride = (st1, st2, ... ,sti, st(i+1), ..., stn)**
-3. **merge_old_shape = (p1, p2), where p1 = s1 * ... * si & p2 = s(i+1) * ... * sn**,
-4. **new_shape = (k1, ..., kp, k(p+1), ..., kl)**
-5. **prod(new_shape) = p1 * p2** (trivial)
-6. **mask** and **new_mask** represent valid indexes before & after reshape respectively.
-
-
-### Assumption
-
-**p1** & **p2** individually are mergeable (we will discuss later on this) & we cannot merge **p1** & **p2**.
-
-### Claim
-
-If **prod([k1 ... kp]) < p1** and **prod([k1 ... k(p+1)]) > p1**, reshape is not possible.
-
-**Proof**
-
-**k(p+1)** will require some dimensions from **p1** & some from **p2**, which means **p1** & **p2** should be mergeable, but they are not.
-
-**Conclusion**
-
-Hence, reshape is only possible **if ∃ a p, where prod([k1 .. kp]) = p1**.
-
-
-### Conditions for mergeability
-
-**Case 1 - All non-zero strides**
-
-They will merge **if stx = st(x+1) * s(x+1), where x ∈ [1, ..., i-1, i+1, ..., n-1]**.
-
-**Proof**
-
-Lets consider merging of **(s1 ... si) -> p1**, here we have to get a single new stride corresponding to **p1**. For which it has to be contiguous.
-
-**Case 2 - Some stride is zero**
-
-Let **stj = 0 & st(j+1) != 0 & s(j+1) > 1, where 1 < j < i**.
-
-If **sj = 1** , reshape is trivial.
-
-If **sj > 1**,
-- If **maskj** has range > 1,
-  reshape is not possible, because **s(j+1)** will need to be repeated at-least once and a single stride can't capture repetition.
-- If **maskj** has range = 1, reshape is possible, since it is virtually shape = 1, with some offset.
-
-
-
-### Conditions for reshaping mask
-
-**Case 1 - Splitting Dimension** - Mask shouldn't be cut for successful reshape.
-
-- **Example**
-
-[1,2,3,4,5,6,7,8] -> [[1,2,3,4], [5,6,7,8]] ; **mask** = ((2,6)) ; **new_mask[0]** = (0,2) (trivial split).
-
-- **new_mask[1]** = not possible. It is only possible if **mask spans [1-8] or lies within a single dimension [1-4] or [5-8]**.
-
-
-**Case 2 - Combining Dimension** - Mask should unfold continuously.
-
-- **Example** - **[[1,2],[3,4],[5,6]] -> [1,2,3,4,5,6]**; **mask** = ((0,2),(0,2)).
-
-- **new_mask** = (0,4); only possible because **mask1** span the whole dimension.
-
-- If **mask1** did not span the whole dimension, the only way combining would be possible is if **mask0** had range 1 as shown below.
-
-  **[[1,2,3],[4,5,6]] -> [1,2,3,4,5,6]**; **mask** = ((1,2),(0,2)); **new_mask** = ((3,5))
\ No newline at end of file
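The reshape notes deleted above rest on the mergeability condition for adjacent dimensions, stx = st(x+1) * s(x+1). A quick way to see the arithmetic is to check element strides on a strided array; the numpy snippet below is purely illustrative (numpy reports strides in bytes, so they are divided by itemsize to recover the element strides the condition talks about):

```python
import numpy as np

def mergeable(shape, strides):
    """For each adjacent pair (x, x+1), is st_x == st_(x+1) * s_(x+1)?"""
    return [strides[x] == strides[x + 1] * shape[x + 1] for x in range(len(shape) - 1)]

a = np.arange(24).reshape(2, 3, 4)                 # contiguous: element strides (12, 4, 1)
a_strides = tuple(s // a.itemsize for s in a.strides)
print(mergeable(a.shape, a_strides))               # [True, True] -> all dims collapse into one

b = a.transpose(1, 0, 2)                           # shape (3, 2, 4), element strides (4, 12, 1)
b_strides = tuple(s // b.itemsize for s in b.strides)
print(mergeable(b.shape, b_strides))               # [False, False] -> the permute breaks the condition
```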
diff --git a/docs-legacy/abstractions2.py b/docs/abstractions2.py
similarity index 100%
rename from docs-legacy/abstractions2.py
rename to docs/abstractions2.py
diff --git a/docs-legacy/abstractions3.py b/docs/abstractions3.py
similarity index 100%
rename from docs-legacy/abstractions3.py
rename to docs/abstractions3.py
diff --git a/docs-legacy/env_vars.md b/docs/env_vars.md
similarity index 100%
rename from docs-legacy/env_vars.md
rename to docs/env_vars.md
diff --git a/docs-legacy/logo_tiny_dark.svg b/docs/logo_tiny_dark.svg
similarity index 100%
rename from docs-legacy/logo_tiny_dark.svg
rename to docs/logo_tiny_dark.svg
diff --git a/docs-legacy/logo_tiny_light.svg b/docs/logo_tiny_light.svg
similarity index 100%
rename from docs-legacy/logo_tiny_light.svg
rename to docs/logo_tiny_light.svg
diff --git a/docs/quickstart.md b/docs/quickstart.md
index ba2aaef4ab..94801a04b6 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -76,7 +76,7 @@ print(t6.numpy())
 ```
 
 There are a lot more operations that can be performed on tensors, you can find them in the [Tensor](tensor.md) file.
-Additionally reading through [abstractions2.py](https://github.com/tinygrad/tinygrad/blob/master/docs-legacy/abstractions2.py) will help you understand how operations on these tensors make their way down to your hardware.
+Additionally reading through [abstractions2.py](https://github.com/tinygrad/tinygrad/blob/master/docs/abstractions2.py) will help you understand how operations on these tensors make their way down to your hardware.
 
 ## Models
 
@@ -299,7 +299,7 @@ Many of the models in the [models/](https://github.com/tinygrad/tinygrad/tree/ma
 
 There exist a bunch of environment variables that control the runtime behavior of tinygrad.
 Some of the commons ones are `DEBUG` and the different backend enablement variables.
-You can find a full list and their descriptions in [env_vars.md](https://github.com/tinygrad/tinygrad/blob/master/docs-legacy/env_vars.md).
+You can find a full list and their descriptions in [env_vars.md](https://github.com/tinygrad/tinygrad/blob/master/docs/env_vars.md).
 
 ### Visualizing the Computation Graph
diff --git a/mkdocs.yml b/mkdocs.yml
index 9b5f8b09d6..433959b9b7 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -11,6 +11,7 @@ nav:
 - Showcase: showcase.md
 - Developer: developer.md
 - Function: function.md
+- Environment: env_vars.md
 #- tinygrad: reference/
 
 #extra_css:
diff --git a/ruff.toml b/ruff.toml
index b0a7913a43..fd32a4958d 100644
--- a/ruff.toml
+++ b/ruff.toml
@@ -27,7 +27,6 @@ line-length = 150
 
 exclude = [
   "docs/",
-  "docs-legacy/",
   "examples/",
   "extra/",
   "tinygrad/runtime/autogen",
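Finally, with env_vars.md living in docs/ and linked from both the quickstart hunk and the new mkdocs nav entry above, one usage note: these variables are generally read when tinygrad is first imported, so from inside Python they need to be set before the import. DEBUG is the variable the quickstart names; level 2 here is only an illustration (env_vars.md documents the rest):

```python
# Equivalent to running `DEBUG=2 python your_script.py` from the shell.
import os
os.environ["DEBUG"] = "2"   # set before importing tinygrad so it is picked up

from tinygrad import Tensor

(Tensor.rand(64, 64) @ Tensor.rand(64, 64)).realize()   # at DEBUG=2, each executed kernel is logged
```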