merge gpt-pilot 0.2 codebase

This is a complete rewrite of the GPT Pilot core, making the
agentic architecture front and center and fixing some long-standing
problems with the database architecture that weren't feasible to
solve without breaking compatibility.

As the database structure and config file syntax have changed,
automatic imports are provided for existing projects and current
configs; see the README.md file for details.

This also relicenses the project under the FSL-1.1-MIT license.
Senko Rasic
2024-05-22 21:42:25 +02:00
parent 391998ab67
commit 5b474ccc1f
203 changed files with 15412 additions and 0 deletions

35
.github/workflows/ci.yml vendored Normal file

@@ -0,0 +1,35 @@
name: Run unit tests
on:
push:
branches: [ "main" ]
pull_request:
branches: [ "main" ]
jobs:
build:
runs-on: ${{ matrix.os }}
timeout-minutes: 10
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.12"]
os: [ubuntu-latest, macos-latest, windows-latest]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install poetry
poetry install --with=dev
- name: Lint with ruff
run: poetry run ruff check --output-format github
- name: Check code style with ruff
run: poetry run ruff format --check --diff
- name: Test with pytest
run: poetry run pytest

18
.gitignore vendored Normal file

@@ -0,0 +1,18 @@
__pycache__/
.venv/
.vscode/
.idea/
htmlcov/
dist/
workspace/
.coverage
*.code-workspace
.*_cache
.env
*.pyc
*.db
config.json
poetry.lock
.DS_Store
*.log

21
.pre-commit-config.yaml Normal file

@@ -0,0 +1,21 @@
fail_fast: true
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.3.5
hooks:
# Run the linter.
- id: ruff
args: [ --fix ]
# Run the formatter.
- id: ruff-format
- repo: local
hooks:
# Run the tests
- id: pytest
name: pytest
stages: [commit]
types: [python]
entry: pytest
language: system
pass_filenames: false

110
LICENSE Normal file

@@ -0,0 +1,110 @@
# Functional Source License, Version 1.1, MIT Future License
## Abbreviation
FSL-1.1-MIT
## Notice
Copyright 2024 Pythagora Technologies, Inc.
## Terms and Conditions
### Licensor ("We")
The party offering the Software under these Terms and Conditions.
### The Software
The "Software" is each version of the software that we make available under
these Terms and Conditions, as indicated by our inclusion of these Terms and
Conditions with the Software.
### License Grant
Subject to your compliance with this License Grant and the Patents,
Redistribution and Trademark clauses below, we hereby grant you the right to
use, copy, modify, create derivative works, publicly perform, publicly display
and redistribute the Software for any Permitted Purpose identified below.
### Permitted Purpose
A Permitted Purpose is any purpose other than a Competing Use. A Competing Use
means making the Software available to others in a commercial product or
service that:
1. substitutes for the Software;
2. substitutes for any other product or service we offer using the Software
that exists as of the date we make the Software available; or
3. offers the same or substantially similar functionality as the Software.
Permitted Purposes specifically include using the Software:
1. for your internal use and access;
2. for non-commercial education;
3. for non-commercial research; and
4. in connection with professional services that you provide to a licensee
using the Software in accordance with these Terms and Conditions.
### Patents
To the extent your use for a Permitted Purpose would necessarily infringe our
patents, the license grant above includes a license under our patents. If you
make a claim against any party that the Software infringes or contributes to
the infringement of any patent, then your patent license to the Software ends
immediately.
### Redistribution
The Terms and Conditions apply to all copies, modifications and derivatives of
the Software.
If you redistribute any copies, modifications or derivatives of the Software,
you must include a copy of or a link to these Terms and Conditions and not
remove any copyright notices provided in or with the Software.
### Disclaimer
THE SOFTWARE IS PROVIDED "AS IS" AND WITHOUT WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR
PURPOSE, MERCHANTABILITY, TITLE OR NON-INFRINGEMENT.
IN NO EVENT WILL WE HAVE ANY LIABILITY TO YOU ARISING OUT OF OR RELATED TO THE
SOFTWARE, INCLUDING INDIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES,
EVEN IF WE HAVE BEEN INFORMED OF THEIR POSSIBILITY IN ADVANCE.
### Trademarks
Except for displaying the License Details and identifying us as the origin of
the Software, you have no right under these Terms and Conditions to use our
trademarks, trade names, service marks or product names.
## Grant of Future License
We hereby irrevocably grant you an additional license to use the Software under
the MIT license that is effective on the second anniversary of the date we make
the Software available. On or after that date, you may use the Software under
the MIT license, in which case the following will apply:
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

238
README.md Normal file

@@ -0,0 +1,238 @@
<div align="center">
# 🧑‍✈️ GPT PILOT 🧑‍✈️
</div>
---
<div align="center">
[![Discord Follow](https://dcbadge.vercel.app/api/server/HaqXugmxr9?style=flat)](https://discord.gg/HaqXugmxr9)
[![GitHub Repo stars](https://img.shields.io/github/stars/Pythagora-io/gpt-pilot?style=social)](https://github.com/Pythagora-io/gpt-pilot)
[![Twitter Follow](https://img.shields.io/twitter/follow/HiPythagora?style=social)](https://twitter.com/HiPythagora)
</div>
---
<div align="center">
<a href="https://www.ycombinator.com/" target="_blank"><img src="https://s3.amazonaws.com/assets.pythagora.ai/yc/PNG/Black.png" alt="Pythagora-io%2Fgpt-pilot | Trendshift" style="width: 250px; height: 93px;"/></a>
</div>
<br>
<div align="center">
<a href="https://trendshift.io/repositories/466" target="_blank"><img src="https://trendshift.io/api/badge/repositories/466" alt="Pythagora-io%2Fgpt-pilot | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>
<br>
<br>
<div align="center">
### GPT Pilot doesn't just generate code, it builds apps!
</div>
---
<div align="center">
[![See it in action](https://i3.ytimg.com/vi/4g-1cPGK0GA/maxresdefault.jpg)](https://youtu.be/4g-1cPGK0GA)
(click to watch the video on YouTube, 1:40 min)
</div>
---
<div align="center">
<a href="vscode:extension/PythagoraTechnologies.gpt-pilot-vs-code" target="_blank"><img src="https://github.com/Pythagora-io/gpt-pilot/assets/10895136/5792143e-77c7-47dd-ad96-6902be1501cd" alt="Pythagora-io%2Fgpt-pilot | Trendshift" style="width: 185px; height: 55px;" width="185" height="55"/></a>
</div>
GPT Pilot is the core technology for the [Pythagora VS Code extension](https://bit.ly/3IeZxp6) that aims to provide **the first real AI developer companion**. Not just an autocomplete or a helper for PR messages but rather a real AI developer that can write full features, debug them, talk to you about issues, ask for review, etc.
---
📫 If you would like to get updates on future releases or just get in touch, join our [Discord server](https://discord.gg/HaqXugmxr9) or you [can add your email here](http://eepurl.com/iD6Mpo). 📬
---
<!-- TOC -->
* [🔌 Requirements](#-requirements)
* [🚦How to start using gpt-pilot?](#how-to-start-using-gpt-pilot)
* [🔎 Examples](#-examples)
* [🐳 How to start gpt-pilot in docker?](#-how-to-start-gpt-pilot-in-docker)
* [🧑‍💻️ CLI arguments](#-cli-arguments)
* [🏗 How does GPT Pilot work?](#-how-does-gpt-pilot-work)
* [🕴How's GPT Pilot different from _Smol developer_ and _GPT engineer_?](#hows-gpt-pilot-different-from-smol-developer-and-gpt-engineer)
* [🍻 Contributing](#-contributing)
* [🔗 Connect with us](#-connect-with-us)
* [🌟 Star history](#-star-history)
<!-- TOC -->
---
GPT Pilot aims to research how much LLMs can be utilized to generate fully working, production-ready apps while the developer oversees the implementation.
**The main idea is that AI can write most of the code for an app (maybe 95%), but for the remaining 5%, a developer is and will be needed until we get full AGI**.
If you are interested in our learnings during this project, you can check [our latest blog posts](https://blog.pythagora.ai/2024/02/19/gpt-pilot-what-did-we-learn-in-6-months-of-working-on-a-codegen-pair-programmer/).
---
<br>
<div align="center">
### **[👉 Examples of apps written by GPT Pilot 👈](https://github.com/Pythagora-io/gpt-pilot/wiki/Apps-created-with-GPT-Pilot)**
</div>
<br>
---
# 🔌 Requirements
- **Python 3.9+**
# 🚦How to start using gpt-pilot?
👉 If you are using VS Code as your IDE, the easiest way to start is by downloading [GPT Pilot VS Code extension](https://bit.ly/3IeZxp6). 👈
Otherwise, you can use the CLI tool.
### If you're new to GPT Pilot:
After you have Python and (optionally) PostgreSQL installed, follow these steps:
1. `git clone https://github.com/Pythagora-io/gpt-pilot.git` (clone the repo)
2. `cd gpt-pilot` (go to the repo folder)
3. `python -m venv venv` (create a virtual environment)
4. `source venv/bin/activate` (or on Windows `venv\Scripts\activate`) (activate the virtual environment)
5. `pip install -r requirements.txt` (install the dependencies)
6. `cp example-config.json config.json` (create `config.json` file)
7. Set your key and other settings in the `config.json` file (a minimal sketch is shown below):
- LLM provider (`openai`, `anthropic` or `groq`) key and endpoints (leave `null` for the defaults); note that Azure and OpenRouter are supported via the `openai` setting
- your API key (if `null`, it will be read from the environment variables)
- database settings: SQLite is used by default, PostgreSQL should also work
- optionally, update `fs.ignore_paths` to add files or folders that GPT Pilot shouldn't track in the workspace; this is useful for ignoring folders created by compilers
8. `python main.py` (start GPT Pilot)
All generated code will be stored in the `workspace` folder, in a subfolder named after the app name you enter when starting the pilot.
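For illustration, here is a hypothetical sketch of what such a config could look like (the key names here are illustrative, not authoritative; `example-config.json` in the repo is the definitive reference):
```json
{
  "llm": {
    "openai": {
      "base_url": null,
      "api_key": null
    }
  },
  "db": {
    "url": "sqlite:///pythagora.db"
  },
  "fs": {
    "ignore_paths": ["node_modules", "dist"]
  }
}
```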
### If you're upgrading from GPT Pilot v0.1
Assuming you already have the git repository with an earlier version:
1. `git pull` (update the repo)
2. `source pilot-env/bin/activate` (or on Windows `pilot-env\Scripts\activate`) (activate the virtual environment)
3. `pip install -r requirements.txt` (install the new dependencies)
4. `python main.py --import-v0 pilot/gpt-pilot` (this should import your settings and existing projects)
This will create a new database `pythagora.db` and import all apps from the old database. For each app,
it will import the start of the latest task you were working on.
To verify that the import was successful, you can run `python main.py --list` to see all the apps you have created,
and inspect `config.json` to verify the settings were correctly converted to the new config file format (making
any adjustments if needed).
# 🔎 [Examples](https://github.com/Pythagora-io/gpt-pilot/wiki/Apps-created-with-GPT-Pilot)
[Click here](https://github.com/Pythagora-io/gpt-pilot/wiki/Apps-created-with-GPT-Pilot) to see all example apps created with GPT Pilot.
## 🐳 How to start gpt-pilot in docker?
1. `git clone https://github.com/Pythagora-io/gpt-pilot.git` (clone the repo)
2. Update the environment variables in `docker-compose.yml` (you can inspect the resulting configuration with `docker compose config`). If you wish to use a local model, please go to [https://localai.io/basics/getting_started/](https://localai.io/basics/getting_started/).
3. By default, GPT Pilot will read & write to `~/gpt-pilot-workspace` on your machine; you can also change this in `docker-compose.yml`.
4. Run `docker compose build`. This will build the gpt-pilot container.
5. Run `docker compose up`.
6. Access the web terminal on port `7681`.
7. `python main.py` (start GPT Pilot)
This will start two containers, one being a new image built by the `Dockerfile` and a Postgres database. The new image also has [ttyd](https://github.com/tsl0922/ttyd) installed so that you can easily interact with gpt-pilot. Node is also installed on the image and port 3000 is exposed.
# 🧑‍💻️ CLI arguments
### List created projects (apps)
```bash
python main.py --list
```
Note: for each project (app), this also lists "branches". Currently we only support having one branch (called "main"), and in the future we plan to add support for multiple project branches.
### Load and continue from the latest step in a project (app)
```bash
python main.py --project <app_id>
```
### Load and continue from a specific step in a project (app)
```bash
python main.py --project <app_id> --step <step>
```
Warning: this will delete all progress after the specified step!
### Delete project (app)
```bash
python main.py --delete <app_id>
```
Delete project with the specified `app_id`. Warning: this cannot be undone!
### Import projects from v0.1
```bash
python main.py --import-v0 <path>
```
This will import projects from the old GPT Pilot v0.1 database. The path should point to the old GPT Pilot v0.1 database. For each project, it will import the start of the latest task you were working on. If a project was already imported, the import procedure will skip it (it won't overwrite the project in the database).
### Other command-line options
There are several other command-line options that mostly support calling GPT Pilot from our VSCode extension. To see all the available options, use the `--help` flag:
```bash
python main.py --help
```
# 🏗 How does GPT Pilot work?
Here are the steps GPT Pilot takes to create an app:
1. You enter the app name and the description.
2. **Product Owner agent**, like in real life, does nothing. :)
3. **Specification Writer agent** asks a couple of questions to understand the requirements better if the project description is not detailed enough.
4. **Architect agent** writes up the technologies that will be used for the app, checks whether they are installed on the machine, and installs any that are missing.
5. **Tech Lead agent** writes up development tasks that the Developer must implement.
6. **Developer agent** takes each task and writes up what needs to be done to implement it. The description is in human-readable form.
7. **Code Monkey agent** takes the Developer's description and the existing file and implements the changes.
8. **Reviewer agent** reviews every step of the task and, if something is done wrong, sends it back to the Code Monkey.
9. **Troubleshooter agent** helps you give good feedback to GPT Pilot when something goes wrong.
10. **Debugger agent**: you hate to see him, but he is your best friend when things go south.
11. **Technical Writer agent** writes documentation for the project.
<br>
# 🕴How's GPT Pilot different from _Smol developer_ and _GPT engineer_?
- **GPT Pilot works with the developer to create a fully working production-ready app** - I don't think AI can (at least in the near future) create apps without a developer being involved. So, **GPT Pilot codes the app step by step** just like a developer would in real life. This way, it can debug issues as they arise throughout the development process. If it gets stuck, you, the developer in charge, can review the code and fix the issue. Other similar tools give you the entire codebase at once - this way, bugs are much harder to fix for AI and for you as a developer.
<br><br>
- **Works at scale** - GPT Pilot isn't meant just for creating simple apps; it's designed to work at any scale. It has mechanisms that filter the codebase so that, in each LLM conversation, it doesn't need to keep the entire codebase in context: it shows the LLM only the code relevant to the task it's currently working on. Once an app is finished, you can continue working on it by writing instructions for the features you want to add.
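As a simplified sketch of that filtering step (the actual logic lives in `get_relevant_files()` in `core/agents/developer.py`), the selection boils down to keeping only the LLM-suggested paths that exist in the project:

```python
def select_relevant_files(existing_files: set[str], llm_suggestions: list[str]) -> list[str]:
    """Keep only the LLM-suggested paths that actually exist in the project."""
    return [path for path in llm_suggestions if path in existing_files]
```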
# 🍻 Contributing
If you are interested in contributing to GPT Pilot, join [our Discord server](https://discord.gg/HaqXugmxr9), check out open [GitHub issues](https://github.com/Pythagora-io/gpt-pilot/issues), and see if anything interests you. We would be happy to get help in resolving any of those. The best place to start is by reviewing the blog posts mentioned above to understand how the architecture works before diving into the codebase.
## 🖥 Development
Other than the research, GPT Pilot needs to be debugged to work in different scenarios. For example, we realized that the quality of the generated code is very sensitive to the size of the development task. When the task is too broad, the code has too many bugs that are hard to fix, but when the task is too narrow, GPT also seems to struggle to implement it into the existing code.
## 📊 Telemetry
To improve GPT Pilot, we are tracking some events from which you can opt out at any time. You can read more about it [here](./docs/TELEMETRY.md).
# 🔗 Connect with us
🌟 As an open-source tool, it would mean the world to us if you starred the GPT-pilot repo 🌟
💬 Join [the Discord server](https://discord.gg/HaqXugmxr9) to get in touch.

0
core/agents/__init__.py Normal file

146
core/agents/architect.py Normal file

@@ -0,0 +1,146 @@
from typing import Optional
from pydantic import BaseModel, Field
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse
from core.llm.parser import JSONParser
from core.telemetry import telemetry
from core.templates.registry import PROJECT_TEMPLATES, ProjectTemplateEnum
from core.ui.base import ProjectStage
ARCHITECTURE_STEP = "architecture"
WARN_SYSTEM_DEPS = ["docker", "kubernetes", "microservices"]
WARN_FRAMEWORKS = ["next.js", "vue", "vue.js", "svelte", "angular"]
WARN_FRAMEWORKS_URL = "https://github.com/Pythagora-io/gpt-pilot/wiki/Using-GPT-Pilot-with-frontend-frameworks"
# FIXME: all the response pydantic models should be strict (see config._StrictModel), also check if we
# can disallow adding custom Python attributes to the model
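# One possible approach (a sketch, not implemented here) would be a shared strict base model:
#
#     class _StrictModel(BaseModel):
#         model_config = ConfigDict(extra="forbid")
#
# which the response models below would then inherit from.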
class SystemDependency(BaseModel):
name: str = Field(
None,
description="Name of the system dependency, for example Node.js or Python.",
)
description: str = Field(
None,
description="One-line description of the dependency.",
)
test: str = Field(
None,
description="Command line to test whether the dependency is available on the system.",
)
required_locally: bool = Field(
None,
description="Whether this dependency must be installed locally (as opposed to connecting to cloud or other server)",
)
class PackageDependency(BaseModel):
name: str = Field(
None,
description="Name of the package dependency, for example Express or React.",
)
description: str = Field(
None,
description="One-line description of the dependency.",
)
class Architecture(BaseModel):
architecture: str = Field(
None,
description="General description of the app architecture.",
)
system_dependencies: list[SystemDependency] = Field(
None,
description="List of system dependencies required to build and run the app.",
)
package_dependencies: list[PackageDependency] = Field(
None,
description="List of framework/language-specific packages used by the app.",
)
template: Optional[ProjectTemplateEnum] = Field(
None,
description="Project template to use for the app, if any (optional, can be null).",
)
class Architect(BaseAgent):
agent_type = "architect"
display_name = "Architect"
async def run(self) -> AgentResponse:
await self.ui.send_project_stage(ProjectStage.ARCHITECTURE)
llm = self.get_llm()
convo = AgentConvo(self).template("technologies", templates=PROJECT_TEMPLATES).require_schema(Architecture)
await self.send_message("Planning project architecture ...")
arch: Architecture = await llm(convo, parser=JSONParser(Architecture))
await self.check_compatibility(arch)
await self.check_system_dependencies(arch.system_dependencies)
spec = self.current_state.specification.clone()
spec.architecture = arch.architecture
spec.system_dependencies = [d.model_dump() for d in arch.system_dependencies]
spec.package_dependencies = [d.model_dump() for d in arch.package_dependencies]
spec.template = arch.template.value if arch.template else None
self.next_state.specification = spec
telemetry.set(
"architecture",
{
"description": spec.architecture,
"system_dependencies": spec.system_dependencies,
"package_dependencies": spec.package_dependencies,
},
)
telemetry.set("template", spec.template)
return AgentResponse.done(self)
async def check_compatibility(self, arch: Architecture) -> bool:
warn_system_deps = [dep.name for dep in arch.system_dependencies if dep.name.lower() in WARN_SYSTEM_DEPS]
warn_package_deps = [dep.name for dep in arch.package_dependencies if dep.name.lower() in WARN_FRAMEWORKS]
if warn_system_deps:
await self.ask_question(
f"Warning: GPT Pilot doesn't officially support {', '.join(warn_system_deps)}. "
f"You can try to use {'it' if len(warn_system_deps) == 1 else 'them'}, but you may run into problems.",
buttons={"continue": "Continue"},
buttons_only=True,
default="continue",
)
if warn_package_deps:
await self.ask_question(
f"Warning: GPT Pilot works best with vanilla JavaScript. "
f"You can try try to use {', '.join(warn_package_deps)}, but you may run into problems. "
f"Visit {WARN_FRAMEWORKS_URL} for more information.",
buttons={"continue": "Continue"},
buttons_only=True,
default="continue",
)
# TODO: add "cancel" option to the above buttons; if pressed, Architect should
# return AgentResponse.revise_spec()
# that SpecWriter should catch and allow the user to reword the initial spec.
return True
async def check_system_dependencies(self, deps: list[SystemDependency]):
"""
Check whether the required system dependencies are installed.
"""
for dep in deps:
status_code, _, _ = await self.process_manager.run_command(dep.test)
if status_code != 0:
if dep.required_locally:
remedy = "Please install it before proceeding with your app."
else:
remedy = "If you would like to use it locally, please install it before proceeding."
await self.send_message(f"{dep.name} is not available. {remedy}")
else:
await self.send_message(f"{dep.name} is available.")

174
core/agents/base.py Normal file

@@ -0,0 +1,174 @@
from typing import Any, Callable, Optional
from core.agents.response import AgentResponse
from core.config import get_config
from core.db.models import ProjectState
from core.llm.base import BaseLLMClient, LLMError
from core.log import get_logger
from core.proc.process_manager import ProcessManager
from core.state.state_manager import StateManager
from core.ui.base import AgentSource, UIBase, UserInput
log = get_logger(__name__)
class BaseAgent:
"""
Base class for agents.
"""
agent_type: str
display_name: str
def __init__(
self,
state_manager: StateManager,
ui: UIBase,
*,
step: Optional[Any] = None,
prev_response: Optional["AgentResponse"] = None,
process_manager: Optional["ProcessManager"] = None,
):
"""
Create a new agent.
"""
self.ui_source = AgentSource(self.display_name, self.agent_type)
self.ui = ui
self.stream_output = True
self.state_manager = state_manager
self.process_manager = process_manager
self.prev_response = prev_response
self.step = step
@property
def current_state(self) -> ProjectState:
"""Current state of the project (read-only)."""
return self.state_manager.current_state
@property
def next_state(self) -> ProjectState:
"""Next state of the project (write-only)."""
return self.state_manager.next_state
async def send_message(self, message: str):
"""
Send a message to the user.
Convenience method, uses `UIBase.send_message()` to send the message,
setting the correct source.
:param message: Message to send.
"""
await self.ui.send_message(message + "\n", source=self.ui_source)
async def ask_question(
self,
question: str,
*,
buttons: Optional[dict[str, str]] = None,
default: Optional[str] = None,
buttons_only: bool = False,
initial_text: Optional[str] = None,
allow_empty: bool = False,
hint: Optional[str] = None,
) -> UserInput:
"""
Ask a question to the user and return the response.
Convenience method, uses `UIBase.ask_question()` to
ask the question, setting the correct source and
logging the question/response.
:param question: Question to ask.
:param buttons: Buttons to display with the question.
:param default: Default button to select.
:param buttons_only: Only display buttons, no text input.
:param allow_empty: Allow empty input.
:param hint: Text to display in a popup as a hint to the question.
:param initial_text: Initial text input.
:return: User response.
"""
response = await self.ui.ask_question(
question,
buttons=buttons,
default=default,
buttons_only=buttons_only,
allow_empty=allow_empty,
hint=hint,
initial_text=initial_text,
source=self.ui_source,
)
await self.state_manager.log_user_input(question, response)
return response
async def stream_handler(self, content: str):
"""
Handle streamed response from the LLM.
Serves as a callback to `AgentBase.llm()` so it can stream the responses to the UI.
This can be turned on/off on a per-request basis by setting `BaseAgent.stream_output`
to True or False.
:param content: Response content.
"""
if self.stream_output:
await self.ui.send_stream_chunk(content, source=self.ui_source)
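# A None chunk presumably marks the end of the stream; the empty message lets the UI close the streamed block.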
if content is None:
await self.ui.send_message("")
async def error_handler(self, error: LLMError, message: Optional[str] = None):
"""
Handle error responses from the LLM.
:param error: The exception that was thrown by the LLM client.
:param message: Optional message to show.
"""
if error == LLMError.KEY_EXPIRED:
await self.ui.send_key_expired(message)
elif error == LLMError.RATE_LIMITED:
await self.stream_handler(message)
def get_llm(self, name=None) -> Callable:
"""
Get a new instance of the agent-specific LLM client.
The client initializes the UI stream handler and stores the
request/response to the current state's log. The agent name
can be overridden in case the agent needs to use a different
model configuration.
:param name: Name of the agent for configuration (default: class name).
:return: LLM client for the agent.
"""
if name is None:
name = self.__class__.__name__
config = get_config()
llm_config = config.llm_for_agent(name)
client_class = BaseLLMClient.for_provider(llm_config.provider)
llm_client = client_class(llm_config, stream_handler=self.stream_handler, error_handler=self.error_handler)
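# Wrap the raw client in a closure so that every request/response pair is logged to the current project state.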
async def client(convo, **kwargs) -> Any:
"""
Agent-specific LLM client.
For details on optional arguments to pass to the LLM client,
see `pythagora.llm.openai_client.OpenAIClient()`.
"""
response, request_log = await llm_client(convo, **kwargs)
await self.state_manager.log_llm_request(request_log, agent=self)
return response
return client
async def run(self) -> AgentResponse:
"""
Run the agent.
:return: Response from the agent.
"""
raise NotImplementedError()

127
core/agents/code_monkey.py Normal file

@@ -0,0 +1,127 @@
from pydantic import BaseModel, Field
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse, ResponseType
from core.config import DESCRIBE_FILES_AGENT_NAME
from core.llm.parser import JSONParser, OptionalCodeBlockParser
from core.log import get_logger
log = get_logger(__name__)
class FileDescription(BaseModel):
summary: str = Field(
description="Detailed description summarized what the file is about, and what the major classes, functions, elements or other functionality is implemented."
)
references: list[str] = Field(
description="List of references the file imports or includes (only files local to the project), where each element specifies the project-relative path of the referenced file, including the file extension."
)
class CodeMonkey(BaseAgent):
agent_type = "code-monkey"
display_name = "Code Monkey"
async def run(self) -> AgentResponse:
if self.prev_response and self.prev_response.type == ResponseType.DESCRIBE_FILES:
return await self.describe_files()
else:
return await self.implement_changes()
def _get_task_convo(self) -> AgentConvo:
# FIXME: Current prompts reuse task breakdown / iteration messages so we have to resort to this
task = self.current_state.current_task
current_task_index = self.current_state.tasks.index(task)
convo = AgentConvo(self).template(
"breakdown",
task=task,
iteration=None,
current_task_index=current_task_index,
)
# TODO: We currently show last iteration to the code monkey; we might need to show the task
# breakdown and all the iterations instead? To think about when refactoring prompts
if self.current_state.iterations:
convo.assistant(self.current_state.iterations[-1]["description"])
else:
convo.assistant(self.current_state.current_task["instructions"])
return convo
async def implement_changes(self) -> AgentResponse:
file_name = self.step["save_file"]["path"]
current_file = await self.state_manager.get_file_by_path(file_name)
file_content = current_file.content.content if current_file else ""
task = self.current_state.current_task
if self.prev_response and self.prev_response.type == ResponseType.CODE_REVIEW_FEEDBACK:
attempt = self.prev_response.data["attempt"] + 1
feedback = self.prev_response.data["feedback"]
log.debug(f"Fixing file {file_name} after review feedback: {feedback} ({attempt}. attempt)")
await self.send_message(f"Reworking changes I made to {file_name} ...")
else:
log.debug(f"Implementing file {file_name}")
await self.send_message(f"{'Updating existing' if file_content else 'Creating new'} file {file_name} ...")
attempt = 1
feedback = None
llm = self.get_llm()
convo = self._get_task_convo().template(
"implement_changes",
file_name=file_name,
file_content=file_content,
instructions=task["instructions"],
)
if feedback:
convo.assistant(f"```\n{self.prev_response.data['new_content']}\n```\n").template(
"review_feedback",
content=self.prev_response.data["approved_content"],
original_content=file_content,
rework_feedback=feedback,
)
response: str = await llm(convo, temperature=0, parser=OptionalCodeBlockParser())
# FIXME: provide a counter here so that we don't end up in an endless loop
return AgentResponse.code_review(self, file_name, task["instructions"], file_content, response, attempt)
async def describe_files(self) -> AgentResponse:
llm = self.get_llm(DESCRIBE_FILES_AGENT_NAME)
to_describe = {
file.path: file.content.content for file in self.current_state.files if not file.meta.get("description")
}
for file in self.next_state.files:
content = to_describe.get(file.path)
if content is None:
continue
if content == "":
file.meta = {
**file.meta,
"description": "Empty file",
"references": [],
}
continue
log.debug(f"Describing file {file.path}")
await self.send_message(f"Describing file {file.path} ...")
convo = (
AgentConvo(self)
.template(
"describe_file",
path=file.path,
content=content,
)
.require_schema(FileDescription)
)
llm_response: FileDescription = await llm(convo, parser=JSONParser(spec=FileDescription))
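# Assign a new meta dict rather than mutating the existing one, so the modification is reliably picked up when the state is saved.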
file.meta = {
**file.meta,
"description": llm_response.summary,
"references": llm_response.references,
}
return AgentResponse.done(self)

328
core/agents/code_review.py Normal file

@@ -0,0 +1,328 @@
import re
from difflib import unified_diff
from enum import Enum
from pydantic import BaseModel, Field
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse
from core.llm.parser import JSONParser
from core.log import get_logger
log = get_logger(__name__)
# Constant for indicating missing new line at the end of a file in a unified diff
NO_EOL = "\\ No newline at end of file"
# Regular expression pattern for matching hunk headers
PATCH_HEADER_PATTERN = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@")
# Maximum number of attempts to ask for review if it can't be parsed
MAX_REVIEW_RETRIES = 2
# Maximum number of code implementation attempts after which we accept the changes unconditionally
MAX_CODING_ATTEMPTS = 3
class Decision(str, Enum):
APPLY = "apply"
IGNORE = "ignore"
REWORK = "rework"
class Hunk(BaseModel):
number: int = Field(description="Index of the hunk in the diff. Starts from 1.")
reason: str = Field(description="Reason for applying or ignoring this hunk, or for asking for it to be reworked.")
decision: Decision = Field(description="Whether to apply this hunk, rework, or ignore it.")
class ReviewChanges(BaseModel):
hunks: list[Hunk]
review_notes: str = Field(description="Additional review notes (optional, can be empty).")
class CodeReviewer(BaseAgent):
agent_type = "code-reviewer"
display_name = "Code Reviewer"
async def run(self) -> AgentResponse:
if (
not self.prev_response.data["old_content"]
or self.prev_response.data["new_content"] == self.prev_response.data["old_content"]
or self.prev_response.data["attempt"] >= MAX_CODING_ATTEMPTS
):
# we always auto-accept new files and unchanged files, or if we've tried too many times
return await self.accept_changes(self.prev_response.data["path"], self.prev_response.data["new_content"])
approved_content, feedback = await self.review_change(
self.prev_response.data["path"],
self.prev_response.data["instructions"],
self.prev_response.data["old_content"],
self.prev_response.data["new_content"],
)
if feedback:
return AgentResponse.code_review_feedback(
self,
new_content=self.prev_response.data["new_content"],
approved_content=approved_content,
feedback=feedback,
attempt=self.prev_response.data["attempt"],
)
else:
return await self.accept_changes(self.prev_response.data["path"], approved_content)
async def accept_changes(self, path: str, content: str) -> AgentResponse:
await self.state_manager.save_file(path, content)
self.next_state.complete_step()
input_required = self.state_manager.get_input_required(content)
if input_required:
return AgentResponse.input_required(
self,
[{"file": path, "line": line} for line in input_required],
)
else:
return AgentResponse.done(self)
def _get_task_convo(self) -> AgentConvo:
# FIXME: Current prompts reuse conversation from the developer so we have to resort to this
task = self.current_state.current_task
current_task_index = self.current_state.tasks.index(task)
convo = AgentConvo(self).template(
"breakdown",
task=task,
iteration=None,
current_task_index=current_task_index,
)
# TODO: We currently show last iteration to the code monkey; we might need to show the task
# breakdown and all the iterations instead? To think about when refactoring prompts
if self.current_state.iterations:
convo.assistant(self.current_state.iterations[-1]["description"])
else:
convo.assistant(self.current_state.current_task["instructions"])
return convo
async def review_change(
self, file_name: str, instructions: str, old_content: str, new_content: str
) -> tuple[str, str]:
"""
Review changes that were applied to the file.
This asks the LLM to act as a PR reviewer and for each part (hunk) of the
diff, decide if it should be applied (kept) or ignored (removed from the PR).
:param file_name: name of the file being modified
:param instructions: instructions for the reviewer
:param old_content: old file content
:param new_content: new file content (with proposed changes)
:return: tuple with file content update with approved changes, and review feedback
Diff hunk explanation: https://www.gnu.org/software/diffutils/manual/html_node/Hunks.html
"""
hunks = self.get_diff_hunks(file_name, old_content, new_content)
llm = self.get_llm()
convo = (
self._get_task_convo()
.template(
"review_changes",
instructions=instructions,
file_name=file_name,
old_content=old_content,
hunks=hunks,
)
.require_schema(ReviewChanges)
)
llm_response: ReviewChanges = await llm(convo, temperature=0, parser=JSONParser(ReviewChanges))
for i in range(MAX_REVIEW_RETRIES):
reasons = {}
ids_to_apply = set()
ids_to_ignore = set()
ids_to_rework = set()
for hunk in llm_response.hunks:
reasons[hunk.number - 1] = hunk.reason
if hunk.decision == "apply":
ids_to_apply.add(hunk.number - 1)
elif hunk.decision == "ignore":
ids_to_ignore.add(hunk.number - 1)
elif hunk.decision == "rework":
ids_to_rework.add(hunk.number - 1)
n_hunks = len(hunks)
n_review_hunks = len(reasons)
if n_review_hunks == n_hunks:
break
elif n_review_hunks < n_hunks:
error = "Not all hunks have been reviewed. Please review all hunks and add 'apply', 'ignore' or 'rework' decision for each."
elif n_review_hunks > n_hunks:
error = f"Your review contains more hunks ({n_review_hunks}) than in the original diff ({n_hunks}). Note that one hunk may have multiple changed lines."
# Max two retries; if the reviewer still hasn't reviewed all hunks, we'll just use the entire new content
convo.assistant(llm_response.model_dump_json()).user(error)
llm_response = await llm(convo, parser=JSONParser(ReviewChanges))
else:
return new_content, None
hunks_to_apply = [h for i, h in enumerate(hunks) if i in ids_to_apply]
diff_log = f"--- {file_name}\n+++ {file_name}\n" + "\n".join(hunks_to_apply)
hunks_to_rework = [(i, h) for i, h in enumerate(hunks) if i in ids_to_rework]
review_log = (
"\n\n".join([f"## Change\n```{hunk}```\nReviewer feedback:\n{reasons[i]}" for (i, hunk) in hunks_to_rework])
+ "\n\nReview notes:\n"
+ llm_response.review_notes
)
if len(hunks_to_apply) == len(hunks):
await self.send_message("Applying entire change")
log.info(f"Applying entire change to {file_name}")
return new_content, None
elif len(hunks_to_apply) == 0:
if hunks_to_rework:
await self.send_message(
f"Requesting rework for {len(hunks_to_rework)} changes with reason: {llm_response.review_notes}"
)
log.info(f"Requesting rework for {len(hunks_to_rework)} changes to {file_name} (0 hunks to apply)")
return old_content, review_log
else:
# If everything can be safely ignored, it's probably because the files already implement the changes
# from previous tasks (which can happen often). Insisting on a change here is likely to cause problems.
await self.send_message(f"Rejecting entire change with reason: {llm_response.review_notes}")
log.info(f"Rejecting entire change to {file_name} with reason: {llm_response.review_notes}")
return old_content, None
print("Applying code change:\n" + diff_log)
log.info(f"Applying code change to {file_name}:\n{diff_log}")
new_content = self.apply_diff(file_name, old_content, hunks_to_apply, new_content)
if hunks_to_rework:
print(f"Requesting rework for {len(hunks_to_rework)} changes with reason: {llm_response.review_notes}")
log.info(f"Requesting further rework for {len(hunks_to_rework)} changes to {file_name}")
return new_content, review_log
else:
return new_content, None
@staticmethod
def get_diff_hunks(file_name: str, old_content: str, new_content: str) -> list[str]:
"""
Get the diff between two files.
This uses Python difflib to produce a unified diff, then splits
it into hunks that will be separately reviewed by the reviewer.
:param file_name: name of the file being modified
:param old_content: old file content
:param new_content: new file content
:return: change hunks from the unified diff
"""
from_name = "old_" + file_name
to_name = "to_" + file_name
from_lines = old_content.splitlines(keepends=True)
to_lines = new_content.splitlines(keepends=True)
diff_gen = unified_diff(from_lines, to_lines, fromfile=from_name, tofile=to_name)
diff_txt = "".join(diff_gen)
hunks = re.split(r"\n@@", diff_txt, flags=re.MULTILINE)
result = []
for i, h in enumerate(hunks):
# Skip the prologue (file names)
if i == 0:
continue
txt = h.splitlines()
txt[0] = "@@" + txt[0]
result.append("\n".join(txt))
return result
def apply_diff(self, file_name: str, old_content: str, hunks: list[str], fallback: str):
"""
Apply the diff to the original file content.
This uses the internal `_apply_patch` method to apply the
approved diff hunks to the original file content.
If patch apply fails, the fallback is the full new file content
with all the changes applied (as if the reviewer approved everything).
:param file_name: name of the file being modified
:param old_content: old file content
:param hunks: change hunks from the unified diff
:param fallback: proposed new file content (with all the changes applied)
"""
diff = (
"\n".join(
[
f"--- {file_name}",
f"+++ {file_name}",
]
+ hunks
)
+ "\n"
)
try:
fixed_content = self._apply_patch(old_content, diff)
except Exception as e:
# This should never happen but if it does, just use the new version from
# the LLM and hope for the best
print(f"Error applying diff: {e}; hoping all changes are valid")
return fallback
return fixed_content
# Adapted from https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc (Public Domain)
@staticmethod
def _apply_patch(original: str, patch: str, revert: bool = False):
"""
Apply a patch to a string to recover a newer version of the string.
:param original: The original string.
:param patch: The patch to apply.
:param revert: If True, treat the original string as the newer version and recover the older string.
:return: The updated string after applying the patch.
"""
original_lines = original.splitlines(True)
patch_lines = patch.splitlines(True)
updated_text = ""
index_original = start_line = 0
# Choose which group of the regex to use based on the revert flag
match_index, line_sign = (1, "+") if not revert else (3, "-")
# Skip header lines of the patch
while index_original < len(patch_lines) and patch_lines[index_original].startswith(("---", "+++")):
index_original += 1
while index_original < len(patch_lines):
match = PATCH_HEADER_PATTERN.match(patch_lines[index_original])
if not match:
raise Exception("Bad patch -- regex mismatch [line " + str(index_original) + "]")
line_number = int(match.group(match_index)) - 1 + (match.group(match_index + 1) == "0")
if start_line > line_number or line_number > len(original_lines):
raise Exception("Bad patch -- bad line number [line " + str(index_original) + "]")
updated_text += "".join(original_lines[start_line:line_number])
start_line = line_number
index_original += 1
while index_original < len(patch_lines) and patch_lines[index_original][0] != "@":
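# A following line starting with "\" is the "No newline at end of file" marker: drop the trailing newline from this line and skip the marker.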
if index_original + 1 < len(patch_lines) and patch_lines[index_original + 1][0] == "\\":
line_content = patch_lines[index_original][:-1]
index_original += 2
else:
line_content = patch_lines[index_original]
index_original += 1
if line_content:
if line_content[0] == line_sign or line_content[0] == " ":
updated_text += line_content[1:]
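# Context and removed lines consume a line of the original, so advance the pointer; only lines added with the target sign do not.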
start_line += line_content[0] != line_sign
updated_text += "".join(original_lines[start_line:])
return updated_text

75
core/agents/convo.py Normal file

@@ -0,0 +1,75 @@
import json
import sys
from copy import deepcopy
from typing import TYPE_CHECKING, Optional
from pydantic import BaseModel
from core.config import get_config
from core.llm.convo import Convo
from core.llm.prompt import JinjaFileTemplate
from core.log import get_logger
if TYPE_CHECKING:
from core.agents.base import BaseAgent
log = get_logger(__name__)
class AgentConvo(Convo):
prompt_loader: Optional[JinjaFileTemplate] = None
def __init__(self, agent: "BaseAgent"):
self.agent_instance = agent
super().__init__()
try:
system_message = self.render("system")
self.system(system_message)
except ValueError as err:
log.warning(f"Agent {agent.__class__.__name__} has no system prompt: {err}")
@classmethod
def _init_templates(cls):
if cls.prompt_loader is not None:
return
config = get_config()
cls.prompt_loader = JinjaFileTemplate(config.prompt.paths)
def _get_default_template_vars(self) -> dict:
if sys.platform == "win32":
os = "Windows"
elif sys.platform == "darwin":
os = "macOS"
else:
os = "Linux"
return {
"state": self.agent_instance.current_state,
"os": os,
}
def render(self, name: str, **kwargs) -> str:
self._init_templates()
kwargs.update(self._get_default_template_vars())
# Jinja uses "/" even on Windows
template_name = f"{self.agent_instance.agent_type}/{name}.prompt"
log.debug(f"Loading template {template_name}")
return self.prompt_loader(template_name, **kwargs)
def template(self, template_name: str, **kwargs) -> "AgentConvo":
message = self.render(template_name, **kwargs)
self.user(message)
return self
def fork(self) -> "AgentConvo":
child = AgentConvo(self.agent_instance)
child.messages = deepcopy(self.messages)
return child
def require_schema(self, model: BaseModel) -> "AgentConvo":
schema_txt = json.dumps(model.model_json_schema())
self.user(f"IMPORTANT: Your response MUST conform to this JSON schema:\n```\n{schema_txt}\n```")
return self

294
core/agents/developer.py Normal file

@@ -0,0 +1,294 @@
from enum import Enum
from typing import Annotated, Literal, Optional, Union
from uuid import uuid4
from pydantic import BaseModel, Field
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse, ResponseType
from core.llm.parser import JSONParser
from core.log import get_logger
log = get_logger(__name__)
class StepType(str, Enum):
COMMAND = "command"
SAVE_FILE = "save_file"
HUMAN_INTERVENTION = "human_intervention"
class CommandOptions(BaseModel):
command: str = Field(description="Command to run")
timeout: int = Field(description="Timeout in seconds")
success_message: str = ""
class SaveFileOptions(BaseModel):
path: str
class SaveFileStep(BaseModel):
type: Literal[StepType.SAVE_FILE] = StepType.SAVE_FILE
save_file: SaveFileOptions
class CommandStep(BaseModel):
type: Literal[StepType.COMMAND] = StepType.COMMAND
command: CommandOptions
class HumanInterventionStep(BaseModel):
type: Literal[StepType.HUMAN_INTERVENTION] = StepType.HUMAN_INTERVENTION
human_intervention_description: str
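# The "type" field is the discriminator: pydantic uses its value to pick the correct step model when parsing.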
Step = Annotated[
Union[SaveFileStep, CommandStep, HumanInterventionStep],
Field(discriminator="type"),
]
class TaskSteps(BaseModel):
steps: list[Step]
class Developer(BaseAgent):
agent_type = "developer"
display_name = "Developer"
async def run(self) -> AgentResponse:
if self.prev_response and self.prev_response.type == ResponseType.TASK_REVIEW_FEEDBACK:
return await self.breakdown_current_iteration(self.prev_response.data["feedback"])
# If any of the files are missing metadata/descriptions, those need to be filled-in
missing_descriptions = [file.path for file in self.current_state.files if not file.meta.get("description")]
if missing_descriptions:
log.debug(f"Some files are missing descriptions: {', '.join(missing_descriptions)}, reqesting analysis")
return AgentResponse.describe_files(self)
log.debug(f"Current state files: {len(self.current_state.files)}, relevant {self.current_state.relevant_files}")
# Check which files are relevant to the current task
if self.current_state.files and not self.current_state.relevant_files:
await self.get_relevant_files()
return AgentResponse.done(self)
if not self.current_state.unfinished_tasks:
log.warning("No unfinished tasks found, nothing to do (why am I called? is this a bug?)")
return AgentResponse.done(self)
if self.current_state.unfinished_iterations:
return await self.breakdown_current_iteration()
# By default, we want to ask the user if they want to run the task,
# except in certain cases (such as they've just edited it).
if not self.current_state.current_task.get("run_always", False):
if not await self.ask_to_execute_task():
return AgentResponse.done(self)
return await self.breakdown_current_task()
async def breakdown_current_iteration(self, review_feedback: Optional[str] = None) -> AgentResponse:
"""
Breaks down current iteration or task review into steps.
:param review_feedback: If provided, the task review feedback is broken down instead of the current iteration
:return: AgentResponse.done(self) when the breakdown is done
"""
if self.current_state.unfinished_steps:
# if this happens, it's most probably a bug as we should have gone through all the
# steps before getting new iteration instructions
log.warning(
f"Unfinished steps found before the next iteration is broken down: {self.current_state.unfinished_steps}"
)
if review_feedback is not None:
iteration = None
description = review_feedback
user_feedback = ""
source = "review"
n_tasks = 1
log.debug(f"Breaking down the task review feedback {review_feedback}")
await self.send_message("Breaking down the task review feedback...")
else:
iteration = self.current_state.current_iteration
if iteration is None:
log.error("Iteration breakdown called but there's no current iteration or task review, possible bug?")
return AgentResponse.done(self)
description = iteration["description"]
user_feedback = iteration["user_feedback"]
source = "troubleshooting"
n_tasks = len(self.next_state.iterations)
log.debug(f"Breaking down the iteration {description}")
await self.send_message("Breaking down the current task iteration ...")
await self.ui.send_task_progress(
n_tasks, # iterations and reviews can be created only one at a time, so we are always on last one
n_tasks,
self.current_state.current_task["description"],
source,
"in-progress",
)
llm = self.get_llm()
# FIXME: In case of iteration, parse_task depends on the context (files, tasks, etc) set there.
# Ideally this prompt would be self-contained.
convo = (
AgentConvo(self)
.template(
"iteration",
current_task=self.current_state.current_task,
user_feedback=user_feedback,
user_feedback_qa=None,
next_solution_to_try=None,
)
.assistant(description)
.template("parse_task")
.require_schema(TaskSteps)
)
response: TaskSteps = await llm(convo, parser=JSONParser(TaskSteps), temperature=0)
self.set_next_steps(response, source)
if iteration:
self.next_state.complete_iteration()
return AgentResponse.done(self)
async def breakdown_current_task(self) -> AgentResponse:
task = self.current_state.current_task
source = self.current_state.current_epic.get("source", "app")
await self.ui.send_task_progress(
self.current_state.tasks.index(self.current_state.current_task) + 1,
len(self.current_state.tasks),
self.current_state.current_task["description"],
source,
"in-progress",
)
log.debug(f"Breaking down the current task: {task['description']}")
await self.send_message("Thinking about how to implement this task ...")
current_task_index = self.current_state.tasks.index(task)
llm = self.get_llm()
convo = AgentConvo(self).template(
"breakdown",
task=task,
iteration=None,
current_task_index=current_task_index,
)
response: str = await llm(convo)
# FIXME: check if this is correct, as sqlalchemy can't figure out modifications
# to attributes; however, self.next is not saved yet so maybe this is fine
self.next_state.tasks[current_task_index] = {
**task,
"instructions": response,
}
await self.send_message("Breaking down the task into steps ...")
convo.template("parse_task").require_schema(TaskSteps)
response: TaskSteps = await llm(convo, parser=JSONParser(TaskSteps), temperature=0)
# There might be state leftovers from previous tasks that we need to clean here
self.next_state.modified_files = {}
self.set_next_steps(response, source)
return AgentResponse.done(self)
async def get_relevant_files(self) -> AgentResponse:
log.debug("Getting relevant files for the current task")
await self.send_message("Figuring out which project files are relevant for the next task ...")
llm = self.get_llm()
convo = AgentConvo(self).template("filter_files", current_task=self.current_state.current_task)
# FIXME: this doesn't validate correct structure format, we should use pydantic for that as well
llm_response: list[str] = await llm(convo, parser=JSONParser(), temperature=0)
existing_files = {file.path for file in self.current_state.files}
self.next_state.relevant_files = [path for path in llm_response if path in existing_files]
return AgentResponse.done(self)
def set_next_steps(self, response: TaskSteps, source: str):
# For logging/debugging purposes, we don't want to remove the finished steps
# until we're done with the task.
finished_steps = [step for step in self.current_state.steps if step["completed"]]
self.next_state.steps = finished_steps + [
{
"id": uuid4().hex,
"completed": False,
"source": source,
**step.model_dump(),
}
for step in response.steps
]
if len(self.next_state.unfinished_steps) > 0:
self.next_state.steps += [
# TODO: add refactor step here once we have the refactor agent
{
"id": uuid4().hex,
"completed": False,
"type": "review_task",
"source": source,
},
{
"id": uuid4().hex,
"completed": False,
"type": "create_readme",
"source": source,
},
]
log.debug(f"Next steps: {self.next_state.unfinished_steps}")
async def ask_to_execute_task(self) -> bool:
"""
Asks the user to approve, skip or edit the current task.
If task is edited, the method returns False so that the changes are saved. The
Orchestrator will rerun the agent on the next iteration.
:return: True if the task should be executed as is, False if the task is skipped or edited
"""
description = self.current_state.current_task["description"]
user_response = await self.ask_question(
"Do you want to execute the this task:\n\n" + description,
buttons={"yes": "Yes", "edit": "Edit Task", "skip": "Skip Task"},
default="yes",
buttons_only=True,
)
if user_response.button == "yes":
# Execute the task as is
return True
if user_response.cancelled or user_response.button == "skip":
log.info(f"Skipping task: {description}")
self.next_state.current_task["instructions"] = "(skipped on user request)"
self.next_state.complete_task()
await self.send_message(f"Skipping task {description}")
# We're done here, and will pick up the next task (if any) on the next run
return False
user_response = await self.ask_question(
"Edit the task description:",
buttons={
# FIXME: Continue doesn't actually work, VSCode doesn't send the user
# message if it's clicked. Long term we need to fix the extension.
# "continue": "Continue",
"cancel": "Cancel",
},
default="continue",
initial_text=description,
)
if user_response.button == "cancel" or user_response.cancelled:
# User hasn't edited the task so we can execute it immediately as is
return True
self.next_state.current_task["description"] = user_response.text
self.next_state.current_task["run_always"] = True
self.next_state.relevant_files = []
log.info(f"Task description updated to: {user_response.text}")
# Orchestrator will rerun us with the new task description
return False

108
core/agents/error_handler.py Normal file

@@ -0,0 +1,108 @@
from uuid import uuid4
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse
from core.log import get_logger
log = get_logger(__name__)
class ErrorHandler(BaseAgent):
"""
Error handler agent.
Error handler is responsible for handling errors returned by other agents. If it's possible
to recover from the error, it should do it (which may include updating the "next" state) and
return DONE. Otherwise it should return EXIT to tell Orchestrator to quit the application.
"""
agent_type = "error-handler"
display_name = "Error Handler"
async def run(self) -> AgentResponse:
from core.agents.executor import Executor
from core.agents.spec_writer import SpecWriter
error = self.prev_response
if error is None:
log.warning("ErrorHandler called without a previous error", stack_info=True)
return AgentResponse.done(self)
log.error(
f"Agent {error.agent.display_name} returned error response: {error.type}",
extra={"data": error.data},
)
if isinstance(error.agent, SpecWriter):
# If SpecWriter wasn't able to get the project description, there's nothing for
# us to do.
return AgentResponse.exit(self)
if isinstance(error.agent, Executor):
return await self.handle_command_error(
error.data.get("message", "Unknown error"), error.data.get("details", {})
)
log.error(
f"Unhandled error response from agent {error.agent.display_name}",
extra={"data": error.data},
)
return AgentResponse.exit(self)
async def handle_command_error(self, message: str, details: dict) -> AgentResponse:
"""
Handle an error returned by Executor agent.
Error message must be the analysis of the command execution, and the details must contain:
* cmd - command that was executed
* timeout - timeout for the command if any (or None if no timeout was used)
* status_code - exit code for the command (or None if the command timed out)
* stdout - standard output of the command
* stderr - standard error of the command
:return: AgentResponse
"""
cmd = details.get("cmd")
timeout = details.get("timeout")
status_code = details.get("status_code")
stdout = details.get("stdout", "")
stderr = details.get("stderr", "")
if not message:
raise ValueError("No error message provided in command error response")
if not cmd:
raise ValueError("No command provided in command error response details")
llm = self.get_llm()
convo = AgentConvo(self).template(
"debug",
task_steps=self.current_state.steps,
current_task=self.current_state.current_task,
# FIXME: can this break?
step_index=self.current_state.steps.index(self.current_state.current_step),
cmd=cmd,
timeout=timeout,
stdout=stdout,
stderr=stderr,
status_code=status_code,
# fixme: everything above copypasted from Executor
analysis=message,
)
llm_response: str = await llm(convo)
# TODO: duplicate from Troubleshooter, maybe extract to a ProjectState method?
self.next_state.iterations = self.current_state.iterations + [
{
"id": uuid4().hex,
"user_feedback": f"Error running command: {cmd}",
"description": llm_response,
"alternative_solutions": [],
"attempts": 1,
"completed": False,
}
]
# TODO: maybe have ProjectState.finished_steps as well? would make the debug/ran_command prompts nicer too
self.next_state.steps = [s for s in self.current_state.steps if s.get("completed") is True]
# No need to call complete_step() here as we've just removed the steps so that Developer can break down the iteration
return AgentResponse.done(self)

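The details dict that handle_command_error() unpacks above is produced by the Executor's error response (see executor.py below). A hedged sketch of the producing side, with illustrative command and output values:

from core.agents.response import AgentResponse

# Inside an agent: the "details" keys must match what handle_command_error() reads.
return AgentResponse.error(
    self,
    "npm install exited with a non-zero status",  # illustrative analysis message
    {
        "cmd": "npm install",  # required
        "timeout": 60,         # or None if no timeout was used
        "status_code": 1,      # or None if the command timed out
        "stdout": "",
        "stderr": "ERESOLVE could not resolve",  # illustrative
    },
)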
166
core/agents/executor.py Normal file
View File

@@ -0,0 +1,166 @@
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, Field
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse
from core.llm.parser import JSONParser
from core.log import get_logger
from core.proc.exec_log import ExecLog
from core.proc.process_manager import ProcessManager
from core.state.state_manager import StateManager
from core.ui.base import AgentSource, UIBase
log = get_logger(__name__)
class CommandResult(BaseModel):
"""
Analysis of the command run and decision on the next steps.
"""
analysis: str = Field(
description="Analysis of the command output (stdout, stderr) and exit code, in context of the current task"
)
success: bool = Field(
description="True if the command should be treated as successful and the task should continue, false if the command unexpectedly failed and we should debug the issue"
)
class Executor(BaseAgent):
agent_type = "executor"
display_name = "Executor"
def __init__(
self,
state_manager: StateManager,
ui: UIBase,
):
"""
Create a new Executor agent
"""
self.ui_source = AgentSource(self.display_name, self.agent_type)
self.ui = ui
self.state_manager = state_manager
self.process_manager = ProcessManager(
root_dir=state_manager.get_full_project_root(),
output_handler=self.output_handler,
exit_handler=self.exit_handler,
)
self.stream_output = True
def for_step(self, step):
# FIXME: not needed, refactor to use self.current_state.current_step
# in general, passing current step is not needed
self.step = step
return self
async def output_handler(self, out, err):
await self.stream_handler(out)
await self.stream_handler(err)
async def exit_handler(self, process):
pass
async def run(self) -> AgentResponse:
if not self.step:
raise ValueError("No current step set (probably an Orchestrator bug)")
options = self.step["command"]
cmd = options["command"]
timeout = options.get("timeout")
if timeout:
q = f"Can I run command: {cmd} with {timeout}s timeout?"
else:
q = f"Can I run command: {cmd}?"
confirm = await self.ask_question(
q,
buttons={"yes": "Yes", "no": "No"},
default="yes",
buttons_only=True,
)
if confirm.button == "no":
log.info(f"Skipping command execution of `{cmd}` (requested by user)")
await self.send_message(f"Skipping command {cmd}")
self.complete()
return AgentResponse.done(self)
started_at = datetime.now()
log.info(f"Running command `{cmd}` with timeout {timeout}s")
status_code, stdout, stderr = await self.process_manager.run_command(cmd, timeout=timeout)
llm_response = await self.check_command_output(cmd, timeout, stdout, stderr, status_code)
duration = (datetime.now() - started_at).total_seconds()
self.complete()
exec_log = ExecLog(
started_at=started_at,
duration=duration,
cmd=cmd,
cwd=".",
env={},
timeout=timeout,
status_code=status_code,
stdout=stdout,
stderr=stderr,
analysis=llm_response.analysis,
success=llm_response.success,
)
await self.state_manager.log_command_run(exec_log)
if llm_response.success:
return AgentResponse.done(self)
return AgentResponse.error(
self,
llm_response.analysis,
{
"cmd": cmd,
"timeout": timeout,
"stdout": stdout,
"stderr": stderr,
"status_code": status_code,
},
)
async def check_command_output(
self, cmd: str, timeout: Optional[int], stdout: str, stderr: str, status_code: int
) -> CommandResult:
llm = self.get_llm()
convo = (
AgentConvo(self)
.template(
"ran_command",
task_steps=self.current_state.steps,
current_task=self.current_state.current_task,
# FIXME: can step ever happen *not* to be in current steps?
step_index=self.current_state.steps.index(self.step),
cmd=cmd,
timeout=timeout,
stdout=stdout,
stderr=stderr,
status_code=status_code,
)
.require_schema(CommandResult)
)
return await llm(convo, parser=JSONParser(spec=CommandResult), temperature=0)
def complete(self):
"""
Mark the step as complete.
Note that this marks the step complete in the next state. If there's an error,
the state won't get committed and the error handler will have access to the
current state, where this step is still unfinished.
This is intentional, so that the error handler can decide what to do with the
information we give it.
"""
self.step = None
self.next_state.complete_step()

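As the Orchestrator later in this commit shows, a single Executor instance is created per session and re-armed for each command step via for_step(). A condensed sketch of that wiring; the step dict contents are illustrative:

from core.agents.executor import Executor

async def run_command_step(state_manager, ui):
    # Sketch: one Executor per session, re-armed for each "command" step.
    executor = Executor(state_manager, ui)
    step = {"type": "command", "command": {"command": "npm test", "timeout": 60}}
    return await executor.for_step(step).run()  # confirms with user, runs, logs ExecLog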
46
core/agents/human_input.py Normal file
View File

@@ -0,0 +1,46 @@
from core.agents.base import BaseAgent
from core.agents.response import AgentResponse, ResponseType
class HumanInput(BaseAgent):
agent_type = "human-input"
display_name = "Human Input"
async def run(self) -> AgentResponse:
if self.prev_response and self.prev_response.type == ResponseType.INPUT_REQUIRED:
return await self.input_required(self.prev_response.data.get("files", []))
return await self.human_intervention(self.step)
async def human_intervention(self, step) -> AgentResponse:
description = step["human_intervention_description"]
await self.ask_question(
f"I need human intervention: {description}",
buttons={"continue": "Continue"},
default="continue",
buttons_only=True,
)
self.next_state.complete_step()
return AgentResponse.done(self)
async def input_required(self, files: list[dict]) -> AgentResponse:
for item in files:
file = item["file"]
line = item["line"]
# FIXME: this is an ugly hack, we shouldn't need to know how to get to VFS and
# anyways the full path is only available for local vfs, so this is doubly wrong;
# instead, we should just send the relative path to the extension and it should
# figure out where its local files are and how to open it.
full_path = self.state_manager.file_system.get_full_path(file)
await self.send_message(f"Input required on {file}:{line}")
await self.ui.open_editor(full_path, line)
await self.ask_question(
f"Please open {file} and modify line {line} according to the instructions.",
buttons={"continue": "Continue"},
default="continue",
buttons_only=True,
)
return AgentResponse.done(self)

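The files payload consumed by input_required() above is assembled by Orchestrator.import_files() later in this commit. A minimal sketch of its shape; paths and line numbers are illustrative:

from core.agents.response import AgentResponse

files = [
    {"file": "src/config.js", "line": 12},
    {"file": "src/routes/index.js", "line": 3},
]
# Inside an agent: triggers HumanInput.input_required() on the next Orchestrator pass.
return AgentResponse.input_required(self, files)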
37
core/agents/mixins.py Normal file
View File

@@ -0,0 +1,37 @@
from typing import Optional
from core.agents.convo import AgentConvo
class IterationPromptMixin:
"""
Provides a method to find a solution to a problem based on user feedback.
Used by ProblemSolver and Troubleshooter agents.
"""
async def find_solution(
self,
user_feedback: str,
*,
user_feedback_qa: Optional[list[str]] = None,
next_solution_to_try: Optional[str] = None,
) -> str:
"""
Generate a new solution for the problem the user reported.
:param user_feedback: User feedback about the problem.
:param user_feedback_qa: Additional q/a about the problem provided by the user (optional).
:param next_solution_to_try: Hint from ProblemSolver on which solution to try (optional).
:return: The generated solution to the problem.
"""
llm = self.get_llm()
convo = AgentConvo(self).template(
"iteration",
current_task=self.current_state.current_task,
user_feedback=user_feedback,
user_feedback_qa=user_feedback_qa,
next_solution_to_try=next_solution_to_try,
)
llm_solution: str = await llm(convo)
return llm_solution

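A hedged sketch of how an agent gains find_solution() through this mixin, mirroring the Troubleshooter and ProblemSolver definitions elsewhere in this commit; the example agent itself is hypothetical:

from core.agents.base import BaseAgent
from core.agents.mixins import IterationPromptMixin
from core.agents.response import AgentResponse

class MyFixer(IterationPromptMixin, BaseAgent):  # hypothetical agent, not in this commit
    agent_type = "my-fixer"
    display_name = "My Fixer"

    async def run(self) -> AgentResponse:
        solution = await self.find_solution("The login button does nothing")  # illustrative feedback
        self.next_state.current_iteration["description"] = solution
        return AgentResponse.done(self)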
329
core/agents/orchestrator.py Normal file
View File

@@ -0,0 +1,329 @@
from typing import Optional
from core.agents.architect import Architect
from core.agents.base import BaseAgent
from core.agents.code_monkey import CodeMonkey
from core.agents.code_reviewer import CodeReviewer
from core.agents.developer import Developer
from core.agents.error_handler import ErrorHandler
from core.agents.executor import Executor
from core.agents.human_input import HumanInput
from core.agents.problem_solver import ProblemSolver
from core.agents.response import AgentResponse, ResponseType
from core.agents.spec_writer import SpecWriter
from core.agents.task_reviewer import TaskReviewer
from core.agents.tech_lead import TechLead
from core.agents.tech_writer import TechnicalWriter
from core.agents.troubleshooter import Troubleshooter
from core.config import LLMProvider, get_config
from core.llm.convo import Convo
from core.log import get_logger
from core.telemetry import telemetry
from core.ui.base import ProjectStage
log = get_logger(__name__)
class Orchestrator(BaseAgent):
"""
Main agent that controls the flow of the process.
Based on the current state of the project, the orchestrator invokes
all other agents. It is also responsible for determining when each
step is done and the project state needs to be committed to the database.
"""
agent_type = "orchestrator"
display_name = "Orchestrator"
async def run(self) -> bool:
"""
Run the Orchestrator agent.
:return: True if the Orchestrator exited successfully, False otherwise.
"""
response = None
log.info(f"Starting {__name__}.Orchestrator")
self.executor = Executor(self.state_manager, self.ui)
self.process_manager = self.executor.process_manager
# self.chat = Chat() TODO
await self.init_ui()
await self.offline_changes_check()
llm_api_check = await self.test_llm_access()
if not llm_api_check:
return False
# TODO: consider refactoring this into two loops; the outer with one iteration per committed step,
# and the inner which runs the agents for the current step until they're done. This would simplify
# handle_done() and let us do other per-step processing (eg. describing files) in between agent runs.
while True:
await self.update_stats()
agent = self.create_agent(response)
log.debug(f"Running agent {agent.__class__.__name__} (step {self.current_state.step_index})")
response = await agent.run()
if response.type == ResponseType.EXIT:
log.debug(f"Agent {agent.__class__.__name__} requested exit")
break
if response.type == ResponseType.DONE:
response = await self.handle_done(agent, response)
continue
# TODO: rollback changes to "next" so they aren't accidentally committed?
return True
async def test_llm_access(self) -> bool:
"""
Make sure the LLMs for all the defined agents are reachable.
Each LLM provider is only checked once.
Returns True if the check was successful for all LLMs.
"""
config = get_config()
defined_agents = config.agent.keys()
convo = Convo()
convo.user(
" ".join(
[
"This is a connection test. If you can see this,",
"please respond only with 'START' and nothing else.",
]
)
)
success = True
tested_llms: set[LLMProvider] = set()
for agent_name in defined_agents:
llm = self.get_llm(agent_name)
llm_config = config.llm_for_agent(agent_name)
if llm_config.provider in tested_llms:
continue
tested_llms.add(llm_config.provider)
provider_model_combo = f"{llm_config.provider.value} {llm_config.model}"
try:
resp = await llm(convo)
except Exception as err:
log.warning(f"API check for {provider_model_combo} failed: {err}")
success = False
await self.ui.send_message(f"Error connecting to the {provider_model_combo} API: {err}")
continue
if resp and len(resp) > 0:
log.debug(f"API check for {provider_model_combo} passed.")
else:
log.warning(f"API check for {provider_model_combo} failed.")
await self.ui.send_message(
f"Error connecting to the {provider_model_combo} API. Please check your settings and internet connection."
)
success = False
return success
async def offline_changes_check(self):
"""
Check for changes outside of Pythagora.
If there are changes, ask the user if they want to keep them, and
import if needed.
"""
log.info("Checking for offline changes.")
modified_files = await self.state_manager.get_modified_files()
if self.state_manager.workspace_is_empty():
# NOTE: this will currently get triggered on a new project, but will do
# nothing as there are no files in the database.
log.info("Detected empty workspace, restoring state from the database.")
await self.state_manager.restore_files()
elif modified_files:
await self.send_message(f"We found {len(modified_files)} new and/or modified files.")
hint = "".join(
[
"If you would like Pythagora to import those changes, click 'Yes'.\n",
"Clicking 'No' means Pythagora will restore (overwrite) all files to the last stored state.\n",
]
)
use_changes = await self.ask_question(
question="Would you like to keep your changes?",
buttons={
"yes": "Yes, keep my changes",
"no": "No, restore last Pythagora state",
},
buttons_only=True,
hint=hint,
)
if use_changes.button == "yes":
log.debug("Importing offline changes into Pythagora.")
await self.import_files()
else:
log.debug("Restoring last stored state.")
await self.state_manager.restore_files()
log.info("Offline changes check done.")
async def handle_done(self, agent: BaseAgent, response: AgentResponse) -> AgentResponse:
"""
Handle the DONE response from the agent and commit current state to the database.
This also checks for any files created or modified outside Pythagora and
imports them. If any of the files require input from the user, the returned response
will trigger the HumanInput agent to ask the user to provide the required input.
"""
n_epics = len(self.next_state.epics)
n_finished_epics = n_epics - len(self.next_state.unfinished_epics)
n_tasks = len(self.next_state.tasks)
n_finished_tasks = n_tasks - len(self.next_state.unfinished_tasks)
n_iterations = len(self.next_state.iterations)
n_finished_iterations = n_iterations - len(self.next_state.unfinished_iterations)
n_steps = len(self.next_state.steps)
n_finished_steps = n_steps - len(self.next_state.unfinished_steps)
log.debug(
f"Agent {agent.__class__.__name__} is done, "
f"committing state for step {self.current_state.step_index}: "
f"{n_finished_epics}/{n_epics} epics, "
f"{n_finished_tasks}/{n_tasks} tasks, "
f"{n_finished_iterations}/{n_iterations} iterations, "
f"{n_finished_steps}/{n_steps} dev steps."
)
await self.state_manager.commit()
# If there are any new or modified files changed outside Pythagora,
# this is a good time to add them to the project. If any of them have
# INPUT_REQUIRED, we'll first ask the user to provide the required input.
return await self.import_files()
def create_agent(self, prev_response: Optional[AgentResponse]) -> BaseAgent:
state = self.current_state
if prev_response:
if prev_response.type in [ResponseType.CANCEL, ResponseType.ERROR]:
return ErrorHandler(self.state_manager, self.ui, prev_response=prev_response)
if prev_response.type == ResponseType.CODE_REVIEW:
return CodeReviewer(self.state_manager, self.ui, prev_response=prev_response)
if prev_response.type == ResponseType.CODE_REVIEW_FEEDBACK:
return CodeMonkey(self.state_manager, self.ui, prev_response=prev_response, step=state.current_step)
if prev_response.type == ResponseType.DESCRIBE_FILES:
return CodeMonkey(self.state_manager, self.ui, prev_response=prev_response)
if prev_response.type == ResponseType.INPUT_REQUIRED:
# FIXME: HumanInput should be on the whole time and intercept chat/interrupt
return HumanInput(self.state_manager, self.ui, prev_response=prev_response)
if prev_response.type == ResponseType.UPDATE_EPIC:
return TechLead(self.state_manager, self.ui, prev_response=prev_response)
if prev_response.type == ResponseType.TASK_REVIEW_FEEDBACK:
return Developer(self.state_manager, self.ui, prev_response=prev_response)
if not state.specification.description:
# Ask the Spec Writer to refine and save the project specification
return SpecWriter(self.state_manager, self.ui)
elif not state.specification.architecture:
# Ask the Architect to design the project architecture and determine dependencies
return Architect(self.state_manager, self.ui, process_manager=self.process_manager)
elif (
not state.epics
or not self.current_state.unfinished_tasks
or (state.specification.template and not state.files)
):
# Ask the Tech Lead to break down the initial project or feature into tasks and apply the project template
return TechLead(self.state_manager, self.ui, process_manager=self.process_manager)
elif not state.steps and not state.iterations:
# Ask the Developer to break down current task into actionable steps
return Developer(self.state_manager, self.ui)
if state.current_step:
# Execute next step in the task
# TODO: this can be parallelized in the future
return self.create_agent_for_step(state.current_step)
if state.unfinished_iterations:
if state.current_iteration["description"]:
# Break down the next iteration into steps
return Developer(self.state_manager, self.ui)
else:
# We need to iterate over the current task but there's no solution, as Pythagora
# is stuck in a loop, and ProblemSolver needs to find alternative solutions.
return ProblemSolver(self.state_manager, self.ui)
# We have just finished the task, call Troubleshooter to ask the user to review
return Troubleshooter(self.state_manager, self.ui)
def create_agent_for_step(self, step: dict) -> BaseAgent:
step_type = step.get("type")
if step_type == "save_file":
return CodeMonkey(self.state_manager, self.ui, step=step)
elif step_type == "command":
return self.executor.for_step(step)
elif step_type == "human_intervention":
return HumanInput(self.state_manager, self.ui, step=step)
elif step_type == "review_task":
return TaskReviewer(self.state_manager, self.ui)
elif step_type == "create_readme":
return TechnicalWriter(self.state_manager, self.ui)
else:
raise ValueError(f"Unknown step type: {step_type}")
async def import_files(self) -> Optional[AgentResponse]:
imported_files = await self.state_manager.import_files()
if not imported_files:
return None
log.info(f"Imported new/changed files to project: {', '.join(f.path for f in imported_files)}")
input_required_files: list[dict[str, int]] = []
for file in imported_files:
for line in self.state_manager.get_input_required(file.content.content):
input_required_files.append({"file": file.path, "line": line})
if input_required_files:
# This will trigger the HumanInput agent to ask the user to provide the required changes
# If the user changes anything (removes the "required changes"), the file will be re-imported.
return AgentResponse.input_required(self, input_required_files)
# Commit the newly imported file
log.debug(f"Committing imported files as a separate step {self.current_state.step_index}")
await self.state_manager.commit()
return None
async def init_ui(self):
await self.ui.send_project_root(self.state_manager.get_full_project_root())
if self.current_state.epics:
await self.ui.send_project_stage(ProjectStage.CODING)
elif self.current_state.specification:
await self.ui.send_project_stage(ProjectStage.ARCHITECTURE)
else:
await self.ui.send_project_stage(ProjectStage.DESCRIPTION)
async def update_stats(self):
if self.current_state.steps and self.current_state.current_step:
source = self.current_state.current_step.get("source")
source_steps = [s for s in self.current_state.steps if s.get("source") == source]
await self.ui.send_step_progress(
source_steps.index(self.current_state.current_step) + 1,
len(source_steps),
self.current_state.current_step,
source,
)
total_files = 0
total_lines = 0
for file in self.current_state.files:
total_files += 1
total_lines += len(file.content.content.splitlines())
telemetry.set("num_files", total_files)
telemetry.set("num_lines", total_lines)
stats = telemetry.get_project_stats()
await self.ui.send_project_stats(stats)

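create_agent_for_step() above is the single dispatch point for per-step agents, so supporting a new step type is one extra branch. A hedged sketch; the "lint" step type and LintAgent are hypothetical and not part of this commit:

def create_agent_for_step(self, step: dict) -> BaseAgent:
    step_type = step.get("type")
    if step_type == "save_file":
        return CodeMonkey(self.state_manager, self.ui, step=step)
    elif step_type == "lint":  # hypothetical new step type
        return LintAgent(self.state_manager, self.ui, step=step)  # hypothetical agent
    # ... remaining branches as above ...
    else:
        raise ValueError(f"Unknown step type: {step_type}")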
126
core/agents/problem_solver.py Normal file
View File

@@ -0,0 +1,126 @@
from typing import Optional
from pydantic import BaseModel, Field
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse
from core.agents.mixins import IterationPromptMixin
from core.llm.parser import JSONParser
from core.log import get_logger
log = get_logger(__name__)
class AlternativeSolutions(BaseModel):
# FIXME: This is probably extra leftover from some dead code in the old implementation
description_of_tried_solutions: str = Field(
description="A description of the solutions that were tried to solve the recurring issue that was labeled as loop by the user.",
)
alternative_solutions: list[str] = Field(
description=("List of all alternative solutions to the recurring issue that was labeled as loop by the user.")
)
class ProblemSolver(IterationPromptMixin, BaseAgent):
agent_type = "problem-solver"
display_name = "Problem Solver"
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.iteration = self.current_state.current_iteration
self.next_state_iteration = self.next_state.current_iteration
self.previous_solutions = [s for s in self.iteration["alternative_solutions"] if s["tried"]]
self.possible_solutions = [s for s in self.iteration["alternative_solutions"] if not s["tried"]]
async def run(self) -> AgentResponse:
if self.iteration is None:
log.warning("ProblemSolver agent started without an iteration to work on, possible bug?")
return AgentResponse.done(self)
if not self.possible_solutions:
await self.generate_alternative_solutions()
return AgentResponse.done(self)
return await self.try_alternative_solutions()
async def generate_alternative_solutions(self):
llm = self.get_llm()
convo = (
AgentConvo(self)
.template(
"get_alternative_solutions",
user_input=self.iteration["user_feedback"],
iteration=self.iteration,
previous_solutions=self.previous_solutions,
)
.require_schema(AlternativeSolutions)
)
llm_response: AlternativeSolutions = await llm(
convo,
parser=JSONParser(spec=AlternativeSolutions),
temperature=1,
)
self.next_state_iteration["alternative_solutions"] = self.iteration["alternative_solutions"] + [
{
"user_feedback": None,
"description": solution,
"tried": False,
}
for solution in llm_response.alternative_solutions
]
self.next_state.flag_iterations_as_modified()
async def try_alternative_solutions(self) -> AgentResponse:
preferred_solution = await self.ask_for_preferred_solution()
if preferred_solution is None:
# TODO: We have several alternative solutions but the user didn't choose any.
# This means the user either needs expert help, or that they need to go back and
# maybe rephrase the tasks or even the project specs.
# For now, we'll just mark these as not working and try to regenerate.
self.next_state_iteration["alternative_solutions"] = [
{
**s,
"tried": True,
"user_feedback": s["user_feedback"] or "That doesn't sound like a good idea, try something else.",
}
for s in self.possible_solutions
]
self.next_state.flag_iterations_as_modified()
return AgentResponse.done(self)
index, next_solution_to_try = preferred_solution
llm_solution = await self.find_solution(
self.iteration["user_feedback"],
next_solution_to_try=next_solution_to_try,
)
self.next_state_iteration["alternative_solutions"][index]["tried"] = True
self.next_state_iteration["description"] = llm_solution
self.next_state_iteration["attempts"] = self.iteration["attempts"] + 1
self.next_state.flag_iterations_as_modified()
return AgentResponse.done(self)
async def ask_for_preferred_solution(self) -> Optional[tuple[int, str]]:
solutions = self.possible_solutions
buttons = {}
for i in range(len(solutions)):
buttons[str(i)] = str(i + 1)
buttons["none"] = "None of these"
solutions_txt = "\n\n".join([f"{i+1}: {s['description']}" for i, s in enumerate(solutions)])
user_response = await self.ask_question(
"Choose which solution would you like Pythagora to try next:\n\n" + solutions_txt,
buttons=buttons,
default="0",
buttons_only=True,
)
if user_response.button == "none" or user_response.cancelled:
return None
try:
i = int(user_response.button)
return i, solutions[i]
except (ValueError, IndexError):
return None

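For reference, the lifecycle of one alternative_solutions entry as the code above reads and writes it; a descriptive sketch with illustrative values, not a formal schema:

solution = {
    "user_feedback": None,  # set when the user rejects the solution or comments on it
    "description": "Pin the dependency to the last known-good version",  # illustrative
    "tried": False,  # generate_alternative_solutions() starts here;
                     # try_alternative_solutions() flips it to True
}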
139
core/agents/response.py Normal file
View File

@@ -0,0 +1,139 @@
from enum import Enum
from typing import TYPE_CHECKING, Optional
from core.log import get_logger
if TYPE_CHECKING:
from core.agents.base import BaseAgent
from core.agents.error_handler import ErrorHandler
log = get_logger(__name__)
class ResponseType(str, Enum):
DONE = "done"
"""Agent has finished processing."""
ERROR = "error"
"""There was an error processing the request."""
CANCEL = "cancel"
"""User explicitly cancelled the operation."""
EXIT = "exit"
"""Pythagora should exit."""
CODE_REVIEW = "code-review"
"""Agent is requesting a review of the created code."""
CODE_REVIEW_FEEDBACK = "code-review-feedback"
"""Agent is providing feedback on the code review."""
DESCRIBE_FILES = "describe-files"
"""Analysis of the files in the project is requested."""
INPUT_REQUIRED = "input-required"
"""User needs to modify a line in the generated code."""
UPDATE_EPIC = "update-epic"
"""Update the epic development plan after a task was iterated on."""
TASK_REVIEW_FEEDBACK = "task-review-feedback"
"""Agent is providing feedback on the entire task."""
class AgentResponse:
type: ResponseType = ResponseType.DONE
agent: "BaseAgent"
data: Optional[dict]
def __init__(self, type: ResponseType, agent: "BaseAgent", data: Optional[dict] = None):
self.type = type
self.agent = agent
self.data = data
def __repr__(self) -> str:
return f"<AgentResponse type={self.type} agent={self.agent}>"
@staticmethod
def done(agent: "BaseAgent") -> "AgentResponse":
return AgentResponse(type=ResponseType.DONE, agent=agent)
@staticmethod
def error(agent: "BaseAgent", message: str, details: Optional[dict] = None) -> "AgentResponse":
return AgentResponse(
type=ResponseType.ERROR,
agent=agent,
data={"message": message, "details": details},
)
@staticmethod
def cancel(agent: "BaseAgent") -> "AgentResponse":
return AgentResponse(type=ResponseType.CANCEL, agent=agent)
@staticmethod
def exit(agent: "ErrorHandler") -> "AgentResponse":
return AgentResponse(type=ResponseType.EXIT, agent=agent)
@staticmethod
def code_review(
agent: "BaseAgent",
path: str,
instructions: str,
old_content: str,
new_content: str,
attempt: int,
) -> "AgentResponse":
return AgentResponse(
type=ResponseType.CODE_REVIEW,
agent=agent,
data={
"path": path,
"instructions": instructions,
"old_content": old_content,
"new_content": new_content,
"attempt": attempt,
},
)
@staticmethod
def code_review_feedback(
agent: "BaseAgent",
new_content: str,
approved_content: str,
feedback: str,
attempt: int,
) -> "AgentResponse":
return AgentResponse(
type=ResponseType.CODE_REVIEW_FEEDBACK,
agent=agent,
data={
"new_content": new_content,
"approved_content": approved_content,
"feedback": feedback,
"attempt": attempt,
},
)
@staticmethod
def describe_files(agent: "BaseAgent") -> "AgentResponse":
return AgentResponse(type=ResponseType.DESCRIBE_FILES, agent=agent)
@staticmethod
def input_required(agent: "BaseAgent", files: list[dict[str, int]]) -> "AgentResponse":
return AgentResponse(type=ResponseType.INPUT_REQUIRED, agent=agent, data={"files": files})
@staticmethod
def update_epic(agent: "BaseAgent") -> "AgentResponse":
return AgentResponse(type=ResponseType.UPDATE_EPIC, agent=agent)
@staticmethod
def task_review_feedback(agent: "BaseAgent", feedback: str) -> "AgentResponse":
return AgentResponse(
type=ResponseType.TASK_REVIEW_FEEDBACK,
agent=agent,
data={
"feedback": feedback,
},
)

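Agents construct responses exclusively through these static factories rather than calling the constructor directly. A hedged usage sketch; the build_succeeded flag and the details values are illustrative:

from core.agents.response import AgentResponse

# Inside an agent's run() method (sketch):
if build_succeeded:  # hypothetical flag
    return AgentResponse.done(self)  # Orchestrator commits state and moves on
else:
    return AgentResponse.error(self, "Build failed", {"cmd": "make", "status_code": 2})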
143
core/agents/spec_writer.py Normal file
View File

@@ -0,0 +1,143 @@
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse
from core.db.models import Complexity
from core.llm.parser import StringParser
from core.telemetry import telemetry
from core.templates.example_project import (
EXAMPLE_PROJECT_ARCHITECTURE,
EXAMPLE_PROJECT_DESCRIPTION,
EXAMPLE_PROJECT_PLAN,
)
# If the project description is less than this, perform an analysis using LLM
ANALYZE_THRESHOLD = 1500
# URL to the wiki page with tips on how to write a good project description
INITIAL_PROJECT_HOWTO_URL = (
"https://github.com/Pythagora-io/gpt-pilot/wiki/How-to-write-a-good-initial-project-description"
)
class SpecWriter(BaseAgent):
agent_type = "spec-writer"
display_name = "Spec Writer"
async def run(self) -> AgentResponse:
response = await self.ask_question(
"Describe your app in as much detail as possible",
allow_empty=False,
buttons={"example": "Start an example project"},
)
if response.cancelled:
return AgentResponse.error(self, "No project description")
if response.button == "example":
self.prepare_example_project()
return AgentResponse.done(self)
spec = response.text
complexity = await self.check_prompt_complexity(spec)
if len(spec) < ANALYZE_THRESHOLD and complexity != Complexity.SIMPLE:
spec = await self.analyze_spec(spec)
spec = await self.review_spec(spec)
self.next_state.specification = self.current_state.specification.clone()
self.next_state.specification.description = spec
self.next_state.specification.complexity = complexity
telemetry.set("initial_prompt", spec)
telemetry.set("is_complex_app", complexity != Complexity.SIMPLE)
return AgentResponse.done(self)
async def check_prompt_complexity(self, prompt: str) -> str:
await self.send_message("Checking the complexity of the prompt ...")
llm = self.get_llm()
convo = AgentConvo(self).template("prompt_complexity", prompt=prompt)
llm_response: str = await llm(convo, temperature=0, parser=StringParser())
return llm_response.lower()
def prepare_example_project(self):
spec = self.current_state.specification.clone()
spec.description = EXAMPLE_PROJECT_DESCRIPTION
spec.architecture = EXAMPLE_PROJECT_ARCHITECTURE["architecture"]
spec.system_dependencies = EXAMPLE_PROJECT_ARCHITECTURE["system_dependencies"]
spec.package_dependencies = EXAMPLE_PROJECT_ARCHITECTURE["package_dependencies"]
spec.template = EXAMPLE_PROJECT_ARCHITECTURE["template"]
spec.complexity = Complexity.SIMPLE
telemetry.set("initial_prompt", spec.description.strip())
telemetry.set("is_complex_app", False)
telemetry.set("template", spec.template)
telemetry.set(
"architecture",
{
"architecture": spec.architecture,
"system_dependencies": spec.system_dependencies,
"package_dependencies": spec.package_dependencies,
},
)
self.next_state.specification = spec
self.next_state.epics = [
{
"name": "Initial Project",
"description": EXAMPLE_PROJECT_DESCRIPTION,
"completed": False,
"complexity": Complexity.SIMPLE,
}
]
self.next_state.tasks = EXAMPLE_PROJECT_PLAN
async def analyze_spec(self, spec: str) -> str:
msg = (
"Your project description seems a bit short. "
"The better you can describe the project, the better GPT Pilot will understand what you'd like to build.\n\n"
f"Here are some tips on how to better describe the project: {INITIAL_PROJECT_HOWTO_URL}\n\n"
"Let's start by refining your project idea:"
)
await self.send_message(msg)
llm = self.get_llm()
convo = AgentConvo(self).template("ask_questions").user(spec)
while True:
response: str = await llm(convo)
if len(response) > 500:
# The response is too long for it to be a question, assume it's the spec
confirm = await self.ask_question(
(
"Can we proceed with this project description? If so, just press ENTER. "
"Otherwise, please tell me what's missing or what you'd like to add."
),
allow_empty=True,
buttons={"continue": "Continue"},
)
if confirm.cancelled or confirm.button == "continue" or confirm.text == "":
return spec
convo.user(confirm.text)
else:
convo.assistant(response)
user_response = await self.ask_question(
response,
buttons={"skip": "Skip questions"},
)
if user_response.cancelled or user_response.button == "skip":
convo.user(
"This is enough clarification, you have all the information. "
"Please output the spec now, without additional comments or questions."
)
response: str = await llm(convo)
return response
convo.user(user_response.text)
async def review_spec(self, spec: str) -> str:
convo = AgentConvo(self).template("review_spec", spec=spec)
llm = self.get_llm()
llm_response: str = await llm(convo, temperature=0)
additional_info = llm_response.strip()
if additional_info:
spec += "\nAdditional info/examples:\n" + additional_info
return spec

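A quick worked example of the ANALYZE_THRESHOLD gate above; values are illustrative, and it assumes Complexity is a string-valued enum, which the direct comparison against check_prompt_complexity()'s lowercased output suggests:

spec = "A todo app with user accounts"  # ~30 characters, well under 1500
complexity = "hard"                     # anything other than Complexity.SIMPLE
if len(spec) < ANALYZE_THRESHOLD and complexity != Complexity.SIMPLE:
    # Short but non-trivial description: run the interactive analyze_spec()
    # question loop, then review_spec(), before saving the specification.
    pass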
53
core/agents/task_reviewer.py Normal file
View File

@@ -0,0 +1,53 @@
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse
from core.log import get_logger
log = get_logger(__name__)
class TaskReviewer(BaseAgent):
agent_type = "task-reviewer"
display_name = "Task Reviewer"
async def run(self) -> AgentResponse:
response = await self.review_code_changes()
self.next_state.complete_step()
return response
async def review_code_changes(self) -> AgentResponse:
"""
Review all the code changes during current task.
"""
log.debug(f"Reviewing code changes for task {self.current_state.current_task['description']}")
await self.send_message("Reviewing the task implementation ...")
all_feedbacks = [
iteration["user_feedback"].replace("```", "").strip()
for iteration in self.current_state.iterations
# Some iterations are created by the task reviewer and have no user feedback
if iteration["user_feedback"]
]
files_before_modification = self.current_state.modified_files
files_after_modification = [
(file.path, file.content.content)
for file in self.current_state.files
if (file.path in files_before_modification)
]
llm = self.get_llm()
# TODO instead of sending files before and after maybe add nice way to show diff for multiple files
convo = AgentConvo(self).template(
"review_task",
current_task=self.current_state.current_task,
all_feedbacks=all_feedbacks,
files_before_modification=files_before_modification,
files_after_modification=files_after_modification,
)
llm_response: str = await llm(convo, temperature=0.7)
if llm_response.strip().lower() == "done":
return AgentResponse.done(self)
else:
return AgentResponse.task_review_feedback(self, llm_response)

196
core/agents/tech_lead.py Normal file
View File

@@ -0,0 +1,196 @@
from typing import Optional
from uuid import uuid4
from pydantic import BaseModel, Field
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse, ResponseType
from core.db.models import Complexity
from core.llm.parser import JSONParser
from core.log import get_logger
from core.templates.registry import apply_project_template, get_template_summary
from core.ui.base import ProjectStage
log = get_logger(__name__)
class Task(BaseModel):
description: str = Field(description=("Very detailed description of a development task."))
class DevelopmentPlan(BaseModel):
plan: list[Task] = Field(description="List of development tasks that need to be done to implement the entire plan.")
class UpdatedDevelopmentPlan(BaseModel):
updated_current_task: Task = Field(
description="Updated detailed description of what was implemented while working on the current development task."
)
plan: list[Task] = Field(description="List of unfinished development tasks.")
class TechLead(BaseAgent):
agent_type = "tech-lead"
display_name = "Tech Lead"
async def run(self) -> AgentResponse:
if self.prev_response and self.prev_response.type == ResponseType.UPDATE_EPIC:
return await self.update_epic()
if len(self.current_state.epics) == 0:
self.create_initial_project_epic()
# Orchestrator will rerun us to break down the initial project epic
return AgentResponse.done(self)
await self.ui.send_project_stage(ProjectStage.CODING)
if self.current_state.specification.template and not self.current_state.files:
await self.apply_project_template()
return AgentResponse.done(self)
unfinished_epics = self.current_state.unfinished_epics
if unfinished_epics:
return await self.plan_epic(unfinished_epics[0])
else:
return await self.ask_for_new_feature()
def create_initial_project_epic(self):
log.debug("Creating initial project epic")
self.next_state.epics = [
{
"id": uuid4().hex,
"name": "Initial Project",
"source": "app",
"description": self.current_state.specification.description,
"summary": None,
"completed": False,
"complexity": self.current_state.specification.complexity,
}
]
async def apply_project_template(self) -> Optional[str]:
state = self.current_state
# Only do this for the initial project and if the template is specified
if len(state.epics) != 1 or not state.specification.template:
return None
log.info(f"Applying project template: {self.current_state.specification.template}")
await self.send_message(f"Applying project template {self.current_state.specification.template} ...")
summary = await apply_project_template(
self.current_state.specification.template,
self.state_manager,
self.process_manager,
)
# Saving template files will fill this in and we want it clear for the
# first task.
self.next_state.relevant_files = []
return summary
async def ask_for_new_feature(self) -> AgentResponse:
log.debug("Asking for new feature")
response = await self.ask_question(
"Do you have a new feature to add to the project? Just write it here",
buttons={"end": "No, I'm done"},
allow_empty=True,
)
if response.cancelled or response.button == "end" or not response.text:
return AgentResponse.exit(self)
self.next_state.epics = self.current_state.epics + [
{
"id": uuid4().hex,
"name": f"Feature #{len(self.current_state.epics)}",
"source": "feature",
"description": response.text,
"summary": None,
"completed": False,
"complexity": Complexity.HARD,
}
]
# Orchestrator will rerun us to break down the new feature epic
return AgentResponse.done(self)
async def plan_epic(self, epic) -> AgentResponse:
log.debug(f"Planning tasks for the epic: {epic['name']}")
await self.send_message("Starting to create the action plan for development ...")
llm = self.get_llm()
convo = (
AgentConvo(self)
.template(
"plan",
epic=epic,
task_type=self.current_state.current_epic.get("source", "app"),
existing_summary=get_template_summary(self.current_state.specification.template),
)
.require_schema(DevelopmentPlan)
)
response: DevelopmentPlan = await llm(convo, parser=JSONParser(DevelopmentPlan))
self.next_state.tasks = self.current_state.tasks + [
{
"id": uuid4().hex,
"description": task.description,
"instructions": None,
"completed": False,
}
for task in response.plan
]
return AgentResponse.done(self)
async def update_epic(self) -> AgentResponse:
"""
Update the development plan for the current epic.
As a side effect, this also marks the current task as complete,
and should only be called by Troubleshooter once the task is done,
if the Troubleshooter decides a plan update is needed.
"""
epic = self.current_state.current_epic
self.next_state.complete_task()
await self.state_manager.log_task_completed()
if not self.next_state.unfinished_tasks:
# There are no tasks after this one, so there's nothing to update
return AgentResponse.done(self)
finished_tasks = [task for task in self.next_state.tasks if task["completed"]]
log.debug(f"Updating development plan for {epic['name']}")
await self.ui.send_message("Updating development plan ...")
llm = self.get_llm()
convo = (
AgentConvo(self)
.template(
"update_plan",
finished_tasks=finished_tasks,
task_type=self.current_state.current_epic.get("source", "app"),
modified_files=[f for f in self.current_state.files if f.path in self.current_state.modified_files],
)
.require_schema(UpdatedDevelopmentPlan)
)
response: UpdatedDevelopmentPlan = await llm(
convo,
parser=JSONParser(UpdatedDevelopmentPlan),
temperature=0,
)
log.debug(f"Reworded last task as: {response.updated_current_task.description}")
finished_tasks[-1]["description"] = response.updated_current_task.description
self.next_state.tasks = finished_tasks + [
{
"id": uuid4().hex,
"description": task.description,
"instructions": None,
"completed": False,
}
for task in response.plan
]
log.debug(f"Updated development plan for {epic['name']}, {len(response.plan)} tasks remaining")
return AgentResponse.done(self)

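For reference, the two state dict shapes TechLead writes, collected from the code above; a descriptive sketch with placeholder descriptions, not a formal schema:

from uuid import uuid4
from core.db.models import Complexity

epic = {
    "id": uuid4().hex,
    "name": "Initial Project",  # or f"Feature #{n}" for follow-up features
    "source": "app",            # "app" for the initial project, "feature" otherwise
    "description": "...",       # project spec or the user's feature request
    "summary": None,
    "completed": False,
    "complexity": Complexity.HARD,  # new features default to HARD above
}
task = {
    "id": uuid4().hex,
    "description": "...",  # from the LLM's DevelopmentPlan
    "instructions": None,  # filled in later when the task is broken down
    "completed": False,
}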
30
core/agents/tech_writer.py Normal file
View File

@@ -0,0 +1,30 @@
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.response import AgentResponse
from core.log import get_logger
log = get_logger(__name__)
class TechnicalWriter(BaseAgent):
agent_type = "tech-writer"
display_name = "Technical Writer"
async def run(self) -> AgentResponse:
n_tasks = len(self.current_state.tasks)
n_unfinished = len(self.current_state.unfinished_tasks)
if n_unfinished in [n_tasks // 2, 1]:
# Halfway through the initial project, and at the last task
await self.create_readme()
self.next_state.complete_step()
return AgentResponse.done(self)
async def create_readme(self):
await self.ui.send_message("Creating README ...")
llm = self.get_llm()
convo = AgentConvo(self).template("create_readme")
llm_response: str = await llm(convo)
await self.state_manager.save_file("README.md", llm_response)

281
core/agents/troubleshooter.py Normal file
View File

@@ -0,0 +1,281 @@
from typing import Optional
from uuid import uuid4
from pydantic import BaseModel, Field
from core.agents.base import BaseAgent
from core.agents.convo import AgentConvo
from core.agents.mixins import IterationPromptMixin
from core.agents.response import AgentResponse
from core.llm.parser import JSONParser, OptionalCodeBlockParser
from core.log import get_logger
from core.telemetry import telemetry
log = get_logger(__name__)
LOOP_THRESHOLD = 3 # number of iterations in task to be considered a loop
class BugReportQuestions(BaseModel):
missing_data: list[str] = Field(
description="Very clear question that needs to be answered to have good bug report."
)
class Troubleshooter(IterationPromptMixin, BaseAgent):
agent_type = "troubleshooter"
display_name = "Troubleshooter"
async def run(self) -> AgentResponse:
run_command = await self.get_run_command()
user_instructions = await self.get_user_instructions()
if user_instructions is None:
# LLM decided we don't need to test anything, so we're done with the task
return await self.complete_task()
# Developer sets iteration as "completed" when it generates the step breakdown, so we can't
# use "current_iteration" here
last_iteration = self.current_state.iterations[-1] if self.current_state.iterations else None
should_iterate, is_loop, user_feedback = await self.get_user_feedback(
run_command,
user_instructions,
last_iteration is not None,
)
if not should_iterate:
# User tested and reported no problems, we're done with the task
return await self.complete_task()
user_feedback_qa = await self.generate_bug_report(run_command, user_instructions, user_feedback)
if is_loop:
if last_iteration["alternative_solutions"]:
# If we already have alternative solutions, it means we were already in a loop.
return self.try_next_alternative_solution(user_feedback, user_feedback_qa)
else:
# Newly detected loop, set up an empty new iteration to trigger ProblemSolver
llm_solution = ""
await self.trace_loop("loop-feedback")
else:
llm_solution = await self.find_solution(user_feedback, user_feedback_qa=user_feedback_qa)
self.next_state.iterations = self.current_state.iterations + [
{
"id": uuid4().hex,
"user_feedback": user_feedback,
"user_feedback_qa": user_feedback_qa,
"description": llm_solution,
"alternative_solutions": [],
# FIXME - this is incorrect if this is a new problem; otherwise we could
# just count the iterations
"attempts": 1,
"completed": False,
}
]
if len(self.next_state.iterations) == LOOP_THRESHOLD:
await self.trace_loop("loop-start")
return AgentResponse.done(self)
async def complete_task(self) -> AgentResponse:
"""
Mark the current task as completed.
If there were iterations for the task, instead of marking the task as completed directly,
we ask the TechLead to update the epic (it needs the state of the current task) and then mark
the task as completed.
"""
self.next_state.steps = []
if len(self.current_state.iterations) >= LOOP_THRESHOLD:
await self.trace_loop("loop-end")
if self.current_state.iterations:
return AgentResponse.update_epic(self)
else:
self.next_state.complete_task()
await self.state_manager.log_task_completed()
await self.ui.send_task_progress(
self.current_state.tasks.index(self.current_state.current_task) + 1,
len(self.current_state.tasks),
self.current_state.current_task["description"],
self.current_state.current_epic.get("source", "app"),
"done",
)
return AgentResponse.done(self)
def _get_task_convo(self) -> AgentConvo:
# FIXME: Current prompts reuse conversation from the developer so we have to resort to this
task = self.current_state.current_task
current_task_index = self.current_state.tasks.index(task)
return (
AgentConvo(self)
.template(
"breakdown",
task=task,
iteration=None,
current_task_index=current_task_index,
)
.assistant(self.current_state.current_task["instructions"])
)
async def get_run_command(self) -> Optional[str]:
if self.current_state.run_command:
return self.current_state.run_command
await self.send_message("Figuring out how to run the app ...")
llm = self.get_llm()
convo = self._get_task_convo().template("get_run_command")
# Although the prompt is explicit about not using "```", LLM may still return it
llm_response: str = await llm(convo, temperature=0, parser=OptionalCodeBlockParser())
self.next_state.run_command = llm_response
return llm_response
async def get_user_instructions(self) -> Optional[str]:
await self.send_message("Determining how to test the app ...")
llm = self.get_llm()
convo = self._get_task_convo().template("define_user_review_goal", task=self.current_state.current_task)
user_instructions: str = await llm(convo)
user_instructions = user_instructions.strip()
if user_instructions.lower() == "done":
log.debug(f"Nothing to do for user testing for task {self.current_state.current_task['description']}")
return None
return user_instructions
async def get_user_feedback(
self,
run_command: str,
user_instructions: str,
last_iteration: Optional[dict],
) -> tuple[bool, bool, str]:
"""
Ask the user to test the app and provide feedback.
:return (bool, bool, str): Tuple containing "should_iterate", "is_loop" and
"user_feedback" respectively.
If "should_iterate" is False, the user has confirmed that the app works as expected and there's
nothing for the troubleshooter or problem solver to do.
If "is_loop" is True, Pythagora is stuck in a loop and needs to consider alternative solutions.
The last element in the tuple is the user feedback, which may be empty if the user provided no
feedback (eg. if they just clicked on "Continue" or "I'm stuck in a loop").
"""
test_message = "Can you check if the app works please?"
if user_instructions:
test_message += " Here is a description of what should be working:\n\n" + user_instructions
if run_command:
await self.ui.send_run_command(run_command)
buttons = {"continue": "Everything works, continue"}
if last_iteration:
buttons["loop"] = "I still have the same issue"
user_response = await self.ask_question(
test_message,
buttons=buttons,
default="continue",
)
if user_response.button == "continue" or user_response.cancelled:
return False, False, ""
if user_response.button == "loop":
return True, True, ""
return True, False, user_response.text
def try_next_alternative_solution(self, user_feedback: str, user_feedback_qa: list[str]) -> AgentResponse:
"""
Call the ProblemSolver to try an alternative solution.
Stores the user feedback and sets iteration state (not completed, no description)
so that ProblemSolver will be triggered.
:param user_feedback: User feedback to store in the iteration state.
:param user_feedback_qa: Additional questions/answers about the problem.
:return: Agent response done.
"""
next_state_iteration = self.next_state.iterations[-1]
next_state_iteration["description"] = ""
next_state_iteration["user_feedback"] = user_feedback
next_state_iteration["user_feedback_qa"] = user_feedback_qa
next_state_iteration["attempts"] += 1
next_state_iteration["completed"] = False
self.next_state.flag_iterations_as_modified()
return AgentResponse.done(self)
async def generate_bug_report(
self,
run_command: Optional[str],
user_instructions: str,
user_feedback: str,
) -> list[str]:
"""
Generate a bug report from the user feedback.
:param run_command: The command to run to test the app.
:param user_instructions: Instructions on how to test the functionality.
:param user_feedback: The user feedback.
:return: Additional questions and answers to generate a better bug report.
"""
additional_qa = []
llm = self.get_llm()
convo = (
AgentConvo(self)
.template(
"bug_report",
user_instructions=user_instructions,
user_feedback=user_feedback,
# TODO: revisit if we again want to run this in a loop, where this is useful
additional_qa=additional_qa,
)
.require_schema(BugReportQuestions)
)
llm_response: BugReportQuestions = await llm(convo, parser=JSONParser(BugReportQuestions))
if not llm_response.missing_data:
return []
for question in llm_response.missing_data:
if run_command:
await self.ui.send_run_command(run_command)
user_response = await self.ask_question(
question,
buttons={
"continue": "Submit answer",
"skip": "Skip this question",
"skip-all": "Skip all questions",
},
allow_empty=False,
)
if user_response.cancelled or user_response.button == "skip-all":
break
elif user_response.button == "skip":
continue
additional_qa.append(
{
"question": question,
"answer": user_response.text,
}
)
return additional_qa
async def trace_loop(self, trace_event: str):
state = self.current_state
task_with_loop = {
"task_description": state.current_task["description"],
"task_number": len([t for t in state.tasks if t["completed"]]) + 1,
"steps": len(state.steps),
"iterations": len(state.iterations),
}
await telemetry.trace_loop(trace_event, task_with_loop)

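A short sketch of consuming the three-element tuple from get_user_feedback(), matching the docstring above; a fragment from inside an async agent method, not a standalone program:

# Inside an async method of the Troubleshooter (sketch):
should_iterate, is_loop, user_feedback = await self.get_user_feedback(
    run_command, user_instructions, last_iteration is not None
)
if not should_iterate:
    pass  # user confirmed the app works; complete the task
elif is_loop:
    pass  # "I still have the same issue" -> alternative-solutions path
else:
    pass  # user_feedback describes the problem to debug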
0
core/cli/__init__.py Normal file
View File

319
core/cli/helpers.py Normal file
View File

@@ -0,0 +1,319 @@
import json
import os
import os.path
import sys
from argparse import ArgumentParser, ArgumentTypeError, Namespace
from typing import Optional
from urllib.parse import urlparse
from uuid import UUID
from core.config import Config, LLMProvider, LocalIPCConfig, ProviderConfig, UIAdapter, get_config, loader
from core.config.env_importer import import_from_dotenv
from core.config.version import get_version
from core.db.session import SessionManager
from core.db.setup import run_migrations
from core.log import setup
from core.state.state_manager import StateManager
from core.ui.base import UIBase
from core.ui.console import PlainConsoleUI
from core.ui.ipc_client import IPCClientUI
def parse_llm_endpoint(value: str) -> Optional[tuple[LLMProvider, str]]:
"""
Parse --llm-endpoint command-line option.
Option syntax is: --llm-endpoint <provider>:<url>
:param value: Argument value.
:return: Tuple with LLM provider and URL, or None if the option wasn't provided.
"""
if not value:
return None
parts = value.split(":", 1)
if len(parts) != 2:
raise ArgumentTypeError("Invalid LLM endpoint format; expected 'provider:url'")
try:
provider = LLMProvider(parts[0])
except ValueError as err:
raise ArgumentTypeError(f"Unsupported LLM provider: {err}")
url = urlparse(parts[1])
if url.scheme not in ("http", "https"):
raise ArgumentTypeError(f"Invalid LLM endpoint URL: {parts[1]}")
return provider, url.geturl()
def parse_llm_key(value: str) -> Optional[tuple[LLMProvider, str]]:
"""
Parse --llm-key command-line option.
Option syntax is: --llm-key <provider>:<key>
:param value: Argument value.
:return: Tuple with LLM provider and key, or None if the option wasn't provided.
"""
if not value:
return None
parts = value.split(":", 1)
if len(parts) != 2:
raise ArgumentTypeError("Invalid LLM endpoint format; expected 'provider:key'")
try:
provider = LLMProvider(parts[0])
except ValueError as err:
raise ArgumentTypeError(f"Unsupported LLM provider: {err}")
return provider, parts[1]
def parse_arguments() -> Namespace:
"""
Parse command-line arguments.
Available arguments:
--help: Show the help message
--config: Path to the configuration file
--show-config: Output the default configuration to stdout
--level: Log level (debug,info,warning,error,critical)
--database: Database URL
--local-ipc-port: Local IPC port to connect to
--local-ipc-host: Local IPC host to connect to
--version: Show the version and exit
--list: List all projects
--list-json: List all projects in JSON format
--project: Load a specific project
--branch: Load a specific branch
--step: Load a specific step in a project/branch
--delete: Delete a specific project
--llm-endpoint: Use specific API endpoint for the given provider
--llm-key: Use specific LLM key for the given provider
--import-v0: Import data from a v0 (gpt-pilot) database with the given path
--email: User's email address, if provided
--extension-version: Version of the VSCode extension, if used
:return: Parsed arguments object.
"""
version = get_version()
parser = ArgumentParser()
parser.add_argument("--config", help="Path to the configuration file", default="config.json")
parser.add_argument("--show-config", help="Output the default configuration to stdout", action="store_true")
parser.add_argument("--level", help="Log level (debug,info,warning,error,critical)", required=False)
parser.add_argument("--database", help="Database URL", required=False)
parser.add_argument("--local-ipc-port", help="Local IPC port to connect to", type=int, required=False)
parser.add_argument("--local-ipc-host", help="Local IPC host to connect to", default="localhost", required=False)
parser.add_argument("--version", action="version", version=version)
parser.add_argument("--list", help="List all projects", action="store_true")
parser.add_argument("--list-json", help="List all projects in JSON format", action="store_true")
parser.add_argument("--project", help="Load a specific project", type=UUID, required=False)
parser.add_argument("--branch", help="Load a specific branch", type=UUID, required=False)
parser.add_argument("--step", help="Load a specific step in a project/branch", type=int, required=False)
parser.add_argument("--delete", help="Delete a specific project", type=UUID, required=False)
parser.add_argument(
"--llm-endpoint",
help="Use specific API endpoint for the given provider",
type=parse_llm_endpoint,
action="append",
required=False,
)
parser.add_argument(
"--llm-key",
help="Use specific LLM key for the given provider",
type=parse_llm_key,
action="append",
required=False,
)
parser.add_argument(
"--import-v0",
help="Import data from a v0 (gpt-pilot) database with the given path",
required=False,
)
parser.add_argument("--email", help="User's email address", required=False)
parser.add_argument("--extension-version", help="Version of the VSCode extension", required=False)
return parser.parse_args()
def load_config(args: Namespace) -> Optional[Config]:
"""
Load Pythagora JSON configuration file and apply command-line arguments.
:param args: Command-line arguments (at least `config` must be present).
:return: Configuration object, or None if config couldn't be loaded.
"""
if not os.path.isfile(args.config):
imported = import_from_dotenv(args.config)
if not imported:
print(f"Configuration file not found: {args.config}; using default", file=sys.stderr)
return get_config()
try:
config = loader.load(args.config)
except ValueError as err:
print(f"Error parsing config file {args.config}: {err}", file=sys.stderr)
return None
if args.level:
config.log.level = args.level.upper()
if args.database:
config.db.url = args.database
if args.local_ipc_port:
config.ui = LocalIPCConfig(port=args.local_ipc_port, host=args.local_ipc_host)
if args.llm_endpoint:
for provider, endpoint in args.llm_endpoint:
if provider not in config.llm:
config.llm[provider] = ProviderConfig()
config.llm[provider].base_url = endpoint
if args.llm_key:
for provider, key in args.llm_key:
if provider not in config.llm:
config.llm[provider] = ProviderConfig()
config.llm[provider].api_key = key
try:
Config.model_validate(config)
except ValueError as err:
print(f"Configuration error: {err}", file=sys.stderr)
return None
return config
async def list_projects_json(db: SessionManager):
"""
List all projects in the database in JSON format.
"""
sm = StateManager(db)
projects = await sm.list_projects()
data = []
for project in projects:
p = {
"name": project.name,
"id": project.id.hex,
"branches": [],
}
for branch in project.branches:
b = {
"name": branch.name,
"id": branch.id.hex,
"steps": [],
}
for state in branch.states:
s = {
"name": f"Step #{state.step_index}",
"step": state.step_index,
}
b["steps"].append(s)
if b["steps"]:
b["steps"][-1]["name"] = "Latest step"
p["branches"].append(b)
data.append(p)
print(json.dumps(data, indent=2))
async def list_projects(db: SessionManager):
"""
List all projects in the database.
"""
sm = StateManager(db)
projects = await sm.list_projects()
print(f"Available projects ({len(projects)}):")
for project in projects:
print(f"* {project.name} ({project.id})")
for branch in project.branches:
last_step = max(state.step_index for state in branch.states)
print(f" - {branch.name} ({branch.id}) - last step: {last_step}")
async def load_project(
sm: StateManager,
project_id: Optional[UUID] = None,
branch_id: Optional[UUID] = None,
step_index: Optional[int] = None,
) -> bool:
"""
Load a project from the database.
:param sm: State manager.
:param project_id: Project ID (optional, loads the last step in the main branch).
:param branch_id: Branch ID (optional, loads the last step in the branch).
:param step_index: Step index (optional, loads the state at the given step).
:return: True if the project was loaded successfully, False otherwise.
"""
step_txt = f" step {step_index}" if step_index else ""
if branch_id:
project_state = await sm.load_project(branch_id=branch_id, step_index=step_index)
if project_state:
return True
else:
print(f"Branch {branch_id}{step_txt} not found; use --list to list all projects", file=sys.stderr)
return False
elif project_id:
project_state = await sm.load_project(project_id=project_id, step_index=step_index)
if project_state:
return True
else:
print(f"Project {project_id}{step_txt} not found; use --list to list all projects", file=sys.stderr)
return False
return False
async def delete_project(sm: StateManager, project_id: UUID) -> bool:
"""
Delete a project from a database.
:param sm: State manager.
:param project_id: Project ID.
:return: True if project was deleted, False otherwise.
"""
return await sm.delete_project(project_id)
def show_config():
"""
Print the current configuration to stdout.
"""
cfg = get_config()
print(cfg.model_dump_json(indent=2))
def init() -> tuple[Optional[UIBase], Optional[SessionManager], Namespace]:
"""
Initialize the application.
Loads configuration, sets up logging and UI, initializes the database
and runs database migrations.
:return: Tuple with UI, db session manager, and command-line arguments.
"""
args = parse_arguments()
config = load_config(args)
if not config:
return (None, None, args)
setup(config.log, force=True)
if config.ui.type == UIAdapter.IPC_CLIENT:
ui = IPCClientUI(config.ui)
else:
ui = PlainConsoleUI()
run_migrations(config.db)
db = SessionManager(config.db)
return (ui, db, args)
__all__ = ["parse_arguments", "load_config", "list_projects_json", "list_projects", "load_project", "init"]

145
core/cli/main.py Normal file
View File

@@ -0,0 +1,145 @@
import sys
from argparse import Namespace
from asyncio import run
from core.agents.orchestrator import Orchestrator
from core.cli.helpers import delete_project, init, list_projects, list_projects_json, load_project, show_config
from core.db.session import SessionManager
from core.db.v0importer import LegacyDatabaseImporter
from core.llm.base import APIError
from core.log import get_logger
from core.state.state_manager import StateManager
from core.telemetry import telemetry
from core.ui.base import UIBase
log = get_logger(__name__)
async def run_project(sm: StateManager, ui: UIBase) -> bool:
"""
Work on the project.
Starts the orchestrator agent with the newly loaded/created project
and runs it until the orchestrator decides to exit.
:param sm: State manager.
:param ui: User interface.
:return: True if the orchestrator exited successfully, False otherwise.
"""
telemetry.start()
telemetry.set("app_id", str(sm.project.id))
telemetry.set("initial_prompt", sm.current_state.specification.description)
orca = Orchestrator(sm, ui)
success = False
try:
success = await orca.run()
except KeyboardInterrupt:
log.info("Interrupted by user")
telemetry.set("end_result", "interrupt")
await sm.rollback()
except APIError as err:
log.warning(f"LLM API error occurred: {err.message}")
await ui.send_message(f"LLM API error occurred: {err.message}")
await ui.send_message("Stopping Pythagora due to previous error.")
telemetry.set("end_result", "failure:api-error")
await sm.rollback()
except Exception as err:
telemetry.record_crash(err)
await sm.rollback()
log.error(f"Uncaught exception: {err}", exc_info=True)
await ui.send_message(f"Unrecoverable error occurred: {err}")
if success:
telemetry.set("end_result", "success:exit")
else:
# We assume unsuccessful exit (but not an exception) is a result
# of an API error that the user didn't retry.
telemetry.set("end_result", "failure:api-error")
await telemetry.send()
return success
async def start_new_project(sm: StateManager, ui: UIBase) -> bool:
"""
Start a new project.
:param sm: State manager.
:param ui: User interface.
:return: True if the project was created successfully, False otherwise.
"""
user_input = await ui.ask_question("What is the name of the project", allow_empty=False)
if user_input.cancelled:
return False
project_state = await sm.create_project(user_input.text)
return project_state is not None
async def async_main(
ui: UIBase,
db: SessionManager,
args: Namespace,
) -> bool:
"""
Main application coroutine.
:param ui: User interface.
:param db: Database session manager.
:param args: Command-line arguments.
:return: True if the application ran successfully, False otherwise.
"""
if args.list:
await list_projects(db)
return True
elif args.list_json:
await list_projects_json(db)
return True
if args.show_config:
show_config()
return True
elif args.import_v0:
importer = LegacyDatabaseImporter(db, args.import_v0)
await importer.import_database()
return True
telemetry.set("user_contact", args.email)
if args.extension_version:
telemetry.set("is_extension", True)
telemetry.set("extension_version", args.extension_version)
sm = StateManager(db, ui)
ui_started = await ui.start()
if not ui_started:
return False
if args.project or args.branch or args.step:
telemetry.set("is_continuation", True)
# FIXME: we should send the project stage and other runtime info to the UI
success = await load_project(sm, args.project, args.branch, args.step)
if not success:
return False
elif args.delete:
success = await delete_project(sm, args.delete)
return success
else:
success = await start_new_project(sm, ui)
if not success:
return False
return await run_project(sm, ui)
def run_pythagora():
ui, db, args = init()
if not ui or not db:
return -1
success = run(async_main(ui, db, args))
return 0 if success else -1
if __name__ == "__main__":
sys.exit(run_pythagora())

375
core/config/__init__.py Normal file
View File

@@ -0,0 +1,375 @@
from enum import Enum
from os.path import abspath, dirname, isdir, join
from typing import Literal, Optional, Union
from pydantic import BaseModel, ConfigDict, Field, field_validator
from typing_extensions import Annotated
ROOT_DIR = abspath(join(dirname(__file__), "..", ".."))
DEFAULT_IGNORE_PATHS = [
".git",
".gpt-pilot",
".idea",
".vscode",
".next",
".DS_Store",
"__pycache__",
"site-packages",
"node_modules",
"package-lock.json",
"venv",
"dist",
"build",
"target",
"*.min.js",
"*.min.css",
"*.svg",
"*.csv",
"*.log",
"go.sum",
]
IGNORE_SIZE_THRESHOLD = 50000  # files larger than 50 KB are ignored by default
# Agents with sane setup in the default configuration
DEFAULT_AGENT_NAME = "default"
DESCRIBE_FILES_AGENT_NAME = "CodeMonkey.describe_files"
class _StrictModel(BaseModel):
"""
Pydantic parser configuration options.
"""
model_config = ConfigDict(
extra="forbid",
)
class LLMProvider(str, Enum):
"""
Supported LLM providers.
"""
OPENAI = "openai"
ANTHROPIC = "anthropic"
GROQ = "groq"
LM_STUDIO = "lm-studio"
class UIAdapter(str, Enum):
"""
Supported UI adapters.
"""
PLAIN = "plain"
IPC_CLIENT = "ipc-client"
class ProviderConfig(_StrictModel):
"""
LLM provider configuration.
"""
base_url: Optional[str] = Field(
None,
description="Base URL for the provider's API (if different from the provider default)",
)
api_key: Optional[str] = Field(
None,
description="API key to use for authentication (if not set, provider uses default from environment variable)",
)
connect_timeout: float = Field(
default=60.0,
description="Timeout (in seconds) for connecting to the provider's API",
ge=0.0,
)
read_timeout: float = Field(
default=10.0,
description="Timeout (in seconds) for receiving a new chunk of data from the response stream",
ge=0.0,
)
class AgentLLMConfig(_StrictModel):
"""
Configuration for the various LLMs used by Pythagora.
Each agent uses an LLM provider from the LLMProvider enum. If no
agent-specific AgentLLMConfig is given, the default configuration is used.
"""
provider: LLMProvider = LLMProvider.OPENAI
model: str = Field(description="Model to use", default="gpt-4-turbo")
temperature: float = Field(
default=0.5,
description="Temperature to use for sampling",
ge=0.0,
le=1.0,
)
class LLMConfig(_StrictModel):
"""
Complete agent-specific configuration for an LLM.
"""
provider: LLMProvider = LLMProvider.OPENAI
model: str = Field(description="Model to use")
base_url: Optional[str] = Field(
None,
description="Base URL for the provider's API (if different from the provider default)",
)
api_key: Optional[str] = Field(
None,
description="API key to use for authentication (if not set, provider uses default from environment variable)",
)
temperature: float = Field(
default=0.5,
description="Temperature to use for sampling",
ge=0.0,
le=1.0,
)
connect_timeout: float = Field(
default=60.0,
description="Timeout (in seconds) for connecting to the provider's API",
ge=0.0,
)
read_timeout: float = Field(
default=10.0,
description="Timeout (in seconds) for receiving a new chunk of data from the response stream",
ge=0.0,
)
@classmethod
def from_provider_and_agent_configs(cls, provider: ProviderConfig, agent: AgentLLMConfig):
return cls(
provider=agent.provider,
model=agent.model,
base_url=provider.base_url,
api_key=provider.api_key,
temperature=agent.temperature,
connect_timeout=provider.connect_timeout,
read_timeout=provider.read_timeout,
)
class PromptConfig(_StrictModel):
"""
Configuration for prompt templates.
"""
paths: list[str] = Field(
[join(ROOT_DIR, "core", "prompts")],
description="List of directories to search for prompt templates",
)
@field_validator("paths")
@classmethod
def validate_paths(cls, v: list[str]) -> list[str]:
for path in v:
if not isdir(path):
raise ValueError(f"Invalid prompt path: {path}")
return v
class LogConfig(_StrictModel):
"""
Configuration for logging.
"""
level: str = Field(
"DEBUG",
description="Logging level",
pattern=r"^(DEBUG|INFO|WARNING|ERROR|CRITICAL)$",
)
format: str = Field(
"%(asctime)s %(levelname)s [%(name)s] %(message)s",
description="Logging format",
)
output: Optional[str] = Field(
"pythagora.log",
description="Output file for logs (if not specified, logs are printed to stderr)",
)
class DBConfig(_StrictModel):
"""
Configuration for database connections.
Supported URL schemes:
* sqlite+aiosqlite: SQLite database using the aiosqlite driver
"""
url: str = Field(
"sqlite+aiosqlite:///pythagora.db",
description="Database connection URL",
)
debug_sql: bool = Field(False, description="Log all SQL queries to the console")
@field_validator("url")
@classmethod
def validate_url_scheme(cls, v: str) -> str:
if v.startswith("sqlite+aiosqlite://"):
return v
raise ValueError(f"Unsupported database URL scheme in: {v}")
class PlainUIConfig(_StrictModel):
"""
Configuration for plaintext console UI.
"""
type: Literal[UIAdapter.PLAIN] = UIAdapter.PLAIN
class LocalIPCConfig(_StrictModel):
"""
Configuration for VSCode extension IPC client.
"""
type: Literal[UIAdapter.IPC_CLIENT] = UIAdapter.IPC_CLIENT
host: str = "localhost"
port: int = 8125
UIConfig = Annotated[
Union[PlainUIConfig, LocalIPCConfig],
Field(discriminator="type"),
]
class FileSystemType(str, Enum):
"""
Supported filesystem types.
"""
MEMORY = "memory"
LOCAL = "local"
class FileSystemConfig(_StrictModel):
"""
Configuration for project workspace.
"""
type: Literal[FileSystemType.LOCAL] = FileSystemType.LOCAL
workspace_root: str = Field(
join(ROOT_DIR, "workspace"),
description="Workspace directory containing all the projects",
)
ignore_paths: list[str] = Field(
DEFAULT_IGNORE_PATHS,
description="List of paths to ignore when scanning for files and folders",
)
ignore_size_threshold: int = Field(
IGNORE_SIZE_THRESHOLD,
description="Files larger than this size should be ignored",
)
class Config(_StrictModel):
"""
Pythagora Core configuration
"""
llm: dict[LLMProvider, ProviderConfig] = Field(default={LLMProvider.OPENAI: ProviderConfig()})
agent: dict[str, AgentLLMConfig] = Field(
default={
DEFAULT_AGENT_NAME: AgentLLMConfig(),
DESCRIBE_FILES_AGENT_NAME: AgentLLMConfig(model="gpt-3.5-turbo", temperature=0.0),
}
)
prompt: PromptConfig = PromptConfig()
log: LogConfig = LogConfig()
db: DBConfig = DBConfig()
ui: UIConfig = PlainUIConfig()
fs: FileSystemConfig = FileSystemConfig()
def llm_for_agent(self, agent_name: str = "default") -> LLMConfig:
"""
Fetch an LLM configuration for a given agent.
If the agent-specific configuration doesn't exist, returns the configuration
for the 'default' agent.
"""
agent_name = agent_name if agent_name in self.agent else "default"
agent_config = self.agent[agent_name]
provider_config = self.llm[agent_config.provider]
return LLMConfig.from_provider_and_agent_configs(provider_config, agent_config)
class ConfigLoader:
"""
Configuration loader takes care of loading and parsing configuration files.
The default loader is already initialized as `core.config.loader`. To
load the configuration from a file, use `core.config.loader.load(path)`.
To get the current configuration, use `core.config.get_config()`.
"""
config: Config
config_path: Optional[str]
def __init__(self):
self.config_path = None
self.config = Config()
@staticmethod
def _remove_json_comments(json_str: str) -> str:
"""
Remove comments from a JSON string.
Removes all lines that start with "//" from the JSON string.
:param json_str: JSON string with comments.
:return: JSON string without comments.
"""
return "\n".join([line for line in json_str.splitlines() if not line.strip().startswith("//")])
@classmethod
def from_json(cls, config: str) -> Config:
"""
Parse JSON into a Config object.
:param config: JSON string to parse.
:return: Config object.
"""
return Config.model_validate_json(cls._remove_json_comments(config), strict=True)
def load(self, path: str) -> Config:
"""
Load a configuration from a file.
:param path: Path to the configuration file.
:return: Config object.
"""
with open(path, "rb") as f:
raw_config = f.read()
if b"\x00" in raw_config:
encoding = "utf-16"
else:
encoding = "utf-8"
text_config = raw_config.decode(encoding)
self.config = self.from_json(text_config)
self.config_path = path
return self.config
loader = ConfigLoader()
def get_config() -> Config:
"""
Return current configuration.
:return: Current configuration object.
"""
return loader.config
__all__ = ["loader", "get_config"]

View File

@@ -0,0 +1,90 @@
from os.path import dirname, exists, join
from dotenv import dotenv_values
from core.config import Config, LLMProvider, ProviderConfig, loader
def import_from_dotenv(new_config_path: str) -> bool:
"""
Import configuration from an old gpt-pilot .env file and save it in the new format.
If the configuration is already loaded, or if the target file already exists,
nothing is done (the existing file is not overwritten).
Otherwise, loads the values from the `pilot/.env` file and creates a new
configuration with the relevant settings.
This intentionally DOES NOT load the .env variables into the current process
environment, to avoid polluting it with old settings.
:param new_config_path: Path to save the new configuration file.
:return: True if the configuration was imported, False otherwise.
"""
if loader.config_path or exists(new_config_path):
# Config already exists, nothing to do
return True
env_path = join(dirname(__file__), "..", "..", "pilot", ".env")
if not exists(env_path):
return False
values = dotenv_values(env_path)
if not values:
return False
config = convert_config(values)
with open(new_config_path, "w", encoding="utf-8") as fp:
fp.write(config.model_dump_json(indent=2))
return True
def convert_config(values: dict) -> Config:
config = Config()
for provider in LLMProvider:
endpoint = values.get(f"{provider.value.upper()}_ENDPOINT")
key = values.get(f"{provider.value.upper()}_API_KEY")
if provider == LLMProvider.OPENAI:
# The OpenAI provider entry is also used for Azure, OpenRouter, and local LLMs
if endpoint is None:
endpoint = values.get("AZURE_ENDPOINT")
if endpoint is None:
endpoint = values.get("OPENROUTER_ENDPOINT")
if key is None:
key = values.get("AZURE_API_KEY")
if key is None:
key = values.get("OPENROUTER_API_KEY")
if key and endpoint is None:
endpoint = "https://openrouter.ai/api/v1/chat/completions"
if (endpoint or key) and provider not in config.llm:
config.llm[provider] = ProviderConfig()
if endpoint:
endpoint = endpoint.replace("chat/completions", "")
config.llm[provider].base_url = endpoint
if key:
config.llm[provider].api_key = key
provider = "openai"
model = values.get("MODEL_NAME", "gpt-4-turbo")
if "/" in model:
provider, model = model.split("/", 1)
try:
agent_provider = LLMProvider(provider.lower())
except ValueError:
agent_provider = LLMProvider.OPENAI
config.agent["default"].model = model
config.agent["default"].provider = agent_provider
ignore_paths = [p for p in values.get("IGNORE_PATHS", "").split(",") if p]
if ignore_paths:
config.fs.ignore_paths += ignore_paths
return config
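As an illustration (hypothetical values), a legacy pilot/.env like:
OPENAI_API_KEY=sk-OLD-KEY
MODEL_NAME=anthropic/claude-3-opus
IGNORE_PATHS=tmp,cache
would set api_key on the openai provider entry, switch the default agent to provider "anthropic" with model "claude-3-opus", and append "tmp" and "cache" to fs.ignore_paths.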

View File

@@ -0,0 +1,94 @@
import sys
from os import getenv, makedirs
from pathlib import Path
from uuid import uuid4
from pydantic import BaseModel, Field, PrivateAttr
from core.log import get_logger
log = get_logger(__name__)
SETTINGS_APP_NAME = "GPT Pilot"
DEFAULT_TELEMETRY_ENDPOINT = "https://api.pythagora.io/telemetry"
class TelemetrySettings(BaseModel):
id: str = Field(default_factory=lambda: uuid4().hex, description="Unique telemetry ID")
enabled: bool = Field(True, description="Whether telemetry should send stats to the server")
endpoint: str = Field(DEFAULT_TELEMETRY_ENDPOINT, description="Telemetry server endpoint")
def resolve_config_dir() -> Path:
"""
Figure out where to store the global config file(s).
See the UserSettings docstring for details on how the config directory is
determined.
:return: Path to the desired config directory.
"""
posix_app_name = SETTINGS_APP_NAME.replace(" ", "-").lower()
xdg_config_home = getenv("XDG_CONFIG_HOME")
if xdg_config_home:
return Path(xdg_config_home) / Path(posix_app_name)
if sys.platform == "win32" and getenv("APPDATA"):
return Path(getenv("APPDATA")) / Path(SETTINGS_APP_NAME)
return Path("~").expanduser() / Path(f".{posix_app_name}")
class UserSettings(BaseModel):
"""
This object holds all the global user settings that are applicable to
all Pythagora/GPT-Pilot installations.
The user settings are stored in a JSON file in the config directory.
The config directory is determined by the following rules:
* If the XDG_CONFIG_HOME environment variable is set (desktop Linux), use that.
* If the APPDATA environment variable is set (Windows), use that.
* Otherwise, use the POSIX default ~/.<app-name> (macOS, server Linux).
This is a singleton object, use it by importing the instance directly
from the module:
>>> from config.user_settings import settings
>>> print(settings.telemetry.id)
>>> print(settings.config_path)
"""
telemetry: TelemetrySettings = TelemetrySettings()
_config_path: str = PrivateAttr("")
@staticmethod
def load():
config_path = resolve_config_dir() / "config.json"
if not config_path.exists():
default = UserSettings()
default._config_path = str(config_path)
default.save()
with open(config_path, "r", encoding="utf-8") as fp:
settings = UserSettings.model_validate_json(fp.read())
settings._config_path = str(config_path)
return settings
def save(self):
makedirs(Path(self._config_path).parent, exist_ok=True)
with open(self._config_path, "w", encoding="utf-8") as fp:
fp.write(self.model_dump_json(indent=2))
@property
def config_path(self):
return self._config_path
settings = UserSettings.load()
__all__ = ["settings"]

86
core/config/version.py Normal file
View File

@@ -0,0 +1,86 @@
import re
from os.path import abspath, basename, dirname, isdir, isfile, join
from typing import Optional
GIT_DIR_PATH = abspath(join(dirname(__file__), "..", "..", ".git"))
def get_git_commit() -> Optional[str]:
"""
Return the current git commit (if running from a repo).
:return: commit hash or None if not running from a git repo
"""
if not isdir(GIT_DIR_PATH):
return None
git_head = join(GIT_DIR_PATH, "HEAD")
if not isfile(git_head):
return None
with open(git_head, "r", encoding="utf-8") as f:
ref = f.read().strip()
# Direct reference to commit hash
if not ref.startswith("ref: "):
return ref
# Follow the reference
ref = ref[5:]
ref_path = join(GIT_DIR_PATH, ref)
# Dangling reference, return the reference name
if not isfile(ref_path):
return basename(ref_path)
# Return the reference commit hash
with open(ref_path, "r", encoding="utf-8") as f:
return f.read().strip()
def get_package_version() -> str:
"""
Get the package version as defined in pyproject.toml.
If not found, returns "0.0.0".
:return: package version as defined in pyproject.toml
"""
UNKNOWN = "0.0.0"
PYPOETRY_VERSION_PATTERN = re.compile(r'^\s*version\s*=\s*"(.*)"\s*(#.*)?$')
pyproject_path = join(dirname(__file__), "..", "..", "pyproject.toml")
if not isfile(pyproject_path):
return UNKNOWN
with open(pyproject_path, "r", encoding="utf-8") as fp:
for line in fp:
m = PYPOETRY_VERSION_PATTERN.match(line)
if m:
return m.group(1)
return UNKNOWN
def get_version() -> str:
"""
Find and return the current version of Pythagora Core.
The version string is built from the package version and the current
git commit hash (if running from a git repo).
Example: 0.0.0-gitbf01c19
:return: version string
"""
version = get_package_version()
commit = get_git_commit()
if commit:
version = version + "-git" + commit[:7]
return version
__all__ = ["get_version"]

0
core/db/__init__.py Normal file
View File

116
core/db/alembic.ini Normal file
View File

@@ -0,0 +1,116 @@
# A generic, single database configuration.
[alembic]
# path to migration scripts
script_location = core/db/migrations
# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s
# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .
# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python>=3.9 or backports.zoneinfo library.
# Any required deps can installed by adding `alembic[tz]` to the pip requirements
# string value is passed to ZoneInfo()
# leave blank for localtime
# timezone =
# max length of characters to apply to the
# "slug" field
# truncate_slug_length = 40
# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false
# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false
# version location specification; This defaults
# to migrations/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator" below.
version_locations = core/db/migrations/versions
# version path separator; As mentioned above, this is the character used to split
# version_locations. The default within new alembic.ini files is "os", which uses os.pathsep.
# If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas.
# Valid values for version_path_separator are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
# Use os.pathsep. Default configuration used for new projects.
version_path_separator = os
# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false
# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8
sqlalchemy.url = sqlite:///pythagora.db
[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples
# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME
# lint with attempts to fix using "ruff" - use the exec runner, execute a binary
hooks = ruff
ruff.type = exec
ruff.executable = ruff
ruff.options = check --fix REVISION_SCRIPT_FILENAME
# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARN
handlers = console
qualname =
[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

1
core/db/migrations/README Normal file
View File

@@ -0,0 +1 @@
Generic single-database configuration.

83
core/db/migrations/env.py Normal file
View File

@@ -0,0 +1,83 @@
from logging.config import fileConfig
from alembic import context
from sqlalchemy import engine_from_config, pool
from core.db.models import Base
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config
# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None and not config.get_main_option("pythagora_runtime"):
fileConfig(config.config_file_name)
# Set database URL from environment
# config.set_main_option("sqlalchemy.url", getenv("DATABASE_URL"))
# add your model's MetaData object here
# for 'autogenerate' support
target_metadata = Base.metadata
# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
def run_migrations_offline() -> None:
"""Run migrations in 'offline' mode.
This configures the context with just a URL
and not an Engine, though an Engine is acceptable
here as well. By skipping the Engine creation
we don't even need a DBAPI to be available.
Calls to context.execute() here emit the given string to the
script output.
"""
url = config.get_main_option("sqlalchemy.url")
context.configure(
url=url,
target_metadata=target_metadata,
literal_binds=True,
dialect_opts={"paramstyle": "named"},
render_as_batch="sqlite://" in url,
)
with context.begin_transaction():
context.run_migrations()
def run_migrations_online() -> None:
"""Run migrations in 'online' mode.
In this scenario we need to create an Engine
and associate a connection with the context.
"""
url = config.get_main_option("sqlalchemy.url")
connectable = engine_from_config(
config.get_section(config.config_ini_section, {}),
prefix="sqlalchemy.",
poolclass=pool.NullPool,
)
with connectable.connect() as connection:
context.configure(
connection=connection,
target_metadata=target_metadata,
render_as_batch="sqlite://" in url,
)
with context.begin_transaction():
context.run_migrations()
if context.is_offline_mode():
run_migrations_offline()
else:
run_migrations_online()

26
core/db/migrations/script.py.mako Normal file
View File

@@ -0,0 +1,26 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
# revision identifiers, used by Alembic.
revision: str = ${repr(up_revision)}
down_revision: Union[str, None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
def upgrade() -> None:
${upgrades if upgrades else "pass"}
def downgrade() -> None:
${downgrades if downgrades else "pass"}

View File

@@ -0,0 +1,34 @@
"""added complexity to specification
Revision ID: 4f79e6952354
Revises: 5b04ea6afce5
Create Date: 2024-05-16 18:01:49.024811
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = "4f79e6952354"
down_revision: Union[str, None] = "5b04ea6afce5"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("specifications", schema=None) as batch_op:
batch_op.add_column(sa.Column("complexity", sa.String(), server_default="hard", nullable=False))
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("specifications", schema=None) as batch_op:
batch_op.drop_column("complexity")
# ### end Alembic commands ###

View File

@@ -0,0 +1,34 @@
"""add agent info to llm request log
Revision ID: 5b04ea6afce5
Revises: fd206d3095d0
Create Date: 2024-05-12 11:07:40.271217
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = "5b04ea6afce5"
down_revision: Union[str, None] = "fd206d3095d0"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("llm_requests", schema=None) as batch_op:
batch_op.add_column(sa.Column("agent", sa.String(), nullable=True))
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("llm_requests", schema=None) as batch_op:
batch_op.drop_column("agent")
# ### end Alembic commands ###

View File

@@ -0,0 +1,120 @@
"""initial
Revision ID: e7b54beadf8f
Revises:
Create Date: 2024-05-06 09:38:05.391674
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = "e7b54beadf8f"
down_revision: Union[str, None] = None
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.create_table(
"file_contents",
sa.Column("id", sa.String(), nullable=False),
sa.Column("content", sa.String(), nullable=False),
sa.PrimaryKeyConstraint("id", name=op.f("pk_file_contents")),
)
op.create_table(
"projects",
sa.Column("id", sa.Uuid(), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.Column("created_at", sa.DateTime(), server_default=sa.text("(CURRENT_TIMESTAMP)"), nullable=False),
sa.Column("folder_name", sa.String(), nullable=False),
sa.PrimaryKeyConstraint("id", name=op.f("pk_projects")),
)
op.create_table(
"specifications",
sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
sa.Column("description", sa.String(), nullable=False),
sa.Column("architecture", sa.String(), nullable=False),
sa.Column("system_dependencies", sa.JSON(), nullable=False),
sa.Column("package_dependencies", sa.JSON(), nullable=False),
sa.Column("template", sa.String(), nullable=True),
sa.PrimaryKeyConstraint("id", name=op.f("pk_specifications")),
)
op.create_table(
"branches",
sa.Column("id", sa.Uuid(), nullable=False),
sa.Column("project_id", sa.Uuid(), nullable=False),
sa.Column("created_at", sa.DateTime(), server_default=sa.text("(CURRENT_TIMESTAMP)"), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.ForeignKeyConstraint(
["project_id"], ["projects.id"], name=op.f("fk_branches_project_id_projects"), ondelete="CASCADE"
),
sa.PrimaryKeyConstraint("id", name=op.f("pk_branches")),
)
op.create_table(
"project_states",
sa.Column("id", sa.Uuid(), nullable=False),
sa.Column("branch_id", sa.Uuid(), nullable=False),
sa.Column("prev_state_id", sa.Uuid(), nullable=True),
sa.Column("specification_id", sa.Integer(), nullable=False),
sa.Column("created_at", sa.DateTime(), server_default=sa.text("(CURRENT_TIMESTAMP)"), nullable=False),
sa.Column("step_index", sa.Integer(), server_default="1", nullable=False),
sa.Column("epics", sa.JSON(), nullable=False),
sa.Column("tasks", sa.JSON(), nullable=False),
sa.Column("steps", sa.JSON(), nullable=False),
sa.Column("iterations", sa.JSON(), nullable=False),
sa.Column("relevant_files", sa.JSON(), nullable=False),
sa.Column("modified_files", sa.JSON(), nullable=False),
sa.Column("run_command", sa.String(), nullable=True),
sa.ForeignKeyConstraint(
["branch_id"], ["branches.id"], name=op.f("fk_project_states_branch_id_branches"), ondelete="CASCADE"
),
sa.ForeignKeyConstraint(
["prev_state_id"],
["project_states.id"],
name=op.f("fk_project_states_prev_state_id_project_states"),
ondelete="CASCADE",
),
sa.ForeignKeyConstraint(
["specification_id"], ["specifications.id"], name=op.f("fk_project_states_specification_id_specifications")
),
sa.PrimaryKeyConstraint("id", name=op.f("pk_project_states")),
sa.UniqueConstraint("branch_id", "step_index", name=op.f("uq_project_states_branch_id")),
sa.UniqueConstraint("prev_state_id", name=op.f("uq_project_states_prev_state_id")),
sqlite_autoincrement=True,
)
op.create_table(
"files",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("project_state_id", sa.Uuid(), nullable=False),
sa.Column("content_id", sa.String(), nullable=False),
sa.Column("path", sa.String(), nullable=False),
sa.Column("meta", sa.JSON(), server_default="{}", nullable=False),
sa.ForeignKeyConstraint(
["content_id"], ["file_contents.id"], name=op.f("fk_files_content_id_file_contents"), ondelete="RESTRICT"
),
sa.ForeignKeyConstraint(
["project_state_id"],
["project_states.id"],
name=op.f("fk_files_project_state_id_project_states"),
ondelete="CASCADE",
),
sa.PrimaryKeyConstraint("id", name=op.f("pk_files")),
sa.UniqueConstraint("project_state_id", "path", name=op.f("uq_files_project_state_id")),
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_table("files")
op.drop_table("project_states")
op.drop_table("branches")
op.drop_table("specifications")
op.drop_table("projects")
op.drop_table("file_contents")
# ### end Alembic commands ###

View File

@@ -0,0 +1,106 @@
"""store request input exec logs to db
Revision ID: fd206d3095d0
Revises: e7b54beadf8f
Create Date: 2024-05-09 08:25:10.698607
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = "fd206d3095d0"
down_revision: Union[str, None] = "e7b54beadf8f"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.create_table(
"exec_logs",
sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
sa.Column("branch_id", sa.Uuid(), nullable=False),
sa.Column("project_state_id", sa.Uuid(), nullable=True),
sa.Column("started_at", sa.DateTime(), nullable=False),
sa.Column("duration", sa.Float(), nullable=False),
sa.Column("cmd", sa.String(), nullable=False),
sa.Column("cwd", sa.String(), nullable=False),
sa.Column("env", sa.JSON(), nullable=False),
sa.Column("timeout", sa.Float(), nullable=True),
sa.Column("status_code", sa.Integer(), nullable=True),
sa.Column("stdout", sa.String(), nullable=False),
sa.Column("stderr", sa.String(), nullable=False),
sa.Column("analysis", sa.String(), nullable=False),
sa.Column("success", sa.Boolean(), nullable=False),
sa.ForeignKeyConstraint(
["branch_id"], ["branches.id"], name=op.f("fk_exec_logs_branch_id_branches"), ondelete="CASCADE"
),
sa.ForeignKeyConstraint(
["project_state_id"],
["project_states.id"],
name=op.f("fk_exec_logs_project_state_id_project_states"),
ondelete="SET NULL",
),
sa.PrimaryKeyConstraint("id", name=op.f("pk_exec_logs")),
)
op.create_table(
"llm_requests",
sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
sa.Column("branch_id", sa.Uuid(), nullable=False),
sa.Column("project_state_id", sa.Uuid(), nullable=True),
sa.Column("started_at", sa.DateTime(), server_default=sa.text("(CURRENT_TIMESTAMP)"), nullable=False),
sa.Column("provider", sa.String(), nullable=False),
sa.Column("model", sa.String(), nullable=False),
sa.Column("temperature", sa.Float(), nullable=False),
sa.Column("messages", sa.JSON(), nullable=False),
sa.Column("response", sa.String(), nullable=True),
sa.Column("prompt_tokens", sa.Integer(), nullable=False),
sa.Column("completion_tokens", sa.Integer(), nullable=False),
sa.Column("duration", sa.Float(), nullable=False),
sa.Column("status", sa.String(), nullable=False),
sa.Column("error", sa.String(), nullable=True),
sa.ForeignKeyConstraint(
["branch_id"], ["branches.id"], name=op.f("fk_llm_requests_branch_id_branches"), ondelete="CASCADE"
),
sa.ForeignKeyConstraint(
["project_state_id"],
["project_states.id"],
name=op.f("fk_llm_requests_project_state_id_project_states"),
ondelete="SET NULL",
),
sa.PrimaryKeyConstraint("id", name=op.f("pk_llm_requests")),
)
op.create_table(
"user_inputs",
sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
sa.Column("branch_id", sa.Uuid(), nullable=False),
sa.Column("project_state_id", sa.Uuid(), nullable=True),
sa.Column("created_at", sa.DateTime(), server_default=sa.text("(CURRENT_TIMESTAMP)"), nullable=False),
sa.Column("question", sa.String(), nullable=False),
sa.Column("answer_text", sa.String(), nullable=True),
sa.Column("answer_button", sa.String(), nullable=True),
sa.Column("cancelled", sa.Boolean(), nullable=False),
sa.ForeignKeyConstraint(
["branch_id"], ["branches.id"], name=op.f("fk_user_inputs_branch_id_branches"), ondelete="CASCADE"
),
sa.ForeignKeyConstraint(
["project_state_id"],
["project_states.id"],
name=op.f("fk_user_inputs_project_state_id_project_states"),
ondelete="SET NULL",
),
sa.PrimaryKeyConstraint("id", name=op.f("pk_user_inputs")),
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_table("user_inputs")
op.drop_table("llm_requests")
op.drop_table("exec_logs")
# ### end Alembic commands ###

29
core/db/models/__init__.py Normal file
View File

@@ -0,0 +1,29 @@
# Pythagora database models
#
# Always import models from this module to ensure the SQLAlchemy registry
# is correctly populated.
from .base import Base
from .branch import Branch
from .exec_log import ExecLog
from .file import File
from .file_content import FileContent
from .llm_request import LLMRequest
from .project import Project
from .project_state import ProjectState
from .specification import Complexity, Specification
from .user_input import UserInput
__all__ = [
"Base",
"Branch",
"Complexity",
"ExecLog",
"File",
"FileContent",
"LLMRequest",
"Project",
"ProjectState",
"Specification",
"UserInput",
]

45
core/db/models/base.py Normal file
View File

@@ -0,0 +1,45 @@
# DeclarativeBase enables declarative configuration of
# database models within SQLAlchemy.
#
# It also sets up a registry for the classes that inherit from it,
# so that SQLAlchemy understands how they map to database tables.
from sqlalchemy import MetaData
from sqlalchemy.ext.asyncio import AsyncAttrs
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.types import JSON
class Base(AsyncAttrs, DeclarativeBase):
"""Base class for all SQL database models."""
# Mapping of Python types to SQLAlchemy types.
type_annotation_map = {
list[dict]: JSON,
list[str]: JSON,
dict: JSON,
}
metadata = MetaData(
# Naming conventions for constraints, foreign keys, etc.
naming_convention={
"ix": "ix_%(column_0_label)s",
"uq": "uq_%(table_name)s_%(column_0_name)s",
"ck": "ck_%(table_name)s_`%(constraint_name)s`",
"fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
"pk": "pk_%(table_name)s",
}
)
def __eq__(self, other) -> bool:
"""
Two instances of the same model class are the same if their
IDs are the same.
This allows comparison of models bound to different sessions.
"""
return isinstance(other, self.__class__) and self.id == other.id
def __repr__(self) -> str:
"""Return a string representation of the model."""
return f"<{self.__class__.__name__}(id={self.id})>"

89
core/db/models/branch.py Normal file
View File

@@ -0,0 +1,89 @@
from datetime import datetime
from typing import TYPE_CHECKING, Optional, Union
from uuid import UUID, uuid4
from sqlalchemy import ForeignKey, inspect, select
from sqlalchemy.orm import Mapped, mapped_column, relationship
from sqlalchemy.sql import func
from core.db.models import Base
if TYPE_CHECKING:
from sqlalchemy.ext.asyncio import AsyncSession
from core.db.models import ExecLog, LLMRequest, Project, ProjectState, UserInput
class Branch(Base):
__tablename__ = "branches"
DEFAULT = "main"
# ID and parent FKs
id: Mapped[UUID] = mapped_column(primary_key=True, default=uuid4)
project_id: Mapped[UUID] = mapped_column(ForeignKey("projects.id", ondelete="CASCADE"))
# Attributes
created_at: Mapped[datetime] = mapped_column(server_default=func.now())
name: Mapped[str] = mapped_column(default=DEFAULT)
# Relationships
project: Mapped["Project"] = relationship(back_populates="branches", lazy="selectin")
states: Mapped[list["ProjectState"]] = relationship(back_populates="branch", cascade="all")
llm_requests: Mapped[list["LLMRequest"]] = relationship(back_populates="branch", cascade="all")
user_inputs: Mapped[list["UserInput"]] = relationship(back_populates="branch", cascade="all")
exec_logs: Mapped[list["ExecLog"]] = relationship(back_populates="branch", cascade="all")
@staticmethod
async def get_by_id(session: "AsyncSession", branch_id: Union[str, UUID]) -> Optional["Branch"]:
"""
Get a branch by ID.
:param session: The SQLAlchemy session.
:param branch_id: The branch ID (as str or UUID value).
:return: The Branch object if found, None otherwise.
"""
if not isinstance(branch_id, UUID):
branch_id = UUID(branch_id)
result = await session.execute(select(Branch).where(Branch.id == branch_id))
return result.scalar_one_or_none()
async def get_last_state(self) -> Optional["ProjectState"]:
"""
Get the last project state of the branch.
:return: The last project state of the branch, or None if the branch has no states.
"""
from core.db.models import ProjectState
session = inspect(self).async_session
if session is None:
raise ValueError("Branch instance not associated with a DB session.")
result = await session.execute(
select(ProjectState)
.where(ProjectState.branch_id == self.id)
.order_by(ProjectState.step_index.desc())
.limit(1)
)
return result.scalar_one_or_none()
async def get_state_at_step(self, step_index: int) -> Optional["ProjectState"]:
"""
Get the project state at the given step index for the branch.
:return: The project state at the given step index, or None if there's no such step.
"""
from core.db.models import ProjectState
session = inspect(self).async_session
if session is None:
raise ValueError("Branch instance not associated with a DB session.")
result = await session.execute(
select(ProjectState).where((ProjectState.branch_id == self.id) & (ProjectState.step_index == step_index))
)
return result.scalar_one_or_none()
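A hedged usage sketch (assumes an open AsyncSession bound to these models; the ID is hypothetical):
branch = await Branch.get_by_id(session, "1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed")
if branch is not None:
    latest = await branch.get_last_state()
    fifth = await branch.get_state_at_step(5)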

71
core/db/models/exec_log.py Normal file
View File

@@ -0,0 +1,71 @@
from datetime import datetime
from typing import TYPE_CHECKING, Optional
from uuid import UUID
from sqlalchemy import ForeignKey, inspect
from sqlalchemy.orm import Mapped, mapped_column, relationship
from core.db.models import Base
from core.proc.exec_log import ExecLog as ExecLogData
if TYPE_CHECKING:
from core.db.models import Branch, ProjectState
class ExecLog(Base):
__tablename__ = "exec_logs"
# ID and parent FKs
id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
branch_id: Mapped[UUID] = mapped_column(ForeignKey("branches.id", ondelete="CASCADE"))
project_state_id: Mapped[Optional[UUID]] = mapped_column(ForeignKey("project_states.id", ondelete="SET NULL"))
# Attributes
started_at: Mapped[datetime] = mapped_column()
duration: Mapped[float] = mapped_column()
cmd: Mapped[str] = mapped_column()
cwd: Mapped[str] = mapped_column()
env: Mapped[dict] = mapped_column()
timeout: Mapped[Optional[float]] = mapped_column()
status_code: Mapped[Optional[int]] = mapped_column()
stdout: Mapped[str] = mapped_column()
stderr: Mapped[str] = mapped_column()
analysis: Mapped[str] = mapped_column()
success: Mapped[bool] = mapped_column()
# Relationships
branch: Mapped["Branch"] = relationship(back_populates="exec_logs")
project_state: Mapped["ProjectState"] = relationship(back_populates="exec_logs")
@classmethod
def from_exec_log(cls, project_state: "ProjectState", exec_log: ExecLogData) -> "ExecLog":
"""
Store the execution log in the database.
Note this just creates the ExecLog object. It is committed to the
database only when the DB session itself is committed.
:param project_state: Project state to associate the execution log with.
:param exec_log: Execution log data.
:return: Newly created ExecLog object.
"""
session = inspect(project_state).async_session
obj = cls(
project_state=project_state,
branch=project_state.branch,
started_at=exec_log.started_at,
duration=exec_log.duration,
cmd=exec_log.cmd,
cwd=exec_log.cwd,
env=exec_log.env,
timeout=exec_log.timeout,
status_code=exec_log.status_code,
stdout=exec_log.stdout,
stderr=exec_log.stderr,
analysis=exec_log.analysis,
success=exec_log.success,
)
session.add(obj)
return obj
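A hedged sketch of recording a command run; the ExecLogData field names follow the mapping above, but its exact constructor signature is an assumption:
from datetime import datetime
from core.proc.exec_log import ExecLog as ExecLogData

data = ExecLogData(
    started_at=datetime.utcnow(), duration=1.2,
    cmd="npm test", cwd="/workspace/app", env={}, timeout=None,
    status_code=0, stdout="ok", stderr="", analysis="", success=True,
)
ExecLog.from_exec_log(current_state, data)  # persisted when the session commits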

43
core/db/models/file.py Normal file
View File

@@ -0,0 +1,43 @@
from typing import TYPE_CHECKING, Optional
from uuid import UUID
from sqlalchemy import ForeignKey, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column, relationship
from core.db.models import Base
if TYPE_CHECKING:
from core.db.models import FileContent, ProjectState
class File(Base):
__tablename__ = "files"
__table_args__ = (UniqueConstraint("project_state_id", "path"),)
# ID and parent FKs
id: Mapped[int] = mapped_column(primary_key=True)
project_state_id: Mapped[UUID] = mapped_column(ForeignKey("project_states.id", ondelete="CASCADE"))
content_id: Mapped[str] = mapped_column(ForeignKey("file_contents.id", ondelete="RESTRICT"))
# Attributes
path: Mapped[str] = mapped_column()
meta: Mapped[dict] = mapped_column(default=dict, server_default="{}")
# Relationships
project_state: Mapped[Optional["ProjectState"]] = relationship(back_populates="files")
content: Mapped["FileContent"] = relationship(back_populates="files", lazy="selectin")
def clone(self) -> "File":
"""
Clone the file object, to be used in a new project state.
The clone references the same file content object as the original.
:return: The cloned file object.
"""
return File(
project_state=None,
content_id=self.content_id,
path=self.path,
meta=self.meta,
)

47
core/db/models/file_content.py Normal file
View File

@@ -0,0 +1,47 @@
from typing import TYPE_CHECKING
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Mapped, mapped_column, relationship
from core.db.models import Base
if TYPE_CHECKING:
from core.db.models import File
class FileContent(Base):
__tablename__ = "file_contents"
# ID and parent FKs
id: Mapped[str] = mapped_column(primary_key=True)
# Attributes
content: Mapped[str] = mapped_column()
# Relationships
files: Mapped[list["File"]] = relationship(back_populates="content")
@classmethod
async def store(cls, session: AsyncSession, hash: str, content: str) -> "FileContent":
"""
Store the file content in the database.
If the content is already stored, returns the reference to the existing
content object. Otherwise stores it to the database and returns the newly
created content object.
:param session: The database session.
:param hash: The hash of the file content, used as a unique ID.
:param content: The file content as unicode string.
:return: The file content object.
"""
result = await session.execute(select(FileContent).where(FileContent.id == hash))
fc = result.scalar_one_or_none()
if fc is not None:
return fc
fc = cls(id=hash, content=content)
session.add(fc)
return fc
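The hash is computed by the caller and acts as the content address; the hashing scheme isn't defined in this file, so the SHA-256 below is an assumption for illustration:
import hashlib

content = "print('hello')\n"
digest = hashlib.sha256(content.encode("utf-8")).hexdigest()  # assumed scheme
fc = await FileContent.store(session, digest, content)  # dedupes identical content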

79
core/db/models/llm_request.py Normal file
View File

@@ -0,0 +1,79 @@
from datetime import datetime
from typing import TYPE_CHECKING, Optional
from uuid import UUID
from sqlalchemy import ForeignKey, inspect
from sqlalchemy.orm import Mapped, mapped_column, relationship
from sqlalchemy.sql import func
from core.db.models import Base
from core.llm.request_log import LLMRequestLog
if TYPE_CHECKING:
from core.agents.base import BaseAgent
from core.db.models import Branch, ProjectState
class LLMRequest(Base):
__tablename__ = "llm_requests"
# ID and parent FKs
id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
branch_id: Mapped[UUID] = mapped_column(ForeignKey("branches.id", ondelete="CASCADE"))
project_state_id: Mapped[Optional[UUID]] = mapped_column(ForeignKey("project_states.id", ondelete="SET NULL"))
# Attributes
started_at: Mapped[datetime] = mapped_column(server_default=func.now())
agent: Mapped[Optional[str]] = mapped_column()
provider: Mapped[str] = mapped_column()
model: Mapped[str] = mapped_column()
temperature: Mapped[float] = mapped_column()
messages: Mapped[list[dict]] = mapped_column()
response: Mapped[Optional[str]] = mapped_column()
prompt_tokens: Mapped[int] = mapped_column()
completion_tokens: Mapped[int] = mapped_column()
duration: Mapped[float] = mapped_column()
status: Mapped[str] = mapped_column()
error: Mapped[Optional[str]] = mapped_column()
# Relationships
branch: Mapped["Branch"] = relationship(back_populates="llm_requests")
project_state: Mapped["ProjectState"] = relationship(back_populates="llm_requests")
@classmethod
def from_request_log(
cls,
project_state: "ProjectState",
agent: Optional["BaseAgent"],
request_log: LLMRequestLog,
) -> "LLMRequest":
"""
Store the request log in the database.
Note this just creates the request log object. It is committed to the
database only when the DB session itself is committed.
:param project_state: Project state to associate the request log with.
:param agent: Agent that made the request (if the caller was an agent).
:param request_log: Request log.
:return: Newly created LLM request log in the database.
"""
session = inspect(project_state).async_session
obj = cls(
project_state=project_state,
branch=project_state.branch,
agent=agent.agent_type if agent else None,  # agent may be None for non-agent callers
provider=request_log.provider,
model=request_log.model,
temperature=request_log.temperature,
messages=request_log.messages,
response=request_log.response,
prompt_tokens=request_log.prompt_tokens,
completion_tokens=request_log.completion_tokens,
duration=request_log.duration,
status=request_log.status,
error=request_log.error,
)
session.add(obj)
return obj

124
core/db/models/project.py Normal file
View File

@@ -0,0 +1,124 @@
import re
from datetime import datetime
from typing import TYPE_CHECKING, Optional, Union
from unicodedata import normalize
from uuid import UUID, uuid4
from sqlalchemy import delete, inspect, select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Mapped, mapped_column, relationship, selectinload
from sqlalchemy.sql import func
from core.db.models import Base
if TYPE_CHECKING:
from core.db.models import Branch
class Project(Base):
__tablename__ = "projects"
# ID and parent FKs
id: Mapped[UUID] = mapped_column(primary_key=True, default=uuid4)
# Attributes
name: Mapped[str] = mapped_column()
created_at: Mapped[datetime] = mapped_column(server_default=func.now())
folder_name: Mapped[str] = mapped_column(
default=lambda context: Project.get_folder_from_project_name(context.get_current_parameters()["name"])
)
# Relationships
branches: Mapped[list["Branch"]] = relationship(back_populates="project", cascade="all")
@staticmethod
async def get_by_id(session: "AsyncSession", project_id: Union[str, UUID]) -> Optional["Project"]:
"""
Get a project by ID.
:param session: The SQLAlchemy session.
:param project_id: The project ID (as str or UUID value).
:return: The Project object if found, None otherwise.
"""
if not isinstance(project_id, UUID):
project_id = UUID(project_id)
result = await session.execute(select(Project).where(Project.id == project_id))
return result.scalar_one_or_none()
async def get_branch(self, name: Optional[str] = None) -> Optional["Branch"]:
"""
Get a project branch by name.
:param name: The name of the branch (defaults to "main").
:return: The Branch object if found, None otherwise.
"""
from core.db.models import Branch
session = inspect(self).async_session
if session is None:
raise ValueError("Project instance not associated with a DB session.")
if name is None:
name = Branch.DEFAULT
result = await session.execute(select(Branch).where(Branch.project_id == self.id, Branch.name == name))
return result.scalar_one_or_none()
@staticmethod
async def get_all_projects(session: "AsyncSession") -> list["Project"]:
"""
Get all projects.
This assumes the projects have at least one branch and one state.
:param session: The SQLAlchemy session.
:return: List of Project objects.
"""
from core.db.models import Branch, ProjectState
latest_state_query = (
select(ProjectState.branch_id, func.max(ProjectState.id).label("max_id"))
.group_by(ProjectState.branch_id)
.subquery()
)
query = (
select(Project, Branch, ProjectState)
.join(Branch, Project.branches)
.join(ProjectState, Branch.states)
.join(latest_state_query, ProjectState.id == latest_state_query.columns.max_id)
.options(selectinload(Project.branches), selectinload(Branch.states))
.order_by(Project.name, Branch.name)
)
results = await session.execute(query)
return results.scalars().all()
@staticmethod
def get_folder_from_project_name(name: str):
"""
Get the folder name from the project name.
:param name: Project name.
:return: Folder name.
"""
# replace accented unicode characters with their base characters (e.g. "šašavi" → "sasavi")
name = normalize("NFKD", name).encode("ascii", "ignore").decode("utf-8")
# replace spaces/interpunction with a single dash
return re.sub(r"[^a-zA-Z0-9]+", "-", name).lower().strip("-")
@staticmethod
async def delete_by_id(session: "AsyncSession", project_id: UUID) -> int:
"""
Delete a project by ID.
:param session: The SQLAlchemy session.
:param project_id: The project ID
:return: Number of rows deleted.
"""
result = await session.execute(delete(Project).where(Project.id == project_id))
return result.rowcount
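The slugification is easy to illustrate:
>>> Project.get_folder_from_project_name("Šašavi Projekt!")
'sasavi-projekt'
>>> Project.get_folder_from_project_name("My App 2.0")
'my-app-2-0'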

338
core/db/models/project_state.py Normal file
View File

@@ -0,0 +1,338 @@
from copy import deepcopy
from datetime import datetime
from typing import TYPE_CHECKING, Optional
from uuid import UUID, uuid4
from sqlalchemy import ForeignKey, UniqueConstraint, delete, inspect
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Mapped, mapped_column, relationship
from sqlalchemy.orm.attributes import flag_modified
from sqlalchemy.sql import func
from core.db.models import Base
from core.log import get_logger
if TYPE_CHECKING:
from core.db.models import Branch, ExecLog, File, FileContent, LLMRequest, Specification, UserInput
log = get_logger(__name__)
class ProjectState(Base):
__tablename__ = "project_states"
__table_args__ = (
UniqueConstraint("prev_state_id"),
UniqueConstraint("branch_id", "step_index"),
{"sqlite_autoincrement": True},
)
# ID and parent FKs
id: Mapped[UUID] = mapped_column(primary_key=True, default=uuid4)
branch_id: Mapped[UUID] = mapped_column(ForeignKey("branches.id", ondelete="CASCADE"))
prev_state_id: Mapped[Optional[UUID]] = mapped_column(ForeignKey("project_states.id", ondelete="CASCADE"))
specification_id: Mapped[int] = mapped_column(ForeignKey("specifications.id"))
# Attributes
created_at: Mapped[datetime] = mapped_column(server_default=func.now())
step_index: Mapped[int] = mapped_column(default=1, server_default="1")
epics: Mapped[list[dict]] = mapped_column(default=list)
tasks: Mapped[list[dict]] = mapped_column(default=list)
steps: Mapped[list[dict]] = mapped_column(default=list)
iterations: Mapped[list[dict]] = mapped_column(default=list)
relevant_files: Mapped[list[str]] = mapped_column(default=list)
modified_files: Mapped[dict] = mapped_column(default=dict)
run_command: Mapped[Optional[str]] = mapped_column()
# Relationships
branch: Mapped["Branch"] = relationship(back_populates="states", lazy="selectin")
prev_state: Mapped[Optional["ProjectState"]] = relationship(
back_populates="next_state",
remote_side=[id],
single_parent=True,
)
next_state: Mapped[Optional["ProjectState"]] = relationship(back_populates="prev_state")
files: Mapped[list["File"]] = relationship(
back_populates="project_state",
lazy="selectin",
cascade="all,delete-orphan",
)
specification: Mapped["Specification"] = relationship(back_populates="project_states", lazy="selectin")
llm_requests: Mapped[list["LLMRequest"]] = relationship(back_populates="project_state", cascade="all")
user_inputs: Mapped[list["UserInput"]] = relationship(back_populates="project_state", cascade="all")
exec_logs: Mapped[list["ExecLog"]] = relationship(back_populates="project_state", cascade="all")
@property
def unfinished_steps(self) -> list[dict]:
"""
Get the list of unfinished steps.
:return: List of unfinished steps.
"""
return [step for step in self.steps if not step.get("completed")]
@property
def current_step(self) -> Optional[dict]:
"""
Get the current step.
Current step is always the first step that's not finished yet.
:return: The current step, or None if there are no more unfinished steps.
"""
li = self.unfinished_steps
return li[0] if li else None
@property
def unfinished_iterations(self) -> list[dict]:
"""
Get the list of unfinished iterations.
:return: List of unfinished iterations.
"""
return [iteration for iteration in self.iterations if not iteration.get("completed")]
@property
def current_iteration(self) -> Optional[dict]:
"""
Get the current iteration.
Current iteration is always the first iteration that's not finished yet.
:return: The current iteration, or None if there are no unfinished iterations.
"""
li = self.unfinished_iterations
return li[0] if li else None
@property
def unfinished_tasks(self) -> list[dict]:
"""
Get the list of unfinished tasks.
:return: List of unfinished tasks.
"""
return [task for task in self.tasks if not task.get("completed")]
@property
def current_task(self) -> Optional[dict]:
"""
Get the current task.
Current task is always the first task that's not finished yet.
:return: The current task, or None if there are no unfinished tasks.
"""
li = self.unfinished_tasks
return li[0] if li else None
@property
def unfinished_epics(self) -> list[dict]:
"""
Get the list of unfinished epics.
:return: List of unfinished epics.
"""
return [epic for epic in self.epics if not epic.get("completed")]
@property
def current_epic(self) -> Optional[dict]:
"""
Get the current epic.
Current epic is always the first epic that's not finished yet.
:return: The current epic, or None if there are no unfinished epics.
"""
li = self.unfinished_epics
return li[0] if li else None
@property
def relevant_file_objects(self):
"""
Get the relevant files with their content.
:return: List of File objects for the relevant files.
"""
return [file for file in self.files if file.path in self.relevant_files]
@staticmethod
def create_initial_state(branch: "Branch") -> "ProjectState":
"""
Create the initial project state for a new branch.
This does *not* commit the new state to the database.
No checks are made to ensure that the branch does not
already have a state.
:param branch: The branch to create the state for.
:return: The new ProjectState object.
"""
from core.db.models import Specification
return ProjectState(
branch=branch,
specification=Specification(),
step_index=1,
)
async def create_next_state(self) -> "ProjectState":
"""
Create the next project state for the branch.
This does NOT insert the new state and the associated objects (spec,
files, ...) to the database.
:return: The new ProjectState object.
"""
if not self.id:
raise ValueError("Cannot create next state for unsaved state.")
if "next_state" in self.__dict__:
raise ValueError(f"Next state already exists for state with id={self.id}.")
new_state = ProjectState(
branch=self.branch,
prev_state=self,
step_index=self.step_index + 1,
specification=self.specification,
epics=deepcopy(self.epics),
tasks=deepcopy(self.tasks),
steps=deepcopy(self.steps),
iterations=deepcopy(self.iterations),
files=[],
relevant_files=deepcopy(self.relevant_files),
modified_files=deepcopy(self.modified_files),
)
session: AsyncSession = inspect(self).async_session
session.add(new_state)
for file in await self.awaitable_attrs.files:
clone = file.clone()
new_state.files.append(clone)
return new_state
def complete_step(self):
if not self.unfinished_steps:
raise ValueError("There are no unfinished steps to complete")
if "next_state" in self.__dict__:
raise ValueError("Current state is read-only (already has a next state).")
log.debug(f"Completing step {self.unfinished_steps[0]['type']}")
self.unfinished_steps[0]["completed"] = True
flag_modified(self, "steps")
def complete_task(self):
if not self.unfinished_tasks:
raise ValueError("There are no unfinished tasks to complete")
if "next_state" in self.__dict__:
raise ValueError("Current state is read-only (already has a next state).")
log.debug(f"Completing task {self.unfinished_tasks[0]['description']}")
self.unfinished_tasks[0]["completed"] = True
self.steps = []
self.iterations = []
self.relevant_files = []
self.modified_files = {}
flag_modified(self, "tasks")
if not self.unfinished_tasks and self.unfinished_epics:
self.complete_epic()
def complete_epic(self):
if not self.unfinished_epics:
raise ValueError("There are no unfinished epics to complete")
if "next_state" in self.__dict__:
raise ValueError("Current state is read-only (already has a next state).")
log.debug(f"Completing epic {self.unfinished_epics[0]['name']}")
self.unfinished_epics[0]["completed"] = True
flag_modified(self, "epics")
def complete_iteration(self):
if not self.unfinished_iterations:
raise ValueError("There are no unfinished iterations to complete")
if "next_state" in self.__dict__:
raise ValueError("Current state is read-only (already has a next state).")
log.debug(f"Completing iteration {self.unfinished_iterations[0]}")
self.unfinished_iterations[0]["completed"] = True
self.flag_iterations_as_modified()
def flag_iterations_as_modified(self):
"""
Flag the iteration field as having been modified
Used by Agents that perform modifications within the mutable iterations field,
to tell the database that it was modified and should get saved (as SQLalchemy
can't detect changes in mutable fields by itself).
"""
flag_modified(self, "iterations")
def get_file_by_path(self, path: str) -> Optional["File"]:
"""
Get a file from the current project state, by the file path.
:param path: The file path.
:return: The file object, or None if not found.
"""
for file in self.files:
if file.path == path:
return file
return None
def save_file(self, path: str, content: "FileContent", external: bool = False) -> "File":
"""
Save a file to the project state.
This either creates a new file pointing at the given content,
or updates the content of an existing file. This method
doesn't actually commit the file to the database, just attaches
it to the project state.
If the file was created by Pythagora (not externally by user or template import),
mark it as relevant for the current task.
:param path: The file path.
:param content: The file content.
:param external: Whether the file was added externally (e.g. by a user).
:return: The (unsaved) file object.
"""
from core.db.models import File
if "next_state" in self.__dict__:
raise ValueError("Current state is read-only (already has a next state).")
file = self.get_file_by_path(path)
if file:
original_content = file.content.content
file.content = content
else:
original_content = ""
file = File(path=path, content=content)
self.files.append(file)
if path not in self.modified_files and not external:
self.modified_files[path] = original_content
if path not in self.relevant_files:
self.relevant_files.append(path)
return file
async def delete_after(self):
"""
Delete all states in the branch after this one.
"""
session: AsyncSession = inspect(self).async_session
log.debug(f"Deleting all project states in branch {self.branch_id} after {self.id}")
await session.execute(
delete(ProjectState).where(
ProjectState.branch_id == self.branch_id,
ProjectState.step_index > self.step_index,
)
)
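To make the state-machine flow concrete, here is a minimal sketch of advancing a project by one step (assumes `state` is the current, saved ProjectState with at least one unfinished step, and `session` is its async session):
state.complete_step()  # mark the first unfinished step as done
next_state = await state.create_next_state()  # clone spec/epics/tasks/files into a new state
await session.commit()  # persist both the mutation and the new state row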

View File

@@ -0,0 +1,48 @@
from typing import TYPE_CHECKING, Optional
from sqlalchemy.orm import Mapped, mapped_column, relationship
from core.db.models import Base
if TYPE_CHECKING:
from core.db.models import ProjectState
class Complexity:
"""Estimate of the project or feature complexity."""
SIMPLE = "simple"
MODERATE = "moderate"
HARD = "hard"
class Specification(Base):
__tablename__ = "specifications"
# ID and parent FKs
id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
# Attributes
description: Mapped[str] = mapped_column(default="")
architecture: Mapped[str] = mapped_column(default="")
system_dependencies: Mapped[list[dict]] = mapped_column(default=list)
package_dependencies: Mapped[list[dict]] = mapped_column(default=list)
template: Mapped[Optional[str]] = mapped_column()
complexity: Mapped[str] = mapped_column(server_default=Complexity.HARD)
# Relationships
project_states: Mapped[list["ProjectState"]] = relationship(back_populates="specification")
def clone(self) -> "Specification":
"""
Clone the specification.
"""
clone = Specification(
description=self.description,
architecture=self.architecture,
system_dependencies=self.system_dependencies,
package_dependencies=self.package_dependencies,
template=self.template,
complexity=self.complexity,
)
return clone

View File

@@ -0,0 +1,59 @@
from datetime import datetime
from typing import TYPE_CHECKING, Optional
from uuid import UUID
from sqlalchemy import ForeignKey, inspect
from sqlalchemy.orm import Mapped, mapped_column, relationship
from sqlalchemy.sql import func
from core.db.models import Base
from core.ui.base import UserInput as UserInputData
if TYPE_CHECKING:
from core.db.models import Branch, ProjectState
class UserInput(Base):
__tablename__ = "user_inputs"
# ID and parent FKs
id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
branch_id: Mapped[UUID] = mapped_column(ForeignKey("branches.id", ondelete="CASCADE"))
project_state_id: Mapped[Optional[UUID]] = mapped_column(ForeignKey("project_states.id", ondelete="SET NULL"))
# Attributes
created_at: Mapped[datetime] = mapped_column(server_default=func.now())
question: Mapped[str] = mapped_column()
answer_text: Mapped[Optional[str]] = mapped_column()
answer_button: Mapped[Optional[str]] = mapped_column()
cancelled: Mapped[bool] = mapped_column()
# Relationships
branch: Mapped["Branch"] = relationship(back_populates="user_inputs")
project_state: Mapped["ProjectState"] = relationship(back_populates="user_inputs")
@classmethod
def from_user_input(cls, project_state: "ProjectState", question: str, user_input: UserInputData) -> "UserInput":
"""
Store the user input in the database.
Note this just creates the UserInput object. It is committed to the
database only when the DB session itself is committed.
:param project_state: Project state to associate the request log with.
:param question: Question the user was asked.
:param user_input: User input.
:return: Newly created User input in the database.
"""
session = inspect(project_state).async_session
obj = cls(
project_state=project_state,
branch=project_state.branch,
question=question,
answer_text=user_input.text,
answer_button=user_input.button,
cancelled=user_input.cancelled,
)
session.add(obj)
return obj
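A minimal sketch of recording an answer (the UserInputData constructor kwargs are assumed from the field usage above; values are illustrative):
answer = UserInputData(text="Yes, continue", button=None, cancelled=False)  # assumed kwargs
user_input = UserInput.from_user_input(project_state, "Continue with the next task?", answer)
await session.commit()  # the row is only written when the session is committed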

75
core/db/session.py Normal file
View File

@@ -0,0 +1,75 @@
from sqlalchemy import event
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from core.config import DBConfig
from core.log import get_logger
log = get_logger(__name__)
class SessionManager:
"""
Async-aware context manager for database session.
Usage:
>>> config = DBConfig(url="sqlite+aiosqlite:///test.db")
>>> async with SessionManager(config) as session:
... # Do something with the session
"""
def __init__(self, config: DBConfig):
"""
Initialize the session manager with the given configuration.
:param config: Database configuration.
"""
self.config = config
self.engine = create_async_engine(
self.config.url, echo=config.debug_sql, echo_pool="debug" if config.debug_sql else None
)
self.SessionClass = async_sessionmaker(self.engine, expire_on_commit=False)
self.session = None
self.recursion_depth = 0
event.listen(self.engine.sync_engine, "connect", self._on_connect)
def _on_connect(self, dbapi_connection, _):
"""Connection event handler"""
log.debug(f"Connected to database {self.config.url}")
if self.config.url.startswith("sqlite"):
# Note that SQLite uses NullPool by default, meaning every session creates a
# database "connection". This is fine and preferred for SQLite because
# it's a local file. PostgreSQL and other databases use a real connection pool
# by default.
dbapi_connection.execute("pragma foreign_keys=on")
async def start(self) -> AsyncSession:
if self.session is not None:
self.recursion_depth += 1
log.warning(f"Re-entering database session (depth: {self.recursion_depth}), potential bug", stack_info=True)
return self.session
self.session = self.SessionClass()
return self.session
async def close(self):
if self.session is None:
log.warning("Closing database session that was never opened", stack_info=True)
return
if self.recursion_depth > 0:
self.recursion_depth -= 1
return
await self.session.close()
self.session = None
async def __aenter__(self) -> AsyncSession:
return await self.start()
async def __aexit__(self, exc_type, exc_val, exc_tb):
return await self.close()
__all__ = ["SessionManager"]

49
core/db/setup.py Normal file
View File

@@ -0,0 +1,49 @@
from os.path import dirname, join
from alembic import command
from alembic.config import Config
from core.config import DBConfig
from core.log import get_logger
log = get_logger(__name__)
def _async_to_sync_db_scheme(url: str) -> str:
"""
Convert an async database URL to a synchronous one.
This is needed because Alembic does not support async database
connections.
:param url: Asynchronous database URL.
:return: Synchronous database URL.
"""
if url.startswith("postgresql+asyncpg://"):
return url.replace("postgresql+asyncpg://", "postgresql://")
elif url.startswith("sqlite+aiosqlite://"):
return url.replace("sqlite+aiosqlite://", "sqlite://")
return url
def run_migrations(config: DBConfig):
"""
Run database migrations using Alembic.
This needs to happen synchronously, before the asyncio
mainloop is started, and before any database access.
:param config: Database configuration.
"""
url = _async_to_sync_db_scheme(config.url)
ini_location = join(dirname(__file__), "alembic.ini")
log.debug(f"Running database migrations for {url} (config: {ini_location})")
alembic_cfg = Config(ini_location)
alembic_cfg.set_main_option("sqlalchemy.url", url)
alembic_cfg.set_main_option("pythagora_runtime", "true")
command.upgrade(alembic_cfg, "head")
__all__ = ["run_migrations"]

246
core/db/v0importer.py Normal file
View File

@@ -0,0 +1,246 @@
from json import loads
from os.path import exists
from pathlib import Path
from uuid import UUID, uuid4
import aiosqlite
from core.db.models import Branch, Project, ProjectState
from core.db.session import SessionManager
from core.log import get_logger
from core.state.state_manager import StateManager
log = get_logger(__name__)
class LegacyDatabaseImporter:
def __init__(self, session_manager: SessionManager, dbpath: str):
self.session_manager = session_manager
self.state_manager = StateManager(self.session_manager, None)
self.dbpath = dbpath
self.conn = None
if not exists(dbpath):
raise FileNotFoundError(f"File not found: {dbpath}")
async def import_database(self):
info = await self.load_legacy_database()
await self.save_to_new_database(info)
async def load_legacy_database(self):
async with aiosqlite.connect(self.dbpath) as conn:
self.conn = conn
is_valid = await self.verify_schema()
if not is_valid:
raise ValueError(f"Database {self.dbpath} doesn't look like a GPT-Pilot database")
apps = await self.get_apps()
info = {}
for app_id in apps:
app_info = await self.get_app_info(app_id)
info[app_id] = {
"name": apps[app_id],
**app_info,
}
return info
async def verify_schema(self) -> bool:
tables = set()
async with self.conn.execute("select name from sqlite_master where type = 'table'") as cursor:
async for row in cursor:
tables.add(row[0])
return "app" in tables and "development_steps" in tables
async def get_apps(self) -> dict[str, str]:
apps = {}
async with self.conn.execute("select id, name, status from app") as cursor:
async for id, name, status in cursor:
if status == "coding":
apps[id] = name
return apps
async def get_app_info(self, app_id: str) -> dict:
app_info = {
"initial_prompt": None,
"architecture": None,
"tasks": [],
}
async with self.conn.execute("select architecture from architecture where app_id = ?", (app_id,)) as cursor:
row = await cursor.fetchone()
if row:
app_info["architecture"] = loads(row[0])
async with self.conn.execute("select prompt from project_description where app_id = ?", (app_id,)) as cursor:
row = await cursor.fetchone()
if row:
app_info["initial_prompt"] = row[0]
async with self.conn.execute(
"select id, prompt_path, prompt_data, messages, llm_response from development_steps "
"where app_id = ? order by created_at asc",
(app_id,),
) as cursor:
async for row in cursor:
dev_step_id, prompt_path, prompt_data, messages, llm_response = row
if prompt_path == "development/task/breakdown.prompt":
task_info = await self.get_task_info(dev_step_id, prompt_data, llm_response)
app_info["tasks"].append(task_info)
return app_info
async def get_task_info(self, dev_step_id, prompt_data_json: str, llm_response: dict) -> dict:
prompt_data = loads(prompt_data_json)
current_feature = prompt_data.get("current_feature")
previous_features = prompt_data.get("previous_features") or []
tasks = prompt_data["development_tasks"]
current_task_index = prompt_data["current_task_index"]
current_task = tasks[current_task_index]
instructions = llm_response
files = await self.get_task_files(dev_step_id)
return {
"current_feature": current_feature,
"previous_features": previous_features,
"tasks": tasks,
"current_task_index": current_task_index,
"current_task": current_task,
"instructions": instructions,
"files": files,
}
async def get_task_files(self, dev_step_id: int):
files = {}
async with self.conn.execute(
"select content, path, name, description from file_snapshot "
"inner join file on file_snapshot.file_id = file.id "
"where file_snapshot.development_step_id = ?",
(dev_step_id,),
) as cursor:
async for row in cursor:
content, path, name, description = row
file_path = Path(path + "/" + name).as_posix() if path else name
try:
if isinstance(content, bytes):
content = content.decode("utf-8")
except: # noqa
# skip binary file
continue
files[file_path] = {
"description": description or None,
"content": content,
}
return files
async def save_to_new_database(self, info: dict):
async with self.session_manager as session:
projects = await Project.get_all_projects(session)
for project in projects:
imported_app = info.pop(project.id.hex, None)
if imported_app:
log.info(f"Project {project.name} already exists in the new database, skipping")
for app_id, app_info in info.items():
await self.save_app(app_id, app_info)
async def save_app(self, app_id: str, app_info: dict):
log.info(f"Importing app {app_info['name']} (id={app_id}) ...")
async with self.session_manager as session:
project = Project(id=UUID(app_id), name=app_info["name"])
branch = Branch(project=project)
state = ProjectState.create_initial_state(branch)
spec = state.specification
spec.description = app_info["initial_prompt"]
spec.architecture = app_info["architecture"]["architecture"]
spec.system_dependencies = app_info["architecture"]["system_dependencies"]
spec.package_dependencies = app_info["architecture"]["package_dependencies"]
spec.template = app_info["architecture"].get("template")
session.add(project)
await session.commit()
project = await self.state_manager.load_project(project_id=app_id)
# It is much harder to import all tasks and keep features/tasks lists in sync, so
# we only support importing the latest task.
if app_info["tasks"]:
await self.save_latest_task(app_info["tasks"][-1])
# This just closes the session and removes the last (incomplete) state.
# Everything else should already be safely committed.
await self.state_manager.rollback()
async def save_latest_task(self, task: dict):
sm = self.state_manager
state = sm.current_state
state.epics = [
{
"id": uuid4().hex,
"name": "Initial Project",
"description": state.specification.description,
"summary": None,
"completed": bool(task["previous_features"]) or (task["current_feature"] is not None),
"complexity": "hard",
}
]
for i, feature in enumerate(task["previous_features"]):
state.epics += [
{
"id": uuid4().hex,
"name": f"Feature #{i + 1}",
"description": feature["summary"], # FIXME: is this good enough
"summary": None,
"completed": True,
"complexity": "hard",
}
]
if task["current_feature"]:
state.epics = state.epics + [
{
"id": uuid4().hex,
"name": f"Feature #{len(state.epics)}",
"description": task["current_feature"],
"summary": None,
"completed": False,
"complexity": "hard",
}
]
current_task_index = task["current_task_index"]
state.tasks = [
{
"id": uuid4().hex,
"description": task_info["description"],
"instructions": None,
"completed": current_task_index > i,
}
for i, task_info in enumerate(task["tasks"])
]
state.tasks[current_task_index]["instructions"] = task["instructions"]
await sm.current_session.commit()
# Reload project at the initialized state to reinitialize the next state
await self.state_manager.load_project(project_id=state.branch.project.id, step_index=state.step_index)
await self.save_task_files(task["files"])
await self.state_manager.commit()
async def save_task_files(self, files: dict):
for path, file_info in files.items():
await self.state_manager.save_file(
path,
file_info["content"],
metadata={
"description": file_info["description"],
"references": [],
},
)
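A minimal usage sketch for the importer (the legacy database path is an example):
from core.config import DBConfig
from core.db.session import SessionManager
from core.db.v0importer import LegacyDatabaseImporter

session_manager = SessionManager(DBConfig(url="sqlite+aiosqlite:///pythagora.db"))
importer = LegacyDatabaseImporter(session_manager, "gpt-pilot.db")
await importer.import_database()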

0
core/disk/__init__.py Normal file
View File

125
core/disk/ignore.py Normal file
View File

@@ -0,0 +1,125 @@
import os.path
from fnmatch import fnmatch
from typing import Optional
class IgnoreMatcher:
"""
A class to match paths against a list of ignore patterns or
file attributes (size, type).
"""
def __init__(
self,
root_path: str,
ignore_paths: list[str],
*,
ignore_size_threshold: Optional[int] = None,
):
"""
Initialize the IgnoreMatcher object.
Ignore paths are matched against the file name and the full path,
and may include shell-like wildcards ("*" for any number of characters,
"?" for a single character). Paths are normalized, so "/" works on both
Unix and Windows, and Windows matching is case insensitive.
:param root_path: Root path to use when checking files on disk.
:param ignore_paths: List of patterns to ignore.
:param ignore_size_threshold: Files larger than this size will be ignored.
"""
self.root_path = root_path
self.ignore_paths = ignore_paths
self.ignore_size_threshold = ignore_size_threshold
def ignore(self, path: str) -> bool:
"""
Check if the given path matches any of the ignore patterns.
:param path: (Relative) path to the file or directory to check
:return: True if the path matches any of the ignore patterns, False otherwise
"""
full_path = os.path.normpath(os.path.join(self.root_path, path))
if self._is_in_ignore_list(path):
return True
if self._is_large_file(full_path):
return True
# Binary files are always ignored
if self._is_binary(full_path):
return True
return False
def _is_in_ignore_list(self, path: str) -> bool:
"""
Check if the given path matches any of the ignore patterns.
Both the (relative) file path and the file (base) name are matched.
:param path: The path to the file or directory to check
:return: True if the path matches any of the ignore patterns, False otherwise.
"""
name = os.path.basename(path)
for pattern in self.ignore_paths:
if fnmatch(name, pattern) or fnmatch(path, pattern):
return True
return False
def _is_large_file(self, full_path: str) -> bool:
"""
Check if the given file is larger than the threshold.
This also returns True if the file doesn't exist or is not a regular file (eg.
it's a symlink), since we want to ignore those kinds of files as well.
:param full_path: Full path to the file to check.
:return: True if the file is larger than the threshold, False otherwise.
"""
if self.ignore_size_threshold is None:
return False
# We don't handle directories here
if os.path.isdir(full_path):
return False
if not os.path.isfile(full_path):
return True
try:
return bool(os.path.getsize(full_path) > self.ignore_size_threshold)
except: # noqa
return True
def _is_binary(self, full_path: str) -> bool:
"""
Check if the given file is binary and should be ignored.
This also returns True if the file doesn't exist or is not a regular file (eg.
it's a symlink), or can't be opened, since we want to ignore those too.
:param full_path: Full path to the file to check.
:return: True if the file should be ignored, False otherwise.
"""
# We don't handle directories here
if os.path.isdir(full_path):
return False
if not os.path.isfile(full_path):
return True
try:
with open(full_path, "r", encoding="utf-8") as f:
f.read(128 * 1024)
return False
except: # noqa
# If we can't open the file for any reason (eg. PermissionError), it's
# best to ignore it anyway
return True
__all__ = ["IgnoreMatcher"]

188
core/disk/vfs.py Normal file
View File

@@ -0,0 +1,188 @@
import os
import os.path
from hashlib import sha1
from pathlib import Path
from core.disk.ignore import IgnoreMatcher
from core.log import get_logger
log = get_logger(__name__)
class VirtualFileSystem:
def save(self, path: str, content: str):
"""
Save content to a file. Use for both new and updated files.
:param path: Path to the file, relative to project root.
:param content: Content to save.
"""
raise NotImplementedError()
def read(self, path: str) -> str:
"""
Read file contents.
:param path: Path to the file, relative to project root.
:return: File contents.
"""
raise NotImplementedError()
def remove(self, path: str):
"""
Remove a file.
If file doesn't exist or is a directory, or if the file is ignored,
do nothing.
:param path: Path to the file, relative to project root.
"""
raise NotImplementedError()
def get_full_path(self, path: str) -> str:
"""
Get the full path to a file.
This should be used to check the full path of the file on whichever
file system it locally is stored. For example, getting a full path
to a file and then passing it to an external program via run_command
should work.
:param path: Path to the file, relative to project root.
:return: Full path to the file.
"""
raise NotImplementedError()
def _filter_by_prefix(self, file_list: list[str], prefix: str) -> list[str]:
# We use "/" internally on all platforms, including win32
if not prefix.endswith("/"):
prefix = prefix + "/"
return [f for f in file_list if f.startswith(prefix)]
def _get_file_list(self) -> list[str]:
raise NotImplementedError()
def list(self, prefix: str = None) -> list[str]:
"""
Return a list of files in the project.
File paths are relative to the project root.
:param prefix: Optional prefix to filter files for.
:return: List of file paths.
"""
retval = sorted(self._get_file_list())
if prefix:
retval = self._filter_by_prefix(retval, prefix)
return retval
def hash(self, path: str) -> str:
content = self.read(path)
return self.hash_string(content)
@staticmethod
def hash_string(content: str) -> str:
return sha1(content.encode("utf-8")).hexdigest()
class MemoryVFS(VirtualFileSystem):
files: dict[str, str]
def __init__(self):
self.files = {}
def save(self, path: str, content: str):
self.files[path] = content
def read(self, path: str) -> str:
try:
return self.files[path]
except KeyError:
raise ValueError(f"File not found: {path}")
def remove(self, path: str):
if path in self.files:
del self.files[path]
def get_full_path(self, path: str) -> str:
# We use "/" internally on all platforms, including win32
return "/" + path
def _get_file_list(self) -> list[str]:
return list(self.files.keys())
class LocalDiskVFS(VirtualFileSystem):
def __init__(
self,
root: str,
create: bool = True,
allow_existing: bool = True,
ignore_matcher: IgnoreMatcher = None,
):
if not os.path.isdir(root):
if create:
os.makedirs(root)
else:
raise ValueError(f"Root directory does not exist: {root}")
else:
if not allow_existing:
raise FileExistsError(f"Root directory already exists: {root}")
if ignore_matcher is None:
ignore_matcher = IgnoreMatcher(root, [])
self.root = root
self.ignore_matcher = ignore_matcher
def get_full_path(self, path: str) -> str:
return os.path.normpath(os.path.join(self.root, path))
def save(self, path: str, content: str):
full_path = self.get_full_path(path)
os.makedirs(os.path.dirname(full_path), exist_ok=True)
with open(full_path, "w", encoding="utf-8") as f:
f.write(content)
log.debug(f"Saved file {path} ({len(content)} bytes) to {full_path}")
def read(self, path: str) -> str:
full_path = self.get_full_path(path)
if not os.path.isfile(full_path):
raise ValueError(f"File not found: {path}")
# TODO: do we want error handling here?
with open(full_path, "r", encoding="utf-8") as f:
return f.read()
def remove(self, path: str):
if self.ignore_matcher.ignore(path):
return
full_path = self.get_full_path(path)
if os.path.isfile(full_path):
try:
os.remove(full_path)
log.debug(f"Removed file {path} from {full_path}")
except Exception as err: # noqa
log.error(f"Failed to remove file {path}: {err}", exc_info=True)
def _get_file_list(self) -> list[str]:
files = []
for dpath, dirnames, filenames in os.walk(self.root):
# Modify in place to prevent recursing into ignored directories
dirnames[:] = [
d
for d in dirnames
if not self.ignore_matcher.ignore(os.path.relpath(os.path.join(dpath, d), self.root))
]
for filename in filenames:
path = os.path.relpath(os.path.join(dpath, filename), self.root)
if not self.ignore_matcher.ignore(path):
# We use "/" internally on all platforms, including win32
files.append(Path(path).as_posix())
return files
__all__ = ["VirtualFileSystem", "MemoryVFS", "LocalDiskVFS"]

0
core/llm/__init__.py Normal file
View File

123
core/llm/anthropic_client.py Normal file
View File

@@ -0,0 +1,123 @@
import datetime
import zoneinfo
from typing import Optional
from anthropic import AsyncAnthropic, RateLimitError
from httpx import Timeout
from core.config import LLMProvider
from core.llm.convo import Convo
from core.log import get_logger
from .base import BaseLLMClient
log = get_logger(__name__)
# Maximum number of output tokens for Anthropic Claude 3
MAX_TOKENS = 4096
class AnthropicClient(BaseLLMClient):
provider = LLMProvider.ANTHROPIC
def _init_client(self):
self.client = AsyncAnthropic(
api_key=self.config.api_key,
base_url=self.config.base_url,
timeout=Timeout(
max(self.config.connect_timeout, self.config.read_timeout),
connect=self.config.connect_timeout,
read=self.config.read_timeout,
),
)
def _adapt_messages(self, convo: Convo) -> list[dict[str, str]]:
"""
Adapt the conversation messages to the format expected by the Anthropic Claude model.
Claude only recognizes "user" and "assistant" roles, and requires them to alternate
(ie. no consecutive messages from the same role).
:param convo: Conversation to adapt.
:return: Adapted conversation messages.
"""
messages = []
for msg in convo.messages:
if msg["role"] == "function":
raise ValueError("Anthropic Claude doesn't support function calling")
role = "user" if msg["role"] in ["user", "system"] else "assistant"
if messages and messages[-1]["role"] == role:
messages[-1]["content"] += "\n\n" + msg["content"]
else:
messages.append(
{
"role": role,
"content": msg["content"],
}
)
return messages
async def _make_request(
self,
convo: Convo,
temperature: Optional[float] = None,
json_mode: bool = False,
) -> tuple[str, int, int]:
messages = self._adapt_messages(convo)
completion_kwargs = {
"max_tokens": MAX_TOKENS,
"model": self.config.model,
"messages": messages,
"temperature": self.config.temperature if temperature is None else temperature,
}
# Note: Anthropic's Messages API has no JSON response format parameter;
# when json_mode is requested, correct output is enforced by the response parser.
response = []
async with self.client.messages.stream(**completion_kwargs) as stream:
async for content in stream.text_stream:
response.append(content)
if self.stream_handler:
await self.stream_handler(content)
# Token usage is read from the final message below
final_message = await stream.get_final_message()
response_str = "".join(response)
# Tell the stream handler we're done
if self.stream_handler:
await self.stream_handler(None)
return response_str, final_message.usage.input_tokens, final_message.usage.output_tokens
def rate_limit_sleep(self, err: RateLimitError) -> Optional[datetime.timedelta]:
"""
Anthropic rate limits docs:
https://docs.anthropic.com/en/api/rate-limits#response-headers
Limit reset times are in RFC 3339 format.
"""
headers = err.response.headers
if "anthropic-ratelimit-tokens-remaining" not in headers:
return None
remaining_tokens = headers["anthropic-ratelimit-tokens-remaining"]
if int(remaining_tokens) == 0:
relevant_dt = headers["anthropic-ratelimit-tokens-reset"]
else:
relevant_dt = headers["anthropic-ratelimit-requests-reset"]
try:
reset_time = datetime.datetime.fromisoformat(relevant_dt)
except ValueError:
return datetime.timedelta(seconds=5)
now = datetime.datetime.now(tz=zoneinfo.ZoneInfo("UTC"))
return reset_time - now
__all__ = ["AnthropicClient"]

306
core/llm/base.py Normal file
View File

@@ -0,0 +1,306 @@
import asyncio
import datetime
import json
from enum import Enum
from time import time
from typing import Any, Callable, Optional, Tuple
import httpx
from core.config import LLMConfig, LLMProvider
from core.llm.convo import Convo
from core.llm.request_log import LLMRequestLog, LLMRequestStatus
from core.log import get_logger
log = get_logger(__name__)
class LLMError(str, Enum):
KEY_EXPIRED = "key_expired"
RATE_LIMITED = "rate_limited"
class APIError(Exception):
def __init__(self, message: str):
self.message = message
class BaseLLMClient:
"""
Base asynchronous streaming client for language models.
Example usage:
>>> async def stream_handler(content: str):
... print(content)
...
>>> def parser(content: str) -> dict:
... return json.loads(content)
...
>>> client_class = BaseLLMClient.for_provider(provider)
>>> client = client_class(config, stream_handler=stream_handler)
>>> response, request_log = await client(convo, parser=parser)
"""
provider: LLMProvider
def __init__(
self,
config: LLMConfig,
*,
stream_handler: Optional[Callable] = None,
error_handler: Optional[Callable] = None,
):
"""
Initialize the client with the given configuration.
:param config: Configuration for the client.
:param stream_handler: Optional handler for streamed responses.
"""
self.config = config
self.stream_handler = stream_handler
self.error_handler = error_handler
self._init_client()
def _init_client(self):
raise NotImplementedError()
async def _make_request(
self,
convo: Convo,
temperature: Optional[float] = None,
json_mode: bool = False,
) -> tuple[str, int, int]:
"""
Call the LLM with the given conversation.
Low-level method that streams the response chunks.
Use `__call__` instead of this method.
:param convo: Conversation to send to the LLM.
:param json_mode: If True, the response is expected to be JSON.
:return: Tuple containing the full response content, number of input tokens, and number of output tokens.
"""
raise NotImplementedError()
async def _adapt_messages(self, convo: Convo) -> list[dict[str, str]]:
"""
Adapt the conversation messages to the format expected by the LLM.
The default implementation merges consecutive messages from the same
role, since some providers (eg. Anthropic Claude) only recognize
alternating "user" and "assistant" roles.
:param convo: Conversation to adapt.
:return: Adapted conversation messages.
"""
messages = []
for msg in convo.messages:
if msg["role"] == "function":
raise ValueError("Anthropic Claude doesn't support function calling")
role = "user" if msg["role"] in ["user", "system"] else "assistant"
if messages and messages[-1]["role"] == role:
messages[-1]["content"] += "\n\n" + msg["content"]
else:
messages.append(
{
"role": role,
"content": msg["content"],
}
)
return messages
async def __call__(
self,
convo: Convo,
*,
temperature: Optional[float] = None,
parser: Optional[Callable] = None,
max_retries: int = 3,
json_mode: bool = False,
) -> Tuple[Any, LLMRequestLog]:
"""
Invoke the LLM with the given conversation.
Stream handler, if provided, should be an async function
that takes a single argument, the response content (str).
It will be called for each response chunk.
Parser, if provided, should be a function that takes the
response content (str) and returns the parsed response.
On parse error, the parser should raise a ValueError with
a descriptive error message that will be sent back to the LLM
to retry, up to max_retries.
:param convo: Conversation to send to the LLM.
:param parser: Optional parser for the response.
:param max_retries: Maximum number of retries for parsing the response.
:param json_mode: If True, the response is expected to be JSON.
:return: Tuple of the (parsed) response and request log entry.
"""
import anthropic
import groq
import openai
if temperature is None:
temperature = self.config.temperature
convo = convo.fork()
request_log = LLMRequestLog(
provider=self.provider,
model=self.config.model,
temperature=temperature,
)
prompt_length_kb = len(json.dumps(convo.messages).encode("utf-8")) / 1024
log.debug(
f"Calling {self.provider.value} model {self.config.model} (temp={temperature}), prompt length: {prompt_length_kb:.1f} KB"
)
t0 = time()
for _ in range(max_retries):
request_log.messages = convo.messages[:]
request_log.response = None
request_log.error = None
response = None
try:
response, prompt_tokens, completion_tokens = await self._make_request(
convo,
temperature=temperature,
json_mode=json_mode,
)
except (openai.APIConnectionError, anthropic.APIConnectionError, groq.APIConnectionError) as err:
log.warning(f"API connection error: {err}", exc_info=True)
request_log.error = str(f"API connection error: {err}")
request_log.status = LLMRequestStatus.ERROR
continue
except httpx.ReadTimeout as err:
log.warning(f"Read timeout (set to {self.config.read_timeout}s): {err}", exc_info=True)
request_log.error = str(f"Read timeout: {err}")
request_log.status = LLMRequestStatus.ERROR
continue
except httpx.ReadError as err:
log.warning(f"Read error: {err}", exc_info=True)
request_log.error = str(f"Read error: {err}")
request_log.status = LLMRequestStatus.ERROR
continue
except (openai.RateLimitError, anthropic.RateLimitError, groq.RateLimitError) as err:
log.warning(f"Rate limit error: {err}", exc_info=True)
request_log.error = str(f"Rate limit error: {err}")
request_log.status = LLMRequestStatus.ERROR
wait_time = self.rate_limit_sleep(err)
if wait_time:
message = f"We've hit {self.config.provider.value} rate limit. Sleeping for {wait_time.seconds} seconds..."
await self.error_handler(LLMError.RATE_LIMITED, message)
await asyncio.sleep(wait_time.seconds)
continue
else:
# RateLimitError that shouldn't be retried, eg. insufficient funds
err_msg = err.response.json().get("error", {}).get("message", "Rate limiting error.")
raise APIError(err_msg) from err
except (openai.NotFoundError, anthropic.NotFoundError, groq.NotFoundError) as err:
err_msg = err.response.json().get("error", {}).get("message", f"Model not found: {self.config.model}")
raise APIError(err_msg) from err
except (openai.AuthenticationError, anthropic.AuthenticationError, groq.AuthenticationError) as err:
log.warning(f"Key expired: {err}", exc_info=True)
err_msg = err.response.json().get("error", {}).get("message", "Incorrect API key")
if "[BricksLLM]" in err_msg:
# We only want to show the key expired message if it's from Bricks
await self.error_handler(LLMError.KEY_EXPIRED)
raise APIError(err_msg) from err
except (openai.APIStatusError, anthropic.APIStatusError, groq.APIStatusError) as err:
# Token limit exceeded (in original gpt-pilot handled as
# TokenLimitError) is thrown as 400 (OpenAI, Anthropic) or 413 (Groq).
# All providers throw an exception that is caught here.
# OpenAI and Groq return a `code` field in the error JSON that lets
# us confirm that we've breached the token limit, but Anthropic doesn't,
# so we can't be certain that's the problem in Anthropic case.
# Here we try to detect that and tell the user what happened.
err_code = err.response.json().get("error", {}).get("code", "")
if err_code in ("request_too_large", "context_length_exceeded", "string_above_max_length"):
# Handle OpenAI and Groq token limit exceeded
# OpenAI will return `string_above_max_length` for prompts more than 1M characters
message = "".join(
[
"We sent too large request to the LLM, resulting in an error. ",
"This is usually caused by including framework files in an LLM request. ",
"Here's how you can get GPT Pilot to ignore those extra files: ",
"https://bit.ly/faq-token-limit-error",
]
)
raise APIError(message) from err
log.warning(f"API error: {err}", exc_info=True)
request_log.error = str(f"API error: {err}")
request_log.status = LLMRequestStatus.ERROR
return None, request_log
request_log.response = response
request_log.prompt_tokens += prompt_tokens
request_log.completion_tokens += completion_tokens
if parser:
try:
response = parser(response)
break
except ValueError as err:
log.debug(f"Error parsing GPT response: {err}, asking LLM to retry", exc_info=True)
convo.assistant(response)
convo.user(f"Error parsing response: {err}. Please output your response EXACTLY as requested.")
continue
else:
break
else:
log.warning(f"Failed to parse response after {max_retries} retries")
response = None
request_log.status = LLMRequestStatus.ERROR
t1 = time()
request_log.duration = t1 - t0
log.debug(
f"Total {self.provider.value} response time {request_log.duration:.2f}s, {request_log.prompt_tokens} prompt tokens, {request_log.completion_tokens} completion tokens used"
)
return response, request_log
@staticmethod
def for_provider(provider: LLMProvider) -> type["BaseLLMClient"]:
"""
Return LLM client for the specified provider.
:param provider: Provider to return the client for.
:return: Client class for the specified provider.
"""
from .anthropic_client import AnthropicClient
from .groq_client import GroqClient
from .openai_client import OpenAIClient
if provider == LLMProvider.OPENAI:
return OpenAIClient
elif provider == LLMProvider.ANTHROPIC:
return AnthropicClient
elif provider == LLMProvider.GROQ:
return GroqClient
else:
raise ValueError(f"Unsupported LLM provider: {provider.value}")
def rate_limit_sleep(self, err: Exception) -> Optional[datetime.timedelta]:
"""
Return how long we need to sleep because of rate limiting.
These are computed from the response headers that each LLM returns.
For details, check the implementation for the specific LLM. If there
are no rate limiting headers, we assume that the request should not
be retried and return None (this will be the case for insufficient
quota/funds in the account).
:param err: RateLimitError that was raised by the LLM client.
:return: optional timedelta to wait before trying again
"""
raise NotImplementedError()
__all__ = ["BaseLLMClient"]

163
core/llm/convo.py Normal file
View File

@@ -0,0 +1,163 @@
from copy import deepcopy
from typing import Iterator, Optional
class Convo:
"""
A conversation between a user and a Large Language Model (LLM) assistant.
"""
ROLES = ["system", "user", "assistant", "function"]
messages: list[dict[str, str]]
def __init__(self, content: Optional[str] = None):
"""
Initialize a new conversation.
:param content: Initial system message (optional).
"""
self.messages = []
if content is not None:
self.system(content)
@staticmethod
def _dedent(text: str) -> str:
"""
Remove common leading whitespace from every line of text.
:param text: Text to dedent.
:return: Dedented text.
"""
indent = len(text)
lines = text.splitlines()
for line in lines:
if line.strip():
indent = min(indent, len(line) - len(line.lstrip()))
dedented_lines = [line[indent:].rstrip() for line in lines]
return "\n".join(line for line in dedented_lines)
def add(self, role: str, content: str, name: Optional[str] = None) -> "Convo":
"""
Add a message to the conversation.
In most cases, you should use the convenience methods instead.
:param role: Role of the message (system, user, assistant, function).
:param content: Content of the message.
:param name: Name of the message sender (optional).
:return: The convo object.
"""
if role not in self.ROLES:
raise ValueError(f"Unknown role: {role}")
if not content:
raise ValueError("Empty message content")
if not isinstance(content, str) and not isinstance(content, dict):
raise TypeError(f"Invalid message content: {type(content).__name__}")
message = {
"role": role,
"content": self._dedent(content) if isinstance(content, str) else content,
}
if name is not None:
message["name"] = name
self.messages.append(message)
return self
def system(self, content: str, name: Optional[str] = None) -> "Convo":
"""
Add a system message to the conversation.
System messages can use `name` for showing example conversations
between an example user and an example assistant.
:param content: Content of the message.
:param name: Name of the message sender (optional).
:return: The convo object.
"""
return self.add("system", content, name)
def user(self, content: str, name: Optional[str] = None) -> "Convo":
"""
Add a user message to the conversation.
:param content: Content of the message.
:param name: User name (optional).
:return: The convo object.
"""
return self.add("user", content, name)
def assistant(self, content: str, name: Optional[str] = None) -> "Convo":
"""
Add an assistant message to the conversation.
:param content: Content of the message.
:param name: Assistant name (optional).
:return: The convo object.
"""
return self.add("assistant", content, name)
def function(self, content: str, name: Optional[str] = None) -> "Convo":
"""
Add a function (tool) response to the conversation.
:param content: Content of the message.
:param name: Function/tool name (optional).
:return: The convo object.
"""
return self.add("function", content, name)
def fork(self) -> "Convo":
"""
Create an identical copy of the conversation.
This performs a deep copy of all the message
contents, so you can safely modify both the
parent and the child conversation.
:return: A copy of the conversation.
"""
child = Convo()
child.messages = deepcopy(self.messages)
return child
def after(self, parent: "Convo") -> "Convo":
"""
Create a chat with only messages after the last common
message (that appears in both parent conversation and
this one).
:param parent: Parent conversation.
:return: A new conversation with only new messages.
"""
index = 0
while index < min(len(self.messages), len(parent.messages)) and self.messages[index] == parent.messages[index]:
index += 1
child = Convo()
child.messages = [deepcopy(msg) for msg in self.messages[index:]]
return child
def last(self) -> Optional[dict[str, str]]:
"""
Get the last message in the conversation.
:return: The last message, or None if the conversation is empty.
"""
return self.messages[-1] if self.messages else None
def __iter__(self) -> Iterator[dict[str, str]]:
"""
Iterate over the messages in the conversation.
:return: An iterator over the messages.
"""
return iter(self.messages)
def __repr__(self) -> str:
return f"<Convo({self.messages})>"
__all__ = ["Convo"]

93
core/llm/groq_client.py Normal file
View File

@@ -0,0 +1,93 @@
import datetime
from typing import Optional
import tiktoken
from groq import AsyncGroq, RateLimitError
from httpx import Timeout
from core.config import LLMProvider
from core.llm.base import BaseLLMClient
from core.llm.convo import Convo
from core.log import get_logger
log = get_logger(__name__)
tokenizer = tiktoken.get_encoding("cl100k_base")
class GroqClient(BaseLLMClient):
provider = LLMProvider.GROQ
def _init_client(self):
self.client = AsyncGroq(
api_key=self.config.api_key,
base_url=self.config.base_url,
timeout=Timeout(
max(self.config.connect_timeout, self.config.read_timeout),
connect=self.config.connect_timeout,
read=self.config.read_timeout,
),
)
async def _make_request(
self,
convo: Convo,
temperature: Optional[float] = None,
json_mode: bool = False,
) -> tuple[str, int, int]:
completion_kwargs = {
"model": self.config.model,
"messages": convo.messages,
"temperature": self.config.temperature if temperature is None else temperature,
"stream": True,
}
if json_mode:
completion_kwargs["response_format"] = {"type": "json_object"}
stream = await self.client.chat.completions.create(**completion_kwargs)
response = []
prompt_tokens = 0
completion_tokens = 0
async for chunk in stream:
if not chunk.choices:
continue
content = chunk.choices[0].delta.content
if not content:
continue
response.append(content)
if self.stream_handler:
await self.stream_handler(content)
response_str = "".join(response)
# Tell the stream handler we're done
if self.stream_handler:
await self.stream_handler(None)
if prompt_tokens == 0 and completion_tokens == 0:
# FIXME: Here we estimate Groq tokens using the same method as for OpenAI...
# See https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken
prompt_tokens = sum(3 + len(tokenizer.encode(msg["content"])) for msg in convo.messages)
completion_tokens = len(tokenizer.encode(response_str))
return response_str, prompt_tokens, completion_tokens
def rate_limit_sleep(self, err: RateLimitError) -> Optional[datetime.timedelta]:
"""
Groq rate limits docs: https://console.groq.com/docs/rate-limits
Groq includes `retry-after` header when 429 RateLimitError is
thrown, so we use that instead of calculating our own backoff time.
"""
headers = err.response.headers
if "retry-after" not in headers:
return None
retry_after = int(err.response.headers["retry-after"])
return datetime.timedelta(seconds=retry_after)
__all__ = ["GroqClient"]

116
core/llm/openai_client.py Normal file
View File

@@ -0,0 +1,116 @@
import datetime
import re
from typing import Optional
import tiktoken
from httpx import Timeout
from openai import AsyncOpenAI, RateLimitError
from core.config import LLMProvider
from core.llm.base import BaseLLMClient
from core.llm.convo import Convo
from core.log import get_logger
log = get_logger(__name__)
tokenizer = tiktoken.get_encoding("cl100k_base")
class OpenAIClient(BaseLLMClient):
provider = LLMProvider.OPENAI
def _init_client(self):
self.client = AsyncOpenAI(
api_key=self.config.api_key,
base_url=self.config.base_url,
timeout=Timeout(
max(self.config.connect_timeout, self.config.read_timeout),
connect=self.config.connect_timeout,
read=self.config.read_timeout,
),
)
async def _make_request(
self,
convo: Convo,
temperature: Optional[float] = None,
json_mode: bool = False,
) -> tuple[str, int, int]:
completion_kwargs = {
"model": self.config.model,
"messages": convo.messages,
"temperature": self.config.temperature if temperature is None else temperature,
"stream": True,
"stream_options": {
"include_usage": True,
},
}
if json_mode:
completion_kwargs["response_format"] = {"type": "json_object"}
stream = await self.client.chat.completions.create(**completion_kwargs)
response = []
prompt_tokens = 0
completion_tokens = 0
async for chunk in stream:
if chunk.usage:
prompt_tokens += chunk.usage.prompt_tokens
completion_tokens += chunk.usage.completion_tokens
if not chunk.choices:
continue
content = chunk.choices[0].delta.content
if not content:
continue
response.append(content)
if self.stream_handler:
await self.stream_handler(content)
response_str = "".join(response)
# Tell the stream handler we're done
if self.stream_handler:
await self.stream_handler(None)
if prompt_tokens == 0 and completion_tokens == 0:
# See https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken
prompt_tokens = sum(3 + len(tokenizer.encode(msg["content"])) for msg in convo.messages)
completion_tokens = len(tokenizer.encode(response_str))
log.warning(
"OpenAI response did not include token counts, estimating with tiktoken: "
f"{prompt_tokens} input tokens, {completion_tokens} output tokens"
)
return response_str, prompt_tokens, completion_tokens
def rate_limit_sleep(self, err: RateLimitError) -> Optional[datetime.timedelta]:
"""
OpenAI rate limits docs:
https://platform.openai.com/docs/guides/rate-limits/error-mitigation
Limit reset times are in "2h32m54s" format.
"""
headers = err.response.headers
if "x-ratelimit-remaining-tokens" not in headers:
return None
remaining_tokens = headers["x-ratelimit-remaining-tokens"]
time_regex = r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?"
if int(remaining_tokens) == 0:
match = re.search(time_regex, headers["x-ratelimit-reset-tokens"])
else:
match = re.search(time_regex, headers["x-ratelimit-reset-requests"])
if match:
# Each group in the "2h32m54s" format is optional, so treat missing parts as 0
seconds = int(match.group(1) or 0) * 3600 + int(match.group(2) or 0) * 60 + int(match.group(3) or 0)
else:
# Not sure how this would happen: we'd have to get a RateLimitError but
# nothing (or an invalid entry) in the `reset` field. Use a sane default.
seconds = 5
return datetime.timedelta(seconds=seconds)
__all__ = ["OpenAIClient"]

161
core/llm/parser.py Normal file
View File

@@ -0,0 +1,161 @@
import json
import re
from enum import Enum
from typing import Optional, Union
from pydantic import BaseModel, ValidationError
class MultiCodeBlockParser:
"""
Parse multiple Markdown code blocks from a string.
Expects zero or more blocks, and ignores any text
outside of the code blocks.
Example usage:
>>> parser = MultiCodeBlockParser()
>>> text = '''
... text outside block
...
... ```python
... first block
... ```
... some text between blocks
... ```js
... more
... code
... ```
... some text after blocks
... '''
>>> assert parser(text) == ["first block", "more\ncode"]
If no code blocks are found, an empty list is returned.
"""
def __init__(self):
# FIXME: ``` should be the only content on the line
self.pattern = re.compile(r"```([a-z0-9]+\n)?(.*?)```\s*", re.DOTALL)
def __call__(self, text: str) -> list[str]:
blocks = []
for block in self.pattern.findall(text):
blocks.append(block[1].strip())
return blocks
class CodeBlockParser(MultiCodeBlockParser):
"""
Parse a Markdown code block from a string.
Expects exactly one code block, and ignores
any text before or after it.
Usage:
>>> parser = CodeBlockParser()
>>> text = "text\n```py\ncodeblock\n'''\nmore text"
>>> assert parser(text) == "codeblock"
This is a special case of MultiCodeBlockParser,
checking that there's exactly one block.
"""
def __call__(self, text: str) -> str:
blocks = super().__call__(text)
# FIXME: if there are more than 1 code block, this means the output actually contains ```,
# so re-parse this with that in mind
if len(blocks) != 1:
raise ValueError(f"Expected a single code block, got {len(blocks)}")
return blocks[0]
class OptionalCodeBlockParser:
def __call__(self, text: str) -> str:
text = text.strip()
if text.startswith("```") and text.endswith("\n```"):
# Remove the first and last line. Note the first line may include syntax
# highlighting, so we can't just remove the first 3 characters.
text = "\n".join(text.splitlines()[1:-1]).strip()
return text
class JSONParser:
def __init__(self, spec: Optional[BaseModel] = None, strict: bool = True):
self.spec = spec
self.strict = strict or (spec is not None)
@property
def schema(self):
return self.spec.model_json_schema() if self.spec else None
@staticmethod
def errors_to_markdown(errors: list) -> str:
error_txt = []
for error in errors:
loc = ".".join(str(loc) for loc in error["loc"])
etype = error["type"]
msg = error["msg"]
error_txt.append(f"- `{loc}`: {etype} ({msg})")
return "\n".join(error_txt)
def __call__(self, text: str) -> Union[BaseModel, dict, None]:
text = text.strip()
if text.startswith("```"):
try:
text = CodeBlockParser()(text)
except ValueError:
if self.strict:
raise
else:
return None
try:
data = json.loads(text.strip())
except json.JSONDecodeError as e:
if self.strict:
raise ValueError(f"JSON is not valid: {e}") from e
else:
return None
if self.spec is None:
return data
try:
model = self.spec(**data)
except ValidationError as err:
errtxt = self.errors_to_markdown(err.errors())
raise ValueError(f"Invalid JSON format:\n{errtxt}") from err
except Exception as err:
raise ValueError(f"Error parsing JSON: {err}") from err
return model
class EnumParser:
def __init__(self, spec: Enum, ignore_case: bool = True):
self.spec = spec
self.ignore_case = ignore_case
def __call__(self, text: str) -> Enum:
text = text.strip()
if self.ignore_case:
text = text.lower()
try:
return self.spec(text)
except ValueError as e:
options = ", ".join([str(v) for v in self.spec])
raise ValueError(f"Invalid option '{text}'; valid options: {options}") from e
class StringParser:
def __call__(self, text: str) -> str:
# Strip any leading and trailing whitespace
text = text.strip()
# Check and remove quotes at the start and end if they match
if text.startswith(("'", '"')) and text.endswith(("'", '"')) and len(text) > 1:
# Remove the first and last character if they are both quotes
if text[0] == text[-1]:
text = text[1:-1]
return text
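A minimal sketch of parsing an LLM response wrapped in a code fence (passing a pydantic model as `spec` would additionally validate the fields and return a model instance):
parser = JSONParser()
parser('```json\n{"description": "Set up CI"}\n```')  # -> {"description": "Set up CI"}
# A malformed response raises ValueError, which feeds the LLM retry loop in BaseLLMClient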

48
core/llm/prompt.py Normal file
View File

@@ -0,0 +1,48 @@
from os.path import isdir
from typing import Any, Optional
from jinja2 import BaseLoader, Environment, FileSystemLoader, StrictUndefined, TemplateNotFound
class FormatTemplate:
def __call__(self, template: str, **kwargs: dict[str, Any]) -> str:
return template.format(**kwargs)
class BaseJinjaTemplate:
def __init__(self, loader: Optional[BaseLoader]):
self.env = Environment(
loader=loader,
autoescape=False,
lstrip_blocks=True,
trim_blocks=True,
keep_trailing_newline=True,
undefined=StrictUndefined,
)
class JinjaStringTemplate(BaseJinjaTemplate):
def __init__(self):
super().__init__(None)
def __call__(self, template: str, **kwargs: dict[str, Any]) -> str:
tpl = self.env.from_string(template)
return tpl.render(**kwargs)
class JinjaFileTemplate(BaseJinjaTemplate):
def __init__(self, template_dirs: list[str]):
for td in template_dirs:
if not isdir(td):
raise ValueError(f"Template directory does not exist: {td}")
super().__init__(FileSystemLoader(template_dirs))
def __call__(self, template: str, **kwargs: dict[str, Any]) -> str:
try:
tpl = self.env.get_template(template)
except TemplateNotFound as err:
raise ValueError(f"Template not found: {template}") from err
return tpl.render(**kwargs)
__all__ = ["FormatTemplate", "JinjaStringTemplate", "JinjaFileTemplate"]

28
core/llm/request_log.py Normal file
View File

@@ -0,0 +1,28 @@
from datetime import datetime
from enum import Enum
from pydantic import BaseModel, Field
from core.config import LLMProvider
class LLMRequestStatus(str, Enum):
SUCCESS = "success"
ERROR = "error"
class LLMRequestLog(BaseModel):
provider: LLMProvider
model: str
temperature: float
messages: list[dict[str, str]] = Field(default_factory=list)
response: str = ""
prompt_tokens: int = 0
completion_tokens: int = 0
started_at: datetime = Field(default_factory=datetime.now)
duration: float = 0.0
status: LLMRequestStatus = LLMRequestStatus.SUCCESS
error: str = ""
__all__ = ["LLMRequestLog", "LLMRequestStatus"]

50
core/log/__init__.py Normal file
View File

@@ -0,0 +1,50 @@
from logging import FileHandler, Formatter, Logger, StreamHandler, getLogger
from core.config import LogConfig
def setup(config: LogConfig, force: bool = False):
"""
Set up logging based on the current configuration.
This function is idempotent unless `force` is set to True,
in which case it will reconfigure the logging.
"""
root = getLogger()
logger = getLogger("pythagora")
# Only clear/remove existing log handlers if we're forcing a new setup
if not force and (root.handlers or logger.handlers):
return
while force and root.handlers:
root.removeHandler(root.handlers[0])
while force and logger.handlers:
logger.removeHandler(logger.handlers[0])
level = config.level
formatter = Formatter(config.format)
if config.output:
handler = FileHandler(config.output, encoding="utf-8")
else:
handler = StreamHandler()
handler.setFormatter(formatter)
handler.setLevel(level)
logger.setLevel(level)
logger.addHandler(handler)
def get_logger(name) -> Logger:
"""
Get a logger for the given (module) name
:return: Logger instance
"""
return getLogger(name)
__all__ = ["setup", "get_logger"]

0
core/proc/__init__.py Normal file
View File

21
core/proc/exec_log.py Normal file
View File

@@ -0,0 +1,21 @@
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, Field
class ExecLog(BaseModel):
started_at: datetime = Field(default_factory=datetime.now)
duration: float = Field(description="The duration of the command/process run in seconds")
cmd: str = Field(description="The full command (as executed in the shell)")
cwd: str = Field(description="The working directory for the command (relative to project root)")
env: dict = Field(description="The environment variables for the command")
timeout: Optional[float] = Field(description="The command timeout in seconds (or None if no timeout)")
status_code: Optional[int] = Field(description="The command return code, or None if there was a timeout")
stdout: str = Field(description="The command standard output")
stderr: str = Field(description="The command standard error")
analysis: str = Field(description="The result analysis as performed by the LLM")
success: bool = Field(description="Whether the command was successful")
__all__ = ["ExecLog"]

View File

@@ -0,0 +1,278 @@
import asyncio
import signal
import sys
import time
from dataclasses import dataclass
from os import getenv
from os.path import abspath, join
from typing import Callable, Optional
from uuid import UUID, uuid4
import psutil
from core.log import get_logger
log = get_logger(__name__)
NONBLOCK_READ_TIMEOUT = 0.01
BUSY_WAIT_INTERVAL = 0.1
WATCHER_IDLE_INTERVAL = 1.0
MAX_COMMAND_TIMEOUT = 180
@dataclass
class LocalProcess:
id: UUID
cmd: str
cwd: str
env: dict[str, str]
stdout: str
stderr: str
_process: asyncio.subprocess.Process
def __hash__(self) -> int:
return hash(self.id)
@staticmethod
async def start(
cmd: str,
*,
cwd: str = ".",
env: dict[str, str],
bg: bool = False,
) -> "LocalProcess":
log.debug(f"Starting process: {cmd} (cwd={cwd}, env={env})")
_process = await asyncio.create_subprocess_shell(
cmd,
cwd=cwd,
env=env,
start_new_session=bg,
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
if bg:
_process.stdin.close()
return LocalProcess(
id=uuid4(),
cmd=cmd,
cwd=cwd,
env=env,
stdout="",
stderr="",
_process=_process,
)
async def wait(self, timeout: Optional[float] = None) -> int:
try:
future = self._process.wait()
if timeout:
future = asyncio.wait_for(future, timeout)
retcode = await future
except asyncio.TimeoutError:
log.debug(f"Process {self.cmd} still running after {timeout}s, terminating")
await self.terminate()
# FIXME: this may still hang if we don't manage to kill the process.
retcode = await self._process.wait()
await self.read_output()
return retcode
@staticmethod
async def _nonblock_read(reader: asyncio.StreamReader, timeout: float) -> str:
"""
Reads data from a stream reader without blocking (for long).
This wraps the read in a (short) timeout to avoid blocking the event loop for too long.
:param reader: Async stream reader to read from.
:param timeout: Timeout for the read operation (should not be too long).
:return: Data read from the stream reader, or empty string.
"""
try:
data = await asyncio.wait_for(reader.read(), timeout)
return data.decode("utf-8", errors="ignore")
except asyncio.TimeoutError:
return ""
async def read_output(self, timeout: float = NONBLOCK_READ_TIMEOUT) -> tuple[str, str]:
new_stdout = await self._nonblock_read(self._process.stdout, timeout)
new_stderr = await self._nonblock_read(self._process.stderr, timeout)
self.stdout += new_stdout
self.stderr += new_stderr
return (new_stdout, new_stderr)
async def _terminate_process_tree(self, signal: int):
# Terminate the entire process tree rooted at the shell process: list all
# descendant processes (recursively), signal each of them, then signal the
# shell process itself and wait briefly for them to exit.
shell_process = psutil.Process(self._process.pid)
processes = shell_process.children(recursive=True)
processes.append(shell_process)
for proc in processes:
try:
proc.send_signal(signal)
except psutil.NoSuchProcess:
pass
psutil.wait_procs(processes, timeout=1)
async def terminate(self, kill: bool = True):
if kill and sys.platform != "win32":
await self._terminate_process_tree(signal.SIGKILL)
else:
# Windows doesn't have SIGKILL
await self._terminate_process_tree(signal.SIGTERM)
@property
def is_running(self) -> bool:
try:
return psutil.Process(self._process.pid).is_running()
except psutil.NoSuchProcess:
return False
@property
def pid(self) -> int:
return self._process.pid
class ProcessManager:
def __init__(
self,
*,
root_dir: str,
env: Optional[dict[str, str]] = None,
output_handler: Optional[Callable] = None,
exit_handler: Optional[Callable] = None,
):
if env is None:
env = {
"PATH": getenv("PATH"),
}
self.processes: dict[UUID, LocalProcess] = {}
self.default_env = env
self.root_dir = root_dir
self.watcher_should_run = True
self.watcher_task = asyncio.create_task(self.watcher())
self.output_handler = output_handler
self.exit_handler = exit_handler
async def stop_watcher(self):
"""
Stop the process watcher.
This should only be done when the ProcessManager is no longer needed.
"""
if not self.watcher_should_run:
raise ValueError("Process watcher is not running")
self.watcher_should_run = False
await self.watcher_task
async def watcher(self):
"""
Watch over the processes and manage their output and lifecycle.
This is a separate coroutine running independently of the caller
coroutine.
"""
# IDs of processes whose output has been fully read after they finished
complete_processes = set()
while self.watcher_should_run:
procs = [p for p in self.processes.values() if p.id not in complete_processes]
if len(procs) == 0:
await asyncio.sleep(WATCHER_IDLE_INTERVAL)
continue
for process in procs:
out, err = await process.read_output()
if self.output_handler and (out or err):
await self.output_handler(out, err)
if not process.is_running:
# We're not removing the complete process from the self.processes
# list to give time to the rest of the system to read its outputs
complete_processes.add(process.id)
if self.exit_handler:
await self.exit_handler(process)
# Sleep a bit to avoid busy-waiting
await asyncio.sleep(BUSY_WAIT_INTERVAL)
async def start_process(
self,
cmd: str,
*,
cwd: str = ".",
env: Optional[dict[str, str]] = None,
bg: bool = True,
) -> LocalProcess:
env = {**self.default_env, **(env or {})}
abs_cwd = abspath(join(self.root_dir, cwd))
process = await LocalProcess.start(cmd, cwd=abs_cwd, env=env, bg=bg)
if bg:
self.processes[process.id] = process
return process
async def run_command(
self,
cmd: str,
*,
cwd: str = ".",
env: Optional[dict[str, str]] = None,
timeout: float = MAX_COMMAND_TIMEOUT,
) -> tuple[Optional[int], str, str]:
"""
Run command and wait for it to finish.
Status code is an integer representing the process exit code, or
None if the process timed out and was terminated.
:param cmd: Command to run.
:param cwd: Working directory.
:param env: Environment variables.
:param timeout: Timeout in seconds.
:return: Tuple of (status code, stdout, stderr).
"""
timeout = min(timeout, MAX_COMMAND_TIMEOUT)
terminated = False
process = await self.start_process(cmd, cwd=cwd, env=env, bg=False)
t0 = time.time()
while process.is_running and (time.time() - t0) < timeout:
out, err = await process.read_output(BUSY_WAIT_INTERVAL)
if self.output_handler and (out or err):
await self.output_handler(out, err)
if process.is_running:
log.debug(f"Process {cmd} still running after {timeout}s, terminating")
await process.terminate()
terminated = True
else:
await process.wait()
out, err = await process.read_output()
if self.output_handler and (out or err):
await self.output_handler(out, err)
if terminated:
status_code = None
else:
status_code = process._process.returncode or 0
return (status_code, process.stdout, process.stderr)
def list_running_processes(self):
return [p for p in self.processes.values() if p.is_running]
async def terminate_process(self, process_id: UUID) -> tuple[str, str]:
if process_id not in self.processes:
raise ValueError(f"Process {process_id} not found")
process = self.processes[process_id]
await process.terminate(kill=False)
del self.processes[process_id]
return (process.stdout, process.stderr)
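# Minimal usage sketch, illustrative only; the command is an arbitrary example.
# The manager must be created inside a running event loop because its
# constructor schedules the watcher task.
if __name__ == "__main__":

    async def _demo():
        manager = ProcessManager(root_dir=".")
        status_code, stdout, _stderr = await manager.run_command("echo hello")
        print(status_code, stdout.strip())
        await manager.stop_watcher()

    asyncio.run(_demo())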

View File

@@ -0,0 +1,68 @@
You're designing the architecture and technical specifications for a new project.
If the project requirements call for a specific technology, use that. Otherwise, if working on a web app, prefer Node.js for the backend (with Express if a web server is needed, and MongoDB if a database is needed), and Bootstrap for the front-end. You MUST NOT use Docker, Kubernetes, microservices and single-page app frameworks like React, Next.js, Angular, Vue or Svelte unless the project details explicitly require it.
Here are the details for the new project:
-----------------------------
{% include "partials/project_details.prompt" %}
{% include "partials/features_list.prompt" %}
-----------------------------
Based on these details, think step by step to design the architecture for the project and choose technologies to use in building it.
1. First, design and describe the project architecture in general terms
2. Then, list any system dependencies that should be installed on the system prior to the start of development. For each system dependency, output a {{ os }} command to check whether it's installed.
3. Finally, list any other 3rd party packages or libraries that will be used (these will be installed later using a package manager in the project repository/environment).
4. {% if templates %}Optionally, choose a project starter template.{% else %}(for this project there are no available starter/boilerplate templates, so there's no template to choose){% endif %}
{% if templates %}
You have an option to use a project template that implements standard boilerplate/scaffolding so you can start faster and be more productive. To be considered, a template must be compatible with the architecture and technologies you've chosen (it doesn't need to implement everything that will be used in the project, just a useful subset). If multiple templates can be considered, pick the one that's the best match.
If no project templates are a good match, don't pick any! It's better to start from scratch than to use a template that is not a good fit for the project and then spend time reworking it to fit the requirements.
Here are the available project templates:
{% for name, tpl in templates.items() %}
### {{ name }}
{{ tpl.description }}
Contains:
{{ tpl.summary }}
{% endfor %}
{% endif %}
*IMPORTANT*: You must follow these rules while creating your project:
* You must only list *system* dependencies, ie. the ones that need to be installed (typically as admin) to set up the programming language, database, etc. Any packages that will need to be installed via language/platform-specific package managers are *not* system dependencies.
* If there are several popular options (such as Nginx or Apache for web server), pick one that would be more suitable for the app in question.
* DO NOT include text editors, IDEs, shells, OpenSSL, CLI tools such as git, AWS, or Stripe clients, or other utilities in your list; include only direct dependencies required to build and run the project.
* If a dependency (such as a database) has a cloud alternative or can be installed on another computer (ie. isn't required on this computer), you must mark it as `required_locally: false`
Output only your response in JSON format like in this example, without other commentary:
```json
{
"architecture": "Detailed description of the architecture of the application",
"system_dependencies": [
{
"name": "Node.js",
"description": "JavaScript runtime for building apps. This is required to be able to run the app you're building.",
"test": "node --version",
"required_locally": true
},
{
"name": "MongoDB",
"description": "NoSQL database. If you don't want to install MongoDB locally, you can use a cloud version such as MongoDB Atlas.",
"test": "mongosh --version",
"required_locally": false
},
...
],
"package_dependencies": [
{
"name": "express",
"description": "Express web server for Node"
},
...
],
"template": "name of the project template to use" // or null if you decide not to use a project template
}
```

View File

@@ -0,0 +1,2 @@
{# This is the same template as for Developer's breakdown because Code Monkey is reusing it in a conversation #}
{% extends "developer/breakdown.prompt" %}

View File

@@ -0,0 +1,26 @@
Your task is to explain the functionality implemented by a particular source code file.
Given a file path and file contents, your output should contain:
* a detailed explanation of what the file is about;
* a list of all other files referenced (imported) from this file. Note that some languages, frameworks or libraries assume the file extension and don't use it explicitly. For example, "import foo" in Python references "foo.py" without specifying the extension. In your response, use the complete file name including the implied extension (for example "foo.py", not just "foo").
Please analyze file `{{ path }}`, which contains the following content:
```
{{ content }}
```
Output the result in a JSON format with the following structure, as in this example:
Example:
{
"summary": "Describe in detail the functionality being defind o implemented in this file. Be as detailed as possible",
"references": [
"some/file.py",
"some/other/file.js"
],
}
**IMPORTANT** In references, only include references to files that are local to the project. Do not include standard libraries or well-known external dependencies.
Your response must be a valid JSON document, following the example format. Do not add any extra explanation or commentary outside the JSON document.

View File

@@ -0,0 +1,56 @@
{% if rework_feedback is defined %}
You previously made changes to file `{{ file_name }}`, according to the instructions described in the previous message.
The reviewer accepted some of your changes, and the file now looks like this:
```
{{ file_content }}
```
{% elif file_content %}
I need to modify file `{{ file_name }}` that currently looks like this:
```
{{ file_content }}
```
{% else %}
I need to create a new file `{{ file_name }}`:
{% endif %}
**IMPORTANT**
{% if rework_feedback is defined %}
But not all changes were accepted, and the reviewer provided feedback on the changes that you must rework:
{{ rework_feedback }}
Please update the file accordingly and output the full new version of the file.
{% else %}
I want you to implement changes described in previous message, that starts with `{{ " ".join(instructions.split()[:5]) }}` and ends with `{{ " ".join(instructions.split()[-5:]) }}`.
{% endif %}
Make sure you don't make any mistakes, especially ones that could affect the rest of the project. Your changes will {% if rework_feedback is defined %}again {% endif %}be reviewed by a very thorough reviewer. Because of that, it is extremely important that you STRICTLY follow ALL of the following rules while implementing changes:
**IMPORTANT** Output format
You must output the COMPLETE NEW VERSION of this file in the following format:
-----------------------format----------------------------
```
the full contents of the updated file, without skipping over any content
```
------------------------end_of_format---------------------------
**IMPORTANT** Comprehensive Codebase Insight
It's crucial to grasp the full scope of the codebase related to your tasks to avert mistakes. Check the initial conversation message for a list of files. Pay a lot of attention to files that are directly included in the file you are currently modifying or that are importing your file.
Consider these examples to guide your approach and thought process:
-----------------------start_of_examples----------------------------
- UI components or templates: Instead of placing scripts directly on specific pages, integrating them in the <head> section or as reusable partials enhances application-wide consistency and reusability.
- Database operations: Be careful not to execute an action, like password hashing, both in a routing function and a model's pre('save') hook, which could lead to redundancy and errors.
- Adding backend logic: Prior to creating new functions, verify if an equivalent function exists in the codebase that you could import and use, preventing unnecessary code duplication and keeping the project efficient.
-----------------------end_of_examples----------------------------
**IMPORTANT** Coding principles
To write high-quality code, first organize it logically with clear, meaningful names for variables, functions, and classes. Aim for simplicity and adhere to the DRY (Don't Repeat Yourself) principle to avoid code duplication. Ensure your codebase is structured and modular for easy navigation and updates.
**IMPORTANT** If the instructions have comments like `// ..add code here...` or `# placeholder for code`, instead of copying the comment, interpret the instructions and output the relevant code.
**IMPORTANT** Your reply MUST NOT omit any code in the new implementation or substitute anything with comments like `// .. rest of the code goes here ..` or `# insert existing code here`, because I will overwrite the existing file with the content you provide. Output ONLY the content for this file, without additional explanation, suggestions or notes. Your output MUST start with ``` and MUST end with ``` and include only the complete file contents.
**IMPORTANT** For hardcoded configuration values that the user needs to change, mark the line that needs user configuration with `INPUT_REQUIRED {config_description}` comment, where `config_description` is a description of the value that needs to be set by the user. Use appropriate syntax for comments in the file you're saving (for example `// INPUT_REQUIRED {config_description}` in JavaScript). NEVER ask the user to write code or provide implementation, even if the instructions suggest it! If the file type doesn't support comments (eg JSON), don't add any.
**IMPORTANT**: Logging
Whenever you write code, make sure to log code execution so that when a developer looks at the CLI output, they can understand what is happening on the server. If the description above mentions the exact code that needs to be added but doesn't contain enough logs, you need to add the logging statements inside that code yourself.
**IMPORTANT**: Error handling
Whenever you write code, make sure to add error handling for all edge cases you can think of because this app will be used in production so there shouldn't be any crashes. Whenever you log the error, you **MUST** log the entire error message and trace and not only the error message. If the description above mentions the exact code that needs to be added but doesn't contain enough error handlers, you need to add the error handlers inside that code yourself.

View File

@@ -0,0 +1,17 @@
Your changes have been reviewed.
{% if content != original_content %}
The reviewer approved and applied some of your changes, but requested you rework the others.
Here's the file with the approved changes already applied:
```
{{ content }}
```
Here's the reviewer's feedback:
{% else %}
The reviewer requested that you rework your changes, here's the feedback:
{% endif %}
{{ rework_feedback }}
Based on this feedback and the original instructions, think carefully, make the correct changes, and output the entire file again. Remember, output ONLY the content for this file, without additional explanation, suggestions or notes. Your output MUST start with ``` and MUST end with ``` and include only the complete file contents.

View File

@@ -0,0 +1,3 @@
You are a full stack software developer that works in a software development agency.
You write modular, clean, maintainable, production-ready code.
Your job is to implement tasks that your tech lead assigns you.

View File

@@ -0,0 +1,2 @@
{# This is the same template as for Developer's breakdown because Code Reviewer is reusing it in a conversation #}
{% extends "developer/breakdown.prompt" %}

View File

@@ -0,0 +1,29 @@
A developer on your team has been working on the task described in previous message. Based on those instructions, the developer has made changes to file `{{ file_name }}`.
Here is the original content of this file:
```
{{ old_content }}
```
Here is the diff of the changes:
{% for hunk in hunks %}## Hunk {{ loop.index }}
```diff
{{ hunk }}
```
{% endfor %}
As you can see, there {% if hunks|length == 1 %}is only one hunk in this diff, and it{% else %}are {{hunks|length}} hunks in this diff, and each{% endif %} starts with the `@@` header line.
When reviewing the code changes, apply these principles to decide on each hunk:
- Apply: Approve and integrate the hunk into our core codebase if it accurately delivers the intended functionality or enhancement, aligning with our project objectives. This action confirms the change is beneficial and meets our quality standards.
- Ignore: Use this option sparingly, only when you're certain the entire hunk is incorrect or will introduce errors (logical, syntax, etc.) that could negatively impact the project. Ignoring means the hunk will be completely removed. This should be reserved for cases where the inclusion of the code is definitively more harmful than its absence. Emphasize careful consideration before choosing 'Ignore.' It's crucial for situations where the hunk's removal is the only option to prevent significant issues. Otherwise, 'Rework' might be the better choice to ensure the code's integrity and functionality.
- Rework: Suggest this option if the concept behind the change is valid and necessary but is implemented in a way that introduces problems. This indicates a need for a revision of the hunk to refine its integration without fully discarding the underlying idea.
When deciding what should be done with the hunk you are currently reviewing, pick an option that most reviewers of your skill would choose. Your decisions have to be consistent.
Keep in mind you're just reviewing current file. You don't need to consider if other files are created, dependent packages installed, etc. Focus only on reviewing the changes in this file based on the instructions in the previous message.
Note that the developer may add, modify or delete logging (including `gpt_pilot_debugging_log`) or error handling that's not explicitly asked for, but is a part of good development practice. Unless these logging and error handling additions break something, your decision to apply, ignore or rework the hunk should not be based on this. Base your decision only on functional changes - comments or logging are less important. Importantly, don't ask for a rework just because of logging or error handling changes. Also, take into account this is a junior developer and while the approach they take may not be the best practice, if it's not *wrong*, let it pass. Ask for rework only if the change is clearly bad and would break something.
The developer that wrote this is sometimes sloppy and could have deleted parts of the code that contain important functionality and should not be removed. Pay special attention to that in your review.

View File

@@ -0,0 +1,2 @@
You are a world class full stack software developer. You write modular, clean, maintainable, production-ready code.
Your job is to review changes implemented by your junior team members.

View File

@@ -0,0 +1,34 @@
You are working on an app called "{{ state.branch.project.name }}" and you need to write code for the entire {% if state.epics|length > 1 %}feature{% else %}app{% endif %} based on the tasks that the tech lead gives you. So that you understand better what you're working on, you're given other specs for "{{ state.branch.project.name }}" as well.
{% include "partials/project_details.prompt" %}
{% include "partials/features_list.prompt" %}
{% include "partials/files_list.prompt" %}
We've broken the development of this {% if state.epics|length > 1 %}feature{% else %}app{% endif %} down to these tasks:
```
{% for task in state.tasks %}
{{ loop.index }}. {{ task.description }}{% if task.get("completed") %} (completed){% endif %}
{% endfor %}
```
You are currently working on task #{{ current_task_index + 1 }} with the following description:
```
{{ task.description }}
```
{% if current_task_index != 0 %}All previous tasks are finished and you don't have to work on them.{% endif %}
Now, tell me all the code that needs to be written to implement ONLY this task and have it fully working, and all the commands that need to be run to implement this task.
**IMPORTANT**
{%- if state.epics|length == 1 %}
Remember, I created an empty folder where I will start writing files that you tell me and that are needed for this app.
{% endif %}
{% include "partials/relative_paths.prompt" %}
DO NOT specify commands to create any folders or files; they will be created automatically - just specify the relative path to each file that needs to be written.
{% include "partials/file_naming.prompt" %}
{% include "partials/execution_order.prompt" %}
{% include "partials/human_intervention_explanation.prompt" %}
{% include "partials/file_size_limit.prompt" %}
Never use port 5000 to run the app; it's reserved.

View File

@@ -0,0 +1,16 @@
We're starting work on a new task for a project we're working on.
{% include "partials/project_details.prompt" %}
{% include "partials/files_list.prompt" %}
{% include "partials/relative_paths.prompt" %}
We've broken the development of the project down to these tasks:
```
{% for task in state.tasks %}
{{ loop.index }}. {{ task.description }}{% if task.get("completed") %} (completed){% endif %}
{% endfor %}
```
The next task we need to work on is: {{ current_task.description }}
Before we dive into solving this task, we need to determine which files from the above list are relevant to this task. Output the relevant files in a JSON list.

View File

@@ -0,0 +1 @@
{% extends "troubleshooter/iteration.prompt" %}

View File

@@ -0,0 +1,43 @@
Ok, now, take your response and convert it to a list of actionable steps that will be executed by a machine.
Analyze the entire message, think step by step and make sure that you don't omit any information
when converting this message to steps.
Each step can be either:
* `command` - command to run (must be able to run on a {{ os }} machine, assume current working directory is project root folder)
* `save_file` - create or update ONE file
* `human_intervention` - if you need the human to do something, use this type of step and explain in details what you want the human to do. NEVER use `human_intervention` for testing, as testing will be done separately by a dedicated QA after all the steps are done. Also you MUST NOT use `human_intervention` to ask the human to write or review code.
**IMPORTANT**: If multiple changes are required for the same file, you must provide a single `save_file` step for each file.
{% include "partials/file_naming.prompt" %}
{% include "partials/relative_paths.prompt" %}
{% include "partials/execution_order.prompt" %}
{% include "partials/human_intervention_explanation.prompt" %}
**IMPORTANT**: Remember, NEVER output human intervention steps to do manual tests or coding tasks, even if the previous message asks for it! The testing will be done *after* these steps and you MUST NOT include testing in these steps.
Examples:
------------------------example_1---------------------------
```
{
"tasks": [
{
"type": "save_file",
"save_file": {
"path": "server.js"
},
},
{
"type": "command",
"command": {
"command": "mv index.js public/index.js"",
"timeout": 5,
"success_message": "",
"command_id": "move_index_file"
}
}
]
}
```
------------------------end_of_example_1---------------------------

View File

@@ -0,0 +1,5 @@
You are a world class full stack software developer working in a team.
You write modular, well-organized code split across files that are not too big, so that the codebase is maintainable. You include proper error handling and logging for your clean, readable, production-level quality code.
Your job is to implement tasks assigned by your tech lead, following task implementation instructions.

View File

@@ -0,0 +1,58 @@
A coding task has been implemented for the new project we're working on.
{% include "partials/project_details.prompt" %}
{% include "partials/files_list.prompt" %}
We've broken the development of the project down to these tasks:
```
{% for task in state.tasks %}
{{ loop.index }}. {{ task.description }}{% if task.get("completed") %} (completed){% endif %}
{% endfor %}
```
The current task is: {{ current_task.description }}
Here are the detailed instructions for the current task:
```
{{ current_task.instructions }}
```
{# FIXME: the above stands in place of a previous (task breakdown) convo, and is duplicated in define_user_review_goal, review_task and debug prompts #}
{% if task_steps and step_index is not none -%}
The current task has been split into multiple steps, and each step is one of the following:
* `command` - command to run
* `save_file` - create or update a file
* `human_intervention` - if the human needs to do something
{# FIXME: this is copypasted from ran_command #}
Here is the list of all steps in this task (steps that were already completed are marked as COMPLETED, future steps that will be executed once debugging is done are marked as FUTURE, and the current step is marked as CURRENT STEP):
{% for step in task_steps %}
* {% if loop.index0 < step_index %}(COMPLETED){% elif loop.index0 > step_index %}(FUTURE){% else %}(**CURRENT STEP**){% endif %} {{ step.type }}: `{% if step.type == 'command' %}{{ step.command.command }}{% elif step.type == 'save_file' %}{{ step.save_file.path }}{% endif %}`
{% endfor %}
When trying to see if the command ran successfully, take into consideration steps that were previously executed and steps that will be executed after the current step. It can happen that a command seems to have failed, but it will be fixed by subsequent steps. In that case you should consider the command to be successfully executed.
{%- endif %}
I ran the command `{{ cmd }}`, and it {% if status_code is none %}timed out{% else %}exited with status code {{ status_code }}{% endif %}.
{% if stdout %}
Command stdout:
```
{{ stdout }}
```
{% endif %}
{% if stderr %}
Command stderr:
```
{{ stderr }}
```
{% endif %}
{# end copypasted #}
{{ analysis }}
Based on the above, I want you to propose a step by step plan to solve the problem and continue with the current task. I will take your plan and replace the current steps with it, so make sure it contains everything needed to complete this task AND THIS TASK ONLY.
{% include "partials/file_naming.prompt" %}
{% include "partials/execution_order.prompt" %}
{% include "partials/human_intervention_explanation.prompt" %}
{% include "partials/file_size_limit.prompt" %}

View File

@@ -0,0 +1,56 @@
A coding task has been implemented for the new project we're working on.
{% include "partials/project_details.prompt" %}
{% include "partials/files_list.prompt" %}
We've broken the development of the project down to these tasks:
```
{% for task in state.tasks %}
{{ loop.index }}. {{ task.description }}{% if task.get("completed") %} (completed){% endif %}
{% endfor %}
```
The current task is: {{ current_task.description }}
Here are the detailed instructions for the current task:
```
{{ current_task.instructions }}
```
{# FIXME: the above stands in place of a previous (task breakdown) convo, and is duplicated in define_user_review_goal and debug prompts #}
{% if task_steps and step_index is not none -%}
The current task has been split into multiple steps, and each step is one of the following:
* `command` - command to run
* `save_file` - create or update a file
* `human_intervention` - if the human needs to do something
Here is the list of all steps in this task (steps that were already completed are marked as COMPLETED, future steps that will be executed once debugging is done are marked as FUTURE, and the current step is marked as CURRENT STEP):
{% for step in task_steps %}
* {% if loop.index0 < step_index %}(COMPLETED){% elif loop.index0 > step_index %}(FUTURE){% else %}(**CURRENT STEP**){% endif %} {{ step.type }}: `{% if step.type == 'command' %}{{ step.command.command }}{% elif step.type == 'save_file' %}{{ step.save_file.path }}{% endif %}`
{% endfor %}
When trying to see if the command ran successfully, take into consideration steps that were previously executed and steps that will be executed after the current step. It can happen that a command seems to have failed, but it will be fixed by subsequent steps. In that case you should consider the command to be successfully executed.
{%- endif %}
I ran the command `{{ cmd }}`, and it {% if status_code is none %}timed out{% else %}exited with status code {{ status_code }}{% endif %}.
{% if stdout %}
Command stdout:
```
{{ stdout }}
```
{% endif %}
{% if stderr %}
Command stderr:
```
{{ stderr }}
```
{% endif %}
Think about the output and result of this command in the context of the current task and current step. Provide a detailed analysis of the output and determine whether the command was successfully executed.
Output your response in the following JSON format:
```
{
"analysis": "Detailed analysis of the command results. In this error the command was successfully executed because...",
"success": true
}
```

View File

@@ -0,0 +1 @@
All the steps will be executed in the order in which you give them, so it is very important that you think about all the steps before you start listing them. For example, you should never code something before you install its dependencies, and you should never try to access a file before it exists in the project.

View File

@@ -0,0 +1,16 @@
{% if state.epics|length > 2 %}
Here is the list of features that were previously implemented on top of the initial high-level description of "{{ state.branch.project.name }}":
```
{% for feature in state.epics[1:] %}
- {{ loop.index0 }}. {{ feature.summary }}
{% endfor %}
```
{% endif %}
{% if state.epics|length > 1 %}
Here is the feature that you are implementing right now:
```
{{ state.unfinished_epics[0].description }}
```
{% endif %}

View File

@@ -0,0 +1 @@
**IMPORTANT**: When creating and naming new files, ensure the file naming (camelCase, kebab-case, underscore_case, etc) is consistent with the best practices and coding style of the language.

View File

@@ -0,0 +1,2 @@
**IMPORTANT**
When you think about which file the new code should go into, always try to keep files as small as possible, splitting code across multiple smaller files rather than putting it all in one big file.

View File

@@ -0,0 +1,26 @@
{% if state.relevant_files %}
These files are currently implemented in the project:
{% for file in state.files %}
* `{{ file.path }}{% if file.meta.get("description") %}: {{file.meta.description}}{% endif %}`
{% endfor %}
Here are the complete contents of files relevant to this task:
---START_OF_FILES---
{% for file in state.relevant_file_objects %}
File **`{{ file.path }}`** ({{file.content.content.splitlines()|length}} lines of code):
```
{{ file.content.content }}```
{% endfor %}
---END_OF_FILES---
{% elif state.files %}
These files are currently implemented in the project:
---START_OF_FILES---
{% for file in state.files %}
**`{{ file.path }}`** ({{file.content.content.splitlines()|length}} lines of code):
```
{{ file.content.content }}```
{% endfor %}
---END_OF_FILES---
{% endif %}

View File

@@ -0,0 +1,38 @@
**IMPORTANT**
You must not tell me to run a command in the database or anything OS related - only if some dependencies need to be installed. If there is a need to run an OS related command, specifically tell me that this should be labeled as "Human Intervention" and explain what the human needs to do.
Avoid using "Human Intervention" if possible. You should NOT use "Human Intervention" for anything else than steps that you can't execute. Also, you must not use "Human Intervention" to ask user to test that the application works, because this will be done separately after all the steps are finished - no need to ask the user now.
Here are a few examples when and how to use "Human Intervention":
------------------------start_of_example_1---------------------------
Here is an example of good response for the situation where it seems like 3rd party API, in this case Facebook, is not working:
* "Human Intervention"
"1. Check latest Facebook API documentation for updates on endpoints, parameters, or authentication.
2. Verify Facebook API key/authentication and request format to ensure they are current and correctly implemented.
3. Use REST client tools like Postman or cURL to directly test the Facebook API endpoints.
4. Check the Facebook API's status page for any reported downtime or service issues.
5. Try calling the Facebook API from a different environment to isolate the issue."
------------------------end_of_example_1---------------------------
------------------------start_of_example_2---------------------------
Here is an example of good response for the situation where the user needs to enable some settings in their Gmail account:
* "Human Intervention"
"To enable sending emails from your Node.js app via your Gmail, account, you need to do the following:
1. Log in to your Gmail account.
2. Go to 'Manage your Google Account' > Security.
3. Scroll down to 'Less secure app access' and turn it on.
4. Under 'Signing in to Google', select 'App Passwords'. (You may need to sign in again)
5. At the bottom, click 'Select app' and choose the app you're using.
6. Click 'Generate'.
Then, use your Gmail address and the password generated in step #6 and put it into the .env file."
------------------------end_of_example_2---------------------------
------------------------start_of_example_3---------------------------
Here is an example when there are issues with writing to the MongoDB connection:
* "Human Intervention"
"1. Verify the MongoDB credentials provided have write permissions, not just read-only access.
2. Confirm correct database and collection names are used when connecting to database.
3. Update credentials if necessary to include insert document permissions."
------------------------end_of_example_3---------------------------

View File

@@ -0,0 +1,22 @@
Here is a high level description of "{{ state.branch.project.name }}":
```
{{ state.specification.description }}
```
{% if state.specification.architecture %}
Here is a short description of the project architecture:
{{ state.specification.architecture }}
{% endif %}
{% if state.specification.system_dependencies %}
Here are the technologies that should be used for this project:
{% for tech in state.specification.system_dependencies %}
* {{ tech.name }} - {{ tech.description }}
{% endfor %}
{% endif %}
{% if state.specification.package_dependencies %}
{% for tech in state.specification.package_dependencies %}
* {{ tech.name }} - {{ tech.description }}
{% endfor %}
{% endif %}

View File

@@ -0,0 +1,67 @@
Before we go into the coding part, I want you to split the development process of creating this {{ task_type }} into smaller tasks so that it is easier to develop, debug and make the {{ task_type }} work.
Each task needs to be related only to the development of this {{ task_type }} and nothing else - once the {{ task_type }} is fully working, that is it. There shouldn't be a task for researching, deployment, writing documentation, testing or anything that is not writing the actual code.
**IMPORTANT**
As an experienced tech lead you always follow the rules on how to create tasks. Dividing a project into tasks is an extremely important job and you have to do it very carefully.
Now, based on the project details provided{% if task_type == 'feature' %} and new feature description{% endif %}, think task by task and create the entire development plan{% if task_type == 'feature' %} for new feature{% elif task_type == 'app' %}. {% if state.files %}Continue from the existing code listed above{% else %}Start from the project setup{% endif %} and specify each task until the moment when the entire app should be fully working{% if state.files %}. You should not reimplement what's already done - just continue from the implementation already there{% endif %}{% endif %} while strictly following these rules:
Rule #1
There should never be a task that is only testing or ensuring something works; every task must have coding involved. Have this in mind for every task, but it is extremely important for the last task of the project. Testing whether the {{ task_type }} works will be done as part of each task.
Rule #2
This rule applies to the complexity of tasks.
You have to make sure the project is not split into tasks that are too small or simple for no reason, but also not into tasks that are too big or complex, which would make them hard to develop, debug and review.
Keep in mind that the project already has the workspace folder created and only the system dependencies installed. You don't have to create tasks for that.
Here are examples of poorly created tasks:
**too simple tasks**
- Set up a Node.js project and install all necessary dependencies.
- Establish a MongoDB database connection using Mongoose with the IP '127.0.0.1'.
**too complex tasks**
- Set up Node.js project with /home, /profile, /register and /login routes that will have user authentication, connection to MongoDB with user schemas, mailing of new users and frontend with nice design.
You must avoid creating tasks that are too simple or too complex. Aim to create tasks of medium complexity. Here are examples of tasks that are good:
**good tasks**
- Set up a Node.js project, install all necessary dependencies and set up an express server with a simple route to `/ping` that returns the status 200.
- Establish a MongoDB database connection and implement the message schema using Mongoose for persistent storage of messages.
Rule #3
This rule applies to the number of tasks you will create.
Every {{ task_type }} should have a different number of tasks depending on its complexity. Think task by task and create the minimum number of tasks that are relevant for this specific {{ task_type }}.
{% if task_type == 'feature' %} If the feature is small, it is ok to have only 1 task.{% endif %}
Here are some examples of apps with different complexity that can give you guidance on how many tasks you should create:
Example #1:
app description: "I want to create an app that will just say 'Hello World' when I open it on my localhost:3000."
number of tasks: 1
Example #2:
app description: "Create a node.js app that enables users to register and log into the app. On frontend it should have /home (shows user data), /register and /login. It should use sessions to keep user logged in."
number of tasks: 2-4
Example #3:
app description: "A cool online shoe store, with a sleek look. In terms of data models, there are shoes, categories and user profiles. For web pages: product listing, details, shopping cart. It must look cool and jazzy."
number of tasks: 5-15
Rule #4
This rule applies to writing task 'description'.
Every task must have a clear and very detailed 'description' (a minimum of 4 sentences, but it can be more). It must be clear enough that even developers who just moved to this project could execute the task without additional questions. It is not enough to just write something like "Create a route for /home". You have to describe what needs to be done in that route, what data needs to be returned, what the status code should be, etc. Give as many details as possible and make sure no information is missing that could be needed for this task.
Here is an example of good and bad task description:
**bad task**
{
"description": "Create a route for /dashboard"
}
**good task**
{
"description": "In 'route.js' add a route for /dashboard that returns the status 200. Route should be accessible only for logged in users. In 'middlewares.js' there should be a check if user is logged in using session. If user is not logged in, it should redirect to /login. If user is logged in, it should return the user data. User data should be fetched from database in 'users' collection using the user id from session."
}
Rule #5
When creating and naming new files, ensure the file naming (camelCase, kebab-case, underscore_case, etc) is consistent with the best practices and coding style of the language.
Pay attention to file paths: if the command or argument is a file or folder from the project, use paths relative to the project root (for example, use `somedir/somefile` instead of `/somedir/somefile`).

View File

@@ -0,0 +1 @@
**IMPORTANT**: Pay attention to file paths: if the command or argument is a file or folder from the project, use paths relative to the project root (for example, use `somedir/somefile` instead of `/path/to/project/somedir/somefile`).

View File

@@ -0,0 +1,57 @@
You are working on an app called "{{ state.branch.project.name }}" and you need to write code for the entire {% if state.epics|length > 1 %}feature{% else %}app{% endif %} based on the tasks that the tech lead gives you. So that you understand better what you're working on, you're given other specs for "{{ state.branch.project.name }}" as well.
{% include "partials/project_details.prompt" %}
{% include "partials/features_list.prompt" %}
We've broken the development of this {% if state.epics|length > 1 %}feature{% else %}app{% endif %} down to these tasks:
```
{% for task in state.tasks %}
{{ loop.index }}. {{ task.description }}{% if task.get("completed") %} (completed){% endif %}
{% endfor %}
```
{% if state.current_task %}
You are currently working on, and have to focus only on, this task:
```
{{ state.current_task.description }}
```
{% endif %}
A part of the app is already finished.
{% include "partials/files_list.prompt" %}
You are trying to solve an issue that your colleague is reporting.
{% if previous_solutions|length > 0 %}
You tried {{ previous_solutions|length }} times to solve it, but were unsuccessful. In the last few attempts, your colleague gave you these reports:
{% for solution in previous_solutions[-3:] %}
----------------------------start_of_report_{{ loop.index }}----------------------------
{{ solution.user_feedback }}
----------------------------end_of_report_{{ loop.index }}----------------------------
Then, you gave the following proposal (proposal_{{ loop.index }}) of what needs to be done to fix the issue:
----------------------------start_of_proposal_{{ loop.index }}----------------------------
{{ solution.description }}
----------------------------end_of_proposal_{{ loop.index }}----------------------------
{% if not loop.last %}
Then, upon implementing these changes, your colleague came back with the following report:
{% endif %}
{% endfor %}
{% endif %}
{% if user_input != '' %}
Your colleague who is testing the app "{{ name }}" sent you this report now:
```
{{ user_input }}
```
You tried to solve this problem before but your colleague is telling you that you got into a loop where all your tries end up the same way - with an error.
{%- endif -%}
It seems that the solutions you're proposing aren't working.
Now, think step by step about 5 alternative solutions to get this code to work that are most probable to solve this issue.
Every proposed solution needs to be concrete and not vague (eg. it cannot be "Review and change the app's functionality") and based on the code changes. A solution can be complex if it's related to the same part of the code (eg. "Try changing the input variables X, Y and Z to a method N").
Order them by probability of fixing the problem, highest first. A developer will then go through the list item by item, try to implement each one, and check whether it solved the issue, until the end of the list.

View File

@@ -0,0 +1 @@
{% extends "troubleshooter/iteration.prompt" %}

View File

@@ -0,0 +1,76 @@
Your task is to talk to a new client and develop a detailed specification for a new application the client wants to build. This specification will serve as an input to an AI software developer and thus must be very detailed, contain all the project functionality and precisely define behaviour, 3rd-party integrations (if any), etc.
The AI developer prefers working on web apps using the Node/Express/MongoDB/Mongoose/EJS stack, with vanilla JS and Bootstrap on the frontend, unless the client has different requirements.
Try to avoid the use of Docker, Kubernetes, microservices and single-page app frameworks like React, Next.js, Angular, Vue or Svelte unless the brief explicitly requires it.
In your work, follow these important rules:
* In your communication with the client, be straightforward, concise, and focused on the task.
* Ask questions ONE BY ONE. This is very important, as the client is easily confused. If you were to ask multiple questions at once, the client would probably miss some of them, so remember to always ask the questions one by one.
* Ask specific questions, taking into account what you already know about the project. For example, don't ask "what features do you need?" or "describe your idea"; instead ask "what is the most important feature?"
* Pay special attention to any documentation or information that the project might require (such as accessing a custom API, etc). Be sure to ask the user to provide information and examples that the developers will need to build the proof-of-concept. You will need to output all of this in the final specification.
* This is a prototype project, so it is important to have a small and well-defined scope. If the scope seems to grow too large (beyond a week or two of work for one developer), ask the user if they can simplify the project.
* Do not address non-functional requirements (performance, deployment, security, budget, timelines, etc...). We are only concerned with functional and technical specification here.
* Do not address deployment or hosting, including DevOps tasks to set up a CI/CD pipeline
* Don't address or envision any future development (post proof-of-concept); the scope of your task is only to spec the PoC/prototype.
* If the user provided specific information on how to access 3rd party API or how exactly to implement something, you MUST include that in the specification. Remember, the AI developer will only have access to the specification you write.
Ensure that you have all the information about:
* overall description and goals for the app
* all the features of the application
* functional specification
* how the user will use the app
* enumerate all the parts of the application (eg. pages of the application, background processing if any, etc); for each part, explain *in detail* how it should work from the perspective of the user
* identify any constraints, business rules, user flows or other important info that affect how the application works or how it is used
* technical specification
* what kind of an application this is and what platform/technologies will be used
* the architecture of the application (what happens on backend, frontend, mobile, background tasks, integration with 3rd party services, etc)
* detailed description of each component of the application architecture
* integration specification
* any 3rd party apps, services, APIs that will be used (eg. for auth, payments, etc..)
* if a custom API is used, precise definitions, with examples, how to use the custom API or do the custom integration
If you identify any missing information or need clarification on any vague or ambiguous parts of the brief, ask the client about it.
Important note: don't ask trivial questions for obvious or unimportant parts of the app, for example:
* Bad questions example 1:
* Client brief: I want to build a hello world web app
* Bad questions:
* What title do you want for the web page that displays "Hello World"?
* What color and font size would you like for the "Hello World" text to be displayed in?
* Should the "Hello World" message be static text served directly from the server, or would you like it implemented via JavaScript on the client side?
* Explanation: There's no need to micromanage the developer(s) and designer(s), the client would've specified these details if they were important.
If you ask such trivial questions, the client will think you're stupid and will leave. DON'T DO THAT.
Think carefully about what a developer must know to be able to build the app. The specification must address all of this information, otherwise the AI software developer will not be able to build the app.
When you gather all the information from the client, output the complete specification. Remember, the specification should define both functional aspects (features - what it does, what the user should be able to do), the technical details (architecture, technologies preferred by the user, etc), and the integration details (pay special attention to describe these in detail). Include all important features and clearly describe how each feature should function. IMPORTANT: Do not add any preamble (eg. "Here's the specification....") or conclusion/commentary (eg. "Let me know if you have further questions")!
Here's an EXAMPLE initial prompt:
---start-of-example-output---
Online forum similar to Hacker News (news.ycombinator.com), with a simple and clean interface, where people can post links or text posts, and other people can upvote, downvote and comment on. Reading is open to anonymous users, but users must register to post, upvote, downvote or comment. Use simple username+password authentication. The forum should be implemented in Node.js with Express framework, using MongoDB and Mongoose ORM.
The UI should use EJS view engine, Bootstrap for styling and plain vanilla JavaScript. Design should be simple and look like Hacker News, with a top bar for navigation, using a blue color scheme instead of the orange color in HN. The footer in each page should just be "Built using GPT Pilot".
Each story has a title (one-line text), a link (optional, URL to an external article being shared on AI News), and text (text to show in the post). Link and text are mutually exclusive - if the submitter tries to use both, show them an error.
Use the following algorithm to rank top stories, and comments within a story: "score = upvotes - downvotes + comments - sqrt(age)", where "upvotes" and "downvotes" are the number of upvotes and downvotes the story or comment has, "comments" is the number of comments for a story (total), or the number of sub-comments (for a comment), "age" is how old the story is, in minutes, and "sqrt" is the square root function.
Implement the following pages:
* / - shows the top 20 posted stories, ranked using the scoring algorithm, with a "More" link that shows the next 20 (pagination using "p" query parameter), and so on
* /newest - shows the latest 20 posted stories, ranked chronologically (newest first), with a "More" link that shows the next 20 (pagination using "p" query parameter), and so on
* /submit - shows a form to submit a new story, upon submitting the user should get redirected to /newest
* /login - shows a login form (username, password, "login" button, and a link to register page for new users)
* /register - shows a register form (username, password, "register" button, and a link to login page for existing users)
* /item - shows the story (use "id" query parameter to pass the story ID to this route)
* /comment - shows the form to send a comment (just a textarea and "submit" button) - upon commenting, the person should get redirected to the story they commented on
The / and /newest pages should show the story title (link to the external article if "link" is set, otherwise link to the story's /item page), number of points (points = upvotes - downvotes), poster username (no link), how old the story is ("x minutes ago", "y hours ago" or "z days ago"), and "xyz comments" (link to the /item page of the story). This is basically the same way HN shows it.
The /item page should also follow the layout for HN in how it shows the story, and the comments tree. Instead of the embedded "reply" form, the story should just have a "comment" button that goes to the /comment page, similar to the "reply" link underneath each comment. Both should link to the /comment page.
---end-of-example-output---
Remember, this is important: the AI developer will not have access to the client's initial description or the transcript of your conversation. The developer will only see the specification you output at the end. It is very important that the spec captures *all* the details of the project with as much detail and precision as possible.
Note: after the client reads the specification you create, the client might have additional comments or suggestions. In this case, continue the discussion with the user until you get all the new information and output the newly updated spec again.

View File

@@ -0,0 +1,8 @@
```
{{ prompt }}
```
The above is a user prompt for an application/software tool they are trying to develop. Determine the complexity of the user's request. Do NOT respond with thoughts, reasoning, explanations or anything similar; return ONLY a string representation of the complexity level. Use the following scale:
"hard" for high complexity
"moderate" for moderate complexity
"simple" for low complexity

Some files were not shown because too many files have changed in this diff