Add Memory Pre-Seeding information to readme.md
Add the documentation for memory pre-seeding
README.md
@@ -41,6 +41,7 @@ Your support is greatly appreciated
- [🗣️ Speech Mode](#️-speech-mode)
- [🔍 Google API Keys Configuration](#-google-api-keys-configuration)
- [Setting up environment variables](#setting-up-environment-variables)
- [🧠 Memory pre-seeding](#memory-pre-seeding)
- [💀 Continuous Mode ⚠️](#-continuous-mode-️)
- [GPT3.5 ONLY Mode](#gpt35-only-mode)
- [🖼 Image Generation](#image-generation)
@@ -245,6 +246,52 @@ To switch to either, change the `MEMORY_BACKEND` env variable to the value that
1. View memory usage by using the `--debug` flag :)

## 🧠 Memory pre-seeding

```
# python scripts/data_ingestion.py -h
usage: data_ingestion.py [-h] (--file FILE | --dir DIR) [--init] [--overlap OVERLAP] [--max_length MAX_LENGTH]

Ingest a file or a directory with multiple files into memory. Make sure to set your .env before running this script.

options:
  -h, --help               show this help message and exit
  --file FILE              The file to ingest.
  --dir DIR                The directory containing the files to ingest.
  --init                   Init the memory and wipe its content (default: False)
  --overlap OVERLAP        The overlap size between chunks when ingesting files (default: 200)
  --max_length MAX_LENGTH  The max_length of each chunk when ingesting files (default: 4000)

# python scripts/data_ingestion.py --dir seed_data --init --overlap 200 --max_length 1000
```

This script, located at `scripts/data_ingestion.py`, allows you to ingest files into memory and pre-seed it before running Auto-GPT.

Memory pre-seeding is a technique that involves ingesting relevant documents or data into the AI's memory so that it can use this information to generate more informed and accurate responses.

To pre-seed the memory, the content of each document is split into chunks of a specified maximum length with a specified overlap between chunks, and then each chunk is added to the memory backend set in the .env file. When the AI is prompted to recall information, it can then access those pre-seeded memories to generate more informed and accurate responses.
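
As a minimal sketch, assuming character-based lengths, the chunking described above could look like the following; the function name and splitting details are illustrative, not the actual code of `data_ingestion.py`:

```python
# Hypothetical sketch of the chunking scheme described above (character-based;
# this is NOT the actual implementation in scripts/data_ingestion.py).
def split_into_chunks(text: str, max_length: int = 4000, overlap: int = 200) -> list[str]:
    if overlap >= max_length:
        raise ValueError("overlap must be smaller than max_length")
    chunks = []
    step = max_length - overlap  # how far the window advances per chunk
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_length])
        if start + max_length >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks

# Each chunk is then added to the configured memory backend, conceptually:
#   for chunk in split_into_chunks(document_text):
#       memory.add(chunk)
```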

This technique is particularly useful when working with large amounts of data or when there is specific information that the AI needs to be able to access quickly. By pre-seeding the memory, the AI can retrieve and use this information more efficiently, saving time and API calls and improving the accuracy of its responses.

You could, for example, download the documentation of an API, a GitHub repository, etc., and ingest it into memory before running Auto-GPT.

⚠️ If you use Redis as your memory backend, make sure to run Auto-GPT with `WIPE_REDIS_ON_START` set to `False` in your `.env` file.
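
A minimal `.env` snippet for this setup could look like the following, assuming the Redis backend is selected via the `MEMORY_BACKEND` variable mentioned earlier (the rest of your `.env` stays unchanged):

```
MEMORY_BACKEND=redis
WIPE_REDIS_ON_START=False
```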

⚠️ For other memory backends, we currently forcefully wipe the memory when starting Auto-GPT. To ingest data with those memory backends, you can call the `data_ingestion.py` script anytime during an Auto-GPT run.

Memories will be available to the AI immediately as they are ingested, even if ingested while Auto-GPT is running.

In the example above, the script initializes the memory and ingests all files within the `seed_data` directory into memory, with an overlap between chunks of 200 and a maximum chunk length of 1000.
Note that you can also use the `--file` argument to ingest a single file into memory, and that the script will only ingest files within the `auto_gpt_workspace` directory.

You can adjust the `max_length` and `overlap` parameters to fine-tune the way the documents are presented to the AI when it "recalls" that memory; the sketch after the following list makes the trade-off concrete:

- Adjusting the overlap value allows the AI to access more contextual information from each chunk when recalling information, but will result in more chunks being created, and therefore increase memory backend usage and OpenAI API requests.
- Reducing the max_length value will create more chunks, which can save prompt tokens by allowing for more message history in the context, but will also increase the number of OpenAI API requests.
- Increasing the max_length value will provide the AI with more contextual information from each chunk, reducing the number of chunks created and saving on OpenAI API requests. However, this may also use more prompt tokens and decrease the overall context available to the AI.
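
As promised above, here is a back-of-the-envelope sketch of that trade-off, assuming the character-based chunking described earlier (not the script's exact accounting):

```python
import math

# Rough estimate of how many chunks a document of n characters produces:
# the first chunk covers max_length characters, and every further chunk
# advances the window by (max_length - overlap) characters.
def estimate_chunk_count(n: int, max_length: int, overlap: int) -> int:
    if n <= max_length:
        return 1
    return 1 + math.ceil((n - max_length) / (max_length - overlap))

print(estimate_chunk_count(100_000, 4000, 200))  # 27 chunks with the defaults
print(estimate_chunk_count(100_000, 1000, 200))  # 125 chunks: smaller chunks, more requests
```
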
## 💀 Continuous Mode ⚠️

Run the AI **without** user authorisation, 100% automated.