{% extends "admin/base.html" %} {% block title %}Help & Credits{% endblock %} {% block header_title %}Help & Credits{% endblock %} {% block content %}
The main dashboard provides a real-time, auto-updating overview of your entire AI infrastructure.
The Live Flow provides a high-fidelity visual representation of your request pipeline using SSE (Server-Sent Events).
This is where you configure the backend AI servers that the proxy will manage and distribute requests to.
Ollama servers and vLLM (or any OpenAI-compatible) servers. The proxy handles the translation automatically. You can also add an optional API key if your backend server requires one.The Instance Manager allows the Fortress to supervise separate Ollama processes directly on the host machine.
CUDA_VISIBLE_DEVICES to each instance.Control who can access your AI models and how.
The playgrounds are powerful tools for interacting with and evaluating your models directly within the UI.
This is your interactive command center for testing conversational models.
.txt, .py, .md, etc.). Their content will be automatically included in your prompt, perfect for asking questions about code or text..py for Python).This tool helps you visually understand how different embedding models work. It answers the question: "Does my model group similar concepts together?"
.json file you add to the benchmarks/ folder will automatically appear in the "Load Pre-built" list.Virtual Agents allow you to transform a raw base model into a specialized persona by hardcoding a "Soul" (System Prompt).
coding-guru or legal-expert. Users simply call these names as if they were real models.Routers are "virtual traffic controllers" that distribute incoming requests to a set of target models or agents.
An Ensemble acts as a **Mixture-of-Experts** pipeline. It allows you to query multiple models in parallel and synthesize a single, high-quality answer.
<think> tags, allowing the user to collapse/expand the "reasoning" of each sub-agent.When a user requests the auto model, the Fortress analyzes the prompt intent and chooses the best candidate from the Models Manager.
In the Models Manager, you define the "DNA" of your models. These settings are critical for correct request handling:
Enable this for Vision Models (LLaVA, Moondream, etc.). When a request arrives with image data, the proxy filters out any model that does not have this checked to prevent backend crashes.
Some models support a think: true parameter to show internal reasoning.
โข If unchecked: The proxy strips the "think" parameter before forwarding to protect incompatible models.
โข If checked: The parameter is preserved. For gpt-oss models, the proxy automatically translates true to "medium" for compatibility.
The proxy attempts to auto-detect this from Ollama/Llama.cpp servers. You can manually lower this value to force the auto router to pick a larger model when the user's prompt history grows too long.
Lower numbers = Higher priority. If multiple models match the user's intent (e.g., two "Code" models), the one with the lowest priority value is chosen first.
Gain insights into how your models are being used. All charts and tables are exportable to PNG or CSV.
This is a critical security layer. By default, the proxy **blocks** API key holders from accessing sensitive Ollama endpoints like /api/pull, /api/delete, and /api/create. This prevents users from consuming excessive resources or modifying your backend servers.
Encrypt all traffic by going to the Settings page. You can either upload your certificate and key files directly or provide the file paths on the server. A server restart is required to apply changes.
For local testing, you can generate a self-signed certificate with OpenSSL:
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -nodes -subj "/CN=localhost"
In the Settings page, you can specify comma-separated lists of IP addresses or ranges (e.g., 192.168.1.0/24) for the "Allowed IPs" and "Denied IPs" fields to control access to the proxy.
The Fortress can act as both an Ollama and an OpenAI server. This allows you to use your Hub with any software that expects either API style.
When Redis is enabled, you can set global or per-key rate limits. The proxy also automatically blocks IPs that have too many failed admin login attempts.
LoLLMs Hub follows the Standard Host Interface protocol. This allows tools developed here to run unchanged in the LoLLMs PyQt app or the WebUI.
To access user-specific data, add lollms to your function signature. The Hub will automatically inject the host interface.
def tool_my_logic(args, lollms=None):
if lollms:
username = lollms.user['username']
Never use open() for state. Use the host methods to ensure data is isolated by user and tool library.
lollms.set(key, val, persistent=True)lollms.get(key, default)lollms.delete(key)app/ directory. Use pipmaster for external libraries and the lollms object for everything else.
Skills are specialized instructions, workflows, or tool definitions that you can plug into your models.
.md file..skill files, which are simple ZIP archives of the skill folder.Personalities define the "soul" of your AI. They determine how the model speaks, acts, and behaves.
## Identity section to set the persona's backstory, tone, and character.## Behaviour section to define the core system prompt, greeting, and constraints.
curl http://127.0.0.1:8080/api/generate \
-H "Authorization: Bearer op_prefix_secret" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3",
"prompt": "Why is the sky blue?",
"stream": false
}'
import requests
import json
proxy_url = "http://127.0.0.1:8080/api/chat"
api_key = "op_prefix_secret"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
data = {
"model": "llama3",
"messages": [
{"role": "user", "content": "Explain quantum computing simply."}
],
"stream": False
}
response = requests.post(proxy_url, headers=headers, data=json.dumps(data))
print(response.json())
Redis is an in-memory database that provides high-speed data access. This application uses it for two optional but important security features:
The easiest way to run Redis on any platform is with Docker.
docker run -d --name redis-stack -p 6379:6379 --restart always redis/redis-stack:latest
Once Redis is running, go to the Settings page, enter your Redis connection details, and save. The proxy will connect automatically.
To provide AI access to your whole family (Phones, Raspberry Pis, other PCs) using your main Gamer PC:
Use the Instance Manager to start one Ollama process per GPU. Assign 0 to Instance A and 1 to Instance B. This allows your family to generate in parallel without slowing each other down.
Create a user for each family member. Generate a unique API key for your daughter's PC and another for your phone's Telegram bot. This lets you track usage and set individual limits.
family-gpt, and the proxy will automatically choose the fastest available GPU for them.
This application was developed with passion by the open-source community. It stands on the shoulders of giants and wouldn't be possible without the following incredible projects:
Project built and maintained by ParisNeo with help from AI and cool developers (check the contributors list in the github page).
Visit the project on GitHub to contribute, report issues, or star the repository!