{% extends "admin/base.html" %} {% block title %}Help & Credits{% endblock %} {% block header_title %}Help & Credits{% endblock %} {% block content %}

System Analytics

๐Ÿ–ฅ๏ธ The Dashboard: Your Monitoring Hub

The main dashboard provides a real-time, auto-updating overview of your entire AI infrastructure.

  • System Status: These gauges show the live CPU, Memory, and Disk usage of the machine running the proxy server itself.
  • Active Models: This table shows a unified list of all models currently active across your backend servers.
    • For **Ollama** servers, this means models currently loaded into VRAM/RAM. They have an "Expires In" timer and can be unloaded.
    • For **vLLM** servers, all available models are considered "Always Active" as they are managed by the vLLM instance itself.
  • Load Balancer Status: This panel shows the health of all configured backend servers. "Online" means the proxy can reach the server, while "Offline" indicates a connection issue. It also shows a lifetime request count for each server.
  • Rate Limit Queue Status: If you're using Redis, this shows a live view of API keys that are actively being rate-limited, how close they are to their limit, and when their usage window will reset.

📡 Live System Flow: Real-Time Telemetry

The Live Flow provides a high-fidelity visual representation of your request pipeline using SSE (Server-Sent Events); a minimal consumption sketch follows the list below.

  • Visual Life-cycle: Particles represent requests moving from the Gateway (left) to Compute Nodes (right).
    • Received: Request accepted and routing logic is being evaluated.
    • Assigned: A specific backend or orchestrator has been locked in.
    • Active: Real-time streaming is occurring. Hover to see live TTFT (time to first token) and TPS (tokens per second).
  • Parallel Trace: When using Ensembles, you will see multiple sub-particles branch out simultaneously toward individual agents, while the main request remains active until synthesis.
  • The Cemetery: Failed or completed requests glide to the bottom bar. Clicking the Diagnostic Button on a cemetery particle allows you to copy the full JSON trace or error logs for debugging.
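
Because SSE is plain HTTP, the same event stream can be watched from a script. A minimal sketch, assuming a hypothetical /admin/live-flow/events endpoint (the actual SSE URL may differ in your deployment):

import requests

# Hypothetical SSE endpoint; check your deployment for the real path.
url = "http://127.0.0.1:8080/admin/live-flow/events"

with requests.get(url, stream=True, timeout=None) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames arrive as "data: {...}" lines separated by blank lines.
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip())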

🔧 Server Management

This is where you configure the backend AI servers that the proxy will manage and distribute requests to.

  • Adding Servers: You can add both standard Ollama servers and vLLM (or any OpenAI-compatible) servers. The proxy handles the translation automatically. You can also add an optional API key if your backend server requires one.
  • Refreshing Models: Clicking "Refresh" fetches the latest list of available models from a server and stores it in the proxy's database. This is crucial for the "Smart Model Routing" feature. Model lists are also refreshed automatically in the background.
  • Model Whitelisting: In the "Edit Server" page, you can select specific models from the server's catalog. If you set a whitelist, the proxy will only route requests to that server if the requested model is on the allowed list. This is useful for pinning specific hardware to specific tasks.
  • Managing Models: Clicking "Manage Models" takes you to a detailed view for that server. For Ollama servers, you can pull new models, update existing ones, delete models from disk, and trigger a model to be loaded into or unloaded from memory. These actions are not applicable to vLLM servers.

๐Ÿ—๏ธ Local Instance Manager

The Instance Manager allows the Fortress to supervise separate Ollama processes directly on the host machine.

  • Process Isolation: You can start multiple Ollama instances on different ports. This is highly recommended for multi-GPU setups, where you can assign a specific CUDA_VISIBLE_DEVICES value to each instance (see the sketch after this list).
  • Auto-Discovery: The Fortress scans your local network for unmanaged Ollama instances. You can "Adopt" these to bring them under the dashboard's management.
  • Lifecycle Control: Start, stop, and monitor the health of managed local processes without ever leaving the UI.
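
Outside the UI, the equivalent manual setup looks roughly like the following. A minimal sketch, assuming the standard ollama CLI and the OLLAMA_HOST / CUDA_VISIBLE_DEVICES environment variables; the Instance Manager's internals may differ:

import os
import subprocess

def start_ollama_instance(port: int, gpu: str) -> subprocess.Popen:
    """Launch one Ollama process pinned to a single GPU on its own port."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = gpu          # pin the instance to one GPU
    env["OLLAMA_HOST"] = f"127.0.0.1:{port}"   # bind to a dedicated port
    return subprocess.Popen(["ollama", "serve"], env=env)

# One instance per GPU, each on its own port.
instance_a = start_ollama_instance(11435, "0")
instance_b = start_ollama_instance(11436, "1")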

👤 User & Key Management

Control who can access your AI models and how.

  • User Accounts: Create separate user accounts to logically group API keys. This is useful for organizing keys by team, project, or application.
  • API Keys: From a user's "Manage Keys" page, you can create multiple keys. Each key gets a descriptive name and a unique prefix for easy identification in logs.
  • Key Lifecycle:
    • Disable/Enable: Temporarily turn a key on or off without deleting it.
    • Revoke: Permanently and irreversibly invalidate a key. This is a security measure for lost or compromised keys.
  • Per-Key Rate Limits: If Redis is configured, you can override the global rate limit for specific keys, allowing you to give higher or lower priority to certain applications.

🧪 Playgrounds & Benchmarking: Test Your Models

The playgrounds are powerful tools for interacting with and evaluating your models directly within the UI.

Chat Playground

This is your interactive command center for testing conversational models.

  • Real-time Interaction: Chat with any available model and see responses stream in token by token.
  • Multi-modal Support:
    • **Images:** Simply paste an image into the chat box or use the image attach button.
    • **Documents:** Use the document attach button to upload text-based files (.txt, .py, .md, etc.). Their content will be automatically included in your prompt, perfect for asking questions about code or text.
  • System Prompts: Use the settings icon to set a system prompt, defining the model's persona or rules. We've included powerful presets like "Chain of Thought" and "Image Bounding Box Detection".
  • Code Block Actions: Hover over any code block in a response to reveal "Copy" and "Save" buttons. The save button will automatically suggest a file extension based on the language (e.g., .py for Python).
  • Message Controls: Hover over any message to Copy, Edit, Delete, or Regenerate. Editing a message forks the conversation from that point.
  • Import/Export: Save your entire chat history to a JSON file or load a previous conversation.

Embedding Playground

This tool helps you visually understand how different embedding models work. It answers the question: "Does my model group similar concepts together?"

  • How it Works: You define groups of related words ("concepts"). The tool gets the vector embeddings for each text and uses PCA to project them onto a 2D graph (reproduced in the sketch after this list).
  • Interpreting the Results: Texts with similar meanings should appear clustered together. Well-defined, tight clusters indicate that the model has a good grasp of semantic similarity.
  • Benchmarks: Use pre-built benchmarks, create your own in the UI, or load/save them as JSON files. Any .json file you add to the benchmarks/ folder will automatically appear in the "Load Pre-built" list.
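
The same experiment can be reproduced in a few lines of Python. A minimal sketch, assuming the proxy forwards the Ollama-style /api/embeddings endpoint and that scikit-learn is installed; the model name and API key are placeholders:

import requests
from sklearn.decomposition import PCA

PROXY = "http://127.0.0.1:8080"
HEADERS = {"Authorization": "Bearer op_prefix_secret"}  # placeholder key

texts = ["cat", "dog", "kitten", "car", "truck", "bicycle"]

def embed(text):
    # Ollama-style embeddings call; adjust if your backend differs.
    r = requests.post(f"{PROXY}/api/embeddings", headers=HEADERS,
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

vectors = [embed(t) for t in texts]
points = PCA(n_components=2).fit_transform(vectors)  # project to 2D

for text, (x, y) in zip(texts, points):
    print(f"{text:>8}: ({x:+.3f}, {y:+.3f})")  # nearby points = similar meaning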

🎭 Virtual Agents: Giving AI a Soul

Virtual Agents allow you to transform a raw base model into a specialized persona by hardcoding a "Soul" (System Prompt).

  • Abstraction: Create names like coding-guru or legal-expert. Users simply call these names as if they were real models.
  • Automatic Injection: Every time an agent is called, the Fortress automatically prepends the system prompt to the conversation, ensuring the model stays in character.
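
In effect, calling an agent is the same as prepending a system message yourself. A minimal sketch of that injection, with illustrative names:

def inject_soul(messages, soul):
    # The Hub prepends the agent's "Soul" before forwarding to the backend.
    return [{"role": "system", "content": soul}] + messages

messages = inject_soul(
    [{"role": "user", "content": "Review this function for bugs."}],
    soul="You are coding-guru, a meticulous senior code reviewer.",
)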

🧠 Advanced Orchestration & Routing

Smart Routers (Load Balancing & Firewalls)

Routers are "virtual traffic controllers" that distribute incoming requests to a set of target models or agents.

  • Strategies:
    • Priority: Always uses the first available model in the list.
    • Random: Even distribution for large clusters.
    • Least Loaded: Routes to the backend with the lowest active request count (best for high-TPS apps).
  • Firewall Rules: Add conditional logic (e.g., "If prompt contains 'image', use LLaVA", "If length < 100, use TinyLlama"). Rules are processed top-to-bottom.
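
Conceptually, rule evaluation is a first-match-wins loop. A minimal sketch of that top-to-bottom logic; the rule shape is illustrative, not the Router editor's actual schema:

# Illustrative rule shape; real rules are defined in the Router editor.
rules = [
    {"when": lambda p: "image" in p.lower(), "target": "llava"},
    {"when": lambda p: len(p) < 100, "target": "tinyllama"},
]

def route(prompt, fallback="llama3"):
    for rule in rules:            # processed top-to-bottom
        if rule["when"](prompt):  # first matching rule wins
            return rule["target"]
    return fallback               # no rule matched: fall back to the strategy

print(route("Describe this image"))  # -> llava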

Ensemble Orchestrators (MoE)

An Ensemble acts as a **Mixture-of-Experts** pipeline. It allows you to query multiple models in parallel and synthesize a single, high-quality answer.

  • Parallel Brains: The user's query is sent to every agent in the ensemble at the same time.
  • Master Synthesis: A "Master Model" reviews the outputs of all parallel agents and produces a final consolidated response.
  • Internal Monologue: Enable "Show Monologue" to wrap agent responses in <think> tags, allowing the user to collapse/expand the "reasoning" of each sub-agent.
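
A minimal sketch of the fan-out/synthesis pattern using concurrent.futures; the endpoint usage mirrors the Python example later on this page, and the agent models and synthesis prompt are illustrative, not the Hub's actual implementation:

import requests
from concurrent.futures import ThreadPoolExecutor

PROXY = "http://127.0.0.1:8080/api/chat"
HEADERS = {"Authorization": "Bearer op_prefix_secret"}  # placeholder key

def ask(model, question):
    r = requests.post(PROXY, headers=HEADERS, json={
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    })
    return r.json()["message"]["content"]

question = "What are the trade-offs of microservices?"
agents = ["llama3", "mistral", "qwen2"]  # placeholder agent models

# Fan out: query every agent in parallel.
with ThreadPoolExecutor(max_workers=len(agents)) as pool:
    drafts = list(pool.map(lambda m: ask(m, question), agents))

# Synthesize: a master model consolidates the drafts into one answer.
synthesis = question + "\n\nCandidate answers:\n" + "\n---\n".join(drafts)
print(ask("llama3", synthesis))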

The 'auto' Model (Intelligent Selection)

When a user requests the auto model, the Fortress analyzes the prompt intent and chooses the best candidate from the Models Manager.

  • Metadata Powered: Use the Models Manager to tag models as "Reasoning", "Fast", "Code", or "Multi-modal".
  • Logic: If a prompt looks like code, it goes to a "Code" model. If it contains images, it goes to an "Image" model.
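
A minimal sketch of what such intent-based selection could look like, combining capability tags with the Priority field described below; the registry and heuristics are illustrative:

# Illustrative registry; real tags come from the Models Manager.
MODELS = [
    {"name": "codellama", "tags": {"code"}, "priority": 1},
    {"name": "llava", "tags": {"multi-modal"}, "priority": 1},
    {"name": "llama3", "tags": {"reasoning"}, "priority": 2},
]

def pick_auto(prompt, has_images=False):
    if has_images:
        need = "multi-modal"
    elif "def " in prompt or "```" in prompt:  # crude code heuristic
        need = "code"
    else:
        need = "reasoning"
    candidates = [m for m in MODELS if need in m["tags"]]
    # When several models match, the lowest priority value wins.
    return min(candidates, key=lambda m: m["priority"])["name"]

print(pick_auto("def fib(n): ..."))  # -> codellama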

Configuring Model Capabilities

In the Models Manager, you define the "DNA" of your models. These settings are critical for correct request handling:

👁️ Image Compatibility

Enable this for Vision Models (LLaVA, Moondream, etc.). When a request arrives with image data, the proxy filters out any model that does not have this checked to prevent backend crashes.

🧠 Think (CoT) Mode

Some models support a think: true parameter to show internal reasoning.

  • If unchecked: The proxy strips the "think" parameter before forwarding to protect incompatible models.
  • If checked: The parameter is preserved. For gpt-oss models, the proxy automatically translates true to "medium" for compatibility.
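
A minimal sketch of that parameter handling, assuming the request payload is a plain dict; the proxy's actual code may differ:

def adjust_think_param(payload, supports_think, model_name):
    payload = dict(payload)  # don't mutate the caller's request
    if not supports_think:
        payload.pop("think", None)  # strip to protect incompatible models
    elif model_name.startswith("gpt-oss") and payload.get("think") is True:
        payload["think"] = "medium"  # gpt-oss expects an effort level
    return payload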

๐Ÿ“ Max Context Window

The proxy attempts to auto-detect this from Ollama/Llama.cpp servers. You can manually lower this value so that, when a user's prompt history grows beyond it, the auto router falls back to a model with a larger context window.

🔢 Priority

Lower numbers = Higher priority. If multiple models match the user's intent (e.g., two "Code" models), the one with the lowest priority value is chosen first.

📈 Usage Statistics

Gain insights into how your models are being used. All charts and tables are exportable to PNG or CSV.

  • Global Analytics: The main Usage Stats page shows aggregate data across all users, including requests per day, peak hours, model popularity, and server load distribution.
  • Per-User Analytics: From the User Management page, click "View Usage" for any user to see the same set of detailed charts filtered specifically for them.
  • Sortable Tables: The tables on the statistics and user management pages are sortable. Just click on a column header to reorder the data.

๐Ÿ›ก๏ธ Security Features

Endpoint Blocking

This is a critical security layer. By default, the proxy **blocks** API key holders from accessing sensitive Ollama endpoints like /api/pull, /api/delete, and /api/create. This prevents users from consuming excessive resources or modifying your backend servers.

Customizable: You can change the list of blocked endpoints in the Settings page under "Endpoint Security".

HTTPS/SSL Encryption

Encrypt all traffic by going to the Settings page. You can either upload your certificate and key files directly or provide the file paths on the server. A server restart is required to apply changes.

For local testing, you can generate a self-signed certificate with OpenSSL:

openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -nodes -subj "/CN=localhost"

IP Filtering

In the Settings page, you can specify comma-separated lists of IP addresses or ranges (e.g., 192.168.1.0/24) for the "Allowed IPs" and "Denied IPs" fields to control access to the proxy.
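
Python's standard ipaddress module illustrates how a CIDR range like 192.168.1.0/24 is matched. A minimal sketch of allow/deny checking (here deny rules take precedence; verify the proxy's actual semantics on your deployment):

import ipaddress

allowed = [ipaddress.ip_network(n) for n in ["192.168.1.0/24", "10.0.0.5/32"]]
denied = [ipaddress.ip_network(n) for n in ["192.168.1.66/32"]]

def is_permitted(client_ip):
    ip = ipaddress.ip_address(client_ip)
    if any(ip in net for net in denied):  # deny rules win in this sketch
        return False
    return any(ip in net for net in allowed)

print(is_permitted("192.168.1.42"))  # True
print(is_permitted("192.168.1.66"))  # False (explicitly denied)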

Dual-Protocol Gateway

The Fortress can act as both an Ollama and an OpenAI server. This allows you to use your Hub with any software that expects either API style.

  • Port Separation: You can run the OpenAI API on a different port (default 8081) for specialized routing or firewall rules.
  • Automatic Translation: The Hub handles the heavy lifting of translating OpenAI's JSON schema to your backend models automatically.
  • Agent Support (Tool Calling): Full support for function calling. You can use the Hub with CrewAI, LangChain, or AutoGen over either the OpenAI or Ollama protocol, as shown below.
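
For example, the official openai Python client can point straight at the Hub. A minimal sketch, assuming the OpenAI API runs on the default port 8081 with a /v1 base path; the tool definition is illustrative:

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8081/v1",  # assumed /v1 base path
    api_key="op_prefix_secret",           # a key from User & Key Management
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)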

Rate Limiting & Brute-Force Protection

When Redis is enabled, you can set global or per-key rate limits. The proxy also automatically blocks IPs that have too many failed admin login attempts.

๐Ÿ› ๏ธ Portable Tools & User Persistence

LoLLMs Hub follows the Standard Host Interface protocol. This allows tools developed here to run unchanged in the LoLLMs PyQt app or the WebUI.

The 'lollms' Parameter

To access user-specific data, add lollms to your function signature. The Hub will automatically inject the host interface.

def tool_my_logic(args, lollms=None):
    # The Hub injects the host interface when this parameter is present.
    if lollms:
        username = lollms.user['username']
        return f"Hello, {username}!"
    return "Running without a host interface."

Persistence API

Never use open() for state. Use the host methods below to ensure data is isolated by user and tool library; a usage sketch follows the portability mandate.

  • lollms.set(key, val, persistent=True)
  • lollms.get(key, default)
  • lollms.delete(key)

Portability Mandate: Do not import any files from the app/ directory. Use pipmaster for external libraries and the lollms object for everything else.
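
Putting the two conventions together, a tool that counts its own invocations per user might look like this. A sketch using the methods listed above; the counter key is illustrative:

def tool_count_runs(args, lollms=None):
    if lollms is None:
        return "No host interface available."
    # State is isolated per user and per tool library by the host.
    runs = lollms.get("run_count", 0) + 1
    lollms.set("run_count", runs, persistent=True)  # survives restarts
    return f"{lollms.user['username']} has run this tool {runs} times."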

🧠 Skills & Personalities

Skills Library

Skills are specialized instructions, workflows, or tool definitions that you can plug into your models.

  • Standardization: Skills allow you to define repeatable procedures (like "Python Code Reviewer") in a single .md file.
  • Format: They use YAML Frontmatter (for metadata) and Markdown (for body logic).
  • Portability: You can export/import skills using .skill files, which are simple ZIP archives of the skill folder.
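
For orientation, a minimal sketch of what a skill file might look like; the frontmatter fields shown here are illustrative, not a confirmed schema:

---
name: Python Code Reviewer
description: Reviews Python code for bugs and style issues.
---

## Procedure

1. Read the submitted code carefully.
2. List correctness bugs first, then style issues.
3. Suggest a fixed version of each problematic snippet.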

Personalities Studio

Personalities define the "soul" of your AI. They determine how the model speaks, acts, and behaves.

  • Identity: Use the ## Identity section to set the persona's backstory, tone, and character.
  • Behaviour: Use the ## Behaviour section to define the core system prompt, greeting, and constraints.
  • Memory Seeds: You can embed persistent facts within the personality to ensure the model always knows specific details about your environment or preferences.

👨‍💻 Usage Examples

cURL


curl http://127.0.0.1:8080/api/generate \
  -H "Authorization: Bearer op_prefix_secret" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'

Python (`requests`)


import requests
import json

proxy_url = "http://127.0.0.1:8080/api/chat"
api_key = "op_prefix_secret"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

data = {
    "model": "llama3",
    "messages": [
        {"role": "user", "content": "Explain quantum computing simply."}
    ],
    "stream": False
}

response = requests.post(proxy_url, headers=headers, data=json.dumps(data))
print(response.json())
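
Streaming works with the same endpoint: set "stream" to True and read the response line by line (Ollama-style chat endpoints emit one JSON object per line). A sketch building on the variables defined above:

data["stream"] = True

with requests.post(proxy_url, headers=headers, json=data, stream=True) as response:
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue
        chunk = json.loads(line)  # one JSON object per line
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
print()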

⚡ Redis Setup Guide

Why Do I Need Redis?

Redis is an in-memory database that provides high-speed data access. This application uses it for two optional but important security features:

  • API Rate Limiting: Prevents abuse by limiting how many requests a key can make in a given time.
  • Brute-Force Protection: Temporarily locks an IP address after too many failed admin logins.

Note: The proxy will work perfectly without Redis, but these security features will be disabled.

Installation

The easiest way to run Redis on any platform is with Docker.

docker run -d --name redis-stack -p 6379:6379 --restart always redis/redis-stack:latest

Configuration

Once Redis is running, go to the Settings page, enter your Redis connection details, and save. The proxy will connect automatically.

🏠 Setting Up a Home AI Hub

To provide AI access to your whole family (Phones, Raspberry Pis, other PCs) using your main Gamer PC:

1. Multi-GPU Optimization

Use the Instance Manager to start one Ollama process per GPU. Assign CUDA_VISIBLE_DEVICES=0 to Instance A and CUDA_VISIBLE_DEVICES=1 to Instance B. This allows your family to generate in parallel without slowing each other down.

2. Family API Keys

Create a user for each family member. Generate a unique API key for your daughter's PC and another for your phone's Telegram bot. This lets you track usage and set individual limits.

Pro Tip: Use Smart Routers to create an "Easy Mode" model name. Your family can just use the model family-gpt, and the proxy will automatically choose the fastest available GPU for them.

💖 Credits & Acknowledgements

This application was developed with passion by the open-source community. It stands on the shoulders of giants and wouldn't be possible without many incredible open-source projects.

Project built and maintained by ParisNeo with help from AI and cool developers (check the contributors list on the GitHub page).

Visit the project on GitHub to contribute, report issues, or star the repository!

{% endblock %}