{% extends "admin/base.html" %}
{% block title %}Help & Credits{% endblock %}
{% block header_title %}Help & Credits{% endblock %}
{% block content %}
<div class="mb-6 flex items-center justify-between">
<h1 class="text-xl font-bold">Help & Credits</h1>
<button onclick="window.startWizard()" class="text-[10px] bg-indigo-600/20 text-indigo-400 border border-indigo-500/30 px-3 py-1 rounded hover:bg-indigo-600/30 font-black uppercase tracking-tighter transition-all">Restart Onboarding Wizard</button>
</div>
<style>
.toc-link {
display: block;
padding: 0.5rem 1rem;
border-radius: 0.375rem;
transition: all 0.2s ease-in-out;
border-left: 3px solid transparent;
}
.toc-link:hover {
background-color: rgba(128, 128, 128, 0.1);
padding-left: 1.25rem;
}
.toc-link.active-toc {
color: var(--color-primary-500);
font-weight: 600;
border-left-color: var(--color-primary-500);
}
.ui-brutalism .toc-link.active-toc {
background-color: var(--color-primary-500);
color: white;
border: 3px solid black;
}
</style>
<div class="flex flex-col lg:flex-row gap-12">
<!-- Main Content -->
<div class="w-full lg:w-3/4 space-y-12">
<div id="help-content-container" class="h-[80vh] overflow-y-auto pr-4 custom-scrollbar">
<div id="markdown-body" class="prose prose-invert max-w-none">
<!-- Content injected here via JS -->
</div>
</div>
<!-- Dashboard Section -->
<div id="dashboard" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🖥️ The Dashboard: Your Monitoring Hub</h2>
<div class="space-y-4 text-current">
<p>The main dashboard provides a real-time, auto-updating overview of your entire AI infrastructure.</p>
<ul class="list-disc list-inside space-y-3 pl-4">
<li><strong>System Status:</strong> These gauges show the live CPU, Memory, and Disk usage of the machine running the proxy server itself.</li>
<li><strong>Active Models:</strong> This table shows a unified list of all models currently active across your backend servers.
<ul class="list-['-_'] list-inside pl-4 mt-2">
<li>For <strong>Ollama</strong> servers, this means models currently loaded into VRAM/RAM. They have an "Expires In" timer and can be unloaded.</li>
<li>For <strong>vLLM</strong> servers, all available models are considered "Always Active" as they are managed by the vLLM instance itself.</li>
</ul>
</li>
<li><strong>Load Balancer Status:</strong> This panel shows the health of all configured backend servers. "Online" means the proxy can reach the server, while "Offline" indicates a connection issue. It also shows a lifetime request count for each server.</li>
<li><strong>Rate Limit Queue Status:</strong> If you're using Redis, this shows a live view of API keys that are actively being rate-limited, how close they are to their limit, and when their usage window will reset.</li>
</ul>
</div>
</div>
<!-- Live Flow Section -->
<div id="live-flow" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">📡 Live System Flow: Real-Time Telemetry</h2>
<div class="space-y-4 text-current">
<p>The <a href="{{ url_for('admin_live_status') }}" class="font-semibold text-[var(--color-primary-600)] hover:underline">Live Flow</a> provides a high-fidelity visual representation of your request pipeline using SSE (Server-Sent Events).</p>
<ul class="list-disc list-inside space-y-3 pl-4">
<li><strong>Visual Life-cycle:</strong> Particles represent requests moving from the <strong>Gateway</strong> (left) to <strong>Compute Nodes</strong> (right).
<ul class="list-['-_'] list-inside pl-4 mt-2">
<li><span class="text-indigo-400 font-bold">Received:</span> Request accepted and routing logic is being evaluated.</li>
<li><span class="text-amber-400 font-bold">Assigned:</span> A specific backend or orchestrator has been locked in.</li>
<li><span class="text-emerald-400 font-bold">Active:</span> Real-time streaming is occurring. Hover to see live <strong>TTFT</strong> and <strong>TPS</strong>.</li>
</ul>
</li>
<li><strong>Parallel Trace:</strong> When using Ensembles, you will see multiple sub-particles branch out simultaneously toward individual agents, while the main request remains active until synthesis.</li>
<li><strong>The Cemetery:</strong> Failed or completed requests glide to the bottom bar. Clicking the <strong>Diagnostic Button</strong> on a cemetery particle allows you to copy the full JSON trace or error logs for debugging.</li>
</ul>
</div>
</div>
<!-- Server Management Section -->
<div id="server-management" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🔧 Server Management</h2>
<div class="space-y-4 text-current">
<p>This is where you configure the backend AI servers that the proxy will manage and distribute requests to.</p>
<ul class="list-disc list-inside space-y-3 pl-4">
<li><strong>Adding Servers:</strong> You can add both standard <code class="inline-code">Ollama</code> servers and <code class="inline-code">vLLM</code> (or any OpenAI-compatible) servers. The proxy handles the translation automatically. You can also add an optional API key if your backend server requires one.</li>
<li><strong>Refreshing Models:</strong> Clicking "Refresh" fetches the latest list of available models from a server and stores it in the proxy's database. This is crucial for the "Smart Model Routing" feature. Model lists are also refreshed automatically in the background.</li>
<li><strong>Model Whitelisting:</strong> In the "Edit Server" page, you can select specific models from the server's catalog. If you set a whitelist, the proxy will <b>only</b> route requests to that server if the requested model is on the allowed list. This is useful for pinning specific hardware to specific tasks.</li>
<li><strong>Managing Models:</strong> Clicking "Manage Models" takes you to a detailed view for that server. For Ollama servers, you can pull new models, update existing ones, delete models from disk, and trigger a model to be loaded into or unloaded from memory. These actions are not applicable to vLLM servers.</li>
</ul>
</div>
</div>
<!-- Instance Manager Section -->
<div id="instance-manager" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🏗️ Local Instance Manager</h2>
<div class="space-y-4 text-current">
<p>The <a href="{{ url_for('admin_instances') }}" class="font-semibold text-[var(--color-primary-600)] hover:underline">Instance Manager</a> allows the Fortress to supervise separate Ollama processes directly on the host machine.</p>
<ul class="list-disc list-inside space-y-3 pl-4">
<li><strong>Process Isolation:</strong> You can start multiple Ollama instances on different ports. This is highly recommended for multi-GPU setups, where you can assign specific <code class="inline-code">CUDA_VISIBLE_DEVICES</code> to each instance.</li>
<li><strong>Auto-Discovery:</strong> The Fortress scans your local network for unmanaged Ollama instances. You can "Adopt" these to bring them under the dashboard's management.</li>
<li><strong>Lifecycle Control:</strong> Start, stop, and monitor the health of managed local processes without ever leaving the UI.</li>
</ul>
</div>
</div>
<!-- User Management Section -->
<div id="user-management" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">👤 User & Key Management</h2>
<div class="space-y-4 text-current">
<p>Control who can access your AI models and how.</p>
<ul class="list-disc list-inside space-y-3 pl-4">
<li><strong>User Accounts:</strong> Create separate user accounts to logically group API keys. This is useful for organizing keys by team, project, or application.</li>
<li><strong>API Keys:</strong> From a user's "Manage Keys" page, you can create multiple keys. Each key gets a descriptive name and a unique prefix for easy identification in logs.</li>
<li><strong>Key Lifecycle:</strong>
<ul class="list-['-_'] list-inside pl-4 mt-2">
<li><strong>Disable/Enable:</strong> Temporarily turn a key on or off without deleting it.</li>
<li><strong>Revoke:</strong> Permanently and irreversibly invalidate a key. This is a security measure for lost or compromised keys.</li>
</ul>
</li>
<li><strong>Per-Key Rate Limits:</strong> If Redis is configured, you can override the global rate limit for specific keys, allowing you to give higher or lower priority to certain applications.</li>
</ul>
</div>
</div>
<!-- Playgrounds & Benchmarking Section -->
<div id="playgrounds" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🧪 Playgrounds & Benchmarking: Test Your Models</h2>
<div class="space-y-8 text-current">
<p>The playgrounds are powerful tools for interacting with and evaluating your models directly within the UI.</p>
<div>
<h3 class="text-xl font-semibold mb-2">Chat Playground</h3>
<p>This is your interactive command center for testing conversational models.</p>
<ul class="list-disc list-inside space-y-2 mt-3 pl-4">
<li><strong>Real-time Interaction:</strong> Chat with any available model and see responses stream in token by token.</li>
<li><strong>Multi-modal Support:</strong>
<ul class="list-['-_'] list-inside pl-4 mt-2">
<li><strong>Images:</strong> Simply paste an image into the chat box or use the image attach button.</li>
<li><strong>Documents:</strong> Use the document attach button to upload text-based files (<code class="inline-code">.txt</code>, <code class="inline-code">.py</code>, <code class="inline-code">.md</code>, etc.). Their content will be automatically included in your prompt, perfect for asking questions about code or text.</li>
</ul>
</li>
<li><strong>System Prompts:</strong> Use the settings icon (<svg class="w-4 h-4 inline" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M10.325 4.317c.426-1.756 2.924-1.756 3.35 0a1.724 1.724 0 002.573 1.066c1.543-.94 3.31.826 2.37 2.37a1.724 1.724 0 001.065 2.572c1.756.426 1.756 2.924 0 3.35a1.724 1.724 0 00-1.066 2.573c.94 1.543-.826 3.31-2.37 2.37a1.724 1.724 0 00-2.572 1.065c-.426 1.756-2.924 1.756-3.35 0a1.724 1.724 0 00-2.573-1.066c-1.543.94-3.31-.826-2.37-2.37a1.724 1.724 0 00-1.065-2.572c-1.756-.426-1.756-2.924 0-3.35a1.724 1.724 0 001.066-2.573c-.94-1.543.826-3.31 2.37-2.37.996.608 2.296.07 2.572-1.065z"></path><path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M15 12a3 3 0 11-6 0 3 3 0 016 0z"></path></svg>) to set a system prompt, defining the model's persona or rules. We've included powerful presets like "Chain of Thought" and "Image Bounding Box Detection".</li>
<li><strong>Code Block Actions:</strong> Hover over any code block in a response to reveal "Copy" and "Save" buttons. The save button will automatically suggest a file extension based on the language (e.g., <code class="inline-code">.py</code> for Python).</li>
<li><strong>Message Controls:</strong> Hover over any message to Copy, Edit, Delete, or Regenerate. Editing a message forks the conversation from that point.</li>
<li><strong>Import/Export:</strong> Save your entire chat history to a JSON file or load a previous conversation.</li>
</ul>
</div>
<div>
<h3 class="text-xl font-semibold mb-2">Embedding Playground</h3>
<p>This tool helps you visually understand how different embedding models work. It answers the question: "Does my model group similar concepts together?"</p>
<ul class="list-disc list-inside space-y-2 mt-3 pl-4">
<li><strong>How it Works:</strong> You define groups of related texts ("concepts"). The tool fetches the vector embedding for each text and uses PCA to project them onto a 2D graph.</li>
<li><strong>Interpreting the Results:</strong> Texts with similar meanings should appear clustered together. Well-defined, tight clusters indicate that the model has a good grasp of semantic similarity.</li>
<li><strong>Benchmarks:</strong> Use pre-built benchmarks, create your own in the UI, or load/save them as JSON files. Any <code class="inline-code">.json</code> file you add to the <code class="inline-code">benchmarks/</code> folder will automatically appear in the "Load Pre-built" list.</li>
</ul>
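<p class="mt-3">The projection step can be sketched with plain NumPy (a simplified stand-in for what the playground does internally; the 4-D embedding vectors below are synthetic):</p>
<div class="code-block-wrapper mt-2">
<pre><code class="language-python">import numpy as np

def project_2d(embeddings):
    """Project high-dimensional embedding vectors onto their first two principal components."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)              # center the data
    # SVD yields the principal axes; keep the top two
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                 # shape: (n_texts, 2)

# Synthetic "embeddings": two tight concept clusters
vectors = [[1.0, 0.9, 0.0, 0.1], [0.9, 1.0, 0.1, 0.0],
           [0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 0.9, 1.0]]
points = project_2d(vectors)
print(points.shape)  # (4, 2)</code></pre>
<button class="copy-button">Copy</button>
</div>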
</div>
</div>
</div>
<!-- Virtual Agents Section -->
<div id="virtual-agents" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🎭 Virtual Agents: Giving AI a Soul</h2>
<div class="space-y-4 text-current">
<p>Virtual Agents allow you to transform a raw base model into a specialized persona by hardcoding a "Soul" (System Prompt).</p>
<ul class="list-disc list-inside space-y-3 pl-4">
<li><strong>Abstraction:</strong> Create names like <code class="inline-code">coding-guru</code> or <code class="inline-code">legal-expert</code>. Users simply call these names as if they were real models.</li>
<li><strong>Automatic Injection:</strong> Every time an agent is called, the Fortress automatically prepends the system prompt to the conversation, ensuring the model stays in character.</li>
</ul>
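<p class="mt-3">Calling an agent looks exactly like calling a normal model; only the <code class="inline-code">model</code> field changes. A sketch of the payload (the agent name and API key are placeholders):</p>
<div class="code-block-wrapper mt-2">
<pre><code class="language-python">def build_agent_request(agent_name, prompt):
    """Build an Ollama-style chat payload that targets a Virtual Agent by its name."""
    return {
        "model": agent_name,  # e.g. "coding-guru"; the Hub injects the agent's system prompt
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = build_agent_request("coding-guru", "Review this function for bugs.")
# Send it like any other request, for example with the requests library:
#   requests.post("http://127.0.0.1:8080/api/chat",
#                 headers={"Authorization": "Bearer op_prefix_secret"}, json=payload)</code></pre>
<button class="copy-button">Copy</button>
</div>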
</div>
</div>
<!-- Advanced Routing Section -->
<div id="model-routing" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🧠 Advanced Orchestration & Routing</h2>
<div class="space-y-8 text-current">
<div>
<h3 class="text-xl font-semibold mb-2">Smart Routers (Load Balancing & Firewalls)</h3>
<p>Routers are "virtual traffic controllers" that distribute incoming requests to a set of target models or agents.</p>
<ul class="list-disc list-inside space-y-2 mt-3 pl-4">
<li><strong>Strategies:</strong>
<ul class="list-['-_'] list-inside pl-4 mt-2">
<li><strong>Priority:</strong> Always uses the first available model in the list.</li>
<li><strong>Random:</strong> Even distribution for large clusters.</li>
<li><strong>Least Loaded:</strong> Routes to the backend with the lowest active request count (best for high-TPS apps).</li>
</ul>
</li>
<li><strong>Firewall Rules:</strong> Add conditional logic (e.g., "If prompt contains 'image', use LLaVA", "If length &lt; 100, use TinyLlama"). Rules are processed top-to-bottom.</li>
</ul>
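<p class="mt-3">Top-to-bottom evaluation means the first matching rule wins. A simplified illustration (rule conditions and target names are placeholders, not the Hub's actual rule engine):</p>
<div class="code-block-wrapper mt-2">
<pre><code class="language-python">def route(prompt, rules, default):
    """Return the first target whose condition matches; rules are checked in order."""
    for condition, target in rules:
        if condition(prompt):
            return target
    return default

rules = [
    (lambda p: "image" in p.lower(), "llava"),          # vision requests
    (lambda p: len(p) in range(100), "tinyllama"),      # i.e. shorter than 100 chars
]
print(route("Describe this image", rules, "llama3"))  # llava
print(route("Hi", rules, "llama3"))                   # tinyllama</code></pre>
<button class="copy-button">Copy</button>
</div>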
</div>
<div>
<h3 class="text-xl font-semibold mb-2">Ensemble Orchestrators (MoE)</h3>
<p>An Ensemble acts as a <strong>Mixture-of-Experts</strong> pipeline. It allows you to query multiple models in parallel and synthesize a single, high-quality answer.</p>
<ul class="list-disc list-inside space-y-2 mt-3 pl-4">
<li><strong>Parallel Brains:</strong> The user's query is sent to every agent in the ensemble at the same time.</li>
<li><strong>Master Synthesis:</strong> A "Master Model" reviews the outputs of all parallel agents and produces a final consolidated response.</li>
<li><strong>Internal Monologue:</strong> Enable "Show Monologue" to wrap agent responses in <code class="inline-code">&lt;think&gt;</code> tags, allowing the user to collapse/expand the "reasoning" of each sub-agent.</li>
</ul>
</div>
<div>
<h3 class="text-xl font-semibold mb-2">The 'auto' Model (Intelligent Selection)</h3>
<p>When a user requests the <code class="inline-code">auto</code> model, the Fortress analyzes the prompt intent and chooses the best candidate from the <strong>Models Manager</strong>.</p>
<ul class="list-disc list-inside space-y-2 mt-3 pl-4">
<li><strong>Metadata Powered:</strong> Use the <a href="{{ url_for('admin_models_manager') }}" class="underline">Models Manager</a> to tag models as "Reasoning", "Fast", "Code", or "Multi-modal".</li>
<li><strong>Logic:</strong> If a prompt looks like code, it goes to a "Code" model. If it contains images, it goes to an "Image" model.</li>
</ul>
</div>
<div class="mt-8 pt-6 border-t border-white/5">
<h3 class="text-xl font-semibold mb-4">Configuring Model Capabilities</h3>
<p>In the <a href="{{ url_for('admin_models_manager') }}" class="font-semibold underline">Models Manager</a>, you define the "DNA" of your models. These settings are critical for correct request handling:</p>
<div class="grid grid-cols-1 md:grid-cols-2 gap-4 mt-4">
<div class="p-4 bg-white/5 rounded-lg border border-white/10">
<h4 class="font-bold text-indigo-400 mb-2">👁️ Image Compatibility</h4>
<p class="text-sm text-gray-400">Enable this for <b>Vision Models</b> (LLaVA, Moondream, etc.). When a request arrives with image data, the proxy filters out any model that does <i>not</i> have this checked to prevent backend crashes.</p>
</div>
<div class="p-4 bg-white/5 rounded-lg border border-white/10">
<h4 class="font-bold text-amber-400 mb-2">🧠 Think (CoT) Mode</h4>
<p class="text-sm text-gray-400">Some models support a <code class="inline-code">think: true</code> parameter to show internal reasoning.
<br><br>• If <b>unchecked</b>: The proxy strips the "think" parameter before forwarding to protect incompatible models.
<br>• If <b>checked</b>: The parameter is preserved. For <b>gpt-oss</b> models, the proxy automatically translates <code class="inline-code">true</code> to <code class="inline-code">"medium"</code> for compatibility.</p>
</div>
<div class="p-4 bg-white/5 rounded-lg border border-white/10">
<h4 class="font-bold text-emerald-400 mb-2">📏 Max Context Window</h4>
<p class="text-sm text-gray-400">The proxy attempts to <b>auto-detect</b> this from Ollama/Llama.cpp servers. You can manually lower this value so the <code class="inline-code">auto</code> router skips this model in favor of a larger-context one when the user's prompt history grows too long.</p>
</div>
<div class="p-4 bg-white/5 rounded-lg border border-white/10">
<h4 class="font-bold text-purple-400 mb-2">🔢 Priority</h4>
<p class="text-sm text-gray-400">Lower numbers = Higher priority. If multiple models match the user's intent (e.g., two "Code" models), the one with the lowest priority value is chosen first.</p>
</div>
</div>
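<p class="mt-4">The think-parameter handling described above can be sketched as a pure function (a simplified reimplementation for illustration, not the Hub's actual code; the model tags are examples):</p>
<div class="code-block-wrapper mt-2">
<pre><code class="language-python">def normalize_think(payload, model_supports_think):
    """Strip or translate the 'think' parameter as described in the capability card."""
    payload = dict(payload)  # avoid mutating the caller's dict
    if not model_supports_think:
        payload.pop("think", None)          # protect incompatible models
    elif payload.get("think") is True and payload["model"].startswith("gpt-oss"):
        payload["think"] = "medium"         # gpt-oss expects a reasoning-level string
    return payload

print(normalize_think({"model": "llama3", "think": True}, False))
# {'model': 'llama3'}
print(normalize_think({"model": "gpt-oss:20b", "think": True}, True))
# {'model': 'gpt-oss:20b', 'think': 'medium'}</code></pre>
<button class="copy-button">Copy</button>
</div>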
</div>
</div>
</div>
<!-- Statistics Section -->
<div id="statistics" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">📈 Usage Statistics</h2>
<div class="space-y-4 text-current">
<p>Gain insights into how your models are being used. All charts and tables are exportable to PNG or CSV.</p>
<ul class="list-disc list-inside space-y-3 pl-4">
<li><strong>Global Analytics:</strong> The main <a href="{{ url_for('admin_stats') }}" class="font-semibold text-[var(--color-primary-600)] hover:underline">Usage Stats</a> page shows aggregate data across all users, including requests per day, peak hours, model popularity, and server load distribution.</li>
<li><strong>Per-User Analytics:</strong> From the <a href="{{ url_for('admin_users') }}" class="font-semibold text-[var(--color-primary-600)] hover:underline">User Management</a> page, click "View Usage" for any user to see the same set of detailed charts filtered specifically for them.</li>
<li><strong>Sortable Tables:</strong> The tables on the statistics and user management pages are sortable. Just click on a column header to reorder the data.</li>
</ul>
</div>
</div>
<!-- Security Features Section -->
<div id="security" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🛡️ Security Features</h2>
<div class="space-y-8 text-current">
<div>
<h3 class="text-xl font-semibold mb-2">Endpoint Blocking</h3>
<p>This is a critical security layer. By default, the proxy <strong>blocks</strong> API key holders from accessing sensitive Ollama endpoints like <code class="inline-code">/api/pull</code>, <code class="inline-code">/api/delete</code>, and <code class="inline-code">/api/create</code>. This prevents users from consuming excessive resources or modifying your backend servers.</p>
<div class="mt-4 p-3 rounded-md text-sm info-box">
<strong>Customizable:</strong> You can change the list of blocked endpoints in the <a href="{{ url_for('admin_settings') }}" class="font-semibold underline">Settings</a> page under "Endpoint Security".
</div>
</div>
<div>
<h3 class="text-xl font-semibold mb-2">HTTPS/SSL Encryption</h3>
<p>Encrypt all traffic by going to the <a href="{{ url_for('admin_settings') }}" class="font-semibold underline">Settings</a> page. You can either upload your certificate and key files directly or provide the file paths on the server. A server restart is required to apply changes.</p>
<p class="mt-2">For local testing, you can generate a self-signed certificate with OpenSSL:</p>
<div class="code-block-wrapper mt-2">
<pre><code class="language-bash">openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -nodes -subj "/CN=localhost"</code></pre>
<button class="copy-button">Copy</button>
</div>
</div>
<div>
<h3 class="text-xl font-semibold mb-2">IP Filtering</h3>
<p>In the <a href="{{ url_for('admin_settings') }}" class="font-semibold underline">Settings</a> page, you can specify comma-separated lists of IP addresses or ranges (e.g., <code class="inline-code">192.168.1.0/24</code>) for the "Allowed IPs" and "Denied IPs" fields to control access to the proxy.</p>
</div>
<div>
<h3 class="text-xl font-semibold mb-2">Dual-Protocol Gateway</h3>
<p>The Fortress can act as both an <b>Ollama</b> and an <b>OpenAI</b> server. This allows you to use your Hub with any software that expects either API style.</p>
<ul class="list-disc list-inside space-y-2 mt-2 pl-4">
<li><b>Port Separation:</b> You can run the OpenAI API on a different port (default 8081) for specialized routing or firewall rules.</li>
<li><b>Automatic Translation:</b> The Hub handles the heavy lifting of translating OpenAI's JSON schema to your backend models automatically.</li>
<li><b>Agent Support (Tool Calling):</b> Full support for function calling. You can now use the Hub with <b>CrewAI</b>, <b>LangChain</b>, or <b>AutoGen</b> using either the OpenAI or Ollama protocols.</li>
</ul>
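<p class="mt-3">A minimal sketch of a function-calling request against the OpenAI-style endpoint (the port and path follow the defaults described above; the <code class="inline-code">get_weather</code> tool is purely illustrative):</p>
<div class="code-block-wrapper mt-2">
<pre><code class="language-python">def build_tool_call_request(model, prompt):
    """Build an OpenAI-style chat.completions payload with one callable tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

payload = build_tool_call_request("llama3", "What is the weather in Paris?")
# POST this to http://127.0.0.1:8081/v1/chat/completions with your
# "Authorization: Bearer ..." header; the model may answer with a
# tool_calls entry instead of plain text.</code></pre>
<button class="copy-button">Copy</button>
</div>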
</div>
<div>
<h3 class="text-xl font-semibold mb-2">Rate Limiting & Brute-Force Protection</h3>
<p>When Redis is enabled, you can set global or per-key rate limits. The proxy also automatically blocks IPs that have too many failed admin login attempts.</p>
</div>
</div>
</div>
<!-- Portable Tools & Multi-User Persistence Section -->
<div id="tool-portability" class="card-style scroll-mt-20 border-l-4 border-sky-500">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🛠️ Portable Tools & User Persistence</h2>
<div class="space-y-6 text-current">
<p>LoLLMs Hub follows the <b>Standard Host Interface</b> protocol. This allows tools developed here to run unchanged in the LoLLMs PyQt app or the WebUI.</p>
<div class="grid grid-cols-1 md:grid-cols-2 gap-6">
<div class="p-4 bg-white/5 rounded-lg border border-white/10">
<h4 class="font-bold text-sky-400 mb-2">The 'lollms' Parameter</h4>
<p class="text-sm">To access user-specific data, add <code>lollms</code> to your function signature. The Hub will automatically inject the host interface.</p>
<pre class="text-[10px] mt-2 bg-black/40 p-2 rounded">def tool_my_logic(args, lollms=None):
    if lollms:
        username = lollms.user['username']</pre>
</div>
<div class="p-4 bg-white/5 rounded-lg border border-white/10">
<h4 class="font-bold text-sky-400 mb-2">Persistence API</h4>
<p class="text-sm">Never use <code>open()</code> for state. Use the host methods to ensure data is isolated by user and tool library.</p>
<ul class="text-xs space-y-1 mt-2 list-disc list-inside">
<li><code>lollms.set(key, val, persistent=True)</code></li>
<li><code>lollms.get(key, default)</code></li>
<li><code>lollms.delete(key)</code></li>
</ul>
</div>
</div>
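<p>Putting both cards together: a tool that keeps a per-user counter via the host interface. For local testing you can inject a tiny stand-in object (the <code>FakeHost</code> class below is only a test stub; the real <code>lollms</code> object is supplied by the Hub at runtime):</p>
<div class="code-block-wrapper mt-2">
<pre><code class="language-python">def tool_count_calls(args, lollms=None):
    """Increment and return a per-user call counter using the scoped persistence API."""
    if lollms is None:
        return {"error": "host interface not injected"}
    count = lollms.get("call_count", 0) + 1
    lollms.set("call_count", count, persistent=True)
    return {"user": lollms.user["username"], "call_count": count}

# Minimal stand-in for local testing only (not the real host interface)
class FakeHost:
    def __init__(self, username):
        self.user = {"username": username}
        self._store = {}
    def get(self, key, default=None):
        return self._store.get(key, default)
    def set(self, key, val, persistent=False):
        self._store[key] = val

host = FakeHost("alice")
print(tool_count_calls({}, lollms=host))  # {'user': 'alice', 'call_count': 1}</code></pre>
<button class="copy-button">Copy</button>
</div>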
<div class="mt-4 p-3 rounded-md text-sm warning-box">
<strong>Portability Mandate:</strong> Do <u>not</u> import any files from the <code>app/</code> directory. Use <code>pipmaster</code> for external libraries and the <code>lollms</code> object for everything else.
</div>
</div>
</div>
<!-- Skills & Personalities Section -->
<div id="skills-personalities" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🧠 Skills & Personalities</h2>
<div class="space-y-8 text-current">
<div>
<h3 class="text-xl font-semibold mb-2">Skills Library</h3>
<p>Skills are specialized instructions, workflows, or tool definitions that you can plug into your models.</p>
<ul class="list-disc list-inside space-y-2 mt-2 pl-4">
<li><strong>Standardization:</strong> Skills allow you to define repeatable procedures (like "Python Code Reviewer") in a single <code class="inline-code">.md</code> file.</li>
<li><strong>Format:</strong> They use YAML Frontmatter (for metadata) and Markdown (for body logic).</li>
<li><strong>Portability:</strong> You can export/import skills using <code class="inline-code">.skill</code> files, which are simple ZIP archives of the skill folder.</li>
</ul>
</div>
<div>
<h3 class="text-xl font-semibold mb-2">Personalities Studio</h3>
<p>Personalities define the "soul" of your AI. They determine how the model speaks, acts, and behaves.</p>
<ul class="list-disc list-inside space-y-2 mt-2 pl-4">
<li><strong>Identity:</strong> Use the <code class="inline-code">## Identity</code> section to set the persona's backstory, tone, and character.</li>
<li><strong>Behaviour:</strong> Use the <code class="inline-code">## Behaviour</code> section to define the core system prompt, greeting, and constraints.</li>
<li><strong>Memory Seeds:</strong> You can embed persistent facts within the personality to ensure the model always knows specific details about your environment or preferences.</li>
</ul>
</div>
</div>
</div>
<!-- Usage Examples Section -->
<div id="examples" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">👨‍💻 Usage Examples</h2>
<div class="space-y-8">
<div>
<h3 class="text-xl font-semibold mb-2">cURL</h3>
<div class="code-block-wrapper">
<pre><code class="language-bash">curl http://127.0.0.1:8080/api/generate \
  -H "Authorization: Bearer op_prefix_secret" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'</code></pre>
<button class="copy-button">Copy</button>
</div>
</div>
<div>
<h3 class="text-xl font-semibold mb-2">Python (`requests`)</h3>
<div class="code-block-wrapper">
<pre><code class="language-python">import requests

proxy_url = "http://127.0.0.1:8080/api/chat"
api_key = "op_prefix_secret"

headers = {"Authorization": f"Bearer {api_key}"}
data = {
    "model": "llama3",
    "messages": [
        {"role": "user", "content": "Explain quantum computing simply."}
    ],
    "stream": False,
}

# json= serializes the payload and sets the Content-Type header automatically
response = requests.post(proxy_url, headers=headers, json=data)
print(response.json())</code></pre>
<button class="copy-button">Copy</button>
</div>
</div>
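<div>
<h3 class="text-xl font-semibold mb-2">Python (streaming)</h3>
<p class="mb-2">When <code class="inline-code">stream</code> is true, Ollama-style endpoints reply with one JSON object per line. A sketch of a client-side accumulator (the sample lines below are synthetic stand-ins for a real stream):</p>
<div class="code-block-wrapper">
<pre><code class="language-python">import json

def collect_stream(lines):
    """Join the content chunks of an Ollama-style NDJSON chat stream."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # keep-alive blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# With requests you would pass stream=True and feed response.iter_lines():
#   resp = requests.post(url, headers=headers, json=data, stream=True)
#   text = collect_stream(resp.iter_lines())
sample = [
    '{"message": {"role": "assistant", "content": "Hello"}, "done": false}',
    '{"message": {"role": "assistant", "content": " world"}, "done": true}',
]
print(collect_stream(sample))  # Hello world</code></pre>
<button class="copy-button">Copy</button>
</div>
</div>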
</div>
</div>
<!-- Redis Setup Guide -->
<div id="redis-setup" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">⚡ Redis Setup Guide</h2>
<div class="space-y-6 text-current">
<div>
<h3 class="text-xl font-semibold mb-2">Why Do I Need Redis?</h3>
<p>Redis is an in-memory database that provides high-speed data access. This application uses it for two optional but important security features:</p>
<ul class="list-disc list-inside space-y-2 mt-2 pl-4">
<li><strong>API Rate Limiting:</strong> Prevents abuse by limiting how many requests a key can make in a given time.</li>
<li><strong>Brute-Force Protection:</strong> Temporarily locks an IP address after too many failed admin logins.</li>
</ul>
<div class="mt-4 p-3 rounded-md text-sm warning-box">
<strong>Note:</strong> The proxy will work perfectly without Redis, but these security features will be disabled.
</div>
</div>
<div>
<h3 class="text-xl font-semibold mb-2">Installation</h3>
<p class="mb-4">The easiest way to run Redis on any platform is with Docker.</p>
<div class="code-block-wrapper mt-2">
<pre><code class="language-bash">docker run -d --name redis-stack -p 6379:6379 --restart always redis/redis-stack:latest</code></pre>
<button class="copy-button">Copy</button>
</div>
</div>
<div>
<h3 class="text-xl font-semibold mb-2">Configuration</h3>
<p>Once Redis is running, go to the <a href="{{ url_for('admin_settings') }}" class="font-semibold text-[var(--color-primary-600)] hover:underline">Settings</a> page, enter your Redis connection details, and save. The proxy will connect automatically.</p>
</div>
</div>
</div>
<!-- Home AI Hub Configuration -->
<div id="home-hub" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">🏠 Setting Up a Home AI Hub</h2>
<div class="space-y-4 text-current">
<p>To provide AI access to your whole family (Phones, Raspberry Pis, other PCs) using your main Gamer PC:</p>
<div class="grid grid-cols-1 md:grid-cols-2 gap-4">
<div class="p-4 bg-white/5 rounded-lg border border-white/10">
<h4 class="font-bold text-[var(--color-primary-500)] mb-2">1. Multi-GPU Optimization</h4>
<p class="text-sm">Use the <strong>Instance Manager</strong> to start one Ollama process per GPU. Assign <code class="inline-code">CUDA_VISIBLE_DEVICES=0</code> to Instance A and <code class="inline-code">CUDA_VISIBLE_DEVICES=1</code> to Instance B. This allows your family to generate in parallel without slowing each other down.</p>
</div>
<div class="p-4 bg-white/5 rounded-lg border border-white/10">
<h4 class="font-bold text-[var(--color-primary-500)] mb-2">2. Family API Keys</h4>
<p class="text-sm">Create a user for each family member. Generate a unique API key for your daughter's PC and another for your phone's Telegram bot. This lets you track usage and set individual limits.</p>
</div>
</div>
<div class="mt-4 p-3 rounded-md text-sm info-box">
<strong>Pro Tip:</strong> Use <strong>Smart Routers</strong> to create an "Easy Mode" model name. Your family can just use the model <code class="inline-code">family-gpt</code>, and the proxy will automatically choose the fastest available GPU for them.
</div>
</div>
</div>
<!-- Credits Section -->
<div id="credits" class="card-style scroll-mt-20">
<h2 class="card-header text-2xl font-bold mb-6 pb-2">💖 Credits & Acknowledgements</h2>
<div class="space-y-4 text-current">
<p>This application was developed with passion by the open-source community. It stands on the shoulders of giants and wouldn't be possible without the following incredible projects:</p>
<ul class="list-disc list-inside space-y-2 pl-4">
<li><strong><a href="https://fastapi.tiangolo.com/" target="_blank" class="font-semibold text-[var(--color-primary-600)] hover:underline">FastAPI</a></strong>, <strong><a href="https://www.sqlalchemy.org/" target="_blank" class="font-semibold text-[var(--color-primary-600)] hover:underline">SQLAlchemy</a></strong>, <strong><a href="https://jinja.palletsprojects.com/" target="_blank" class="font-semibold text-[var(--color-primary-600)] hover:underline">Jinja2</a></strong>, <strong><a href="https://www.chartjs.org/" target="_blank" class="font-semibold text-[var(--color-primary-600)] hover:underline">Chart.js</a></strong>, and <strong><a href="https://tailwindcss.com/" target="_blank" class="font-semibold text-[var(--color-primary-600)] hover:underline">Tailwind CSS</a></strong>.</li>
</ul>
<p class="pt-4">Project built and maintained by <strong>ParisNeo</strong> with help from AI and cool developers (see the contributors list on the GitHub page).</p>
<p>Visit the project on <a href="https://github.com/ParisNeo/lollms_hub" target="_blank" class="font-semibold text-[var(--color-primary-600)] hover:underline">GitHub</a> to contribute, report issues, or star the repository!</p>
</div>
</div>
</div>
<!-- Sticky Table of Contents -->
<div class="w-full lg:w-1/4">
<div class="sticky top-10">
<div class="card-style">
<h3 class="card-header text-lg font-bold mb-4">On this page</h3>
<nav>
<ul class="space-y-2">
<li><a href="#multi-user-architecture-the-fortress-pattern" class="toc-link">Multi-User Architecture</a></li>
<li><a href="#administrator-s-manual-system-setup" class="toc-link">🛡️ Administrator Guide</a></li>
<li><a href="#developer-s-manual-tool-graph-building" class="toc-link">👨‍💻 Developer Guide</a></li>
<li><a href="#end-user-manual-daily-use" class="toc-link">👤 End-User Guide</a></li>
<li><a href="#multi-node-architecture-diagram" class="toc-link">📡 Infrastructure Flow</a></li>
<li><a href="#community-credits" class="toc-link">Community & Credits</a></li>
</ul>
</nav>
</div>
</div>
</div>
</div>
<script src="{{ url_for('static', path='vendor/marked.min.js') }}"></script>
<script>
document.addEventListener('DOMContentLoaded', () => {
// Parse Markdown
const content = document.getElementById('markdown-body');
const mdContent = `
# 🧊 LoLLMs Hub: The Multi-User AI Fortress
LoLLMs Hub is a secure, multi-tenant agentic runtime. Unlike standard proxies, it provides a **sandboxed environment** for multiple users to share compute resources while maintaining absolute privacy for their data, memories, and tools.
---
## 🏗️ Multi-User Architecture (The Fortress Pattern)
LoLLMs Hub uses a "Fortress" architecture to ensure that one user's AI activity never bleeds into another's.
### 🔐 The Four Pillars of Isolation
| Pillar | Mechanism | User Impact |
| :--- | :--- | :--- |
| **Cognitive Memory** | Per-User Database Rows | The AI learns *your* name and *your* preferences without confusing them with others. |
| **Tool Persistence** | Scoped Host Interface | Tools using \`lollms.set()\` save data specific to the current API key and user. |
| **Workflow Privacy** | Virtual Model Naming | Users call custom workflows by name (e.g., \`my-legal-agent\`) without needing to know the backend IP. |
| **Throughput Control** | Redis Rate Limiting | Administrators can prevent a single user from saturating all GPU workers. |
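To make the scoping concrete, here is a minimal, illustrative sketch of the per-user persistence idea behind \`lollms.get()\` / \`lollms.set()\` (the real Hub backs this with database rows keyed by user, not an in-memory dict):
\`\`\`python
# Illustrative only: namespaces every stored value by the caller's API key,
# so two users can reuse the same key name without collisions.
class ScopedStore:
    def __init__(self):
        self._rows = {}

    def set(self, api_key, key, value):
        self._rows[(api_key, key)] = value

    def get(self, api_key, key, default=None):
        return self._rows.get((api_key, key), default)

store = ScopedStore()
store.set("key-A", "name", "Alice")
store.set("key-B", "name", "Bob")
# store.get("key-A", "name") yields Alice's value; "key-B" sees Bob's.
\`\`\`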
---
## 🛠️ Manuals by Role
### 🛡️ Administrator's Manual (System Setup)
*For those managing the hardware and users.*
1. **Server Clustering**: Add multiple [Ollama/vLLM Nodes]({{ url_for('admin_servers') }}). Set **Max Parallelism** to tell the Hub how many queries each node can handle simultaneously.
2. **Cluster Health**: Monitor the [Live System Flow]({{ url_for('admin_live_status') }}) to see traffic distribution across your GPU farm.
3. **User Onboarding**: Create [User Accounts]({{ url_for('admin_users') }}) and issue API keys. Keys can be disabled or revoked instantly without restarting the server.
4. **Endpoint Security**: Restrict sensitive paths (like \`/api/pull\`) in **Settings** to prevent unauthorized model downloads.
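The path-restriction idea in step 4 boils down to a prefix check; a toy sketch (the blocked paths and setting shape here are illustrative, not the Hub's actual configuration format):
\`\`\`python
# Toy version of the endpoint-restriction check; which prefixes are
# blocked depends on what you configure in Settings.
BLOCKED_PREFIXES = ("/api/pull", "/api/delete")

def is_allowed(path):
    # str.startswith accepts a tuple of prefixes.
    return not path.startswith(BLOCKED_PREFIXES)
\`\`\`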
### 👨‍💻 Developer's Manual (Tool & Graph Building)
*For those building custom AI logic.*
1. **Portable Tools**: Build Python scripts in the [Tools Library]({{ url_for('admin_tools') }}).
- **Important**: Always use the \`lollms\` parameter to access per-user data.
- **Standard**: \`lollms.get("key")\` automatically retrieves the value for the *calling user*, not a global value.
2. **Workflow Architect**: Use the [Conception UI]({{ url_for('admin_conception') }}) to build graphs.
- **Tip**: Click **"Build with AI"** to describe a flow and have the Hub generate the LiteGraph JSON for you.
3. **MCP Integration**: Use the \`hub/mcp\` node to connect to the global ecosystem of Model Context Protocol servers.
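A minimal tool skeleton following the rule above (the \`run\` signature and the default argument to \`lollms.get\` are illustrative assumptions, not the exact Hub API):
\`\`\`python
# Hypothetical tool script: counts how many times THIS user has
# invoked it. lollms.get/set are scoped to the calling user.
def run(lollms, prompt):
    count = lollms.get("call_count", 0) + 1  # default argument assumed
    lollms.set("call_count", count)
    return "Tool call #" + str(count) + " for this user."
\`\`\`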
### 👤 End-User Manual (Daily Use)
*For those using the AI via API or Playground.*
1. **The "auto" Model**: You don't need to pick a model. Just request \`auto\`. The Hub detects if you're coding, asking about images, or need complex reasoning and picks the best hardware.
2. **Cognitive Memory**: Talk to the bot naturally. It uses a **Tiered Memory** system.
- **Working Memory**: Important facts are always known.
- **Long-term Handles**: Old facts are "remembered" only when relevant to save context space.
3. **Chat Playgrounds**: Use the [Playground]({{ url_for('admin_playground') }}) to test models with image support, document uploads, and real-time telemetry.
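Requesting \`auto\` from code looks like any OpenAI-style chat call; a sketch (the \`/v1/chat/completions\` route and host are assumptions about your deployment):
\`\`\`python
import json

def build_chat_request(api_key, prompt, model="auto"):
    # "auto" lets the Hub pick the backend; a concrete model name also works.
    headers = {
        "Authorization": "Bearer " + api_key,
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(payload)

headers, body = build_chat_request("sk-your-key", "Summarize my notes")
# POST the body to http://your-hub:8000/v1/chat/completions with these headers.
\`\`\`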
---
## 📡 Multi-Node Architecture Diagram
\`\`\`mermaid
graph TD
UserA[User A / Dev] -->|API Key A| Hub[LoLLMs Hub Fortress]
UserB[User B / App] -->|API Key B| Hub
subgraph Isolation_Layer [Isolation & Persistence]
Hub --> MemA[(User A Memory)]
Hub --> MemB[(User B Memory)]
end
subgraph Compute_Farm [GPU Cluster]
Hub -->|Least Loaded| Srv1[Ollama: GPU 0]
Hub -->|Least Loaded| Srv2[Ollama: GPU 1]
Hub -->|OpenAI Protocol| Srv3[vLLM: Cloud]
end
\`\`\`
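The "Least Loaded" edges in the diagram can be summarized by a tiny picker (a toy model; the real scheduler also tracks node health and honors each node's Max Parallelism):
\`\`\`python
# Toy least-loaded selection among backend nodes.
def pick_server(servers):
    # servers: list of dicts like {"name": ..., "active": ..., "max": ...}
    free = [s for s in servers if s["active"] < s["max"]]
    if not free:
        return None  # every node is saturated; the request should queue
    return min(free, key=lambda s: s["active"] / s["max"])
\`\`\`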
---
## 👨‍💻 Community & Credits
LoLLMs Hub is built with ❤️ by **ParisNeo** and the open-source community.
Visit the project on [GitHub](https://github.com/ParisNeo/lollms_hub) to contribute!
`.trim();
content.innerHTML = marked.parse(mdContent);
// Copy button logic
document.querySelectorAll('.copy-button').forEach(button => {
button.addEventListener('click', () => {
const codeBlock = button.previousElementSibling ? button.previousElementSibling.querySelector('code') : null;
if (!codeBlock) return; // Guard: the rendered markup may not place a <pre> before the button
const text = codeBlock.innerText;
navigator.clipboard.writeText(text).then(() => {
const originalText = button.textContent;
button.textContent = 'Copied!';
setTimeout(() => {
button.textContent = originalText;
}, 2000);
}).catch(err => {
console.error('Failed to copy text: ', err);
});
});
});
// Table of Contents scroll-spy logic
const sections = document.querySelectorAll('div[id]');
const tocLinks = document.querySelectorAll('.toc-link');
const observer = new IntersectionObserver((entries) => {
let visibleSectionId = null;
let maxRatio = 0;
entries.forEach(entry => {
if (entry.isIntersecting && entry.intersectionRatio > maxRatio) {
maxRatio = entry.intersectionRatio;
visibleSectionId = entry.target.getAttribute('id');
}
});
if (!visibleSectionId && entries.length > 0 && entries[0].isIntersecting) {
visibleSectionId = entries[0].target.getAttribute('id');
}
if (visibleSectionId) {
tocLinks.forEach(link => {
link.classList.remove('active-toc');
if (link.getAttribute('href') === `#${visibleSectionId}`) {
link.classList.add('active-toc');
}
});
}
}, {
rootMargin: '-20% 0px -70% 0px',
threshold: [0.1, 0.5, 0.9]
});
sections.forEach(section => {
observer.observe(section);
});
});
</script>
{% endblock %}