mirror of https://github.com/ParisNeo/ollama_proxy_server.git synced 2026-04-22 03:00:03 -04:00

Go to file

Saifeddine ALOUI 712319e929 feat(admin): improve instance lifecycle and datastore ingestion

- Refactor instance state checking in admin toggle endpoint
- Add AI processing option to document ingestion pipeline
- Enhance instance manager with cross-platform signal handling
- Fix node registry path resolution for robust module loading
- Update admin templates with CSRF protection and UI polish

2026-04-21 05:07:05 +02:00

.github/workflows

ci: streamline GitHub Actions and tox configuration

2026-02-17 08:53:19 +01:00

app

feat(admin): improve instance lifecycle and datastore ingestion

2026-04-21 05:07:05 +02:00

assets

updated README

2025-10-31 03:56:08 +01:00

benchmarks

enhanced

2025-11-10 03:11:51 +01:00

docs

feat(api/core): Refactor workflow engine and agent structure for enhanced evaluation reporting

2026-04-20 01:00:10 +02:00

src

feat(admin): improve instance lifecycle and datastore ingestion

2026-04-21 05:07:05 +02:00

tests

add tox for linting and testing and add ci/cd

2025-11-13 16:02:33 +00:00

useful_scripts

feat: enhance system architecture, introduce space_hunter, and update admin interfaces

2026-04-17 19:22:34 +02:00

.env.example

chore(rebrand): rename project from "Ollama Proxy Fortress" to "lollms hub"

2026-03-22 01:12:20 +01:00

.gitignore

feat: enhance system architecture, introduce space_hunter, and update admin interfaces

2026-04-17 19:22:34 +02:00

.gitlab-ci.yml

add tox for linting and testing and add ci/cd

2025-11-13 16:02:33 +00:00

CONTRIBUTING.md

chore(rebrand): rename project from "Ollama Proxy Fortress" to "lollms hub"

2026-03-22 01:12:20 +01:00

DEVELOPMENT.md

chore(rebrand): rename project from "Ollama Proxy Fortress" to "lollms hub"

2026-03-22 01:12:20 +01:00

Dockerfile

refactor(docker): optimize multi-stage build for production

2026-04-14 23:33:13 +02:00

gunicorn_conf.py

feat(admin): add log rotation and retention controls

2026-03-24 02:29:49 +01:00

LICENSE

Initial commit

2024-01-15 23:35:19 +01:00

poetry.lock

Bump urllib3 from 2.5.0 to 2.6.3

2026-02-17 11:41:06 +00:00

purge_bloat.py

feat: Implement core API routing, proxy logic, and logging configuration

2026-04-15 18:19:23 +02:00

pyproject.toml

Update pyproject.toml

2026-04-20 09:17:19 +02:00

README.md

feat(admin): add internal system bypass and enhance admin settings

2026-04-10 20:00:08 +02:00

requirements.txt

Update requirements.txt

2026-04-20 09:12:57 +02:00

reset_admin_password.bat

chore(rebrand): rename project from "Ollama Proxy Fortress" to "lollms hub"

2026-03-22 01:12:20 +01:00

reset_admin_password.py

chore(rebrand): rename project from "Ollama Proxy Fortress" to "lollms hub"

2026-03-22 01:12:20 +01:00

reset_admin_password.sh

chore(rebrand): rename project from "Ollama Proxy Fortress" to "lollms hub"

2026-03-22 01:12:20 +01:00

reset.bat

chore(rebrand): rename project from "Ollama Proxy Fortress" to "lollms hub"

2026-03-22 01:12:20 +01:00

reset.sh

chore(rebrand): rename project from "Ollama Proxy Fortress" to "lollms hub"

2026-03-22 01:12:20 +01:00

run_windows.bat

refactor(api): update authentication handling and fix token validation logic

2026-04-15 22:34:46 +02:00

run.sh

feat(run.sh): add automatic system dependency installer for Linux

2026-04-15 22:39:35 +02:00

setup_wizard.py

chore(rebrand): rename project from "Ollama Proxy Fortress" to "lollms hub"

2026-03-22 01:12:20 +01:00

todo.md

Update todo.md

2026-04-14 18:23:10 +02:00

tox.ini

ci: streamline GitHub Actions and tox configuration

2026-02-17 08:53:19 +01:00

README.md

🧊 LoLLMs Hub Fortress: The Universal AI Gateway

LoLLMs Hub is a secure, high-performance orchestration layer that unifies disparate AI backends into a single, managed API. It serves as a fortress for your compute resources, providing enterprise-grade security, intelligent routing, and advanced workflow automation.

🏢 Unified Multi-Backend Architecture

LoLLMs Hub is designed for extreme scale and flexibility. It treats different inference engines as swappable compute nodes:

Ollama: Native support for local model management and serving.
vLLM: High-throughput serving for production-grade LLM clusters.
llama.cpp: Lightweight, GGUF-based local serving via the internal Binary Hub.
OpenAI-Compatible APIs: Seamlessly proxy and load-balance cloud providers.

🏢 Hierarchical Architecture (The Master Hub Pattern)

LoLLMs Hub is designed for extreme scale. Because its API is 100% compatible with Ollama and OpenAI, you can chain multiple Hubs together.

Multi-Node Cluster Flow

graph TD
    Client[AI Client / App] -->|Auth & Load Balance| Master[Master Node: LoLLMs Hub]
    
    subgraph Machine_A [Worker 1: Windows Gamer PC]
        Master --> SlaveA[Local Slave Hub]
        SlaveA --> InstA1[Ollama Instance 1: GPU 0]
        SlaveA --> InstA2[Ollama Instance 2: GPU 1]
    end
    
    subgraph Machine_B [Worker 2: Linux Server]
        Master --> SlaveB[Ollama Service]
    end
    
    subgraph Machine_C [Worker 3: Cloud]
        Master --> SlaveC[vLLM / OpenAI API]
    end

Why use a Slave Hub?

Local Management: The Slave Hub handles spawning local processes on the worker machine.
Simplified Master: The Master Hub only sees one "Remote Server" entry instead of dozens of raw instances.
Security: Only the Master needs public access; Slaves can stay on a private VPN/VLAN.

🧠 Advanced AI Orchestration: Routers & Ensembles

Smart Routers: Hierarchical Model Routing

Create virtual "traffic controllers" that intelligently route requests to the best model for the job:

Routing Strategies:

Priority: Always try the first model, fall back to second if unavailable
Random: Even distribution for large clusters
Least Loaded: Route to the backend with lowest active request count (best for high-TPS apps)

Decision Rules (Evaluated Top-to-Bottom):

Fast Rules: Keywords, regex patterns, message length, image detection, specific users
Semantic Rules: Use a small LLM to classify intent when pattern matching isn't enough

Example - Vision-Enabled Router:

User Request: "What's in this image?" [attached photo]
↓
Rule: has_images = true → Route to gemma3:27b (VLM)
↓
VLM analyzes image → "A golden retriever playing fetch"
↓
Description + original prompt → Sent to llama3.1:70b (powerful text model)

Ensemble Orchestrators (MoE)

Combine multiple models in parallel to create superior reasoning:

Flow:

User Query: "Analyze this contract for risks"
↓
Parallel Execution:
├─ Contract-Law-Agent → "Clause 3 is non-standard"
├─ Case-Law-Agent → "Similar to Smith v. Jones (2019)"
└─ Compliance-Agent → "GDPR Article 17 may apply"
↓
Master Model Synthesizes:
"Based on expert analysis: The non-standard clause (3) resembles 
Smith v. Jones where it was upheld. However, GDPR compliance 
requires additional safeguards. Recommendation:..."

Quick Vision Enabler

The fastest way to give your text-only models vision capabilities:

Before: Your powerful 70B text model can't see images 😢 After: Create a vision router in 10 seconds ✅

Select your text model (e.g., nemotron-3-super)
Select a vision model (e.g., gemma3:27b)
Name it (e.g., smart-vision-assistant)
Done! Use smart-vision-assistant as any model name

🔒 Encrypt Everything with One-Click HTTPS/SSL

Securing your AI traffic is now dead simple. In the Settings -> HTTPS/SSL menu, you have two easy options:

Upload & Go (Easiest): Simply upload your key.pem and cert.pem files directly through the UI. The hub handles the rest.
Path-Based: If your certificates are already on the server (e.g., managed by Certbot), just provide the full file paths.

A server restart is required to apply changes, ensuring your connection is fully encrypted and secure from eavesdropping.

Get Started in 60 Seconds (Yes, Really!)

1. Download the Project

Download the source code from the repository, either by using git or by downloading the ZIP file and extracting it.

git clone https://github.com/ParisNeo/lollms_hub.git
cd lollms_hub

2. Run the Installer

Based on your operating system, run the appropriate script. The first time you run it, it will guide you through a simple setup wizard.

On Windows: Simply double-click run_windows.bat.

On macOS or Linux: Open your terminal, navigate to the project folder, and run:

chmod +x run.sh
./run.sh

That's it! The hub is now running. To stop it, just close the terminal window or press Ctrl+C.

Visual Showcase

Step 2: The Command Center Dashboard

Your new mission control. Instantly see system health, active models, server status, and live rate-limit queues, all updating automatically.

Step 3: Manage Your Servers & Models

No more SSH or terminal juggling. Add all your Ollama instances to the hub, then pull, update, and delete models on any server with a few clicks.

Step 4: Choose Your Look: The Theming Engine

Navigate to the Settings page and instantly transform the entire UI. Pick a style that matches your mood or your desktop setup.

Step 5: Manage Users & Drill Down into Analytics

The User Management page gives you a sortable, high-level overview. From here, click "View Usage" to dive into a dedicated analytics page for any specific user.

Step 6: Create Intelligent Model Orchestration

Navigate to Smart Routers to build hierarchical routing logic. Create a vision enabler in seconds: select a text model + VLM, name it, and instantly give your text-only AI image understanding.

Or go to Ensembles to build Mixture-of-Experts pipelines. Define multiple parallel agents and a master synthesizer for complex reasoning tasks.

Step 6: Create Intelligent Model Orchestration

Navigate to Smart Routers to build hierarchical routing logic. Create a vision enabler in seconds: select a text model + VLM, name it, and instantly give your text-only AI image understanding.

Or go to Ensembles to build Mixture-of-Experts pipelines. Define multiple parallel agents and a master synthesizer for complex reasoning tasks.

Step 7: Test & Benchmark in the Playgrounds

Use the built-in playgrounds to evaluate your models. The Chat Playground provides a familiar UI to test conversational models with streaming and image support. The Embedding Playground lets you visualize and benchmark how different models understand semantic relationships using powerful 2D plots.

Step 8: Master Your Analytics

The main "Usage Stats" page and the per-user pages give you a beautiful, exportable overview of exactly how your models are being used.

Step 8: Get Help When You Need It

The built-in Help page is now a rich document with a sticky table of contents that tracks your scroll position, making it effortless to find the information you need.

For the Power Users: Docker Deployment

If you prefer a container-based workflow, we've got you covered.

1. Build the Docker image:

docker build -t lollms-hub .

2. Run the container: Create a .env file on your host machine, then run:

docker run -d --name lollms-hub \
  -p 8080:8080 \
  --env-file ./.env \
  -v ./lollms_hub.db:/home/app/lollms_hub.db \
  -v ./.ssl:/home/app/.ssl \
  -v ./app/static/uploads:/home/app/app/static/uploads \
  lollms-hub

Note the extra volume mounts for the database, SSL files, and user uploads to persist data outside the container.

Resetting Your Installation (Troubleshooting)

WARNING: IRREVERSIBLE ACTION

The reset scripts are for troubleshooting or starting over completely. They will PERMANENTLY DELETE your database, configuration, and Python environment.

If you encounter critical errors or wish to perform a completely fresh installation, use the provided reset scripts.

On Windows: Double-click the reset.bat file.

On macOS or Linux:

chmod +x reset.sh
./reset.sh

Credits and Acknowledgements

This application was developed with passion by the open-source community. A special thank you to:

ParisNeo for creating and maintaining this project.
All contributors who have helped find and fix bugs.
The teams behind FastAPI, SQLAlchemy, Jinja2, Chart.js, and Tailwind CSS.

Visit the project on GitHub to contribute, report issues, or star the repository!

License

This project is licensed under the Apache License 2.0. Feel free to use, modify, and distribute.

Languages

Python 48.7%

HTML 45.2%

JavaScript 2.5%

CSS 1.7%

Shell 1.5%

Other 0.4%

README.md

🧊 LoLLMs Hub Fortress: The Universal AI Gateway

🏢 Unified Multi-Backend Architecture

🏢 Hierarchical Architecture (The Master Hub Pattern)

Multi-Node Cluster Flow

🧠 Advanced AI Orchestration: Routers & Ensembles

Smart Routers: Hierarchical Model Routing

Ensemble Orchestrators (MoE)

Quick Vision Enabler

🔒 Encrypt Everything with One-Click HTTPS/SSL

Get Started in 60 Seconds (Yes, Really!)

1. Download the Project

2. Run the Installer

Visual Showcase

Step 1: Secure Admin Login

Step 2: The Command Center Dashboard

Step 3: Manage Your Servers & Models

Step 4: Choose Your Look: The Theming Engine

Step 5: Manage Users & Drill Down into Analytics

Step 6: Create Intelligent Model Orchestration

Step 6: Create Intelligent Model Orchestration

Step 7: Test & Benchmark in the Playgrounds

Step 8: Master Your Analytics

Step 8: Get Help When You Need It

For the Power Users: Docker Deployment

Resetting Your Installation (Troubleshooting)

Credits and Acknowledgements

License