{% extends "admin/base.html" %} {% block title %}Help & Credits{% endblock %} {% block header_title %}Help & Credits{% endblock %} {% block content %}
This proxy server acts as a secure gateway to your Ollama instances. The workflow is simple: create an API key in the admin panel, send your requests to the proxy with that key as a Bearer token, and the proxy authenticates them and forwards them to your backend Ollama servers.
When creating a new API key, you can set a custom rate limit that applies only to that key. You do this by filling in the two rate limit fields:
- **Rate Limit Requests**: The maximum number of requests this key can make within the time window.
- **Window (Minutes)**: The duration of the time window in minutes.

Example: Setting `Requests` to `100` and `Window` to `5` means this specific key can be used a maximum of 100 times every 5 minutes.
If you leave these fields blank, the key will automatically use the global rate limit settings defined in your server's `.env` file.
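A quick way to confirm a per-key limit is active is to hammer an endpoint from Python and watch for the refusal. This is a minimal sketch, assuming the proxy rejects over-limit requests with HTTP 429 (the conventional rate-limit status; verify against your server's actual behavior):

import requests

API_KEY = "op_prefix_secret"  # <-- Replace with your key
URL = "http://127.0.0.1:8080/api/tags"
headers = {"Authorization": f"Bearer {API_KEY}"}

# For a key limited to 100 requests per window, the 101st should be refused.
for i in range(1, 106):
    r = requests.get(URL, headers=headers)
    if r.status_code == 429:  # Assumed rate-limit status code
        print(f"Rate limit hit on request {i}")
        break
else:
    print("No rate limit encountered in 105 requests")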
In all examples, replace `op_..._...` with your actual generated API key.
This command fetches the federated list of all models from all active backend servers.
curl http://127.0.0.1:8080/api/tags \
-H "Authorization: Bearer op_prefix_secret"
A standard way to stream a response using the popular `requests` library.
import requests
import json

API_KEY = "op_prefix_secret"  # <-- Replace with your key
PROXY_URL = "http://127.0.0.1:8080/api/generate"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

data = {
    "model": "llama3",
    "prompt": "Tell me a short story about a robot who discovers music.",
    "stream": True
}

try:
    with requests.post(PROXY_URL, headers=headers, json=data, stream=True) as response:
        response.raise_for_status()  # Raise an exception for bad status codes
        print("Story: ", end="", flush=True)
        for chunk in response.iter_lines():
            if chunk:
                decoded_chunk = json.loads(chunk.decode('utf-8'))
                print(decoded_chunk.get("response", ""), end="", flush=True)
        print()
except requests.exceptions.RequestException as e:
    print(f"\nAn error occurred: {e}")
To use the proxy with the `lollms-client` library, configure the host address to point to the proxy and provide your key as the `service_key`.
from lollms_client import LollmsClient, MSG_TYPE

# Configure the client to use the secure proxy
lc = LollmsClient(
    llm_binding_name="ollama",
    llm_binding_config={
        "host_address": "http://localhost:8080",  # <-- Point to the proxy server
        "model_name": "llama3",                   # Or any model you have
        "service_key": "op_prefix_secret",        # <-- Replace with your API key
    }
)

def streaming_callback(data, msg_type):
    if msg_type == MSG_TYPE.MSG_TYPE_CHUNK:
        print(data, end="", flush=True)
    return True

print("--- Streaming with lollms-client ---")
lc.generate_text(
    prompt="What is the capital of France?",
    stream=True,
    streaming_callback=streaming_callback
)

print("\n\n--- Listing models ---")
models = lc.listModels()
print(models)
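If you don't need streaming, you can typically omit `stream` and `streaming_callback` and use the value returned by `generate_text` directly; check the lollms-client documentation for the exact return type.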
This application was developed with passion by the open-source community. It stands on the shoulders of giants and wouldn't be possible without many incredible open-source projects.
A special thank you to **Saifeddine ALOUI (ParisNeo)** for creating and maintaining this project.
Visit the project on GitHub to contribute, report issues, or star the repository!