Run, Swap, and Scale LLM Backends. The Local MLOps Control Plane for AI.
Building and bench-testing AI is a pain. You're stuck wrestling a fragile stack of scripts, containers, and boilerplate just to test one prompt. Aleutian unifies your entire MLOps workflow—RAG, data, observability—and gives you dynamic control to swap between public APIs and private, fine-tuned models, all from a single control plane. It gives you the freedom to build, the control to deploy, and the sovereignty to own your data.
Before You Start: Prerequisites
- Podman Desktop installed, with the Podman machine running
- Ollama app installed and running, with a local model pulled (this model runs locally and is required for the stack to start)
- A Hugging Face token, required for embedding models
- An OpenAI API key, required by the orchestrator (or use 'dummy-key')
Ensure Podman & Ollama are running before you start.
Your App, Your Data, Your Choice
Aleutian is the vital link that "chains" your app to any backend, giving you 100% control over cost, privacy, and power.
Your Application (CLI, API, or UI)
    ↓
Aleutian Control Plane (RAG Engine, Privacy Scan, History)
    ↓
Swappable Backends:
- Ollama (Local): Free, Private, Fast
- OpenAI / Gemini: Max Power, Pay-as-you-go
- Private TGI/vLLM: Self-Hosted, VPC
A Familiar Workflow
No complex abstractions. Just simple, powerful commands that fit right into your workflow.
$ aleutian populate vectordb ./my-local-docs
INFO[0000] Starting recursive file walk of './my-local-docs'
INFO[0000] Found 24 files.
INFO[0000] Scanning 'annual-report-2024.pdf' for PII...
INFO[0000] > Found 3 potential PII matches (Credit Card).
INFO[0000] > User approved ingestion.
INFO[0000] Calling PDF Parser service...
INFO[0000] Embedding 120 chunks for 'annual-report-2024.pdf'
...
INFO[0002] Successfully populated 24 documents.
Who is Aleutian For?
Does This Sound Like You?
Aleutian isn't just a tool; it's a workflow. It's designed for specific, painful problems that AI/ML engineers face every day.
The Security-Conscious Engineer
The Pain
- You are banned from using Copilot, ChatGPT, or any cloud API on proprietary code.
- You are terrified of any sensitive data leaving your network, even by accident.
- Your "solution" is running a local Ollama model, but it's just an API. You have no RAG, no data pipelines, no observability, and no UI to make it useful.
- You've been told the "hybrid" approach is secure, but you know sending any code snippets to a third party still violates policy.
The Promise
You get a 100% air-gapped MLOps stack. Analyze all your private documents with a complete, production-grade RAG and observability system that runs entirely offline on your laptop.
The AI/ML Experimenter
The Pain
- Your Jupyter Notebook is 90% boilerplate: initializing Weaviate and Ollama clients and managing API keys.
- Debugging your multi-step agent is a nightmare of `print()` statements and sifting through terminal logs.
- Adding a new tool (like a stock ticker) or a new data store (like Postgres) means building another FastAPI server and figuring out how to deploy and manage it.
- Your setup is a fragile mess of scripts that you can't easily share with your co-workers.
The Promise
You get a stable, shareable MLOps runtime. All your tools, data connectors, and RAG pipelines run in one managed, observable stack that you can call in one line: `client = AleutianClient()`.
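A minimal sketch of that one-liner in practice, assuming the `aleutian-client` package imports as `aleutian_client` and exposes a query helper (the method name and arguments here are illustrative, not the published API):

from aleutian_client import AleutianClient  # assumed import path for the aleutian-client package

# One constructor call replaces the Weaviate/Ollama client boilerplate.
client = AleutianClient()

# Hypothetical helper: run a RAG query against the already-populated vector DB.
answer = client.query("What were the key findings in the 2024 annual report?")
print(answer)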
The Cost-Conscious Developer
The Pain
- Your OpenAI API bill is already $50/day... and you're just developing.
- You're hesitant to iterate and run tests because every single RAG query costs you money.
- You want to test 5 different local GGUF models (Llama3, Mistral, Phi-3), but manually downloading, converting, and configuring each one is a painful, time-sucking process.
The Promise
You get a 100% free local dev loop. The built-in `aleutian convert` command pulls any model from Hugging Face, quantizes it (`-q bf16`), and automatically registers it with Ollama in one step (`--register`). Have a fine-tuned model? Point the same command at your local directory (`--is-local-path`) to convert and load it. Test your entire app for free, then swap models by changing one line in your config.
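For example (a sketch: the flags are the ones described above, but the positional model argument and exact invocation shape are illustrative):

$ aleutian convert meta-llama/Meta-Llama-3-8B -q bf16 --register
$ aleutian convert ./my-finetuned-model --is-local-path -q bf16 --register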
The MLOps/DevOps Engineer
The Pain
- The ML team "throws a notebook over the wall" that "works on their machine".
- Your job is to re-architect this "throwaway" prototype from scratch for production.
- It has no health checks, no observability, no container definitions, and no clear CI/CD path.
The Promise
The local `podman-compose.yml` is the blueprint for production. Because the prototype was built on Aleutian, it's already containerized, observable, and configurable. You save months of refactoring.
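One hedged illustration of that handoff: Podman can generate Kubernetes manifests directly from the running local stack (the pod name `aleutian` here is hypothetical):

$ podman generate kube aleutian > aleutian-k8s.yaml

The generated manifests are a starting point for a cluster deployment, not a finished production config.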
Technical Features
The Complete MLOps Stack
Your "Who Is This For?" workflow is powered by these six core components, all running locally.
Secure Data Ingestion
Automate data processing and privacy-scanning before it hits your vector DB.
Flexible RAG Engine
Utilize multiple RAG strategies (like Re-ranking) out-of-the-box via a simple API.
Unified LLM Access
Seamlessly switch between local models (Ollama) and external APIs (OpenAI) with one config change.
Integrated Observability
Get immediate insights with a pre-configured OpenTelemetry, Jaeger, and Prometheus stack.
Modular & Extensible
Add custom containers and tools using standard `podman-compose.override.yml` practices.
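For instance (a sketch using standard compose-override merging; the service name and image are hypothetical), an override file can bolt a custom tool container onto the stack without touching the base definition:

# podman-compose.override.yml
services:
  stock-ticker-tool:          # hypothetical custom tool service
    image: localhost/stock-ticker:latest
    ports:
      - "8099:8099"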
Production Ready
Your local stack translates directly to Kubernetes or Docker Swarm for production.
Python SDK (aleutian-client)
Integrate Aleutian's power programmatically into your notebooks, scripts, or backend applications.
Powerful CLI
Manage the stack, ingest data, run queries, convert models, and administer Weaviate directly from your terminal.
Need help with integration?
Focus on your product, not your infrastructure. Our experts can deploy and configure a bespoke Aleutian stack for your specific needs.
Learn About Enterprise
Aleutian Enterprise
Get advanced privacy controls, enhanced security, and priority support. Tell us your needs.
Built for Developers, by Developers
AleutianLocal is free, open-source, and licensed under AGPLv3. We believe in empowering developers to own their entire stack, from laptop to production.