Imagine being able to sell access to your proprietary datasets directly to AI assistants like Claude or ChatGPT. Not as static file downloads, but as live, queryable knowledge bases that AI can search in real-time during conversations. This is the vision behind FLTR (Backsplash): a RAG-as-a-Service platform that makes proprietary datasets accessible to AI through the Model Context Protocol (MCP).
The Vision: "Context as a Service"
Large Language Models (LLMs) are powerful, but they're limited by their training data cutoff dates and lack of domain-specific knowledge. Retrieval Augmented Generation (RAG) solves this by allowing AI to query external knowledge bases, but implementing RAG at scale comes with challenges:
- Complex infrastructure (vector databases, embedding pipelines, document processing)
- High costs (GPU compute, storage, API calls)
- Integration difficulties (connecting to AI chat interfaces)
- Multi-tenancy concerns (isolating customer data)
FLTR solves these problems by providing dataset-as-a-service with built-in RAG capabilities and MCP integration.
What is the Model Context Protocol (MCP)?
MCP is an open standard created by Anthropic for connecting AI assistants to external data sources. Think of it as a universal adapter between LLMs and your data:
┌──────────────┐
│  Claude/GPT  │  ← AI Assistant
└──────┬───────┘
       │  MCP Protocol
       ▼
┌──────────────┐
│  MCP Server  │  ← Your Data Source
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Database   │  ← PostgreSQL, Milvus, etc.
└──────────────┘
Benefits:
- Standardized interface for AI-to-data connections
- Real-time queries during AI conversations
- Proper citations in AI responses
- Access control and authentication built-in
System Architecture
FLTR is split into two cooperating stacks: a Next.js frontend and a FastAPI backend, each backed by its own PostgreSQL database:
┌──────────────────────────────────────────────┐
│              FLTR Architecture               │
├──────────────────────────────────────────────┤
│                                              │
│  ┌──────────────┐      ┌──────────────┐      │
│  │   Next.js    │      │   FastAPI    │      │
│  │   Frontend   │◄────►│   Backend    │      │
│  │  (Drizzle)   │      │ (SQLAlchemy) │      │
│  └──────┬───────┘      └──────┬───────┘      │
│         │                     │              │
│         ▼                     ▼              │
│  ┌──────────────┐      ┌──────────────┐      │
│  │ fltr_auth DB │      │   fltr DB    │      │
│  │(Users/Teams) │      │  (Datasets)  │      │
│  └──────────────┘      └──────┬───────┘      │
│                               │              │
│                               ▼              │
│                        ┌──────────────┐      │
│                        │    Milvus    │      │
│                        │  (Vectors)   │      │
│                        └──────────────┘      │
│                                              │
│  ┌────────────────────┐                      │
│  │     Modal.com      │                      │
│  │  (Document Proc.)  │                      │
│  └────────────────────┘                      │
└──────────────────────────────────────────────┘
Key Components:
- Next.js Frontend: User dashboard, dataset management, credit purchasing
- FastAPI Backend: API server, MCP integration, vector search
- Two-Database Strategy:
  - fltr_auth: User accounts, teams, authentication
  - fltr: Datasets, documents, chunks, embeddings metadata
- Milvus: Vector database for semantic search
- Modal.com: Serverless GPU for document processing
- Cloudflare R2: Object storage for uploaded files
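To see how these pieces get wired together, here is a hypothetical settings module for the FastAPI side. The R2 variable names match the upload code later in this post; the database and Milvus names are assumptions:
# fastapi_app/config.py (hypothetical; names are illustrative)
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Two-database strategy: auth data and dataset data live apart
    fltr_auth_database_url: str = "postgresql://localhost/fltr_auth"
    fltr_database_url: str = "postgresql://localhost/fltr"

    # Milvus vector store
    milvus_host: str = "milvus"
    milvus_port: str = "19530"

    # Cloudflare R2 (same variables the upload action reads)
    r2_endpoint: str
    r2_access_key_id: str
    r2_secret_access_key: str
    r2_bucket: str

settings = Settings()  # values come from the environment / a .env file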
Database Design: The Two-Database Strategy
Why two databases?
User authentication and dataset operations have different requirements:
-- fltr_auth database (managed by Better Auth)
CREATE TABLE users (
    id UUID PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    name TEXT,
    credits INTEGER DEFAULT 0
);

CREATE TABLE teams (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL,
    owner_id UUID REFERENCES users(id)
);

-- fltr database (dataset operations)
CREATE TABLE datasets (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL,
    owner_id UUID,  -- References fltr_auth.users.id
    milvus_collection TEXT NOT NULL,
    chunk_size INTEGER DEFAULT 512,
    chunk_overlap INTEGER DEFAULT 50
);

CREATE TABLE documents (
    id UUID PRIMARY KEY,
    dataset_id UUID REFERENCES datasets(id),
    filename TEXT NOT NULL,
    r2_key TEXT NOT NULL,  -- Cloudflare R2 path
    status TEXT DEFAULT 'pending',
    processed_at TIMESTAMP
);

CREATE TABLE chunks (
    id UUID PRIMARY KEY,
    document_id UUID REFERENCES documents(id),
    content TEXT NOT NULL,
    milvus_id TEXT NOT NULL,  -- Reference to Milvus vector
    chunk_index INTEGER,
    metadata JSONB
);
Cross-database references use UUIDs, allowing flexible deployment (same server or separate instances).
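Because nothing enforces the link at the database level, "joins" across the two databases happen in application code. A minimal sketch, assuming one SQLAlchemy session per database:
def get_dataset_with_owner(dataset_id: str, db, db_auth):
    """Fetch a dataset from fltr and its owner from fltr_auth (no cross-DB join)."""
    dataset = db.query(Dataset).filter_by(id=dataset_id).first()
    if dataset is None:
        return None

    # Second query against the auth database, keyed by the shared UUID
    owner = db_auth.query(User).filter_by(id=dataset.owner_id).first()
    return {"dataset": dataset, "owner": owner}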
Document Processing Pipeline
The processing pipeline uses Modal.com for serverless GPU compute:
1. Upload to R2 (Direct from Browser)
// app/actions/upload.ts
'use server'

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'
import { nanoid } from 'nanoid'

export async function getUploadUrl(filename: string) {
  const r2 = new S3Client({
    region: 'auto',
    endpoint: process.env.R2_ENDPOINT!,
    credentials: {
      accessKeyId: process.env.R2_ACCESS_KEY_ID!,
      secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
    },
  })

  const key = `uploads/${nanoid()}/${filename}`
  const command = new PutObjectCommand({
    Bucket: process.env.R2_BUCKET!,
    Key: key,
  })

  const signedUrl = await getSignedUrl(r2, command, { expiresIn: 3600 })
  return { signedUrl, key }
}
Client-side upload (no API bottleneck):
// components/upload.tsx
const { signedUrl, key } = await getUploadUrl(file.name)

// Upload directly to R2
await fetch(signedUrl, {
  method: 'PUT',
  body: file,
})

// Trigger processing
await triggerProcessing(key)
2. Modal.com Processing
# modal_app.py
import modal

stub = modal.Stub("fltr-processor")

@stub.function(
    image=modal.Image.debian_slim().pip_install(
        "pypdf2", "sentence-transformers", "pymilvus"
    ),
    gpu="T4",  # GPU for embeddings
    timeout=600,
)
async def process_document(r2_key: str, dataset_id: str):
    """
    Process a PDF: extract text, chunk, embed, store in Milvus.
    """
    # Heavy imports resolve inside the Modal image, not on the local machine
    from pymilvus import Collection
    from sentence_transformers import SentenceTransformer

    # Download from R2
    content = download_from_r2(r2_key)

    # Extract text from PDF
    text = extract_text_from_pdf(content)

    # Chunk with overlap
    chunks = chunk_text(text, size=512, overlap=50)

    # Generate embeddings (GPU-accelerated)
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(chunks)

    # Store in Milvus
    collection = Collection(f"dataset_{dataset_id}")
    collection.insert([
        chunks,
        embeddings,
        [{"chunk_index": i} for i in range(len(chunks))]
    ])

    # Update database status
    update_document_status(dataset_id, "completed")
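chunk_text above is a helper rather than a library call; here is a minimal character-based sketch using the 512/50 defaults from the datasets table (a production version would more likely split on tokens or sentence boundaries):
def chunk_text(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks (character-based sketch)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks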
3. Webhook Trigger (R2 → FastAPI → Modal)
# fastapi_app/routes/webhooks.py
from fastapi import APIRouter, BackgroundTasks
import modal
router = APIRouter()
@router.post("/r2-upload-complete")
async def r2_webhook(
    data: dict,
    background_tasks: BackgroundTasks
):
    r2_key = data["key"]
    dataset_id = data["dataset_id"]

    # Trigger Modal processing asynchronously
    f = modal.Function.lookup("fltr-processor", "process_document")
    background_tasks.add_task(
        f.remote,
        r2_key=r2_key,
        dataset_id=dataset_id
    )
    return {"status": "processing"}
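One nuance here: f.remote blocks until processing finishes, so the FastAPI background task holds a worker for the entire run. Modal also offers spawn, which enqueues the call and returns immediately; a minimal variant of the handler body using it (same lookup as above):
# Inside the webhook handler: enqueue the job on Modal and return at once
f = modal.Function.lookup("fltr-processor", "process_document")
f.spawn(r2_key=r2_key, dataset_id=dataset_id)  # returns a handle immediately
return {"status": "processing"}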
Vector Search with Milvus
Milvus provides high-performance vector similarity search:
Collection Strategy: Shared vs. Per-Dataset
Option 1: One collection per dataset ❌
# Expensive! Creates overhead per dataset
collection = Collection(f"dataset_{dataset_id}")
Option 2: Shared collection with filtering ✅
# Efficient! One collection, filter by dataset_id
collection = Collection("all_datasets")
# Search with filter
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=10,
    expr=f"dataset_id == '{dataset_id}'"  # Filter to specific dataset
)
Why shared collection?
- Lower memory overhead
- Faster startup (no need to load multiple collections)
- Easier to scale
- Still isolated via filtering
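For reference, a sketch of how the shared collection might be declared with pymilvus; the field names mirror the search snippet below and the 384-dimensional vector matches all-MiniLM-L6-v2, but the exact schema is an assumption:
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

fields = [
    FieldSchema("chunk_id", DataType.VARCHAR, is_primary=True, max_length=64),
    FieldSchema("dataset_id", DataType.VARCHAR, max_length=64),   # used by the filter expr
    FieldSchema("content", DataType.VARCHAR, max_length=65535),
    FieldSchema("metadata", DataType.JSON),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384),     # all-MiniLM-L6-v2 output size
]

collection = Collection("all_datasets", CollectionSchema(fields))
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 1024}},
)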
Search Implementation
# fastapi_app/services/search.py
from pymilvus import Collection, connections
from sentence_transformers import SentenceTransformer

connections.connect(host="milvus", port="19530")
collection = Collection("all_datasets")
model = SentenceTransformer('all-MiniLM-L6-v2')

def search_dataset(query: str, dataset_id: str, top_k: int = 5):
    # Generate query embedding
    query_embedding = model.encode([query])[0]

    # Search Milvus
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param={"metric_type": "L2", "params": {"nprobe": 10}},
        limit=top_k,
        expr=f"dataset_id == '{dataset_id}'",
        output_fields=["chunk_id", "content", "metadata"]
    )

    # Fetch full chunk content from PostgreSQL
    # (db is a SQLAlchemy session bound to the fltr database)
    chunks = []
    for hit in results[0]:
        chunk = db.query(Chunk).filter_by(
            milvus_id=hit.id
        ).first()
        chunks.append({
            "content": chunk.content,
            "score": hit.distance,
            "metadata": chunk.metadata,
            "document": chunk.document.filename
        })
    return chunks
MCP Integration: Connecting AI to Your Data
The MCP server exposes datasets as queryable tools for AI:
// mcp-server/index.ts
import { Server } from '@modelcontextprotocol/sdk/server/index.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
import {
  ListToolsRequestSchema,
  CallToolRequestSchema,
} from '@modelcontextprotocol/sdk/types.js'

const server = new Server(
  { name: 'fltr-mcp-server', version: '1.0.0' },
  { capabilities: { tools: {} } } // declare tool support
)

// Register search tool
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'search_dataset',
      description: 'Search a FLTR dataset using semantic search',
      inputSchema: {
        type: 'object',
        properties: {
          dataset_id: { type: 'string' },
          query: { type: 'string' },
          top_k: { type: 'number', default: 5 },
        },
        required: ['dataset_id', 'query'],
      },
    },
  ],
}))

// Handle tool execution
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'search_dataset') {
    const { dataset_id, query, top_k } = request.params.arguments as {
      dataset_id: string
      query: string
      top_k?: number
    }

    // Call FastAPI backend
    const response = await fetch(`${process.env.FASTAPI_URL}/search`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ dataset_id, query, top_k }),
    })
    const results = await response.json()

    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(results, null, 2),
        },
      ],
    }
  }
  throw new Error(`Unknown tool: ${request.params.name}`)
})

// Start server
const transport = new StdioServerTransport()
await server.connect(transport)
Usage in Claude Desktop:
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "fltr": {
      "command": "node",
      "args": ["/path/to/mcp-server/index.ts"],
      "env": {
        "FASTAPI_URL": "https://api.fltr.app"
      }
    }
  }
}
(Note: node can't execute TypeScript directly, so point args at the compiled index.js or swap the command for a TypeScript runner such as tsx.)
Now in Claude conversations:
User: What are the key findings in the Q3 2024 revenue report?
Claude: I'll search the FLTR dataset for that information.
[Calls MCP tool: search_dataset]
dataset_id: "user_reports"
query: "Q3 2024 revenue findings"
[Results with citations]
Based on the Q3 2024 Revenue Report (page 4):
- Total revenue increased 23% YoY to $4.2M
- SaaS ARR grew 31% to $3.1M
- Customer churn decreased to 2.1%
Source: q3-2024-revenue-report.pdf, chunks 12-15
Credit System Implementation
Users purchase credits to access datasets:
// lib/schema.ts (Drizzle - fltr_auth)
import { pgTable, uuid, text, integer, timestamp } from 'drizzle-orm/pg-core'

export const users = pgTable('users', {
  id: uuid('id').primaryKey(),
  email: text('email').notNull().unique(),
  credits: integer('credits').default(0),
})

export const creditTransactions = pgTable('credit_transactions', {
  id: uuid('id').primaryKey(),
  userId: uuid('user_id').references(() => users.id),
  amount: integer('amount').notNull(), // Positive = purchase, negative = usage
  type: text('type').notNull(), // 'purchase', 'search', 'upload'
  datasetId: uuid('dataset_id'), // References fltr.datasets
  createdAt: timestamp('created_at').defaultNow(),
})
Deduct credits on search:
# fastapi_app/routes/search.py
@router.post("/search")
async def search(
    dataset_id: str,
    query: str,
    user_id: str,
    db: Session = Depends(get_db),
    db_auth: Session = Depends(get_auth_db),  # session for fltr_auth (dependency name assumed)
):
    # Check user has credits (cross-database query)
    user = db_auth.query(User).filter_by(id=user_id).first()
    if user.credits < 1:
        raise HTTPException(status_code=402, detail="Insufficient credits")

    # Perform search
    results = search_dataset(query, dataset_id)

    # Deduct credit and log the transaction in one commit
    user.credits -= 1
    transaction = CreditTransaction(
        user_id=user_id,
        amount=-1,
        type="search",
        dataset_id=dataset_id
    )
    db_auth.add(transaction)
    db_auth.commit()

    return results
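One caveat with the read-then-write pattern above: two concurrent searches can both pass the credit check before either commits. A sketch of an atomic deduction using a conditional UPDATE against the same session:
from sqlalchemy import update

# Atomically deduct one credit only if the balance allows it
result = db_auth.execute(
    update(User)
    .where(User.id == user_id, User.credits >= 1)
    .values(credits=User.credits - 1)
)
db_auth.commit()

if result.rowcount == 0:
    raise HTTPException(status_code=402, detail="Insufficient credits")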
Cost Optimization Strategies
1. DigitalOcean Droplet Downsizing
Before: $50/month droplet (overkill for FastAPI)
After: $24/month droplet (right-sized)
# Optimize gunicorn workers
gunicorn -w 2 -k uvicorn.workers.UvicornWorker main:app
# Only 2 workers needed for moderate traffic
2. Modal.com Serverless GPU
Why not always-on GPU?
- GPU VMs cost $1-3/hour
- Document processing is sporadic (not 24/7)
Modal.com pricing:
- $0 when idle
- $0.60/hour for a T4 GPU (only while processing)
- Average cost: ~$15/month (roughly 25 GPU-hours at $0.60/hour, vs $720+/month to keep a dedicated GPU running 24/7)
3. Milvus Shared Collections
Before: One collection per dataset
- 100 datasets = 100 collections
- High memory overhead
After: One shared collection with filtering
- 100 datasets = 1 collection
- 80% memory reduction
Production Deployment Challenges
1. Cross-Database Foreign Keys
Challenge: Drizzle (Next.js) and SQLAlchemy (FastAPI) both need access to fltr_auth.users
Solution: UUID references without actual foreign key constraints
// Drizzle doesn't enforce FK across databases
export const datasets = pgTable('datasets', {
  id: uuid('id').primaryKey(),
  ownerId: uuid('owner_id'), // Refers to fltr_auth.users.id (not enforced)
})

# SQLAlchemy also uses UUID references
class Dataset(Base):
    __tablename__ = "datasets"
    id = Column(UUID, primary_key=True)
    owner_id = Column(UUID)  # Refers to fltr_auth.users.id
Trade-off: Lose referential integrity, but gain flexibility.
2. Dual ORM Strategy
Drizzle (Next.js):
- Type-safe SQL generation
- Great TypeScript integration
- Fast for simple queries
SQLAlchemy (FastAPI):
- Mature Python ORM
- Better for complex queries
- Event system for lifecycle hooks
Coordination: Use migrations carefully to keep schemas in sync.
Lessons Learned
- MCP is powerful but early: the protocol is evolving, so expect breaking changes
- Two databases mean more complexity, but the separation of concerns is worth it
- Shared Milvus collection wins: better performance, lower costs
- Modal.com is a game changer: serverless GPU eliminates a massive fixed cost
- Presigned URLs >> proxying: direct R2 uploads save bandwidth and latency
Future Roadmap
Short-term:
- Multi-format support (DOCX, CSV, JSON)
- Dataset versioning
- Team collaboration features
- API keys for programmatic access
Long-term:
- Federated search across multiple datasets
- Custom embedding models
- Real-time dataset updates (webhooks on changes)
- Marketplace for public datasets
Conclusion
FLTR demonstrates that RAG-as-a-Service is feasible and can be cost-effective with the right architecture:
- Next.js + FastAPI = Best of both worlds (TypeScript + Python)
- Modal.com = Serverless GPU unlocks affordability
- MCP = Standardized AI-to-data integration
- Milvus shared collections = Scalable vector search
Whether you're building a dataset marketplace, internal knowledge base, or AI-powered research tool, these patterns apply. The combination of modern tooling and serverless compute makes RAG accessible to indie developers, not just enterprises.
Learn more: fltr.app