Hardware + Software · One On-Premise Stack

The BiltIQ AI Factory

Your Data. Your Premises. Your AI.

Private, full-stack enterprise AI — agents over your own CRM, documents, and support data — running on office-grade hardware, with nothing leaving the building.

50–500×

Cheaper / token vs cloud API

240 W

Per node — no data center

~500 GB

On-prem model memory

Zero

Cloud · water · data egress

DPIIT RecognisedNVIDIA Inception PartnerDPDP · HIPAA Aligned

Book a Discovery Sprint →See the platform →

See the AI Factory on Your Premises

30-min call · We map your workload and recommend the right on-premise stack.

01 — Why Now

Token subscriptions are an unbounded tax on your own success

Enterprises have realized that token subscriptions are an unbounded tax on their own activity — the more their staff use AI, the more they pay, forever — and that their operational alpha (workflows, customer knowledge, institutional memory) should compound inside their own walls, not inside a vendor relationship. Frontier labs promise contractually that they won’t train on your data; we make the question irrelevant. When the models, the retrieval index, and the memory all live on your premises, there is nothing to trust and nothing to audit remotely: your data cannot train anyone else’s model because it never leaves your building.

Your intelligence stays yours, and it compounds for you.

Companies adopting AI hit the same three walls: runaway API token bills, per-seat pricing that scales with headcount, and customer data flowing to third-party clouds. Yet the work they actually need — assistants grounded in their own CRM, tickets, and documents — doesn’t require frontier-scale models. When retrieval (RAG) supplies the facts and MCP/tools supply the actions, a well-served 12B–35B open model does the job frontier APIs are sold for.

02 — The Platform

Two layers, one stack

We assess the workload, specify and install the right hardware fleet on your own premises, and layer software on top that turns it into a working AI factory — grounded in your data, safe by construction.

LAYER 1 — HARDWARE

A data center that isn’t one

Five nodes on a standard office LAN, ~500 GB of total model memory, managed by our own Manthan cluster manager (auto-discovery, web-UI model lifecycle, self-healing systemd + watchdog, per-model ROI analytics). This is the reference fleet we run ourselves — proof of what office-grade hardware does; your fleet is sized to your workload. See the AI Factory Advisory tiers.

Node	Hardware	Serves
3× NVIDIA DGX Spark (GB10)	128 GB unified each · 240 W max	Deep reasoning at long context · multimodal + voice · embeddings + reranking
1× RTX PRO 6000 Blackwell server	96 GB + RTX 4070	Full multimodal (text/image/audio/video), streaming ASR
1× RTX 5060 laptop	8 GB	Roaming demo node

Inference-time engineering — speculative decoding (~88% acceptance), KV-cache compression, MoE expert offloading, prefix caching — is what makes 240 W boxes serve at interactive speed.

Right-sized by design

The unit of deployment is not a data center — it’s a box on a desk. A single node comfortably serves ~5 concurrent requests: enough for a 20–30 person office, or a small hospital where 3–4 doctors are processing patients at the same time. Start with one node for a department, add nodes as demand grows — the same management plane runs one box or a fleet. Each box is silent, plugs into a normal wall socket, needs no water and no special cooling — the office AC you already run is the entire thermal plan.

Scenario	Fleet draw	Annual cost
Typical serving	~300–500 W	~₹35,000 / yr (~$420)
Worst case, 24/7	~1.7 kW	~₹1.5 L / yr (~$1,800)
One 8×H100 server, for contrast	10.2 kW	~₹8.9 L + chilled-water cooling

Marginal cost: ~₹15–20 of electricity per million tokens, vs $2–15 per million on APIs. The whole fleet’s heat load is under one domestic AC unit. No liquid cooling, no raised floor, zero water (AI data centers evaporate ~1–2 L of water per kWh; we evaporate none).

LAYER 2 — SOFTWARE

The BiltIQ Platform

Three cores in one monorepo, compliance mode on_prem_required — no cloud egress, fail-closed, enforced in code, not policy.

MANTHAN · RAG + MEMORY

30+ format parsers (PDF/DOCX/XLSX/PPTX/images/nested ZIPs) with layout-aware chunking; hybrid retrieval (BM25F + dense + RRF, re-scored by a multimodal reranker) so answers are grounded with citations; five-layer memory from working memory through knowledge graph to an immutable audit trail.

AGENT OS

Teams of AI specialists defined in YAML, run in ReAct loops with budgets and circuit breakers; 30+ tools plus MCP (files, browser, code, vision, TTS/STT, SSH, IoT); CRM/ERP connectors syncing business systems into isolated per-project scopes; delivery where staff already work — Slack, WhatsApp, Teams, Telegram, email; Ed25519-signed agent-to-agent delegation; multi-tenant isolation at five layers.

PRIVACY FILTER

Presidio + NER detection extended for Indian identifiers (Aadhaar, PAN, mobile); HMAC pseudonymisation so originals never reach a model but records stay linkable; DPDP 2023, GDPR, HIPAA, CCPA validators scoring every document; a tamper-evident SHA-256 audit hash chain. Published as an MIT library on PyPI.

user/channel → privacy filter → injection scanner → [local vLLM model] → output validator → audit

How the layers connect

The software’s model registry maps every fleet endpoint by role — generation, reasoning, vision, medical, embedding, rerank — so Agent OS routes each task to the right-sized model automatically. One fleet key, health-checked endpoints, native systemd deployment end to end. The same build ships as SaaS or fully self-hosted.

03 — Proof

Proof, not promises

ATC Quest

LIVE · TRL 8

atcquest.com

Our AI-native enterprise LMS, live with paying customers: five specialised agents, deployable on-premise, cloud, or hybrid, with custom fine-tuning for institutions.

FK Partners

PRODUCTION

Brazil

An international client running our multi-agent RAG pipeline in production to generate and manage certification exam questions for Brazilian financial-market (ANBIMA) credentials.

ATC CommandCenter / LeadFlow

ALPHA

internal

Our AI-native business operating system: a CRM core with an AI agent wielding 57 tools across 15 modules, all on local vLLM.

Act Now

RETROFIT PATH

ERP platform

A horizontal ERP platform built to compete with Zoho and Odoo — the first showcase of the retrofit path: the moment our AI layer switches on, existing data and workflows become conversational, grounded, and automated, with no rewrite and no migration.

04 — Why This Wins

Why this wins

RAG + MCP shrink the model requirement

Facts come from your data, actions from tools; the model orchestrates, it doesn’t memorize the internet.

Unified-memory hardware shrinks the box requirement

128 GB mini-PCs and one workstation GPU replace 8-GPU servers and InfiniBand.

The privacy filter unlocks regulated data

Pseudonymised in, validated out, hash-chained audit.

The economics invert

A year of electricity costs less than a month of comparable API spend, and the marginal token is effectively free.

Your alpha compounds at home

Every conversation, document, and workflow enriches your knowledge graph and retrieval index on your hardware. Nothing feeds a vendor’s next model.

A complete AI factory — models, agents, retrieval, privacy, and audit — that fits in an office rack, cools with the AC you already own, and by construction cannot leak your data.

Want this on your premises?

Your Data. Your Premises. Your AI.

Want a formal architecture blueprint? Book a Discovery Sprint →Want to talk first? Book a Consultation →