Most businesses asking about AI integration assume they need a complete rebuild. In reality, if you already have a working API, adding LLM capabilities often comes down to three additions: a function-calling schema, an embedding pipeline for retrieval, and a prompt management layer that covers injection defence and context-window limits.
At Niobotics, based in Leigh, Manchester, we specialise in retrofitting existing systems for AI use — without breaking what already works.
Step 1 — Define your function-calling schema
Models such as GPT-5 and Claude support tool use (also called function calling). The model can request calls to your API endpoints as if they were functions, and your application runs those calls and passes the results back. To enable this, you give the model a JSON Schema for each endpoint. The schema describes what the endpoint does, including what it returns, and which parameters it accepts.
For example, a product search endpoint becomes:
searchProducts(query: string, maxResults: number, category?: string) → Product[]
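When a user asks "find me red trainers under £50", the model can decide to call this tool and fill in the arguments from the request. Your code then runs the real search and returns the results to the model, which writes the reply.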
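The sketch below shows that round trip in its simplest form. It expresses the searchProducts signature above as a tool definition in the shape OpenAI's Chat Completions API uses for its `tools` parameter. Anthropic's Messages API uses a similar object with `name`, `description` and `input_schema` fields. A dispatcher then runs whichever call the model asks for. The `Product` type, the in-memory catalogue and `dispatchToolCall` are hypothetical stand-ins for your real API.

```typescript
// Hypothetical product type and in-memory catalogue standing in for your real API.
interface Product {
  id: string;
  name: string;
  category: string;
  priceGbp: number;
}

const catalogue: Product[] = [
  { id: "p1", name: "Red running trainers", category: "footwear", priceGbp: 45 },
  { id: "p2", name: "Red leather boots", category: "footwear", priceGbp: 120 },
  { id: "p3", name: "Blue trainers", category: "footwear", priceGbp: 40 },
];

// Tool definition in the shape of an OpenAI Chat Completions `tools` entry.
// The `parameters` object is plain JSON Schema.
const searchProductsTool = {
  type: "function",
  function: {
    name: "searchProducts",
    description: "Search the product catalogue. Returns a list of matching products.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "Search terms, e.g. 'red trainers'" },
        maxResults: { type: "integer", description: "Maximum number of products to return" },
        category: { type: "string", description: "Optional category filter" },
      },
      required: ["query", "maxResults"],
    },
  },
} as const;

interface SearchArgs {
  query: string;
  maxResults: number;
  category?: string;
}

// Local implementation: every search term must appear in the product name.
function searchProducts({ query, maxResults, category }: SearchArgs): Product[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return catalogue
    .filter((p) => category === undefined || p.category === category)
    .filter((p) => terms.every((t) => p.name.toLowerCase().indexOf(t) !== -1))
    .slice(0, maxResults);
}

// The model only requests a call (tool name + JSON-encoded arguments);
// this function runs it and returns a JSON string to send back to the model.
function dispatchToolCall(name: string, argsJson: string): string {
  if (name !== "searchProducts") {
    return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
  const args = JSON.parse(argsJson) as SearchArgs;
  return JSON.stringify(searchProducts(args));
}
```

The signature has no price parameter, so this endpoint cannot enforce the "under £50" part of the request. A hypothetical `maxPrice` field would let the API apply that limit itself, rather than leaving it to the model's reply.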
We build these schemas to match your existing OpenAPI spec, so the same documentation that describes your API to developers also describes it to AI models.
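The next sketch generates tools from the spec. It assumes a simplified subset of an OpenAPI 3 operation object: only query and path parameters, each with a primitive schema type. `operationToTool` and the example operation are illustrative. A production converter would also need to handle request bodies, `$ref` resolution and nested schemas.

```typescript
// Minimal subset of an OpenAPI 3 operation object.
interface OpenApiParameter {
  name: string;
  in: "query" | "path" | "header" | "cookie";
  required?: boolean;
  description?: string;
  schema: { type: string };
}

interface OpenApiOperation {
  operationId: string;
  summary?: string;
  parameters?: OpenApiParameter[];
}

interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description?: string }>;
      required: string[];
    };
  };
}

// Map one operation to one tool: operationId becomes the tool name,
// query/path parameters become JSON Schema properties.
function operationToTool(op: OpenApiOperation): ToolDefinition {
  const properties: Record<string, { type: string; description?: string }> = {};
  const required: string[] = [];
  for (const p of op.parameters ?? []) {
    // Headers and cookies (e.g. API keys) are supplied by your server, not the model.
    if (p.in !== "query" && p.in !== "path") continue;
    properties[p.name] = { type: p.schema.type, ...(p.description ? { description: p.description } : {}) };
    if (p.required) required.push(p.name);
  }
  return {
    type: "function",
    function: {
      name: op.operationId,
      description: op.summary ?? op.operationId,
      parameters: { type: "object", properties, required },
    },
  };
}

const searchOp: OpenApiOperation = {
  operationId: "searchProducts",
  summary: "Search the product catalogue",
  parameters: [
    { name: "query", in: "query", required: true, schema: { type: "string" } },
    { name: "maxResults", in: "query", required: true, schema: { type: "integer" } },
    { name: "category", in: "query", schema: { type: "string" } },
    { name: "X-Api-Key", in: "header", required: true, schema: { type: "string" } },
  ],
};

const tool = operationToTool(searchOp);
```

Header parameters are deliberately dropped. Credentials therefore stay under your server's control and never become something the model can fill in.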
Step 2 — Build a RAG pipeline
Retrieval-Augmented Generation (RAG) lets your AI answer questions based on your own data — not just training data. The architecture is:
- Your documents/data are chunked and converted to vector embeddings using the OpenAI Embeddings API
- Embeddings are stored in pgvector — a PostgreSQL extension for vector similarity search
- When a user asks a question, the query is embedded and the most semantically similar chunks are retrieved
- Those chunks are injected into the LLM's context window as grounding data
- The LLM answers from your data rather than relying only on its training data, which reduces (but does not eliminate) hallucinated answers
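The steps above can be sketched end to end in memory. `toyEmbed` is a hypothetical stand-in that hashes words into a small count vector so the example runs offline. In the pipeline described here, a call to the OpenAI Embeddings API would replace it. Chunk vectors would also be computed once and stored, not recomputed per query. Chunk sizes and the prompt wording are illustrative, not tuned values.

```typescript
// Split text into overlapping windows of words. Sizes are illustrative, not tuned.
function chunkText(text: string, chunkSize = 200, overlap = 40): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}

// HYPOTHETICAL stand-in for a real embedding model (e.g. the OpenAI Embeddings API):
// hashes each word into one of `dims` buckets and counts occurrences.
function toyEmbed(text: string, dims = 64): number[] {
  const vector: number[] = [];
  for (let i = 0; i < dims; i++) vector.push(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) >>> 0;
    vector[h % dims] += 1;
  }
  return vector;
}

// Cosine similarity: 1 means same direction, 0 means no overlap.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed the query and return the k most similar chunks.
function retrieveTopK(query: string, chunks: string[], k: number): string[] {
  const q = toyEmbed(query);
  return chunks
    .map((text) => ({ text, score: cosine(q, toyEmbed(text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((c) => c.text);
}

// Inject the retrieved chunks into the prompt as numbered grounding context.
function buildGroundedPrompt(question: string, context: string[]): string {
  return [
    "Answer using only the context below. If the answer is not there, say you don't know.",
    ...context.map((c, i) => `[${i + 1}] ${c}`),
    `Question: ${question}`,
  ].join("\n\n");
}
```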
We implement this as a set of versioned API endpoints that sit alongside your existing routes.
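For the pgvector storage step, a retrieval endpoint mostly comes down to one parameterised query. This sketch assumes a hypothetical `document_chunks` table with `content` and `embedding` columns. It builds a `{ text, values }` object of the kind node-postgres (`pg`) accepts in `client.query`. The `<=>` operator is pgvector's cosine distance, so smaller values mean closer matches.

```typescript
// A parameterised query in the { text, values } shape node-postgres accepts.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Format an embedding as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[]): string {
  for (const x of embedding) {
    if (typeof x !== "number" || !isFinite(x)) throw new Error("embedding must contain finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// Assumes a hypothetical table: document_chunks(id, content text, embedding vector(N)).
// `<=>` is pgvector's cosine distance operator: smaller distance = more similar.
function nearestChunksQuery(embedding: number[], limit: number): SqlQuery {
  if (Math.floor(limit) !== limit || limit < 1) throw new Error("limit must be a positive integer");
  return {
    text:
      "SELECT id, content, embedding <=> $1::vector AS distance " +
      "FROM document_chunks ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toVectorLiteral(embedding), limit],
  };
}
```

Passing the vector as a bound parameter, rather than splicing it into the SQL string, keeps the endpoint safe from SQL injection.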
Step 3 — Prompt injection defence
When users can influence what goes into an LLM prompt, they can attempt prompt injection — trying to override your system instructions. We implement:
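- Input sanitisation to strip or reject known injection patterns (a first filter only, since no pattern list catches every attack)
- System instructions and user input kept in separate message roles, so user text is never concatenated into the system prompt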
- Output validation to check the LLM's response matches expected schemas before returning it to users
- Rate limiting on AI endpoints to prevent abuse
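The defences listed above can be sketched in a few small functions. They screen input, build messages so user text stays out of the system prompt, validate the model's output against an expected shape, and apply a per-client rate limit. The regex list, the `ProductAnswer` shape and the limits are hypothetical examples. This sketch rejects suspicious input rather than stripping it, which is a design choice. Pattern matching remains only a first filter.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Illustrative patterns only; no pattern list catches every injection attempt.
const SUSPICIOUS_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior|above) instructions/i,
  /reveal (your|the) system prompt/i,
  /you are now/i,
];
const MAX_INPUT_CHARS = 4000;

// Input screening: reject (rather than silently strip) suspicious input.
function screenInput(input: string): { ok: boolean; reason?: string } {
  if (input.length > MAX_INPUT_CHARS) return { ok: false, reason: "input too long" };
  for (const pattern of SUSPICIOUS_PATTERNS) {
    if (pattern.test(input)) return { ok: false, reason: `matched ${pattern.source}` };
  }
  return { ok: true };
}

const SYSTEM_PROMPT = 'You are a product assistant. Reply only with JSON: {"answer": string, "productIds": string[]}.';

// User text always goes in its own message and is never spliced into the system prompt.
function buildMessages(userInput: string): ChatMessage[] {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: userInput },
  ];
}

interface ProductAnswer {
  answer: string;
  productIds: string[];
}

// Output validation: parse and check the reply's shape before it reaches the user.
function validateOutput(raw: string): ProductAnswer | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const answer = record.answer;
  const ids = record.productIds;
  if (typeof answer !== "string") return null;
  if (!Array.isArray(ids) || !ids.every((id) => typeof id === "string")) return null;
  if (answer.indexOf(SYSTEM_PROMPT) !== -1) return null; // crude check for a leaked system prompt
  return { answer, productIds: ids as string[] };
}

// Fixed-window rate limiter per client key (e.g. an API key or IP address).
class RateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows: Record<string, { start: number; count: number }> = {};

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const w = this.windows[key];
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      this.windows[key] = { start: nowMs, count: 1 };
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```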
Step 4 — Context window management
LLMs have finite context windows. For applications with long conversation histories or large document sets, naive approaches, such as resending the entire history on every turn, quickly hit token limits. We implement:
- Message summarisation to compress old conversation turns
- Semantic retrieval, so that only the most relevant RAG chunks are injected
- Token counting middleware that trims context before it exceeds the model's limit
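Here is a minimal sketch of the trimming and summarisation steps, assuming chat messages with system, user and assistant roles. `estimateTokens` is a rough hypothetical heuristic of about four characters per token for English text. Real middleware should count with the target model's own tokenizer. The `summarise` callback stands in for a model call that condenses old turns.

```typescript
type Msg = { role: "system" | "user" | "assistant"; content: string };

// HYPOTHETICAL estimate: roughly 4 characters per token for English text.
// Real middleware should count with the target model's own tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Token-counting trim: always keep system messages, then add turns from newest
// to oldest until the next one would push the total over maxTokens.
function trimToBudget(messages: Msg[], maxTokens: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const turns = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept: Msg[] = [];
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = estimateTokens(turns[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(turns[i]);
    used += cost;
  }
  return [...system, ...kept];
}

// Summarisation: replace all but the most recent `keepRecent` turns with one
// summary message. `summarise` stands in for a model call.
function compressHistory(
  messages: Msg[],
  keepRecent: number,
  summarise: (oldTurns: Msg[]) => string,
): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const turns = messages.filter((m) => m.role !== "system");
  if (turns.length <= keepRecent) return messages;
  const cut = turns.length - keepRecent;
  const summary: Msg = {
    role: "system",
    content: `Summary of earlier conversation: ${summarise(turns.slice(0, cut))}`,
  };
  return [...system, summary, ...turns.slice(cut)];
}
```

Trimming from the newest turn backwards keeps the most recent context, which is usually what the next reply depends on. Summarisation trades a model call for keeping the gist of older turns.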
Cost
Adding LLM integration to an existing API at Niobotics costs around £100 — including the function-calling schema, RAG pipeline, prompt injection defences, and streaming response handler. Free maintenance is included.
Niobotics Ltd — Leigh, Manchester, WN7 1BW
LLM integration from around £100. team@lugbook.com
Add AI to your existing system
Free consultation. Around £100. We retrofit, we don't rebuild unnecessarily.