AI Explainer - Part 3: AI Applications
In conversations with clients, we've realised that the list of AI terms commonly used by specialists today can be overwhelming. This series of explainers is our attempt to fix that. We've compiled straightforward definitions for the terms that come up most often, written for business people rather than engineers. This is Part 3 in an ongoing series to help make this topic more accessible. In this final edition, we'll cover terms you're likely to hear in conversations about AI applications.
If you want to go back, you can find Parts 1 and 2 here:
AI Applications
Prompt Engineering
The practice of crafting and refining the instructions you give an AI model to get more accurate and reliable outputs. It sounds simple, but there is a real skill to it, and in live applications it makes an enormous difference to the quality of responses you get from the AI.
At its most basic, prompt engineering is about being specific. Telling a model who it is, what it's trying to achieve, what format to respond in, and what to avoid will consistently give better results than a vague one-line instruction. But it goes further than that. Experienced prompt engineers think about how to structure complex tasks, how to handle edge cases, how to stop the model going off-piste, and how to get consistent results across thousands of runs rather than just one.
In a business context, prompt engineering is baked into almost every AI product. The instructions sitting invisibly behind a customer service bot, an internal knowledge assistant, or an automated document processor are all the result of careful prompt work.
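For readers who want a concrete picture, here is a minimal sketch of what "being specific" looks like in practice. The assistant, company name, and wording are all hypothetical; the point is that the prompt states a role, a goal, a format, and constraints rather than a vague one-liner.

```python
# A minimal sketch of a structured prompt for a hypothetical support
# assistant. Every name and instruction below is illustrative, not a standard.

def build_prompt(question: str) -> str:
    """Assemble a specific, structured prompt rather than a vague one-liner."""
    return "\n".join([
        "Role: You are a customer support assistant for Acme Ltd.",        # who it is
        "Goal: Answer the customer's question about our returns policy.",  # what it's trying to achieve
        "Format: Reply in at most three short sentences.",                 # what format to respond in
        "Avoid: Do not give legal or financial advice.",                   # what to avoid
        f"Question: {question}",
    ])

prompt = build_prompt("Can I return a sale item after 30 days?")
```

A real product would send this prompt to a model; the engineering work is in refining those instructions until the responses are reliably good.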
Evaluation (Evals)
This is the automated and repeatable process of testing how well an AI system performs. In traditional software, testing is relatively straightforward: the code either does what it's supposed to or it doesn't. Testing AI is less clear-cut. Because AI responses are probabilistic, outputs vary, and "correct" is often a matter of judgement. Evals are how you bring rigour to that problem.
A good evaluation framework defines what good looks like for your specific use case, builds a set of test cases that cover the range of things the system needs to handle, and scores outputs against those criteria consistently. Things you may need to validate include factual accuracy, tone, whether the model stays on topic, and whether it handles tricky edge cases sensibly.
Evals matter most when you are making changes. Swapping to a different model, updating your prompts, or adding new data to your knowledge base can all shift performance in unexpected ways. Without evals, you find out something broke when a user complains. With them, you catch it before it ships.
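At its simplest, an eval is a set of test cases and a scoring rule. The sketch below is illustrative: `fake_model` stands in for a real AI system, and the test cases and pass criteria are made up for the example.

```python
# A minimal, illustrative eval harness. `fake_model` is a stand-in for a
# real AI system; the test cases and scoring rule are hypothetical.

def fake_model(question: str) -> str:
    answers = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Who do I contact for IT support?": "Email the helpdesk at it@example.com.",
    }
    return answers.get(question, "I'm not sure.")

test_cases = [
    {"question": "What is our refund window?", "must_contain": "30 days"},
    {"question": "Who do I contact for IT support?", "must_contain": "helpdesk"},
]

def run_evals(model, cases) -> float:
    """Score each output against its criterion and return the pass rate."""
    passed = sum(1 for c in cases if c["must_contain"] in model(c["question"]))
    return passed / len(cases)

score = run_evals(fake_model, test_cases)  # 1.0 when every case passes
```

Re-running this harness after every change, a new model, a revised prompt, fresh data, is what turns "it seems fine" into a number you can track.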
Agents
AI systems that can take actions independently in order to complete a multi-step task, rather than just responding to a single question. An agent can plan a sequence of steps, use digital tools, check its own work, and keep going until the job is done.
A simple agent might be given a task like "research our top three competitors and summarise their pricing pages." It would search the web, visit the relevant pages, pull out the information, and write up the summary, all without a human directing each step. A more sophisticated agent might be integrated into business systems, filing reports, updating records, sending notifications, and escalating exceptions as part of a fully automated workflow.
Agents are one of the most exciting and fastest-moving areas in AI right now. They are also one of the most important to get right. An agent that can take real actions in real systems needs appropriate oversight, clear boundaries, and solid error handling. The potential for mistakes or for unintended destructive actions scales with the level of autonomy you give it.
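To make the idea concrete, here is a heavily simplified agent loop. The tools and the scripted plan are hypothetical stand-ins; a real agent would let the model decide which step to take next, check its own work, and replan when something fails.

```python
# A toy agent loop. The tools and the fixed "plan" are hypothetical; a real
# agent would choose its own steps and verify its work as it goes.

def search(query: str) -> str:
    return f"results for '{query}'"      # stand-in for a web search tool

def summarise(text: str) -> str:
    return f"summary of {text}"          # stand-in for a summarisation step

TOOLS = {"search": search, "summarise": summarise}

def run_agent(task: str, plan: list[str]) -> str:
    """Work through a multi-step plan, feeding each tool's output to the next."""
    result = task
    for tool_name in plan:
        result = TOOLS[tool_name](result)   # take an action
        # a real agent would check the result here and replan if needed
    return result

output = run_agent("our top three competitors' pricing pages",
                   ["search", "summarise"])
```

Notice that the oversight and error handling mentioned above would live inside that loop, which is exactly why autonomy and safety have to be designed together.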
RAG (Retrieval-Augmented Generation)
A technique that gives an AI model access to a specific body of knowledge at the point it generates a response. Rather than relying solely on what it learned during training, the model first retrieves relevant information from a document set or database, pulls it into the context, and uses it to inform its answer.
A foundation model trained on general internet data doesn't know anything about your business: your internal policies, your product catalogue, or your client history. RAG closes that gap without the cost and complexity of fine-tuning. You maintain a knowledge base, the system fetches what's relevant when a question comes in, and the model answers based on your information rather than general or hallucinated content.
It's the architecture behind most enterprise AI assistants you'll encounter today. Ask one about your company's expenses policy and it retrieves the right document. Ask it about a specific client account and it pulls the relevant records. The model does the processing, but RAG ensures it has the right, relevant information.
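The retrieve-then-answer pattern can be sketched in a few lines. This is a bare-bones illustration: the documents are invented, the retrieval here is simple word overlap rather than the semantic search a real system would use, and the final step stands in for an actual model call.

```python
# A bare-bones RAG sketch: naive keyword retrieval over a tiny, made-up
# document set, then a stand-in "model" that answers from the retrieved text.

DOCS = [
    "Expenses policy: claims must be submitted within 60 days.",
    "Holiday policy: staff receive 25 days of annual leave.",
]

def retrieve(question: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def answer(question: str) -> str:
    context = retrieve(question, DOCS)          # fetch relevant knowledge first
    return f"Based on our records: {context}"   # a real model would generate here

reply = answer("What is the deadline in the expenses policy?")
```

Production systems replace the keyword match with embedding-based search (covered next), but the shape of the pipeline is the same.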
Embedding
A way of converting text or other types of data into lists of numbers that capture meaning in a form a machine can work with. The key property is that similar meanings produce similar numbers, which means you can find related content mathematically rather than by matching exact words.
This is what powers semantic search, the kind where you type a question in your own words and get back relevant results even if the documents never use those exact words. It's also the engine underneath RAG systems. When a question comes in, it gets converted to an embedding, the system finds the stored content with the closest matching embeddings, and that content gets passed to the model as context.
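The "similar meanings produce similar numbers" idea can be shown with toy vectors. Real embeddings have hundreds or thousands of dimensions and come from a trained model; the three-number vectors below are invented purely to illustrate the principle.

```python
# Illustrative only: these tiny, made-up vectors mimic the key property of
# real embeddings, that phrases with similar meanings sit close together.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard similarity measure between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

embeddings = {
    "annual leave policy": [0.90, 0.10, 0.20],
    "holiday entitlement": [0.85, 0.15, 0.25],  # similar meaning, nearby numbers
    "server error logs":   [0.10, 0.90, 0.30],  # different meaning, far away
}

# Hypothetical embedding of the query "how many days off do I get?"
query = [0.88, 0.12, 0.22]

scores = {phrase: cosine_similarity(query, vec) for phrase, vec in embeddings.items()}
```

Note that "holiday entitlement" scores close to "annual leave policy" despite sharing no words with it, which is exactly what makes semantic search work.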
Orchestration
The coordination layer that connects models, tools, data sources, and agents into a coherent workflow. Individual AI components (a language model, a search tool, a database, an external API) are useful in isolation, but most real business applications require them to work together in sequence. Orchestration is what makes that happen.
A customer onboarding workflow might use orchestration to pull a prospect's details from a CRM, generate a personalised welcome document, send it via email, and log the interaction, with different tools handling each step and a central layer managing the flow between them. None of that happens by accident.
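The onboarding workflow above can be sketched as a simple pipeline. Every function here is a hypothetical stand-in (a real system would call a CRM, a model, and an email service); the point is the central layer that manages the flow between steps.

```python
# An orchestration sketch: stand-in functions for each tool, and one central
# function managing the flow between them. All names here are hypothetical.

def fetch_prospect(name: str) -> dict:
    return {"name": name, "company": "Acme Ltd"}   # pretend CRM lookup

def generate_welcome(prospect: dict) -> str:
    # pretend call to a language model
    return f"Welcome, {prospect['name']} of {prospect['company']}!"

def send_email(document: str) -> str:
    return f"sent: {document}"                     # pretend email service

def onboard(name: str, log: list[str]) -> str:
    """The orchestration layer: run each tool in sequence and log the interaction."""
    prospect = fetch_prospect(name)
    document = generate_welcome(prospect)
    status = send_email(document)
    log.append(status)                             # record what happened
    return status

audit_log: list[str] = []
result = onboard("Priya", audit_log)
```

In practice this layer also handles retries, failures, and branching, which is why orchestration is a discipline in its own right rather than a bit of glue code.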
Guardrails
Rules, filters, and checks applied around an AI model to keep its outputs safe, on-brand, and fit for purpose. Models are capable and flexible, but left entirely to their own devices they can go off-topic, produce inappropriate content, leak sensitive information, or simply behave inconsistently. Guardrails are how you prevent that.
They can operate at different levels. Some are baked into the model itself by the provider, broad safety constraints that apply to everyone. Others are applied at the application level by the team building the product, controlling things like tone, topic boundaries, response format, and what the model is and isn't allowed to discuss. A customer-facing AI assistant for a financial services firm, for example, would have tight guardrails preventing it from giving specific investment advice or straying outside its defined scope.
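An application-level guardrail can be as simple as checking a draft reply against rules before it reaches the user. The banned topics, length limit, and fallback message below are hypothetical, chosen to echo the financial services example above.

```python
# An illustrative application-level guardrail: check a draft reply against
# simple rules before showing it to the user. All rules here are hypothetical.

BANNED_TOPICS = ["investment advice", "stock tip"]
MAX_LENGTH = 500

def apply_guardrails(draft: str) -> str:
    """Block off-limits topics and over-long replies; otherwise pass through."""
    if any(topic in draft.lower() for topic in BANNED_TOPICS):
        return "I'm sorry, I can't help with that. Please speak to an adviser."
    if len(draft) > MAX_LENGTH:
        return draft[:MAX_LENGTH]   # enforce a response-length limit
    return draft

safe = apply_guardrails("Here is some investment advice: buy shares now.")
```

Real guardrails are usually more sophisticated (often using a second model to classify outputs), but the principle, a checkpoint between the model and the user, is the same.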
Multimodal
Describes AI models that can work across more than one type of input or output such as text, images, audio, video, or structured data, rather than being limited to a single format.
Early language models only handled text. Modern frontier models can accept an image and answer questions about it, transcribe spoken audio, interpret a chart, read a scanned document, or generate an image from a written description. Some can do several of these in combination, analysing a photograph and producing a written report about it, for example, or taking a voice note and turning it into a formatted summary.
Wrap Up
The terms covered in this edition are the ones you're most likely to hear when discussing how AI models are applied to common business use cases. Along with Parts 1 and 2, this series should help business owners and non-technical stakeholders take part in discussions about how AI may be used in practice.