[LLM Build] A Tiny Context-Aware Q&A Bot
© 2025 Mamta Upadhyay. This article is the intellectual property of the author. No part may be reproduced without permission
Why This Series?
Until now, this blog has focused heavily on AI security, red teaming and governance. But increasingly, I have found myself helping others get started with building as well. Whether it’s small projects, agents or LLM applications, many security-minded professionals ask how to build responsibly. I believe the answer is clear: To secure AI systems well, you must understand how they are built.
So I am expanding the scope of this blog to include LLM Build content alongside security. We will start with lightweight apps to keep things simple. These are small, beginner-friendly projects you can run, extend or fork. Most builds will be cost-efficient or completely free using tools like OpenAI or Ollama. Wherever possible, I will highlight security and red team implications so you can continue sharpening your defensive instincts while learning how to build.
What Are We Building?
To kick things off, we are starting with a simple, context-aware Q&A bot. This project lets you upload a plain text file and ask questions about it. The LLM reads your question, combines it with the reference material you provided and responds with a relevant answer. It’s an easy way to simulate Retrieval-Augmented Generation (RAG) workflows without the need for agents, databases or external infrastructure.
This type of setup is useful in many practical scenarios, such as:
✔ Quickly querying policy documents or handbooks
✔ Building prototypes for internal knowledge bots
✔ Simulating how memory and context influence model behavior
Let’s dive into two versions: one using OpenAI’s API and another fully local and free using Ollama.
What This Bot Does
✔ Takes a .txt file of your choosing
✔ Accepts a user’s question
✔ Injects the file content as context into a prompt
✔ Returns a relevant answer using an LLM
** Cost Notes on OpenAI **
For OpenAI, you will need your own API key. It works on the free tier and usage counts against your quota.
Quick Start: Run It Yourself
Don’t have time to follow along? No problem. You can just use the code from the GitHub repo below and get started quickly. Then move on to “Example Runs” section.
👉 GitHub: https://github.com/m-pentest/llm-tiny-qa-bot
The repository includes both the OpenAI and Ollama versions, along with a sample reference.txt file and README documentation. If using OpenAI, do read the “Cost Notes” below. You will need to pip install (pip3 if using python3) openai for using this:
mamta@ai-labs:$ pip3 install openai
If using Ollama, make sure that Ollama is installed and running locally. See “setup instructions for Ollama”.
Manual Method
Option 1: OpenAI Version (for API users)
Install OpenAI
mamta@ai-labs:$ pip3 install openai
File Structure
mamta@ai-labs:$ tree
.
└── mini_build_qa_openai
├── qa_bot_openai.py # OpenAI-based script
└── reference.txt # Your custom content
The Code
import os
from openai import OpenAI
# Load OpenAI key (set this in your environment or hardcode for testing)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
with open("reference.txt", "r", encoding="utf-8") as f:
context = f.read()
question = input("Ask your question: ")
prompt = f"""
Use the following context to answer the question:
{context}
Q: {question}
A:
"""
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "user", "content": prompt}
]
)
print("\nAnswer:\n", response.choices[0].message.content)
Option 2: Free, Fully Local Version Using Ollama
Want to run this completely locally for free? Let’s use Ollama, a powerful open-source tool that lets you run LLMs on your own machine. Just make sure your computer has:
✔ macOS, Linux, or Windows WSL2
✔ At least 8 to 16 GB RAM recommended depending on the model
Setup Instructions for Ollama
Open your terminal and run:
mamta@ai-labs:$ curl -fsSL https://ollama.com/install.sh | sh (For Windows: Download the installer from ollama.com)
Now pull the model. Start with a small and fast model like llama3:
mamta@ai-labs:$ ollama pull llama3
Make sure Ollama is running locally:
mamta@ai-labs:$ ollama run llama3
Here is your Python script for using Ollama locally:
import requests
import json
def ask_ollama(context, question):
prompt = f"""Use the following context to answer the question:\n\n{context}\n\nQ: {question}\nA:"""
response = requests.post(
"http://localhost:11434/api/generate",
json={"model": "llama3", "prompt": prompt},
stream=True
)
output = ""
for line in response.iter_lines():
if line:
try:
data = json.loads(line.decode("utf-8"))
if "response" in data:
output += data["response"]
except Exception as e:
print("Failed to parse line:", line)
print("Error:", e)
return output
with open("reference.txt", "r", encoding="utf-8") as f:
context = f.read()
question = input("Ask your question: ")
answer = ask_ollama(context, question)
print("\nAnswer:\n", answer)
Save it as qa_bot_ollama.py.
Folder Structure
mamta@ai-labs:$ tree
.
└── mini_build_qa_ollama
├── qa_bot_ollama.py
└── reference.txt
Example Runs
In this example, we will use a reference.txt file that contains content from my blog post on memory poisoning in agentic LLMs. You can read the full post here: Memory Poisoning in Agentic LLMs
mamta@ai-labs:$ cat reference.txt In agentic LLMs, memory is a persistence layer attackers can quietly poison for long-term control. Unlike prompt injection, memory poisoning works over time by embedding misleading content that re-emerges in future steps. Memory poisoning attacks can target both short-term working memory and long-term memory stores. If not properly managed, poisoned memory may influence decision-making even after the original attacker input is gone. This creates stealthy security risks because the system continues operating with false assumptions built into its own memory.
Sample Run with OpenAI
mamta@ai-labs:$ python qa_bot_openai.py Ask your question: What are memory poisoning attacks? Answer: Memory poisoning attacks involve inserting misleading content into both short-term and long-term memory stores of agentic LLMs. This persistence means the model can continue operating with false assumptions long after the initial attack.
Sample Run with Ollama (Local)
mamta@ai-labs:$ python qa_bot_ollama.py Ask your question: How does poisoned memory affect future decisions? Answer: Poisoned memory can lead the system to make decisions based on false assumptions, long after the original attacker input is gone. This stealth makes it particularly risky in agentic workflows.
Bonus Use Case: Local Red Team Testing
This setup is a great base for later use cases:
✔ Testing prompt injection attacks
✔ Simulating memory poisoning
✔ Evaluating local LLM performance for security applications
Wrap
This foundational tiny QA bot is clean, modifiable and designed to get you thinking like a builder. Whether you are just getting started or testing ideas for more secure LLM systems, it offers a real, runnable base you can learn from and extend. As we explore the boundaries of AI security and safety, knowing how things are built is essential. Because in AI, just like in security:
You can’t break it well if you don’t know how it works.
Related
Discover more from The Secure AI Blog
Subscribe to get the latest posts sent to your email.
A simple, context-aware QA bot that runs locally or with OpenAI. Perfect for beginners exploring LLM builds and RAG workflows.