[LLM Build] A Tiny Context-Aware Q&A Bot

by Mamta Upadhyay
Posted On June 22, 2025

Why This Series?

Until now, this blog has focused heavily on AI security, red teaming and governance. But increasingly, I have found myself helping others get started with building as well. Whether it’s small projects, agents or LLM applications, many security-minded professionals ask how to build responsibly. I believe the answer is clear: To secure AI systems well, you must understand how they are built.

So I am expanding the scope of this blog to include LLM Build content alongside security. We will start with lightweight apps to keep things simple. These are small, beginner-friendly projects you can run, extend or fork. Most builds will be cost-efficient or completely free using tools like OpenAI or Ollama. Wherever possible, I will highlight security and red team implications so you can continue sharpening your defensive instincts while learning how to build.

What Are We Building?

To kick things off, we are starting with a simple, context-aware Q&A bot. This project lets you upload a plain text file and ask questions about it. The LLM reads your question, combines it with the reference material you provided and responds with a relevant answer. It’s an easy way to simulate Retrieval-Augmented Generation (RAG) workflows without the need for agents, databases or external infrastructure.

This type of setup is useful in many practical scenarios, such as:

✔ Quickly querying policy documents or handbooks
✔ Building prototypes for internal knowledge bots
✔ Simulating how memory and context influence model behavior

Let’s dive into two versions: one using OpenAI’s API and another fully local and free using Ollama.

What This Bot Does

✔ Takes a .txt file of your choosing
✔ Accepts a user’s question
✔ Injects the file content as context into a prompt
✔ Returns a relevant answer using an LLM

Cost Notes on OpenAI

For OpenAI, you will need your own API key. It works on the free tier and usage counts against your quota.

Quick Start: Run It Yourself

Don’t have time to follow along? No problem. You can just use the code from the GitHub repo below and get started quickly. Then move on to “Example Runs” section.

👉 GitHub: https://github.com/m-pentest/llm-tiny-qa-bot

The repository includes both the OpenAI and Ollama versions, along with a sample reference.txt file and README documentation. If using OpenAI, do read the “Cost Notes” below. You will need to pip install (pip3 if using python3) openai for using this:

mamta@ai-labs:$ pip3 install openai

If using Ollama, make sure that Ollama is installed and running locally. See “setup instructions for Ollama”.

Manual Method

Option 1: OpenAI Version (for API users)

Install OpenAI

mamta@ai-labs:$ pip3 install openai

File Structure

mamta@ai-labs:$ tree
.
└── mini_build_qa_openai
    ├── qa_bot_openai.py   # OpenAI-based script
    └── reference.txt      # Your custom content

The Code

import os
from openai import OpenAI

# Load OpenAI key (set this in your environment or hardcode for testing)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

with open("reference.txt", "r", encoding="utf-8") as f:
    context = f.read()

question = input("Ask your question: ")

prompt = f"""
Use the following context to answer the question:

{context}

Q: {question}
A:
"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": prompt}
    ]
)

print("\nAnswer:\n", response.choices[0].message.content)

Option 2: Free, Fully Local Version Using Ollama

Want to run this completely locally for free? Let’s use Ollama, a powerful open-source tool that lets you run LLMs on your own machine. Just make sure your computer has:

✔ macOS, Linux, or Windows WSL2
✔ At least 8 to 16 GB RAM recommended depending on the model

Setup Instructions for Ollama

Open your terminal and run:

mamta@ai-labs:$ curl -fsSL https://ollama.com/install.sh | sh

(For Windows: Download the installer from ollama.com)

Now pull the model. Start with a small and fast model like llama3:

mamta@ai-labs:$ ollama pull llama3

Make sure Ollama is running locally:

mamta@ai-labs:$ ollama run llama3

Here is your Python script for using Ollama locally:

import requests
import json

def ask_ollama(context, question):
    prompt = f"""Use the following context to answer the question:\n\n{context}\n\nQ: {question}\nA:"""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt},
        stream=True
    )

    output = ""
    for line in response.iter_lines():
        if line:
            try:
                data = json.loads(line.decode("utf-8"))
                if "response" in data:
                    output += data["response"]
            except Exception as e:
                print("Failed to parse line:", line)
                print("Error:", e)

    return output

with open("reference.txt", "r", encoding="utf-8") as f:
    context = f.read()

question = input("Ask your question: ")
answer = ask_ollama(context, question)
print("\nAnswer:\n", answer)

Save it as qa_bot_ollama.py.

Folder Structure

mamta@ai-labs:$ tree
.
└── mini_build_qa_ollama
    ├── qa_bot_ollama.py
    └── reference.txt

Example Runs

In this example, we will use a reference.txt file that contains content from my blog post on memory poisoning in agentic LLMs. You can read the full post here: Memory Poisoning in Agentic LLMs

mamta@ai-labs:$ cat reference.txt
In agentic LLMs, memory is a persistence layer attackers can quietly poison for long-term control. Unlike prompt injection, memory poisoning works over time by embedding misleading content that re-emerges in future steps. Memory poisoning attacks can target both short-term working memory and long-term memory stores. If not properly managed, poisoned memory may influence decision-making even after the original attacker input is gone. This creates stealthy security risks because the system continues operating with false assumptions built into its own memory.

Sample Run with OpenAI

mamta@ai-labs:$ python qa_bot_openai.py
Ask your question: What are memory poisoning attacks?

Answer:
Memory poisoning attacks involve inserting misleading content into both short-term and long-term memory stores of agentic LLMs. This persistence means the model can continue operating with false assumptions long after the initial attack.

Sample Run with Ollama (Local)

mamta@ai-labs:$ python qa_bot_ollama.py
Ask your question: How does poisoned memory affect future decisions?

Answer:
Poisoned memory can lead the system to make decisions based on false assumptions, long after the original attacker input is gone. This stealth makes it particularly risky in agentic workflows.

Bonus Use Case: Local Red Team Testing

This setup is a great base for later use cases:

✔ Testing prompt injection attacks
✔ Simulating memory poisoning
✔ Evaluating local LLM performance for security applications

Wrap

This foundational tiny QA bot is clean, modifiable and designed to get you thinking like a builder. Whether you are just getting started or testing ideas for more secure LLM systems, it offers a real, runnable base you can learn from and extend. As we explore the boundaries of AI security and safety, knowing how things are built is essential. Because in AI, just like in security:

You can’t break it well if you don’t know how it works.

Discover more from The Secure AI Blog

Subscribe to get the latest posts sent to your email.

A simple, context-aware QA bot that runs locally or with OpenAI. Perfect for beginners exploring LLM builds and RAG workflows.

Share this:
Email
LinkedIn
X
Like this:
Like Loading…

[LLM Build] Building your first AI Agent

[LLM Build] A Tiny Context-Aware Q&A Bot

Why This Series?

What Are We Building?

What This Bot Does

Cost Notes on OpenAI

Quick Start: Run It Yourself

Manual Method

Option 1: OpenAI Version (for API users)

Option 2: Free, Fully Local Version Using Ollama

Setup Instructions for Ollama

Folder Structure

Example Runs

Sample Run with OpenAI

Sample Run with Ollama (Local)

Bonus Use Case: Local Red Team Testing

Wrap

Like this:

Related

Discover more from The Secure AI Blog

Like this:

Leave a ReplyCancel reply

[LLM Build] A Tiny Context-Aware Q&A Bot

Why This Series?

What Are We Building?

What This Bot Does

** Cost Notes on OpenAI **

Quick Start: Run It Yourself

Manual Method

Option 1: OpenAI Version (for API users)

Option 2: Free, Fully Local Version Using Ollama

Setup Instructions for Ollama

Folder Structure

Example Runs

Sample Run with OpenAI

Sample Run with Ollama (Local)

Bonus Use Case: Local Red Team Testing

Wrap

Share this:

Like this:

Related

Discover more from The Secure AI Blog

Share this:

Like this:

[LLM Build] Building your first AI Agent

Leave a ReplyCancel reply

Discover more from The Secure AI Blog

Cost Notes on OpenAI