Assignment: Document how I built a small RAG chatbot on my portfolio.
Why RAG on a portfolio site?#
A plain LLM does not know my projects, CV details, or how I want visitors to be answered. Retrieval Augmented Generation (RAG) fixes that by:
- Storing my own text in small chunks
- Turning chunks and the user’s question into embeddings (vectors)
- Finding the most relevant chunks for each question
- Sending those chunks to the model as context before it replies
The visitor still chats in natural language, but answers can be grounded in material I control.
What I built (high level)#
Browser (Hugo site)
→ floating chat UI + conversation history (localStorage)
→ POST /chat with message history
Node API (server/)
→ embed user question
→ retrieve top chunks from rag-data/*.md
→ call OpenAI Chat Completions with CONTEXT + messages
→ return { "reply": "..." }The API key never lives in the frontend — only in server/.env on the machine that runs the API.
Frontend: chat as part of the site#
On the portfolio I added:
- A hero input on the homepage (“Ask me anything”) that opens the same chat when you send a message
- A fixed chat button (bottom-right) that opens a centered modal (assistant-style UI)
- Multiple conversations saved in the browser (
localStorage), with a sidebar to switch or start a New chat
The widget calls the backend only when ragApiUrl is set in Hugo config (config/_default/params.toml under [chat]). Without a URL, it falls back to a short static message so the site still works offline.
Knowledge base: server/rag-data/#
RAG needs source documents. I keep them as Markdown files in server/rag-data/, for example portfolio.md, with sections about:
- Who I am and what I focus on (backend, data, web)
- How the bot should behave (concise, no invented facts)
- Pointers to projects and contact
When I update my background or projects, I edit these files (or add new .md files). That is the source of truth the bot should prefer over guessing.
Backend RAG pipeline (server/rag.js)#
At server startup the RAG module:
- Reads all
.mdfiles inrag-data/ - Splits text into chunks (configurable size and overlap)
- Calls OpenAI
/v1/embeddings(text-embedding-3-smallby default) - Keeps chunks + vectors in memory for the running process
On each chat request:
- The latest user message is embedded
- Cosine similarity ranks chunks against the question
- The top K chunks (default 5) are injected into the system prompt under a
CONTEXTblock - The full conversation is sent to chat completions
The system prompt tells the model to use CONTEXT for facts and to say it is unsure rather than inventing employers, grades, or project details.
How I connect the API key (same key for RAG + chat)#
RAG uses OpenAI embeddings; answering uses chat completions. I use one OPENAI_API_KEY for both — configured only on the server.
Step by step (what I did):
- Create key — At platform.openai.com/api-keys I create a secret key and save it in a password manager. I never paste it into Hugo or into this blog post.
- Local env file — In
server/I runcp .env.example .envand setOPENAI_API_KEY=sk-.... Optional:OPENAI_MODEL=gpt-4o-mini,RAG_TOP_K=5. - Ignore secrets in Git —
server/.envis in.gitignore; only.env.exampleis committed (placeholders, no real key). - Start API —
npm installthennpm startinserver/. The first run embeds all files inserver/rag-data/; that uses the same key as later chat requests. - Link the website — In
config/_default/params.tomlunder[chat]I setragApiUrl = "http://localhost:8788/chat"(production URL when deployed). - Test — I open
/healthon the API, then send a message in the portfolio chat and confirm the reply is grounded in myrag-datatext.
The chat widget in the browser only receives data-chat-api-url from Hugo. Every call to OpenAI goes server → OpenAI, never browser → OpenAI.
If the key is wrong or missing, I get a clear error in the chat or a 503 from the API — not a silent failure.
What worked well#
- Simple stack: no separate vector database for a small knowledge set
- Easy to extend: add a new
.mdfile and restart the API to re-index - Clear separation: static site + small API service
Limitations#
- Index is rebuilt when the server starts — not a live multi-tenant vector DB
- Quality depends on writing good knowledge files and chunk size
- Embeddings + chat both cost API usage; long chats increase token use
- RAG reduces hallucinations but does not remove them entirely
For the full integration story (endpoint contract, prompts, error handling), see An AI-driven application — calling an external LLM API.
Reflection#
This assignment matched the course idea of RAG: retrieve first, then generate. For a portfolio, a lightweight in-process index was enough. A larger product might use Dify, Pinecone, or another hosted pipeline — the pattern stays the same.