Building a small RAG chatbot on my portfolio

Assignment: Document how I built a small RAG chatbot on my portfolio.

Why RAG on a portfolio site?

A plain LLM does not know my projects, my CV details, or how I want it to answer visitors. Retrieval-Augmented Generation (RAG) fixes that by:

Storing my own text in small chunks
Turning chunks and the user’s question into embeddings (vectors)
Finding the most relevant chunks for each question
Sending those chunks to the model as context before it replies

The visitor still chats in natural language, but answers can be grounded in material I control.

What I built (high level)

Browser (Hugo site)
  → floating chat UI + conversation history (localStorage)
  → POST /chat with message history
Node API (server/)
  → embed user question
  → retrieve top chunks from rag-data/*.md
  → call OpenAI Chat Completions with CONTEXT + messages
  → return { "reply": "..." }

The API key never lives in the frontend — only in server/.env on the machine that runs the API.
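
Seen from the browser, the contract in the diagram can be sketched roughly as below. Only the POST /chat route and the { "reply": "..." } response come from my setup as described here; the `messages` field name, the role/content message shape, and the `postChat` helper are assumptions for illustration.

```javascript
// Hypothetical client helper: sends the conversation history to the chat API.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function postChat(apiUrl, messages, fetchImpl = fetch) {
  const response = await fetchImpl(apiUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Assumed request shape: { messages: [{ role, content }, ...] }
    body: JSON.stringify({ messages }),
  });
  if (!response.ok) {
    throw new Error(`Chat API returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.reply; // the API answers with { "reply": "..." }
}
```

Note that no API key appears anywhere in this call: the browser only knows the URL of my own API.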

Frontend: chat as part of the site

On the portfolio I added:

A hero input on the homepage (“Ask me anything”) that opens the same chat when you send a message
A fixed chat button (bottom-right) that opens a centered modal (assistant-style UI)
Multiple conversations saved in the browser (localStorage), with a sidebar to switch or start a New chat
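
Saving several conversations in the browser can be sketched as a thin wrapper around localStorage. The storage key, the conversation shape ({ id, title, messages }), and the function names are assumptions, not the widget's actual code.

```javascript
// Assumed key name under which all conversations are stored as one JSON array.
const STORAGE_KEY = "portfolio-chat-conversations";

// `storage` is anything with getItem/setItem, e.g. window.localStorage.
function loadConversations(storage) {
  try {
    const raw = storage.getItem(STORAGE_KEY);
    const parsed = raw ? JSON.parse(raw) : [];
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // corrupted JSON: start with an empty sidebar
  }
}

function saveConversations(storage, conversations) {
  storage.setItem(STORAGE_KEY, JSON.stringify(conversations));
}

// Creates an empty conversation and puts it at the top of the sidebar list.
function startNewChat(storage, title = "New chat") {
  const conversations = loadConversations(storage);
  const conversation = {
    id: `${Date.now()}-${conversations.length}`,
    title,
    messages: [],
  };
  saveConversations(storage, [conversation, ...conversations]);
  return conversation;
}
```

In the browser this would be called as `startNewChat(window.localStorage)`; passing the storage object in keeps the logic testable outside a browser.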

The widget calls the backend only when ragApiUrl is set in the Hugo config (config/_default/params.toml under [chat]). Without a URL, it falls back to a short static message, so the site still works when no API is running.
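
The fallback decision is small enough to sketch in a few lines. The wording of the static message and the `getReply` / `sendToApi` names are assumptions; the only fixed point is that an empty URL means no backend call.

```javascript
// Assumed wording of the static fallback message.
const FALLBACK_REPLY =
  "The chat backend is not connected right now. Please use the contact page instead.";

// `sendToApi(url, messages)` is a hypothetical function that performs the
// actual POST /chat request and resolves to the reply text.
async function getReply(apiUrl, messages, sendToApi) {
  // An unset or blank ragApiUrl means "no backend": answer locally.
  if (typeof apiUrl !== "string" || apiUrl.trim() === "") {
    return FALLBACK_REPLY;
  }
  return sendToApi(apiUrl.trim(), messages);
}
```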

Knowledge base: `server/rag-data/`

RAG needs source documents. I keep them as Markdown files in server/rag-data/, for example portfolio.md, with sections about:

Who I am and what I focus on (backend, data, web)
How the bot should behave (concise, no invented facts)
Pointers to projects and contact

When I update my background or projects, I edit these files (or add new .md files). That is the source of truth the bot should prefer over guessing.
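
Picking up every Markdown file in that folder, including newly added ones, might look like the sketch below. The function name and the { source, text } return shape are assumptions; only the "all .md files in one directory" idea comes from my setup.

```javascript
const fs = require("fs");
const path = require("path");

// Reads every .md file in `dir` and returns [{ source, text }], sorted by
// file name so the order is stable between restarts.
function loadMarkdownFiles(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.toLowerCase().endsWith(".md"))
    .sort()
    .map((name) => ({
      source: name,
      text: fs.readFileSync(path.join(dir, name), "utf8"),
    }));
}
```

Keeping the file name as `source` makes it possible to see later which file a retrieved chunk came from.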

Backend RAG pipeline (`server/rag.js`)

At server startup the RAG module:

Reads all .md files in rag-data/
Splits text into chunks (configurable size and overlap)
Calls OpenAI /v1/embeddings (text-embedding-3-small by default)
Keeps chunks + vectors in memory for the running process
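
As an illustration of the chunking step, here is a minimal sketch that splits by character count with overlap. Character-based windows and the defaults of 800 and 100 are assumptions; the real module only needs a configurable size and overlap.

```javascript
// Splits text into overlapping character windows.
// Each new window starts (chunkSize - overlap) characters after the last one.
function chunkText(text, chunkSize = 800, overlap = 100) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    const piece = text.slice(start, start + chunkSize).trim();
    if (piece) chunks.push(piece);
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Example: chunkText("abcdefghij", 4, 2) returns ["abcd", "cdef", "efgh", "ghij"]
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some text twice.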

On each chat request:

The latest user message is embedded
Cosine similarity ranks chunks against the question
The top K chunks (default 5) are injected into the system prompt under a CONTEXT block
The full conversation is sent to chat completions
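
The ranking step can be sketched as plain cosine similarity over the in-memory index. The index entry shape ({ text, vector }) and the function names are assumptions; the default of 5 matches the top K above.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every indexed chunk against the question vector and keeps the best k.
function topKChunks(queryVector, index, k = 5) {
  return index
    .map((entry) => ({ ...entry, score: cosineSimilarity(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan like this is fine for a few dozen chunks, which is why no separate vector database is needed at this scale.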

The system prompt tells the model to use CONTEXT for facts and to say it is unsure rather than inventing employers, grades, or project details.
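
A sketch of how the retrieved chunks could be folded into the system prompt follows. The prompt wording, the chunk shape ({ source, text }), and the function name are assumptions; the role/content message format is the one Chat Completions expects.

```javascript
// Builds the messages array: one system message carrying the instructions
// and the CONTEXT block, followed by the untouched conversation history.
function buildChatMessages(chunks, history) {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] (${chunk.source}) ${chunk.text}`)
    .join("\n\n");
  const systemPrompt = [
    "You are the assistant on a personal portfolio site.",
    "Use the CONTEXT below for facts about the author.",
    "If CONTEXT does not contain the answer, say you are unsure.",
    "Do not invent employers, grades, or project details.",
    "",
    "CONTEXT:",
    context || "(no matching context found)",
  ].join("\n");
  return [{ role: "system", content: systemPrompt }, ...history];
}
```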

How I connect the API key (same key for RAG + chat)

RAG uses OpenAI embeddings; answering uses chat completions. I use one OPENAI_API_KEY for both — configured only on the server.
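
To show what "one key for both" means in practice, here is a sketch that calls the two OpenAI REST endpoints through one shared helper. The helper names and the injectable `fetchImpl` are assumptions; the endpoints, the Bearer header, and the response fields follow the public OpenAI API.

```javascript
const OPENAI_BASE_URL = "https://api.openai.com/v1";

// Shared POST helper: both endpoints authenticate with the same Bearer key.
async function openaiPost(path, body, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl(`${OPENAI_BASE_URL}${path}`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`OpenAI ${path} failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Embeds a batch of strings; returns one vector per input.
async function embedTexts(texts, apiKey, fetchImpl = fetch) {
  const data = await openaiPost(
    "/embeddings",
    { model: "text-embedding-3-small", input: texts },
    apiKey,
    fetchImpl
  );
  return data.data.map((item) => item.embedding);
}

// Sends the messages to chat completions and returns the assistant's text.
async function chatReply(messages, apiKey, fetchImpl = fetch, model = "gpt-4o-mini") {
  const data = await openaiPost("/chat/completions", { model, messages }, apiKey, fetchImpl);
  return data.choices[0].message.content;
}
```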

Step by step (what I did):

1. Create key: At platform.openai.com/api-keys I create a secret key and save it in a password manager. I never paste it into Hugo or into this blog post.
2. Local env file: In server/ I run cp .env.example .env and set OPENAI_API_KEY=sk-.... Optional: OPENAI_MODEL=gpt-4o-mini, RAG_TOP_K=5.
3. Ignore secrets in Git: server/.env is in .gitignore; only .env.example is committed (placeholders, no real key).
4. Start API: npm install, then npm start in server/. The first run embeds all files in server/rag-data/; that uses the same key as later chat requests.
5. Link the website: In config/_default/params.toml under [chat] I set ragApiUrl = "http://localhost:8788/chat" (replaced by the production URL when deployed).
6. Test: I open /health on the API, then send a message in the portfolio chat and confirm the reply is grounded in my rag-data text.
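
On the server side, reading those variables could be sketched like this. It assumes the values from server/.env have already been loaded into the process environment by whatever mechanism the server uses; the `loadConfig` name, the returned shape, and treating gpt-4o-mini as the fallback model are also assumptions.

```javascript
// Reads the three settings from an env-like object (process.env by default).
function loadConfig(env = process.env) {
  const apiKey = (env.OPENAI_API_KEY || "").trim();
  const topK = Number.parseInt(env.RAG_TOP_K || "5", 10);
  return {
    apiKey,
    hasApiKey: apiKey.length > 0,
    model: env.OPENAI_MODEL || "gpt-4o-mini",
    // Fall back to 5 when RAG_TOP_K is missing, non-numeric, or not positive.
    topK: Number.isInteger(topK) && topK > 0 ? topK : 5,
  };
}
```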

The chat widget in the browser only receives data-chat-api-url from Hugo. Every call to OpenAI goes server → OpenAI, never browser → OpenAI.

If the key is wrong or missing, I get a clear error in the chat or a 503 from the API — not a silent failure.
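
The "no silent failure" behaviour can be sketched as a framework-free handler core that always returns a status and a JSON body. The `handleChat` name, the `{ error }` body shape, the injected `answer` function, and the 400 case for a malformed request are assumptions; the 503 for a missing or rejected key is what I actually see.

```javascript
// `answer(messages)` is a hypothetical function that runs retrieval plus the
// chat completion and resolves to the reply text.
async function handleChat(requestBody, { apiKey, answer }) {
  if (!apiKey) {
    return {
      status: 503,
      body: { error: "Chat is unavailable: OPENAI_API_KEY is not configured on the server." },
    };
  }
  const messages = requestBody && requestBody.messages;
  if (!Array.isArray(messages) || messages.length === 0) {
    return { status: 400, body: { error: "Expected a non-empty messages array." } };
  }
  try {
    return { status: 200, body: { reply: await answer(messages) } };
  } catch (err) {
    // For example, OpenAI rejected the key: report it instead of hiding it.
    return { status: 503, body: { error: `Upstream model call failed: ${err.message}` } };
  }
}
```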

What worked well

Simple stack: no separate vector database for a small knowledge set
Easy to extend: add a new .md file and restart the API to re-index
Clear separation: static site + small API service

Limitations

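The index is rebuilt only when the server starts, so edits to rag-data need a restart; it is not a live, multi-tenant vector DB
Answer quality depends on well-written knowledge files and a sensible chunk size
Embeddings and chat both consume paid API usage; long chats increase token use because the full conversation is sent on every request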
RAG reduces hallucinations but does not remove them entirely

For the full integration story (endpoint contract, prompts, error handling), see An AI-driven application — calling an external LLM API.

Reflection

This assignment matched the course idea of RAG: retrieve first, then generate. For a portfolio, a lightweight in-process index was enough. A larger product might use Dify, Pinecone, or another hosted pipeline — the pattern stays the same.
