
An AI-driven application — calling an external LLM API

Author: Jonathan Kudsk
Datamatiker student · Backend, data & web

Assignment (5+6): Document an AI-driven application that calls an external API with an LLM.

This post describes the portfolio assistant: not a generic chat demo, but a real integration embedded in my Hugo site, with a backend, prompts, RAG, and structured error handling.

Application purpose

Visitors can ask questions about me, my work, and how to get in touch. The app:

  • Accepts multi-turn chat in the browser
  • Sends conversation history to my backend
  • Calls OpenAI (external LLM API) from the backend
  • Optionally augments prompts with retrieved context (RAG)
  • Returns a text reply displayed in the chat UI

That is an AI-driven feature inside a larger product (the portfolio), which matches the course definition of an LLM as a software component accessed over HTTP.

Architecture

┌─────────────────┐     POST /chat          ┌──────────────────┐
│  Hugo (static)  │  JSON: messages,       │  Node (Express)  │
│  chat widget    │  conversationId        │  server/         │
└────────┬────────┘ ──────────────────────►└────────┬─────────┘
         │                                            │
         │  localStorage: conversations               │  retrieveContext()
         │                                            │  system + messages
         │                                            ▼
         │                                   ┌──────────────────┐
         │                                   │  OpenAI API      │
         │                                   │  embeddings +    │
         │                                   │  chat completions│
         └◄──────── { "reply": "..." } ─────└──────────────────┘

Rule from the course: the API key stays on the backend only (OPENAI_API_KEY in server/.env).
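The flow in the diagram can be sketched as a single backend function. This is a minimal sketch under assumptions, not the code in server/: retrieveContext() is the name from the diagram, while handleChat, completeChat and the deps object are hypothetical names used only for illustration.

```javascript
// Sketch of the request flow from the diagram (hypothetical names, not the real server code).
// The API key lives in deps on the backend and is never copied into the response.
async function handleChat(body, deps) {
  const { apiKey, retrieveContext, completeChat } = deps;

  // Without a key the backend refuses early instead of calling OpenAI.
  if (!apiKey) {
    return { status: 503, body: { error: "OPENAI_API_KEY is not configured in server/.env" } };
  }

  const messages = Array.isArray(body.messages) ? body.messages : [];

  // The latest user message drives retrieval.
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  const context = lastUser ? await retrieveContext(lastUser.content) : "";

  // completeChat stands in for the chat completions call made with the server-side key.
  const reply = await completeChat({ apiKey, context, messages });
  return { status: 200, body: { reply } };
}
```

Passing retrieveContext and completeChat in as dependencies keeps the sketch testable without network access; the browser only ever sees the returned status and body.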

How I set up the OpenAI API key

There is no separate “RAG API key”. I use one OpenAI API key on the server for both embeddings (retrieval) and chat completions (answers). The browser never sees that key.

1. Create the key at OpenAI

  1. I log in at platform.openai.com.
  2. I open API keys (under my account / dashboard).
  3. I click Create new secret key, give it a name (e.g. portfolio-chat), and copy the value once — it is only shown at creation time.
  4. I keep billing/usage limits in mind on the same account; both embedding and chat calls count toward usage.

If the course or a future employer uses Azure OpenAI or another provider, the idea is the same: a secret on the server, and optionally OPENAI_BASE_URL in .env for a compatible endpoint.

2. Store the key only in server/.env

I do not put the key in Hugo, JavaScript, or Git.

  1. In the repo I go to the server/ folder.
  2. I copy the template: cp .env.example .env
  3. I edit .env and set:
OPENAI_API_KEY=sk-proj-...your-key-here...
OPENAI_MODEL=gpt-4o-mini
PORT=8788

Optional variables I can add later:

  • OPENAI_EMBEDDING_MODEL=text-embedding-3-small — for RAG vectors
  • SYSTEM_PROMPT=... — override the assistant’s system message
  • CORS_ORIGIN=https://portfolio.kudskprogramming.dk — restrict browser origins in production

The file server/.env is listed in .gitignore, so it is not pushed to GitHub. Only server/.env.example (without a real key) is committed as documentation for myself and for anyone cloning the repo.
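The values in .env can be collected into one config object at startup. The following is a minimal sketch under assumptions: loadConfig is a hypothetical helper, and using text-embedding-3-small as the fallback embedding model is assumed from the optional variable listed above, not confirmed as the server's default.

```javascript
// Sketch: read settings from an env-like object (for example process.env after .env is loaded).
// loadConfig is a hypothetical helper; variable names and port/model defaults come from the text.
function loadConfig(env) {
  const apiKey = (env.OPENAI_API_KEY || "").trim();
  return {
    apiKey,
    modelConfigured: apiKey.length > 0,
    model: env.OPENAI_MODEL || "gpt-4o-mini",
    // Assumed fallback, see the note above.
    embeddingModel: env.OPENAI_EMBEDDING_MODEL || "text-embedding-3-small",
    port: Number(env.PORT) || 8788,
    // null means no origin restriction has been configured.
    corsOrigin: env.CORS_ORIGIN || null,
  };
}
```

Taking env as a parameter instead of reading process.env directly makes it easy to check the defaults without touching the real .env file.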

3. Start the backend

From server/:

npm install
npm start

The API listens on port 8788 by default. On startup I watch the terminal: if RAG is enabled, I see a log line that chunks from rag-data/ were indexed. If the key is missing, POST /chat returns 503 with a message to configure .env — that is intentional so I notice misconfiguration early.

I sanity-check with:

GET http://localhost:8788/health

A healthy response includes "modelConfigured": true when OPENAI_API_KEY is set.
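A health payload like this can be built without ever exposing the key itself. This is a minimal sketch: modelConfigured is the field named above, while buildHealth, the ok field and the ragEnabled field are hypothetical names chosen for illustration.

```javascript
// Sketch of a /health payload: it reports whether a key exists, never the key value.
// buildHealth, ok and ragEnabled are hypothetical; only modelConfigured is from the text.
function buildHealth(env, indexedChunkCount) {
  return {
    ok: true,
    modelConfigured: Boolean((env.OPENAI_API_KEY || "").trim()),
    ragEnabled: indexedChunkCount > 0,
  };
}
```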

4. Point the Hugo site at the API

In config/_default/params.toml I configure the public URL (no secret here):

[chat]
ragApiUrl = "http://localhost:8788/chat"
showContactLinkWithRag = false

For production I change this to my deployed API, for example:

ragApiUrl = "https://api.kudskprogramming.dk/chat"

Hugo writes that value into the chat widget as data-chat-api-url. When I build the site (hugo or deploy), the static HTML knows where to send chat requests — but still not how to authenticate to OpenAI; that happens only inside Node using .env.

5. Run the portfolio and test

  1. I keep npm start running in server/.
  2. I run the site locally (hugo server or open the built public/ site).
  3. I open the homepage, click the chat button, and send a short question.
  4. The UI shows a typing indicator, then an assistant reply. In the network tab I see POST to /chat on my API, not to api.openai.com from the browser.

If I stop the API or use a wrong key, the widget shows an error message in the chat instead of failing silently — that matches how I want visitors to experience outages.

What I deliberately avoid

| Mistake | What I do instead |
| --- | --- |
| Key in params.toml or front matter | Key only in server/.env |
| Key in jk-chat-widget.js | Frontend only knows ragApiUrl |
| Committing .env | .gitignore + example file without secrets |
| Sharing keys in blog posts or Moodle | Describe the steps, never paste the real key |

My API endpoint

POST /chat

Request body (simplified):

{
  "conversationId": "uuid-from-browser",
  "messages": [
    { "role": "user", "content": "What do you work with?" },
    { "role": "assistant", "content": "..." }
  ]
}

Response on success:

{
  "reply": "Assistant message text"
}

Response on failure:

{
  "error": "Human-readable error message"
}
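Before the backend spends an OpenAI call, the request body can be checked against the shape shown above. This is a minimal sketch: validateChatRequest is a hypothetical helper, and treating conversationId as required and limiting roles to user and assistant are assumptions, not confirmed behaviour of the real endpoint.

```javascript
// Sketch of request validation for POST /chat (hypothetical helper).
const ALLOWED_ROLES = new Set(["user", "assistant"]);

function validateChatRequest(body) {
  if (!body || typeof body !== "object") {
    return { ok: false, error: "Body must be a JSON object" };
  }
  const { conversationId, messages } = body;
  // Assumption: the conversation id is required.
  if (typeof conversationId !== "string" || conversationId === "") {
    return { ok: false, error: "conversationId must be a non-empty string" };
  }
  if (!Array.isArray(messages) || messages.length === 0) {
    return { ok: false, error: "messages must be a non-empty array" };
  }
  for (const m of messages) {
    if (!m || !ALLOWED_ROLES.has(m.role) || typeof m.content !== "string") {
      return { ok: false, error: "Each message needs a valid role and string content" };
    }
  }
  return { ok: true, value: { conversationId, messages } };
}
```

A failed check maps naturally onto the failure response above: the error string goes into { "error": "..." }.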

The frontend uses fetch, shows a typing indicator, disables input while waiting, and surfaces API errors as assistant messages so the user is not stuck on a blank screen.
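The frontend behaviour can be sketched as one function that always resolves to an assistant-style message, whether the call succeeds or fails. This is a minimal sketch, not the code in jk-chat-widget.js: sendChat, the isError flag and the error wording are hypothetical, and the 55 second timeout follows the approximate value given under error handling.

```javascript
// Sketch of the widget's request path (hypothetical names, not the real widget code).
// fetchImpl is injectable so the function can be exercised without a network.
async function sendChat(apiUrl, payload, fetchImpl = globalThis.fetch, timeoutMs = 55000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(apiUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
    // A non-JSON body is treated like an empty object.
    const data = await res.json().catch(() => ({}));
    if (!res.ok || typeof data.reply !== "string") {
      const content = data.error || "The assistant is unavailable right now. Please try again.";
      return { role: "assistant", content, isError: true };
    }
    return { role: "assistant", content: data.reply, isError: false };
  } catch (err) {
    // fetch rejects with an AbortError when the timeout fires.
    const timedOut = Boolean(err) && err.name === "AbortError";
    const content = timedOut
      ? "The request timed out. Please try again."
      : "Could not reach the chat API. Please try again later.";
    return { role: "assistant", content, isError: true };
  } finally {
    clearTimeout(timer);
  }
}
```

Because every path returns a message object, the UI can render errors in the same place as normal replies and re-enable the input afterwards.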

Prompts

System prompt (role and rules)

Defined in code with an optional override via SYSTEM_PROMPT in .env. Default behaviour:

  • Assistant for Jonathan Kudsk’s portfolio
  • Prefer CONTEXT from RAG when answering factual questions
  • Do not invent CV or project facts
  • Suggest the contact form for serious inquiries
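The rules above can be sketched as a default prompt that is combined with retrieved context and the browser history. This is a minimal sketch under assumptions: the prompt wording is a paraphrase of the bullet points, not the actual prompt in the code, and buildMessages and the way CONTEXT is appended to the system message are hypothetical.

```javascript
// Sketch: assemble the message list sent to the chat model (hypothetical helper).
// The default text paraphrases the rules listed above; it is not the real prompt.
const DEFAULT_SYSTEM_PROMPT = [
  "You are the assistant on Jonathan Kudsk's portfolio site.",
  "Prefer the CONTEXT section when answering factual questions.",
  "Do not invent CV or project facts.",
  "Suggest the contact form for serious inquiries.",
].join(" ");

function buildMessages(history, context, env = {}) {
  // SYSTEM_PROMPT from .env overrides the built-in default.
  const base = env.SYSTEM_PROMPT || DEFAULT_SYSTEM_PROMPT;
  // Assumption: retrieved chunks are appended to the system message under a CONTEXT label.
  const system = context ? `${base}\n\nCONTEXT:\n${context}` : base;
  return [{ role: "system", content: system }, ...history];
}
```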

User / assistant messages

The conversation history from the browser is forwarded as OpenAI messages (user and assistant turns). The latest user message also drives RAG retrieval (embedding + top chunks).
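Selecting the top chunks for a query embedding can be sketched as a similarity ranking. This is a minimal sketch under assumptions: cosine similarity is a common choice for embedding retrieval but is not confirmed as the metric used here, and topChunks, the index shape and k = 3 are hypothetical.

```javascript
// Sketch: rank indexed chunks against a query embedding (assumed metric: cosine similarity).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// index is an array of { text, embedding }; returns the k best matches, highest score first.
function topChunks(queryEmbedding, index, k = 3) {
  return index
    .map((chunk) => ({ text: chunk.text, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The texts of the returned chunks would then be joined into the context that accompanies the system prompt.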

This follows the course split:

  • System = role, limits, output behaviour
  • User/assistant turns = dialogue and task data

External LLM calls

The backend uses two OpenAI endpoints:

| Call | Purpose |
| --- | --- |
| POST /v1/embeddings | RAG: chunk index + query embedding |
| POST /v1/chat/completions | Generate the reply (gpt-4o-mini by default) |

Model and base URL are configurable (OPENAI_MODEL, OPENAI_BASE_URL) for other providers that expose a compatible API.
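Both calls share the same key and base URL, which can be sketched as a small request builder. This is a minimal sketch: openAIRequests is a hypothetical helper, and it assumes that OPENAI_BASE_URL, when set, already includes the /v1 path segment.

```javascript
// Sketch: build the two HTTP requests from env settings (hypothetical helper, no network call).
// Assumption: OPENAI_BASE_URL, if set, already ends in /v1.
function openAIRequests(env) {
  const baseUrl = (env.OPENAI_BASE_URL || "https://api.openai.com/v1").replace(/\/+$/, "");
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${env.OPENAI_API_KEY}`,
  };
  return {
    // Used for both indexing chunks and embedding the user's query.
    embeddings(input) {
      const model = env.OPENAI_EMBEDDING_MODEL || "text-embedding-3-small";
      return { url: `${baseUrl}/embeddings`, method: "POST", headers, body: JSON.stringify({ model, input }) };
    },
    // Used to generate the reply from system, user and assistant messages.
    chat(messages) {
      const model = env.OPENAI_MODEL || "gpt-4o-mini";
      return { url: `${baseUrl}/chat/completions`, method: "POST", headers, body: JSON.stringify({ model, messages }) };
    },
  };
}
```

Each returned object can be passed to fetch as fetch(req.url, req); switching provider then only means changing values in .env.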

Configuration on the static site

The Hugo side only needs the backend URL (see How I set up the OpenAI API key above). If ragApiUrl is left empty, the chat still opens but uses a short offline fallback text instead of calling the LLM — useful while I work on the site without spending API credits.
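The switch between live and offline mode can be sketched from the data-chat-api-url attribute that Hugo writes into the widget. This is a minimal sketch: resolveChatMode and the fallback wording are hypothetical; in the browser, an attribute named data-chat-api-url is exposed as dataset.chatApiUrl.

```javascript
// Sketch: decide whether the widget calls the API or answers with a fixed offline text.
// The fallback wording is a placeholder, not the text used on the site.
const OFFLINE_FALLBACK = "The assistant is offline right now. Please use the contact form instead.";

function resolveChatMode(widgetEl) {
  // data-chat-api-url is available as dataset.chatApiUrl on the element.
  const apiUrl = ((widgetEl.dataset && widgetEl.dataset.chatApiUrl) || "").trim();
  return apiUrl
    ? { mode: "api", apiUrl }
    : { mode: "offline", reply: OFFLINE_FALLBACK };
}
```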

Error handling (course checklist)

Implemented at a basic but real level:

  • Missing API key → 503 with clear message to configure .env
  • OpenAI errors → parsed and returned as { "error": "..." }
  • Network / timeout → frontend abort after ~55s, user sees retry guidance
  • Invalid JSON from model provider → 502 with short detail
  • Empty reply → treated as error
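The provider-side cases in this list can be sketched as one function that turns a raw chat completions response into either a reply or an error body. This is a minimal sketch: interpretProviderResponse is a hypothetical helper, and returning 502 for provider errors and empty replies is an assumption (only the invalid-JSON case is stated as 502 above). The error.message and choices[0].message.content fields follow the OpenAI response format.

```javascript
// Sketch: map a raw provider response to the { reply } or { error } shape of POST /chat.
// Status 502 for provider errors and empty replies is an assumption, see the note above.
function interpretProviderResponse(httpStatus, rawText) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch {
    return { status: 502, body: { error: "Invalid JSON from model provider" } };
  }
  if (data === null || typeof data !== "object") {
    return { status: 502, body: { error: "Invalid JSON from model provider" } };
  }

  // OpenAI reports failures as { "error": { "message": "..." } }.
  if (httpStatus >= 400 || data.error) {
    const message = (data.error && data.error.message) || `Provider returned HTTP ${httpStatus}`;
    return { status: 502, body: { error: message } };
  }

  const content = data.choices?.[0]?.message?.content;
  const reply = typeof content === "string" ? content.trim() : "";
  if (!reply) {
    // An empty reply is treated as an error rather than shown as a blank bubble.
    return { status: 502, body: { error: "The model returned an empty reply" } };
  }
  return { status: 200, body: { reply } };
}
```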

Not yet implemented (possible extensions): retries with backoff, job queue for long tasks, structured JSON output like the assignment assessor exercise.

Testing

Manual tests I run:

  1. GET /health on the API — key configured, RAG enabled
  2. Open portfolio, send a question — reply uses portfolio tone
  3. Ask something not in rag-data — model should hesitate or say it is unsure
  4. Stop the API — frontend shows error path, not a crash
  5. New chat + switch chats — history in localStorage persists per conversation
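Test 5 depends on history being stored per conversation. A minimal sketch of such a store follows, assuming one JSON object keyed by conversation id; the storage key jk-chat-conversations and the helper names are hypothetical and not taken from the widget. The storage argument is anything with getItem and setItem, such as window.localStorage.

```javascript
// Sketch: per-conversation history in a localStorage-like store (hypothetical key and helpers).
const STORAGE_KEY = "jk-chat-conversations";

function loadConversations(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(STORAGE_KEY) || "{}");
    const isPlainObject = parsed !== null && typeof parsed === "object" && !Array.isArray(parsed);
    return isPlainObject ? parsed : {};
  } catch {
    // Corrupt data falls back to an empty history instead of breaking the widget.
    return {};
  }
}

function appendMessage(storage, conversationId, message) {
  const all = loadConversations(storage);
  all[conversationId] = [...(all[conversationId] || []), message];
  storage.setItem(STORAGE_KEY, JSON.stringify(all));
  return all[conversationId];
}
```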

Relation to the larger exam project

The course also describes a separate exercise: assess a student report with a rubric and return structured JSON (overallAssessment, criteriaFeedback, etc.). That is a different endpoint and prompt design (POST /api/assess).

This portfolio app focuses on visitor chat + RAG. The integration patterns are the same: backend-owned key, system/user prompts, validate and handle failures.

Reflection

Building this app reinforced that “AI-driven” means engineering around the model: boundaries, secrets, retrieval, UX while waiting, and honest limits when context is missing. The LLM is one service in the stack — the product is the full path from button click to grounded answer.

See also: RAG chatbot post for retrieval details and the same key setup from a RAG-focused angle.

More detail on env variables lives in server/README.md in the repository.