
RAG workflow automation — keeping the bot aligned with my portfolio

Jonathan Kudsk
Datamatiker student · Backend, data & web

Assignment: Write about an automated workflow that updates a RAG bot on Dify so it always reflects the content on my portfolio.

What the assignment is asking for

The goal is not a one-off upload of documents. It is a pipeline:

Portfolio content changes
  → workflow detects or is triggered
  → knowledge base is refreshed
  → chatbot answers from up-to-date material

On Dify, that usually means:

  • A knowledge dataset connected to your bot
  • Documents ingested (often from files or URLs)
  • A workflow or integration that re-syncs when the site changes — for example after deploy, on a schedule, or via webhook

Without automation, the bot drifts further from the site every time you edit projects or bio text.

How I think about “sync” for my own site

My portfolio is a Hugo static site. For production chat, the assistant I ship does not use Dify; it uses a custom Node API (server/) that reads Markdown from server/rag-data/. The workflow idea is the same as on Dify:

| Step | Dify-style | My portfolio API |
| --- | --- | --- |
| Source of truth | Site pages / exports | rag-data/*.md (+ site content I copy in) |
| Trigger | Workflow / webhook / schedule | Server restart after deploy, or manual edit + restart |
| Process | Chunk + embed + store in dataset | rag.js chunks + OpenAI embeddings at startup |
| Consume | Dify app / widget | POST /chat from the Hugo chat widget |

So the automation problem is: when portfolio facts change, how do we refresh what the bot knows without manual copy-paste every time?

Workflow I use today (practical)

  1. Edit portfolio content — projects in content/projects/, bio on contact/home, etc.
  2. Update knowledge files — mirror important facts into server/rag-data/ (e.g. new project summary, skills, FAQ).
  3. Redeploy or restart the chat API — on start, rag.js rebuilds the embedding index and logs how many chunks were indexed. The server reads OPENAI_API_KEY from server/.env again on each restart (I never put the key in the workflow script or in Git).
  4. Verify with a few questions in the chat UI (“What projects do you have?”, “How can I contact you?”).

API key setup (OpenAI account → .env → ragApiUrl in Hugo) is described in detail in my AI-driven application post.

That is a manual but repeatable workflow. It is a realistic level of automation for a student portfolio, and it keeps the RAG layer easy to understand.
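
Step 3 is where the index is rebuilt. Below is a minimal sketch of the chunking half, assuming each file in rag-data/ is split on Markdown headings before embedding. The names chunkMarkdown and maxChars are hypothetical, and the real rag.js may chunk differently. The embedding call is left out because it needs the API key.

```javascript
// Sketch only: split one Markdown document into heading-scoped chunks.
// chunkMarkdown and maxChars are hypothetical names, not the actual rag.js API.
function chunkMarkdown(source, text, maxChars = 800) {
  const chunks = [];
  let heading = "";
  let buffer = [];

  // Emit the buffered lines as one or more chunks of at most maxChars characters.
  const flush = () => {
    const body = buffer.join("\n").trim();
    buffer = [];
    if (!body) return;
    for (let i = 0; i < body.length; i += maxChars) {
      chunks.push({ source, heading, text: body.slice(i, i + maxChars) });
    }
  };

  for (const line of text.split(/\r?\n/)) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section before starting a new one
      heading = match[1].trim();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}

const sample = "# Projects\nHugo portfolio site.\n\n# Contact\nUse the contact page.";
const chunks = chunkMarkdown("about.md", sample);
console.log(`Indexed ${chunks.length} chunks`);
```

Keeping the heading on every chunk means a retrieved passage can still say which section it came from, which helps when checking answers in step 4.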

Toward fuller automation (Dify or custom)

These are the next steps I would document in a Dify setup or extend on my server:

Option A — Dify + portfolio (course tool)

  1. Export or crawl portfolio URLs after each hugo deploy.
  2. Trigger a Dify knowledge sync (API or built-in workflow).
  3. Point the Dify chat app at the updated dataset.
  4. Embed or link the Dify widget if you do not use a custom backend.
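
Step 2 could be scripted roughly as follows. This is a sketch under assumptions: the endpoint path follows Dify's Knowledge API as I understand it (update-by-text on an existing document; some versions spell it update_by_text), and the host, dataset ID, and document ID are placeholders. Check the API reference for your Dify version before relying on it.

```javascript
// Sketch only: build the HTTP request that replaces one Dify knowledge document.
// ASSUMPTION: the update-by-text path below matches your Dify version's Knowledge API.
function buildDifyUpdateRequest({ baseUrl, apiKey, datasetId, documentId, name, text }) {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${root}/datasets/${datasetId}/documents/${documentId}/update-by-text`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ name, text }),
    },
  };
}

// Send the request with the fetch built into Node 18+ and fail loudly on errors.
async function syncDocument(config) {
  const { url, options } = buildDifyUpdateRequest(config);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Dify sync failed with HTTP ${response.status}`);
  }
  return response.json();
}

const request = buildDifyUpdateRequest({
  baseUrl: "https://dify.example.com/v1/", // placeholder host
  apiKey: process.env.DIFY_DATASET_KEY ?? "dataset-key-from-env", // never hard-code the key
  datasetId: "DATASET_ID",
  documentId: "DOCUMENT_ID",
  name: "projects.md",
  text: "# Projects\nHugo portfolio site.",
});
console.log(request.url);
```

Separating the request builder from the network call keeps the deploy script testable without a live Dify instance.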

Option B — Custom API + CI

  1. On Git push / deploy, a script generates rag-data/*.md from Hugo content (templates or a small extractor).
  2. CI calls POST /reindex on the chat API (an endpoint I would still need to add) or restarts the service.
  3. No separate Dify host is needed: one pipeline runs from repo to embeddings.

git push → build site → generate rag-data → restart API / reindex
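
The "small extractor" in step 1 might look like the sketch below. It assumes YAML front matter between `---` lines with a plain `title:` key, which is only one of the front matter formats Hugo accepts. The function names toKnowledgeMarkdown and generateRagData are hypothetical, and a real version would likely use a proper YAML parser.

```javascript
// Sketch only: turn Hugo content files into knowledge files for rag-data/.
// ASSUMPTION: YAML front matter between "---" lines with a plain "title:" key.
const fs = require("node:fs");
const path = require("node:path");

// Strip the front matter and prepend the title as a top-level heading.
function toKnowledgeMarkdown(raw) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(raw);
  const frontMatter = match ? match[1] : "";
  const body = (match ? raw.slice(match[0].length) : raw).trim();
  const titleLine = /^title:\s*(.+)$/m.exec(frontMatter);
  const title = titleLine
    ? titleLine[1].trim().replace(/^["']|["']$/g, "")
    : "Untitled";
  return `# ${title}\n\n${body}\n`;
}

// Convert every .md file in contentDir (non-recursive) and write it to outDir.
function generateRagData(contentDir, outDir) {
  fs.mkdirSync(outDir, { recursive: true });
  const written = [];
  for (const name of fs.readdirSync(contentDir)) {
    if (!name.endsWith(".md")) continue;
    const raw = fs.readFileSync(path.join(contentDir, name), "utf8");
    fs.writeFileSync(path.join(outDir, name), toKnowledgeMarkdown(raw));
    written.push(name);
  }
  return written;
}

const example =
  '---\ntitle: "Chat widget"\ndraft: false\n---\nA RAG chat widget for my portfolio.\n';
console.log(toKnowledgeMarkdown(example));
```

In CI this would run as something like `generateRagData("content/projects", "server/rag-data")` before the restart or reindex step, so the retriever and the site are built from the same commit.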

Either option satisfies the spirit of the assignment: the bot’s knowledge tracks the portfolio instead of drifting.

Risks and design choices

  • Stale chunks: If you only update the website but not rag-data, RAG will confidently cite old text. Automation must touch the same store the retriever reads.
  • Over-syncing: Re-embedding everything on every tiny edit costs time and API money; hash-based “only re-index if content changed” is a good production pattern (also covered in course material).
  • Two bots: Running both Dify and a custom API is fine for learning, but visitors should use one clear entry point on the live site.
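
The hash-based pattern from the over-syncing point can be sketched as below. The manifest shape (file name mapped to a SHA-256 hex digest) and the function name planReindex are hypothetical; the idea is only that unchanged files are skipped before any embedding call is made.

```javascript
// Sketch only: decide which knowledge files need re-embedding by comparing hashes.
// The manifest shape ({ fileName: sha256 }) is a hypothetical format.
const crypto = require("node:crypto");

function sha256(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest("hex");
}

// files: { fileName: content }; previousManifest: { fileName: hash } from the last run.
function planReindex(files, previousManifest = {}) {
  const manifest = {};
  const changed = [];
  for (const [name, content] of Object.entries(files)) {
    manifest[name] = sha256(content);
    if (previousManifest[name] !== manifest[name]) changed.push(name);
  }
  // Files that disappeared need their old chunks deleted from the index.
  const removed = Object.keys(previousManifest).filter((name) => !(name in files));
  return { manifest, changed, removed };
}

const firstRun = planReindex({ "bio.md": "Student", "projects.md": "Hugo site" });
const secondRun = planReindex(
  { "bio.md": "Student", "projects.md": "Hugo site + chat widget" },
  firstRun.manifest
);
console.log(secondRun.changed);
```

On the first run every file counts as changed; on the second run only projects.md is re-embedded. Persisting the manifest next to the index (for example as a JSON file) is what makes the comparison survive restarts.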

Reflection

The assignment frames Dify as the product for workflow automation. I implemented the same architectural idea on my own stack so the live portfolio chat and my exam documentation stay aligned. The important learning outcome is the pipeline mindset: treat knowledge as data that must be versioned, triggered, and refreshed — not as a one-time PDF upload.

If I add Dify later, I would use it for orchestration and keep this post updated with screenshots of the actual workflow nodes and triggers.