Version: 0.1.0

Serve a RAG Chatbot with OpenWebUI

Put OpenWebUI in front of the NuFi LLM Serving to get a ChatGPT-style chat screen and document-grounded answers (RAG) without writing any extra code. NuFi Serving provides an OpenAI-compatible API (/v1/chat/completions), so OpenWebUI can connect directly, and document search uses the Knowledge feature built into OpenWebUI.

Composition

Serving: qwen-instruct-tutorial-v1-original   ← OpenAI-compatible LLM backend (vLLM, deployed as a prerequisite)
  └─ /v1/chat/completions

Volume: openwebui-volume                      ← OpenWebUI config / user DB + uploaded docs + built-in index storage
  └─ /app/backend/data

Serving: tutorial-openwebui                   ← External exposure point
  └─ ghcr.io/open-webui/open-webui:main
  └─ Connects the LLM Serving as an OpenAI-compatible backend
  └─ Upload documents to a Knowledge collection → reference with # in chat

OpenWebUI stores user accounts, conversation history, uploaded documents, and the built-in embedding index all under the container's /app/backend/data. Mount a NuFi Volume so the data persists across Pod restarts.

Prerequisites

A vLLM-based LLM Serving is Running per the prerequisites of Build a RAG Chatbot in Lab.
Know the model ID used when deploying the LLM Serving (the value after --model in the Serving detail Command/Args)
1–2 English text documents for testing (.md, .txt, .pdf, etc.)

Naming conventions used in this guide

Values shown in environment variables and URLs must be replaced with values from your own environment. The main locations:

Project namespace — Check the project namespace area of the top header in the NuFi dashboard. This guide uses rag as the namespace in examples.
LLM Serving name — The name shown in the Development > Serving list. At Quick Deploy time it's auto-generated in the form <model-name>-<version>-<artifact>. This guide uses qwen-instruct-tutorial-v1-original prepared as a prerequisite as the example.
OpenWebUI Serving name — The value you enter directly in this tutorial: tutorial-openwebui.

1. Create a Volume for OpenWebUI data

For the volume creation procedure, see User Guide > Storage > Create Volume. Create with the values below.

Field	Value
Volume Name	`openwebui-volume`
Size	`10Gi`

If you plan to upload many documents or keep long chat histories, use a larger Size. After creation, check that the volume is Ready in the list.

Why is a Volume needed?

OpenWebUI stores not only user accounts, chat history, and workspace settings but also documents uploaded to Knowledge and the built-in embedding index under /app/backend/data. Without a Volume, the documents and the index are lost whenever the Pod restarts.

2. Deploy the OpenWebUI Serving

For how to create a Serving and the meaning of each field, see User Guide > Serving > Create Serving. This guide only covers values specific to this tutorial.

Go to Development > Serving > Create to deploy OpenWebUI.

Step 1 — Basic information

Field	Value
Service Name	`tutorial-openwebui`
Description	`OpenWebUI chat UI with Knowledge`
Service Template	`Custom`

Step 2 — Detailed settings

Field	Value
Image	`ghcr.io/open-webui/open-webui:main`
CPU	`2`
Memory	`8`
Accelerator Type	`None`
Replicas	`1`
Inference Port	`8080`

Volume Mount

Field	Value
Volume	`openwebui-volume`
Mount Path	`/app/backend/data`

Environment Variables

OPENAI_API_BASE_URL=http://qwen-instruct-tutorial-v1-original.rag.svc:80/v1
OPENAI_API_KEY=unused
WEBUI_AUTH=true
ENABLE_OLLAMA_API=false
ENABLE_RAG_HYBRID_SEARCH=true

OPENAI_API_BASE_URL — Internal call URL of the LLM Serving in the form http://<llm-service-name>.<project-namespace>.svc:80/v1. Replace <llm-service-name> and <project-namespace> per the prerequisites with your environment.
OPENAI_API_KEY — Any string (NuFi internal Servings don't validate the key).
WEBUI_AUTH — Leave as true so the first signup automatically becomes admin. Leave this for in-house sharing.
ENABLE_OLLAMA_API — Turn off; we don't use the Ollama backend.
ENABLE_RAG_HYBRID_SEARCH — Performs Knowledge search as hybrid vector + BM25, improving accuracy over pure vector search.

To block external signups

After deployment, sign up with the first account to become admin, then turn off Admin Settings > General > Enable New Sign Ups in OpenWebUI to block external signups.

After deployment, in the Development > Serving list, wait until tutorial-openwebui reaches Running. The first start downloads the embedding model and may take 1–3 minutes.

3. Connect and initial setup

1. Open the Connect URL

In the Serving list, select tutorial-openwebui and press the Connect button to open the external URL.

Serving list — tutorial-openwebui Connect

2. Create the admin account

The first time you connect, the sign-up screen appears. Enter name, email, and password to create the first account. The first signup automatically becomes admin.

On password loss

User info is stored inside openwebui-volume, so if you lose the password you'll need to mount the Volume in a Lab and edit the user table directly. Keep the password in a safe place.

4. Check LLM Serving connection

The connection should already be wired through the environment variable (OPENAI_API_BASE_URL). Verify in the UI.

1. Open the model selection dropdown

Open the model selection dropdown in the top-left. You should see the model IDs exposed by the LLM Serving pointed to by OPENAI_API_BASE_URL. Verify it matches the --model (or --served-model-name) value from the prerequisites.

OpenWebUI model selection dropdown

2. If the model is not visible

Check in the following order.

Click your avatar (top right) → Admin Panel → the Settings tab at the top → Connections in the left menu → go to the OpenAI API item.
Verify that the entered URL matches the OPENAI_API_BASE_URL environment variable value, and press the refresh button next to the URL to refetch the model list.
If still empty, check the LLM Serving detail → Logs tab to see whether vLLM is ready.

5. Note on Korean embedding

OpenWebUI's default embedding model is trained mainly on English. To accurately search Korean documents you need to swap in a multilingual embedding model separately. This tutorial skips that step and uploads English documents to verify operation.

6. Create a Knowledge collection

A Knowledge collection is the unit that groups documents. Upload multiple files to a single collection and reference the whole collection from chat.

Workspace Knowledge screen

In the left sidebar, go to Workspace > Knowledge.
Click + New Knowledge in the upper right.
When the Create a knowledge base dialog opens, fill in:
- What are you working on? (Name your knowledge base): tutorial-docs
- What are you trying to achieve? (Describe your knowledge base and objectives): any short description (e.g., Dudaji product manuals)
Click the Create button.

7. Upload documents

Open the tutorial-docs collection and either click + Add Content or drag files into the central drop zone.

Recommended formats: .md, .txt, .pdf (OpenWebUI extracts text from PDFs internally)
You can upload multiple files at once

Uploaded files are immediately chunked, embedded, and loaded into the collection. They can be used for search once each file's Status becomes Processed.

8. Use Knowledge from chat

Method 1 — One-off collection reference (#)

In a new chat, type # in the message input to see the list of registered Knowledge collections / files. Select tutorial-docs and ask the question.

Example questions (based on uploaded English manuals — since the default embedding model is English-leaning, write questions in English):

#tutorial-docs What is the maximum load capacity of the Gravity Desk Pro?

#tutorial-docs What does error code E04 on the Claymore K3 mean and how should I handle it?

#tutorial-docs How long does the Claymore K3 self-diagnosis take, and what should I do if the LED blinks red?

#tutorial-docs When does the Gravity Desk Pro require ACS recalibration?

OpenWebUI chat screen — Knowledge reference result

When the response finishes, a list of referenced document chunks is shown collapsed above the answer. Click to see the original chunk contents.

Method 2 — Permanently bind Knowledge to a model

If typing # every chat is cumbersome, you can bind collections at the model level.

Workspace Models — + New Model

Model detail — Knowledge connection

Click Workspace > Models > + New Model.
In Base Model (From), select the model exposed by the LLM Serving (e.g., /mnt/rag-volume/Qwen2.5-0.5B-Instruct).
Set Name to qwen.
In the Knowledge section, click Select Knowledge and add the tutorial-docs collection.
Click Save & Create in the upper right.

Now when you select qwen from the model dropdown at the top of a chat, every message automatically goes through the tutorial-docs collection and is answered with RAG.

9. Check response + sources

Ask questions related to the uploaded documents and verify the answer and the referenced documents.

OpenWebUI chat response — Source button

The answer body is generated by the LLM Serving.
The answer area also shows a Source button.
Clicking the Source button shows the document chunks (filename, partial body) referenced when creating the answer.

Response quality

Qwen2.5-0.5B-Instruct deployed in the E2E tutorial is a 0.5B-class small model, so answers based on RAG-retrieved documents may be inaccurate or formatted inconsistently. Swap the LLM Serving to a larger model and perceived quality improves significantly.

Composition​

Prerequisites​

1. Create a Volume for OpenWebUI data​

2. Deploy the OpenWebUI Serving​

Step 1 — Basic information​

Step 2 — Detailed settings​

Volume Mount​

Environment Variables​

3. Connect and initial setup​

4. Check LLM Serving connection​

5. Note on Korean embedding​

6. Create a Knowledge collection​

7. Upload documents​

8. Use Knowledge from chat​

9. Check response + sources​