System Architecture

NuFi is an NPUOps platform that runs on top of a Kubernetes cluster. All components are K8s-native.

NuFi System Composition

Main Components

Component	Role
Dashboard	Web UI that users interact with directly
API Server	Serves the REST API, creates and manages Custom Resources, creates Lab workloads directly (Kubeflow Notebook / File Manager Pod), and orchestrates evaluation tools such as Optimizer and Benchmark
NuFi Controller (K8s Operator)	Watches NuFi CRDs and provisions K8s resources such as Serving Pods and various Jobs
nufi-proxy	Sidecar proxy on the inference request path. Client requests pass through a VirtualService to nufi-proxy, which then forwards them to the inference server. Handles traffic control for inference requests, including load balancing, temperature-based traffic shutoff, Async Queue, and Transformer pre/post-processing chaining.
nufi-notebook-servers	Jupyter / VS Code / LlamaFactory container images for Labs (development environments)
nufi-file-manager	File manager for uploading, downloading, and managing files in a Volume

Custom Resources (CRDs) and Outputs

The CRDs managed by NuFi Controller and the workloads each produces:

CRD	Output Workload
NpuDeploy	Serving (Inference Server + nufi-proxy + Transformer + Temperature Sidecar + Service + VirtualService)
ModelImport	Model Import Job (MLflow → Registry)
NpuPortingPipeline	NPU Compile Job (per-device compile)
DatasetSourceRevision	Dataset Job (Upload / Import / Promotion)
EvaluationRun	Evaluation Job (lm-eval, etc.)

Note: Labs (Notebook Server, File Manager) are created directly by the API Server, not by the NuFi Controller. The Notebook Server is provisioned through a Kubeflow Notebook CR, and the File Manager runs as a Pod that the API Server creates itself.
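The split of responsibilities above can be summarized as a small lookup. This is only an illustrative sketch built from the table and the note. The names `CRD_OUTPUTS`, `LAB_WORKLOADS`, and `creator_of` are hypothetical and are not part of NuFi.

```python
# Ownership lookup derived from the CRD table and the note above.
# All identifiers here are illustrative, not NuFi APIs.
CONTROLLER = "NuFi Controller"
API_SERVER = "API Server"

# CRD kind -> workload that the NuFi Controller produces for it.
CRD_OUTPUTS = {
    "NpuDeploy": "Serving",
    "ModelImport": "Model Import Job",
    "NpuPortingPipeline": "NPU Compile Job",
    "DatasetSourceRevision": "Dataset Job",
    "EvaluationRun": "Evaluation Job",
}

# Lab workload -> how the API Server provisions it (no NuFi CRD involved).
LAB_WORKLOADS = {
    "Notebook Server": "Kubeflow Notebook CR",
    "File Manager": "Pod",
}


def creator_of(workload: str) -> str:
    """Return the component that creates the given workload."""
    if workload in CRD_OUTPUTS.values():
        return CONTROLLER
    if workload in LAB_WORKLOADS:
        return API_SERVER
    raise KeyError(f"unknown workload: {workload}")
```

For example, `creator_of("Serving")` resolves to the NuFi Controller, while `creator_of("File Manager")` resolves to the API Server.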

Evaluation Orchestration Tools

The API Server hosts internal tools that automate evaluation and tuning.

Optimizer: Runs repeated automated trials, creating one EvaluationRun CR per trial.
Benchmark Profile: Defines evaluation metric policies and aggregates results.
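To make the "one EvaluationRun CR per trial" idea concrete, here is a minimal sketch that expands a parameter grid into one manifest per trial. Several things are assumptions made for illustration: the grid-search strategy (the Optimizer's actual search method is not described here), the `apiVersion` value, the `spec.params` field, and the function name `evaluation_run_manifests`.

```python
from itertools import product


def evaluation_run_manifests(study, search_space):
    """Yield one EvaluationRun-shaped manifest per trial.

    Trials are enumerated as a plain grid over search_space; this is an
    assumed strategy, not necessarily what the Optimizer does.
    """
    keys = sorted(search_space)
    combos = product(*(search_space[k] for k in keys))
    for index, values in enumerate(combos):
        yield {
            "apiVersion": "nufi.example.com/v1alpha1",  # hypothetical group/version
            "kind": "EvaluationRun",
            "metadata": {"name": f"{study}-trial-{index}"},
            "spec": {"params": dict(zip(keys, values))},  # hypothetical field
        }


manifests = list(
    evaluation_run_manifests("demo", {"batch_size": [1, 2], "max_tokens": [128]})
)
```

Each element of `manifests` stands for one trial, so a two-value grid over `batch_size` yields two EvaluationRun objects.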

Traffic & Scaling

The following components support the Serving runtime.

Auto-scaling (KEDA): Scales the Serving workloads produced by NpuDeploy.
Async Queue: Buffers asynchronous requests for nufi-proxy.
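The interplay between request buffering and scaling can be sketched as follows. This is a simplified model, not NuFi's implementation: `AsyncBuffer`, `desired_replicas`, the capacity, and the per-replica target are all hypothetical. Scaling on queue depth is also an assumption, since the metric NuFi uses for KEDA is not stated here.

```python
import math
from collections import deque


class AsyncBuffer:
    """Bounded FIFO buffer for requests awaiting an inference server."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def offer(self, request):
        """Enqueue a request; return False when the buffer is full."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(request)
        return True

    def take(self):
        """Dequeue the oldest request, or None when the buffer is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


def desired_replicas(queue_depth, target_per_replica, min_replicas=1, max_replicas=10):
    """Queue-length scaling rule: ceil(depth / target), clamped to bounds."""
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

With a target of 10 buffered requests per replica, a depth of 25 asks for 3 replicas, and the result never drops below `min_replicas` or exceeds `max_replicas`.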

Infrastructure dependencies:

Istio / VirtualService: Routes external traffic to Servings and Labs. One VirtualService is created per Serving, and it routes inference requests sent to that Serving's endpoint URL.

Request Flow

Model Deployment Flow

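When a user creates a Serving in the Dashboard, the API Server records it as an NpuDeploy Custom Resource. The NuFi Controller, which watches that CRD, then provisions the corresponding K8s resources automatically.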
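The provisioning step can be pictured as a reconcile pass that compares what a Serving should contain with what already exists. The sketch below is deliberately simplified. It treats every part of a Serving listed in the CRD table as a separately named item, whereas in practice some parts are likely containers inside one Pod. The naming scheme and the functions `desired_resources` and `missing_resources` are hypothetical.

```python
# Parts of a Serving, taken from the NpuDeploy row of the CRD table.
SERVING_PARTS = (
    "inference-server",
    "nufi-proxy",
    "transformer",
    "temperature-sidecar",
    "service",
    "virtualservice",
)


def desired_resources(npudeploy_name):
    """Names the controller would expect for one NpuDeploy (hypothetical scheme)."""
    return {f"{npudeploy_name}-{part}" for part in SERVING_PARTS}


def missing_resources(npudeploy_name, existing):
    """Return, in sorted order, the desired resources not yet present."""
    return sorted(desired_resources(npudeploy_name) - set(existing))


to_create = missing_resources("llama", {"llama-service", "llama-nufi-proxy"})
```

Running the same pass again once everything exists returns an empty list, which is the idempotence an operator-style controller relies on.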

Inference Request Flow

Inference requests to a deployed service pass through nufi-proxy on their way to the inference server.
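A rough sketch of that path, under stated assumptions: the proxy first checks device temperature, then chains Transformer pre-processing, the inference call, and Transformer post-processing. The function `handle_request`, the temperature threshold semantics, and the 503 response are hypothetical and chosen only to illustrate the ordering.

```python
def handle_request(payload, *, temperature_c, limit_c, pre, infer, post):
    """Model one request passing through a proxy in front of an inference server.

    pre/infer/post are plain callables standing in for the Transformer
    pre-processor, the inference server, and the Transformer post-processor.
    """
    # Temperature-based shutoff: reject traffic while the device is too hot.
    if temperature_c >= limit_c:
        return {"status": 503, "body": None}  # hypothetical response shape
    # Chain pre-processing -> inference -> post-processing.
    return {"status": 200, "body": post(infer(pre(payload)))}


ok = handle_request(
    "  hi ",
    temperature_c=60,
    limit_c=85,
    pre=str.strip,
    infer=str.upper,
    post=lambda text: text + "!",
)
hot = handle_request(
    "hi",
    temperature_c=90,
    limit_c=85,
    pre=str.strip,
    infer=str.upper,
    post=lambda text: text + "!",
)
```

Here `ok` carries the fully processed body, while `hot` is rejected before any processing stage runs.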

Deployment Structure

NuFi is packaged as a single bundle that installs the Kubernetes cluster and the NuFi system together.

Feature	Description
Single bundle	The cluster, infrastructure, and NuFi applications are all included
K8s-native	Every component is managed as a K8s resource. Operations, deployment, and observability follow Kubernetes conventions, so existing K8s operational tools (kubectl, Helm, Prometheus, etc.) work as-is.

Supported Devices

Vendor	Device	Supported Features
NVIDIA	CUDA-capable GPU	Lab, Serving
FuriosaAI	RNGD	Lab, Serving

Note: For per-device support scope and feature differences, see the Custom Device Management documentation.
