Version: Next

System Architecture

NuFi is an NPUOps platform that runs on top of a Kubernetes cluster. All components are K8s-native.

NuFi System Composition

Main Components

Component | Role
Dashboard | Web UI: the frontend users interact with directly
API Server | Handles the REST API, creates and manages Custom Resources, directly creates Lab workloads (Kubeflow Notebook / File Manager Pod), and orchestrates evaluation tools such as Optimizer and Benchmark
NuFi Controller (K8s Operator) | Watches NuFi CRDs and provisions K8s resources such as Serving Pods and various Jobs
nufi-proxy | Sidecar proxy on the inference request path. Client requests pass through a VirtualService to nufi-proxy, which then forwards them to the inference server. Handles traffic control for inference requests, including load balancing, temperature-based traffic shutoff, Async Queue, and Transformer pre/post-processing chaining.
nufi-notebook-servers | Jupyter / VS Code / LlamaFactory container images for Labs (development environments)
nufi-file-manager | File manager for uploading, downloading, and managing files in a Volume
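
As an illustration of how load balancing and temperature-based traffic shutoff might combine, here is a minimal sketch. It is not nufi-proxy's actual implementation: the `Backend` and `TemperatureAwareBalancer` names, the round-robin policy, and the 85 °C limit are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical limit; the real shutoff temperature is not stated in this document.
MAX_TEMP_C = 85.0


@dataclass
class Backend:
    name: str
    temp_c: float  # most recent device temperature reported for this backend


class TemperatureAwareBalancer:
    """Round-robin over backends whose temperature is within the limit."""

    def __init__(self, max_temp_c: float = MAX_TEMP_C) -> None:
        self.max_temp_c = max_temp_c
        self._next = 0  # index of the next round-robin slot

    def pick(self, backends: Sequence[Backend]) -> Optional[Backend]:
        # Drop backends that are too hot before balancing across the rest.
        healthy = [b for b in backends if b.temp_c <= self.max_temp_c]
        if not healthy:
            return None  # every backend is over the limit: shut traffic off
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice
```

A caller would treat a `None` result as "reject or queue the request" rather than forwarding it to an overheated device.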

Custom Resources (CRDs) and Outputs

The CRDs managed by NuFi Controller and the workloads each produces:

CRD | Output Workload
NpuDeploy | Serving (Inference Server + nufi-proxy + Transformer + Temperature Sidecar + Service + VirtualService)
ModelImport | Model Import Job (MLflow → Registry)
NpuPortingPipeline | NPU Compile Job (per-device compile)
DatasetSourceRevision | Dataset Job (Upload / Import / Promotion)
EvaluationRun | Evaluation Job (lm-eval, etc.)
Note: Lab (Notebook Server, File Manager) is created directly by the API Server, not by the NuFi Controller. The Notebook is provisioned through a Kubeflow Notebook CR, and the File Manager runs as a Pod that the API Server creates directly.
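
The split of responsibilities described above can be summarized as a small lookup. This is only a sketch for orientation: the CRD names come from this page, while the `Notebook` and `FileManager` keys and the `provisioned_by` helper are hypothetical labels introduced for the example.

```python
# CRD kinds reconciled by the NuFi Controller, and the workload each produces.
CONTROLLER_OUTPUTS = {
    "NpuDeploy": "Serving",
    "ModelImport": "Model Import Job",
    "NpuPortingPipeline": "NPU Compile Job",
    "DatasetSourceRevision": "Dataset Job",
    "EvaluationRun": "Evaluation Job",
}

# Lab workloads bypass the controller; the keys here are illustrative labels.
API_SERVER_OUTPUTS = {
    "Notebook": "Kubeflow Notebook CR",
    "FileManager": "Pod",
}


def provisioned_by(kind: str) -> str:
    """Return which component provisions the workload for a given kind."""
    if kind in CONTROLLER_OUTPUTS:
        return "NuFi Controller"
    if kind in API_SERVER_OUTPUTS:
        return "API Server"
    raise KeyError(f"unknown kind: {kind}")
```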

Evaluation Orchestration Tools

The API Server hosts internal tools that automate evaluation and tuning.

  • Optimizer: Runs automated trials repeatedly, creating an EvaluationRun CR for each trial.
  • Benchmark Profile: Defines evaluation metric policies and aggregates results.
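
The "one EvaluationRun CR per trial" pattern could be sketched as below. The API group/version `nufi.example.com/v1alpha1`, the `spec.parameters` field, and the `build_evaluation_runs` helper are placeholders; the real EvaluationRun schema is not described on this page.

```python
from typing import Any, Dict, Iterable, List


def build_evaluation_runs(
    study: str, trials: Iterable[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Build one EvaluationRun manifest (as a dict) per trial."""
    runs = []
    for index, params in enumerate(trials):
        runs.append(
            {
                # Placeholder group/version, not the real NuFi API group.
                "apiVersion": "nufi.example.com/v1alpha1",
                "kind": "EvaluationRun",
                "metadata": {"name": f"{study}-trial-{index}"},
                # Hypothetical field carrying the parameters under test.
                "spec": {"parameters": dict(params)},
            }
        )
    return runs
```

In a real system each manifest would be submitted to the Kubernetes API, and the NuFi Controller would turn it into an Evaluation Job.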

Traffic & Scaling

Supporting components for the Serving runtime.

  • Auto-scaling (KEDA): Scales serving workloads driven by NpuDeploy.
  • Async Queue: Async request buffering for nufi-proxy.
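
To make "async request buffering" concrete, here is a minimal sketch of a bounded buffer. The `AsyncBuffer` class, its capacity, and its reject-when-full policy are assumptions for illustration and may differ from how the Async Queue actually behaves.

```python
import queue
from typing import Any, Callable, List


class AsyncBuffer:
    """Bounded FIFO buffer that accepts requests now and processes them later."""

    def __init__(self, capacity: int) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)

    def submit(self, request: Any) -> bool:
        """Enqueue a request; return False if the buffer is full."""
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            return False

    def drain(self, infer: Callable[[Any], Any]) -> List[Any]:
        """Run every buffered request through `infer`, in arrival order."""
        results = []
        while True:
            try:
                request = self._queue.get_nowait()
            except queue.Empty:
                return results
            results.append(infer(request))
```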

Infrastructure dependencies:

  • Istio / VirtualService: Used for external traffic routing for Serving and Labs. A VirtualService is created per Serving to route inference requests through the endpoint URL.
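
A per-Serving VirtualService might look roughly like the manifest built below. The `networking.istio.io/v1beta1` API and the `hosts`, `gateways`, `http`, `match`, and `route` fields are standard Istio; the URL prefix layout, Service host name, port, and the `virtual_service_for` helper are assumptions, not NuFi's actual conventions.

```python
from typing import Any, Dict


def virtual_service_for(
    serving: str, namespace: str, gateway: str, proxy_port: int = 8080
) -> Dict[str, Any]:
    """Build an Istio VirtualService manifest that routes one Serving's traffic."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": serving, "namespace": namespace},
        "spec": {
            "hosts": ["*"],
            "gateways": [gateway],
            "http": [
                {
                    # Hypothetical endpoint URL layout for a Serving.
                    "match": [
                        {"uri": {"prefix": f"/serving/{namespace}/{serving}/"}}
                    ],
                    "route": [
                        {
                            "destination": {
                                # Assumed Service name; traffic lands on nufi-proxy.
                                "host": f"{serving}.{namespace}.svc.cluster.local",
                                "port": {"number": proxy_port},
                            }
                        }
                    ],
                }
            ],
        },
    }
```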

Request Flow

Model Deployment Flow

When a user creates a Serving in the Dashboard, the NuFi Controller automatically provisions the K8s resources.
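
The controller's role in this flow follows the usual operator pattern: compare the resources a Serving should have with what exists, and create what is missing. The sketch below is a simplified illustration; the resource kinds, the naming scheme, and the `desired_children` and `reconcile` helpers are assumptions rather than the NuFi Controller's real logic.

```python
from typing import Iterable, List, Set, Tuple

Resource = Tuple[str, str]  # (kind, name)


def desired_children(serving: str) -> Set[Resource]:
    """Resources a Serving is expected to own (names are hypothetical)."""
    return {
        ("Pod", f"{serving}-serving"),
        ("Service", serving),
        ("VirtualService", serving),
    }


def reconcile(serving: str, existing: Iterable[Resource]) -> List[Resource]:
    """Return the resources that still need to be created, in sorted order."""
    return sorted(desired_children(serving) - set(existing))
```

Calling `reconcile` repeatedly is safe: once every child exists, it returns an empty list.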

Inference Request Flow

Inference requests to a deployed service pass through nufi-proxy on their way to the inference server.
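
The proxy step, including optional Transformer pre/post-processing, can be sketched as a simple chain of functions. This is an illustration only: the `proxy_request` function and its callable-based interface are hypothetical and stand in for network hops between nufi-proxy, the Transformer, and the inference server.

```python
from typing import Any, Callable, Optional

Step = Callable[[Any], Any]


def proxy_request(
    request: Any,
    infer: Step,
    preprocess: Optional[Step] = None,
    postprocess: Optional[Step] = None,
) -> Any:
    """Forward a request: optional pre-processing, inference, optional post-processing."""
    payload = preprocess(request) if preprocess else request
    result = infer(payload)
    return postprocess(result) if postprocess else result
```

For example, a pre-processing step could normalize input text and a post-processing step could reshape the model output before it is returned to the client.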


Deployment Structure

NuFi is packaged as a single bundle: the Kubernetes cluster and the NuFi system are installed together.

Feature | Description
Single bundle | Cluster + infrastructure + NuFi applications, all included
K8s native | Every component is managed as a K8s resource. Operations, deployment, and observability follow Kubernetes conventions, so you can use existing K8s operational tools (kubectl, Helm, Prometheus, etc.) as is.

Supported Devices

Vendor | Device | Supported Features
NVIDIA | CUDA-capable GPU | Lab, Serving
FuriosaAI | RNGD | Lab, Serving
Note: For per-device support scope and feature differences, see the Custom Device Management documentation.