Architecture
NuFi is an NPUOps platform that runs on top of a Kubernetes cluster. All components are K8s-native.
NuFi System Composition
Main Components
| Component | Role |
|---|---|
| Dashboard | Web UI that users interact with directly |
| API Server | Serves the REST API, creates and manages Custom Resources, and directly creates Lab workloads (Kubeflow Notebook and File Manager Pod) |
| NuFi Controller (K8s Operator) | Watches NuFi CRDs and provisions K8s resources such as Serving Pods and various Jobs |
| nufi-proxy | Sidecar proxy on the inference request path. Client requests pass through a VirtualService to nufi-proxy, which forwards them to the inference server. It handles traffic control for inference requests: load balancing, temperature-based traffic shutoff, Async Queue buffering, and chaining of Transformer pre- and post-processing. |
| nufi-notebook-servers | Jupyter / VS Code / LlamaFactory container images for Labs (development environments) |
| nufi-file-manager | File manager for uploading, downloading, and managing files in a Volume |
Custom Resources (CRDs) and Outputs
The CRDs managed by NuFi Controller and the workloads each produces:
| CRD | Output Workload |
|---|---|
| NpuDeploy | Serving (Inference Server + nufi-proxy + Transformer + Temperature Sidecar + Service + VirtualService) |
| ModelImport | Model Import Job (MLflow → Registry) |
| NpuPortingPipeline | NPU Compile Job (per-device compile) |
Labs (Notebook Server, File Manager) are created directly by the API Server, not by the NuFi Controller. The Notebook Server is provisioned through a Kubeflow Notebook CR, and the File Manager runs as a Pod that the API Server creates directly.
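The split of responsibilities described above can be summarized as a lookup table. The following is a minimal sketch, not NuFi code: the helper `provisioner_for` and the two dictionaries are hypothetical names introduced for illustration. Only the CRD names, the workload descriptions, and the Controller / API Server split come from this section.

```python
# Illustrative only: maps each resource kind named in this section to the
# component that creates its workload. The helper and dict names are hypothetical.

# Workloads produced by the NuFi Controller from its CRDs.
CONTROLLER_OUTPUTS = {
    "NpuDeploy": "Serving",
    "ModelImport": "Model Import Job",
    "NpuPortingPipeline": "NPU Compile Job",
}

# Lab workloads created directly by the API Server.
API_SERVER_OUTPUTS = {
    "Notebook Server": "Kubeflow Notebook CR",
    "File Manager": "Pod",
}


def provisioner_for(kind: str) -> tuple[str, str]:
    """Return (creating component, output workload) for a resource kind."""
    if kind in CONTROLLER_OUTPUTS:
        return ("NuFi Controller", CONTROLLER_OUTPUTS[kind])
    if kind in API_SERVER_OUTPUTS:
        return ("API Server", API_SERVER_OUTPUTS[kind])
    raise KeyError(f"unknown resource kind: {kind}")
```

For example, `provisioner_for("NpuDeploy")` yields the NuFi Controller and a Serving, while `provisioner_for("File Manager")` yields the API Server and a Pod.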
Traffic & Scaling
The Serving runtime relies on the following supporting components:
- Auto-scaling (KEDA): Scales the Serving workloads created from NpuDeploy.
- Async Queue: Buffers asynchronous requests for nufi-proxy.
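To make the scaling behavior concrete, here is a rough sketch of how a replica count could be derived from a backlog. KEDA drives a Kubernetes Horizontal Pod Autoscaler, which for an average-value metric targets roughly `ceil(total / target_per_replica)`. Which metric NuFi actually scales on is not stated here; using Async Queue depth as the trigger, the function name `desired_replicas`, and the default bounds are all assumptions for illustration.

```python
import math


def desired_replicas(queue_depth: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Approximate an HPA-style replica count from a queue backlog.

    Assumption: the scaling metric is the number of buffered requests.
    min_replicas / max_replicas are hypothetical bounds, not NuFi defaults.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # HPA average-value rule: spread the total metric across replicas.
    raw = math.ceil(queue_depth / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))
```

With a target of 10 buffered requests per replica, a backlog of 45 requests would ask for 5 replicas under this sketch.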
Infrastructure dependencies:
- Istio / VirtualService: Used for external traffic routing for Serving and Labs. A VirtualService is created per Serving to route inference requests through the endpoint URL.
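As a rough illustration of the per-Serving routing described above, the sketch below builds an Istio VirtualService manifest as a Python dictionary. The overall shape follows the Istio `networking.istio.io/v1beta1` VirtualService schema, but the function name, the `/serving/<namespace>/<name>/` path scheme, the gateway name, and the port are hypothetical. The resource NuFi actually generates may look different.

```python
def build_virtual_service(serving_name: str, namespace: str,
                          gateway: str = "istio-system/example-gateway",
                          proxy_port: int = 8080) -> dict:
    """Build a VirtualService manifest that routes one Serving's endpoint.

    Assumptions (not taken from NuFi): the URL prefix layout, the gateway
    name, and the port that nufi-proxy listens on.
    """
    prefix = f"/serving/{namespace}/{serving_name}/"
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": serving_name, "namespace": namespace},
        "spec": {
            "hosts": ["*"],
            "gateways": [gateway],
            "http": [
                {
                    # Match requests sent to this Serving's endpoint URL.
                    "match": [{"uri": {"prefix": prefix}}],
                    # Forward them to the Service in front of nufi-proxy.
                    "route": [
                        {
                            "destination": {
                                "host": f"{serving_name}.{namespace}.svc.cluster.local",
                                "port": {"number": proxy_port},
                            }
                        }
                    ],
                }
            ],
        },
    }
```

Serializing the returned dictionary to YAML gives a manifest that could be inspected with standard tools such as kubectl.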
Request Flow
Model Deployment Flow
When a user creates a Serving in the Dashboard, the API Server creates an NpuDeploy Custom Resource, and the NuFi Controller then provisions the corresponding K8s resources automatically.
Inference Request Flow
Inference requests to a deployed service pass through nufi-proxy on their way to the inference server.
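The request path through nufi-proxy can be sketched as a single function. This is a simplified model, not the actual proxy: the name `handle_request`, the 85 °C threshold, and the round-robin policy are assumptions made for illustration. The source states only that the proxy performs load balancing, temperature-based traffic shutoff, and Transformer pre/post-processing chaining. The Async Queue is left out to keep the sketch short.

```python
from typing import Callable, Optional, Sequence

Handler = Callable[[str], str]


def handle_request(payload: str, *, temperature_c: float,
                   backends: Sequence[Handler], request_index: int,
                   max_temperature_c: float = 85.0,
                   pre: Optional[Handler] = None,
                   post: Optional[Handler] = None) -> Optional[str]:
    """Model one inference request passing through the proxy.

    Returns the response, or None when traffic is shut off because the
    device is too hot. Threshold and balancing policy are hypothetical.
    """
    # Temperature-based shutoff: refuse traffic above the threshold.
    if temperature_c >= max_temperature_c:
        return None
    if not backends:
        raise ValueError("no inference backends available")
    # Transformer pre-processing, if configured.
    if pre is not None:
        payload = pre(payload)
    # Load balancing: plain round-robin over the inference servers.
    backend = backends[request_index % len(backends)]
    result = backend(payload)
    # Transformer post-processing, if configured.
    if post is not None:
        result = post(result)
    return result
```

A call with two backends and `request_index=1` would be served by the second backend, while any call with `temperature_c` at or above the threshold returns `None` without reaching an inference server.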
Deployment Structure
NuFi is packaged as a single bundle that installs the Kubernetes cluster and the NuFi system together.
| Feature | Description |
|---|---|
| Single bundle | Cluster + infrastructure + NuFi applications, all included |
| K8s-native | Every component is managed as a K8s resource. Operations, deployment, and observability follow Kubernetes conventions, so existing K8s operational tools (kubectl, Helm, Prometheus, etc.) work unchanged. |
Supported Devices
| Vendor | Device | Supported Features |
|---|---|---|
| NVIDIA | CUDA-capable GPU | Lab, Serving |
| FuriosaAI | RNGD | Lab, Serving |
For per-device support scope and feature differences, see the Custom Device Management documentation.