Architecture
NuFi is an NPUOps platform that runs on top of a Kubernetes cluster. All components are K8s-native.
NuFi System Composition
Main Components
| Component | Role |
|---|---|
| Dashboard | Web UI — the frontend users interact with directly |
| API Server | Handles REST API, creates/manages Custom Resources, directly creates Lab workloads (Kubeflow Notebook / File Manager Pod), and orchestrates evaluation tools such as Optimizer and Benchmark |
| NuFi Controller (K8s Operator) | Watches NuFi CRDs and provisions K8s resources such as Serving Pods and various Jobs |
| nufi-proxy | Sidecar proxy on the inference request path. Client requests pass through a VirtualService to nufi-proxy, which then forwards them to the inference server. Handles traffic control for inference requests, including load balancing, temperature-based traffic shutoff, Async Queue, and Transformer pre/post-processing chaining. |
| nufi-notebook-servers | Jupyter / VS Code / LlamaFactory container images for Labs (development environments) |
| nufi-file-manager | File manager for uploading, downloading, and managing files in a Volume |
Custom Resources (CRDs) and Outputs
The CRDs managed by NuFi Controller and the workloads each produces:
| CRD | Output Workload |
|---|---|
| NpuDeploy | Serving (Inference Server + nufi-proxy + Transformer + Temperature Sidecar + Service + VirtualService) |
| ModelImport | Model Import Job (MLflow → Registry) |
| NpuPortingPipeline | NPU Compile Job (per-device compile) |
| DatasetSourceRevision | Dataset Job (Upload / Import / Promotion) |
| EvaluationRun | Evaluation Job (lm-eval, etc.) |
Labs (Notebook Server and File Manager) are created directly by the API Server, not by the NuFi Controller. The Notebook Server is provisioned through a Kubeflow Notebook CR, and the File Manager runs as a Pod.
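The split of responsibilities described above can be summarized as a small lookup table. This is only an illustrative sketch: the CRD names and workloads come from the table, while the `Provisioning` type, the `provisioned_by` helper, and the `Notebook` / `FileManager` keys are hypothetical names introduced here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provisioning:
    owner: str     # component that creates the workload
    workload: str  # what ends up running in the cluster


# CRD kinds and workloads mirror the table above; the dict layout is illustrative.
PROVISIONING = {
    "NpuDeploy": Provisioning("NuFi Controller", "Serving"),
    "ModelImport": Provisioning("NuFi Controller", "Model Import Job"),
    "NpuPortingPipeline": Provisioning("NuFi Controller", "NPU Compile Job"),
    "DatasetSourceRevision": Provisioning("NuFi Controller", "Dataset Job"),
    "EvaluationRun": Provisioning("NuFi Controller", "Evaluation Job"),
    # Labs bypass the controller: the API Server creates these directly.
    # "Notebook" and "FileManager" are hypothetical lookup keys, not NuFi CRDs.
    "Notebook": Provisioning("API Server", "Kubeflow Notebook CR"),
    "FileManager": Provisioning("API Server", "File Manager Pod"),
}


def provisioned_by(kind: str) -> str:
    """Return the component responsible for creating the given workload kind."""
    return PROVISIONING[kind].owner
```

For example, `provisioned_by("NpuDeploy")` returns `"NuFi Controller"`, while `provisioned_by("FileManager")` returns `"API Server"`.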
Evaluation Orchestration Tools
The API Server hosts internal tools that automate evaluation and tuning.
- Optimizer: Runs automated trials repeatedly, creating an EvaluationRun CR for each trial.
- Benchmark Profile: Defines evaluation metric policies and aggregates results.
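The "one EvaluationRun CR per trial" pattern might look roughly like the sketch below. The source does not describe how the Optimizer chooses trials, so a plain grid enumeration stands in for it here. The API group `nufi.example/v1alpha1`, the `spec.params` field, and the naming scheme are assumptions made for illustration, not the actual NuFi schema.

```python
import itertools


def evaluation_run_manifest(trial_id, params):
    """Build one EvaluationRun manifest for a single trial."""
    return {
        "apiVersion": "nufi.example/v1alpha1",  # hypothetical API group/version
        "kind": "EvaluationRun",
        "metadata": {"name": f"optimizer-trial-{trial_id}"},  # hypothetical naming
        "spec": {"params": params},  # hypothetical spec layout
    }


def plan_trials(search_space):
    """Enumerate every parameter combination and emit one manifest per trial.

    Grid enumeration is a stand-in; the real Optimizer strategy is not
    documented in this section.
    """
    keys = sorted(search_space)
    combos = itertools.product(*(search_space[key] for key in keys))
    return [
        evaluation_run_manifest(trial_id, dict(zip(keys, combo)))
        for trial_id, combo in enumerate(combos)
    ]
```

Calling `plan_trials({"batch_size": [1, 2], "max_tokens": [128]})` yields two manifests, one per trial, each of kind `EvaluationRun`.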
Traffic & Scaling
Supporting components for the Serving runtime.
- Auto-scaling (KEDA): Scales serving workloads driven by NpuDeploy.
- Async Queue: Async request buffering for nufi-proxy.
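As a rough mental model of async request buffering, the sketch below places a bounded queue between request intake and a single handler. It is not the nufi-proxy implementation: the `max_pending` limit, the single-worker design, and the function names are all assumptions chosen to keep the example small.

```python
import asyncio


async def _worker(queue, handle, results):
    """Drain the queue forever, handing each request to the handler."""
    while True:
        request = await queue.get()
        try:
            results.append(await handle(request))
        finally:
            queue.task_done()


async def run_buffered(requests, handle, max_pending=2):
    """Buffer requests in a bounded queue and process them in arrival order."""
    queue = asyncio.Queue(maxsize=max_pending)  # hypothetical buffer size
    results = []
    worker = asyncio.create_task(_worker(queue, handle, results))
    for request in requests:
        # put() waits while the buffer is full, which applies back-pressure.
        await queue.put(request)
    await queue.join()  # wait until every buffered request has been handled
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return results
```

A bounded queue is used so that a slow inference server slows down intake rather than letting the buffer grow without limit.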
Infrastructure dependencies:
- Istio / VirtualService: Used for external traffic routing for Serving and Labs. A VirtualService is created per Serving to route inference requests through the endpoint URL.
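A per-Serving VirtualService could be shaped roughly as follows. The top-level fields (`hosts`, `gateways`, `http`, `match`, `route`, `destination`) are standard Istio VirtualService fields, but the URL prefix, the port, the gateway reference, and the helper name are assumptions. The manifests NuFi actually generates may differ.

```python
def virtual_service_for(serving_name, namespace, gateway, proxy_port=8080):
    """Build an Istio VirtualService that routes one Serving's endpoint URL."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": serving_name, "namespace": namespace},
        "spec": {
            "hosts": ["*"],
            "gateways": [gateway],
            "http": [
                {
                    # Hypothetical endpoint URL scheme for the Serving.
                    "match": [
                        {"uri": {"prefix": f"/serving/{namespace}/{serving_name}/"}}
                    ],
                    "route": [
                        {
                            "destination": {
                                # The Service created alongside the Serving.
                                "host": f"{serving_name}.{namespace}.svc.cluster.local",
                                "port": {"number": proxy_port},  # hypothetical port
                            }
                        }
                    ],
                }
            ],
        },
    }
```

For instance, `virtual_service_for("llama", "team-a", "istio-system/nufi-gateway")` produces a manifest whose single HTTP route targets `llama.team-a.svc.cluster.local`.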
Request Flow
Model Deployment Flow
When a user creates a Serving in the Dashboard, the NuFi Controller automatically provisions the K8s resources.
Inference Request Flow
Inference requests to a deployed service pass through nufi-proxy on their way to the inference server.
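To make the traffic-control role concrete, here is a minimal sketch of how a proxy might combine load balancing with temperature-based traffic shutoff. The round-robin strategy, the 85 °C threshold, and the `Backend` / `ProxySketch` names are assumptions. The source does not specify how nufi-proxy implements either feature.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    temperature_c: float  # latest reading, e.g. from a temperature sidecar


class ProxySketch:
    """Round-robin over backends, skipping any that are too hot."""

    def __init__(self, backends, max_temperature_c=85.0):
        self.backends = list(backends)
        self.max_temperature_c = max_temperature_c  # hypothetical threshold
        self._next = 0

    def pick(self):
        """Return the next backend below the threshold, or None if all are shut off."""
        total = len(self.backends)
        for offset in range(total):
            candidate = self.backends[(self._next + offset) % total]
            if candidate.temperature_c < self.max_temperature_c:
                # Resume the rotation just after the backend we chose.
                self._next = (self._next + offset + 1) % total
                return candidate
        return None
```

Returning `None` when every backend exceeds the threshold models a full traffic shutoff. A real proxy would translate that into an error response or hand the request to a queue.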
Deployment Structure
NuFi is packaged as a single bundle that installs the Kubernetes cluster and the NuFi system together.
| Feature | Description |
|---|---|
| Single bundle | Cluster + infrastructure + NuFi applications, all included |
| K8s native | Every component is managed as a K8s resource. Operations, deployment, and observability follow Kubernetes conventions, so you can use existing K8s operational tools (kubectl, Helm, Prometheus, etc.) as is. |
Supported Devices
| Vendor | Device | Supported Features |
|---|---|---|
| NVIDIA | CUDA-capable GPU | Lab, Serving |
| FuriosaAI | RNGD | Lab, Serving |
For per-device support scope and feature differences, see the Custom Device Management documentation.