Infrastructure
Purpose-built for AI workloads.
Deep learning requires an environment where network, storage, and compute operate without friction. We engineered every layer for this.
Un-virtualized GPUs
When you use GPUs on standard cloud platforms, a hypervisor sits between your code and the hardware. Nubis connects your AI workloads directly to the GPU bus — bare-metal access that can cut training times by up to 40%.
Non-blocking network fabric
Moving terabytes of training data can create massive bottlenecks. Our 40 Gbps internal network fabric guarantees that storage and GPU instances communicate at line rate — the network is never the bottleneck.
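As a back-of-envelope illustration of what line rate means in practice (the 40 Gbps figure is from above; the dataset size is an assumed example, not a benchmark), moving a terabyte takes only a few minutes:

```python
# Back-of-envelope: time to move a dataset over a 40 Gbps fabric.
# The dataset size is an illustrative assumption.
LINE_RATE_GBPS = 40                 # fabric line rate, gigabits per second
dataset_bytes = 1 * 10**12          # 1 TB (decimal), assumed example size

bits = dataset_bytes * 8
seconds = bits / (LINE_RATE_GBPS * 10**9)
print(f"1 TB at {LINE_RATE_GBPS} Gbps = {seconds:.0f} s ({seconds / 60:.1f} min)")
# → 1 TB at 40 Gbps = 200 s (3.3 min)
```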
High-throughput object storage
S3-compatible object storage, physically co-located with our compute nodes. Stream enormous datasets into memory during training without incurring the egress fees that global providers charge per gigabyte.
Low-latency edge inference
Deploy trained weights to our lightweight edge instances. Localized speech recognition or live video analysis can return responses to mobile devices across Africa in under 20 milliseconds.
On-demand GPU burst
Only need GPUs for training runs? Provision bare-metal GPU instances in under 90 seconds and destroy them when done. Pay per hour — no 1-year commitments required.
Distributed training support
Scale across multiple GPU nodes with our high-bandwidth mesh fabric. NCCL collective communication operates at near-theoretical bandwidth with single-digit microsecond switch latency.
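One building block of multi-node training is giving each rank a disjoint shard of the dataset. A minimal sketch of the strided split that PyTorch's DistributedSampler uses (the function name is ours, for illustration; real samplers also pad so every rank gets an equal-length shard):

```python
# Strided per-rank sharding: rank r takes indices
# r, r + world_size, r + 2*world_size, ...
# Function name is illustrative, not a Nubis or PyTorch API.
def shard_indices(rank: int, world_size: int, dataset_len: int) -> list[int]:
    return list(range(rank, dataset_len, world_size))

# Example: 10 samples split across 4 GPU nodes
for r in range(4):
    print(r, shard_indices(r, 4, 10))
# rank 0 → [0, 4, 8], rank 1 → [1, 5, 9],
# rank 2 → [2, 6],    rank 3 → [3, 7]
```

Every sample lands on exactly one rank, so gradients averaged over ranks cover the whole dataset once per epoch.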
Architecture
Direct-to-GPU pipeline
A high-throughput topology designed to prevent data starvation during distributed machine learning training runs.
S3-Compatible Object Storage
01 Co-located petabyte-scale training dataset lakes. Zero egress charges between storage and compute nodes.
40 Gbps non-blocking switch
02 Dedicated physical fiber linking storage to compute. Bisection bandwidth guaranteed at full line rate.
PCIe Gen 5 bus
03 Hardware-level interface bypassing virtual drivers. No hypervisor interrupt overhead between CPU and GPU memory.
Bare-metal NVIDIA GPUs
04 Unthrottled tensor cores running at 100% capacity. CUDA directly on metal — no driver abstraction layers.
Example
# PyTorch High-Throughput DataLoader
# Utilizing Nubis local object storage
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Direct internal S3 endpoint (0 egress fees)
NUBIS_S3_ENDPOINT = "http://storage.internal.nubis:9000"

# Stream dataset directly to GPU memory.
# Note: stock ImageFolder expects a filesystem path, so an
# s3:// root assumes the bucket is mounted (e.g. via an
# S3 FUSE layer) or wrapped in an S3-aware dataset.
train_dataset = datasets.ImageFolder(
    root="s3://training-data/imagenet_1k",
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.ToTensor(),
    ]),
)

# Leverage max workers on bare-metal CPUs
loader = DataLoader(
    train_dataset,
    batch_size=512,
    shuffle=True,
    num_workers=32,   # Bare-metal thread count
    pin_memory=True,  # Accelerate host-to-device transfer
)
Use cases
AI shipping on African infrastructure.
LLM fine-tuning
Fine-tune large language models on proprietary African-language datasets without data ever leaving the continent. Full GDPR and local data-residency compliance.
Real-time fraud detection
Deploy ML inference endpoints inside the same Nubis VPC as your payment processing systems. Sub-5ms model calls — fast enough to score transactions before they settle.
Computer vision at scale
Process live video feeds from smart city infrastructure, agricultural monitoring, or quality inspection lines — with inference happening on-continent, not round-tripping to Europe.
Deep dive
The training data bottleneck.
Why AI is different. Traditional web applications retrieve small amounts of data and spend most of their time waiting for the user. AI training is the opposite — a model constantly pulls millions of images or text files into GPU memory as fast as the network allows.
The standard cloud problem. Your GPU might be physically located in one building while your data is stored in another. That distance, plus standard networking protocols, creates a "data starvation" effect — your expensive GPU sits idle waiting for images to arrive.
The Nubis advantage. We deliberately built our object storage to sit beside our compute racks, tied together with an ultra-high-speed fiber network. Data streams directly into the GPU pipeline. For an enterprise, this means fewer billed hours for a GPU running below full utilization.
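The cost of starvation is easy to make concrete. With prefetching, each training step takes as long as the slower of data delivery and GPU compute, and the GPU idles for the difference (the two timings below are assumed example values, not measurements):

```python
# Illustrative data-starvation arithmetic; the timings are
# assumed example values, not Nubis benchmarks.
data_time_s = 0.30     # time to fetch one batch over a slow link
compute_time_s = 0.10  # time the GPU needs to process that batch

# With prefetching, step time is bounded by the slower stage
step_time_s = max(data_time_s, compute_time_s)
idle_fraction = 1 - compute_time_s / step_time_s

print(f"GPU idle {idle_fraction:.0%} of every step")
# → GPU idle 67% of every step
```

Flip the numbers — data arriving faster than the GPU can consume it — and the idle fraction drops to zero, which is the point of co-locating storage with compute.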

