Spheron GPU Catalog

Rent NVIDIA B300 GPUs on Demand from $2.45/hr

288GB HBM3e Blackwell Ultra with 15 PFLOPS dense FP4, built for trillion-parameter training.

At a glance

You can rent an NVIDIA B300 Blackwell Ultra GPU on Spheron starting at $2.45 per GPU per hour on dedicated capacity (99.99% SLA, non-interruptible), with spot pricing cheaper still. Billing is per minute with no long-term contracts, and B300 instances deploy as part of GB300 NVL72 rack systems or HGX B300 8-way nodes. Each GPU ships with 288GB HBM3e (50% more than B200's 192GB), NVLink 5 at 1.8 TB/s, 5th-generation Tensor Cores with an enhanced FP4 Transformer Engine, and substantially higher throughput than B200 across every precision format. It's built for 200B+ parameter training, ultra-long-context inference (1M+ tokens), trillion-parameter-scale MoE models, and multi-modal foundation models. B300 is the pick when B200's 192GB isn't enough.
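A quick way to sanity-check whether a model fits in 288GB is the common bytes-per-parameter rule of thumb (FP8 ≈ 1 byte, FP4 ≈ 0.5 bytes per weight). The sketch below uses that rule plus an assumed ~20% allowance for KV cache and activations; both are generic heuristics, not Spheron-specific figures.

```python
import math

def weights_gib(params_b: float, bits: int) -> float:
    """Weights-only footprint in GiB for a dense model.
    params_b: parameter count in billions; bits: precision width."""
    return params_b * 1e9 * bits / 8 / 2**30

def min_gpus(params_b: float, bits: int, vram_gib: float = 288.0,
             overhead: float = 1.2) -> int:
    """GPUs needed to hold the weights, with an assumed ~20%
    allowance for KV cache and activations (a rough heuristic)."""
    return math.ceil(weights_gib(params_b, bits) * overhead / vram_gib)

# A 200B dense model in FP8 is ~186 GiB of weights: it fits on one B300
print(round(weights_gib(200, 8), 1), min_gpus(200, 8))
```

By this estimate a 200B model served in FP8 fits on a single B300 with room to spare, while the same model in FP16 would need the whole node's memory pool.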

GPU Architecture: NVIDIA Blackwell Ultra
VRAM: 288 GB HBM3e
Memory Bandwidth: 8.0 TB/s

Technical specifications

GPU Architecture: NVIDIA Blackwell Ultra
VRAM: 288 GB HBM3e
Memory Bandwidth: 8.0 TB/s
Tensor Cores: 5th Generation (Ultra)
CUDA Cores: 20,480+
FP64 Performance: 60 TFLOPS
FP32 Performance: 120 TFLOPS
TF32 Performance: 3,000 TFLOPS
FP8 Tensor (dense): 7,500 TFLOPS
FP4 Tensor (dense): 15,000 TFLOPS
System RAM: 184 GB DDR5
vCPUs: 32
Storage: 250 GB NVMe Gen5
Network: NVLink 1.8 TB/s
TDP: 1200W

Pricing comparison

Provider       Price/hr       Savings
Spheron        $2.45/hr       -
Nebius         $6.10/hr       2.5x more expensive
CoreWeave      Contact sales  -
AWS (p6-b300)  $17.80/hr      7.3x more expensive
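To see what those multiples mean for a real job, here's a quick cost sketch using the listed on-demand rates (spot, reserved, and negotiated rates will differ):

```python
# Listed on-demand rates (USD per GPU-hour) from the comparison above
rates = {"Spheron": 2.45, "Nebius": 6.10, "AWS p6-b300": 17.80}

def job_cost(provider: str, gpus: int, hours: float) -> float:
    """Total cost of a fixed-size job at a provider's hourly rate."""
    return rates[provider] * gpus * hours

# One week on an 8-way HGX B300 node (8 GPUs x 168 hours)
for p in rates:
    print(f"{p}: ${job_cost(p, 8, 168):,.2f}")
```

A week-long 8-GPU run comes to roughly $3,293 on Spheron versus about $23,923 at the AWS list rate, the same 7.3x gap as the hourly comparison.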
Custom & Reserved

Need More B300 Than What's Listed?

Reserved Capacity

Commit to a duration to lock in availability and better rates

Custom Clusters

8 to 512+ GPUs, specific hardware, InfiniBand configs on request

Supplier Matchmaking

Spheron sources from its certified data center network, negotiates pricing, handles setup

Need more B300 capacity? Tell us your requirements and we'll source it from our certified data center network.

Typical turnaround: 24–48 hours

When to pick the B300

Scenario 01

Pick B300 if

You're training or serving 200B+ parameter models and B200's 192GB HBM3e isn't enough. 288GB lets you fit larger dense models on a single GPU, keep longer context windows (1M+ tokens), or reduce tensor-parallel splits on fixed model sizes. Also the pick for GB300 NVL72 rack-scale deployments where all 72 GPUs address unified memory.

Recommended fit
Scenario 02

Pick B200 instead if

Your model fits comfortably in 192GB and you want the cheapest Blackwell rate. B200 is widely available, cheaper per hour, and matches B300 on FP4 Transformer Engine capability. Best for most 70B-200B workloads.

Recommended fit
Scenario 03

Pick H200 instead if

You don't need Blackwell FP4 and want proven Hopper with 141GB HBM3e. H200 is significantly cheaper per hour and has been production-hardened for over a year, a safer pick when Blackwell software tuning isn't worth the premium.

Recommended fit
Scenario 04

Pick GB300 NVL72 instead if

You need rack-scale training for trillion-parameter frontier models. GB300 NVL72 connects 72 B300 GPUs over NVLink into a unified 20+ TB memory domain — the only architecture that handles models too large for any single 8-way node.

Recommended fit
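Scenario 01's point about reducing tensor-parallel splits can be made concrete. The sketch below picks the smallest power-of-two TP degree whose weight shard fits per GPU, with an assumed ~15% headroom for activations and workspace (a heuristic, not a vendor figure):

```python
def tp_degree(params_b: float, bits: int, vram_gib: float,
              reserve: float = 0.85) -> int:
    """Smallest power-of-two tensor-parallel split whose per-GPU
    weight shard fits in vram_gib, keeping ~15% assumed headroom
    for activations and workspace."""
    need_gib = params_b * 1e9 * bits / 8 / 2**30
    tp = 1
    while need_gib / tp > vram_gib * reserve:
        tp *= 2
    return tp

# 405B weights in FP8: B200 (192 GB) vs B300 (288 GB)
print(tp_degree(405, 8, 192), tp_degree(405, 8, 288))
```

By this estimate a 405B FP8 model needs TP=4 on B200 but only TP=2 on B300, halving the intra-layer communication per forward pass.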

Ideal use cases

Use case / 01
🌐

Frontier Model Training

Train the most advanced frontier AI models at scale with 288GB memory per GPU and class-leading memory bandwidth. Handle the largest MoE and dense transformer architectures without memory constraints.

Frontier-scale MoE models with 10T+ parameters
Multi-modal foundation models (text, image, video, audio, 3D)
Scientific AI for drug discovery and protein folding
Sparse-attention and long-context transformers (1M+ tokens)
Use case / 02
💬

Ultra-High-Throughput LLM

Serve the world's largest language models at production scale with massive memory capacity and superior compute density, minimizing cost per token across all precision formats.

Real-time inference for 200B+ parameter LLMs
Ultra-long context RAG pipelines (1M+ token windows)
Multi-turn agentic AI with reasoning and tool use
Speculative decoding pipelines at scale
Use case / 03

Generative AI & Creative Workloads

Power next-generation generative AI with massive VRAM headroom for high-resolution video, 3D, and complex multi-modal generation pipelines, all within a single GPU.

Cinematic 4K/8K video generation at real-time speeds
High-fidelity 3D world and asset generation
Full-context multi-modal document understanding
Enterprise-grade code generation and agentic programming
Use case / 04
🔬

AI Research & Architecture Exploration

Give researchers the memory and compute needed to explore novel architectures, scaling laws, and experimental approaches without hardware bottlenecks.

Novel neural architecture search at scale
Multi-agent and emergent-behavior RL research
In-context learning (ICL) at 1M+ token lengths
Brain-scale and physics simulation workloads
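The 1M+ token claims above are a memory story as much as a compute one: KV cache grows linearly with context length. A sketch of the standard sizing formula (2 x layers x KV heads x head_dim x bytes per token), using a Llama-3-70B-like shape (80 layers, 8 GQA KV heads, head_dim 128) as an assumed example:

```python
def kv_cache_gib(tokens: int, layers: int, kv_heads: int,
                 head_dim: int, bytes_per: int = 2) -> float:
    """KV cache footprint in GiB: one K and one V vector
    per layer per KV head per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per * tokens / 2**30

# Llama-3-70B-like shape at a 1M-token context, FP16 KV cache
print(round(kv_cache_gib(1_000_000, 80, 8, 128), 1))
```

At FP16, a single 1M-token sequence needs roughly 305 GiB of KV cache alone, more than any single GPU holds; an FP8 KV cache halves that, and 288GB per GPU reduces how many ways the cache must be sharded.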

Performance benchmarks

LLM Pre-training (100B): 3.3x faster vs H100 SXM5
LLM Inference Throughput: 24,000 tokens/s (Llama-3 70B FP8)
MoE Training Efficiency: 4.1x faster vs H100 SXM5
Multi-Modal Training: 3.5x faster vs H100 SXM5
Stable Diffusion XL: 5.2x faster (1024×1024 generation)
Memory Capacity: 3.6x larger vs H100 80GB
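Throughput translates directly into serving cost. Combining the 24,000 tokens/s Llama-3 70B FP8 benchmark with the $2.45/hr rate gives a rough cost-per-token figure (a single-GPU estimate that ignores batching overheads and utilization gaps):

```python
def usd_per_million_tokens(tokens_per_s: float, usd_per_hr: float) -> float:
    """Serving cost per million tokens at a given sustained
    throughput and hourly GPU rate."""
    tokens_per_hr = tokens_per_s * 3600
    return usd_per_hr / tokens_per_hr * 1e6

# 24,000 tokens/s at $2.45/hr
print(round(usd_per_million_tokens(24_000, 2.45), 4))
```

That works out to under three cents per million tokens at full utilization.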

Train a 400B+ MoE model on 8x B300 HGX

288GB per GPU on an 8-way HGX B300 node gives you 2.3TB of HBM3e across NVLink, enough to train a 400B+ MoE or pre-train a large dense model with aggressive batch sizes.
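To see why the pooled 2.3TB matters, a rough byte budget helps. The 16 bytes/param figure below is a commonly cited mixed-precision Adam rule of thumb (fp16 weights + grads + fp32 master copy + two optimizer moments), not a Spheron number:

```python
GPUS, VRAM_GB = 8, 288
pool_gb = GPUS * VRAM_GB   # 2304 GB of HBM3e across the node
params_b = 400             # total MoE parameter count, in billions

# HBM bytes available per parameter if all state stays on-GPU.
# Classic mixed-precision Adam wants ~16 bytes/param, so a 400B
# run at this scale leans on FP8 states, sharded or offloaded
# optimizer state, and MoE activating only a few experts per token.
print(pool_gb, round(pool_gb * 1e9 / (params_b * 1e9), 2))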

```bash
# SSH into your HGX B300 node
ssh ubuntu@<instance-ip>

# NVIDIA NeMo Framework ships Blackwell-optimized containers
docker run --gpus all --rm -it \
  nvcr.io/nvidia/nemo:25.04 bash

# Inside the container, launch FP8 pre-training with
# tensor and pipeline parallelism (4-way TP x 2-way PP = 8 GPUs)
torchrun --nproc_per_node=8 \
  examples/nlp/language_modeling/megatron_gpt_pretraining.py \
  model.mcore_gpt=True \
  model.transformer_engine=True \
  model.fp8=hybrid \
  model.tensor_model_parallel_size=4 \
  model.pipeline_model_parallel_size=2 \
  trainer.devices=8
```

For FP4 pre-training, pass model.fp4=True (requires Transformer Engine 2.0+ and Blackwell kernels). FP4 roughly doubles effective throughput vs FP8 on compatible layers.

Interconnect fabric

NVLink Ultra Configuration

B300 GPUs are built on NVLink Ultra technology, delivering 1.8 TB/s bidirectional bandwidth per GPU. Combined with 288GB of HBM3e memory per card, B300 clusters enable near-linear scaling for the most data-intensive distributed training workloads, including trillion-parameter models with long-context requirements.

01 NVLink 5.0 Ultra with 1.8 TB/s per-GPU bandwidth
02 18x bandwidth improvement over PCIe Gen5
03 Full NVSwitch connectivity across 8-GPU systems
04 Unified memory addressing across all GPUs in a node
05 Direct GPU-to-GPU communication bypassing the CPU
06 NVIDIA SHARP support for in-network computing
07 Optimized for DeepSpeed ZeRO-3, FSDP, and Megatron
08 Sub-100ns GPU-to-GPU latency within a node
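As an idealized lower bound on gradient synchronization over this fabric, the standard ring all-reduce cost model has each GPU move 2(N-1)/N of the buffer. The sketch treats the 1.8 TB/s figure as usable per-GPU bandwidth (a simplification, since that figure is the bidirectional aggregate) and ignores latency and kernel overhead:

```python
def ring_allreduce_s(size_gb: float, n_gpus: int, bw_gbps: float) -> float:
    """Idealized ring all-reduce time: each GPU sends and receives
    2*(N-1)/N of the buffer at the given per-GPU bandwidth.
    Ignores latency and launch overhead, so it is a lower bound."""
    return 2 * (n_gpus - 1) / n_gpus * size_gb / bw_gbps

# 70B model's FP8 gradients (~70 GB) across 8 GPUs at 1.8 TB/s
print(round(ring_allreduce_s(70, 8, 1800) * 1000, 2), "ms")
```

Roughly 68 ms per full-gradient all-reduce at the ideal bound, which is why per-step compute can be overlapped with communication at this scale.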
Scale

Need a custom multi-node cluster or reserved capacity?

B300 vs alternatives

Related resources

FAQ / 10

Frequently asked questions

Also consider