Spheron GPU Catalog

NVIDIA B300 GPU: 288GB Blackwell Ultra Specs, Pricing & Rental. Rent B300 GPU from $3.32/hr

Name: NVIDIA B300 GPU Rental
Brand: NVIDIA
Availability: InStock

288GB HBM3e Blackwell Ultra with 15 PFLOPS dense FP4. B300 GPU rentals built for trillion-parameter training.

At a glance

You can rent an NVIDIA B300 Blackwell Ultra GPU on Spheron starting at $3.32/hr per GPU per hour, the lowest live marketplace rate. Per-minute billing, no long-term contracts, and B300 instances deploy as part of GB300 NVL72 rack systems or HGX B300 8-way nodes. Each GPU ships with 288GB HBM3e (50% more than B200), NVLink 5 @ 1.8 TB/s, 5th gen Tensor Cores with an enhanced FP4 Transformer Engine, and dramatically higher throughput than B200 across every precision format. Built for 200B+ parameter training, ultra-long-context inference (1M+ tokens), MoE models at trillion-parameter scale, and multi-modal foundation models. B300 is the pick when B200's 192GB isn't enough.

GPU ArchitectureNVIDIA Blackwell Ultra

VRAM288 GB HBM3e

Memory Bandwidth8.0 TB/s

NVIDIA B300 specifications

GPU Architecture

NVIDIA Blackwell Ultra

VRAM

288 GB HBM3e

Memory Bandwidth

8.0 TB/s

Tensor Cores

5th Generation (Ultra)

CUDA Cores

20,480+

FP64 Performance

60 TFLOPS

FP32 Performance

120 TFLOPS

TF32 Performance

3,000 TFLOPS

FP8 Tensor (dense)

7,500 TFLOPS

FP4 Tensor (dense)

15,000 TFLOPS

System RAM

184 GB DDR5

vCPUs

32 vCPUs

Storage

250 GB NVMe Gen5

Network

NVLink 1.8 TB/s

TDP

1200W

NVIDIA B300 pricing

Provider	Price/hr	Savings
SpheronYour price	$3.32/hr	-
Nebius	$6.10/hr	1.8x more expensive
CoreWeave	Contact sales	-
AWS (p6-b300)	$17.80/hr	5.4x more expensive

Custom & Reserved

Need More B300 Than What's Listed?

Large B300 clusters, custom configs, or guaranteed long-term capacity.

Reserved Capacity

Commit to a duration, lock in availability and better rates

Custom Clusters

8 to 512+ GPUs, specific hardware, InfiniBand configs on request

Supplier Matchmaking

Spheron sources from its certified data center network, negotiates pricing, handles setup

Need more B300 capacity? Tell us your requirements and we'll source it from our certified data center network.

Typical turnaround: 24–48 hours

When to pick the B300

Scenario 01

Pick B300 if

You're training or serving 200B+ parameter models and B200's 192GB HBM3e isn't enough. 288GB lets you fit larger dense models on a single GPU, keep longer context windows (1M+ tokens), or reduce tensor-parallel splits on fixed model sizes. Also the pick for GB300 NVL72 rack-scale deployments where all 72 GPUs address unified memory.

Recommended fit

Scenario 02

Pick B200 instead if

Your model fits comfortably in 192GB and you want the cheapest Blackwell rate. B200 is widely available, cheaper per hour, and matches B300 on FP4 Transformer Engine capability. Best for most 70B-200B workloads.

Recommended fit

Scenario 03

Pick H200 instead if

You don't need Blackwell FP4 and want proven Hopper with 141GB HBM3e. H200 is significantly cheaper per hour and has been production-hardened for over a year, a safer pick when Blackwell software tuning isn't worth the premium.

Recommended fit

Scenario 04

Pick GB300 NVL72 instead if

You need rack-scale training for trillion-parameter frontier models. GB300 NVL72 connects 72 B300 GPUs over NVLink into a unified 20+ TB memory domain, the only architecture that handles models too large for any single 8-way node.

Recommended fit

NVIDIA B300 use cases

Use case / 01

Optimized

🌐

Frontier Model Training

Train the most advanced frontier AI models at scale with 288GB memory per GPU and class-leading memory bandwidth. Handle the largest MoE and dense transformer architectures without memory constraints.

Frontier-scale MoE models with 10T+ parametersMulti-modal foundation models (text, image, video, audio, 3D)Scientific AI for drug discovery and protein foldingSparse-attention and long-context transformers (1M+ tokens)

Use case / 02

Optimized

💬

Ultra-High-Throughput LLM

Serve the world's largest language models at production scale with massive memory capacity and superior compute density, minimizing cost per token across all precision formats.

Real-time inference for 200B+ parameter LLMsUltra-long context RAG pipelines (1M+ token windows)Multi-turn agentic AI with reasoning and tool useSpeculative decoding pipelines at scale

Use case / 03

Optimized

✨

Generative AI & Creative Workloads

Power next-generation generative AI with massive VRAM headroom for high-resolution video, 3D, and complex multi-modal generation pipelines all within a single GPU.

Cinematic 4K/8K video generation at real-time speedsHigh-fidelity 3D world and asset generationFull-context multi-modal document understandingEnterprise-grade code generation and agentic programming

Use case / 04

Optimized

🔬

AI Research & Architecture Exploration

Give researchers the memory and compute needed to explore novel architectures, scaling laws, and experimental approaches without hardware bottlenecks.

Novel neural architecture search at scaleMulti-agent and emergent-behavior RL researchIn-context learning and ICL at 1M+ token lengthsBrain-scale and physics simulation workloads

NVIDIA B300 benchmarks

LLM Pre-training (100B)

3.3x faster

vs H100 SXM5

LLM Inference Throughput

24,000 tokens/s

Llama-3 70B FP8

MoE Training Efficiency

4.1x faster

vs H100 SXM5

Multi-Modal Training

3.5x faster

vs H100 SXM5

Stable Diffusion XL

5.2x faster

1024×1024 generation

Memory Capacity

3.6x larger

vs H100 80GB

Train a 400B+ MoE model on 8x B300 HGX

288GB per GPU on an 8-way HGX B300 node gives you 2.3TB of HBM3e across NVLink, enough to train a 400B+ MoE or pre-train a large dense model with aggressive batch sizes.

bash

Spheron

01# SSH into your HGX B300 node02ssh ubuntu@<instance-ip>03 04# NVIDIA NeMo Framework ships Blackwell-optimized containers05docker run --gpus all --rm -it \06  nvcr.io/nvidia/nemo:25.04 bash07 08# Inside container, launch FP8 pre-training with FSDP09torchrun --nproc_per_node=8 \10  examples/nlp/language_modeling/megatron_gpt_pretraining.py \11  model.mcore_gpt=True \12  model.transformer_engine=True \13  model.fp8=hybrid \14  model.tensor_model_parallel_size=4 \15  model.pipeline_model_parallel_size=2 \16  trainer.devices=8

For FP4 pre-training, pass model.fp4=True (requires Transformer Engine 2.0+ and Blackwell kernels). FP4 roughly doubles effective throughput vs FP8 on compatible layers.

Interconnect fabric

NVLink Ultra Configuration

B300 GPUs are built on NVLink Ultra technology, delivering 1.8 TB/s bidirectional bandwidth per GPU. Combined with 288GB of HBM3e memory per card, B300 clusters enable near-linear scaling for the most data-intensive distributed training workloads, including trillion-parameter models with long-context requirements.

01NVLink 5.0 Ultra with 1.8 TB/s per GPU bandwidth

0218x bandwidth improvement over PCIe Gen5

03Full NVSwitch connectivity across 8-GPU systems

04Unified memory addressing across all GPUs in node

05Direct GPU-to-GPU communication bypassing CPU

06NVIDIA SHARP support for in-network computing

07Optimized for DeepSpeed ZeRO-3, FSDP, and Megatron

08Sub-100ns GPU-to-GPU latency within node

Scale

Need a custom multi-node cluster or reserved capacity? Talk to us about topology, regions, and committed pricing.

B300 vs alternatives

B300 vs AMD MI400

CDNA 5 vs Blackwell Ultra architecture, LLM inference projections, ROCm vs CUDA maturity, and GPU cloud pricing for teams weighing AMD's MI400 series as an alternative to B300.

B300 vs Rubin and Hopper generations

Where B300 fits in NVIDIA's generational stack, how Blackwell Ultra compares to Hopper, and what changes with Rubin on the horizon. Useful context before committing to multi-year infrastructure.

NVIDIA B300 guides and resources

01Read

NVIDIA B300 (Blackwell Ultra): Complete Guide to Specs and Pricing

Everything you need to know about B300 specs, pricing, architecture, and when the upgrade from B200 is worth it.

02Read

NVIDIA Vera Rubin NVL72: Rack-Scale H300 System Specs and Cloud Timing

The next-gen successor to GB200 NVL72. 72 R100 (H300) GPUs, 260 TB/s NVLink fabric. Plan your B300 to Rubin upgrade path.

03Read

GPU Requirements Cheat Sheet 2026

Find the right GPU for every major open-source AI model, includes B300-class workload recommendations.

04Read

GPU Cloud Benchmarks 2026

Real performance and pricing data across every major GPU cloud provider, including next-gen Blackwell GPUs.

01Technical Brief

NVIDIA B300 Release Date and Cloud Availability

The NVIDIA B300 Blackwell Ultra GPU was announced at GTC 2025 as the memory-upgraded refresh of the B200. Production shipments began in mid-2025 inside GB300 NVL72 rack-scale systems. CoreWeave was first to general availability on GB300 NVL72 in August 2025; Nebius, AWS p6-b300, Microsoft Azure ND GB300 v6, and Google Cloud A4X Max followed through Q4 2025 and H1 2026. The B300 is still in early-availability rollout across most providers as of mid-2026.

On Spheron, B300 capacity is sourced from data center partners with priority given to sustained training commitments. Live availability and reservation pricing is on the pricing page or via the contact form. The B300's successor is the NVIDIA Rubin R100 (288GB HBM4 at up to 22 TB/s), which is expected to start shipping in H2 2026 and reach broad cloud availability in 2027.

02Technical Brief

B300 VRAM and Memory Bandwidth: 288GB HBM3e at 8 TB/s

The B300 ships with 288GB of HBM3e memory at 8 TB/s of bandwidth, making it the highest-VRAM single GPU available outside the upcoming Rubin generation. That is 50% more VRAM than the B200 (192GB HBM3e), 2x the VRAM of the H200 (141GB HBM3e), and 3.6x the VRAM of the H100 (80GB HBM3). The bandwidth matches the B200 at 8 TB/s, so the throughput advantage of B300 over B200 is purely about fitting larger models or longer contexts in a single GPU rather than serving each token faster.

Where the 288GB VRAM matters: 200B+ parameter dense models fit in FP8 on a single GPU without tensor parallelism overhead, trillion-parameter MoE models fit across a single 8-GPU HGX B300 node with ~2.3TB of pooled HBM3e, and 1M+ token context windows fit without KV cache eviction. The B300 includes the second-generation Transformer Engine with native FP4 support, delivering roughly 15 PFLOPS of FP4 dense compute. For workloads that fit comfortably in 192GB, the B200 at a lower hourly rate is the better economic fit; B300 is the right pick when single-GPU VRAM is the binding constraint.

FAQ / 10

NVIDIA B300 FAQ

What is the NVIDIA B300 and how does it differ from the B200?

Is the B300 available now on Spheron?

When does 288GB of VRAM matter vs a B200?

Can I use B300 for inference-only workloads?

What frameworks are supported on B300?

How does B300 compare to renting multiple H100s?

What is the cost to buy a B300 vs renting on Spheron?

B300 GPUs list in the $40,000-$50,000 range per card, and an 8-way HGX B300 node with networking, cooling, and chassis runs $400K-$600K fully provisioned. At Spheron's on-demand rate, you'd need well over a year of 24/7 utilization to break even on hardware acquisition alone, before counting power, rack space, or depreciation. For all but the largest continuous training commitments, on-demand rental wins on total cost of ownership.

Do you offer reserved or dedicated B300 capacity?

What makes Spheron's B300 offering different from public clouds?

What's the difference between dedicated and spot B300 instances?

Dedicated B300 instances are non-interruptible, run on a 99.99% SLA, and bill per-minute at the on-demand rate. Spot instances run on spare capacity at meaningfully lower rates but can be preempted when dedicated demand rises. Given B300's role in critical frontier training runs, dedicated is the default pick. Spot makes sense for fault-tolerant workloads: batch inference, hyperparameter sweeps, or ablation studies with frequent checkpointing (every 15-30 minutes). For a 70B+ pre-training run where a preemption would cost days of wall time, dedicated is almost always worth the premium.