GPU Cloud Pricing Comparison 2026: H100 From $2.01/hr

Written by Mitrasish, Co-founder & CTO, Spheron. May 14, 2026

TL;DR: GPU Cloud Spot vs On-Demand vs Hyperscaler (July 2026)

GPU	Spheron Spot	Spheron On-Demand	AWS On-Demand	Azure On-Demand
H100 SXM5	$2.91/hr	$2.54/hr	~$6.88/hr	~$12.29/hr
B200 SXM6	$5.34/hr	$9.36/hr	~$14.24/hr	TBA
A100 80GB	$0.80/hr	$1.43/hr	~$3.43/hr	~$3.67/hr
H200 SXM	$3.31/hr	$3.70/hr	~$4.98/hr	~$13.78/hr
RTX 4090	N/A	$0.55/hr	N/A	N/A

H100 SXM5 on-demand on Spheron is 2.7x cheaper than AWS on-demand ($2.54 vs $6.88). B200 spot is 2.7x cheaper ($5.34 vs $14.24). One July oddity worth knowing: the H100 spot pool is tight enough that spot ($2.91) currently costs more than on-demand ($2.54), so take on-demand. No egress fees, per-minute billing, no minimum commitment.

Pricing fluctuates based on GPU availability. Rates above are as of 06 Jul 2026. Check current GPU pricing for live rates.

_Updated July 2026 with live Spheron API rates. Notable: spot discounts have compressed hard since May. B200 SXM6 is $9.36/hr on-demand with $5.34/hr spot, and H100 spot briefly costs more than on-demand. Provider rate-card comparisons below reflect their May 2026 published rates unless noted._

Hyperscalers charge 3-6x more than neo-cloud alternatives for the same GPU hardware. AWS H100 on-demand runs ~$6.88/hr. Azure charges ~$12.29/hr per GPU on their ND H100 v5 instances. On Spheron, H100 PCIe is $2.01/hr on-demand. If you're looking for cheap GPU cloud capacity without stepping down to consumer-grade hardware, that gap is where the savings live. It isn't a temporary anomaly either. It reflects structural differences in overhead, margin, and business model.

This post covers 7 GPU models across 5+ providers, with on-demand, spot, and reserved pricing for each. If you are still evaluating, the free GPU cloud credits guide maps every trial and startup program worth using before you pay for capacity. You can check Spheron's current GPU pricing for live rates. For throughput data behind these prices, see our GPU cloud benchmarks. For weekly pricing moves and hardware availability updates, see the GPU cloud news digest.

The GPU Models Covered

GPU Model	VRAM	Primary Use Case	Tier
RTX 4090	24 GB GDDR6X	Hobbyist inference, fine-tuning	Consumer
A100 80GB	80 GB HBM2e	Training, inference	Data center
L40S	48 GB GDDR6	Inference, rendering	Data center
H100 SXM5	80 GB HBM3	Production training	Data center
H200 SXM	141 GB HBM3e	Large model inference	Data center
B200	192 GB HBM3e	Frontier inference	Blackwell
RTX 5090	32 GB GDDR7	Consumer inference	Consumer

GPU Cloud Pricing by Model (May 2026)

All prices as of 14 May 2026, based on publicly available on-demand rates. Prices fluctuate based on GPU availability and provider policies. Check current Spheron GPU pricing for live rates.

H100 SXM5 Pricing

Provider	On-Demand $/hr	Spot $/hr	Notes
Spheron (H100 SXM5)	$2.50	$1.03	8-way HGX; lowest spot rate
Spheron (H100 PCIe)	$2.01	N/A	Cheapest H100 entry tier
Lambda Labs	$2.49–$3.44	N/A	H100 SXM; on-demand only (8x–1x configs)
Runpod	$2.99	Available	SXM Secure Cloud
Vast.ai	~$1.53–$2.27	Available	Marketplace rates
CoreWeave	~$6.16	N/A	H100 HGX SXM; normalized per GPU
Nebius	$3.85	N/A	On-demand
FluidStack	$2.10	N/A
Paperspace	$5.95	N/A
AWS (p5)	~$6.88	~$3.83	Spot ~44% off OD; see AWS P5 H100 pricing for full P5/P5e/P5en instance breakdown, Savings Plan rates, and Capacity Block terms. For G7 instance pricing specifically, see AWS EC2 G7 per-hour costs (RTX PRO 4500 Blackwell, from $2.52/hr)
GCP (A3)	~$10.98	~$3.69	Estimated; varies by region
OCI (BM.GPU.H100.8)	~$10.00	~$3.00-$5.00	Flat $10/GPU/hr; see OCI GPU pricing 2026 for preemptible rates and Universal Credits math
Azure (ND H100 v5)	~$12.29	N/A	Per GPU on ND96isr H100 v5 ($98.32/hr, 8 GPUs)

Pricing data as of 14 May 2026. Prices fluctuate based on GPU availability. Check current Spheron GPU pricing for live rates.
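Several hyperscaler rows in the table are quoted per GPU even though the instance is sold as a whole 8-GPU node. A minimal sketch of that normalization, using the Azure ND96isr H100 v5 figures from the table; the function name is illustrative, not any provider's API:

```python
def per_gpu_rate(node_rate_per_hr: float, gpus_per_node: int) -> float:
    """Convert a whole-node hourly price into a per-GPU hourly price."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return node_rate_per_hr / gpus_per_node


# Azure ND96isr H100 v5: $98.32/hr for 8 GPUs (figures from the table above).
azure_h100 = per_gpu_rate(98.32, 8)
print(f"Azure H100 per GPU: ${azure_h100:.2f}/hr")
```

Normalizing this way keeps node-only offerings comparable with providers that rent single GPUs, but it hides the fact that you still have to pay for all eight GPUs at once.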

For a dedicated Lambda Cloud H100 cost analysis including reserved contract math, see Lambda Cloud H100 pricing 2026. For Nebius-specific rates including H200 pricing, committed-use discount math, and hidden egress costs, see Nebius H100 per-hour pricing. For a full A3 instance breakdown including committed-use tiers and hidden costs, see GCP A3 H100 pricing vs Spheron. GCP's B200 instance runs on the separate A4 series; see Google Cloud A4 B200 pricing vs Spheron for that breakdown. See our Azure H100 pricing guide for a full breakdown of NDv5 reserved and spot tiers. For an H100 price tracker updated monthly with on-demand and spot rates across all major providers, see the H100 news and pricing hub. GMI Cloud's table row above reflects its base on-demand rate only; the provider also sells a separate, pricier 8x H100 InfiniBand cluster tier for distributed training, broken down along with its B200 rate in our GMI Cloud pricing vs Spheron comparison.

H200 SXM Pricing

Provider	On-Demand $/hr	Spot $/hr	Notes
Spheron	$4.54	N/A	Dedicated only on Spheron as of May 2026
GMI Cloud	$2.60	N/A	On-demand; from $2.60/hr
Nebius	$4.50	N/A	On-demand
Runpod	$4.39	N/A	Secure Cloud
Jarvislabs	$3.80	N/A	On-demand
AWS (p5e)	~$4.98	N/A	Estimated; spot not widely available
GCP	TBA	Spot only	Limited on-demand availability
Azure	~$13.78	N/A	Estimated

B200 Pricing

Provider	On-Demand $/hr	Spot $/hr	Notes
Spheron (B200 SXM6)	$6.02	$2.12	On-demand premium; spot near H100 PCIe range
Runpod	$5.89	N/A	Secure Cloud
Nebius	$5.50	N/A	On-demand
Lambda Labs	$4.99–$5.29	N/A	On-demand; varies by configuration (8x–1x configs)
AWS (p6-b200)	~$14.24	~$3.24	Estimated; $113.93/hr for 8-GPU node
Azure	TBA	N/A	Not yet in standard catalog

B300 SXM6 Pricing

Provider	On-Demand $/hr	Spot $/hr	Notes
Spheron (B300 SXM6)	$6.80	$2.45	Blackwell Ultra, frontier training
Most hyperscalers	TBA	N/A	B300 not yet in standard catalogs

For frontier training that needs Blackwell Ultra wired up as a full rack rather than single SXM6 cards, Spheron also has GB300 NVL72 inventory open to reserve today. Put your GPU count, timeline, and workload on the form and the team confirms availability and gets back to you within a business day.

The AWS on-demand rates above are a different product from AWS's reserved Capacity Blocks, which got two separate price hikes in 2026 and now run $12.355/accelerator-hour for P6-B200 and $14.04 for P6-B300; see our AWS Capacity Blocks pricing breakdown for the full rate card and Spheron comparison.

A100 80GB Pricing

Provider	On-Demand $/hr	Spot $/hr	Notes
Spheron (A100 80G SXM4)	$1.07	$0.60
Thunder Compute	$0.78	N/A
Market range	$0.78–$2.06	Varies	Neo-cloud average
AWS (p4de)	~$3.43	~$3.07	80GB; Estimated
GCP (A2)	~$5.78	~$2.51	80GB (a2-ultragpu); us-central1; Estimated
Azure (NC A100 v4)	~$3.67	~$0.74	NC24ads A100 v4; spot available

L40S Pricing

Provider	On-Demand $/hr	Spot $/hr	Notes
Spheron (L40S PCIe)	$0.72	N/A
Runpod	$0.86	Available
AWS reserved	~$1.17	N/A	1-year reserved (g6e.xlarge)
Marketplace	~$0.40	Available	Varies

RTX 4090 Pricing

Provider	On-Demand $/hr	Spot $/hr	Notes
Spheron	$0.55	N/A
Runpod	$0.69	Available	Community
Vast.ai	$0.35–$0.55	Available	Marketplace, varies
Local marketplace	~$0.20	N/A	Variable reliability

RTX 5090 Pricing

Provider	On-Demand $/hr	Notes
Spheron	$0.76	Limited inventory
Runpod	$0.99	Community Cloud; limited inventory
Vast.ai	$0.51–$0.89	Marketplace rates; limited availability

RTX 5090 cloud availability is limited to a small number of providers as of May 2026. Inventory is constrained and prices can shift quickly.

On-Demand vs Spot vs Reserved: Which Pricing Tier to Choose

On-Demand Pricing

On-demand gives you full flexibility with no commitment. You pay the listed hourly rate, start when you want, and stop when you're done. It is the most expensive tier but the right choice for:

Short experiments and one-off jobs where total cost is low anyway
Workloads with unpredictable runtimes or sharp deadlines
Debugging and development where interruption is intolerable
Production inference APIs where availability guarantees matter

Most neo-cloud providers (Spheron, Lambda, Runpod) do not require contracts for on-demand instances, and several bill per-minute or per-second.

Spot / Preemptible Pricing

Spot instances use idle capacity that providers offer at steep discounts. They can be reclaimed on short notice, typically 30 seconds to 2 minutes. Savings over on-demand typically range from 40% to 65%.

GPU	On-Demand	Spot	Savings %
H100 SXM5 (Spheron)	$2.50/hr	$1.03/hr	~59%
B300 (Spheron)	$6.80/hr	$2.45/hr	~64%
A100 80GB (Spheron)	$1.07/hr	$0.60/hr	~44%
H100 SXM5 (AWS)	~$6.88/hr	~$3.83/hr	~44%
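The savings column is plain arithmetic on the two rates. A short sketch that reproduces it from the table's figures (the function name is illustrative):

```python
def spot_savings_pct(on_demand: float, spot: float) -> float:
    """Percentage saved by running on spot instead of on-demand."""
    return (on_demand - spot) / on_demand * 100


# (on-demand $/hr, spot $/hr) pairs copied from the table above.
rates = {
    "H100 SXM5 (Spheron)": (2.50, 1.03),
    "B300 (Spheron)": (6.80, 2.45),
    "A100 80GB (Spheron)": (1.07, 0.60),
    "H100 SXM5 (AWS)": (6.88, 3.83),
}

for name, (on_demand, spot) in rates.items():
    print(f"{name}: {spot_savings_pct(on_demand, spot):.0f}% off on-demand")
```

A negative result means spot is currently priced above on-demand, which is the situation the July TL;DR describes for H100 SXM5 ($2.91 spot vs $2.54 on-demand).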

Spot pricing is the right call for: batch training jobs with checkpoint/resume, offline inference pipelines, hyperparameter sweeps, and data preprocessing. It is the wrong call for: production serving, real-time inference APIs, or any job that cannot tolerate interruption. Through late 2025, when H100 reserved pools on hyperscalers were largely sold out during the GPU supply crunch, spot instances on neo-clouds became the primary on-ramp for teams that could not secure reserved capacity on AWS or GCP. Supply has since normalized, but spot is still the cheapest path to short-burst capacity.
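What "checkpoint/resume" means in practice can be sketched in a few lines. This is a minimal illustration, not a provider-specific recipe: it assumes the instance receives SIGTERM before it is reclaimed (check your provider's documentation for the actual mechanism), and `train`, `step_fn`, and the JSON checkpoint format are hypothetical stand-ins for a real training loop and its state.

```python
import json
import os
import signal
import tempfile
import threading

# Set when the provider signals that the instance is about to be reclaimed.
stop_event = threading.Event()


def handle_reclaim(signum, frame):
    """Signal handler: ask the training loop to stop at the next step boundary."""
    stop_event.set()


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a kill mid-write leaves the old checkpoint intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(path, total_steps, checkpoint_every=100, step_fn=None):
    """Run (or resume) a job, checkpointing periodically and once more on exit."""
    state = load_checkpoint(path)
    while state["step"] < total_steps and not stop_event.is_set():
        if step_fn is not None:
            step_fn(state["step"])  # placeholder for one real training step
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final save, also taken after a reclaim notice
    return state["step"]


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_reclaim)
    with tempfile.TemporaryDirectory() as workdir:
        finished = train(os.path.join(workdir, "ckpt.json"), total_steps=1_000)
        print(f"stopped at step {finished}")
```

Restarting the same command on a new spot instance picks up from the last saved step, provided the checkpoint path lives on storage that survives the instance.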

Reserved / Committed Pricing

Reserved pricing requires a commitment, typically 1 to 12 months, in exchange for 20-40% discounts vs on-demand. AWS EC2 reserved instances, Azure reserved VMs, and GCP committed-use contracts all follow this model.

Neo-cloud providers like Lambda Labs and CoreWeave offer reserved clusters at negotiated rates. Spheron offers volume pricing via direct contact for teams with predictable long-term compute needs.

Reserved pricing is right for: production inference running 24/7, large-scale training programs with predictable GPU-hour requirements, and teams that have validated their workload and want to lock in cost predictability.
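Whether a commitment pays off depends on how much of the committed time you actually use. A rough sketch, assuming the reserved contract bills every committed hour whether or not the GPU is busy (real contracts vary); the function names are illustrative:

```python
def reserved_break_even_utilization(discount: float) -> float:
    """Fraction of committed hours you must use before reserved beats on-demand."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def cheaper_tier(on_demand_rate, discount, hours_used, hours_committed):
    """Compare paying on-demand for used hours vs paying reserved for all committed hours."""
    on_demand_cost = on_demand_rate * hours_used
    reserved_cost = on_demand_rate * (1 - discount) * hours_committed
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


# A 30% discount only pays off above 70% utilization of the committed hours.
print(f"break-even: {reserved_break_even_utilization(0.30):.0%}")
print(cheaper_tier(2.50, 0.30, hours_used=400, hours_committed=720))
print(cheaper_tier(2.50, 0.30, hours_used=600, hours_committed=720))
```

With the 20-40% discounts quoted above, the break-even sits between 60% and 80% utilization, which is why reserved capacity suits workloads that run close to around the clock.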

For a deeper breakdown of how these billing models compare for different workloads, see our serverless vs on-demand vs reserved GPU guide. For a deeper comparison of bare-metal vs serverless billing structures, see Spheron vs Modal.

Comparing pricing is the first step; attributing that cost to the right team or project is covered in the GPU FinOps and cost allocation guide.

Hidden Costs: What the Hourly Rate Doesn't Include

Egress and Bandwidth Fees

Hyperscalers charge $0.08-$0.12/GB for outbound data transfers. Most neo-clouds (Spheron, Runpod, Lambda) include bandwidth in the instance rate or charge flat rates well below hyperscaler egress fees.

In practice: transferring a 100 GB model checkpoint out of AWS costs $8-12 in egress fees on top of whatever you paid for the GPU hour. At scale, if you are syncing checkpoints to external storage or serving model weights across regions, egress can easily match or exceed your GPU compute bill.

Storage Costs

Persistent volume storage typically runs $0.08-$0.15/GB/month. Temporary storage is included in GPU instances but is not persisted between restarts. If your workflow requires persistent storage across sessions, factor this in when comparing providers.

For large models, even modest storage needs add up. A 70B parameter model in FP16 requires around 140 GB of storage. At $0.10/GB/month, that is $14/month in storage alone before any compute.
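The egress and storage figures above are simple multiplications, sketched here so you can plug in your own sizes and rates. The rates are the ranges quoted in this section; the function names are illustrative:

```python
def egress_cost(size_gb: float, rate_per_gb: float) -> float:
    """One-time cost of transferring size_gb out of a cloud."""
    return size_gb * rate_per_gb


def monthly_storage_cost(size_gb: float, rate_per_gb_month: float) -> float:
    """Recurring monthly cost of keeping size_gb on persistent storage."""
    return size_gb * rate_per_gb_month


# 100 GB checkpoint at the quoted hyperscaler egress range.
low, high = egress_cost(100, 0.08), egress_cost(100, 0.12)
print(f"egress: ${low:.2f} to ${high:.2f}")

# 140 GB of FP16 weights for a 70B model at $0.10/GB/month.
print(f"storage: ${monthly_storage_cost(140, 0.10):.2f}/month")
```

Egress is paid per transfer, so a job that syncs checkpoints out every few hours multiplies the first figure by the number of syncs.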

Networking and IP Fees

Static IP addresses, load balancers, and VPC peering add cost on hyperscalers. Most neo-clouds include a public IP in the instance rate. If your application requires custom networking topology, AWS and GCP give you more tools but charge for the privilege.

Minimum Commitments

Some providers require a 1-hour minimum billing period (Paperspace). Others bill per-minute (Spheron) or per-second (Runpod). For short experimental runs under 10 minutes, a 1-hour minimum means paying for at least six times the GPU time you actually used. Check this before choosing a provider for iterative development work.
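To see how billing granularity changes the cost of a short run, here is a small sketch. It assumes usage is rounded up to the next billing increment and that any minimum is applied before rounding; individual providers may round differently, and the $2.50/hr rate is just an example.

```python
import math


def billed_cost(rate_per_hr, runtime_seconds, increment_seconds, minimum_seconds=0):
    """Cost of a run after applying a billing minimum and rounding up to the increment."""
    billed_seconds = max(runtime_seconds, minimum_seconds)
    billed_seconds = math.ceil(billed_seconds / increment_seconds) * increment_seconds
    return rate_per_hr * billed_seconds / 3600


rate = 2.50          # example hourly rate in $/hr
run = 10 * 60        # a 10-minute experiment, in seconds

per_second = billed_cost(rate, run, increment_seconds=1)
per_minute = billed_cost(rate, run, increment_seconds=60)
hour_minimum = billed_cost(rate, run, increment_seconds=3600, minimum_seconds=3600)

print(f"per-second: ${per_second:.2f}")
print(f"per-minute: ${per_minute:.2f}")
print(f"1-hour minimum: ${hour_minimum:.2f} ({hour_minimum / per_second:.0f}x)")
```

The gap grows as runs get shorter, so the effect matters most for debugging loops made of many brief launches.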

Price-Performance: Cost Per Token and Cost Per TFLOP

The cheapest hourly rate rarely delivers the best cost-per-token. B200 SXM6 on Spheron is $6.02/hr on-demand or $2.12/hr on spot, so the right tier depends on your tolerance for interruption. On spot, B200 delivers roughly 3-4x the throughput of H100 PCIe for only $0.11/hr more ($2.12 vs $2.01), making it highly cost-effective for checkpoint-friendly workloads.

LLM Inference: Cost Per Million Tokens (Llama 3 70B)

The RTX 4090 is excluded from this table. Llama 3 70B in FP16 requires approximately 140 GB of VRAM, which exceeds the 24 GB on a single RTX 4090. If you need 70B inference on consumer GPUs, use INT4 quantization (e.g., GGUF Q4_K_M), which reduces the memory requirement to roughly 40 GB and is best spread across multiple GPUs or handled on a dedicated data center GPU.
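The memory figures follow from parameter count times bytes per weight. A rough sketch that covers weights only and ignores KV cache, activations, and runtime overhead, so real deployments need more headroom; the function names are illustrative:

```python
import math


def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def min_gpus(required_gb: float, vram_per_gpu_gb: float) -> int:
    """Smallest GPU count whose combined VRAM covers required_gb."""
    return math.ceil(required_gb / vram_per_gpu_gb)


fp16 = weight_memory_gb(70, 16)   # Llama 3 70B in FP16
int4 = weight_memory_gb(70, 4)    # pure 4-bit weights, before format overhead

print(f"FP16 weights: {fp16:.0f} GB, needs {min_gpus(fp16, 24)} x 24 GB GPUs")
print(f"4-bit weights: {int4:.0f} GB raw")
print(f"at ~40 GB quantized: {min_gpus(40, 24)} x 24 GB GPUs")
```

Pure 4-bit weights come to 35 GB; the roughly 40 GB quoted above is higher because practical 4-bit formats also store scaling data and typically average somewhat more than 4 bits per weight.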

GPU	Provider	$/hr	Est. tokens/sec	$/M tokens
A100 80GB	Spheron	$1.07	~520	$0.57
L40S	Spheron	$0.91	~450	$0.56
H100 PCIe	Spheron	$2.01	~1,200	$0.47
H100 SXM5	AWS	~$6.88	~1,200	$1.59
H200 SXM	Spheron	$4.54	~1,800	$0.70
B200 SXM6	Spheron	$6.02	~4,000	$0.42
B200	AWS	~$14.24	~4,000	$0.99

Throughput figures are per-GPU estimates for comparison purposes. GPUs with less than 141 GB VRAM require multi-GPU tensor parallelism or quantization for Llama 3 70B inference. Reference GPU cloud benchmarks for full multi-GPU data.

At on-demand rates, B200 at $0.42/M tokens delivers the best cost-per-token, edging out H100 PCIe at $0.47/M despite an hourly rate roughly three times higher, because its throughput advantage is larger than its price premium. On spot, B200 at $2.12/hr would yield roughly $0.15/M tokens, making it the cost leader for checkpoint-friendly workloads. The A100 at $0.57/M on-demand is a solid mid-range option. For very large models or batch sizes, H200 at $0.70/M provides additional VRAM headroom. For detailed total cost of ownership analysis, see the GPU cost optimization playbook.
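The $/M tokens column can be reproduced from the hourly rate and the estimated throughput. A minimal sketch using the table's own figures; the throughput numbers are the article's estimates, not measurements made here:

```python
def cost_per_million_tokens(rate_per_hr: float, tokens_per_sec: float) -> float:
    """Dollars per one million generated tokens at a sustained throughput."""
    tokens_per_hr = tokens_per_sec * 3600
    return rate_per_hr / tokens_per_hr * 1_000_000


# (hourly rate in $, estimated tokens/sec) from the table above.
rows = {
    "A100 80GB (Spheron)": (1.07, 520),
    "H100 PCIe (Spheron)": (2.01, 1200),
    "H100 SXM5 (AWS)": (6.88, 1200),
    "H200 SXM (Spheron)": (4.54, 1800),
    "B200 SXM6 (Spheron)": (6.02, 4000),
    "B200 SXM6 spot (Spheron)": (2.12, 4000),
}

for name, (rate, tps) in rows.items():
    print(f"{name}: ${cost_per_million_tokens(rate, tps):.2f}/M tokens")
```

The formula assumes the GPU is kept busy at the quoted throughput for the whole billed hour; idle time raises the effective cost per token proportionally.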

Spheron vs Every Major Competitor

Spheron vs AWS/GCP/Azure: The cost gap is 40-85% across major GPU models when comparing on-demand rates. Beyond the hourly rate, Spheron does not charge egress fees, does not require minimum commitments, and bills per-minute. AWS, GCP, and Azure add egress, storage, networking, and reserved capacity overhead that compound the gap substantially on real workloads.

Spheron vs Runpod: Spheron undercuts Runpod on H100 on-demand pricing ($2.01 for H100 PCIe or $2.50 for H100 SXM5, vs $2.99 for Runpod's SXM Secure Cloud). Runpod offers a lower H200 on-demand rate ($4.39 vs $4.54). For B200 workloads tolerant of interruption, Spheron's B200 spot at $2.12/hr offers significant savings over Runpod's $5.89/hr on-demand rate. For on-demand B200, Spheron at $6.02/hr is higher than both Runpod ($5.89) and Lambda ($4.99-$5.29), so Runpod becomes attractive for non-interruptible B200 inference. Both platforms offer sub-hourly billing (per-minute on Spheron, per-second on Runpod), spot instances, and multi-GPU configurations. Runpod has a larger community marketplace; Spheron aggregates from enterprise-grade data center partners with SLA guarantees.

Spheron vs Lambda Labs: Lambda is on-demand only for most GPU models. If your workload benefits from spot pricing, Spheron delivers 40-60% cost reductions that Lambda cannot match. Lambda's GPU inventory is strong for H100 and A100; Spheron adds B200 spot availability.

Spheron vs Vast.ai: Vast.ai's marketplace model can produce lower prices on commodity GPUs (A100, RTX 4090) because individual providers compete, but reliability and SLA coverage are variable. Spheron offers guaranteed SLA-backed capacity with consistent performance. For cost-first commodity workloads where reliability tolerance is high, Vast.ai is worth evaluating, alongside other Vast.ai alternatives that offer SLA-backed capacity.

Spheron vs CoreWeave: CoreWeave is enterprise-focused with contract pricing and strong multi-node cluster support. For startups and teams that need on-demand access without a sales cycle, Spheron is more accessible. CoreWeave makes sense for large organizations with predictable multi-month compute requirements and existing enterprise procurement workflows.

For head-to-head comparisons, see Spheron vs Runpod, Spheron vs Vast.ai, Spheron vs CoreWeave, Runpod alternatives, and Lambda Labs alternatives. For teams tracking ownership changes among neoclouds, see how Voltage Park's pricing held up after becoming Lightning AI.

How to Choose the Cheapest GPU Cloud for Your Workload

Workload	Recommended GPU	Recommended Provider Tier	Why
Hobbyist inference (7B-13B)	RTX 4090	Vast.ai / Spheron on-demand	Lowest cost, sufficient VRAM
Fine-tuning 7B-70B	A100 80GB	Spheron / Lambda on-demand	Mature stack, good price
Production inference (70B)	H100 / H200	Spheron on-demand	Balance of cost and throughput
Large model training	H200 / B200	Spheron / CoreWeave	VRAM headroom
Frontier inference (100B+)	B200 / B300	Spheron	Best cost-per-token at scale

For hyperscaler integration requirements (IAM, VPC, compliance certifications), AWS/GCP/Azure may be justified despite significantly higher GPU costs. If your workload is tightly integrated with S3, BigQuery, or Azure Active Directory, the switching cost of migrating to a neo-cloud can outweigh the per-GPU savings in the short term.

Final Verdict

For most AI teams, neo-cloud providers deliver 40-85% lower GPU compute costs than hyperscalers with comparable or better GPU availability in 2026. The pricing gap has widened, not narrowed, as hyperscaler overhead and margin have increased faster than neo-cloud cost reductions.

The cheapest hourly rate is not always the best value. Calculate cost per token or cost per training step before committing to a platform. With B200 SXM6 spot at $2.12/hr on Spheron and A100 80G SXM4 on-demand at $1.07/hr, the pricing structure rewards different use cases in Q2 2026. B200 spot is the cost-per-token leader for checkpoint-friendly workloads; A100 on-demand is the best dollar-per-hour choice for smaller models and fine-tuning that cannot tolerate interruption.

Spot pricing is worth using for batch workloads. The 40-65% savings over on-demand are real and reproducible for any workload that implements checkpoint/resume. On-demand is right for production serving and latency-sensitive workloads where interruption is unacceptable.

All pricing in this post is based on publicly available on-demand rates as of 14 May 2026. GPU cloud prices fluctuate over time based on availability, provider changes, and market conditions. Check Spheron's GPU pricing page for the most current rates.

For Intel-based accelerator pricing and a detailed cost-per-token comparison against H200 and B200, see our Intel Gaudi 3 vs H200 and B200 analysis.

For a more detailed look at Rubin NVL72 availability timelines and projected cloud pricing, including the cost-per-token comparison against current Blackwell rates, see the Vera Rubin NVL72 cloud rental and pricing outlook.

For a full break-even analysis comparing on-premise H100 servers to cloud, including a 3-year TCO model and decision framework, see LLM Inference On-Premise vs GPU Cloud: 2026 Cost Analysis.

Compare current rates on Spheron's GPU pricing page and rent a GPU now to start running your workloads at lower cost.

Cheapest GPU Cloud Providers for 2026

The cheapest GPU cloud depends on which GPU and how interruptible your workload is. There is no single "cheapest provider" because pricing structures differ by tier (spot vs on-demand), by GPU class (consumer vs data-center), and by region. For a feature-by-feature look at the providers themselves rather than just their rates, our top 10 cloud GPU providers ranking covers reliability, networking, and regional coverage. Based on May 2026 public rates across the 5+ providers tracked in this guide:

Cheapest H100 SXM5 per hour: Spheron spot at $1.03/hr is the floor in May 2026, with Runpod spot occasionally hitting $1.19/hr depending on regional availability. On-demand: Spheron at $2.50/hr is among the cheapest dedicated H100 SXM5 rates across the providers tracked. Lambda Labs at $2.49/hr undercuts it only in certain regions and configurations, and FluidStack lists $2.10/hr. Hyperscalers (AWS p5 ~$6.88/hr, GCP A3 ~$10.98/hr, Azure ~$12.29/hr) are 2-5x more expensive.

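Cheapest A100 80GB per hour: Thunder Compute has the lowest on-demand rate in the table at $0.78/hr. Spheron is $1.07/hr on-demand and $0.60/hr spot, the lowest spot rate tracked. Vast.ai marketplace pricing dips to $0.67/hr when high-reliability hosts are available, but consistency varies host-to-host.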

Cheapest B200 per hour: Spheron spot at $2.12/hr is the cheapest B200 across all 5+ providers. On-demand: Spheron at $6.02/hr is undercut by Runpod at $5.89/hr, Nebius at $5.50/hr, and Lambda Labs at $4.99-$5.29/hr depending on configuration. AWS p6-b200 at roughly $14.24/hr is the most expensive option in this tier.

Cheapest RTX 4090 per hour: Vast.ai marketplace hosts go as low as $0.31/hr; Spheron lists RTX 4090 on-demand at $0.55/hr, and Runpod Community Cloud at $0.34/hr ($0.69/hr on Secure Cloud). For non-production workloads that tolerate marketplace host variability, Vast.ai wins on hourly cost. For a production endpoint, Spheron's dedicated tier delivers SLA-backed reliability at competitive cost.

Cheapest serverless GPU inference: Modal and Runpod Serverless both bill per-second of active compute, which can undercut hourly billing for sub-minute request workloads. The break-even is roughly 30% GPU utilization: below that, serverless wins on total cost; above that, dedicated hourly billing on Spheron or any neo-cloud is cheaper because there is no cold-start overhead.
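The break-even point depends on how much more serverless charges per active second than a dedicated GPU charges per hour. A rough sketch of that relationship: the $0.00186/s serverless rate is a hypothetical figure chosen to land near the 30% break-even mentioned above, not a quoted Modal or Runpod price, and cold-start time is ignored.

```python
def break_even_utilization(dedicated_rate_per_hr: float, serverless_rate_per_sec: float) -> float:
    """Utilization above which an always-on dedicated GPU is cheaper than serverless."""
    return dedicated_rate_per_hr / (serverless_rate_per_sec * 3600)


def monthly_costs(dedicated_rate_per_hr, serverless_rate_per_sec, utilization, hours=720):
    """Return (dedicated, serverless) cost for a month at the given busy fraction."""
    dedicated = dedicated_rate_per_hr * hours
    serverless = serverless_rate_per_sec * 3600 * hours * utilization
    return dedicated, serverless


dedicated_rate = 2.01       # H100 PCIe on-demand from this guide, $/hr
serverless_rate = 0.00186   # hypothetical serverless price, $/s of active compute

print(f"break-even: {break_even_utilization(dedicated_rate, serverless_rate):.0%}")
for utilization in (0.10, 0.50):
    dedicated, serverless = monthly_costs(dedicated_rate, serverless_rate, utilization)
    print(f"{utilization:.0%} busy: dedicated ${dedicated:.0f}, serverless ${serverless:.0f}")
```

Substitute the per-second rate from the serverless provider you are evaluating; the break-even moves up as that rate approaches the dedicated hourly rate.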

The pattern across every GPU class: hyperscalers are not the cheapest option for any GPU model in 2026. The cheapest provider is usually a neo-cloud or marketplace, and within that group Spheron spot pricing leads on H100, A100, and B200 for fault-tolerant workloads. For workloads that cannot tolerate interruption, on-demand pricing across Spheron, Lambda, Runpod, and Nebius sits within 20% of each other and the deciding factor is usually GPU availability in your region rather than hourly cost.

Indian teams comparing costs should also check our dedicated GPU cloud guide for India for INR-equivalent pricing and domestic provider options including E2E Networks, Yotta, and Tata Communications.

For teams optimizing total compute spend across providers, see our GPU cost optimization playbook and heterogeneous GPU inference cost optimization guides.

Spheron gives you on-demand access to H100, H200, B200, A100, L40S, and RTX 4090 GPUs with per-minute billing, no egress fees, and spot pricing that cuts costs by up to 64% compared to on-demand. No contracts, no minimums, no hidden fees.
Compare GPU pricing and rent now on Spheron.

Frequently Asked Questions

As of May 2026, Spheron lists H100 PCIe at $2.01/hr on-demand and H100 SXM5 at $2.50/hr on-demand ($1.03/hr on spot), among the lowest H100 rates available. Lambda Labs lists H100 SXM at $2.49-$3.44/hr depending on configuration, AWS at ~$6.88/hr, and Azure at ~$12.29/hr (per GPU on the ND96isr H100 v5 instance). Spheron spot pricing cuts H100 SXM5 costs by ~59% for fault-tolerant workloads.

As of May 2026, Spheron's B200 SXM6 is $6.02/hr on-demand with $2.12/hr spot pricing. Runpod Secure Cloud is $5.89/hr, Nebius $5.50/hr, Lambda Labs $4.99-$5.29/hr, and AWS p6-b200 approximately $14.24/hr on-demand. At $2.12/hr spot, the B200 gives roughly 2.4x the memory bandwidth of an H100 SXM5 at a price close to H100 PCIe on-demand ($2.01/hr), making it the clear price-performance leader for fault-tolerant inference workloads.

The most common hidden costs are egress bandwidth fees ($0.08-$0.12/GB on AWS, GCP, and Azure; free or flat-rate on most neo-clouds), persistent storage ($0.08-$0.15/GB/month), minimum rental commitments (some providers require 1-hour or 1-day minimums), and network/IP address fees. Egress in particular can exceed the GPU cost when you move large model checkpoints.

Spot pricing is worth it for batch training jobs, offline inference, and any workload that can checkpoint and resume after interruption. Savings range from 40-65% below on-demand rates. Avoid spot for production inference APIs, real-time serving, or jobs without checkpointing. Most GPU cloud providers, including Spheron, Runpod, and Vast.ai, offer spot instances.

Reserved GPU pricing (also called committed-use or contract pricing) typically requires 1-month to 12-month commitments in exchange for 20-40% discounts vs on-demand rates. AWS EC2 reserved instances, Azure reserved VMs, and GCP committed-use contracts all follow this model. Neo-cloud providers like Lambda Labs and CoreWeave offer reserved clusters at negotiated rates. Spheron offers volume discounts via direct contact for longer commitments.

Try It on Real GPUs

The GPUs behind these guides are the ones you can rent here: H100s, H200s, B200s, and more, billed per minute with no contracts and no minimum. Pick one and you are live in under two minutes.

Deploy time: < 2 min. Uptime SLA: 99.9%. GPU models: 10+. Billing: per-minute.