TL;DR: GPU Cloud Spot vs On-Demand vs Hyperscaler (May 2026)
| GPU | Spheron Spot | Spheron On-Demand | AWS On-Demand | Azure On-Demand |
|---|---|---|---|---|
| H100 SXM5 | $1.03/hr | $2.50/hr | ~$6.88/hr | ~$12.29/hr |
| B200 SXM6 | $2.12/hr | $6.02/hr | ~$14.24/hr | TBA |
| A100 80GB | $0.60/hr | $1.07/hr | ~$3.43/hr | ~$3.67/hr |
| H200 SXM | N/A | $4.54/hr | ~$4.98/hr | ~$13.78/hr |
| RTX 4090 | N/A | $0.55/hr | N/A | N/A |
H100 SXM5 spot on Spheron is 6.7× cheaper than AWS on-demand. B200 spot is also 6.7× cheaper than AWS on-demand. No egress fees, per-minute billing, no minimum commitment.
Pricing fluctuates based on GPU availability. Rates above are based on 14 May 2026. Check current GPU pricing for live rates.
_Updated May 2026 with live Spheron API rates and current provider rate cards. Notable: B200 SXM6 on Spheron is $6.02/hr on-demand with $2.12/hr spot, well above H100 PCIe on-demand ($2.01/hr) but spot brings it near parity._
Hyperscalers charge 3-6x more than neo-cloud alternatives for the same GPU hardware. AWS H100 on-demand runs ~$6.88/hr. Azure charges ~$12.29/hr per GPU on their ND H100 v5 instances. On Spheron, H100 PCIe is $2.01/hr on-demand. If you're looking for cheap GPU cloud capacity without stepping down to consumer-grade hardware, that gap is where the savings live. It isn't a temporary anomaly either. It reflects structural differences in overhead, margin, and business model.
This post covers 7 GPU models across 5+ providers, with on-demand, spot, and reserved pricing for each. You can check Spheron's current GPU pricing for live rates. For throughput data behind these prices, see our GPU cloud benchmarks.
The GPU Models Covered
| GPU Model | VRAM | Primary Use Case | Tier |
|---|---|---|---|
| RTX 4090 | 24 GB GDDR6X | Hobbyist inference, fine-tuning | Consumer |
| A100 80GB | 80 GB HBM2e | Training, inference | Data center |
| L40S | 48 GB GDDR6 | Inference, rendering | Data center |
| H100 SXM5 | 80 GB HBM3 | Production training | Data center |
| H200 SXM | 141 GB HBM3e | Large model inference | Data center |
| B200 | 192 GB HBM3e | Frontier inference | Blackwell |
| RTX 5090 | 32 GB GDDR7 | Consumer inference | Consumer |
GPU Cloud Pricing by Model (May 2026)
All prices as of 14 May 2026, based on publicly available on-demand rates. Prices fluctuate based on GPU availability and provider policies. Check current Spheron GPU pricing for live rates.
H100 SXM5 Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron (H100 SXM5) | $2.50 | $1.03 | 8-way HGX; lowest spot rate |
| Spheron (H100 PCIe) | $2.01 | N/A | Cheapest H100 entry tier |
| Lambda Labs | $2.49–$3.44 | N/A | H100 SXM; on-demand only (8x–1x configs) |
| RunPod | $2.69 | Available | PCIe Community Cloud |
| Vast.ai | ~$1.53–$2.27 | Available | Marketplace rates |
| CoreWeave | ~$6.16 | N/A | H100 HGX SXM; normalized per GPU |
| Nebius | $2.95 | N/A | On-demand |
| FluidStack | $2.10 | N/A | |
| Paperspace | $5.95 | N/A | |
| AWS (p5) | ~$6.88 | ~$3.83 | Spot ~44% off OD; see AWS P5 H100 pricing for full P5/P5e/P5en instance breakdown, Savings Plan rates, and Capacity Block terms |
| GCP (A3) | ~$10.98 | ~$3.69 | Estimated; varies by region |
| Azure (ND H100 v5) | ~$12.29 | N/A | Per GPU on ND96isr H100 v5 ($98.32/hr, 8 GPUs) |
Pricing data as of 14 May 2026. Prices fluctuate based on GPU availability. Check current Spheron GPU pricing for live rates.
For a dedicated Lambda Cloud H100 cost analysis including reserved contract math, see Lambda Cloud H100 pricing 2026. For a full A3 instance breakdown including committed-use tiers and hidden costs, see GCP A3 H100 pricing vs Spheron. See our Azure H100 pricing guide for a full breakdown of NDv5 reserved and spot tiers.
H200 SXM Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron | $4.54 | N/A | Dedicated only on Spheron as of May 2026 |
| GMI Cloud | $2.60 | N/A | On-demand; from $2.60/hr |
| Nebius | $3.50 | N/A | On-demand |
| RunPod | $3.59 | N/A | Secure Cloud |
| Jarvislabs | $3.80 | N/A | On-demand |
| AWS (p5e) | ~$4.98 | N/A | Estimated; spot not widely available |
| GCP | TBA | Spot only | Limited on-demand availability |
| Azure | ~$13.78 | N/A | Estimated |
B200 Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron (B200 SXM6) | $6.02 | $2.12 | On-demand premium; spot near H100 PCIe range |
| RunPod | $4.99 | N/A | Secure Cloud |
| Nebius | $5.50 | N/A | On-demand |
| Lambda Labs | $4.99–$5.29 | N/A | On-demand; varies by configuration (8x–1x configs) |
| AWS (p6-b200) | ~$14.24 | ~$3.24 | Estimated; $113.93/hr for 8-GPU node |
| Azure | TBA | N/A | Not yet in standard catalog |
B300 SXM6 Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron (B300 SXM6) | $6.80 | $2.45 | Blackwell Ultra, frontier training |
| Most hyperscalers | TBA | N/A | B300 not yet in standard catalogs |
A100 80GB Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron (A100 80G SXM4) | $1.07 | $0.60 | |
| Thunder Compute | $0.78 | N/A | |
| Market range | $0.78–$2.06 | Varies | Neo-cloud average |
| AWS (p4de) | ~$3.43 | ~$3.07 | 80GB; Estimated |
| GCP (A2) | ~$5.78 | ~$2.51 | 80GB (a2-ultragpu); us-central1; Estimated |
| Azure (NC A100 v4) | ~$3.67 | ~$0.74 | NC24ads A100 v4; spot available |
L40S Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron (L40S PCIe) | $0.72 | N/A | |
| RunPod | $0.79 | Available | |
| AWS reserved | ~$1.17 | N/A | 1-year reserved (g6e.xlarge) |
| Marketplace | ~$0.40 | Available | Varies |
RTX 4090 Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron | $0.55 | N/A | |
| RunPod | $0.34 | Available | Community |
| Vast.ai | $0.35–$0.55 | Available | Marketplace, varies |
| Local marketplace | ~$0.20 | N/A | Variable reliability |
RTX 5090 Pricing
| Provider | On-Demand $/hr | Notes |
|---|---|---|
| Spheron | $0.76 | Limited inventory |
| RunPod | $0.69 | Community Cloud; limited inventory |
| Vast.ai | $0.51–$0.89 | Marketplace rates; limited availability |
RTX 5090 cloud availability is limited to a small number of providers as of May 2026. Inventory is constrained and prices can shift quickly.
On-Demand vs Spot vs Reserved: Which Pricing Tier to Choose
On-Demand Pricing
On-demand gives you full flexibility with no commitment. You pay the listed hourly rate, start when you want, and stop when you're done. It is the most expensive tier but the right choice for:
- Short experiments and one-off jobs where total cost is low anyway
- Workloads with unpredictable runtimes or sharp deadlines
- Debugging and development where interruption is intolerable
- Production inference APIs where availability guarantees matter
Most neo-cloud providers (Spheron, Lambda, RunPod) do not require contracts for on-demand instances, and several bill per-minute or per-second.
Spot / Preemptible Pricing
Spot instances use idle capacity that providers offer at steep discounts. They can be reclaimed with short notice, typically 30 seconds to 2 minutes. Savings over on-demand range from 40-65%.
| GPU | On-Demand | Spot | Savings % |
|---|---|---|---|
| H100 SXM5 (Spheron) | $2.50/hr | $1.03/hr | ~59% |
| B300 (Spheron) | $6.80/hr | $2.45/hr | ~64% |
| A100 80GB (Spheron) | $1.07/hr | $0.60/hr | ~44% |
| H100 SXM5 (AWS) | ~$6.88/hr | ~$3.83/hr | ~44% |
Spot pricing is the right call for: batch training jobs with checkpoint/resume, offline inference pipelines, hyperparameter sweeps, and data preprocessing. It is the wrong call for: production serving, real-time inference APIs, or any job that cannot tolerate interruption. Through late 2025, when H100 reserved pools on hyperscalers were largely sold out during the GPU supply crunch, spot instances on neo-clouds became the primary on-ramp for teams that could not secure reserved capacity on AWS or GCP. Supply has since normalized, but spot is still the cheapest path to short-burst capacity.
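The savings column in the table above is simple arithmetic on the two hourly rates. A minimal sketch that reproduces it, using only the rates listed in this post (the function name is illustrative, not part of any provider API):

```python
def spot_savings_pct(on_demand_per_hr: float, spot_per_hr: float) -> float:
    """Percentage saved by paying the spot rate instead of the on-demand rate."""
    if on_demand_per_hr <= 0:
        raise ValueError("on-demand rate must be positive")
    return (1 - spot_per_hr / on_demand_per_hr) * 100


# (on-demand $/hr, spot $/hr) pairs copied from the spot savings table
rates = {
    "H100 SXM5 (Spheron)": (2.50, 1.03),
    "B300 (Spheron)": (6.80, 2.45),
    "A100 80GB (Spheron)": (1.07, 0.60),
    "H100 SXM5 (AWS)": (6.88, 3.83),
}

for gpu, (on_demand, spot) in rates.items():
    print(f"{gpu}: ~{spot_savings_pct(on_demand, spot):.0f}% savings")
```

This ignores work lost to interruptions, so treat the result as an upper bound on realized savings for jobs that checkpoint infrequently.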
Reserved / Committed Pricing
Reserved pricing requires a commitment, typically 1 to 12 months on neo-clouds and 1 or 3 years on hyperscalers, in exchange for 20-40% discounts vs on-demand. AWS EC2 reserved instances, Azure reserved VMs, and GCP committed-use contracts all follow this model.
Neo-cloud providers like Lambda Labs and CoreWeave offer reserved clusters at negotiated rates. Spheron offers volume pricing via direct contact for teams with predictable long-term compute needs.
Reserved pricing is right for: production inference running 24/7, large-scale training programs with predictable GPU-hour requirements, and teams that have validated their workload and want to lock in cost predictability.
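One way to sanity-check a reservation is to ask how busy the GPU must be for the discount to pay off. The sketch below assumes a simplified model in which reserved capacity is billed for every hour of the term while on-demand is billed only for hours actually used; the function name is illustrative.

```python
def reserved_break_even_utilization(discount: float) -> float:
    """Utilization at which a reservation costs the same as on-demand.

    Reserved cost  = hours * on_demand_rate * (1 - discount)  (paid whether used or not)
    On-demand cost = hours * on_demand_rate * utilization     (paid only while running)
    Setting the two equal gives utilization = 1 - discount.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


# The 20-40% discount range quoted above
for discount in (0.20, 0.40):
    utilization = reserved_break_even_utilization(discount)
    print(f"{discount:.0%} discount -> break-even at {utilization:.0%} utilization")
```

Under this model, a reservation only wins if the GPU is busy for more than 60-80% of the term, which matches the 24/7 and predictable-demand cases listed above.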
For a deeper breakdown of how these billing models compare for different workloads, see our serverless vs on-demand vs reserved GPU guide. For a deeper comparison of bare-metal vs serverless billing structures, see Spheron vs Modal.
Hidden Costs: What the Hourly Rate Doesn't Include
Egress and Bandwidth Fees
Hyperscalers charge $0.08-$0.12/GB for outbound data transfers. Most neo-clouds (Spheron, RunPod, Lambda) include bandwidth in the instance rate or charge flat rates well below hyperscaler egress fees.
In practice: transferring a 100 GB model checkpoint out of AWS costs $8-12 in egress fees on top of whatever you paid for the GPU hour. At scale, if you are syncing checkpoints to external storage or serving model weights across regions, egress can easily match or exceed your GPU compute bill.
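To see how egress compounds, here is a rough sketch using the per-GB range above and the ~$6.88/hr AWS H100 rate from this post. The hourly sync schedule is a hypothetical workload chosen for illustration.

```python
def egress_cost(gigabytes: float, rate_per_gb: float) -> float:
    """Outbound transfer cost in dollars."""
    return gigabytes * rate_per_gb


checkpoint_gb = 100
low = egress_cost(checkpoint_gb, 0.08)
high = egress_cost(checkpoint_gb, 0.12)
print(f"One {checkpoint_gb} GB transfer: ${low:.2f}-${high:.2f}")

# Hypothetical workload: sync the checkpoint out once per hour for 24 hours
hours = 24
gpu_cost = hours * 6.88  # one H100 at the ~$6.88/hr AWS on-demand rate
print(f"{hours} hourly syncs: ${low * hours:.2f}-${high * hours:.2f} egress")
print(f"{hours} GPU-hours:    ${gpu_cost:.2f} compute")
```

With these assumptions the egress bill exceeds the single-GPU compute bill, which is the scenario the paragraph above warns about.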
Storage Costs
Persistent volume storage typically runs $0.08-$0.15/GB/month. Temporary storage is included in GPU instances but is not persisted between restarts. If your workflow requires persistent storage across sessions, factor this in when comparing providers.
For large models, even modest storage needs add up. A 70B parameter model in FP16 requires around 140 GB of storage. At $0.10/GB/month, that is $14/month in storage alone before any compute.
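The same estimate can be sketched for other model sizes and precisions. The helper names are illustrative, and the calculation counts weights only (no optimizer state, checkpoints, or datasets).

```python
def model_size_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB: 1e9 params * bytes / 1e9 bytes per GB."""
    return params_billions * bytes_per_param


def monthly_storage_cost(size_gb: float, rate_per_gb_month: float) -> float:
    """Persistent volume cost per month in dollars."""
    return size_gb * rate_per_gb_month


size = model_size_gb(70, 2)  # 70B parameters, FP16 = 2 bytes per parameter
print(f"Weights: {size:.0f} GB")
print(f"Storage: ${monthly_storage_cost(size, 0.10):.2f}/month at $0.10/GB/month")
```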
Networking and IP Fees
Static IP addresses, load balancers, and VPC peering add cost on hyperscalers. Most neo-clouds include a public IP in the instance rate. If your application requires custom networking topology, AWS and GCP give you more tools but charge for the privilege.
Minimum Commitments
Some providers require a 1-hour minimum billing period (Paperspace). Others bill per-minute (Spheron) or per-second (RunPod). For short experimental runs under 10 minutes, a 1-hour minimum means paying for at least six times the GPU time you actually used. Check this before choosing a provider for iterative development work.
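A minimal sketch of how billing granularity changes the cost of a short run. It models every scheme as rounding usage up to the next billing increment, which is a simplification of real provider terms; the 7-minute runtime and the $2.00/hr rate are hypothetical.

```python
import math


def billed_cost(runtime_s: float, hourly_rate: float, increment_s: int) -> float:
    """Cost when usage is rounded up to the next billing increment."""
    billed_s = math.ceil(runtime_s / increment_s) * increment_s
    return billed_s / 3600 * hourly_rate


runtime_s = 7 * 60   # hypothetical 7-minute experiment
hourly_rate = 2.00   # hypothetical $/hr, held constant to isolate granularity

for label, increment_s in [("per-second", 1), ("per-minute", 60), ("1-hour minimum", 3600)]:
    cost = billed_cost(runtime_s, hourly_rate, increment_s)
    print(f"{label:>15}: ${cost:.2f}")
```

Holding the hourly rate constant isolates the effect of the billing increment from differences in list price.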
Price-Performance: Cost Per Token and Cost Per TFLOP
The cheapest hourly rate rarely delivers the best cost-per-token. With B200 SXM6 on Spheron at $6.02/hr on-demand or $2.12/hr on spot, the pricing depends on your tolerance for interruption. On spot, B200 delivers roughly 3-4x the throughput of H100 PCIe at only $0.11/hr more, making it highly cost-effective for checkpoint-friendly workloads.
LLM Inference: Cost Per Million Tokens (Llama 3 70B)
The RTX 4090 is excluded from this table. Llama 3 70B in FP16 requires approximately 140 GB of VRAM, which exceeds the 24 GB on a single RTX 4090. If you need 70B inference on consumer GPUs, use INT4 quantization (e.g., GGUF Q4_K_M), which reduces the memory requirement to roughly 40 GB. That still exceeds a single 24 GB card, so either spread the model across multiple GPUs or run it on a data center GPU.
| GPU | Provider | $/hr | Est. tokens/sec | $/M tokens |
|---|---|---|---|---|
| A100 80GB | Spheron | $1.07 | ~520 | $0.57 |
| L40S | Spheron | $0.91 | ~450 | $0.56 |
| H100 PCIe | Spheron | $2.01 | ~1,200 | $0.47 |
| H100 SXM5 | AWS | ~$6.88 | ~1,200 | $1.59 |
| H200 SXM | Spheron | $4.54 | ~1,800 | $0.70 |
| B200 SXM6 | Spheron | $6.02 | ~4,000 | $0.42 |
| B200 | AWS | ~$14.24 | ~4,000 | $0.99 |
Throughput figures are per-GPU estimates for comparison purposes. GPUs with less than 141 GB VRAM require multi-GPU tensor parallelism or quantization for Llama 3 70B inference. Reference GPU cloud benchmarks for full multi-GPU data.
At on-demand rates, B200 delivers the best cost-per-token at $0.42/M, edging out H100 PCIe at $0.47/M despite a higher hourly rate, thanks to its higher throughput. On spot, B200 at $2.12/hr would yield roughly $0.15/M tokens, making it the cost leader for checkpoint-friendly workloads. The A100 at $0.57/M on-demand is a solid mid-range option. For very large models or batches, H200 at $0.70/M provides additional VRAM headroom. For detailed total cost of ownership analysis, see the GPU cost optimization playbook.
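The $/M tokens column follows directly from the hourly rate and the throughput estimate. A short sketch that reproduces it from the table's figures, so you can substitute your own measured throughput (the function name is illustrative):

```python
def cost_per_million_tokens(hourly_rate: float, tokens_per_sec: float) -> float:
    """Dollars per one million generated tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_rate / tokens_per_hour * 1_000_000


# (GPU, $/hr, estimated tokens/sec) taken from the table in this section
rows = [
    ("A100 80GB, Spheron", 1.07, 520),
    ("H100 PCIe, Spheron", 2.01, 1200),
    ("H100 SXM5, AWS", 6.88, 1200),
    ("H200 SXM, Spheron", 4.54, 1800),
    ("B200 SXM6, Spheron on-demand", 6.02, 4000),
    ("B200 SXM6, Spheron spot", 2.12, 4000),
]

for name, rate, tps in rows:
    print(f"{name}: ${cost_per_million_tokens(rate, tps):.2f}/M tokens")
```

The calculation assumes the GPU is fully utilized for the whole hour; idle time raises the effective cost per token proportionally.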
Spheron vs Every Major Competitor
Spheron vs AWS/GCP/Azure: The cost gap is 40-85% across major GPU models when comparing on-demand rates. Beyond the hourly rate, Spheron does not charge egress fees, does not require minimum commitments, and bills per-minute. AWS, GCP, and Azure add egress, storage, networking, and reserved capacity overhead that compound the gap substantially on real workloads.
Spheron vs RunPod: Spheron undercuts RunPod on H100 PCIe pricing ($2.01 vs $2.69 on-demand). RunPod offers a lower H200 on-demand rate ($3.59 vs $4.54). For B200 workloads that tolerate interruption, Spheron's spot rate of $2.12/hr is well below RunPod's $4.99/hr on-demand rate. For on-demand B200, Spheron at $6.02/hr is higher than both RunPod ($4.99) and Lambda ($4.99-$5.29), so RunPod becomes attractive for non-interruptible B200 inference. Both platforms offer fine-grained billing (per-minute on Spheron, per-second on RunPod), spot instances, and multi-GPU configurations. RunPod has a larger community marketplace; Spheron aggregates from enterprise-grade data center partners with SLA guarantees.
Spheron vs Lambda Labs: Lambda is on-demand only for most GPU models. If your workload benefits from spot pricing, Spheron delivers 40-60% cost reductions that Lambda cannot match. Lambda's GPU inventory is strong for H100 and A100; Spheron adds B200 spot availability.
Spheron vs Vast.ai: Vast.ai's marketplace model can produce lower prices on commodity GPUs (A100, RTX 4090) because individual providers compete, but reliability and SLA coverage are variable. Spheron offers guaranteed SLA-backed capacity with consistent performance. For cost-first commodity workloads where reliability tolerance is high, Vast.ai is worth evaluating.
Spheron vs CoreWeave: CoreWeave is enterprise-focused with contract pricing and strong multi-node cluster support. For startups and teams that need on-demand access without a sales cycle, Spheron is more accessible. CoreWeave makes sense for large organizations with predictable multi-month compute requirements and existing enterprise procurement workflows.
For head-to-head comparisons, see Spheron vs RunPod, Spheron vs Vast.ai, Spheron vs CoreWeave, RunPod alternatives, and Lambda Labs alternatives.
How to Choose the Cheapest GPU Cloud for Your Workload
| Workload | Recommended GPU | Recommended Provider Tier | Why |
|---|---|---|---|
| Hobbyist inference (7B-13B) | RTX 4090 | Vast.ai / Spheron on-demand | Lowest cost, sufficient VRAM |
| Fine-tuning 7B-70B | A100 80GB | Spheron / Lambda on-demand | Mature stack, good price |
| Production inference (70B) | H100 / H200 | Spheron on-demand | Balance of cost and throughput |
| Large model training | H200 / B200 | Spheron / CoreWeave | VRAM headroom |
| Frontier inference (100B+) | B200 / B300 | Spheron | Best cost-per-token at scale |
For hyperscaler integration requirements (IAM, VPC, compliance certifications), AWS/GCP/Azure may be justified despite significantly higher GPU costs. If your workload is tightly integrated with S3, BigQuery, or Azure Active Directory, the switching cost of migrating to a neo-cloud can outweigh the per-GPU savings in the short term.
Final Verdict
For most AI teams, neo-cloud providers deliver 40-85% lower GPU compute costs than hyperscalers with comparable or better GPU availability in 2026. The pricing gap has widened, not narrowed, as hyperscaler overhead and margin have increased faster than neo-cloud cost reductions.
The cheapest hourly rate is not always the best value. Calculate cost per token or cost per training step before committing to a platform. With B200 SXM6 spot at $2.12/hr on Spheron and A100 80G SXM4 on-demand at $1.07/hr, the pricing structure rewards different use cases in Q2 2026. B200 spot is the cost-per-token leader for checkpoint-friendly workloads; A100 on-demand is the best dollar-per-hour choice for smaller models and fine-tuning that cannot tolerate interruption.
Spot pricing is worth using for batch workloads. The 40-65% savings over on-demand are real and reproducible for any workload that implements checkpoint/resume. On-demand is right for production serving and latency-sensitive workloads where interruption is unacceptable.
All pricing in this post is based on publicly available on-demand rates as of 14 May 2026. GPU cloud prices fluctuate over time based on availability, provider changes, and market conditions. Check Spheron's GPU pricing page for the most current rates.
For Intel-based accelerator pricing and a detailed cost-per-token comparison against H200 and B200, see our Intel Gaudi 3 vs H200 and B200 analysis.
For a full break-even analysis comparing on-premise H100 servers to cloud, including a 3-year TCO model and decision framework, see LLM Inference On-Premise vs GPU Cloud: 2026 Cost Analysis.
Compare current rates on Spheron's GPU pricing page and rent a GPU now to start running your workloads at lower cost.
Cheapest GPU Cloud Providers for 2026
The cheapest GPU cloud depends on which GPU and how interruptible your workload is. There is no single "cheapest provider" because pricing structures differ by tier (spot vs on-demand), by GPU class (consumer vs data-center), and by region. Based on May 2026 public rates across the 5+ providers tracked in this guide:
Cheapest H100 SXM5 per hour: Spheron spot at $1.03/hr is the floor in May 2026, with RunPod spot occasionally hitting $1.19/hr depending on regional availability. On-demand: Spheron at $2.50/hr is among the cheapest dedicated H100 SXM5 rates across the providers tracked. Lambda Labs at $2.49/hr beats it only on certain regions and configurations. Hyperscalers (AWS p5 ~$6.88/hr, GCP A3 ~$10.98/hr, Azure ~$12.29/hr) are roughly 3-5x more expensive.
Cheapest A100 80GB per hour: Spheron spot at $0.60/hr is the lowest rate tracked. On-demand, Thunder Compute at $0.78/hr undercuts Spheron at $1.07/hr. Vast.ai marketplace pricing dips to $0.67/hr when high-reliability hosts are available, but consistency varies host-to-host.
Cheapest B200 per hour: Spheron spot at $2.12/hr is the cheapest B200 across all 5+ providers. On-demand: RunPod Secure Cloud at $4.99/hr and Lambda Labs at $4.99-$5.29/hr are the lowest, followed by Nebius at $5.50/hr and Spheron at $6.02/hr. AWS p6-b200 at roughly $14.24/hr is the most expensive option in this tier.
Cheapest RTX 4090 per hour: Vast.ai marketplace hosts go as low as $0.31/hr and RunPod Community lists $0.34/hr; Spheron lists RTX 4090 on-demand at $0.55/hr. For non-production workloads that tolerate marketplace host variability, Vast.ai wins on hourly cost. For a production endpoint, Spheron's dedicated tier delivers SLA-backed reliability at competitive cost.
Cheapest serverless GPU inference: Modal and RunPod Serverless both bill per-second of active compute, which can undercut hourly billing for sub-minute request workloads. The break-even is roughly 30% GPU utilization: below that, serverless wins on total cost; above that, dedicated hourly billing on Spheron or any neo-cloud is cheaper because there is no cold-start overhead.
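The break-even point can be sketched as the utilization at which per-second serverless billing equals a flat hourly rate. The serverless rate below is a hypothetical value chosen for illustration, not a quoted Modal or RunPod price, and the model ignores cold-start time and minimum billing increments.

```python
def break_even_utilization(dedicated_hourly: float, serverless_per_sec: float) -> float:
    """Fraction of each hour the GPU must be busy for both options to cost the same.

    Dedicated cost per hour  = dedicated_hourly            (billed busy or idle)
    Serverless cost per hour = serverless_per_sec * 3600 * utilization
    """
    return dedicated_hourly / (serverless_per_sec * 3600)


dedicated_hourly = 2.01       # H100 PCIe on-demand rate from this post
serverless_per_sec = 0.00186  # hypothetical per-second serverless rate

u = break_even_utilization(dedicated_hourly, serverless_per_sec)
print(f"Break-even utilization: {u:.0%}")
print("Below this, serverless is cheaper; above it, dedicated hourly billing is cheaper.")
```

The result is driven entirely by the ratio of the two rates, so recompute it with the actual serverless price for your GPU class before deciding.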
The pattern across every GPU class: hyperscalers are not the cheapest option for any GPU model in 2026. The cheapest provider is usually a neo-cloud or marketplace, and within that group Spheron spot pricing leads on H100, A100, and B200 for fault-tolerant workloads. For workloads that cannot tolerate interruption, on-demand pricing across Spheron, Lambda, RunPod, and Nebius sits within 20% of each other and the deciding factor is usually GPU availability in your region rather than hourly cost.
Indian teams comparing costs should also check our dedicated GPU cloud guide for India for INR-equivalent pricing and domestic provider options including E2E Networks, Yotta, and Tata Communications.
For teams optimizing total compute spend across providers, see our GPU cost optimization playbook and heterogeneous GPU inference cost optimization guides.
Spheron gives you on-demand access to H100, H200, B200, A100, L40S, and RTX 4090 GPUs with per-minute billing, no egress fees, and spot pricing that cuts costs by up to 64% compared to on-demand. No contracts, no minimums, no hidden fees.
Frequently Asked Questions
As of May 2026, Spheron lists H100 PCIe at $2.01/hr on-demand and H100 SXM5 at $2.50/hr on-demand ($1.03/hr on spot), among the lowest H100 rates available. Lambda Labs lists H100 SXM at $2.49-$3.44/hr depending on configuration, AWS at ~$6.88/hr, and Azure at ~$12.29/hr (per GPU on the ND96isr H100 v5 instance). Spheron spot pricing cuts H100 SXM5 costs by ~59% for fault-tolerant workloads.
As of May 2026, Spheron's B200 SXM6 is $6.02/hr on-demand with $2.12/hr spot pricing. RunPod Secure Cloud is $4.99/hr, Nebius $5.50/hr, Lambda Labs $4.99-$5.29/hr, and AWS p6-b200 approximately $14.24/hr on-demand. At $2.12/hr spot, the B200 offers roughly 2.4x the memory bandwidth of an H100 SXM5, and about 4x that of the H100 PCIe ($2.01/hr), for a minimal premium, making it the clear price-performance leader for fault-tolerant inference workloads.
The most common hidden costs are: egress bandwidth fees ($0.08-$0.12/GB on hyperscalers, free or flat on most neo-clouds), persistent storage ($0.08-$0.15/GB/month), minimum rental commitments (some providers require 1-hour or 1-day minimums), and network/IP address fees. AWS, GCP, and Azure typically charge $0.08-$0.12/GB for data egress, which can exceed the GPU cost for large model checkpoints.
Spot pricing is worth it for batch training jobs, offline inference, and any workload that can checkpoint and resume after interruption. Savings range from 40-65% below on-demand rates. Avoid spot for production inference APIs, real-time serving, or jobs without checkpointing. Most GPU cloud providers, including Spheron, RunPod, and Vast.ai, offer spot instances.
Reserved GPU pricing (also called committed-use or contract pricing) typically requires 1-month to 12-month commitments on neo-clouds, and 1-year or 3-year terms on hyperscalers, in exchange for 20-40% discounts vs on-demand rates. AWS EC2 reserved instances, Azure reserved VMs, and GCP committed-use contracts all follow this model. Neo-cloud providers like Lambda Labs and CoreWeave offer reserved clusters at negotiated rates. Spheron offers volume discounts via direct contact for longer commitments.
