Spheron GPU Catalog

Rent NVIDIA H200 GPUs on Demand from $1.19/hr

141GB HBM3e, 4.8 TB/s bandwidth, NVLink, per-minute billing. Live in under 2 minutes.

At a glance

Renting an NVIDIA H200 on Spheron starts at $1.19 per GPU-hour on dedicated instances (99.99% SLA), with interruptible spot instances cheaper still. Billing is per minute, there is no minimum commitment, and most instances are live inside two minutes. The H200 shares the H100's Hopper compute (4th-gen Tensor Cores, FP8 via the Transformer Engine, 989 TFLOPS TF32 and 3,958 TFLOPS FP8, both with sparsity) and raises memory to 141GB of HBM3e at 4.8 TB/s. That makes it the better pick when the H100 is memory-bound: long-context inference, 70B to 100B serving at large batch sizes, multi-model colocation, and RAG. Specialist clouds price the H200 around $3.80 to $4.00 per GPU-hour (Lambda, Jarvislabs, RunPod), while hyperscalers run $4.98/hr on AWS p5e, ~$6.31/hr on CoreWeave, and ~$10.60 to $10.87/hr on Azure ND H200 v5 and GCP a3-ultragpu on demand.

GPU Architecture: NVIDIA Hopper
VRAM: 141 GB HBM3e
Memory Bandwidth: 4.8 TB/s

Technical specifications

GPU Architecture: NVIDIA Hopper
VRAM: 141 GB HBM3e
Memory Bandwidth: 4.8 TB/s
Tensor Cores: 528 (4th Gen)
CUDA Cores: 16,896
FP64 Performance: 34 TFLOPS
FP32 Performance: 67 TFLOPS
TF32 Tensor: 989 TFLOPS (with sparsity)
FP16 Tensor: 1,979 TFLOPS (with sparsity)
FP8 Tensor: 3,958 TFLOPS (with sparsity)
INT8 Tensor: 3,958 TOPS (with sparsity)
NVLink Bandwidth: 900 GB/s
System RAM: 200 GB DDR5
vCPUs: 16
Storage: 465 GB NVMe Gen4
Form Factor: SXM5
TDP: 700W

Pricing comparison

Provider: price per hour (vs Spheron)
Spheron (your price): $1.19/hr
Lambda: $3.79/hr (3.2x more expensive)
Jarvislabs: $3.80/hr (3.2x more expensive)
RunPod: $3.99/hr (3.4x more expensive)
AWS p5e: $4.98/hr (4.2x more expensive)
CoreWeave: $6.31/hr (5.3x more expensive)
Azure ND H200 v5: $10.60/hr (8.9x more expensive)
Google Cloud a3-ultragpu: $10.87/hr (9.1x more expensive)
Custom & Reserved

Need More H200 Than What's Listed?

Reserved Capacity

Commit to a duration, lock in availability and better rates

Custom Clusters

8 to 512+ GPUs, specific hardware, InfiniBand configs on request

Supplier Matchmaking

Spheron sources from its certified data center network, negotiates pricing, handles setup

Need more H200 capacity? Tell us your requirements and we'll source it from our certified data center network.

Typical turnaround: 24–48 hours

When to pick the H200

Scenario 01

Pick the H200 if

Your workload is memory-bound on H100. That means long-context LLM inference (32K+ tokens), 70B to 100B serving at production batch sizes, multi-model colocation on a single GPU, or RAG stacks where embedding stores and the LLM need to live in VRAM together. H200 gives you 1.76x the memory and 1.43x the bandwidth of H100, same Hopper compute.
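
A quick way to see the memory-bound case concretely is a back-of-envelope KV-cache estimate. The sketch below assumes Llama 3.1 70B served in FP8 (80 layers, 8 KV heads with GQA, head dim 128, roughly 70 GB of weights); real headroom depends on the serving stack, so treat the numbers as rough, not exact.

bash
# Rough KV-cache math for Llama 3.1 70B in FP8.
# Assumptions: 80 layers, 8 KV heads (GQA), head dim 128, 1 byte per FP8 value, ~70 GB weights.
python3 - <<'EOF'
layers, kv_heads, head_dim, bytes_per_val = 80, 8, 128, 1
per_token = 2 * layers * kv_heads * head_dim * bytes_per_val     # K and V, all layers
per_seq_gb = per_token * 32_768 / 1e9                            # one 32K-token sequence
weights_gb = 70                                                  # FP8 weights, approximate
for name, vram in [("H100 80GB", 80), ("H200 141GB", 141)]:
    free = vram - weights_gb
    print(f"{name}: ~{per_seq_gb:.1f} GB KV per 32K sequence, "
          f"~{int(free // per_seq_gb)} full-length sequences fit")
EOF

The ratio, not the exact count, is the point: the same weights leave roughly seven times the KV-cache headroom on H200.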

Scenario 02

Pick the H100 instead if

You are training on models up to 70B or running inference that comfortably fits in 80GB. H100 has identical Tensor Core math and Transformer Engine, at a lower hourly rate. Move to H200 only when memory capacity or KV-cache headroom is the bottleneck.

Scenario 03

Pick the B200 instead if

You need maximum throughput on the largest models. B200 delivers ~2.3x H100's FP8 dense TFLOPS and ships 192GB HBM3e at 8 TB/s. For trillion-parameter training or FP4 inference, B200 is the right call. For H100-class compute with bigger memory, stay on H200.

Scenario 04

Pick the A100 instead if

You are doing classic training up to 30B parameters, quantized inference, or cost-sensitive fine-tuning. A100 80GB costs roughly a third of H200 and the mature stack still delivers. Skip to H200 when FP8 or 141GB matter.


Ideal use cases

Use case / 01
💬

Long-context LLM inference

141GB HBM3e lets you serve 70B to 100B models at large batch sizes with room left for KV cache on 32K+ context windows. Transformer Engine and FP8 keep latency low; the extra memory keeps throughput high.

Llama 3.1 70B FP8 at batch 128+ with 32K context
Mixtral 8x22B and DeepSeek V3 serving on fewer GPUs
Long-document chat with 100K+ token windows
High-concurrency agent and copilot backends
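
A minimal long-context launch sketch for the serving pattern above, assuming vLLM's OpenAI-compatible server (the full provisioning walkthrough appears later on this page); the 128K window and FP8 KV cache are illustrative settings, not requirements.

bash
# Long-context serving sketch: Llama 3.1 70B with a 128K window and an FP8 KV cache
# to stretch the 141 GB further. Tune the values for your own model and traffic.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --quantization fp8 \
  --kv-cache-dtype fp8 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.92 \
  --port 8000
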
Use case / 02
📚

Multi-model and RAG serving

Colocate a 30B chat model, a 7B code model, and an embedding model on one card. Keep vector indices and reranker weights resident in VRAM alongside the LLM for sub-10 ms retrieval.

Enterprise RAG with embeddings + LLM in-memory
Legal / medical AI stacks with specialized models
A/B serving of multiple model versions on one GPU
Reranker + LLM pipelines without cross-GPU hops
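
One hedged way to colocate models is to run separate vLLM servers on the same card, each capped to a slice of VRAM with --gpu-memory-utilization; the models, fractions, and ports below are illustrative, and clean sharing of one GPU between processes depends on your vLLM version.

bash
# Illustrative colocation on one 141 GB H200: a ~30B chat model and a 7B code model
# run as independent servers, each limited to a fraction of total GPU memory.
vllm serve Qwen/Qwen2.5-32B-Instruct \
  --gpu-memory-utilization 0.55 --port 8000 &
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct \
  --gpu-memory-utilization 0.25 --port 8001 &
wait
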
Use case / 03
🎯

LLM fine-tuning and RLHF

Fine-tune 70B models with larger per-GPU batches, or run full SFT on 30B models without sharding. LoRA and QLoRA on 100B-class models become single-node jobs.

Llama 3.1 70B supervised fine-tuning
RLHF / DPO on 30B to 70B models
LoRA / QLoRA on 100B+ checkpoints
Domain tuning for code, legal, medical
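
As a sketch of the single-card case, the snippet below assumes a QLoRA run on one H200: the 4-bit base model (~35 GB for a 70B) plus adapters, optimizer state, and activations fit in 141 GB. finetune_qlora.py is a hypothetical stand-in for whatever trainer you use (PEFT + bitsandbytes, torchtune, axolotl), so its arguments are assumptions.

bash
# Hypothetical single-H200 QLoRA fine-tune; only the libraries installed here are real,
# finetune_qlora.py and its flags stand in for your own training script.
pip install transformers peft bitsandbytes datasets accelerate
python finetune_qlora.py \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --load-in-4bit \
  --lora-rank 16 \
  --dataset my-domain-sft
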
Use case / 04

High-throughput inference at scale

For production serving where tokens per dollar matters, H200 widens the batch without running out of memory. Pair with TensorRT-LLM or vLLM for best throughput.

Chat backends at millions of tokens per minute
Code-generation services with large KV cache
Multi-tenant inference with dynamic batching
Speculative decoding with draft + target models resident
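
A throughput-oriented launch sketch: --max-num-seqs and prefix caching are standard vLLM knobs for continuous batching and shared system prompts, but the values here are assumptions to tune against your own traffic.

bash
# Throughput-first vLLM launch: raise the concurrent-sequence cap and reuse cached
# prefixes so shared system prompts are not recomputed on every request.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --quantization fp8 \
  --max-num-seqs 256 \
  --enable-prefix-caching \
  --gpu-memory-utilization 0.95 \
  --port 8000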

Performance benchmarks

Llama 2 70B inference (MLPerf v5.0): ~33,000 tok/s (~1.4x H100 offline)
Peak single-GPU throughput: up to 1.9x H100 on memory-bound decode
Memory bandwidth: 4.8 TB/s vs 3.35 TB/s on H100 (1.43x)
VRAM capacity: 141 GB HBM3e, 1.76x H100 (80 GB HBM3)
FP8 Tensor (with sparsity): 3,958 TFLOPS, same Hopper compute as H100
Concurrent model serving: 3 to 5 models (20B to 70B class) resident on one card

Serve Llama 3.1 70B FP8 on one H200 in under 3 minutes

H200's 141GB fits Llama 3.1 70B in FP8 with plenty of KV-cache headroom for long contexts. This snippet pulls the vLLM image, serves the model with an OpenAI-compatible API, and enables FP8 for best throughput.

bash
# 1. Provision an H200 from the Spheron CLI (or use the dashboard)
spheron deploy --gpu h200 --image vllm/vllm-openai:latest

# 2. Inside the instance, serve Llama 3.1 70B Instruct in FP8
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --quantization fp8 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.92 \
  --port 8000

# 3. Hit the endpoint from any OpenAI-compatible client
curl http://<instance-ip>:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [{"role": "user", "content": "Summarize why the H200 suits memory-bound workloads."}]
  }'

For models above 141GB or extreme concurrency, add --tensor-parallel-size N and rent a multi-GPU H200 cluster with NVLink. For multi-node InfiniBand clusters, contact us.
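
For example, a hedged sketch of serving Llama 3.1 405B in FP8 across an 8x H200 node; the checkpoint name and memory fraction are illustrative.

bash
# Tensor-parallel serving across all 8 GPUs in one node; NVLink carries the per-layer
# all-reduce traffic. ~405 GB of FP8 weights spread to roughly 51 GB per GPU.
vllm serve meta-llama/Llama-3.1-405B-Instruct-FP8 \
  --tensor-parallel-size 8 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.92 \
  --port 8000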

Interconnect fabric

Multi-GPU H200 with NVLink and InfiniBand

H200 SXM5 nodes on Spheron connect 8 GPUs with 900 GB/s NVLink inside a node and 400 Gb/s NDR InfiniBand across nodes. That fabric matches NVIDIA's HGX H200 reference design, so tensor-parallel inference with vLLM or TensorRT-LLM, and pipeline-parallel training with Megatron-LM, scale close to linearly.

01. 900 GB/s NVLink between GPUs inside a node
02. 400 Gb/s NDR InfiniBand across nodes
03. GPUDirect RDMA for zero-copy GPU-to-GPU transfers
04. 1.1 TB unified GPU memory in an 8x H200 node
05. NCCL pre-tuned for H200 topology
06. Tested with vLLM, TensorRT-LLM, SGLang, Megatron-LM, DeepSpeed ZeRO-3
07. Tensor-parallel and pipeline-parallel serving for 200B+ models
08. Sub-microsecond latency for GPU-to-GPU communication
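
To sanity-check the fabric on a freshly provisioned node, two standard checks help: nvidia-smi's topology matrix (NV-prefixed entries indicate NVLink paths) and NVIDIA's nccl-tests all-reduce benchmark. The build steps below assume CUDA and a compiler are already on the image.

bash
# Show how the 8 GPUs are wired; NV* entries in the matrix mean NVLink.
nvidia-smi topo -m

# Measure intra-node all-reduce bandwidth with NVIDIA's nccl-tests.
git clone https://github.com/NVIDIA/nccl-tests.git && cd nccl-tests
make -j
./build/all_reduce_perf -b 8 -e 8G -f 2 -g 8   # sweep 8 B to 8 GB across all 8 GPUs
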
Scale

Need a custom multi-node cluster or reserved capacity?

H200 vs alternatives

Related resources

FAQ / 14

Frequently asked questions

Also consider