Rent NVIDIA A100 80GB GPUs on Demand from $0.45/hr
80GB HBM2e, NVLink 600 GB/s, MIG, per-minute billing. Live in under 2 minutes.
Renting an NVIDIA A100 80GB on Spheron starts at $0.45 per GPU per hour on dedicated (99.99% SLA), with interruptible spot instances cheaper still. There is no minimum commit, billing is per minute, and most instances are live inside two minutes. The A100 has 80GB of HBM2e and 2.0 TB/s of memory bandwidth, enough to train or fine-tune models up to about 30B parameters on a single card and serve quantized 70B models at production latency. SXM variants add 600 GB/s NVLink between GPUs for multi-GPU training. Hyperscaler on-demand A100 80GB pricing runs roughly $3.40 per GPU per hour on AWS p4de, $4.10 on Azure ND A100 v4, and about $5.00 on GCP a2-ultragpu.
Technical specifications
Pricing comparison
| Provider | Price/hr | Cost vs. Spheron |
|---|---|---|
| Spheron (your price) | $0.45/hr | - |
| Jarvislabs | $1.49/hr | 3.3x more expensive |
| TensorDock | $1.57/hr | 3.5x more expensive |
| Lambda Labs | $2.49/hr | 5.5x more expensive |
| AWS p4de | $3.43/hr | 7.6x more expensive |
| Azure ND A100 v4 | $4.10/hr | 9.1x more expensive |
| Google Cloud | $5.07/hr | 11.3x more expensive |
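The multiples in the table are each provider's hourly price divided by the Spheron price. The sketch below reproduces that arithmetic and also estimates the cost of a short run under per-minute billing. The function names are illustrative, and how partial minutes are rounded is an assumption here, not something stated on this page.

```python
# Hourly A100 80GB prices copied from the comparison table (USD per GPU-hour).
PRICES = {
    "Spheron": 0.45,
    "Jarvislabs": 1.49,
    "TensorDock": 1.57,
    "Lambda Labs": 2.49,
    "AWS p4de": 3.43,
    "Azure ND A100 v4": 4.10,
    "Google Cloud": 5.07,
}


def price_multiple(provider: str, baseline: str = "Spheron") -> float:
    """Return how many times more expensive `provider` is than `baseline`."""
    return round(PRICES[provider] / PRICES[baseline], 1)


def run_cost(rate_per_hour: float, minutes: int, gpus: int = 1) -> float:
    """Estimate the cost of a run billed per whole minute (assumed granularity)."""
    return round(rate_per_hour * minutes * gpus / 60, 2)


for name in PRICES:
    print(f"{name}: {price_multiple(name)}x the Spheron price")

# A two-hour job on one A100 at the listed Spheron rate.
print(f"120 minutes on Spheron: ${run_cost(PRICES['Spheron'], 120):.2f}")
```

Swap in current quotes before relying on the output, since on-demand prices change often.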
Need More A100 Than What's Listed?
Reserved Capacity
Commit to a duration, lock in availability and better rates
Custom Clusters
8 to 512+ GPUs, specific hardware, InfiniBand configs on request
Supplier Matchmaking
Spheron sources from its certified data center network, negotiates pricing, handles setup
Need more A100 capacity? Tell us your requirements and we'll source it from our certified data center network.
Typical turnaround: 24–48 hours
When to pick the A100
Pick the A100 if
You are training or fine-tuning a 7B to 30B parameter model, serving a quantized 70B model, or running classic workloads like BERT, ResNet, recommender systems, and RAPIDS analytics. The A100 is also the right call when you want the most mature ML stack on the market and are happy to give up FP8 support (the A100's Tensor Cores do not have it) in exchange for 40 to 60 percent lower hourly cost than H100.
Pick the H100 instead if
Your workload is FP8-native (Llama 3 / DeepSeek inference, FP8 training runs) or you need Transformer Engine speedups. H100 is roughly 2.5 to 3x faster on Tensor Core math and has about 1.7x the memory bandwidth, but it costs about 2x as much. If the speedup pays for itself, make the jump.
Pick the L40S instead if
You are running pure inference on sub-30B models, or batch image and video generation. L40S has 48GB GDDR6 and a much lower hourly cost, with strong FP8 and Ada Lovelace Tensor Cores. It has no NVLink, so it is not the right pick for multi-GPU training.
Pick the RTX 4090 instead if
You are doing development, small-scale fine-tuning, or sub-13B inference on a budget. The 4090 has 24GB VRAM and no NVLink, but it is the cheapest way to run modern AI stacks. Step up to A100 once you need more memory or multi-GPU scaling.
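Whether the H100 speedup "pays for itself" comes down to one ratio: a job's cost scales with hourly price times runtime, so the faster card is cheaper per job only when its speedup exceeds its price premium. The sketch below is a rough check using the approximate figures quoted above (about 2x the price, 2.5 to 3x on Tensor Core math, 1.7x on memory bandwidth). The function names are illustrative, and the speedup you actually measure on your workload is what should go in.

```python
def relative_job_cost(price_ratio: float, speedup: float) -> float:
    """Cost of a job on the faster GPU relative to the baseline GPU.

    price_ratio: faster GPU's hourly price divided by the baseline's.
    speedup: baseline runtime divided by the faster GPU's runtime.
    A result below 1.0 means the faster GPU is cheaper per job.
    """
    if price_ratio <= 0 or speedup <= 0:
        raise ValueError("price_ratio and speedup must be positive")
    return price_ratio / speedup


def upgrade_pays_off(price_ratio: float, speedup: float) -> bool:
    """True when the speedup outweighs the price premium."""
    return relative_job_cost(price_ratio, speedup) < 1.0


# Compute-bound case: ~2x the price, ~2.5x faster.
print(upgrade_pays_off(price_ratio=2.0, speedup=2.5))

# Bandwidth-bound case: ~2x the price, but only ~1.7x faster.
print(upgrade_pays_off(price_ratio=2.0, speedup=1.7))
```

The same comparison works for A100 versus L40S or RTX 4090 once you have measured throughput on each.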
Ideal use cases
LLM training and fine-tuning
Train or fine-tune models in the 7B to 30B range with mixed precision. FSDP and DeepSpeed ZeRO scale cleanly across 8x A100 with NVLink, and LoRA / QLoRA bring 70B fine-tuning within reach on a single card when the base model is quantized.
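A quick way to sanity-check these size ranges is to estimate weight memory as parameter count times bytes per parameter. The sketch below covers weights only. Activations, optimizer state, LoRA adapters, and KV cache all come on top, so the `headroom_gb` value is a hypothetical placeholder rather than a measured figure.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_on_card(
    params_billion: float,
    precision: str,
    card_gb: float = 80.0,
    headroom_gb: float = 10.0,  # hypothetical allowance for everything except weights
) -> bool:
    """Rough check: do the weights plus a fixed headroom fit in GPU memory?"""
    return weight_memory_gb(params_billion, precision) + headroom_gb <= card_gb


for size, precision in [(30, "bf16"), (70, "fp16"), (70, "int4")]:
    gb = weight_memory_gb(size, precision)
    verdict = "fits" if fits_on_card(size, precision) else "does not fit"
    print(f"{size}B @ {precision}: ~{gb:.0f} GB of weights, {verdict} on one 80GB card")
```

This is why a 70B base model needs 4-bit quantization (the QLoRA approach) to sit on a single 80GB card, while 16-bit weights alone would exceed it.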
Production LLM inference
Serve models at steady latency with vLLM, TensorRT-LLM, or Triton. INT8 and FP16 paths are well optimized, and MIG lets you carve one A100 into up to 7 isolated inference slots.
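To see how the "up to 7 slots" figure trades off against memory per slot, the sketch below lists the standard MIG profiles on an A100 80GB and picks the smallest one that holds a given footprint. Sizes are the nominal profile sizes (usable memory is slightly lower in practice), and the helper name is illustrative.

```python
from typing import Optional, Tuple

# (profile name, nominal memory in GB, max instances per A100 80GB),
# ordered from smallest to largest slice.
MIG_PROFILES = [
    ("1g.10gb", 10, 7),
    ("2g.20gb", 20, 3),
    ("3g.40gb", 40, 2),
    ("4g.40gb", 40, 1),
    ("7g.80gb", 80, 1),
]


def smallest_profile(required_gb: float) -> Optional[Tuple[str, int]]:
    """Return (profile, max instances) for the smallest slice that fits, or None."""
    for name, memory_gb, max_instances in MIG_PROFILES:
        if memory_gb >= required_gb:
            return name, max_instances
    return None


# Example: a small model that needs about 8 GB including KV cache.
print(smallest_profile(8))

# Example: a footprint of about 16 GB needs a larger slice, so fewer slots.
print(smallest_profile(16))
```

Seven slots are only available with the smallest profile, so size the slice to the model before counting on seven-way sharing.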
Classic ML and computer vision
The A100 still holds the line on computer vision and recommender workloads that predate the LLM wave. Mature CUDA kernels, stable ecosystem, predictable throughput.
GPU data analytics and HPC
RAPIDS, cuDF, cuGraph, and GPU-accelerated SQL engines all target A100 first. FP64 throughput is 9.7 TFLOPS, enough for most simulation work that does not need Hopper-class double precision.
Performance benchmarks
Serve Llama 3.1 8B on an A100 in under 2 minutes
Spin up a Spheron A100 80GB, pull the vLLM image, and serve Llama 3.1 8B with an OpenAI-compatible API. Point any OpenAI SDK client at the endpoint and you are done.
```bash
# 1. Provision an A100 80GB from the Spheron CLI (or use the dashboard)
spheron deploy --gpu a100-80gb --image vllm/vllm-openai:latest

# 2. Inside the instance, serve Llama 3.1 8B Instruct
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.92 \
  --port 8000

# 3. Hit the endpoint from any OpenAI-compatible client
curl http://<instance-ip>:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Summarize MIG partitioning on A100."}]
  }'
```

For 70B inference, add --tensor-parallel-size 2 and rent 2x A100 80GB with NVLink. For multi-node training, contact us for InfiniBand-connected clusters.
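The same chat-completions request as the curl call can be made from Python. The sketch below uses only the standard library, so it runs without installing an SDK. The IP address is a hypothetical placeholder for your instance, and the helper names are illustrative. The response parsing assumes the usual OpenAI-style `choices[0].message.content` shape.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Hypothetical address; replace with your instance's IP.
request = build_chat_request(
    "http://203.0.113.10:8000",
    "meta-llama/Llama-3.1-8B-Instruct",
    "Summarize MIG partitioning on A100.",
)
print(request.full_url)
# Call send(request) once the vLLM server is up to get the model's reply.
```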
Multi-GPU A100 with NVLink and InfiniBand
A100 SXM4 nodes on Spheron link 8 GPUs with NVLink at 600 GB/s intra-node, and multi-node jobs use 200 Gb/s HDR InfiniBand with GPUDirect RDMA. That is the same fabric NVIDIA ships in DGX A100 systems, so PyTorch DDP, DeepSpeed ZeRO, and Megatron-LM run at close to linear scaling.
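For a feel of why interconnect bandwidth matters for scaling, the sketch below gives an idealized estimate of one gradient all-reduce. It assumes a ring all-reduce, where each GPU sends and receives 2(N-1)/N times the payload, and it ignores latency, protocol overhead, and overlap with compute. The per-direction bandwidths are assumptions derived from the figures above: half of the 600 GB/s bidirectional NVLink figure, and 200 Gb/s InfiniBand expressed as 25 GB/s.

```python
def ring_allreduce_seconds(
    payload_bytes: float, num_gpus: int, link_bytes_per_s: float
) -> float:
    """Idealized ring all-reduce time: bandwidth term only, no latency."""
    if num_gpus < 2:
        return 0.0  # nothing to exchange on a single GPU
    traffic_per_gpu = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return traffic_per_gpu / link_bytes_per_s


# Gradients for a 7B-parameter model in 16-bit precision: 7e9 params * 2 bytes.
GRADIENT_BYTES = 7e9 * 2

NVLINK_PER_DIRECTION = 300e9  # assumed: half of 600 GB/s bidirectional
INFINIBAND_HDR = 25e9         # 200 Gb/s expressed in bytes per second

intra_node = ring_allreduce_seconds(GRADIENT_BYTES, 8, NVLINK_PER_DIRECTION)
over_ib = ring_allreduce_seconds(GRADIENT_BYTES, 8, INFINIBAND_HDR)

print(f"8 GPUs over NVLink:     ~{intra_node * 1000:.0f} ms per all-reduce")
print(f"8 GPUs over one IB link: ~{over_ib * 1000:.0f} ms per all-reduce")
```

Real frameworks overlap communication with the backward pass and use hierarchical collectives across nodes, so treat these numbers as a lower bound on communication time and not as a throughput prediction.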
Need a custom multi-node cluster or reserved capacity? Talk to us about topology, regions, and committed pricing.
A100 vs alternatives
A100 vs V100: A100 is roughly 2.5 to 3x faster than V100 on training and inference, with 2.5x the memory. V100 is effectively end-of-life for modern LLM work.
A100 vs L40S: L40S is cheaper and strong at single-GPU inference with FP8, but has no NVLink. A100 wins for multi-GPU training and 70B INT4 serving that needs 80GB.
Related resources
NVIDIA A100 vs V100: Specs, Benchmarks, and When to Upgrade
Side-by-side Ampere vs Volta comparison with benchmarks and migration guidance.
A100 Deployment Guide: SXM vs PCIe, Spot vs Dedicated, MIG
Deep dive on A100 configurations, interconnects, MIG partitioning, and deployment patterns on Spheron.
Best NVIDIA GPUs for LLMs
Framework for matching GPU choice to model size, from 7B on A100 to 670B on B200.
GPU Memory Requirements for Large Language Models
Calculate VRAM needs across precision levels and KV-cache pressure for every major model class.
How a 12-Person Startup Trained a 70B Model for $11,200
Cost breakdown for training a 70B model using spot A100 instances with aggressive checkpointing.
GPU Cost Optimization Playbook
Practical tactics to cut A100 spend: spot scheduling, MIG, batching, right-sizing.