
Vast.ai Price/Performance Breakdown

Oct 18, 2024

By Jan Akhremchik

 

Vast.ai is a cloud marketplace for the computational resources behind typical AI workflows. It lets people around the world rent out their hardware and lets others use that hardware for their own needs. The platform itself only connects buyers and sellers and doesn't own any computational resources. Its main advantage is that resource prices are lower than competitors'.

But is the pricing really that obvious and transparent? And what happens when you let people sell resources from hardware they built themselves?

Let’s check it, shall we?

Cost

First, let's look at some numbers. We'll compare Vast.ai with competitors in a similar price range, using the most common and popular configurations.

| Provider | GPU Model | VRAM (GB) | Hourly Rate (USD) | Monthly Cost (8 hrs/day, 22 days) |
|---|---|---|---|---|
| Vast.ai | RTX 3090 | 24 | $0.16 | $27.28 |
| Vast.ai | RTX 4090 | 24 | $0.35 | $61.60 |
| Vast.ai | A40 | 48 | $0.28 | $49.81 |
| Vast.ai | A5000 | 24 | $0.20 | $35.73 |
| Vast.ai | A6000 | 48 | $0.50 | $88.00 |
| tensordock.com | RTX 3090 | 24 | $0.20 | $35.20 |
| tensordock.com | RTX 4090 | 24 | $0.48 | $84.48 |
| tensordock.com | A40 | 48 | $0.53 | $93.28 |
| tensordock.com | A6000 | 48 | $0.45 | $79.20 |
| runpod.io | RTX 3090 | 24 | $0.43 | $75.68 |
| runpod.io | RTX 4090 | 24 | $0.69 | $121.44 |
| runpod.io | A5000 | 24 | $0.43 | $75.68 |
| runpod.io | A6000 | 48 | $0.76 | $133.76 |

The lower, the better, right? Well, not exactly.

When you see a page like that, packed with numbers, it's easy to get overwhelmed and fixate only on the words **RTX 4090** and **$0.35/hr**. But there is more to it.

Let’s look at the price breakdown for that RTX 4090 machine.

That's pretty transparent, except for two nuances:

  1. Internet: This host charges you for each terabyte of downloaded/uploaded data, so your actual monthly expenses can vary. Keep in mind that each host sets its own traffic price, and it isn't included in the overall estimate because it can't be calculated beforehand. In some cases, it can get expensive quickly.
  2. Storage: With on-demand resources, it's easy to assume you pay only while the machine is running, but that's not the case for disk space. Disk space costs you money whether or not you're using your machine, and you need to account for that. In some cases, a higher per-hour rate may end up cheaper in the long run thanks to lower idle costs (see the cost sketch after this list).
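To see how these extras change the picture, below is a minimal cost-estimation sketch. The storage and traffic rates in it are hypothetical placeholders; substitute the values from the host's own price breakdown.

```python
# Rough monthly cost estimator for a rented GPU instance.
# The storage and traffic rates used below are hypothetical
# placeholders; plug in the host's actual price breakdown.

def monthly_cost(
    gpu_rate_per_hr: float,            # on-demand GPU price, $/hr
    storage_rate_per_gb_month: float,  # disk price, $/GB/month
    disk_gb: float,                    # allocated disk size
    traffic_rate_per_tb: float,        # up/download price, $/TB
    traffic_tb: float,                 # expected monthly traffic
    hours_per_day: float = 8,
    days: int = 22,
) -> float:
    compute = gpu_rate_per_hr * hours_per_day * days
    # Storage is billed whether or not the instance is running.
    storage = storage_rate_per_gb_month * disk_gb
    traffic = traffic_rate_per_tb * traffic_tb
    return compute + storage + traffic

# The $0.35/hr RTX 4090 with 100 GB of disk and 1 TB of traffic:
print(f"${monthly_cost(0.35, 0.10, 100, 5.00, 1.0):.2f}")  # $76.60
```

Even with made-up storage and traffic rates, the bill lands noticeably above the headline $61.60, which is exactly the point.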

Efficiency

When choosing a provider, it's wise to know how capable their hardware really is. The aspects that most deserve your attention are:

  • CPU: In computationally expensive operations, like training a neural network, the CPU acts as a broker between the data on your disk, your RAM, and your GPU memory. The CPU handles copying that data and, in some cases, preprocessing. It can therefore become a bottleneck in your training pipeline, struggling to feed the GPU with data continuously. It's recommended to use server-grade CPUs with 10+ dedicated cores, depending on how much preprocessing you need.

Below is a perfect example of a misconfigured machine, not only in terms of CPU but of the whole package overall.

  • Disk speed: Although disk speeds are generally decent across hosts, keep in mind the same potential bottleneck mentioned in the CPU section: slow disk reads can leave the GPU idling during computation. Usually, 2000+ MB/s is enough for most scenarios.
  • GPU interface: An aspect sometimes neglected by hosts is the bus itself. If the number of PCIe lanes or their version is lower than the GPU supports, it becomes another computational bottleneck. Nowadays, anything below "PCIe 4.0 x16" **will** lead to data-copying bottlenecks for Ampere/Ada-generation GPUs.

Here is an example of a poorly configured machine. Apart from the strange, poorly performing "nvme" disk, the GPU bandwidth is half of what it should be for an RTX A5000 card. That will result in a performance drop during computation.
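If you want to sanity-check an instance before committing to it, a quick benchmark of disk reads and host-to-GPU copies will expose both bottlenecks described above. Here's a minimal sketch using PyTorch; the 1 GiB buffer size and the test-file path are arbitrary choices.

```python
# Quick sanity check of disk read speed and host-to-GPU copy
# bandwidth on a rented instance. Requires PyTorch with CUDA;
# the 1 GiB buffer size and the test-file path are arbitrary.
import os
import time

import torch

GIB = 1 << 30

# --- Disk read throughput (note: the page cache inflates this;
# reading a freshly downloaded dataset gives an honest number) ---
path = "/tmp/disk_test.bin"
with open(path, "wb") as f:
    f.write(os.urandom(GIB))

start = time.perf_counter()
with open(path, "rb") as f:
    f.read()
print(f"disk read: {1 / (time.perf_counter() - start):.1f} GiB/s")
os.remove(path)

# --- Host-to-device copy bandwidth (exercises the PCIe link) ---
host = torch.empty(GIB, dtype=torch.uint8, pin_memory=True)
gpu = torch.empty(GIB, dtype=torch.uint8, device="cuda")
torch.cuda.synchronize()
start = time.perf_counter()
gpu.copy_(host, non_blocking=True)
torch.cuda.synchronize()
print(f"host -> GPU: {1 / (time.perf_counter() - start):.1f} GiB/s")
```

On a healthy PCIe 4.0 x16 link, the pinned host-to-device copy should land somewhere around 20 GiB/s or more; roughly half that suggests an x8 or older-generation link. You can also read the negotiated link configuration directly with `nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.width.current --format=csv`.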

Usage

When used in an 'on-demand' manner, Vast.ai does not guarantee the availability of computational resources, for an obvious reason: the hardware belongs to independent hosts, and someone else may be renting it when you come back. The practical drawback is that each time you need to start your machine, you may end up copying your data to a new instance provided by a different host under different terms, and that's rather inconvenient.
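One way to soften that inconvenience is to keep the dataset in one place and script the copy to every fresh instance. Below is a minimal sketch, assuming you have SSH access to the instance; the host, port, and both paths are placeholders for whatever the provider shows in the connection details.

```python
# Sync a local dataset to a freshly created instance over SSH.
# HOST, PORT, and both paths are placeholders.
import subprocess

HOST = "root@ssh4.vast.ai"   # placeholder address
PORT = "12345"               # placeholder port
SRC = "./dataset/"
DST = "/workspace/dataset/"

subprocess.run(
    [
        "rsync", "-az", "--partial", "--progress",
        "-e", f"ssh -p {PORT}",
        SRC, f"{HOST}:{DST}",
    ],
    check=True,
)
```

With `--partial`, an interrupted transfer can resume instead of starting over, which matters when you're paying for traffic by the terabyte.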

Conclusion

Overall, Vast.ai is a decent platform for hosts and users to find each other. Given the price-per-GPU ratio, it offers one of the most advantageous deals on the market. However, taking into account all of the points mentioned above, this platform may not be suitable for complex or demanding pipelines.