What is our primary use case?
NVIDIA RTX Series cards are very useful for edge AI, that is, for running local AI and local LLMs. Instead of running LLMs in the cloud or using general-purpose ChatGPT, you can run your own LLMs on-premises, which helps a lot. RTX cards are not very expensive, whereas NVIDIA's enterprise-level graphics cards, such as the H100 and the enterprise Blackwell cards, are very expensive. We don't use those; we use the smaller cards, with up to 32 GB of VRAM, and they are pretty good.
Our use case for NVIDIA RTX Series is RAG (Retrieval-Augmented Generation) systems. We provide local RAG solutions for companies that want to run AI queries on their own information locally so that it is not shared. If you use ChatGPT to look up information, you share your personal information, which raises a privacy issue. Our main customers are small hospitals and law firms. We expect to add more, especially radiologists and similar practices.
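As a rough illustration of the retrieval step in a local RAG setup, here is a minimal Python sketch. It is a toy under stated assumptions, not the pipeline described in this review: the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and the sample documents are hypothetical, and plain word-count similarity stands in for the embedding model a real system would likely use. Everything runs on the local machine, so no document text is sent to an outside service.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents that share words with the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, documents, top_k=2):
    """Assemble the prompt that would be handed to a locally hosted LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical in-house documents; they never leave the machine.
docs = [
    "Patient intake forms are stored in the records room.",
    "The firm bills clients monthly.",
    "Radiology reports are archived for seven years.",
]
print(build_prompt("How long are radiology reports archived?", docs, top_k=1))
```

The prompt produced by `build_prompt` would then be passed to whichever local model is running on the GPU; that generation step is left out here because it depends on the chosen runtime.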
What is most valuable?
Ray tracing is part of NVIDIA RTX Series.
The main reason to use NVIDIA RTX Series is CUDA. NVIDIA has created its own development environment, which is much easier to use. There are other vendors, such as AMD, but we have run into some compatibility issues with various LLMs on them. The models might not run on Intel or AMD GPUs, whereas NVIDIA is straightforward and easy to use.
What needs improvement?
NVIDIA RTX Series cards should provide more VRAM, because any LLM is dependent on VRAM. Right now, the NVIDIA RTX Series 5090 comes with only 32 GB of VRAM. Meanwhile, Apple has come up with its Mac Studios, which have unified memory that can be used as VRAM. If unified memory systems take hold, NVIDIA might lose some of its value. People are already buying Mac Minis with unified memory to run local AI and local LLMs. They are still not as fast as NVIDIA, but there is a chance. It is a horse race; you don't know which horse will win next.
It all depends on each card's capability: how many Tensor Cores it has and at what frequency it runs. So I'm not sure how to assess it; it depends on the architecture and on how fast your system is.
For how long have I used the solution?
I have been working with NVIDIA RTX Series for about two years.
What do I think about the stability of the solution?
Initially, there were teething problems because the drivers had issues. It runs only on Linux platforms or their own platforms. About two years ago, it was difficult: a driver would be released for Linux, and when a patch was applied to the Linux system, that driver would stop working. NVIDIA is now providing proper drivers, and things have gradually stabilized.
It is now very stable and predictable. I would rate the stability an eight.
What do I think about the scalability of the solution?
On scalability, the 3090 had NVLink, so you could put in several cards and increase the VRAM and processing power. With the 4090 and 5090, NVIDIA no longer provides NVLink, so you cannot cascade them. You need to go for enterprise-level cards, which is always an option if you are willing to pay the higher cost. Scalability on the lower-end NVIDIA RTX Series cards is not very good, which NVIDIA may have limited deliberately; I don't know. But with the old 3090 cards, NVLink adapters were available, and you could put in three or four 3090 cards and run a larger model on them.
How are customer service and support?
I communicate with the local vendor, not directly with NVIDIA.
They are very helpful and professional.
On the driver side, when there was an issue with the driver and the system was failing, I had to ask which driver works best with which version of Linux, and they gave me a proper answer. It wasn't for a repair or anything like that.
I would rate this technical support a seven.
What was our ROI?
Because we keep our spending low and deliver solutions at the lowest possible cost, we have a good return on investment. There are no worries on that front.
What's my experience with pricing, setup cost, and licensing?
I am in Australia, where an NVIDIA RTX Series 5090 costs around $4,500 to $5,000 per card from the dealer. If you go for the Pro series, the NVIDIA RTX Series Pro 6000, which has 96 GB of VRAM, costs around $14,000. Right now we use only the 5090, which is good enough for our business purposes.
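Since VRAM is what limits local LLMs, one way to read these prices is as cost per gigabyte of VRAM. The short Python sketch below does that arithmetic using only the dealer figures quoted above; the helper name `cost_per_gb` is hypothetical, and the currency is whatever the dealer quoted, which the review does not specify.

```python
def cost_per_gb(price, vram_gb):
    """Price divided by VRAM capacity, in currency units per GB."""
    return price / vram_gb


# (price, VRAM in GB) taken from the figures quoted in this review.
cards = {
    "RTX 5090 (low quote)": (4500, 32),
    "RTX 5090 (high quote)": (5000, 32),
    "RTX Pro 6000": (14000, 96),
}

for name, (price, vram_gb) in cards.items():
    print(f"{name}: {cost_per_gb(price, vram_gb):.2f} per GB of VRAM")
```

On these numbers the Pro card costs about the same per gigabyte as the 5090; the difference is the size of the up-front purchase and the larger models a single card can hold.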
Which other solutions did I evaluate?
We have been working in the AI arena for the past five or six years, and in that time we tried various Intel graphics products, Intel Neural Sticks, and similar devices. We have also evaluated AMD Radeon and Intel Arc GPUs. The issue with Intel is that it is stuck: the maximum you can get is 16 GB of VRAM. Intel cards may work well, but they heat up very quickly, consume more power, and are not easy to manage.
They need resources to manage them, and although they are a little cheaper than NVIDIA, it's not worth it.
NVIDIA is always my first choice. It is like being a Windows customer: you will normally go for Intel products, and you choose an AMD CPU only if you have a limited budget, because they are cheaper. In the same way, NVIDIA is more compatible, and people use NVIDIA because of its CUDA environment.
What other advice do I have?
I work on AI, mainly with the LLMs that are available on Hugging Face. Hugging Face is one of the premier repositories for AI models.
In our AI lab, I'm currently using the NVIDIA RTX Series 5090, a GPU card with 32 GB of VRAM. These are the latest Blackwell GPUs from NVIDIA.
I previously used the NVIDIA RTX Series 3090 and 4090, and have now moved to the 5090. The 5090 cards are pretty good.
Our current work involves data crunching rather than visual workloads. We are not doing much visual work right now, but once we start working with radiologists and others who need image processing, ray tracing will be very useful. Right now, we do more inferencing than ray tracing.
Comparing the 3090 with the 5090, DLSS on the 5090 delivers much better frame rates and is much faster, but the card also consumes a lot of power. In the winter, I don't need a room heater; the heat generated by the system is enough to keep the room warm.
For local AI, unified memory systems will consume less power once they come into play. My 5090 system requires at least a 1000-watt power supply and draws at least 650 to 800 watts, which is quite high for one card.
These new unified memory systems will run at a lower wattage and also give you the required output faster.
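To put the 650 to 800 watt figure in perspective, the following Python sketch converts a steady power draw into energy used per day. The helper name `daily_energy_kwh` and the usage patterns (round-the-clock and an 8-hour day) are assumptions for illustration; actual draw varies with workload, and no measurements were taken for this review.

```python
def daily_energy_kwh(watts, hours_per_day=24.0):
    """Energy in kWh for a steady draw of `watts` over `hours_per_day` hours."""
    return watts * hours_per_day / 1000.0


# Draw range quoted in the review; the hours are assumed usage patterns.
for watts in (650, 800):
    for hours in (8.0, 24.0):
        kwh = daily_energy_kwh(watts, hours)
        print(f"{watts} W for {hours:.0f} h/day: {kwh:.1f} kWh/day")
```

Multiplying the result by a local electricity tariff gives a running cost, which can then be compared against a lower-wattage unified memory system.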
I don't track any metrics, because we don't run formal evaluations. It depends on the model: it should be compatible, it should run well, and it should give us a fast response, meaning fast inferencing. We work with models under 20 GB and models between 20 and 30 GB, and the NVIDIA RTX Series 5090 has 32 GB of VRAM.
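The model sizes above can be checked against the 32 GB card with simple arithmetic. The Python sketch below is a rule of thumb, not a measurement: the helper names are hypothetical, weight size is estimated as parameters times bits per weight divided by 8, and the 2 GB of headroom for the runtime and context cache is an assumed figure that grows with longer contexts.

```python
def weight_size_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 10**9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(model_gb, vram_gb=32.0, headroom_gb=2.0):
    """True if the model plus an assumed headroom fits in the card's VRAM."""
    return model_gb + headroom_gb <= vram_gb


# Model sizes in the range mentioned in the review, against a 32 GB card.
for size_gb in (18, 25, 30, 31):
    print(f"{size_gb} GB model fits in 32 GB: {fits_in_vram(size_gb)}")

# Hypothetical examples: parameter counts at 4-bit quantization.
for params in (32, 70):
    size = weight_size_gb(params, 4)
    print(f"{params}B params at 4-bit: about {size:.0f} GB, fits: {fits_in_vram(size)}")
```

This is why a model just under 30 GB is close to the practical ceiling for a single 32 GB card, and anything larger needs more VRAM or a smaller quantization.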
The support is good enough for us, since I deal with a local supplier rather than directly with NVIDIA. My overall rating for this product is eight out of ten.