How has it helped my organization?
A primary benefit is high reliability. The Apollos also offer very good price-performance and configuration options. Being able to configure them in different ways, for different node types, was something we needed.
What is most valuable?
What we liked about the Apollos was:
- Their density
- The flexibility to run dual CPU nodes or add GPUs to other nodes
- Being able to install Omni-Path Architecture host fabric interfaces (HFIs) in those nodes, which was essential because we were the very first Omni-Path site in the world
- Being able to connect those in large quantities
- Bridges has 800 Apollo 2000 nodes, and they have been running extremely well for us
What needs improvement?
I think it's on a good track. What's coming out in Gen10 is very strong in terms of additional security. Overall, I think the Apollos are well architected, and they are a very flexible form factor for scale-out. As long as HPE continues to support the latest generation of CPUs and accelerators, we'll keep following this line for the foreseeable future.
In Bridges, we combine different node types to create an ideal heterogeneous system. Rather than wishing we had more features in a given node type, we integrate different types. We choose different products from the spectrum of HPE offerings to serve each need optimally, rather than trying to push any given node in a direction it doesn't belong.
What do I think about the stability of the solution?
Stability has been extremely good. Jobs run for days to many weeks at a time. We recently supported a 34-day campaign for a research group in Oklahoma that was forecasting severe storms, running on 205 nodes.
The example we're featuring is a breakthrough in artificial intelligence: the first time an AI beat the world's best poker players. For that one, we ran continuously for 20 days on 600 Apollo nodes, and of course the nodes had to stay up because players were playing the games live. That run was just as seamless, and it was a resounding victory. I think that's the strongest win for the Apollos in our system so far.
What do I think about the scalability of the solution?
For us, scalability is limited only by budget and the size of our system. Using Omni-Path, we can scale our topology out with great flexibility, so scaling out workloads across the Apollos has been seamless. We run various workloads across them, including a lot of MPI as well as Spark.
How are customer service and technical support?
We have an arrangement with HPE technical support. We call on them on occasion, but stability has been very high. Over the year and four months that we've been running Bridges, we have had fewer than 70 calls for the whole system.
Which solution did I use previously and why did I switch?
We knew we had to invest in a new solution because we were designing a system to serve the national research community. We knew what their application needs were and what their scientific goals would be, so we envisioned what the system would have to deliver to meet those needs. That told us the kinds of servers we needed. We have the Apollos; DL580s with three terabytes of RAM; Integrity Superdome servers with 12 terabytes of RAM; and a number of DL360s and other service nodes.
Ultimately, it was looking at our users' requirements, and at where high performance computing, high performance data analytics, and artificial intelligence were heading through about 2019, that led us to select the kinds of servers we did, the ratios between them, and the topology we chose to connect them.
How was the initial setup?
It was the first Omni-Path installation in the world, so people were very careful. With that caveat, I think it was straightforward.
Which other solutions did I evaluate?
We always look at all vendors before reaching a conclusion. I don't want to name them here, but we're always aware of what's in the market, we evaluate the options for each procurement, and we pick the best solution. HPE's competitive edge comes from several factors, listed here in no particular order because they are hard to rank:
- HPE's strategic position in the marketplace. Because HPE is a close partner of Intel, we trust that when a new CPU comes out, we can get it in an HPE server very early on.
- When something new comes out, as Omni-Path did at the time, we trust that HPE will deliver it in a validated product very early.
- We are always pushing the time envelope. HPE's strategic alliances with other strong partners gave us confidence that we would deliver on time, and we did. That's unusual in this field.
- HPE uniquely offered very large-memory servers, the Superdomes, and the bandwidth in those servers was extremely good compared to anything else on the market. We wanted that especially for large-scale genomics, so having it in the solution was a big plus. I'd say these items together were the strongest determining factors from a technical perspective.
What other advice do I have?
My advice is to look at the workload very closely, understand what you want it to do, look at the product spectrum that's available, and mix and match like we did, building the pieces together. There are software frameworks now that make it easier than when we did it to stand up this sort of collection of resources and give the workload exactly what it needs.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.