What is our primary use case?
As a managed service provider working with VMware, we offer virtual desktops to our customers. We support more than 80 customers; some use VMware and some use Hyper-V, so it varies. In total, we support around one million virtual desktops.
My role as an enterprise architect is to provide solutions to these diverse customers.
How has it helped my organization?
From a VMware perspective, we use SRM, or Site Recovery Manager, for disaster recovery. SRM has an automated recovery process, which reduces the need for manual intervention during a disaster. This minimizes downtime and ensures faster recovery times.
The automation of failover and failback processes is key. During a disaster or planned migration, SRM automates the failover process to the recovery site. It coordinates the shutdown of virtual machines on the protected site and powers on the corresponding VMs on the recovery site.
This automation is crucial because it may involve hundreds of VMs that can't fail over all at once. SRM follows a specific procedure, ensuring that virtual machines are recovered in the correct order according to a predefined recovery plan. This minimizes downtime and ensures complete recovery.
Additionally, after a disaster or migration event has been resolved, SRM automates the failback process. This allows you to return virtual machines and data to the original production site.
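To make the idea of ordered recovery concrete, here is a minimal sketch of how a recovery plan might group VMs into batches that are powered on one after another. This is not SRM's API; the `VM` class, the `priority` field, and the VM names are hypothetical and only illustrate the concept.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class VM:
    name: str
    priority: int  # hypothetical field: 1 means "recover first"


def recovery_batches(vms):
    """Group VM names into batches, lowest priority number first."""
    ordered = sorted(vms, key=lambda vm: (vm.priority, vm.name))
    return [
        [vm.name for vm in group]
        for _, group in groupby(ordered, key=lambda vm: vm.priority)
    ]


# Hypothetical plan: databases first, then application, then web tier.
plan = [VM("web-01", 3), VM("db-01", 1), VM("app-01", 2), VM("db-02", 1)]
batches = recovery_batches(plan)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: power on {', '.join(batch)}")
```

Each batch would only start once the previous one is up, which is why hundreds of VMs cannot simply fail over all at once.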
What is most valuable?
Automated recovery is one good feature. SRM also offers simplified management, with a centralized interface for configuring, testing, and executing disaster recovery plans across multiple sites. This means you manage everything from a single point, even across many sites.
Recovery plans can differ between customers, so another important feature is the policy-based protection within SRM. Recovery plans can be customized based on business requirements, including Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). These objectives might differ between industries; healthcare might need a more aggressive RPO, for instance.
Additionally, SRM offers failback capabilities and reporting features for compliance.
What needs improvement?
While there are no major drawbacks, some potential improvements could address complexities in implementing and maintaining SRM.
The cost can be significant, and there's a resource overhead – meaning SRM consumes resources on both protected and recovery sites.
Additionally, it has a lot of dependencies on VMware infrastructure, and testing can be complex. Testing often requires extensive approval at the organizational level.
Finally, sufficient bandwidth is crucial. Failover depends entirely on the available bandwidth, so network limitations can impact performance.
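As a rough illustration of the bandwidth dependency, the sketch below estimates how long it would take to move a given amount of changed data between sites. The function names, the 70% link efficiency, and the example figures are my own assumptions for illustration, not SRM settings or measured values.

```python
def transfer_hours(changed_gb, link_mbps, efficiency=0.7):
    """Estimate hours needed to copy changed_gb over a link_mbps link.

    efficiency is an assumed fraction of the link that is actually usable.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    megabits = changed_gb * 8 * 1000  # decimal gigabytes to megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def fits_window(changed_gb, link_mbps, window_hours, efficiency=0.7):
    """True if the transfer completes within window_hours (e.g. an RPO target)."""
    return transfer_hours(changed_gb, link_mbps, efficiency) <= window_hours


# Hypothetical example: 500 GB of changes and a two-hour window.
for mbps in (100, 1000):
    hours = transfer_hours(500, mbps)
    verdict = "fits" if fits_window(500, mbps, 2) else "does not fit"
    print(f"{mbps} Mbps: {hours:.1f} h, {verdict}")
```

Even this simple estimate shows why a slow inter-site link can make an aggressive recovery objective unachievable.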
For how long have I used the solution?
I have been using it for 15+ years.
What do I think about the stability of the solution?
One needs to consider the factors contributing to stability. Firstly, it's a mature technology with a robust architecture. It integrates seamlessly with hypervisor APIs.
Additionally, there's certified compatibility for the product. Hardware and software vendors ensure certification for compatibility with VMware.
Furthermore, VMware consistently releases updates, security patches, and enhancements, addressing vulnerabilities and improving capabilities. The extensive support, documentation, and active VMware community all contribute to the solution's ecosystem. Overall, I consider VMware SRM a stable product.
What do I think about the scalability of the solution?
The product is scalable. Since it is a market leader, I don't anticipate challenges with its scaling capabilities.
How are customer service and support?
Our company handles support internally, but we raise tickets directly with vendors when needed. VMware's support is good.
They have extensive pre-defined documentation that resolves roughly 80% of issues, providing solutions readily available to most customers.
Which solution did I use previously and why did I switch?
We primarily compare it with Dell servers and Microsoft Hyper-V.
VMware SRM focuses on disaster recovery automation and orchestration. Competitors offer similar capabilities but may differ in how they handle them.
For example, Microsoft has its Hyper-V hypervisor and failover platform. They've also integrated Azure Site Recovery for cloud-based resource recovery. Others, like Nutanix, may rely on their own hypervisor, AHV, for live migrations.
How was the initial setup?
Implementing and managing SRM can be complex, especially for organizations with limited expertise in virtualization and disaster recovery technologies.
It may require dedicated resources and training to deploy and maintain effectively. Typically, senior Level 3 and Level 4 system administrators are best suited for this due to the complexity involved. People with less experience might find it difficult to understand.
There are various steps involved in deployment. A lot of planning is required, including hardware compatibility checks, installation, configuration, networking, storage, VM creation, and integration with monitoring, backup, and security systems. Additionally, a lot of testing and optimization needs to be done.
The timeline depends on the scale and complexity of the deployment. For a small deployment with templates, it might take around two months. A medium-complexity deployment could take three months, and a large, complex deployment could take around five months.
Integrating VMware SRM with existing VMware infrastructure, on the other hand, is easy. VMware's widespread adoption across data centers simplifies integration. Standard software features streamline integration with networking, storage, Active Directory, monitoring, backup, data recovery, and security systems. I don't foresee significant complexity with integration.
What about the implementation team?
For virtual desktop environments, there are different sizing approaches. Sizing can be based on the number of servers or the number of virtual desktops managed.
For example, with a hundred servers, it depends on the support window – 24/7, 16/5, 9/5, etc. We might deploy our own on-site engineers, with a mix of one senior, two mid-level, and two junior staff.
If we size based on the number of virtual desktops, then one resource per shift might support a thousand desktops. To cover 24/7 with two days off per week for each person, we'd need a minimum of five resources to operate a thousand desktops.
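The arithmetic behind the figure of five can be sketched as follows. I am assuming three eight-hour shifts per day, which the sizing above implies but does not state; the function and parameter names are illustrative, and the sketch ignores leave, training, and other absences.

```python
import math


def min_staff(desktops, desktops_per_person=1000, shifts_per_day=3,
              days_per_week=7, workdays_per_person=5):
    """Minimum headcount for round-the-clock coverage.

    shifts_per_day=3 is an assumption (three eight-hour shifts);
    workdays_per_person=5 reflects two days off per week.
    """
    people_per_shift = math.ceil(desktops / desktops_per_person)
    shifts_to_fill = people_per_shift * shifts_per_day * days_per_week
    return math.ceil(shifts_to_fill / workdays_per_person)


# 1,000 desktops: 1 person x 3 shifts x 7 days = 21 shifts per week,
# and each person covers 5 of them, so 21 / 5 rounds up to 5 people.
print(min_staff(1000))
```

The same function can be reused for other support windows by changing `shifts_per_day` and `days_per_week`, for example one shift on five days for 9/5 coverage.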
What's my experience with pricing, setup cost, and licensing?
Mature products like VMware SRM often have higher costs. Their capabilities are extensive, but the licensing model can be complex.
What other advice do I have?
It's a very good solution. I would rate it a nine out of ten.
I definitely recommend VMware SRM for organizations with complex environments, particularly if cost is not the primary concern. It's important to understand that the solution is powerful but also has some inherent complexity.