What is our primary use case?
I use AWS Elastic Disaster Recovery to replicate our production servers between regions. The initial setup was straightforward: install the replication agent and configure the staging resources. The main advantage is that it took only a few minutes to get started.
We use the service mainly to replicate our critical servers to another AWS region for disaster recovery. So far, we have used it for continuous replication and a few DR drills, and it has been reliable and fairly easy to manage. Real-world events show why this matters, such as a data center in Iran being destroyed during war. We currently run AWS Elastic Disaster Recovery across two regions, Hyderabad and Mumbai, and with this setup we can replicate our infrastructure between them.
We rely on continuous block-level replication, which keeps the RPO very low, so data loss is minimal if anything happens. We regularly run DR drills using the Test Launch feature, which lets us validate the recovery process without affecting the production servers. Fast recovery time is particularly important for us: in our testing, the recovery instance launches within a few minutes, depending on whether the server is small, medium, or large. We also used AWS Elastic Disaster Recovery during a region migration project to replicate servers from one region to another, which made the migration much smoother. The AWS console shows the replication status, recovery points, and lag time, so it is easy to see whether a job succeeded or failed and whether everything is healthy.
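The replication status and lag time mentioned above can also be checked from a script. The following is a minimal sketch, not our actual tooling. It assumes server records shaped like what the boto3 `drs` client's `describe_source_servers` call returns under `items` (with `dataReplicationInfo.dataReplicationState` and an ISO 8601 `lagDuration` string); treat those field names as assumptions to verify against the current API reference. The 5-minute threshold and the function names are hypothetical.

```python
import re
from datetime import timedelta

# Matches a subset of ISO 8601 durations such as "PT45S", "PT5M", "P1DT2H".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_lag(value):
    """Convert an ISO 8601 duration string into a timedelta (None if absent)."""
    if not value:
        return None
    match = _DURATION.match(value)
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(**parts)


def unhealthy_servers(servers, max_lag=timedelta(minutes=5)):
    """Return (server_id, reason) pairs for servers that need attention.

    max_lag is a hypothetical RPO threshold; choose one that fits your targets.
    """
    findings = []
    for server in servers:
        server_id = server.get("sourceServerID", "<unknown>")
        info = server.get("dataReplicationInfo", {})
        state = info.get("dataReplicationState")
        lag = parse_lag(info.get("lagDuration"))
        if state != "CONTINUOUS":
            # Anything other than steady-state replication is flagged.
            findings.append((server_id, f"state is {state}"))
        elif lag is not None and lag > max_lag:
            findings.append((server_id, f"lag {lag} exceeds {max_lag}"))
    return findings
```

In practice the `servers` list would come from the AWS API; here it is a plain list of dictionaries so the check can be tried without credentials.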
What is most valuable?
The best feature that AWS Elastic Disaster Recovery offers is continuous block-level replication with a very low RPO. It keeps replicating changes from the source servers to the DR region in near real time, so even if a failure happens, we can recover the servers with minimal data loss. The other thing I really value is the Test Launch feature: we can perform DR drills at any time without affecting the production environment. Overall, the combination of near real-time replication and quick recovery testing is what makes the service so useful in real-world scenarios.
AWS Elastic Disaster Recovery has had a positive impact on my organization by significantly reducing both our recovery time (RTO) and our risk of data loss. Because the servers replicate continuously to the DR region, we can launch the recovery instance within minutes if a major issue hits the primary environment, which keeps downtime minimal. Another positive impact came during DR drills and audits. Since AWS Elastic Disaster Recovery allows non-disruptive test launches, we were able to demonstrate our DR capabilities more confidently to internal teams and management. Overall, it has given the organization more confidence in handling outages and improved the resilience of our infrastructure.
What needs improvement?
The first area I feel could be improved is monitoring and reporting. As I mentioned, the console shows the replication status, but more detailed dashboards or built-in reports for DR readiness would make it easier for teams to track everything in one place. Another improvement would be better cost visibility and optimization guidance. Because the staging resources and replication storage run continuously, it would be very helpful if AWS provided clearer cost insights, recommendations, and remediations for optimizing the DR environment. It would also be useful if AWS added more automation options for application-level recovery, such as easier ways to handle IP changes, domain name system (DNS) updates, or application dependencies during failover.
The setup and configuration process could also be simplified. For someone new to the service, understanding the staging settings, launch templates, and networking configuration takes some time. More detailed built-in monitoring and alerting would let us track replication health, lag, and potential issues within AWS Elastic Disaster Recovery itself instead of relying on additional tools.
For how long have I used the solution?
I have been using AWS Elastic Disaster Recovery for approximately 1.5 years.
What do I think about the stability of the solution?
AWS Elastic Disaster Recovery has been stable.
How are customer service and support?
Customer support has been responsive. When we get stuck, they respond within a reasonable time and help us troubleshoot step by step. Overall, the quality of support is good.
Which solution did I use previously and why did I switch?
I have not personally used any other disaster recovery solution. Before this, my organization relied on traditional backup-based recovery, such as snapshots and manual server restoration. I know how that manual process worked only from what I have heard within the organization, not from hands-on use.
How was the initial setup?
When I started working with AWS Elastic Disaster Recovery, one of the major issues I faced during my learning phase was understanding the initial setup and replication process. Installing the replication agent was straightforward, but configuring the staging area, the required IAM roles, and the launch settings took time. Another challenge was the initial replication: for servers with large volumes, the first sync took quite a long time, so we had to plan it carefully to avoid impacting the network. We also faced a small issue during the first test launch, where the recovery instance came up but some application configuration and private IP settings needed adjustment before the application worked properly.
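To plan the first sync for large volumes, a rough back-of-the-envelope estimate helps. The sketch below is an illustration only: the function name and the `utilization` parameter (the share of the link you allow replication to use) are assumptions, and it ignores compression, ongoing writes on the source, and any throttling, so real durations will differ.

```python
def initial_sync_hours(volume_gib, bandwidth_mbps, utilization=0.7):
    """Estimate hours needed to copy volume_gib over a bandwidth_mbps link.

    utilization is the assumed fraction of the link given to replication.
    """
    if volume_gib < 0 or bandwidth_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("inputs must be positive and utilization in (0, 1]")
    bits_to_copy = volume_gib * 1024**3 * 8           # GiB -> bits
    usable_bits_per_second = bandwidth_mbps * 1_000_000 * utilization
    return bits_to_copy / usable_bits_per_second / 3600


# Hypothetical example: a 2 TiB volume over a 1 Gbps link at 70% utilization
# works out to roughly 7 hours.
print(f"{initial_sync_hours(2048, 1000):.1f} hours")
```

An estimate like this makes it easier to decide whether to start the first sync outside business hours or to stagger large servers.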
What was our ROI?
The ROI has been quite good for my organization. Instead of maintaining a full secondary DR infrastructure with running EC2 instances, we pay only for the replication storage and staging resources, which keeps the overall cost lower. Most of the process is automated, so during DR testing or failover we can launch the recovery instance in just a few clicks. Recovery requires fewer people and much less time, which has saved my organization engineering effort and operational time.
What's my experience with pricing, setup cost, and licensing?
Pricing, setup cost, and licensing have been reasonable, especially compared to maintaining a full standby disaster recovery environment. Because the cloud is a pay-as-you-go model, we pay only for the replication storage, data transfer, and small staging instances. We don't need to keep EC2 instances running all the time in the DR region, which reduces our overall DR cost. From my perspective, it is the more cost-effective option.
Which other solutions did I evaluate?
We did not purchase AWS Elastic Disaster Recovery through the AWS Marketplace; we use the built-in AWS service directly.
What other advice do I have?
My advice is to invest time in the initial setup, because it can take a while for anyone new to AWS Elastic Disaster Recovery, and to plan the DR architecture properly. I also recommend testing the DR setup regularly using the Test Launch feature so that the team becomes familiar with the recovery process. I would rate this solution an 8 out of 10.
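For teams that want to run drills on a schedule rather than from the console, a drill can be started through the API. This is a minimal sketch under assumptions: it relies on the boto3 `drs` client's `start_recovery` call with `isDrill=True` and a `job.jobID` field in the response, as I understand that API; the function name and the server IDs in the usage note are hypothetical. The client is passed in so the function can be tried with a stub before pointing it at a real account.

```python
def start_drill(drs_client, source_server_ids, tags=None):
    """Start a drill (test launch) for the given source servers.

    drs_client is expected to be boto3.client("drs") or a compatible stub.
    Returns the ID of the recovery job that was created.
    """
    if not source_server_ids:
        raise ValueError("at least one source server ID is required")
    request = {
        "isDrill": True,  # a drill, not a real failover
        "sourceServers": [{"sourceServerID": sid} for sid in source_server_ids],
    }
    if tags:
        request["tags"] = tags
    response = drs_client.start_recovery(**request)
    return response["job"]["jobID"]
```

With real credentials this would be called as `start_drill(boto3.client("drs"), ["s-..."])`, using your own source server IDs; launched drill instances still need to be cleaned up afterwards to avoid extra cost.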
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)