What is our primary use case?
I am mostly a system administrator for Elastic Search, so I have deployed it more than I have used it. At the beginning, I was responsible for the deployment and maintenance of Elastic Search clusters. Over the past few years I have started to use it more, because our end users are beginners and need a lot of help to use Elastic Search effectively. I started to be a user approximately five years ago.
Today, we provide a single, global Elastic Search cluster for the whole company, and all teams store their logs, their traces, and their APM traces in it. Teams use Kibana to display information. We also use Prometheus exporters to collect metrics from the logs: we execute Query DSL requests against Elastic Search and inject the resulting metrics into a time series database like Prometheus. This is the main usage. We store metrics, logs, and APM traces.
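To illustrate the log-to-metric step described above, here is a minimal sketch in Python. It is not our actual exporter: the field names (`@timestamp`, `log.level`, `service.name`), the metric name, and both helper functions are assumptions chosen for the example. It builds a Query DSL body that counts recent error logs per service, then turns the aggregation part of a search response into Prometheus text exposition lines.

```python
from typing import List


def build_error_count_query(window: str = "5m") -> dict:
    """Build a Query DSL body counting error logs per service (hypothetical fields)."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {
            "bool": {
                "filter": [
                    # Always bound the time range so the query stays cheap.
                    {"range": {"@timestamp": {"gte": f"now-{window}", "lte": "now"}}},
                    {"term": {"log.level": "error"}},
                ]
            }
        },
        "aggs": {"by_service": {"terms": {"field": "service.name", "size": 50}}},
    }


def to_prometheus_lines(response: dict, metric: str = "log_errors_recent") -> List[str]:
    """Convert the terms aggregation of a search response into exposition lines."""
    lines = [f"# TYPE {metric} gauge"]
    for bucket in response["aggregations"]["by_service"]["buckets"]:
        service = bucket["key"]
        count = bucket["doc_count"]
        lines.append(f'{metric}{{service="{service}"}} {count}')
    return lines
```

In a real exporter, the body from `build_error_count_query` would be sent to the `_search` endpoint of the relevant index or data stream, and the resulting lines would be served on an HTTP endpoint that Prometheus scrapes.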
What is most valuable?
The deployment of Elastic Search is excellent. I like Elastic Search very much for that. I regularly say to the team that Elastic is elastic: it is really difficult to break. This was not the case a few years ago when I worked with Elastic Search version one and version two. Starting with version six, it became really robust. In the past, the main issue was about the data and the volume.
That improved when they integrated lifecycle policies for indices: ILM, Index Lifecycle Management. Once it was available, in addition to data streams, it became really easy to keep all indices at the same volume. It is really easy to administer, to balance data between data centers and between data nodes, and to keep the same layout everywhere. It is very nice. It is my favorite feature of Elastic Search. It is so easy to manage. Also, maybe because we have used it for a long time, we have become comfortable with the whole setup, the node types, and how we should manage our cluster to make it resilient. I think it is really easy to maintain compared to some other databases.
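As a rough illustration of how ILM and data streams keep index sizes uniform, the Python sketch below builds the two request bodies involved: an ILM policy that rolls over on primary shard size or age, and an index template that enables a data stream and attaches the policy. The thresholds, the `logs-*` pattern, and the policy name are assumptions for the example, not our production values.

```python
def build_ilm_policy(max_shard_size: str = "50gb",
                     max_age: str = "7d",
                     delete_after: str = "30d") -> dict:
    """Body for PUT _ilm/policy/<name>: roll over in the hot phase, then delete."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Rollover triggers when either condition is met, which is
                        # what keeps backing indices at a similar size.
                        "rollover": {
                            "max_primary_shard_size": max_shard_size,
                            "max_age": max_age,
                        }
                    }
                },
                "delete": {
                    "min_age": delete_after,  # counted from rollover
                    "actions": {"delete": {}},
                },
            }
        }
    }


def build_index_template(pattern: str, policy_name: str) -> dict:
    """Body for PUT _index_template/<name>: a data stream bound to the ILM policy."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},  # matching names are created as data streams
        "template": {"settings": {"index.lifecycle.name": policy_name}},
    }


# Hypothetical names, for illustration only.
policy_body = build_ilm_policy()
template_body = build_index_template("logs-*", "logs-default-policy")
```

With both in place, writers only target the data stream name, and each new backing index is created automatically at rollover.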
What needs improvement?
To be honest, there is only one downside of Elastic Search, and it makes sense: we use the basic license, which is free, so some features are not available to us, such as the APM service map or alerting. Except for that, I do not have any complaints. It works perfectly, and it is pretty easy to administer and to use.
We are not able to send alerts to a provider like Sentry, Slack, or PagerDuty. At some point we were forced to generate metrics from the logs in order to use our alerting stack in Prometheus, which works. It is an open-source project that allows us to send alerts to Slack, PagerDuty, and some third-party tools without any license. However, this is not doable with Elastic Search under the free license. Alerting is the most complicated part to manage because of the license.
What do I think about the stability of the solution?
From time to time we have some JVM (Java Virtual Machine) issues with Elastic Search. However, they are mostly linked to users' requests. From time to time, people ask Elastic Search to search through one year of logs with a poorly written query and without any filters. This is clearly not doable, and some nodes will crash. This makes sense. However, Elastic Search is really stable when we do not have this kind of request.
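To make the contrast concrete, here is a hedged Python sketch of what a safer request body can look like: a text match combined with a mandatory time-range filter, a capped result size, and a search timeout. The helper name, the `message` and `@timestamp` fields, and the default values are assumptions for illustration.

```python
def build_bounded_search(text: str,
                         field: str = "message",
                         window: str = "15m",
                         size: int = 100,
                         timeout: str = "10s") -> dict:
    """Build a search body that is always restricted to a recent time window."""
    if not text.strip():
        # Refuse "match everything" requests instead of scanning all logs.
        raise ValueError("a non-empty search text is required")
    return {
        "size": size,        # cap the number of returned documents
        "timeout": timeout,  # best-effort limit on how long shards keep searching
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                # Filters are not scored and restrict the shards' work to the window.
                "filter": [
                    {"range": {"@timestamp": {"gte": f"now-{window}", "lte": "now"}}}
                ],
            }
        },
        "sort": [{"@timestamp": "desc"}],
    }
```

A request such as `build_bounded_search("connection refused", window="1h")` only touches the last hour of data, instead of a full year of logs.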
What do I think about the scalability of the solution?
Elastic Search is the perfect tool for scalability. You just need to deploy new nodes, and they will join the cluster really easily. I appreciate it for that as well, because today at VP we use Terraform to deploy our infrastructure. All Elastic Search nodes are managed through Terraform. If I need to add data nodes, ingest nodes, or anything else, I just need to deploy new nodes with the same setup; they will join my cluster, and it will scale horizontally really easily.
How are customer service and support?
I have never had to contact the technical support of Elastic Search.
Which other solutions did I evaluate?
For log management, I have not used any alternatives or anything similar to Elastic Search. For APM, there was a plan in the past to try to migrate to the Grafana open-source platform, using Tempo for APM traces. Tempo is a Grafana Labs project. However, we decided to keep Elastic Search for that, so we do not have any other similar tool for that purpose.
There is maybe just one exception, which is error tracking. We can track errors inside an application with APM, but currently we use Sentry, which is not just an error tracking platform but also covers performance management. However, we use it only for error tracking. It is more useful for developers at the beginning of a new project. Most of the time, they prefer to use Sentry rather than APM to track errors. Once the project is in production, they are more focused on performance than on errors. At that point they start to use Elastic Search APM more than Sentry. We do not provide any performance indicators in Sentry: it is able to manage performance metrics, but we use it only for errors, and everything related to performance has been disabled.
What other advice do I have?
I think the pricing of Elastic Search is really, really expensive. The main point is that we do not have any paid license. I tried a few times in the past to contact the Elastic Search team to get a quote, and it was complicated each time because of the volume and the number of nodes. We are a big company at VP, so we have a lot of nodes, more than one hundred. For sure, it was very expensive. They tried to tell me about the enterprise mode and about the new licensing model, which manages cost based on CPU and memory usage. It was really expensive because, at the moment, we do not use any cloud services. Our Elastic Search cluster is on-premises.
Everything is self-hosted at VP. We do not have any limit. People using AWS or GCP have limits because data volume is really expensive on cloud platforms. Because we self-host everything around our services, such as Elastic Search or Sentry, the idea is to let users store a lot of data and a lot of metrics. We try to train the teams to use a good log level, but we do not have such a limitation in terms of volume. We have a really big cluster, and in the end, the price of a license is huge. I rate Elastic Search ten out of ten.
Which deployment model are you using for this solution?
On-premises
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Disclosure: My company does not have a business relationship with this vendor other than being a customer.