What is our primary use case?
We use Prometheus for monitoring all aspects of our infrastructure end-to-end. That includes servers, virtual machines, databases, caching servers, ELK stack, and our Kubernetes servers. We are users of Prometheus.
What is most valuable?
The scraping mechanism is a wonderful feature. I've used many other monitoring systems that were mostly based on a client-server model, including Nagios, Zabbix, and New Relic, among others. With all of them, the server would get overloaded when the clients sent too many metrics, whether the client-server architecture used a pull or a push mechanism. In my previous organization, we had to host several individual Nagios servers in case one went down. Prometheus gives us high availability automatically as a stand-alone process; if it doesn't run on one server, it runs on another. It's wonderful that the exporters run on different servers. They collect the metrics and expose them at a particular URL for Prometheus to read and then store.
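To make the exporter idea concrete, here is a minimal sketch of what an exporter does, written with only the Python standard library. It is not how any particular exporter is implemented; the metric name `demo_cpu_count`, the port `8000`, and the helper names are all assumptions chosen for illustration. The only part taken from Prometheus itself is the text exposition format (`# HELP`, `# TYPE`, then `name value`) served at `/metrics`.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples):
    """Render {name: (help_text, value)} as Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(samples.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect_samples():
    # Hypothetical metric; a real exporter would query the system it fronts.
    return {"demo_cpu_count": ("Number of CPUs seen by the exporter.", os.cpu_count() or 0)}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes this path over plain HTTP GET.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_samples()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a Prometheus scrape job at `host:8000` would be enough to see the pull model in action: the exporter never contacts the server, it only answers when scraped.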
I very much like the remote write feature. Prometheus bridges the gap for everyone, whether they've come from an old monitoring setup or are into microservices. I also like the concept of dynamic configuration, which is brilliant.
We chose Prometheus because it's open source, with a lot of documentation and community support, both of which are lacking in other products.
What needs improvement?
The Prometheus community says it's not meant to be clustered, so people shift to solutions like Thanos and VictoriaMetrics. Prometheus could have done that too; it's not complicated. Rather than us having to use a different database, Prometheus could develop its own database a little more so that it becomes a one-stop solution. That would be wonderful.
One of the issues is that dynamic configuration uses regular expressions, which can be confusing for people who are not familiar with them and with the special symbols and characters they rely on.
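One common source of that confusion can be sketched in a few lines. This assumes the regular expressions in question are the ones used in Prometheus relabeling rules, which are fully anchored: the pattern must match the whole label value, not just part of it. The target names below are made up, and Python's `re` module is only standing in for the RE2 engine that Prometheus uses.

```python
import re


def relabel_matches(pattern, value):
    """Mimic the full anchoring applied to relabeling regexes."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical scrape targets.
targets = ["db-01:9100", "db-02:9100", "web-db-01:9100", "cache-01:9121"]

# Anchored matching keeps only values that match from start to end.
kept = [t for t in targets if relabel_matches(r"db-.*", t)]

# An unanchored search, which many people expect, would also hit "web-db-01:9100".
unanchored = [t for t in targets if re.search(r"db-.*", t)]
```

Here `kept` holds only the two `db-` targets, while `unanchored` also includes `web-db-01:9100`. To match that target under anchoring, the pattern would need a leading `.*`.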
For how long have I used the solution?
I've been using this solution for six years.
What do I think about the stability of the solution?
The product is stable. In my previous company, we used Prometheus with Docker Compose and kept high data retention. We didn't have an external database to store the metrics, so the container used to go down very often. The issue was not with Prometheus but that there were insufficient resources for the monitoring system.
Forty percent of the people in our company need Prometheus for their daily work, including developers. Indirectly, that number goes to 80% when you include those reliant on the reporting. We do everything through Prometheus. We also use Nagios for monitoring bare-metal servers, so Nagios monitors Prometheus and Prometheus monitors Nagios. That way, each of our two monitoring systems is itself monitored.
What do I think about the scalability of the solution?
The solution is not meant to scale.
How are customer service and support?
There is good documentation and the open source community offers good support, so I haven't needed to contact customer support.
How was the initial setup?
Prometheus is very easy to set up and is user friendly. It just runs and gives you a very simple UI running on a port. It becomes complicated when people don't understand the configuration: because Prometheus supports many exporters, a lot of jobs need to be written in order to use it well.
Deployment time depends on the use case. Our present use case is quite complex, so implementation took about a week. We wanted to monitor 10 to 15 clusters, so we had to deploy Prometheus in a separate environment and ensure that our data was placed at a stable central location. We initially carried out a POC, which took less than a day.
What's my experience with pricing, setup cost, and licensing?
The solution is open source.
What other advice do I have?
If a company uses bare-metal systems and its product doesn't need extensive monitoring, I wouldn't recommend Prometheus. That's not because it's difficult to use but because it isn't the right fit for that use case. If you already have microservices deployed and you want information about them, then that's a suitable use case. Otherwise, Nagios or any other simple monitoring system is easier to use. It depends on what kind of product needs monitoring.
In spite of the scaling issue, Prometheus provides almost everything I need. Despite needing to integrate it with other tools, it's seamless and simple to use. I rate this solution 10 out of 10.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.