Staff Cloud Engineer at an energy/utilities company with 51-200 employees
Good infrastructure and APM metrics with easy onboarding of new products
Pros and Cons
- "We rely heavily on the API crawlers that Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without also having to make them add the tags at the agent level."
- "The real issue with this product is cost control."
What is our primary use case?
We are using the solution to migrate out of the data center. Old apps need to be re-architected. We plan to move to multi-cloud for disaster recovery and to avoid vendor lock-in. The migration is split between an MSP (Infosys) and in-house developers. The hard part is ensuring these apps run the same in the cloud as they do on-premises. We also need to improve performance where possible. With deadlines approaching quickly, it is important not to cut corners, which is why we needed observability.
How has it helped my organization?
The product has created a paradigm shift in how we deploy monitoring. Before, we had a one-to-one lookup in ServiceNow. This wouldn't scale, as teams couldn't create monitors on the fly and had to wait for us to contact the ServiceNow team to create a custom lookup. Now, in real time, as new instances are spun up and down, they are still guaranteed to be covered by monitoring. This used to require a change request, and now it is automatic.
What is most valuable?
For us, the most valuable features are infrastructure and APM metrics. The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze.
We rely heavily on the API crawlers that Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without also having to make them add the tags at the agent level. Then we use Datadog's conditionals in the monitor to dynamically alert hundreds of teams, and with the ServiceNow integration, we can also assign tickets based on the environment. Now, our top teams are using APM/profiler to find bottlenecks and improve the speed of our apps.
What needs improvement?
The real issue with this product is cost control. For example, when logs first came out, they didn't have any index cuts. This led to runaway logs and exploding costs.
It seems that admin cost control granularity is an afterthought. For example, synthetics have been out for over four years, yet there is no way to limit teams from creating tests that fire off every minute. If we could say you can't test more than once every five minutes, that would save us 5X on our bill.
Buyer's Guide
Datadog
January 2026
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: January 2026.
881,082 professionals have used our research since 2012.
For how long have I used the solution?
I've been using the solution for about three years.
What do I think about the stability of the solution?
The solution is very stable. There are not too many outages, and they fix them fast.
What do I think about the scalability of the solution?
It is easy to scale. It's why we adopted it.
How are customer service and support?
Before we had premium support, I would avoid contacting support because it was so bad.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We previously used AppDynamics. It isn't built for the cloud and is hard to deploy at scale.
How was the initial setup?
The initial setup was not complex. We just had to teach teams the concept of tags.
What about the implementation team?
We implemented the solution in-house. It was me. I am the SME for Datadog at the company.
What was our ROI?
We have seen an ROI. It has saved months of time and reduced blindspots for all app teams.
What's my experience with pricing, setup cost, and licensing?
We'd advise new users to be careful with logs and APM, as those are the ones that can get expensive fast.
Which other solutions did I evaluate?
We looked into Dynatrace. However, we found the cost to be high.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Senior Cloud Engineer, Vice President of Monitoring at a financial services firm with 10,001+ employees
Good ServiceNow integration, helpful API crawlers, and useful APM metrics
Pros and Cons
- "The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze."
- "It seems that admin cost control granularity is an afterthought."
What is our primary use case?
We are using the solution to migrate out of the data center. Old apps need to be re-architected. We are planning on moving to multi-cloud for disaster recovery and to avoid vendor lock-in.
The migration is split between an MSP (Infosys) and in-house developers. The hard part is ensuring these apps run the same in the cloud as they do on-premises. We also need to improve performance where possible. With deadlines approaching quickly, it's important not to cut corners, which is why we needed observability.
How has it helped my organization?
Using the product has caused a paradigm shift in how we deploy monitoring. Before, we had a one-to-one lookup in ServiceNow. This wouldn't scale, as teams wouldn't be able to create monitors on the fly and would have to wait on us to contact the ServiceNow team to create a custom lookup. Now, in real-time, as new instances are spun up and down, they are still guaranteed to be covered by monitoring. This used to require a change request, and now it is automatic.
What is most valuable?
For us, the most valuable features are infrastructure and APM metrics.
The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze.
We rely heavily on the API crawlers Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without also having to make them add the tags at the agent level. Then we use Datadog's conditionals in the monitor to dynamically alert hundreds of teams.
With the ServiceNow integration, we can also assign tickets based on the environment. Now our top teams are using the APM/profiler to find bottlenecks and improve the speed of our apps.
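The tag-driven routing described above can be pictured with a minimal sketch. This is not Datadog's monitor syntax or the ServiceNow integration; the `Route` type, the `route_alert` function, the `team` and `env` tag keys, and the `@<team>-oncall` handle format are all hypothetical names chosen for illustration. It only shows the general idea of deriving the alert destination and the ticketing decision from tags a resource already carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    notify: str        # hypothetical notification handle
    open_ticket: bool  # whether a ticket should be opened


def parse_tags(tags: list[str]) -> dict[str, str]:
    """Turn ["env:prod", "team:payments"] into {"env": "prod", "team": "payments"}."""
    parsed = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:  # ignore tags that are not key:value pairs
            parsed[key] = value
    return parsed


def route_alert(tags: list[str]) -> Route:
    """Pick a destination from the tags the resource already carries."""
    parsed = parse_tags(tags)
    team = parsed.get("team", "unowned")
    env = parsed.get("env", "unknown")
    # Assumption: only production alerts open a ticket; other environments just notify.
    return Route(notify=f"@{team}-oncall", open_ticket=(env == "prod"))


prod_route = route_alert(["env:prod", "team:payments"])
dev_route = route_alert(["env:dev", "team:payments", "untagged"])
```

Because the destination is computed from tags, a newly launched instance is routed correctly as soon as it is tagged, with no per-instance lookup to maintain.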
What needs improvement?
The real issue with this product is cost control. For example, when logs first came out, they didn't have any index cuts. This caused runaway logs and exploding costs.
It seems that admin cost control granularity is an afterthought. For example, synthetics have been out for over four years, yet there is no way to limit teams from creating tests that fire off every minute. If we could say you can't test more than once every five minutes, that would save us 5X on our bill.
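The 5X figure follows from simple arithmetic on run counts. The sketch below assumes that synthetic test cost scales linearly with the number of test runs; that is an assumption made for illustration, not a statement of Datadog's pricing model, and the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int) -> float:
    """Number of times a test fires in one day at a fixed interval."""
    return MINUTES_PER_DAY / interval_minutes


every_minute = runs_per_day(1)  # 1440 runs per day
every_five = runs_per_day(5)    # 288 runs per day

# Assumption: cost is proportional to run count, so this ratio is the saving.
reduction_factor = every_minute / every_five  # 5.0
```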
For how long have I used the solution?
I've used the solution for about three years.
What do I think about the stability of the solution?
The solution is very stable. There are not too many outages, and they fix them fast.
What do I think about the scalability of the solution?
It is easy to scale. That is why we adopted it.
How are customer service and support?
Before we had premium support, I would avoid contacting support because it was so bad.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We previously used AppDynamics. It isn't built for the cloud and is hard to deploy at scale.
How was the initial setup?
The initial setup was not difficult. We just had to teach teams the concept of tags.
What about the implementation team?
We did the implementation in-house. It was me. I am the SME for Datadog at the company.
What was our ROI?
The solution has saved months of time and reduced blindspots for all app teams.
What's my experience with pricing, setup cost, and licensing?
I'd advise users to be careful with logs and APM, as those are the ones that can get expensive fast.
Which other solutions did I evaluate?
We looked into Dynatrace. However, we found the cost to be high.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Customizable and helpful for isolating and filtering environments
Pros and Cons
- "We have way more observability than what we had before - on the application and the overall system."
- "Auto instrumentation on tracing has not been very easy to find in the documentation."
What is our primary use case?
We use Datadog for observability and system/application health, mainly for product support, triaging, debugging, and incident responses.
We rely heavily on logging and on the Datadog agent to collect logs, metrics, and traces from our GKE workloads. We use APM and continuous profiling for latency and performance measurement. We use RUM to observe frontend user events, such as tracing requests and seeing what actions users take before errors occur. We also use error tracking and source maps to debug production failures.
We are still relatively new to the product, and we are planning to use more of the notebook functionality and power packs to record run books and break knowledge silos. We also need to utilize dashboards and continuous profiling more for performance measurement and integrate Datadog alerts for incident response.
How has it helped my organization?
We have way more observability than what we had before - on the application and the overall system. That includes the GKE cluster, nodes, and pods. It's helped with our cloud-run instances, databases, and data storage.
We also started observability in the CI pipeline to measure our CI performance, as it was a pain point for us. We are aiming to do incremental deployments and releases, and the bottleneck so far has been our CI performance. The visibility on which actions or functions take the most time allows us to pinpoint and focus on improving configurations on these.
What is most valuable?
We use structured logging a lot to triage production issues. The querying, the manipulation of attributes and tags, and the customization have been very helpful in isolating and filtering environments. The integration with the Winston logger has also been a breeze.
First and foremost, structured logging, tags, and attributes have not only allowed us to narrow down a problem quickly in production, they have also let us create dashboards from these logs to better understand user behavior, such as how many users stop and leave our application before an upload has completed. That helps us understand how important processing time is to a user.
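As a rough illustration of why structured logs are easy to filter and chart, here is a minimal sketch using Python's standard `logging` module. It is an analog of the idea, not the reviewer's Winston setup; the logger name `upload-service`, the `env` and `event` attribute names, and the `upload_abandoned` value are hypothetical.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so attributes stay queryable."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Attributes passed via `extra=` become fields on the record.
            "env": getattr(record, "env", None),
            "event": getattr(record, "event", None),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("upload-service")  # hypothetical service name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info(
    "user left before upload finished",
    extra={"env": "prod", "event": "upload_abandoned"},
)
entry = json.loads(buffer.getvalue())
```

With fields like `env` and `event` emitted as attributes rather than buried in free text, counting abandoned uploads in production becomes a filter on two fields instead of a text search.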
We also intend to use distributed tracing more to understand where the error has occurred in a particular request.
What needs improvement?
The documentation could definitely use improvement. As I navigated it and tried to find instrumentation and implementation details, I discovered inconsistencies among the SDKs for different languages.
There are also places where highlighting can be improved. I once created an issue on GitHub, and it was resolved right away by an engineer. He pointed out that it was actually in the documentation. I looked again and found it was not very obvious. We were stuck on the problem for days.
Auto instrumentation on tracing has not been very easy to find in the documentation. We ended up using OpenTelemetry, yet the conversion between tracing contexts has been difficult.
For how long have I used the solution?
We've used the solution between six months and a year.
How are customer service and support?
Customer service and support are generally very fast. One ticket, which involved changing the log index retention period, never received a response. Any support tickets related to technical issues were resolved pretty fast.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We used to use GCP Stackdriver for logging and monitoring since our infrastructure is all GCP-based. It was lacking a lot, particularly in tracing and structured logging. We often had a lot of trouble triaging and diagnosing production problems. Datadog's specialty is observability. Since we started using the product, we have been able to create dashboards and utilize APM, continuous profiling, RUM, and distributed tracing for production support and user trends.
Datadog also offers labs and workshops for its products, which is very helpful.
What about the implementation team?
We implemented the product ourselves.
What was our ROI?
I'm not sure what our ROI would be.
What's my experience with pricing, setup cost, and licensing?
We started with on-demand pricing as we were re-writing our product, and we weren't sure about the total usage. After we went into production and released the product, we experienced a price surge. Fortunately, our Datadog account manager reached out to us and suggested a monthly subscription, which is what we'll be switching to.
I'd advise keeping an eye on the usage and possibly setting up some monitoring on price. We didn't have much of a setup cost; we started with a free trial and continued with on-demand after the trial ended.
Which other solutions did I evaluate?
We didn't evaluate many of the other options. However, we do also use OpenTelemetry, which is vendor agnostic and integrates with Datadog.
What other advice do I have?
We always keep the Datadog agent on the latest version.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Senior Manager at a manufacturing company with 10,001+ employees
Great network monitoring, testing, and integration tools
Pros and Cons
- "The visibility into our network has allowed for quick diagnosis of failures, identification of underutilized or over-utilized resources, and allowed for cloud cost optimization opportunities."
- "I would love to see more metrics or analytics in IoT devices."
What is our primary use case?
This solution is for physical device monitoring across breweries, including PLCs, HMI Cameras, RFID panels, scales, etc. We want to gain visibility into these devices to inform predictive maintenance and reduce unscheduled downtime. We want to monitor physical devices across the zone from a control tower perspective for end users and support teams alike. Understanding more about the performance of the devices and mechanical components will allow us to schedule downtime to fix imminent catastrophic failures and prevent unplanned downtime and lost revenue.
How has it helped my organization?
Previously, we had no visibility into the architectural layout of our infrastructure. The UI of Datadog has allowed for increased visibility and access to broken or underperforming resources or critical pieces of infrastructure. Beyond this, it has allowed us to identify areas where we can optimize cost in our cloud infrastructure.
What is most valuable?
The most valuable features I have found are network monitoring, testing, and integration tools. The visibility into our network has allowed for quick diagnosis of failures, identification of underutilized or over-utilized resources, and allowed for cloud cost optimization opportunities. The ability to correlate metrics has proven useful in determining downstream or upstream issues influencing the device, machine, or database having issues.
What needs improvement?
I would love to see more metrics or analytics in IoT devices.
For how long have I used the solution?
I've been using the solution for approximately two years.
What do I think about the stability of the solution?
I have never experienced an issue or outage.
What do I think about the scalability of the solution?
The solution is very scalable and developed in a fashion that provides the ability to scale easily.
How are customer service and support?
Customer service has been outstanding. They have been timely and knowledgeable with all of my questions.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We used a different product for the total stack solution.
How was the initial setup?
The initial setup was straightforward.
What about the implementation team?
We handled the setup process in-house.
What was our ROI?
I'm unsure whether we've seen an ROI.
Which other solutions did I evaluate?
We did evaluate SolarWinds.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
VP, Application support at a financial services firm with 10,001+ employees
Good service catalog and dashboard but the application performance monitoring module needs more functionality
Pros and Cons
- "The service catalog helped improve our organization by giving a good view of the flow for our microservices applications."
- "The dashboard could be improved. It would be helpful to get a view of specific things that we need to monitor for our application."
What is our primary use case?
We primarily use the solution for the service catalog.
We use this type of offering for our microservices applications, and it gives a good view of flow. It is a must when we have different developers working on different services.
Having the trace and log features is useful for helping the on-call person locate the microservice.
We would like to see some more useful applications for health monitoring where we can customize the cases based on data from the database.
It needs to have the facility to monitor data inside tables and the status of the UI.
How has it helped my organization?
The service catalog helped improve our organization by giving a good view of the flow for our microservices applications. It's important when we have different developers working on different services, and the trace and log features help the on-call person locate the microservice.
The application performance monitoring has also been useful. This module had a few functionalities that we needed for the application health check. This needs to have some more features to consolidate the view in one tree. We may need more of a one-stop shop on top of the dashboard, and that is missing in Datadog. We'd like to be able to scrap our existing monitoring tool.
What is most valuable?
The service catalog is very useful. We use this type of offering for our microservices applications, and it gives a good view of flow. It is a must when we have different developers working on different services. Having the trace and log features has been useful for helping the on-call person locate the microservice.
The dashboard is great. It is helpful to get a view of specific things that we need to monitor for our application. It has been a good way to watch specific things and add them together.
The application performance monitoring is an excellent aspect. This module had a few functionalities that we needed for the application health check. This needs to have some more features to consolidate the view into one tree, however.
What needs improvement?
The dashboard could be improved. It would be helpful to get a view of specific things that we need to monitor for our application. However, it was a good way to watch specific things and add them together.
The application performance monitoring module had very few functionalities that we needed for the application health check. This needs to have some more features to consolidate the view into one tree.
For how long have I used the solution?
I've used the solution for one month.
Which solution did I use previously and why did I switch?
We previously used ITRS Geneos.
What other advice do I have?
We are using the latest version of the solution.
I'd rate the solution seven out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer. Provider
Software Engineer at a healthcare company with 501-1,000 employees
Great dashboards and custom metrics with the ability to parse logs
Pros and Cons
- "The dashboards are great."
- "We need more advanced querying against logs."
What is our primary use case?
We share dashboards, set up alerts, and monitor everything that happens in our system. We use it in staging, features, production, and our load test environment. It is exceptionally helpful for making our engineering more data-driven.
I came from a company that believes we should focus on being telemetry driven. Instilling this in a smaller, less mature engineering organization has been challenging. However, it is much easier while using Datadog.
What is most valuable?
The dashboards are great. They are an easy way to give visibility into what we need to watch with others who are not SMEs.
I enjoy the custom metrics. With this, we can take things that were once logs and then retain them longer.
We are able to parse logs. To be honest, this was only useful due to the fact that we had not yet set up the Datadog agent properly in PHP. Once we did this, the Datadog log parsing was no longer needed.
The ability to pin to a date and time is very helpful. This allows us to pinpoint exactly what was happening.
What needs improvement?
We need more advanced querying against logs. While most issues I have had here can be alleviated by way of sending better-formatted logs, it would be cool to do SQL-type queries against our data.
We need a way to see dashboard metadata. We launched a huge customer, and we saw more people using Datadog than ever across the entire organization, yet had no way to tell.
It would be ideal if we had some way to compare arbitrary date times more easily. We would love to use the Diff Graph command against some hard-coded value, for instance, against some known event.
For how long have I used the solution?
I've used the solution for eight months.
What do I think about the scalability of the solution?
The scalability is great!
Which solution did I use previously and why did I switch?
We previously used New Relic. I was not part of the decision-making team that made the switch.
What was our ROI?
The ROI is the speed at which we can debug live sites. It has been excellent. It's amazing how many incidents we can capture before customers notice.
Which other solutions did I evaluate?
We looked into New Relic and a home-brewed solution as potential other options.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Solutions Consultant Manager at a computer software company with 1,001-5,000 employees
Stable cloud monitoring solution that is easy to use and deploy and is budget friendly
Pros and Cons
- "Datadog is easy to use and easy to deploy. It's a better solution compared to others on the market in terms of being budget friendly for our customers."
- "Datadog could be improved if it could detect other software in a container or server."
What is our primary use case?
We use this solution for our customer's IP and to support their cloud infrastructure.
What is most valuable?
Datadog is easy to use and easy to deploy. It's a better solution compared to others on the market in terms of being budget friendly for our customers.
What needs improvement?
Datadog could be improved if it could detect other software in a container or server. Datadog is better than other APM or observability tools, but it focuses mostly on telling the customer what they need to know about the software, database or applications that land on the server. We also need to know the version before setting up an agent with the APM modeling tool.
In some instances, ownership of a particular piece of software passes to another person without the original owner transferring the knowledge or data needed to manage the server. The new person needs to monitor this server, and they need to know what software, and which version of it, was installed on the server before they use the APM agent for monitoring. If Datadog could provide this insight, it would improve how we use the solution.
In a future release, we would like to be able to complete a network traffic or network flow analysis to detect the errors or problems on the network.
For how long have I used the solution?
I have been using this solution for two years.
What do I think about the stability of the solution?
This is a stable solution.
How was the initial setup?
The initial setup was straightforward. We needed two engineers for the deployment.
What's my experience with pricing, setup cost, and licensing?
This solution is budget friendly.
What other advice do I have?
Overall, Datadog is a good product to use and is easy to deploy.
I would rate this solution a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
IT Test Manager at a transportation company with 10,001+ employees
Very good documentation provided along with regular new features
Pros and Cons
- "Datadog is constantly adding new features."
- "Lacks some flexibility in the customization."
What is our primary use case?
Our primary use case is log management and we also use the solution for monitoring the application and underlying infrastructure. I'm an IT test manager.
What is most valuable?
I appreciate that they are constantly adding new features, some of which we haven't yet had a chance to implement.
What needs improvement?
I'd like to see more flexibility in the customization. There are a few settings that need to be changed, but we are unable to make those changes as users or as the administrator. The tagging needed to interconnect the different parts of the monitoring is a bit tricky and takes time to work out.
For how long have I used the solution?
I've been using this solution for 18 months.
What do I think about the stability of the solution?
The stability is good.
What do I think about the scalability of the solution?
I would say that the amount that we are monitoring is not that large and we've never had any scalability issues. We have around 50 users in our department.
How are customer service and support?
The availability or accessibility to customer service is not always good, although they generally provide solutions once you do manage to get hold of them.
Which solution did I use previously and why did I switch?
We have previously used different tools for different parts of the monitoring. We changed to AWS when we moved to the cloud. We also found that the effort in maintaining Grafana and Prometheus and keeping it up to date was taking too much time.
How was the initial setup?
The initial setup was straightforward. We used a service provider, and they also maintain our operation in general.
What's my experience with pricing, setup cost, and licensing?
We have a four-year contract with Datadog, and the solution is pay-as-you-use.
What other advice do I have?
I would suggest using the documentation, which is quite good. It's best to start with existing integrations, and then do the customization step-by-step.
I rate this solution eight out of 10.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros
sharing their opinions.
Updated: January 2026