reviewer2002896 - PeerSpot reviewer
VP at a financial services firm with 10,001+ employees
Real User
Oct 31, 2022
Good monitoring, dashboards, and flame graphs
Pros and Cons
  • "The most valuable aspect is the APM which can monitor the metrics and latencies."
  • "The correlation between the logs and the metrics needs improvement, as in most cases we might use another logging tool (one that is cheaper) which we then have to link together."

What is our primary use case?

The product is used for APM solutions for the metrics and traces for the REST API requests and service maps to understand the upstream and downstream services.

We are creating dashboards and widgets to monitor the status. We are creating alerts and monitors as well. We integrated the alerts and ticketing system in our organization with SNOW and Netcool.

We are using Kubernetes, AWS, and infrastructure metrics. We are using Kafka and Aurora Postgres logs as well, and we are using HTTP status codes to identify the error types.
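As a rough illustration of using HTTP status codes to identify error types, the sketch below buckets codes into coarse classes and counts them. The function names and the sample list are hypothetical and are not part of any Datadog API.

```python
from collections import Counter


def classify_status(code: int) -> str:
    """Map an HTTP status code to a coarse error type."""
    if 200 <= code < 400:
        return "ok"            # success and redirects
    if 400 <= code < 500:
        return "client_error"  # e.g. 404, 429
    if 500 <= code < 600:
        return "server_error"  # e.g. 500, 503
    return "other"             # informational or invalid codes


def summarize(codes):
    """Count how many responses fall into each error type."""
    return Counter(classify_status(code) for code in codes)


# Hypothetical sample of response codes from a REST API.
sample = [200, 201, 404, 500, 503, 302, 429]
print(summarize(sample))
```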

How has it helped my organization?

So far, the solution works very well and solves most of the problems we have. Currently, we are trying to integrate the trace ID into Datadog and correlate the logs and metrics. However, Datadog does not support the Spring-generated trace IDs, and they are not shown in the Datadog UI. Correlation only works in the other direction: Datadog injects its own DD-specific trace ID into the application logs, and those logs can then live in other tools, for example, CloudWatch and Splunk.
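A minimal sketch of the injection direction described above, using only the Python standard library. The `current_trace` context variable and the logger name are hypothetical stand-ins for whatever a real tracer manages. The `dd.trace_id` and `dd.span_id` field names follow the convention Datadog uses for log correlation, but the exact format a given logging tool expects is an assumption here.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active trace context; a real tracer would manage this.
current_trace = contextvars.ContextVar("current_trace", default=(0, 0))


class TraceContextFilter(logging.Filter):
    """Attach the active trace and span IDs to every log record."""

    def filter(self, record):
        trace_id, span_id = current_trace.get()
        record.dd_trace_id = trace_id
        record.dd_span_id = span_id
        return True  # never drop the record, only enrich it


def build_logger(stream):
    """Create a logger whose output carries the trace IDs in each line."""
    logger = logging.getLogger("orders")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s [dd.trace_id=%(dd_trace_id)s dd.span_id=%(dd_span_id)s] %(message)s"
    ))
    handler.addFilter(TraceContextFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
current_trace.set((1234567890, 987654321))  # made-up IDs for illustration
log.info("payment accepted")
print(buffer.getvalue().strip())
```

Because the IDs travel inside the log line itself, any log store that receives the line can be searched by trace ID, which is what makes linking a separate logging tool back to a trace possible.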

What is most valuable?

The most valuable aspect is the APM which can monitor the metrics and latencies. There's a low error rate, and any alerts can be tagged to the service requests and sent via email to the required DLs. 

We can create incidents as well in our internal tools, like SNOW and Netcool.

The monitoring enables different dimensions of metrics to monitor the services and infrastructure. 

We have cloud infrastructure monitoring for Kubernetes nodes, pods, containers, and ingress metrics.

Alerts are sent to an email in case of any issues. The metrics are used to create alerts.

The solution offers good dashboards, service maps, traces and flame graphs, HTTP status codes, power packs, service catalogs, and profiling.

While the logs module is not activated, we are using all other modules.

What needs improvement?

The correlation between the logs and the metrics needs improvement, as in most cases we might use another logging tool (one that is cheaper) which we then have to link together.

They could improve the SSO login as well. Currently, we have to log in again every two to three days by explicitly sending the login link.

Buyer's Guide
Datadog
March 2026
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: March 2026.
884,873 professionals have used our research since 2012.

For how long have I used the solution?

I've been using the solution for two years. 

What do I think about the stability of the solution?

The stability is awesome. 

What do I think about the scalability of the solution?

We are expanding beyond observability right now.

How are customer service and support?

They offer pretty awesome customer support.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

How was the initial setup?

The initial setup was easy.

What about the implementation team?

We implemented the solution with the help of a vendor team.

What was our ROI?

I'd rate the ROI ten out of ten.

What's my experience with pricing, setup cost, and licensing?

I would recommend Datadog to others.

Which other solutions did I evaluate?

We also evaluated ECE and Splunk.

What other advice do I have?

The solution has a great support model.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer2004336 - PeerSpot reviewer
Software Engineer at a tech vendor with 1,001-5,000 employees
Real User
Oct 31, 2022
Great profiling and tracing but storage is expensive
Pros and Cons
  • "Anything I've wanted to do, I found a way to get it done through Datadog."
  • "When it comes to storing the logs with Datadog, I'm not sure why it costs so much to store gigabytes or terabytes of information when it's a fraction of the cost to do so myself."

What is our primary use case?

We use the solution for application hosting and a little bit of everything when it comes to supporting a worldwide logistics tracking service. It's used as a central service for collecting telemetry and logs. We find it does the same work as all of our old tools combined, including Prometheus, Kibana, Google Logs, and more. Putting all of this information in a single platform makes it easy to corroborate information and associate a request with the data, which might be lost when it is saved only as logs.

How has it helped my organization?

At my organization, we have plenty of microservices written in different languages. Different teams prefer one or the other framework or library within those languages.

With Datadog, we can get in a single line and march in the same direction; our logs and metrics are collected in the same fashion, making it easy to find bugs or integration problems across services and understand how they interact with other systems.

What is most valuable?

I primarily prefer to utilize the profiling and tracing feature. It can potentially be used as a more-informed alternative to logs.

Beyond that, anything I've wanted to do, I found a way to get it done through Datadog. It allows for testing, logging, hardware monitoring, system performance, memory consumption, advanced observability, AI assistance, cross-team collaboration, and business analytics. Datadog helps some of the world’s biggest brands transform faster with the help of true AIOps, AI-assisted answers, UX and business analytics, cloud observability, and smart AI assistance.

It's all supporting my desire to build a great application, and in a centralized SaaS application, it's hard to say anything can beat it.

What needs improvement?

The cost of log storage is a little unexpected. Most services generate gigabytes of logs, which is not an excessive size. When it comes to storing the logs with Datadog, I'm not sure why it costs so much to store gigabytes or terabytes of information when it's a fraction of the cost to do so myself.

For how long have I used the solution?

I've used the solution for one year.

What do I think about the stability of the solution?

We have no concerns with stability.

What do I think about the scalability of the solution?

It appears that there are no issues with scaling.

How are customer service and support?

Technical support is slow. It takes forever to get responses from the support team.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I've previously used Kibana and Prometheus. We are still using these.

How was the initial setup?

Setting up through the environment variables made it unbelievably easy to get started.

What about the implementation team?

We've implemented the solution in-house.

What was our ROI?

I do not have this number off-hand, as I am not the finance guy. I just like the product.

What's my experience with pricing, setup cost, and licensing?

I'd advise new users not to start off by sending logs.

Which other solutions did I evaluate?

We did not really look at other options.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer2000457 - PeerSpot reviewer
Staff Cloud Engineer at an energy/utilities company with 51-200 employees
Real User
Oct 31, 2022
Good infrastructure and APM metrics with easy onboarding of new products
Pros and Cons
  • "We rely heavily on the API crawlers that Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without having also to make them add them at the agent level."
  • "The real issue with this product is cost control."

What is our primary use case?

We are using the solution for migrating out of the data center. Old apps need to be re-architected. We plan to move to multi-cloud for disaster recovery and to avoid vendor lock-in. The migration is a mix between an MSP (Infosys) and in-house devs. The hard part is ensuring these apps run the same in the cloud as they do on-prem. Then we also need to ensure that we improve performance when possible. With deadlines approaching quickly, it is important not to cut corners, which is why we needed observability.

How has it helped my organization?

The product has created a paradigm shift in how we deploy monitoring. Before, we had a one-to-one lookup in ServiceNow. This wouldn't scale, as teams wouldn't be able to create monitors on the fly and would have to wait on us to contact the ServiceNow team to create a custom lookup. Now, in real-time, as new instances are spun up and down, they are still guaranteed to be covered by monitoring. This used to require a change request, and now it is automatic.

What is most valuable?

For us, the most valuable features we have are infrastructure and APM metrics. The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze.

We rely heavily on the API crawlers that Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without also having to make them add the tags at the agent level. Then we use Datadog's conditionals in the monitor to dynamically alert hundreds of teams, and with the ServiceNow integration, we can also assign tickets based on the environment. Now, our top teams are using APM/profiler to find bottlenecks and improve the speed of our apps.

What needs improvement?

The real issue with this product is cost control. For example, when logs first came out, they didn't have any index cuts. This led to runaway logs and exploding costs.

It seems that admin cost control granularity is an afterthought. For example, synthetics have been out for over four years, yet there are no ways to limit teams from creating tests that fire off every minute. If we could say you can't test more than once every five minutes that would save us 5X on our bill.
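The 5X figure follows from simple arithmetic on test frequency. The sketch below assumes, as a simplification, that the synthetics bill scales linearly with the number of test runs; the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int) -> int:
    """Number of times one synthetic test fires per day at a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


every_minute = runs_per_day(1)  # 1440 runs per day
every_five = runs_per_day(5)    # 288 runs per day

# Ratio of runs, and under the linear-billing assumption, of cost.
print(every_minute / every_five)  # 5.0
```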

For how long have I used the solution?

I've been using the solution for about three years. 

What do I think about the stability of the solution?

The solution is very stable. There are not too many outages, and they fix them fast.

What do I think about the scalability of the solution?

It is easy to scale. It's why we adopted it. 

How are customer service and support?

Before we had premium support, I would avoid contacting support since it was so bad.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We previously used AppDynamics. It isn't built for the cloud and is hard to deploy at scale.

How was the initial setup?

The initial setup was not complex. We just had to teach teams the concept of tags.

What about the implementation team?

We implemented the solution in-house. It was me. I am the SME for Datadog at the company.

What was our ROI?

We have seen an ROI. It has saved months of time and reduced blindspots for all app teams.

What's my experience with pricing, setup cost, and licensing?

We'd advise new users to be careful with logs and APM, as those are the ones that can get expensive fast.

Which other solutions did I evaluate?

We looked into Dynatrace. However, we found the cost to be high.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer2000466 - PeerSpot reviewer
Senior Cloud Engineer, Vice President of Monitoring at a financial services firm with 10,001+ employees
Real User
Oct 31, 2022
Good ServiceNow integration, helpful API crawlers, and useful APM metrics
Pros and Cons
  • "The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze."
  • "It seems that admin cost control granularity is an afterthought."

What is our primary use case?

We are using the solution for migrating out of the data center. Old apps need to be re-architected. We are planning on moving to multi-cloud for disaster recovery and to avoid vendor lock-in.

The migration is a mix between an MSP (Infosys) and in-house developers. The hard part is ensuring these apps run the same in the cloud as they do on-premises. Then we also need to ensure that we improve performance when possible. With deadlines approaching quickly, it's important not to cut corners, which is why we needed observability.

How has it helped my organization?

Using the product has caused a paradigm shift in how we deploy monitoring. Before, we had a one-to-one lookup in ServiceNow. This wouldn't scale, as teams wouldn't be able to create monitors on the fly and would have to wait on us to contact the ServiceNow team to create a custom lookup. Now, in real-time, as new instances are spun up and down, they are still guaranteed to be covered by monitoring. This used to require a change request, and now it is automatic.

What is most valuable?

For us, the most valuable features we have are infrastructure and APM metrics.

The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze. 

We rely heavily on the API crawlers Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without also having to make them add the tags at the agent level. Then we use Datadog's conditionals in the monitor to dynamically alert hundreds of teams.

With the ServiceNow integration, we can also assign tickets based on the environment. Now our top teams are using the APM/profiler to find bottlenecks and improve the speed of our apps.
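The idea of routing one monitor to many teams by tag can be sketched as follows. This is a standalone illustration of the concept, not Datadog's monitor syntax or the ServiceNow integration; the `Alert` class, the tag names `team` and `env`, the notification handle format, and the assignment group names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    title: str
    tags: dict = field(default_factory=dict)


# Hypothetical mapping from environment tag to ticket assignment group.
ASSIGNMENT_GROUPS = {"prod": "ops-critical", "staging": "ops-standard"}


def route(alert: Alert) -> dict:
    """Pick the notification target and ticket group from the alert's tags."""
    team = alert.tags.get("team", "unowned")
    env = alert.tags.get("env", "unknown")
    return {
        "notify": f"@team-{team}",
        "assignment_group": ASSIGNMENT_GROUPS.get(env, "ops-triage"),
    }


print(route(Alert("High latency", {"team": "billing", "env": "prod"})))
```

Because the routing reads tags that teams already apply to their resources, a newly created instance is covered without anyone maintaining a per-service lookup table.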

What needs improvement?

The real issue with this product is cost control. For example, when logs first came out they didn't have any index cuts. This caused runaway logs and exploding costs. 

It seems that admin cost control granularity is an afterthought. For example, synthetics have been out for over four years, yet there is no way to limit teams from creating tests that fire off every minute. If we could say you can't test more than once every five minutes, that would save us 5X on our bill.

For how long have I used the solution?

I've used the solution for about three years. 

What do I think about the stability of the solution?

The solution is very stable. There are not too many outages, and they fix them fast.

What do I think about the scalability of the solution?

It is easy to scale. That is why we adopted it.

How are customer service and support?

Before we had premium support, I would avoid contacting support as it was so bad.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We previously used AppDynamics. It isn't built for the cloud and is hard to deploy at scale.

How was the initial setup?

The initial setup was not difficult. We just had to teach teams the concept of tags.

What about the implementation team?

We did the implementation in-house. It was me. I am the SME for Datadog at the company.

What was our ROI?

The solution has saved months of time and reduced blindspots for all app teams.

What's my experience with pricing, setup cost, and licensing?

I'd advise users to be careful with logs and APM, as those are the ones that can get expensive fast.

Which other solutions did I evaluate?

We looked into Dynatrace. However, we found the cost to be high.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
LuWang - PeerSpot reviewer
DevOps Engineer at Screencastify
Real User
Oct 31, 2022
Customizable and helpful for isolating and filtering environments
Pros and Cons
  • "We have way more observability than what we had before - on the application and the overall system."
  • "Auto instrumentation on tracing has not been very easy to find in the documentation."

What is our primary use case?

We use Datadog for observability and system/application health, mainly for product support, triaging, debugging, and incident responses.

We use a lot of the logging and the Datadog agent to collect logs, metrics, and traces from our GKE workloads. We use APM and continuous profiling for latency and performance measurement. We use RUM to observe frontend user events, such as tracing on request and what actions they take before errors occur. We also use error tracking and source maps to debug production failures.

We are still relatively new to the product, and we are planning to use more of the notebook functionality and power packs to record run books and break knowledge silos. We also need to utilize dashboards and continuous profiling more for performance measurement and integrate Datadog alerts for incident response.

How has it helped my organization?

We have way more observability than what we had before - on the application and the overall system. That includes the GKE cluster, nodes, and pods. It's helped with our cloud-run instances, databases, and data storage.

We also started observability in the CI pipeline to measure our CI performance, as it was a pain point for us. We are aiming to do incremental deployments and releases, and the bottleneck so far has been our CI performance. The visibility on which actions or functions take the most time allows us to pinpoint and focus on improving configurations on these.

What is most valuable?

We use structured logging a lot to triage production issues. The querying, attributes and tags manipulation, and customization have been very helpful in isolating and filtering environments. The integration with Winston logger has also been a breeze.

First and foremost, structured logging, tags, and attributes have not only allowed us to narrow down a problem quickly in production; they have also let us create dashboards from these logs to understand more user behaviors, such as how many users stop and leave our application before an upload has completed. That helps us understand how important processing time is to a user.
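To make the upload example concrete, here is a small sketch of structured (JSON) log lines and a query over their attributes. The event names, the `user` and `env` attributes, and the helper functions are hypothetical; the reviewer's actual setup uses Winston in Node.js, and Python is used here only for illustration.

```python
import json


def log_event(event: str, **attributes) -> str:
    """Render one structured log line as JSON with searchable attributes."""
    return json.dumps({"event": event, **attributes}, sort_keys=True)


# Hypothetical log lines from two environments.
lines = [
    log_event("upload_started", user="u1", env="prod"),
    log_event("upload_completed", user="u1", env="prod"),
    log_event("upload_started", user="u2", env="prod"),
    log_event("upload_started", user="u3", env="staging"),
]


def abandoned_uploads(log_lines, env: str) -> set:
    """Users in one environment who started an upload but never completed it."""
    started, completed = set(), set()
    for line in log_lines:
        record = json.loads(line)
        if record.get("env") != env:
            continue  # filter by environment attribute
        if record["event"] == "upload_started":
            started.add(record["user"])
        elif record["event"] == "upload_completed":
            completed.add(record["user"])
    return started - completed


print(abandoned_uploads(lines, "prod"))  # {'u2'}
```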

We also intend to use distributed tracing more to understand where the error has occurred in a particular request.

What needs improvement?

Definitely, documentation could use improvement. As I navigated and tried to find instrumentation and implementation details, I discovered inconsistencies among the SDKs for different languages.

There are also places where highlighting can be improved. I once created an issue on GitHub, and it was resolved right away by an engineer. He pointed out that it was actually in the documentation. I looked again and found it was not very obvious. We were stuck on the problem for days.

Auto instrumentation on tracing has not been very easy to find in the documentation. We ended up using OpenTelemetry, yet the conversion between tracing contexts has been difficult.

For how long have I used the solution?

We've used the solution between six months and a year. 

How are customer service and support?

Customer service and support are generally very fast. One ticket, which involved changing the log index retention period, did go unanswered. Any support tickets related to technical issues were resolved pretty fast.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used to use GCP Stackdriver for logging and monitoring since our infrastructure is all GCP based. It was lacking a lot, particularly on tracing and structured logging. We often had a lot of trouble triaging and diagnosing a production problem. Datadog's specialty is observability. Since we started using the product, we were able to create dashboards, and utilize APM, continuous profiling, RUM, and distributed tracing for production support and user trends.

Datadog also offers labs and workshops for its products, which is very helpful.

What about the implementation team?

We implemented the product ourselves.

What was our ROI?

I'm not sure what our ROI would be.

What's my experience with pricing, setup cost, and licensing?

We started with on-demand pricing as we were re-writing our product, and we weren't sure about the total usage. After we went into production and released the product, we experienced a price surge. Fortunately, our Datadog account manager reached out to us and suggested a monthly subscription, which is what we'll be switching to.

I'd advise keeping an eye on the usage and possibly setting up some monitoring on price. We didn't have much of a setup cost; we started with a free trial and continued with on-demand after the trial ended.

Which other solutions did I evaluate?

We didn't evaluate many of the other options. However, we do also use OpenTelemetry, which is vendor agnostic and integrates with Datadog.

What other advice do I have?

We always keep the Datadog agent to the latest version.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer2000448 - PeerSpot reviewer
Senior Manager at a manufacturing company with 10,001+ employees
Real User
Oct 30, 2022
Great network monitoring, testing, and integration tools
Pros and Cons
  • "The visibility into our network has allowed for quick diagnosis of failures, identification of underutilized or over-utilized resources, and allowed for cloud cost optimization opportunities."
  • "I would love to see more metrics or analytics in IoT devices."

What is our primary use case?

This solution is for physical device monitoring across breweries, including PLCs, HMI Cameras, RFID panels, scales, etc. We want to gain visibility into these devices to influence predictive maintenance and unscheduled downtime. We want to monitor physical devices across the zone from a control tower perspective for end users and support teams alike. Understanding more about the performance of the devices and mechanical components will allow us to schedule downtime to fix imminent catastrophic failures and prevent unplanned downtime and lost revenue.

How has it helped my organization?

Previously, we had no visibility into the architectural layout of our infrastructure. The UI of Datadog has allowed for increased visibility and access to broken or underperforming resources or critical pieces of infrastructure. Beyond this, it has allowed us to identify areas where we can optimize cost in our cloud infrastructure.

What is most valuable?

The most valuable features I have found are network monitoring, testing, and integration tools. The visibility into our network has allowed for quick diagnosis of failures, identification of underutilized or over-utilized resources, and allowed for cloud cost optimization opportunities. The ability to correlate metrics has proven useful in determining downstream or upstream issues influencing the device, machine, or database having issues.

What needs improvement?

I would love to see more metrics or analytics in IoT devices. 

For how long have I used the solution?

I've been using the solution for approximately two years.

What do I think about the stability of the solution?

I have never experienced an issue or outage.

What do I think about the scalability of the solution?

The solution is very scalable and developed in a fashion that provides the ability to scale easily.

How are customer service and support?

Customer service has been outstanding. They have been timely and knowledgeable with all of my questions.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used a different product for the total stack solution.

How was the initial setup?

The initial setup was straightforward.

What about the implementation team?

We handled the setup process in-house.

What was our ROI?

I'm unsure whether we've seen an ROI.

Which other solutions did I evaluate?

We did evaluate SolarWinds.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer1996905 - PeerSpot reviewer
VP, Application support at a financial services firm with 10,001+ employees
Real User
Oct 30, 2022
Good service catalog and dashboard but the application performance monitoring module needs more functionality
Pros and Cons
  • "The service catalog helped improve our organization by giving a good view of the flow for our microservices applications."
  • "The dashboard could be improved. It would be helpful to get a view of specific things that we need to monitor for our application."

What is our primary use case?

We primarily use the solution for the service catalog.

We use this type of offering for our Microservices applications, and it gives a good view of flow. It is a must when we have different developers working on different services.

Having the trace and log features is useful for locating the microservice for the on-call person.

We would like to see some more useful applications for health monitoring where we can customize the cases based on data from the database.

It needs to have the facility to monitor data inside tables and the status of the UI.

How has it helped my organization?

The service catalog helped improve our organization by giving a good view of the flow for our microservices applications. It's important when we have different developers working on different services, and having the trace and log features helps the on-call person locate the microservice.

The application performance monitoring has also been useful. This module had a few functionalities that we needed for the application health check. This needs to have some more features to consolidate the view in one tree. We may need more of a one-stop shop on top of the dashboard, and that is missing in Datadog. We'd like to be able to scrap our existing monitoring tool.

What is most valuable?

The service catalog is very useful. We use this type of offering for our Microservices applications, and it gives a good view of flow. It is a must when we have different developers working on different services. Having the trace and log features has been useful for locating the microservice for the on-call person.

The dashboard is great. It is helpful to get a view of specific things that we need to monitor for our application. It has been a good way to watch specific things and add them together.

The application performance monitoring is an excellent aspect. This module had a few functionalities that we needed for the application health check. This needs to have some more features to consolidate the view into one tree, however.

What needs improvement?

The dashboard could be improved. It would be helpful to get a view of specific things that we need to monitor for our application. However, it was a good way to watch specific things and add them together.

The application performance monitoring module had very few functionalities that we needed for the application health check. This needs to have some more features to consolidate the view into one tree.

For how long have I used the solution?

I've used the solution for one month. 

Which solution did I use previously and why did I switch?

We previously used ITRS Geneos.

What other advice do I have?

We are using the latest version of the solution. 

I'd rate the solution seven out of ten. 

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer. Provider
PeerSpot user
reviewer1996488 - PeerSpot reviewer
Software Engineer at Spring Health
User
Oct 26, 2022
Great dashboards and custom metrics with the ability to parse logs
Pros and Cons
  • "The dashboards are great."
  • "We need more advanced querying against logs."

What is our primary use case?

We share dashboards, set up alerts, and monitor everything that happens in our system. We use it in staging, features, production, and our load test environment. It is exceptionally helpful for making our engineering more data-driven. 

I came from a company that believes we should focus on being telemetry driven. Instilling this in a smaller, less mature engineering organization has been challenging. However, it is much easier while using Datadog.

What is most valuable?

The dashboards are great. They are an easy way to give visibility into what we need to watch with others who are not SMEs.

I enjoy the custom metrics. With this, we can take things that were once logs and then retain them longer.
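One way to read this is that an event which used to be written as a log line is emitted as a counter instead. The sketch below only formats a counter in the DogStatsD datagram style (`name:value|c|#tags`). The metric name and tags are hypothetical, and the `send` helper assumes an agent listening on the default local UDP port 8125, so it is defined but not called here.

```python
import socket


def dogstatsd_counter(name: str, value: int = 1, tags: dict = None) -> str:
    """Format a counter increment as a DogStatsD-style datagram."""
    datagram = f"{name}:{value}|c"
    if tags:
        tag_text = ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
        datagram += f"|#{tag_text}"
    return datagram


def send(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; assumes an agent is listening locally."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# What was once a "checkout failed: timeout" log line becomes a tagged counter.
print(dogstatsd_counter("checkout.failed", tags={"env": "prod", "reason": "timeout"}))
# checkout.failed:1|c|#env:prod,reason:timeout
```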

We are able to parse logs. To be honest, this was only useful due to the fact that we had not yet set up the Datadog agent properly in PHP. Once we did this, the Datadog log parsing was no longer needed.

The ability to pin to a date and time is very helpful. This allows us to pinpoint exactly what was happening.

What needs improvement?

We need more advanced querying against logs. While most issues I have had here can be alleviated by way of sending better-formatted logs, it would be cool to do SQL-type queries against our data.

We need a way to see dashboard metadata. We launched a huge customer, and we saw more people using Datadog than ever across the entire organization, yet had no way to tell.

It would be ideal if we had some way to compare arbitrary date times more easily. We would love to use the Diff Graph command against some hard-coded value, for instance, against some known event.

For how long have I used the solution?

I've used the solution for eight months.

What do I think about the scalability of the solution?

The scalability is great!

Which solution did I use previously and why did I switch?

We previously used New Relic. I was not part of the decision-making team that made the switch.

What was our ROI?

The ROI is the speed at which we can debug live sites. It has been excellent. It's amazing how many incidents we can capture before customers notice.

Which other solutions did I evaluate?

We looked into New Relic and a home-brewed solution as potential other options.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user