ZJ - PeerSpot reviewer
Software Engineer at a computer software company with 201-500 employees
User
Top 20
Sep 23, 2024
Very good custom metrics, dashboards, and alerts
Pros and Cons
  • "The dashboards provide a comprehensive and visually intuitive way to monitor all our key data points in real-time, making it easier to spot trends and potential issues."
  • "One key improvement we would like to see in a future Datadog release is the inclusion of certain metrics that are currently unavailable. Specifically, the ability to monitor CPU and memory utilization of AWS-managed Airflow workers, schedulers, and web servers would be highly beneficial for our organization."

What is our primary use case?

Our primary use case for Datadog involves utilizing its dashboards, monitors, and alerts to monitor several key components of our infrastructure. 

We track the performance of AWS-managed Airflow pipelines, focusing on metrics like data freshness, data volume, pipeline success rates, and overall performance. 
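
As a rough illustration of the pipeline metrics mentioned above, the sketch below computes data freshness and success rate from a list of run records. The `PipelineRun` record and its fields are hypothetical, not an Airflow or Datadog type, and submitting the values as custom metrics is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PipelineRun:
    """Hypothetical record of one pipeline run (not an Airflow or Datadog type)."""
    finished_at: datetime
    succeeded: bool
    rows_written: int


def freshness_seconds(runs: list[PipelineRun], now: datetime) -> float:
    """Seconds since the most recent successful run finished."""
    successful = [r.finished_at for r in runs if r.succeeded]
    if not successful:
        return float("inf")  # no successful run yet: treat data as maximally stale
    return (now - max(successful)).total_seconds()


def success_rate(runs: list[PipelineRun]) -> float:
    """Fraction of runs that succeeded (0.0 when there are no runs)."""
    if not runs:
        return 0.0
    return sum(r.succeeded for r in runs) / len(runs)
```

Values like these could then be reported as gauges and charted or alerted on.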

In addition, we monitor Looker dashboard performance to ensure data is processed efficiently. Database performance is also closely tracked, allowing us to address any potential issues proactively. This setup provides comprehensive observability and ensures that our systems operate smoothly.

How has it helped my organization?

Datadog has significantly improved our organization by providing a centralized platform to monitor all our key metrics across various systems. This unified observability has streamlined our ability to oversee infrastructure, applications, and databases from a single location. 

Furthermore, the ability to set custom alerts has been invaluable, allowing us to receive real-time notifications when any system degradation occurs. This proactive monitoring has enhanced our ability to respond swiftly to issues, reducing downtime and improving overall system reliability. As a result, Datadog has contributed to increased operational efficiency and minimized potential risks to our services.

What is most valuable?

The most valuable features we’ve found in Datadog are its custom metrics, dashboards, and alerts. The ability to create custom metrics allows us to track specific performance indicators that are critical to our operations, giving us greater control and insights into system behavior. 

The dashboards provide a comprehensive and visually intuitive way to monitor all our key data points in real-time, making it easier to spot trends and potential issues. Additionally, the alerting system ensures we are promptly notified of any system anomalies or degradations, enabling us to take immediate action to prevent downtime. 

Beyond the product features, Datadog’s customer support has been incredibly timely and helpful, resolving any issues quickly and ensuring minimal disruption to our workflow. This combination of features and support has made Datadog an essential tool in our environment.

What needs improvement?

One key improvement we would like to see in a future Datadog release is the inclusion of certain metrics that are currently unavailable. Specifically, the ability to monitor CPU and memory utilization of AWS-managed Airflow workers, schedulers, and web servers would be highly beneficial for our organization. These metrics are critical for understanding the performance and resource usage of our Airflow infrastructure, and having them directly in Datadog would provide a more comprehensive view of our system’s health. This would enable us to diagnose issues faster, optimize resource allocation, and improve overall system performance. Including these metrics in Datadog would greatly enhance its utility for teams working with AWS-managed Airflow.

Buyer's Guide
Datadog
March 2026
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: March 2026.
884,873 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for four months.

What do I think about the stability of the solution?

The stability of Datadog has been excellent. We have not encountered any significant issues so far. 

The platform performs reliably, and we have experienced minimal disruptions or downtime. This stability has been crucial for maintaining consistent monitoring and ensuring that our observability needs are met without interruption.

What do I think about the scalability of the solution?

Datadog is generally scalable, allowing us to handle and display thousands of custom metrics efficiently. However, we’ve encountered some limitations in the table visualization view, particularly when working with around 10,000 data points. In those cases, the search functionality doesn’t always return all valid results, which can hinder detailed analysis.

How are customer service and support?

Datadog's customer support plays a crucial role in easing the initial setup process. Their team is proactive in assisting with metric configuration, providing valuable examples, and helping us navigate the setup challenges effectively. This support significantly mitigates the complexity of the initial setup.

Which solution did I use previously and why did I switch?

We used New Relic before.

How was the initial setup?

The initial setup of Datadog can be somewhat complex, primarily due to the learning curve associated with configuring each metric field correctly for optimal data visualization. It often requires careful attention to detail and a good understanding of each option to achieve the desired graphs and insights.

What about the implementation team?

We implemented the solution in-house.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer2553732 - PeerSpot reviewer
Staff Full-Stack Engineer at OMERS
User
Top 20
Sep 30, 2024
Prompt support with good logging and helps with standardization
Pros and Cons
  • "The initial setup was straightforward from my own experience, helping integrate within the application and service levels."
  • "In production, we intend to use trace IDs generated by RUM to attach to support tickets when a user experiences a traceable network error, and we want to display this trace ID to the user so if they were to contact us about a specific issue, they can provide us an exact ID displayed to them back to us. Currently, this is not possible out-of-the-box client-side without inventing our own solution for capturing these trace IDs, such as shimming the native fetch or returning the ID from the service response."

What is our primary use case?

Internally, our primary usage of Datadog centers on APM/tracing, logging, RUM (real user monitoring), synthetic testing of service/application health and state, general monitoring and observability, and custom dashboards for aggregate observability. We are also increasingly using the more recent service catalog feature.

We have several microservices, several databases, and a few web applications (both external and internal facing), all deployed across several environments, including dev, sit, eat, and production.

How has it helped my organization?

Datadog has had a massive impact on our department. Before, we had loose logging dumped into a sea of GCP logs with haphazard custom solutions for traceability between logs and network calls. Datadog has helped standardize and normalize our processes around observability while providing fantastic tools for aggregating insight around what is monitored regularly, all wrapped in an easy-to-use UI.

Additionally, several types of users exist within our department, and each benefits from Datadog in its own way. DevOps leverages it to easily manage infra, developers leverage it to easily monitor/debug services and applications, and business leverages it for statistics.

What is most valuable?

Personally I've found the RUM (real user monitoring) to be above and beyond what I've worked with before. Client-side monitoring has always been on the short end of the stick but the information collected and ease of instrumentation provided by Datadog is second to none.

Having a live dynamic service map is also one of my favourite features; it provides real-time insights into which services/applications are connected to which.

We are also investigating the new API catalog feature set, which I believe will provide a high-value impact for real-time documentation and information about all of our shared microservices that other dev teams can use.

What needs improvement?

In production, we intend to use trace IDs generated by RUM to attach to support tickets when a user experiences a traceable network error, and we want to display this trace ID to the user so if they were to contact us about a specific issue, they can provide us an exact ID displayed to them back to us. Currently, this is not possible out-of-the-box client-side without inventing our own solution for capturing these trace IDs, such as shimming the native fetch or returning the ID from the service response.
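
One of the workarounds mentioned above, returning the ID from the service response, might look roughly like the following. It is sketched in Python for brevity (a browser client would do the equivalent in JavaScript), and the `x-trace-id` header name and the message wording are assumptions for illustration, not something Datadog RUM provides.

```python
from typing import Mapping, Optional

# Hypothetical header the backend would set; not a Datadog-defined name.
TRACE_HEADER = "x-trace-id"


def extract_trace_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the trace ID from response headers, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == TRACE_HEADER and value.strip():
            return value.strip()
    return None


def user_error_message(status: int, headers: Mapping[str, str]) -> str:
    """Build the message shown to the user, including the trace ID when present."""
    trace_id = extract_trace_id(headers)
    base = f"Request failed (HTTP {status})."
    if trace_id is None:
        return base
    return f"{base} Please quote reference {trace_id} when contacting support."
```

The user can then quote the reference in a support ticket, and the team can look up the matching trace.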

For how long have I used the solution?

I've used the solution for approximately two years across our department and around a year or so of it being used practically and fully integrated into our systems.

What do I think about the stability of the solution?

Aside from one very brief bad RUM update from the Datadog team, Datadog as a whole solution has been highly stable. That update broke the native 'fetch' for Node, because RUM used to (and may still) modify the global 'fetch'; it was resolved quickly.

What do I think about the scalability of the solution?

It's easy to implement and scale, provided there's a solid IaC solution in place to integrate across your system.

How are customer service and support?

The Datadog support team is prompt and helpful when tickets have been submitted from our end. When their support team have been unsure, they've properly reached out internally to the relevant SME to help answer any questions we've had prior.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I've personally dabbled with some other open-source observability and monitoring solutions; however, prior to Datadog, our department did not have any solutions other than log dumps to GCP.

How was the initial setup?

The initial setup was straightforward from my own experience, helping integrate within the application and service levels; however, our DevOps team handled most of the infra process with minimal complaints.

What about the implementation team?

We handled the solution in-house.

What's my experience with pricing, setup cost, and licensing?

I personally am not involved in the decision around costing; however, I am aware that when we first set up Datadog, we explicitly configured our services/applications to have a master switch for the Datadog integration, so that we can dynamically enable or disable it for targeted environments as needed, because costs are incurred on a per-service basis for APM, logging, and so on.
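
A master switch of this kind could be as simple as an environment-driven flag checked at startup. The sketch below is an assumption about how that might look. The variable names `OBSERVABILITY_ENABLED` and `APP_ENV` are hypothetical and are not Datadog settings, and the default of "on in production only" is an illustrative choice.

```python
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def observability_enabled(env: Mapping[str, str]) -> bool:
    """Decide whether to initialize tracing/logging integrations for this process.

    OBSERVABILITY_ENABLED is a hypothetical master switch and APP_ENV a
    hypothetical environment name. An explicit switch always wins; otherwise
    only production defaults to on.
    """
    explicit = env.get("OBSERVABILITY_ENABLED")
    if explicit is not None:
        return explicit.strip().lower() in TRUTHY
    return env.get("APP_ENV", "dev") == "production"
```

In practice this would be called with `os.environ`, and the integration setup skipped when it returns `False`.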

Which other solutions did I evaluate?

I was not involved in the decision-making regarding the evaluation of other options.

What other advice do I have?

I highly recommend Datadog, and I would explore it for my own individual projects in the future, provided the cost is within reason. Otherwise, I would highly recommend it for any medium-to-large-sized org.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Senior Software Engineer at Clearstory.build
User
Top 5
Sep 26, 2024
Excellent for monitoring, analyzing, and optimizing performance
Pros and Cons
  • "Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization."
  • "The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency."

What is our primary use case?

Our primary use case for Datadog is monitoring, analyzing, and optimizing the performance and health of our applications and infrastructure. 

We leverage its logging, metrics, and tracing capabilities to pinpoint issues, track system performance, and improve overall reliability. Datadog’s ability to provide real-time insights and alerting on key metrics helps us quickly address issues, ensuring smooth operations. 

It’s integral for visibility across our microservices architecture and cloud environments.

How has it helped my organization?

Datadog has been incredibly valuable to our organization. Its ability to pinpoint warnings and errors in logs and provide detailed context is essential for troubleshooting. 

The platform's request tracing feature offers comprehensive insights into user flows, allowing us to quickly identify issues and optimize performance. 

Additionally, Datadog's real-time monitoring and alerting capabilities help us proactively manage system health, ensuring operational efficiency across our applications and infrastructure.

What is most valuable?

Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization. This feature helps us quickly identify performance bottlenecks and prioritize improvements. 
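
The same idea can be sketched outside the Datadog UI on a list of request records: keep only requests at or above a latency cutoff and count them per endpoint. The record shape (`endpoint`, `latency_ms`) is hypothetical and is not a Datadog export format.

```python
from collections import defaultdict


def slow_endpoints(requests: list[dict], min_latency_ms: float) -> list[tuple[str, int]]:
    """Count requests at or above the latency cutoff per endpoint, busiest first."""
    counts: dict[str, int] = defaultdict(int)
    for req in requests:
        if req["latency_ms"] >= min_latency_ms:
            counts[req["endpoint"]] += 1
    # Sort by descending count, then endpoint name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Endpoints at the top of the result are the first candidates for further analysis.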

Additionally, the ability to filter requests by user email is extremely useful for tracking down user-specific issues faster. It streamlines the troubleshooting process and enables us to provide more targeted support to individual users, improving overall customer satisfaction.

What needs improvement?

The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency. Additionally, the interface can sometimes feel overwhelming, with so much happening at once, which may discourage users from exploring new features. Simplifying the layout or providing clearer guidance could enhance user experience. Any improvements related to query optimization would be highly beneficial, as it would further streamline workflows and boost productivity.

For how long have I used the solution?

I've used the solution for five years.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Senior Software Engineer at Clearstory.build
User
Top 5
Sep 20, 2024
Capable of pinpointing warnings and errors in logs and providing detailed context
Pros and Cons
  • "Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization."
  • "The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency."

What is our primary use case?

Our primary use case for Datadog is to monitor, analyze, and optimize the performance and health of our applications and infrastructure. 

We leverage its logging, metrics, and tracing capabilities to pinpoint issues, track system performance, and improve overall reliability. 

Datadog’s ability to provide real-time insights and alerting on key metrics helps us quickly address issues, ensuring smooth operations. It’s integral for visibility across our microservices architecture and cloud environments.

How has it helped my organization?

Datadog has been incredibly valuable to our organization. Its ability to pinpoint warnings and errors in logs and provide detailed context is essential for troubleshooting. 

The platform's request tracing feature offers comprehensive insights into user flows, allowing us to quickly identify issues and optimize performance. 

Additionally, Datadog's real-time monitoring and alerting capabilities help us proactively manage system health, ensuring operational efficiency across our applications and infrastructure.

What is most valuable?

Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization. This feature helps us quickly identify performance bottlenecks and prioritize improvements. 

Additionally, the ability to filter requests by user email is extremely useful for tracking down user-specific issues faster. It streamlines the troubleshooting process and enables us to provide more targeted support to individual users, improving overall customer satisfaction.

What needs improvement?

The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency. 

Additionally, the interface can sometimes feel overwhelming, with so much happening at once, which may discourage users from exploring new features. 

Simplifying the layout or providing clearer guidance could enhance user experience. Any improvements related to query optimization would be highly beneficial, as it would further streamline workflows and boost productivity.

For how long have I used the solution?

I've used the solution for five years.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Mason Parry - PeerSpot reviewer
Data Engineer at Nursa
User
Top 20
Sep 20, 2024
Customizable alerts, good dashboards, and improves reliability
Pros and Cons
  • "I like how we can customize alerts, and when alerts have become too noisy, we turn their threshold down fairly easily."
  • "It's not that straightforward when creating an alert. The syntax is a little confusing."

What is our primary use case?

We have several teams and several different projects, all working in tandem, so there are a lot of logs and monitoring that need to be done. We use Datadog mostly for alerting when things go down. 

We also have several dashboards to keep track of critical operations and to make sure things are running without issues. The Slack messaging is essential in our workflow in letting us know when an alert is triggered. I also appreciate all the graphs you can make, as it gives our team a good overview of how our services are doing.

How has it helped my organization?

It has improved our reliability and our time to get back up from an outage. By creating an alert and then messaging a Slack channel, we know when something goes down fairly fast. This, in turn, improves our response time to swarm on an issue without it affecting customers. The graphs have also been useful to demonstrate to higher-ups how our services are performing, allowing them to make more informed decisions when it comes to the team. 

What is most valuable?

The alerts are the most valuable. Alerts have saved us countless times in the past and are essentially what we use Datadog for. 

I like how we can customize alerts, and when alerts have become too noisy, we turn their threshold down fairly easily. This is also the case when alerts should be notifying us more often. 
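
How a threshold change affects alert noise can be illustrated with a small sketch. This is not Datadog's monitor syntax; the function and the sample counts are made up for illustration.

```python
def alert_count(error_counts: list[int], threshold: int) -> int:
    """Number of evaluation windows whose error count exceeds the threshold."""
    return sum(1 for count in error_counts if count > threshold)


# Made-up error counts, one per evaluation window.
windows = [2, 0, 7, 3, 12, 1, 5]

for threshold in (1, 4, 10):
    print(threshold, alert_count(windows, threshold))
```

With these made-up counts, thresholds of 1, 4, and 10 fire 5, 3, and 1 times respectively.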

I also like the graphs and how customizable they are. It allows us to create a nice-looking dashboard with all sorts of information relating to our project. This gives us a quick overview of how things are going.

What needs improvement?

It's not that straightforward when creating an alert. The syntax is a little confusing. I guess that the trade-off is customizability. But it would be nice to have a click-and-drag kind of way when creating an alert. So, if someone who isn't so familiar with Datadog or tech in general wanted to create an alert, they wouldn't need to know the syntax. 

It would also be great if AI could be used to generate alerts and graphs. I could write a short prompt, and then the AI could auto-generate alerts and graphs for me.

For how long have I used the solution?

I've used the solution for more than two years.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Andrei Mita - PeerSpot reviewer
Service Manager at PwC
Real User
Top 5
Oct 2, 2024
Easy to configure with synthetic testing and offers a consolidated approach to monitoring
Pros and Cons
  • "Synthetic testing is by far the most valuable feature in our organization."
  • "One area where the product could be improved is Application Performance Monitoring (APM)."

What is our primary use case?

We use this solution for enterprise monitoring across a large number of applications in multiple environments like production, development, and testing. It helps us track application performance, uptime, and resource usage in real time, providing alerts for issues like downtime or performance bottlenecks. 

Our hybrid environment includes cloud and on-premise infrastructure. The solution is crucial for ensuring reliability, compliance, and high availability across our diverse application landscape.

How has it helped my organization?

Datadog has greatly improved our organization by centralizing all monitoring into one platform, allowing us to consolidate data from a wide range of sources. 

From infrastructure metrics and application logs to end-user experience and device monitoring, everything is now collected and displayed in one place. This has simplified our monitoring processes, improved visibility, and allowed for faster issue detection and resolution. 

By streamlining these operations, Datadog has enhanced both efficiency and collaboration across teams.

What is most valuable?

Synthetic testing is by far the most valuable feature in our organization. It’s highly requested since the setup process is both quick and straightforward, allowing us to simulate user interactions across our applications with minimal effort. 

The ease of configuring tests and interpreting the results makes it accessible even to non-technical team members. This feature provides valuable insights into user experience, helps identify performance bottlenecks, and ensures that our critical workflows are functioning as expected, enhancing reliability and uptime.

What needs improvement?

One area where the product could be improved is Application Performance Monitoring (APM). While it's a powerful feature, many in our organization find it difficult to fully understand and utilize to its maximum potential. 

The data provided is comprehensive, yet it can sometimes be overwhelming, especially for those who are less familiar with the intricacies of application performance metrics. 

Simplifying the interface, offering clearer guidance, or providing more intuitive visualizations would make it easier for users to extract valuable insights quickly and efficiently.

For how long have I used the solution?

I've used the solution for four years.

What do I think about the stability of the solution?

The solution is very stable. Issues happen once or twice a year and are usually solved before we have any real impact on the service.

What do I think about the scalability of the solution?

Scalability has never been a bottleneck for us; we've never felt any issues here.

How are customer service and support?

Support was slow at the beginning; however, they are much better and more responsive now.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Datadog offered the most consolidated approach to our monitoring needs.

How was the initial setup?

This was a migration project, so it was rather complex.

What about the implementation team?

We implemented the solution with our in-house team.

What's my experience with pricing, setup cost, and licensing?

I'd recommend new users look down the road and decide on at least a three-year plan.

Which other solutions did I evaluate?

We evaluated AppDynamics and Dynatrace.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Tony Martinez1 - PeerSpot reviewer
Works at VANTA INC
Real User
Top 20
Oct 1, 2024
Great logging, session replays, and alerting
Pros and Cons
  • "Dashboards are helpful for reviewing occasionally to get a higher-level overview of what's happening."
  • "The UI has a lot going on. It should be simpler and have a better way to onboard someone new to using Datadog."

What is our primary use case?

Our primary use cases include:

  • Alert on errors customers encounter in our product. We've set up logs that go to Slack to tell us when a certain error threshold is hit.
  • Investigate slow page load times. We have pages in our app that are loading slowly and the logs help us figure out which queries are taking the longest time.
  • Metrics. We collect metrics on product usage.
  • Session replays. We watch session replays to see what a user was doing when a page took a long time to load or hit an error. This is helpful.

How has it helped my organization?

It's helped us find bugs that customers are experiencing before they're reported to us. Sometimes, customers don't report errors, so being able to catch errors before they're reported helps us investigate before other users find errors.

Datadog has helped us investigate slow page loading times and even see the specific queries that are taking a long time to load.

Logging lets us see the context around an error. For example, see if a backend service had an error before it surfaced on the frontend.

Dashboards are helpful for reviewing occasionally to get a higher-level overview of what's happening.

What is most valuable?

The most valuable aspects include: 

  • Logging. Being able to view detailed logs helps debug issues.
  • Session replays. They are helpful for seeing what a customer was doing before they saw an error or had a slow page load.
  • Alerting. This is an important part of our on-call process to send alerts to Slack when an error threshold is crossed. Alerts/monitors are easy to configure to only alert when we want them to alert.
  • Dashboards. It's helpful to pull up dashboards that show our most common errors or page performance. It's a good way to see how the app is performing from a bird's-eye view.

What needs improvement?

The UI has a lot going on. It should be simpler and have a better way to onboard someone new to using Datadog.

The log querying syntax can be confusing. Usually, I filter by finding a facet in a log and selecting to filter by that facet, but I'm not sure how to write the filter myself.

The monitor/alert syntax is also somewhat hard to understand.

Overall, it should be easier to learn how to use the product while you're using the product. Perhaps tooltips or a link to learn more about whatever section you're using.

For how long have I used the solution?

I've used the solution for two years.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

Which other solutions did I evaluate?

We did not evaluate other options. 

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
CTO at Nordcloud
MSP
Top 20
Aug 7, 2025
Alerting and metrics improve monitoring efficiency while pricing presents challenges
Pros and Cons
    • "The pricing nowadays is quite complex."

    What is our primary use case?

    The primary purposes for which Datadog is used include infrastructure monitoring and application monitoring.

    The main use case for Datadog's integration capabilities is monitoring workloads in the public cloud, and the integrations that read public cloud metrics natively were helpful, even critical, for us. We are not using Datadog for AI-driven data analysis tasks; we use cloud-native and vendor-native tools for that at the moment, and at my last employer we didn't use Datadog for the AI piece at all.

    What is most valuable?

    I find alerting and metrics to be the most effective features of Datadog for system monitoring. It was still cheaper to run Datadog than other alternatives, so the running costs were cheaper because it was SaaS and quite easy to use.

    Datadog is only available in SaaS.

    What needs improvement?

    The pricing nowadays is quite complex.

    In future updates, I would like to see AI features included in Datadog for monitoring AI spend and usage to make the product more versatile and appealing for the customer.

    For how long have I used the solution?

    I have been using Datadog since 2014.

    What was my experience with deployment of the solution?

    There were no problems with the deployment of Datadog.

    The deployment of Datadog just took a few hours.

    What do I think about the stability of the solution?

    The challenges I encountered while using Datadog were in the early days when the product was missing the ability to monitor Kubernetes and similar features, but they have since added those features. At the moment, I don't think there are too many challenges that I am worrying about.

    How was the initial setup?

    One person is enough to do the installation.

    What other advice do I have?

    I am not working with any of these solutions currently because I'm on sabbatical, but I was working with Datadog six months ago.

    We were using the tools that AWS and Azure came with natively to monitor the AI workflows on their platforms.

    I used to work as the CTO at Nordcloud, but I no longer work there.

    On a scale of one to ten, I rate Datadog an eight out of ten.

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Other
    Disclosure: My company does not have a business relationship with this vendor other than being a customer.
    Last updated: Aug 7, 2025
    PeerSpot user
    Buyer's Guide
    Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.
    Updated: March 2026