BMC TrueSight Reviews and Pricing

reviewer1293183

Vice President & Advisor - Compliance at a financial services firm with 5,001-10,000 employees

Feb 20, 2022

Excellent standalone solution with high availability

Pros and Cons

"I like everything about this tool. I recommend this solution to anyone looking for a standalone solution with high availability, meaning it can be used depending on the customer's requirements."

"There are some small limitations with this tool in terms of reporting dashboards that fit all of the requirements of the individual customer."

What is our primary use case?

I am a certified TrueSight Operations Administrator, and I monitor and implement BMC products. This solution is used to monitor various parts of the IT infrastructure (e.g. servers, databases, and hardware).

What is most valuable?

I like everything about this tool.

What needs improvement?

There are some small limitations with this tool in terms of reporting dashboards that fit all of the requirements of the individual customer.

For how long have I used the solution?

I have been using this solution for the last ten years.

Buyer's Guide

BMC TrueSight

June 2026

Free Report: BMC TrueSight Reviews and More

Learn what your peers think about BMC TrueSight. Get advice and tips from experienced pros sharing their opinions. Updated: June 2026.

DOWNLOAD NOW

900,838 professionals have used our research since 2012.

What do I think about the stability of the solution?

This is a stable solution.

What do I think about the scalability of the solution?

This is a scalable solution.

How are customer service and support?

There are some troubleshooting steps that we are able to resolve ourselves. If we are unable to resolve an issue, we simply raise a case with BMC support, and they are always there to help if necessary.

How was the initial setup?

This is a straightforward solution all around and there are three ways that you are able to install it: a silent installer, a command-line installer, plus Linux OS and Windows installers.

Depending on the project requirements, a basic installation takes about five days for a standalone setup. If an HA setup is needed, add roughly another five days to that time.

We have implemented this for a bank that has their entire infrastructure monitored by BMC and they have about five thousand users.

What about the implementation team?

We use our in-house team to implement the solution for our clients. We have a team of three people for maintenance of the tool.

What's my experience with pricing, setup cost, and licensing?

The annual licensing amount depends on the customer's requirements. Support is an additional fee, and there are options for three- and five-year support.

What other advice do I have?

I recommend this solution to anyone looking for a standalone solution with high availability, meaning it can be used depending on the customer's requirements.

I would rate this solution a nine out of ten.

Which deployment model are you using for this solution?

On-premises

Disclosure: My company has a business relationship with this vendor other than being a customer. Implementer

reviewer1776849

General Manager - Sales at a tech services company with 201-500 employees

Feb 10, 2022

Intelligent and proactive monitoring solution that's reliable and easy to scale

Pros and Cons

"Intelligent solution with a proactive monitoring feature and consolidated dashboard that's stable and easy to scale."
"BMC TrueSight Operations Management is a firefighter that will help you when problems come."

"This solution is lacking in application monitoring features. Technical support for this solution also needs improvement, particularly in product knowledge and response time."
"The L1 and L2 support team need improvement in terms of product knowledge, but most of the time, they're always available and always trying to help us out."

What is our primary use case?

If you have a TrueSight umbrella, it is a capturing tool that used to be called BPPM (BMC ProactiveNet Performance Management), or Patrol, and it proactively monitors your IT infrastructure, e.g. the data center containing your server applications, or database, middleware, or network devices.

BMC TrueSight Operations Management is used for proactive monitoring where it has a connection with the email engine, so you can receive alerts. Through this solution, you can monitor your infrastructure, understand where the problem is coming from, and more easily understand your infrastructure. BMC TrueSight Operations Management is a firefighter that will help you when problems come.

What is most valuable?

There are many features that are most valuable in BMC TrueSight Operations Management.

First, its proactive monitoring feature is highly developed. BMC TrueSight Operations Management is an intelligent tool that's able to understand day-to-day operations and consistently gives alerts. The alerts are not automatic for some activities, e.g. some alerts are given monthly, while some are given more frequently.

The consolidated dashboard where you can enjoy a single pane of glass to look at the full infrastructure from the servers to the VMs, to the clouds, to the application, to the database, to the network devices, including having a topology, and having a tendency map of the topology of key offerings, is also a valuable feature of this solution.

What needs improvement?

There are still many things that can be improved in BMC TrueSight Operations Management.

They need to dig deeper into the layers of application monitoring. They're very strong in server and network monitoring, but they're still lacking on many of the sites, and there's still much work to be done on cloud monitoring.

These are the areas that need improvement for this solution.

We would be expecting additional features in the next release, as they always come up with good features and updates during version upgrades.

I'd like to see more features on the application side, as they are lacking when compared to AppDynamics or other competitors who have an advantage in application monitoring features.

On the Cloud side, what I'd like to see on the next release is for this solution to be 100% on the Cloud, rather than it being a hybrid model.

These are the things we are looking forward to in the next release.

For how long have I used the solution?

Our company has been a partner of BMC for almost 18 years, and that's the amount of time we've been using solutions from BMC.

What do I think about the stability of the solution?

This solution is 100% reliable. It's stable.

What do I think about the scalability of the solution?

BMC TrueSight Operations Management is easy to scale. It can easily take any load, and any of the tools out there. It's an enterprise level tool.

How are customer service and support?

We contacted BMC technical support several times. The L1 and L2 support team need improvement in terms of product knowledge, but most of the time, they're always available and always trying to help us out.

There were many cases where there was a lack of response, or a longer response time. If a product defect has been found, for example, the case has to be escalated to a senior or another department, e.g. R&D, which means we have to wait for a response from that senior or from the R&D department, and that usually takes time.

How was the initial setup?

The initial setup for BMC TrueSight Operations Management is straightforward, but it's not that typical when deployed on-premises. It depends on the environment of the customer, the infrastructure, and how dependent it is. It also depends on the complexity of that environment, and what kind of tools they have in their network devices.

There can be a number of things that could make the setup straightforward or complex. For example, if the environment is very clean and only has one or two volumes when they're using Cisco or SP, then it's easy to improve, but when there are too many variants, it could take time.

What's my experience with pricing, setup cost, and licensing?

BMC TrueSight Operations Management is not on the cheaper side, but its pricing is on a case by case basis. Small, medium, and large-sized companies can afford it. Its licensing model is simple and based on the devices. You get the licenses based on the number of servers or network devices.

There are no hidden costs from BMC. They are very transparent with their customers. Everything's in front of the customer, including charges. They are really transparent. What we say and what BMC says, we make sure to deliver to the customer. Everything's very, very clear, including pricing and charges.

Which other solutions did I evaluate?

I evaluated AppDynamics.

What other advice do I have?

We are a partner of BMC Software, RiskNow, and Atlassian.

We have experience with all end-to-end BMC tools, starting from BMC Remedy ITSM for automations, to their operations management tool: BMC TrueSight Operations Management, which is used for monitoring, etc. We recommend these tools for our customers to use, except for application monitoring, as we didn't find a good tool for this on the BMC side, so we went with AppDynamics, then moved to Cisco. We are also looking into Dynatrace and exploring if our customers can use it.

We're always recommending the latest version of BMC TrueSight Operations Management to our customers, and we keep on upgrading if a new version is available.

Most of our customers have this solution deployed on-premises. If it was deployed on cloud, then we wouldn't have to take care of the upgrade, because it will be done automatically. Though cloud deployment is picking up, most of our customers are still deploying on-premises.

Apart from being a reseller of this solution, we also perform 100% implementation and other processes for our customers. We do end-to-end customer management.

After deployment, BMC TrueSight Operations Management only requires normal maintenance, or it can be taken care of under managed services. Maintenance of this solution is hassle-free. It's just normal updating, e.g. patching.

This solution works well for any company size: small, medium, or large. It can take the load off any enterprise, no matter the size. It can be onboarded for a very big conglomerate without any challenge.

My advice to others looking into implementing BMC TrueSight Operations Management is to first find a partner for this solution. Once the implementation is done successfully, they can start using it. This is a beautiful solution and it works 100%, but it will still depend on how you're using it and how you're taking care of it. You need to take care of it so you can use it for a long time.

I'm rating BMC TrueSight Operations Management a ten out of ten.

Which deployment model are you using for this solution?

On-premises

Disclosure: My company has a business relationship with this vendor other than being a customer. Reseller

Syed SaqibHabib

System Administrator at a media company with 201-500 employees

Sep 22, 2021

High performance, rich reports, and stable

Pros and Cons

"The most valuable features are the rich reports, the high performance, and the look and feel of the WebEx webpage, which are very good."
"The solution is stable. We have not needed to speak to the vendor regarding any issues; it has been operating very well."

What is our primary use case?

We use BMC TrueSight Operations Management in a Microsoft-based environment which we feel is best for performance monitoring and management. In our company, most of the workload is on Microsoft but the new version of the solution can monitor the workload of Linux too. My multi-fact servers are readily available and best practices are being suggested. I do not have to work and create everything from scratch because most of the features are there. Most of the alerts are available and very active but there are times I have to modify the extra alerts.

What is most valuable?

The most valuable features are the rich reports, the high performance, and the look and feel of the WebEx webpage, which are very good.

For how long have I used the solution?

I have been using BMC TrueSight Operations Management for approximately five years.

What do I think about the stability of the solution?

The solution is stable. We have not needed to speak to the vendor regarding any issues; it has been operating very well.

What do I think about the scalability of the solution?

The solution is scalable.

We have more than 200 users in my organization using this solution and there are approximately three staff members that log into the solution for management and to check the resources. We had plans to increase usage of the solution but because of financial issues, we are not able to at this time.

How are customer service and technical support?

We have been in contact with technical support to align our objectives and to determine their effectiveness and for information regarding updates. We have been satisfied with the support and we have not needed them for any issues, only information.

How was the initial setup?

The solution gives clear instructions on what systems are required. You are able to simply plug data as a virtual machine in your environment, and they have the option of going to the cloud. In the cloud version, you can easily enable the subscription and start working, deploying, and integrating with your environment.

On the client side, I do not have to configure or update anything on the software. It discovers what is running on the client-side system and it automatically does everything.

What about the implementation team?

We used a vendor to do the implementation because of our company policy and to make sure it was done correctly.

The solution only requires updates for the maintenance of the solution.

What's my experience with pricing, setup cost, and licensing?

The price of BMC TrueSight Operations Management is very high. If there was more flexibility with the sizing of the licensing it would be helpful, especially during the pandemic. We have wanted to expand but the licensing cost is too high.

What other advice do I have?

I rate BMC TrueSight Operations Management a ten out of ten.

Which deployment model are you using for this solution?

On-premises

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Mudassir Parwez Ahmed

Sr. Technical Consultant at a tech services company with 11-50 employees

Aug 23, 2020

Monitors a mix of on-prem and cloud, and predictive alerts help maintain availability

Pros and Cons

"The event management part of TrueSight Operations Management, in my experience, is probably the best in the market. You have endless flexibility. You can build your own rules, you have the MRL language, and you can implement any kind of logic on the alerts. It may be correlation, abstraction, or executing something as a result of the alerts. You have almost the whole range of options available for event management using the available customization."
"The event management part of TrueSight Operations Management, in my experience, is probably the best in the market."

"It's too complex, too many servers are required, there are too many different components in the solution, and a lot of agents are required."

What is our primary use case?

TrueSight Operations Management includes infrastructure monitoring, as well as application performance monitoring. The premier use case that I have seen over the last few years is infrastructure monitoring, along with network monitoring. The overall use case is monitoring of IT infrastructure, including the network: monitoring, alerting, and event management.

Occasionally we have seen a couple of customers who are interested in the application performance management as well.

The actionable alerts that we get from monitoring the infrastructure or application are the end-result of the monitoring. Most of our customers are interested in those alerts and in having a ticket created out of the alerts in their ITSM solution.

I have deployed this solution, along with other BMC solutions, for many customers across multiple verticals, like healthcare, banking, and telecom. I have done eight to 10 implementation projects of TrueSight. Our company sells BMC Software solutions and we implement, develop, and support them.

How has it helped my organization?

In a project that we're working on for a telecom company, at the time we started implementing TrueSight Operations Management, the number of alerts or events, and subsequently the number of tickets from those events, was really high. After applying the intelligent thresholds in TrueSight, doing all the event management to correlate related alarms, deduplicating alarms, and suppressing unwanted alarms, we have been able to reduce the number of events, and hence the number of tickets, by almost 60 percent.
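To make the deduplication and suppression steps concrete, here is a minimal generic sketch in Python. It is not TrueSight's rule engine or MRL; the event fields (`host`, `type`) and the `reduce_events` helper are hypothetical names chosen for illustration only.

```python
from collections import OrderedDict


def reduce_events(events, suppressed_types=frozenset()):
    """Drop suppressed event types and collapse duplicates by (host, type).

    Hypothetical helper: real event management tools apply far richer rules.
    """
    kept = OrderedDict()
    for event in events:
        if event["type"] in suppressed_types:
            continue  # suppression: unwanted alarm never reaches ticketing
        key = (event["host"], event["type"])
        if key in kept:
            kept[key]["count"] += 1  # deduplication: same alarm seen again
        else:
            kept[key] = {**event, "count": 1}
    return list(kept.values())


# Made-up sample events, for illustration only.
events = [
    {"host": "db01", "type": "cpu_high"},
    {"host": "db01", "type": "cpu_high"},
    {"host": "db01", "type": "cpu_high"},
    {"host": "web01", "type": "heartbeat_info"},
    {"host": "web02", "type": "disk_full"},
]
reduced = reduce_events(events, suppressed_types={"heartbeat_info"})
for event in reduced:
    print(event["host"], event["type"], event["count"])
```

Each surviving entry carries a repeat count, so one ticket can represent many identical alarms instead of one ticket per alarm.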

Before that, their data center NOC team was overwhelmed with the number of tickets and the number of events. By applying the intelligent thresholds, which are called signature thresholds in TrueSight, we have been able to reduce the noise and the false negatives and even false positives. We have been able to give them only the most important actionable alarms and tickets. This has freed up a lot of time for productivity for the network operations team. They have been able to focus on different things, along with their regular stuff.

It also helps maintain the availability of infrastructure across a hybrid or complex environment. Because TrueSight can monitor network devices, databases, storage, cloud environments, and a mix of on-prem and cloud, our solutions keep checking the availability of all the devices in the infrastructure and they alert you when there is an issue. So it definitely helps in maintaining the availability. You can also configure predictive alerts or intelligent thresholds or predictive thresholds. Using them, TrueSight will try to give you an alert before something goes wrong. It will look at the threshold and it will look at the trending data for a particular metric, and before that threshold is crossed, it will give you a predictive alert saying that this threshold may be crossed in the next 15 minutes or 30 minutes. So it helps maintain the availability of your environment.
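The idea behind a predictive alert can be sketched as follows. This is only an illustration that assumes a simple least-squares trend line over recent samples; the review does not describe how TrueSight actually computes its predictions, and the function names and sample values are hypothetical.

```python
def minutes_until_threshold(samples, threshold):
    """Estimate minutes from the last sample until the trend reaches threshold.

    samples: list of (minute, value) pairs, oldest first.
    Returns None when there is no rising trend to extrapolate.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / denom
    if slope <= 0:
        return None  # flat or falling metric: no crossing predicted
    intercept = mean_v - slope * mean_t
    crossing_t = (threshold - intercept) / slope
    return max(0.0, crossing_t - samples[-1][0])


def predictive_alert(samples, threshold, horizon_minutes=30):
    """True when the trend is expected to cross threshold within the horizon."""
    eta = minutes_until_threshold(samples, threshold)
    return eta is not None and eta <= horizon_minutes


# Made-up disk-usage samples rising one percentage point per minute.
usage = [(0, 50.0), (5, 55.0), (10, 60.0), (15, 65.0)]
print(minutes_until_threshold(usage, threshold=90.0))
print(predictive_alert(usage, threshold=90.0, horizon_minutes=30))
```

The horizon parameter corresponds to the "next 15 minutes or 30 minutes" window mentioned above: a shorter horizon produces fewer, more urgent alerts.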

In addition, it helps to reveal underlying infrastructure issues that affect application performance, if you're monitoring an application using TrueSight APM. You can monitor an application and record the important transactions in the application that you're interested in. That is called synthetic monitoring. For example, on a banking site, the user login could be the transaction you record.

The app visibility part discovers the application automatically, and it can even monitor at the code level. For example, if there is something wrong in a transaction, maybe on the HTTP response or at the Java or .NET code level, it can indicate where the problem may be in the application. TrueSight also has Probable Cause Analysis. If you are monitoring your IT infrastructure completely, it can correlate the alerts and give you the most probable cause of a particular alert. Again, this can help you figure out the underlying issues in the environment.

The TrueSight solution has built-in intelligence. It uses its analytical engine, an AI engine, to look at the performance data for anything that it's monitoring and it creates a baseline of the performance. Then, it gives you abnormality alerts based on the baseline. Even if your threshold is not crossed, but the baseline of that metric is crossed, it will intelligently give you an alert saying that this metric is trending above the baseline. There may be a case where the static threshold has been set too high, but TrueSight has the intelligent analytical engine that can analyze the trend or the baseline, and then give you an intelligent alert. The Probable Cause Analysis uses the analytics engine to figure out what the probable cause may be for a particular alert. BMC is making good progress in terms of AI.
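The difference between a static threshold and a baseline-driven abnormality alert can be illustrated with a small sketch. It assumes the baseline is simply the historical mean plus a multiple of the standard deviation, which is a common textbook approach and not necessarily what TrueSight's analytics engine does; `classify` and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def classify(history, current, static_threshold, k=3.0):
    """Classify a reading against a static threshold and a learned baseline.

    The baseline band (mean + k * standard deviation) is an assumption
    made for this sketch.
    """
    baseline_high = mean(history) + k * pstdev(history)
    if current >= static_threshold:
        return "threshold"  # static threshold crossed
    if current > baseline_high:
        return "abnormal"  # below the threshold, but above normal behavior
    return "ok"


# Made-up CPU utilization history hovering around 41 percent.
cpu_history = [40, 42, 41, 39, 43, 40, 41, 42]
print(classify(cpu_history, current=60, static_threshold=95))
print(classify(cpu_history, current=42, static_threshold=95))
```

With the static threshold set as high as 95, a jump to 60 would pass unnoticed by threshold checking alone, yet it is flagged here because it sits well outside the learned baseline.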

Mean time to remediation is related to the Probable Cause Analysis and integration with some other components like orchestration or executing a remote action. It definitely helps in reducing the mean time to remediate, but it depends on the expertise of the administrator of TrueSight. In my current assignment we have implemented TrueSight for a large customer in the Middle East, and we have quantified how much we have reduced the mean time to remediate. For the top-priority incidents, we have reduced the MTTR from 12 hours to 1.5 hours.

One of the most prominent features and values of the solution is that it helps to reduce IT operations costs. If you are using Operations Management and TrueSight Capacity, you can get a real picture of how much your IT assets are utilized, and how much of their capacity is saturated or underutilized. It gives you a very clear picture of your entire IT infrastructure, including your network devices and your cloud infrastructure. Your entire infrastructure is monitored and optimized for capacity, and that helps you save costs in your IT operations. I would estimate savings of 20 to 30 percent. I haven't calculated it myself. There are much higher numbers claimed by BMC.

What is most valuable?

The event management part of TrueSight Operations Management, in my experience, is probably the best in the market. You have endless flexibility. You can build your own rules, you have the MRL language, and you can implement any kind of logic on the alerts. It may be correlation, abstraction, or executing something as a result of the alerts. You have almost the whole range of options available for event management using the available customization. I've seen a couple of other solutions, like IBM's and HPE's for event management, and TrueSight Operations Management is far superior to them in event management.

The breadth of the solution's monitoring capabilities is a major selling point for the solution because it is incomparable. You can monitor almost any kind of server, all types of storage, network devices, databases, and even do application monitoring. You also have the option to develop your own Knowledge Module. If something that you want to monitor is not available, you can build your own Knowledge Module to monitor whatever you need. We also have cloud monitoring solutions, which are doing pretty well now. We have AWS, Microsoft Azure, Google Cloud, and container monitoring. The breadth covered by BMC for monitoring of IT infrastructure is really extensive. That breadth of monitoring is really valuable because we can cover almost any monitoring use case that customers come up with.

Also, the end-to-end, automatic ticketing — from generating an alert or an event, to doing event management, and then creating a ticket from the event, as well as automatic closure of the ticket or the event from the ticket — this whole end-to-end flow, is a major selling point. Most of our customers who have on-premise ITSM solutions use BMC Remedy. It is the most popular on-prem solution for ITSM. When customers have Remedy ITSM, it becomes a really good decision to use TrueSight Operations Management, and to use the out-of-the-box integration between the two solutions. That way, the ticketing is done automatically from the event and vice-versa.

In addition, the solution provides a single pane of glass where you can ingest data and events from many technologies. That's one of the major selling points that BMC is pitching for TrueSight Operations Management. You can monitor everything: servers, networks, databases, and your applications. You can also implement capacity optimization and the Presentation Server has a single console, a view and dashboards, where you can see everything in one place.

Previously, BMC called TrueSight a "manager of managers" because TrueSight can be integrated with almost every other monitoring and ticketing tool. For example, in my current project, we have integrated at least 20 other monitoring and alerting systems with TrueSight, and all the other systems are sending their events or alerts to TrueSight. Then, in TrueSight, we are doing the event management to reduce the noise, and filter out unwanted alerts, and get only the required alerts. Even for other integrations, TrueSight acts as a single pane of glass, where you have all these disparate systems. You can integrate all of them with TrueSight and get all the events and alerts in a single window.

What needs improvement?

In terms of root cause analysis, BMC TrueSight has a couple of modules like Service Impact Management and the Probable Cause Analysis, which work together to help you identify related events. This module, on paper, has a lot of promise, but it is actually really complicated. There are really small pieces working together and you have to have a lot of expertise to get any value out of the root cause analysis piece of the solution. For that reason, most of the customers don't really get much value out of the root cause analysis part of TrueSight.

There are other areas with room for improvement as well. For example, the monitoring part requires four or five different types of agents to monitor different things in your infrastructure, which makes things very complicated.

In addition, to implement the Operations Management solution alone, you need a lot of hardware; a lot of servers and a lot of hardware resources. If you compare it with other solutions in the market, like Dynatrace or AppDynamics, the implementation of those products can be done using notably fewer servers. If you want to set up a standalone TrueSight Operations Management for a customer, you need at least 10 servers to implement Infrastructure Management and Application Performance Management. To do the same implementation for Dynatrace or AppDynamics or SolarWinds you only need three or four servers maximum, for the same environment. So the number of resources required for implementation is very much on the higher side.

The complexity of the solution is, again, a challenge. There are so many different components that it becomes almost a nightmare for the operations teams to do the administration and apply hotfixes, patches, and to do daily operations for the solution.

It's too complex, too many servers are required, there are too many different components in the solution, and a lot of agents are required.

Apart from that, some of the intelligence features could also be enhanced. For example, the AI part of TrueSight Operations Management should be enhanced to compete with other products in the market.

For how long have I used the solution?

I've been using BMC TrueSight Operations Management for the last nine years, approximately.

What do I think about the stability of the solution?

Once the solution is deployed and the fine-tuning recommendations are in place, the solution is very stable. In my current environment we haven't seen any issue whatsoever in the last year. We have at least 20 servers running various TrueSight components, and none of them has had any issues in that time. So in that time the availability has been 100 percent and it has been 100 percent stable.

What do I think about the scalability of the solution?

It does scale well, but my concern with the solution is that when you want to scale it up the complexity increases. That is mainly because of the number of different components or software pieces that work together.

The multitenancy mode of TrueSight has a lot of room for improvement. It's like if you have a building and there are many apartments in it, you can have multiple tenants in the same building. If you want to add a tenant, you just give them an apartment in the same building. But with TrueSight, to set up multitenancy, you have to set up separate "buildings" altogether, instead of compartmentalizing into "apartments," which makes everything much more complex.

How are customer service and technical support?

I have been using BMC support for many years. Generally speaking, support is very good and, comparatively, it is much better than the competitors' support departments. But over the past couple of years, the technical expertise of the support team has consistently gone down.

Generally, the response from BMC support is excellent. You get a response almost immediately. And if the support team is unable to resolve your issue, then they coordinate with their development or customer engineering team very quickly, which is the best part.

If you are trying to get technical support from Microsoft, for example, and the support team is unable to resolve your problem, it can take months to get to a higher level in the support hierarchy. And reaching the development team of the solution is almost unimaginable. But with BMC, this is one of the best parts. If your issue is not resolved by a support team within a stipulated time period, they immediately reach out to their development team and they usually fix the problem.

How was the initial setup?

The initial deployment depends on the customer environment. If the environment is small or medium, the solution can be deployed fairly quickly, and similarly if the customer wants to deploy a standalone setup. But for a large customer, especially for customers who want to deploy the solution in a clustered environment, in a high-availability environment, or even in a DR environment, it's very complex to set up initially and it takes a fairly large amount of time to implement.

The initial setup means setting up the components, setting up the basic monitoring. The advanced configurations take extra time. For a small or medium environment, we can do the initial setup in a couple of weeks. A small to medium environment is where they are monitoring between 50 and 300 or 400 servers and IT infrastructure components, such as storage devices or hardware.

If you go above a few hundred devices, it becomes a large environment. For a large environment, it may take anywhere between two and four months to set up, depending on what kind of deployment the customer prefers: whether they want high availability, a clustered setup, or a disaster recovery setup.

We do have standardized deployment configurations for customers and we recommend that customers use them. We are BMC's most prominent partner in the Middle East, so we have done quite a few deployments and we have created standard templates for deployment, for small, medium, and large customers. Generally, the customers leave it to us to decide the implementation strategy and then we use our standard deployment template for the given environment, and that makes things much smoother and faster. We already know which component to install when, what configuration should be done, and how much time it should take, ideally. And tasks can be initiated in parallel, like agent installations.

What's my experience with pricing, setup cost, and licensing?

I would advise that you really give a lot of thought to how much you want to monitor and what the anticipated growth in monitoring requirements will be. These things should be considered in the planning phase and, accordingly, you should decide what type of environment to set up.

The licensing depends on the data streams and the event streams. If you are monitoring all the metrics for the monitored devices, the data streams and event streams will increase multifold as well. Therefore, filtering is very important in TrueSight. If you are monitoring the memory utilization for a server, for example, that alone has 20-plus attributes in TrueSight. If you let in all 20 attributes, the number of data streams will increase. If you're really interested only in the utilization metric, you may also be monitoring 19 metrics that you are not interested in and they will add to the data stream and the licensing cost will increase.
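The effect of filtering on the number of data streams can be shown with simple arithmetic. The sketch below assumes, purely for illustration, that every collected attribute on every server counts as one data stream; the actual licensing formula is not given here, and the server count and attribute names are hypothetical.

```python
# 20 hypothetical attribute names standing in for the "20-plus" memory attributes.
MEMORY_ATTRIBUTES = ["utilization"] + [f"memory_attr_{i}" for i in range(1, 20)]


def count_streams(server_count, attributes, keep=None):
    """Count data streams, assuming one stream per collected attribute per server."""
    selected = [a for a in attributes if keep is None or a in keep]
    return server_count * len(selected)


# Hypothetical environment of 300 servers.
unfiltered = count_streams(300, MEMORY_ATTRIBUTES)
filtered = count_streams(300, MEMORY_ATTRIBUTES, keep={"utilization"})
print(unfiltered, filtered)
```

Under these assumptions, keeping only the utilization attribute cuts the memory-related streams from 6,000 to 300, which is why filtering has such a large effect on cost.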

Consider scalability very carefully: how much you want to monitor and what components are very important. Then, depending on these two things, filter out unwanted metrics or attributes. If you do a good job at filtering the data, then your licensing costs will be manageable.
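To make the filtering arithmetic above concrete, here is a rough sketch. It assumes, purely for illustration, that each monitored attribute on each server counts as one data stream; the actual licensing calculation is BMC's and is not described in this review. The server count is a hypothetical figure.

```python
def data_streams(servers: int, attributes_per_server: int) -> int:
    """Illustrative assumption: one data stream per monitored attribute per server."""
    return servers * attributes_per_server


servers = 300  # hypothetical environment size

# Letting in all 20 memory attributes versus keeping only utilization.
unfiltered = data_streams(servers, 20)
filtered = data_streams(servers, 1)

print(f"unfiltered: {unfiltered} streams, filtered: {filtered} streams")
```

Under that assumption, filtering down to the one metric you care about cuts the stream count for this monitor by a factor of 20.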

I'm not aware of the details of the licensing models of TrueSight's competitors, but our business team says that the cost of using TrueSight is higher compared to its competitors. But that often comes down to the filtering and the sizing. The filtering has to be done very carefully to bring down the licensing costs.

The licensing module is good and fairly self-explanatory. It's not very complex.

There are different pieces which are licensed separately. For example, Service Impact Management and Application Performance Management are licensed separately. Large customers buy the entire solution with all the features but they don't necessarily use all the features, especially the Service Impact Management. The latter is very difficult to implement and to get value out of. My advice is to consider what features of the solution you are going to use and then just pay for those features, instead of paying for everything without even using it.

Which other solutions did I evaluate?

Without naming particular competitors, I can give you general pros and cons of TrueSight Operations Management, when compared with them.

One of the pros of TrueSight Operations Management is the breadth of the IT infrastructure monitoring capabilities. TSOM can actually monitor any component of your IT infrastructure, along with your applications. It does very deep-dive monitoring and you have many more metrics, compared to any other solution, as far as I'm aware. It gives you more in-depth diagnostics and performance data.

Also, the support from BMC software is better than its competitors.

The complexity of implementing TSOM — the number of components required to set it up and the number of servers you need — is one of the cons. And the number of different agents you need to monitor different things is another con.

What other advice do I have?

TrueSight, as a solution, is a very large suite nowadays. In the last year or so, BMC has made the Orchestration module a part of the TrueSight portfolio. Then there are the Server Automation, Network Automation, and BladeLogic Client Automation pieces that are merged into the TrueSight portfolio. If you consider the entire TrueSight product suite, which includes TrueSight Operations Management, Infrastructure Management, and Application Performance Management, and you have TrueSight Capacity Optimization, TrueSight Orchestration, and TrueSight Automation — if you combine all these solutions you can see business innovation. You can automate a lot of mundane and repetitive tasks. You can automate a lot of administrative functions. You can integrate a lot of different components using Orchestration, and that helps reduce the human cost involved. And maybe you can use your human resources for more productive or more creative tasks, for things other than repetitive activities. So TrueSight can help businesses to innovate.

Overall, I would rate the solution at eight out of 10.

Disclosure: My company has a business relationship with this vendor other than being a customer. Reseller.

Monitori66fb

Monitoring Architect at a manufacturing company with 10,001+ employees

Aug 26, 2019

We have reduced headcount and shrunk the mean time to resolve

Pros and Cons

"We have one application, which is fairly large. In the past, we had Level 1 and 2 NOC support teams who were responsible for watching dashboards. When they saw an issue in the application, they would call Level 2 or 3 support and escalate the call, if necessary. Now, through the use of this product, we have been able to reduce the headcount by five people, as we are able to eliminate the eyes on the glass. We no longer have people watching the dashboard. We have events which are processed automatically through the system and get to the right people. We had six people in L1s, and now have one. So, we reduced five out of six headcount, which is pretty significant."

"In a large company of our size, we need multiple people in our company trained. So, I have to take the training classes. Then, I have to go and train the rest of my organization. I would prefer to say to the other people on my team, "Go to this link and..." Or, "Here's a list of training sessions that you can go to which are online and that are free." I think it would help the adoption of their product in the marketplace, personally."
"It's a far more complex technology than I perceived at the beginning to deploy."

What is our primary use case?

From a senior management perspective, they want to get an understanding, when there is an outage, what is the impact of that outage across the entire suite of the company's products. We have an Event Manager that integrates all of our monitoring tools. Since we are a large company, we have about 26 different monitoring tools in use. The idea is getting all of them into a framework which can feed such a model that displays the impact of an outage.

How has it helped my organization?

We have one application, which is fairly large. In the past, we had Level 1 and 2 NOC support teams who were responsible for watching dashboards. When they saw an issue in the application, they would call Level 2 or 3 support and escalate the call, if necessary. Now, through the use of this product, we have been able to reduce the headcount by five people, as we are able to eliminate the eyes on the glass. We no longer have people watching the dashboard. We have events which are processed automatically through the system and get to the right people. We had six people in L1s, and now have one. So, we reduced five out of six headcount, which is pretty significant.

Also, the average length of time used to be 45 minutes before we had the right engineer on the line, fixing the problem. Now, it's probably three to five minutes.

The solution affected our end user experience management very positively. Our application teams are very excited about what we're doing with the reduction in headcount. More importantly, the automation that it has brought to us has streamlined so many manual tests. The teams are very happy with the way things are going.

The solution will help us maintain the availability of our infrastructure across a hybrid or complex environment. Right now, we can get to an event scenario or problem quicker than we used to. We are right on the cusp of releasing our service impact modeling. This will help us tremendously because we have a multicloud, as well as an on-premise environment. Any component should show the impact across its applications, regardless of where it's located. It has definitely helped in these environments.

We have improved our ability to get to a root cause because of the way their tools work. If you follow it down to the lowest level of the diagram, and a problem happens, it lights up a certain model in red. However, if you go down to the lowest member of the tree, you'll see who is the lowest person. So, if it's a database saying, "I'm out of disk space," then it may create all types of chaos. Following that tree down, you'll see the lowest level is the database server, and it has an event disk space issue. Then, right there, that's the root cause of all your application issues. So, it has helped us get to the root cause more quickly.

We're just now gaining momentum on the adoption of this product. Take the case of a database that is out of disk space: because we can get to the root cause quicker, it can be remediated faster, and we can also reduce the number of people who have to be on outage calls. There is no need to have network people on a call if it's a database issue. We let them deal with other things, so our operation becomes more efficient. The database people know exactly what the problem is, and quickly.

What is most valuable?

The most valuable feature is the event management piece of it. We have it integrated with a number of our different products. Thus, we can create events into a single Event Manager, which will create a Remedy ticket for us. This is a huge feature for us.

We have 26 different monitoring tools. The way this product works allows us to define a custom event call. We can take all of our monitoring tools and say, "If you can put an event into this specific format, then we have a way of creating a common event across all of our monitoring tools." By doing that, we have a single back-end process that acts on all of the events. So, we only do a data transformation upfront when we are receiving events. This simplifies our back-end.
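The pattern described above, transforming each tool's events into one common format upfront so that a single back-end process can act on all of them, can be sketched roughly as follows. The field names, tool names, severity mapping, and handler logic are all hypothetical; this is not TrueSight's event format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonEvent:
    """Hypothetical common event format shared by every monitoring tool."""
    source_tool: str
    host: str
    severity: str
    message: str


def from_tool_a(raw: dict) -> CommonEvent:
    # Hypothetical tool A reports severity as a lowercase string.
    return CommonEvent("tool_a", raw["hostname"], raw["level"].upper(), raw["text"])


def from_tool_b(raw: dict) -> CommonEvent:
    # Hypothetical tool B reports severity as a number.
    severity = {1: "CRITICAL", 2: "MAJOR", 3: "MINOR"}.get(raw["sev"], "INFO")
    return CommonEvent("tool_b", raw["node"], severity, raw["summary"])


def handle(event: CommonEvent) -> str:
    """Single back-end process: the same logic for every tool's events."""
    action = "open_ticket" if event.severity in {"CRITICAL", "MAJOR"} else "log_only"
    return f"{action}:{event.host}"


print(handle(from_tool_a({"hostname": "db01", "level": "critical", "text": "disk full"})))
print(handle(from_tool_b({"node": "web02", "sev": 3, "summary": "slow response"})))
```

Adding another monitoring tool then means writing one more small adapter; the back-end handler stays unchanged.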

The solution has helped to reveal underlying infrastructure issues affecting app performance. We constantly have network issues. The network team had been capturing them, but it wasn't integrated into any impact model. By integrating them into an impact model, we could now catch and see the impact of them to our applications.

What needs improvement?

It's a complex system. The implementation is fairly challenging. They have done a good job lately of getting videos out there. We would like more videos and self-training, though. Right now, you have to go to BMC's training classes to get a good understanding of the product, and those training classes are very expensive. While I understand they are a business and trying to make money, a lot of their competition has training available via YouTube. There is much more accessibility to competitors' training.

In a large company of our size, we need multiple people in our company trained. So, I have to take the training classes. Then, I have to go and train the rest of my organization. I would prefer to say to the other people on my team, "Go to this link and..." Or, "Here's a list of training sessions that you can go to which are online and that are free." I think it would help the adoption of their product in the marketplace, personally.

It's a far more complex technology than I perceived at the beginning to deploy. I would have thought that the integration between their products would have been more seamless than it has been. This is what has made it a lot more complex than I anticipated.

From a technical standpoint, some of their products still have a dependency on Oracle Databases, and they are very well integrated in the cloud for a lot of their components. There is another database technology called Postgres, which they are partially integrated with. However, if they were to get all of their platforms integrated into Postgres, it would be much less expensive for companies, such as mine, to go to high availability, etc. The architecture really needs to be upgraded. I know they're doing a lot of this, but they need to keep doing it, and accelerate their process, so they can remain competitive.

For how long have I used the solution?

We have been working with the product for the last year. We went live with the product in April.

What do I think about the stability of the solution?

Stability of the product is about a seven out of 10. As far as stability goes, it has mostly been very good. With some of the newer stuff on 11.3, we often have to call support and get a patch sent to us because certain things just don't work. Those pieces would have hurt stability, but once you get it running, it's very good.

What do I think about the scalability of the solution?

The overall scalability of its platform and the ability to support its website is pretty good. We have a couple people on our team who seem like they are pretty proficient at it. They can do things rather quickly.

I don't use PATROL. It is pretty good if you use their native products, like PATROL for monitoring. We integrate other monitoring tools into TSOM, so we don't use PATROL. I am familiar with it though, and I have been trained on it. I feel like it's pretty labor-intensive to manage. For example, if I have a number of different classes of servers, there are a lot of screens that I have to fill out, deploy, and push out to my systems. There has to be a more efficient way to do this. My company is always pressuring us to be more scalable. It is not very scalable in the administration of its monitoring. It could be better.

For TrueSight Operations Manager, there are a limited number of people who use it, no more than 15 to 20 system administrators and support personnel, who are mostly in administrative functions. The reason that there are so few users utilizing the system is that all the events are automated. Most of our support teams and users look at Remedy, and there are over 3,000 users looking at Remedy. So, a lot of the users of our overall system have no need to look at a TrueSight console. Their work is done through the way we have designed the system. They get a Remedy ticket and what's called a PagerDuty notification. They know when they get those two things that there's an issue, and all the information is contained within those two systems. They don't need to go to the TrueSight console.

How are customer service and technical support?

The technical support team is very good. I wish there were more of them. This includes the ones who I work with on the phone, as well as their field technical people. They are very good.

I don't know if their technical support differs from their project team, but we are constantly revolving people in and out of our project because they get different assignments within BMC. Thus, I wish there were more technical support people who had more longevity on our account. We will have a CMDB person assigned to us on the project from BMC, but in just three weeks, we'll find out, "Oh, that person has been reassigned, and they have to go to another account, where they have to do something different." We are constantly having to retrain people coming in from BMC. So, there is no permanence with their people on our projects.

This issue of changing technical staff is not limited to BMC. However, their resource pool seems sort of small.

We are constantly facing issues with having to call support because things didn't work as we expected them to, and I don't know why that is. We use the BMC Atrium CMDB product (service impact model), and publishing service impact models seems to be challenging and problematic. We are constantly calling support, who give us a bug fix, which fixes the problem. However, those bugs shouldn't have existed in the first place. If there is a bug that somebody already knows how to fix, it shouldn't have happened in the first place.

Which solution did I use previously and why did I switch?

BMC is one of our longest running partnerships. We have been using Remedy for many years. We have been using parts of this system since 1998. However, we have never put it all together in the way that we're doing now. We didn't replace anybody else. We had used their products before, but not to their full advantage.

How was the initial setup?

It's a complex system. We were dealing with a highly customized Remedy system which caused us a lot of issues. We had to wait for a Remedy upgrade to occur before we could deploy our systems. We were at this for about a year, and most of that time was waiting to get the Remedy implementation in place. Once the Remedy implementation and upgrade were completed, there were a lot of challenges with our CMDB data and the integration of the CMDB to a service model along with the publishing of a service model.

We have Remedy, a service model, TrueSight Operations Manager, and TSIM. With a lot of technologies in play, making them all work together has been challenging, since each one of them is a fairly sophisticated technology. BMC could do something to make it easier.

It took about three months to deploy the core technology which solved our problem. We have been waiting a very long time on the Remedy upgrade, which was over a year. However, this was because our company had highly customized the prior Remedy version. Without that in the equation, the technology took us around three months to deploy.

We are still enhancing it. That time frame was just to get it deployed. To make the full use and benefit of it, that will take well over a year. Both the technology and the organization, who is using it, need to be matured.

Right now, four or five of our core products are monitored and feeding this environment. Because we've been successful at it, we anticipate integrating more of our products and the monitoring of those products into our system. We have already built the integrations for the different monitors. It is just getting the different teams to want to use this system. That's why it's an organizational maturity thing. We could take them on very quickly, but there has to be a willingness on their part to do so. Part of our strategy is to make them want to use this system. That's on the event side.

On the service impact side, we're working with senior management. This Friday, we have a demo with the CIO with this technology, because he is the one who is putting the pressure on the different application teams to onboard with us.

We have a multiyear onboarding strategy, where we're onboarding more applications and integrating them into this particular environment. Today, they are being monitored by their own support teams, who are now beginning to see the success that we are having. The challenge that we are having organizationally is, when we onboard their applications, we expose the issues of their products through Remedy tickets and outages. A lot of times, these teams want to hide that. So, we have political issues, as well as technological hurdles to deal with.

What about the implementation team?

We did a lot of it ourselves, since we had the knowledge in-house, specifically on the event management side. We were an ADDM environment. So, we had bits and pieces of technology knowledge in our company, but in order to pull it all together, we used Wipro, as well as BMC in India to drive this. That's still the case. We're still using them to get this whole thing deployed in various pieces.

Overall, our experience with BMC and Wipro has been positive. However, there have been challenges because the technical people have moved from our account to another account. We have a rotating team of people, which gets very challenging for continuity.

For deployment and maintenance of TrueSight, we need about four people. For the whole enterprise solution, we need 25 people for 24/7/365 support.

What was our ROI?

We have reduced headcount and shrunk the mean time to resolve. That's how we justify the expense of the product. It has really worked out.

It has helped us reduce IT Ops costs. We were able to replace the headcount of five out of six Level 1 technicians. We repurposed those people to higher level tasks. Without this solution, we would not have replaced that job function.

What's my experience with pricing, setup cost, and licensing?

We did a five-year, multimillion dollar deal.

We haven't licensed the solution's machine-learning and analytics to deploy artificial intelligence for IT ops.

Which other solutions did I evaluate?

We looked at some of their competitors, but because of the technology and the base of knowledge that we already had in place, it made sense to stick with BMC. We decided to focus on making their products integrate the way they were supposed to and were designed to. That's what we've been doing since we had the knowledge and license in-house.

We also evaluated ServiceNow and BigPanda.

On the pros side, BMC and ServiceNow were very similar products. My biggest concern with BMC was they seemed to be declining in market penetration versus ServiceNow, which has been expanding considerably over the last several years. That was my biggest concern with moving forward with BMC. The pros with BMC were that we already had the knowledge in-house and the technology was proven for us. We knew it was fairly solid, so we felt confident that we would be successful with it.

One of the other differences between the two companies is the marketing organization from ServiceNow was a lot more consistent than from BMC. We probably get more calls even today from the ServiceNow account rep than we do from our BMC team. They show up every once in a while, and they do a big dog and pony show, then they go away for a bit. So, I don't think their marketing is as strong as it should be, or we're not a big enough customer for them. However, with the amount of investment we have in their product, they should be around more often.

There is one piece of BMC technology that we decided not to use. That's their Atrium Orchestrator. We use a different third-party orchestrator called Ayehu. We just found the Atrium Orchestrator from BMC to be too complex.

What other advice do I have?

Make sure you have knowledgeable people on your staff. Give yourself plenty of time for deployment: if you think it will take three months, make it six months. Look at past companies' experience on time to deploy, knowledge, and staffing requirements.

The solution's event management capabilities are very good. In some ways, they are based on very old technology. I first started using it way back in the late nineties and the basic core of the product does not appear to have changed much since then. Back then, it was a very good product. So that's not necessarily a bad thing. As for the other things that the company has done since then, it has enhanced the website portal, which I have a very positive impression of.

The website is fairly new, and it could be a little bit better. However, if I were to compare it to some of the other tools out there, it has a much nicer GUI and presentation. The web presentation is much more advanced than BMC's TSOM server.

We still have multiple panes of glass. E.g., we have an Event Manager screen along with a Remedy screen. We're getting closer to a single pane of glass and have fewer panes of glass. Where we had a lot of dashboards before, we now don't have anything, as we've replaced all of them. So, there are no panes of glass in our support. So, if you are a support personnel at our company, you are not looking at a screen. Instead you are looking at your cell phone, because we reach out to you when there's a problem and you don't have to look at anything.

We are using about five percent of our environment. We have what is called a limited deployment right now, because we have so much integration and automation going on. We needed to mature the support teams and the rest of the organization as a whole in what we're doing. Once we have achieved that, I anticipate that 100 percent of our applications are going to be feeding this system. After that, we will greatly extend our use.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.

Doug Greene

Sr. Director Operations at a comms service provider with 10,001+ employees

Aug 26, 2019

Enables us to triangulate, using multiple sets of data - including log, app, OS, network, and more - and find issues

Pros and Cons

"The solution's event management capabilities are fantastic. We do a best of breed. If, on the network side, they use a different tool, we pull all that data in so that we have a single console. It's kind of like the monitor of monitors. We're able to aggregate all the different types of data sets, whether it's log data, app data, OS data, infrastructure data, or network data. We're able to aggregate all those events and then correlate and be able to say we're having an event."
"The very fact that we've been on it for ten years is a testament, as we continue to make the investment and pay the renewal because the return has been fantastic."

"Specifically around application performance monitoring, BMC is definitely not the market leader. The Dynatraces, the New Relics and the like are more of the market leaders in that space. I would like to see them grow that space a little bit more aggressively. It has not really been their bread and butter."

What is our primary use case?

We use it primarily for monitoring. My organization is an application support organization and part of what we need to do is make sure that our infrastructure is running tip-top so that those applications can run, consequently, the same way. We use the tool to do both application monitoring as well as infrastructure monitoring all the way down to storage services, and things like that on the OS layer. We have a full breadth and are able to triangulate what types of issues we're experiencing before our end-users experience those issues.

It monitors our entire platform. Everything in production, every single app, is monitored through the tool. As new applications come into our ecosystem, we have a process. The project team sits down with us. We talk about what the product's capabilities are. Most of the PMs already know that because they've been here for a long time. We set it up, and we move on to the next app. We're expanding it as new tools or new functionality or new applications come into the ecosystem.

How has it helped my organization?

Because we've used it for so long, we've been measuring results for eons. The standard metric that we use, given to us by our CIO, is that 70 percent or more of our outages need to be alert-driven, not customer-driven. So, if a customer calls in and says, "Hey, I'm having an issue logging in to PeopleSoft," which is one of our applications, we should have already known that there was an issue and handled the alert prior to the customer calling in.

A decade ago, we were using Microsoft's and HP's product sets to monitor but it was disparate. The alerts weren't aggregated and we never knew who they would go to. Therefore, we missed a lot of opportunities to be proactive in our organization. Hence, the reason we moved to the product which, at that time, was called ProactiveNet - and then it became BPPM and TrueSight, as it is today. We were able to flip that situation and we have been able to meet that metric for five years running. We had one blip in the year prior to that, and in the years before that, we were knocking it out of the park. So our metric is if we get the alert before someone had to call in, and we're successful in meeting that some 80 to 90 percent of the time.

In addition to that, when we look out across the industry, most organizations have anywhere from five to 15 people who are dedicated to monitoring. We have two. We're able to run the entire stack, along with its complementary adjacency tools, with two people. That was one of the many reasons that we made the migration from other products to ProactiveNet/BPPM/TSOM. At that time, we were a one-man band and really needed to be able to move quickly but also be able to maintain a product and not require tons of manpower to make the product work. The improvements that BMC has made over the last two to three years are really revamping and consolidating the console so that it is truly a single console that you can run with a single individual, should you need to.

We have 342 apps in our ecosystem and my team manages around 280 of those from a support-platform standpoint. And because we have two individuals who are dedicated to the monitoring, they partner with the rest of our admin organization to drive exactly how things need to be alerted. We review them quarterly. That is a testament to a really solid product - that it only takes one or two people to really run the thing and administrate it, versus having an entire staff and that's all they do.

The solution provides a single pane of glass where we can ingest data and events from many technologies. I am one of the few, at least according to BMC, who has screens up in my hallways and I show our top 20 applications from a criticality standpoint - what's most important to our organization, things that I have to run. Everyone sees what's up on those boards every day. I go to it two or three times a day. Because we have that single pane of glass, we see where we're having issues organizationally and we're able to rally resources - whether it's engineering, operations, or our development group - and solve the problem and get those things from red/yellow back to green/blue. The single pane of glass was a key piece of what we needed to have to be successful as a monitoring organization.

In terms of the availability of our infrastructure, ours is not a hybrid environment, per se. We don't really measure and/or monitor - because of legalities with most of these FAS providers - how well their systems perform. But what we do is measure any of the interfaces that touch or route to those applications, and we have an uptime measurement of about 99 percent for most of our apps. We have a dashboard for that which is managed out of the ITSM group. They partner with us and they pull all of our monitoring data to figure out two key metrics: total uptime and uptime excluding maintenance. Those are the two keys which enable us not only to showcase to our customer base how well the systems are performing but how often they really are available.
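The two metrics mentioned above can be illustrated with a small calculation. The definitions here are assumptions, since the review does not spell them out: "total uptime" treats maintenance windows as downtime, while "uptime excluding maintenance" removes maintenance windows from the measured period. All figures are hypothetical.

```python
def uptime_metrics(period_minutes: float,
                   unplanned_minutes: float,
                   maintenance_minutes: float) -> tuple[float, float]:
    """Return (total uptime %, uptime % excluding maintenance).

    Assumed definitions, for illustration only:
    - total uptime counts maintenance windows as downtime
    - uptime excluding maintenance drops those windows from the measured period
    """
    up = period_minutes - unplanned_minutes - maintenance_minutes
    total = 100.0 * up / period_minutes
    excluding_maintenance = 100.0 * up / (period_minutes - maintenance_minutes)
    return total, excluding_maintenance


# Hypothetical month: 30 days, 4 hours of maintenance, 3 hours of unplanned outage.
total, excl = uptime_metrics(30 * 24 * 60, 3 * 60, 4 * 60)
print(f"total uptime: {total:.2f}%, excluding maintenance: {excl:.2f}%")
```

Reporting both numbers separates how often the system was actually reachable from how well it performed outside planned windows.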

BMC has helped to reveal underlying infrastructure issues that affect app performance. Four years ago, PeopleSoft was running slow in regard to our payroll run. We run payrolls weekly. If you know anything about payroll, you've got to hit a certain deadline and be able to send the check file to the bank for those direct deposits to show up in people's bank accounts. It's a really sensitive issue when people don't get their checks. With the monitoring tools, we were able to triangulate that it was not an application issue but that it was actually a storage issue. Our solid-state storage was having a firmware issue which was causing slow turnover for the IO, and therefore it was slowing down the entire process of payroll. We were able to triangulate that that was the issue, decide what we needed to do - which was move the storage so that the application could continue to perform. We met the need and were able to get the payroll cut just in time so everyone could get their checks. It was a big win.

As for reducing IT ops costs, year over year, my operational expenses grow by three percent, which is mostly salary increase. I've gone from 12 resources to roughly 55 resources organizationally, while growing from 80 apps to 280 apps over the last eight years. Our operational costs have only gone up because of the use of licenses, not because of human capital. The tool has helped us work smart, not hard, and leverage the technology. We haven't necessarily needed to grow our operational expenses to accommodate the new functionality or the new applications which come into our ecosystem. We just set up the monitoring and it does its thing.

What is most valuable?

The solution's event management capabilities are fantastic. We do a best-of-breed. If, on the network side, they use a different tool, we pull all that data in so that we have a single console. It's kind of like the monitor of monitors. We're able to aggregate all the different types of data sets, whether it's log data, app data, OS data, infrastructure data, or network data. We're able to aggregate all those events and then correlate and be able to say we're having an event. Just because we have one or two alerts doesn't necessarily mean that we're having an event. It's when we get several of those that "trip the wire" that we're able to say, "Okay, we are having an event." And the tool allows us to aggregate all of that so that we're managing event-driven versus alert-driven.
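The "trip the wire" idea above, where one or two alerts are not an event but several together are, might look something like this in outline. The threshold of three alerts, the five-minute window, and the grouping by application are hypothetical choices for the sketch, not how TrueSight's correlation is configured.

```python
from collections import defaultdict


def detect_events(alerts, threshold=3, window_seconds=300):
    """Return the applications that have at least `threshold` alerts
    inside any span of `window_seconds`.

    alerts: iterable of (timestamp_seconds, application) tuples.
    """
    by_app = defaultdict(list)
    for timestamp, app in alerts:
        by_app[app].append(timestamp)

    events = set()
    for app, stamps in by_app.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans at most window_seconds.
            while stamps[end] - stamps[start] > window_seconds:
                start += 1
            if end - start + 1 >= threshold:
                events.add(app)
                break
    return events


alerts = [
    (0, "payroll"), (60, "payroll"), (120, "payroll"),  # three alerts in two minutes
    (0, "crm"), (1000, "crm"),                          # two alerts, far apart
]
print(detect_events(alerts))
```

Here only the payroll application trips the wire; the two widely spaced CRM alerts stay as individual alerts.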

The breadth of the solution's monitoring capabilities is also fantastic. A lot of IT organizations that I talk with use a conglomerate of tools to manage their monitoring and it ends up being pocketed. We don't have that problem because we are using it as the monitor of monitors and therefore we are able to take advantage of all of its bells and whistles. As well, we can feed in additional alert data, crunch that, and react appropriately and accordingly, proactively versus reactively. We'll get several low-level alerts saying, "Hey, this may be an issue," and we're able to proactively look at that before it becomes a critical outage. We use almost every aspect of the tool, with the exception of some of the automation because we haven't gotten there and found the need for it. But we're rapidly starting to take advantage of those pieces as well.

A use-case example would be if we have a drive filling up on a particular server for a particular application. If that's a known issue, we can actually orchestrate through the automation component of TSOM to be able to say, "Hey, when we see this type of alert, go try one of these three things and if that fixes the problem, go away. And if it doesn't, go ahead and escalate that as a ticket and we'll have a human go touch that server and remediate the issue." So we're right on the cusp of beginning that journey.
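The "try a few known fixes, then escalate" flow in that example could be sketched roughly as follows. This is a generic illustration, not the TSOM automation component's API: `remediate_or_escalate`, the fix functions, `is_resolved`, and `open_ticket` are hypothetical callables that the caller would supply.

```python
from typing import Callable, Sequence


def remediate_or_escalate(
    alert: str,
    fixes: Sequence[Callable[[str], None]],
    is_resolved: Callable[[str], bool],
    open_ticket: Callable[[str], str],
) -> str:
    """Try each known fix in order; escalate to a human if none clears the alert."""
    for fix in fixes:
        fix(alert)
        # Re-check the condition after every attempt and stop at the first success.
        if is_resolved(alert):
            return f"resolved by {fix.__name__}"
    # No automated fix worked, so hand the problem to a person via a ticket.
    ticket_id = open_ticket(alert)
    return f"escalated as {ticket_id}"
```

For a drive-filling-up alert, `fixes` might hold functions that clear temporary files or rotate logs, `is_resolved` would re-check free space, and `open_ticket` would call whatever ticketing system is in use.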

In addition, the entire root-cause analysis functionality within the tool is quite useful. It really comes down to how admins want to leverage it. There are what I call "old-school admins" who want to get on the box and solve it themselves. Then you have the "new-school admins" who go straight to the monitoring tools. It clearly shows you root cause analysis: This is the probable cause, and then they're able to go remediate it more quickly. We use that extensively within the operations team and the products team, which is the team that I own. I don't think the engineering team is quite there yet, but they're beginning to see the value of wanting to see that data and start using the tool themselves.

Regarding mean time to remediation, when I took over this organization, I and the rest of the group were working about 100 hours a week, just trying to keep our major systems running. It wasn't until eight months later, when we actually implemented a more mature monitoring system, that we turned the corner and people were working 60 hours. And now it's somewhere between 40 and 50 hours a week, which is much more maintainable and realistic in the industry. We were doing everything we could to keep those systems running, and we had no idea what would be in the next box of chocolates that we would open up, back when we first started this. There's a direct correlation with TSOM and the BMC product sets that have helped us be successful in working smart and not hard, like we did back in the day.

What needs improvement?

Specifically around application performance monitoring, BMC is definitely not the market leader. The Dynatraces, the New Relics and the like are more of the market leaders in that space. I would like to see them grow that space a little bit more aggressively. It has not really been their bread and butter.

They've been highly focused on cloud initiatives. I don't know anyone in the industry who has solved how to monitor cloud, SaaS-based systems, because all of those systems are usually linked through other systems. That would be another area where it would be nice to see if they could find innovative ways to do that.

The third piece would be around out-of-the-box automation. We all have particular types of alerts and events where all we really need to do is be able to turn the functionality on versus creating the functionality. BMC is already addressing that in many cases.

For how long have I used the solution?

We've used it in probably three incarnations of what it is today, so it's been about ten years.

What do I think about the stability of the solution?

We don't have any issues. We're in an HA format so if we do have any issues, things fail over quickly and we don't miss a beat. It's the heartbeat of our products, the fact that we provide monitoring services to our businesses, so monitoring can't be down. It can't have a bad day. TrueSight Operations is a highly stable product. It is a beast. It runs really well. There isn't a lot of care or feeding that we have to do to it to make sure that it stays healthy.

What do I think about the scalability of the solution?

It's highly scalable. We continue to add more servers and more applications within the ecosystem easily and quickly. We continue to review all of those quarterly to make sure that the way that we've tuned the monitoring is still accurate and that it's meeting the needs of both the admins and the business.

How are customer service and technical support?

We have a great relationship with BMC. We're probably different than the average bear. We've got a great account team. When we call customer support, we get answers pretty quickly. We don't have to call them very often, which is a good thing for any vendor. You don't want to have to call support a lot. But when we do, it's usually because we can't figure it out and we're able to get the answers pretty quickly through their organization.

Which solution did I use previously and why did I switch?

We used HP and then we used Microsoft System Center Operations Manager (SCOM).

How was the initial setup?

Back in the day, the initial setup was very complex. As it stands today, upgrades are really very easy. It's basically just a matter of refreshing old hardware, turning the system on, and making sure that it picks up all of the agents. Setting up today is infinitely more simple than it was even three or five years ago.

BMC is innovating even further and working towards containerization so that we won't have to do upgrades anymore. We'll just overlay. They've really taken into account how to consolidate consoles so that there aren't so many bits and pieces. That has made it easier for them to do upgrades. Installing the system or deploying the system only takes a couple of weeks in an organization of our size, where it used to, when we originally did it, take four months.

The latest one that we did, we had all the technical bits and pieces done within four weeks. Then we slowly rolled it out as we sunsetted particular agent groups. The total roundtrip was six months to have it fully deployed and embedded and working in the system.

At this point, we do an upgrade every three years, and every five to six years we're upgrading our hardware. This year we actually went fully virtual. Our engineering organization still takes a good bit of time to build servers. We were able to get virtual machines within weeks of the initial setup of the product, and we were able to roll to virtual machines, versus physical machines, relatively simply. It was basically a point-and-shoot install. We pulled over all of our policies and procedures that were already canned - and that was another thing that was more of a challenge in years past because we would have to redo them. This time, all that got pulled in and we were up and running within weeks.

What about the implementation team?

We partnered with BMC this time. Typically, we use a third party, but in talking with BMC and where we were at - as we use them primarily for consultative services - we said, "Hey, what's the best way to go ahead and do the upgrade and the migration?" They gave us the cut plan and then we actually did the physical work ourselves, which saved us some $200,000 in project fees.

With two guys running the system day-to-day, and consultative services from BMC to tell us, "Okay, this is how you do it," we were able to execute both the upgrading project, as well as administrating the product, while still running on the old system. It says a lot about the product's ease of use and capabilities.

Now, my guys are really smart and I'll give them all the credit. They're smarter than the average bears. But the reality is that it's rare to find a product where the people who are running it can be doing a major upgrade at the same time.

What was our ROI?

The very fact that we've been on it for ten years is a testament. We continue to make the investment. We continue to pay the renewal because the return has been fantastic. I don't have any specific data points other than the fact that we've been on the product for ten years. There's a reason for that.

What's my experience with pricing, setup cost, and licensing?

There are no costs in addition to the standard licensing fees. It's a straightforward contract.

Which other solutions did I evaluate?

Every three years, we reevaluate the space. That's just part of the culture that we've established. No one tool stays forever at the top, but BMC's monitoring capabilities and their discovery asset tools are top-of-stack, typically, in any of the research that we do. We continue to use them and we continue to have a great relationship with BMC.

What other advice do I have?

Keep it simple. Make sure that you understand, architecturally, how your applications and your data center are set up. It makes your life easier to know exactly what you're going to need to monitor.

The biggest lesson I have learned from using this solution is to really take full advantage. I joke with the BMC guys that TSOM is like AutoCAD, the engineering tool that people use to design and draw. We only scratch the surface of its full capabilities. The thing that I've learned is that it's a good idea to take advantage of all the bells and whistles as quickly as you can because it really pays dividends to do so.

We are using a little bit of the solution's machine-learning and analytics. That's an adjacency tool called IT Data Analytics and we feed that into our overall, single pane of glass monitoring. I don't know that we've taken full advantage of that quite yet. It is on the roadmap. We'll probably get to that, realistically, next year and in '21, where, as we're seeing those analytics, we will actually link automation to it. So when we see something we'll actually do something. We're a fairly small shop and therefore scale is not an absolutely necessary thing, but it is something that we are striving to move towards. It has affected our application performance in bits and pieces. It's not something that I'd wave the banner on quite yet. We have pocketed instances where ITDA has come back and told us that there was an issue, and we were able to remediate proactively versus reactively. I don't know that we're leveraging the tool's full capabilities where I can say that I have a use case where this was a big win for us.

I don't think that the monitoring tool, TSOM itself, has created or helped to support any business innovation.

As for users of the solution, I have the two admins and then I have, say, half of my organization that consumes it as a tool, so there are about 12 to 15 users. Each of those people is an application admin. Their primary responsibility is the applications that they support. The monitoring is a tool for them to use to ensure that those systems are healthy and top-notch.

I have a senior manager who manages the space. He also manages our asset-discovery tools along with all of our web and third-party space. He is a busy guy but it's all managed under one leader. There are the two folks who administrate it. It's really a very small human-capital resource footprint, in comparison to what it does technologically.

I give TrueSight Operations a nine out of ten. There are always bits and features from other products that we wish we would see in it. Usually, we see them pretty quickly.

ServiceDdffe

Service Delivery Manager at a financial services firm with 1,001-5,000 employees

Aug 22, 2019


Knowledge Modules are what make the implementation possible across our varied infrastructure, but RBAC controls need some work

Pros and Cons

"From an administrative standpoint, what stands out in TrueSight is the ability to implement quickly. When they have a requirement to monitor something, we're able to turn that on quickly in their environment. We're able to set up new apps within a day."
"Having a good monitoring implementation has made a world of difference to our operations teams."

"We were somewhat limited in TrueSight due to some of the RBAC controls not quite being what we wanted as far as delegating out administrative privileges for implementation. But because we were able to turn requests around pretty well, that burden wasn't too heavy."
"We're end-of-lifeing it now. Overall, the licensing costs of BMC are a challenge for us in that they're hard costs, whereas open-source monitoring has soft costs, where it's harder to line-item."

What is our primary use case?

We use it for business service and infrastructure monitoring. We use the full gamut of utilities from them and monitoring in the platform.

How has it helped my organization?

We don't use APM. We used to. We line-item nixed that for various reasons a few years ago. We also don't use the ITDA, their next-gen log monitoring tool. So we're truly just within the TSOM interface, as well as doing synthetics. That being said, the Knowledge Modules that BMC brings to the market are what make the implementation possible across our varied infrastructure and applications. It's critical to have those Knowledge Modules. If we had to write things ourselves, or to use a more generic monitoring environment, and then build additional scripts on top of that to monitor the Kubernetes of the world, or the WebLogics of the world, or the Oracles and SQLs of the world - if we had to write scripts ourselves to bring back particular monitoring components and performance metrics and so on - that would be a heavy burden that would keep us from implementing. We don't often run into something that we haven't been able to monitor. It's just a matter of getting people to the table to tell us what they need.

When it comes to incident management, we get most of our data from TrueSight, apart from log data, because we don't use the ITDA interface. It would be an effective interface, but for logging we go to our SIEMs, since we're already pumping data to another system there. But TrueSight definitely gives us a view into the health of our business services, which is our primary goal for implementing monitoring.

We try very hard not to use event management. What I mean by that is that we do not have a typical NOC. We don't have ten people staring at screens and then escalating as necessary. Along those same lines, we don't spam our incident management environment with events from TrueSight. With a lot of customers I've met over the years, that's essentially the old school way of doing things. Instead, we create events that are truly actionable. If we don't have an actionable event, we don't create it. We use their baseline technology to ensure that we're only sending items that are either about to have a problem or have passed the threshold of having a problem. If you're talking about typical event management, where you create an event and it gets forwarded to some other system, there's a notification about it somewhere else - the whole ITSM cycle - we don't use it for that. We use it for creating smart events that create alerts directly to the teams responsible. As I described before, we have many distributed teams rather than a centralized NOC.
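One way to picture "about to have a problem or have passed the threshold" is a check that combines a hard limit with a deviation from what is normal for that metric. The sketch below is a generic illustration only and makes no claim about how TrueSight computes its baselines: the `classify` function, the three-standard-deviation band, and the sample numbers are assumptions.

```python
from statistics import mean, stdev


def classify(history, current, hard_limit, sigmas=3.0):
    """Return 'critical', 'warning', or 'ok' for a new metric sample.

    history    -- past samples that define what normal looks like (at least two)
    current    -- the newest sample
    hard_limit -- absolute threshold that always means a problem
    sigmas     -- assumed width of the normal band, in standard deviations
    """
    if current >= hard_limit:
        return "critical"  # already past the threshold
    baseline = mean(history)
    spread = stdev(history)
    if spread > 0 and abs(current - baseline) > sigmas * spread:
        return "warning"   # abnormal for this metric, possibly about to fail
    return "ok"            # within normal range, so no event is created


cpu_history = [40, 42, 41, 39, 43, 40, 41]  # made-up CPU utilization samples (%)
print(classify(cpu_history, 42, hard_limit=90))
print(classify(cpu_history, 70, hard_limit=90))
print(classify(cpu_history, 95, hard_limit=90))
```

Only the "warning" and "critical" results would be sent on to a team, which keeps the notifications limited to actionable items.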

In terms of TrueSight helping to maintain the availability of our infrastructure, it's an interesting question because of our distributed systems. We have 8,000 hosts across about 40 different teams, and we have 600 different applications that we run. For those critical tier-one apps, teams are highly involved in their day-to-day operations and watching them very closely. Having those two things - the actionable alerts and the ability to see what the health of their system is at any given time, and to be able to check it against what normal looks like for those applications - gives the teams that use it in such a manner the information they need to be confident that their availability is as it needs to be, or better. As far as a hybrid environment goes, we have our own hosting environment because we are the cloud to our clients. So we're not necessarily in that situation. We don't use assets other than what's in our hosting environment.

If, in the past, one of our biggest problems was just plain old infrastructure incidents, basic availability incidents where a server or an application, an interface or an endpoint, may not have been available and no one noticed it until some downstream, business end-result brought it to our attention, we've essentially eliminated 90 percent or more of those. It has been at least three years since we've done any numbers. But at the time, we might have had ten to 15 Sev-One incidents a month. When we last measured it, we were down to one. That was within a couple of years of implementing an enterprise monitoring strategy.

As for root cause, when a team is engaged in monitoring to its full extent, we're usually able to get to root cause pretty darn quick. For example, if a team has many servers that could potentially be impacting an application or a business service, tracking something down across those multiple servers and multiple owners could be really tedious and time-consuming. It would be on the order of hours, or at least many minutes, depending on the scope of the issue. With well-implemented monitoring, for our Sev-One apps, they're able to get to the solution almost immediately. If we have monitoring set up properly, the actionable event will tell them precisely where a critical component has failed and they can resolve it. Where it's a different type of incident that we might not have a particular monitor for, they're able to use the performance data, availability data, and other related alerts to get to their issue much faster than they used to. Having a good monitoring implementation has made a world of difference to our operations teams. It's so much so, that if you think back five years, which is an eternity in the IT world, when there was a Sev-One incident back then, someone would walk around tapping people on the shoulder all over the floor. That was very time-consuming. But now they're able to collaborate quickly and say, "It looks like this is the problem right here," in a well-monitored environment, and get right to the root cause.

It's helped our mean time to remediation, and I'm being conservative here, by about 70 to 80 percent. That's an absolutely huge impact.

What is most valuable?

We have many operational teams, and for any given team their requirements are different. One team is more reliant on infrastructure monitoring, because they are processing-heavy. Another team might be more reliant on endpoint monitoring where we're ensuring that the third-party endpoints they rely on are up and available. Another team may have fairly immature applications, so that they would rely heavily on log monitoring to catch all the errors that may come up. From a consumer-function standpoint, there isn't any feature that stands out. They're all important because all of our consumers are important.

From an administrative standpoint, what stands out in TrueSight is the ability to implement quickly. When they have a requirement to monitor something, we're able to turn that on quickly in their environment. We're able to set up new apps within a day. Most of the work in monitoring is working with the teams, evangelizing, educating, and making sure that they're bringing their smart requests to the table so that they get visibility into their business service. If the implementation wasn't as easy as it is, it would hinder and probably decrease the adoption of monitoring. But because we can turn requests around pretty quickly and adjust things as teams need adjustment for their different release schedules, administratively, we're able to respond and keep pace with the business and the technology that they're implementing. That is a critical function for us.

For how long have I used the solution?

We've been using TrueSight Operations Management for almost six years.

What do I think about the stability of the solution?

Stability is one of the areas where we have identified challenges with TrueSight, but those are details that I'm not at liberty to share at this point.

What do I think about the scalability of the solution?

We've been able to implement all the hosts that we care to implement on a couple of servers, with minimal maintenance. We don't use their high-availability solution. We don't really require it because the underlying infrastructure is relatively robust. We haven't had any problems with the scalability. Had we been a couple of times larger, there would've been more to implement server-wise.

The other thing about our implementation is that we send a lot more performance data to our implementation of TrueSight than the typical BMC environment might. We send everything server-side for analysis rather than keeping everything agent-side or emphasizing agent-side, as I've seen a lot of other clients do. I think the tide is turning. I think more people are doing what we're doing where we just push all the data for potential analysis. But we've been able to accomplish what we need without too much infrastructure.

How are customer service and technical support?

They had an advisory board. We, as a group, and even I specifically, had been asked by them what they needed to continue doing. One of those was continuing to build out Knowledge Modules in various technologies. Some of the ones BMC has made available, we've implemented, and some of the ones BMC has made available don't impact us and we haven't implemented. But I've been in discussions where they say, "What do we need to do," and Knowledge Modules is one of those areas where they've made a commitment to continue adding to them, and we appreciate that.

Which solution did I use previously and why did I switch?

When we first started, we did not have a monitoring program at anything resembling an enterprise-type level. We were at about 4,000 hosts and we were really not monitoring anything except for a few services. At that, it was bare-bones monitoring. We monitored, maybe, half of our environment at bare-bones.

We went on this journey six-plus years ago to have an enterprise monitoring solution that focuses on business services. One of the reasons we did that is because of the number of incidents that we had that really should never have happened. Now that we're a number of years in, and we've implemented monitoring and brought teams around in the direction of business service rather than just an executable's use of a CPU, we have far fewer incidents.

As a general trend, we're much more capable of seeing what's out there and monitoring what our issues are and taking care of it before the business incident occurs. I don't have any particularly recent examples where our monitoring was able to resolve an incident after it happened. Of course, I don't get notified when people say, "Oh, look, I resolved this," because it's part of their daily operations to find an issue and resolve it. So it's not necessarily a newsflash anymore for us.

It doesn't happen quite as frequently as it used to, but they continue to build Knowledge Modules every time there are new products on the market. They need to create Knowledge Modules for the implementation to be enhanced. That's one of the key features of Operations Management. That's definitely something that helps us take advantage of everything BMC has. They're not resting on their laurels. They're building things out.

How was the initial setup?

The complexity of our environment demanded the complexity of the implementation. More than half of the effort that we had in implementing monitoring was based on the way we did our program. We were basically starting at zero and bringing teams up to speed, evangelizing, educating, getting people onboard.

The implementation of TrueSight itself was just a software implementation. It had its bumps and bruises. None of us were versed in BMC software. There were some learning curves as would typically be expected for any application of this scope, magnitude, and impact.

We had an overall strategy of doing proofs of concept for various, widespread technologies. We took that success and did a wide-to-narrow type of advertisement. We told everybody what was going on and then we brought more specific people into the room and said, "These are good targets for you to implement." During and after that evangelizing and advertising, we started implementing tier-one applications as an onboarding effort. We did that in a deep-dive fashion where we would sit down and interview these teams and really come to understand what makes their business service tick. A lot of our evangelization effort was actually in changing the focus of operations teams to think from a business service perspective. That paid dividends later when people were more interested in monitoring the actual functions of their applications rather than just the infrastructure of their application. We've been able to change mindsets over the course of a number of years. The first two or three years we were doing implementations. That was when we did most of that work.

From there, we worked as much as possible to allow folks to implement their own where possible, rather than centralizing it, so that people could keep up with their own demands. We were somewhat limited in TrueSight due to some of the RBAC controls not quite being what we wanted as far as delegating out administrative privileges for implementation. But because we were able to turn requests around pretty well, that burden wasn't too heavy.

From tier-one apps, we kept going and kept educating, bringing people to the table. When new applications come to our company, we still reach out and educate new teams, bring them to the table and use the onboarding process we built and solidified over the course of the first couple of years.

During the first three years, we had two-and-a-half FTEs for implementation. That was for the full program, not just the TrueSight component. It included all those interviews, all those educational components, all the training, etc. The full program. The actual pressing of the buttons was about half of that. Once you stand it up and start connecting things, it's a matter of administratively using the tool to execute.

What about the implementation team?

Typically, our company builds knowledge for implementing infrastructure/operations activities like this from the ground up. We did not use a third-party. BMC was instrumental in our success in that they made resources available to us, implementation-wise as well as development- and support-wise.

What was our ROI?

The solution hasn't helped reduce costs in a measurable fashion. That's a measure that we wouldn't undertake. There might be soft-cost benefits, such as:

impact on the quality of life for operations folks
our ability to show our clients that the services we provide to them are healthy
giving the business teams, our relationship teams, the ability to speak intelligently, rather than just colloquially, about how our systems are running.

Life at our company as an operations person is nicer now because you have confidence that what you're doing makes a difference, that the business service that you're working on is healthy. The business is happier when we're able to talk to them intelligently and say, "I can actually show you that we've been up and successful."

It has helped in our ability to work on smarter things rather than silly incidents. If we eliminate incidents, then we're doing better work. We're able to do the good work of business rather than the sad work of recovery. That's not only quality of life but it's also the ability to get things done. So I know that, at some level, we're doing more with less because of our monitoring. But we don't have any hard numbers from a monitoring perspective.

What's my experience with pricing, setup cost, and licensing?

We're end-of-lifeing it now. Overall, the licensing costs of BMC are a challenge for us in that they're hard costs, whereas open-source monitoring has soft costs, where it's harder to line-item. It's harder to see the cost of implementation for other things. So that change of direction is taking place. It doesn't mean the cost isn't there; it's just soft dollars rather than hard dollars.

Which other solutions did I evaluate?

We looked at Microsoft SCCM. And, because we had a partnership with CA, we looked at their tools. There were a couple of other minor players we looked at which just didn't have the scope of what we needed to do, because of the breadth of technologies that we use. In the bakeoff, we came down to BMC and Microsoft.

It was a long time ago, so I don't know that it's fair to judge at this point, but from a monitoring perspective, the whole Microsoft suite really wasn't there. There was a lot of scripting. It was easy to identify that the administrative burden was going to be high in that implementation. Conversely, with the BMC stuff, out-of-the-box, administratively, you click and implement. That is one of our components of success, our ability to implement quickly.

On the soft side, BMC as a partner was much more interested in our success than the Microsoft folks were at the time. It's very hard to quantify unless you're there sitting in front of them at the table and working with them, consuming their knowledge. It really is a great partnership.

What other advice do I have?

BMC is at a critical point in redefining TSOM, how it's built. Anybody looking at BMC now needs to jump on the new version of TSOM and skip the current versions. I would wait until their new environment is ready. It will be containerized. Anyone implementing BMC can get used to the environment in a PoC but they shouldn't implement until their new stuff is out. I expect it to be that much different.

Make sure that you have stakeholder buy-in and that they are able to provide the resources with the correct knowledge to implement in a smart fashion. Everybody's definition of "smart" is going to be slightly different. We really home in on the business service side to make sure that our business functions are healthy and that we're able to understand what's normal and what is out of normal. We work with the teams, even from the point that they're in development of projects, to make sure we're ahead of what's going on rather than reactive. But that means the buy-in of multiple teams: development, operations, support. That amount of effort requires stakeholders with decision-making capabilities to say that it's a priority for them.

We knew up front - and we've been able to validate our assumption - that monitoring doesn't do any good unless you are analyzing your business service for what are the critical components to observe. That's an educational effort and an implementation project. It's that upfront effort that will make your monitoring successful. Where we've been able to engage teams and teams have remained engaged, we've been the most successful in that. We took that to heart upfront, we made that part of our route to success, and we put the effort in. Our monitoring's been successful because of that. If we didn't do that, and we didn't constantly engage teams to make sure that they were aware of capabilities including the ability to give us feedback, and that we can implement quickly, we wouldn't be here. We wouldn't have advanced as far as we have. Most of that advancement was in the first two or three years, and we've just been riding that wave of success since then.

Keep in mind that most companies don't go from nothing to an enterprise monitoring solution; they go from one monitoring solution to another. But if there's anyone in the boat that we were in, where they are the size we were with no monitoring solution, they'll be in the pain that we were in. Implementing a good monitoring program, not just the tool, but a program around it, can make a world of difference to the operations teams, and subsequently to the business as well.

For those teams that are utilizing TrueSight, they don't rely on other monitoring environments. Some of those teams rely on those actionable alerts almost exclusively, and don't really use TrueSight's single pane of glass. We do have some teams that consume TrueSight and use it on a daily basis to ensure that they don't have any events, whether or not they've risen to the level of action. They'll also proactively look at some components, either business function components or infrastructure components, to ensure that they're working as designed and within the parameters of normal.

I don't think the functionality of Operations Management helps to support our business innovation. Our business runs forward and headlong into innovation, regardless of whether or not IT can keep up. We were never an impediment, other than cost. The way we run our overall IT environment is very open and flexible. Monitoring is a way for us to give business the confidence that what we're implementing is healthy, but it doesn't impact their interest in being able to implement what's new. They've always been able to do that and continue to be able to do that.

In terms of machine-learning, I mentioned above the baselining which, depending on how it's implemented, might be called machine-learning, but in TrueSight they just have a straight calculation-type of activity. We have other monitoring solutions that we're implementing as well, and that topic may be more applicable to them, but not in the TrueSight world. The TrueSight world is a straight application implementation. It's nothing exciting on that end.

I have to give our BMC partners a lot of credit for where they're planning to take TrueSight based on their roadmap, although it is speculative. I don't think the areas for improvement from us would be any different than anything they've already heard.

If someone were to implement the full suite of BMC products, you'd have to give it a nine out of ten. TSOM by itself, I have to give it a seven out of ten.

ITManager610z9

IT Manager at a manufacturing company with 1,001-5,000 employees

Jul 18, 2019

Single pane of glass has resulted in dramatic improvements; it is bringing people together

Pros and Cons

"We're using native monitoring capabilities for all our server hardware, for visibility for applications, for URLs, for webpage response and accuracy, and for monitoring network throughput in a lot of particular instances. We're using lightweight protocols for pinging, for DNS, for LDAP."
"It's a really good tool and most of the issues we've got, they've either fixed or they're fixing to fix."

"The one piece that I would love to see is a general-purpose, configurable agent which would be a framework that you can deploy on anything, whether it be Java or anything else. It would allow you to easily deploy it on a platform that they support."
"On Windows we went to application HA and, quite honestly, it was terrible."

What is our primary use case?

We stood up an event management group and our responsibility is to monitor the entire company, globally: systems, applications, and infrastructure. We're modeling those out as services. We've got about 800 services that we're modeling out from the CMDB right now and monitoring pretty much everything.

We are big users of the service models. We use CA's SDM system, which we're evaluating. But in the meantime, we wrote the interface between TrueSight and CA to cut tickets and also to, in reverse, give ticket statuses in TrueSight. We're also going through a process of onboarding our services for event management where we go through a checklist of about eight different items and bring them on as a service with SLAs. Some individuals on our Service Desk - and eventually all will be - are dedicated to doing 24/7, 365 monitoring of the services, the events, and the applications.

One of the primary things we're doing is using this as a vehicle, within our "One-IT" initiative - which includes event management - to truly bring people together from a cultural and technological perspective. The goal is that everybody will have the same place to see what's going on. No longer will they have to worry about their application. Is it the databases? Is it the network? And how long do they have to spend trying to figure it out? Culturally, the Service Desk is coordinating some of those impacts when they happen, so that the right people are on the call, based on what the service model says. All in all, it's a very flexible tool, which means it's complex but very powerful.

We're using Operations Management, Capacity Optimization, some App Visibility with some of the Synthetic scripting and we're just starting to deploy some Java agents on some app servers.

How has it helped my organization?

With the service modeling, once we managed to build our import process to get our CMDB impact models and services into TrueSight, that was a big win. Once we integrate it with SolarWinds, they will actually be able to see when there's a problem with the plant, and they will know if it is a network problem or a server problem. With the service models, they can actually get right down to the impact of any issue. We're working on some other things to make that easier, like event correlation. So if a network goes out at the plant, they don't need to know that there are problems connecting to 60 servers, rather they've got a problem with the router.
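
The router example can be sketched as a simple topology-based suppression rule. This is only an illustration in Python, not TrueSight's correlation engine or MRL; the `Event` fields, the `UPSTREAM` map, the device names, and the `correlate` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device: str
    message: str


# Hypothetical topology: each of 60 plant servers sits behind one router.
UPSTREAM = {f"plant-srv-{i:02d}": "plant-router-1" for i in range(1, 61)}


def correlate(events):
    """Drop events for devices whose upstream router is also reporting a problem."""
    reporting = {e.device for e in events}
    return [e for e in events if UPSTREAM.get(e.device) not in reporting]


# One router event plus 60 server events arrive together.
events = [Event("plant-router-1", "unreachable")]
events += [Event(name, "connection lost") for name in UPSTREAM]

for e in correlate(events):
    print(e.device, "-", e.message)
```

With this rule, the 61 raw events collapse to the single router event, which is the one the plant staff actually need to act on.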

We're currently looking at either consolidating the other monitoring tools that we have around the organization or connecting them for the single-pane-of-glass goodness. We're bringing in data from SolarWinds, we're bringing in data from Oracle's OEM, and we're integrated with an application that monitors desktops. It generates an event and a ticket is cut out to the regional support people. They will go to the desktop and say, "Your disk is in danger of imminent failure. We need to go ahead and clone that guy and replace it before you're down." So we're definitely going with a single pane of glass. In terms of our IT ops management, that means it's getting better. We're trying to be more proactive instead of reactive. We've only been heavily into this for nine or ten months so the actual, long-term impacts aren't measurable yet. We're still baselining where we are at.

The single pane of glass is a big improvement.

There is also the ability to do predictive and corrective, especially for some services which we're monitoring out in the field which are critical to various plant components. It used to be that they would go down and the plant would call. Now we're detecting that they're down, we're restarting them, and we're letting somebody know there's an issue. That's also a big improvement in our manufacturing capabilities. Culturally, it is bringing people together with one place to look and giving them something to talk about when there's an issue. It's bringing IT together. The collaborative and predictive stuff is actually starting to improve.

We're not doing a tremendous amount of preventative stuff yet - unless you count when your disk is three percent from being full and you need to do something before it fills up. We're not using some of the more advanced features of the predictive analytics yet. We are starting to look at some data analytics though. We have a data analytics group which we stood up, a couple of people who are starting to use data analytics to do some things.

It's improving the overall operation, but the impact is going to be measured a little bit later. We've seen some cost deferrals and some cost savings with some support renewals we haven't had to do on some other tools. But we haven't seen the major cost impacts yet. We have spent a lot, but on cost-avoidance for various support tools we have saved close to $1,000,000. In the nine months we've been operational, we've deferred cost on at least two tools. One was about $750,000 and the other was $250,000 for maintenance.

It also helps to maintain the availability of our infrastructure across a hybrid, complex environment. I used to work at FedEx and we're not as environmentally complex as FedEx because we consolidate a lot of stuff on the ERP. But if you throw manufacturing in there, we have pretty much every flavor of platform. As with most deployments, we've got three-tier and four-tier applications. You throw the network and some load-balancers in there and it's fairly complex. If you can use a service model to see exactly what's working and what's not, it really gives you the ability to look at some things.

The solution has also helped to reveal underlying infrastructure issues that affect app performance. Let's say there is a system that is occasionally slow but you don't know why. Then you find out that it was supposed to be configured to use a large number of LDAP servers for authentication but somebody had configured it to one. When you compare the times at which the systems people were having trouble logging on and you look at the CPU and memory usage on your LDAP server, you begin to put things together, without actually analyzing configuration files. You can figure out that the system is configured improperly. When they dig in, they find that it's only talking to one LDAP server. It gives us that kind of diagnostic capability, by looking at everything, and the ability to pin things down.

In terms of root cause analysis, we're still working that through. But mean time to repair is going down because it's becoming much more obvious. Between the events that people are looking at which are prioritized, and the service models which show the actual impacts to the relationships, it's becoming much easier. Depending on the event, it's gone from about four to five hours down to 20 minutes. When it works, it's significant. A lot of it is cultural. When you go from everybody monitoring their own stuff and not talking to anybody else, to everybody looking at the same single pane of glass, and you throw a Service Desk on top of that, which is performing incident management and coordinating some things - between the technology and the culture and the process changes, you're going to see some pretty dramatic improvements.

BMC just did a custom KM for us. Typically, on a given server, we want to know when a drive is three percent. But we've got some mixes of drives, servers which have anywhere from a 100-gig drive to a terabyte drive, and the percentages that we are worried about are not the same. This request came from our SQL group. BMC was able to adjust the alert parameters based upon the size of the logical drives. That was definitely a business innovation. I think that was good for BMC too. Although that's a custom KM which we just deployed, I suspect they will make that part of their standard tool kit.
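
The idea of adjusting alert parameters by logical drive size can be sketched as a tiered lookup. This is not the custom KM itself (the review gives no detail on how it works); the tier boundaries, the percentages, and the function names below are hypothetical, chosen only to show why one fixed percentage does not suit both a 100-gig drive and a terabyte drive.

```python
# Hypothetical tiers: (maximum drive size in GB, alert when free space drops below this percent).
TIERS = [
    (100, 10.0),
    (500, 5.0),
    (float("inf"), 3.0),
]


def free_pct_threshold(size_gb):
    """Return the free-space percentage below which a drive of this size should alert."""
    for max_size_gb, pct in TIERS:
        if size_gb <= max_size_gb:
            return pct


def should_alert(size_gb, free_gb):
    """Compare the drive's free-space percentage against its size-specific threshold."""
    free_pct = 100.0 * free_gb / size_gb
    return free_pct < free_pct_threshold(size_gb)


print(should_alert(100, 8))    # 8% free on a 100 GB drive
print(should_alert(1000, 80))  # 8% free on a 1 TB drive
```

Under these assumed tiers, the same 8 percent of free space alerts on the small drive (only 8 GB left) but not on the large one (80 GB left).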

What is most valuable?

From a TrueSight perspective, we love the Capacity Optimization. We manage to collect almost all our capacity information through agents, without having to deploy a capacity agent. We've already saved some money. We're now provisioning more for obsolescence than we are for expansion because we now know exactly what we've got. One of the nice things about it is that we've now put Capacity Optimization in all our plants and mills, where the money's actually made.

The flexibility of the MRL is great. The various abilities to use native KMs to connect to a lot of things that we're doing with the hardware monitoring into the consolidated stuff, like SharePoint, is great. We're using native monitoring capabilities for all our server hardware, for visibility for applications, for URLs, for webpage response and accuracy, and for monitoring network throughput in a lot of particular instances. We're using lightweight protocols for pinging, for DNS, for LDAP. We use the scripting KMs for a lot of stuff that we have to script ourselves. We're also doing a lot of SNMP polling for devices. We've got some places where we really couldn't use a traditional agent and we deployed a Java agent that we wrote. For example, we might be monitoring UPS's out in the field using a Raspberry Pi and pushing that data back up. The problem with UPS's out in the field, when you have thousands of them, is that you don't know that the battery's bad until the power goes out. This gives us the ability to enable them to report back via SNMP.

What needs improvement?

I can only speak from my perspective because I don't know if some of the issues that we've had are industry-wide or not. For instance, we've got a lot of Microsoft stuff here, and the SCOM interface is very difficult to use. They don't have support for SCCM and some other things so you have to go directly.

The one piece that I would love to see is a general-purpose, configurable agent which would be a framework that you can deploy on anything, whether it be Java or anything else. It would allow you to easily deploy it on a platform that they support.

The KMs and some of the user interface are a little bit quirky. That's the stuff that they will eventually get to. TrueSight is a fairly new platform revision for BMC. I'm seeing a lot of those simple platform things, where you have to go here and do this and you have to go there to do that. They're working very hard to integrate everything into the same simple console. I think that a lot of the issues that we have are going to slowly, or maybe rapidly, disappear.

For how long have I used the solution?

We installed it a couple of years ago. We started ramping up and have been using it since then. We really went hot and heavy about nine months ago. We moved from Windows to Linux in January so that's when we really started to invest in event management work with it.

What do I think about the stability of the solution?

On Windows we went to application HA and, quite honestly, it was terrible. They'll tell you it's terrible - or they should. We are very religious about patching, so when you go to multi-node HA stuff and you've got the Windows guys patching your stuff every Saturday night, you become very unstable. What we did was we moved to Linux so that the patching wasn't necessary as often. And we went to operating-system and hardware-level failover with Oracle Solaris virtual machines, and we've been incredibly stable since then.

What do I think about the scalability of the solution?

Regarding scalability, so far, so good. We've got about 22,000 devices that we're working with, of which about 8,000 are directly monitored. The rest are coming in from SolarWinds, the network, and some other things. We're running three TSIMs and one parent, so four infrastructure managers. We've got integration servers all over North and South America and Europe. It's very scalable.

In terms of users, it's mostly IT right now and a few business people. We've also got 300 to 400 service providers who log on and look at things occasionally. A lot of them just use the ticketing system. They don't actually get into BMC. They just work their tickets and close their tickets.

As for increasing the usage of it, the foremost thing in our pipeline is to continue to bring on applications. As part of the service onboarding that I talked about, we're bringing in major applications and sitting down with the service owners. We're going through everything they could possibly want monitored and showing them what we can do for them. We're putting those thresholds in place, training their teams, and bringing their teams on as users. Slowly, over the next year to year-and-a-half, we will bring in all of IT.

How are customer service and technical support?

Tech support varies, it depends on who you get. The first-tier is pretty good. If you get the right guy, it's outstanding. They've actually brought on a lot of new people, but they seem to work together as a team. I won't say they're bad, but I don't like tech support for most companies. Overall, they're on par.

Which solution did I use previously and why did I switch?

Prior to BMC, from a monitoring perspective, we were using 65 other solutions. One of my missions is to either integrate them or consume them. Bringing on TrueSight was the vision of a guy who's no longer here. He fully understood the need for a single pane of glass. He understood, fully, the need to bring light to the monitoring situation. We did some evaluations and proofs of concept and decided on TrueSight.

Quite honestly, if you're a large corporation, you can go look at the studies and you can justify it that way, but if you stop and think about how much better your organization can run, and the things that you need to do from an operations management perspective - and you think about the automation that you can put in place - it's a no-brainer. It's just a matter of choosing which tool.

How was the initial setup?

The initial setup was complex, no doubt, by the time you bring in Professional Services, if you opt to. We didn't follow the standard model because we didn't want them to come, drop in a configured system and say, "Here's the book on how it works," and then walk away. We wanted them to participate in every aspect of it. We brought a lot of it on ourselves, where they told us what to do and we did it. We worked with the Pro Services to do it, so we took longer than it probably should have but we knew more about it than we would have as a result. It's a very flexible product, which means it's a very complex product. We had enough servers and monitors that we had to bring up a multi-tiered, large number of TSIMs. It was because of our service models that we introduced a lot of the complexity ourselves.

Because we're pushing full sets of service models out of our CMDB and into TrueSight to use as a service model, we have to put them at a top level of a TSIM so that all the other TSIMs that feed into them can show up as impact models. We went to a three-tiered architecture with presentation on top, a service management infrastructure manager in the middle, and the integration managers below. So a lot of the complexity in our particular configuration was due to the fact that we didn't want to have to figure out where those services belong, or which piece belonged on which TSIM. We wanted to punch them out to the top and then let TrueSight worry about it. So in the long run, it was complex to install but it is much easier to maintain.

The deployment took about three months. There was one person from BMC and about five people altogether. We had DBAs involved and we had the hardware guys involved and the network guys involved. It was probably three people full-time, but off and on. Every department that would touch this thing was involved at some point.

There is a team of five employees and myself who are not only maintaining it but doing all the monitoring configuration - working with users to collect monitoring requirements, setting thresholds and writing custom MRL and PSL.

At the cultural level, it used to be when we first started it up, people would say, "I have my own monitoring tool and I don't need you people. I'll do my thing." Now, they're saying, "You're doing things for these other people, can you help me out?" It's really grown organically, and we've had to put a team together so quickly that there has not been what should have been in place, which is a major deployment plan, where all of the pieces would fall together. We're starting to work on that now.

What about the implementation team?

We worked directly with BMC. We didn't use any third-party.

What's my experience with pricing, setup cost, and licensing?

The only possible additional cost that I can mention, that you might not be aware of, is that it uses Oracle partitioning, if you use Oracle. There are Oracle partitioning fees that go with that.

Which other solutions did I evaluate?

We looked at some other options. BMC has been around a long time. If you look at the industry ratings, it's way up there, top-right quadrant, along with a couple of other solutions. Its flexibility and its capabilities dovetailed with what we wanted to do and we liked their people. They have a good attitude.

What other advice do I have?

My advice is that it's not going to be as easy as you think, but it's going to be worth more than you think when you get it done. It depends on your situation. It depends on how far advanced you are in operations management. For us, this was a complete cultural, technological, and process overhaul. It wasn't just replacing one tool with another. It wasn't just putting a tool in place. It was an entire IT renewal and it's still going on.

It's been a long, hard road, both from a cultural perspective and from a technology perspective, just getting people to realize the value. But once they do, they're willing to bend over backward for you.

We had some false alerts. In my job the red light means it's bad and the green light means it's good. There should be no light you think is green but it's bad. We had some of that at the beginning, more our fault than anybody else's. But once we got to the point where the signals were good and people could appreciate what they are getting, we became a very different organization.

The biggest lesson I've learned from it is that you can talk about it, you can visualize it, you can proselytize about it, but until you have a single pane of glass which is actually up and running with a lot of stuff connected to it, you just can't really appreciate the value of it.

The functionality of the solution is not helping, so much, in terms of business innovation. We're not doing business process monitoring at this point. While it might be that the business is not complaining as much, I don't measure that. But from an innovation perspective, it has had people look at things and say, "Well, if you can do this, can you do that?" We get a lot of requests for strange things, some we can do, some we can't. But it's getting people to think about things that hadn't really come up before.

It's a really good tool and most of the issues we've got, they've either fixed or they're fixing to fix. So a nine out of ten is right.

SrManagef31c

Sr Manager at a tech services company with 1,001-5,000 employees

Jun 19, 2019

It covers so many different technologies which can roll up into a single console

Pros and Cons

"It is breadth. It covers so many different technologies which can roll up into a single console."
"The noise reduction for ticketing works much better than we have seen in a lot of other companies."
"The BMC TrueSight platform wins probably 80 percent of the time if you look feature by feature."

"I definitely would like to see more improvement in the self-diagnostics. I need to know when anything is not working or collecting, long before our customer finds it."
"The SLA reports that we get on TrueSight today are unfortunately worthless."

What is our primary use case?

My company is a data center service provider. We host and manage IT for all types of different companies, using TrueSight to manage and monitor the health performance availability of all our customers' environments: networks, servers, databases, websites, and all their back-end IT.

Right now, the focus is pushing DevOps and AIOps in our more traditional data center management. We are not using it in the cloud space today. Therefore, the focus is the traditional data center space, but for us, that is a very large space.

How has it helped my organization?

One case that we like to use a lot: We have a customer who uses F5 load balancers, and they were managing them with CA products. Those load balancers were generating around 11,000 tickets a month. Just moving them from CA to TrueSight, and replicating the same rules, they went from 11,000 tickets a month to 400 tickets a month. TrueSight did a much better job of doing the same thing. Then from there, we were able to tune it. We got it down to about 40 tickets a month. While this is an extreme example (I don't usually see this type of improvement), it shows the power that is there.

We are able to more quickly identify problems and get an engineer on it to restart services, etc. It is not fixing the customer's bugs. They've got buggy apps, and it goes down all the time. It is just that we can get them back online faster.

What is most valuable?

It is breadth. It covers so many different technologies which can roll up into a single console.
The noise reduction for ticketing works much better than we have seen in a lot of other companies.
We're starting to get into the machine learning pieces to further enhance the intelligence of events.

What needs improvement?

Continue to improve the maturity of the product overall.

I definitely would like to see more improvement in the self-diagnostics. I need to know when anything is not working or collecting, long before our customer finds it.

I would like to see continued improved integration with some of their partners. We use a lot of Intuity software. While the connections are good, they could be better. We use App Visibility, as part of the TrueSight suite. Previously, we were a big BMC TMRT customer. They gave up a lot of TMRT features to get App Visibility in, features that our customers used. They still complain about this weekly: When are we going to get this report or view back?

When we took this issue back to BMC, they said, "It wasn't an upgrade from TMRT. It's a brand new product. It just happens to be serving the same market." From my user standpoint, we went from BMC TMRT to BMC App Visibility, giving up all these features. For us, it was an upgrade that we lost features on. I need that stuff back, at the end of the day, as a service provider. The customers need to feel comfortable that the data is there. They need to have accurate SLA type reports. The SLA reports that we get on TrueSight today are unfortunately worthless. They go to the whole integer. So, they all show 100 percent, when we've got contracts which are 99.996 percent and are now rounding to 100. Well, if we were at .9995, that's an SLA miss. Things like this are a problem. We have to do all this manually on the side. We can't roll this back, as the versions that we used to use are long out of support.
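
The rounding problem is easy to reproduce. The sketch below uses the 99.996 percent contract figure and the .9995 example from the review; that the report rounds half-up to a whole number is an assumption, since the review only says the figures "go to the whole integer".

```python
from decimal import Decimal, ROUND_HALF_UP

SLA_TARGET = Decimal("99.996")  # contract figure quoted in the review
measured = Decimal("99.95")     # i.e. 0.9995 availability, the review's example

# Assumed report behavior: round to a whole percentage.
whole = measured.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

print(f"report shows: {whole}%")
print(f"actual:       {measured}%")
print("SLA met" if measured >= SLA_TARGET else "SLA missed")
```

The whole-number report shows 100 percent while the measured figure is below the contract target, so a miss would go unseen unless the calculation is kept at three decimal places or better.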

The biggest issue is probably the gaps in the reporting that I need for my end customers. It is very public and embarrassing to have to say, "I can't give you the report that you need." Also, the reliability of the ISNs needs improving. Having a customer find a machine that stopped collecting before we do is not what you want when you're a service provider.

For how long have I used the solution?

We have been a BMC client since 2001. We've been through many generations of the product.

What do I think about the stability of the solution?

The stability has a bit more maturing to do. There is still room for improvement. Overall, it's pretty good, depending on which layer you're looking at. At the highest level, which is the presentation server, we find that we have to restart that every two months or so, just because it stops responding. I would like it to be a bit better. We don't have any real understanding of what's causing that. The next layer down is the infrastructure manager level. That's probably about the same, every couple of months it stops responding. As you then go farther down to the data collection layer, the ISN level, those aren't as stable as they need to be. They will go for six months fine, then fail three times in a row in two weeks. It doesn't give us a good alarm, and unfortunately, we've missed an event. Then, the customers notice something, and that didn't pass its events. So, a little more maturity is needed here.

What do I think about the scalability of the solution?

It's scaling fairly nicely, but not as large as we would like. We are not seeing the type of scalability that BMC claims. For example, they say that you can run 900 agents against an ISN. We find the ISN stability goes down when you hit 500 or 600. So, you're only at two-thirds of the capacity. I forget how many millions of things that the TSIM was supposed to be able to handle. We are nowhere near that capacity. We're spinning up more TSIMs because it's just not scaling as advertised.

How are customer service and technical support?

Technical support is a mixed bag. Some tickets go in and are handled very quickly and well. However, we have had tickets which go in and have been out there for months, and some of them were fairly complex. They will go up to Tier 2 or Tier 3, then park. I'm assuming that we're running into a software bug, or something, but those tickets that stall out are frustrating.

How was the initial setup?

It was complex. I wish we had put Professional Services into the deal. Being a service provider, we are attached to companies all over the world with very strict auditing and security requirements. Therefore, designing the architecture to work in that environment was fairly complex. I was just talking to a product owner about the problems that we still have.

Once we got the architecture, the deployment went fairly smoothly. The policy creation and management were much more complex than in their previous products. It is probably more powerful, but not as easy to administer.

They have rolled things, which were multiple products separately in the past, into a single product. They've had to do some consolidation, or adjustments, to be able to merge them quickly to get their product to ship. This left some things missing. Some features that used to be there are gone. Features that we used to use. So, there are pain points, as we figure out how to work around the new gaps.

What about the implementation team?

We did it ourselves.

Globally, I've got six engineers and 12 operators who worked on the deployment. This is a sizable group. However, I'm currently supporting global operations of a couple hundred clients, and they're major clients.

What was our ROI?

TrueSight has helped reduce IT operations costs. From a software standpoint, I have been able to eliminate a lot of other tools, saving approximately half a million dollars a year in other maintenance costs. That is easy savings. The more important one is the labor savings: more reliable, simplified tickets.

The time savings are recognized by the operations teams, not my team. Therefore, it's hard to know the time savings, but if an operations person takes at least 15 minutes to analyze a ticket and their ticket volume is reduced by 10,000 a month, then TrueSight does save time.
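
Working through the review's own numbers gives a rough sense of scale. Both inputs come from the paragraph above; the result is a back-of-the-envelope estimate, not a measured saving.

```python
MINUTES_PER_TICKET = 15   # "at least 15 minutes" per ticket, per the review
TICKETS_AVOIDED = 10_000  # monthly reduction used in the review's example

hours_saved = MINUTES_PER_TICKET * TICKETS_AVOIDED / 60
print(f"{hours_saved:,.0f} operator hours per month")
```

That works out to 2,500 operator hours per month at the stated minimum handling time.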

We've been reducing ticket noise five to ten percent every year, and it has been cumulative. This means fewer tickets, less noise, and less operator intervention.

What's my experience with pricing, setup cost, and licensing?

It is a large, complex product. So, there is a commitment of manpower to deploy it, as it is not a cheap product.

We license per named endpoint for most of the products: servers, network devices, databases, etc. You pay for the initial license and maintenance. The way that my company looks at it is we figure out our monthly costs over five years, and right now, we are between five and six dollars. We need to get that down to about four dollars. That's included in the maintenance.

There is a big upfront cost when you buy the license, then there is annual maintenance. We look at, if I bought a license and paid for maintenance for five years, then average it out, what would be my monthly cost. We have had some of the competing tools come in around four dollars. This is coming in as a premium, which is why I don't have it deployed as I would like it. Therefore, we're in negotiations right now. If I can get it down to the four dollar range, I will triple my deployment in a year and a half. If they could get me to the right price point, there are 10,000 to 15,000 servers that I would install it on.
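
The five-year averaging described above can be written as a small calculation. The license and maintenance prices below are hypothetical, picked only so the result falls in the five-to-six-dollar range the review mentions; whether maintenance is charged in every one of the five years is also an assumption.

```python
def monthly_cost_per_endpoint(license_fee, annual_maintenance, years=5):
    """Spread the upfront license plus yearly maintenance evenly over the term."""
    total = license_fee + annual_maintenance * years
    return total / (years * 12)


# Hypothetical per-endpoint prices, not actual BMC pricing.
cost = monthly_cost_per_endpoint(license_fee=180.0, annual_maintenance=30.0)
print(f"${cost:.2f} per endpoint per month")
```

The same function makes it easy to see how far the license or maintenance figure would have to move to reach a four-dollar target.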

Which other solutions did I evaluate?

As we've acquired other companies, we've picked up pretty much every other tool set out there: CA, IBM, SolarWinds, etc. We have played with pretty much everything. The BMC TrueSight platform wins probably 80 percent of the time if you look feature by feature. It's a good, strong platform. Its ability to run on all the OSs that I've got is a huge thing. We do a lot with IBM iSeries, and a lot of vendors don't cover that. So, this is a big positive on the platform.

Being able to roll everything up to a single database and having a single feed out for reporting are both very big positives. The same type of consolidation rules that didn't work in CA just work when you write them in BMC. Things like that make BMC great.

What other advice do I have?

You really want to plan out your policy and architecture in great detail before you start any deployments. It is a complex product. You don't want to have to go redo it. Pick a small environment, test out your plan, test it out a second time, beat it up, and once you're happy with it, then go nuts by deploying it everywhere. It's great once it's there, you just have to get past that design hurdle, because there are things that aren't necessarily intuitive.

I have a mixed bag impression of the usability. The end user experience is mostly good, as it's a very clean interface. There are some quibbles with it. You have to drill into a lot of layers to get into the data that you want. However, when you hit "Back", it takes you all the way back out of the tree. Then, you have to redrill into all those layers. That is a bit of an annoyance for end users. From an administration side, it is still sort of heavy, and policies are very complex. Therefore, it takes a fairly senior level engineer to build it and get it to work well. But, once it's working well, I can monitor tens of thousands of things.

Definitely get multiple references from each of the clients, since all salesmen lie. They all promise the best possible scenario, and I have found that, depending on the client, you get very different experiences. So, the claims that the BMC sales guys have made are all achievable in a perfect environment. No one has a perfect environment.

Claims from CA, I have found to be outright fabrications, such as, "We can do this." Then, we buy the product. "Oh well, you actually need Professional Services, and you're going to need like three years of custom coding." Millions of dollars down the drain with them.

Other vendors have different levels. They all come in very rosy, and sometimes too much. So, talk to people who have really done it. Take their advice. Don't assume that they didn't know what they were doing. There are a lot of good engineers out there. If the company is struggling, assume you will also struggle.

John_Rooney

Vice President of Managed Services at Park Place Technologies

May 23, 2019


Enables us to proactively service our customers and even warn them about problems before they occur

Pros and Cons

"The fact that they have a very integrated relationship with Sentry Software, the Knowledge Module, is valuable... The richest feature for us is the number of Knowledge Modules that we can load into the product to add breadth of service to the customer. It enables us to move up the operational stack from hardware, to operating system, to application, and to cloud... That enables us to provide one pane of glass over all those layers - hardware, OS, app, and cloud."
"It has provided more value than we expected and it does what it says it's going to do."

"Reporting would be an area for improvement in TrueSight... We have almost 800 customers today on TrueSight and just under 10,000 assets. We need to be able to give a customer some information. If the customer's product fails, they'll ask us, "Did it have a problem beforehand?" We have all those events and we know all the problems it had beforehand. We have to be able to give them access to that kind of reporting. That's an enhancement that we need."
"Reporting would be an area for improvement in TrueSight."

What is our primary use case?

Park Place Technologies brought in TrueSight for three reasons. The first reason was the Presentation Server - the architecture.

The second reason was the fact it has the AIOps piece.

The third reason was their partner, called Sentry Software, out of France. We are a hardware maintenance company. We're probably one of the largest providers, worldwide, for replacing drives and storage equipment. We brought TrueSight in as a means of seeing if we could reduce the number of physical touches on a service ticket from eight to two. We've been accomplishing that with TrueSight and the Sentry software.

We provide post-warranty support for storage equipment and data center equipment. For example, if it's a VNX piece of storage gear that goes off warranty, we come in and we maintain it at a high level off of what the customer paid the OEM. We do the parts and the service in 35,000 data centers worldwide. TrueSight is enabling us to get that done in an automated fashion.

Sentry is the Knowledge Module we use in TrueSight. It has all the information about the storage equipment that we maintain. It tells us the part, the chassis, serial number, and all the detail that we spend a lot of time on phone calls with the customer trying to ascertain. We're doing that automatically now.

How has it helped my organization?

We brought the product in to handle the following: We're in 35,000 data centers today. We have 16,000 customers and we support about 400,000 assets. Those are big numbers. The pieces of storage equipment we provide have something native from the equipment manufacturers, the OEM, called "phone home." What happens is, when these devices start having a problem they send out an email that says, "I'm having this problem." To put that into perspective, we were trending towards 2,000,000 emails at the end of 2017, and growing. We would have to read 2,000,000 emails to find out what was going on. Something lower than seven percent actually had a problem we really had to read, and something well below one percent of those were actually a service event.
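
To put the funnel above in numbers, here is a quick sketch that treats the quoted percentages as upper bounds. The constant names are illustrative, and the results are ceilings rather than measured counts.

```python
TOTAL_EMAILS = 2_000_000  # "phone home" email volume quoted in the review
PROBLEM_PCT = 7           # upper bound: "lower than seven percent"
SERVICE_PCT = 1           # upper bound: "well below one percent of those"

# Integer arithmetic keeps the bounds exact.
problem_emails_max = TOTAL_EMAILS * PROBLEM_PCT // 100
service_events_max = problem_emails_max * SERVICE_PCT // 100

print(f"Emails describing a real problem: fewer than {problem_emails_max:,}")
print(f"Actual service events: fewer than {service_events_max:,}")
```

In other words, nearly all of the email volume was noise that someone still had to read.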

Before we brought in TrueSight, there were 8.2 touches via email or phone call after the ticket had come in, including exchanging log files with the customer through to our resolving it. And on the customer side, they had somebody having to look at the equipment to make sure it was actually working. From those 8.2 physical connections with them, we're down to two with TrueSight.

And here's the big difference. Instead of these things sending all of that information out in those emails, it's captured in the Knowledge Module, the policy and the agent, on the customer side of the firewall. What TrueSight does is that when it installs it takes a week to come up with what's called a dynamic baseline. It says, "For this piece of equipment in your environment, these are the key performance indicators that we're going to watch for." We can see events live when they happen. There are predictive and proactive warnings of failures or potential problems. But all that we ever get, the only thing that's communicated to us, is when there's a failure. So we can see all the chatter and we can look at that by customer, but we don't really need to. And if it's a predictive event, it will send us a notice saying, "We think this part's going to fail in two weeks," and we can help that customer.
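
The dynamic baseline idea can be illustrated with a generic sketch. This is not BMC's algorithm, which the review does not describe. It assumes a simple per-hour mean and standard deviation learned from one week of samples, and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(samples):
    """Learn (mean, stdev) per hour of day from (hour, value) samples."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def is_abnormal(baseline, hour, value, k=3.0):
    """True when value is more than k standard deviations from that hour's mean."""
    if hour not in baseline:
        return False  # no history for this hour, so do not alert
    mu, sigma = baseline[hour]
    # If sigma is 0 (perfectly flat history), any deviation counts as abnormal.
    return abs(value - mu) > k * sigma


# One invented week of hourly readings for a single KPI (e.g. a temperature).
week = [(hour, 40 + day) for day in range(7) for hour in range(24)]
baseline = build_baseline(week)

print(is_abnormal(baseline, 10, 44))  # within the learned range
print(is_abnormal(baseline, 10, 60))  # far outside the learned range
```

The point of learning the range per device is that the same reading can be normal in one environment and a warning sign in another.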

But ultimately, what we get is a service ticket: "Failed part at this location. Here's the part number, the serial number, and the recommended remediation." That comes into our support center.

Eventually, when we have it all set up the way we envision it, the info will come into the support center and a ticket will be created and it will automatically connect to the tech and the tech will reach out to the customer. We haven't turned that on yet.

Right now, it comes in and we read it. We call the customer and say, "You have a failure." In most cases, the customer didn't know they had it yet, because it's that fast. We call them up and say, "You have a problem. We have the part, and when would you like Larry to come on site?" Because it's storage, they have to schedule downtime. Then we go out on site, we fix it, and we're done. So it's two physical touches now: We call them and they say, "Yes, it's completed."

So 2,000,000 emails have gone away, pretty much, and it all gets done at the customer site. What we see now, instead, is a couple of hundred or 1,000 service events, versus millions of emails. And we have the right part, the right chassis, the right location.

In our industry, there is about a 75 to 78 percent first-time fix rate, meaning repair personnel do not have to go back to a given site within a week. As a company, we were at about an 86 percent first-time fix rate. With TrueSight, we've never gone below 98 percent.

It's all done with software. I read all of the service emails from our customers. Customers are used to finding a log file and talking to our expert - and if a customer has five different pieces of equipment, there are five different experts involved. Now, they send a note in and they'll say, "This is resolved. I just want to make sure this process is working the way it's supposed to. I didn't call anybody. You called me to tell me I had a problem that I wasn't quite aware of. Now, I have a part, it's fixed, and we're good. Is that how it's supposed to work?" It's funny, because they were used to eight different interactions with us, as opposed to two. It's really cool.

It's taking an extremely manual process and, with the AI piece, literally helping us make better decisions. It's what AI is all about. It's really amazing. I'm excited about it because now, instead of our support center people trying to find the right part, they're calling the customer and saying, "By the way, you have a problem. We have a solution for you, and we notice in the same cluster you may have a failure in a week. Would you like us to look at that while we're there?" It's predictive, proactive maintenance. That is what it enables us to do, versus reactive.

Today, when we are proactive, it's for a fan or it's heat or it's a battery. We get notice they are about to fail and they fail pretty quickly thereafter. But when we start getting to operating systems, there are days, as you know, when you have gone on to your computer and it's been slow. On those days of the month, you can probably look in your network and find that there was a big push to get something done. With TrueSight, we'll be able to start proactively predicting these events before they happen, and rerouting the customer so they don't notice a slowdown. Our tagline is all about uptime. TrueSight helps us deliver that. It helps us deliver upfront.

What is most valuable?

The fact that they have a very integrated relationship with Sentry Software, the Knowledge Module, is valuable. We have one Knowledge Module that we're using today, which is the Sentry KM. We're bringing on the operating system Knowledge Module. The richest feature for us is the number of Knowledge Modules that we can load into the product to add breadth of service to the customer. It enables us to move up the operational stack from hardware, to operating system, to application, and to cloud. It's one presentation layer, one path with these Knowledge Modules, which we can add to it to get greater breadth.

That enables Park Place to provide one pane of glass over all those layers - hardware, OS, app, and cloud - which gives us a really good opportunity with the AIOps piece to get root cause analysis. And that's what our customers want: one pane of glass and a detailed root cause. If you've ever been in a data center when something goes wrong, the first thing they ask is, "What happened? What went wrong? Why did it break?"

It's the Knowledge Module which is the biggest feature that benefits us.

What needs improvement?

Reporting would be an area for improvement in TrueSight. In its purest form, TrueSight is an enterprise product, meaning one company would run it in its internal data centers and internal IT organization. But our company is more of a managed-service provider. We have almost 800 customers today on TrueSight and just under 10,000 assets. We need to be able to give a customer some information. If the customer's product fails, they'll ask us, "Did it have a problem beforehand?" We have all those events and we know all the problems it had beforehand. We have to be able to give them access to that kind of reporting. That's an enhancement that we need.

For how long have I used the solution?

We white label TrueSight, but it's TrueSight at its core and we've had it installed here for just under three years. Version 10.7 is our production instance and we're using version 11.3 in Azure. We're moving to a cloud platform and we're doing that with 11.3. I was hired about 15 months ago.

What do I think about the stability of the solution?

The stability of TrueSight, in its natural form, is very good. We had stability issues with it because we were doing things that were outside of that normal boundary. We were bringing in way too much information. We didn't know how to filter it. Once we got the filtering in place, it became very stable.

In our six- to nine-month process of doing the proofs of concept, when we got to that ninth month we were bringing on as many customers as we could and we were getting everything we could possibly get from all of them. It took us about three months to tune that down, with BMC's help. The product was always stable before that. The product itself didn't fail. We just overwhelmed it. If you talk about data lakes, we flooded the lake every day. And it didn't stop. We just kept bringing more stuff in. Once we added the filters and tuned those valves, it stayed up and has been running really well.

What do I think about the scalability of the solution?

Our focus is to get 16,000 customers in TrueSight. We're walking up that scale every day. Once we figured the filtering out, we started getting the scalability. Prior to that, we were going the wrong way on scalability. But the elasticity, the ability to scale, seems to be there.

How are customer service and technical support?

On a scale from one to five, with five being the best, I would give BMC technical support four-and-a-half. It's not a five because the reporting piece is still missing. We need the reporting piece. They can't give us all the help, because that help is just not fully there.

Which solution did I use previously and why did I switch?

We had our own homegrown system - an email box - that the stuff came into. That's all we had. We did not use a competing product.

We went down an RFP path over three years ago. Our company has grown pretty dramatically. Between 2015 and two weeks ago, we made 14 acquisitions. There was no way we could grow the business mechanically with an 8.2-touch model in place. The support centers would be the biggest expense in the company.

It was a two-pronged approach to looking at resolving the opportunity that our growth created. The first approach was the customer, to give them quality of service: not having to get log files, not having to figure out what's going on at their end, and not having to call us.

The second approach, for the Park Place support center, was to give them better tools to provide better service to the customer. We wanted our support center to go from trying to figure out what the right part was, to letting the customer know they were about to have a problem. That's a big difference. Both of these approaches have happened.

If you put that around the world with our growth, we now have a global approach with regional focus and local delivery. Because the systems are reporting the information, we don't worry about time zones or language. All that stuff goes away. The machines speak MIB, and the MIB communicates through TrueSight, and we get the information. We don't have to speak the local language until we go out and fix the problem, because the customer is not calling the support center anymore. We have a global footprint with a regional focus. In APAC, they're looking at problems that could be happening overnight in the US and vice-versa, or in EMEA. The problem is resolved, the customer is communicated with, and the person providing that communication speaks the local language.

The machines are literally running this thing, and all we are is the delivery model. TrueSight crosses all those barriers. It crosses time zones, it crosses language; it has all the pieces we need to know about repair, including the part and the location. It knows everything we need to know about the equipment, all the software, the LPARs, etc. It gives that to us in the support center, we contact the customer, and then we speak the local language and we bring the part locally.

How was the initial setup?

It's very straightforward from a setup perspective. We were able to install it and get it running relatively quickly. That's not the hard part.

The complexity comes in because, instead of it being what I would call an off-the-shelf product, TrueSight is a series of products with an encyclopedia of tools and they all add benefit. But getting those tools to work, that's where the complexity is; knowing exactly which piece to pull and to connect. An example would be putting filters in place. That took us a while.

If you look at an average installation, it takes three to six months to get up and running. We got up way faster than that, but it has taken us about a year to get the engine to run at the capacity it's capable of. It's like gas mileage, where you have to drive it properly to get the right gas mileage. That has taken us some time to do. But once we got there, we have certainly been getting everything that's promised.

Park Place was up and running within a month to two months. Our production product was probably nine months out. That's when we started figuring out the filtering. We brought everything in and opened all of the spigots up, and we had all this volume coming in. With BMC's help - they were very helpful in this capacity - we were able to turn the valves to the proper flow, so we weren't flooding the thing every day.

Our implementation strategy was to put it up in a proof of concept first in a DevOps environment because our goal was to bring it out to customers. Once we got it into production, we started bringing customers on as PoCs. We did about six months' worth of bringing on the customers, making sure we could bring it out and get its sea legs. Then we started deploying customers as fast as we could. And that's when we went from 10.5 to 10.7, and now we're moving to the new platform with 11.3.

What about the implementation team?

We installed it ourselves, but with BMC's help. We did it ourselves with them looking over our shoulder.

To get to 10.7 and 11.3, their services team, the Premier Support, created a "cookbook" for us to do that migration. That was extremely helpful.

And from a consulting perspective, as part of the Premier Support, we were able to get the right consultants in to help us fine-tune that motor. They would come in and look at it and say, "By the way, you can filter this stuff out because you're not actually using it." I liken it to our cell phones when our data plans are out of whack with our use. We pay way more than we need for our minutes and a consultant comes in and says, "You can do this, this, and this, and be more efficient." They've been very helpful there.

What was our ROI?

As an example I looked at recently, we had a customer that was doing 27,000 emails a month. That would mean that if we spent 30 seconds reading each email, it would total 2,700 hours per year just reading emails. And that's not solving the issues. That works out to 1.3 FTEs just reading the emails for that customer. Suppose all-in, in the US, we're paying FTEs $72,000 to $75,000 just to read emails. Our license fees are certainly less than that for that one customer.
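
The arithmetic behind that example can be reproduced in a few lines. The one assumption not stated in the review is the length of an FTE year, taken here as 2,080 hours (40 hours a week for 52 weeks).

```python
EMAILS_PER_MONTH = 27_000
SECONDS_PER_EMAIL = 30
FTE_HOURS_PER_YEAR = 2_080  # assumption: 40 h/week * 52 weeks

# Yearly reading time in hours, then the equivalent number of full-time staff.
hours_per_year = EMAILS_PER_MONTH * 12 * SECONDS_PER_EMAIL / 3600
ftes = hours_per_year / FTE_HOURS_PER_YEAR

print(f"{hours_per_year:,.0f} hours per year, about {ftes:.1f} FTEs")
```

With that assumption the numbers match the 2,700 hours and 1.3 FTEs quoted above.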

In terms of ROI, we haven't fully gotten there yet. We've reduced those 2,000,000 emails by 42 percent, so far. We haven't gotten them all done yet. But who do you think is on our list to get moved over to this solution? That customer with 27,000 emails a month, we're going to move them over as fast as we can.

Our ROI is to get people off the old, manual system with 8.2 touches and down to two touches. Once we start hitting critical mass, the product will certainly pay for itself in a very reasonable period of time.

What's my experience with pricing, setup cost, and licensing?

We pay license fees of between $150 and $200 per asset.

In terms of the product's pricing, we don't pay per item and it's not crazy. It's cost-effective enough for us to offer it for free on storage, and we've got some 4,000 storage assets using the product every day.

We bought a large block of licenses. Interestingly enough, we provide TrueSight for free for our storage customers. We thought it was that important, to give them the licenses for the Knowledge Module and the policy. We do charge for network and we do charge for servers.

There is an enterprise software license fee, and then you pay a percentage for your maintenance, and then Premier Support. For example, if you buy a two-year license for the product, then the maintenance fee is added to that for two years at X percent a year. Then there's a small fee on top of that for Premier Support, which I would highly recommend to a company. Standard support gives you normal support processes, while Premier Support is 24/7. It's at a much higher level of support. For a production environment, I would strongly recommend it. In comparison to the extra cost, the value of Premier Support is very worthwhile.

Which other solutions did I evaluate?

We did an RFP and looked at seven products. Although I wasn't here when the company did that, I know they looked at Nagios.

One of the two key reasons for choosing TrueSight was the AI piece, the artificial intelligence; that's the promise of the future for us. We get some of it today. We have the predictive and proactive parts today, but we're going to grow that as we grow up that stack to go to OS, and application, and cloud, to get more AI value.

And the other one was the Knowledge Module relationship they have with Sentry Software. We're in storage hardware. That's the number one product out in the market. Sentry is a partner with BMC and that has been the lifeblood of our whole "global, regional, local" approach.

What other advice do I have?

The advice I would give is not to make a mistake and think it's an off-the-shelf product like Office 365. Understand that it's a very robust set of tools and procedures. You really should define what you want to do with it before you bring it in the door. If you had asked us before we brought it in, we had an idea, but we didn't know exactly how we wanted to utilize it and that was because we didn't know the capabilities of it. We thought we could do X, and we found out what we really needed to do was Y. It was that gap that we had to fill, and that took us time. So the better you can define your requirements, the quicker you'll gain the true value with your outcome.

Believe me, we're seeing true value. But if we had had a better definition of what we needed up front... We thought we had all the information in the RFP but we probably didn't. I'm not sure you ever can do that, but do a good job of architecting the scope or the spec of what you're trying to do and then get their input. They can give you that information and that's when you get your true value. When those two things meet, you get the value prop.

Working with BMC has been interesting. It's been very helpful. They're part of our team, which is great. They bring their partners to the table. Their partners don't have an agenda. Everything that we get done is literally for us as the end-user and for our customers. I've not had that often before with software companies.

They invest in customer satisfaction to the point that we've asked them to implement some things that are a little bit beyond the normal scope of TrueSight. We're using it for 800 customers in an instance of TrueSight, where it really should be one TrueSight for one customer. And they've helped us make all that work, arm-in-arm.

With Sentry it has been a team effort. Sometimes we don't know who on the call is not on our team. We're all having the same conversation, and it's not a situation where "BMC said," or "Sentry said," or "we said." It's one common unit. We had a call yesterday about architecture and making that whole piece work. I said to their architect, "Gee, you know I really like that document you put together." He said, "John, you can use any piece of that you possibly want. Go ahead and take it and do anything you need to do with it, make it work your way." That doesn't happen very often, where someone is building their own thing and they come back to you and say, "Yeah, you can use it any way you want. Just make sure it works for you."

We have 11 people who are installing agents and policies at our customers' sites. Their job is the implementation with our customers.

In terms of people actually running TrueSight within the company and our IT infrastructure, we have parts of a couple of people. It's a part of their job. It's almost like shift work. We have a part of a full-time person on a daily basis engaged with TrueSight care and feeding. Running the product requires less than two people, all-in.

We will be hiring a new person to be a TrueSight architect, because we're bringing on more of those KMs and we need somebody who can help us do the rules management. They're not going to be here running the product, they're going to be adding new features.

Overall, I would give the product a very solid nine. If I had the reporting piece, I would give it a ten. It has provided more value than we expected and it does what it says it's going to do. You can't ask for more from a product than that.