We use ITSI in the health industry. In the UK, the NHS currently uses ITSI as one of its monitoring sources of information. In ITSI, service components are based around each area of the NHS. For any solutions that have been digitally transformed and require monitoring related to our vaccination campaigns, the logs are ingested through Splunk and monitored through ITSI.
We realized ITSI's benefits immediately after it was deployed. When the COVID pandemic broke out, it kicked off a lot of crazy stuff within the UK. Having a powerful tool to aggregate data and allow real-time monitoring helped our campaign.
ITSI can help us right-size resources, but it depends on how you do things. We have our own culture, and Splunk told us not to do it that way because they have different methods. In ITSM, you skim what you need at the source and then push that into Splunk. Having Splunk as the centralized log analytics platform is great for that, especially when so many things are tied to ingestion, storage, etc. However, for what we do, it leaves much to be desired. You're talking about an enterprise solution on the scale of the NHS with multiple people, contractors, and all these moving parts. Some services do it well and only send in what you need. Some services just dump everything, so you've got a load of logs. We can right-size appropriately, but I don't think it's really done well for us right now.
ITSI has helped us streamline our incident management. We have a 24/7 service team working around the clock, responding to alerts that Splunk produces. It's linked to ServiceNow, our service management tool. When the team inputs all the information from Splunk into these tickets, they're raised in ServiceNow. Previously, we used software called Cherwell that looked horrendous. This helps bring the package together.
We've reduced our alerts, but it requires a conscious effort to configure them. That depends on how you use the platform. It goes back to getting the right metrics out of the logs that you're producing. The tool itself is powerful, but if you don't use it properly, things can be a bit noisy. Ours is quite noisy, though that's sometimes down to our configuration.
Reducing alert noise also takes some tweaking. You've got KPIs and correlation searches that are great for real-time monitoring, but if you set them up immediately, you will get a lot of noise anyway. It depends on how you configure it. They have a couple of tools in the forwarders to say you're only ingesting alert logs or error logs, so you pick up on whatever those error logs would trigger.
Filtering at the forwarder like that helps make your ITSI alerts more accurate. However, it can get a bit noisy if you ingest more than that and it isn't configured for the exact use case you need. Overall, it's been a conscious effort to ensure we've got our stuff configured right.
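As a rough illustration of skimming at the source, here is a minimal Python sketch that keeps only error-level lines before anything is forwarded. It is not Splunk's forwarder configuration. The log format, the `FORWARDED_LEVELS` set, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical severity levels worth forwarding; adjust to your own logging scheme.
FORWARDED_LEVELS = {"ERROR", "CRITICAL", "ALERT"}

# Assumed line format: "<timestamp> <LEVEL> <message>".
LEVEL_PATTERN = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\b")


def should_forward(line: str) -> bool:
    """Return True if the line's severity is one we want to send on."""
    match = LEVEL_PATTERN.match(line)
    return bool(match) and match.group("level") in FORWARDED_LEVELS


def filter_for_forwarding(lines):
    """Keep only the lines that should be sent on for indexing."""
    return [line for line in lines if should_forward(line)]
```

A service that runs something like this before shipping logs sends far less volume, which is the difference between the services that only send what you need and the ones that dump everything.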
It has reduced our mean time to detect. For Microsoft/CrowdStrike stuff, we can have an SLA as short as three minutes. The feeds are coming in quickly, so our detection time is between three and 10 minutes. For major outages, an SLA of a few minutes is good, especially when it's not a cyber-level threat.
The resolution time is determined by how quickly we can pass the detection along to the IT team and triage the logs to determine the issue. We've had quite quick resolutions because everything's partitioned in a way where it is specifically service-bound. You can look through the data and specific areas. You can optimize these things. The search system in Splunk is powerful and helps speed up resolutions.
ITSI helps to automate routine tasks. That's what the saved searches are for. It's a complete package with Splunk Cloud and ITSI for deeper drill-downs, but not everyone can access the ITSI dashboard all day. Automation helps us get these alerts structured, especially at night. When you've got a file that's meant to come in at 3 a.m., you don't need someone waiting around to look at that.
This is what those alerts and automation are for. You can put custom wrappers around stuff. It's a custom output. However, Splunk is trying to make something more standardized at the moment. It saves our IT services multiple hours a week because you don't have to do tasks or sit and look through dashboards to ensure everything is all right. These constant checks every five minutes add up over the week, so that equals tens of hours a week for a lot of different services.
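The 3 a.m. file check is the kind of thing that is easy to automate. Below is a minimal Python sketch of the logic, not our actual saved search. The function name and the 10-minute grace window are assumptions chosen for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def file_is_overdue(
    expected_at: datetime,
    arrived_at: Optional[datetime],
    now: datetime,
    grace: timedelta = timedelta(minutes=10),  # assumed grace window
) -> bool:
    """Return True when the file has not arrived and the grace window has passed."""
    if arrived_at is not None:
        # The file is here, so there is nothing to alert on.
        return False
    return now > expected_at + grace
```

Run on a schedule, a check like this raises an alert only when the file is actually missing, so nobody has to sit and watch a dashboard overnight.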
ITSI's KPI and correlation search aspects are powerful, and the service creation suits the project well. It allows for good segregation of the monitoring solution and up-to-date, near-real-time monitoring. We're notified quickly when something goes wrong.
The end-to-end visibility is excellent. A lot of the information we get is from the cloud, and the data pipelines we introduce have a clear log trail, so it's easy to pinpoint where it goes wrong.
The UI could be updated. Some elements of the KPI section aren't where you'd expect. It looks like a website from 2010 or maybe older. You can't change some things, like if it doesn't word-wrap well. For example, if you have a long list of KPIs that exceed a character limit, you need to hover over them and wait for the HTML text to pop up to see which KPI it is.
Packaging synthetic monitoring in ITSI would be good. I'd also like a complete package for doing health checks. It would also be nice if Splunk standardized the add-ons. Splunk relies on these add-ons that users build. It's like the App Store. People put time and effort into these custom things, and if they get big enough, Splunk will purchase them and take them over.
For example, we have a custom Slack output. It'd be good if they put some effort into stuff like that because it's useful. Instead, we're putting custom wrappers around stuff, but why isn't this a thing produced by this massive platform that costs so much? They recently partnered with Cisco and don't have any plans to improve ITSI in that area. It feels like they could do more.
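To give an idea of what a custom Slack output involves, here is a minimal Python sketch of a wrapper that formats an alert and posts it to a Slack incoming webhook. It is not our actual wrapper. The function names, the message layout, and the example service and KPI names are hypothetical. The only thing relied on from Slack is that an incoming webhook accepts a JSON body with a `text` field.

```python
import json
import urllib.request


def build_slack_payload(service: str, kpi: str, severity: str, detail: str) -> dict:
    """Format an alert as a Slack incoming-webhook payload (message layout is an assumption)."""
    text = f"[{severity.upper()}] {service} / {kpi}: {detail}"
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload to the given webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `build_slack_payload("vaccination-feed", "file_arrival", "critical", "3 a.m. file missing")` produces the message body, and `post_to_slack` sends it to whatever webhook URL you have configured.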
I have used Splunk ITSI for two and a half years.
Splunk ITSI is generally stable. It's the system that has problems. When we have problems, we escalate them to a higher authority, who sorts everything out. We've only experienced two big glitches, where the product and indexes weren't performing as they needed to.
ITSI is quite scalable. When we have problems, we can discuss them with our Splunk case manager at biweekly meetings. We might need to add some more indexing capability. With the team's support, it's easy to add new indexes and scale up.
I rate Splunk support five out of 10. The support quality leaves much to be desired because ITSI support can be outsourced. If you're dealing with regulations that limit data access to people and entities within the country, outsourced support can cause problems. We've had a couple of calls outsourced to India, and they couldn't access the data because they weren't in the UK.
When we've received local support from professional services, they've been helpful. However, sometimes we've asked a few questions and it didn't feel like we got a real answer, or the answer was that we essentially had to solve the issue ourselves.
I've used New Relic and Dynatrace. They have good visualizations and use similar processing languages. However, you can get locked into Splunk because other competitors aren't as powerful. Though Splunk is expensive, it's a powerful platform.
Splunk ITSI is an expensive solution. Splunk probably doesn't save us money because it's one of the most expensive monitoring solutions on the market. This isn't a tool to save money. You purchase this to improve the efficacy of your service department. This is especially true now that Cisco has acquired them. Cisco is notorious for its high prices.
There's another tool called LogicMonitor that has better metrics and observability, but we found that it isn't as powerful as Splunk. We're heavily in favor of Splunk.
I rate Splunk ITSI nine out of 10 and would recommend it, depending on the use case. If someone wants to switch, it comes down to a financial decision. You need to compare your current platform's capabilities to what Splunk can offer you. If it's a perfect match, then I would say go for it.
Sometimes, there's a steep learning curve, but you get out of it what you put into it. The visualizations are great, and the ITSI search function enables you to narrow down log analytics well.