What is our primary use case?
We get our customers' requirements and onboard their logs into the SIEM tool using agent-based integration or the DB Connect method. After the integration, we write the use cases. There are two types of data: fault monitoring and performance monitoring. In fault monitoring, the customer typically wants every event as an alert, so we write a correlation search for each of those alerts.
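For the agent-based method, onboarding typically comes down to a monitor stanza in the forwarder's inputs.conf. A minimal sketch, where the log path, index, and sourcetype are illustrative placeholders:

    # Monitor a customer application log and route it to a dedicated index
    [monitor:///var/log/app/app.log]
    index = app_logs
    sourcetype = app:log
    disabled = 0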
We add fields to the alerts, such as summaries and descriptions, and write regular expressions to extract values from the raw events and display them in a table. After writing the correlation search, we enable the policy that triggers an incident in the ITSI tool. Our Splunk deployment includes a technical add-on for Remedy that we use to create a ticket when a correlation search raises an alert.
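A minimal sketch of such a correlation search, with a hypothetical index, sourcetype, and extraction pattern; the rex command pulls fields out of the raw event so they can be shown in a table:

    index=app_logs sourcetype=app:log log_level=ERROR
    | rex field=_raw "job_id=(?<job_id>\d+)\s+msg=\"(?<description>[^\"]+)\""
    | eval summary="Application error on ".host
    | table _time host job_id summary description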
After writing the NEAP policy, we display the number of tickets and all the related information in a single dashboard. The first panel shows a summary of all the applications, with the number of tickets broken down by severity. The second panel displays the alert information, such as the ID, reported date, and host.
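The first panel can be driven by a simple aggregation along these lines; itsi_tracked_alerts is the index ITSI uses for notable events, and the application field name is an assumption:

    index=itsi_tracked_alerts earliest=-24h
    | stats count as tickets by application, severity
    | sort - tickets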
We have a team of four people. Two integrate the log sources into Splunk, while the other two write correlation searches, enable the NEAP policies, and generate incident tickets with the ITSI tool. They also work on the dashboards, tables, and the service analyzer.
How has it helped my organization?
With Splunk ITSI, we don't need to manually raise tickets for analysis. For example, if we receive an alert about a suspicious event that meets a set of conditions but turns out to be invalid, no ticket is triggered. Based on a NEAP policy, an incident is created for each valid alert with the help of our ITSM tool. Each NEAP policy has two components: filtering criteria and action rules. The filtering criteria define matching sections; if the alert source equals application log monitoring, the policy groups that particular event.
Events are grouped into incidents based on the job ID, and we write conditions in the action rules. The incident ticket remains in progress if the event count exceeds one and the status is not closed, so a second incident is never created for the same job name.
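As a rough sketch of that duplicate check, a search like the following would surface job names that already have an open incident (the source and field names are assumptions):

    index=itsi_tracked_alerts source="application log monitoring"
    | stats count latest(status) as status by job_name
    | where count > 1 AND status!="closed"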
Splunk ITSI helps customers reduce their resource requirements. For example, they don't need extra staff to raise manual incidents for each alert. The solution enables us to raise incidents only for valid alerts, and it displays them all in a single dashboard.
It doesn't affect the effectiveness of application monitoring, but it decreases the resources and associated costs. It improves performance compared to raising incidents manually and reduces human error.
ITSI reduced the time needed to create a ticket. Instead of raising a manual ticket, we can automatically create one after an alert is triggered based on our policy. We can see all the incidents and alerts on the dashboard.
It has also reduced the volume of incident alerts. We sometimes raised a manual ticket for the same alert triggered a day or a few days earlier; if the original ticket was not closed, raising another incident was human error. Now we can write a condition so that an alert with the same name produces no new ticket. Instead, it updates the existing ticket to "in progress" or escalates the severity from minor to major.
We can also reduce our alert noise using ITSI by writing specific correlation searches with exact conditions based on customer requirements. For example, we use Windows event ID 4625 for failed login attempts. If the customer wants, we can narrow the search criteria so that only this event ID triggers an alert.
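A sketch of such a scoped correlation search; the index, sourcetype, and failure threshold are assumptions to be adapted per customer:

    index=wineventlog sourcetype="WinEventLog:Security" EventCode=4625
    | stats count as failures by user, src_ip
    | where failures > 5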
When integrating our customers' log sources, we bring the real-time events directly into Splunk, so there is no time difference on the customer side; the data goes straight into Splunk ITSI. Previously, we used an integration method where an alert triggered in the EMS would reach Splunk and create an incident within a minute.
We send the artifacts, logs, and analysis to an incident response team to resolve an incident. The response time depends on the team. They receive all the evidence about an alert.
ITSI helps automate reports and dashboard features. Running individual searches one at a time to generate a report and share it with the customer takes a while. Instead, we add the query for each report to a single dashboard and schedule the reports, so a report showing all the graphs is generated daily. We can download that report and share it with the customers.
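Scheduling is handled through saved searches; a minimal savedsearches.conf sketch, with an assumed query, schedule, and recipient:

    [Daily Alert Summary]
    search = index=itsi_tracked_alerts earliest=-24h | stats count by severity
    enableSched = 1
    # Run every day at 06:00
    cron_schedule = 0 6 * * *
    action.email = 1
    action.email.to = customer@example.com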
When we automatically generate a ticket based on the alert, it reduces the detection time and makes the ticket-raising time nearly instantaneous. The time difference between the alert trigger and ticket creation time will be minimal as the machine is generating the ticket. The customer response time is five to 10 minutes.
What is most valuable?
The search function is the most valuable. It supports regular expressions and wildcard searches. We write searches using field-based and case-sensitive matching and use all of these search types to build alert conditions. Splunk ITSI has another feature called Glass Table that offers a visual representation.
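For example, a search can combine wildcards in the base terms with a case-sensitive filter (base search terms are case-insensitive, while the regex command matches case-sensitively); all names here are illustrative:

    index=app_logs host=web* (error OR fail*)
    | regex _raw="TimeOut"
    | table _time host _raw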
We can manually adjust a dashboard by resizing it or changing the background color, and clicking on any cell navigates to the next dashboard. There is also a KPI feature. Each KPI has a separate formula, and we write the formula so that a condition triggers when a threshold is reached. All of this KPI information is displayed in one service analyzer.
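The thresholds themselves are configured on the KPI in ITSI, but the value each KPI tracks typically comes from a base search along these lines (index and field names assumed):

    index=app_perf sourcetype=app:metrics
    | stats avg(response_time) as avg_response_time by host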
ITSI's end-to-end visibility is excellent. With its help, we can monitor all the network-related log sources and infrastructure. Each log source is integrated into the tool and stored in a separate index to improve search performance. We use a clustered environment with multiple indexes; splitting the data across three to four indexes makes searches faster.
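Each separate index is defined in indexes.conf on the indexers; a minimal stanza, with default storage paths:

    [network_logs]
    homePath   = $SPLUNK_DB/network_logs/db
    coldPath   = $SPLUNK_DB/network_logs/colddb
    thawedPath = $SPLUNK_DB/network_logs/thaweddb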
The solution's preventive analytics help prevent incidents before they occur. We write a correlation search for each report and define the condition under which an alert is triggered. Each incident has a priority and a response time based on the SLA: we must respond within 30 minutes for a priority 1 incident, an hour for priority 2, two hours for priority 3, and three to four hours for priority 4.
A ticket is created within this window, and the incident response team is alerted. While raising the ticket, we compile all the alert information and everything the incident response team needs to resolve it. The team acts accordingly and closes the incident within the SLA.
What needs improvement?
When configuring a dashboard, we can write search criteria, and based on those criteria, the dashboard shows all the alerts, including the alert time, creation time, and a summary description. When an analyst adds an extra column, such as the user who triggered the alert, they want to see on the next dashboard refresh whether the alert has been acknowledged. We want that comment feature improved.
In the Service Analyzer, we monitor the network infrastructure services and have a KPI for each service. We can assign colors based on the threshold: for example, green when the value is within the limit and red when it has crossed the threshold. We want more colors in the service analyzer to represent these states.
For how long have I used the solution?
I have worked with ITSI for two years.
What do I think about the stability of the solution?
Splunk ITSI is stable. When Splunk releases its latest update package, we scan it for vulnerabilities and update it to the next version if there are none.
What do I think about the scalability of the solution?
Splunk ITSI is highly scalable.
How are customer service and support?
I rate Splunk support nine out of 10. We can get help from the Splunk community or raise a ticket with Splunk and get a quick reply.
Which solution did I use previously and why did I switch?
Before implementing Splunk, we manually noted all the alert information in a notepad, downloaded the log files, and tracked the incidents in tools like ServiceNow and BMC Remedy. Now the Remedy add-on is integrated with Splunk ITSI, so raising a ticket requires no manual intervention.
How was the initial setup?
Deploying Splunk ITSI was straightforward. We downloaded the initial version and upgraded to the latest package from the back end. It's a simple process involving integration, log onboarding, deploying agents, and setting up DB Connect. The agent-based method has a separate configuration: we collect the log paths for all the sources and hosts that need monitoring, and those are integrated into our Splunk tool.
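On the agent side, the forwarder also needs to know where to send the collected data; an outputs.conf sketch with hypothetical indexer addresses:

    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = indexer1.example.com:9997, indexer2.example.com:9997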
It requires minimal IT resources to deploy; two people are enough to onboard about 10 log sources per week. It's easy to maintain. We maintain the license, and if your data volume exceeds the license limit, you need to reduce the volume or pay for more.
What's my experience with pricing, setup cost, and licensing?
We have a 100 GB license. This licensing option is a bit expensive, but it can manage any type of bulk data, including database logs, network device logs, and social media feeds.
What other advice do I have?
I rate Splunk ITSI nine out of 10. No manual intervention is needed; it generates tickets automatically when an alert is reported. Doing this manually would take much more time to review all the alerts.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor. The reviewer's company has a business relationship with this vendor other than being a customer: Partner