My main use case for Apache SkyWalking includes not only monitoring microservices and APIs but also managing the overall health of the application. I will explain the domains and backgrounds where we currently use it. It does more than check microservices or heavy queries: from the IT point of view, your team can easily integrate it on the DevOps side or wherever applications are deployed. We use it widely to check the current status and health of our applications and how their services are running. By services, I essentially mean your queries, including database queries, and the backend logic written to sync data up to or down from the database, or to retrieve, update, or insert records. Apart from that, it shows how your APIs are working, for example how many of your API calls succeed versus fail. If there is a timeout, I can see how often it recurs, how long it lasts, and how long the network stays unreachable.
Even in payment applications, where we have multiple applications, some payment-processing calls may fail or insert queries may stop working. There can be multiple layers in the backend architecture, and during issue resolution it is very difficult to find where the actual pain point is. Apache SkyWalking really helps identify that a particular API call failed on the payment side, perhaps three layers deep in the architecture, and shows that it is failing for a specific reason, such as a network timeout, an unreachable network, the bank server being down, or a third-party payment integration not responding under heavy traffic load.
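The following sketch illustrates how a trace can expose a failure that happens several layers down. It walks the span tree of a single trace and returns the deepest span marked as an error. The assumption is that outer calls usually fail because an inner call failed. The `Span` record, its fields, and `deepest_error` are hypothetical simplifications for this example, not SkyWalking's data model or query API, and the endpoints are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical, simplified span record (not SkyWalking's data model)."""
    span_id: int
    parent_id: int            # -1 marks the entry (root) span of the trace
    endpoint: str
    is_error: bool = False
    reason: str = ""          # e.g. "network timeout", "connection refused"


def deepest_error(spans: list[Span]) -> Optional[tuple[Span, int]]:
    """Return (span, layer) for the deepest error span; the root is layer 1."""
    children: dict[int, list[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)

    best: Optional[tuple[Span, int]] = None
    # Iterative depth-first walk from the root span(s) down the call tree.
    stack = [(root, 1) for root in children.get(-1, [])]
    while stack:
        span, layer = stack.pop()
        if span.is_error and (best is None or layer > best[1]):
            best = (span, layer)
        for child in children.get(span.span_id, []):
            stack.append((child, layer + 1))
    return best


if __name__ == "__main__":
    trace = [
        Span(0, -1, "POST /checkout", is_error=True),
        Span(1, 0, "POST /payments", is_error=True),
        Span(2, 1, "POST /bank/authorize", is_error=True, reason="network timeout"),
        Span(3, 0, "GET /inventory"),
    ]
    result = deepest_error(trace)
    if result:
        span, layer = result
        print(f"{span.endpoint} failed at layer {layer}: {span.reason}")
```

Picking the deepest error is only a heuristic. A caller can also fail on its own after a downstream call succeeded, so the reported span is a starting point for the investigation rather than a verdict.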
In other domains, beyond checking application health, if your applications or servers run on Kubernetes pods or containers, you can check the health of your pods as well. We have moved from OutSystems to Mendix, along with other Java-hosted and .NET applications, all of which run on Kubernetes nodes. You can easily check which nodes are working fine and which are not in a good state, how much traffic is currently passing through those containers or nodes, how they are connected, and which ones are responding well versus poorly. These are some of the many areas where Apache SkyWalking helps you identify issues. Because it is an open-source platform, it helps from the licensing point of view and covers a wide range of use cases.
In terms of projects, I would like to share a couple of examples. One of our patient services applications was facing API failures. Initially, we suspected that this might be caused by a Java database upgrade, the fact service going down, or perhaps a global outage of a database server, which was affecting all of the API services. Then some fact line services started getting impacted, and because of that, a few of our Mule APIs stopped working properly. Since the project depended on cross-functional team members, each team was trying to identify where the actual cause lay. At a high level, we thought that the Java API might not be connecting properly to the fact API, or that the Mule API's internal calls to the fact API were not getting through. Someone reached out to the Mendix team to see if they could find the logs, and the cause could just as well have been in the .NET or another application, depending on what each team was working on.
With Apache SkyWalking in place, you can easily identify that, during a particular time window, a specific API call failed, which feature stopped as a result, and which documents were not delivered. You can then pinpoint the area, ask the responsible team to check those particular APIs, and, if needed, escalate to the support team or the vendor so the issue is handled on priority within the SLA.
Apart from that, there is one more use case I would like to share, regarding one of the applications on the local platform we built. Apache SkyWalking can be integrated there as well. Often, when a burst of traffic arrives within a single second, there is a huge spike in Grafana or in the logs, and it is very hard to see that such heavy traffic is arriving at that instant while CPU or memory usage stays quite low. You need to increase capacity, but logging cannot keep up, or pods crash and new pods get recreated, which makes it very hard to use the logs to understand what is happening. In that area too, you can integrate Apache SkyWalking and easily identify the health of your Kubernetes containers and nodes.
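Spotting crashing and recreated pods comes down to comparing snapshots over time. The sketch below assumes a hypothetical `PodSample` record (node, pod name, readiness, cumulative restart count, request rate) fed from whatever metrics pipeline you use. For each node, it summarizes traffic, not-ready pods, new container restarts, and pods that disappeared between two samples. It does not call any SkyWalking API. It only shows the kind of comparison that makes those pods stand out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PodSample:
    """Hypothetical per-pod sample from a metrics pipeline (not a SkyWalking type)."""
    node: str
    pod: str
    ready: bool
    restarts: int            # cumulative container restart count
    requests_per_s: float


def _empty_node() -> dict:
    return {"pods": 0, "not_ready": 0, "new_restarts": 0, "gone": 0, "requests_per_s": 0.0}


def node_health(before: list[PodSample], after: list[PodSample]) -> dict[str, dict]:
    """Compare two samples and summarise traffic and stability per node."""
    prev = {p.pod: p for p in before}
    current = {p.pod for p in after}
    summary: dict[str, dict] = defaultdict(_empty_node)

    for p in after:
        s = summary[p.node]
        s["pods"] += 1
        s["requests_per_s"] += p.requests_per_s
        if not p.ready:
            s["not_ready"] += 1
        if p.pod in prev:
            # Containers crashing inside the same pod show up as a rising restart count.
            s["new_restarts"] += max(0, p.restarts - prev[p.pod].restarts)

    for p in before:
        if p.pod not in current:
            # The pod vanished: crashed and replaced, evicted, or scaled down.
            summary[p.node]["gone"] += 1

    for s in summary.values():
        s["healthy"] = s["not_ready"] == 0 and s["new_restarts"] == 0 and s["gone"] == 0
    return dict(summary)


if __name__ == "__main__":
    before = [
        PodSample("node-a", "api-1", True, 0, 120.0),
        PodSample("node-b", "api-2", True, 3, 80.0),
        PodSample("node-b", "api-3", True, 0, 75.0),
    ]
    after = [
        PodSample("node-a", "api-1", True, 0, 900.0),
        PodSample("node-b", "api-2", False, 5, 10.0),
        PodSample("node-b", "api-4", True, 0, 40.0),
    ]
    for node, stats in sorted(node_health(before, after).items()):
        print(node, stats)
```

A disappearing pod is flagged even when a normal scale-down removed it. In practice you would cross-check such flags against deployment events before treating them as crashes.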
My main use case for Apache SkyWalking is a project that started in 2023 for a retail client facing serious performance issues on their new distributed architecture on AWS. The technical criticality was clear: we had an observability black hole on a high-traffic payment flow, where we could not tell whether latencies were caused by microservices on Amazon EKS or by calls to legacy on-premises databases. We chose Apache SkyWalking through the AWS Marketplace so we could integrate it immediately into the existing infrastructure, with the goal of monitoring a large environment of over 80 microservices and about 600 active pods. It lets us handle and analyze around 50 million traces per day, correlating every end-to-end transaction in real time from the front end to the database and pinpointing bottlenecks that are invisible to traditional logging systems.
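One way to separate time spent in EKS microservices from time spent in legacy database calls is to attribute each span's self time to the span's layer. Self time is the span's duration minus the time covered by its child spans. The sketch below assumes a hypothetical `TimedSpan` record with millisecond timestamps and a `layer` label. The labels `Http` and `Database` are illustrative, loosely echoing the span layers shown in SkyWalking's trace view. The function names are made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TimedSpan:
    """Hypothetical span with millisecond timestamps; `layer` labels are illustrative."""
    span_id: int
    parent_id: int      # -1 for the entry span
    layer: str          # e.g. "Http" for service calls, "Database" for DB queries
    start_ms: int
    end_ms: int


def covered_ms(intervals: list[tuple[int, int]]) -> int:
    """Total time covered by possibly overlapping (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def self_time_by_layer(spans: list[TimedSpan]) -> dict[str, int]:
    """Sum each span's self time (duration minus time covered by its children) per layer."""
    children: dict[int, list[TimedSpan]] = defaultdict(list)
    for sp in spans:
        children[sp.parent_id].append(sp)

    totals: dict[str, int] = defaultdict(int)
    for sp in spans:
        # Clip children to the parent's window so a child that outlives its
        # parent (e.g. an async call) cannot make the parent's self time negative.
        kids = [(max(c.start_ms, sp.start_ms), min(c.end_ms, sp.end_ms)) for c in children[sp.span_id]]
        kids = [(s, e) for s, e in kids if e > s]
        totals[sp.layer] += (sp.end_ms - sp.start_ms) - covered_ms(kids)
    return dict(totals)


if __name__ == "__main__":
    trace = [
        TimedSpan(0, -1, "Http", 0, 500),         # front-end checkout request
        TimedSpan(1, 0, "Http", 20, 480),         # order microservice on EKS
        TimedSpan(2, 1, "Database", 50, 450),     # query to the legacy on-prem database
        TimedSpan(3, 1, "Http", 455, 470),        # inventory microservice
    ]
    print(self_time_by_layer(trace))
```

Take this illustrative 500 ms checkout trace, where a 400 ms legacy database query is nested under an order-service call. The sketch attributes 400 ms to `Database` and 100 ms to `Http`, which points at the database rather than the microservices.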
Apache SkyWalking is a versatile open-source tool for monitoring and analyzing the performance and behavior of applications in distributed systems. It enables tracking requests, identifying bottlenecks, and troubleshooting issues in real time, while also monitoring microservices, logs, and server metrics.
With its comprehensive monitoring capabilities, flexible architecture, and powerful visualization tools, Apache SkyWalking provides actionable insights and enhances overall...
Our main use case for Apache SkyWalking is monitoring Java application servers such as Tomcat and WebSphere Liberty.