What is our primary use case?
I have worked on many IoT platform use cases over the last 10 years. For instance, for a large U.S. client I helped track vehicles for Paccar and for other companies such as Amazon, URS, and UPS. These companies monitor many vehicles traveling across the U.S. and capture data about vehicle problems and accidents. That data is ingested through Kafka or Event Hubs and pushed to our platform for analytics, where reporting tools such as Power BI visualize it.
For telemetry, real-time analytics, or other streaming data coming from IoT devices, Azure provides the combination of Event Hubs and Stream Analytics; the open-source equivalent is Kafka. For inter-system communication, the pub/sub model lets services publish data to a queue or topic in a broker such as RabbitMQ, and Azure offers Service Bus for the same purpose. If you want to capture real-time telemetry, such as stock market data or vehicles on highways, Event Hubs is the recommended way to ingest it.
We can validate the basic data schema using Azure Stream Analytics to decide whether a record passes or fails. Records that pass are processed on our platform, specifically in ADLS Gen2; records that fail basic validation go to a separate pipeline for further investigation. Azure Stream Analytics runs on top of Event Hubs, and we can combine storage with serverless Azure Functions for internal processing, or use open-source tools such as Kafka if needed.
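The validate-then-route step can be sketched in plain Python. This is only an illustration of the idea, not Stream Analytics code: the field names (`vehicle_id`, `timestamp`, `speed_kmh`) and the `route` helper are hypothetical, and in-memory lists stand in for the ADLS Gen2 sink and the investigation pipeline.

```python
from typing import Any

# Hypothetical schema: required field name -> accepted type(s).
REQUIRED_FIELDS: dict[str, tuple[type, ...]] = {
    "vehicle_id": (str,),
    "timestamp": (str,),
    "speed_kmh": (int, float),
}


def validation_errors(record: dict[str, Any]) -> list[str]:
    """Return the schema problems found; an empty list means the record is valid."""
    errors = []
    for field, types in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], types):
            errors.append(f"wrong type for {field}")
    return errors


def route(records: list[dict[str, Any]]) -> tuple[list, list]:
    """Split records into (valid, rejected); rejected entries keep their error list."""
    valid, rejected = [], []
    for record in records:
        errors = validation_errors(record)
        if errors:
            rejected.append({"record": record, "errors": errors})
        else:
            valid.append(record)
    return valid, rejected
```

Keeping the error list next to each rejected record means the second pipeline can report why a record failed without re-running the checks.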
What is most valuable?
I widely use AKS (Azure Kubernetes Service) and Azure App Service, along with various gateway services. I also use API Management and Front Door, including Web Application Firewall, to expose multi-region applications, plus many more services, around 20 to 60 in total. I use Key Vault for managing secrets and Application Insights for tracing and monitoring. Additionally, I use AI Search as an indexer when processing chatbot data or integrating GenAI. I widely use OpenAI for GenAI, integrating various models with our platform.
I extensively use hybrid cloud solutions to connect on-premises networks to the cloud, or the cloud to another network, using public endpoints, private endpoints, or Private Link service endpoints. Azure DevOps is also on my list, and I apply many security concepts for end-to-end design. I consider the whole path from how end users access applications through to data storage, and I secure the entire platform for authenticated users across B2C, B2B, and employee scenarios. I also frequently design multi-tenant applications, using Azure AD or Azure AD B2C for consumers.
Azure Stream Analytics reads from real-time streams; it is designed to process millions of events at sub-second latency. It uses Event Hubs as its input for event ingestion. After receiving data from various sources, we validate it and store it in a data store. Azure Stream Analytics consumes data from Event Hubs and applies basic validation rules to determine whether each record is valid before further processing.
What needs improvement?
Reprocessing and validation without custom code need improvement. Azure Stream Analytics currently requires some code to be written for these tasks. Low-code or no-code options would simplify this and could enhance performance.
What do I think about the stability of the solution?
The service is built to handle fast-moving data. Azure Stream Analytics processes large volumes of data every second, which is why it is recommended for real-time streaming.
What do I think about the scalability of the solution?
The service itself allows scaling, with capabilities to process data across multiple regions effectively, which is crucial for applications demanding constant monitoring, such as healthcare or financial services.
How are customer service and support?
Having worked with Azure for 23 years, I find the support excellent: training materials and resources are abundant and available globally, which makes adoption easier.
How would you rate customer service and support?
What's my experience with pricing, setup cost, and licensing?
Azure charges in various ways based on incoming and outgoing data processing activities. Choosing between pay-as-you-go or enterprise models can affect pricing, and depending on data volume, charges might increase substantially.
Which other solutions did I evaluate?
Kinesis is equivalent to Event Hubs in the AWS ecosystem, where Lambda functions are akin to Azure Functions, and S3 buckets compare to Blob Storage. While there are many competitors, Azure remains my preferred choice based on client needs.
What other advice do I have?
I still use Microsoft Azure Blob Storage alongside other services. Of the roughly 200 services Azure offers, I mostly use 50 to 60.
I have used Blob Storage for page blobs and block blobs; there was no requirement for File Storage. Blob Storage is widely used for storing binary content such as files and images.
Blob Storage stores binary objects and lacks a hierarchical namespace. Azure Data Lake Storage Gen2, however, supports hierarchical namespaces and accommodates various file formats, including PDF and JSON. We can process that data using PySpark, Scala, or SQL.
Azure Stream Analytics is entirely cloud-based and does not support an on-premises setup. For on-premises solutions, Apache Kafka is an alternative to consider.
All services are platform services and can follow a pay-as-you-go model or enterprise licensing. Setting up involves configuration by our cloud engineering team, including exposing APIs, which facilitates incoming requests to Event Hubs for processing.
Azure services often target an SLA of 99.9%. By configuring multi-region deployments with proper routing and load balancing, we can maintain high availability even when failures occur.
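As a rough back-of-the-envelope sketch of why multiple regions help, the arithmetic below estimates combined availability. It assumes that regional failures are independent and that the routing layer itself never fails, neither of which a real deployment guarantees; the function names are hypothetical, and actual SLA figures vary by service and tier.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def combined_availability(per_region: float, regions: int) -> float:
    """Availability when at least one of `regions` independent deployments must be up."""
    # The system is down only if every region is down at the same time.
    return 1.0 - (1.0 - per_region) ** regions


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per (non-leap) year at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR
```

Under these assumptions, a single region at 99.9% allows about 526 minutes of downtime per year, while two such regions together come to roughly 99.9999%, or about half a minute per year.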
For big data processing, data analytics platforms typically support three languages: Scala, SQL, and PySpark. I predominantly use PySpark, which handles big data and supports SQL, allowing us to validate conditions and filter records based on input.
We can apply retry concepts or compensating techniques for Azure Stream Analytics. For instance, if basic validation fails, we can revalidate and reprocess the data through another pipeline. We implement various patterns such as retry patterns or circuit breakers to ensure proper data processing and business validations based on requirements.
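The retry and circuit breaker patterns mentioned here can be sketched generically in Python. This is a minimal illustration using only the standard library, not an Azure SDK feature; the class and function names, thresholds, and delays are all assumptions chosen for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down period has passed."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure re-opens the circuit.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit fully
        return result


def retry(func, attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Call func until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # retrying against an open circuit would defeat its purpose
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

A reprocessing pipeline could wrap each downstream write as `retry(lambda: breaker.call(write_record, record))`, where `write_record` is a hypothetical sink function: transient errors are retried with backoff, while a persistently failing sink trips the breaker and fails fast.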
On a scale of 1-10, I rate Azure Stream Analytics a 9.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure