Summary – Design and Implement a Data Stream Processing Solution

This chapter focused on the design and development of a stream processing solution. You learned about data stream producers, which are commonly IoT devices that send event messages to an ingestion endpoint hosted in the cloud. You also learned about stream processing products that read, transform, and write the data stream to a location where consumers can access it. Where that data is stored depends on whether the insights are required in real time or near real time. Both scenarios flow through the speed layer: real-time insights flow directly into a consumer such as Power BI, while near-real-time data streams flow into the serving layer. Once the data is in the serving layer, additional transformation can be performed by batch processing prior to consumption. In addition to the timeliness requirements placed on your solution, other considerations, such as the data stream format, the programming paradigm, the programming language, and product interoperability, are all important when designing a data streaming solution.
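To make the speed layer concrete, the following is a minimal sketch of a Stream Analytics query that reads an IoT event stream and aggregates it over a tumbling window for a real-time consumer such as Power BI. The input alias iotInput, the output alias powerBIOutput, and the field names are hypothetical; they would be defined on the job itself, not in the query.

    -- Minimal sketch; 'iotInput', 'powerBIOutput', and the column names are
    -- hypothetical aliases and fields configured on the job.
    SELECT
        deviceId,
        AVG(temperature) AS avgTemperature,
        System.Timestamp() AS windowEnd
    INTO powerBIOutput
    FROM iotInput TIMESTAMP BY readingTime
    GROUP BY deviceId, TumblingWindow(second, 30)

Because the aggregation closes every 30 seconds, the consumer receives a continuous feed of fresh results rather than waiting on a batch process.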

Azure Stream Analytics has the capacity to process data streams in parallel. Performing work in parallel increases the speed at which transformations are completed, which means business insights are gathered faster. This parallelism is achieved using partition keys, which give the platform the information it needs to group related data together and process each group on a dedicated partition. The concept of time is also very important in data stream solutions. Arrival time, event time, checkpoints, and watermarks all play important roles when interruptions to the data stream occur. You learned that when an OS upgrade, node exception, or product upgrade happens, the platform uses these time management properties to get your stream back on track without losing any data. Replaying a data stream is possible only if you have created or stored the data required to replay it; the streaming platform itself provides no data archival feature for this purpose.
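As an illustration of partitioning and event time, the hedged sketch below processes each partition independently and timestamps events by when they were produced rather than when they arrived. The aliases and column names are hypothetical; note that at compatibility level 1.2 the input partitioning is applied implicitly, so the explicit PARTITION BY clause shown here is required only on earlier levels.

    -- 'iotInput' and 'partitionedOutput' are hypothetical job aliases;
    -- 'PartitionId' mirrors the partition key assigned at ingestion.
    -- TIMESTAMP BY tells the job to use event time instead of arrival time.
    SELECT
        PartitionId,
        COUNT(*) AS eventCount
    INTO partitionedOutput
    FROM iotInput TIMESTAMP BY eventTime
    PARTITION BY PartitionId
    GROUP BY PartitionId, TumblingWindow(minute, 1)

Keeping the partition key in the GROUP BY clause lets each partition produce its results without shuffling data between nodes, which is what makes the fully parallel execution possible.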

There are many metrics you can use to monitor the performance of your Azure Stream Analytics job. For example, the Resource Utilization, Event Counts, and Watermark Delay metrics can help you determine why stream results are not being processed as expected, or at all. Diagnostic settings, alerts, and activity logs can also help you determine why your stream processing is not achieving the expected results. Once you identify the cause of the problem, you can resolve it by scaling to increase capacity, configuring the error policy, or changing the query to fix a bug.
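As a sketch of how these metrics can be retrieved outside the Azure portal, the Azure CLI commands below look up a job's resource ID and query its Watermark Delay metric. The resource group and job name are placeholders, and OutputWatermarkDelaySeconds is, to my understanding, the API name behind the Watermark Delay metric, so verify both against your environment.

    # Placeholders: <resource-group> and <job-name>.
    JOB_ID=$(az resource show \
      --resource-group <resource-group> \
      --name <job-name> \
      --resource-type "Microsoft.StreamAnalytics/streamingjobs" \
      --query id --output tsv)

    # Watermark Delay at a one-minute grain; a steadily growing value
    # suggests the job cannot keep up with its input rate.
    az monitor metrics list \
      --resource "$JOB_ID" \
      --metric "OutputWatermarkDelaySeconds" \
      --interval PT1M \
      --output table

A rising watermark delay paired with high resource utilization typically points toward scaling up streaming units, whereas a delay with low utilization points toward an input, partitioning, or query problem.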