Designing data processing architecture for IoT Analytics

Several cloud services are likely to appear in your IoT data processing environment. Both AWS and Microsoft Azure offer IoT-specific services, which we will review here. Services that support data processing and transformation are also worth reviewing so that you are familiar with them.

Amazon Kinesis

Amazon Kinesis is a set of services for loading and analyzing streaming data. It handles all the underlying compute, storage, and messaging services for you.

The services in the Kinesis family are as follows:

  • Amazon Kinesis Firehose: This enables loading of massive volumes of streaming data into AWS.
  • Amazon Kinesis Streams: This service allows you to create custom applications to process and analyze streaming data in real time. There are two ends to each stream; you use the Amazon Kinesis Producer Library (KPL) to build the application that sends data into the stream. The Amazon Kinesis Client Library (KCL) is used in the application that reads data from the stream – typically a real-time dashboard or rules engine-type application. Stream data can also be directed into other AWS services, such as S3 or SQS.
  • Amazon Kinesis Analytics: This allows you to easily analyze stream data with standard SQL. Queries can run continuously, and AWS handles the scaling needed to run them automatically.
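
As a rough illustration of the producer end of a stream, the following Python sketch sends one device reading into Kinesis Streams. It uses the plain `put_record` call from boto3 (the AWS SDK for Python) rather than the KPL mentioned above. The stream name `iot-events` and the reading fields are hypothetical, and `client` is assumed to be an object created with `boto3.client("kinesis")`.

```python
import json


def build_record(device_id, reading):
    """Build the keyword arguments for a Kinesis put_record call."""
    payload = {"device_id": device_id, **reading}
    return {
        "StreamName": "iot-events",  # hypothetical stream name
        "Data": json.dumps(payload).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": device_id,
    }


def send_reading(client, device_id, reading):
    """Send one reading; `client` is assumed to be boto3.client("kinesis")."""
    return client.put_record(**build_record(device_id, reading))
```

Using the device ID as the partition key keeps each device's readings on a single shard, which is one reasonable choice when a consumer needs to process a device's events together.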

AWS Lambda

Lambda allows you to run code without provisioning servers. Python, Node.js, C#, and Java programming languages are all currently supported. Scale is handled automatically. You only pay when your code executes. Everything runs in parallel and your code needs to either be stateless or manage state in an external database. The code is essentially event-driven since you configure when and under what conditions it executes.

Using a service such as Lambda is often referred to as serverless computing and opens up a whole new range of possibilities. You could create a fully-functional web application with Lambda that has no server behind it. You can create analytic code that scales to millions of device events without ever having to worry about managing a server. The following diagram gives an overview of how it works in practice:


Overview of how Lambda works. Source: AWS
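
To make the event-driven, stateless model concrete, here is a minimal sketch of a Python Lambda handler that is triggered by a batch of Kinesis records. The `temperature` field and the freezing threshold are assumptions made for illustration. The handler keeps no state between invocations and simply returns a summary of the batch.

```python
import base64
import json


def lambda_handler(event, context):
    """Count the readings below 0 degrees Celsius in one batch of records."""
    records = event.get("Records", [])
    freezing = 0
    for record in records:
        # Kinesis delivers each record's payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # "temperature" is a hypothetical field in the device message.
        if payload.get("temperature", 0) < 0:
            freezing += 1
    # Nothing is kept in memory between calls; anything that must persist
    # would be written to an external store.
    return {"processed": len(records), "freezing": freezing}
```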

AWS Athena

Athena is a new service, launched in late 2016, that operates over data stored in S3. It allows you to query datasets using ANSI SQL. No data needs to be loaded into Athena; it queries the raw files in S3 directly. This allows you to analyze large amounts of data without an Extract, Transform, and Load (ETL) step to move it into a data analysis system. There are no clusters or data warehouses to manage. You pay for the amount of data scanned, which means you can compress the files to lower costs.

Combine this with Lambda and you could have yourself a fairly decent low-cost big data solution. Store raw IoT files in S3 and schedule a Lambda job to periodically transform new data into an analysis-ready dataset – also in S3. Use SQL in Athena to analyze. You can do all this without worrying about servers, clusters, scaling, or managing complicated ETL.
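
The analysis step of this pattern might look like the following sketch, which submits a SQL query to Athena through boto3. The database `iot_analytics`, the table `iot_readings`, and its columns are hypothetical names, and `athena_client` is assumed to be created with `boto3.client("athena")`. Athena writes the results to the S3 location you pass in.

```python
def start_daily_average_query(athena_client, output_location):
    """Submit a daily-average query and return the Athena query execution ID."""
    query = (
        "SELECT device_id, date_trunc('day', event_time) AS day, "
        "avg(temperature) AS avg_temperature "
        "FROM iot_readings "  # hypothetical table over the analysis-ready files
        "GROUP BY 1, 2"
    )
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "iot_analytics"},  # hypothetical
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

The call returns as soon as the query is accepted; a real job would poll for completion before reading the result files.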

The AWS IoT platform

The AWS IoT platform handles data messaging and security for communications between connected IoT devices and your AWS environment. It can support billions of devices and trillions of messages.

The MQTT, HTTP, and WebSockets protocols are supported. The platform provides authentication and encryption services for communications. All AWS services can be used to process, analyze, and make decisions on IoT data. Communication can flow in both directions.

Device shadows are used to store the latest state information on each IoT device. This makes it appear to applications that the device is always available, so commands can be given and values read without waiting for the device to be connected. The device shadow will sync with the IoT device when the device connects.
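
A shadow is a JSON document with a `desired` section (what applications have asked for) and a `reported` section (what the device last said about itself). The sketch below shows, for flat key-value state only, how the difference between the two can be worked out. In practice the AWS IoT service computes this delta for you, so this is only an illustration of the idea; the `led` and `interval` fields are hypothetical.

```python
def shadow_delta(shadow):
    """Return the desired values that the device has not yet reported."""
    state = shadow.get("state", {})
    desired = state.get("desired", {})
    reported = state.get("reported", {})
    # Flat comparison only; nested state would need a recursive walk.
    return {
        key: value
        for key, value in desired.items()
        if key not in reported or reported[key] != value
    }


shadow = {
    "state": {
        "desired": {"led": "on", "interval": 60},
        "reported": {"led": "off", "interval": 60},
    }
}
print(shadow_delta(shadow))  # {'led': 'on'}
```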

The AWS IoT device SDK is installed on the remote IoT device to handle communications. Registration and authentication services are handled in the AWS IoT platform. A rules engine can direct messages to other AWS services, such as Lambda, when certain conditions are met (such as [device temperature < 0 degrees Celsius]).


Basic overview of AWS IoT platform. Source: Amazon Web Services
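
The rules engine condition mentioned earlier (device temperature below 0 degrees Celsius) could be expressed roughly as follows. This sketch builds a topic rule that forwards matching messages to a Lambda function and registers it through boto3. The topic filter `sensors/+/telemetry`, the rule name, and the `temperature` field are assumptions, and `iot_client` is assumed to be created with `boto3.client("iot")`.

```python
def build_freezing_rule(function_arn):
    """Build a topic rule payload that routes freezing readings to Lambda."""
    return {
        # AWS IoT rules are written as SQL over an MQTT topic filter.
        "sql": "SELECT * FROM 'sensors/+/telemetry' WHERE temperature < 0",
        "description": "Forward sub-zero temperature readings to Lambda",
        "actions": [{"lambda": {"functionArn": function_arn}}],
        "ruleDisabled": False,
    }


def create_freezing_rule(iot_client, function_arn):
    """Register the rule; `iot_client` is assumed to be boto3.client("iot")."""
    iot_client.create_topic_rule(
        ruleName="freezing_alert",  # hypothetical rule name
        topicRulePayload=build_freezing_rule(function_arn),
    )
```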

AWS Greengrass is a new product in the IoT family, which was in the preview stage at the time of writing. It simplifies edge analytics, which makes it an exciting option for IoT analytics. It is software that is installed locally on a device or on a nearby device hub. It can automatically buffer event data when the device is not connected. The most exciting part is that it supports the same Lambda functions as the AWS cloud. You can build and test the functions in your AWS environment and then move them with minimal effort to operate at the edge using Greengrass.

Microsoft Azure IoT Hub

Azure IoT Hub is a managed service for bidirectional communications between the Azure backend and IoT devices. Millions of devices can operate on the IoT Hub, and the service can scale as needed. Communication can take the form of one-way messaging, file transfer, or request-reply methods. It integrates with other Azure services.


Example Azure IoT Hub solution architecture. Image source: Microsoft Azure

Azure IoT Hub has some key features, which give some insight into how it works:

  • Authentication and connectivity security: Each device is provisioned with its own security key that allows it to connect to IoT Hub. Identities and keys are stored in the IoT Hub identity registry.
  • Device twins: These are JSON documents that store state information such as configuration and parameter values. They are stored in the cloud and persist for each device connected to the IoT Hub. A twin allows you to store, synchronize, and query device data even if the device is offline.
  • Connectivity monitoring: Here, you can receive detailed logs on identity management and connectivity events for your devices.
  • IoT protocol support: The MQTT, HTTP, and AMQP protocols are supported by the Azure IoT device SDKs that you would install on your IoT device. They are also supported through an exposed public protocol in the event that you cannot use the SDK on your device.
  • IoT protocol extensibility: You can support custom protocols by either of the following:
    • Creating a field gateway: You can use the Azure IoT gateway SDK to convert your custom protocol to one of the three protocols understood by IoT Hub.
    • Customizing the Azure IoT protocol gateway: This runs in the Azure cloud and is an open source component.
  • Scale: Millions of devices can be connected at the same time with millions of events per second flowing through the hub.
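
To give a feel for device twins, the sketch below models two twins as Python dictionaries and finds the devices at one site whose reported configuration has not yet caught up with the desired one. The `tags` and `properties` sections, with `desired` and `reported` parts, follow the general layout of a twin document; the `site` tag and `telemetryInterval` property are hypothetical. In a real deployment, IoT Hub can run this kind of query for you on the service side, so the local filter here is only an illustration.

```python
def out_of_sync(twins, site):
    """Return IDs of devices at `site` whose reported interval differs from desired."""
    result = []
    for twin in twins:
        if twin.get("tags", {}).get("site") != site:
            continue
        props = twin.get("properties", {})
        desired = props.get("desired", {}).get("telemetryInterval")
        reported = props.get("reported", {}).get("telemetryInterval")
        if desired != reported:
            result.append(twin["deviceId"])
    return result


twins = [
    {
        "deviceId": "pump-01",
        "tags": {"site": "plant-a"},
        "properties": {
            "desired": {"telemetryInterval": 30},
            "reported": {"telemetryInterval": 60},  # device has not applied it yet
        },
    },
    {
        "deviceId": "pump-02",
        "tags": {"site": "plant-a"},
        "properties": {
            "desired": {"telemetryInterval": 30},
            "reported": {"telemetryInterval": 30},
        },
    },
]
print(out_of_sync(twins, "plant-a"))  # ['pump-01']
```

Because the twin lives in the cloud, a check like this works whether or not `pump-01` is currently connected.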
