Lambda Architecture: Capturing data from sensors in an Industry 4.0 setup

Industrial Internet solutions gather data from smart devices “at the edge” in field locations that are often remote. These devices typically stream data that eventually ends up in cloud-based or on-premises data-management systems.

Many of you might be more familiar with traditional online transaction-processing systems feeding data warehouses via batch data loads. Streaming data, by contrast, is data in motion, and analyzing it requires an additional layer called the speed layer.

This multi-layer approach is described by what is popularly called the Lambda architecture.

Traditional online transaction-processing systems feed the batch layer directly. Devices at the edge feed streaming data directly into a speed layer. The data usually then makes its way into the batch layer and is added to the data at rest.
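To make the two ingest paths concrete, here is a minimal Python sketch. It is a toy model only: the class name `LambdaPipeline` and the fields `device_id` and `value` are invented for illustration and do not come from any particular product.

```python
class LambdaPipeline:
    """Toy model of the two ingest paths in a Lambda architecture."""

    def __init__(self):
        self.batch_store = []   # data at rest (batch layer)
        self.speed_view = {}    # latest value per device (speed layer)

    def ingest_batch(self, records):
        # Transaction-processing systems feed the batch layer directly.
        self.batch_store.extend(records)

    def ingest_stream(self, event):
        # Edge devices feed the speed layer first...
        self.speed_view[event["device_id"]] = event["value"]
        # ...and the event then joins the data at rest.
        self.batch_store.append(event)


pipeline = LambdaPipeline()
pipeline.ingest_batch([{"order_id": 1, "amount": 250.0}])
pipeline.ingest_stream({"device_id": "pump-7", "value": 71.5})
pipeline.ingest_stream({"device_id": "pump-7", "value": 73.0})
```

After these calls the speed view holds only the latest reading for `pump-7`, while the batch store holds all three records.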

The following diagram illustrates the main building blocks included in a Lambda architecture. The direction of most of the flow of data among these building blocks is indicated by the arrows:

A serving layer is sometimes described as part of the Lambda architecture. This layer provides indexing of views of data at rest for faster queries and is usually deployed as part of the data-management systems. The results of queries are presented in the business intelligence and analytics tools pictured in this diagram.

Business intelligence tools are a broad class of tools used by the lines of business to retrieve, analyze, and transform data and to report on business results. Interfaces typically include table and hierarchy definitions used to drill down to detail. More modern tools often support natural-language-like questioning and are beginning to leverage cognitive interfaces.

The Lambda architecture works quite well for most organizations that have already built batch feeds to data warehouses and now are adding streaming data sources as part of their Industrial Internet project.

More recently, some organizations starting with an entirely new infrastructure have decided to eliminate batch feeds and process all incoming data as streaming data. This variation is called the Kappa architecture. The downstream data store only appends incoming data in this design, so a Hadoop cluster serving as a data lake is frequently chosen as the final landing spot for all data when this approach is taken.
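The append-only idea behind the Kappa variation can be sketched in a few lines of Python. This is an illustrative assumption about how such a store behaves, not a description of Hadoop or any specific product; `AppendOnlyLog` and `max_per_device` are hypothetical names.

```python
class AppendOnlyLog:
    """Events are only ever appended; views are rebuilt by replaying them."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1   # offset of the new event

    def replay(self, from_offset=0):
        # Yield events in arrival order, starting at the given offset.
        return iter(self._events[from_offset:])


def max_per_device(events):
    """Derive a view (peak value per device) from a replayed stream."""
    peaks = {}
    for event in events:
        device = event["device_id"]
        peaks[device] = max(peaks.get(device, event["value"]), event["value"])
    return peaks


log = AppendOnlyLog()
log.append({"device_id": "pump-7", "value": 71.5})
log.append({"device_id": "pump-7", "value": 80.2})
log.append({"device_id": "fan-1", "value": 12.0})
view = max_per_device(log.replay())
```

Because nothing is updated in place, a new or corrected view can be produced at any time by replaying the log through a different function.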

Analyzing the speed layer in detail

As depicted in the illustration below, the speed layer in our architecture consists of an IoT hub and/or event hub(s) serving as a cloud gateway, plus a streaming analytics engine. The cloud gateway is paired with a field gateway at the edge, or it will sometimes communicate directly with the smart devices themselves.


Some organizations deploy the speed layer on-premises instead of in the cloud to be located close to their existing batch layer systems. If transmission of data occurs to a central on-premises location, the gateway architecture would be similar, except an on-premises gateway would be pictured in our earlier diagram instead of the cloud gateway.

This is especially common in organizations that built Industrial Internet solutions before public clouds gained popularity and offered the functionality required for these types of solutions.

Field gateways gather event data at the edge from smart devices and sensors. They are usually sized based on the number of data streams expected, the data collection rate (events/second), and the data storage duration desired. These gateways might be custom developed or provided by vendors. OSIsoft and ThingWorx are two popular vendor offerings deployed as part of many custom-built solutions.
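As a rough illustration of how those three sizing inputs combine, the following Python sketch estimates local storage. The average event size is an extra assumption that the text does not mention, and the example figures are made up; real sizing would also account for overhead such as indexes and logs.

```python
def gateway_storage_bytes(num_streams, events_per_second, bytes_per_event, retention_hours):
    """Estimate raw storage needed to hold events for the retention period.

    bytes_per_event is an assumed average payload size (hypothetical input).
    """
    seconds = retention_hours * 3600
    return num_streams * events_per_second * bytes_per_event * seconds


# Hypothetical site: 200 streams, 10 events/second each, 100-byte events, 24 hours kept.
estimate = gateway_storage_bytes(200, 10, 100, 24)
estimate_gb = estimate / 1e9
```

With these made-up numbers the gateway would need roughly 17 GB of raw storage before any overhead.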

Field gateways ingest messages, filter data, provide identity mapping, and log messages (for auditing purposes) as well as provide linkage to cloud or on-premises gateways. A newer trend has emerged to also perform stream analytics and machine learning within the field gateways. The ability to push these applications to the edge is now provided by some of the public cloud vendors. In a sense, this extends the speed layer to the edge.
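The core responsibilities listed above (ingest, filter, identity mapping, audit logging, and forwarding) can be sketched as a small Python class. Everything here is hypothetical: the JSON message shape, the `FieldGateway` name, and the range-based filter are assumptions chosen for illustration, not the behavior of any vendor gateway.

```python
import json


class FieldGateway:
    def __init__(self, identity_map, forward, valid_range):
        self.identity_map = identity_map   # local sensor name -> device ID
        self.forward = forward             # callable that sends a message upstream
        self.valid_range = valid_range     # (low, high) bounds used as the filter
        self.audit_log = []                # every raw message, kept for auditing

    def handle(self, raw_message):
        self.audit_log.append(raw_message)             # log
        message = json.loads(raw_message)              # ingest
        device_id = self.identity_map.get(message["sensor"])
        if device_id is None:
            return False                               # unknown sensor, not forwarded
        low, high = self.valid_range
        if not low <= message["value"] <= high:
            return False                               # out of range, filtered out
        self.forward({"device_id": device_id, "value": message["value"]})
        return True


sent = []   # stands in for the cloud or on-premises gateway
gateway = FieldGateway({"temp-a": "plant1/line3/temp-a"}, sent.append, (-40.0, 150.0))
gateway.handle('{"sensor": "temp-a", "value": 72.4}')
gateway.handle('{"sensor": "temp-a", "value": 9999}')
gateway.handle('{"sensor": "unknown", "value": 10}')
```

Only the first message is forwarded, but all three appear in the audit log.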

When these capabilities are deployed at the edge, you will need to account for their CPU and memory requirements when sizing the field gateway platforms.

Within the speed layer that is deployed in a central location, the packaging of components varies among vendors. Among various public cloud vendors focused on IIoT solutions, the following functionality can be found in their offerings and/or those of their partners:

  • IoT hubs that enable device-to-cloud (D2C) and cloud-to-device (C2D) communications via messaging protocols, contain information about the smart devices, support revocable access control for devices, enable operations modeling, and support message routing to event hubs or service buses
  • Event hubs that lack the management capabilities of the IoT hub and are designed specifically for rapid message ingress, with data transfer rates of up to 1 MB/second typical in cloud deployments
  • Streaming analytics engines that provide a place to analyze data in motion using machine learning algorithms or to view the current streaming data through business intelligence tools
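Two of the IoT hub functions in the list above, revocable device access and message routing, can be illustrated with a small Python sketch. This is a conceptual model only; `IoTHubSketch`, its methods, and the endpoint names are hypothetical and do not mirror any cloud vendor's API.

```python
from collections import defaultdict


class IoTHubSketch:
    def __init__(self):
        self.devices = {}                    # device ID -> access enabled?
        self.routes = []                     # (predicate, endpoint name) pairs
        self.endpoints = defaultdict(list)   # endpoint name -> delivered messages

    def register(self, device_id):
        self.devices[device_id] = True

    def revoke(self, device_id):
        self.devices[device_id] = False      # access can be withdrawn per device

    def add_route(self, predicate, endpoint):
        self.routes.append((predicate, endpoint))

    def receive(self, device_id, message):
        if not self.devices.get(device_id, False):
            return []                        # unknown or revoked device
        delivered = []
        for predicate, endpoint in self.routes:
            if predicate(message):
                self.endpoints[endpoint].append(message)
                delivered.append(endpoint)
        return delivered


hub = IoTHubSketch()
hub.register("pump-7")
hub.add_route(lambda m: True, "telemetry-event-hub")
hub.add_route(lambda m: m["value"] > 90, "alerts-service-bus")
first = hub.receive("pump-7", {"value": 95})
hub.revoke("pump-7")
second = hub.receive("pump-7", {"value": 99})
```

The first message matches both routes; after revocation the same device's messages are dropped.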


The hubs are not meant to be locations where data is stored for significant periods of time. Usually, storing a maximum of 24 hours of data records is recommended, though it is possible to extend the retention period. The batch layer is the proper location for storing longer histories.
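A time-based retention policy of this kind can be sketched in Python as a buffer that discards records older than a cutoff. This is an assumption about the general mechanism, not how any particular hub implements it; `RetentionBuffer` is a hypothetical name, and the sketch assumes records arrive in timestamp order.

```python
from collections import deque


class RetentionBuffer:
    def __init__(self, retention_seconds=24 * 3600):
        self.retention_seconds = retention_seconds
        self._records = deque()   # (timestamp, payload), oldest first

    def append(self, timestamp, payload):
        self._records.append((timestamp, payload))
        self._expire(now=timestamp)

    def _expire(self, now):
        # Drop everything older than the retention window.
        cutoff = now - self.retention_seconds
        while self._records and self._records[0][0] < cutoff:
            self._records.popleft()

    def __len__(self):
        return len(self._records)


buffer = RetentionBuffer()
buffer.append(0, "first")
buffer.append(3600, "one hour later")
buffer.append(86_401, "just over a day later")   # pushes "first" out of the window
```

Anything that must outlive the window would need to be copied to the batch layer before it expires.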

The streaming analytics engine enables real-time analysis of data that is being transmitted from the sensors and smart devices. Typical average data rates today are about 50 MB/second. As mentioned earlier, data can be directly queried from business intelligence tools. Data might also be queried using SQL, or machine learning scripts might be applied on an ongoing basis.
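One common form of ongoing analysis on data in motion is aggregation over fixed time windows. The Python sketch below computes a per-window average; it is a simplified illustration that processes a finite list rather than a live stream, and the function name and `(timestamp, value)` event shape are assumptions.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds):
    """Average the values in each fixed, non-overlapping time window.

    events: iterable of (timestamp_in_seconds, value) pairs.
    Returns a dict mapping each window's start time to its average.
    """
    totals = defaultdict(lambda: [0.0, 0])   # window start -> [sum, count]
    for timestamp, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        totals[window_start][0] += value
        totals[window_start][1] += 1
    return {start: total / count for start, (total, count) in sorted(totals.items())}


readings = [(0, 10.0), (30, 20.0), (61, 30.0)]
averages = tumbling_window_average(readings, window_seconds=60)
```

Here the first two readings fall in the window starting at 0 and the third in the window starting at 60.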

Data is typically then loaded into a batch layer data-management system (data lake, NoSQL database, or relational database). Immediate actions might be initiated after the analysis of incoming data, so scripts might be pushed upstream via event hubs or service bus queues.
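As a minimal sketch of triggering an immediate action, the Python snippet below places a command on a queue when a reading exceeds a limit. The queue stands in for an event hub or service bus queue; the `check_reading` function, the `reduce_speed` command, and the threshold rule are all hypothetical.

```python
from collections import deque


def check_reading(reading, limit, command_queue):
    """Queue a command for the device when a reading exceeds the limit."""
    if reading["value"] > limit:
        command_queue.append(
            {"device_id": reading["device_id"], "command": "reduce_speed"}
        )
        return True
    return False


commands = deque()   # stands in for a queue that delivers commands back to devices
check_reading({"device_id": "motor-2", "value": 88.0}, 90.0, commands)
check_reading({"device_id": "motor-2", "value": 95.5}, 90.0, commands)
```

Only the second reading exceeds the limit, so a single command is queued for `motor-2`.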

