Defining Data & Analytics Architecture for an Industry 4.0 Network: A Complete Primer

While the real goldmine in an Industry 4.0 network lies in strategically leveraging the data captured, careful and meticulous planning in developing an analytics architecture is critical.

Industrial analytics are unique in that the results of analysis often directly affect operations in the physical world and can also have safety implications. Actions taken on the basis of flawed analytics could be harmful or undesirable. Since industrial analytics interpret and prescribe actions that interact with other sensors and components, there is also the potential for conflict. Therefore, it is important to fully understand the various information streams so that correct decisions can be made. The following are some of the unique requirements for consideration:

  • Correctness: Industrial analytics requires a higher level of accuracy to avoid undesirable and unintended consequences in the physical world
  • Timing: Industrial analytics must deliver results within their prescribed time horizon to satisfy synchronization requirements and ensure reliable, high-quality operations
  • Safety: Strong safety requirements are necessary to safeguard workers, users, equipment, and the environment
  • Contextualized: Industrial analysis is always performed within the context of an activity, and an accurate and complete understanding of the analytic results requires an understanding of the processes and the states of the equipment and peripherals
  • Causal-oriented: As industrial systems have complex and causal relationships, the analytics must be modeled with domain-specific subject matter expertise linked to physical modelling and statistical, data science, and machine learning knowledge
  • Distributed: Many industrial systems include hierarchical tiers and are distributed geographically, and each tier or subsystem might have its own unique analytic requirements requiring localized analytics (requirements for timing and resilience can result in the analytics being distributed and implemented close to the source of data, and to the target of the analytic result)
  • Streaming: Due to the continuous execution or batch processing of industrial systems, most analytic data and analytic results will be streaming in nature so that the analytics will be applied to live data as it is generated or transmitted (traditional batch-oriented analytics might also provide information to improve analytic models and human decision-making)
  • Automatic: To support continuous operations, streaming analysis and application of analytic outcomes must be automated, dynamic, and continuous
  • Semantics: To properly understand the data and produce accurate analytic results, data needs to be understood in context, attributed at the source, and communicated to improve the accuracy (data that is inferred or taken out of context will result in uncertainty)

To glean useful analytic results, the architecture that is deployed should first efficiently collect data and then stream or store the data, or both, and then transform it for analysis. Robust data management is necessary to facilitate this process.

Successful analytics requires pre-processing of the data. The pre-processing techniques used are driven by the type and format of the data being produced, where the data is produced, and whether the rate of data generation allows it to be processed in batches or requires stream processing. The volume and speed of data in the Industrial Internet of Things (IIoT) can present several challenges.

Data Management

Data management in IIoT systems involves incorporation of various tasks and roles from a usage viewpoint, along with the functional components of the functional viewpoint. The activities for data management include the following:

  • Reduction and analytics
  • Publish and subscribe
  • Query
  • Storage, persistence, and retrieval
  • Integration
  • Description and presence
  • Data framework
  • Rights management

Data reduction and analytics

IIoT sensors have the potential to generate a huge amount of raw data that possesses inherent value. Additional analysis is often required to gain important insights. However, it can be expensive and time-consuming to transmit such volumes of data over networks. The speed and volume of data can make some preprocessing necessary prior to transmission.

A variety of preprocessing techniques are typical. Data volume might be reduced through aggregation. Another method of data reduction is to perform sampling and then filter samples within the stream. Statistical and machine learning algorithms can be applied at the data source or edge so that only the analytic results are transmitted through the network. Moving these functions to the edge can substantially reduce network traffic.
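As a rough illustration of the reduction techniques described above, the following Python sketch aggregates raw readings into per-window summaries and filters a stream so that only meaningful changes are kept. The function names, the window size, and the deadband threshold are assumptions made for this example, and the deadband filter is just one possible filtering choice, not a prescribed method.

```python
from statistics import mean

def aggregate_windows(readings, window_size):
    """Reduce raw readings to one (min, mean, max) summary per fixed-size window."""
    summaries = []
    for start in range(0, len(readings), window_size):
        window = readings[start:start + window_size]
        summaries.append((min(window), mean(window), max(window)))
    return summaries

def deadband_filter(readings, threshold):
    """Keep a sample only if it differs from the last kept sample by more than threshold."""
    kept = []
    for value in readings:
        if not kept or abs(value - kept[-1]) > threshold:
            kept.append(value)
    return kept

# Hypothetical temperature readings captured at the edge.
raw = [20.0, 20.1, 20.0, 20.2, 25.0, 25.1, 25.0, 20.0]
summaries = aggregate_windows(raw, window_size=4)   # 8 samples become 2 summaries
changes = deadband_filter(raw, threshold=1.0)       # 8 samples become 3 change points
```

Either result is much smaller than the raw stream, which is the point of running this kind of step at the edge before anything crosses the network.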

Publish and subscribe

Publish and subscribe describes a reliable method for sending or publishing data by a person or process, at predefined intervals, to parties that have indicated an interest in the topic. These parties are called subscribers. The publisher and subscriber do not necessarily know about each other. A broker is used to filter and distribute the data. Data collected at the edge might be published to a consolidation and aggregation tier.

Publish and subscribe methods enable scalability for many data sources and consumers, and there is a reliable flow of management services to the devices. Broker operations can be run in parallel, and caching and intelligent routing can be employed to improve scalability. For extremely high volumes, clustered broker nodes can distribute the load using load balancers.
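The decoupling between publishers and subscribers can be sketched with a tiny in-memory broker. This is only a toy model under stated assumptions: the Broker class and the topic names are invented for illustration, and a production system would use a real messaging product with persistence, delivery guarantees, and clustering.

```python
from collections import defaultdict

class Broker:
    """In-memory broker: publishers and subscribers only know topic names."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register interest in a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)

# Hypothetical topics: an aggregation tier subscribes to one edge data stream.
broker = Broker()
received = []
broker.subscribe("plant1/press4/temperature", received.append)
delivered = broker.publish("plant1/press4/temperature", {"celsius": 71.5})
ignored = broker.publish("plant1/press4/vibration", {"mm_s": 2.1})  # no subscribers
```

The publisher never holds a reference to the subscriber. Both sides only agree on the topic name, which is what lets sources and consumers be added independently.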


Query

A query enables the filtering and selection of data. Queries can be executed against a dataset by a user, by an addressable device (through web sockets), by an application, or through analytical, reporting, visualization, and other tools. Query results can be pushed to a gateway to provide a data source for higher-level brokers.

Query models are typically one-time or continuous. A one-time query selects data in response to a request, while a continuous query produces a stream of data. One-time queries return a single result set and are well suited for the publish-subscribe data pattern. Continuous queries return data in real time or at prescribed intervals and can be used for tracking, monitoring, real-time analytics, and machine learning.
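The difference between the two query models can be shown in a few lines of Python. This is a minimal sketch: the function names, the record layout, and the pressure limit are assumptions for illustration. The one-time query evaluates once and returns a list, while the continuous query is a generator that keeps yielding matches as rows arrive.

```python
def one_time_query(dataset, predicate):
    """Evaluate once against stored data and return a single result set."""
    return [row for row in dataset if predicate(row)]

def continuous_query(stream, predicate):
    """Lazily yield matching rows as they arrive from a possibly unbounded stream."""
    for row in stream:
        if predicate(row):
            yield row

# Hypothetical pressure readings; the same filter is used for both models.
stored = [
    {"device": "pump-1", "psi": 88},
    {"device": "pump-2", "psi": 131},
    {"device": "pump-1", "psi": 140},
]

def over_limit(row):
    return row["psi"] > 120

snapshot = one_time_query(stored, over_limit)      # complete result set, computed now
live = continuous_query(iter(stored), over_limit)  # nothing is evaluated yet
first_alert = next(live)                           # produced when the first match arrives
```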

Storage, persistence, and retrieval

IIoT systems typically generate huge quantities of data that needs to be processed and/or stored for record keeping, post-processing, and analysis. There can also be regulatory or audit considerations as to what types of data must be recorded and preserved, and for how long.

Stored time-series data produced by devices might be used for replaying events or simulating scenarios. Time-series data might be persisted by a historian, deployed using a NoSQL database or Hadoop-based system, or archived by the historian to a relational database or other data management system.
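To make the replay idea concrete, here is a small sketch that persists time-series readings and retrieves them in time order for one device. It uses Python's built-in sqlite3 module purely as a stand-in for a historian or other store. The table layout, device names, and helper functions are assumptions for this example.

```python
import sqlite3

# An in-memory database stands in for the historian or archive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts REAL, device TEXT, value REAL)")

def persist(rows):
    """Store (timestamp, device, value) tuples; arrival order does not matter."""
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()

def replay(device, start_ts, end_ts):
    """Return one device's readings in time order for the requested interval."""
    cursor = conn.execute(
        "SELECT ts, value FROM readings "
        "WHERE device = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (device, start_ts, end_ts),
    )
    return cursor.fetchall()

# Hypothetical readings that arrive out of order from two devices.
persist([
    (3.0, "motor-7", 41.2),
    (1.0, "motor-7", 40.1),
    (2.0, "motor-9", 55.0),
    (2.0, "motor-7", 40.6),
])
events = replay("motor-7", 1.0, 2.5)
```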


Integration

The components and applications in an IIoT solution produce varied data specific to their respective functions. This data was traditionally retained in their respective silos in field locations. To achieve a more thorough understanding of the business, data integration across these silos is required. For example, in telecom companies, revenue leakage occurs when calling services are provided, but these billable calls are not passed on to the billing system. Integrating network usage data, linked to devices with the billing systems, can uncover these unbilled calls.

Traditional systems use a process of extraction, transformation, and loading (ETL) to extract data from one or more source systems, integrate and transform the data, and load the resulting data into a target system. This transformation frequently includes performing aggregation or other functions to enable analysis. Where large data volumes are at play, the processing often more closely follows an ELT pattern where transformations occur in the target.

In IIoT systems, heterogeneous devices can have different syntax, semantics, and APIs. A transformation to a common semantic framework is often necessary for effective analysis. Domain transformation might be used for protocol conversions.
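A transformation to a common semantic framework might look something like the sketch below, where payloads from two device families are mapped onto one shared record shape. The vendor names, field names, and units are entirely hypothetical and only serve to show the pattern.

```python
def normalize(payload):
    """Map a vendor-specific payload onto a common {device_id, temperature_c} record."""
    vendor = payload.get("vendor")
    if vendor == "vendor_a":
        # Assumption: vendor A reports degrees Fahrenheit under "tempF".
        celsius = (payload["tempF"] - 32.0) * 5.0 / 9.0
        return {"device_id": payload["id"], "temperature_c": round(celsius, 2)}
    if vendor == "vendor_b":
        # Assumption: vendor B reports tenths of a degree Celsius under "t_decic".
        return {"device_id": payload["serial"], "temperature_c": payload["t_decic"] / 10.0}
    raise ValueError(f"unknown vendor: {vendor!r}")

record_a = normalize({"vendor": "vendor_a", "id": "A-17", "tempF": 212.0})
record_b = normalize({"vendor": "vendor_b", "serial": "B-03", "t_decic": 215})
```

Once every source is expressed in the same fields and units, downstream analytics can treat the devices uniformly instead of carrying vendor-specific logic.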

IIoT integration challenges can be addressed by taking the following steps:

  • Address the APIs first to determine the integration requirements and if your existing integration capabilities are sufficient
  • Identify the communication requirements for the devices and select the most appropriate technology, taking into consideration how the technology can handle the number and types of devices, while determining the network topology that best meets the requirements
  • Leverage cloud-based deployment models to integrate IIoT platforms with business processes
  • If the IIoT system data and applications are on-premises, or mostly on-premises, you might consider using traditional integration tools already in-house, but keep in mind that these tools might not be optimal for IIoT connectivity or cloud service integration
  • Add an API management solution to your IIoT project, especially if the project has many APIs, or the APIs have large numbers of consumers or return restricted or sensitive data

Description and presence

Description and presence refers to awareness about devices, networking, software, and systems. This is required for effective management and the dynamic integration of new capabilities. Understanding data descriptions and availability is also a fundamental requirement.

Metadata provides the descriptive information. For example, when you select a movie to watch, you are probably interested in the movie name, movie genre, plot summary, movie length, the movie’s star rating, release date, and digital format. This is the movie’s metadata. It enables you to gain an understanding of the movie without having to watch it.

Metadata in IIoT describes the definitions and structure of the data. Description and presence enables discovery of information about the data structures and devices. Metadata analysis can reveal patterns, correlations, and trends without the need to examine the data itself.

Data framework

The data framework monitors the data exchange components and provides diagnostic data, such as the status and behavior of the components and information regarding data volumes and usage. As new data components are introduced to the framework, they are discoverable. The data framework can also be used to maintain a publish-subscribe data catalog so that users and components can discover updates to published data. The data can be exposed through dashboards and system consoles corresponding to the technologies and components.

The data framework tracks the following:

  • Component presence discovery, identifying past and present framework participants and newly added components
  • Component activity monitoring of data statistics such as update frequency, throughput, and system loads and memory usage
  • Traffic monitoring of data flow statistics such as throughput, latency, and data volume

An IIoT-specific systems management console in the data framework enables testing and diagnostics.
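The traffic statistics listed above can be derived from very little information per message. The following sketch assumes each observed message carries a send time, a receive time, and a size. The Message class and the field names are illustrative assumptions, not part of any particular framework.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Message:
    sent_ts: float      # seconds, when the producer emitted the message
    received_ts: float  # seconds, when the consumer received it
    size_bytes: int

def traffic_stats(messages):
    """Summarize throughput, latency, and volume for a batch of observed messages."""
    if not messages:
        return {"messages_per_s": 0.0, "mean_latency_s": 0.0, "total_bytes": 0}
    # Observation span: first send to last receive.
    span = max(m.received_ts for m in messages) - min(m.sent_ts for m in messages)
    return {
        "messages_per_s": len(messages) / span if span > 0 else float("inf"),
        "mean_latency_s": mean(m.received_ts - m.sent_ts for m in messages),
        "total_bytes": sum(m.size_bytes for m in messages),
    }

# Hypothetical observations for one data flow.
observed = [
    Message(0.0, 0.25, 100),
    Message(0.5, 0.75, 200),
    Message(1.0, 1.25, 300),
    Message(1.75, 2.0, 400),
]
stats = traffic_stats(observed)
```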

Rights management

Rights management describes data ownership and rights and relies heavily on security functions to control data privileges to ensure privacy and to protect data from unauthorized manipulation. Data owners and managers maintain stewardship of the data under their purview. They can grant or revoke certain rights to all data or defined sets or subsets of raw data, aggregated data, or consolidated data.

Rights management is key to meeting regulatory and compliance requirements and to keeping track of ownership when data-related functions are outsourced to cloud providers or other third parties. Rights can be associated with APIs for granting authority and accessing a device.

A graph database can be useful to manage the relationships between devices, users, location, networks, and permissions. Blockchain might provide a scalable mechanism for verifying and sharing access to data and assets in the IIoT system.
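The grant and revoke behavior described in this section can be modeled very simply. The sketch below keeps rights as (principal, dataset, right) triples in memory. The RightsRegistry class and all names in the usage are assumptions for illustration. A real deployment would sit on top of the platform's security functions and a durable store such as the graph database mentioned above.

```python
class RightsRegistry:
    """Tracks which principal holds which right on which named dataset."""

    def __init__(self):
        self._grants = set()  # {(principal, dataset, right)}

    def grant(self, principal, dataset, right):
        self._grants.add((principal, dataset, right))

    def revoke(self, principal, dataset, right):
        # discard() makes revoking a right that was never granted a no-op.
        self._grants.discard((principal, dataset, right))

    def is_allowed(self, principal, dataset, right):
        return (principal, dataset, right) in self._grants

# Hypothetical scenario: a third party may read aggregated data, never raw data.
registry = RightsRegistry()
registry.grant("cloud-vendor", "line3/aggregated", "read")
can_read_aggregated = registry.is_allowed("cloud-vendor", "line3/aggregated", "read")
can_read_raw = registry.is_allowed("cloud-vendor", "line3/raw", "read")
registry.revoke("cloud-vendor", "line3/aggregated", "read")
can_read_after_revoke = registry.is_allowed("cloud-vendor", "line3/aggregated", "read")
```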

Creating business value

Commercial industrial organizations, just like any business, must maintain and increase financial margins to stay in business and remain competitive. The industrial organization can achieve this, at least in part, by increasing production and reducing expenses and inventory costs. Industrial analytics can help the business stakeholders of your IIoT project identify bottlenecks and balance operational processes with demand, product, and inventory.

A focus on solving these business needs, together with success stories in which analytics enabled by IIoT projects made similar companies more competitive, often drives companies to move forward on these projects.

Analytics functionality

Industrial analytics footprints must provide certain features to deliver solutions to functional and non-functional requirements while addressing the complexity present in this multi-domain architecture. These features include the following:

  • Visualization: Displays and manipulates data and analytic results using charts and graphs
  • Exploration: Ad-hoc querying of stored data
  • Design: Analytics automation for data quality, mining and machine learning, and business intelligence
  • Orchestration: Distributes requests over clusters of computing resources to collect and aggregate data
  • Connection: Exchange of data and work between components
  • Cleansing: Removal of irrelevant and duplicated data and noise, and merging data from multiple sources
  • Computation: Execution of statistical and machine learning calculations
  • Validation: Governance ensuring analytic results are accurate
  • Application: Analytic results used to improve or correct automation, or aid human decision making
  • Storage: Historical archival of incoming data
  • Supervision and management: Monitoring, updating, correcting, and optimizing the information model, metadata, data sources, processes, and computing resources

Industrial analytic activities depend on the availability of, and access to, data from industrial processes and assets. The distributed nature of IIoT, and the need for analytics to produce results in time to take meaningful actions, sometimes push the analytics to the edge or to middle tiers where data is streaming. Once the analytics are performed, the values might be archived via batch feeds to data management systems where further analysis is possible by data scientists and subject matter experts (SMEs). There, they might interpret and validate readings or recommend additional filtering or sampling. If further analysis is not deemed necessary, the raw data can be discarded.

To enable the continuous processing of industrial data, an analytics workflow can be developed within the data framework and automated. The workflow automation orchestrates the transformation of raw data into analytic results and executes the analytic prescriptions. As more is learned about the processes, workflows and their content can be fine-tuned to improve accuracy and produce better results. Workflows should also be versioned.
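One way to picture such a workflow is as an ordered list of steps with a version label attached to every result. The sketch below is only an illustration: the Workflow class, the step functions, and the 80 degree threshold are assumptions made for this example.

```python
class Workflow:
    """An ordered chain of steps that turns raw data into a prescribed action."""

    def __init__(self, version, steps):
        self.version = version
        self.steps = list(steps)

    def run(self, raw):
        data = raw
        for step in self.steps:
            data = step(data)
        # Tagging the result with the version makes outcomes traceable after tuning.
        return {"workflow_version": self.version, "result": data}

def drop_missing(values):
    """Cleansing step: discard readings that failed to arrive."""
    return [v for v in values if v is not None]

def average(values):
    """Computation step: reduce the readings to their mean."""
    return sum(values) / len(values)

def prescribe(avg_temp_c):
    """Application step: hypothetical rule mapping a result to an action."""
    return "reduce_speed" if avg_temp_c > 80.0 else "no_action"

workflow_v1 = Workflow("1.0.0", [drop_missing, average, prescribe])
outcome = workflow_v1.run([78.0, None, 86.0, 82.0])
```

When a step is later tuned, publishing it as a new version (for example "1.1.0") rather than editing it in place keeps earlier results attributable to the logic that produced them.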

Finally, the analytic results should be communicated in an understandable format that improves human understanding and decision making. Users will want to interact with and visualize the data in diverse ways. Some might want to drill through aggregations to details via hierarchies and related items.

As the analytics are honed over time, increasingly meaningful patterns can be discovered. Anomalies might be detected and alerts can be sent to operators along with supporting data as required. The root cause of anomalies and faults can be diagnosed, and prescribed actions might be taken automatically or through human intervention. By applying analytics to optimize the operating parameters and operational efficiency, failures can be avoided. Failures caused by human error can be reduced or eliminated.
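As a simple illustration of anomaly detection on a stream, the sketch below flags samples that lie far from the mean of the preceding window, using a rolling z-score. This is one basic technique chosen for the example, not the method the text prescribes. The window size, the limit of three standard deviations, and the sample data are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev

def detect_anomalies(stream, window=5, z_limit=3.0):
    """Yield (index, value) for samples far from the mean of the preceding window."""
    history = deque(maxlen=window)
    for index, value in enumerate(stream):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > z_limit:
                yield index, value
        # Note: flagged values still enter the window in this simple version.
        history.append(value)

# Hypothetical vibration readings with one obvious spike.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 14.0, 10.0]
alerts = list(detect_anomalies(readings))
```

Each alert carries the index and value of the offending sample, which is the kind of supporting data an operator would need to start diagnosing the root cause.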

Applying analytics to improve operational efficiency can result in optimal operation of devices and equipment and in reduced human stress. However, the proper data must be provided at the proper time, and appropriate analytical models and algorithms must be applied, guided by engineering and business domain knowledge.

Mapping analytics architecture to reference architecture

According to the Industrial Internet Consortium (IIC), the domains in the IIoT functional architecture include the following:

  • Control domain: It provides functions for asset management, sensing, actuation, communication, entity abstraction, and modeling
  • Operations domain: It enables provisioning, management, monitoring, diagnostics, and optimization of devices in the control domain
  • Information domain: It consists of data ingestion, quality and cleansing, transformation, persistence, cataloging, analytics, and governance
  • Application domain: It includes logic, rules, models and interfaces addressing business requirements
  • Business domain: It includes enterprise resource planning, human resources, asset management, billing and payments, work planning and scheduling, and customer relationship applications

The following diagram illustrates these domains and where the analytics are primarily applied:


Analytics in the control domain consists of edge analytics that provides real-time insight into operations. Here, the device time horizon can be milliseconds or less. Analytics and resulting actuation that occurs in the control domain is usually automated.

Analytics applied in the application and operations domains requires responses measured in seconds, and these responses are also usually automated. Such responses based on results from streaming analytics might include automatic fault detection and diagnosis, or automated adjustments to improve efficiency.

Analytics relevant to the business domain can aid in business planning, improve processes, and enable intelligent business processes. These analytics are typically used for planning, and the required response time to make a business decision can be measured in days. Batch analytics is more typically applied here.
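The relationship between response time and domain can be summarized as a simple lookup. In the sketch below, the cut-off values of one second and one hour are purely illustrative assumptions. The text above only gives orders of magnitude (milliseconds, seconds, days), so real boundaries would come from the requirements of the specific system.

```python
def analytics_tier(required_response_s):
    """Suggest a domain and analytics style for a required response time in seconds."""
    if required_response_s < 1.0:
        # Sub-second horizons: automated analytics and actuation at the edge.
        return ("control", "edge analytics")
    if required_response_s < 3600.0:
        # Seconds to minutes: automated responses driven by streaming analytics.
        return ("operations/application", "streaming analytics")
    # Hours to days: planning decisions supported by batch analytics.
    return ("business", "batch analytics")

valve_loop = analytics_tier(0.005)          # 5 ms control loop
fault_diagnosis = analytics_tier(10.0)      # respond within seconds
capacity_plan = analytics_tier(3 * 86400)   # decision due in three days
```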

Advanced analytics

Advanced analytics involves applying mathematical functions to data to understand and forecast trends, define clusters using common features, and discover relationships. For example, in an industrial setting, advanced analytics can detect and predict potential faults.

Advanced analytics can be characterized by the following approaches:

  • Automated: Analysis runs continuously and the results are applied back into the system to improve optimization and performance
  • Real-time: Analysis occurs as data is received to provide immediate results and prescriptive actions
  • Streaming: Analysis is performed on a data flow in memory or other transient location without loading the data into a full-fledged data-management system
  • Active: Components share analytic results in real time to enable rapid response
  • Causal-oriented: Physical and neural network deep learning are applied to identify causal relationships
  • Distributed: Analysis is performed across domains and systems using shared processing
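The streaming approach in the list above, where analysis runs on data in flight without loading it into a data-management system, can be illustrated with Welford's online algorithm. It maintains a running mean and variance in constant memory, one sample at a time. The class name and sample values are assumptions for this sketch.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new sample into the statistics without storing it."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0

# Hypothetical sensor samples processed one at a time as they arrive.
stats = RunningStats()
for sample in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(sample)
```

Because no history is retained, the same object can run indefinitely on an edge device regardless of how many samples pass through it.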

The unique characteristics of IIoT solutions often require additional robustness, speed, and accuracy from the analytics, especially when the analyses affect the viability of the business or safety.

Network latency and reliability are critical to taking real-time actions. Inadequate network bandwidth will inhibit the flow of data. If these limitations create timing constraints, analytics must be performed near both the data source and the target that the analytic results are used to control.

In a control system where high-resolution time-series data is generated at high frequency, data volumes can overwhelm the available network bandwidth, and real-time control can become impossible. In these systems, data needs to be dynamically bound to the analytic functions at the edge using dynamic composition and automated interoperability. High-volume data might then be transmitted periodically or on demand to analytic systems where it can be analyzed for patterns, anomalies, and causal relationships.
