**Scenario**

Let us assume that we have a fashion apparel company that manufactures 11 different SKUs on different production lines. Each production line produces discrete units of one particular SKU. Once a unit has been produced, it goes into storage.

## Blockchain Transaction Scenario

When a unit of a SKU goes into storage, a block in a blockchain can represent that transaction with two features that can be of interest for a machine learning algorithm:

- The day the garment was stored
- The total quantity of that SKU now in storage

Since the blockchain contains the storage blocks of all storage locations (say A, B, C, D, E, and F locations) that are part of the network, a machine learning program can access the data and make predictions. The goal is to spread the stored quantities of the given product evenly over the six locations, as represented in the following histogram:

The histogram shows the storage level of a product (say P) distributed over six locations. Each location in this blockchain network is a **hub**. A hub in **supply chain management** (**SCM**) is often an intermediate storage warehouse. For example, to cover the area of these six locations, the same product is stored at each location. This way, local trucks can come and pick up the goods for delivery.

In a world of real-time production and sales, distributors need to predict **demand**. The system needs to be **demand driven**. Naive Bayes can address that problem. It will take the first two features into account:

- **DAY**: The day the garment was stored
- **STOCK**: The total quantity of that SKU now in storage

Then it will add a new feature: the number of blocks related to product P.

A high number of blocks at a given date means that this product was in demand in general (production, distribution). The more blocks there are, the more transactions there are. Also, if the storage levels (STOCK feature) are diminishing, this is an indicator; it means storage levels must be replenished. The DAY feature time-stamps the history of the product.

The block feature is named **BLOCK**. Since the blockchain is shared by all, a machine learning program can access reliable global data in seconds. The dataset reliability provided by blockchains constitutes a motivation in itself to optimize blocks.
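As a minimal sketch of how the three features could be derived, the following assumes that each storage block can be read as a simple record. The `StorageBlock` class, its field names, the `features_for` helper, and the sample values are all hypothetical and only illustrate the idea; they do not describe any real blockchain API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageBlock:
    """One hypothetical storage transaction read from the chain."""
    product: str  # SKU identifier, for example "P"
    day: int      # DAY: the day the unit was stored
    stock: int    # STOCK: quantity of that SKU in storage after the transaction


def features_for(blocks, product, today, window=30):
    """Return (DAY, STOCK, BLOCK) for one product, scanning back `window` days.

    BLOCK is the number of blocks found for the product in the window.
    Returns None when no block for the product falls inside the window.
    """
    recent = [
        b for b in blocks
        if b.product == product and today - window < b.day <= today
    ]
    if not recent:
        return None
    latest = max(recent, key=lambda b: b.day)
    return latest.day, latest.stock, len(recent)


# Illustrative values only, not real data.
chain = [
    StorageBlock("P", 1, 1400),
    StorageBlock("P", 12, 1250),
    StorageBlock("Q", 15, 300),
    StorageBlock("P", 28, 900),
    StorageBlock("P", 40, 700),
]

print(features_for(chain, "P", today=40))  # (40, 700, 3)
```

The block on day 1 falls outside the 30-day window, so only three blocks are counted for product P.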

### The goal

A naive Bayes program will take the DAY, STOCK, and BLOCK (number of blocks) features for a given product P and produce a result. The result predicts whether this product P will be in demand or not. If the answer is yes, or 1, the demand for this product requires anticipation.

#### Step 1 the dataset

The dataset contains raw data from prior events in a sequence, which makes it perfect for prediction algorithms. This constitutes a unique opportunity to see the data of all companies without having to build a database. The raw dataset will look like the following list:

This dataset contains the following:

- **Blocks**: blocks of product P are present in the blockchain on day *x*, scanning the blockchain back by 30 days. **No** means that no significant number of blocks has been found. **Yes** means that a significant number of blocks has been found. If blocks have been found, there is demand for this product somewhere along the blockchain.
- **Some_blocks**: blocks have been found, but they are too sparse to be taken into account without overfitting the prediction. However, both yes and no values will contribute to the prediction.
- **No_blocks**: there is no demand at all, whether sparse (**Some_blocks**) or numerous (**Blocks**). This means trouble for product P.
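The three categories can be sketched as a simple mapping from a 30-day block count to a label. The `block_status` function and its `significant_from` cutoff are assumptions made for illustration; the text does not say what counts as a "significant" number of blocks, so the value 5 is arbitrary.

```python
def block_status(block_count, significant_from=5):
    """Map a 30-day block count to one of the three dataset categories.

    `significant_from` is a hypothetical cutoff: counts at or above it are
    treated as a significant number of blocks.
    """
    if block_count < 0:
        raise ValueError("block_count cannot be negative")
    if block_count == 0:
        return "No_blocks"      # nothing found at all
    if block_count < significant_from:
        return "Some_blocks"    # found, but too sparse
    return "Blocks"             # a significant number of blocks


for count in (0, 2, 9):
    print(count, block_status(count))
```

In practice, the cutoff would have to be chosen from the observed history of the product rather than fixed in advance.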

#### Step 2 frequency

Looking at the following frequency table provides additional information:

The **Yes** and **No** statuses of each feature (**Blocks**, **Some_blocks**, or **No_blocks**) for a given product P for a given period (past 30 days) have been grouped by frequency.

The bottom line of the table gives the totals for the **No** column and the **Yes** column. For example, **Yes** and **No_blocks** add up to 2.

Some additional information will prove useful for the final calculation:

- The total number of samples = 10
- The total number of yes samples = 8
- The total number of no samples = 2
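The totals above are enough to derive the prior probabilities of each outcome. The short sketch below does that with exact fractions; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Totals taken from the frequency table summary.
total_samples = 10
yes_samples = 8
no_samples = 2

# Sanity check: the two outcomes must account for every sample.
assert yes_samples + no_samples == total_samples

p_yes = Fraction(yes_samples, total_samples)  # prior P(Yes)
p_no = Fraction(no_samples, total_samples)    # prior P(No)

print(float(p_yes), float(p_no))  # 0.8 0.2
```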

#### Step 3 likelihood

Now that the **frequency** table has been calculated, the following **likelihood** table is produced using that data:

**Blocks** represent a large proportion of the samples, which means that, along with **Some_blocks**, the demand looks good.

#### Step 4 naive Bayes equation

The goal now is to map each term of Bayes' theorem onto the naive Bayes equation, obtain the probability of **demand** for product *P*, and trigger a purchase scenario for the blockchain network. Bayes' theorem can be expressed as follows:

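$$
P(\text{Yes} \mid \text{Blocks}) = \frac{P(\text{Blocks} \mid \text{Yes}) \, P(\text{Yes})}{P(\text{Blocks})}
$$

$$
\begin{aligned}
P(\text{Yes}) &= 8/10 = 0.8 \\
P(\text{Blocks}) &= 5/10 = 0.5 \\
P(\text{Blocks} \mid \text{Yes}) &= 4/8 = 0.5 \\
P(\text{Yes} \mid \text{Blocks}) &= (0.5 \times 0.8)/0.5 = 0.8
\end{aligned}
$$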
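The same calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic above, using the counts given in the text; the `posterior` helper is a name introduced here for illustration.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


p_yes = Fraction(8, 10)              # P(Yes): 8 yes samples out of 10
p_blocks = Fraction(5, 10)           # P(Blocks): 5 Blocks samples out of 10
p_blocks_given_yes = Fraction(4, 8)  # P(Blocks|Yes): 4 of the 8 yes samples

p_yes_given_blocks = posterior(p_blocks_given_yes, p_yes, p_blocks)

print(p_yes_given_blocks, float(p_yes_given_blocks))  # 4/5 0.8
```

Using `Fraction` keeps the intermediate values exact, so the result matches the hand calculation without floating-point rounding.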

The demand looks acceptable. However, penalties must be added and other factors must be considered as well, such as transportation availability found through other block exploration processes.

This example and method show the concept of the naive Bayes approach. However, scikit-learn has excellent scientific functions that make implementation easier. Blocks in a blockchain provide sequences of unlimited data. Exploring the scikit-learn classes for naive Bayes is an excellent way to start a gold mining adventure in the world of blockchains.