Multi-chain and data-chains partitioning algorithm in intelligent manufacturing CPS

This paper proposes a new algorithm for blockchain systems based on multi-chain storage for CPS under edge cloud computing, in which the nodes of the system are divided according to their communication tightness. The experimental results show that partitioning the nodes under the multi-chain storage structure reduces the time and space needed for data synchronization and improves the scalability of the system. In addition, the multi-chain structure isolates the data of different chains from each other, which prevents the leakage of sensitive data and improves the information security of the system.


Introduction
Industry 4.0 represents the beginning of the Information Revolution and current trends in manufacturing automation, including Cyber-Physical Systems (CPS), the Internet of Things and Cloud Computing [1]. CPS is considered to be an Industry 4.0 technology that brings the virtual world and the physical world together [2]. CPS integrates knowledge engineering in manufacturing, software systems for manufacturing equipment, and robot vision and control capabilities; it models expert knowledge and manual skills, enabling intelligent robots to perform mass production tasks without intervention [3]. In addition, CPS is characterized by real-time operation, efficiency, reliability and safety [4].
With the rise of the fourth industrial revolution (Industry 4.0), decentralization is becoming important in the integration of Cyber-Physical Systems (CPS) with cloud computing infrastructure. In a CPS, a large amount of data is stored in a database whose content is easily leaked and tampered with. As a result, product problems cannot be traced when they occur, and the authenticity of the data is difficult to guarantee. Blockchain has the characteristics of decentralization, data immutability and transparency, which guarantee that the stored data is traceable and cannot be tampered with. However, each block of a blockchain has a small capacity and the blockchain itself scales poorly, so it cannot meet the needs of CPS on its own. Therefore, edge computing is introduced under the framework of blockchain. Edge computing is used to manage the local network, package data formats and provide computing power. Data is transmitted from the intelligent terminal to the edge node through the edge gateway [5].
*Correspondence: wh_red@163.com Department of Computer, Guangdong University of Technology, Waihuan West Road, GuangZhou, China
The main contribution of this paper is to put forward a data-chains partitioning algorithm under a multi-chain structure that is suitable for Cyber-Physical Systems. In this paper, the basic structure of the multi-chain is composed of a communication-chain and some data-chains. Theoretical analysis and experimental results show that the storage structure and the data-chains partitioning algorithm can reduce the data synchronization time and space, and also improve the expansion ability of the system. Besides, the data in the system can be isolated from each other with the multi-chain structure, thus preventing the leakage of sensitive data and improving the information security of the system.
The remainder of this paper is organized as follows. In the Related work section, we discuss the related work on blockchain storage and system extension. In the Data-chains partitioning algorithm section, we propose an algorithm that divides data into different chains according to a community structure clustering method applied to the data nodes. In the Experimental analysis section, we present an efficiency analysis of the proposed algorithm and provide illustrative numerical results. Finally, the Conclusion section concludes the paper.

Related work
The edge computing in cyber-physical system
The edge computing paradigm was explored in order to alleviate the limitations of cloud-centric architecture [6]. The edge computing architecture transfers part of the overall computing power of the system from the cloud to the edge nodes (in the field or near it) as a means of saving bandwidth and storage: the edge node can filter the data flow from the field and remove information of no value to industrial automation. It enables low-latency processing because information can be processed close to the scene, and it provides enhanced scalability by supporting distributed storage and processing on a larger scale than cloud processing. The overall architecture, which combines edge computing with the different components of the blockchain, is shown in Fig. 1.

Research and development status of blockchain and cyber-physical system
The production tasks of an intelligent manufacturing CPS involve a large number of intelligent production devices, including different processing centers and various intelligent handling equipment. All the records of this equipment are stored in a single third-party database. The process of data transmission is complex and exposed to various security problems. The data stored in the central database is easy to steal and hard to recover after an attack. Moreover, storing the data in the cloud or in a third-party database increases the communication load, so it is difficult for the devices to process the data in real time.
Blockchain is a general term for a class of distributed databases. The nodes in a blockchain system share the data in the system, and the consensus of all or most of the nodes ensures that the recorded data is trusted and correct [7]. It is regarded as one of the most important technologies for the fourth industrial revolution, although there exist few real-world production-level applications because the performance of most existing blockchain systems is hardly tolerable [8]. A public-ledger blockchain can process only 3 to 7 transactions per second, which means that the blockchain does not scale. Blockchain storage schemes such as Bitcoin-NG [9] and FACTOM [10] have been proposed; they slightly improve the performance of blockchains and support simple payment verification, but their low validation efficiency cannot support the validation of a large amount of data. When the data of all nodes is stored on a single chain, the consensus speed of the blockchain system is greatly reduced.
In a Cyber-Physical System, the interaction between nodes must satisfy real-time requirements [11]. Aiming at the problem of secure data storage in CPS, Liu et al. [12] proposed a hierarchical cooperative storage architecture comprising data storage, master control and authentication service nodes. The storage master node is the core of all storage management operations and is responsible for the management of storage metadata. Wang et al. [13] proposed a new reliable exchange protocol, which provides a recovery method for network and local system failures through an appropriate convertible signature scheme and message recording method. Becker et al. [14] explored the possibility of integrating the Knowledge Management System (KMS) and System Integrator(s) in Cyber-physical Production Systems (CPPS). In the first stage of their approach, they presented a CPPS development network, including its relations and the applied Systems Engineering approach. Zhu et al. [15] proposed a controllable blockchain data management (CBDM) model that can be deployed in a cloud environment, and evaluated its security and performance to demonstrate its utility. Zhao et al. [16] proposed a new architecture called secure pub-sub (SPS) without middleware, i.e., blockchain-based fair payment with reputation. Unlike traditional pub-sub services, no trusted third party is involved in their system because it employs blockchain techniques. The security of the proposed SPS is analyzed as well, and the implementation of the protocol as a smart contract on Ethereum demonstrates the validity of SPS. Peng et al. [17] proposed a Verifiable Query Layer (VQL), a middleware layer that extracts transactions stored in the underlying blockchain system and efficiently reorganizes them in databases to provide various query services for public users. To alleviate the inconsistency between dimensionality reduction and feature retention in anomaly detection, Zhou et al. [18] proposed a Variational Long Short-Term Memory (VLSTM) learning model for intelligent anomaly detection based on reconstructed feature representation. In addition, they focused on the modeling and analysis of patient-physician-generated data based on an integrated CNN-RNN framework [19].

Extension schemes to improve the performance of blockchain system
In order to improve the performance of blockchain systems, scholars have proposed many kinds of high-performance schemes. At present, there are three main schemes in the industry. The first is to change the traditional structure of the blockchain and replace it with a transaction-based directed acyclic graph. The second is to change the consensus strategy by reducing the number of nodes participating in consensus. The third is to improve the overall throughput of the system by improving its horizontal scalability. The technologies represented by this scheme include sharding, sub-chains and multi-channel. For this kind of technology, data synchronization must be ensured within each shard, sub-chain or channel of the blockchain system, while the operations among shards, sub-chains and channels are asynchronous. Sharding divides the nodes in the underlying network of the whole blockchain into several relatively independent areas to realize the horizontal expansion of the system. A sub-chain is a blockchain with a certain independence that is extended from the main chain; it depends on the main chain, and its own consensus mode and execution module can be defined according to actual needs. In multi-channel technology, a channel is composed of multiple nodes according to a certain grouping strategy or partitioning algorithm, and multiple such channels can be created in the system at the same time to meet its horizontal expansion needs. At present, many scholars and enterprises have put forward different framework models based on the third kind of scheme. Wu et al. [20] reviewed existing blockchain-based solutions after introducing the architecture for 5G-enabled and the existing legislation and data privacy regulations that need to be considered in the design of blockchain-based solutions, which will hopefully inform the future research agenda. Yin et al. [21] proposed a bidding mechanism to better encourage vehicles to contribute their resources, with the tasks for those vehicles scheduled accordingly. They also developed a blockchain framework to achieve secure information exchange through smart contracts for the proposed models in IoV. Wang et al. [22] proposed loan on blockchain (LoC), a novel financial loan management system based on smart contracts over the permissioned blockchain Hyperledger Fabric. Imbault et al. [23] implemented blockchain technology in an industrial operating system (Predix) and applied it to the green certificate use case as well as to the ecological zone. Such a scheme shows how blockchain technology can be applied in a new scene, but the performance of blockchain consensus is seriously reduced when a large number of nodes are handled together on a single chain. Kan et al. [24] proposed an innovative component-based framework for exchanging information across arbitrary blockchain systems, called the interactive multiple blockchain architecture, in which throughput is increased by running a number of chains in parallel. This scheme uses a router and a cross-chain protocol to manage data between the chains, but it does not reduce the number of cross-chain communications, which then becomes one of the factors restricting system performance.
Combined with the actual scene of Cyber-Physical Systems, the existing storage schemes are unsatisfactory in two respects. First, the blockchain adopts a decentralized mechanism, which makes the nodes in the CPS hold a huge amount of duplicated data, and the storage of the devices is largely occupied by this redundant data. Second, the current single-chain structure of blockchain systems leads to a mismatch between the communication speed of nodes and the requirements of high concurrency and fast response in CPS. Data storage and parallel processing thus become the key factors restricting system performance. By analyzing the distribution and communication characteristics of the nodes of the Cyber-Physical System, we find that there is a community structure among the nodes. In this paper, a data-chains partitioning algorithm based on node community clustering is proposed.

Trust model of nodes
While a Cyber-Physical System is running, it needs to carry out multiple production tasks at the same time according to production needs, and each production task involves many production devices. The relations between the nodes of a Cyber-Physical System are therefore complicated. To make the manufacturing process more intelligent, the devices in the Cyber-Physical System must be able to communicate freely with each other. The flow of communication information within the system includes the delivery of tasks from upstream devices to downstream devices, the reception of those tasks by downstream devices, and the feedback that downstream devices send to upstream devices about the completion state of tasks. By managing this flow, the system coordinates all the devices participating in production so that they operate as a unified whole [25]. Free communication among the many devices in the system greatly improves its efficiency. The Cyber-Physical System is not a flat structure but is divided into different levels. The number of node devices in each layer may be different, and the flow of communication information between the layers is directed.
The hierarchical and systematic nature of CPS determines the ability of nodes to work independently and online. The ability to work independently is demonstrated by the fact that a single CPS can achieve a "sense-analysis-decision-execution" data loop with its own hardware and software. It has the functions of perceptibility, computability, interactivity, extensibility, and self-decision. The ability to work online is reflected in the automatic flow of data across a wider range and field between multiple nodes through network interconnection technology, which can form an intelligent production line and realize the interconnection, interworking, and interoperation of multiple CPS. In addition, the characteristics of ubiquitous connection and heterogeneous integration ensure that the connection between different nodes is not affected by the system environment [26]. This feature makes it possible to integrate and coordinate multiple technologies across networks, industries, and heterogeneous platforms, which guarantees the free flow of data in the system, opens the interactive channel for the deep integration of all channels, and provides an important guarantee for the deep integration of information technology and industrial technology. At the same time, the self-organizing nature of CPS enables self-organization, self-configuration, and self-optimization between production and device operation in response to changes in demand. Considering the directivity of communication between devices, we propose a node relation model in the form of a directed weighted graph. In this model, the edge weight represents the communication traffic between two connected nodes, and the communication tightness represents the communication status between two nodes. This weighted network model, which is based on the directionality of communication, is used to analyze the relationship between the participating nodes in a network.
The related concepts are as follows:
Edge Weight: The edge weight $w_{ij}$ is the weight from node $v_i$ to node $v_j$ on the connecting edge $l_{ij}$, representing the communication traffic from node $v_i$ to node $v_j$. Conversely, the edge weight $w_{ji}$ represents the traffic from node $v_j$ to node $v_i$.
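As a minimal sketch of how such directed edge weights could be collected, the following Python snippet aggregates a traffic log into a weight table. The log format `(sender, receiver, volume)` and the function name `build_edge_weights` are assumptions made for illustration; the paper does not specify how traffic is recorded.

```python
from collections import defaultdict

def build_edge_weights(messages):
    """Aggregate (sender, receiver, volume) records into directed edge weights.

    Returns a dict mapping (i, j) to the total traffic sent from node i to node j.
    """
    weights = defaultdict(int)
    for sender, receiver, volume in messages:
        if sender != receiver:  # assumption: self-loops carry no information
            weights[(sender, receiver)] += volume
    return dict(weights)

# Hypothetical traffic log for one unit of time.
log = [("01", "02", 5), ("02", "01", 2), ("01", "02", 3), ("02", "03", 1)]
w = build_edge_weights(log)

# The two directions of the same edge are stored separately.
print(w[("01", "02")], w[("02", "01")])
```

Keeping the two directions as separate keys preserves the directionality of the model; a symmetric measure can still be derived later by adding both directions.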
Communication Tightness: The tightness of the communication relationship between nodes in the intelligent manufacturing system. The higher the communication volume between node $v_i$ and node $v_j$ in unit time, the higher the communication tightness between node $v_i$ and node $v_j$.
The partitioning of the nodes proceeds in the following steps:
• Step 1. Set a threshold determined by the communication tightness between the nodes in the CPS, and divide the communication relationships between nodes into two types. The first type is the close communication relationship, in which the edge weight between the nodes is greater than the threshold. The second type covers the pairs whose edge weight is less than the threshold or between which no edge exists. Then, according to this classification, place nodes with high communication tightness in the same group.
• Step 2. Collect the results and check whether every node is connected by an edge whose weight is greater than the threshold. If not, adjust the threshold and repeat Step 1 until all nodes in the CPS are connected by edges whose weights are greater than the threshold.
• Step 3. Remove the edges whose weight is less than the threshold and take the remaining edges as the research object. Then, divide the nodes into different communities according to the degree of the nodes.
• Step 4. During the community partition, the number of nodes in a community is initially set to K = 3 and is then increased by one in each round. In each round, record the resulting community partition and the number of communities (each node should be included in at least one community). When K becomes so large that the nodes in the network can no longer form a new community, all communities are considered to have been listed and the partition is complete.
• Step 5. Check whether all nodes in the system are included in the partition result of each round, and identify the overlapping (common) nodes.
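The thresholding in Steps 1 and 2 can be sketched in Python as follows. This is one possible reading, not the authors' implementation: it assumes that the tightness of a pair is the sum of the traffic in both directions, that "adjusting" the threshold means lowering it by a fixed step, and that a node counts as connected once it touches at least one edge above the threshold. The names `strong_edges` and `choose_threshold` are hypothetical.

```python
def strong_edges(weights, threshold):
    """Return the undirected pairs whose combined traffic exceeds the threshold."""
    tightness = {}
    for (i, j), w in weights.items():
        pair = frozenset((i, j))
        # assumption: tightness of a pair = traffic i->j plus traffic j->i
        tightness[pair] = tightness.get(pair, 0) + w
    return {pair for pair, t in tightness.items() if t > threshold}

def choose_threshold(weights, nodes, start, step=1):
    """Lower the threshold from `start` until every node touches a strong edge."""
    threshold = start
    while threshold >= 0:
        edges = strong_edges(weights, threshold)
        covered = set().union(*edges)  # all nodes that appear in a strong edge
        if covered >= set(nodes):
            return threshold, edges
        threshold -= step
    raise ValueError("at least one node has no traffic with any other node")

# Hypothetical directed traffic between four devices.
weights = {("a", "b"): 6, ("b", "a"): 3, ("b", "c"): 4, ("c", "d"): 2, ("a", "d"): 1}
T, edges = choose_threshold(weights, ["a", "b", "c", "d"], start=5)
print(T, sorted(sorted(e) for e in edges))
```

In this toy input the weak edge between `a` and `d` is dropped, while the remaining edges still reach every node, which is the condition that Step 2 asks for.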

Require: The threshold of the tightness of communication between devices, T; the set of edge strengths between nodes, S_n
Ensure: The set of nodes and communities, C_k
repeat
    while W_i in S_n do
        if W_i ≥ T then
            ...
    if all nodes can be divided into k communities then
        C_k ← k
    else
        k = k + 1
    end if
until all nodes in the network can not be divided into communities
return C_k
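The community search with an increasing K can be illustrated with a small, self-contained Python sketch of k-clique percolation (the paper calls it the K-clique penetration algorithm). It is a sketch under stated assumptions rather than the authors' code: the graph is treated as undirected after thresholding, two k-cliques are merged when they share k - 1 nodes, and the sweep stops at the first K that yields no community. The function names and the example graph are hypothetical.

```python
from itertools import combinations

def k_cliques(adj, k):
    """Enumerate all cliques with exactly k nodes in an undirected graph."""
    found = []

    def extend(clique, candidates):
        if len(clique) == k:
            found.append(frozenset(clique))
            return
        for idx, v in enumerate(candidates):
            # keep only later candidates adjacent to v, so each clique is built once
            extend(clique + [v], [u for u in candidates[idx + 1:] if u in adj[v]])

    extend([], sorted(adj))
    return found

def k_clique_communities(adj, k):
    """Merge k-cliques that share k-1 nodes; each merged group is one community."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)
    groups = {}
    for idx, clique in enumerate(cliques):
        groups.setdefault(find(idx), set()).update(clique)
    return list(groups.values())

def sweep_k(adj, k_start=3):
    """Increase k from k_start until no community can be formed any more."""
    results, k = {}, k_start
    while True:
        communities = k_clique_communities(adj, k)
        if not communities:
            return results
        results[k] = communities
        k += 1

# Hypothetical strong-edge graph: a 4-clique {1, 2, 3, 4} and a triangle {4, 5, 6}.
edge_list = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

results = sweep_k(adj)
for k, communities in results.items():
    print(k, [sorted(c) for c in communities])
```

For K = 3 the toy graph splits into two communities that overlap in node 4, while for K = 4 only the 4-clique remains, so nodes 5 and 6 are left uncovered. This mirrors the validity check in Step 5, where only the rounds that cover every node are accepted.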

Experimental setup
CPS has the characteristics of ubiquitous connection and heterogeneous integration. A high-level CPS often contains a large number of different types of hardware, software, data and networks. CPS can integrate heterogeneous hardware, software, data and networks, and realize multi-technology integration and cooperation across networks, industries and heterogeneous platforms to ensure the free flow of data in the system. Therefore, a CPS is necessarily a complex of multiple heterogeneous components. In addition, CPS can process and analyze the information in information space according to perceived environmental changes and respond adaptively and effectively to external changes. Through network interconnection, multiple CPS can also self-organize concurrently at a higher level. Based on these features, we let each simulated node represent a CPS smart device and evaluate the performance of the CPS by testing the communication between the nodes.
The experiment simulates and tests a Cyber-Physical System with 42 nodes, numbered from "01" to "42". In the visualization, if the traffic between two nodes is large, the edge between them is drawn as a solid (yellow) line; if the traffic between them is small, the edge is drawn as a dashed (blue) line. The visualization of the nodes in the system and the communication relationship between the nodes is shown in Fig. 2.
As can be seen from Fig. 2, the nodes in the system are divided into three parts according to the traffic volume between the nodes. In the first part, the internal communication is relatively heavy, while the nodes in this part have only a small amount of traffic with other nodes in the system.
In contrast, the third part of the nodes in the system is more complex than the previous two parts, and it is clear from Fig. 4 that the nodes '26', '31', '36' and '37' are at the intersection of two communities. Therefore, the nodes can be divided into different communities by using the K-clique penetration algorithm. Furthermore, we take the nodes connected by solid-line edges as the object of our research.
First, all the solid-line edges and their connected nodes are extracted, and then the communities are divided by the K-clique penetration algorithm. In the process of community partition, the minimum number of nodes is set to K = 3 as the starting point. The value of K is increased by one at a time until all nodes are divided into their communities. As a result, all the community partitions are listed. In this simulation experiment, the partition results are shown in Table 1.
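Whether a given partition result is valid, and which nodes sit at the intersection of communities, can be checked with a few lines of Python. The sketch below assumes that a partition is given as a list of node sets; the function name `check_partition` and the example data are hypothetical and are not taken from Table 1.

```python
from collections import Counter

def check_partition(nodes, communities):
    """Return (uncovered nodes, overlapping nodes) for a candidate partition."""
    membership = Counter(n for community in communities for n in set(community))
    uncovered = sorted(n for n in nodes if membership[n] == 0)
    overlapping = sorted(n for n, count in membership.items() if count > 1)
    return uncovered, overlapping

# Hypothetical result for seven devices and three communities.
nodes = ["01", "02", "03", "04", "05", "06", "07"]
communities = [{"01", "02", "03"}, {"03", "04", "05"}, {"05", "06"}]

uncovered, overlapping = check_partition(nodes, communities)
print("uncovered:", uncovered)
print("overlapping:", overlapping)
```

A partition with a non-empty `uncovered` list would be rejected under the rule that every node must belong to at least one community, while the `overlapping` list corresponds to nodes such as '26', '31', '36' and '37' in the experiment.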

Experiment result
According to the results in Table 1, when the nodes are divided into 5 communities, each node in the simulation system belongs to at least one community. The other partition results are invalid because some nodes are left outside every community. It can be concluded that, in this experiment, the partition is effective when all the nodes are divided into 5 communities. The community structure is shown in Fig. 3.
Once the community partition is complete, the space occupied by data storage needs to be analyzed. When the nodes in the system are not partitioned, all the data generated by the nodes is stored in the same chain, and all the nodes in that chain hold the same ledger, so every node needs to synchronize all the data produced in the system. In the experiment, for the sake of statistical analysis, we assume that all data blocks have the same size, which we set to 1, and that each piece of data produced by the system corresponds to one block. The number of blocks produced by the system then equals the total communication traffic in the system, and the space occupied by the stored data depends only on the traffic in each chain. To evaluate how well the data-chain partition algorithm optimizes data storage, we consider four cases: the single chain, the random data-chain partition, the edge-weight-based data-chain partition, and the community-based data-chain partition. For each case, we calculate the storage space occupied by each data-chain and by cross-chain communication. The total communication traffic is calculated by the following equation:
$$\mathit{count} = \mathit{insid} + \mathit{cross}$$
In the formula, count is the total traffic of the system, insid is the intra-chain traffic in the data-chains, and cross is the inter-chain traffic between data-chains.
To better explain the storage space occupied under random partitioning, we randomly divide the nodes into data-chains in different ways: the nodes are randomly divided into three, four, and five data-chains, and each data-chain partition is a different combination of nodes. Table 2 in the previous section has explained in detail the combination of nodes under each data-chain and whether there are cross-chain nodes in each data-chain. To better compare the statistics, we present them in a histogram, as shown in Fig. 4.
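The quantities count, insid and cross can be computed for any candidate partition with a short Python sketch. It assumes that the total traffic is simply the sum of intra-chain and inter-chain traffic and that every node is assigned to exactly one data-chain (overlapping nodes are ignored here for simplicity). The function name `split_traffic` and the sample weights are hypothetical.

```python
def split_traffic(weights, chain_of):
    """Split directed traffic into per-chain intra-chain totals and one cross-chain total."""
    inside = {}
    cross = 0
    for (i, j), w in weights.items():
        if chain_of[i] == chain_of[j]:
            inside[chain_of[i]] = inside.get(chain_of[i], 0) + w
        else:
            cross += w  # sender and receiver live on different data-chains
    return inside, cross

# Hypothetical traffic between four devices placed on two data-chains.
weights = {("a", "b"): 5, ("b", "a"): 3, ("c", "d"): 4, ("b", "c"): 1}
chain_of = {"a": 0, "b": 0, "c": 1, "d": 1}

inside, cross = split_traffic(weights, chain_of)
count = sum(inside.values()) + cross
print(inside, cross, count)
```

Because every message is either intra-chain or inter-chain, `count` always equals the sum of all edge weights, whichever partition is chosen; only the split between the two terms changes.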
The results show that communication data occupies the most storage space when no grouping is used. Random grouping can effectively reduce the space occupied by data storage, but it may cause a lot of cross-chain communication, so it cannot meet the requirements of rapid data query and high concurrency well. Dividing the system nodes according to the community structure not only effectively reduces the space occupied by data storage, improves the query speed of data and better supports the concurrency of the system, but also effectively reduces the number of cross-chain communications and the system overhead. The comparison of the random partition method and the community partition method in terms of system performance is shown in Table 2.
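The effect of the partition on storage can be illustrated with a toy calculation in Python. The sketch follows the assumptions stated in the text (each message is one block of size 1, and every node of a chain keeps that chain's full ledger) and adds one assumption of its own: a cross-chain message is recorded on both chains involved. The function name `replicated_storage` and all numbers are hypothetical and do not reproduce the figures of the experiment.

```python
def replicated_storage(weights, chain_of):
    """Total blocks stored over all nodes when each node keeps its chain's full ledger."""
    members = {}
    for node, chain in chain_of.items():
        members[chain] = members.get(chain, 0) + 1
    ledger = dict.fromkeys(members, 0)  # number of blocks recorded on each chain
    for (i, j), w in weights.items():
        ledger[chain_of[i]] += w
        if chain_of[j] != chain_of[i]:
            ledger[chain_of[j]] += w  # assumption: cross-chain blocks land on both chains
    return sum(members[c] * ledger[c] for c in members)

# Hypothetical traffic between four devices.
weights = {("a", "b"): 5, ("b", "a"): 3, ("c", "d"): 4, ("b", "c"): 1}

single = replicated_storage(weights, {"a": 0, "b": 0, "c": 0, "d": 0})
by_community = replicated_storage(weights, {"a": 0, "b": 0, "c": 1, "d": 1})
poor_split = replicated_storage(weights, {"a": 0, "c": 0, "b": 1, "d": 1})
print(single, by_community, poor_split)
```

In this toy input, grouping the nodes that talk to each other most roughly halves the replicated storage, whereas a split that separates the frequent communicators turns all traffic into cross-chain traffic and saves nothing compared with the single chain.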

Conclusion
This paper mainly studies the performance and security problems brought by the centralized storage of CPS. In intelligent manufacturing, a centralized storage scheme puts great pressure on network transmission; the data cannot be processed in parallel and is exposed to all nodes in the system. To solve these problems, we analyze the relationship between CPS nodes and establish a trust relationship model. According to the relationship between devices in CPS, we propose an algorithm that partitions the nodes into different data-chains using clustering based on community structure. Finally, we use this algorithm in a comparative simulation experiment to verify the proposed approach. The experimental results show that, if the nodes in CPS use blockchain technology, the multi-chain storage architecture greatly reduces the consumption of storage space. The node partition algorithm for the multi-chain blockchain architecture proposed in this paper effectively reduces the number of cross-chain communications between nodes by partitioning the nodes according to community structure clustering. The main difference between our research and other work is that we analyze the communication relationship between the nodes in the blockchain network constructed in CPS through complex network theory. We propose to use edge weights to measure the strength of communication between nodes. Then, the nodes in the blockchain network are grouped according to the strength of their communication relationship by a clustering method based on community structure, which yields the partition result of the nodes in the blockchain system. This algorithm provides a solution for the node grouping process of a blockchain network based on the multi-chain structure. The novelty of this solution lies in its effectiveness: it helps to improve the speed of data processing and to reduce the communication load in the system.
It also reduces the space occupied by irrelevant data and thus improves the efficiency of the system. Because the data is isolated in different data-chains, data synchronization between nodes in the multi-chain structure does not interfere across chains. The data in different data-chains cannot be accessed from one another, which improves the concurrent processing ability and ensures the information security of the system.