Attack detection model for BCoT based on contrastive variational autoencoder and metric learning

With the development of blockchain technology, cloud computing and the Internet of Things (IoT), the blockchain and cloud of things (BCoT) has become a development trend. However, security has become the main obstacle to the development of BCoT. The attack detection model is a crucial part of the attack discovery mechanism for BCoT and has therefore received increasing attention. Because network attacks against BCoT are highly diverse and variable, traditional attack detection models are not suitable for BCoT. In this paper, we propose a novel attack detection model for BCoT, denoted as cVAE-DML. The model is based on a contrastive variational autoencoder (cVAE) and deep metric learning (DML). By training the cVAE, the proposed model generates private features for attack traffic as well as features shared between attack traffic and normal traffic. Based on those features, the proposed model generates representative new samples to balance the training dataset. Finally, the decoder of the cVAE is connected to the deep metric learning network to detect attacks against BCoT. The efficiency of cVAE-DML is verified on the CIC-IDS 2017 and CSE-CIC-IDS 2018 datasets. The results show that cVAE-DML improves attack detection efficiency even when the samples are unbalanced.


Introduction
In recent years, blockchain, cloud computing and IoT technologies have been applied in many aspects of our lives, such as finance and government services [1,2]. As a consequence, the blockchain and cloud of things (BCoT), which integrates blockchain technology with the cloud computing assisted Internet of Things (IoT), has received increasing attention [3].
The BCoT combines the advantages of blockchain technology, cloud computing and IoT to compensate for the shortcomings of each. The IoT uses internet technology to interconnect devices in order to collect, analyze and exchange data. Efficient resource management is key for the IoT, so the BCoT uses the shared resource pool provided by cloud computing to manage resources efficiently. Data security is the main obstacle for the IoT, so the BCoT uses blockchain technologies, such as distributed ledgers and smart contracts, to secure IoT data [4]. As a result, the BCoT has become a development trend.
Security has become the main obstacle to the development of the BCoT. Although the BCoT uses blockchain technology to secure data, the vulnerabilities of the IoT and of blockchain still threaten the security of the BCoT [5]. For example, some devices in the BCoT are interconnected through wireless technology, so the security holes of wireless technology also threaten the BCoT. The number of BCoT attack incidents has grown exponentially in recent years. As one of the most critical components of BCoT security, the attack detection model for BCoT has received more and more attention [6]. Existing attack detection models use the attack features stored in an attack signature database to detect attacks. Because attacks against the BCoT are diverse and variable, it is difficult to extract attack features to construct such a database. As a consequence, existing attack detection models cannot be directly applied to BCoT security [7].
Meanwhile, machine learning has made significant progress in many fields. It can learn the knowledge hidden in training datasets. Therefore, machine learning-based attack detection models are widely applied in BCoT security [8,9]. An attack detection system is essentially a classification model that distinguishes attacks from regular visits [10]. Its training dataset consists of records of normal and attack visits. Three problems in the training dataset make machine learning-based attack detection systems inefficient. Firstly, the distribution of normal and attack traffic data in the training dataset is imbalanced. In practice, attack traffic occurs much less frequently than normal traffic in the BCoT, so the training dataset contains far more records of normal visits than of attack traffic. As a result, the detection results are biased toward the frequent visits, and new intrusion attacks are detected inefficiently [11].
Secondly, attack traffic data contains both features relevant to the attack and features irrelevant to it. Attack traffic data consists of several attributes. The information recorded in some attributes is relevant to the attack and can be used to distinguish attack traffic from normal traffic. The information recorded in the other attributes is irrelevant to the attack. Traditional machine learning-based attack detection models can extract the attack-relevant information to distinguish attack traffic from normal traffic, but the attack-irrelevant information makes the models inefficient [12].
Thirdly, the distribution of normal traffic data is more concentrated than that of attack traffic data, which is scattered over a wide area. In essence, because traditional machine learning-based attack detection systems are classification models based on clustering technology, they cannot accurately distinguish attack traffic from normal traffic. As a result, attack detection systems based on nonlinear models are more efficient than traditional attack detection systems [13].
In recent years, there has been much significant research on these three problems. Some researchers designed attack detection systems that set different weights for a single sample or a set of hard-to-classify samples. Because network intrusion attacks are highly diverse and variable, it is difficult to set different weights for all attack samples [3,7]. Some researchers apply oversampling to generate new attack samples and balance the training dataset, and then use the balanced dataset to train a machine learning-based attack detection system for BCoT. However, this way of generating new attack samples cannot fully exploit the information hidden in the known samples, so the generated samples improve the efficiency of the attack detection system only to a limited extent [14]. Some researchers combine generative models, such as VAE and GAN, with oversampling to generate new attack samples. However, the generated samples ignore the hidden information in the traffic data. In conclusion, efficiently extracting features from attack traffic is the key to constructing an attack detection model.
The cVAE combines contrastive learning with the VAE to identify and enhance salient features. The cVAE is trained on two related but unpaired datasets. It explicitly models the latent features shared between the datasets as well as the latent features that are enriched in one dataset relative to the other, which enables the algorithm to separate and enhance the salient latent features. Therefore, the cVAE can be used as a generative model to generate new samples with various levels of salient and irrelevant latent variables [15].
The triplet network, one of the most popular metric learning technologies, has been successfully applied in many fields [16]. It is built on triplets of samples. A triplet consists of three samples selected from the training dataset. The first one is the anchor. The second one belongs to the same class as the anchor and is denoted as the positive sample. The third one belongs to the opposite class and is denoted as the negative sample. The triplet network takes triplets as input and learns an embedding space in which the distance between samples of opposite classes is larger than the distance between samples of the same class [17]. Finally, the triplet network can distinguish attack traffic from normal traffic based on the learned embedding space. Traditional triplet networks suffer from poor convergence because the samples that form a triplet are selected at random from the training dataset.
In this paper, we propose a novel attack detection model, denoted as cVAE-DML, which distinguishes attack traffic from normal traffic as a binary classification problem. The model uses the cVAE to generate new samples to solve the imbalance problem. The cVAE models two types of features. The first type is the features shared between normal traffic and attack traffic. The second type is the private features of attack traffic. These two types of features are used not only to generate samples that solve the imbalance problem, but also to improve the efficiency of distinguishing attack traffic from normal traffic. Based on these two types of features, the cVAE-DML uses oversampling to generate new attack samples and balance the training dataset. Based on the trained cVAE and the generated samples, the cVAE-DML uses a triplet network, one of the most popular metric learning technologies, to detect attacks. Existing attack detection models extract attack features by analyzing attack traffic. However, attack traffic contains both attack-relevant and attack-irrelevant information, and existing models cannot separate the private information of attacks from the attack traffic. As a consequence, compared with existing attack detection models, the cVAE-DML can efficiently extract attack features for detecting attacks. In short, the main contributions are listed below.
Firstly, a novel attack detection system for IoT, called cVAE-DML, is proposed based on the cVAE and deep metric learning. To solve the imbalance problem in the training dataset, the new attack detection system combines oversampling with the cVAE to generate new attack samples and balance the training dataset. The cVAE introduces contrastive learning into the VAE to enhance the latent features of attack traffic. As a consequence, the generated samples are more diverse. Then, the decoder of the cVAE is fully connected with the triplet network to learn an embedding space in which the distance between samples of opposite classes is larger than the distance between samples of the same class. Finally, the cVAE-DML uses the distance between samples to detect attacks.
Secondly, two public datasets, CIC-IDS 2017 and CSE-CIC-IDS 2018, are used to verify the efficiency of the cVAE-DML. The experimental results show that, compared with traditional attack detection, the cVAE-DML improves the efficiency of attack detection when the samples are unbalanced.
The remainder of this paper is organized as follows: related works are presented in Sect. 2. The details of the cVAE-DML are described in Sect. 3. The details of the experiments are introduced in Sect. 4.

Related works
Despite significant recent progress, attack detection models still face challenges, including the distribution imbalance in training datasets and the high diversity and variation of network attacks [14]. According to the detection mechanism, attack detection systems can be classified into four categories: signature-based, anomaly-based, specification-based, and hybrid detection systems, the first being the most widely used [14,18,19].
A signature-based attack detection system maintains an attack signature database. If the features of a network visit match a pattern stored in the database, the visit is detected as an intrusion attack. A lightweight signature-based attack detection system was proposed for IoT [20]. Its architecture has four layers: signature generator, pattern generator, attack detection engine, and output engine. Liu et al. [21] proposed a novel attack detection system based on the artificial immune system, which uses immune cells to store attack features. Rebbah et al. [22] combined the attack detection system with cloud technology. Such a detection system is based on the temporal and spatial profiles calculated for each client according to the data of its requests. The attack without any documented analysis and studies the client provides.
An anomaly-based attack detection system can identify an unknown activity by comparing it with a normal behavior profile and then classifying it as normal or abnormal. Unlike signature-based attack detection systems, anomaly-based attack detection systems effectively identify new intrusion attacks [18]. Larijani et al. proposed an attack detection system based on a random neural network for IoT [23]. Roy et al. proposed a machine learning-based two-layer hierarchical attack detection mechanism that detects intrusion attacks while satisfying the IoT resource constraints [24]. This mechanism uses a fog layer to offload networking and computation overhead from the cloud, and it provides a first line of defense closer to the physical IoT devices. Yin et al. proposed an attack detection system based on recurrent neural networks and studied its performance in binary and multiclass classification [25]. For the attacks that attack detection systems detect with lower accuracy, Vinayakumar et al. proposed an attack detection system based on bidirectional long short-term memory [26]. Sun et al. combined a CNN and an LSTM to extract spatiotemporal features of traffic data to detect intrusion attacks [27]. Zhang et al. applied a DBN to predict the kind of traffic data and used a genetic algorithm to optimize the DBN [28]. Ge et al. used feed-forward neural networks based on multi-classification to construct the attack detection system and used transfer learning to encode the classification features [29]. Based on GANs and a fog architecture, an attack detection system was proposed to detect unknown attacks and overcome the data acquisition challenge [30]. Other authors used an AE to extract attack features and applied a CNN and an MLP to detect intrusion attacks [31][32][33]. Meanwhile, DNNs were applied to learn attack features, and the learned features were then used to detect attacks [26,34]. Some research on attack detection systems has focused on model generation through intensive feature engineering instead of considering the real environment. These methods are therefore of limited use for detecting real-time attacks in a real network environment.
Traditional machine learning technologies are also applied in attack detection systems for IoT. Li et al. combined an attack detection system with a K-Nearest Neighbor (KNN) classification algorithm to detect intrusion attacks in wireless sensor networks [35]. Shapoorifard et al. proposed an attack detection system based on the KNN classification algorithm and the K-means algorithm [36].
From the above analysis, we draw the following conclusions. The efficiency of a signature-based attack detection model depends on its signature database. Because attack traffic data is diverse, it is difficult to efficiently extract attack signatures from traffic data. Compared with signature-based models, anomaly-based and specification-based attack detection models use machine learning to learn attack features and can therefore effectively identify new intrusion attacks. However, the three problems mentioned in the Introduction still limit their efficiency.

Method
In this section, the details of the cVAE-DML are described. The cVAE-DML essentially constructs a binary classification model to distinguish attack traffic from normal traffic. As a result, all attack classes are assigned the same label regardless of the attack type. The cVAE-DML has four modules: the data preprocessing module, the oversampling module, the training module, and the predictive module. The original data consists of the values of multiple attributes. Firstly, the values of all the attributes in the original data are encoded in the data preprocessing module, and the encoded data form the training dataset. Secondly, the training dataset is fed to the oversampling module to generate new attack samples. In the training dataset, the amount of normal traffic data far exceeds the amount of attack traffic data, so all the generated attack samples are added to the attack traffic data to alleviate the imbalance problem. Thirdly, the whole training dataset is fed to the training module to train a binary classification model that distinguishes attack traffic from normal traffic. Finally, new traffic data is fed to the trained classification model to judge whether it is an intrusion attack. The meanings of all the abbreviations are shown in Table 1.

Data preprocessing
There are two kinds of attributes in the original data: category attributes and value attributes. One-hot encoding is used for category attributes. For instance, the category attribute 'protocol', with the values 'TCP', 'UDP', and 'ICMP', is encoded as [1,0,0], [0,1,0], and [0,0,1], respectively. The value of each value attribute is a continuous variable, and Eq. (1) is applied to map it to the range [0,1], where $x_i$ denotes the value of the attribute, and Max and Min denote the maximum and minimum values of this attribute, respectively. Finally, all the preprocessed data are used to compose the training dataset.
There are two latent variables in the cVAE. The first latent variable, denoted as $s$, is salient for detecting attacks against the BCoT and is private to $x^a$. The second latent variable, denoted as $z$, is irrelevant to intrusion attack detection and is shared between $x^n$ and $x^a$. The two conditional distributions followed by $x^a$ and $x^n$ are shown in Eqs. (2) and (3), respectively. When the set $x^n$ is fed to the encoder $q_{\phi_z}$, the output of the encoder is $z_{x^n}$, and the decoder $f_\theta$ can reconstruct $x^n$ from the concatenation of $z_{x^n}$ with 0. When the set $x^a$ is fed to the two encoders $q_{\phi_s}$ and $q_{\phi_z}$, their outputs are $s_{x^a}$ and $z_{x^a}$, respectively, and the decoder $f_\theta$ can reconstruct $x^a$ from the concatenation of $z_{x^a}$ with $s_{x^a}$. Therefore, the concatenation of $z_{x^n}$ with 0 is the latent variable for $x^n$, and the concatenation of $z_{x^a}$ with $s_{x^a}$ is the latent variable for $x^a$. The lower bounds of the likelihoods of $x^n$ and $x^a$ are shown in Eqs. (4) and (5), respectively. The cVAE is trained by maximizing the sum of Eqs. (4) and (5). The details of training the cVAE are given in Dai et al. [15].
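The preprocessing described in this subsection (one-hot encoding for category attributes and min-max scaling for value attributes) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the helper names `one_hot` and `min_max` and the numeric attribute `durations` are hypothetical and only serve the example.

```python
import numpy as np

def one_hot(values, categories):
    """Encode each categorical value as a one-hot row vector."""
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        encoded[row, index[v]] = 1.0
    return encoded

def min_max(column):
    """Map a numeric column to [0, 1]; a constant column maps to all zeros."""
    column = np.asarray(column, dtype=float)
    lo, hi = column.min(), column.max()
    if hi == lo:
        return np.zeros_like(column)
    return (column - lo) / (hi - lo)

# Toy records: one category attribute and one value attribute.
protocols = ["TCP", "UDP", "ICMP", "TCP"]
durations = [10.0, 20.0, 30.0, 50.0]  # hypothetical value attribute

protocol_codes = one_hot(protocols, ["TCP", "UDP", "ICMP"])
duration_scaled = min_max(durations)

# One encoded row per record: three one-hot columns plus one scaled column.
features = np.hstack([protocol_codes, duration_scaled[:, None]])
```

Guarding against a constant column avoids a division by zero; whether the original pipeline handles that case the same way is not stated in the text.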
The second step is to generate new attack samples. As mentioned in the Introduction, the amount of data recorded for normal traffic is much greater than that recorded for attack traffic. Consequently, the training dataset is imbalanced.
$$\mathcal{L}(x_i^a) \ge \mathbb{E}_{q_{\phi_z}(z),\, q_{\phi_s}(s)}\left[ f_\theta\left(x_i^a \mid s, z\right) \right] - \mathrm{KL}\left( q_{\phi_s}\left(s \mid x_i^a\right) \,\|\, p(s) \right) - \mathrm{KL}\left( q_{\phi_z}\left(z \mid x_i^a\right) \,\|\, p(z) \right) \quad (5)$$
Fig. 1 Structure of cVAE
All the samples in the training dataset are divided into three kinds, namely, safe samples, dangerous samples and noisy samples [11]. Oversampling with dangerous samples helps to improve the efficiency of attack detection, because dangerous samples belong to the attack samples. Based on the dangerous samples, the Synthetic Minority Oversampling Technique (SMOTE) is applied to generate new attack samples. The pseudocode of this algorithm is shown in Algorithm 1.
The details of oversampling are as follows. Firstly, the training dataset is used to train the cVAE. Secondly, the training data is fed to the encoders of the cVAE, as shown in Line 5 of Algorithm 1. At this point, for each $(l_i, y_i) \in Z$, if $y_i$ equals attack, $l_i$ equals the concatenation of $q_{\phi_s}(x_i)$ with $q_{\phi_z}(x_i)$. If $y_i$ equals normal, $l_i$ equals the concatenation of $q_{\phi_s}(x_i)$ with 0. Based on the values $l_i$, the Euclidean distance is used to find all the dangerous samples in $Z^a$, as shown in Lines 8 to 11 of Algorithm 1. Finally, SMOTE is used to generate new attack samples, and the generated samples are used to balance the training dataset.
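The oversampling procedure above can be illustrated with a small sketch that works directly on latent vectors. Several points are assumptions rather than facts from the text: the rule for a "dangerous" sample (at least half, but not all, of its k nearest neighbours are normal) is the usual borderline criterion and may differ from Algorithm 1, the function names are hypothetical, and the toy 2-D points stand in for the encoder outputs $l_i$.

```python
import numpy as np

def k_nearest(points, query_idx, k):
    """Indices of the k nearest neighbours (Euclidean) of points[query_idx], excluding itself."""
    dist = np.linalg.norm(points - points[query_idx], axis=1)
    dist[query_idx] = np.inf
    return np.argsort(dist)[:k]

def dangerous_attack_indices(latent, labels, k):
    """Assumed rule: an attack sample (label 1) is dangerous when at least half,
    but not all, of its k nearest neighbours are normal samples (label 0)."""
    danger = []
    for i in np.where(labels == 1)[0]:
        n_normal = int(np.sum(labels[k_nearest(latent, i, k)] == 0))
        if k / 2 <= n_normal < k:
            danger.append(i)
    return np.array(danger, dtype=int)

def smote(latent, labels, seeds, k, n_new, rng):
    """SMOTE step: interpolate between a seed and one of its k nearest attack neighbours."""
    attack = latent[labels == 1]
    synthetic = []
    for _ in range(n_new):
        seed = latent[rng.choice(seeds)]
        dist = np.linalg.norm(attack - seed, axis=1)
        neighbours = attack[np.argsort(dist)[1:k + 1]]  # position 0 is the seed itself
        mate = neighbours[rng.integers(len(neighbours))]
        synthetic.append(seed + rng.random() * (mate - seed))
    return np.array(synthetic)

# Toy latent vectors: six normal points, one borderline attack point, four clustered attack points.
latent = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1],
                   [2.8, 0.5], [4, 0], [4, 1], [5, 0], [5, 1]], dtype=float)
labels = np.array([0] * 6 + [1] * 5)

seeds = dangerous_attack_indices(latent, labels, k=3)
new_latent = smote(latent, labels, seeds, k=3, n_new=4, rng=np.random.default_rng(0))
```

In this toy set only the attack point at (2.8, 0.5) sits next to normal traffic, so it is the only seed. In a full pipeline the synthetic latent vectors would presumably be passed through the cVAE decoder to obtain new attack samples; that step is omitted here.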

Training
The cVAE-DML combines the cVAE with a triplet network, one of the main types of deep metric learning, to construct the attack detection model. The triplet network uses triplets of samples to learn an embedding space in which the distance between samples of opposite classes is greater than the distance between samples of the same class. Each triplet $(x, x^+, x^-)$ consists of an anchor sample $x$ (a sample chosen from the training dataset), the positive counterpart of $x$ and the negative counterpart of $x$. The positive counterpart of $x$, denoted as $x^+$, is a sample that belongs to the same class as $x$. The negative counterpart of $x$, denoted as $x^-$, is a sample that belongs to a different class from $x$.
The cVAE-DML uses the cVAE to construct triplets, which are then used to train the triplet network. The pseudocode of the training stage is shown in Algorithm 2, and the details are shown in Fig. 2. For each sample $x = (x_i, y_i)$ in the training dataset, the anchor sample of the triplet is $x$. The negative and positive counterparts are obtained as follows.
If sample $x$ is an attack sample, that is, $y_i$ = attack, the decoder $f_\theta$ of the cVAE can reconstruct $x_i$ from the concatenation of $q_{\phi_s}(x_i)$ with $q_{\phi_z}(x_i)$. Consequently, for sample $x$, the positive counterpart $x_i^+$ is $f_\theta(q_{\phi_s}(x_i), q_{\phi_z}(x_i))$, and the negative counterpart $x_i^-$ is $f_\theta(q_{\phi_s}(x_i), 0)$. If sample $x$ is a normal sample, that is, $y_i$ = normal, the decoder $f_\theta$ of the cVAE can reconstruct $x_i$ from the concatenation of $q_{\phi_s}(x_i)$ with 0. Consequently, for sample $x$, the positive counterpart $x_i^+$ is $f_\theta(q_{\phi_s}(x_i), 0)$, and the negative counterpart $x_i^-$ is $f_\theta(q_{\phi_s}(x_i), q_{\phi_z}(x_i))$.
When all the triplets are constructed, they are used to train the triplet network. The triplet network consists of three networks that share common weights and structure. The soft-margin triplet loss [17] is used as the loss function:
$$L(x, x^+, x^-) = \ln\left(1 + \exp\left(d(x, x^+) - d(x, x^-)\right)\right) \quad (6)$$
where $x$ is the anchor, $x^+$ is the positive counterpart of $x$, $x^-$ is the negative counterpart, and $d(\cdot,\cdot)$ is the distance between the embeddings of two samples.
When the training of the triplet network has finished, the distance between a sample and its positive counterpart is smaller than the distance between the sample and its negative counterpart.
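To make the training step concrete, the sketch below builds a triplet from stand-in encoders and a stand-in decoder and evaluates the soft-margin triplet loss $\ln(1 + \exp(d(x, x^+) - d(x, x^-)))$. It is an illustration under assumptions: `enc_s`, `enc_z` and `dec` are hypothetical placeholders for $q_{\phi_s}$, $q_{\phi_z}$ and $f_\theta$, the Euclidean distance is assumed for $d$, and the negative counterpart of a normal sample is taken to be the decoder output for the full concatenation, by symmetry with the attack case.

```python
import numpy as np

def build_triplet(x, is_attack, enc_s, enc_z, dec):
    """Return (anchor, positive, negative) following the construction in the text."""
    s, z = enc_s(x), enc_z(x)
    full = dec(np.concatenate([s, z]))                    # f(q_s(x), q_z(x))
    zeroed = dec(np.concatenate([s, np.zeros_like(z)]))   # f(q_s(x), 0)
    return (x, full, zeroed) if is_attack else (x, zeroed, full)

def soft_margin_triplet_loss(anchor, positive, negative):
    """Mean of ln(1 + exp(d(a, p) - d(a, n))) over a batch, with Euclidean d."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    # logaddexp(0, t) computes ln(1 + exp(t)) in a numerically stable way.
    return float(np.mean(np.logaddexp(0.0, d_pos - d_neg)))

# Hypothetical stand-ins: split a 4-d sample into two halves, decode by identity.
enc_s = lambda v: v[:2]
enc_z = lambda v: v[2:]
dec = lambda latent: latent

x = np.array([1.0, 2.0, 3.0, 4.0])
attack_triplet = build_triplet(x, True, enc_s, enc_z, dec)
normal_triplet = build_triplet(x, False, enc_s, enc_z, dec)

# Loss is small when the positive is close and the negative is far, and large otherwise.
anchor = np.array([[0.0, 0.0]])
near = np.array([[0.0, 1.0]])   # distance 1 from the anchor
far = np.array([[3.0, 4.0]])    # distance 5 from the anchor
good = soft_margin_triplet_loss(anchor, near, far)
bad = soft_margin_triplet_loss(anchor, far, near)
```

Unlike the hinge form of the triplet loss, the soft-margin form has no margin hyperparameter and keeps a small gradient even for triplets that are already well separated.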

Predictive stage
The predictive stage is described in Algorithm 3. The cVAE-DML uses the distances from ϕ(x) to ϕ(x 1 ) and ϕ(x 2 ) to predict the class of x. If d(x 1 ) < d(x 2 ) , then x is classified as attack traffic. Otherwise, x is classified as normal traffic.
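The decision rule can be sketched as a nearest-candidate test in the embedding space. The text does not define the two candidates, so this sketch assumes that the first is the attack-style reconstruction of x and the second is the normal-style reconstruction; `embed` is a hypothetical stand-in for the trained embedding ϕ, and the Euclidean distance is assumed.

```python
import numpy as np

def predict(x, embed, candidate_attack, candidate_normal):
    """Classify x by comparing embedding distances to the two candidates."""
    d1 = np.linalg.norm(embed(x) - embed(candidate_attack))
    d2 = np.linalg.norm(embed(x) - embed(candidate_normal))
    return "attack" if d1 < d2 else "normal"

# Hypothetical embedding: identity, so distances are measured on the raw vectors.
embed = lambda v: np.asarray(v, dtype=float)

x = [1.0, 1.0]
label_a = predict(x, embed, candidate_attack=[1.1, 0.9], candidate_normal=[3.0, 3.0])
label_b = predict(x, embed, candidate_attack=[3.0, 3.0], candidate_normal=[1.1, 0.9])
```

Ties are resolved as normal traffic here because the rule in the text uses a strict inequality for the attack class.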

Experiment
This paper proposes a novel attack detection system for IoT, denoted as cVAE-DML. Two public datasets, CIC-IDS 2017 [37] and CSE-CIC-IDS 2018 [38], are used to verify the efficiency of the cVAE-DML.
The experimental results show that the cVAE-DML is more efficient than traditional attack detection. In this section, the details of the experiments are introduced. [11][12][13]. After sampling is finished, the cVAE is applied to generate new attack samples.

Evaluation metric
The following indicators are applied to verify the efficiency of the cVAE-DML; they are shown in Eqs. (7) to (10). The network structure parameters of the cVAE-DML are shown in Table 2.
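As an illustration of how such indicators are typically computed for a binary attack detector, the sketch below derives accuracy, precision, recall and F1-score from the confusion counts. That these four are exactly the indicators of Eqs. (7) to (10) is an assumption (the text names only accuracy and F1-score explicitly), and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}, where 1 = attack."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels: 3 attacks and 5 normal records, with one miss and one false alarm.
scores = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

On imbalanced data, accuracy alone can look high even when most attacks are missed, which is why the F1-score is reported alongside it.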

Balance training dataset
As mentioned in the Introduction, the seriously imbalanced class distribution in the training dataset causes the detection results to be biased toward the frequent normal traffic. If the number of generated samples for those categories is equal to the number of samples in the Benign data, the generated data will be redundant and lose its diversity, and may even affect the judgment of the Benign class [11]. The details of the two public datasets are given in [37,38]. Therefore, for each attack class, the number of samples of this class is gradually increased by an integer multiple of 10 until the final number is determined. This method is applied in [11,13]. In the experiment, the number K of nearest neighbors is set to 6. The number of generated samples for each attack class is shown in Table 3.

Performance of the cVAE-DML before and after balancing the training dataset
In

Comparative experiments
In this paper, five algorithms, namely THEODORA [31], AIDA [32], MINDFUL [33], DNN-3 [19] and DNN4Layers [26], are used as comparative algorithms to verify the efficiency of the cVAE-DML. The details of those comparative algorithms are as follows.
The results of the comparative experiments are shown in Table 5. On both the CIC-IDS 2017 dataset and the CSE-CIC-IDS 2018 dataset, the cVAE-DML is more efficient than the five comparative models, which means that the cVAE-DML distinguishes attack traffic from normal traffic more accurately than the other algorithms. The drawback of the comparative models is that they cannot accurately extract the private information of intrusion attacks. In addition, they cannot alleviate the imbalance problem in the training dataset. As a result, the accuracy and F1-scores of the comparative models are lower than those of the cVAE-DML.

Conclusion
This paper proposes a novel attack detection system for IoT, denoted as cVAE-DML. The cVAE-DML combines oversampling with the cVAE to generate new attack samples, which are added to the original data to balance the training dataset. The cVAE-DML then combines the cVAE with the triplet network to detect attacks. Two public datasets and five comparative models were used to verify the efficiency of the cVAE-DML. The experimental results show that the accuracy and F1-scores of the cVAE-DML are better than those of the five comparative models.
In addition, the cVAE introduces contrastive learning into the VAE to enhance the latent features of attack traffic. As a consequence, the new samples generated by the cVAE-DML can efficiently improve the effectiveness of attack detection. Oversampling consists of two steps. The first step is to train the cVAE on the attack detection dataset. The structure of the cVAE is shown in Fig. 1. Firstly, the training dataset is divided into two parts. The first part, denoted as $x^n$, consists of all the samples belonging to normal traffic. The second part, denoted as $x^a$, consists of all the samples belonging to attack traffic. Then, both parts of the training dataset are fed to the cVAE.
$$x_i' = \frac{x_i - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}} \quad (1)$$

Table 1
The meaning of all the abbreviations
Attack features extracted by traditional techniques contain both relevant and irrelevant attack information. The irrelevant information can reduce the effectiveness of attack detection. The cVAE models two types of features. The first type is the features shared between normal traffic and attack traffic. The second type is the private features of attack traffic.
CIC-IDS 2017 was released as a public dataset for attack detection in 2017. It was collected by the Canadian Institute for Cybersecurity over a total of 5 days. CIC-IDS 2017 covers more diverse attack types than KDDCUP 99 and NSL-KDD. For CIC-IDS 2017, the researchers used CICFlowMeter to construct a real network environment. Based on networking protocols such as HTTP and FTP, the researchers created abstract behaviors of 25 users and collected traffic data from Monday to Friday. The traffic data from Monday is normal traffic data. The traffic data from Tuesday to Friday is anomalous traffic data.

Table 3
The number of original samples and generated samples

Table 4
The performance of the cVAE-DML on CIC-IDS 2017 and CSE-CIC-IDS 2018 with oversampling and without oversampling

Table 5
Comparative results