Advances, Systems and Applications

Security issues of news data dissemination in internet environment

Abstract

With the rise of artificial intelligence and the development of social media, communication between people has become increasingly convenient. However, in the Internet environment, the dissemination of untrue news data causes a large number of problems. Efficient and automatic detection of rumors on social platforms has therefore become an important research direction in recent years. This paper leverages deep learning methods to mine the changing trend of user features related to rumor events, and designs a rumor detection model called the Time Based User Feature Capture Model (TBUFCM). To obtain a new feature vector representing the user's comprehensive features under the current event, the proposed model first recomputes the user feature vector by using a feature enhancement function. It then utilizes Gated Recurrent Unit (GRU) and Convolutional Neural Network (CNN) models to learn the global and local changes of user features, respectively. Finally, the hidden rumor features in the process of rumor propagation are discovered from user and time information. The experimental results show that TBUFCM outperforms the baseline models, and that it reaches an accuracy of 92% even when only 20 forwarded posts are available. The proposed method can effectively address the security problem of news data dissemination in the Internet environment.

Introduction

In the era of cloud computing, the Internet has become the main channel for news dissemination. However, the rapid spread of false information has brought serious challenges to the security of news data. The widespread application of cloud computing technology has provided convenience for the storage and processing of news data, but at the same time, the openness of the cloud environment has also created opportunities for the spread of false information. Our research is committed to understanding and solving this problem. By integrating knowledge in cloud computing, data mining and artificial intelligence, we propose innovative false information detection methods to ensure the security of news data dissemination based on cloud computing in the Internet environment.

Rumor detection, as one of the most important tasks in artificial intelligence security, is an urgent problem to be solved. It has received widespread attention in recent years and many related methods have been proposed. Most previous methods regard rumor detection as a binary classification problem in supervised learning, that is, the constructed classifier assigns the targeted information a rumor or non-rumor label. Existing rumor features are mainly selected from the text content, user attributes, and information dissemination. Taking Sina Weibo as an example, a piece of information published by a community user includes text content, user attributes, and social communication features (likes, comments, retweets, etc.). Previous rumor detection methods can be divided into two categories: rumor detection based on traditional machine learning and rumor detection based on deep learning.

Rumor detection based on traditional machine learning involves three steps: (1) select and extract features that can effectively represent rumors from the training data set; (2) use the features to train a classifier on the data set; (3) use the trained classifier to predict unknown data and judge whether the data is a rumor. Commonly used classifiers include decision tree classifiers [1,2,3,4], support vector machines [2, 5,6,7,8], Naive Bayes [1, 2] and random forests [9, 10]. Although traditional machine learning methods have made certain achievements in rumor detection, they are limited in data representation ability and detection accuracy.

Rumor detection based on deep learning can automatically extract deeper features from the original data, so heavy feature engineering can be avoided. At the same time, the accuracy of rumor detection can be improved greatly. Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) are widely used in rumor detection tasks: RNN is well suited to mining time series information, and CNN can learn local features by convolution operations. Ma et al. [11] proposed the first rumor detection model based on a recurrent neural network, which made a breakthrough in the rumor detection task by learning the semantic features and timing information of text content. Ma et al. [12] proposed a tree-structured recursive neural network to distinguish rumors from non-rumors by learning the non-sequential propagation structure of text content. Xu et al. [13] developed a deep recurrent neural network based on a symmetric network structure, which filters the reply text according to the number of users' fans and captures the characteristics of dynamic time signals. Sujana et al. [14] presented a multi-loss hierarchical BiLSTM model with a decay factor. Yu et al. [15] proposed a rumor detection method based on a convolutional neural network (the CAMI method), and verified the effectiveness of convolutional neural networks in the rumor detection task by experiments. In addition to RNN and CNN, other deep learning models such as Graph Neural Networks, Transformer models and graph attention networks have also been widely used in rumor detection tasks [16, 17]. Lu et al. [18] proposed a rumor detection model that uses the attention mechanism together with several neural network models, so that user features, propagation features and text content can be better combined. Rao et al. [19] designed a new BERT model with language noise for the layer granularity technique, namely the LPGAM-BERT model, which masks the insignificant common concerns between source posts and low-level reviews. Ma et al. [20] developed a rumor detection model based on generative adversarial networks with robustness and effectiveness guarantees. Compared with rumor detection based on traditional machine learning, rumor detection based on deep learning achieves a better detection effect.

Most previous rumor detection methods consider only the text content. A single information source and incomplete content adversely affect the utility of the detection model. In addition, most previous methods ignore the fact that user features may shift over time under different topics, so the associations and differences between users cannot be captured well. To alleviate these issues, we take the user's personal information, behavior information and influence into account and build user characteristics of multiple dimensions. We also leverage deep learning models to study how user characteristics change during propagation. The proposed method can recapture user characteristics to adapt to the current topic in the specific scenario of rumor detection. The main contributions of this paper can be summarized as follows.

(1) To address the problem that user characteristics shift as an event develops during its propagation, this paper designs a rumor detection model called TBUFCM. The proposed model first uses the feature enhancement function to recalculate the user feature vector, obtaining a new feature vector that represents the user's comprehensive characteristics under the current event. The GRU model and the CNN model are then leveraged to learn the global and local changes of user characteristics as the event develops. In addition, the TBUFCM model can perform efficient and stable detection in the early stage of propagation.

(2) Experiments are conducted on a real-world dataset. The results indicate that TBUFCM outperforms the baseline models, which verifies the practicality of the proposed model. The accuracy of TBUFCM is higher than that of the other models under the same number of retweets, and it reaches 92% with only 20 retweets.

Related work

For the data stored in the current cloud platform, people often only care about the security of data storage and use, but not the authenticity of the data itself. To control the security of data transmission, the first thing is to control the authenticity of the source of data. It is particularly important to incorporate true and correct data into the dissemination of news data in the Internet environment.

There are three categories of features that can be used to distinguish between rumors and non-rumors in rumor detection: text features, user features and propagation features. Early research mostly extracted features from news text content to detect the authenticity of news articles. The text content includes the original post content and the users' responses. For example, Castillo et al. [21] regarded the proportion of positive blog posts, the proportion of negative blog posts, and the proportion of blog posts including first-person pronouns as text features, and adopted C4.5 classifiers to identify false news topics on Twitter. Yang et al. [8] considered link URLs and the number of positive and negative emotional words when analyzing the text features, and then used an SVM classifier for the rumor detection task. Ma et al. [11] leveraged a Recurrent Neural Network to learn user response characteristics from useful query phrases in user responses. The rumor detection methods in [22,23,24] focus on the language style, writing style and social emotion, respectively.

However, the early rumor detection methods considered only text content and therefore have limitations. For example, a single information source and unavailable content adversely influence the performance of the detection model. To solve these issues, this paper introduces user and propagation features into the research method for rumor detection tasks. Rumor detection methods based on user characteristics mainly model the characteristics of the users who forward and respond to the source event, relying on user feature information to detect the authenticity of the event. Yang et al. [8] learned the features of users from their basic information. Common user information includes the user authentication information, gender, hometown, and fan count. Ruchansky et al. [25] leveraged the user relationship matrix to learn user features and introduced the concept of trustworthiness scores. Liu et al. [26] developed a rumor detection model that combines RNN and CNN to learn local and global characteristics of users and obtain a better representation of forwarding users. To identify users in shared accounts, Jiang et al. [27] exploited a session-based heterogeneous graph model to learn user features. Through experimental verification, Shu et al. [28] indicated that there is a significant difference between the basic information of users involved in fake news and that of users involved in real news. Dou et al. [29] learned the implicit preference characteristics of users by analyzing their historical posts and introduced the concept of content information. Detection methods based on propagation features mainly utilize the propagation structure in social networks to detect rumors. Ma et al. [12] developed a Recursive Neural Network based on tree structure to learn the structural information of rumor propagation. The methods in [25, 28] jointly learn the order relationship of user forwarded comments and their correlation with news content, and utilize the attention mechanism to provide interpretability. Lin et al. [30] represented the conversation structure as an undirected interaction graph and proposed the ClaHi-GAT model to encode it. This model consists of a graph attention network and a claim-guided hierarchical graph attention network. It can alleviate the negative impact of irrelevant post interactions while enhancing user opinion interaction. Tian et al. [31] proposed the DUCK model based on user and comment propagation networks, in which comment content, conversation structure and user interaction networks are modeled jointly by Transformer and graph attention networks. Min et al. [32] proposed the PSIN model based on user information and posts, in which a divide-and-conquer strategy is leveraged to model the heterogeneous relationships.

The dissemination of news data is mainly carried out through social media and information dissemination platforms. Lian et al. [33] proposed FIND, an intelligent fake news detection system based on federated learning (FL), which to some extent mitigates the serious harm that the spread of fake news brings to society and individuals. Phan et al. [34] argue that false information detection is crucial for filtering fake news in social networks, and outline a comprehensive method for implementing a fake news detection system with GNNs. Liao et al. [35] note that the spread of false information can have a series of negative impacts on society, and proposed the FDML model to detect false information and further reduce its spread. Zrnec et al. [36] observe that digital information exchange enables information sharing and that social media has become the main source of news for people. However, these behaviors have caused a proliferation of false news, which poses challenges for the safe dissemination of Internet news data, and an automatic false news detection system is needed to solve this problem. Other scholars [37, 38] have proposed new false information detection algorithms, attempting to further enhance the dissemination security of Internet news data through technological means. Gupta et al. [39] proposed a malicious user prediction model based on quantum machine learning that estimates the malicious entities present in the communication system before the data is allocated in distributed environments, which significantly improves the security of the system. Gupta et al. [40] proposed a secure data protection model for privacy preservation in the cloud environment that partitions, sanitizes, and analyzes the data effectively to improve the model's privacy. Gupta et al. [41] proposed a model based on differential privacy and machine learning that enables multiple owners to share their data for utilization and enables the classifier to render classification services for users in the cloud environment, while providing a robust mechanism to preserve the privacy of the data and the classifier. Singh et al. [42] proposed a model that partitions data into sensitive and non-sensitive parts, injects noise into the sensitive data, and performs classification tasks using k-anonymization, differential privacy, and machine learning approaches. Recent developments in speech-based sentiment analysis and the related open problems, together with the types of emotional speech features and their extraction techniques, are described in [43]. Overall, ensuring the authenticity of news data is one of the key security issues in the dissemination of news in the Internet space.

There is relatively little research on information security and the security of news data dissemination in current cloud computing. This article examines the issue from a new perspective, providing a better processing approach for news data stored on cloud platforms. Based on the aforementioned research work, this paper proposes a rumor detection model, called TBUFCM, which considers the deep features of users and the temporal signal changes during user information transmission. The proposed method can improve the performance of rumor detection.

TBUFCM model

The TBUFCM model consists of three key modules: data preprocessing, feature extraction, and feature fusion and classification. The data preprocessing module is responsible for cleaning and segmenting user comments, removing stop words, and vectorizing the text. The feature extraction module uses user basic attributes, time series, behavioral information, and comment text to learn user temporal features, propagation behavior features, and comment language features. The feature fusion and classification module integrates the different features and classifies events as rumors or non-rumors through a classifier. This model structure captures multi-dimensional user characteristics and improves the performance of rumor detection.

Definition of problem

The rumor detection task addressed in this paper is a binary classification task: judging whether an event is a rumor or a non-rumor based on the sequence of users participating in the event propagation (users are ordered chronologically by the time of their comments). The objective function is as follows.

$$\widehat{y}=f\left(X\right)$$
(1)

where \(\widehat{y}\in \left\{0, 1\right\}\); 0 denotes that the event is true and 1 denotes that the event is a rumor.

The mathematical definition of rumor detection task implemented by the model is as follows.

Suppose that \(S=\left\{{s}_{1},{s}_{2},{s}_{3}, \cdots , {s}_{\left|S\right|}\right\}\) is the event set, where \({s}_{i}\) represents the \(i\)th event. Let \(U=\left\{{u}_{1}, {u}_{2}, {u}_{3}, \cdots , {u}_{\left|U\right|}\right\}\) be the user set. Every \({u}_{j}\in U\) corresponds to a vector \({x}_{j}\in {R}^{d}\), which is the user's feature vector. When event \({s}_{i}\) occurs, some users share it and generate a series of forwarding behaviors. This time series is denoted as \({P}_{({s}_{i})}=\{\cdots ,\left({x}_{j}, t\right),\cdots \}\), where each tuple \(\left({x}_{j}, t\right)\) indicates that user \({u}_{j}\) released or forwarded event \({s}_{i}\) at time \(t\). The feature matrix of the users who participate in the propagation of event \({s}_{i}\) is \({X}_{i}={\left[{x}_{1}^{T}, {x}_{2}^{T}, {x}_{3}^{T}, \cdots ,{x}_{n}^{T}\right]}^{T}\), where \(n\) is the number of users who participated in the propagation of event \({s}_{i}\) and \({x}_{t}\) denotes the feature vector of the \(t\)th user to participate in event \({s}_{i}\).
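As an illustration of the notation above, the following minimal Python sketch builds the matrix \({X}_{i}\) from the propagation records \({P}_{({s}_{i})}\) by ordering them by time. The `Record` type, the function name and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical record type: (user feature vector x_j, participation time t).
Record = Tuple[List[float], float]


def build_feature_matrix(propagation: List[Record]) -> List[List[float]]:
    """Order the records of one event by time and stack the user vectors row by row."""
    ordered = sorted(propagation, key=lambda record: record[1])
    return [list(x) for x, _ in ordered]


# Toy event with three participating users and d = 2 features each.
P_si = [([3.0, 1.0], 20.0), ([5.0, 0.0], 5.0), ([1.0, 1.0], 12.0)]
X_i = build_feature_matrix(P_si)  # row t holds the vector of the t-th participant
```

Each row of `X_i` corresponds to one participant, so the row index plays the role of the subscript \(t\) in \({x}_{t}\).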

Model architecture

TBUFCM is mainly composed of three modules: data preprocessing module, feature extraction module, and feature fusion and classification module. The overview of the proposed model can be seen in Fig. 1.

1) Data preprocessing. The main task of this module is to process the user comment data under the event and convert the user comment content into a word vector form that the model can learn from. This module includes data cleaning, word segmentation, stop word removal, word vectorization and other steps.

2) Feature extraction. This module aims to learn the user's time series features, propagation behavior features and comment language features. The available inputs are user basic attributes, time series, behavior information and user comment text. The module consists of three parts: the user's time series feature learning model, the user's communication behavior feature learning model and the user's comment language feature learning model. These three parts are implemented by the TBUFCM model, the Bi-GCN model, and the Text CNN-GRU model, respectively.

3) Feature fusion and classification. This module fuses the different features produced by the feature extraction module into a comprehensive feature representation, on which further feature learning is performed. It consists of two main parts: feature fusion and the classifier.

Fig. 1 Architecture of TBUFCM

Data preprocessing

The overall structure of the data preprocessing module is shown in Fig. 2. It mainly implements two tasks: user feature vector representation and user time series modeling. Both tasks are elaborated in this subsection.

Fig. 2 The flow chart of user data preprocessing

The representation of user’s feature vector

The main purpose of the user feature vector representation is to express the user's basic information in the form of vectors and obtain the user's initial features, which supports the subsequent training and prediction of the model. The specific implementation process is shown in Fig. 3.

Fig. 3 Flowchart of the representation for user’s feature vector

First of all, the user data required by the model is collected according to Table 1, which includes the name of the account, personal information introduction, number of followers, number of fans, number of historical posts, user authentication information and whether the geographical location is enabled.

Table 1 User information table

Then, different types of user information in Table 1 are preprocessed to obtain the corresponding vector representation of this type of information, and the processing method is shown in Table 2. Finally, the vector representation of all types of information is spliced to obtain the user feature vector representation.

Table 2 Table of preprocessing methods

Different preprocessing methods are leveraged to process various user information.

1) Character length representation method: The number of characters in character-type user data is counted to indirectly represent user characteristics.

2) Numerical representation method: For user data whose type is numeric, the numerical value itself is directly used to represent the user characteristics.

3) Time representation method: The difference in days between two times is calculated and used to indirectly represent user characteristics; here, it is the number of days between the user's registration time and the user's comment time.

4) Boolean representation method: For user data with only two categories, the categories are converted to Boolean values (0 or 1), which indirectly represent user characteristics.

The user information in Table 1 is closely tied to the social media platform, and some user information is inevitably missing. This section therefore fills in missing user information with a fixed value, which is set to 0 by default.
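The four representation methods and the fixed-value filling can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the field names (`name`, `description`, `followers`, `fans`, `posts`, `verified`, `geo_enabled`, `registered_at`) and the assignment of each field to a method are hypothetical, since the contents of Tables 1 and 2 are not reproduced here.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

FILL_VALUE = 0.0  # fixed value substituted for missing fields (0 by default in the text)


def char_length(value: Optional[str]) -> float:
    """Character length representation: number of characters in a text field."""
    return FILL_VALUE if value is None else float(len(value))


def numeric(value: Optional[float]) -> float:
    """Numerical representation: the value itself."""
    return FILL_VALUE if value is None else float(value)


def days_between(later: Optional[datetime], earlier: Optional[datetime]) -> float:
    """Time representation: whole days between two timestamps."""
    if later is None or earlier is None:
        return FILL_VALUE
    return float((later - earlier).days)


def boolean(value: Optional[bool]) -> float:
    """Boolean representation: two-category field mapped to 0 or 1."""
    return FILL_VALUE if value is None else float(bool(value))


def user_feature_vector(user: Dict[str, Any], comment_time: datetime) -> List[float]:
    """Concatenate the per-field representations (field names are assumptions)."""
    return [
        char_length(user.get("name")),
        char_length(user.get("description")),
        numeric(user.get("followers")),
        numeric(user.get("fans")),
        numeric(user.get("posts")),
        boolean(user.get("verified")),
        boolean(user.get("geo_enabled")),
        days_between(comment_time, user.get("registered_at")),
    ]


# Toy user with a missing profile description.
user = {"name": "alice_01", "description": None, "followers": 120, "fans": 45,
        "posts": 300, "verified": True, "geo_enabled": False,
        "registered_at": datetime(2020, 1, 1)}
vec = user_feature_vector(user, comment_time=datetime(2020, 1, 11, 8, 30))
```

In this toy case the missing description is replaced by the fill value, and the last component is the account age in days at comment time.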

User time series modeling

To learn how user characteristics change with time during event propagation, a user time series must be constructed. The user time series modeling process is shown in Fig. 4: the user feature vectors are arranged according to the time order of the users' comments to form a feature matrix.

Fig. 4 Flowchart of user time series modeling

It is worth noting that when the events are truncated at the same propagation time, the number of participating users differs from event to event, so the user time series has variable length. For example, after 10 minutes of propagation, some events have 20 participating users, some have 5, and some have none. Therefore, the user time series needs to be padded or truncated to convert the variable-length sequence into a fixed-length sequence, so that all inputs to the model have the same dimension. The specific operation flow is shown in Algorithm 1.

Algorithm 1. Sequence completion algorithm
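The steps of Algorithm 1 are not reproduced in the text, so the following Python sketch only illustrates the padding and truncation idea described above. It assumes that short sequences are padded with zero vectors at the end and that long sequences keep their earliest entries; both choices, and the function name `fix_length`, are assumptions rather than the authors' procedure.

```python
from typing import List


def fix_length(sequence: List[List[float]], length: int, dim: int,
               pad_value: float = 0.0) -> List[List[float]]:
    """Return a copy of `sequence` with exactly `length` rows of size `dim`."""
    if len(sequence) >= length:
        # Truncate: keep the earliest `length` participants.
        return [list(row) for row in sequence[:length]]
    # Pad: append constant vectors until the target length is reached.
    padding = [[pad_value] * dim for _ in range(length - len(sequence))]
    return [list(row) for row in sequence] + padding


short = fix_length([[1.0, 2.0]], length=3, dim=2)
long = fix_length([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], length=2, dim=2)
empty = fix_length([], length=2, dim=2)  # event with no participants yet
```

An event with no participating users becomes an all-padding matrix, which keeps the input dimension identical across events.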

Feature extraction

The feature extraction module mainly learns the potential time difference features of users, and its overall structure is shown in Fig. 5. This module is mainly composed of three parts: feature enhancement, local feature extraction and global feature extraction, which will be elaborated as follows.

Fig. 5 Structure diagram of feature extraction module

Feature enhancement

In traditional rumor detection models, the feature vectors used to represent user information are trained on open-domain data. For example, the number of a user's followers and the personal profile on a social network platform can support the training of feature vectors. These data are unrelated to the topics involved in the current rumor detection task, hence the feature vectors may be biased with respect to specific topics. To represent the user characteristics under a specific event more accurately and to enhance the original user characteristics, this paper designs a feature enhancement function. Its calculation formula is shown in Eq (2).

$$f(t)={e}^{-(\sigma (\sum_{k = 1}^{n}{w}_{ik}{x}_{kj}+{b}_{ij})+\beta )t}$$
(2)

where \(t\) is the time span between the user's participation and the occurrence of the event, \(i\) is the user's posting order, and \({x}_{ij}\) is the \(j\)th feature of the \(i\)th posting user. \({W}_{0}\in {R}^{n\times n}\) and \({B}_{0}\in {R}^{n\times d}\) are the randomly initialized weight and bias matrices, with \({w}_{ik}\in {W}_{0}\) and \({b}_{ij}\in {B}_{0}\). \(\sigma (\cdot )\) is the activation function; ReLU is chosen to avoid negative values. \(\beta\) is a fixed offset: because the ReLU output is non-negative, \(f(t)\le {e}^{-\beta t}\) for \(t\ge 0\), so \(\beta\) places an upper bound on the final weight.

The feature matrix \(X\) is processed by the feature enhancement function to obtain the attention weight matrix \(V\). The aging feature matrix \({X}^{\prime}\) is then obtained by element-wise multiplication of the attention matrix \(V\) and the feature matrix \(X\), which have the same shape \(n\times d\).

In the early detection of rumors the original feature matrix is small, and the deep learning model may lose some original information if it uses the user aging feature matrix directly. Hence a balance coefficient is used to combine the original feature matrix and the aging feature matrix into the feature matrix \({X}_{final}\), which represents the user characteristics. The formula for calculating \({X}_{final}\) is shown in Eq (3), where \(\varphi\) is the balance coefficient.

$${X}_{final}=X+\varphi \cdot {X}^{\prime}$$
(3)
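Eqs (2) and (3) can be traced numerically with the short Python sketch below. It is an illustration under stated assumptions: the weights are fixed toy values instead of the random initialization described in the text, the time unit of \(t\) is arbitrary, and the function name `enhance` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def enhance(X: Matrix, t: List[float], W: Matrix, B: Matrix,
            beta: float, phi: float) -> Matrix:
    """Apply Eq (2) to get the weights V, then Eq (3): X_final = X + phi * (V * X)."""
    n, d = len(X), len(X[0])
    X_final = []
    for i in range(n):
        row = []
        for j in range(d):
            z = sum(W[i][k] * X[k][j] for k in range(n)) + B[i][j]
            v = math.exp(-(max(z, 0.0) + beta) * t[i])  # Eq (2) with ReLU as sigma
            x_prime = v * X[i][j]                       # element-wise product V * X
            row.append(X[i][j] + phi * x_prime)         # Eq (3)
        X_final.append(row)
    return X_final


# Toy input: n = 2 users, d = 2 features; the second user joins one time unit later.
X = [[1.0, 2.0], [3.0, 4.0]]
t = [0.0, 1.0]
W0 = [[0.1, 0.0], [0.0, 0.1]]   # assumed values, not learned weights
B0 = [[0.0, 0.0], [0.0, 0.0]]
X_final = enhance(X, t, W0, B0, beta=0.5, phi=0.5)
```

For the first user \(t=0\), so the weight is 1 and the row is simply scaled by \(1+\varphi\); for later users the weight decays with \(t\), and the contribution of \({X}^{\prime}\) shrinks accordingly.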

Local feature extraction

The convolutional layer of a CNN can extract the local features of the input data and offers good generalization and feature extraction ability. Hence a CNN is leveraged to learn the local change information of the user time series. The CNN model structure is shown in Fig. 6.

Fig. 6 The structure of CNN

The user feature matrix \({X}_{final}\) enhanced by feature enhancement function is fed to the convolutional neural network. The specific implementation process is as follows:

1) Convolution layer. The input \({X}_{final}\) first enters the convolution layer, where the convolution kernel \({W}_{f}\in {R}^{r\times d}\) performs convolution operations on \({X}_{t:t+r-1}\in {R}^{r\times d}\). Here \(r\) is the size of the convolution kernel and \(d\) is the dimension of the user features. The ReLU activation function is then used to calculate the feature representation \({c}_{t}\in R\), as shown in Eq (4), where \({b}_{f}\) denotes the bias parameter. The resulting vector is \(C={\left[{c}_{1},{c}_{2},...,{c}_{t},\dots ,{c}_{n-r+1}\right]}^{T}\).

$${c}_{t}=max({W}_{f} \cdot {X}_{t:t+r-1}+{b}_{f},0)$$
(4)

2) Pooling layer. The average pooling method is used to pool the output data of the convolutional layer. The calculation method is shown in Eq (5).

$$c=\frac{1}{n}\sum\nolimits_{t = 1}^{n-r+1}{c}_{t}$$
(5)

3) Concatenation. In this paper, \(m\) convolution kernels \({W}_{f}\in {R}^{r\times d}\) of different sizes \(r\) are used to perform the same convolution operation, and the pooled outputs are then concatenated. Finally, the feature representation \(c\in {R}^{m}\) is obtained.
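The three steps above can be sketched in plain Python as follows. The kernel values and function names are hypothetical, and the pooling step divides by \(n\) exactly as Eq (5) is printed; a conventional mean over the \(n-r+1\) window outputs would divide by \(n-r+1\) instead.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv_feature(X: Matrix, W_f: Matrix, b_f: float) -> List[float]:
    """Eq (4): slide an r x d kernel over the rows of X and apply ReLU."""
    n, r, d = len(X), len(W_f), len(X[0])
    outputs = []
    for t in range(n - r + 1):
        s = sum(W_f[a][b] * X[t + a][b] for a in range(r) for b in range(d)) + b_f
        outputs.append(max(s, 0.0))
    return outputs


def pool(c: List[float], n: int) -> float:
    """Eq (5) as printed: sum of the window outputs divided by n."""
    return sum(c) / n


def local_features(X: Matrix, kernels: List[Tuple[Matrix, float]]) -> List[float]:
    """Concatenate the pooled output of each of the m kernels into a vector in R^m."""
    n = len(X)
    return [pool(conv_feature(X, W_f, b_f), n) for W_f, b_f in kernels]


# Toy input with n = 4 users and d = 2 features; two kernels of sizes r = 2 and r = 3.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
kernels = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),               # assumed values, r = 2
    ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -4.0),  # assumed values, r = 3
]
c = local_features(X_toy, kernels)
```

With these toy values the first kernel yields window outputs 2, 1, 1 and the second yields 0, 1, so `c` has one pooled entry per kernel.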

Global feature extraction

A Recurrent Neural Network (RNN) is a neural network model designed to process sequence data. It has memory and recursion, and can dynamically model the input sequence and extract time series features, which supports more accurate prediction and classification. Compared with other RNN variants, the GRU model uses fewer parameters, is easier to compute, and has a lower time cost. Therefore, the proposed method uses the GRU model to learn the temporal relationships and hidden features of the user time series, and to extract the global user features in the process of event propagation. Figure 7 shows the structure of the GRU model.

Fig. 7 Structure of GRU

In Fig. 7, \({x}_{t}\ (t=1,2,\dots ,n)\) represents the input data of the GRU model. Each GRU unit has two inputs: the user aging feature \({x}_{t}\) corresponding to that unit and the output state \({h}_{t-1}\) of the previous time step. The GRU model first takes the user time series in the matrix \({X}_{final}\) as its input data, and then uses the GRU units to learn the input sequence. Finally, it performs the mean pooling operation on the output states \({h}_{t}\ (t=1,2,\dots ,n)\) to obtain the user global time series feature representation \(h\). The calculation method is shown in Eq (6).

$$h=\frac{1}{n} \sum\nolimits_{t=1}^{n}{h}_{t}$$
(6)
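The gate equations of the GRU are not listed in the text, so the Python sketch below uses one common GRU formulation (update gate \(z\), reset gate \(r\), candidate state with \(\tanh\)) to show how the states \({h}_{t}\) are produced and mean-pooled as in Eq (6). The parameter names, the random initialization and the hidden size are assumptions for illustration only.

```python
import math
import random
from typing import Dict, List


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(M: List[List[float]], v: List[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(*vectors: List[float]) -> List[float]:
    return [sum(items) for items in zip(*vectors)]


def gru_step(x: List[float], h: List[float], p: Dict[str, list]) -> List[float]:
    """One GRU unit: combine the input x_t with the previous state h_{t-1}."""
    z = [sigmoid(v) for v in add(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(v) for v in add(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in add(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def gru_mean_pool(X: List[List[float]], p: Dict[str, list], hidden: int) -> List[float]:
    """Run the GRU over the rows of X and average the states, as in Eq (6)."""
    h = [0.0] * hidden
    states = []
    for x in X:
        h = gru_step(x, h, p)
        states.append(h)
    return [sum(s[k] for s in states) / len(states) for k in range(hidden)]


def random_params(d: int, hidden: int, seed: int = 0) -> Dict[str, list]:
    """Hypothetical untrained parameters, for illustration only."""
    rng = random.Random(seed)
    params: Dict[str, list] = {}
    for gate in "zrh":
        params["W" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
        params["U" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
        params["b" + gate] = [0.0] * hidden
    return params


X_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 3 users, d = 2 features
h_global = gru_mean_pool(X_seq, random_params(d=2, hidden=3), hidden=3)
```

In practice a deep learning framework would supply the GRU layer and train its parameters; the loop here only makes the data flow from \({x}_{t}\) and \({h}_{t-1}\) to the pooled vector \(h\) explicit.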

Feature fusion and classification

In the rumor detection task, both local and global features are important sources of information. Compared with local features, global features contain broader and deeper user characteristics and behavioral patterns, and therefore provide more comprehensive rumor detection information. To classify events well, the local and global features need to be fused before classification. A common approach is to concatenate the representation vectors of the two types of features and then process and classify them with fully connected layers and activation functions. This makes use of the advantages of both features to enhance the robustness and accuracy of the model, and hence improves the performance of rumor detection. The module structure is shown in Fig. 8, and the specific calculation process is shown in Algorithm 2.

Algorithm 2. Calculation flow of feature fusion and classification
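The steps of Algorithm 2 are not reproduced in the text. As a rough illustration of the concatenate-then-classify approach described above, the Python sketch below joins a local vector \(c\) and a global vector \(h\) and passes them through a single fully connected unit with a sigmoid activation. The single output unit, the 0.5 decision threshold and all weight values are assumptions, not the authors' configuration.

```python
import math
from typing import List, Tuple


def fuse_and_classify(c: List[float], h: List[float],
                      weights: List[float], bias: float) -> Tuple[List[float], float, int]:
    """Concatenate local and global features, then apply one dense unit plus sigmoid."""
    fused = list(c) + list(h)
    logit = sum(w * v for w, v in zip(weights, fused)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    label = int(prob >= 0.5)  # 1 = rumor, 0 = non-rumor, following Eq (1)
    return fused, prob, label


c_local = [1.0, 0.25]          # toy local features (m = 2)
h_glob = [0.1, -0.2, 0.3]      # toy global features (hidden size 3)
fused, prob, label = fuse_and_classify(
    c_local, h_glob, weights=[1.0, 0.0, 0.0, 0.0, 0.0], bias=-3.0)
```

The fused vector has dimension \(m\) plus the GRU hidden size, so the first fully connected layer must be sized to match.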

Fig. 8 Structure of feature fusion and classification module

Experiment results and analysis

Dataset and parameter settings

The experimental evaluation is conducted on the public microblog (Weibo) dataset [11]. The dataset includes all user-related information used in the model, such as the user's nickname, personal profile, number of fans and number of followers. There are 4664 events in total, of which 2313 are rumor events and 2351 are non-rumor events. The total number of users is 2,746,818, and the total number of comments is 3,805,656. The relevant information of the Weibo public dataset is shown in Table 3.

Table 3 Statistical table of related information of microblog dataset

In this paper, rumors are used as positive samples and the events in the dataset are divided according to the ratio of 7:2:1. Specifically, 70% of the events are used as the training set to train the model, 20% are used as the test set to evaluate the model, and 10% are used as the validation set to tune the model. Table 4 shows the division of the data set.

Table 4 Details of data set partition table
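A 7:2:1 split over event identifiers can be sketched as follows. The shuffling, the fixed seed, the rounding rule and the function name `split_events` are assumptions; the paper does not state how events were assigned to each subset, so the resulting counts may differ slightly from those in Table 4.

```python
import random

def split_events(event_ids, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle event ids and split them into train / test / validation lists."""
    ids = list(event_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is reproducible
    n_train = int(len(ids) * ratios[0])
    n_test = int(len(ids) * ratios[1])
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    val = ids[n_train + n_test:]      # the remainder goes to the validation set
    return train, test, val

# 4664 is the number of events reported for the dataset.
train, test, val = split_events(range(4664))
print(len(train), len(test), len(val))
```

Splitting at the event level, rather than at the level of individual posts, keeps all retweets of one event in the same subset and avoids leakage between training and evaluation.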

The loss function used during model training and the hyperparameter settings are shown in Table 5.

Table 5 Hyperparameter settings of TBUFCM model

Baseline model comparison experiment

Baseline model

  a) DTC [21]: The model combines user profiles and original post content, and uses a decision tree classifier to detect rumors.

  b) SVM-RBF [8]: The model combines the statistical features of posts with an SVM classifier based on the RBF kernel to detect rumors.

  c) SVM-TS [7]: The model combines time series with a support vector machine classifier to detect rumors.

  d) DTR [4]: The model combines text features with a decision tree classifier to detect rumors.

  e) GRU [12]: The model combines the propagation structure with a GRU model to detect rumors.

  f) RFC [9]: The model combines user features, text features and propagation structure features, and uses a random forest classifier to detect rumors.

  g) PPC [26]: The model uses user characteristics and detects rumors by combining an RNN model and a CNN model to learn the time series of user characteristics along the sequential propagation path of a rumor.

Experimental result and analysis

The experimental performance of different models on the dataset is shown in Table 6.

Table 6 Results of model comparison

As shown in Table 6, the proposed TBUFCM model outperforms the baseline models on the microblog dataset [11], reaching an accuracy of 92.9%, which verifies the effectiveness of TBUFCM in the rumor detection task. It is noteworthy that both our model and PPC are built on CNN and GRU, yet the proposed model outperforms PPC. This indicates that the user timeliness feature proposed in this paper enhances the user features.
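For reference, the sketch below shows how accuracy and the usual companion metrics can be computed with rumors as the positive class. Only accuracy is quoted in the text above; precision, recall and F1 are included as common choices and are an assumption about the metrics a comparison like this would report. The labels are toy values, not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` as the rumor class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels: 1 = rumor (positive), 0 = non-rumor.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```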

In addition, the early detection effect of the model is also evaluated in this section, and the experimental results are shown in Fig. 9.

Fig. 9 Comparison experiment of early detection

As shown in Fig. 9, the accuracy of TBUFCM is higher than that of the other models under the same number of retweets, and TBUFCM reaches an accuracy of 92% with only 20 retweets. These results demonstrate that TBUFCM performs well in the early rumor detection task. Note that it is difficult to analyze the early detection effect quantitatively by using the event propagation time directly. However, the data show a regular relationship between the propagation time and the number of retweets participating in the propagation: generally, there are about 30 retweets after 5 minutes of propagation. This section therefore uses the number of retweets as an indirect measure of propagation time. The early detection effect is analyzed quantitatively by varying the number of retweets; that is, the number of retweets is the independent variable and the resulting change in accuracy is observed.
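The evaluation protocol described above, in which each event is cut off after its first k retweets and accuracy is measured as k varies, can be sketched as follows. The event layout, the field names and the rule in `toy_classify` are hypothetical stand-ins; in practice the trained model would take the place of the toy classifier.

```python
def truncate(event, k):
    """Keep only the first k retweets of an event, ordered by timestamp."""
    ordered = sorted(event["retweets"], key=lambda r: r["time"])
    return {**event, "retweets": ordered[:k]}

def accuracy_at_k(events, classify, k):
    """Accuracy when the classifier sees only the first k retweets of each event."""
    correct = sum(classify(truncate(e, k)) == e["label"] for e in events)
    return correct / len(events)

def toy_classify(event):
    """Hypothetical rule: predict rumor (1) if the mean follower count is below 100."""
    counts = [r["followers"] for r in event["retweets"]]
    if not counts:
        return 0
    return 1 if sum(counts) / len(counts) < 100 else 0

events = [
    {"label": 1, "retweets": [{"time": 3, "followers": 10},
                              {"time": 1, "followers": 150},
                              {"time": 2, "followers": 20}]},
    {"label": 0, "retweets": [{"time": 1, "followers": 300},
                              {"time": 2, "followers": 400},
                              {"time": 3, "followers": 500}]},
]

for k in (1, 2, 3):
    print(k, accuracy_at_k(events, toy_classify, k))
```

Using the retweet count as the independent variable makes the curves of different models directly comparable, since every model sees the same amount of propagation data at each point.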

In summary, the experimental results indicate that TBUFCM performs well in terms of both overall accuracy and early rumor detection.

Conclusion

With regard to the security of news data on the Internet, the user features used in previous rumor detection models may be biased because event topics differ. To address this issue, this article designs a rumor detection model called TBUFCM. The model first uses a feature enhancement function to generate user feature vectors, which represent and highlight the associations and differences between users more accurately. Then, GRU and CNN are used to extract the global and local temporal changes of the features, respectively. The experimental results show that TBUFCM outperforms the baseline models. The proposed model also has limitations.

This article achieves rumor classification mainly by learning user features under a single event. That is, the user set under an event is treated in isolation, and the coupling and correlation between events are ignored. Even so, the proposed method can effectively address the security issues of news data dissemination in the Internet environment. In future research, the symbiotic relationship between users under rumor events and real events can be modeled to further improve the performance of rumor detection models and ensure the safe dissemination of news data on the Internet.

Beyond the detection of false information, the security of news data dissemination involves many other tasks. For example, the dissemination chain of news data also gives rise to adversarial tasks involving false information. Such adversarial tasks are a new problem in news data dissemination; the area is promising, and we expect the research community to address it. We will continue to follow the latest developments in deep learning, explore and introduce new algorithms, and deepen this research in combination with large language models to further improve performance.

References

  1. Gupta A, Lamba H, Kumaraguru P, Joshi A (2013) Faking sandy: characterizing and identifying fake images on twitter during hurricane sandy. In Proceedings of the 22nd international conference on World Wide Web, p 729-736

  2. Liang G, He W, Xu C, Chen L, Zeng J (2015) Rumor identification in microblogging systems based on users’ behavior. IEEE Trans Comput Soc Syst 2(3):99–108

  3. Sun S, Liu H, He J, Du X (2013) Detecting event rumors on sina weibo automatically. In Web Technologies and Applications: 15th Asia-Pacific Web Conference, APWeb 2013, Sydney, Australia, April 4-6, 2013. Proceedings 15. Springer, Berlin, Heidelberg, p 120-131

  4. Kwon S, Cha M, Jung K (2017) Rumor detection over varying time windows. PloS one 12(1):e0168344

  5. Zhang Q, Zhang S, Dong J, Xiong J, Cheng X (2015) Automatic detection of rumor on social network. In Natural Language Processing and Chinese Computing: 4th CCF Conference, NLPCC 2015, Nanchang, China, October 9-13, 2015, Proceedings 4. Springer International Publishing, p 113-122

  6. Wu K, Yang S, Zhu KQ (2015) False rumors detection on sina weibo by propagation structures. In 2015 IEEE 31st international conference on data engineering. IEEE, p 651-662

  7. Ma J, Gao W, Wei Z, Lu Y, Wong K-F (2015) Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM international on conference on information and knowledge management, p 1751-1754

  8. Yang F, Liu Y, Yu X, Yang M (2012) Automatic detection of rumor on sina weibo. In Proceedings of the ACM SIGKDD workshop on mining data semantics, p 1-7

  9. Kwon S, Cha M, Jung K, Chen W, Wang Y (2013) Prominent features of rumor propagation in online social media. In 2013 IEEE 13th international conference on data mining. IEEE, p 1103-1108

  10. Zhao Z, Resnick P, Mei Q (2015) Enquiring minds: Early detection of rumors in social media from enquiry posts. In Proceedings of the 24th international conference on world wide web, p 1395-1405

  11. Ma J, Gao W, Mitra P, Kwon S, Jansen BJ, Wong KF, Cha M (2016) Detecting rumors from microblogs with recurrent neural networks

  12. Ma J, Gao W, Wong K-F (2018) Rumor detection on twitter with tree-structured recursive neural networks. Association for Computational Linguistics

  13. Xu Y, Wang C, Dan Z, Sun S, Dong F (2019) Deep recurrent neural network and data filtering for rumor detection on sina weibo. Symmetry 11(11):1408

  14. Sujana Y, Li J, Kao HY (2020) Rumor detection on Twitter using multiloss hierarchical BiLSTM with an attenuation factor. https://doi.org/10.48550/arXiv.2011.00259

  15. Yu F, Liu Q, Wu S, Wang L, Tan T (2017) A Convolutional Approach for Misinformation Identification. In IJCAI, p 3901-3907

  16. Monti F, Frasca F, Eynard D, Mannion D, Bronstein M (2019) Fake news detection on social media using geometric deep learning. arXiv preprint. arXiv:1902.06673

  17. Wu Z, Pi D, Chen J, Xie M, Cao J (2020) Rumor detection based on propagation graph neural network with attention mechanism. Expert Syst Appl 158:113595

  18. Lu Y-J, Li C-T (2020) GCAN: Graph-aware co-attention networks for explainable fake news detection on social media. arXiv preprint arXiv:2004.11648

  19. Rao D, Miao X, Jiang Z, Li R (2021) STANKER: Stacking network based on level-grained attention-masked BERT for rumor detection on social media. In Proceedings of the 2021 conference on empirical methods in natural language processing, p 3347-3363

  20. Ma J, Gao W, Wong K-F (2019) Detect rumors on twitter by promoting information campaigns with generative adversarial learning. In The world wide web conference, p 3049-3055

  21. Castillo C, Mendoza M, Poblete B (2011) Information credibility on twitter. Proceedings of the 20th international conference on World wide web

  22. Popat K (2017) Assessing the credibility of claims on the web. In Proceedings of the 26th international conference on world wide web companion, p 735-739

  23. Potthast M, Kiesel J, Reinartz K, Bevendorff J, Stein B (2017) A stylometric inquiry into hyperpartisan and fake news. arXiv preprint arXiv:1702.05638

  24. Guo C, Cao J, Zhang X, Shu K, Yu M (2019) Exploiting emotions for fake news detection on social media. arXiv preprint arXiv:1903.01728

  25. Ruchansky N, Seo S, Liu Y (2017) Csi: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, p 797-806

  26. Liu Y, Wu Y-F (2018) Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In Proceedings of the AAAI conference on artificial intelligence 32(1)

  27. Jiang J-Y, Li C-T, Chen Y, Wang W (2018) Identifying users behind shared accounts in online streaming services. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, p 65-74

  28. Shu K, Bernard HR, Liu H (2019) Studying fake news via network analysis: detection and mitigation. Emerging research challenges and opportunities in computational social network analysis and mining 43-65

  29. Dou Y, Shu K, Xia C, Yu PS, Sun L (2021) User preference-aware fake news detection. In Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, p 2051-2055

  30. Lin H, Ma J, Cheng M, Yang Z, Chen L, Chen G (2021) Rumor detection on twitter with claim-guided hierarchical graph attention networks. arXiv preprint arXiv:2110.04522

  31. Tian L, Zhang XJ, Lau JH (2022) Duck: Rumour detection on social media by modelling user and comment propagation networks. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, p 4939-4949

  32. Min E, Rong Y, Bian Y, Xu T, Zhao P, Huang J, Ananiadou S (2022) Divide-and-conquer: Post-user interaction network for fake news detection on social media. In: Proceedings of the ACM web conference 2022, p 1148-1158

  33. Lian Z, Zhang C, Su C, Dharejo FA, Almutiq M, Memon MH (2023) FIND: Privacy-Enhanced Federated Learning for Intelligent Fake News Detection. IEEE Transactions on Computational Social Systems

  34. Phan HT, Nguyen NT, Hwang D (2023) Fake news detection: A survey of graph neural network methods. Applied Soft Computing 110235

  35. Liao Q et al (2021) An integrated multi-task model for fake news detection. IEEE Trans Knowl Data Eng 34(11):5154–65

  36. Zrnec A, Poženel M, Lavbič D (2022) Users’ ability to perceive misinformation: an information quality assessment approach. Inform Process Manage 59(1):102739

  37. Guo Y, et al (2023) MDG: Fusion learning of the maximal diffusion, deep propagation and global structure features of fake news. Expert Systems with Applications 213

  38. Hu L, Chen Z, Zhao Z, Yin J, Nie L (2022) Causal inference for leveraging image-text matching bias in multi-modal fake news detection. IEEE Transactions on Knowledge and Data Engineering

  39. Gupta R et al (2022) Quantum machine learning driven malicious user prediction for cloud network communications. IEEE Netw Lett 4(4):174–178

  40. Gupta R et al (2022) Differential and triphase adaptive learning-based privacy-preserving model for medical data in cloud environment. IEEE Netw Lett 4(4):217–221

  41. Gupta R, Singh AK (2022) A differential approach for data and classification service-based privacy-preserving machine learning model in cloud environment. New Generation Computing 40(3):737–764

  42. Singh AK, Gupta R (2022) A privacy-preserving model based on differential approach for sensitive data in cloud environment. Multimed Tools Appl 81(23):33127–33150

  43. Tripathi A, Singh U, Bansal G, Gupta R, Singh AK (2020) A review on emotion detection and classification using speech. In Proceedings of the international conference on innovative computing & communications (ICICC)

Funding

This work is partly supported by “the Fundamental Research Funds for the Central Universities CUC230A013”, Natural Science Foundation of Beijing Municipality (No. 4222038), National Natural Science Foundation of China (Grant No. 62176240).

Author information

Contributions

W.S., K.S. and X.W. wrote the main manuscript text. T.Y. and Y.Z. prepared Figs. 1, 2 and 3. All authors reviewed the manuscript.

Corresponding author

Correspondence to Tong Yi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Song, K., Shang, W., Zhang, Y. et al. Security issues of news data dissemination in internet environment. J Cloud Comp 13, 68 (2024). https://doi.org/10.1186/s13677-024-00632-w

  • DOI: https://doi.org/10.1186/s13677-024-00632-w

Keywords