Security issues of news data dissemination in internet environment

With the rise of artificial intelligence and the development of social media, communication has become increasingly convenient. However, in the Internet environment, the dissemination of untrue news data causes many problems. Efficient, automatic detection of rumors on social platforms has therefore become an important research direction in recent years. This paper uses deep learning to mine how the features of users involved in rumor events change over time, and designs a rumor detection model called the Time Based User Feature Capture Model (TBUFCM). To obtain a feature vector that represents a user's comprehensive features under the current event, the model first recomputes the user feature vector with a feature enhancement function. It then uses a Gated Recurrent Unit (GRU) and a Convolutional Neural Network (CNN) to learn the global and local changes of user features, respectively. Finally, the hidden rumor features in the propagation process are discovered from user and time information. Experimental results show that TBUFCM outperforms the baseline models and reaches an accuracy of 92% with only 20 forwarded posts. The proposed method can effectively address the security problem of news data dissemination in the Internet environment.


Introduction
In the era of cloud computing, the Internet has become the main channel for news dissemination. However, the rapid spread of false information poses serious challenges to the security of news data. Cloud computing makes it convenient to store and process news data, but the openness of the cloud environment also creates opportunities for false information to spread. Our research aims to understand and address this problem. By integrating knowledge from cloud computing, data mining and artificial intelligence, we propose new false information detection methods to secure cloud-based news data dissemination in the Internet environment.
Rumor detection, one of the most important tasks in artificial intelligence security, is an urgent open problem. It has received widespread attention in recent years, and many methods have been proposed. Most previous methods treat rumor detection as a supervised binary classification problem: a classifier is trained to label each piece of information as rumor or non-rumor. Existing rumor features are mainly drawn from text content, user attributes and information dissemination. On Sina Weibo, for example, a post published by a user includes its text content, the user's attributes, and social communication features (likes, comments, retweets, etc.). Previous rumor detection methods fall into two categories: those based on traditional machine learning and those based on deep learning.
Rumor detection based on traditional machine learning involves three steps: (1) select and extract features that effectively represent rumors from the training data set; (2) use these features to train a classifier; (3) apply the trained classifier to unseen data to judge whether each item is a rumor. Commonly used classifiers include decision trees [1-4], support vector machines [2,5-8], Naive Bayes [1,2] and random forests [9,10]. Although traditional machine learning methods have achieved some success in rumor detection, they are limited in representation ability and detection accuracy.
Rumor detection based on deep learning automatically extracts deeper features from raw data, which avoids heavy feature engineering and greatly improves detection accuracy. Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) are widely used in rumor detection: RNNs are well suited to mining time-series information, and CNNs learn local features through convolution. Ma et al. [11] were the first to propose an RNN-based rumor detection model, achieving a breakthrough by learning the semantic features and temporal information of text content. Ma et al. [12] proposed a tree-structured recursive neural network that distinguishes rumors from non-rumors by learning the non-sequential propagation structure of text content. Xu et al. [13] developed a deep recurrent neural network with a symmetric structure that filters reply texts according to the number of the replying users' fans and captures dynamic temporal signals. Sujana et al. [14] presented a multi-loss hierarchical BiLSTM model with a decay factor. Yu et al. [15] proposed a CNN-based rumor detection method (CAMI) and verified experimentally that CNNs are effective for this task. Beyond RNNs and CNNs, other deep learning models such as Graph Neural Networks, Transformer models and graph attention networks have also been widely used for rumor detection [16,17]. Lu et al. [18] proposed a rumor detection model that uses an attention mechanism and several neural networks to better combine user features, propagation features and text content. Rao et al. [19] designed a BERT model that applies language noise with a layer-granularity technique, named LPGAM-BERT, which masks insignificant common concerns between source posts and low-level comments. Ma et al. [20] developed a rumor detection model based on generative adversarial networks that is both robust and effective. Compared with traditional machine learning methods, deep learning methods achieve better detection performance.
Most previous rumor detection methods consider only text content, and a single, incomplete information source degrades detection performance. In addition, most previous methods ignore that user features may shift over time and across topics, so they cannot capture the associations and differences between users well. To alleviate these issues, we build multi-dimensional user features from users' personal information, behavior information and influence, and use deep learning models to study how these features change as an event spreads. The proposed method recaptures user features to fit the current topic in the specific scenario of rumor detection. The main contributions of this paper are summarized as follows.
(1) In order to solve the problem that user characteristics will shift with the development of the event in the process of event

Related work
For data stored on current cloud platforms, people tend to care only about the security of data storage and use, not about the authenticity of the data itself. Securing data transmission starts with ensuring the authenticity of the data source. It is therefore particularly important that news data disseminated in the Internet environment be true and correct.
Three categories of features can be used to distinguish rumors from non-rumors: text features, user features and propagation features. Early research mostly extracted features from news text to detect the authenticity of news articles; the text content includes the original post and users' responses. For example, Castillo et al. [21] used the proportions of positive posts, negative posts, and posts containing first-person pronouns as text features, and adopted a C4.5 classifier to identify false news topics on Twitter. Yang et al. [8] analyzed text features including link URLs and the numbers of positive and negative emotional words, and then used an SVM classifier for rumor detection. Ma et al. [11] used a recurrent neural network to learn user response characteristics from useful query phrases in user responses. The methods in [22-24] focus on language style, writing style and social emotion, respectively.
However, early methods considered only text content and therefore have limitations: for example, a single information source and unavailable content degrade detection performance. To address these issues, this paper introduces user and propagation features into rumor detection. Methods based on user features model the users who forward and respond to a source event, and rely on this user information to judge the authenticity of the event. Yang et al. [8] learned user features from users' basic information; common user information includes verification status, gender, hometown and fan count. Ruchansky et al. [25] learned user features from a user relationship matrix and introduced the concept of trustworthiness scores. Liu et al. [26] developed a rumor detection model that combines RNN and CNN to learn the local and global characteristics of users and obtain a better representation of forwarding users. To identify users in shared accounts, Jiang et al. [27] used a session-based heterogeneous graph model to learn user features. Shu et al. [28] showed experimentally that the basic information of users who engage with fake news differs significantly from that of users who engage with real news. Dou et al. [29] learned users' implicit preferences by analyzing their historical posts and introduced the concept of content information. Methods based on propagation features mainly use the propagation structure in social networks to detect rumors. Ma et al. [12] developed a tree-structured recursive neural network to learn the structural information of rumor propagation. The works in [25,28] jointly learn the order of users' forwarded comments and their correlation with news content, and use an attention mechanism to provide interpretability. Lin et al. [30] represented the conversation structure as an undirected interaction graph and proposed the ClaHi-GAT model to encode it. This model consists of a graph attention network and a claim-guided hierarchical graph attention network. It mitigates the negative impact of irrelevant post interactions while enhancing the interaction of user opinions. Tian et al. [31] proposed the DUCK model based on user and comment propagation networks, which jointly models comment content, conversation structure and user interaction networks with a Transformer and graph attention networks. Min et al. [32] proposed the PSIN model based on user information and posts, which uses a divide-and-conquer strategy to model heterogeneous relationships.
News data are mainly disseminated through social media and information dissemination platforms. Lian et al. [33] proposed FIND, an intelligent fake news detection system based on federated learning (FL), which partly mitigates the serious harm that the spread of fake news causes to society and individuals. Phan et al. [34] argued that detecting false information to filter fake news in social networks is crucial, and outlined a comprehensive approach to building fake news detection systems with GNNs. Liao et al. [35] argued that the spread of false information has a series of negative social impacts, and proposed the FDML model to detect false information and further reduce its spread. Zrnec et al. [36] noted that digital information exchange enables information sharing and that social media has become people's main source of news; however, this has led to a proliferation of false news that challenges the safe dissemination of Internet news data, so an automatic false news detection system is needed. Other researchers [37,38] have proposed new false information detection algorithms, seeking to further improve the dissemination security of Internet news data through technical means. Gupta et al. [39] proposed a malicious user prediction model based on quantum machine learning that identifies malicious entities in the communication system before data are allocated in distributed environments, significantly improving system security. Gupta et al. [40] proposed a secure data protection model for privacy preservation in the cloud that partitions, sanitizes and analyzes data to improve privacy. Gupta et al. [41] proposed a model based on differential privacy and machine learning that lets multiple owners share their data and lets a classifier provide classification services to users in the cloud, with a robust mechanism to preserve the privacy of both the data and the classifier. Singh et al. [42] proposed a model that partitions data into sensitive and non-sensitive parts, injects noise into the sensitive data, and performs classification using k-anonymization, differential privacy and machine learning. Recent developments in speech-based sentiment analysis and its related problems, including the types of features of emotional speech data and the techniques for extracting them, are described in [43]. The dissemination of news in the Internet space therefore requires guaranteeing the authenticity of news data, which is one of the central security issues in news data dissemination.
There is relatively little research on information security and the security of news data dissemination in current cloud computing. This article examines the issue from a new perspective and provides a better processing approach for news data stored on cloud platforms. Building on the research discussed above, this paper proposes a rumor detection model, TBUFCM, which considers users' deep features and the changes of temporal signals during information transmission between users. The proposed method improves rumor detection performance.

TBUFCM model
The TBUFCM model consists of three key modules: data preprocessing, feature extraction, and feature fusion and classification. The data preprocessing module cleans, segments, removes stop words from, and vectorizes user comments. The feature extraction module uses users' basic attributes, time series, behavioral information and comment text to learn user temporal features, propagation behavior features and comment language features. The feature fusion and classification module integrates these features and classifies events as rumors or non-rumors. This structure captures multi-dimensional user characteristics and improves rumor detection performance.

Definition of problem
The rumor detection model proposed in this paper performs binary classification: it judges whether an event is a rumor or a non-rumor based on the sequence of users participating in the event's propagation (users are ordered chronologically by their comments). The objective function is as follows.
$$y = f(X) \tag{1}$$
where $y \in \{0, 1\}$; $y = 0$ denotes that the event is true and $y = 1$ denotes that it is a rumor.
The rumor detection task implemented by the model is defined mathematically as follows.
Let $S = \{s_1, s_2, s_3, \dots, s_{|S|}\}$ be the event set, where $s_i$ is the $i$th event, and let $U = \{u_1, u_2, u_3, \dots, u_{|U|}\}$ be the user set. Each user $u_j \in U$ corresponds to a feature vector $x_j \in \mathbb{R}^d$. When event $s_i$ occurs, some users share it, producing a series of forwarding behaviors. This time series is denoted as a sequence of tuples $(x_j, t)$, where each tuple indicates that user $u_j$ posted or forwarded event $s_i$ at time $t$. The feature matrix of the users participating in the propagation of $s_i$ is $X = [x_1, x_2, \dots, x_n]^{\top} \in \mathbb{R}^{n \times d}$, where $n$ is the number of users who participated in propagating $s_i$ and $x_t$ is the feature vector of the $t$th participating user.

Model architecture
TBUFCM is mainly composed of three modules: data preprocessing module, feature extraction module, and feature fusion and classification module.The overview of the proposed model can be seen in Fig. 1.
1) Data preprocessing. This module processes the user comment data under each event and converts the comment content into word vectors that the model can learn from. It includes data cleaning, word segmentation, stop-word removal, word vectorization and other steps.
2) Feature extraction. This module learns users' time-series features, propagation behavior features and comment language features from users' basic attributes, time series, behavior information and comment text. It consists of three parts: the user time-series feature learning model, the user communication behavior feature learning model and the user comment language feature learning model.

Data preprocessing
The overall structure of the data preprocessing module is shown in Fig. 2. It performs two tasks, user feature vector representation and user time series modeling, both described in this subsection.

The representation of user's feature vector
User feature vector representation expresses the user's basic information as vectors to obtain the user's initial features, which supports subsequent model training and prediction. The process is shown in Fig. 3. First, the user data required by the model are collected according to Table 1: account name, personal profile, number of followers, number of fans, number of historical posts, user verification information, and whether geolocation is enabled.
Then, each type of user information in Table 1 is preprocessed into its own vector representation using the methods in Table 2. Finally, the vector representations of all information types are concatenated to form the user feature vector.
Different preprocessing methods are used for different kinds of user information:

1) Character length representation: for character-type user data, the number of characters is counted and used as an indirect representation of the user characteristic.
2) Numerical representation: for numeric user data, the value itself directly represents the user characteristic.
3) Time representation: the difference in days between two times indirectly represents the user characteristic; specifically, the number of days between the user's comment time and the user's registration time is computed.
4) Boolean representation: for user attributes with only two categories, the categories are converted to Boolean values (0 or 1) that indirectly represent the user characteristic.
The user information in Table 1 depends on the social media platform, so some of it is inevitably missing. This section therefore fills in missing user information with a fixed value, 0 by default.
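The four representation methods and the zero-fill rule can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the field names (`name`, `description`, `followers`, `fans`, `posts`, `registered_at`, `verified`, `geo_enabled`), the function `user_feature_vector`, and the order of fields in the output vector are hypothetical stand-ins for the attributes in Tables 1 and 2.

```python
from datetime import datetime
from typing import List, Optional

FILL_VALUE = 0.0  # fixed value used for any missing user information

# Hypothetical field groups standing in for the attributes of Table 1.
TEXT_FIELDS = ("name", "description")             # character length representation
NUMERIC_FIELDS = ("followers", "fans", "posts")   # numerical representation
BOOLEAN_FIELDS = ("verified", "geo_enabled")      # Boolean representation


def days_between(later: datetime, earlier: Optional[datetime]) -> float:
    """Time representation: whole days between two timestamps (fill value if missing)."""
    if earlier is None:
        return FILL_VALUE
    return float((later - earlier).days)


def user_feature_vector(profile: dict, comment_time: datetime) -> List[float]:
    """Concatenate the per-field representations into one user feature vector."""
    vec: List[float] = []
    for field in TEXT_FIELDS:
        text = profile.get(field)
        vec.append(float(len(text)) if text is not None else FILL_VALUE)
    for field in NUMERIC_FIELDS:
        value = profile.get(field)
        vec.append(float(value) if value is not None else FILL_VALUE)
    # Days from registration to the user's comment on the event.
    vec.append(days_between(comment_time, profile.get("registered_at")))
    for field in BOOLEAN_FIELDS:
        value = profile.get(field)
        vec.append(1.0 if value else 0.0)  # a missing flag also maps to 0
    return vec


profile = {"name": "abc", "description": None, "followers": 12, "fans": 340,
           "posts": 57, "registered_at": datetime(2012, 1, 1), "verified": True}
vec = user_feature_vector(profile, datetime(2012, 1, 11))
print(vec)  # [3.0, 0.0, 12.0, 340.0, 57.0, 10.0, 1.0, 0.0]
```

Raw counts such as fan numbers and day differences span very different ranges; whether the model normalizes them is not stated here, so any scaling step would be an additional assumption.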

User time series modeling
To learn how user characteristics change over time during event propagation, a user time series must be constructed. The modeling process is shown in Fig. 4: user feature vectors are arranged in the chronological order of the users' comments to form a feature matrix.
Note that if all events are cut off at the same propagation time, the number of participating users differs from event to event, so the user time series has variable length. For example, after 10 minutes of propagation, some events have 20 participating users, others have 5, and some have none. The user time series therefore needs to be padded or truncated into a fixed-length sequence so that all model inputs have the same dimensions. The procedure is given in Algorithm 1.
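Since Algorithm 1 itself is not reproduced here, the following Python sketch shows one plausible way to sort, truncate and pad a user time series. Keeping the n earliest participants and appending all-zero vectors at the end are assumptions, as is the hypothetical function name `to_fixed_length`.

```python
from typing import List, Tuple

Vector = List[float]


def to_fixed_length(records: List[Tuple[Vector, float]], n: int, d: int) -> List[Vector]:
    """Turn (user_vector, comment_time) records into an n x d feature matrix.

    Rows are ordered by comment time; longer series are truncated to the
    n earliest users and shorter ones are padded with all-zero rows.
    """
    ordered = [vec for vec, _ in sorted(records, key=lambda rec: rec[1])]
    if len(ordered) >= n:
        return ordered[:n]
    padding = [[0.0] * d for _ in range(n - len(ordered))]
    return ordered + padding


records = [([1.0, 2.0], 30.0), ([3.0, 4.0], 10.0)]  # (vector, minutes after posting)
print(to_fixed_length(records, n=3, d=2))  # [[3.0, 4.0], [1.0, 2.0], [0.0, 0.0]]
print(to_fixed_length(records, n=1, d=2))  # [[3.0, 4.0]]
```

Under this sketch, an event with no participants becomes an all-zero n x d matrix, so every event yields an input of the same shape.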

Feature extraction
The feature extraction module mainly learns the latent time-difference features of users; its overall structure is shown in Fig. 5. It consists of three parts, feature enhancement, local feature extraction and global feature extraction, which are described below.

Feature enhancement
In traditional rumor detection models, the feature vectors representing user information are trained on open-domain data, for example the number of a user's followers and the personal profile on a social network platform. Because these data are unrelated to the topics of the current rumor detection task, the feature vectors may be biased toward particular topics. To represent user characteristics under a specific event more accurately and to enhance the original user features, this paper designs a feature enhancement function, given in Eq (2).
Here, $t$ is the time span between the user's participation and the occurrence of the event, $i$ is the user's posting order, and $x_{ij}$ is the $j$th feature of the $i$th posting user. $W_0 \in \mathbb{R}^{n \times n}$ and $B_0 \in \mathbb{R}^{n \times d}$ are randomly initialized weight and bias matrices, with $w_{ik} \in W_0$ and $b_{ij} \in B_0$. $\sigma(\cdot)$ is the activation function; ReLU is chosen to avoid negative values. Processing the feature matrix $X$ with the feature enhancement function yields the attention weight matrix $V$, and the aging feature matrix $X'$ is obtained by element-wise (dot) multiplication of $V$ and $X$.
In early rumor detection the original feature matrix is small, and a deep learning model that uses only the user aging feature matrix may lose some of the original information. We therefore combine the original feature matrix and the aging feature matrix with a balance coefficient to obtain the feature matrix $X_{final}$, which represents the user characteristics. $X_{final}$ is computed by Eq (3), where $\phi$ is the balance coefficient.
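Eq (2) and Eq (3) are not reproduced in this text, so the Python sketch below only illustrates the data flow described above under explicit assumptions. The attention weights are taken as V = ReLU(W_0 X + B_0), which matches the stated shapes of W_0 (n x n) and B_0 (n x d) but omits the time span t, whose exact role in Eq (2) is not given; the aging matrix is the element-wise product X' = V * X; and the balance is assumed to be the convex combination X_final = phi * X + (1 - phi) * X'. The function names `random_matrix` and `enhance_features` are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Randomly initialized parameters, as W_0 and B_0 are described."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def enhance_features(X: Matrix, W0: Matrix, B0: Matrix, phi: float) -> Matrix:
    n, d = len(X), len(X[0])
    # ASSUMED form of Eq (2): V = ReLU(W0 @ X + B0), an n x d weight matrix.
    V = [[max(0.0, sum(W0[i][k] * X[k][j] for k in range(n)) + B0[i][j])
          for j in range(d)] for i in range(n)]
    # Aging feature matrix: element-wise product of V and X.
    X_aging = [[V[i][j] * X[i][j] for j in range(d)] for i in range(n)]
    # ASSUMED form of Eq (3): convex combination with balance coefficient phi.
    return [[phi * X[i][j] + (1.0 - phi) * X_aging[i][j] for j in range(d)]
            for i in range(n)]


# Deterministic check: identity W0 and zero B0 give V = ReLU(X).
X = [[1.0, 2.0], [3.0, -1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(enhance_features(X, identity, zeros, phi=0.5))  # [[1.0, 3.0], [6.0, -0.5]]

# Randomly initialized parameters, as in the untrained model.
rng = random.Random(0)
X_final = enhance_features(X, random_matrix(2, 2, rng), random_matrix(2, 2, rng), phi=0.5)
```

In this sketch, phi = 1 passes the original features through unchanged, while phi = 0 keeps only the aging features; intermediate values preserve part of the original information, which is the motivation given above.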

Local feature extraction
The convolutional layers of a CNN extract local features of the input data and offer better generalization and feature extraction ability than other models. A CNN is therefore used to learn the local change information of the user time series. The CNN structure is shown in Fig. 6.
The user feature matrix $X_{final}$ produced by the feature enhancement function is fed into the convolutional neural network as follows.
1) Convolution layer. $X_{final}$ first enters the convolution layer, where a convolution kernel $W_f \in \mathbb{R}^{r \times d}$ is convolved with each window $X_{t:t+r-1} \in \mathbb{R}^{r \times d}$; $r$ is the kernel size and $d$ is the dimension of the user features. ReLU is then applied to obtain the feature $c_t \in \mathbb{R}$, as in Eq (4):
$$c_t = \mathrm{ReLU}(W_f \cdot X_{t:t+r-1} + b_f) \tag{4}$$
where $b_f$ is the bias parameter and $\cdot$ denotes the sum of element-wise products. The resulting feature map is $[c_1, c_2, \dots, c_{n-r+1}]$.
2) Pooling layer. Average pooling is applied to the output of the convolution layer, as in Eq (5):
$$\hat{c} = \frac{1}{n-r+1} \sum_{t=1}^{n-r+1} c_t \tag{5}$$
3) Concatenation. This paper applies $m$ convolution kernels $W_f$ of different sizes $r$ in the same way and concatenates the pooled outputs, yielding the local feature representation $c \in \mathbb{R}^m$.
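The convolution, average pooling and concatenation steps described above can be sketched in plain Python as below. The kernel values and the function names `conv_avg_pool` and `local_features` are hypothetical; a real model would learn the kernels with a deep learning framework rather than fix them by hand.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv_avg_pool(X: Matrix, Wf: Matrix, bf: float) -> float:
    """One kernel Wf (r x d): c_t = ReLU(sum(Wf * X[t:t+r]) + bf), averaged over t."""
    n, r, d = len(X), len(Wf), len(X[0])
    if n < r:
        raise ValueError("sequence is shorter than the kernel")
    feature_map = []
    for t in range(n - r + 1):
        s = sum(Wf[a][j] * X[t + a][j] for a in range(r) for j in range(d))
        feature_map.append(max(0.0, s + bf))  # ReLU
    return sum(feature_map) / len(feature_map)  # average pooling


def local_features(X: Matrix, kernels: List[Tuple[Matrix, float]]) -> List[float]:
    """Apply m kernels (possibly of different sizes r) and concatenate: c in R^m."""
    return [conv_avg_pool(X, Wf, bf) for Wf, bf in kernels]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # n = 3 users, d = 2 features
kernels = [([[1.0, 1.0]], 0.0),              # r = 1
           ([[1.0, 0.0], [0.0, 1.0]], 0.0)]  # r = 2
c = local_features(X, kernels)
print([round(v, 4) for v in c])  # [1.3333, 1.5]
```

Each kernel contributes one pooled value, so m kernels give a length-m vector regardless of n; the fixed-length padding step guarantees n is at least the largest kernel size.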

Global feature extraction
A Recurrent Neural Network (RNN) is a neural network designed for sequence data. With its memory and recurrence, it can dynamically model an input sequence and extract time-series features for accurate prediction and classification. Compared with other RNN variants such as LSTM, the GRU uses fewer parameters, is easier to compute and has a lower time cost. The proposed method therefore uses a GRU to learn the temporal relationships and hidden features of the user time series and to extract global user features during event propagation. Fig. 7 shows the structure of the GRU model.
In Fig. 7, $x_t$ ($t = 1, 2, \dots, n$) is the input to the GRU model. Each GRU unit has two inputs: the user aging feature $x_t$ corresponding to that unit and the output state $h_{t-1}$ from the previous time step. The GRU takes the user time series in $X_{final}$ as input and processes it step by step. Finally, mean pooling over the output states $h_t$ ($t = 1, 2, \dots, n$) yields the global user time-series feature representation $h$, as in Eq (6):
$$h = \frac{1}{n} \sum_{t=1}^{n} h_t \tag{6}$$
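A plain-Python sketch of the global branch follows: a GRU cell is run over the rows of the feature matrix, and the hidden states are mean-pooled. It uses one common formulation of the GRU (update gate z, reset gate r, candidate state), omits bias terms for brevity, and uses randomly initialized, untrained weights; the class `GRUCell` and the function `global_features` are hypothetical names, not the authors' code.

```python
import math
import random
from typing import List

Vec = List[float]
Mat = List[List[float]]


def matvec(M: Mat, v: Vec) -> Vec:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class GRUCell:
    def __init__(self, d_in: int, d_h: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Mat:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.d_h = d_h
        self.Wz, self.Uz = init(d_h, d_in), init(d_h, d_h)  # update gate
        self.Wr, self.Ur = init(d_h, d_in), init(d_h, d_h)  # reset gate
        self.Wh, self.Uh = init(d_h, d_in), init(d_h, d_h)  # candidate state

    def step(self, x: Vec, h_prev: Vec) -> Vec:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h_prev))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h_prev))]
        reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, reset_h))]
        # New state interpolates between the previous state and the candidate.
        return [(1.0 - zi) * hp + zi * ci for zi, hp, ci in zip(z, h_prev, cand)]


def global_features(X: Mat, cell: GRUCell) -> Vec:
    """Run the GRU over x_1..x_n and mean-pool the hidden states h_1..h_n."""
    h = [0.0] * cell.d_h
    states = []
    for x_t in X:
        h = cell.step(x_t, h)
        states.append(h)
    return [sum(s[k] for s in states) / len(states) for k in range(cell.d_h)]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = global_features(X, GRUCell(d_in=2, d_h=4))
print(len(h))  # 4
```

Because each state is a convex combination of the previous state and a tanh output, every component of the pooled vector stays strictly between -1 and 1.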

Feature fusion and classification
In rumor detection, both local and global features are important sources of information. Global features capture broader and deeper user characteristics and behavioral patterns than local features, and so provide more comprehensive information for detection. To perform rumor classification well, the local and global features are therefore fused and classified. A common approach is to concatenate the two feature vectors and then process and classify the result with fully connected layers and activation functions. This makes full use of both kinds of features and improves the robustness and accuracy of the model, and hence rumor detection performance. The module structure is shown in Fig. 8, and the computation is given in Algorithm 2.
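Algorithm 2 is not reproduced here; the Python sketch below shows the generic pattern the paragraph describes: concatenate the local vector c and the global vector h, pass the result through a fully connected ReLU layer, and map it to a rumor probability with a sigmoid output. The single hidden layer, the sigmoid output, the 0.5 threshold and the function names `fuse_and_classify` and `predict` are assumptions, and the weights in the example are hand-picked rather than trained.

```python
import math
from typing import List

Vec = List[float]


def fuse_and_classify(c: Vec, h: Vec, W1: List[Vec], b1: Vec, w2: Vec, b2: float) -> float:
    """Return P(rumor) from concatenated local (c) and global (h) features."""
    z = c + h  # feature fusion by concatenation
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z)) + bj)  # FC + ReLU
              for row, bj in zip(W1, b1)]
    logit = sum(w * v for w, v in zip(w2, hidden)) + b2  # output layer
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


def predict(p: float, threshold: float = 0.5) -> int:
    """1 = rumor, 0 = non-rumor, following the label convention y in {0, 1}."""
    return 1 if p >= threshold else 0


p = fuse_and_classify(c=[1.0], h=[2.0], W1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
                      w2=[1.0, -1.0], b2=1.0)
print(p, predict(p))  # 0.5 1
```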

Experiment results and analysis

Dataset and parameter settings
The experiments are conducted on the public Weibo (microblog) dataset [11], which includes all the user-related information used by the model, such as the user's nickname, personal profile, number of fans and number of followers. The dataset contains 4664 events, of which 2313 are rumors and 2351 are non-rumors, with 2,746,818 users and 3,805,656 comments in total. Statistics of the Weibo dataset are shown in Table 3.
Rumors are treated as positive samples, and the events in the dataset are split in a 7:2:1 ratio: 70% of the events form the training set, 20% form the test set used to evaluate the model, and 10% form the validation set used to tune hyperparameters. Table 4 shows the split.
The loss function used during training and some hyperparameter settings are shown in Table 5.
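A random 7:2:1 split of the 4664 events can be sketched in Python as follows. The shuffling seed, rounding down for the training and test sizes, the absence of label stratification and the function name `split_events` are all assumptions, so the resulting counts need not match Table 4.

```python
import random
from typing import Iterable, List, Tuple


def split_events(event_ids: Iterable[int], train_ratio: float = 0.7,
                 test_ratio: float = 0.2, seed: int = 42
                 ) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle events and split them into training, test and validation sets."""
    ids = list(event_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    n_test = int(len(ids) * test_ratio)
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    valid = ids[n_train + n_test:]  # remainder (about 10%) for validation
    return train, test, valid


train, test, valid = split_events(range(4664))
print(len(train), len(test), len(valid))  # 3264 932 468
```

Splitting at the event level, as here, keeps all users and comments of one event in the same subset, which avoids leaking an event's propagation data between training and evaluation.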

Baseline model comparison experiment
Baseline models
a) DTC [21]: combines user profiles and original post content, and uses a decision tree classifier for rumor detection.
b) SVM-RBF [8]: combines statistical features of posts with an RBF-kernel SVM classifier.
c) SVM-TS [7]: combines time series with a support vector machine classifier.
d) DTR [4]: combines text features with a decision tree classifier.
e) GRU [12]: combines the propagation structure with a GRU model.
f) RFC [9]: combines user features, text features and propagation structure features, and uses a random forest classifier.
g) PPC [26]: uses user characteristics and combines an RNN and a CNN to learn the time series of user characteristics along the rumor's sequential propagation path.

Experimental result and analysis
The experimental performance of different models on the dataset is shown in Table 6.
As Table 6 shows, TBUFCM outperforms all baseline models on the Weibo dataset, reaching an accuracy of 92.9%, which verifies its effectiveness for rumor detection. Notably, both TBUFCM and PPC are built on CNN and GRU, yet TBUFCM outperforms PPC, indicating that the user timeliness feature proposed in this paper enhances user characteristics.
In addition, the early detection effect of the model is also evaluated in this section, and the experimental results are shown in Fig. 9.
As Fig. 9 shows, TBUFCM achieves higher accuracy than the other models for the same number of retweets, and it reaches 92% accuracy with only 20 retweets, demonstrating good performance in early rumor detection. Early detection performance is difficult to quantify directly in terms of event propagation time, but analysis of the data shows a regular relationship between propagation time and the number of retweets: there are generally about 30 retweets after 5 minutes of propagation. This section therefore uses the number of retweets participating in propagation as an indirect measure of propagation time, and quantifies early detection performance by varying it; that is, the number of retweets is the independent variable, and the change in accuracy is observed. In summary, the experimental results indicate that TBUFCM performs well in both overall accuracy and early rumor detection.
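The early-detection protocol (retweet count as the independent variable, accuracy as the response) can be sketched in Python as below. Here `model` is any callable mapping a fixed-size feature matrix to a 0/1 label; the helper names `first_k_matrix` and `early_detection_curve`, the zero-padding, and the toy threshold model and two events in the example are hypothetical and only exercise the loop.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Event = Tuple[List[Tuple[Vector, float]], int]  # ((user_vector, time) records, label)


def first_k_matrix(records: List[Tuple[Vector, float]], k: int, n: int, d: int) -> List[Vector]:
    """Keep the k earliest participants (at most n), then zero-pad to n rows."""
    ordered = [vec for vec, _ in sorted(records, key=lambda rec: rec[1])][:min(k, n)]
    return ordered + [[0.0] * d for _ in range(n - len(ordered))]


def early_detection_curve(events: List[Event], model: Callable[[List[Vector]], int],
                          ks: Sequence[int], n: int, d: int) -> Dict[int, float]:
    """Accuracy as a function of the number of retweets k observed so far."""
    curve = {}
    for k in ks:
        correct = sum(model(first_k_matrix(recs, k, n, d)) == label for recs, label in events)
        curve[k] = correct / len(events)
    return curve


def toy_model(X: List[Vector]) -> int:
    return 1 if sum(row[0] for row in X) >= 2 else 0


events = [([([1.0], 1.0), ([1.0], 2.0), ([1.0], 3.0)], 1),
          ([([0.0], 1.0), ([1.0], 2.0)], 0)]
print(early_detection_curve(events, toy_model, ks=[1, 2, 3], n=5, d=1))
# {1: 0.5, 2: 1.0, 3: 1.0}
```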

Conclusion
For the security of news data on the Internet, the user features used in previous rumor detection models may be biased by differences in event topics. To address this issue, this article designs a rumor detection model called TBUFCM. The model first uses a feature enhancement function to generate user feature vectors that more accurately represent and highlight the associations and differences between users. GRU and CNN are then used to extract the global and local temporal changes of these features, respectively. Experimental results show that TBUFCM improves on the baseline models. The proposed model also has limitations.
This article performs rumor classification by learning user features within a single event; that is, the users of each event are treated as isolated, ignoring the coupling and correlation between events. Nevertheless, the proposed method can effectively address security issues of news data dissemination in the Internet environment. Future research can model the symbiotic relationship between users in rumor events and in real events to further improve rumor detection and ensure the safe dissemination of news data on the Internet.
Beyond false information detection, news data dissemination raises many other security tasks. For example, along news dissemination chains there is also the adversarial false information task, a new and promising problem in news data dissemination that merits attention from the research community. We will continue to follow the latest developments in deep learning, explore and introduce new algorithms, and extend this research with large language models to further improve performance.

$$y = f(X) \tag{1}$$
The user's time-series feature learning model, communication behavior feature learning model and comment language feature learning model correspond to the TBUFCM model, the Bi-GCN model and the Text CNN-GRU model, respectively.
3) Feature fusion and classification. This module fuses the features extracted by the feature extraction module into a comprehensive feature representation, on which further feature learning is performed. It consists of two main parts: feature fusion and the classifier.

Fig. 3 Flowchart of the representation for user's feature vector

Table 1
User information table

Table 3
Statistical table of related information of microblog dataset

Table 4
Details of data set partition table

Table 5
Hyperparameter settings of TBUFCM model

Table 6
Results of model comparison

Fig. 9
Comparison experiment of early detection