In view of the above discussion of the state of recommendation model research, this paper proposes a hybrid recommendation model, named DMHR, based on deep emotion analysis of user reviews and cooperative fusion of multi-source recommendation views. DMHR proceeds as follows. First, the viewpoint pre-filtering method [33] is used to combine the user’s emotional tendency with the original rating, which provides the post-based collaborative filtering recommendation model with comprehensive scoring data that more accurately reflect the user’s real interests. At the same time, the text of each post’s content description is mined and represented as a distributed paragraph vector by a neural network, which makes it possible to compute the similarity between post contents and to construct a content-based recommendation model. Second, a cooperative training strategy fuses the two recommendation views; a data selection strategy based on confidence estimation and cluster analysis is added to the collaborative training to eliminate the distribution bias of the data added to the training pool during iterative training. Finally, the initial recommendation results are filtered and sorted using the scoring matrix and the post similarities output by the collaborative training model, giving the final recommendation results. For subsequent recommendations, the deviation of the user’s original score from the user’s real interests is corrected by mining the emotional tendency of the user’s reviews. The framework of the hybrid recommendation model is shown in Fig. 1.
Emotional analysis of user reviews
Distributed vector representation of user review text
Statistical analysis of the user review text in the recommendation model shows that reviews usually take the form of keywords and short texts. Research shows that such short texts must be processed differently from long texts. Short texts are brief and grammatically irregular, which makes traditional natural language processing techniques ineffective for short text analysis. Early analysis and application of short text relied mainly on enumeration or keyword matching, which avoids semantic understanding of the text, while automatic short text understanding usually relies on additional knowledge. In this paper, we use a keyword representation based on word vectors to overcome two problems of traditional sparse representations: the curse of dimensionality and the inability to express semantic information. The association between words is also mined, which improves the accuracy of the semantic representation of keywords.
Word2vec is an efficient predictive model for learning word embeddings, with two variants: the CBOW model and the Skip-Gram model [34]. CBOW predicts the probability of a central word from the words within its window, whereas Skip-Gram predicts the probability of the words within the window from the central word. The training goal of Skip-Gram is to find word vector representations that are useful for predicting the surrounding words in a sentence or document. Given a sentence whose words are ω1, ω2, …, ωT, the objective function g(ω) of the Skip-Gram model maximizes the average log-probability:
$$ \mathrm{g}\left(\upomega \right)=\frac{1}{T}{\sum}_{t=1}^T{\sum}_{-c\le j\le c,j\ne 0}\log p\left({\upomega}_{t+j}|{\upomega}_t\right) $$
(3)
In the above formula, c denotes the size of the training context window; a larger c yields more training examples and may therefore give higher accuracy. The Skip-Gram model uses the hierarchical softmax function to define p(ωt + j| ωt). Hierarchical softmax represents the output layer as a binary tree with the W words as its leaves, in which each node explicitly represents the relative probabilities of its child nodes. These relative probabilities define a random walk over the tree that assigns a probability to each word.
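The objective in formula (3) can be illustrated with a minimal sketch. It assumes a full softmax over a toy vocabulary in place of the hierarchical softmax described above, purely to keep the example short; the function names and the `in_vecs`/`out_vecs` dictionaries (input and output vectors per word) are hypothetical and not part of the original model description.

```python
import math

def skipgram_pairs(words, c):
    """Yield (center, context) pairs for every offset j with -c <= j <= c, j != 0."""
    for t, center in enumerate(words):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < len(words):
                yield center, words[t + j]

def softmax_log_prob(context, center, in_vecs, out_vecs):
    """log p(context | center) under a full softmax (assumption: not hierarchical)."""
    v = in_vecs[center]
    scores = {w: sum(a * b for a, b in zip(u, v)) for w, u in out_vecs.items()}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return scores[context] - log_z

def skipgram_objective(words, c, in_vecs, out_vecs):
    """Average log-probability g of formula (3); offsets outside the sentence are skipped."""
    total = sum(softmax_log_prob(ctx, ctr, in_vecs, out_vecs)
                for ctr, ctx in skipgram_pairs(words, c))
    return total / len(words)
```

Training would adjust the vectors to increase this value; the sketch only evaluates the objective for given vectors.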
Word2vec automatically learns syntactic and semantic information from large-scale unlabeled user reviews, enabling the characterization of keywords in user reviews. The use of Word2vec to vectorize the short text information of user reviews is mainly divided into the following two steps:
(i) A word vector model is trained on the collected user review text with Skip-Gram or CBOW, so that each word is expressed as a K-dimensional real-valued vector;
(ii) For each short review text, the text is segmented into words, the Top-N words that best express its emotion are extracted with TF-IDF or a similar algorithm, and the K-dimensional vector of each extracted word is looked up in the word vector model.
After the K-dimensional real-valued vector of each keyword is obtained, a common approach is to take the weighted average of the keyword vectors as the vector representation of the user review text and use it for emotional analysis of the review. This weighted averaging ignores the influence of word order on the emotion prediction model: the Word2vec representation performs “semantic analysis” only at the level of individual words, and a weighted average of word vectors cannot capture the semantics of the context. Therefore, this paper constructs an emotional computing model based on word vectors and a long short-term memory network to realize the emotional analysis of user reviews.
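The two steps and the weighted-average baseline can be sketched as follows. This is an illustrative sketch under stated assumptions: the review is already tokenised and belongs to `corpus`, `vectors` is a hypothetical word-to-vector lookup standing in for a trained Word2vec model, and the smoothed IDF formula is one common choice rather than the one used in the paper.

```python
import math
from collections import Counter

def tfidf_top_n(doc, corpus, n):
    """Return the n highest-scoring (word, tf-idf) pairs of a tokenised review."""
    tf = Counter(doc)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for other in corpus if word in other)  # doc is assumed to be in corpus
        idf = math.log(len(corpus) / df) + 1.0  # +1 keeps every weight positive
        scores[word] = (count / len(doc)) * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def weighted_average(keywords, vectors):
    """Weighted mean of the K-dimensional vectors of the selected keywords."""
    total = sum(weight for _, weight in keywords)
    dim = len(next(iter(vectors.values())))
    return [sum(weight * vectors[word][k] for word, weight in keywords) / total
            for k in range(dim)]
```

Because the average depends only on which words occur and how often, two reviews such as "good but not cheap" and "not good but cheap" receive the same vector, which is the word-order limitation noted above.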
Emotional calculation based on word vector and long short-term memory network
In text information processing, a commonly used method is the recurrent neural network (RNN) [35]. However, RNNs suffer from vanishing gradients during optimization when dealing with long sequences. To solve this problem, researchers proposed gated RNNs, the best known of which is the long short-term memory network (LSTM) [36]. Research also shows that neural networks with the LSTM structure outperform standard RNNs in many tasks.
LSTM uses “gate” structures to remove information from or add information to the cell state. By adding three gates to the neuron (the input gate, the forget gate and the output gate), it strengthens or forgets information, so that the weight of the self-loop changes. Even when the parameters are fixed, an LSTM-based model can dynamically change how much is accumulated at different time steps, which effectively avoids the gradient explosion and vanishing problems of the RNN structure. In the LSTM network structure, each LSTM unit is computed as shown in formulas (4) to (9).
$$ {f}_t=\upsigma \left({W}_f\bullet \left[{h}_{t-1},{x}_t\right]+{b}_f\right) $$
(4)
$$ {i}_t=\upsigma \left({W}_i\bullet \left[{h}_{t-1},{x}_t\right]+{b}_i\right) $$
(5)
$$ \overset{\sim }{C_t}=\tanh \left({W}_C\bullet \left[{h}_{t-1},{x}_t\right]+{b}_C\right) $$
(6)
$$ {C}_t={f}_t\ast {C}_{t-1}+{i}_t\ast \overset{\sim }{C_t} $$
(7)
$$ {O}_t=\upsigma \left({W}_O\bullet \left[{h}_{t-1},{x}_t\right]+{b}_O\right) $$
(8)
$$ {h}_t={O}_t\ast \tanh \left({C}_t\right) $$
(9)
In formulas (4) to (9), \( {f}_t \) denotes the forget gate, \( {i}_t \) the input gate and \( {O}_t \) the output gate; \( \overset{\sim }{C_t} \) denotes the candidate cell state, \( {C}_{t-1} \) and \( {C}_t \) denote the cell state at the previous and the current moment, and \( {h}_{t-1} \) and \( {h}_t \) denote the unit output at the previous and the current moment, respectively.
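A single LSTM step following formulas (4) to (9) can be sketched in plain Python. The `params` dictionary, which maps each gate name to a hypothetical `(W, b)` pair, and the helper names are assumptions made for illustration; a practical model would use a deep learning framework and learned weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, z):
    """Compute W . z + b for a weight matrix W (list of rows) and a vector z."""
    return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps "f", "i", "C", "O" to (W, b) pairs."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]        # formula (4)
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]        # formula (5)
    c_tilde = [math.tanh(a) for a in affine(*params["C"], z)]  # formula (6)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]   # formula (7)
    o_t = [sigmoid(a) for a in affine(*params["O"], z)]        # formula (8)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]         # formula (9)
    return h_t, c_t
```

Applying `lstm_step` to the word vectors of a review in order, and carrying `h_t` and `c_t` forward, is what lets the model take word order into account.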
The emotional analysis method based on Word2vec and LSTM proposed in this paper is shown in Fig. 2. First, the input, given in matrix form, is encoded by Word2vec into one-dimensional vectors that retain most of the useful information. Then, the LSTM algorithm is used to train an emotional classification model on the user review text, which predicts the rating level of each review. In addition, to account for the joint effect of user ratings and review text on the user’s real emotion, this paper fuses user ratings with the predicted emotional ratings in two ways: a viewpoint pre-filtering method and a method based on user rating embedding. The former uses the LSTM network to obtain a predicted score and then takes its weighted sum with the original user score. The latter combines the LSTM output vector with the user rating information and feeds the result to the last layer, which directly outputs the final comprehensive score.
In the viewpoint pre-filtering method, the emotion of the user review text is modeled with Word2vec and LSTM, the emotional tendency score \( {score}_r \) of each user’s review of a post is predicted, and its weighted sum with the user’s original score gives the comprehensive score \( {score}_c \).
$$ {score}_c=\alpha\ {score}_r+100\left(1-\alpha \right)\ {score}_H $$
(10)
In the above formula, \( {score}_r \) is the user’s predicted emotional score for the post review and \( {score}_H \) is the post’s authority value in the HITS algorithm, which serves as the original score. Because the amount of data collected is limited, the authority values are small, so they are multiplied by 100 to increase their influence on the result. α is the balance factor between the two scores.
The method based on user rating embedding also starts from the emotional analysis of the user review text. It combines the LSTM output vector with the user rating information, feeds the result to the last (fully connected) layer, and directly outputs the final comprehensive emotional score through the softmax activation function.
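A worked instance of formula (10) may help. The sketch below assumes that α lies in [0, 1], which the text implies but does not state; the function name and the input values are hypothetical.

```python
def composite_score(score_r, score_h, alpha):
    """Formula (10): blend the emotion score with the HITS authority value scaled by 100."""
    if not 0.0 <= alpha <= 1.0:  # assumption: alpha is a convex mixing weight
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * score_r + 100.0 * (1.0 - alpha) * score_h

# Hypothetical inputs: positive review (1.0), authority value 0.004, alpha = 0.5.
example = composite_score(1.0, 0.004, 0.5)
```

With these inputs the two terms contribute 0.5 and about 0.2; without the factor of 100 the authority term would contribute only about 0.002 and would barely affect the result.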
$$ {H}_i={h}_t\otimes \mathrm{Score}\left({\mathrm{User}}_i\right) $$
(11)
Calculation of similarity based on post content
In the recommendation model, the natural language description of a post’s content is short, mostly incomplete, and usually does not follow grammatical rules. This paper therefore uses the paragraph vector [37] to build a distributed representation of the short text of the post content description. The paragraph vector is a neural-network-based implicit short text comprehension model that uses a short text vector as “context” to assist inference; in maximum likelihood estimation, the text vector is updated along with the other model parameters. Compared with the text vector representation based on Word2vec, it adds an encoding of the paragraph during model training. Like ordinary words, the paragraph code is mapped to a vector (the paragraph coding vector). During computation, the paragraph coding vector and the word vectors are summed or concatenated and used as the input of the softmax in the output layer. The paragraph code stays the same for every context window of a given post description during training, so the semantic information of the entire text is taken into account every time a word probability is predicted. In the prediction phase, a new paragraph code is assigned to the description text of the post content while the word vectors and the softmax parameters are kept unchanged. Gradient descent is then used to train on the new post description until convergence, which yields a low-dimensional vector representation of the post content. The distributed paragraph vector representation of the post content is shown in Fig. 3.
After the unique d-dimensional distributed vector representation of each post’s content is obtained, the similarity and the distance between any two post contents can be computed. This paper uses cosine similarity to measure the similarity between two posts and the Mahalanobis distance to measure the distance between their natural language descriptions. Assume that the paragraph vectors of the natural language descriptions of two posts are PVa = (x11, x12, …, x1d) and PVb = (x21, x22, …, x2d), where d denotes the dimension of the paragraph vectors. The similarity and distance between them are then defined as follows:
$$ \mathrm{sim}\left({\mathrm{PV}}_a,{\mathrm{PV}}_b\right)=\frac{{\mathrm{PV}}_a\bullet {\mathrm{PV}}_b}{\left\Vert {\mathrm{PV}}_a\right\Vert \left\Vert {\mathrm{PV}}_b\right\Vert }=\frac{\sum \limits_{i=1}^{d}{x}_{1i}{x}_{2i}}{\sqrt{\sum \limits_{i=1}^{d}{x}_{1i}^2}\sqrt{\sum \limits_{i=1}^{d}{x}_{2i}^2}} $$
(12)
$$ dis\left({\mathrm{PV}}_a,{\mathrm{PV}}_b\right)=\sqrt{{\left({\mathrm{PV}}_a-{\mathrm{PV}}_b\right)}^T{S}^{-1}\left({\mathrm{PV}}_a-{\mathrm{PV}}_b\right)} $$
(13)
where S is the covariance matrix of the feature vectors PVa and PVb.
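Both measures can be sketched with the standard library only. The sketch uses the standard cosine similarity and assumes that S is invertible, for example because it was estimated from the paragraph vectors of many posts (a covariance estimated from just two vectors would be singular); the helper `solve` is a hypothetical stand-in for a linear algebra routine.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two paragraph vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def solve(S, v):
    """Solve S x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(S)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def mahalanobis(a, b, S):
    """Formula (13): sqrt((a - b)^T S^{-1} (a - b)), without forming S^{-1} explicitly."""
    diff = [x - y for x, y in zip(a, b)]
    return math.sqrt(sum(d * s for d, s in zip(diff, solve(S, diff))))
```

When S is the identity matrix the Mahalanobis distance reduces to the Euclidean distance, which is a convenient sanity check.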
Multi-source view fusion based on collaborative training
In constructing the hybrid recommendation model, this paper uses the user comprehensive scoring view to build a post-based collaborative filtering recommendation model and the natural language description view of post content to build a content-based recommendation model. The two recommendation views are then fused through a cooperative training strategy. In each round, a data selection algorithm based on confidence estimation and cluster analysis filters the newly predicted data, and the selected data are added to the training data pool of the other classifier for the next round of training; this process is iterated.
Hybrid recommendation algorithm based on collaborative training
The hybrid recommendation algorithm based on collaborative training first constructs the initial scoring matrix from the users’ scores of the posts. The viewpoint pre-filtering method is then used to compute the composite scores that update the scoring matrix. Finally, the scoring matrix is cyclically filled and optimized according to the comprehensive scoring matrix and the vector similarity of the post content descriptions, so as to achieve recommendation and sorting. The hybrid recommendation algorithm based on collaborative training is shown in Fig. 4.
In the recommendation model, the score of user u on post p is recorded as Ru(p), which is taken from the post’s authority value in the HITS algorithm. The corresponding scoring matrix is Rm × n(U, P), where the number of rows m is the number of users and the number of columns n is the number of posts. The post-based collaborative filtering recommendation model takes as input the user’s original scoring matrix Rm × n(U, P), where Ru(p) ∈ [0, 1], and the virtual scoring matrix \( {\overrightarrow{R}}_{m\times n}\left(U,P\right) \) predicted by the emotion analysis model, where \( {\overrightarrow{R}}_u(p)\in \left\{0,1\right\} \) (0 means that the user’s emotion is negative and 1 means that it is positive); its output is the data set Dtrain. The post-based collaborative filtering recommendation algorithm is described in Algorithm 1.
In Algorithm 1, the post-based collaborative filtering recommendation method fills in the default values of the user’s scoring matrix and at the same time updates the training data set of user u. Emotional classification models are generally either fine-grained (5-level classification) or coarse-grained (2-level classification). Because the accuracy of the 2-level emotional classification model is much higher than that of the 5-level model, this paper adopts 2-level emotional classification in the recommendation algorithm, and the user’s positive and negative emotions are set to 1 point and 0 points, respectively. The user’s emotional scores and original scores are then combined by viewpoint pre-filtering. Finally, the post-based collaborative filtering model predicts and fills the scoring matrix, the data selection algorithm based on confidence estimation and cluster analysis filters the data, and the incremental data are added to the training data set of user u.
In the content-based model, the K-nearest neighbor algorithm is used to calculate the distance between content descriptions. The cosine similarity of posts and the Mahalanobis distance to the K nearest neighbor posts are used to update the user’s scores or fill in the default values, and the result is used by the content-based recommendation model in the next iteration. The recommendation algorithm based on post content is described in Algorithm 2.
A hybrid recommendation method mixes several recommendation techniques so that each compensates for the shortcomings of the others and the overall recommendations improve. Unlike traditional hybrid techniques such as weighted fusion, mixed recommendation and cascade recommendation, this paper uses a collaborative training strategy to combine post-based collaborative filtering recommendation (Algorithm 1) with content-based recommendation (Algorithm 2). In each iteration of collaborative training, the computed comprehensive scoring data are first used to train the score prediction model, which fills and updates the scoring matrix. The content-based model is then trained on the updated scoring matrix and the content descriptions of the posts (posts with a score ≥ 0.7 are placed in the training pool of posts the user likes, and posts with a score ≤ 0.3 in the pool of posts the user dislikes). The matrix it fills and updates becomes the input of the post-based collaborative filtering model in the next iteration. Weighted fusion requires the weight of each recommendation result to be tuned, mixed recommendation makes ranking difficult, and cascade recommendation proceeds in separate stages. In contrast, the proposed method uses both the user’s scoring information for posts (the Post Profile view) and the content description of posts (the metadata view) in every training iteration, which fuses the two recommendation views and achieves a better hybrid recommendation effect.
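The filling step of the content-based model might look roughly like the sketch below. It is not Algorithm 2 itself: it assumes that neighbours are chosen by smallest distance and that their known scores are averaged with similarity weights, and the function name and the `similarity`/`distance` callables are hypothetical.

```python
def knn_fill(target, rated, similarity, distance, k):
    """Predict a missing score for `target` from the user's k nearest rated posts.

    rated: dict post_id -> known score of one user.
    similarity, distance: callables taking two post ids.
    """
    neighbours = sorted(rated, key=lambda p: distance(target, p))[:k]
    weights = [max(similarity(target, p), 0.0) for p in neighbours]
    if sum(weights) == 0.0:
        return None  # no usable neighbour; leave the entry unfilled
    return sum(w * rated[p] for w, p in zip(weights, neighbours)) / sum(weights)
```

Clipping negative similarities to zero is a design choice of this sketch; it prevents dissimilar neighbours from pushing the prediction outside the range of the known scores.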
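The alternation between the two views can be outlined as follows. This is a simplified sketch under stated assumptions, not the authors’ implementation: the scoring matrix is a dictionary keyed by (user, post), each view is a hypothetical callable that proposes scores, `select` stands for the data selection step, and the sketch only fills missing entries rather than also updating existing ones. The 0.7 and 0.3 thresholds come from the text.

```python
def split_pools(scores, like_at=0.7, dislike_at=0.3):
    """Split one user's posts into liked and disliked training pools by score."""
    liked = [p for p, s in scores.items() if s >= like_at]
    disliked = [p for p, s in scores.items() if s <= dislike_at]
    return liked, disliked

def co_train(known, views, select, rounds):
    """Alternate between views; each adds its selected predictions to the shared pool.

    known: dict (user, post) -> score.
    views: list of callables mapping the current dict to a dict of predictions.
    select: callable mapping predictions to the subset that is trusted.
    """
    known = dict(known)  # work on a copy so the caller's data is untouched
    for _ in range(rounds):
        for predict in views:
            proposed = predict(known)
            for key, score in select(proposed).items():
                known.setdefault(key, score)  # simplification: fill only, never overwrite
    return known
```

Passing the collaborative filtering view first and the content-based view second reproduces the order described above, with the output of one view becoming the input of the other.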
Data selection in collaborative training
In this paper, a data selection strategy is added to the collaborative training model to filter the data that join the training pool. Each rating level of a user is treated as a category. The training data in the pool are labeled data, and the data to be predicted are unlabeled data. The data selection strategy considers not only the confidence with which a sample is assigned to a category, but also requires the selected samples to be evenly distributed over the clusters, which avoids a large bias when the Gaussian distribution is estimated from the selected training data. The data selection algorithm based on confidence estimation and cluster analysis is described in Algorithm 3.
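The idea of combining confidence with an even spread over clusters can be sketched as below. This is an illustration rather than Algorithm 3: it assumes that cluster assignments and confidence scores have already been computed, and the function name, the `per_cluster` quota and the `min_confidence` threshold are hypothetical parameters.

```python
from collections import defaultdict

def select_samples(samples, per_cluster, min_confidence):
    """Pick confident samples while keeping the selection balanced across clusters.

    samples: list of (sample_id, cluster_id, confidence) triples.
    Returns the ids of at most `per_cluster` most confident samples per cluster.
    """
    by_cluster = defaultdict(list)
    for sample_id, cluster_id, confidence in samples:
        if confidence >= min_confidence:
            by_cluster[cluster_id].append((confidence, sample_id))
    selected = []
    for cluster_id in sorted(by_cluster):
        ranked = sorted(by_cluster[cluster_id], key=lambda cs: -cs[0])
        selected.extend(sample_id for _, sample_id in ranked[:per_cluster])
    return selected
```

Capping the number of samples per cluster keeps a single dense, high-confidence cluster from dominating the data added to the other classifier’s training pool.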