Cloud-based intelligent self-diagnosis and department recommendation service using Chinese medical BERT

With the rapid development of hospital informatization and Internet medical services in recent years, most hospitals have launched online appointment registration systems to reduce patient queues and improve the efficiency of medical services. However, most patients lack professional medical knowledge and do not know which department to choose when registering. To guide patients in seeking medical care and registering effectively, we proposed CIDRS, an intelligent self-diagnosis and department recommendation framework based on Chinese medical Bidirectional Encoder Representations from Transformers (BERT) in a cloud computing environment. We also established a Chinese medical BERT model (CHMBERT) trained on a large-scale Chinese medical text corpus and used it to optimize the self-diagnosis and department recommendation tasks. To address the limited computing power of terminals, we deployed the proposed framework in a cloud computing environment based on container and micro-service technologies. Real-world medical datasets from hospitals were used in the experiments, and the results showed that the proposed model outperformed traditional deep learning models and other pre-trained language models.


Background
China has a very large medical service sector, with about 8 billion medical visits annually. Hospitals in China, especially tertiary class hospitals, are therefore constantly overcrowded. This leads directly to a heavy workload for doctors, long queuing times, and a poor treatment experience for patients. Appointment services have been adopted in most hospitals to improve the efficiency of medical treatment and reduce queuing time, and scheduling an appointment in advance through a website or mobile app is convenient for patients. However, appointment registration also brings new problems. For instance, patients lacking professional medical knowledge do not know how to choose the appropriate department. Registering according to previous experience is unreliable, because registering in an inappropriate department inconveniences the patient and wastes considerable healthcare resources. Therefore, a method that predicts the type of disease from a patient's chief complaint and then accurately recommends a registration department is the key to solving this problem.
*Correspondence: zhangka81@126.com. 1 Key Laboratory for Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, 210008 Nanjing, China. Full list of author information is available at the end of the article.
With the widespread application of hospital information systems, abundant diagnosis and treatment data have been collected by electronic medical record systems in hospitals. The rapid development of cloud computing, big data, and artificial intelligence has provided favorable conditions for constructing self-diagnosis and registration department recommendation systems from big medical data [1][2][3]. The medical data in electronic medical record systems are mainly unstructured text, which requires natural language processing (NLP) technology to model. The task in this paper is mainly one of text classification. Text classification methods consist of traditional shallow algorithms (e.g., support vector machine (SVM), random forest, and Bayes) and deep learning algorithms (e.g., Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Hierarchical Attention Network (HAN), and Bidirectional Encoder Representations from Transformers (BERT)). Deep learning methods perform better than the traditional methods. Among deep learning algorithms, the pre-trained language model BERT has been widely studied and applied, and has achieved state-of-the-art performance in many NLP tasks [4][5][6]. However, applying existing pre-trained language models directly to medical text mining has some limitations. First, the performance of BERT on medical text mining needs further evaluation because this model is trained on general text datasets. Second, medical and general texts differ in word distribution [7]. A BERT model trained in the medical domain is therefore needed for medical text mining. In addition, the number of parameters of BERT influences its performance: in general, models with more parameters perform better. Large pre-trained models occupy considerable storage and run slowly on intelligent terminals with low computing power, which makes it difficult to run them directly on terminals.
To address the above challenges, we proposed an intelligent self-diagnosis and department recommendation model based on Chinese medical BERT in the cloud computing environment. The model was deployed in the cloud. It first predicted the type of disease on the basis of the chief complaint, then recommended a registration department, and thereby provided patients with advice on seeking medical help and avoided wasting medical resources. To verify our model, a Chinese BERT model, CHMBERT, was trained on medical text data. The proposed framework CIDRS was tested on an inpatient dataset from a tertiary class hospital in Jiangsu Province, China. The experimental results showed that the proposed model achieved the best performance among the compared state-of-the-art methods.

Contributions
The contributions of this work are as follows:
1) A Chinese medical pre-trained language model trained on a large-scale medical text corpus from more than 100 hospitals was proposed. The text corpus was collected by the Jiangsu National Health Information Platform. The proposed model is the first medical BERT model trained on such a large scale of Chinese medical corpus, which is of great significance for evaluating the effectiveness of the pre-trained model in the Chinese medical domain.
2) An intelligent self-diagnosis and department recommendation model based on Chinese medical BERT (CHMBERT) was proposed. This model predicted diseases and recommended registration departments according to chief complaint. Thus, it can provide medical help-seeking advice for patients effectively and avoid medical resources waste. Furthermore, the model was deployed in the cloud computing framework, which can solve the problem of low terminal computing power.
3) Experiments were performed on a large-scale medical dataset by comparing with current popular algorithms and other pre-trained models to evaluate the effectiveness of our proposed framework.
The rest of this paper is organized as follows. In "Preliminary knowledge" section, we describe the pre-trained model BERT and medical texts. "Methodology" section introduces the proposed self-diagnosis and department recommendation model and the corresponding cloud computing framework CIDRS. "Experiments" section evaluates the proposed CHMBERT model on real-world medical texts. "Related work" section summarizes the current research. Finally, we conclude our study and provide future research directions in "Conclusion" section.

Preliminary knowledge
This section discusses the preliminary knowledge of the medical text data and pre-trained BERT model.

Medical text data
Medical text data mainly include patient demographic information, past history data, clinical diagnosis and treatment data, and so on [8].
Demographic information: contains the patient's basic information, including name, gender, age, and address.
Past history data: contains summary information of the patient's historical health, including disease history, allergy history, surgical history, trauma history, blood transfusion history, family/genetic history, and hospital history.
Clinical diagnosis and treatment data: contains the patient's detailed clinical diagnosis and treatment information, including the symptoms and signs, chief complaint, history of present illness, diagnosis name, treatment process description, test results, and examination report.

BERT
BERT [4] is a pre-trained language model that has been widely used in the past year. It has achieved state-of-the-art results in multiple downstream NLP tasks, such as Machine Translation, Named Entity Recognition, Text Classification, Reading Comprehension, and Question Answering. BERT extracts context features by using a bidirectional transformer encoder [9], which has deeper levels and better parallelism. In the pre-training process of BERT, the input is constructed by summing the Token Embedding, Segment Embedding, and Position Embedding. Two pre-training tasks, namely, Masked Language Model and Next Sentence Prediction, are designed to analyze the relationship between words and sentences and to learn the corresponding representations. The pre-trained BERT model can be fine-tuned through an additional output layer, which makes it widely applicable to downstream NLP tasks without large task-specific architectural modifications. Owing to space limitations, we refer the reader to [4] for a detailed introduction to BERT.
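As a minimal sketch of how the BERT input is built, the following Python code sums token, segment, and position embeddings element-wise for each position. The table sizes, the embedding dimension, and the token IDs are toy values chosen for illustration and are assumptions, not values from this paper. The reference BERT implementation also applies layer normalization and dropout after the sum, which are omitted here.

```python
import random


def make_table(rows, dim, rng):
    """Create a small random embedding table with `rows` vectors of size `dim`."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def bert_input_embeddings(token_ids, segment_ids, tok_table, seg_table, pos_table):
    """Sum token, segment, and position embeddings for every input position."""
    embedded = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        vector = [
            t + s + p
            for t, s, p in zip(tok_table[tok], seg_table[seg], pos_table[pos])
        ]
        embedded.append(vector)
    return embedded


rng = random.Random(0)
DIM = 4                                # toy embedding size (assumption)
tok_table = make_table(10, DIM, rng)   # toy vocabulary of 10 tokens
seg_table = make_table(2, DIM, rng)    # segment A / segment B
pos_table = make_table(8, DIM, rng)    # up to 8 positions

token_ids = [1, 5, 7, 2]               # hypothetical token IDs
segment_ids = [0, 0, 0, 0]             # a single sentence uses segment 0 only
embedded = bert_input_embeddings(token_ids, segment_ids,
                                 tok_table, seg_table, pos_table)
```

Each row of `embedded` is the input vector for one position; in a classification setting with a single chief complaint, all segment IDs are identical.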

Methodology
Our proposed CIDRS framework includes a cloud computing platform and a self-diagnosis and department recommendation model. The model is deployed on the cloud computing platform after offline training on GPU. Users upload chief complaints to the cloud platform through terminals, and the deployed model then predicts the disease and recommends departments for them.

Cloud computing framework
The self-diagnosis and department recommendation model is hard to deploy on terminals because of its high computational power and large storage space requirements. Cloud computing services can solve the low computational power problem of user terminals [10,11]. By adopting a cloud computing architecture, the proposed model can be run efficiently on multi-CPU/GPU computing resources, whose capacity can be expanded dynamically as user requests increase. This also reduces hardware and maintenance costs for service providers [12,13].
In this study, the cloud computing framework was deployed based on container and micro-service technologies. All of the containers were deployed on a Kubernetes cluster. Figure 1 illustrates the proposed cloud computing framework, which mainly includes a Web application service, an authentication service, a query processing service, a model service, a monitoring service, a configuration database, and a user database. The Web application service served as the middle layer and was called by the user terminals through a published REST service interface. Considering the privacy of medical diagnosis and treatment data, we performed bidirectional authentication between the client and the server by using a Kerberos authentication service to guarantee the reliability of both communicating sides and the security of data transmission. The query processing service listened for request data and routed them to the model service deployed on the Kubernetes cluster, which included a disease prediction model and a department recommendation model. The model service predicted the type of disease and recommended a department on the basis of the input request data. The monitoring service was responsible for monitoring the latency and workload throughput of the query processing service and the model service. When a single instance of the model service fails to meet the throughput requirements of the workload, additional replicas can easily be added. The user and configuration databases were used to store user information and configuration parameters for service components, respectively. Our proposed cloud computing framework guaranteed the flexibility and security of the whole system. The framework was implemented with Java and TensorFlow Serving. The Java-based Web application service, authentication service, query processing service, model service, and monitoring service were developed with SpringBoot, and the corresponding docker images were created from a CentOS base image and JDK1.8.
The docker image of the user database was created from the MariaDB base image, and the docker image of the configuration database was created from the Redis base image. The docker image of the model service was built from the TensorFlow Serving base image, and nvidia-docker should be installed on the hosts if GPUs are used. When deploying the framework in a production environment, at least three hosts are required to form a Kubernetes cluster. The recommended configuration for each host is 64 GB of memory, a 32-core CPU, a 4 TB disk, and one Tesla V100 graphics card. All the services are deployed as containers in the Kubernetes cluster. To guarantee high availability, each service needs to start at least two instances. Because Kubernetes spreads containers across hosts, the services on the other hosts can still run normally when one host fails.
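The request path and the replica rule described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the host name `model-service`, the model name `disease`, the input field names `input_ids` and `input_mask`, and the token IDs are all hypothetical and depend on how the model was exported. The URL pattern and the `instances` payload follow the TensorFlow Serving REST predict interface, and the minimum of two replicas follows the high-availability requirement stated above.

```python
import json
import math


def build_predict_request(host, model_name, input_ids, max_len=36, port=8501):
    """Build the URL and JSON body for a TensorFlow Serving REST predict call.

    The input names ("input_ids", "input_mask") are assumptions; the real
    names are defined by the exported model signature.
    """
    padded = (input_ids + [0] * max_len)[:max_len]           # pad or truncate
    mask = [1 if i < len(input_ids) else 0 for i in range(max_len)]
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": [{"input_ids": padded, "input_mask": mask}]})
    return url, body


def replicas_needed(request_rate, per_instance_throughput, min_replicas=2):
    """Number of model-service replicas needed to serve `request_rate`.

    Both rates are in requests per second; at least `min_replicas` are kept
    for high availability.
    """
    required = math.ceil(request_rate / per_instance_throughput)
    return max(min_replicas, required)


# Hypothetical token IDs for one chief complaint, padded to length 6.
url, body = build_predict_request("model-service", "disease",
                                  [101, 7, 8, 102], max_len=6)
```

In such a setup the query processing service would POST `body` to `url`, and the monitoring service would supply the measured request rate used to decide how many replicas to run.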

Disease prediction and department recommendation model
A chief complaint [14] includes a patient's self-reported symptoms, signs, nature, and duration. The model in this section makes predictions according to the chief complaint of a patient, and outputs the possible disease category and the recommended department. The disease prediction and department recommendation task can be transformed into two separate text classification problems: 1) predicting the disease category according to the chief complaint and 2) predicting the department category according to the chief complaint. Considering the great success of BERT-based pre-trained models in NLP tasks, we pre-trained a Chinese medical BERT model and fine-tuned it separately on the disease prediction and department recommendation tasks, which yielded two fine-tuned models.

Pre-training a Chinese medical BERT model
The original Chinese BERT model is a general-purpose language representation model pre-trained on a Wikipedia corpus. However, medical texts contain many professional terms and differ from general texts in word distribution. Therefore, NLP models designed for general natural language understanding often perform poorly in medical text mining tasks [7]. To solve this problem, we constructed a medical text corpus based on more than 100 hospitals from the Jiangsu Regional Health Information Platform, including past history data and clinical diagnosis and treatment data. The corpus data, which consist of complaints, hospital admissions, progress notes, and discharge records, were obtained from the electronic medical record systems of the hospitals. The data were about 185GB in size. CHMBERT, a BERT model that focuses on the Chinese medical domain, was trained on this corpus. The original BERT code, model structure, and parameters were used to train the CHMBERT model on the Chinese medical corpus. To improve computational efficiency, CHMBERT was initialized with the original Chinese BERT model parameters instead of being trained from scratch. In the pre-training process, the maximum length of the sentence was set to 128, and the number of training steps was set to 10 million (100K) steps. Pre-training took about one month on a Tesla V100 GPU.
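Because the original BERT code was reused, pre-training relies on the Masked Language Model objective. The following Python sketch illustrates character-level masking with the 80%/10%/10% replacement rule from the BERT paper [4]. It is a simplification: it selects each position independently with probability 0.15, whereas the reference implementation samples a fixed number of positions per sequence. The example complaint and the toy vocabulary are hypothetical and are not taken from the corpus.

```python
import random

MASK = "[MASK]"
MAX_LEN = 128  # maximum sequence length used in pre-training


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Return (masked_tokens, labels) for the masked language model task.

    labels[i] holds the original token at a selected position and None
    elsewhere. A selected token is replaced by [MASK] 80% of the time, by a
    random vocabulary token 10% of the time, and kept unchanged otherwise.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                      # position not selected
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            masked[i] = MASK
        elif r < 0.9:
            masked[i] = rng.choice(vocab)
        # otherwise the original token is kept
    return masked, labels


rng = random.Random(0)
# Hypothetical text: Chinese BERT treats every character as one token.
text = "腹痛三天伴发热" * 20
tokens = list(text)[:MAX_LEN]             # truncate to the maximum length
vocab = sorted(set(tokens))               # toy vocabulary (assumption)
masked, labels = mask_tokens(tokens, vocab, rng)
```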

Fine-tuning CHMBERT for disease prediction and department recommendation
On the basis of the pre-trained CHMBERT model, a fully connected output layer with Softmax was used to fine-tune the two classification tasks in this paper. The input text sequence of a chief complaint is denoted $X = \{x_1, x_2, \ldots, x_L\}$, where $x_i$ ($0 < i \le L$) is a Chinese character and $L$ is the maximum length of the input text sequence. For example, the input chief complaint is , then X ={ }. The text sequence $X$ was encoded into a fixed-length sentence vector $S$ through the CHMBERT model, which is expressed by Formula (1):
$$S = \mathrm{CHMBERT}_{\mathrm{sent}}(X) \quad (1)$$
where $\mathrm{CHMBERT}_{\mathrm{sent}}(\cdot)$ represents the transformation from a text sequence into a sentence vector. The sentence vector $S$ was then passed into a fully connected layer with dropout (Formula (2)). Finally, the Softmax layer output the probability distribution over disease or department categories according to Formula (3):
$$p_k = \frac{\exp\left(w_k^{T} S\right)}{\sum_{j} \exp\left(w_j^{T} S\right)} \quad (3)$$
where $p_k$ denotes the probability that the sentence vector $S$ belongs to category $k$, and $\sum_{j} \exp\left(w_j^{T} S\right)$ is the normalization term. The cross-entropy loss function and the Adam optimization algorithm were used to fine-tune the model parameters.
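To make the Softmax output and the cross-entropy loss concrete, the following Python sketch computes the class probabilities $p_k$ from a sentence vector and a weight matrix, and then the loss for one labeled example. The three-dimensional vector and the identity weight matrix are toy values chosen for illustration; in the actual model, $S$ comes from CHMBERT and the weights are learned.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(S, W):
    """Compute p_k = exp(w_k . S) / sum_j exp(w_j . S) for every category k."""
    logits = [sum(w * s for w, s in zip(w_k, S)) for w_k in W]
    return softmax(logits)


def cross_entropy(probs, true_class):
    """Cross-entropy loss for one example with a single true category."""
    return -math.log(probs[true_class])


S = [0.5, -1.0, 2.0]          # toy sentence vector (assumption)
W = [[1.0, 0.0, 0.0],         # one weight vector w_k per category
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
probs = class_probabilities(S, W)
loss = cross_entropy(probs, true_class=2)
```

Subtracting the maximum logit before exponentiation does not change the result of Formula (3) but avoids overflow for large logits.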

Experiments

Dataset
The purpose of our model was to predict the disease category and recommend a registration department according to a patient's chief complaint. To verify the performance of our model, we selected 200,000 inpatients' chief complaints and the corresponding disease diagnosis codes and treatment departments from a tertiary class hospital from January 2015 to December 2018. In electronic medical records, the disease diagnosis codes were classified by the International Classification of Diseases (ICD)-10 (https://www.who.int/classifications/icd/en/), and only the first three characters of the ICD code were considered in this paper. After data cleaning and filtering, 198,000 records were retained, including 130 types of disease diagnosis and 25 departments, covering about 80% of the inpatients. In the dataset, the maximum and minimum lengths of sentences were 36 and 2, respectively, and the average length was 12. The total number of distinct Chinese characters was 1456. The dataset was divided into a training set, a validation set, and a test set in a ratio of 70:15:15. Figure 2 illustrates the distribution of the number of patients across the 130 types of diseases, which follows a power-law distribution. The top 30 diseases account for about 50% of patients. The disease with the largest number of patients is K80 (Cholelithiasis), and the disease with the smallest number of patients is O36 (Maternal care for other known or suspected fetal problems). Figure 3 illustrates the distribution of the number of patients across the 25 departments, which also follows a power-law distribution; the top three departments by number of patients (general surgery, obstetrics, and vasculocardiology) account for about 35% of all patients.
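The two preprocessing steps described above, truncating ICD-10 codes to their first three characters and dividing the records 70:15:15, can be sketched in Python as follows. The random shuffle, the seed, and the function names are assumptions made for illustration; the paper does not state whether the split was random or stratified.

```python
import random


def icd_category(code):
    """Keep only the first three characters of an ICD-10 code (e.g. K80.2 -> K80)."""
    return code.strip().upper()[:3]


def split_dataset(records, seed=42, percentages=(70, 15, 15)):
    """Shuffle the records and split them into train, validation, and test sets.

    Integer arithmetic is used so that the split sizes are exact.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * percentages[0] // 100
    n_val = n * percentages[1] // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 200 placeholder record IDs stand in for the real chief-complaint records.
train, val, test = split_dataset(range(200))
```

With the 198,000 cleaned records, the same rule yields 138,600 training, 29,700 validation, and 29,700 test records.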

Baselines
• TextCNN [15]. It is a text classification algorithm based on CNN. It utilizes multiple convolution kernels of different sizes to extract key information from sentences, which can capture local correlations within sentences. TextCNN has a simple architecture and fast training speed, and has achieved state-of-the-art results on multiple datasets.
• BiLSTM [16]. RNN is a widely applied NLP model that can process variable-length text sequences and learn long-distance dependencies from sentences. In this experiment, a single-layer bidirectional LSTM network was utilized to classify the input text.
• LEAM [17]. It is a model based on the attention mechanism. It performs well in text representation by learning the joint embedding of words and labels in the same space. Compared with other attention-based models, LEAM needs fewer model parameters, converges faster, and has good interpretability.
• Transformer [9]. It is a sequence processing model based on the self-attention mechanism, which can learn long-distance dependencies from sentences. It can run in parallel and is the basis of BERT and other pre-trained models.
• BERT-base [4]. It is the original Chinese BERT pre-trained model published by Google, which achieves state-of-the-art performance in many text classification tasks.
• BERT-wwm [18]. This updated version of BERT, published by Harbin Institute of Technology, is a Chinese pre-trained model based on Whole Word Masking technology. Its performance is slightly better than that of the original BERT in sentence classification tasks.

Fig. 3 The distribution of the number of patients in 25 departments

Implementation details
We selected the optimum parameters on the validation dataset through parameter tuning. The differences among the experimental results with different parameters were small, indicating that the clinical dataset in this paper was insensitive to parameters. In addition, Chinese BERT-base tokenizes text at the character level, without the Chinese word segmentation used in traditional NLP. Our experiments likewise did not use word segmentation. The same word embedding size, batch size, and maximum sentence length of 64, 128, and 36, respectively, were adopted in the TextCNN, BiLSTM, LEAM, and Transformer models. The Adam algorithm was utilized for optimization. The number of iterations (epochs) was not limited, and training continued until the accuracy had not improved for 10 consecutive iterations. The parameters were set as follows:
TextCNN: Four sizes of convolution kernels (2, 3, 4, and 5) were used, with 128 kernels of each size. The fully connected layer contained 256 neurons. The dropout was 0.5, and the learning rate was 1e-4.
BiLSTM: The number of neurons in the LSTM hidden layer and the fully connected layer was 128, the dropout was 0.2, and the learning rate was 0.001.
LEAM: The label penalty coefficient was 1.0, the convolution kernel size was 3, and the number of neurons in the hidden layer was 300. The dropout was 0.5, and the learning rate was 0.001.
Transformer: The numbers of encoder layers and heads were 4 and 8, respectively, and the number of neurons in the point-wise feed-forward network was 512. The dropout was 0.1, and the learning rate was 2e-5.
BERT-base: When fine-tuning the pre-trained model, the parameter settings should follow those of the original BERT model. The parameters in this paper were set as follows. The maximum sentence length was 36, and the batch size was 16. The number of epochs ranged from 1 to 5, and the learning rates tried were 5e-6, 1e-5, 2e-5, 3e-5, 4e-5, and 5e-5 [4].
The parameter settings and the corresponding tuning ranges of BERT-wwm and CHMBERT were the same as those of BERT-base.
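The two training-control rules above can be sketched in Python: a patience-based stopping rule for the non-pre-trained baselines, and a grid over learning rates and epochs for the BERT-based models. This is an illustrative sketch; it assumes that the monitored accuracy is the validation accuracy, and the `evaluate` callback is a hypothetical stand-in for fine-tuning a model and measuring its validation accuracy.

```python
from itertools import product


def early_stopping_epoch(val_accuracies, patience=10):
    """Return (stop_epoch, best_accuracy) for a sequence of per-epoch accuracies.

    Training stops once the accuracy has not improved for `patience`
    consecutive epochs; epochs are counted from 1.
    """
    best = float("-inf")
    since_best = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, since_best = acc, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch, best
    return len(val_accuracies), best


# Tuning ranges used for BERT-base, BERT-wwm, and CHMBERT.
LEARNING_RATES = [5e-6, 1e-5, 2e-5, 3e-5, 4e-5, 5e-5]
EPOCHS = [1, 2, 3, 4, 5]


def select_best(evaluate):
    """Return the (learning_rate, epochs) pair with the highest score.

    `evaluate(learning_rate, epochs)` is a hypothetical callback that
    fine-tunes a model and returns its validation accuracy.
    """
    return max(product(LEARNING_RATES, EPOCHS), key=lambda cfg: evaluate(*cfg))


stop_epoch, best_acc = early_stopping_epoch([0.5, 0.6, 0.6, 0.59, 0.58], patience=3)
```

The grid contains 6 x 5 = 30 configurations per model and task, which is small enough to evaluate exhaustively on the validation set.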

Experimental results
Accuracy and F1 score, which are commonly used in NLP classification tasks, were adopted as evaluation criteria to compare the different models. The same chief complaint may correspond to different diseases; for instance, stomachache may be caused by enteritis, appendicitis, or other diseases. Therefore, top-k prediction results were calculated when predicting the type of disease, with k set to 1, 5, and 10. Similarly, a chief complaint may be compatible with more than one department for the first visit. Thus, when predicting the departments, we obtained top-k prediction results for k = 1, 2, and 3. The experimental results of disease and department prediction of the different models are shown in Tables 1 and 2.
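The top-k criterion can be sketched as follows: a prediction counts as correct when the true category is among the k categories with the highest predicted probability. The probability rows and labels below are toy values for illustration, not results from this paper.

```python
def top_k_accuracy(prob_rows, true_labels, k):
    """Fraction of examples whose true label is among the k most probable classes."""
    hits = 0
    for probs, label in zip(prob_rows, true_labels):
        # Class indices sorted by descending predicted probability.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy predictions over three categories for three chief complaints.
prob_rows = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
true_labels = [1, 1, 0]

top1 = top_k_accuracy(prob_rows, true_labels, k=1)
top2 = top_k_accuracy(prob_rows, true_labels, k=2)
top3 = top_k_accuracy(prob_rows, true_labels, k=3)
```

Top-k accuracy never decreases as k grows, which is why the larger k values are informative mainly for complaints that are compatible with several diseases or departments.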
Tables 1 and 2 show that the pre-trained models based on BERT were significantly better than the other state-of-the-art models. The CHMBERT model proposed in this paper performed the best among the tested models, which indicates that the pre-trained model has great potential in medical NLP tasks. Among the non-pre-trained models, TextCNN performed the best, followed by the Transformer model, whereas BiLSTM and LEAM performed the worst.
In the disease prediction experiment, the proposed CHMBERT model showed clear advantages in the top-1 prediction. Compared with those of the second-best model, BERT-wwm, the accuracy and F1 of CHMBERT improved by 0.16% and 0.39%, respectively. Compared with those of TextCNN, which performed the best among the non-pre-trained models, the accuracy and F1 of CHMBERT improved by 0.9% and 1.35%, respectively. In the top-5 and top-10 predictions, the performance of CHMBERT was similar to that of the second-best model and slightly better than that of the TextCNN model.
In the department prediction experiment, our CHMBERT model achieved the best results. Compared with those of the second-best model, the accuracy and F1 of CHMBERT improved by 0.14% and 0.59%, respectively, in the top-1 prediction. Compared with those of TextCNN, the prediction accuracy and F1 of CHMBERT improved by 0.79% and 1.74%, respectively, in the top-1 prediction. In the top-2 and top-3 predictions, the CHMBERT model also performed better than the BERT-wwm and TextCNN models.

Parameters discussion
We compared the performance of the CHMBERT model in disease prediction and department prediction with different learning rates and epochs. Figure 4 shows the top-1 prediction accuracy with different numbers of epochs when the learning rate was fixed at 2e-5. Figure 5 shows the top-1 prediction accuracy with different learning rates when the number of epochs was set to 3.
As shown in Figs. 4 and 5, the prediction accuracy of CHMBERT was only slightly affected by the parameters. When the learning rate was fixed at 2e-5 and the number of epochs varied from 1 to 5, the differences between the maximum and minimum values of disease prediction accuracy and department prediction accuracy were 1.58% and 0.78%, respectively. When the number of epochs was fixed at 3, the differences between the maximum and minimum values of the disease prediction accuracy and department prediction accuracy under different learning rates were 1.11% and 0.51%, respectively. In general, the performance was poor when the number of epochs was small (such as 1 or 2) and the learning rate was small (such as 5e-6 or 1e-5). These results indicate that 3 or 4 is the recommended number of epochs, while 2e-5, 3e-5, or 5e-5 is the recommended learning rate.

Related work

Text classification and pre-trained model
Text classification has attracted considerable attention as an important NLP task. In the early stage, shallow machine learning models (e.g., SVM [19] and logistic regression [20]) were utilized for text classification, and the performance was highly dependent on manually extracted features. In recent years, deep learning models that can automatically extract text features, such as CNN, RNN, and variants such as GRU and LSTM/BiLSTM, have been widely used in NLP tasks and have achieved the best results in text classification. TextCNN [15] was proposed by Kim. In the Hierarchical Attention Network (HAN) [23] model, the text is represented by the hierarchical structure of "word-sentence-article," and different weights are assigned to words and sentences according to the attention mechanism, which can effectively improve the accuracy of long text classification.
In the past year, pre-trained models represented by BERT, such as XLNet [5] and RoBERTa [6], have achieved great success in many NLP tasks. In terms of Chinese pre-trained models, ERNIE [24], a knowledge-enhanced representation model, was proposed by Baidu and outperformed BERT on Chinese datasets. The Chinese pre-trained model BERT-wwm [18], which uses whole word masking, was released by Harbin Institute of Technology. It demonstrated the best performance among current Chinese pre-trained models. As for domain-oriented pre-trained models, Beltagy et al. [25] trained SciBERT for the scientific domain on a corpus of scientific publications, and it performed well in the text analysis of scientific datasets. Lee et al. proposed the BioBERT [7] model for the biomedical domain and found that it achieved the best results on multiple biomedical datasets. The lack of a pre-trained model in the Chinese medical domain limits the application of pre-trained models in medical text mining.

Disease classification
Several studies on automatic disease classification have been published. Most recent research has focused on deep neural network models. Shi et al. [26] used LSTM networks at the character and word levels to learn representations of the diagnosis description and the ICD name, and matched them through the attention mechanism to achieve automatic ICD coding. Mullenbach et al. [27] proposed a model based on CNN and a label attention mechanism to predict disease diagnosis codes from patients' discharge records, which made the model highly interpretable. Zeng [28] proposed a deep transfer model that transfers the knowledge learned from the Medical Subject Headings indexing task to the ICD coding task and improved ICD coding performance. Li et al. [29] learned text patterns of different lengths with a convolutional layer with multiple filters, and enlarged the receptive field through a residual convolutional layer to classify ICD codes, which achieved good performance on the MIMIC medical datasets. At present, disease diagnosis classification mainly focuses on English datasets, and no related research on Chinese medical data has been found in this area. In addition, research on disease classification using BERT and other pre-trained models is lacking.

Cloud computing
Cloud computing has the advantages of virtualization, high availability and scalability, and low requirements for the users' terminals. It can be quickly deployed and has been widely applied in different domains [30][31][32].
Many issues remain to be tackled in cloud computing. Recent studies have focused on the security of cloud computing platforms and their application services [33,34], privacy protection [35][36][37], optimization of energy consumption [38,39], and load balancing and resource scheduling [40][41][42]. In terms of cloud deployment for machine learning models, some frameworks have been proposed, such as Clipper (http://clipper.ai/), developed by the UC Berkeley RISE Lab, and Graphpipe (https://oracle.github.io/graphpipe/), Oracle's open source cloud deployment tool for machine learning models. However, the cloud deployment of machine learning models still faces many challenges, such as model extension and scalability, performance tuning, security, and continuous integration and deployment, which need further study. In terms of disease diagnosis based on cloud computing, Chen et al. [43] proposed a Disease Diagnosis and Treatment Recommendation System (DDTRS) based on the Apache Spark cloud platform, which offers high performance and low-latency responses. Lin et al. [8] put forward a cloud-based framework for implementing Home-diagnosis. In that framework, a distributed Lucene-based search engine was designed to provide a scalable, online, and highly concurrent medical record retrieval service.

Fig. 5 The prediction accuracy with different learning rates when the number of iterations was 3

Conclusion
A cloud computing service framework for disease prediction and department recommendation was proposed to guide patients to seek medical diagnosis and treatment effectively and to avoid wasting medical resources. CHMBERT, a pre-trained language model for the Chinese medical domain, was trained on a large-scale Chinese medical corpus for the first time and used to optimize the disease prediction and department recommendation tasks. Experimental results on real-world medical datasets showed that our model achieved the best performance and was superior to the traditional deep learning models and other pre-trained models. Pre-trained models for the medical domain thus have great potential in medical text mining tasks. In addition, our model provided services through the cloud computing environment, which can overcome the insufficient computing power of user terminals. In future work, we will utilize additional medical data to train our model for disease prediction and department recommendation and further improve the performance and availability of the model. We will also further optimize the pre-trained model in the medical domain and try additional parameters and other advanced pre-training methods, such as RoBERTa, XLNet, and ALBERT. Finally, we will evaluate the performance of CHMBERT in other Chinese medical text mining tasks.