BVFLEMR: an integrated federated learning and blockchain technology for cloud-based medical records recommendation system

Blockchain is a recent technology used mainly in banking and finance. It is also used in healthcare management systems for the effective maintenance of electronic health and medical records, where it ensures security, privacy, and immutability. Federated Learning is a deep learning technique that supports learning from a distributed environment. This work proposes a framework that integrates blockchain and Federated Deep Learning to provide a tailored recommendation system. The work comprises two modules. The first is blockchain-based storage for Electronic Health Records (EHR), where the blockchain uses Hyperledger Fabric and is capable of continuously monitoring and tracking updates to the EHR in the cloud server. In the second, a collaborative learning module, LightGBM and N-Gram models recommend a tailored treatment for the patient after analyzing the EHR in the cloud-based database. The work shows good accuracy. Several metrics, including precision, recall, and F1 score, are measured to show its effective utilization in cloud database security.


Introduction
In recent times, hospitals have been moving from traditional records to Electronic Health Records (EHR). An EHR contains all the information regarding the diagnosis, treatment procedures, and other details of the patient for future use. A patient's entire history is stored in the blockchain [1], including diagnosis, treatment, medication, surgery, and diet specifications. The data in an EHR is very sensitive and has to be maintained with high security in order to avoid misuse by unauthorized users or hackers. The existing system, however, has some disadvantages. In the traditional method of storing data, third parties or hackers can retrieve information from the EHR. There is also less chance of retrieving the entire information about the patient's history if the hospital or diagnostics center uses third-party software for EHR storage, and trusting third-party software or storage results in a lack of security. To overcome this challenge, this work proposes a framework to automate the process of storing the EHR in the blockchain. It also facilitates the creation of a recommendation system for tailored treatment by analyzing the Electronic Health Records and historical data of the patient, together with other factors such as blood pressure, sugar level, and pulse. This work has two modules: 1. The EHR is stored in the blockchain by the different hospitals and clinics. 2. Federated learning is applied to learn from data from various sources, including different blockchain data, and then the recommendation system is applied.
Hai et al. Journal of Cloud Computing (2022) 11:22
This work uses federated deep learning for analysis and prediction, and blockchain for storing the data. Blockchain is considered a safe storage area with high security and privacy [2,3]. It is a technology that provides decentralized storage with several characteristics, including immutability, security, and distributed data storage [4]. Blockchain technology is based on a consensus mechanism first introduced for cryptocurrency in 2008. Subsequently, the technology was adopted by various fields, including banking, transportation, and healthcare. In the healthcare field, data must be stored more securely because it contains personal and sensitive information about the patient. This work uses Hyperledger Fabric for storing the data, which is capable of continuously tracking and monitoring the changes made in the digital ledger. The goal is not only to provide a secure place for storing the electronic medical record, but also to use N-gram and LightGBM models in a federated learning setting for the recommendation system. The recommendation system recommends tailored treatment for patients after analyzing their health records.

Motivation
Electronic Health Records contain personal and confidential information, so they have to be protected from cyberattacks and from the third-party access that traditional storage methods allow. To overcome this challenge, a decentralized storage method [5] is proposed in this work. The work also focuses on treatment recommendation for the patient by comparing the particular health record to historical data. This work is motivated by the limitations of existing works. Even though state-of-the-art research supports the storage of the EHR, it does not discuss or integrate a recommendation system, so creating a recommendation system for treatment remains a challenge. Using federated learning, the treatment recommendation can become much more accurate.

Contribution
A decentralized digital ledger system is introduced for EHR storage. It uses Hyperledger Fabric to store the data, since it can track changes in a record over time. This work also concentrates on recommending tailored treatment for the patient using federated deep learning. This work proposes a framework, Blockchain Vertical Federated Learning E-Medical Recommendation (BVFLEMR), that provides the storage space and the recommendation system.

Organization of the paper
The rest of the paper is organized as follows. Section 2 contains the literature review of related works. Section 3 gives an overview of blockchain-based EHR and federated learning and explains the detailed design of the proposed framework. Section 4 discusses the working and results of the proposed model. Finally, Section 5 concludes the work.

Related Work
Shimada et al. [6] presented a drug-recommendation system for patients with infectious diseases. They developed a clinical decision support system whose goal was to assist health care professionals, especially doctors, in selecting drugs appropriately. Medical decisions [7] for disease recognition, treatment, and recovery time were studied by Meisam Shabanpoor and Mahdavi. The authors explained their proposed system in their article "Implementation of a Recommender System on Medical Recognition and Treatment". A collaborative filtering approach was used to recommend the best treatment. In their article "A Nursing Care Plan Recommender System Using a Data Mining Approach", Duan, Street, and Lu described a nursing care plan recommender system that they developed. Based on historical data, they proposed a recommender system that provides a ranked list of nursing plans, which is updated as new items are entered. It employs both the association-rule measures (support and confidence) and a novel measure called "information value", which assesses which selections are likely to improve rankings in the future [8,9]. Research by Hoens, Blanton, and Chawla on "Reliable Medical Recommendation Systems with Patient Privacy" describes a physician recommendation system. The system generates recommendations based on physician ratings and patient satisfaction. An anonymous contribution mechanism and a secure processing architecture were also examined. With the secure processing architecture, patients are able to provide encrypted ratings, and the recommendations are generated over encrypted data. With the anonymous contribution architecture, patient ratings can be submitted anonymously. To make the system more reliable, users and physicians cannot tamper with ratings. The authors evaluated their recommendation system on the reliability of its recommendations and on the performance of the system [10].
"Recommender System for Personalized Wellness Therapy" describes Lim, Husain, and Zakaria's personalized recommender system [11,12]. Based on artificial intelligence and hybrid case-based reasoning, their system generated personalized wellness treatment recommendations. The system used case-based reasoning to find similar cases for wellness concerns expressed by users. If there were no suitable similar cases, it provided recommendations using rule-based reasoning. The system also provided users with an online consultation form. IAServ is a personalized healthcare service implemented as a web service and deployed in a cloud computing environment [13][14][15]. IAServ is more appropriately classified as a clinical decision support system than as a medical recommendation system. Based on the patient's ontological profile and rules, IAServ creates a personalized care plan [11].

Blockchain
Blockchain is a technology that is applicable in many fields. It provides a storage area with assured privacy and security. It has many useful features, such as immutability, a digital ledger, and decentralization. Blockchain is based on a consensus algorithm and smart contracts. An ordered structure of blocks allows the blockchain to store data and records of executed transactions. In other words, the blockchain acts as a secure distributed database [16]. A block of data represents each transaction, containing details of the time, date, price, and participants. In the blockchain network, where information is distributed, almost every independent node participates in validating transactions without the nodes knowing one another. Each block in the network has two hash codes: a previous hash and a current hash. The previous hash refers to the prior block, while the current hash refers to the current block. If a block's information is revised, all its related information must also be updated within a reasonable time, so that the blockchain ensures the privacy and security of the data. As part of the network, all blocks are strongly linked and protected by transaction codes and cryptographic codes. Strong mathematical algorithms give miner nodes the ability to validate these blocks without affecting their data, after which the blocks are added to the blockchain. As a result, blockchain ensures both security and transparency. Blockchain technology stores information in a chain of blocks according to defined rules. Transactions are recorded by adding new blocks to the chain; the nodes are independent of each other and are operated and controlled by the same protocol. The blockchain network contains all the information about the transactions and participants and keeps a record of all the transactions done in the network.
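The previous-hash and current-hash linkage described above can be illustrated with a minimal Python sketch. It is not the Hyperledger Fabric implementation used in this work; the function names, the block fields, and the use of SHA-256 over a JSON payload are assumptions chosen only for illustration.

```python
import hashlib
import json


def block_hash(index, timestamp, record, previous_hash):
    """Return the SHA-256 digest of a block's contents (hypothetical layout)."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp,
         "record": record, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(index, timestamp, record, previous_hash):
    """Build a block that stores both its previous hash and its current hash."""
    return {
        "index": index,
        "timestamp": timestamp,
        "record": record,
        "previous_hash": previous_hash,
        "hash": block_hash(index, timestamp, record, previous_hash),
    }


def build_chain(records):
    """Chain toy records together, starting from a genesis block."""
    chain = [make_block(0, 0, "genesis", "0" * 64)]
    for i, record in enumerate(records, start=1):
        chain.append(make_block(i, i, record, chain[-1]["hash"]))
    return chain


def is_valid(chain):
    """Check each non-genesis block's own hash and its link to the prior block."""
    for prev, block in zip(chain, chain[1:]):
        recomputed = block_hash(block["index"], block["timestamp"],
                                block["record"], block["previous_hash"])
        if block["hash"] != recomputed or block["previous_hash"] != prev["hash"]:
            return False
    return True
```

Because each block's hash covers its contents and the previous hash, editing a stored record without recomputing every later hash makes `is_valid` return `False`, which is the tamper-evidence property the text relies on.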
Depending on their functionality, blockchain networks can be classified as private, public, or consortium chains. In permission-less or public blockchain networks, there is no admin node to control and check the transactions, but all miner nodes can verify and validate the transactions. Miner nodes participate in the consensus process and, through it, maintain validity among themselves. Ethereum and Bitcoin are examples. In a consortium blockchain network, an admin node handles data and transactions. Data may be accessed by public or private users. Based on business terms, some data could be available to the general public, while other data could be restricted to specific types of private participants. The data on these networks is therefore both public and private, and the networks are not fully decentralized. An example is the Hyperledger Fabric platform. In a private blockchain network, all data and transactions are strictly private, and only authorized members of the network can see the data. In one respect, a private network is very similar to a consortium network: only the admin can add members to the network. Examples include Hyperledger and MultiChain networks.
Blockchain is well suited to storing healthcare data with privacy guarantees. This manuscript uses blockchain and Hyperledger Fabric to secure the medical data of the patient. Hyperledger Fabric is capable of monitoring and tracking the electronic medical record at a given timestamp. The main goal of the proposed work is not only to secure the data, but also to pave the way for a recommendation system. To that end, it introduces a recommendation module that recommends tailored treatment for the patient by considering their medical report. Furthermore, the machine learning models are trained on the dataset to recommend the best personalized treatment to patients.

Federated learning
The healthcare field is a large, modern, distributed, and decentralized network that generates a huge amount of data every day. This healthcare data is stored in different storage technologies such as the cloud and blockchain. Even within blockchain technology, different types of blockchain are used in different hospitals and institutes. Given increasing computing power and concerns over transmitting private data, local data storage and network computation are becoming increasingly attractive. Federated learning has gained popularity in such environments. In this article, we discuss how federated learning necessitates advancements in privacy, large-scale machine learning, and distributed optimization. We also raise new questions about machine learning and systems in healthcare. The main tasks of federated learning here include learning from many types of blockchain, adapting to their similarities, and analyzing the EHR stored in the different types of blockchain. It thereby aims to provide tailored treatment for the patient by analyzing all of the patient's particulars and comparing them with historical data. There are four fundamental challenges with federated learning:

1. Expensive communication
2. System heterogeneity
3. Statistical heterogeneity
4. Privacy concerns
These challenges are similar to classical problems in privacy, large-scale machine learning, and distributed optimization. Numerous methods have been proposed in the machine learning, optimization, and signal processing communities to address expensive communication. However, prior methods generally cannot handle the scale of federated networks, let alone the challenges of system and statistical heterogeneity. Additionally, while privacy remains a key aspect of many machine learning applications, privacy-preserving methods for federated learning can be difficult to demonstrate because of statistical variability in the data. They are also hard to implement because of system limitations on each device and across the massive network. In the proposed work, the data is collected from different storage technologies and devices and analyzed. A tailored treatment is then recommended using a recommendation system. Figure 1 shows the overview of federated learning. The data acquired from various blockchain sources are stored in the federated server.
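As a rough illustration of how a federated server can learn from several sites without pooling their raw data, the following Python sketch implements federated averaging on a one-parameter linear model. Federated averaging is a common horizontal scheme and is used here only as an assumption; the paper does not state its aggregation rule, and the vertical setting of BVFLEMR would differ. The toy data and all function names are hypothetical.

```python
def local_update(weight, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one site's private (x, y) pairs."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(client_weights, client_sizes):
    """Average the client weights, weighted by local sample counts."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


def train(clients, rounds=50):
    """Alternate local updates and server-side averaging."""
    global_weight = 0.0
    for _ in range(rounds):
        updates = [local_update(global_weight, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weight = federated_average(updates, sizes)
    return global_weight


# Hypothetical toy data: each site holds points on the line y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
```

Only model weights travel between the sites and the server in this sketch, which is the property that makes the approach attractive when the records themselves must stay where they are stored.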

BVFLEMR Framework
The first and foremost goal of blockchain is to store data in transactions and to provide privacy and security for those transactions. Each block contains multiple transactions. All those transactions are validated by the miners, who receive rewards in return. The proposed work has two parts: 1. The EHR is saved by different hospitals and clinics into different types of blockchain. 2. Data is collected from the different sources using federated learning, and a recommendation algorithm is applied. Figure 2 shows the architecture of the proposed work. Data is collected from different types of blockchain, in which the data is stored by different hospitals, clinics, healthcare providers, and diagnostics centers. The different types of blockchain have different characteristics and properties. The data are validated by miners for rewards and stored in the blocks, with each block containing multiple transactions. Here, the blockchain uses Hyperledger Fabric and IPFS for storing the data from the different sources, and these data are collected in a federated server for further processing. The process continues with data pre-processing, data analysis, data representation, and data selection. The pre-processed data is split into training data and testing data, and the recommendation algorithm is applied to obtain the tailored treatment recommendation for the patients.

Recommendation module
This section discusses the recommendation module. It uses an N-gram model for the probability of terms in sentences and the Light Gradient Boosting Machine [16][17][18] for reducing the data instances with low gradient values. Sentiment analysis is performed together with emotion analysis.

N-gram
The term "language model" refers to a model that determines the probabilities of words from word sequences. In essence, the N-gram model calculates the probability distribution of a word given the sequence of words that precedes it. It is a probabilistic model that is trained on a large collection of texts called a corpus. By counting the number of times each word sequence appears in the corpus, the N-gram model estimates the probability. Given a sequence of N-1 words, an N-gram model predicts the words that have a high probability of following that sequence. Formally, N-grams are sequences of N words. A bigram is made up of two words, such as "Don't disturb", "My car", and "Your notebook", while a trigram is made up of three words, such as "Please, don't disturb" and "Close the door". Can the N-gram model predict the probability of occurrence of a word in a corpus? In the training collection of texts, we might have two sequences, "heavy flood" and "heavy rain". If "heavy rain" appears more often, the N-gram model [19][20][21][22] predicts that the probability of "heavy rain" is greater than that of "heavy flood", so "rain" is the word selected by the model after "heavy". The method can be used in a wide range of natural language processing applications, such as speech recognition, word similarity comparison, part-of-speech tagging, predictive input, natural language generation, grammar application, machine translation, and sentiment analysis. An analysis of the training accuracy and loss of the N-gram model in the BVFLEMR framework is presented. Figure 3 shows the training accuracy and training loss of the N-gram model.
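The counting procedure described above can be sketched for the bigram case (N = 2) as follows. This is a minimal illustration with a maximum-likelihood estimate and whitespace tokenization, not the model trained in this work; the toy corpus and the function names are assumptions.

```python
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_probability(unigrams, bigrams, previous, word):
    """Estimate P(word | previous) as count(previous, word) / count(previous)."""
    if unigrams[previous] == 0:
        return 0.0
    return bigrams[(previous, word)] / unigrams[previous]


def predict_next(unigrams, bigrams, previous):
    """Return the most probable word after `previous`, or None if unseen."""
    candidates = {w: c for (p, w), c in bigrams.items() if p == previous}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy corpus echoing the "heavy rain" / "heavy flood" example.
corpus = [
    "heavy rain fell overnight",
    "heavy rain is expected",
    "heavy flood warnings issued",
]
unigrams, bigrams = train_bigram(corpus)
```

In this toy corpus "heavy rain" occurs twice and "heavy flood" once, so the estimate for "rain" after "heavy" is larger and `predict_next(unigrams, bigrams, "heavy")` selects "rain". A practical model would add smoothing for unseen word pairs.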
Light Gradient Boosting Machine (LightGBM)
Among the many implementations of the Gradient Boosting Decision Tree (GBDT) algorithm in the machine learning domain, the primary ones are Extreme Gradient Boosting (XGBoost) and Parallel Gradient Boosting Regression Tree (pGBRT). Engineering optimization methods [22] are used in these algorithms, but the efficiency and scalability of the models become unsatisfactory when the number of features and the data size are too large, because it is time-consuming to scan all data instances to estimate the information gain for every single feature. Microsoft's solution to this problem includes two techniques, known as Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). GOSS rests on the observation that data instances with large gradients contribute more to the estimate of information gain. It therefore keeps the instances with large gradients and randomly samples from the instances with small gradients, so that many data points are eliminated during training while GOSS can still provide an accurate estimate with a smaller data size. EFB, on the other hand, is used for feature reduction by grouping mutually exclusive features together [23][24][25]. In EFB, a greedy algorithm can be used to reduce the number of features effectively without affecting the accuracy of information gain. Microsoft named the GOSS- and EFB-based implementation LightGBM [26]. Its authors report that it trains up to 20 times faster than other GBDT models while achieving almost the same accuracy [27][28][29][30][31].
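The sampling idea behind GOSS can be sketched in a few lines of Python. This is an illustrative reading of the technique rather than LightGBM's internal code: the function name, the parameter names `top_rate` and `other_rate`, and the default values are assumptions, and the amplification factor (1 - top_rate) / other_rate follows the usual description of GOSS.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {index: weight} for the instances a GOSS-style sampler keeps."""
    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_k = int(top_rate * n)
    rand_k = int(other_rate * n)

    # Keep every instance with a large gradient, at weight 1.
    selected = {i: 1.0 for i in order[:top_k]}

    # Randomly sample from the small-gradient remainder and amplify the
    # sampled instances to compensate for the ones that were dropped.
    rng = random.Random(seed)
    amplify = (1.0 - top_rate) / other_rate
    for i in rng.sample(order[top_k:], rand_k):
        selected[i] = amplify
    return selected
```

With ten instances and the default rates, the sketch keeps the two largest-gradient instances at weight 1 and one randomly chosen small-gradient instance at weight 8, so the weights still sum to roughly the original number of instances.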

Working of the BVFLEMR
The framework BVFLEMR (Blockchain Vertical Federated Learning E-Medical Recommendation) uses the vertical federated learning method to acquire the data from various types of blockchain. The flow of the proposed work is shown in Fig. 4. Initially, the data from the different types of blockchain is acquired by the federated server and the data is analyzed and pre-processed.

The federated server processes the data in four steps:
1. Data pre-processing
2. Data analysis
3. Data representation
4. Data selection
The pre-processed data is fed into a data partition, where the data is divided into testing and training data and the model is trained with the data. The data then travels to the recommendation module using the previous EHR.
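The pre-processing and partition steps of this section can be sketched as follows. The record layout, the field names, the min-max normalization, and the 80/20 split ratio are all assumptions made for illustration; the paper does not specify them.

```python
import random


def drop_missing(records):
    """Remove records in which any field is None."""
    return [r for r in records if all(v is not None for v in r.values())]


def min_max_normalize(records, field):
    """Rescale one numeric field to the range [0, 1]."""
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant fields
    return [{**r, field: (r[field] - low) / span} for r in records]


def train_test_split(records, test_ratio=0.2, seed=0):
    """Shuffle a copy of the records and split it into training and testing sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy records: a condition label and a treatment rating.
records = [
    {"condition": "headache", "rating": 2.0},
    {"condition": "cancer", "rating": None},
    {"condition": "blood pressure", "rating": 10.0},
    {"condition": "liver damage", "rating": 6.0},
    {"condition": "headache", "rating": 8.0},
    {"condition": "cancer", "rating": 4.0},
]
clean = min_max_normalize(drop_missing(records), "rating")
train_set, test_set = train_test_split(clean)
```

Fixing the random seed keeps the split reproducible, which matters when results on incremental chunks of the dataset are compared.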
Nowadays, the main software applications and frameworks used in the healthcare industry continuously generate and handle a large amount of Electronic Medical Records. Data scientists and healthcare providers are interested in developing an automated system for the analysis and recommendation of tailored treatment. The second part of the proposed work automates the recommendation system using Natural Language Processing and Machine Learning algorithms. The workflow of the recommendation system is shown in Fig. 5. The proposed recommendation system begins with data pre-processing of the dataset, which comprises reviews and ratings of the treatment taken by the patients. This pre-processing module works in two different parts. It analyses the patient's condition, e.g., cancer, liver damage, headache, or blood pressure. Pre-processing removes the missing values from the dataset and normalizes it. The next module consists of the Natural Language Processing and Machine Learning models, LightGBM and N-Gram, with the help of sentiment analysis and emotion analysis. These two

Results and discussion
The proposed framework is designed to recommend tailored treatment for patients. Initially, the patient's records and data from the hospitals are stored in the blockchain using the InterPlanetary File System (IPFS) protocol with Hyperledger Fabric, and the data is acquired by the federated server for further processing. The data from the federated server is analyzed, pre-processed, and fed into the recommendation module, which suggests the tailored recommendation using a sentiment analysis and emotion analysis library. The data was tested incrementally. Table 1 shows the performance on the incremental chunks of the dataset, and the graphical representation is shown in Fig. 6. Together they show the differences between recall, precision, and F-score across the incremental dataset. Figure 7 shows the different types of treatment for tumor disease with their mean accuracy. Figure 8 shows the types of treatment for cancer with their mean accuracy.
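For reference, the precision, recall, and F-score reported above follow the standard definitions, which can be computed as in the sketch below. The function name and the binary labelling are assumptions; the values in Table 1 come from the authors' experiments, not from this code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Precision is the share of recommended treatments that were correct, recall is the share of correct treatments that were recommended, and the F-score is their harmonic mean.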

Conclusion
Blockchain technology has updated the traditional storage method to a new cloud-based, robust, secure, and transparent system that ensures the security and privacy of the data. This work uses Hyperledger Fabric, which is capable of continuously tracking the electronic medical record in the cloud. The main novelty of the proposed work is the use of blockchain and ML/DL models to recommend tailored treatment for patients no matter where they are located. The proposed system applied blockchain and machine learning/deep learning algorithms in healthcare and produced good results. Furthermore, the system helps healthcare providers, doctors, and patients to address the problem of treatment selection. There is a significant increase in the survival rate of patients after the treatment. This work also shows the accuracy of the treatment recommendation system for various diseases such as tumors and cancer. In future work, we will increase the size of the dataset and use it to test the performance of the framework. Furthermore, we will also improve the ML/DL models in terms of accuracy and recommendation results.