A multi-strategy ontology mapping method based on cost-sensitive SVM
Journal of Cloud Computing volume 13, Article number: 144 (2024)
Abstract
As the core of ontology integration, the task of ontology mapping is to find the semantic relationships between ontologies. Nevertheless, most existing ontology mapping methods rely only on text information to calculate entity similarity, thereby disregarding semantic nuances and necessitating substantial manual intervention. Therefore, this paper introduces a multi-strategy ontology mapping method. Building on the traditional ontology mapping method, the process employs deep learning models to mine the semantic information of entity concepts, entity properties and ontology structure to obtain embedding vectors. We use a similarity mechanism to calculate the similarity between different embedding vectors, and combine the similarity values obtained from the multiple strategies into a similarity matrix. The similarity matrix serves as input to the support vector machine (SVM), and the ontology mapping problem is finally transformed into a binary classification problem. However, since the number of mapped pairs is much smaller than the number of non-mapped pairs, the number of positive samples in the data set is much smaller than the number of negative samples. Therefore, based on the traditional SVM, the paper adopts a cost-sensitive strategy to deal with the class imbalance problem. In comparative evaluations against contemporary ontology mapping techniques, our method demonstrates a noteworthy 5.0% enhancement in recall and a 3.0% improvement in F1 score when tested on both public benchmark datasets and domain-specific datasets.
Introduction
With the rapid development of computer technology, diverse sectors are actively transitioning towards digitalization, informatization, and intellectualization, culminating in substantial efficiency improvements across industries. However, inconsistencies in how different information managers understand and describe data have caused great inconvenience for data sharing and communication. As an example, shield machines can be divided into mud-water pressure shield machines, earth pressure balance shield machines, mixed shield machines, etc. Firstly, different shield machines are selected for different geological conditions. Secondly, since shield machines come from different manufacturers and operate at different sites, the operation monitoring data of the various shield machines are scattered across multiple systems, resulting in the emergence of numerous “information islands”. Consequently, integrating the data distributed across these systems is of great significance.
In light of the rapid evolution of the Semantic Web, ontology-based data integration has garnered considerable attention among researchers [1]. This method employs ontology mapping technology to find the mapping relationships between ontologies, and uses these relationships to achieve entity alignment. Ontology mapping methods predominantly fall into two categories: traditional similarity calculation and machine learning.
The traditional similarity calculation method computes the similarity between entities under several different strategies, such as semantic distance [2], information content [3], and property information [4]. These similarities are subsequently weighted and summed to yield the comprehensive similarity value between entities for ontology mapping. The machine learning method transforms ontology mapping into a classification task, which takes the similarity values calculated by the traditional method as the input and the mapping result as the output to train a classifier [5]. Building on these mapping methods, researchers have developed a large number of ontology mapping systems, such as AML [6], LiLy [7], LogMap [8], and GLUE [9].
Although these systems have achieved good results [1], they still exhibit notable limitations. Firstly, they predominantly focus on the information of a single entity node and ignore the information of neighbor entity nodes. Secondly, the similarity calculation relies too heavily on dictionary and text information, and does not fully mine the semantic information associated with entities. Thirdly, manual intervention is needed, such as setting the weighting coefficients of the different similarity values.
For class-imbalanced samples, common processing methods include sampling (such as undersampling or oversampling), algorithm adjustment, data augmentation, feature selection, and model evaluation adjustment. In dealing with class-imbalanced samples, the cost-sensitive strategy has the advantages of improving the accuracy of minority-class recognition, strong flexibility and generality, interpretability, adaptability, and good overall model performance. These advantages make the cost-sensitive strategy an important and effective method for solving class imbalance problems.
To mitigate these challenges, this paper proposes a multi-strategy ontology mapping method based on a cost-sensitive support vector machine (SVM). Firstly, the structure, concept, and property embedding vectors are calculated using the graph attention network (GAT) model and Bidirectional Encoder Representations from Transformers (BERT). Subsequently, entity similarity is computed from the multi-strategy embedding vectors to produce a similarity matrix. Finally, the cost-sensitive SVM model is used to transform the problem of mapping relationship discovery into a binary classification task, which reduces the degree of manual intervention. Experiments demonstrate that the performance of the proposed method is greatly improved compared with existing mapping systems, and that it also performs well on a professional domain ontology built for shield machines.
More specifically, we summarize our contributions as follows:

1. An embedding vector is introduced to enhance the traditional similarity calculation method. Embedding vectors can embed the semantic information in concepts, properties, and structures into a low-dimensional vector space. This makes up for the limitations of traditional methods, which predominantly focus on text information, and solves the problem of semantic mismatch.

2. A two-layer GAT model is employed to learn the structure information of the ontology. This model enables an entity node to fuse the structure information of its two-layer neighbor nodes, thereby effectively addressing the mapping challenges posed by heterogeneous ontologies.

3. We perform experiments on public datasets to evaluate our method. The results show that our method outperforms the compared methods, with improvements in recall and F1 score.
The subsequent sections of this paper are organized as follows. The Problem description section introduces the definition and problem statement of ontology mapping. The Proposed approach section describes our proposed approach, while the Multi-strategy similarity calculation section describes the concrete method of similarity calculation based on concepts, properties, and structure. The Ontology mapping section presents a cost-sensitive SVM model to predict the ontology mapping relationship. The Experiments section presents the experimental results and their interpretation. Finally, the Conclusion and future work section provides a summary and conclusion.
Problem description
Ontology
The concept of ontology originated from Aristotle’s (384–322 BC) investigation of the nature of the existence of things. In philosophy, it is defined as “the systematic description of objective existence in the world, namely ontology”, that is, a systematic explanation of objective existence [10]. As a tool for information abstraction and knowledge description, ontology has been gradually applied to computer science. Researchers have also provided definitions of ontology for the computer field. Notably, the definition by Studer et al. is widely accepted, that is, “Ontology is a clear formal specification of shared conceptual model” [11]. In the information technology domain, ontology construction is primarily accomplished by capturing relevant domain knowledge in areas such as commerce, medicine, and finance. The ontology serves functions such as intelligent semantic search [12], intelligent question answering [13], and decision reasoning [14]. Knowledge in an ontology is formally represented by classes, relations, functions, axioms, and instances [15]. Ontology O can be mathematically represented by the five-tuple in Eq. (1):
\[O = \{C, R, F, A, I\} \qquad (1)\]
Here, C denotes the classes, which stand for things in the real world and are also called concepts, such as animal, vehicle, and mammal; R denotes the interactions between classes, referred to as properties, including data properties and object properties; F represents the special relationships established between classes based on real-world rules. For example, driveOf is a special relationship, and driveOf (people, vehicle) indicates that people can drive a vehicle; A denotes the axioms, that is, the facts in the ontology that constrain its concepts or properties. For instance, minCardinality indicates that the number of values of the constrained property must be at least the specified number; I denotes the instances, the concrete embodiments of a class. In the computer field, a specific language is needed to describe the conceptual model of an ontology. The Web Ontology Language (OWL) is the latest ontology description language standard recommended by the W3C. It treats abstract classes, relations, and instances uniformly and describes them as entities. The ontology models in the rest of the paper are constructed using OWL. Figure 1 shows a sample ontology model.
Ontology mapping problem
With the wide use of ontologies in diverse domains, the heterogeneity of ontologies becomes increasingly evident. Since ontologies established by different organizations always serve their organizations, ontologies built by different organizations in the same domain exhibit heterogeneity in semantics, syntax and structure, especially semantic heterogeneity. For example, the concepts of person, people, and human are identical in semantics. Ontology heterogeneity poses great obstacles to information interoperability. A viable remedy to address this challenge is ontology mapping, which discovers semantic relations between ontologies, establishes mapping rules for them, and employs mapping rules to realize entity matching between ontologies [16].
Consider two input ontologies, a source ontology \(O_s\) and a target ontology \(O_t\), where \(O_s\) comprises m entities and \(O_t\) contains n entities. The existence of a mapping relationship set, denoted as Map, is represented by Eq. (2).
For each entity \(e_{is}\) in \(O_s\), it is necessary to identify the entity \(e_{jt}\) with the highest similarity to \(e_{is}\) from \(O_t\). This process is repeated until there are no entity pairs left to map. For the convenience of the subsequent description, Table 1 provides a list of symbols used in the paper along with their respective meanings.
Proposed approach
Our proposed method is depicted in Fig. 2. Initially, we preprocess the source ontology \(O_s\) and the target ontology \(O_t\), which encompasses entity extraction, property extraction, relation extraction, and string preprocessing. Subsequently, the concepts, properties, and structures that describe entities are embedded into a low-dimensional vector space, and the cosine similarity algorithm is employed to calculate the similarity value between vectors. The similarity matrix \(\varvec{M}\) is constructed from these similarity values. Finally, this matrix is utilized as the input of the cost-sensitive SVM to discover the ontology mapping relations. Within this process, there are two core functions.
Similarity-based feature vector construction: To obtain the similarity values based on concepts, properties, and structures, the paper introduces an embedding vector method on top of the traditional similarity calculation. Firstly, the concept embedding vector \(\varvec{CE}\) and property embedding vector \(\varvec{PE}\) are obtained by BERT, and the structure embedding vector \(\varvec{SE}\) is obtained by GAT. Subsequently, the cosine similarity of the embedding vectors of \(e_{is}\) and \(e_{jt}\) is calculated for each strategy, and the embedding-based concept similarity value is complemented by the concept similarity value based on semantic distance. Finally, the multi-strategy similarity vectors are obtained and combined by column into the similarity matrix \(\varvec{M}\). The calculation of cosine similarity in n-dimensional space is illustrated in Eq. (3):
Where \(\varvec{x}\) and \(\varvec{y}\) represent the concept, property, and structure embedding vectors of the entities in \(O_s\) and \(O_t\), respectively. The smaller the angle between the two vectors, the closer the cosine similarity value is to 1, and the similarity value is correspondingly higher. Further details of this process are described in Multistrategy similarity calculation section.
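As an illustration of the cosine similarity described here, the following minimal Python sketch computes the standard textbook form, the dot product divided by the product of the vector norms. The function name `cosine_similarity` and the handling of zero vectors are assumptions made for this example, not part of the paper.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Assumption: a zero vector is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_x * norm_y)
```

Vectors pointing in the same direction score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, which is the value range assumed for embedding-based similarities in the rest of this section.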
Entity matching: The core of ontology mapping is entity matching. There are only two mapping relationships between entities: matching and mismatching. This characteristic aligns it as a typical binary classification problem. Therefore, the paper combines the similarity matrix \(\varvec{M}\) with the class labels y to construct the training data XY, as expressed in Eq. (4):
We employ a cost-sensitive SVM to treat the ontology mapping problem as a binary classification problem. Further details of this process are described in the Ontology mapping section.
Multi-strategy similarity calculation
Within the proposed method, the similarity matrix \(\varvec{M}\) serves as the input for training the cost-sensitive SVM model and predicting entity matching. Therefore, this section delineates the methods of calculating similarity based on concept, property, and structure, and constructs the similarity matrix \(\varvec{M}\) from them. The definition of the similarity matrix \(\varvec{M}\) is presented in Table 2. Each row of \(\varvec{M}\) holds the similarity values of one entity pair, consisting of an entity \(e_{is}\) of \(O_s\) and an entity \(e_{jt}\) of \(O_t\).
Vectorized representation of concepts and properties
The foundational method in ontology mapping is the discovery of entity matching pairs using concepts and properties. A concept serves to describe the meaning of an entity, that is, the entity’s name. If two entities have similar names, the entities are considered similar. Properties, further categorized into object properties and data properties, describe the specific characteristics of entities in the objective world. An entity contains multiple properties, which are unique and deterministic for the entity. Therefore, property similarity is an important benchmark for judging entity similarity [17]. In this paper, we exclusively consider data properties. For these two perspectives, researchers have proposed a large number of methods [4, 18]. Nevertheless, the majority of these methods use only the text of concepts and properties, and do not use the rich semantic information the text carries. To utilize the semantic information of concepts and properties, we introduce the embedding vector method.
The embedding vector solves the feature sparsity problem of traditional methods. It employs a deep learning model to map a word to a low-dimensional dense vector space, so that similar words can share contextual information, which enhances the generalization ability. Moreover, unsupervised pre-training, as used in ELMo [19], OpenAI GPT, and BERT, can significantly improve a model’s generalization ability. As a leading model of recent years, BERT has demonstrated remarkable results in intelligent question answering, inference, and prediction in the field of natural language processing. It is a pretrained language representation model that produces bidirectional encoder representations based on the transformer. It can be trained on large-scale unlabeled corpora to obtain rich semantic information of text [20]. Its fundamental architecture involves stacking multiple transformers, as shown in Fig. 3. The input text is first initialized as a vector and is subsequently passed through the multilayer transformer stack. Finally, the embedding vector of the text is obtained.
In our method, a pretrained BERT model is selected. The model contains 12 Transformer layers, a hidden size of 512, and 8 attention heads. The output embedding vector dimension is 512. Initially, the concepts and properties within the ontology are preprocessed (word segmentation, stop-word removal, special-symbol removal, etc.). As for concepts, the concept names of entities \(e_{is}\) and \(e_{jt}\) are input into the BERT model, and the concept embedding vector \(\varvec{CE}\) is obtained. Concerning properties, as shown in Fig. 4, we first acquire the set of data properties of the entity, \(e_p =\{ p_1,p_2\ldots p_t\}\), which contains t data properties in total. Then the properties in the property set are successively input into the BERT model to obtain the corresponding set of data property vectors \(\varvec{E}_s = \{\varvec{E}_1,\varvec{E}_2\ldots \varvec{E}_t\}\). Finally, a weighted sum of the data property vectors is carried out to obtain the property embedding vector \(\varvec{PE}\).
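The final aggregation step can be sketched as follows. This is a minimal illustration that assumes the per-property vectors have already been produced by the encoder; the function name `property_embedding` and the default of uniform weights are assumptions, since the weighting scheme is not specified here.

```python
def property_embedding(property_vectors, weights=None):
    """Weighted sum of per-property vectors E_1..E_t into one vector PE."""
    if not property_vectors:
        raise ValueError("at least one property vector is required")
    t = len(property_vectors)
    if weights is None:
        # Assumption: equal weight 1/t for every data property.
        weights = [1.0 / t] * t
    if len(weights) != t:
        raise ValueError("one weight per property vector is required")
    dim = len(property_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, property_vectors))
        for d in range(dim)
    ]
```

With uniform weights the result is simply the mean of the property vectors, so an entity with many data properties and an entity with few are represented in the same vector space.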
Concept-based similarity calculation
Concept-based similarity calculation is one of the most popular methods in ontology mapping. Its fundamental principle is to use concept similarity to find entity matching pairs. Resnik emphasized that the similarity of a pair of concepts is determined by their commonality [21]. Traditional methods are based on this principle, such as the edit distance method [22] and the dictionary-based similarity calculation method [20, 23, 24]. The edit distance method calculates the minimum number of edit operations (replace, insert, and delete) required to transform one string into another. The fewer the operations, the higher the similarity between the strings. Dictionaries in the ontology field are semantic networks with robust classification and hierarchical structures, constructed by researchers from entities and relations in the real world, such as WordNet [25] and HowNet [26]. The dictionary-based similarity calculation method maps all the concepts onto such a semantic network. The commonality of a pair of concepts is how much information they share in the network. However, these methods still exhibit some drawbacks and are less effective in handling synonymous names (e.g., electromotor and motor) and specialized fields.
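The edit distance described above can be sketched with the classic dynamic-programming (Levenshtein) recurrence. The conversion of the distance into a similarity in [0, 1] by dividing by the longer string length is an assumption made for this example; the paper does not state which normalization it uses.

```python
def edit_distance(a, b):
    """Minimum number of replace, insert, and delete operations turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace (or keep) ca
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

For the synonym pair mentioned in the text, `edit_distance("motor", "electromotor")` is 7, so the string-level similarity is low even though the meanings coincide, which illustrates the drawback noted above.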
To address this challenge, we propose a concept similarity calculation method that combines WordNet and a concept embedding vector. The embedding-vector-based similarity calculation method first employs the BERT model to map the semantic information of concepts into a multidimensional vector space (refer to the Vectorized representation of concepts and properties section). Subsequently, the cosine similarity of each pair of concept embedding vectors is calculated. Finally, the concept similarity vector \(\varvec{C}\) based on the embedding vector is obtained. The procedure is shown in Algorithm 1. The WordNet-based concept similarity calculation method uses the commonality between WordNet concept nodes to measure their similarity. The experimental results demonstrate that the proposed combination is more effective than either the traditional method or the embedding vector method alone, and that it performs well on synonymous names and specialized fields.
WordNet is an English lexical semantic network that organizes words of the various parts of speech into a network of synonym sets. Each synonym set represents a fundamental semantic concept, and various relations connect these sets. Based on this semantic network, numerous similarity calculation methods have been proposed. Notably, the concept similarity calculation method based on information content proposed by Lin is widely used [2], as illustrated in Eq. (5).
\(LCS(\cdot )\) denotes the nearest common parent of both concepts. \(IC(\cdot )\) signifies the information content of the concept, calculated using the method proposed by Seco et al. [27, 28] based on WordNet. The numerator is the information shared by \(e_{is}\) and \(e_{jt}\), and the denominator is the total information of \(e_{is}\) and \(e_{jt}\). The more two entities have in common, the higher their similarity; the less they have in common, the lower their similarity.
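A minimal sketch of this measure is given below, using Lin's published form, twice the information content of the nearest common parent divided by the sum of the information contents of the two concepts, together with the intrinsic information content of Seco et al., \(1 - \log (hypo(c)+1)/\log (N)\). The tiny taxonomy `PARENT` is a hypothetical stand-in for WordNet, and all function names are assumptions made for this example.

```python
import math

# Hypothetical toy taxonomy (child -> parent); NOT taken from WordNet.
PARENT = {
    "animal": "entity", "vehicle": "entity",
    "mammal": "animal", "bird": "animal",
    "dog": "mammal", "cat": "mammal", "car": "vehicle",
}
NODES = set(PARENT) | set(PARENT.values())


def ancestors(concept):
    """Path from the concept up to the root, starting with the concept itself."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def information_content(concept):
    """Intrinsic IC: concepts with many descendants carry little information."""
    hyponyms = sum(1 for n in NODES if n != concept and concept in ancestors(n))
    return 1.0 - math.log(hyponyms + 1) / math.log(len(NODES))


def lcs(c1, c2):
    """Nearest common parent of the two concepts."""
    up2 = set(ancestors(c2))
    for node in ancestors(c1):
        if node in up2:
            return node
    raise ValueError("concepts share no ancestor")


def lin_similarity(c1, c2):
    total = information_content(c1) + information_content(c2)
    if total == 0.0:
        return 0.0  # assumption: two zero-information concepts score 0
    return 2.0 * information_content(lcs(c1, c2)) / total
```

In this toy taxonomy, dog and cat share the fairly specific parent mammal and therefore score higher than dog and bird, whose nearest common parent is the more general animal; dog and car meet only at the root and score 0.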
However, two issues arise when only the above method is used to calculate concept similarity. Firstly, the semantic network cannot contain all concepts, particularly those of professional fields. Secondly, a concept is often represented by a sequence of words rather than a single word. To address the first issue, we employ the edit distance algorithm: when either concept in an entity pair does not exist in the semantic network, the edit distance algorithm is used to calculate the similarity between them. To address the second issue, we employ the method outlined in Eq. (6).
Where \(set_{is}\) denotes the collection of words forming the concept of \(e_{is}\), \(|set_{is}|\) represents the size of the collection, and \(set_{jt}\) denotes the collection of words forming the concept of \(e_{jt}\). The \(sim\left( W_{sk},\ W_{tk}\right)\) is calculated as follows: for all word pairs drawn from \(set_{is}\) and \(set_{jt}\), the similarity value based on information content is calculated. In cases where a word is not present in WordNet, the edit distance method is employed to calculate the similarity value instead. The maximum value over all pairs is taken as \(sim\left( W_{sk},\ W_{tk}\right)\), and the word pair attaining it is \(W_{sk}\) and \(W_{tk}\). These two words are removed from \(set_{is}\) and \(set_{jt}\), respectively. The steps are repeated on the remaining words until no words are left to pair. Lastly, the concept similarity value of all entity pairs in \(O_s\) and \(O_t\) is calculated, yielding the concept similarity vector \(\varvec{D}\) based on semantic distance. The procedure is presented in Algorithm 2.
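The greedy pairing described above can be sketched as follows. The word-level similarity is passed in as a function, so any measure (information content, edit distance, or a fallback between the two) can be plugged in. Dividing the accumulated score by the size of the larger word collection is an assumption made for this example, as is the function name `greedy_concept_similarity`.

```python
def greedy_concept_similarity(words_s, words_t, word_sim):
    """Repeatedly pair the most similar remaining words and accumulate their scores."""
    remaining_s, remaining_t = list(words_s), list(words_t)
    if not remaining_s or not remaining_t:
        return 0.0
    # Assumption: normalize by the larger collection so unmatched words lower the score.
    normalizer = max(len(remaining_s), len(remaining_t))
    total = 0.0
    while remaining_s and remaining_t:
        score, i, j = max(
            ((word_sim(ws, wt), i, j)
             for i, ws in enumerate(remaining_s)
             for j, wt in enumerate(remaining_t)),
            key=lambda item: item[0],
        )
        total += score
        # Remove the matched pair before the next round.
        del remaining_s[i]
        del remaining_t[j]
    return total / normalizer
```

With an exact-match word similarity, the concepts "shield machine" and "machine shield" score 1.0 because both words find a partner, while "cutter head" against "head" scores 0.5 because one word remains unmatched.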
Structure-based similarity calculation
The tree structure of the ontology model is a special graph structure. The parent-child relationships across different layers and the sibling relationships within the same layer contain a lot of semantic information. Entity pairs with mapping relations also exhibit a similar structure, underscoring the significance of using the semantic information of the ontology structure for ontology mapping. GAT is a neural network model that operates on graph structures. It dynamically assigns different attention scores to each neighbor node through a multi-head attention mechanism, to identify the importance of different neighbor nodes to the central node [29]. It learns multilevel structure information of nodes through multiple GAT layers, ultimately yielding structure-based embedding vectors. The input of the lth layer of the model is the node feature vector S, and the output of the lth layer is obtained by Eq. (7) below.
\(\sigma\) denotes an activation function, and \(\Vert\) represents the splicing (concatenation) operation. K independent attention mechanisms are employed to calculate K feature vectors. Subsequently, they are concatenated to obtain the output feature vector of layer l. \(W^k\) represents the weight matrix for the corresponding linear transformation, while \(\alpha _{ij}^k\) is the attention coefficient between node i and node j calculated by the kth attention mechanism, as given by Eq. (8) below.
The weight matrix W represents the linear transformation, and LeakyReLU is the activation function. In particular, in the last layer of the network, the final node feature vector is obtained by averaging the K feature vectors calculated by the multi-head attention, as shown in Eq. (9).
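As an illustration of how one attention head weighs a node's neighbors, the sketch below follows the standard GAT formulation: each neighbor is scored by applying LeakyReLU to the dot product of an attention vector with the concatenated transformed features, and the scores are normalized with a softmax. The names `attention_coefficients`, `h`, `neighbours`, and `a`, and the choice to list a node among its own neighbors, are assumptions made for this example.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Apply the linear transformation W to the feature vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_coefficients(h, neighbours, W, a, i):
    """Softmax-normalized attention of node i over its neighbors (one head)."""
    wh_i = matvec(W, h[i])
    scores = []
    for j in neighbours[i]:
        concat = wh_i + matvec(W, h[j])  # list concatenation: [W h_i || W h_j]
        scores.append(leaky_relu(sum(ak * ck for ak, ck in zip(a, concat))))
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbours[i], exps)}
```

The coefficients of a node always sum to 1, so they act as weights when the transformed neighbor features are aggregated; with several heads, the aggregated vectors are concatenated in hidden layers and averaged in the last layer, as described above.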
The concept to be calculated in the ontology is e, and its parent node and sibling nodes are its neighbors, so we adopt a two-layer GAT model, as shown in Fig. 5. The input is the entity initialization vector \(\varvec{h}_{1}\) of \(O_s\) and \(O_t\). The output is the structure embedding vector \(\varvec{SE}\) of each entity, fused with two levels of parent-child nodes and sibling nodes. To make the final embedding vector represent entity information and ontology structure information as fully as possible, we use a set of mapped entity pairs as the training dataset to train the GAT model and obtain the best embedding vector by minimizing the loss function described in Eq. (10). The input size of the first GAT layer is 600 and the output size is 300; the layer contains 6 attention heads and uses the LeakyReLU activation function with hyperparameter \(\alpha\) set to 0.2 and dropout set to 0.001. The input size of the second GAT layer is 300 and the output size is 200, with the other configurations consistent with the first layer.
Where \(L_l\) represents the loss function from \(O_s\) to \(O_t\), \(L_r\) denotes the loss function from \(O_t\) to \(O_s\), and f signifies the similarity calculation function. In the paper, we adopt cosine similarity, as indicated in Eq. (3). \(f\left( h_r\left( e\right) ,h_r\left( v\right) \right)\) represents the cosine similarity of matched entity pairs. \(f\left( h_r\left( e^\prime \right) ,\ h_r\left( v^\prime \right) \right)\) represents the cosine similarity of randomly selected entity pairs that do not match the source entity. The loss function L is optimized by minimizing the entity mapping pair distance and maximizing the negative entity mapping pair distance. Stochastic gradient descent (SGD) is employed to minimize the above loss function. After the structure embedding vector \(\varvec{SE}\) is obtained, the cosine similarity of each pair of structure embedding vectors is calculated to obtain the structure similarity vector \(\varvec{S}\) based on the embedding vector. The Algorithm process is outlined in Algorithm 3.
Property-based similarity calculation
The fundamental idea of the property-based similarity calculation method is to measure the similarity of two concepts based on their common properties. The more properties two concepts have in common, the more similar they are considered. The traditional method uses the Jaccard coefficient to calculate the property similarity between concepts \(e_{is}\) and \(e_{jt}\), as shown in Eq. (11).
Where \(P_{is}\) represents the set of properties of concept \(e_{is}\), and \(P_{jt}\) denotes the set of properties of concept \(e_{jt}\). The numerator represents the number of common properties of \(e_{is}\) and \(e_{jt}\). The denominator represents the number of all properties of \(e_{is}\) and \(e_{jt}\). Nevertheless, this method only considers the textual information of properties, and completely ignores their semantic information. Consequently, researchers have improved on it. For example, Ref. [4] proposed to calculate the similarity of the property name, property data type, and property value separately, and to take the weighted sum of the three similarity values as the final result. However, this approach involves many design decisions and requires a lot of human intervention.
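The Jaccard coefficient described here, the size of the intersection of the two property sets divided by the size of their union, can be sketched in a few lines. The function name `jaccard`, the example property names, and the value returned for two empty sets are assumptions made for this example.

```python
def jaccard(props_s, props_t):
    """Shared properties divided by all distinct properties of the two concepts."""
    set_s, set_t = set(props_s), set(props_t)
    union = set_s | set_t
    if not union:
        return 0.0  # assumption: two concepts without properties score 0
    return len(set_s & set_t) / len(union)
```

For instance, `jaccard({"diameter", "power", "speed"}, {"power", "speed", "torque"})` gives 2 / 4 = 0.5. Because the comparison is by exact name, semantically equivalent properties with different names do not count as shared, which is the limitation noted above.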
Considering that we have already measured entity similarity from the perspectives of structure and concept, and that properties contain rich semantic information, we introduce a property similarity calculation method based on the property embedding vector (refer to the Vectorized representation of concepts and properties section) and the Jaccard coefficient, as shown in Eq. (12).
\(\varvec{pe}_{is}\) and \(\varvec{pe}_{jt}\) represent the property embedding vectors of source entity \(e_{is}\) and target entity \(e_{jt}\), respectively. If both entities contain data properties, the similarity \(\varvec{p}_{is,jt}\) is calculated using their property embedding vectors. If either entity lacks data properties, we use Eq. (11) to calculate the similarity between properties. Considering the value range of cosine similarity, we modify Eq. (11) so that its similarity calculation result lies in \([-1, 1]\). Ultimately, the property similarity vector \(\varvec{P}\) based on the embedding vector is obtained.
Finally, we concatenate the acquired similarity vectors \(\varvec{C}\), \(\varvec{D}\), \(\varvec{S}\), and \(\varvec{P}\) into a similarity matrix \(\varvec{M}\) by columns. Each row of \(\varvec{M}\) represents a training sample \(\varvec{x}_i\) within XY.
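The column-wise assembly can be sketched as follows, assuming the four similarity vectors are aligned so that position k in each vector refers to the same candidate entity pair. The function names `build_similarity_matrix` and `attach_labels` are hypothetical.

```python
def build_similarity_matrix(C, D, S, P):
    """Stack the four similarity vectors as columns; one row per entity pair."""
    if not (len(C) == len(D) == len(S) == len(P)):
        raise ValueError("all similarity vectors must cover the same entity pairs")
    return [list(row) for row in zip(C, D, S, P)]


def attach_labels(M, y):
    """Pair each feature row with its class label to form the training data."""
    if len(M) != len(y):
        raise ValueError("one label per entity pair is required")
    return list(zip(M, y))
```

Each row is then a four-dimensional feature vector (embedding-based concept similarity, semantic-distance concept similarity, structure similarity, property similarity) for one candidate pair.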
Ontology mapping
SVM theory
SVM is a model for data classification, pattern recognition, and regression analysis within the supervised learning paradigm. It has a robust mathematical foundation and theoretical support, and was formally introduced in 1995 [30]. Consider the data set \(D=\{(x_1,y_1),(x_2,y_2),\ldots ,(x_n,y_n)\}\), where \(x_i\) denotes an input vector and \(y_i\) represents its category. The basic idea of binary classification by SVM is to find an optimal hyperplane that divides the data set. For a linearly separable data set, the hyperplane can be represented as illustrated in Eq. (13).
where \(\varvec{\omega }\) represents the weight vector and b represents the bias. Finding the hyperplane can be reduced to solving a quadratic programming problem in the original space, as shown in Eq. (14).
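For reference, the textbook soft-margin primal problem, which the surrounding description of the relaxation variables and the penalty factor appears to follow, can be written as below. The exact notation of the paper's own Eq. (14) is an assumption here.

\[
\min _{\varvec{\omega }, b, \xi } \ \frac{1}{2}\Vert \varvec{\omega }\Vert ^2 + C\sum _{i=1}^{n}\xi _i \qquad \text {s.t.} \quad y_i\left( \varvec{\omega }^T\varvec{x}_i + b\right) \ge 1 - \xi _i,\quad \xi _i \ge 0,\quad i = 1,\ldots ,n
\]

The first term maximizes the margin, while the second term charges every sample that violates the margin in proportion to its violation.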
\(\xi _i\) is the relaxation (slack) variable, indicating the extent to which each sample violates the margin constraint. C is the penalty factor. Equation (14) constitutes a convex quadratic programming problem, which is transformed into a dual problem by the Lagrange multiplier method to reduce the computational complexity, as depicted in Eq. (15).
We can determine \(\alpha\) according to Eq. (15), and subsequently solve for \(\varvec{\omega }\) and b to get the model. Nevertheless, the majority of sample data in reality are not linearly separable. Hence, we can map the sample data from the original space to a higher-dimensional feature space to make them linearly separable, as illustrated in Eq. (16).
Directly calculating \(\phi (\varvec{x}_i)^T\phi (\varvec{x}_j)\) is very difficult, so a function \(\kappa\) is introduced that evaluates this inner product directly from the original inputs, as shown in Eq. (17).
These functions are referred to as kernel functions, and common examples are linear kernel, polynomial kernel, Laplacian kernel, and Gaussian kernel. The widely adopted Gaussian kernel is shown in Eq. (18).
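A minimal sketch of the Gaussian kernel in its common bandwidth form, \(\exp (-\Vert \varvec{x}-\varvec{y}\Vert ^2 / 2\sigma ^2)\), is shown below. Whether the paper parameterizes the kernel by \(\sigma\) or by an equivalent \(\gamma\) is not recoverable from the text, so the `sigma` parameter is an assumption.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: decays with the squared distance between x and y."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

Identical inputs give a kernel value of 1, and the value decays towards 0 as the inputs move apart; a smaller `sigma` makes the decay sharper and the resulting decision boundary more local.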
The optimal hyperplane model is provided in Eq. (19).
Mapping discovery based on cost-sensitive SVM
The application of SVM to data classification relies on the assumption that the number of samples in different categories is similar. If the difference in the amount of sample data between categories is too large, it will significantly impair the effect of the model. For example, consider a data set with 998 negative samples and only 2 positive samples. The model needs only to predict every sample to be negative to achieve \(99.8\%\) accuracy, yet it is useless because it fails to identify any positive samples. The same challenge exists in this paper. For an entity in the source ontology, there is only one entity in the target ontology with a mapping relationship. Therefore, there is only one positive sample, and all the others are negative samples. This problem is called the sample imbalance problem.
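The numerical example can be checked with a short sketch. The helper `accuracy_and_recall` is a hypothetical name; labels are encoded as 1 for a positive sample and -1 for a negative one.

```python
def accuracy_and_recall(y_true, y_pred):
    """Overall accuracy and recall on the positive class (label 1)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    positives = sum(1 for t in y_true if t == 1)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    accuracy = correct / len(y_true)
    recall = true_positives / positives if positives else 0.0
    return accuracy, recall


# 2 positive and 998 negative samples; the classifier predicts "negative" for all.
y_true = [1] * 2 + [-1] * 998
y_pred = [-1] * 1000
accuracy, recall = accuracy_and_recall(y_true, y_pred)
```

Here `accuracy` is 0.998 while `recall` is 0.0, which is why recall and F1 score, rather than accuracy, are the informative metrics for this task.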
There are three strategies to solve the problem of sample imbalance: oversampling, undersampling, and cost sensitivity [27]. Oversampling balances the sample distribution by creating additional minority-class samples. The widely used methods are random duplication of minority samples and SMOTE [31]. Undersampling balances the sample distribution by discarding majority-class samples. The common method is random undersampling (RUS). These two methods address sample imbalance simply, at the data level. Nevertheless, they change the original sample distribution. The cost-sensitive method addresses sample imbalance at the algorithm level. The problem arises because a conventional classification algorithm is designed to minimize the overall error under the assumption that all misclassifications have the same cost. Consequently, when the categories in the sample data set are not balanced, the model tends to learn the categories with more samples, potentially ignoring the categories with fewer samples.
In SVM, different penalty factors \(C_1\) and \(C_{-1}\) are introduced for the slack variables of the positive and negative samples during training [32]. The original SVM model is thereby transformed into a cost-sensitive SVM model, as depicted in Eq. (20).
The paper adopts the cost-sensitive SVM model shown in Eq. (20). The input is the similarity matrix \(\varvec{M}\). When the output is \(1\), entities \(e_{is}\) and \(e_{jt}\) have a matching relationship; conversely, when the output is \(-1\), there is no matching relationship. The penalty factors \(C_1\) and \(C_{-1}\) are calculated as shown in Eq. (21), where \(n_{samples}\) denotes the number of training samples, \({class}_{nums}\) is the number of categories, \({pos}_{nums}\) is the number of positive samples, and \({neg}_{nums}\) is the number of negative samples.
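Eqs. (20) and (21) are not reproduced in this text. The sketch below therefore assumes the widely used "balanced" heuristic, which is built from the same four quantities named above (each class penalty is the number of samples divided by the number of classes times that class's count), and the standard class-weighted hinge penalty on the slack variables. Both function names and the `base_c` parameter are hypothetical, and the formulas may differ from the authors' exact definitions.

```python
def balanced_penalties(pos_nums, neg_nums, base_c=1.0, class_nums=2):
    """Assumed form: C_class = base_c * n_samples / (class_nums * class_count)."""
    if pos_nums <= 0 or neg_nums <= 0:
        raise ValueError("both classes need at least one sample")
    n_samples = pos_nums + neg_nums
    c_pos = base_c * n_samples / (class_nums * pos_nums)
    c_neg = base_c * n_samples / (class_nums * neg_nums)
    return c_pos, c_neg


def weighted_hinge_penalty(scores, labels, c_pos, c_neg):
    """Sum of C_y * max(0, 1 - y * f(x)): the slack term of a cost-sensitive SVM."""
    total = 0.0
    for score, y in zip(scores, labels):
        slack = max(0.0, 1.0 - y * score)
        total += (c_pos if y == 1 else c_neg) * slack
    return total


# The 998-negative / 2-positive example from the text.
c_pos, c_neg = balanced_penalties(pos_nums=2, neg_nums=998)

# One positive inside the margin (slack 0.5), one negative classified safely.
penalty = weighted_hinge_penalty([0.5, -2.0], [1, -1], c_pos=250.0, c_neg=0.5)
```

Under this assumption a misclassified positive sample costs 499 times as much as a misclassified negative one, so the optimizer can no longer reach a low objective by labeling everything negative.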
Experiments
To test the performance of the proposed method, the Benchmark dataset provided by OAEI serves as the test dataset. The source ontology of this dataset includes 33 concepts, 59 properties, and 96 instances. The dataset is divided into three classes, numbered 1XX, 2XX, and 3XX. Here, 101 is the original ontology with complete data and is used as the source ontology, while 103 and 104 are the source ontology described in different languages; 2XX are target ontologies obtained by changing some features of the source ontology; and 3XX are ontologies describing real scenarios.
Moreover, to verify the performance of the proposed method in a professional field, we constructed the shield machine dataset. This dataset is derived from the model parameters of shield machine components used in three different construction sources. During the operation of the shield machine, the operating status of each component, such as cutting tools, cutterheads, motors, and bearings, must be monitored and recorded in real time. Unifying the description and recording of shield machine components from different construction sources is an important foundation for achieving the intelligence, autonomy, and efficiency of shield machines. We construct the shield machine equipment ontology, which abstracts each component and system of the shield machine into an ontology model. The source ontology is a global ontology built by integrating various types of shield machines. The target ontologies are the earth pressure balance shield machine ontology and the mud-water balance shield machine ontology.
Evaluation criteria
This paper employs Precision, Recall, and F1 value to evaluate the performance of ontology mapping methods, where the meaning of each symbol is as follows: TP is the number of positive samples correctly predicted as positive, FP is the number of negative samples predicted as positive, and FN is the number of positive samples predicted as negative.
Precision is calculated as shown in Eq. (22a) and represents the proportion of correct matching results among all matching results returned by the algorithm. Recall is calculated as shown in Eq. (22b) and represents the proportion of the original correct matching results that the algorithm finds. Each of these two criteria is one-sided on its own, and the F1 value measures the algorithm performance more comprehensively, as illustrated in Eq. (22c).
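As a minimal sketch of these criteria, the function below uses the standard definitions, Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 = 2 · Precision · Recall / (Precision + Recall), where FP counts negatives predicted as positive and FN counts positives predicted as negative. The function name and the toy label lists are hypothetical, and returning 0.0 for an undefined ratio is a convention chosen here.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy predictions: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1, -1, -1]
p, r, f1 = precision_recall_f1(y_true, y_pred)

# The degenerate classifier from the imbalance example: everything negative.
imb_true = [1] * 2 + [-1] * 998
imb_pred = [-1] * 1000
accuracy = sum(t == q for t, q in zip(imb_true, imb_pred)) / len(imb_true)
imb_p, imb_r, imb_f1 = precision_recall_f1(imb_true, imb_pred)
```

The second case shows why accuracy alone is misleading on imbalanced data: the all-negative classifier reaches 99.8% accuracy while its recall and F1 value are both zero.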
Experimental results and analysis
According to the proposed method and Fig. 1, the specific experimental process is as follows:

1. Preprocess. Taking 101 as the source ontology and the others as the target ontologies, the concept names, data properties, and structural relations of each pair of ontologies to be mapped are extracted.
2. Vector embedding. The concept names and data properties are input into the BERT model to obtain the concept embedding vectors and property embedding vectors. The structural relations are input into the GAT model to obtain the structural embedding vectors.
3. Similarity calculation. Based on the vectors obtained in Step 2 and the similarity calculation method in the "Multi-strategy similarity calculation" section, the multi-strategy similarity vectors are obtained and spliced into the similarity matrix \(\varvec{M}\).
4. Entity matching. The similarity matrix \(\varvec{M}\) is partitioned at a ratio of 7:3, where 70% is used as the training set of the SVM and 30% serves as the testing set. Model training and matching relationship prediction are then carried out. The penalty parameter C of the SVM is 20, the kernel function is the radial basis function, and the number of training iterations is 1000.
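The entity matching step can be sketched with scikit-learn under several assumptions that are not confirmed by the paper: the similarity matrix is replaced by synthetic data with three columns (concept, property, and structure similarity), the split is stratified, the cost-sensitive weighting is approximated by `class_weight="balanced"`, and the reported 1000 training iterations are interpreted as `max_iter=1000`. It is an illustration of the setup, not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the similarity matrix M (assumed layout): one row per
# candidate entity pair, columns = concept, property, structure similarity.
n_pos, n_neg = 40, 760
pos_rows = rng.uniform(0.7, 1.0, size=(n_pos, 3))   # matching pairs
neg_rows = rng.uniform(0.0, 0.6, size=(n_neg, 3))   # non-matching pairs
M = np.vstack([pos_rows, neg_rows])
y = np.concatenate([np.ones(n_pos, dtype=int), -np.ones(n_neg, dtype=int)])

# 70% training, 30% testing; stratification keeps the rare positives in both.
M_train, M_test, y_train, y_test = train_test_split(
    M, y, test_size=0.3, stratify=y, random_state=0
)

# C=20 and the RBF kernel follow the text; class_weight="balanced" scales C
# per class by n_samples / (n_classes * class_count).
model = SVC(C=20, kernel="rbf", class_weight="balanced", max_iter=1000)
model.fit(M_train, y_train)

# +1 means the pair is predicted to match, -1 means no match.
y_pred = model.predict(M_test)
```

With `class_weight="balanced"`, scikit-learn multiplies `C` by a per-class weight, which plays the role of the two penalty factors \(C_1\) and \(C_{-1}\) described earlier.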
In addition, AML, Lily, LogMap, and GLUE were used as baseline methods for comparison. Five rounds of experiments were conducted, and the average results are reported. The experimental results on the Benchmark dataset are shown in Table 3 and Fig. 6. The experimental results of the proposed method on the Benchmark and shield machine datasets are shown in Table 4.
The experimental results show that our method outperforms the other systems in recall and F1 value. On the Benchmark dataset, the average precision is 98.0%, the average recall is 88.0%, and the average F1 value is 92.0%, demonstrating the effectiveness of the proposed method on public datasets. On the shield machine dataset, the average precision is 99.0%, the average recall is 99.0%, and the average F1 value is 99.0%, demonstrating the effectiveness of the proposed method on professional datasets.
Furthermore, the experimental data show that the precision of the proposed method is lower than that of AML and Lily. AML uses background knowledge to assist the translation and matching of ontologies, which is equivalent to pretraining before entity matching. In contrast, our proposed method does not directly use any knowledge base for training and prediction, so the precision of AML is higher than that of our method. Lily uses only the structure-based similarity calculation method to achieve entity matching, which reduces the interference of erroneous information about concepts and properties. Our proposed method integrates multiple kinds of information, so it is more susceptible to such interference, and its precision is slightly lower than that of Lily. Nevertheless, the recall and F1 value of our proposed method are much higher than those of the other systems. Compared with Lily, we calculate entity similarity from multiple perspectives, including concepts, properties, and structures. Compared with AML, GLUE, and LogMap, we employ a cost-sensitive SVM to combine the similarity values from multiple perspectives, and the results indicate that this approach is feasible. They also suggest that the proposed method has good robustness. The proposed method is also validated on the shield machine dataset.
Conclusion and future work
In this paper, we propose a new ontology mapping method. Based on the traditional similarity calculation method, the BERT model and the GAT model are employed to calculate the embedding vectors of the entity concept, property, and structure. Then the cosine similarity between vectors is calculated to obtain the entity similarity values of multiple strategies. We take the similarity matrix \(\varvec{M}\) constructed from the entity similarity values of multiple strategies as the input. A cost-sensitive SVM model is employed to predict the entity matching relationship. Compared with other methods, the proposed method does not need manual intervention, greatly improving the automation and robustness of the model.
Extensive experiments on public and specialized-domain datasets showed that our model improves recall and F1 value by 5.0% and 3.0%, respectively, over the best-performing methods. Consistent results have also been achieved on professional domain datasets. Therefore, this method is applicable not only to general ontologies but also to domain ontologies.
In future work, we will focus on the following four points. Firstly, this paper uses only the data property information and does not consider the object property information. We plan to add the object property in the future. Secondly, precision should be further improved without affecting F1 and recall. Thirdly, the robustness of the proposed method needs to be tested on more domain ontologies. Lastly, with the increase of data dimensions, ontology mapping faces significant challenges in processing high-dimensional data. Future research can focus on how to construct effective ontology mapping models in high-dimensional spaces and apply them to practical data analysis.
Availability of data and materials
Not applicable.
Data availability
Data is provided within the manuscript.
References
Benedikt M, Grau BC, Kostylev EV (2018) Logical foundations of information disclosure in ontology-based data integration. Artif Intell 262(SEP.):52–95
Lin D (1998) An information-theoretic definition of similarity. Icml 98:296–304
Pantel P, Lin D (2002) Discovering word senses from text. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM/Edmonton, Alberta, Canada, pp 613–619
Zhang XK, Zhang Q, Gao XY (2017) Ontology-based comprehensive weighted similarity algorithm research. Computer Science and Technology: Proceedings of the International Conference (CST2016), World Scientific, pp 1035–1043
Spohr D, Hollink L, Cimiano P (2011) A machine learning approach to multilingual and cross-lingual ontology matching. Springer, Berlin, Heidelberg
Faria D, Pesquita C, Santos E, Palmonari M, Cruz IF, Couto FM (2013) The AgreementMakerLight ontology matching system. In: On the Move to Meaningful Internet Systems: OTM 2013 Conferences: Confederated International Conferences: CoopIS, DOA-Trusted Cloud, and ODBASE 2013, Graz, Austria, September 9–13, 2013. Proceedings. Springer, Heidelberg, Berlin, pp 527–541
Wang P, Zhou Y, Xu B (2011) Matching large ontologies based on reduction anchors. In: IJCAI. pp 2343–2348
Jimenez-Ruiz E, Grau BC (2011) Logmap: Logic-based and scalable ontology matching. In: SEMANTIC WEB - ISWC 2011, PT I. vol 7031. Springer/Bonn, Germany, pp 273–288
Wang A, Singh A, Michael J, Hill F, Levy O, Bowman SR (2018) Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461
Guarino N, Oberle D, Staab S (2009) What is an ontology? Handbook on ontologies, Springer, pp 1–17
Studer R, Benjamins VR, Fensel D (1998) Knowledge engineering: Principles and methods. Data Knowl Eng 25(1–2):161–197
Guha R, McCool R, Miller E (2003) Semantic search. In: Proceedings of the 12th international conference on World Wide Web. ACM/Budapest, Hungary, pp 700–709
Cui W, Xiao Y, Wang H, Song Y, Hwang SW, Wang W (2019) Kbqa: Learning question answering over qa corpora and knowledge bases. arXiv preprint arXiv:1903.02419
Yang B, Mitchell T (2019) Leveraging knowledge bases in lstms for improving machine reading. arXiv preprint arXiv:1902.09091
Gruber TR (1995) Toward principles for the design of ontologies used for knowledge sharing? Int J Hum Comput Stud 43(5–6):907–928
Noy NF (2009) Ontology mapping. In: Handbook on ontologies. Springer, Heidelberg, Berlin, pp 573–590
Qian P (1999) Text similarity computing based on attribute theory. Chin J Comput 22:651–655
Cao X, Zhao J, Zhang F, Wu G, Wang H (2020) A method of computing conceptual semantic similarity based on part-whole relationship. J Phys Conf Ser 1544:012046
Gao W, Farahani MR, Aslam A, Hosamani S (2017) Distance learning techniques for ontology similarity measuring and ontology mapping. Clust Comput J Netw Softw Tools Appl 20:959–968
Devlin J (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Sarzynska-Wawer J, Wawer A, Pawlak A, Szymanowska J, Okruszek L (2021) Detecting formal thought disorder by deep contextualized word representations. Psychiatry Res 304(10):114135
Radford A, Narasimhan K (2018) Improving language understanding by generative pre-training
Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007
Bouquet P, Euzenat J, Franconi E, Serafini L, Tessaris S (2004) Specification of a common framework for characterizing alignment. static.digns.com
Hirst G, St-Onge D et al (1998) Lexical chains as representations of context for the detection and correction of malapropisms. WordNet Electron Lexical Database 305:305–332
Yang D, Powers DM (2005) Measuring semantic similarity in the taxonomy of WordNet. Australian Computer Society, Melbourne, Australia
Seco N, Veale T, Hayes J (2004) An intrinsic information content metric for semantic similarity in wordnet. In: ECAI 2004: 16TH European Conference On Artificial Intelligence, Proceedings, vol 110. DBLP/Valencia, Spain, pp 1089–1090
Goble A, Stevens JR, Brass CA, Lord P (2003) Investigating semantic similarity measures across the gene ontology: the relationship between sequence and annotation. Pergamon Press, Oxford, UK
Miller George A (1995) Wordnet: a lexical database for english. Commun ACM 38(11):39–41
Dong Z, Dong Q (2003) Hownet - a hybrid language and knowledge resource. In: 2003 International Conference on Natural Language Processing and Knowledge Engineering Proceedings. IEEE/Beijing, China
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-imbalanced data: Review of methods and applications. Expert Syst Appl 73:220–239
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Grant No.2020YFB1712101).
Funding
Not applicable.
Author information
Contributions
Fan Zhang and Peichen Yang: Conceptualization, Methodology, Investigation, Data curation, Writing - original draft, Writing - review & editing. Rongyang Li and Sha Li: Validation, Supervision. Jianguo Ding and Jiabo Xu: Project administration, Supervision. Huansheng Ning: Writing - review & editing, Project administration, Supervision, Funding acquisition. All authors reviewed the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, F., Yang, P., Li, R. et al. A multi-strategy ontology mapping method based on cost-sensitive SVM. J Cloud Comp 13, 144 (2024). https://doi.org/10.1186/s13677-024-00708-7
DOI: https://doi.org/10.1186/s13677-024-00708-7