

Algorithm selection using edge ML and case-based reasoning

Abstract

In practical data mining, a wide range of classification algorithms is employed for prediction tasks. However, selecting the best algorithm is a challenging task for machine learning practitioners and experts, primarily due to the inherent variability in the characteristics of classification problems, referred to as datasets, and the unpredictable performance of these algorithms. Dataset characteristics are quantified in terms of meta-features, while classifier performance is evaluated using various performance metrics. Assessing classifiers empirically across multiple classification datasets, while considering multiple performance metrics, is computationally expensive and time-consuming, which makes the pursuit of the optimal algorithm an obstacle in practice. Furthermore, the scarcity of sufficient training data, in terms of both the number of datasets and the feature space described by meta-feature perspectives, adds further complexity to algorithm selection using classical machine learning methods. This research paper presents an integrated framework called eML-CBR that combines edge ML and case-based reasoning methodologies to accurately address the algorithm selection problem. It adapts a multi-level, multi-view case-based reasoning methodology, considering data from diverse feature dimensions and algorithms from multiple performance aspects, and distributes computation to both edge nodes and centralized cloud nodes. On the edge, the first-level reasoning employs machine learning methods to recommend a family of classification algorithms, while at the second level it recommends a list of the top-k algorithms within that family. This list is further refined by an algorithm conflict resolver module. The eML-CBR framework offers a suite of contributions, including integrated algorithm selection, multi-view meta-feature extraction, innovative performance criteria, improved algorithm recommendation, data scarcity mitigation through incremental learning, and an open-source CBR module. The CBR module, trained on 100 datasets and tested with 52 datasets using 9 decision tree algorithms, achieved an accuracy of 94% for correct classifier recommendations within the top k=3 algorithms, making it highly suitable for practical classification applications.

Introduction

In data mining, prediction problems are commonly and frequently encountered, and they are effectively addressed through the utilization of machine learning algorithms. Machine learning experts have devised a large number of algorithms tailored for data mining applications [1]. Similarly, to improve classification accuracy, especially on difficult problems, researchers are developing new and innovative methods for combining and designing classifiers, such as ensembles or methods that exploit the intrinsic structure of the classes [2]. This makes the algorithm space vast; it is categorized into distinct families, encompassing decision trees, Bayes classifiers, rule-based learners, meta-learners, multi-instance classifiers, function classifiers, and lazy classifiers [3]. Among these categories, decision tree-based algorithms are widely acknowledged and employed as a prominent data classification technique due to their efficiency, user-friendly nature, and comparatively modest complexity [4]. In various applications, including medicine, finance, public policy, and more, they serve as valuable tools for facilitating decision-making [5].

Picking an appropriate classification algorithm (a decision tree in this case) for a given dataset is a difficult and challenging task not only for end users but also for machine learning practitioners [6]. This is because different algorithms behave differently on the same problem due to the inherent characteristics of the data. For example, some algorithms are better suited for linear data, while others are better suited for non-linear data. Additionally, some algorithms are more computationally efficient than others [7].

One approach involves empirically evaluating all available classifiers for a given classification problem and selecting the one with the best results as the winner [7, 8]. However, this method faces the challenge of exhaustive search, leading to computational complexity [9]. Several studies have demonstrated that there is no universal classification algorithm suitable for all classification problems. For instance, if the same classifier is applied to a different problem, it may yield suboptimal results, thus affirming the established "No Free Lunch" theorem [10]. The reason is that the results of classifiers depend on the specific characteristics of each problem, which motivates treating classifier selection as a meta-learning problem [11]. In this approach, the meta-characteristics of classification problems are computed, and classifiers are evaluated based on their performance on these problems. Subsequently, a mapping is learned between problem features and the classifier(s) that exhibit the best performance, enabling the recommendation of an appropriate classifier [12]. Hence, the task of automatic algorithm selection through meta-learning can be represented as a four-step process model [13]. This model encompasses the following stages: classifier characterization, where classifier performance is assessed; problem characterization, involving the extraction of inherent meta-characteristics of the problem; the mapping and learning of problem meta-characteristics against classifier performance; and ultimately the recommendation of suitable classifier(s) for a new problem.

Classifier characterization represents the user's objective in developing an application, such as accuracy or computational efficiency. It can be assessed using performance evaluation metrics [14]. The research community has approached classifier characterization from both uni-metric and multi-metric perspectives, often referred to as meta-target. Problem characterization involves extracting the underlying data behaviors that reflect its unseen nature, measured through meta-features or meta-characteristics. Various types of meta-characteristics have been identified, including statistical, information theoretic, model-based, land-marking, and complexity [15, 16].

Recently, Song et al. [17] used a new dataset characterization method for computing dataset features and evaluated the performance of seventeen classification algorithms over 84 publicly available UCI datasets [18]. Mapping meta-characteristics to classifier performance is the process of aligning each problem with the appropriate classifier. The objective of this process is to cast algorithm selection as a machine learning problem in which the meta-characteristics form the feature vector and the label(s) of the best-performing classifier(s) form the class label. Identifying the class label is a challenging task, and researchers have approached the issue using various techniques, such as multiple comparison methods. As a result of these methods, some problems have more than one applicable classifier as the class label. This makes algorithm selection both a single-label and a multi-label problem, and the research community has approached it using single-label learning and multi-label learning. For learning the association or mapping function between problem meta-characteristics and class label(s), researchers have used different approaches that can broadly be categorized as decision tree-based learners (e.g., C4.5 [19]), rule-based learners [20], linear regression [21], and instance-based learners (e.g., k-NN [17, 22]). Finally, for the selection of appropriate classifier(s) on the fly, researchers have used different approaches.

In the realm of classification algorithm selection, meta-learning has been employed extensively. EFFECT [23] offers an interpretable meta-learning framework to explain recommendation results and algorithm performance in specific business scenarios. AMLBID [24] automates algorithm selection for analyzing industrial data, while [15, 16] utilizes meta-learning to assess data characteristics and recommend algorithms across various datasets.

A summary of the existing research on automatic algorithm selection reveals several limitations. One significant challenge is the computational complexity involved in empirically assessing classifiers across multiple datasets and employing various performance metrics. This complexity can make the algorithm selection process time-consuming and resource-intensive, particularly for large datasets. Another limitation stems from the scarcity of sufficient training data required for effective machine learning, including the representation of diverse datasets and their associated meta-features. Additionally, the unpredictable performance of classification algorithms on different datasets adds an element of uncertainty to the selection process. Furthermore, the variability in dataset characteristics, such as data distribution and feature space complexity, makes finding a universally optimal algorithm difficult. Lastly, determining appropriate performance metrics for evaluation can be a challenging task, as different datasets may necessitate different metrics for meaningful assessment. These limitations underscore the need for innovative approaches such as meta-learning to address the complexities inherent in algorithm selection for data mining.

To address the issues highlighted earlier on the subject problem of algorithm selection, this research paper proposes an integrated framework called eML-CBR that combines edge machine learning (ML) and case-based reasoning (CBR) methodologies. It adapts a multi-level, multi-view CBR methodology that considers data from diverse feature dimensions and algorithms from multiple performance aspects. The computation is distributed to both cloud edges and centralized nodes. On the edge, the first-level reasoning employs machine learning methods to recommend a family of classification algorithms, while at the second level, it recommends a list of the top-k algorithms within that family. This list is further refined by an algorithm conflict resolver module.

The key contributions of this research work, which are set to transform the field, are as follows:

  • Pioneering the design and development of an integrated algorithm selection framework that seamlessly integrates edge-ML with CBR methodology.

  • Thoroughly exploring and extracting multi-view meta-features from datasets to provide a deeper insight into their intricacies.

  • Devising a new multi-objective criterion with weighted summation to accurately assess algorithm performance.

  • Enhancing algorithm recommendation significantly through the integration of the algorithm conflict resolver (ACR).

  • Mitigating the persistent challenge of data scarcity by introducing the concept of incremental evolutionary learning using the CBR methodology.

  • Releasing the CBR module as a stand-alone open-source software to the research community in the field.

The remainder of this paper is structured as follows: Section "Edge ML and Case-based Reasoning for Algorithm Selection" provides an in-depth overview of Edge ML and Case-based Reasoning for Algorithm Selection. In Section "Multi-views Case-based Reasoning for Algorithm Selection", we delve into the Multi-views Case-based Reasoning methodology applied to Algorithm Selection. Section "Implementation, experiments and evaluation" is dedicated to the implementation, experiments, and evaluation of our proposed methodology. Finally, in Section "Conclusion and Future Work", we conclude our work and outline potential avenues for future research.

Edge ML and case-based reasoning for algorithm selection

Definition of algorithm selection problem

Based on the Rice model [25], the algorithm selection problem can be defined as follows: given a problem p as input and a set of candidate machine learning algorithms A that can learn the same p with varying performance levels Y, the objective of an algorithm selection method is to find and select an algorithm a ∈ A that can learn p with the best possible performance. Now, we introduce the notation that will be used throughout this paper. Let P denote a set of historical problems (i.e., classification datasets in this case) with F as the feature vector representing the meta-features of each problem p ∈ P, and let A be a set of classification algorithms capable of solving P with some performance level Y.
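With this notation in place, the selection objective can be stated compactly; the display below is our restatement of the Rice formulation using the symbols defined above, not an equation reproduced from [25]:

$$a^{*}(p)=\underset{a\in A}{\mathrm{arg\,max}}\;Y\left(a,F(p)\right)$$

that is, for a new problem p with meta-feature vector F(p), the algorithm a* that maximizes the expected performance Y is selected.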

Algorithm selection as an edge ML problem

Edge machine learning, abbreviated as edge ML, involves executing machine learning algorithms on edge devices located close to the data source rather than on remote cloud servers, enabling faster decision-making. In the proposed study, the algorithm selection problem is divided into two levels, taking place at both local and remote nodes within the distributed system. At the first level, we select the appropriate family of ML algorithms, including probabilistic, decision tree, function-based, lazy learner, meta-learner, and rule-based families. At the second level, we choose the appropriate ML algorithm on the remote cloud server. At both levels, we employ a case-based reasoning methodology, rendering the algorithm selection process a multi-level reasoning process.

Overview of the proposed Edge ML computing environment is shown in Fig. 1.

Fig. 1 Overview of the proposed edge ML computing environment

The proposed methodology for edge ML computing in ML algorithm selection is supported by a hierarchical machine learning approach. In this approach, the first layer of the hierarchy recommends a family of algorithms, and the second layer of the hierarchy selects the appropriate algorithm within the chosen family. In this framework, both the edge device and the cloud server act as computing nodes, both locally and remotely. The paper will now focus on the case-based reasoning methodology used on both the edge and cloud devices.
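As a minimal illustration of this hierarchy, the sketch below separates the two reasoning levels behind simple interfaces. The names (FamilyRecommender, AlgorithmRecommender, EdgeMLSelector) and the example labels are ours for illustration only; they are not part of the released implementation.

```java
import java.util.List;
import java.util.Map;

/** Level-1 reasoner (edge node): recommends an algorithm family from meta-features. */
interface FamilyRecommender {
    String recommendFamily(Map<String, Double> metaFeatures);
}

/** Level-2 reasoner (cloud node): recommends the top-k algorithms within a family. */
interface AlgorithmRecommender {
    List<String> recommendTopK(String family, Map<String, Double> metaFeatures, int k);
}

/** Hierarchical selector: family chosen on the edge, concrete algorithm in the cloud. */
public class EdgeMLSelector {
    private final FamilyRecommender edge;
    private final AlgorithmRecommender cloud;

    public EdgeMLSelector(FamilyRecommender edge, AlgorithmRecommender cloud) {
        this.edge = edge;
        this.cloud = cloud;
    }

    public List<String> select(Map<String, Double> metaFeatures, int k) {
        String family = edge.recommendFamily(metaFeatures);   // e.g. "decision-tree"
        return cloud.recommendTopK(family, metaFeatures, k);  // e.g. top-3 algorithms in that family
    }
}
```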

Multi-views case-based reasoning for algorithm selection

As previously discussed, algorithm selection is an edge ML computing problem wherein case-based reasoning (CBR) is employed at both the local device and the centralized cloud server. This process encompasses three key modules: (i) datasets and algorithms characterization, (ii) model creation, and (iii) algorithm recommendation. CBR utilizes diverse families of data characteristics, rendering it a multi-view CBR approach. An architectural depiction of the proposed framework, inspired by the Rice framework [12], is presented in Fig. 2.

Fig. 2
figure 2

Case-based reasoning for algorithm selection

The following subsections explain the workings of each module, along with its technical details and the methods used.

Datasets and Algorithms Characterization (DAC)

The datasets and algorithms characterization (DAC) module plays a crucial role by extracting meta-features (denoted as f ∈ F) for each dataset (d) and aligning them with the most suitable algorithm (a ∈ A). This alignment process prepares instances for the training dataset, the Case-Base in this case, facilitating the subsequent stages of algorithm selection.

Multi-view meta-features extraction

In this study, we propose a new approach that involves the extraction of multi-view meta-features from the archived datasets, with each meta-feature belonging to a unique feature family. These families include general, basic statistical, advanced statistical, and information theoretic. The rationale behind selecting these four families of meta-features is that they represent a global view of different classification data types and can be computed in real time, supporting practical data mining applications such as algorithm selection. The general view includes simple measurements, computed for the entire dataset, offering a global perspective using aggregated values. The basic statistical view encompasses measurements related to dataset dimensionality and attribute ratios. Advanced statistical features provide valuable insights about a dataset, such as the distribution of numeric attributes, the balance between positive and negative instances, the accuracy of default classification, the presence of incomplete data, and the distinct values in nominal attributes. By examining these advanced statistical characteristics, analysts can gain deeper knowledge and make informed decisions when working with datasets, enabling more effective data mining and analysis for best algorithm selection. Similarly, as each dataset contains both continuous and symbolic data features, we also extract symbolic meta-features known as information-theoretic features to enhance algorithm selection. These features, based on entropy, measure data purity relative to class labels. This approach is unique compared to single-view approaches [17, 26] because it extracts diverse, multi-view meta-features from the datasets. This diverse set of meta-features enables the utilization of a multi-view learning approach in the classifier selection process and aligns with the fundamental concept of approaching the algorithm selection problem from multiple aspects, considering various factors and perspectives. This concept is illustrated in Fig. 3, and the characteristics considered are listed in Tables 1, 2, 3 and 4.

Fig. 3 Enriching algorithm selection with multi-view dataset representation

Table 1 General characteristics of dataset
Table 2 Basic statistical characteristics
Table 3 Advanced statistical characteristics
Table 4 Information theoretic characteristics

These meta-features are computed using the tools described in [14, 27], which are available on GitHub.
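The exact feature set is produced by that library; purely as a hedged illustration of what such measurements look like, the sketch below computes one general feature (instance count), one basic statistical feature (instance-to-attribute ratio), and one information-theoretic feature (class entropy) directly with the Weka API. The class and method names are ours, not those of the DC library.

```java
import java.util.HashMap;
import java.util.Map;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MetaFeatureSketch {

    /** Computes a small, illustrative subset of multi-view meta-features. */
    public static Map<String, Double> extract(Instances data) {
        Map<String, Double> mf = new HashMap<>();

        // General view: global size measurements of the dataset.
        mf.put("numInstances", (double) data.numInstances());
        mf.put("numAttributes", (double) data.numAttributes());
        mf.put("numClasses", (double) data.numClasses());

        // Basic statistical view: dimensionality ratio.
        mf.put("instancesPerAttribute",
                data.numInstances() / (double) data.numAttributes());

        // Information-theoretic view: entropy of the class distribution (in bits).
        double[] classCounts = new double[data.numClasses()];
        for (int i = 0; i < data.numInstances(); i++) {
            classCounts[(int) data.instance(i).classValue()]++;
        }
        double entropy = 0.0;
        for (double count : classCounts) {
            if (count > 0) {
                double p = count / data.numInstances();
                entropy -= p * (Math.log(p) / Math.log(2));
            }
        }
        mf.put("classEntropy", entropy);
        return mf;
    }

    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read(args[0]);      // path to an ARFF/CSV dataset
        data.setClassIndex(data.numAttributes() - 1);   // assume the last attribute is the class
        System.out.println(extract(data));
    }
}
```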

Machine learning algorithms characterization

In this section, we characterize the performance of the decision tree algorithms listed in Table 5. Our objective is to identify the best-performing algorithms for the dataset at hand. We assess performance using a multifaceted evaluation criterion, considering the weighted average F-score and standard deviation. Results for these criteria are obtained using the Weka experimenter environment [3], employing the default parameters for the algorithms and standardized 10×10-fold cross-validation. To determine the best-performing algorithm (a ∈ A) for a specific dataset (p ∈ P), we apply the PerformanceEval procedure (Algorithm 1), as illustrated in Fig. 4.

Table 5 Decision tree algorithms in the Weka environment
Fig. 4 Multi-criteria analysis for algorithm performance evaluation

Algorithm 1. PerformanceEval: a comprehensive performance evaluation approach
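Because Algorithm 1 is rendered as a figure, the following is only a minimal sketch of the kind of evaluation it describes: 10 repetitions of 10-fold cross-validation in Weka, combining the weighted-average F-score with its standard deviation into a single score. The specific combination rule (mean minus standard deviation) and the two candidate classifiers are assumptions for illustration, not the exact procedure of the figure or the full algorithm list of Table 5.

```java
import java.util.Random;
import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.REPTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PerformanceEvalSketch {

    /** Mean weighted F-score over 10x10-fold CV, penalized by its standard deviation. */
    static double score(Classifier base, Instances data) throws Exception {
        int repetitions = 10, folds = 10;
        double[] f = new double[repetitions];
        for (int r = 0; r < repetitions; r++) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(AbstractClassifier.makeCopy(base), data, folds, new Random(r));
            f[r] = eval.weightedFMeasure();
        }
        double mean = 0;
        for (double v : f) mean += v;
        mean /= repetitions;
        double var = 0;
        for (double v : f) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / repetitions);
        return mean - std;   // illustrative combination: prefer high and stable F-scores
    }

    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read(args[0]);
        data.setClassIndex(data.numAttributes() - 1);
        Classifier[] candidates = { new J48(), new REPTree() };   // example Weka decision trees
        for (Classifier c : candidates) {
            System.out.printf("%s -> %.4f%n", c.getClass().getSimpleName(), score(c, data));
        }
    }
}
```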

Model creation (Case-Base Creation)

Once the datasets and classifiers are characterized, as described in previous sections, the best classifier(s) are assigned to the set of meta-features (e.g., F→bestClassifier(s)) using a simple alignment function to produce a single training dataset or Case-Base. The mapping of features against classifiers forms resolved cases for a CBR process.

The rationale for building a Case-Base is that, although different machine learning algorithms could also serve as the predictive model, the small number of training instances makes conventional classifiers difficult to train reliably. To overcome this issue, we adapt the CBR model with some enhancements in the case-base creation and retrieval phases.

For case representation, we adopt a propositional case representation scheme [28], where a case is represented as a proposition in key-value pair format. In the proposed algorithm selection scenario, a case contains the data characteristics (i.e., the extracted meta-features) as the problem description and the name of the best algorithm as the solution description. A generic structure of the proposed Case-Base, using feature-vector representation, is shown in Table 6.

Table 6 Representation of algorithm selection case base

The meta-features 1-29, shown in Table 6, are the multiple views of data characteristics given in Tables 1, 2, 3 and 4. Similarly, the best-classifier (last column) is the label of one or more best decision tree classifiers from Table 5. The Case-Base contains 100 resolved cases, authored from 100 freely available classification datasets collected from the UCI [29] and OpenML [30] machine learning repositories. A subset of the datasets used for case-base creation is provided in Table 7, along with brief descriptions of their general characteristics.

Table 7 Subset of datasets for case-base creation with brief descriptions

In the proposed Case-Base, all the features are real numbers, so their data types are set to numeric.
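A minimal sketch of this propositional, key-value case representation is shown below; the class and field names are illustrative and differ from the released jColibri-based implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A resolved case: a problem description (29 numeric meta-features) plus a solution label. */
public class AlgorithmSelectionCase {
    private final Map<String, Double> metaFeatures = new LinkedHashMap<>();  // problem description
    private String bestClassifier;                                           // solution description

    public void putFeature(String name, double value) { metaFeatures.put(name, value); }
    public Map<String, Double> getFeatures() { return metaFeatures; }
    public String getBestClassifier() { return bestClassifier; }
    public void setBestClassifier(String label) { this.bestClassifier = label; }

    public static void main(String[] args) {
        AlgorithmSelectionCase c = new AlgorithmSelectionCase();
        c.putFeature("numInstances", 150);     // general view (illustrative value)
        c.putFeature("classEntropy", 1.58);    // information-theoretic view (illustrative value)
        c.setBestClassifier("J48");            // label of a decision tree classifier
        System.out.println(c.getFeatures() + " -> " + c.getBestClassifier());
    }
}
```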

Algorithm recommendation using CBR

Algorithm recommendation using CBR represents the online phase of algorithm selection, suggesting the top-k suitable algorithms to end users through the CBR cycle. The rationale for using the CBR methodology for algorithm recommendation, as opposed to state-of-the-art machine learning methods, is that algorithm recommendation is an estimation problem, at which CBR excels, rather than a discrete prediction produced by conventional machine learning methods. In this study, we enhance classical CBR with accurate similarity functions, multi-view feature extraction, and incremental learning to improve the algorithm selection process. The methodology of the CBR cycle follows the steps given below.

New case preparation - multi-view meta-features extraction

A Query Case (Q) is prepared from a given new dataset by extracting multi-view features with the help of a meta-feature extractor to form a feature vector. For this purpose, the same dataset characterization mechanism described in the offline phase is used. As different families of the features are extracted, a multi-view reasoning process is initiated.

CBR cycle – 4R

The CBR cycle comprises retrieve, reuse, revise and retain steps that are performed in sequential order as explained below.

In the retrieve step, similarity functions are defined to match the meta-features of the query dataset Q against the resolved cases R in the Case-Base, retrieving the top-k cases as suggested solutions. For individual meta-feature similarity matching, we use the local similarity function shown in Eq. 1. For matching a new case with the existing resolved cases in the Case-Base, we employ the global similarity function shown in Eq. 2.

$$\mathrm{Sim}_{l}\left(nC_{mf_i},\, eC_{mf_i}\right)=\mathrm{idealSim}_{mf_i}-\frac{d_l\left(nC_{mf_i},\, eC_{mf_i}\right)}{d_g\left(\mathrm{Max}_{mf_i},\, \mathrm{Min}_{mf_i}\right)}$$
(1)

where \(\mathrm{idealSim}_{mf_i}=1\) and \(d_g\left(\mathrm{Max}_{mf_i},\, \mathrm{Min}_{mf_i}\right)\) is the global interval (range) of values of each continuous-valued meta-feature. Similarly, \(nC_{mf_i}\) represents the meta-feature of the new case and \(eC_{mf_i}\) represents the corresponding meta-feature of an existing case.

$$\mathrm{Sim}_{g}\left(nC,\, eC\right)=\frac{\alpha_1\cdot \mathrm{Sim}_{l}\left(nC_{mf_1},\, eC_{mf_1}\right)+\dots+\alpha_n\cdot \mathrm{Sim}_{l}\left(nC_{mf_n},\, eC_{mf_n}\right)}{\alpha_1+\alpha_2+\dots+\alpha_n}$$
(2)

where \(\alpha_i\) is the weight of each meta-feature \(mf_i\) in the Case-Base; we assign an equal weight to every meta-feature, based on the assumption that all 29 meta-features are equally important for selecting the best algorithm.
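A direct sketch of Eqs. 1 and 2 is given below, assuming the global minimum and maximum of every meta-feature have already been computed over the Case-Base; the names and the toy values in main are illustrative only, not taken from the released software.

```java
public class SimilaritySketch {

    /** Eq. 1: local similarity of one meta-feature, 1 minus the range-normalized distance. */
    static double localSim(double nC, double eC, double min, double max) {
        double range = max - min;
        if (range == 0) return 1.0;                 // constant feature: identical by definition
        return 1.0 - Math.abs(nC - eC) / range;     // idealSim = 1
    }

    /** Eq. 2: global similarity as the weighted average of the local similarities. */
    static double globalSim(double[] newCase, double[] existingCase,
                            double[] min, double[] max, double[] weights) {
        double num = 0, den = 0;
        for (int i = 0; i < newCase.length; i++) {
            num += weights[i] * localSim(newCase[i], existingCase[i], min[i], max[i]);
            den += weights[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] q  = { 150, 4, 1.58 };             // query meta-features (illustrative)
        double[] c  = { 178, 13, 1.57 };            // one resolved case (illustrative)
        double[] lo = { 10, 2, 0.0 };               // global minima over the Case-Base
        double[] hi = { 10000, 300, 4.0 };          // global maxima over the Case-Base
        double[] w  = { 1, 1, 1 };                  // equal weights, as assumed in the paper
        System.out.printf("Sim_g = %.4f%n", globalSim(q, c, lo, hi, w));
    }
}
```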

Reuse

In the reuse step, the solution part, i.e., the label of the best algorithm, of the top-k similar cases is assigned to the problem description part of the new case as a suggested solution (the recommended algorithm in this case).

This process of retrieve and reuse is algorithmically presented in Algorithm 2.

Algorithm 2. Top-k algorithm selection using case-based reasoning

Descriptions of each of the procedures used in Algorithm 2 are given below.

CalculateSimilarityIntervals: This procedure loops through all meta-features, calculates the interval value, and defines the weight for each feature. The interval value is computed using \(d_g\left(\mathrm{Max}_{mf_i},\, \mathrm{Min}_{mf_i}\right)\), while the weight assigned to each meta-feature is the same, i.e., 1.

BuildNNConfig(I): This procedure builds the nearest-neighbour configuration. The tasks performed are: initialize NNConfig, set the global similarity function as per Eq. 2, map a local similarity function to each feature as per Eq. 1, set the weight for each feature (i.e., assign 1 to each feature in this case), and return NNConfig.

evaluateSimilarity(\(R, Q, S\)): evaluates the similarity of each case \(c_i \in R\) against the query case Q using the similarity functions mapped in the NN similarity configuration S, and returns a collection of retrieval results RR (the most similar cases).

selectTopK(RR, K): this procedure selects the top K most similar CBR cases from the collection of retrieval results RR.
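A compact, self-contained sketch of the evaluateSimilarity and selectTopK procedures is given below; the record types and the toy global similarity function passed in main are hypothetical stand-ins for the jColibri configuration used in the actual system.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;

public class RetrieveSketch {

    /** A resolved case: meta-feature vector plus the label of its best classifier. */
    record ResolvedCase(double[] metaFeatures, String bestClassifier) {}

    /** A retrieval result: a case together with its global similarity to the query. */
    record RetrievalResult(ResolvedCase c, double similarity) {}

    /** evaluateSimilarity: score every case in the Case-Base against the query. */
    static List<RetrievalResult> evaluateSimilarity(List<ResolvedCase> caseBase, double[] query,
            BiFunction<double[], double[], Double> globalSim) {
        List<RetrievalResult> rr = new ArrayList<>();
        for (ResolvedCase c : caseBase) {
            rr.add(new RetrievalResult(c, globalSim.apply(query, c.metaFeatures())));
        }
        return rr;
    }

    /** selectTopK: keep the k most similar cases; their solution labels are the suggestions. */
    static List<RetrievalResult> selectTopK(List<RetrievalResult> rr, int k) {
        rr.sort(Comparator.comparingDouble(RetrievalResult::similarity).reversed());
        return rr.subList(0, Math.min(k, rr.size()));
    }

    public static void main(String[] args) {
        List<ResolvedCase> caseBase = new ArrayList<>(List.of(
                new ResolvedCase(new double[] { 150, 4 }, "J48"),
                new ResolvedCase(new double[] { 20000, 300 }, "REPTree")));
        double[] query = { 178, 13 };
        // Toy global similarity: inverse of a scaled Euclidean distance (illustrative only).
        BiFunction<double[], double[], Double> sim = (a, b) -> {
            double d = 0;
            for (int i = 0; i < a.length; i++) d += Math.pow((a[i] - b[i]) / 1000.0, 2);
            return 1.0 / (1.0 + Math.sqrt(d));
        };
        selectTopK(evaluateSimilarity(caseBase, query, sim), 1)
                .forEach(r -> System.out.println(r.c().bestClassifier() + " : " + r.similarity()));
    }
}
```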

The final output of the retrieve and reuse steps, denoted as RR, consists of a list of the top-k (with k=3) cases that exhibit the highest similarity scores in comparison to the query case (Q). If RR yields top-k algorithms with distinct Wgt.Avg.F-score values, the top-ranked one is recommended as the most suitable algorithm and assigned as the class label of the query's feature vector. Otherwise, the revise step of the CBR is initiated to uniquely identify the most suitable algorithm.

Revise – algorithm conflict resolver (ACR)

In the revision step of the proposed CBR approach, the unique algorithm recommended in the reuse step is added as a new instance to the existing Case-Base. However, if more than one algorithm is recommended with similar or statistically insignificant differences in similarity scores, then either one of the recommended algorithms is randomly selected as the final recommendation, or the conflict among the competing algorithms must be resolved. To resolve this conflict, we propose a method known as the Algorithm Conflict Resolver (ACR). This method performs meta-reasoning at the meta-characteristics level of the classifiers (e.g., decision tree length, number of rules, depth, among others) rather than focusing on the data characteristics. The proposed ACR employs a multi-objective criterion with weighted summation, as presented in Eq. 3, which takes into account the comprehensibility characteristics of the classifiers listed in Table 8. This allows us to recommend the most comprehensible classifier.

Table 8 Characterization of decision tree classifiers
$$a_i=\sum_{j} w_{nj}\cdot C\left(i,\, j\right)$$
(3)

where i indexes algorithms in A (the algorithm space) and j indexes criteria in C (the criteria space):

  • \(a_i\) is the weighted sum score for algorithm \(a_i \in A\)

  • \(w_{nj}\) is the normalized weight assigned to criterion \(c_j\), calculated using Eq. 4

  • \(C(i, j)\) is the performance or evaluation score of algorithm \(a_i\) on criterion \(c_j\)

$$w_{nj}=\frac{w_j-\min(w)}{\max(w)-\min(w)}$$
(4)

where \(w_j\) is the original weight assigned to criterion \(c_j\) by the user through the AHP process,

  • \(\min(w)\) is the minimum weight over all criteria

  • \(\max(w)\) is the maximum weight over all criteria

The working of the ACR is shown in Algorithm 3.

Algorithm 3. Algorithm Conflict Resolver (ACR)
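Since Algorithm 3 appears as a figure, the sketch below illustrates only the weighted summation of Eq. 3 over comprehensibility criteria such as those in Table 8; the tied algorithms, criteria values, and normalized weights shown are placeholders.

```java
public class AcrSketch {

    /** Eq. 3: weighted sum score of one algorithm over all criteria. */
    static double weightedScore(double[] criteriaValues, double[] normalizedWeights) {
        double score = 0;
        for (int j = 0; j < criteriaValues.length; j++) {
            score += normalizedWeights[j] * criteriaValues[j];
        }
        return score;
    }

    public static void main(String[] args) {
        // Rows: tied algorithms from the reuse step; columns: comprehensibility criteria scaled to [0, 1].
        String[] algorithms = { "J48", "REPTree", "RandomTree" };   // illustrative tie
        double[][] c = {
                { 0.9, 0.7, 0.8 },
                { 0.8, 0.9, 0.6 },
                { 0.6, 0.5, 0.9 } };
        double[] wn = { 0.5, 0.3, 0.2 };                            // normalized weights (Eq. 4 / AHP)

        int best = 0;
        for (int i = 0; i < algorithms.length; i++) {
            double s = weightedScore(c[i], wn);
            System.out.printf("%s -> %.3f%n", algorithms[i], s);
            if (s > weightedScore(c[best], wn)) best = i;
        }
        System.out.println("ACR recommendation: " + algorithms[best]);
    }
}
```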

Given that conflict resolution is application-dependent, we employ a semi-automatic, expert-based criteria weighting approach, adopting the analytic hierarchy process (AHP) pairwise comparison procedure. The algorithm used for AHP-based criteria weighting is presented in Algorithm 4.

Algorithm 4. Weight calculation using the Analytic Hierarchy Process (AHP)
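Algorithm 4 is likewise shown as a figure; the sketch below derives criteria weights from an expert's pairwise comparison matrix using the standard column-normalization approximation of the AHP priority vector. The 3x3 matrix of Saaty-scale judgments is a made-up example.

```java
public class AhpWeightSketch {

    /** Approximates AHP priority weights: normalize each column, then average the rows. */
    static double[] weights(double[][] pairwise) {
        int n = pairwise.length;
        double[] colSum = new double[n];
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < n; i++) colSum[j] += pairwise[i][j];
        }
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) w[i] += pairwise[i][j] / colSum[j];
            w[i] /= n;
        }
        return w;   // weights sum to 1
    }

    public static void main(String[] args) {
        // Pairwise judgments on Saaty's 1-9 scale for three comprehensibility criteria.
        double[][] m = {
                { 1.0,       3.0,       5.0 },
                { 1.0 / 3.0, 1.0,       3.0 },
                { 1.0 / 5.0, 1.0 / 3.0, 1.0 } };
        for (double v : weights(m)) System.out.printf("%.3f ", v);
        System.out.println();
    }
}
```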

In the retention step, Q, represented by a meta-feature vector containing features from all the families, along with the final recommended algorithm from either the reuse or the revise step of the CBR, is incorporated into the Case-Base. This incrementally enlarges the Case-Base and improves the quality of the CBR in recommending the most suitable algorithm, demonstrating that the proposed algorithm recommendation model is an instance of an incremental, evolutionary learning process.

Implementation, experiments and evaluation

This section describes the implementation of the proposed system and presents experiments conducted to validate the methodology.

Implementation

The proposed multi-view case-based reasoning methodology for accurate classifier selection has been implemented in the Java environment as an open-source application. The key components of the methodology include the extraction of multiple categories of meta-features from the dataset and meta-reasoning utilizing the Case-Base. These meta-features are computed using the OpenML [30] data characteristics (DC) open-source library, which is freely available on GitHub [27]. For the CBR-based reasoning process, we utilized jColibri 2.0, a case-based reasoning framework [31], in which we implemented our custom case similarity functions to ensure accurate matching of existing cases. The resulting CBR-based incremental learning and reasoning system has been released as an open-source application on GitHub, featuring an extensible and adaptable implementation strategy [32], enabling end-users to employ it for selecting a suitable decision tree classifier for their application's data. The interface of the CBR application for meta-feature extraction is displayed in Fig. 5.

Fig. 5 Interface of the CBR application for meta-feature extraction

The process of multi-view reasoning using CBR is shown in Fig. 6.

Fig. 6 Case-based reasoning (CBR) for optimal machine learning algorithm selection

Experimental setup

Classifiers under consideration

We conducted experiments on the nine most commonly used multi-class classification algorithms, as listed in Table 5. These algorithms are implemented in the Weka machine learning library [3]. We utilized them with their default parameters.

Training and testing datasets

To train and test the proposed methodology, two disjoint sets of datasets are utilized. For training, the CBR model, i.e., Case-Base, is constructed using 100 multi-class classification datasets, as shown in Table 9. These datasets are sourced from the UCI machine learning repository [29] and OpenML repositories [30]. Similarly, a separate set of 52 datasets is employed for testing the methodology. All the classifiers listed in Table 5 are evaluated for each of the test datasets using the method implemented and described in Fig. 6. Subsequently, the best classifiers are determined to assess the performance of the proposed methodology.

Table 9 Training datasets for case-base creation with brief descriptions

Evaluation methodology and criteria

To evaluate the accuracy of the proposed method, the following steps are used:

  • For each given dataset (test datasets in this case), meta-features are extracted using the developed meta-feature extractor to prepare a Query Case (Q).

  • The CBR methodology is used to recommend the top-k (k=3) best classifiers for each Q.

  • Measure whether the actual best classifier of each test dataset appears among the recommended top-k (k=3) classifiers. If the actual best classifier belongs to the recommended top-k (k=3) classifiers, the recommendation is declared correct; otherwise, it is considered incorrect (a minimal sketch of this check is given after this list).
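A brief sketch of this top-k hit-rate computation is shown below; the dataset names, recommendations, and actual best classifiers are placeholders rather than entries from Table 10.

```java
import java.util.List;
import java.util.Map;

public class TopKAccuracySketch {

    /** A dataset counts as correct if its actual best classifier is among the top-k recommendations. */
    static double topKAccuracy(Map<String, List<String>> recommended, Map<String, String> actualBest) {
        int correct = 0;
        for (Map.Entry<String, List<String>> e : recommended.entrySet()) {
            if (e.getValue().contains(actualBest.get(e.getKey()))) correct++;
        }
        return 100.0 * correct / recommended.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> rec = Map.of(
                "dataset-01", List.of("J48", "LMT", "REPTree"),
                "dataset-02", List.of("RandomTree", "J48", "HoeffdingTree"));
        Map<String, String> actual = Map.of("dataset-01", "LMT", "dataset-02", "SimpleCart");
        System.out.printf("Top-k accuracy: %.1f%%%n", topKAccuracy(rec, actual));
    }
}
```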

Experiments and results analysis

When experiments were conducted on the test Case-Base consisting of 52 datasets, shown in Table 10, and the results were evaluated, 48 out of the 52 datasets received accurate classifier recommendations. As a result, the overall accuracy of the proposed methodology was determined to be 94% for appropriate algorithm recommendations with a top-k value of 3 algorithms.

Table 10 Performance results of case-based reasoning for algorithm selection
$$\mathrm{Overall\;Accuracy}=\frac{\mathrm{Number\;of\;Accurate\;Recommendations}\times 100}{\mathrm{Total\;Number\;of\; Datasets}}=\frac{48\times 100}{52}=94\%$$

The results in Table 10 show that only 3 of the 52 datasets, namely Dataset 30, Dataset 38, and Dataset 48, did not receive correct classifier recommendations. Similarly, for a top-k value of 1, the proposed methodology correctly recommended accurate classifiers for 30 datasets, resulting in an accuracy of 57.6% (calculated as 30 × 100 / 52). For a top-k value of 2, the methodology correctly recommended classifiers for 38 datasets, achieving an accuracy of 73% (calculated as 38 × 100 / 52).

Conclusion and future work

This research centers on the automatic selection of machine learning (ML) algorithms through the integration of edge machine learning (edge ML) and a case-based reasoning (CBR) methodology. Edge ML enhances the capabilities of CBR by facilitating the recommendation of ML algorithm families at edge nodes and the selection of the actual algorithm at remote cloud servers. This integration serves to enhance system performance by significantly expediting algorithm recommendations while minimizing associated costs.

In the future, our research endeavors will expand upon the current study. This expansion will encompass the practical implementation of edge ML computing, a facet not covered in this research. Additionally, we aim to augment the case-based approach by introducing more meta-characteristics and incorporating a broader array of algorithm families. These enhancements are designed to transform our platform into a universal and versatile tool for machine learning practitioners. Through these developments, we seek to provide a comprehensive and adaptable solution to the challenges of ML algorithm selection.

Availability of data and materials

Data can be made available to researchers upon request to the corresponding author.

References

  1. Koerich, A.L. Improving classification performance using metaclasses. in SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme-System Security and Assurance (Cat. No. 03CH37483). 2003. IEEE.

  2. Tavakoli, S., Signal classification using weighted orthogonal regression method. arXiv preprint arXiv:2010.05979, 2020.

  3. Bouckaert RR et al (2010) WEKA - experiences with a Java open-source project. J Mach Learn Res 11:2533–2541

  4. Jalernrat, S., Data Mining Using Decision Tree Algorithms. University of the Thai Chamber of Commerce Journal, 2013: p. 11-43.

  5. Engel, J., T. Erickson, and L. Martignon. Teaching about decision trees for classification problems. in IASE Satellite Meeting, https://iase-web.org/documents/papers/sat2019/IASE2019%20Satellite%20132_ENGEL.pdf. 2019.

  6. Géron, A., Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, Inc., 2022.

  7. Ali R, Lee S, Chung TC (2017) Accurate multi-criteria decision making methodology for recommending machine learning algorithm. Expert Syst Appl 71:257–278

  8. Reif M et al (2014) Automatic classifier selection for non-experts. Pattern Anal Appl 17:83–96

  9. Brodley, C.E. Addressing the selective superiority problem: Automatic algorithm/model class selection. in Proceedings of the Tenth International Conference on Machine Learning. 1993. Citeseer.

  10. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82

  11. Aha, D.W. Generalizing from Case studies: A Case Study. in Ninth International Conference on Machine Learning. 1992. Citeseer.

  12. Smith-Miles KA (2008) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput Surv 41(1):1–25

  13. Monteiro JP et al (2021) Meta-learning and the new challenges of machine learning. Int J Intell Syst 36(11):6240–6272

  14. Ali, R., et al. A case-based meta-learning and reasoning framework for classifiers selection. in Proceedings of the 12th international conference on ubiquitous information management and communication. 2018.

  15. Bernado-Mansilla E, Ho TK (2005) Domain of competence of XCS classifier system in complexity measurement space. Evol Comput IEEE Trans 9(1):82–104

  16. Pise N, Kulkarni P. Algorithm selection for classification problems. in 2016 SAI Computing Conference (SAI). 2016. IEEE.

  17. Song Q, Wang G, Wang C (2012) Automatic recommendation of classification algorithms based on data set characteristics. Pattern Recognit 45(7):2672–2689

  18. Bache, K. and M. Lichman, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. 2013, Irvine, CA: University of California, School of Information and Computer Science.

  19. Brazdil P, Gama J, Henery B (1994) Characterizing the applicability of classification algorithms using meta-level learning. in European Conference on Machine Learning: ECML-94. Springer

  20. Ali S, Smith KA (2006) On learning algorithm selection for classification. Applied Soft Computing 6(2):119–138

  21. Gama J, Brazdil P (1995) Characterization of classification algorithms. Progress in Artificial Intelligence. Springer, pp 189–200

  22. Brazdil PB, Soares C, Da Costa JP (2003) Ranking learning algorithms: using IBL and meta-learning on accuracy and time results. Mach Learn 50(3):251–277

  23. Shao X et al (2023) EFFECT: Explainable framework for meta-learning in automatic classification algorithm selection. Inform Sci 622:211–234

  24. Garouani M et al (2022) Using meta-learning for automated algorithms selection and configuration: an experimental framework for industrial big data. J Big Data 9(1):57

  25. Rice JR (1976) The algorithm selection problem. Adv Comput 15:65–118

  26. Wang G et al (2014) A generic multilabel learning-based classification algorithm recommendation method. ACM Trans Knowl Discov Data 9(1):7

  27. Sun, Q., Integrated Fantail library. 2014, GitHub.

  28. Sarkheyli A, Söffker D (2015) Case indexing in case-based reasoning by applying situation operator model as knowledge representation model. IFAC-PapersOnLine 48(1):81–86

  29. Lichman, M., UCI machine learning repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. 2013.

  30. Van Rijn JN et al (2013) OpenML: A collaborative science platform. Machine learning and knowledge discovery in databases. Springer, pp 645–649

  31. Bello-Tomás JJ, González-Calero PA, Díaz-Agudo BJ (2004) An object-oriented framework for building cbr systems. Advances in case-based reasoning. Springer, pp 32–46

  32. Rahman, A. and S. Muhammad, Automatic-algorithm-selector. 2016, GitHub.


Funding

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2017-0-00655, Lean UX core technology and platform for any digital artifacts UX evaluation).

Author information


Contributions

R.A. proposed the idea, formulated the problem, designed the algorithms and wrote the main manuscript. M.S.H.Z implemented the algorithm selection platform as an open source software. A.M.K. rigorously reviewed the paper before and after the revision and J.H. prepared the figures, reviewed the manuscript, interpreted the results and provided financial support.

Corresponding author

Correspondence to Jamil Hussain.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Ali, R., Zada, M.S.H., Khatak, A.M. et al. Algorithm selection using edge ML and case-based reasoning. J Cloud Comp 12, 162 (2023). https://doi.org/10.1186/s13677-023-00542-3

