Fault localization by analyzing failure propagation with samples in cloud computing environment

With the development of information technology such as cloud computing, IoT, etc, software becomes the infrastructure. On the one hand, it is critical to ensure the reliability of software, on the other, sample code can be mined from open source software to provide reference for automatic debugging. Most of existing automated debugging researches are based on the assumption that defect programs are nearly correct, therefore they can successfully pass some test cases and fail to execute others. However, this assumption sometimes does not hold. In view of the fact that a programs may fail for all the given test cases in the test suite, but there are a large number of examples available for reference, a sample based fault localization method is studied. A fault localization method by analyzing the context of failure propagation is proposed, which locates suspicious statements by identifying the execution status and structural semantics differences between the defective program and sample program. Through the interactive analysis of value sequences and structure semantics, the influence of code variations and failure propagation is reduced. The experimental results have shown that the method can effectively locate suspicious statements and provide assistance for defect repair when there are enough sample programs.


Introduction
Software becomes the infrastructure of human society. For example, software defines computing, which gives birth to cloud computing. The computing resources can be configured arbitrarily, and the network can be controlled with little management work. All hardware products in the IoT (Internet of Things) era have the characteristics of intelligence and networking. Each product is an intelligent terminal, and the other end of all products must be connected to cloud computing, so these products are all defined by software. All edge devices have broader value because of the definition of software. The intelligent identification, location, tracking, monitoring and management of items can not be separated *Correspondence: wangtiantian@hit.edu.cn 1 School of Computer Science and Technology,Harbin Institute of Technology, XiDazhi Street, 150001 Harbin, China Full list of author information is available at the end of the article from software implementation [1][2][3][4][5]. The development of cloud computing, big data and artificial intelligence requires strong system software support [6][7][8][9]. However, bugs often appears in software, which reduce the safety, security, reliability, etc. of the system. Automatic debugging is helpful to improve the efficiency of software development and maintenance as well as the quality of software. Therefore, a lot of researches have been carried out on automatic fault localization [10][11][12] and repair [13][14][15] for industrial software. Especially, with the development of open source software, the research of mining open source code to aid automatic debugging has become a research hotspot [16][17][18].
In the field of computer-aided education, the support of automatic debugging technology is also needed. For massive open online courses with a large number of students, it is difficult for teachers to interact with each student. This brings new challenges to information technology: (2020) 9: 17 Page 2 of 13 how to automatically evaluate students' learning effects, provide adequate feedback, and interact with students?. For the practical programming course, the solution of this problem is particularly important. In the process of programming exercises or examinations, a large number of programs implementing the same task can be obtained. Many of these programs are correct and can be used as potential sample programs to guide the fault localization and repair of other defective programs. This paper focuses on accurately localization of suspicious statements, when sufficient sample programs are provided. The main contributions are as follows. A fault localization method by analyzing the context of failure propagation is proposed, which locates suspicious statements by identifying the execution states and structural semantics differences between the defective program and the sample program. It identifies the equivalent and different statements between the defective program and the sample program, by interactive analysis of execution value sequences and structural semantics. Execution value sequences analysis helps to reduce the impact of code variation on program difference analysis, and structural semantics analysis helps to reduce the impact of failure propagation. These two operations cooperate with each other and help to improve the effectiveness of fault localization. When sufficient sample programs are provided, this method can accurately identify the differences between the defective program and the sample program. Even if the defective programs cannot pass any test case in the given test suite, it can effectively locate the suspicious statements in the defective programs and assist in understanding the causes of failure and repairing defects.
The rest of our paper is organized as follows: the "Related work" section discusses related work of fault localization and automatic debugging. In "Problems of code variation and failure propagation" section, the phenomenon of code variation and failure propagation and their influence on fault localization is analyzed. "Research framework" section presents the overview of the proposed approach. In "Clustering similar programs with code variations" section, the clustering method is described in detail. In "Fault localization by analyzing failure propagation" section, the fault localization method is detailed, whereas "Experimental analysis" section contains the experimental results and evaluation. Finally, conclusions and future work are highlighted in the "Conclusion" section.

Fault localization
Software fault localization can help developers improve the reliability and safety of information systems, such as cloud computing systems, etc. So many approaches have been proposed. These approaches can be divided into program spectra-based, program slicing, state-based, model-based, program invariant based, mutation analysis, and machine learning-based approach [19][20][21][22][23] etc. These methods have their own advantages and disadvantages [10,11].
Among them, the program spectra-based method has been widely studied because of its low computational complexity and simple implementation. Program spectrabased methods locate suspicious code by comparing the program spectrum of failure execution and successful execution. Predefined measurement formulas are used to calculate the suspiciousness of each program element (e.g. statement), and the program elements are ranked according to the suspiciousness. The main difference of various program spectra-based methods is that the measurement formulas used are different. Some formulas are equivalent in terms of localization effectiveness [24][25][26].
Empirical studies have been carried out to evaluate the effectiveness of fault localization tools [27][28][29][30][31][32][33], especially the spectra-based methods. Although the conclusion are different from different researches because of different study subjects, some common observations can be got from these researches. To sum up, in order to be welladopted by practitioners, the following future work of fault localization is needed.
• Fault localization techniques need to be more trustworthy, scalable, able to provide insightful rationales, and integrated to popular IDEs [29]. • In the researches, fault localization and bug fixing are two separate tasks, however, in practice, they are tightly coupled and there is no clear boundary between them [27,33]. • Browsing the context of failure propagation and providing program comprehension is necessary and helpful in debugging [27,30]. • Existing work are limited in localizing missing code, however a tool that can recommend additional code to insert maybe helpful in the debugging process [28,31]. • Existing bug isolation methods are limited in localizing multiple faults, as mis-grouping usually occurs due to the clustering algorithm may not able to divide failed test cases accurately [31,32].
Besides these observations, it is particularly important to note that except for the program slicing method, most of existing approaches perform fault localization by comparing the execution information of successful test cases and failure test cases [34,35]. However, defective programs may fail for all the test cases in the given test suite. We call such programs "completely failed programs". In this case, these fault localization methods will not work. Therefore, it is necessary to study the scenarios without successful test cases in the given test suite.

Automatic program repair
It is difficult to obtain a complete formal specification of requirements for industrial software. Therefore, in software fault localization and repair, the execution information of test cases is usually used to assist the analysis. However, only using testing results as the criteria of automatic software repair may cause the overfitting problem. Even if a patched program passes all tests, it may still be incorrect [16,17]. Therefore other information is needed to complement the test suite. Existing work has shown that example programs mining from open source software, can provide useful information for the generation of patches [18]. Yi et al. studied the performance of current popular industrial software repair tools, GenProg [36], AE [37], Prophet [38] and Angellix [39], in guiding students to debug programs [40]. All the four tools adopt a test-driven approach, that is, if the patch passes all test cases in the test suite, it is considered a successful fix. For example, GenProg first uses program spectra-based method to locate suspicious statements by comparing the statements of successful test cases with those of failed test cases, and assigns suspicious statements a higher probability of selection to be modified. Then genetic programming algorithm is used to transfer correct statements from other locations of the program to be repaired. After continuous evolution, patches that can pass all tests are searched. The results have shown that the repair rate of these tools is low. One reason is that student programs usually have multiple bugs, which require complex modifications, and often need to introduce new program logic when making modifications. However, the studied tools are difficult to repair missing statements and complex defects with dependencies. Another reason is that these tools use the program spectrabased fault localization method to identify potential modification locations. The program spectra-based method locates suspicious statements by comparing the successful execution and failure execution, while the students' programs may fail for all the given test cases in the test suite. In this case, the spectra-based method can not be applied. The inaccuracy of fault localization results will directly reduce the effectiveness of program repair.
The enlightenment of the above researches to this paper is as follows.
(1) It is necessary to further study the fault localization method under the scenario of complex defects which lead to the failure of all the given test cases.
(2) In addition to test cases, sample programs can also be mined from a large number of correct programs to provide auxiliary information for fault localization.
This paper focuses on the sample based fault localization method.

Problems of code variation and failure propagation
This paper studies how to locate suspicious statements in defective programs by analyzing the differences of execution state and structure semantics between the defective program and the sample program. However, code variations and failure propagation bring difficulties to program difference analysis and reduce the effectiveness of fault localization. This section takes the source code in Fig. 1 as an example to analyze the problem. The goal of program codes A, B and C in the figure is to use iterative algorithm to achieve the desired function of computing base exp . A and B are correct programs, C is a defective program, lacking the initialization statement of variable r. (

1)Code variation
In addition to the identical representation, there may be two other situations for functionally equivalent programs: one is that they adopt completely different implementations (e.g. different algorithms and data structures); the other is that they adopt the same algorithms and data structures but have different syntactic representations, such as different variable names, control structures, expression forms, statements order, etc. This phenomenon is called code variation [41].
Code variation makes it difficult to identify differences between the defective program and the sample program. For example, in Fig. 1, if the for statement in program A and the while statement in program B can not be identified as equivalent code, when program A is used as the sample program to match with program B, the two loop structures will be identified as difference statements, which will lead to fault location misdetection. In addition, program A and program B have different orders of equivalent statements and variable names. Failure to accurately identify these code variations can also lead to false positives. So, how to recognize code that syntactically different but functionally equivalent, is a key problem to be solved.
There are two advantages of identifying equivalent program with code variations. On the one hand, it can be used to reduce the number of sample programs that need to be provided. For example, only one sample program is needed for program A, program B and program C. On the other hand, it can improve the accuracy of difference analysis between the defective program and the sample program, thereby improving the effectiveness of fault localization.
(2)Failure propagation The bottom of Fig. 1 shows the sequence of execution values for each variable when fact (3, 4) is called.
Definition (Execution value sequence): Given a test case set T, the sequence of execution values of variable var assigned at assignment statement s i in program P is as follows:

value(s i , var, t) denotes the values of variable var when executing statement s i with test case t.
The values of the variable at each assignment statement, got by executing all test cases, constitutes the execution value sequence of the variable. Execution value sequences reflect how the program defines and uses variables, so they imply the execution semantics of the program. Variables that achieve the same function usually have the same value sequence in the execution. These value sequences are not affected by code variations. Especially, the influence of statement order and variable name variation on identifying equivalent codes can be eliminated, by matching execution value sequences. For example, because the variables j in program C and i in program A have the same value sequence for the same test case, so (j, i) is a matching variable pair. According to the variable pair (j, i), the statements "j = j + 1;" in C and "i++" in A can be matched accurately.
However, unfortunately failure propagation may cause equivalent variables to fail to obtain the same sequence of execution values. For example, due to the lack of initialization statement for variable r in defective program C, the value of r is random number, which is an error state. As the program is executed, the error state is propagated, resulting in r at "r = r * base;" getting a error value sequence. As a result, the statement cannot match the "r * = base;" in program A, but in fact the two statements are equivalent. To sum up, in the case of failure propagation, identifying differences between the defective program and the sample program only by matching the value sequences will also lead to false detection of fault location. Therefore, how to reduce the impact of failure propagation on program difference analysis, is also a key problem.

Research framework
Based on the above analysis, a fault localization method by analyzing failure propagation with samples is proposed. The research framework is shown in Fig. 2. The inputs include a defective program, a program set implementing the same task, and a set of test cases. The output is the bug context to provide assistance in understanding and repairing the defects.
(1) In order to identify new reference programs from large scale program sets, a program clustering method based on structural metrics and execution value sequences is proposed. The main steps are as follows.
Step 1: test the programs and collect execution information.
The programs are executed with test cases and the output results are collected. Compare the actual running results of the program with the expected output results of the test cases. If a program pass all the test cases, it is considered as potentially functionally correct and used as a reference program. Follow-up cluster analysis is performed on such programs.
Step2: perform program clustering based on structural metrics to quickly identify programs with similar structure.
Programs with similar structure are divided into the same cluster, while programs with different structure are divided into different clusters.
Step 3: perform program clustering based on the execution value sequences.
Programs are instrumented based on abstract syntax tree to print the values for each variable. The instrumented programs are executed with test cases, and the execution value sequences are collected. Programs that implement the same algorithm are divided into the same cluster, programs of different algorithms can be divided into different clusters. Furthermore, a template program can be selected from each cluster to form a template set.
(2)Select the sample program to be matched with the defective program.
Computing the similarity of structure metrics and execution value sequences of defective program and template programs, selecting the template program which is most similar to defective program as an sample program for subsequent fault localization. The correct program which is has the highest similarity with the defective program is chosen as the sample program. This is because the more similar, the more likely the sample program adopts the similar algorithm with the defective program.
(3) In order to effectively locate complex defects, reduce the search space of suspicious statements and explain the causes of errors, a fault localization method by analyzing the context of failure propagation is proposed.
This method identifies the difference of execution states and structural semantics [42] between the defective program and the sample program, and locates suspicious statements even in the absence of successful test cases. It focus on solving the problem of code variation and failure propagation.
Firstly, in order to avoid the influence of code variations, especially the statement order variation (2020) 9:17 Page 6 of 13 and the variable name variation, the execution value sequences of variables in the defective program and the sample program are matched, to identify mapping variables and assignments between these two programs.
Then, in order to reduce the false positives of suspicious statements caused by failure propagation. The structural semantics of the programs is further analyzed based on control dependence trees [42], to identify possible matching variable pairs and statements.
Finally, the different statements identified in the process of structure semantics matching are identified as suspicious statements, and the possible mutation operations to repair the suspicious statements are identified, which helps to analyze the causes of failure, and provides a reference for further repair.
Because this method identifies suspicious statements based on structural semantics difference analysis, it can locate various types of defects, including the missing statements defect. It does not require successful test cases.
The fault localization method can also be integrated into the program repair framework based on genetic programming, which provides support for suspicious statement localization in the absence of successful test cases, and lays a foundation for repairing programs based on samples.

Clustering similar programs with code variations
This paper proposes two levels of program clustering method. Firstly, the method based on structure metrics is performed to quickly identify the programs with similar structures with lower complexity. On this basis, the execution value sequences of variables are further analyzed. By detecting the similarity of the execution sequences, it is possible to recognize functional equivalent programs more accurately, no matter whether the syntactic representations are different. The related definitions are as follows.
Definition (Metric vector): A metric vector is a point in Euclidean space, that is obtained by statistics of the size, structure, and complexity of the abstract syntax tree of the program.
For example, v(v1, v2, v3, v4, v5, v6, v7, v8, v9, v10), v1: the number of nodes; v2: the number of operators; v3: the number of assigned nodes; v4: the number of loop structures; v5: the number of selection structures; v6: the number of system function calls; v7: the number of special data types (such as pointers, arrays and structures); v8: the longest path length; v9: the length of recursive call path; v10: the longest nested path depth of circular.
Definition (Structure similarity) [42]: The structure similarity of two programs is the similarity of their metric vectors as shown in Equation 2.
In which, v and v are metric vectors. The structural similarity between programs is computed based on the structural metric vectors statistically calculated based on the abstract syntax trees of the two programs.
Definition (Similarity of two value sequences): Assuming that X i and Y j are two execution value sequences, their similarity is defined in Equation 3.
In which, LCS(X i , Y j ) represents the maximum number of common subsequences in the execution value sequences of X i and Y j , where the subsequences may be discontinuous.

then the execution value similarity of programs A and B is defined in Equation 4
.
The program clustering algorithm is shown in Algorithm 1. The input is a set of programs, and the output is the set of program clusters. Line 2 to 8 describe the clustering based on structure metrics. Line 5, in order to analyze the syntax and structural semantic of programs, each program is parsed and its abstract syntax tree is created. Line 6, the abstract syntax tree is traversed to statistically calculate the metric values, and a metric vector is generated for each program, to approximately evaluate the structure of the program. Line 8, the programs are clustered and grouped according to the values of Structure_sim (see Equation 2) with a hierarchical clustering method.
Line 9 to 20 describe the clustering based on execution value sequences. The result of the structure metrics based clustering is further analyzed.
Line 12, in order to collect the execution information of variables, each program is instrumented based on the abstract syntax tree. Probe statements are inserted into the syntax tree to output the line numbers and variable values of the executed statements.

Require:
TP: a set of template programs, T: A set of test cases Ensure: = ω 1 , ω 2 , ..., ω k :clustering set 1: Line 13 to 18, the instrumented program is executed with each test case, to collect the execution value sequences.
Line 19, the programs are clustered and grouped according to the execution value similarity with a hierarchical clustering method. The execution value similarity of two programs, as shown in Equation 3 and 4, is calculated by the longest common subsequence algorithm.

Fault localization by analyzing failure propagation
Execution value sequences analysis is helpful to identify the equivalent codes with code variations in the defective program and the sample program. Therefore, this paper identifies the equivalent variable pairs and equivalent assignment statements in two programs by matching execution value sequences. However, due to the failure propagation, the equivalent assignment statements may not obtain the matched execution value sequence. Therefore, it is necessary to further identify whether the unmatched value sequences is caused by the failure propagation based on the structural semantics, by matching assignment expressions and their context.

Definitions
The combination of dynamic execution value sequences analysis and static structural semantics analysis helps to identify the differences between the defective program and the sample program accurately and locate the suspicious statements in defective programs. The related definitions are as follows.
Definition Definition(Control dependence tree): program P's control dependence tree, defined as T = (V , E), is a directed, ordered tree, where V is a set of nodes, representing statements and predicates in the program, E is a set of control dependency edges and procedure call edges [39]. The following attributes are satisfied.
(1) T is a directed tree. The control dependence edge from node u to node v represents the control dependence, and v control depends on u. This means that whether node v executes or not depends on the value of predicate in node u.
(2) T is an ordered tree. If node v and node u have the same parent node, they are called brothers. If v is the left brother of u, v precedes u in the execution.
(3) The node is a triple (id, type, exp), where id is the line number of the statement in the program, type is the statement type, exp is the abstract syntax tree of the expression.
(4) The procedure call edge represents the call relationship between functions, and connects the function call node and the entry node of the function being called.
Based on the control dependence tree to perform structural semantic analysis, the variable pairs matched by structural semantics are further identified.

Definition(Variable pairs matched by structural semantics):
Given a test case set T, variable var b at assignment statement s b in program P b , variable var s at assignment statement s s in program P s , variable var b and var s are variable pairs matched by structural semantics, and s b and s s are matching assignment statements, if and only if in the control dependence tree of P b and P s , the syntax trees of s b and s s are completely matched, and both the control dependent paths of s b and s s have the same type of nodes.

Fault localization algorithm
Fault localization by analyzing the context of failure propagation is shown in Algorithm 1. The inputs include a defective program, the sample program, and a test case set. The goal is to identify the statements that are different between the defective program and the sample program as suspicious locations, return mutation operations that may be performed to repair defects, and map variables in the defective program and the sample program to assist program repair. Lines 1 to 5, instrument and execute the defective program and the sample program with each test case, so as to collect the execution value sequence of each assignment variable.
Lines 6 to 9, map the equivalent variables and assignment statements in the defective program and the sample program by matching the execution value sequences.
Line 10, create the control dependence tree for the defective program and the sample program respectively. Lines 11 to 15, and Lines 16 to 20, mark the equivalent assignment statements matched by the execution sequences in the two control dependence trees, respectively.
Line 21, The dynamic programming algorithm is used to find the maximum matching between two control dependence trees. On the one hand, the equivalent assignments and the variable pairs matched by structural semantics are identified to reduce the false positives of suspicious statements caused by failure propagation. The variable pairs are added to the variable mapping table. On the other hand, the different statements in the two programs are assigned higher suspiciousness, the mutation operations (insertion, deletion, replacement) that may repair the suspicious statements are identified and given higher mutation operation probability.
Finally, The variable mapping table, mutation operation probability, suspicious location and probability are output to assist understanding the causes of errors. Statement nodes with high probability are suspicious, and mutation operation with high probability is a more likely editing operation to repair the suspicious statement. The control dependence trees with the matching marks can also be output to assist the understanding of the failure context. Figure 3 is an example of the structural semantic analysis of the programs in Fig. 1. By matching the subtrees of the program control dependence trees, the similar nodes and different nodes of the defect program C and the sample program A are identified, so as to localize the nodes that may contain defects, identify the possible mutation operations to repair the defects, and narrow the search scope.

A sample
Each node in the node sequence of the defect program and the sample program is given a weight, the mismatched node is given a higher probability (value is 0.8), and the matched node is given a lower probability (value is 0.2). The unmatched nodes in the defective program are regarded as suspicious statement. The if v i .pos ∈ Map then 13: mark v i 14: end if 15: end for 16: for each node v j ∈ CDT s do 17: if (v i .pos, v j .pos) ∈ Map then 18: mark v j

Research questions
The research questions are as follows.
(1) RQ1: What is the effectiveness of the program clustering method? Ideally, programs with similar structure and semantics but possibly with code variations should be divided into the same cluster, and programs with different implementation logic or algorithms should be divided into different clusters.
(2) RQ2: What causes a program failed on all the test cases?
(3) RQ3: Can the proposed fault localization method effectively identify the differences between the defective programs and the sample program? Which factors affect the effectiveness of fault localization?  Table 1, when studying the performance of industrial software repair tools in debugging students' programs [40]. These programs are actually written by students in the course of introductory programming. Each experiment has 4-8 programming tasks. The principle of Yi et al. in choosing programming tasks is to cover as many syntactic structures and algorithms as possible. The correct versions and the buggy versions are distinguished by comparing the actual and expected results. In the benchmark, each programming task provides a correct template program and multiple program pairs (Pb, Pc). Pb is a defective program version, which failed on one or more test cases. Pc is a modified version of Pb, which is obtained by the same person who modifies Pb. Pc can pass all test cases in the test suite. There are 661 defective programs in the data set, all of which contain logic errors and no syntactic errors. There are two reasons for our experimenting on this benchmark. The first one is that it provides not only the buggy versions, but also the correct versions which helps in verify our sample based approach. The second one is that we want to test the capability of our method in deal with programs with complex bugs that cannot pass any test case, this benchmark provides such buggy versions.

Clustering effect analysis
To answer RQ1, the correct versions of each programming task were clustered to select template programs. Ideally, the programs in each cluster are structurally and semantically equivalent, and the programs in different clusters should adopt different implementation logic or algorithms. In order to improve the efficiency of code review, Gumtree [43] was used to generate syntax tree differences between two programs, and then these differences were manually examined to determine whether the two programs should belong to the same cluster.
Clustering result is related to distance threshold. The smaller the distance threshold is, the higher the similarity between programs in the same cluster will be, and the more clusters will be generated. It is expected that the programs in the same cluster should be as similar as possible to avoid filtering out too many template programs to provide sufficient template programs, so the distance threshold is set to 0.1.
The clustering effectiveness was evaluated by Purity and Entropy. Purity calculates the proportion of the correct clustered programs to the total number of programs. Entropy reflects the uncertainty of clustering results.
The average Purity is 0.95576, and the average Entropy is 0.15497. The clustering partition and the actual expectation partition are in good agreement. Therefore, the program clustering method is effective.

Cause analysis of complete failure
To answer RQ2, the benchmark provided by Yi et al. are further analyzed [40]. Among 661 defective programs, 206 defective programs failed to pass any test case in the test suite. Because there is not any successful test case information, these programs cannot analyzed by traditional fault localization methods, such as spectra-based method. This subsection focuses on the analysis of these programs, which are called "completely failed programs".
In the complete failed program set, 86 programs only contain one defect and 120 programs contain multiple defects. Figure 4 counts the number of program versions containing single defect and multiple defects in each programming task.
As shown in Table 2, the types of defects in completely failed programs are various. For example, an operator error, a logical expression error, or an output format error may cause the programs failed on the whole test case set, and the interaction of different kinds of defects may also lead to completely failure. In the completely failed program set, the single defective program mainly contains the following types of defects: input and output format, logical expression, arithmetic expression, variable initialization, variable assignment, array subscript, function parameter errors, etc. The defect types of multiple defect programs include not only the defect types in the single defective programs mentioned above, but also the large concepts missing, complex control structure errors, function invocation errors, statement order errors and so on.  A program may also contain many types of defects. These defects interact with each other, resulting in the failure of the program on the entire test case set.

Effectiveness analysis of fault localization
To answer RQ3, the effectiveness of fault localization on defective programs failed on all test cases in three cases was analyzed. Case1: Provide sufficient sample programs. Using the original data in the benchmark, that is, there is a corrected version of P c corresponding to each defective program P b .
Case 2: Perform clustering to select sample program. A program is randomly selected from each cluster and added to the template set. In this case, for a defective program P b , in the process of fault localization, the matching sample program may not be its initial corresponding version P c , but semantically equivalent to the P c , but there may be multiple correct versions such as expressions, variable names, statement order, etc.
Case 3: The sample program is inadequate. On the basis of Case2, the set of template programs is reduced. From the set of template programs generated by clustering algorithm, 1/2 and 1/3 number of template programs are randomly selected to form a new template set. Table 3 counts the number of false positives of fault locations in three cases. It is considered that the defects in the program are correctly localized only when the location of defects is identified as suspicious. If a correct statement is identified as suspicious statement, it is considered as a false positive.
The following conclusions can be drawn from Table 3 and Fig. 5.
• In the case of providing the corresponding correct program as the sample program, this method can accurately identify these defect locations as difference statements, and the corresponding correct statements in the example program as potential repair statements. Moreover, the effectiveness of fault localization is not limited by the number and type of defects. • When the sample program is selected by program clustering, the defect statement can still be accurately localized, which shows that the proposed clustering method combined with fault localization method can effectively localize various defects, including the missing code defect. • When the number of templates is small and the sample program provided is insufficient, the correct statement will be misreported as suspicious statement. The fewer sample programs available, the greater the probability of false positives. Therefore, the precondition for the effectiveness of this method is that there are enough sample programs available.

Conclusion
A fault localization method based on sample programs by analyzing failure propagation is proposed. The experimental results have shown that, when sufficient sample programs are provided, even if the defective program can not pass any test cases, the proposed methods of program clustering and fault localization working together can effectively locate the suspicious statements.
With the development of swarm intelligence and open source software, this method is expected to be applied to industrial software and learn to debugging from open source code, so as to improve the software reliability for information systems such as cloud computing and IoT systems. We will collect example in open source code and verify the effectiveness of our approach in industrial systems in the future work.