Revisiting the power of reinsertion for optimal targets of network attack

Fan, Changjun; Zeng, Li; Feng, Yanghe; Xiu, Baoxin; Huang, Jincai; Liu, Zhong

doi:10.1186/s13677-020-00169-8

Research
Open access
Published: 07 May 2020


Journal of Cloud Computing volume 9, Article number: 24 (2020)


Abstract

Understanding and improving the robustness of networks has significant applications in various areas, such as bioinformatics, transportation, critical infrastructures, and social networks. Recently, there has been a large amount of work on network dismantling, which focuses on removing an optimal set of nodes to break the network into small components with sub-extensive sizes. However, in our experiments, we found that these state-of-the-art methods, although seemingly different, use the same refinement technique, namely reinsertion, to improve their performance. Although it is usually mentioned only in passing, this technique plays the key role in the final performance. Without reinsertion, the current best method performs worse than the simplest heuristics, while with reinsertion, even the random removal strategy performs on par with the best results. We therefore, for the first time, systematically revisit the power of reinsertion in network dismantling problems. We re-implemented and compared 10 heuristic and approximate competing methods on both synthetic networks generated by four classical network models and 18 real-world networks that cover seven different domains with varying scales. The comprehensive ablation results show that: i) HBA (High Betweenness Adaptive, no reinsertion) is the most effective network dismantling strategy, but it is only applicable to small-scale networks; ii) HDA (High Degree Adaptive, with reinsertion) achieves the best balance between effectiveness and efficiency; iii) reinsertion improves the performance of most current methods; iv) the strategy that adds back the node joining the clusters that minimize the product of their number and sizes is the most effective reinsertion strategy for most methods. Our results can serve as a survey reference to help further understand the current methods and design better ones.

Introduction

Many real-world systems can be described from the complex network perspective, including air transport [17], power grids [3], malicious organizations [9, 10], the Internet [3] and inter-personal networks [15]. One of the most important topics on these networks is robustness, i.e., the capacity to maintain functionality after a major failure [29]. Since connectivity is the fundamental basis for almost all behaviors on networks, researchers try to quantify how connectivity is affected by node (or link) removal. This leads to the well-defined network dismantling problem [1], which aims at identifying an optimal sequence of nodes whose removal maximizes the damage to the network connectivity [5]. Such analysis has a wide range of practical applications, such as immunizing populations against epidemic propagation [23], blocking the spread of rumors on social networks [15], and preventing virus diffusion in computer networks [7].

However, the exact solution is computationally intractable for medium and large networks due to its NP-hard nature [5], thus a large number of approximate methods have been proposed, including the heuristic methods [4, 11, 12, 21, 23, 31], and some message-passing algorithms [5, 22]. The former methods often greedily select target nodes based on local metrics, like node degree, which often leads to sub-optimal solutions; the latter ones are more accurate and global, while they need to iterate certain steps on the whole network to select the suitable candidate nodes [31], which would sacrifice some efficiency.

Although these methods look different from each other, many of them [5, 21, 22, 24, 31] share the same refinement technique, named reinsertion (introduced in detail in “Reinsertion technique” section), which is mentioned only briefly in the respective literature but has a significant influence on the final results. As illustrated in Fig. 1, we draw the robustness curves (“Robustness measure” section) of random removal, the simplest heuristic HDA and the representative CI (details of these methods will be introduced in “Competing methods” section) on the real-world Gnutella31 network [18]. We can see that without reinsertion, the representative method CI cannot even beat the simplest heuristic HDA, while with reinsertion, the random removal strategy achieves performance comparable to the state-of-the-art results. In some of the literature, methods enhanced with reinsertion are compared with others that do not use it, and a ’fake’ superiority is then reported, since it is unclear whether the advantage comes from the model itself or just from the reinsertion. Such confounded results prevent us from selecting the best algorithm for the application at hand.

In this paper, we systematically investigate the power of reinsertion for the current network dismantling methods. As far as we know, no previous work has conducted such a comprehensive ablation study of reinsertion. We aim to answer the following three questions: i) Which is the current best method if none uses reinsertion? ii) Which is the best if all use reinsertion? iii) Which is the best reinsertion strategy?

To achieve this, we conduct an ablation study (with/without reinsertion) for all the current network dismantling methods, including both traditional heuristics and the state-of-the-art message-passing ones, on synthetic networks and real-world networks. We use four random network models, namely ER [8], WS [30], BA [3] and PLC [13], to generate diverse graphs with varying sizes and structures by controlling the model parameters. For real-world networks, we select 18 networks covering 7 domains and different scales. Since network robustness can be described by different measures, we choose the area under the robustness curve as the main evaluation metric, because it captures the response of the network over the whole dismantling process. Extensive experiments demonstrate that reinsertion significantly improves the performance of most methods across network types. Moreover, since reinsertion is so effective for the network dismantling problem, designing a better attack strategy may be better served by focusing on this technique itself rather than on other aspects.

The main contributions of this paper are summarized as follows:

1. We conduct comprehensive ablation studies, with and without reinsertion, for the network dismantling problem. We compare 10 competing methods on both synthetic graphs generated from four random network types and 18 real-world networks covering seven domains, with scales of up to hundreds of thousands of nodes;
2. We design two other reinsertion strategies, and show empirically that they surpass the previous reinsertion technique by a large margin;
3. The results obtained in this paper could provide a valuable guide for selecting and designing the most appropriate method for practical network dismantling problems.

The rest of the paper is organized as follows. We introduce the reinsertion method, robustness measures, competing methods and experimental data in “Method” section. We analyze the comprehensive ablation results and the effects of different reinsertion strategies in “Results” section. Finally, we conclude the paper in “Conclusion” section.

Method

In this section, we introduce the experimental setups. We first introduce the robustness measure to evaluate the dismantling efficacy, then we introduce the reinsertion technique that is widely adopted in most current competitors. After that, we describe the competitors we are to analyze and the experimental data, including both synthetic graphs and real-world networks.

Robustness measure

Network dismantling aims to identify a sequence of nodes whose removal degrades the network connectivity maximally. This disintegration is often measured as the relative reduction in the size of the giant (largest) connected component (GCC size) [5, 21]. The smaller the remaining GCC size, the more the network is considered to have been disintegrated.

We consider the area under the robustness curve as the evaluation metric, which is plotted with horizontal axis being the fraction of nodes removed, and the vertical axis being the remaining GCC size. It is defined as:

$$ R = \frac{1}{N}\sum_{Q=1}^{N}s(Q) \tag{1} $$

where N is the number of graph nodes and s(Q) is the remaining GCC size after removing Q nodes. Intuitively, this measure is equivalent to assessing how many nodes the GCC contains each time a new node is deleted from the network, and summing this over all nodes [29]. Note that Eq. 1 captures the network’s response to the dismantling throughout the whole process. Since the computation of R requires a ranking of the nodes, we are interested in minimizing R over all possible node orders.
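As a minimal sketch of Eq. 1, the function below evaluates R for a given removal order. It assumes that s(Q) is normalized by N (so that R lies between 0 and 1) and that the graph is stored as a plain adjacency dictionary; the names `largest_component_size` and `robustness` are illustrative and are not taken from any released code.

```python
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component among nodes not in `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over the surviving nodes reachable from `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def robustness(adj, order):
    """R = (1/N) * sum over Q of s(Q); s(Q) is taken as a fraction of N (assumption)."""
    n = len(adj)
    removed = set()
    total = 0.0
    for node in order:
        removed.add(node)
        total += largest_component_size(adj, removed) / n
    return total / n


# Five-node path 0-1-2-3-4 as a toy graph.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
r_middle_first = robustness(path, [2, 1, 3, 0, 4])
r_end_first = robustness(path, [0, 1, 2, 3, 4])
```

On this toy path, removing the middle node first gives R = 0.24, whereas removing nodes from one end gives R = 0.4, which illustrates why the node order matters. The sketch recomputes components from scratch after every removal, so it is only meant for small graphs.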

In this paper, we evaluate the ablation performance of reinsertion for this robustness measure.

Reinsertion technique

Reinsertion was first proposed as an independent strategy for network destruction and immunization [27], and was later developed into an important refinement technique for other dismantling strategies. Reinsertion starts from the point where the network has been dismantled by a certain strategy. It adds back one of the removed nodes, chosen such that, once reinserted, it joins the smallest number of clusters. When a node is reinserted, its edges to neighbors that are already in the network are restored (but not the edges to neighbors that have not yet been reinserted, if any). This procedure is repeated until all the nodes are back in the network.

As shown in Fig. 2, each node is assigned an index c(i) given by the number of clusters it would join if it were reinserted into the network. The red node has c(red)=2, the blue one has c(blue)=4, and the green one has c(green)=3. The node with the smallest c(i), i.e., the red node, is then reinserted. After that, the c(i) values are recalculated and the new node with the smallest c(i) is found and reinserted. These steps are repeated until the termination criterion is met.
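The greedy loop described above can be sketched as follows, under the assumptions that the graph is an undirected adjacency dictionary, that ties in c(i) are broken by the order of the input list, and that clusters are tracked with a union-find structure (an implementation choice made here for brevity). The function name `reinsertion_order` is hypothetical.

```python
def reinsertion_order(adj, removed):
    """Greedily add back removed nodes, always choosing the smallest c(i)."""
    present = set(adj) - set(removed)
    parent = {u: u for u in adj}  # union-find forest over all nodes

    def find(u):
        # Root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    def union(u, v):
        parent[find(u)] = find(v)

    # Clusters of the dismantled network (nodes that were never removed).
    for u in present:
        for v in adj[u]:
            if v in present:
                union(u, v)

    def c(i):
        # Number of distinct clusters node i would join if reinserted now.
        return len({find(v) for v in adj[i] if v in present})

    pending = list(removed)
    order = []
    while pending:
        best = min(pending, key=c)  # c(i) is recomputed in every round
        pending.remove(best)
        present.add(best)
        # Restore only the edges to neighbors that are already in the network.
        for v in adj[best]:
            if v in present:
                union(best, v)
        order.append(best)
    return order


# Toy example: 'x' touches one cluster, 'y' touches three.
toy = {
    "a1": ["a2", "x", "y"], "a2": ["a1", "x"],
    "b1": ["y"], "c1": ["y"],
    "x": ["a1", "a2"], "y": ["a1", "b1", "c1"],
}
toy_order = reinsertion_order(toy, ["y", "x"])
```

Here `toy_order` is `['x', 'y']`, because x would join a single cluster while y would merge three. One common use, assumed here rather than stated in the text, is to read the reinsertion order backwards as the refined removal sequence.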

We will later show with extensive experiments how powerful such a simple technique is to the current network dismantling methods.

Competing methods

In this paper, we compare the 9 most representative competing methods. The first five are traditional heuristics based on a local or global structural centrality, namely degree, betweenness, closeness, PageRank, or collective influence. The remaining four are specifically designed for dismantling networks. Note that we also add a Random removal strategy as a worst-case baseline.

High Degree Adaptive (HDA) [23]. HDA is an adaptive version of high degree method [2]. Within each step, the node with the highest degree is removed, and then the remaining degrees are updated.
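A minimal sketch of the HDA loop is given below, assuming an undirected adjacency dictionary and breaking degree ties by dictionary order (the text does not specify a tie rule). The name `hda_order` is illustrative.

```python
def hda_order(adj):
    """Return a full removal order, always deleting the highest-degree node."""
    # Working copy with self-loops dropped so that degrees can be updated in place.
    live = {u: set(vs) - {u} for u, vs in adj.items()}
    order = []
    while live:
        target = max(live, key=lambda u: len(live[u]))
        # Removing the target lowers the degree of each of its neighbors.
        for v in live.pop(target):
            live[v].discard(target)
        order.append(target)
    return order


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
path_order = hda_order(path)
```

For the five-node path this yields `[1, 3, 0, 2, 4]`: node 1 is the first node of maximal degree, after which node 3 is the only node left with degree 2. The linear scan for the maximum keeps the sketch short; a bucket or heap structure would be the usual choice for large graphs.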

High Betweenness Adaptive (HBA) [12]. HBA is the adaptive version of the high betweenness method, where the betweenness centrality of the remaining nodes is recomputed after each node removal. The betweenness centrality of a node equals the sum, over all pairs of nodes, of the fraction of shortest paths between the pair that pass through this node. It is a very useful centrality measure that benefits many network-related applications such as community detection and network vulnerability analysis. However, its high computing cost prohibits its use in large-scale problem settings.

High Closeness Adaptive (HCA) [4]. HCA is the adaptive version of the high closeness method. Closeness centrality describes how close a node is to all the other nodes in the graph. It is calculated as the reciprocal of the average distance from one node to all the others. As with HBA, its high computational cost prevents its application to large networks.

High PageRank Adaptive (HPRA) [6]. HPRA is the adaptive version of the high PageRank method. PageRank has been widely employed in search engines, as it provides a global ranking of all web pages, regardless of their content, based solely on their location in the Web’s graph structure [6]. PageRank computes the probability that a random-walking agent reaches each node in the network, which is also regarded as a useful signal for guiding a network attack.

Collective Influence (CI) [21]. The Collective Influence of a node is defined as the product of the node’s reduced degree (i.e., its original degree minus one) and the sum of the reduced degrees of the nodes located a fixed number of hops away from it. This measure describes the proportion of other nodes that can be reached from a given node, under the assumption that nodes with higher CI values play more crucial roles in the network. The CI method sequentially removes the node with the highest CI value and recalculates the collective influence of the remaining nodes after each removal.
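The CI score of a single node can be sketched as below. The sketch assumes that the sum runs over the nodes exactly `radius` hops away (the frontier of the ball, as in the original CI definition [21]); if the sum is instead meant to cover every node within that distance, the frontier test would need to change. The function name and the default `radius=2` are illustrative choices.

```python
from collections import deque


def collective_influence(adj, node, radius=2):
    """(k_i - 1) times the sum of (k_j - 1) over nodes j at distance `radius`."""
    dist = {node: 0}
    queue = deque([node])
    frontier_sum = 0
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            # Frontier node: add its reduced degree and do not expand further.
            frontier_sum += len(adj[u]) - 1
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj[node]) - 1) * frontier_sum


# Seven-node path 0-1-...-6; node 3 is the center.
path7 = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
ci_center = collective_influence(path7, 3, radius=2)
```

For the center of the seven-node path, the frontier at two hops is {1, 5}, each with reduced degree 1, so `ci_center` equals 2. With `radius=3` the frontier consists of the two end nodes, whose reduced degree is 0, and the score drops to 0.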

MinSum [5]. MinSum was proposed to address the network dismantling problem. It consists of three stages. It first uses a variant of a message-passing algorithm to break all the cycles, and then breaks the remaining trees into small components by removing a fraction of nodes that vanishes in the large size limit. In the third stage, it greedily reinserts some nodes that close cycles without increasing the size of the largest component too much, in order to reduce the total number of nodes removed.

Belief Propagation-guided Decimation (BPD) [22]. BPD is very similar to MinSum and consists of the same three stages. The difference is that BPD treats the decycling problem as a minimum-FVS construction. The FVS is the feedback vertex set, i.e., a set of nodes whose deletion turns the network into a forest. To solve this problem, BPD proposes a belief propagation-guided decimation algorithm. Afterwards, it performs the same subsequent steps, namely tree breaking and node reinsertion.

CoreHD [31]. CoreHD also consists of three similar stages, and differs only in the decycling stage. Instead of a message-passing or belief-propagation algorithm, CoreHD seeks to remove the minimum number of nodes needed to empty the 2-core of the network, since a network is acyclic if and only if its 2-core is empty. CoreHD greedily removes the highest-degree node in the 2-core until the 2-core is empty.
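The decycling stage described above can be sketched as follows. This covers only the 2-core pruning and greedy highest-degree removal; the later tree-breaking and reinsertion stages are omitted. The function names are hypothetical, the graph is assumed to be an undirected adjacency dictionary, and ties are broken by dictionary order.

```python
def prune_to_two_core(live):
    """Delete nodes of degree below 2, in place, until none remain."""
    stack = [u for u in live if len(live[u]) < 2]
    while stack:
        u = stack.pop()
        if u not in live:
            continue  # already pruned via another neighbor
        for v in live.pop(u):
            live[v].discard(u)
            if len(live[v]) < 2:
                stack.append(v)
    return live


def corehd_decycle(adj):
    """Return the nodes removed until the 2-core is empty (graph becomes a forest)."""
    live = {u: set(vs) - {u} for u, vs in adj.items()}
    removed = []
    prune_to_two_core(live)
    while live:
        # Highest-degree node inside the current 2-core.
        target = max(live, key=lambda u: len(live[u]))
        for v in live.pop(target):
            live[v].discard(target)
        removed.append(target)
        prune_to_two_core(live)
    return removed


# Two triangles sharing the hub 'h': removing 'h' breaks both cycles.
bowtie = {
    "h": ["a", "b", "c", "d"],
    "a": ["h", "b"], "b": ["h", "a"],
    "c": ["h", "d"], "d": ["h", "c"],
}
decycling_set = corehd_decycle(bowtie)
```

For the toy graph the result is `['h']`: the shared hub has degree 4 inside the 2-core, and once it is gone the remaining edges a-b and c-d are pruned as trees. Pruned nodes are not counted as removed, since they stay in the network.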

GND [25]. GND is the state-of-the-art method for the network dismantling problem with non-unit removal costs. It first defines a node-weighted Laplacian, and then proposes a simple and elegant approximate algorithm to calculate the eigenvector associated with its second smallest eigenvalue, based on which the set of nodes to remove is chosen. GND repeats the process until the end. Note that the unit-cost GND is just the spectral cut method.

We use the SNAP software (Footnote 1) to implement the heuristic methods, including Random, HDA, HBA, HCA and HPRA. For the other baselines, we use the source codes released online (Footnotes 2 to 6), with the default parameter settings for each method.

Synthetic graphs

We evaluate all competitors on various synthetic networks. Synthetic networks are the result of applying a generative function, and have the advantage of displaying specific topological features that are both known a priori and tunable [29]. More specifically, we select the 4 most common network types, summarized in Table 1. Note that there are many other random network models, such as regular graphs, circle graphs, grid graphs, ladder graphs, etc. We do not consider them since they are not difficult to dismantle, and effective heuristic methods always exist for them.

Table 1 Overview of four random network types


Erdős-Rényi (ER) [8]. The ER model was first introduced by Paul Erdős and Alfréd Rényi. It returns a G_{n,p} graph, where n is the number of nodes and p is the edge creation probability: each of the possible edges is included independently with probability p. This model can be used in the probabilistic method to prove the existence of graphs satisfying various properties, or to provide a rigorous definition of what it means for a property to hold for almost all graphs [8].
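A minimal sketch of the G_{n,p} construction is shown below, assuming nodes are labeled 0 to n-1 and the graph is returned as an adjacency dictionary of sets. The function name and the `seed` parameter are illustrative and do not correspond to the generator used in the experiments.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample G_{n,p}: every one of the n*(n-1)/2 possible edges appears with probability p."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


sample = erdos_renyi(30, 0.2, seed=7)
```

This loop visits every node pair and therefore costs O(n^2) time, which is fine for small test graphs; the expected number of edges is p·n(n-1)/2.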

Watts-Strogatz (WS) [30]. WS is a random generative model that produces graphs with small-world properties, including short average path lengths and high clustering. It was proposed by Duncan J. Watts and Steven Strogatz in 1998. The tunable parameters are the number of nodes n, the number k of nearest neighbors in a ring topology to which each node is joined, and the probability p of rewiring each edge.

Barabási-Albert (BA) [3]. BA is a model that generates random scale-free networks using a preferential attachment mechanism. Many real-world networks are thought to be approximately scale-free and contain a few nodes (called hubs) with unusually high degree compared to the other nodes. The BA model tries to explain the existence of such nodes in real networks. The algorithm is named after its inventors Albert-László Barabási and Réka Albert and is a special case of a more general model called Price’s model [28]. It generates a graph of n nodes by adding new nodes one at a time, each with m edges that are preferentially attached to existing nodes of high degree.
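Preferential attachment can be sketched with a list in which every node appears once per incident edge, so that a uniform draw from the list selects a node with probability proportional to its degree. The sketch below assumes a seed of m isolated nodes to which the first new node connects; this initialization, like the function name, is an assumption made for illustration.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches m edges preferentially."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    targets = list(range(m))  # the first new node links to the m seed nodes
    repeated = []             # each node listed once per incident edge
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Draw m distinct targets for the next node, proportionally to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj


ba_graph = barabasi_albert(50, 3, seed=1)
```

Every node added after the seed contributes exactly m new edges, so the graph has m·(n − m) edges in total, and early nodes tend to accumulate the highest degrees.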

Powerlaw-Cluster (PLC) [13]. PLC is a model for generating graphs with a power-law degree distribution and approximate average clustering. It is essentially the BA growth model with an extra step: each random edge is followed by a chance of adding an edge to one of its neighbors too (and thus forming a triangle) [13]. The model improves on BA in the sense that it enables a higher average clustering to be attained if desired. The tunable parameters are the number of nodes n, the number m of random edges to add for each new node, and the probability p of adding a triangle after adding a random edge.

Figure 3 visualizes one instance of each of the above four network types.

Real-world networks

We also conduct experiments on 18 real-world networks, which cover a wide range of domains, including malicious networks, PPI networks, infrastructure networks, social networks, citation networks, communication networks, etc. Specifically, they are:

Corruption [26], a malicious network where nodes are people listed in scandals, and the ties indicate that two people were involved in the same corruption scandal;

Crime [16], a malicious network obtained from the projection of a bipartite network of persons and crimes. Each node denotes a person, and an edge indicates that two persons were involved in the same crime;

USairport [16], a network of flights between US airports in 2010. Each node is an airport, and each edge represents a connection from one airport to another;

Hamster [16], a network of friendships and family links between users of the website hamster.com;

Figeys [16], a network of interactions between proteins in Humans (Homo sapiens), from the first large-scale study of protein–protein interactions in Human cells using a mass spectrometry-based approach;

CA-GrQc [18], a collaboration network from the e-print arXiv that covers scientific collaborations between authors of papers submitted to the General Relativity and Quantum Cosmology category;

HI-II-14, the corresponding Human Interactome dataset covering Space II and reported in 2014. Each node represents a distinct protein, each edge denotes the interaction between the corresponding proteins;

Powergrid [16], a power grid network of the Western States of the United States of America. An edge represents a power supply line. A node is either a generator, a transformer or a substation;

CA-HepPh [18], a collaboration network from the e-print arXiv that covers scientific collaborations between authors of papers submitted to the High Energy Physics - Phenomenology category;

DBLP [16], a citation network of DBLP, a database of scientific publications such as papers and books. Each node in the network is a publication, and each edge represents a citation of a publication by another publication;

Cora [16], a citation network of Cora. Nodes represent scientific papers. An edge between two nodes indicates that the left node cites the right node;

Digg [16], a reply network of the social news website Digg. Each node in the network is a user of the website, and each edge denotes that a user replied to another user;

Email-Enron [20], the Enron email communication network which covers all the email communication within a data set of around half million emails. Each node is an email address, and an edge denotes at least one email communication;

Brightkite [16], a social network containing user–user friendship relations from Brightkite, a former location-based social network where users shared their locations. A node represents a user, and an edge indicates that a friendship exists between the user represented by the left node and the user represented by the right node;

Gnutella31 [19], a sequence of snapshots of the Gnutella peer-to-peer file sharing network from August 2002. Nodes represent hosts in the Gnutella network topology and edges represent connections between the Gnutella hosts;

Facebook [16], a network containing friendship data of a small subset of Facebook users. A node represents a user and an edge represents a friendship between two users;

Epinion [16], the trust network from the online social network Epinions. Nodes are users of Epinions and directed edges represent trust between the users;

Douban [16], a social network of Douban, a Chinese online recommendation site. A node represents a user of Douban and an edge represents a friendship between two users.

We treat all the networks as undirected and remove self-loops. We then extract the largest connected component of each network. Basic statistics of the extracted networks are reported in Table 2.
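The preprocessing steps above can be sketched as a single function, assuming the raw data is an iterable of (u, v) edge pairs. The function name is hypothetical, and nodes that only appear in self-loops are dropped as a side effect of this particular implementation.

```python
from collections import deque


def largest_component(edges):
    """Symmetrize edges, drop self-loops and keep the largest connected component."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loop
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects the component containing `start`.
        comp = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {u: adj[u] for u in best}


raw_edges = [(1, 2), (2, 1), (2, 3), (3, 3), (4, 5)]
giant = largest_component(raw_edges)
```

For the toy edge list, the duplicate directed pair (1, 2)/(2, 1) collapses into one undirected edge, the self-loop on node 3 is discarded, and the smaller component {4, 5} is dropped, leaving the path 1-2-3.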

Table 2 Basic statistics for real-world networks. Ordered by the number of nodes. Values are for the giant component of the network. MSP is the mean shortest path length, CC is the clustering coefficient, Assor is the assortativity, PE is the powerlaw exponent


We also draw the degree distributions of these networks in Fig. 4. We can see that most real networks (except the Corruption network) share an approximately scale-free structure, which is well known to be resilient against random failures but to disintegrate rapidly under intentional attacks targeting key nodes [2].

Results

In this section, we first demonstrate the effectiveness of the reinsertion technique on both synthetic graphs and real-world networks, then we explore the effects of different reinsertion techniques.

Synthetic results

We test all methods with and without the reinsertion technique on synthetic graphs randomly generated by the four classic models introduced in “Synthetic graphs” section. For each model, we generate 100 graphs with the parameters in Table 1, and report the mean and variance of the results. Table 3 shows the comparison results of Eq. 1 without reinsertion. We can clearly see that HBA is the best across different types of networks, which is consistent with previous research [14, 27], since HBA adaptively removes the highest betweenness nodes, which are key to the whole network connectivity. HCA, which adaptively removes the highest closeness nodes, also performs excellently for similar reasons. However, considering the high computational costs of these two methods (Table 6), they are not practical for large or even medium-scale networks. We can also see that all methods achieve good results on ER graphs, since these graphs are purely random and contain no ’critical’ nodes that determine the graph connectivity.

Table 3 Comparison results (%) on synthetic graphs without reinsertion. Each result is averaged over 100 test instances. The result format is mean ±variance. The bold ones indicate the best results for that network


In Table 4, we enhance each method with the reinsertion technique introduced in “Reinsertion technique” section and report the refined results. We also show the promotion, i.e., the relative improvement defined in Eq. 2, after adding reinsertion in Table 5. We can see that most methods (except for HBA, HCA and GND) improve after using reinsertion, and on average, HPRA (reinserted) performs the best among all. We also observe two interesting things: i) The best-performing method, HBA, deteriorates greatly when combined with reinsertion; however, even the best result among the reinserted methods (HPRA) cannot beat the vanilla HBA (Table 3). This indicates that the vanilla HBA has achieved close-to-optimal performance for the network dismantling problem, at which point reinsertion is no longer a refinement but a hindrance; ii) The pure Random strategy improves greatly with reinsertion, which brings the reinserted random strategy close to the manually designed state-of-the-art methods.

$$ \text{promotion} = \frac{R_{\text{original}} - R_{\text{reinsert}}}{R_{\text{original}}} \tag{2} $$
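As a small illustration of Eq. 2 and its sign convention, the helper below returns a positive value when reinsertion lowers R (an improvement, since smaller R is better) and a negative value when it raises R. The function name and the input values are hypothetical and are not results from the tables.

```python
def promotion(r_original, r_reinsert):
    """Relative reduction of R after reinsertion (Eq. 2)."""
    if r_original <= 0:
        raise ValueError("r_original must be positive")
    return (r_original - r_reinsert) / r_original


# Hypothetical values, chosen only to show the sign of the result.
gain = promotion(0.40, 0.24)  # R decreased: positive promotion
loss = promotion(0.20, 0.30)  # R increased: negative promotion
```

With these illustrative inputs, `gain` is 0.4 (a 40% reduction of R) and `loss` is −0.5, which corresponds to the case where reinsertion acts as a hindrance.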

Table 4 Comparison results (%) on synthetic graphs with reinsertion. Each result is averaged over 100 test instances. The result format is mean ±variance. The bold ones indicate the best results for that network


Table 5 The promotion of R (%) on synthetic graphs with reinsertion. Each result is averaged over 100 test instances. The result format is mean ±variance. The bold ones indicate the best results for that network


Table 6 Time (s) comparison of different methods on synthetic graphs. Each result is averaged over 100 test instances. The result format is mean ±variance. The bold ones indicate the best results for that network


However, when running time is taken into account, we find that the simple heuristic HDA actually achieves the best balance between effectiveness and efficiency (Tables 3, 4 and 6). The reinserted HDA is only 1.74% worse than the best result (vanilla HBA), while being hundreds of times faster (Table 6). Note that we do not list the time for the Random strategy, since obtaining a random solution takes essentially no time.

Real-world results

We now examine the effects of reinsertion on real-world networks. Since HBA and HCA are computationally prohibitive on medium or large networks (e.g., HBA takes over 5 days to finish computation on the Cora network, which has 23,166 nodes and 89,157 edges), we do not compare with them in this section.

Table 7 shows the results of the vanilla methods without reinsertion. We can see that HDA, HPRA and GND perform relatively better than the other methods; HPRA is the best among all (0.0986), followed by HDA (0.1043). Table 8 gives the results after reinsertion, and Table 9 shows the promotion results. Consistent with the observations from the synthetic results, most methods improve to varying degrees when refined with reinsertion. For example, the random strategy obtains an average gain of 71.56% (Table 9) with reinsertion, which makes it beat even the state-of-the-art MinSum strategy (Table 8). Among the reinserted methods, HDA achieves the best performance, with an average robustness score (Eq. 1) of 0.0938 (Table 8). However, GND deteriorates on some networks when refined with reinsertion (Table 8); the reason remains to be explored. In terms of execution time (Table 10), HDA is far more efficient than the other methods, e.g., it is about 767 times faster than HPRA, which is very close to HDA in effectiveness.

Table 7 Comparison results (%) on real-world networks without reinsertion. The bold result is the best one of that network


Table 8 Comparison results (%) on real-world networks with reinsertion. The bold result is the best one of that network


Table 9 The promotion of R (%) on real-world networks with reinsertion. The bold result is the best one of that network


Effects of different reinsertion strategies

We have observed the impressive gains brought by the reinsertion technique in “Synthetic results” and “Real-world results” sections. Now we may ask: is the reinsertion in “Reinsertion technique” section the best one? Do more effective reinsertion methods exist? In this section, we try to answer these questions by exploring other potential reinsertion techniques (Table 10).

Table 10 Time (s) comparison of different methods on real-world networks. The bold result is the best one of that network


We name the previous reinsertion method Reinsert_I, and propose two other ones, called Reinsert_II and Reinsert_III respectively. Basically, the general reinsertion technique adds back one of the removed nodes (together with its adjacent edges), chosen according to some criterion, until all nodes are back in the network. Different reinsertion methods use different criteria, based on which we define the following three reinsertion strategies:

Reinsert_I: reinsert the node that, once reinserted, joins the smallest number of clusters;
Reinsert_II: reinsert the node that, once reinserted, joins the clusters of the smallest size;
Reinsert_III: reinsert the node that, once reinserted, joins the clusters minimizing the product of their number and size;

In Fig. 2, each node is assigned an index c(i) given by the criterion of the chosen reinsertion technique. For Reinsert_I, c(red)=2, c(blue)=4, c(green)=3, so the red node is reinserted; for Reinsert_II, c(red)=10, c(blue)=5, c(green)=6, so the blue node is reinserted; for Reinsert_III, c(red)=20, c(blue)=20, c(green)=18, so the green node is reinserted. After that, the c(i) values are recalculated and the new node with the smallest c(i) is found and reinserted. These steps are repeated until the end. As a consequence, different reinsertion strategies select different nodes to be reinserted first, leading to different refinement results. To decide which one is better in practice, we compare the average performance promotion for each method on both synthetic graphs and real-world networks (Tables 11 and 12).
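The three criteria can be compared side by side with a small scoring helper. The sketch assumes that the Reinsert_II index is the total size of the clusters a node would join, which is consistent with the Reinsert_III values above being the product of the other two but is not stated explicitly. The cluster sizes below are hypothetical, chosen only so that the scores reproduce the c(i) values quoted for the red, blue and green nodes.

```python
def reinsertion_scores(cluster_sizes):
    """Scores c(i) of one candidate under the three strategies.

    `cluster_sizes` lists the sizes of the distinct clusters the node would join.
    """
    count = len(cluster_sizes)
    total = sum(cluster_sizes)  # assumption: Reinsert_II uses the total size
    return {
        "Reinsert_I": count,
        "Reinsert_II": total,
        "Reinsert_III": count * total,
    }


# Hypothetical cluster sizes matching the quoted c(i) values.
candidates = {
    "red": [4, 6],         # 2 clusters, total size 10
    "blue": [1, 1, 1, 2],  # 4 clusters, total size 5
    "green": [1, 2, 3],    # 3 clusters, total size 6
}
scores = {name: reinsertion_scores(sizes) for name, sizes in candidates.items()}

# Node chosen first under each strategy (smallest score wins).
first_choice = {
    strategy: min(candidates, key=lambda name: scores[name][strategy])
    for strategy in ("Reinsert_I", "Reinsert_II", "Reinsert_III")
}
```

Under these assumptions `first_choice` maps Reinsert_I to red, Reinsert_II to blue and Reinsert_III to green, matching the selections described in the text. In a full reinsertion loop the scores would be recomputed after every reinsertion.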

Table 11 Average promotion of R on synthetic graphs for different reinsertion techniques. Each result is averaged over all test graphs (including four types of graphs, and 100 graphs for each type), and the result format is mean ±variance. The bold result is the best one for that method


Table 12 Average promotion of R on real-world networks for different reinsertion techniques. Each result is averaged over all test networks (total 18 real-world networks), and the bold ones are the best results for that method


It can be clearly observed in Tables 11 and 12 that Reinsert_III achieves the largest promotion for most methods (except CI) on both synthetic and real-world networks, compared with the other two reinsertion strategies, and outperforms the current strategy Reinsert_I by a significant margin. For the CI method, Reinsert_I tends to be more effective. All three reinsertion strategies fail for HBA and GND.

To illustrate the effects of these three strategies more intuitively, we draw the robustness curves of the CA-GrQc network for different methods with different reinsertion strategies in Fig. 5, with the horizontal axis being the fraction of removed nodes and the vertical axis being the remaining giant connected component size. The value of Eq. 1 approximates the area under the robustness curve. The figure clearly shows that reinsertion greatly reduces the area under the curve compared to the original method, and that Reinsert_III is among the most effective strategies, while all the reinsertion strategies have a negative effect on the GND method.

Conclusion

In this paper, we, for the first time, systematically explore the effects of reinsertion techniques for the network dismantling problem. Previous research tends to compare reinserted results with un-reinserted baseline methods, which may mislead the selection of the truly best dismantling strategy for the application at hand. We conduct comprehensive ablation studies on both synthetic graphs generated by four classical random network models, i.e., ER, WS, BA and PLC, and 18 real-world networks across seven different domains and with different scales. The results show that: i) HBA (no reinsertion) is the most effective network dismantling strategy, but it is only applicable to small-scale networks; ii) HDA (with reinsertion) achieves the best balance between effectiveness and efficiency. It is surprising that such a simple heuristic method beats most state-of-the-art methods when enhanced with reinsertion; iii) The reinsertion technique improves the performance of most current methods, except for HBA, HCA and GND (on small-world type graphs); iv) Reinsert_III, which selects the node that joins the clusters minimizing the product of their number and size, is the most effective reinsertion strategy for most methods (except for CI, where Reinsert_I suits best). We believe the results in this paper can serve as a reference for choosing and designing the most effective strategy for realistic network dismantling applications.

However, we still lack a deep understanding of why such a simple reinsertion technique works so well for the network dismantling problem, which would be a meaningful topic for future research. We will later release the code and data to support research in this direction.
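As a rough illustration of the Reinsert_III rule summarized in this section, the following sketch greedily adds removed nodes back into a dismantled network. It reflects one possible reading of the rule and is not the authors' code: the score (number of clusters a node would join multiplied by the size of the resulting merged cluster), the stopping threshold `max_gcc`, the tie-breaking by node order, and all function names are assumptions introduced for this example.

```python
def components(adj, removed):
    """Label every present node with a component id and return the sizes."""
    label, sizes = {}, []
    for start in adj:
        if start in removed or start in label:
            continue
        cid = len(sizes)
        label[start] = cid
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in removed and v not in label:
                    label[v] = cid
                    stack.append(v)
        sizes.append(size)
    return label, sizes


def reinsert_product_rule(adj, removed, max_gcc):
    """Greedily add removed nodes back while merged clusters stay <= max_gcc.

    Assumed score: (number of clusters joined) * (size of merged cluster).
    Nodes must be sortable; ties are broken by the smallest node.
    Returns the set of nodes that remain removed.
    """
    removed = set(removed)
    while removed:
        label, sizes = components(adj, removed)
        best = None  # (score, merged_size, node)
        for node in sorted(removed):
            joined = {label[v] for v in adj[node] if v not in removed}
            merged = 1 + sum(sizes[c] for c in joined)
            score = len(joined) * merged
            if merged <= max_gcc and (best is None or (score, merged) < best[:2]):
                best = (score, merged, node)
        if best is None:
            break  # every remaining candidate would exceed the size limit
        removed.discard(best[2])
    return removed


# Toy example (hypothetical data): a 7-node path graph 0-1-...-6 in which
# nodes 1, 3 and 5 were removed, with clusters allowed to hold up to 3 nodes.
path7 = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(reinsert_product_rule(path7, {1, 3, 5}, max_gcc=3))
```

In this toy case two of the three removed nodes can be put back without any cluster exceeding three nodes, so the removal set shrinks. Recomputing the components after every reinsertion keeps the sketch short but is quadratic in the worst case; a union-find structure would be the natural choice for larger graphs.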

Availability of data and materials

Upon reasonable request, all code and data used in the analysis will be made available to any researcher for the purpose of reproducing or extending the analysis.

References

Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47.
Albert R, Jeong H, Barabási AL (2000) Error and attack tolerance of complex networks. Nature 406(6794):378.
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512.
Bavelas A (1950) Communication patterns in task-oriented groups. J Acoust Soc Am 22(6):725–730.
Braunstein A, Dall’Asta L, Semerjian G, Zdeborová L (2016) Network dismantling. Proc Natl Acad Sci 113(44):12368–12373.
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1-7):107–117.
Cohen R, Erez K, Ben-Avraham D, Havlin S (2001) Breakdown of the internet under intentional attack. Phys Rev Lett 86(16):3682.
Erdős P, Rényi A (1959) On random graphs I. Publ Math Debrecen 6:290–297.
Fan C, Xiao K, Xiu B, Lv G (2014) A fuzzy clustering algorithm to detect criminals without prior information. In: Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 238–243. IEEE Press.
Fan C, Liu Z, Lu X, Xiu B, Chen Q (2017) An efficient link prediction index for complex military organization. Phys A Stat Mech Appl 469:572–587.
Fan C, Zeng L, Ding Y, Chen M, Sun Y, Liu Z (2019) Learning to identify high betweenness centrality nodes from scratch: A novel graph neural network approach. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 559–568.
Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry: 35–41.
Holme P, Kim BJ (2002) Growing scale-free networks with tunable clustering. Phys Rev E 65(2):026107.
Holme P, Kim BJ, Yoon CN, Han SK (2002) Attack vulnerability of complex networks. Phys Rev E 65(5):056109.
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 137–146. ACM.
Kunegis J (2013) Konect: the koblenz network collection. In: Proceedings of the 22nd International Conference on World Wide Web, 1343–1350. ACM.
Lalou M, Tahraoui MA, Kheddouci H (2018) The critical node detection problem in networks: a survey. Comput Sci Rev 28:92–117.
Leskovec J, Krevl A (2014) SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: Densification and shrinking diameters. ACM Trans Knowl Discov Data (TKDD) 1(1):2.
Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123.
Morone F, Makse HA (2015) Influence maximization in complex networks through optimal percolation. Nature 524(7563):65.
Mugisha S, Zhou HJ (2016) Identifying optimal targets of network attack by belief propagation. Phys Rev E 94(1):012305.
Pastor-Satorras R, Vespignani A (2001) Epidemic spreading in scale-free networks. Phys Rev Lett 86(14):3200.
Ren XL, Gleinig N, Helbing D, Antulov-Fantulin N (2018) Generalized network dismantling. arXiv preprint arXiv:1801.01357.
Ren XL, Gleinig N, Helbing D, Antulov-Fantulin N (2019) Generalized network dismantling. Proc Natl Acad Sci 116(14):6554–6559.
Ribeiro HV, Alves LG, Martins AF, Lenzi EK, Perc M (2018) The dynamical structure of political corruption networks. J Complex Netw 6(6):989–1003.
Schneider CM, Mihaljev T, Herrmann HJ (2012) Inverse targeting—an effective immunization strategy. EPL (Europhys Lett) 98(4):46002.
Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416.
Wandelt S, Sun X, Feng D, Zanin M, Havlin S (2018) A comparative analysis of approaches to network-dismantling. Sci Rep 8(1):13513.
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440.
Zdeborová L, Zhang P, Zhou HJ (2016) Fast and simple decycling and dismantling of networks. Sci Rep 6:37954.

Acknowledgments

We thank Yuan Liu for her valuable discussions.

Funding

This work was supported by a CSC scholarship offered by the China Scholarship Council and by NSFC-71701205.

Author information

Changjun Fan and Li Zeng contributed equally to this work.

Authors and Affiliations

College of Systems Engineering, National University of Defense Technology, Changsha, Hunan, China
Changjun Fan, Li Zeng, Yanghe Feng, Jincai Huang & Zhong Liu
School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong Province, China
Baoxin Xiu

Contributions

Authors’ contributions

CF and YF initiated the project. CF, YF and XB designed and managed the project. CF and LZ performed the calculations. All authors analyzed the results, and wrote and edited the manuscript. All authors read and approved the final manuscript.

Authors’ information

CF, LZ, YF, JH and ZL are all affiliated with the College of Systems Engineering, National University of Defense Technology. CF and LZ are both Ph.D. candidates. YF is an associate professor; JH and ZL are both professors. XB is a professor in the School of Systems Science and Engineering, Sun Yat-sen University.

Corresponding author

Correspondence to Changjun Fan.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Fan, C., Zeng, L., Feng, Y. et al. Revisiting the power of reinsertion for optimal targets of network attack. J Cloud Comp 9, 24 (2020). https://doi.org/10.1186/s13677-020-00169-8

Received: 30 June 2019
Accepted: 03 April 2020
Published: 07 May 2020
DOI: https://doi.org/10.1186/s13677-020-00169-8
