An enhanced state-aware model learning approach for security analysis in lightweight protocol implementations

Owing to the emergence and rapid advances of new-generation information and digitalization technologies, the concept of model-driven digital twin has received widespread attentions and is developing vigorously. Driven by data and simulators, the digital twin can create the virtual twins of physical objects to perform monitoring, simulation, prediction, optimization, and so on. Hence, the application of digital twin can increase efficiency and security of systems by providing reliable model and decision supports. In this paper, we propose a state-aware model learning method to simulate and analyze the lightweight protocol implementations in edge/cloud environments. We introduce the data flow of program execution and network interaction inputs/outputs (I/O) into the extended finite state machine (EFSM) to expand the modeling scope and insight. We aim to calibrate the states and construct an accurate state-machine model using a digital twin based layered approach to reasonably reflect the correlation of a device’s external behavior and internal data. This, in turn, improves our ability to verify the logic and evaluate the security for protocol implementations. This method firstly involves instrumenting the target device to monitor variable activity during its execution. We then employ learning algorithms to produce multiple rounds of message queries. Both the I/O data corresponding to these query sequences and the state calibration information derived from filtered memory variables are obtained through the mapper and execution monitor, respectively. These two aspects of information are combined to dynamically and incrementally construct the protocol’s state machine. We apply this method to develop SALearn and evaluate the effectiveness of SALearn on two lightweight protocol implementations. Our experimental results indicate that SALearn outperforms existing protocol model learning tools, achieving higher learning efficiency and uncovering more interesting states and security issues. In total, we identified two violation scenarios of rekey logic. These situations also reflect the differences in details between different implementations.


Introduction
With the flourishing development of smart and digital technologies including Internet of Things (IoT), the fifth-generation cellular network (5G), cloud computing, and big data, various kinds of intelligent and digitalized products are widely revolutionizing today's society [1][2][3].To provide efficient monitoring, modeling, analysis, and optimization from an overall perspective for these smart systems and devices, digital twin, an emerging technology based on model-driven to achieve physical-virtual convergence, is proposed [4,5].Digital twin can map the physical world to the digital world by using novel hybrid simulation and data-driven modeling approach [6].Therefore, the digital twin model can support design decisions, function tests, logic verifications, and performance statistics for the complex system [7].
To evaluate and test the functionality, logic, and security of network protocols in actual operation, model learning methods have been studied and applied in both researches and practical applications [8][9][10][11][12].These methods employ automata learning algorithms and corresponding learning frameworks to automatically establish a state-machine model of a protocol device's behavior.The inferred state-machine model is then analyzed to uncover vulnerabilities and violations thus improving the security of the protocol systems.Serving as a complementary technique in model-driven digital twin construction, model learning is characterized by its simplicity, intuitiveness, and broad adaptability.It offers a focused lens to examine the execution logic of protocols and identify anomalous behaviors [13][14][15].
Most current protocol automata learning are based on black-box setup, focusing exclusively on I/O interactions.They assume that the model is finite and employ learning algorithms (such as L* [16] or TTT [17]) to infer hypotheses of the model.They also set an upper limit, known as the testing depth, for input sequence combinations and perform equivalence queries, such as the Wp method [18], to confirm if the inferred hypothesis aligns with the true model.While this black-box approach is straightforward and requires minimal setup, it tends to produce models that only reflect observable interactions, neglecting the internal complexities of the program.This limitation gives rise to several challenges in inferring and analyzing protocol state machines: 1. Difficulty in inferring accurate models: Black-box learning verifies the hypotheses through equivalence tests constrained by a set testing depth [8][9][10].
If the testing depth is too low, the model may miss critical behaviors, such as a backdoor activated by a series of repeated messages.However, increasing the depth exponentially elevates query costs, particularly in worst-case scenarios.Restricted by learning time and query limitations, black-box learning struggles to uncover deep and concealed protocol states, impeding the inference of an accurate model.2. Limitations in identifying security risks: Since blackbox learning relies entirely on I/O observations, it lacks the capability to perceive or interpret other vital information within the protocol such as cryptographic primitive algorithms [11,12,19].This limitation not only hampers the discovery of logical issues at different program state nodes-such as authentication bypasses or key leakages-but also impedes the detection of code implementation problems, such as memory leaks, buffer overflows, or null pointer invocations.

Challenges in filtering redundant queries:
To ensure the completeness of the state-machine model, blackbox learning typically queries all messages in every possible state.However, for certain states where the tested program has terminated its interactions-such as disconnection or periods of implicit silence-these queries become not only time-consuming but also minimally informative.
The aforementioned issues arise from the conventional black-box learning framework's exclusive reliance on I/O observations.Mere parameter modifications cannot overcome these limitations.Some scholars proposed potential solutions including learning register automatas from source code based on taint analysis [20], inferring state machines via symbolic execution [21], and deriving models from specification documents [22].However, these methods are generally applicable only for smallscale protocols or fall short in revealing vulnerabilities in real-world implementations.
To enhance the perception of state machines, a more comprehensive approach is required.In this study, we propose a state-aware model learning method that synergistically integrates network interaction I/O with filtered dynamic lifetime memory variables.This integration enables the inferred state-machine model to provide a more nuanced understanding of the system under learning (SUL), increasing the probability of discovering deep-level abnormal states and behaviors.As illustrated in Fig. 1, the first step involves instrumenting the SUL to monitor variables used during testing.Subsequently, the learning engine generates multiple rounds of message queries based on the learning algorithms.A mapper retrieves the results from these interactions with the SUL, while an execution monitor gathers state calibration information from filtered memory variables.Utilizing both I/O data and state calibration information, the learning engine incrementally constructs the protocol extended finite state machine (EFSM).This workflow continues until the learning process is completed, resulting in the final output of the state-machine model.For the inferred state machine, further analysis can be conducted to identify the potential suspicious paths that correspond to the search for program vulnerabilities.
We applied this novel method to develop SALearn, focusing on IKEv2 as our protocol for analysis.In recent studies, a lightweight version of the IKEv2 protocol has been standardized [23] and has been applied for multiple security schemes in edge/cloud environments [24][25][26].We conducted tests on two widely used IKEv2 implementations-StrongSwan and Libreswan-and compared our method with existing model learning methods.Our experimental results demonstrate that state-aware model learning not only reduces the learning time but also exhibits greater adaptability, enabling the discovery of more intriguing states and security issues.Specifically, we identified two violation scenarios as follows: abnormal management when rekeying IPsec SA in Strongswan and anomalous logic of rekeying IKE SA in Libreswan.
Our contributions in this study can be summarized as follows: 1. First, we propose a state-aware model learning method that enhances the quality of inferred states by correlating them more closely with the internal dynamics of the program.Automated model-driven digital twins reduce barriers to understanding the behaviors and analyzing the security for lightweight protocol implementations.2. Second, we develop SALearn based on the stateaware model learning approach, designing a testing scheme and implementing a mapper for IKEv2 model learning.3. Third, we conduct an extensive evaluation of two widely used IKEv2 implementations: Strongswan and Libreswan.Our results demonstrate that SALearn is more efficient in learning and is capable of discovering more interesting states and security issues.We also discuss the impacts of the two specific violations we identified.
The remainder of this paper is structured as follows: Related works section provides an overview of the related work in the fields of model learning and state-aware fuzzing test.Preliminaries section offers background knowledge of network protocol models and IPsec.Stateaware model learning framework using digital twins section unveils a comprehensive framework and methodology for state-aware model learning.Experimental evaluation and analysis section introduces the results of our experimental evaluation and analysis.Finally, Conclusion section concludes the article and proposes potential directions for future research.

Related works
Program states serve as observable running attributes that can differentiate program behaviors or can be combined with certain specifications to determine the correctness of those behaviors.These states can further guide model inference or optimize adversarial test cases [27].In the realm of network protocols, which are engineered to fulfill communication, program states can be expressed from two levels: how the program processes response events, known as its external representation, and the data context within which the program operates, known as its internal representation [28].Academic research in protocol analysis and testing has been extensive, which includes model learning and state-aware fuzzing tests.

Model learning
Model learning, also known as state-machine inference, is a technique used for constructing state-machine models of both software and hardware systems.This is done by providing input and observing the corresponding output, which is crucial for comprehending the system's functionality and behavior [29].Existing research in this field can be broadly classified into two categories: active learning, which involves generating queries and analyzing responses, and passive learning, which is based on collected samples.Our study predominantly focuses on the active learning approach.The concept of model learning was formalized in 1987 when Angluin proposed the L* learning algorithm, providing a foundational framework for modeling reactive systems [16] .In 1999, Peled et al. [30] applied model learning to software analysis.More recently, the

State machine
Output Vulnerabilities Further analysis Fig. 1 Basic framework of state-aware based model learning method model learning has been widely employed in network protocol testing [31].For instance, Joeri de Ruiter and Poll [8] grabbed scholarly attention with their work on analyzing SSL/TLS protocols using state-machine inference.They created a tool called StateLearner for black-box state-machine inference and applied it to nine SSL/TLS implementations.Their approach led to the discovery of three security vulnerabilities, including client authentication bypasses, encrypted data leakages, and anomalies during rekeying.Stone et al. [9] conducted state-machine inference to seven Wi-Fi implementations and unearthed two downgrade attacks and one encrypted multicast leakage.Brostean et al. [10] inferred state-machine models from three SSH implementations, identifying seven subtle violations of specifications.Furthermore, Brostean et al. [11] proposed a protocol state fuzzing framework for DTLS to analyze 13 widely used DTLS servers and discover four security vulnerabilities, including client authentication bypass and handshake sequencing anomalies.Moreover, Brostean et al. [12] inferred state machines and constructed bug pattern catalogs to test implementation flaws.They tested 12 SSH and DTLS implementations and discovered 96 new vulnerabilities and errors.
Furthermore, model learning can be synergistically combined with other techniques such as symbolic execution, formal analysis, and fuzzing tests to broaden its scope and efficacy.For example, Marcovich et al. [22] combined model learning and symbolic execution to directly infer state machines and message formats from binary codes.They developed PISE, a tool based on this hybrid method, which successfully inferred the command-and-control (C &C) protocol state machine for the Gh0st RAT malware.In a similar vein, Wang et al. [32] combined model learning with formal analysis to create an automated solution, MPInspector, designed to scrutinize the security of message-passing (MP) protocols.When applied to nine popular IoT platforms-including MQTT, CoAP, and AMQP-MPInspector identified 252 attribute violations and proposed 11 different types of attacks within two realistic scenarios.Brostean et al. [33]  values from a software perspective, as well as virtual memory and register states from a hardware perspective [28].These state variables are typically accessed and shared by different segments of the program and can influence control flows or memory access pointers either directly or indirectly.Therefore, comprehensively exploring these states is beneficial to uncover hidden vulnerabilities in a program.
Recently, the field of state-aware fuzz testing has seen significant advancements.Aschermann et al. [35] proposed IJON, an innovative mechanism using artificial annotations to trace program states, which allows fuzzers to explore a program's behavior more systematically.Experimental evaluations indicate that IJON, based on manual annotations, can explore deeper into program states than traditional fuzzing or symbolic execution tools.Fioraldi et al. [27] developed InsvCov, which uses program invariants as boundaries to partition a program's state space.It employs a combination of control flow and program invariants as feedback for fuzz testing.Similarly, Pham et al. [36] designed AFLNET, which leverages server response codes as indicators of program states in network protocols.AFLNET not only performs substantially better than its counterparts but also identified two new CVEs.Natella [37] designed StateAFL, a tool that employs localized sensitive hashing on longlifetime variables to identify program states.It incrementally builds a protocol state machine for guiding the fuzzing process.Compared to AFLNET, StateAFL achieves comparable, if not superior, code coverage and produces more accurate state inferences.Ba et al. [38] developed SGFuzz, which annotates program states based on various variable types like enumerations and macros.Experimental data indicates that SGFuzz discovers state errors and achieves code coverage more quickly than other mainstream stateful fuzzers.It has also unearthed eight new CVEs.Wen et al. [39] proposed MemLock, a memory-usage-oriented fuzzing technique.It uses extreme memory allocation values as additional feedback to detect memory leaks, outperforming other fuzzing techniques and identifying 15 new CVEs.Zhou et al. [40] presented Ferry, a state-aware symbolic execution approach that considers conditionally branched and input-influenced variables.Experimental evaluations demonstrate that compared to existing tools, Ferry achieves broader code coverage and triggers more states and vulnerabilities.Zhao et al. [28] developed StateFuzz, which employs static analysis to select long-lifetime, real-time updated variables that influence program control flow.It employs a unique feedback mechanism, proving effective in discovering new vulnerabilities and identifying 15 new CVEs.However, these state-aware approaches mainly focus on optimizing test-case generation using coarse-grained combinations of variables.They lack the nuance needed for fine-grained state calibration and accurate state-machine inference.
Our study takes a different approach.We combine network interaction I/O with carefully selected dynamic lifetime memory variables for a more nuanced state calibration.Compared to black-box model learning, our method offers a deeper understanding of the program's internal context.Unlike Stone et al. 's gray-box approach [34], we directly obtain runtime variable content, bypassing limitations imposed by stack memory and multiple subprocesses.Lastly, distinct from state-aware fuzzing, our focus is on inferring a detailed state-machine model to improve the efficiency of discovering both crash errors and logic issues, setting our work apart from traditional state-aware fuzzing.

Preliminaries
In this section, we introduce the foundational concepts of network protocol models and the IPsec-IKE protocol.

Network protocol model
In general, a network protocol model depends on a server operating in a continuous cycle of receiving, processing, and responding to requests.In this scheme, both the client and server participate in a session, marked by a series of request and response messages.This basic loop can be succinctly captured by the following pseudocode in Algorithm 1:

Algorithm 1 A network server process model
As the session progresses, driven by various external events, the internal execution of the program adjusts and adapts, leading to updates in the program's state.In this context, "state" encompasses both the program's external expected behaviors and internal runtime data.
Externally, the protocol's state is typically reflected in terms of the actions the process is allowed to take, which events it expects to happen, and how it will respond to those events [37].Most internet protocols elucidate their standardized states either through natural language descriptions or, less commonly, through finite state machines, as comprehensively illustrated in RFC documents.These states are closely tied to the current stage of the protocol, where both inputs and outputs are fully contextualized within this framework.
Internally, the state of the program refers to a rigorously maintained execution environment that includes all presently active components.These facets are accessed and manipulated by various program operations, which in turn can impact the program's control flow or memory pointers [28].During the protocol's execution, variables and data are created, used, and eventually released, residing in both heap and stack memory.These variables undergo updates as the server handles each request and response interaction.Nonetheless, among these variables, there exist indeterminate values, such as those associated with time, random numbers, and temporary keys.Although they impact the program's operations, they lack the desirability of distinguishing and designating a concrete state.In this paper, we focus on calibrating the state using variables that record previous program operations or user interactions, possess deterministic values for the same operation, and are tied to critical nodes such as the client's current authentication status or working directory [37].By carefully filtering and monitoring these variables, we achieve a more nuanced understanding of the program's state.

IPsec-IKE protocol
IPsec [41][42][43] is a suite of security protocols that operate at the IP layer and is composed of three sub-protocols: IKE, AH, and ESP.The IKE protocol handles cryptographic algorithm selection, session key negotiation, and identity authentication.By contrast, AH and ESP are used for secure communication.
IKE serves as the signaling protocol for IPsec, overseeing secure communications by maintaining security associations (SAs): repositories of cipher and identity information for communicating parties.Initially, these parties set up an IKE SA, which is followed by negotiating an IPsec SA within the encrypted channel of the existing IKE SA.The IPsec SA is used for secure ESP or AH communications, while the IKE SA manages this IPsec SA.
IKE exists in two versions: IKEv1 [42] and IKEv2 [43].Compared to the multi-modal complexities of IKEv1, IKEv2 offers a streamlined approach with only four types of interactions: Figure 2 illustrates the generic interaction flow within the IKEv2 protocol.Detailed explanations of payload field meanings and symbol notations are available in RFC7296 [43].
In the initial phase of communication, the two parties commence with the negotiation of cryptographic parameters and identity authentication via IKE_SA_INIT and IKE_ AUTH interactions, respectively.Consequently, separate IKE SA and IPsec SA instances are established.The IPsec SA specifies parameters for safeguarding data communication.Moreover, during the course of communication, the CREATE_CHILD_SA interaction allows rekeying procedures, facilitating the creation of new IKE SA or IPsec SA instances to ensure forward secrecy.The communication session concludes when both parties utilize notification payloads to delete the established IPSec and IKE SAs.

State-aware model learning framework using digital twins
In the following section, we delve into the intricate details of the overarching insight, framework, and methodology for state-aware based model learning.We describe the techniques for defining states, capturing relevant state variables, and the specific learning processes involved.

Insights from digital twins
Linking digital twins with a state-aware model learning approach for security analysis in lightweight protocol implementations entails combining these two principles to improve digital system knowledge and protection.We can integrate the proposed model with in the digital twins through a layer approach as shown in Fig. 3.
1. Digital twin layer: This layer represents the system's digital twins, which are virtual representations of physical elements.Device twin: A digital twin that represents particular system devices.Communication twin: A digital twin that records the channels and patterns of communication between devices.Data flow twin: A digital twin that depicts the data flow within the system.This layer's digital twins capture real-time data using instrumentation and sensors implanted into the represented things.Using integrated data, the system discovers abnormalities in lightweight protocol implementations.This layer tries to improve system security by proactively addressing possible security concerns.

Overall framework
Figure 4 describes the design of our state-aware based model learning framework, which comprises five major components: the state-machine learning engine, intermediate mapper, execution monitor, SUL, and vulnerability analysis and exploitation modules.The state-machine learning engine employs a specific learning algorithm to generate multi-round message queries, process the results, and dynamically construct a protocol state-machine.It operates on an input alphabet consisting of abstract protocol messages.Guided by the learning algorithm, sequences of abstract message requests are formed and dispatched to the intermediate mapper for further processing.Upon receiving feedback sequences from the mapper and the state calibration identifications provided by the execution monitor, the learning algorithm incrementally constructs a protocol state machine, eventually yielding the final state-machine model.To enhance the learning process's efficiency and facilitate debugging, the engine comes equipped with features for query caching and logging.
The intermediate mapper serves as a critical intermediary, bridging the gap between abstract and concrete

Intermediate mapper
System under learning (SUL) The execution monitor plays a pivotal role in observing and filtering memory variables during the execution process of the SUL, it also calculates state calibration metrics in sync with specific watch points, thereby aiding the learning engine in crafting an accurate state machine.During each message interaction, the monitor identifies and logs variable values created by the SUL, focusing on those with lifetimes that align with the established watch points.Using hash calculations, it then derives the state calibration value pertinent to the current interaction.In this way, the execution monitor calculates the state calibration sequence that correlates with the query sequence from the learning engine, forwarding this information for the construction of the state machine.
The SUL functions as the protocol server under evaluation and can either be a white-box software library or a gray-box binary program that is susceptible to instrumentation.Before undergoing any tests, the SUL is compiled and instrumented to allow variable tracking during execution through shared memory.
The vulnerability analysis component scrutinizes the output state-machine model generated for the SUL, which involves using manual techniques or model checking methods to identify suspicious pathways within the state machine.These are then cross-referenced with predefined specifications or debugging tools to identify violations.

Description of state
To describe the state model, enriched with both I/O and memory information, we introduce the following definitions: Definition 1 An alphabet is a finite, nonempty set of single letters, denoted as , such as = a, b, ... .The concatenation of letters is denoted as a • b , where the concatenation symbol • can be omitted.State machines with memory calibration have graphical representations that describe states as nodes and state transitions as edges.Figure 5 illustrates a simple graphical example.When the state machine is in state s 0 ∈ S and receives input i ∈ I , the state transition δ(s 0 , i) = s 1 and the output (s 0 , i) = (o 1 , m 1 ) correspond to an edge from s 0 to s 1 in the graph with the label i/(o 1 , m 1 ) .Because memory-calibrated strings are only used to distinguish between different states and not as inputs for state transitions, they can be omitted from graphical representations.Definition 3 A finite sequence of concatenated letters is called a string and is denoted as ω , and the set of strings it forms is denoted as * .The concatenation of letters can be extended to the concatenation of strings, denoted as ω 1 • ω 2 .In particular, ǫ denotes the empty string with a Fig. 5 An example of state machine length of 0. Similarly, I * ⊂ * represents the set of input containing strings.Definition 4 δ can be extended to * by defining δ * : S × � * → S , where δ denotes a special form of δ * when the length of the second parameter is 1.Since it does not cause ambiguity, δ can be used instead of δ * to represent the state transi- tion function.

Capture memory state variables
In the EFSM model, states are described by a combination of network I/O behaviors and memory calibrations.Memory calibration, calculated from a specific set of memory variables during state transition, plays a significant role in determining the program's behavior through that transition.We elaborate on these concepts as follows: Definition 7 Dynamic lifetime state variables refer to a group of memory variables that consistently produce the (1) same value for identical inputs during a state transition instigated by message interactions.Importantly, the lifetime of these variables encompasses the time interval defined by the watch point range for that particular transition.Watch point range refers to the time interval divided by the watch point.We define the watch points as follows: Definition 8 Watch points refer to specific temporal markers associated with the start or end of learning queries, as well as any alterations in security attributes that may occur during program runtime.
In this paper, we provide six watch points: initiation of queries, enabling of encryption, completion of authentication, rekeying, disconnection, and query termination.These watch points unfold chronologically from the start of each query round.Scenarios may arise where these watch points overlap (such as enabling of encryption and completion of authentication at the same time) or are absent by default (such as the absence of rekeying).Memory variables according to these watch points are used to refine state calibration, thereby making state partition more relevant to the protocol's security attributes.
To collect real-time feedback regarding the protocol's state, we need to instrument probes within the target program's code during its compilation phase.These probes consist of external functions that are invoked during specific conditions.Running concurrently with the SUL, these probes detect triggering conditions and invoke external functions to either update or trace corresponding state memory variables.The probes are strategically placed in five specific code locations: query_start, memory_allocate, interaction_update, memory_deallocate, and query_end.The specific functionalities of these probes are detailed in Table 1.
These functions invoked by the probes are initialized at the start of a query and continuously monitor relevant variables through both memory allocation and deallocation phases.During each iteration of interaction, these tracked data are updated to reflect the shifts in state variables.At the termination of the query, the functions sift through and identify dynamic lifetime memory variables, computing the state's identification in the process.Information such as the start and end times of queries, interaction cycles, and watch points is transmitted to the probe's external function through shared memory by the intermediate mapper.The architecture of this system is illustrated in Fig. 6.

Calculate state identification
Once a query round concludes, the monitor scrutinizes the memory variables involved in the interactions.It then refines this list to further filter and determine the dynamic lifetime memory variables.These variables are used to compute unique state identifiers, commonly referred to as "state_id, " for each state transition.
Algorithm 2 Filter the dynamic lifetime memory variables and calculate the state identification As shown in Algorithm 2, the process computing state identification involves several steps.First, it excludes short lifetime variables that do not span any watch point range.Then, as interactions incrementally progress from the initial number, the variables encompassing the current watch point range are selectively filtered.These filtered variables are sorted, and their hash values are computed to serve as the state identifiers for the ongoing program interactions.Utilizing this methodology, the monitor calculates a sequence of state identifiers that align with the query sequence of the learning engine, facilitating the construction of a state machine.

Overall learning process
Our learning methodology is grounded in this state description and state identification method.The inputs for the learning algorithm include an alphabet, a happy flow (which is a standard negotiation sequence based on the given alphabet), a configurable depth for equivalence checking, an optional dictionary of variables that are irrelevant to the SUL states, and a optional number of dry run repetitions.The output is a finely inferred state-machine model.

Algorithm 3 State aware based model learning
As shown in Algorithm 3, the learning process unfolds in two primary phases.The first phase centers on gathering preliminary data and building an initial state-machine model.Leveraging the happy flow, initial query sequences are assembled and executed in a loop, with the monitor tracking and capturing relevant memory variables.The monitor uses the dictionary of state-irrelevant variables along with a differential comparison method to filter out unrelated variables.It also updates this filtering dictionary in real time.Upon capturing this data, the monitor assembles an initial state-machine model founded on the query outcomes.The second phase entails additional scrutiny of state variables and further refinement of the statemachine model.The algorithm generates query sequences Specifically, the methodology for equivalence checking operates as follows: Starting from a designated initial state, if two distinct sequences ultimately lead to states sharing the same state identification, the algorithm delves further.It either seeks I/O sequences that can differentiate these similar states, guided by the pre-set equivalence checking depth, or it evaluates the feasibility of merging these states into one.This added layer of scrutiny ensures a robust and accurate representation of states within the final state-machine model.In addition, we set up judgment and optimization for the disconnected state, which accurately identifies the normal communication end state through state identification, and no further query testing will be performed in this state.

Experimental evaluation and analysis
In this section, we delve into the experimental results, the inferred models, and the issues identified, all stemming from our state-aware model learning framework, SALearn.

IPsec implementations
We evaluated SALearn using two mainstream opensource IPsec implementations: Strongswan and Libreswan.These implementations offer a range of IKEv2-based VPN functionalities, as outlined in Table 2.

Learning scheme for IPsec
To develop a comprehensive model for IPsec, we utilized alphabets based on RFC7296 to facilitate learning under IKEv2 protocols.
In IKEv2, the alphabet focused primarily on standard negotiation messages and additional messages for rekeying, deleting, and testing both IKE and IPsec SAs, detailed in Table 3. Importantly, we differentiate between new and old SAs based on their creation time, assigning labels that allow the mapper to retrieve them in chronological order.After a successful rekeying operation based on RFC7296, the mapper transitions the IPsec SA from the old to the new IKE SA, while maintaining the previously assigned labels.It is worth noting that the unrestricted creation of SAs can lead to models that are overly complex and difficult to analyze.Therefore, we imposed a constraint: each IKE SA can have no more than two associated IPsec SAs.If a rekeying request exceeds this limit, the mapper returns a "None, " thereby keeping the number of SAs within our defined range.
To enhance learning efficiency, we implemented several optimization strategies and integrated a message inspection mechanism to aid in the detection of anomalous behavior.

Mapper configuration and optimization:
We configured reasonable timeout periods for each message type to speed up transmission.The mapper's messageparsing mechanism was also fine-tuned to reduce inconsistent feedback.To build on this, we introduced a query result cache.If a test's query sequence yielded a result that differed from a cached response, we conducted multiple retests until a consistent response was confirmed as the final query outcome.To ensure reliable interactions, TCP transmission is employed between the learning machine and the mapper.

Time synchronization:
A synchronized clock is maintained between the intermediate mapper and the execution monitor, ensuring accurate interaction counts and watchpoints.3. State-machine model refinements: In the final learning state-machine model, edges with identical start and end points are combined to simplify model inspection and analysis.For transitions lacking practical significance-such as querying to delete an IPsec SA when none exists-the mapper directly returns a "None" response.As a result, irrelevant edges are omitted from the final model.4. A mechanism is integrated into the mapper to scrutinize both incoming and outgoing messages.This feature checks for behaviors that deviate from expected responses and flags abnormal messages, which could be indicative of service crashes or other issues.

Learning result
We tested two leading IPsec implementations and compared the performance of our approach, SALearn, with two existing tools: StateInspector [34] and StateLearner [8].The statistical outcomes of the learning process are summarized in Table 4.The results reveal that SALearn significantly outpaces the black-box learning tool StateLearner in terms of both the number of queries required and the time needed for model inference.This efficiency stems from SALearn's use of correlated state variables, which allows for quicker differentiation between distinct states without the need for exhaustive queries.This ability to finely distinguish states also results in a state-machine model with greater complexity and depth.When compared to the gray-box tool StateInspector, SALearn demonstrates comprehensive model inference capabilities.StateInspector's limitations become evident when handling StrongSwan, as it can only trace the system calls in a single process.This constraint makes it incapable of capturing the right time to take a memory snapshot to filter candidate state memory for state-machine inference.Further, SALearn's fine-grained variable calibration leads to the inference of models with a greater number of states on Libreswan, as compared to StateInspector.

Case analysis
In this subsection, we discuss the characteristics and shortcomings of these implementations based on the state-machine model inferred by SALearn.Presenting the complete model would be cumbersome; therefore, we have optimized it for clarity.First, we removed certain non-essential transition paths to highlight the crucial aspects of the model.For example, in the case of IKEv2, some key IKE and rekey ESP transitions were omitted.
Second, irrelevant transitions were eliminated.Transitions with a "None" response have been excluded.Finally, transitions with identical starting and ending nodes were merged, and any remaining unspecified input strings for the state were represented as "Other." To further clarify the state machine, each model features bold lines to indicate normal negotiations (happy flows).Red dotted lines mark paths with issues were identified.Labels on the edges of the state-machine model correspond to the current state's inputs and outputs, denoted by "IO." Abbreviations are used to represent corresponding abstract messages.Labels for messages that resulted in meaningful responses and led to state transitions are highlighted in larger font sizes.

Strongswan IKEv2
Figure 7  States s6, s7, s8, s9, s10, and s11 describe various scenarios that involve rekeying or deletion of IKE or IPsec SAs after completing authentication in state s2.Some of these states are interchangeable in terms of transitions.For instance, state s2 can morph into either s6 (by rekeying the IKE SA and then deleting the previous one) or s9 (by rekeying the IPsec SA and discarding the old one).Importantly, when multiple IPsec SAs are active, StrongSwan only supports ESP communication via the most recently established SA, such as in states s9 and s10.Whether initiating direct ESP creation (transitioning from s2 to s9), rekeying IKE before ESP creation (going from s2 to s6 and then to s10), or creating an ESP prior to IKE rekeying (transitioning from s2 to s9 and then to s10), StrongSwan exclusively facilitates ESP communication through the latest tunnel.Deleting the most recent ESP enables communication through the older, still-active ESP (transitioning from s9 to s11).

Libreswan IKEv2
Figure 8 presents a simplified state-machine model of Libreswan operating under the IKEv2 alphabet.Like Strong-Swan, the sequence from state s0 to s1 and then to s2 covers interactions involving IKE_SA_INIT and IKE_AUTH.Transitions from s2 to s3 and then to s4 are concerned with deleting the active ESP and IKE, effectively terminating the connection.Regarding the management of multiple ESPs, Libreswan also supports ESP communication through the most recently created instance, as evidenced by the transition from s2 to s5.However, once the latest ESP is deleted, the older ESPs cannot be reactivated for communication, as illustrated by the transition from s5 to s6.
In contrast to StrongSwan, Libreswan demonstrates more flexibility in state s1.Specifically, after completing the IKE_SA_INIT interaction, Libreswan can still complete IKE_AUTH requests after receiving some unexpected request.Moreover, Libreswan allows the creation of an ESP on an older IKE channel, facilitating communication through that newly established ESP, as seen in the transition from s7 to s8.Note that ESPs created on these older IKE channels interfere with ESP communication on the current channel.Even deleting the ESP established on the older IKE channel does not reinstate the communication capabilities of the ESP on the current IKE channel, as shown by the transition from s8 to s9.

Discussion
Based on the comparative analysis between SALearn and various model learning tools, as well as SALearn's performance in dissecting different IPsec implementations, several key observations emerge: Despite these strengths, SALearn is not without limitations: 1.For SALearn to operate effectively, the protocol code needs to be instrumented, thereby imposing higher learning requirements compared with StateInspector and StateLearner.2. The learning process in SALearn necessitates the gathering and filtering of state-variable information, introducing computational overhead.3. The task of collating state-variable information to identify abnormal paths and states in SALearn involves manual effort, adding to the overall complexity of the process.
Further discussion on the implications of identified anomalies:

Conclusion
In this research, we introduce a state-aware model learning approach aimed at bridging the gap between inferred states and the inner workings of a protocol device.By synchronizing network interaction I/O and filtering dynamic memory variables across their lifetimes, our method refines the inferred state machine to include more relevant elements.Our evaluation, covering two widely used IKEv2 implementations, reveals two instances where these systems violate expected behavior.The results validate the efficacy of our approach; a digital twin paired with a state-aware model can give insights into predicted communication patterns in edge computing scenarios where devices connect with each other in a decentralized way utilizing lightweight protocols.Based on the learnt behaviors, the system may update its security measures in real-time, detecting any abnormalities or potential attacks on lightweight protocol implementations.Regarding memory information management, the current approach is somewhat simplistic, focusing primarily on the lifespan and values of memory variables for filtering.This approach leaves much to be desired in terms of the precision and interpretability of these variables within the context of the program.To address this issue, future work could involve several improvements.First, utilizing more specific variables like macro definitions or enumerated types could refine state calibration and enhance its interpretability.Second, considering advanced analysis methods such as symbolic execution or taint analysis could provide a more accurate understanding of how variables influence program states, thereby honing the precision of state inference.Additionally, a formal analysis model could be developed based on the inferred state paths and calibrated variables to assess compliance with security requirements.Taken together, this research not only broadens the methodologies available for protocol model learning but also offers valuable insights into enhancing the security analysis of network protocols.

State-aware fuzzing test
Detailed framework of state-aware model learning method messages.It also captures specific watch points during each packet transmission cycle.Functioning as a simulated client, the mapper converts abstract message queries from the learning engine into actionable packet requests for the SUL.After the SUL processes these packet requests, the mapper converts the resulting concrete messages back into abstract form for the learning engine's consumption.Each query cycle resets the SUL to its original state.The mapper processes each request in the query sequence sequentially, collecting the responses, and then furnishing the learning engine with a feedback sequence composed of abstract response messages.During these interactions, the intermediate mapper not only simulates the client's role but also maintains the session information with the SUL.If the message exchange involves encryption or decryption, the mapper handles these processes in real time, ensuring smooth interaction with the SUL.It also monitors and records various watch points related to SUL behavior during the learning phase-such as query initiation, encryption procedures, authentication completion, rekeying processes, disconnections, and query terminations-and communicates this data to the execution monitor, facilitating the selection of state-related variables.

Definition 2
An extended finite state machine (EFSM) with memory calibration based on alphabet can be represented by a seven-tuple A =< S, s 0 , I, O, M, δ, > , where 1. S is a finite, nonempty set of states; 2. s 0 ∈ S is the initial state; 3. I ⊆ is an input set; 4. O ⊆ is an output set; 5. M is a set of memory calibrations; 6.
δ : S × � → S is the state transition function, where δ(s, i) = s 1 means that the state machine accepts input i in state s and transitions to s 1 .7. : S × → O × M is the state output function, where (s, i) = (o, m) indicates that the state machine accepts input i in state s, generates output o, and updates its memory calibration to m.

Definition 5
Binary relation ∼ is defined for ∀s, s (s M , i M ) = (o M , m M ) ∼ ,where under the action of the equivalence relation ∼ , M generates output o M and updates the memory calibration information to m M upon receiving input i M in state s M .
′ , s ∼ s ′ if and only if ∀i ∈ I * , (s, i) = (s ′ , i) .It can be observed that this binary relationship is equivalent.The two equivalent states imply that the same input always produces the same output.Definition 6 Based on the equivalence relation ∼ , an EFSM M =< S M , s 0,M , I M , O M , M M , δ M , M > is con- structed for state-aware model learning, where the subscript M is only used to distinguish the state machine, and is constructed as follows: 1. S M = S/ ∼ ; 2. s 0,M ∈ S M is the initial state; 3. I M is a set of input strings that is a subset of * ; 4. O M is a set of output strings; 5. M M is a set of memory calibrations; 6. δ M (s M , i M ) = s ′ M , where under the action of the equivalence relation ∼ , δ M transitions the state from s M to s ′ M upon receiving input i M ; 7. M

Table 1
Probe types and descriptions

Table 2
Information of IPsec SUL

Table 4
Statistics of state-aware model learning presents a streamlined state-machine model of StrongSwan operating under the IKEv2 alphabet.The interaction sequence goes from states s0 to s1 and finally to s2, where IKE_SA_INIT and IKE_AUTH occur.In state s2, both parties establish IKE and IPsec SAs, thereby enabling Encapsulating Security Payload (ESP) communications.Advancing from s2 to s3, and finally to s4, involves deleting the active IPsec and IKE SAs, culminating in a communication termination.Furthermore, Strong-Swan prohibits the insertion of unexpected messages during negotiations prior to authentication.As a result, Simplified state-machine model of StrongSwan with IKEv2 alphabet submitting an abnormal authentication request in state s1 terminates the negotiation and transitions to state s5.
Simplified state-machine model of Libreswan with IKEv2 alphabet 2. When compared with the similar gray-box model learning tool StateInspector, SALearn offers broader applicability across protocol implementations.It relies solely on the collection of code instrumentation data for learning, thereby circumventing the limitations of StateInspector, such as stack memory constraints and multi-process complexities.3. SALearn is capable of inferring comprehensive statemachine models based on symbolic alphabets.These models faithfully encapsulate the intricate logic governing message interactions within the protocol implementations.4. Analysis of the models generated by SALearn enables the identification of not only abnormal paths and logical inconsistencies that contravene protocol specifications but also uncovers potential anomalous states.These findings can catalyze the detection of issues in the protocol's code implementation.
1. Inconsistent IPsec Management in StrongSwan: The StrongSwan state machine under IKEv2 reveals that when multiple IPsec SAs are active, StrongSwan prioritizes communication through the most recently established ESP.Intriguingly, further packet analysis shows that StrongSwan crafts response messages using the latest ESP key, even if a test request is sent using a previously established ESP key.Conversely, upon deleting the newest ESP and sending a test request via the previous ESP, StrongSwan invariably replies using the second most recent ESP.According to Section 2.8 of RFC 7296, "to rekey a child SA within an existing IKE SA, create a new, equivalent SA, and when the new one is established, delete the old one." In contradiction to this specification, Strong-Swan neither deletes the old IPsec SA upon establishing a new one, nor prevents communication between the old and new IPsec SAs.This behavior violates the standards and undermines forward secrecy.2. Flawed Logic in IKE SA Updating in Libreswan: The IKEv2 state machine for Libreswan reveals that the system allows for the creation of ESP on older IKE channels, and such ESPs are functional for regular communication.In contradiction to Section 2.8 of RFC 7296, which states that "after the new equivalent IKE SA is created, the initiator deletes the old IKE SA, and the Delete payload to delete itself MUST be the last request sent over the old IKE SA, " Libreswan neither mandates the deletion of the old IKE SA upon establishing a new one, nor restricts ESP creation on the old IKE SA.This lapse contradicts the RFC guidelines and compromises forward secrecy.