CNN based lane detection with instance segmentation in edge-cloud computing

The number of vehicle owners is increasing, and cars with autonomous driving functions are attracting more and more attention. Lane detection combined with cloud computing can effectively overcome the drawbacks of traditional lane detection, which relies on feature extraction and high definition, but it also faces the problem of excessive computation. Combining cloud data processing with edge computing can effectively reduce the computing load on the central nodes. In this paper, the traditional lane detection method is improved, and a convolutional neural network (CNN) is used to build a dual model based on instance segmentation. In the image acquisition and processing stages, the distributed computing architecture provided by edge-cloud computing is used to improve data processing efficiency. The lane fitting process generates a variable matrix to achieve effective detection in scenarios where the road slope changes, which improves the real-time performance of lane detection. The proposed method achieves good recognition results for lanes in different scenarios, and its lane recognition efficiency is much better than that of other lane recognition models.


Introduction
With the advent of autonomous driving technology, many of the safety problems caused by daily manual driving can be avoided. Therefore, self-driving cars are sought after by many automobile consumers. In recent years, many researchers from academic institutions and industry have worked on autonomous driving technology, and this research has promoted the development of image processing and computer vision. Lane detection is a key part of the automatic driving system. At present, the difficulty in lane detection is how to achieve accuracy and real-time performance at the same time, so we need to improve on the accuracy and efficiency of both traditional and neural network-based lane recognition methods. Traditional computer vision-based lane detection technology mainly uses image processing algorithms to extract the features of lane lines: it reduces the image channels, converts the original image to grayscale, applies the Canny or Sobel algorithm to detect edges in the grayscale image, extracts the lane features, and then performs lane line fitting. Common lane fitting models include the cubic polynomial, the spline curve, and the arc curve [1][2][3]. To improve the quality of fitting, an inverse perspective transform is usually used to convert the image into a bird's-eye view, as in Bertozzi [4], before lane tracking and other operations are performed. Recently, deep learning techniques have been used for lane detection and can effectively detect lanes in images. Based on large-scale data training, the probability of lane detection errors can be greatly reduced and detection accuracy can be improved.

*Correspondence: njnuwjs@njnu.edu.cn
3 Key Laboratory for Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing, China
4 Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, China
Full list of author information is available at the end of the article

Cloud computing, as a distributed computing paradigm, can decompose large-scale data into submodules through the network center, allocate them to a system composed of multiple servers for processing and analysis, and finally feed the calculation results back to the central node [5,6]. Cloud computing systems provide a new resource scheduling method, and their use can effectively reduce the system computing load and improve data processing efficiency on current computer systems, which is helpful for lane detection [7,8]. In addition, a dynamic resource allocation method for intensive data in the cloud has been proposed, which greatly improves the fault tolerance of cloud computing [9]. These factors have made cloud computing the current mainstream framework [10]. Given the advantages of cloud computing in large-scale data processing, we use it to improve data processing efficiency in lane detection.
Edge computing refers to a computing platform that is close to the object or data source and can integrate core capabilities such as networking, computing, and applications [11]. It has a very prominent advantage in the field of intelligent transportation [12]. To achieve safe and reliable operation, smart cars need to process a large amount of data collected by multiple sensors in real time [13]. Through edge computing, data processing is moved closer to the source [9]. Compared with centralizing data processing in the cloud, this can greatly reduce latency and effectively improve the data processing efficiency of smart cars. In terms of security and privacy protection, edge computing addresses the risk, present under the cloud computing model, of leaking user privacy when private data are transferred to the cloud, and its distributed and mobile features can effectively protect data and privacy [14][15][16]. A data privacy optimization scheme that supports edge computing has been proposed [17,18], which further strengthens the data security of edge computing, and we have incorporated this scheme into the lane detection process in order to further improve the efficiency of data processing.
However, it is difficult for cloud computing or edge computing alone to play a prominent role in lane recognition for smart cars. Lane detection with convolutional neural networks currently faces the problem of low real-time performance, especially in complex scenarios [19]. Few researchers use cloud computing or edge computing to analyze lane lines, and using only cloud computing requires uploading a large amount of data to the cloud for processing, which increases the load on the network and makes it difficult to obtain good results under real Internet conditions. Therefore, we combine cloud computing and edge computing to process lane data, which greatly improves the real-time performance of lane detection while smart cars are driving.
The main contributions of this paper are as follows:
1) Inspired by cloud computing and edge computing, we adopt more efficient data processing methods for lane image recognition, which greatly improves the real-time performance of lane image detection and reduces the delay.
2) Based on a convolutional neural network and semantic segmentation, lane lines in multi-lane scenes are effectively identified, and when vehicles change lanes, the current lane scene can be dynamically identified.
3) During the lane fitting process, the fixed inverse perspective transformation matrix is replaced. A neural network dynamically generates the inverse perspective transformation matrix, which effectively solves the lane line deviation that a fixed matrix causes when fitting on roads with a changing gradient.
The remainder of this paper is organized as follows: In "Related work" section, we review the peer research and work. "Prior knowledge" section describes a priori knowledge of lane detection, including the two main frameworks of lane detection techniques in this article. "Method" section describes the specific steps of lane detection. The experimental environment and results analysis are included in "Experimental evaluation" section. Finally, we conclude our work in "Conclusion" section.

Related work
The current methods for lane detection can be divided into three main categories: road-based models, road-based features, and neural network-based models. The detection method based on the road model mainly abstracts the lane lines into geometric shapes such as straight lines, curves, parabolas, and splines, and uses different two-dimensional or three-dimensional models to determine each model parameter.
Wang et al. [20] first located the initial area of the lane line, and then converted the lane line detection problem into the problem of determining the spline model in the initial area. Tan et al. [21] proposed a robust curve lane detection method, which uses the improved river method to search for feature points in the far field of view, guided by a straight line detected in the near field or by the curve of the last frame, and can connect dotted or fuzzy lane marks. Shin et al. [22] exploited the fact that the left and right lane lines are parallel with a constant distance in the top view, and used parallel lines to detect and track lane lines. The above methods have high accuracy for detecting a specific lane line, but an appropriate model and parameters must be selected. A single model often has difficulty coping with multiple road scenarios, and its generalization ability and real-time performance are poor.
Among feature-based methods, the authors of [23] proposed a gradient-enhanced conversion method for robust lane detection, which converts RGB images into grayscale images and combines them with the Canny algorithm to generate large gradients at the boundaries. Son et al. [24] realized lane line detection by extracting white and yellow lane lines separately. Jung et al. [25] proposed an effective method for reliably detecting lanes based on spatiotemporal images, which improved the detection accuracy. Niu et al. [26] proposed a lane detection method with two-stage feature extraction, where each lane has two boundaries, in order to solve the robustness problem caused by inconsistent lighting and background clutter. The above-mentioned detection methods are easy to implement, have low complexity, and can achieve high real-time performance, but they are extremely susceptible to environmental influences. On rainy days or under insufficient light, lane lines are prone to be misjudged.
With the continuous development of science and technology and the continuous improvement of hardware equipment, neural network-based detection methods have made breakthrough progress. Using deep learning to detect lane lines can ensure good recognition accuracy in most scenarios [27]. Instead of relying on the highly specialized manual features and heuristics that traditional lane detection methods use to identify lane breaks, deep learning models automatically learn target features and adjust their parameters during the training process.
Liu et al. [28] used the mobile edge computing framework to greatly reduce the time loss of learning. Qi et al. [29,30] significantly improved the data processing rate through distributed location and multiple data sources. John et al. [31] used a learned CNN model to predict the lane line position in the image. Especially in cases of occlusion and missing lane lines, the CNN can extract robust features from the road image, and an additional tree-like regression model is then trained directly on these features to estimate the lane line position. Kim et al. [32] used serial end-to-end transfer learning to directly estimate and separate the left and right lanes, and collected a dataset covering multiple road conditions to train a CNN model. Pan et al. [33] proposed a spatial CNN (SCNN), which extends layer-by-layer convolution to slice-by-slice convolution within the feature map. In this way, information is passed between pixels across the rows and columns of each layer, which is suitable for long continuous structures or large targets that have strong spatial relationships but few appearance clues, such as lane lines, walls, and pillars. Zhang et al. [34] applied deep learning-based convolutional neural networks to instance segmentation in monocular vision to segment different objects in actual scenes, and achieved better segmentation results, but with less robustness. The use of deep learning methods for lane line detection has greatly improved robustness and accuracy compared with traditional detection methods, but training the network requires a huge data set, and the requirements for hardware increase correspondingly. Xu et al. [35,36], in studying the Internet of Vehicles (IoV), proposed a video surveillance resource method that supports edge computing and can effectively address the demands of lane image acquisition and training as well as high network bandwidth. We incorporate this method into the data processing process to distribute computing power from the central node to devices with image acquisition and processing capabilities.

Lane prediction
Lane detection usually requires the use of relevant algorithms to extract the pixel features of the lane line, after which an appropriate pixel fitting algorithm is used to complete the lane detection. Traditional lane detection uses the Canny or Sobel edge extraction algorithm to obtain lane line candidate points and uses the Hough transform for lane feature detection, but most operations are based on manual feature extraction [37,38]. The latest research uses deep neural networks to make dense predictions instead of manual feature extraction.
In order to enable the model to adapt to more road scenes, we analyze the structure of classic convolutional neural networks and semantic segmentation, and use an improved two-branch network and a custom function network to convert the lane line detection problem into an instance segmentation problem, where each lane line forms its own instance. The two-branch network contains a lane segmentation branch and a lane embedding branch and can be trained end-to-end. The lane segmentation branch outputs background or lane line, and the lane embedding branch further decomposes the lane pixels into individual lane instances. At the same time, combined with edge computing, a blockchain-based edge offloading method is used to ensure real-time performance and integrity in the data transmission process [39,40]. For real-time processing and feedback, edge computing, as a new computing paradigm, is better able to solve problems such as high transmission delays, high bandwidth expenditures, and privacy leaks.

Lane fitting
To estimate the lane instances, that is, to determine which pixel belongs to which lane, we need to convert each of them into a parametric description. To this end, we use a widely used fitting algorithm. At present, the widely used lane fitting models are mainly cubic polynomials, spline curves, and arc curves. An inverse perspective transformation is used to transform the image into a "bird's-eye view" to improve the quality of the fitting while maintaining computational efficiency, and curve fitting is then applied [41]. The fitted lines in the "bird's-eye view" can be re-projected into the original image through the inverse transformation matrix. Generally, the inverse perspective transform calculates the transformation matrix on a single image and keeps it fixed, but if the road plane changes, the fixed matrix is no longer valid, causing lane points near the horizon to be projected to infinity, which reduces the accuracy of lane line fitting. To address this, we apply an inverse perspective transformation to the image before fitting the curve and use a loss function customized for the lane fitting problem to optimize it. A custom function network generates a transformable matrix that is used to transform the lane pixels. A polynomial curve is then fitted to the transformed pixels, and finally the fitted pixels are converted back into the input image. The advantage of this network is that the detection algorithm can fit the pixels of distant lanes robustly when the road surface changes, and adapts better to lane changes.
The overall framework of the model is shown in Fig. 1. The camera on the car collects the image and transmits it to the cloud data processing center. After the image is subjected to binary segmentation and embedded segmentation, the clustering operation is performed and combined with the transformable matrix generated by the custom network to generate the final lane detection image.

Binary segmentation
In lane detection, in order to save computing resources, we binarize the image.

Lane instance embedding segmentation
The loss function of the lane embedding branch of the two-branch network is designed so that the branch outputs a pixel embedding for each lane pixel, such that embeddings belonging to the same lane are clustered together and embeddings belonging to different lanes are as far apart as possible. In this way, the pixel embeddings of the same lane come together to form a unique cluster for each lane. This is achieved by introducing two terms. The variance term (L_var) applies a pulling force to each embedding, drawing it toward the average embedding of its lane. The distance term (L_dist) pushes the cluster centers away from each other. The pulling force is active only when the distance from an embedding to its cluster center is greater than δ_v. The pushing force is active only when the distance between two cluster centers is less than δ_d.
In (1), δ_v represents the threshold of the variance term, δ_d represents the threshold of the distance term, and the total loss is L = L_var + L_dist. After the network converges, the lane pixel embeddings are clustered together so that the distance between any two cluster centers is greater than δ_d and the radius of each cluster is less than δ_v.
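As a minimal sketch of the two terms described above, the following NumPy code evaluates a variance term and a distance term for one image. It assumes the hinged, squared form commonly used for discriminative embedding losses; the function name, the default thresholds delta_v = 0.5 and delta_d = 3.0, and the averaging over lanes are illustrative assumptions rather than values taken from this paper.

```python
import numpy as np

def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=3.0):
    """Return (L_var, L_dist, L) for one image.

    embeddings: (N, D) array of embeddings for the lane pixels.
    labels:     (N,) integer lane id of each pixel.
    delta_v, delta_d: hypothetical thresholds, chosen for illustration.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    lane_ids = np.unique(labels)
    centers = np.stack([embeddings[labels == k].mean(axis=0) for k in lane_ids])

    # Variance term: pull each embedding toward its lane mean, but only
    # when it lies farther than delta_v from that mean.
    l_var = 0.0
    for center, k in zip(centers, lane_ids):
        dist = np.linalg.norm(embeddings[labels == k] - center, axis=1)
        l_var += np.mean(np.maximum(dist - delta_v, 0.0) ** 2)
    l_var /= len(lane_ids)

    # Distance term: push pairs of lane means apart, but only when they
    # are closer than delta_d to each other.
    l_dist = 0.0
    n = len(lane_ids)
    if n > 1:
        for a in range(n):
            for b in range(n):
                if a != b:
                    gap = np.linalg.norm(centers[a] - centers[b])
                    l_dist += max(delta_d - gap, 0.0) ** 2
        l_dist /= n * (n - 1)

    return float(l_var), float(l_dist), float(l_var + l_dist)
```

With tight, well-separated clusters both terms are zero, which matches the converged state described in the text: every cluster radius is below δ_v and every pair of centers is farther apart than δ_d.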
In Fig. 2, (a) and (b) are the collected lane images, and (c) and (d) are the corresponding images generated by instance embedding segmentation.

Clustering
Clustering divides a large unlabeled data set into different categories according to the inherent similarity of the data, so that the similarity between data in the same category is large and the similarity between data in different categories is small. In order to improve the robustness of lane line detection in different environments, it is necessary to perform cluster analysis on the lane line pixels, grouping the pixels of the same lane line together and distinguishing the pixels of different lane lines. In this way, clustering helps maintain recognition accuracy under insufficient lighting and poor traffic conditions.
In this paper, we complete clustering by iteration, combining the clustering algorithm with the loss function.
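One way such an iterative clustering could look is sketched below. This is an assumption for illustration, not the authors' exact procedure: it repeatedly picks an unassigned embedding, shifts a center to the mean of the unassigned embeddings within a radius until it stabilizes, and assigns those embeddings to a new lane. The function name and the radius parameter are hypothetical. The link to the loss function is that, once the embedding loss has converged, a radius between the cluster radius bound and the inter-cluster distance bound should separate the lanes.

```python
import numpy as np

def cluster_embeddings(embeddings, radius=1.5, max_iter=10):
    """Assign a lane id to every embedding by iterative thresholded grouping.

    embeddings: (N, D) array of lane pixel embeddings.
    radius:     hypothetical grouping radius (an assumption of this sketch).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.full(len(embeddings), -1, dtype=int)
    lane_id = 0
    while np.any(labels == -1):
        unassigned = np.flatnonzero(labels == -1)
        center = embeddings[unassigned[0]]
        # Shift the center to the mean of nearby unassigned points until stable.
        for _ in range(max_iter):
            dist = np.linalg.norm(embeddings[unassigned] - center, axis=1)
            new_center = embeddings[unassigned[dist < radius]].mean(axis=0)
            if np.allclose(new_center, center):
                break
            center = new_center
        dist = np.linalg.norm(embeddings[unassigned] - center, axis=1)
        members = unassigned[dist < radius]
        if members.size == 0:
            # Defensive fallback so that the loop always makes progress.
            members = unassigned[:1]
        labels[members] = lane_id
        lane_id += 1
    return labels
```

Each pass removes at least one embedding from the unassigned set, so the loop terminates after at most N passes.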

Pixel fitting
The output of the two-branch network is a set of pixels for each lane line. It is not ideal to fit a polynomial to these pixels in the input image space, because a higher-order polynomial is needed to handle curved lanes. Several commonly used fitting methods are the RANSAC algorithm, Bezier curve fitting, curve fitting based on spline interpolation, and polynomial fitting based on least squares. In this paper, the image is first transformed by inverse perspective and projected into the "bird's-eye view", where the lanes are parallel to each other, so a curved lane can be fitted by a second- or third-order polynomial. However, in the usual approach, the fixed transformation matrix H is calculated only once and applied to all images. This results in errors when the ground plane changes, where the vanishing point projected to infinity moves up or down. To solve this problem, this paper trains a neural network with a custom loss function. The network is optimized end-to-end to predict the parameters of the perspective transformation matrix H. Under the perspective transformation H, the transformed lane points can be optimally fitted with second- or third-order polynomials. The prediction is based on the input image, allowing the network to adjust the projection parameters when the ground plane changes, so that the lane line fit is still correct. In (2), the transformation matrix H has 6 degrees of freedom, represented by the six variables a-f. The zeros are placed to enforce the constraint that horizontal lines remain horizontal under the transformation.
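The six-parameter matrix described above can be illustrated with a short NumPy sketch. The exact placement of a-f and of the zeros is an assumption here: the layout below puts zeros in the first column of the second and third rows, which is one layout that keeps horizontal lines horizontal. The function names build_h and project are hypothetical.

```python
import numpy as np

def build_h(a, b, c, d, e, f):
    """Build a 3x3 transformation with six free parameters (assumed layout)."""
    return np.array([[a,   b,   c],
                     [0.0, d,   e],
                     [0.0, f, 1.0]])

def project(h, points):
    """Apply a 3x3 transformation to (N, 2) pixel coordinates."""
    points = np.asarray(points, dtype=float)
    homog = np.hstack([points, np.ones((len(points), 1))])
    mapped = homog @ h.T
    # Divide by the third homogeneous coordinate to return to 2-D.
    return mapped[:, :2] / mapped[:, 2:3]
```

With this layout the transformed row coordinate is (d*y + e) / (f*y + 1), which does not depend on x, so all points that share a row in the input still share a row after the transformation.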
The custom function network is kept small. It consists of consecutive blocks of 3 × 3 convolutional layers, batch normalization, and ReLU activation functions, with max pooling layers to reduce the network size, followed by 2 fully connected layers. Before the lane pixels are fitted, the transformable matrix output by the custom function network is used to transform them: the inverse perspective transformation maps the road image to a new top view and generates the lane pixels in that top view. The inverse perspective transformation formula is shown in (3).
In (6), T is the predicted lane pixel point, and H^-1 represents the inverse of the perspective transformation matrix. To train the custom function network so that its output H is the transformation matrix best suited to polynomial pixel fitting, the following loss function is constructed. Since lane fitting is done with the least squares method, the loss function is differentiable. The loss function of the custom function network is shown in (7).
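The fitting pipeline described above (transform the lane pixels with H, fit a polynomial by least squares in the top view, map the fitted points back with the inverse of H, and compare with the original pixels) can be sketched as follows. This NumPy version only evaluates the loss for a given H and does not compute gradients. The mean squared error on the horizontal coordinate and the function name fit_and_reproject are assumptions of this sketch.

```python
import numpy as np

def fit_and_reproject(h, lane_pixels, order=2):
    """Fit x' = poly(y') in the top view and measure the back-projection error.

    h:           3x3 transformation matrix (e.g. predicted by a network).
    lane_pixels: (N, 2) array of (x, y) pixels belonging to one lane.
    Returns (coeffs, reprojected_points, loss).
    """
    pts = np.asarray(lane_pixels, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])

    # Transform the lane pixels into the top view.
    top = homog @ h.T
    top = top / top[:, 2:3]
    x_t, y_t = top[:, 0], top[:, 1]

    # Least-squares polynomial fit of x' as a function of y'.
    coeffs = np.polyfit(y_t, x_t, order)
    x_fit = np.polyval(coeffs, y_t)

    # Map the fitted points back to the input image with the inverse of H.
    fitted = np.stack([x_fit, y_t, np.ones_like(y_t)], axis=1)
    back = fitted @ np.linalg.inv(h).T
    back = back / back[:, 2:3]

    # Assumed loss: mean squared horizontal error in the input image.
    loss = float(np.mean((back[:, 0] - pts[:, 0]) ** 2))
    return coeffs, back[:, :2], loss
```

In a training setting the same steps would be written in an automatic differentiation framework so that the loss can be propagated back to the parameters of H, which is possible because the least-squares solution has a closed form.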

Data set
This article uses the Tusimple dataset as the training and testing source. It includes lane line data at different times of the day, as well as scenes with 2, 3, and 4 lanes. For the lane segmentation task, the final segmentation result needs an evaluation index to measure test performance. We choose the F-Measure to measure the model's ability to predict lane lines because it jointly considers precision and recall. All pixels of the lane lines are defined as positive samples, denoted by P, and background pixels are negative samples, denoted by N. Whether a pixel prediction is correct is determined by checking whether each pixel of the predicted image equals the corresponding pixel of the ground-truth label. The formulas for evaluating image prediction results using the F-Measure are shown in (8) and (9).
Precision reflects the accuracy of the model and is expressed as the ratio of the number of correct positive predictions to the total number of positive predictions. Recall is a measure of coverage, expressed as the ratio of the number of targets detected in the data to the total number of targets that actually exist. TP is the number of instances correctly detected as positive, FP is the number of instances incorrectly detected as positive, FN is the number of instances incorrectly detected as negative, and β is an adjustable parameter that controls the weight given to Recall. We set β to 0.8, which helps reduce the damage caused by misrecognizing lane areas.
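The quantities above can be computed as in the following sketch, which assumes the standard definitions Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F = (1 + β²) · Precision · Recall / (β² · Precision + Recall). The function name f_measure and the zero-division guards are additions for illustration.

```python
def f_measure(tp, fp, fn, beta=0.8):
    """Return (precision, recall, F) from pixel counts.

    tp: lane pixels predicted as lane.
    fp: background pixels predicted as lane.
    fn: lane pixels predicted as background.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f
```

Under this definition, a value of β below 1 gives Recall less weight than Precision, so false positives (background pixels mistaken for lane) are penalized more heavily than missed lane pixels.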

Experimental results
The experimental environment of this article is the Ubuntu 16.04 operating system with an NVIDIA GEFORCE 920M graphics processor, and all code is implemented with the Tensorflow deep learning framework. During training, the input image is regularized, randomly cropped, and randomly rotated. The size of the resulting pictures is uniformly set to 256 × 512, and the batch size is set to 6, so that 6 pictures are input at the same time for training. The gradient descent algorithm is used to optimize the network parameters. The optimizer has a momentum parameter of 0.9, a weight decay of 1 × 10^-4, and a learning rate update strategy of exponential decay. In (10), lr_init is the initial learning rate, lr_init = 0.0005, N is the total number of training iterations, set to 80010, iter is the current iteration number, and power is the attenuation coefficient, set to 0.9. We select some images from the Tusimple data set for testing. Compared with traditional lane detection algorithms and currently used deep learning algorithms, the lane detection algorithm used in this paper performs very well in scenarios such as inadequate lighting, shadow occlusion, missing lane lines, and curved lanes. Using the model in this paper for detection in different complex road environments, we obtain the detection results shown in Table 1. We found that other models can achieve good results when dealing with lanes in some scenes but have difficulty covering all scenes, while our proposed method is superior to other models in various scenarios, which is a benefit for lane recognition technology.
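The learning rate schedule described in the training setup can be sketched as follows. The parameters named in the text (lr_init, N, iter, power) match the commonly used form lr = lr_init · (1 − iter / N)^power, and that form is assumed here. The function name learning_rate is hypothetical.

```python
def learning_rate(iteration, lr_init=0.0005, total_iters=80010, power=0.9):
    """Decayed learning rate at a given iteration (assumed schedule form)."""
    return lr_init * (1.0 - iteration / total_iters) ** power
```

Under this assumption the rate starts at lr_init, decreases monotonically, and reaches zero at the final iteration.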
At the same time, we recorded the time the model took to process the lane images. During lane instance segmentation and clustering, each frame took 16.6 ms, corresponding to 60.2 FPS. During lane fitting, each frame took 2.4 ms, corresponding to 416.7 FPS. The overall pipeline reached 52.6 frames per second, a good processing rate.
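The frame rates above follow from the per-frame times, assuming the two stages run one after the other on each frame so that their times add. A short check, with the helper name fps chosen for illustration:

```python
def fps(ms_per_frame):
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame

seg_ms = 16.6   # instance segmentation and clustering, per frame
fit_ms = 2.4    # lane fitting, per frame
total_ms = seg_ms + fit_ms  # sequential stages: 19.0 ms per frame

stage_rates = (round(fps(seg_ms), 1), round(fps(fit_ms), 1))
overall_rate = round(fps(total_ms), 1)
```

The combined time of 19.0 ms per frame is what yields the overall rate reported in the text.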
In Fig. 3, (a) and (b) are lane images collected in a poorly lit environment, and (c) and (d) are the lane detection results. It can be seen that the lane line detection model used in this paper still achieves good detection results under poor lighting conditions. In Fig. 4, (a) and (b) are lane images collected in a scene with degraded road lines, and (c) and (d) are the corresponding lane detection results.
In the inverse perspective transformation, a fixed transformation matrix causes errors when the ground plane changes, because the vanishing point that is projected to infinity moves up or down. We trained a neural network with a custom loss function that can dynamically predict the parameter values of the transformable matrix. The transformed lane points are fitted with second- or third-order polynomials. The prediction is based on the input image, allowing the network to adjust the projection parameters when the ground plane changes, which gives the model excellent robustness. The final tests show that the model in our paper performs better in scenarios such as insufficient lighting and lane line degradation.