Target tracking using video surveillance for enabling machine vision services at the edge of marine transportation systems based on microwave remote sensing

This paper investigates automatic target tracking in emerging remote sensing video tools based on microwave imaging and radar. A moving-target tracking system is proposed that is low in complexity and fast enough to run on edge nodes in a mini-satellite or drone network, bringing machine intelligence into large-scale vision systems, in particular marine transportation systems. The system uses a group of image processing tools for video pre-processing and Kalman filtering for the main tracking task. To test system performance, two measures, detection accuracy and false alarm probability, are computed on real vision data. Two types of scenes are analyzed: a scene with a single target, and a scene with multiple targets, which is more complicated for automatic target detection and tracking systems. In our tests, the best settings reached a detection accuracy of 100% with a false alarm probability of 0.


Introduction
Automatic satellite and aerial surveillance has received much attention from industry, governments, and environmental agencies for many years. Today, Earth observation and surveillance can be carried out reliably with remote sensing satellites, drones, and other sophisticated facilities for persistent and periodic surveillance [1]. For example, these tools are used to monitor cities, analyze weather, protect the environment (e.g., vegetation), and monitor national borders for security purposes.
A key application of persistent (online, real-time) and pervasive surveillance is military services, whereas periodic surveillance is mainly used for environmental purposes. Persistent surveillance is usually more expensive because it must provide real-time services; periodic surveillance is cheaper and need not be real-time (it may be offline or semi-real-time) [2].
One of the main real-world uses of automatic surveillance is monitoring maritime transportation on seas and oceans. This may serve commercial, political, or military purposes. Commercial use is the main focus of this paper, where trade and fishing ships, marine vehicles, and their supporting stations are monitored online for goals such as economic management, traffic management, and security or safety. In general, video-assisted surveillance systems for ships mainly use ground-based cameras placed on marine stations and offshore [3,4]. Nevertheless, these systems suffer from numerous obstacles and shortcomings, including limited coverage and complicated maintenance and repair. Space-borne and airborne surveillance using satellites and drones (unmanned aerial vehicles (UAVs)) is a promising way to overcome these weaknesses. In addition, satellites that provide continuous vision and real-time or semi-real-time drones are a relatively new technology. For a long time, these platforms have provided offline monitoring services based on RGB cameras and other optical and radar sensors that capture still images. Producing real-time video has become a goal of the remote sensing industry only in recent years.
In addition to classical optical sensors such as infrared and visible-light cameras (including multi-spectral imaging (MSI) sensors), radars, which work on the basis of microwave imaging technology (here, active imaging), are used in the remote sensing and surveillance industry. Unlike optical sensors, which are passive and cannot work at night or in all weather (for completeness, infrared offers a weak capability at night), active microwave sensors in radar remote sensing face no problem in such situations. Therefore, in recent years, many detection and recognition algorithms have been suggested to exploit this capability of radar imaging by day and by night, and in all-weather conditions [5][6][7][8]. Radar sensors are well established on aerial and space-borne platforms and have recently been used to produce remote sensing videos [9][10][11][12][13][14]. Compared with optical images and videos, radar images and videos have higher spatial resolution, although they suffer greatly from the lack of natural spectral information such as real color. Colors can be very useful for detecting and recognizing moving vehicles and are a good source of AI features for intelligent vision systems. Nowadays, the technology is moving toward combining both imaging tools to boost visual performance and benefit from their advantages simultaneously, for instance through joint synthetic aperture radar (SAR)-optical image fusion and SAR image colorization using deep learning [15,16].
The main idea behind this work is to propose a new framework of processing blocks for an edge-enabled online target detection and tracking system. In effect, we present an architecture built on existing machine vision techniques. Online monitoring on remote sensing platforms in space or in the sky is a new and crucial need identified by the remote sensing industry. To perform tasks such as detection and tracking, the remote sensing data should be processed on board. However, on-board processing of high-volume data in real time can be difficult on a small platform such as a drone or mini-satellite. Today, improving the processing hardware or decreasing the computational complexity of the algorithms is no longer the only route to real-time implementation. The edge-fog-cloud architecture is a new solution for realizing real-time performance on the platform. Traditional cloud-based processing is now being replaced with cloud-edge or cloud-fog-edge processing. The edge comprises local processors at or around the thing layer of an internet of things (IoT) infrastructure [17,18]. This model helps to schedule processing tasks based on their complexity and/or priority, so real-time algorithms are run at the edge to reduce delays. In the application considered in this paper, the edge processors can be other nearby mini-satellites or drones made available through an ad-hoc network of remote sensing objects. Figure 1 shows how the edge processors work. A supporting node, either a drone or a mini-satellite, can be a communication relay and an edge computing server at the same time, if the network needs such a capability. In addition, a surveillance network can combine UAV networks and mini-satellite networks to increase the performance of computing and communication services. All UAVs and mini-satellites can send and receive data, but the relay node among them, which is responsible for contact with the ground stations, should have more energy storage capacity and better transceivers for long-range communications.

Machine vision techniques in marine remote sensing
Although most satellite images are taken by optical sensors and may be corrupted by bad weather, clouds, sea waves, and so on, some research has focused on optical images to detect ships [19][20][21][22]. Most classic detection techniques in marine remote sensing have used classic learning models, for example supervised statistical learning methods. The two main directions of detection strategies in remote sensing systems are removing false alarms (FAs) and finding the main objects. For example, such techniques can extract marine vehicles such as ships based on discrepancies in the scene and the gray-level difference between potential targets and the image background [23,24]. Most of the algorithms then use properties of the targets, such as shape in template matching or other features fed to a classifier, to recognize them [25]. Some existing methods use prior information about the offshore area to determine the sea regions, which helps to find the real targets much better [26].
The low temporal resolution of many satellite imaging sensors has made recognition and tracking inaccurate and limited when monitoring ships. Today, low Earth orbit (LEO) satellites that provide video surveillance are available thanks to major progress in camera technology in both spatial and temporal resolution. In recent years, a number of studies on video satellites have achieved reliable detection, recognition, and tracking of static targets and moving objects [27][28][29][30]. It is worth noting that satellite videos and real-time persistent monitoring are crucial for maritime applications. Among the three key artificial intelligence (AI) tasks in SAR video systems, i.e., detection, recognition, and tracking, most current research has focused on detection, mainly of ships for maritime systems. The lack of research on the other two topics, video recognition and tracking, is evident from the related literature. The present study offers a solution for jointly detecting and tracking moving objects (mobile targets). Our contribution is a remote sensing data processing system that benefits from microwave imaging in SAR sensors and from edge computing. It helps to find and track ships, whether commercial or military (and possibly other equipment such as fighters), on the sea surface. In detail, the system uses the Kalman filter along with additional pre-processing. A main part of the processing is pre-processing the radar frames to improve the tracking step. Note that in SAR imaging, what is observed of a moving object is its radar shadow, so it is the shadow of the real object that is tracked; for simplicity, the shadow is referred to as the object/target throughout the text. It is expected that modern AI and edge computing tools that perform well in other areas of research [31,33], for example LSTMs, which handle temporal data well [31,32], could improve the performance of our system.

Organization
This paper is organized as follows. The second section reviews the methods used from the literature and forms the proposed system. The third section provides all tests and results. The last section concludes the research.

Materials and methods: basic concepts and proposed system
This section is presented in two subsections. First, the basic tools and pre-processing steps are provided; then, Kalman filtering and the proposed system are introduced.
SAR images are captured from a very long distance above the oceans, and they suffer from complicated noise and artifacts. The two biggest categories of noise are the multiplicative artifact of the imaging system, known as speckle noise, and the low signal-to-noise ratio (SNR) of the imaging system, caused by unsuitable transmission power of the sensor and the height of the aerial or satellite platform.
To overcome this issue, a pre-processing step for noise removal is essential. Noise removal, or more realistically noise reduction, is not the only pre-processing step here. Another step can be spatial image enhancement (usually optional). Because SAR noise behaves in a nonlinear and multiplicative way, nonlinear filters should normally be used. Moreover, due to the semi-sparse nature of SAR images, histogram equalization and morphological operations may be required. All of these are called pre-processing and can strongly affect the quality of detection, recognition, and tracking.

Noise removal
As is well known, noise is one of the most discussed causes of poor quality in digital images. SAR images are at greater risk of being affected by noise as a kind of added signal. Adding noise to the main information of an image reduces the quality of experience (QoE) perceived by an end user (human) or an AI interpreter. Noise is sometimes divided into two types of distortion depending on its cause: ambient or environmental noise, and internal noise or artifacts of the imaging device or processing tools. In this research, we treat all of them in the same way and consider their mixture as a complicated noise with multiplicative nonlinear components. The denoising tools must therefore be effective enough to compensate. For SAR images, unlike optical images, the ambient components are not as harmful as the internal components. The noise therefore originates from a single source or from multiple sources among the processing algorithms in a sensor and its surrounding processors, at both the signal processing and the data processing stages. The best-known signal processing noise is the speckle artifact, and the most common data processing noise is the compression artifact. If no compensation is applied to remove noise, it will affect subsequent data processing steps such as edge detection and object detection. The compensation process is a kind of filtering. Noise-reduction filters often use a process named masking to sweep over all pixels of an input image. In the simplest description, the filters are a kind of central tendency measure computed from the neighboring pixels. As an example, the mean filter is a simple, linear solution for some kinds of additive noise, e.g., Gaussian noise. For SAR images, however, linear filters are not helpful enough given the complicated nature of the images: they may fail to remove strong speckle noise, and they also degrade the image edges. Regardless of the differences among the candidate filters, all of them are low-pass. A low-pass filter with an averaging mechanism refines all pixels of an input image by masking, such that an average of the neighboring pixels in the local mask is computed and the central pixel is replaced with it. Here, averaging is a general term indicating a kind of central tendency based on the local inputs of the mask.
The best-known measures of central tendency are the mean and the median, which in their basic forms are linear and nonlinear filters, respectively. The mean filter can work on Gaussian and Poisson noise, whereas the median is more appropriate for impulsive noise such as salt and pepper. In addition, extensions of these two are widely available in the literature, for example the Gaussian mean, which is a Gaussian low-pass filter (a kind of linear filter in image processing). As a result, the first step before further processing of SAR images is to enhance them with denoising filters. A main parameter of the denoising masks is their window size. With larger windows, noise is reduced well, but the main image information such as edges is degraded as well. With small windows, image edges are preserved acceptably, whereas the noise is not removed as desired. A trade-off therefore usually exists in selecting the window size.
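The masking idea and the contrast between mean and median can be illustrated with a minimal Python sketch. It assumes NumPy is available; the function name `local_filter`, the edge-replicating padding, and the default 3x3 window are illustrative choices and are not taken from the tested system.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_filter(image, size=3, reducer=np.median):
    """Slide a size x size mask over a 2D image and replace each pixel
    with a central-tendency measure (median or mean) of its neighbourhood."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so the mask has a centre pixel")
    pad = size // 2
    # Replicate border pixels so the output keeps the input shape (an assumption).
    padded = np.pad(image, pad, mode="edge")
    # windows has shape (H, W, size, size): one local mask per output pixel.
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1))


if __name__ == "__main__":
    frame = np.full((5, 5), 10.0)
    frame[2, 2] = 255.0  # a single impulse-like outlier
    print(local_filter(frame, 3, np.median)[2, 2])  # outlier removed by the median
    print(local_filter(frame, 3, np.mean)[2, 2])    # outlier only smeared by the mean
```

Increasing `size` strengthens the smoothing at the cost of blurring edges, which is the window-size trade-off described above.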

Image enhancement
High-boost filters and a fusion of local and global contrast information can help to find targets among fake objects during tracking. Even though radar imaging does not require natural light for remote sensing services and data provisioning, atmospheric elements that can ruin optical images may sometimes affect radar images as well, although their effect is not as severe as it is for optical imaging systems. As usual, the impact of these sources is modelled as noise. Pre-processing techniques such as dehazing and fog removal are used to enhance images in addition to the general solutions, including denoising, resolution enhancement, contrast improvement (including gamma correction and histogram matching and equalization), and edge enhancement (and high-boosting). In particular, the contrast of radar images taken by a SAR sensor is normally not acceptable, so this problem must be solved before the main processing.
All of this applies to single-channel microwave imaging on SAR platforms that provide radar imaging services for a variety of applications. There are other kinds of SAR sensors as well, which provide multichannel spectral information such as virtual color. As an example, polarimetric SAR uses polarization modes in microwave systems to produce a synthesized, virtually colored radar image. Multi-band SAR with different microwave ranges can also produce color images. For color images in radar imaging systems, improving low contrast is often not required because the spectral information is available, but for single-channel two-dimensional (2D) SAR, handling the image contrast is a very helpful pre-processing step. This may increase the images' energy, entropy, and variance, and eventually the accuracy of the AI units in data post-processing.
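As one concrete instance of contrast improvement for a single-channel frame, the following sketch implements plain global histogram equalization with NumPy. It assumes an 8-bit grayscale input; the function name `equalize_histogram` is hypothetical, and the exact enhancement settings used in the tests are not specified in the text.

```python
import numpy as np


def equalize_histogram(image):
    """Global histogram equalization for a 2D uint8 image.

    Each gray level is remapped through the normalized cumulative
    histogram so that the occupied levels spread over 0..255.
    """
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]   # cumulative count at the darkest occupied level
    total = cdf[-1]
    if total == cdf_min:        # constant image: nothing to stretch
        return image.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]           # apply the lookup table per pixel


if __name__ == "__main__":
    low_contrast = np.array([[50, 50], [60, 70]], dtype=np.uint8)
    print(equalize_histogram(low_contrast))
```

A low-contrast frame whose gray levels occupy a narrow band is stretched to the full range, which is the effect sought before the morphological and tracking steps.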

Morphological processing
As mentioned earlier, the median is a nonlinear filter that can smooth the inputs. Here, we use its standard 2D form on the SAR images, exclusively for non-target regions, mainly for areas affected by speckle and salt-and-pepper-like noise. Image enhancement based on histogram equalization to modify the contrast is the second main pre-processing step. Finally, filtering based on morphological operators is used to complete the pre-processing steps.
Morphological tools are image processing operations that are widely used to change a group of pixels or to process shapes. Among the well-known morphological operations, morphology-based binary extension is used here. A binary image has just two levels. The filter mainly affects the non-dark pixels of a binary image. Like the other image filters, it uses a sliding mask in 2D.
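A sliding-mask morphological operation on a binary image can be sketched as follows. The sketch assumes that the binary extension mentioned above corresponds to standard binary dilation with a square structuring element; that reading, the function name `binary_dilate`, and the 3x3 default are assumptions rather than details given in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def binary_dilate(mask, size=3):
    """Binary dilation with a size x size square structuring element.

    An output pixel becomes 1 if any pixel under the mask is non-zero,
    so bright (non-dark) regions grow by size // 2 pixels in each direction.
    """
    if size % 2 == 0:
        raise ValueError("structuring element size must be odd")
    pad = size // 2
    padded = np.pad(mask.astype(bool), pad, mode="constant", constant_values=False)
    windows = sliding_window_view(padded, (size, size))
    return windows.any(axis=(-2, -1)).astype(np.uint8)


if __name__ == "__main__":
    binary = np.zeros((5, 5), dtype=np.uint8)
    binary[2, 2] = 1
    print(binary_dilate(binary))  # the single bright pixel grows into a 3x3 block
```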

Kalman filtering and tracking
Tracking has been an active research topic in video processing for decades. In fact, the difference between a still image and a video sequence is what gives rise to such topics. Tracking in SAR videos is one of the most active topics of recent research. After the pre-processing discussed in the preceding sections, some details of Kalman filtering are reviewed here. This filter is an estimator, a second-order linear model that uses error measurements. Kalman filtering provides an algorithm for estimating the state of a dynamic system over time. In the literature on advanced statistical algorithms, specifically statistical signal processing, the Kalman filter is considered a Bayesian estimator. The algorithm is implemented in two main steps. In the first step, prediction, the filter produces an estimate of the current state variables along with their uncertainty. When the next set of measurements is recorded, this estimate is updated using a weighted average, which gives more weight to information with higher certainty and reliability. The algorithm is recursive: it works with new inputs and past states. It is often assumed that all errors in the inputs are Gaussian; if the inputs do not actually satisfy this assumption, the accuracy of the algorithm decreases. In brief, the Kalman filter generates the best estimate of the system only if its assumptions hold. The filter produces a predicted state and then compares it with the measured information. Ultimately, it sets a weighting based on the discrepancy between the prediction and the measurements to form a new estimate for the next moment. Although we do not review the filter's computational details here (refer to [34] for more background), a number of equations describing its operation are given as follows.
The estimate at time $t_i$ is given by Eq. 1.
where $G(t_i)$ is the gain, $M(t_i)$ is the measurement matrix, and $x(t_i)$ is the system input. $P$ denotes the prediction term, given by Eq. 2.
In Eq. 2, $A(t_i)$ is the system matrix. The other term, with argument $t_{i-1}$, shows that the prior information is continuously used to compute the next estimate. The gain matrix is obtained from the covariance matrix; for more details, refer to textbooks. We now concentrate on the filter's use in video processing. Suppose that $s(x,y)$ is a sample pixel of a frame at $t_i$; then $s(x,y,t)$ is the term to be estimated. Let $m(x,y,t_{i-1})$ be a binary map that determines whether the pixel at location $(x,y)$ belongs to the background or to the moving object at $t_{i-1}$; it is formulated in Eq. 3. The value "1" indicates that the pixel at $(x,y)$ is moving, whereas "0" indicates the relatively static background. In Eq. 3, $Th(x,y,t_i)$ is the threshold, as explained in Eq. 4.
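For orientation, a commonly used form of Kalman-based background estimation that matches the symbols defined above can be written as follows. This is a sketch under assumptions: the hat notation $\hat{s}$ for the estimate, the absolute-difference quantity $d$ between the measured pixel and its estimate, and the time index attached to $Th$ in the last line are illustrative choices, and the threshold rule of Eq. 4 is not reproduced here.

```latex
\begin{align*}
\hat{s}(t_i) &= P(t_i) + G(t_i)\,\bigl[x(t_i) - M(t_i)\,P(t_i)\bigr] \\
P(t_i) &= A(t_i)\,\hat{s}(t_{i-1}) \\
m(x,y,t_{i-1}) &=
\begin{cases}
1, & d(x,y,t_{i-1}) > Th(x,y,t_{i-1}) \\
0, & \text{otherwise}
\end{cases}
\end{align*}
```

The first line corrects the prediction $P(t_i)$ by the gain-weighted difference between the input and the predicted measurement, the second line propagates the previous estimate through the system matrix, and the third line flags a pixel as moving when its deviation exceeds the threshold.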
Depending on whether "0" or "1" occurs, the gain factor differs. As noted, the algorithm is recursive: the system's prior information is used to estimate the current state without the need to store all measurements. The intensity of every pixel is estimated as a state of the system in order to flag it as a background pixel. In this method, a threshold is set according to the equations to specify whether the pixel is ultimately part of a moving object or of the background.
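The recursion described above can be sketched per pixel in Python. This is a simplified illustration, not the tested implementation: it assumes scalar system and measurement terms equal to 1, a fixed threshold instead of the adaptive one, and two hypothetical constant gains (`gain_bg` for background pixels, `gain_fg` for pixels previously flagged as moving).

```python
import numpy as np


def kalman_background_step(estimate, prev_mask, frame, threshold,
                           gain_bg=0.1, gain_fg=0.01):
    """One recursive update of a per-pixel background estimate.

    estimate  : background estimate from the previous frame (2D float array)
    prev_mask : binary map from the previous frame (1 = moving, 0 = background)
    frame     : current measured frame (2D array)
    Returns the new estimate and the new binary map.
    """
    frame = frame.astype(float)
    prediction = estimate                  # prediction with system term 1 (assumed)
    residual = frame - prediction          # input minus predicted measurement
    # The gain depends on whether the pixel was flagged as moving: moving
    # pixels barely update the background, static pixels adapt faster.
    gain = np.where(prev_mask == 1, gain_fg, gain_bg)
    new_estimate = prediction + gain * residual
    # A pixel is flagged as moving when it deviates strongly from the estimate.
    new_mask = (np.abs(frame - new_estimate) > threshold).astype(np.uint8)
    return new_estimate, new_mask


if __name__ == "__main__":
    background = np.zeros((3, 3))
    mask = np.zeros((3, 3), dtype=np.uint8)
    frame = np.zeros((3, 3))
    frame[1, 1] = 100.0                    # a bright moving pixel appears
    for _ in range(2):
        background, mask = kalman_background_step(background, mask, frame, 20.0)
    print(mask)
```

Only the previous estimate and the previous binary map are carried between frames, which reflects the point that no history of measurements has to be stored.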

Proposed system: integration
The proposed system does not introduce new algorithms; it is an optimized combination of the reviewed algorithms for finding and tracking moving objects in SAR videos. We tuned the performance heuristically for the radar dataset. The procedure presented below is a brief description of the proposed system. In the next section, the test results are provided along with visual outputs. The system does not require the complicated computations of supervised machine learning systems, so it can easily be run at the edge. The last two steps are presented in the next part of the paper.

Results and evaluation
The data used for testing the proposed system come from the ICEYE company and are freely available on its website (https:// www.iceye.com/ blog/ iceye-sar-videos-publi shedtechn ical-insig hts-and-highl). ICEYE is a European start-up for satellite services and SAR videos. Our tests are divided into two parts: pre-processing and tracking. For pre-processing, there is no separate test, but its intermediate visual outputs are illustrated. The tracking result, on the other hand, is the integrated output of both pre-processing and Kalman filtering. Figure 2 shows the visual performance of the pre-processing steps for the first dataset, which contains numerous moving objects. Every setting produces a different output. We do not discuss the details of the settings here because they depend entirely on the input frame; the settings must therefore be tuned for each new input (by an expert) to select the best one. The best output of the pre-processing step then becomes the input of the tracking step. Figure 3 shows the same output for the second radar dataset, which includes only one moving object.
Figure 4 illustrates the visual outputs of the tracking step for different settings on the first dataset. Two tables provide a qualitative and a quantitative analysis of the visual outputs in Fig. 4, Tables 1 and 2, respectively. In Fig. 4, although the 8th part records a target detection accuracy of 100%, it has a very high false alarm probability, which ultimately makes it unreliable. This is because of the sensitive settings assigned to the system: on the one hand they help to find all moving targets, but on the other hand a larger number of false targets are also detected. In Table 2, two metrics, the detection accuracy for real moving targets/objects and the probability of false alarms ($P_{FA}$), are used as the main measures of performance, Eq. 5 and Eq. 6, respectively.

Fig. 2 Visual outputs of the pre-processing step on dataset_1; the first row shows some denoised frames with various settings, the second row shows the denoised frames above after applying the enhancement filters, and finally morphology is applied to the enhanced frame above to make a binary matrix extracting possible moving objects from the background. This clearly shows that various settings can produce entirely different visual results and accuracy when there are multiple moving objects

Fig. 3 Visual outputs of the pre-processing step on dataset_2; the first row shows some denoised frames with various settings, the second row shows the denoised frames above after applying the enhancement filters, and finally morphology is applied to the enhanced frame above to make a binary matrix extracting possible moving objects from the background. This clearly shows that various settings may produce approximately the same visual results and accuracy when there is a single moving object

Figure 5 illustrates the visual outputs of tracking on the second dataset, with only one moving target. Figure 4 showed that the system performance is very sensitive to the selected settings when there are multiple moving targets, but the test in Fig. 5 shows that the system works reliably for a single-target scenario regardless of the assigned settings. Tables 3 and 4 summarize the interpretations of the second dataset and the results in Fig. 5.
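The two performance measures reported in the quantitative tables can be computed as in the short Python sketch below: detection accuracy is the percentage of moving objects in the scene that were correctly found, and the false alarm probability is the share of false alarms among all detections. The function names are hypothetical, and returning 0 when there are no detections at all is an assumption made only to avoid a division by zero.

```python
def detection_accuracy(correct_found, total_moving_objects):
    """Percentage of the moving objects in a scene that were correctly found."""
    if total_moving_objects <= 0:
        raise ValueError("the scene must contain at least one moving object")
    return 100.0 * correct_found / total_moving_objects


def false_alarm_probability(false_alarms, total_detections):
    """Fraction of all detections that are false alarms."""
    if total_detections == 0:
        return 0.0  # assumption: no detections means no false alarms
    return false_alarms / total_detections


if __name__ == "__main__":
    # Illustrative counts only, not values taken from the tables.
    print(detection_accuracy(3, 4))
    print(false_alarm_probability(1, 4))
```

A setting that finds every target but also produces many false detections scores 100% on the first measure and poorly on the second, which is why both are reported together.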
The findings of this paper are useful for any remote sensing platform in the sky or in space. However, all the test data come from satellite sensors, not UAV sensors, because the goal was to monitor marine transportation and no marine UAV data have been found so far for this microwave sensor. Overall, monitoring seas and oceans by satellite is more affordable.

Conclusions
This research has studied radar video tracking, a very active research topic in radar temporal data processing for remote sensing applications. An unsupervised target tracking system was proposed and evaluated using satellite radar data. The system does not use complex algorithms or highly sensitive supervised machine learning methods, so it can be relied upon for new data with the limited computing capacity available at the edge. For a scene with a single moving object, the system not only performs very well, but its settings are also uncomplicated. On the other hand, for scenes with multiple moving objects, although the system can perform well, its settings are somewhat complicated. Therefore, the proposed system is more suitable for single moving target detection and tracking. In addition, to avoid heuristic tuning for the best possible performance in multiple-target tracking, it is suggested that an optimizer or an unsupervised strategy be used to make this part of the system automatic as well.

Table 2
Quantitative analysis of the outputs of tracking for dataset_1 in Fig. 4, the best performance is accuracy of 100% while the false alarm probability is 0

Fig. 1
Fig. 1 Edge-based processing in an aerial UAV ad-hoc network; UAV_1 is the remote sensing imaging platform, UAV_2 and UAV_3 are edge computing resources, while UAV_4 acts as the cluster head in a clustered network

$$\text{Accuracy} = \frac{\text{Number of correctly found moving objects}}{\text{Number of all moving objects in a scene}} \times 100 \qquad (5)$$

$$P_{FA} = \frac{\text{Number of FAs}}{\text{Number of total detections}} \qquad (6)$$

Fig. 4
Fig. 4 Visual outputs of tracking for dataset_1

Fig. 5
Fig. 5 Visual outputs of tracking for dataset_2, for most tests including Frames 1 and 2 here, the accuracy is 100% while the false alarm probability is 0

Table 1
Qualitative analysis of the outputs of tracking for dataset_1 in Fig. 4

Table 3
Qualitative analysis of the outputs of tracking for dataset_2 in Fig. 5

Table 4
Quantitative analysis of the outputs of tracking for dataset_2 in Fig. 5, the best performance is accuracy of 100% while the false alarm probability is 0