Journal of Communications and Information Networks, 2018, 3(2): 43-57 doi: 10.1007/s41650-018-0012-7

Research papers

A Stable and Energy-Efficient Routing Algorithm Based on Learning Automata Theory for MANET

Sheng Hao, Huyin Zhang, Mengkai Song

School of Computer, Wuhan University, Wuhan 430079, China

Corresponding author: Huyin Zhang, zhy2536@whu.edu.cn

About authors

Sheng Hao was born in Lanzhou. He received his B.E. and M.S. degrees in computer science and technology from Wuhan University. He is now a Ph.D. candidate in architecture. His research interests include wireless networks, communication theory, and complex network theory. E-mail: 2008301500139@whu.edu.cn.

Huyin Zhang was born in Shanghai. He received his Ph.D. degree in computer science and technology from Wuhan University, where he is now a professor. His research interests include high-performance computing, network quality of service, and new-generation network architecture. E-mail: zhy2536@whu.edu.cn.

Mengkai Song was born in Wuhan. He received his undergraduate degree in computer science and technology from Wuhan University. He is now a graduate student in computer networks. His research interests include wireless networks and differential privacy. E-mail: mksong@whu.edu.cn.

Abstract

The mobile Ad Hoc network (MANET) is a self-organizing and self-configuring wireless network, consisting of a set of mobile nodes. The design of efficient routing protocols for MANET has always been an active area of research. Existing routing algorithms, however, do not scale well enough to ensure route stability when the mobility and distribution of nodes vary with time. In addition, each node in a MANET has only limited initial energy, so energy conservation and balance must be taken into account. An efficient routing algorithm should not only be stable but also energy-saving and energy-balanced within the dynamic network environment. To address the above problems, we propose a stable and energy-efficient routing algorithm based on learning automata (LA) theory for MANET. First, we construct a new node stability measurement model and define an effective energy ratio function. On that basis, we give each node a weighted value, which is used as the iteration parameter for the LA. Next, we construct an LA theory-based feedback mechanism for the MANET environment to optimize the selection of available routes, and we prove the convergence of our algorithm. The experiments show that our proposed LA-based routing algorithm for MANET achieves the best performance in route survival time, energy consumption, and energy balance, and acceptable performance in end-to-end delay and packet delivery ratio.

Keywords: MANET routing ; stability measurement model ; effective energy ratio function ; learning automata theory ; feedback mechanism ; optimization



Ⅰ. INTRODUCTION

The mobile Ad Hoc network (MANET) is composed of a set of self-organizing wireless mobile nodes, communicating without a fixed communication network infrastructure[1,2,3,4,5]. Owing to its flexible and dynamic nature, MANET has been widely used in areas such as military communications, disaster area communications, and emergency rescues. MANET has also been used to ensure vehicle communication by constructing vehicular Ad Hoc networks (VANET). To enhance communication quality, MANET needs efficient routing protocols. In this network, each node has stochastic mobility and distribution, causing the network topology to vary with time. It is therefore difficult to ensure the stability of selected routes. Considering the limited initial energy of each node, it is important to reduce energy consumption and also to balance residual energy. To date, many MANET routing algorithms have been proposed to enhance communication performance.

A. Motivation

Although a large amount of meaningful work has been conducted on the design of MANET routing, none of this work considers the impact of node distribution, which varies over time and affects route stability. Work that focuses on enhancing the stability of the routing topology using mobility prediction models (e.g., Refs. [6,7,8,9,10,11,12,13,14,15]) may therefore not fully consider mobility control. In addition, most existing MANET routing algorithms use traditional heuristic algorithms as their optimization methods (e.g., Refs. [16,17,18,19,20,21,22]); these may lack expansibility and offer minimal hand-tuning, resulting in relatively high computation costs in a dynamic environment. From the engineering application point of view, MANET has been widely used in many infrastructure and emergency communication applications. The new generation of MANET can use non-orthogonal multiple access techniques, which greatly enhance its performance. The existing problems and practical value discussed above motivated us to design a stable and energy-efficient routing algorithm for MANET.

B. Contributions

The major contributions of this paper can be summarized as follows.

1) We construct a new stability measurement model to measure relay node stability in a route and define an effective energy ratio function.

2) We introduce learning automata (LA) theory to optimize the process of route selection.

3) We construct a MANET environment feedback mechanism using LA theory. With the help of this mechanism, we can choose the optimal path from the source node to the destination node, which is more stable and ensures energy conservation and balance.

4) We provide a rigorous mathematical proof to establish the algorithm's convergence, which has not been done successfully in previous work.

C. Paper Outline

The rest of our paper is organized as follows. The related work of the MANET routing algorithm is reviewed in section Ⅱ. A brief overview of the routing protocol and LA is presented in section Ⅲ. The system model is presented in section Ⅳ. The proof of convergence is provided in section V. The simulation results and experiment analysis are presented in section Ⅵ. Finally, a conclusion summarizes these findings in section Ⅶ.

Ⅱ. RELATED WORK AND OUR SOLUTION

To the best of our knowledge, most current routing algorithms proposed for MANET focus mainly on enhancing routing stability (e.g., Refs. [6,7,8,9,10,11,12,13]). Typically, in Ref. [7], B. An et al. proposed a mobility-based hybrid routing protocol. By dividing the network into several dynamic and adaptive teams, the mobility behavior of each team can be predicted by constructing a team mobility model. A hybrid routing protocol is then proposed for team communication. However, in a practical network environment, each node has independent stochastic mobility; a team mobility model cannot reflect the changes within a team. If the relative mobility of nodes in a team is high enough, the accuracy of this model is necessarily reduced. In Ref. [8], A. Bentaleb et al. proposed a new mobility model, based on the Doppler shift, which can estimate relative speed. This mobility model enables the design of a mobility-based clustering routing protocol. Although this method ensures that the selected route has relatively high stability, the nodes must periodically exchange HELLO packets with neighbor nodes, which increases their energy consumption. In Ref. [11], R. Suraj et al. used movement history and a genetic policy to construct a direction prediction adjacency matrix. This technique offers a new approach to mobility prediction that does not depend on probabilistic methods and is based entirely on a genetic algorithm (GA). However, this method cannot ensure normalization, and it incurs relatively high computation costs (owing to the prediction adjacency matrix). In Ref. [12], T. Manimegalai et al. proposed a routing algorithm based on an animal communication strategy (ACS). Using ACS, the authors constructed an animal behavior mechanism to improve path construction and retention, enhancing route stability. This method is similar to the ant colony algorithm, which uses history information to adjust the relay nodes.
In this method, the authors state that node density is very important for enhancing route topology and communication stability. This rule is modeled on the gregarious habits of animals: a node with relatively high local node density is chosen as the relay node, and the adjustment mechanism for relay nodes follows the same rule. It should be stressed that, in this method, the source node must periodically send a request packet to the destination node.

In addition, many routing algorithms focus on improving other qualities (e.g., Refs. [16,17,18,19,20,21,22]), including energy consumption, end-to-end delay, and the packet delivery ratio. In a typical example [17], S. Chettibi et al. proposed an adaptive maximum-lifetime routing policy based on a reinforcement learning strategy. With the help of a heuristic principle (the Q-learning method), it optimizes the route lifetime and minimizes control overhead. In this method, the authors construct an intelligent battery model, which helps a node adjust its transmission power based on the dynamic environment. In Ref. [18], a network routing protocol is proposed based on the Russian Doll method. In this method, the routing protocol can choose the best multi-criteria solution to reduce energy consumption and enhance the throughput ratio. In Ref. [19], S. Chettibi et al. proposed a routing algorithm based on the ant colony optimization (ACO) technique. Using the ant colony algorithm, the authors constructed an estimation model to represent route preference. In this method, the battery equipping a node is assumed to be intelligent enough to adjust its battery loss rate; the heuristic strategy proposed in the paper relies on this precondition, which may not fit all MANETs. Ref. [20] proposed to select a stable route by using a fuzzy logic system. In Ref. [21], P. Srivastava et al. achieved QoS-aware routing by using an artificial neural network (ANN). This approach used a convolution calculation to achieve good performance in routing QoS (in this algorithm, the optimization targets are the packet delivery ratio and end-to-end delay; for this reason, the parameters needed by the convolutional neural network (CNN) are the packet delivery ratio and end-to-end delay). It is worth noting that the convolution calculation model constructed in this paper can raise computation costs (as the authors used a two-layer CNN, the measurement parameters had to be computed multiple times).
In addition, as a general rule, the computation accuracy and time of a CNN always depend on the number of layers.

We have pointed out the advantages and disadvantages of each routing algorithm [6,22]. It should be noted that none of the works mentioned considers the impact of node distribution, which varies with time and also affects route topology stability (the route life cycle). In fact, the survival time of a route is influenced not only by the relative mobility between nodes but also by the distribution of nodes. Overall energy conservation and balance should also be taken into consideration. In addition, it must be stressed that the traditional heuristic techniques and machine learning methods used to design MANET routing protocols generally lack expansibility, offer minimal hand-tuning, and incur relatively high computation costs. To resolve these problems, this paper proposes a stable and energy-efficient routing algorithm for MANET using LA theory. Compared with traditional machine learning methods and heuristic algorithms, LA theory has the following advantages: (1) LA theory is supported by a complete mathematical proof [23,24,25,26]; (2) LA theory is capable of global optimization and incurs relatively low costs in a dynamic environment; (3) LA theory has good expansibility, which is needed to optimize large-scale MANET routing performance; (4) LA theory can map the computation space to a probability space, ensuring normalization. The optimization efficiency of traditional heuristic algorithms and machine learning methods (e.g., ACO, PSO, GA) relies on the construction of a heuristic function. In a dynamic environment, it is difficult to construct a good enough heuristic function, and these methods cannot always ensure normalization. As a general rule, LA theory is used in bio-computation and stochastic system control, both of which can be regarded as dynamic environments. Owing to the dynamic features of MANET, we are able to use LA theory to find the optimal route from the available paths.

In our solution, we begin by constructing a new node stability measurement model and defining the effective energy ratio function. On this basis, we introduce a node-weighted value function, which is used as the iteration parameter. We then use LA theory to construct a MANET environment feedback model. In this feedback model, each node is equipped with a learning automaton, enabling it to take actions by sensing the surrounding environment. Based on LA theory, this process can be represented through a rigorous linear probability iteration; in other words, a relay node on an available path updates its weighted value according to the feedback signal, which represents the result of sensing the environment (we use a judging function to distinguish the type of feedback signal, explained in part A of section Ⅳ). When the feedback signal is a reward signal, the node increases its weighted value; conversely, it reduces its weighted value. Thus, the current node can decide which node should be chosen as the next hop from a group of available hop nodes. Accordingly, the path value defined in this feedback mechanism is increased or reduced (before executing this mechanism, we find all available paths from the source node to the destination node, as explained in part A of section Ⅳ). Because of the convergence of LA, we can finally choose the optimal path, with the highest path value, from all of the available paths. The chosen path will be stable enough to ensure overall energy saving and balance.

Ⅲ. PRELIMINARIES AND BACKGROUND

In this section, we present a brief overview of routing protocols and some preliminary information on LA theory.

A. Overview of Routing Protocols

Based on their relation with routing information[20,27], routing protocols can be divided into several categories. In general, we classify them into three kinds: proactive, reactive, and hybrid protocols. Proactive routing protocols periodically update routing messages so that data packet transmission can be ensured. Reactive protocols initiate route discovery on demand: when the source node has data packets to send to a given destination node, it initiates route discovery by broadcasting a route request packet. On receiving the request packet, the relay nodes rebroadcast it. This process continues until the request packet arrives at the destination node. Similar to a handshake mechanism, the destination node then generates a route reply packet and sends it to the source node; in other words, the reply packet traces, in reverse, the route already taken by the corresponding request packet. As a compromise scheme, hybrid routing protocols combine these two approaches and can be used in hierarchically structured networks. Generally, proactive protocols cause more energy consumption, which degrades the network life cycle, while hybrid routing protocols use more control information than reactive protocols and require a hierarchically structured network.

B. Learning Automata

LA theory is a self-learning mechanism based on the theory of stochastic processes[23,26]. As an adaptive decision-making system, an LA enhances its performance by using previous knowledge to choose the best action from a limited set of actions, through repeated interactions with a random environment. A basic LA contains three key factors: a random environment, an automaton, and a feedback system. The automaton chooses actions based on the random environment, and the environment responds to these actions by producing a feedback signal. Based on its effect on the automaton, the feedback signal can be divided into a "positive" signal (reward signal) or a "negative" signal (penalty signal). Over a period of time, the automaton learns from the feedback signal to find an optimal action (Fig. 1 shows the operating principle of LA).
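As a concrete illustration of this reward/penalty loop, the following is a minimal sketch of a linear reward-penalty automaton interacting with a stationary random environment. The penalty probabilities, learning rates, and step count here are illustrative, not taken from the paper.

```python
import random

def run_learning_automaton(penalty_probs, a=0.1, b=0.1, steps=20000, seed=1):
    """Minimal linear reward-penalty automaton against a stationary
    random environment; returns the time-averaged action probabilities."""
    random.seed(seed)
    n = len(penalty_probs)
    p = [1.0 / n] * n                                # action probability vector
    avg = [0.0] * n
    for _ in range(steps):
        i = random.choices(range(n), weights=p)[0]   # pick an action
        if random.random() >= penalty_probs[i]:      # reward: move p_i toward 1
            p = [pj + a * (1 - pj) if j == i else pj * (1 - a)
                 for j, pj in enumerate(p)]
        else:                                        # penalty: shift mass away
            p = [pj * (1 - b) if j == i else b / (n - 1) + pj * (1 - b)
                 for j, pj in enumerate(p)]
        avg = [s + pj / steps for s, pj in zip(avg, p)]
    return avg

# action 1 has the lowest penalty probability, so the automaton should
# come to prefer it on average
probs = run_learning_automaton([0.7, 0.2, 0.5])
```

Both update rules preserve the normalization of the probability vector, which is the property the paper highlights as an advantage of LA over heuristic methods.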

Definition 1 (environment) The random environment is an object interacting with the automaton. Usually, we set E = {A, B, C} to describe the random environment, where A = {α1, α2, …, αt} represents the limited set of inputs performed by the automaton, and αt represents the action in time slot t. Similarly, B = {β1, β2, …, βt} represents a limited set of responses from the random environment, and βt represents the response from the environment in time slot t. C = {c1, c2, …, ct} is the set of penalty probabilities, where ct is associated with the given action in time slot t.

Figure 1   Learning automata


Using the definition of ci, the average penalty M(t)can be defined by the following expression:

$$M(t) = E[\beta(t)\,|\,P(t)] = E[\beta(t)\,|\,p_1(t), p_2(t), \dots, p_n(t)] = \Pr[\beta(t)=0\,|\,P(t)] = \sum_{i=1}^{n}\Pr[\beta(t)=0\,|\,\alpha(t)=\alpha_i]\Pr[\alpha(t)=\alpha_i] = \sum_{i=1}^{n} c_i\, p_i(t),$$

where P(t)represents the action probability at instant t. Thus, the average penalty for the pure chance automaton is expressed as

$$M_0 = \frac{1}{n}\sum_{i=1}^{n} c_i.$$
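As a small numeric illustration of the two expressions above (the values of c and P(t) are arbitrary, chosen only for the example), the average penalty and the pure-chance baseline can be computed directly:

```python
c = [0.7, 0.2, 0.5]          # penalty probabilities c_i (illustrative)
p = [0.2, 0.5, 0.3]          # action probabilities p_i(t), summing to 1

M_t = sum(ci * pi for ci, pi in zip(c, p))   # M(t) = sum_i c_i * p_i(t)
M_0 = sum(c) / len(c)                        # pure-chance automaton baseline

# a learning automaton is doing better than chance whenever M(t) < M_0
```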

Definition 2 (automaton) In a general sense, an automaton is a working system that does not need guidance from the outside. From the mathematical point of view, the automaton can be defined as {A, B, Q, T, G} (A = {α1, α2, …, αt} and B = {β1, β2, …, βt} have been defined in Definition 1). Q = {q1(t), q2(t), …, qn(t)} represents the state in time slot t. T : Q × A × B → Q denotes the state transition function of the automaton, which determines how the automaton transfers to the next state from the current state and input. We usually use the following formula to represent the state transition from instant t to t + 1.

$$q(t+1) = T(q(t), \alpha(t), \beta(t)).$$

G determines the output based on the state at instant t.

$$\alpha(t) = G(q(t)).$$

Ⅳ. SYSTEM MODEL

In this section, we construct a new stability measurement model and define an effective energy ratio function. On this basis, we define the node-weighted value function, which can be used to construct a MANET environment feedback mechanism(a routing learning process).

A. Network Model

In general, a MANET can be described by an undirected graph G = (V, E), where V represents the set of nodes and E represents the set of edges[28,29]. Therefore, a path in the network can be regarded as a set of nodes connecting the source node to the destination node. In our paper, all nodes exist in a 2D rectangular scenario and communicate through a common broadcast channel using omnidirectional antennas, all with the same transmission range r. Assuming that the distance between nodes i and j is represented as Dist(i, j) and that j is i's neighbor node, Dist(i, j) should be no more than r. Additionally, we do not consider the impact of an interference range or collision avoidance on the shared wireless channel. These preconditions have been widely adopted in previous work.

B. Node Stability Measurement Model

The chosen relay node in an available path from the source node to the destination node should be stable enough to ensure path stability. This distinguishes our work from previous studies, which used only node velocity and did not consider the impact of the distribution of neighbor nodes on stability. We measure node stability by estimating the average time a node stays connected to its neighbor nodes. Without any loss of generality, we assume that vi represents the mobility speed of node i in a small time slot, and that the velocity of the node remains unchanged during this short period. Hence, we can estimate the survival time of the link between node i and any of its neighbor nodes using the following formula:

$$r^2 = [(v_{ix} - v_{jx})\,t_{i,j} + \mathrm{Dist}(i,j)_x]^2 + [(v_{iy} - v_{jy})\,t_{i,j} + \mathrm{Dist}(i,j)_y]^2, \tag{1}$$

where $v_{(\cdot)x}$ and $v_{(\cdot)y}$ represent the horizontal and vertical components of speed, respectively; $\mathrm{Dist}(\cdot)_x$ and $\mathrm{Dist}(\cdot)_y$ represent the horizontal and vertical components of the current distance between node i and node j; node j is a neighbor node of node i. By solving the above formula, we can get the expression of $t_{i,j}$

$$t_{i,j} = \frac{\sqrt{H_b^2 - H_a(H_c - r^2)} - H_b}{H_a}, \tag{2}$$

where Ha, Hb, Hc can be represented as

$$\begin{cases} H_a = (v_{ix} - v_{jx})^2 + (v_{iy} - v_{jy})^2, \\ H_b = \mathrm{Dist}(i,j)_x (v_{ix} - v_{jx}) + \mathrm{Dist}(i,j)_y (v_{iy} - v_{jy}), \\ H_c = [\mathrm{Dist}(i,j)_x]^2 + [\mathrm{Dist}(i,j)_y]^2. \end{cases} \tag{3}$$
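Formulas (1)-(3) amount to solving a quadratic in the link lifetime. A small sketch of that computation follows; the velocities, separation, and range in the example are invented for illustration, and the positive root is taken as in formula (2).

```python
import math

def link_survival_time(vi, vj, dist, r):
    """Estimated time until nodes i and j drift out of transmission
    range r. vi, vj: (vx, vy) velocities; dist: (dx, dy) current
    separation vector from i to j (nodes assumed currently in range)."""
    ha = (vi[0] - vj[0]) ** 2 + (vi[1] - vj[1]) ** 2
    hb = dist[0] * (vi[0] - vj[0]) + dist[1] * (vi[1] - vj[1])
    hc = dist[0] ** 2 + dist[1] ** 2
    if ha == 0:
        return math.inf          # no relative motion: the link never expires
    disc = hb * hb - ha * (hc - r * r)
    return (math.sqrt(disc) - hb) / ha

# two nodes closing head-on at 2 m/s, currently 50 m apart, range 100 m:
# they pass each other and the gap re-opens to 100 m after 75 s
t = link_survival_time((1.0, 0.0), (-1.0, 0.0), (-50.0, 0.0), 100.0)
```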

Thus, we can use a weighted average function to measure node i's stability ($Ns_i$)

$$\begin{cases} Ns_i = \sum_{j=1}^{m} \pi\{\text{current distance} = \mathrm{Dist}(i,j)\}\; t_{i,j}, \\ \sum_{j=1}^{m} \pi_{ij} = 1, \end{cases} \tag{4}$$

where π{current distance = Dist(i, j)} represents the weighted value of $t_{i,j}$, i.e., the limiting probability that the current distance between i and j is Dist(i, j). To give the expression of π{current distance = Dist(i, j)}, we should first deduce the expression of Prob{Dist(i, j) ≤ r} (r is the transmission range defined in part A of this section).

Without loss of generality, node mobility in this MANET is independent and stochastic. Thus, we can deduce the spatial PDF (probability density function) of any node in the MANET as the following uniform distribution:

$$f(x,y) = \frac{1}{LW}, \quad x \in (0, L),\; y \in (0, W), \tag{5}$$

where L and W respectively represent the length and width of the MANET region. Therefore, the joint PDF of the distance components between nodes i and j can be represented as follows (noting that the joint PDF relies on the setting of the network scenario)

$$\begin{cases} f_{i,j}(D_x, D_y) = \frac{4}{L^2 W^2}(L - D_x)(W - D_y), \\ \sqrt{D_x^2 + D_y^2} = \mathrm{Dist}(i,j) = D, \\ D_x = \mathrm{Dist}(i,j)_x;\; D_y = \mathrm{Dist}(i,j)_y. \end{cases} \tag{6}$$

Thus, the expression of Prob{Dist(i, j) ≤ r} can be expressed as

$$\begin{cases} \text{for } 0 \le r \le W: \\ \quad \mathrm{Prob}\{\mathrm{Dist}(i,j) = D \le r\} = \displaystyle\int_0^r \int_0^{\sqrt{r^2 - D_x^2}} f_{ij}(D_x, D_y)\, \mathrm{d}D_y\, \mathrm{d}D_x, \\ \text{for } W \le r \le L: \\ \quad \mathrm{Prob}\{\mathrm{Dist}(i,j) = D \le r\} = \displaystyle\int_0^{\sqrt{r^2 - W^2}} \int_0^{W} f_{ij}(D_x, D_y)\, \mathrm{d}D_y\, \mathrm{d}D_x + \int_{\sqrt{r^2 - W^2}}^{r} \int_0^{\sqrt{r^2 - D_x^2}} f_{ij}(D_x, D_y)\, \mathrm{d}D_y\, \mathrm{d}D_x, \\ \text{for } L \le r \le \sqrt{L^2 + W^2}: \\ \quad \mathrm{Prob}\{\mathrm{Dist}(i,j) = D \le r\} = \displaystyle\int_0^{\sqrt{r^2 - W^2}} \int_0^{W} f_{ij}(D_x, D_y)\, \mathrm{d}D_y\, \mathrm{d}D_x + \int_{\sqrt{r^2 - W^2}}^{L} \int_0^{\sqrt{r^2 - D_x^2}} f_{ij}(D_x, D_y)\, \mathrm{d}D_y\, \mathrm{d}D_x. \end{cases} \tag{7}$$

Simplifying integral (7) using the elementary integral method, we obtain

$$\mathrm{Prob}\{\mathrm{Dist}(i,j) = D \le r\} = \frac{4D}{L^2 W^2}\, f_{\mathrm{unit}}(D), \tag{8}$$

where $f_{\mathrm{unit}}(D)$ can be written as

$$\begin{cases} \text{for } 0 \le r \le W: \quad f_{\mathrm{unit}}(D) = \dfrac{\pi}{2} LW - Lr - Wr + \dfrac{r^2}{2}, \\ \text{for } W \le r \le L: \quad f_{\mathrm{unit}}(D) = LW \arcsin\dfrac{W}{r} + L\sqrt{r^2 - W^2} - \dfrac{W^2}{2} - Lr, \\ \text{for } L \le r \le \sqrt{L^2 + W^2}: \quad f_{\mathrm{unit}}(D) = LW \arcsin\dfrac{W}{r} + L\sqrt{r^2 - W^2} - \dfrac{W^2}{2} + W\sqrt{r^2 - L^2} - \dfrac{L^2}{2} - \dfrac{r^2}{2}. \end{cases} \tag{9}$$
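The distance distribution underlying formulas (7)-(9) can be sanity-checked numerically. The sketch below estimates Prob{Dist(i, j) ≤ r} by Monte Carlo for two independent uniform node positions and compares it, for the unit square, against the well-known closed form of the random-distance CDF (this closed form is a standard result, not a formula from the paper):

```python
import random

def prob_within_range(L, W, r, samples=200_000, seed=7):
    """Monte Carlo estimate of Prob{Dist(i, j) <= r} for two nodes
    placed independently and uniformly in an L x W rectangle."""
    random.seed(seed)
    hits = 0
    for _ in range(samples):
        dx = random.uniform(0, L) - random.uniform(0, L)
        dy = random.uniform(0, W) - random.uniform(0, W)
        if dx * dx + dy * dy <= r * r:
            hits += 1
    return hits / samples

# unit square, r = 0.5: the known closed form
# pi*r^2 - (8/3)*r^3 + r^4/2 gives ~0.4833
est = prob_within_range(1.0, 1.0, 0.5)
```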

Generally, the transmission range r is less than the width W. With the help of the spline interpolation method[30,31], we can deduce the expression of π{current distance = Dist(i, j)}

$$\pi\{\text{current distance} = \mathrm{Dist}(i,j) = D\} = \mathrm{Prob}\{\mathrm{Dist}(i,j) = D \le r \le W\}^{-1} \left[\frac{2\pi}{LW} + \frac{6D^2}{L^2 W^2} - \frac{8(L+W)D}{L^2 W^2}\right]. \tag{10}$$

Remark 1 D is a simplified representation of Dist(i, j) in the previous formulas. The reason we can obtain expression (10) through the spline interpolation method is that it meets the Lipschitz condition[30,32].

By substituting formula(10)into formula(4), we can obtain Ns.

C. Effective Energy Ratio Function

In an available path, the chosen relay node should have enough power to ensure the transmission of packets. We use the effective energy ratio function(Er)to represent the energy level of the relay node

$$Er_i = \frac{Re_i}{Ie_i}, \tag{11}$$

where Rei and Iei represent the residual energy and initial energy of node i, respectively. Thus, we can define the node weighted value(Nw)as

$$\begin{cases} Nw_i = \omega_1 Ns'_i + \omega_2 Er_i, \\ Ns'_i = \dfrac{Ns_i - Ns_{\min}}{Ns_{\max} - Ns_{\min}}, \\ \omega_1 + \omega_2 = 1, \end{cases} \tag{12}$$

where ω(·) represents the weighting factor and $Ns'_i$ represents the normalized value of $Ns_i$. To find the optimal route among all of the available routes, ensuring that it is not only stable but also guarantees energy conservation and balance, it is necessary to optimize the selection of paths. We note that traditional heuristic routing algorithms generally lack extendibility, mathematical rigor, and adaptability to a dynamic environment. Therefore, unlike previous work, we choose LA theory to complete the optimization process. In the next section, we discuss our LA theory-based, energy-efficient, stable routing algorithm in more detail.
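The node-weighted value of formulas (11) and (12) can be sketched in a few lines; the stability values, energies, and equal weighting factors below are illustrative placeholders:

```python
def node_weight(ns, ns_min, ns_max, residual_energy, initial_energy,
                w1=0.5, w2=0.5):
    """Nw = w1 * Ns' + w2 * Er, with the stability score Ns
    min-max normalized and Er the residual-to-initial energy ratio."""
    ns_norm = (ns - ns_min) / (ns_max - ns_min)   # Ns' in [0, 1]
    er = residual_energy / initial_energy          # Er in [0, 1]
    return w1 * ns_norm + w2 * er

# a node with mid-range stability and 60% of its battery left
nw = node_weight(ns=30.0, ns_min=10.0, ns_max=50.0,
                 residual_energy=60.0, initial_energy=100.0)
```

Because both components lie in [0, 1] and the weights sum to 1, Nw itself stays in [0, 1], which matches the probability-space interpretation used by the LA iteration.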

Ⅴ. STABLE AND ENERGY EFFICIENT ROUTING ALGORITHM BASED ON LA THEORY

In this section, we first use LA theory to construct a MANET environment feedback mechanism. In other words, by judging the content of each reply message, we reward or punish the relay node through a rigorous iteration expression. We then provide a detailed algorithm implementation scheme. Finally, we prove the convergence of our algorithm.

A. MANET Environment Feedback Mechanism

When the source node plans to send packets to the destination node, it needs to find the available paths to that destination. To find all of the available paths, the source node broadcasts build-route messages to its neighbor nodes (flooding requests). Once a build-route message is received by the destination node, it replies to the build-route message, so that all available paths from the source node to the destination node can be identified. Obviously, not all of these paths will be good enough to transmit data packets. Therefore, it is necessary to find the optimal path, one that is sufficiently stable and ensures overall energy saving and balance.
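The set of routes that the build-route flood and its replies identify is just the set of loop-free paths in the connectivity graph. A minimal centralized sketch of that enumeration (a stand-in for the distributed flooding, on a small invented topology) follows:

```python
def all_available_paths(adj, src, dst):
    """All loop-free paths from src to dst in an undirected connectivity
    graph adj (dict: node -> iterable of neighbors). This is the path set
    that the build-route flood and the replies would discover."""
    paths = []

    def dfs(node, path):
        if node == dst:
            paths.append(path)
            return
        for nxt in sorted(adj[node]):
            if nxt not in path:          # forbid loops
                dfs(nxt, path + [nxt])

    dfs(src, [src])
    return paths

# toy 4-node topology: 0 is the source, 3 the destination
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
routes = all_available_paths(adj, 0, 3)
# four loop-free routes: [0,1,3], [0,1,2,3], [0,2,1,3], [0,2,3]
```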

Using LA theory, we construct a MANET environment feedback model, which is an optimization mechanism for route selection. In this model, each node is equipped with a learning automaton to execute the feedback mechanism. Fig.2 shows the feedback mechanism of the LA. To find the optimal path from the available paths, the source node sends request packets to the destination node through the available paths. When it receives the request packets, the destination node responds by sending reply packets to the source node along the available paths. In this reply process, each relay node on an available path receives a reply packet from the next available hop node. Drawing on LA theory, the next available hop node senses the surrounding MANET environment and adds environment-feedback information to the reply message.

As mentioned in section Ⅱ, there are two types of environment feedback. Based on the information content of the replies, we set two feedback criteria(Fig.3 shows these two criteria, where j and k represent the next available hop nodes of node i):

Figure 2   Feedback mechanism of learning automata


(1) If a relay node receives a reply packet from the next hop node, and the information contained in this reply packet is“good information”(a reward signal), we will execute the reward scheme and the weighted value(Nw)of the next hop node will be increased accordingly.

(2) If a relay node receives the reply packet from its next hop node, and the information contained in this reply packet is "bad information" (a penalty signal), we will execute the penalty scheme and the weighted value (Nw) of the next hop node will be reduced accordingly.

Figure 3   Two feedback criteria


Using the standard linear iteration equation of LA theory[23], the node’s reward feedback scheme for receiving“good information”can be represented as

$$Nw_i(t+1) = Nw_i(t) + a\,[1 - Nw_i(t)]. \tag{13}$$

The penalty feedback scheme for altering a node’s weighted value on receipt of“bad information”can be represented as

$$Nw_i(t+1) = (1 - b)\,Nw_i(t), \tag{14}$$

where a and b represent the linear update rate of the weighted value; t represents the instant(in a discrete condition, it denotes iteration times).

It is now essential to define a feedback judgment function, which decides whether information is good or bad. Based on the definition of the feedback judgment function in LA theory[23], the feedback judgment function (φ) can be expressed as

$$\varphi_i(t) = Nw_i(t) - \frac{\sum_{j=1}^{N_i} Nw_j(t)}{N_i}, \tag{15}$$

where Ni represents the number of node i's neighbor nodes. When φi(t) > 0 (better than the average level), the MANET environment has generated a reward feedback signal and the node has received good information; conversely, φi(t) < 0 (worse than the average level) means that the MANET environment has generated a penalty feedback signal and the node has received bad information.

We note that formulas (13) and (14) alone cannot reflect the feedback strength, as their update rates do not vary across iterations. Therefore, we optimize these two formulas, using the following expression to represent the improved feedback scheme:

$$\begin{cases} \text{if node } i \text{ is rewarded (good information):} \\ \quad Nw_i(t+1) = Nw_i(t) + a_i(t)\,[1 - Nw_i(t)], \\ \text{if node } i \text{ is punished (bad information):} \\ \quad Nw_i(t+1) = [1 - b_i(t)]\,Nw_i(t), \end{cases} \tag{16}$$

where

$$\begin{cases} a(t) = \delta\,\mu(t), \\ b(t) = \varepsilon\,[1 - \mu(t)], \\ \mu(t) = \exp[-|\varphi_i(t)|]. \end{cases} \tag{17}$$

As the node-weighted value updates based on our proposed MANET feedback mechanism, the path-weighted value is updated accordingly. In this feedback mechanism, the path value (P) of an available path from the source node to the destination node can be represented as

$$P_m(t) = \frac{\sum_{i=1}^{M} Nw_i(t)}{M}, \tag{18}$$

where M represents the number of relay nodes on an available path from the source node to the destination node, and m represents the ID of the available path. With the help of this feedback mechanism, each node learns by sensing the MANET environment (judging the type of feedback signal) and updates its weighted value, allowing us to find the optimal path, with the highest path value P. Finally, the data packets are transmitted through this optimal path, which is stable enough and ensures overall energy conservation and balance.
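Formulas (15)-(18) can be sketched as a single feedback step plus a path-value helper. The decaying strength term exp(−|φ|) follows our reading of formula (17), and the node weights and neighbor sets below are invented for the example:

```python
import math

def feedback_update(weights, i, neighbors, delta=0.1, eps=0.1):
    """One feedback step for node i: compare Nw_i with its neighborhood
    average (the judgment function), then apply the reward or penalty
    scheme with signal-strength-scaled rates a(t), b(t)."""
    avg = sum(weights[j] for j in neighbors) / len(neighbors)
    phi = weights[i] - avg
    mu = math.exp(-abs(phi))
    if phi >= 0:                                   # reward signal
        weights[i] += delta * mu * (1 - weights[i])
    else:                                          # penalty signal
        weights[i] *= 1 - eps * (1 - mu)
    return weights[i]

def path_value(weights, relay_ids):
    """Path value P_m = mean node weight over the path's relay nodes."""
    return sum(weights[i] for i in relay_ids) / len(relay_ids)

w = {1: 0.8, 2: 0.4, 3: 0.6}
feedback_update(w, 1, neighbors=[2, 3])   # above average -> rewarded
feedback_update(w, 2, neighbors=[1, 3])   # below average -> penalized
```

Note that both branches keep each weight inside [0, 1], so the path value also stays in [0, 1].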

B. Algorithm Implementation Scheme

We give the detailed algorithm implementation scheme as follows:

C. Convergence Proof of Our Proposed Routing Algorithm

As mentioned above, our proposed algorithm uses LA theory to optimize the selected paths. As it is important to ensure the convergence of this process, we provide the following rigorous proof.

Our proposed routing algorithm

Initialization:

adjustable parameters of LA(δ, ε)

Input:

the available paths between the source node and the destination node(α. ), relay nodes which belong to the available paths

Output:

the optimal path

Execution steps:

1)sending a packet through an available path;

2)while receiving the packet, the destination node will send a reply message to the source node;

3)for i=1, 2, ···, m−1(i represents the relay node ID);

4) if the relay node i receives a reply message from the next available hop node j and $Nw_i(t) - \sum_{k=1}^{N_j} Nw_k(t)/N_j \ge 0$;

5)using the reward feedback scheme to update the weighted value of node j;

6)Nwj(t+1)=Nwj(t)+aj(t)[1−Nwj(t)];

7)updating the path value which contains this relay node j accordingly;

8)Pm(t+1)=Pm(t)+(aj(t)[1−Nwj(t)])/M;

9) else (node i receives the reply message from the available next hop node j and $Nw_i(t) - \sum_{k=1}^{N_j} Nw_k(t)/N_j < 0$);

10) using the penalty feedback scheme and updating the weighted value of the node j;

11) Nwj(t+1)=[1−bj(t)]Nwj(t);

12)updating the path value which contains this relay node j accordingly;

13) Pm(t+1)=Pm(t)−[bj(t)Nwj(t)]/M;

End
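The steps above can be condensed into a toy end-to-end sketch. To keep it self-contained it makes simplifying assumptions not in the pseudocode: the judgment compares each relay against the pool of all candidate relays rather than its true neighbor set, and the candidate paths and initial weights are invented:

```python
import math

def select_optimal_path(paths, weights, rounds=50, delta=0.1, eps=0.1):
    """Sketch of the selection loop: each relay compares its weight with
    the average of the other relays (the feedback judgment), is rewarded
    or penalized with strength-scaled rates, and the path with the
    highest resulting path value is returned."""
    relays = sorted({i for p in paths for i in p[1:-1]})
    for _ in range(rounds):
        snapshot = dict(weights)                 # judge against old values
        for i in relays:
            others = [j for j in relays if j != i]
            phi = snapshot[i] - sum(snapshot[j] for j in others) / len(others)
            mu = math.exp(-abs(phi))
            if phi >= 0:                         # reward scheme
                weights[i] += delta * mu * (1 - weights[i])
            else:                                # penalty scheme
                weights[i] *= 1 - eps * (1 - mu)

    def path_value(p):
        r = p[1:-1]
        return sum(weights[i] for i in r) / len(r)

    return max(paths, key=path_value)

paths = [[0, 1, 3, 9], [0, 2, 4, 9]]         # hypothetical candidate routes
weights = {1: 0.7, 3: 0.8, 2: 0.3, 4: 0.4}   # initial Nw of each relay
best = select_optimal_path(paths, weights)
# relays 1 and 3 start above the pool average, so their route wins
```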

Theorem 1 The node-weighted value Nw(t) can converge if and only if the drift of the node-weighted value, ΔNw(t), is not a direct function of the instant t.

Proof

Lemma 1 The drift of the node-weighted value, ΔNw(t), is not directly a function of the instant t.

Lemma Proof As the process of this algorithm can be regarded as an optimization problem, we can use approximation theory[32] to prove the lemma. In our paper, each node has a learning automaton; for this reason, we can use the theory of stochastic processes to represent the generic update process of the weighted value (in the dynamics control method, the one-step probability update is called the drift function of the probability, because the long update process can be regarded as a continuous update process). We can therefore present the generic update process of the node-weighted value in this way:

$$\Delta Nw(t) = E[Nw(t+1)\,|\,Nw(t)] - Nw(t). \tag{19}$$

Based on the Kolmogorov criterion[23], formula(19)can be represented as

$$\begin{aligned} \Delta Nw_{ij}(t) &= \Lambda\, Nw_{ij}(t)\Big[(1 - Nw_{ij}(t))\,E[\beta_{ij}(t)] - \sum_{q \ne j} Nw_{iq}(t)\,E[\beta_{iq}(t)]\Big] \\ &= \Lambda\, Nw_{ij}(t) \sum_{q \ne j} Nw_{iq}(t)\big[E[\beta_{ij}(t)] - E[\beta_{iq}(t)]\big]. \end{aligned} \tag{20}$$

Based on the definition in LA theory, Λ represents the generic maximum update rate proposed in our improved LA. Generally, the update rate floats around 0.10 (in our paper, we set δ = 0.10 and ε = 0.10). Thus, differentiating both sides of formula (20) yields

$$\frac{\partial}{\partial Nw_{ij}}\sum_{q}Nw_{iq}E[\beta_{iq}(t)]=E[\beta_{ij}(t)]. \tag{21}$$

Let $Q(Nw)=\sum_{q}Nw_{iq}E[\beta_{iq}(t)]$; then $\Delta Nw_{ij}(t)$ can be represented as

$$\begin{cases}\Delta Nw_{ij}(t)=\Lambda Nw_{ij}(t)\displaystyle\sum_{q\neq j}Nw_{iq}(t)\Big[\dfrac{\partial Q(Nw)}{\partial Nw_{ij}}-\dfrac{\partial Q(Nw)}{\partial Nw_{iq}}\Big],\\ \Delta Nw_{ij}(t)=\text{function}\big(Nw(t)\big).\end{cases}\tag{22}$$

Obviously, we conclude that $\Delta Nw_{ij}$ is not a direct function of the time slot t. This conclusion allows us to use the spline interpolation method[31] to rewrite the node-weighted value $Nw(t)$ as

$$Nw(t)=P^{\Lambda}(\tau), \tag{23}$$

where $\tau\in[n\Lambda,(n+1)\Lambda]$ and $P^{\Lambda}(\tau)$ is a piecewise-constant interpolation function (in LA theory, $P(t)$ represents the action probability at instant t). Now we only care about the convergence of $P^{\Lambda}(\cdot)$.
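To illustrate the piecewise-constant interpolation, the sketch below (illustrative code only; the sample values are arbitrary) holds each discrete iterate $Nw(n\Lambda)$ constant over the interval $[n\Lambda,(n+1)\Lambda)$:

```python
def piecewise_constant(samples, step):
    """Build P(tau): P(tau) = samples[n] for tau in [n*step, (n+1)*step),
    clamped to the last sample beyond the sampled range."""
    def p(tau):
        n = min(int(tau // step), len(samples) - 1)
        return samples[n]
    return p

# discrete node-weight iterates Nw(0), Nw(Lambda), Nw(2*Lambda) with Lambda = 0.10
p = piecewise_constant([0.50, 0.55, 0.595], step=0.10)
```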

Generally, the generic maximum update rate Λ is close to 0[23]. Hence, based on approximation theory[33], we can assert that $P^{\Lambda}(\tau)$ weakly converges to the solution set of an ordinary differential equation[30], which can be represented as

$$\begin{cases}\dfrac{\mathrm{d}x_{ij}}{\mathrm{d}\tau}=x_{ij}\displaystyle\sum_{q\neq j}x_{iq}\Big[\dfrac{\partial Q(x)}{\partial x_{ij}}-\dfrac{\partial Q(x)}{\partial x_{iq}}\Big],\\ x(0)=P^{\Lambda}(0),\end{cases}\tag{24}$$

where $x(\cdot)$ denotes $Nw(\cdot)$ in our study. It must be noted that formula (24) is a particular case of the weak convergence theorem, as it relies on the fact that P(t), β(t), and α(t) constitute a Markov process. In addition, β(t) and α(t) (defined in Definition 1) take values in a compact metric space. The outputs of the LA derive from a finite set, and the reinforcement signals take values in the closed interval [0, 1]. In our proposed algorithm, α(t) represents the result of sensing the environment, which carries two types of information, good and bad; hence α(t) is a finite set. Similarly, β(t) represents the feedback mechanism for the relay nodes, which can also be represented by the two probability iteration formulas, so β(t) is likewise a finite set. Finally, the computation space is defined on [0, 1], which also meets the condition.

In our routing algorithm, we have improved the update factor using formula (17). It should be stressed here that a(t) and b(t) are represented as two exponential functions, so we can easily find that

$$\begin{cases}\max\{a(t)\}=\delta,\\ \max\{b(t)\}=\varepsilon.\end{cases}\tag{25}$$

Therefore, we can confirm that the update rate lies in [0, 1], which also meets the condition. We find that formula (24) (an ordinary differential equation) has a unique solution, which is determined by the initial value x(0).

In this proof, j represents the next-hop node chosen by node i, and q represents the ID of a next-hop node not chosen by node i. Hence $Nw_{ij}$ represents the edge between nodes i and j, which is chosen by selecting the next-hop node j from the current node i. The value of $Nw_{ij}$ is equal to the current node-weighted value $Nw_j$. The generic maximum update rate Λ covers two parameters: the reward coefficient and the penalty coefficient. Thus, we are able to provide a convergence proof using LA theory.

Ⅵ. SIMULATION RESULTS FOR THE PROPOSED ALGORITHM

This section evaluates the performance of the proposed algorithm. Using NS-3[34,35], we compare our proposed algorithm (LASEERA) with ACSRA (Ref. [12]), ANNQARA (Ref. [21]), and classical AODV.

A. Simulation Parameters

Tab.1 shows the network parameters.

Our experiment uses the random waypoint model to describe node motion and the standard energy module parameters of NS-3 (Power End, Power Start, Current A) to describe the energy consumed by transmission. It is important to stress that the parameters δ and ε must be neither too large nor too small. Like the memory factor of a Markov chain, these two parameters determine the maximum update rate of the LA. If they are too large, the optimization process speeds up, but the robustness of the algorithm cannot be ensured. If they are too small, the optimization process slows down and learning efficiency suffers. In general, the value of the update rate floats around 0.10, which is an empirical value. In addition, we must consider route stability and energy efficiency fairly, so we set ω1 = ω2 = 0.5 (obviously, the sum of the weighting coefficients must be 1).
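The equal weighting amounts to a convex combination of the two optimization terms; a hypothetical sketch follows (the exact stability and energy-ratio terms are defined in the earlier sections of the paper, and the function name is ours):

```python
def node_weight(stability, energy_ratio, w1=0.5, w2=0.5):
    """Weighted node value: convex combination of the stability term and the
    effective energy ratio; w1 + w2 = 1 keeps the result in [0, 1]."""
    assert abs(w1 + w2 - 1.0) < 1e-9, "weighting coefficients must sum to 1"
    return w1 * stability + w2 * energy_ratio

nw = node_weight(0.8, 0.6)  # equal emphasis on stability and energy
```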

Table 1   Parameters

network scene scale: 500 m × 600 m
number of nodes: {40, 60, 80, 100}
transmission range: 50 m
simulation time: 600 s
mobility model: random waypoint
speed range: 0∼20 m/s
pause time: 0 s
initial distribution: completely uniform
MAC protocol: 802.11b
initial energy: 2 Joule
power end: 16.0206 dBm
power start: 16.0206 dBm
current A: 0.0174 Ampere
bandwidth: 250 kbit/s
packet size: 256 B
weighting coefficients: ω1 = ω2 = 0.5
reward coefficient: δ (0.10)
penalty coefficient: ε (0.10)


B. Experiment Metrics

To measure the performance of our proposed routing algorithm, we use the following metrics.

Route survival time   The time period during which the route remains connected. This metric measures the rationality and effectiveness of the node-stability measurement model used in our proposed routing algorithm.

Residual energy   The energy remaining in the nodes. This metric reflects the level of energy consumption under each routing algorithm.

Energy variance   The difference in residual energy between nodes. This metric reflects the energy balance level. An efficient routing protocol keeps this metric as small as possible.

Packet delivery ratio   The ratio of packets successfully delivered to the destination node. An efficient routing protocol maintains this metric at a relatively high level.

End-to-end delay   The time it takes for data packets sent from the source node to reach the destination node. An efficient routing protocol keeps this metric as small as possible.

Normalized control overhead   This metric reflects the extra overhead resulting from non-data-transmission packets, which are needed to run the optimization strategy of our algorithm.
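Several of these metrics reduce to simple statistics over per-node and per-flow counters; a minimal sketch (the function names are ours, not NS-3's):

```python
from statistics import pvariance

def energy_variance(residual_energies):
    """Population variance of per-node residual energy; lower means better balance."""
    return pvariance(residual_energies)

def packet_delivery_ratio(delivered, sent):
    """Fraction of data packets that reached the destination node."""
    return delivered / sent

def normalized_control_overhead(control_packets, delivered):
    """Control packets spent per successfully delivered data packet."""
    return control_packets / delivered
```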

Fig. 4 shows the variation in average route survival time in relation to the number of nodes, when the maximum velocity is 10 m/s. It is clear that our proposed routing algorithm (LASEERA) has a longer route survival time than ACSRA, ANNQARA, or AODV. Compared with ACSRA, ANNQARA, and AODV, our proposed routing algorithm's route survival time increases by an average of 22.5%, 24.9%, and 80.2%, respectively.


Figure 4   Survival time vs. nodes


The relationship between route survival time and the number of nodes is not a monotonic relationship. Overall, the route survival time will slightly but not strictly increase with the number of nodes.

By analyzing the simulation results, we can confirm that our node stability measurement model is useful for enhancing route stability. In the ACSRA routing algorithm, the authors focus on efficiently adjusting the relay node using only node density as the optimization parameter, which cannot provide a route survival time as long as that offered by our routing algorithm. In ANNQARA, the authors used a two-layer CNN to optimize the throughput ratio and end-to-end delay, which is why it cannot ensure as long a route survival time as our routing algorithm. Classical AODV uses the shortest path as its routing policy, which cannot ensure route stability (the shortest path does not imply a relatively stable link).

Fig. 5 shows the variation in average route survival time in relation to maximum velocity, when the number of nodes is 40. It is clear that our proposed routing algorithm (LASEERA) has a longer route survival time than ACSRA, ANNQARA, or AODV. In fact, our proposed routing algorithm's route survival time is higher than those of ACSRA, ANNQARA, and AODV by an average of 37.4%, 41.5%, and 104.3%, respectively.

By analyzing the simulation results, we can see that, as velocity increases, network topology stability decreases, meaning links cannot be maintained for long. This is why the route survival times decrease. Owing to the stability model constructed in our paper, our proposed algorithm has the best route survival time.


Figure 5   Survival time vs. velocity


Fig. 6 shows the variations in average residual energy in relation to the number of nodes, when the maximum velocity is 10 m/s. It is clear that our proposed routing algorithm (LASEERA) produces higher residual energy than ACSRA, ANNQARA, or AODV. In fact, our proposed routing algorithm's residual energy is an average of 21.9%, 18.3%, and 55.6% higher than that of ACSRA, ANNQARA, and AODV, respectively.


Figure 6   Residual energy vs. nodes


The relationship between our routing algorithm's residual energy and the number of nodes is not monotonic. The residual energy of the other two routing algorithms (ACSRA and ANNQARA) decreases with the number of nodes, though not strictly linearly.

Analyzing the simulation results shows that our strategy is useful in reducing energy consumption. It is worth noting that, although we use only an effective energy ratio function in the optimization strategy (in common with the other three algorithms, we do not have an intelligent transmission power adjustment strategy), energy consumption is reduced. The first reason is that we maintain the route life cycle as long as possible: our stability control model decreases route adjustment frequency, thus saving energy. The second reason is that the effective energy ratio function can ensure the node residual energy level. In other words, the probability that a node will not have enough energy to transmit a packet during the packet transmission process is lower than in the other three algorithms. If the relay node does not have enough energy to transmit a packet, we must abandon this route and adjust the relay node accordingly, which naturally increases energy consumption. Because of its frequent route reconstruction, AODV has the highest energy consumption.

Fig. 7 shows the variations in average residual energy in relation to maximum velocity, when the number of nodes is 40. It is clear that our proposed routing algorithm (LASEERA) has higher residual energy than ACSRA, ANNQARA, or AODV. In fact, our proposed routing algorithm's residual energy is an average of 16.6%, 19.5%, and 40.5% higher than that of ACSRA, ANNQARA, and AODV, respectively. Overall, residual energy decreases as the maximum velocity increases.


Figure 7   Residual energy vs. velocity


By analyzing the simulation results, we can see that, as velocity increases, residual energy decreases due to the increase of network reconstruction frequency, inflating energy consumption. As detailed above, our optimization strategy can ensure the residual energy level, reducing the probability of relay node adjustment during the packet transmission process. Hence, as velocity increases, our algorithm still has the best performance.

Fig. 8 shows the variations in energy variance in relation to the number of nodes, when the maximum velocity is 10 m/s. On the whole, our proposed routing algorithm (LASEERA) has lower energy variance than ACSRA, ANNQARA, or AODV. In fact, our proposed routing algorithm's energy variance is an average of 25.5%, 24.9%, and 34.9% lower than that of ACSRA, ANNQARA, and AODV, respectively. This means that our proposed routing algorithm performs better in balancing energy. In addition, the energy variance decreases with the number of nodes, though the relationship is not strictly linear.


Figure 8   Energy variance vs. nodes


Analyzing the simulation results shows that our strategy is useful for balancing overall residual energy. It should be noted that, in our method, the node chosen to be a relay node should have higher residual energy than its neighboring nodes. In this way, we ensure that the same nodes will not always be reused during the route reconstruction process, which is why our algorithm has the smallest energy variance. The other three routing algorithms do not consider this problem; as a result, their energy variance performance is not as good as ours. As the number of nodes increases, the reuse frequency of nodes during route reconstruction decreases, which explains why the energy variance decreases with the number of nodes.

Fig. 9 shows the variations in energy variance in relation to maximum velocity, when the number of nodes is 40. Clearly, our proposed routing algorithm (LASEERA) has lower energy variance than ACSRA, ANNQARA, or AODV. In fact, our proposed routing algorithm's energy variance is an average of 29.8%, 26.1%, and 38.0% lower than that of ACSRA, ANNQARA, and AODV, respectively. This means that our proposed routing algorithm performs better in balancing energy.

Analyzing the simulation results shows that, as velocity increases, so does energy variance. The relationship between them is not strictly linear. The reason for this is that, as velocity increases, so does route reconstruction frequency. Given that there is a fixed number of nodes in the network, the probability of node reuse also increases, enhancing the value of energy variance and decreasing energy balance.

Fig. 10 shows the variations in end-to-end delay in relation to the number of nodes, when the maximum velocity is 10 m/s. We can see that our proposed routing algorithm (LASEERA) does not perform as well as ANNQARA but performs better than ACSRA in end-to-end delay. Compared to ACSRA, our algorithm's end-to-end delay is an average of 42.2% lower. Compared to ANNQARA, its end-to-end delay is an average of 6.7% higher. Compared to AODV, its end-to-end delay is an average of 19.6% lower.


Figure 9   Energy variance vs. velocity



Figure 10   The end-to-end delay vs. nodes


Analyzing the simulation results shows that ACSRA has the highest end-to-end delay. The reason is that, in ACSRA, node density is an important factor in choosing or adjusting the relay node, which ensures the chosen node has a relatively high node density. This method may cause the route to contain more relay nodes as the number of nodes increases, and the simulation results bear this out. We also find that AODV has a relatively high delay, which means that its shortest-path strategy does not deliver the shortest delay. It is interesting to compare the delay performance of our algorithm with that of ANNQARA. As mentioned in the related work, ANNQARA uses the end-to-end delay parameter to construct its convolution calculation model (a 2-layer CNN), so it is to be expected that ANNQARA's delay performance is better than that of the other routing algorithms. Owing to the stability model constructed in our paper, the definition of the node-weighted value includes consideration of the distance factor (node distribution). This is why our proposed routing algorithm achieves good delay performance. We note that, on the whole, the delay performance of our algorithm and ANNQARA is not influenced by the number of nodes.

Fig. 11 shows the variations in end-to-end delay in relation to maximum velocity, when the number of nodes is 40. In this case, our proposed routing algorithm (LASEERA) performs less well than ANNQARA but better than ACSRA in end-to-end delay. Compared to ACSRA, our algorithm's end-to-end delay is an average of 11.7% lower. Compared to ANNQARA, it is an average of 6.7% higher. Compared to AODV, it is an average of 11.3% lower.


Figure 11   End-to-end delay vs. velocity


Analyzing the simulation results, we find that delay increases (though not monotonically) as velocity increases. The reason is that, as velocity increases, so does route reconstruction frequency, leading to the frequent adjustment of relay nodes. In this situation, the route may need auxiliary nodes to ensure transmission from the current node to its next-hop node, which adds to the number of relay nodes. The end-to-end delay increases accordingly.

Fig. 12 shows the variations in packet delivery ratio in relation to the number of nodes, when the maximum velocity is 10 m/s. We can see that our proposed routing algorithm (LASEERA) has a relatively low packet delivery ratio in comparison to ACSRA and ANNQARA. In fact, our proposed routing algorithm's packet delivery ratio is an average of 2.0% and 1.1% lower than those of ACSRA and ANNQARA, respectively. Compared to AODV, our routing algorithm's packet delivery ratio is 5.5% higher.


Figure 12   The packet delivery ratio vs. nodes


Analyzing the simulation results shows that ANNQARA has the best packet delivery ratio because it uses a convolution calculation model. ACSRA also has a better packet delivery ratio than our proposed algorithm, which reflects its unique strategy: as discussed above, a node with relatively high density is a suitable relay node, so the relay node always has auxiliary nodes to guarantee transmission when the next-hop node malfunctions. Hence, ACSRA's results are easy to understand. Of all the algorithms, AODV has the worst packet delivery ratio.

Fig. 13 shows the variations in packet delivery ratio in relation to maximum velocity, when the number of nodes is 40. We can see that our proposed routing algorithm (LASEERA)'s packet delivery ratio is entirely lower than that of ACSRA and partly lower than that of ANNQARA. Compared to ACSRA, our proposed routing algorithm's packet delivery ratio is an average of 2.3% lower. Compared to ANNQARA, it is an average of 0.8% lower. Compared to AODV, it is an average of 5.3% higher.

Analyzing the simulation results shows that, with increased velocity, the packet delivery ratio decreases. The reason for this is easy to follow: route stability is reduced by increasing velocity. Hence, the packet delivery ratio decreases.

Now, we analyze the computational complexity of our algorithm. In the worst case, we would need (n−2) nodes to construct a route between the source node and the destination node, with all of these relay nodes within the same transmission range. Based on the optimization strategy in our algorithm, each node must ascertain the current state of its neighbor nodes and then judge the type of feedback, so a node needs to communicate with (n−3) neighbor nodes. Therefore, a one-time feedback process needs (n−2)(n−3) message exchanges. As mentioned above, the optimization process can finish within a finite number of iterations. Assuming the number of iterations is M, the total number of message exchanges is M(n−2)(n−3), i.e., O(n²). We note that, in practice, the computational complexity is normally lower than in this worst-case example. To measure the computational cost of our algorithm, we use the control overhead metric.
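The worst-case message count above can be written down directly (a sketch; the function name is ours):

```python
def feedback_messages(n, m_iterations):
    """Worst case: (n-2) relay nodes each query (n-3) neighbours per feedback
    round, over M rounds -> M*(n-2)*(n-3), i.e. O(n^2) in the node count."""
    return m_iterations * (n - 2) * (n - 3)

# doubling the node count roughly quadruples the worst-case message cost
ratio = feedback_messages(80, 5) / feedback_messages(40, 5)
```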


Figure 13   Packets delivery ratio vs. velocity


Fig. 14 shows the impact of the number of nodes on the amount of control overhead. It is clear that AODV has the smallest control overhead, while ANNQARA and ACSRA have higher control overheads than our routing algorithm. This means that, as a learning-based routing protocol, our routing algorithm has an acceptable control overhead, which is 22.7% lower than ANNQARA's and 25.2% lower than ACSRA's, but 38.3% higher than AODV's.


Figure 14   Control overhead vs. nodes


Analyzing the simulation results shows that AODV has the smallest control overhead. The reason is that AODV does not use any additional optimization methods to optimize the chosen route, which naturally reduces control overhead. As mentioned in the previous section, ANNQARA uses a 2-layer CNN, which increases computation costs; its control overhead is therefore larger than that of our routing algorithm. ACSRA needs periodic control packets to adjust its relay nodes, which also increases control overhead, so its control overhead is likewise larger than ours. These results confirm that MANET routing algorithms that use optimization methods incur extra control overhead, boosting computation costs.

Fig. 15 shows the impact of velocity on control overhead. It is clear that AODV has the smallest control overhead, while ANNQARA and ACSRA have higher control overheads than our routing algorithm. This means that, as a learning-based routing protocol, our routing algorithm has an acceptable control overhead: 26.6% lower than ANNQARA's and 27.5% lower than ACSRA's, but 30.2% higher than AODV's.


Figure 15   Control overhead vs. velocity


Analyzing the simulation results shows that, as velocity increases, so do control overheads. The reason for this is easy to understand: route stability is reduced by increasing velocity (frequent route reconstruction requires more control packets). Hence, control overheads increase.

The simulation results above offer the following generalized findings:

(1) Our proposed routing algorithm (LASEERA) has the best performance when it comes to route survival time, energy consumption, and energy balance.

(2) Intelligence algorithms (LASEERA, ANNQARA, and ACSRA) need more cost to run their optimization strategies, which naturally increases their control overheads. Compared to the other intelligence algorithms (ANNQARA and ACSRA), our algorithm achieves acceptable control overhead.

(3) No algorithm can optimize all of the metrics without paying any additional cost. The optimization strategy determines which metrics it can optimize. An efficient optimization strategy relies on the precondition that the additional costs of the strategy are acceptable.

Ⅶ. CONCLUSION AND FUTURE WORK

In this paper, we proposed a stable, energy-efficient routing algorithm based on LA theory for MANET and described the research steps clearly. We first constructed a new node stability measurement model and defined an effective energy ratio function; these were used to define the node-weighted value. Second, we constructed a MANET environment feedback mechanism, in which each node is equipped with a learning automaton that executes the optimization process and updates its own weighted value according to the feedback signals generated by sensing the node's ambient network environment. In this process, we also improved the basic LA, so that each node can sense variations in feedback signal strength over time. In addition, we provided a rigorous mathematical proof to establish the convergence of our proposed routing algorithm, which earlier studies have not done well. Through simulation experiments, we found that our proposed routing algorithm has the best performance in route survival time, energy consumption, and energy balance, and achieves acceptable performance in end-to-end delay and packet delivery ratio.

In future work, the following research directions are worth pursuing.

(1) Extending this algorithm to a layered network structure.

(2) Considering how to improve this algorithm within an energy-harvesting MANET scene.

(3) Designing a QoS cross-layer routing algorithm to extend our current work.

The authors have declared that no competing interests exist.

References

[1] L. Blazevic, L. Buttyan, S. Capkun, et al. Self-organization in mobile ad-hoc networks: the approach of terminodes[J]. IEEE Communications Magazine, 2001, 39(6): 166-174.
[2] R. Bruno, M. Conti, E. Gregori. Mesh networks: commodity multihop Ad Hoc networks[J]. IEEE Communications Magazine, 2005, 43(3): 123-131.
[3] Z. Yang, Y. Liu. Understanding node localizability of wireless Ad Hoc and sensor networks[J]. IEEE Transactions on Mobile Computing, 2012, 11(8): 1249-1260.
[4] M. A. Rahman, M. S. Hossain. A location-based mobile crowdsensing framework supporting a massive Ad Hoc social network environment[J]. IEEE Communications Magazine, 2017, 55(3): 76-85.
[5] N. T. Dinh, Y. Kim. Information-centric dissemination protocol for safety information in vehicular ad-hoc networks[J]. Wireless Networks, 2017, 23(5): 1359-1371.
[6] S. J. Lee, W. Su, M. Gerla. Wireless Ad Hoc multicast routing with mobility prediction[J]. Mobile Networks and Applications, 2001, 6(4): 351-360.
[7] B. An, S. Papavassiliou. MHMR: mobility-based hybrid multicast routing protocol in mobile Ad Hoc wireless networks[J]. Wireless Communications and Mobile Computing, 2003, 3(2): 255-270.
[8] A. Bentaleb, S. Harous, A. Boubetra. A weight based clustering scheme for mobile Ad Hoc networks[C]//The 11th International Conference on Advances in Mobile Computing and Multimedia, Vienna, 2013: 161-166.
[9] S. Guo, O. Yang. Maximizing multicast communication lifetime in wireless mobile Ad Hoc networks[J]. IEEE Transactions on Vehicular Technology, 2008, 57(4): 2414-2425.
[10] H. B. Thriveni, G. M. Kumar, R. Sharma. Performance evaluation of routing protocols in mobile ad-hoc networks with varying node density and node mobility[C]//International Conference on Communication Systems and Network Technologies, Gwalior, 2013: 252-256.
[11] R. Suraj, S. Tapaswi, S. Yousef, et al. Mobility prediction in mobile Ad Hoc networks using a lightweight genetic algorithm[J]. Wireless Networks, 2016, 22(6): 1797-1806.
[12] T. Manimegalai, C. Jayakumar, G. Gunasekaran. Using animal communication strategy (ACS) for MANET routing[J]. Journal of the National Science Foundation of Sri Lanka, 2015, 43(3): 199-208.
[13] G. Singal, V. Laxmi, M. S. Gaur, et al. Moralism: mobility prediction with link stability based multicast routing protocol in MANETs[J]. Wireless Networks, 2017, 23(3): 663-679.
[14] F. Selvi, A. Pitchaimuthu. Ant based multipath backbone routing for load balancing in MANET[J]. IET Communications, 2017, 11(1): 136-141.
[15] A. Kout, S. Labed, S. Chikhi, et al. AODVCS, a new bio-inspired routing protocol based on cuckoo search algorithm for mobile Ad Hoc networks[J]. Wireless Networks, 2017(9): 1-11.
[16] J. Liu, Y. Xu, X. Jiang. End-to-end delay in two hop relay MANETs with limited buffer[C]//Second International Symposium on Computing and Networking, Shizuoka, 2015: 151-156.
[17] S. Chettibi, S. Chikhi. Adaptive maximum-lifetime routing in mobile ad-hoc networks using temporal difference reinforcement learning[J]. Evolving Systems, 2014, 5(2): 89-108.
[18] A. Petrowski, F. Aissanou, I. Benyahia, et al. Multicriteria reinforcement learning based on a Russian doll method for network routing[C]//IEEE International Conference on Intelligent Systems, London, 2010: 321-326.
[19] P. Vijayalakshmi, S. A. J. Francis, J. A. Dinakaran. A robust energy efficient ant colony optimization routing algorithm for multi-hop Ad Hoc networks in MANETs[J]. Wireless Networks, 2016, 22(6): 2081-2100.
[20] S. Chettibi, S. Chikhi. Dynamic fuzzy logic and reinforcement learning for adaptive energy efficient routing in mobile ad-hoc networks[J]. Applied Soft Computing, 2016, 38: 321-328.
[21] P. Srivastava, R. Kumar. A new QoS-aware routing protocol for MANET using artificial neural network[J]. Journal of Computing and Information Technology, 2016, 24(3): 221-235.
[22] S. K. Das, S. Tripathi. Intelligent energy-aware efficient routing for MANET[J]. Wireless Networks, 2016(7): 1-21.
[23] K. S. Narendra, M. A. L. Thathachar. Learning automata: an introduction[M]. Mineola: Dover Publications, 2012.
[24] M. A. L. Thathachar, P. S. Sastry. A hierarchical system of learning automata that can learn the globally optimal path[J]. Information Sciences, 1987, 42(2): 143-166.
[25] H. Beigy, M. R. Meybodi. Utilizing distributed learning automata to solve stochastic shortest path problems[J]. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2006, 14(5): 591-615.
[26] M. A. L. Thathachar, P. S. Sastry. Varieties of learning automata: an overview[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2002, 32(6): 711-722.
[27] A. A. Anasane, R. A. Satao. A survey on various multipath routing protocols in wireless sensor networks[J]. Procedia Computer Science, 2016, 79: 610-615.
[28] D. B. West. Introduction to graph theory[M]. 2nd ed. USA: McGraw-Hill Higher Education, 2005: 260.
[29] I. Das, D. K. Lobiyal, C. P. Katti. Multipath routing in mobile Ad Hoc network with probabilistic splitting of traffic[J]. Wireless Networks, 2016, 22(7): 2287-2298.
[30] H. J. Kushner. Approximation and weak convergence methods for random processes, with applications to stochastic systems theory[M]. Cambridge: MIT Press, 1984.
[31] G. Wahba. Erratum: spline interpolation and smoothing on the sphere[J]. SIAM Journal on Scientific and Statistical Computing, 2012, 2(2): 5-16.
[32] J. Chen, W. Li. Convergence behaviour of inexact Newton methods under weak Lipschitz condition[J]. Journal of Computational and Applied Mathematics, 2006, 191(1): 143-164.
[33] G. A. Anastassiou, S. G. Gal. Approximation theory: moduli of continuity and global smoothness preservation[M]. Boston: Birkhäuser, 2000.
[34] G. F. Riley, T. R. Henderson. The ns-3 network simulator[M]//Modeling and Tools for Network Simulation. Berlin: Springer, 2010: 15-34.
[35] M. S. Khan, Q. K. Jadoon, M. I. Khan. A comparative performance analysis of MANET routing protocols under security attacks[M]//Mobile and Wireless Technology 2015. Berlin: Springer, 2015: 137-145.