From a functional perspective, we divide I-DAG into three sub-components:

### Label-based event streaming

Let the IoT device events be a sequence of error, backup, and information messages represented as *E*_{i}, *B*_{i} and *I*_{i}, where each message belongs to a sensory device *Device*_{i} in the distributed computing environment as shown in Fig. 3. At each time interval *t*, the streams generated through a function *f*_{i} hold an array of event messages *G*[1..(*E*_{i},*B*_{i},*I*_{i})] with *G*[*i*]=*f*_{i}. Therefore, when new event messages arrive, the function representation changes to *G*[*i*++] and the individual event-message collection at each node can be represented as,

$$ G\left[i++\right]=G\left [\left(E_{i}, B_{i}, I_{i}\right) ++\right] $$

(1)

Where, *G*[*i*++] is a container managing the arrival of multiple event messages with *x*≥0.

In order to approximate the inner function elements of *G*[*i*++], implicit vectors such as *x*(*E*[1..*n*]), *y*(*B*[1..*n*]) and *z*(*I*[1..*n*]) are added into the stream instruction set with proportions (*E*_{i},*x*)++, (*B*_{i},*y*)++ and (*I*_{i},*z*)++, which returns an output approximation as,

$$ Event_{m}=\sum_{i=1}^{n}E_{i}\ast B_{i} \ast I_{i} $$

(2)

Where, *Event*_{m}>0 represents the container of processed heterogeneous event messages.
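The label-based collection above can be sketched in Python; the class, method, and label names below are illustrative assumptions, not part of the text:

```python
from collections import defaultdict

# Hypothetical labels for the three message types described above.
ERROR, BACKUP, INFO = "E", "B", "I"

class EventContainer:
    """Sketch of the G[i++] container: appends each arriving
    (E_i, B_i, I_i)-labelled message in formation order."""

    def __init__(self):
        self.streams = defaultdict(list)  # label -> ordered messages

    def append(self, label, payload):
        # G[i++]: incremental insertion preserving arrival order
        self.streams[label].append(payload)

    def event_m(self):
        # Eq. (2): sum of element-wise products over the three streams,
        # truncated here to the shortest stream for simplicity.
        n = min(len(self.streams[l]) for l in (ERROR, BACKUP, INFO))
        return sum(self.streams[ERROR][i] * self.streams[BACKUP][i]
                   * self.streams[INFO][i] for i in range(n))

g = EventContainer()
for e, b, i in [(1, 2, 1), (3, 1, 2)]:
    g.append(ERROR, e); g.append(BACKUP, b); g.append(INFO, i)
print(g.event_m())  # 1*2*1 + 3*1*2 = 8
```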

#### Lemma-1: \(SE_{o,p,q}=\sum_{i=1}^{n}\left \{ \left (e_{i}\times n_{o}\right), \left (b_{i}\times n_{p}\right), \left (i_{i}\times n_{q}\right)\right \}\)

The individual data segments of *E*_{i}, *B*_{i} and *I*_{i} arrive at nodes *N*_{o}, *N*_{p} and *N*_{q} through an incremental function *G*[*i*++] that assembles segments in formation order. This order summarizes stream segments in such a way that *G*[*i*++] stores *SE*_{o,p,q}≤0.

#### Lemma-2: *E*[*S*]=*PP*(*e*_{i},*b*_{i},*i*_{i})

Since \(SE_{o,p,q}= \sum_{i=1}^{n}\left \{ \left (E_{i}\times N_{o}\right), \left (B_{i}\times N_{p}\right), \left (I_{i}\times N_{q}\right)\right \}\), but \(\sum_{i=1}^{n}\left \{ \left (E_{i}\times N_{o}\right)\times \left (B_{i}\times N_{p}\right)\times \left (I_{i}\times N_{q}\right)\right \}\neq N_{o,p,q}\times \left (\sum_{i=1}^{n}\left (E_{i},B_{i},I_{i}\right)\right)\), the constraints reside within (*E*_{i},*B*_{i},*I*_{i}). Moreover, if *o*=*p*=*q* then *E*[*N*_{o,p,q}]=*E*[1]=1, and if *o*≠*p*≠*q*, then the *E*[*N*_{o,p,q}] are independent and can be retrieved as \(E\left [N_{o,p,q}\right ]=\tfrac {1}{2}\left(1\right)+\tfrac {1}{2}\left (-1\right)=0\). After that, the linearity of expectation can be represented as,

$$ {\begin{aligned} E\left[SE_{o,p,q}\right]=E\left[\left(\sum_{i=1}^{n}\left(E_{i}, B_{i}, I_{i}\right)\right)\right]\left(\sum_{i=1}^{n}\left(N_{o}, N_{p}, N_{q}\right)\right) \end{aligned}} $$

(3)

$${\begin{aligned} =E\left[\sum_{o,p,q}^{n}\left(N_{o}, N_{p}, N_{q}\right)\left(E_{i},B_{i},I_{i}\right)\right] \end{aligned}} $$

$${\begin{aligned} &=\sum_{o}^{n}\left(N_{o}\right)E\left [E_{i},B_{i},I_{i}\right] +\sum_{o \neq p }^{n}\left(N_{p}\right)E\left [E_{i},B_{i},I_{i}\right]\\&\quad+\sum_{o \neq p \neq q}^{n}\left(N_{q}\right)E\left [E_{i},B_{i},I_{i}\right] \end{aligned}} $$

Where *E*[*SE*_{o,p,q}] manages the heterogeneous events with independent expectation parameters.

#### Lemma-3: \(V\left [SE_{o,p,q}\right ]\leq 2E\left [SE_{o,p,q}\right ]^{2}\)

Since,

$$V\left[SE_{o,p,q}\right]=E\left[\left(SE_{o,p,q}\right)^{2}\right]-E\left[SE_{o,p,q}\right]^{2}$$

$$=\left(\sum_{o,p}^{n}...N_{o}N_{p}\right)\times \left(\sum_{p,q}^{n}...N_{p}N_{q}\right) $$

$${\begin{aligned} &=\sum_{o,p,q}^{n}\left(...N_{o}N_{p}N_{q}\right)\leq 2\left(\sum_{o}^{n}E_{i},B_{i},I_{i}\right)\\&\quad\times \left(\sum_{p}^{n}E_{i},B_{i},I_{i}\right)\times \left(\sum_{q}^{n}E_{i},B_{i},I_{i}\right) \end{aligned}} $$

$$=2 E \left[SE_{o,p,q}\right]^{2} $$

#### Lemma-4: Average *T*_{1} and *T*_{2} of *SE*_{o,p,q}

Let *A* be the output of algorithm-1, so

$$E\left[A\right]=PP\left(E_{i}, B_{i}, I_{i}\right), V\left(A\right)\leq 2E\left [A\right]^{2} $$

which is equivalent to,

$$\sigma \left(A\right)=\sqrt{V\left(A\right)}\leq \sqrt{2}E\left[A\right] $$

Therefore, the bound of the stream segment can be obtained as,

$$PE\left[\left| A-E\left[A\right]\right|> \varepsilon E\left[A\right]\right] $$

Thus,

$${\begin{aligned} &PE\left[\left| A-E\left[A\right]\right|> \varepsilon E\left[A\right]\right]\\&\leq PE\left[\left| A-E\left[A\right]\right|> \sqrt{2} \varepsilon \sigma \left(A\right)\right] \end{aligned}} $$

In order to reduce the variance, we apply the Chebyshev inequality [40] with \( \sqrt {2}\varepsilon > 1\) and obtain,

$$E\left[A_{i}\right]=PP\left(E_{i}, B_{i}, I_{i}\right), V\left(A_{i}\right)\leq 2E\left [A_{i}\right]^{2} $$

So if *B* is the average of \(A_{1},...,A_{T_{1}T_{2}}\), then

$$E\left[B\right]=PP\left(E_{i}, B_{i}, I_{i}\right), V\left(B\right)\leq \frac{2E\left [B\right]^{2}}{T_{1}T_{2}} $$

Now, by Chebyshev’s inequality, as \(T_{1}T_{2}\geq \frac {16}{\varepsilon ^{2} }\), we get,

$$PE\left[\left| B-E\left[B\right]\right|> \varepsilon E\left[B\right]\right]\leq \frac{V\left(B\right)}{\left(\varepsilon E\left[B\right]\right)^{2}} $$

$$PE\left[\left| B-E\left[B\right]\right|> \varepsilon E\left[B\right]\right]\leq \frac{2E\left[B\right]^{2}}{\left(T_{1}T_{2}\varepsilon^{2}E\left[B\right]^{2}\right)}\leq \frac{1}{8} $$

At this point, the streaming bound *δ* could be obtained, but since a dependence on \(\frac {1}{\delta }\) is present, we apply the Hoeffding lower-bound inequality [41] on *E*[*B*]=*PP*(*E*_{i},*B*_{i},*I*_{i}) and get,

$$PE\left[\left(1-\varepsilon\right)E\left[B\right] \leq B\leq \left(1+\varepsilon\right)E\left[B\right]\right]\geq \frac{7}{8} $$

Now we execute the median function *Z* of *T*_{1}*T*_{2} onto \(B_{1},...,B_{T_{1}T_{2}}\) and get,

$$ PE\left[\left| Z-E\left[B\right]\right|\geq \varepsilon E\left[B\right]\right]\leq \delta $$

(4)

when,

$$T_{1}T_{2}\geq \frac{32}{9}\ln\frac{2}{\delta } $$
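The *T*_{1}/*T*_{2} amplification above follows the standard median-of-means pattern: average groups of independent estimates, then take the median of the group means. A minimal sketch; the draw distribution, true mean, and parameter values are illustrative assumptions:

```python
import random
import statistics

def median_of_means(draw, t1, t2):
    """Average t1 independent estimates into each group mean B_j, then
    take the median Z over t2 groups; the median step turns the constant
    Chebyshev failure probability into an exponentially small delta."""
    groups = [sum(draw() for _ in range(t1)) / t1 for _ in range(t2)]
    return statistics.median(groups)

rng = random.Random(0)
# Skewed estimates with true mean 1.0 (an illustrative assumption).
z = median_of_means(lambda: rng.expovariate(1.0), t1=64, t2=9)
print(round(z, 2))
```

With t1 = 64 each group mean concentrates near 1.0, and the median over 9 groups is robust to the occasional outlying group.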

The stream approximation could be obtained as,

$$ \left(E_{i}\right)_{N_{o}}=O\left(\frac{1}{\varepsilon^{2}}\ln\frac{2}{\delta}\right)_{HN_{i,i}} $$

(5)

$$ \left(B_{i}\right)_{N_{p}}=O\left(\frac{1}{\varepsilon^{2}}\ln\frac{2}{\delta}\right)_{BN_{o,p}} $$

(6)

$$ \left(I_{i}\right)_{N_{q}}=O\left(\frac{1}{\varepsilon^{2}}\ln\frac{2}{\delta}\right)_{IN_{i,k}} $$

(7)

This stream approximation establishes that heterogeneous parameters can be managed in the I-DAG.

### Heterogeneous stream transformation

The distributed stream elements with probability *α*(*t*) are sampled at time *t* with a computed average of,

$$\alpha \left(t\right)=\alpha,\ \text{constant}: \text{error}\simeq \frac{1}{\sqrt{\alpha \times t}}\rightarrow 0 $$

and,

$$\alpha \left(t\right)\simeq \frac{1}{\varepsilon^{2}\times t}: \text{error}\simeq \varepsilon,\ \text{constant over time} $$
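The two schedules can be compared numerically; the α and ε values below are illustrative assumptions, not values from the text:

```python
import math

# alpha and eps are illustrative assumptions.
alpha, eps = 0.5, 0.1
times = (10, 1000, 100000)
# Constant rate alpha: error ~ 1/sqrt(alpha * t) shrinks toward 0.
errors_const_rate = [1 / math.sqrt(alpha * t) for t in times]
# Decaying rate alpha(t) ~ 1/(eps^2 * t): the sampling rate shrinks
# while the approximation error stays near the constant eps.
rates_const_error = [1 / (eps ** 2 * t) for t in times]
print(errors_const_rate, rates_const_error)
```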

In order to perform encapsulation, reservoir sampling is used because it adds the first *k* stream elements to the sample and then admits each subsequent *t*-th item with probability \(\frac {k}{t}\). Thus, for every *t* and *i*≤*t*, the sample probability is evaluated as,

$$ P_{i,t}=PE\left[s_{i}\ \text{in sample at time}\ t\right]=\frac{k}{t} $$

(8)

and for *t*+1, the sample probability becomes,

$$P_{t+1,t+1}=PE\left[s_{t+1}\ \text{sampled}\right]=\frac{k}{t+1} $$

This is mandatory because of the inter-connected heterogeneous IoT tuples that are to be incorporated within the interval of time. The processing of *t*+1 with *i*≤*t* eventually reduces the role of *s*_{i} and returns *s*_{t+1} as,

$$ P_{i,t+1}=\frac{k}{t}\times \left(1-\frac{k}{t+1}\times \frac{1}{k}\right) $$

(9)

$$=\frac{k}{t}\times \left(1-\frac{1}{t+1}\right) $$

$$=\frac{k}{t}\times \frac{t}{t+1}=\frac{k}{t+1} $$
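The admission and retention probabilities derived in Eqs. (8)-(9) are exactly those of classic reservoir sampling (Algorithm R); a minimal sketch:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Algorithm R: keep the first k elements; admit the t-th element
    with probability k/t (Eq. 8) and evict a uniformly chosen slot,
    so every element stays sampled with probability k/t (Eq. 9)."""
    rng = rng or random.Random()
    sample = []
    for t, item in enumerate(stream, start=1):
        if t <= k:
            sample.append(item)
        else:
            j = rng.randrange(t)   # uniform over [0, t)
            if j < k:              # happens with probability k/t
                sample[j] = item   # evict a uniformly chosen slot
    return sample

sample = reservoir_sample(range(1000), k=5, rng=random.Random(42))
print(sample)
```

A single pass and O(k) memory make this a natural fit for unbounded IoT event streams.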

The frequency table of stream events feeds the event arrival probability *P*_{i,t+1} into the space-saving function of the count-min sketch to bring an order between transformed heterogeneous stream events as shown in Fig. 3. This space-saving function provides an approximation *f*′_{x} to *f*_{x} for every *x* and consumes memory equal to \(O\left (\frac {1}{\Theta }\right)\). Therefore, when a stream vector *G*[*n*] is processed with *G*[*i*]≥0 for ∀*i* ∈ *t*, it estimates the heterogeneous stream *G*^{′} of *G* as,

$$G\left[i\right]\leq G'\left[i\right]\ \ \ \forall i $$

and,

$$G'\left[i\right]\leq G\left[i\right]+\varepsilon \left| G\right|_{1}\ \ \ \forall i,\ \text{with probability}\ \geq 1-\delta $$

Where, \(\left | G\right |_{1} =\sum _{i} G\left [i\right ]\) and \(\left | G\right |_{1}\ll\) stream length, having \(O\left (\frac {1}{\varepsilon ^{2}}\ln\frac {2}{\delta }\right)_{HN_{i,i}}\), \(O\left (\frac {1}{\varepsilon ^{2}}\ln\frac {2}{\delta }\right)_{BN_{o,p}}\) and \(O\left (\frac {1}{\varepsilon ^{2}}\ln\frac {2}{\delta }\right)_{IN_{i,k}}\) memory with \(O\left (\ln\frac {n}{\delta }\right)\) update time *t*.

The heterogeneous event stream \(\sum _{i} G\left [i\right ]\) consists of *d* independent hash functions *h*_{1}...*h*_{d}:[1..*n*]→[1..*w*], where each stream element holds memory *g*_{j}(*i*) that uses the instruction set *G*[*i*]+=(*E*_{i},*B*_{i},*I*_{i}) having *g*_{j}(*i*)+=(*E*_{i},*B*_{i},*I*_{i}) for ∀ *j* ∈ 1..*d*, and the frequency table of the heterogeneous event stream can be retrieved as,

$$ G'\left[i\right]=\min\left \{ g_{j}\left(i\right) \mid j=1..d \right \} $$

(10)

This declares that the accessibility of the heterogeneous event stream is enlisted in the I-DAG.
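Equation (10) is the query rule of a count-min sketch. A minimal sketch, with salted built-in hashing standing in for the universal hash family a production implementation would use; names and parameters are our own:

```python
import random

class CountMinSketch:
    """Minimal count-min sketch: d hash rows of width w; an update
    increments one counter per row, and a query takes the row-wise
    minimum as in Eq. (10), so it never underestimates the true count."""

    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.rows = [[0] * w for _ in range(d)]
        # Salted built-in hashing stands in for a universal hash family.
        self.salts = [rng.getrandbits(32) for _ in range(d)]

    def _h(self, j, x):
        return hash((self.salts[j], x)) % self.w

    def update(self, x, count=1):
        for j in range(self.d):
            self.rows[j][self._h(j, x)] += count

    def query(self, x):
        return min(self.rows[j][self._h(j, x)] for j in range(self.d))

cms = CountMinSketch(w=64, d=4)
for _ in range(7):
    cms.update("E_1")
cms.update("B_1", 3)
print(cms.query("E_1"), cms.query("B_1"))  # at least (7, 3); exact unless rows collide
```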

#### Lemma-5: *G*^{′}[*i*]≥*G*[*i*]

The minimum count of the heterogeneous event stream *G*[*i*] remains ≥ 0 for ∀ *i* with a frequency of *update*(*g*_{j}(*i*)). The stream element having collision indicator *I*_{o,p,q}=1 if *g*_{j}(*i*)=*g*_{j}(*k*), and 0 otherwise, can be retrieved as,

$$ H\left[I_{o,p,q}\right]\leq \frac{1}{range\left(g_{j}\right)}=\frac{1}{w} $$

(11)

By definition \(A_{o,p}=\sum _{k}I_{o,p,q}\times G\left [k\right ]\), the expected excess of the heterogeneous event stream can be represented as,

$$ H\left[A_{o,p}\right]=\sum_{k}H\left[I_{o,p,q}\right]\times G\left[k\right]\leq \frac{\left| G\right|_{1}}{w} $$

(12)

Now, this stream is well connected and cannot be read independently. Therefore, we apply the Markov inequality and pairwise independence as,

$$ PE\left[A_{o,p} \geq \varepsilon \left| G\right|_{1}\right]\leq \frac{H\left[A_{o,p}\right]}{\varepsilon \left| G\right|_{1}}\leq \frac{\left(\frac{\left| G\right|_{1}}{w}\right)}{\left(\varepsilon \left| G\right|_{1}\right)}\leq \frac{1}{2} $$

(13)

if \(w=\frac {2}{\varepsilon }\) then,

$$PE\left[G'\left[i\right]\geq G\left[i\right]+\varepsilon \left| G\right|_{1}\right] $$

$$=PE\left[\forall\ j \ :G\left[i\right] +A_{o,p}\geq G\left[i\right]+\varepsilon \left| G\right|_{1}\right] $$

$$ =PE\left[\forall\ j \ : A_{o,p}\geq \varepsilon \left| G\right|_{1}\right] \leq \left(\frac{1}{2}\right)^{d}=\delta $$

(14)

$$\text{if}\ d=\log\left(\frac{1}{\delta}\right) $$

for a fixed value of *i*, as shown in Figs. 4 and 5. Thus, we observe that the events are synchronized to a central container with independence of accessibility.
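Since each of the *d* rows fails independently with probability at most 1/2 (Eq. 13), *d* = log₂(1/*δ*) rows suffice to drive the overall failure below *δ* (Eq. 14). A small check; the function name is our own:

```python
import math

def rows_needed(delta):
    """Each of the d rows fails independently with probability <= 1/2
    (Eq. 13), so (1/2)^d <= delta needs d = ceil(log2(1/delta)) rows."""
    return math.ceil(math.log2(1 / delta))

print(rows_needed(0.01))  # 7 rows keep the failure probability below 1%
```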

### I-DAG workflow

The events generated through IoT devices with a sequential order of *PE*[∀ *j* :*A*_{o,p}≥*ε*|*G*|_{1}] are scheduled onto the I-DAG, which consists of an identifier *Locator*_{I−DAG} that reads the event labels \(\left (E_{i}\right)_{N_{o}}=O\left (\frac {1}{\varepsilon ^{2}}\ln\frac {2}{\delta }\right)_{HN_{i,i}}\), \(\left (B_{i}\right)_{N_{p}}=O\left (\frac {1}{\varepsilon ^{2}}\ln\frac {2}{\delta }\right)_{BN_{o,p}}\) and \(\left (I_{i}\right)_{N_{q}}=O\left (\frac {1}{\varepsilon ^{2}}\ln\frac {2}{\delta }\right)_{IN_{i,k}}\) in the source file and shuffles the pointer between *n* stages as shown in Fig. 6.

In order to perform stage-predictor evaluation, the workflow targets *PE*[∀ *j* :*A*_{o,p}≥*ε*|*G*|_{1}]: *stage*(*n*)→*stage*(*n*+1) with *Locator*_{I−DAG}: *stage*(*n*)→*stage*(*n*+1), keeping the error under a loss function \(\vartheta :stage\left (n+1\right)\times stage\left (n+1\right)\rightarrow \mathbb {R}\). The predictor error can be obtained as,

$$ {\begin{aligned} {}& H_{stage\left(n\right)}\left[\vartheta\left(P\left[\forall\ j \ : A_{o,p}\geq \varepsilon \left| G\right|{~}_{1}\right]\right.\right. \\& \left.\left.\left(stage\left(n\right)\right),Locator_{I-DAG}\right)\right] \end{aligned}} $$

(15)

This predictor error manages the discrepancies of inter-connection in the I-DAG workflow.

The *Locator*_{I−DAG}, with approximated finite heterogeneous event labels, can be sampled with *S*_{I−DAG}=((*stage*(*n*)_{1},*stage*(*n*+1)_{1}),...,(*stage*(*n*)_{n},*stage*(*n*+1)_{n})) through \(\frac {1}{n}\sum _{i=1}^{n}\vartheta \left (P\left [ \forall \ j \ : A_{o,p}\geq \varepsilon \left | G\right |_{1}\right ], stage\left (n+1\right)\right)\). The workflow loss functions are categorized into two types: (i) regression and (ii) classification. The regression loss on the predictor *Locator*_{I−DAG} is expressed as,

$$ \vartheta\left(a,b\right)=\left(a-b\right)^{2} $$

(16)

and the classification loss on the predictor *Locator*_{I−DAG} is expressed as,

$$ \vartheta\left(a,b\right)=\begin{cases}0 & \text{if}\ a=b\\ 1 & \text{otherwise}\end{cases} $$

(17)
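Eqs. (16) and (17) translate directly into code; a minimal sketch, with hypothetical stage values as inputs:

```python
def regression_loss(a, b):
    """Eq. (16): squared error between predicted and observed stage."""
    return (a - b) ** 2

def classification_loss(a, b):
    """Eq. (17): 0-1 loss, zero only on an exact stage match."""
    return 0 if a == b else 1

print(regression_loss(3.0, 1.5), classification_loss("stage2", "stage3"))
# prints: 2.25 1
```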

Thus, the I-DAG is ready to facilitate independent heterogeneous IoT entries through the prediction locator.