Cluster-based routing and wavelength assignment for multiple multicasts in 3D Optical Network-on-Chip

Wen Yang; Yawen Chen; Zhiyi Huang; Haibo Zhang; Huaxi Gu; Jun Zhao

doi:10.1364/OE.430194

1. Introduction

Optical Network-on-Chip (ONoC) is an emerging chip-level optical interconnection technology to realise high-performance and power-efficient inter-core communication for many-core processors [1]. Three-Dimensional (3D) integration is another emerging technology to overcome the barriers of interconnect scaling, where Through-Silicon-Via (TSV) [2] is used to stack multiple device layers together with direct vertical interconnects. By combining the 3D integration technology with ONoC, 3D ONoC will bring further performance improvements compared to its 2D counterparts, such as high routing diversity and low transmission latency. Within 3D ONoC, multicast communication, where a packet is delivered from a source to multiple destinations concurrently, is one of the most important inter-core communication forms. It is not only widely used in parallel computing applications in Chip Multi-Processors (CMPs) (e.g., cache coherence, barrier synchronization) [3], but also common in emerging areas such as neuromorphic computing [4]. Previous researches have shown that multicast traffic contributes to a large percentage of the total traffic in CMPs [5]. Moreover, the percentage of multicast traffic and the average number of destinations both increase with the increase on the number of cores [6]. Therefore, multicast communication is an important and challenging problem in 3D ONoC.

To support high-performance multicast in 3D ONoC, the following constraints should be considered: (1) Wavelength constraint. Although Wavelength-Division-Multiplexing (WDM) [7] allows multiple optical signals to be transmitted concurrently through one waveguide using different wavelengths in ONoC, the maximal number of supported wavelengths per waveguide is limited in realistic scenarios (e.g., at most 62 wavelengths in 19 Gbps data rate network [8]). Besides, laser power is proportional to the number of wavelengths used [8]. Hence, using more wavelengths will incur more power consumption, which in turn causes high heat dissipation and affects the reliability of ONoC. (2) Architecture constraint. Because 3D ONoC has different architecture characteristics with 2D ONoC (e.g., the layout of nodes, optical routers), the simple extension of existing methods from 2D ONoC to 3D ONoC may not have desirable performance. (3) Multiple multicasts constraint. Most existing studies for multicast in 3D ONoC only consider optimization of one multicast (one source and corresponding multiple destinations), which will cause high contentions without considering other multicasts if multiple multicasts exist at the same time. Overall, existing multicast studies have rarely considered all the above constraints. The problem of efficiently routing and assigning wavelengths for multiple multicasts has not been well studied for 3D ONoC. To bridge this gap, we will target optimal routing of multiple multicasts requested simultaneously from applications in 3D ONoC to improve the utilization of available wavelengths.

In this paper, we propose routing and wavelength assignment schemes to support multiple multicasts in 3D ONoC, with the objective of reducing the number of wavelengths required. As far as we know, this is the first work investigating the multiple multicasts problem in 3D ONoC. Our main contributions can be summarized as follows: (1) We identify six basic routing schemes used in 3D ONoC. (2) We derive six special distributions for multicast nodes in 3D ONoC, and propose an optimal routing scheme for each distribution with only one wavelength. (3) We design a heuristic routing and wavelength assignment scheme for random distributions of multicast nodes. (4) A three-plane hierarchical ONoC architecture is proposed to implement the proposed scheme. (5) We find an upper bound on the number of wavelengths used to evaluate the wavelength requirement.

The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 introduces six basic routing schemes for 3D ONoC. Section 4 gives the routing theorems for special instances. Section 5 presents a heuristic routing and wavelength assignment method for multiple multicasts with general instances. Section 6 illustrates the system implementation for the given method. Section 7 evaluates performance through simulations. Finally conclusions are given in Section 8.

2. Related work

Previous work on multicast routing of 3D ONoC can be classified into two categories: tree-based (TB) and path-based (PB) algorithms. In the tree-based routing [7,8], a spanning tree is constructed by combining a root (the source node) and individual leaves (all destination nodes). Then the packet is transmitted along the tree. In [9], two efficient multicast algorithms for 3D mesh-connected multicomputers were designed, named DIAG and DDS, which are tree-based shortest-path multicast algorithms with a low complexity and a low transmission delay. The authors in [10] proposed a Single-cycle Multi-hop Asynchronous Repeated Traversal (SMART) 3D ENoC architecture that can achieve high-performance collective communication. In [11], the authors designed a tree-based multicast routing algorithm with excellent scalability and the ability to minimize the number of packets for a NoC-based DNN accelerator. Path-based routing is another routing method designed for multicast in on-chip network, in which one packet is sent along a fixed Hamiltonian path. In this method, packets do not need to be replicated in the network, so it decreases packets congestion. The authors in [12] proposed an efficient Dynamic Partition Merging (DPM)- based multicast routing algorithm, which divides the multicast destination set into partitions dynamically by comparing the routing cost of different partition merging options and selecting the merged partitions with lower cost. In [13], several partitioning methods for the path-based multicast approach in 3D mesh-based ENoC were proposed, where a novel analytical model was designed to explore the efficiency of each partitioning approach. In [14], a highly adaptive and deadlock-free multicast routing method was proposed for 3D mesh-based ENoC (3D HOE) without using virtual channels, which used a turn model to maximise the degree of adaptiveness by minimising the number of prohibited turns. An additional method for unicast and multicast communication in 3D mesh-based network was presented in [15], which was guaranteed to be deadlock-free by means of the Hamiltonian path. However, it suffers from low performance and an inability to efficiently partition the network. In [16], the authors proposed an equilibrium partitioning method based on the 3D mesh architecture which can obtain a trade-off between the startup latency and the network latency to reduce the total latency with good scalability. In [17], a high-performance and adaptive routing algorithm was proposed for a partially connected 3D ENoC, which can achieve good performance under different traffic patterns, different number of elevators and different elevator assignment mechanisms. The authors in [18] proposed an "Elevator-First" distributed routing algorithm under non-regular 3D topologies in which the usual planar topologies were partially connected by only vertical links, using two virtual channels per physical link in X and Y dimensions.

Moreover, many multicast routing methods were proposed in optical network. In [19] , the authors proposed a highly efficient heuristic multicast-capable routing that is based on an adaptive genetic algorithm (GA) with minimum solution revisits on elastic optical networks (EONs). It showed that the method can provide more efficient EON planning than the existing multicast-capable RMSA algorithms that use the shortest path tree (SPT) and the minimal spanning tree (MST). The authors in [20] discussed how to realize highly efficient data migration and backup for big data applications in elastic optical networks, and modeled the data migration in these networks as a dynamic anycast problem with several efficient algorithms. In [21], the authors proposed several online service provisioning algorithms that incorporate dynamic routing, modulation, and spectrum assignment (RMSA) with a hybrid single-/multi-path routing (HSMR) scheme. The results showed that the proposed HSMR schemes could effectively reduce the bandwidth blocking probability of dynamic RMSA, as compared to two benchmark algorithms that used single-path routing and split spectrum. A new algorithm based on EON was proposed by employing a subtree-based scheme to establish multicast transmissions [22], which takes into account the effects of splitting the transient light at the intermediary nodes on the signal-to-noise ratio to enhance the accuracy of the resource allocation.

Although these methods can improve network performance, they only consider one multicast. When there are multiple multicasts requested from the applications concurrently, these methods are very likely to cause high contentions without considering other multicasts. Besides, most of these methods are designed in terms of electrical NoC or optical network without considering the wavelength utilization that is a vital performance factor for ONoC. Hence, an efficient routing and wavelength assignment design for multiple multicasts in 3D ONoC is necessary.

In the following sections, we first introduce basic routing schemes used in 3D mesh-based ONoC, and then, an optimal routing algorithm for special distributions is identified which only needs one wavelength. Finally, a heuristic routing and wavelength assignment algorithm is proposed by extending the results obtained by special instances to general instances.

3. Basic routing schemes in 3D ONoC

Since the complexity of a routing scheme in 3D ONoC impacts the network performance (e.g., power consumption and transmission latency), low-complexity dimension-ordering routings are used as the basic routing components in this paper because of their simple logics and easy implementation. In this section, six dimension-ordering routings used in 3D mesh topology are proposed firstly. Then, six special distributions of multicast nodes are derived by analysing the distribution of the nodes in a mesh-based 3D ONoC.

3.1 Preliminaries

Some general terms used in this paper are introduced at first. The interconnection of a 3D ONoC can be modelled as a directed graph $G=(V, E)$, where the set of vertices $V=\{v_1, v_2,\ldots, v_N\}$ represents all nodes in the network and the coordinate of node $v_i\ (1\leq i \leq N)$ is represented by ($x_{v_i}, y_{v_i}, z_{v_i}$); the set of edges $E=\{e_{ij}\}$ includes all the optical links between two adjacent nodes $v_i$ and $v_j$ ($1\leq i \leq N$, $1\leq j \leq N$). Let $M=\{m_i| 1\leq i \leq C\}$ be the multicast set and $m_i$ be the $i$th multicast. The nodes involved in multicast $m_i$ include its source node $s_i$ and its destination node set $D_i=\{d_{i,j}|1\leq j \leq |D_i|\}$, where $d_{i,j}$ is the $j$th destination node and $|D_i|$ is the total number of destinations for multicast $m_i$.

Mesh-based and ring-based topologies have been widely used in ONoC. Ring is a simple topology for on-chip interconnection. However, it lacks transmission parallelism due to its shared bus topology. Moreover, the high connectivity requirements of ring-based topology leads to higher power requirement, especially when the number of nodes increases. Therefore, ring-based topology is only suitable for a small sized network (e.g., tens of nodes). Compared to the ring-based topology, the mesh-based topology uses less number of wavelengths since it has more routes for communication, which makes the same wavelength used by multiple communications if their transmission paths are link disjoint. Besides, mesh networks match well with the planar silicon geometry and provide better scalability and higher bandwidth than bus or ring networks. Considering the above reasons, the chip development trend in future (i.e., thousands of cores to fit on one chip), and our research objective (i.e., reducing the number of wavelength used), we choose the mesh-based topology in the routing design of this paper. In mesh-based 3D ONoC, there are 3 axes: x-axis, y-axis, z-axis, which run horizontally, depth-wise, and vertically, respectively. We call them row, column, and shaft, respectively. By fixing the value of one axis, three flat planes can be derived. Hence, XY-plane (Fig. 1 (a)) is the plane in 3D space which has a fixed value for z-axis, XZ-plane (Fig. 1 (b)) is the plane which has a fixed value for y-axis, YZ-plane (Fig. 1 (c)) is the plane which has a fixed value for x-axis.

Fig. 1. Three planes derived by fixing the value of one axis in 3D space

Download Full Size | PDF

3.2 Dimension-Ordering Routing in 3D ONoC

Dimension-Ordering Routing (DOR) is widely used in ONoC with a grid-based topology [23], which can uniformly distribute the shortest paths between all pairs of nodes. In DOR, packets are transmitted along one dimension first, and then turned to another dimension, according to the position of the nodes. For example, XY-routing is a DOR in mesh-based 2D ONoC, which transmits a packet using y-dimension links to reach the destination after using x-dimension links. In mesh-based 3D ONoC, six DORs can be derived according to a different routing sequence of three dimensions, which are shown as follows: In XYZ routing, packets are transmitted along the x-axis first, until reaching the column that has the same x-axis coordinate as the destination. Then, turn to the y-axis and go along the y-axis until the intermediate node that has the same y-axis coordinate as the destination. Finally, turn to the z-axis to arrive at the destination. Similarly, YXZ routing, XZY routing, YZX routing, ZXY routing and ZYX routing can be derived.

3.3 Properties of the proposed DORs

XYZ, YXZ, XZY, YZX, ZXY and ZYX routings have the following properties, and we will use them in the following algorithms. (1) Low routing complexity. These six routing schemes are dimension-ordering routings with simple combinational logic in routing functions and none have optical buffer to store routing tables. (2) Low latency. These six routing schemes transmit packets along shortest paths between all pairs of nodes, which can reduce the packet’s latency. (3) At most two turn-around counts. Since a turn-around is a 90 degree turn from one dimension to another, by tuning on one Micro-ring Resonator (MR), a low number of turn-around counts can reduce the power consumption used for tuning the MRs.

3.4 Distribution of multicast nodes

To solve the communication problem of multiple multicasts, we should consider not only each multicast individually, but also the distributions of different multicasts. Hence, we analyse the distribution of nodes in mesh-based 3D ONoC firstly, and then six special distributions for multiple multicasts are proposed. The distribution of nodes in mesh-based 3D ONoC is classified as follows:

Same Row: Nodes have the same y-axis and z-axis coordinates;

Same Column: Nodes have the same x-axis and z-axis coordinates;

Same Shaft: Nodes have the same x-axis and y-axis coordinates;

Different Rows: Nodes in the XY-plane have different y-axis coordinates; Nodes in the XZ-plane have different z-axis coordinates;

Different Columns: Nodes in the XY-plane have different x-axis coordinates; Nodes in the YZ-plane have different z-axis coordinates;

Different Shafts: Nodes in the XZ-plane have different x-axis coordinates; Nodes in the YZ-plane have different y-axis coordinates.

For multiple multicasts, two sets of nodes exist: a source node set and a destination node set. Each node set has these six distributions. By analysing all possible combinations of source node distribution and destination node distribution, six special distributions are identified as follows: ① Sources are in different rows and destinations from different multicasts are in different columns; ② Sources are in different columns and destinations from different multicasts are in different rows; ③ Sources are in different rows and destinations from different multicasts are in different shafts; ④ Sources are in different shafts and destinations from different multicasts are in different rows; ⑤ Sources are in different columns and destinations from different multicasts are in different shafts; ⑥ Sources are in different shafts and all destinations from different multicasts are in different columns.

For each special distribution, one of the DORs derived in Section 3.2 can be chosen to establish non-overlapping routing paths by an optimal routing scheme that is proposed in the following section.

4. Optimal routing and wavelength assignment of multiple multicasts for special instances in 3D ONoC

The main idea of the optimal routing and wavelength assignment for multiple multicasts is that, for any given special distribution of multicast nodes in 3D ONoC, we can always use one of the six DORs to establish non-overlapping routing paths with only one wavelength. Specifically, six routing theorems are derived for the six special distributions.

Theorem 1 (for distribution ①) Given a set of multicasts $M=\{m_1,\ldots, m_i,\ldots,m_C\}$ in 3D ONoC, if the distribution of sources and destinations of the set satisfies the following three conditions, only one wavelength is required by using XZY routing to achieve $M$ simultaneously.

Condition I: If $z_{s_i}=z_{s_j}$ for any $i$, $j$ ($1\leq i\leq C, 1\leq j\leq C, i\neq j$), $y_{s_i}\neq y_{s_j}$.

Condition II: $\forall d_{i,p}\!\!\in \!\!\ D_i$, $\forall d_{j,q}\!\!\in \!\! D_j$ ($p\in [1, |D_i|], q\in [1, |D_j|]$), if $z_{d_{i,p}}=z_{d_{j,q}}$, $x_{d_{i,p}}\neq x_{d_{j,q}}$.

Condition III: if $z_{s_i}\neq z_{s_j}$ and $y_{s_i}=y_{s_j}$ for any $i$, $j$, $x_{d_{i,p}}\neq x_{d_{j,q}}$.

Proof 1 For Condition I, if sources in an XY-plane have distinct y-axis coordinates (different rows), each row in this plane hosts one multicast at most, and the links in the row can be used exclusively by the multicast.

For Condition II, if destinations of any multicast do not share columns with other multicasts in an XY-plane, each column hosts one multicast at most in this plane, and the links of the column can only be used exclusively by the multicast.

For Condition III, if sources in different XY-planes have the same y-axis coordinate, the corresponding destinations cannot have the same x-axis coordinate. Because if they have the same x-axis coordinate, packets may arrive at the XY-plane that the destinations reside along the shaft having the same y-axis coordinate with the sources, making links in the shaft conflict; thereby, Condition III can prevent this situation from happening. Therefore, each shaft hosts at most one multicast and the links of the shaft can only be used by the multicast exclusively.

For any multicast in the set, we can use XZY routing via its dedicated row, dedicated shaft and dedicated column to find its multicast path that is non-overlapping with other multicasts. The routing path can be constructed like this: First, route from the source along the x-axis to find the column that each destination is located. If the destination is in the same XY-plane with the source, turn to y-axis to reach the destination. Otherwise, go along the z-axis to reach the same XY-plane with the destination and finally turn to the y-axis to reach the destination. Since the rows, shafts and columns host the multicast exclusively, the resulting paths will not overlap with the paths of any other multicasts in the set.

Theorem 2 (for distribution ②) Given a set of multicasts $M=\{m_1,\ldots, m_i,\ldots,m_C\}$, if the distribution of sources and destinations of the set satisfies the following three conditions, only one wavelength is required by using YZX routing to achieve $M$ simultaneously.

Condition I: If $z_{s_i}=z_{s_j}$ for any $i$, $j$ ($1\leq i\leq C, 1\leq j\leq C, i\neq j$), $x_{s_i}\neq x_{s_j}$.

Condition II: $\forall d_{i,p}\!\!\in \!\! D_i$, $\forall d_{j,q}\!\!\in \!\! D_j$ ($p\in [1, |D_i|], q\in [1, |D_j|]$), if $z_{d_{i,p}}=z_{d_{j,q}}$, $y_{d_{i,p}}\neq y_{d_{j,q}}$.

Condition III: If $z_{s_i}\neq z_{s_j}$ and $x_{s_i}=x_{s_j}$ for any $i$, $j$, $y_{d_{i,p}}\neq y_{d_{j,q}}$.

Theorem 3 (for distribution ③) Given a set of multicasts $M=\{m_1,\ldots, m_i,\ldots,m_C\}$, if the distribution of sources and destinations of the set satisfies the following three conditions, only one wavelength is required by using XYZ routing to achieve $M$ simultaneously.

Condition I: If $y_{s_i}=y_{s_j}$ for any $i$, $j$ ($1\leq i\leq C, 1\leq j\leq C, i\neq j$), $z_{s_i}\neq z_{s_j}$.

Condition II: $\forall d_{i,p}\!\!\in \!\! D_i$, $\forall d_{j,q}\!\!\in \!\! D_j$ ($p\in [1, |D_i|], q\in [1, |D_j|]$), if $y_{d_{i,p}}=y_{d_{j,q}}$, $x_{d_{i,p}}\neq x_{d_{j,q}}$.

Condition III: If $y_{s_i}\neq y_{s_j}$ and $z_{s_i}=z_{s_j}$ for any $i$, $j$, $x_{d_{i,p}}\neq x_{d_{j,q}}$.

Theorem 4 (for distribution ④) Given a set of multicasts $M=\{m_1,\ldots, m_i,\ldots,m_C\}$, if the distribution of sources and destinations of the set satisfies the following three conditions, only one wavelength is required by using ZYX routing to achieve $M$ simultaneously.

Condition I: If $y_{s_i}=y_{s_j}$ for any $i$, $j$ ($1\leq i\leq C, 1\leq j\leq C, i\neq j$), $x_{s_i}\neq x_{s_j}$.

Condition II: $\forall d_{i,p}\!\!\in \!\! D_i$, $\forall d_{j,q}\!\!\in \!\! D_j$ ($p\in [1, |D_i|], q\in [1, |D_j|]$), if $y_{d_{i,p}}=y_{d_{j,q}}$, $z_{d_{i,p}}\neq z_{d_{j,q}}$.

Condition III: If $y_{s_i}\neq y_{s_j}$ and $x_{s_i}=x_{s_j}$ for any $i$, $j$, $z_{d_{i,p}}\neq z_{d_{j,q}}$.

Theorem 5 (for distribution ⑤) Given a set of multicasts $M=\{m_1,\ldots, m_i,\ldots,m_C\}$, if the distribution of sources and destinations of the set satisfies the following three conditions, only one wavelength is required by using YXZ routing to achieve $M$ simultaneously.

Condition I: If $x_{s_i}=x_{s_j}$ for any $i$, $j$ ($1\leq i\leq C, 1\leq j\leq C, i\neq j$), $z_{s_i}\neq z_{s_j}$.

Condition II: $\forall d_{i,p}\!\!\in \!\! D_i$, $\forall d_{j,q}\!\!\in \!\! D_j$ ($p\in [1, |D_i|], q\in [1, |D_j|]$), if $x_{d_{i,p}}=x_{d_{j,q}}$, $y_{d_{i,p}}\neq y_{d_{j,q}}$.

Condition III: If $x_{s_i}\neq x_{s_j}$ and $z_{s_i}=z_{s_j}$ for any $i$, $j$, $y_{d_{i,p}}\neq y_{d_{j,q}}$.

Theorem 6 (for distribution ⑥) Given a set of multicasts $M=\{m_1,\ldots, m_i,\ldots,m_C\}$, if the distribution of sources and destinations of the set satisfies the following three conditions, only one wavelength is required by using ZXY routing to achieve $M$ simultaneously.

Condition I: If $x_{s_i}=x_{s_j}$ for any $i$, $j$ ($1\leq i\leq C, 1\leq j\leq C, i\neq j$), $y_{s_i}\neq y_{s_j}$.

Condition II: $\forall d_{i,p}\!\!\in \!\! D_i$, $\forall d_{j,q}\!\!\in \!\! D_j$ ($p\in [1, |D_i|], q\in [1, |D_j|]$), if $x_{d_{i,p}}=x_{d_{j,q}}$, $z_{d_{i,p}}\neq z_{d_{j,q}}$.

Condition III: If $x_{s_i}\neq x_{s_j}$ and $y_{s_i}=y_{s_j}$ for any $i$, $j$, $z_{d_{i,p}}\neq z_{d_{j,q}}$.

The proof of Theorem 2 to Theorem 6 are similar to the proof of Theorem 1, so we omit them here.

Overall, the above six routing theorems for special distributions have the following advantages: (1) Only one wavelength is needed. As long as the distribution of multicast nodes satisfies the conditions of one routing theorem, the corresponding routing theorem can be used with only one wavelength; therefore, the number of wavelengths used in these situations is optimal. (2) Low routing complexity. Only dimension-ordering routings are used in these routing theorems, which are easy to implement and are shortest-path routing algorithms.

While the above routing algorithm is optimal for special distributions of multicast nodes, the distribution of multicast nodes is random in reality and it is hard to classify all multicast nodes to one particular distribution. Therefore, in the next section, the design is extended for general instances, where the distribution of multicast nodes is random.

5. Heuristic routing and wavelength assignment of multiple multicasts for general instances in 3D ONoC

In this section, a cluster-based routing and wavelength assignment scheme for the general distributions of multicast nodes is proposed, with the novelty of adopting the multicast-splitting strategy. Using the multicast-splitting strategy, one multicast can be logically split into several sub-multicasts, and multiple sub-multicasts without overlapping paths can be merged into one cluster with only one wavelength. Since the number of wavelengths is equal to or smaller than the number of clusters, the objective can be summarised to reduce the number of clusters. As shown in Fig. 2 (a), 3 wavelengths are needed without multicast-splitting in order to deal with the link conflicts. But, in Fig. 2 (b), the number of wavelengths can be reduced to 2 if we split multicast 1 and multicast 2 into two sub-multicasts respectively and merge sub-multicasts without overlapping-paths into one cluster. In Fig. 2 (b), the routing paths from node 5 to 14 and from node 4 to 0 are merged into cluster 1, while the remaining routing paths are merged into cluster 2. Each cluster requires only one wavelength as there are no overlapping paths; therefore, 2 wavelengths are needed by multicast-splitting strategy.

Fig. 2. Illustration of multicast-splitting strategy

Download Full Size | PDF

Specifically, we decouple all multicast nodes into a minimal number of clusters by iteratively forming a cluster as follows: firstly select the routing scheme that can hold more multicast nodes than others, and then choose the multicast nodes based on the selected routing scheme to form a cluster. Since each cluster holds multicast nodes that satisfy the conditions of one of the routing theorems derived from Section 4, the corresponding routing scheme can be used to establish routing paths and only one wavelength is assigned to each cluster. We call this algorithm Cluster-based Routing and Wavelength Assignment for Multiple Multicasts in 3D ONoC (CRWAMM).

With the objective of obtaining as few clusters as possible, the key point is which routing theorem (Theorem 1, 2, 3, 4, 5, 6) can choose nodes to form a cluster that accommodates more multicast nodes than using other routing theorems in each round. Before introducing the algorithm, several factors related to the distribution of multicast nodes in mesh-based 3D ONoC are given firstly.

Definition 1 Source Count is the number of multicasts whose sources are in a row (column / shaft), denoted as SC.

Definition 2 Destination Count is the number of multicasts whose destinations are in a row (column / shaft), denoted as DC.

SC and DC indicate the distribution of sources and destinations, respectively, in each dimension, which can be presented in an array diagram (Fig. 3). The meaning of each value in the diagram is as follows:$SD$ is an SC or DC identifying index. If $SD=0$, this diagram shows SC, while it shows DC when $SD=1$. $SI$ is a dimension identifying index that shows the specific dimension SC or DC obtained. If $SI=0$, it shows SC or DC in the rows; if $SI=1$, it shows SC or DC in the columns; if $SI=2$, it shows SC or DC in the shafts. $Value$ is a $q$-value area showing the exact number of SC or DC, where $f_i$ ($i\in [1, q]$) shows the SC or DC in the $i$th row (column / shaft). The value of $q$ is related to $SI$. For example, in an $n\times n\times l$ ONoC, if $SI=0$, $q= nl$; if $SI=1$, $q= nl$; if $SI=2$, $q= n^2$. We define $T_{SD,SI}$ as the biggest value among all $q$ values of a diagram ($T_{SD, SI}= max\{f_1, f_2,\ldots,f_q \}|_{SD, SI}$), which presents the number of multicasts whose sources or destinations are in the most dense row, column, or shaft.

Fig. 3. An array diagram for Source (Destination) Count

Download Full Size | PDF

To establish this routing method, three phases are needed, which are shown as follows:

Phase 1: Obtain the distribution of multicast nodes in mesh-based 3D ONoC. In this phase, SC and DC for each dimension are obtained at first, which works as a foundation to choose nodes to form a cluster. Specifically, six array diagrams are derived in a 3-dimension ONoC for the given multiple multicasts. Then, calculate $\{T_{SD,SI}\}$ for all SD, SI and use these values to choose a proper routing theorem for the cluster forming in the next phase.

Phase 2: Choose one routing theorem to form a cluster holding more multicast nodes than others. The problem solved in this phase is which special routing theorem can be used to form a cluster holding more multicast nodes than other routing theorems. The idea is to select the optional routing set $R$ from the six routing theorems according to the distribution of sources, and then choose one routing theorem from $R$ according to the distribution of destinations. The following is the detailed method according to the current distribution of multicast nodes.

(1) Select the optional routing set $R$. Since $\{T_{0,SI}\}$ shows the distribution of sources in the network, compare $\{T_{0,SI}\}$ for $T_{0,SI}\neq 0\ (SI\in P, P=\{0,1,2\})$ and get the minimum value $t$ ($t=min\{T_{0,SI}\ |\ SI\in P,\ P\in \{0,1,2\}\}$). When $t$ is obtained, the optional routing set $R$ is determined by the corresponding values of $SD$ and $SI$ according to Table 1 (a). The reason why the minimum value among $\{T_{0,SI}\}$ is used to choose the routing theorem is: the sources in the most dense row (column/shaft) will be decoupled to at most $T_{0,0}\ (T_{0,1}/ T_{0,2})$ clusters at the worst case. Choosing the minimum value among $\{T_{0,SI}\ |\ SI\in P,\ P\in \{0,1,2\} \}$ can make these sources grouped into as few clusters as possible, reducing the number of clusters.

Table 1. The relationship between SD, SI and routing selection

View Table | View all tables in this article

The relationship between the routing selection and the value of $SD$, $SI$ is shown in Table 1 when $t$ is obtained. For example, in Table 1 (a), the corresponding conditions of XZY or XYZ routing can be used to select sources for one cluster when $t$ is achieved by $SD=0$ and $SI=0$. This is because $t$ is obtained in one row for this situation. To make these sources in one row be grouped into different clusters, choosing sources row by row can make sources in as few clusters as possible. If the sources in one cluster are in different rows, they satisfy the Condition I of Theorem 1 and Theorem 3, so XZY and XYZ routing can be used in this situation.

(2) Select one routing from $R$. Get the corresponding $SD$ and $SI$ from Table 1 (b) for the routings in $R$ and calculate {$T_{1,SI}|\ SI\in Q,\ Q\subseteq P$} to get the minimum value $t'$ ($t'\!\!=\!\!min\{T_{1,SI}\}$) for $T_{1,SI}\!\!\neq \!\! 0$. The corresponding routing theorem in $R$ is selected when $t'$ is obtained.

Phase 3: Select nodes to form a cluster. After the routing theorem is decided, the corresponding conditions can be used to choose nodes to form a cluster. The algorithm works as follows:

Step 1 Select the sources that satisfy the Condition I of the routing theorem and mark them selected. Mark the sources that have not been chosen and the corresponding destinations unselected.
Step 2 Mark the destinations that do not satisfy the Condition III unselected.
Step 3 Select the destinations that satisfy the Condition II and derive one cluster.
Step 4 Repeat Step 1 to Step 3 until all multicast nodes are selected.

Each cluster derived as above satisfies the conditions of one special routing theorem and only one wavelength is needed for each cluster. The pseudocode for cluster-based routing algorithm of multiple multicasts in 3D ONoC is given in Algorithm 1. The algorithm complexity is $O(n^4 l^2)$ for $n\times n\times l$ ONoC. CRWAMM is a polynomial time algorithm which has low complexity. Through analysis, we derive the following result about the upper bound on the number of wavelengths used by CRWAMM.

Theorem 7 The upper bound of the number of wavelengths derived by CRWAMM is the number of clusters.

Proof 2 For CRWAMM, the distribution of multicast nodes in each cluster satisfies one of the special routing theorems, so only one wavelength is needed for each cluster. If the routing paths of each cluster conflict with any other clusters, one distinct wavelength is assigned to each cluster and the number of wavelengths is equal to the number of clusters. Otherwise, the same wavelength can be assigned to the clusters without links overlapping. Therefore, the number of wavelengths is smaller than the number of clusters.

6. System implementation

6.1 Network architecture

For the system implementation, a three-plane (core plane, optical control plane and data transmission plane) hierarchical ONoC architecture can be used [24] (as shown in Fig. 4(a)). The core plane contains all the micro-cores to realize the computation applications. The control plane is the kernel part of CRWAMM to construct the multicast paths and allocate optical wavelengths. The data transmission plane provides passive transmission for the multicast packets from the source core to all the destination cores along the configured multicast path and using the allocated wavelength. To implement CRWAMM, the functions of the corresponding planes work as follows.

6.1.1 Core plane

In the core plane (Fig. 4(b)), each core connects with the network through a network interface (NI), by which each core also connects the optical control plane with an optical access point and connects the data transmission plane with an optical router. The network interface realizes the coordination with the other two planes, such as sending the multicast routing requests to the optical control plane and configuring the connected optical router in the data transmission plane to construct the routing paths.

Fig. 4. A three-plane ONoC architecture for implementing CRWAMM

Download Full Size | PDF

6.1.2 Optical control plane

The optical control plane (Fig. 4(c)) is a vital component in the architecture, which consists of a Centralized Control Unit (CCU) and a cyclic optical control channel. In the optical control plane, CCU collects the multicast requests from the core plane, conducts the multicast routing and wavelength allocation, and dynamically configures the multicast paths in the data transmission plane by utilizing a centralized manner. CCU consists of a global wavelength table and a multicast Routing and Wavelength Assignment (RWA) optimizer. The global wavelength table maintains the wavelength utilization of each optical link, while the multicast RWA optimizer optimizes the multicast routing path by considering the minimization of the wavelength usage. When a multicast path is allocated, CCU updates the global wavelength table and sends out the configuration packets to all the cores which are located in the multicast path. The optical control channel only transmits the control packets between the network interfaces and CCU, e.g., the multicast routing requests, the multicast configuration and release packets. Thus, each network interface in the core plane has an optical access point in the optical control channel.

6.1.3 Data transmission plane

The data transmission plane (Fig. 4(d)) is a configurable wavelength-routed optical communication network and provides the passive transmission of multicast packets from the source core to all the destination cores along the configured routing paths using the assigned wavelengths. It consists of multicast-enabled optical routers and bidirectional optical links. As shown in Fig. 5(a), each optical router has seven pairs of input/output ports to connect with six neighboring routers in different directions (North, East, South, West, Up, Down) and a local core. The optical router utilizes the active MR to implement configurable optical switching for dynamically established routing paths. Each MR in the router has a unique resonant wavelength $\lambda$ just like a wavelength-selective filter, and it can be tuned by using thermal-optical or electrical-optical effects [25], which works as follows: When an MR is in the on-state mode (i.e., the wavelength of a traveling optical signal is equal to the resonant wavelength of an MR), the signal will be coupled into the MR and transmit along the other waveguide; otherwise, the signal will propagate along the original waveguide when an MR is in the off-state mode (i.e., the wavelength of a traveling optical signal is different to the resonant wavelength of an MR). In order to provide wavelength-routed multicast communication, the MRs in the optical router can be configured according to the routing matrix as shown in Fig. 5(b). For example, if the optical signal inputs from the East port and outputs to the West port, MR 24 is tuned to on-state. For the situation that a switch needs to switch two wavelengths to the same port, the corresponding MRs on the paths from input to output should be tuned on. For example, in Fig. 5, if the optical signal inputting from the North and East ports need to be transmitted to the same South port, MR 28 and 24 should be tuned to on-state simultaneously.

Fig. 5. The configurable optical router and the MR configuring matrix.

Download Full Size | PDF

6.2 Communication scheme

In the proposed CRWAMM architecture, the communication process mainly involves three steps: send multicast requests, compute routing and wavelength, and configure network. (1) Send Multicast Requests. In a fixed time slot, the cores in the core plane send the communication requests (including the addresses of source cores and destination cores) to the control plane. In the control plane, the multicast request packet is transmitted from the corresponding optical access point of the source core to CCU through the optical control channel. (2) Compute Routing and Wavelength. According to the distribution of the destination cores and the global wavelength utilization of optical links, the CCU calculates the routing paths and allocates optical wavelengths by using the heuristic algorithm given in Section 5. When the routing paths are allocated, the CCU updates the global wavelength table and sends out the configuration packets to all the cores which are located in the multicast paths. (3) Configure Network. When the configuration packet is received by the network interface of the cores on the multicast routing path, the local configuring unit changes the interconnection state of the connected optical router to establish the routing path. Finally, the multicast packets are modulated to optical signals and transmitted from the source core to all the destination cores in the data transmission plane.

It is worth noting that the system implementation is relatively static in this scheme, which collects the multicast requests in a fixed time slot. If there is a new multicast coming during the running time of step (2) and step (3), it should wait for the next time slot that the control plane collects the new multicast requests. Since the running time of implementation is very short, this system does not cause a long delay.

7. Performance evaluation

To evaluate the performance of CRWAMM, extensive simulations are carried out, using both synthetic traffic patterns and real data traces, compared with baseline routing methods (i.e., Tree-based routing (TB) and Path-based routing (PB)). The number of wavelengths is evaluated firstly since it is one of the important factors that influences energy consumption and chip complexity. Then, we evaluate the packet latency and power consumption of different routing and wavelength assignment schemes.

7.1 Wavelength usage

7.1.1 Simulation with synthetic traffic patterns

In synthetic-based simulations, the multicast traffic is subjected to the following settings: (1) Each node generates the multicast packets independently, with a data rate of $\theta$ packets/cycle/node, which follows the Poisson distribution, $\theta \in$ [0, 1]; (2) In each multicast, the source node and the destination nodes are distributed uniformly. This multicast traffic is generated in advance and stored in separate traffic files; therefore, the same traffic files are used to evaluate the performance of different routing and wavelength assignment schemes. (3) We compare CRWAMM with other routing schemes under different multicast ratios and different network sizes. The ratio of multicast nodes is defined as the proportion of multicast nodes to all nodes in the network, which indicates the number of nodes participating in multicast communication [26]. For instance, in an $8\times 8\times 3$ ONoC, if the multicast ratio is 30%, there are 57 nodes involved in the multicast communication. Since each multicast is supposed to have at least 3 nodes (one source and two destinations), there are at most 19 multicasts without overlapping. In our research, multiple multicasts means there are more than one multicast in the network. Therefore, we can select 2 to 19 multicasts (including the addresses of source and destinations) randomly from the derived multicast traffic files. Figure 6 shows the average number of wavelengths required for different multicast ratios ($30\%$, $50\%$, $90\%$) in different network sizes ($4\times 4\times 3$, $8\times 8\times 3$, $16\times 16\times 3$).

(1) $4\times 4\times 3$ ONoC. There are at most 14, 24 and 43 multicast nodes without overlapping in a $4\times 4\times 3$ ONoC (48 nodes) for 30%, 50%, 90% multicast ratios, respectively. Therefore, there are at most 4, 8 and 14 multicasts, respectively. From Fig. 6(a), it can be seen that CRWAMM requires the least number of wavelengths compared with traditional routing schemes (TB, PB) in a $4\times 4\times 3$ mesh-based ONoC under different multicast ratios. Overall, the number of wavelengths reduction of CRWAMM over TB and PB is 31.4%.

Fig. 6. Average number of wavelengths used of CRWAMM, PB and TB in different ONoC sizes under different multicast ratios

Download Full Size | PDF

(2) $8\times 8\times 3$ ONoC. There are at most 57, 96 and 172 multicast nodes without overlapping in an $8\times 8\times 3$ ONoC (192 nodes) for 30%, 50%, 90% multicast ratios respectively. Accordingly, there are at most 19, 32 and 57 multicasts respectively. Overall, the number of required wavelengths can be reduced by 35.1% in average compared to TB and PB (Fig. 6(b)).

(3) $16\times 16\times 3$ ONoC. When the multicast ratio is 30%, there are at most 230 multicast nodes without overlapping and at most 76 multicasts in a $16\times 16\times 3$ ONoC. Likewise, there are at most 384 (691) nodes and 128 (230) multicasts for multicast ratio 50% (90%). CRWAMM still requires the least number of wavelengths compared with other routing schemes. From Fig. 6(c), it can be seen that the overall reduction of the average number of wavelengths is 33% in a $16\times 16\times 3$ ONoC.

As observed from the above results, CRWAMM outperforms the other two routing schemes in the number of required wavelengths. The foremost reason for this performance gain is due to the efficiency of CRWAMM which considers multiple multicasts as a whole and establishes routing paths based on the distribution of multicast nodes. CRWAMM shows obvious advantages than tree-based and path-based routings: (1) The number of wavelengths required for CRWAMM is always least under different multicast ratios and different network sizes. (2) CRWAMM has good scalability. As illustrated in the above figures, CRWAMM is always least when the network size increases. (3) CRWAMM is a polynomial time algorithm which has low complexity.

7.1.2 Simulation with real data traces

In order to know the real impact of the presented methods, traces from some application benchmark suites selected from PARSEC [27] are used. We form a 64-node on-chip network ($4\times 4\times 4$) that four layers are attached on top of each other. Figure 7 gives the simulation results with the realistic data traces.

Fig. 7. Average number of wavelengths in different applications of trace-based simulations in a $4\times 4\times 4$ 3D ONoC

Download Full Size | PDF

It can be seen that CRWAMM consistently reduces the number of wavelengths required across all tested applications. Compared with TB and PB, CRWAMM reduces the number of wavelengths by 34.3% and 13.55%, respectively. As we discussed before, CRWAMM considers the group of multiple multicasts as a whole and designs a routing algorithm from the global perspective. It uses non-overlapping links efficiently, reducing the links congestion. Other multicast routing schemes have no such ability as they were designed for single multicast and they only consider each multicast individually, which will increase links conflicts. Thus, CRWAMM can achieve much better performance than the other multicast routing schemes in the number of wavelength used.

7.2 Packet latency

The end-to-end packet latency is the time interval that a packet is sent by the source core until it is received by the destination core. In CRWAMM, the optical routing paths and wavelengths are allocated periodically before communication. Thus, in the time period of a specific distribution of multiple multicasts, the average packet latency, denoted by D, can be calculated as

(1)$$D = D_{eo} + N_{hop}/V_{os} + D_{oe}+D_{p}+D_{c},$$

where $D_{eo}$ and $D_{oe}$ are the delays for E-O and O-E conversions respectively, $N_{hop}$ is the average length of optical routing paths in the hops of routers, and $V_{os}$ is the transmission speed of optical signal in a waveguide. $D_{p}$ is the processing delay for centralized routing and wavelength allocation scheme, and $D_{c}$ is the configuration delay. In this simulation, we compare CRWAMM with basic routing schemes (i.e., PB, TB), as well as ENoC with XYZ routing. To make fair comparison, we use the same parameter settings for all the schemes, which are summarized in Table 2.

Table 2. Latency Parameters

View Table | View all tables in this article

Figure 8 (a) shows the average latency for different network sizes. It can be seen that the average latency is similar for CRWAMM, PB, and TB. This is because the average latency in ONoC is mainly determined by the average length of routing paths from Eq. (1). Since the high speed of optical signals, e.g., $V_{os} = 8hops/cycle$ for $8\times 8$ mesh in a $20mm \times 20mm$ chip [28] with the system clock working at the frequency of 5GHz, the difference of latency between different routing schemes is very small. We notice that the latency in ENoC is much higher than other routing schemes in ONoC. This is because the ENoC has to route the packets hop by hop from the source to the destination in each communication, while the ONoC uses the pre-configured non-blocking optical paths between nodes.

Fig. 8. Performance comparison of different routing schemes. (a) latency under different network sizes. (b) power consumption

Download Full Size | PDF

7.3 Power consumption

The power consumption of multicast communication in 3D ONoC can be divided into two parts: electrical power consumption ($P_E$) and optical power consumption ($P_O$). The optical power consumption is provided by the laser source and power attenuation incurred by all the optical devices on routing paths, which can be computed as [29]:

(2)$$P_O = {\frac{1}{\eta}}\times N_{\lambda}\times 10^{P_{rs}}\times 10^{IL_{wil}}\times N_{mn},$$

where $\eta$ is the power efficiency of laser source, $N_\lambda$ is the number of wavelengths used, $P_{rs}$ is the sensitivity of photodetector, $N_{mn}$ is the number of multicast nodes, and $IL_{wil}$ is the worst-case insertion loss of optical devices (e.g., waveguides, MRs, and optical routers) on the optical routing path from the source to the destinations. The electronic power consumption mainly consists of the dynamic power for modulation (E-O conversion) and photodetection (O-E conversion), and the static power for thermally tuning the resonant wavelength of $MRs$. Thus, $P_E$ can be calculated as [29]:

(3)$$P_E \!=\!\! N_{MR}\times N_\lambda \times P_{MT} + (E_{EO}\!+\!E_{OE})\times B_o\times N_\lambda \times \sum \theta_i,$$

where $N_{MR}$ is the total number of $MRs$ used for each wavelength; $P_{MT}$ is the tuning power for each $MR$; $E_{EO}$ and $E_{OE}$ are the energy costs for modulation and photodetection, respectively; $B_o$ is the bandwidth of optical link for each wavelength; $\theta _i$ is the actual traffic load of an active core, $\theta _i\in [0, 1]$. Since the length of vertical links is significantly less than planar links (e.g., $1.25mm$ for planar links, $100\sim 200\mu m$ for vertical links) [30], the electrical power consumption of vertical links is very small compared to the planar links, As a result, we neglect it when we calculate the electrical power model. The typical power parameters for the optical devices are used to analyze the overall power consumption [31,32] (Table 3).

Figure 8 (b) shows the average power consumption for different routing schemes (i.e., PB, TB, CRWAMM) in 3D ONoC and XYZ routing in 3D ENoC in $4\times 4\times 4$ network. It can be seen that CRWAMM has the least overall power consumption compared with others. Specifically, the ENoC consumes much higher power because of the hop-by-hop routing and buffering in the electrical routers, especially when the distances between nodes are increased. For the ONoC with PB and TB routings, they use more wavelengths than CRWAMM; therefore, they lead to higher electrical power consumption for tuning the resonant wavelengths of MRs. According to Fig. 8 (b), the total power consumption of CRWAMM is about $390mw$, which is $24.66\%$ less than the other ONoC schemes.

Table 3. Power Consumption Parameters

View Table | View all tables in this article

8. Conclusion

In this paper, the routing and wavelength assignment problem for multiple multicasts in a mesh-based 3D ONoC has been studied. Firstly, six Dimension-Ordering Routings (DORs) for 3D ONoC were proposed, which were used as the foundation of routing design. Then, the distribution of multicast nodes was analysed in mesh-based 3D ONoC and six special distributions for multicast nodes were derived. Specifically, six routing theorems were proposed, which can establish non-overlapping routing paths for any given special distribution using only one wavelength. Based on these routing theorems, the method was extended for general distributions, where the distribution of multicast nodes is random. A Cluster-based Routing and Wavelength Assignment method for Multiple Multicasts (CRWAMM) was designed by decoupling all multicast nodes into a number of clusters, each of which satisfies the conditions of the corresponding routing theorem. The upper bound of the number of wavelengths was proved to be the number of clusters. Finally, extensive simulations were carried out, which showed that (1) CRWAMM outperforms other routing schemes in terms of the number of wavelengths required; (2) CRWAMM has the advantages of a low routing complexity, a low wavelength requirement and good scalability.

Disclosures

The authors declare no conflicts of interest.

Data availability

The datasets presented in this paper are available from the corresponding author upon reasonable request.

References

1. A. Shacham, K. Bergman, and L. P. Carloni, “Photonic Networks-on-Chip for future generations of chip multiprocessors,” IEEE Trans. Comput. 57(9), 1246–1260 (2008). [CrossRef]

2. V. F. Pavlidis, I. Savidis, and E. G. Friedman, Three-dimensional integrated circuit design (Newnes, 2017).

3. P. Abad, V. Puente, and J.-A. Gregorio, “MRR: Enabling fully adaptive multicast routing for CMP interconnection networks,” in High Performance Comput. Archit. (HPCA), (2009), pp. 355–366.

4. S. Abadal, A. Mestres, R. Martinez, E. Alarcon, and A. Cabellos-Aparicio, “Multicast on-chip traffic analysis targeting manycore noc design,” in PDP, (IEEE, 2015), pp. 370–378.

5. T. Krishna, L.-S. Peh, B. M. Beckmann, and S. K. Reinhardt, “Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication,” in MICRO, (ACM, 2011), pp. 71–82.

6. S. Abadal, R. Martínez, E. Alarcón, and A. Cabellos-Aparicio, “Scalability-oriented multicast traffic characterization,” in IEEE/ACM NoCS, (2014), pp. 180–181.

7. Z. Wang, H. Gu, Y. Yang, H. Zhang, and Y. Chen, “An adaptive partition-based multicast routing scheme for mesh-based networks-on-chip,” Comput. & Electr. Eng. 51, 235–251 (2016). [CrossRef]

8. F. A. Samman, T. Hollstein, and M. Glesner, “Adaptive and deadlock-free tree-based multicast routing for networks-on-chip,” IEEE Trans. VLSI Syst. 18(7), 1067–1080 (2010). [CrossRef]

9. H. A. Harutyunyan and S. Wang, “Two New Multicast Algorithms in 3D Mesh and Torus Networks,” in IEEE Int. Conf. HPCC, CSS, ICESS, (2014), pp. 1150–1157.

10. B. K. Joardar, K. Duraisamy, and P. P. Pande, “High performance collective communication-aware 3D Network-on-Chip architectures,” in Des. Autom.Test Eur. Conf. Exhib. (DATE), (IEEE, 2018), pp. 1351–1356.

11. Y. Ouyang, F. Tang, C. Hu, W. Zhou, and Q. Wang, “Mmnnn: A tree-based multicast mechanism for noc-based deep neural network accelerators,” Microprocessors and Microsystems p. 104242 (2021).

12. B. Tiwari, M. Yang, Y. Jiang, and X. Wang, “Efficient on-chip multicast routing based on dynamic partition merging,” in 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), (IEEE, 2020), pp. 274–281.

13. M. Ebrahimi, M. Daneshtalab, P. Liljeberg, J. Plosila, J. Flich, and H. Tenhunen, “Path-based partitioning methods for 3D Networks-on-Chip with minimal adaptive routing,” IEEE Trans. Comput. 63(3), 718–733 (2014). [CrossRef]

14. P. Bahrebar, A. Jalalvand, and D. Stroobandt, “Adaptive multicast routing method for 3D mesh-based Networks-on-Chip,” in 27th IEEE Int. System-on-Chip Conf. (SOCC), (2014), pp. 70–75.

15. E.-O. Amnah and W.-L. Zuo, “Hamiltonian paths for designing deadlock-free multicasting wormhole-routing algorithms in 3D meshes,” J. Appl. Sci. 7(22), 3410–3419 (2007). [CrossRef]

16. L. Wei and L. Zhou, “An equilibrium partitioning method for multicast traffic in 3D NoC architecture,” in IEEE/IFIP Int. Conf. VLSI, (2015), pp. 128–133.

17. R. Salamat, M. Khayambashi, M. Ebrahimi, and N. Bagherzadeh, “Lead: An adaptive 3d-noc routing algorithm with queuing-theory based analytical verification,” IEEE Trans. Comput. 67, 1 (2018). [CrossRef]

18. F. Dubois, A. Sheibanyrad, F. Petrot, and M. Bahmani, “Elevator-First: A Deadlock-Free Distributed Routing Algorithm for Vertically Partially Connected 3D-NoCs,” IEEE Trans. Comput. 62(3), 609–615 (2013). [CrossRef]

19. L. Gong, X. Zhou, X. Liu, W. Zhao, W. Lu, and Z. Zhu, “Efficient resource allocation for all-optical multicasting over spectrum-sliced elastic optical networks,” J. Opt. Commun. Netw. 5(8), 836–847 (2013). [CrossRef]

20. P. Lu, L. Zhang, X. Liu, J. Yao, and Z. Zhu, “Highly efficient data migration and backup for big data applications in elastic optical inter-data-center networks,” IEEE Network 29(5), 36–42 (2015). [CrossRef]

21. Z. Zhu, W. Lu, L. Zhang, and N. Ansari, “Dynamic service provisioning in elastic optical networks with hybrid single-/multi-path routing,” J. Lightwave Technol. 31(1), 15–22 (2013). [CrossRef]

22. M. Tarhani, S. Sarkar, M. K. Eghbal, and M. Shadaram, “Efficient multicasting technique for elastic optical network,” IET Networks (2021).

23. J. M. Monta nana, M. Koibuchi, H. Matsutani, and H. Amano, “Balanced dimension-order routing for k-ary n-cubes,” in Int. Conf. Parall. Processing Workshops (ICPPW’09)., (IEEE, 2009), pp. 499–506.

24. F. Liu, H. Zhang, Y. Chen, Z. Huang, and H. Gu, “Dynamic ring-based multicast with wavelength reuse for optical network on chips,” in IEEE 10th Int. Symp. Embedded Multicore/Many-core Systems-on-Chip (MCSOC), (2016), pp. 153–160.

25. C. Batten, A. Joshi, V. Stojanovć, and K. Asanović, “Designing chip-level nanophotonic interconnection networks,” in Integrated Optical Interconnect Architectures for Embedded Systems, (Springer, 2013), pp. 81–135.

26. W. Yang, Y. Chen, Z. Huang, and H. Zhang, “RWADMM: Routing and Wavelength Assignment for Distribution-Based Multiple Multicasts in ONoC,” in IEEE Int. Symp. ISPA/IUCC, (2017), pp. 550–557.

27. J. Hestness, B. Grot, and S. W. Keckler, “Netrace: dependency-driven trace-based Network-on-Chip simulation,” in Int. Workshop on Network on Chip Archit., (2010), pp. 31–36.

28. J. Liu, J. Yang, and R. Melhem, “Gasolin: global arbitration for streams of data in optical links,” in IEEE Int. Parallel and Distributed Processing Symp., (2015), pp. 93–102.

29. C. Li, M. Browning, P. V. Gratz, and S. Palermo, “Luminoc: A power-efficient, high-performance, photonic network-on-chip,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 33(6), 826–838 (2014). [CrossRef]

30. R. W. Morris, A. K. Kodi, A. Louri, and R. D. Whaley, “Three-dimensional stacked nanophotonic network-on-chip architecture with minimal reconfiguration,” IEEE Trans. Comput. 63(1), 243–255 (2014). [CrossRef]

31. J. Chan, G. Hendry, A. Biberman, K. Bergman, and L. P. Carloni, “Phoenixsim: A simulator for physical-layer analysis of chip-scale photonic interconnection networks,” in Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), (IEEE, 2010), pp. 691–696.

32. X. Zhang and A. Louri, “A multilayer nanophotonic interconnection network for on-chip many-core communications,” in Proceedings of the 47th Design Automation Conference, (2010), pp. 156–161.

(a) SD=0			(1) SD=1
SD	SI	Routing Selection	SD	SI	Routing Selection
0	0	XZY,XYZ	1	0	YZX,ZYX
0	1	YZX,YXZ	1	1	XZY,ZXY
0	2	ZYX,ZXY	1	2	XYZ,YXZ

Parameter	Value	Parameter	Value
Channel bandwidth	$10 G b p s / λ$	Clock frequency	$5 G H Z$
Transmission speed	$8 r o u t e r s / c y c l e$	Processing delay	$2 c y c l e s$
O-E conversion delay	$1 c y c l e$	E-O conversion delay	$1 c y c l e$

Parameter	Value	Parameter	Value
Modulator	85 $f J / b i t$	MR drop	0.5 $d B / M R$
Photodetector	50 $f J / b i t$	MR pass	0.005 $d B / M R$
Receiver sensitivity	-26 $d B m$	Thermal tuning	26 $μ W / M R$

(a) SD=0			(1) SD=1
SD	SI	Routing Selection	SD	SI	Routing Selection
0	0	XZY,XYZ	1	0	YZX,ZYX
0	1	YZX,YXZ	1	1	XZY,ZXY
0	2	ZYX,ZXY	1	2	XYZ,YXZ

Parameter	Value	Parameter	Value
Channel bandwidth	$10 G b p s / λ$	Clock frequency	$5 G H Z$
Transmission speed	$8 r o u t e r s / c y c l e$	Processing delay	$2 c y c l e s$
O-E conversion delay	$1 c y c l e$	E-O conversion delay	$1 c y c l e$

Cluster-based routing and wavelength assignment for multiple multicasts in 3D Optical Network-on-Chip

Abstract

1. Introduction

2. Related work

3. Basic routing schemes in 3D ONoC

3.1 Preliminaries

3.2 Dimension-Ordering Routing in 3D ONoC

3.3 Properties of the proposed DORs

3.4 Distribution of multicast nodes

4. Optimal routing and wavelength assignment of multiple multicasts for special instances in 3D ONoC

5. Heuristic routing and wavelength assignment of multiple multicasts for general instances in 3D ONoC

6. System implementation

6.1 Network architecture

6.1.1 Core plane

6.1.2 Optical control plane

6.1.3 Data transmission plane

6.2 Communication scheme

7. Performance evaluation

7.1 Wavelength usage

7.1.1 Simulation with synthetic traffic patterns

7.1.2 Simulation with real data traces

7.2 Packet latency

7.3 Power consumption

8. Conclusion

Disclosures

Data availability

References

Data availability

Cited By

Figures (8)

Tables (3)

Equations (3)

Optics Express