## Abstract

We numerically and experimentally demonstrate the utilization of the synchronization of chaotic lasers for decision making. We perform decision making to solve the multi-armed bandit problem using lag synchronization of chaos in mutually coupled semiconductor lasers. We observe the spontaneous exchanges of the leader-laggard relationship under lag synchronization of chaos, and we find that the leader laser can be controlled by changing the coupling strengths between the two lasers. To solve the multi-armed bandit problem, we select one of the slot machines with unknown hit probabilities based only on the identity of the leader laser while reconfiguring the coupling strength to determine the correct decision. We successfully perform an on-line experimental demonstration of the decision making based on the two-laser coupled architecture. This is the first time that synchronization in chaotic lasers is utilized for decision making, and this study paves the way for novel resources for future photonic intelligence.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Reinforcement learning [1] has been utilized for a variety of applications ranging from robotics [2] to parameter searching for neural networks [3], and even gaming [4,5]. Reinforcement learning is a type of artificial intelligence that seeks to maximize the total reward without using training data. One of the fundamental problems in reinforcement learning is the multi-armed bandit (MAB) problem [6]. In the MAB problem, it is assumed that a player selects one slot machine from among multiple slot machines with unknown hit probabilities to maximize the total reward (coins) from the slot machines. The player does not know which slot machine has the maximum hit probability; therefore, the player randomly plays multiple slot machines at the beginning and estimates the slot machine with the maximum hit probability. A certain number of plays is required to search for the slot machine with the maximum hit probability (exploration). However, too many plays reduce the chances of the plays for the slot machine with the maximum hit probability (exploitation). Therefore, it is important to set a balance between the two operations, which is known as the trade-off of the exploration-exploitation dilemma in the MAB problem.

Physical implementations for solving the MAB problem have been demonstrated in photonic systems using quantum dots at the nanoscale [7,8] and single photons [9]. However, the speed of the fluctuations is limited to the order of Hertz by the experimental measurement systems. Recently, decision making using the chaotic temporal waveform in semiconductor lasers was reported [10,11]. Chaotic laser waveforms in semiconductor lasers [12] have been used for physical random number generation [13–15], secure key distribution [16–18], and reservoir computing [19–21], based on the fast and complex temporal waveforms of chaotic lasers. For the implementation of chaotic temporal waveforms [10,11], one of the two slot machines is selected by comparing the chaotic temporal waveform with a variable threshold. The threshold is adjusted based on the reward of the selected slot machine. It has been demonstrated that the decision making between the two slot machines is, in principle, achieved at a rate of 1 GHz [10,11]. It has also been shown that the sampling interval of the temporal waveforms with a negative correlation improves the performance of adaptive decision making. In addition, scalable decision making has been proposed by introducing time-division multiplexing of chaotic temporal waveforms and combining multiple thresholds in a tree structure [22].

However, the physical dynamics of a semiconductor laser has not been used directly to solve the bandit problem in these schemes because the threshold adjustment for the decision-making algorithm is achieved using software. Very recently, Homma et al. proposed the use of spontaneous mode switching in a micro-ring laser for decision making [23], which addresses the issue of utilizing the intrinsic physical attributes of lasers directly for decision making.

From the dynamics point of view, lag synchronization of chaos has been reported in mutually coupled semiconductor lasers [24–31]. Lag synchronization of chaos has been observed in the low-frequency fluctuation (LFF) regime. LFFs consist of sudden power dropouts and the subsequent gradual power recovery of the laser intensity [12,24]. In the LFF regime, one laser is synchronized with the other laser with a time lag equal to the coupling delay time. This phenomenon is called the leader-laggard relationship [24], where the laser oscillating in advance is called the “leader”, and the other is called the “laggard.” In addition, spontaneous exchange of the leader-laggard relationship has been reported for each coupling delay time, and it can be controlled by adjusting the detuning of the initial optical frequencies [31].

This spontaneous dynamical phenomenon can be applied for solving the MAB problem using photonic systems. In particular, it is expected that the spontaneous exchange of the leader laser at each propagation delay time could be used to select one of the slot machines alternately for the purpose of efficient exploration for decision making. We expect that the implementation using photonic dynamical systems could result in fast and efficient decision-making systems.

In this study, we demonstrate the utilization of the synchronization of chaotic lasers to perform decision making. Then, we numerically and experimentally demonstrate a scheme for solving the multi-armed bandit problem in decision making using the lag synchronization of chaos in mutually coupled semiconductor lasers. We numerically investigate the spontaneous exchange of the leader-laggard relationship and control the leader-laggard relationship by changing the coupling strengths between the two lasers. We then apply this physical principle to solve the two-armed bandit problem by changing the detuning of the coupling strengths and examining the detailed parameter dependencies on the decision-making performance. We experimentally implement this principle in a two-laser coupled system, in which the two-armed bandit problem is successfully resolved.

## 2. Numerical simulations

#### 2.1 Numerical model

Figure 1 shows our numerical model of mutually coupled semiconductor lasers. Two semiconductor lasers are mutually coupled by the coupling strengths *κ*_{1} (from laser 1 to laser 2) and *κ*_{2} (from laser 2 to laser 1) with a coupling delay time *τ*. The model of mutually coupled semiconductor lasers is described by the Lang-Kobayashi equations as follows [31,32].

Laser 1:

*E*_{1,2}(*t*) and *N*_{1,2}(*t*) are the complex electric-field amplitude and carrier density of the semiconductor lasers, respectively. The subscripts 1 and 2 represent lasers 1 and 2, respectively. We set the coupling strengths for both lasers to 30 ns^{−1} (*κ*_{1} = *κ*_{2} = 30 ns^{−1}). The initial optical frequency detuning Δ*f*_{ini} = (*ω*_{1} − *ω*_{2}) / (2*π*) is set to 0 Hz in order to maintain the symmetry between lasers 1 and 2. The numerical results for the cases of non-zero frequency detuning are summarized in the Appendix. The coupling delay time between the two lasers is set to *τ* = 5.0 ns. The other parameter values are summarized in Table 1.

We observe the lag synchronization of chaos and spontaneous exchange of leader-laggard relationship from the temporal waveforms of laser outputs. Figure 2(a) shows the temporal waveforms of the two laser outputs. The red and blue curves indicate the laser intensities *I*_{1,2}(*t*) of laser 1 and 2, given by *I*_{1,2}(*t*) = |*E*_{1,2}(*t*)|^{2}, respectively. Fast irregular oscillations are observed for both lasers. The temporal waveforms filtered by a low-pass filter to remove the high-frequency components are shown in Fig. 2(b). The cut-off frequency of the low-pass filter is set to 60 MHz. In Fig. 2(b), the low-pass-filtered laser intensities show the abrupt dropouts and the subsequent gradual recovery. Therefore, the two laser outputs oscillate in the LFF regime [12,24]. The dropouts of the two laser outputs are synchronized with the coupling delay time *τ*. Thus, we found that lag synchronization of chaos is observed in mutually coupled semiconductor lasers.

To determine the leader laser in a short-term interval, we calculate two short-term cross-correlation values [31]. The two short-term cross-correlation values *C*_{1}(*t*) and *C*_{2}(*t*) are defined as follows.

*C*_{1}(*t*) is the cross-correlation between the intensity of time-delayed laser 1 *I*_{1}(*t* − *τ*) and laser 2 *I*_{2}(*t*) for the coupling delay time *τ*. *C*_{2}(*t*) is the cross-correlation between the intensity of laser 1 *I*_{1}(*t*) and time-delayed laser 2 *I*_{2}(*t* − *τ*) for *τ*. $\langle \cdot \rangle_\tau$ represents the short-term average over the period *τ*, i.e., $\langle f \rangle_\tau = (1/T) \sum_{j=0}^{T-1} f(t - jh)$ for the function *f*, where *h* is the sampling interval of the laser intensity, and *T* = *τ* / *h* is the number of sampled data within *τ*. When *I*_{1}(*t* − *τ*) and *I*_{2}(*t*) are synchronized well, we consider that laser 1 is the leader, and *C*_{1}(*t*) is larger than *C*_{2}(*t*). On the contrary, when *I*_{1}(*t*) and *I*_{2}(*t* − *τ*) are synchronized, we consider that laser 2 is the leader, and *C*_{2}(*t*) is larger than *C*_{1}(*t*). Therefore, the leader-laggard relationship can be determined by comparing *C*_{1}(*t*) with *C*_{2}(*t*) [31].
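The leader determination described above can be sketched in a few lines. This is only an illustration: the Pearson-type normalization, the function names `short_term_corr` and `leader_at`, and the synthetic test signal are assumptions, not the exact expressions for *C*_{1}(*t*) and *C*_{2}(*t*) used in this work.

```python
import numpy as np

def short_term_corr(x, y):
    """Pearson correlation of two equal-length windows (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    return float((dx * dy).sum() / denom) if denom > 0 else 0.0

def leader_at(i1, i2, n, T):
    """Return (leader, C1, C2) at sample index n.

    i1, i2 : sampled intensities I_1, I_2 with sampling interval h
    T      : number of samples in one coupling delay, T = tau / h
    Requires n >= 2*T - 1 so that the delayed window exists.
    """
    now = slice(n - T + 1, n + 1)               # samples t-(T-1)h ... t
    delayed = slice(n - 2 * T + 1, n - T + 1)   # the same window shifted back by tau
    c1 = short_term_corr(i1[delayed], i2[now])  # I_1(t - tau) against I_2(t)
    c2 = short_term_corr(i1[now], i2[delayed])  # I_1(t) against I_2(t - tau)
    return (1 if c1 > c2 else 2), c1, c2

# Synthetic check (hypothetical data): laser 2 repeats laser 1 after a lag of T samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(400)
T = 50
i1, i2 = s[T:], s[:-T]   # I_2(t) = I_1(t - tau), so laser 1 should be the leader
leader, c1, c2 = leader_at(i1, i2, n=200, T=T)
```

In this toy signal the delayed copy of laser 1 matches laser 2 exactly, so *C*_{1} is close to 1 and laser 1 is reported as the leader.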

Figure 3(a) shows the temporal evolution of *C*_{1}(*t*) and *C*_{2}(*t*), calculated from the temporal waveforms in Fig. 2(a). In Fig. 3(a), the values of *C*_{1}(*t*) and *C*_{2}(*t*) are exchanged alternately after the occurrence of the intensity dropouts and the subsequent gradual recovery. The leader laser is exchanged spontaneously between lasers 1 and 2. Figure 3(b) shows an enlarged view of the low-pass-filtered temporal waveform (upper curves) and the temporal evolution of *C*_{1}(*t*) and *C*_{2}(*t*) (lower curves). It is clearly seen that the values of *C*_{1}(*t*) and *C*_{2}(*t*) are exchanged alternately for each coupling delay time *τ* = 5.0 ns. The spontaneous exchange of the leader-laggard relationship is confirmed in the numerical simulation.

We investigate the probability of becoming the leader laser, which is calculated from the probability of the time duration for which one of the lasers is the leader by comparing *C*_{1}(*t*) with *C*_{2}(*t*). We introduce the detuning of the coupling strengths Δ*κ* = *κ*_{1} − *κ*_{2}, and the coupling strengths *κ*_{1} and *κ*_{2} are changed using Δ*κ* as follows.

*κ*_{max} is the maximum coupling strength (*κ*_{max} = 30 ns^{−1}). One of the coupling strengths is fixed at the maximum value, and the other is decreased to obtain coupling strengths as large as possible between the two lasers, as implemented in the experiment (see Section 3). For example, *κ*_{1} is fixed at *κ*_{max}, and *κ*_{2} is decreased for a positive Δ*κ*. In contrast, *κ*_{2} is fixed at *κ*_{max}, and *κ*_{1} is decreased for a negative Δ*κ*. Both *κ*_{1} and *κ*_{2} are fixed at *κ*_{max} for Δ*κ* = 0.
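The rule described above (one coupling strength held at *κ*_{max}, the other reduced so that *κ*_{1} − *κ*_{2} = Δ*κ*) can be written as a small helper. The function name and the range check are illustrative assumptions.

```python
def coupling_strengths(delta_kappa, kappa_max=30.0):
    """Return (kappa_1, kappa_2) in ns^-1 for a detuning delta_kappa = kappa_1 - kappa_2."""
    if abs(delta_kappa) > kappa_max:
        # Assumption: coupling strengths cannot become negative.
        raise ValueError("|delta_kappa| must not exceed kappa_max")
    if delta_kappa >= 0:
        return kappa_max, kappa_max - delta_kappa   # kappa_1 fixed, kappa_2 reduced
    return kappa_max + delta_kappa, kappa_max       # kappa_2 fixed, kappa_1 reduced
```

For instance, Δ*κ* = 10 ns^{−1} gives (*κ*_{1}, *κ*_{2}) = (30, 20) ns^{−1}, and Δ*κ* = −10 ns^{−1} gives (20, 30) ns^{−1}.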

We observe the temporal waveforms of the two laser outputs for 10 *μ*s for a fixed Δ*κ* and calculate the probability of becoming the leader laser by comparing *C*_{1}(*t*) and *C*_{2}(*t*). The probabilities of becoming the leader laser for lasers 1 and 2 are defined as *L*_{1} = *T*_{1} / *T*_{total} and *L*_{2} = *T*_{2} / *T*_{total}, respectively, where *T*_{1} and *T*_{2} are the time durations for becoming the leader laser for lasers 1 and 2, respectively, and *T*_{total} is the total time duration (10 *μ*s) for the calculation of *C*_{1}(*t*) and *C*_{2}(*t*).
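On a uniformly sampled time grid, the ratios *T*_{1} / *T*_{total} and *T*_{2} / *T*_{total} reduce to the fraction of samples at which each correlation is the larger one. A minimal sketch follows; the function name and the handling of ties are assumptions.

```python
import numpy as np

def leader_probabilities(c1, c2):
    """Return (L1, L2) from sampled short-term correlations C1(t) and C2(t)."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    l1 = float(np.mean(c1 > c2))   # T_1 / T_total on a uniform time grid
    return l1, 1.0 - l1            # ties are counted for laser 2 (assumption)
```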

Figure 4 shows the probabilities of becoming the leader laser for *L*_{1} and *L*_{2} when the detuning of the coupling strength Δ*κ* is adjusted. The probability of the leader laser is 0.5 when Δ*κ* = 0 for both lasers. Laser 1 becomes the leader with a higher probability when Δ*κ* is positive (*L*_{1} > *L*_{2}), and *L*_{1} is close to 1 for a large positive Δ*κ*. On the contrary, laser 2 becomes the leader with a higher probability when Δ*κ* is negative (*L*_{1} < *L*_{2}), and *L*_{2} is close to 1 for a negative Δ*κ* of large magnitude. The two probabilities *L*_{1} and *L*_{2} are symmetric for the change in Δ*κ*. Asymmetric curves of the leader probabilities can be observed in the cases of non-zero values of the initial optical frequency detuning Δ*f*_{ini}, as described in the Appendix. We found that *L*_{1} and *L*_{2} can be controlled by changing Δ*κ*. This characteristic can be applied for decision making, as described in the next section.

#### 2.2 Decision making for the multi-armed bandit problem

We propose a decision making method for the multi-armed bandit problem using the spontaneous exchange of the leader-laggard relationship. Here, we consider the decision making of two slot machines with unknown hit probabilities. We consider the situation in which one of the slot machines is selected, and the slot machine returns the result of “hit” or “miss”. The decision is made to select the slot machine with the highest hit probability, based on the past results of the slot machine selection.

Figure 5 shows our decision making scheme based on the lag synchronization of chaos. First, two coupled lasers (lasers 1 and 2) are assigned to two slot machines (slot machines 1 and 2, denoted as *S*_{1} and *S*_{2}), respectively. The spontaneous exchange of the leader-laggard relationship is observed in the two coupled lasers, and one of the slot machines corresponding to the leader laser is selected. For example, *S*_{1} is selected if laser 1 is the leader, and *S*_{2} is selected if laser 2 is the leader. Next, the detuning of the coupling strengths Δ*κ* is changed based on the result of the slot machine selection. For example, if *S*_{1} is selected and the result of *S*_{1} is a “hit” (or “miss”), Δ*κ* is changed in the positive (or negative) direction, such that the probability of becoming the leader for laser 1 *L*_{1} can be increased (or decreased), as shown in Fig. 4. On the contrary, if *S*_{2} is selected and the result of *S*_{2} is “hit” (or “miss”), Δ*κ* is changed in the negative (or positive) direction, such that *L*_{2} can be increased (or decreased). This procedure is repeated until *L*_{1} or *L*_{2} reaches 1 for the final decision.

The detuning of coupling strengths Δ*κ*(*t*) for decision making is changed as follows.

*k* is the step width, and *N* is the number of step levels for positive (or negative) Δ*κ*(*t*) [9]. The total number of step levels is 2*N* + 1. *TA*(*t*) is the threshold adjuster value [33,34], which is defined as

where *a* is the memory parameter [11]. *X*(*t*) is defined from the result of the selected slot machine, as shown in Table 2. In Table 2, Δ and Ω are the shifts in the detuning of the coupling strengths in the cases of “hit” and “miss”, respectively, and they are defined as

where $\bar{P}_{S1}$ = *N*_{S1,hit} / *N*_{S1,total} and $\bar{P}_{S2}$ = *N*_{S2,hit} / *N*_{S2,total} are the estimated hit probabilities of *S*_{1} and *S*_{2}, respectively, while *N*_{Si,hit} is the number of “hits” for slot machine *i* and *N*_{Si,total} is the total number of selections for slot machine *i*. Note that the expressions of Δ and Ω are modified from those in the previous method [10,11] to avoid values of Ω that are too large or small; Ω = ($\bar{P}_{S1}$ + $\bar{P}_{S2}$) / {2 − ($\bar{P}_{S1}$ + $\bar{P}_{S2}$)} and Δ = 1 were used in [10,11]. In addition, zero-prior-knowledge is assumed in this work (i.e., *P*_{S1} + *P*_{S2} is unknown in advance [10]), which provides a more difficult situation to solve.

The procedure of decision making is repeated for each sampling interval *τ*_{SI} by selecting one of the slot machines based on the comparison of *C*_{1}(*t*) and *C*_{2}(*t*), and by changing the detuning of the coupling strengths Δ*κ*(*t*) based on the result of the slot machine selection. We start this procedure when the intensity dropout is observed in one of the coupled lasers in the LFF regime.
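One play of this update can be sketched as follows. The sketch is not the exact rule of this work: it assumes a recursion of the form *TA*(*t* + 1) = *X*(*t*) + *a* · *TA*(*t*), a rounded and clipped mapping from *TA* to Δ*κ*, and the shifts Δ = 2 − ($\bar{P}_{S1}$ + $\bar{P}_{S2}$) and Ω = $\bar{P}_{S1}$ + $\bar{P}_{S2}$, which keep the ratio Ω / Δ of the earlier method. All function names and default values are hypothetical. Only the sign convention (a hit on *S*_{1} or a miss on *S*_{2} moves Δ*κ* in the positive direction) is taken directly from the description above.

```python
def shifts(hits, plays):
    """Assumed shifts (Delta, Omega) from the estimated hit probabilities."""
    p1 = hits[0] / plays[0] if plays[0] else 0.5   # 0.5 before any play (assumption)
    p2 = hits[1] / plays[1] if plays[1] else 0.5
    return 2.0 - (p1 + p2), p1 + p2

def x_value(selected, hit, delta, omega):
    """Signed shift X(t): a hit favours the selected machine, a miss the other one."""
    sign = 1.0 if selected == 1 else -1.0          # S_1 corresponds to positive delta_kappa
    return sign * delta if hit else -sign * omega

def update(ta, x, a=1.0):
    """Assumed threshold-adjuster recursion TA(t+1) = X(t) + a * TA(t)."""
    return x + a * ta

def detuning(ta, k, n_levels):
    """Quantize TA to one of 2N+1 levels and scale by the step width k."""
    level = max(-n_levels, min(n_levels, round(ta)))
    return k * level
```

With equal weights for hit and miss (Δ = Ω = 1), a hit on *S*_{1} and a miss on *S*_{2} both shift *TA* by +1, which raises the leader probability of laser 1.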

We investigate the decision-making using lag synchronization of chaos in the LFF regime. We set the parameter values for decision-making as shown in Table 3. Figure 6 shows the short-term cross-correlation value, the results of the slot machine selection, and the estimated hit probabilities for 50 plays of selecting the slot machines. The hit probabilities of slot machines 1 and 2 are set to *P*_{S1} = 0.4 and *P*_{S2} = 0.6 (unknown to the player), respectively. In Fig. 6(a), the values of the short-term cross-correlation are exchanged for each 5 ns of the coupling delay time *τ* after the intensity dropout. One of the slot machines is selected for approximately five consecutive plays, and the other slot machine is selected for the next five plays, as shown in Fig. 6(b), because the sampling interval *τ*_{SI} for decision making is set to 1 ns in this case. In Fig. 6(c), the estimated hit probabilities $\bar{P}_{S1}$ and $\bar{P}_{S2}$ converge close to the ideal hit probabilities *P*_{S1} = 0.4 and *P*_{S2} = 0.6, respectively. After the estimation of $\bar{P}_{S1}$ and $\bar{P}_{S2}$, *S*_{2} is selected for all the plays in Fig. 6(b), and the correct decision is made successfully (*P*_{S2} > *P*_{S1}). Therefore, the spontaneous exchange of the leader-laggard relationship can be used for the exploration of decision making.

We evaluate the correct decision rate (CDR) of the slot machine selection. CDR is defined as the ratio of the number of plays in selecting the slot machine with the highest hit probability to the total number of plays in selecting the slot machines. CDR is calculated as follows.

*C*(*i*, *t*) gives 1 if the slot machine with the highest hit probability is selected at the *t*-th play in the *i*-th cycle, or 0 otherwise. The total number of cycles is given by *n*. Faster convergence of CDR close to 1 indicates better decision-making performance.
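Read as an average of *C*(*i*, *t*) over the *n* cycles at each play *t*, the CDR can be computed from a table of selections. The sketch below assumes this per-play averaging and a hypothetical array layout (one row per cycle, one column per play).

```python
import numpy as np

def correct_decision_rate(selections, best):
    """CDR(t): fraction of cycles in which the best machine is selected at play t.

    selections : array of shape (n cycles, m plays) holding the selected machine index
    best       : index of the slot machine with the highest hit probability
    """
    selections = np.asarray(selections)
    return (selections == best).mean(axis=0)

# Hypothetical example: 4 cycles of 3 plays, slot machine 2 is the better one.
cdr = correct_decision_rate([[1, 2, 2], [2, 2, 2], [1, 1, 2], [2, 2, 2]], best=2)
```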

We select the slot machines for 1000 plays (*m* = 1000) and consider this procedure as one cycle. We then repeat playing the slot machines for 100 cycles (*n* = 100) to calculate the CDR. We set different combinations for the hit probabilities: {*P*_{S1}, *P*_{S2}} = {0.2, 0.8}, {0.3, 0.7}, and {0.4, 0.6}. These parameter values are summarized in Table 4. The other parameter values are the same as in Table 3.

Figure 7 shows the CDR as the number of plays is changed for different hit probabilities. In Fig. 7, the CDR for the pair of hit probabilities {*P*_{S1}, *P*_{S2}} = {0.2, 0.8} approaches 1 more quickly than in the cases of {*P*_{S1}, *P*_{S2}} = {0.3, 0.7} and {0.4, 0.6}. The convergence of CDR is affected by the difficulty of the decision-making problem, which depends on the difference between the two hit probabilities. In fact, a smaller difference between *P*_{S1} and *P*_{S2} requires a larger number of plays to estimate the correct hit probabilities, and thus for the CDR to converge to 1. These results numerically demonstrate decision making using the lag synchronization of chaos in mutually coupled semiconductor lasers.

We estimate the speed of decision making as follows. The sampling interval is set to 1.0 ns, and the frequency of selecting each slot machine is 1.0 GHz. The correct decision can be made after 26 plays (CDR = 1) in the case of {*P*_{S1}, *P*_{S2}} = {0.2, 0.8}, as shown in Fig. 7. Therefore, the correct decision time is 26 ns, and the speed of correct decision making is 38 MHz. This speed depends on the combination of the hit probabilities {*P*_{S1}, *P*_{S2}}, i.e., a larger difference between *P*_{S1} and *P*_{S2} corresponds to faster decision making, as shown in Fig. 7. The speed of decision making is not as fast as the case in which chaotic temporal waveforms are used [10,11]. However, we show that the synchronization of chaotic dynamics can be utilized for decision making on the order of tens of MHz. The speed can be improved by reducing the coupling delay time *τ* and the sampling interval *τ*_{SI} (see Fig. 9 in Section 2.3).
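The speed estimate above is a short piece of arithmetic, reproduced here for convenience. The variable names are illustrative.

```python
sampling_interval_ns = 1.0   # tau_SI, one play per sampling interval
plays_to_decide = 26         # plays until CDR = 1 for hit probabilities {0.2, 0.8}

decision_time_ns = plays_to_decide * sampling_interval_ns
decision_rate_mhz = 1e3 / decision_time_ns   # 1 / (1 ns) = 1 GHz = 1000 MHz
```

The rate evaluates to 1000 / 26 MHz, which is the quoted 38 MHz after rounding down to two significant figures.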

#### 2.3 Parameter dependence

We investigate the parameter dependence of decision making in this subsection. First, we change the difference between the two hit probabilities. We change *P*_{S1} from 0.1 to 0.9 and set *P*_{S2} = 1 − *P*_{S1} to calculate the CDR. Figure 8(a) shows the CDR at *t* = 1000 for different values of *P*_{S1}. When *P*_{S1} is far from 0.5, the CDR converges to 1. However, the CDR decreases as *P*_{S1} becomes close to 0.5. This result indicates that the decision making becomes more difficult for smaller differences between *P*_{S1} and *P*_{S2}.

We also investigate the average hit rate (AHR), which is defined as follows.

where *H*_{i}(*t*) = 1 if the selected slot machine is “hit”, whereas *H*_{i}(*t*) = 0 if the selected slot machine is “miss” at the *t*-th play and the *i*-th cycle. The AHR represents the total reward acquisition rate, which is defined as the ratio of the number of hits to the total number of plays and cycles.
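Following the verbal definition (number of hits divided by the total number of plays over all cycles), the AHR is the mean of *H*_{i}(*t*) over both indices. A minimal sketch, with a hypothetical array layout of one row per cycle and one column per play:

```python
import numpy as np

def average_hit_rate(hits):
    """AHR: mean of the 0/1 results H_i(t) over all n cycles and m plays."""
    hits = np.asarray(hits, dtype=float)
    return float(hits.mean())

# Hypothetical example: 2 cycles of 4 plays with 5 hits in total.
ahr = average_hit_rate([[1, 0, 1, 1], [0, 0, 1, 1]])
```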

Figure 8(b) shows the AHR for different values of *P*_{S1}. The AHR decreases as *P*_{S1} is increased toward 0.5, where the minimum value is obtained, and then increases as *P*_{S1} is increased further. In fact, the AHR is close to the maximum value of the two hit probabilities, i.e., max(*P*_{S1}, 1 − *P*_{S1}). Thus, we found that decision making is successful for different combinations of *P*_{S1} and *P*_{S2}.

Next, we change the sampling interval *τ*_{SI} and investigate the AHR. The ratio between the coupling delay time *τ* and *τ*_{SI} (i.e., *τ* / *τ*_{SI}) determines the number of consecutive explorations for one of the slot machines (see Fig. 6(b)). Faster decision making can be achieved for smaller *τ*_{SI}; however, a smaller *τ*_{SI} forces too many consecutive explorations of only one of the slot machines.
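The role of the ratio *τ* / *τ*_{SI} can be illustrated with an idealized selection sequence in which the leader alternates exactly once per coupling delay time. Real exchanges are irregular and are biased by Δ*κ*, so this is only a sketch; the function name and arguments are hypothetical.

```python
def selection_sequence(tau_ns, tau_si_ns, n_plays, first_leader=1):
    """Idealized selections when the leader alternates exactly every tau."""
    seq = []
    for j in range(n_plays):
        interval = int((j * tau_si_ns) // tau_ns)   # index of the current delay interval
        leader = first_leader if interval % 2 == 0 else 3 - first_leader
        seq.append(leader)
    return seq

# tau = 5 ns and tau_SI = 1 ns: five consecutive plays per slot machine.
seq = selection_sequence(5.0, 1.0, 12)
```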

We calculate the AHR as a function of the sampling interval *τ*_{SI} for different coupling delay times of *τ* = 2, 5, and 20 ns, as shown in Fig. 9(a). We set {*P*_{S1}, *P*_{S2}} = {0.4, 0.6}, and the ideal AHR is 0.6 in this case (the dotted line). We found that a high AHR is obtained for a large *τ*_{SI}. However, the AHR decreases for smaller values of *τ*_{SI}. This result indicates that too many explorations of one of the slot machines deteriorate the AHR. Comparing the results for different values of *τ*, the AHR is saturated near 0.6 for all *τ* when *τ*_{SI} is large. On the contrary, a larger AHR is obtained for smaller *τ* when *τ*_{SI} is small. Therefore, faster decision making can be achieved for smaller *τ*.

Figure 9(b) shows the AHR as a function of the ratio *τ* / *τ*_{SI} for different values of *τ*. The ratio *τ* / *τ*_{SI} approximately corresponds to the number of consecutive selections of the same slot machine during the delay time *τ*. In contrast to Fig. 9(a), all three curves overlap, and no significant difference is found among them. The AHR becomes saturated close to the maximum hit probability (∼0.6) when the ratio *τ* / *τ*_{SI} is smaller than ∼10. This indicates that one of the slot machines is selected for about ten consecutive plays for *τ* / *τ*_{SI} = 10. Therefore, the ratio *τ* / *τ*_{SI} is a crucial parameter to determine the number of times the same slot machine is selected during one delay time.

## 3. Experimental results

#### 3.1 Experimental setup

To confirm our numerical investigation, we performed a decision-making experiment using the lag synchronization of chaos in mutually coupled semiconductor lasers. Figure 10(a) schematically shows our experimental setup. We used two distributed-feedback (DFB) semiconductor lasers (NTT Electronics, KELK1C5GAAA, wavelength of 1547 nm), referred to as laser 1 and laser 2. The two lasers were mutually coupled via unidirectional coupling from one laser to the other through optical isolators. A voltage-controlled variable attenuator (Thorlabs, V1550PA) was inserted into each unidirectional coupling path to adjust the coupling strength from one laser to the other laser independently. A manual variable attenuator was also inserted into each coupling path to adjust the coupling strengths symmetrically. The outputs of the two lasers were detected using photodetectors (New Focus, 1554-B, 12 GHz bandwidth) and amplified by electric amplifiers (New Focus, 1422-LF, 20 GHz bandwidth). The temporal waveforms of the laser outputs were measured using a digital oscilloscope (Tektronix, DPO72304DX, 23 GHz bandwidth, 100 GigaSamples/s). The radio frequency (RF) spectra were measured using an RF spectrum analyzer (Agilent, N9010A-544, 44 GHz bandwidth). The optical spectra were measured using an optical spectrum analyzer (Yokogawa, AQ6370B).

The injection currents of lasers 1 and 2 were set to 9.37 mA (1.1 *I*_{th,1}) and 8.47 mA (1.1 *I*_{th,2}), respectively, where *I*_{th,i} was the injection current at the lasing threshold for laser *i*. The coupling delay times from laser 1 to 2 and from laser 2 to 1 were 45.22 ns and 43.79 ns, respectively. We considered the average of these two values as the coupling delay time *τ* = 44.50 ns. The maximum injection powers from laser 1 to 2 and from laser 2 to 1 were 39.22 *μ*W and 30.54 *μ*W, respectively. We changed the detuning of the coupling strength Δ*κ* = *κ*_{1} − *κ*_{2} for decision making, where *κ*_{1} and *κ*_{2} were the coupling strengths normalized by the maximum injection power for each coupling path (i.e., 0 ≤ *κ*_{1}, *κ*_{2} ≤ 1). The coupling strengths were changed as shown in Eqs. (9) and (10) in Section 2.1, where one of the coupling strengths was fixed at the maximum value, and the other was decreased to obtain coupling strengths as large as possible in the experiment.

Figure 10(b) shows our experimental decision-making system (see also Fig. 5). We emulated two slot machines in the embedded computer in the digital oscilloscope with pseudo-random numbers as a software implementation. The temporal waveforms of the two laser outputs were detected and stored in the oscilloscope. The short-term cross-correlation between the two temporal waveforms was calculated, and the leader laser was determined. One of the slot machines was selected based on the leader laser, e.g., slot machine 1 was selected if laser 1 was the leader; otherwise, slot machine 2 was selected. After the selection, the result of playing the slot machine (“hit” or “miss”) was sent to the decision-making algorithm, and the value of the coupling strengths was updated via the bias controller (Tektronix, AFG3152C). For example, if the result of slot machine *S*_{i} was hit, the coupling strength was changed so that the probability of leader laser *L*_{i} could be increased. In contrast, if the result of *S*_{i} was miss, the coupling strength was changed so that *L*_{i} could be decreased. This procedure was repeated until either *L*_{1} or *L*_{2} converged to 1, and the decision was made regarding which slot machine had the highest hit probability. All the procedures were implemented in an on-line manner in this experiment on the decision-making system.

#### 3.2 Experimental observation of the lag synchronization of chaos

We first investigated the lag synchronization of chaos in the experiment. Figures 11(a) and 11(b) show the optical spectra of the two mutually coupled lasers without and with optical coupling, respectively. We precisely matched the optical wavelengths of the two lasers without coupling at 1547.098 nm (Δ*f*_{ini} = 0 Hz), as shown in Fig. 11(a). The two peaks in the optical spectra appear after the coupling between the two lasers, as shown in Fig. 11(b), and the wavelength of the maximum peak is shifted to 1547.219 nm. The optical spectra are well matched between the two lasers after the coupling because injection locking is achieved between the two lasers.

Figures 11(c) and 11(d) show the RF spectra of the two lasers without and with optical coupling, respectively. In Fig. 11(c), the peak of the relaxation oscillation frequency appears at 1.24 GHz for both lasers. After the coupling, broad spectral components with a peak frequency at around 4 GHz are observed in Fig. 11(d), which indicates the appearance of chaotic dynamics. Both the RF spectra are well matched with each other.

Figure 12 shows the experimentally obtained temporal waveforms of the two laser outputs when the detuning of the coupling strengths Δ*κ* = *κ*_{1} – *κ*_{2} is adjusted. When the coupling strength is very asymmetric (Δ*κ* = −0.90), chaotic oscillations are observed for both lasers, as shown in Fig. 12(a). As the coupling strengths become symmetric (Δ*κ* = −0.15), the dynamics switches from chaos to LFFs, as shown in Fig. 12(b). The LFF dynamics is observed near the symmetric condition of the two coupling strengths (Δ*κ* = 0.00) in Fig. 12(c). As Δ*κ* is increased further, the LFF dynamics (Δ*κ* = 0.15, Fig. 12(d)) and the chaotic oscillations (Δ*κ* = 0.90, Fig. 12(e)) are observed again for asymmetric coupling.

Figure 13 shows the short-term cross-correlations *C*_{1}(*t*) and *C*_{2}(*t*) for different coupling strengths Δ*κ*. The values of the coupling strengths correspond to each subfigure in Fig. 12. In Fig. 13(a), *C*_{2}(*t*) is always larger than *C*_{1}(*t*), and laser 2 is always the leader for Δ*κ* = −0.90. As Δ*κ* increases, the LFF dynamics appears, and the spontaneous exchange of *C*_{1}(*t*) and *C*_{2}(*t*) is observed at the intensity dropouts and the subsequent power recovery processes for Δ*κ* = −0.15, as shown in Fig. 13(b). The spontaneous exchange appears irregular for Δ*κ* = 0.00 in Fig. 13(c). The LFF dynamics and spontaneous exchange of *C*_{1}(*t*) and *C*_{2}(*t*) are also observed for Δ*κ* = 0.15 in Fig. 13(d). The dropout disappears, and *C*_{1}(*t*) is always larger than *C*_{2}(*t*) for Δ*κ* = 0.90 in Fig. 13(e) with chaotic dynamics, with laser 1 as the leader.

It is worth noting that *C*_{1}(*t*) always decreases earlier than *C*_{2}(*t*), and laser 2 becomes the leader laser at the beginning of the dropouts for Δ*κ* = −0.15 (Fig. 13(b)), whereas *C*_{2}(*t*) always decreases earlier than *C*_{1}(*t*) and laser 1 becomes the leader laser at the beginning of the dropouts for Δ*κ* = 0.15 (Fig. 13(d)). The spontaneous exchange occurs more irregularly for the symmetric coupling of Δ*κ* = 0.00 in Fig. 13(c), i.e., either *C*_{1}(*t*) or *C*_{2}(*t*) decreases earlier at the beginning of the dropouts.

Figure 14 shows the experimental result of the probabilities of becoming the leader for lasers 1 and 2 (*L*_{1} and *L*_{2}). The leader probability changes as the detuning of the two coupling strengths Δ*κ* changes. The curves of *L*_{1} and *L*_{2} appear symmetric. *L*_{1} is larger than *L*_{2} for positive Δ*κ* (*κ*_{1} > *κ*_{2}), and *L*_{2} is larger than *L*_{1} for negative Δ*κ* (*κ*_{1} < *κ*_{2}). *L*_{1} becomes 1 for Δ*κ* ≥ 0.75, indicating that laser 1 is always the leader in this range, and *L*_{2} becomes 1 for Δ*κ* ≤ −0.75, indicating that laser 2 is always the leader in this range. In the middle range (0.2 ≤ |Δ*κ*| ≤ 0.6), there are plateaus for *L*_{1} and *L*_{2}, which correspond to the dynamical transitions between LFFs and chaotic oscillations. These curves are similar to the numerical result shown in Fig. 4, except for the existence of plateaus. The shape of these curves of leader probabilities may strongly affect the decision-making performance [23], and the optimal design of the curves of leader probabilities would be an interesting future study.

#### 3.3 Experimental results of decision making for the multi-armed bandit problem

We experimentally demonstrate the decision-making system by controlling the detuning of the coupling strengths Δ*κ* automatically. First, we acquire the temporal waveforms for *τ* = 44.50 ns to calculate the short-term cross-correlations *C*_{1}(*t*) and *C*_{2}(*t*) and to determine the leader laser. Two slot machines are emulated in the computer of the digital oscilloscope. The slot machine corresponding to the leader laser is selected, and the slot machine returns the result (“hit” or “miss”) (see Fig. 10(b)). Based on the result of the selected slot machine, Δ*κ* is changed with a step width of 0.05 in the range from −0.95 to 0.95 (the total number of step levels is 2*N* + 1 = 39), using the bias controller and the voltage-controlled variable attenuators for the coupling strengths. The acquisition of the temporal waveforms and the control of Δ*κ* are repeated automatically by the computer in the digital oscilloscope for decision making.
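The discretization of Δ*κ* can be made concrete with a short Python sketch. The grid (step width 0.05, 2*N* + 1 = 39 levels, hence *N* = 19) follows the text; the update rule shown here is a simplified assumption in which each result moves Δ*κ* by exactly one level, and it is not necessarily the rule used in the experiment. The names `delta_kappa` and `update_level` are hypothetical.

```python
STEP = 0.05   # step width of the coupling-strength detuning
N = 19        # levels on each side of zero, so 2N + 1 = 39 levels in total


def delta_kappa(level):
    """Map an integer level in [-N, N] to the detuning value."""
    return round(level * STEP, 2)


def update_level(level, selected, hit):
    """Move one level per play (assumed simplified rule, for illustration).

    A hit on slot machine 1 or a miss on slot machine 2 shifts the detuning
    toward positive values, which favours laser 1 as the leader; the other
    two outcomes shift it toward negative values. The level is clamped to
    the available range.
    """
    favours_laser_1 = (selected == 1) == hit
    shift = 1 if favours_laser_1 else -1
    return max(-N, min(N, level + shift))


levels = [delta_kappa(k) for k in range(-N, N + 1)]
```

Here `levels` runs from −0.95 to 0.95 in 39 steps, and `update_level` never leaves that range because of the clamping.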

We set the number of total plays *m* = 100 and the number of total cycles *n* = 100 to obtain the CDR. The combinations of the hit probabilities of the two slot machines are set to {*P*_{S1}, *P*_{S2}} = {0.8, 0.2} and {0.7, 0.3}, respectively. Figure 15 shows the CDR as the number of plays is changed for different hit probabilities. The CDR increases and reaches 1 in both cases. It is found that the CDR quickly converges to 1 when the difference between *P*_{S1} and *P*_{S2} is large, for which the slot machine with the highest hit probability can be selected easily. These experimental results agree well with our numerical results shown in Fig. 7. We succeeded in performing the on-line experimental demonstration of decision-making using lag synchronization of chaos in mutually coupled semiconductor lasers.
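To illustrate how a CDR curve of this kind can be obtained, the following Python sketch emulates the decision-making loop entirely in software. It rests on several assumptions that are not taken from the experiment: the CDR at a given play is taken to be the fraction of cycles in which the slot machine with the higher hit probability is selected at that play, the measured leader-probability curve is replaced by a hypothetical piecewise-linear function that saturates at |Δ*κ*| = 0.75, and Δ*κ* is moved by one level per play. Slot machines are indexed 0 and 1 in the code.

```python
import random


def leader_prob_laser1(dk):
    """Hypothetical stand-in for the L1 curve: linear, saturating at |dk| >= 0.75."""
    return min(1.0, max(0.0, 0.5 + dk / 1.5))


def run_cycle(p_hit, plays, rng, step=0.05, n_levels=19):
    """Play one cycle and return the list of selected slot indices (0 or 1)."""
    level = 0
    choices = []
    for _ in range(plays):
        dk = level * step
        # The leader laser decides the slot: laser 1 -> slot 0, laser 2 -> slot 1.
        selected = 0 if rng.random() < leader_prob_laser1(dk) else 1
        hit = rng.random() < p_hit[selected]
        # Assumed one-level update: reward the selected slot on a hit,
        # penalize it on a miss, and clamp to the available levels.
        toward_laser_1 = (selected == 0) == hit
        level = max(-n_levels, min(n_levels, level + (1 if toward_laser_1 else -1)))
        choices.append(selected)
    return choices


def correct_decision_rate(p_hit, plays=100, cycles=100, seed=0):
    """Fraction of cycles selecting the best slot, for each play index."""
    rng = random.Random(seed)
    best = max(range(len(p_hit)), key=lambda k: p_hit[k])
    counts = [0] * plays
    for _ in range(cycles):
        for t, selected in enumerate(run_cycle(p_hit, plays, rng)):
            counts[t] += int(selected == best)
    return [c / cycles for c in counts]


cdr_easy = correct_decision_rate((0.8, 0.2))
cdr_hard = correct_decision_rate((0.7, 0.3))
```

Plotting `cdr_easy` and `cdr_hard` against the play index gives curves that can be compared qualitatively with Fig. 15; quantitative agreement should not be expected, because the leader-probability curve and the update rule are simplified here.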

## 4. Conclusions

We demonstrated the utilization of the synchronization of chaotic lasers for decision making. We numerically and experimentally performed decision making for solving the multi-armed bandit problem using lag synchronization of chaos in mutually coupled semiconductor lasers. We observed the spontaneous exchange of the leader-laggard relationship under lag synchronization of chaos and found that the leader laser can be controlled by changing the detuning of the coupling strengths between the two lasers. The decision was based only on the identity of the leader laser. The betting results were used to update the coupling strengths between the two lasers. We succeeded in demonstrating the decision making for slot machines with different combinations of hit probabilities in the on-line experimental implementation. We found that a larger difference in the hit probabilities makes decision making easier.

This is the first demonstration of the utilization of lag synchronization of chaos in coupled lasers for decision making. This concept can be extended using the synchronization of laser networks for solving the multi-armed bandit problem with a large number of slot machines. We consider that the synchronization phenomena in coupled lasers and laser networks could be interesting resources for future photonic artificial intelligence.

## Appendix

We conducted additional numerical simulations with non-zero values of the initial optical frequency detuning Δ*f*_{ini} to demonstrate the feasibility of the numerical results. Figures 16(a) and 16(b) show the leader probabilities of the two lasers for the frequency detuning of 4 GHz and −4 GHz, respectively. These figures correspond to Fig. 4 in the case of zero frequency detuning. The shapes of the curves are similar; however, the crossing point of the two curves is shifted to negative and positive detuning in Figs. 16(a) and 16(b), respectively. Therefore, the asymmetric characteristic of the leader probability can be observed in the case of non-zero frequency detuning. These results can be used for decision making, even though the asymmetric feature may appear in the decision-making process.

We used the delay time of τ = 5.0 ns in the numerical simulation, which is shorter than the value obtained in the experiment (τ = 44.50 ns), to reduce the time consumed by the numerical simulation. However, a similar result of leader probabilities was obtained with the value of τ = 44.50 ns in the numerical simulation.

## Funding

Japan Society for the Promotion of Science (JP17H01277, JP19H00868); Core Research for Evolutional Science and Technology (JPMJCR17N2).

## Acknowledgement

We acknowledge Shoma Ohara for helpful discussions and comments.

## References

**1.** R. S. Sutton and A. G. Barto, *Reinforcement Learning* (MIT Press, 1998).

**2.** O. B. Kroemer, R. Detry, J. Piater, and J. Peters, “Combining active learning and reactive control for robot grasping,” Rob. Auton. Syst. **58**(9), 1105–1116 (2010).

**3.** J. Bueno, S. Maktoobi, L. Froehly, I. Fischer, M. Jacquot, L. Larger, and D. Brunner, “Reinforcement learning in a large-scale photonic recurrent neural network,” Optica **5**(6), 756–760 (2018).

**4.** D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature **529**(7587), 484–489 (2016).

**5.** D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. Van Den Driessche, T. Graepel, and D. Hassabis, “Mastering the game of Go without human knowledge,” Nature **550**(7676), 354–359 (2017).

**6.** H. Robbins, “Some aspects of the sequential design of experiments,” Bull. Amer. Math. Soc. **58**(5), 527–536 (1952).

**7.** S.-J. Kim, M. Naruse, M. Aono, M. Ohtsu, and M. Hara, “Decision maker based on nanoscale photo-excitation transfer,” Sci. Rep. **3**(1), 2370 (2013).

**8.** M. Naruse, W. Nomura, M. Aono, M. Ohtsu, Y. Sonnefraud, A. Drezet, S. Huant, and S.-J. Kim, “Decision making based on optical excitation transfer via near-field interactions between quantum dots,” J. Appl. Phys. **116**(15), 154303 (2014).

**9.** M. Naruse, M. Berthel, A. Drezet, S. Huant, M. Aono, H. Hori, and S.-J. Kim, “Single-photon decision maker,” Sci. Rep. **5**(1), 13253 (2015).

**10.** M. Naruse, Y. Terashima, A. Uchida, and S.-J. Kim, “Ultrafast photonic reinforcement learning based on laser chaos,” Sci. Rep. **7**(1), 8772 (2017).

**11.** T. Mihana, Y. Terashima, M. Naruse, S.-J. Kim, and A. Uchida, “Memory effect on adaptive decision making with a chaotic semiconductor laser,” Complexity **2018**, 4318127 (2018).

**12.** A. Uchida, *Optical Communication with Chaotic Lasers: Applications of Nonlinear Dynamics and Synchronization* (Wiley-VCH, 2012).

**13.** A. Uchida, K. Amano, M. Inoue, K. Hirano, S. Naito, H. Someya, I. Oowada, T. Kurashige, M. Shiki, and S. Yoshimori, “Fast physical random bit generation with chaotic semiconductor lasers,” Nat. Photonics **2**(12), 728–732 (2008).

**14.** R. Sakuraba, K. Iwakawa, K. Kanno, and A. Uchida, “Tb/s physical random bit generation with bandwidth-enhanced chaos in three-cascaded semiconductor lasers,” Opt. Express **23**(2), 1470 (2015).

**15.** K. Ugajin, Y. Terashima, K. Iwakawa, A. Uchida, T. Harayama, K. Yoshimura, and M. Inubushi, “Real-time fast physical random number generator with a photonic integrated circuit,” Opt. Express **25**(6), 6511 (2017).

**16.** K. Yoshimura, J. Muramatsu, P. Davis, T. Harayama, H. Okumura, S. Morikatsu, H. Aida, and A. Uchida, “Secure key distribution using correlated randomness in lasers driven by common random light,” Phys. Rev. Lett. **108**(7), 070602 (2012).

**17.** H. Koizumi, S. Morikatsu, H. Aida, T. Nozawa, I. Kakesu, A. Uchida, K. Yoshimura, J. Muramatsu, and P. Davis, “Information-theoretic secure key distribution based on common random-signal induced synchronization in unidirectionally-coupled cascades of semiconductor lasers,” Opt. Express **21**(15), 17869–17893 (2013).

**18.** T. Sasaki, I. Kakesu, Y. Mitsui, D. Rontani, A. Uchida, S. Sunada, K. Yoshimura, and M. Inubushi, “Common-signal induced synchronization in photonic integrated circuits and its application to secure key distribution,” Opt. Express **25**(21), 26029–26044 (2017).

**19.** D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, “Parallel photonic information processing at gigabyte per second data rates using transient states,” Nat. Commun. **4**(1), 1364–1367 (2013).

**20.** J. Nakayama, K. Kanno, and A. Uchida, “Laser dynamical reservoir computing with consistency: an approach of a chaos mask signal,” Opt. Express **24**(8), 8679–8692 (2016).

**21.** Y. Kuriki, J. Nakayama, K. Takano, and A. Uchida, “Impact of input mask signals on delay-based photonic reservoir computing with semiconductor lasers,” Opt. Express **26**(5), 5777–5788 (2018).

**22.** M. Naruse, T. Mihana, H. Hori, H. Saigo, K. Okamura, M. Hasegawa, and A. Uchida, “Scalable photonic reinforcement learning by time-division multiplexing of laser chaos,” Sci. Rep. **8**(1), 10890 (2018).

**23.** R. Homma, S. Kochi, T. Niiyama, T. Mihana, Y. Mitsui, K. Kanno, A. Uchida, M. Naruse, and S. Sunada, “On-chip photonic decision maker using spontaneous mode switching in a ring laser,” Sci. Rep. **9**(1), 9429 (2019).

**24.** T. Heil, I. Fischer, W. Elsässer, J. Mulet, and C. R. Mirasso, “Chaos synchronization and spontaneous symmetry breaking in symmetrically delay-coupled semiconductor lasers,” Phys. Rev. Lett. **86**(5), 795–798 (2001).

**25.** E. A. Rogers-Dakin, J. García-Ojalvo, D. J. Deshazer, and R. Roy, “Synchronization and symmetry breaking in mutually coupled fiber lasers,” Phys. Rev. E **73**(4), 045201 (2006).

**26.** R. Vicente, C. R. Mirasso, and I. Fischer, “Simultaneous bidirectional message transmission in a chaos-based communication scheme,” Opt. Lett. **32**(4), 403–405 (2007).

**27.** E. Klein, N. Gross, M. Rosenbluh, W. Kinzel, L. Khaykovich, and I. Kanter, “Stable isochronal synchronization of mutually coupled chaotic lasers,” Phys. Rev. E **73**(6), 066214 (2006).

**28.** J. F. M. Avila and J. R. R. Leite, “Time delays in the synchronization of chaotic coupled lasers with feedback,” Opt. Express **17**(24), 21442–21451 (2009).

**29.** M. Peil, L. Larger, and I. Fischer, “Versatile and robust chaos synchronization phenomena imposed by delayed shared feedback coupling,” Phys. Rev. E **76**(4), 045201 (2007).

**30.** I. Fischer, R. Vicente, J. M. Buldu, M. Peil, C. R. Mirasso, M. C. Torrent, and J. Garcia-Ojalvo, “Zero-lag long-range synchronization via dynamical relaying,” Phys. Rev. Lett. **97**(12), 123902 (2006).

**31.** K. Kanno, T. Hida, A. Uchida, and M. Bunsen, “Spontaneous exchange of leader-laggard relationship in mutually coupled synchronized semiconductor lasers,” Phys. Rev. E **95**(5), 052212 (2017).

**32.** R. Lang and K. Kobayashi, “External optical feedback effects on semiconductor injection laser properties,” IEEE J. Quantum Electron. **16**(3), 347–355 (1980).

**33.** S.-J. Kim, M. Aono, and M. Hara, “Tug-of-war model for the two-bandit problem: Nonlocally-correlated parallel exploration via resource conservation,” BioSystems **101**(1), 29–36 (2010).

**34.** S.-J. Kim, M. Aono, and E. Nameda, “Efficient decision-making by volume-conserving physical object,” New J. Phys. **17**(8), 083023 (2015).