
Wavy water-to-air optical camera communication system using rolling shutter image sensor and long short term memory neural network

Open Access

Abstract

We propose and experimentally demonstrate, for the first time to the best of the authors’ knowledge, a wide field-of-view (FOV) water-to-air optical transmission link using rolling-shutter (RS) based optical camera communication (OCC). We evaluate the proposed OCC system without water ripple and with different percentage increases of water ripple. A long short-term memory neural network (LSTM-NN) is utilized to mitigate the link outage induced by wavy-water turbulence and to decode the 4-level pulse-amplitude-modulation (PAM4) RS pattern while meeting the pre-forward-error-correction bit-error-rate threshold (pre-FEC BER = 3.8 × 10−3). We also evaluate the FOVs of the proposed water-to-air RS-based OCC system by applying different angular rotations to the camera. Experimental results show that the proposed OCC system can support ±70°, ±30°, and ±30° rotations around the z-, y-, and x-directions, respectively, when operated at 6 kbit/s and decoded using the LSTM-NN.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Nowadays, more than 90% of the ocean remains unexplored by humans. Oceanic research on marine life and underwater volcanic activities has attracted attention globally. Underwater sensors and autonomous underwater vehicles (AUVs) are widely utilized for this research. As the number of underwater sensors and vehicles increases rapidly, the bandwidth demands rise accordingly. The data acquired by these underwater sensors can be collected by transmitting them to mother ships located on the water surface, or to unmanned aerial vehicles (UAVs) located near the water [1]. Traditionally, radio frequency (RF) is utilized to communicate between an AUV and a UAV [2]. As RF suffers high power loss in water, the AUV has to stay close to the water surface to communicate effectively with the UAV [2]. Besides, as the absorption of sound in water is several orders of magnitude lower than that of RF, acoustic waves are also utilized for underwater communication [3]. However, underwater acoustic communication still faces many challenges, such as limited bandwidth and high signal latency [4].

On the other hand, underwater wireless optical communication (UWOC) can provide distinct advantages, including higher bandwidth and lower latency [5–12], when compared with RF and acoustic waves. Besides, UWOC could also be a promising candidate for realizing direct connectivity across the water-air interface [1,13,14]. A direct water-air optical link is highly desirable for transmitting real-time data from an underwater AUV to a UAV or airplane, which then forwards the data to ground stations using optical wireless communication (OWC) [11,15] or RF. Recently, UAV-assisted wireless communication has emerged to provide an additional dimension and mobility augmenting existing terrestrial networks. It can also be employed to offer emergency communication during catastrophic disasters, e.g., earthquakes or hurricanes [16–18]. Despite the aforementioned merits, UWOC and water-air optical communication face several challenges. Besides the absorption and scattering introduced by water, underwater turbulence also generates power fluctuations at the receiver (Rx), producing a scintillation effect. In water-air optical communication, the wavy water surface significantly deflects the optical signal reaching the Rx, producing communication link outage. To mitigate this deflection of the optical signal, wide fields-of-view (FOVs) for both the transmitter (Tx) and the Rx are highly desirable.

In this work, we propose a wide-FOV water-to-air optical transmission system using RS-based optical camera communication (OCC). OCC is a realization of OWC using a camera as the Rx [19,20]. The advantages of OCC include the high availability of complementary-metal-oxide-semiconductor (CMOS) cameras in UAVs and smart-phones; the large photo-sensitive area; and the rolling-shutter (RS) effect, which allows the OCC data rate to exceed the camera frame rate. Table 1 summarizes recent RS-camera-based OCC systems. On-off keying (OOK) is usually used, and thresholding schemes, including polynomial curve fitting [19], extreme-value averaging (EVA) [21], and beacon-jointed packet reconstruction (BJPR) [22], have been utilized for decoding the RS pattern. Besides, artificial intelligence/machine learning (AI/ML) has also been proposed to enhance OCC performance, including the convolutional neural network (CNN) [23,24] and the pixel-per-symbol labeling neural network (PPS-NN) [25]. The uneven light exposure effect can also be mitigated by using a convolutional autoencoder [26]. When comparing the data rate of this work with Refs. [21,22], similar data rates are observed; however, the transmission distance in Refs. [21,22] is about 30 cm. Although Refs. [23,24] achieve higher data rates, their transmission distance is also limited to 40 cm. Reference [25] has a similar transmission data rate and distance to our proposed work; however, it did not consider the turbulence caused by the wavy water-to-air environment. In this work, the “higher FOV” is compared with typical OWC systems using a positive-intrinsic-negative (PIN) photodiode (PD) or avalanche photodiode (APD) for optical detection, in which the optical alignment is very critical and the FOV is < 1°. In order to establish a communication link using a PIN PD or APD, active alignment systems, such as an optical phased array (OPA) [27] or a spatial light modulator [28], are needed.

Table 1. Recent RS-based OCC systems

In this work, we propose and present the first demonstration, to the best of the authors’ knowledge, of a wide-FOV water-to-air optical transmission using RS-based OCC. It is worth mentioning that the proposed system is not a hybrid system combining underwater and water-to-air communication links. A long short-term memory neural network (LSTM-NN) is utilized to mitigate the water-turbulence-induced link outage and to decode the 4-level pulse-amplitude-modulation (PAM4) RS pattern. Data rates of 7.2 kbit/s (no-water-ripple case) and 6 kbit/s (high-water-ripple case) are achieved, meeting the pre-forward-error-correction bit-error-rate threshold (pre-FEC BER = 3.8 × 10−3). We also evaluate the FOVs of the proposed water-to-air RS-based OCC system by applying different angular rotations to the camera. Experimental results show that the system can support ±70°, ±30°, and ±30° rotations around the z-, y-, and x-directions, respectively.

2. Experiment and mechanism

Figure 1 shows the scenario of water-to-air optical communication from an AUV to a UAV. As discussed above, the wavy surface significantly deflects the optical signal reaching the Rx, producing communication link outage. To mitigate the link outage caused by water turbulence, wide FOVs for both the Tx and the Rx are highly desirable.

Fig. 1. Scenario of the water-to-air optical communication from the AUV to UAV with wide FOV.

Figure 2(a) illustrates the operation mechanism of the RS in a CMOS camera. The camera Rx integrates the signal in a row-by-row manner, similar to a scanning function. Hence, bright and dark fringes can be observed in the RS pattern, representing the LED “ON” and “OFF” states. The RS effect allows the data rate of the OCC system to exceed the camera frame rate. As also illustrated in Fig. 2, the next pixel row starts to activate before the completion of the previous pixel row, with a start-time delay. Because of this, when the LED is modulated at a high data rate, ambiguity occurs due to the unevenly integrated light in a pixel row: only part of the exposure time coincides with the LED “ON” state, producing a grey fringe. This ambiguity problem is even more severe in PAM4 OCC decoding. Figure 2(b) shows the structure of the data packet. It consists of a 10-symbol header and a payload of varying symbol length. The header consists of 8 symbols of PAM4 level zero (i.e., completely dark) followed by 2 symbols of PAM4 level three (i.e., completely bright). The payload is generated by the random function in Matlab. The payload data are located between two headers. If two headers cannot be received in an image frame, the frame is considered unsuccessful.
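As a minimal sketch of this packet framing (the authors generate the payload with Matlab’s random function; this Python equivalent and the payload length are illustrative assumptions):

```python
import numpy as np

PAM4_LEVELS = (0, 1, 2, 3)       # 4-level pulse amplitude modulation
HEADER = [0] * 8 + [3] * 2       # 8 symbols of level 0, then 2 symbols of level 3

def make_packet(payload_symbols=50, rng=np.random.default_rng()):
    """Build one packet: 10-symbol header followed by a random PAM4 payload.

    payload_symbols is an assumed example value; the paper varies the
    payload length with the data rate.
    """
    payload = rng.integers(0, 4, size=payload_symbols)
    return np.concatenate([HEADER, payload])

# A frame is decodable only if it contains two headers (packet boundaries),
# so the transmitter repeats packets back-to-back:
stream = np.concatenate([make_packet() for _ in range(3)])
```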

Fig. 2. (a) RS operation in CMOS camera. Color keys: white: LED at level 3; yellow: LED at level 2; light yellow: LED at level 1; black: LED at level 0; blue: read-out time. (b) Structure of data packet.

Figure 3(a) shows the proof-of-concept experiment of the proposed water-to-air optical transmission using RS-based OCC. The PAM4 signal is generated by an LED light panel (Li-Cheng) connected to an arbitrary waveform generator (AWG, Tektronix AFG3252C). The dimensions of the LED panel and the water tank are 0.58 m × 0.88 m and 0.73 m × 0.88 m, respectively. The camera has a resolution of 1920 × 1440 pixels and a frame rate of 30 fps. The mobile phone is mounted on a stand allowing different angular rotations around the z-, y-, and x-directions. The distance between the smart-phone and the LED panel is about 1.1 m. A wave maker (Jebao SCP-120) is used to produce the wavy water surface. As illustrated in Fig. 3(b), the percentage increase in water ripple is defined as $\left( \frac{h_{peak} - h_{ave}}{h_{peak}} \right) \times 100\%$, where $h_{peak}$ is the height of the water peak and $h_{ave}$ is the average water level. In this proof-of-concept demonstration, the wave maker, which produces the wavy water surface mechanically, has only two operating modes; hence, we only emulate three scenarios: no water ripple, a 9% increase of water ripple, and a 12% increase of water ripple. Figures 4(a) and 4(b) show photos of the experiment, indicating the LED panel Tx, the water tank, and the RS pattern observed on the mobile phone screen.
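As a quick numerical illustration of this definition (the heights below are hypothetical, since the measured values are not given in the text):

```python
def ripple_increase(h_peak: float, h_ave: float) -> float:
    """Percentage increase in water ripple, (h_peak - h_ave) / h_peak * 100."""
    return (h_peak - h_ave) / h_peak * 100.0

# Hypothetical example: a 10.0 cm peak over a 9.1 cm average level -> 9% ripple.
print(f"{ripple_increase(10.0, 9.1):.0f}%")  # prints: 9%
```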

Fig. 3. (a) Experimental setup of the water-to-air OCC system. (b) Illustration of $h_{peak}$ and $h_{ave}$ for the definition of water ripple.

Fig. 4. Photos of the experiment, indicating (a) the LED panel Tx and water tank; and (b) the RS pattern observed on the mobile phone screen.

Figure 5(a) shows the architecture of the LSTM-NN model used to decode the PAM4 OCC signal and to mitigate the link outage induced by the wavy water surface. The LSTM-NN [29] is a kind of recurrent neural network (RNN) with a chain-like structure, including an input gate, an output gate, and a forget gate. Unlike a standard feed-forward artificial neural network (ANN), the LSTM-NN has feedback operation. As a result, the LSTM-NN enables classification and prediction based on time-series data. Image frames from the video are read in and converted into grayscale values (0 to 255). Column matrix selection is performed to select the 85% highest grayscale value in each pixel row; using 85% rather than the maximum avoids errors generated by pixel saturation. Once the grayscale value of each pixel row is obtained, these values form a column matrix representing the received optical intensity (i.e., the received optical PAM4 data). The details of the column matrix selection are given in Ref. [20]. In the data pre-processing unit, features of the input signal, including the present symbol value, pre-/post-symbol relations, and the symbol average, are extracted. After this, the data are sent to the LSTM layer. In the proposed LSTM-NN model, we have optimized both the decoding performance and the complexity. The model has two LSTM layers with 64 and 32 neurons, respectively. Batch normalization is utilized. The last three layers are fully-connected (FC) dense layers with 32, 16, and 4 neurons, respectively. Dropout is used to prevent overfitting. ReLU is used in the first and second FC dense layers, and Softmax is the activation function of the last layer. The loss function is sparse categorical cross-entropy and the optimizer is Adam.
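A minimal sketch of the grayscale extraction and column matrix selection is given below, interpreting the 85% figure as a per-row percentile (per Ref. [20]); OpenCV is an assumed tooling choice, not the authors’ stated implementation:

```python
import cv2
import numpy as np

def frame_to_column_matrix(frame_bgr, percentile=85):
    """Convert one video frame to one intensity value per pixel row.

    Each row's value is its `percentile`-th grayscale value, which avoids
    the errors that taking the row maximum would cause on saturated pixels.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # values in 0..255
    return np.percentile(gray, percentile, axis=1)       # shape: (n_rows,)
```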

Fig. 5. Architectures of (a) LSTM-NN model used to decode the PAM4 OCC signal; (b) LSTM cell.
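A Keras sketch of the architecture in Fig. 5(a) follows; the input window size (5 time steps × 3 features per symbol) and the dropout rates are assumptions, as they are not stated in the text:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm_nn(timesteps=5, features=3):
    """Two LSTM layers (64, 32) + batch norm + FC 32/16/4, per Fig. 5(a)."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, features)),
        layers.LSTM(64, return_sequences=True),
        layers.BatchNormalization(),
        layers.LSTM(32),
        layers.BatchNormalization(),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.2),                    # rate is an assumed value
        layers.Dense(16, activation="relu"),
        layers.Dropout(0.2),                    # rate is an assumed value
        layers.Dense(4, activation="softmax"),  # one class per PAM4 level
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```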

Figure 5(b) illustrates the LSTM cell used in the LSTM layer. The cell takes an input $x_t$, together with $h_{t-1}$ and $c_{t-1}$, which are the outputs of the previous time step. In addition, the LSTM cell produces $c_t$ and $h_t$ for consumption by the next time step. Equation (1) describes the operations of the forget gate $f_t$, input gate $i_t$, and output gate $o_t$, where $\sigma$ denotes the sigmoid function [29].

$$\begin{aligned} i_t &= \sigma ( W_i x_t + M_i h_{t-1} + B_i )\\ f_t &= \sigma ( W_f x_t + M_f h_{t-1} + B_f )\\ o_t &= \sigma ( W_o x_t + M_o h_{t-1} + B_o ) \end{aligned} \tag{1}$$

The operations of the hidden state $h_t$ and the cell state $c_t$ are given in Eq. (2) [29]. It is worth noting that both Eqs. (1) and (2) apply to a single time step, meaning that they are re-evaluated at the next time step. Besides, the weights ($W_f$, $M_f$, $W_i$, $M_i$, $W_o$, $M_o$, $W_c$, $M_c$) and the biases ($B_f$, $B_i$, $B_o$, $B_c$) are time-independent: once the LSTM-NN model has been trained and built, these weight matrices do not vary from one time step to another.

$$\begin{aligned} c_t &= f_t c_{t-1} + i_t c'_t\\ c'_t &= \tanh ( W_c x_t + M_c h_{t-1} + B_c )\\ h_t &= o_t \tanh ( c_t ) \end{aligned} \tag{2}$$
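For concreteness, a single LSTM time step implementing Eqs. (1)-(2) can be written directly in NumPy (a didactic sketch; in practice a trained framework implementation such as the Keras model above would be used):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, M, B):
    """One LSTM time step per Eqs. (1)-(2).

    W, M, B are dicts holding the time-independent weights/biases for the
    input (i), forget (f), output (o), and candidate (c) paths.
    """
    i_t = sigmoid(W["i"] @ x_t + M["i"] @ h_prev + B["i"])     # input gate
    f_t = sigmoid(W["f"] @ x_t + M["f"] @ h_prev + B["f"])     # forget gate
    o_t = sigmoid(W["o"] @ x_t + M["o"] @ h_prev + B["o"])     # output gate
    c_cand = np.tanh(W["c"] @ x_t + M["c"] @ h_prev + B["c"])  # candidate c'_t
    c_t = f_t * c_prev + i_t * c_cand                          # new cell state
    h_t = o_t * np.tanh(c_t)                                   # new hidden state
    return h_t, c_t
```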

3. Results and discussions

Two decoding approaches, using the ANN and the LSTM-NN, are compared, as illustrated in Figs. 6(a) and 6(b), respectively. The ANN is also optimized, having an input layer, an output layer, and 6 FC hidden layers with 36, 64, 128, 64, 16, and 4 neurons, respectively. ReLU is used in the first 5 FC layers and Softmax in the last layer. In the no-ripple case, the LSTM-NN can successfully decode PAM4 at 7.2 kbit/s, satisfying the pre-FEC BER threshold (BER = 3.8 × 10−3), whereas the traditional ANN can only decode PAM4 at 4.8 kbit/s. In the 9% water ripple case, the proposed LSTM-NN can still operate at 7.2 kbit/s while meeting the FEC threshold; however, the ANN can only operate at 3 kbit/s. In the high (12%) water ripple case, the ANN can still only operate at 3 kbit/s, while the proposed LSTM-NN significantly enhances the data rate to 6 kbit/s. When the data rate is increased beyond 7.2 kbit/s, high inter-symbol interference (ISI) is generated and the data packet cannot be decoded. In this proof-of-concept demonstration, the transmission distance is about 1.1 m. A longer transmission distance is possible by increasing the LED panel length and/or using a higher-resolution camera. The transmission distance can also be increased by incorporating a telescope lens in front of the camera [18].
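A corresponding Keras sketch of this ANN baseline (the flattened input size is an assumption, as above) might look like:

```python
from tensorflow.keras import layers, models

def build_ann(n_features=15):
    """Feed-forward baseline: 6 FC hidden layers of 36/64/128/64/16/4 neurons.

    n_features is an assumed flattened input size
    (e.g., 5 symbols x 3 features per symbol).
    """
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(36, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(4, activation="softmax"),  # PAM4 class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```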

Fig. 6. Measured BERs of the proposed OCC system under gradually increasing water ripple, decoded using (a) the ANN and (b) the LSTM-NN.

In order to evaluate the efficiency of the proposed LSTM-NN, we compare the loss functions of the LSTM-NN and the ANN, as shown in Figs. 7(a) and 7(b). The loss function is sparse categorical cross-entropy and the optimizer is Adam. It can be observed that the loss of the LSTM-NN model converges to a minimum value of < 0.01 after 25 epochs, whereas the loss of the ANN model converges to a minimum value of about 0.53 after 40 epochs. Hence, the LSTM-NN performs well in terms of both decoding performance and training efficiency.
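A hedged sketch of how such per-epoch loss curves could be produced with the models above; the training data here are random placeholders standing in for the experimentally collected RS-pattern features and labels:

```python
import numpy as np

# Placeholder data: N windows of 5 time steps x 3 features, PAM4 labels 0..3.
x_train = np.random.rand(1000, 5, 3).astype("float32")
y_train = np.random.randint(0, 4, size=1000)

model = build_lstm_nn()  # from the sketch above
history = model.fit(x_train, y_train, epochs=50, batch_size=64,
                    validation_split=0.1, verbose=0)

# history.history["loss"] is the per-epoch training-loss curve of Fig. 7(a).
print(min(history.history["loss"]))
```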

Fig. 7. Loss functions of the (a) LSTM-NN and (b) ANN.

We also evaluate the FOVs of the proposed water-to-air RS-based OCC system by applying different angular rotations of the camera around the z-, y-, and x-directions, as illustrated in Fig. 3. We evaluate the performance of each rotation at low, medium, and high data rates; hence, 3 kbit/s, 4.8 kbit/s, and 6 kbit/s are selected for these comparisons. In all cases, the LSTM-NN is utilized. First, the mobile phone is rotated around the z-axis, as illustrated in Fig. 8(a). Assume that at θ = 0° the LED light panel occupies x pixels in the horizontal direction and y pixels in the vertical direction. When the mobile phone is rotated by an angle θ, the observed LED panel on the screen still occupies x pixels in the horizontal direction; however, the number of occupied pixels in the vertical direction decreases. When the z-rotation angle is > 70°, a complete data packet can no longer be observed, so no further rotation is performed. The BER measurements at different OCC data rates and rotation angles are shown in Fig. 8(b). Experimental results show that the system can support ±70° of rotation around the z-axis while fulfilling the FEC threshold. The RS patterns at different rotation angles are shown in Fig. 8(c).

Fig. 8. (a) Schematic of the mobile phone rotated around the z-axis. (b) BER versus rotation angle around the z-axis (degrees). (c) RS patterns at different rotation angles.

Then, the mobile phone is rotated around the y-axis, as illustrated in Fig. 9(a). When the mobile phone is rotated by an angle ω, the LED panel appears more “trapezium” shaped; hence, the number of pixels the RS pattern occupies in the vertical direction decreases, making RS decoding more challenging. During the y-rotation, the position of the stand is also adjusted accordingly in order to capture the whole “trapezium”-shaped LED panel at the center of the image frame. The BER measurements at different OCC data rates and rotation angles are shown in Fig. 9(b). Experimental results show that the system can support ±30° of rotation around the y-axis while fulfilling the FEC threshold. The “trapezium”-shaped RS patterns at different rotation angles are shown in Fig. 9(c).

Fig. 9. (a) Schematic of the mobile phone rotated around the y-axis. (b) BER versus rotation angle around the y-axis (degrees). (c) RS patterns at different rotation angles.

Finally, the mobile phone is rotated around the x-axis, as illustrated in Fig. 10(a). When the mobile phone is rotated by an angle ψ, only part of the LED panel can be observed on the screen. The supported x-axis rotation is ±30°, fulfilling the FEC threshold, as illustrated by the BER measurements shown in Fig. 10(b). The RS patterns at different rotation angles are shown in Fig. 10(c). It can be observed that as the x-axis rotation angle increases, reflections from the side wall of the water tank are captured by the camera. LED bars are installed on both the left and right edges of the LED panel; the pronounced water ripple effect observed in Fig. 10(c) is due to the side-wall reflection of this leakage LED light.

Fig. 10. (a) Schematic of the mobile phone rotated around the x-axis. (b) BER versus rotation angle around the x-axis (degrees). (c) RS patterns at different rotation angles.

4. Conclusion

We have demonstrated, for the first time to the best of the authors’ knowledge, a wide-FOV water-to-air RS-based OCC system. An LSTM-NN was utilized to mitigate the water-turbulence-induced link outage and to decode the PAM4 RS pattern. Data rates of 7.2 kbit/s (no-water-ripple case) and 6 kbit/s (high-water-ripple case) were achieved, meeting the pre-FEC BER of 3.8 × 10−3. We also evaluated the FOVs of the proposed water-to-air RS-based OCC system. Experimental results showed that the system can support ±70°, ±30°, and ±30° rotations around the z-, y-, and x-directions, respectively, when operated at 6 kbit/s and decoded using the LSTM-NN.

Funding

National Science and Technology Council (NSTC-110-2221-E-A49-057-MY3, NSTC-110-2224-E-A49-003, NSTC-112-2218-E-011-006, NSTC-112-2221-E-A49-102-MY3).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. L. K. Chen, Y. Shao, and Y. Di, “Underwater and water-air optical wireless communication,” J. Lightwave Technol. 40(5), 1440–1452 (2022).

2. H. B. Singh and R. Pal, “Submarine communications,” Def. Sci. J. 43(1), 43–51 (1993).

3. A. H. Quazi and W. L. Konrad, “Underwater acoustic communications,” IEEE Commun. Mag. 20(2), 24–30 (1982).

4. I. F. Akyildiz, D. Pompili, and T. Melodia, “Underwater acoustic sensor networks: research challenges,” Ad Hoc Netw. 3(3), 257–279 (2005).

5. Z. Zeng, S. Fu, H. Zhang, et al., “A survey of underwater optical wireless communications,” IEEE Commun. Surv. Tutorials 19(1), 204–238 (2017).

6. H. M. Oubei, J. R. Duran, B. Janjua, et al., “4.8 Gbit/s 16-QAM-OFDM transmission based on compact 450-nm laser for underwater wireless optical communication,” Opt. Express 23(18), 23302–23309 (2015).

7. C. Shen, Y. Guo, H. M. Oubei, et al., “20-meter underwater wireless optical communication link with 1.5 Gbps data rate,” Opt. Express 24(22), 25502–25509 (2016).

8. H. H. Lu, C. Y. Li, H. H. Lin, et al., “An 8 m/9.6 Gbps underwater wireless optical communication system,” IEEE Photonics J. 8(5), 1–7 (2016).

9. C. Y. Li, H. H. Lu, W. S. Tsai, et al., “A 5 m/25 Gbps underwater wireless optical communication system,” IEEE Photonics J. 10(3), 1–9 (2018).

10. C. Y. Li, H. H. Lu, Y. C. Huang, et al., “50 Gb/s PAM4 underwater wireless optical communication systems across the water–air–water interface [Invited],” Chin. Opt. Lett. 17(10), 100004 (2019).

11. S. T. Hayle, Y. C. Manie, C. K. Yao, et al., “Hybrid of free space optics communication and sensor system using IWDM technique,” J. Lightwave Technol. 40(17), 5862–5869 (2022).

12. T. C. Wu, Y. C. Chi, H. Y. Wang, et al., “Blue laser diode enables underwater communication at 12.4 Gbps,” Sci. Rep. 7(1), 40480 (2017).

13. Y. Chen, M. Kong, T. Ali, et al., “26 m/5.5 Gbps air-water optical wireless communication based on an OFDM-modulated 520-nm laser diode,” Opt. Express 25(13), 14760–14765 (2017).

14. Y. Shao, R. Deng, J. He, et al., “Real-time 2.2-Gb/s water-air OFDM-OWC system with low-complexity transmitter-side DSP,” J. Lightwave Technol. 38(20), 5668–5675 (2020).

15. C. W. Chow, C. H. Yeh, Y. Liu, et al., “Enabling techniques for optical wireless communication systems (Invited),” Proc. OFC 2020, paper M2F.1.

16. M. S. Bashir and M.-S. Alouini, “Optimal positioning of hovering UAV relays for mitigation of pointing error in free-space optical communications,” IEEE Trans. Commun. 70(11), 7477–7490 (2022).

17. H. Takano, M. Nakahara, K. Suzuoki, et al., “300-meter long-range optical camera communication on RGB-LED-equipped drone and object-detecting camera,” IEEE Access 10, 55073–55080 (2022).

18. Y. H. Chang, S. Y. Tsai, C. W. Chow, et al., “Unmanned-aerial-vehicle based optical camera communication system using light-diffusing fiber and rolling-shutter image-sensor,” Opt. Express 31(11), 18670–18679 (2023).

19. C. Danakis, M. Afgani, G. Povey, et al., “Using a CMOS camera sensor for visible light communication,” Proc. OWC 12, 1244–1248 (2012).

20. C. W. Chow, C. Y. Chen, and S. H. Chen, “Visible light communication using mobile-phone camera with data rate higher than frame rate,” Opt. Express 23(20), 26080–26085 (2015).

21. C. W. Chen, C. W. Chow, Y. Liu, et al., “Efficient demodulation scheme for rolling-shutter-patterning of CMOS image sensor based visible light communications,” Opt. Express 25(20), 24362–24367 (2017).

22. W. C. Wang, C. W. Chow, C. W. Chen, et al., “Beacon jointed packet reconstruction scheme for mobile-phone based visible light communications using rolling shutter,” IEEE Photonics J. 9(6), 1–6 (2017).

23. L. Liu, R. Deng, and L. K. Chen, “47-kbit/s RGB-LED-based optical camera communication based on 2D-CNN and XOR-based data loss compensation,” Opt. Express 27(23), 33840–33846 (2019).

24. L. Liu, R. Deng, J. Shi, et al., “Beyond 100-kbit/s transmission over rolling shutter camera-based VLC enabled by color and spatial multiplexing,” Proc. OFC 2020, paper M1J.4.

25. Y. S. Lin, C. W. Chow, Y. Liu, et al., “PAM4 rolling-shutter demodulation using a pixel-per-symbol labeling neural network for optical camera communications,” Opt. Express 29(20), 31680–31688 (2021).

26. C. Jurado-Verdu, V. Guerra, V. Matus, et al., “Convolutional autoencoder for exposure effects equalization and noise mitigation in optical camera communication,” Opt. Express 29(15), 22973–22991 (2021).

27. C. W. Chow, Y. C. Chang, S. I. Kuo, et al., “Actively controllable beam steering optical wireless communication (OWC) using integrated optical phased array (OPA),” J. Lightwave Technol. 41(4), 1122–1128 (2023).

28. Y. H. Jian, C. C. Wang, C. W. Chow, et al., “Optical beam steerable orthogonal frequency division multiplexing (OFDM) non-orthogonal multiple access (NOMA) visible light communication using spatial-light modulator based reconfigurable intelligent surface,” IEEE Photonics J. 15(4), 1–8 (2023).

29. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation 9(8), 1735–1780 (1997).

