47-kbit/s RGB-LED-based optical camera communication based on 2D-CNN and XOR-based data loss compensation

Open Access

Abstract

In an RGB-LED-based optical camera communication (OCC) system, inter-symbol interference and inter-channel interference considerably degrade the transmission performance. In this paper, a two-dimensional CNN structure is proposed for data recovery by learning features across color channels and neighboring symbols in a rolling-shutter-based OCC system under random data transmission. Moreover, we propose an XOR-based data loss compensation method that achieves a 21% data rate improvement by restoring the data lost during transmission. A record-high data rate of 47 kbit/s has been experimentally achieved for an RGB-LED-based OCC system using the rolling shutter camera in a smartphone.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Optical camera communication (OCC) is a promising wireless communication method that uses complementary metal-oxide-semiconductor (CMOS) cameras as the receivers. Compared with photodiode-based visible light communication [1], OCC can be deployed easily and readily by leveraging the embedded cameras widely distributed in electronic devices such as smartphones. Over the last few years, OCC systems have been developed for multiple services and applications, such as indoor communication, indoor localization, vehicle communication, and broadcasting, using existing facilities [2].

The transmitter of an OCC system can be either a display screen or an LED. Using a screen as the transmitter, a data rate of 16.67 kbit/s has been achieved in a covert OCC system [3]. Because LEDs are far more widely deployed, an OCC system can also be readily built with an LED as the transmitter. In [4], the rolling shutter effect was employed to increase the data rate to several kbit/s for a single light-emitting diode (LED) based OCC system. However, the data rate is still limited by the bandwidth of the camera. An RGB-LED can be employed to increase the system capacity by using color division multiplexing or color shift keying [5]. With a quadrichromatic LED, a transmission data rate of up to 13.2 kbit/s has been achieved [6]. In [7], the dual cameras on a smartphone were used to receive signals in color shift keying and gray level modulation, respectively, and a data rate of 17.1 kbit/s was realized. Due to the effects of amplitude variation, exposure time overlap, and inter-channel interference, the data rate achieved with an RGB-LED remains modest, only comparable to the 14 kbit/s realized in a single white-LED-based OCC system [8]. More effective and ingenious decoding algorithms should be investigated to combat the above issues and unleash the full potential of the RGB-LED.

Deep neural networks have been applied in many fields, such as face recognition, natural language processing, and audio recognition, with great success. In smartphones, the image processing algorithms are diverse and well known to be nonlinear. To decode the received signal over such a complex channel, neural networks with powerful recognition ability can play an important role in the OCC system. In an artificial neural network (ANN), each neuron is connected to all the neurons in the next layer. In contrast, a convolutional neural network (CNN) has much lower complexity owing to small kernels and weight sharing [9]. Thus, CNNs have been widely used in computer vision to process large amounts of data. In [10], a two-dimensional (2D) CNN was applied to detect the ID of an RGB-LED in the captured images. Note that the RGB-LED was modulated only with a fixed ID data pattern, not random data. The CNN used the whole image as the input, which contained much redundancy since all the pixels in one row record the same LED state.

Other than the aforementioned bandwidth limit of the receiver, another challenge for OCC is the gap-time between two adjacent frames. In a CMOS sensor, during video capture, a new frame cannot start recording until the last frame has been completely read out [11]. Consequently, a gap-time exists between two adjacent frames, and the signals transmitted during the gap cannot be received. To mitigate the gap-time effect, the signal is typically separated into multiple packets, each transmitted repeatedly (typically two or three times) to guarantee the successful reception of at least one complete packet [4]. For data loss compensation, Reed-Solomon (RS) codes and Hamming codes with bit interleaving [12,13] have been applied, which require the length of the redundancy to be at least twice the length of the lost data (i.e., the length of the gap-time). If the gap-time exceeds 50% of the frame period in the video recording, the packets cannot be directly restored under this limitation.

In this paper, to decode the random data transmitted by an RGB-LED, a 2D-CNN is used to decode the received symbols and achieves the best BER performance compared with an ANN and a 1D-CNN. A simple but efficient data loss compensation method is proposed that employs a parity-check packet based on the XOR operation to maximize the transmission efficiency. The proposed scheme can be applied when the camera's video gap-time is larger than 50% of the frame period (which is very common in smartphones). The results show that a 21% data rate enhancement is realized compared with repetitive transmission.

2. Decoding algorithm

2.1 Proposed CNN structure

After recording, the captured video is converted to multiple images. For each image, we extract the signals from either the center column (Curve 1, C1, yellow dashed line) or a bell-shaped curve (Curve 2, C2, orange solid line). Next, the amplitude fluctuation is mitigated with the improved frame-averaging-based signal tracing algorithm proposed in [6]. Firstly, an average frame is generated from the grayscale frames, and the average grayscale values along C1 or C2 are extracted as the scaling factors. The extracted signals of the RGB color channels are then scaled by these factors so that the signals at the edge rows have about the same intensity variation as those at the central rows, as shown in Fig. 1(a).
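As a rough illustration of this pre-processing step, the following Python sketch (not the authors' code; the function name, the use of OpenCV/NumPy, and the normalization details are assumptions) extracts the RGB signals along one pixel column and rescales them with per-row factors taken from the average grayscale frame:

```python
import cv2
import numpy as np

def scale_column_signals(frames_bgr, col_idx):
    """Hypothetical sketch of the frame-averaging-based scaling step.

    frames_bgr : list of HxWx3 uint8 frames decoded from the video.
    col_idx    : index of the pixel column used for extraction (e.g., C1).
    Returns per-row scaled RGB signals so that the edge rows have roughly
    the same intensity variation as the central rows.
    """
    # Average grayscale frame -> per-row scaling factors along the column
    gray = np.mean([cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames_bgr], axis=0)
    scale = gray[:, col_idx]
    scale = scale / scale.max()                  # normalize; edge rows get small factors

    scaled = []
    for f in frames_bgr:
        rgb = f[:, col_idx, ::-1].astype(float)  # extract the column, BGR -> RGB
        scaled.append(rgb / scale[:, None])      # boost edge rows toward the central level
    return scaled
```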

Fig. 1. (a) Pre-processing for signal extraction and scaling (signals on the green channel are taken as an example). (b) The 2D-CNN structure proposed for the RGB-LED-based OCC system.

The proposed CNN structure is shown in Fig. 1(b). In the CNN, the output of the l-th convolutional layer, $W^l(i, j, v)$, can be represented as:

$$W^{l}(i, j, v) = f\left( \left( W^{l-1} \ast K_v^{l} \right)(i, j) + b(v) \right), \quad i = 1,\ldots,X,\ j = 1,\ldots,Y,\ v = 1,\ldots,D,$$
where f(·) is the activation function, W^{l-1} is the output of the (l-1)-th layer, * denotes the correlation operation, K_v^l is the kernel of the v-th channel in the l-th layer, X is the width, Y is the height, D is the depth, and b(v) is the bias of the v-th channel. Here, on-off keying (OOK) is used on each color channel, so each symbol carries three bits. For symbol decision, M adjacent symbols sampled with an oversampling ratio N are taken as the input to construct a 2D input layer. The input layer contains three columns, occupied by the signals from the red, green, and blue channels. The first convolutional layer filters the MN×3×1 input with five kernels of size 3×3×1. The second convolutional layer takes the output of the first convolutional layer as its input and filters it with three kernels of size 5×5×5. The ReLU activation function and batch normalization are applied to the output of each convolutional layer. Next, the output of the second convolutional layer is flattened to a 1D vector, and a fully connected layer with eight neurons performs the classification. After activation by the ReLU function, the softmax function [14] produces the probability distribution over the eight possible symbols. With this efficient, low-complexity network, the received symbol can be recognized from the features extracted across the color channels and neighboring symbols.
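A minimal sketch of this architecture is given below, using PyTorch as an assumed framework (the paper does not name one); the padding, the ordering of batch normalization and ReLU, and the class name are my assumptions, while the kernel counts and sizes, the flattening step, the eight-way fully connected layer, and the softmax output follow the description above:

```python
import torch
import torch.nn as nn

class OCC2DCNN(nn.Module):
    """Sketch of the 2D-CNN in Fig. 1(b).

    Input shape: (batch, 1, M*N, 3) -- M symbols at oversampling ratio N,
    with the three columns holding the red, green, and blue signals.
    """
    def __init__(self, M=5, N=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 5, kernel_size=3, padding=1),   # five 3x3x1 kernels
            nn.BatchNorm2d(5),
            nn.ReLU(),
            nn.Conv2d(5, 3, kernel_size=5, padding=2),   # three 5x5x5 kernels
            nn.BatchNorm2d(3),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * M * N * 3, 8),  # fully connected layer, 8 symbol classes
        )

    def forward(self, x):
        # Softmax gives the probability distribution over the 8 OOK symbols
        return torch.softmax(self.classifier(self.features(x)), dim=1)
```

With M = 5 and N = 3 (the values chosen in Section 5), the input window is 15×3 and the flattened feature vector has 3×15×3 = 135 elements.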

2.2 Training details

The objective of the training is to minimize the loss, represented by the cross-entropy. In this work, the CNN is trained with the received signals from 50 images. The random messages used in the training and validation processes are generated separately and captured in two different videos to avoid any memory effect in the neural network. The mini-batch size is set to 200 and the number of epochs to 60. The learning rate starts at 1×10−2 and is gradually decreased to 1×10−5 to finely optimize the parameters during training. Note that hyperparameters such as the number of layers, the kernel sizes, and the batch size are optimized to keep the network relatively small while maintaining good performance.
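A hedged sketch of such a training configuration is shown below; the optimizer (Adam), the exponential decay schedule, and the dummy dataset are assumptions made only to keep the example runnable, whereas the cross-entropy objective, batch size of 200, 60 epochs, and the 1×10−2 to 1×10−5 learning-rate range come from the text:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the signals extracted from the 50 training images;
# real inputs are the scaled (M*N)x3 windows described in Sec. 2.1.
x_train = torch.randn(10000, 1, 15, 3)
y_train = torch.randint(0, 8, (10000,))
loader = DataLoader(TensorDataset(x_train, y_train), batch_size=200, shuffle=True)

model = OCC2DCNN(M=5, N=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)       # optimizer assumed
# Decay 1e-2 -> ~1e-5 over 60 epochs: gamma = (1e-3) ** (1 / 60) ~= 0.891
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.891)

for epoch in range(60):
    for x, y in loader:
        optimizer.zero_grad()
        probs = model(x)                                         # softmax outputs
        loss = F.nll_loss(torch.log(probs + 1e-12), y)           # cross-entropy on probabilities
        loss.backward()
        optimizer.step()
    scheduler.step()
```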

3. XOR-based data loss compensation

In this paper, we propose an XOR-based data loss compensation (XDLC) scheme to compensate for the data loss during the gap-time; it provides high transmission efficiency and an improved data rate when the gap-time length T is larger than half of the frame period (1/frame rate). Here, (1/frame rate − T) is defined as the actual frame recording length, R. The complete packet can be transmitted once or multiple times, followed by an XOR-based parity-check packet. The lengths of the payload and the parity-check packet are crucial to guarantee effective recovery of the data loss, as discussed in detail below.

The parity-check packet is generated as follows. Firstly, the payload of length P is separated into multiple sub-packets according to the length of the parity-check packet, C. The length of the last sub-packet should be smaller than C/2. Next, we replace the last sub-packet with a pseudo-sub-packet of length C by padding the last sub-packet from the end to fill the missing part. Then, the parity-check packet is generated by XOR-ing all the sub-packets. The header is ignored in the following analysis to simplify the description, and the gap-time is assumed to be slightly larger than half of the frame period. When two repeated packets plus one parity-check packet are used, the data loss can be recovered either (i) directly from the two repeated packets or (ii) from the two repeated packets plus the parity-check packet. For the latter, three possible cases exist and are discussed here.
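The following sketch illustrates this parity-check generation (the exact padding rule for the pseudo-sub-packet is only paraphrased in the paper; padding the short tail with the bits immediately preceding it is an assumption made here, as is the function name):

```python
import numpy as np

def xor_parity_packet(payload, C):
    """Sketch of the parity-check packet generation in Sec. 3.

    payload : 1-D numpy array of bits (0/1), length P.
    C       : parity-check packet length; the last sub-packet must be
              shorter than C/2 (requirement 3).
    """
    P = len(payload)
    n_full = P // C
    subpackets = [payload[i * C:(i + 1) * C] for i in range(n_full)]

    tail = payload[n_full * C:]                    # last, short sub-packet
    if tail.size:
        assert tail.size < C / 2, "requirement 3: last sub-packet shorter than C/2"
        # Pseudo-sub-packet: extend the tail to length C (padding rule assumed)
        pad = payload[n_full * C - (C - tail.size): n_full * C]
        subpackets.append(np.concatenate([tail, pad]))

    parity = np.zeros(C, dtype=payload.dtype)
    for sp in subpackets:
        parity ^= sp                               # bit-wise XOR over all sub-packets
    return parity
```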

Case 1 in Fig. 2(a) describes the situation where the leading part of the payload is lost. This data loss can be recovered by an XOR operation between the received payload and the parity-check packet. Assume that the length of the lost part of the parity-check packet is η. To fully recover the packet, the length of the lost payload, P + C − η − R, should be no larger than the length of the received parity-check packet, C − η, i.e., P + C − η − R ≤ C − η. This gives a limitation on the payload length, i.e., requirement 1: P ≤ R.

Fig. 2. (a)-(c) Different cases of data loss.

Figure 2(b) shows Case 2, where the leading and trailing parts of the payload are received with a total length of 2P − T while the middle part is missing. Fortunately, a complete parity-check packet is captured, and the lost part can be restored if its length, P − (2P − T), is no larger than the length of the parity-check packet C, i.e., P − (2P − T) ≤ C. Hence, the length of the parity-check packet should satisfy requirement 2: T − P ≤ C.

Case 3 shows the situation where the trailing part of the payload is lost. Thanks to the padding of the last sub-packet, we can first restore it by an XOR operation and then use the restored pseudo-sub-packet to recover the residual lost part. As shown in Fig. 2(c), to correctly restore the last sub-packet, the length of the packet that can be recovered first should be no smaller than the length of the last sub-packet γ, i.e., γ ≤ (C − η) − (T − P − η − γ), which reduces to the same condition as requirement 2. As mentioned, the length of the last sub-packet γ should be smaller than C/2 so that at least one sub-packet exists at both the leading and trailing parts of the pseudo-sub-packet, i.e., requirement 3: γ ≤ C/2. In that case, the last sub-packet can be recovered no matter whether the leading or the trailing part of the parity-check packet is received, as in Cases 1 and 3. Note that this requirement can easily be satisfied by properly adjusting C. As shown in Fig. 3(a), with the XDLC scheme, the payload length can be set close to the actual frame recording length. In addition, the XDLC scheme can be extended to other cases with different gap-times.
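As a toy illustration of the recovery principle (a simplified Case 1 with no partial parity loss and no pseudo-sub-packet, continuing from the sketch above), XOR-ing the parity-check packet with every received sub-packet restores the missing one:

```python
import numpy as np

# Toy example: the leading sub-packet is lost in the gap, but the rest of
# the payload and the full parity-check packet are received.
rng = np.random.default_rng(0)
C = 8
payload = rng.integers(0, 2, size=3 * C)            # three full sub-packets, no short tail
parity = xor_parity_packet(payload, C)

received = [payload[C:2 * C], payload[2 * C:]]       # sub-packet 0 is missing
recovered = parity.copy()
for sp in received:
    recovered ^= sp                                  # XOR out the received sub-packets
assert np.array_equal(recovered, payload[:C])        # lost leading part restored
```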

Fig. 3. (a) Repetitive transmission and the proposed XDLC scheme. (b) Experimental setup. (c) Decoding diagram.

4. Experimental setup

The experimental setup and the decoding algorithm of the RGB-LED-based OCC system are shown in Fig. 3. The transmitted signal was controlled by a computer and fed into a field-programmable gate array (FPGA, Xilinx Spartan 6). The output signal from the driving circuit was used to drive an RGB LED. After a 40-cm free-space transmission, a plano-convex lens was placed in front of a smartphone (Huawei P20 Pro) at the receiver side. The signal was captured by video recording at 60 frames per second (fps) and processed offline. The resolution of each image was 1080×1920. The exposure compensation was set to −4 EV, while the exposure time and ISO were set automatically. After bit generation for each color, the signal was constructed packet by packet. Each packet had a 16-bit header and a payload.

5. Experimental results

To validate the effectiveness of the CNN in the RGB-LED-based OCC system, the experiment was carried out and the results are presented in the following. We first characterized the gap-time between adjacent frames to determine the parameters for the XDLC scheme. As shown in Fig. 4(a), the average gap-time is 8.597 ms and the maximum gap-time is 8.603 ms for the phone used. The actual frame recording time R is around 8.136 ms, which is smaller than the gap-time. According to the analysis above, the payload length P is set based on R to allow correct recovery of each packet (P ≤ 8.136 ms). Furthermore, because of the interference between adjacent symbols, the decoding of a symbol depends on its neighbors; symbols received at the edge of an image therefore cannot be decoded and are regarded as data loss, since their adjacent symbols are unavailable. The length of the parity-check packet C is thus set to around 2 ms.
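As a quick sanity check (using the measured values above; folding the roughly 2 ms edge-loss margin into C is my reading of the text), these choices satisfy the requirements derived in Section 3:

```python
# Measured timing from Fig. 4(a) and the text; all values in ms.
T_max = 8.603                 # maximum gap-time
R = 8.136                     # actual frame recording time
P = R                         # requirement 1: payload no longer than R
C = 2.0                       # chosen parity-check packet length

assert P <= R                 # requirement 1
assert T_max - P <= C         # requirement 2: T - P = 0.467 ms <= 2 ms
```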

Fig. 4. Parameter optimization. (a) Gap-time estimation. (b) BER performance under different numbers of symbols and oversampling ratios for the input layer of the 2D-CNN.

To optimize the size of the input, the BER performance is evaluated for different numbers of symbols and oversampling ratios in the input layer. The number of symbols per channel is set to 1, 3, and 5, and the oversampling ratio to 1, 3, 5, and 9. For a symbol rate of 34 kbaud, the BER with five symbols is slightly better than that with three symbols. An oversampling ratio of 3 is sufficient to ensure convergence of the BER performance, since the generated waveform provides enough features for symbol recognition. In the following, the number of symbols in the input is set to 5 and the oversampling ratio to 3.

Next, we compare the BER performance of different decoding algorithms. Here, each packet is directly transmitted three times. The 2D-CNN is applied to C1 and C2, respectively. The BER performance on C2 is slightly better than that on C1 because the blooming effect (overexposure) is less severe on C2. An ANN, a 1D-CNN, and double equalization (DE) [8] are also evaluated for comparison. For the neural networks, the numbers of inputs, outputs, hidden layers, and neurons in each layer are the same as those of the 2D-CNN. For the 1D-CNN, the kernel sizes of the two convolutional layers are set to 1×3×1 and 1×5×5, respectively. For the least-mean-square-based feedforward equalizer (FFE) in DE, a fractional T/3 FFE is used with 45 taps (5 symbols × oversampling ratio 3 × 3 channels). As shown in Fig. 5(a), the BER performance of the CNN schemes is better than that of DE and ANN, and the 2D-CNN achieves the best BER performance. Among the three channels, the green channel outperforms the other two. The reason is that the number of green sensing elements on the image sensor is larger than that of the red and blue ones, since human eyes are most sensitive to green. The higher received power therefore results in a superior signal-to-noise ratio in the green channel.

Fig. 5. BER performance and data rate comparison. (a) BER performance using various algorithms. (b) Data rate versus symbol rate and (c) BER performance with and without XDLC.

Furthermore, we investigate the BER performance with the XDLC scheme, extracting the signals from C2. As shown in Fig. 5(b), at the same symbol rate, the data rate achieved with data loss compensation is much higher than that with repetitive transmission. By using the XDLC scheme, as shown in Fig. 5(c), the data rate under the HD-FEC limit is improved from 39 kbit/s (217 symbols/frame × 3 bits/symbol × 60 frames/second) to 47 kbit/s (264 symbols/frame × 3 bits/symbol × 60 frames/second), i.e., a 21% data rate enhancement, which demonstrates the effectiveness of the proposed scheme.

6. Conclusion

We have proposed a CNN structure for data decoding in OCC with random data transmission. Benefiting from its pattern recognition capability, the CNN can effectively identify the received symbols based on the extracted features. In addition, we have proposed an XOR-based data loss compensation method to further enhance the transmission data rate. Based on the proposed schemes, a record-high data rate of 47 kbit/s is achieved by the RGB-LED-based OCC system using a rolling shutter camera in a smartphone.

Funding

Hong Kong Research Grants Council (GRF 14201818).

Disclosures

The authors declare no conflicts of interest.

References

1. Y. Hong, T. Wu, and L. K. Chen, “On the performance of adaptive MIMO-OFDM indoor visible light communications,” IEEE Photonics Technol. Lett. 28(8), 907–910 (2016). [CrossRef]  

2. M. Z. Chowdhury, M. T. Hossan, A. Islam, and Y. M. Jang, “A comparative survey of optical wireless technologies: architectures and applications,” IEEE Access 6, 9819–9840 (2018). [CrossRef]  

3. J. Wang, W. Huang, and Z. Xu, “Demonstration of a covert camera-screen communication system,” in Proceedings of International Wireless Communications and Mobile Computing Conference (IEEE, 2017), pp. 910–915.

4. C. Danakis, M. Afgani, G. Povey, I. Underwood, and H. Haas, “Using a CMOS camera sensor for visible light communication,” in Proceedings of IEEE Globecom Workshops (IEEE, 2012), pp. 1244–1248.

5. R. Deng, J. He, Y. Hong, J. Shi, and L. Chen, “2.38 Kbits/frame WDM transmission over a CVLC system with sampling reconstruction for SFO mitigation,” Opt. Express 25(24), 30575–30581 (2017). [CrossRef]  

6. H. Chen, X. Z. Lai, P. Chen, Y. T. Liu, M. Y. Yu, Z. H. Liu, and Z. J. Zhu, “Quadrichromatic LED based mobile phone camera visible light communication,” Opt. Express 26(13), 17132–17144 (2018). [CrossRef]  

7. Y. Q. Xu, J. Hua, Z. Gong, W. Zhao, Z. Q. Zhang, C. Y. Xie, Z. T. Chen, and J. F. Chen, “Visible light communication using dual camera on one smartphone,” Opt. Express 26(26), 34609–34621 (2018). [CrossRef]  

8. L. Liu, R. Deng, and L. K. Chen, “Spatial and time dispersions compensation with double-equalization for optical camera communications,” IEEE Photonics Technol. Lett. (posted 3 October 2019, in press). [CrossRef]

9. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of Conference on Neural Information Processing Systems (ACM, 2012), pp. 1097–1105.

10. W. Guan, J. Li, S. Wen, X. Zhang, Y. Ye, J. Zheng, and J. Jiang, “The detection and recognition of RGB-LED-ID based on visible light communication using convolutional neural network,” Appl. Sci. 9(7), 1400 (2019). [CrossRef]  

11. M. Meingast, C. Geyer, and S. Sastry, “Geometric models of rolling-shutter cameras,” https://arxiv.org/abs/cs/0503076.

12. P. Hu, P. H. Pathak, X. Feng, H. Fu, and P. Mohapatra, “ColorBars: increasing data rate of LED-to-camera communication using color shift keying,” in Proceedings of International Conference on emerging Networking EXperiments and Technologies (ACM, 2015), pp. 1–13.

13. D. T. Nguyen and Y. Park, “Data rate enhancement of optical camera communications by compensating inter-frame gaps,” Opt. Commun. 394, 56–61 (2017). [CrossRef]  

14. I. Goodfellow, Y. Bengio, and A. Courville, “Deep feedforward networks,” in Deep Learning, (The Massachusetts Institute of Technology Press, 2016).

