Interference Cancellation in MIMO NLOS Optical Camera Communication-based Intelligent Transport Systems



INTRODUCTION
Intelligent transportation systems (ITS) enable the sharing of safety and traffic-related information between vehicles and/or between vehicles and road infrastructure. The current ITS technologies are radio frequency (RF) based and are known as dedicated short-range communications (DSRC) [1]. However, DSRC has a number of drawbacks, including: (i) sharing the same carrier frequency with a number of RF-based services such as fixed satellite and wireless services, mobile services, radiolocation, amateur radio, etc., thus both experiencing and introducing interference; (ii) increased costs due to the need for RF-based on-board units (OBUs) and road-side units (RSUs) on vehicles and roadside infrastructure, respectively; (iii) vulnerability to the so-called broadcast storm, where several vehicles transmit at the same time, which may lead to packet collisions [2]; and (iv) potential hazards to the environment and human health [3]. As a viable alternative, visible light communications (VLC) technology [4] could be adopted in vehicular communications by simply using the lighting fixtures (i.e., front, back and internal), which are based on light emitting diodes (LEDs), organic LEDs (OLEDs), or laser diodes (LDs).
In VLC systems, two types of detectors are commonly used: (i) photodiodes (PDs) with a wide bandwidth (i.e., a few MHz to beyond a GHz depending on the PD's size), which are the most widely used owing to their high speed [5,6]; and (ii) image sensors (ISs) (i.e., multi-array PDs), as in cameras, with much lower data rates R_b. PD-based VLC systems are complex to implement for multiple access-based schemes and suffer from a high level of ambient light-induced noise, especially in outdoor environments in the presence of sunlight. Cameras, on the other hand, can be considered as imaging massive multiple-input multiple-output (MIMO) receivers (Rxs), thus offering spatial separation of multiple light sources and spatial diversity [7,8]. In addition, cameras can be used for multiple purposes such as vision and positioning [9]. Complementary metal-oxide semiconductor (CMOS) based cameras can offer a data rate higher than the frame rate of the camera by employing rolling-shutter (RS)-based sensor readout [10,11]. However, in ITS there are a number of issues with RS-based communication, including: (i) a very low exposure time, which can lead to a poor SNR, particularly over longer transmission spans when operating at night with streetlights of given illumination levels; (ii) a very small footprint of the light source at longer distances, which highly limits the performance of RS-based communications; and (iii) the need for short data packets that are repeated over a frame period, given that the footprint of the light source only covers a small portion of the image sensor.
Most reported camera-based (i.e., IS-based) VLC systems, commonly known as optical camera communications (OCC), are based on the line-of-sight (LOS) transmission mode. However, in some scenarios the transmitters (Txs) may not be within the LOS field of view (FoV) of the camera, such as: (i) two vehicles approaching a crossroad, where the vehicles are not in the FoV of each other but the light beams and their reflections from the road surface are; (ii) two vehicles travelling side-by-side on a motorway, where the camera can only pick up reflections of the headlights of the other car from the road surface; (iii) streetlights (SLs) on high lampposts along highways, which can be outside the camera's FoV while their reflections from the road surface can still be picked up by the camera; (iv) blocking of the LOS to a vehicle by a heavy goods vehicle, where the camera can still pick up reflections induced by the off-axis projected illumination; and (v) blocking of the SLs by tall trees in urban areas, where the camera can only capture reflected light.
In [12], a 2 × N non-LOS (NLOS) OCC system with a small overlap of the illumination regions was proposed for SL-to-vehicle communications. It was shown that, for an overlap area above a certain level, the information could not be extracted. Hence, in scenarios where the overlap region is larger and all the lights transmit data simultaneously, the information cannot be retrieved successfully. In [13], space- and time-division multiplexing (STDM) was proposed for NLOS OCC with large overlaps of the illumination footprints. However, using time division multiplexing (TDM) reduces the system throughput [14], particularly in urban areas with a large number of light sources (i.e., SLs, vehicles, display signs, etc.), see Fig. 1. In this situation, since the camera frame rate is low, selecting TDM as the multiple access technique is not the most efficient option. In addition, since the IS in a camera is a massive matrix of photodiodes, channel inversion (CI) can be used, with the channel access based on space division multiplexing (SDM).
CI, which eliminates the interference by directly forcing the interference terms to zero, has been investigated within the context of VLC for interference cancellation under the name of zero forcing (ZF) in multi-colour based MIMO and multiple-input single-output (MISO) links [6]. In [15], the performance of CI was compared with a combination of CI and successive interference cancellation (SIC) for a 3 × 3 MIMO wavelength division multiplexing (WDM) VLC system. It was shown that, for the optical band-pass filter and for signals having the same full width at half maximum, the bit error rate (BER) of CI with SIC was half that of CI without SIC. CI was also adopted for multiuser MIMO-orthogonal frequency division multiplexing (OFDM)-based VLC and compared with minimum mean square error (MMSE) at a transmit power of 0 dBW, where it offered a 0.5 bit/s/Hz lower spectral efficiency [16]. The performance of CI was theoretically investigated in [17] for MIMO carrier-less amplitude and phase (CAP) Rxs in VLC, showing an improvement in the signal-to-noise ratio (SNR) of ∼3 dB at a BER of 3.8 × 10^−3 when combined with optimally-ordered SIC (OSIC), at the cost of increased complexity. However, in [15][16][17], the performance of CI in a NLOS link or a nonlinear VLC system was not investigated. For a multiuser MISO downlink VLC system such as in airplane cabins, CI was adopted in [18], showing an average per-user rate ∼0.3 bps/Hz lower than that of the Tomlinson-Harashima precoding scheme at an LED directivity of 20° in a dynamic scenario. CI was also adopted in a multiuser MIMO (MU-MIMO) underwater VLC system with a spectral efficiency of ∼15 bps/Hz compared with ∼8 bps/Hz for time division multiple access (TDMA) [14].
Note that CI is the preferred option provided the SNR is high. With an ill-conditioned channel matrix, CI needs a large normalization factor, which dramatically reduces the SNR [19]. Therefore, at low SNRs CI cannot achieve a good performance since noise (not the interference) is the dominant impairment of the system, which motivates the use of MMSE precoding. It is worth noting that, at higher SNRs (i.e., high transmit power levels), the performance of CI is the same as that of MMSE [16,17]. Note that, in linear MMSE precoding, the interference at the Rxs is not forced to zero; instead, there is a trade-off between noise and interference, depending on which is dominant at the Rx. In addition, in [20] maximum likelihood (ML) detection was adopted for generalised space shift keying signalling in an indoor MIMO VLC environment, which achieved a symbol error rate of 10^−4 at an SNR of ∼23.5 dB using 4 Rxs. A fast ML detection technique based on space-collaborative constellations was proposed in [21], which achieved the same performance as conventional ML detectors with reduced complexity. However, ML detectors are too complex and, since the frame rate of the camera is limited, they may introduce latency, which is unacceptable in vehicle-to-vehicle (V2V) communications. Therefore, in this work we have adopted CI due to its lower complexity compared with MMSE, ML, and maximum likelihood sequence estimation (MLSE).
In this paper, to the best of the authors' knowledge, we adopt CI for OCC for the first time and propose two detection schemes, (i) CI; and (ii) frame subtraction and CI (FSCI), for NLOS MIMO-based OCC in order to mitigate the impact of crosstalk. We investigate the performance of CI-based NLOS OCC by considering the non-linearity of the camera, i.e., gamma correction. We show that: (i) error-free transmission is possible even under a high level of crosstalk, in contrast to hybrid selection/equal gain combining (HS/EGC); (ii) at a distance of 5 m and an ISO of 6400, CI offers improved performance compared with HS/EGC and FSCI by ∼1.1 and 1.2 normalized eye-height, respectively, at the cost of increased processing time (i.e., by 3 times); and (iii) for Txs located close to each other, both CI and FSCI offer improved performance compared with HS/EGC. The rest of the paper is organized as follows. In Section II, the system model is presented and in Section III, the proposed detection algorithms are described. Section IV outlines the experimental test setup and results. Finally, Section V concludes the paper.

SYSTEM SETUP
The schematic block diagram of the proposed system is illustrated in Fig. 2(a). M independent pseudo-random binary sequences in the on-off keying non-return-to-zero (OOK-NRZ) format, where M is the number of Txs, are generated as the payload. In the packet generator module, an N_pr-bit preamble and an M-bit pilot are added to the payload, see Fig. 2(b), to indicate the start of the packet and to obtain the channel coefficient matrix at the Rx, respectively. The gen- , and x l 0 is set to "0". Note that, if the pilot signals are not interference-free, then the channel coefficient matrix will be inaccurate. Therefore, in order to ensure interference-free pilots, for the l-th Tx only the l-th bit of the pilot signal is assigned "1" and the remaining bits are set to "0". The output of the differential signalling block is then used for intensity modulation of the light sources for transmission over the NLOS channel, with the impulse response given in [12], where R_{t,l}(x, y), R_r(x, y) and ψ_c(x, y) denote the Lambertian pattern of the l-th Tx, the reflection pattern of the floor and the angle of incidence, respectively. U and V are the numbers of pixels in the horizontal and vertical directions of the image, respectively. The remaining terms of the impulse response are the area element dA on the floor, the effective area of the camera lens, the distance from the l-th Tx to dA, and the distance from dA to the Rx. Moreover, x_{u−1}, x_u, y_{v−1}(x) and y_v(x) are the horizontal and vertical boundaries of the area covered by the pixel (u, v), respectively.
At the Rx, a camera records the intensity-modulated light sources (i.e., it captures the changes in the light intensity), which is described by the discrete-time model given in [22], where N = V × U, and G_{N×1}, T_exp, T = 1/R_f, and Γ(.) denote the gain of the pixel array, the exposure time, the inter-frame time, and the Heaviside step function, respectively. The camera's sampling rate R_f is set to 2R_b in order to avoid inter-symbol interference (ISI). The received signal is expressed in terms of the channel matrix H_{N×M}(kT), which is formed from the channel vectors H^l_{1×N}(kT) of the individual Txs. Note that, in cameras, the non-linear functions of debayering, down-sampling and gamma correction, which are independent of the input signal, are applied to the RAW images to convert them to the JPEG format, where g(.) is the debayering function, and Y^R_{N×1}, Y^G_{N×1} and Y^B_{N×1} are the red, green and blue components of the image, respectively. Note that, since typical SLs are phosphor-coated LEDs, in this work we do not consider the colours; therefore, following the selection of odd/even frames, all frames are converted to grayscale as described in [23]. In order to reduce the complexity of processing the received data, we apply an m × n binning to Y_{N×1}(kT) with the indices o_1 = 1, ..., n and o_2 = 1, ..., m, where 1_{1×mn} is the matrix of ones and N′ = N/(mn) is the number of pixels after binning. Based on the Lyapunov central limit theorem, when mn is large, each binned value b_i is a Gaussian distributed random variable. Note that, following binning, the number of pixels cannot be less than the number of Txs.
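As a rough illustration of the binning step, the following Python sketch averages non-overlapping blocks of a grayscale frame. The function name bin_frame, the choice of m rows by n columns per block, the use of averaging rather than summation, and the nested-list frame layout are all assumptions made for illustration; the exact normalisation used in the paper is not recoverable from the text.

```python
def bin_frame(frame, m, n):
    """Average non-overlapping blocks of m rows by n columns.

    frame is a list of V rows, each a list of U grayscale values.
    Averaging (rather than summing) is an assumption of this sketch.
    """
    V, U = len(frame), len(frame[0])
    if V % m or U % n:
        raise ValueError("frame size must be a multiple of the bin size")
    binned = []
    for r in range(0, V, m):
        row = []
        for c in range(0, U, n):
            # Collect the m*n pixels of this block and replace them by their mean.
            block = [frame[i][j] for i in range(r, r + m) for j in range(c, c + n)]
            row.append(sum(block) / (m * n))
        binned.append(row)
    return binned


# Hypothetical 4 x 4 frame reduced with 2 x 2 binning.
frame = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 2, 6, 6],
         [4, 4, 8, 8]]
binned = bin_frame(frame, 2, 2)
```

In this toy case the 16 pixels are reduced to 4 binned values, so the matrices handled by the detection stages shrink by the factor mn.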

PROPOSED ALGORITHMS
We propose two detection schemes to extract the payload from the captured video streams and compare them with the HS/EGC algorithm in [12]. The operation of the algorithms is best described by the flowchart shown in Fig. 3. For the entire captured video, the proposed algorithms, which are applied first to the odd frames Y^o_{N×1} and then to the even frames Y^e_{N×1}, have three stages: (i) preamble detection; (ii) pilot detection; and (iii) payload extraction, which are controlled by two flags, i.e., Flag_pr and Flag_pi. Here, the subscripts "pr" and "pi" refer to the preamble and pilot, respectively. Note that, at the start of the algorithm, both flags are set to "0". In order to reduce the level of ambient light, each frame is subtracted from the previous frame to give ∆Y^o_{N′×1}(kT), where N′ is the number of pixels after binning. At the preamble detection stage, which is the same for all three schemes, since the Txs are synchronized and share the same preamble, the absolute value |.| of ∆Y^o_{N′×1}(kT) is averaged to give S_k. This procedure is performed N_pr times. S_k is then applied to a preamble hard threshold module, the output of which is compared with the preamble. When a match is found, Flag_pr is toggled to "1" and detection proceeds to the next stage. In the following subsections, we describe each of the proposed detection techniques.
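The frame subtraction and preamble detection stage can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the helper names, the toy frames, the threshold value and the rule that a large mean absolute difference maps to a "1" are hypothetical, and consecutive frames are subtracted directly instead of treating the odd and even frames separately.

```python
def mean_abs_difference(prev_frame, curr_frame):
    """Return S_k: the mean of |curr - prev| over all (binned) pixels."""
    diffs = [abs(c - p) for p, c in zip(prev_frame, curr_frame)]
    return sum(diffs) / len(diffs)


def find_preamble(s_values, preamble, threshold):
    """Hard-threshold each S_k and return the index just after the first
    window of bits that matches the preamble, or -1 if there is no match."""
    bits = [1 if s > threshold else 0 for s in s_values]
    n = len(preamble)
    for start in range(len(bits) - n + 1):
        if bits[start:start + n] == preamble:
            return start + n
    return -1


# Hypothetical flattened frames (3 pixels each) and a toy 4-bit preamble.
frames = [[10, 10, 10], [10, 10, 10], [40, 30, 20], [40, 30, 20],
          [10, 10, 10], [40, 30, 20], [40, 30, 20]]
s_values = [mean_abs_difference(a, b) for a, b in zip(frames, frames[1:])]
payload_start = find_preamble(s_values, preamble=[1, 0, 1, 1], threshold=10.0)
```

Because the subtraction removes any constant offset common to both frames, a static ambient light level does not contribute to S_k in this sketch.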

A. Frame Subtraction and Channel Inversion (FSCI)
Following preamble detection, H_{N′×M}(kT) is generated for even values of k using the pilot bits of the Txs. Since the pilots are unique, with only a single bit set to "1" for each Tx, the frame related to each pilot is used to construct each row of the FSCI matrix of coefficients, i.e., H_{N′×M}(kT), and on completion Flag_pi is toggled to "1". Finally, in the payload extraction stage, in order to estimate the values of X^o_{M×1}(kT), both sides of (9) are multiplied by H_{N′×M}(kT)^{−1}. Note that, for N′ ≠ M, H_{N′×M} is not square and thus has no inverse matrix. Instead, a pseudo-inverse matrix can be used, which is given as [24]: H^+_{M×N′} = (H^H_{M×N′} H_{N′×M})^{−1} H^H_{M×N′}, where H^H_{M×N′} is the Hermitian transpose of H_{N′×M}. Accordingly, the transmitted signal can be estimated as ∆X^o_{M×1}(kT) = H^+_{M×N′} ∆Y^o_{N′×1}(kT). However, this approach also leads to noise amplification (i.e., a reduced SNR). Finally, ∆X^o_{M×1}(kT) is passed through a hard threshold module th_pa to estimate D_{M×1}. With the upper and lower levels of ∆X^o_{M×1}(kT) being "1" and "0", respectively, th_pa is set to 0.5. This stage is repeated for N_d iterations.
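The pseudo-inverse step can be illustrated with a small Python sketch for the two-Tx case. The channel values, function names and three-pixel frame are hypothetical, and because pixel intensities are real-valued the Hermitian transpose reduces to an ordinary transpose here. The sketch uses only the standard left pseudo-inverse (H^T H)^−1 H^T and does not attempt to reproduce the authors' Matlab processing.

```python
def pseudo_inverse_two_tx(H):
    """Left pseudo-inverse (H^T H)^-1 H^T of a real N x 2 matrix H,
    returned as a 2 x N list of lists."""
    # Entries of the symmetric 2 x 2 Gram matrix H^T H = [[a, b], [b, d]].
    a = sum(row[0] * row[0] for row in H)
    b = sum(row[0] * row[1] for row in H)
    d = sum(row[1] * row[1] for row in H)
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("footprints are not separable (H^T H is singular)")
    inv = [[d / det, -b / det], [-b / det, a / det]]
    # Multiply the 2 x 2 inverse by H^T (each column of H^T is a row of H).
    return [[inv[i][0] * row[0] + inv[i][1] * row[1] for row in H]
            for i in range(2)]


def estimate(H_plus, y):
    """Apply the pseudo-inverse to a received pixel vector y."""
    return [sum(w * v for w, v in zip(row, y)) for row in H_plus]


# Hypothetical channel: 3 binned pixels (rows) and 2 Txs (columns).
H = [[1.0, 0.2],
     [0.6, 0.5],
     [0.1, 0.9]]
H_plus = pseudo_inverse_two_tx(H)

# Noise-free received difference frame when only Tx 1 sends a "1".
delta_y = [row[0] * 1 + row[1] * 0 for row in H]
x_hat = estimate(H_plus, delta_y)
bits = [1 if v > 0.5 else 0 for v in x_hat]  # hard threshold at 0.5
```

With noise added to delta_y the same matrix H_plus also scales the noise, which is the noise amplification mentioned above; the closer the two footprints are to each other, the smaller the determinant and the larger this amplification.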

B. Channel Inversion (CI)
The CI algorithm is very similar to FSCI except that H^+_{M×N′} is applied to Y_{N′×1}(kT) to obtain X^o_{M×1}(kT), which is then compared with th_pa and successively subtracted to estimate D_{M×1}. In CI, the detection is carried out on each received frame. As a result, the output of CI is not affected by defects in frame subtraction due to gamma correction. Note that the only key difference between CI and FSCI is that in FSCI, which uses frame subtraction as in HS/EGC, the inverse matrix of channel coefficients is applied to the subtracted frames, whereas in CI it is applied to each frame directly. In addition, as mentioned, the pilot bits create the pattern of the Txs' footprints in the image corresponding to a set of channel coefficients, which are later used to extract the information from the captured video stream at the Rx. In general, however, geometric distortion effects originating from the motion of the users, unfocused images, bad weather conditions, etc. have to be addressed in OCC. In the proposed algorithms, as long as the shape of the footprint does not change significantly within the packet duration, the impact of geometric distortions will be negligible since the pilot and payload bits experience the same pattern of light. In [25,26], different techniques are discussed to mitigate the image distortion.
Figure 4(a) illustrates the experimental setup for the performance evaluation of the proposed system. A 1066-bit packet composed of a payload, pilot, and preamble of 1000, 2 and 64 bits, respectively, was generated in the OOK-NRZ format. Note that, for a payload of length 1000, the probability of the 64-bit preamble repeating within the payload is 5 × 10^−17, i.e., extremely low [13]. Two Txs, each composed of two chip-on-board LEDs (COBLEDs) consisting of 48 small LEDs mounted on a heat sink, were located on a frame at a height of H_t = 2 m above the floor level. The COBLEDs on each Tx are spaced 6 cm apart (see the inset in Fig. 4(a)) and have Lambertian beam profiles with orders of m = 1 and 2/3 in the horizontal and vertical directions, which are equivalent to viewing angles of ϕ_{t,h,l} = 60° and ϕ_{t,v,l} = 70° in the horizontal and vertical directions, respectively. The normalized illumination profile of both Txs, projected onto the floor covered with a white sheet of paper with a reflection coefficient of ∼0.67 and with a distance between the two Txs of D = 1 m, was measured using a lux meter, see Fig. 4(b). Note that, in the case of non-uniform reflection (e.g., a wet road, which results in a shinier surface), there will be very little interference in the NLOS link, so the problem of interference is inherently resolved. In this work, we use a matt surface to explore the worst-case scenario. A camera (Canon EOS 100D, with a sensor size of 22.3 × 14.9 mm and a Canon EF-S 18-55 mm lens) positioned at a distance L_c of 5 m from the centre of the illumination plane on the floor with an elevation angle of 20° was used to capture the reflected light. For the camera with R_f of 60 fps (i.e., a data rate of 30 bps), we set the ISO, exposure time T_exp and aperture f-stop to 6400, 0.01 s, and f/4, respectively. For each experiment, a 3-minute-long RGB video stream with a 720p resolution was recorded for off-line processing in Matlab. Note that, in the detection process we used a binning of 10. A camera-based Rx (with IS) offers a spatial diversity capability given that the pixel size in the IS is in the order of ∼µm, which is larger than half the wavelength of the light. Figure 5 shows the eye diagrams of the received signals for HS/EGC, CI, and FSCI before th_pa, where R_b is reduced ten times in order to have 10 samples per symbol. Note that, with high levels of interference, it is not possible to establish error-free transmission with HS/EGC.
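The packet figures quoted in the setup can be checked with a short Python calculation. The union bound over all window positions, with independent and equiprobable payload bits, is an assumption about how the quoted preamble probability may be obtained; the derivation in [13] may differ. The variable names are illustrative only.

```python
payload_bits = 1000
pilot_bits = 2
preamble_bits = 64

# Number of positions at which a 64-bit window fits inside the payload.
positions = payload_bits - preamble_bits + 1

# Union bound on the probability that some window equals the preamble,
# assuming independent and equiprobable payload bits (sketch assumption).
p_false_preamble = positions * 2.0 ** -preamble_bits

# The frame rate is twice the bit rate (R_f = 2 R_b), so 60 fps gives 30 bps.
frame_rate_fps = 60
bit_rate_bps = frame_rate_fps / 2
packet_bits = payload_bits + pilot_bits + preamble_bits
packet_duration_s = packet_bits / bit_rate_bps

print(f"{p_false_preamble:.2e}", packet_bits, round(packet_duration_s, 1))
```

Under these assumptions the bound is of the order of 5 × 10^−17, in line with the quoted value, and one 1066-bit packet occupies roughly 35.5 s of the recorded video at 30 bps.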
We observe the following for all eye diagrams: (i) the eye symmetry is maintained, thus indicating very little contribution from channel distortion; (ii) the sharp slopes indicate good tolerance to timing jitter; (iii) a wide eye width in some cases (i.e., reduced interference and noise, thus a lower BER); (iv) multiple levels; and (v) higher SINR levels in some cases. As shown in Figs. 5(a) and (b), for HS/EGC and FSCI there are 7 and 4 levels, respectively, which are best explained with reference to the state diagrams shown in Fig. 6. In the case of HS/EGC, L_2 is the main cause of errors in the data extraction, which occur when the Txs have different initial states and toggle at the same time. In this situation, the signal of one Tx is subtracted from the signal of the other Tx in the overlapping region. In the case of FSCI, since channel inversion is used to recover the data, the impact of interference is mitigated. Additionally, for the CI scheme, the three levels shown are based on the states of the desired and interfering Txs as outlined in Table 1. The multiple levels observed in the eye diagrams for CI and FSCI are mainly due to gamma correction, while for HS/EGC they are due to both gamma correction and interference. In the presence of gamma correction, the recorded intensity in the overlapping area is no longer the superposition of the individual light contributions; hence, following CI, new levels are evident in the eye diagrams. Figure 7(a) shows the simulated normalized eye-height (i.e., normalized to the widest eye opening, w_eye,1/w_eye,2) as a function of D/H_t for HS/EGC, CI and FSCI and for two Txs with a viewing angle of 60°. Note that CI and FSCI have almost flat responses over all D/H_t up to 2 m/m and outperform HS/EGC. The negative value of the normalized eye opening for HS/EGC shows that L_2 is less than L_3, hence the data cannot be recovered.
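The way gamma correction can create an additional level after CI can be illustrated with a toy Python example. This is a sketch under loudly stated assumptions: the encoding exponent of 1/2.2, the 2 × 2 linear channel gains and the absence of clipping and noise are all hypothetical and are not taken from the paper, so the numbers only show the mechanism and do not reproduce the measured eye diagrams.

```python
GAMMA = 1 / 2.2  # assumed encoding exponent, not taken from the paper


def gamma_correct(v):
    """Toy camera non-linearity applied to a linear light intensity."""
    return v ** GAMMA


# Hypothetical linear channel gains: rows are pixels, columns are Txs.
H_LINEAR = [[1.0, 0.5],
            [0.5, 1.0]]


def camera_output(x):
    """Gamma-corrected pixel values for the Tx states x = (x1, x2)."""
    return [gamma_correct(row[0] * x[0] + row[1] * x[1]) for row in H_LINEAR]


# Channel matrix estimated from the interference-free pilots (1, 0) and (0, 1).
col1, col2 = camera_output((1, 0)), camera_output((0, 1))
det = col1[0] * col2[1] - col2[0] * col1[1]


def ci_estimate_tx1(y):
    """First element of H_est^-1 y for the 2 x 2 estimated channel matrix."""
    return (col2[1] * y[0] - col2[0] * y[1]) / det


# Recovered level of Tx 1 for every (desired, interfering) state pair.
levels = {x: round(ci_estimate_tx1(camera_output(x)), 3)
          for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

In this toy model the recovered value for Tx 1 is 0 whenever Tx 1 sends "0" and 1 when only Tx 1 sends "1", but it drops to an intermediate value of roughly 0.7 when both Txs send "1", because the gamma-corrected intensity in the overlap is smaller than the sum of the two gamma-corrected pilots. This gives three distinct levels for a two-level signal.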
Figure 7(b) depicts the normalized eye-height against the normalized ambient light illumination level γ (i.e., normalized to the peak illumination level of Tx_1) for the proposed algorithms, P_t of 20 dBm, and D/H_t = 0.5 m/m. Here as well, HS/EGC displays the worst normalized eye-height of < −0.3 compared with the average normalized heights of 0.8 and 0.9 for FSCI and CI, respectively, for 0.5 < γ < 4. Note that HS/EGC displays an almost flat response, and for the FSCI and CI algorithms the normalized heights of the eyes are higher by 0.75 and 0.85 (on average), respectively, compared with HS/EGC. For FSCI and CI, the increase in the eye width beyond γ = 0.5 is due to the increase in the intensity of the low-level pixels, hence the entire image is shifted to the more linear region of the gamma correction curve.

Table 1. Levels of the CI scheme for the states of the desired and interfering Txs.
Level    Desired Tx    Interfering Tx
L_1      "0"           "0"
L_1      "0"           "1"
L_3      "1"           "0"
L_2      "1"           "1"

EXPERIMENTAL RESULTS
Figure 8(a) depicts the measured BER performance as a function of the transmit power for the proposed algorithms and for ISO levels of 3200 and 6400, T_exp = 0.01 s, L_c = 5 m and an f-stop of f/3.5 with only a single Tx in the absence of ambient light. The results show almost the same BER performance for HS/EGC and FSCI, with a marginal power penalty of 0.1 dB irrespective of the ISO. Note that, for an ISO of 6400, a 3 dB lower transmit power is required compared with an ISO of 3200 for all schemes at BERs lower than the forward error correction (FEC) limit of 3.8 × 10^−3. In addition, there is a ∼1 dB power penalty between CI and HS/EGC irrespective of the ISO. Finally, Fig. 8(b) shows the BER as a function of the transmit power for the three schemes and for link spans of 5 and 10 m, an ISO of 6400, T_exp of 0.01 s and an f-stop of f/4, displaying the same profile as in Fig. 8(a). Note that, for a link span of 5 m, the required transmit power is 3 dB lower than for a link span of 10 m at BERs below the FEC limit for all schemes.

CONCLUSION
In this paper, we proposed two algorithms based on CI for NLOS MIMO OCC systems for interference cancellation and for extracting the payload in the presence of gamma correction, noise, and interference. We showed that the proposed algorithms tolerate higher levels of interference compared with HS/EGC. For the OOK-NRZ signalling format, we showed that the eye diagrams display multiple levels due to gamma correction and interference. We also showed that, for a higher transmit power level of 20 dBm, the CI-based algorithms outperformed HS/EGC because of their higher tolerance to the ambient light and to the ratio of the spacing to the height of the Txs (i.e., 0.25 m/m). However, for lower transmit power levels (i.e., 6 dBm < P_t < 13 dBm), HS/EGC offered almost the same BER performance as FSCI (i.e., below the FEC limit). Moreover, HS/EGC showed a ∼1.1 dB worse performance compared with CI due to the increased level of noise as a result of frame subtraction. Although FSCI showed a ∼1.2 dB power penalty compared with CI, th_pa in this scheme was fixed to 0.5. Compared with HS/EGC, which is limited to a few scenarios, the proposed CI and FSCI algorithms offered a higher tolerance to the spacing between the Txs, but at the cost of increased computation time (i.e., three times for a binning of 10) for obtaining the inverse channel matrix. The work presented in this paper considered the relaxed situation of a static communications environment. However, in practice, mobility and the issues it causes, such as the near-far problem [27], should be considered, which will be the subject of our future work.