Optica Publishing Group

Watching the watchers: camera identification and characterization using retro-reflections

Open Access

Abstract

A focused imaging system such as a camera will reflect light directly back at a light source in a retro-reflection (RR) or cat-eye reflection. RRs provide a signal that is largely independent of distance, providing a way to probe cameras at very long range. We find that RRs provide a rich source of information on a target camera that can be used for a variety of remote sensing tasks to characterize the target, including prediction of its rotation and focusing depth as well as cell phone model classification. We capture three RR datasets to explore these problems with both large commercial lenses and a variety of cell phones. We then train machine learning models that take a RR as input and predict different parameters of the target camera. Our work has applications in input devices, privacy protection, identification, and image validation.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Retro-reflections (RRs) or cat-eye reflections occur when a focused imaging system reflects light directly back to the light source. Because light between the RR and the light source can be collimated and therefore loses little intensity during propagation for distances below the Rayleigh range, RRs can be used for sensing across very large distances even with only moderately sized probe beam widths. Small RRs placed on the moon, for example, can be probed using eye-safe laser power from a telescope on Earth, roughly 400,000 kilometers away. In recent years, there has been considerable work using retro-reflections in security and privacy settings to detect hidden spy cameras [1–8] or optical sights [9–11], where a light source illuminates a scene and the resulting RRs from hidden cameras or other small optical systems [12,13] are subsequently detected and localized. In this work, we wish to expand the scope of uses for RRs beyond detection by asking the question "what other information is contained in RRs?" We find that for many target cameras the shape of a RR can give us information on many imaging parameters, such as where a camera is looking, the focal length in use, and even the depth at which the camera is focused, and can also be used to classify cellphone models. Our work has applications in control, privacy, identification, and image validation.

Our RR probe, shown in Fig. 1, consists of a light source, either a laser or LED [14,15], and a camera with a shared optical path. First, we send light into the scene, where it is imaged by a target camera system. Some of the imaged light reflects off the target camera's sensor array and is then re-focused by the target camera's lens back along the input light path as a RR. In general, RRs are quite bright compared to standard diffuse reflections because the return light is well-collimated by the focusing nature of the target camera's lens, so very little light is lost on the return path. The intense brightness of RRs has made them useful in detecting small hidden cameras or cameras at long distances. However, we find that RRs also contain rich information about the imaging parameters of a target camera. For example, if the target camera rotates or changes focus, changes appear in the RR, allowing us to accurately predict where a target camera is looking and at what depth it is focusing. We also see signals related to the structure and the Bidirectional Reflectance Distribution Function (BRDF) of the sensor and filters in the optical path.


Fig. 1. Retro-Reflection Imaging: In retro-reflection imaging, (a) first the probe emits light into a scene, then (b) a target camera images this light onto a sensor array. Some light then reflects off the sensor array and returns through the target aperture and lens. (c) This light returns directly to our probe as a retro-reflection (RR), and we use a beam splitter to image the RR onto a camera in our probe. (d) The RR contains information about the target camera such as its rotation, allowing us to predict the view of a camera using a neural network.


In this paper, we conduct a study of RRs by combining existing diffraction-based models from Gong et al. [16,17] and Liu et al. [18] and extending them to include rotation effects of the target imaging system. We find the resulting simulation pipeline generates results matching our measurements for thin lens systems. However, for more complex imaging systems, the RRs differ significantly from the thin lens model, so we study these more complex RRs with a data-driven approach. We capture two datasets with commercial lens systems, one containing pairs of RRs and rotation angles and the other pairs of RRs and focal distances. We use these datasets to train two models that take a RR as input and predict the corresponding parameter from each dataset. The first is a linear model based on the popular lens aberration basis, the Zernike polynomials; this model shows that changes in the target camera can be understood as lens aberrations in the RR. Our second model is a standard convolutional neural network (CNN) that pushes performance further and performs well on both datasets. The resulting models are able to accurately perform gaze tracking to predict where a target camera is looking, as in Fig. 1.

We finally study RRs from different cell phones for gaze tracking and classification. We find that the RRs are unique between different models of cell phones, allowing for very accurate classification. We also find that cellphone gaze tracking is possible using RRs, albeit over a small angular range for some phone models. Additionally, we find that for many cellphones the RR is only present when the cellphone is actively imaging (e.g., the camera app is open), allowing us to determine when a cellphone is actively recording and what it is looking at. This could be used to validate images by providing outside proof of where and when a photo was taken.

2. Toy simulation model

Here we introduce a toy simulation model for the thin lens case in order to gain intuition on the RR problem. Our simulation model is adapted from Gong et al. [16,17] and Liu et al. [18] with an additional simulation step to account for the rotation of the lens at the exit of the target aperture. This step aligns the returning light from the target camera with the optical axis, allowing for simulation of larger target rotations without the beam leaving the simulation bounds and without greatly increasing the computation time that would be required to expand the bounds.

The unrolled optical path of our simulated geometry is shown in Fig. 2. First, we generate the illumination field $U_1(x,y,z=0)$ at the target aperture which we assume can be well approximated by a plane wave,

$$U_1 = e^{iky\sin{\theta}}$$
where $k=\frac {2\pi }{\lambda }$ is the wave number, $\theta$ is the lens rotation off of the optical axis, and for notational simplicity, we have dropped the spatial dependencies of $U_1$. Because our system is rotationally symmetric we only need to consider the magnitude of the rotation and define the coordinate system such that $y$ is the direction of this rotation. Next, we model $U_1$ going through the target lens as a circular masking operation followed by multiplication with a quadratic phase exponential,
$$U'_1 = U_1 \text{Circ}_r(x,y) e^{{-}i\frac{k}{2f_{t}}(x^2+y^2)}$$
where $r$ is the aperture radius of the target, $\text {Circ}_r(x,y)$ is a binary circle mask of radius $r$ centered at $(x=0,y=0)$ and $f_t$ is the target focal length. We then propagate $U'_1$ to the target’s sensor array,
$$U_2 = R[f_t+x_i]\big(U'_1\big)$$
where $x_i$ is the distance from the focal plane to the sensor array and $R[d](U)$ is the free-space propagation operator applied to $U$ over a distance $d$. We treat the sensor array as a reflective surface with a mask, $\text {Arr}(x,y)$; for example, $\text {Arr}(x,y)$ could be an arrayed pixel structure or a microlens array. $U_2$ is then modulated by the mask $\text {Arr}(x,y)$. In the unrolled version, the result then propagates another distance of $f_t+x_i$ before exiting the target lens. These operators give,
$$U_3 = R[f_t+x_i]\big(Arr(x,y) U_2\big)$$
$$U'_3 = U_3 \text{Circ}_r(x,y) e^{{-}i\frac{k}{2f_{t}}(x^2+y^2)}$$
where $U'_3$ is the field exiting the target imaging system. If we propagate $U'_3$ it will be moving at an angle of $\theta$ with respect to the $z$-direction and will not be perpendicular to our imaging system. To deal with this we rotate $U'_3$ by $\theta$ to align it with the new propagation axis. We use the method in [19] to obtain the rotated field $U_4$; the method is based on direct integration along the rotation dimension and a frequency-domain approach along the non-rotation axis, resulting in significant speed improvements compared to a full integration method.


Fig. 2. Unrolled Light Path: This figure shows the unrolled light path in our problem. First, the target lens is illuminated with a rotated plane wave. The light then travels a distance of $f_t+x_i$ to the sensor array. The sensor array modulates the intensity of light following its pattern. We then propagate the result another $f_t+x_i$. The light then leaves the target lens, which we rotate back to be in line with the optical axis of the probe system. The result is then propagated to our probe, which images the final retro-reflection (RR).


Finally, $U_4$ is propagated a distance $d$ and imaged by our infinity focused probe. The resulting field on our imaging sensor, $U_5$, is given by,

$$U_5 = R[f_p]\Big(e^{{-}i\frac{k}{2f_{p}}(x^2+y^2)}R[d]\big(U_4\big)\Big)$$
where $f_p$ is the focal length of our probe. The final diffraction pattern is given by $I(x,y) = |U_5(x,y)|^2$. We use the Python package LightPipes [20] for the $R[d]$ operator and our own implementation of the method in [19] for rotating the field.

Liu et al. [18] found that RRs can occur from both the sensor array and a filter above the array. This is easy to account for in our model by simply adding the results of multiple forward passes, one for each optical element.
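For concreteness, the following is a minimal sketch of a single forward pass through this pipeline, using a NumPy angular-spectrum propagator as a stand-in for the LightPipes $R[d]$ operator. The grid size, sensor-array mask, and parameter values are illustrative assumptions, and the exit-pupil rotation step of [19] is omitted here.

```python
import numpy as np

# Simulation grid (illustrative values)
N, L = 1024, 20e-3                  # samples per side and physical grid width [m]
lam = 640e-9                        # probe wavelength [m]
k = 2 * np.pi / lam
x = np.linspace(-L / 2, L / 2, N)
X, Y = np.meshgrid(x, x)

def R(U, d):
    """Angular-spectrum free-space propagation R[d] (stand-in for the LightPipes operator)."""
    fx = np.fft.fftfreq(N, d=L / N)
    FX, FY = np.meshgrid(fx, fx)
    kz = 2 * np.pi * np.sqrt(np.maximum(0.0, 1 / lam ** 2 - FX ** 2 - FY ** 2))
    return np.fft.ifft2(np.fft.fft2(U) * np.exp(1j * kz * d))   # evanescent terms clamped

# Target-camera parameters (assumed for illustration)
f_t, r, theta = 100e-3, 5e-3, np.deg2rad(0.5)     # focal length, aperture radius, rotation
x_d = 2.5                                         # object (focus) distance [m]
x_i = f_t ** 2 / (x_d - f_t)                      # defocus, from the x_d <-> x_i relation below

circ = (X ** 2 + Y ** 2) <= r ** 2                             # Circ_r(x, y)
lens_t = np.exp(-1j * k / (2 * f_t) * (X ** 2 + Y ** 2))       # target lens phase factor
arr = (np.floor(X / 40e-6) + np.floor(Y / 40e-6)) % 2          # toy pixel-array mask Arr(x, y)

U1 = R(np.exp(1j * k * Y * np.sin(theta)) * circ * lens_t, 0)  # tilted plane wave U_1 through the lens
U2 = R(U1, f_t + x_i)                              # propagated to the sensor array
U3 = R(arr * U2, f_t + x_i)                        # reflected, masked field back at the lens
U3p = U3 * circ * lens_t                           # U'_3 exiting the target imaging system
# U'_3 -> U_4 rotation step of [19] omitted; propagate to the probe and image with f_p
d, f_p = 1.0, 200e-3
U4 = R(U3p, d)
U5 = R(U4 * np.exp(-1j * k / (2 * f_p) * (X ** 2 + Y ** 2)), f_p)
I_rr = np.abs(U5) ** 2                             # simulated RR intensity pattern
```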

Distance Invariance: With our simulation model we find that the results are invariant to $d$, as long as diffraction between the target and the probe does not create a spot larger than the probe aperture. This allows us to speed up simulations by using smaller values of $d$. However, experimentally we find that only specular RRs are distance invariant; if there are non-specular components in a reflection, they may change size with distance.

Measuring $x_i$: We avoid the difficulties of directly measuring $x_i$ by instead measuring the object distance, $x_d$, of the target lens. $x_d$ and $x_i$ are related by the thin lens equation,

$$\frac{1}{f} = \frac{1}{x_d}+\frac{1}{f+x_i}.$$
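Rearranging this relation gives $x_i = f^2/(x_d - f)$; a minimal helper, with illustrative values only:

```python
def image_plane_offset(f, x_d):
    """Solve 1/f = 1/x_d + 1/(f + x_i) for x_i; all lengths in meters."""
    return f ** 2 / (x_d - f)

print(image_plane_offset(0.100, 2.5))   # ~0.0042 m of defocus for f_t = 100 mm focused at 2.5 m
```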

3. Illumination modes

We identify three illumination modes for probe setups, based on the divergence of the illumination source, each with different tradeoffs. The illumination divergence angle controls how quickly the illumination beam expands and mainly affects the illuminated field-of-view (FoV) and the signal strength at the target. The three modes are shown in Fig. 3.


Fig. 3. Illumination modes: This figure shows the three illumination modes. (a) The collimated mode uses a collimated illumination that maintains a fixed area and signal level with distance. (b) The diverging mode covers a larger area at the cost of signal density as distance increases. (c) The fixed area mode adapts to a predicted target distance by changing the illumination's divergence angle. This mode allows for larger fixed areas than the collimated mode.


(a) Collimated mode: The first mode is the collimated mode, where the illumination light source is collimated. Within the Rayleigh range (about 12 km with a beam radius of 5 cm), this mode has a distance-invariant intensity response due to the collimation of both the illumination light and the return light; however, the illumination only covers a FoV equal to the aperture of the probe. This is the mode analyzed in our simulations.
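The quoted range follows from the Gaussian-beam Rayleigh range $z_R = \pi w_0^2 / \lambda$; a quick check, assuming the 640 nm probe wavelength and a 5 cm beam radius:

```python
import numpy as np

lam, w0 = 640e-9, 0.05               # wavelength [m] and beam waist radius [m]
z_R = np.pi * w0 ** 2 / lam          # Rayleigh range of a collimated Gaussian beam
print(f"{z_R / 1e3:.1f} km")         # ~12.3 km, consistent with the ~12 km quoted above
```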

(b) Diverging mode: The second mode seeks to expand the FoV without increasing the complexity or size of the probe. To do this we simply defocus the illumination source so that the illumination beam expands with distance. This mode leads to a larger FoV due to a larger illumination area, but it spreads out the illumination power as a function of distance, leading to a distance-squared drop-off in RR intensity. We use this mode in our experiments to obtain a larger FoV, and we experimentally find that the divergence angle has very little impact on the shape of the returning RRs (including in the collimated mode).

(c) Fixed area mode: In the final mode, the illumination source divergence is adjusted based on the expected or measured target distance to always cover the same area at the target. This leads to a system that can observe an area in the scene with a signal strength that is independent of distance, while allowing much larger areas to be observed with a limited detector. This mode adds complexity to the probe setup but allows finer control of the FoV and the returning intensity.

Finally, it is possible to combine multiple probes with diverging beams and overlapping FoVs to create signals whose SNR and illumination flux remain constant with distance throughout a desired volume. A target at short distance would be visible to only one probe but would generate a stronger signal, while a target at larger distances would create weaker signals in an individual probe but would be visible to more probes. Using these tools it is possible to create probes with constant SNR and illumination power over any desired scene volume; for example, covering the audience of a movie theater using a set of probes above the screen.

4. Experiments

We use a 640 nm laser with $f_p=200$ mm and vary the parameters of our target camera. First, we verify our model with a camera using a simple lens (Thorlabs LA1509-B) that fits our thin lens approximation, then move on to commercial photography lenses with more complicated aberration profiles. Due to the complex nature of modern lens designs, our thin lens assumption no longer holds for these, and we use a data-driven approach to show the ability of our system to predict different imaging parameters of the target system. Finally, we switch to a 785 nm laser to image 11 different cellphone cameras as targets and use the reflections to classify the phone. Unless otherwise noted, the target camera is located $5.5$ m away from the probe for all measurements and the probe is used in the diverging illumination mode.

4.1 Thin lens experiments

To verify our diffraction model we start with a target consisting of a machine vision sensor (STC-MBA5MUSB3) and a thin $f_t=100$ mm Thorlabs lens located $5$ m away from the probe. We simulate and experimentally image the RR at $\theta =[0,0.5,1]$ degrees and $x_d=[1,2.5,5]$ m. Both results are shown in Fig. 4; the shape of the experimental RRs closely matches the simulations, albeit with fewer details.


Fig. 4. Thin lens experiment: We measure and simulate RRs of a thin $100$mm lens focused at different object distances and at different rotations downward. Our simulations and experiments match fairly well.


We notice a few patterns in Fig. 4 as the target's rotation changes. When the target is looking directly at the probe, an Airy disk appears. As the target rotates away from the probe at small rotations (e.g., $0.5$ degrees), two Airy-disk-like patterns appear that are symmetric across the line perpendicular to the rotation, and as the rotation increases the patterns move further apart. Finally, at large rotations, a long ellipse often forms in the direction of rotation. Next, we notice patterns as the target changes its object distance (focus): the RR is larger when $x_d$ is small because the target system is not focused on the laser light, so the illumination is spread over the sensor, leading to large reflections and RRs shaped like the target aperture. As the target focuses closer to infinity ($x_d$ increases), the RR becomes smaller.

We repeat this experiment for $f_t = 75$ and $50$ mm. At these focal lengths, our simulation begins to break down, likely due to our thin lens assumption; however, the general shapes are the same, just not at the exact parameters predicted by the simulation model. For completeness, the $f_t = 75$ and $50$ mm results are included in Supplement 1.

4.2 Effect of distance

We conduct an experiment in the diverging illumination mode to see the effect of target distance from our probe. From our simulations, we expect the shape of the RR to remain mostly constant. Also, because the returning light from the target is relatively collimated, we expect any decrease in RR intensity to be explained by the increase in the illumination beam area with distance ($\propto \frac {1}{d^2}$). To test this, we capture a head-on ($\theta =0$) RR for a camera mounted with a commercial lens (Fujinon DV3.4x3.8SA-1) at distances from $0.5$ m to $5.0$ m in $0.5$ m increments; our results are summarized in Fig. 5. To find the intensity of a RR, we sum all pixels above the noise floor for each image. We measure our beam expansion to be half a degree and calculate the beam area at each distance. We find the intensity drop-off and RR shape match our expectations. Because the RR shape does not change much with distance, as long as enough signal photons are collected, results from our prediction models will be robust to changes in target distance.
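A minimal sketch of this intensity measurement and the expected drop-off (the noise-floor threshold is an assumed input, and the 1/d² reference ignores the initial beam radius):

```python
import numpy as np

def rr_intensity(img, noise_floor):
    """RR intensity as used for Fig. 5: the sum of all pixels above the noise floor."""
    return img[img > noise_floor].sum()

# In the diverging mode the illuminated area grows roughly as d^2, so the collected
# RR intensity is expected to fall off as ~1/d^2 relative to the closest distance.
d = np.arange(0.5, 5.01, 0.5)            # target distances used in the experiment [m]
expected = (d[0] / d) ** 2               # normalized 1/d^2 reference curve
```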


Fig. 5. Experimental distance behavior: We measure the effect of distance for a head-on RR in the diverging illumination mode. The RR shape does not change much while the intensity follows the expected decay from an expanding beam. Images have different tone mappings to better show the RR shape.


4.3 Specular vs. diffuse reflections

We investigate the difference between a specular and a diffuse reflection at the target sensor plane. We use a commercial lens (Fujinon DV3.4x3.8SA-1) as the target lens and place a mirror (specular) or white cardboard (diffuse) at the sensor plane. We find that the type of reflection plays a role in the shape and behavior of the RRs. Example images of the diffuse and specular reflections are shown in Fig. 6. Diffuse reflections are larger, take the shape of our probe aperture, and scale in size with $d$, while specular RRs are brighter and constant in size over $d$. This behavior could potentially be used to probe the reflectance profiles of sensors at different wavelengths, possibly helping with target classification.


Fig. 6. Specular vs. Diffuse RRs: Specular and diffuse sensor plane surfaces produce different RRs.


4.4 Commercial lens datasets

Commercial lens designs are often systems of lenses and break the thin lens assumption used in our simulation model, so to deal with these lenses we opt for a data-driven approach. Our goal is to collect a dataset of RRs from a commercial lens along with the corresponding imaging parameters ($f_t,x_i,\theta,\phi$). For ease of capture and to reduce the total data requirements, we split the imaging parameters across two datasets. In the first dataset, "Focusing," we use a variable focal length Canon lens to investigate the behavior of $f_t$ and $x_d$. In the second dataset, "Rotation," we use a large-FoV C-mount lens (Fujinon DV3.4x3.8SA-1) to investigate the lens rotation parameters $\theta$ and $\phi$.

Rotation dataset: The rotation dataset is made up of seven $100$-second videos at 15 FPS using a machine vision camera (STC-MBA5MUSB3) for both the probe and the target camera. The two cameras operate simultaneously at the same frame rate and we sync the videos after capture. The target camera is focused on the probe, and the probe laser is visible to the target camera, so the focused probe light appears as a small bright spot in the target video. The ($x,y$) location of this laser spot corresponds to a unique ($\theta,\phi$) pair; for ease, we use ($x,y$) as a stand-in for ($\theta,\phi$) as the prediction target in our models.
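A minimal sketch of how this label could be extracted from each synchronized target frame (the smoothing width is an assumed choice to suppress hot pixels):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laser_spot_label(target_frame, sigma=3.0):
    """Locate the bright probe-laser spot in a target-camera frame.

    Returns (x, y) normalized by the frame size, standing in for (theta, phi).
    """
    smoothed = gaussian_filter(target_frame.astype(np.float32), sigma)
    row, col = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    h, w = smoothed.shape
    return col / w, row / h
```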

Figure 7(a) shows an example of the RR as the camera looks upward. Similar to the thin lens, when the target camera looks directly at the probe an Airy disk pattern is visible. As the camera is rotated, the Airy pattern becomes elliptical and a brighter line appears perpendicular to the rotation direction. Finally, at large rotations, a strong central ellipse is present along with a dim "$+$" pattern.


Fig. 7. Select dataset images: This figure shows a few examples from our two datasets. (a) shows an example of the RR as the target rotates upwards. (b) shows the RR as the target focuses further away ($x_d$ increases) for fixed $f_t$. (c) shows the effects of decreasing focal length ($f_t$) on the RR for a fixed $x_d$.


Focusing dataset: To evaluate the effects of $f_t$ and $x_d$ we use a Canon Rebel T5 camera with a Canon EF 18-135mm lens. Instead of explicitly finding $x_i$, we capture a dataset with the target camera focused at different distances $x_d$. We vary $f_t = [50,85,135]$ mm and, for each $f_t$, focus the camera using autofocus at $x_d=[3,3.5,4,4.5,5,5.5]$ m. Then for each combination of $f_t$ and $x_d$ we capture approximately a 1-minute probe video while the hand-held target camera is rotated and moved. Figure 7(b) shows example images from the Focusing dataset with $f_t=135$ mm at increasing $x_d$; at this $f_t$ the RR becomes smaller and closely approaches a point as the object distance increases.

4.5 Cell phones

Cellphones are particularly challenging due to their small apertures and complex optics, which are designed to remove many of the aberrations that allow us to distinguish larger lenses. For many modern cell phones, the RR at 640 nm is simply a spot with very little variation as the phone rotates. To allow rotation tracking of cell phones we instead use a 785 nm laser, which many cellphone camera paths filter out with IR-cut filters near the sensor. These IR-cut filters may not be placed at the focal plane of the camera, which introduces a slight defocus (i.e., a nonzero $x_i$), resulting in more distinctive reflections from cell phone cameras. Because our probe camera now captures IR light, we also capture the IR lasers used in the LiDARs of newer phone models, giving another useful feature. We notice strong differences between many phone models in the resulting reflections; an example from each phone model is shown in Fig. 8.


Fig. 8. Phone RRs: This figure shows representative examples of the RRs from 11 phone models. Since most modern phones combine multiple cameras, the RR contains information from each camera’s lens in the phone’s camera cluster. Some phone models contain strong specular RR spots such as the iPhone XR and the Samsung Note 8.


We capture a dataset of 11 phones as the target camera using the same procedure as the Rotation dataset. We then predict the phone model and the phone rotation from the RRs. Many cell phones have a sharp drop-off in RR signal with rotation, so the majority of our phone rotation data was confined to the center $50{\%}$ of the camera FoV.

In phones, we notice strong RRs from early optical elements, such as the IR filter, that exhibit the distance-dependent sizes characteristic of diffuse reflections, as well as a dimmer spot that is usually only present when the phone camera is actively capturing. This spot is most obvious in the iPhone XR and Samsung Note 8 in Fig. 8 and can be seen in other phone models by toggling the camera app, which causes the spot to flash. These elements may be wavelength dependent, so a system with multiple optimized wavelengths could provide additional information for many camera models. For example, we notice that the specular RR visible when the camera app is open is more visible at 640 nm than at 785 nm for iPhone models, while Google Pixel models show a stronger diffuse RR at 785 nm.

5. Parameter prediction

We train two models, a linear model based on Zernike Polynomials and a CNN.

Zernike model: We first test our intuition that lens aberrations are useful RR features for parameter prediction. We do this using the popular Zernike polynomials [21–24], an orthogonal basis on the unit circle often used in optics to distinguish and classify aberrations. If our intuition is correct, a simple model based on Zernike polynomial features should provide reasonable performance.

We extract the Zernike features at different scales centered on our RR. We center the RR by finding the brightest point in a smoothed RR image, then crop a region of interest around this point. We then create square crops ($s\times s$) at four different scales, $s=100,200,300,400$. Let $a^s_{m,n}$ and $b^s_{m,n}$ be the coefficients of the positive and negative Zernike polynomials, respectively, at scale $s$. We calculate the coefficients on the square-root intensity of the crops up to $n=12$ for all $m \leq n$. We use the Python implementation from Antonello and Verhaegen [22] to calculate the Zernike polynomial coefficients. We finally take the square and $3^{rd}$ power of the coefficients to create our final feature vector $\psi =\{ (a^s_{m,n})^p, (b^s_{m,n})^p : s\in \{100,200,300,400\}, 0\leq m \leq n \leq 12, p\in \{1,2,3\}\}$. We then use Scikit-learn's implementation of ridge regression [25,26] ($\alpha =10^{-6}$) with features $\psi$ to fit our datasets, giving our final Zernike models.
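The following self-contained sketch illustrates this feature pipeline; it fits the Zernike coefficients by direct least squares rather than with the package of [22], and the crude argmax centering and helper names are our own simplifications.

```python
import numpy as np
from math import factorial
from sklearn.linear_model import Ridge

def zernike_basis(size, n_max=12):
    """Stack of Zernike polynomials (cosine and sine terms) sampled on a size x size grid."""
    y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    disk = rho <= 1.0
    basis = []
    for n in range(n_max + 1):
        for m in range(n % 2, n + 1, 2):           # (n - m) must be even
            Rnm = sum((-1) ** k * factorial(n - k)
                      / (factorial(k) * factorial((n + m) // 2 - k) * factorial((n - m) // 2 - k))
                      * rho ** (n - 2 * k)
                      for k in range((n - m) // 2 + 1))
            basis.append((Rnm * np.cos(m * theta)) * disk)       # a_{m,n} term
            if m > 0:
                basis.append((Rnm * np.sin(m * theta)) * disk)   # b_{m,n} term
    return np.stack([b.ravel() for b in basis], axis=1), disk.ravel()

def zernike_features(rr_image, scales=(100, 200, 300, 400)):
    """Centered multi-scale Zernike coefficients with powers 1, 2, 3 (feature vector psi)."""
    cy, cx = np.unravel_index(np.argmax(rr_image), rr_image.shape)   # crude centering
    feats = []
    for s in scales:
        crop = rr_image[max(cy - s // 2, 0):cy + s // 2, max(cx - s // 2, 0):cx + s // 2]
        crop = np.sqrt(np.maximum(crop, 0))                          # square-root intensity
        if crop.shape != (s, s):                                     # pad crops near the border
            crop = np.pad(crop, ((0, s - crop.shape[0]), (0, s - crop.shape[1])))
        B, disk = zernike_basis(s)
        coef, *_ = np.linalg.lstsq(B[disk], crop.ravel()[disk], rcond=None)
        feats.extend(np.concatenate([coef, coef ** 2, coef ** 3]))
    return np.asarray(feats)

# Ridge regression (alpha = 1e-6) from Zernike features to the target parameter, given
# stacked feature vectors X_train and labels y_train (assumed arrays):
# model = Ridge(alpha=1e-6).fit(X_train, y_train)
```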

Neural network: To further improve the performance of our predictions we use Res-Net-18 [27]. We hope that Res-Net-18 is able to pick up on more features than our simple linear model; for example, due to imperfections in our beam splitter, our input light has a dim secondary component at a slightly different angle. This secondary component leads to a dimmer RR at a slightly different illumination angle that the Res-Net-18 model may be able to use.
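A minimal PyTorch sketch of such a model (the standard torchvision ResNet-18 with its final layer replaced by a two-output regression head; the single-channel stem replacement and the training hyperparameters are our own assumptions, not values reported here):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class RRRegressor(nn.Module):
    """ResNet-18 backbone mapping a single-channel RR image to normalized (x, y) targets."""
    def __init__(self, n_outputs=2):
        super().__init__()
        self.backbone = resnet18(weights=None)
        # RR images are single-channel; replace the RGB stem (an assumed choice).
        self.backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_outputs)

    def forward(self, x):
        return self.backbone(x)

model = RRRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # assumed hyperparameters
loss_fn = nn.MSELoss()
# Training step, given a batch of RR crops and normalized (x, y) labels:
# loss = loss_fn(model(rr_batch), xy_batch); loss.backward(); optimizer.step()
```

For the cellphone experiments below, the same backbone can be reused with an 11-way classification head and a cross-entropy loss.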

5.1 Model results

We train both models on the Rotation and Focusing datasets using the same train/test split; for the cellphone dataset we only use the Res-Net-18 model due to the multiple cameras in the cellphones. For each video in a dataset we use the first $80{\%}$ of the video for training and the last $20{\%}$ for testing.
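A minimal sketch of this per-video split (the frame containers are placeholders):

```python
def split_video_frames(frames, train_frac=0.8):
    """First 80% of a video's frames for training, last 20% for testing."""
    cut = int(len(frames) * train_frac)
    return frames[:cut], frames[cut:]

dataset_videos = []                      # placeholder: one list of frames per captured video
train_set, test_set = [], []
for video_frames in dataset_videos:
    tr, te = split_video_frames(video_frames)
    train_set.extend(tr)
    test_set.extend(te)
```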

5.1.1 Commercial lens datasets

For each model, we compute the mean absolute error (MAE) and root mean square error (RMSE) between the model predictions and ground truth.

Rotation: For the Rotation dataset our models predict the pixel location where the probe laser appears in the target frame, giving us an error in pixels, which we normalize by the target camera's field of view in pixels. An RMSE of $0.12$ thus indicates that our prediction was on average within a ball with a radius equal to $12{\%}$ of the field of view of the camera.
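Concretely, the normalized errors reported below could be computed as follows (a sketch; we take the FoV in pixels to be the frame width, which is an assumed interpretation):

```python
import numpy as np

def normalized_rotation_errors(pred_xy, true_xy, fov_pixels):
    """MAE and RMSE of predicted laser-spot locations, normalized by the FoV in pixels."""
    err = np.linalg.norm(pred_xy - true_xy, axis=1) / fov_pixels
    return err.mean(), np.sqrt((err ** 2).mean())
```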

Figure 9 shows the results of the predictions of the Res-Net-18 model on the Rotation dataset. Figure 9(a) displays our predictions as well as the ground-truth rotation in a section of a test video, overlaid on the FoV of the target camera centered at the probe location, along with three RRs showing the change as the target camera moves. Figures 9(b-c) display the X and Y normalized coordinate predictions and ground truth over the entire testing set; in general, the predictions are quite close, though there are occasional outliers, most likely due to lower signal in certain frames.


Fig. 9. Rotation prediction results: (a) displays the path of the target camera's center of FoV and the corresponding Res-Net-18 prediction over time, encoded by the color of the line. The background image represents the FoV of the target camera. Three select RRs are also shown on the left side of the image; as the target camera looks downward, the RRs change substantially. (b-c) show the X and Y Res-Net-18 predictions and ground truth over the entire testing set. The predictions are in general close, with a few noticeable outliers.


To examine how the target gaze affects our error rates, we calculate the distance of the laser peak, $(x,y)$, from the center of a frame, giving us a measure of how far away from head-on the target camera is looking. Figure 10 shows how our model error changes as the target rotates away from head-on. We find that as the target looks further away from the probe our model error increases; this could be due to less signal at more extreme angles or because the spot changes less at larger angles. Interestingly, the difference between the two models increases at moderate angles, while at small or extreme angles the difference is smaller. This implies that the CNN is able to pick up on more features at moderate angles than the Zernike model can capture. Note that the error for both models near the edge of the FoV (0.5 normalized units) is close to half of the full FoV, equivalent to random guessing.


Fig. 10. Errors in rotation dataset: This figure shows our error rate (MAE) as our target camera looks further away from the probe in normalized units of the fraction of the field of view (to match Fig. 9).


Focusing: The results of both models' predictions for the focal length, $f_t$, and the object distance, $x_d$, are shown by testing-set image number in Fig. 11. We then calculate the error for both $f_t$ and $x_d$; the results are summarized in Table 1. Unsurprisingly, Res-Net-18 outperforms our Zernike model for both target parameters.


Fig. 11. Focusing dataset prediction results: The results of the prediction models are shown for (a) 6 varying object distances ($x_d$) from 3 to 5.5 meters, where the camera is focused to a distance $x_d$, and for each $x_d$ (b) 3 different focal lengths ($f_t$) are tested. The Res-Net-18 model successfully predicts the different combinations of $x_d$ and $f_t$.



Table 1. This table shows the errors of our models on different datasets; we report errors as MAE/RMSE for each task with both the Zernike and Res-Net-18 models. Rotation errors are normalized to the target camera's field of view.

To further understand where focus prediction errors occur, we create a heatmap of the MAE for our Res-Net-18 model versus the true $x_d$ and $f_t$. The heatmap is shown in Fig. 12; we find that the model struggles the most at low $f_t$. For $x_d$ errors, this could be explained by the observation that lower $f_t$ values have larger depths of field, making it harder to predict a specific $x_d$.
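For intuition, the depth of field grows quickly as $f_t$ shrinks. A rough sketch using the standard near/far-limit formulas (the f-number and circle of confusion below are assumed illustrative values, not parameters reported in this work):

```python
def depth_of_field(f, x_d, N=5.6, c=19e-6):
    """Approximate total depth of field [m] for focal length f [m] focused at x_d [m]."""
    H = f ** 2 / (N * c) + f                         # hyperfocal distance
    near = x_d * (H - f) / (H + x_d - 2 * f)
    far = x_d * (H - f) / (H - x_d) if x_d < H else float("inf")
    return far - near

for f in (0.050, 0.085, 0.135):                      # the three focal lengths in the dataset
    print(f"f_t = {f * 1e3:.0f} mm: DoF at 4 m ~ {depth_of_field(f, 4.0):.2f} m")
```

Under these assumptions the 50 mm setting has a depth of field several times larger than the 135 mm setting at the same object distance, consistent with the larger $x_d$ errors observed at low $f_t$.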


Fig. 12. Errors in focusing dataset: This figure shows a heatmap of errors (MAE) for our Res-Net-18 model trained on the Focusing dataset. We show the errors for both the predicted object distance ($x_d$) (top) and the focal length ($f_t$) prediction (bottom) versus the true object distance and focal length. We find the model has the highest errors at the shortest focal length and far object distances.


5.1.2 Cellphone results

Because many phone cameras have multiple lenses, the simple Zernike model no longer holds, so we only train the Res-Net-18 model on the cellphone dataset. Similar to the Rotation dataset, we predict the rotation of the cellphone. We also predict the cellphone model from among the 11 cellphones we used. The results for these tasks are summarized in Table 1. We find that classifying the target phone model is quite easy, as Res-Net-18 predicts correctly with 99${\%}$ top-1 accuracy on the testing set. However, rotation prediction is more difficult, likely due to lower signal levels and fewer lens aberrations, but we are able to achieve error rates similar to the commercial lens dataset. Using the same normalized rotation error as before, Res-Net-18 achieves a MAE of $0.095$, corresponding to about half a meter of error for the Google Pixel 6a at $5.5\text {m}$. We found that the target phone's model plays an important role in the achievable accuracy; the rotation errors by phone model are summarized in Table 2. The highest-error model is the Samsung Note 8, which has low signal levels in its diffuse RRs as seen in Fig. 8, whereas the lowest-error phone is the iPhone 12 Pro Max, which has two clear diffuse RRs in Fig. 8. The Google Pixel models have the strongest signals but are middle of the pack in terms of performance; however, the Google Pixel models exhibit only one clear diffuse RR, as opposed to many of the iPhone models where two diffuse RRs are visible. Likely, a combination of signal levels and specific RR properties (e.g., one or two diffuse RRs) plays a key role in the rotation prediction accuracies across phone models.


Table 2. This table shows the rotation error (MAE/RMSE) of the Res-Net-18 model's prediction of cellphone rotation for each of the phone models used in our experiments. We find performance is dependent on the specific phone model used.

The main drawback of using this method to predict phone camera gaze is the lack of signal exhibited by certain phone models at large angles, restricting the usable field of view to the middle $50{\%}$. This problem could possibly be mitigated through careful selection of the laser wavelength. It may also be possible to use many probes near each other that work together to cover a wide range of angles, which could increase the overall accuracy of the method by combining results from each probe.

6. Discussion

In this work, we studied retro-reflections (RRs) of target cameras and used RRs to predict imaging parameters. We found that RRs contain aberrations that can be used to predict a target camera's gaze, focal length, and focusing depth, as well as to classify phone camera models. We have only begun to scratch the surface of what can be done with RRs; for example, spectral content could be exploited with a change in laser wavelength. One idea is to probe the image sensor's spectral reflectance at multiple wavelengths to possibly distinguish specific sensor chips. The choice of laser wavelength(s) will be crucial for tasks involving cellphones; many wavelength optimizations could likely be made to improve performance over a population of cellphones, although this may be difficult without optical element information from phone manufacturers. Another possible source of new features is to change the divergence of the laser to sweep a focused spot across specific slices of a target's optical path, probing the target along the optical axis.

Applications: This technology has applications in control, privacy, identification, and image validation. A RR sensing system could be used to create a network-free controller, requiring users only to have a camera. Collecting user input remotely would usually require installing software and would consume network resources, both of which are avoided when using RRs to determine the camera view angle. For example, in a museum users could actively control an exhibit with a cellphone camera without connecting to a network, providing a smoother user experience. Mass input from many users could be collected at performances where the audience is asked to take part in a vote by pointing their phone cameras at a certain location. RR systems could also be used for privacy: they could detect cameras in a scene or determine whether a camera is pointed at an object or person, whether pictures were taken, and estimate what the taken photograph will look like to help identify it when it appears elsewhere. This could be used to verify the authenticity of a picture, a growing concern with the rise of generative AI. RRs contain information on the specific camera model, so they could be used to identify which camera was used to take certain images, which could be useful in criminal or civil investigations. Another application could be the protection of sensitive infrastructure or copyright protection; for example, RRs could be used to detect videos being captured in a movie theater.

Funding

McPherson Eye Research Institute (Expanding our Vision); National Science Foundation (CAREER-184688, DGE-1747503).

Acknowledgements

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1747503, CAREER Grant No. 184688, and McPherson Eye Research Institute "Expanding our Vision" grant. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Support was also provided by the Graduate School and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison with funding from the Wisconsin Alumni Research Foundation, and support was provided by the University of Wisconsin-Madison, Discovery to Product (D2P) with funding from the State of Wisconsin.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [28].

Supplemental document

See Supplement 1 for supporting content.

References

1. S. He, Y. Meng, and M. Gong, “Active laser detection system for recognizing surveillance devices,” Opt. Commun. 426, 313–324 (2018). [CrossRef]  

2. L. Li, J. Ren, and X. Wang, “Fast cat-eye effect target recognition based on saliency extraction,” Opt. Commun. 350, 33–39 (2015). [CrossRef]  

3. F. Qian, B. Zhang, C. Yin, et al., “Recognition of interior photoelectric devices by using dual criteria of shape and local texture,” Opt. Eng. 54(12), 123110 (2015). [CrossRef]  

4. C. Liu, C. Zhao, H. Zhang, et al., “Design of an active laser mini-camera detection system using cnn,” IEEE Photonics J. 11(6), 1–12 (2019). [CrossRef]  

5. C. Liu, C. Zhao, H. Zhang, et al., “Spectrum classification using convolutional neural networks for a mini-camera detection system,” Appl. Opt. 58(33), 9230–9239 (2019). [CrossRef]  

6. J. Huang, H. Zhang, L. Wang, et al., “Improved yolov3 model for miniature camera detection,” Opt. Laser Technol. 142, 107133 (2021). [CrossRef]  

7. D. Svedbrand, L. Allard, M. Pettersson, et al., “Optics detection using an avalanche photo diode array and the scanning-slit-method,” in Technologies for Optical Countermeasures XVI, vol. 11161, D. H. Titterton, R. J. Grasso, and M. A. Richardson, eds., International Society for Optics and Photonics (SPIE, 2019), pp. 167–177.

8. S. Sami, S. R. X. Tan, B. Sun, et al., “LAPD: Hidden spy camera detection using smartphone time-of-flight sensors,” in Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems (SenSys ’21), (Association for Computing Machinery, New York, NY, USA, 2021), pp. 288–301.

9. A. L. Mieremet, R. M. A. Schleijpen, and P. N. Pouchelle, “Modeling the detection of optical sights using retro-reflection,” in Laser Radar Technology and Applications XIII, vol. 6950, M. D. Turner and G. W. Kamerman, eds., International Society for Optics and Photonics (SPIE, 2008), p. 69500E.

10. C. Lecocq, G. Deshors, O. Lado-Bordowsky, et al., “Sight laser detection modeling,” in Laser Radar Technology and Applications VIII, vol. 5086, G. W. Kamerman, ed., International Society for Optics and Photonics (SPIE, 2003), pp. 280–286.

11. M. Auclair, Y. Sheng, and J. Fortin, “Identification of targeting optical systems by multiwavelength retroreflection,” Opt. Eng. 52(5), 054301 (2013). [CrossRef]  

12. A. Leal-Junior, L. Avellar, A. Frizera, et al., “Smart textiles for multimodal wearable sensing using highly stretchable multiplexed optical fiber system,” Sci. Rep. 10(1), 13867 (2020). [CrossRef]  

13. A. Leal-Junior, L. Avellar, V. Biazi, et al., “Multifunctional flexible optical waveguide sensor: On the bioinspiration for ultrasensitive sensors development,” Opto-Electron. Adv. 5(10), 210098 (2022). [CrossRef]  

14. K. Wu, H. Zhang, Y. Chen, et al., “All-silicon microdisplay using efficient hot-carrier electroluminescence in standard 0.18um cmos technology,” IEEE Electron Device Lett. 42(4), 541–544 (2021). [CrossRef]  

15. K. Xu, “Silicon electro-optic micro-modulator fabricated in standard cmos technology as components for all silicon monolithic integrated optoelectronic systems,” J. Micromech. Microeng. 31(5), 054001 (2021). [CrossRef]  

16. M. Gong, S. He, R. Guo, et al., “Cat-eye effect reflected beam profiles of an optical system with sensor array,” Appl. Opt. 55(16), 4461–4466 (2016). [CrossRef]  

17. M. Gong and S. He, “Periodicity analysis on cat-eye reflected beam profiles of optical detectors,” Opt. Eng. 56(5), 053110 (2017). [CrossRef]  

18. C. Liu, C. Zhao, H. Zhang, et al., “Analysis of mini-camera’s cat-eye retro-reflection for characterization of diffraction rings and arrayed spots,” IEEE Photonics J. PP, 1–1 (2019).

19. J. Stock, N. G. Worku, and H. Gross, “Coherent field propagation between tilted planes,” J. Opt. Soc. Am. A 34(10), 1849–1855 (2017). [CrossRef]  

20. FredvanGoor, guyskk, and jmmelko, “Lightpipes,” https://github.com/opticspy/lightpipes/ (2019).

21. J. Wyant and K. Creath, “Basic wavefront aberration theory for optical metrology,” Applied Optics and Optical Engineering 11, 1 (1992).

22. J. Antonello and M. Verhaegen, “Modal-based phase retrieval for adaptive optics,” J. Opt. Soc. Am. A 32(6), 1160–1170 (2015). [CrossRef]  

23. V. Lakshminarayanan and A. Fleck, “Zernike polynomials: A guide,” J. Mod. Opt. 58(18), 1678 (2011). [CrossRef]  

24. W. J. Tango, “The circle polynomials of zernike and their application in optics,” Applied physics (1997).

25. F. Pedregosa, G. Varoquaux, A. Gramfort, et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research 12(85), 2825–2830 (2011).

26. A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics 42(1), 80–86 (2000). [CrossRef]  

27. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), pp. 770–778.

28. T. Seets, A. Epstein, and A. Velten, “Watching the watchers: camera identification and characterization using retro-reflections - Dataset.” DRYAD, (2024), https://doi.org/10.5061/dryad.6t1g1jx64.
