Optica Publishing Group

Compressed sensing in photonics: tutorial

Open Access

Abstract

Traditional optical imaging and sensing methods capture signals of interest by direct sampling in the domain of interest such as by forming images on pixelated camera sensors or by regular temporal sampling of a waveform. These methods are indispensable in our daily lives and for many scientific disciplines such as microscopy in biology and spectroscopy in chemistry. Using these approaches, the sampling constraints and their impact on the bounds on signal fidelity are well understood through the Nyquist–Shannon sampling theorem. However, the problems of modern science require ever-increasing amounts of data at unprecedented temporal and spatial scales and resolutions, which challenges the limits of traditional sensing. The increased availability of computational power combined with recent strides in signal processing promises to overcome many of the problems associated with traditional sensing methods through computational imaging and sensing methods. Within the realm of computational sensing, compressed sensing (CS), in particular, has enabled the capture of signals with lower sampling resources than traditionally required by the Nyquist–Shannon sampling theorem using prior information such as sparsity. In this tutorial, we focus on the operation and impact of such sub-Nyquist sampling schemes through the use of CS in photonic sensing and imaging systems. Emphasis is placed on intuition, but mathematical results are derived or cited where appropriate. Finally, we highlight several applications in macroscopic and microscopic imaging, spectroscopy, and microwave photonic sensing.

© 2022 Optica Publishing Group

1. INTRODUCTION

The world is composed of analog signals: the spectrum of sunlight scattered from the leaf of a crop, the flash of light from a fluorescent protein within a neuron, or the fluctuations of the electric field of a radar pulse returning from a distant object. The goal of modern photonic imaging and sensing systems is to capture such analog signals into an accurate digital representation that can be efficiently transmitted, stored, and analyzed. Analog signals travel through many processing steps, including sampling and quantization, before they are saved on a computer for further analysis, and each of these steps inevitably leads to some loss of information. The challenge for an imaging or sensing system is to strike an optimal balance between information loss and efficiency.

The first step in the journey from analog signal to digital representation is sampling, where a set of discrete analog measurements representative of the analog signal is made. A simple example is shown in Fig. 1(a), where an image is discretized into many smaller rectangular areas corresponding to the sampling effect of camera pixels. Sampling intrinsically leads to information loss since the analog signal could be changing arbitrarily in between samples or within a sample (e.g., averaged intensity over a pixel). This is largely avoided by low-pass filtering the signal so that it is bandwidth limited and then sampling at a rate greater than the Nyquist limit; however, in this case information is still lost during the filtering stage.
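The aliasing that the Nyquist criterion guards against is easy to demonstrate numerically. The following sketch (an illustrative NumPy example; the 3 Hz tone and the sampling rates are arbitrary choices, not values from the text) shows that a tone sampled below twice its frequency produces samples identical to those of a lower-frequency alias:

```python
import numpy as np

f_sig = 3.0                      # tone frequency in Hz (illustrative choice)

# Sampling above the Nyquist rate (2 * f_sig = 6 Hz) preserves the tone.
fs_ok = 20.0
t_ok = np.arange(0, 1, 1 / fs_ok)
x_ok = np.sin(2 * np.pi * f_sig * t_ok)

# Sampling at 4 Hz violates the Nyquist criterion: the 3 Hz tone aliases
# to f_sig - fs_bad = -1 Hz, so its samples match a 1 Hz tone exactly.
fs_bad = 4.0
t_bad = np.arange(0, 1, 1 / fs_bad)
x_bad = np.sin(2 * np.pi * f_sig * t_bad)
x_alias = np.sin(2 * np.pi * (f_sig - fs_bad) * t_bad)

aliased = np.allclose(x_bad, x_alias)   # the two tones are indistinguishable
```

Once the samples coincide, no later processing can tell the two tones apart, which is why the anti-aliasing filter must act before sampling.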

Fig. 1. (a) After sampling at the Nyquist rate and digitization, an arbitrary scene can be represented on the computer. The scene data can be significantly reduced through lossy compression by first projecting it to a sparse domain (in this case the wavelet domain) and then keeping a small percentage of the largest elements. For example, keeping only 1% of the wavelet coefficients leads to minimal degradation in image quality. (b) In conventional sensing, data is first digitized and then compressed, which is inefficient since hardware resources are wasted on capturing data redundantly. (c) In compressed sensing, analog signals are first compressed and then sampled, which maximizes the efficiency with which information is captured by the hardware sampling resources.

Following sampling, the next step is measurement with some form of quantization. Quantization is necessary not only because computer memory is finite but also because one can only measure by comparisons. For example, an analog voltage can be measured by comparing it to many precise reference voltage levels. Time can be measured by measuring the number of cycles of a reference clock. Many examples of quantized measurements can be given here, and they would all have similar types of noise associated with them, such as the step size (resolution) of the reference levels, the accuracy of the reference levels, and the nonlinearity of the step sizes, which carry various names for different quantizing devices [e.g., differential/integral nonlinearity (DNL/INL) for an analog-to-digital converter, or jitter in a clock]. Many fundamental trade-offs exist between quantization or sampling parameters and design performance metrics, such as cost, speed, and resolution. For example, a large number of reference levels would allow measurement of a larger range of (or more precise) signal values. However, this would incur increased financial cost, reduced speed since more bits need to be transmitted, and increased data storage. Data compression has played a significant role in relaxing some of the physical constraints associated with these design choices for both conventional and compressed sensing modalities.

In conventional sensing, signals are compressed after digitization for efficient storage and transmission as depicted in Fig. 1(b). Technologies such as video streaming have become so ubiquitous that we take them for granted, and yet they would have been impossible without compression algorithms. A tremendous amount of effort has gone into designing efficient compression algorithms, which are generally grouped into lossless and lossy algorithms. Lossless algorithms exploit redundancies in the signal. For example, if the signal has a lot of repeating patterns (e.g., “000000011111”), one can store “runs” of data (e.g., “7 0s, 5 1s”), which is known as run-length encoding. Huffman coding is another example where patterns with the highest probability of occurrence are encoded with the fewest number of bits, which is an example of a variable-length code (e.g., common letters of the English alphabet such as “e” or “a” can be transmitted with the fewest number of bits). Lossy compression algorithms, on the other hand, make use of the general fact that high-dimensional natural signals lie on a low-dimensional manifold. Therefore, high-dimensional data (e.g., images with millions of pixels) can be projected down to a low-dimensional space [e.g., a small fraction of the largest discrete cosine transform (DCT) coefficients] and reconstructed computationally with minimal error. An example of this is shown in Fig. 1(a), where keeping as few as 1% of the wavelet transform coefficients of an image results in almost no visible degradation in image quality.
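The run-length idea from the example above can be sketched in a few lines of Python (a minimal illustration; real codecs combine such schemes with entropy coding such as Huffman codes):

```python
def run_length_encode(s):
    """Collapse runs of repeated symbols into (symbol, count) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([ch, 1])      # start a new run
    return [(sym, n) for sym, n in runs]

def run_length_decode(runs):
    """Invert the encoding losslessly."""
    return "".join(sym * n for sym, n in runs)

encoded = run_length_encode("000000011111")
# encoded == [('0', 7), ('1', 5)], i.e., "7 0s, 5 1s" as in the text
assert run_length_decode(encoded) == "000000011111"
```

Because the decoder reproduces the input exactly, this is a lossless scheme; it compresses well only when the signal actually contains long runs.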

Critically, research into lossy compression has demonstrated that natural signals can be compressed with little loss of information, which suggests that conventional sensing modalities are inefficient since conventionally sampled data has a large amount of redundant information. The main goal of compressed sensing is to compress signals in the physical (analog) domain prior to sampling in order to maximize the information acquired per sample [Fig. 1(c)]. The increased sampling efficiency of compressed sensing is extremely valuable for “measurement constrained” applications. Problems can be measurement constrained due to many reasons, such as the cost of pixels (e.g., IR imaging), bottlenecks in throughput of high speed imaging, and limited spatial access to the sample (e.g., minimally invasive microendoscopy).

The term multiplexing, which will be used often throughout the text, is an important concept borrowed from telecommunications, where signals are added together to share a scarce resource. In a very similar manner, we want information from the sample to share the scarce resource of limited measurements. As we will discuss later, pseudo-random multiplexing of signals is a very efficient way to compress them in the analog domain before sampling.

We organized the rest of this tutorial into three sections: Section 2 covers compressed sensing theory and algorithms, Section 3 focuses on optical technologies for multiplexing optical signals, and finally Section 4 discusses some applications that make use of both the compressed sensing algorithms and the multiplexing technologies to solve real problems in optical sensing. This last section is not intended to be an exhaustive review but a collection of selected works that the authors find instructive. There are many excellent introductory tutorials on compressed sensing [1,2] and its applications to imaging [3] as well as books [4,5] with an emphasis on mathematically rigorous explanations. In our tutorial, we cover compressed sensing theory with an emphasis on intuitive understanding followed by a discussion on commonly used photonic tools and applications. We assume linear algebra knowledge at the level of [6], and although not required, signal processing knowledge at the level of [7] would be useful.

2. COMPRESSED SENSING

A. Notation

Matrices and vectors are denoted with boldface uppercase and lowercase letters, respectively. The vector norm is a measure of its length—a generalization of the Euclidean distance. The general ${l_p}$ norm of vector ${\textbf x}$ is defined as

$${\left\| {\textbf x} \right\|_p} = {\left({\sum\limits_i |{x_i}{|^p}} \right)^{1/p}}$$
for $p \ge 1$, where the popular Euclidean distance corresponds to $p = 2$. For $p \lt 1$, the triangle inequality is not satisfied, and therefore ${\left\| \cdot \right\|_p}$ does not satisfy the definition of a norm. Loss functions based on norms are convex and therefore much easier to optimize. While not considered a norm, the special case of $p \to 0$ is known as the ${l_0}$ “pseudo-norm” ${\| {\textbf x} \|_0}$, which yields the number of nonzero elements in ${\textbf x}$. A vector ${\textbf x}$ is said to be $S$-sparse (${\textbf x} \in {\Sigma _S}$) when it has no more than $S$ nonzero entries, i.e., ${\| {\textbf x} \|_0} \le S$, where the number of nonzero elements is often much smaller than the signal dimension $N$, i.e., $S \ll N$.
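These definitions can be checked numerically. The sketch below (illustrative NumPy code; the vector is an arbitrary example) evaluates the general ${l_p}$ norm and the ${l_0}$ count for a 2-sparse vector:

```python
import numpy as np

def lp_norm(x, p):
    """General l_p norm for p >= 1: (sum_i |x_i|^p)^(1/p)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, 0.0, -4.0, 0.0])

# p = 2 recovers the Euclidean length: sqrt(9 + 16) = 5.
assert np.isclose(lp_norm(x, 2), 5.0)
# p = 1 is the sum of absolute values: 3 + 4 = 7.
assert np.isclose(lp_norm(x, 1), 7.0)

# The l_0 "pseudo-norm" counts nonzero entries; x here is 2-sparse (S = 2).
assert np.count_nonzero(x) == 2
```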

Optimization problems will be posed in the following form:

$${\hat{\textbf{x}}} = \mathop{{\rm arg}\,{\rm min}}\limits_{\textbf x}f({\textbf x}),$$
where argmin refers to the argument that minimizes the expression to its right. In other words, the value of ${\textbf x}$ that minimizes the cost function $f({\textbf x})$ is found and set to the signal estimate ${\hat{\textbf{x}}}$.

B. Sampling

Natural signals of interest have a finite extent in time (or space) and possess a minimum feature size and thus finite bandwidth in the Fourier domain. For example, the spatial bandwidth of free-space optics is limited by their finite aperture. According to the Nyquist–Shannon sampling theorem, a signal with bandwidth $W$ in the frequency domain can be sufficiently sampled at equal intervals of $\delta t = \frac{1}{{2W}}$, first proven by Shannon and attributed to Nyquist. This makes intuitive sense as Shannon explains, “if $f(t)$ contains no frequencies higher than $W$, it cannot change to a substantially new value in a time less than one-half cycle of the highest frequency” [8]. Equivalently, $N = \frac{T}{{\delta t}} = 2TW$ is the total number of independent measurements needed to specify a signal with finite extent $T$, which is also known as the time-bandwidth product (TBP) or space-bandwidth product (SBP) in the optics literature. TBP and SBP are powerful concepts for describing sensing systems and can be used to describe either the optical signal or the optical system of interest [9]. For example, for imaging systems, a phenomenon of interest may possess a certain minimum spatial feature size and maximum spatial extent determining the SBP of the signal itself. Furthermore, the SBP of the optical system used to capture this phenomenon may be constrained by the resolution and field of view of the physical optics or by the number of pixels on the sensor. Ultimately, the goal for the imaging hardware is to match or exceed the SBP of the signal of interest.

We will group sampling strategies into three categories: oversampling, critical sampling (Nyquist), and undersampling (compressed sensing). One way to visualize these strategies in a general way is to use set diagrams as shown in Fig. 2, where mappings from a set of measurements to a set of signals are drawn. For example, a signal could be an image and the measurement its Fourier transform. In the critical sampling case, it is straightforward to go from a measurement ${M_i}$ to a signal ${S_i}$. For our example, the mapping is a Fourier transform, and, for critical sampling, a unique inverse transform exists. In an ideal world, critical sampling would be sufficient; however, in the presence of noise, multiple measurements might be necessary to calculate signals from measurements in a robust way. That leads to the oversampled case as shown in Fig. 2(b), where the same signal results in different measurements. Here simple methods such as averaging or ridge regression can be used to reduce the effects of noise.

Fig. 2. Measurement ${M_i}$ to signal ${S_i}$ mapping for different sampling schemes: (a) critical (“Nyquist”) sampling case where mapping is invertible, (b) oversampled case (often employed if measurements are noisy), and (c) undersampled case (compressed sensing). (d) Undersampled mapping can be made invertible by selecting a subset of all possible signals that utilizes prior information about signals (e.g., sparsity).

Fig. 3. (a) Raster scanning: a photodetector is scanned through the entire image to acquire independent measurements. (b) Basis scan: independent patterns are projected onto the scene, and the light from the entire scene is integrated by the photodetector. (c) Pseudo-random patterns can be used to optimally sample a scene for compressed sensing.

The undersampled case is of specific interest here because it corresponds to the conditions of compressed sensing. Inversion for the undersampled case seems impossible at first glance, since multiple signals map to the same measurement as shown in Fig. 2(c). However, if we could incorporate prior information about signals of interest, then it might be possible to invert the measurements as shown in Fig. 2(d). Specifically, given measurement ${M_1}$ or ${M_2}$ and knowledge that the signal of interest lies in some subset shown in red, we can infer that the signal that led to that particular measurement was the signal ${S_2}$ or ${S_3}$, respectively. As one example, an oracle who knows the position of the nonzero elements of a sparse vector might be able to convert an underdetermined set of linear equations to an overdetermined one, as will be discussed later.

Measurements should be maximally informative, which implies orthogonality in linear algebra terms. However, there are infinitely many ways to construct linearly independent measurements, and the specific choice of measurement basis affects system performance significantly. For example, given a single photodetector, a scene can be imaged by scanning the detector pointwise across the entire image as shown in Fig. 3(a), which is also known as raster scanning. Alternatively, the scene can be masked using patterns from a complete basis (e.g., Hadamard), and the light from the entire image can be integrated by the photodetector as shown in Fig. 3(b). Basis scan is comparable to raster scan, where the number of measurements is equal to the reconstructed signal dimension and the image can be reconstructed using a direct inverse transform. For compressed sensing, the scene is often sampled using a pseudo-random pattern as shown in Fig. 3(c). As will be shown, the latter approach is often more efficient, requiring fewer such measurements to reconstruct the signal.

Linear algebra is a powerful tool in untangling the image from the measurements. The imaging process in this case can be converted to a linear algebra problem by vectorizing the images and the projected patterns resulting in the following linear equation:

$${\textbf A}{\textbf x} = {\textbf y},$$
where ${\textbf A}$ is known as the sensing matrix of size $m \times n$, ${\textbf x}$ is the vectorized image of size $n \times 1$ (e.g., $n$ is the number of image pixels), and ${\textbf y}$ is the vector of measurements of size $m \times 1$, where $m$ is the number of measurements. Vectorizing an image involves taking each column or row of an image and stacking them to form a vector. Following our pseudo-random measurement example, each row of the measurement matrix ${\textbf A}$ would be a pseudo-random pattern that multiplies the column vector ${\textbf x}$ and results in a single measurement that goes into the corresponding row in ${\textbf y}$.
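The vectorization step can be sketched as follows (an illustrative NumPy example; the image size, sparsity pattern, and number of measurements are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small 8x8 "image" (n = 64 unknowns) with only two bright pixels.
img = np.zeros((8, 8))
img[2, 3] = 1.0
img[5, 6] = 0.5
x = img.flatten()          # vectorize: stack rows into a length-n vector
n = x.size
m = 20                     # number of multiplexed measurements (m < n)

# Each row of A is one pseudo-random binary pattern projected on the scene;
# each measurement y_i integrates the light passed by that pattern.
A = rng.integers(0, 2, size=(m, n)).astype(float)
y = A @ x                  # the forward model Ax = y
```

Each entry of `y` is a single photodetector reading: the inner product of one pseudo-random mask with the whole scene.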

The case of critical sampling corresponds to $m = n$ and can be solved by ${\textbf x} = {{\textbf A}^{- 1}}{\textbf y}$ if the matrix inverse exists. However, the sensing matrix is not square for the oversampled or the undersampled case, and therefore its inverse does not exist. This problem can be circumvented by multiplying both sides of the equation by the transpose ${{\textbf A}^T}$,

$${{\textbf A}^T}{\textbf A}{\textbf x} = {{\textbf A}^T}{\textbf y}.$$
Since ${{\textbf A}^T}{\textbf A}$ is square and can be inverted if $|{{\textbf A}^T}{\textbf A}| \ne 0$, we can solve for the image
$${\textbf x} = {\left({{{\textbf A}^T}{\textbf A}} \right)^{- 1}}{{\textbf A}^T}{\textbf y} = {{\textbf A}^\dagger}{\textbf y},$$
where the generalized inverse (also known as pseudo-inverse or Moore–Penrose inverse) is defined as ${{\textbf A}^\dagger} = {({{{\textbf A}^T}{\textbf A}})^{- 1}}{{\textbf A}^T}$. The pseudo-inverse is optimal in the sense that it provides the minimal squared error:
$${\hat{\textbf{x}}} = \mathop{{\rm arg}\,{\rm min}}\limits_{\textbf x}\left\| {{\textbf A}{\textbf x} - {\textbf y}} \right\|_2^2,$$
which can be shown by taking the derivative of the cost function and setting it equal to the zero vector. The pseudo-inverse is useful for the oversampled case $m \gt n$, in which case it provides the least-squares solution. However, for the undersampled (compressed sensing) case where infinitely many solutions exist, the pseudo-inverse provides the solution with the smallest Euclidean length (e.g., power), which is rarely the correct solution in practice. This is where we must incorporate prior information to restrict the possible set of solutions as depicted in Fig. 2(d). Sparsity is a very powerful tool to achieve this goal. Therefore, we will focus on sparsity in the next section. However, other prior information that can constrain the possible set of solutions also helps, such as non-negativity, small total-variation (TV) norm, or machine learning based priors that are learned from a dataset. Effects of noise can be modeled using ridge regression, which imposes an ${l_2}$ penalty on the signal and fundamentally assumes zero-mean Gaussian noise; however, that alone would fail to recover signals captured below the Nyquist limit, which requires compressible signals. A combination of these loss functions can be used to further constrain the set of possible signals.
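A short numerical sketch contrasts the two regimes (illustrative NumPy code with arbitrary dimensions): in the oversampled case the pseudo-inverse gives a good least-squares estimate, while in the undersampled case it returns the minimum-norm solution, which is consistent with the data but not sparse:

```python
import numpy as np

rng = np.random.default_rng(1)

# Oversampled case (m > n): noisy measurements, least-squares estimate.
n, m = 5, 50
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = A @ x_true + 0.01 * rng.standard_normal(m)
x_ls = np.linalg.pinv(A) @ y          # Moore-Penrose pseudo-inverse
assert np.allclose(x_ls, x_true, atol=0.05)

# Undersampled case (m < n): pinv returns the minimum l2-norm solution,
# which satisfies the measurements but is generally dense, not sparse.
A_u = rng.standard_normal((3, 10))
x_sparse = np.zeros(10)
x_sparse[4] = 1.0                      # the 1-sparse ground truth
y_u = A_u @ x_sparse
x_min_norm = np.linalg.pinv(A_u) @ y_u
assert np.allclose(A_u @ x_min_norm, y_u)               # consistent with data
assert np.count_nonzero(np.abs(x_min_norm) > 1e-8) > 1  # but not 1-sparse
```

The second half illustrates exactly why a sparsity prior is needed: the minimum-energy solution spreads the signal across many entries instead of recovering the single bright pixel.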

C. Sparsity

Sparse signals are compressible since only a few nonzero elements need to be stored to accurately represent the signal. Most natural signals are sparse in some domain and therefore can be compressed in the digital domain (e.g., JPEG for image compression) with very little loss in signal quality. Compressed sensing advocates that, since the vast majority of signals are compressible, compression prior to analog-to-digital conversion would make the data collection process more efficient. A more efficient data collection process would imply either (i) fewer measurements can be used to reconstruct a signal or (ii) the same number of measurements can be used to extract more information about the signal.

Some signals are naturally sparse in the measurement domain, such as stars in the night sky. Clearly, raster scanning these scenes is inefficient since most of the time our detector would measure zero. For other signals, a sparse domain of representation can be realized by applying a sparsifying transformation such as the DCT, wavelet transform, or basis vectors learned from training data. With a sparsifying transformation, we have

$${\textbf A}{\boldsymbol \Psi}{\boldsymbol \alpha} = {\textbf y},$$
where ${\textbf A}$ is the $m \times n$ sampling matrix (e.g., pseudo-random binary pattern), ${\boldsymbol \Psi}$ is the $n \times n$ sparsifying transformation (e.g., DCT), and ${\boldsymbol \alpha}$ is the $n \times 1$ sparse representation of our signal. In this case, the sensing matrix becomes ${{\textbf A}^\prime} = {\textbf A}{\boldsymbol \Psi}$, and the linear problem becomes
$${{\textbf A}^\prime}{\boldsymbol \alpha} = {\textbf y}.$$
Once a sparse solution is found, the original signal can be recovered using ${\textbf x} = {\boldsymbol \Psi}{\boldsymbol \alpha}$. Therefore, for any signal sparse in some arbitrary domain, we can always write a linear matrix-vector equation of the form Eq. (8). However, solving Eq. (8) is equivalent to solving Eq. (3); therefore, we will simply refer to ${\textbf x}$ as a sparse vector in the following sections without any loss of generality.
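The change of variables ${\textbf x} = {\boldsymbol \Psi}{\boldsymbol \alpha}$ can be sketched with an explicitly constructed orthonormal DCT matrix (illustrative NumPy code; the signal and sampler are arbitrary examples):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II analysis matrix C, such that alpha = C @ x."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    C = np.cos(np.pi * (2 * k + 1) * j / (2 * n))
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

n = 32
C = dct_matrix(n)
assert np.allclose(C @ C.T, np.eye(n))      # orthonormal basis

Psi = C.T                                   # synthesis: x = Psi @ alpha

# Build a signal that is exactly 2-sparse in the DCT domain.
alpha_true = np.zeros(n)
alpha_true[3], alpha_true[7] = 2.0, -1.0
x = Psi @ alpha_true
assert np.allclose(C @ x, alpha_true)       # analysis recovers alpha exactly

# With a pseudo-random binary sampler A, the effective sensing matrix
# becomes A' = A @ Psi, so A' @ alpha equals A @ x.
rng = np.random.default_rng(2)
A = rng.integers(0, 2, size=(10, n)).astype(float)
A_prime = A @ Psi
assert np.allclose(A_prime @ alpha_true, A @ x)
```

The final assertion is exactly the equivalence of Eq. (7) and Eq. (8): solving for the sparse ${\boldsymbol \alpha}$ and solving for ${\textbf x}$ are the same problem up to the fixed basis ${\boldsymbol \Psi}$.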

The inverse problem of recovering ${\textbf x}$ from a small set of measurements ${\textbf y} = {\textbf A}{\textbf x}$ involves two critical questions: (i) how should we design the sensing matrix ${\textbf A}$ to preserve the essential information in ${\textbf x}$?; and (ii) how can we recover ${\textbf x}$ from measurements ${\textbf y}$? In this section, we summarize a number of desirable properties that ${\textbf A}$ should have and discuss popular algorithms for signal recovery. We delay the photonic implementation of such operations to Section 3 and applications to Section 4.

At first glance, the problem ${\textbf y} = {\textbf A}{\textbf x}$ where ${\textbf A}$ has more columns than rows is underdetermined and, hence, has an infinite number of solutions. Enforcing sparsity on the solution ${\textbf x}$ is a form of regularization, making the problem better posed and leading to our first optimization problem:

$${\hat{\textbf{x}}} = \mathop{{\rm arg}\,{\rm min}}\limits_{\textbf x}{\left\| {\textbf x} \right\|_0}\;{\rm subject}\;{\rm to}\;{\textbf A}{\textbf x} = {\textbf y},$$
where we attempt to minimize the number of nonzero elements in ${\textbf x}$. In other words, out of all possible candidates ${\textbf x}$ that satisfy the constraint set $\{{\textbf x}\;|\;{\textbf A}{\textbf x} = {\textbf y}\}$, we propose to select the simplest (sparsest) one, following our original sparsity motivation. The ${\ell _0}$-minimization problem in Eq. (9) turns out to belong to a class of combinatorial constrained optimization problems, which are notoriously difficult to solve. The key observation is illustrated pictorially in Fig. 4: suppose that the support of ${\textbf x}$ (i.e., the nonzero elements) is known; we can then convert the underdetermined problem to an overdetermined one, which can then be solved efficiently using the pseudo-inverse. Therefore, without knowledge of the support set, one can only try an exhaustive search strategy where all different sparse combinations of nonzero elements in ${\textbf x}$ are investigated until a feasible solution (satisfying the constraint set) is found.
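The exhaustive strategy can be sketched for a toy problem (illustrative NumPy code; the dimensions are kept tiny on purpose, since the number of candidate supports grows combinatorially with $n$ and $S$):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

n, m, S = 8, 4, 1                       # tiny problem: search is feasible
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[5] = 2.0                         # 1-sparse ground truth
y = A @ x_true

best = None
for support in combinations(range(n), S):
    A_S = A[:, support]                 # overdetermined m x S subproblem
    x_S = np.linalg.lstsq(A_S, y, rcond=None)[0]
    if np.linalg.norm(A_S @ x_S - y) < 1e-10:   # feasible: satisfies Ax = y
        best = (support, x_S)
        break

assert best is not None and best[0] == (5,)
assert np.isclose(best[1][0], 2.0)
```

Here only 8 candidate supports exist, but for realistic sizes (say $n = {10^6}$ pixels and $S = {10^4}$) the count is astronomically large, which is why convex relaxations are needed.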

Fig. 4. Underdetermined system of equations can be converted to an overdetermined problem if locations of nonzero elements are known. The subspace solution ${{\textbf x}_S}$ can then be solved for using the pseudo-inverse.

Next, we should immediately observe that the null space of ${\textbf A}$, denoted as ${\cal N}({\textbf A}) = \{{\textbf z}\;|\;{\textbf A}{\textbf z} = {\textbf 0}\}$, holds the key to the unique recovery of ${\textbf x}$ because both ${\textbf x}$ and ${\textbf x} + {\textbf z}$ are possible solutions to ${\textbf y} = {\textbf A}{\textbf x}$. Moreover, since we look for sparse solutions as in Eq. (9), unique sparse recovery can be successful as long as the null space of ${\textbf A}$ does not contain any sparse vectors. More precisely, ${\textbf A}$ uniquely represents all $S$-sparse signals ${\textbf x}$ if and only if ${\cal N}({\textbf A})$ contains no vector in ${\Sigma _{2S}}$, the set of ${2}S$-sparse vectors. This is equivalent to the fact that any random subset of $2S$ columns of ${\textbf A}$ should be linearly independent—a critical property that we strive to achieve in the design of good sensing matrices in Section 3.

D. Restricted Isometry Property and Mathematical Guarantee

While the null-space property of ${\textbf A}$ is both necessary and sufficient for establishing guarantees of successful sparse recovery, these guarantees do not account for noise, and the signals of interest ${\textbf x}$ might only be approximately $S$-sparse. The pioneering work by Candès and Tao in [10] introduces the following crucial property called the restricted isometry property (RIP) on matrices ${\textbf A}$ defined as follows: a matrix ${\textbf A}$ satisfies RIP of order $2S$ if there exists a bound $0 \lt {\delta _{2S}} \lt 1$ such that

$$(1 - {\delta _{2S}})\left\| {\textbf x} \right\|_2^2 \le \left\| {{\textbf A}{\textbf x}} \right\|_2^2 \le (1 + {\delta _{2S}})\left\| {\textbf x} \right\|_2^2$$
holds $\forall {\textbf x} \in {\Sigma _{2S}}$. RIP eventually allows exact sparse recovery of any $S$-sparse signal via ${\ell _1}$-minimization as follows:
$${\hat{\textbf{x}}} = \mathop{{\rm arg}\,{\rm min}}\limits_{\textbf x}{\left\| {\textbf x} \right\|_1}\;{\rm subject}\;{\rm to}\,{\textbf A}{\textbf x} = {\textbf y}.$$
Figure 5 illustrates at a high level why ${\ell _1}$-minimization encourages sparse solutions, showing unit circles for different $p$ values (or more accurately ${l_p}$ balls) in two-dimensional (2D) space [11]. Intuitively, the solution will be at the intersection of the hyper-plane (in red, representing the constraint set satisfying ${\textbf A}{\textbf x} = {\textbf y}$) and the lowest ${l_p}$-constant contours of ${\textbf x}$ (in blue, representing the minimum ${l_p}$-norm). Since the ${l_p}$ balls are “spikier” for $p = 1$ than for $p \gt 1$, the solution at the intersection will more likely be sparse (on either the $x$ or $y$ axis in the 2D space shown).

Fig. 5. Red lines show all the solutions satisfying ${\textbf Ax} = {\textbf y}$. Blue lines show the “${l_p}$ balls” that are grown until they intersect with the red line. Gray circles show the solutions from each process where the two lines intersect.

In fact, even with approximately $S$-sparse signal and noisy measurements ${\textbf y} = {\textbf A}{\textbf x} + {\textbf w}$ where ${\textbf w}$ is an additive noise vector, the following generalized optimization also yields robust recovery:

$${\hat{\textbf{x}}} = \mathop{{\rm arg}\,{\rm min}}\limits_{\textbf x}{\left\| {\textbf x} \right\|_1}\;{\rm subject}\;{\rm to}\,\,{\left\| {{\textbf A}{\textbf x} - {\textbf y}} \right\|_2} \le \epsilon .$$

We can interpret Eq. (10) as saying that ${\textbf A}$ approximately preserves the distance between any pair of $S$-sparse vectors $\{{\textbf u},{\textbf v}\}$ (where we can interpret ${\textbf x}$ as ${\textbf u} - {\textbf v}$). This close relationship between RIP and the Johnson–Lindenstrauss lemma points us to construction of robust matrices ${\textbf A}$ as well-understood dimensionality-reduction operators [12]. At closer inspection, one can observe that RIP also has a tight relationship with the aforementioned null-space property of ${\textbf A}$: any random subset of $2S$ columns of ${\textbf A}$ yields an approximate isometry, and this degree of isometry ${\delta _{2S}}$, as expected, will play an important role in the quality of recovery. Therefore, RIP leads to fundamental implications on recovery robustness with the presence of measurement noise in ${\textbf y}$ as well as with an approximately $S$-sparse signal ${\textbf x}$, establishing the following powerful guarantee: when the sensing matrix ${\textbf A}$ satisfies the RIP of order $2S$ with ${\delta _{2S}} \lt \sqrt 2 - 1$, then solving Eq. (12) yields a solution whose deviation from the ground truth ${\textbf x}$ is bounded by the noise power ${\| {\textbf w} \|_2}$ and the error of the best $S$-sparse approximation to ${\textbf x}$. In other words, the solution ${\hat{\textbf{x}}}$ to Eq. (12) obeys

$${\left\| {{\hat{\textbf{x}}} - {\textbf x}} \right\|_2} \le \frac{{{C_{\cal S}}}}{{\sqrt S}}{\left\| {{\textbf x} - {{\textbf x}_{\cal S}}} \right\|_1} + {C_W} \epsilon ,$$
where ${\textbf x}$ is our ground-truth signal-of-interest, ${{\textbf x}_{\cal S}}$ is the best $S$-sparse estimate to ${\textbf x}$, ${\| {\textbf w} \|_2} \le \epsilon$, and constants $\{{C_{\cal S}},{C_W}\}$ are simple functions of the RIP constant ${\delta _{2S}}$ [13]. The reader should note that when our ground truth ${\textbf x}$ is $S$-sparse and when our measurement ${\textbf y}$ is noiseless, then we can drive the error bound to zero and achieve exact recovery.
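Many solvers exist for the ${\ell _1}$ problems in Eqs. (11) and (12). As one illustrative choice (not a method prescribed in this tutorial), the sketch below uses iterative soft-thresholding (ISTA) on the closely related LASSO form $\frac{1}{2}\| {{\textbf A}{\textbf x} - {\textbf y}} \|_2^2 + \lambda {\| {\textbf x} \|_1}$, which for small $\lambda$ approximates basis pursuit on noiseless data; all dimensions and the regularization weight are arbitrary choices:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=1e-3, n_iter=20000):
    """Minimize 0.5*||Ax - y||_2^2 + lam*||x||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)
    return x

rng = np.random.default_rng(4)
n, m = 32, 16                            # undersampled: m < n
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
x_true = np.zeros(n)
x_true[3], x_true[17] = 1.5, -2.0        # 2-sparse ground truth
y = A @ x_true                           # noiseless measurements

x_hat = ista(A, y)
assert np.linalg.norm(x_hat - x_true) < 0.1
```

With $m = 16$ pseudo-random measurements of a 2-sparse length-32 signal, the ${\ell _1}$-regularized estimate lands essentially on the ground truth, consistent with the exact-recovery statement above.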

E. Coherence, the Gershgorin Disc Theorem, and the Benefit of Random Sensing Matrices

Although RIP has nice mathematical properties and guarantees, it is difficult to evaluate in practice since its evaluation is a combinatorial problem. The coherence of a sensing matrix is much easier to calculate and can often elucidate the trade-offs when designing sampling strategies for photonic applications. For successful recovery, knowing the support of ${\textbf x}$ (i.e., positions of nonzero elements) as shown in Fig. 4 is not enough. We also need to guarantee that the following subspace problem is invertible [14]:

$${\textbf A}_S^T{{\textbf A}_S}{{\textbf x}_S} = {\textbf A}_S^T{\textbf y},$$
where ${{\textbf A}_S}$ contains columns corresponding to the nonzero elements of ${\textbf x}$. Therefore, the eigenvalues of ${\textbf A}_S^T{{\textbf A}_S}$ cannot be zero as that would imply $\textit{det}({{\textbf A}_S^T{{\textbf A}_S}}) = 0$. Intuitively, we can project the signal ${{\textbf x}_S}$ onto the eigenvectors of ${\textbf A}_S^T{{\textbf A}_S}$ and observe that an eigenvalue of zero would multiply the signal’s corresponding coefficient with zero and information about that component would be irrecoverably lost. Furthermore, a small eigenvalue is not desirable as it might push the signal energy in that mode below the noise floor. Estimating the eigenvalues of ${\textbf A}_S^T{{\textbf A}_S}$ proves invaluable in assessing the reconstruction success.

The Gershgorin disc theorem provides an easy and intuitive method for bounding the eigenvalues of nearly diagonal matrices. For diagonal matrices, the eigenvalues are the diagonal elements. For matrices that are nearly diagonal, we would expect the eigenvalues to be close to the diagonal values. The Gershgorin disc theorem states that indeed every eigenvalue of a matrix lies in at least one disc centered at a diagonal element, with radius equal to the sum of the absolute values of the off-diagonal elements in that row [15], as shown graphically in Fig. 6. It is worth noting that the off-diagonal elements could have been picked from the columns instead of the rows since the eigenvalues of a matrix and its transpose are the same.
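The disc construction can be verified directly (illustrative NumPy code; the $3 \times 3$ matrix is an arbitrary example, not the one in Fig. 6):

```python
import numpy as np

# A nearly diagonal example matrix.
M = np.array([[4.0, 0.5, 0.2],
              [0.3, 2.0, 0.4],
              [0.1, 0.2, 1.0]])

centers = np.diag(M)                              # disc centers: diagonal entries
radii = np.sum(np.abs(M), axis=1) - np.abs(centers)  # row sums of off-diagonals

eigvals = np.linalg.eigvals(M)
# Every eigenvalue lies in at least one Gershgorin disc.
for lam in eigvals:
    assert any(abs(lam - c) <= r + 1e-9 for c, r in zip(centers, radii))
```

Because the off-diagonal entries are small, the discs are narrow and the numerically computed eigenvalues stay close to the diagonal values, as the theorem predicts.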

Fig. 6. Applying the Gershgorin disc theorem to estimate eigenvalues of an example $3 \times 3$ matrix. Eigenvalues lie on a disc centered around the diagonal elements with radius determined by the sum of absolute values of the off-diagonal elements. Numerically calculated eigenvalues ${\lambda _i}$ are indeed located inside the disc.

The Gershgorin disc theorem might not always give a tight bound (especially if off-diagonal elements are not small), but it is useful in finding the trade-offs in a compressive sensing scheme. The matrix on the left-hand side of the inverse problem measures the overlap of basis functions:

$${{\textbf A}^T}{\textbf A} = \left[{\begin{array}{*{20}{c}}{\left\langle {{{\textbf a}_1},{{\textbf a}_1}} \right\rangle}& \cdots &{\left\langle {{{\textbf a}_1},{{\textbf a}_n}} \right\rangle}\\ \vdots & \ddots & \vdots \\{\left\langle {{{\textbf a}_n},{{\textbf a}_1}} \right\rangle}& \cdots &{\left\langle {{{\textbf a}_n},{{\textbf a}_n}} \right\rangle}\end{array}} \right],$$
where ${{\textbf a}_i}$ are the column vectors of ${\textbf A}$. The eigenvalues of this matrix will be in a disc centered around the diagonal elements ${\lambda _{c,i}} = \langle {{{\textbf a}_i},{{\textbf a}_i}} \rangle$ with a radius ${R_i} = \sum\nolimits_{j \ne i} | {\langle {{{\textbf a}_i},{{\textbf a}_j}} \rangle} |$. When solving the subspace problem, ${\textbf A}_S^T{{\textbf A}_S}{\textbf x} = {\textbf A}_S^T{\textbf y}$, successful reconstruction would need eigenvalues of ${\textbf A}_S^T{{\textbf A}_S}$ to be nonzero. In the worst case, all correlated signals would appear in the subspace problem so it is useful to define the mutual coherence or simply the coherence of the sensing matrix ${\textbf A}$, $\mu({\textbf A})$, as [16]
$$\mu ({\textbf A}) = \mathop {\max}\limits_{1 \le i \ne j \le n} |\langle {{\textbf a}_i},{{\textbf a}_j}\rangle |,$$
which is easily computable as the largest off-diagonal element in magnitude in ${{\textbf A}^T}{\textbf A}$. Suppose the column vectors are normalized, i.e., ${\| {{{\textbf a}_i}} \|_2} = 1$; in the worst case, the Gershgorin discs push the eigenvalues towards zero. For an $S$-sparse problem, we therefore need $1 - (S - 1)\mu ({\textbf A}) \gt 0$ if the support of the sparse signal is known. Otherwise, we need $2S$ measurements to differentiate between two $S$-sparse candidate solutions when using the combinatorial approach as in Eq. (9), in which case we need $1 - (2S - 1)\mu ({\textbf A}) \gt 0$. In other words, we obtain the following guarantee: when the coherence is low enough, $\mu({\textbf A}) \lt \frac{1}{{2S - 1}}$, or, equivalently, when the signal is sparse enough, $S \lt \frac{1}{2}({1 + \frac{1}{{\mu ({\textbf A})}}})$, then the ${\ell _1}$-minimization in Eq. (11) or Eq. (12) recovers the original sparse signal [14].
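The coherence and the resulting sparsity guarantee are straightforward to compute. The sketch below assumes a random Gaussian sensing matrix with normalized columns, an illustrative choice rather than any particular photonic system:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 64, 256
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)            # normalize columns: ||a_i||_2 = 1

Gram = A.T @ A                            # matrix of basis-function overlaps
mu = np.max(np.abs(Gram - np.eye(n)))     # largest off-diagonal magnitude

# Coherence-based guarantee: l1 minimization recovers any S-sparse signal
# with S < (1 + 1/mu) / 2.
S_max = int(np.floor(0.5 * (1 + 1 / mu)))
print(mu, S_max)
```

For these dimensions the guarantee is quite pessimistic (often only $S = 1$), consistent with the qualitative nature of coherence bounds discussed below.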

Coherence is much easier to calculate in practice for photonic systems compared to the RIP. For example, for a free-space imaging system, $\mu$ would be the overlap integral between point spread functions (PSFs) separated by the sampled distances. If the PSF is very broad, then the overlap integral and therefore $\mu$ would be large, which would reduce resolution.

It should be noted that since the coherence and Gershgorin bounds are not very tight, results derived from them should be treated as qualitative. In fact, tighter bounds can be obtained using RIP or the null-space property of the sensing matrix. However, it is satisfying to clearly see that a lower sparsity level or lower coherence yields better reconstruction. The sparsity level can be reduced by obtaining better basis functions, possibly through machine learning techniques, whereas the coherence level $\mu({\textbf A})$ can be reduced through better sensing matrix design.

Pseudo-random matrices are often employed since they have low coherence with high probability and they yield elegant mathematical proofs and bounds. For random matrices, we can obtain robust recovery when the number of measurements is simply high enough [4]:

$$m \gt CS\;{\rm ln}\left({\frac{N}{S}} \right),$$
where $C$ is a coherence-dependent constant. This is a significant improvement over the coherence treatment above. For this reason, typically, the goal for optical hardware is to implement pseudo-random multiplexing of the signal of interest, and this is detailed in Section 3.

It is also worth noting that small coherence, a large number of measurements, and sparse signals all clearly aid reconstruction success. In fact, Donoho and Tanner show that noise-free reconstruction success exhibits abrupt phase transitions that vary as a function of the undersampling fraction and the sparsity fraction [17]. This behavior is shown to hold for a large class of problems and can provide guidance on achievable compression ratios. In the author’s experience, the value of $C$ is roughly 3–4, which means that for most practical applications of compressed sensing the sparsity level should be below 5%–10% to claim significant benefits over Nyquist sampling.
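To make the bound concrete, the following sketch evaluates Eq. (17) for a hypothetical signal length and sparsity, using the empirical $C \approx 4$ quoted above:

```python
import numpy as np

# Hypothetical problem size: N-sample signal with S nonzero coefficients,
# sparsity fraction S/N ~ 2.4%, below the 5%-10% rule of thumb above.
N, S, C = 4096, 100, 4
m_cs = int(np.ceil(C * S * np.log(N / S)))   # measurements from m > C*S*ln(N/S)
print(m_cs, m_cs / N)                        # measurement count and compression ratio
```

Even at this sparsity level the predicted compression over Nyquist sampling is only a few-fold, which illustrates why very sparse signals are needed for large gains.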

F. Sparse Recovery Algorithms

In this section, we present some of the most common algorithms and techniques used for sparse recovery. Some are presented in pseudo-code form as well (Algorithms 1–4), with consistent notation to highlight the differences between the algorithms.

1. Greedy Algorithms

Greedy recovery strategies have been known and developed long before compressed sensing theory [18]. Algorithms in this direction attempt to identify the support ${\cal S}$ of the sparse signal and then employ the pseudo-inverse of the corresponding set of columns in ${\textbf A}$ to obtain the magnitude of the sparse codes, as demonstrated previously in Fig. 4. The optimization problem here can be posed as

$${\hat{\textbf{x}}} = \mathop{{\rm arg}\,{\rm min}}\limits_{\textbf x}{\left\| {{\textbf A}{\textbf x} - {\textbf y}} \right\|_2}\;\;{\rm subject}\,\,{\rm to}\;\;{\left\| {\textbf x} \right\|_0} \le S.$$
Among these greedy algorithms, the strategies for finding the support ${\cal S}$ are vastly diverse. Orthogonal matching pursuit (OMP) [19] grows this set one-by-one in a truly greedy fashion, whereas subspace pursuit (SP) [20] and compressive sampling matching pursuit (CoSaMP) [21] attempt to identify this set (or at least its approximation) in each of their iteration steps. All three try to identify the support from the set of columns that have high coherence with the current residue vector, again demonstrating the importance of the coherence $\mu({\textbf A})$ in the recovery process—the lower it is, the more accurate the support-recovery step. Another key component that improves the robustness of all three algorithms is keeping the approximation residue orthogonal to the best estimate of ${\textbf x}$ so far via a simple projection. Finally, we note that SP and CoSaMP have a backtracking mechanism to eliminate “bad” indices that crept into the support set in earlier iterations, while OMP does not (although it is quite simple to modify OMP to do so).

Algorithm 1. Orthogonal Matching Pursuit [19]
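To make Algorithm 1 concrete, here is a minimal NumPy sketch of OMP; the random Gaussian matrix and 2-sparse test signal at the end are illustrative assumptions, not part of the original algorithm:

```python
import numpy as np

def omp(A, y, S):
    """Orthogonal matching pursuit sketch: greedily grow the support, then
    re-fit by least squares so the residual stays orthogonal to the
    selected columns."""
    support, x = [], np.zeros(A.shape[1])
    r = y.copy()
    for _ in range(S):
        j = int(np.argmax(np.abs(A.T @ r)))    # column most coherent with residual
        if j not in support:
            support.append(j)
        xs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ xs             # residual after orthogonal projection
    x[support] = xs
    return x

# Toy noiseless check: recover a 2-sparse signal from 60 random measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 80))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(80)
x_true[[7, 30]] = [2.0, -1.5]
x_hat = omp(A, A @ x_true, S=2)
print(np.linalg.norm(x_hat - x_true))
```

On this easy noiseless instance the support is identified correctly and the least-squares re-fit makes the recovery essentially exact.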


Algorithm 2. Subspace Pursuit [20] and CoSaMP [21]

2. Linear Programming

Convex relaxation of the original optimization problem from ${\ell _0}$ penalty to ${\ell _1}$ can make reconstruction significantly faster. Moreover, ${\ell _1}$-minimization guarantees to find the sparsest solution when the level of coherence $\mu({\textbf A})$ is low enough as discussed in previous sections. Linear programming is a mature optimization technique available in many specialized software packages, and therefore its inner workings will not be discussed here [22]. Its canonical form is

$${\hat{\textbf{x}}} = \mathop{{\rm arg}\,{\rm min}}\limits_{\textbf x}\,{{\textbf c}^T}{\textbf x}\,\,{\rm subject}\,\,{\rm to}\,\,{\textbf A}{\textbf x} \le {\textbf y}\,\,{\rm and}\,\,{\textbf x} \ge {\textbf 0}.$$
The sparse recovery problem via ${\ell _1}$-norm minimization as in Eq. (11) can be cast into a linear program as follows. We can represent the sparse vector ${\textbf x}$ as ${\textbf x} = {\textbf u} - {\textbf v}$, where both ${\textbf u}$ and ${\textbf v}$ are non-negative, capturing the positive and negative nonzero components of ${\textbf x}$, respectively. This implies that ${u_i} \ge 0$ and ${v_i} \ge 0$ for any $i$, and the ${\ell _1}$-norm cost function of ${\textbf x}$ is then ${\| {\textbf x} \|_1} = {{\textbf 1}^T}{\textbf u} + {{\textbf 1}^T}{\textbf v}$. Therefore, we can convert Eq. (11) to the following equivalent standard linear program:
$${\hat{\boldsymbol z}} = \mathop{{\rm arg}\,{\rm min}}\limits_{\textbf z}{{\textbf 1}^T}{\textbf z}\,\,{\rm subject}\,\,{\rm to}\,\,{\textbf B}{\textbf z} = {\textbf y}\,\,{\rm and}\,\,{\textbf z} \ge {\textbf 0},$$
where ${{\textbf z}^T} = {\textbf [}{{\textbf u}^T}\;\;{{\textbf v}^T}]$ and ${\textbf B} = [{\textbf A}\;\; - {\textbf A}]$.
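This reformulation maps directly onto an off-the-shelf LP solver. As a sketch, the example below uses SciPy's `linprog` (whose default variable bounds are already ${\textbf z} \ge {\textbf 0}$) on a small hypothetical problem:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 30, 60
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(n)
x_true[[3, 25]] = [1.0, -2.0]
y = A @ x_true

# l1 minimization as a standard-form LP: x = u - v with u, v >= 0,
# so ||x||_1 = 1^T u + 1^T v and the constraint Ax = y becomes Bz = y.
B = np.hstack([A, -A])
c = np.ones(2 * n)
res = linprog(c, A_eq=B, b_eq=y)        # default bounds enforce z >= 0
x_hat = res.x[:n] - res.x[n:]
print(np.linalg.norm(x_hat - x_true))
```

For this well-conditioned 2-sparse instance, ${\ell _1}$ minimization recovers the true signal to solver precision.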

Algorithm 3. Iterative Hard Thresholding [23]


Algorithm 4. Iterative Shrinkage Thresholding [24] and Fast Iterative Shrinkage Thresholding [25]

3. Iterative Convex Optimization

In practice, when the dimension of the problem becomes large, we typically convert the constrained optimization problem in Eqs. (11) or (12) into the following unconstrained convex optimization problem:

$${\hat{\textbf{x}}} = \mathop{{\rm arg}\,{\rm min}}\limits_{\textbf x}\;\;\frac{1}{2}\left\| {{\textbf y} - {\textbf A}{\textbf x}} \right\|_2^2 + \lambda {\left\| {\textbf x} \right\|_1},$$
where the first strictly convex term $\frac{1}{2}\| {{\textbf y} - {\textbf A}{\textbf x}} \|_2^2$ enforces measurement consistency, the second term ${\| {\textbf x} \|_1}$ is the regularization penalty that encourages sparsity in the solution, and the hyperparameter $\lambda$ controls the desired sparsity level. The data-fidelity term ${\| {{\textbf y} - {\textbf A}{\textbf x}} \|_2}$ is commonly squared to avoid handling the square root associated with the ${\ell _2}$-norm. Note that for a certain choice of $\lambda$, the solutions obtained from Eq. (12) and from Eq. (21) are the same. However, the value of $\lambda$ that makes these problems equivalent is generally unknown a priori. The choice of $\lambda$ trades off sparsity for agreement with the measurements; since the optimal value is problem dependent and also not known a priori, it is often set empirically.

The simplest strategy in this direction is the iterative hard thresholding (IHT) algorithm [23], described in Algorithm 3. Starting from the trivial signal estimate ${\hat{\textbf{x}}} = {\textbf 0}$, the algorithm iterates a gradient-descent step followed by hard thresholding (as shown in the left of Fig. 7) to enforce the sparsity regularization until a convergence criterion is met. In fact, we can even classify IHT under the greedy pursuit category where the hard thresholding step at each iteration reveals the $S$-sparse support of the current estimate.
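A minimal IHT sketch follows; the step size is kept below $1/\|{\textbf A}\|_2^2$ for stability, and the toy instance at the end is an illustrative assumption:

```python
import numpy as np

def iht(A, y, S, step, iters=1000):
    """Iterative hard thresholding sketch: a gradient-descent step on
    ||y - Ax||^2 followed by keeping only the S largest-magnitude entries."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x + step * (A.T @ (y - A @ x))   # gradient-descent step
        idx = np.argsort(np.abs(g))[-S:]     # S-sparse support estimate
        x = np.zeros_like(g)
        x[idx] = g[idx]                      # hard threshold (left of Fig. 7)
    return x

# Toy noiseless check; step stays below 1/||A||_2^2 so the iteration is stable.
rng = np.random.default_rng(3)
A = rng.standard_normal((60, 70))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(70)
x_true[[7, 30]] = [2.0, -1.8]
y = A @ x_true
x_hat = iht(A, y, S=2, step=0.9 / np.linalg.norm(A, ord=2) ** 2)
print(np.linalg.norm(x_hat - x_true))
```

Note that a step size above $1/\|{\textbf A}\|_2^2$ can make the gradient iteration diverge, which is why the spectral norm is computed explicitly here.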

Fig. 7. Thresholding operations are used in iterative convex optimization routines. Hard thresholding (left) simply sets signal elements with absolute value below a certain threshold $T$ to zero while keeping the rest of the values the same. In contrast, soft thresholding (right) scales all the values down by $T$ and then sets elements with small absolute values to zero.

A natural improvement in this direction is the iterative shrinkage thresholding algorithm (ISTA) [24] as depicted in Algorithm 4, which borrows powerful ideas from past signal denoising efforts. ISTA consists of two steps: a gradient-descent step and a proximal shrinkage step via soft thresholding of the result obtained from the gradient step to generate sparsity. The most popular ISTA-style algorithm is a modified fast version named FISTA [25]; it accelerates ISTA convergence significantly by adding a momentum term with adaptive step size in each of its iterations. Another well-known algorithm with a different approach to speeding up convergence is the two-step ISTA coined TwIST [26], which adds an element-wise reweighted scheme and a two-step-back update to the shrinkage thresholding step for enhanced robustness.
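The soft-thresholding operator and the resulting ISTA loop fit in a few lines. This is a sketch of plain ISTA applied to Eq. (21) with a hypothetical $\lambda$, without the acceleration of FISTA or TwIST:

```python
import numpy as np

def soft(u, T):
    # Soft threshold (right of Fig. 7): shrink magnitudes by T, zero the small ones.
    return np.sign(u) * np.maximum(np.abs(u) - T, 0.0)

def ista(A, y, lam, iters=2000):
    """ISTA sketch: a gradient step on 0.5*||y - Ax||^2 followed by a
    proximal soft-thresholding step that promotes sparsity."""
    L = np.linalg.norm(A, ord=2) ** 2     # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x + A.T @ (y - A @ x) / L, lam / L)
    return x

# Toy check: for small lambda the lasso solution approaches the 2-sparse truth.
rng = np.random.default_rng(5)
A = rng.standard_normal((60, 70))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(70)
x_true[[4, 50]] = [2.0, -1.8]
x_hat = ista(A, A @ x_true, lam=0.01)
print(np.linalg.norm(x_hat - x_true))
```

The residual error does not vanish entirely because the ${\ell _1}$ penalty slightly biases (shrinks) the nonzero coefficients, as expected from Eq. (21).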

Alternatively, we can opt for converting the constrained optimization problem in Eq. (11) to its unconstrained form by introducing additional Lagrange multipliers ${{\boldsymbol \lambda}^T}({\textbf y} - {\textbf A}{\textbf x})$ and then strengthening these constraints further via an additional penalty term $\frac{\gamma}{2}\| {{\textbf y} - {\textbf A}{\textbf x}} \|_2^2$. Such a strategy is known as the augmented Lagrangian multiplier technique [27], and the overall cost function becomes

$${\hat{\textbf{x}}} = \mathop{{\rm arg}\, {\rm min}}\limits_{{\textbf x},{\boldsymbol \lambda}}\;\;{\left\| {\textbf x} \right\|_1} + {{\boldsymbol \lambda}^T}({\textbf y} - {\textbf A}{\textbf x}) + \frac{\gamma}{2}\left\| {{\textbf y} - {\textbf A}{\textbf x}} \right\|_2^2,$$
which is commonly solved iteratively with gradient descent in an alternating direction fashion (fix ${\boldsymbol \lambda}$ and solve for ${\textbf x}$, then fix ${\textbf x}$ and solve for ${\boldsymbol \lambda}$ within each iteration). In fact, this approach can be employed on splitting sets of variables, which turns out to be very efficient and effective in large-scale applications. Another motivation for the splitting algorithms is to allow the incorporation of nonsmooth functions such as shrinkage steps or clipping into the reconstruction model. The cost function in this case becomes
$$\{{\hat{\textbf{x}}},{\hat{\textbf{z}}}\} = \mathop{{\rm arg}\, {\rm min}}\limits_{{\textbf x},{\textbf z}}\;\;{\left\| {\textbf x} \right\|_1} + \frac{\gamma}{2}\left\| {{\textbf y} - {\textbf A}{\textbf z}} \right\|_2^2\;\;{\rm subject}\,\,{\rm to}\;\;{\textbf x} = {\textbf z}.$$
Applying the augmented Lagrangian strategy on the “constraints” ${\textbf x} = {\textbf z}$ leads to the following optimization problem:
$$\begin{split}\{{\hat{\textbf{x}}},{\hat{\textbf{z}}}\} &= \mathop{{\rm arg}\, {\rm min}}\limits_{{\textbf x},{\textbf z},{\boldsymbol \lambda}}\;\;{\left\| {\textbf x} \right\|_1} + \frac{\gamma}{2}\left\| {{\textbf y} - {\textbf A}{\textbf z}} \right\|_2^2 \\&\quad+ {{\boldsymbol \lambda}^T}({\textbf x} - {\textbf z}) + \frac{\alpha}{2}\left\| {{\textbf x} - {\textbf z}} \right\|_2^2,\end{split}$$
where techniques from gradient descent, shrinkage thresholding, and alternating direction methods [28] can be combined to solve the problem iteratively. It is worth noting that we move back and forth between unconstrained and constrained optimization here as part of a divide-and-conquer strategy typically employed in large, complex optimization problems.
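A sketch of one such alternating-direction iteration for the split problem of Eq. (24) follows; the penalty weights $\gamma$, $\alpha$ and the toy problem are illustrative assumptions, not a definitive implementation:

```python
import numpy as np

def admm_l1(A, y, gamma=100.0, alpha=1.0, iters=2000):
    """Alternating-direction sketch for min ||x||_1 + (gamma/2)||y - Az||^2
    subject to x = z: cycle a quadratic z-update, a soft-threshold x-update,
    and a gradient-ascent update of the multipliers lam."""
    n = A.shape[1]
    x, z, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    M = np.linalg.inv(gamma * A.T @ A + alpha * np.eye(n))  # z-step system, precomputed
    for _ in range(iters):
        z = M @ (gamma * A.T @ y + lam + alpha * x)         # fix x, lam; solve for z
        u = z - lam / alpha
        x = np.sign(u) * np.maximum(np.abs(u) - 1.0 / alpha, 0.0)  # shrinkage on x
        lam = lam + alpha * (x - z)                         # multiplier update
    return x

rng = np.random.default_rng(6)
A = rng.standard_normal((60, 70))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(70)
x_true[[10, 33]] = [2.0, -1.8]
x_hat = admm_l1(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))
```

Each subproblem is cheap: the z-update is a linear solve (precomputed here), the x-update is a closed-form shrinkage, and the multiplier update is a single vector operation.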

Probabilistic optimization routines, which are nonetheless a large and important group of algorithms, are largely omitted in this tutorial due to space constraints. One example approach is from [29], where shot-noise-limited measurements are modeled using a Poisson model. The log likelihood is then optimized along with ${\ell _1}$ and TV-norm penalty terms using separable quadratic approximations.

Fig. 8. Patch-based image recovery iterates between two steps—global reconstruction of the image estimates and maximizing the sparsity level of all local image patches. Patch-based recovery works better for local sensing. Figure adapted from [30].

4. Local versus Global Recovery

In practice, there are at least two compelling reasons why one would prefer local patch- or block-based recovery over global image-based recovery. The first reason involves efficiency: it is much simpler and faster to recover a small image patch or region, especially in a high-resolution setting. More importantly, if we follow the well-grounded auto-regressive image model from the compression community [31], the larger the patch size, the less homogeneous (the more complex) it becomes, which increases $S$, making the signal less sparse and rendering sparse recovery less effective.

An immediate solution to this local-versus-global trade-off is to rely on multi-resolution representations such as the wavelet transform or the redundant wavelet packet [32] as the sparsifying operator ${\boldsymbol \Psi}$ in the recovery process. However, if sensing is performed locally (as it often is, as discussed in the next section), we can recover any random patch of any random size at any location in the image without conforming to a certain grid since overlapped recovered regions can simply be averaged to provide even more robust recovery and to prevent undesired blocking artifacts. One example of such a local–global recovery strategy is depicted in Fig. 8 where the iterative optimization algorithm seeks the sparsest set of DCT coefficients of overlapped patches that are consistent with the CS measurements [33,34]:

$$\{{{\hat{\boldsymbol \alpha}}^k}\} = \mathop{{\rm arg}\, {\rm min}}\limits_{{{\boldsymbol \alpha}^k}}\;\;\sum\limits_k {\left\| {{{\boldsymbol \alpha}^k}} \right\|_1}\;\;{\rm subject}\,\,{\rm to}\;\;{{\boldsymbol \Phi}_k}{\cal P}(\{{\boldsymbol \Psi}({{\boldsymbol \alpha}^k}{)\} _k}) = {{\textbf y}_k},$$
where ${{\boldsymbol \alpha}^k}$ are DCT coefficients of the $k$th image patch associated with the local CS measurements ${{\textbf y}_k} = {{\boldsymbol \Phi}_k}{\boldsymbol \Psi}({{\boldsymbol \alpha}^k})$, ${\boldsymbol \Psi}$ is the IDCT operator acting as the local sparsifying transform, and ${\cal P}(.)$ is the global projection operator that combines the set of all involved image patches ${\boldsymbol \Psi}({{\boldsymbol \alpha}^k})$ back to the original image domain. The optimization problem in Eq. (25) can be solved efficiently by an iteratively alternating minimization procedure [28]. At iteration $t$ of the algorithm, a noisy estimate of the original global image ${{\textbf G}_t}$ consistent with the observations is reconstructed based on the information from the previous iteration, and the estimates of the coefficients ${{\boldsymbol \alpha}^k}$ at this iteration can then be found by soft-thresholding the DCT coefficients of the noisy image patches extracted from ${{\textbf G}_t}$.

5. Deep Neural Networks

Recent works in compressed sensing and sparse coding have focused on the application of new tools and ideas from deep neural networks. Initial models use a pretrained deep neural network (DNN) ${{\cal G}_{\boldsymbol \theta}}$ (which can be a variational auto-encoder, a generative adversarial network, etc.) as the structural constraint instead of sparsity [35,36]. This generator maps a latent representation or low-dimensional sparse code ${\textbf z}$ to the signal space: ${\textbf x} = {{\cal G}_{\boldsymbol \theta}}({\textbf z})$. Instead of requiring sparse signals, the deep network ${{\cal G}_{\boldsymbol \theta}}$ implicitly constrains the output ${\textbf x}$ to lie on a low-dimensional manifold via its architecture and the weights adapted from the training data. Effectively, the choice of training data provides the prior knowledge that limits the set of possible solutions shown in Fig. 2(d). In other words, the network can learn the mapping that generates the high-dimensional signal ${\textbf x}$ in ${{\cal R}^N}$ from a latent low-dimensional sparse code ${\textbf z}$ in ${{\cal R}^S}$. Often, the sensing matrix can be co-trained with the optimizer. A minimization process similar to that in compressed sensing is then employed for reconstruction

$${\hat{\textbf{z}}} = \mathop{{\rm arg}\, {\rm min}}\limits_{\textbf z}\;\;\left\| {{\textbf y} - {\textbf A}{{\cal G}_{\boldsymbol \theta}}({\textbf z})} \right\|_2^2,$$
and the signal-of-interest can be obtained as ${\hat{\textbf{x}}} = {{\cal G}_{\boldsymbol \theta}}({\hat{\textbf{z}}})$.

Due to the nonlinear complex nature of deep neural networks, the cost function in Eq. (26) is highly nonconvex and the optimization problem is intractable [35]. A well-behaved approximation strategy involves gradient descent starting from randomly sampled points in the a priori signal distribution ${p_{\boldsymbol z}}({{\textbf z}_k})$ as follows:

$${{\textbf z}_{k + 1}} \leftarrow {{\textbf z}_k} - \alpha \frac{\partial}{{\partial {\textbf z}}}\left\| {{\textbf y} - {\textbf A}{{\cal G}_{\boldsymbol \theta}}({\textbf z})} \right\|_2^2\;{|_{{\textbf z} = {{\textbf z}_k}}},$$
where $\alpha$ is the gradient-descent step size or the learning rate. Typically, training the network is a laborious effort involving numerous different measurement vectors and numerous gradient-descent steps with restarts from many initial points.
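A toy numerical sketch of the update in Eq. (27) follows. It uses a hypothetical fixed two-layer "generator" with random (untrained) weights purely to illustrate the latent-space gradient descent; a real application would use a network trained on representative data:

```python
import numpy as np

rng = np.random.default_rng(7)
S, N, m = 4, 50, 20                      # latent dim, signal dim, measurements
W1 = rng.standard_normal((16, S)) / 4    # hypothetical generator weights (untrained)
W2 = rng.standard_normal((N, 16)) / 8
G = lambda z: W2 @ np.tanh(W1 @ z)       # z -> x mapping (stand-in for a DNN)

A = rng.standard_normal((m, N)) / np.sqrt(N)
z_true = rng.standard_normal(S)
y = A @ G(z_true)

z = 0.1 * rng.standard_normal(S)         # random starting point
res0 = np.linalg.norm(y - A @ G(z))      # initial data misfit
alpha = 0.02                             # learning rate
for _ in range(5000):
    h = np.tanh(W1 @ z)
    r = y - A @ (W2 @ h)
    # Chain rule for d/dz ||y - A G(z)||^2 with G(z) = W2 tanh(W1 z)
    grad = -2.0 * W1.T @ ((1 - h**2) * (W2.T @ (A.T @ r)))
    z = z - alpha * grad
res = np.linalg.norm(y - A @ G(z))
print(res0, res)
```

Since the cost is nonconvex, a single descent run only guarantees a decrease of the misfit from its starting value; in practice one restarts from several draws of ${\textbf z}$ and keeps the best result.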

Another group of approaches takes inspiration from the existing recovery algorithms augmented by DNN components such as ISTA-Net inspired by the ISTA algorithm [37], AMP-Net inspired by the approximate message passing algorithm (AMP) [38], or ADMM-Net inspired by the alternating direction method of multipliers (ADMM) algorithm [28]. More recent work making use of attention mechanisms such as the transformer architecture is also gaining popularity [39]. A more in-depth review of the literature can be found in [40,41].

Fig. 9. Top: typically an optical compressive sampling scheme will map a point source pseudo-randomly to the sampling device with a limited bandwidth and spatial (or temporal) extent. Bottom: conversely, a sensing point receives information from a region of the signal with a limited bandwidth and spatial (or temporal) extent.

3. OPTICAL TECHNOLOGIES FOR COMPRESSED SENSING

In this section, we review some optical technologies often employed for multiplexing optical information for compressed sensing. Multiplexing information is critical for implementation of the sensing matrix ${\textbf A}$ and generating the measurement vector ${\textbf y}$ in the physical domain prior to sampling.

The optical technology used will have important implications for resolution and therefore for the number of samples needed to represent the reconstructed signal. A common pitfall is to generate an oversampled representation of the signal, which would then overestimate the achieved compression ratio and therefore the performance of the compressed sensing system. The space-bandwidth product (SBP) or time-bandwidth product (TBP) of the sensing waveform produced by the optical hardware and used to impart the sensing matrix will determine the number of pixels needed to appropriately represent the recovered signal. It should be noted that photonic compressed sensing hardware typically does not realize the idealized matrices discussed in Section 2 but rather responses that have finite spatial (or temporal) extent as well as a bounded bandwidth, as shown in Fig. 9. Features of the signal that are smaller than the minimum feature in this sensing waveform will not be recoverable, and for optimal sampling of the reconstructed signal the Nyquist rate is determined from the sensing waveform’s SBP or TBP. The number of compressive measurements needed can then be estimated from Eq. (17), which depends on the scene sparsity and the coherence. Another important consideration is the signal-to-noise ratio (SNR). For example, nonbinary signals or scenes with low contrast generally require measurements with higher SNR since the encoded information of interest will result in smaller changes between measurements. Additionally, systematic errors from nonidealities such as aberrations can be removed by in situ calibration of the photonic system, but this might not always be possible if the dimension of the reconstructed signal is very large (e.g., high pixel counts for imaging).

Fig. 10. (a) Scene information can be multiplexed using a coded aperture, which can be thought of as having many randomly placed pinhole cameras imaging the scene and making many overlapping copies of the object. (b) Picture of different coded apertures printed on a transparency against a fluorescent lamp background. (c) Fields emanating from the scene surface get filtered by the mask and pseudo-randomly multiplexed on a camera. (d) Rays emanating from a scene point $\eta$ get filtered out by the mask at ${y^\prime}$ finally hitting the sensor at $y$. The mask is some distance $b$ away from the scene and $f$ away from the camera. For a mask with a single hole, $f$ corresponds to the pinhole camera focal length. Similar triangles can be used to derive an expression for ${y^\prime}$ in terms of $y$ and $\eta$, which helps simplify the image formation Eq. (28), where $m$ is the mask magnification.

Remember that pseudo-random sensing matrices are extremely useful in the context of compressed sensing, and therefore the optical technologies generally focus on implementing pseudo-random multiplexing. These optical multiplexing technologies can be split into two broad categories: (i) passive multiplexing, which leverages static optical devices, and (ii) active multiplexing, where a dynamic optical device is leveraged and some electronic device control is typically needed.

A. Passive Multiplexing

1. Coded Aperture

Coded apertures block light with patterned opaque materials in order to multiplex intensity information onto a sensor, an example of which is shown in Fig. 10. One of the earliest examples is from x-ray imaging, where lenses are impractical since materials have near-unity refractive indices at x-ray wavelengths, and bulky mirrors operated near grazing angles are not desirable. Dicke [42] found that randomly distributed pinholes (50% open) can be used to efficiently image stars in the x-ray band by placing the pinholes between the object and image planes and using no further optics. Although a single pinhole can achieve imaging with high resolution (i.e., camera obscura), most of the light is lost, which reduces the SNR. With multiple pinholes, images from each hole are multiplexed on the measurement screen to increase the SNR and subsequently demultiplexed digitally.

Initial designs had engineered patterns such as uniformly redundant arrays (URAs) [43] or modified URAs (MURAs) [44], which provided desirable correlation properties. We will first investigate the image formation process to understand why. Using the geometry in Fig. 10(a), an object intensity distribution $o(\xi ,\eta)$ (assuming a planar scene) and an aperture function $a({x^\prime},{y^\prime})$ result in the following measurement $p(x,y)$:

$$p(x,y) = \iint o(\xi ,\eta) a\left({\xi - \frac{{\xi - x}}{m},\eta - \frac{{\eta - y}}{m}} \right){\rm d}\xi {\rm d}\eta ,$$
where $b$ is the distance between the object plane and the aperture, $f$ is the distance from the aperture to the camera, and $m = \frac{{b + f}}{b}$ is the magnification factor [43]. This equation can be derived from simple trigonometric arguments shown in Fig. 10(d). A ray emanating from an object point $\eta$ will pass through the aperture at point ${y^\prime}$ and land on a camera pixel at $y$. Therefore light contribution from $\eta$ to $y$ is $o(\eta)a({y^\prime})$, which needs to be written purely in terms of coordinates $\eta$ (image coordinate) and $y$ (measurement coordinate). Without loss of generality, ${y^\prime}$ is offset from $\eta$ by some height $h$, which can be expressed in terms of $y$ and $\eta$ by looking at similar triangles: ${y^\prime} = \eta - h = \eta - \frac{{\eta - y}}{m}$.

As the distance between the scene and the code increases, $b \to \infty$, the magnification factor asymptotically approaches unity, $m \to 1$. In that case, we have

$$\begin{split}p(x,y)& = \iint o(\xi ,\eta) a\left({x,y} \right){\rm d}\xi {\rm d}\eta \\&= a\left({x,y} \right)\iint o(\xi ,\eta)\,{\rm d}\xi {\rm d}\eta .\end{split}$$
This makes intuitive sense since a point source at infinity, $o(\xi ,\eta) = \delta (\xi ,\eta)$, will simply cast a shadow of the aperture onto the camera $p(x,y) = a(x,y)$. Another interesting case is $f = b$, where magnification becomes $m = 2$ and
$$p(x,y) = \iint o(\xi ,\eta) {a_2}\left({x + \xi ,y + \eta} \right){\rm d}\xi {\rm d}\eta ,$$
where ${a_m} = a({x^\prime}/m,{y^\prime}/m)$ is the magnified version of the aperture, which is the equation for correlation and can be denoted by $p = o*{a_m}$. We can discretize the image formation equation for the general case as
$$\begin{split}p(k\Delta x,l\Delta y) &= \sum\limits_{i,j} o(i\Delta \xi ,j\Delta \eta) {a_m}\\&\quad\times\left({i\frac{f}{b}\Delta \xi + k\Delta x,j\frac{f}{b}\Delta \eta + l\Delta y} \right)\Delta \xi \Delta \eta .\end{split}$$
If we set reconstructed image resolution as $\Delta \xi = \frac{b}{f}\Delta x$ and $\Delta \eta = \frac{b}{f}\Delta y$, then an increment in $i$ will be equivalent to an increment $k$ and the equation can be further simplified and yield a correlation equation for the general case:
$$P(k,l) = \sum\limits_{i,j} O(i,j)A(i + k,j + l) = O*A,$$
where capital letters are used to denote the discretized versions of the corresponding variables and the $\Delta$ terms are absorbed by the amplitude of $O$. It is worth noting that in general $\Delta x,\Delta y$ are determined by the camera pixel size, and the mask feature size is determined from $\Delta {x^\prime} = \frac{{2\Delta x}}{m}$ and $\Delta {y^\prime} = \frac{{2\Delta y}}{m}$ so that the magnified mask image is sampled at the Nyquist rate on the detector. We can then use $G$, the correlational inverse of the aperture function, to recover the original scene: $\hat O = P*G$, where $\hat O$ is an estimate of $O$. Since $\hat O = O*A*G$, we want $A*G = \delta$. A general class of aperture functions called URAs [43] and MURAs [44] have been found to satisfy the property $A*G = \delta$.

Here, we reproduce the square MURA since it is often employed in coded aperture imaging. For a $p \times p$ aperture $A$ where $p$ is a prime number,

$${A_{\textit{ij}}} = \left\{{\begin{array}{ll}0&{i = 0}\\1&{j = 0,i \ne 0}\\1&{{C_i}{C_j} = 1}\\0&{{\rm otherwise}}\end{array}} \right.,$$
where
$${C_i} = \left\{{\begin{array}{*{20}{l}}1&{{x^2} = i\,\,{\rm mod}\,\,p\,\,{\rm for \, any}\,\,x \in \mathbb{Z}}\\{- 1}&{{\rm otherwise}}\end{array}} \right..$$
The decoding code is then
$${G_{\textit{ij}}} = \left\{{\begin{array}{cl}1&{i + j = 0}\\1&{{A_{\textit{ij}}} = 1\,\,{\rm and}\,\,i + j \ne 0}\\{- 1}&{{A_{\textit{ij}}} = 0\,\,{\rm and}\,\,i + j \ne 0}\end{array}} \right..$$
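The construction above is easy to implement, and its delta-correlation property can be verified directly. The sketch below builds the $p = 5$ MURA from the definitions of ${A_{\textit{ij}}}$, ${C_i}$, and ${G_{\textit{ij}}}$ and checks that the periodic cross-correlation of $A$ and $G$ is a delta function, with peak $(p^2 - 1)/2 = 12$ and exactly zero sidelobes:

```python
import numpy as np

def mura(p):
    """Square MURA aperture A and decoding array G for prime p, following
    the definitions of A_ij, C_i, and G_ij above."""
    qr = {(x * x) % p for x in range(p)}           # quadratic residues mod p
    C = np.array([1 if i in qr else -1 for i in range(p)])
    A = np.zeros((p, p), dtype=int)
    A[1:, 0] = 1                                    # j = 0, i != 0 case
    for i in range(1, p):
        for j in range(1, p):
            if C[i] * C[j] == 1:
                A[i, j] = 1
    G = 2 * A - 1                                   # +1 where A = 1, -1 where A = 0
    G[0, 0] = 1                                     # the i + j = 0 special case
    return A, G

p = 5
A, G = mura(p)
# Periodic cross-correlation P(k, l) = sum_ij A(i, j) * G(i + k, j + l) (mod p)
corr = np.array([[np.sum(A * np.roll(np.roll(G, -k, axis=0), -l, axis=1))
                  for l in range(p)] for k in range(p)])
print(corr)   # peak of 12 at (0, 0), zeros everywhere else
```

The same function can be used for larger primes to generate the masks used in practice; only the peak value $(p^2 - 1)/2$ changes.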
Fig. 11. (a) Optical elements such as diffusers and random microlens arrays can be used to randomly multiplex information from the scene. The resulting caustic pattern is captured by the camera placed such that caustic patterns have high contrast. (b) Image of a commercial diffuser. (c) Caustic pattern for a point source at different lateral (top) and axial (bottom) positions. (c) adapted from [45].

Pseudo-random binary arrays can also be used since they provide greater flexibility in code design. For example, MURAs for large $p$ have roughly 50% transparency, but the experimental setup might require a different ratio: a sensing problem with higher SNR and lower sparsity will favor lower transparency (fewer pinholes). Nevertheless, MURAs lead to computationally cheaper algorithms since the signals can be reconstructed with a simple correlation with the precomputed inverse, $G$. This is beneficial for imaging applications where the pixel count is large and inverting a matrix becomes infeasible.

2. Diffusers and Random Microlens Arrays

Diffusers rely on the surface roughness of a transparent dielectric and the corresponding spatially random refraction to pseudo-randomly multiplex scene information, as shown in Fig. 11(a). Diffusers are in many ways similar to coded apertures but leverage spatial phase modulation rather than amplitude modulation to generate a complex amplitude pattern, known as a caustic pattern, at a distance from the diffuser. Caustic patterns are analogous to the cobweb-like intensity pattern formed on the bottom of a swimming pool on a sunny day due to the random water surface height. Importantly, unlike coded apertures, diffusers do not rely on light absorption and therefore can provide lower loss and better SNR. The simplest way to calibrate a diffuser setup is to operate far from the diffuser. In that case, a single point in the scene is used to calibrate the entire system since any other point on the same plane simply shifts the caustic pattern. Furthermore, points at different depths simply change the magnification of the calibrated system response. Suppose a point on the optical axis leads to a caustic pattern $h(x,y)$; then any point shifted by $(\xi ,\eta ,z)$ from the measurement point will lead to the following system response [46]:

$$h(x,y;\xi ,\eta ,z) = h\left({s\frac{f}{b}\xi + sx,s\frac{f}{b}\eta + sy} \right),$$
where
$$s = \frac{{{m_z}}}{{{m_b}}}, {m_z} = \frac{{z + f}}{z}, {m_b} = \frac{{b + f}}{b}.$$
Note the similarities between the coded aperture and the diffuser response formation equations. ${m_z}$ and ${m_b}$ correspond to the aperture magnification at distances $z$ and $b,$ respectively, for a coded aperture. Since the caustic pattern is measured some distance $b$ away from the diffuser, it needs to be properly scaled for a point at distance $z$.
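As a small worked example of this scaling (all parameter values hypothetical), the factor $s$ can be computed directly from $m_z$ and $m_b$; a point at the calibration depth ($z = b$) needs no rescaling, while points beyond it shrink the caustic:

```python
def caustic_scale(z, b, f):
    """Scale factor s = m_z / m_b relating a caustic calibrated at depth b
    to a point source at depth z, following the equations above."""
    m_z = (z + f) / z
    m_b = (b + f) / b
    return m_z / m_b

# A point at the calibration depth (z = b) needs no rescaling: s = 1
s_cal = caustic_scale(z=50.0, b=50.0, f=5.0)

# A point much farther than the calibration plane shrinks the pattern (s < 1)
s_far = caustic_scale(z=500.0, b=50.0, f=5.0)
```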

A more accurate approach to modeling is to recover the surface roughness profile through phase retrieval algorithms and experiments followed by ray tracing [47]. There are many phase retrieval methods with extensive literature coverage [48–50], so we will not discuss them here.

3. Scattering

Recovering shape information from acoustic or electromagnetic scattering data has a wide range of applications, from geophysical surveying to imaging, and has therefore been extensively studied in the literature under the name inverse scattering [51]; we omit that topic here. In compressed sensing, scattering has been used for two main purposes: (i) to multiplex light information randomly or (ii) to create high bandwidth patterns for structured illumination. Ultimately, the SBP or TBP of the scattering process will constrain the compressed sensing system. Therefore, we will cover some statistical properties following the excellent treatment from Goodman [52], as such properties might be relevant either in optimally sampling the response of a scattering medium or in engineering the bandwidth of the structured illumination. Scattering approaches are similar to the use of diffusers but leverage the random interference of coherent sources transmitted through similar optics and the resulting speckle.

One interesting characteristic is the probability density function (PDF) of the intensity of a speckle pattern (i.e., if we binned the intensity values and counted the frequencies in each bin, what would the histogram look like?), referred to as “first order statistics” by Goodman [52]. The speckle pattern for coherent polarized light can be thought of as arising from a random walk on the complex plane $A(x,y,z) = \frac{1}{{\sqrt N}}\sum\nolimits_{k = 1}^N {a_k}{e^{i{\phi _k}}}$, where ${a_k}$ and ${\phi _k}$ are random variables. The amplitude function is normalized by $\sqrt N$ so that the intensity, which is proportional to $|A{|^2}$, is properly normalized. A random walk in the complex plane yields a Gaussian joint distribution for the real and imaginary parts of the complex amplitude

$$p({A_r},{A_i}) = \frac{1}{{2\pi {\sigma ^2}}}\exp \left\{{- \frac{1}{2}\frac{{A_r^2 + A_i^2}}{{{\sigma ^2}}}} \right\},$$
where ${\sigma ^2} = \mathop {\lim}\nolimits_{N \to \infty} \frac{1}{N}\sum\nolimits_{k = 1}^N \frac{{\langle |{a_k}{|^2}\rangle}}{2}$, which can be derived from the moments of ${A_r}$ and ${A_i}$. Often we measure only the intensity, which is proportional to $|A{|^2} = A_r^2 + A_i^2$. Using the method of transformations and marginalizing over the phase $\theta$, we find that the intensity for polarized speckle has a negative exponential distribution
$$p(I) = \frac{1}{{2{\sigma ^2}}}\exp \left\{{- \frac{I}{{2{\sigma ^2}}}} \right\} = \frac{1}{{\langle I\rangle}}\exp \left\{{- \frac{I}{{\langle I\rangle}}} \right\}$$
for $I \ge 0$. Right away, the error introduced by quantization of a speckle pattern can be calculated from this PDF. We can also investigate what happens when fields are added incoherently on an intensity basis. Intuitively, we know that polarized speckle from a coherent source will have the highest contrast. Therefore, $C = \frac{{{\sigma _I}}}{{\langle I\rangle}}$ is a reasonable metric for contrast, and it is unity ($C = 1$) for the ideal negative exponential distribution above. With more work, it can be shown that the contrast is reduced by a factor of $\sqrt N$ (i.e., $C = 1/\sqrt N$), where $N$ is the number of independent speckle patterns added on an intensity basis.
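These first-order statistics are easy to verify numerically. The sketch below assumes unit amplitudes ${a_k} = 1$ and uniform phases, draws many realizations of the random phasor sum, and checks both the near-unit contrast of polarized speckle and the $1/\sqrt N$ contrast reduction for an incoherent sum of $N = 4$ independent patterns:

```python
import numpy as np

rng = np.random.default_rng(1)
n_walk, n_samples = 100, 50_000

def speckle_intensity():
    """Intensity of a normalized random phasor sum (unit a_k, uniform phi_k)."""
    phi = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, n_walk))
    A = np.exp(1j * phi).sum(axis=1) / np.sqrt(n_walk)
    return np.abs(A) ** 2

I = speckle_intensity()
contrast = I.std() / I.mean()   # ~1 for fully developed polarized speckle

# Incoherent (intensity-basis) sum of N independent patterns:
# contrast drops by a factor of sqrt(N)
N = 4
I_sum = sum(speckle_intensity() for _ in range(N))
contrast_sum = I_sum.std() / I_sum.mean()   # ~1/sqrt(4) = 0.5
```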

Another interesting characteristic is the auto-correlation function, referred to as the “second order” statistical properties of speckle by Goodman [52]. For pseudo-random sampling of information, the auto-correlation provides a measure of the response sensitivity to changes in the input signal, which usually determines spectral resolution for spectroscopy, spatial resolution for imaging, and temporal resolution for RF sensing.

Here, we need to consider two geometries: (1) a free-space geometry, where speckle is measured some distance $z$ away from the scattering surface, or (2) an imaging geometry, where speckle is imaged through a lens and measured some distance $z$ away from the lens. The intensity auto-correlation functions calculated for these two example cases are very instructive:

  • 1. For a square scattering spot with length $L$ in the free-space geometry,
    $${R_I}\!\left({\Delta x,\Delta y} \right) = {\langle I\rangle ^2}\left({1 + {{{\rm sinc}}^2}\left({\frac{{L\Delta x}}{{\lambda z}}} \right){{{\rm sinc}}^2}\left({\frac{{L\Delta y}}{{\lambda z}}} \right)} \right).$$
  • 2. For the imaging configuration with a circular lens diameter $D$,
    $${R_I}(r) = {\langle I\rangle ^2}\left({1 + {{\left| {2\frac{{{J_1}\left({\frac{{\pi Dr}}{{\lambda z}}} \right)}}{{\frac{{\pi Dr}}{{\lambda z}}}}} \right|}^2}} \right),$$
    where ${r^2} = \Delta {x^2} + \Delta {y^2}$, ${J_1}$ is the Bessel function of the first kind, and $\lambda$ is the wavelength.

For either case, we can define a width parameter $\delta x = \frac{{\lambda z}}{L}$ or $\delta r = \frac{{\lambda z}}{D}$ that can be thought of as the average speckle width and is important for determining the speckle bandwidth and how to efficiently sample it. On the one hand, when speckle is projected onto the scene to sample it, the resolution of the system is set by the average speckle width. On the other hand, if scattering is used such that the signal forms a speckle response on the sensor, the average speckle width determines the optimal sampling width on the sensor. The statistical properties of blurred speckle are also important for compressed sensing; however, we omit them here due to space constraints. A significantly more detailed treatment of speckle statistics can be found in [52].
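The free-space result above can also be checked with a short simulation: a random-phase screen over a square spot of side $L$ is propagated to the far field with an FFT (a discrete stand-in for Fraunhofer propagation), and the measured intensity autocorrelation should rise to twice the background level at zero shift and decay over an average speckle width of $N_{\rm grid}/L$ pixels, the discrete analog of $\lambda z/L$:

```python
import numpy as np

rng = np.random.default_rng(2)
Ngrid, Lpx = 512, 64   # simulation grid and scattering-spot width (pixels)

# Random-phase screen over a square scattering spot of side Lpx
field = np.zeros((Ngrid, Ngrid), dtype=complex)
field[:Lpx, :Lpx] = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (Lpx, Lpx)))

# Fraunhofer (free-space) propagation ~ 2D FFT; far-field speckle intensity
I = np.abs(np.fft.fft2(field)) ** 2

# Normalized periodic intensity autocorrelation via the Wiener-Khinchin theorem
R = np.real(np.fft.ifft2(np.abs(np.fft.fft2(I)) ** 2)) / I.size
R /= I.mean() ** 2

# Theory: R/<I>^2 = 2 at zero shift, falling to 1 over the average speckle
# width, which here is Ngrid / Lpx = 8 pixels
```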


Fig. 12. Example waveguides and cavities used for compressed sensing. (a) Scanning electron microscope (SEM) image of a silicon photonic ray chaotic cavity used for RF sensing [53]. (b) Optical image of a fiber bundle used for RF sensing [54]. (c) SEM image of a silicon spiral waveguide with bidirectional evanescent coupling used for spectroscopy [55]. (d) Optical image of a long silicon multimode waveguide that is coiled to reduce footprint and used for RF sensing [56]. Figure adapted from [54–57].


Fig. 13. (a) Scenes can be spatially patterned using a DMD that operates by tilting a grid of tiny mirrors (roughly 10 µm in size). Tilt angles are roughly ${\pm}{10^ \circ}$ with one sign corresponding to the on and the other to the off state. (b) Picture of a commercial DMD development kit. Red circle shows the micro-mirror module. (c) Fourier transform of the scene is obtained at the focal point behind the lens where a single detector (area shown in blue) takes a measurement.


4. Waveguides and Cavities

Integrated photonic devices such as chaotic cavities and multimode waveguides hold promise as miniaturized sensing devices and can be produced cheaply en masse by exploiting mature micro-electronic processing technologies. Multimode waveguides can impart random phase between the various propagating spatial modes over some distance and effectively yield speckle in both the spatial and temporal domains. Similarly, chaotic cavities can be designed to generate pseudo-random spectro-temporal behavior and time-domain speckle with even smaller footprints due to their reverberating nature. Chaotic cavities have many other applications, such as in laser physics, and thus have been extensively covered [58–61]; they are often termed dynamical billiards due to their analogous behavior [62].

Overall, these devices try to mimic the behavior of scattering in smaller form factors, so the discussion of speckle statistics above is also highly relevant here. For example, Guérit et al. [63] provide a comprehensive theoretical and experimental study of speckle arising from multicore fibers and find that, under some mild assumptions, the radius of a speckle grain scales as $\lambda z/D$, where $\lambda$ is the wavelength, $z$ is the propagation distance, and $D$ is the fiber diameter, which is very similar to the scattering result above. Often, a large number of modes is required from these devices to approximate speckle behavior well, a number that is determined by the device size and refractive index contrast.

Figure 12 shows example devices that result in a speckle-like response by approximating a random walk on the complex plane and are used for compressed sensing. Figure 12(a) shows a scanning electron microscope (SEM) image of a ray chaotic cavity where the induced features and the chamfer lead to the ray chaotic behavior and the reverberant nature of the disc keeps losses low and the device size small, which is used for RF sensing [53]. Figure 12(b) shows an optical image of a multicore fiber used to generate speckle for compressive RF sensing as well [54]. Figure 12(c) shows an SEM image of an evanescently coupled multimodal Si spiral waveguide where the wavelength dependent response is used for on-chip spectroscopy [55]. Evanescently coupled silicon nitride spirals have also been proposed for compressive on-chip spectral sensing [64]. Figure 12(d) shows another integrated waveguide, which is used for RF sensing [56].

B. Active Multiplexing

1. Spatial Light Modulators

Spatial light modulators (SLMs) are actively controllable optical devices that allow for spatially dependent amplitude or phase modulation of an optical wavefront. Effectively, SLMs are the active equivalent of coded apertures and diffusers. Furthermore, SLMs are often combined with, for example, Fourier transformation with a lens to multiplex the modulated spatial positions onto a single measurement position, e.g., in the case of the single pixel camera [65]. There are two primary classes of SLMs used for compressed sensing: digital-micromirror-based devices and liquid-crystal-based devices. Digital micromirror devices (DMDs) are made up of millions of tiny mirrors (roughly 10 µm in size) that can be rotated by applying a voltage, which can be used to create binary patterns as shown in Fig. 13(a) [66]. Even though operation is binary, dithering or pulse width modulation can be used to trade off spatial resolution or frame rate, respectively, for grayscale operation.

The transfer function of the DMD and a more detailed discussion can be found in [66], where the angle of the incoming wave, the square aperture size, and the pixel pitch are taken into account. It is worth noting here that due to the small mirror sizes and pitch length, a significant amount of power can be lost to the diffracted orders. The simplest way to obtain compressive measurements using a DMD is shown in Fig. 13(c) where the scene is patterned. A Fourier transform of the scene is obtained at the focal point behind the lens where a single detector can be placed to make a measurement. The finite extent of the detector essentially low-pass filters the signal, which approximates a dot product as shown in Eq. (45) for 1D in the next section.

Beyond DMDs, liquid-crystal-based devices such as liquid crystal on silicon (LCoS) technology allow spatial patterning of light using the anisotropic nature of liquid crystals and permit both amplitude and phase modulation as well as grayscale operation. However, the increase in grayscale quality and control comes at the cost of typically slower operation than DMDs operating in a binary fashion. There are many configurations, such as the twisted nematic configuration, which is widely used in display technology, vertically aligned nematic (VAN), electrically controlled, surface stabilized ferroelectric liquid crystal, and more [67]. A unique advantage of liquid crystal devices for compressed sensing is that, unlike DMDs, they allow for direct phase modulation and wavefront shaping.


Fig. 14. (a) Side view of a buffered x-cut Mach–Zehnder modulator (MZM) on a ${{\rm LiNbO}_3}$ substrate. (b) Top view of the MZM in push–pull configuration. (c) The simplest way to compressively sample a photonic link (red) is to map the RF signal using an MZM and to modulate the signal with a pseudo-random bit pattern (PRBS). Dot products between the signal and the PRBS are detected with a slow detector after low-pass filtering and form the measurement vector as discussed in [68]. The RF signal can then be recovered using a CS algorithm. CW, continuous-wave laser; LPF, low-pass filter; PD, photodiode.


2. Electro-Optic Modulators

Electro-optic modulators (EOMs) are used for controlling the phase or the amplitude of a single spatial mode optical signal in the time domain. Within an EOM, nonlinear optical crystals such as lithium niobate are driven by radio frequency (RF) signals that modulate the refractive index of the crystal and therefore the phase of the light passing through it. Such phase-based EOMs can also be used in an interferometric configuration to convert the phase modulation into intensity modulation [i.e., the Mach–Zehnder modulator (MZM)] as shown in Fig. 14. More detail on different geometries and trade-offs can be found in Ref. [69]. For a relative phase shift of $\phi$, the output intensity is

$$I(t) = 2{I_0}(1 + \cos(\phi (t))),$$
$$\phi (t) = \pi \frac{{v(t)}}{{{V_\pi}}},$$
where ${V_\pi}$ is the voltage required to shift the phase by $\pi$ and depends on many factors such as the nonlinear index and the wavelength of the light. For small drive amplitudes and an appropriate DC bias, the MZM can be operated in a linear regime, which is often preferred for compressed sensing applications. Such temporal modulation combined with, for example, filtering or dispersive Fourier transformation can be used to implement pseudo-random multiplexing of points within a time-domain signal for compressed sensing.
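As a quick numerical check of this linearization (with a hypothetical ${V_\pi} = 3\,{\rm V}$), biasing at quadrature makes the cosine response linear in the drive voltage for small signals:

```python
import numpy as np

V_pi, I0 = 3.0, 1.0   # hypothetical half-wave voltage (V) and input intensity

def mzm_out(v, v_bias):
    """MZM output intensity I = 2*I0*(1 + cos(phi)), phi = pi*(v + v_bias)/V_pi."""
    phi = np.pi * (v + v_bias) / V_pi
    return 2.0 * I0 * (1.0 + np.cos(phi))

# Quadrature bias (phi = pi/2): the response is linear in v for small drives,
# I ~ 2*I0*(1 - pi*v/V_pi)
v = np.linspace(-0.05, 0.05, 101)
exact = mzm_out(v, v_bias=V_pi / 2.0)
linearized = 2.0 * I0 * (1.0 - np.pi * v / V_pi)
```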

One of the simplest ways to use MZMs to compressively sample a photonic link is to modulate the signal-encoded lightwave with a pseudo-random bit sequence (PRBS) and to low-pass filter the resulting signal [68]. Low-pass filtering essentially calculates a dot product between the photonic signal and the PRBS. More specifically, the input lightwave varies temporally with the encoded signal, so we get

$$\begin{split}I(t) &= 2{I_0}(t)(1 + {\rm \cos}(\phi (t))) \\&\approx 2{I_0}(t)(1 + {\phi _{{\rm PRBS}}}(t))\,\,{\rm for}\,\,{\rm small}\,\,{\phi _{{\rm PRBS}}}\,\,{\rm and}\,\,\pi /2\,\,{\rm bias}{\rm .}\end{split}$$

After low-pass filtering, the measured signal is proportional to

$$y(t) \propto \frac{1}{T}\int_{- \infty}^t {e^{- (t - {t^\prime})/T}}{I_0}({t^\prime}){\phi _{{\rm PRBS}}}({t^\prime}){\rm d}{t^\prime},$$
where $T$ is the time constant of the low-pass filter. The expression above approximates a dot product for large $T$.
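A discretized sketch makes the dot-product approximation concrete. In the snippet below (hypothetical signal and PRBS values; a one-pole filter stands in for the analog low-pass filter), the filter output sampled at the end of the record is an exponentially weighted dot product whose weights become nearly uniform when $T$ greatly exceeds the record length:

```python
import numpy as np

rng = np.random.default_rng(3)
n, dt = 4096, 1.0
signal = 1.0 + 0.5 * np.sin(2 * np.pi * 0.01 * np.arange(n))  # hypothetical I0(t)
phi_prbs = 0.1 * rng.choice([-1.0, 1.0], size=n)              # small PRBS phase

product = signal * phi_prbs

# One-pole RC low-pass filter with time constant T much longer than the record
T = 10.0 * n * dt
a = np.exp(-dt / T)
y = 0.0
for p in product:
    y = a * y + (dt / T) * p

# The filter output equals an exponentially weighted dot product; for large T
# the weights are nearly uniform, so y ~ (dt/T) * <signal, phi_prbs>
weights = np.exp(-np.arange(n)[::-1] * dt / T)
```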

Beyond EOMs, acousto-optic modulators (AOMs) are another method for active control of light, in which light is scattered from sound waves. Although the authors are not aware of any compressed sensing schemes that make use of this effect, there have been several non-compressive sensing applications where it is used for frequency multiplexing [70], and it has potential for future use. An excellent review article for microscopy applications can be found in [71].

4. APPLICATIONS

In this section, we will provide a brief survey of how CS algorithms and CS photonic sampling technologies are used in practice to solve challenging sensing problems. As this is a tutorial, the overview in this section is not meant to serve as a comprehensive review but rather as a collection of examples that we believe to have pedagogical value.


Fig. 15. A digital light projector illuminates the scene, and a lens is used to collect the reflected light which is split into three color channels using an X-cube. Each photodetector measures one color channel. The measurements are used to reconstruct a color image of the scene using ${\ell _1}$ minimization. Figure adapted from Welsh et al. [72].


Fig. 16. Single photon fluorescence imaging using a single PMT. A DMD is mapped onto the sample plane where the excitation is structured. Fluorescence signal acquires the same pattern as the DMD. Images at the bottom show ground-truth (camera) and reconstructed images at different compression ratios. Figure reprinted from Studer et al. [73].


A. Single Pixel Camera

The single pixel camera is one of the earliest and perhaps most well-known applications of compressed sensing. High-resolution cameras are only available in a small portion of the electromagnetic spectrum (mostly visible). For many other portions of the spectrum (e.g., far-IR or terahertz), or in low-light settings that require photomultiplier tubes (PMTs), only single detectors are cost effective or feasible. In order to obtain 2D information, that single detector needs to be scanned across the scene, which is time consuming. Compressive single pixel imaging can enable high-resolution imaging at such wavelengths with faster imaging speeds [65]. There are many excellent review articles on the topic [74,75]. Figure 15 shows an example experiment where an image of the scene is formed on the DMD, and the light from the DMD is then collected onto a single detector using a lens. Multiple measurements of the scene (each corresponding to a different DMD pattern) can be used to reconstruct the original scene. Other application areas of single pixel imaging include lidar [76,77], non-line-of-sight imaging [78], 3D imaging [79], Raman imaging [80], and ghost imaging [81].

B. Microscopy

1. Fluorescence Microscopy

The single pixel imaging concept can be extended to microscopy. Studer et al. propose a compressive fluorescence microscopy and hyperspectral imaging (HSI) method for biological applications [73]. PMTs are extremely sensitive detectors that are widely used in microscopy as they work well in low photon count applications. However, they are expensive and bulky, and it is difficult to make PMT arrays for high-resolution imaging applications, which makes single pixel imaging using CS a very attractive solution. In their setup, a DMD is mapped onto the sample, and the fluorescence signal is collected by a PMT as shown in Fig. 16. When imaging fluorescent beads, they use the Hadamard basis for the sensing matrix since the scene is sparse in its natural domain (i.e., the spatial domain). The Fourier basis is also maximally incoherent with the spatial basis; however, Hadamard is easier to implement with the binary DMD patterns. Elements of the Hadamard patterns take on values of ${+}1$ and ${-}1$ that are mapped to 1 and 0, respectively, on the DMD. Another method often used, although not employed in this work, is to measure the responses for the ${+}1{\rm s}$ and the ${-}1{\rm s}$ separately and to subtract them. This approach requires $2 \times$ more measurements, but can provide increased SNR and dynamic range through differential sensing (e.g., balanced detection) due to background removal and noise cancellation. They show that significant compression ratios can be achieved, which translates to either higher speed operation or higher resolution at the same frame rate.
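The differential Hadamard scheme can be sketched in a few lines. The example below uses a hypothetical 64-pixel 1D scene: each $\pm 1$ Hadamard row is split into two binary DMD patterns, the paired single pixel measurements are subtracted, and the complete orthogonal pattern set recovers the scene exactly:

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction Hadamard matrix (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 64
H = hadamard(n)

scene = np.zeros(n)                    # hypothetical scene, sparse in space
scene[[5, 17, 40]] = [1.0, 0.5, 2.0]

# Map each +/-1 Hadamard row to two binary (0/1) DMD patterns
P_plus = (H > 0).astype(float)
P_minus = (H < 0).astype(float)

# One single-pixel measurement per pattern; subtracting the pair yields
# the +/-1 dot product and removes any common background
y = P_plus @ scene - P_minus @ scene   # equals H @ scene

# With the complete orthogonal set (H @ H.T = n * I), recovery is exact
recovered = (H.T @ y) / n
```

In a compressive experiment, only a subset of the rows would be measured and the scene recovered with a sparse solver instead of the full orthogonal inverse.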

Kuo et al. propose an on-chip fluorescence microscope using a thin diffuser placed between the sample and the camera [46]. On-chip microscopes can be lightweight and portable and have reduced hardware complexity. Due to the omnidirectional nature of fluorescence signals, samples must be placed close to the sensor, which deviates from the case discussed in the previous section on diffusers. Samples close to the sensor complicate the calibration process, and so Kuo et al. propose a model-based interpolation scheme to significantly reduce the calibration time and data.

Miniscopes have been extensively used for imaging the brain activity of freely moving mice due to their compact design. Yanny et al. propose a 3D miniscope that replaces the tube lens of a popular miniscope design with a diffuser [82]. In order to achieve high-frequency patterns from each depth plane, they use a multifocal microlens diffuser, since a single focal length would yield a high-frequency diffuser pattern at only one depth plane. Microlenses are initially randomly placed, but their parameters (e.g., microlens positions) are fine tuned by minimizing a coherence-based loss function using gradient descent. In a similar vein, Adams et al. aim to build a very thin and lightweight 3D fluorescence microscope and achieve this by using a coded aperture with MURA patterns [83]. The reduced bulk of such a system could be highly beneficial for mounting to an animal and imaging brain activity over large transverse fields of view. There are many other examples of compressive fluorescence microscopy techniques [84–88], and a more comprehensive review of the literature can be found in [89].

2. Multiphoton Fluorescence Microscopy

Multiphoton fluorescence imaging is a critical technology in biomedical imaging for surpassing the limitations of single photon imaging techniques, such as a low signal-to-background ratio and small penetration depth due to strong scattering from biological tissue. These limitations are circumvented by the larger penetration depth of longer wavelength excitation and the low background that results from the strong intensity dependence of multiphoton absorption, which allows for depth-resolved excitation at the laser focus. Conventional multiphoton microscopes often raster scan the scene, which we know to be inefficient and time consuming, especially since the scenes are often sparse in the spatial domain.

Alemohammad et al. propose a temporally focused compressive two-photon microscope where a DMD acts both as a structured light source and a diffraction grating [34] as shown in Fig. 17. Pseudo-random binary patterns are used for the sampling matrix. In addition to the benefits offered by compressed sensing, random access of the sample plane enables more sophisticated adaptive image scanning.


Fig. 17. Experimental setup for a temporally focused two-photon microscope is shown on the left, where the DMD acts as both a structured illumination source and a diffraction grating. Images on the right show reconstructed images using 10% of the Nyquist rate along with ground-truth images. Figure adapted from Alemohammad et al. [34].


Wijesinghe et al. implement the same compressive multiphoton microscope and argue that a random Morlet basis is better suited for the given optical setup [90]. They show that high-frequency components from Hadamard or pseudo-random patterns get low-pass filtered by the optical system during pattern projection, which causes basis mismatch. Random Morlet patterns are essentially low-pass filtered random patterns matched to the optical system, which removes the basis mismatch. Alternatively, this issue could also be resolved in post-processing by incorporating the transfer function of the optical system.


Fig. 18. (a) Schematic for an on-chip evanescently coupled spiral spectrometer. (b) Far-field measurements used to reconstruct spectra. (c) Sparse spectrum reconstruction. Figure adapted from Redding et al. [55].


Kazemipour et al. propose a tomographic two-photon imager that is able to resolve 1 billion voxels per second [91]. A 2D plane with $N \times N$ pixels is sampled with four line scans (45° increments), resulting in $4 \times N$ measurements. This extreme compression ratio is achieved by first taking a full raster scan image of the scene and segmenting the parts of the scene that are very sparse. An SLM is then used in the intensity modulation configuration to only allow light from the segmented regions. Other examples of compressive two-photon microscopy can be found in [92–94].

C. On-Chip Spectrum Sensing

The resolving power of conventional spectrometers depends strongly on the disperser size and path length, so bulky free-space optics are typically required. The large disperser sizes required for high-resolution spectroscopy limit the ability to integrate high-resolution spectrometers into chip-scale photonic devices. In order to solve this problem, Redding et al. propose using evanescently coupled multimode spiral waveguides to randomly multiplex spectral information [55] as shown in Fig. 18. The response of these devices is in many ways similar to wavelength dependent speckle and can efficiently multiplex information. In their work, spectra are reconstructed both with and without compressed sensing. In such applications, CS can be leveraged to reconstruct spectra with either higher resolution or larger range using a limited number of measurements. Other works have also used spirals [64] and ring resonators [95] for compressive on-chip spectrum sensing.
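The spirit of such speckle spectrometry can be captured in a toy reconstruction. In the sketch below, a Gaussian random matrix stands in for the calibrated wavelength-dependent speckle patterns (a real device would use measured calibration data), and a generic orthogonal matching pursuit, not the authors' specific algorithm, recovers a sparse spectrum from fewer detector samples than wavelength bins:

```python
import numpy as np

rng = np.random.default_rng(4)
n_det, n_wl = 80, 100            # fewer detector pixels than wavelength bins

# Stand-in for the calibrated transmission matrix: one normalized
# "speckle pattern" column per wavelength bin
A = rng.normal(size=(n_det, n_wl))
A /= np.linalg.norm(A, axis=0)

x_true = np.zeros(n_wl)          # sparse spectrum: three laser lines
x_true[[10, 55, 90]] = [1.0, 0.9, 1.1]
y = A @ x_true                   # detected speckle intensities

# Generic orthogonal matching pursuit (OMP) for sparse recovery
support, coef, r = [], np.array([]), y.copy()
for _ in range(6):
    if np.linalg.norm(r) < 1e-10:
        break
    support.append(int(np.argmax(np.abs(A.T @ r))))   # best-matching column
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    r = y - A[:, support] @ coef                      # update residual

x_hat = np.zeros(n_wl)
x_hat[support] = coef
```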


Fig. 19. Single disperser CASSI. (a) Scene is imaged onto the coded aperture which is then relayed using a modified $4\!f$ system. The dispersive element in the Fourier plane of the $4\!f$ system shifts each color channel in the detector plane. (b) An example scene is shown where each ping-pong ball is illuminated with a different source. The spectra for each source are shown. (c) The measurement is used to reconstruct 28 color channels where the green ball is more apparent in the shorter wavelength channels compared to the red ball. Adapted from Wagadarikar et al. [96].


Fig. 20. Experimental setup for a coded aperture micro-endoscope and reconstruction results for various objects. Figure adapted from Shin et al. [97].


D. Hyperspectral Imaging

HSI captures spectral information for each pixel (or voxel for 3D HSI) in a given scene as shown in Fig. 19. Since high-dimensional hyperspectral information must be captured with a 2D sensor, CS has been extensively utilized to alleviate the need for a large number of measurements (i.e., pixels and frames). An excellent introduction to the topic can be found in [98].

Gehm et al. propose a dual-disperser coded aperture setup known as CASSI where the light from the scene is dispersed and patterned with a coded aperture [100]. Due to dispersion, different wavelengths of light are patterned with shifted versions of the aperture. The second disperser then undoes the effect of the first dispersion. This way, hyperspectral information encoded in 2D is detected in a single shot.

Wagadarikar et al. propose a modified version of the CASSI system with a single disperser [96]. Here, the scene is patterned with the coded aperture first, so all the wavelength channels receive the same pattern, as shown in Fig. 10. The coded hyperspectral data cube is subsequently sheared using a disperser and projected onto the camera. This method tends to achieve better spectral resolution at the expense of reduced spatial resolution.
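The single-disperser forward model is compact enough to write out directly. In the sketch below (hypothetical sizes; one pixel of shear per spectral channel), every wavelength channel is coded with the same mask, sheared along one axis, and summed on the detector in a single shot:

```python
import numpy as np

rng = np.random.default_rng(5)
n_lam, ny, nx = 8, 16, 16
cube = rng.random((n_lam, ny, nx))                  # hypothetical hyperspectral cube
mask = (rng.random((ny, nx)) < 0.5).astype(float)   # binary coded aperture

# Single-disperser CASSI: every wavelength channel sees the SAME code
coded = cube * mask                                 # broadcast over spectral axis

# The disperser then shears each channel by one pixel per channel along x,
# and the detector sums over wavelength in a single shot
detector = np.zeros((ny, nx + n_lam - 1))
for k in range(n_lam):
    detector[:, k:k + nx] += coded[k]
```

Reconstruction then amounts to inverting this linear operator under a sparsity prior.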

Alemohammad et al. propose an HSI system targeting particulate scenes where the coded aperture is completely eliminated, as it reduces SNR, and a conventional image is simultaneously captured along with the sheared spectral data [101]. Here, the spatial sparsity of the particles and their naturally random distribution and motion are leveraged. Co-registered conventional images along with the sparsity constraint are then used to disentangle the hyperspectral data of the particles.


Fig. 21. (a) Experimental setup for a compressive OCT system. Pulses from a mode locked laser (MLL) are chirped, which allows spectrum-to-time mapping, and each pulse is encoded with a unique binary pattern. The pulses are then compressed and sent to the OCT system. (b) Reconstruction results for a scene with dimensions ${100} \times {150} \times {192}$ at two different sampling rates. Figure adapted from Stroud et al. [99].


E. Lens-Free Micro-Endoscope

Fiber microendoscopes enable imaging deep inside tissue with minimal damage. However, their miniature size and correspondingly limited number of spatial measurements result in low resolution and a narrow field of view. Compressed sensing can be useful in this scenario to maximize the image information captured with this limited number of measurements [97,102–105].

Shin et al. propose using a $\rm{Ti}{\rm{O}_2}$ tipped single-mode fiber on the illumination side where wavelength dependent speckle patterns are formed on the sample to be imaged [102]. Light from the sample is then detected with a single photodetector. A tunable laser is used to scan the wavelength in small increments to generate distinct speckle patterns.

In another work, Shin et al. show that a coded aperture placed in the distal end of a multicore fiber can be used to efficiently multiplex information from the scene [97] as shown in Fig. 20. In this case, using a compressive approach enables significant reduction in fiber size, which is important for minimal invasiveness.


Fig. 22. (a) Compressed ultrafast photography (CUP) system where a DMD is used in conjunction with a streak camera to image high speed events. (b)–(d) Reconstructions for various scenes at high speed. Reprinted from Liang and Wang [106].


Amitonova and De Boer propose using a multimode fiber to produce speckle patterns whose characteristics are modified by a DMD [104]. The DMD is used to control the phase through an amplitude holography setup.

Recently, Guérit et al. proposed using a combination of an SLM and a multicore fiber for minimally invasive endoscopy applications. Their work is very comprehensive and goes into the details of estimating speckle statistics for optimal sensing [63].

F. Optical Coherence Tomography

Optical coherence tomography (OCT) typically uses low coherence light sources to acquire 3D information from interferometric measurements. The high-dimensional nature of OCT data and the need for high frame rates provide opportunities for compressed sensing.

Stroud et al. propose an OCT system where pseudo-random binary patterns are imprinted on the spectrum of pulses from a mode locked laser [99] as shown in Fig. 21. This enables a high compression ratio, which means that either slower ADCs can be used or, alternatively, the same ADC can be used to make faster measurements.


Fig. 23. Experimental setup for RF compressed sensing where pulses from a mode locked laser (MLL) are temporally broadened and an RF signal is imprinted on the intensity. With the help of a diffraction grating and an SLM, compressive measurements of the RF signals are acquired. Plots at the bottom show reconstructions using the pseudo-inverse and sparse recovery using an ${\ell _1}$ penalty. Figure adapted from Valley et al. [107].


Liu and Kang propose a compressive spectral domain OCT system where the $k$-space is randomly undersampled [108]. Random undersampling is achieved by using a CCD camera with randomly addressable pixels, which was developed for high speed imaging where the bottleneck is electrical (the transfer of data from the pixels to the computer).

G. High Speed Imaging

High speed imaging beyond the frame rate of cameras has been of great interest for elucidating ultrafast phenomena. Success in this field has been partly due to the generation of short femtosecond pulses that provide superior temporal sectioning as well as high intensities. This problem is very attractive for compressed sensing, which can push the speed limit of such imagers further by recovering more information from fewer measurements.

For example, Gao et al. propose a method in which a streak camera is used in conjunction with a DMD, as shown in Fig. 22, to image macroscopic objects at 100 billion frames per second for bursts of up to 350 frames [109].
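
The compressed ultrafast photography forward model can be sketched in a few lines: a static pseudo-random code multiplies each frame, the streak camera shears successive frames by one row per time step, and the detector integrates everything into a single snapshot. The sketch below is a schematic toy model (the sizes and variable names are our own, not taken from the published system):

```python
import numpy as np

rng = np.random.default_rng(3)
H = W = 16                           # frame height and width
T = 8                                # number of video frames in one burst
video = rng.random((T, H, W))        # dynamic scene
code = rng.integers(0, 2, size=(H, W)).astype(float)  # static pseudo-random mask

# Streak-camera model: each encoded frame is sheared down by its time index,
# and all sheared frames integrate on the detector in a single exposure
det = np.zeros((H + T - 1, W))
for t in range(T):
    det[t:t + H] += code * video[t]
```

The snapshot has $(H + T - 1) \times W$ pixels while the video has $T \times H \times W$ unknowns, which is what makes the acquisition compressive and sparsity-based recovery necessary.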

Bosworth et al. propose a method in which mode-locked laser pulses are dispersed, patterned with random binary patterns, and temporally compressed again, followed by wavelength-to-space mapping, to image microscopic objects with high throughput in continuous operation [30].

H. Radio Frequency Sensing

Many RF applications involve signals that are sparse in some domain, and compressed sensing can therefore relax constraints on ADCs such as size, weight, and power consumption [107]. Valley et al. propose the setup shown in Fig. 23, where the RF signal is imprinted on the intensity of a chirped pulse from a mode-locked laser, mapping RF information from time to spectrum. A diffraction grating then maps spectrum to space, which is coded with an SLM and finally detected by an ADC. Each pattern on the SLM yields a single compressive measurement, and RF signals can be recovered from measurements made with multiple patterns (far fewer than required by the Nyquist rate).
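
A minimal numerical analogue of this measurement process (one inner product between the signal and each pseudo-random pattern) can be written down directly. In the sketch below, a toy model with invented sizes, we use a DCT sparsity basis and orthogonal matching pursuit, neither of which the published system prescribes, to recover a two-tone signal from far fewer pattern measurements than time samples:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 128, 48                       # Nyquist-rate samples; pattern measurements

# Orthonormal DCT-II basis: rows are cosine atoms, Psi @ Psi.T = I
k = np.arange(n)[:, None]
t = np.arange(n)[None, :]
Psi = np.cos(np.pi * (t + 0.5) * k / n) * np.sqrt(2.0 / n)
Psi[0] /= np.sqrt(2.0)

# Two-tone "RF" signal, sparse in the DCT domain
alpha = np.zeros(n)
alpha[[10, 37]] = [1.0, 0.8]
s = Psi.T @ alpha

# Each +/-1 pattern yields a single compressive inner-product measurement
P = rng.choice([-1.0, 1.0], size=(m, n))
y = P @ s
A = P @ Psi.T                        # effective sensing matrix in the sparse domain

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: greedily pick atoms, refit by least squares."""
    residual, support = y.copy(), []
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

alpha_hat = omp(A, y, 2)             # exact recovery in this noiseless toy case
```

The same structure (pseudo-random projections followed by sparse recovery) underlies the speckle-based systems discussed below, with the physical multiplexer replacing the explicit pattern matrix.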

Other work in the literature has made use of alternative active or passive multiplexing technologies. For example, Bosworth and Foster use EOMs to pattern chirped pulses prior to imprinting the RF information [110]. Valley et al. and Sefler et al. make use of speckle from a multimode waveguide to multiplex information [54,111], and in more recent work, Borlaug et al. study SiN multimode spirals for integrated RF sensing [56].

5. CONCLUSION

In this tutorial, we cover the mathematical background, algorithms, and photonic technologies necessary for designing photonic compressed sensing systems, along with example systems for specific applications. Compressed sensing (CS) methods can improve system performance or help surpass physical limits of conventional photonic sensing modalities. In particular, by leveraging compression, CS methods provide significant improvements when the underlying conventional sensing method is measurement constrained. Successful compression of signals usually involves pseudo-random multiplexing of information. Signals are then reconstructed using computational methods, whose choice has a significant effect on reconstruction quality. Recent progress in machine learning can push compression ratios further in the future through better decoding algorithms as well as hardware co-design.

Funding

Air Force Office of Scientific Research (FA9550-21-1-0372); Defense Threat Reduction Agency (HDTRA1-20-2-0001).

Disclosures

The author declares no conflicts of interest.

Data availability

No data were generated or analyzed in the presented research.

REFERENCES

1. E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag. 25(2), 21–30 (2008). [CrossRef]  

2. M. A. Davenport, M. F. Duarte, Y. C. Eldar, and G. Kutyniok, Introduction to Compressed Sensing (2012).

3. J. Romberg, “Imaging via compressive sampling,” IEEE Signal Process. Mag. 25(2), 14–20 (2008). [CrossRef]  

4. S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing (Springer, 2013).

5. Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications (Cambridge University, 2012).

6. G. Strang, Introduction to Linear Algebra (2020).

7. A. V. Oppenheim, J. Buck, M. Daniel, A. S. Willsky, S. H. Nawab, and A. Singer, Signals & Systems (Pearson Educación, 1997).

8. C. Shannon, “Communication in the presence of noise,” Proc. IRE 37, 10–21 (1949). [CrossRef]  

9. A. W. Lohmann, R. G. Dorsch, D. Mendlovic, Z. Zalevsky, and C. Ferreira, “Space–bandwidth product of optical signals and systems,” J. Opt. Soc. Am. A 13, 470–473 (1996). [CrossRef]  

10. E. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theory 51, 4203–4215 (2005). [CrossRef]  

11. T. Hastie, R. Tibshirani, J. H. Friedman, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2009), Vol. 2.

12. R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constr. Approx. 28, 253–263 (2008). [CrossRef]  

13. E. J. Candes, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Commun. Pure Appl. Math. 59, 1207–1223 (2006). [CrossRef]  

14. L. Stanković, M. Brajović, D. Mandic, I. Stanković, and M. Daković, “Improved coherence index-based bound in compressive sensing,” IEEE Signal Process. Lett. 28, 1110–1114 (2021). [CrossRef]  

15. R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. (Cambridge University, 2012).

16. D. Donoho, M. Elad, and V. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,” IEEE Trans. Inf. Theory 52, 6–18 (2006). [CrossRef]  

17. D. Donoho and J. Tanner, “Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing,” Philos. Trans. R. Soc. A 367, 4273–4293 (2009). [CrossRef]  

18. S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process. 41, 3397–3415 (1993). [CrossRef]  

19. J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory 53, 4655–4666 (2007). [CrossRef]  

20. W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing signal reconstruction,” IEEE Trans. Inf. Theory 55, 2230–2249 (2009). [CrossRef]  

21. D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Appl. Comput. Harmon. Anal. 26, 301–321 (2009). [CrossRef]  

22. G. B. Dantzig, “Linear programming,” Oper. Res. 50, 42–47 (2002). [CrossRef]  

23. T. Blumensath and M. Davies, “Iterative hard thresholding for compressive sensing,” Appl. Comput. Harmon. Anal. 27, 265–274 (2009). [CrossRef]  

24. I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pure Appl. Math. 57, 1413–1457 (2004). [CrossRef]  

25. A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci. 2, 183–202 (2009). [CrossRef]  

26. J. M. Bioucas-Dias and M. A. Figueiredo, “A new twist: two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Trans. Image Process. 16, 2992–3004 (2007). [CrossRef]  

27. D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods (Academic, 2014).

28. Y. Yang, J. Sun, H. Li, and Z. Xu, “ADMM-CSNet: a deep learning approach for image compressive sensing,” IEEE Trans. Pattern Anal. Mach. Intell. 42, 521–538 (2018). [CrossRef]  

29. Z. T. Harmany, R. F. Marcia, and R. M. Willett, “This is SPIRAL-TAP: sparse Poisson intensity reconstruction algorithms-theory and practice,” IEEE Trans. Image Process. 21, 1084–1096 (2012). [CrossRef]  

30. B. T. Bosworth, J. R. Stroud, D. N. Tran, T. D. Tran, S. Chin, and M. A. Foster, “High-speed flow microscopy using compressed sensing with ultrafast laser pulses,” Opt. Express 23, 10521–10532 (2015). [CrossRef]  

31. J. W. Woods, Multidimensional Signal, Image, and Video Processing and Coding (Elsevier, 2012).

32. S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way (Elsevier, 2009).

33. D. N. Tran, D. N. Tran, S. P. Chin, and T. D. Tran, “Local sensing with global recovery,” in IEEE International Conference on Image Processing (ICIP) (IEEE, 2015), pp. 4313–4316.

34. M. Alemohammad, J. Shin, D. N. Tran, J. R. Stroud, S. P. Chin, T. D. Tran, and M. A. Foster, “Widefield compressive multiphoton microscopy,” Opt. Lett. 43, 2989–2992 (2018). [CrossRef]  

35. D. Van Veen, A. Jalal, M. Soltanolkotabi, E. Price, S. Vishwanath, and A. G. Dimakis, “Compressed sensing with deep image prior and learned regularization,” arXiv:1806.06438 (2018).

36. Y. Wu, M. Rosca, and T. Lillicrap, “Deep compressed sensing,” in International Conference on Machine Learning (PMLR, 2019), pp. 6850–6860.

37. J. Zhang and B. Ghanem, “ISTA-Net: interpretable optimization-inspired deep network for image compressive sensing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 1828–1837.

38. Z. Zhang, Y. Liu, J. Liu, F. Wen, and C. Zhu, “AMP-Net: denoising-based deep unfolding for compressive image sensing,” IEEE Trans. Image Process. 30, 1487–1500 (2020). [CrossRef]  

39. D. Ye, Z. Ni, H. Wang, J. Zhang, S. Wang, and S. Kwong, “CSformer: bridging convolution and transformer for compressive sensing,” arXiv:2112.15299 (2021).

40. M. T. McCann, K. H. Jin, and M. Unser, “Convolutional neural networks for inverse problems in imaging: a review,” IEEE Signal Process. Mag. 34(6), 85–95 (2017). [CrossRef]  

41. A. L. Machidon and V. Pejović, “Deep learning for compressive sensing: a ubiquitous systems perspective,” Artif. Intell. Rev. (2022). [CrossRef]  

42. R. Dicke, “Scatter-hole cameras for x-rays and gamma rays,” Astrophys. J. 153, L101 (1968). [CrossRef]  

43. E. E. Fenimore and T. M. Cannon, “Coded aperture imaging with uniformly redundant arrays,” Appl. Opt. 17, 337–347 (1978). [CrossRef]  

44. S. R. Gottesman and E. E. Fenimore, “New family of binary arrays for coded aperture imaging,” Appl. Opt. 28, 4344–4352 (1989). [CrossRef]  

45. N. Antipa, G. Kuo, R. Heckel, B. Mildenhall, E. Bostan, R. Ng, and L. Waller, “DiffuserCam: lensless single-exposure 3D imaging,” Optica 5, 1–9 (2018). [CrossRef]  

46. G. Kuo, F. L. Liu, I. Grossrubatscher, R. Ng, and L. Waller, “On-chip fluorescence microscopy with a random microlens diffuser,” Opt. Express 28, 8384–8399 (2020). [CrossRef]  

47. N. Antipa, S. Necula, R. Ng, and L. Waller, “Single-shot diffuser-encoded light field imaging,” in IEEE International Conference on Computational Photography (ICCP) (IEEE, 2016), pp. 1–11.

48. E. J. Candes, T. Strohmer, and V. Voroninski, “PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming,” Commun. Pure Appl. Math. 66, 1241–1274 (2013). [CrossRef]  

49. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21, 2758–2769 (1982). [CrossRef]  

50. P. F. Almoro, L. Waller, M. Agour, C. Falldorf, G. Pedrini, W. Osten, and S. G. Hanson, “Enhanced deterministic phase retrieval using a partially developed speckle field,” Opt. Lett. 37, 2088–2090 (2012). [CrossRef]  

51. D. L. Colton, R. Kress, and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory (Springer, 1998), Vol. 93.

52. J. W. Goodman, “Statistical properties of laser speckle patterns,” in Laser Speckle and Related Phenomena (Springer, 1975), pp. 9–75.

53. H. Sun, B. T. Bosworth, B. C. Grubel, M. R. Kossey, M. A. Foster, and A. C. Foster, “Compressed sensing of sparse RF signals based on silicon photonic microcavities,” in CLEO: Science and Innovations (Optica Publishing Group, 2017), paper SM1O-5.

54. G. A. Sefler, T. J. Shaw, and G. C. Valley, “Demonstration of speckle-based compressive sensing system for recovering RF signals,” Opt. Express 26, 21390–21402 (2018). [CrossRef]  

55. B. Redding, S. F. Liew, Y. Bromberg, R. Sarma, and H. Cao, “Evanescently coupled multimode spiral spectrometer,” Optica 3, 956–962 (2016). [CrossRef]  

56. D. B. Borlaug, S. Estrella, C. T. Boone, G. A. Sefler, T. J. Shaw, A. Roy, L. Johansson, and G. C. Valley, “Photonic integrated circuit based compressive sensing radio frequency receiver using waveguide speckle,” Opt. Express 29, 19222–19239 (2021). [CrossRef]  

57. B. C. Grubel, B. T. Bosworth, M. R. Kossey, H. Sun, A. B. Cooper, M. A. Foster, and A. C. Foster, “Silicon photonic physical unclonable function,” Opt. Express 25, 12710–12721 (2017). [CrossRef]  

58. H. Cao and J. Wiersig, “Dielectric microcavities: model systems for wave chaos and non-Hermitian physics,” Rev. Mod. Phys. 87, 61 (2015). [CrossRef]  

59. J. U. Nockel and A. D. Stone, “Ray and wave chaos in asymmetric resonant optical cavities,” Nature 385, 45–47 (1997). [CrossRef]  

60. T. Tanaka, M. Hentschel, T. Fukushima, and T. Harayama, “Classical phase space revealed by coherent light,” Phys. Rev. Lett. 98, 033902 (2007). [CrossRef]  

61. M. Lebental, J.-S. Lauret, J. Zyss, C. Schmit, and E. Bogomolny, “Directional emission of stadium-shaped microlasers,” Phys. Rev. A 75, 033806 (2007). [CrossRef]  

62. D. Szasz, L. Bunimovich, D. Burago, et al., Hard Ball Systems and the Lorentz Gas, Encyclopedia of Mathematical Sciences (Springer, 2013).

63. S. Guerit, S. Sivankutty, J. Lee, H. Rigneault, and L. Jacques, “Compressive imaging through optical fiber with partial speckle scanning,” SIAM J. Imag. Sci. 15, 387–423 (2022). [CrossRef]  

64. D. Irvine, V. Kilic, M. Foster, and A. Foster, “Evanescently coupled multimode silicon nitride waveguides for on chip spectroscopy,” in Conference on Lasers and Electro-Optics (CLEO) (OSA, 2022).

65. M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. 25(2), 83–91 (2008). [CrossRef]  

66. Y.-X. Ren, R.-D. Lu, and L. Gong, “Tailoring light with a digital micromirror device,” Ann. Phys., Lpz. 527, 447–470 (2015). [CrossRef]  

67. Z. Zhang, Z. You, and D. Chu, “Fundamentals of phase-only liquid crystal on silicon (LCOS) devices,” Light Sci. Appl. 3, e213 (2014). [CrossRef]  

68. J. Nichols and F. Bucholtz, “Beating Nyquist with light: a compressively sampled photonic link,” Opt. Express 19, 7339–7348 (2011). [CrossRef]  

69. E. L. Wooten, K. M. Kissa, A. Yi-Yan, E. J. Murphy, D. A. Lafaw, P. F. Hallemeier, D. Maack, D. V. Attanasio, D. J. Fritz, G. J. McBrien, and D. E. Bossi, “A review of lithium niobate modulators for fiber-optic communications systems,” IEEE J. Sel. Top. Quantum Electron. 6, 69–82 (2000). [CrossRef]  

70. E. D. Diebold, B. W. Buckley, D. R. Gossett, and B. Jalali, “Digitally synthesized beat frequency multiplexing for sub-millisecond fluorescence microscopy,” Nat. Photonics 7, 806–810 (2013). [CrossRef]  

71. M. Duocastella, S. Surdo, A. Zunino, A. Diaspro, and P. Saggau, “Acousto-optic systems for advanced microscopy,” J. Phys. Photon. 3, 012004 (2020). [CrossRef]  

72. S. S. Welsh, M. P. Edgar, R. Bowman, P. Jonathan, B. Sun, and M. J. Padgett, “Fast full-color computational imaging with single-pixel detectors,” Opt. Express 21, 23068–23074 (2013). [CrossRef]  

73. V. Studer, J. Bobin, M. Chahid, H. S. Mousavi, E. Candes, and M. Dahan, “Compressive fluorescence microscopy for biological and hyperspectral imaging,” Proc. Natl. Acad. Sci. USA 109, E1679–E1687 (2012). [CrossRef]  

74. G. M. Gibson, S. D. Johnson, and M. J. Padgett, “Single-pixel imaging 12 years on: a review,” Opt. Express 28, 28190–28208 (2020). [CrossRef]  

75. M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics 13, 13–20 (2019). [CrossRef]  

76. A. Colaço, A. Kirmani, G. A. Howland, J. C. Howell, and V. K. Goyal, “Compressive depth map acquisition using a single photon-counting detector: parametric signal processing meets sparsity,” in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 96–102.

77. N. Radwell, S. D. Johnson, M. P. Edgar, C. F. Higham, R. Murray-Smith, and M. J. Padgett, “Deep learning optimized single-pixel lidar,” Appl. Phys. Lett. 115, 231101 (2019). [CrossRef]  

78. Z. Zhang, X. Ma, and J. Zhong, “Single-pixel imaging by means of Fourier spectrum acquisition,” Nat. Commun. 6, 6225 (2015). [CrossRef]  

79. B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. J. Padgett, “3D computational imaging with single-pixel detectors,” Science 340, 844–847 (2013). [CrossRef]  

80. C. Scotté, S. Sivankutty, R. A. Bartels, and H. Rigneault, “Line-scan compressive Raman imaging with spatiospectral encoding,” Opt. Lett. 45, 5567–5570 (2020). [CrossRef]  

81. Y. Bromberg, O. Katz, and Y. Silberberg, “Ghost imaging with a single detector,” Phys. Rev. A 79, 053840 (2009). [CrossRef]  

82. K. Yanny, N. Antipa, W. Liberti, S. Dehaeck, K. Monakhova, F. L. Liu, K. Shen, R. Ng, and L. Waller, “Miniscope3D: optimized single-shot miniature 3D fluorescence microscopy,” Light Sci. Appl. 9, 171 (2020). [CrossRef]  

83. J. K. Adams, V. Boominathan, B. W. Avants, D. G. Vercosa, F. Ye, R. G. Baraniuk, J. T. Robinson, and A. Veeraraghavan, “Single-frame 3D fluorescence microscopy with ultraminiature lensless flatscope,” Sci. Adv. 3, e1701548 (2017). [CrossRef]  

84. A. Ghezzi, A. Farina, A. Bassi, G. Valentini, I. Labanca, G. Acconcia, I. Rech, and C. D’Andrea, “Multispectral compressive fluorescence lifetime imaging microscopy with a SPAD array detector,” Opt. Lett. 46, 1353–1356 (2021). [CrossRef]  

85. V. J. Parot, C. Sing-Long, Y. Adam, U. L. Böhm, L. Z. Fan, S. L. Farhi, and A. E. Cohen, “Compressed Hadamard microscopy for high-speed optically sectioned neuronal activity recordings,” J. Phys. D 52, 144001 (2019). [CrossRef]  

86. M. Pascucci, S. Ganesan, A. Tripathi, O. Katz, V. Emiliani, and M. Guillon, “Compressive three-dimensional super-resolution microscopy with speckle-saturated fluorescence excitation,” Nat. Commun. 10, 1327 (2019). [CrossRef]  

87. G. Calisesi, M. Castriotta, A. Candeo, A. Pistocchi, C. D’Andrea, G. Valentini, A. Farina, and A. Bassi, “Spatially modulated illumination allows for light sheet fluorescence microscopy with an incoherent source and compressive sensing,” Biomed. Opt. Express 10, 5776–5788 (2019). [CrossRef]  

88. C. Fang, T. Yu, T. Chu, W. Feng, F. Zhao, X. Wang, Y. Huang, Y. Li, P. Wan, W. Mei, and D. Zhu, “Minutes-timescale 3D isotropic imaging of entire organs at subcellular resolution by content-aware compressed-sensing light-sheet microscopy,” Nat. Commun. 12, 1–13 (2021). [CrossRef]  

89. G. Calisesi, A. Ghezzi, D. Ancora, C. D’Andrea, G. Valentini, A. Farina, and A. Bassi, “Compressed sensing in fluorescence microscopy,” Prog. Biophys. Mol. Biol. 168, 66–80 (2021). [CrossRef]  

90. P. Wijesinghe, A. Escobet-Montalbán, M. Chen, P. R. Munro, and K. Dholakia, “Optimal compressive multiphoton imaging at depth using single-pixel detection,” Opt. Lett. 44, 4981–4984 (2019). [CrossRef]  

91. A. Kazemipour, O. Novak, D. Flickinger, J. S. Marvin, A. S. Abdelfattah, J. King, P. M. Borden, J. J. Kim, S. H. Al-Abdullatif, P. E. Deal, and E. W. Miller, “Kilohertz frame-rate two-photon tomography,” Nat. Methods 16, 778–786 (2019). [CrossRef]  

92. M. Alemohammad, J. Shin, J. R. Stroud, and M. A. Foster, “High-speed compressive line-scanned two photon microscopy,” in Optics and the Brain (Optica Publishing Group, 2020), paper BTu1C-3.

93. C. Wen, M. Ren, F. Feng, W. Chen, and S.-C. Chen, “Compressive sensing for fast 3-D and random-access two-photon microscopy,” Opt. Lett. 44, 4343–4346 (2019). [CrossRef]  

94. A. Song, A. S. Charles, S. A. Koay, J. L. Gauthier, S. Y. Thiberge, J. W. Pillow, and D. W. Tank, “Volumetric two-photon imaging of neurons using stereoscopy (VTWINS),” Nat. Methods 14, 420–426 (2017). [CrossRef]  

95. X. He and J. Cardenas, “Fully integrated on-chip ring resonator spectrometer based on compressed sensing,” in Conference on Lasers and Electro-Optics (CLEO) (IEEE, 2022), pp. 1–2.

96. A. Wagadarikar, R. John, R. Willett, and D. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” Appl. Opt. 47, B44–B51 (2008). [CrossRef]  

97. J. Shin, D. N. Tran, J. R. Stroud, S. Chin, T. D. Tran, and M. A. Foster, “A minimally invasive lens-free computational microendoscope,” Sci. Adv. 5, eaaw5595 (2019). [CrossRef]  

98. G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. S. Kittle, “Compressive coded aperture spectral imaging: an introduction,” IEEE Signal Process. Mag. 31(1), 105–115 (2014). [CrossRef]  

99. J. R. Stroud, L. Liu, S. Chin, T. D. Tran, and M. A. Foster, “Optical coherence tomography using physical domain data compression to achieve MHz A-scan rates,” Opt. Express 27, 36329–36339 (2019). [CrossRef]  

100. M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Opt. Express 15, 14013–14027 (2007). [CrossRef]  

101. M. Alemohammad, E. R. Wainwright, J. R. Stroud, T. P. Weihs, and M. A. Foster, “Kilohertz frame rate snapshot hyperspectral imaging of metal reactive materials,” Appl. Opt. 59, 10406–10415 (2020). [CrossRef]  

102. J. Shin, B. T. Bosworth, and M. A. Foster, “Single-pixel imaging using compressed sensing and wavelength-dependent scattering,” Opt. Lett. 41, 886–889 (2016). [CrossRef]  

103. J. Shin, B. T. Bosworth, and M. A. Foster, “Compressive fluorescence imaging using a multi-core fiber and spatially dependent scattering,” Opt. Lett. 42, 109–112 (2017). [CrossRef]  

104. L. V. Amitonova and J. F. De Boer, “Compressive imaging through a multimode fiber,” Opt. Lett. 43, 5427–5430 (2018). [CrossRef]  

105. B. Lochocki, M. V. Verweg, J. J. Hoozemans, J. F. de Boer, and L. V. Amitonova, “Epi-fluorescence imaging of the human brain through a multimode fiber,” APL Photon. 7, 071301 (2022). [CrossRef]  

106. J. Liang and L. V. Wang, “Single-shot ultrafast optical imaging,” Optica 5, 1113–1127 (2018). [CrossRef]  

107. G. C. Valley, G. A. Sefler, and T. J. Shaw, “Compressive sensing of sparse radio frequency signals using optical mixing,” Opt. Lett. 37, 4675–4677 (2012). [CrossRef]  

108. X. Liu and J. U. Kang, “Compressive SD-OCT: the application of compressed sensing in spectral domain optical coherence tomography,” Opt. Express 18, 22010–22019 (2010). [CrossRef]  

109. L. Gao, J. Liang, C. Li, and L. V. Wang, “Single-shot compressed ultrafast photography at one hundred billion frames per second,” Nature 516, 74–77 (2014). [CrossRef]  

110. B. T. Bosworth and M. A. Foster, “High-speed ultrawideband photonically enabled compressed sensing of sparse radio frequency signals,” Opt. Lett. 38, 4892–4895 (2013). [CrossRef]  

111. G. C. Valley, G. A. Sefler, and T. J. Shaw, “Multimode waveguide speckle patterns for compressive sensing,” Opt. Lett. 41, 2529–2532 (2016). [CrossRef]  




Figures (23)

Fig. 1. (a) After sampling at the Nyquist rate and digitization, an arbitrary scene can be represented on the computer. The scene data can be significantly reduced through lossy compression by first projecting it to a sparse domain (in this case the wavelet domain) and then keeping a small percentage of the largest elements. For example, keeping only 1% of the wavelet coefficients leads to minimal degradation in image quality. (b) In conventional sensing, data is first digitized and then compressed, which is inefficient since hardware resources are wasted on capturing data redundantly. (c) In compressed sensing, analog signals are first compressed and then sampled, which maximizes the efficiency with which information is captured by the hardware sampling resources.
Fig. 2. Measurement ${M_i}$ to signal ${S_i}$ mapping for different sampling schemes: (a) critical (“Nyquist”) sampling case where mapping is invertible, (b) oversampled case (often employed if measurements are noisy), and (c) undersampled case (compressed sensing). (d) Undersampled mapping can be made invertible by selecting a subset of all possible signals that utilizes prior information about signals (e.g., sparsity).
Fig. 3. (a) Raster scanning: a photodetector is scanned through the entire image to acquire independent measurements. (b) Basis scan: independent patterns are projected onto the scene, and the light from the entire scene is integrated by the photodetector. (c) Pseudo-random patterns can be used to optimally sample a scene for compressed sensing.
Fig. 4. Underdetermined system of equations can be converted to an overdetermined problem if locations of nonzero elements are known. The subspace solution ${{\textbf x}_S}$ can then be solved for using the pseudo-inverse.
Fig. 5. Red lines show all the solutions satisfying ${\textbf Ax} = {\textbf y}$. Blue lines show the “${l_p}$ balls” that are grown until they intersect with the red line. Gray circles show the solutions from each process where the two lines intersect.
Fig. 6. Applying the Gershgorin disc theorem to estimate eigenvalues of an example $3 \times 3$ matrix. Eigenvalues lie on a disc centered around the diagonal elements with radius determined by the sum of absolute values of the off-diagonal elements. Numerically calculated eigenvalues ${\lambda _i}$ are indeed located inside the disc.
Fig. 7. Thresholding operations are used in iterative convex optimization routines. Hard thresholding (left) simply sets signal elements with absolute value below a certain threshold $T$ to zero while keeping the rest of the values the same. In contrast, soft thresholding (right) scales all the values down by $T$ and then sets elements with small absolute values to zero.
Fig. 8. Patch-based image recovery iterates between two steps—global reconstruction of the image estimates and maximizing the sparsity level of all local image patches. Patch-based recovery works better for local sensing. Figure adapted from [30].
Fig. 9. Top: typically an optical compressive sampling scheme will map a point source pseudo-randomly to the sampling device with a limited bandwidth and spatial (or temporal) extent. Bottom: conversely, a sensing point receives information from a region of the signal with a limited bandwidth and spatial (or temporal) extent.
Fig. 10. (a) Scene information can be multiplexed using a coded aperture, which can be thought of as having many randomly placed pinhole cameras imaging the scene and making many overlapping copies of the object. (b) Picture of different coded apertures printed on a transparency against a fluorescent lamp background. (c) Fields emanating from the scene surface get filtered by the mask and pseudo-randomly multiplexed on a camera. (d) Rays emanating from a scene point $\eta$ get filtered out by the mask at ${y^\prime}$ finally hitting the sensor at $y$. The mask is some distance $b$ away from the scene and $f$ away from the camera. For a mask with a single hole, $f$ corresponds to the pinhole camera focal length. Similar triangles can be used to derive an expression for ${y^\prime}$ in terms of $y$ and $\eta$, which helps simplify the image formation Eq. (28), where $m$ is the mask magnification.
Fig. 11. (a) Optical elements such as diffusers and random microlens arrays can be used to randomly multiplex information from the scene. The resulting caustic pattern is captured by the camera placed such that caustic patterns have high contrast. (b) Image of a commercial diffuser. (c) Caustic pattern for a point source at different lateral (top) and axial (bottom) positions. (c) adapted from [45].
Fig. 12. Example waveguides and cavities used for compressed sensing. (a) Scanning electron microscope (SEM) image of a silicon photonic ray chaotic cavity used for RF sensing [53]. (b) Optical image of a fiber bundle used for RF sensing [54]. (c) SEM image of a silicon spiral waveguide with bidirectional evanescent coupling used for spectroscopy [55]. (d) Optical image of long silicon multimode waveguides that are coiled to reduce footprint and used for RF sensing [56]. Figure adapted from [54–57].
Fig. 13. (a) Scenes can be spatially patterned using a DMD that operates by tilting a grid of tiny mirrors (roughly 10 µm in size). Tilt angles are roughly ${\pm}{10^ \circ}$ with one sign corresponding to the on and the other to the off state. (b) Picture of a commercial DMD development kit. Red circle shows the micro-mirror module. (c) Fourier transform of the scene is obtained at the focal point behind the lens where a single detector (area shown in blue) takes a measurement.
Fig. 14. (a) Side view of a buffered x-cut Mach–Zehnder modulator (MZM) on a ${{\rm LiNbO}_3}$ substrate. (b) Top view of the MZM in push–pull configuration. (c) The simplest way to compressively sample a photonic link (red) is to map the RF signal using an MZM and to modulate the signal with a pseudo-random bit pattern (PRBS). Dot products between the signal and the PRBS are detected with a slow detector after low-pass filtering and form the measurement vector as discussed in [68]. The RF signal can then be recovered using a CS algorithm. CW, continuous-wave laser; LPF, low-pass filter; PD, photodiode.
Fig. 15. A digital light projector illuminates the scene, and a lens is used to collect the reflected light which is split into three color channels using an X-cube. Each photodetector measures one color channel. The measurements are used to reconstruct a color image of the scene using ${\ell _1}$ minimization. Figure adapted from Welsh et al. [72].
Fig. 16. Single photon fluorescence imaging using a single PMT. A DMD is mapped onto the sample plane where the excitation is structured. Fluorescence signal acquires the same pattern as the DMD. Images at the bottom show ground-truth (camera) and reconstructed images at different compression ratios. Figure reprinted from Studer et al. [73].
Fig. 17. Experimental setup for a temporally focused two-photon microscope is shown on the left, where the DMD acts as both a structured illumination source and a diffraction grating. Images on the right show reconstructed images using 10% of the Nyquist rate along with ground-truth images. Figure adapted from Alemohammad et al. [34].
Fig. 18. (a) Schematic for an on-chip evanescently coupled spiral spectrometer. (b) Far-field measurements used to reconstruct spectra. (c) Sparse spectrum reconstruction. Figure adapted from Redding et al. [55].
Fig. 19. Single disperser CASSI. (a) Scene is imaged onto the coded aperture which is then relayed using a modified $4\!f$ system. The dispersive element in the Fourier plane of the $4\!f$ system shifts each color channel in the detector plane. (b) An example scene is shown where each ping-pong ball is illuminated with a different source. The spectra for each source are shown. (c) The measurement is used to reconstruct 28 color channels where the green ball is more apparent in the shorter wavelength channels compared to the red ball. Adapted from Wagadarikar et al. [96].
Fig. 20. Experimental setup for a coded aperture micro-endoscope and reconstruction results for various objects. Figure adapted from Shin et al. [97].
Fig. 21. (a) Experimental setup for a compressive OCT system. Pulses from a mode locked laser (MLL) are chirped, which allows spectrum-to-time mapping, and each pulse is encoded with a unique binary pattern. The pulses are then compressed and sent to the OCT system. (b) Reconstruction results for a scene with dimensions ${100} \times {150} \times {192}$ at two different sampling rates. Figure adapted from Stroud et al. [99].
Fig. 22. (a) Compressed ultrafast photography (CUP) system where a DMD is used in conjunction with a streak camera to image high speed events. (b)–(d) Reconstructions for various scenes at high speed. Reprinted from Liang and Wang [106].
Fig. 23. Experimental setup for RF compressed sensing where pulses from a mode locked laser (MLL) are temporally broadened and an RF signal is imprinted on the intensity. With the help of a diffraction grating and an SLM, compressive measurements of the RF signals are acquired. Plots at the bottom show reconstructions using the pseudo-inverse and sparse recovery using ${l_1}$ penalty. Figure adapted from Valley et al. [107].

Tables (4)


Algorithm 1. Orthogonal Matching Pursuit [19]
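Algorithm 1 appears in the original as a table image. As an illustrative sketch of the same greedy procedure (not the authors' code; `A`, `y`, and sparsity `S` follow the notation of the text), OMP can be written in a few lines of NumPy:

```python
import numpy as np

def omp(A, y, S, tol=1e-8):
    """Orthogonal matching pursuit: greedily select the column of A most
    correlated with the residual, re-fit by least squares on the selected
    support, and repeat until S atoms are chosen or the residual vanishes."""
    m, n = A.shape
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(n)
    for _ in range(S):
        # 1. Column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # 2. Least-squares fit restricted to the current support.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        # 3. Update the residual.
        residual = y - A @ x
        if np.linalg.norm(residual) < tol:
            break
    return x
```

Because the least-squares step makes the residual orthogonal to every selected column, no atom is ever chosen twice, and an exactly S-sparse, noiseless signal is recovered exactly once the correct support is found.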


Algorithm 2. Subspace Pursuit [20] and CoSaMP [21]
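Algorithm 2 is also rendered as a table image. A minimal NumPy sketch of CoSaMP (an assumption-laden illustration, not the authors' implementation) is below; Subspace Pursuit differs mainly in identifying S rather than 2S candidate atoms per iteration and re-solving least squares after pruning:

```python
import numpy as np

def cosamp(A, y, S, n_iter=30, tol=1e-10):
    """CoSaMP: identify the 2S columns most correlated with the residual,
    merge them with the current support, solve least squares on the merged
    support, then prune back to the S largest coefficients."""
    m, n = A.shape
    x = np.zeros(n)
    residual = y.astype(float).copy()
    for _ in range(n_iter):
        # Identification: indices of the 2S largest proxy entries.
        proxy = np.abs(A.T @ residual)
        omega = np.argsort(proxy)[-2 * S:]
        # Merge with the support of the current estimate.
        support = np.union1d(omega, np.flatnonzero(x))
        # Least squares on the merged support.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        b = np.zeros(n)
        b[support] = coef
        # Prune to the S largest-magnitude coefficients.
        keep = np.argsort(np.abs(b))[-S:]
        x = np.zeros(n)
        x[keep] = b[keep]
        residual = y - A @ x
        if np.linalg.norm(residual) < tol:
            break
    return x
```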


Algorithm 3. Iterative Hard Thresholding [23]
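Algorithm 3 is likewise shown as a table image. The sketch below (an illustration under the assumptions of the text, with the step size chosen from the spectral norm of `A`, not necessarily the authors' choice) alternates a gradient step on $\|y - Ax\|_2^2$ with hard thresholding to the S largest entries:

```python
import numpy as np

def iht(A, y, S, n_iter=500, step=None):
    """Iterative hard thresholding: gradient descent on ||y - Ax||^2
    followed by keeping only the S largest-magnitude entries."""
    m, n = A.shape
    if step is None:
        # A step size below 1/||A||_2^2 keeps the iteration stable.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(n)
    for _ in range(n_iter):
        x = x + step * (A.T @ (y - A @ x))  # gradient step
        keep = np.argsort(np.abs(x))[-S:]   # hard threshold to S terms
        mask = np.zeros(n, dtype=bool)
        mask[keep] = True
        x[~mask] = 0.0
    return x
```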


Algorithm 4. Iterative Shrinkage Thresholding [24] and Fast Iterative Shrinkage Thresholding [25]
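Algorithm 4 is also rendered as a table image. A compact NumPy sketch of both variants (an illustration, not the authors' code; the Lipschitz constant is taken as $\|A\|_2^2$): ISTA alternates a gradient step on $\frac{1}{2}\|y - Ax\|_2^2$ with soft thresholding, and FISTA adds a Nesterov momentum step.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """ISTA for the LASSO: gradient step, then shrink by lam/L."""
    L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + (A.T @ (y - A @ x)) / L, lam / L)
    return x

def fista(A, y, lam, n_iter=500):
    """FISTA: ISTA with a momentum (extrapolation) step."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    z = x.copy()
    t = 1.0
    for _ in range(n_iter):
        x_new = soft_threshold(z + (A.T @ (y - A @ z)) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum
        x, t = x_new, t_new
    return x
```

Note that the LASSO solution is slightly biased toward zero by the $\lambda$ penalty, so even the converged estimate differs from the true coefficients by roughly $\lambda$ per active entry.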

Equations (45)


(1) $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$
(2) $\hat{x} = \arg\min_x f(x)$
(3) $Ax = y$
(4) $A^T A x = A^T y$
(5) $x = (A^T A)^{-1} A^T y = A^\dagger y$
(6) $\hat{x} = \arg\min_x \|Ax - y\|_2^2$
(7) $A \Psi \alpha = y$
(8) $A \alpha = y$
(9) $\hat{x} = \arg\min_x \|x\|_0 \ \ \text{subject to} \ \ Ax = y$
(10) $(1 - \delta_{2S}) \|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_{2S}) \|x\|_2^2$
(11) $\hat{x} = \arg\min_x \|x\|_1 \ \ \text{subject to} \ \ Ax = y$
(12) $\hat{x} = \arg\min_x \|x\|_1 \ \ \text{subject to} \ \ \|Ax - y\|_2 \le \epsilon$
(13) $\|\hat{x} - x\|_2 \le \frac{C_S}{\sqrt{S}} \|x - x_S\|_1 + C_W \epsilon$
(14) $A_S^T A_S x_S = A_S^T y$
(15) $A^T A = \begin{bmatrix} \langle a_1, a_1 \rangle & \cdots & \langle a_1, a_n \rangle \\ \vdots & \ddots & \vdots \\ \langle a_n, a_1 \rangle & \cdots & \langle a_n, a_n \rangle \end{bmatrix}$
(16) $\mu(A) = \max_{1 \le i \ne j \le n} |\langle a_i, a_j \rangle|$
(17) $m > C\, S \ln(N/S)$
(18) $\hat{x} = \arg\min_x \|Ax - y\|_2 \ \ \text{subject to} \ \ \|x\|_0 \le S$
(19) $\hat{x} = \arg\min_x c^T x \ \ \text{subject to} \ \ Ax = y \ \text{and} \ x \ge 0$
(20) $\hat{z} = \arg\min_z \mathbf{1}^T z \ \ \text{subject to} \ \ Bz = y \ \text{and} \ z \ge 0$
(21) $\hat{x} = \arg\min_x \tfrac{1}{2} \|y - Ax\|_2^2 + \lambda \|x\|_1$
(22) $\hat{x} = \arg\min_{x, \lambda} \|x\|_1 + \lambda^T (y - Ax) + \tfrac{\gamma}{2} \|y - Ax\|_2^2$
(23) $\{\hat{x}, \hat{z}\} = \arg\min_{x, z} \|x\|_1 + \tfrac{\gamma}{2} \|y - Az\|_2^2 \ \ \text{subject to} \ \ x = z$
(24) $\{\hat{x}, \hat{z}\} = \arg\min_{x, z, \lambda} \|x\|_1 + \tfrac{\gamma}{2} \|y - Az\|_2^2 + \lambda^T (x - z) + \tfrac{\alpha}{2} \|x - z\|_2^2$
(25) $\{\hat{\alpha}_k\} = \arg\min_{\alpha_k} \sum_k \|\alpha_k\|_1 \ \ \text{subject to} \ \ \Phi_k P(\{\Psi(\alpha_k)\}_k) = y_k$
(26) $\hat{z} = \arg\min_z \|y - A G_\theta(z)\|_2^2$
(27) $z_{k+1} \leftarrow z_k - \alpha \nabla_z \|y - A G_\theta(z)\|_2^2 \big|_{z = z_k}$
(28) $p(x, y) = \iint o(\xi, \eta)\, a\!\left(\xi - \frac{\xi - x}{m},\ \eta - \frac{\eta - y}{m}\right) d\xi\, d\eta$
(29) $p(x, y) = \iint o(\xi, \eta)\, a(x, y)\, d\xi\, d\eta = a(x, y) \iint o(\xi, \eta)\, d\xi\, d\eta$
(30) $p(x, y) = \iint o(\xi, \eta)\, a_2(x + \xi,\ y + \eta)\, d\xi\, d\eta$
(31) $p(k \Delta x, l \Delta y) = \sum_{i,j} o(i \Delta\xi, j \Delta\eta)\, a_m\!\left(i \frac{f}{b} \Delta\xi + k \Delta x,\ j \frac{f}{b} \Delta\eta + l \Delta y\right) \Delta\xi\, \Delta\eta$
(32) $P(k, l) = \sum_{i,j} O(i, j)\, A(i + k, j + l) = O \star A$
(33) $A_{ij} = \begin{cases} 0 & i = 0 \\ 1 & j = 0,\ i \ne 0 \\ 1 & C_i C_j = 1 \\ 0 & \text{otherwise}, \end{cases}$
(34) $C_i = \begin{cases} 1 & x^2 = i \bmod p \ \text{for any} \ x \in \mathbb{Z} \\ -1 & \text{otherwise}. \end{cases}$
(35) $G_{ij} = \begin{cases} 1 & i + j = 0 \\ 1 & A_{ij} = 1 \ \text{and} \ i + j \ne 0 \\ -1 & A_{ij} = 0 \ \text{and} \ i + j \ne 0. \end{cases}$
(36) $h(\xi, \eta, z) = h\!\left(s \frac{f}{b} \xi + s x,\ s \frac{f}{b} \eta + s y\right)$
(37) $s = \frac{m_z}{m_b}, \quad m_z = \frac{z + f}{z}, \quad m_b = \frac{b + f}{b}$
(38) $p(A_r, A_i) = \frac{1}{2\pi\sigma^2} \exp\!\left\{ -\frac{1}{2} \frac{A_r^2 + A_i^2}{\sigma^2} \right\}$
(39) $p(I) = \frac{1}{2\sigma^2} \exp\!\left\{ -\frac{I}{2\sigma^2} \right\} = \frac{1}{\langle I \rangle} \exp\!\left\{ -\frac{I}{\langle I \rangle} \right\}$
(40) $R_I(\Delta x, \Delta y) = \langle I \rangle^2 \left( 1 + \mathrm{sinc}^2\!\left( \frac{L \Delta x}{\lambda z} \right) \mathrm{sinc}^2\!\left( \frac{L \Delta y}{\lambda z} \right) \right)$
(41) $R_I(r) = \langle I \rangle^2 \left( 1 + \left| \frac{2 J_1(\pi D r / \lambda z)}{\pi D r / \lambda z} \right|^2 \right)$
(42) $I(t) = \frac{I_0}{2} \left( 1 + \cos(\phi(t)) \right)$
(43) $\phi(t) = \frac{\pi v(t)}{V_\pi}$
(44) $I(t) = \frac{I_0(t)}{2} \left( 1 + \cos(\phi(t)) \right) \approx \frac{I_0(t)}{2} \left( 1 + \phi_{\mathrm{PRBS}}(t) \right)$ for small $\phi_{\mathrm{PRBS}}$ and $\pi/2$ bias
(45) $y(t) \approx \frac{1}{T} \int_{-\infty}^{t} e^{-(t - t')/T}\, I_0(t')\, \phi_{\mathrm{PRBS}}(t')\, dt'$
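The MURA aperture, quadratic-residue, and decoding definitions in the equations above can be checked numerically. A small sketch (illustrative, not the authors' code) builds $A$ and $G$ for a prime $p \equiv 1 \pmod 4$ and lets one verify that their periodic cross-correlation is a delta function, which is what makes correlation decoding exact:

```python
import numpy as np

def quadratic_residues(p):
    """C_i = +1 if i is a quadratic residue mod p, else -1
    (only i >= 1 is ever used in the aperture definition)."""
    residues = {(x * x) % p for x in range(1, p)}
    return np.array([1 if i in residues else -1 for i in range(p)])

def mura_aperture(p):
    """p x p MURA aperture A_ij (p prime, p = 4m + 1): row i = 0 is
    closed, column j = 0 is open (for i != 0), and otherwise the cell
    is open exactly when C_i * C_j = 1."""
    C = quadratic_residues(p)
    A = np.zeros((p, p), dtype=int)
    for i in range(p):
        for j in range(p):
            if i == 0:
                A[i, j] = 0
            elif j == 0 or C[i] * C[j] == 1:
                A[i, j] = 1
    return A

def mura_decoder(A):
    """Decoding pattern G_ij: +1 where A_ij = 1, -1 where A_ij = 0,
    except the (0, 0) element, which is set to +1."""
    G = np.where(A == 1, 1, -1)
    G[0, 0] = 1
    return G
```

For $p = 5$ this yields a 12-hole aperture whose periodic correlation with $G$ has a single peak of height $(p^2 - 1)/2$ and zero sidelobes.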