
Middle output regularized end-to-end optimization for computational imaging

Open Access

Abstract

Optical coding is an essential technique in computational imaging (CI) that allows high-dimensional signal sensing through post-processed coded projections to decode the underlying signal. Currently, optical coding elements (OCEs) are optimized in an end-to-end (E2E) manner where a set of layers (encoder) of a deep neural network models the OCE while the rest of the network (decoder) performs a given computational task. However, while the training performance of the whole network is acceptable, the encoder layers can be flawed, leading to deficient OCE designs. This flawed performance of the encoder originates from several factors. First, the loss function of the network does not consider the intermediate layers separately, as the output at those layers is unknown. Second, the encoder suffers from vanishing gradients, since it occupies the first layers of the network. Third, the estimation of the gradient in these layers is constrained to satisfy physical limitations. In this work, we propose a middle output regularized E2E optimization, where a set of regularization functions is used to overcome the flawed optimization of the encoder. The significant advantage of our regularization is that it does not require additional knowledge of the encoder and can be applied to most optical sensing instruments in CI. Instead, the regularization exploits some prior knowledge about the computational task, the statistical properties of the output of the encoder (measurements), and the sensing model. Specifically, we propose three types of regularizers: the first one is based on statistical divergences of the measurements, the second depends only on the variance of the measurements, and the last one is a structural regularizer promoting low rankness and sparsity of the set of measurements. We validated the proposed training procedure in two representative CI systems, a single-pixel camera and a coded aperture snapshot spectral imager, showing significant improvement with respect to non-regularized designs.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. INTRODUCTION

The joint operation of optical systems and computational algorithms in computational imaging (CI) has allowed the acquisition of high-dimensional signals, $D \gt 2$, where $D$ is the signal dimension, such as spectral imaging [1], polarization state [2], depth imaging [3], temporal imaging [4], and angular views in light fields [5]. A key component in these systems is optical coding elements (OCEs), which modulate properties of the incident light wave, such as its amplitude using coded apertures (CAs) [6], phase using diffractive lenses [7], polarization using micro-polarizers [2], or spectral content using dispersive elements [8]. Consequently, the design of these elements for optimal CI performance has received great attention. Particularly, the design of CAs has been extensively studied based on analytical criteria such as Hadamard invertibility [6,9] or compressive sensing theory [10], such as the restricted isometry property [11,12]. Additionally, in the design of diffractive lenses, methods have been proposed to reduce chromatic aberrations and to optimize lens geometries [13] to improve CI systems. Moreover, these elements have been designed for the encoding of spectral information [14,15]. Although the aforementioned design methods improve performance with respect to standard configurations (Bernoulli CAs or Fresnel lenses), they rely on structural assumptions about the signal or system that are not always satisfied, and thus they do not work well in several scenarios.

With new advances in machine learning algorithms, particularly those of deep learning [16], and the large number of available databases, the end-to-end (E2E) optimization method [17] has been proposed, where the OCE is optimized taking into account properties of the training dataset. Here, the optical system is modeled as a layer of a neural network whose trainable parameters are the OCE; this layer is called the optical encoder (OE). The OE is coupled with a network that performs the decoding task, i.e., reconstruction, classification, segmentation, etc., called the computational decoder (CD). This way, the OCE is jointly trained with the inference task, allowing the OCE to adapt according to the training database and the CD. While the whole E2E network has shown good overall performance in several tasks such as spectral imaging [18], classification and depth estimation [19], compressive spectral image fusion [20,21], achromatic extended depth of field and super-resolution [22], and monocular depth estimation [3], among others, optimization of the OE can be subpar for several reasons. For instance, the OE parameters are optimized only with respect to the loss function computed at the output of the CD network, which leads to vanishing gradients at the OCE. Thus, the performance of the E2E network relies more on optimal CD training than on an optimal optical codification design. Moreover, the output of the intermediate layer is not considered a variable that needs to be carefully optimized to increase the performance of the entire network. Additionally, the OE is highly constrained to a feasible set of values due to the physical meaning of the OCE, which reduces the degrees of freedom in training.

To overcome the OCE training issues, we propose middle output regularized E2E (MORE) optimization, where a set of regularization functions applied to the output of the OE is devised. First, the proposed regularization functions can exploit prior knowledge about the task, the dataset, and the OE to optimize the OCE. Also, we give insights into criteria to better optimize the intermediate layers’ output based on these outputs’ statistical properties, such as the mean and variance of the measurement set. We show how the measurement distribution affects the CD performance according to the task. Empirically, we demonstrate that concentrating the distribution of the measurements (reducing data variance) allows a more compact representation of the data and thus better reconstruction performance. For the classification task, increasing the variance improves accuracy since the classes are better identified by the CD. Based on these criteria, three types of regularization functions are proposed to promote these properties on the OE: (i) Kullback–Leibler divergence (KL-D) regularization, where these functions aim to approximate the distribution of the intermediate output (OE output) to a prior distribution. Particularly, Gaussian distribution (widely used in variational autoencoders [23]) and Laplacian distribution (employed in regression tasks [24]) priors are employed since the KL-D has a closed-form solution for them and can be efficiently implemented. This regularization promotes a given mean and variance on the measurement distribution through the prior distribution. We study the effect of this prior distribution to obtain better task performance. Preliminary results on the KL-D regularizer have shown promise in [25] for recovery tasks and also beyond CI in [26], where we employed this regularization to improve the design of the acquisition geometry in compressive seismic applications. (ii) The second type is variance-based regularization, in which the variance of the coded observations is minimized or maximized. This criterion has been studied in self-supervised representation learning, where controlling the variance allows a more compact representation of the data. We minimize the variance for the reconstruction task and maximize it for classification. (iii) The third type is structural regularization, where we exploit low rankness of the measurement set by sparsifying its singular values, thus concentrating the dataset information in a few linearly independent coded measurements, and sparsity in a given basis, e.g., wavelets, along the measurement set to promote smoothness, i.e., to reduce data variability. These regularization functions indirectly concentrate the distribution of the measurements. From a representation learning point of view, these regularization functions encourage an invariant OE and allow a contractive representation of the data manifold, while the recovery loss function enforces accurate image estimation [27]. Contractive representations have been used in traditional autoencoders [28]. However, this criterion has not been proposed for sensing matrix optimization. One of the main advantages of the proposed training methodology is that it can be applied to any optical architecture and can be adapted to any computational task. An overview of the proposed approach is shown in Fig. 1.

Fig. 1. (a) E2E scheme where the OE is optimized jointly with the CD network. (b) Proposed regularization functions to improve the design of the OE by inducing statistical priors during the training of the E2E network. (c) Scheme of the optical systems employed in this paper to validate the proposed OCE optimization. OL, objective lens; RL, relay lens; GS, grayscale sensor; P, prism; SPS, single-pixel sensor. Highlighted in blue are the trainable elements of each system.

Several systems were employed to validate the effectiveness of the proposed design criteria. First, the regularization functions were evaluated in a compressive sensing scenario; then, real imaging systems were employed, namely, the single-pixel camera (SPC) [29] for imaging and the CA snapshot spectral imager (CASSI) [8] for spectral imaging. The design of the OCE of these systems has been previously addressed within the E2E framework. For instance, for the SPC, the authors of [30] design the CA for single-pixel video, and [19] employs the E2E framework to optimize the CASSI CA. We show that, for all these systems, decreasing the variance of the set of measurements allows better reconstruction quality.

The rest of the paper is organized as follows. Section 2 presents the E2E formulation, and Section 3 presents the proposed regularization functions to improve the E2E performance. Section 4 presents the mathematical modeling of the compressive imaging systems employed to validate the proposed design. Section 5 presents the numerical experiments. Furthermore, Section 6 reports the experimental validation of the proposed design, and finally, Section 7 presents the conclusions of this work.

2. END-TO-END OPTIMIZATION

In CI, a high-dimensional signal $\textbf{f} \in {\mathbb{R}^n}$ is acquired via a low-dimensional coded projection $\textbf{y} \in {\mathbb{R}^m}$, with $m \ll n$. Here, we focus on linear CI systems. In the E2E optimization framework, the sensing procedure is modeled as a differentiable linear operator, i.e.,

$$\textbf{y} = {\textbf{H}_{\boldsymbol \Phi}}\textbf{f} + {\boldsymbol \omega},$$
where ${\textbf{H}_{\boldsymbol \Phi}} \in {\mathbb{R}^{m \times n}}$ is the sensing matrix of the system, namely, the OE, ${\boldsymbol \Phi}$ is the OCE of the sensing system, e.g., a CA, and ${\boldsymbol \omega}$ is additive noise. The OCE is then optimized jointly with a CD network ${{\cal M}_\theta}$ with trainable parameters $\theta$ as
$$\begin{split}\left\{{{{\boldsymbol \theta}^ \star},{{\boldsymbol \Phi}^ \star}} \right\} &= \mathop {{\rm arg} {\rm min}}\limits_{{\boldsymbol \theta},{\boldsymbol \Phi}} {\cal L}({\boldsymbol \theta},{\boldsymbol \Phi})\\& = \mathop {{\rm arg} {\rm min}}\limits_{{\boldsymbol \theta},{\boldsymbol \Phi}} \frac{1}{K}\sum\limits_{k = 1}^K {{\cal L}_{\rm{task}}}\left({{{\cal M}_{\boldsymbol \theta}}({\textbf{H}_{\boldsymbol \Phi}}{\textbf{f}_k}),{\textbf{d}_k}} \right) + \rho {R_i}({\boldsymbol \Phi}),\end{split}$$
where $\{{\textbf{f}_k}\} _{k = 1}^K$ is the training dataset, ${{\cal L}_{\rm{task}}}$ is the loss function of the desired task, and ${\textbf{d}_k}$ corresponds to the expected output, e.g., classification labels [31], ground truth images [20], depth maps [3], etc. Usually, the OCE is constrained to a set of feasible values due to the physical limitations of the elements. To impose this constraint, a regularization function ${R_i}({\boldsymbol \Phi})$ is added to the loss function, where $\rho$ is the regularization parameter. This regularization can also induce desired properties on the OCE, such as the transmittance of the CA, the number of shots, etc. (Table II in [17]). The main goal here is that the OCE is updated according to the task loss function and the physical constraint given by the regularization. Particularly, following the chain rule, the gradient of the loss function with respect to the OCE is
Fig. 2. Norm of the gradient of the CD parameters and the OCE of the OE.

$$\frac{{\partial {\cal L}}}{{\partial {\boldsymbol \Phi}}} = \frac{{\partial {{\cal L}_{\rm{task}}}}}{{\partial {{\cal M}_{\boldsymbol \theta}}}}\frac{{\partial {{\cal M}_{\boldsymbol \theta}}}}{{\partial \textbf{y}}}\frac{{\partial \textbf{y}}}{{\partial {\boldsymbol \Phi}}} + \rho \frac{{\partial {R_i}({\boldsymbol \Phi})}}{{\partial {\boldsymbol \Phi}}}.$$

Training the OCE has two main issues: (i) the training is highly conditioned by the physical-limitation regularization function, which decreases the degrees of freedom of the OCE; (ii) the gradient vanishes because the OE is the first layer of the E2E network. Consequently, most of the optimization is performed over the CD parameters rather than on properly optimizing the optical coding. As an illustration of this phenomenon, Fig. 2 plots the norm of the loss function gradient with respect to ${\boldsymbol \theta}$ and ${\boldsymbol \Phi}$ on a logarithmic scale. This experiment is performed with an SPC as the OE, whose OCE is the CA, and the computational task is recovery via a U-Net. Here, there is a significant difference (almost one order of magnitude) between the OE gradient and the CD parameter gradient. Mainly, this issue arises because the intermediate output of the E2E network (the coded measurements) is not taken into account independently during training, and the optimization is performed only with respect to the CD output. Thus, we provide new insights into what the intermediate output should be, based on its statistical properties. Then, based on these criteria, we propose a set of regularization functions that control the statistical properties of the coded measurements. Other regularization functions have been proposed to increase the performance of the E2E network. For instance, [19] proposes to minimize a regularization that concentrates the eigenvalues of the sensing matrix ${\textbf{H}_{\boldsymbol \Phi}}$ through the function $\Vert \textbf{H}_{\boldsymbol \Phi}^T{\textbf{H}_{\boldsymbol \Phi}}\textbf{f} - \textbf{f}{\Vert _2}$. Similarly, [32] proposes to minimize the closed-form solution of a regularized $\ell_2$ optimization problem, i.e., ${\rm arg}\,{{\rm min}_\textbf{f}}\Vert {\textbf{H}_{\boldsymbol \Phi}}\textbf{f} - \textbf{y}\Vert _2^2 + \gamma \Vert \textbf{f}\Vert _2^2$, yielding the regularization function $\Vert {\textbf{f}_k} - (\textbf{H}_{\boldsymbol \Phi}^T{\textbf{H}_{\boldsymbol \Phi}} + \gamma \textbf{I})^{-1}\textbf{H}_{\boldsymbol \Phi}^T{\textbf{H}_{\boldsymbol \Phi}}{\textbf{f}_k}\Vert _2$, thus promoting good invertibility properties of ${\textbf{H}_{\boldsymbol \Phi}}$. These functions aim to obtain an approximation of the desired image only through the invertibility properties of the sensing matrix. However, such invertibility is usually not met due to the highly structured matrix and, mostly, to the ill-posed nature of the problem. Thus, this regularization does not provide a better optimization of the OE. Additionally, these regularization functions apply only to recovery and cannot be adapted to other computational tasks. The proposed regularization functions promote a contractive OE, which reduces the variance between the compressed projections of the training samples. Then, by reducing the variability in the compressed domain, the decoder performs better in the reconstruction. For the classification task, the opposite effect is desired, i.e., expanding the distribution of the measurements. The following section details the proposed regularization.
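As a concrete illustration of this gradient-norm gap, the following minimal sketch (PyTorch; an unconstrained dense matrix stands in for the OCE, and a small fully connected decoder for the CD; all names and sizes are illustrative assumptions, not the authors' code) logs the gradient norms of the OE and the CD after one training step, as in Fig. 2.

```python
import torch
import torch.nn as nn

n, m = 1024, 102                                   # signal size and number of measurements
Phi = nn.Parameter(torch.randn(m, n) / n ** 0.5)   # OCE modeled as a trainable sensing matrix
decoder = nn.Sequential(nn.Linear(m, 512), nn.ReLU(), nn.Linear(512, n))

f = torch.rand(32, n)                              # batch of vectorized training images
y = f @ Phi.T                                      # coded measurements y = H_Phi f
loss = nn.functional.mse_loss(decoder(y), f)       # recovery task loss
loss.backward()

oce_grad = Phi.grad.norm().item()
cd_grad = torch.cat([p.grad.flatten() for p in decoder.parameters()]).norm().item()
print(f"||dL/dPhi|| = {oce_grad:.3e}, ||dL/dtheta|| = {cd_grad:.3e}")
```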

3. PROPOSED REGULARIZATION FUNCTIONS

In this paper, we propose a new type of regularization function for E2E optimization that promotes desired properties on the distribution of the measurements. The optimization problem in Eq. (2) then becomes

$$\begin{split}\left\{{{{\boldsymbol \theta}^ \star},{{\boldsymbol \Phi}^ \star}} \right\}& = \mathop {{\rm arg} {\rm min}}\limits_{{\boldsymbol \theta},{\boldsymbol \Phi}} {\cal L}({\boldsymbol \theta},{\boldsymbol \Phi})\\ &= \mathop {{\rm arg} {\rm min}}\limits_{{\boldsymbol \theta},{\boldsymbol \Phi}} \frac{1}{K}\sum\limits_{k = 1}^K {{\cal L}_{\rm{task}}}\left({{{\cal M}_{\boldsymbol \theta}}({\textbf{H}_{\boldsymbol \Phi}}{\textbf{f}_k}),{\textbf{d}_k}} \right) \\&\quad+ \rho {R_i}({\boldsymbol \Phi}) +\mu R(\textbf{Y}),\end{split}$$
where $\mu$ is the regularization parameter, and $\textbf{Y} \in {\mathbb{R}^{K \times m}}$ is the matrix containing the compressed measurements of the training batch, i.e., $\textbf{Y} = [\textbf{y}_1^T,\textbf{y}_2^T, \ldots ,\textbf{y}_K^T{]^T}$.
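A minimal sketch of how the objective in Eq. (4) can be assembled in training code is shown next (PyTorch-style; `physical_reg` and `measurement_reg` are placeholders for the particular choices of $R_i({\boldsymbol \Phi})$ and $R(\textbf{Y})$ described below, and all names and default weights are illustrative assumptions).

```python
def more_loss(decoder, Phi, f_batch, d_batch, task_loss,
              physical_reg, measurement_reg, rho=1.0, mu=1e-3):
    Y = f_batch @ Phi.T                    # middle output: batch of coded measurements
    output = decoder(Y)                    # CD output (reconstruction, class scores, ...)
    return (task_loss(output, d_batch)     # task term of Eq. (4)
            + rho * physical_reg(Phi)      # physical constraint R_i(Phi)
            + mu * measurement_reg(Y))     # proposed middle-output regularizer R(Y)
```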

A. Divergence-Based Regularization

This type of regularization function is based on the idea behind variational autoencoders [23]. Particularly, this regularization aims to approximate the probability distribution of the measurement set denoted by the posterior distribution ${q_{\boldsymbol \Phi}}(\textbf{Y}|\textbf{F})$, where $\textbf{F} \in {\mathbb{R}^{K \times n}}$ is a matrix with all the input training images, to a prior distribution ${p_\beta}(\textbf{Y})$, where $\beta$ is the set of parameters that defines the prior distribution. This regularizer is defined as

$${R_D}(\textbf{Y}) = {\cal D}\left({{q_\Phi}(\textbf{Y}|\textbf{F})\Vert {p_\beta}(\textbf{Y})} \right),$$
where ${\cal D}$ denotes the divergence function. Several divergences have been used as loss functions in neural network training. The most common is the KL-D, employed in variational autoencoders [23], generative adversarial networks [33], and self-supervised learning [34], among others. Particularly, the KL-D is defined as follows: given two probability distributions $P(x)$ and $Q(x)$, we have ${{\cal D}_{\rm{KL}}}(P\Vert Q) = \int P(x)\log ({\frac{{P(x)}}{{Q(x)}}}){\rm d}x$. One of the main reasons the KL-D is widely used is that it has a closed-form solution when $P$ and $Q$ are Gaussian or Laplacian distributions (see [23,35]). In these cases, the parameters of the prior distribution ${p_\beta}(\textbf{Y})$ are $\beta = \{{\mu _p},{\sigma _p}\}$, where ${\mu _p}$ is the mean value, and ${\sigma _p}$ is the variance of the distribution. Denote the distribution of the measurements as ${q_\Phi}(\textbf{Y}|\textbf{F})$. The mean ${{\boldsymbol\mu}_\textbf{Y}} \in {\mathbb{R}^m}$ and variance ${{\boldsymbol \sigma}_\textbf{Y}} \in \mathbb{R}_ + ^m$ are computed pixel-wise across the training batch. For the Gaussian case, the KL-D-based regularizer is defined as
$${R_{\rm KL - G}}(\textbf{Y}) = \log \left(\frac{\sigma_p}{{\boldsymbol \sigma}_\textbf{Y}} \right) + \frac{{\boldsymbol \sigma}_\textbf{Y}^2 + \left({\boldsymbol\mu}_\textbf{Y} - \mu_p \right)^2}{2\sigma_p^2} - \frac{1}{2},$$
and for the Laplacian assumption, the KL-D-based regularizer is given by
$${R_{\rm KL - L}}(\textbf{Y}) = \log \left(\frac{\sigma_p}{{\boldsymbol \sigma}_\textbf{Y}} \right) + \frac{{\boldsymbol \sigma}_\textbf{Y}\, e^{\left(\frac{- |\mu_p - {\boldsymbol\mu}_\textbf{Y}|}{{\boldsymbol \sigma}_\textbf{Y}}\right)} + |\mu_p - {\boldsymbol\mu}_\textbf{Y}|}{\sigma_p} - 1.$$
The effect of these regularizers depends directly on the values of the mean and variance of the prior distribution. Thus, these are hyperparameters of the regularizers that need to be chosen to obtain the desired behavior.
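A sketch of Eqs. (6) and (7) in code follows, assuming the batch of measurements $\textbf{Y}$ has shape (K, m), that the pixel-wise statistics are taken across the batch dimension, and that the per-entry divergences are averaged into a scalar (this reduction is an implementation choice, not stated in the text).

```python
import torch

def kl_gaussian(Y, mu_p=0.0, sigma_p=1.0, eps=1e-8):
    mu_y = Y.mean(dim=0)                   # pixel-wise mean across the batch
    sigma_y = Y.std(dim=0) + eps           # pixel-wise spread across the batch
    kl = (torch.log(sigma_p / sigma_y)
          + (sigma_y ** 2 + (mu_y - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5)
    return kl.mean()

def kl_laplacian(Y, mu_p=0.0, sigma_p=1.0, eps=1e-8):
    mu_y = Y.mean(dim=0)
    b_y = Y.std(dim=0) + eps               # scale estimate of the measurement distribution
    diff = (mu_y - mu_p).abs()
    kl = (torch.log(sigma_p / b_y)
          + (b_y * torch.exp(-diff / b_y) + diff) / sigma_p - 1.0)
    return kl.mean()
```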

B. Variance-Based Regularization

Another way to control the distribution of the measurement set is to regularize the variance directly. Here, we propose a variance minimization regularizer. This variance-based regularization criterion has also been used in representation learning for self-supervised tasks [36] and sparse coding [37]. We extrapolate these criteria for an optimal low-dimensional representation to the compressive sensing setting, thus giving more interpretability to the OCE designed by E2E optimization. The proposed regularization function is the following:

$${R_{{V{\rm min}}}}(\textbf{Y}) = \Vert {{\boldsymbol \sigma}_\textbf{Y}}{\Vert _2}.$$
For this regularization, the hyperparameter $\mu$ in Eq. (4) controls how concentrated the distribution of the measurements becomes. In some downstream tasks, such as classification, where the goal is to distinguish images of different classes, a wider measurement distribution, i.e., one with greater variance, can help the CD better identify the classes. Thus, variance maximization can be promoted by the following regularization function:
$${R_{{V{\rm max}}}}(\textbf{Y}) = \Vert {\sigma _{{{\rm max}}}} - {{\boldsymbol \sigma}_\textbf{Y}}{\Vert _2},$$
where ${\sigma _{{{\rm max}}}}$ is a maximum variance reference, a hyperparameter that can be tuned.
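A sketch of Eqs. (8) and (9) follows (the reduction over the $m$ entries via the $\ell_2$ norm follows the equations; shapes are as in the previous sketch).

```python
import torch

def r_var_min(Y):
    return torch.norm(Y.std(dim=0), p=2)             # Eq. (8): shrink the measurement spread

def r_var_max(Y, sigma_max=5.0):
    return torch.norm(sigma_max - Y.std(dim=0), p=2) # Eq. (9): push the spread towards sigma_max
```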

C. Structural Regularization

This type of regularization is based on the common priors of compressed sensing recovery: low rank and sparsity [10,38]. Although these priors are usually employed over the underlying signal $\textbf{f}$, here we employ them to achieve the following effects in the measurement space. The low-rank prior is employed to concentrate the information of the dataset in a few representative measurements, thus reducing the projection manifold and allowing better reconstruction by the CD. To promote low rankness in the measurement space, we minimize the ${\ell _1}$ norm of the singular values of $\textbf{Y}$. Particularly, employing the singular value decomposition (SVD) of the measurement matrix, we obtain $\textbf{Y} = \textbf{UD}{\textbf{V}^T}$, where the matrices $\textbf{U} \in {\mathbb{R}^{K \times K}}$ and $\textbf{V} \in {\mathbb{R}^{m \times m}}$ contain the left and right singular vectors, respectively, and $\textbf{D} \in {\mathbb{R}^{K \times m}}$ is a rectangular diagonal matrix with the singular values on its diagonal. The singular values are denoted by $\textbf{d} = [{d_1}, \ldots ,{d_r}]$, where ${d_i} = {\textbf{D}_{(i,i)}}$ for $i = 1, \ldots ,r$, with $r = \min(K,m)$. Thus, our low-rank regularization is the following:

$${R_{\textit{LR}}}(\textbf{Y}) = \Vert \textbf{d}{\Vert _1}.$$
By applying the ${\ell _1}$ norm to the singular values, we promote few non-zero values in $\textbf{d}$, thus reducing the rank.
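A sketch of the low-rank regularizer of Eq. (10) is shown below; `torch.linalg.svdvals` is differentiable, so the term can be added directly to the training loss (assuming $\textbf{Y}$ is the $K \times m$ batch of measurements).

```python
import torch

def r_low_rank(Y):
    d = torch.linalg.svdvals(Y)   # singular values of the measurement matrix Y
    return d.sum()                # l1 norm of d (singular values are non-negative), Eq. (10)
```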

The second criterion, sparsity-based regularization, follows the same intuition as its application in imaging inverse problems, where sparsity over a given representation basis (wavelet, discrete cosine transform, or Fourier) is employed to promote the smoothness of the images. Here, we aim to promote smoothness along the coded measurements, thus reducing the variance. Mathematically, the regularizer is

$${R_S}(\textbf{Y}) = \Vert {\boldsymbol \Psi}{{\boldsymbol \sigma}_\textbf{Y}}{\Vert _1},$$
where ${\boldsymbol \Psi}$ is the representation basis. In this work, we consider the Haar wavelet, which has shown good results in promoting smoothness on signals [39].
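A sketch of the sparsity regularizer of Eq. (11) follows, using a single-level 1D Haar transform of the pixel-wise standard deviation vector as an illustrative choice of ${\boldsymbol \Psi}$ (the decomposition level and the even-length assumption are simplifications; the paper only specifies that a Haar wavelet is used).

```python
import torch

def haar_1d(x):
    # one-level Haar analysis of a 1D signal with even length
    even, odd = x[0::2], x[1::2]
    return torch.cat([(even + odd) / 2 ** 0.5, (even - odd) / 2 ** 0.5])

def r_sparsity(Y):
    sigma_y = Y.std(dim=0)                 # pixel-wise spread of the coded measurements
    return haar_1d(sigma_y).abs().sum()    # l1 norm in the Haar domain, Eq. (11)
```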

4. COMPRESSIVE IMAGING SENSING MODELS

To validate the proposed deep optical design, we employed two flagship CI optical architectures: the SPC and CASSI.

A. Single-Pixel Camera

The first optical architecture is the SPC [29], which is widely used in compressive imaging systems. This system employs an imaging lens that focuses the scene onto a CA, which spatially modulates the light, and a condenser lens that integrates the encoded image onto a single-pixel detector. The CA can be implemented with spatial light modulators (SLMs) [40], such as a digital micro-mirror device (DMD) [41] that selectively redirects parts of the light beam [42]. The SPC uses a CA ${\boldsymbol \Phi}_{(i,j)}^k$ that spatially modulates all the information from the scene ${\textbf{F}_{(i,j)}}$ with the same pattern, where $(i,j)$ indexes the spatial coordinates, and $k$ indexes each captured snapshot. In particular, the CA $\boldsymbol{\Phi}_{(i,j)}^k$ is a binary pattern whose spatial distribution determines the performance of the reconstruction. Mathematically, the effect of the CA over the scene can be represented as

$$\hat{\textbf{F}}_{(i,j)}^k = {\textbf{F}_{(i,j)}}\boldsymbol{\Phi}_{(i,j)}^k.$$

After that, the modulated scene $\hat{\textbf{F}}$ is focused onto a single spatial point by the condenser lens and captured by a single-pixel detector. The resulting sensing matrix ${\hat{\textbf{H}}_{{\phi _s}}} \in {\mathbb{R}^{K \times MN}}$ contains in its rows the vectorized CA of each snapshot $k$. A scheme of the SPC system is shown in Fig. 1(c). The aperture codes that form the sensing matrix are the parameters designed with the proposed regularizers. The acquisition system can be modeled as

$$\textbf{y} = {\hat{\textbf{H}}_{{\phi _s}}}\textbf{f} + {\textbf{n}_s},$$
where $\textbf{y} = [{y_0},\ldots,{y_{K - 1}}]^T$ contains the compressed measurements, $\textbf{f} \in {\mathbb{R}^{\textit{MN}}}$ is the vectorized image, and ${\textbf{n}_s}$ is additive Gaussian noise.
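A minimal sketch of the SPC forward model of Eqs. (12) and (13) with a $\{-1,1\}$ CA per snapshot is shown below (image size, number of snapshots, and noise level are illustrative assumptions).

```python
import torch

M, N, K = 32, 32, 102                               # image size and number of snapshots
Phi = (torch.rand(K, M * N) > 0.5).float() * 2 - 1  # one vectorized {-1, 1} CA per snapshot

def spc_forward(f, Phi, noise_std=0.01):
    # f: vectorized image of size M*N; each row of Phi is the CA of one snapshot
    y = Phi @ f
    return y + noise_std * torch.randn_like(y)
```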

B. CASSI

In the CASSI architecture, the input light is first focused by an imaging lens onto a CA, which encodes the spatial information of the image. Then, the spectral information of the coded field is dispersed by a prism. Finally, the coded and dispersed information impinges on a focal plane array. An illustration of this system is depicted in Fig. 1(c). Therefore, the discrete model of the CASSI measurements ${\textbf{y}_c}$ can be formulated as

$${\textbf{y}_{{c_{(i,j)}}}} = \sum\limits_{\ell = 1}^L {{\boldsymbol \Phi}_{{c_{(i,j)}}}}{\textbf{F}_{(i,j - \ell ,\ell)}},$$
where $\textbf{F} \in {\mathbb{R}^{M \times N \times L}}$ is the discretized spectral image, and ${{\boldsymbol \Phi}_c}$ represents the CA. The discrete model in Eq. (14) can be expressed as a matrix–vector product as follows:
$${\textbf{y}_c} = {\textbf{H}_{{{\boldsymbol \Phi}_c}}}\textbf{f} + {\textbf{n}_c},$$
where ${\textbf{y}_c} \in {\mathbb{R}^{M(N + L - 1)}}$ are the compressed measurements, ${\textbf{H}_{{{\boldsymbol \Phi}_c}}} \in {\mathbb{R}^{M(N + L - 1) \times MNL}}$ is the CASSI sensing matrix, $\textbf{f} \in {\mathbb{R}^{\textit{MNL}}}$ is the vectorization of the high spatial–spectral resolution image, and ${\textbf{n}_c} \in {\mathbb{R}^{M(N + L - 1)}}$ is additive noise. Here, the design parameters are the CA ${{\boldsymbol \Phi}_c}$.
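A minimal sketch of the single-shot CASSI forward model of Eqs. (14) and (15) is given below: the spectral cube is coded by the CA and each band is sheared by one pixel per band before integration on the detector (the shapes and the implementation of the shear via zero-padding are illustrative assumptions).

```python
import torch
import torch.nn.functional as Fnn

def cassi_forward(F, Phi_c, noise_std=0.0):
    # F: spectral cube of shape (M, N, L); Phi_c: coded aperture of shape (M, N)
    M, N, L = F.shape
    coded = Phi_c.unsqueeze(-1) * F                             # apply the CA to every band
    sheared = [Fnn.pad(coded[:, :, l], (l, L - 1 - l)) for l in range(L)]
    y = torch.stack(sheared, dim=0).sum(dim=0)                  # detector image (M, N + L - 1)
    return y + noise_std * torch.randn_like(y)
```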

5. SIMULATION RESULTS

To evaluate the performance of the proposed design methodology, we perform the experiments listed in Table 1. In particular, we perform classification and recovery tasks; for the former, we use a MobileNet-V2 network [43], which is a lightweight model widely used for classification. For the recovery task, a U-Net model with five convolutional blocks in each of the downsampling and upsampling paths was used. For all the experiments, we trained the E2E network for 100 epochs, which was sufficient for the network to converge properly. We halved the learning rate every 40 epochs. For the CASSI CA binary constraint, the polynomial regularization in [19] was employed, i.e., $R({\boldsymbol \Phi}) = \sum\nolimits_{\textit{ij}} {(1 - {{\boldsymbol \Phi}_{\textit{ij}}})^2}{({{\boldsymbol \Phi}_{\textit{ij}}})^2}$. For the SPC CA constraint, we consider values $\{- 1,1\}$, which in practice can be achieved by following the procedure detailed in the appendix of [31] and which allows a better signal-to-noise ratio (SNR). Then, the physical constraint regularizer is $R({\boldsymbol \Phi}) = \sum\nolimits_{\textit{ij}} {(1 - {{\boldsymbol \Phi}_{\textit{ij}}})^2}{(1 + {{\boldsymbol \Phi}_{\textit{ij}}})^2}$. The parameter of the physical constraint regularizer $\rho$ was dynamically updated during training as suggested in [19]. Three datasets were employed to train the networks. For the SPC experiments, we employed the Fashion MNIST dataset [44], which contains 60,000 images of 10 classes of clothing items. We split this dataset into 50,000 images for training and 10,000 for testing. All images were resized to $32 \times 32$ pixels. For the CASSI experiments, the ARAD spectral image dataset was used [45], where the images were resized to $128 \times 128 \times 31$, with 900 images used for training and 100 for testing. The code employed for all the simulations can be found in [46].
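For reference, a sketch of the two physical-constraint regularizers and the learning-rate schedule described above is given next (PyTorch; the optimizer and the learning rate are illustrative assumptions).

```python
import torch

def binary_reg_01(Phi):           # CASSI constraint: promotes CA values in {0, 1}
    return torch.sum((1 - Phi) ** 2 * Phi ** 2)

def binary_reg_pm1(Phi):          # SPC constraint: promotes CA values in {-1, 1}
    return torch.sum((1 - Phi) ** 2 * (1 + Phi) ** 2)

# training schedule used in the experiments: 100 epochs, learning rate halved every 40
# optimizer = torch.optim.Adam(params, lr=1e-3)    # the learning rate is an assumption
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.5)
```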

Table 1. Experiments to Validate the Proposed OCE Design

A. Compressed Sensing Experiments

In a first experiment to validate the performance of the proposed regularized E2E network, we study a compressive sensing (CS) scenario in which no physical or structural meaning is imposed on the sensing matrix ${\textbf{H}_\Phi}$. Here, we use a compression ratio of 10%. For all these experiments, the MNIST dataset is employed; despite its simplicity, this dataset serves as a first proof of concept of the proposed sensing matrix design. Further experiments on structured sensing matrices and more challenging datasets validate the proposed method later in this section and in Section 6.
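A sketch of this unconstrained CS setting at a 10% compression ratio for $28 \times 28$ MNIST images is shown below (the initialization is an assumption).

```python
import torch
import torch.nn as nn

n = 28 * 28                                     # vectorized MNIST image
m = int(0.1 * n)                                # 10% compression ratio
H = nn.Parameter(torch.randn(m, n) / m ** 0.5)  # unconstrained trainable sensing matrix H_Phi

def sense(f_batch):
    # f_batch: (batch, n) -> (batch, m) coded measurements
    return f_batch @ H.T
```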

1. KL-Divergence

First, we analyze the effect of the mean and variance of the prior distribution (${\mu _p},{\sigma _p}$) on the network performance. Here, ${\mu _p}$ was varied from ${-}{2}$ to 2 and ${\sigma _p}$ from 0.1 to 2.0, taking five equispaced values for each. The results of this experiment are shown in Fig. 3, where optimal reconstruction peak SNR (PSNR) values are obtained for variances close to 1.0 and means close to 0. These results suggest that better reconstruction performance is obtained by concentrating the measurement distribution. The main interpretation is that reducing the representation space can improve the CD performance since the variability of the data is reduced.

Fig. 3. Recovery performance for the CS scenario employing KL-D regularizers with Gaussian (left) and Laplacian (right) cases.

2. Variance and Structural Regularizers

Then, we analyze the performance of the E2E network with the variance minimization and structural regularizations. Figure 4(a) shows the recovery performance as a function of the regularization parameter in Eq. (4), where the optimal values suggest a trade-off between how much the distribution is concentrated and the recovery performance. In particular, significant recovery improvements are obtained with the low-rank and sparsity regularizers with respect to the baseline (non-regularized E2E). Figures 4(b)–4(d) present the distribution of two pixels of the test-set measurements with the trained system, depicting the concentration of the distribution compared with the non-regularized model. Additionally, Fig. 5 shows some visual reconstructions of two images of the test set, validating that the proposed regularization outperforms the baseline E2E method. Here, the best overall performance was obtained by the sparsity regularization. This is because in CS, where the sensing matrix is not constrained by the physical conditions of a particular system, the E2E network reduces to a conventional autoencoder, and applying sparsity regularization on the low-dimensional representation then yields a sparse autoencoder [47], which is a widely employed method to improve representation performance.

Fig. 4. Recovery performance for the CS scenario employing the variance and structural regularizers compared with the non-regularized E2E network. (a) Performance depends on the regularization parameter $\mu$. First- and second-pixel distribution of the test dataset for (b) low rank, (c) minimized variance, and (d) sparsity.

Fig. 5. Visual reconstruction results for the CS scenario using the variance and the structural regularizations and baseline E2E designs. The blue values correspond to the best results and green to the second best.

B. SPC Experiments

For the SPC, we performed experiments on classification and recovery tasks. The classification is performed directly from the compressed measurements without reconstructing the underlying scene. During the training of the E2E network, the parameter of the physical constraint regularizer $\rho$ was dynamically updated as suggested in [19]: in the first epochs, $\rho$ is very low, thus not constraining the training of the OCE, and it is then increased to obtain a binary CA. For both the recovery and classification tasks, we employed the Fashion MNIST dataset. For all experiments, we used a compression ratio of 0.1.

1. Recovery Experiments

For this experiment, ${\mu _p}$ was varied from ${-}{2}$ to 2 and ${\sigma _p}$ from 0.1 to 2.0, taking five equispaced values for each. The CD in this experiment is a U-Net [48] with five downsampling and five upsampling blocks. The results of this experiment are shown in Fig. 6. Here, the behavior is similar to that of the CS case, where a lower variance yields better reconstruction performance. Also, similar to the results in Fig. 3, the best performance is obtained at ${\mu _p} = 0$, in line with the concept of batch normalization, where a centered output distribution yields more stable training and better performance.

Fig. 6. Recovery performance for the SPC system with KL regularizers for the Gaussian (left) and Laplacian (right) cases.

Then, we evaluate the variance and structural regularizers (${R_{{V{\rm min}}}}$, ${R_{\textit{LR}}}$, and ${R_S}$) in the recovery task for the SPC architecture. To this end, a study of the hyperparameter ${\mu _p}$ was performed, varying it from ${10^{- 8}}$ to ${10^0}$ on a logarithmic scale. The results of this experiment are compared with the baseline E2E (non-regularized training), a traditional CA design based on Hadamard matrices [29], and a random CA obtained with a Bernoulli distribution. The reconstruction performance of this experiment, measured in PSNR, is shown in Fig. 7(a). The results suggest that, in most cases, the proposed regularized design outperforms the baseline E2E design and the Hadamard and random settings. Later, the distribution of the first two SPC snapshots for all images in the test dataset was plotted for the best-performing setting of each regularizer: variance minimization [Fig. 7(b)], sparsity [Fig. 7(c)], and low rank [Fig. 7(d)]. Each scatter plot also shows the distribution obtained by the non-regularized E2E sensing matrix design. In all cases, the resulting distribution employing the regularizers is more concentrated than the non-regularized one, validating that for the reconstruction task, better performance is obtained by reducing the variance of the measurements. Additionally, Fig. 8 shows some visual reconstructions of two images of the test set; these particular examples show that the proposed design methodology improves upon the non-regularized E2E, the random coding, and the Hadamard-based CA. Here, the best-performing regularization is the variance minimization, mainly because, under the physical constraint that the OCE takes values $\{{-}{1},{1}\}$, a proper design can effectively reduce the data variance. Finally, the CAs resulting from the regularization functions are shown in Fig. 9, along with the CAs of the comparison methods. Notably, the CA structure is highly affected by the regularization functions: for instance, the structural regularizers converge to an almost uniform pattern, while the variance- and KL-based regularizers tend to form clusters in the CA.

Fig. 7. Recovery PSNR performance for the SPC system with the minimum variance and structural regularizers for (a) different regularization parameter ${\mu _p}$, and measurement distribution comparisons for non-regularized design with (b) sparsity, (c) variance minimization regularization, and (d) low rank.

Fig. 8. Visual reconstruction results for the SPC scenario using the variance and structural regularizations, baseline E2E designs, random coding, and coding based on Hadamard matrices. The blue values correspond to the best results and green to the second best.

Fig. 9. Optimized CA of two SPC snapshots employing the proposed regularization functions for the recovery tasks. Additionally, the E2E baseline, random, and Hadamard CAs are shown for comparison purposes.

2. Classification Experiments

Here, we evaluate the proposed regularization functions on the high-level task of classification. The CD is a MobileNet-V2 [43], which is a lightweight classification network. In this scenario, the same values of ${\mu _p}$ and ${\sigma _p}$ as in the experiment of Fig. 6 were used. The results are shown in Fig. 10, where the opposite behavior is obtained compared to the recovery case: a higher variance gives better classification performance.

Fig. 10. Classification performance for the SPC system with KL regularizers with Gaussian (left) and Laplacian (right) cases.

Then, we employ the variance regularizations (${R_{{V{\rm min}}}}$ and ${R_{{V{\rm max}}}}$) in the classification task. A study of the hyperparameter ${\mu _p}$ was performed, varying it from ${10^{- 8}}$ to ${10^0}$ on a logarithmic scale. The maximum variance value of ${R_{{V{\rm max}}}}$ was set to ${\sigma _{{{\rm max}}}} = 5$, as we observed better performance with this setting. The results of this experiment are compared with the baseline E2E (non-regularized training). The classification accuracy of this experiment is shown in Fig. 11(a). These results show that the variance maximization regularizer outperforms both the baseline and the variance minimization. Additionally, regularization ${R_{{V{\rm min}}}}$ underperforms the baseline, validating that a more concentrated distribution negatively affects the decoder performance in this task. Then, the distribution of the first two SPC snapshots was plotted for the best-performing setting of each regularizer: non-regularized design [Fig. 11(b)], variance minimization [Fig. 11(c)], and variance maximization [Fig. 11(d)]. The colors in the scatter plots represent the corresponding class of each measurement. While in the baseline and minimized-variance distributions the classes are hardly identified, in the variance maximization design the measurements of each class are clustered, which helps the decoder to classify the data better.

Fig. 11. Classification accuracy performance for the SPC system with the minimum and maximum variance regularizers for (a) different regularization parameter ${\mu _p}$, and measurement distribution comparisons for non-regularized design. Measurement distribution of (b) non-regularized design, (c) minimum variance, and (d) maximum variance.

C. CASSI Experiments

Here, we aim to design the CA of the CASSI system with the proposed regularization functions. Regularizers ${R_{{V{\rm min}}}}$, ${R_S}$, and ${R_{\textit{LR}}}$ were used for this scenario since we want to recover the spectral image from the compressed measurements. We also compare against a random CA and a blue noise CA [11] as non-data-driven designs. We first evaluate the performance with respect to the regularization parameter ${\mu _p}$ compared with the non-regularized design. This parameter was varied from ${10^{- 8}}$ to ${10^0}$ on a logarithmic scale. The results in Fig. 12(a) show that the proposed regularizers improve upon the non-regularized setting, with the low-rank regularizer providing the best performance in this case. Figure 12(b) shows the optimized CA for the non-regularized and regularized designs. Remarkably, the low-rank design converges to a uniform sampling pattern, which is a highly desired criterion in compressive imaging sensing matrix design [11,12]. Figure 12(c) shows a visual reconstruction of a test image with its corresponding PSNR and structural similarity index measure (SSIM) values. Finally, in Fig. 12(d), the reconstruction of a red spectral signature is plotted with the corresponding spectral angle mapper (SAM) value. These last results show that the best performance also corresponds to the low-rank design. The low-rank regularizer performs better in this scenario since this prior is very suitable for spectral images, as these kinds of data contain highly redundant information that can be effectively represented via a low-rank approximation.

Fig. 12. Reconstruction PSNR performance for CASSI system varying the (a) regularization parameter ${\mu _p}$, (b) optimized CA for non-regularized and regularized training, (c) visual reconstruction, and (d) spectral reconstruction. Blue values correspond to the best results and green to the second best.

6. EXPERIMENTAL VALIDATION

To perform the experimental validation of the proposed OCE design method, the SPC and CASSI systems were implemented. For the SPC, we focus only on the classification task since this system has proven to be very suitable for it [31,49]. For the CASSI, the computational task is reconstruction; therefore, both computational tasks are experimentally validated.

Fig. 13. SPC acquisition system used to validate the proposed method.

Fig. 14. Validation of the proposed method through the SPC acquisition system, for the classification task. (a) Scenes from the acquired Fashion-MNIST dataset with SPC implementation. (b) Classification confusion matrix for non-regularized design and with the regularization functions.

Fig. 15. Experimental prototype of the CASSI acquisition system.

Fig. 16. Reconstruction of real data with the CASSI system with the low-rank, minimized variance, and no-regularization designs; 30 reconstructed spectral bands and a spectral signature reconstruction of a region of interest. The blue SAM value refers to the best performance and green to the second best.

A. SPC Implementation

The SPC system was implemented employing a group of lenses that concentrate the light onto a single point, which is focused at the entrance of an optical fiber. The illumination used was a 3900E lamp from Illumination Technology, which has a spectral range of 400–2200 [nm]. For the implementation of the CAs generated by the regularizers, a DLP7000 DMD from Thorlabs was used, which has a pitch of 13.6 [µm]. In this case, the binary levels are either ${+}{1}$ or ${-}{1}$. The modulation effect of the ${-}{1}$ level can be implemented by acquiring a measurement with a CA of all ones and subtracting it from each captured snapshot. Also, two types of sensors were used. The first is a side-information sensor, a Stingray F-145 camera with a pixel pitch of 6.45 [µm]. On the other hand, to acquire the SPC measurements, a Flame VIS-NIR spectrometer was used, which has a spectral range from 350 to 1077 [nm], as shown in Fig. 13.

We employed this architecture to validate the performance of the proposed method. For this experiment, 15 scenes of the first five classes of the Fashion MNIST dataset were acquired with the implemented SPC system. The network was re-trained with the calibrated, captured CA, using only the images from the first five classes of the Fashion MNIST dataset. Some of the acquired examples, shown in Fig. 14(a), were then used as a test set to evaluate the performance of the proposed method for each of the regularizers. Figure 14(b) shows the confusion matrices for the non-regularized design, the KL-Laplacian, the KL-Gaussian, and the maximized variance regularization. The results suggest that the variance maximization regularization yields the most accurate classification. Additionally, the other regularization functions also provide an improvement with respect to the non-regularized design.

B. CASSI Implementation

The implemented CASSI system was based on the single-disperser setting proposed in [8], as shown in the scheme in Fig. 1(c). The structure of the system is as follows: first, an objective lens (Canon 28–80 mm f/3.5-5.6) focuses the input light onto a CA, which was implemented via a Thorlabs DLP7000 DMD. Later, the coded field passes through an Amici prism that disperses the light into its wavelengths along the horizontal axis. Finally, the dispersed light is integrated on a grayscale Stingray F-145 sensor. An additional side sensor is employed to capture a set of ground truth images used to fine-tune the network with the calibrated system. These reference images were obtained via wavelength scanning using a TLS tunable QTH light source monochromator as illumination. The system characterization allows sensing of 31 spectral bands ranging from 400 to 700 [nm]. The implemented CASSI optical setup is shown in Fig. 15.

In this experiment (Fig. 16), we acquired several scenes while varying the CA implemented on the DMD. These CAs were generated from the proposed model with the minimum variance regularizer, the low-rank regularizer, and without a regularizer. The sparsity regularization is not employed in this experiment because its best performance was lower than that of the low-rank design; therefore, we use only the low-rank design to compare the structural regularization against the variance-based design. From these captures, the scene was reconstructed over the 400–700 [nm] spectral range, where it is observed that, along the spectral range, the proposed designs produce fewer artifacts than the baseline E2E design. Additionally, a region of interest in the reconstructed images was analyzed, where the mean spectral signature is plotted along with the SAM metric. This result shows that a more accurate spectral reconstruction is obtained with the proposed CA design.

7. CONCLUSION AND DISCUSSION

We proposed a set of regularization functions over the output of the OE layer within an E2E framework for the joint optimization of optics and image processing. These regularizations promote statistical properties on the coded measurements, i.e., they concentrate or spread the distribution of the measurements. We found that the optimal distribution depends on the computational task: for the recovery task, a concentrated distribution allows better performance, while for the best classification performance, a wider distribution is desired. We validated the design of OCEs through regularized E2E optimization in different optical architectures, showing improvement with respect to the non-regularized design and other traditional non-data-driven approaches such as blue noise coding and Hadamard sensing. We presented extensive simulation results for both computational tasks, whose performance was also validated in real scenarios with data acquired with the physical implementation of the designed systems. While here we analyzed the three types of regularizations individually, how to combine these functions to promote more complex priors and structures on the set of measurements when designing the OCE remains an open question and will be the focus of future work.

Here, we employed optical architectures to validate the proposed regularized E2E optimization; however, this methodology can be extended to general sensing matrix design, such as the design of the acquisition geometry in a compressive seismic scenario [26]. Additionally, beyond sensing matrix design, these regularizations can also be used in high-level tasks such as generative models [50], where the variance of the generated samples is maximized to obtain high-diversity synthetic samples.

Funding

Ministerio de Ciencia, Tecnología e Innovación (110287780575); Agencia Nacional de Hidrocarburos; Fondo Nacional de Financiamiento para la Ciencia, la Tecnología y la Innovación Francisco José de Caldas.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

REFERENCES

1. G. R. Arce, D. J. Brady, L. Carin, et al., “Compressive coded aperture spectral imaging: an introduction,” IEEE Signal Process. Mag. 31(1), 105–115 (2014). [CrossRef]  

2. C. Fu, H. Arguello, B. M. Sadler, et al., “Compressive spectral polarization imaging with coded micropolarizer array,” Proc. SPIE 9484, 59–65 (2015). [CrossRef]  

3. J. Chang and G. Wetzstein, “Deep optics for monocular depth estimation and 3D object detection,” in Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 10193–10202.

4. K. M. León-López and H. A. Fuentes, “Online tensor sparsifying transform based on temporal superpixels from compressive spectral video measurements,” IEEE Trans. Image Process. 29, 5953–5963 (2020). [CrossRef]  

5. M. Hirsch, G. Wetzstein, and R. Raskar, “A compressive light field projection system,” ACM Trans. Graph. 33, 1–12 (2014). [CrossRef]  

6. E. Caroli, J. Stephen, G. Di Cocco, et al., “Coded aperture imaging in x-and gamma-ray astronomy,” Space Sci. Rev. 45, 349–403 (1987). [CrossRef]  

7. Y. Peng, Q. Fu, H. Amata, et al., “Computational imaging using lightweight diffractive-refractive optics,” Opt. Express 23, 31393–31407 (2015). [CrossRef]  

8. A. Wagadarikar, R. John, R. Willett, et al., “Single disperser design for coded aperture snapshot spectral imaging,” Appl. Opt. 47, B44–B51 (2008). [CrossRef]  

9. S. R. Gottesman and E. E. Fenimore, “New family of binary arrays for coded aperture imaging,” Appl. Opt. 28, 4344–4352 (1989). [CrossRef]  

10. E. J. Candes and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag. 25(2), 21–30 (2008). [CrossRef]  

11. C. V. Correa, H. Arguello, and G. R. Arce, “Spatiotemporal blue noise coded aperture design for multi-shot compressive spectral imaging,” J. Opt. Soc. Am. A 33, 2312–2322 (2016). [CrossRef]  

12. H. Arguello and G. R. Arce, “Colored coded aperture design by concentration of measure in compressive spectral imaging,” IEEE Trans. Image Process. 23, 1896–1908 (2014). [CrossRef]  

13. J. N. Mait, G. W. Euliss, and R. A. Athale, “Computational imaging,” Adv. Opt. Photonics 10, 409–483 (2018). [CrossRef]  

14. F. Heide, Q. Fu, Y. Peng, et al., “Encoded diffractive optics for full-spectrum computational imaging,” Sci. Rep. 6, 1–10 (2016). [CrossRef]  

15. D. S. Jeon, S.-H. Baek, S. Yi, et al., “Compact snapshot hyperspectral imaging with diffracted rotation,” ACM Trans. Graph. 38, 117 (2019). [CrossRef]  

16. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015). [CrossRef]  

17. H. Arguello, J. Bacca, H. Kariyawasam, et al., “Deep optical coding design in computational imaging: a data-driven framework,” IEEE Signal Process. Mag. 40(2), 75–88 (2023). [CrossRef]  

18. E. Vargas, J. N. Martel, G. Wetzstein, et al., “Time-multiplexed coded aperture imaging: Learned coded aperture and pixel exposures for compressive imaging systems,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (2021), pp. 2692–2702.

19. J. Bacca, T. Gelvez-Barrera, and H. Arguello, “Deep coded aperture design: an end-to-end approach for computational imaging tasks,” IEEE Trans. Comput. Imaging 7, 1148–1160 (2021). [CrossRef]  

20. R. Jacome, J. Bacca, and H. Arguello, “D2UF: deep coded aperture design and unrolling algorithm for compressive spectral image fusion,” IEEE J. Sel. Top. Signal Process. 17, 502–512 (2022). [CrossRef]  

21. R. Jacome, J. Bacca, and H. Arguello, “Deep-fusion: An end-to-end approach for compressive spectral image fusion,” in IEEE International Conference on Image Processing (ICIP) (IEEE, 2021), pp. 2903–2907.

22. V. Sitzmann, S. Diamond, Y. Peng, et al., “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph. 37, 1–13 (2018). [CrossRef]  

23. D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv, arXiv:1312.6114 (2013). [CrossRef]  

24. G. P. Meyer, “An alternative probabilistic interpretation of the Huber loss,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 5261–5269.

25. R. Jacome, A. Hernandez-Rojas, and H. Arguello, “Probabilistic regularization for end-to-end optimization in compressive imaging,” in Computational Optical Sensing and Imaging (Optica Publishing Group, 2022), paper CW1B-1.

26. R. Jacome, H. Arguello, A. Hernandez-Rojas, et al., “Divergence-based regularization for end-to-end sensing matrix optimization in compressive sampling systems,” in SIGNAL 2023 Editors (2023), p. 79.

27. Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828 (2013). [CrossRef]  

28. S. Rifai, G. Mesnil, P. Vincent, et al., “Higher order contractive auto-encoder,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases (Springer, 2011), pp. 645–660.

29. M. F. Duarte, M. A. Davenport, D. Takhar, et al., “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. 25(2), 83–91 (2008). [CrossRef]  

30. C. F. Higham, R. Murray-Smith, M. J. Padgett, et al., “Deep learning for real-time single-pixel video,” Sci. Rep. 8, 2369 (2018). [CrossRef]  

31. J. Bacca, L. Galvis, and H. Arguello, “Coupled deep learning coded aperture design for compressive image classification,” Opt. Express 28, 8528–8540 (2020). [CrossRef]  

32. J. Bacca, T. Gelvez-Barrera, and H. Arguello, “Invariant coded aperture design for compressive imaging,” in Adaptive Optics and Applications (Optica Publishing Group, 2022), paper JTh2A-9.

33. T. Nguyen, T. Le, H. Vu, et al., “Dual discriminator generative adversarial nets,” in Advances in Neural Information Processing Systems (2017), Vol. 30.

34. W.-C. Hung, V. Jampani, S. Liu, et al., “SCOPS: Self-supervised co-part segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019), pp. 869–878.

35. C. A. Metzler, H. Ikoma, Y. Peng, et al., “Deep optics for single-shot high-dynamic-range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 1375–1385.

36. A. Bardes, J. Ponce, and Y. Lecun, “VICREG: Variance-invariance-covariance regularization for self-supervised learning,” in ICLR 2022-10th International Conference on Learning Representations (2022).

37. K. Evtimova and Y. LeCun, “Sparse coding with multi-layer decoders using variance regularization,” arXiv, arXiv:2112.09214 (2021). [CrossRef]  

38. M. Fazel, E. Candes, B. Recht, et al., “Compressed sensing and robust recovery of low rank matrices,” in 42nd Asilomar Conference on Signals, Systems and Computers (IEEE, 2008), pp. 1043–1047.

39. I. W. Selesnick and M. A. Figueiredo, “Signal restoration with overcomplete wavelet transforms: comparison of analysis and synthesis priors,” Proc. SPIE 7446, 107–121 (2009). [CrossRef]  

40. C. A. O. Quero, D. Durini, J. Rangel-Magdaleno, et al., “Single-pixel imaging: an overview of different methods to be used for 3d space reconstruction in harsh environments,” Rev. Sci. Instrum. 92, 111501 (2021). [CrossRef]  

41. L. Galvis, H. Arguello, and G. R. Arce, “Coded aperture design in mismatched compressive spectral imaging,” Appl. Opt. 54, 9875–9882 (2015). [CrossRef]  

42. A. Jerez, H. Garcia, and H. Arguello, “Single pixel spectral image fusion with side information from a grayscale sensor,” in IEEE 1st Colombian Conference on Applications in Computational Intelligence (ColCACI) (2018), pp. 1–6.

43. M. Sandler, A. Howard, M. Zhu, et al., “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 4510–4520.

44. H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms,” arXiv, arXiv:1708.07747 (2017). [CrossRef]  

45. B. Arad, R. Timofte, R. Yahel, et al., “Ntire 2022 spectral recovery challenge and data set,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 863–881.

46. https://github.com/romanjacome99/MORE.git.

47. A. Ng, “Sparse autoencoder,” CS294A Lecture Notes 72, 1–19 (2011).

48. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in 18th International Conference on Medical Image Computing and Computer-Assisted Intervention–MICCAI, Part III 18, Munich, Germany, October 5–9, 2015 (Springer, 2015), pp. 234–241.

49. J. Bacca, M. Marquez, and H. Arguello, “Single pixel near-infrared imaging for spectral classification,” in Computational Optical Sensing and Imaging (Optica Publishing Group, 2022), paper CW1B-2.

50. E. Martinez, R. Jacome, A. Hernandez-Rojas, et al., “Ld-GAN: Low-dimensional generative adversarial network for spectral image generation with variance regularization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), pp. 265–275.





