## Abstract

We propose a computational paradigm where off-the-shelf optical devices can be used to image objects in a scene well beyond their native optical resolution. By design, our approach is generic, does not require active illumination, and is applicable to several types of optical devices. It only requires the placement of a spatial light modulator some distance from the optical system. In this paper, we first introduce the acquisition strategy together with the reconstruction framework. We then conduct practical experiments with a webcam that confirm that this approach can image objects with substantially enhanced spatial resolution compared to the performance of the native optical device. We finally discuss potential applications, current limitations, and future research directions.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

The resolution of an imaging system, *i.e.*, its ability to separate points that are located at small angular separations, is limited by the density of the sensors and by diffraction. Following advances in signal processing and computer science, computational approaches have been proposed to address cases where the number of sensors is the limiting factor. In particular, the compressed-sensing framework allows images to be reconstructed from substantially fewer sensors than the Shannon-Nyquist sampling limit would dictate [1,2]. A practical implementation of compressed sensing is the single-pixel camera, where a scene is imaged based on a single detector [3]. Applications of single-pixel imaging include microscopy [4], terahertz imaging [5], fluorescence lifetime imaging [6], time-resolved hyperspectral imaging [7], Raman imaging [8], and phase imaging [9]—see [10] for a recent review. In parallel to this trend, however, the spatial density of sensors has increased substantially, with native camera resolutions in standard mobile phones now exceeding ten megapixels. In most consumer and professional optical devices, computational methods that compensate for a lack of sensors are therefore not critical.

The resolution of an imaging device is not only limited by the density of the sensors, but also by the diffraction that occurs inside the optical system. This physical limit has been extensively addressed in fluorescence microscopy. With the advent of super-resolution techniques, different technologies in microscopy are now able to achieve *diffraction-unlimited imaging*, *i.e.*, to resolve details beyond the diffraction limit of the physical system [11]. These techniques rely on complex illumination schemes (*e.g.*, stimulated emission depletion [12]) or photocontrol of the sample (*e.g.*, photoactivated localization microscopy [13], stochastic optical reconstruction microscopy [14]), and they are tailored to fluorescence imaging. More generic resolution-enhancement techniques have also been developed for non-fluorescent objects (*e.g.*, digital holography [15], synthetic-aperture microscopy [16], Fourier ptychography [17]). While diffraction *per se* is unavoidable, these methods manage to extract visual information that is robust to diffraction. For instance, stimulated emission depletion exploits nonlinear fluorophore responses to minimize the area of illumination at the focal point, which makes it possible to resolve areas that are substantially smaller than the diffraction limit.

However, the above methods rely on active illumination schemes and require direct access to the sample or object to be imaged. While suitable for microscopy, none of them can be applied when the object is too far away to be accessed (*e.g.*, in remote sensing or astronomy). In this case, only post-processing approaches have been implemented, to deconvolve the instrument response [18]. While deconvolution can improve the image quality when strong prior information is available (*e.g.*, point-like objects), it leads to relatively poor results for extended objects of unknown structure [19].

In this paper, we propose a computational imaging paradigm for objects that are too far away to be illuminated or accessed, which allows them to be imaged beyond the limit of diffraction. Our paradigm involves a specific acquisition procedure that extracts visual information that is robust to diffraction. Our approach is highly flexible, in the sense that it can be applied to any conventional off-the-shelf optical device by adding a spatial light modulator (SLM) some distance from it. After the acquisition of a sequence of images for different SLM patterns, the object can be reconstructed through a simple procedure. To the best of our knowledge, this is the first time that a practical set-up based on these techniques has been shown to break the diffraction limit of an off-the-shelf optical device.

In Section 2, we specify the nature of the imaging problem under consideration. In Section 3, we introduce the joint acquisition and reconstruction paradigm, discuss its intrinsic invariance to diffraction and sampling, and describe an effective implementation. In Section 4, we demonstrate the relevance of the proposed approach through a practical optical implementation based on a webcam, comparing the results with the conventional case where no SLM is added. In Section 5, we discuss the implications of this work, propose future lines of research, and discuss potential applications. Finally, we conclude our work in Section 6.

## 2. Conventional diffraction-limited acquisition

We consider the problem of imaging an object that emits spatially incoherent light, using a conventional diffraction-limited digital optical device. Let $\Sigma _{\textrm {o}}$ denote the object plane located at distance $z_{\textrm {o}}$ from the front principal plane of the device (see Fig. 1). We assume that the object and image planes are both perpendicular to the optical axis. The light emitted by the object is recorded by a sensor array of $P$ square pixels, which results in a digital image $\boldsymbol {g} \in \mathbb {R}^{P}$. This digital image is obtained by sampling the light intensity within the image plane $g(\mathbf {x}), \, \mathbf {x} \in \Sigma _{\textrm {i}}$. Mathematically, we can model the sampling operator $\mathcal {S}$ by

$$ [\mathcal {S}\{g\}]_{p} = \int _{\Sigma _{\textrm {i}}} \phi (\mathbf {x}_{p}-\mathbf {x})\, g(\mathbf {x})\, \textrm {d}\mathbf {x}, \quad 1 \le p \le P, $$

where $\phi$ is the spatial response of a sensor pixel and $\mathbf {x}_{p}$ denotes the position of the $p$-th pixel, so that $\boldsymbol {g} = \mathcal {S}\{g\}$.

The intensity $g$, and hence the digital image $\boldsymbol {g}$, suffers from diffraction within the imaging device. On the basis of a centered diffraction-limited model, the diffraction can be modeled through a low-pass operator $\mathcal {D}$ [20, p. 130], such that

$$ g(\mathbf {x}) = \mathcal {D}\{f_{\textrm {i}}\}(\mathbf {x}) = (h \ast f_{\textrm {i}})(\mathbf {x}), \quad \mathbf {x} \in \Sigma _{\textrm {i}}, $$

where $f_{\textrm {i}}(\mathbf {x})$, $\mathbf {x} \in \Sigma _{\textrm {i}}$, is a diffraction-free image, and $h(\mathbf {x})$, $\mathbf {x} \in \Sigma _{\textrm {i}}$, is a point-spread function corresponding to the Fraunhofer diffraction pattern of the exit pupil of the imaging system. The image $f_{\textrm {i}}$ relates to the object-plane intensity profile $f_{\textrm {o}}$ through geometrical optics and the inverse-square law, which depend on $z_{\textrm {o}}$ and the type of optical system; more specifically, $f_{\textrm {i}}$ is a magnified and attenuated copy of $f_{\textrm {o}}$.

Our goal is to recover a degradation-free image from the degradation-sensitive measurements, which is particularly relevant when the sampling or the diffraction effects, or both, dramatically limit the spatial resolution of a raw image $\boldsymbol {g}^{\delta }$. Contrary to post-processing approaches that invert Eqs. (4)–(5) numerically, which is prone to artifacts due to the ill-posedness of the operators $\mathcal {S}$ and $\mathcal {D}$, we propose to alter the acquisition chain upfront, so that the degradation effects are neutralized.

## 3. Proposed diffraction-unlimited approach

#### 3.1 Concept

We propose to make the acquisition chain described in Section 2 robust to the loss of resolution due to $\mathcal {S}$ and $\mathcal {D}$ by performing multiple modulated acquisitions, and then recovering the degradation-free image through a straightforward numerical inversion after preprocessing the set of modulated images.

Our acquisition approach consists of measuring a set of dot products $\{v_k\}_{1\le k \le K}$ between the image and some SLM patterns $\{q_{k}(\mathbf {x})\}_{1\le k \le K}, \, \mathbf {x} \in \Sigma$. This is realized through the addition of an SLM to the pre-existing acquisition set-up. The key specificity of our approach is to ensure that the SLM patterns modulate the image *before* degradation occurs due to $\mathcal {S}$ and $\mathcal {D}$, which ultimately makes the dot products and the corresponding numerical reconstruction robust to these effects. This is achieved by placing the SLM sufficiently far from the optical system, at a distance $z$ from it, in a plane perpendicular to the optical axis (see Fig. 1). Our overall acquisition and reconstruction approach involves three main steps, as illustrated in Fig. 2 and described below.

**Modulated image acquisition** The use of image modulation modifies the initial conventional forward model of Eq. (4). Specifically, every SLM pattern $q_{k}$ leads to measurement of the (digital) modulated image $\boldsymbol {g}_{k}$ given by

$$ \boldsymbol {g}_{k} = \mathcal {S}\{\mathcal {D}\{q_{k}\, f_{\textrm {i}}\}\}. $$

As the $q_{k}$ are complex-valued, whereas light intensities are positive quantities, every $\boldsymbol {g}_{k}$ is obtained in practice as a linear combination of several sub-acquisitions, as detailed in Sections 3.2 and 3.3 below. Finally, according to Eq. (5)—which applies to every acquisition—we only have access to noisy versions $\boldsymbol {g}_{k}^{\delta }$ of the modulated images $\boldsymbol {g}_{k}$.

**Preprocessing** Once the modulated images are acquired, we numerically integrate them over their field-of-view, to produce the scalar quantities

$$ v_{k}^{\delta } = \boldsymbol {u}^{\top } \boldsymbol {g}_{k}^{\delta }, \quad \textrm {with} \quad \boldsymbol {u} = [1, \ldots , 1]^{\top } \in \mathbb {R}^{P}. $$

The key property of these quantities is that, on average, they are *not* affected by sampling or diffraction. Indeed, as shown in Appendices A.1 and A.2, each $v_k^{\delta }$ is proportional on average to the dot product of the SLM pattern and the diffraction-free image, *i.e.*,

$$ \operatorname *{\mathbb {E}}\left [{v_k^{\delta }}\right ] \propto \int _{\Sigma } q_{k}(\mathbf {x})\, f(\mathbf {x})\, \textrm {d}\mathbf {x}. $$

**Reconstruction** As detailed in Appendix A.3, each measurement $v_k^{\delta }$ satisfies the discrete-scalar-product relation $\operatorname *{\mathbb {E}}\left [{v_k^{\delta }}\right ] = \boldsymbol {q}_k^{\top }\boldsymbol {f}$, where $\boldsymbol {q}_k \in \mathbb {R}^{N}$ and $\boldsymbol {f} \in \mathbb {R}^{N}$ are discrete versions of the SLM pattern $q_k$ and of the degradation-free image $f$, respectively. Defining $\boldsymbol {v}^{\delta } = [v_1^{\delta } \ldots v_K^{\delta }]^{\top }\in \mathbb {R}^{K}$, we have the linear model

$$ \operatorname *{\mathbb {E}}\left [{\boldsymbol {v}^{\delta }}\right ] = \boldsymbol {Q} \boldsymbol {f}, \quad \textrm {with} \quad \boldsymbol {Q} = [\boldsymbol {q}_{1} \ldots \boldsymbol {q}_{K}]^{\top }, $$

from which the image $\boldsymbol {f}$ is reconstructed through numerical inversion.
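The robustness claim above can be checked numerically. Below is a toy simulation in which an arbitrary energy-preserving blur kernel stands in for $\mathcal {D}$ and $2 \times 2$ pixel binning stands in for $\mathcal {S}$; the image, pattern, and sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
f = rng.random((N, N))                  # degradation-free image (toy)
q = rng.random((N, N))                  # one real-valued SLM pattern (toy)

m = q * f                               # modulation occurs BEFORE degradation

# Diffraction: circular convolution with an energy-preserving PSF.
psf = np.ones((3, 3)) / 9.0
pad = np.pad(m, 1, mode="wrap")
blurred = sum(psf[i, j] * pad[i:i + N, j:j + N]
              for i in range(3) for j in range(3))

# Sampling: each sensor pixel integrates a 2x2 neighborhood of the image.
sampled = blurred.reshape(N // 2, 2, N // 2, 2).sum(axis=(1, 3))

# Numerical integration over the field of view: the dot product <q, f>
# survives both degradations.
v = sampled.sum()
print(np.isclose(v, np.sum(q * f)))     # True
```

Because both the normalized blur and the pixel binning preserve the total light energy, the field-of-view sum is unchanged; this is the same property that is exploited in Appendix A.2.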

#### 3.2 SLM patterns

Our choice for the SLM is dictated by two requirements. First, it is crucial to maximize light throughput, so as to limit the acquisition time needed to acquire low-intensity objects with acceptable signal-to-noise ratios (SNRs). Secondly, the SLM patterns need to capture the information of $\boldsymbol {f}$ into relatively few measurements, so as to decrease the number of measurements, and hence the time needed for an acquisition. For instance, choosing $\boldsymbol {Q}$ as the identity matrix corresponds to an extreme case where light is only transmitted through a single SLM pixel at a time. This complies poorly with these requirements.

In this study, we choose $\boldsymbol {Q}$ as the discrete Fourier basis, which transmits $\sim \!64\%$ of the incident light flux and is known to sparsify natural images [22,23]. For an SLM array of $N = N_1 \times N_2$ square pixels of size $\Delta \times \Delta$, this defines the discrete SLM patterns $\boldsymbol {q}_k = [q_k^{1},\ldots ,q_k^{N}]^{\top }$ as

$$ q_{k}^{n} = \textrm {e}^{-2 \mbox {j} \pi \, \boldsymbol {\xi }_{k}^{\top } \mathbf {n}}, \quad 1 \le n \le N, $$

where $\mbox {j}$ is the imaginary unit, $\mathbf {n}$ is the two-dimensional pixel coordinate associated with the $n$-th pixel of the SLM, and $\boldsymbol {\xi }_k$ is the two-dimensional spatial frequency of the $k$-th pattern, with $\boldsymbol {\xi }_k \in \{0, 1/N_1, \ldots , (N_1 - 1)/N_1 \}\times \{0, 1/N_2, \ldots , (N_2 - 1)/N_2 \}$. This choice implies that the measurement vector $\boldsymbol {v}$ is the discrete Fourier transform (DFT) of $\boldsymbol {f}$. Therefore, the reconstruction step of Eq. (10) simplifies to performing an inverse DFT, which has the additional advantage of a fast implementation with complexity $\mathcal {O}(N \log N)$. The implementation of the spatial patterns $\boldsymbol {q}_k$ into the SLM is discussed in Section 3.3, below.
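As an illustration, the following sketch (on a small grid, with the forward-DFT sign convention assumed) verifies that noiseless measurements with these patterns coincide with the 2-D DFT of $\boldsymbol {f}$, so that reconstruction reduces to an inverse FFT:

```python
import numpy as np

N1 = N2 = 8                                     # toy SLM grid (64x64 in Section 4)
f = np.random.default_rng(1).random((N1, N2))   # hypothetical degradation-free image

# Fourier patterns q_k(n) = exp(-2j*pi*xi_k.n), one per spatial frequency
# xi_k in {0, 1/N1, ..., (N1-1)/N1} x {0, 1/N2, ..., (N2-1)/N2}.
n1, n2 = np.meshgrid(np.arange(N1), np.arange(N2), indexing="ij")

def pattern(k1, k2):
    return np.exp(-2j * np.pi * (k1 * n1 / N1 + k2 * n2 / N2))

# Noiseless measurements v_k = <q_k, f>: exactly the 2-D DFT of f.
v = np.array([[np.sum(pattern(k1, k2) * f) for k2 in range(N2)]
              for k1 in range(N1)])
assert np.allclose(v, np.fft.fft2(f))

# Reconstruction is a single inverse DFT, O(N log N) with the FFT.
f_rec = np.fft.ifft2(v).real
print(np.allclose(f_rec, f))                    # True
```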

#### 3.3 Differential acquisition strategy

As our physical set-up can only implement positive-valued SLM patterns, every complex-valued SLM pattern must be split into positive real-valued patterns to be programmed into the SLM. The acquisitions that use the positive patterns are then recombined to obtain a complex-valued image $\boldsymbol {g}_{k}^{\delta }$. Differential strategies, which find their roots in structured-light microscopy [24], are common in ghost [25] and single-pixel imaging [26]. One advantage of differential acquisition is its intrinsic robustness to the additive noise caused by environmental illumination. Here, we use four positive patterns, following a splitting approach that is similar to [26]. Specifically, we define $\boldsymbol {q}_{k,i}$, $1 \le i \le 4$, by

$$ \boldsymbol {q}_{k,1} = \max (\Re \{\boldsymbol {q}_{k}\}, 0), \quad \boldsymbol {q}_{k,2} = \max (-\Re \{\boldsymbol {q}_{k}\}, 0), \quad \boldsymbol {q}_{k,3} = \max (\Im \{\boldsymbol {q}_{k}\}, 0), \quad \boldsymbol {q}_{k,4} = \max (-\Im \{\boldsymbol {q}_{k}\}, 0), $$

so that $\boldsymbol {q}_{k} = \boldsymbol {q}_{k,1} - \boldsymbol {q}_{k,2} + \mbox {j}\, (\boldsymbol {q}_{k,3} - \boldsymbol {q}_{k,4})$.

Finally, we repeat the measurements $L$ times and average them to decrease the noise. The modulated image $\boldsymbol {g}_k^{\delta }$ is computed as

$$ \boldsymbol {g}_{k}^{\delta } = \frac {1}{L} \sum _{\ell =1}^{L} \left ( \boldsymbol {g}_{k,1}^{\delta ,\ell } - \boldsymbol {g}_{k,2}^{\delta ,\ell } + \mbox {j}\, \left ( \boldsymbol {g}_{k,3}^{\delta ,\ell } - \boldsymbol {g}_{k,4}^{\delta ,\ell } \right ) \right ), $$

where $\boldsymbol {g}_{k,i}^{\delta ,\ell }$ denotes the $\ell$-th acquisition obtained with the positive pattern $\boldsymbol {q}_{k,i}$.
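The differential strategy can be sketched as follows; the positive/negative splitting of the real and imaginary parts and the noise model are illustrative assumptions in the spirit of [26]:

```python
import numpy as np

rng = np.random.default_rng(2)
q = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # one complex pattern

# Four nonnegative patterns that the SLM can display (assumed standard
# real/imaginary splitting): positive and negative parts of Re(q), Im(q).
parts = [np.maximum(q.real, 0), np.maximum(-q.real, 0),
         np.maximum(q.imag, 0), np.maximum(-q.imag, 0)]

# Simulate L repeated, noisy sub-acquisitions; a constant ambient-light
# offset c affects every sub-acquisition identically.
L, c = 200, 0.37
def acquire(p):
    return p + c + 0.05 * rng.standard_normal(p.shape)

g = np.mean([acquire(parts[0]) - acquire(parts[1])
             + 1j * (acquire(parts[2]) - acquire(parts[3]))
             for _ in range(L)], axis=0)

# The differences cancel the ambient offset, and averaging over L
# repetitions suppresses the noise, so g approaches the complex pattern q.
print(np.allclose(g, q, atol=0.05))   # True
```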

#### 3.4 Time budget and subsampling

As $\boldsymbol {f}$ is real-valued, only half of its DFT coefficients are nonredundant. Recalling that the SLM is an array of $N = N_1 \times N_2$ pixels, and assuming that $N_1$ and $N_2$ are even, the number of DFT coefficients for full acquisition is $K = N/2 + 2$. Implementing the differential approach of Eq. (15), the time budget for a full acquisition is thus

$$ t_{\textrm {full}} = 4 L \left ( \frac {N}{2} + 2 \right ) \Delta _{\textrm {t}}, $$

where $\Delta _{\textrm {t}}$ is the duration of a single sub-acquisition.

The time budget of full acquisition can be substantial when the acquisition time per measurement is large (*e.g.*, for low-intensity objects that require ${{\Delta _\textrm {t}}}$ or $L$ to be large), or when the image resolution $N$ is large. One approach to decrease the time budget is to use a smaller number $L' < L$ of acquisitions per SLM pattern, which yields

$$ t_{L'} = 4 L' \left ( \frac {N}{2} + 2 \right ) \Delta _{\textrm {t}}. $$

Another approach is to acquire a subset of $K < N/2 + 2$ significant DFT coefficients. In this case, we consider the low-frequency diamond scheme, which yielded the best image-reconstruction results in [27]. The corresponding time budget is

$$ t_{K} = 4 L K \Delta _{\textrm {t}}. $$

We also propose an adaptive subsampling approach that can preserve higher-frequency coefficients, exploiting repeated measurements. This approach is described in detail in Appendix A.4. For each of the aforementioned subsampling methods, the sampling ratio $\gamma$ is defined as the ratio between the reduced time budget and the time budget of full acquisition, *i.e.*,

$$ \gamma = \frac {t}{t_{\textrm {full}}}. $$
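These budgets are straightforward to tabulate. The following snippet reproduces the full-acquisition budget of the webcam experiment of Section 4 ($N = 64 \times 64$, $L = 50$, $\Delta _{\textrm {t}} = 42$ ms) and the effect of keeping a quarter of the coefficients:

```python
# Time budgets of Section 3.4, with the experimental values of Section 4:
# N = 64*64 SLM pixels, L = 50 repetitions, 42 ms per sub-acquisition,
# and 4 positive sub-patterns per SLM pattern.
N, L, dt_ms = 64 * 64, 50, 42

def budget_s(K, L):
    """Acquisition time, in seconds, for K patterns repeated L times."""
    return 4 * L * K * dt_ms / 1000.0

K_full = N // 2 + 2                      # nonredundant DFT coefficients
print(budget_s(K_full, L))               # 17220.0 s, i.e. about 4.8 h

# Sampling ratio when keeping a quarter of the coefficients (~1/4):
print(budget_s(K_full // 4, L) / budget_s(K_full, L))
```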

#### 3.5 Angular resolution

As this approach is robust to diffraction and sampling that occur after modulation, the maximum angular resolution $R$ (in radians; the smaller the better) only depends on the SLM. Under the approximation of small angles, and neglecting noise, we have

$$ R = \frac {\Delta }{z}, $$

where $\Delta$ is the SLM pixel size and $z$ is the distance between the conventional optical device and the SLM. The combination of the original device and the SLM can be seen as a single new device whose maximum angular resolution is only parameterized by $\Delta$ and $z$. This is in contrast to conventional optical devices, where the maximum angular resolution is fixed and is limited by diffraction.

One way to improve the angular resolution is to decrease the SLM pixel size. The lower limit for $\Delta$ depends on the available technology (*e.g.*, digital micromirrors, translucent or reflective liquid crystal displays). Ultimately, $\Delta$ must be larger than the wavelength used for acquisition, to avoid diffraction effects during modulation [28]. For a fixed pixel size $\Delta$, a target angular resolution $R$ can be achieved by setting the SLM distance $z$ accordingly, *i.e.*, choosing $z \geq \Delta / R$. For instance, for a pixel size of $50 \mu$m, setting the SLM at a distance of $10$ m yields an angular resolution of $5 \cdot 10^{-6}$ radians, which is equivalent to approximately one arcsecond. This angular resolution is comparable to that of a regular 4-inch telescope [29]. In Table 1, we report more example values of $R$ as a function of $z$ and $\Delta$.
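As a quick check of the numbers above (and of the kind of values reported in Table 1):

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi   # ~206265 arcseconds per radian

def angular_resolution_rad(delta, z):
    """Small-angle resolution R = delta / z of the SLM-augmented device
    (delta: SLM pixel size in meters, z: SLM distance in meters)."""
    return delta / z

# A 50-micron SLM pixel placed 10 m away, as in the example of Section 3.5:
R = angular_resolution_rad(50e-6, 10.0)
print(round(R * 1e6, 3))                 # 5.0 microradians
print(round(R * ARCSEC_PER_RAD, 2))      # 1.03, i.e. about one arcsecond
```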

## 4. Results

#### 4.1 Experimental setup

We evaluate this approach considering a webcam (USB HD C270; Logitech) as the conventional optical device (Fig. 3(a)) and the front screen of a commercial showcase (ClearVue Lite CV101LV1) as the SLM (Fig. 3(b)). The object is placed and illuminated inside the same showcase some distance behind the screen (Fig. 3(c)). Overall, the scene setting follows the configuration of Fig. 1, where all of the distances and the effective SLM area ensure that the object is fully modulated and captured by our webcam according to the requirements of Section 3.

The webcam has a resolution of $1280 \times 960$ pixels, an angle of view of $60$ degrees, and a focal length of $4$ mm. It produces color images in compressed JPEG format, which we convert to grayscale. According to the trigonometric relations between these quantities [31], the pixel size of the camera is $2.9 \mu$m, with a corresponding angular resolution of $135$ arcseconds. Importantly, this pixel size is comparable to the theoretical optical diffraction limit associated with the webcam parameters, and can thus be used as a meaningful reference to assess the resolving power of our approach. For instance, the Airy-disk diameter $D_{\textrm {A}} = 2.44 \lambda N$ obtained at the average optical wavelength $\lambda = 550$ nm [32] and a low focal ratio $N = 2$ is $D_{\textrm {A}} = 2.7 \mu$m.

The SLM has a liquid crystal display of $1024 \times 600$ pixels with $\Delta _{\textrm {}} = 210 \mu$m and a contrast ratio of 500:1. For our experiments, we only use an effective area of $N = 64 \times 64$ pixels located at the center of the liquid crystal display, with the rest of the pixels set to block out all of the light. For conventional acquisitions, we set the effective SLM region of $64 \times 64$ pixels to transmit all of the light, while leaving the rest of the SLM pixels set to block out all of the light. This ensures that the same object field-of-view is acquired with and without modulation.

The object is a black capital letter "T" (width $9.5$ mm, height $8$ mm) on a white background, as shown in Fig. 4(a). The object is placed inside the showcase, which leads to $z = 3.80$ m and $z_{\textrm {o}} = 3.87$ m. As viewed from the webcam, the effective size of $f_{\textrm {o}}$ captured by the SLM is thus $730 \times 730$ arcseconds, or, equivalently, $5.4 \times 5.4$ sensor pixels.

In our experiments, we acquire each pattern $L = 50$ times, with ${{\Delta _\textrm {t}}}=42$ ms per acquisition. The object illumination, SLM transmittivity gain, and webcam gain are set to maximize the brightness while avoiding saturation of the webcam. We perform the acquisition in a dark room, to minimize the influence of variations in ambient light during acquisition. To ensure the correct synchronization between the acquisitions performed by the webcam and the generation of the SLM patterns, a latency of $0.5$ s is added between successive measurements. As this latency is only relevant to our particular software implementation, it is not included in our time budget.

Both the optical and the SLM devices are connected to a laptop computer (MacBook Pro; 2.4 GHz Intel Core i7; 6 GB memory). All of the acquisition and reconstruction methods are implemented in Matlab.

#### 4.2 Proposed paradigm versus conventional acquisition

In this first experiment, we assess the proposed paradigm and compare it with conventional acquisition, where the object is acquired from the same location but without SLM modulation. Based on the practical set-up and its parameters described in Section 4.1, we acquire the object coefficients and reconstruct $\boldsymbol {f}$ via Eq. (10). The results of this experiment are shown in Fig. 4.

In the conventional acquisition setting (Fig. 4(b)), the object appears in a very small central region of the acquired image, where the horizontal line at the top is due to light leaking from the showcase. When magnifying this central region (Fig. 4(c)), whose size is $5.4 \times 5.4$ webcam-sensor pixels as derived in Section 4.1, no clear object features can be identified. The image is blurred due to diffraction and instrument response. This confirms that the native resolution of the webcam is insufficient to image the object.

In the proposed-paradigm setting (Fig. 4(d)), the reconstructed object can be resolved and appears to be consistent with the original profile (Fig. 4(a)). The reconstruction also contains details that are significantly smaller than the pixel resolution and diffraction limit in the classical setting (Fig. 4(c)). Finally, the resolution of our approach is quantified using the edge response [33]. Accordingly, the analysis of the reconstruction (Fig. 4(e)) yields a resolution of 2.6 SLM pixels, which corresponds to an angular resolution of 30 arcseconds. As the native angular resolution of the webcam is 135 arcseconds (see Section 4.1), this is a 4.5-fold improvement. This result demonstrates that our joint acquisition and reconstruction paradigm can image objects at a resolution that significantly exceeds the native limits of the conventional device it is built from.

As mentioned in the previous sections, one major caveat of our approach is its acquisition time. In that regard, Eq. (16) implies that the time budget to acquire $f$ in our set-up is $t_\textrm {full} = 17\,220$ s ($4.8$ h). In the next experiment, we thus investigate how this time budget can be mitigated while maintaining acceptable reconstruction quality.

#### 4.3 Subsampling

In this second experiment, we investigate whether subsampling can maintain the reconstruction quality under acquisition-time budgets that are lower than that of Section 4.2. To do so, we compare the performance of the subsampling strategies proposed in Section 3.4. For convenience, our images are reconstructed retrospectively, based on the full set of DFT coefficients, as in [34]. Each subsampling method is evaluated in terms of the SNR, using the fully sampled result of Section 4.2 as the reference.
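This retrospective evaluation can be sketched as follows; since the exact SNR formula is not restated in the text, a standard decibel definition is assumed, and a square low-frequency mask (keeping a quarter of the coefficients) stands in for the actual subsampling schemes:

```python
import numpy as np

def snr_db(ref, x):
    """SNR of x against the reference ref (a standard dB definition, assumed)."""
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum((ref - x) ** 2))

rng = np.random.default_rng(4)
ref = rng.random((32, 32))               # stand-in for the fully sampled result
v_full = np.fft.fft2(ref)                # full set of DFT coefficients

# Retrospective subsampling: zero out the unacquired coefficients of the
# full measurement and reconstruct with an inverse DFT. Low frequencies
# wrap around the corners of the DFT array.
mask = np.zeros((32, 32), dtype=bool)
for s in (slice(0, 8), slice(-8, None)):
    for t in (slice(0, 8), slice(-8, None)):
        mask[s, t] = True                # keeps 256 of 1024 coefficients

x = np.fft.ifft2(np.where(mask, v_full, 0)).real
print(snr_db(ref, x) > 0.0)              # True: low frequencies dominate
```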

First, we consider the case where $\gamma = 1/4$, *i.e.*, a four-fold reduction in the acquisition time. The reconstructed images are shown in Fig. 5. Repetition subsampling (Fig. 5(a)) yields the worst result, due to noise. The noise issue is further exacerbated in the extreme case where the image is reconstructed using no repetition (Fig. 5(e)), which illustrates the need for repeated measurements. Compared to both non-adaptive approaches (Figs. 5(a) and 5(b)), the proposed adaptive subsampling scheme (Fig. 5(c)) yields the best result. The latter is the closest to the ideal oracle subsampling (Fig. 5(d)), where the highest-energy coefficients are determined and selected *a posteriori* from the full acquisition ($L = 50$; $K = N/2 + 2$). Reconstruction from adaptive sampling is also less blurry than from nonadaptive sampling, which confirms the potential of adaptive approaches to better preserve high-frequency information. The sampling pattern of the adaptive scheme (Fig. 5(g)) is also close to that of the oracle (Fig. 5(h)), as opposed to the nonadaptive case (Fig. 5(f)).

Figure 6 illustrates the SNR of the image reconstructed with different subsampling strategies with increasing sampling ratios. Adaptive sampling consistently outperforms its nonadaptive counterpart over a large range of sampling ratios. Its drop in performance at low sampling ratios is due to the constant time-budget overhead that is used to determine the coefficients relevant for repetition, as detailed in Appendix A.4. Overall, these results highlight the critical impact of the subsampling strategy and the potential of adaptive methods to preserve reconstruction quality with smaller time budgets.

## 5. Discussion

Our experiments confirm that the proposed paradigm can be used to resolve objects beyond the native capabilities of an optical device. It is worth emphasizing that in these experiments, the classical resolution benchmark that is outperformed by this method is the pixel size of the webcam, which is comparable to an ideal diffraction limit, as mentioned in Section 4. The actual resolving power of the webcam is even lower than this benchmark due to device nonidealities, such as optical aberrations, which also account for the blur observed in Fig. 4(c).

Our paradigm potentially extends to a relatively broad class of optical devices and applications. In this regard, it is important to note that the only property of the point-spread function $h$ that is exploited to derive Eq. (8) is its preservation of the total light energy, as shown in Appendix A.2. For this reason, Eq. (8) also holds in the presence of optical aberrations, which can be seen as phase components of a *generalized pupil function* [20, p. 145]. Our acquisition approach would also remain valid if $h$ were not isoplanatic, or if the light emitted from the object were spatially coherent, in which case diffraction effects would become nonlinear in intensity but remain energy preserving.

While the strength of the current study is its broad applicability to a large set of optical devices, dedicated optics can also be envisaged, drawing, for instance, from previous studies on single-pixel cameras. For current single-pixel cameras, one major difference is that the SLM is primarily meant to compensate for the lack of multiple sensors, and it is integrated into the device itself instead of being placed externally. The achievable resolution is thus currently determined by the diffraction limit of the device [27], as opposed to our acquisition approach, where modulation by the SLM occurs before diffraction. However, our approach described in Section 3 is computationally similar to what is used in single-pixel cameras. In particular, both involve the acquisition of scalar products with the object using an SLM. In further work, the single-pixel architecture could thus be adapted to implement our approach, and its single sensor could physically replace the numerical-integration process of Eq. (7).

In terms of applications, our paradigm can be used either to better resolve objects, as in our experiments here, or to better track them. The proposed approach can also be exploited to image objects at distances that are out of reach, for either practical or technical reasons, *e.g.*, at infinity. In that regime, the ability of our method to increase angular resolution, or, equivalently, to increase spatial resolution independent of the object distance, becomes key, as there is no way to access, illuminate, or increase the size of the object in the image field of view. This is typically the case in astronomical imaging, where an additional SLM placed at sufficient distance inside or outside the Earth's atmosphere might further enhance the resolution limits of an existing ground-based or space-based telescope. Such a configuration could borrow from the *external occulter* concept [35], except that the SLM would not only mask the light from unwanted sources, but also modulate the light from the object of interest, so that a more detailed image can be reconstructed based on our paradigm. For astronomical applications, the robustness of our paradigm to diffraction and the use of differential measurements might also prove useful to mitigate the effects of atmospheric seeing and turbulence, depending on the relative location of the SLM. This remains the topic of further investigations.

An important limiting factor of our approach is the acquisition time required to produce a suitable reconstruction, which is caused by the very low amounts of light that can be collected by the sensor array compared to the noise level. In that regard, our no-repetition result (Fig. 5(e)) illustrates how noise can negatively affect reconstruction quality when the acquisition time is insufficient. The effect of noise is also important in the computational imaging paradigms used in microscopy [36]. In the noiseless regime, Eq. (8) implies that the imaging quality of our approach is only limited by the angular resolution determined by the SLM parameters $\Delta _{\textrm {}}$ and $z_{\textrm {}}$, as discussed in Section 3.5. How to extend our paradigm to higher resolutions, for instance for point-like objects [37], constitutes an open question.

In this work, we have shown how particular acquisition strategies can mitigate the issue of the acquisition time to some extent. In further work, the proposed acquisition and reconstruction methods can be adapted to decrease the time budget, to maximize the reconstruction quality, and to operate in more complex settings. For instance, satisfactory reconstructions might be obtained from fewer coefficients based on the compressed-sensing framework, assuming wavelet, total-variation, or nonlocal image priors [1,2,38]. Recent advances in deep learning for inverse problems might also be of benefit to our paradigm. Indeed, image reconstruction based on neural networks is intrinsically faster than compressed-sensing-based iterative algorithms, and provides improved reconstruction quality by learning image features during an offline training phase [39].

Adaptive-acquisition methods that are more advanced than the one proposed in Section 3.4 can also be developed to limit the acquisition time. Such methods might avoid repetitions of the same SLM pattern by producing specifically optimized patterns for each new measurement, based on information on signal and noise properties that would be gathered from all of the previous measurements. Furthermore, acquisition settings where the object is moving or changing might be handled by adapting our acquisition model and our algorithms.

## 6. Conclusion

Here, we have proposed and demonstrated experimentally a novel imaging paradigm in which an optical device is used in conjunction with an SLM to acquire and resolve remote objects at a resolution that exceeds the diffraction limit of the optical device. Our acquisition strategy can be seen as the transformation of an optical device that produces degraded measurements—due to diffraction and instrument response—into a new device that produces compressed measurements. The image of the scene is then reconstructed by solving a simple inverse problem. Our experiments represent the first proof-of-concept that the loss of spatial and angular resolution that is intrinsic to diffraction can be circumvented through the use of specific acquisition and reconstruction strategies.

## A. Appendix

## A.1. Unbiasedness of the measurements

By linearity of the expectation, we obtain, from the differential expression of $\boldsymbol {g}_k^{\delta }$ given by Eq. (15),

$$ \operatorname *{\mathbb {E}}\left [{\boldsymbol {g}_{k}^{\delta }}\right ] = \boldsymbol {g}_{k}. $$

## A.2. Robustness to diffraction and sampling

We now demonstrate that the measurements $v_k^{\delta }$ are robust to diffraction and sampling. In more detail, they provide inner products between the degradation-free image $f$ and the modulating patterns $q_k$, up to a multiplicative constant. By linearity of the expectation, we have ${\mathbb {E}}\left [{v_k^{\delta }}\right ] = {\mathbb {E}}\left [{\boldsymbol {u}^{\top } \boldsymbol {g}_k^{\delta }}\right ] = \boldsymbol {u}^{\top }\, {\mathbb {E}}\left [{ \boldsymbol {g}_k^{\delta }}\right ]$. Therefore, as the measurements are unbiased, as shown in Eq. (24), we have ${\mathbb {E}}\left [{v_k^{\delta }}\right ] = \boldsymbol {u}^{\top } \boldsymbol {g}_k$. By definition of $\boldsymbol {g}_k$, we have

Moreover, we assume that the pixel responses $\phi$ sum to one over the image plane, *i.e.*, $\sum_p \phi(\mathbf{x}_{p}-\mathbf{x}) = 1$, and the convolution simplifies to

## A.3. Discretization

Each modulation pattern $q_k(\mathbf{x})$ is implemented using a vector $\boldsymbol{q}_k = [q_k^{1},\ldots,q_k^{N}]^{\top}$ that indicates the value of each of the SLM pixels. Mathematically, we have

$$q_k(\mathbf{x}) = \sum_{n=1}^{N} q_k^{n}\, b(\mathbf{x}-\mathbf{x}_n),$$

where $b(\mathbf{x})$ is a square box function of size $\Delta$ that represents the shape of the SLM pixels and $\mathbf{x}_n$ denotes the center of the $n$-th SLM pixel. Therefore, Eq. (28) expands accordingly.
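Because $b(\mathbf{x})$ is a square box the size of one SLM pixel, sampling $q_k(\mathbf{x})$ on a grid finer than the SLM grid amounts to replicating each pixel value. A sketch (toy pattern size and a hypothetical 4× oversampling factor of our choosing):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 8                                  # SLM is n x n pixels (toy value)
up = 4                                 # fine-grid samples per SLM pixel
q_k = rng.integers(0, 2, size=(n, n))  # binary SLM pattern values q_k^n

# Summing shifted box functions b(x - x_n) weighted by q_k^n is
# equivalent to nearest-neighbor (pixel-replication) upsampling,
# which np.kron with an all-ones block expresses directly.
q_fine = np.kron(q_k, np.ones((up, up)))

print(q_fine.shape)                    # (32, 32)
```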

## A.4. Adaptive subsampling

Subsampling methods such as the low-frequency diamond scheme proposed in [27] select coefficients *a priori*, regardless of image properties. Therefore, such methods can miss relevant high frequencies. Inspired by [34], we propose an adaptive subsampling scheme that mitigates this problem. Specifically, we propose to first make a quick but exhaustive acquisition of all of the coefficients, and then to repeat the acquisition of the highest coefficients only, to increase their SNR.

Retaining $K$ significant coefficients and acquiring each of them at most $L$ times, we obtain the adaptive measurement vector as follows:

1. Acquire all patterns, *i.e.*, $\{\boldsymbol{q}_k\}$ for $1 \le k \le N$. The resulting measurement vector is denoted by $\boldsymbol{v}^{\delta,1}$.
2. Low-pass filter the image of the first-pass measurements $\boldsymbol{v}^{\delta,1}$ with a Gaussian filter of unit variance; this yields the filtered coefficients $\hat{\boldsymbol{v}}^{\delta,1}$.
3. Find the locations of the $K$ highest (absolute) values of the filtered measurement vector $\hat{\boldsymbol{v}}^{\delta,1}$. The set of indices indicating relevant coefficients is denoted by $\Omega$.
4. Acquire the relevant patterns, *i.e.*, $\{\boldsymbol{q}_k\}_{k \in \Omega}$, $L-1$ more times. The resulting measurement vectors are denoted by $\boldsymbol{v}^{\delta,\ell}$, $2\le \ell \le L$, where the coefficients that are not acquired are set to zero. The first-pass acquisition is cleaned by setting $\boldsymbol{v}^{\delta,1}_{k}$ to zero for all $k\notin \Omega$.
5. Average the measurement vectors to obtain the adaptive measurement vector $\boldsymbol{v}^{\delta}$. Mathematically, $\boldsymbol{v}^{\delta} = \frac{1}{L}\sum_{\ell=1}^{L} \boldsymbol{v}^{\delta,\ell}$.
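The five steps above can be sketched as follows (a toy simulation: the `acquire` function, noise level, and dimensions are all made up, and a small numpy separable Gaussian stands in for the low-pass filter):

```python
import numpy as np

rng = np.random.default_rng(3)

N_side = 16                        # measurements arranged as a 16 x 16 image
N = N_side * N_side                # total number of patterns (toy value)
K, L = 64, 4                       # coefficients kept / total acquisitions

v_true = rng.standard_normal(N)    # hypothetical noise-free coefficients

def acquire(indices):
    """Stand-in for a physical acquisition: noisy coefficients at `indices`."""
    v = np.zeros(N)
    v[indices] = v_true[indices] + 0.1 * rng.standard_normal(len(indices))
    return v

def gauss_filter(img, sigma=1.0):
    """Separable Gaussian low-pass filter, unit variance by default."""
    r = np.arange(-3, 4)
    k = np.exp(-r**2 / (2 * sigma**2))
    k /= k.sum()
    img = np.apply_along_axis(np.convolve, 0, img, k, mode="same")
    img = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return img

# 1. Acquire all N patterns once; this is the first pass.
v1 = acquire(np.arange(N))

# 2. Low-pass filter the image of the first-pass measurements.
v1_hat = gauss_filter(v1.reshape(N_side, N_side)).ravel()

# 3. Indices Omega of the K highest absolute filtered values.
omega = np.argsort(np.abs(v1_hat))[-K:]

# 4. Re-acquire the relevant patterns L-1 more times, and clean the
#    first pass by zeroing coefficients outside Omega.
mask = np.zeros(N, dtype=bool)
mask[omega] = True
passes = [np.where(mask, v1, 0.0)]
passes += [acquire(omega) for _ in range(L - 1)]

# 5. Average the passes to obtain the adaptive measurement vector.
v_delta = sum(passes) / L

print(np.count_nonzero(v_delta) <= K)   # only coefficients in Omega survive
```

Averaging the $L$ passes reduces the noise standard deviation on the retained coefficients by roughly $\sqrt{L}$, which is the point of re-acquiring only the significant ones.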

## Funding

Agence Nationale de la Recherche (ANR-17-CE19-0003).

## Acknowledgments

The authors thank Dr. Diego Marcos for his valuable suggestions for improvements to the manuscript.

## Disclosures

The authors declare no conflicts of interest.

## References

**1. **E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” arXiv preprint math/0409186 (2004).

**2. **E. J. Candès, “Compressive sampling,” in Proceedings of the international congress of mathematicians, vol. 3 (Madrid, Spain, 2006), pp. 1433–1452.

**3. **M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. **25**(2), 83–91 (2008). [CrossRef]

**4. **N. Radwell, K. J. Mitchell, G. M. Gibson, M. P. Edgar, R. Bowman, and M. J. Padgett, “Single-pixel infrared and visible microscope,” Optica **1**(5), 285–289 (2014). [CrossRef]

**5. **C. M. Watts, D. Shrekenhamer, J. Montoya, G. Lipworth, J. Hunt, T. Sleasman, S. Krishna, D. R. Smith, and W. J. Padilla, “Terahertz compressive imaging with metamaterial spatial light modulators,” Nat. Photonics **8**(8), 605–609 (2014). [CrossRef]

**6. **Q. Pian, R. Yao, N. Sinsuebphon, and X. Intes, “Compressive hyperspectral time-resolved wide-field fluorescence lifetime imaging,” Nat. Photonics **11**(7), 411–414 (2017). [CrossRef]

**7. **F. Rousset, N. Ducros, F. Peyrin, G. Valentini, C. D’Andrea, and A. Farina, “Time-resolved multispectral imaging based on an adaptive single-pixel camera,” Opt. Express **26**(8), 10550–10558 (2018). [CrossRef]

**8. **D. J. Starling and J. Ranalli, “Compressive sensing for spatial and spectral flame diagnostics,” Sci. Rep. **8**(1), 2556 (2018). [CrossRef]

**9. **R. Horisaki, H. Matsui, and J. Tanida, “Single-pixel compressive diffractive imaging with structured illumination,” Appl. Opt. **56**(14), 4085–4089 (2017). [CrossRef]

**10. **M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics **13**(1), 13–20 (2019). [CrossRef]

**11. **L. Schermelleh, R. Heintzmann, and H. Leonhardt, “A guide to super-resolution fluorescence microscopy,” J. Cell Biol. **190**(2), 165–175 (2010). [CrossRef]

**12. **S. W. Hell and J. Wichmann, “Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy,” Opt. Lett. **19**(11), 780–782 (1994). [CrossRef]

**13. **E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science **313**(5793), 1642–1645 (2006). [CrossRef]

**14. **M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (storm),” Nat. Methods **3**(10), 793–796 (2006). [CrossRef]

**15. **V. Micó, Z. Zalevsky, C. Ferreira, and J. García, “Superresolution digital holographic microscopy for three-dimensional samples,” Opt. Express **16**(23), 19260–19270 (2008). [CrossRef]

**16. **T. Gutzler, T. R. Hillman, S. A. Alexandrov, and D. D. Sampson, “Coherent aperture-synthesis, wide-field, high-resolution holographic microscopy of biological tissue,” Opt. Lett. **35**(8), 1136–1138 (2010). [CrossRef]

**17. **G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution Fourier ptychographic microscopy,” Nat. Photonics **7**(9), 739–745 (2013). [CrossRef]

**18. **J. L. Starck, E. Pantin, and F. Murtagh, “Deconvolution in astronomy: A review,” Publ. Astron. Soc. Pac. **114**(800), 1051–1069 (2002). [CrossRef]

**19. **K. G. Puschmann and F. Kneer, “On super-resolution in astronomical imaging,” Astron. Astrophys. **436**(1), 373–378 (2005). [CrossRef]

**20. **J. W. Goodman, *Introduction to Fourier Optics*, 2nd ed. (McGraw-Hill Higher Education, 1996).

**21. **A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian, “Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data,” IEEE Trans. on Image Process. **17**(10), 1737–1754 (2008). [CrossRef]

**22. **D. J. Tolhurst, Y. Tadmor, and T. Chao, “Amplitude spectra of natural images,” Ophthalmic Physiol. Opt. **12**(2), 229–232 (1992). [CrossRef]

**23. **A. Torralba and A. Oliva, “Statistics of natural image categories,” Network-Comp Neural. **14**(3), 391–412 (2003). [CrossRef]

**24. **M. A. A. Neil, R. Juškaitis, and T. Wilson, “Method of obtaining optical sectioning by using structured light in a conventional microscope,” Opt. Lett. **22**(24), 1905–1907 (1997). [CrossRef]

**25. **S. M. Khamoushi, Y. Nosrati, and S. H. Tavassoli, “Sinusoidal ghost imaging,” Opt. Lett. **40**(15), 3452–3455 (2015). [CrossRef]

**26. **Z. Zhang, X. Ma, and J. Zhong, “Single-pixel imaging by means of Fourier spectrum acquisition,” Nat. Commun. **6**, 1–6 (2015). [CrossRef]

**27. **Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Hadamard single-pixel imaging versus Fourier single-pixel imaging,” Opt. Express **25**(16), 19619–19639 (2017). [CrossRef]

**28. **K. M. Schmitt and M. Rahm, “Evaluation of the impact of diffraction on image reconstruction in single-pixel imaging systems,” Opt. Express **24**(21), 23863–23871 (2016). [CrossRef]

**29. **M. Mobberley, *Astronomical equipment for amateurs* (Springer Science & Business Media, 1999).

**30. **M. Yanoff and J. S. Duker, *Ophthalmology, Maryland Heights, MO* (Elsevier, 2009).

**31. **E. McCollough, “Photographic topography,” Industry: A Monthly Magazine Devoted to Science, Engineering and Mechanic Arts, p. 65 (1893).

**32. **E. R. Fossum, “What to do with sub-diffraction-limit (SDL) pixels?-A proposal for a gigapixel digital film sensor (DFS),” in IEEE Workshop on Charge-Coupled Devices and Advanced Image Sensors, (2005), pp. 214–217.

**33. **S. W. Smith, *The scientist and engineer’s guide to digital signal processing* (California Technical Pub. San Diego, 1997).

**34. **F. Rousset, N. Ducros, A. Farina, G. Valentini, C. D’Andrea, and F. Peyrin, “Adaptive basis scan by wavelet prediction for single-pixel imaging,” IEEE Trans. Comput. Imaging **3**(1), 36–46 (2017). [CrossRef]

**35. **W. A. Traub and B. R. Oppenheimer, “Direct imaging of exoplanets,” Exoplanets, pp. 111–156 (2010).

**36. **R. Chen, M. Wu, J. Ling, Z. Wei, Z. Chen, M. Hong, and X. Chen, “Superresolution microscopy imaging based on full-wave modeling and image reconstruction,” Optica **3**(12), 1339–1347 (2016). [CrossRef]

**37. **X. Chen, *Computational methods for electromagnetic inverse scattering* (Wiley Online Library, 2018).

**38. **G. Peyré, S. Bougleux, and L. Cohen, “Non-local regularization of inverse problems,” in *European Conference on Computer Vision*, (Springer, 2008), pp. 57–68.

**39. **C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. **8**(1), 2369 (2018). [CrossRef]