Low-signal limit of X-ray single particle diffractive imaging

Kartik Ayyer; Andrew J. Morgan; Andrew Aquila; Hasan DeMirci; Brenda G. Hogue; Richard A. Kirian; P. Lourdu Xavier; Chun Hong Yoon; Henry N. Chapman; Anton Barty

doi:10.1364/OE.27.037816

1. Introduction

Realising the potential of X-ray free electron lasers (XFELs) to image biomolecular structures at room temperature without the need for crystallisation has been one of the goals driving their development. For many years, theoretical studies backed by simulated data have suggested that near-atomic resolution of isolated non-crystalline proteins should be possible with currently available XFEL sources [1,2]. To date, published results have focused on large or symmetric particles such as viruses in the 60–500 nm size range, where the higher signal from larger particles is ideal for methods development [3–6]. Results from the single particle imaging (SPI) initiative at the Linac Coherent Light Source (LCLS) [7] have been in a similar size range [8,9].

Imaging individual proteins has so far proven more elusive, owing to the lower signal-to-background ratio from smaller particles and a lower-than-expected rate of single particle diffraction pattern acquisition [6]. While theoretical studies indicate that molecular imaging should be achievable with Bayesian algorithms such as the EMC algorithm [10] applied to near-perfect data simulated with currently available XFEL parameters [2], this has yet to be demonstrated with experimental data containing realistic instrument background, sample variability and other experimental factors.

This paper addresses whether these experimental effects pose a fundamental roadblock to diffraction-pattern alignment and phasing algorithms in the low-signal limit, using experimental rather than simulated data. The approach is to start with experimentally measured data and progressively reduce the photon count to levels similar to those expected from smaller particles such as individual proteins. This process also mimics data that would be recorded from particles of the same size using weaker X-ray pulses, such as those that will soon be available at high repetition rate from the LCLS-II upgrade.

We start from data collected by the SPI initiative from 60 nm PR772 viruses [9] to 8.5-nm resolution. Weak data were generated by keeping only a small, random fraction of the photons in each experimental snapshot. These reduced, or ‘diluted’, patterns contain just a smattering of photons and often look like pure noise to the eye. In addition to diffraction from the virus particles, each diffraction pattern contains instrument background from a range of experimental sources. This background does not depend on particle orientation, so once the patterns are merged over all orientations any structure it has is averaged into a spherically symmetric function incoherently added to the 3D Fourier intensities of the object. To account for this background, we develop a modified iterative phasing algorithm that isolates and retrieves the background while reconstructing the electron density, and we also show that phase retrieval is robust to statistical noise.

The paper is set out as follows. The experiment is described in Section 2. The reconstruction pipeline and the results of applying it to the full data set are described in Section 3, and a set of metrics, including the Fourier Shell Correlation (FSC) and the Phase Retrieval Transfer Function (PRTF), for quantifying reconstruction resolution and fidelity is defined in Section 4. In Section 5 the experimental data set is subsampled by randomly selecting a fraction of the photons in every frame, and the sparsified photon counts are then oriented and phased. The quality of the electron densities obtained from the subsampled data sets is evaluated and compared using the metrics defined in Section 4.

We find that reconstruction quality persists under a substantial reduction in data: even when the signal is reduced by a factor of 256, quality metrics show that the virus electron density determined by ab initio phasing is of almost the same quality as that from the full-signal data. This suggests that, given a sufficient number of single particle diffraction patterns, reliable 3D electron densities can be obtained with the methods presented here from sub-10 nm biomolecules with current XFEL parameters (assuming a proportionate reduction in instrument background), or from 60-nm viruses with a pulse 256 times weaker. Obtaining higher resolution will require many more patterns to achieve sufficient statistics. This may soon be within reach with advances in sample delivery methods and with high-repetition-rate XFEL sources such as the European XFEL and LCLS-II.

2. Experiment description

Diffraction snapshots of aerosolized PR772 viruses were collected at the LCLS as described in [9]. Briefly, diffraction patterns were recorded on a pnCCD detector in the AMO instrument at the LCLS [11] at a photon energy of 1.6 keV, with the detector placed 586 mm downstream of the X-ray-sample interaction point. This gives a resolution of 11.8 nm at the middle of the detector edge and a maximum resolution of 8.4 nm at the detector corner. This data set is available for download from the Coherent X-ray Imaging Data Bank [12] as CXIDB 58.
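As a rough consistency check on these numbers, the sketch below converts a radial position on a flat detector into full-period resolution $d = 1/q$, with $q = 2\sin\theta/\lambda$ and $2\theta$ the scattering angle. The detector half-width of about 38.4 mm (512 pixels of 75 µm on either side of the beam) is an assumption not stated in the text, as are the function names; the sketch ignores any gap between detector panels or beam offset.

```python
import math

# h*c in keV*nm, used to convert photon energy to wavelength.
HC_KEV_NM = 1.23984198

def wavelength_nm(energy_kev):
    """X-ray wavelength in nm for a photon energy in keV."""
    return HC_KEV_NM / energy_kev

def full_period_resolution_nm(radius_mm, distance_mm, energy_kev):
    """Full-period resolution d = 1/q at a given radius on a flat detector.

    2*theta is the scattering angle subtended by the pixel, and
    q = 2 sin(theta) / lambda (crystallographic convention).
    """
    two_theta = math.atan2(radius_mm, distance_mm)
    q = 2.0 * math.sin(two_theta / 2.0) / wavelength_nm(energy_kev)
    return 1.0 / q

# Assumption: 512 pixels of 0.075 mm between the beam and the detector edge.
HALF_WIDTH_MM = 512 * 0.075
edge = full_period_resolution_nm(HALF_WIDTH_MM, 586.0, 1.6)
corner = full_period_resolution_nm(HALF_WIDTH_MM * math.sqrt(2.0), 586.0, 1.6)
print(f"edge: {edge:.1f} nm, corner: {corner:.1f} nm")
```

With these assumed dimensions the sketch reproduces the quoted edge and corner resolutions to within rounding.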

The data set consists of 14 772 frames with an average signal level of 395 876 photons/frame. For a 60 nm virus, the speckles were around 100 pixels wide, so after photon conversion the pixels were binned by a factor of 4 in both dimensions to reduce computational cost. Excluding bad pixels and the central speckle, where the detector was often saturated, there were on average 34 783 photons/frame, and 22.2 photons/speckle at the detector corner.
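The quoted speckle width can be estimated as $\lambda L / a$ in the detector plane, where $L$ is the detector distance and $a$ the particle size. The sketch below expresses this in pixels and adds a hypothetical helper that sums photon counts in non-overlapping 4×4 blocks. The 75 µm pixel size is an assumption not stated in the text, and the helper is an illustration rather than the binning code actually used.

```python
HC_KEV_NM = 1.23984198  # h*c in keV*nm

def speckle_width_pixels(energy_kev, distance_m, particle_size_m, pixel_m):
    """Approximate speckle width lambda * L / a, expressed in detector pixels."""
    wavelength_m = HC_KEV_NM / energy_kev * 1e-9
    return wavelength_m * distance_m / particle_size_m / pixel_m

def bin_pixels(frame, factor):
    """Sum photon counts in non-overlapping factor x factor blocks.

    frame is a list of rows; both dimensions are assumed divisible by factor.
    """
    rows, cols = len(frame), len(frame[0])
    return [
        [sum(frame[r][c]
             for r in range(br, br + factor)
             for c in range(bc, bc + factor))
         for bc in range(0, cols, factor)]
        for br in range(0, rows, factor)
    ]

# Assumption: 75 um detector pixels.
width = speckle_width_pixels(1.6, 0.586, 60e-9, 75e-6)
print(f"speckle width: {width:.0f} pixels unbinned, {width / 4:.0f} after 4x4 binning")
```

Binning sums counts rather than averaging them, so the total photon number per frame is unchanged.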

Diffraction patterns were recorded at a repetition rate of 120 Hz; however, only a small fraction of the X-ray pulses interacted with an object. These so-called “hits” included not only interactions with PR772 virus particles but also with water droplets and multi-particle clusters, as well as patterns with detector artifacts. Such spurious patterns need to be excluded from analysis. In [9], Reddy et al. describe the classification of the single particle patterns using various machine learning methods; the data for this study are based on the classification by manifold embedding [13], which yielded a data set of 14 772 single virus diffraction patterns.

3. Reconstruction procedure

The PR772 virus electron density was reconstructed in a two-step process, illustrated in Fig. 1 and detailed below. First, the orientations of a set of noisy diffraction patterns of mostly identical objects in random orientations with variable incident fluence were determined using the EMC algorithm [10] to produce a 3D intensity volume. The three-dimensional diffraction volume was then phased with a background-aware phase retrieval algorithm, combining the Difference Map [14] and Error Reduction [15] algorithms, to arrive at the real-space electron density.

Fig. 1. Reconstruction of the virus electron density from measured diffraction snapshots is a two-step process. First, the orientations of a set of noisy diffraction patterns of mostly identical objects in random orientations with variable incident fluence (top left) are determined to produce a 3D intensity volume (top right). The three-dimensional diffraction volume is then phased using a background-aware phase retrieval algorithm to arrive at the real-space electron density (bottom). The electron density is shown as both an isosurface plot and a slice through the center of the object.

3.1 Alignment: Determining the 3D reciprocal space intensity distribution

Orientation determination, alignment and scaling of the diffraction patterns into a 3D diffraction volume were performed using the Dragonfly software [2]. Data were provided to Dragonfly as photon counts, since the pnCCD detector used in this experiment could resolve individual 1.6 keV photons, and a Poisson noise model was therefore used. Both the orientation and a relative scale factor were estimated for each pattern to account for fluctuations in incident fluence and variations in the impact parameter of the virus relative to the beam. The predicted intensities on the detector for a given orientation were multiplied by this scale factor before calculating the probability distribution over orientations (PDO), and the scale factors were updated every iteration using the current PDO estimate for each pattern. To avoid convergence issues due to the high signal per pattern, the PDO was raised to the power of the deterministic annealing parameter, $\beta$ (and renormalised), which broadens the distribution. This parameter was increased from 0.001 by a factor of $\sqrt {2}$ every 10 iterations. The detailed procedure used for this reconstruction is described in Appendix A.
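A minimal sketch of the per-frame quantities described above, assuming a Poisson likelihood for the photon counts, a scale factor multiplying the model intensities, and a PDO computed from likelihoods raised to $\beta$ (equivalent to raising the PDO to $\beta$ and renormalising under a uniform prior). All function names and the toy numbers are hypothetical, and this is not Dragonfly's implementation; a real EMC step works on sparse photon lists and many thousands of orientations.

```python
import math

def log_likelihood(frame, model_view, scale):
    """Poisson log-likelihood of photon counts for one candidate orientation.

    frame:      photon counts K_k in each detector pixel
    model_view: model intensities W_k at those pixels for this orientation (> 0)
    scale:      per-frame scale factor phi multiplying the model intensities
    The log(K_k!) term is dropped since it does not depend on orientation.
    """
    return sum(k * math.log(scale * w) - scale * w
               for k, w in zip(frame, model_view))

def orientation_probabilities(frame, model_views, scale, beta):
    """PDO over candidate orientations, with likelihoods raised to the power beta."""
    logs = [beta * log_likelihood(frame, view, scale) for view in model_views]
    top = max(logs)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]

def update_scale(frame, model_views, probs):
    """Maximum-likelihood scale given the PDO: observed / expected unscaled photons."""
    expected = sum(p * sum(view) for p, view in zip(probs, model_views))
    return sum(frame) / expected

def beta_schedule(iteration, beta_start=0.001, every=10):
    """Annealing schedule: beta grows by sqrt(2) every `every` iterations."""
    return beta_start * math.sqrt(2.0) ** (iteration // every)

# Toy example: three pixels, two candidate orientations (hypothetical numbers).
frame = [5, 0, 1]
views = [[4.0, 0.5, 1.0], [0.5, 4.0, 1.0]]
sharp = orientation_probabilities(frame, views, 1.0, 1.0)
broad = orientation_probabilities(frame, views, 1.0, beta_schedule(0))
print("beta = 1:", sharp, " beta = 0.001:", broad)
print("updated scale:", update_scale(frame, views, sharp))
```

With $\beta = 1$ the toy frame is assigned almost entirely to the first orientation, whereas with $\beta = 0.001$ the PDO is nearly uniform, which is the broadening that helps avoid premature convergence for high-signal patterns.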

3.2 Phasing: Iterative phase retrieval with background estimation

The three-dimensional diffraction volume from Dragonfly was phased to arrive at the real-space electron density using a background-aware iterative projection phase retrieval algorithm, as described in Algorithm 1. The update rule for this algorithm combines two constraints: a modulus projection that incorporates a spherically symmetric background intensity incoherently added to the diffraction signal (“Background aware”), and a support constraint on the electron density that fixes the number of voxels in the support rather than using a static mask (“Voxel number support”).

The iterate $\Psi$ comprises both the real-space density ${\rho (\mathbf {x})}$ and the background ${B(\mathbf {q})}$:

(1)$$\Psi = \left\{\rho(\mathbf{x}), B(\mathbf{q})\right\}$$

In practice this consists of two 3D volumes, one for the real-space electron density and the other for the square root of the background intensity. The calculated intensity is the sum of the intensity from the particle plus the background,

(2)$$I_\textrm{calc}[\Psi](\mathbf{q}) = \left|\mathcal{F}[\rho](\mathbf{q})\right|^2 + B^2(\mathbf{q})$$

where $\mathcal {F}[\rho ]$ is the discrete Fourier transform of the electron density $\rho$. The modulus projection rescales both terms by the square root of the ratio of the measured to the calculated intensity,

(3)$$P_M[\Psi] = \left\{\mathcal{F}^{-1}\left[\sqrt{\frac{I_\textrm{meas}(\mathbf{q})}{I_\textrm{calc}(\mathbf{q})}} \mathcal{F}[\rho](\mathbf{q})\right], \sqrt{\frac{I_\textrm{meas}(\mathbf{q})}{I_\textrm{calc}(\mathbf{q})}}B(\mathbf{q})\right\}$$

where $I_{\textrm {meas}}(\mathbf {q})$ is the measured intensity.
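Equations (2) and (3) can be sketched voxel by voxel as follows. The function name and list-based layout are hypothetical; voxels where $I_\textrm{calc}$ vanishes are simply left at zero, which is a simplifying assumption of the sketch rather than a property of the projection. After the update, $|\mathcal{F}[\rho]|^2 + B^2$ equals $I_\textrm{meas}$ at every other voxel, while the phase of $\mathcal{F}[\rho]$ is unchanged.

```python
def modulus_projection(f_values, b_values, i_meas, eps=1e-12):
    """Background-aware modulus projection applied voxel by voxel (Eq. 3).

    f_values: complex Fourier amplitudes F[rho](q) of the current density
    b_values: real square-root background amplitudes B(q)
    i_meas:   measured intensities I_meas(q)
    Both terms are multiplied by sqrt(I_meas / I_calc), I_calc = |F|^2 + B^2.
    """
    new_f, new_b = [], []
    for f, b, meas in zip(f_values, b_values, i_meas):
        calc = abs(f) ** 2 + b ** 2
        # Simplification: leave voxels with no calculated intensity at zero.
        ratio = (meas / calc) ** 0.5 if calc > eps else 0.0
        new_f.append(ratio * f)
        new_b.append(ratio * b)
    return new_f, new_b

f_new, b_new = modulus_projection([3 + 4j, 1j, 0j], [0.0, 1.0, 2.0], [100.0, 8.0, 1.0])
print(f_new, b_new)
```

Because the particle and background terms are scaled by the same factor, their relative weight at each voxel is preserved; only their combined intensity is forced to match the measurement.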

The support projection imposes different constraints on the two parts of the iterate, $\rho$ and $B$. A constant $N$, chosen at the beginning, sets the number of voxels inside the particle in which the density is allowed to be non-zero; here we chose $N=2000$. The modulus-squared electron density values are sorted, and the $N$ highest are left unchanged while the rest are set to zero. The background, $B(\mathbf {q})$, is replaced by its spherically symmetric version, i.e., the values in each radial bin are replaced by their average. A derivation showing that both operations are projections is given in Appendix B. Further details regarding masking and the alignment of reconstructions from different random starting models are discussed in Appendix C.
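The two parts of the support projection can be sketched on flattened voxel lists as below. The sketch assumes each Fourier voxel has a precomputed integer radial-bin index and averages the values of $B$ itself within each bin; the function names and this data layout are hypothetical. In the paper the voxel count is $N = 2000$ in a 3D volume, whereas the toy inputs here are tiny.

```python
from collections import defaultdict

def voxel_number_support(rho, n_keep):
    """Keep the n_keep voxels with the largest |rho|^2 and set the rest to zero."""
    ranked = sorted(range(len(rho)), key=lambda i: abs(rho[i]) ** 2, reverse=True)
    keep = set(ranked[:n_keep])
    return [value if i in keep else 0.0 for i, value in enumerate(rho)]

def spherically_symmetrize(b_values, radial_bins):
    """Replace each background value by the mean over its radial bin."""
    totals, counts = defaultdict(float), defaultdict(int)
    for b, r in zip(b_values, radial_bins):
        totals[r] += b
        counts[r] += 1
    return [totals[r] / counts[r] for r in radial_bins]

rho = [0.1, -3.0, 2.0, 0.5]
print(voxel_number_support(rho, 2))
print(spherically_symmetrize([1.0, 3.0, 5.0, 7.0], [0, 0, 1, 1]))
```

Unlike a static mask, the set of retained voxels can move from one iteration to the next, while its size stays fixed at the chosen number.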

3.3 Reconstruction from the full data set

The results of applying the above two-step reconstruction method to all 14 772 patterns are shown in Fig. 1. The 3D intensity shows strong icosahedral symmetry even though this symmetry was not enforced during the reconstruction. The resolution at the edge of the spherical volume of intensities is 8.4 nm. After iterative phasing, the electron density shown in the bottom row was obtained. The isosurface plot shows an icosahedron with bulges at each vertex, while a slice through the object centre shows a double-walled shell with a slight reduction in density just inside the outer shell, consistent with other treatments of the data [16,17].

4. Quantifying reconstruction quality

A set of quantitative metrics is required to compare reconstructions and assess overall reconstruction quality for both the full and the diluted data sets. We used two metrics established in the literature, which we define in this section for clarity, and applied them to the reconstruction from the full data set described above.

4.1 “Gold-standard” cross correlations

The first of these metrics, inspired by cryo-electron microscopy (cryo-EM), involves a slight change to the analysis pipeline itself. The ‘gold-standard’ Fourier shell correlation from cryo-EM [18] calls for the data set to be separated into two equal halves. Each half is analyzed independently, the final volumes are rotationally aligned, and their agreement is calculated as a function of resolution using the Fourier Shell Correlation (FSC):

(4)$$\mathrm{FSC}(q) = \operatorname{Re}\left[\frac{\sum\limits_{|\mathbf{q}_i| = q} F_1(\mathbf{q}_i) F_2^*(\mathbf{q}_i)} {\sqrt{\sum\limits_{|\mathbf{q}_i| = q} |F_1(\mathbf{q_i})|^2}\sqrt{\sum\limits_{|\mathbf{q}_i| = q} |F_2(\mathbf{q_i})|^2}}\right]$$

where $F(\mathbf {q}) = \mathcal {F}[\rho ](\mathbf {q})$ and the subscripts 1 and 2 denote the reconstructions from the two half data sets. In practice, the FSC is calculated in $q$ bins, which are shells of finite thickness.
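A direct transcription of Eq. (4) over precomputed shells might look like the following sketch. It assumes the two half-data-set densities have already been aligned and Fourier transformed onto the same voxel grid, with an integer shell index per voxel; the function name is hypothetical, and shells with zero power would need guarding in practice.

```python
import math
from collections import defaultdict

def fourier_shell_correlation(f1, f2, shells):
    """FSC per shell (Eq. 4) for two flattened sets of Fourier amplitudes.

    f1, f2: complex amplitudes of the two half-data-set reconstructions,
            already rotationally aligned and sampled on the same voxels
    shells: integer shell index for each voxel (bins of finite thickness)
    Returns {shell: FSC}.
    """
    cross = defaultdict(complex)
    p1, p2 = defaultdict(float), defaultdict(float)
    for a, b, s in zip(f1, f2, shells):
        cross[s] += a * b.conjugate()
        p1[s] += abs(a) ** 2
        p2[s] += abs(b) ** 2
    return {s: (cross[s] / math.sqrt(p1[s] * p2[s])).real for s in sorted(cross)}

# Toy data: shell 0 has opposite phases in the two halves, shell 1 agrees.
f1 = [1 + 1j, 2 + 0j, 1j, 1 + 0j]
f2 = [-(1 + 1j), -2 + 0j, 1j, 1 + 0j]
print(fourier_shell_correlation(f1, f2, [0, 0, 1, 1]))
```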

A similar correlation can also be calculated between the two half-dataset intensities. To increase the sensitivity of the correlation, the mean is subtracted in each resolution shell before calculating the cross-correlation, i.e., a Pearson correlation coefficient is calculated in each shell independently:

(5)$$\mathrm{CC}_{1/2}(q) = \frac {\sum\limits_{|\mathbf{q}_i| = q} \left(\mathrm{I}_1 - \overline{\mathrm{I}_1}\right) \left(\mathrm{I}_2 - \overline{\mathrm{I}_2}\right)} {\sqrt{\sum\limits_{|\mathbf{q}_i| = q} \left(\mathrm{I}_1 - \overline{\mathrm{I}_1}\right)^2} \sqrt{\sum\limits_{|\mathbf{q}_i| = q} \left(\mathrm{I}_2 - \overline{\mathrm{I}_2}\right)^2}}$$

where $\mathrm {I}_k$ is shorthand for $\mathrm {I}_k(\mathbf {q}_i)$ and $\overline {\mathrm {I}_k}$ is the mean of $\mathrm {I}_k$ over the resolution shell $|\mathbf {q}_i| = q$. The gain in sensitivity from subtracting the mean is most apparent when the intensity reconstruction contains a spherically symmetric background, as is the case here, because such a background is constant within each shell and would otherwise inflate the correlation.
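Eq. (5) can be sketched in the same way. The function name and the flattened shell layout are assumptions. Adding the same constant to every voxel of a shell, as a spherically symmetric background would, leaves the Pearson coefficient unchanged, which is the property exercised by the toy data below.

```python
import math
from collections import defaultdict

def cc_half(i1, i2, shells):
    """Pearson correlation of two intensity volumes in each resolution shell (Eq. 5)."""
    groups = defaultdict(list)
    for a, b, s in zip(i1, i2, shells):
        groups[s].append((a, b))
    result = {}
    for s in sorted(groups):
        pairs = groups[s]
        m1 = sum(a for a, _ in pairs) / len(pairs)  # shell mean of I_1
        m2 = sum(b for _, b in pairs) / len(pairs)  # shell mean of I_2
        num = sum((a - m1) * (b - m2) for a, b in pairs)
        den = math.sqrt(sum((a - m1) ** 2 for a, _ in pairs)
                        * sum((b - m2) ** 2 for _, b in pairs))
        result[s] = num / den
    return result

shells = [0, 0, 0, 1, 1, 1]
i1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
i2 = [x + 100.0 for x in i1]            # same structure plus a constant offset
print(cc_half(i1, i2, shells))
```

Without the mean subtraction, the same constant offset would push the correlation towards 1 regardless of how well the structured part of the intensities agrees.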

4.2 Phase retrieval transfer function (PRTF)

The other metric is the phase retrieval transfer function (PRTF) [19]. This metric measures the reliability of iterative phasing by, in effect, averaging complex values over many instances of the phasing process.

The first step in calculating this metric is to reconstruct a large number of independent density volumes from different random starting guesses. At any given reciprocal-space voxel $\mathbf {q}$, the argument of the complex Fourier transform of the density (the phase) can differ slightly between random starts. The value of the PRTF at that voxel is the magnitude of the mean of the unit complex numbers whose arguments are these phases, $\phi_n$:

(6)$$\mathrm{PRTF}(\mathbf{q}) = \frac{1}{N}\left|\sum\limits_{n=1}^N e^{i\phi_n}\right|$$

where there are $N$ independent density volumes. By convention, the PRTF is averaged over shells and reported as a function of the radial coordinate $\left |\mathbf {q}\right |$. As described in Sec. 3.2, the different reconstructions must be aligned in real space before the average is calculated: a shift in real space is equivalent to a linear phase ramp in reciprocal space, which would significantly lower the PRTF, and an uncorrected central inversion negates the phase, leading to a similar reduction [20].
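For a single voxel, Eq. (6) reduces to a one-liner. The sketch below also shows the effect of leaving half of the reconstructions centro-inverted: with phases $\pm\phi$ the PRTF drops from 1 to $\cos\phi$. The function name is hypothetical, and in practice the per-voxel values would then be averaged over shells.

```python
import cmath
import math

def prtf(phases):
    """PRTF at one voxel (Eq. 6): magnitude of the mean unit phasor."""
    return abs(sum(cmath.exp(1j * phi) for phi in phases)) / len(phases)

aligned = [0.5] * 10                      # all reconstructions agree on the phase
half_inverted = [0.5] * 5 + [-0.5] * 5    # half are centro-inverted: phase negated
print(prtf(aligned), prtf(half_inverted), math.cos(0.5))
```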

One weakness of the PRTF is that it can be unjustifiably high if the support volume is chosen too small. In the extreme case of a support consisting of a single voxel, the PRTF (after alignment) is unity everywhere even though the reconstruction is very poor. The support should therefore be slightly larger than the particle, so that it includes some voxels with low density. In the reconstructions performed here, the support volume (2000 voxels) is significantly larger than the nominal volume of a regular icosahedron whose size corresponds to the fringe spacing (1497 voxels).

We calculate the PRTF from 400 independent reconstructions. This number matters because it must be large enough for the PRTF to converge and for voxels with irreproducible phases to average down. Consider, for example, the case where the phases are completely random: the sum is then a 2D random walk in the complex plane with unit step size, whose root-mean-square distance from the origin after $N$ steps is $\sqrt {N}$. Thus, the expected lower bound on the PRTF when $N$ reconstructions are averaged is about $1/\sqrt {N}$, which is 0.05 for $N = 400$, as here. In keeping with convention, the threshold value used to determine the reproducible resolution is $1/e \approx 0.37$.
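The random-walk argument can be checked numerically. The sketch below draws uniformly random phases for 400 reconstructions, repeats this for 1000 hypothetical voxels, and compares the root-mean-square PRTF with $1/\sqrt{N}$. Strictly, $\sqrt{N}$ is the root-mean-square distance of the walk; its mean distance is slightly smaller, about $\sqrt{\pi N}/2$, so the mean PRTF floor is about 0.044 rather than 0.05. The seed and trial count are arbitrary choices.

```python
import cmath
import math
import random

def random_phase_prtf(n_reconstructions, rng):
    """PRTF of one voxel whose phase is uniformly random in every reconstruction."""
    total = sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                for _ in range(n_reconstructions))
    return abs(total) / n_reconstructions

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 400                         # number of reconstructions, as in the text
trials = [random_phase_prtf(n, rng) for _ in range(1000)]
rms = math.sqrt(sum(t * t for t in trials) / len(trials))
mean = sum(trials) / len(trials)
print(f"RMS {rms:.4f}  mean {mean:.4f}  1/sqrt(N) {1 / math.sqrt(n):.4f}  "
      f"sqrt(pi/(4N)) {math.sqrt(math.pi / (4 * n)):.4f}")
```

Either way the floor is far below the $1/e$ cutoff, so voxels with irreproducible phases cannot pass the threshold by chance.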

4.3 Metrics applied to full data reconstruction

We applied the metrics defined above to the reconstructed intensity and electron density calculated using the procedure described in Sec. 3. For the FSC and CC$_{1/2}$ calculations, the frames were split into odd and even halves containing the 1st, 3rd, 5th$\ldots$ and 2nd, 4th, 6th$\ldots$ patterns, respectively. This splitting was chosen so that both halves are similarly affected by slowly varying drifts in the experiment. It is also sufficiently random, because the “hits” themselves are a random subset of all the patterns collected.

The FSC and CC$_{1/2}$ plots are shown in Fig. 2, using the crystallographic definition of $q$ with full-period resolution $d = 1/q$. Each metric gives a slightly different estimate of the resolution of the reconstruction. From the half-bit FSC criterion common in cryo-electron microscopy [21], the resolution is 8.75 nm, while with the CC$_{1/2} = 0.5$ cutoff the intensities are reproducibly reconstructed to a resolution of 9.02 nm. The purely phasing metric, the PRTF, gives a resolution of 10.9 nm for both the even and odd data sets. The oscillations in the PRTF plot, which arise from the fringes in the intensities, further show how the resolution determined by the PRTF can depend dramatically on whether the values at one of the local minima happen to lie above or below the 0.37 threshold. That the resolution estimates differ is not surprising, given that different quantities are being measured, and it suggests caution when reporting a single resolution number. The spread of values also argues for being conservative about the precision to which resolution is quoted: the mean resolution estimated above is 9.5 nm with a standard deviation of 1.1 nm, so quoting resolution to three significant figures is certainly not appropriate. One should also take care when comparing resolution between publications to ensure that the same quantities are being compared.
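To make the cutoff criteria concrete, the sketch below implements the widely used half-bit threshold curve of van Heel and Schatz, together with a hypothetical helper that converts a metric curve into a full-period resolution $d = 1/q$ at the last shell before the metric falls below threshold. Whether [21] uses exactly this form and voxel count, and whether the resolutions quoted above were interpolated between shells, are assumptions; the helper does not interpolate, and the toy curve is invented.

```python
import math

def half_bit_threshold(n_voxels):
    """Half-bit FSC threshold for a shell with n_voxels (effective) voxels."""
    root = math.sqrt(n_voxels)
    return (0.2071 + 1.9102 / root) / (1.2071 + 0.9102 / root)

def cutoff_resolution(q_values, metric, thresholds):
    """Full-period resolution d = 1/q of the last shell before metric < threshold.

    No interpolation between shells; if the metric never drops below the
    threshold, the outermost shell is used.
    """
    last_q = q_values[0]
    for q, value, threshold in zip(q_values, metric, thresholds):
        if value < threshold:
            break
        last_q = q
    return 1.0 / last_q

# Toy curve (hypothetical numbers) with a CC_1/2-style fixed cutoff of 0.5.
qs = [0.05, 0.10, 0.11, 0.12]
cc = [1.0, 0.9, 0.3, 0.1]
print(cutoff_resolution(qs, cc, [0.5] * len(qs)))
```

The half-bit curve starts at 1 for a single voxel and falls towards about 0.17 for large shells, so outer shells, which contain more voxels, face a lower threshold than inner ones.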

Fig. 2. Reconstruction metrics for the full data set as a function of $q$. Top: Fourier Shell Correlation (FSC) plot with the dashed line showing the half-bit threshold. Middle: Intensity CC$_{1/2}$ plot with the dashed line showing the 0.5 cutoff. Bottom: Phase Retrieval Transfer Function (PRTF) plot with the customary $1/e$ cutoff. Error bars represent the standard deviation across 10 random starts.

5. Results

We now turn to the effect of reducing the amount of data on reconstruction quality, using the analysis pipeline described in Section 3. Data quantity is reduced in one of two ways. First, the diffraction patterns can be made weaker to simulate imaging smaller particles or using a lower-intensity X-ray beam. This has two effects: orientation determination is expected to become harder, as each pattern carries less information from which to determine its orientation, and the signal-to-noise ratio of the reconstructed 3D intensities is reduced, making phase retrieval more challenging. Alternatively, the number of diffraction patterns can be reduced to simulate a smaller data set consisting of fewer diffraction patterns of the same signal strength. Reducing the data computationally in either way avoids the confounding factors of working with different data sets collected at different times under potentially different experimental conditions.

5.1 Reducing diffraction pattern intensity

To simulate measurement of weaker diffraction patterns, we computationally reduced the number of photons in each image drawn from the same experimental data set. Each photon was independently kept or discarded in a Bernoulli process with a given selection probability, $p$, which was reduced from $2^{-1}$ to $2^{-10}$ in powers of two. Because independently thinning Poisson-distributed counts yields Poisson-distributed counts with the mean scaled by $p$, this simulates the effect of an incident pulse with a fraction $p$ of the original fluence. The effect of applying this process to a particular diffraction pattern is shown in Fig. 3. The average number of photons per frame after photon dilution is listed in Table 1: the photon count decreases from nearly 35,000 photons per frame at full strength to only 33 photons per frame when diluted to 1/1024 strength.
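The dilution step can be sketched as independent Bernoulli trials on a list of detected photons. The photon-list layout and function names are hypothetical and do not describe the actual file format or analysis code. Scaling the 34 783 photons/frame quoted in Sec. 2 gives about 136 photons/frame at $p = 1/256$ and about 34 at $p = 1/1024$, close to the values quoted elsewhere in the paper.

```python
import random

def dilute_frame(photon_pixels, p, rng):
    """Keep each detected photon independently with probability p.

    photon_pixels lists one pixel index per detected photon, so a pixel
    holding three photons appears three times (a hypothetical layout).
    """
    return [pixel for pixel in photon_pixels if rng.random() < p]

def expected_photons(mean_photons, exponent):
    """Expected photons/frame after dilution with p = 2**-exponent."""
    return mean_photons * 2.0 ** -exponent

MEAN_PHOTONS = 34783  # photons/frame outside the central speckle, from Sec. 2
for exponent in range(1, 11):
    print(f"p = 1/{2 ** exponent:<4d} -> {expected_photons(MEAN_PHOTONS, exponent):8.1f} photons/frame")

rng = random.Random(1)
toy_frame = [7, 7, 12, 40, 40, 40]   # toy photon list: pixel 40 holds 3 photons
print(dilute_frame(toy_frame, 0.5, rng))
```

Working on individual photons rather than on per-pixel counts makes the thinning exact: a pixel with several photons keeps a binomially distributed number of them.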

Fig. 3. Four versions of the same diffraction pattern showing the reduction of photons/pattern by a given selection probability, $p$. In each case, the color scale maximizes at 4 photons per pixel. (a) Original pattern (b) $p=1/4$ (c) $p=1/16$ (d) $p=1/256$

Table 1. Data statistics as a function of selection fraction. The photons per frame described in the second column refers to photons outside the central speckle. The last three columns give the resolution in nanometers according to the standard cutoff criteria for the respective metric.

The 3D intensity was reconstructed from the weakened data in the same manner as described previously, using identical Dragonfly reconstruction parameters for all data sets except for the schedule of the deterministic annealing parameter $\beta$. A low value of $\beta$ was not necessary when the signal level was low, since this parameter acts to solve convergence issues at very high signal by broadening the PDOs. Appendix A contains details of the parameters for each subset. The 3D intensities from Dragonfly were phased with identical parameters in every case to generate electron densities. Each reduced data set was split into two halves that were reconstructed independently in order to calculate the “gold-standard” FSC and CC$_{1/2}$, and this whole process was repeated 10 times to obtain error bars on the metrics.

The results of reducing signal strength are summarized in Fig. 4. Figure 4(a) plots one metric, CC$_{1/2}$, as a function of $q$ for both the full data set and a selection fraction of $p=2^{-8}=1/256$, and shows that the reconstruction from the reduced data has only a slightly lower CC$_{1/2}$ than that from the full data set.

Fig. 4. Dependency of reconstruction metrics on selection fraction. (a) Plots of CC$_{1/2}$ vs $q$ for the full data set and for a selection fraction of $p=2^{-8}=1/256$. Error bars represent the standard deviation across 10 different random half datasets and the dashed line represents the CC$_{1/2}=0.5$ cutoff. To represent dependence on selection fraction, we plot the metric in grayscale versus both selection fraction and $q$ in panels (b)-(d) with the color representing the metric value. (b) CC$_{1/2}$, the green dashed line shows the $q$ for the CC$_{1/2} = 0.5$ cutoff; (c) PRTF, dashed line shows the typical PRTF$=1/e$ cutoff, and (d) FSC, where the metric never went below the standard half-bit criterion. Each plot is the average of 10 random subsets.

To summarise the results as a function of resolution for many different photon dilution levels, Figs. 4(b)–4(d) plot each metric in grayscale versus both selection fraction and $q$, with the gray level representing the metric value. The green dashed line in Fig. 4(b) marks the somewhat arbitrarily chosen CC$_{1/2} = 0.5$ cutoff and shows how the resolution of the intensity reconstruction becomes progressively worse as $p$ is reduced. One cause of this reduction is simply the graininess of the reconstruction due to insufficient total signal. Similarly, the green line in Fig. 4(c) represents the typical PRTF$=1/e$ cutoff. The step decrease in resolution shown by the PRTF in Fig. 4(c) occurs when the overall PRTF decreases to the point where the next local minimum falls below the cutoff threshold (see Fig. 2). The resolution estimated by each metric is tabulated in Table 1.

The metrics show immediately that the electron densities do not suffer nearly as drastic a fall-off in resolution at very low signal as the intensities do. In effect, the support constraint during phasing restores the smoothness of the speckles even when the total number of photons per 3D speckle (Shannon voxel) is low, partially offsetting the effect of insufficient total signal. For the highest photon dilution ($p=1/1024$), the average signal level used to determine the orientations is just 33.9 photons/frame.

We also studied the effect of reducing data on the histogram of electron density values retrieved in real space. Figure 5 shows the histogram of electron densities inside the support mask for three different selection fractions. The plots are averaged over the 20 phasing runs for each fraction (10 random subsets and two halves per subset). The histograms clearly show the degradation in quality as signals are reduced, with the average reconstructed particle tending towards a uniform icosahedral blob with no internal structure. Additionally, the presence of the low density voxels is reassurance that the support was not too tight and the calculated PRTF not artificially high. For selection fractions above $1/32$, the histograms and densities were nearly identical, and are hence not shown for clarity. The difference in electron density histograms suggests that differences in the real space electron density may not be entirely reflected in all of the reconstruction metrics, and that metric cutoff values used to assess resolution may on their own paint a partial picture of reconstruction quality.

Fig. 5. (a) Histogram of reconstructed electron densities for three different selection fractions. The voxels with low densities are present because the support is slightly larger than the particle. At higher photon counts, one can see a separation between the higher densities in the core of the virus compared to the capsid shell. This distinction disappears at the very low signal levels corresponding to $p=1/1024$. (b) Slices through representative electron densities with the same selection fractions. One can see the gradual disappearance of the double-shell structure with reducing fraction.

5.2 Reducing number of patterns

An alternative way of reducing the total number of measured photons is to select a random subset of full-intensity diffraction patterns. In this way one approaches the limit of a few bright patterns.

From the full set of 14 772 patterns, 10 random subsets were generated for each of the sizes 8192, 4096, 2048, 1024 and 512 patterns. Each subset was split into two halves (the even and odd patterns) that were reconstructed independently. The CC$_{1/2}$ plots for the intensity reconstructions of each subset size are shown in Fig. 6. With this approach the metrics remain largely unaffected provided at least 2048 patterns in total are used (1024 in each half data set), indicating that the reconstruction is very stable and supporting the hypothesis that there was more than enough data for this resolution. However, with 1024 frames (512 frames in each half), the reconstruction failed 4 out of 20 times. If the number of patterns is reduced too far, they do not fill the 3D reciprocal-space volume, leading to artifacts in orientation determination. Since a unique orientation assignment for just 512 patterns would be insufficient to fully populate reciprocal space, the reconstruction succeeds only because the PDOs are broad when $\beta$ is low. Even so, the 3D intensity sometimes collapses into a single plane or a few planes: orientation determination effectively fails and all frames are assigned to one or a few orientations. Fortunately, this failure mode is easy to identify and exclude from averaging, although the failed reconstructions have been retained in this work for completeness. Other algorithms that use additional constraints on the intensity, from a restricted real-space support or from additional point-group symmetries, may perform better in this limit of a few very bright patterns.

Fig. 6. Intensity CC$_{1/2}$ vs $q$ as a function of the number of frames in the data set. As in Fig. 4, each column represents a plot for a different number of frames.

6. Discussion

By sub-sampling the experimental data from PR772 viruses measured in [9], we show that the reconstruction quality is essentially the same as from the full data set with as few as 135 relevant photons/pattern, corresponding to 0.087 photons/speckle at the detector corner. This approaches the limits explored in prior work using simulated data [1,2,10] or proof-of-principle experiments under highly controlled conditions not realistic for single particle imaging [22,23]. By contrast, the results here are based on experimental measurements on PR772 viruses that incorporate particle variability and instrument background, demonstrating that the signal required for X-ray single particle imaging under realistic conditions, especially the number of scattered photons required per frame, is much lower than previously demonstrated.

From this numerical experiment we conclude that current SPI algorithms should be capable of processing experimental single particle diffraction patterns when the photon flux in the X-ray focus is 256 times smaller than currently available at LCLS for particles of the same size as PR772. Furthermore, the algorithms appear to be more robust in the case of many weak hits than in that of a small number of very strong hits. The extension of this result to smaller particles is less direct. For this analysis to also hold when the particle volume is reduced by the same factor, the parasitic scatter must also be proportionately reduced. Significantly lower background than in this data set has already been achieved at higher photon energies [8]. Thus, one future direction for the field may be to move to hard X-ray instruments, where the scattering cross section is reduced (by a factor of about 20 at 7 keV compared with the 1.6 keV used here) but the background may be much lower.

From this analysis we also conclude that analysis algorithms on their own are not currently the limiting factor for SPI. Low-background data collection has already been demonstrated in the data set of [8] to 6 Å resolution, but there were insufficient hits over the entire beamtime for a reconstruction to be feasible. The work here suggests that the signal levels may have been adequate had enough single-particle diffraction patterns been collected. This points to the need to further develop methods for introducing single particles into the X-ray focus at sufficient density to collect enough patterns at high resolution; indeed, this may currently be one of the main factors limiting further progress in SPI. Another key conclusion is that further work is needed on single particle diffraction pattern classification to achieve noise tolerance similar to that of orientation determination; the efficacy of machine learning techniques for classification in the low-signal limit still needs to be explored. These results bode well for the prospects of single particle flash X-ray imaging to near-atomic resolution at high-repetition-rate XFELs such as the European XFEL and LCLS-II, and may help guide future XFEL and instrument design.

Appendix A: Intensity reconstruction details

This appendix details the steps used to reconstruct the intensity volume from the full data set of 14 772 frames, whose results are shown in Sec. 4.3. A similar procedure was used for the reduced data set reconstructions described in Sec. 5. All intensities were reconstructed using version 1.0.4 of the Dragonfly software. Figure 7(a) shows the virtual powder sum of all the patterns, and Fig. 7(b) shows the mask used when reconstructing the intensities. The innermost pixels inside the central speckle were saturated and were therefore not used to determine the orientations. Other regions were either excluded only from orientation determination or excluded from both orientation determination and the calculation of the average 3D intensity.

Fig. 7. (a) Virtual powder sum of all 14 772 patterns, shown with a logarithmic color scale. (b) Detector mask used in orientation determination and intensity reconstruction. The ‘black’ pixels were ignored completely. The ‘ochre’ pixels were used to calculate the average intensity in 3D but not to calculate the orientations. The ‘white’ pixels were used for both orientation and average intensity calculations.
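A virtual powder sum is simply the pixel-wise sum of photon counts over all frames, and a three-level mask such as the one in Fig. 7(b) splits the pixels into an orientation set and a merging set. The sketch below illustrates both on toy data. It assumes a hypothetical sparse frame layout (pixels with exactly one photon, plus multi-photon pixels with their counts), loosely in the spirit of a sparse photon format; the class, the function names and the integer mask codes are illustrative assumptions, not Dragonfly's file layout or API.

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical pixel classes mirroring Fig. 7(b); the integer codes are an
# assumption for illustration, not necessarily Dragonfly's own convention.
USE_BOTH, MERGE_ONLY, IGNORE = 0, 1, 2


@dataclass
class SparseFrame:
    """Photon-sparse frame: pixels with exactly one photon, plus pixels with
    more than one photon and their counts (indices into a flat detector)."""
    place_ones: np.ndarray
    place_multi: np.ndarray
    count_multi: np.ndarray


def to_sparse(frame: np.ndarray) -> SparseFrame:
    """Convert a dense photon-count frame into the sparse representation."""
    flat = np.asarray(frame).ravel()
    multi = np.flatnonzero(flat > 1)
    return SparseFrame(np.flatnonzero(flat == 1), multi, flat[multi].astype(np.int64))


def virtual_powder_sum(frames, num_pix: int) -> np.ndarray:
    """Pixel-wise sum of photon counts over all frames."""
    powder = np.zeros(num_pix, dtype=np.int64)
    for f in frames:
        powder[f.place_ones] += 1               # indices are unique within a frame
        powder[f.place_multi] += f.count_multi
    return powder


def pixel_selections(mask: np.ndarray):
    """Boolean selections: pixels used for orientations, pixels used for merging."""
    mask = np.asarray(mask)
    return mask == USE_BOTH, mask != IGNORE


rng = np.random.default_rng(0)
dense = rng.poisson(0.05, size=(1000, 256))   # 1000 toy frames of 256 pixels
powder = virtual_powder_sum([to_sparse(d) for d in dense], num_pix=256)
```

Storing only the occupied pixels pays off because, at low signal, almost every pixel in a frame records zero photons.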


First, the photon-converted patterns were downloaded as HDF5 files from the CXIDB. Each file contains patterns from a single experimental run. The patterns were then converted to the sparse .emc format using the script h5toemc.py. The configuration file used for this reconstruction is shown in Fig. 8. The file specified by in_mask_file is provided with the Dragonfly source code and is shown in Fig. 7(b). The make_detector.py utility was used to generate the detector file, which records the voxel sampled by every pixel. The ewald_rad parameter sets the $q$-space size of a voxel, which is 1/($\lambda\,$ewald_rad), where $\lambda$ is the X-ray wavelength. The file amo86615_PR772.txt is a text file listing the names of the converted .emc files from every run. A total of 100 iterations of the EMC algorithm were performed, starting from a random model (uniform random numbers at each voxel).
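For readers unfamiliar with EMC [10], the sketch below runs the expand, maximize and compress steps on a deliberately tiny toy problem. As simplifying assumptions, the hidden "orientation" of each frame is a cyclic shift of a 1D intensity model rather than a 3D rotation, and pixels map one-to-one onto model voxels, so no interpolation is needed. The Poisson likelihood and the probability-weighted merge follow the standard EMC formulation, but this is an illustration, not Dragonfly's implementation; all names and toy parameters are hypothetical.

```python
import numpy as np


def emc_iteration(model: np.ndarray, frames: np.ndarray) -> np.ndarray:
    """One expand-maximize-compress update for a toy 1D problem in which each
    frame's unknown 'orientation' is a cyclic shift of the model."""
    n = model.size
    # Expand: expected photon counts (tomogram) for every discrete orientation.
    tomograms = np.stack([np.roll(model, s) for s in range(n)])            # (R, n)
    # Maximize: Poisson log-likelihood of each frame in each orientation ...
    loglik = frames @ np.log(np.maximum(tomograms, 1e-300)).T - tomograms.sum(axis=1)
    loglik -= loglik.max(axis=1, keepdims=True)                            # avoid overflow
    prob = np.exp(loglik)
    prob /= prob.sum(axis=1, keepdims=True)                                # P(orientation | frame)
    # ... then a probability-weighted sum of the frames for each orientation.
    weighted = prob.T @ frames                                             # (R, n)
    # Compress: shift each orientation's contribution back into the model frame.
    merged = sum(np.roll(weighted[s], -s) for s in range(n))
    return merged / prob.sum()                                             # prob.sum() = number of frames


rng = np.random.default_rng(1)
x = np.arange(32)
true_model = 0.5 + 20 * np.exp(-((x - 10) ** 2) / 4.5) + 8 * np.exp(-((x - 16) ** 2) / 4.5)
shifts = rng.integers(0, 32, size=300)
frames = rng.poisson(np.stack([np.roll(true_model, s) for s in shifts])).astype(float)

model = rng.random(32)        # random starting model (uniform random numbers)
for _ in range(100):          # 100 EMC iterations
    model = emc_iteration(model, frames)
```

As with the 3D reconstruction, the result is only determined up to a global "orientation", here a cyclic shift, so any comparison with the true model must first search over shifts.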

Fig. 8. Configuration file used to perform Dragonfly reconstructions with the entire data set of 14 772 patterns.
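The voxel size defined by ewald_rad, 1/($\lambda\,$ewald_rad), fixes how a radius in voxels translates into $q$ and a length scale $1/q$. The sketch below evaluates this relation, using the $q = 1/d$ convention implied by the 1/$\lambda$ definition. The ewald_rad value and the radius are hypothetical placeholders, not the settings in Fig. 8, and the function names are illustrative.

```python
HC_KEV_NM = 1.2398  # hc in keV * nm


def voxel_size_inv_nm(photon_energy_kev: float, ewald_rad: float) -> float:
    """q-space voxel size 1 / (lambda * ewald_rad), in inverse nm."""
    wavelength_nm = HC_KEV_NM / photon_energy_kev
    return 1.0 / (wavelength_nm * ewald_rad)


def length_scale_nm(radius_vox: float, photon_energy_kev: float, ewald_rad: float) -> float:
    """Length scale d = 1/q at a given radius (in voxels) from the centre of the volume."""
    return 1.0 / (radius_vox * voxel_size_inv_nm(photon_energy_kev, ewald_rad))


# Hypothetical example values (placeholders, not the Fig. 8 configuration).
dq = voxel_size_inv_nm(1.6, ewald_rad=250.0)
d_edge = length_scale_nm(50, 1.6, ewald_rad=250.0)
```

Increasing ewald_rad makes the voxels finer in $q$, which raises the oversampling of the speckles but requires more voxels to reach the same maximum $q$.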


For all cases in which the data set was split into two halves, the selection option was added to the [emc] section and set to odd_only and even_only for the two halves, respectively. Since the reconstructed intensity is only determined up to an overall rotation, the two half-data-set volumes were rotationally aligned with each other using the compare utility in Dragonfly. This program finds the relative rotation that maximizes the overall CC$_{1/2}$ between the two models within a radius range, and also calculates CC$_{1/2}$ as a function of $q$ (as shown in Fig. 4(a)).
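Once the two half-data-set volumes are aligned, a $q$-dependent CC$_{1/2}$ can be obtained by correlating them shell by shell. The sketch below computes a plain Pearson correlation in radial shells around the central voxel of an odd-sized, centred volume and shows an odd/even frame split. It is a simplified stand-in, assuming the volumes are already rotationally aligned; it is not the compare utility, which also searches over rotations. All names are hypothetical.

```python
import numpy as np


def radial_bins(shape, bin_width=1.0):
    """Integer radial-bin index of each voxel, measured from the central voxel
    (index n // 2 along each axis, as for an odd-sized centred volume)."""
    grids = np.indices(shape)
    r = np.sqrt(sum((g - n // 2) ** 2 for g, n in zip(grids, shape)))
    return np.floor(r / bin_width).astype(int)


def cc_half_by_shell(vol_a, vol_b, mask=None, bin_width=1.0):
    """Pearson correlation between two half-data-set volumes in each radial shell.
    Shells with fewer than two valid voxels, or with no variation, give NaN."""
    bins = radial_bins(vol_a.shape, bin_width)
    valid = np.ones(vol_a.shape, bool) if mask is None else np.asarray(mask, bool)
    cc = np.full(bins.max() + 1, np.nan)
    for k in range(cc.size):
        sel = valid & (bins == k)
        a, b = vol_a[sel], vol_b[sel]
        if a.size >= 2 and a.std() > 0 and b.std() > 0:
            cc[k] = np.corrcoef(a, b)[0, 1]
    return cc


# Odd/even split of frame indices, analogous to odd_only / even_only.
frame_ids = np.arange(14772)
even_half, odd_half = frame_ids[frame_ids % 2 == 0], frame_ids[frame_ids % 2 == 1]
```

Because each half contains every other frame, both halves sample the same experimental conditions, so differences between the two volumes mainly reflect noise.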

Appendix B: $P_M$ and $P_S$ are projections

Eq. 3 for $P_{M}$ describes the rescaling of both the background and signal Fourier magnitudes by the square root of the ratio of measured to calculated intensities. The Fourier space modulus constraint requires that the calculated modulus $\sqrt{I_{\textrm{calc}}}$, with $I_{\textrm{calc}}$ defined in Eq. 2, equal the measured modulus $\sqrt{I_{\textrm{meas}}}$. At each voxel, $\sqrt{I_{\textrm{calc}}}$ is the length of a vector with three components: the real and imaginary parts of the Fourier transform of the electron density, and the background, which is allowed to vary independently. The constraint set is therefore the surface of a sphere whose radius equals the measured modulus. The projection of a general point $\{\operatorname{Re}(\mathcal{F}[\rho]), \operatorname{Im}(\mathcal{F}[\rho]), B\}$ onto this sphere is simply a rescaling of this 3-vector by the ratio of the measured modulus to the vector's length.
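The rescaling argument above translates directly into array code. The sketch below is a minimal illustration of such a modulus projection, assuming that $\mathcal{F}$ is numpy's unshifted `fftn`, that $B$ and $I_{\textrm{meas}}$ are real, non-negative arrays on the same FFT-ordered grid, and that an optional boolean mask marks the voxels whose intensities are enforced. It is not the code used for the paper. A zero-length vector has no unique nearest point on the sphere; assigning the whole modulus to the background there is one arbitrary but valid choice.

```python
import numpy as np


def project_modulus(rho, background, i_meas, measured=None):
    """Modulus projection: at each measured voxel, rescale the 3-vector
    (Re F[rho], Im F[rho], B) onto the sphere of radius sqrt(I_meas)."""
    f = np.fft.fftn(rho)
    if measured is None:
        measured = np.ones(f.shape, dtype=bool)
    length = np.sqrt(np.abs(f) ** 2 + background ** 2)
    target = np.sqrt(i_meas)
    # Ratio of measured modulus to current length; 1 (no change) where unmeasured.
    safe_length = np.where(length > 0, length, 1.0)
    scale = np.where(measured & (length > 0), target / safe_length, 1.0)
    f_new = f * scale
    b_new = background * scale
    # Degenerate zero vector: put the whole modulus into the background.
    b_new = np.where(measured & (length == 0), target, b_new)
    return np.fft.ifftn(f_new), b_new
```

Applying the function twice gives the same result as applying it once, as expected for a projection.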

The support projection applies different operations to the two halves of the iterate. For the electron density $\rho(\mathbf{x})$, the “voxel number” constraint states that at most $N$ voxels have non-zero density. Under a Euclidean metric, the projection onto this constraint set keeps the $N$ voxels with the largest absolute values and sets all others to zero. Note, however, that unlike the conventional fixed support constraint, this “voxel number” constraint on $\rho$ is non-convex. For the background volume $B(\mathbf{q})$, the constraint requires that the background be spherically symmetric; in other words, all voxels within the same radial bin must have the same value. The projection onto this set replaces the background in each voxel by its average over the corresponding radial shell.
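Both halves of the support projection can be sketched in a few lines: a top-$N$ selection for $\rho$, and a per-shell mean for $B$. The mean is the Euclidean projection onto "constant within each shell" because it minimizes the summed squared deviation within the shell. The sketch assumes radial bins are rounded $|q|$ values (in voxels) on numpy's unshifted FFT grid; the function names are hypothetical, and ties in $|\rho|$ are broken arbitrarily, reflecting the non-uniqueness of projections onto non-convex sets.

```python
import numpy as np


def fft_axis_freqs(n: int) -> np.ndarray:
    """Integer frequencies 0, 1, ..., -1 in numpy's unshifted FFT order."""
    return np.fft.ifftshift(np.arange(n) - n // 2)


def fourier_radial_bins(shape) -> np.ndarray:
    """Rounded |q| (in voxels) for every voxel of an unshifted FFT grid."""
    axes = np.meshgrid(*[fft_axis_freqs(n) for n in shape], indexing="ij")
    return np.rint(np.sqrt(sum(a ** 2 for a in axes))).astype(int)


def project_support(rho, background, n_voxels, bins):
    """Support projection.
    rho: keep the n_voxels largest |rho| values and zero the rest.
    background: replace each voxel by the mean over its radial bin."""
    flat = np.abs(rho).ravel()
    keep = np.zeros(flat.size, dtype=bool)
    if n_voxels > 0:
        kth = flat.size - n_voxels
        keep[np.argpartition(flat, kth)[kth:]] = True    # indices of the n largest
    rho_new = np.where(keep.reshape(rho.shape), rho, 0)
    idx = bins.ravel()
    means = np.bincount(idx, weights=background.ravel()) / np.maximum(np.bincount(idx), 1)
    return rho_new, means[bins]
```

`np.argpartition` avoids a full sort, which matters when the top-$N$ selection is repeated at every iteration on a large 3D volume.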

Appendix C: Iterative phasing details

This appendix contains some additional implementation details about the phase retrieval procedure described in Section 3. The code used to perform the reconstructions in this work is available at https://github.com/andyofmelbourne/3D-Phasing. The configuration file used is shown in Fig. 9.

Fig. 9. Configuration file used to reconstruct electron density from the 3D intensity distribution for all data sets.


As in the intensity reconstructions, the central speckle intensities were not considered trustworthy and were masked out up to a radius of 6 voxels from the center. This means that these voxels were left unmodified by the modulus projection $P_M$. In addition to this central region, a 7-voxel-thick shell at the edge of the sphere of reconstructed intensities was also masked out to avoid ringing artifacts caused by truncating the data partway through a speckle.
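The masking described above can be expressed as a boolean array of voxels whose intensities are enforced. The sketch below assumes an unshifted FFT grid with radii measured in voxels, treats "up to a radius of 6 voxels" as $|q| \le 6$ (the boundary convention is an assumption), and also treats voxels outside the intensity sphere of radius `r_max` as unmeasured. The parameter `r_max` is a hypothetical input, since its value is not given here.

```python
import numpy as np


def phasing_mask(shape, r_max, r_center=6, edge_width=7):
    """True where the measured intensity is enforced during the modulus projection.
    Excludes the central region (|q| <= r_center voxels), the outer shell of
    thickness edge_width inside the intensity sphere, and everything beyond r_max."""
    freqs = [np.fft.ifftshift(np.arange(n) - n // 2) for n in shape]  # integer FFT order
    axes = np.meshgrid(*freqs, indexing="ij")
    r = np.sqrt(sum(a ** 2 for a in axes))
    return (r > r_center) & (r <= r_max - edge_width)
```

Integer frequencies keep the radii exact, so voxels lying exactly on the 6-voxel or outer-shell boundaries are classified consistently.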

As mentioned in Section 3.2, the reconstructions from the different random starting guesses need to be aligned with respect to each other before averaging and calculating the PRTF. This is done in three steps. First, each volume is translated so that its center of mass is at the origin. Second, since the objects are in general assumed to be complex-valued, a global phase is removed by subtracting the mean phase over all voxels. Finally, to remove the central-inversion ambiguity, one solution (for convenience, the first) is taken as the reference. For each of the other solutions, the error with respect to the reference is calculated for both the original and the centrally inverted version, and the one with the lower error is retained.
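The three alignment steps can be sketched as follows. Several details are assumptions rather than the paper's code. Translation is done in whole voxels, placing the $|\rho|$-weighted center of mass at the array center (taken as the origin). The global phase is removed using the amplitude-weighted circular mean, $\arg\sum\rho$, instead of an arithmetic mean of phases, which avoids phase-wrapping problems. The inverted version is taken as $\rho^*(-\mathbf{x})$, the twin that has the same Fourier modulus as $\rho$ and reduces to plain inversion for real objects. The error is the L2 norm of the difference. All function names are hypothetical.

```python
import numpy as np


def center_of_mass_shift(obj):
    """Roll obj by whole voxels so its |obj|-weighted centre of mass sits at the array centre."""
    w = np.abs(obj)
    com = [(g * w).sum() / w.sum() for g in np.indices(obj.shape)]
    shifts = [int(np.rint(n // 2 - c)) for n, c in zip(obj.shape, com)]
    return np.roll(obj, shifts, axis=tuple(range(obj.ndim)))


def remove_global_phase(obj):
    """Remove the amplitude-weighted mean phase, angle(sum(obj))."""
    return obj * np.exp(-1j * np.angle(obj.sum()))


def normalise(obj):
    """Steps 1 and 2: centre the object, then remove its global phase."""
    return remove_global_phase(center_of_mass_shift(obj))


def align_to_reference(obj, ref):
    """Step 3: keep obj or its twin conj(obj(-x)), whichever is closer to ref."""
    ref_n = normalise(ref)
    candidates = [normalise(obj), normalise(np.conj(np.flip(obj)))]
    errors = [np.linalg.norm(c - ref_n) for c in candidates]
    return candidates[int(np.argmin(errors))]


def align_all(solutions):
    """Align every solution to the first one, which serves as the reference."""
    ref = solutions[0]
    return [normalise(ref)] + [align_to_reference(s, ref) for s in solutions[1:]]
```

Re-centering after the inversion keeps the comparison independent of how `np.flip` maps the array origin, and the aligned solutions can then be averaged directly.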

Funding

Basic Energy Sciences (DE-AC02-76SF00515); National Science Foundation BioXFEL award (STC-1231306); Helmholtz Association; Australian Research Council; European Research Council (ERC-2013-SyG 609920); Human Frontier Science Program (RGP0010/2017); Exzellenzclusters Entzündungsforschung (EXC 2056 - project ID 390715994).

Acknowledgments

We wish to thank the members of the Single Particle Imaging initiative at LCLS who provided valuable feedback on this work, including Ivan Vartanyants, John Spence and Max Rose.

Disclosures

The authors declare no conflict of interest.

References

1. R. Neutze, R. Wouts, D. van der Spoel, E. Weckert, and J. Hajdu, “Potential for biomolecular imaging with femtosecond X-ray pulses,” Nature 406(6797), 752–757 (2000).

2. K. Ayyer, T.-Y. Lan, V. Elser, and N. D. Loh, “Dragonfly: an implementation of the expand–maximize–compress algorithm for single-particle imaging,” J. Appl. Crystallogr. 49(4), 1320–1335 (2016).

3. N. D. Loh, M. J. Bogan, V. Elser, A. Barty, S. Boutet, S. Bajt, J. Hajdu, T. Ekeberg, F. R. N. C. Maia, J. Schulz, M. M. Seibert, B. Iwan, N. Timneanu, S. Marchesini, I. Schlichting, R. L. Shoeman, L. Lomb, M. Frank, M. Liang, and H. N. Chapman, “Cryptotomography: Reconstructing 3D Fourier intensities from randomly oriented single-shot diffraction patterns,” Phys. Rev. Lett. 104(22), 225501 (2010).

4. S. Kassemeyer, A. Jafarpour, L. Lomb, J. Steinbrener, A. V. Martin, and I. Schlichting, “Optimal mapping of x-ray laser diffraction patterns into three dimensions using routing algorithms,” Phys. Rev. E 88(4), 042710 (2013).

5. T. Ekeberg, M. Svenda, C. Abergel, F. R. N. C. Maia, V. Seltzer, J.-M. Claverie, M. Hantke, O. Jönsson, C. Nettelblad, G. van der Schot, M. Liang, D. P. DePonte, A. Barty, M. M. Seibert, B. Iwan, I. Andersson, N. D. Loh, A. V. Martin, H. Chapman, C. Bostedt, J. D. Bozek, K. R. Ferguson, J. Krzywinski, S. W. Epp, D. Rolles, A. Rudenko, R. Hartmann, N. Kimmel, and J. Hajdu, “Three-dimensional reconstruction of the giant mimivirus particle with an x-ray free-electron laser,” Phys. Rev. Lett. 114(9), 098102 (2015).

6. A. Aquila and A. Barty, “Single Molecule Imaging Using X-ray Free Electron Lasers,” in X-ray Free Electron Lasers, (Springer, 2018).

7. A. Aquila, A. Barty, C. Bostedt, S. Boutet, G. Carini, D. DePonte, P. Drell, S. Doniach, K. Downing, T. Earnest, H. Elmlund, V. Elser, M. Gahr, J. Hajdu, J. Hastings, S. Hau-Riege, Z. Huang, E. Lattman, F. Maia, S. Marchesini, A. Ourmazd, C. Pellegrini, R. Santra, I. Schlichting, C. Schroer, J. Spence, I. Vartanyants, S. Wakatsuki, W. Weis, and G. Williams, “The linac coherent light source single particle imaging road map,” Struct. Dyn. 2(4), 041701 (2015).

8. A. Munke, J. Andreasson, A. Aquila, S. Awel, K. Ayyer, A. Barty, R. J. Bean, P. Berntsen, J. Bielecki, S. Boutet, M. Bucher, H. N. Chapman, B. J. Daurer, H. DeMirci, V. Elser, P. Fromme, J. Hajdu, M. F. Hantke, A. Higashiura, B. G. Hogue, A. Hosseinizadeh, Y. Kim, R. A. Kirian, H. K. N. Reddy, T.-Y. Lan, D. S. D. Larsson, H. Liu, N. D. Loh, F. R. N. C. Maia, A. P. Mancuso, K. Mühlig, A. Nakagawa, D. Nam, G. Nelson, C. Nettelblad, K. Okamoto, A. Ourmazd, M. Rose, G. van der Schot, P. Schwander, M. M. Seibert, J. A. Sellberg, R. G. Sierra, C. Song, M. Svenda, N. Timneanu, I. A. Vartanyants, D. Westphal, M. O. Wiedorn, G. J. Williams, P. L. Xavier, C. H. Yoon, and J. Zook, “Coherent diffraction of single rice dwarf virus particles using hard x-rays at the linac coherent light source,” Sci. Data 3(1), 160064 (2016).

9. H. K. Reddy, C. H. Yoon, A. Aquila, S. Awel, K. Ayyer, A. Barty, P. Berntsen, J. Bielecki, S. Bobkov, M. Bucher, G. A. Carini, S. Carron, C. Henry, D. Benedikt, H. DeMirci, T. Ekeberg, P. Fromme, J. Hajdu, M. F. Hantke, P. Hart, B. G. Hogue, A. Hosseinizadeh, Y. Kim, R. A. Kirian, R. P. Kurta, D. S. Larsson, N. Loh, F. R. N. C. Maia, A. P. Mancuso, K. Mahlig, A. Munke, D. Nam, C. Nettelblad, A. Ourmazd, M. Rose, P. Schwander, M. Seibert, J. A. Sellberg, C. Song, J. C. Spence, M. Svenda, G. Van der Schot, I. A. Vartanyants, G. J. Williams, and P. Xavier, “Coherent soft x-ray diffraction imaging of coliphage PR772 at the linac coherent light source,” Sci. Data 4(1), 170079 (2017).

10. N.-T. D. Loh and V. Elser, “Reconstruction algorithm for single-particle diffraction imaging experiments,” Phys. Rev. E 80(2), 026705 (2009).

11. K. R. Ferguson, M. Bucher, J. D. Bozek, S. Carron, J.-C. Castagna, R. Coffee, G. I. Curiel, M. Holmes, J. Krzywinski, M. Messerschmidt, M. Minitti, A. Mitra, S. Moeller, P. Noonan, T. Osipov, S. Schorb, M. Swiggers, A. Wallace, J. Yin, and C. Bostedt, “The atomic, molecular and optical science instrument at the linac coherent light source,” J. Synchrotron Radiat. 22(3), 492–497 (2015).

12. F. R. Maia, “The coherent x-ray imaging data bank,” Nat. Methods 9(9), 854–855 (2012).

13. C. H. Yoon, P. Schwander, C. Abergel, I. Andersson, J. Andreasson, A. Aquila, S. Bajt, M. Barthelmess, A. Barty, M. J. Bogan, C. Bostedt, J. Bozek, H. N. Chapman, J.-M. Claverie, N. Coppola, D. P. DePonte, T. Ekeberg, S. W. Epp, B. Erk, H. Fleckenstein, L. Foucar, H. Graafsma, L. Gumprecht, J. Hajdu, C. Y. Hampton, A. Hartmann, E. Hartmann, R. Hartmann, G. Hauser, H. Hirsemann, P. Holl, S. Kassemeyer, N. Kimmel, M. Kiskinova, M. Liang, N.-T. D. Loh, L. Lomb, F. R. N. C. Maia, A. V. Martin, K. Nass, E. Pedersoli, C. Reich, D. Rolles, B. Rudek, A. Rudenko, I. Schlichting, J. Schulz, M. Seibert, V. Seltzer, R. L. Shoeman, R. G. Sierra, H. Soltau, D. Starodub, J. Steinbrener, G. Stier, L. Strüder, M. Svenda, J. Ullrich, G. Weidenspointner, T. A. White, C. Wunderer, and A. Ourmazd, “Unsupervised classification of single-particle x-ray diffraction snapshots by spectral clustering,” Opt. Express 19(17), 16542–16549 (2011).

14. V. Elser, “Phase retrieval by iterated projections,” J. Opt. Soc. Am. A 20(1), 40–55 (2003).

15. J. R. Fienup, “Reconstruction of an object from the modulus of its Fourier transform,” Opt. Lett. 3(1), 27–29 (1978).

16. R. P. Kurta, J. J. Donatelli, C. H. Yoon, P. Berntsen, J. Bielecki, B. J. Daurer, H. DeMirci, P. Fromme, M. F. Hantke, F. R. N. C. Maia, A. Munke, C. Nettelblad, K. Pande, H. K. N. Reddy, J. A. Sellberg, R. G. Sierra, M. Svenda, G. van der Schot, I. A. Vartanyants, G. J. Williams, P. L. Xavier, A. Aquila, P. H. Zwart, and A. P. Mancuso, “Correlations in scattered x-ray laser pulses reveal nanoscale structural features of viruses,” Phys. Rev. Lett. 119(15), 158102 (2017).

17. M. Rose, S. Bobkov, K. Ayyer, R. P. Kurta, D. Dzhigaev, Y. Y. Kim, A. J. Morgan, C. H. Yoon, D. Westphal, J. Bielecki, J. A. Sellberg, G. Williams, F. R. Maia, O. M. Yefanov, V. Ilyin, A. P. Mancuso, H. N. Chapman, B. G. Hogue, A. Aquila, A. Barty, and I. A. Vartanyants, “Single-particle imaging without symmetry constraints at an x-ray free-electron laser,” IUCrJ 5(6), 727–736 (2018).

18. R. Henderson, A. Sali, M. L. Baker, B. Carragher, B. Devkota, K. H. Downing, E. H. Egelman, Z. Feng, J. Frank, N. Grigorieff, W. Jiang, S. J. Ludtke, O. Medalia, P. A. Penczek, P. B. Rosenthal, M. G. Rossmann, M. F. Schmid, G. F. Schröder, A. C. Steven, D. L. Stokes, J. D. Westbrook, W. Wriggers, H. Yang, J. Young, H. M. Berman, W. Chiu, G. J. Kleywegt, and C. L. Lawson, “Outcome of the first electron microscopy validation task force meeting,” Structure 20(2), 205–214 (2012).

19. D. Shapiro, P. Thibault, T. Beetz, V. Elser, M. Howells, C. Jacobsen, J. Kirz, E. Lima, H. Miao, A. M. Neiman, and D. Sayre, “Biological imaging by soft x-ray diffraction microscopy,” Proc. Natl. Acad. Sci. 102(43), 15343–15346 (2005).

20. S. Marchesini, H. N. Chapman, A. Barty, M. R. Howells, J. H. Spence, C. Cui, U. Weierstall, and A. M. Minor, “Phase aberrations in diffraction microscopy,” IPAP Conf. Series 7, 380–382 (2006).

21. M. Van Heel and M. Schatz, “Fourier shell correlation threshold criteria,” J. Struct. Biol. 151(3), 250–262 (2005).

22. H. T. Philipp, K. Ayyer, M. W. Tate, V. Elser, and S. M. Gruner, “Solving structure with sparse, randomly-oriented x-ray data,” Opt. Express 20(12), 13129–13137 (2012).

23. K. Ayyer, H. T. Philipp, M. W. Tate, J. L. Wierman, V. Elser, and S. M. Gruner, “Determination of crystallographic intensities from sparse data,” IUCrJ 2(1), 29–34 (2015).

Photon fraction	Photons per frame	Frames	CC$_{1/2}$	PRTF	FSC
$1$	34 783.2	14 772	9.02	10.19	8.75
$1 / 2$	17 349.3	14 772	9.16	9.16	8.75
$1 / 4$	8674.5	14 772	9.33	9.16	8.75
$1 / 8$	4337.3	14 772	9.50	9.33	8.75
$1 / 16$	2168.6	14 772	9.69	9.33	8.75
$1 / 32$	1084.3	14 772	9.69	9.33	8.75
$1 / 64$	542.2	14 772	11.2	9.50	8.75
$1 / 128$	271.0	14 772	11.2	9.50	8.75
$1 / 256$	135.5	14 772	11.4	10.9	8.75
$1 / 512$	67.8	14 772	11.7	10.9	8.75
$1 / 1024$	33.9	14 772	20.1	11.2	8.75



Optics Express