## Abstract

We evaluate the retrieval performance of the automated, unsupervised inversion algorithm, Tikhonov Advanced Regularization Algorithm (TiARA), which is used for the autonomous retrieval of microphysical parameters of anthropogenic and natural pollution particles. TiARA (version 1.0) has been developed in the past 10 years and builds on the legacy of a data-operator-controlled inversion algorithm used since 1998 for the analysis of data from multiwavelength Raman lidar. The development of TiARA has been driven by the need to analyze in (near) real time large volumes of data collected with NASA Langley Research Center’s high-spectral-resolution lidar (HSRL-2). HSRL-2 was envisioned as part of the NASA Aerosols-Clouds-Ecosystems mission in response to the National Academy of Sciences (NAS) Decadal Study mission recommendations 2007. TiARA could thus also serve as an inversion algorithm in the context of a future space-borne lidar. We summarize key properties of TiARA on the basis of simulations with monomodal logarithmic-normal particle size distributions that cover particle radii from approximately 0.05 μm to 10 μm. The real and imaginary parts of the complex refractive index cover the range from non-absorbing to highly light-absorbing pollutants. Our simulations include up to 25% measurement uncertainty. The goal of our study is to provide guidance with respect to technical features of future space-borne lidars, if such lidars will be used for retrievals of microphysical data products, absorption coefficients, and single-scattering albedo. We investigate the impact of two different measurement-error models on the quality of the data products. We also obtain for the first time, to the best of our knowledge, a statistical view on systematic and statistical uncertainties, if a large volume of data is processed. Effective radius is retrieved to 50% accuracy for 58% of cases with an imaginary part up to $0.01i$ and up to 100% of cases with an imaginary part of $0.05i$. Similarly, volume concentration, surface-area concentration, and number concentrations are retrieved to 50% accuracy in 56%–100% of cases, 99%–100% of cases, and 54%–87% of cases, respectively, depending on the imaginary part. The numbers represent measurement uncertainties of up to 15%. If we target 20% retrieval accuracy, the numbers of cases that fall within that threshold are 36%–76% for effective radius, 36%–73% for volume concentration, 98%–100% for surface-area concentration, and 37%–61% for number concentration. That range of numbers again represents a spread in results for different values of the imaginary part. At present, we obtain an accuracy of (on average) 0.1 for the real part. A case study from the ORCALES field campaign is used to illustrate data products obtained with TiARA.

Published by The Optical Society under the terms of the Creative Commons Attribution 4.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

## 1. INTRODUCTION

NASA’s Aerosol-Clouds-Ecosystem (ACE) mission (https://acemission.gsfc.nasa.gov/whitepapers.html) called for the development of active and passive remote sensors for investigating climate forcing by aerosol particles and clouds. One of the envisaged instruments is a multiwavelength high-spectral-resolution lidar (HSRL), which in its most sophisticated configuration would be a system that provides profiles of particle backscatter coefficients and particle linear-depolarization ratios at 355 nm, 532 nm, and 1064 nm, and extinction coefficients at 355 nm and 532 nm. NASA Langley Research Center has been developing two prototype HSRL instruments, dubbed HSRL-1 and HSRL-2, since 2000. These instruments are flown on aircraft and serve as research instruments and testing platforms for a potential space-borne HSRL.

In this contribution, we present a summary of the performance characteristics of the Tikhonov Advanced Regularization Algorithm (TiARA). To our knowledge, TiARA is the worldwide first autonomous algorithm that allows for the automated, unsupervised retrieval of particle size distributions (PSDs) and their microphysical particle parameters based on the analysis of multiwavelength HSRL data. TiARA works with inversion with Tikhonov’s regularization [1]. TiARA allows us to infer uncertainty bars (in terms of bias and noise) from measurements of atmospheric aerosols with HSRL-2. In this first stage of the algorithm development, we have focused on optimizing TiARA for the case of spherical particle geometry, mainly for the lack of a light-scattering model that allows for a trustworthy description of light-scattering at 180° of non-spherical particles. Thus, we analyzed backscatter coefficients only at 355 nm, 532 nm, and 1064 nm, and extinction coefficients at 355 nm and 532 nm, which we denote as “$3\beta +2\alpha $” or “3 + 2” configuration.

The software has been optimized for fast data processing to achieve real-time data analysis with regard to the duration of lidar measurements. The exact data processing times depend on the number of profiles and the number of individual points within a profile. Number of profiles and data points depend on the specifications of the used laser system, e.g., laser power and laser-pulse repetition rate, which influence signal-to-noise ratios of the input optical data, as well as other factors such as lidar return-signal averaging times and height of the observed atmospheric column.

TiARA has been developed on the basis of our experience we have collected since 1997 [2] with our data-operator-controlled (manual) version of the inversion algorithm, e.g., [3–7]. Additional ideas and concepts were presented in, e.g., [8–10]. The initial main driver of developments of multiwavelength Raman lidar and lidar-data inversion methodology was the European Aerosol Research LIdar NETwork (EARLINET) [11] since the late 1990s. Data-analysis concepts were developed through working with manually operated inversion algorithms. These concepts have been included and automated in TiARA. Additional concepts had to be developed for TiARA to ensure reliable data products of high quality.

The general result of simulation studies carried out with the data-operator version of the inversion algorithm pointed towards 20% as the threshold for acceptable measurement uncertainty for retrieving meaningful values of particle effective radius, volume concentration, and surface-area concentration [8–10,12]. *Meaningful result* in that regard was defined as values that deviate less than 50% from the true values. Time-consuming manual analysis of the inversion results allowed us to identify and remove outliers in the final solution spaces. We present for the first time a detailed analysis of the retrieval performance with regard to number concentration.

This paper focuses on the main performance features of TiARA based on an extensive study of results obtained with synthetic optical input data. We obtain, in part, significantly improved performance compared to the data-operator-controlled inversion algorithm in the case of PSD size parameters. Retrievals of high-quality complex refractive indices remain a challenge and will require more sophisticated approaches. First steps in that direction have recently been published [13–16]. We discuss the concept of uncertainty analysis we developed in the context of TiARA, which includes separating bias (accuracy) and statistical noise (precision) of the inferred data products.

We further note that first tests of the performance of TiARA with regard to experimental data focused on comparisons to airborne measurements of the same data products delivered by *in situ* instruments during two major field activities, i.e., the Two Column Aerosol Project [TCAP; https://www.arm.gov/research/campaigns/amf2012tcap] and Deriving Information on Surface conditions from Column and Vertically Resolved Observations Relevant to Air Quality [(DISCOVER-AQ; https://discover-aq.larc.nasa.gov/]. Publications by [17,18] discuss in detail the quality of the data products of TiARA and point to the (in part) excellent performance characteristics of TiARA with regard to parameters of particle size distributions, i.e., effective radius and volume and surface-area concentrations.

In Section 2, we present the theoretical background of TiARA. Results of simulations with synthetic data show the main features of TiARA in Section 3. In Section 4, we show a measurement example taken with HSRL-2. We summarize the main results in Section 5. The Appendix provides additional background information on the simulations.

## 2. INVERSION ALGORITHM

We summarize the general theory of the inversion methodology. We have published numerous papers on the manual inversion in the past 20 years. A summary of that work can be found in Ref. [19]. In the following text, we introduce a coherent nomenclature that will be used in future publications. We explain the new features that were introduced in the autonomous inversion, e.g., optimized look-up tables, fast data processing, and tools that are used in uncertainty analysis.

#### A. Theory

We solve Fredholm integral equations of the first kind [8,10,19,20] in the inversion of the multiwavelength HSRL and multiwavelength Raman lidar data. In generalized terms, these equations can be written as

The terms ${K}_{\beta}(m,r,\lambda ,s)$ and ${K}_{\alpha}(m,r,\lambda ,s)$ describe the backscatter and extinction kernel functions, respectively, which depend on the particles’ complex refractive index (CRI) $m={m}_{\mathrm{R}}-i\xb7{m}_{\mathrm{I}}$ (${m}_{\mathrm{R}}$ denotes the real part, $i\xb7{m}_{\mathrm{I}}$ denotes the imaginary part), on the radius $r$ of the particles, as well as on their shape properties $s$. In the following discussion, we consider only spherical particle geometry. The kernel functions ${K}_{l}(m,r,{\lambda}_{i},s)$ can be calculated from Mie-scattering theory [21]. The term ${f}^{*}(r)$ describes the true particle size distribution expressed as the number of particles per unit volume between particle radius $r$ and $r+\mathrm{d}r$. For easier reading, the reference to the shape property $s$ will be omitted, as we do not need to introduce a numerical value for it.

The modified version of Tikhonov’s inversion algorithm is explained in detail by [8,10,12,19,20,22]. Briefly, Eq. (1) can be rewritten in the following form:

Equation (2) has to be solved numerically [1,23–25] for the investigated size distribution ${f}^{*}(r)$. The expression ${f}^{*}(r)$ is approximated by a sum that consists of the following linear superposition of base functions:

The term $f(r)$ is an approximation of the solution of Eq. (2). The expression ${f}_{j}$ describes the constants or weight coefficients, and ${B}_{j}(r)$ denotes base functions that, in our case, have a triangular shape on a logarithmic radius scale [12,20]. These functions have shown to work sufficiently well in approximating PSDs of monomodal and bimodal shapes [9,10,12]. ${\mathrm{N}}_{\mathrm{B}}$ is the number of base functions. In our algorithm, ${\mathrm{N}}_{\mathrm{B}}$ is always equal to or exceeds the number of optical data. A different concept is followed by [8,9], who prefer to keep the number of base functions nearly equal to the number of input optical data.Figure 1 shows the concept of the reconstruction of the investigated PSDs. Details are given by [12,20]. Briefly, we use base functions of triangular shape that are distributed equidistant on a logarithmic radius grid between 10 nm and (in its latest version) 20 μm. The values ${f}_{j}$ are the weight factors that we obtain in Eq. (3). The weight factors allow us to reconstruct the true PSD in an approximate way. The differences in the reconstruction quality depend on a large number of factors, e.g., measurement uncertainties; if the true PSD is mono- or bimodal; if the main portion (maximum) of the PSD is located in the size range of Aitken mode particles (radius range below 50 nm) or accumulation mode particles (radius range approximately between 50 nm and 500 nm) or coarse mode (particle radius is larger than 500 nm); the fact that the integration limits (minimum and maximum particle radii) are not known; the possibility to reconstruct particle size distributions with base functions of triangular shape; the number of base functions used in the reconstruction process; or the value of the complex refractive index of the particles and whether it depends on particle size and/or measurement wavelength.

The solution carries the mathematical residual error $\u03f5(r)$, which is caused by the discretization of the true PSD with the chosen linear combination of base functions.

Using Eqs. (2) and (3), we can write the optical data as linear combination, i.e.,

The expressions ${A}_{pj}$ and ${\u03f5}_{p}$ are calculated from the kernel functions, base functions, and mathematical residual errors as andIn the following transformations, it is convenient to rewrite Eq. (7) as the vector–matrix equation

The unknown vector of the weight coefficients $\mathbf{f}$ is connected by the matrix operator $\mathbf{A}$ with the optical data vector $\mathbf{g}$. This vector contains the error vector $\mathrm{\Delta}\mathbf{g}$, which includes in an additive way the measurement error and the mathematical residual errors.The operator $\mathbf{A}$ is nearly singular (we are dealing with an ill-conditioned operator); see, e.g., [26]. If we denote the transposed of matrix $\mathbf{A}$ as ${\mathbf{A}}^{\mathrm{T}}$ and ${({\mathbf{A}}^{\mathrm{T}}\mathbf{A})}^{-1}$ denotes the inverse of matrix ${\mathbf{A}}^{\mathrm{T}}\mathbf{A}$, we obtain the simple solution of Eq. (8) for the weight factors, i.e.,

Equation (9) leads to nonphysical or unstable solutions [26] because the mathematical problem is ill posed. For that reason, we consider the following equation instead of Eq. (8) [1,26]:The matrix $\mathbf{C}={\mathbf{A}}^{\mathrm{T}}\mathbf{A}+\gamma \mathbf{H}$ is symmetrical. We can apply diagonalization, e.g., [27] and find eigenvectors and eigenvalues.

The above-mentioned near singularity of the operator $\mathbf{A}$ means that the smallest eigenvalue of matrix ${\mathbf{A}}^{\mathrm{T}}\mathbf{A}$ is much smaller that its largest eigenvalue—10 orders of magnitude and even more. More information can be found in Ref. [28]. Regularization allows us to increase the minimum eigenvalue of matrix $\mathbf{C}$ by several orders of magnitude. In this way the condition number of matrix $\mathbf{C}$ decreases, and the solution of Eq. (10) becomes more stable than the solution of Eq. (9).

#### B. Mathematical Tools and Methods Used in the Automated Unsupervised Algorithm

Finding the solution requires the application of several constraints. These constraints stabilize the inversion problem and help us reject mathematical solutions that are physically not reasonable. We use the simplifying assumption of a wavelength- and size-independent complex refractive index of the aerosol particles. The effect of this constraint on the quality of the retrieval products will be investigated in future studies. We assume that retrieving the wavelength dependence of the CRI likely will require that we combine data from active and passive remote sensing. The application of the gliding-inversion window and the grid of *a priori* chosen CRI values are the constraints that will be explained in the following subsections.

### 1. Inversion Window and Grid of CRI

This section serves as an introduction and motivation for identifying the solution space. The methodology that allows us to find the solution space is described in detail in Section 2.D.

The rationale for using a gliding inversion window is given by [12,20]. In summary, the precise position of the investigated particle size distribution with regard to the particle size range is unknown unless supplementary information, e.g., from other particle size distribution measurements, is known. We solve this challenge by using a gliding inversion window. This window consists of the linear combination of base functions of variable positions along the investigated size range and variable width; see, e.g., [20].

Historically, we started out with 50 inversion windows [3,20]. We find that the quality of the retrieval can be improved if we use more inversion windows on a smaller radius search grid. The automated algorithm uses 91 inversion windows on the basis of an optimized databank of look-up kernel files for the processing of each 3 + 2 dataset.

Figure 2 demonstrates the simplified example of how the set of inversion windows is formed in the automated inversion algorithm.

This concept requires us to predefine two sets consisting of several minimal and maximal particle radii. The two sets we used in our simulations might have an insignificant overlap. The top part of Fig. 2 shows a qualitative example of three entries in both minimal and maximal radii sets. The given example results in nine inversion windows because each of the three minimal radii ${r}_{\mathrm{min}}$ forms a separate inversion window $[{r}_{\mathrm{min}}^{(k)},{r}_{\mathrm{max}}^{(k)}]$ with each of the three maximal radii ${r}_{\mathrm{max}}$ [see Fig. 2]. The linear size of the inversion window, i. e., the difference ${r}_{\mathrm{max}}^{(k)}-{r}_{\mathrm{min}}^{(k)}$, in our simulations is always more than 0.38 μm. We exclude the narrow inversion windows because we consider them to be nonphysical.

We discretize the CRI search space; see Fig. 2, shaded area. The real part of CRI, ${m}_{\mathrm{R}}$, ranges from 1.325 to 1.8 with step 0.025. The imaginary part, ${m}_{\mathrm{I}}$, varies from 0 to $0.1i$ with step $0.003i$. We plan to extend the CRI search grid in our future studies. The real part of CRI, ${m}_{\mathrm{R}}$, will cover the range from 1.2 to 2 with step 0.025 [see Fig. 2].

In this way, we arrive at a set of individual mathematical solutions. Each solution number $k$ within the complete set of solutions is characterized by an inversion window $[{r}_{\mathrm{min}}^{(k)},{r}_{\mathrm{max}}^{(k)}]$, by a CRI value ${m}^{(k)}={m}_{\mathrm{R}}^{(k)}-{m}_{\mathrm{I}}^{(k)}i$, by a vector ${\mathbf{f}}^{(k)}$ of weightcoefficients, and by a vector ${\mathbf{g}}^{(k)}$ of backcalculated optical data:

We calculate what we describe as the modified discrepancy vector [8] using the vectors ${\mathbf{f}}^{(k)}$ and ${\mathbf{g}}^{(k)}$. We obtain and the normalized discrepancy valueThis expression does not take account of measurement uncertainties. We expect improved retrieval results if we take account of this information, too. We are working on an improved uncertainty analysis scheme.

### 2. Optimized Look-Up Tables

The most time-consuming procedure during the microphysical parameters retrieval is the calculation of the values of the matrix $\mathbf{A}$ with the help of Eq. (5). According to our estimations, the direct computation of the light-scattering kernel functions ${K}_{p}(m,r)$ consumes more than 90% of the whole inversion time budget.

In order to reduce computation time, we improved the efficiency of our algorithm by using look-up tables (LUTs). The elements ${K}_{p}(m,r)$ of the LUTs are pre-computed for the radius range $r$ defined by the gliding inversion window and the grid of complex refractive indices.

The automated algorithm uses an upgraded version of an LUT that was used in the manual version of the algorithm [12,20], which we denote as the “optimized databank.” This optimized databank contains the pre-integrated values of the matrix $\mathbf{A}$.

The optimized LUT is based on the following numerical properties of Eq. (5):

Our three optimized LUTs were created using Eq. (15) for the number, Eq. (16) for the surface area, and Eq. (17) for the volume kernel functions [12,20]. Each LUT consists of four separate files that contain the results of the integration of the left sides of Eqs. (15)–(17) for the corresponding backscatter, extinction, absorption, and scattering kernel functions. The reference wavelength $\lambda $ is always equal to 355 nm. We split the covered range of particle radii from 10 nm to 10 μm into 71 logarithmical equidistant grid bins ${r}_{1},\dots ,{r}_{71}$. These grid bins allow us to form 69 integration intervals $[{r}_{{j}_{1}},{r}_{{j}_{1}+2}]$, where ${j}_{1}=1,\dots ,69$. Each interval has three grid bins ${r}_{{j}_{1}}$, ${r}_{{j}_{1}+1}$, and ${r}_{{j}_{1}+2}$ that are used to create a triangular base function ${b}_{{j}_{1}}(r)$ [see Fig. 3], and then carry out the integration with the corresponding kernel function following the left sides of Eqs. (15)–(17). The complex refractive index $m$ ranges in the LUTs from 1.2 to 2 with step 0.005 for the real part (${m}_{\mathrm{R}}$), and from 0 to 0.5 with step 0.0005 for the imaginary part (${m}_{\mathrm{I}}$).

For the practical purposes of data inversion, we normally form each of the ${\mathrm{N}}_{B}$ triangular base functions ${B}_{j}(r)$ [see Eqs. (4)–(5)] by using an odd number of optimized LUT base functions ${b}_{{j}_{1}}(r)$. Figure 3 shows a qualitative example where three functions ${b}_{{j}_{1}}(r)$ were used to build one triangular base function ${B}_{j}(r)$ that mathematically can be expressed as

The term ${B}_{j}(r)$ in Eqs. (5) and (18) should be treated as $\widehat{B}(r)$on the right-hand side of one of the Eqs. (15)–(17). Thus, the result of the right-side integration of any of the Eqs. (15)–(17) can be linearly expressed through the left-side values that were pre-calculated and stored in the optimized LUT. The other desired triangular base functions ${B}_{j}(r)$ can be formed using linear combinations of the LUT base functions ${b}_{1}(r),\dots ,{b}_{69}(r)$, which allows us to use the optimized LUT in a wide range of practical situations.Another difference to the original version of the LUT used by [12,20] is that we use LUTs not only for backscatter and extinction but also for absorption and scattering coefficients. Backscatter and extinction kernels are used for the inversion itself. We introduced absorption and scattering elements in order to speed up the computations of absorption and scattering coefficients, and single-scattering albedo from the retrieved microphysical parameters. The optimized data bank can be stored in the operational memory of a personal computer, which allows us to further reduce data processing time. Storage of the original LUT that is used for the manual version of the algorithm is done on a harddrive.

Another new feature of the optimized LUT is that intermediate inversion windows and/or CRIs can be used. We estimate these intermediate values from the entries of the matrices $\mathbf{A}$ by linear interpolation.

### 3. Parallelized Routines

The inversion of lidar data allows for efficient parallelization of the computation steps. We discretize the search space in terms of CRI and inversion windows, which allows us to compute a finite set of input parameters. This set of input parameters forms the set of mathematical solutions.

The analysis of Tikhonov’s regularization curve, computation of the weight coefficients of the base functions, and computation of the microphysical parameters and their uncertainty bars can be done for each individual solution independently of each other. This feature allows us to split the inversion calculations into independent computational threads, each one computing the mathematical solutions of the subset that has been selected for that specific thread.

Within one application, each thread can be launched on a separate core of a multi-core shared-memory system. According to our experience, the time of calculations for such a system decreases almost linearly with increasing number of used cores. For example, if the inversion code runs in parallel mode on a 64-core server using all available cores, the computation time decreases by a factor 63 compared to computations with a single-core system. We implemented the software on an AMD Opteron server. It has four CPUs, 64 integer cores, and 32 floating point cores, and the processing frequency is 2.8 GHz. It takes slightly less than 6 s to analyze one set of 3 + 2 data, including a full uncertainty analysis, as described in Section 2.C. For comparison, we previously used a laptop with one Intel CPU, four cores, eight threads, and 2.4 GHz processing frequency. It took a little more than 22 s for a full analysis of one 3 + 2 dataset.

Further increases of data processing speed are possible if we increase the number of cores. The speed of the microprocessors will increase in the future. We will also further improve the use of mathematical algorithms in order to increase the speed of the inversion algorithm. Our code also can be compiled and executed using standard message passing interface (MPI) libraries. The improvement of performance with the number of involved cores in this case will strongly depend on the features of the hardware of employed distributed-core systems.

Finally, we can also increase the data processing speed if we use important auxiliary information that can be provided by multiwavelength HSRL or Raman lidar. For example, aerosol typing [29] is one way of reducing the search space in data inversion, which automatically increases the data processing speed.

We achieved one of the main goals of our work set out several years ago. We are able to carry out real-time data analysis of airborne operation of HSRL-2 in the sense that there is no difference between the flight time of a research flight and the time it takes to deliver the final microphysical particle data products.

#### C. Uncertainty Analysis

### 1. General Remarks

Error computation is done in a numerical fashion, i.e., we invert the mean value of the optical dataset, and then we distort the optical data within the error bars. This procedure is time consuming and affected by conceptual uncertainties. We will explore these concepts in future work.

The errors that we show in the results section are the combination of uncertainties resulting from uncertainties of the optical data and uncertainties/errors created by the inversion algorithm. We are working on concepts that will allow us to separate the uncertainties caused by the mathematically induced errors of the inversion algorithm and the uncertainties of the optical data. Mathematically induced errors are, e.g., choice of the grid of the CRI, choice of number and position of the inversion windows, choice of shape of the base functions, assumption that the CRI can be kept wavelength and size independent in the retrievals, correct choice of the regularization parameter, and choice of regularization method. The error induced by the choice of the grid of the CRI presents a special challenge in uncertainty analysis. The CRI in itself is an unknown quantity in the data inversion scheme.

In this first version of TiARA, we analyze the uncertainty caused by selecting only those individual inversion results that are defined by the minimum of the regularization curve [12,20]. We are aware that this minimum of the regularization curve may not always provide us with the best possible inversion result of the investigated PSDs.

It is a challenge to process measurement uncertainties in data inversion in a realistic manner. Measurement uncertainties do not propagate during the inversion process in a way that allows for using an analytical expression of the effect of measurement uncertainties on the final data products. Small measurement uncertainties may lead to large uncertainties of the final data products.

We know of only a few studies in which the authors investigated the effect of uncertainties in individual measurement channels on the total uncertainties [30]. The authors found that uncertainties are additive with respect to the retrieval scheme they use. [31] used the methodology of optimal estimation to illustrate the propagation of measurement uncertainties in a 3 + 2 lidar system. The authors confirm that the mathematical system of equations is underdetermined and contains approximately four independent pieces of information. The authors further find that their retrieval procedure reduces the uncertainty with respect to the *a priori* value for all five retrieved variables. In particular, the comparison of the retrieved data products to the desired target parameters is best for the size distribution parameters and worst for the complex refractive index. [32] explored the limitations of the retrievals with regard to particle size. Preliminary results indicated significantly increased retrieval uncertainties for particle radii below $\approx 50\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{nm}$ (see Tables 1–4 in Ref. [32]). These results are strongly corroborated by the study of [31], who demonstrate that limited sensitivity to particle radii smaller than 50 nm leads to greatly increased uncertainties unless the retrieval size range is constrained to avoid them.

One of the characteristics of ill-posed inverse problems is that a small amount of uncertainties or even the absence of uncertainties in the input data can lead to large uncertainties/errors of the final data products. There remain questions of how to treat measurement uncertainties in uncertainties analysis, how the error distribution function affects the quality of the inversion results, and how big measurement uncertainties can be until the inversion results become meaningless. Studies published in mathematical literature often use unrealistically low uncertainties, i.e., input data uncertainties often do not exceed 1%.

A description of how to treat measurement uncertainties in the retrieval process and uncertainty analysis based on the optimal estimation approach is given in Ref. [33]. [31] does not carry out an optimal estimation retrieval, but the uncertainty analysis used the optimal estimation method. Table 4 in Ref. [31] shows, on the basis of results for three different sets of measurement uncertainties of different sizes, how big measurement uncertainties can be. We currently work on an improved uncertainty analysis scheme and will consider in that study also the techniques used in the Generalized Aerosol retrieval from Radiometer and LIdar Combined data (GARLIC) and LIdar-Radiometer Inversion Code (LIRIC) algorithms, e.g., [34].

The formalism that describes accuracy and precision can be written as

### 2. Error Models

We use a scheme we denote as “extreme-error model” (EEM), which has been used for the manual version of the inversion method in the past 20 years [3]. This method allows us to compute the uncertainty of the data products on the basis of a very simple assumption: if the slope of the extinction-related Ångström and/or backscatter-related Ångström exponent changes, the microphysical parameters also change, though not necessarily in a proportional way. We are aware that this extreme-error scheme comes with certain flaws. For example, it is an intuitive model in which we assume that extreme distortions of the spectra of backscatter and extinction coefficients result in proportional distortions of the solution space of microphysical parameters. This may not always be true, and we lack in a theoretical model that would help us in considering improvements. However, the model has been used with rather good success in the past years, e.g., [35], and currently is our preferred model for uncertainty analysis.

Because of this situation, we introduced for the first time another model that could be used in future uncertainty analysis. We call it the “Gauss-distributed-error model” (GEM). The uncertainty bars have a Gaussian-curve-like shape. We started exploring if such a model could provide us with a mathematically sound uncertainty analysis and even replace EEM in future versions of TiARA. Both uncertainty-analysis methods will be explained in the following.

Figure 4 is a graphical display of the two error models used in the simulation studies. Aside from the traditional way of expressing the uncertainty of the retrieval products in terms of error bars of one standard deviation, we also explored whether it is possible to split the uncertainty bars into accuracy (bias) and precision (noise). A first set of results on the accuracy of the derived parameters can be obtained from the statistical analysis of the simulation results; see Section 3.C. We will further develop this concept of accuracy and precision in future versions of TiARA.

In the case of the EEM, we used a fixed error level for each of the five measurement channels. Table 1 shows the basic principle of this method. We used in our simulations uncertainty values of 5%, 10%, 15%, 20%, and 25%; see Table 5.

Figure 4 (top) shows an example of the distortions for the case of using two extinction coefficients in the EEM. The number of combinations of the extinction input data is ${3}^{2}$. We consider the three options of correct value, maximum underestimation, and maximum overestimation for a given uncertainty, respectively. In the case of the three backscatter coefficients, ${3}^{3}$ combinations are possible. Figure 4 (bottom) shows the example of the GEM. We show the spread around the “correct” optical data point. We define the measurement uncertainty as the value $\mu $ at which this Gauss-error function reaches its peak value.

For example, run number 1 describes the situation in which the backscatter and extinction coefficients at 355 nm and 532 nm are overestimated by x%, whereas the backscatter coefficient at 1064 nm is underestimated by x%. Theoretically, except for the case of 0% error, we find 242 combinations that could be tested for EEM. However, we can remove many of the scenarios for obvious reasons.

Three of these combinations merely lead to a shift in the backscatter and extinction spectra, e.g., in the case of 5% error, there is the case of 0% shift (correct data), $+5\%$ shift for all five data points, and $-5\%$ shift for all data points. Some of the distortions are physically unreasonable, e.g., backscatter coefficients at 532 nm are lower than backscatter coefficients at 355 nm and 1064 nm, and could be identified in the optical profiles.

With regard to the EEM, we therefore used nine inversion runs. One run is for the error-free data, and eight runs are for the distorted backscatter and extinction spectra according to Table 1. We then average the results of these nine runs, which gives us the final value of the data product including uncertainy bars. Based on our experience with the manual inversion, we know that this number is at the lower limit of error runs, but still give us statistically meaningful results [36].

With regard to the GEM, the error computation is done for a set of eight distorted input data, which are within the error bounds prescribed for each measurement channel according to Gaussian distribution and the error-free dataset, i.e., we carry out nine different inversions for each optical dataset as in the case of the EEM.

We used a random number generator that produced data uncertainties according to the Gaussian-error distribution function; see Fig. 3. This was done for each measurement channel and independently of each other. The mean value of each optical data point, which is the observed value and which we denote in the context of this work as *correct value*, is located at the $\mu $-position of the Gauss distribution function. The deviation from the $\mu $-position denotes the measurement uncertainty. If we denote the measurement uncertainty as x%, this value is reached at the 3-$\sigma $ position of the error-distribution function, i.e., $\sigma =x/3$. Accordingly, the distribution function is defined by its mean value and the geometric standard deviation ${\sigma}_{\mathrm{GM}}$, which fully describes the measurement uncertainty around the exact value of the optical data point.

In our simulations, we set the one standard deviation, denoted as $1\sigma $ in Table 6, and constrain the maximum value of distortion—about three times larger than $1\sigma $ in Table 6. Following the formalism of EEM, we average the results of these nine runs, which gives us the final value of the data product including uncertainty bars.

#### D. Unsupervised Identification of the Final Solution Space

### 1. Method

The ill-posedness of the mathematical problem does not allow us to select one single solution for each of the final data products. The use of constraints on the one hand reduces the instability of the inversion problem. On the other hand, the introduction of constraints allows us to find a match to the sought microphysical solutions, but we cannot perfectly reproduce the optical data. This “conundrum” regarding the identification of the correct solution space results in a multitude of individual solutions that fulfill the prescribed constraints as well as the constraint that both, the optical input data and the microphysical data products need to be reproduced within a certain range of uncertainty. We use all these individual solutions that we obtain from the inversion for computing the mean value and the uncertainty bars.

The averaging procedure takes advantage of the fact that different solutions can have oscillations of opposite sign, and that, after averaging, the mean oscillations can become much smaller than the oscillations of individual solutions. In our case, taking the average of many “good” quasi-solutions smoothes the mean solution and gives an additional stabilization of the solution space.

The aim of the unsupervised data analysis is to collect a set of individual solutions from the complete solution space we obtain from the inversion of each optical data set. This set of selected individual solutions then is averaged and forms the basis of the final inversion results, i.e., mean value and its uncertainty. One of the challenges we are dealing with is the selection of the vicinity around the minimum discrepancy, as it defines what we denote as the *discrepancy averaging interval* ${\rho}_{\mathrm{av}}$ [8].

We used the *discrepancy averaging interval*, i.e., we used a few percentages of individual solutions of the total number of all solutions. This latter solution space is denoted as *initial solution space*, which follows from the inversion of a single dataset, i.e., all solutions we obtain for all inversion windows and the grid of refractive indices tested in the inversion. This *initial solution space* includes all error runs for each optical input dataset according to the EEM or GEM method.

The individual solutions of the initial solution space fulfill the discrepancy as described by [12] in the sense that ${\rho}^{(k)}\le {\rho}_{\mathrm{av}}=1\%$ to 5%. The exact value of discrepancy that allows us to consider the corresponding individual solution part of the final-averaged solution space depends on various factors. For example, the input (measurement) uncertainties can be used as one of the constraints that define the final solution space. We will use measurement uncertainties as additional elements of information in future versions of TiARA.

However, the use of this constraint only usually does not lead to the final solution, which is the average of individual solutions that fulfill ${\rho}^{(k)}\le {\rho}_{\mathrm{av}}$, for which we obtain a tolerable uncertainty. In fact, even a small value of the discrepancy within this acceptable value of ${\rho}_{\mathrm{av}}$ can be linked to an individual solution of microphysical parameters that does not make sense from a physical point of view and thus has to be classified as outlier.

The huge amount of optical data we obtain with airborne and space-borne lidar requires software that runs in an autonomous mode, i.e., we need routines that allow us to identify the averaging interval automatically. Autonomous mode means that parameters such as discrepancy averaging interval ${\rho}_{\mathrm{av}}$ and amount of individual solutions ${N}_{s}$ that need to be averaged are estimated automatically.

For that reason, we started out with an averaging procedure [8] and introduced several modifications in order to allow for the unsupervised data analysis. These modifications simultaneously take into account the following constraints imposed on the solution space:

- 1. The number of solutions that are averaged is ${N}_{\mathrm{S}}$. Usually ${N}_{\mathrm{S}}$ does not exceed a few percent of the total number of solutions (initial solution space) for a given optical input dataset. In our simulations, we set ${N}_{\mathrm{S}}=500$. That number is approximately 1% of the total number of solutions of the initial solution space.
- 2. The maximum discrepancy ${\delta}_{\mathrm{max}}$ is the number that defines all individual solutions with larger discrepancy and that is not considered in the averaging process. This maximum discrepancy is correlated with the measurement uncertainty of the optical input data. Usually this value does not exceed ${\delta}_{\mathrm{max}}=10\%$ to 20%.
- 3. We define threshold uncertainties of effective radius (${\delta}_{r}$) and number concentration (${\delta}_{n}$). Usually, we set these values to 25% and 100%, respectively.
These two constraints are set before the averaging procedure starts. This procedure involves the following steps:

- (a) The initial step is to find the first individual solution of the final solution space. This first solution can be found from the minimum value of the discrepancy, defined as ${\rho}^{(1)}$, as it defines the values of the parameters ${r}_{\mathrm{eff}}^{(1)}$ and ${n}_{t}^{(1)}$.
- (b) After that first step, we select the number concentration ${n}_{\mathrm{t}}^{(2)}$ and effective radius ${r}_{\mathrm{eff}}^{(2)}$ that belong to ${\delta}^{(2)}$, which is the second smallest discrepancy value.
- (c) We compare the values of ${r}_{\mathrm{eff}}^{(1)}$ and ${n}_{\mathrm{t}}^{(1)}$ and ${r}_{\mathrm{eff}}^{(2)}$ and ${n}_{\mathrm{t}}^{(2)}$. If the relative deviations of$$\Vert {r}_{\mathrm{eff}}^{(1)}-{r}_{\mathrm{eff}}^{(2)}\Vert /{r}_{\mathrm{eff}}^{(1)}\times 100\%$$and do not exceed the respective thresholds ${\delta}_{r}$ and ${\delta}_{n}$, the 1st and 2nd individual solutions of effective radius are averaged. The same is done for number concentration. If condition (21) and (22) are not fulfilled simultaneously, the 2nd individual solution is ignored.
- (d) We continue with this selection procedure. We test the $(k+1)$-th individual solution that has the discrepancy ${\rho}^{(k+1)}$. This discrepancy value is the next nearest to the value of the discrepancy ${\rho}^{(k)}$ of the $k$-th individual solution. We compare the parameters ${r}_{\mathrm{eff}}^{(k+1)}$ and ${n}_{t}^{(k+1)}$ with the averaged values of the $k$ previous individual solutions ${\overline{r}}_{\mathrm{eff}}$ and ${\overline{n}}_{t}$. These average values of effective radius and number concentration are defined as$$\overline{p}=\frac{1}{k}\sum _{j=1}^{k}{p}^{(k)},\phantom{\rule[-0.0ex]{2em}{0.0ex}}p={n}_{t},{r}_{\mathrm{eff}}.$$If the relative deviations defined by$$\Vert {\overline{r}}_{\mathrm{eff}}-{r}_{\mathrm{eff}}^{(k+1)}\Vert /{\overline{r}}_{\mathrm{eff}}\times 100\%$$and do not exceed the respective uncertainties ${\delta}_{r}$ and ${\delta}_{n}$ the 1st, 2nd, …, $k$-th, and $(k+1)$-th individual solutions are averaged. The same is done for the effective radii and number concentrations that belong to these values of $\delta $. If the relative deviations as described by (24) and (25) are not fulfilled, the $(k+1)$-th individual solution is ignored.
- (e) This selection process is continued as long as the number of averaged solutions is less than ${N}_{\mathrm{s}}$ or the discrepancy of the next individual solution is ${\rho}^{(k)}\le {\delta}_{\mathrm{max}}$.
- (f) After these routines are carried out, we obtain a subset of the individual solutions. This subset is constrained by the number ${N}_{\mathrm{s}}$ and/or the maximum discrepancy ${\delta}_{\mathrm{max}}$. The uncertainty (scatter) of number concentration and effective radius that arises from the averaged values is limited by ${\delta}_{n}$ and ${\delta}_{r}$, respectively. Furthermore, we take account of additional constraints, i.e., the radius range (${r}_{\mathrm{min}}$, ${r}_{\mathrm{max}}$), that we want to use for the averaging procedure; we check the lidar ratios, the Ångström exponents, and the spectral slopes of these parameters that follow from the individual solutions we want to include in the averaging procedure. Additional constraints can be set (or extracted, obtained) using
*a priori*information about aerosol types, e.g., [29,37,38].

### 2. Unsupervised, Automated Data Inversion: Summary

Figure 5 shows the flowchart of the inversion software. The methods were described in the previous sections. The numbers in circles in the flowchart refer to text in Appendix A.

Table 2 summarizes the data products we present in this contribution. Additional data products that can be derived with TiARA are listed in Table 4.

## 3. TIARA: SENSITIVITY STUDY

#### A. Preparation of Simulation Data

In this section, we present a summary of the first sensitivity study carried out with TiARA. The number of simulation cases is sufficient to show the basic properties of TiARA. We made several simplifications of the underlying aerosol properties. We neglected the fact that in real aerosol situations, particles have a size- and wavelength-dependent complex refractive index. One main simplification is that we assumed a logarithmic-normal monomodal shape of the investigated PSDs.

Results of pre-studies regarding bimodal PSDs are presented in Refs. [14–16]. We did not test bimodal PSDs in our contribution, because a significantly different simulation strategy other than the one used for simulations of monomodal PSDs is needed. Both modes (fine and coarse mode) cover a wide range of parameters (median radius and geometrical standard deviation in their own domains, respectively). Given the available computer power, we cannot cover a large enough set of bimodal PSDs consisting of different combinations of the fine and coarse modes to derive statistically meaningful results.

Figure 6 shows the monomodal logarithmic-normal PSDs we used in our simulations. The mathematical description has the form

We used PSDs in terms of volume concentration representation in data inversion. Figure 6 shows that our PSDs not only cover the fine-mode range but also the coarse-mode range. We define in this study 500 nm particle radius as separation between fine-mode and coarse-mode particles. This separation at 500 nm particle radius already allows us to take a first look at the performance of TiARA for particles in the coarse mode without explicitly considering bimodal PSDs.

Figure 7 is a graphical display of particle effective radii and surface-area and volume concentrations tested in our study. Table 4 in Appendix A presents the numerical values of these parameters. The optical data were computed from these size distributions with the assumption of 1 particle per ${\mathrm{cm}}^{3}$. We do not need to use other values of particle number concentration, as this parameter serves only as a multiplication factor of the extensive properties of the investigated PSDs, i.e., number, surface-area, and volume concentrations.

The chosen values for median radius and mode width cover a comparably wide range of particle scenarios. In terms of number concentration, the PSDs cover the fine-mode fraction; see Fig. 6, left. The inversion algorithm retrieves the PSDs in terms of volume concentration. If we convert the number concentration to volume concentration, we see that PSDs based on volume-concentration distribution contain particles above 10 μm radius. That size range covers a significant part of the coarse mode of PSDs and thus needs to be considered in the reconstruction process. The LUTs developed for TiARA cover this radius range.

Simulation studies with the manual version of the inversion software [8,12] showed that measurement uncertainties up to 20% still allow for inferring meaningful results of microphysical data products. The results presented in this contribution are based on five different uncertainty scenarios and two error models. In TiARA’s method of solution space analysis, we added several new components that were not used in the manual version, which was first published between the late 1990s and early 2000s. For that reason, we also explored at which threshold value of measurement uncertainty TiARA provides unacceptable microphysical data products.

Table 3 shows that we used measurement-channel-independent uncertainties of 0%, 5%, 10%, 15%, 20%, and 25%. We also simulated four uncertainty scenarios in which the measurement uncertainties varied among the different measurement channels; see Table 6 in the Appendix. Analysis of these measurement-channel-dependent simulations is underway.

The case of error-free data was also investigated, as it allows us to collect information on the possible minimum uncertainty of the data products. We hypothesize that this uncertainty is driven by the mathematically induced retrieval uncertainty of TiARA. The CRI in that case is assumed error free, i.e., we assume we know the correct value of the CRI. We will further discuss this topic in our next paper, which will detail our next version of the uncertainty-analysis scheme.

In the following section, we present an example of the inversion of optical data obtained from a size distribution of a given geometrical standard deviation. The example serves as illustration of the general features of TiARA.

#### B. Example: Case Study

Figure 8 shows effective radius (first and second rows), and number (third and fourth rows) and surface-area (fifth and sixth rows) concentrations obtained with EEM. Figure 9 shows the results for volume concentrations and the complex refractive index.

For each parameter, we show the results for the case of error-free optical input data and the case of 15% measurement uncertainty. Each retrieval result (data point) is shown as mean value (symbols). Each data point follows from applying the averaging procedure described before.

The 1-1 lines indicate perfect reproduction of the theoretical values of the investigated parameters, i.e., the 1-1 line is a measure of the accuracy of the retrievals. The green lines show that the retrieved data products would need to be reduced, respectively, increased by 20% in order to be on the 1-1 line. The red lines indicate that the retrieved data products would need to be redruced by a factor of two, respectively, doubled in order to be on the 1-1 line.

In addition, each data point is shown with an uncertainty bar. Each individual solution (mean value) is affected by a retrieval uncertainty caused by the mathematically induced error, which follows from applying the averaging procedure in the vicinity of minimum discrepancy and the uncertainty caused by the inversion of imprecise optical input data. This uncertainty bar can be interpreted as standard deviation of the individual mean solutions.

The results for surface-area and volume concentrations are shown on a double-logarithmic scale. We chose this form of presentation because these two parameters cover a large range of values. Although the individual uncertainty bars are difficult to identify, we see that in most cases the uncertainties are below 50%.

The mean values of effective radius largely remain within the red sector if the true values of the input optical data are exact, i.e., we achieve the minimum requirement we expect from TiARA: the mean values of the inversion results remain within $\pm 50\%$ deviation from the true results. There are some outliers that seem to occur for the case of low real parts and large imaginary parts. The reason for these comparably strong outliers is unknown. The retrieved effective radius remains within the green sector, at least up to an effective radius of 0.5 μm, except for these few outliers.

50% is the upper value of statistical uncertainty of the mean value of effective radius. In many cases, this statistical noise is considerably less. The outliers belong mainly to the lowest real part tested in this study (1.4) and also show the strongest statistical uncertainty.

15% uncertainty of the optical data reduces the retrieval accuracy. The mean values remain in the red sector if effective radius is less than 0.5 μm and the imaginary part is $\le 0.01$. For stronger light-absorbing particles, i.e., imaginary parts larger than 0.01, we find cases in which effective radius is overestimated by more than a factor of two. Statistical uncertainty also increases if measurement errors are 15%.

In summary, we find that PSDs with particle effective radii below approximately 0.5 μm can be retrieved to at least 50% accuracy and better than 50% precision if particle absorption is low, i.e., imaginary parts are $\le 0.01$. Such low values of the imaginary part result in single-scattering albedo (SSA) at 532 nm $>0.9$ if, at the same time, $0.1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{\mu m}<{r}_{\mathrm{eff}}<0.5\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{\mu m}$; additional information can be found in Ref. [36]. If particles are highly light-absorbing, there is an increasing chance of overestimating particle effective radius.

The mean value of number concentration for the most part remains in the red sector. TiARA tends to underestimate number concentration if the input optical data are error free; see Fig. 8. This underestimation seems to grow with the increasing imaginary part of the complex refractive index of the investigated particles. The statistical error usually is less than 50%—in part, it is significantly less than 50%. The retrieval accuracy seems to improve with the increasing real part of the complex refractive index of the investigated particles. This pattern will be investigated in more detail in future studies.

A measurement uncertainty of 15% amplifies the feature of underestimating number concentration. As in the case of error-free data, the retrieved number concentration tends to be underestimated stronger as the imaginary part increases. Statistical uncertainty also increases compared to the case of error-free optical data. This feature is particularly obvious for the cases in which number concentration is overestimated.

As noted in previous studies [8,12,13,17,36] we find exceptionally accurate inversion results for surface-area concentration. An explanation for this mechanism has recently been published [14,15]. Surface-area concentration covered approximately three orders of magnitude in our simulation study. The results for surface-area concentration are inside the red sector, even if measurement errors are 15%. The only exceptions from this result are connected to the lowest values of surface-area concentration tested in our study. In that case, we significantly underestimate surface-area concentration if imaginary parts $\le 0.01$. In contrast, highly light-absorbing PSDs can be retrieved comparably accurately.

We are investigating the outliers that we find for low imaginary parts. One reason for this behavior could be that we cannot resolve in a trustworthy manner PSDs below particle radii of 50 nm. Such an underestimation would lead to an underestimation of surface-area concentration. In fact, number concentration would be underestimated even more in such a case. We find this underestimation of number concentration for PSDs that consist of a comparably high share of small particles below 50 nm radius. We will address this issue in our next performance test of TiARA. We will use an upgraded version of LUT tables and an improved solution-space post-processing scheme [14–16].

We find accurate inversion results of volume concentration across four orders of magnitude. The quality of the inversion results seems to decrease for imaginary parts of $0.05i$, i.e., the case of highly light-absorbing particles. We find two outliers in the plot for the imaginary parts $0.03i$ and $0.05i$. The statistical uncertainty of the retrieved values is always below 50% except for these two outliers. In most cases, the statistical uncertainty is less than 20%–30%.

We will carry out a more detailed analysis to identify what effect may be responsible for the two outliers. First, the input lidar ratios are 39,000 sr and 73,000 sr, respectivley. These data are produced by PSDs with very small mean radii of 0.02 μm and a real part of 1.7. In this case, the overlap between the available inversion windows and the PSDs likely is insufficient. The inversion solution has a minimal discrepancy of $\approx 1.0e+4$. This value is far too large to count as an acceptable solution. The next upgrade of TiARA will use an improved databank and other important modifications. Furthermore, we will be able to use a bigger set of simulation data, which hopefully will remove such outliers.

The accuracy of the inversion results slightly worsens for 15% measurement uncertainty if the imaginary part is $\le 0.01i$. TiARA tends to overestimate volume concentration. Mean values still remain within the red sector except for very small volume concentrations, where we find a similar pattern to the one described for surface-area concentration, i.e., volume concentration may be underestimated by more than 50%. However, as in the case of surface-area concentration, this underestimation seems to disappear for large imaginary parts, i.e., $>0.01i$. TiARA seems to overestimate volume concentration for large imaginary parts of $0.03i$ and $0.05i$. The statistical noise also increases significantly.

Figure 9 (center) shows the results for the retrieved real part. The true value was 1.6 in this simulation case. We used a search grid from 1.325–1.8 (real part) and $0i-0.05i$ (imaginary part) in our simulations; see Fig. 2. We tested in summary four different real parts: 1.4, 1.5, 1.6, and 1.7. As the final goal of our algorithm development, we targeted a retrieval accuracy of 0.05 to 0.075 for the real part. With regard to the imaginary part we aimed at 50% retrieval accuracy as the minimum goal. The main challenge in that regard lies in retrieving low values of the imaginary part, which we define as 0.01. We cannot use negative imaginary parts in our inversion. Based on the current way of how we select the final solution space, we naturally end up with an overestimation of the imaginary part. This overestimation is particularly strong if the true value of the imaginary part of the complex refractive index approaches the value zero, i.e., no light absorption.

TiARA tends to overestimate the real part by more than 0.075 if the true value is 1.4 and the input optical data are error free. This overestimation disappears for a larger real part. The range of retrieved real parts for the different size distributions (effective radii) seems to vary in dependence on the underlying imaginary part. For example, in the case of the true value 1.6 (red symbols), we see that all solutions are clustered between approximately 1.65 and 1.7 if imaginary parts are small. In contrast, if the imaginary part is $0.05i$ the solutions are spread out between real parts from 1.4 to 1.7. If the true value is 1.4, this pattern is nearly opposite, i.e., real parts vary across a wide range of numbers for low imaginary parts. In contrast, retrieved real parts seem to accumulate between 1.4 and 1.5 if the imaginary part is $0.05i$.

In summary, we find the following pattern of retrieval accuracy: the real part is overestimated, particularly if real parts are 1.4 or 1.5. Real parts are underestimated if values are 1.6 and 1.7. We assume that this pattern is largely based on the search space of the real and that TiARA is not sensitive to retrievals of the real part. Statistical uncertainty is as high as $\pm 0.1$, which is another indicator of the insensitivity of TiARA with regard to retrieving the real part. Such uncertainties are larger than what we need to achieve with TiARA in view of the accuracy requirements for SSA; see, e.g., [36]. We are exploring methods that allow us to reduce the uncertainty of the retrieved CRI. Ref. [13] shows that a retrieval accuracy of at least 0.1 of the real part may be achievable. This accuracy would already significantly improve the accuracy of the data products obtained with TiARA, and particulary would improve the accuracy of the mean values of the SSA to approximately $\pm 0.075$ or better; measurement uncertainty of the optical input data, of course, has influence on SSA, too. An example that illustrates our assumption is given in Fig. 6 of Ref. [36].

In the case of 15% measurement error, TiARA seems to generally overestimate the real part. The majority of retrieved real parts is at/above 1.6. We currently have no explanation for this pattern, i.e., this general overestimation when we move from error-free to erroneous optical input data. The insensitivity of TiARA with regard to the real part retrieval might explain this pattern only in part.

Another reason could be the position of the PSD along the particle radius grid. The analysis of this pattern, however, goes beyond the main goal of this paper, which is about identifying typical patterns and the general applicability of TiARA. However, these patterns will form the basis for a more refined analysis of our simulations. We hope that a better understanding of these patterns will help us improve TiARA in future. First results in that regard have been published in recent years [13–16,36].

Figures 9 (rows 5 and 6) shows the results of the retrieved imaginary parts. If the optical data are error free, TiARA overestimates the imaginary part. This pattern is particularly strong for low imaginary parts. If the imaginary part is high, i.e., $0.03i$ and $0.05i$, accuracy improves. The mean values are within the red sector. Still, we find on average an overestimation of the imaginary part in these two cases.

The situation is similar in the case of 15% measurement error. The imaginary part is largely overestimated if the true values are $0i,0.005i$, and $0.01i$, respectively. However, we find some (minor) differences compared to the case of 0% measurement uncertainty. The mean value for the most part stays within $\pm 0.006i$ if the real part is 1.4 or 1.5. A more detailed analysis on the basis of a larger set of simulations is needed to decide if this is a result that can be generalized, i.e., if better retrieval results for small imaginary parts can be obtained if real parts are small. We find better retrieval results of the imaginary part in the case of 0% data error. Accuracy is better than $\pm 0.006i$ if real parts are large, i.e., 1.6 and 1.7. The underestimation of the retrieved values largely disappears if imaginary parts are $0.03i$ or $0.05i$. Most of the mean values remain in the red sector.

The statistical uncertainty is mainly in the range of 50% if the true imaginary parts are $0.03i$ or $0.05i$. If imaginary parts are smaller ($0.005i$ or $0.01i$), the statistical uncertainty is as high as 100%. If the true imaginary part is $0i$, uncertainties can be as high as $-0.02i$.

In summary, most of the results we obtained for the imaginary part in this study confirm the findings we obtained with the manual version of the inversion algorithm (see also results presented in, e.g., Fig. 6 of Ref [36]): a) retrieval results are within $\pm 50\%$ for highly light-absorbing particles, and b) imaginary parts can be retrieved to $\pm 100\%$ if particles are low light-absorbing. Imaginary parts of $0i$ cannot be retrieved at the moment.

#### C. Statistics: Histograms

The previous case study illustrated for a few examples of PSDs what can be achieved with TiARA. It showed its strengths and weaknesses. As we are interested mainly in the mass production of inversion results for future space-borne missions, the more important question is how TiARA performs on a statistical basis; regardless of this demand, we will strive for improving inversion results of each individual case, as this, of course, naturally improves the statistical performance of TiARA. The statistical overview in this section allows us to further discuss the performance of TiARA in more general terms.

In view of the assessment of the impact of data uncertainties on the inversion results, the following figures also include results for GEM. We emphasize, that at present, we favor EEM over GEM in TiARA. EEM is more robust than GEM at the current status of algorithm development. We gain insight on the impact of extreme values of the retrieved microphysical parameters on the mean values of the data products, because we test extreme distortions of the optical spectra with the EEM. Knowledge of the possible largest and smallest values of each data product is valuable information in our uncertainty analysis and can be achieved with a comparably small number of error runs (eight). We cannot derive this information (extreme values of data products) from GEM.

The strength of GEM lies in its use of the statistical nature of uncertainty bars of the optical input data. In this first step of developing this new scheme of uncertainty analysis, we used only eight error runs. Our idea was to make the inversion results we obtained with GEM easier to compare with the results we obtained with EEM. Based on the current analysis of the simulation results we obtained with GEM, we realized that the number of error runs (eight) may not be sufficient to obtain a statistically significant set of individual solutions that need to be averaged for the final data products. We will increase the number of error runs to several hundred per dataset in a future study. We assume that we will reproduce the Gaussian-like shape of the distribution curve shown in Fig. 4 in a much better way. Despite the incompleteness of the simulation with GEM, we consider it relevant to show some results. The results help in better understanding the fundamental concepts of using GEM in future use of TiARA.

Figure 10 shows a statistical overview of the deviation between the reconstructed microphysical parameters and the true values for all size distributions tested in this study, including the six error levels, i.e., 0%, 5%, 10%, 15%, 20%, and 25%. The figures show how results differ if we use EEM (gray-shaded columns and numbers in black) and GEM (green-shaded columns and numbers in green) for generating the eight erroneous datasets (for each PSD and set of CRI) in our simulations. The histogram distributions show the percentage of simulated cases (eight PSDs, four real parts, 15 imaginary parts) that deviate by a given percentage from the true results of effective radius and number, surface-area, and volume concentrations. The histogram columns show results for which the retrieval products deviate $\le 50\%$ from the true results. 50% retrieval accuracy is our priority requirement, though we target a 20% retrieval accuracy for as many data products as possible in a future version of TiARA. In the case of real and imaginary parts of the complex refractive index, we show absolute deviations from the true results. The histogram columns refer to these deviations from the true results.

These histogram distributions are complemented by horizontal gray lines (EEM) and green horizontal lines (GEM). These lines show the percentage of cases that deviate by $\le 50\%$ from the true results in the case of effective radius and number, surface-area, and volume concentrations. For example, Fig. 10(aa) shows that nearly 100% of all effective radii can be retrieved with less than 50% deviation (accuracy) from the true results. This holds true for both error models—the gray and the green horizontal lines overlap. The black and green numbers in each plot extend this analysis. Each number denotes how many of all cases deviate less than $\pm 20\%$ from the true results. The results give an impression of how far off we are from what we consider as maximum achievable performance.

In the case of the real part, the horizontal lines and the numbers in each plot describe absolute deviations from the true values (real part). The horizontal lines refer to $\pm 0.075$ deviation (accuracy) from the true values. The numbers in the plots describe the percentage of cases that deviate by $\pm 0.05$ from the true values.

With regard to the imaginary part, the numbers in the plots describe the percentage of cases that deviate $\pm 0.006i$ from the true values..

The results derived from EEM and GEM differ. We first discuss the results for the size parameters, i.e., effective radius and number, surface-area, and volume concentrations.

From a qualitative point of view, the distribution (spread) of histogram columns generally is wider for the results obtained with EEM compared to the GEM results. There is a larger number of individual simulation cases for which, on average, the microphysical data products tend toward larger systematic deviations from the true values if we apply EEM compared to GEM. On average, GEM delivers more results that are close to 0% deviation from the true results compared to EEM.

The inversion results tend to be better, i.e., narrower around the 0%-deviation point for measurement errors of 10%–15%. This result however needs to be confirmed in further simulation studies. We again note the exceptionally high retrieval quality of surface-area concentration. With regard to GEM, we notice on average a slight underestimation of the size parameters for the individual simulations. The exception is surface-area concentration, which is overestimated by up to 10%.

With regard to the real and imaginary parts, we find the following results. We systematically overestimate the real part in all cases considered in our study. Only for the case of high imaginary parts do we find situations in which we underestimate the real part for some cases. If imaginary parts are large, there is a rather clear systematic behavior with regard to retrieval quality in dependence on uncertainty of the optical input data. The retrieval accuracy drops with increasing uncertainty of the input data if we take $\pm 0.075$ as the threshold value for the retrieved real part.

We do not find a significant performance difference in TiARA whether we use the EEM or the GEM in our simulations. If we use $\pm 0.05$ as the threshold for the retrieved real part, we find fewer cases that fulfill this requirement if the imaginary part is large ($0.03i$ and $0.05i$). There is no clear pattern for smaller imaginary parts.

With regard to the imaginary part, we again notice an overestimation of the retrieved values compared to the theoretical values. The inversion results are larger than the theoretical values, regardless of the used error model. The number of cases in which the imaginary part is derived to within $\pm 50\%$ of the true value is comparably high. This number of cases increases with increasing imaginary part. No results are shown for the cases of $0i$ and $0.005i$ (horizontal gray and green lines) as the error of $\pm 50\%$ cannot be computed in these cases. I.e., it is not defined for the case of 0*i*and cannot be computed for the case of 0.005*i*in view of the resolution of the grid that was used for the imaginary part.

We see a slight systematic behavior of the retrieved values of the imaginary parts with regard to increasing uncertainty of the optical data. The number of cases that fall within the 50% threshold-level decreases with increasing uncertainty level. Again, it seems that 15% data uncertainty is an upper threshold for which a reasonable retrieval performance can be achieved.

If we use the threshold-value $\pm 0.006i$, we find that on average 40%–50% of all cases fall within this uncertainty level. We do not see a clear pattern of retrieval quality in dependence on the uncertainty of the optical data used as input in the inversion.

In summary, the uncertainties of the optical input data seem to have little influence on the retrieval quality of the real and imaginary parts of the CRI. This pattern points to a possibly low sensitivity of TiARA with regard to retrieving the CRI. In contrast, the algorithm is capable of deriving trustworthy values of particle size parameters, i.e., effective radius and number, surface-area, and volume concentrations.

Finally, we show the unscaled version of this figure in Fig. 16 in Appendix D. This figure shows more clearly the height of the individual histogram columns and the distribution of the height of the columns. This figure is our first attempt to illustrate the *distribution function* of the solution space with regard to the deviation from the true results. One goal of future work includes the development of a more comprehensive uncertainty analysis scheme and to investigate the properties of the distribution function. We are interested in properties such as the shape of this distribution function. For example, we want to find out if a specific shape exists for all data products, e.g., Gaussian-type distribution or skewed distribution, and if we can infer in a more straightforward manner systematic and statistical biases of the individual data products from such histogram distributions. We will discuss in more detail these topics in our follow-up contribution that will deal with our updated uncertainty analysis scheme.

#### D. Statistics: Color Maps

Figure 11 summarizes the retrieval results in terms of color (heat) maps. This style of presentation provides us with a quicklook version of the performance of TiARA.

We show the results for all investigated size distributions and all investigated error levels (EEM and GEM) for all microphysical parameters. We show the number of simulated cases (in percent) that deviate by less than 20%, respectively, 50% from the true values of the size parameters (effective radius and number, surface-area, and volume concentrations). We show the results for $\pm 0.05$ and $\pm 0.075$ deviation from the true values of the real part of the CRI. We show the results for $\pm 0.005i$ and $\pm 50\%$ deviation from the true values of the imaginary part.

The map separates the results according to the uncertainty level of the input optical data and the error model we used. In one case, we combined all results regardless of the imaginary part, but the results are separated according to the mode width that was used for the investigated PSDs. In the other case, we combined all results regardless of the mode width of the PSDs, but the results are separated according to the used imaginary parts of the investigated PSDs.

We believe that separation according to the imaginary part of the CRI is necessary in this figure, as this parameter is the most sensitive input quantity for calculating SSA. SSA also depends on the PSD; thus, representation of the simulations according to a quantity related to particle size is also useful for the analysis of the performance of TiARA. We picked geometrical standard deviation as a first attempt in this type of data analysis. This parameter represents the variance, which is one additional quantity we want to provide as data product in the future.

We look at other ways of presenting our results. These heat maps can be considered only as our first attempt of presenting the performance of TiARA in a way that allows for easy comparison to performance changes compared to future versions of the software.

We counted the solutions that fall within 20% and 50% deviation from the true results. We then defined color-coded *intervals* that reflect if, e.g., 68.3% (1-$\sigma $ standard deviation), 95.4% (2-$\sigma $ standard deviation), and 99.7% (3-$\sigma $ standard deviation) of all solutions are within the $\pm 20\%$-interval, respectively, $\pm 50\%$-interval of *acceptable* retrieval quality. Our goal is to develop a statistically robust framework of uncertainty analysis, and for that reason, we introduce the concept of (1, 2, 3)-sigma standard deviations. We are aware of the fact that such a denotation requires a statistical framework based on Gaussian-like distributions of data products that can only in part be achieved at the moment. However, we prefer to introduce this standardized concept of representation of the results already at this stage of our research work, as it allows us to make the results presented in this study comparable to future performance tests of TiARA. We also avoid more *arbitrary*selections of intervals that could be used to reflect the performance of TiARA in the main region of our interest with regard to performance assessment, i.e., accurate retrievals in more than 50% of all simulation cases.

The cells in the heat maps for which we achieve 68.3%, 95.5%, and 99.7% are shown in green (light, medium, dark) colors. Cases for which at least 50% of all investigated cases of effective radius and number, surface-area, and volume concentrations fall into the selected uncertainty levels are shown in orange. If less than 50% of the cases are within the uncertainty levels, we show the cells in red color. One row of the map of results related to the imaginary part retrievals has no color. The values are not defined.

The left part of the map shows the results for the EEM. The right half of the map shows the results for the GEM. We note once more that the EEM currently is our preferred method of uncertainty analysis, as the retrieval results seem to be more consistent with what we can expect from erroneous simulation data, i.e., retrieval uncertainties increase on average with increasing optical data input uncertainty. As mentioned before, the results we obtain with GEM are not conclusive, and we need to carry out more simulations and analysis of the results before we will consider switching to this new uncertainty analysis scheme.

In order to illustrate the utility of Fig. 11, we will present two specific examples of how to interpret the figure. We select the *SURFACE-AREA map, mI theo*, and the row that shows the value 0.005 for the case: *extreme error model: <20% deviation from the true values*.

The term *mI theo* refers to the correct value of the imaginary part of the CRI that was used for calculating the optical input data. We select column 20, which denotes the optical data uncertainty of 20%. We see a light green cell, and the value 81% inside. It means that if we take all simulations regardless of the sigma value and select the simulations for which the true value of the imaginary part was $0.005i$, 81% of all surface-area concentration retrievals could be retrieved with less than 20% uncertainty.

As another example, we select the *IMAGINARY PART, Sigma* map and the row that shows the value 1.9. We use the map *Gauss error model:* $<50\%$ *deviation from true values*. We select the column 5%, which again denotes the uncertainty of the optical input data. We see an orange cell and the value 50 inside. It means that we take all simulations obtained from PSDs with a geometrical standard deviation of 1.9 regardless of the imaginary part ($0i$, $0.005i$, $0.01i$, $0.03i$, $0.05i$); we obtain the imaginary part with less than 50% uncertainty for 50% of all investigated cases.

With regard to the size parameters, on average, more cases fall within 50% than 20% retrieval uncertainty. This result holds true for both uncertainty models. With regard to the EEM, the retrieval performance on average worsens significantly stronger if we reduce the threshold from $\pm 50\%$ to $\pm 20\%$ deviation from the true value, compared to the results obtained with GEM.

However, we need to take the results of better performance of GEM compared to EEM with caution. For example, in the case of $<50\%$ *Deviation From True Values*, it seems counter-intuitive that the retrieval quality of the size parameters is independent of the uncertainty of the optical input data.

One explanation for this unusual behavior we obtain with GEM could be the low number of inversion runs (nine) that we use for each PSD and error level. As mentioned before, this number by far is not able to reproduce a Gaussian-shape-like error curve, i.e., the statistics of the distribution of the erroneous input data may be (strongly) biased. For example, it may happen that the nine inversion runs accumulate around 0% uncertainty rather than covering the bell-shaped form of the Gaussian curve in a representative way (Section 3.C.2). This result shows that we still have insufficient understanding of how to correctly apply the GEM.

The situation with regard to retrieval quality is different in the case of the CRI retrievals. With regard to the real part, considerably less than 50% of all cases are retrieved to better than $\pm 0.075$, respectively, $\pm 0.05$ accuracy. There are only a few exceptions. With regard to the imaginary part, we see the same poor retrieval quality.

#### E. Statistics: Summary of Statistical Analysis

Figure 12 summarizes the main performance characteristics of TiARA if we take the retrieval uncertainty of 50% deviation from the true results as benchmark. As noted a few times before, this value is the primary threshold value we have been aiming at in this first performance study. Thus, we do not show the results for 20% deviation from the true results in this figure. We show the results in dependence on the used error model and the uncertainty of the optical input data.

With regard to the EEM, we can derive effective radius to within $\pm 50\%$ in more than 60% of the simulation cases if we keep the optical data uncertainty below 15%. The exceptions to this result are strongly light-absorbing particles, in which case, the retrieval uncertainty is on average larger; see Fig. 11. Approximately 60% of all simulation cases of number concentration retrievals fall within this threshold of 50%. Surface-area concentration retrievals are very robust. As mentioned before (Figs. 10 and 11), the retrieval quality of this parameter seems to depend little on optical input data uncertainties of up to 25%. The pattern of uncertainty of volume concentration retrievals with regard to optical data uncertainty is similar to the patterns we find for effective radius.

Figure 12 once more confirms that the quality of the retrieval of the real part and the imaginary part of the CRI is not satisfactory, regardless of the uncertainty of the optical input data. We find that the quality of the microphysical data products worsens with increasing measurement error in the case of the EEM. In that sense, we find at least a pattern we expected at the start of our simulation work. In contrast, this pattern is missing if we use the GEM. As noted before, results for the GEM may not represent the true statistics in view of the limited number of error runs for each optical dataset tested in this study.

## 4. ORACLES: MEASUREMENT EXAMPLE FROM 22 SEPTEMBER 2016

We present the results of the inversion of optical data taken with the HSRL-2 during the ObseRvations of Aerosols above CLouds and their IntEractionS (ORACLES) mission (https://espo.nasa.gov/oracles/content/ORACLES_Two-page4_ORACLES_Flyer). A more comprehensive description of microphysical particle properties of aerosols observed during ORACLES will be given in a future presentation.

#### A. Optical Properties

We used data collected during the flight from 08:00 and 14:40 UTC on 22 September 2016. Smoke from biomass burning was observed to heights up to 8 km above sea level. There was considerable variation of particle properties and particle concentration during the flight.

Figure 13 provides a comprehensive overview on several key optical properties of the smoke. Particle Angström exponents provide us with some insight to particle size. Particle lidar ratios contain information on particle size and light-absorption capacity.

Particle Angström exponents varied between 1 and 1.5 above approximately 3 km height above sea level during most of the flight time. These values point to particles in the fine-mode fraction, i.e., particles below approximately 500 nm radius. Below approximately 3 km height, particles were considerably larger. We excluded them from our data analysis. The linear particle depolarization ratios indicated a significant contribution from non-spherical particles, which most likely were dust particles.

The lidar ratio at 532 nm is above approximately 65 sr in most parts of the curtain plot. Maximum values are around 90 sr. These high values of lidar ratios and Angström exponents above 1 (above approximately 2 km height above sea level) indicate the presence of aerosol pollution with comparably strong light-absorption capacity. Biomass-burning aerosol particles likely were the reason for the observed hight lidar ratios.

The ratio of the lidar ratios at 355 nm and 532 nm, respectively, varies around 1. There is a tendency toward values below 1 (as low as approximately 0.75) in heights below 4 km and as high as 1.5 above 4 km above sea level. These ratios corroborate the assumption that comparably strong light-absorbing particles were observed during the flight.

#### B. Microphysical Data Products

Figure 14 shows curtain plots of number, surface-area, and volume concentrations and particle effective radius. We used only optical data points that fulfilled quality assurance checks, e.g., optical data with strong signal-to-noise ratio, i.e., uncertainty of the optical data products below 15% and low linear particle depolarization ratios (to exclude contamination by dust particles) below 5%.

Total number concentration in large portions of the plot does not exceed $1000\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{particles}/{\mathrm{cm}}^{3}$. However, particle concentration significantly increases between approximately 9:30 and 11 UTC in flight levels between 3 km and 5 km height above sea level. We find number concentrations as high as $3000\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{particles}/{\mathrm{cm}}^{3}$. After 12:00 UTC, we find another, though less significant, increase in number concentration, up to $1500\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{particles}/{\mathrm{cm}}^{3}$. The uncertainty of the retrieved values of number concentration for the most part of the curtain plot stays below approximately $250\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{particles}/{\mathrm{cm}}^{3}$. The exception is the region where we find the significantly increased values of number concentration. In that region, uncertainties are as large as approximately $750\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{particles}/{\mathrm{cm}}^{3}$. That means retrieval errors are on average 25% for the complete curtain plot.

Particle effective radius remains comparably constant above 4 km height between 9:00 and 11:30 UTC. Values are a little larger than 0.2 μm. Below 4 km height and particularly after 11:30 UTC, mean particle size seems to increase. We find particle effective radii above 0.4 μm. Retrieval uncertainties accordingly increase in the portions of the curtain plot that show larger values of effective radius. The results for effective radius are rather noisy in these portions. In contrast, the section of the flight track that contains particles with a comparably constant value of effective radius (around 0.2 μm) shows rather low uncertainties. Uncertainties of effective radius are less than 0.04 μm, which translates into a relative uncertainty below 20%.

Surface-area concentration and volume concentration behave in a fashion similar to number concentration. Values remain rather steady during most of the flight except for the flight time from 9:30–11:00 UTC. During that time, surface-area concentration increases from average values of less than $300\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{\mu}{\mathrm{m}}^{2}/\mathrm{cm}}^{3}$ to as high as $700\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{\mu}{\mathrm{m}}^{2}/\mathrm{cm}}^{3}$. As in the case of number concentration, we find again another slight increase of surface-area concentration after 12:00 UTC in flight levels up to 5.5 km height above sea level.

Volume concentration for the most part of the flight stays below $20\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{\mu}{\mathrm{m}}^{3}/\mathrm{cm}}^{3}$ except during the time of the flight that shows the increase in number and surface-area concentration. Volume concentration increases to values as large as $50\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{\mu}{\mathrm{m}}^{3}/\mathrm{cm}}^{3}$.

On average, the retrieval results show more noise for surface-area concentration and for volume concentration after 11:30 UTC. Uncertainties on average stay below approximately 20% for volume concentration. With regard to surface-area concentration, uncertainties are below approximately 10% in the region of increased number concentration. Outside this region, we find higher retrieval uncertainties, which also tend to vary significantly stronger from data point to data point. Uncertainties increase to 15%–20%.

Figure 15 shows the contribution of particles in the fine-mode fraction (particle radius below 500 nm) of the number concentration to number concentration of the complete particle size distribution. We also show the ratio of the fine-mode particle effective radius (we considered only particles with radius below 500 nm in our computation) to effective radius of the total PSD.

These curtain plots provide us not only with valuable information on the share of particles in the fine-mode fraction of the retrieved particle size distribution, but also serve as a quality flag parameter. We are quite confident that we can derive particle effective radius and number, surface-area, and volume concentrations with comparably high quality if particle size is below 500 nm radius, as explained below.

We are aware of the fact that retrieval uncertainties may increase with particle radius, particularly if particles are above approximately 1–2 μm radius because of the limited range of measurement wavelengths. The ratio, particularly that of number concentration, but also to some extent effective radius, provides us with a qualitative confidence value on the retrieval quality. If these ratios of fine-mode number concentration to total concentration and fine-mode effective radius to total effective radius are close to the value 1, we likely obtain comparably low retrieval uncertainties.

We see that the ratio of fine-mode number concentration to total number concentration is close to 1 for nearly all the data points that were used for data inversion. There are only a few outliers, and we assume that these outliers are related mainly to *larger than usual* retrieval uncertainties. We are quite confident that most of the particles in this biomass-burning plume belonged to the fine-mode fraction of the particle size distributions.

With regard to effective radius, we find a ratio of 0.8 or larger in the portion of the plume that was observed between 9:00 and 11:30 UTC above 4 km height above sea level. Again, this result indicates that most particles in this part of the plume belonged to the fine-mode fraction of the retrieved particle size distributions. We find similarly large values of this ratio in many other parts of the curtain plots, although the overall appearance of the plot indicates more variation of effective radius. On the one hand, larger retrieval uncertainties may be one reason for the more *noisy* structure of the plot. On the other hand, larger effective radius observed after 11:30 UTC may be another reason. Particularly below 4 km height above sea level, we find stronger variations.

## 5. CONCLUSION

We presented the main results of extensive simulation studies carried out with our new, autonomous inversion algorithm, TiARA. We applied the algorithm to a case of experimental data collected during ORACLES. The values of the data products obtained with TiARA for the ORACLES case are in a reasonable range of numbers. Nevertheless, we need to confirm the quality of the results with *in situ* data and a more detailed uncertainty analysis.

Some of the strengths of TiARA are:

- • This methodology works with inversion with regularization, which allows for, e.g., great flexibility with regard to the resolution capacity of the investigated particle size distributions. We do not need to assume any specific shape of the particle size distribution.
- • TiARA allows for the unsupervised inversion of curtain plots of optical data acquired with NASA Langley’s airborne HSRL-2 instrument. TiARA is used for producing several microphysical particle parameters, i.e., effective radius, and number, surface-area, and volume concentrations to a level that until now could be obtained with the data-operator-controlled version of the inversion algorithm, e.g., Ref. [36].
- • Contrary to the manual version of our inversion methodology, we include for the first time number concentration as an additional standard data product in TiARA.
- • We managed to test different levels of uncertainties of optical input data. Although such work has been done in previous publications, it was never based on a statistically significant set of data. We tested hundreds of scenarios and used a wide range of data uncertainty levels ranging from 5% to 25%. We consider tests of such a range of uncertainties important, as they give us a much clearer picture on the maximum acceptable optical data uncertainty above which microphysical data products become useless. We find that 15% data uncertainty is a threshold value that needs to be considered in future instrument development. Of course, lower uncertainties are desirable, as they could further improve the data products (though no proof of this claim is available).
- • Another important outcome of this study is that data uncertainties above 20% clearly lead to a reduction of the quality (uncertainty respectively accuracy/precision) of data products obtained with TiARA.
- • Adding to this finding, i.e., impact of measurement uncertainties, we find that TiARA performs at least as well as our manually operated inversion algorithm with regard to a) quality of retrieved data products, and b) sensitivity to uncertainty of optical input data.
- • We obtain for the first time insight on the uncertainties of the retrieved data products by making use of the statistical analysis of a large number of simulations in which we use different properties of the investigated particle size distributions, i.e., size parameters and complex refractive index. This analysis, which could not have been achieved with a manual, data-operator-controlled algorithm, allowed us for the first time to introduce the concept of bias (or accuracy) and noise (or precision) in our uncertainty analysis.

Despite these strengths, TiARA still requires improvements in order to meet the high demands that are required with regard to an accurate description of atmospheric aerosol pollution. We mention just a few important points that serve as guidance for future work.

- • TiARA currently has been tested only for particle size distributions of monomodal shape. Work on bimodal size distributions is in progress but is affected by extreme challenges. It is a futile approach to combine two monomodal particle size distributions into bimodal distributions, particularly in view of the fact that we must obtain statistically significant results. The number of bimodal size distributions that can be generated from the monomodal size distributions used in this study leads to an amount of bimodal distributions that cannot be processed in any reasonable time frame, even with supercomputers that might be available to us.
- • We also need to consider that the weighting of each of the two monomodal size distributions that are needed to create a bimodal particle size distribution adds another element of complexity to the simulations. We thus need to think of smarter approaches to this specific problem of using bimodal particle size distributions in simulation studies.
- • The main challenge of retrieving an accurate and precise value of the CRI remains an open issue. We made progress in this work, as we now have a much clearer picture of where we stand, based on this statistical set of simulations. We also know from a previous study, [36], what uncertainty of the retrieved CRI (real and imaginary part) is permitted at maximum to obtain meaningful values of the SSA. Thus, the goal of future work will be to insert additional tools in TiARA that will help us to come closer to this ideal situation shown in Ref. [36].
- • We assume size- and wavelength-independent CRIs in the simulations. This assumption is a serious restriction and likely affects simulations with bimodal size distributions considerably stronger than simulations with monomodal size distributions. The number of possible scenarios that need to be tested to obtain statistically significant results is simply too large if we apply current strategies of simulation studies.
- • Despite the fact that we made progress with developing a robust uncertainty analysis scheme, we are still far from calling this task fulfilled. We tested two different uncertainty models (optical input data), i.e., the Gauss-error distribution model (GEM) and the extreme-error model (EEM). Both methods have advantages and disadvantages, though it is obvious that in the future, we must move to GEM. However, our simulations remain inconclusive with regard to the true performance of GEM, should, in the future, this uncertainty model of optical input data become the method of choice in TiARA. We are preparing a manuscript that will further elaborate on the topic of uncertainty analysis. We hope this work will bring us one step closer to solving this challenge in future versions of TiARA.
- • We explicitly excluded simulations with particle size distributions in which particles are of non-spherical shape.
- • We are still struggling with providing robust error bars (bias and noise), though we made significant progress in that regard in this contribution.
- • We exclude in our software the possibility of inverting data that represent particle sizes below 50 nm radius and above 10 micrometers. Although we are working on resolving that challenge at the larger particle radius range, we have not yet developed any realistic strategy that would allow us to identify particles below 30 nm radius, given the available lidar measurement wavelengths of 355 nm, 532 nm, and 1064 nm.

In this simulation study, we targeted a statistically based uncertainty analysis, which is in contrast to studies on the uncertainties of inversion results carried out with the data-operator-controlled inversion algorithm. TiARA has been developed for the large-scale processing of data from measurements with airborne and space-borne lidar. Thus, a statistical approach toward the performance characterization is necessary. We obtained first though very simple results in that regard, because of the amount of simulation data needed for a statistical analysis. Based on the statistical analysis of the simulations, we noticed that there is a systematic shift of the retrieved values of effective radius and number, surface-area, and volume concentrations compared to the true values. This shift is comparably small for surface-area concentration, and larger for the other parameters. With regard to real and imaginary parts of the CRI, we found a significant shift. The spread of the retrieval results around the true values also differs for the different data products. Surface-area concentration has the smallest spread. The largest spread is found for real and imaginary parts of the CRI.

We selected a benchmark of 50% retrieval accuracy of effective radius and number, surface-area, and volume concentrations as acceptable performance of this first version of TiARA. We achieve this goal if the optical input data are error free. This case also gives us insight into the mathematical uncertainties caused by TiARA. Simulations with different uncertainty levels of the input optical data show that satisfactory performance of TiARA on average can still be achieved if measurement uncertainties are $\le 15\%$.

Retrievals of the CRI are not satisfactory from a statistical point despite the situation that individual inversion cases deliver acceptable results. The retrieval accuracy of the real part needs to be better than $\pm 0.1$ and that of the imaginary part should be better than 50% in order to achieve an accuracy of approximately $\pm 0.05$ for SSA [36]. We achieve this accuracy of the CRI in less than 50% of all cases tested in this study. Particularly in the case of low light-absorbing particles, i.e., the imaginary part is less than $0.01i$, deviations between retrieved and true value are considerably larger than 50%. For these reasons, we will not use TiARA for the retrieval of the CRI before simulations indicate an improvement of the quality of this data product.

Simulations show that EEM is a comparably robust scheme for inferring uncertainty bars of the data products. As mentioned before, we want to use a new model of uncertainty analysis, i.e., GEM, in future versions of TiARA. In the present study, we found that simulation results with GEM are still inconclusive. For example, retrieval uncertainties do not increase in a clear way if uncertainties of the optical input data increase. Other results of this simulation study also suggest that we do not understand the mechanism of GEM in TiARA well enough to use it as standard procedure for our uncertainty analysis at the current stage of algorithm development.

We note that we do not use TiARA for analyzing optical data of particles with non-spherical geometry if the linear particle depolarization ratio is larger than approximately 5% [18]. Previous work has shown [32] that linear particle depolarization ratios below approximately 5% still provide us with reasonable values of the microphysical data products. We currently work on developing methods that allow us to invert optical data of depolarizing particles. A few results have been presented previously [39,40]. That work was based on using the data-operator-controlled version of the inversion software.

In future work, we will focus on the following main topics in order to further improve the performance of TiARA:

- • Application of TiARA to experimental data
- • Next version of post-processing of inversion results
- • Improvements of the quality of the derived CRI
- • Improved uncertainty analysis of data products
- • Simulations with bimodal particle size distributions

Table 4 shows a more comprehensive list of data products provided by TiARA. In this contribution, we focussed on key parameters that can be found in publications that reported on the manually operated version of the inversion algorithm, e.g., [4,8].

For example, TiARA provides data products in which we separate the fine-mode and the coarse-mode fraction of particle size distributions. We define 500 nm particle radius as the boundary between fine-mode and coarse-mode particles. In that regard, TiARA has the potential of providing additional information other than information on bulk parameters; see, e.g., [41]. TiARA provides detailed information on uncertainty levels that go beyond the usual concept of uncertainty bars.

The “x” symbols in Table 4 mark the data products presented in this contribution. The parameters marked by the “+” symbols can already be produced. We currently analyze the results of that part of the simulation studies. First results obtained from experimental data have recently been presented in Ref. [18].

The data products denoted by the “o” symbols are available for monomodal PSDs, but we want to include results based on bimodal PSDs, which have not yet been generated. We are currently developing automated software that allows us to produce the result tables. The simulations denoted by the “#” symbol pose a challenge as we currently use the AERONET light-scattering model for mineral dust [42]. We are preparing a publication that summarizes inversion results of optical data that describe scattering by mineral dust observed during DISCOVER-AQ California and DISCOVER-AQ Texas. This contribution will serve as a basis for developing a methodology suitable for the inversion of data that describe light scattering by non-spherical particles at 180º.

## APPENDIX A: MEANING OF NUMBERS IN FIGURE 5

The following list provides information on where the individual parts of the flow chart (Fig. 5) are explained in the main body of the text:

## APPENDIX B: SIMULATIONS: INPUT PARAMETERS

Table 5 lists the input parameters of microphysical parameters (median radius, geometrical standard deviation, and real and imaginary parts of the CRI) that were used to compute the optical data with a Mie-scattering algorithm [21]. The optical data were subsequently used as input for TiARA. The other parameters listed in this table show the data products retrieved with TiARA, i.e., particle effective radius and number, surface-area, and volume concentrations.

We are in the process of including particle radii below 50 nm into our LUTs, i.e., down to 1 nm. At that point, we will be able to analyze our simulations for which effective radii are less than 0.14 μm.

## APPENDIX C: SIMULATIONS WITH CHANNEL-DEPENDENT ERRORS

Table 6 lists scenarios in which measurement uncertainty depends on the measurement channel. These simulations will provide us with additional insight on the robustness of TiARA. These simulations target experimental scenarios that are more realistic. For example, uncertainties of the optical profiles may be different for the different measurement channels. We note that the four cases listed in this table by far do not describe the variety of scenarios that may occur under realistic measurement situations. Nonetheless, we will gain a better understanding if it is important that specific measurement channels should have lower measurement uncertainties than other measurement channels.

Simulations have already been carried out for these scenarios. We will present the results in a future presentation. We will use the next version of TiARA that will include the improved data post-processing scheme described in Refs. [14–16].

## APPENDIX D: HISTOGRAM DISTRIBUTIONS

Figure 16 shows once more the histogram distributions of the data products presented in Fig. 10. In contrast to Fig. 10, we did not scale the results to 100% on the $y$ axis. The shape of the distribution of the histogram columns can be clearly identified. As mentioned in the main body of the text, surface-area concentration can be retrieved exceptionally well. There seems to be a comparably well-defined shape of the distribution of the histogram columns for all data products. The shape in terms of peak and width varies for the different data products. In the future, we plan to parameterize these distribution functions. Parametrization allows us to provide important performance characteristics of TiARA in a highly compressed way. Thus, parametrization will also allow us to identify performance improvements of future versions of TiARA if they are significant. Furthermore, the parametrization allows us to compare our results to a study carried out in Ref. [30]. The authors of that study for the first time investigated the effects of systematic and random errors on the retrieval quality of particle microphysical data products based on their own inversion methodology.

## Funding

Langley Research Center (NASA Langley Research Center).

## Acknowledgment

We thank Dr. Matthias Tesche, formerly University of Hertfordshire (UK), now at University of Leipzig (Germany), for preparing several figures in this publication.

## REFERENCES

**1. **A. N. Tikhonov and V. Y. Arsenin, eds., *Solutions of Ill-Posed Problems* (Wiley, 1977).

**2. **D. Müller, “Inversionsalgorithmus zur Bestimmung Physikalischer Partikeleigenschaften aus Mehrwellenlängen-Lidarmessungen (inversion algorithm for the determination of particle properties from multiwavelength lidar measurements),” Ph.D. thesis (Universität1997).

**3. **D. Müller, U. Wandinger, D. Althausen, I. Mattis, and A. Ansmann, “Retrieval of physical particle properties from lidar observations of extinction and backscatter at multiple wavelengths,” Appl. Opt. **37**, 2260–2263 (1998). [CrossRef]

**4. **D. Müller, U. Wandinger, and A. Ansmann, “Microphysical particle parameters from extinction and backscatter lidar data by inversion with regularization: experiment,” Appl. Opt. **39**, 1879–1892 (2000). [CrossRef]

**5. **T. Murayama, D. Müller, K. Wada, A. Shimizu, M. Sekigushi, and T. Tsukamato, “Characterization of Asian dust and Siberian smoke with multi-wavelength Raman lidar over Tokyo, Japan in spring 2003,” Geophys. Res. Lett. **31**, L23103 (2004). [CrossRef]

**6. **Y. M. Noh, D. Müller, I. Mattis, H. Lee, and Y. J. Kim, “Vertically resolved light-absorption characteristics and the influence of relative humidity on particle properties: multiwavelength Raman lidar observations of East Asian aerosol types over Korea,” J. Geophys. Res. **116**, D06206 (2011). [CrossRef]

**7. **H. Baars, A. Ansmann, D. Althausen, R. Engelmann, B. Heese, D. Müller, P. Artaxo, M. Paixao, T. Pauliquevis, and R. Souza, “Aerosol profiling with lidar in the Amazon Basin during the wet and dry season,” J. Geophys. Res. **117**, D21201 (2005). [CrossRef]

**8. **I. Veselovskii, A. Kolgotin, V. Griaznov, D. Müller, U. Wandinger, and D. N. Whiteman, “Inversion with regularization for the retrieval of tropospheric aerosol parameters from multiwavelength lidar sounding,” Appl. Opt. **41**, 3685–3699 (2002). [CrossRef]

**9. **I. Veselovskii, A. Kolgotin, V. Griaznov, D. Müller, K. Franke, and D. N. Whiteman, “Inversion of multiwavelength Raman lidar data for retrieval of bimodal aerosol size distribution,” Appl. Opt. **43**, 1180–1195 (2004). [CrossRef]

**10. **C. Böckmann, I. Miranova, D. Müller, L. Scheidenbach, and R. Nessler, “Microphysical aerosol parameters from multiwavelength lidar,” J. Opt. Soc. Am. A **22**, 518–528 (2005). [CrossRef]

**11. **J. Bösenberg, V. Matthias, A. Amodeo, V. Amoiridi, A. Ansmann, J. M. Baldasano, I. Balin, D. Balis, C. Böckmann, A. Boselli, G. Carlson, A. Chaikovsky, G. Chourdakis, A. Comerón, F. D. Tomasi, R. Eixmann, V. Freudenthaler, H. Giehl, I. Grigorov, A. Hågård, M. Iarlori, A. Kirsche, G. Kolarov, L. Komguem, S. Kreipl, W. Kumpf, G. Larchevêque, H. Linné, R. Matthey, I. Mattis, L. Mona, D. Müller, S. Music, S. Nickovic, M. Pandolfi, A. Papayannis, G. Pappalardo, J. Pelon, C. Pérez, R. M. Perrone, R. Persson, D. P. Resendes, V. Rizi, R. Rocadenbosch, J. A. Rodriguez, L. Sauvage, L. Schneidenbach, R. Schumacher, V. Shcherbakov, V. Simeonov, P. Sobolewsky, N. Spinelli, I. Stachlewska, D. Stoyanov, T. Trickl, G. Tsaknakis, G. Vaughan, U. Wandinger, X. Wang, M. Wiegner, M. Zavrtanik, and C. Zerefos, “EARLINET: A European aerosol research lidar network to establish an aerosol climatology,” Technical report No. 348 (2003).

**12. **D. Müller, U. Wandinger, and A. Ansmann, “Microphysical particle parameters from extinction and backscatter lidar data by inversion with regularization: simulation,” Appl. Opt. **38**, 2358–2368 (1999). [CrossRef]

**13. **E. Chemyakin, D. Müller, S. Burton, A. Kolgotin, C. Hostetler, and R. Ferrare, “Arrange and average algorithm for the retrieval of aerosol parameters from multiwavelength high-spectral-resolution lidar/Raman lidar data,” Appl. Opt. **53**, 7252–7266 (2014). [CrossRef]

**14. **A. Kolgotin, D. Müller, E. Chemyakin, and A. Romanov, “Improved identification of the solution space of aerosol microphysical properties derived from the inversion of profiles of lidar optical data, part 1: theory,” Appl. Opt. **55**, 9839–9849 (2016). [CrossRef]

**15. **A. Kolgotin, D. Müller, E. Chemyakin, and A. Romanov, “Improved identification of the solution space of aerosol microphysical properties derived from the inversion of profiles of lidar optical data, part 2: simulations with synthetic data,” Appl. Opt. **55**, 9850–9865 (2016). [CrossRef]

**16. **A. Kolgotin, D. Müller, E. Chemyakin, A. Romanov, and V. Alehnovich, “Improved identification of the solution space of aerosol microphysical properties derived from the inversion of profiles of lidar optical data, part 3: case studies,” Appl. Opt. **57**, 2499–2513 (2018). [CrossRef]

**17. **D. Müller, C. A. Hostetler, R. A. Ferrare, S. P. Burton, E. Chemyakin, A. Kolgotin, J. W. Hair, A. L. Cook, D. B. Harper, R. R. Rogers, R. W. Hare, C. S. Cleckner, M. D. Obland, J. Tomlinson, L. K. Berg, and B. Schmid, “Airborne multiwavelength high spectral resolution lidar (HSRL-2) observations during TCAP 2012: vertical profiles of optical and microphysical properties of a smoke/urban haze plume over the northeastern coast of the US,” Atmos. Meas. Tech. **7**, 3487–3496 (2014). [CrossRef]

**18. **P. Sawamura, R. H. Moore, S. P. Burton, E. Chemyakin, D. Müller, A. Kolgotin, R. A. Ferrare, C. A. Hostetler, L. D. Ziemba, A. J. Beyersdorf, and B. E. Anderson, “HSRL-2 aerosol optical measurements and microphysical retrievals vs. airborne in situ measurements during DISCOVER-AQ 2013: an intercomparison study,” Atmos. Chem. Phys. **17**, 7229–7243 (2017). [CrossRef]

**19. **A. Ansmann and D. Müller, “Lidar and atmospheric aerosol particles,” in *Lidar Range-Resolved Optical Remote Sensing of the Atmosphere*, C. Weitkamp, ed. (Springer, 2005), pp. 105–141.

**20. **D. Müller, U. Wandinger, and A. Ansmann, “Microphysical particle parameters from extinction and backscatter lidar data by inversion with regularization: theory,” Appl. Opt. **38**, 2346–2357 (1999). [CrossRef]

**21. **C. F. Bohren and D. R. Huffman, eds., *Absorption and Scattering of Light by Small Particles* (Wiley, 1983).

**22. **D. Müller, U. Wandinger, D. Althausen, and M. Fiebig, “Comprehensive particle characterization from three-wavelength Raman-lidar observations,” Appl. Opt. **40**, 4863–4869 (2001). [CrossRef]

**23. **D. L. Phillips, “A technique for the numerical solution of certain integral equations of the first kind,” J. Assoc. Comput. Mach. **9**, 84–97 (1962). [CrossRef]

**24. **S. Twomey, “The application of numerical filtering to the solution of integral equations encountered in indirect sensing measurements,” J. Franklin Inst. **279**, 95–109 (1965). [CrossRef]

**25. **V. E. Zuev and I. E. Naats, eds., *Inverse Problems of Lidar Sensing of the Atmosphere* (Springer, 1983).

**26. **S. Twomey, ed., *Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements* (Elsevier, 1977).

**27. **R. A. Horn and C. R. Johnson, eds., *Matrix Analysis* (Cambridge University, 1985).

**28. **I. Veselovskii, A. Kolgotin, V. Griaznov, D. Müller, and D. N. Whiteman, “Information content of multiwavelength lidar data with respect to microphysical particle properties derived from eigenvalue analysis,” Appl. Opt. **44**, 5292–5303 (2005). [CrossRef]

**29. **S. P. Burton, R. A. Ferrare, C. A. Hostetler, J. W. Hair, R. R. Rogers, M. D. Obland, C. F. Butler, A. L. Cook, D. B. Harper, and K. D. Froyd, “Aerosol classification using airborne high spectral resolution lidar measurements—methodology and examples,” Atmos. Meas. Tech. **5**, 73–98 (2012). [CrossRef]

**30. **D. Pérez-Ramírez, D. N. Whiteman, I. Veselovskii, A. Kolgotin, M. Korenskiy, and L. Alados-Arboledas, “Effects of systematic and random errors on the retrieval of particle microphysical properties from multiwavelength lidar measurements using inversion with regularization,” Atmos. Meas. Tech. **6**, 3039–3054 (2013). [CrossRef]

**31. **S. P. Burton, E. Chemyakin, X. Liu, K. Knobelspiesse, S. Stamnes, P. Sawamura, R. H. Moore, C. A. Hostetler, and R. A. Ferrare, “Information content and sensitivity of the 3β + 2α lidar measurement system for aerosol microphysical retrievals,” Atmos. Meas. Tech. **9**, 5555–5574 (2016). [CrossRef]

**32. **U. Wandinger, D. Müller, C. Böckmann, D. Althausen, V. Matthias, J. Bösenberg, V. Weiss, M. Fiebig, M. Wendisch, A. Stohl, and A. Ansmann, “Optical and microphysical characterization of biomass-burning and industrial-pollution aerosols from multiwavelength lidar and aircraft measurements,” J. Geophys. Res. **107**, D21 (2002). [CrossRef]

**33. **C. D. Rodgers, ed., *Inverse Methods for Atmospheric Sounding: Theory and Practice* (World Scientific, 2000).

**34. **V. Bovchaliuk, P. Goloub, T. Podvin, I. Veselovskii, D. Tanré, A. Chaikovsky, O. Dubovik, A. Mortier, A. Lopatin, M. Korenskiy, and S. Victori, “Comparison of aerosol properties retrieved using GARRLiC, LIRIC, and Raman algorithms applied to multi-wavelength lidar and sun/sky-photometer data,” Atmos. Meas. Tech. **9**, 3391–3405 (2016). [CrossRef]

**35. **P. Sawamura, D. Müller, R. M. Hoff, C. A. Hostetler, R. A. Ferrare, J. W. Hair, R. R. Rogers, B. E. Anderson, L. D. Ziemba, A. J. Beyersdorf, K. L. Thornhill, E. L. Winstead, and B. N. Holben, “Aerosol optical and microphysical retrievals from a hybrid multiwavelength lidar dataset – DISCOVER-AQ 2011,” Atmos. Meas. Tech. **7**, 3095–3112 (2014). [CrossRef]

**36. **D. Müller, C. Böckmann, A. Kolgotin, L. Schneidenbach, E. Chemyakin, J. J. Rosemann, P. Znak, and A. Romanov, “Microphysical particle properties derived from inversion algorithms developed in the framework of EARLINET,” Atmos. Meas. Tech. **9**, 5007–5035 (2016). [CrossRef]

**37. **M. Tesche, A. Ansmann, D. Müller, D. Althausen, R. Engelmann, V. Freudenthaler, and S. Groß, “Vertically resolved separation of dust and smoke over Cape Verde by using multiwavelength Raman and polarization lidar during SAMUM 2008,” J. Geophys. Res. **114**, D13202 (2009). [CrossRef]

**38. **R. E. Mamouri and A. Ansmann, “Separating mixtures of aerosol types in airborne HSRL data,” Atmos. Meas. Tech. **7**, 419–436 (2013). [CrossRef]

**39. **I. Veselovskii, O. Dubovik, A. Kolgotin, M. Korenskiy, D. Whiteman, K. Allakhverdiev, and F. Huseyinoglu, “Linear estimation of particle bulk parameters from multi-wavelength lidar measurements,” Atmos. Meas. Tech. **5**, 1135–1145 (2012). [CrossRef]

**40. **D. Müller, I. Veselovskii, A. Kolgotin, M. Tesche, A. Ansmann, and O. Dubovik, “Vertical profiles of pure dust and mixed smoke-dust plumes inferred from inversion of multiwavelength Raman/polarization lidar data and comparison to AERONET retrievals and in situ observations,” Appl. Opt. **52**, 3178–3202 (2013). [CrossRef]

**41. **M. de Graaf, A. Apituley, and D. Donovan, “Feasibility study of integral property retrieval for tropospheric aerosol from Raman lidar data using principle component analysis,” Appl. Opt. **52**, 2173–2186 (2013). [CrossRef]

**42. **O. Dubovik, A. Sinyuk, T. Lapyonok, B. N. Holben, M. Mishchenko, P. Yang, T. F. Eck, H. Volten, O. Muñoz, B. Veihelmann, W. J. van der Zande, J.-F. Leon, M. Sorokin, and I. Slutsker, “The application of spheroid models to account for aerosol particle nonsphericity in remote sensing of desert dust,” J. Geophys. Res. **111**, D11208 (2006). [CrossRef]