## Abstract

A shape dependent method for particle size distribution (PSD) estimation based on bulk scattering properties was elaborated. This method estimates the parameters of a particle size distribution with predefined shape from the bulk scattering spectra. The estimation routine was validated on simulated data of polystyrene in water suspensions. To investigate the effect of measurement errors on PSD estimates, a sensitivity analysis was performed. The influence of spectral resolution and range was rather limited. Good PSD estimations were obtained on noise-free spectra, spectra with limited random noise and for estimations on *μ _{s}* or ${\mu}_{s}^{\prime}$ in case of a multiplicative baseline. However, the PSD estimation deteriorated if an incorrect value for the refractive index of the particle relative to the medium was used as input parameter. Deviations caused by an incorrect distribution type were smaller for more narrow PSDs than for broader ones. Overall, this study showed the potential to estimate PSDs from bulk scattering spectra and indicated the factors affecting the accuracy.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

The particle size distribution (PSD) of a suspension or emulsion is an important quality characteristic, because it determines technical properties such as viscosity [1] or stability during storage. Moreover, the uptake of nutrients or active pharmaceutical ingredients during digestion is also related to particle size [2]. Therefore, PSDs are frequently measured in industry and research. As the different phases in an emulsion or suspension typically have a different refractive index, these samples are turbid. According to Mie theory, light scattering by a spherical particle can be described as a function of the particle size, wavelength and refractive indices of both medium and particle. The probability that light is deviated from its straight path in combination with the direction of scattering, results in a specific angular and wavelength dependent scattering pattern.

These characteristic light scattering patterns contain valuable information on the particle size. Therefore, current particle size distribution measurement techniques use this Mie scattering approach for PSD estimation of turbid samples. Laser diffraction methods measure the angular distribution of scattered light to determine the PSD [3]. This volume based technique requires the knowledge of the complex refractive indices of all components. The sensitivity is slightly biased towards large particles, because the volume of the particles determines their relative weight. To prevent particles from interfering with each other’s scattering pattern, samples should be highly diluted to reach a single scattering regime. Such destructive and off-line measurements at high dilutions may cause artefacts such as the disruption of particle clusters in the original samples. Another particle sizing technique is dynamic laser light scattering [3]. Contrary to diffraction methods, no information on the refractive index is required, but rather on the viscosity and temperature of the medium. This intensity based technique derives particle size from fluctuations in scattered light intensity caused by Brownian motion of the particles. Therefore, this technique is more suitable for submicron to micron particles. Just like in case of laser light scattering, the sample has to be diluted to prevent multiple scattered light from reaching the detectors. A third type of particle sizers are the number based methods or ’particle counters’. By leading the particles one after another through an orifice, the change in conductivity as a particle passes through the opening creates a signal proportional to the particle size [3]. For submicron particles, number based PSDs can also be obtained by (nano)particle tracking analysis. The light scattered by these small particles is captured as point scatterers on microscopic images. Tracking the Brownian motion of individual particles in consecutive images captured by a camera allows to link the amount of movement to the particle size [4]. Number based methods tend to give more weight to small particles compared to volume based methods. As reported by Goddeeris et al. [5] for the case of self-microemulsifying drug delivery systems, every technique has its strengths and weaknesses as mentioned above. Due to the limitations of these methods, the obtained PSDs should all be interpreted critically.

Although Mie theory has been defined for individual particles, it can also be used in a single scattering regime to calculate the contributions of individual particles to the total scattering, as used in laser light scattering measurements. Moreover, if the concentration is low enough for particles to not influence each other’s scattering behaviour, the scattering contributions of all particles can be combined similarly in the case of multiple scattering. To reach this independent scattering regime, samples do not have to be so extremely diluted as in case of single scattering. Aernouts et al. developed a tool to simulate bulk optical properties of polydisperse systems in the independent scattering regime based on Mie theory [6, 7]. For a known particle size distribution and refractive index of particles and medium, the bulk optical properties (BOP) can be calculated at the desired wavelengths. The bulk absorption coefficient (*μ _{a}*) is related to the chemical composition of the sample. Steponavicius & Thennadil used the bulk optical properties to obtain pure absorption spectra (

*μ*) for determining the concentration of absorbing components [8, 9]. On the other hand, microphysical characteristics such as particle size and concentration are linked to the bulk scattering coefficient (

_{a}*μ*), anisotropy factor (

_{s}*g*) and the derived reduced scattering coefficient $({\mu}_{s}^{\prime})$. For milk as example, Aernouts et al. [10] illustrated the relation between milk composition and the bulk absorption coefficient

*μ*on the one hand, and fat globule size, bulk scattering coefficient

_{a}*μ*and anisotropy factor

_{s}*g*on the other hand.

Since bulk scattering properties contain pure microphysical information, *μ _{s}*, ${\mu}_{s}^{\prime}$ and

*g*spectra have the potential to serve as a basis for estimating particle size distributions. However, such an inverse estimation of PSDs from optical measurements is an ill-conditioned problem that requires regularisation to obtain a physical meaningful solution. Several regularization techniques have been described in literature [11,12], which can be classified in two general approaches: shape independent and shape dependent methods. Firstly, the shape independent methods estimate the PSD as a weighted combination of basis functions imposing smoothness conditions on the solution, e.g. the Tikhonov regularization [13–17] or the iterative modified Chahine algorithm [18]. Secondly, there is the shape dependent approach in which the unknown PSD is assumed to be well-described by the parametric shape of a probability density function. In this case, the distribution parameters are estimated by matching the spectra of the estimates with the measured spectra in an optimization problem [14, 19]. Such a PSD estimation routine based on bulk scattering properties would require samples only to be diluted to an independent scattering regime, e.g. to a particle volume concentration below 3%, as reported by Aernouts et al. for the case of homogenized milk samples [7].

To estimate the PSD of real suspension or emulsion samples, the BOP should be determined experimentally based on reflection and transmission measurements. The most accepted method for BOP determination combines a double integrating sphere (DIS) set-up to measure simultaneously the total reflectance and total transmittance, with an unscattered transmission measurement [7,10,20]. If only one integrating sphere is used, the total reflection and total transmission are measured consequently by placing the sample respectively at the exit and entrance port of the sphere [8, 9]. Although the combination of DIS with unscattered transmission is widespread, Thennadil and Chen proposed some alternative ways to determine the BOP by making DIS measurements at multiple sample thicknesses to replace the unscattered transmission measurement [21]. To extract the bulk optical properties from the above-mentioned reflectance and transmission measurements, the inverse adding-doubling (IAD) routine is typically used. In this method, the adding-doubling approach for light propagation modelling is used iteratively to update the initial set of BOP until the reflection and transmission values match the measured values [22].

In order to calculate the BOP, the IAD routine requires input on the sample refractive index. This total sample refractive index can be calculated as a weighted combination of the refractive indices of the sample components if their relative quantity in the sample is known. Moreover, the difference in refractive index between particles and medium is used in the optimization step of the PSD estimation to calculate the scattering spectra of the intermediate solutions. For frequently used materials, accurate refractive indices can be found in literature. For more complex or rare materials, however, it is not always easy to find the correct value or wavelength dependency.

To examine the impact of uncertainty on these input parameters on the accuracy of the estimated PSDs, we designed a study to validate a shape dependent routine for estimating particle size distributions and particle volume fractions based on simulated bulk scattering spectra. To investigate how measurement errors and uncertainties on the input parameters of the estimation procedure affect the estimated PSD, multiple cases will be considered in a sensitivity analysis. The effect of spectral resolution, range and type of the scattering spectrum will be examined, as well as the consequences of random and baseline noise on the spectra, representing a decomposition of the measurement noise on BOP determined with e.g. a DIS set-up. Moreover, uncertainties on the level and wavelength dependency of the particle refractive index used in the IAD algorithm will be considered. As an important factor of the shape dependent PSD estimation method, the adopted distribution type will be varied to investigate the consequences of the selection of an incorrect distribution type. This systematic study of PSD-estimation sensitivity to disturbances will provide valuable insights into the effect of algorithm choices and measurement errors when estimating PSDs from spectroscopic measurements.

## 2. Materials and methods

#### 2.1. Inverse estimation of PSDs from bulk scattering properties

### Forward model

The bulk optical properties (*μ _{a}*,

*μ*, ${\mu}_{s}^{\prime}$ and

_{s}*g*) for a system of spherical particles suspended in a medium can be calculated using Mie theory. In the case of polydisperse systems, the contributions of all distinct particle sizes have to be taken into account. Therefore, the PSD has to be discretized when simulating the bulk optical properties for a known PSD. Each discretization interval is considered as a monomodal system, i.e. one particle radius per interval. The BOP of every interval are computed according to Mie theory, and the BOP of the polydisperse system are calculated as the weighted sum of all contributions. Instead of fixed discretization intervals, the forward simulation tool used here (developed by Aernouts et al. [6]), systematically increases the number of intervals until a predefined accuracy level on the BOP is reached. The exact parameter values of the lognormal reference PSDs used in the forward simulations are given in section 2.2. This routine is repeated independently for all selected wavelengths. All calculations and estimations were done in Matlab (version 9.1, The Mathworks Inc., Massachusetts, USA).

### Inverse estimation

Since the forward relation between particle size and bulk scattering properties given by Mie theory cannot be inverted analytically, the inverse estimation was formulated as an optimization problem. This ill-conditioned problem was regularized by adopting a shape dependent estimation method. In this method, the parametric shape of a probability density function (PDF) e.g. lognormal, normal or Weibull, is assumed to be a good approximation for the unknown PSD [14,19,23,24]. This way of regularization ensures the PSD estimate to be non-negative and smooth. The two distribution parameters (Table 1) are estimated simultaneously with the volume fraction (VF) of particles. The sum of squared relative errors between the input scattering spectrum and the spectrum corresponding to the estimated PSD is minimized using the cost function in Eq. (1), where *N _{λ}* is the total number of wavelengths in the spectrum.

Depending on the choice of input scattering spectrum, ${\mu}_{s}^{\prime}$ in Eq. (1) can be replaced by *μ _{s}* or

*g*. The log

_{10}was added to enhance the contrast in broad minima of the cost function, thereby reducing the possibility to strand close to the optimum. In the constrained optimization, distribution parameter values and VF were rescaled to respectively a lower and upper boundary of 0.5 and 1.5 to avoid possible effects of parameter magnitude. The optimization problem was solved with Matlab’s fmincon function for constrained nonlinear multivariable problems using the active set algorithm [25]. Multiple starting points (15) were used to reduce the possibility of a local minimum to be selected as final solution. The starting points are the corners (8), the centres of the sides (6) and the centre of the cubic solution space spanned by the parameter boundaries. The solution with the overall lowest cost was selected as final PSD estimate. In Fig. 1, a schematic overview of the inverse estimation routine and the link with forward BOP simulations is given.

#### 2.2. Validation on unperturbed simulated BOP

The PSDs to start from in the forward simulation were defined as lognormal PSDs with a nominal diameter mode of 0.1, 0.2, 0.5, 1, 2, 3, 4.5, 6, 10 and 12 μm. For every diameter mode, three distribution widths were considered (*σ* of 0.1, 0.2 or 0.3), while maintaining a constant coefficient of variation for all PSDs per distribution width level. In total, this resulted in 30 reference PSDs of which the exact distribution parameters are listed in Table 2. When referring to these reference PSDs, their nominal diameter mode will be used followed by 1*σ*, 2*σ* or 3*σ* to indicate the distribution width (respectively narrow, medium or wide). Note that although the lognormal distributions were formulated as a function of the particle radius, the PSDs in the results section are plotted with the particle diameter on the x-axis for consistency with naming.

The volume fraction particles was set at 1% v/v, to stay in the independent scattering regime, where the relationship between *μ _{s}* and VF is linear. This linear relationship was also present in the BOP simulated with the microscale light propagation tool of Aernouts et al. [6], as the independent scattering assumption is embedded in this tool. To check for a corresponding linearity in VF estimates, all PSDs were estimated based on

*μ*spectra at particle concentrations of 0.5-2% v/v. Contrary to

_{s}*μ*,

_{s}*g*is concentration independent in the independent scattering regime and therefore, no VF estimates could be made based on

*g*spectra.

The complex refractive index of water was used as reported by Segelstein [26]. For polystyrene particles, a fit of the Sellmeier dispersion equation to the data of Sultanova et al. [27] was used as real part of the refractive index [28]. The imaginary part related to absorption was fixed at zero, which is a reasonable assumption when the imaginary part of the refractive index is very small compared to the real part [29]. The unperturbed noise-free BOP were calculated in the 0.5–1.85 μm wavelength range with a 1 nm resolution.

#### 2.3. Sensitivity analysis

### Distribution type

The PSD estimation algorithm (see 2.1) allowed both lognormal, normal and Weibull distributions to be selected as type of PSD estimate. For every distribution type, lower and upper parameter boundaries of the constrained optimization (see Table 3) were defined as respectively 80% of the minimal and 120% of the maximal reference parameter values. For lognormal estimates, reference values were those of the lognormal reference PSDs (Table 2). In case of normal and Weibull estimates, reference values were found as the least-squares solutions when matching the function values of a normal or Weibull distribution directly to those of the lognormal reference PSDs. Only the upper boundary of *b* (Weibull) was chosen slightly larger, because a preliminary test showed that the original upper boundary of 120% of the maximal *b* -value was often restrictive. The performance of the three distribution types was evaluated on PSD estimates based on spectra with both a 1 nm and 10 nm wavelength resolution.

### Scattering spectrum type, wavelength resolution and range

To examine the effect of scattering spectrum type on the PSD estimation routine, all PSDs were estimated based on *μ _{s}*,

*μ*

_{s}^{0}and

*g*spectra. To assess the effect of spectral resolution, PSD estimates based on spectra with a 1 nm and 10 nm resolution were compared. The role of spectral range was inspected by using the range of 0.5–1.0 μm wavelength (’Vis’), 1.0–1.85 μm (’NIR’) or the full 0.5–1.85 μm range (’VisNIR’). This division in Vis and NIR is based on detector ranges, as for visual light and up to 1000 nm wavelength Si detectors can be used, while for wavelengths above 1000 nm InGaAs detectors are typically used.

### Noise on bulk scattering spectra

To mimic the noise on BOP determined with a DIS set-up, three specific types of noise were defined as random variations or baseline effects. First, random noise was derived from experimentally determined BOP for a range of liquid optical phantoms with Intralipid as scattering agent and Naphtol Blue-Black as absorber (similar to the optical phantoms in [30]). This resulted in Gaussian random noise with mean zero and standard deviation 0.0018 for *μ _{s}*-spectra, 0.0049 for ${\mu}_{s}^{\prime}$-spectra and 0.0052 for

*g*-spectra. These standard deviations were halved and doubled to create in total three levels of random noise. Secondly, a multiplicative baseline effect was created by multiplying the complete spectrum with 90%, 95%, 105% or 110% (i.e. ±5 & ±10%). Thirdly, the spectral mean over all wavelengths was calculated. By adding ±5% or ±10% of this mean to the complete spectrum, four levels of additive baseline were produced. The baseline variations mimicked experimental errors in particle concentration or sample thickness.

For all influencing factors considered, every PSD estimate was made in 6-fold: one time on a noise-free spectrum (only variation in influencing factor of interest) and 5 times with standard random noise added. In the case of random noise being the influencing parameter, five estimates were made for each noise level.

### Variation in refractive index

BOP simulations over the 0.5–1.85 μm range with a 10 nm resolution were performed for different values of the refractive index of polystyrene to investigate the effect of uncertainties on the particle refractive index values or uncertainties on the wavelength dependency of this refractive index. Baseline variation was introduced by multiplying the refractive index spectrum with 0.95, 0.98, 0.995, 1.005, 1.02 or 1.05. The slope of the refractive index spectrum was altered by fixing the value at 1.17 μm (*n _{fixed}*), multiplying the difference between the maximum (

*n*and the fixed value, respectively the fixed value and the minimum (

_{max}*n*), with a

_{min}*factor*of 0.5)or 0.8 to ’squeeze’ the spectrum and with 1.2 or 1.5 to ’stretch’ the spectrum (Eq. (2) – (5)).

### Combined cases

In practice, it is not unlikely that several of the variations discussed above occur together. Therefore, three combined cases were defined as examples. First, a lognormal PSD of 3 μm 1*σ* was estimated on a *μ _{s}* spectrum with a +10% multiplicative baseline and a −2% baseline on the particle refractive index. The second case consisted of a Weibull distribution estimate for the 10 μm 2

*σ*PSD based on a

*g*spectrum with 1× random noise. The last case was a normal distribution estimate of the 4.5 μm 3

*σ*PSD based on a ${\mu}_{s}^{\prime}$ spectrum with a +3% additive baseline.

### Accuracy of PSD estimation

The accuracy of estimated lognormal PSDs and VF was evaluated by comparing parameter values estimated on the BOP, both noise-free and noisy, with the parameters used for the forward simulation. In case of normal and Weibull distributions, no direct comparison of estimated parameter values and lognormal reference values can be made. Therefore, more general distribution characteristics, such as the mode, D10, D50, D90 (respectively the 10th, 50th and 90th percentile), d32 (Sauter mean diameter), d43 (De Broucker mean diameter), span and skewness were considered.

PSD estimates and reference PSDs were compared in a non-parametric one sample Kolmogorov-Smirnov test (KS test) with a significance level *α* of 5% [31]. For the PSDs estimated in five-fold based on spectra with standard random noise, the average estimated parameter values were used. Given the lack of actual measurements, measurement points were defined as 25 points equally spaced on a logarithmic scale between the 1% and 99% percentile of each reference PSD. This way, a limited set of points in a relevant particle radius span could capture the curvature of the PSD correctly. Thereby, it is avoided that good estimates would be found significantly different from the reference PSD due to a too low resolution. As the reference PSDs followed a lognormal distribution, an additional *t*-test was made on the distribution parameters and VF estimated in five-fold on spectra with standard random noise.

## 3. Results and discussion

The simulated *μ _{s}* and

*g*spectra varied in appearance from smoothly decreasing with increasing wavelength for submicron particles (Fig. 2(a)), to a wavy shape for narrow distributions of 6–12 μm (Figs. 2(c)–2(d)). This oscillating behaviour decreases with increasing polydispersity of the PSD. Moreover, in ${\mu}_{s}^{\prime}$ the oscillations in

*μ*and

_{s}*g*cancel out, resulting in rather smooth spectra (not shown). The bulk absorption coefficient (

*μ*) spectra were identical for all PSDs considered since the same refractive indices and particle concentrations were used in the simulations. Although these were not used for PSD estimations, some examples are shown in Fig. 2(b).

_{a}#### 3.1. Linear relation between particle concentration and μ_{s}

The estimates for the volume fraction particles obtained from *μ _{s}* spectra multiplied with a factor 0.5–2 were close to the 1% v/v VF multiplied with the respective factor (

*R*

^{2}> 0.9999), as shown in Fig. 3(a). Estimated

*μ*and

*σ*were only marginally affected (

*μ*not shown), but in general the estimated peak position of the PSD (related to

*μ*) proved to be slightly more accurate than the distribution width (related to

*σ*). This can be seen in the coefficients of variation (CV) in Table 4, where the CV for

*μ*is slightly lower than the CV for

*σ*, if the 0.1 μm 1

*σ*PSD is left out of consideration because of the more difficult estimation, as explained in the following paragraph.

This linear relationship between VF and *μ _{s}* is in agreement with the assumed independent scattering regime, in which e.g. a doubling of the particle concentration results in a

*μ*- spectrum multiplied by two. Therefore, estimating a PSD at different concentration should not affect the estimated distribution parameters, while the VF estimates vary according to the concentration. Only for the smallest particle size (0.1 μm) small deviations from this linear relationship can be noticed. This is more pronounced in distribution parameter

_{s}*σ*, as leaving out the PSD of 0.1 μm 1

*σ*resulted in a clear drop in CV (Table 4). These deviations may be explained by the limited effect of light scattering by these small particles in the considered wavelength range, which complicates the optimization.

Based on these findings, further PSD estimations were limited to a VF of 1% as the assumption of independent scattering was found to be valid for the tested concentrations. For concentrations > 2%, the effect of dependent scattering should be taken into account. Such a correction for light scattering in concentrated systems was for example incorporated in a PSD estimation routine by Frontini et al. [14]. They used the Pedersen local monodisperse approximation for dense systems with a low refractive index difference between particles and medium.

#### 3.2. Choice of distribution type

In practical applications, the exact PSD shape is unknown, but we typically have an idea of the expected shape and probability density distributions which could provide a reasonable approximation. For example, Byat et al. reported a wide range of distributions to model soil samples [32], while the lognormal distribution has been used for stratospherical particles [33]. When applying a shape dependent method, it is not guaranteed that the chosen probability density function can provide a reasonable approximation of the actual PSD. Therefore, the three distribution types incorporated in the PSD estimation algorithm were compared for estimating the lognormal reference PSDs (Fig. 4).

The estimated lognormal distributions almost perfectly match the reference PSDs. The symmetric normal distributions on the other hand, were incapable of fitting the right-tailed reference PSDs. Estimates of narrow distributions can still be relatively accurate, but the wider the reference, the more erratic it will be. This is clear when comparing skewness values: normal distributions have per definition a skewness of zero, which results in a systematic underestimation. In general, the D10 was underestimated because of the symmetric widening of the normal estimates when trying to fit the right tail. The D90 value showed a larger underestimation for (middle) large particles. The underestimated span and overestimated mode of particles ≥1 μm resulted in too narrow PSDs shifted towards larger sizes. The VF is underestimated to compensate the higher scattering of the too narrow estimates. For submicron particles on the other hand, the mode was estimated relatively accurate, while the span and VF were overestimated. The d32 and d43 were in general more underestimated for wider PSDs, especially for 10–12 μm.

When the Weibull distribution was used in the estimation, the long right tail of lognormal PSDs was also not fitted well. The skewness deviates strongly, resulting particularly in an overestimation for submicron particles. Note that the logarithmic x-axis in Fig. 4 might be somewhat misleading. The estimated *μ*, *σ* and VF of submicron and larger particles show the same trends as those described for normal distributions. Again, the D10 was typically underestimated with large deviations for 1, 10 & 12 μm estimates based on *μ _{s}*, and for 1–2 μm based on

*g*. The D90 estimates based on

*μ*deviated for 1 μm particles, those based on ${\mu}_{s}^{\prime}$ for large and wide PSDs, and those based on

_{s}*g*deviated for 1–2 μm particles. To conclude, d32 and d43 showed strong deviations in the same cases as both D10 and D90, and an increased underestimation with PSD width.

The normal and Weibull estimates deviated already substantially from the reference PSDs in case of spectra without and with standard random noise. According to the KS test, only three Weibull estimates were not significantly different from the reference: 10 μm 3*σ* (on ${\mu}_{s}^{\prime}$, p=0.0904; on *g*, p=0.6629) and 12 μm 2*σ* (on ${\mu}_{s}^{\prime}$, p=0.1594). Therefore, it was considered unlikely that their performance would improve in the case of more noisy spectra. For this reason, the results discussed in the following sections will be limited to lognormal PSD estimates.

#### 3.3. Wavelength resolution

PSD estimations made on scattering spectra with a 1 nm and 10 nm resolution were compared. Based on the results from the *t*-test on estimated parameter values, it was concluded that both resolutions allow to make correct estimations of the distribution parameters and VF. The 1 nm resolution did not systematically outperform estimations based on 10 nm resolution spectra. According to the KS test, only in two cases the estimates based on a 10 nm resolution were categorized as different from the reference, while the 1 nm estimates were not (Fig. 5). This was the case for the PSD of 0.1 μm 1*σ* estimated on ${\mu}_{s}^{\prime}$ with standard random noise (p=0.005 for 10 nm resolution) and 3 μm 3*σ* estimated on *g* with random noise (p=0 for 10 nm resolution). Since the 1 nm resolution resulted in better estimates for only two PSDs, of which the respective scattering spectra contain only limited size information (see section 3.7), all further analyses were done for a 10 nm spectral resolution.

In general, including more wavelengths in a spectrum increases the amount of information and can help to make the PSD estimation more robust. This is in line with the findings of Frontini et al. who reported increasing errors on estimated distribution parameters as the number of data points was reduced [14]. However, a slightly lower spectral resolution of these smooth scattering spectra may still perform equally well, while reducing the simulation or measurement time needed.

Distribution parameters estimated on *g* spectra show large deviations for PSD’s with mode 2–3 μm, both in case of noise-free spectra as in case of random noise. Parameter *μ* (~ peak position) is largely overestimated, while *σ* (~ distribution width) on the other hand is underestimated. This will be discussed in more detail in section 3.7. For 10–12 μm particles, a larger standard deviation in case of random noise is present for the 10 nm resolution, especially for estimated *σ*. This indicates that random noise is less disturbing for estimates of *μ* compared to *σ* estimates, as *μ* is mainly related to the level of the *g* spectrum. On the other hand, *σ* is more related to the smoothness of the *g* spectrum. Since spectra for these large particles show little variation in magnitude over the complete wavelength range, small disturbances by random noise can complicate a correct estimation of the distribution width.

#### 3.4. Wavelength range: Vis, NIR or both?

For PSD estimates based on *μ _{s}* and ${\mu}_{s}^{\prime}$ spectra, three zones of particle size can be distinguished in the parameter values estimated on the Vis, NIR and Vis/NIR range. The estimates for small particles (0.1–0.2 μm) showed clearly more deviations in both distribution parameters and VF in case of the NIR range (Fig. 6 left). Large particles (10–12 μm) on the other hand show a larger spread for estimates on the Vis range, although the trend is less clear in case of ${\mu}_{s}^{\prime}$ compared to the example for

*μ*shown in Fig. 6 (right). However, both distribution parameters and VF estimated based on ${\mu}_{s}^{\prime}$ in the Vis range tend to deviate for 10 μm particles. Thirdly,

_{s}*μ*was well-estimated for the medium size particles (0.5–6 μm). For

*σ*and VF estimated on ${\mu}_{s}^{\prime}$, errors occurred in the Vis range for 1 μm to 2 μm 2

*σ*, while mainly in the NIR range for 2 μm 3

*σ*– 6 μm. For estimates based on

*μ*, deviations occur in the NIR range for

_{s}*σ*(0.5-3 μm) and VF (1–3 μm), and in the Vis range for the VF of 6 μm particles.

As mentioned in section 3.3, PSD estimations based on *g* are poor for medium size particles (2–4.5 μm). PSDs with a mode of 2 μm had a strongly overestimated *μ* and a mostly underestimated *σ*. Particles of 3–4.5 μm showed similar strong deviations for estimations based on NIR spectra (Fig. 6 middle). In case of (sub)micron particles, *μ* was well estimated and slightly larger variations occurred for *σ* in case of NIR for 0.1 μm – 0.2 μm 1*σ*, and Vis for 0.5–1 μm. Estimated *μ* for 6–12 μm particles deviated only very little, while *σ* had larger deviations for Vis estimates.

When comparing the estimates based on the three wavelength ranges, it was hypothesized that the Vis range is more suitable for small particles, the NIR range for large particles and VisNIR performs generally well. This hypothesis was based on the higher scattering of small particles at lower wavelengths, while vice-versa the larger particles scatter more at higher wavelengths. Although these trends were visible in the data, the hypothesis could not be confirmed by the KS test. It only indicated a significant difference with the reference PSD for some 0.1 & 0.2 μm PSDs in case of *μ _{s}* and ${\mu}_{s}^{\prime}$, and 12 μm in case of

*μ*(noise-free: p=0; with noise: p=0.0074). Further, 2–3 μm estimates for NIR and Vis/NIR differed, and 4.5 μm for NIR in case of

_{s}*g*with noise (all p=0). Neither the

*t*-tests confirmed the expected pattern in the estimated parameter values.

#### 3.5. Noise on bulk scattering spectra

### Random noise

PSDs and VF estimated on *μ _{s}* spectra produce quite accurate results with a small (to very small for

*μ*) s.d. on the estimated parameter values and a small deviation of the mean estimated parameter compared to the reference value. Only for 0.1 μm particles a bit larger errors occur, since these sizes approach the lower detection limit (as also for ${\mu}_{s}^{\prime}$, Fig. 7 left). In general, higher noise levels result in increased uncertainty on the parameter estimates, as shown in Fig. 7 middle. No systematic pattern of over- or underestimation is visible, which is in agreement with the random nature of the noise. Important to notice for estimations on ${\mu}_{s}^{\prime}$ are the increased deviations and s.d. for

*σ*in the 2–3 μm particle range.

For PSDs estimated on *g* spectra, the results for 2–3 μm particles are again poor, e.g. Fig. 7 right. On the other hand, small and large particles (0.1–1 μm and 4.5–12 μm) show (very) small errors and s.d. on *μ* and *σ*. The s.d. increased slightly with increasing noise level, and somewhat larger errors are found for *σ* for 1 & 12 μm particles with 2*σ* and 3*σ* width.

According to KS tests on average estimated distribution parameters, nearly no PSD estimate was significantly different from the reference PSD. For PSD estimations based on *μ _{s}* and ${\mu}_{s}^{\prime}$, only the 1 μm 1

*σ*estimate based on

*μ*(p=0.0168) and ${\mu}_{s}^{\prime}(\mathrm{p}=0)$ were significantly different, both in case of the highest noise level. For PSD estimates based on

_{s}*g*in the 2–3 μm size range, all estimates were significantly different from the reference PSD (p=0), except for the 3 μm 1

*σ*PSD with 0.5× and 1× noise and 3 μm 2

*σ*with 1× noise (all p> 0.99).

*t*-test results on estimated parameters show also no consistent significant difference between estimates and reference for higher noise levels. Lack of such pattern can be explained by random noise being a non-systematic perturbation. A larger magnitude of the random noise can deteriorate PSD estimates in two ways. Firstly, the mean estimated parameters can differ more from reference values and secondly, the s.d. on the estimated values can increase. This larger s.d. does not necessarily imply a poor average parameter estimate, as was the case here. Averaging out multiple estimates can reduce the influence of random noise, or a certain level of noise can be taken into account. For example, Jumelet et al. did this on Lidar backscatter data by applying a cluster filtering on look-up tables to restrict the possible solutions instead of directly solving the least squares optimization [33].

### Multiplicative baseline error

For estimations based on *μ _{s}* and ${\mu}_{s}^{\prime}$ spectra, all PSD estimates are qualified as good according to the KS test (

*μ*: p≥ 0.8783 for PSDs > 1 μm; ${\mu}_{s}^{\prime}:\mathrm{p}\ge 0.0602$). This is supported by the

_{s}*t*-tests as the majority of estimated and reference distribution parameters did not differ significantly, whereas the estimated VF for PSDs 0.2–12 μm on the other hand all did (0.0001 × 10

^{−8}≤ p≤ 0.0013). Looking at the actual values, there are only small deviations in estimated

*μ*and

*σ*for the smallest PSD (0.1 μm). The VF is for all PSDs over/underestimated for a positive/negative baseline proportional to the baseline magnitude. This is in agreement with the linear relationship between the particle concentration and

*μ*discussed in 3.1, and thereby also between ${\mu}_{s}^{\prime}$ and VF. The estimated distribution parameters remain indeed unaffected (Figs. 8(a)–8(b)).

_{s}For PSD estimations based on *g*, only some submicron PSDs were estimated correctly according to the KS test (p≤ 0.0156 for PSDs ≤ 1 μm, except 1 μm 3*σ*). Again, this is supported by the *t*-tests showing that all estimates of *μ* (0 ≤p≤ 0.0127) and all but two estimates of *σ* were significantly different from reference values (0 ≤p≤ 0.0176 except 0.1 μm 1–2*σ*). Three distinct particle size groups were present, namely small particles of 0.1–1 μm, medium size of 2–1.5 μm and large particles of 6–12 μm. Representative examples are shown in Figs. 8(c)–8(h). Therefore, a negative baseline results in an underestimation of the particle size as smaller particles have a lower anisotropy factor (more isotropic scattering). Analogously, a positive baseline results in an overestimated particle size as larger particles scatter more forwardly, and the smaller the baseline, the smaller the error. This indicates that the estimation routine focuses rather on the average level of the input *g* spectrum than on the spectrum shape.

### Additive baseline error

In case of PSDs estimated on *μ _{s}* or ${\mu}_{s}^{\prime}$ spectra with an additive baseline, roughly two groups can be distinguished: a group of submicron particles and one of medium to large particles. The first group contains particles of 0.1–1 μm in case of

*μ*and 0.1–0.2 μm in case of ${\mu}_{s}^{\prime}$. Estimates for

_{s}*μ*are too low for negative baselines and too high for positive baselines. The same trend, but more extreme, can be observed for

*σ*as the parameter boundaries were reached (Figs. 9(a) and 9(b)). Estimated VFs show mainly strong underestimations with a strong overestimation at −5% baseline. Although, for 0.5–1 μm particles in case of

*μ*and 0.2 μm in case of ${\mu}_{s}^{\prime}$ other patterns are observed.

_{s}The second group consists of 2–12 μm particles for *μ _{s}* and 0.5–12 μm for ${\mu}_{s}^{\prime}$. Estimated

*μ*,

*σ*and VF show the same trend: under- or overestimations according to a negative or positive baseline, but the deviations for

*μ*were much smaller compared to those for small particles (Figs. 9(c)–9(d)). The only exception for

*μ*is the 10 μm 3

_{s}*σ*PSD showing an opposite course in

*μ*and

*σ*and deviating estimates for

*σ*based on ${\mu}_{s}^{\prime}$ spectra for 1–3 μm PSDs. This second group of particles are according to KS tests not significantly different from the reference PSDs, except for 3 μm 1

*σ*with a ±10% baseline and 10 μm 1

*σ*for a +10% baseline in the case of ${\mu}_{s}^{\prime}$.

An additive baseline shifts the complete spectrum up- or downward with a fixed value. The spectra are not shifted over wavelengths, which causes especially *μ* to still be estimated relatively correct for PSDs > 1 μm. On the other hand, the effect on VF is very similar to that of a multiplicative baseline (see previous section). However, unlike for multiplicative baselines, the absolute difference between minima and maxima remains unchanged. Therefore, the relative spectrum shape is not preserved exactly, which could explain the inaccurate *σ* estimate.

For PSD estimations based on *g* spectra, the complete PSD range can be divided in four groups. Submicron particles of 0.1–0.2 μm show an overestimation of *μ* in case of negative and an underestimation in case of positive baselines. Estimates for *σ* show the opposite and more extreme trend reaching lower and upper parameter boundaries. The second group consists of particles 0.5–1 μm. In case of a negative baseline *μ* is underestimated and *σ* overestimated (Figs. 9(e)–9(h)). For the middle sized particles of 2–4.5 μm and the largest particles of 6–12 μm the same trend can be observed as for multiplicative baselines, as shown in Figs. 8(e)–8(h). According to the KS test, only a set of submicron PSDs in case of a ±5 % baseline on the spectra can be labelled as not significantly different from the reference (p≤ 0.0182 for PSDs > 2 μm).

In general, the same reasoning as for multiplicative baselines can be adopted since again the level of the *g* spectrum is linked to particle size. The lack of link with particle concentration might be an opportunity to test an approach similar to the one described by Kim et al., where an unknown baseline error on the measurements was estimated simultaneously with the PSD [34].

#### 3.6. Variation in refractive index

### Baseline error on refractive index

When *μ _{s}* or ${\mu}_{s}^{\prime}$ spectra are used as input for PSD estimation, the results can be divided in roughly two regions based on

*μ*and VF. The smallest particles (0.1 μm) have an underestimated

*μ*and a severely overestimated

*σ*in case of a negative baseline, while the opposite occurred for positive baselines. No such straightforward pattern was visible for VF, probably because the particle size approximates the limitations for estimations based on these types of scattering spectra.

The rest of the PSDs (0.2 & 2–12 μm) show the same trends for *μ* as described for 0.1 μm. Exceptions are the 0.5 μm particles, and the 1 μm ones in case of ${\mu}_{s}^{\prime}$ as they show the opposite trend. All VF values were underestimated for negative refractive index baselines and overestimated for positive baselines. Although the degree of over- or underestimation was proportional to the baseline magnitude, the deviations were larger than in case of ${\mu}_{s}^{\prime}$ spectra. The estimations for *σ* show the largest variation, but in general *σ* is overestimated for negative baselines on the refractive index and underestimated for positive baselines (Figs. 10(a)–10(d)).

KS tests indicate that most of the PSDs for nominal particle sizes larger than 0.1 μm estimated on spectra with non-extreme baselines, are not significantly different from the reference PSDs. *t*-test results on the other hand, show that practically no VF estimate could be qualified as not significantly different and for *μ* and *σ* only in case of just a few submicron and 10–12 μm particles, mostly in combination with non-extreme baseline magnitudes. Despite these estimates being classified as significantly different from reference and the seemingly unstructured *σ* estimates, the following reasoning can explain the results. To match the case of polystyrene in water, particles are assumed to have a higher refractive index than the medium. If the refractive index of the particles is shifted downward, the refractive index difference between particle and medium decreases. This results in *μ _{s}* spectra with scattering maxima shifted to lower wavelengths. Since higher scattering in the lower wavelength range is typical for small particles, the particle size is underestimated. However, an underestimated

*μ*also makes the estimated PSD more narrow. The overestimation of

*σ*seems to be a reaction to this in an attempt to reach the original distribution width. The increased scattering of the estimated smaller particles was countered by an underestimated VF. An analogue but opposite explanation could be elaborated for a positive baseline error on RI.

Compared to results based on *μ _{s}* or ${\mu}_{s}^{\prime}$, PSD estimates based on

*g*are showing large and unstructured deviations, especially for particles 1–12 μm. For small particles, the deviations in

*μ*are rather small (Figs. 10(e)–10(f)). For 0.1 μm 3

*σ*– 0.5 μm 2

*σ*,

*μ*is underestimated for negative baselines and overestimated for positive baselines. Particles in the range 0.5 μm 3

*σ*– 1 μm on the other hand show the opposite trend. Finally,

*μ*was overestimated in case of a negative and underestimated in case of a positive baseline for 10–12 μm particles (Figs. 10(g)–10(h)). For 2–6 μm particles,

*μ*is overestimated in general, but more pronounced for negative baselines. For particles 0.1–1 μm, over- or underestimation of

*σ*was proportional to the baseline magnitude, even though the correlation (positive or negative) switched. For larger particles (2–12 μm), no obvious relationship with baseline magnitude was found.

For particles ≤ 1 μm, despite the small deviations, estimates are labelled as not significantly different from the reference PSD according to the KS test. The same can be concluded for particles of 4.5–12 μm if the baseline on the refractive index is ±0.5%.

Whereas in case of *μ _{s}* and ${\mu}_{s}^{\prime}$ the VF could be used to compensate for effects of slightly changed distribution parameters in the optimization,

*g*is independent of VF. Therefore, all deviations and errors have to be compensated by changing distribution parameters. A negative baseline on the refractive index shifts the

*g*spectrum slightly upwards and maxima are shifted towards lower wavelengths, although the effect is mainly noticeable for particles > 1 μm. In line with the above explanation for

*μ*spectra, the higher anisotropy factors (more isotropic scattering) are related to smaller particles compared to the wavelength. Analogously, a positive baseline on the particle refractive index results in an overestimated particle size.

_{s}The large effect of refractive index on scattering spectra and particle size retrieval observed in this study is in line with the strong dependency of extinction coefficients described by Ma et al. [35] and the PSD estimation example of Glatter & Hofer [36]. Ren et al. [40] showed on simulated data that diffuse reflectance and transmission are rather insensitive to PSD parameters, but strongly dependent on the complex refractive index. Therefore, they elaborated a routine to first estimate the refractive index on the aforementioned spectra. Afterwards, the obtained values were used in a secondary optimization to find the distribution parameters based on the collimated transmission. Alternatively, the particle refractive index could also be determined based on particle composition and environmental conditions, as applied by Jumelet et al. for stratospheric particles [33].

### Error in wavelength dependency of refractive index

In Fig. 11 bottom, the estimates for the 12 μm 3*σ* PSD estimated based on ${\mu}_{s}^{\prime}$ spectra are given. In case the slope of the refractive index spectrum was decreased, the effect is mainly visible in the first part of the spectrum as it is lower and shifted to the left. Consistent with the effect of a negative baseline on the refractive index discussed in the previous subsection, both *μ* and VF are underestimated to match the input spectrum.

This change in slope of the wavelength dependent refractive index can be regarded as a special case of the baseline shift. Instead of adding or subtracting a fixed value to the entire spectrum, rather a (nonlinear) baseline is applied to both parts of the spectrum separated by a fixed point (here at wavelength 1.17 μm). If the slope of the refractive index spectrum is reduced, the first part experiences a negative baseline, while the second part undergoes a positive baseline. The final effect on the estimated PSD and VF depends on the baseline in the part of the spectrum that is most influenced by the altered slope. However, it is not so straightforward in all cases. For example, when estimating the PSD 0.2 μm 3*σ* on a *μ _{s}* spectrum (Fig. 11 top), the shift to lower or higher wavelengths caused by a change in the slope of the refractive index spectrum is misinterpreted as a multiplicative baseline. This is caused by the similar shape of the spectra with and without variation in refractive index slope. Therefore, the VF is underestimated. To compensate the otherwise too low scattering in the second part of the spectrum,

*μ*is overestimated (in contrast to expectation).

A third example is given by the estimation of a 2 μm 1*σ* PSD on a ${\mu}_{s}^{\prime}$ spectrum (Fig. 11 middle). The effect of a ’squeezed’ refractive index is more visible in the first part of the spectrum since it is clearly lower. The shift to the left according to a negative baseline on the refractive index should lead to an underestimation of *μ*. However, this shift removes the onset of the scattering peak for small particles from the considered wavelength range. Consequently, the fraction of small particles is underestimated by a too high *μ* to match the shape of the spectrum. To compensate for the lower scattering level, VF is overestimated. These examples show that no general rule can be defined for the effect of a variation in slope of the refractive index on the estimated PSD. For each case, an inspection of the actual spectra is required to clarify the effect on PSD estimation.

#### 3.7. Type of input scattering spectrum

From previous sections, it is clear that although *μ _{s}*, ${\mu}_{s}^{\prime}$ and

*g*spectra can be used similarly, they don’t perform equally well for all PSD estimations. When

*μ*is used, the estimated PSDs were less correct for the smallest particles (0.1–0.2 μm, Fig. 12(a)) and also slightly deviating for the largest particles (10–12 μm, Fig. 12(c)). In case of the submicron particles, this can be attributed to their main scattering activity lying in the UV wavelength range. For broader PSDs, the

_{s}*μ*spectra are more flattened and loose some of the ’signature’ peaks associated with monodisperse systems.

_{s}Secondly, PSD estimations based on the anisotropy factor *g* were expected to perform well over the entire particle size range. Estimations are indeed relatively accurate, except for medium sized particles (2–3 μm, Fig. 12(b)). This might be due to their overall high and relatively flat *g* spectra. Moreover, the spectra are independent of particle concentration (due to the independent scattering regime) and therefore no VF estimation can be made. A second consequence is that PSD estimations are more prone to errors due to baseline noise on the spectra, since such shifts are not linked to the particle concentration, but directly to particle size.

Finally, the information contained in ${\mu}_{s}^{\prime}$ spectra can be regarded as a combination of *μ _{s}* and

*g*, as ${\mu}_{s}^{\prime}={\mu}_{s}\times \left(1-g\right)$. Therefore, its performance inherits both advantages and disadvantages of the properties described above. The estimated PSDs deviate more for small particles cf. estimates based on

*μ*, while for medium size particles the distribution parameter

_{s}*σ*was poorly estimated cf. estimates based on

*g*. Moreover, the linear relationship between particle concentration and VF, discussed in 3.1 for

*μ*, is also valid for ${\mu}_{s}^{\prime}$. Because of this, PSD estimation on ${\mu}_{s}^{\prime}$ is less prone to baseline noise on the spectra than estimation based on

_{s}*g*.

#### 3.8. Combined cases

The previous sections discussed the effects of deviations in the BOP spectrum on the estimated values for PSD parameters and particle concentration for separate cases. For the examples of combined variations, Fig. 13 shows the PSD estimates with corresponding scattering spectra. The time needed for these inverse estimations was respectively 14.2, 27.3 and 45.8 seconds for case 1, 2 and 3 on a desktop pc with i5 processor and 4GB RAM using Matlab’s *parfor* option for parallel computing. The individual run-time of an inverse estimation may vary, but in general estimations on *μ _{s}* were the fastest.

In the first case, the multiplicative baseline does not influence the estimated distribution parameters, which is in agreement with the results presented in section 3.5. The negative baseline on the refractive index results in an underestimation of *μ* and overestimation of *σ*, as can be seen in Fig. 13 (blue), because the estimated PSD is shifted to the left. Most interesting is the almost complete cancellation of the VF underestimation due to the negative baseline on the refractive index by the overestimation caused by the positive multiplicative baseline (Table 5). This is important since it shows different errors can (partially) cancel out one another and thereby conceal their presence. However, it should be noted that they could also amplify each other.

For the estimation on *g* in the second example, the same characteristics can be noticed as those reported in section 3.2 for estimating a Weibull distribution on *μ _{s}*. The mode is overestimated, while the span and D10 are underestimated, as can be seen from the green lines in Fig. 13. The effect of random noise is almost negligible compared to the incorrect distribution type.

In the third case consisting of a normal distribution estimated on a ${\mu}_{s}^{\prime}$ spectrum, the main effect is the positive additive baseline due to which the VF is overestimated, as expected, proportional to the 3% baseline. With respect to the estimated distribution parameters, the characteristics of estimating a normal distribution on a lognormal PSD are clearly present (Fig. 13 purple): the overestimated mode, a too long left tail (underestimated D10) and a too short right tail. The additive baseline has only a small additional effect on the distribution parameters compared to the large effect of the distribution type.

#### 3.9. General discussion

Typically, experimentally determined BOP have a set-up specific coloured noise as a consequence of measurement noise or errors in sample preparation affecting the reflection and transmission spectra. However, this noise can be modelled by a combination of the noise types investigated separately in this sensitivity analysis. The random noise added was derived from *μ _{s}*, ${\mu}_{s}^{\prime}$ and

*g*spectra determined by DIS measurements in combination with IAD. The results showed that, when averaging five repetitions, the PSD estimation routine was able to handle 0.36% random noise on

*μ*spectra, 0.98% on ${\mu}_{s}^{\prime}$ and 1.04% on

_{s}*g*spectra (twice the standard level defined in 2.3). These noise levels were determined from measurements on model systems consisting of a stable scattering agent and an absorbing ink. Noise levels on BOP of less standardised or biological samples can reach higher levels. Possibly, average PSD estimates can handle these higher random noise level, but individual estimates can deviate substantially. For comparison, He et al. achieved still acceptable monomodal shape dependent estimates at 10% random noise in the extinction method [47]. Sun et al. obtained good estimates at a level of 2% random noise using the moment method, although they also found that the distribution parameter related to distribution width was more sensitive to noise [23]. Baseline noise could correspond to errors in particle concentration or errors in the sample thickness provided to the IAD routine. In case of

*μ*and ${\mu}_{s}^{\prime}$, an error in sample concentration would result in a multiplicative baseline effect, which should not affect the PSD estimation as long as the independent scattering assumption remains valid. Additive baselines have a similar effect, although VF will not absorb the complete effect and especially the parameter related to distribution width will be affected in case of baselines of 5% or larger. Estimations based on

_{s}*g*spectra are more sensitive to baseline noise and baselines of 5% result already in large deviations. The random and baseline errors used in the sensitivity analysis do not take into account wavelength dependent errors, such as a incomplete separation between scattering and absorption in high absorption wavelength regions. If the interference of absorption is too high for a limited set of wavelengths, it can be an option to omit these wavelengths from the input scattering spectrum. If the crosstalk between absorption and scattering is limited and the rest of the scattering spectrum is accurate, PSD estimates might still be accurate since they are bound by the shape of the probability density function.

Whether the scattering spectra for PSD estimation are simulated or experimentally determined, a choice has to be made which one to use as input spectrum. Apart from the reasoning in section 3.7, some additional remarks should be kept in mind when using experimental BOP. Reliable *g* spectra are difficult to determine. In (double) integrating sphere measurements, *g* influences only the total reflection and total transmission, while *μ _{a}* and

*μ*also affect the unscattered transmission. Accordingly, the majority of the noise in the measured reflectance and transmittance signals is transferred to the estimated g by the IAD routine [20, 37]. In spatially or time resolved spectroscopy measurements, on the other hand, the estimation of

_{s}*g*is ill-conditioned due to the high correlation between the different measurements involved [38]. Determining

*g*from a scattering phase function measured with a goniometer setup might be an alternative. However, Michels et al. reported the need for dilution of the sample or a correction to counter the effect of multiple scattering [39]. Spectra of ${\mu}_{s}^{\prime}$ on the other hand, are easiest to determine experimentally. As no unscattered transmission measurement is required, the integrating sphere measurements are sufficient. However, Ren et al. [40] reported that diffuse reflectance and transmittance were less sensitive to PSD parameters when estimated together with the complex refractive indices. They estimated the PSD parameters in a second optimization step based on the unscattered transmittance. This might explain why

*μ*and

_{s}*g*spectra, that need the unscattered transmission for BOP calculation, gave better PSD estimates compared to the PSD estimates based on ${\mu}_{s}^{\prime}$ that were often too wide. In case scattering spectra are relatively smooth or lack typical oscillations, the PSD estimation might be improved by combing two or more scattering spectra, e.g. ${\mu}_{s}^{\prime}$ and

*μ*or

_{s}*μ*and

_{s}*g*, in one cost function to increase the amount of available particle size information.

The particle concentrations considered in this simulation study were low enough to ensure an independent scattering regime. In practical applications however, concentrations will be higher and most likely dependent scattering will be present. In that case, *μ _{s}* and ${\mu}_{s}^{\prime}$ are no longer linearly related to VF, and

*g*becomes concentration dependent. Therefore, a correction should be made to include these effects of dependent scattering on the BOP, as already mentioned in section 3.1. Aernouts et al. used the semi-empirical Twersky approximation to model dependent scattering in Intralipid optical phantoms [45]. Another approach can be to use the Percus-Yevick model to describe the interaction between particles, and calculate the static structure factor that serves as a scaling factor between scattering properties in the dependent and independent regime [41–43]. For non-spherical particles, the T-matrix method might be an option [44]. However, in PSD estimation, the particle size and concentration are unknown, requiring that the correction factor can be expressed as a function of the distribution parameters and VF to be estimated.

In the presented PSD estimation routine, it was chosen to solve the optimization problem with a constrained local optimizer using a multistart approach to deal with non-convexity and local minima. This was to converge rapidly, even when using multiple starting points. Because the starting points covered the parameter space in a systematic grid-like way, it was assumed that at least one starting point would always be relatively close to the global minimum and reach a correct solution. In general, the starting points converged to a limited set of minima, with a distinct difference in cost. Points with similar costs also corresponded to similar values for the distribution parameters and VF. The use of parameter boundaries to include expert knowledge on the product characteristics limited the search space to a relevant area, thereby limiting the number of starting points and reducing the time needed to find a solution. However, if the boundaries of the constrained optimizer are chosen too restrictive, estimated values will coincide with these boundaries and the global minimum might not be reached. In that case, the parameter boundaries should be widened, or an unconstrained optimizer can be used. To make the PSD estimation more robust, one could switch to derivative-free, but more time consuming, algorithms designed for global optimization. For example, Zhu et al. and Ren et al. used the particle swarm optimization [40, 46], while He et al. used the non-gradient based fruit fly algorithm [47]. Sun et al. on the other hand used a genetic algorithm, stating that it was simple, global and open to parallelism although more time consuming than simplex optimizers [23]. A more robust optimization, possibly combining multiple scattering spectra as input for the PSD estimation, could be a next step towards estimating more complex PSDs. In this shape dependent approach, this would imply a weighted combination of probability density distributions as proposed in [23, 46, 47].

## 4. Conclusion

A monomodal shape dependent method based on Mie theory was elaborated for simultaneous estimation of the particle size distribution and volume fraction of particles. Estimations were based on bulk scattering spectra in terms of the scattering coefficient *μ _{s}*, reduced scattering coefficient ${\mu}_{s}^{\prime}$ or anisotropy factor

*g*. A simulated dataset for polystyrene particles suspended in water was constructed containing the scattering spectra of 30 PSDs ranging from 0.1 to 12 μm diameter. The sensitivity of PSD estimation to assumed distribution type, spectral choices (wavelength range and resolution), noise (random, multiplicative & additive baseline) and uncertainties on the particle refractive index was investigated.

Good PSD estimations were possible on all types of scattering spectra if the distribution type was chosen correctly. The use of spectra in the Vis, NIR or Vis/NIR wavelength range, or with a resolution of 1 or 10 nm had only a limited effect. On the other hand, spectral noise deteriorated the quality of the PSD and VF estimates. In case of random noise, most distribution parameters and VF were still correct on average, although the uncertainty increased. Spectral baseline shifts were most detrimental for estimations on *g* spectra, since they lead to shifts in the estimated particle size. In case of *μ _{s}* and ${\mu}_{s}^{\prime}$ spectra, baseline shifts are linearly related to particle concentration in the independent scattering regime and therefore the VF estimate particularly was affected. A special case is the multiplicative baseline where the complete spectral shift is absorbed in VF and the estimated distribution parameters remain unaffected. Realistic noise, as would be encountered in measurements, is a set-up dependent combination of the separately considered noise effects, which will most likely affect both the PSD parameters and VF estimates. Correct values for the particle refractive index were found to be crucial to obtain good estimates. The average refractive index was found to be more crucial than the wavelength dependency

This study showed the potential of using bulk scattering spectra for PSD estimation and identified the impact of different algorithm choices and measurement errors on the quality of the obtained PSD estimates. The insights obtained in this sensitivity analysis will be highly valuable when validating the PSD estimation routine on measurement data, and for future work to estimate more complex PSDs.

## Funding

Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT) (141687); Fund for Scientific Research Flanders (FWO) (12K3916N).

## References and links

**1. **C. Servais, R. Jones, and I. Roberts, “The influence of particle size distribution on the processing of food,” J. Food Eng. **51**, 201–208 (2002). [CrossRef]

**2. **J. Champion, A. Walker, and S. Mitragotri, “Role of particle size in phagocytosis of polymeric microspheres,” Pharm. Res. **25**(8), 1815–1821 (2008). [CrossRef] [PubMed]

**3. **H. G. Merkus, *Particle size measurements - Fundamentals, practice, quality* (Springer, 2009), Chap. 9, 10 & 12.

**4. **A. Malloy and B. Carr, “NanoParticle Tracking Analysis: the Halo system,” Part. Part. Syst. Charact. **23**, 197–204 (2006). [CrossRef]

**5. **C. Goddeeris, F. Cuppo, H. Reynaers, W. G. Bouwman, and G. Van Den Mooter, “Light scattering measurements on microemulsions: estimation of droplet sizes,” Int. J. Pharm. **312**, 187–195 (2006). [CrossRef] [PubMed]

**6. **B. Aernouts, R. Watté, R. Van Beers, F. Delport, M. Merchiers, J. De Block, J. Lammertyn, and W. Saeys, “Flexible tool for simulating the bulk optical properties of polydisperse spherical particles in an absorbing host: experimental validation,” Opt. Express **22**(17), 20223–20238 (2014). [CrossRef] [PubMed]

**7. **B. Aernouts, R. Van Beers, R. Watté, T. Huybrechts, J. Jordens, D. Vermeulen, T. Van Gerven, J. Lammertyn, and W. Saeys, “Effect of ultrasonic homogenization on the Vis/NIR bulk optical properties of milk,” Colloids Surf. B **126**, 510–519 (2015). [CrossRef]

**8. **R. Steponavičius and S. N. Thennadil, “Extraction of chemical information of suspensions using radiative transfer theory to remove multiple scattering effects: application to a model two-component system,” Anal. Chem. **81**, 7713–7723 (2009). [CrossRef]

**9. **R. Steponavičius and S. N. Thennadil, “Extraction of chemical information of suspensions using radiative transfer theory to remove multiple scattering effects: application to a model multicomponent system,” Anal. Chem. **83**, 1931–1937 (2011). [CrossRef]

**10. **B. Aernouts, R. Van Beers, R. Watté, T. Huybrechts, J. Lammertyn, and W. Saeys, “Visible and near-infrared bulk optical properties of raw milk,” J. Dairy Sci. **98**, 6727–6738 (2015). [CrossRef] [PubMed]

**11. **N. Riefler and T. Wriedt, “Intercomparison of inversion algorithms for particle-sizing using Mie scattering,” Part. Part. Syst. Charact. **25**, 216–230 (2008). [CrossRef]

**12. **M. Kandlikar and G. Ramachandran, “Inverse methods for estimating aerosol size distributions: a critical review,” J. Aerosol Sci. **30**(4), 413–437 (1999). [CrossRef]

**13. **D. Müller, U. Wandinger, and A. Ansmann, “Microphysical particle parameters from extinction and backscatter lidar data by inversion with regularization: theory,” Appl. Opt. **38**(12), 2346–2357 (1999). [CrossRef]

**14. **G. L. Frontini, F. Otero, M. G. Messineo, and G. E. Eliçabe, “Estimation of size distribution in concentrated particle systems from light scattering measurements,” Inverse Probl. Sci. Eng. **16**(8), 995–1004 (2008). [CrossRef]

**15. **F. Bauer and M. A. Lukas, “Comparing parameter choice methods for regularization of ill-posed problems,” Math. Comput. Simulat. **81**, 1795–1841 (2011). [CrossRef]

**16. **W. Liu, X. Sun, and J. Shen, “A V-curve criterion for the parameter optimization of the Tikhonov regularization inversion algorithm for particle sizing,” Opt. Laser Technol. **44**, 1–5 (2012). [CrossRef]

**17. **M. T. Celis and L. H. Garcia-Rubio, “Stability of emulsions from multiwavelength transmission measurements,” Ind. Eng. Chem. Res. **43**, 2067–2072 (2004). [CrossRef]

**18. **F. Ferri, A. Bassini, and E. Paganini, “Modified version of the Chahine algorithm to invert spectral extinction data for particle sizing,” Appl. Opt. **34**(25), 5829–5839 (1995). [CrossRef] [PubMed]

**19. **G. Cabassi, M. Profaizer, L. Marinoni, N. Rizzi, and T. M. P. Cattaneo, “Estimation of fat globule size distribution in milk using an inverse light scattering model in the near infrared region,” J. Near Infrared Spectrosc. **21**, 359–373 (2013). [CrossRef]

**20. **B. Aernouts, E. Zamora-Rojas, R. Van Beers, R. Watté, L. Wang, M. Tsuta, J. Lammertyn, and W. Saeys, “Supercontinuum laser based optical characterization of Intralipid phantoms in the 500–2250 nm range,” Opt. Express **21**(26), 32450–32467 (2013). [CrossRef]

**21. **S. N. Thennadil and Y. Chen, “Alternative measurement configurations for extracting bulk optical properties using an integrating sphere setup,” Appl. Spectrosc. **71**(2), 224–237 (2017). [CrossRef]

**22. **S. A. Prahl, M. J. van Gemert, and A. J. Welch, “Determining the optical properties of turbid media by using the adding-doubling method,” Appl. Optics **32**(4), 559–598 (1993). [CrossRef]

**23. **X. Sun, H. Tang, and J. Dai, “Retrieval of particle size distribution in the dependent model using the moment method,” Opt. Express **15**(18), 11507–11516 (2007). [CrossRef] [PubMed]

**24. **H. G. Merkus, *Particle size measurements - Fundamentals, practice, quality* (Springer, 2009), Chap. 2.

**25. ** MathWorks, “Documentation fmincon,” (2018), https://nl.mathworks.com/help/optim/ug/fmincon.html.

**26. **D. J. Segelstein, “The complex refractive index of water,” Ph.D. thesis, University of Missouri-Kansas City (1981).

**27. **N. Sultanova, S. Kasarova, and I. Nikolov, “Dispersion properties of optical polymers,” Acta Phys. Pol. A **116**, 585–587 (2009). [CrossRef]

**28. **M. Polyanskiy, “Optical constants of polystyrene (PS),” (2016), https://refractiveindex.info/?shelf=organic&book=polystyren&page=Sultanova.

**29. **H. C. van de Hulst, *Light Scattering by Small Particles* (John Wiley and Sons, 1957), Chap. 14.

**30. **R. Watté, B. Aernouts, R. Van Beers, and W. Saeys, “Robust metamodel-based inverse estimation of bulk optical properties of turbid media from spatially resolved diffuse reflectance measurements,” Opt. Express **5**(11), 27880–27898 (2015). [CrossRef]

**31. **H. Keutelian, “The Kolmogorov-Smirnov test when parameters are estimated from data,” CDF Notes (1991).

**32. **H. Bayat, M. Rastgo, M. Mansouri Zadeh, and H. Vereecken, “Particle size distribution models, their characteristics and fitting capability,” J. Hydrol. **529**, 872–889 (2015). [CrossRef]

**33. **J. Jumelet, S. Bekki, C. David, and P. Keckhut, “Statistical estimation of stratospheric particle size distribution by combining optical modelling and lidar scattering measurements,” Atmos. Chem. Phys. **8**, 8913–8949 (2008). [CrossRef]

**34. **J. Kim, S. Ahn, H. Lee, and M. Lee, “Estimation of particle size distribution using photon autocorrelation function from dynamic light scattering considering unknown baseline,” Opt. Lett. **38**(11), 1757–1759 (2013). [CrossRef] [PubMed]

**35. **L. Ma, “Measurement of aerosol size distribution function using Mie scattering - Mathematical considerations,” J. Aerosol Sci. **38**, 1150–1162 (2007). [CrossRef]

**36. **O. Glatter and M. Hofer, “Interpretation of elastic light scattering data. III. Determination of size distributions of polydisperse systems,” J. Colloid Interface Sci. **122**(2), 496–506 (1988). [CrossRef]

**37. **S. A. Prahl, “Everything I think you should know about Inverse Adding-Doubling,”, http://omlc.ogi.edu/software/iad/iad-3-9-10.zip.

**38. **R. Watté, N. Nguyen Do Trong, B. Aernouts, C. Erkinbaev, J. De Baerdemaeker, B. Nicolaï, and W. Saeys, “Metamodeling approach for efficient estimation of optical properties of turbid media from spatially resolved diffuse reflectance measurements,” Opt. Express **21**(26), 32630–32642 (2013). [CrossRef]

**39. **R. Michels, F. Foschum, and A. Kienle, “Optical properties of fat emulsions,” Opt. Express **16**(8), 1094–4087 (2008). [CrossRef]

**40. **Y. Ren, H. Qi, Q. Chen, L. Ruan, and H. Tan, “Simultaneous retrieval of the complex refractive index and particle size distribution,” Opt. Express **23**(15), 19328–19337 (2015). [CrossRef] [PubMed]

**41. **V. Duc Nguyen, D.J. Faber, E. van der Pol, T.G. van Leeuwen, and J. Kalkman, “Dependent and multiple scattering in transmission and backscattering optical coherence tomography,” Opt. Express **21**(24), 29145–29156 (2013). [CrossRef]

**42. **G. Göbel, J. Kuhn, and J. Fricke, “Dependent scattering effects in latex-sphere suspension and scattering powders,” Wave Random Media **5**, 413–426 (1995). [CrossRef]

**43. **J.K. Percus and G.J. Yevick, “Analysis of classical statistical mechanics by means of collective coordinates,” Phys. Rev. **110**(1), 1–13 (1958). [CrossRef]

**44. **D.W. Mackowski and M.I. Mishchenko, “Direct simulation of extinction in a slab of spherical particles,” J. Quant. Spectrosc. Radiat. Transfer **123**, 103–112 (2013). [CrossRef]

**45. **B. Aernouts, R. Van Beers, R. Watté, J. Lammertyn, and W. Saeys, “Dependent scattering in Intralipid phantoms in the 600–1850 nm range,” Opt. Express **22**(5), 6086–6098 (2014). [CrossRef] [PubMed]

**46. **X. Zhu, J. Shen, Y. Wang, J. Guan, X. Sun, and X. Wang, “The reconstruction of particle size distributions from dynamic light scattering data using particle swarm optimization techniques with different objective functions,” Opt. Laser Technol. **43**, 1128–1137 (2011). [CrossRef]

**47. **Z. He, H. Qi, Y. Yao, and L. Ruan, “Inverse estimation of the particle size distribution using the Fruit Fly Optimization Algorithm,” Appl. Therm. Eng. **88**, 306–314 (2015). [CrossRef]