## Abstract

Finding the wavevectors (eigenvalues) and wavefronts (eigenvectors) in nanostructured metasurfaces is cast as a problem of finding the complex roots of a non-linear equation. A new algorithm is introduced for solving this problem; example eigenvalues are obtained and compared against the results from a popular, yet much more computationally expensive method built on a matrix eigenvalue problem. In contrast to the conventional solvers, the proposed method always returns a set of ‘exact’ individual eigenvalues. First, by using the Lehmer-Schur algorithm, we isolate individual complex roots from others, then use a zero-polishing method applied at the very final stage of ultimate eigenvalue localization. Exceptional computational performance, scalability, and accuracy are demonstrated.

© 2014 Optical Society of America

## 1. Introduction

Fabricating a lattice of direct or inverted nanoantennas by creating nanoscale perforations in a subwavelength-thin metal film [1–8] is a practical way of realizing working photonic devices built on the concept of the generalized Snell’s law [1, 5, 6]. Using such ultra-thin metasurfaces for mimicking an ideal, continuous, and surface-confined phase gradient is always somewhat inexact; the lattice of individual structural units provides only an approximation (with discrete lateral distribution of finite phase steps) needed to generate a desired wavefront of the scattered cross-polarized light. Moreover, the nanoantenna-based metasurfaces can generate an additional, undesired scattered signal aligned with the incident polarization [1–6]. Nonetheless, this most general approach to creating 3D beam-shaping devices (lenses, axicons, phase plates, etc) may still be promising, provided that the designed functionality and features are beyond our reach through other means. Introduction of elemental nanostructures that support localized polaritonic modes (e.g., the conventional or Babinet-inverted plasmonic v-shape antennas and optical metamagnetics arranged of periodic coupled metal nanostrips) results in substantially stronger, sometimes exotic optical responses [6–10].

Even in a simple practical focusing device – plasmonic metal slit lenses [9, 10], which consist of a number of short, metal-insulator-metal (MIM) waveguides, with width-adjustable effective propagation constants - the resulting electromagnetic modes are much more complicated than in periodic dielectric lattices. In such devices, the arrays of MIM waveguides are designed to generate the combined transmitted and reflected wavefronts, which propagate either out of the surface or into the substrate. Along with the focusing of incident light, the nanoslit metal lenses also generate surface modes on the lens interfaces and inside the slits. The combination of those effects complicates the theoretical and numerical analysis of such structures, as the coupling between the individual scattering elements (slits) and all the above modes must be taken into account.

One of the most popular methods for solving the general linear scattering problem for a periodic array of elemental sub-wavelength-scale structures is the Fourier Mode Matching, FMM (also known as the Rigorous Coupled Waves Analysis, RCWA, and the Spatial Harmonic Analysis, SHA) method [11–28]. Simply stated, by using the FMM method, one turns the wave equation inside the given planar metasurfaces into an eigenvalue problem and tries to get the possible wavefronts (eigenvectors) and propagation constants (eigenvalues). The solution is solely built on a set of material and geometrical properties which are preserved by translation along the array periodicity directions normal to the metasurface boundary. The convenience of using the Fourier basis implies the truncation of the entire set of eigenmodes, and requires expensive, poorly-scalable numerical solution of a matrix eigenvalue problem. Moreover, the usually convenient Fourier basis is not a good fit for the particular purpose of approximating the distribution of dielectric constants within a given structure, which are discontinuous within the structure. For this reason, not only the numerical accuracy of computing each mode depends substantially on the total amount of modes taken into account [27], but the appearance of polaritonic (plasmonic or phonic) modes requires a specific, fast-converging formulation of the matrix eigenvalue problem in hand.

The approach described in this paper provides a means to solve the eigenvalue problem for a realistic subwavelength-patterned cascaded multilayer structure, and therefore calculation of the complex scattering coefficients. The development of this approach is motivated by our previous theoretical, numerical, and experimental studies of angular dependences of reflectance and transmission of plasmonic metasurfaces [6]. Those initial numerical experiments provided us with detailed information about the overall performance and scalability of our 2D and 3D SHA solver built on the standard, scalable linear algebra libraries [27]. Brief analysis of those numerical experiments is presented in [27], which describes the theoretical treatment of the proposed approach. Instead of solving an approximate (truncated) matrix eigenvalue problem (MEP), this paper is built on finding complex roots of an equivalent, yet exact non-linear equation (NLE) for a given number of the same eigenvalues.

There have been many studies of the conventional single-periodic or double-periodic MEP-based
methods, including the reflection and transmission of cross-gratings and metasurfaces [6, 10, 27]. Discussions of the NLE equivalents are much fewer and
date back to layered media classics such as Rytov and Yeh et al. [29, 30]. It is the effective mode
indices that are of prime interest here. Addressing the question of how they couple to external
fields deserves a separate publication. For simplicity, we choose to test our approach with a
single-period structure. The initial development of the related methods built on MEP [31] was done by using the Fourier expansion based on the
lattice period to approximate individual natural wavefronts. With this approach, the known
structure of the wavefront inside individual layers is not taken into account exactly, and hence
the problem becomes similar to any boundary-value problem reformulated in terms of a convenient
orthogonal basis in space. The NLE adopted in the paper is derived from a boundary-value
problem, which is extensively used to calculate the propagation properties of lamellar
structures [32]. For different values of the in-plane
wave vector k_{x}, the equation for a transverse field is formulated independently for
each layer of a given unpatterned lamellar structure. Then, the system of equations for adjacent
layers is solved by expanding the fields in each layer in terms of plane waves with the
corresponding perpendicular wave vector k_{y} and by applying appropriate
electromagnetic boundary conditions at the interfaces. Adding the periodic Bloch-Fouquet
boundary conditions couples the propagation constant values k_{x} along the lamellar
interfaces, leading to a non-linear equation. Hence, our method deals with a set of proper
wavefronts (eigenvector) moving synchronously along the interfaces of a given lamellar structure
with a corresponding propagation constant (eigenvalue). Our method uses natural, most efficient,
piece-wise continuous basis functions or wavefronts. These functions behave as smooth functions
within each layer, and as continuous functions at the interfaces.

The remaining outline of our text is arranged as follows: Section II casts the Maxwell curl
equations in a transverse field representation for *p*- and
*s*-polarized incidence. The most general electromagnetic modes are defined in
each layer, ignoring its boundaries. These fields are used (i) to obtain a set of synchronous
wavefronts, propagating in the x-direction as a general plane wave, and (ii) to form a
non-linear equation for the eigenvalues in terms of the period, material arrangement, angle of
incidence, and the free-space wavelength. Section III describes how the NLE is solved, using the
Lehmer-Schur algorithm [33]. A brief account of some
important numerical benchmarks is also given in Section III, with particular emphasis on the
individual eigenvalue accuracy and parallelization properties.

## 2. Physical Fundamentals

The formulation of the proposed method for bi-periodic structures is quite involved. For the
sake of brevity in this paper, we use a simplified formulation relevant to single-periodic
structures for in-plane oblique incidence with *p*- or
*s*-polarized light. The complete formulation for bi-periodic 3D structures for
arbitrary incidence will be published elsewhere.

We consider a 2D cross-section of a single-periodic layered medium (with period
$\delta $ along the *y*-direction). As we take a
*p*-polarized plane wave incident at angle $\theta $ to the $x$-direction, a single component of the magnetic field
($H=\widehat{z}h$) sufficiently describes the propagation of light through the
structure. The structure starts at ${y}_{0}=0$. Every *i*^{th} layer is defined by its
dielectric constant ${\epsilon}_{i}$ and position ${y}_{i}$ of its boundary (see Fig.
1).

Using free-space wavelength $\lambda $ as a geometrical scale reference, it is convenient to renormalize coordinates with a free-space wavenumber, ${k}_{0}=2\pi /\lambda $,

and as in the optical range, $\mu \equiv 1$, a*p*-polarized plane wave in

*i*

^{th}layer ($i=\overline{1,s}$) is given by,

_{,}$\omega ={k}_{0}c$ and $c$ are respectively the angular frequency and the speed of light. Plus and minus signs correspond to forward and backward propagating modes. Due to normalization (1), ${g}_{i}$ and ${k}_{x}$ imply dimensionless wavenumbers. We also use a normalized dispersion law, and include $\iota $ in the definition of the

*y*-component of wave vector,

Then, we take the Maxwell curl equations, $\nabla \times E=-{\mu}_{0}{\partial}_{t}h$, $\nabla \times H={\epsilon}_{0}{\partial}_{t}\left(\epsilon E\right)$, to couple the fields, and to define the boundary conditions (BC)
at *i*^{th} interface,

*i*

^{th}layer. Since the structure is periodic, i.e. ${\gamma}_{i}={\gamma}_{i+s}$, we also take into account the Bloch periodic boundary conditions

#### 2.1. Systems of equations

We drop the monochromatic factor ${e}^{-\iota \omega t}$ in (2), and write the BC (4) in a matrix-vector form,

where ${a}_{i}={\left({a}_{i}^{+}{e}^{\iota {k}_{x}x},{a}_{i}^{-}{e}^{-\iota {k}_{x}x}\right)}^{\text{T}}$, ${d}_{j,i}=\mathrm{exp}\left[\mathrm{diag}\left({g}_{j}{y}_{i},-{g}_{j}{y}_{i}\right)\right]$, and matrix ${m}_{i}$and its exact inverse ${m}_{i}^{-1}$ are given by ${m}_{i}=\left(\begin{array}{cc}1& 1\\ {g}_{i}{\gamma}_{i}& -{g}_{i}{\gamma}_{i}\end{array}\right)$, ${m}_{i}^{-1}=\frac{1}{2{g}_{i}{\gamma}_{i}}\left(\begin{array}{cc}{g}_{i}{\gamma}_{i}& 1\\ {g}_{i}{\gamma}_{i}& -1\end{array}\right)$; then, from (6), we obtain a recurrence relation,where ${t}_{i}$ is defined by ${t}_{i}={d}_{i,i}^{-1}{m}_{i}^{-1}{m}_{i+1}{d}_{i+1,i}$.We stress that we can introduce ${t}_{i}$ only if matrices ${d}_{i,i}^{}$ and ${m}_{i}$ are both nonsingular, which immediately yields the following condition,

For the number of layers *s* and period $\delta ={y}_{s}$, (for ${y}_{0}=0$), the Bloch periodic boundary conditions (5) imply that
${h}_{i+s}^{\pm}={h}_{i}^{\pm}{e}^{\alpha \delta}$, and applying (4) at the ${y}_{s}$-boundary we obtain

We then use $i=\text{diag}\left(1,1\right)$, with the complete chain of equations for ${a}_{i}$, to rewrite the periodicity equation, as

where $t={\displaystyle \prod _{i=1}^{s}{t}_{i}}$, with ${t}_{s}={d}_{s,s}^{-1}{m}_{s}^{-1}{m}_{1}{e}^{\alpha \delta}$, as ${d}_{1,0}=i$.Finally, we arrive at a homogeneous system of equations for ${a}_{1}$,

As (11) is the eigenproblem for ${a}_{1}$, where the eigenvalue is equal to 1, the homogeneous system (11) has nontrivial solutions iff $\left|t-i\right|=0$, which results in the following characteristic equation:

Moreover, as we have $2s+1$ unknowns ($2s$ of ${a}_{i}^{\pm}$, and a single ${k}_{x}$, because each ${g}_{i}$ can be expressed through ${k}_{x}$ and ${\epsilon}_{i}$ using the dispersion relation (3)), then the characteristic Eq. (12), together with 2 Eqs. (4) at each of *s* boundaries give
us closure.

#### 2.2. Properties of matrix $t$

First, we calculate the determinant of matrix ${t}_{i}$ for $i=\overline{1,s}$, using (7)

Using the definition of $t$ and Eqs. (13) and (14) we obtain

Then, (15) simplifies (12), providing the following equation,From (16), it follows that only the diagonal elements of $t$ are important, so we could consider the following matrix

Here ${\Delta}_{i}=\text{diag}\left({\text{e}}^{{g}_{i}{\delta}_{i}},{\text{e}}^{-{g}_{i}{\delta}_{i}}\right)$ and ${\delta}_{i}={y}_{i}-{y}_{i-1}$, is the thickness of *i*^{th} layer. Again
we note that dependence on ${k}_{x}$ is already included in ${g}_{i}$ through (3). The characteristic Eq. (12) then takes the following simple form,

#### 2.3. Binary medium

An example configuration for a binary single-periodic 2D structure is depicted in Fig. 1b. The expression for ${\rm K}$ is given by

where only the diagonal terms are of importance. Hence, by using*k*-normalized parameterization (${\tilde{\delta}}_{i}={g}_{i}{\delta}_{i}$ and ${\tilde{\gamma}}_{i}={g}_{i}{\gamma}_{i}$), we obtain the trace terms from (19)

Accordingly, the nonlinear Eq. (20) corresponds to the known result [34]

Once we take $\alpha =0$ (normal incidence) we obtain,

## 3. New algorithm implementation: theoretical description

While the problem of finding the roots (zeros) of the function on a complex plane is old and well explored, it remains rather difficult to solve. There are many “root polishing” algorithms available which can provide a given zero with a required accuracy. However, we must first isolate a zero from others in a given region with relatively good accuracy in order to get fast and reliable convergence of a given polishing algorithm (if the zero is of the first order). Here, we use the Lehmer-Schur algorithm [33] built on the “Argument Principle” (see for example [35]). Suppose that a function $f\left(z\right)$ is meromorphic (analytic except for poles) from the domain interior to a positively oriented contour , analytic and nonzero on , then

where $Z$ is the number of zeros, $P$ is the number of poles of the function in the domain, and $l{n}^{\prime}$ denotes the logarithmic derivative of a given function $f\left(z\right)$. Absence of poles follows immediately from condition (8) because poles can appear only in the expressions for inverse matrices ${d}_{i,i}^{}$ and ${m}_{i}$. Physics here corresponds to natural facts that the field in an element of the structure depends on the fields in the neighboring elements, and the wavefront of the field inside a structured unit cell is always different from that of a plane wave in a uniform media.First, we should determine a region to localize the roots, and use (23) to find the total number of zeros in the domain. Here, we choose a rectangle that includes the origin $z=0$. The rectangle should not be too large, as it is difficult (i) to excite modes with the propagation constants being much higher than that of the incident wave (we normalized all wave vectors to the wave vector of the incident light, meaning that the square shouldn’t be more than a couple orders of magnitude larger than a unit square), or (ii) propagate the modes with substantial attenuation (i.e., with large imaginary parts of their propagation constants).

Next, the rectangular domain is split in half (in a real situation, any ratio may be used) over vertical and horizontal dimensions and use (24) again in order to locate “non-empty” (containing zeros) quadrants. Such iterations (splitting of zero-containing quadrants in four parts) are repeated until the resulting quadrants are small enough - meaning that zeros have been located close enough to ensure convergence of a given root-polishing algorithm. This process is schematically represented in Fig. 2, where crosses correspond to the values of ${k}_{x}$ (eigenvalues) of a binary periodic structure.

To test our approach, we consider a layer consisting of only two elemental materials (see
Fig. 1(b)). This binary structure (Material 1) is
composed of a gain-doped silica layer (with thickness ${\delta}_{1}=45\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{nm}$ and complex dielectric permittivity ${\epsilon}_{1}=2.7224-\iota 0.029615$), and a layer of silver with ${\delta}_{2}=5\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{nm}$ and ${\epsilon}_{2}=-26.079+\iota 0.882$. The incident *p*-polarized light of a wavelength
$\lambda =740\text{\hspace{0.17em}}\text{nm}$ enters at the incident angle, $\theta =\pi /3$, as shown in Fig. 1.

For this problem, the nonlinear equation to be solved is given by expression (21). We seek to get all ${k}_{x}$ in a rectangle with the lower left corner being at $-100-\iota 100$, and the upper right corner at $100+\iota 100$.

Several consecutive splittings of the initial search area (a red rectangle) are shown in Fig. 2 by blue, orange, and black lines. The final
zero-containing quadrants are shown as filled yellow rectangles. Overall, this is simply a
generalization of the dichotomy method for the complex plane. *As calculations for each
quadrant are completely independent from each other, these calculations can be done in
parallel*.

It is convenient to use (25) in the following formulation:

here ${\Delta}_{\Gamma}\mathrm{arg}\left(f\left(z\right)\right)$ is a change of argument of function $f\left(z\right)$ along the closed contour $\Gamma $. In this case, it is not required to calculate the derivatives of the function. However, troubles arise because available functions for calculating arguments of a complex number are well defined only on one of the branches (usually the principal one). In order to calculate ${\Delta}_{\Gamma}\mathrm{arg}\left(f\left(z\right)\right)$, $\Gamma $ is split into relatively small steps, and is checked whether arguments on every step change by a small value with respect to $2\pi $. If this change is greater than some threshold value, let's call it argument smoothness tolerance (say, 0.1), then we split our small interval in half and repeat calculation of the argument for both parts. Iterations are continued until this branch point is located, or it is understood that there is no branch point on this interval. At this point, $\mathrm{arg}\left(f\left(z\right)\right)$ has a jump of $2\pi $, so we can correct our calculations of the argument function, which follows the principal branch only, by adding this jump to the sum of argument changes. It should be noted that instead of the argument function, the complex logarithm function can be used.During application of the Lehmer-Schur algorithm, we should watch closely for the conservation of the total number of zeros. If the total number of zeros is not conserved, it usually means that there is a zero on one of the boundaries and this boundary must be moved. After the zeros are separated, the hybrid Powell algorithm (a modification of 2D Newton's algorithm) [36] is used in order to obtain every zero with a desired accuracy. Currently, an implementation provided by the GNU Scientific Library [37] is used.

#### 3.1. Accuracy of the proposed algorithm

As a reference point, we have used the simulation results from 2D spatial harmonics analysis (SHA), proposed in [28] and validated in [27, 38] (2D-case) and in [27] (3D-case). The accuracy of the SHA method depends on the total number of Fourier modes. In the reference runs, 300 modes were taken. This number of modes is, with confidence, beyond the limit of convergence of the algorithm. This number is so high because the usual configuration of current nanostructures involves a complex piecewise constant function for the dielectric permittivity, which results in a slow convergence of the Fourier series.

We compared the simulation results of the lowest propagating mode (${k}_{x}$ has the smallest imaginary value) as a function of the incident light’s wavelength. Results are shown in Fig. 3(a)and 3(b), where the binary periodic structure (Material 2) had the following configuration: layer of silica doped with gain inclusions of thickness ${\delta}_{1}=20\text{\hspace{0.17em}}\text{nm}$, and a layer of silver of thickness ${\delta}_{2}=20\text{\hspace{0.05em}}\text{\hspace{0.17em}}\text{nm}$, incident angle $\theta =0$. The optical constants of silver and doped silica were kept the same as in Material 1.

We have used the following parameters of the algorithm: zeros tuning was started after the size of all the quadrants in the Lehmer-Schur algorithm smaller than ${10}^{-3}$ was achieved, accuracy of the zero tuning was ${10}^{-16}$ (the absolute value of the function at the zero-point). It can be seen that the same (up to the round-off error) results for 300 harmonics in SHA have been obtained. These results are very promising, taking into account the performance tests discussed later. Here, it should also be stressed that the proposed algorithm takes into account the piecewise character of the structure by using the natural proper functions. This means that the exact eigenvalues are obtained up to the accuracy of calculated zeros, which is limited from below by a round-off error of a given floating-point representation.

In Fig. 4, we demonstrate that the propagation constants (${k}_{x}$) from the MEP-SHA method converge to the results of the proposed method after increasing the number of harmonics. In this case (Material 3) the binary structure had the following composition: silica layer of thickness ${\delta}_{1}=20\text{\hspace{0.17em}}\text{nm}$ and ${\epsilon}_{1}=2.723$, and a layer of silver of thickness ${\delta}_{2}=20\text{\hspace{0.05em}}\text{\hspace{0.17em}}\text{nm}$ and ${\epsilon}_{2}=-25.274+\iota 0.85436$, and the wavelength of normally incident light is 730 nm. Even in our quite simple example, only after taking into account at least 300 Fourier modes the MEP-SHA was capable of computing almost the same numerically exact (up to a floating-point representation accuracy) eigenvalues as the proposed method.

#### 3.2. Parallelization of the algorithm

After each iteration of the Lehmer-Schur process, a set of zero-containing quadrants is found. Every such quadrant is processed independently. Thus, we have a natural, parallel part of the algorithm. Another important part is the final tuning of zeros with Powell's algorithm, where each zero is independent. However, as we know that this part of the entire algorithm takes a small fraction of the total computation time, it is much more beneficial to optimize the calculation of $\mathrm{arg}\left(f\left(x\right)\right)$ in the Lehmer-Schur process.

Currently, only the simplest Lehmer-Schur parallelization described above has been implemented. The shared memory model with the Linux implementation of POSIX threads as a parallelization mechanism is used.

For our current requirements (simulation of metasurfaces), we are satisfied with the present performance, which is more than one order of magnitude faster than the traditional, MEP-based SHA. The comparison of the performances of SHA against the proposed algorithm is shown in Fig. 5.As one can see from Fig. 5(a), scalability of the proposed algorithm is better than in the case of SHA, but still far from linear. The total performance is much better, which is demonstrated in Fig. 5(b). Moreover, in order to obtain comparable accuracy in MEP-based SHA (even for the lowest evanescent modes), one must use hundreds of modes.

For the proposed approach, we have used a freely available gcc compiler and standard libraries, while for the SHA code we have used a highly optimized Intel compiler together with a specialized Intel Math Kernel Library. So we still have a room for further acceleration of the new algorithm implementation. Hence, the current speed up numbers at Fig. 5(b) can be considered as minimal values. All the performance tests have been performed on Intel Xeon 5450 CPU operating at 3.0 GHz under CentOS 5.6 Linux.

## 4. Conclusion

We propose an algorithm for simulating periodic metasurfaces and cascaded metamaterial structures. The current formalism is developed for 2D-geometry, but can be generalized for 3D-case without any difficulties. We decided to limit ourselves to the 2D-case in order to keep all derivations relatively simple and clear for understanding. The proposed algorithm was implemented, and its accuracy and performance have been thoroughly tested. The preliminary results are truly encouraging - both in terms of accuracy and speed. Even the proof-of-concept, pilot version of the proposed algorithm is significantly (at least one order of magnitude) faster than the traditional, MEP-based implementation of SHA.

We must point out that even the simplest case of a slit in a metal film requires using at least 80 modes in the MEP-based SHA (or similar algorithms) in order to get any reasonable correspondence with experimental results [39]. In this case, our method offers more than one order-of-magnitude less time. At the same time, more involved configurations may require even more modes, as the main problem of the Fourier basis in MEP-based approaches is poor representation of piecewise constant distribution of dielectric constants in the media, which leads to slow convergence of higher frequency modes. This means that the further from the origin we go in the spatial spectrum, the more significant is the error (see Fig. 4). In other words, higher frequency modes obtained by SHA have a tendency to deviate from their exact values stronger than lower frequency modes. In contrast with the MEP-based approaches, the proposed method includes naturally the sharp boundaries. As a result, all calculated propagation constants are exact up to a desired level of accuracy, which is limited from below by a round-off error of floating-point representation.

One may speculate that the non-propagating modes are not that important. However, current
metasurfaces and metamaterials operate mostly due to near-field coupling effects. In the case of
cascaded layers (which is natural for construction of bulk metamaterials) the near-field
interaction becomes crucial. As it has already been demonstrated in [28], the significance of lower evanescent modes in cascaded structures
increases dramatically with respect to single-layer case, and is still important even in thin
symmetric structures on a substrate [40]. As a result,
accurate calculation of the evanscent modes becomes absolutely necessary. This leads to an
increasing number of modes in MEP-based methods, with *N* reaching several
hundred modes. In this case, usage of our method is a must, because of several orders of
magnitude advantage in performance.

As the number of required operations is dramatically decreased, the proposed solver demonstrates somewhat weak scalability, which is still better than the scalability of MEP-based methods. However, due to the exceptional speed enhancement of our new solver, even its current, non-optimal implementation exhibits very good performance. The solver already allows for fast global optimization of modern periodic metamaterials and metasurfaces by using a conventional, four-core desktop computer.

We truly believe that the proposed solver is a key to both fast computation of periodic nano-structures on conventional personal computers and feasible application of the optimization algorithms, which requires many repeating calculations of the given structure. Future work will include the implementation of the proposed algorithm into our free, in-the-cloud simulation tool [38], which currently employs a conventional, MEP-based version of 2D SHA. Speed-up techniques for spectral and parametric sweeps will also receive our special attention.

## Acknowledgments

A.V.K. gratefully cites the support from the Gordon and Betty Moore Foundation Grant GBMF3385 and ARO Grant W911NF-13-1-0226. He also thankfully acknowledges motivating discussions with Prof. E. E. Narimanov. A.O.K. acknowledges useful discussions on numerical methods with Prof. K. S. Turitsyn, MIT, along with partial support by the University of New Mexico RAC grant #11-19. The authors thank developers of FFTW [41] and the whole GNU project (especially GNU Scientific Library [37]) for creating, developing, and supporting this free and useful software.

## References and links

**1. **N. Yu, P. Genevet, M. A. Kats, F. Aieta, J.-P. Tetienne, F. Capasso, and Z. Gaburro, “Light Propagation with Phase Discontinuities:
Generalized Laws of Reflection and Refraction,”
Science **334**(6054), 333–337
(2011). [CrossRef] [PubMed]

**2. **M. A. Kats, P. Genevet, G. Aoust, N. Yu, R. Blanchard, F. Aieta, Z. Gaburro, and F. Capasso, “Giant birefringence in optical antenna arrays with
widely tailorable optical anisotropy,” Proc. Natl. Acad. Sci.
U.S.A. **109**(31), 12364–12368
(2012). [CrossRef]

**3. **F. Aieta, P. Genevet, N. Yu, M. A. Kats, Z. Gaburro, and F. Capasso, “Out-of-Plane Reflection and Refraction of Light by
Anisotropic Optical Antenna Metasurfaces with Phase Discontinuities,”
Nano Lett. **12**(3), 1702–1706
(2012). [CrossRef] [PubMed]

**4. **P. Genevet, N. Yu, F. Aieta, J. Lin, M. A. Kats, R. Blanchard, M. O. Scully, Z. Gaburro, and F. Capasso, “Ultra-thin plasmonic optical vortex plate based on
phase discontinuities,” Appl. Phys. Lett. **100**(1), 013101
(2012). [CrossRef]

**5. **S. Larouche and D. R. Smith, “Reconciliation of generalized refraction with
diffraction theory,” Opt. Lett. **37**(12), 2391–2393
(2012). [CrossRef] [PubMed]

**6. **X. Ni, S. Ishii, A. V. Kildishev, and V. M. Shalaev, “Ultra-thin, planar, Babinet-inverted plasmonic
metalenses,” Light Sci. Appl. **2**(4), e72 (2013).

**7. **S. Sun, Q. He, S. Xiao, Q. Xu, X. Li, and L. Zhou, “Gradient-index meta-surfaces as a bridge linking
propagating waves and surface waves,” Nat. Mater. **11**(5), 426–431
(2012). [CrossRef] [PubMed]

**8. **A. V. Kildishev, A. Boltasseva, and V. M. Shalaev, “Planar Photonics with
Metasurfaces,” Science **339**(6125), 1232009 (2013). [CrossRef] [PubMed]

**9. **L. Verslegers, P. B. Catrysse, Z. Yu, and S. Fan, “Planar metallic nanoscale slit lenses for angle
compensation,” Appl. Phys. Lett. **95**(7), 071112 (2009). [CrossRef]

**10. **S. Ishii, A. V. Kildishev, V. M. Shalaev, K.-P. Chen, and V. P. Drachev, “Metal nanoslit lenses with polarization-selective
design,” Opt. Lett. **36**(4), 451–453
(2011). [CrossRef] [PubMed]

**11. **R. E. Collin, “Reflection and Transmission at a Slotted Dielectric
Interface,” Can. J. Phys. **34**(4), 398–411
(1956). [CrossRef]

**12. **C. B. Burckhardt, “Diffraction of a Plane Wave at a Sinusoidally
Stratified Dielectric Grating,” J. Opt. Soc. Am. **56**(11), 1502–1508
(1966). [CrossRef]

**13. **H. Kogelnik, “Coupled wave theory for thick hologram
gratings,” Bell Syst. Tech. J. **48**(9), 2909–2947
(1969). [CrossRef]

**14. **F. G. Kaspar, “Diffraction by thick, periodically stratified gratings
with complex dielectric constant,” J. Opt. Soc. Am. **63**(1), 37–45
(1973). [CrossRef]

**15. **R. Magnusson and T. K. Gaylord, “Analysis of multiwave diffraction of thick
gratings,” J. Opt. Soc. Am. **67**(9), 1165–1170
(1977). [CrossRef]

**16. **K. Knop, “Rigorous diffraction theory for transmission phase
gratings with deep rectangular grooves,” J. Opt. Soc.
Am. **68**(9), 1206–1210
(1978). [CrossRef]

**17. **L. C. Botten, M. S. Craig, R. C. McPhedran, J. L. Adams, and J. R. Andrewartha, “The Finitely Conducting Lamellar Diffraction
Grating,” Opt. Acta (Lond.) **28**(8), 1087–1102
(1981). [CrossRef]

**18. **M. G. Moharam and T. K. Gaylord, “Three-dimensional vector coupled-wave analysis of
planar-grating diffraction,” J. Opt. Soc. Am. **73**(9), 1105–1112
(1983). [CrossRef]

**19. **L. Li, “Multilayer modal method for diffraction gratings of
arbitrary profile, depth, and permittivity,” J. Opt. Soc. Am.
A **10**(12), 2581–2591
(1993). [CrossRef]

**20. **E. Noponen and J. Turunen, “Eigenmode method for electromagnetic synthesis of
diffractive elements with three-dimensional profiles,” J. Opt.
Soc. Am. A **11**(9), 2494–2502
(1994). [CrossRef]

**21. **G. Granet and B. Guizal, “Efficient implementation of the coupled-wave method for
metallic lamellar gratings in TM polarization,” J. Opt. Soc.
Am. A **13**(5), 1019–1023
(1996). [CrossRef]

**22. **J. M. Jarem and P. P. Banerjee, “A nonlinear, transient analysis of two- and multi-wave
mixing in a photorefractive material using rigorous coupled-wave diffraction
theory,” Opt. Commun. **123**(4-6), 825–842
(1996). [CrossRef]

**23. **P. Lalanne and G. M. Morris, “Highly improved convergence of the coupled-wave method
for TM polarization,” J. Opt. Soc. Am. A **13**(4), 779–784
(1996). [CrossRef]

**24. **E. Popov and M. Nevière, “Maxwell equations in Fourier space: fast-converging
formulation for diffraction by arbitrary shaped, periodic, anisotropic
media,” J. Opt. Soc. Am. A **18**(11), 2886–2894
(2001). [CrossRef] [PubMed]

**25. **B. Momeni and B. Rashidian, “Improved coupled wave analysis of two-dimensional
planar multiple gratings,” IEEE Trans. Antenn. Propag. **52**(1), 165–171
(2004). [CrossRef]

**26. **A. David, H. Benisty, and C. Weisbuch, “Fast factorization rule and plane-wave expansion method
for two-dimensional photonic crystals with arbitrary hole-shape,”
Phys. Rev. B **73**(7), 075107 (2006). [CrossRef]

**27. **X. J. Ni, Z. T. Liu, A. Boltasseva, and A. V. Kildishev, “The validation of the parallel three-dimensional solver
for analysis of optical plasmonic bi-periodic multilayer
nanostructures,” Appl. Phys., A Mater. Sci. Process. **100**(2), 365–374
(2010). [CrossRef]

**28. **A. V. Kildishev and U. K. Chettiar, “Cascading optical negative index
metamaterials,” J. Appl. Comput. Electromagnetics Soc. **22**, 172–183
(2007).

**29. **S. M. Rytov, “Electromagnetic properties of a finely stratified
medium,” Sov. Phys. JETP **2**, 466–475
(1956).

**30. **P. Yeh, A. Yariv, and C.-S. Hong, “Electromagnetic propagation in periodic stratified
media. I. General theory,” J. Opt. Soc. Am. **67**(4), 423–438
(1977). [CrossRef]

**31. **F. Tisseur and K. Meerbergen, “The Quadratic Eigenvalue
Problem,” SIAM Rev. **43**(2), 235–286
(2001). [CrossRef]

**32. **X. Ni, S. Ishii, M. D. Thoreson, V. M. Shalaev, S. Han, S. Lee, and A. V. Kildishev, “Loss-compensated and active hyperbolic
metamaterials,” Opt. Express **19**(25), 25242–25254
(2011). [CrossRef] [PubMed]

**33. **F. S. Acton, *Numerical Methods that
Work* (Math. Assoc. Amer., Washington, DC, 1990).

**34. **J. Elser, V. A. Podolskiy, I. Salakhutdinov, and I. Avrutsky, “Nonlocal effects in effective-medium resfponse of
nanolayered metamaterials,” Appl. Phys. Lett. **90**(19), 191109 (2007). [CrossRef]

**35. **J. W. Brown and R. V. Churchill, *Complex
Variables and Applications* (McGraw Hill, 2004).

**36. **M. J. D. Powell, *A Hybrid Method for Nonlinear
Equations* (Gordon and Breach, 1970).

**37. **GNU Scientific Library, (1996–2013). http://www.gnu.org/s/gsl/.

**38. **X. Ni, Z. Liu, F. Gu, M. G. Pacheco, J. Borneman, and A. V. Kildishev, “PhotonicsSHA-2D: Modeling of single-period multilayer optical
gratings and metamaterials,” (2012), doi: . https://nanohub.org/resources/6977. [CrossRef]

**39. **Z. Liu, K. P. Chen, X. Ni, V. P. Drachev, V. M. Shalaev, and A. V. Kildishev, “Experimental verification of two-dimensional spatial
harmonic analysis at oblique light incidence,” J. Opt. Soc. Am.
B **27**(12), 2465–2470
(2010). [CrossRef]

**40. **A. V. Kildishev, J. D. Borneman, X. J. Ni, V. M. Shalaev, and V. P. Drachev, “Bianisotropic effective parameters of optical
metamagnetics and negative-index materials,” Proc.
IEEE **99**(10), 1691–1700
(2011). [CrossRef]

**41. **M. Frigo and S. G. Johnson, “The design and implementation of FFTW
3,” Proc. IEEE **93**(2), 216–231
(2005). [CrossRef]