
Cross-talk elimination for lenslet array near eye display based on eye-gaze tracking

Open Access

Abstract

Lenslet array (LA) near-eye displays (NEDs) are a recent technical development that creates a virtual image in the field of view of one or both eyes. A problem occurs when the user’s pupil moves out of the LA-NED eye box: cross-talk makes the image look doubled or ghosted and negatively impacts the user experience. Although eye-gaze tracking can mitigate this problem, its effect has not been studied with respect to pupil size and human perception. In this paper, we redefine the cross-talk-free region as the practical pupil movable region (PPMR50), which differs from the eye box size because it considers pupil size and human visual perception. To evaluate the effect of eye-gaze tracking on subjective image quality, three user studies were conducted. From the results, PPMR50 was found to be consistent with human perception, and eye-gaze tracking effectively eliminated cross-talk in static gaze scenarios. Although system latency prevented the complete elimination of cross-talk for fast eye movements or large pupil changes, the problem was greatly alleviated. We also analyzed system delays based on the newly defined PPMR50 and provide an optimization scheme to cope with the maximum eyeball rotation speed.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The use of head-mounted displays (HMDs) with virtual reality (VR) applications has become increasingly popular. For traditional VR devices that use a single lens, it is impractical to manufacture a lens whose focal length is significantly less than its width, making the required space between the display and the lens difficult to compress [1,3]. This results in thick and heavy HMDs. Hence, some HMD manufacturers have applied combiners, but the resulting devices are still quite bulky [4–6]. The development trend of VR is toward miniaturization. Therefore, we are interested in producing HMDs that are light and portable enough for long-term wear. Lanman and Luebke demonstrated a near-eye display (NED) device that uses a microdisplay in front of a lenslet array (LA) [1]. This technology provides a means to achieve a thin and lightweight structure with a wide field of view (FOV). It also addresses the vergence–accommodation conflict (VAC), which occurs when the human brain receives mismatched cues between the distance of a virtual three-dimensional (3D) object (vergence) and the focusing distance (accommodation) required for the eyes to focus on the object in binocular configurations. However, for NEDs using LAs, it is crucial to consider the eye box (i.e., the area within which users observe the full extent of the virtual image). Notably, the pupil must be located within this area [1,7]. With LA-NEDs, cross-talk occurs if the pupil is outside the eye box, making the image look doubled or ghosted. The eye box of an LA-NED is small compared with that of a normal HMD, owing to its optics. Lanman and Luebke [1] treated the pupil as a point without physical size, reporting a maximum 10 mm eye box at 14 mm eye-relief. However, the actual pupil size ranges from 2 to 8 mm [8], which decreases the range in which cross-talk can be avoided. This reduced range causes a problem for practical use, as users easily exceed it by rotating their eyes. Another study, carried out by Bang et al. [9], presented a thin and flat VR display glass prototype using a Fresnel LA with a polarization-based optical folding technique. Their prototype achieved an eye box of 8.8 mm in an evaluation using a wide-angle camera instead of the human eye. However, the camera aperture was smaller than a human pupil. As they pointed out, the range in which a user can avoid cross-talk is likely to be smaller, and for some applications, even their enlarged eye box may be insufficient.

A promising approach to achieving a larger apparent viewing region in NEDs is to measure the position of the pupil and dynamically generate the microdisplay images based on it. This has been identified as a future challenge in several studies [9–16]. The use of eye-gaze tracking to enlarge this region in holographic NEDs and thereby solve vignetting issues has been examined [10,11]; although system parameters were designed and performance was analyzed, pupil size was not considered. Other papers [17–19] also used eye-gaze tracking to solve this kind of problem in LA-NEDs, but they did not verify the effect of system latency on cross-talk during pupil rotation, which is significant for eye-gaze tracking HMDs, and they did not consider human visual perception. Human beings perceive only the result of the brain’s interpretation of a visual stimulus, rather than the stimulus itself; in terms of human cognition, cross-talk is not immediately perceived when the pupil position exceeds the eye box. Notably, extant studies used simulations or captured camera images and did not evaluate actual human visual physiology via participant experiments. No study has examined the effects of dynamic image generation based on eye-gaze tracking while accounting for pupil size and human visual perception as they pertain to the cross-talk of LA-NEDs. The "eye box" metric does not accurately represent the region in which people can avoid cross-talk in LA-NEDs, so system performance cannot be evaluated based on it. Therefore, we define a new range metric called the "Practical Pupil Movable Region" (PPMR$_{50}$), which considers pupil size and human perception. Based on it, we designed new LA-NED prototype parameters, analyzed their feasibility, and performed a user study to compare theoretical values with actual human-perceived cross-talk measures. We also investigated the effect of eye-gaze tracking on the user’s perception of cross-talk in LA-NEDs through user studies with images and videos.

2. Related works

2.1 NEDs

NEDs enable users to see stereoscopic images through two separate display systems. They have many uses in gaming, aviation, engineering, and medicine [20]. Conventional NEDs use a single lens per eye [2,21] and provide a large eye box. Some NEDs offer the ability to physically adjust each eye’s display position or the rendered image, depending on the distance between the eyes. Most displays have an eye box covering the full range of eyeball motion, so users continue to perceive a clear image unless their eyes shift too far from the display. However, such NEDs have thicknesses of around 10 cm and require additional methods to eliminate distortion. It is impractical to manufacture the traditional simple magnifier of an NED with a focal length significantly less than its width [1]; thus, it is difficult to achieve a thin form factor. Representative NED products on the market (e.g., Vive and Oculus) [22,23] are still bulky and unsuitable for long-term wear. Our primary interest is in portable, long-term-wearable NEDs with high performance.

2.2 LA-NEDs

To make NEDs more portable, Lanman and Luebke [1,24] designed a prototype with an LA. A compact NED optical element used a pair of LAs working together [25]; its purpose was to aid the accommodation of the eye on a head-up display positioned several centimeters from the eye. However, the eye box was limited to 4 mm. Another NED used a discrete LA [26], but stray light entering the prototype caused a decrease in brightness and eye box size. A novel prototype relied on curved displays and an LA to obtain a larger FOV and eye box [7]. Flexible organic light-emitting diode displays exist on some mobile phones, but standalone drivers are widely unavailable; thus, the phone electronics and batteries must be attached to each display, adding to the bulk. Another binocular prototype system [27] maintained high-quality imagery, high visual resolution, and a wide FOV for a see-through display, and it removed cross-talk with an eye box of 6 mm. A further prototype presented a new thin and flat VR display combining a Fresnel LA, a Fresnel lens, and a polarization-based optical folding technique [9]. The function of the LA was to float the virtual display plane at a far distance, while the Fresnel lens collected the light from the display plane and sent it into the pupil. To further shorten the space between the LA and the display, a polarization-based optical folding technique was applied. LA displays are suitable for a thin form factor and wide FOV, and LAs combined with a light field can mitigate the VAC. However, as mentioned in Section 1, the prototype eye box (8.8 mm) does not cover the entire movable range of the human eyeball.

2.3 Techniques for extending the eye box in a NED

As described in Section 1, the eye box is a crucial parameter of NED design. The major eye box extension technique, which can utilize the full bandwidth of recent displays, uses eye-gaze tracking to extend the pseudo-eye box through dynamic image switching [10–13]. Currently, portable NEDs with thin form factors include holographic NEDs and LA-NEDs. Novel designs for holographic NEDs have achieved full-color, high-contrast, low-noise holograms with high resolution and true per-pixel focal control [28]. However, a major limitation was the narrow eye box. One possible solution is to shift the exit pupil by switching light sources or using a beam-steering element. This method was demonstrated successfully by Häussler et al. [29], who used an eye-gaze tracking system with large-format holographic displays. A prototype NED was built using a holographic image combiner with an eye-gaze tracker, achieving competitive display performance with a high FOV and a sufficient eye box [10]. One year later, the same group [11] proposed a pupil-shifting holographic optical element (HOE) with an eye-gaze tracker, demonstrating a reduced form factor for exit-pupil shifting and enlarging the shifted intrinsic eye box from 2 to 7 mm. In these studies, thorough analyses of design parameters and display performance ascertained the possibility of enlarging the eye boxes of holographic NEDs with eye-gaze tracking. When the pupil moves outside the eye box of a holographic NED, severe vignetting occurs, so enlarging the eye box is practical. With LA-NEDs, the pupil moving outside the eye box causes severe cross-talk. To solve this problem in LA-NEDs, Hong et al. [30] proposed a system using an eye-tracking method based on the Kinect sensor. They tested generation speeds of 10 frames per second (FPS) and 5 FPS for two different resolutions. The latency of their prototype was a severe problem, so they analyzed only static gaze. Other papers [17–19] also used eye-gaze tracking to address this problem in LA-NEDs, but none of their prototypes are wearable, and they did not verify the effect of system latency during pupil rotation while considering pupil size and human perception. Although eye-gaze tracking methods already exist, no study has rigorously examined human visual physiology and cognitive characteristics, including eye movement, while considering the accuracy and latency of eye trackers. The human visual system cannot process everything with full fidelity, nor can it perform, moment to moment, all possible visual tasks. It must lose some information and prioritize some tasks over others [31,32]. The latency of the mechanical devices (i.e., eye trackers and display devices) can also cause problems, leading to different kinds of cross-talk and negative artifacts. We investigated the effect of eye-gaze tracking on the user’s perception of image and video quality in LA-NEDs by conducting user studies.

3. Principle of NEDs

In LA-NEDs, a change in the position of the eye relative to the eye box changes the observed image quality. In this section, we first describe, for a simple magnifier, what happens when the pupil position changes, first treating the pupil as a point without size (position only) and then considering the real situation in which the pupil has a physical size. We then do the same for the LA-NED to further explain the difficulty of assessing performance with the eye box alone. Based on this discussion, we define the pupil movable region (PMR) and the PPMR$_{50}$ by considering pupil size and human perception effects, respectively. Lastly, we introduce the idea of dynamically generating the PPMR$_{50}$ to eliminate cross-talk using eye-gaze tracking and analyze the feasibility of the system.

3.1 Eye box in single magnifier NEDs

In a simple HMD, one lens is positioned a focal-length distance in front of a microdisplay, creating a magnified virtual image. The eye box is the region within which the user perceives the full extent of the virtual image when the pupil is located inside it. When the pupil is regarded as a point and its transition distance from the eye box center is smaller than $w_e/2$, the complete image can be seen; if the transition distance is larger than $w_e/2$, part of the image disappears. However, if we consider a pupil of size $w_p$, the whole virtual image can be seen only while the transition distance from the eye box center is smaller than $(w_e - w_p)/2$. If the transition distance exceeds $(w_e - w_p)/2$, the intensity of part of the image gradually decreases instead of disappearing immediately. In summary, when pupil size is considered, the eye box alone is not a suitable indicator for evaluating the actual performance of NEDs using a single lens.
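The following minimal sketch summarizes this visibility condition. It is our own illustration rather than code from the paper; the final threshold, at which the pupil lies entirely outside the eye box, is an assumption added for completeness.

```python
# Hypothetical helper (not from the paper): classify what a user sees in a
# single-magnifier NED from the eye box width w_e, pupil diameter w_p, and the
# pupil's transition distance from the eye box center (all in millimeters).

def single_lens_view(transition_mm: float, w_e: float, w_p: float) -> str:
    t = abs(transition_mm)
    if t <= (w_e - w_p) / 2:
        return "full image visible"      # whole pupil inside the eye box
    if t <= (w_e + w_p) / 2:
        return "edge intensity fades"    # pupil partially outside the eye box (assumed bound)
    return "part of the image lost"      # pupil entirely outside the eye box

if __name__ == "__main__":
    w_e, w_p = 10.0, 4.0   # e.g., 10 mm eye box and 4 mm pupil
    for t in (0.0, 2.5, 4.0, 8.0):
        print(f"transition {t:4.1f} mm -> {single_lens_view(t, w_e, w_p)}")
```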

3.2 Eye boxes in LA-NEDs

In LA-NEDs, the microdisplay is divided into several elemental parts. Each part and its corresponding lens act as an independent simple magnifier. Ideally, each elemental image belongs to one lens, and the clear image is composed of the elemental images mapped onto the virtual image plane through their corresponding lenses. However, the elemental images also map to the virtual image plane through the other lenses. The elemental images formed by the other lenses are redundant, which causes a mixture of images to appear on the virtual image plane. When the pupil is regarded as a point and moves within the eye box, the whole non-mixed image can be seen; when the pupil is outside the eye box, redundant parts appear in the field of view. In the actual case (i.e., a pupil with an area), when the pupil extends beyond the eye box, the intensity of the cross-talk region gradually increases. Figure 1 presents simulations of the perceived images in an LA prototype in which the width and focal length of each lens are 1 mm and 3.3 mm, respectively. At an eye-relief of 15.2 mm, the eye box of the prototype is 4.5 mm. The images in the first row of Fig. 1 present the simulation with the pupil as a point; the images in the second and third rows present simulations with a pupil of 4 mm. No cross-talk appears when either type of pupil is located at the center of the eye box (the black rectangle in Fig. 1). When the pupil transition distance is 2.0 mm, the point pupil still shows no cross-talk, whereas in the real case cross-talk has already appeared. When the transition distance is 3 mm, the user sees cross-talk in both cases, but the intensities differ. Thus, considering only the eye box is unsuitable for assessing the actual performance of LA-NEDs.


Fig. 1. Simulations of retina images under the condition of a lens width of 1 mm and a focal length of 3.3 mm; the size and resolution of the microdisplay are 15.36 mm $\times$ 8.64 mm and 1280 $\times$ 720, with the pupil treated as a point or as an area. The black rectangle indicates the 4.5 mm eye box. The first row shows simulations of the retina image with the pupil as a point at different transition distances (Td) from the center of the eye box. When the pupil moves inside the eye box and Td is smaller than 2.25 mm, a sharp image can be seen; the observed images become worse when the pupil moves out of the eye box. The bottom two rows show simulations of the retina image when the pupil has a diameter of 4 mm. For Td larger than 0.6 mm, cross-talk has already appeared, and as Td increases, the cross-talk gradually becomes clearer. The value below each image is the image-quality criterion value used in the experiments.


3.3 Definition of the PMR in NED

We define the PMR as a simple extension of the eye box that considers pupil size. The PMR width is given by Eq. (1). A color map of the relationship between PMR value, pupil size, and eye-relief is shown in Fig. 2 for a focal length of f = 3.3 mm and a lens width of $w_l$ = 1 mm. The black region indicates configurations in which cross-talk always appears in the LA-NED. At an eye-relief of 13.2 mm with a pupil size of 4 mm, the PMR is zero, meaning that there is no room for pupil movement.

$$w_{PMR} = \max\left(\frac{d_e w_l}{f} - w_p,\; 0\right)$$
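A minimal numerical sketch of Eq. (1), using the lens parameters quoted above (f = 3.3 mm, $w_l$ = 1 mm); it is an illustration added here, not code from the paper.

```python
# Eq. (1): PMR width from eye-relief d_e, lens width w_l, focal length f, and
# pupil diameter w_p (all in mm).

def eye_box_width(d_e: float, w_l: float = 1.0, f: float = 3.3) -> float:
    return d_e * w_l / f

def pmr_width(d_e: float, w_p: float, w_l: float = 1.0, f: float = 3.3) -> float:
    return max(eye_box_width(d_e, w_l, f) - w_p, 0.0)

if __name__ == "__main__":
    # At 13.2 mm eye-relief and a 4 mm pupil the PMR collapses to zero,
    # matching the black region of Fig. 2.
    print(pmr_width(13.2, 4.0))   # -> 0.0
    print(pmr_width(15.2, 4.0))   # -> ~0.61 mm
```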


Fig. 2. Color map of the PMR. Values in the black region are zero, indicating that there is no space for the pupil to move; configurations in this region are unsuitable. When the pupil size is 4 mm, the minimum eye-relief is 13.2 mm.


In actual use, the PMR does not match the cross-talk perceived by users because it does not consider the effect of human perception. The human visual system converts physical stimuli into electrochemical information that the brain interprets, and if the image noise is below a certain threshold, the brain does not perceive it [33]. In LA-NEDs, cross-talk is physically present as soon as the pupil extends beyond the eye box, but it is only perceived once the pupil is beyond a certain distance. Therefore, we define the PPMR$_{50}$ for LA-NEDs as the area in which the user cannot perceive cross-talk, considering the limits of human visual perception. To quantify the effect of human visual perception on LA-NED cross-talk, the peak signal-to-noise ratio (PSNR) was adopted in this study. PSNR is commonly used to quantify reconstruction quality for images and videos [3,6,34,35]; PSNR values from infinity to zero indicate quality from best to worst [36]. The pixels of the target image or video are considered signal, and the pixels belonging to cross-talk are considered noise. When the noise in the image or video is sufficiently large, the observed image quality decreases. Typical PSNR values for lossy image and video compression are 30–50 dB, and 50 dB is considered high quality [22,23].
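As a concrete reference, the sketch below shows the standard PSNR computation between a target image and an observed (simulated retina) image; it is an assumed illustration, not the paper’s implementation.

```python
# PSNR in dB between a target image (signal) and an observed image containing
# cross-talk (signal + noise). Images are assumed to share the same shape and an
# 8-bit intensity scale.

import numpy as np

def psnr(target: np.ndarray, observed: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((target.astype(np.float64) - observed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images: no cross-talk at all
    return 10.0 * np.log10(peak ** 2 / mse)

# In this paper's terms, an observed image with psnr(...) >= 50 dB is treated as
# cross-talk free when defining the PPMR_50.
```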

We simulated retina images based on pupil transition distance, eye-relief, and pupil size, and then calculated the PSNR value of each retina image with respect to the target image (see Supplement 1). As the pupil rotation angle increases, the PSNR value of the retina image decreases. The bottom two rows of Fig. 1 present simulations of retina images at different transition distances from the eye box center, together with PSNR values, for a pupil size of 4 mm. Owing to the limitations of human visual perception, we do not perceive any cross-talk when the PSNR is larger than 50 dB; therefore, we used PSNR = 50 as the boundary to define the PPMR$_{50}$. Figure 3 shows PSNR values at different pupil transition distances for a pupil size of 4 mm and an eye-relief of 15.2 mm. When the PSNR value is infinite, there is no cross-talk in the image; this corresponds to the PMR. The PPMR$_{50}$ adds to the PMR the two flanking regions in which the PSNR ranges from infinity down to 50 dB, as expressed in Eq. (2). If pupil size and human visual perception are not considered in LA-NEDs, the eye box, which does not match what is actually observed, is much larger than the PPMR$_{50}$. We examine how well the PPMR$_{50}$ matches human perception in the user study in Section 5.2. All subsequent dynamic image generation is discussed based on the PPMR$_{50}$.

$$w_{PPMR_{50}} = w_{PMR} + 2 w_{PSNR50}$$


Fig. 3. PSNR values for an eye-relief of 15.2 mm and a pupil size of 4 mm.


3.4 Dynamic eye box generation based on eye-gaze directions

Assuming that the average human pupil diameter is 4 mm and that the eye-relief required for the desired thin form factor is 15.2 mm, the PPMR$_{50}$ of the LA-NED is only 1.24 mm. This limited PPMR$_{50}$ is a critical drawback that hinders LA-NED systems from practical use. Therefore, it is essential to enlarge the PPMR$_{50}$ so that images remain continuous across pupil positions and the full display can be used without loss of FOV.

The main idea of this paper is to dynamically generate the PPMR$_{50}$ so that it follows the eye-gaze direction measured by an eye tracker, thereby eliminating cross-talk. In an LA-NED, a thin, two-dimensional array of converging lenses is placed a distance, $d_l$, in front of a microdisplay. The focal length and width of each lens in the LA are $f$ and $w_l$, respectively. The microdisplay is divided into several elemental parts. Each part and its corresponding lens synthesize an off-axis perspective projection of the virtual image plane, located at a distance, $d_o$, from the LA [1]. The PPMR$_{50}$ of an LA-NED with eye-relief $d_e$, within which cross-talk cannot be perceived, is given by Eq. (3).

$$w_{PPMR_{50}} = \frac{d_e w_l}{f} - w_p + 2w_{PSNR50}$$
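To make the 1.24 mm figure quoted in Section 3.4 concrete, here is a small sketch of Eq. (3) with the prototype parameters; the per-side margin $w_{PSNR50}$ is an assumed value back-solved from that figure, since in the paper it comes from the retina-image simulation.

```python
# Eq. (3) with the prototype parameters (f = 3.3 mm, w_l = 1 mm, d_e = 15.2 mm,
# w_p = 4 mm). w_psnr50 below is assumed (~0.32 mm per side) so that the total
# PPMR_50 matches the 1.24 mm quoted in Section 3.4.

import math

f, w_l = 3.3, 1.0        # lens focal length and width (mm)
d_e, w_p = 15.2, 4.0     # eye-relief and pupil diameter (mm)
w_psnr50 = 0.317         # assumed PSNR-derived margin per side (mm)

w_pmr = max(d_e * w_l / f - w_p, 0.0)
w_ppmr50 = w_pmr + 2.0 * w_psnr50
print(f"PMR     = {w_pmr:.2f} mm")      # ~0.61 mm
print(f"PPMR_50 = {w_ppmr50:.2f} mm")   # ~1.24 mm

# Converting the half-width to an eye rotation angle with the 11.8 mm eyeball
# rotation radius used later in Eq. (4) gives roughly +/-3 degrees.
half_angle = math.degrees(math.atan((w_ppmr50 / 2.0) / 11.8))
print(f"half-angle ~= {half_angle:.1f} deg")
```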

When generating microdisplay images according to pupil position and eye-relief, the microdisplay is divided into $N_l$ elemental images. Different eye-reliefs and pupil positions cause each lens to correspond to different elemental image areas. As shown in Fig. 4, the elemental images on the microdisplay are redistributed based on pupil position. As the pupil moves from top to bottom, the elemental image belonging to a given lens changes from $\Delta w_s$ to $\Delta w_s'$, and the corresponding virtual image region changes from $\Delta w_o$ to $\Delta w_o'$. This indicates that dynamic microdisplay image generation relies on changes in pupil position (see Visualization 1). Microdisplay images generated for different angles are shown in Fig. 5(c) and 5(g). The position of the PPMR$_{50}$ changes following the eye-gaze direction output by the eye tracker, as shown in Fig. 4. This "dynamic PPMR$_{50}$" keeps covering the moving pupil, so cross-talk is not perceived.


Fig. 4. Optical structure layout of an NED using an LA of $N_l$ lenslets. Each lens has a focal length, $f$, and width, $w_l$, and is positioned a distance, $d_l$, in front of a microdisplay of width $w_s$. The distance from the virtual image plane to the LA is $d_o$.



Fig. 5. (a) Monocular VR glasses prototype. The prototype comprises a microdisplay in front of an LA with an eye tracker fixed underneath. We used a 3D-printed glasses frame. The microdisplay size was 15.36 $\times$ 8.64 mm, and the resolution was 1,280 $\times$ 720. The driver comes from a Sony HMZ-T1 personal media viewer [39,40] whose magnifying eyepieces were removed. (b) Input image. (c) Microdisplay image generated with an eye-gaze rotation angle of 0 $^{\circ }$. (e) The optical part of the prototype with a detachable eye-relief ruler on the right side to measure eye-relief. (f) Observed image taken with a Huawei P40 smartphone with an aperture of 2.4 mm and a focal length of 35 mm. (g) Microdisplay image generated with an eye-gaze rotation angle of 12 $^{\circ }$.



Fig. 6. Software structure, including calibrations, microdisplay image generation, and retina image simulation.


Based on the above principle, we next analyze the feasibility of the prototype. When users stare at a single point, the average amplitude of eye motion jitter perceived as most natural varies between 0.014 and 0.029 mm [37]. The accuracy of the eye tracker used in our prototype corresponds to 0.12 mm [38], which is larger than the pupil jitter. With the eye-relief set to 15.2 mm in our prototype, the PPMR$_{50}$ can cover both the pupil jitter distance and the eye tracker accuracy.

4. Implementation

Based on the considerations described in Section 3.4, we implemented a monocular LA-NED prototype for an experiment to verify the effectiveness of dynamic image generation based on eye-gaze tracking. In this section, the prototype is described in terms of hardware and software.

4.1 Hardware implementation

The monocular prototype shown in Fig. 5(a) consists of a driver board, a display, an LA, and an eye tracker (see Visualization 2). The glasses frame was printed using a 3D printer (Agilista-3200). A replaceable eye-relief ruler (Fig. 5(e)) was attached to the eye area of the frame; when the user wears the display, the tool touches the side of the nasal bone, and the eye-relief can be adjusted to a predetermined value. The driver board comes from a Sony HMZ-T1 personal media viewer [39,40] with the magnifying eyepieces removed. The display in the prototype is a Sony ECX332A, which offers $1,280 \times 720$ (720p) resolution at 0.7" size. The focal length and width of each lens of the LA are $f = 3.3$ mm and $w_l = 1$ mm, and the LA is an array of $N_l = 23 \times 15$ lenslets placed a distance, $d_l$, in front of the microdisplay. These parameters are the same as in Lanman and Luebke’s prototype [1]. Although $d_l$ affects the distance from the display to the virtual image plane, we set it empirically so that the user sees a blur-free image through the LA-NED at $d_o = 0.5$ m. The eye tracker used in this paper was manufactured by Pupil Labs; its frame rate is 120 Hz and its accuracy is 0.6$^{\circ }$ [38,41,42]. Considering the size of the general human pupil and the measurement accuracy of the eye tracker, an eye-relief of $d_e = 15.2$ mm was used in this prototype, which produces an FOV of $45.1^{\circ } \times 26.3^{\circ }$.

4.2 Software implementation

All programs were implemented in C++ with the OpenGL library on a PC running Ubuntu with an Intel Core i7 (3.7 GHz) CPU, 16 GB RAM, and an NVIDIA GeForce GTX 1080 Ti GPU. The software structure (shown in Fig. 6) includes three parts: calibration, microdisplay image generation, and retina image simulation. The calibration results are used in the microdisplay image generation. The simulation part works separately from the main system and was used to visually and quantitatively analyze the images that users see, as shown in Fig. 1 in Section 3.

4.2.1 Angle offset calibration

The angle offset calibration determines the offset angle between the microdisplay and the LA. An angle offset caused by the manufacturing process is inevitable; it makes pixels of the elemental images that should converge to the same point on the virtual image converge at different points through the LA, causing the virtual image to become dark and blurred. The purpose of the angle offset calibration is to correct those pixel positions in software so that pixels belonging to the same virtual image point overlap again. We observed the change in imaging quality while rotating the image on the screen and chose the sharpest image by visual inspection. The rotation angle corresponding to the sharpest image is the offset angle between the LA and the microdisplay.

4.2.2 Eye-gaze tracking calibration

In the theoretical explanation, the pupil transition distance was used because it shows the relative position between the pupil and the eye box more clearly than the pupil rotation angle does. However, the pupil rotation angle was used in the experiments because changes in the relative position between the pupil and the HMD are caused by eye rotation rather than by parallel translation of the face or eyes. Equation (4) converts the transition distance ($T_d$) to the rotation angle; the eye rotation radius, $r$, is 11.8 mm [43]. The Pupil Labs software outputs a gaze angle, but it is relative to the face. Instead, we used the pupil position detected in the image to determine the absolute gaze angle with respect to the display. The relationship between the marker position and the pupil rotation angle can be derived according to Eqs. (5) and (6), where $p_x$ is the pixel pitch, the middle point of the microdisplay is the reference for the rotation angle, and $N$ is the number of pixels between the point being stared at and the center point of the input image. The virtual image was projected at a distance of 500 mm from the microdisplay in the prototype.

$$\theta_0 = \arctan(T_d / r)$$
$$w_d = N p_x$$
$$\theta = \arctan\left(\frac{w_d}{d_o + d_e + r}\right)$$

Eye-gaze tracking calibration transforms the pupil positions output by the eye tracker into pupil rotation angles. In our prototype, owing to the limitations of the microdisplay, the rotation angle, $\theta$, is smaller than $20^{\circ }$. According to our tests, the relationship between pupil coordinates and pupil rotation angles is approximately linear within a $20^{\circ }$ rotation angle. For calibration, users were asked to gaze at two black dots corresponding to different rotation angles, and the relationship between $N$ and the pupil coordinate $x$ output by the eye tracker was retrieved. Using the captured data, a linear regression equation for converting pupil coordinates to eye rotation angles was calculated (Eq. (7)). The calibration was performed every time the user removed and re-wore the display.

$$\theta = \frac{\theta_1 - \theta_2}{x_1 - x_2}x + \theta_1 - \frac{\theta_1 - \theta_2}{x_1 - x_2}x_1$$
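A minimal sketch of this two-point calibration, with hypothetical sample values; the function names and numbers are ours, not the paper’s.

```python
# Eq. (7): two fixated dots at known angles (theta1, theta2) and the pupil
# x-coordinates (x1, x2) reported by the eye tracker define a linear map from
# pupil coordinate to rotation angle.

def make_calibration(theta1: float, x1: float, theta2: float, x2: float):
    slope = (theta1 - theta2) / (x1 - x2)
    def to_angle(x: float) -> float:
        return slope * x + theta1 - slope * x1
    return to_angle

if __name__ == "__main__":
    # Hypothetical calibration samples: dots at -8 and +8 degrees.
    to_angle = make_calibration(-8.0, 0.31, 8.0, 0.69)
    print(to_angle(0.50))   # -> ~0 degrees for a centered pupil coordinate
```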

4.2.3 Microdisplay image generation

Dynamic microdisplay image generation based on eye-gaze directions aims to dynamically form the PPMR$_{50}$ to eliminate cross-talk in the LA-NED. This process uses two threads. Thread one converts the eye-gaze coordinates output by the eye tracker into an eye rotation angle. Thread two obtains the rotation angle from thread one through shared memory, generates the microdisplay image according to the eye rotation angle and eye-relief, and then renders and updates the microdisplay image texture shown on the microdisplay.

When generating microdisplay images, as shown in Fig. 4, each pixel of the microdisplay belongs to only one lens of the LA. The distribution method is based on off-axis perspective projection. According to the principle of single-lens imaging, each pixel value of the microdisplay can be calculated. When the pupil position changes, the relationships between pixels and lenses are redistributed, causing each pixel value to change. The regenerated microdisplay images are shown in Fig. 5(c) and 5(g).

Generally, the mapping relationship between the microdisplay image and the virtual image is calculated after obtaining the location information of the next state: the mapping for each pixel of the microdisplay image must be computed, and each pixel value then calculated for each rotation angle. However, this process is tedious and time consuming. The rotation range in the vertical direction is much smaller than that in the horizontal direction, and, in theory, both directions behave the same. Hence, at initialization we created a lookup table (LUT) that considers only horizontal eye rotation and stores the mapping relationships between the microdisplay image and the virtual image. When the eye-relief is 15.2 mm, the PPMR$_{50}$ is $\pm 3 ^{\circ }$, and the interval angle for generating microdisplay images should be smaller than the PPMR$_{50}$. We tested interval angles of $2 ^{\circ }$, $1 ^{\circ }$, $0.5 ^{\circ }$, and $0.25 ^{\circ }$ at different rotation speeds; the virtual image quality at the $2 ^{\circ }$ interval was the worst, while the other intervals were nearly the same. To save computing resources, an interval angle of $1 ^{\circ }$ was chosen. Once the rotation angle is obtained, the microdisplay image is quickly retrieved using the LUT.
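The sketch below illustrates the LUT strategy; the renderer passed in (`generate_image`) is a placeholder for the off-axis projection described above, not the paper’s actual function.

```python
# LUT strategy of Section 4.2.3 (illustrative): precompute the mapping/image for
# horizontal eye rotation angles at 1-degree intervals, then snap the tracked
# angle to the nearest precomputed entry at run time.

def build_lut(generate_image, angle_min=-20, angle_max=20, step=1):
    """generate_image(angle_deg) is the expensive off-axis projection renderer."""
    return {a: generate_image(a) for a in range(angle_min, angle_max + 1, step)}

def lookup(lut, angle_deg, step=1):
    """Return the precomputed entry closest to the tracked rotation angle."""
    key = int(round(angle_deg / step)) * step
    key = max(min(key, max(lut)), min(lut))   # clamp to the table's angle range
    return lut[key]

# Usage sketch: lut = build_lut(render_microdisplay_image)
#               image = lookup(lut, tracked_angle_deg)
```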

4.2.4 Retina image simulation

The retina image simulation was used to quantify human visual perception of LA-NED cross-talk: our PPMR$_{50}$ was calculated from the simulated retina images, and the scoring reference for the experiments also came from this simulation. During the simulation, rays entering through the pupil are traced to form a retina image. Each point on the retina is illuminated by many rays, and the number of rays entering the pupil can be considered infinite; the larger the number of traced rays, the closer the simulation result is to actual human perception. To balance accuracy and computation time, we set the number of rays to 2,000; the simulation results are shown in Fig. 1. These rays are refracted through the eye lens and contribute to the observed image. When the rays hit the LA, they are refracted according to the focal length of the corresponding lenslet, and their intersections with the display are computed. The final irradiance at an image point is given by the average of the radiance along all incident paths [44,45]. The degree to which the simulated image matches actual human perception is evaluated in Section 5.2.
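The following is a heavily simplified sketch of this averaging step; the ray tracing itself (refraction at the eye lens and lenslets, intersection with the display) is abstracted into a placeholder `trace_ray` callback and is not the paper’s implementation.

```python
# For one retina point: sample many entrance rays uniformly over the pupil disk,
# let trace_ray() return the display radiance carried by each ray, and average.

import numpy as np

N_RAYS = 2000            # the paper uses 2,000 rays per retina point
PUPIL_DIAMETER = 4.0     # mm

def simulate_retina_point(trace_ray, rng=None):
    rng = rng or np.random.default_rng(0)
    r = (PUPIL_DIAMETER / 2.0) * np.sqrt(rng.uniform(size=N_RAYS))   # uniform over the disk
    phi = rng.uniform(0.0, 2.0 * np.pi, size=N_RAYS)
    entry_points = np.stack([r * np.cos(phi), r * np.sin(phi)], axis=1)
    radiances = np.array([trace_ray(p) for p in entry_points])
    return radiances.mean()   # average irradiance over all incident paths
```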

5. Experiments

To examine the validity of the PPMR$_{50}$ and explore the effect of eye-gaze tracking on the cross-talk of LA-NEDs, three experiments were devised. In Experiment 1, we aimed to confirm the validity of the PPMR$_{50}$ defined in Section 3 by comparing it to a subjective evaluation. Specifically, we checked how the participants perceived images through the LA-NEDs when their gaze directions were guided to specific angles under three eye-relief conditions. In Experiment 2, we investigated how the participants perceived image changes by applying dynamic microdisplay image generation when their gaze direction followed specific angles. In Experiment 3, we explored the influence of dynamic image generation when participants were allowed to freely change gaze directions using several videos (see Visualization 3, Visualization 4, Visualization 5, Visualization 6) instead of images.

5.1 Participants

Participants were recruited through announcements and a campus mailing list at the authors’ institution. Under the oversight of the university’s ethical review committee, informed consent was obtained from each participant after the experiments were fully explained, and each participant was paid 18.5 USD. Seventeen participants (7 males and 10 females), aged 22 to 31 years (mean 25), took part in the user study, and all participated in the three experiments. They had at most 3.00 D (300-degree) myopia and at most 0.50 D (50-degree) astigmatism; one participant wore contact lenses for correction. The absolute pupil size is not directly available from the eye tracker we used because its output changes with the distance from the eye tracker to the pupil, so we measured each participant’s physical pupil size with a ruler before the experiments; all participants’ pupil sizes were close to 4 mm. In a pre-user study, we confirmed that participants’ pupil sizes did not change substantially, so we assumed a constant pupil size during the experiments. Two supporting videos are shared (see Visualization 7 and Visualization 8); the values in the green circle show the pupil size, which ranges from 2.8 mm to 3.2 mm in Visualization 7 and from 6.1 mm to 6.6 mm in Visualization 8. During the experiments, microdisplay image generation was based on the change in pupil angle relative to the initial position; therefore, participants were required to keep their heads still.

5.2 Experiment 1: comparison of PMR/PPMR$_{50}$ and actual human visual perception

5.2.1 Overview

The purpose of the first experiment was to determine whether the range in which a user does not perceive LA-NED cross-talk is consistent with PPMR$_{50}$. Three eye-reliefs at 14.2, 15.2 and 16.2 mm were used, considering the accuracy of the eye tracker.

5.2.2 Procedure

Before the experiment, participants were educated about cross-talk and the scoring process for observed image quality, as shown in Fig. 1. A Likert-type scale from "5" (best: no cross-talk observed) to "1" (worst) was applied. The participants were instructed to wear the NED prototype and match the eye-relief to the target value; a 14.2 mm eye-relief was applied first. For eye-relief adjustment we used the eye-relief ruler shown in Fig. 5(e), which rests between the eyes and the nose. The participants adjusted the horizontal position of the lens using a physical slider until they could see a clear image and were instructed to keep their heads still afterward.

The participant then observed black dots corresponding to different rotation angles on the target image shown in Fig. 5(b). There were 25 dot locations, corresponding to rotation angles from $-6^{\circ }$ to $6^{\circ }$ in $0.5^{\circ }$ steps; right rotation was defined as a positive angle and left rotation as a negative angle, with the center point of the image as the starting position. The experimenter presented each dot in random order and asked the participants to focus on it by rotating their eyes rather than moving their heads. Participants observed the images and provided scores for perceived image quality. These procedures were repeated at least twice for each angle; if the scores of the two trials for the same dot differed, a third trial was performed to ensure consistency. The whole procedure was repeated for all eye-relief conditions. If the score fell short of the maximum (i.e., five) at the $0 ^{\circ }$ dot position, the head position was assumed to have shifted, and the trials were performed again.

5.2.3 Results and discussion

From the definition of the subjective score, we assumed that cross-talk was not perceived by the participants when the score was "5." The box plot in Fig. 7 shows, for each eye-relief, the largest eye rotation angles in the positive and negative directions just before the score dropped below 5. We calculated the PMR range using Eq. (1) and the PPMR$_{50}$ using Eq. (2), and converted them to rotation angles. For eye-reliefs of 14.2, 15.2, and 16.2 mm, the eye box angles were $\pm 10.3^{\circ }$, $\pm 11.0^{\circ }$, and $\pm 11.7^{\circ }$, the PMR values were $\pm 0.7^{\circ }$, $\pm 1.5^{\circ }$, and $\pm 2.2^{\circ }$, and the PPMR$_{50}$ values were $\pm 2^{\circ }$, $\pm 3^{\circ }$, and $\pm 4^{\circ }$, respectively. The medians of the box plots were $(-2.5^{\circ }, 2^{\circ })$, $(-2.5^{\circ }, 3^{\circ })$, and $(-3.5^{\circ }, 4^{\circ })$. The PMR and PPMR$_{50}$ values are both much smaller than the eye box. The results verify that the PPMR$_{50}$ was the most consistent with the actual range in which participants did not see cross-talk for the image in Fig. 5(b). The PMR, which only requires the pupil to stay within the eye box, is a more conservative value; when human perception is accounted for, it deviates from what is actually perceived, especially as the eye-relief becomes large.


Fig. 7. Results of Experiment 1: for a pupil size of 4 mm, the boundary at which participants first observed cross-talk at different eye-reliefs is compared with the theoretical values. The box plot shows the inter-user variation (i.e., one sample per user) of each user’s threshold angle (the largest eye angle at which cross-talk is still not visible). The medians of the box plots were $(-2.5^{\circ }, 2^{\circ })$, $(-2.5^{\circ }, 3^{\circ })$, and $(-3.5^{\circ }, 4^{\circ })$, corresponding to PSNR values of (47.3, 51.8), (54.5, 49.7), and (52.3, 47.7). The PPMR$_{50}$ values for the three eye-reliefs are $\pm 2^{\circ }$, $\pm 3^{\circ }$, and $\pm 4^{\circ }$. The PSNR of the simulated image corresponding to the median of the box plot (i.e., the picture the user would see) is close to 50.0, indicating that the use of PSNR = 50 as the PPMR$_{50}$ threshold is appropriate. The difference between the eye box and the PMR is whether pupil size is considered. A transition distance equal to the pupil size (4 mm) corresponds to a rotation angle of $\pm 9.6 ^{\circ }$, converted using Eq. (4).


5.3 Experiment 2: verify the effect of the dynamic microdisplay image generation using images

5.3.1 Overview

The purpose of Experiment 2 was to investigate the effect of dynamic image generation with eye-gaze tracking on cross-talk in both static and dynamic gaze situations. Four images were used to check the effects of different appearance properties on human perception: Fig. 5(b) is image 1, and the rows of Fig. 8, from top to bottom, are images 2, 3, and 4. Under the condition without dynamic image generation, the microdisplay image was generated assuming an eye angle of $0 ^{\circ }$. The images contained seven dot markers corresponding to eye rotation angles of $-12^{\circ }$, $-8^{\circ }$, $-4^{\circ }$, $0^{\circ }$, $4^{\circ }$, $8^{\circ }$, and $12^{\circ }$ (Eqs. (5) and (6)), labeled with IDs "0" to "6". Considering the accuracy of the eye tracker ($0.6 ^{\circ }$), we adopted a 15.2 mm eye-relief (PMR of $\pm 1.5 ^{\circ }$) for Experiments 2 and 3, as discussed in Section 3.4.


Fig. 8. Simulated retina images and PSNR values for different image contents and pupil rotation angles, with a pupil size of 4 mm.


5.3.2 Procedure

Following a 5-min break after Experiment 1, the participants performed the same initial adjustment of the display as in Experiment 1. The condition without eye-gaze tracking was performed first. The experimenter announced an ID number, which the participants used to identify the corresponding dot marker on the image. They observed the image and dot while keeping their eyes still and reported the five-point image quality score. The order of the IDs was randomized. After seven trials, dynamic image generation with eye-gaze tracking was applied, and the eye-tracking calibration described in Section 4.2 was executed. The same procedure was then repeated. In addition, participants switched their gaze between pairs of dots and scored each rotation interval according to the worst momentarily perceived image quality; the interval between two adjacent markers was $4 ^{\circ }$, and the rotation interval angle ranged from $4^{\circ }$ to $24^{\circ }$. These steps were repeated for the other three images, and the order was controlled across participants.

5.3.3 Result and discussion

Figure 9 shows the results when users stared at a point with and without eye-gaze tracking. Without dynamic image generation, as the eye rotation angle increased, the difference from the angle used for microdisplay image rendering (i.e., the image generation angle) also increased; accordingly, as in Experiment 1, the subjective score indicating image quality decreased. The decrease in score was slightly more pronounced for image 1 because it has a uniform white background. The black region in Fig. 9 is the eye box. Image quality decreased once the rotation angle moved out of the PPMR$_{50}$. The results of this experiment without eye-gaze tracking verify that the eye box is not suitable for assessing the cross-talk-free region when pupil size and human visual perception are considered. Moreover, with dynamic image generation using eye-gaze tracking, the maximum score was obtained in all cases for all participants and conditions. We found that dynamic image generation worked well when participants focused their gaze on a single point.


Fig. 9. Results of Experiment 2. As the rotation angle increased, observed image quality became worse without eye-gaze tracking. When generating microdisplay images with eye-gaze tracking, the observed image quality was "5".


Figure 10 summarizes the scores when participants moved their eyes between two dots separated by a given rotation angle. When the eye rotation was less than 4$^{\circ }$, the score remained "5," indicating that participants did not observe cross-talk. However, as the rotation angle increased, the score decreased, indicating that cross-talk appeared.


Fig. 10. Results of Experiment 2: Observed image quality according to different rotation interval angles of the four images.


5.4 Experiment 3: verify the effect of the dynamic microdisplay image generation using videos

5.4.1 Overview

The aim of Experiment 3 was to investigate how dynamic image generation with eye-gaze tracking would affect subjective cross-talk while freely observing videos. We also investigated the effect of different pixel property changes on cross-talk in the four videos (see Visualization 3, Visualization 4, Visualization 5, Visualization 6).

5.4.2 Procedure

Following a 5-min break after Experiment 2, the following procedure was applied. First, the participants reviewed a questionnaire listing the evaluation items on which they should focus for each video. The items were as follows.

  • Visualization 3: Pay attention to the moving cartoon characters. Under what circumstances and where does the cross-talk appear?
  • Visualization 4: Pay attention to the waterfall. Under what circumstances and where does the cross-talk appear?
  • Visualization 5: Pay attention to the close and distant views. Under what circumstances and where does the cross-talk appear?
  • Visualization 6: Pay attention to the birds and contrasting surroundings. Under what circumstances and where does the cross-talk appear?

The participants wore the HMD, adjusted their head positions, and performed calibration as in Experiment 2. They then observed the four videos, presented one at a time in random order, without dynamic image generation, and were free to move their eyes while watching (see Visualization 3, Visualization 4, Visualization 5, Visualization 6). After removing the display, they scored the perceived quality of each video on the five-point scale and provided feedback.

Next, the participants wore the display again. After performing the initialization and the calibration, they tried the same procedures with the dynamic image generation.

5.4.3 Result and discussion

Figure 11 shows the results for perceived video quality. A Shapiro–Wilk test showed that the data in Experiment 3 violated normality (p < 0.05), so a non-parametric Wilcoxon test was used to compare the results with and without dynamic image generation based on eye-gaze tracking. Significant differences were found between the two conditions for all videos (p < 0.01).
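A minimal sketch of this analysis with made-up scores (the real per-participant data are not reproduced here); it assumes a paired Wilcoxon signed-rank test, which is the usual choice for matched conditions.

```python
# Normality check followed by a paired Wilcoxon signed-rank test on hypothetical
# five-point scores for one video, with and without dynamic image generation.

from scipy.stats import shapiro, wilcoxon

without_tracking = [2, 3, 2, 3, 2, 3, 3, 2, 2, 3, 2, 3, 2, 2, 3, 3, 2]   # 17 participants (made up)
with_tracking    = [4, 5, 4, 5, 5, 4, 5, 4, 5, 5, 4, 5, 5, 4, 5, 5, 4]

differences = [a - b for a, b in zip(with_tracking, without_tracking)]
print("Shapiro-Wilk:", shapiro(differences))            # normality of the paired differences
print("Wilcoxon:    ", wilcoxon(with_tracking, without_tracking))
```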


Fig. 11. The result of Experiment 3: Score of four videos quality without and with eye-gaze tracking. ** indicates p < 0.01.


Table 1 summarizes the participants’ feedback from the questionnaire. Most participants noticed cross-talk when rotating their eyes quickly or widely, or when blinking. Seven participants easily perceived cross-talk on the bright parts caused by sunlight reflecting off the water in video 3 (see Visualization 3); this created cross-talk areas with noticeable pixel-value differences from neighboring regions, which were easier to notice. We had anticipated that cross-talk would not be noticeable at locations where pixel values change rapidly; however, most participants perceived it on the waterfall and the birds’ bodies. Hence, rapid pixel-value changes did not reduce the visibility of cross-talk.


Table 1. Summary of the questionnaire feedback and the corresponding number of participants

6. Discussion and conclusion

NEDs are an innovation toward lightweight HMDs suitable for long-term wear. In this study, we newly defined the PPMR$_{50}$, the region within which the full extent of an image can be perceived, by considering pupil size and human visual perception. However, a narrow PPMR$_{50}$ is a critical drawback, so we designed an eye-gaze-tracking-based LA-NED prototype that dynamically generates the PPMR$_{50}$. We carried out a detailed analysis of the system and demonstrated through experiments that this method can effectively eliminate cross-talk for static gaze directions. However, rapid and wide-range pupil movements can still cause cross-talk. We propose corresponding solutions based on the system analysis, as described in the following subsections.

6.1 Summary of results

We conducted three experiments to verify the feasibility of our PPMR$_{50}$ measure and to assess the effects of dynamic image generation with an eye tracker. The results of Experiment 1 demonstrated that the PPMR$_{50}$, which considers pupil size and human visual perception in LA-NEDs, was more consistent with subjective perception than the eye box measure. We added three further images with different content, together with their rotation angles and corresponding PSNR values, in Fig. 8 and confirmed that PSNR = 50 is generally an appropriate threshold for other images as well. On this basis, we also show simulated retina images for different microdisplay resolutions (see Supplement 1); the results indicate that the PPMR$_{50}$ does not differ significantly for 1K, 2K, and 4K microdisplays. The results of Experiment 2 showed that eye-gaze tracking can effectively eliminate cross-talk for a static eye gaze with all four image types. For the dynamic case, cross-talk was effectively eliminated for small eye rotations but occurred with large or sudden eye rotations. The results of Experiment 3 show that eye-gaze tracking effectively alleviated cross-talk during free eye rotation, although it was still perceived with rapid or extensive eye movements. The questionnaire showed that cross-talk areas with noticeable pixel-value differences from neighboring regions were more likely to be observed. From the user feedback about videos 2 and 4 (see Visualization 4, Visualization 6), we found that rapid changes in pixel values did not reduce the visibility of cross-talk.

6.2 Limitations of the prototype

Generating virtual images according to eye-gaze directions can effectively eliminate cross-talk for a static eye gaze. However, owing to system latency, the dynamically generated PPMR$_{50}$ could not fully cover the pupil during rapid and extensive movements. System latency was also the key reason for the cross-talk that occurred when blinking: when users close their eyes and quickly reopen them, the delay makes cross-talk visible before the appropriate microdisplay image can be updated. Reducing this latency is key to resolving the problem. When eye-gaze tracking is performed, the total latency between the pupil movement and the display of the updated microdisplay image should be small enough to keep the pupil within the dynamically generated PPMR$_{50}$. Ideally, the latency of the system, $T_{l}$, should satisfy the following criterion:

$$T_{l} = T_{eg} + T_{gr} + T_{rd} < \frac{PPMR_{50}}{2 v_{eye}}$$

The latency of our system, $T_{l}$, was 32 ms. The system delay comprises three parts: the time needed by the eye tracker to capture the pupil position and output the data to the microdisplay image generation program, $T_{eg}$; the conversion of pupil coordinates to a rotation angle, including microdisplay image generation and rendering, $T_{gr}$; and the display refresh time, $T_{rd}$. The average value of $T_{eg}$ was 8 ms, and $T_{gr}$ was 7 ms, as explained in Section 4. The refresh rates of the monitor and microdisplay were both 60 Hz, so $T_{rd}$ was 17 ms. Here, $v_{eye}$ is the eye movement speed. With an eye-relief of 15.2 mm in our prototype, the PPMR$_{50}$ is 6$^{\circ }$ in total, so Eq. (8) is satisfied for saccadic eye movements of up to $94 ^{\circ }/s$. According to related research, the general speed of saccadic eye movement is less than $174 ^{\circ }/s$, and the peak speed can reach $720 ^{\circ }/s$ [46]. The steering speed and latency of the prototype with eye-gaze tracking can therefore cover slow eye movements with $v_{eye} < 94 ^{\circ }/s$. However, for a large saccadic eye movement, the dynamic PPMR$_{50}$ is generated more slowly than the pupil moves because of the 32 ms latency of the current prototype; the dynamic PPMR$_{50}$ is then formed away from the pupil, and cross-talk appears. It is therefore understandable that cross-talk was still not eliminated when the user’s eyes rotated quickly, as in Experiments 2 and 3. In addition, we did not consider pupil swim during pupil rotation in the current work; we leave this for future work.
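A small sketch of this latency budget; the numbers simply restate the prototype values above and are not additional measurements.

```python
# Eq. (8) rearranged: the fastest eye rotation the dynamic PPMR_50 can still cover
# is v_eye < PPMR_50 / (2 * T_l).

def max_coverable_speed(ppmr50_deg: float, latency_s: float) -> float:
    return ppmr50_deg / (2.0 * latency_s)

if __name__ == "__main__":
    t_eg, t_gr, t_rd = 0.008, 0.007, 0.017        # tracker, generation, refresh (s)
    t_l = t_eg + t_gr + t_rd                      # total latency: 32 ms
    print(max_coverable_speed(6.0, t_l))          # ~94 deg/s for the 6-degree PPMR_50
    print(max_coverable_speed(6.0, 0.0172))       # ~174 deg/s requires ~17.2 ms latency
```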

6.3 System optimization

For the general saccadic eye movement speed of $174 ^{\circ }/s$, the system latency should be smaller than 17.2 ms; for the peak speed of $720 ^{\circ }/s$ [46], it should be smaller than 4 ms. After accelerating the image generation process as described in Section 4.2.3, $T_{eg}$ and $T_{rd}$, i.e., the latency of the eye tracker and the refresh time of the microdisplay, account for most of the delay. Improving the tracking camera performance and the refresh rate of the microdisplay can therefore effectively reduce system latency. Angelopoulos et al. created a novel eye-gaze tracking camera [47] that drastically reduces sampling time, and the Sony LCX202A microdisplay [48] achieves a refresh time of 4 ms. Preliminary estimates show that the latency of our prototype could then be less than 17 ms, which could allow the current prototype to fully eliminate cross-talk at the general saccadic speed of $174 ^{\circ }/s$. Another improvement would be to extend the PPMR$_{50}$ using optimal lens specifications. Currently, the LA lens width is only 1 mm; it could be enlarged to 1.5 mm with a focal length of 3 mm [49], which would give a PPMR$_{50}$ of $8.6 ^{\circ }$ at an eye-relief of 15.2 mm. A new prototype with the novel eye-tracker camera [47] and a high-refresh-rate microdisplay would then accommodate eye rotation velocities of up to $410 ^{\circ }$/s.

In the future, we plan to focus on improving the resolution and enlarging the FOV of LA-NEDs. We hope that our work will contribute to expanding the world of VR.

Funding

Japan Society for the Promotion of Science (15H01700).

Acknowledgments

The authors would like to thank Dr. Jun Ohta and Dr. Yukiharu Uraoka for useful discussions. The authors would like to thank the Sony Semiconductor Solutions Corporation for providing us with the hardware.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results in this paper are available upon request to the corresponding author.

Supplemental document

See Supplement 1 for supporting content.

References

1. D. Lanman and D. Luebke, “Near-eye light field displays,” ACM Trans. Graph. 32(6), 1–10 (2013). [CrossRef]  

2. R. Konrad, N. Padmanaban, K. Molner, E. A. Cooper, and G. Wetzstein, “Accommodation-invariant computational near-eye displays,” ACM Trans. Graph. 36(4), 1–12 (2017). [CrossRef]  

3. S. Lee, Y. Jo, D. Yoo, J. Cho, and B. Lee, “Tomographic near-eye displays,” Nat. Commun. 10(1), 2497 (2019). [CrossRef]  

4. Q. Gao, J. Liu, X. Duan, T. Zhao, X. Li, and P. Liu, “Compact see-through 3d head-mounted display based on wavefront modulation with holographic grating filter,” Opt. Express 25(7), 8412–8424 (2017). [CrossRef]  

5. S. Liu, H. Hua, and D. Cheng, “A novel prototype for an optical see-through head-mounted display with addressable focus cues,” IEEE Trans. Visual. Comput. Graphics 16(3), 381–393 (2010). [CrossRef]  

6. A. Maimone, G. Wetzstein, M. Hirsch, D. Lanman, R. Raskar, and H. Fuchs, “Focus 3d: Compressive accommodation display,” ACM Trans. Graph. 32(5), 1–13 (2013). [CrossRef]  

7. J. Ratcliff, A. Supikov, S. Alfaro, and R. Azuma, “Thinvr: Heterogeneous microlens arrays for compact, 180 degree fov vr near-eye displays,” IEEE Trans. Visual. Comput. Graphics 26(5), 1981–1990 (2020). [CrossRef]  

8. B. Winn, D. Whitaker, D. B. Elliott, and N. J. Phillips, “Factors affecting light-adapted pupil size in normal human subjects,” Investigative ophthalmology & visual science 35(3), 1132–1137 (1994).

9. K. Bang, Y. Jo, M. Chae, and B. Lee, “Lenslet vr: Thin, flat and wide-fov virtual reality display using fresnel lens and lenslet array,” IEEE Trans. Visual. Comput. Graphics 27(5), 2545–2554 (2021). [CrossRef]  

10. C. Jang, K. Bang, S. Moon, J. Kim, S. Lee, and B. Lee, “Retinal 3d: augmented reality near-eye display via pupil-tracked light field projection on retina,” ACM Trans. Graph. 36(6), 1–13 (2017). [CrossRef]  

11. C. Jang, K. Bang, G. Li, and B. Lee, “Holographic near-eye display with expanded eye-box,” ACM Trans. Graph. 37(6), 1–14 (2018). [CrossRef]  

12. J. Jeong, J. Lee, C. Yoo, S. Moon, B. Lee, and B. Lee, “Holographically customized optical combiner for eye-box extended near-eye display,” Opt. Express 27(26), 38006–38018 (2019). [CrossRef]  

13. J. Kim, Y. Jeong, M. Stengel, K. Akşit, R. Albert, B. Boudaoud, T. Greer, J. Kim, W. Lopes, Z. Majercik, P. Shirley, J. Spjut, M. McGuire, and D. Luebke, “Foveated ar: dynamically-foveated augmented reality display,” ACM Trans. Graph. 38(4), 1–15 (2019). [CrossRef]  

14. D. Dunn, C. Tippets, K. Torell, P. Kellnhofer, K. Akşit, P. Didyk, K. Myszkowski, D. Luebke, and H. Fuchs, “Wide field of view varifocal near-eye display using see-through deformable membrane mirrors,” IEEE Trans. Visual. Comput. Graphics 23(4), 1322–1331 (2017). [CrossRef]  

15. X. Hu and H. Hua, “High-resolution optical see-through multi-focal-plane head-mounted display using freeform optics,” Opt. Express 22(11), 13896–13903 (2014). [CrossRef]  

16. K. Akşit, W. Lopes, J. Kim, P. Shirley, and D. Luebke, “Near-eye varifocal augmented reality display using see-through screens,” ACM Trans. Graph. 36(6), 1–13 (2017). [CrossRef]  

17. X. Shen, M. M. Corral, and B. Javidi, “Head tracking three-dimensional integral imaging display using smart pseudoscopic-to-orthoscopic conversion,” J. Disp. Technol. 12(6), 542–548 (2016). [CrossRef]  

18. N. Okaichi, H. Sasaki, M. Kano, J. Arai, M. Kawakita, and T. Naemura, “Design of optical viewing zone suitable for eye-tracking integral 3d display,” OSA Continuum 4(5), 1415–1429 (2021). [CrossRef]  

19. N. Okaichi, H. Sasaki, M. Kano, J. Arai, M. Kawakita, and T. Naemura, “Integral three-dimensional display system with wide viewing zone and depth range using time-division display and eye-tracking technology,” Opt. Eng. 61(01), 013103 (2022). [CrossRef]  

20. T. Shibata, “Head mounted display,” Displays 23(1-2), 57–64 (2002). [CrossRef]  

21. G. Tan, Y.-H. Lee, T. Zhan, J. Yang, S. Liu, D. Zhao, and S.-T. Wu, “Foveated imaging for near-eye displays,” Opt. Express 26(19), 25076–25085 (2018). [CrossRef]  

22. Vive, “https://www.vive.com/jp/,” Accessed: 2021-08-30.

23. Oculus, “https://www.oculus.com/,” Accessed: 2021-08-30.

24. G. Koo, D. Shin, J. C. Leong, and Y. H. Won, “Foveated high-resolution light-field system based on integral imaging for near-eye displays,” in Advances in Display Technologies X, vol. 11304 (International Society for Optics and Photonics, 2020), p. 1130417.

25. H. S. Park, R. Hoskinson, H. Abdollahi, and B. Stoeber, “Compact near-eye display system using a superlens-based microlens array magnifier,” Opt. Express 23(24), 30618–30633 (2015). [CrossRef]  

26. C. Yao, D. Cheng, T. Yang, and Y. Wang, “Design of an optical see-through light-field near-eye display using a discrete lenslet array,” Opt. Express 26(14), 18292–18301 (2018). [CrossRef]  

27. H. Huang and H. Hua, “High-performance integral-imaging-based light field augmented reality display using freeform optics,” Opt. Express 26(13), 17578–17590 (2018). [CrossRef]  

28. A. Maimone, A. Georgiou, and J. S. Kollin, “Holographic near-eye displays for virtual and augmented reality,” ACM Trans. Graph. 36(4), 1–16 (2017). [CrossRef]  

29. R. Häussler, S. Reichelt, N. Leister, E. Zschau, R. Missbach, and A. Schwerdtner, “Large real-time holographic displays: from prototypes to a consumer product,” in Stereoscopic Displays and Applications XX, vol. 7237 (International Society for Optics and Photonics, 2009), p. 72370S.

30. S. Hong, D. Shin, J.-J. Lee, and B.-G. Lee, “Viewing angle-improved 3d integral imaging display with eye tracking sensor,” J. Inf. Commun. Converg. Eng. 12(4), 208–214 (2014). [CrossRef]  

31. M. F. Deering, “The limits of human vision,” in 2nd International Immersive Projection Technology Workshop, vol. 2 (1998), p. 1.

32. R. Rosenholtz, “Capacity limits and how the visual system copes with them,” Electron. Imaging 29(14), 8–23 (2017). [CrossRef]  

33. R. Blake and M. Shiffrar, “Perception of human motion,” Annu. Rev. Psychol. 58(1), 47–73 (2007). [CrossRef]  

34. S. Lee, C. Jang, S. Moon, J. Cho, and B. Lee, “Additive light field displays: realization of augmented reality with holographic optical elements,” ACM Trans. Graph. 35(4), 1–13 (2016). [CrossRef]  

35. G. Kuo, L. Waller, R. Ng, and A. Maimone, “High resolution étendue expansion for holographic displays,” ACM Trans. Graph. 39(4), 66 (2020). [CrossRef]  

36. D. Salomon, “Data compression: The complete reference (by d. salomon; 2007) [book review],” IEEE Signal Process. Mag. 25(2), 147–149 (2008). [CrossRef]  

37. S. Jörg, A. Duchowski, K. Krejtz, and A. Niedzielska, “Perceptual adjustment of eyeball rotation and pupil size jitter for virtual characters,” ACM Trans. Appl. Percept. 15(4), 1–13 (2018). [CrossRef]  

38. M. Kassner, W. Patera, and A. Bulling, “Pupil: an open source platform for pervasive eye tracking and mobile gaze-based interaction,” in Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing: Adjunct publication, (2014), pp. 1151–1160.

39. Sony HMZ-T1, “https://www.sony.jp/hmd/products/HMZ-T1/spec.html,” Accessed 2021-08-15.

40. Sony HMZ-T1 personal 3D viewer review, “https://www.cnet.com/reviews/sony-hmz-t1-personal-3d-viewer-review/,” Accessed 2021-08-15.

41. Pupil Labs, “https://pupil-labs.com/products/,” Accessed 2021-08-15.

42. Pupil Labs GitHub, “https://github.com/pupil-labs/pupil,” Accessed 2021-08-15.

43. G. L. Dick, B. T. Smith, and P. L. Spanos, “Axial length and radius of rotation of the eye,” Clin. Exp. Optom. 73(2), 43–50 (1990). [CrossRef]  

44. A. Plopski, L. Arno, W. Kashima, T. Taketomi, C. Sandor, and H. Kato, “Eye-gaze tracking in near-eye head-mounted displays,” in 28th Australian Conference on Human-Computer Interaction (OzCHI 2016), (2016).

45. R. L. Cook, T. Porter, and L. Carpenter, “Distributed ray tracing,” in Proceedings of the 11th annual conference on Computer graphics and interactive techniques, (1984), pp. 137–145.

46. R. A. Abrams, D. E. Meyer, and S. Kornblum, “Speed and accuracy of saccadic eye movements: characteristics of impulse variability in the oculomotor system,” J. Exp. Psychol. Hum. Percept. Perform. 15(3), 529–543 (1989). [CrossRef]  

47. A. N. Angelopoulos, J. N. Martel, A. P. Kohli, J. Conradt, and G. Wetzstein, “Event based, near eye gaze tracking beyond 10,000 Hz,” arXiv preprint arXiv:2004.03577 (2020).

48. Sony LCX202A, “https://www.sony-semicon.co.jp/e/products/microdisplay/lcd/product.html,” Accessed 2021-11-28.

49. Lenslet array, “http://www.okotech.com/microlens-arrays,” Accessed 2021-11-28.

Supplementary Material (9)

Supplement 1: Supporting content.
Visualization 1: Microdisplay video generation process with an eye tracker, based on pupil position.
Visualization 2: The prototype presented in this paper.
Visualization 3: Microdisplay video generation process for the cartoon used in Experiment 3.
Visualization 4: Microdisplay video generation process for the waterfall video used in Experiment 3.
Visualization 5: Microdisplay video generation process for the natural landscape video used in Experiment 3.
Visualization 6: Microdisplay video generation process for the bird video used in Experiment 3.
Visualization 7: Pupil size measurement process.
Visualization 8: Pupil size measurement process.



Figures (11)

Fig. 1. Simulations of retina images for a lens width of 1 mm and a focal length of 3.3 mm; the microdisplay size and resolution are 15.36 mm $\times$ 8.64 mm and 1280 $\times$ 720, with the pupil treated as a point and as an area. The black rectangle indicates a 4.5 mm eye box. The first row shows simulated retina images with the pupil treated as a point at different transition distances (Td) from the center of the eye box. When the pupil moves inside the eye box and Td is smaller than 2.25 mm from the eye box center, a sharp image can be seen; the observed images degrade when the pupil moves out of the eye box. The bottom two rows show simulated retina images for a 4 mm pupil. For Td larger than 0.6 mm, cross-talk has already appeared and gradually becomes clearer as Td from the eye box center increases. The value below each image is the image-quality criterion used in the experiments.
Fig. 2. Color map of PMR. Values in the black region are zero, indicating that there is no space for the pupil to move; pupils located in this region are not suitable. When the pupil size is 4 mm, the minimum eye-relief is 13.2 mm.
Fig. 3. PSNR values for an eye-relief of 15.2 mm and a pupil size of 4 mm.
Fig. 4. Optical structure layout of an NED using an LA with $N_l$ lenses. Each lens has a focal length $f$ and width $w_l$, and is positioned a distance $d_l$ in front of a microdisplay of width $w_s$. The virtual image distance from the virtual image plane to the LA is $d_0$.
Fig. 5. (a) Monocular VR glasses prototype. The prototype comprises a microdisplay in front of an LA with an eye tracker fixed underneath, mounted on a 3D-printed glasses frame. The microdisplay size was 15.36 $\times$ 8.64 mm, and the resolution was 1,280 $\times$ 720. The driver comes from a Sony HMZ-T1 personal media viewer [39,40] whose magnifying eyepieces were removed. (b) Input image. (c) Microdisplay image generated for an eye-gaze rotation angle of 0 $^{\circ }$. (e) Optical part of the prototype, with a detachable ruler on the right side to measure eye-relief. (f) Observed image taken with a Huawei P40 smartphone with an aperture of 2.4 mm and a focal length of 35 mm. (g) Microdisplay image generated for an eye-gaze rotation angle of 12 $^{\circ }$.
Fig. 6. Software structure, including calibrations, microdisplay image generation, and retina image simulation.
Fig. 7. Results of Experiment 1: for a pupil size of 4 mm, the boundary at which participants first observed cross-talk at different eye-reliefs, compared with theoretical values. The box plot shows the inter-user variation (one sample per user) of each user's threshold angle, i.e., the eye rotation angle at which cross-talk first becomes barely visible. The medians of the box plots were $(-2.5^{\circ }, 2^{\circ })$, $(-2.5^{\circ }, 3^{\circ })$, and $(-3.5^{\circ }, 4^{\circ })$, corresponding to PSNR values of (47.3, 51.8), (54.5, 49.7), and (52.3, 47.7). The PPMR$_{50}$s of the three eye-reliefs are $\pm 2^{\circ }$, $\pm 3^{\circ }$, and $\pm 4^{\circ }$. The PSNR of the image corresponding to the median of the box plot (the simulated image of what the user would see) is thus close to 50.0, indicating that PSNR = 50 is an appropriate PPMR threshold. The difference between the eye box and the PMR is whether the pupil size is taken into account. A transition distance equal to the pupil size (4 mm) corresponds to a rotation angle of $\pm 9.6 ^{\circ }$, converted using Eq. (4).
Fig. 8. Simulated retina images and PSNR values for different contents and pupil rotation angles, with a pupil size of 4 mm.
Fig. 9. Results of Experiment 2. As the rotation angle increased, the observed image quality degraded without eye-gaze tracking. When microdisplay images were generated with eye-gaze tracking, the observed image quality remained "5".
Fig. 10. Results of Experiment 2: observed image quality for different rotation interval angles of the four images.
Fig. 11. Results of Experiment 3: quality scores of the four videos without and with eye-gaze tracking. ** indicates p < 0.01.

Tables (1)

Table 1. Summary of questionnaire feedback and the corresponding number of participants

Equations (8)


(1) $w_{PMR} = \max\!\left( \dfrac{d_e\, w_l}{f} - w_p,\; 0 \right)$

(2) $w_{PPMR_{50}} = w_{PMR} + 2\, w_{PSNR_{50}}$

(3) $w_{PPMR_{50}} = \dfrac{d_e\, w_l}{f} - w_p + 2\, w_{PSNR_{50}}$

(4) $\theta_0 = \arctan\!\left( T_d / r \right)$

(5) $w_d = N_p\, x$

(6) $\theta = \arctan\!\left( \dfrac{d_0 + d_e + r}{w_d} \right)$

(7) $\theta = \dfrac{\theta_1 - \theta_2}{x_1 - x_2}\, x + \theta_1 - \dfrac{\theta_1 - \theta_2}{x_1 - x_2}\, x_1$

(8) $T_l = T_{eg} + T_{gr} + T_{rd} < \dfrac{PPMR_{50}}{2\, v_{eye}}$
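
As an illustrative numerical sketch (not from the original paper), the following Python snippet evaluates Eqs. (1), (2), and (4) for the prototype parameters used in Fig. 1 (lens width 1 mm, focal length 3.3 mm, eye-relief 15.2 mm, pupil size 4 mm). The values of the eyeball rotation radius `r` and the PSNR-50 tolerance `w_psnr50` are placeholders for illustration, not the measured values from the paper.

```python
import math

def pmr_width(eye_relief_mm, lens_width_mm, focal_length_mm, pupil_mm):
    """Eq. (1): pupil movable region = eye-box width minus pupil diameter, clipped at 0."""
    eye_box = eye_relief_mm * lens_width_mm / focal_length_mm
    return max(eye_box - pupil_mm, 0.0)

def ppmr50_width(pmr_mm, w_psnr50_mm):
    """Eq. (2): practical pupil movable region, widened by the PSNR-50 tolerance on both sides."""
    return pmr_mm + 2.0 * w_psnr50_mm

def rotation_angle_deg(transition_distance_mm, rotation_radius_mm):
    """Eq. (4): convert a lateral pupil transition distance into an eyeball rotation angle."""
    return math.degrees(math.atan(transition_distance_mm / rotation_radius_mm))

if __name__ == "__main__":
    # Prototype parameters from Fig. 1; a ~4.6 mm eye box minus a 4 mm pupil leaves ~0.6 mm of PMR.
    pmr = pmr_width(eye_relief_mm=15.2, lens_width_mm=1.0, focal_length_mm=3.3, pupil_mm=4.0)
    print(f"PMR width: {pmr:.2f} mm")
    # Placeholder tolerance and rotation radius, for illustration only.
    print(f"PPMR50 width (w_psnr50 = 0.4 mm assumed): {ppmr50_width(pmr, 0.4):.2f} mm")
    print(f"Rotation angle for Td = 1 mm, r = 13.5 mm assumed: {rotation_angle_deg(1.0, 13.5):.1f} deg")
```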