A 10.7 Gb/s electronic predistortion transmitter using FPGA-based real time 2 Samples/bit digital signal processing based on 55-tap finite impulse response filters is described. Transmission over 1200 km of standard single-mode fiber without optical dispersion compensation is demonstrated with an OSNR penalty of only 2.5 dB, compared with back-to-back operation.
©2008 Optical Society of America
There has been considerable recent interest in high-speed electronic digital signal processing (DSP) for overcoming transmission impairments in long-haul optical transmission. Several techniques including maximum likelihood sequence estimation (MLSE) , electronic predistortion (EPD)  and coherent phase and polarisation diversity reception  have been demonstrated using integrated circuit implementations. While these offer optimum performance, the cost and inflexibility of custom integrated circuit development are prohibitive for experimental research. Off-line DSP has been used to demonstrate proof of principle. For instance, EPD waveforms can be calculated off-line and loaded into arbitrary waveform generators . For receiver-based techniques, waveforms can be captured on high-speed digital sampling scopes and processed off-line [5, 6]. However, these approaches do not prove real-time viability. An alternative is to implement real-time DSP using field programmable gate arrays (FPGAs), which are low-cost and reprogrammable. In previous work , it was shown that DSP for 10 Gb/s EPD could be implemented on the latest FPGAs. 10.7 Gb/s EPD modulator drive waveforms for transmission distances of up to 600 km of standard single mode fibre (SMF) were generated using real-time DSP employing RAM-based look-up tables (LUTs), although at that time only a single waveform generator had been constructed. Because of the limited serial output bit rate of the FPGAs available at that time, only 1 sample per bit (1 Sa/b) DSP was employed.
This paper describes a fully assembled 10.7 Gb/s FPGA-based EPD transmitter with real-time 2 Sa/b DSP. Because the resources required to implement LUT compensation for chromatic dispersion are known to scale exponentially with transmission distance , the transmitter uses finite impulse response (FIR) filters instead. It is shown that FIR filters are particularly attractive for EPD: not only does their resource use scale linearly with chromatic dispersion, but the input words are only 1 bit wide, making the implementation very efficient. For these reasons, it was possible to construct a 55-tap FIR filter on each of two FPGAs. Preliminary results from this system demonstrating 800 km transmission were published in . The purpose of this paper is to give a more detailed description of the transmitter and the experimental technique, particularly the FIR filters, the output scaling and the method of tap weight calculation (sections 2 to 5), and to report improved transmission results, obtained after further optimization of the system, for distances of up to 1200 km (section 6). In addition, the potential for improved performance (section 7) and other applications of the transmitter and of real-time FPGA processing in experimental optical communications (section 8) are discussed.
2. Transmitter design
A top-level diagram of the transmitter is shown in Fig. 1(a). The 10.7 Gb/s EPD signal was generated using a Cartesian Mach-Zehnder modulator (MZM) with an integrated laser emitting at 1554.94 nm. The MZM was driven by two voltages representing the real and imaginary parts of the required transmitted field. The real and imaginary waveforms were calculated in real time on separate Xilinx Virtex-4 FPGAs (4VFX100), and each FPGA output its waveform as sixteen 5.35 Gb/s data streams using its multigigabit transceivers (MGTs). Each drive signal was then generated using a 21.4 GSa/s, 4-bit resolution digital-to-analog converter (DAC). The signals were amplified so that the peak-to-peak amplitudes at the MZM inputs were approximately 1.5 V, compared with the MZM Vπ of 7 V. This kept the MZM operating on an approximately linear part of its characteristic, so no linearization function was applied in DSP. The amplifier output bias, amplifier gain and MZM phase control were adjusted to optimize transmission performance.
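The linearity argument can be checked roughly. The sketch below assumes that each arm of the Cartesian modulator behaves as an ideal MZM biased at null, with output field proportional to sin(πV/(2Vπ)). That transfer function and the helper names are illustrative assumptions, not a model of the actual device. The sketch compares the field at the peak drive voltage with a purely linear response.

```python
import math


def mzm_field(v, v_pi):
    """Normalised output field of an ideal MZM biased at null (assumed model)."""
    return math.sin(math.pi * v / (2.0 * v_pi))


def worst_case_compression(v_pp, v_pi):
    """Relative shortfall of the field at the peak drive versus a linear response."""
    v_peak = v_pp / 2.0
    linear = math.pi * v_peak / (2.0 * v_pi)   # small-signal (linear) field
    return 1.0 - mzm_field(v_peak, v_pi) / linear


# 1.5 V peak-to-peak drive against a 7 V V_pi, as quoted in the text
print(f"{100 * worst_case_compression(1.5, 7.0):.2f} %")   # prints 0.47 %
```

Under this idealised model, the compression at the drive extremes is below 0.5%. This is consistent with omitting a linearization stage at 4-bit DAC resolution.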
Because integrated DACs with suitable sample rates were not available when the hardware was constructed, the DAC used in this work consisted of connectorized discrete components coupled with short semi-flexible cables, as shown in Fig. 1(b). First, the sixteen 5.35 Gb/s FPGA outputs were time-aligned using variable phase shifters and then 4:1 time-division multiplexed into four 21.4 Gb/s streams. The 4-bit resolution analog output was generated using attenuators and a passive 4-way coupler. One limitation on system performance was the AC coupling of the discrete 4:1 multiplexer inputs. This limited the maximum run of identical bits that each FPGA output could carry without errors occurring.
The system was synchronised using a 10.7 GHz crystal oscillator. Its output was distributed to the eight multiplexers and divided by 64 to provide 167.2 MHz clocks for the FPGAs. The De Bruijn sequence to be transmitted was stored in a read-only memory (ROM) on each FPGA. A pattern synchronisation pulse was sent between the two FPGAs in each pattern cycle to ensure pattern alignment. The 32 FPGA outputs were time-aligned as described in . The issue of reset-dependent skew , which affects some families of FPGA including the Virtex-4, was overcome using a simple manual procedure each time the FPGAs were switched on.
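The clock and data-rate figures quoted for the transmitter are mutually consistent. The short calculation below only restates numbers from the text, and the variable names are illustrative.

```python
# Clock and data-rate budget of the transmitter, using figures quoted in the text.
REF_CLOCK = 10.7e9        # crystal oscillator frequency, Hz
BIT_RATE = 10.7e9         # line rate, b/s
SAMPLES_PER_BIT = 2       # 2 Sa/b DSP
DAC_BITS = 4              # DAC resolution
MUX_RATIO = 4             # 4:1 multiplexers
CLOCK_DIVIDER = 64        # reference divided by 64 for the FPGAs
N_FPGAS = 2               # one FPGA each for V_real and V_imag

fpga_clock = REF_CLOCK / CLOCK_DIVIDER                    # ~167.2 MHz
bits_per_cycle = BIT_RATE / fpga_clock                    # pattern bits per clock
samples_per_cycle = bits_per_cycle * SAMPLES_PER_BIT      # DAC samples per clock
mgts_per_fpga = DAC_BITS * MUX_RATIO                      # transceiver outputs per FPGA
word_width = samples_per_cycle * DAC_BITS / mgts_per_fpga # bits per MGT per clock
mgt_rate = word_width * fpga_clock                        # serial rate per MGT, b/s
dac_rate = mgt_rate * MUX_RATIO                           # DAC sample rate, Sa/s
total_outputs = N_FPGAS * mgts_per_fpga                   # FPGA outputs to align

print(f"FPGA clock {fpga_clock / 1e6:.1f} MHz, {bits_per_cycle:.0f} bits/cycle, "
      f"{samples_per_cycle:.0f} samples/cycle")
print(f"{mgts_per_fpga} MGTs x {word_width:.0f}-bit words at {mgt_rate / 1e9:.2f} Gb/s, "
      f"DAC at {dac_rate / 1e9:.1f} GSa/s, {total_outputs} outputs in total")
```

The 64 bits per clock cycle and the sixteen 32-bit MGT words match the DSP architecture described in section 4.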
3. DAC characterization
Following the method described in , the output voltage non-linearity and noise of the DACs were measured using a sawtooth test waveform. An example of the sawtooth for the Vreal DAC is shown in Fig. 2. The worst-case integral non-linearity (INL) and differential non-linearity (DNL) were both less than 0.25 of the least significant bit (LSB) for both DACs. The non-linearity could be reduced further by controlling the power supply voltage of each multiplexer separately to fine-tune the output levels. The standard deviation of the noise was measured to be 0.13 ± 0.01 LSB across all output levels for both DACs. The total uncertainty of the DAC output due to noise and non-linearity was therefore well below 1 LSB. However, as indicated in Fig. 2, the DACs also suffer from other dynamic imperfections, such as switching transients, droop and hysteresis, which are difficult to eliminate in the current implementation. The impact of these imperfections on the performance of the EPD transmitter is discussed in section 7.
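The static figures of merit above can be derived from repeated measurements of each output level, for example from the flat steps of the sawtooth. The sketch below uses the common endpoint-fit definitions of INL and DNL. That choice is an assumption: the cited method may use a best-fit line instead. The function name is illustrative.

```python
import numpy as np


def dac_static_metrics(samples):
    """INL, DNL and noise (all in LSB) from per-code measurements.

    samples[k] holds repeated voltage readings for DAC code k.
    Assumption: endpoint fit, i.e. the ideal transfer line passes through
    the first and last mean levels.
    """
    samples = np.asarray(samples, dtype=float)
    levels = samples.mean(axis=1)                   # mean output per code
    n_codes = len(levels)
    lsb = (levels[-1] - levels[0]) / (n_codes - 1)  # endpoint-fit LSB size
    ideal = levels[0] + lsb * np.arange(n_codes)    # straight line through endpoints
    inl = (levels - ideal) / lsb                    # per-code INL
    dnl = np.diff(levels) / lsb - 1.0               # per-step DNL
    noise = samples.std(axis=1) / lsb               # per-code noise std
    return inl, dnl, noise


# Synthetic 4-bit example: ideal 0.1 V steps, with code 5 displaced by +0.02 V
levels = np.arange(16) * 0.1
levels[5] += 0.02
readings = levels[:, None] + np.array([-0.013, 0.013])
inl, dnl, noise = dac_static_metrics(readings)
print(np.abs(inl).max(), np.abs(dnl).max(), noise.mean())
```

Under the endpoint-fit definition, a single displaced level appears once in the INL and twice, with opposite signs, in the DNL.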
4. DSP architecture
A block diagram of the main functions implemented on each FPGA board to realise electronic predistortion is shown in Fig. 3. Both FPGAs had identical functions except for the pattern synchronisation circuit: the Vreal FPGA output a pulse at the start of each pattern cycle, while the Vimag FPGA detected this pulse and reset its counter to ensure pattern alignment. All the main functions on the FPGAs used the 167.2 MHz clock. In each clock cycle, the pattern memory output 64 bits of the sequence to the DSP. The DSP consisted of 128 parallel 55-tap FIR filters (all using identical tap weights), whose outputs were scaled to 4-bit resolution. Hence, in each clock cycle, the DSP output 128 (64 bits × 2 Sa/b) 4-bit words, each representing one DAC sample. These were re-ordered into sixteen 32-bit words, delayed as required for alignment, and output by the FPGA transceivers at 5.35 Gb/s. The 4-bit FIR filter tap weights could be updated during operation.
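The text does not say exactly how the 128 samples are re-ordered into sixteen 32-bit words. One plausible bit-sliced mapping, consistent with four 4:1 multiplexers (one per DAC bit), is sketched below. The lane assignment, the ideal binary-weighted combiner and the helper names are all hypothetical, and the alignment delays are omitted.

```python
def to_mgt_words(samples, dac_bits=4, mux_ratio=4):
    """Split one clock cycle of DAC samples into per-MGT bit lanes (hypothetical).

    Lane (b, k) carries bit b of samples k, k + mux_ratio, k + 2*mux_ratio, ...
    A 4:1 multiplexer fed by lanes (b, 0..3) then re-creates the full-rate
    stream for DAC bit b.
    """
    lanes = {}
    for b in range(dac_bits):
        for k in range(mux_ratio):
            lanes[(b, k)] = [(s >> b) & 1 for s in samples[k::mux_ratio]]
    return lanes


def mux_and_combine(lanes, n_samples, mux_ratio=4):
    """Ideal model of the 4:1 multiplexers plus a binary-weighted combiner."""
    out = [0] * n_samples
    for (b, k), lane in lanes.items():
        for i, bit in enumerate(lane):
            out[k + i * mux_ratio] += bit << b   # interleave, weight bit b by 2**b
    return out


# One clock cycle: 128 4-bit samples (64 bits at 2 Sa/b)
samples = [(7 * i + 3) % 16 for i in range(128)]
lanes = to_mgt_words(samples)
print(len(lanes), {len(lane) for lane in lanes.values()})   # 16 lanes of 32 bits
```

With this mapping, each transceiver carries a fixed bit position of every fourth sample. The serial output is then simply the natural sample order after multiplexing.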
Figure 4 shows the detail of a single 55-tap FIR filter with output scaling, implemented in the time domain. DSP theory shows that FIR filters with more than approximately 20 taps are generally more efficient to implement in the frequency domain, because the number of multiplications is reduced . However, in the specific case of EPD, the inputs to the filter are single-bit words, so the time-domain multiplications are very simple. The initial 1-bit adders and the multipliers can conveniently be implemented as multiplexers on the FPGA. The inputs to the filter were 55 1-bit words consisting of the bit sequence sampled at 2 Sa/b, denoted by d(0…54) in Fig. 4. The symmetry of the chromatic dispersion impulse response was exploited to reduce logic resource requirements: the 4-bit FIR filter tap weights, denoted c(0…27), represent one half of the symmetrical impulse response. Quantizing the tap weights to 4 bits reduces the FPGA programmable logic resources required and was found by simulation to produce negligible transmission penalties.
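The filter structure described above can be modelled in a few lines. Each pair of inputs sharing a weight is summed by a 1-bit adder (result 0, 1 or 2). The "multiplication" then only has to select 0, c or 2c, which a multiplexer does. This is a behavioural sketch, not the authors' HDL. The assumption that c[27] is the centre tap and the function names are illustrative.

```python
import random


def symmetric_fir_1bit(d, c):
    """One output of a (2M+1)-tap symmetric FIR filter with 1-bit inputs.

    d: 2M+1 input bits (0 or 1), i.e. d(0...54) for the 55-tap filter.
    c: M+1 signed tap weights c(0...M); c[M] is assumed to be the centre tap.
    """
    n = len(d)
    m = n // 2
    assert n % 2 == 1 and len(c) == m + 1
    acc = 0
    for k in range(m):
        pair = d[k] + d[n - 1 - k]        # 1-bit adder: 0, 1 or 2
        acc += (0, c[k], 2 * c[k])[pair]  # "multiplier" realised as a 3-way multiplexer
    acc += c[m] if d[m] else 0            # centre tap: 2-way multiplexer
    return acc


def direct_fir(d, c):
    """Reference: expand c to the full symmetric response and take the dot product."""
    m = len(c) - 1
    taps = list(c[:m]) + [c[m]] + list(reversed(c[:m]))
    return sum(t * x for t, x in zip(taps, d))


random.seed(1)
c = [random.randint(-7, 7) for _ in range(28)]   # example 4-bit weights
d = [random.randint(0, 1) for _ in range(55)]    # example input window at 2 Sa/b
print(symmetric_fir_1bit(d, c), direct_fir(d, c))
```

The pre-addition halves the number of weight selections from 55 to 28. With weights limited to ±7, the worst-case sum magnitude is 55 × 7 = 385, which fits in the 10-bit signed output.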
The final sum output y of each 55-tap FIR was a 10-bit word. This was scaled to 4 bits by carrying out the following arithmetic:

$$y_{\mathrm{scaled}} = a\,(y - b)$$

where b is the smallest expected value of the FIR sum and a is chosen such that the maximum expected value of y_scaled is the binary number “1111111111111”. The 4-bit DAC output was then taken to be the four most significant bits of y_scaled. The values of a and b must be chosen with care. One approach is to base them on the maximum and minimum possible outputs of the filter. However, in a large filter these extreme values occur only for very specific input bit patterns and are therefore very unlikely. With this choice, most output values fall within a narrow part of the range, the effective resolution of the DAC is reduced and poor performance is likely. Conversely, if too narrow a range is used, the FIR output will occasionally overflow the 4-bit output range, leading to occasional systematic errors in the output waveform. This issue of FIR scaling is well known in DSP theory; see, for example, . In this work, the maximum and minimum values of the FIR filter output were calculated using a 2⁹ De Bruijn sequence, and an 8% margin was added before the values of a and b were calculated. This margin slightly reduces the effective resolution of the DAC.
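The scaling procedure can be sketched as follows. A 2⁹ De Bruijn pattern is run through the filter at 2 Sa/b, the observed output range is widened by 8%, a and b are derived from it, and the four most significant bits of a 13-bit y_scaled are kept. Several details are assumptions, not taken from the paper: the margin is split equally between the two ends of the range, out-of-range values saturate rather than wrap, and the example tap weights are arbitrary. The function names are hypothetical.

```python
import numpy as np


def de_bruijn_binary(order):
    """Binary De Bruijn sequence of length 2**order (standard recursive construction)."""
    a = [0] * (2 * order)
    seq = []

    def db(t, p):
        if t > order:
            if order % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def fir_outputs(bits, taps, samples_per_bit=2):
    """FIR sums over one period of a cyclic NRZ pattern sampled at 2 Sa/b.

    The taps are symmetric, so correlation and convolution give the same result.
    """
    x = np.repeat(np.asarray(bits), samples_per_bit)
    n = len(taps)
    padded = np.concatenate([x, x[:n - 1]])                 # cyclic extension
    windows = np.lib.stride_tricks.sliding_window_view(padded, n)
    return windows @ np.asarray(taps)


def scaling_params(y, margin=0.08, width=13):
    """a, b such that a*(y - b) spans [0, 2**width - 1] over the widened range.

    Assumption: the margin widens the observed span, split equally at both ends.
    """
    lo, hi = float(np.min(y)), float(np.max(y))
    pad = 0.5 * margin * (hi - lo)
    b = lo - pad
    a = (2 ** width - 1) / ((hi + pad) - b)
    return a, b


def to_dac_codes(y, a, b, width=13, dac_bits=4):
    """Scale, saturate (assumption) and keep the dac_bits most significant bits."""
    scaled = np.floor(a * (np.asarray(y) - b)).astype(int)
    scaled = np.clip(scaled, 0, 2 ** width - 1)
    return scaled >> (width - dac_bits)


half = [((5 * k) % 15) - 7 for k in range(28)]              # arbitrary 4-bit weights
taps = np.array(half[:27] + [half[27]] + half[26::-1])      # 55 symmetric taps
y = fir_outputs(de_bruijn_binary(9), taps)
a, b = scaling_params(y)
codes = to_dac_codes(y, a, b)
print(len(y), codes.min(), codes.max())
```

A larger margin makes overflow on longer patterns less likely. The cost is that the code range is used less fully, which is the trade-off described above.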
To achieve the required clock rate, the FIR and scaling operations were pipelined, with signals taking four clock cycles to pass from the FIR input register to the delay circuit input. The FPGA design used 71% of the programmable logic slices and only 6% of the RAM on each FPGA, allowing other DSP such as compensation for self-phase modulation to be implemented at a later stage.
5. Transmission experiments
Transmission experiments were carried out using a recirculating loop, as shown in Fig. 5. The loop contained 80 km of standard SMF only (no dispersion-compensating fibre) and optical amplifiers whose gain was set equal to the loss of the fibre and the other loop components. A loop-synchronised polarisation controller (LSPC) was used to minimise the detrimental impact of polarisation effects in the recirculating loop, which made the results more consistent and repeatable. At the receiver, noise loading was used to control the received optical signal-to-noise ratio (OSNR). A bandpass optical filter of 0.4 nm FWHM removed out-of-band optical noise before the avalanche photodiode (APD) detector. A receiver with clock and data recovery, together with an Anritsu MP1764A bit error rate test set (BERT), was used to obtain real-time bit error rate (BER) measurements. However, unreliable locking of the clock recovery, with both phase-locked-loop and high-Q-filter-based circuits, reduced receiver sensitivity and caused occasional loss of synchronisation in individual recirculations. This is a general problem when EPD signals are used in a recirculating loop: the clock recovery circuit only receives the signal for which the predistortion was designed in the final recirculation, so it must lock rapidly to avoid timing errors. The unreliable clock recovery also prevented the capture of eye diagrams on an analog sampling scope. Instead, 1000 bits were captured with a 40 GSa/s digital sampling scope and processed off-line to retime the data and generate eye diagrams.
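The tap weights for each transmission distance were calculated numerically by first finding the impulse response required to reverse the effect of chromatic dispersion:

$$h(t) = \mathcal{F}^{-1}\left\{ \exp\left( -j\,\frac{\pi D L \lambda^{2} f^{2}}{c} \right) \right\}$$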
where D is the fibre dispersion in ps/(nm·km), L is the transmission distance, λ is the wavelength, f is the frequency variable, F⁻¹(·) is the inverse Fourier transform operator and c is the speed of light in vacuum. This response was sampled at intervals of half the bit period. To avoid aliasing of the digital response and to simplify the FIR implementation, the sampled response was truncated to 2M_fibre + 1 taps (rounded up to the nearest odd integer), where the fibre memory M_fibre is given approximately by:
where r_b is the bit rate. Hence 19, 37 and 55 taps were used for transmission distances of 400, 800 and 1200 km respectively. Finally, the tap weights were scaled and quantized to 4-bit resolution, taking integer values from −7 to +7. For filter designs requiring fewer than the 55 taps implemented in the FPGA hardware, the unused taps were set to zero. Figure 6 shows an example of the two 55-tap FIR filters used to calculate the real and imaginary parts of the modulator drive signals for a transmission distance of 1200 km.
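The tap weight procedure can be sketched numerically. The sketch evaluates the inverse Fourier transform of the compensating response at multiples of half the bit period, keeps the central taps, scales the real and imaginary parts to integers in −7…+7, and zero-pads shorter designs to 55 taps. Several parameters are assumptions not stated in the text: D = 17 ps/(nm·km), the sign convention of the phase term, band-limiting the transform to ±r_b (half the 2 Sa/b sample rate), and a single common scale factor for the real and imaginary taps. The function names are hypothetical.

```python
import numpy as np

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def epd_taps(length_km, n_taps, d_ps_nm_km=17.0, wavelength_nm=1554.94,
             bit_rate=10.7e9, n_freq=4096, max_level=7):
    """Quantized real/imaginary EPD tap weights (sketch under stated assumptions)."""
    assert n_taps % 2 == 1, "symmetric filter needs an odd number of taps"
    d = d_ps_nm_km * 1e-6                 # ps/(nm km) -> s/m^2
    length = length_km * 1e3              # km -> m
    lam = wavelength_nm * 1e-9            # nm -> m
    ts = 1.0 / (2.0 * bit_rate)           # sample period at 2 Sa/b
    fs = 1.0 / ts

    alpha = np.pi * d * length * lam ** 2 / C_VACUUM
    f = (np.arange(n_freq) + 0.5) / n_freq * fs - fs / 2    # grid symmetric about 0
    h_f = np.exp(-1j * alpha * f ** 2)                      # assumed sign convention

    # Numerical inverse transform at t = n*ts for n = 0..M; the constant factor
    # is dropped because the taps are rescaled below.
    m = n_taps // 2
    n = np.arange(m + 1)
    half = (np.exp(2j * np.pi * np.outer(n, f) * ts) * h_f).mean(axis=1)
    h = np.concatenate([half[:0:-1], half])                 # even response: h(-t) = h(t)

    scale = max_level / np.max(np.abs(np.concatenate([h.real, h.imag])))
    c_real = np.round(h.real * scale).astype(int)
    c_imag = np.round(h.imag * scale).astype(int)
    return c_real, c_imag


def pad_taps(taps, total=55):
    """Centre a shorter design in the 55-tap hardware filter, zeroing unused taps."""
    extra = (total - len(taps)) // 2
    return np.pad(taps, (extra, extra))


c_real, c_imag = epd_taps(length_km=1200, n_taps=55)
print(c_real)
print(c_imag)
print(pad_taps(epd_taps(length_km=400, n_taps=19)[0]))
```

Because the dispersion response is even in frequency, only half of the taps need to be computed. The mirrored result has the symmetry that the hardware filter exploits.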
It should be noted that the method of tap weight calculation used in this work is not optimal. Improved results could be achieved by modifying the filter response to compensate for the measured transmitter and receiver responses  and to optimize the pulse shape for greater sensitivity.
6. Transmission results
Figure 7 shows plots of BER against OSNR (measured in a 0.1 nm bandwidth) for back-to-back operation and for 400 km (5 spans), 800 km (10 spans) and 1200 km (15 spans) of transmission. The launch power was set to −5 dBm to avoid distortion due to self-phase modulation (SPM). A 2⁷ De Bruijn sequence was used initially. The back-to-back case used the Vreal FPGA with the FIR filters bypassed, and the Vimag drive amplifier switched off with its output bias set to a null point. This configuration gave a required OSNR of 10.1 dB at 10⁻³ BER. Error-free operation (BER < 10⁻⁹) was achieved for transmission distances of 400 and 800 km. The highest OSNR available at 1200 km was 16.7 dB, which allowed a BER below 10⁻⁵ to be obtained. The required OSNR penalties at 10⁻³ BER were 1.4, 1.4 and 2.5 dB for 400 km, 800 km and 1200 km respectively. Simulations using the general method described in  include all aspects of the DSP implemented on the FPGAs: 2 Sa/b operation, the limited resolution of the DAC and the FIR tap weights, truncation of the chromatic dispersion impulse response, and scaling of the FIR outputs. These simulations predict an average penalty of 0.9 ± 0.2 dB for transmission distances from 400 to 1200 km. The remaining penalty is believed to be due to the clock recovery issues, imperfections in the DAC response, and the lack of compensation for the transmitter and receiver responses in the FIR tap weight calculation (discussed further in section 7).
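Required OSNR and penalty values such as these are usually read off measured BER curves by interpolation. The sketch below assumes that log₁₀(BER) varies linearly between adjacent measured points and falls monotonically with OSNR. The function names are illustrative, and the example data are synthetic rather than taken from Fig. 7.

```python
import numpy as np


def required_osnr(osnr_db, ber, target_ber=1e-3):
    """OSNR (dB) at which a measured BER curve crosses target_ber.

    Assumes log10(BER) is linear between adjacent points and decreases with OSNR.
    """
    log_ber = np.log10(np.asarray(ber, dtype=float))
    osnr = np.asarray(osnr_db, dtype=float)
    order = np.argsort(log_ber)                  # np.interp needs increasing x
    return float(np.interp(np.log10(target_ber), log_ber[order], osnr[order]))


def osnr_penalty(osnr_db, ber, b2b_osnr_db, b2b_ber, target_ber=1e-3):
    """Required-OSNR difference between a transmission curve and back-to-back."""
    return (required_osnr(osnr_db, ber, target_ber)
            - required_osnr(b2b_osnr_db, b2b_ber, target_ber))


# Synthetic curves: the "transmission" curve is the back-to-back curve shifted by 2.5 dB
b2b_osnr, b2b_ber = [8.0, 10.0, 12.0], [1e-2, 1e-3, 1e-4]
tx_osnr = [o + 2.5 for o in b2b_osnr]
print(required_osnr(b2b_osnr, b2b_ber), osnr_penalty(tx_osnr, b2b_ber, b2b_osnr, b2b_ber))
```

With the measured back-to-back requirement of 10.1 dB, penalties of 1.4 dB and 2.5 dB correspond to required OSNRs of 11.5 dB and 12.6 dB respectively.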
With longer pattern lengths, performance is degraded: for a 2⁹ sequence, the penalties at 10⁻³ BER increase to 1.7, 1.7 and 2.8 dB for 400, 800 and 1200 km respectively. For still longer patterns (2¹⁷), error floors at a BER of 10⁻⁶ were measured for 800 km transmission. The pattern sensitivity is believed to be due to a combination of clock recovery issues and the AC coupling of the multiplexer inputs. The error floors for long sequences are caused by the multiplexer input AC coupling, which produces errors when long runs of ones or zeros occur in an individual FPGA output.
Figure 8 shows eye diagrams for back-to-back operation and for 400 km and 800 km, obtained by capturing data on the 40 GSa/s scope. In all cases, the eyes were taken with noise loading off and show 1000 bits. A launch power of −3 dBm was used to obtain the highest OSNR without significant SPM distortion. The measured OSNRs were 20.3 dB at 400 km and 19.5 dB at 800 km.
7. Discussion of transmitter performance
A 10.7 Gb/s electronic predistortion (EPD) transmitter employing real-time 2 Sa/b DSP implemented on FPGAs has been described. Transmission over 1200 km of standard SMF was achieved with an OSNR penalty of 2.5 dB compared with back-to-back operation, at a BER of 10⁻³. A lower penalty of 1.4 dB was measured at both 400 and 800 km. Simulation attributes 0.9 dB of these penalties to the 2 Sa/b 4-bit DAC, the method of FIR tap weight calculation, and the reduction of the effective DAC resolution due to FIR output scaling. In addition, as discussed in section 5, the penalty could have been reduced by modifying the FIR filter to compensate for the transmitter and receiver responses and to optimise pulse shaping . The remaining penalties are believed to be due to a combination of clock recovery issues and imperfections in the DACs. The clock recovery issues could be addressed in either of two ways. One is further work to optimise clock recovery for burst-mode recirculating loop operation with EPD signals. The other is to perform experiments using a straight-line link, in which the clock recovery would receive a continuous signal and would therefore remain locked at all times.
The DAC imperfections fall into several categories. Firstly, replacing the multiplexers with devices that have DC-coupled inputs would remove the errors caused by long runs of identical bits in individual FPGA sub-channels, thereby removing the error floors for long sequences, and would reduce pattern sensitivity. However, as shown in Fig. 2, the DACs also suffer from other dynamic imperfections, such as switching transients, droop and hysteresis. In the configuration used in this work, the multiplexer output stages determine the noise, linearity and transient response of the overall DAC. The FPGA MGT outputs are retimed and thresholded within the multiplexers, so their signal quality does not significantly affect overall performance. The multiplexers were designed for digital operation rather than for generating the accurate analog waveforms required in this application. Overall performance would be improved by replacing the current discrete-component DACs with integrated circuit devices optimised for analog output performance.
A 4-bit DAC was implemented in this work, but the resolution could be increased using the same basic technique. The largest current FPGAs have 24 high-speed transceivers, enough for a 6-bit DAC with the same 4:1 multiplexing (six bits × four transceivers per bit). The logic resource use of the FIR filters would not increase, because the additional bits of precision are already available at the scaler output, which can be truncated to 6 bits rather than 4 bits. Whether this increase in resolution improves overall transmission performance depends on how far the DAC response can be improved, as discussed above.
Although the clock recovery and DAC issues described above affect the maximum transmission distance, the primary limiting factor is the number of FIR filter taps available. This work has shown that large FIR filters for transmitter-based chromatic dispersion compensation can be implemented efficiently on FPGAs because the input words are only 1 bit wide. Here, 55-tap FIR filters were implemented on each FPGA using 71% of the available programmable logic resources, of which 5% was used for functions other than DSP. Because the resources required to implement FIR filters scale linearly with the number of taps, using all the available logic resources would allow transmission over approximately 1720 km of SSMF. However, as 100% resource use is approached, on-chip routing delays may prevent 10.7 Gb/s operation. For a real-time application such as EPD, adding further FPGA devices is impractical and yields diminishing returns, but there are several other methods for further increasing the transmission distance. Firstly, the available RAM on the FPGAs could be used to add LUTs that provide additional chromatic dispersion compensation and/or SPM compensation. This approach is limited by the exponential increase in memory requirements with increasing chromatic dispersion . The 4VFX100 FPGA used in this work has 6.2 Mbit of on-chip RAM that can be accessed in a single clock cycle; the use of off-chip RAM in a real-time application such as this is impractical. As shown in , this RAM can provide an additional 650 km of transmission distance over SSMF. Secondly, each new generation of FPGA uses a more advanced CMOS process and hence benefits from increased density, providing more logic cells per device. For example, the currently available Virtex-5 5VFX200T device (which uses 65 nm CMOS and has 24 transceivers operating at up to 6.5 Gb/s) has 196,608 logic cells, compared with 94,900 for the Virtex-4 device (90 nm CMOS) used in this work. This is a factor of approximately 2.07, an increase of 107%.
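The resource-scaling estimates above can be reproduced with a short back-of-envelope calculation. It uses only figures quoted in the text, together with a linear fit to the 19-, 37- and 55-tap designs for 400, 800 and 1200 km (18 taps per 400 km). The linear tap model and the variable names are illustrative assumptions.

```python
# Back-of-envelope scaling estimates using figures quoted in the text.
LOGIC_USED = 0.71          # fraction of slices used by the 55-tap design
NON_DSP = 0.05             # fraction used by functions other than DSP
KM_PER_DESIGN = 1200.0     # distance supported by the 55-tap design

dsp_fraction = LOGIC_USED - NON_DSP                       # ~0.66 for the FIR filters
max_km = KM_PER_DESIGN * (1.0 - NON_DSP) / dsp_fraction   # linear scaling to full device

V4_CELLS, V5_CELLS = 94_900, 196_608                      # Virtex-4 FX100, Virtex-5 FX200T
cell_ratio = V5_CELLS / V4_CELLS                          # ~2.07x, i.e. +107 %


def km_for_taps(n_taps):
    """Linear fit to the 19/37/55-tap designs for 400/800/1200 km (assumption)."""
    return (n_taps - 1) / 18 * 400.0


print(round(max_km), round(cell_ratio, 2), km_for_taps(100))   # prints 1727 2.07 2200.0
```

The estimates (about 1727 km at full utilisation, and 2200 km for 100 taps) are consistent with the approximate figures of 1720 km and "over 2100 km" given in the discussion.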
Further design work would be required to determine the number of FIR taps possible with this device. As a conservative estimate, however, over 100 taps, corresponding to transmission distances of over 2100 km, are likely to be possible. The rapid increase in the number of logic cells available in each new generation of FPGA is likely to continue for the foreseeable future, allowing increasingly complex DSP to be demonstrated. Another option for increasing transmission distance is to use a modulation format with higher dispersion tolerance, for example duobinary or DQPSK. In , it was shown by simulation that EPD using duobinary allows a 50% increase in transmission distance for a given processor memory (or number of FIR taps) compared with NRZ-OOK.
The bit rate of the transmitter could also be increased using the techniques described in this paper, by using a higher-order modulation format, by using duobinary with 1 Sa/b DSP , or by increasing the sample rate. However, the total output capacity of currently available FPGAs (24 transceivers × 6.5 Gb/s = 156 Gb/s) limits the potential increase in sample rate. Future FPGAs are expected to feature larger numbers of faster transceivers.
8. Further applications of the transmitter and real-time FPGA-based DSP
This work has focused on chromatic dispersion compensation using EPD. However, the system described is a fully programmable optical transmitter, capable of producing arbitrary optical fields (in one polarisation). By updating the FPGA design and tuning the MZM biases, many modulation formats can be generated. The FPGAs can also perform coding; for example,  demonstrates a parallel implementation of precoding for DQPSK signals. Such a transmitter would be a considerable advantage in a research or corporate laboratory, where, for example, the performance of various modulation formats and coding schemes could be compared rapidly for a given transmission link. One modulation format that has recently received considerable attention is single sideband orthogonal frequency division multiplexing (SSB-OFDM) . The transmitter described in this paper could be used to demonstrate and investigate real-time processing for SSB-OFDM. In particular, the scheme described in , in which all transmitter operations, including Hilbert transform filtering for single sideband generation, are performed digitally, could be implemented without any changes to the hardware.
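Digital single sideband generation of the kind mentioned above can be illustrated with a windowed FIR Hilbert transformer, the kind of filter that would map onto the FPGA FIR structure. The sketch forms x + jH{x}, which suppresses the lower sideband of a real baseband signal. It uses a generic textbook design (ideal 2/(πn) response for odd n, Hamming window), not the scheme of the cited work. The tap count, test tone and function names are illustrative assumptions, and OFDM symbol generation and upconversion are omitted.

```python
import numpy as np


def hilbert_fir(num_taps=31):
    """Hamming-windowed FIR approximation of a Hilbert transformer (odd length)."""
    assert num_taps % 2 == 1
    m = num_taps // 2
    n = np.arange(-m, m + 1)
    h = np.zeros(num_taps)
    odd = n % 2 != 0
    h[odd] = 2.0 / (np.pi * n[odd])   # ideal response: 2/(pi n) for odd n, 0 for even n
    return h * np.hamming(num_taps)


def single_sideband(x, num_taps=31):
    """Upper-sideband (analytic) version of a real signal: x + j*Hilbert{x}."""
    h = hilbert_fir(num_taps)
    m = num_taps // 2
    quadrature = np.convolve(x, h, mode="valid")   # len(x) - 2m output samples
    in_phase = np.asarray(x)[m:len(x) - m]          # delay-matched to the filter
    return in_phase + 1j * quadrature


n_fft, taps = 256, 31
x = np.cos(2 * np.pi * 64 * np.arange(n_fft + taps - 1) / n_fft)   # tone at fs/4
spectrum = np.abs(np.fft.fft(single_sideband(x, taps)))
print(spectrum[n_fft - 64] / spectrum[64])   # residual lower sideband, well below 1
```

The in-phase path is delayed by half the filter length so that both rails stay time-aligned, which is the same kind of alignment the transmitter already performs on its outputs.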
FPGAs can also be used to investigate real-time DSP for receiver based compensation schemes e.g. coherent QPSK polarization-multiplexed receivers  or SSB-OFDM receivers.
In conclusion, an EPD transmitter based on two Xilinx Virtex-4 FPGAs has been designed and constructed, and used to demonstrate real-time processing for transmission at 10.7 Gb/s over 1200 km of standard single mode fibre with no optical dispersion compensation. Required OSNRs of 11.5 dB and 12.6 dB were measured at a BER of 10⁻³ for transmission over 800 km and 1200 km respectively. This work demonstrates that the latest FPGAs are suitable for experimental investigation of real-time DSP for optical communication at 10 Gb/s and above. FPGAs interfaced with integrated-circuit ADCs and DACs may also be suitable for providing low-cost, flexible real-time processing in commercial products operating at 10 Gb/s and above.
The authors would like to acknowledge the financial support of Huawei Technologies, Intel, the UK Engineering and Physical Sciences Research Council (grant EP/C523865/1) and the EU FP6 NOBEL2 programme for this work, as well as personal support from the RCUK (V. Mikhailov) and the Royal Society (P. Bayvel). We would also like to thank Dr S. Savory for many helpful discussions, Dr M. Glick of Intel Research for invaluable advice and assistance, Dr R Griffin and Dr M Wale of Bookham Technologies for the loan of the MZM, and S. Shang of Intel for donating one of the FPGA boards.
References and links
1. A. Färbert, S. Langenbach, N. Stojanovic, C. Dorschk, T. Kupfer, C. Schulien, J. P. Elbers, H. Wernz, H. Griesser, and C. Glingener, “Performance of a 10.7 Gb/s receiver with digital equaliser using maximum likelihood sequence estimation”, Proceedings of the 30th European Conference on Optical Communications (ECOC 2004), Stockholm, Sweden, Paper Th4.1.5, Sept 2004.
2. J. McNicol, M. O’Sullivan, K. Roberts, A. Comeau, D. McGhan, and L. Strawczynski, “Electrical domain compensation of optical dispersion”, Proceedings of Conference on Optical Fibre Communications (OFC 2005), Anaheim, USA, Paper OThJ3, Feb 2005.
3. K. Roberts, “Electronic dispersion compensation beyond 10 Gb/s”, Digest of the 2007 IEEE LEOS Summer Topical Meetings, Portland, Oregon, USA, 23–25 July 2007.
4. D. McGhan, C. Laperle, A. Savchenko, C. Li, G. Mak, and M. O’Sullivan, “5120 km RZ-DPSK transmission over G.652 fiber at 10 Gb/s without optical dispersion compensation”, Photon. Tech. Lett. 18, 400–402 (2006).
5. P. Poggiolini, G. Bosco, S. Savory, Y. Benlachtar, R. I. Killey, and J. Prat, “1040 km uncompensated IMDD transmission over G.652 fiber at 10 Gb/s using a reduced state SQRT metric MLSE receiver”, Proceedings of European Conference on Optical Communications (ECOC’2006), Cannes, France, Postdeadline paper Th4.4.6, September 2006.
7. P. M. Watts, M. Glick, P. Bayvel, and R. I. Killey, “An FPGA-based optical transmitter design using realtime DSP for advanced signal formats and electronic predistortion”, J. Lightwave Technol. 25, 3089–3099 (2007).
8. R. I. Killey, P. M. Watts, V. Mikhailov, M. Glick, and P. Bayvel, “Electronic dispersion compensation by signal predistortion using digital processing and a dual drive Mach-Zehnder modulator”, IEEE Photon. Tech. Lett. 17, 714–716 (2005).
9. P. M. Watts, R. Waegemans, Y. Benlachtar, V. Mikhailov, M. Glick, P. Bayvel, and R. I. Killey, “10.7 Gb/s electronically predistorted transmission over 800 km standard single mode fibre using FPGA-based realtime processing”, Proceedings of 34th European Conference on Optical Communications (ECOC’2008), Brussels, Belgium, Paper We.2.E.1, September 2008.
10. P. J. Winzer, C. Woodworth, F. Fidler, P. K. Reddy, H. Song, and A. Adamiecki, “Temporal alignment of high speed transmit channels of FPGA”, Electron. Lett. 44, 113–114 (2008).
11. J. G. Proakis and D. G. Manolakis, Digital Signal Processing (Prentice-Hall, 1996).
12. P. M. Watts, M. Glick, P. Bayvel, and R. I. Killey, “Performance of electronic predistortion systems with 1 Sample/bit processing using optical duobinary format”, Proceedings of 33rd European Conference on Optical Communications (ECOC 2007), Berlin, Germany, 16–20 September 2007.
13. H. Song, A. Adamiecki, P. J. Winzer, C. Woodworth, S. Corteselli, and G. Raybon, “Multiplexing and DQPSK precoding of 10.7-Gb/s client signals to 107 Gb/s using an FPGA”, Proceedings of the Conference on Optical Fibre Communications (OFC 2008), San Diego, CA, USA, Paper OTuG3, 24th–28th February 2008.
14. A. J. Lowery and J. Armstrong, “Orthogonal frequency division multiplexing for dispersion compensation of long haul optical systems”, Opt. Express 14, 2079–2084 (2006).
15. B. J. C. Schmidt, A. J. Lowery, and J. Armstrong, “Experimental demonstrations of electronic dispersion compensation for long-haul transmission using direct-detection optical OFDM”, J. Lightwave Technol. 26, 196–203 (2008).
16. A. Leven, N. Kaneda, and Y.-K. Chen, “A real-time CMA-based 10 Gb/s polarization demultiplexing coherent receiver implemented in an FPGA”, Proceedings of the Conference on Optical Fibre Communications (OFC 2008), San Diego, CA, USA, Paper OTuG3, 24th–28th February 2008.