## PROCEEDINGS OF SPIE


# FPGA based time-to-digital converters

Nock, Richard, Ai, Xiao, Lu, Yang, Dahnoun, Naim, Rarity, John

Richard W. Nock, Xiao Ai, Yang Lu, Naim Dahnoun, John G. Rarity, "FPGA based time-to-digital converters," Proc. SPIE 11347, Quantum Technologies 2020, 1134719 (30 March 2020); doi: 10.1117/12.2555997



Event: SPIE Photonics Europe, 2020, Online Only, France

### FPGA based time-to-digital converters

Richard W. Nock\*<sup>a</sup>, Xiao Ai<sup>b</sup>, Yang Lu<sup>a</sup>, Naim Dahnoun<sup>c</sup>, John G. Rarity<sup>c</sup>
<sup>a</sup>School of Engineering & Applied Science, Aston University, Birmingham, UK, B4 7ET; <sup>b</sup>QLM Technology, Unit DX, Bristol, UK, BS2 0XJ; <sup>c</sup>Merchant Venturers Building, University of Bristol, Woodland Road, Bristol, UK, BS8 1UB

#### **ABSTRACT**

Time-to-digital converters are a key component in many photonics systems, ranging from LiDAR and quantum key distribution to quantum optics experiments and time correlated single photon counting applications. A novel, efficient time-to-digital converter non-linearity calibration technique has been developed and demonstrated on a Spartan 6 LX150 field programmable gate array (FPGA). Most FPGA based time-to-digital converters either use post processing or have calibration techniques which do not focus on minimizing resource utilization. With the move towards imaging with arrays of single photon detectors, scalable timing instrumentation is required. The calibration system demonstrated minimizes block memory utilization by using the same memory for probability density function measurement and cumulative distribution function generation, creating a look-up table which can be used to calibrate the sub-clock timing module of the time-to-digital converter. The system developed contains 16 time-to-digital converters and demonstrates an average accuracy of 21ps RMS (14.85ps single channel) with a resolution of 1.86ps.

**Keywords:** time-to-digital converters, photon counting, and time correlated single photon counting.

#### 1. INTRODUCTION

Time-to-digital converters (TDCs) are a common building block of timing instruments and of photonic and quantum systems, ranging from time correlated single photon counters (TCSPC) and quantum key distribution (QKD) to quantum optics experiments and LiDAR. Such instruments provide a series of "time stamps" or "time tags", which record the time at which an input event occurred relative to a start signal. Typically, the event to be measured is the rising edge of an input signal, which is often the output of a single photon detector. In this work, the TDC measures the time from when the measurement started to when an input edge occurred. Hence, there is no dedicated start signal.

Typically, such instruments are constructed from application specific integrated circuit (ASIC) TDCs, such as those offered by AMS [1]. However, such ASICs often require a field programmable gate array (FPGA) to communicate time-stamp data to a personal computer (PC) for further analysis and display. As such, significant work has been undertaken in developing FPGA based TDCs [2] to enable low cost scaling to a larger number of TDC channels, such that measurements from single photon avalanche diode (SPAD) arrays can be performed. However, there are numerous challenges in moving towards FPGA based instrumentation for such tasks. In particular, this work provides an efficient and low resource approach to correct for the non-linearity of delay line FPGA based TDCs.

#### 2. BACKGROUND

FPGA based TDCs often make use of a delay line based on the carry chain (a dedicated linear path within the FPGA designed for addition circuits), such that input edge time measurements can be obtained with sub-clock period resolution. The delay line is used to measure the input signal's edge position within the clock period and a coarse counter is used to determine in which clock cycle the input edge occurred. This combination of the coarse counter and the sub-clock timing delay line offers long measurement ranges with high resolution [3]. Although such approaches typically offer a resolution of 20ps [4], converter linearity is often poor, with the biggest problem being known as "ultra-wide bins", where the delay between sampling flip-flops can increase considerably when the input signal is routed between FPGA slices [5]. Although there are numerous techniques to mitigate converter non-linearity [6,7,8], this work focuses on creating a real-time solution to measure non-linearity and calibrate for it, whilst minimizing resource utilization.

Quantum Technologies 2020, edited by Eleni Diamanti, Sara Ducci, Nicolas Treps, Shannon Whitlock, Proc. of SPIE Vol. 11347, 1134719 · © 2020 SPIE · CCC code: 0277-786X/20/\$21 · doi: 10.1117/12.2555997

The calibration is performed using statistical code density testing (SCDT) [9]. In this method, an asynchronous clock source is presented to the delay line based sub-clock timing module, which measures the position of the edge within the clock period. As there is no relationship between the calibration clock source and the FPGA's clock, each rising edge should have an equal chance of appearing in a sampling bin if all delay elements used in the delay line exhibit the same delay. However, the time delay introduced by each delay element is dependent upon voltage and temperature [5] and the placement variations caused by the FPGA placement and routing tools. Hence, the likelihood of an event falling within a particular bin will be proportional to the time width of the bin relative to the clock period. Typically, the calibration edges could be the signal of interest, but for robustness, a separate clock source is utilized in this work, as the TDC can be utilized to trigger experiments, which could introduce time correlated input events. Numerous input edges (typically a million or more) are presented to the delay line and a probability density function (PDF) is formed of bin number vs hit occurrence. The width of a bin in terms of time can be calculated as in (1), where i is the bin at hand, H(i) is the count of occurrences for bin i,  $T_{CLK}$  is the clock period and Y is the number of events used in the histogram.

$$W(i) = \frac{H(i)T_{CLK}}{Y} \tag{1}$$

Performing a cumulative sum of the bin widths results in the cumulative distribution function (CDF), which is the transfer function of the sub-clock timing delay line. This is shown in (2), where i is again the bin at hand. This CDF can be utilized as a look-up table to convert delay line codes to actual time within the delay line.

$$T_{fine}(i) = \sum_{m=1}^{i-1} W(m) \tag{2}$$
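As a minimal sketch of (1) and (2), the following Python snippet converts a code density histogram into bin widths and a fine-time look-up table. The function names, the toy 4-bin histogram, the 8 ns clock period and the 0-based list indexing are illustrative assumptions and are not taken from the instrument. The sum is exclusive, as written in (2): each entry holds the total width of all preceding bins.

```python
from itertools import accumulate

def bin_widths(hist, t_clk):
    """Equation (1): W(i) = H(i) * T_CLK / Y, with Y the total number of hits."""
    total_hits = sum(hist)
    return [h * t_clk / total_hits for h in hist]

def fine_time_lut(widths):
    """Equation (2): for each bin, the summed width of all preceding bins."""
    running = [0.0] + list(accumulate(widths))
    return running[:-1]

# Toy example (assumed values): 4 bins, 8 hits, 8 ns clock period.
hist = [1, 3, 0, 4]
widths = bin_widths(hist, 8.0)   # [1.0, 3.0, 0.0, 4.0] ns
lut = fine_time_lut(widths)      # [0.0, 1.0, 4.0, 4.0] ns
```

Note that the bin with zero hits gets zero width, so it shares its look-up entry with the following bin, and the widths always sum to the clock period.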

#### 3. METHODOLOGY

Calibration is performed in hardware in an efficient manner via the use of dual port block random access memory (RAM). Conventional (single port) RAM implemented in block memory would have one address bus, one data in bus, one data out bus, and a write enable control, with this group of signals being known as a port. Dual port RAM has two independently controllable ports. Hence, with dual port RAM, it is possible to read from two memory locations, write to two memory locations or read and write to two separate memory locations in the same clock cycle. This property is utilized to implement hardware statistical code density testing (SCDT) calibration with minimum resource utilization as follows, referring to Figure 1.



Figure 1: Delay line, priority encoder and the PDF/CDF dual port RAM

1. The contents of all memory addresses contained in the dual port block RAM are reset to 0. This is achieved by personal computer (PC) control and a top-level state machine. The input multiplexer controlled by the top-level state machine selects the asynchronous calibration clock source as the input to the delay line. An external 1.8432 MHz crystal oscillator is used for this.
2. A number of hits equal to a power of 2 is applied to the delay line. In the actual instrument,  $2^{20} = 1,048,576$  input hits are utilized to create a histogram of bin widths stored in the dual port block RAM, signified as the bin counter, H(B), where B is the bin number. The delay line is 512 bins long and a priority encoder is used to convert the 512-bit thermometer code to a 9-bit code word. The requirement for a power of 2 hits will be discussed later.
3. For each input edge, the contents at the bin's address, H(B), are incremented by 1. This is achieved by asserting the bin code for the hit as an address to port A of the block RAM. The output bus from port A is connected to a binary adder, such that the output of the adder is always equal to the input plus 1. In the very next clock cycle, the incremented count value is written back to the appropriate memory location. With only the addition of registers, the process of bin occurrence counting (the formation of the bin PDF) has been performed.
4. After the designated number of hits have been applied and counted, the same dual port block RAM is utilized to form the transfer function look-up table for the delay line's span of time, 0 to  $T_{\text{CLK}}$ . This is achieved using the same memory and an iterative process. The sum of bin 0 and bin 1 is placed in the memory location of bin 1. On the next iteration, elements 1 and 2 are summed and placed into memory location 2. This process repeats until the end of the dual port RAM.
5. The dual port RAM can now be utilized as a look-up table for the specific sub-clock timing module when used in the run mode rather than calibration mode.
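The steps above can be modelled in software as follows. This is only a behavioural sketch under stated assumptions: a single Python list stands in for the dual port block RAM, the function name `calibrate_in_place` and the toy 4-bin, 8-hit input are hypothetical, and the port-level timing (read on one cycle, write back on the next) is not modelled.

```python
def calibrate_in_place(bin_codes, n_bins):
    """Model the calibration steps with one list standing in for the block RAM."""
    ram = [0] * n_bins                   # step 1: clear every address
    for code in bin_codes:               # steps 2-3: read count, add 1, write back
        ram[code] += 1
    for addr in range(1, n_bins):        # step 4: overwrite each count with the running sum
        ram[addr] = ram[addr - 1] + ram[addr]
    return ram                           # step 5: the same memory now holds the look-up table

# Toy run (assumed values): 8 hits, a power of 2, over a 4-bin delay line.
lut = calibrate_in_place([0, 1, 1, 1, 3, 3, 3, 3], n_bins=4)
print(lut)  # [1, 4, 4, 8]
```

Because each address is overwritten only after its original count has been added to the running sum, no second memory is needed; the final entry equals the total number of hits.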

The iterative process utilizes block RAM resources to their fullest potential, as the same RAM is used for both the PDF and the CDF.

The power of 2 requirement enables the use of fixed-point time tags, which is preferable to minimize logic utilization and computational time in post processing. If the sum of the bin counts (or the number of hits used to form the histogram, Y) is a number of the form  $2^{Y_b}$, where  $Y_b$  is a positive integer, the fine time measurement will cover the range of  $(0, 2^{Y_b} - 1)$,  or 0 to  $T_{CLK}$. This is ideal from the standpoint of forming a fixed-point time tag, as the  $Y_b$  bit number can be concatenated onto the end of the  $C_b$  bit number representing the number of clock cycles that have passed since the measurement was started. In the system implemented,  $C_b$  is equal to 36 and  $Y_b$  is equal to 12. With a clock period  $T_{CLK}$  of 7.62ns (131.25 MHz), delay line non-linearities are calibrated to a resolution of 1.86ps ( $T_{CLK} \div 2^{12}$ ), whilst maintaining a measurement range of  $T_{CLK} \cdot 2^{C_b} = 524$  seconds or 8.7 minutes.
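The fixed-point arithmetic described here can be checked with a short Python sketch. The bit widths and clock frequency are taken from the text; the function names `make_time_tag` and `tag_to_seconds` are hypothetical, and the way the $2^{20}$-hit look-up table entries are reduced to 12 bits is not stated in the text and is not modelled.

```python
C_BITS = 36            # coarse counter width C_b, from the text
Y_BITS = 12            # fine time width Y_b, from the text
F_CLK_HZ = 131.25e6    # system clock frequency, from the text
T_CLK_S = 1.0 / F_CLK_HZ

def make_time_tag(coarse_count, fine_code):
    """Concatenate the fine code onto the end of the coarse clock count."""
    assert 0 <= coarse_count < (1 << C_BITS)
    assert 0 <= fine_code < (1 << Y_BITS)
    return (coarse_count << Y_BITS) | fine_code

def tag_to_seconds(tag):
    """One LSB of the tag corresponds to T_CLK / 2**Y_BITS."""
    return tag * T_CLK_S / (1 << Y_BITS)

lsb_ps = T_CLK_S / (1 << Y_BITS) * 1e12   # roughly 1.86 ps
range_s = T_CLK_S * (1 << C_BITS)         # roughly 524 s
tag = make_time_tag(3, 2048)              # 3 full clock cycles plus half a period
```

Since the fine code spans exactly one clock period, the concatenated 48-bit word is a plain binary count of LSBs and no multiplication is needed to merge the coarse and fine parts.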

#### 4. RESULTS & DISCUSSION

The hardware developed is shown in Figure 2. It utilizes an Opal Kelly XEM6310 Spartan 6 module (LX150) mounted on a custom printed circuit board (PCB) and offers 16 input channels with user configurable constant level discriminators and 8 user-programmable logic level  $50\Omega$  outputs (to trigger various equipment components). A four-layer PCB is utilized, as it was found in early work that 2-layer PCBs offered poor timing resolution performance due to signal integrity issues. In addition to this, a jitter attenuator is used to reduce clock jitter, as the digital clock manager (DCM) used to generate the 131.25 MHz system clock from a 10 MHz oven-controlled crystal oscillator introduces considerable jitter.

The system clock frequency is currently constrained by the priority encoder used to convert delay line thermometer codes (where each bit represents the input signal state) to a number which represents the input's edge position. The delay line developed produces 512-bit thermometer codes, which are converted into 9-bit codes (512 possible edge positions) using the priority encoder. However, the remaining logic is capable of much higher operating frequencies.
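To illustrate what the priority encoder computes, the sketch below maps a thermometer code to an edge position in Python. It is a behavioural model only: the tap ordering, the choice of the highest set tap as the edge position, and the function name `priority_encode` are assumptions, and the actual logic structure used in the FPGA is not described in the text.

```python
def priority_encode(taps):
    """Return the index of the highest-numbered tap that reads 1, or None if no tap is set."""
    for index in range(len(taps) - 1, -1, -1):
        if taps[index]:
            return index
    return None

# Toy 8-tap example (assumed): the edge has passed the first three taps.
code = priority_encode([1, 1, 1, 0, 0, 0, 0, 0])
print(code)  # 2
```

For a 512-tap line, the result lies in the range 0 to 511 and therefore fits in a 9-bit code word.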



Figure 2: The hardware platform developed, utilizing a custom printed circuit board and an Opal Kelly XEM6310 FPGA module.

Figure 3 (a) gives an example of the probability density function (PDF, with respect to the clock period) for a typical delay line, showing the bin widths (in seconds) for each bin. This was calculated from (1) and the histogram of input edge occurrences discussed earlier. The mean bin size is 14.88ps, although there are a considerable number of missing codes caused by the non-linearity of the converter. Figure 3 (b) demonstrates a typical delay line transfer function (CDF) formed by the cumulative sum of bin widths (2). This look-up table allows for easy conversion of delay line codes to fixed-point time stamps, and this calibration is performed periodically to account for the voltage and temperature changes which the FPGA experiences.



Figure 3: (a) Example histogram (PDF with respect to clock period) of bin counts/occurrences for 2<sup>20</sup> input edges (b) The resultant transfer function (CDF) for a typical carry chain delay line in the FPGA based TDC

Figure 4 (a) is a typical differential non-linearity (DNL) plot for a delay line. DNL can essentially be thought of as a measure of a sampling bin's width in comparison to the ideal width, in units of least significant bits (LSB). As such, DNL is calculated as in (3), where  $T_{LSB}$  is the ideal bin size, which in this case is  $T_{CLK} \div 2^{12} = 1.86ps$ . The maximum DNL experienced in this delay line is 3.87 LSB.

$$DNL(i) = \frac{W(i)}{T_{LSB}} - 1 \tag{3}$$

Integral non-linearity (INL) is shown in Figure 4 (b) and can be thought of as the bin-specific deviation from an ideal transfer function. As such, it is the cumulative sum of the DNL errors up to the bin at hand. Hence, INL can be calculated as in (4). For this particular delay line, the vast majority of bins exhibit a positive error with respect to the ideal transfer function. The maximum INL experienced is 7.87 LSB.

$$INL(i) = \sum_{m=0}^{i} DNL(m) \tag{4}$$
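Equations (3) and (4) translate directly into a few lines of Python. The sketch below uses hypothetical function names and a toy set of four bin widths with an assumed ideal bin size of 2 ps; none of these values come from the measured delay line.

```python
from itertools import accumulate

def dnl(widths, t_lsb):
    """Equation (3): DNL(i) = W(i) / T_LSB - 1, in LSB."""
    return [w / t_lsb - 1.0 for w in widths]

def inl(dnl_values):
    """Equation (4): running sum of DNL from bin 0 up to and including bin i."""
    return list(accumulate(dnl_values))

# Toy bin widths in ps (assumed) with an assumed ideal bin size of 2 ps.
widths_ps = [1.0, 3.0, 0.0, 4.0]
dnl_lsb = dnl(widths_ps, t_lsb=2.0)   # [-0.5, 0.5, -1.0, 1.0]
inl_lsb = inl(dnl_lsb)                # [-0.5, 0.0, -1.0, 0.0]
```

A missing code (zero width) shows up as a DNL of exactly -1 LSB, while a wide bin gives a positive DNL.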



Figure 4: (a) Differential non-linearity (DNL) for a typical delay line (b) Integral non-linearity for a typical delay line

The accuracy or single shot precision (SSP) of the instrument has been measured by evaluating the RMS error across each pair of channels with respect to channel 1 (e.g. channel 1 to 2, 1 to 3, etc.). To measure this, an HP8082A pulse generator and a Mini-Circuits ZFRSC-42-S+  $50\Omega$  splitter were utilized to drive each input channel pair with essentially the same input, which in ideal circumstances provides an input delta function to the TDC. Two million time tags were taken per pair, the time difference between time-tag pairs was then calculated and the RMS error was calculated from this series of time differences, resulting in an average accuracy of 21ps. A typical SSP histogram of time differences for a pair of input channels is shown in Figure 5 (a) and the RMS error across all pairwise channels is shown in Figure 5 (b).



Figure 5: (a) Typical single shot precision (SSP) for the TDC developed, measured across a pair of channels (b) RMS timing error between each pair of input channels, with respect to channel 1.

It is worth noting that the measurements could likely be improved with input signals exhibiting a faster rise time.
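The pairwise measurement described above can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the RMS error is taken about the mean time difference (so that a fixed cable or routing offset between channels does not count as error), and the single channel figure is obtained by dividing the pairwise value by $\sqrt{2}$, which holds only if both channels contribute equal and independent jitter. The function names and toy time tags are hypothetical.

```python
import math

def rms_about_mean(values):
    """RMS deviation of the values from their mean (assumed definition of RMS error)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def pairwise_rms(tags_a, tags_b):
    """RMS error of the time differences between paired time tags."""
    diffs = [b - a for a, b in zip(tags_a, tags_b)]
    return rms_about_mean(diffs)

# Toy time tags in ps (assumed values, not measured data).
tags_ch1 = [0.0, 100.0, 200.0, 300.0]
tags_ch2 = [12.0, 108.0, 212.0, 308.0]
pair_rms = pairwise_rms(tags_ch1, tags_ch2)       # 2.0 ps
single_channel = pair_rms / math.sqrt(2)          # valid only for equal, independent jitter
reported_single = 21.0 / math.sqrt(2)             # about 14.85 ps from the 21 ps pairwise figure
```

Under these assumptions, the 21ps pairwise figure corresponds to approximately 14.85ps per channel.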

#### 5. CONCLUSION

An efficient delay line TDC calibration technique has been demonstrated on a low-cost, 16-channel FPGA based instrument. Such an approach minimises utilisation of calibration logic and should allow for a larger number of timing channels to be fitted to a given FPGA based device. The TDC developed exhibits a resolution of 1.86ps and an average accuracy or SSP of 21ps RMS (14.85ps RMS single channel accuracy).

Differential non-linearity does not exceed 3.87 LSB and INL does not exceed 7.87 LSB. With a simple delay-line based TDC, there is no easy way to improve linearity, as the carry chain is a fixed resource contained within the FPGA's fabric.

It is believed that FPGA based TDCs will play a pivotal role in measuring the outputs of SPAD arrays. The given hardware could potentially be optimised further, for example, by increasing the number of channels supported by working with shorter delay lines. However, this would require a much higher clock frequency, which will be investigated in future work.

#### 6. ACKNOWLEDGEMENTS

The authors would like to thank QUANTIC for the partnership resource funding which made the development of the new 16-channel hardware possible.

#### REFERENCES

- [1] AMS, "TDC-GPX2 4-channel time-to-digital converter," 18<sup>th</sup> December 2017, <https://ams.com/documents/20143/36005/TDC-GPX2_DS000473_3-00.pdf/5d046c0d-b6f4-d0e4-7048-ab94263618e1> (22 September 2019).
- [2] Aloisio, A., Branchini, P., Giordano, R., Izzo, V. and Loffredo, S., "High precision time-to-digital converter in a fpga device," 2008 IEEE Nuclear Science Symposium Conference Record, 2114-2118 (2008).
- [3] Nock, R., Dahnoun, N. and Rarity, J., "Low cost timing interval analyzers for quantum key distribution," 2011 IEEE International Instrumentation and Measurement Conference, 1–5 (2011).
- [4] Bourdeauducq, S., "A 26 ps rms time-to-digital converter core for spartan-6 fpgas," <http://arxiv.org/abs/1303.6840> (2013).
- [5] Wu, J., "Several key issues on implementing delay line based TDCs using FPGAs," IEEE Transactions on Nuclear Science 57.3, 1543-1548 (2010).
- [6] Wu, J. and Zonghan, S., "The 10-ps wave union TDC: Improving FPGA TDC resolution beyond its cell delay," In 2008 IEEE Nuclear Science Symposium Conference Record, 3440-3446 (2008).
- [7] Daigneault, M. and David, J., "A novel 10 ps resolution TDC architecture implemented in a 130nm process FPGA," In Proceedings of the 8th IEEE International NEWCAS Conference, 281-284 (2010).
- [8] Szplet, R., Jachna, Z., Kwiatkowski, P. and Rozyc, K., "A 2.9 ps equivalent resolution interpolating time counter based on multiple independent coding lines," Measurement Science and Technology, 24(3) (2013).
- [9] Song, J., An, Q., and Liu, S., "A high-resolution time-to-digital converter implemented in field-programmable-gate-arrays," IEEE Transactions on Nuclear Science 53(1), 236-241 (2006).