A partial sum of singular ‐ value ‐ based reconstruction method for non ‐ uniformly sampled NMR spectroscopy

The nuclear magnetic resonance (NMR) spectroscopy has fruitful applications in chemistry, biology and life sciences, but suffers from long acquisition time. Non ‐ uniform sampling is a typical fast NMR method by undersampling the time ‐ domain data of the spectrum but need to restore the fully sampled data with proper constraints. The state ‐ of ‐ the ‐ art method is to model the time ‐ domain data as the sum of exponential functions and reconstruct these data by enforcing the low rankness of Hankel matrix. However, this method is solved by minimizing the sum of singular values of the Hankel matrix, which leads to the distortion of low ‐ intensity spectral peaks. Here, a low rank Hankel matrix reconstruction approach with a partial sum of singular values is proposed to protect small singular values, which can faithfully reconstruct all peaks. Results on both synthetic and realistic NMR spectroscopy show that the proposed method can reconstruct


| INTRODUCTION
Nuclear magnetic resonance (NMR) spectroscopy provides fruitful atomic-level information on molecular structures. It is an indispensable tool for the analysis in physics, chemistry [1], structural biology [2,3], medical science and other fields [4]. The data acquisition time of NMR spectra increases rapidly with dimensions and spectral resolutions [5]. Nonuniform sampling (NUS) [6][7][8][9][10] is commonly used to accelerate the data acquisition by undersampling the timedomain data of the spectrum. More recently, the deep learning approach has been proposed to accelerate NMR by using solely synthetic NMR signals, which lifts the prohibiting demand for a large volume of realistic training data usually required for a deep learning approach [11]. The deep learning NMR, however, requires the huge database and long time to train the neural network, although the spectra reconstruction from NUS is ultrafast.
Modern methods reconstruct high-quality NMR spectra from NUS data by imposing appropriate constraints on it. Typical constraints include sparsity of the spectrum [12][13][14][15][16] and low rankness of time-domain data, which is called the free induction decay (FID) signal, of the spectrum [17][18][19][20][21][22]. The former assumes that non-zero values are rare, for example, sparsity [12][13][14][15][16], in the spectrum and explores the l 1 norm [23][24][25] to constrain spectral sparsity. Although narrow peaks can be reconstructed very well, reconstruction of broad peaks may be compromised since they violate the assumption of sparsity. The latter [17] explores the low-rank Hankel matrix (LRHM) constructed from the FID assuming that the number of peaks is much fewer than the length of the spectrum. Due to the independence of the rank on the peak width, the LRHM method can simultaneously reconstruct both narrow and broad peaks, and has a significant advantage on preserving low-intensity broad peaks [17]. More advances of LRHM can be found in a recent review [26].
In the LRHM method [17], the FID signal is transformed into a Hankel matrix, whose rank is equal to the number of spectral peaks and supposed to be small. By introducing a nuclear norm, defined as the sum of singular values, as an approximation of rank, the FID is reconstructed by finding the low-rank solution subject to the acquired data. To achieve a better performance on recovering the low-intensity peaks, the weighted sum of singular values [19] is introduced as a better approximation of rank to LRHM, but the improvement is limited and not robust to sampling trials (See Section 4.1).
In this treatise, a new LRHM approach is proposed based on the partial sum of singular values (PSSV). This method tries to protect small singular values and introduces a better strategy to approximate the rank of Hankel matrix. Results on the synthetic and realistic NMR data show that the proposed method outperforms the state-of-the-art low-rank methods on preserving low-intensity peaks and provides more robust reconstructions. TA B L E 1 The reconstruction algorithm of the proposed method Initialization: Input y; U; R;set maximal β ¼ 10 9 , convergence condition η tol ¼ 10 À 6 , and maximal number of iterationK ¼ 10 3 . Initialize the solution x 0 ¼ U T yand β ¼ 1, the number of iterations k ¼ 0 and η 0 ¼ 1.

End while
Output: The reconstructed FID signalx.

F I G U R E 1
Reconstructed synthetic spectrum with five peaks when 10% of data are sampled in the NUS. (a) The fully sampled spectrum; (b), (c) and (d) are reconstructed spectra using the nuclear norm-based low-rank Hankel matrix (LRHM), weighted nuclear normbased LRHM and partial sum of singular values LRHM, respectively. 51 samples are used for reconstruction TU ET AL.

| RELATED WORK
In NMR spectroscopy, the time-domain FID signal is commonly modelled as the sum of exponential functions according to [10,11,[17][18][19][20][21][22][26][27][28] where Δt is the time interval; J is the number of spectral peaks; a j ; ϕ j ; τ j and ω j are the amplitude, phase, damping factor and the frequency of the jth spectral peak, respectively. In NUS, an unduplicated n was chosen from partial entries in the index f1; 2; …; Ng where N is the total number of fully sampled data points. Assuming the total number of n is M, then the sampling rate g is defined as g ¼ M=N. The sampled data y ∈ ℂ M in NUS is defined as where U ∈ ℂ M�N is an undersampling operator that defines how to acquire the FID with NUS and ε ∈ ℂ M is the additive Gaussian noise, which is the most common noise in NMR spectroscopy [27] and has been widely applied in NMR spectra reconstruction [10,11,14,17,20,22]. In the LRHM method, the reconstruction of FID signal x is formulated as a nuclear norm minimization problem of the Hankel matrix according to [17] where Rdenotes a Hankel operator converting the x into a Hankel matrix Rx, ‖⋅‖ * represents the nuclear norm (sum of singular values) [29,30], ‖⋅‖ 2 represents the l 2 norm, and λ is a regularization parameter that balances the low rankness and the data consistency. To distinguish this basic model in Equation (3) from other modifications discussed later, we call this method as nuclear norm-based LRHM (NN-LRHM). However, the NN-LRHM provides suboptimal reconstruction when NUS data are few in highly accelerated NMR. Since large singular values usually submerge the small ones in NUS and the amplitude of singular value reflects the intensity of peaks, the reconstruction of low-intensity peaks becomes hard and one may observe the distortion or even disappearance of these weak peaks [19].
To achieve better performance, the low-rank constraint is enforced by minimizing the weighted nuclear norm of the LHRM (WNN-LRHM) [19]. Its core idea is to better approximate the low rankness by assigning a big (or small) weight on the small (or big) singular value [19]. The model is as follows: where w ¼ ½w 1 ; …; w s ; …; w S � T includes a weight w s for the sth singular value σ s .
Although the NMR reconstruction performance of WNN-LRHM is greatly improved, particularly on low-intensity peaks, this improvement is not stable due to the incorrect learning of weights from typical nuclear norm reconstruction (see Section 4.1).

| PROPOSED METHOD
To better recover a general low-rank matrix, a new metric called PSSV was proposed to outperform the nuclear norm in computer vision [31]. One main advantage of PSSV is that it avoids simultaneously minimizing all singular values and provides a F I G U R E 2 Peak intensity correlations for the reconstructed spectra when 10% of data are sampled in the NUS. The error bars are the standard deviations of correlations over 100 sampling trials 16 better approximation to matrix rank. The PSSV has never been extended into the LRHM reconstruction in NMR spectroscopy.
In this work, we propose an LRHM reconstruction model by minimizing the PSSV, which is called PSSV-LRHM, as follows: where P r ð·Þ is the matrix projection operator to the matrix of rank r, and it is defined as where H 1:r and V 1:r are the matrices consisting of the singular vectors corresponding to the r largest singular values of X. The PSSV minimizes the sum of small singular values by subtracts the sum of large singular values ‖P r ðRxÞ‖ * from all singular values ‖Rx‖ * , leading to protection of small singular values in the reconstruction. Since the intensity of spectral peaks is basically reflected by the singular values, preserving the lowintensity peaks is expected in PSSV-LRHM.
To efficiently solve the proposed PSSV-LRHM model, the alternating direction minimization with continuation algorithm [32] is modified to fit the Hankel matrix reconstruction problem. The main steps are derived as follows.
The augmented Lagrange of Equation (6) is where β is a penalty variable; a larger β leads to better approximation to the rank. ‖⋅‖ F represents Frobenius norm and ‖⋅‖ r represents the PSSV with targeted rank r [31]. The algorithm consists of two loops of iterations. In the outer loop, we progressively increase β. In the inner loop, we alternatively solve Equation (8) by two sub-problems: (1) Fixing Z k , x kþ1 is obtained by solving whose solution is (2) Fixing x kþ1 , Z kþ1 is obtained by solving Here, a partial singular value thresholding operator [31] is used to solve Equation (12), whose solution is where the P r;1=β ð·Þ can be expressed as where D Y 1 ¼ diagðσ 1 ; :::; σ r ; 0; :::; 0Þ; ð15Þ :::; 0; σ rþ1 ; :::; σ minðm;nÞ and S τ ½x� ¼ signðxÞmaxð|x | À τ; 0Þ is the soft thresholding operator. The alternating iterations in the two sub-equations of Equation (9) stops if the number of iterations k reaches the maximal number K or the normalized successive difference smaller than a given tolerance η tol . The pseudocode of this reconstruction algorithm is shown in Table 1.

| EXPERIMENTAL RESULTS
The proposed method is compared with two state-of-the-art NMR reconstruction methods, including NN-LRHM and WNN-LRHM. Their performances are tested on both synthetic and realistic NMR data. All the 1D and 2D spectra reconstruction were performed using MATLAB 2017b (Mathworks Inc., Natick, MA) on a personal computer with 1.80 GHz 8 core CPU and 8 GB RAM.

| Synthetic data
Two synthetic spectra are generated according to the exponential function in Equation (1) and their parameters are listed in Tables A1 and A2, respectively. For the first spectrum (Figure 1a), the broadness of peak 1, 2, 3, 4 and 5 are decreased and their intensity is increased accordingly since a larger τ j results in a slower signal attenuation. Therefore, the broadest peak, which is also the lowest one, is peak 1 as shown in Figure 1a. The second spectrum (Figure 4a) contains dense peaks with different intensities. The total number of data points for each synthetic data is 512. Then, Gaussian noise [10,11,14,17,20,22,27] with zero mean and standard deviation of 0.005 is added to the two spectra. Finally, FID are sampled according to Poisson-gap NUS [33].
In general, the larger number of peaks, the greater the number of samples should be. Our empirical observations indicate that a faithful reconstruction usually requires the number of samples M being larger than five times of the number of peaks J, that is, M > 5J. Reconstructions on the first spectrum are shown in Figure 1. With very limited 51 sampled points under the sampling rate 10%, the height of moderate-intensity peak (peak 3 in Figure 1b) is incorrect and low-intensity peaks (peaks 1 and 2 in Figure 1b Figure 1c) are recovered much better by WNN-LRHM, but the peak shape or intensity of low-intensity peaks (peaks 1 and 2 in Figure 1c) is still incorrect. The proposed PSSV-LRHM remarkably improves the broadest and lowest intensity peak (peak 1 in Figure 1d) than other methods and obtains the most consistent spectrum to the ground truth (Figure 1a).
To test the robustness to randomness in NUS, we compare the quantitative analysis on the spectrum intensities correlation over 100 sampling trials. As shown in Figure 2, PSSV-LRHM achieves the highest correlations and the lowest standard deviations of correlations on low-intensity peaks (peaks 1 and 2) and comparable performance on other peaks. The scatter plot of correlations ( Figure 3) for low-intensity peaks imply that, in most cases, PSSV-LRHM obtains higher correlations than other methods. Even when WNN-LRHM gets extremely low correlation that is close to 0.2, the new method obtains correlation that is close to 1.
For the reconstruction of the dense spectrum in Figure 4, the proposed approach obtains more accurate peak intensity than other methods and more consistent spectral shape to the ground truth spectrum (Figure 4a). Thus, the PSSV-LRHM achieves the best reconstruction among three methods.

F I G U R E 9
Reconstructed 2D HSQC spectrum of ubiquitin from 25% data. (a) The fully sampled spectrum; (b)-(d) are reconstructed spectra using nuclear norm-based low-rank Hankel matrix (LRHM), weighted nuclear norm-based LRHM and partial sum of singular values LRHM, respectively. The data size is 98 � 576 with 98 points in the indirect dimension ( 15 N) and 576 points in the direct dimension ( 1 H). The ppm denotes parts per million by frequency, the unit of chemical shift, which is defined in Appendix. HSQC, Heteronuclear single-quantum correlation spectroscopy F I G U R E 1 0 Peak intensity correlation between fully sampled spectrum and reconstructed spectrum on the 2D HSQC of ubiquitin. (a)-(c) are correlation plots of all peaks using nuclear norm-based low-rank Hankel matrix (LRHM), weighted nuclear norm-based LRHM and partial sum of singular values LRHM, respectively. The R 2 denotes the square of Pearson correlation coefficient; the closer the value is to 1, the closer the reconstructed spectrum is to the reference. HSQC, Heteronuclear singlequantum correlation spectroscopy

| NMR data
The realistic 2D NMR data include a 1 H-15 N Heteronuclear single-quantum correlation (HSQC) spectrum of cytosolic CD79b protein [17] and a 1 H-15 N HSQC spectrum of ubiquitin [11]. For realistic 2D NMR, the undersampling is only performed on the indirect dimension, since the data acquisition in this dimension is time consuming due to the evolution of magnetization in NMR physics [13]. Another dimension called direct dimension is fully sampled since the data acquisition is fast.
As shown in Figure 5a, the 2D 1 H-15 N HSQC spectrum of cytosolic CD79b protein is acquired for 300 μM 15 N-13 C labelled sample of cytosolic CD79b in 20mM sodium phosphate buffer, pH 6.7 at 25°C on 800 MHz Bruker AVANCE III HD spectrometer equipped with 3-mm TCI cryoprobe. The fully sampled spectrum consists of 1024 data points in the direct dimension ( 1 H) and 256 data points in the indirect dimension ( 15 N).
The reconstructed 2D NMR spectra are shown in Figure 5. Some missing peaks (Figures 5b,c and 6) are presented in the reconstruction obtained by NN-LRHM and WNN-LRHM, while these distortions are not observed in the PSSV-LRHM reconstruction. The quantitative analysis on the peak intensities correlation in Figures 7 and 8 indicate that the proposed method achieves the highest correlations among three methods.
As shown in Figure 9a, the 2D 1 H-15 N HSQC spectrum of ubiquitin is acquired at 298.2 K temperature on an 800-MHz Bruker spectrometer. The fully sampled spectrum consists of 1024 data points in the direct dimension ( 1 H) and 98 data points in the indirect dimension ( 15 N). According to the Poisson-gap NUS, 25% of both data are non-uniformly sampled in the indirect dimension [33].
The reconstructed 2D NMR spectra are shown in Figure 9. Some pseudo peaks (Figure 9b,c) are presented in the reconstruction obtained by NN-LRHM and WNN-LRHM, while these distortions are not observed in the PSSV-LRHM reconstruction. The quantitative analysis on the peak intensities correlation in Figures 10 and 11 indicate that the proposed method achieves the highest correlations among three methods.
These observations imply that the PSSV-LRHM can reconstruct the 2D NMR spectra more faithfully, especially for low-intensity peaks. F I G U R E 1 1 Peak low intensity correlation between fully sampled spectrum and reconstructed spectrum on the 2D HSQC of ubiquitin. (a)-(c) are correlation plots of low-intensity peaks using nuclear norm-based low-rank Hankel matrix (LRHM), weighted nuclear norm-based LRHM and partial sum of singular values LRHM, respectively. The R 2 denotes the square of Pearson correlation coefficient; the closer the value is to 1, the closer the reconstructed spectrum is to the reference. HSQC, Heteronuclear single-quantum correlation spectroscopy F I G U R E 1 2 Synthetic data reconstruction with high dynamic range peaks when 10% of data are sampled in the non-uniform sampling. (a)-(d) are fully sampled spectrum, reconstructions obtained by the nuclear normbased low-rank Hankel matrix (LRHM), weighted nuclear norm-based LRHM and partial sum of singular values LRHM methods, respectively. The intensity ratio of four peaks is 1:10:100:1000, and the parameters to generate this data are listed in Table A3 TU ET AL. -21

| DISCUSSIONS
To verify the performances of the proposed method and other two methods, their computational time, the reconstruction ability of the minimum low-intensity peak and high dimensional spectra are compared here. Besides, the limitation of the proposed method is discussed.

| Computational time
The time taken to reconstruct 2D HSQC spectrum of cytosolic

| Minimum low-intensity peak
In NMR, the low-intensity peak usually means the intensity of the lowest peak is much lower than that of the highest one. To verify the performance of minimum intensity in reconstruction, a high dynamic range spectrum shown in Figure 12a is generated, whose lowest intensity is 10 À 3 of the highest one. The synthetic spectra are generated according to exponential functions in Equation (1) and their parameters are listed in Table A3.
Results show that, when the sampling rate is 10%, the proposed method (Figure 13d) holds the advantage over NN-LRHM (Figure 13b) and WNN-LRHM (Figure 13c) on the reconstruction of the minimum low-intensity peak (marked with an arrow in Figure 12 and its zoom-in in Figure 13). The intensity of the lowest peak in NN-LRHM, however, is relatively lower than that of fully sampled spectra (Figure 13a).
By increasing the sampled rate to 25% (Figures 14 and 15), high-fidelity reconstruction of the minimum low-intensity peak (Figure 16d) can be achieved while compared methods still introduce strong artefacts (Figure 15b,c).  Table A3 F I G U R E 1 4 Synthetic data reconstruction with high dynamic range peaks when 25% of data are sampled in the non-uniform sampling. (a)-(d) are fully sampled spectrum, reconstructions obtained by the nuclear normbased low-rank Hankel matrix (LRHM), weighted nuclear norm-based LRHM and partial sum of singular values LRHM methods, respectively. The intensity ratio of four peaks is 1:10:100:1000, and the parameters to generate this data are listed in Table A3 Thus, the proposed provide much better reconstruction of the minimum low-intensity peak whose intensity is 10 À 3 of the highest one.

| High-dimension spectra
We add the reconstruction of a synthetic 3D spectrum. The spectrum is with a size 20 � 20 � 20 are obtained by performing the Fourier transform on the 3D FID generated according to Equation (18) with parameters listed in Table A4.
The fully sampled spectrum is shown in Figure 16a. The lowest intensity peak, which is marked with an arrow, is reconstructed much better by the proposed method ( Figure  16d) than NN-LRHM ( Figure 16b) and WNN-LRHM (Figure 16c). The advantage of the PSSV is also confirmed by the lowest reconstruction error, relative l 2 norm error (RLNE), under 10 trials of NUS. Thus, the proposed method can be extended to high-dimensional spectrum reconstruction and still outperforms other methods on achieving lower reconstruction error and preserving lowintensity signals.
The current PSSV-LRHM as well as other LRHM methods work for complex spectra reconstruction but cannot be applied to realistic high-dimensional spectroscopy, such as 3D NMR [10,11,34], since they cannot handle the hypercomplex format in the indirect dimension [35]. We remark this as a valuable future work.

| Limitation by noise
The limitation of the proposed method is the same as other methods. When the noise is relatively high, the lowest peaks may be distorted by all methods, although all other higher-intensity F I G U R E 1 5 Synthetic data reconstruction with high dynamic range peaks when 25% of data are sampled in the NUS. (a)-(d) are zoom-in of marked minimum low-intensity peak in Figure 14a-d. The intensity ratio of four peaks is 1:10:100:1000, and the parameters to generate this data are listed in Table A3 F I G U R E 1 6 Reconstructed 3D synthetic spectrum when 25% of data are sampled in the non-uniform sampling. (a) The fully sampled spectrum. (b)-(d) are reconstructed spectra using nuclear norm-based low-rank Hankel matrix (NN-LRHM), weighted nuclear norm (WNN)-LRHM and the proposed partial sum of singular values (PSSV)-LRHM, respectively. The data size is 20 � 20 � 20. We draw 3D spectra of tensors by using the tensor toolbox. The colour stands for the magnitude of spectra. Gaussian noise with zero mean and a standard deviation of 0.005 is added to the free induction decay. The average reconstruction error, the relative l2 norm error (RLNE), of NN-LRHM, WNN-LRHM and the proposed PSSV-LRHM are 0.101 � 0.013, 0.070 � 0.009 and 0.057 � 0.002, respectively. The number after the '�' denotes the standard deviation of RLNE over 10 sampling trials peaks are reconstructed much better ( Figure 17). In addition, the proposed PSSV obtains slightly higher peak correlation for all peaks than other methods ( Figure 18).

| CONCLUSION
An LRHM method with a PSSV is proposed to reconstruct the undersampled data in accelerated NMR spectroscopy. Experimental results on both synthetic and realistic NMR data demonstrate that the new approach reconstructs spectral peaks more faithfully and robustly, particularly for low-intensity peaks. Possible future work could be extended into higher dimensional NMR, improving the reconstruction under high noise, and incorporating the protection of weak signals into exponential-based [17] and LRHM-based deep learning reconstruction [36], which may enable ultrafast and high-fidelity reconstruction of fast NMR [17,36,37].

F I G U R E 1 7
Reconstructed spectra under a relatively high noise level. (a)-(d) are fully sampled noisy spectra, reconstructions obtained by the nuclear norm-based low-rank Hankel matrix (LRHM), weighted nuclear norm-based LRHM and the proposed partial sum of singular value LRHM methods, respectively. 15% of data are sampled in the non-uniform sampling. The standard deviation of the noise is 0.02, and the intensity of the lowest peak in the fully sampled noise-free spectra is 0.04. Thus, the noise level is half of the lowest spectral peak intensity. The parameters to generate this data are listed in Table A1 F I G U R E 1 8 Peak intensity correlations for the reconstructed spectra when the sampled data are 15%.

APPENDIX
The ppm denotes parts per million by frequency, the unit of chemical shift. The chemical shift is defined as [20]: where f sample denotes resonance frequency of the sample, f ref the absolute resonance frequency of a standard compound measured in the same magnetic field and f spec the frequency of magnetic field strength of spectrometer. The synthetic 3D signal is generated in the time domain as follows [10,27]: e À 1=τ 1;r þj2πf 1;r � i 1 � e À 1=τ 2;r þj2πf 2;r � i 2 � e À 1=τ 3;r þj2πf 3;r where R is the number of exponentials (i.e. spectral peaks), the frequency f n;r is uniform randomly drawn from the interval ½0; 1Þ, a r is the amplitude, τ n;r is the damping factor, and i n ¼ 1; 2; ⋯; 20ðn ¼ 1; 2; 3Þ. The NUS is performed on the indirect plane that is constituted by the variables i 1 and i 2 [18,38].