18 band ANSI S1.11 filter bank based on interpolated finite impulse response technique for hearing aids

A low complexity interpolated finite impulse response (IFIR) based 18 band ANSI S1.11 1/3 octave digital filter bank for hearing aids is proposed and implemented using 65 nm UMC technology in this study. ANSI S1.11 specifications for 1/3 octave Class-2 filters are chosen as design criteria in the proposed method. The strict requirements at lower bands make the hardware implementation of an ANSI S1.11 filter bank complex and difficult in a power critical application like a hearing aid. In the proposed technique, the maximum margin available in the filter specifications is utilised in both upper and lower band edge specifications to reduce the filter order. The IFIR technique is used to reduce the computations at lower bands, and it can be implemented efficiently in hardware without altering the sampling frequency of the input signal. Compared to other recent architectures, a >50% reduction in the total number of filter coefficients is achieved and the group delay is kept <10 ms. The algorithm was implemented in hardware using 65 nm standard cell libraries and its outputs were tested with NAL-NL2 gain prescriptions. The matching errors are within ±1.5 dB and the core power consumption is 0.37 mW.


Introduction
According to the World Health Organisation, around 50% of young people aged between 12-35 years are at risk of hearing loss due to excessive exposure to loud sounds including music from personal audio devices [1]. Fundamental blocks present in the digital signal processing part of a basic digital hearing aid are filter bank (FB), dynamic range compression (DRC), noise reduction (NR) and acoustic feedback cancellation (FBC). Out of these, FB and DRC perform the basic functionality of a hearing aid that is loudness compensation and signal compression while NR and FBC are optional blocks used for further signal enhancement. Hearing aid being a real-time application powered by a small battery, the circuit complexity should be as low as possible meeting the required delay constraint. At the same time, the algorithm should provide enough frequency resolution to keep the matching error between the prescribed gain and initial fitting as small as possible.
Finite impulse response (FIR) filters are preferred over infinite impulse response (IIR) filters because of their linear phase characteristics and inherent stability. Either uniform or non-uniform FB architectures can be used; uniform structures were used in early hearing aids. Lunner and Hellgren [2] proposed an eight band uniform FB based on the interpolated FIR technique [3]. McAllister et al. [4] used a frequency sampling technique in which the FB was based on a comb filter combined with a bank of 167 resonators spaced at 50 Hz intervals over the frequency range 0-8.3 kHz. A frequency-domain implementation of an oversampled, weighted overlap-add (WOLA) FB is described in [5]. In the WOLA architecture, a single prototype filter was replicated to 32 equal bands by DFT modulation. The advantages of the multidimensional logarithmic number system for hardware implementation were utilised in [6] to design an 8-channel uniform FB structure using Kaiser window based parallel FIR filters.
In uniform FBs, a large number of bands is required to get sufficient frequency resolution to match the audiograms at lower frequencies compared to non-uniform architectures, so research interest has shifted to non-uniform architectures for hearing aid applications in recent years. A frequency response masking (FRM) technique was proposed by Lian and Wei [7] for hearing aids. It is a symmetric non-uniform FB which gives higher resolution at lower and higher frequencies but lower resolution at the middle bands; hence the matching error is high at middle frequencies. A hardware implementation of the FRM technique using a 16 band non-uniform FB is proposed in [8]. Even though the matching errors for the algorithm were calculated directly for the audiogram instead of any gain prescription, the maximum matching error is high in some cases since the algorithm was based on the FRM technique, whose frequency domain characteristics differ significantly from those of the cochlea. A technique based on frequency warping is proposed by Lai et al. [9], in which a uniform cosine modulated FB is modified using an all pass transform to obtain a non-uniform structure. Its recursive and complicated structure makes it difficult to meet the specifications in hardware and can cause stability problems. A low complexity multirate ANSI S1.11 18 band FB was proposed in [10], which uses three prototype filters to define the upper octave; the remaining bands are derived from it using a multirate technique. The algorithm suffers from a high group delay (GD), which is not acceptable for hearing aid applications. Further modifications to the multirate ANSI S1.11 18 band structure are proposed by Liu et al. [11] and Yang et al. [12]. Liu et al. [11] reduced the GD to 10 ms by restricting the downsampling factor to four for the lower nine bands. The upper nine bands were designed the same as in [10]. Yang et al. [12] derived all 18 bands from a single prototype filter by using fractional sampling rates. Neither of these architectures follows the ANSI S1.11 specifications, and both run at multiple sampling rates for different bands, which makes the synthesis part complex in hardware implementation and can cause problems when incorporating other multi-band algorithms like NR and DRC.
A three-channel variable band IIR FB was proposed for hearing aid application by Deng et al. in [13]. The design was based on a variable low pass, variable bandpass and variable high pass digital filters. The three filters were obtained from a Chebyshev type-1 low pass filter using frequency transformations. The algorithm suffers from high audiogram matching error along with the disadvantages of being IIR. The hardware implementation of the system may also be difficult considering the requirement of adjusting the bandwidth of each band according to each audiogram.
In this study, we propose a comparatively simple architecture that satisfies ANSI S1.11 specifications with a lesser number of filter coefficients without altering the sampling rate. The band edge frequencies are carefully designed by utilising the maximum margin available to meet the ANSI specifications which reduce the prototype filter's order considerably. The entire FB is implemented on hardware and the hardware design was tested using NAL-NL2 prescriptions for different audiograms, which is not available in the literature for ANSI S1.11 specifications to our knowledge. The background of ANSI S1.11 fractional filter specifications and basic hearing aid working is discussed in Section 2. The proposed architecture is explained in Section 3. Details of hardware implementation are described in Section 4. Simulation results and performance analysis are given in Section 5.

Background
Basic signal processing components present in a hearing aid chip are shown in Fig. 1. In hearing aid fitting, the insertion gains obtained from various prescription formulas are applied to different bands to fit the audiogram. DRC algorithm is used to compress the signal levels, which cross a particular threshold after the application of gain to protect the residual hearing. NR and DRC can be applied as single channel or multichannel. Multichannel DRC [14] and multiband NR [15,16] give better performance over single channel processing.
An audiologist uses various prescription formulas to fit the audiogram. The most commonly used ones are National Acoustics Laboratory-Non Linear1 (NAL-NL1), NAL-NL2 [17,18] and the desired sensation level input/output formula [19]. These methods prescribe the insertion gains in terms of 1/3 octave bands from 160 Hz to 8 kHz, which closely match the spectral characteristics of the cochlea. A sample audiogram and its NAL-NL2 prescription are shown in Fig. 2. The ANSI S1.11 standard defines the specifications for 1/3 octave band pass filters from 25 Hz to 20 kHz [20]. The 18 bands from 157 Hz to 8 kHz are chosen, which cover the entire audiogram frequency spectrum. The centre frequency, f m_n , of the n th band is defined as

$$f_{m\_n} = G^{(n-30)/b}\, f_r \tag{1}$$

where G is the octave ratio, which is 2 for base-2 systems, n is the frequency band number and f r is the reference frequency, which is 1 kHz. For 1/3 octave, b is 3. The band number corresponding to 157 Hz is 22 and that of 8 kHz is 39. The lower and upper pass band edges of the filters are defined as

$$f_{p1\_n} = G^{-1/(2b)}\, f_{m\_n} \tag{2}$$

$$f_{p2\_n} = G^{1/(2b)}\, f_{m\_n} \tag{3}$$

The pass band for each 1/3 octave is defined as f p2_n − f p1_n . The maximum pass band ripple (δ 1 ) for Class-2 filters is specified as ±0.5 dB for each band and the minimum stop band attenuation (δ 2 ) is 60 dB. The stop band edge frequencies for 60 dB attenuation are specified as

$$f_{s1\_n} = 0.184\, f_{m\_n} \tag{4}$$

$$f_{s2\_n} = 5.435\, f_{m\_n} \tag{5}$$
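The band definitions above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the two stop band ratios are assumptions inferred from the band 39 edges quoted in the next section (1472 Hz and 43,478 Hz for a centre frequency of 8 kHz).

```python
G = 2.0         # octave ratio (base-2 system)
B = 3           # bandwidth designator: 1/3 octave
F_REF = 1000.0  # reference frequency in Hz

# Assumption: stop band edge ratios inferred from the band 39 edges
# quoted in the text (1472 Hz and 43,478 Hz at f_m = 8 kHz).
STOP_LOW_RATIO = 1472.0 / 8000.0
STOP_HIGH_RATIO = 43478.0 / 8000.0


def centre_frequency(n):
    """Centre frequency of band n: f_m = G^((n - 30) / b) * f_r."""
    return G ** ((n - 30) / B) * F_REF


def band_edges(n):
    """Return (f_s1, f_p1, f_p2, f_s2) in Hz for band n."""
    f_m = centre_frequency(n)
    f_p1 = f_m * G ** (-1.0 / (2 * B))  # lower pass band edge
    f_p2 = f_m * G ** (1.0 / (2 * B))   # upper pass band edge
    return (STOP_LOW_RATIO * f_m, f_p1, f_p2, STOP_HIGH_RATIO * f_m)


# Print the 18 bands used in the filter bank (band numbers 22 to 39).
for n in range(22, 40):
    edges = ", ".join(f"{f:9.1f}" for f in band_edges(n))
    print(f"band {n}: f_m = {centre_frequency(n):7.1f} Hz, edges = {edges}")
```

With these definitions, band 22 has a centre frequency of about 157 Hz and pass band edges that round to 140 Hz and 177 Hz, and band 39 is centred at 8 kHz.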

Proposed algorithm
The architecture of the proposed FB is shown in Fig. 3. The Parks-McClellan (PM) algorithm is followed for prototype filter design [21]. The order of the filter in the PM algorithm is given by

$$P \simeq \frac{-10\log_{10}(\delta_1 \delta_2) - 13}{2.43\, B_{TW}} \tag{6}$$

where B TW = min(B TW1 , B TW2 ). B TW1 and B TW2 are the transition bandwidths ( f p1 − f s1 ) and ( f s2 − f p2 ), respectively. Equation (6) shows that increasing the minimum transition bandwidth will decrease the order of the filter. According to ANSI S1.11 Class-2 specifications, the band edge frequencies for H 22 , i.e. f s1_22 , f p1_22 , f p2_22 and f s2_22 , are 29, 140, 177 and 853 Hz, respectively. Setting both transition bands to the minimum transition bandwidth, which is 111 Hz, yields the new specification of 29, 140, 177 and 288 Hz. A direct FIR implementation of this specification gives an order of 430, which results in 216 multiplications per sample for band 22 alone and causes a GD of ∼9 ms for a sampling frequency of 24 kHz. Considering the upper band, i.e. H 39 , the ANSI S1.11 Class-2 specifications for the band edge frequencies are 1472, 7127, 8980 and 43,478 Hz. Taking the minimum of the transition bandwidths, the band edge frequencies become 1472, 7127, 8980 and 14,635 Hz. To meet this specification, a sampling frequency of 29,270 Hz is required to satisfy the Nyquist criterion, which is high for hearing aids. In hearing aids, the frequency spectrum is highly restricted by transducer frequency characteristics, generally in the range of 5 kHz for both microphones and receivers. Audiograms are recorded for a frequency spectrum of 160 Hz to 8 kHz and hence a sampling frequency of at least 16 kHz is required. We opted for a sampling frequency of 24 kHz as frequencies beyond 8 kHz are perceptually insignificant in practical hearing aids. It also facilitates a comparison of the performance of the architecture with other ANSI S1.11 based FBs [9-12], which also use a sampling frequency of 24 kHz. So the specifications are modified to the maximum margin possible, i.e. 4107, 7127, 8980 and 12,000 Hz, which gives a filter order of 16. We opted for critical sampling considering the fact that in practical hearing aids, the frequencies beyond 8 kHz do not matter much. This analysis shows that the lowest stop band edge, f s1_22 , is decided by the ANSI S1.11 specifications and the highest stop band edge, f s2_39 , is decided by the sampling frequency, which is still within the required ANSI specifications.
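The band edge modification and the order estimate of (6) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' design script: the function names are hypothetical, B TW is assumed to be normalised to radians per sample, and δ 1 is assumed to be the linear deviation corresponding to 0.5 dB. Equation (6) is only an estimate, so the result for band 22 is of the same magnitude as, but not identical to, the order of 430 quoted above.

```python
import math


def symmetric_edges(f_s1, f_p1, f_p2, f_s2, f_nyquist=None):
    """Set both transition bands to the narrower of the two.

    If f_nyquist is given, the upper stop band edge is also limited to it.
    """
    upper_limit = f_s2 if f_nyquist is None else min(f_s2, f_nyquist)
    b_tw = min(f_p1 - f_s1, upper_limit - f_p2)
    return (f_p1 - b_tw, f_p1, f_p2, f_p2 + b_tw)


def estimated_order(delta_1, delta_2, b_tw_hz, f_sample, k=2.43):
    """Order estimate from (6); k is the constant quoted in the text."""
    # Assumption: transition bandwidth expressed in rad/sample.
    b_tw = 2.0 * math.pi * b_tw_hz / f_sample
    return (-10.0 * math.log10(delta_1 * delta_2) - 13.0) / (k * b_tw)


# Assumption: convert the Class-2 limits from dB to linear deviations.
DELTA_1 = 10 ** (0.5 / 20) - 1   # 0.5 dB pass band ripple
DELTA_2 = 10 ** (-60 / 20)       # 60 dB stop band attenuation

h22 = symmetric_edges(29, 140, 177, 853)
h39 = symmetric_edges(1472, 7127, 8980, 43478, f_nyquist=12000)
print("H22 edges:", h22, "estimated order:",
      round(estimated_order(DELTA_1, DELTA_2, h22[1] - h22[0], 24000)))
print("H39 edges:", h39, "estimated order:",
      math.ceil(estimated_order(DELTA_1, DELTA_2, h39[1] - h39[0], 24000)))
```

The two calls reproduce the modified band edges given in the text for H 22 and for H 39 at a 24 kHz sampling frequency.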
In the proposed architecture, the interpolated FIR (IFIR) technique [22] is used to reduce the computational complexity. The upper nine prototype filters (H 39 -H 31 ) were designed separately using the PM algorithm and interpolated by a factor of 8 to obtain the lower nine filters (H 30 -H 22 ). ANSI S1.11 requirements are used to design filter H 31 so that it will meet the specification for the lowest band, H 22 . Therefore, the band edge specifications for H 31 are 229, 1123, 1414 and 2308 Hz. Based on the above design criteria, the required band edge specifications for filters H 39 , H 31 and H 22 are as given in Table 1. The upper stop band edges for filters H 32 -H 38 are computed using a linear interpolation between the upper stop band edges of H 31 ( f s2_31 ) and H 39 ( f s2_39 ) in octave scale, where f s2_n is the upper stop band edge and f m_n is the centre frequency of the n th band. The calculated band edge frequencies and their corresponding orders for filters H 39 -H 31 are given in Table 2. This new band structure distributes the inter-band interference almost uniformly among all the 18 bands, since the band edges vary linearly from H 31 to H 39 on a logarithmic scale, and it satisfies the ANSI S1.11 specifications with reduced filter order. Fig. 4 shows the plot of inter-band interference from the first and second neighbouring bands to each filter. I r1 is the interference from the first neighbouring band on the right side, I r2 is the interference from the second neighbouring band on the right side, I l1 is the interference from the first neighbouring band on the left side and I l2 is the interference from the second neighbouring band on the left side. It can be observed that the interference from the first neighbouring bands is almost uniformly distributed among all the bands except for the case of interference between bands 9 and 10. Since band 10 is derived from band 18 instead of band 9, there is a noticeable difference in slope in the interference plot between bands 9 and 10. Interference from the second neighbouring bands is <−10 dB in most of the cases, which is negligible compared to the first neighbouring band interference.
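The interpolation step of the IFIR technique can be illustrated with a short sketch. It assumes the standard IFIR construction, in which interpolating a prototype by a factor L means inserting L − 1 zeros between its coefficients, which compresses its frequency response by L and creates images that a separate interpolation filter must remove. The function name and the placeholder coefficients are hypothetical; the prototype order of 54 is inferred from the tap length of 433 quoted for H 22 in the hardware section.

```python
def stretch_impulse_response(h, factor):
    """Insert factor - 1 zeros between taps, i.e. H(z) -> H(z^factor)."""
    stretched = [0.0] * ((len(h) - 1) * factor + 1)
    stretched[::factor] = h  # original taps land on every factor-th position
    return stretched


# Placeholder prototype with 55 taps (order 54); real coefficients would
# come from the Parks-McClellan design of H31.
prototype = [1.0] * 55
h22 = stretch_impulse_response(prototype, 8)
print(len(h22))  # tap length of the interpolated filter
```

Only the non-zero taps need multiplications, so the interpolated filter costs no more multiplications per sample than its prototype even though its delay line is eight times longer.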
The GD of an IFIR combination can be calculated as

$$\mathrm{GD} = \frac{L N_a + N_i}{2 f_s}$$

where L is the interpolation factor, N a and N i represent the orders of the analysis and interpolation filters, respectively, and f s is the sampling frequency.
Since the interpolation factor is high, the constraints on the image cancellation filter specification become tight. Here the transition bandwidth of the interpolation filter is decided by f p2_30 of H 30 and f s1_22 of the first image of H 22 , which gives an order of 118 for the low pass filter. This adversely affects the GD as well as the total multiplications per sample of the entire system. To reduce the GD, two interpolation filters are used instead of one, which relaxes the band edge requirements. For example, let the lower nine band filters be grouped into two as H 30_28 and H 27_22 , and let two low pass filters I 30_28 and I 27_22 be used to cancel out the image frequencies of H 30_28 and H 27_22 . Then the band edge requirements of I 30_28 are decided by f p2_30 of H 30 and f s1_28 of the first image of H 28 , which is higher than that of H 22 , so the order of I 30_28 is obtained as 80. Similarly, the band edge requirements of I 27_22 are decided by f p2_27 of H 27 , which is lower than that of H 30 , and f s1_22 of the first image of H 22 , and the order of I 27_22 is obtained as 50. Now the maximum GD of the entire structure is decided by the maximum of the GDs introduced by the combinations (H 30_28 & I 30_28 ) and (H 27_22 & I 27_22 ), which are 4.67 and 10.04 ms, respectively. Table 3 shows the exploration results of the order of the interpolation filters for all the combinations of the analysis filters in groups of two and their respective GDs. It can be observed that the filter pair (H 30_24 , H 23_22 ) gives the minimum GD of 9.75 ms with a slight increase in the total number of multiplications required for the interpolation filters, to 75.
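The GD figures quoted above can be reproduced with a small sketch, assuming that the delay of a linear phase FIR filter of order N is N/2 samples, so that an analysis filter interpolated by L followed by an interpolation filter gives (L·N a + N i )/(2 f s ). The function name is hypothetical, and the analysis order of 54 and the I 23_22 order of 36 are inferred from the tap lengths of 433 and 37 quoted in the hardware section.

```python
def ifir_group_delay_ms(interp_factor, n_analysis, n_interp, f_sample):
    """Group delay in ms of an interpolated analysis filter followed by
    its image cancellation (interpolation) filter, both linear phase."""
    total_order = interp_factor * n_analysis + n_interp
    return 1000.0 * total_order / (2.0 * f_sample)


F_SAMPLE = 24000  # Hz
N_H22 = 54        # inferred: (433 - 1) / 8

# H22 with the interpolation filter of order 50 (grouping H27_22).
print(ifir_group_delay_ms(8, N_H22, 50, F_SAMPLE))
# H22 with the interpolation filter of order 36 (grouping H23_22).
print(ifir_group_delay_ms(8, N_H22, 36, F_SAMPLE))
```

The second grouping trades a longer interpolation filter on the upper group for a shorter one on the group containing H 22 , which is the combination that sets the overall delay.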

Hardware implementation
The hardware realisation structure of the proposed FB is shown in Fig. 5. The architecture contains a total of 20 filters with odd tap length. The co-efficient symmetry property of the filters is used to reduce the number of multiplications. Table 4 shows the filter orders and the corresponding number of multiplications required for each filter.
The entire system was implemented using 16 bit fixed point arithmetic. The prescribed gains G xy , obtained from the NAL-NL2  formula, are applied to the input signal as it passes through each filter. These gains are stored in programmable registers. Separate input ports are provided to load the gain values and input data. A five bit address is needed to choose between different gains. After interpolation, the tap length of filter H 22 becomes 433, which requires the maximum number of delay registers among all 18 bands. This delay line can be shared between the 18 analysis filters. Two separate delay lines are required for interpolation filters I 30_24 and I 23_22 of lengths 111 and 37, respectively. Nineteen buffer registers are needed to synchronise filters from H 39 to H 31 as shown in Fig. 5. Similarly, 80 buffer registers are needed to synchronise filters from H 30 to H 24 and 48 buffers are needed between filters H 23 and H 22 . The maximum GD of the proposed architecture is due to the combination of filter H 22 and the interpolation filter I 23_22 . So 207 and 35 buffer registers are needed to synchronise the upper nine bands and the output of interpolation filter I 30_24 , respectively. So the proposed FB architecture requires a total of 581 data line registers and 389 synchronisation buffers to satisfy the linear phase condition. The entire FB needs to be synchronised according to three delay lines as shown in Fig. 6  the combined output to the delay line for I 30_24 . Finally, the combination of filters from H 31 to H 39 and the output from I 30_24 should be synchronised with the output of I 23_22 .
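The register totals in this paragraph follow directly from the individual counts, as the following bookkeeping sketch shows. The dictionary keys are descriptive labels chosen here for illustration; the numbers are those quoted in the text.

```python
# Delay line (data) registers: one shared line for the 18 analysis
# filters (sized for H22) plus one line per interpolation filter.
delay_lines = {
    "analysis filters (shared, length of H22)": 433,
    "interpolation filter I30_24": 111,
    "interpolation filter I23_22": 37,
}

# Synchronisation buffers needed to align the outputs.
sync_buffers = {
    "H39 to H31": 19,
    "H30 to H24": 80,
    "H23 and H22": 48,
    "upper nine bands vs. I23_22 output": 207,
    "I30_24 output vs. I23_22 output": 35,
}

total_delay_registers = sum(delay_lines.values())
total_sync_buffers = sum(sync_buffers.values())

# Address width needed to select one of 18 programmable gain registers.
gain_address_bits = (18 - 1).bit_length()

print(total_delay_registers, total_sync_buffers, gain_address_bits)
```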
A multi-MAC based design was followed for the implementation of the proposed algorithm which is preferred for low power consumption [11]. Sampling frequency was chosen as 24 kHz. The entire FB was designed using 30 MAC units with a clock frequency of 384 kHz. Various possibilities for the number of MAC units are explored as given in Table 5. Total number of multiplications per sample for the proposed architecture is 347. If a single MAC unit is used for performing all the multiplications, a clock frequency of 8.328 MHz will be required. Increased clock frequency will cause higher switching power dissipation. If Fig. 7. N represents the order of the filter and k is the count value from 0 to 15. From Fig. 7, it can be observed that the longest flop to flop path constitutes only one multiplier, two adders and two multiplexers. Since I 30_24 requires 56 multiplications per sample (MPY), 14 multiplications per MAC will be enough to complete one convolution operation which will result in 1 stall cycle per MAC. Outputs of all four MAC registers (MAC30_241Reg to MAC30_244Reg) are added once the count value reaches 14 and stored in the output register of filter I 30_24 , which is MAC30_24_Reg as shown in the figure. Therefore, one extra cycle is needed to perform this final addition operation for the outputs of four different MAC units. In the case of filter I 23_22 , only two accumulators are needed instead of four.
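The clock and scheduling figures above can be checked with a short calculation. This is a sketch with hypothetical function names; it assumes that each MAC performs one multiplication per clock cycle and that an odd-length symmetric filter with T taps needs (T + 1)/2 multiplications.

```python
F_SAMPLE = 24_000  # Hz
F_CLOCK = 384_000  # Hz


def symmetric_fir_multiplications(taps):
    """Multiplications per output for an odd-length symmetric FIR filter."""
    return (taps + 1) // 2


def required_clock_hz(multiplications_per_mac, f_sample):
    """Clock needed if one MAC does this many multiplications per sample."""
    return multiplications_per_mac * f_sample


cycles_per_sample = F_CLOCK // F_SAMPLE           # count value k = 0..15
single_mac_clock = required_clock_hz(347, F_SAMPLE)

# Interpolation filter I30_24: 111 taps shared between four MAC units.
mults_i30_24 = symmetric_fir_multiplications(111)
mults_per_mac = mults_i30_24 // 4
final_add_cycles = 1                              # sum of the four MAC registers
stall_cycles = cycles_per_sample - mults_per_mac - final_add_cycles

print(cycles_per_sample, single_mac_clock, mults_i30_24,
      mults_per_mac, stall_cycles)
```

Running all 347 multiplications on a single MAC would need a clock about 22 times higher than the 384 kHz used by the multi-MAC design.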

Simulation results and discussion
All the filter coefficients were obtained using MATLAB ® 2015.2 with the Signal Processing toolbox. The firpm function was used to design the prototype filters. Filter tap lengths were adjusted to the next higher order where required to make all the filters type I, which makes the hardware implementation easier. The frequency response of the designed 18 band FB is shown in Fig. 8. According to ANSI S1.11 Class-2 filter specifications, the passband ripples should be within ±0.5 dB, which is important to keep the final audiogram matching error within ±1.5 dB. However, the inter-band interference from adjacent bands and the contribution of ripples from the interpolation filter can push the overall passband distortion away from the desired specification. So a two-stage gain optimisation procedure was followed, as given in [12], to reduce the effect of inter-band interference and the interpolation filter ripple contribution and to keep the overall matching error within ±1.5 dB. During the first stage of gain optimisation, the average flat band error was kept within ±0.5 dB. The resulting gain offset was added to the prescribed gain, and the same optimisation algorithm was applied during the second stage, after applying the fitting gain, to keep the final fitting error within ±1.5 dB. The flat band response of the proposed FB is shown in Fig. 9.

Comparison with other architectures
A comparison with other architectures available in the literature is given in Table 6. Kuo et al. [10] introduced the use of ANSI S1.11 specifications for fractional octave FBs for hearing aid application. They used multirate architecture to reduce computational complexity. Three prototype filters were used to generate the entire FB along with interpolation and decimation filters. The design was satisfying the specifications with a lesser number of coefficients but the GD (78 ms) was high. Liu et al. [11] modified the architecture to meet the delay constraint of 10 ms by relaxing the ANSI specifications and constricting the downsampling factor to four. They used the same design as in [10] for upper three octaves and lower 3 octaves were designed with a restricted downsampling factor of 4. All the lower six filters were designed with the same order of 97, which is redundant considering the requirements for filters other than filter H 22 . Yang et al. [12], further modified the architecture in such a way that the 18 bands are generated from a single prototype filter by fractional sub-sampling. This architecture is also a relaxed version of ANSI S1.11 specifications. In [12], every band is at a different sampling rate which is difficult to implement practically as other algorithms like DRC and NR are also to be implemented in hearing aids. The modifications to [10] in later publications were mainly focussed on lower 9 bands which were responsible for increased complexity and delay. The upper 9 bands specifications of [11,12] were similar to [10].
In the proposed design, the complexity of the upper 9 bands was reduced by carefully stretching the design criteria to their maximum possible limits within ANSI S1.11 specifications, and the lower 9 band filters were derived from them. This approach resulted in a large reduction in the filter order required for the upper 9 bands and an overall reduction in the total number of filter coefficients required for the 18 band filter structure. The design requires 212 coefficients, which is <50% of other architectures with similar GD. Another ANSI S1.11 architecture available in the literature is [9], which used a non-linear structure based on frequency warping and a combination of cosine modulation and all pass transform. The algorithm was not tested after hardware implementation. Lai et al. also conclude that the architecture will cause phase distortion due to the IIR structure used in it. The linear phase requirement is important if FBC algorithms are to be incorporated in the hearing aid.
Considering the hardware implementations available in the literature for ANSI S1.11 specifications based architectures, [10,11] have implemented only the analysis FB and not the synthesis FB. In [10], authors have given the hardware results for the analysis part, which requires fewer number of storage elements, but its synthesis part requires 3432 buffer registers to meet the linear phase requirement which is large. Moreover, different octaves are running at different sampling frequencies, which will require the generation of different clock frequencies. Handling clock domain crossing issues may become critical in hardware implementation when other complex signal enhancement algorithms like DRC and NR are to be added at different bands. The generation of the different clock frequencies will necessitate overhead logic as well. The synthesis will become complex as buffers of different bands at different frequencies will have to be synchronised even though the architecture has a regular structure. Similar problems can arise in [11] due to its multirate structure.
Compared with the above two architectures, the proposed algorithm uses the same sampling frequency for all the filters at the cost of a slight increase in the number of multiplications per sample. It is more practical to implement, and other signal enhancement algorithms can be incorporated more easily because the sampling rate is not altered.

Hardware results
The entire architecture is implemented using Verilog Hardware Description Language, and synthesised, placed and routed with Cadence ® Design tools using UMC 65 nm standard cell libraries. Post layout core power analysis was done using Cadence Voltus TM IC Power Integrity Solution with a maximum switching activity factor of one for the typical parasitic corner. Audiograms corresponding to the eight most commonly found hearing losses, available in [24], are widely used in research related to audiogram fitting. The prescribed insertion gains for each audiogram were obtained from the NAL-NL2 formula for an input level of 50 dB. The gains were applied to the proposed FB implementation and the matching error for each band was obtained for all the audiograms. The quantisation error was obtained by comparing the 16-bit hardware implementation results with the MATLAB double precision implementation of the proposed architecture. The quantisation error results are given in Table 7. It can be observed that the maximum quantisation error is 0.122 dB. The audiogram matching results are shown in Fig. 10. The maximum matching error for each audiogram is given in Table 8. The results show that the error is <±1 dB in all the cases, which is less than the audible difference of 1.5 dB. Audiogram 6, which has a rapidly changing slope, has the maximum matching error.
Post implementation core power results are given in Table 9 for a supply voltage of 1.2 V. The FB architecture consumes a total power of 0.37 mW. The FB algorithm was combined with a single channel DRC [25] and tested with a sample audio file for the flatband condition and without any compression applied. The input and reconstructed output waveforms are shown in Fig. 11. The FB with DRC consumes a power of 0.39 mW. In state of the art hearing aids, the audio CODEC [26] and transducers [27] generally consume around 0.35 mW of power. To meet the minimum specification of <1 mW [28] for a digital hearing aid, the DSP part has a power margin of around 0.65 mW, most of which is consumed by the FB compared to other signal enhancing algorithms. The proposed algorithm consumes only around 60% of the available power margin with a standard implementation. Power consumption can be further reduced by applying various low power design techniques. The chip has a die area of 0.81 mm² and 71,215 standard cells, of which 43.4% are logic cells, 55.1% are sequential cells and 1.5% are inverters. To compare the power consumption, the entire algorithm was also implemented using 2 MAC units, one for all the 18 analysis bands and one for both the interpolation filters, which requires a clock frequency of 6.528 MHz; the power consumption was obtained as 5.044 mW. It shows that multi MAC

Conclusion
In this study, a complexity-oriented architecture for ANSI S1.11 1/3 octave Class-2 FB for hearing aids is developed. A method to define band edge frequencies of each prototype filter to get minimum order which satisfies the required specifications is proposed. Hardware implementation of the algorithm was carried out using 65 nm standard cell libraries. The hardware output was tested with NAL-NL2 prescription for different audiograms and the maximum matching error was within the practical limit of ±1.5 dB.
The proposed architecture shows >50% reduction in the total number of filter coefficients required and <10 ms GD with practically applicable power consumption. Future work includes incorporating multiband NR algorithm and a multichannel compression algorithm to the proposed FB. Since the entire FB is running at the same sampling frequency, incorporating such algorithms will be easier compared to multirate architectures. The overall performance of the algorithm can be improved further by applying low power hardware implementation techniques. The number of MAC units required may be optimised with a better design space exploration and can change depending upon the filter structure. Developing a generalised solution to find the best possible number of MAC units can also be taken as a future work.