Zero Forcing Block Linear Equalization for TD-SCDMA

Robert Link (1), Kan Zang (2), James Ge (3)

(1: Holley Communications (Canada) Inc.; 2: Holley Information Industry Development Co. Ltd.; 3: Holley Information Industry Group)

1. Abstract

We review zero-forcing block linear equalization (ZF-BLE) for multi-user detection in TD-SCDMA user equipment, using an approximate Cholesky factorization to solve the matrix equation for the least-squares estimate of the data, and we determine the required degree of approximation. We compare this solution with the leading proposed solutions for ZF-BLE and show that, unlike for the high chip rate option of the TDD mode for the UMTS air-interface, this solution has the lowest computational complexity of all solutions that exhibit near-ideal bit error rate performance.

2. Acronyms

AWGN Additive White Gaussian Noise

BER Bit Error Rate

BPSK Binary Phase Shift Keying

CDMA Code Division Multiple Access

CP Chip Period ($T_c$)

DFT Discrete Fourier Transform

FFT Fast Fourier Transform

FLOP Floating Point Operation

GP Guard Period

HCR High Chip Rate

JD Joint Detection

kbps kilobits per second

LCR Low Chip Rate

Mcps Mega Chip Per Second

MIPS Million Instructions Per Second

MMSE-BDFE Minimum Mean-Square-Error Block Decision Feedback Equalizer

MMSE-BLE Minimum Mean-Square-Error Block Linear Equalization (or Equalizer)

QPSK Quadrature Phase Shift Keying

SNR Signal to Noise Ratio ($E_b/N_t$)

TD-CDMA Time Division CDMA (HCR option of TDD UMTS)

TDD Time Division Duplex

TD-SCDMA Time Division Synchronous CDMA (LCR option of TDD UMTS)

UE User Equipment

UMTS Universal Mobile Telecommunications System

ZF-BDFE Zero Forcing Block Decision Feedback Equalizer

ZF-BLE Zero Forcing Block Linear Equalization (or Equalizer)

3. Introduction

It is well known that TD-SCDMA [1], the low chip rate option of the TDD mode for the UMTS air-interface, requires multi-user (joint) detection at the receiver of the node B or the UE to achieve acceptable link performance. The four types of implementable, high-performance joint detectors for CDMA are introduced in [5]. These are based on zero-forcing or minimum mean-square-error block linear equalization with or without decision feedback (ZF-BLE, MMSE-BLE, ZF-BDFE, and MMSE-BDFE). In [6], two simplified related detectors are presented that are applicable only to the situation in which a single-element transmit antenna is employed. However, since this precludes smart antenna technology, one of the hallmarks of TD-SCDMA, we will not consider these two simpler detectors.

The authors of [5] and others, for example in [7], find that the ZF-BLE detector gives vastly superior performance over the conventional RAKE receiver and its decision feedback extension. Furthermore, they also find that the performance advantage of the other three, more complex joint detectors (MMSE-BLE, ZF-BDFE, and MMSE-BDFE) over the ZF-BLE detector is comparatively small. Therefore the ZF-BLE detector is regarded as an entry-level joint detector most suitable for TD-SCDMA systems. Even though the complexity of this detector is the lowest of all the high-performance joint detectors, its complexity is still very much an issue when it comes to implementing it in UE, due to battery lifetime considerations.

Various techniques have been proposed to solve the ZF-BLE equations (see [8], [9], [10] and references therein). In this paper we review the algorithm based on an approximate Cholesky factorization first proposed in [9], and provide floating-point simulation results for the case of TD-SCDMA to determine the degree of approximation required for the factorization. We compare this solution with the leading proposed solutions for ZF-BLE, and show that this solution has the lowest computational complexity of all solutions that exhibit near-ideal bit error rate performance.

In the next section we describe the TD-SCDMA downlink signal model, and follow this with a section giving a detailed description of the ZF-BLE detector that uses the approximate Cholesky factorization. The following section provides a complexity analysis of the method, introduces the other four leading techniques for solving the ZF-BLE equations, and compares all five methods on the basis of computational complexity. The final two sections give the simulated performance results and the conclusion.

4. TD-SCDMA Downlink

4.1 Data Format

The format of a time-slot is shown in Figure 4-1. It consists of two data fields each of 352 chips, and a 144 chip midamble [2].

[Figure: burst layout, left to right: data symbols (352 chips), midamble (144 chips), data symbols (352 chips), GP (16 CP)]

Figure 4-1: Burst Format

The symbol duration $T_s$ depends on the spreading factor $Q$ and the chip duration $T_c$: $T_s = Q \cdot T_c$, where $T_c = 1/\text{chip\_rate}$ and chip_rate is 1.28 Mcps. Each data field contains the data for $K$ code-separated users, with $K$ restricted to a maximum of 10. The number of complex symbols $N_k$ per data field per code $k$ is $352/Q_k$, where $Q_k$ is the spreading factor of code $k$. For the downlink, and for a maximum data rate of 384 kbps, $Q_k$ is restricted to be 16.
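As a quick check of these relations, the following minimal Python sketch evaluates the chip duration, the symbol duration, and the number of symbols per data field for the downlink values quoted above. The variable names are illustrative only.

```python
CHIP_RATE_HZ = 1.28e6       # 1.28 Mcps, as stated in the text
DATA_FIELD_CHIPS = 352      # chips per data field
Q = 16                      # downlink spreading factor

t_c = 1.0 / CHIP_RATE_HZ            # chip duration T_c in seconds
t_s = Q * t_c                       # symbol duration T_s = Q * T_c
n_symbols = DATA_FIELD_CHIPS // Q   # symbols per data field per code

print(f"T_c = {t_c * 1e6:.5f} us")
print(f"T_s = {t_s * 1e6:.2f} us")
print(f"N   = {n_symbols}")
```

With these values each code carries 22 symbols per data field, which is the value of N used in the complexity analysis and the simulations.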

The data modulation is QPSK: each data symbol is generated from two consecutive data bits, using the following mapping to complex symbols:

Table 4-1: QPSK Modulation

The transmitter spreads the symbols with the channelization codes, multiplies the chips by the cell-specific scrambling code, and applies a channel gain [3]. In the following, the combination of channelization code and scrambling code is referred to as the spreading code.
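The transmitter steps just described can be sketched as follows. This is an illustration only: the length-4 codes are hypothetical placeholders (the downlink in the text uses a spreading factor of 16), and the helper names `combined_code` and `spread` are not taken from the specification.

```python
def combined_code(channelization, scrambling):
    """Chip-wise product of a channelization code and a scrambling code."""
    assert len(channelization) == len(scrambling)
    return [c * s for c, s in zip(channelization, scrambling)]


def spread(symbols, code, gain=1.0):
    """Replace every symbol by gain * symbol * code (len(code) chips each)."""
    return [gain * d * chip for d in symbols for chip in code]


# Hypothetical length-4 codes, chosen only to keep the example short.
chan = [1, -1, 1, -1]
scr = [1, 1j, -1, -1j]

sc = combined_code(chan, scr)     # spreading code used in the signal model
chips = spread([1j, -1], sc)      # two symbols -> eight chips
print(sc)
print(chips)
```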

If the RF transmitter uses a single-element antenna, the chips of the K code channels are simply added together at the final stage of the transmitter. However, if an antenna array is employed at the node B, then each code channel is weighted by a steering vector that depends on the user's location. Because of this, the code channels for different users will have different channel responses at the terminal of any particular UE.

4.2 Received Signal Model

By convention, vectors are considered to be column vectors, $(\cdot)^T$ denotes the matrix transpose, and $(\cdot)^H$ denotes the complex conjugate transpose.

Denote the vector of $N$ complex data symbols that are transmitted during one half burst on the $k$'th code channel by

$$d^{(k)} = [d_1^{(k)} \; d_2^{(k)} \; \ldots \; d_N^{(k)}]^T, \quad 1 \le k \le K;$$

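and combine the transmitted data symbols for all $K$ code channels into one vector $d$ of $NK$ symbols by interlacing:

$$d = \mathrm{vec}\{[d^{(1)} \; d^{(2)} \; \ldots \; d^{(K)}]^T\} = [d_1^T \; d_2^T \; \ldots \; d_N^T]^T;$$

where $d_n = [d_n^{(1)} \; d_n^{(2)} \; \ldots \; d_n^{(K)}]^T$ collects the $n$'th symbol of every code channel, and the vec operator forms a column vector from the elements of its argument matrix by concatenating the columns of that matrix, starting from the left.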
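The interlacing can be illustrated with a short Python sketch. It assumes the per-code symbol vectors are given as equal-length lists; the function name `interlace` is illustrative.

```python
def interlace(per_code_symbols):
    """Apply vec to the transposed symbol matrix.

    The output lists symbol 1 of every code channel, then symbol 2 of every
    code channel, and so on.
    """
    n_symbols = len(per_code_symbols[0])
    assert all(len(d_k) == n_symbols for d_k in per_code_symbols)
    return [d_k[n] for n in range(n_symbols) for d_k in per_code_symbols]


# K = 2 code channels with N = 3 symbols each (placeholder values).
d = interlace([[1, 2, 3], [10, 20, 30]])
print(d)
```

The same ordering is applied to the per-antenna quantities in the rest of this section, so that entries belonging to the same time index stay adjacent.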

Denote the $k$'th spreading code of $Q$ complex chips by

$$sc^{(k)} = [sc_1^{(k)} \; sc_2^{(k)} \; \ldots \; sc_Q^{(k)}]^T, \quad 1 \le k \le K;$$

and denote the $W$-tap channel impulse response at the $m$'th antenna of the $k$'th mobile by

$$h^{(k,m)} = [h_1^{(k,m)} \; h_2^{(k,m)} \; \ldots \; h_W^{(k,m)}]^T, \quad 1 \le k \le K, \; 1 \le m \le M.$$

Define the combined channel impulse vector of the $k$'th user at the $m$'th antenna as the convolution of the corresponding channel impulse response with the corresponding spreading code:

$$b^{(k,m)} = [b_1^{(k,m)} \; b_2^{(k,m)} \; \ldots \; b_{Q+W-1}^{(k,m)}]^T = h^{(k,m)} * sc^{(k)};$$
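A minimal sketch of this convolution is given below, using a hypothetical 3-tap channel and a hypothetical 4-chip spreading code so that the Q+W-1 output length is easy to verify by hand.

```python
def convolve(h, sc):
    """Full linear convolution; the result has len(h) + len(sc) - 1 entries."""
    b = [0j] * (len(h) + len(sc) - 1)
    for i, h_i in enumerate(h):
        for j, sc_j in enumerate(sc):
            b[i + j] += h_i * sc_j
    return b


h = [1.0, 0.5j, -0.25]    # hypothetical channel impulse response, W = 3
sc = [1, -1j, -1, 1j]     # hypothetical spreading code, Q = 4
b = convolve(h, sc)       # combined channel impulse vector, Q + W - 1 = 6 entries
print(b)
```

Because the chips of `sc` take values in {1, -1, j, -j} here, every product reduces to a sign change or a swap of real and imaginary parts, which is why forming these vectors is cheap compared with the equalization itself.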

and define the $M(Q+W-1)$-component combined channel impulse vector of the $k$'th user by interlacing the combined vectors at each antenna:

$$b^{(k)} = \mathrm{vec}\{[b^{(k,1)} \; b^{(k,2)} \; \ldots \; b^{(k,M)}]^T\}, \quad 1 \le k \le K.$$

Then, denoting the vector of $NQ+W-1$ complex received chip measurements at the $m$'th antenna by

$$x^{(m)} = [x_1^{(m)} \; x_2^{(m)} \; \ldots \; x_{NQ+W-1}^{(m)}]^T, \quad 1 \le m \le M;$$

and interlacing measurements at the antennas to form the $M(NQ+W-1)$-component complex space-time measurement vector

$$x = \mathrm{vec}\{[x^{(1)} \; x^{(2)} \; \ldots \; x^{(M)}]^T\},$$

it can easily be shown that the received signal is given by the following system equation:

$$x = Ad + n;$$

where $A$ is the $M(NQ+W-1) \times NK$ complex system matrix, and $n$ is formed from the complex noise vector at each antenna

$$n^{(m)} = [n_1^{(m)} \; n_2^{(m)} \; \ldots \; n_{NQ+W-1}^{(m)}]^T, \quad 1 \le m \le M;$$

by interlacing the components at the antennas:

$$n = \mathrm{vec}\{[n^{(1)} \; n^{(2)} \; \ldots \; n^{(M)}]^T\}.$$

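The system matrix $A$ has the following block-Toeplitz structure:

[Figure: block-Toeplitz layout of A; axis labels K (width of one block column) and NK (total width)]

Figure 4-2: Structure of the System Matrix A

where the $M(Q+W-1) \times K$ complex system sub-matrix $V$ is formed from the combined channel impulse vectors of the $K$ users as shown in the following figure:

[Figure: layout of V; axis labels K (columns) and M(Q+W-1) (rows)]

Figure 4-3: Structure of the System Sub-Matrix V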
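The figures are not reproduced here, so the following Python sketch makes explicit one layout that is consistent with the stated dimensions. It assumes, as an inference rather than something read off the figures, that the columns of V are the vectors b^(k) and that the n'th copy of V is shifted down by MQ rows and right by K columns; this is the only placement of N copies that yields M(NQ+W-1) rows and NK columns. The function name `build_system_matrix` is illustrative.

```python
def build_system_matrix(V, N, M, Q):
    """Place N copies of V on a block diagonal with a row step of M*Q.

    V has M*(Q+W-1) rows and K columns. The result has M*(N*Q+W-1) rows
    and N*K columns. The shift pattern is an assumption (see lead-in).
    """
    rows_v, K = len(V), len(V[0])
    W = rows_v // M - Q + 1
    A = [[0j] * (N * K) for _ in range(M * (N * Q + W - 1))]
    for n in range(N):
        for r in range(rows_v):
            for k in range(K):
                A[n * M * Q + r][n * K + k] = V[r][k]
    return A


# Tiny hypothetical case: M = 1, Q = 2, W = 2, K = 2, N = 3.
V = [[1, 2], [3, 4], [5, 6]]
A = build_system_matrix(V, N=3, M=1, Q=2)
for row in A:
    print(row)
```

Consecutive copies of V overlap in M(W-1) rows, and this overlap is what produces inter-symbol interference in the matched-filter outputs.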

5. ZF-BLE Equations and the Cholesky Algorithm

5.1 The ZF-BLE Detector

With the matched-filter outputs defined to be

$$y = A^H x;$$

and the correlation matrix defined to be

$$S = A^H A;$$

the least squares estimate of $d$ from the system equation is

$$\hat{d} = S^{-1} y.$$

A matched-filter detector takes $y$ as its estimate of $d$. Multiplying the matched filter outputs by the inverse of the correlation matrix is known as Zero Forcing Block Linear Equalization because it eliminates the inter-symbol interference and the multiple-access interference of the matched filter detector at the expense of increasing the variance of the noise term. Because of the block-Toeplitz structure of $A$, the Hermitian matrix $S$ is also block-Toeplitz with a band structure:

[Figure: band structure of S; band width labeled $(\rho+1)K$]

Figure 5-1: The Structure of S (for N = 5). Only the dark shaded part needs to be computed.

In general, $S$ has $2\rho+1$ bands, where $\rho$ depends on the degree of overlap between the non-zero sub-matrices of $A$. Specifically, $\rho$ is the largest integer for which

$$(\rho - 1)\,Q \le W - 2.$$

For TD-SCDMA, $Q = W = 16$, and therefore $\rho = 1$.
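The band parameter can be evaluated directly. The sketch below assumes the condition reads (ρ - 1)·Q ≤ W - 2, which reproduces ρ = 1 for Q = W = 16; the second call uses Q = 16 together with the W = 57 that Section 6.3 quotes for TD-CDMA, purely for contrast.

```python
def band_parameter(Q, W):
    """Largest integer rho satisfying (rho - 1) * Q <= W - 2."""
    return 1 + (W - 2) // Q


print(band_parameter(16, 16))   # TD-SCDMA values from the text
print(band_parameter(16, 57))   # longer channel, as quoted for TD-CDMA
```

A larger ρ means more non-zero block bands in S, and therefore more off-diagonal blocks to carry through the factorization.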

An alternate, generally more accurate, ZF-BLE estimate can be obtained by minimizing the variance of the estimated data symbols. In the case that the noise covariance matrix is proportional to the identity matrix, this solution reduces to that above. It is known that the noise covariance matrix can be approximated as the Kronecker product of the temporal covariance matrix and the spatial covariance matrix. Furthermore, measurements for UTRA-TDD have shown that the temporal covariance matrix is well approximated by the identity matrix. In the case that M , the number of UE antennas, is equal to 1, the spatial covariance matrix is trivially proportional to the identity, making the above solution equivalent to the minimum variance solution. See [8] for the more accurate (when M is greater than 1) minimum variance estimate.

5.2 Cholesky Algorithm for Solving the ZF-BLE Equations

The $NK \times NK$ Hermitian matrix $S$ is factored into the product of a lower triangular matrix $L$ and its upper triangular conjugate transpose,

$$S = LL^H,$$

as follows [9]:

for j = 1 : NK

$$L_{jj} = \sqrt{S_{jj} - \sum_{k=1}^{j-1} L_{jk} L_{jk}^{*}}$$

for i = j + 1 : NK

$$L_{ij} = \frac{1}{L_{jj}} \left( S_{ij} - \sum_{k=1}^{j-1} L_{ik} L_{jk}^{*} \right)$$

end

end
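The loop above translates almost line for line into Python. The sketch below is a plain, unoptimized version for a small dense Hermitian positive definite matrix; it ignores the band structure of S, and the 2×2 test matrix is a hypothetical example.

```python
import math


def cholesky_lower(S):
    """Return lower triangular L with S = L L^H (S Hermitian, positive definite)."""
    n = len(S)
    L = [[0j] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: S_jj minus the squared magnitudes already in row j.
        acc = S[j][j] - sum(L[j][k] * L[j][k].conjugate() for k in range(j))
        L[j][j] = math.sqrt(acc.real)
        for i in range(j + 1, n):
            acc = S[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = acc / L[j][j]
    return L


S = [[4, 2j], [-2j, 5]]    # hypothetical Hermitian positive definite matrix
L = cholesky_lower(S)
print(L)
```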

If we define

$$z = L^H \hat{d};$$

then the ZF-BLE matrix equation becomes $Lz = y$. We can then solve this latter equation for $z$ by using forward substitution, after which we can solve the previous equation for $\hat{d}$ by using backward substitution.

Forward substitution:

for i = 1 : NK

$$z_i = \frac{1}{L_{ii}} \left( y_i - \sum_{j=1}^{i-1} L_{ij} z_j \right)$$

end

Backward substitution:

for i = NK : -1 : 1

$$\hat{d}_i = \frac{1}{L^H_{ii}} \left( z_i - \sum_{j=i+1}^{NK} L^H_{ij} \hat{d}_j \right)$$

end
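The two substitution passes can be sketched in Python as follows. The factor L and the right-hand side y are hypothetical: L is the Cholesky factor of S = [[4, 2j], [-2j, 5]], and y was formed as S times the symbol vector [1+1j, -1], so the recovered estimate should equal that vector.

```python
def forward_substitution(L, y):
    """Solve L z = y for lower triangular L."""
    n = len(y)
    z = [0j] * n
    for i in range(n):
        acc = y[i] - sum(L[i][j] * z[j] for j in range(i))
        z[i] = acc / L[i][i]
    return z


def backward_substitution(L, z):
    """Solve L^H d = z, using (L^H)[i][j] = conj(L[j][i])."""
    n = len(z)
    d = [0j] * n
    for i in range(n - 1, -1, -1):
        acc = z[i] - sum(L[j][i].conjugate() * d[j] for j in range(i + 1, n))
        d[i] = acc / L[i][i].conjugate()
    return d


L = [[2.0, 0.0], [-1j, 2.0]]     # factor of S = [[4, 2j], [-2j, 5]]
y = [4 + 2j, -3 - 2j]            # S applied to [1+1j, -1]
z = forward_substitution(L, y)
d_hat = backward_substitution(L, z)
print(z)
print(d_hat)
```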

Unfortunately the Cholesky factorization of the $NK \times NK$ matrix $S$ requires $O((NK)^3)$ floating point operations, making this exact solution of the ZF-BLE equations prohibitively complex. In the next section we cover an approximate Cholesky factorization [9] which requires only an exact Cholesky factorization of one, or for more accuracy two, $K \times K$ matrices, significantly reducing the required number of operations.

5.3 Approximate Cholesky Factorization

While the correlation matrix $S$ is block $(2\rho+1)$-diagonal and block-Toeplitz, the resulting lower triangular Cholesky factor $L$ is block-banded (with $\rho+1$ block diagonals) but not block-Toeplitz. However, it is approximately block-Toeplitz: it suffices to calculate the first few block-columns (or block-rows) of $L$, and then assume that the remaining block-columns (or block-rows) are identical to the last computed block-column (or block-row). The number of block-columns to compute before setting the remaining block-columns equal to the last one is known as the replication index $i_m$.

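With $\rho = 1$, $S$ can be written as the $N \times N$ block-matrix:

$$S = \begin{bmatrix} X_{11} & X_{21}^H & & & \\ X_{21} & X_{11} & X_{21}^H & & \\ & \ddots & \ddots & \ddots & \\ & & X_{21} & X_{11} & X_{21}^H \\ & & & X_{21} & X_{11} \end{bmatrix},$$

where each block is a $K \times K$ matrix.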

For replication index $i_m = 1$, the approximate lower triangular Cholesky factor $L$ can be written as the $N \times N$ block-matrix:

$$L = \begin{bmatrix} U_1^H & & & \\ F_{11} & U_1^H & & \\ & \ddots & \ddots & \\ & & F_{11} & U_1^H \end{bmatrix},$$

where each block is a $K \times K$ matrix. These blocks are derived by the following two equations:

$$U_1 = \mathrm{chol}[X_{11}]$$

$$F_{11} = X_{21} U_1^{-1}.$$

Here chol returns the upper triangular Cholesky factor of its argument. Since $U_1$ is upper triangular, the second of these equations, $F_{11} U_1 = X_{21}$, can be solved using back substitution.

For replication index $i_m = 2$, the approximate lower triangular Cholesky factor $L$ can be written as the $N \times N$ block-matrix:

$$L = \begin{bmatrix} U_1^H & & & & \\ F_{11} & U_2^H & & & \\ & F_{12} & U_2^H & & \\ & & \ddots & \ddots & \\ & & & F_{12} & U_2^H \end{bmatrix},$$

where each block is a $K \times K$ matrix. These blocks are derived by the following four equations:

$$U_1 = \mathrm{chol}[X_{11}]$$

$$F_{11} = X_{21} U_1^{-1}$$

$$U_2 = \mathrm{chol}[X_{11} - F_{11} F_{11}^H]$$

$$F_{12} = X_{21} U_2^{-1}.$$

Later, simulations under various fading channel conditions will show that a replication index of 2 provides an approximation accurate enough that there is no noticeable performance degradation of the ZF-BLE.

The ZF-BLE equations are then solved as before using forward and backward substitution.
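To make the effect of the replication index concrete, the following Python sketch applies the same recursion with K = 1, so that every block is a real scalar and chol reduces to a square root. The values X11 = 4 and X21 = 1 are hypothetical, and extending the recursion beyond a replication index of 2 is simply the exact block recursion continued, which is an assumption about how the scheme generalizes. The sketch reports the largest entry-wise difference between the product of the approximate factor with its conjugate transpose and S.

```python
import math


def approx_factor(x11, x21, n_blocks, rep_index):
    """Diagonal and sub-diagonal of the approximate factor for K = 1."""
    diag, sub = [], []
    u = math.sqrt(x11)                  # U_1 = chol[X_11]
    for n in range(n_blocks):
        diag.append(u)
        if n < n_blocks - 1:
            f = x21 / u                 # F = X_21 U^{-1}
            sub.append(f)
            if n + 1 < rep_index:       # refine only the first rep_index blocks
                u = math.sqrt(x11 - f * f)
    return diag, sub


def max_reconstruction_error(x11, x21, diag, sub):
    """Largest deviation of L L^H from the tridiagonal Toeplitz S."""
    err = 0.0
    for n in range(len(diag)):
        s_nn = diag[n] ** 2 + (sub[n - 1] ** 2 if n > 0 else 0.0)
        err = max(err, abs(s_nn - x11))
        if n > 0:
            err = max(err, abs(sub[n - 1] * diag[n - 1] - x21))
    return err


N = 5
errors = {}
for rep_index in (1, 2, N):
    diag, sub = approx_factor(4.0, 1.0, N, rep_index)
    errors[rep_index] = max_reconstruction_error(4.0, 1.0, diag, sub)
    print(rep_index, errors[rep_index])
```

In this toy case the error drops by more than an order of magnitude when going from replication index 1 to 2, and vanishes (up to rounding) when the index equals N, where the factorization is exact.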

6. Complexity Analysis

6.1 Cholesky Algorithm Complexity

Table 6-1 shows the number of real multiplications, divisions and square roots required for JD processing of 1 time-slot with L data blocks [footnote 1] (L is equal to 2 for TD-SCDMA), for the case that the correlation matrix is block tri-diagonal (i.e. ρ = 1), using the Cholesky algorithm (with approximate factorization) of the previous section. When the replication index $i_m$ is equal to N, the Cholesky factorization becomes exact.

Table 6-1: Number of FLOPs for JD Processing 1 Time-Slot Using Cholesky Algorithm

We use these three operation counts as the basis of comparison with other methods because they are the most expensive to implement. The formation of the system matrix A requires only the operation of sign inversion with accumulation, and the number of these (real) operations required per time-slot is given by:

[Equation: number of real sign-inversion-and-accumulate operations per time-slot, a function of M, K, Q, and W]

Consider the MIPS requirement of a DSP implementation assuming: 1 cycle or instruction for a real inversion and add, or a real multiply, or a real multiply and add, or a real multiply and subtract; 6 cycles or instructions for a real division; and 10 cycles or instructions for a real square root. Then for replication index two, and parameters M=1, K=10, L=2, N=22, Q=16, W=16, the processing of a single time-slot in 5 ms would require approximately 30 MIPS. To support a 384 kbps data rate, 4 such time-slots would need to be processed in 5 ms, requiring 120 MIPS. For UE with 2 antennas (M=2), 182 MIPS would be required to support 384 kbps data.
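The arithmetic behind these figures can be sketched as follows. The cycle weights are those stated above; the per-slot cycle count of 150 000 is not taken from Table 6-1 but is backed out from the quoted figure of approximately 30 MIPS for one time-slot in 5 ms, so it should be read as an illustrative round number.

```python
def weighted_cycles(n_mul, n_div, n_sqrt):
    """Cycle estimate: 1 per multiply, 6 per division, 10 per square root."""
    return n_mul * 1 + n_div * 6 + n_sqrt * 10


def required_mips(cycles_per_slot, slots, frame_seconds=5e-3):
    """Millions of instructions per second to process `slots` slots per frame."""
    return cycles_per_slot * slots / frame_seconds / 1e6


cycles_per_slot = 150_000                  # implied by ~30 MIPS over 5 ms
print(required_mips(cycles_per_slot, 1))   # one time-slot per 5 ms
print(required_mips(cycles_per_slot, 4))   # four time-slots for 384 kbps
```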

6.2 Other Leading Methods

We compare the Cholesky Algorithm detailed herein for solving the ZF-BLE equations with 4 other leading methods: Block-Levinson Algorithm, Block-Schur Algorithm, Block-Fourier Algorithm, and Fourier-Block Algorithm. The Block-Levinson, Block-Schur, Block-Fourier, and Cholesky algorithms are discussed and compared for TD-CDMA in [8]; while the Fourier-Block algorithm is proposed and analyzed in [10].

Footnote 1: L is used to denote both the Cholesky matrix and the number of data blocks in a time-slot. This seemingly unfortunate choice of notation has been made so that we correspond with the notation used in the references as closely as possible.

6.2.1 Levinson Algorithm

The Levinson algorithm can compute the solution to a system of linear equations with a Hermitian, Toeplitz, positive definite matrix of size $n \times n$ with only $O(n^2)$ operations. The Block-Levinson algorithm extends the algorithm to the case that the matrix is block-Toeplitz. The block algorithm is recursive with N main iterations of computations on $K \times K$ matrices. Approximation can be introduced by setting two internal parameters (α and η) to zero after several iterations of the main loop. Further approximation can be introduced by shortening the length of the vector operations on two internal block-vectors (Y and W). In the following, “n iterations” will mean that α and η are set to zero after n-1 iterations, and n blocks of Y and of W are updated at each iteration [8].

6.2.2 Schur Algorithm

The Schur algorithm efficiently finds the triangular factor R of a QR representation of the system matrix A, where R is again the Cholesky factor of S. The Block-Schur algorithm works with a representation of S that contains much less redundancy than S itself. The matrix R becomes available block-row by block-row and, as in the approximate Cholesky technique, an approximation is introduced by simply stopping the algorithm after enough block-rows have been computed [11].

6.2.3 Block-Fourier Algorithm

In the Block-Fourier algorithm, the system matrix A is extended to be block-circulant (this introduces approximation) and the ZF-BLE equations are block-diagonalized by a block-Fourier transform (implemented using FFTs). However, the sub-matrices along the diagonal of the transformed block-diagonal matrix are unstructured, and the un-approximated Cholesky decomposition is applied to invert these. A further computational reduction is achieved by partitioning the data vector into smaller blocks, and using the overlap-save technique to avoid edge distortion introduced by the partitioning [8]. In the following, the partitioned blocks will be denoted by (FFT length, pre-lap, post-lap).

6.2.4 Fourier-Block Algorithm

Finally, in the Fourier-Block algorithm, rather than arranging the data vector as herein to obtain a block-Toeplitz system matrix, the data is arranged as in [5] to obtain a Toeplitz-block structure. The resulting correlation matrix S is composed of $K^2$ Toeplitz $N \times N$ sub-matrices. Each sub-matrix is then approximated by a circulant matrix, and a Fourier transform is applied that simultaneously diagonalizes each of these, so that in frequency-space, S is composed of $K^2$ diagonal $N \times N$ sub-matrices. The Cholesky factorization of this matrix can be performed in $O(K^3 N)$ operations. The ZF-BLE equations are then solved in frequency-space using forward and backward substitution [10].
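Both Fourier-based methods rest on the fact that a circulant matrix is diagonalized by the DFT. The Python sketch below illustrates this for a single scalar circulant system (the K = 1 case), not for either full algorithm; the first column `c` and the vector `x_true` are hypothetical, and a plain O(n²) DFT stands in for the FFT.

```python
import cmath


def dft(v, inverse=False):
    """Plain O(n^2) DFT; an FFT would be used in practice."""
    n = len(v)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by x."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def circulant_solve(c, y):
    """Solve C x = y: the eigenvalues of C are the DFT of its first column."""
    eigenvalues = dft(c)
    y_f = dft(y)
    x_f = [y_k / lam_k for y_k, lam_k in zip(y_f, eigenvalues)]
    return dft(x_f, inverse=True)


c = [4.0, 1.0, 0.0, 1.0]          # circulant version of a tridiagonal Toeplitz
x_true = [1.0, -1.0, 1j, 2.0]
y = circulant_matvec(c, x_true)
x_hat = circulant_solve(c, y)
print(x_hat)
```

With K users the frequency-domain problem is no longer scalar: each frequency bin carries a K×K matrix, which is why a Cholesky factorization is still needed after the transform.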

6.3 Comparative Complexity

Although the two Fourier-based methods also use Cholesky factorization, unless otherwise stated, “Cholesky algorithm” herein refers to the direct (non-Fourier) method of solution using the approximate Cholesky factorization as detailed in section 5.

Recall that TD-SCDMA is the low chip rate option, while TD-CDMA refers to the high chip rate option, of the TDD mode for the UMTS air-interface. In [8], [9] and [10], TD-CDMA is considered, where W is much larger (equal to 57) due to the higher chipping rate, and the data block length N is also larger due to the larger number of chips per time-slot. The authors of [8] find that for TD-CDMA parameters (at N=60) the Block-Fourier algorithm is about twice as efficient as the Cholesky, Block-Levinson, or Block-Schur algorithms, with the latter three all very close in efficiency (Block-Schur being the most expensive of these three). However, as N decreases (still with large W) they find all four algorithms to become very close in computational efficiency. The authors of [10] find that for TD-CDMA their Fourier-Block algorithm is significantly more efficient than the Block-Fourier algorithm, and therefore the most efficient of all methods proposed thus far. However, we find for TD-SCDMA that the Block-Fourier and Fourier-Block algorithms are very close in efficiency and are significantly computationally cheaper than the Block-Levinson or Block-Schur algorithms. In terms of multiplications, the Fourier-Block algorithm is also cheaper than the Cholesky algorithm; but when the other floating-point operations are also taken into consideration, the Cholesky algorithm actually has the lowest computational complexity of all methods for comparable performance.

In Table 6-2 and Table 6-3 all five algorithms are compared in terms of the floating point operation counts of multiplication, division, and square root, for JD processing in single-antenna and dual-antenna UE, respectively, for the worst case of 10 simultaneous users per time-slot. As in the previous section, we form a total from these by assuming a real multiplication to cost 1 FLOP, a real division to cost 6 FLOPs, and a real square root to cost 10 FLOPs. The operation counts for the Block-Levinson, Block-Schur, and Block-Fourier algorithms were obtained by applying the program complexity.m found online at [12]; the operation counts for the Fourier-Block algorithm were obtained as described in section 6.3.1; and the operation counts for the Cholesky algorithm were obtained from Table 6-1. The approximation parameters for the four algorithms that we are comparing against are chosen such that each algorithm achieves near-ideal BER performance. In the simulation results section we will find that the replication index must be taken to be 2 for the Cholesky algorithm to have near-ideal performance for TD-SCDMA. This is also what is found for TD-CDMA in [9]. For the other four algorithms we assume that the approximation parameters that gave near-ideal performance for TD-CDMA also give near-ideal performance for TD-SCDMA.

We see from Table 6-2 and Table 6-3 that, for near-ideal performance, the Cholesky algorithm (with replication index 2) is the cheapest, easily beating the Block-Levinson and Block-Schur algorithms, and slightly beating the Fourier algorithms. For M equal to 1, the Cholesky algorithm just beats the otherwise best Block-Fourier (16,2,3) algorithm, at approximately 139.8K versus 141.4K FLOPs. For M equal to 2, the Cholesky algorithm beats the otherwise best Fourier-Block algorithm, at approximately 207.2K versus 215.5K FLOPs. While a 1.6K and an 8.3K FLOP difference may seem nearly insignificant, the Cholesky algorithm is further favored by the fact that it is much more straightforward than the Fourier algorithms. Both Fourier algorithms still require a Cholesky factorization, because in both techniques the Fourier transform cannot completely diagonalize the correlation matrix.

Table 6-2: Required Number of FLOPs for JD Processing 1 Time-Slot; M=1, K=10, L=2, N=22, Q=16, W=16

Table 6-3: Required Number of FLOPs for JD Processing 1 Time-Slot; M=2, K=10, L=2, N=22, Q=16, W=16

6.3.1 Fourier-Block Computational Requirements

The number of complex multiplications for JD processing of a single data block using the Fourier-Block method is given by the formulas of the F-ZF-BLE section in Table I of [10], except that the first two entries (multiplication by $A^H$, and computation of $A^H A$) must be multiplied by M (in [10], antenna arrays are not considered). Note that for all entries except the first two, N must be taken to be 32, the smallest power of 2 greater than 22, so that the DFTs can be performed by FFTs. Furthermore, one must take into account that there are L data blocks per time-slot (L=2), so that the multiplication by $A^H$, the FFT, and the forward-back substitution entries must all be multiplied by L when considering JD processing of a time-slot.

The numbers of divisions and square roots were not tabulated in [10]; however, these are given by $4LNK + N(K^2-K)/2$ and $NK$, respectively, where again N is 32.
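For the parameters used in the comparison tables (K=10, L=2, and N padded from 22 to 32), these two expressions can be evaluated with a few lines of Python. The helper name `next_power_of_two` is illustrative.

```python
def next_power_of_two(n):
    """Smallest power of two that is not less than n."""
    p = 1
    while p < n:
        p *= 2
    return p


K, L = 10, 2
N_fft = next_power_of_two(22)                         # FFT length replacing N = 22

divisions = 4 * L * N_fft * K + N_fft * (K * K - K) // 2
square_roots = N_fft * K

print(N_fft, divisions, square_roots)
print(6 * divisions + 10 * square_roots)              # weighted by 6 and 10 FLOPs
```

With the weights of 6 and 10 FLOPs used in the tables, the divisions and square roots alone contribute 27 200 FLOPs per time-slot to the Fourier-Block total.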

7. Downlink Simulation Results

Here we give the results of the floating-point, chip-rate, base-band simulation, assuming an omni-directional base-station antenna, a single UE antenna, ideal channel estimation and timing synchronization at the UE, and no forward power control. It was also assumed that the receiver knew which spreading codes were in use. In every case, the spreading factor (number of chips per symbol) was 16; correspondingly, the number of symbols N per user per data field was 22.

A demodulator recovers soft data bits from the soft symbols, and a decoder decodes a soft bit that is greater (less) than 0 to the bit 0 (1). This function is only part of the simulation test-bench, and will not exist in an implementation where the soft bits are read from the demodulator by the outer receiver for eventual channel decoding. Decoding the uncoded bits in this manner allows us to evaluate the performance of the inner receiver by itself before integrating with channel codecs.

7.1 AWGN Only

In the absence of the Rx and Tx low-pass filters and with no multi-path fading, the code channels remain orthogonal at the receiver. In this case spreading does not affect the BER, and we have confirmed, by testing over an SNR range of –1 to +5 dB, that the simulation has the same BER performance as theoretical BPSK, independent of the number of code channels:

$$\mathrm{BER} = 0.5 \cdot \mathrm{erfc}\left(\sqrt{\mathrm{bit\_snr}}\right);$$

where erfc is the complementary error function, and bit_snr is $E_b/N_t$.
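The theoretical BPSK reference curve is easy to tabulate. The sketch below assumes the standard expression BER = 0.5·erfc(√bit_snr), with bit_snr converted from decibels to a linear ratio, and prints it over the -1 to +5 dB range mentioned above.

```python
import math


def ber_bpsk(snr_db):
    """Theoretical BPSK bit error rate in AWGN for E_b/N_t given in dB."""
    bit_snr = 10.0 ** (snr_db / 10.0)     # dB to linear ratio
    return 0.5 * math.erfc(math.sqrt(bit_snr))


for snr_db in range(-1, 6):
    print(f"{snr_db:+d} dB  BER = {ber_bpsk(snr_db):.3e}")
```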

7.2 Multi-Path Fading

In [4], three cases of multi-path fading environments are used for conformance testing. These cases are defined in Table 7-1. We use the generalized Jakes model of [13], with parameter $N_0$ equal to 8, to simulate the fading channel. The carrier frequency was 900 MHz.

Table 7-1: Propagation Conditions for Multi-Path Fading Environments

Our BER versus SNR simulation results for cases 1, 2, and 3 are shown in Figure 7-1, Figure 7-2, and Figure 7-3, respectively. For cases 1 and 2 each point represents the average BER over a simulation run of 36000 RF frames, whereas in the high Doppler rate case it was only necessary to run the simulation for 3600 RF frames to obtain accurate statistics.

When there is only a single user, there is no multi-user interference; and therefore the results for a single user also represent a limit on the best possible performance of a multi-user detector in the case of multiple simultaneous users. We have checked that in the single user case, the Matched Filter and the ZF-BLE detectors give identical performance, which is displayed by the curve marked “Single User” in the figures. The matched filter is equivalent to a RAKE detector.

To test the multi-user detector performance we considered the extreme case of 10 simultaneous users. In fading case 1, the matched filter gives reasonably good results for the multi-user case because the fading channel has only one dominant path (the second path is a full 10 dB lower in power). The fact that the paths interfere with each other is what breaks the code orthogonality between the users. Indeed, we see that in cases 2 and 3 the matched filter performance is so poor that it is effectively inoperable: it is impossible to supply enough transmit power to drive the BER below 1%. For cdmaOne, where the spreading factor is 64, a RAKE receiver rejects multi-path interference quite well. In TD-SCDMA, with the low spreading factor of 16, a RAKE receiver is inadequate, and equalization becomes necessary.

We have used the Cholesky algorithm with approximate factorization, with replication index 1 and with replication index 2, to solve the ZF-BLE matrix equation as presented in section 5. We have checked that, in all three cases shown in Figure 7-1, Figure 7-2, and Figure 7-3, the replication index 2 approximation results in BER performance that is indistinguishable from the performance attained when solving the ZF-BLE matrix equation using an exact Cholesky factorization. Therefore we have not shown the exact ZF-BLE curve, as in each case it is indistinguishable from the displayed curve “ZF-BLE Rep Index 2 Cholesky K=10”, which uses the replication index 2 approximation. With replication index 1, the approximate Cholesky technique suffers from some loss relative to the exact solution. This loss is zero for fading case 1, but for fading case 2 the loss increases with increasing SNR, up to approximately 1 dB at a BER of 0.1%.

In every case tested, the multi-user performance of the exact ZF-BLE detector is reasonably close to the single-user theoretical limit. In cases 1, 2, and 3 at a BER of 1% the loss is 0.8, 1.2, and 2.2 dB, respectively. When forward power control and the channel codecs are integrated with this simulation model, we will be able to compare with the performance standard [4].

Figure 7-1: Uncoded BER vs. SNR, Multi-Path Fading Case 1

Figure 7-2: Uncoded BER vs. SNR, Multi-Path Fading Case 2

Figure 7-3: Uncoded BER vs. SNR, Multi-Path Fading Case 3

8. Conclusion

The exact ZF-BLE detector gives good performance, and because the replication index 2 approximate Cholesky technique (also referred to as the Cholesky algorithm) has the lowest computational complexity of all the techniques that give near-exact ZF-BLE detector performance, we recommend the ZF-BLE Replication Index 2 Cholesky detector as the best choice for TD-SCDMA UE.

9. References

[1] 3GPP TS 25.201: “Physical Layer General Description”, Release 4, V4.3.0, 2002-06.

[2] 3GPP TS 25.221: “Physical Channels and Mapping of Transport Channels onto Physical Channels (TDD)”, Release 4, V4.7.0, 2002-12.

[3] 3GPP TS 25.223: “Spreading and Modulation (TDD)”, Release 4, V4.5.0, 2002-12.

[4] 3GPP TS 34.122: “Terminal Conformance Specification; Radio Transmission and Reception (TDD)”, Release 4, V4.6.0, 2002-12.

[5] A. Klein, G.W. Kaleh, and P.W. Baier, “Zero Forcing and Minimum Mean-Square-Error Equalization for Multiuser Detection in Code-Division Multiple-Access Channels”, IEEE Transactions on Vehicular Technology, Vol. 45, No. 2, May 1996.

[6] A. Klein, “Data Detection Algorithms Specially Designed for the Downlink of CDMA Mobile Radio Systems”, IEEE Vehicular Technology Conference, pp. 203-207, 1997.

[7] Y.H. Wang, M.Y. Sun, and S.W. Oh, “Comparisons of Inner Receivers for TD-SCDMA User Equipment”, SNUG Singapore, 2003.

[8] M. Vollmer, M. Haardt, and J. Gotze, “Comparative Study of Joint-Detection Techniques for TD-CDMA based Mobile Radio Systems”, IEEE Journal on Selected Areas in Communications, Vol. 19, Issue 8, Aug. 2001.

[9] H. Karimi and N. Anderson, “A Novel and Efficient Solution to Block-Based Joint-Detection using Approximate Cholesky Factorization”, The Ninth IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Vol. 3, Sept. 1998.

[10] N. Benvenuto and G. Sostrato, “Joint Detection with Low Computational Complexity for Hybrid TD-CDMA Systems”, IEEE Journal on Selected Areas in Communications, Vol. 19, No. 1, Jan. 2001.

[11] M. Vollmer, M. Haardt, and J. Gotze, “Schur Algorithms for Joint Detection in TD-CDMA Based Mobile Radio Systems”, Annals of Telecommunications, Vol. 54, No. 7-8, July-Aug. 1999.

[12] M. Vollmer, “Programs for computing computational requirements of four JD algorithms”, available online at http://www-dt.e-technik.uni-dortmund.de/mitarbeiter/mvo/compreq.html.

[13] Y. Li and X. Huang, “The Simulation of Independent Rayleigh Faders”, IEEE Transactions on Communications, Vol. 50, No. 9, Sept. 2002.
