# AN IMPLEMENTATION OF MULTI-DSP SYSTEM ARCHITECTURE FOR PROCESSING VARIANT LENGTH FRAME FOR WEATHER RADAR

#### Min WonJun, Han II, Kang DokGil and Kim JangSu

Institute of Information Science, Kim Il Sung University, D.P.R. of Korea

#### Abstract

In this paper we propose a method for implementing a multi-DSP system in which four DSPs are coupled through an FPGA to process variant-length frames and noise, and we present experimental results. The system receives variant-length frames by combining EDMA and McBSP and increases parallel processing speed. The device has broad prospects in signal processing systems, sonar, and real-time image processing systems.

Keywords:

DSP, FPGA, Digital Signal Processing System, Variant Length Frame, PD Radar

## **1. INTRODUCTION**

Real-time parallel processing on multiple DSPs increases throughput and improves the effectiveness of algorithms and the flexibility of the system. With the development of high-performance DSPs, FPGAs and microprocessors, much research aims to construct multi-DSP systems with a high price-performance ratio that can be widely used for real-time mass data processing such as radar, sonar and image processing. In general, the performance of a parallel processing system is determined by the cost of communication between processing elements. In a multi-DSP parallel processing system, data communication is provided by the peripheral interfaces of the DSPs, e.g. the HPI (Host Port Interface), EMIF (External Memory Interface) and McBSP (Multichannel Buffered Serial Port) of the TI TMS series, and the LinkPort of the ADI TigerSHARC series [1] [2] [3] [4]. Instead of peripheral interfaces, special communication devices can also be used. The framework of a multi-DSP parallel processing system is classified into the direct connection method, the bus direct connection method and the indirect connection method [3]. Methods that use an FPGA in a real-time multi-DSP system fall into two types: in one, the FPGA acts as a "coupling point" in the system data processing flow; in the other, the FPGA operates as a co-processor attached to a DSP. In addition, many works present multi-DSP systems with FPGAs in applications such as radar and image processing [9] [10]. Previous work on multi-DSP systems for real-time large-capacity data processing can be summarized as follows. General multiprocessor methodology and system structures suited to real-time processing [3] [9], acceleration of processing algorithms [5] [6] [7], and methods for coupling DSPs with each other and with FPGAs [3] [10] have been studied, and applied designs for radar and image processing have been reported [4] [8] [11] [12]. However, the input data are mostly static data of fixed format or discrete data of fixed frame length.
In general, in multi-DSP processing systems in which a CPU is the central processor, the format and the frame length of the input data can change according to an external signal. We refer to a continuous data sequence whose frame length changes as a variant-length continuous frame. For weather radar, we propose a new multi-DSP system architecture centered on an FPGA, in which the main processor is the high-speed floating-point processor TMS320C6713 and the frame length is changed by an external trigger.

This paper is organized as follows. Section 2 introduces the signal processing theory of PD radar. Section 3 presents the multi-DSP structure in which peripheral devices and DSPs are coupled around an FPGA. Section 4 describes the method for continuous processing of variant-length continuous frames on the DSPs. Section 5 gives experimental results for the proposed methods. Finally, we conclude with a summary.

## 2. SIGNAL PROCESSING THEORY FOR PD RADAR

To measure the speed and distance of clouds and space targets, a PD radar transmits the following signal,

$$S(t) = \sum_{i=0}^{N-1} s(t - iT_r(t)) e^{i2\pi f_c t} \tag{1}$$

where,

$$s(t) = rect\left(\frac{t}{\tau}\right) \tag{2}$$

and $T_r(t)$ is the transmit pulse period, which varies with the observation distance and can therefore be expressed as a function of time.

The signal received by the radar can be expressed as follows,

$$R(t) = \sum_{i=0}^{N-1} \sum_{j=1}^{M} R_{ij} s(t - iT_r(t) - \tau_{ij}) e^{i2\pi (f_c + f_{d_{ij}})t} \tag{3}$$

where $i = 0, 1, \dots, N-1$, $j = 1, 2, \dots, M$ and $\tau_{ij} = \frac{2R_{ij}}{c}$. In Eq.(3), $R_{ij}$ is the reflected signal length for the $j^{\text{th}}$ target and the $i^{\text{th}}$ transmitted pulse, and $f_{d_{ij}}$ is the Doppler frequency caused by the movement of the target.

We must observe the location of dangerous clouds and the movement of space targets by means of the Doppler frequency. In Eq.(1), depending on $T_r(t)$, the length of a frame, i.e. the series of signal samples received between two transmitted pulses, can change continuously. $R_{ij}$ is corrupted by impulse noise.
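The dependence of frame length on $T_r(t)$ can be illustrated with a minimal sketch. It assumes a hypothetical sampling rate `FS_HZ` of 1 MHz (the paper does not state the A/D rate); the function names are illustrative only.

```python
def frame_lengths(pri_list_s, fs_hz):
    """Number of A/D samples between consecutive transmit pulses, per pulse period."""
    return [int(round(pri * fs_hz)) for pri in pri_list_s]


def split_frames(samples, lengths):
    """Cut a continuous sample stream into consecutive frames of the given lengths."""
    frames, start = [], 0
    for n in lengths:
        frames.append(samples[start:start + n])
        start += n
    return frames


# Assumed values: 1 MHz sampling, pulse period changing from pulse to pulse.
FS_HZ = 1.0e6
pri_s = [2.5e-3, 1.25e-3, 0.67e-3]
lengths = frame_lengths(pri_s, FS_HZ)
stream = list(range(sum(lengths)))      # stand-in for the receiver output
frames = split_frames(stream, lengths)
```

Each change of the pulse period changes the number of samples in the following frame, which is why the downstream processing cannot assume a fixed buffer size.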

The variant-length frame signal obtained at the receiver output by high-speed A/D conversion is processed by piecewise second-order interpolation. Let $s_n$ be the length of the partial interval over which the frame signal is approximated by a second-degree curve. In the interval from the frame start to $s_n$, the interpolation curve is obtained by least squares as follows,

$$S_{msint}(j) = aj^2 + bj + c \tag{4}$$

where $S_{msint}(j)$ is the interpolation curve. Let $R_{data}(j)$ be the received signal; then the coefficients *a*, *b*, *c* above are obtained as follows.

$$\begin{pmatrix} a \\ b \\ c \end{pmatrix} = sfc^{-1} \cdot sfb \tag{5}$$

where,

$$sfc = \begin{pmatrix} \sum_{j=1}^{s_n} j^{4} & \sum_{j=1}^{s_n} j^{3} & \sum_{j=1}^{s_n} j^{2} \\ \sum_{j=1}^{s_n} j^{3} & \sum_{j=1}^{s_n} j^{2} & \sum_{j=1}^{s_n} j \\ \sum_{j=1}^{s_n} j^{2} & \sum_{j=1}^{s_n} j & \sum_{j=1}^{s_n} 1 \end{pmatrix}, \quad sfb = \begin{pmatrix} \sum_{j=1}^{s_n} j^{2} R_{data}(j) \\ \sum_{j=1}^{s_n} j R_{data}(j) \\ \sum_{j=1}^{s_n} R_{data}(j) \end{pmatrix} \tag{6}$$
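The normal equations of Eqs.(5) and (6) can be sketched in a few lines. This is an illustrative host-side version under stated assumptions, not the DSP implementation: `solve_linear` and `fit_quadratic` are hypothetical helper names, and `r_data[j - 1]` is assumed to hold $R_{data}(j)$.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_quadratic(r_data, sn):
    """Least-squares a, b, c of a*j^2 + b*j + c over j = 1..sn."""
    js = range(1, sn + 1)
    p = [sum(j ** k for j in js) for k in range(5)]   # p[k] = sum of j^k
    sfc = [[p[4], p[3], p[2]],
           [p[3], p[2], p[1]],
           [p[2], p[1], p[0]]]
    sfb = [sum(j ** k * r_data[j - 1] for j in js) for k in (2, 1, 0)]
    return solve_linear(sfc, sfb)
```

Because `sfc` depends only on $s_n$, its inverse could be computed once and reused for every interval of the same length, leaving only the three sums of `sfb` to be evaluated per interval.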

For the next interval, which starts at $[s_n/2]$ and has length $s_n$, we again obtain a least-squares interpolation curve. The value at $[s_n/2]$ is taken as the starting point, i.e. *c* is set to $S_{msint}([s_n/2])$. The coefficients *a*, *b* can then be calculated as follows.

$$\begin{pmatrix} a \\ b \end{pmatrix} = ssc \cdot ssb \tag{7}$$

where,

$$ssc = \begin{pmatrix} \sum_{j=1}^{s_n} j^{4} & \sum_{j=1}^{s_n} j^{3} \\ \sum_{j=1}^{s_n} j^{3} & \sum_{j=1}^{s_n} j^{2} \end{pmatrix}^{-1}, \quad ssb = \begin{pmatrix} \sum_{j=1}^{s_n} j^{2} \left(R_{data}\left(\left[\frac{s_n}{2}\right] + j\right) - c\right) \\ \sum_{j=1}^{s_n} j \left(R_{data}\left(\left[\frac{s_n}{2}\right] + j\right) - c\right) \end{pmatrix} \tag{8}$$

This process is repeated over all intervals of the frame. The signal display resolution can also be changed by adjusting the size of $s_n$.
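The constrained step of Eqs.(7) and (8) can be sketched as follows. It assumes that `r_data[k - 1]` holds $R_{data}(k)$, that `start` plays the role of $[s_n/2]$, and that `c` is supplied by the caller; the function name is hypothetical.

```python
def fit_quadratic_fixed_c(r_data, start, sn, c):
    """Least-squares a, b of a*j^2 + b*j + c against R_data(start + j), j = 1..sn.

    The constant term c is held fixed, so only a 2x2 system remains.
    """
    js = range(1, sn + 1)
    s4 = sum(j ** 4 for j in js)
    s3 = sum(j ** 3 for j in js)
    s2 = sum(j ** 2 for j in js)
    t2 = sum(j * j * (r_data[start + j - 1] - c) for j in js)
    t1 = sum(j * (r_data[start + j - 1] - c) for j in js)
    det = s4 * s2 - s3 * s3          # determinant of the 2x2 matrix in ssc
    a = (s2 * t2 - s3 * t1) / det    # first row of ssc * ssb
    b = (s4 * t1 - s3 * t2) / det    # second row of ssc * ssb
    return a, b
```

Fixing *c* to the value of the previous curve makes consecutive segments join continuously, and the closed-form 2x2 inverse avoids a general solver.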

## 3. MULTI-DSP ARCHITECTURE

### 3.1 SYSTEM ARCHITECTURE

The overall system architecture, which is connected to external devices and displays the processed digital signal data on the screen, is shown in Fig.1. The main components, centered on the digital signal processing system proposed in this paper, are as follows.

- *Analog Signal Interface*: It mainly consists of a high-speed A/D converter and a multi-channel A/D converter. It digitizes the analog signal and transmits the synchronization signal that determines the frame length to the FPGA board.
- *Control Interface*: It receives the digital output signal from the FPGA board, converts it to an analog signal and amplifies the control signal. The output amplification circuit is built around the OPA544.
- *Integrated Manage Board*: The integrated management board mainly consists of two FPGAs and a dedicated USB controller. The FPGA is an ALTERA Cyclone series EP1C12Q24017 and the USB controller is an EZ-USB FX2 series cy7c68013.
- *Single DSP Processing Board*: The single DSP processing board performs the main signal processing operations. The DSP device is the TMS320C6713.

The TMS320C6713 executes eight 32-bit operations per cycle and provides 32/64-bit data words, a 200MHz operating clock, 1800MIPS/1350MFLOPS performance, a rich set of coupling functions, and a C/C++ compiler with strong optimization.

### 3.2 FPGA-DSP, DSP-DSP COMBINING STRUCTURE

The system uses the HPI, EMIF and McBSP provided by the TMS series digital signal processors. The HPI (Host Port Interface, 960Mbps) is the interface through which an external host device accesses the memory or registers of the DSP, and the EMIF (External Memory Interface, 400Mbps) is the interface through which the DSP accesses external memory and performs data input and output. The McBSP (Multichannel Buffered Serial Port, 70Mbps) is used as the serial communication interface between DSPs or between a DSP and an external peripheral device [1] [2].
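As a rough illustration of what these rates mean for frame transfers, the following sketch estimates ideal transfer times. The frame size of 1000 words of 32 bits is an assumed example, and protocol overhead is ignored, so real transfers will take longer.

```python
# Peak rates quoted in the text, in Mbit/s.
RATES_MBPS = {"HPI": 960, "EMIF": 400, "McBSP": 70}


def transfer_time_ms(n_words, word_bits, interface):
    """Ideal time in ms to move n_words over an interface, ignoring overhead."""
    bits = n_words * word_bits
    return bits / (RATES_MBPS[interface] * 1e6) * 1e3


# Assumed example frame: 1000 words of 32 bits.
times = {name: transfer_time_ms(1000, 32, name) for name in RATES_MBPS}
```

Under these assumptions the McBSP is more than ten times slower than the HPI for the same frame, which is consistent with reserving the parallel interfaces for FPGA-DSP traffic.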



Fig.1. System Architecture

We design the coupling structure shown in Fig.2 so that the FPGA manages the data and distributes the data to be processed to the DSPs according to the external trigger. The HPI and EMIF, which have high transfer speeds, are used for FPGA-DSP data transfer, and the McBSP is used for DSP-DSP data transfer, because the FPGA distributes external data clocked at a maximum of 10MHz to the DSPs. Since the FPGA manages and distributes the data, the HPI is used for data input to a DSP and the EMIF for output. To avoid complexity in the device design, we exploit the possibility of combining the buses. For the same reason, the control signals are implemented in the CPLDs of the FPGA and DSP boards.

## 4. PARALLEL PROCESSING AND CONTINUOUS PROCESSING VARIANT LENGTH FRAME IN MULTI-DSP SYSTEM



Fig.2. Combining Structure of FPGA-DSP, DSP-DSP in multi DSPs

### 4.1 SEQUENCE PARALLEL PROCESSING

Every DSP requires a synchronizing signal that determines the start point of the algorithm in order to process variant-length continuous frames in real time. The FPGA receives the synchronizing signal from outside, and this signal is raised in the DSP as an interrupt. To increase CPU utilization, data I/O in the DSP is handled by DMA. In the system architecture, the HPI and EMIF, the internal peripherals that support high-speed transfer, are used. We use the DMA controller of the TMS320C6713, which provides 16 transfer channels. The HPI uses DMA implicitly [1]. On this basis we propose a processing method that can process variant-length continuous frames in sequence-parallel fashion. Data are input by HPI transfer and output by EMIF transfer between the FPGA and each DSP, and every DSP runs the same code. The FPGA sends the input frames to the DSPs in the order DSP1-DSP2-DSP3-DSP1-DSP2-DSP3 and issues the trigger that starts processing. Each DSP receives the synchronization signal and performs signal processing in turn. Output through the EMIF is performed by DMA in the DSP. Because all data input and output transfers are performed by DMA, the DSP's processing capacity can be devoted to signal processing.
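The round-robin distribution described above can be modeled on a host as follows. This is only a behavioral sketch: the function name is hypothetical, and the real dispatch is performed by FPGA logic driving the HPI of each DSP.

```python
from itertools import cycle


def dispatch_round_robin(frames, dsp_ids=("DSP1", "DSP2", "DSP3")):
    """Pair each incoming frame with the DSP that receives it, in cyclic order."""
    targets = cycle(dsp_ids)
    return [(next(targets), frame) for frame in frames]


# Frames of different lengths, as produced by a changing trigger period.
frames = [[0] * n for n in (5, 3, 8, 2, 6, 4, 7)]
schedule = dispatch_round_robin(frames)
order = [dsp for dsp, _ in schedule]
```

With three workers, each DSP receives every third frame, so it has three trigger periods to finish one frame before its next frame arrives.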



Fig.3. Schematic picture of parallelization process mode by parallelization of data and function

Data parallelization divides the data among the processors and has the advantage that the parallel process is simple to configure. To achieve real-time processing, data parallelization must be combined with function parallelization. We therefore propose a processing method that combines data and function parallelization to process variant-length continuous frames, as shown in Fig.3.

In this configuration the FPGA manages data I/O through the HPI and EMIF of the DSPs. DSP1 and DSP2 alternately receive input data from the FPGA through the HPI and perform the data-parallel algorithm. Their output is transferred to DSP0 through the McBSP. DSP0 receives the results from DSP1 and DSP2 and performs the function-parallel algorithm. It can also receive input data through the HPI and perform other operations. The result is output to the FPGA through the EMIF. All data I/O in the DSPs is controlled by the DMA controller, which leaves the maximum amount of time free for signal processing.

The specific mechanism is as follows. First, the FPGA combines the various input data from the devices according to the external trigger, builds an array of input data and buffers it in a FIFO. Second, the FPGA builds the data array for the DSPs from its internal FIFO and transmits the data in the sequence DSP1-DSP2-DSP3-DSP1 using the HPI transfer protocol. The variant-length frames are sent to DSP1, DSP2 and DSP3, while the data sent to DSP0 are input data used for additional operations. Third, DSP1, DSP2 and DSP3 receive the data arriving at the HPI in double-buffer mode, perform signal processing and send the results to DSP0. DSP0 receives the input data passed through the HPI and McBSP and performs the function-parallel operation, and all results are output through the EMIF. The FPGA buffers the data from the EMIF in its internal FIFO, divides them according to data format and passes them to the various external devices.
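The double-buffer mode mentioned in the third step can be sketched as below. The class and function names are hypothetical, and the model is sequential: on the real hardware the DMA fills one buffer while the CPU processes the other, which this sketch does not reproduce.

```python
class PingPongBuffer:
    """Two buffers: DMA fills one while the CPU works on the other."""

    def __init__(self):
        self._bufs = [[], []]
        self._fill = 0  # index of the buffer the DMA is currently filling

    def dma_write(self, frame):
        """Model of an HPI/DMA transfer into the fill buffer."""
        self._bufs[self._fill] = list(frame)

    def swap(self):
        """On the sync interrupt: hand over the filled buffer and switch sides."""
        ready = self._bufs[self._fill]
        self._fill ^= 1
        return ready


def run(frames, process):
    """Pass variable-length frames through the ping-pong buffer."""
    buf = PingPongBuffer()
    results = []
    for frame in frames:
        buf.dma_write(frame)            # input side, no CPU copy loop
        ready = buf.swap()              # sync interrupt marks the frame boundary
        results.append(process(ready))  # CPU works only on the completed frame
    return results


results = run([[1, 2, 3], [4], [5, 6]], sum)
```

Because each buffer holds a whole frame regardless of its length, the scheme needs buffers sized for the longest expected frame.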

### 4.2 TRANSMITTING AND RECEIVING VARIANT LENGTH DATA BY COMBINING EDMA AND MCBSP

The EDMA (Enhanced Direct Memory Access) controller of the C671x series DSPs supports 16 channels with priorities, as well as linking and chaining of data transfers [1]. The EDMA can transfer data between addressable memory areas. In addition, the McBSP (Multichannel Buffered Serial Port), which supports various kinds of serial data transfer, can transfer data in combination with the EDMA [1]. Of the 16 EDMA channels of the C671x, the 14th channel is used for the receive event of McBSP0. One important function of the EDMA is link-type transfer. Link-type EDMA is used for ping-pong buffering and circular buffering, which transfer data without CPU intervention in various data systems. When one transfer finishes, the current transfer parameters are reloaded with the parameter set located at the given 16-bit link address. We perform transmission and reception of variant-length data through the McBSP using the link function supported by the EDMA. The format of the variant-length data is shown in Table.1.

Table.1. Format of variant length frame in DSP

| N1 (length 2) | Data 1 | Data 2 | … | Data N1 | N2 (length 2) | Data 1 | Data 2 | … | Data N2 |
|---------------|--------|--------|---|---------|---------------|--------|--------|---|---------|

As Table.1 shows, the length of the data is transmitted first, followed by that number of data words, and this is repeated. We propose a method that combines EDMA and McBSP to transfer variant-length data without CPU intervention (Fig.4).

The McBSP first receives the length of the data that will follow. Through the EDMA setting for the corresponding McBSP channel, the received value is copied to the memory location that stores the element count within the EDMA parameter storage area. When this EDMA transfer finishes, the next EDMA transfer starts to receive the data. The number of data words to be received has been set by the previous EDMA transfer. When the given number of data words has been received, an EDMA transfer is again performed to obtain the number of data words that will be received next. By repeating this process, data in variant-length format are received correctly without CPU intervention, so the CPU performs only the signal processing algorithm.
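The alternation between a length transfer and a data transfer can be modeled in software as follows. This is a behavioral sketch only: it assumes the length field has already been assembled into a single word, and the function name is hypothetical. On the DSP the same alternation is carried out by linked EDMA parameter sets rather than by a loop.

```python
from itertools import islice


def receive_variant_frames(stream):
    """Split a flat word stream [N1, d1..dN1, N2, d1..dN2, ...] into frames.

    Mimics two linked transfers: a one-word 'length' transfer whose result
    becomes the element count of the following 'data' transfer.
    """
    frames = []
    words = iter(stream)
    for length_word in words:                # 'length' transfer: one word
        count = int(length_word)             # becomes the next element count
        frame = list(islice(words, count))   # 'data' transfer: count words
        if len(frame) != count:
            raise ValueError("stream ended inside a frame")
        frames.append(frame)
    return frames


frames = receive_variant_frames([3, 10, 11, 12, 1, 99, 2, 7, 8])
```

The receiver never needs to know the frame length in advance; each frame announces its own size, which is what allows the frame length to follow the external trigger.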

## 5. EXPERIMENTAL RESULTS OF MULTI-DSP SYSTEM

This device was applied to signal processing for a weather survey radar. We set the period of the variant-length continuous frames to 0.67ms, 1.25ms and 2.5ms in turn. The FPGA inputs the data by driving the HPI chip select signals of DSP1 to DSP3 in sequence. The maximum run times in the DSPs are shown in Table.2.



Fig.4. Data transmitting method combining EDMA and McBSP (figure label: EDMA setting for McBSP receive (13): length receive)

Table.2. Run time analysis in DSP1, 2, 3 in sequence parallel processing method

| Period of trigger impulse (ms) | Process time per frame (ms) | EMIF transfer (output) time (ms) | Assigned time (ms) |
|--------------------------------|-----------------------------|----------------------------------|--------------------|
| 2.5                            | 6.99                        | 0.27                             | 7.5                |
| 1.25                           | 3.50                        | 0.15                             | 3.75               |
| 0.67                           | 1.83                        | 0.09                             | 2.01               |

Let T1 be the input time per frame of the continuous data and T2 the processing time in one DSP. Because all input and output operations are handled by DMA, the time assigned to signal processing in a DSP runs from the start of one of its frames to the start of its next frame, assuming the data memory of the first frame is used until the algorithm completes. If T2 is less than three times T1, the continuous input data can be processed completely, i.e. the system performs three times better than a single processor. DSP0 processes the frames output by DSP1, DSP2 and DSP3.
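The condition T2 < 3·T1 can be checked against the measured values of Table.2 with a short calculation. The sketch below makes the conservative assumption that the EMIF output time adds to the processing time, although in the system the output is handled by DMA; the names are illustrative.

```python
# Rows of Table.2: (trigger period, process time, EMIF output time), all in ms.
TABLE2 = [
    (2.5, 6.99, 0.27),
    (1.25, 3.50, 0.15),
    (0.67, 1.83, 0.09),
]
N_DSP = 3  # DSP1 to DSP3 share the input frames in turn


def keeps_up(period_ms, process_ms, output_ms, n_dsp=N_DSP):
    """True if one DSP finishes a frame within its assigned time slot."""
    assigned_ms = n_dsp * period_ms
    return process_ms + output_ms <= assigned_ms


all_ok = all(keeps_up(*row) for row in TABLE2)
margins = [N_DSP * p - (t + o) for p, t, o in TABLE2]
```

Under this assumption every row of Table.2 stays within its assigned time, with the smallest remaining margin at the 0.67ms trigger period.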

Table.3. Run time analysis in DSP0

| Trigger impulse period (ms) | Process 1 time (ms) | McBSP transfer time (ms) | Process 2 time (ms) | EMIF transfer time (ms) | Assigned time (ms) |
|-----------------------------|---------------------|--------------------------|---------------------|-------------------------|--------------------|
| 2.5                         | 0.16                | 0.7                      | 0.45                | 0.13                    | 2.5                |
| 1.25                        | 0.16                | 0.34                     | 0.23                | 0.07                    | 1.25               |
| 0.67                        | 0.16                | 0.18                     | 0.12                | 0.04                    | 0.67               |

## 6. CONCLUSION

In this paper we configured a multi-DSP system in which external devices and DSPs are coupled around an FPGA, improving the parallel processing of variant-length continuous frames input at high speed, the high-speed data I/O with external devices, and the processing speed of the DSP system. We focused on parallel operation methods for multi-DSP systems based on data and function parallelization for real-time digital signal processing. The device is highly scalable and can be used in many digital radar signal processing, sonar and image processing systems.

## REFERENCES

- [1] TMS320C6000 User's Guide, Available at: http://www.ti.com/lit/ug/spru303b/spru303b.pdf.
- [2] Victor C. Chen and Hao Ling, "*Time-Frequency Transforms for Radar Imaging and Signal Analysis*", Artech House, 2002.
- [3] Wei Wu et al., "Design methods of Multi-DSP Parallel Processing System", *Proceedings of World Congress on Computer Science and Information Engineering*, Vol. 3, pp. 458-464, 2009.
- [4] Fan Xikun et al., "Real-Time Implementation of Airborne Radar Space-Time Adaptive Processing on Multi-DSP System", *Proceedings of IEEE Conference on Radar*, pp. 481-486, 2006.

- [5] Mukul Khandelia et al., "Contention-Conscious Transaction Ordering in Multiprocessor DSP Systems", *IEEE Transactions on Signal Processing*, Vol. 54, No. 2, 2006.
- [6] Yi-Hsuan Lee et al., "A Two-Level Scheduling Method: An Effective Parallelizing Technique for Uniform Nested Loops on a DSP Multiprocessor", *Journal of Systems and Software*, Vol. 75, No. 1, pp. 155-170, 2005.
- [7] T. Lothar et al., "Performance Analysis of Multiprocessor DSPs: A Stream-Oriented Component Model", *IEEE Signal Processing Magazine*, Vol. 22, No. 3, pp. 38-46, 2005.
- [8] Mao Hai-Cen et al., "A Flexible DSP-Based Network for Real-Time Image-Processing", *Wuhan University Journal of Natural Sciences*, Vol. 9, No. 6, pp. 921-926, 2004.
- [9] Xiang Hong, "Parallel Implementation of High Resolution Radar Signal Processing System Based On Multi-IC Architecture", *Proceedings of IEEE Conference on Radar*, pp. 812-815, 2013.
- [10] Zhang Huixin, He Qi Liusuhua and Yang Haiguang, "The Design for LVDS High speed Data Acquisition and Transmission System based on FPGA", *Proceedings of IEEE Conference on Radar*, pp. 383-386, 2011.
- [11] Yuan Changshun et al., "A Novel Design of Parallel and High-Speed Signal Processor Architecture for PD Radar", *Proceedings of IEEE Conference on Radar*, pp. 551-556, 2013.
- [12] Man Li et al., "Research on Parallel Debugger in Bus-Based Multi-DSP System in Radar Data Processing", *Proceedings of IEEE Conference on Radar*, pp. 236-241, 2013.