# IMPACT OF BUFFER SIZE ON PQRS AND D-PQRS SCHEDULING ALGORITHMS

N. Narayanan Prasanth<sup>1</sup>, Kannan Balasubramanian<sup>2</sup> and R. Chithra Devi<sup>3</sup>

<sup>1</sup>Department of Information Technology, National College of Engineering, India E-mail: nnprcd@gmail.com

<sup>2</sup>Department of Computer Science and Engineering, Mepco Schlenk Engineering College, India E-mail: kannanbala@mepcoeng.ac.in

<sup>3</sup>Department of Information Technology, Dr. Sivanthi Aditanar College of Engineering, India E-mail: chitra\_rajan2001@yahoo.co.in

#### Abstract

Most internet applications require high-speed connectivity. Crosspoint buffered switches are a widely used switching architecture, and designing a scheduling algorithm for them is a major challenge. PQRS and D-PQRS are the two most successful schedulers used in crosspoint buffered switches under unicast traffic. In this paper, we analyse the performance of the PQRS and D-PQRS algorithms while varying the crosspoint buffer size. Simulation results show that the delay performance of the switch improves, that is, average cell latency falls, as the buffer size increases.

#### Keywords:

Crosspoint Buffered Switch, Scheduling Algorithm, Unicast Traffic, Throughput and Delay Performance

## 1. INTRODUCTION

To attain high-speed internet connectivity, crossbar fabrics are used in switch and router implementations. Input-queued and output-queued switches have their own limitations, such as input-output port contention and speedup requirements. The Combined Input Crosspoint Queued switch is complex to implement [3], which makes the Buffered Crossbar Switch (BCS) the natural choice. BCS is the most commonly used architecture among crossbar switches because of its simplicity and internal non-blocking capability [1], [2]. A buffer located at each crosspoint reduces scheduling overhead, Head of Line (HOL) cell blocking and input-output contention, thereby improving throughput and delay performance [3], [4]. Performance is further improved by introducing virtual output queues (VOQ) in the BCS, which completely eliminate HOL blocking. A BCS lets its input and output ports take scheduling decisions independently, which avoids the need for a centralized scheduler [3].

Designing scheduling algorithms for BCS has received significant research attention. Every BCS has a buffer at each crosspoint, and its size depends on the scheduling algorithm employed. Many algorithms offer good performance with a buffer of one cell, but recent research has focused on buffers of varying size. In [5], ideal throughput is achieved for different buffer lengths under uniform Bernoulli i.i.d. and non-uniform log-diagonal traffic patterns. In [6], a mathematical model of a 2×2 BCS with larger crosspoint buffers is proposed, showing improved throughput and delay performance. In [7], simulation shows that the RR-RR scheduler cannot provide 100% throughput with small buffers unless some speedup is introduced. A 32×32 switch with buffers capable of holding up to 1000 cells is implemented in [3] to provide high performance. This encouraged us to analyse the Prioritized Queue with Round-robin Scheduler (PQRS) [8] and the Delay based Prioritized Queue with Round-robin Scheduler (D-PQRS) [9] over a broad range of buffer lengths. In this paper, we analyse the throughput and delay performance of the PQRS and D-PQRS scheduling algorithms for buffer sizes 1, 2, 4, 16, 64, 256 and 512. Section 2 discusses the PQRS and D-PQRS scheduling algorithms. Section 3 presents the performance of these algorithms with different buffer sizes, and Section 4 concludes the paper.



Fig.1. Buffered Crossbar Switch with Virtual Output Queue

## 2. BUFFERED CROSSBAR SWITCH

The buffered crossbar switch with virtual output queues is shown in Fig.1. A BCS holds its buffers in the switch fabric rather than in the line cards, so the switch and buffers are implemented on a single chip, which reduces the implementation cost. In each timeslot, a BCS needs two schedules to switch a cell: the Arrival Schedule and the Departure Schedule. The Arrival Schedule selects a cell from the HOL of a queue and places it in an empty crosspoint buffer; in parallel, the Departure Schedule selects a cell from a non-empty buffer and transfers it to the output through the respective port [8, 9]. Thus, in each timeslot, depending on the scheduling algorithm employed, a cell is scheduled from a VOQ to a crosspoint buffer and from a buffer to an output port. The number of cells that can be stored in a crosspoint buffer depends on its size, and it is practically viable to implement multi-cell buffers at the crosspoints of a switch.
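The per-timeslot operation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulator used in this paper: the class name `BufferedCrossbar`, the fixed lowest-index selection rule in both schedules, and the choice to run departures before arrivals within a slot are all hypothetical. For multi-cell buffers, the sketch treats any crosspoint buffer with free space as eligible for an arrival.

```python
from collections import deque


class BufferedCrossbar:
    """Toy N x N buffered crossbar with VOQs and per-crosspoint FIFO buffers."""

    def __init__(self, n, buffer_size):
        self.n = n
        self.buffer_size = buffer_size
        # voq[i][j]: cells waiting at input i that are destined for output j
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]
        # xp[i][j]: crosspoint buffer between input i and output j
        self.xp = [[deque() for _ in range(n)] for _ in range(n)]

    def enqueue(self, i, j, cell):
        """A cell arrives at input i and joins the VOQ for output j."""
        self.voq[i][j].append(cell)

    def step(self):
        """Run one timeslot and return the (output, cell) pairs that departed."""
        departures = []
        # Departure schedule: each output serves one non-empty crosspoint buffer.
        # Placeholder rule (assumption): lowest-numbered input first.
        for j in range(self.n):
            for i in range(self.n):
                if self.xp[i][j]:
                    departures.append((j, self.xp[i][j].popleft()))
                    break
        # Arrival schedule: each input moves one HOL cell into a crosspoint
        # buffer that still has free space.
        # Placeholder rule (assumption): lowest-numbered output first.
        for i in range(self.n):
            for j in range(self.n):
                if self.voq[i][j] and len(self.xp[i][j]) < self.buffer_size:
                    self.xp[i][j].append(self.voq[i][j].popleft())
                    break
        return departures
```

With a 2×2 switch and `buffer_size=1`, two cells queued at input 0 for output 1 leave in consecutive timeslots after a one-slot transfer into the crosspoint buffer. Replacing the two placeholder rules with an actual input and output scheduler is where algorithms such as PQRS and D-PQRS would plug in.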

### 2.1 SCHEDULING ALGORITHMS

PQRS uses a Priority Queue Scheduler as the input schedule and a round-robin algorithm as the output schedule [8]. It was designed and simulated under Bernoulli non-uniform bursty and i.i.d. traffic with a buffer size of 1. The simulation results show that the difference between the minimum and maximum average waiting time is less than 1 ms, so the algorithm considerably reduces starvation. D-PQRS uses a Delay-based Priority Queue Scheduler as the input schedule and a round-robin algorithm as the output schedule. Simulation results show that D-PQRS outperforms the PQRS and LQF-RR algorithms, achieving high throughput with minimum delay and starvation [9].
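Both schedulers share a round-robin output schedule, which can be illustrated with a generic pointer-based arbiter. The sketch below is an assumption about how such an arbiter is commonly built, not code from [8] or [9]. The class name `RoundRobinArbiter` and its interface are hypothetical, and the input-side priority rules of PQRS and D-PQRS are not reproduced because this paper does not specify them.

```python
class RoundRobinArbiter:
    """Generic round-robin arbiter for one output port (illustrative only)."""

    def __init__(self, n):
        self.n = n
        self.pointer = 0  # index of the input with highest priority this slot

    def select(self, requests):
        """Pick one requesting input.

        requests: list of n booleans, True if that input's crosspoint
        buffer for this output is non-empty.
        Returns the granted input index, or None if nobody requests.
        """
        for offset in range(self.n):
            idx = (self.pointer + offset) % self.n
            if requests[idx]:
                # Advance the pointer past the granted input so it gets the
                # lowest priority in the next timeslot.
                self.pointer = (idx + 1) % self.n
                return idx
        return None
```

Moving the pointer just past the granted input means every requesting input is served within `n` grants, which is the fairness property usually expected of a round-robin output schedule.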

### 2.2 BUFFER SIZING

The process of placing a buffer at the crosspoints of a BCS/VOQ switch is termed buffering. It avoids the input-output contention problem by distributing the scheduling decisions. Given hardware limitations and switch performance requirements, it is difficult to settle on a buffer size. Appropriately sized buffers can prevent packet overflow in the switch. Researchers normally use a buffer size of one cell for stable performance. In certain cases, researchers have increased the buffer size to further improve switch performance, which resulted in high implementation cost [6].

Propagation delay, queueing delay and transmission delay are the three components of the end-to-end latency of a packet [10]. Of the three, queueing delay is the only variable component, and it is controlled by buffer sizing. A correctly sized buffer considerably reduces the implementation and operational complexity of a switch. Therefore, to identify the appropriate buffer size, the performance of the PQRS and D-PQRS algorithms is analysed by varying the buffer size.
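The latency decomposition above can be written compactly. The symbol names are chosen here for illustration and are not taken from [10]:

$$
d_{\text{end-to-end}} = d_{\text{prop}} + d_{\text{trans}} + d_{\text{queue}}
$$

Here $d_{\text{prop}}$ and $d_{\text{trans}}$ are treated as fixed, so any change in end-to-end latency observed when the buffer size is varied is attributed to $d_{\text{queue}}$.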

## 3. PERFORMANCE ANALYSIS

Throughput, average cell latency and packet loss are the three parameters that determine switch performance. Three traffic patterns, namely uniform Bernoulli traffic, non-uniform Bernoulli i.i.d. traffic and non-uniform Bernoulli bursty traffic, are used to analyse the performance of a 4×4 switch. Each simulation runs for one million timeslots, with load probabilities ranging from p = 0.1 to 1. For each traffic pattern, the performance of the switch is recorded for buffer sizes 1, 2, 4, 16, 64, 256 and 512.
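As an illustration of the simplest of the three patterns, the sketch below generates uniform Bernoulli arrivals for one timeslot. It is an assumption about how such traffic is typically modelled, not the generator used for the reported results, and the function name `bernoulli_uniform_arrivals` is hypothetical. The non-uniform and bursty patterns are not sketched because their parameters are not given in this paper.

```python
import random


def bernoulli_uniform_arrivals(n, load, rng):
    """Generate arrivals for one timeslot of an n x n switch.

    Each input independently receives a cell with probability `load`;
    the destination output is chosen uniformly at random.
    Returns a list of length n holding the destination output for each
    input, or None where no cell arrived.
    """
    arrivals = []
    for _ in range(n):
        if rng.random() < load:
            arrivals.append(rng.randrange(n))
        else:
            arrivals.append(None)
    return arrivals


# Example: one timeslot of a 4 x 4 switch at load p = 0.5,
# with a seeded generator so the run is repeatable.
rng = random.Random(1)
slot = bernoulli_uniform_arrivals(4, 0.5, rng)
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when the same load has to be replayed against several buffer sizes.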

### 3.1 THROUGHPUT ANALYSIS

In each simulation, the throughput of the switch is defined as the ratio of the cumulative number of cells entering it successfully to the cumulative number of cells arrived. Throughput is observed for both algorithms with different buffer sizes under uniform and non-uniform traffic patterns. Fig.2 shows throughput as a function of buffer length under uniform Bernoulli traffic: the BCS provides a minimum throughput of 98% for buffer sizes 1 to 4 and a maximum throughput of 100% for buffer sizes greater than 64. The variation in throughput between PQRS and D-PQRS is less than 2% for all buffer sizes. Since the switch uses the same load throughout the simulation, larger buffers offer more throughput than smaller ones. Fig.3 depicts the throughput of the switch under Bernoulli non-uniform i.i.d. traffic. PQRS and D-PQRS achieve a minimum of 90% and 94% respectively for buffer size 1 and a maximum of 96% and 92% for buffer sizes 2 and 4. There is no further improvement in throughput when the buffer size is increased from 16 to 512, because the load used to analyse the switch is the same irrespective of buffer size. As a result, an average 3-4% increase in throughput is achieved by D-PQRS and PQRS when the buffer size is increased to 2 and 4. It would be interesting to analyse the algorithms under a steadily increasing load.
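The throughput definition given at the start of this subsection can be stated as a formula. The symbols are introduced here for illustration only: $C_{\text{accepted}}$ is the cumulative number of cells entering the switch successfully and $C_{\text{arrived}}$ is the cumulative number of cells arrived.

$$
T = \frac{C_{\text{accepted}}}{C_{\text{arrived}}}
$$

As a purely hypothetical example (not simulation data), if $C_{\text{arrived}} = 1{,}000{,}000$ and $C_{\text{accepted}} = 980{,}000$, then $T = 0.98$, that is, 98% throughput.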



Fig.2. Throughput as a function of buffer length under Bernoulli Uniform Traffic



Fig.3. Throughput as a function of buffer length under Bernoulli Non-uniform I.I.D. Traffic

Fig.4 shows the throughput of both algorithms under Bernoulli non-uniform bursty traffic. For buffer size 1, D-PQRS and PQRS achieve 94% and 93% throughput respectively. The throughput difference between the two algorithms is less than 1%. Throughput increases by 1% when the buffer size is increased to 4, with no further improvement for buffer sizes 16 to 512. Therefore, for non-uniform bursty traffic, the throughput of the switch is nearly the same irrespective of buffer size.



Fig.4. Throughput as a function of buffer length under Bernoulli Non-uniform Bursty Traffic

The behaviour of the switch depends on the scheduling algorithm employed and on the traffic pattern. The simulation results show that buffer size has a small influence on the throughput of the schedulers under both uniform and non-uniform i.i.d. traffic, whereas for bursty traffic it has practically no impact.

### 3.2 AVERAGE CELL LATENCY

In each simulation, average cell latency is defined as the average waiting time of cells traversing the switch from input port to output port. In all simulation experiments, average cell latency is expressed in milliseconds (ms). Fig.5 shows the average cell latency of the switch for both algorithms under Bernoulli uniform traffic. A delay of less than 5 ms is observed for buffer sizes below 16, and for buffer sizes above 64 the switch operates without delay. This is because the same load is used for every buffer size, so a switch with a larger buffer can operate without delay under uniform traffic.
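The latency metric defined above can be computed from per-cell timestamps, as in the following sketch. It assumes that each cell's arrival and departure timeslots are recorded. The function name `average_cell_latency` and the `slot_ms` parameter (the duration of one timeslot in milliseconds) are hypothetical, since this paper does not state the timeslot duration.

```python
def average_cell_latency(arrival_slots, departure_slots, slot_ms=1.0):
    """Average waiting time, in ms, of cells that traversed the switch.

    arrival_slots[k] and departure_slots[k] are the timeslots at which
    cell k entered an input port and left an output port.
    slot_ms is an assumed timeslot duration in milliseconds.
    """
    if len(arrival_slots) != len(departure_slots):
        raise ValueError("arrival and departure lists must have equal length")
    if not arrival_slots:
        return 0.0  # no cells delivered, so no latency to report
    total_slots = sum(d - a for a, d in zip(arrival_slots, departure_slots))
    return total_slots * slot_ms / len(arrival_slots)
```

Only cells that actually departed are counted, so the metric should be read alongside throughput: a switch that drops many cells can show a deceptively low average latency.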

The average cell latency of the switch under Bernoulli non-uniform i.i.d. traffic and Bernoulli non-uniform bursty traffic is depicted in Fig.6 and Fig.7 respectively. For i.i.d. traffic, an average delay of 20-25 ms is measured for both algorithms with buffer sizes less than 4, and 10-15 ms with buffer sizes greater than 4. An average delay difference of 5% is noted between the D-PQRS and PQRS algorithms. It is also clear that increasing the buffer size reduces the cell delay by about 10 ms. For bursty traffic, the average delay ranges from 22 to 32 ms for PQRS and from 12 to 22 ms for D-PQRS, a difference of 10 ms. Throughout the simulation, larger buffers incur less delay than smaller buffers because the same load is used. With a small buffer, cells wait longer in the VOQs, so the switch exhibits more delay.



Fig.5. Average Cell Latency as a function of buffer length under Bernoulli uniform Traffic



Fig.6. Average Cell Latency as a function of buffer length under Bernoulli Non-uniform I.I.D. Traffic



Fig.7. Average Cell Latency as a function of buffer length under Bernoulli Non-uniform Bursty Traffic

## 4. CONCLUSION

This paper analysed the influence of buffer size on the PQRS and D-PQRS algorithms. The throughput of the switch increases by 2% when large buffers are used. Under uniform traffic, average cell latency is null for buffer sizes greater than 16. Under non-uniform traffic patterns, a reduction of about 10 ms in delay is noted for larger buffers compared to smaller ones. Delay analysis with different buffer sizes under different load structures would be interesting future work. Larger buffers have a more significant impact on average cell latency than on throughput.

## REFERENCES

- [1] Deng Pan, Zhenyu Yang, Kia Makki and Niki Pissinou, "Providing Performance Guarantees for Buffered Crossbar Switches without Speedup", *Proceedings of 6<sup>th</sup> International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness*, pp. 297-314, 2009.
- [2] Milutin Radonjic and Igor Radusinovic, "Buffer Length Impact to 32×32 Crosspoint Queued Crossbar Switch Performance", *Proceedings of IEEE Symposium on Computers and Communications*, pp. 954-959, 2010.
- [3] Alex Kesselman, Kirill Kogan and Michael Segal, "Best Effort and Priority Queuing Policies for Buffered Crossbar Switches", *Chicago Journal of Theoretical Computer Science*, Vol. 2012, No. 5, pp. 1-14, 2012.
- [4] Alex Kesselman, Kirill Kogan and Michael Segal, "Packet Mode and QoS Algorithms for Buffered Crossbar Switches with FIFO Queuing", *Distributed Computing*, Vol. 23, No. 3, pp. 163-175, 2010.
- [5] Y. Kanizo, D. Hay and I. Keslassy, "The Crosspoint-Queued Switch", *Proceedings of IEEE Conference on Computer Communications*, pp. 729-737, 2009.
- [6] Jelena Cvorovic, Igor Radusinovic and Milutin Radonjic, "Buffering in Crosspoint-Queued Switch", *Proceedings of 17<sup>th</sup> Telecommunications Forum TELFOR*, pp. 198-201, 2009.
- [7] R. Rojas-Cessa, E. Oki and H. J. Chao, "CIXOB-K: Combined Input-Crosspoint-Output Buffered Packet Switch", *Proceedings of IEEE Global Telecommunications Conference*, Vol. 4, pp. 2654-2660, 2001.
- [8] N. Narayanan Prasanth, Kannan Balasubramanian and R. Chithra Devi, "Prioritized Queue with Round Robin Scheduler for Buffered Crossbar Switches", *ICTACT Journal on Communication Technology*, Vol. 5, No. 1, pp. 890-893, 2014.
- [9] N. Narayanan Prasanth, Kannan Balasubramanian and R. Chithra Devi, "Starvation Free Scheduler for Buffered Crossbar Switches", *International Journal of Engineering*, Vol. 28, No. 4, pp. 523-528, 2015.
- [10] Yashar Ganjali Gavgani, "Buffer Sizing in Internet Routers", Ph.D dissertation, Department of Computer Science, Stanford University, 2007.