Increasing Performance by Sizing Buffers Dynamically
Figure 4 below compares a static buffer-per-port scheme with a dynamic
scheme on a switch configured with three different port widths. Because
the narrower ports carry less bandwidth than the wider ports, they
should need fewer packet buffers as well.
Figure 4. Dynamic allocation allows more appropriate buffer sizing
In this example, a x8 upstream port services three downstream ports:
a x1 port, a x4 port and a x8 port. With a static, fixed
buffer-per-port architecture, the x1 port is given the same buffer size
as the x8 ports. Not only is this a suboptimal buffer assignment, but
two groups of packet buffers also go unused.
With Dynamic Allocation, buffers are assigned to each port as needed,
based on the port's width. Since there are no unused buffers, a larger
total amount of buffer is available, which increases the buffer that
can be given to the ports that need the extra bandwidth. In the bottom
half of Figure 4, ten packet buffers are allocated to each of the x8
ports, six to the x4 port and four to the x1 port. Thus the amount of
buffer available on a given port is dynamically assigned based on the
traffic loading on each port, resulting in higher overall system
performance.
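One way to picture width-based sizing is the sketch below. The article does not state the rule the switch actually uses, so everything here is an assumption: the function name `allocate_buffers`, the 30-buffer total, and the floor of three buffers per port were chosen only because they happen to reproduce the Figure 4 numbers.

```python
def allocate_buffers(port_widths, total_buffers, floor_per_port):
    """Give every port a fixed floor, then split the rest by lane count.

    Hypothetical rule for illustration; not the switch's documented algorithm.
    """
    spare = total_buffers - floor_per_port * len(port_widths)
    if spare < 0:
        raise ValueError("not enough buffers to give every port its floor")
    total_lanes = sum(port_widths.values())

    allocation = {}
    remainders = {}
    for port, width in port_widths.items():
        # Integer share of the spare buffers, proportional to lane count.
        share, remainders[port] = divmod(spare * width, total_lanes)
        allocation[port] = floor_per_port + share

    # Hand out the buffers lost to rounding, largest remainder first,
    # so that no buffer is left unused.
    leftover = total_buffers - sum(allocation.values())
    for port in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[port] += 1
    return allocation


# Port mix from the Figure 4 example: x8 upstream, x1/x4/x8 downstream.
ports = {"up_x8": 8, "down_x1": 1, "down_x4": 4, "down_x8": 8}
print(allocate_buffers(ports, total_buffers=30, floor_per_port=3))
```

With these assumed inputs the sketch gives ten buffers to each x8 port, six to the x4 port and four to the x1 port, and every one of the 30 buffers is assigned.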
Real-World Implementation of Dynamic Buffer Allocation
A real-world implementation of Dynamic Allocation can be seen in Figure 5 below. Here, a 24-lane PCIe
Gen2 switch is configured with a x8 upstream port, a x8 downstream port
and two x4 downstream ports.
Figure 5. Dynamic allocation using a 24 lane switch
The user has configured this switch with assigned buffer space for
each port and an uncommitted common (or shared) buffer pool per 16
lanes. The buffers have been assigned according to port width: the x8
ports have 10 packet buffers each and the x4 ports have four each.
A common buffer memory pool is set up with five packet buffers for
each of the 16 downstream lanes. Each port may dynamically grab buffers
from this pool as needed to support its own traffic bandwidth.
For example, a port may grab buffers when its assigned buffer
memories are full; conversely, a port may return buffers to the pool
when they are empty. This dynamic reallocation has two benefits in
switch design: it makes full use of the buffer memory on-chip and it
requires less overall memory to achieve optimal performance.
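The grab-and-return behavior described above can be modeled roughly as follows. This is a simplified sketch, not the switch's actual logic: the class name `BufferedSwitch`, its method names, the port names, and the pool size of five used in the example are all hypothetical.

```python
class BufferedSwitch:
    """Toy model: dedicated buffers per port plus one shared pool."""

    def __init__(self, dedicated, pool_size):
        self.dedicated = dict(dedicated)           # port -> assigned buffers
        self.in_use = {port: 0 for port in dedicated}
        self.pool_free = pool_size                 # uncommitted shared buffers

    def store_packet(self, port):
        """Buffer one packet; return False if no buffer is available."""
        if self.in_use[port] < self.dedicated[port]:
            self.in_use[port] += 1                 # use the port's own buffer
            return True
        if self.pool_free > 0:
            self.pool_free -= 1                    # assigned buffers are full:
            self.in_use[port] += 1                 # grab one from the pool
            return True
        return False

    def release_packet(self, port):
        """Free one buffer; borrowed buffers go back to the pool first."""
        if self.in_use[port] == 0:
            raise ValueError("port holds no packets")
        if self.in_use[port] > self.dedicated[port]:
            self.pool_free += 1                    # return the empty buffer
        self.in_use[port] -= 1


# Downstream ports of the Figure 5 configuration; pool size is assumed.
switch = BufferedSwitch({"ds_x8": 10, "ds_x4_a": 4, "ds_x4_b": 4}, pool_size=5)
for _ in range(6):
    switch.store_packet("ds_x4_a")                 # 4 dedicated + 2 borrowed
print(switch.pool_free)
```

In this model a busy x4 port can temporarily hold more than its four assigned buffers, while an idle port leaves the shared buffers free for others.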
Port Flexibility Improves Performance, Simplifies Layout
In the previous generations of PCIe switches, one port was fixed as the
upstream port while all other ports were defined as downstream, with
severely limited lane count/port count combinations.
A new wave of PCIe Gen 2 switches now offers flexible and versatile
port configuration schemes, with ports configurable as x1, x2, x4, x8,
and x16. Maximum port bandwidth ranges from 250MB/s (x1 port, Gen 1
signaling) to 8GB/s (x16 port, Gen 2 signaling), with several steps in
between. This makes it easier to optimize lane bandwidth, power
dissipation and layout trace width from port to port.
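The bandwidth range quoted above follows from the per-lane signaling rates (2.5 GT/s for Gen 1, 5 GT/s for Gen 2) and the 8b/10b line encoding both generations use. The short sketch below works through that arithmetic; the function name `port_bandwidth_mb_s` is made up for illustration, and the figures are per direction.

```python
# Raw signaling rate per lane, in megatransfers per second.
LANE_RATE_MT_S = {1: 2500, 2: 5000}
VALID_WIDTHS = (1, 2, 4, 8, 16)


def port_bandwidth_mb_s(gen, width):
    """Peak one-direction bandwidth of a port in MB/s."""
    if gen not in LANE_RATE_MT_S:
        raise ValueError("only Gen 1 and Gen 2 are modeled here")
    if width not in VALID_WIDTHS:
        raise ValueError("unsupported port width")
    # 8b/10b carries 8 data bits in every 10 bits on the wire,
    # and there are 8 bits in a byte.
    per_lane = LANE_RATE_MT_S[gen] * 8 // 10 // 8
    return per_lane * width


for gen in (1, 2):
    print(gen, [port_bandwidth_mb_s(gen, w) for w in VALID_WIDTHS])
```

A Gen 1 x1 port comes out at 250 MB/s and a Gen 2 x16 port at 8000 MB/s, matching the end points given in the text.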
In addition, these new switches support auto-negotiation of the port
width, reducing the number of active lanes in a port to match the
endpoint that is connected. For example, if a NIC with a x4 port is
connected to a x8 (or x16) port on the switch, the switch will
automatically reduce the number of active lanes for that port to a x4
configuration.
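The outcome of width negotiation can be summarized in a few lines. In a real device this is done in hardware during link training; the sketch below only models the result, and the function name `negotiate_width` is an assumption made for this example.

```python
SUPPORTED_WIDTHS = (1, 2, 4, 8, 16)


def negotiate_width(switch_port_lanes, endpoint_lanes):
    """Return the widest supported link both sides can run."""
    usable = min(switch_port_lanes, endpoint_lanes)
    if usable < 1:
        raise ValueError("each side needs at least one lane")
    # Pick the largest standard width that fits in the usable lanes.
    return max(w for w in SUPPORTED_WIDTHS if w <= usable)


print(negotiate_width(8, 4))    # x4 NIC in a x8 switch port
print(negotiate_width(16, 4))   # x4 NIC in a x16 switch port
```

Both calls yield a x4 link, as in the NIC example above.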
Selectable Upstream Port Simplifies High Performance Layout
These newer switches also support a moveable upstream port. In fact,
any port can be defined as the upstream port in these devices, so the
choice can be made to suit the traffic through each port of the
switch.
Flexible upstream port assignment also improves the layout of a system
board. Figure 6 below illustrates how, in a storage application, it
allows high-speed traces to be spread evenly across a system board with
a 16-lane switch configured with one x4 upstream (US) port and three x4
downstream (DS) ports. The system on the left uses a switch with a
fixed US port.
Figure 6. Port flexibility enhances board layout
The fixed US port creates severe trace congestion since the DS ports
are required to route through the SATA connectors, creating an
undesirable crosstalk environment. The photo on the right shows the
same system with a switch that has a flexible US port. This flexibility
allows the layout designer to avoid routing the high-speed PCIe lanes
through the high-speed SATA2 data paths, thus reducing crosstalk,
enhancing signal integrity and improving transmission margin.
Dual Cast
In addition to balanced bandwidth and improved buffer allocation,
these new switches also support Dual Cast, a feature that copies data
packets from one ingress port to two egress ports, allowing for higher
performance in dual-graphics, storage, security, and redundancy
applications.
Figure 7. Dual cast fiber channel HBA
Without Dual Cast, the CPU must generate twice the number of
packets, requiring twice the processing power. Figure 7 above
illustrates a redundant storage array, where a PCIe Gen2 switch uses
Dual Cast to store data on two RAID disk arrays. Additionally, the same
card can be used for non-redundant applications.
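The effect of Dual Cast on the sender can be illustrated with a small model. This is only a sketch of the idea, not the switch's implementation: the class name `DualCastSwitch`, the `ingress` method and the port names `array_a` and `array_b` are all hypothetical.

```python
from collections import deque


class DualCastSwitch:
    """Toy switch with two egress queues, one per disk array."""

    def __init__(self):
        self.egress = {"array_a": deque(), "array_b": deque()}

    def ingress(self, packet, primary, mirror=None):
        """Forward a packet; with a mirror port set, copy it there too."""
        self.egress[primary].append(packet)
        if mirror is not None:
            # Dual Cast: the switch duplicates the packet, so the
            # sender does not have to transmit it a second time.
            self.egress[mirror].append(packet)


switch = DualCastSwitch()
writes = [b"block-0", b"block-1", b"block-2"]
for packet in writes:                 # the CPU sends each packet once
    switch.ingress(packet, primary="array_a", mirror="array_b")
print(len(switch.egress["array_a"]), len(switch.egress["array_b"]))
```

Leaving `mirror` unset models the non-redundant case, where the same hardware forwards each packet to a single array.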
Summary
This new generation of PCIe switches supports Gen2 signaling, doubling
the throughput per lane of the previous devices. Furthermore, new
data-flow architectures are being deployed in these switches to
optimize bandwidth and memory utilization while minimizing latency
and power dissipation. Each of these features contributes
significantly to dramatic improvements in system and I/O performance in
embedded systems.
Steve Moore is senior product marketing manager at PLX Technology, Sunnyvale, Calif. He can be reached at
smoore@plxtech.com.