

# Interprise<sup>TM</sup> Family of Integrated Communications Processors Interprise<sup>TM</sup> PCI RC32438 Processor: Technology and Product Backgrounder

This backgrounder offers a technical perspective for the IDT<sup>TM</sup> Interprise<sup>TM</sup> RC32438 integrated communications processor, the newest member of the Interprise<sup>TM</sup> family of integrated communications processors. It describes the product's architectural underpinnings and how they offer engineers an optimized combination of features leading to lower-cost, higher-performing communications system designs.

# Market Environment

Every distinct category of electronic components is preceded by a need that evokes it, and this is no less true for integrated communications processors. As digital data networking has evolved, the ingenuity of semiconductor designers and the economies afforded by innovative integration have been the enablers that paved the way to continuous improvements in communications systems' price/performance ratios.

Digital data networking innovations (bridging, routing, and switching) are the foundation of today's fast, efficient, robust enterprise and public networks, and the innovative semiconductor devices developed to enable them have been critical to their success. But the need for greater efficiencies and speeds continues unabated, and the development of more refined and innovative enabling devices goes on. It is in this context that one should examine the new IDT RC32438 integrated communications processor. The choices of integrated subsystems and their relationships to one another make the most sense when viewed with an eye to changing network needs.

# **Changing Needs, Changing Features**

Even with the advantages provided by bridging, switching, and routing, the original 10BASE-T Ethernet standard, with its maximum bandwidth of 10 Mbps, has proven woefully inadequate as networks have grown larger and more complex. As a result, new Ethernet standards have emerged that provide order-of-magnitude increases in bandwidth: 100BASE-T supports 100 Mbps, and Gigabit Ethernet (GbE) handles 1 Gbps. By ensuring that fundamental compatibilities remained unchanged, the framers of these standards made it easy for network planners and implementers to upgrade, as necessary, to maintain adequate levels of performance.

At the same time, though, the semiconductor components that support switches for the higher-performance Ethernet standards have had to reflect those increases in speed and bandwidth so as not to become critical-path limiting factors. For integrated processors, this means that the role of the embedded core processor has changed. The tasks it had to perform at 100BASE-T speeds must now be done ten times faster at GbE speeds.
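The tenfold step between Ethernet generations can be made concrete with a little wire-rate arithmetic. The sketch below uses the standard Ethernet framing overheads (8 bytes of preamble and start-of-frame delimiter, a 64-byte minimum frame, and a 12-byte interframe gap) to compute how many minimum-size frames one port can carry per second and how much time each frame occupies. The function names are illustrative. The figures describe the wire, not the load on the embedded core, since in a switch the fabric forwards most frames.

```python
# Wire-rate arithmetic for minimum-size Ethernet frames (one port, one direction).
PREAMBLE_BYTES = 8         # preamble plus start-of-frame delimiter
MIN_FRAME_BYTES = 64       # minimum Ethernet frame, including the FCS
INTERFRAME_GAP_BYTES = 12  # mandatory idle time between frames

def bits_on_wire(frame_bytes: int = MIN_FRAME_BYTES) -> int:
    """Bits of link time consumed by one frame and its overheads."""
    return (PREAMBLE_BYTES + frame_bytes + INTERFRAME_GAP_BYTES) * 8

def frames_per_second(link_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Maximum frame rate the link can carry."""
    return link_bps / bits_on_wire(frame_bytes)

def frame_time_ns(link_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Link time occupied by one frame, in nanoseconds."""
    return bits_on_wire(frame_bytes) / link_bps * 1e9

if __name__ == "__main__":
    for name, bps in (("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9)):
        print(f"{name:>8}: {frames_per_second(bps):12,.0f} frames/s, "
              f"{frame_time_ns(bps):9,.0f} ns per frame")
```

Each step up in link speed cuts the per-frame time by a factor of ten, which is the pressure the rest of this backgrounder responds to.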

Furthermore, the embedded processor's role in Layer-2 (MAC) switching becomes more complex as the device takes on the higher-layer IP statistics and routing table chores associated with Layer-3 (IP) switching. The integrated communications processor's relationship to the external buffer memory subsystem has to ensure that adequate memory bandwidth is available to support the much-increased data transfer speed. And, as more and diverse peripheral components are integrated into the communications system mix, the integrated communications processor has to optimize PCI throughput to avoid adding unnecessary latency to a system where every microsecond counts.

## **Processor Speed and System Speed**

As we learned so clearly with personal computers, processor and systems speeds are not linearly dependent. As microprocessor speeds increased, with no change in memory characteristics, we ended up with increases in "wait" states rather than increases in system throughput. The same is true for communications systems. As embedded core processor speeds are increased, the system speed gains may be illusory unless they are accompanied by concomitant changes in other factors. We gain improvements in system speed by virtue of increased processor speeds and changes to other aspects of the architecture.
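A simple timing model illustrates why a faster clock alone does not scale system speed. The sketch below splits run time into a compute part that shrinks with clock frequency and a memory part that does not. All workload numbers (cycle count, number of memory accesses, memory latency) are hypothetical values chosen only to show the shape of the effect.

```python
def run_time_us(cpu_mhz: float, compute_cycles: int,
                memory_accesses: int, memory_latency_ns: float) -> float:
    """Total run time: compute scales with the clock, memory stalls do not."""
    compute_us = compute_cycles / cpu_mhz               # cycles / (cycles per us)
    memory_us = memory_accesses * memory_latency_ns / 1000.0
    return compute_us + memory_us

def speedup(slow_mhz: float, fast_mhz: float, compute_cycles: int,
            memory_accesses: int, memory_latency_ns: float) -> float:
    """Ratio of run times when only the CPU clock changes."""
    slow = run_time_us(slow_mhz, compute_cycles, memory_accesses, memory_latency_ns)
    fast = run_time_us(fast_mhz, compute_cycles, memory_accesses, memory_latency_ns)
    return slow / fast

if __name__ == "__main__":
    # Hypothetical workload: 100,000 compute cycles and 1,000 memory accesses
    # that each stall for 100 ns regardless of the CPU clock.
    print(f"no memory stalls:   {speedup(133, 266, 100_000, 0, 100):.2f}x")
    print(f"with memory stalls: {speedup(133, 266, 100_000, 1_000, 100):.2f}x")
```

With no memory stalls, doubling the clock doubles throughput. With the stalls included, the gain in this example falls to roughly 1.8x, and it shrinks further as the memory share of the workload grows.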

## Wireless Networks, SANs and Gateways

In addition to changes in performance, networks are evolving along diverse functional paths. Wireless access, for example, enables user systems to be moved around within a facility without requiring changes in network wiring and port addressing. Wireless access points (WAPs) form the hub of such network adjuncts, and as such must accommodate an increasingly diverse mix of attached peripherals. Thus, advanced WAP design depends upon optimal PCI throughput to make sure the advantage of location flexibility is not offset by disadvantages from increased system latency. WAP designs must also accommodate the higher-speed Ethernet standards, and thus demand the increased memory bandwidth of their wired switching counterparts.

Storage area networks (SANs) use network technology to make file storage and retrieval more simple and efficient despite increasingly more complex physical storage configurations. Enterprise gateways do the same for increasingly common interplay of WANs, LANs, and the Internet. In both cases (SANs and gateways), there is increased pressure on both PCI and memory bandwidth.

## IDT Interprise™ PCI RC32438 Integrated Communications Processor

At the highest level, the IDT<sup>TM</sup> Interprise<sup>TM</sup> PCI RC32438 integrated communications processor looks very similar to the popular family of IDT RC3233x integrated processors. All of these devices feature a 32-bit MIPS instruction set architecture (ISA) core, a memory controller, and a 32-bit PCI v2.2-compatible interface.

The similarity is by design because the balanced architecture of the earlier IDT integrated communications processors is in great part responsible for the company's leadership in the managed Layer-2 Ethernet switch area, having a presence in products from three of the five leading Ethernet switch vendors. However, the RC32438 device drives performance in the core, memory controller, and PCI interface, through its additional innovative features.

As noted in Figure 1 below, the RC32438 device features the 32-bit 4Kc CPU core from MIPS Technologies, a high-performance double data rate (DDR) memory controller, and a v2.2 PCI interface. In addition to these modules, the RC32438 processor also includes two Ethernet interfaces and a number of general-purpose peripherals required by embedded applications. All of these I/O peripherals communicate to the CPU core and the external memory via the on-chip bus, known as the IPBus.




Figure 1: RC32438 Block Diagram.

## A Core Difference

The architecture of the IDT Interprise<sup>™</sup> PCI RC32438 integrated communications processor extends the innovative designs of the RC32332 and RC32334 processors. As mentioned above, one of the differences between the RC32438 integrated communications processor and its family predecessors lies within its CPU core. Use of a MIPS CPU core makes it easy for users of one of the RC3233x devices to create upgraded platforms and applications while taking advantage of their code and experience investments. The RC32438 device's CPU core is able to handle the added tasks and performance requirements of these more challenging applications. This performance improvement is provided through faster clock speeds (200, 233, and 266 MHz versions) and larger (16 KB) 4-way set-associative instruction and data caches. The larger caches reduce the number of primary cache misses, and thus the delay associated with those misses. As a result, the new CPU core has the horsepower to handle any combination of managed Layer-2 switching chores, with more than enough capability left over for the additional tasks imposed by Layer-3 switching requirements.
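To show what "16 KB, 4-way set-associative" means in practice, the sketch below computes the cache geometry and splits an address into tag, set index, and byte offset. The 16-byte line size is an assumption made for illustration and is not stated in this backgrounder; the function names are likewise illustrative.

```python
CACHE_BYTES = 16 * 1024   # from the text: 16 KB per cache
WAYS = 4                  # from the text: 4-way set-associative
LINE_BYTES = 16           # ASSUMPTION: line size is not given in this document

def cache_sets(size_bytes: int = CACHE_BYTES, ways: int = WAYS,
               line_bytes: int = LINE_BYTES) -> int:
    """Number of sets: total lines divided by the lines per set (ways)."""
    return size_bytes // (ways * line_bytes)

def decompose(address: int, size_bytes: int = CACHE_BYTES, ways: int = WAYS,
              line_bytes: int = LINE_BYTES) -> tuple:
    """Split an address into (tag, set index, byte offset within the line)."""
    sets = cache_sets(size_bytes, ways, line_bytes)
    offset = address % line_bytes
    index = (address // line_bytes) % sets
    tag = address // (line_bytes * sets)
    return tag, index, offset

if __name__ == "__main__":
    print("sets:", cache_sets())
    # Addresses one way-size (4 KB here) apart land in the same set with
    # different tags; a 4-way cache can hold four such lines at once.
    for addr in (0x0000, 0x1000, 0x2000, 0x3000):
        print(hex(addr), decompose(addr))
```

Under these assumptions, four addresses that collide on the same set can all stay resident, where a direct-mapped cache of the same size would evict on every collision.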

### The DDR Difference

As was noted earlier, increased processor speed alone does not guarantee increased system throughput. Thus, the IDT designers built in a double data rate (DDR) memory controller. Use of a DDR-based external memory subsystem and controller provides two benefits to the system design: an immediate doubling of available memory bandwidth compared to an SDRAM-based subsystem of similar width, and an overall cost savings due to the wholesale shift by PC makers toward DDR memory. In this regard, a x16 DDR configuration is becoming the "sweet spot" of the PC memory curve.

As the memory controller developed by IDT can support either x16 or x32 memory configurations, a user can select a configuration (e.g. x16) where the cost savings are greatest, or one (e.g. x32) for applications where optimal performance requires a wider interface to main memory.
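The bandwidth trade-off between these configurations follows from a single formula: peak bandwidth equals bus width in bytes, times clock rate, times transfers per clock (one for single data rate SDRAM, two for DDR). The sketch below applies it using a hypothetical 133 MHz memory clock; this backgrounder does not state the actual memory clock.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float,
                        transfers_per_clock: int) -> float:
    """Peak memory bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return bus_width_bits / 8 * clock_mhz * transfers_per_clock

if __name__ == "__main__":
    CLOCK_MHZ = 133  # HYPOTHETICAL memory clock, for illustration only
    configs = (
        ("SDRAM x32", 32, 1),
        ("DDR   x16", 16, 2),
        ("DDR   x32", 32, 2),
    )
    for name, width, rate in configs:
        print(f"{name}: {peak_bandwidth_mb_s(width, CLOCK_MHZ, rate):7.0f} MB/s")
```

At equal clock rates, a x16 DDR interface matches the peak bandwidth of a x32 single data rate interface with half the data pins, and a x32 DDR interface doubles it.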

The DDR memory controller subsystem features de-multiplexed address and data buses that are separate from the buses used to connect to local memory and I/O devices. The logic is simple. It makes no sense to support the bandwidth advantages of DDR only to mitigate those advantages through multiplexed address and data buses.

The architecture is shown in Figure 2. Importantly, the IPBus operates at one half the speed of the CPU pipeline, while the PM bus operates at CPU pipeline speed. The interface to the IPBus includes a large amount of buffering. As a result, when an IPBus-based peripheral is bursting at maximum rate to the external memory, the CPU core is still able to achieve significant bandwidth to the memory to perform data processing functions.



Figure 2: RC32438 CPU Core Architecture

With the PM bus operating at CPU pipeline speed, and the IPBus operating at half that speed, the architecture supports full data bursting while permitting the CPU core to maintain a high CPU-to-memory bandwidth.

#### PCI to its Limits ... and Beyond

The peripheral component interconnect (PCI) bus is an industry standard, with its popularity driven by its proliferation in the PC market segment. As a result, a broad range of low-cost networking, storage, and graphics peripherals support the PCI interface. As with processor speeds and system speeds, PCI performance specs do not guarantee that the system design will deliver its maximum throughput, because throughput depends on the way other design elements relate to PCI.

The designers at IDT have ensured that the RC32438 integrated communications processor takes full advantage of PCI v2.2. The interface supports the bus-arbitration logic to control the way six external bus masters can request and be granted ownership of the bus. Specifically, the arbiter allows these external devices to access the bus in a fixed or a rotating priority scheme. That way, system designers can "tune" PCI performance to match the actual bus access and data traffic patterns of their specific systems and peripherals. For added flexibility, the RC32438 processor offers host and satellite PCI mode support, and synchronous or asynchronous PCI clock domains.
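The difference between fixed and rotating priority can be seen in a small behavioral model. The sketch below is a generic illustration of the two schemes for six requesting masters, not a description of the RC32438 arbiter's actual logic or registers; the class and function names are hypothetical.

```python
from typing import List, Optional, Sequence

class FixedPriorityArbiter:
    """Lowest-numbered requesting master always wins."""
    def __init__(self, num_masters: int = 6) -> None:
        self.num_masters = num_masters

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        for master in range(self.num_masters):
            if requests[master]:
                return master
        return None

class RotatingPriorityArbiter:
    """After each grant, the search starts just past the last winner."""
    def __init__(self, num_masters: int = 6) -> None:
        self.num_masters = num_masters
        self.next_first = 0

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        for step in range(self.num_masters):
            master = (self.next_first + step) % self.num_masters
            if requests[master]:
                self.next_first = (master + 1) % self.num_masters
                return master
        return None

def grant_counts(arbiter, requests: Sequence[bool], cycles: int) -> List[int]:
    """Count grants per master when the same requests are held every cycle."""
    counts = [0] * len(requests)
    for _ in range(cycles):
        winner = arbiter.grant(requests)
        if winner is not None:
            counts[winner] += 1
    return counts

if __name__ == "__main__":
    # Masters 0, 2, and 5 request the bus continuously for 12 arbitration cycles.
    busy = [True, False, True, False, False, True]
    print("fixed:   ", grant_counts(FixedPriorityArbiter(), busy, 12))
    print("rotating:", grant_counts(RotatingPriorityArbiter(), busy, 12))
```

Fixed priority suits a system with one latency-critical master, at the risk of starving the others. Rotating priority shares the bus evenly among the masters that are requesting it.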

The RC32438 processor's PCI bus interface is optimized for bursting data and designed to provide 160 MB/s sustained throughput. Innovative use of the four 256-byte registers and data buffering permits this continuous bursting even while the CPU is frequently grabbing the data it needs. Additional information about the PCI optimization of IDT's RC32438 processor can be found in the IDT white paper, *Getting The Most Out Of PCI*.
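For context, the 160 MB/s figure can be compared with the theoretical peak of a 32-bit PCI bus, which moves 4 bytes per clock. This backgrounder does not state the PCI clock rate, so the sketch below evaluates both nominal rates defined by PCI v2.2 (33 MHz and 66 MHz) as assumptions; the function names are illustrative.

```python
def pci_peak_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak: one data phase per clock, no overhead."""
    return bus_width_bits / 8 * clock_mhz

def required_efficiency(sustained_mb_s: float, bus_width_bits: int,
                        clock_mhz: float) -> float:
    """Fraction of the theoretical peak needed to sustain the target rate."""
    return sustained_mb_s / pci_peak_mb_s(bus_width_bits, clock_mhz)

if __name__ == "__main__":
    SUSTAINED_MB_S = 160  # from the text
    for clock_mhz in (33, 66):  # ASSUMPTION: nominal PCI v2.2 clock rates
        peak = pci_peak_mb_s(32, clock_mhz)
        share = required_efficiency(SUSTAINED_MB_S, 32, clock_mhz)
        print(f"{clock_mhz} MHz: peak {peak:.0f} MB/s, "
              f"160 MB/s is {share:.0%} of peak")
```

Under these assumptions, 160 MB/s exceeds the 132 MB/s peak of a 32-bit, 33 MHz bus, so the figure implies operation at the 66 MHz rate, where it corresponds to roughly 60 percent of the theoretical peak.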

#### **Ethernet On-Chip**

One could offer an integrated communications processor without on-chip Ethernet and leave it to the system designer to hang Ethernet ports on the PCI bus. However, while it makes the integrated communications processor easier to design and produce, it adds to the system bill of materials, and degrades overall performance by introducing inter-chip and bus delays. For this reason, IDT elected to put two Ethernet MII interfaces on this new communications processor.

With the on-chip Ethernet interfaces, the IDT processor creates less PCI bus traffic than a design where external Ethernet ports are required. Additionally, overall system performance improves as the Ethernet ports leverage on-chip resources such as dedicated direct memory access (DMA) channels. There are two 512-byte transmit/receive FIFOs supporting each MAC address, and the RC32438 processor includes address recognition logic for checking incoming addresses. The net result is that the system designer has less to do, has fewer parts to buy, and gets an efficient on-chip Ethernet capability.

#### **Direct Memory Access**

When DMA was first introduced, it was a near revolution in microcomputer design. As the CPU core did not need to handle every memory transaction, system throughput grew geometrically. Now nearly three decades old, DMA's benefits may be a bit unsung, but its contribution to system throughput cannot be overstated. The RC32438 integrated communications processor features 10 dedicated DMA channels. Two of these are for use by the PCI controller, two are for each of the Ethernet ports, two handle memory-to-memory transfers, and two support transfers between the RC32438 device and intelligent peripherals located on the local bus interface.

The RC32438 processor offers "fly-by" DMA on all 10 channels, and requires no core processor storing or pass-through operations. As a result, the system throughput is enhanced, and the core CPU has the time to handle the tasks it should be doing.
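The channel allocation described above can be tallied as a quick consistency check. The sketch below simply restates the allocation from the text; the dictionary keys, including the "port 0" and "port 1" labels, are illustrative names rather than IDT terminology.

```python
# DMA channel allocation as described in the text (labels are illustrative).
DMA_CHANNELS = {
    "PCI controller": 2,
    "Ethernet port 0": 2,
    "Ethernet port 1": 2,
    "memory-to-memory": 2,
    "local bus peripherals": 2,
}

def total_channels(allocation: dict) -> int:
    """Sum the channels assigned to every client."""
    return sum(allocation.values())

if __name__ == "__main__":
    for client, count in DMA_CHANNELS.items():
        print(f"{client:<22} {count}")
    print(f"{'total':<22} {total_channels(DMA_CHANNELS)}")
```

Two channels for each of the five clients accounts for all 10 dedicated channels.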

#### **Monitoring System-Level Activity**

As stated earlier, overall system performance is a combination of processor speed in balance with other on-chip design factors. Every embedded application will have a different profile of PCI, Ethernet, and memory traffic. The RC32438 device gives the system designer an opportunity to tune the device to maximize system performance for a specific application. It does so by providing access, via the on-chip in-circuit emulator (EJTAG) interface, to on-chip logic that monitors and records the transactions on the IPBus.

At the highest level, the IPBus monitor logic can be thought of as an on-chip logic analyzer that is monitoring the transactions occurring on the on-chip bus. Logically, there are two parts to this:

- 1. On-chip memory, configured as a circular buffer. Transactions are continually recorded in the memory. Once full, the information related to the subsequent bus transaction will overwrite entries sequentially starting with the first entry.
- 2. A series of registers that:
  - a. Configure the trigger condition on which a data record "trace" will start and complete, and
  - b. Allow an end user to access the data that has been recorded. These registers can be accessed via the EJTAG interface described above or directly by software.

Two types of data records are captured in the memory:

- 1. A clock cycle record is stored during each clock cycle of a transaction
- 2. A summary of a transaction is recorded at the end of each transaction

Upon completion of a specific trace, a bit is set in the interrupt control register that in turn generates an interrupt to the CPU core. The contents of the on-chip memory and several registers of the IPBus monitor module are preserved if the device experiences a warm reset.
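The circular-buffer behavior described above can be modeled in a few lines. The sketch below is a generic software analogue of the recording scheme, assuming a small buffer of arbitrary entries; it does not represent the IPBus monitor's actual record format, capacity, or register interface, and the class name is hypothetical.

```python
class TraceBuffer:
    """Fixed-size circular buffer: once full, new entries overwrite the
    oldest ones, starting again from the first slot."""

    def __init__(self, capacity: int) -> None:
        self.entries = [None] * capacity
        self.next_slot = 0   # slot the next record will be written to
        self.recorded = 0    # total records written since creation

    def record(self, entry) -> None:
        self.entries[self.next_slot] = entry
        self.next_slot = (self.next_slot + 1) % len(self.entries)
        self.recorded += 1

    def snapshot(self) -> list:
        """Return the surviving records ordered from oldest to newest."""
        if self.recorded < len(self.entries):
            return self.entries[:self.recorded]
        return self.entries[self.next_slot:] + self.entries[:self.next_slot]

if __name__ == "__main__":
    trace = TraceBuffer(capacity=4)
    for transaction in range(6):      # six transactions into four slots
        trace.record(transaction)
    print("raw slots:", trace.entries)
    print("in order: ", trace.snapshot())
```

The appeal of this structure for a trace is that the buffer always holds the most recent activity, so stopping the recording on a trigger condition preserves the transactions that led up to it.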

In terms of hardware/software integration support, the unique contribution of the IPBus monitor is a critical success factor in meeting time-to-design and optimal performance requirements.

#### Clocking

The RC32438 processor was also designed to simplify the design of the system clocking tree for the hardware designer. This is accomplished in two ways:

- 1. A series of internal PLLs are used to provide the clock signals needed to drive the various interfaces. This simplifies board layout and also helps to reduce EMI, as the high-frequency circuitry is incorporated inside the device. In fact, despite the CPU core being able to operate at up to 266 MHz, the clock signal required to drive the part can be as low as 25 MHz.
- 2. The RC32438 device can provide the clock signals for the external buses connected to it, including the DDR bus and the local memory bus. For synchronous PCI bus operation, the local bus can be connected to the PCI clocking tree to drive this interface. This reduces external chip count and again simplifies board layout.

#### Small, Fast, and Low-Power Consuming

The IDT RC32438 integrated communications processor is built using a 0.13-micron CMOS process and uses a six-layer metal interconnect for optimal device density. The CPU core requires a 1.2-volt supply, the DDR memory subsystem uses 2.5 volts, and the I/O ring is designed to operate at 3.3 volts. Worst-case power consumption is limited to 2.5 watts. Offered in a 416-ball BGA package, the RC32438 integrated communications processor is designed to take up minimal size, provide highest performance, and consume minimal power.

## The RC32438 Integrated Communications Processor Development Environment

A processor, by definition, contains some level of programmability. The IDT RC32438 integrated communications processor is an embedded controller, where the control program is often loaded to the processor from a firmware device (e.g. a boot ROM). The process of developing the program and ensuring that it does everything the developer intended for it to do in the context of the system under design is a critical part of the overall system design.

IDT pays particular attention to the development environment for its products. In this case, by choosing the MIPS 4Kc CPU core, IDT ensured that users would have a standard MIPS core supported by a variety of available third-party tools including compilers and debugging software, as well as a choice in real-time operating systems (RTOS).

There is a wide range of RTOS solutions in the market today. However, two of them together account for the vast majority of today's embedded communications applications: VxWorks from Wind River Systems, and Linux, available from several software companies. The RC32438 integrated communications processor supports both of these platforms. In addition, a wide range of other software solutions has already been proven to work in conjunction with this popular CPU core, including ThreadX (Green Hills), Nucleus (Mentor) and Neutrino (QNX).

The RC32438 integrated communications processor's on-chip IPBus monitor logic is a major tool for simplifying and speeding up hardware/software integration. Users have a choice of several third-party in-circuit emulation (ICE) tools to use in conjunction with the IPBus monitor for swift, effective hardware/software integration.

Prior to actual design work, system engineers can evaluate the RC32438 integrated communications processor's design suitability by use of an IDT evaluation board. Similarly, a software developer can begin creating portions of the control program and exercise them on the evaluation board before the actual system-under-design is ready for debug and ICE activities.

#### A Natural Upgrade Path To More Advanced System Designs

There are several trends apparent in today's communications system designs. There is adherence to networking standards, such as TCP/IP and Ethernet, even as system makers strive to increase performance and bandwidth, inevitably leading to higher OSI layer management schemes.

Though today's managed Layer-2 Ethernet switches continue to do a formidable job of keeping network transport efficiencies high, other evolving network requirements are pushing for managed Layer-3 and higher management schemes.

The innovative silicon enablers that have paved the way to today's managed Layer-2 systems have created an indelible architecture that can serve as a point of departure for the silicon enablers to come. The RC32438 integrated communications processor is a perfect case in point. It closely resembles its RC32332 and RC32334 processor counterparts, yet offers innovative nuances that make for a natural upgrade vehicle to Layer-3 switches, enterprise gateways, WAPs and other advanced systems.

## The RC32438 Device Applied

In the block diagrams below, the RC32438 communications processor is shown in the context of a GbE switch, an Enterprise VPN, and a WAP application. In each case, note how the device's features are well suited to the functional and performance demands of these three advanced communications systems.

The GbE switch shown in Figure 3 takes advantage of the RC32438 processor's PCI interface to connect to the switch fabrics. The PCI interface acts as the control path between the processor and the switching subsystem, performing switch initialization, statistics, and certain routing functions. As most of today's gigabit switch fabrics support a smaller number of LAN connections, the arbitration logic in the PCI block needs to be capable of supporting more PCI bus masters. In the example below, four 12-port switch fabrics are required to construct a 48-port gigabit switch, compared to the two 24-port switch fabrics used for fast Ethernet systems. In addition, as the LAN speeds increase to gigabit rates, there is a need for a separate management port to control the system, unlike fast Ethernet platforms where one of the LAN ports on the switch is used.
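The fabric count in this example is a ceiling division, and it can be checked against the six external bus masters that the PCI arbiter supports. The sketch below assumes, for illustration, that each switch fabric acts as one PCI bus master; the function names are hypothetical.

```python
import math

MAX_EXTERNAL_MASTERS = 6  # from the text: the arbiter handles six external masters

def fabrics_needed(total_ports: int, ports_per_fabric: int) -> int:
    """Number of switch fabrics required to provide the given port count."""
    return math.ceil(total_ports / ports_per_fabric)

def fits_arbiter(fabrics: int, max_masters: int = MAX_EXTERNAL_MASTERS) -> bool:
    """ASSUMPTION: each fabric occupies one external bus master slot."""
    return fabrics <= max_masters

if __name__ == "__main__":
    for label, per_fabric in (("gigabit, 12-port fabrics", 12),
                              ("fast Ethernet, 24-port fabrics", 24)):
        count = fabrics_needed(48, per_fabric)
        print(f"{label}: {count} fabrics, fits arbiter: {fits_arbiter(count)}")
```

A 48-port gigabit design doubles the number of PCI masters relative to the fast Ethernet case while still leaving arbiter slots free for other devices.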



Figure 3: Gigabit Ethernet Application Example

In the Enterprise VPN application depicted in Figure 4, the RC32438 processor's dual Ethernet MIIs offer separate interfaces for the public Internet network and the trusted engineering network.



Figure 4: Enterprise VPN Application Example



The WAP application, shown below in Figure 5, takes advantage of these same features plus the on-chip Ethernet MIIs. The PCI interface supports a variety of connected peripherals and the Ethernet MII supports a 4-port Ethernet hub.



Figure 5: Wireless Access Point Application Example

Instead of reinventing the wheel, IDT has built upon the foundation established by the earlier members of its integrated communications processor family, adding and refining the features that make the new device an ideal candidate for higher-performance and more complex applications.

In the process, every design decision was predicated on preserving the code and experience equity of system designers who have used the earlier products, and on expanding the development environment to add even more third-party resources to the mix.
