The high-performance RX CPU architecture
Combining RISC and CISC elements to meet impressive design goals

The design goals we established for the MCUs in the RX family were very aggressive, to say the least. Using existing Renesas MCUs as a reference, our basic R&D objectives included:
- Five times the maximum operating frequency
- Twice the processing performance (in terms of MIPS/MHz)
- A 30% increase in code efficiency
- A 2/3 cut in power consumption
Quantitative goals included:
- A maximum operating frequency of 200MHz
- Processing performance of 1.65MIPS/MHz
- A CPU current of 0.03mA/MHz
- Larger flash memory capacity: up to 4MB
These design goals have been met. Achieving these levels of performance and capability required a different architectural approach: neither a conventional RISC nor a conventional CISC design would suffice. The innovative Renesas solution was to combine the advantages of the high-speed RISC CPU designs refined in our SuperH family with those of the flexible, code-efficient CISC designs nurtured in our H8S/H8SX and M16C/R32C families. Specifically, to create a next-generation CPU architecture, the RX design team combined the general-purpose register machine, Harvard architecture, and 5-stage pipeline of RISC with the byte variable-length instructions of CISC (see Figure 2-1, below). This clever design approach was made possible by building on decades of MCU design experience and applying a large library of accumulated IP.

Fig. 2-1. Basic Features of RX CPU Architecture
The process of developing the RX architecture involved many steps. A very important one was finding the best combination of RISC and CISC elements. To do that, the design team ran benchmark tests using application software from each of the devices' key target markets, especially the office automation, consumer, industrial, and automotive fields. After analyzing the test data, the engineers identified the optimal balance of performance and code efficiency and applied the corresponding architectural elements to the RX CPU. A major design decision was to increase the number of 32-bit general-purpose registers to 16, which struck a good balance between hardware overhead and performance (see Figure 2-2, below).

Fig. 2-2. Optimization of the Number of Registers
・Using general-purpose registers gives better performance for both arithmetic and control operations.
・With only 8 registers, performance drops and code size increases because registers must be saved and restored frequently.
・Hardware size, and the number of bits needed to specify a register within the instruction code, rise as the number of registers increases.
In the area of basic instructions and addressing modes, the team reduced both the number of instructions and code size. They accomplished this by extracting frequently used instructions and addressing modes and assigning them to abbreviated formats that combine the strengths of RISC and CISC (see Figure 2-3, below). The engineers also enhanced the addressing modes to make table operations more efficient.

Fig. 2-3. Instruction Set Code Assignment
・Byte variable-length instructions are used, and frequently used instructions are assigned short instruction codes.
・Frequently used instructions are extracted from real application software.
・The number of instructions is reduced by adding addressing modes and adopting a 3-operand format.
・Benchmark tests with various application software showed a 30% reduction in program size compared to existing products.

Besides improving basic CPU performance based on the benchmark results, the RX design team improved various circuit and functional elements in the CPU core. For example, for efficient high-speed interrupt processing they chose a register-assignment approach rather than banks of registers (see Figure 2-4, below). This keeps the entire register set available as general-purpose registers, enables fast interrupt processing, and allows registers to be assigned optimally, which also gives users more flexibility.
Fig. 2-4. Faster Interrupts through Register Allocation
・The expanded set of general registers allows some of them to be allocated for dedicated use by interrupts* to increase speed.
For the pipeline structure, the Renesas engineers used two methods to raise the RX CPU's maximum processing speed. First, they used a Harvard architecture so that instruction fetches and data accesses can proceed in parallel. Second, they made effective use of the 5-stage pipeline and applied out-of-order completion, so that subsequent instructions that do not depend on a waiting instruction can still execute while the pipeline waits (see Figures 2-5 and 2-6, below).

Fig. 2-5. Five-Stage Pipeline
・Makes use of a 5-stage pipeline structure and supports speeds up to 200MHz.
・Benchmark tests with various application software showed over twice the processing performance and a 30% improvement in code efficiency compared to existing products.

Fig. 2-6. Out-of-Order Completion
・Uses out-of-order completion to execute instructions efficiently at high speed.

Fig. 2-7. Benchmark Test Results for RX
With a multiply instruction, a hardware divider, two types of multiply-and-accumulate instructions (memory-to-memory and register-to-register), single-precision floating-point instructions, and optional memory protection, all with single-cycle access, the architecture satisfies the differing performance requirements of each application and user.

Renesas is an industry leader in supplying MCUs with highly reliable on-chip flash memory and emphasizes the benefits of this technology in embedded system applications. For the RX architecture, the design team fully leveraged Renesas' strengths in 90nm process and flash technologies. They implemented up to 4MB of on-chip flash ROM that can supply instructions with no wait states at speeds up to 100MHz. The engineers achieved high speed with a MONOS* cell structure and low power consumption through their choice of logic voltage.
*MONOS: Metal-Oxide-Nitride-Oxide-Silicon

Together, these design ideas delivered performance of 1.65MIPS/MHz, a tremendous leap from the level of our existing products. We will continue to develop, release, and support products that respond to the evolution of applications and user needs.