Improved instruction execution performance supports more complex and composite devices.

Real-time performance of SH-2A

The SH-2A improves computational performance by adding a division instruction, bit-manipulation instructions, and other new instructions, as well as an additional addressing mode.


Two-way superscalar

By executing up to two instructions simultaneously, the SH-2A achieves better performance at the same clock frequency. A total of seven pipeline operations are available (two integer operations, one memory access, one branch, one multiplier, one shift, and one FPU), and the SH-2A can execute up to two of them at the same time.
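As a rough illustration of the dual-issue idea described above, the following Python sketch counts issue cycles for an instruction stream. It is a toy model, not the actual SH-2A pairing rules: the unit names, the `count_issue_cycles` function, and the rule that only unit availability limits pairing are all assumptions made for this example. Data dependencies and multi-cycle latencies are ignored.

```python
from collections import Counter

# Pipeline operations and how many of each exist, as listed in the text above.
# The short names are hypothetical labels chosen for this sketch.
UNITS = {"int": 2, "mem": 1, "branch": 1, "mul": 1, "shift": 1, "fpu": 1}
ISSUE_WIDTH = 2  # two-way superscalar: at most two instructions per cycle


def count_issue_cycles(ops):
    """Return the cycles needed to issue `ops` in program order.

    Assumption: an instruction can issue alongside the previous one only if
    a pipeline of the kind it needs is still free in the current cycle.
    """
    cycles = 0
    i = 0
    while i < len(ops):
        used = Counter()
        issued = 0
        while i < len(ops) and issued < ISSUE_WIDTH:
            unit = ops[i]
            if used[unit] >= UNITS[unit]:
                break  # needed pipeline is busy; wait for the next cycle
            used[unit] += 1
            issued += 1
            i += 1
        cycles += 1
    return cycles


stream = ["int", "mem", "int", "int", "mul", "mul"]
print(count_issue_cycles(stream))  # 4 cycles, versus 6 if issued one at a time
```

In this model the two multiplies cannot pair because there is only one multiplier, while the two integer operations can, which is why the six-instruction stream takes four cycles rather than three.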

Delayed branch instruction

With a delayed branch instruction, the branch takes effect only after the instruction immediately following the delayed branch instruction has executed. This minimizes disruption of the pipeline and reduces branch overhead.
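The execution order described above can be sketched with a small interpreter. This is a simplified model for illustration only: the tuple-based program format, the `run` function, and the instruction names are hypothetical and do not correspond to real SH-2A encodings. The sketch also assumes the instruction after a delayed branch is never itself a branch.

```python
def run(program, max_steps=100):
    """Execute a toy program and return the order in which instructions ran.

    Each instruction is a tuple: ("op", label) for ordinary work, or
    ("bra", target_index) for a delayed branch.
    """
    trace = []
    pc = 0
    pending_target = None  # set while a delayed branch is waiting to take effect
    steps = 0
    while pc < len(program) and steps < max_steps:
        kind, arg = program[pc]
        steps += 1
        if kind == "bra":
            trace.append(f"bra->{arg}")
            pending_target = arg
            pc += 1  # fall through to the following instruction first
            continue
        trace.append(arg)
        if pending_target is not None:
            pc = pending_target  # the branch takes effect only now
            pending_target = None
        else:
            pc += 1
    return trace


program = [
    ("op", "A"),
    ("bra", 4),        # delayed branch to index 4
    ("op", "slot"),    # still executes before the branch takes effect
    ("op", "skipped"),
    ("op", "target"),
]
print(run(program))  # ['A', 'bra->4', 'slot', 'target']
```

Because the instruction after the branch does useful work instead of being discarded, the pipeline has no wasted step at the branch in this model.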

Minimum interrupt response time of 6 clock cycles enables faster mechanical control.

Improves interrupt response time by using dedicated register banks

To improve interrupt response time, the SH-2A not only runs the CPU at a higher clock frequency but also employs a new configuration. Usually, software at the start of an interrupt routine saves the general registers that the handler will use to stack memory before the interrupt processing itself begins. The SH-2A instead provides register banks, an architecture in which hardware, not software, saves these registers. Saving to the register bank proceeds in parallel with interrupt exception handling, so a very fast interrupt response time is achieved.
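The difference between the two approaches can be expressed as simple cycle arithmetic. The sketch below is a back-of-the-envelope model, not a timing specification: the function name and every numeric value in the usage example are hypothetical, and the only property taken from the text is that the hardware save overlaps with exception handling and therefore adds no cycles of its own in this model.

```python
def cycles_until_handler_body(exception_cycles, regs_to_save,
                              cycles_per_push, hardware_bank):
    """Estimate cycles from interrupt acceptance to the first useful handler work.

    exception_cycles: cycles spent on interrupt exception handling itself.
    regs_to_save:     number of general registers the handler will use.
    cycles_per_push:  cycles software needs to push one register to the stack.
    hardware_bank:    True if registers are saved to a bank in parallel.
    """
    if hardware_bank:
        # The save overlaps with exception handling in this model.
        return exception_cycles
    # Software pushes each register after exception handling completes.
    return exception_cycles + regs_to_save * cycles_per_push


# Illustrative values only; they are not taken from a datasheet.
software = cycles_until_handler_body(10, regs_to_save=8, cycles_per_push=1,
                                     hardware_bank=False)
banked = cycles_until_handler_body(10, regs_to_save=8, cycles_per_push=1,
                                   hardware_bank=True)
print(software, banked)  # 18 10
```

The saving grows with the number of registers the handler needs, which is why the benefit is largest for handlers that do substantial work.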

Adopts an instruction set that allows ROM size to be reduced.

Upward-compatible instruction set

All CPU instructions have a fixed 16-bit length, chosen on the basis of an analysis of instruction execution frequency in typical embedded applications. This keeps ROM size compact.

SH-2A further improves code efficiency

SH-2A adds 32-bit long instructions, further improving code efficiency.

SH-2A further improves performance and reduces ROM size.

The SH-2A instruction set is an extended version of the SH-2 instruction set and is upward-compatible with it at the object-code level. The SH-2A also adds new instructions to improve performance per megahertz. These instructions have a second benefit: better code efficiency. Processing that previously required a combination of several instructions can now be performed with fewer instructions.
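The code-size effect of replacing a multi-instruction sequence with a single new instruction can be checked with simple byte counting. The helper names below are hypothetical, and the sequences in the usage example are illustrative rather than taken from actual SH-2A code; the only facts used are that short instructions occupy 16 bits and long instructions occupy 32 bits.

```python
SHORT_BYTES = 2  # one 16-bit instruction
LONG_BYTES = 4   # one 32-bit instruction


def rom_bytes(short_count, long_count=0):
    """ROM bytes occupied by a sequence of short and long instructions."""
    return short_count * SHORT_BYTES + long_count * LONG_BYTES


def bytes_saved(replaced_short_count, new_short_count=0, new_long_count=1):
    """Bytes saved when a run of 16-bit instructions is replaced by a new sequence."""
    return rom_bytes(replaced_short_count) - rom_bytes(new_short_count,
                                                       new_long_count)


# Three 16-bit instructions (6 bytes) replaced by one 32-bit instruction (4 bytes).
print(bytes_saved(3))  # 2
# Two 16-bit instructions replaced by one 32-bit instruction: same size,
# but only one instruction to execute instead of two.
print(bytes_saved(2))  # 0
```

The second case shows that a 32-bit instruction can still pay off in performance per megahertz even when it does not shrink the code.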

Improved performance per megahertz enables low power consumption.

Use of superscalar

The SH-2A uses a superscalar CPU architecture, as does the SH-4. This architecture has multiple instruction decoder circuits and computing units, so up to two instructions can be executed in parallel in a single clock cycle. Not only does this improve CPU performance per megahertz, but it also means that the same performance can be obtained at a lower operating frequency than before. This allows the power consumption of the microcontroller to be reduced.
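The relationship between instructions per cycle, clock frequency, and power described above can be worked through numerically. The sketch below assumes that throughput in MIPS equals clock frequency in MHz times average instructions per cycle, and that dynamic power scales linearly with frequency at a fixed supply voltage and switched capacitance. The function names and the figures in the example are hypothetical and are not SH-2A measurements.

```python
def required_mhz(target_mips, instructions_per_cycle):
    """Clock frequency needed to reach a target throughput.

    Uses MIPS = MHz * average instructions per cycle.
    """
    return target_mips / instructions_per_cycle


def relative_dynamic_power(new_mhz, old_mhz):
    """Ratio of dynamic power after a frequency change.

    Assumes supply voltage and switched capacitance stay the same, so
    dynamic power is proportional to frequency.
    """
    return new_mhz / old_mhz


# Illustrative numbers only: the same 100 MIPS workload on a scalar core
# (1.0 instructions per cycle) and on a core averaging 1.25 per cycle.
scalar_mhz = required_mhz(100, 1.0)
superscalar_mhz = required_mhz(100, 1.25)
print(scalar_mhz, superscalar_mhz)
print(relative_dynamic_power(superscalar_mhz, scalar_mhz))
```

Under these assumptions the superscalar core delivers the same throughput at 80 MHz instead of 100 MHz, and its dynamic power falls in the same proportion.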

Use of Harvard architecture

In a Harvard architecture, the bus for fetching instructions and the bus for accessing data are independent of each other. The SH-4 also uses this architecture. In previous SuperH devices, instructions and data went through the same internal 32-bit data bus. Because instructions have a fixed 16-bit length, however, an instruction fetch is needed only once every two bus cycles, leaving the other cycle free. This free bus cycle is used for data access, so the performance drop caused by competition between data access and instruction fetch is not very large. With 32-bit-long instructions, a shared bus would no longer have this free cycle. Because a Harvard architecture keeps the instruction-fetch and data-access buses completely separate, memory can be accessed at the same time that a 32-bit-long instruction is being executed, preventing a drop in performance.
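The bus-cycle argument above can be made concrete with a small throughput model. This is a simplified sketch under stated assumptions: one instruction completes per cycle at best, each bus moves 32 bits per cycle, each data access takes one bus cycle, and the ordering of accesses is ignored so that only total bus demand matters. The function name `cycles_needed` and the workloads are hypothetical.

```python
import math

BUS_WIDTH_BITS = 32  # width of each bus in this model


def cycles_needed(instr_bits, data_accesses, harvard):
    """Lower bound on cycles to run a sequence of instructions.

    instr_bits:    list of instruction lengths in bits (16 or 32).
    data_accesses: number of single-cycle data accesses the sequence makes.
    harvard:       True for separate instruction and data buses,
                   False for one shared bus.
    """
    n = len(instr_bits)
    fetch_cycles = math.ceil(sum(instr_bits) / BUS_WIDTH_BITS)
    if harvard:
        # Fetches and data accesses proceed on different buses in parallel.
        bus_cycles = max(fetch_cycles, data_accesses)
    else:
        # Fetches and data accesses compete for the same bus.
        bus_cycles = fetch_cycles + data_accesses
    return max(n, bus_cycles)


# Eight instructions, four of which access data.
print(cycles_needed([16] * 8, 4, harvard=False))  # 8: free cycles absorb the data accesses
print(cycles_needed([32] * 8, 4, harvard=False))  # 12: fetches fill the bus, data must wait
print(cycles_needed([32] * 8, 4, harvard=True))   # 8: separate buses remove the contention
```

The first two results mirror the text: a shared bus copes well with 16-bit instructions but stalls once 32-bit instructions consume every fetch cycle, which is the case the separate buses address.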