
SuperH RISC engine Family


The following page content corresponds to the products marketed in Japan.

SuperH core evolution
Evolution of cores most suitable to applications
  • Next-generation SH-4A core for mobile network applications, focusing on throughput performance
  • Next-generation SH-2A core focusing on real-time control performance

SuperH common architecture
Pipeline execution of instruction
The SuperH family achieves high-speed execution (one clock cycle) of most instructions through its RISC architecture and pipelined execution of simple instructions.
Figures: One instruction per clock cycle (pipeline execution); performance comparison at the same frequency.
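The one-instruction-per-clock claim can be illustrated with an idealized cycle count. The sketch below is not a model of any particular SuperH core: it assumes a pipeline with no stalls, and the stage count of 5 and both function names are chosen only for illustration.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """Ideal pipeline: after the first instruction fills the pipeline,
    one instruction completes on every clock cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


# 100 instructions on a hypothetical 5-stage pipeline
print(cycles_unpipelined(100, 5))  # 500
print(cycles_pipelined(100, 5))    # 104, close to one instruction per clock
```

For long instruction sequences the fill time of the pipeline becomes negligible, which is where the "one instruction, one clock cycle" figure comes from.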
Instruction set based on 16-bit fixed length
All CPU instructions have a fixed length of 16 bits. This format was chosen based on an analysis of instruction execution frequency in typical embedded applications, and it results in compact ROM size.
Advantages of 16-bit fixed length instructions
  • Frequently used instructions are all 16 bits long, which keeps object size compact and reduces the number of instruction fetches.

  • A fixed instruction size facilitates high-speed pipelined execution.

  • Two instructions can be fetched from memory simultaneously when the CPU is connected to a 32-bit bus.
  C object size comparison
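The point about fetching two instructions over a 32-bit bus can be sketched as follows. This is only an illustration: the function name is hypothetical, the upper half-word is assumed to come first in program order (big-endian ordering), and the example word is an arbitrary value, not a real instruction pair.

```python
def split_fetch(word32):
    """Split one 32-bit bus word into two 16-bit instruction words.

    Assumption for this sketch: the upper 16 bits hold the instruction
    that comes first in program order.
    """
    if not 0 <= word32 <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value")
    first = (word32 >> 16) & 0xFFFF
    second = word32 & 0xFFFF
    return first, second


first, second = split_fetch(0x12345678)   # arbitrary example word
print(hex(first), hex(second))            # 0x1234 0x5678
```

With a fixed 16-bit length there is no need to decode the first instruction to find where the second one starts, which is also why the fixed size helps pipelining.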
Delayed branch instructions
With a delayed branch instruction, the branch is made after execution of the instruction immediately following the delayed branch instruction. This minimizes disruption of the pipeline and reduces overhead when a branch is made.
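The delay-slot behavior described above can be sketched with a toy interpreter. The instruction tuples ("add", "nop", "bra") and the function names are invented for this illustration and are not SuperH encodings; the only property modelled is that the instruction after a delayed branch runs before control is transferred.

```python
def execute(op, regs):
    """Apply one non-branch instruction to the register dictionary."""
    kind = op[0]
    if kind == "add":                 # ("add", register, immediate)
        _, reg, imm = op
        regs[reg] = regs.get(reg, 0) + imm
    elif kind == "nop":
        pass
    else:
        # A branch in the delay slot is simply rejected by this sketch.
        raise ValueError(f"unsupported instruction in this sketch: {op!r}")


def run(program, max_steps=1000):
    """Run a toy program and return (registers, indices of executed instructions)."""
    regs, executed, pc = {}, [], 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op = program[pc]
        if op[0] == "bra":            # ("bra", target_index), a delayed branch
            slot = pc + 1
            if slot < len(program):
                execute(program[slot], regs)   # the delay slot runs first
                executed.append(slot)
            pc = op[1]                # only then is control transferred
        else:
            execute(op, regs)
            executed.append(pc)
            pc += 1
    return regs, executed


program = [
    ("add", "r0", 1),      # 0
    ("bra", 4),            # 1: delayed branch to index 4
    ("add", "r0", 10),     # 2: delay slot, still executed
    ("add", "r0", 100),    # 3: skipped by the branch
    ("add", "r0", 1000),   # 4: branch target
]
regs, executed = run(program)
print(regs, executed)      # {'r0': 1011} [0, 2, 4]
```

Because the instruction at index 2 is already in the pipeline when the branch is decoded, letting it complete avoids throwing that work away.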
General-purpose register configuration
The SuperH RISC engine has 16 general-purpose registers. In a typical control program, 16 registers are enough for 97% of functions. Task switching is also faster than with 32 general-purpose registers, because fewer registers have to be saved and restored. Figure: registers used per function in a motor control program.
Features of SH-1/SH-2
High performance CPU with on-chip Flash
The SH7080 comes equipped with the latest 0.15 µm F-ZTAT technology, and at its maximum operating frequency of 80 MHz both instructions and data can be accessed in one cycle. (Although on-chip flash memories from other companies also operate at 80 MHz and 100 MHz, instruction branches cause additional stalls and data accesses take several cycles, which has led to lower performance in some cases.) Also, with the on-chip flash memory's large capacity of 1 MByte (8 Mbits), programs that previously had to be stored in external flash memory can now be stored on chip, and CPU performance ten or more times that of existing technology can be demonstrated.
On-chip 32-bit multiplier
SH-2, with its on-chip 32-bit multiplier, can execute DSP functions at high speed.
SH-2 extends and adds multiply instructions (DMULS, DMULU) and multiply-accumulate instructions (MAC).
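What a signed or unsigned 32×32-bit multiply with a 64-bit result, and a multiply-accumulate step, compute can be sketched as below. The function names are hypothetical and the code models only the arithmetic: the 64-bit result is returned as a (high, low) pair of 32-bit words, and the accumulator wraps around at 64 bits (saturation is not modelled).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul_unsigned_64(rm, rn):
    """Unsigned 32x32 -> 64-bit multiply, returned as (high word, low word)."""
    product = (rm & MASK32) * (rn & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def mul_signed_64(rm, rn):
    """Signed 32x32 -> 64-bit multiply, returned as (high word, low word)."""
    product = (to_signed32(rm) * to_signed32(rn)) & MASK64
    return (product >> 32) & MASK32, product & MASK32


def mac_step(acc, a, b):
    """One multiply-accumulate step on a 64-bit wrapping accumulator."""
    return (acc + to_signed32(a) * to_signed32(b)) & MASK64


# Dot product of two short vectors, one multiply-accumulate per element
acc = 0
for a, b in zip([1, 2, 3], [4, 5, 6]):
    acc = mac_step(acc, a, b)
print(acc)  # 32
```

Loops of this shape (FIR filters, dot products) are the DSP functions that benefit from a hardware multiplier.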
Features of SH-2A
Improve real-time performance
Operation performance is improved by adding division, bit-operation, and other instructions.
Improve operating frequency
Realizes 360 MIPS of real-time performance at 160 to 200 MHz.
Improve instruction execution cycle performance
A superscalar architecture (5-stage pipeline) enables up to two instructions to be executed simultaneously.
Reduce interrupt response time
Interrupt response time is reduced by using dedicated register banks for interrupts.
Improve code efficiency
Program code size is reduced by newly added instructions.
Features of SH-3/SH3-DSP
SH-3 instructions are upward-compatible with those of SH-1 and SH-2. In addition, SH3-DSP has extended instructions for DSP.
SH-3/SH3-DSP has an on-chip MMU and supports a wide variety of operating systems.
SH-3/SH3-DSP has a large-capacity cache that holds data from low-speed external memory, enabling highly efficient processing because the high-speed CPU core does not have to wait for external memory.

  • Mixed instruction/data type
  • 4-way set-associative
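How a 4-way set-associative lookup works can be sketched with a small model. The class name, the tiny geometry (4 sets, 16-byte lines), and the LRU replacement policy are assumptions made for this illustration, not SH-3 parameters. Because the cache is of the mixed type, the same structure would serve both instruction and data addresses.

```python
class SetAssociativeCache:
    """Toy N-way set-associative cache that tracks only hits and misses."""

    def __init__(self, n_sets=4, n_ways=4, line_size=16):
        self.n_sets = n_sets
        self.n_ways = n_ways
        self.line_size = line_size
        # One list of tags per set, least recently used first.
        self.sets = [[] for _ in range(n_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = address // self.line_size
        index = line % self.n_sets      # selects the set
        tag = line // self.n_sets       # identifies the line within the set
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)            # mark as most recently used
            return True
        if len(ways) == self.n_ways:
            ways.pop(0)                 # evict the least recently used way
        ways.append(tag)
        return False


cache = SetAssociativeCache()
addresses = [0, 64, 128, 192]           # four lines that all map to set 0
print([cache.access(a) for a in addresses])  # [False, False, False, False]
print([cache.access(a) for a in addresses])  # [True, True, True, True]
```

With four ways, four lines that collide on the same set can stay cached together; a fifth colliding line evicts one of them.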
SH-3/SH3-DSP has a three-bus structure, which makes it possible to access data and programs simultaneously. In addition, a multiply-and-accumulate operation is executed in one clock cycle.
Features of SH-4
  • SH-4 is a high-performance embedded RISC processor with a superscalar architecture.
    SH-4 extends the SuperH architecture, which is used as an embedded RISC CPU in a wide range of multimedia equipment.
    SuperH is a RISC CPU that has a 16-bit fixed-length instruction set for improved code efficiency and is well suited to embedded equipment.

SH-4 inherits the 16-bit fixed-length instruction set and adds floating-point instructions and cache operation instructions. SH-4 also has multiply-and-accumulate instructions.

  • SH-4 uses a superscalar architecture. There are two pipelines in the processor, and 2-instruction parallel execution is possible.

A superscalar architecture executes two or more instructions in one clock cycle. An SH-4 using this technology can execute a maximum of two instructions per clock. At best, execution time is halved and performance is doubled compared with a single-scalar architecture. Fourier transforms and digital filter processing can be performed at high speed by executing floating-point operations and data load/store in parallel. Power consumption and electromagnetic noise can also be reduced, because a superscalar architecture can deliver the same performance at a lower frequency than a single-scalar architecture.
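The "up to two instructions per clock" behavior can be illustrated with a deliberately simplified in-order dual-issue model. Everything here is an assumption for the sketch: instructions are (destination, sources) tuples, two neighbors issue together only when the second does not depend on the first, and every instruction takes one cycle. Real pairing rules are more detailed and are not modelled.

```python
def independent(first, second):
    """True if `second` neither reads nor overwrites the result of `first`."""
    first_dest, _ = first
    second_dest, second_sources = second
    return first_dest not in second_sources and first_dest != second_dest


def dual_issue_cycles(instructions):
    """Cycle count for a greedy in-order machine issuing up to two per clock."""
    cycles, i = 0, 0
    while i < len(instructions):
        cycles += 1
        can_pair = (i + 1 < len(instructions)
                    and independent(instructions[i], instructions[i + 1]))
        i += 2 if can_pair else 1
    return cycles


# Four instructions that only read r0: they pair up perfectly.
parallel = [("r1", ("r0",)), ("r2", ("r0",)), ("r3", ("r0",)), ("r4", ("r0",))]
# Four instructions where each one needs the previous result: no pairing.
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]

print(dual_issue_cycles(parallel))  # 2
print(dual_issue_cycles(chain))     # 4
```

The halved execution time is therefore a best case that is reached only when neighboring instructions are independent, for example a floating-point operation next to an unrelated load or store.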

High Speed DSP Operation by FPU
  • SH-4 has a powerful FPU (floating-point unit).
    • In 3D graphics, vector transformation (affine transformation) of 3-dimensional coordinates is performed for viewpoint changes, etc.
    • Generally, a 4×4 matrix operation is required for affine transformation processing.
    • A 4×4 matrix operation can be executed at a 4-cycle pitch because SH-4 has an FTRV instruction.
    • 16 multiplications and 12 additions are executed in four clock cycles.
    • High-speed operation (1.7 GFLOPS at 240 MHz) can be realized.
  • Continuous data transfer to the FPU (32-bit × 16, 2-bank registers) can be realized by using the superscalar architecture.
  • Multiply-and-accumulate operations such as FIR filters and FFTs can be executed at high speed.

    FLOPS: floating-point operations per second
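The operation counts and the GFLOPS figure quoted above can be checked with a short sketch. It multiplies a 4×4 matrix by a 4-element vector in plain Python while counting multiplications and additions, then converts "28 operations every 4 cycles at 240 MHz" into operations per second. The function names are hypothetical, and the code models only the arithmetic, not how the FTRV instruction lays out its operands in registers.

```python
def transform(matrix, vector):
    """4x4 matrix times 4-vector; returns (result, multiplications, additions)."""
    result, mults, adds = [], 0, 0
    for row in matrix:
        acc = row[0] * vector[0]
        mults += 1
        for j in range(1, 4):
            acc += row[j] * vector[j]
            mults += 1
            adds += 1
        result.append(acc)
    return result, mults, adds


def peak_flops(ops_per_issue, cycles_per_issue, clock_hz):
    """Peak floating-point operations per second for a fully pipelined issue rate."""
    return ops_per_issue / cycles_per_issue * clock_hz


# Affine transform: translate the point (1, 1, 1) by (2, 3, 4)
translate = [
    [1, 0, 0, 2],
    [0, 1, 0, 3],
    [0, 0, 1, 4],
    [0, 0, 0, 1],
]
print(transform(translate, [1, 1, 1, 1]))  # ([3, 4, 5, 1], 16, 12)
print(peak_flops(16 + 12, 4, 240e6))       # 1680000000.0, about 1.7 GFLOPS
```

The 1.7 GFLOPS figure is thus a peak value that assumes a new matrix operation is issued every four cycles.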

By using the superscalar architecture, load/store by the FMOV instruction and FPU operation by the FTRV instruction can be executed in parallel on the FPU register files. Operation can continue without interruption by switching between FPU Register File 1 and FPU Register File 2: while data is being transferred between one register file and the data cache, the FPU operates on the other.

The figure above shows an example of a matrix operation.
First, load the data to be operated on into the registers (FPU register file), then issue the matrix operation instruction FTRV.
The following names refer to the same registers:
DR8, DR10 = FV8
DR12, DR14 = FV12
DR4, DR6 = FV4

The operation result is stored in the data cache after FTRV instruction is executed.
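The register equivalences listed in this walkthrough follow a regular pattern, which the sketch below spells out. It assumes the usual SH-4 naming in which FRn are the single-precision registers, DRn names the pair FRn and FRn+1, and FVn names the four registers FRn to FRn+3; the helper function names are hypothetical.

```python
def fv_to_fr(n):
    """Single-precision registers covered by vector register FVn."""
    if n not in (0, 4, 8, 12):
        raise ValueError("vector registers are FV0, FV4, FV8 and FV12")
    return [f"FR{n + i}" for i in range(4)]


def fv_to_dr(n):
    """Double-precision register names that overlap vector register FVn."""
    if n not in (0, 4, 8, 12):
        raise ValueError("vector registers are FV0, FV4, FV8 and FV12")
    return [f"DR{n}", f"DR{n + 2}"]


for n in (8, 12, 4):
    print(f"FV{n}", fv_to_dr(n), fv_to_fr(n))
# FV8 ['DR8', 'DR10'] ['FR8', 'FR9', 'FR10', 'FR11']
# FV12 ['DR12', 'DR14'] ['FR12', 'FR13', 'FR14', 'FR15']
# FV4 ['DR4', 'DR6'] ['FR4', 'FR5', 'FR6', 'FR7']
```

This is why a vector can be loaded with two 64-bit FMOV transfers and then used directly as the FTRV operand.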

Built-in MMU supports general-purpose operating systems such as Windows(R) CE and Linux
  • SH-3, SH3-DSP and SH-4 have a built-in MMU (Memory Management Unit). The MMU is hardware for memory management, including memory mapping and protection.
  • With the MMU, the logical address space visible to software such as applications can be handled separately from the physical memory space.
  • By restricting the memory area that an application can access, the effect on the system and OS can be minimized if an application runs out of control.

    * MMU: Memory Management Unit
    * Windows is a registered trademark or trademark of Microsoft Corporation in the US and/or other countries.

<Concept of MMU and logical space>
The MMU divides physical memory into pages and assigns them to the virtual address space (mapping).
With this function, software can be given a virtual (logical) memory space larger than the physical memory space. Security can be enhanced by running each application and the OS in separate virtual address spaces.
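The mapping and protection described above can be sketched as a single-level page table lookup. This is a conceptual model only: the 4 KB page size, the dictionary-based page table, the exception names, and the single "writable" permission bit are assumptions for the illustration and do not describe the actual SuperH MMU or its TLB.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageFault(Exception):
    """The virtual page has no mapping."""


class ProtectionFault(Exception):
    """The access is not permitted for this page."""


def translate(page_table, vaddr, write=False):
    """Translate a virtual address to a physical address.

    page_table maps virtual page number -> (physical frame number, writable).
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None:
        raise PageFault(hex(vaddr))
    frame, writable = entry
    if write and not writable:
        raise ProtectionFault(hex(vaddr))
    return frame * PAGE_SIZE + offset


# Hypothetical application mapping: page 0 is read/write, page 1 is read-only.
app_table = {0: (5, True), 1: (2, False)}
print(hex(translate(app_table, 0x0010)))   # 0x5010
print(hex(translate(app_table, 0x1004)))   # 0x2004
```

A write to the read-only page or an access to an unmapped page raises a fault instead of touching memory, which is how a runaway application is kept from damaging the OS or other applications.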

PCI Controller is built-in (SH7751/SH7751R)
  • In multimedia and information equipment, a standard bus such as the PCI bus is used as the interface to displays and networks and for connecting storage devices to the processor.
  • Using the PCI bus as the interface to control devices is a good way to bring products to market in a short time.
  • Connection to a graphics controller, Ethernet controller, or DVD/CD-ROM controller is easy with the SH7751/SH7751R (SH-4), which has a built-in PCI controller.
