Next-generation SH-4A core for mobile network focusing on throughput
performance
Next-generation SH-2A core focusing on real-time control performance
The SuperH family executes most instructions at high speed (one clock
cycle each), thanks to its RISC architecture and pipelined execution of
simple instructions.
All CPU instructions have a fixed 16-bit length, chosen on the basis of an
analysis of instruction execution frequency in typical embedded
applications. This keeps ROM size compact.
Frequently used instructions are all 16 bits long, which keeps object
code compact and reduces the number of instruction fetches.
The fixed instruction size facilitates high-speed pipelined execution.
Two instructions can be fetched from memory simultaneously when the CPU
is connected to a 32-bit bus.
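
As a rough illustration of why a 32-bit bus delivers two instructions per
fetch, the sketch below splits one 32-bit word into two 16-bit
instructions. The names insn_pair, split_fetch_word and fetches_needed are
hypothetical, and the big-endian ordering (first instruction in the upper
half of the word) is an assumption made for the example.

```c
#include <stdint.h>

/* Two 16-bit instructions carried by one 32-bit fetch word. */
typedef struct {
    uint16_t first;   /* instruction at the lower address */
    uint16_t second;  /* instruction at the next address  */
} insn_pair;

/* Assumption: big-endian layout, so the instruction at the lower
 * address occupies the upper 16 bits of the fetched word. */
insn_pair split_fetch_word(uint32_t word)
{
    insn_pair p;
    p.first  = (uint16_t)(word >> 16);
    p.second = (uint16_t)(word & 0xFFFFu);
    return p;
}

/* Number of 32-bit fetches needed for n 16-bit instructions,
 * assuming the first instruction is aligned to a 32-bit boundary. */
unsigned fetches_needed(unsigned n_insns)
{
    return (n_insns + 1u) / 2u;
}
```

Under these assumptions the word 0xE0010009 splits into 0xE001 followed by
0x0009, and eight instructions need four fetches rather than eight.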
With a delayed branch instruction, the branch is made after execution
of the instruction immediately following the delayed branch instruction.
This minimizes disruption of the pipeline and reduces overhead when a branch
is made.
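
The delayed-branch behavior described above can be sketched with a toy
interpreter. The three-operation instruction set, the type names and the
functions run_delayed and demo_delayed_branch are all hypothetical and
exist only for illustration; they are not SuperH instructions. The point
shown is that the instruction right after the branch (the delay slot)
still executes before control moves to the branch target.

```c
#include <stddef.h>

/* Hypothetical toy instruction set, for illustration only. */
typedef enum { OP_ADD, OP_BRA, OP_HALT } opcode;

typedef struct {
    opcode op;
    int    arg;   /* OP_ADD: value to add; OP_BRA: target index */
} insn;

/* Runs the program and returns the accumulator. */
int run_delayed(const insn *prog, size_t len)
{
    int    acc = 0;
    size_t pc = 0;
    int    branch_pending = 0;   /* set by a branch, consumed by its slot */
    size_t target = 0;

    while (pc < len) {
        const insn cur = prog[pc];
        /* If the previous instruction was a branch, this is its delay slot:
         * execute it, then continue at the branch target. */
        const int in_delay_slot = branch_pending;
        const size_t next = in_delay_slot ? target : pc + 1;
        branch_pending = 0;

        switch (cur.op) {
        case OP_ADD:
            acc += cur.arg;
            break;
        case OP_BRA:
            if (in_delay_slot)
                return acc;      /* branch inside a delay slot: sketch stops */
            branch_pending = 1;
            target = (size_t)cur.arg;
            break;
        case OP_HALT:
            return acc;
        }
        pc = next;
    }
    return acc;
}

/* Example program: the ADD 10 in the delay slot runs, the ADD 100 is skipped. */
int demo_delayed_branch(void)
{
    const insn prog[] = {
        { OP_ADD,  1    },   /* 0 */
        { OP_BRA,  4    },   /* 1: delayed branch to index 4 */
        { OP_ADD,  10   },   /* 2: delay slot, still executed */
        { OP_ADD,  100  },   /* 3: skipped */
        { OP_ADD,  1000 },   /* 4: branch target */
        { OP_HALT, 0    }    /* 5 */
    };
    return run_delayed(prog, sizeof prog / sizeof prog[0]);  /* 1011 */
}
```

Because the slot instruction does useful work while the branch resolves,
the pipeline does not have to discard an already fetched instruction.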
The SuperH RISC engine has 16 general-purpose registers. In a typical
control program, 16 registers cover 97% of functions. Task switching is
also faster than with 32 general-purpose registers, since fewer registers
have to be saved and restored.
The SH7080 is built on the latest 0.15 µm F-ZTAT technology and, at its
maximum operating frequency of 80 MHz, can access both instructions and
data in one cycle. (On-chip flash memories from other companies also
have operating frequencies of 80 MHz and 100 MHz, but branches incur
additional stalls and data accesses take several cycles, which has led
to lower performance in some cases.)
In addition, the on-chip flash memory has a large capacity of 1 MByte
(8 Mbits), so programs that previously had to be stored in external flash
memory can now be stored on-chip, and CPU performance of ten times or
more can be demonstrated compared with existing technology.
The SH-2, with its on-chip 32-bit multiplier, can execute DSP functions
at high speed.
The SuperH RISC engine has 16 general-purpose registers. In a typical
control program, 16 registers cover 97% of functions. Task switching is
also faster than with 32 general-purpose registers, since fewer registers
have to be saved and restored.
Improves computational performance by adding division, bit-operation,
and other instructions.
Realizes 360 MIPS of real-time performance at 160 to 200 MHz.
A superscalar architecture (5-stage pipeline) enables up to two
instructions to be executed simultaneously.
Reduces interrupt response time by using dedicated register banks
for interrupts.
Reduces program code size through newly added instructions.
SH-3 instructions are upward-compatible with those of the SH-1 and SH-2.
In addition, the SH3-DSP has extended instructions for DSP operations.
The SH-3/SH3-DSP has an on-chip MMU and supports a wide variety of
operating systems.
The SH-3/SH3-DSP has a large-capacity cache that holds data from
low-speed external memory, enabling highly efficient processing without
making the high-speed CPU core wait.
・Mixed Instruction/Data Type
・4-way set-Associative
The SH-3/SH3-DSP has a three-bus structure that makes it possible to
access data and programs simultaneously. In addition, a
multiply-and-accumulate operation executes in one clock cycle.
The SH-4 is a high-performance embedded RISC processor with a superscalar
architecture.
The SH-4 extends the SuperH architecture, which is used as an embedded
RISC CPU in a wide range of multimedia equipment.
SuperH is a RISC CPU with a 16-bit fixed-length instruction set for
improved code efficiency, which makes it suitable for embedded equipment.
The SH-4 inherits the 16-bit fixed-length instruction set and adds
floating-point instructions and cache operation instructions. The SH-4
also has multiply-and-accumulate instructions.
SH-4 uses a superscalar architecture. There are two pipelines in
the processor, and 2-instruction parallel execution is possible.
A superscalar architecture executes two or more instructions in one
clock cycle. An SH-4 using this technology can execute a maximum of two
instructions per clock, so at best execution time is halved and
performance is doubled compared with a single-scalar architecture.
Fourier transforms and digital filter processing can be performed at high
speed by executing floating-point operations and data loads/stores in
parallel. Power consumption and electromagnetic noise can also be
reduced, because a superscalar architecture can achieve the same
performance at a lower frequency than a single-scalar architecture.
High Speed DSP Operation by FPU
The SH-4 has a powerful FPU (Floating-Point Unit).
In 3D graphics, vector transformations (affine transformations) of
three-dimensional coordinates are performed for viewpoint changes and
similar operations.
Generally, a 4×4 matrix operation is required for affine transformation
processing.
A 4×4 matrix operation can be executed at a 4-cycle pitch because
the SH-4 has the FTRV instruction.
16 multiplications and 12 additions are executed in four clock
cycles.
High-speed operation (1.7 GFLOPS at 240 MHz) can be realized.
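
The operation counts and the GFLOPS figure can be checked with a short
sketch. It assumes that the "4×4 matrix operation" is the product of a
4×4 matrix and a 4-element vector, which is consistent with the stated
16 multiplications and 12 additions. The names op_count, mat4_mul_vec4
and peak_gflops are hypothetical, and the row-major array layout is a
choice made for the example, not the FPU register layout.

```c
/* Counts of floating-point operations performed by the sketch. */
typedef struct {
    unsigned mul;
    unsigned add;
} op_count;

/* out = m * v, where m is a 4x4 matrix stored row-major in 16 floats
 * (an assumption of this sketch) and v is a 4-element vector. */
op_count mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    op_count n = { 0u, 0u };
    for (int r = 0; r < 4; r++) {
        float sum = m[4 * r] * v[0];      /* first product of the row */
        n.mul++;
        for (int c = 1; c < 4; c++) {
            sum += m[4 * r + c] * v[c];   /* one multiply and one add */
            n.mul++;
            n.add++;
        }
        out[r] = sum;
    }
    return n;                             /* 16 multiplies, 12 adds */
}

/* Peak rate in GFLOPS when `ops` operations complete every `cycles`
 * clock cycles at `clock_mhz` MHz. */
double peak_gflops(unsigned ops, unsigned cycles, double clock_mhz)
{
    return (double)ops / (double)cycles * clock_mhz / 1000.0;
}
```

With 16 + 12 = 28 operations every 4 cycles, peak_gflops(28, 4, 240.0)
gives 7 operations per cycle × 240 MHz = 1.68 GFLOPS, which rounds to the
1.7 GFLOPS quoted above.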
Continuous data transfer to the FPU (32-bit × 16, 2-bank registers) can
be realized by using the superscalar architecture.
Multiply-and-accumulate operations such as FIR filter, FFT operation,
etc. can be executed at high speeds.
FLOPS: Floating-point Operations Per Second
Using the superscalar architecture, "load/store by the FMOV instruction"
and "FPU operation by the FTRV instruction" on the FPU Register File can
be executed in parallel. Operation can continue without interruption by
switching between FPU Register File1 and FPU Register File2 while data is
being transferred between one of the two register files and the data
cache.
[Description]
The figure above shows an example of a matrix operation.
First, load the data to be operated on into the registers (FPU Register
File), then issue the matrix operation instruction FTRV.
The following register names refer to the same registers:
DR8, DR10 = FV8
DR12, DR14 = FV12
DR4, DR6 = FV4
The operation result is stored in the data cache after the FTRV
instruction is executed.
The MMU is built in, supporting general-purpose operating systems such as
Windows(R) CE and Linux.
The SH-3, SH3-DSP, and SH-4 have a built-in MMU (Memory Management Unit).
The MMU is hardware for memory management including memory mapping and
protection.
With the MMU, the logical address space visible to software such as
applications can be handled separately from the physical memory space.
By restricting the memory areas that an application can access, the
effect on the system and OS is minimized if an application runs out of
control.
* MMU: Memory Management Unit
* Windows is a registered trademark or trademark of Microsoft Corporation
in the US and/or other countries.
<Concept of MMU and logical space>
The MMU divides physical memory into pages and assigns them to the
virtual address space (mapping).
This allows software to be given a virtual (logical) memory space larger
than the physical memory space. Security can also be enhanced by running
each application and the OS in separate virtual address spaces.
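
The page mapping and access restriction described above can be sketched
as a simple table lookup. This is a simplified model rather than the
actual SuperH MMU hardware: the 4 KB page size, the flat 16-entry table
and the names pte, xlate_status and translate are all assumptions made
for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* assumed 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_VPAGES 16u                  /* tiny virtual space for the sketch */

/* One mapping from a virtual page to a physical page frame. */
typedef struct {
    uint32_t frame;     /* physical page frame number */
    uint8_t  valid;     /* 1 if the virtual page is mapped */
    uint8_t  writable;  /* 1 if writes are permitted */
} pte;

typedef enum { XLATE_OK, XLATE_NOT_MAPPED, XLATE_PROTECTED } xlate_status;

/* Translates vaddr through `table` (NUM_VPAGES entries). On success the
 * physical address is written to *paddr; otherwise *paddr is untouched. */
xlate_status translate(const pte *table, uint32_t vaddr, int is_write,
                       uint32_t *paddr)
{
    const uint32_t vpn    = vaddr >> PAGE_SHIFT;        /* virtual page no. */
    const uint32_t offset = vaddr & (PAGE_SIZE - 1u);   /* offset in page   */

    if (vpn >= NUM_VPAGES || !table[vpn].valid)
        return XLATE_NOT_MAPPED;        /* page not assigned to this space */
    if (is_write && !table[vpn].writable)
        return XLATE_PROTECTED;         /* access restricted by the mapping */

    *paddr = (table[vpn].frame << PAGE_SHIFT) | offset;
    return XLATE_OK;
}
```

In this model, giving each application its own table means it can only
reach the frames mapped for it, so a misbehaving application cannot touch
memory belonging to the OS or to other applications.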
PCI Controller is built-in (SH7751/SH7751R)
In multimedia and information equipment, a standard bus such as the PCI
bus is used as the interface to displays and networks and for connecting
storage devices and processors.
Using the PCI bus as the interface to control devices is a good way to
bring products to market quickly.
Connections to graphics controllers, Ethernet controllers, and DVD and
CD-ROM controllers are easy to implement with the SH7751/SH7751R (SH-4),
which has a built-in PCI controller.