SH-4 32-bit RISC CPU Core Family
The SH-4 is a licensable family of dual-issue 32-bit RISC CPU cores.
The SH-4 is available in three groups: the integer SH4-400 group, the
SH4-500 group, which adds an MMU, and the SH4-200 group, which integrates a
vector Floating Point Unit. SH-4 cores deliver programmable multimedia
solutions. For example, the SH4-200 can decode 384kbps, 15fps CIF MPEG4 video
in software using only 45MHz of CPU performance. Across the whole SH-4 family,
licensees can configure the size of the 2-way set associative instruction and
data caches from 4KB to 64KB.
The SH4-400 group is an extremely compact integer CPU core suitable for SoCs
that require class-leading performance under tight area and power constraints.
For example, the SH4-401S CPU core delivers up to 400 DMIPS, and the SH4-450S
integer CPU core occupies less than 0.8mm² and consumes only 0.06mW/MHz.
The SH4-500 group is a compact CPU core with an integrated MMU. It delivers
the performance of the SH4-400 group while allowing more complex applications
and operating systems to run. The synthesizable SH4-501S core occupies only
1mm² in a generic 0.13µm CMOS process, and a system with the CPU plus 8KB
instruction and 8KB data caches occupies 3.12mm².
The SH4-200 group includes the high-performance Vector FPU (VFPU), which
delivers 7MFLOPS/MHz. The VFPU is an IEEE 754 conformant FPU with the
performance and functionality to support audio codecs, 3D graphics and similar
workloads. The SH4-210S is a synthesizable core, while the SH4-202 is a 0.13µm
hard macro with 16KB instruction and 32KB data caches and system peripherals.
The SH4-202 is available today at speeds up to 366MHz and is supported by
operating systems including Linux and Windows CE.NET.
SH-4 key features
- Dual issue (superscalar) CPU delivering 1.5DMIPS/MHz (Dhrystone 2.1)
- Optional 128-bit Vector Floating Point Unit (FPU)
- 16-bit encoded instruction set delivering class-leading code density. The
SH-4 instruction set is based on the popular SHcompact RISC instruction set, and
SH-4 is the only licensable 32-bit CPU technology to offer an instruction set
that is entirely 16-bit encoded.
- Efficient cache architecture: The SH-4 family has a 2-way set associative
split cache architecture.
- Optional memory management unit (MMU) supporting virtual addressing and
variable page sizes, capable of running complex operating systems such as Linux
and Windows CE.NET as well as real-time kernels such as ITRON.
- The SH-4 is part of the upward compatible SuperH family, for which a large
range of third-party products is already available. Renesas offers a C/C++
toolchain based on open source GNU technology.
- Energy efficient core:
o The SH-4 features Sleep and Standby power-down modes.
o Memory accesses are minimized through the 16-bit instruction coding.
o Dual-issue execution lets the CPU complete a task in the minimum possible
time, so it can spend longer periods in Sleep mode.
o SH-4 based SoCs can be designed with variable voltage supplies and multiple
clock domains with clock gearing (variable frequencies) to optimize overall
power consumption.
SH-4 family performance
The SH-4 family delivers the following performance across a range of
general-purpose and multimedia benchmarks:
| Benchmark | Performance |
|---|---|
| Dhrystone 2.1 | 1.5 DMIPS/MHz (SuperH gcc) |
| Floating point operations | 7 MFLOPS/MHz (uses FPU) |
| Complex FFT, 1024-point radix-2 | 4 cycles per complex butterfly (uses FPU) |
| Block FIR, 16-tap, 40-sample | 1.6 MACs/cycle |
| EEMBC | See http://www.eembc.org |
| BDTIMark2000 | 750 at 240MHz (uses FPU) |
| 3D polygons | 9.3M polygons/s at 266MHz (uses FPU) |
| ITU-T G.729 Annex C (8kbit/s) | Requires 25MHz of CPU performance and 40KB of memory (uses FPU) |
| VoIP channels (full G.729) | 39MHz of CPU performance per channel (uses FPU) |
SH4-400 group
The SH4-400 group is a family of high-performance, dual-issue, integer
32-bit RISC CPUs with a MAC/MUL unit. It is designed for multimedia
applications that require a compact CPU core able to execute both general
purpose code and codecs such as audio, speech and low bit-rate video.
The key features of the SH4-400 group are:
- MAC/MUL unit that delivers:
o 133MMACs/s at 266MHz
o Automatic data load and pointer increment
o 16 and 32-bit inputs
o 32 and 64-bit results
- Efficient cache architecture:
o The SH4-400 has been designed with 2-way set associative data and instruction
caches that deliver a high level of system performance
o The data cache can be configured in a mixed cache/RAM mode delivering fast,
real time, deterministic performance.
o Configurable cache sizes: 4KB to 64KB
SH4-401S Synthesizable core
The first product in the SH4-400 range is the SH4-401S, a fully
synthesizable core.
The SH4-401S CPU is designed to be implemented in a range of different
processes. Expected implementation figures in 0.18µm and 0.13µm processes are
shown below.
| Configuration | 0.18µm die size | 0.18µm clock speed | 0.13µm die size | 0.13µm clock speed |
|---|---|---|---|---|
| CPU only | 1.76mm² | 180-200MHz | 1mm² | 225-266MHz |
| CPU + 8K I and 8K D caches | 4.86mm² | 180-200MHz | 2.66mm² | 225-266MHz |
SH4-450S Synthesizable core
The SH4-450S is a lower-frequency, more compact member of the SH4-400
range. The following are typical size and performance figures for the
SH4-450S:
| Configuration | 0.18µm die size | 0.18µm clock speed | 0.13µm die size | 0.13µm clock speed |
|---|---|---|---|---|
| CPU only | 1.30mm² | 100-120MHz | 0.78mm² | 100-133MHz |
| CPU + 8K I and 8K D caches | 4.25mm² | 100-120MHz | 2.41mm² | 100-133MHz |
SH4-500 group
The SH4-500 group builds on the SH4-400 group of cores by adding a Memory
Management Unit (MMU), allowing the execution of more complex applications that
require virtual or protected memory.
The SH4-500 group delivers performance in line with the SH4-400 group. The
features supported are a superset of those in the SH4-400 group.
The main features of the SH4-500 are:
- Memory management unit (MMU) supporting virtual addressing and variable page
sizes
- MAC/MUL unit that delivers:
o 133MMACs/s at 266MHz
o Automatic data load and pointer increment
o 16 and 32-bit inputs
o 32 and 64-bit results
- Efficient cache architecture:
o The SH4-500 has been designed with 2-way set associative data and instruction
caches that deliver a high level of system performance
o The data cache can be configured in a mixed cache/RAM mode delivering fast,
real time, deterministic performance.
o Configurable cache sizes: 4KB to 64KB
SH4-501S Synthesizable core
The first product in the SH4-500 family is the SH4-501S, a fully
synthesizable core.
The SH4-501S CPU is designed to be implemented in a range of different
processes. Expected implementation figures in 0.18µm and 0.13µm general
purpose processes are shown below.
| Configuration | 0.18µm die size | 0.18µm clock speed | 0.13µm die size | 0.13µm clock speed |
|---|---|---|---|---|
| CPU only | 1.76mm² | 180-200MHz | 1.00mm² | 225-266MHz |
| CPU + 8K I and 8K D caches | 5.67mm² | 180-200MHz | 3.12mm² | 225-266MHz |
SH4-200 group
The SH4-200 group is a family of high-performance, dual-issue, integer
32-bit RISC CPUs with an integrated vector floating point unit. It is designed
for multimedia applications that require a compact CPU core with integrated
vector floating point, able to execute both general purpose code and multimedia
code such as audio, speech and video codecs. The SH4-202 running at 266MHz can
run a full-duplex CIF, 384kbps, 15fps MPEG4 codec entirely in software.
The key features of the SH4-200 group are:
- Vector floating point unit (VFPU) that delivers:
o 7MFLOPS/MHz
o Vector instructions, including matrix operations for 3D graphics, speech,
audio and video codecs
- Efficient cache architecture:
o The SH4-200 offers two cache options: direct-mapped or 2-way set associative
instruction and data caches.
o The data cache can be configured in a mixed cache/RAM mode delivering fast,
real time performance.
o Configurable cache sizes: 4KB to 64KB
SH4-202: Hard macro
Several products are available in the SH4-200 group; see the product table
below.
The SH4-202 integrates the SH4-200 core, a debug port and system peripherals,
providing a complete CPU system ready for integration into an SoC device. The
SH4-202 supports a range of operating systems including Linux and Windows
CE.NET.
Summary: SH-4 family product variants
The following product variants are available:
| Core | Process | Clock speed | Cache | Die size (mm²) | Availability | Comment |
|---|---|---|---|---|---|---|
| SH4-200 group: FPU with 2-way set associative caches | | | | | | |
| SH4-210S | Synthesizable core | Up to 400MHz | Customer configurable | In 0.13µm CPU+FPU = 1.53mm² | Now | |
| SH4-202 | 0.13µm in UMC-GP, TSMC-LVOD | 266-366MHz | 16K I, 32K D | 0.13µm core = 7.8-8.2mm² | Now | Includes UDI/AUD debug port, UART, timers, interrupts, clocks |
| SH4-400 group: Integer CPU with 2-way set associative caches | | | | | | |
| SH4-401S | Synthesizable core | Up to 266MHz | Configurable | In 0.13µm CPU = 1.00mm² | Now | |
| SH4-450S | Synthesizable core | Up to 133MHz | Configurable | In 0.13µm CPU = 0.78mm² | Now | |
| SH4-500 group: Integer CPU with MMU and 2-way set associative caches | | | | | | |
| SH4-501S | Synthesizable core | Up to 266MHz | Configurable | In 0.13µm CPU = 1.00mm² | Now | |
Notes: GP = general purpose process option. LVOD = process option targeted at
high clock speeds.