A Review of Processing Architectures Used to Accelerate AI/ML for OT Applications

By: Mario Morales, IDC

As recently as two years ago, AI/ML workloads ran almost exclusively on server-class MPUs and server-based GPU accelerators, even though server-class MPUs and GPUs are not very power-efficient when it comes to neural-network (NN) processing. The lack of efficiency results from a design philosophy that emphasizes raw MPU and GPU compute performance, achieved through very high clock rates, rather than compute performance per watt. This design approach caters to typical server workloads and depends on data-center power and cooling capabilities.

However, many AI applications—specifically NN inference processing on the edge and in endpoint devices—require a blend of processing performance and power efficiency. Today, many markets, especially the OT (Operational Technology) markets, are recognizing that traditional server-class processing solutions are not generally appropriate for every AI application.

AI applications running on the edge and in endpoints—including wearables, industrial tools, and connected vehicles—demand a processing disruption as they become more commonplace, and eventually ubiquitous, in these markets. Server-class MPUs and GPUs simply draw far too much power to be used in OT markets.

Consequently, semiconductor suppliers targeting these OT markets are evolving their offerings very quickly. They have begun to develop solutions that meet the edge and endpoint markets' performance and power-efficiency goals for AI inferencing, because the OT market imposes strict power-consumption requirements and thermal limitations.

Figure 1 illustrates IDC's strength assessment for the competing processing alternatives across the OT and IT markets.

Technology Attributes of Architecture Solutions

[Figure: a matrix rating CPU, GPU, FPGA, SoC, DRP, DSP, MCU, ASIC, and VPU architectures against the attributes performance efficiency, low power, market flexibility, functional safety, reliability, neural network framework support, ease of design, real-time operation, and raw computing performance.]

Figure 1: IDC's strength assessment for the competing processing alternatives across the OT and IT markets

VPU: Visual processing unit, used primarily for imaging and computer vision. A few variants of this type of solution are on the market, addressing applications such as drones, robotics, and consumer electronics.

DRP: Dynamically reconfigurable processor. A new category introduced to reflect the growing requirement for a reconfigurable solution that can be optimized for operational technology (OT) industry segments as the market begins to adopt AI inferencing and moves beyond traditional MCUs, MPUs, and DSPs.

ASIC: Application-specific integrated circuit. A custom semiconductor solution sold to a single OEM; most, if not all, of the intellectual property in this design comes from the system vendor. This category includes Google's TPU architecture.

SoC: System on a chip. Here, SoC refers primarily to mobile baseband processors with integrated AI engines and neural-network co-processors.

Source: IDC, 2018

As shown in Figure 1 above, no one type of processing option meets all of the necessary requirements for AI training or inference. Established MCU, DSP, and MPU architectures serve today's OT market well, but as AI inferencing is used in more and more endpoint devices, IDC expects that an emerging class of solutions collectively called dynamically reconfigurable processors (DRPs) will address most of the key requirements for running AI inference and ML algorithms in the embedded, mobile, and industrial IoT markets. DRPs deliver high performance with adaptive flexibility across a wide variety of target applications using a reconfigurable array of many processing elements. The DRP's functions and interconnections can be dynamically altered under software control to meet the application's immediate and changing processing needs, and the processing elements can be partitioned and configured to run multiple algorithms simultaneously.
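
To make that reconfiguration model concrete, the sketch below is a minimal, hypothetical illustration in Python, not any vendor's actual DRP programming interface: a software-visible array of processing elements is partitioned at run time, and each partition is re-targeted to a different algorithm so that two workloads run side by side on one device. The class names, function labels, and processing-element counts are illustrative assumptions.

```python
# Hypothetical sketch of the DRP concept described above -- not a real API.
# It models a reconfigurable array of processing elements (PEs) that software
# can partition and re-target at run time.

from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    pe_id: int
    function: str = "idle"  # operation this PE is currently configured to run


@dataclass
class DRP:
    pe_count: int
    pes: list = field(default_factory=list)

    def __post_init__(self):
        self.pes = [ProcessingElement(i) for i in range(self.pe_count)]

    def partition(self, sizes):
        """Split the PE array into contiguous partitions of the given sizes."""
        assert sum(sizes) <= self.pe_count, "not enough processing elements"
        partitions, start = [], 0
        for size in sizes:
            partitions.append(self.pes[start:start + size])
            start += size
        return partitions

    def configure(self, partition, function):
        """Re-target a partition to a new function under software control."""
        for pe in partition:
            pe.function = function


# Example: run a CNN inference kernel and an FFT pre-processing stage concurrently.
drp = DRP(pe_count=64)
cnn_part, fft_part = drp.partition([48, 16])
drp.configure(cnn_part, "conv2d_int8")
drp.configure(fft_part, "fft_1024")
print(sum(pe.function == "conv2d_int8" for pe in drp.pes), "PEs running CNN inference")
```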

Figure 2 below illustrates another way to look at these competing processing architectures, using power consumption and performance (operations per second, or OPS, usually measured in GigaOPS (GOPS) or TeraOPS (TOPS)) as the key differentiators. Note the vertical bar separating the IT and OT worlds near the 10-watt point on the horizontal axis. OT applications face significant limits on power consumption, limits that servers and data centers do not share. As a result, Figure 2 suggests that only DRPs, DSPs, MCUs, MPUs, and SoCs are truly appropriate processing solutions for the OT markets.

Power Efficiency of Embedded and IoT Solutions

Figure 2: Power efficiency is critical for the OT market in order to enable AI inference

SoC: System on a chip. Here, SoC refers primarily to mobile baseband and co-processor devices that integrate a neural-network processor or AI engine.

DRP: Dynamically reconfigurable processor. A new category introduced to reflect the growing requirement for a reconfigurable solution that can be optimized for operational technology (OT) industry segments as the market begins to adopt AI inferencing and moves beyond traditional MCUs, MPUs, and DSPs.

ASIC: Application-specific integrated circuit. A custom semiconductor device sold to a single OEM; most, if not all, of the intellectual property in this design comes from the OEM. Varieties of custom ASICs include cell-based ASICs, gate-array ASICs, newer structured ASICs, and FPGAs. This category includes Google's TPU architecture.

VPU: Visual processing unit, positioned between DSPs and MPUs. VPUs are used primarily for imaging and computer vision.

Performance: Operations per second (y-axis).

Power: Watts and milliwatts (x-axis).

TOPS: Trillions of operations per second.

GOPS: Billions of operations per second.

Note: The dotted red line is an estimated power threshold separating the OT and IT industry segments. Power efficiency is critical in OT systems.

Source: IDC, 2018

Performance must always be balanced against power consumption in these broad and fragmented embedded and IoT markets. Today, high-speed, server-class MPUs and GPUs are being used for training and inferencing in data centers and the cloud. However, these processing solutions consume substantial amounts of power. In the data-center and cloud-infrastructure markets, where performance remains the most important figure of merit, FPGAs and ASICs are also being used as hardware accelerators by cloud service providers (CSPs). However, as illustrated in Figure 2, power efficiency and low power consumption are critical requirements for AI/ML processing with the real-time response needed by the broad set of applications and workloads in the emerging OT markets. Processing solutions for these markets must be able to execute billions or even trillions of operations per second, respond in real time, and do all of this while consuming less than 10 watts. These tough constraints and requirements preclude the use of cloud-centric, server-class MPUs, GPUs, and ASICs because they all consume far too much power. This exclusion of power-gobbling processing architectures opens the door for alternative, power-efficient AI-processing solutions, such as DRPs, that have been designed, tuned, and optimized for OT applications.
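
As a quick illustration of the efficiency arithmetic behind Figure 2, the short sketch below computes performance per watt (GOPS/W) and checks each candidate against the roughly 10-watt OT threshold. The throughput and power numbers are placeholder assumptions chosen only to demonstrate the calculation; they are not IDC estimates or measured values.

```python
# Back-of-the-envelope check of the ~10 W OT power threshold discussed above.
# The throughput and power figures are placeholder values used only to show
# the arithmetic (GOPS per watt and the sub-10 W cutoff); they are not IDC data.

OT_POWER_LIMIT_W = 10.0  # approximate OT/IT dividing line from Figure 2

# (architecture class, throughput in GOPS, power in watts) -- assumed values
candidates = [
    ("Server-class GPU", 50_000, 250.0),
    ("Server-class MPU",  2_000, 150.0),
    ("DRP",               1_000,   3.0),
    ("DSP",                 200,   1.5),
    ("MCU",                   5,   0.1),
]

for name, gops, watts in candidates:
    efficiency = gops / watts  # GOPS per watt
    fits_ot = watts < OT_POWER_LIMIT_W
    print(f"{name:17s} {efficiency:9.1f} GOPS/W  fits OT power budget: {fits_ot}")
```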

This blog article is part of a series and is based on the IDC White Paper titled "Embedded Artificial Intelligence: Reconfigurable Processing Accelerates AI in Endpoint Systems for the OT Market."