New Embedded GPU Platform for General-Purpose Computing Delivers the Highest Performance per Energy or Area

March 5th, 2014

From a recent press release:

The versatile Nema™ Platform for General-Purpose Computing on an embedded GPU (GPGPU) is designed by Think Silicon to deliver excellent performance with ultra-low energy consumption and a small silicon footprint, and is available now from CAST, Inc.

Designed by graphics processing experts Think Silicon Ltd., the Nema GPU is a scalable, many-core, multi-threaded, state-of-the-art data processing design that blends graphics rendering and general-purpose computing capabilities. It offers easy configuration, rapid programming, and straightforward system integration in a reusable soft IP core suitable for ASIC or FPGA implementation.

Nema’s combination of processing capability and speed plus energy- and area-saving techniques reflects Think Silicon’s beliefs that multi-purpose embedded designs are the best solution for many systems, and that full hardware utilization is the key to decreasing their energy consumption. Nema yields what the company believes is the highest performance per square millimeter available today, with reference designs delivering performance up to 16 GFLOPS/mm² in a 28nm process.
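To make the performance-density figure concrete, the arithmetic below converts between silicon area and peak throughput. It is a sketch that assumes throughput scales linearly with area at the quoted reference-design figure of 16 GFLOPS/mm² (28nm); real designs spend extra area on the network-on-chip, memories and I/O, so treat the results as an idealized upper bound. The function names are illustrative only.

```c
#include <assert.h>

/* Peak throughput implied by a performance-density figure.
 * Assumption: throughput scales linearly with area, which real
 * designs only approximate (interconnect, memories and I/O add area). */
double peak_gflops(double area_mm2, double gflops_per_mm2) {
    assert(area_mm2 >= 0.0 && gflops_per_mm2 > 0.0);
    return area_mm2 * gflops_per_mm2;
}

/* Silicon area needed to reach a target peak throughput
 * at a given density, under the same linear-scaling assumption. */
double area_for_target(double target_gflops, double gflops_per_mm2) {
    assert(target_gflops >= 0.0 && gflops_per_mm2 > 0.0);
    return target_gflops / gflops_per_mm2;
}
```

For example, `area_for_target(100.0, 16.0)` returns 6.25, i.e. roughly 6.25 mm² of 28nm silicon for a 100 GFLOPS peak under this idealized assumption, and `peak_gflops(2.5, 16.0)` returns 40.0.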

Nema’s performance and scalability make it ideal not just for graphics processing but also for many embedded data- and computation-intensive tasks in industrial, medical, scientific, automotive, and other applications. Examples include augmented reality, computer vision, or surveillance systems that need to display graphics but must also perform video analytics algorithms such as object recognition, or image processing tasks such as feature extraction.

The Nema GPU employs multiple processing cores in clusters, and multiple clusters can be connected via a proprietary adaptive network-on-chip (NoC). Together with an innovative memory subsystem design, this allows Nema to be scaled to a multicore GPU of whatever size a customer’s processing requirements demand. Designers can readily configure an arbitrary number of floating point and/or integer vector processing cores; dedicated hardware accelerators for graphics, image, and video processing; and a variety of on-chip memory components (caches, buffers, and scratchpads). Techniques such as built-in lossy or lossless compression of memory traffic reduce processing load and hence lower energy consumption.
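The release does not describe how Nema compresses memory traffic. As a generic illustration of why lossless compression reduces the amount of data that has to cross the memory interface, the following sketch applies simple run-length encoding to 32-bit words. The `Run` type and the `rle_encode`/`rle_decode` functions are hypothetical and are not Nema’s actual scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run: a repeated 32-bit word and how many times it repeats. */
typedef struct {
    uint32_t value;
    uint32_t count;
} Run;

/* Run-length encode n words from `in` into `out` (room for `cap` runs).
 * Returns the number of runs written, or 0 if `out` is too small
 * (an empty input also returns 0). */
size_t rle_encode(const uint32_t *in, size_t n, Run *out, size_t cap) {
    size_t runs = 0;
    size_t i = 0;
    while (i < n) {
        uint32_t v = in[i];
        size_t j = i + 1;
        /* Extend the run while the word repeats; cap length at UINT32_MAX. */
        while (j < n && in[j] == v && j - i < UINT32_MAX) j++;
        if (runs == cap) return 0;
        out[runs].value = v;
        out[runs].count = (uint32_t)(j - i);
        runs++;
        i = j;
    }
    return runs;
}

/* Expand runs back into `out` (room for `cap` words).
 * Returns the number of words written, or 0 if `out` is too small. */
size_t rle_decode(const Run *runs, size_t nruns, uint32_t *out, size_t cap) {
    size_t k = 0;
    for (size_t r = 0; r < nruns; r++) {
        if (runs[r].count > cap - k) return 0; /* would overflow `out` */
        for (uint32_t c = 0; c < runs[r].count; c++) out[k++] = runs[r].value;
    }
    return k;
}
```

An 8-word tile such as {0, 0, 0, 0, 7, 7, 9, 9} encodes to 3 runs (6 words), so fewer words are transferred. Data with no repeated words would double in size under this naive scheme, which is why practical hardware compressors generally fall back to storing a block uncompressed when compression does not pay off.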

Development for Nema is made straightforward through included industry-standard APIs and an in-house LLVM/Clang compiler tool chain that is adaptable to the changing architecture. Support for C/C++ programming is available now. Nema’s OpenCL™ support is awaiting certification, and future releases will see the addition of OpenGL® ES and OpenVX (for machine vision).
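For readers new to GPGPU programming, the sketch below shows the data-parallel style that C and OpenCL code for a many-core GPU typically takes: a per-element kernel body plus a launch over an index range. It is plain, serial C and assumes nothing about Nema’s actual tool chain or runtime; `saxpy_item` and `saxpy_dispatch` are hypothetical names, and the loop merely stands in for a parallel dispatch across cores and threads.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C equivalent of the kernel body (standard OpenCL syntax,
 * not verified against Nema's compiler):
 *
 *   __kernel void saxpy(float a, __global const float *x, __global float *y) {
 *       size_t i = get_global_id(0);
 *       y[i] = a * x[i] + y[i];
 *   }
 */

/* Work done by one work-item; in OpenCL the index would come from
 * get_global_id(0) instead of being passed in. */
static void saxpy_item(size_t gid, float a, const float *x, float *y) {
    y[gid] = a * x[gid] + y[gid];
}

/* Hypothetical serial stand-in for a 1-D launch over n work-items.
 * A GPU would execute these independent iterations in parallel. */
void saxpy_dispatch(size_t n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    for (size_t gid = 0; gid < n; gid++)
        saxpy_item(gid, a, x, y);
}
```

Because each work-item touches only its own element, the iterations are independent, which is what lets a GPU spread them across many cores without synchronization.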