Join the free webinar on May 20th devoted to accelerating orthorectification, atmospheric correction, and transformations for big data with GPUs. Learn how GPUs can make processing of large imagery 50-100 times faster. Amanda O’Connor, a Senior Solutions Engineer at Exelis, will walk you through the implementation of GPU processing for large imagery datasets and the operational use of GPU processing for orthorectification, and will share benchmarks against desktop algorithms. To register, follow this link: https://www2.gotomeeting.com/register/665929994.
New Embedded GPU Platform for General-Purpose Computing Delivers the Highest Performance per Energy or Area
March 5th, 2014
From a recent press release:
The versatile Nema™ Platform for General-Purpose Computing on an embedded GPU (GPGPU) is designed by Think Silicon for excellent performance with ultra-low energy consumption and silicon footprint, and is available now from CAST, Inc.
Designed by graphics processing experts Think Silicon Ltd., the Nema GPU is a scalable, many-core, multi-threaded, state-of-the-art, data processing design blending both graphics rendering and general computing capabilities. It offers easy configuration, rapid programming, and straightforward system integration in a reusable soft IP core suitable for ASIC or FPGA implementation.
On March 5 at 11:00am (PST), Acceleware hosts a webinar on accelerating a seismic algorithm on a cluster of AMD GPU compute nodes. The presentation will begin with an outline of the full waveform inversion (FWI) algorithm, followed by an introduction to OpenCL, including its programming model and memory spaces. Strategies for formulating the problem to take advantage of the massively parallel GPU architecture will be discussed, along with key optimization techniques, including coalescing and an iterative approach to handle the slices. Performance results for the GPU will be compared to CPU run times. Click here to register.
Allinea DDT is part of Allinea Software’s unified tools platform, which provides a single powerful and intuitive environment for debugging and profiling of parallel and multithreaded applications. It is widely used by computational scientists and scientific programmers to fix software defects of parallel applications running on hybrid GPU clusters and supercomputers. DDT 4.1.1 supports CUDA 5.5, C++11 and the GNU 4.8 compilers. Also introduced with Allinea DDT 4.1.1 is CUDA toolkit debugging support for ARMv7 architectures. More information: http://www.allinea.com
The Libra 3.0 Heterogeneous Cloud Computing SDK has recently been released by GPU Systems. It supports PC, Tablet and Mobile Devices and includes a new virtualizing function for cloud compute services of local and remote CPUs and GPUs. C/C++, Java, C# and Matlab are supported. Read the full press release here.
Fastvideo have released their JPEG codec for NVIDIA GPUs. Peak performance of the codec reaches 6 GBytes per second and higher for images loaded from host RAM. For instance, a full-color 4K image with a resolution of 3840 x 2160 can be compressed by a factor of 10 in merely 6 milliseconds on an NVIDIA GeForce GTX Titan. More information: http://www.fastcompression.com
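As a rough sanity check on the figures quoted above, the following sketch works out the data rate implied by the 4K example. It assumes that "full-color" means 3 bytes per pixel (8 bits per RGB channel) and that the 6 ms covers the whole image; neither detail is stated in the announcement, and the helper name is purely illustrative.

```python
# Back-of-the-envelope throughput for the 4K example quoted in the announcement.
WIDTH, HEIGHT = 3840, 2160   # resolution from the announcement
BYTES_PER_PIXEL = 3          # assumption: 8 bits per channel, RGB
ELAPSED_S = 6e-3             # 6 milliseconds, from the announcement
COMPRESSION_RATIO = 10       # "compressed by a factor of 10"


def throughput_gb_per_s(num_bytes, seconds):
    """Data rate in gigabytes (1e9 bytes) per second."""
    return num_bytes / seconds / 1e9


raw_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
compressed_bytes = raw_bytes / COMPRESSION_RATIO
input_rate = throughput_gb_per_s(raw_bytes, ELAPSED_S)

print(f"uncompressed size: {raw_bytes / 1e6:.1f} MB")
print(f"compressed size:   {compressed_bytes / 1e6:.1f} MB")
print(f"input data rate:   {input_rate:.2f} GB/s")
```

Under these assumptions the single-image example corresponds to roughly 4.1 GB/s of input data, which is in the same range as, but below, the quoted peak of 6 GBytes per second.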
Algorithmic trading has become ever more popular in recent years – accounting for approximately half of all European and American stock trades placed in 2012. The trading strategies need to be back-tested regularly using historical market data for calibration and to check the expected return and risk. This is a computationally demanding process that can take hours to complete. However, back-testing the strategies frequently intra-day can significantly increase the profits for the trading institution.
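To make the idea of back-testing concrete, here is a minimal sketch that replays a price series through a simple moving-average crossover rule and reports the return and risk. The strategy, the function names, and the parameters are all hypothetical illustrations and are not taken from the work described above.

```python
def moving_average(prices, window):
    """Trailing simple moving average; None until `window` prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def backtest(prices, fast=2, slow=3):
    """Per-period returns of a long-only crossover rule (hypothetical strategy).

    The position for period t -> t+1 is decided using prices up to t only,
    so the rule never looks into the future.
    """
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    returns = []
    for t in range(len(prices) - 1):
        if slow_ma[t] is None:
            continue  # not enough history yet
        position = 1 if fast_ma[t] > slow_ma[t] else 0
        returns.append(position * (prices[t + 1] / prices[t] - 1.0))
    return returns


def summarize(returns):
    """Total compounded return and standard deviation of per-period returns."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / len(returns)
    return total - 1.0, variance ** 0.5


prices = [100, 101, 102, 103, 104]  # made-up data for illustration
total_return, risk = summarize(backtest(prices))
print(f"total return: {total_return:.4%}, risk (std dev): {risk:.4%}")
```

Calibration typically means repeating such a run for many parameter combinations (here `fast` and `slow`). Each run is independent of the others, which is one reason this kind of workload is a natural candidate for parallel hardware such as GPUs.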
While new power-efficient computer architectures exhibit spectacular theoretical peak performance, they require specific conditions to operate efficiently, which makes porting complex algorithms a challenge. Here, we report results of the semi-implicit method for pressure linked equations (SIMPLE) and the pressure implicit with operator splitting (PISO) methods implemented on the graphics processing unit (GPU). We examine the advantages and disadvantages of the full porting over a partial acceleration of these algorithms run on unstructured meshes. We found that the full-port strategy requires adjusting the internal data structures to the new hardware and proposed a convenient format for storing internal data structures on GPUs. Our implementation is validated on standard steady and unsteady problems and its computational efficiency is checked by comparing its results and run times with those of some standard software (OpenFOAM) run on central processing unit (CPU). The results show that a server-class GPU outperforms a server-class dual-socket multi-core CPU system running essentially the same algorithm by up to a factor of 4.
See also supplementary materials and the follow up at http://vratis.com/blog/?p=7.
(Tadeusz Tomczak, Katarzyna Zadarnowska, Zbigniew Koza, Maciej Matyka and Łukasz Mirosław: “Acceleration of iterative Navier-Stokes solvers on graphics processing units”, International Journal of Computational Fluid Dynamics, accepted, July 2013. [DOI])
Developed in partnership with NVIDIA, this hands-on four-day course will teach students how to write and optimize applications that fully leverage the multi-core processing capabilities of the GPU. Taught by Acceleware developers who bring real-world experience to the classroom, students will benefit from:
- Hands-on exercises and progressive lectures
- Individual laptops equipped with NVIDIA GPUs for student use
- Small class sizes to maximize learning
July 29 – August 1, 2013, San Jose, CA, USA. More information: http://www.acceleware.com/training/913
The GPU Debayer software developed by Fastvideo can be used for demosaicing of raw 8-bit Bayer images to full-color 24-bit RGB format. The application employs the HQLI and DFPD algorithms and is tuned for NVIDIA GPUs, which results in very fast conversion, e.g., only 1.25 ms for Full HD image demosaicing on GeForce GTX 580. The software is freely available.
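For readers unfamiliar with demosaicing, the sketch below shows the basic operation using plain bilinear interpolation. This is a deliberately simpler technique than the HQLI and DFPD algorithms named above, and it is not Fastvideo's implementation. The RGGB mosaic layout and all function names are assumptions made for illustration, and the input must be at least 2 x 2 pixels.

```python
def bayer_color(y, x):
    """Channel sampled at (y, x): 0=R, 1=G, 2=B. Assumes an RGGB layout."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def demosaic_bilinear(raw):
    """Convert an 8-bit Bayer mosaic (list of rows) to 24-bit RGB triples.

    A channel that was sampled at a pixel is copied through; a missing
    channel is the average of that channel's samples in the 3x3 window,
    which is bilinear interpolation for interior pixels.
    """
    h, w = len(raw), len(raw[0])
    rgb = [[[0, 0, 0] for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for c in range(3):
                if bayer_color(y, x) == c:
                    rgb[y][x][c] = raw[y][x]
                    continue
                samples = [
                    raw[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                    if bayer_color(ny, nx) == c
                ]
                rgb[y][x][c] = round(sum(samples) / len(samples))
    return rgb


# Build a 4x4 mosaic of a uniform color and recover it.
color = (200, 100, 50)
raw = [[color[bayer_color(y, x)] for x in range(4)] for y in range(4)]
rgb = demosaic_bilinear(raw)
print(rgb[1][2])
```

Each output pixel depends only on a small fixed neighborhood of the input, so the per-pixel work is independent and can be assigned to separate GPU threads.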