Allinea DDT is part of Allinea Software’s unified tools platform, which provides a single powerful and intuitive environment for debugging and profiling of parallel and multithreaded applications. It is widely used by computational scientists and scientific programmers to fix software defects of parallel applications running on hybrid GPU clusters and supercomputers. DDT 4.1.1 supports CUDA 5.5, C++11 and the GNU 4.8 compilers. Also introduced with Allinea DDT 4.1.1 is CUDA toolkit debugging support for ARMv7 architectures. More information: http://www.allinea.com
The Libra 3.0 Heterogeneous Cloud Computing SDK has recently been released by GPU Systems. It supports PCs, tablets and mobile devices, and includes a new virtualization function for cloud compute services on local and remote CPUs and GPUs. C/C++, Java, C# and MATLAB are supported. More details are available in GPU Systems' full press release.
Fastvideo have released their JPEG codec for NVIDIA GPUs. Peak performance of the codec reaches 6 GB per second and higher for images loaded from host RAM. For instance, a full-color 4K image with a resolution of 3840 x 2160 can be compressed by a factor of 10 in just 6 milliseconds on an NVIDIA GeForce GTX Titan. More information: http://www.fastcompression.com
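As a rough sanity check of these figures, the sketch below computes the uncompressed size of the quoted 4K frame and the input rate implied by compressing it in 6 ms. It assumes interleaved 8-bit RGB (3 bytes per pixel) and decimal gigabytes; the helper names are illustrative only and are not part of Fastvideo's SDK.

```python
def frame_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed size of an interleaved 8-bit-per-channel RGB frame (assumed layout)."""
    return width * height * bytes_per_pixel


def throughput_bytes_per_s(n_bytes, seconds):
    """Average input rate needed to process n_bytes in the given time."""
    return n_bytes / seconds


raw = frame_bytes(3840, 2160)             # 24,883,200 bytes (~24.9 MB)
rate = throughput_bytes_per_s(raw, 6e-3)  # ~4.15e9 bytes/s
compressed = raw / 10                     # ~2.49 MB at a 10:1 ratio
print(f"{raw} bytes in, {rate / 1e9:.2f} GB/s, {compressed / 1e6:.2f} MB out")
```

At roughly 4.1 GB/s of raw input, the 4K example sits below the quoted peak of 6 GB/s, so the two figures are consistent with each other.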
Algorithmic trading has become ever more popular in recent years, accounting for approximately half of all European and American stock trades placed in 2012. Trading strategies need to be back-tested regularly against historical market data, both for calibration and to check their expected return and risk. This is a computationally demanding process that can take hours to complete, yet back-testing the strategies frequently during the trading day can significantly increase the profits for the trading institution.
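To make the back-testing and calibration steps concrete, here is a minimal sketch of a hypothetical long/flat moving-average crossover strategy evaluated on historical closing prices. It reports total return and per-bar volatility as crude stand-ins for expected return and risk. The strategy, window sizes and function names are assumptions for illustration; production back-tests also model transaction costs, slippage and far longer histories.

```python
from statistics import pstdev


def moving_average(prices, window):
    """Trailing simple moving average; None until `window` prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def backtest_ma_crossover(prices, short_window=3, long_window=5):
    """Back-test a hypothetical long/flat crossover strategy on historical prices.

    The position decided at the close of bar i is held over bar i + 1, so the
    strategy never trades on information it would not yet have had.
    Returns (total_return, per_bar_volatility).
    """
    fast = moving_average(prices, short_window)
    slow = moving_average(prices, long_window)
    strategy_returns = []
    for i in range(len(prices) - 1):
        in_market = slow[i] is not None and fast[i] > slow[i]
        bar_return = prices[i + 1] / prices[i] - 1.0
        strategy_returns.append(bar_return if in_market else 0.0)
    equity = 1.0
    for r in strategy_returns:
        equity *= 1.0 + r
    return equity - 1.0, pstdev(strategy_returns)


def calibrate(prices, short_windows, long_windows):
    """Grid-search window pairs; each evaluation is independent of the others."""
    best = None
    for s in short_windows:
        for lw in long_windows:
            if s >= lw:
                continue
            total, _vol = backtest_ma_crossover(prices, s, lw)
            if best is None or total > best[0]:
                best = (total, s, lw)
    return best
```

Because each window pair in `calibrate` is evaluated independently, a realistic sweep over many instruments and parameter sets is embarrassingly parallel, which is what makes this kind of workload a candidate for acceleration.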
While new power-efficient computer architectures exhibit spectacular theoretical peak performance, they require specific conditions to operate efficiently, which makes porting complex algorithms a challenge. Here, we report results of the semi-implicit method for pressure linked equations (SIMPLE) and the pressure implicit with operator splitting (PISO) methods implemented on the graphics processing unit (GPU). We examine the advantages and disadvantages of full porting over partial acceleration of these algorithms run on unstructured meshes. We found that the full-port strategy requires adjusting the internal data structures to the new hardware, and we propose a convenient format for storing them on GPUs. Our implementation is validated on standard steady and unsteady problems, and its computational efficiency is checked by comparing its results and run times with those of a standard software package (OpenFOAM) run on a central processing unit (CPU). The results show that a server-class GPU outperforms a server-class dual-socket multi-core CPU system running essentially the same algorithm by up to a factor of 4.
See also the supplementary materials and the follow-up at http://vratis.com/blog/?p=7.
(Tadeusz Tomczak, Katarzyna Zadarnowska, Zbigniew Koza, Maciej Matyka and Łukasz Mirosław: “Acceleration of iterative Navier-Stokes solvers on graphics processing units”, International Journal of Computational Fluid Dynamics, accepted, July 2013.)
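SIMPLE and PISO both spend much of their time in iterative sparse linear solvers whose core kernel is a sparse matrix-vector product, so the matrix storage format matters when porting them. The sketch below assumes OpenFOAM-style face-based LDU storage (one diagonal coefficient per cell plus one upper and one lower coefficient per internal face, addressed by owner and neighbour cell) and converts it to CSR, a widely used GPU layout. CSR is a swapped-in example here: the paper proposes its own format, which this code does not reproduce, and the function names are hypothetical.

```python
def ldu_to_csr(diag, lower, upper, owner, neighbour):
    """Convert LDU (face-addressed) coefficients to CSR (row_ptr, col_idx, values).

    Assumed convention: for internal face f, upper[f] sits at matrix position
    (owner[f], neighbour[f]) and lower[f] at (neighbour[f], owner[f]).
    """
    n = len(diag)
    rows = [[(i, diag[i])] for i in range(n)]  # start each row with its diagonal
    for f in range(len(owner)):
        o, nb = owner[f], neighbour[f]
        rows[o].append((nb, upper[f]))
        rows[nb].append((o, lower[f]))
    row_ptr, col_idx, values = [0], [], []
    for entries in rows:
        for col, val in sorted(entries):  # columns in ascending order per row
            col_idx.append(col)
            values.append(val)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A x row by row; each row is independent of the others."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

On a GPU, the row loop in `spmv_csr` would typically map one thread (or one warp) to each row. This row-wise layout avoids the write conflicts that a face-by-face loop over LDU storage would cause when two faces sharing a cell are processed concurrently.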
Developed in partnership with NVIDIA, this hands-on four-day course will teach students how to write and optimize applications that fully leverage the multi-core processing capabilities of the GPU. Taught by Acceleware developers who bring real-world experience to the classroom, students will benefit from:
- Hands-on exercises and progressive lectures
- Individual laptops equipped with NVIDIA GPUs for student use
- Small class sizes to maximize learning
July 29 – August 1, 2013, San Jose, CA, USA. More information: http://www.acceleware.com/training/913
The GPU Debayer software developed by Fastvideo can be used for demosaicing of raw 8-bit Bayer images to full-color 24-bit RGB format. The application employs the HQLI and DFPD algorithms and is tuned for NVIDIA GPUs, which results in very fast conversion, e.g., only 1.25 ms for Full HD image demosaicing on GeForce GTX 580. The software is freely available.
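HQLI refers to the high-quality linear interpolation of Malvar, He and Cutler, which corrects a bilinear estimate with a Laplacian term taken from the channel sampled at the centre pixel. The Python sketch below shows only its green-channel step, for an assumed RGGB mosaic with mirrored borders. It illustrates the published filter, not Fastvideo's GPU implementation, and it omits the red/blue steps and DFPD entirely; `hqli_green` is a hypothetical name.

```python
def hqli_green(raw):
    """Green plane of an 8-bit RGGB Bayer mosaic via Malvar-He-Cutler interpolation.

    `raw` is a list of rows of ints in [0, 255], at least 3 x 3.
    """
    h, w = len(raw), len(raw[0])

    def reflect(i, n):
        # Mirror without repeating the edge, which keeps the Bayer parity intact.
        if i < 0:
            return -i
        if i >= n:
            return 2 * (n - 1) - i
        return i

    def px(y, x):
        return raw[reflect(y, h)][reflect(x, w)]

    green = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if (y + x) % 2 == 1:
                # RGGB: green was sampled where row + column is odd; keep it.
                green[y][x] = raw[y][x]
                continue
            # Red or blue site: bilinear green average plus a Laplacian
            # correction from the centre channel, all divided by 8.
            s = (4 * px(y, x)
                 + 2 * (px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1))
                 - (px(y - 2, x) + px(y + 2, x) + px(y, x - 2) + px(y, x + 2)))
            green[y][x] = min(max(round(s / 8), 0), 255)
    return green
```

Each output pixel depends only on a 5 x 5 neighbourhood of the input, so every pixel can be computed by an independent GPU thread, which is one reason such filters map well to GPUs.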
From a recent press release:
Amdahl Software, a leading supplier of development tools for multi-core software, after extensive beta testing by evaluators in over a dozen countries and numerous end-user application markets, today announced the production release of OpenCL CodeBench. OpenCL CodeBench is an OpenCL code creation tool. It simplifies parallel software development, enabling developers to rapidly generate and optimize OpenCL applications. Engineering productivity is increased through the automation of overhead tasks. The tools suite enables engineers to work at higher levels of abstraction, accelerating the code development process. OpenCL CodeBench benefits both expert and novice engineers through a choice of command-line or guided, wizard-driven development methodologies. Close cooperation with IP, SOC and platform vendors will enable future releases of OpenCL CodeBench to more tightly optimize software for specific end-user platforms and development environments.
OpenCL CodeBench is available for trial or purchase. For additional information, please visit www.amdahlsoftware.com.
AccelerEyes has released dates for their upcoming CUDA and OpenCL training courses.
- Feb 25-26, Houston, TX
- Mar 4-5, Washington D.C.
- Mar 25-26, Los Angeles, CA
- Apr 9-10, Seattle, WA
- Apr 15-16, San Francisco, CA
- Feb 27-28, Houston, TX
- Mar 6-7, Washington D.C.
- Mar 27-28, Los Angeles, CA
- Apr 11-12, Seattle, WA
- Apr 17-18, San Francisco, CA
More information can be found on the courses’ webpages.
Acceleware has recently announced four courses on parallel programming:
- OpenCL on AMD APU CPUs: Jan 29 to Feb 1, 2013, Chicago, IL and Apr 9 to Apr 12, 2013, Los Angeles, CA
- 4 Day CUDA Course with an Oil and Gas focus: Mar 12 to Mar 15, 2013, Houston, TX
- 4 Day C++ AMP Training: Apr 23 to Apr 26, 2013, Seattle, WA
More information is available on the courses’ webpages.