A free, pre-alpha release of Lab4241′s GPGPU profiler is now available at www.lab4241.com. It provides source-code-line performance profiling for C or C++ code and CUDA kernels in a non-intrusive way. The profiler enables the developer to a seamless evaluation of used GPU resources (execution counts, memory access, branch diversions, etc.) per source-line, along with result evaluation in a simple, intuitive GUI, similar as with known CPU profilers like Quantify or valgrind.
This class teaches the fundamentals of parallel computing with the GPU and the CUDA programming environment. Examples are based on a series of image processing algorithms, such as those in Photoshop or Instagram. Programming and running assignments on high-end GPUs is possible, even if you don’t own one yourself. The course started Monday 4th Feb 2013 so there is still time to join. More information and enrollment: https://www.udacity.com/course/cs344.
From a recent press release:
Amdahl Software, a leading supplier of development tools for multi-core software, after extensive beta testing by evaluators over a dozen countries and numerous end-user application markets, today announced the production release of OpenCL CodeBench. OpenCL CodeBench is an OpenCL Code Creation tool. It simplifies parallel software development, enabling developers to rapidly generate and optimize OpenCL applications. Engineering productivity is increased through the automation of overhead tasks. The tools suite enables engineers to work at higher levels of abstraction, accelerating the code development process. OpenCL CodeBench benefits both expert and novice engineers through a choice of command line or guided, wizard-driven development methodologies. Close cooperation with IP, SOC and platform vendors will enable future releases of OpenCL CodeBench to more tightly optimize software for specific end user platforms and development environments.
OpenCL CodeBench is available for trial or purchase. For additional information, please visit www.amdahlsoftware.com.
amgcl is a simple and generic algebraic multigrid (AMG) hierarchy builder. Supported coarsening methods are classical Ruge-Stuben coarsening, and either plain or smoothed aggregation. The constructed hierarchy is stored and used with help of one of the supported backends including VexCL, ViennaCL, and CUSPARSE/Thrust.
With help of amgcl, solution of a large sparse system of linear equations may be easily accelerated through OpenCL, CUDA, or OpenMP technologies. Source code of the library is publicly available under MIT license at https://github.com/ddemidov/amgcl.
rCUDA (remote CUDA) v4.0 has just been released. It provides full binary compatibility with CUDA applications (no need to modify the application source code or recompile your program), native InfiniBand support, enhanced data transfers, and CUDA 5.0 API support (excluding graphics interoperability). This new release of rCUDA allows to execute existing GPU-accelerated applications by leveraging remote GPUs within a cluster (both via sharing and/or aggregating GPUs) with a negligible overhead. The new version is available free of charge ar www.rCUDA.net, along with examples, manuals and additional information.
Alea.cuBase allows to create GPU accelerated applications at all levels of sophistication, from simple GPU kernels up to complex GPU algorithms using textures, shared memory and other advanced GPU programming techniques, fully integrated into .NET. The GPU kernels are developed in functional language F# and are callable from any other .NET language. No additional wrappers or assembly translation processes are required. Alea.cuBase allows dynamic creation of GPU code at run time, thereby opening completely new dimensions for GPU accelerated applications. Trial versions are available at http://www.quantalea.net/products.
The latest release 1.4.0 of the free open-source linear algebra library ViennaCL features the following highlights:
- Two computing backends in addition to OpenCL: CUDA and OpenMP
- Improved performance for (Block-) ILU0/ILUT preconditioners
- Optional level scheduling for ILU substitutions on GPUs
- Mixed-precision CG solver
- Initializer types from Boost.uBLAS (unit_vector, zero_vector, etc.)
Any contributions of fast CUDA or OpenCL computing kernels for future releases of ViennaCL are welcome! More information is available at http://viennacl.sourceforge.net.
OpenCL CodeBench is a code creation and productivity tools suite designed to accelerate and simplify OpenCL software development. OpenCL CodeBench provides developers with automation tools for host code and unit test bench generation. Kernel code development on OpenCL is accelerated and enhanced through a language aware editor delivering advanced incremental code analysis features. Software Programmers new to OpenCL can choose to be guided through an Eclipse wizard, while the power users can leverage the command line interface with XML-based configuration files. OpenCL CodeBench Beta is now available for Linux and Windows operating systems.
Jacket enables GPU computing for MATLAB® codes. The new version v2.3 includes performance improvements and new support for CUDA 5.0. This newer version of CUDA enables computation on the latest Kepler K20 GPUs of the NVIDIA Tesla product line.
More information: http://blog.accelereyes.com/blog/2012/10/23/jacket-v2-3/
The CUDA 5 Production Release is now available as a free download at www.nvidia.com/getcuda.
This powerful new version of the pervasive CUDA parallel computing platform and programming model can be used to accelerate more of applications using the following four (and many more) new features.
• CUDA Dynamic Parallelism brings GPU acceleration to new algorithms by enabling GPU threads to directly launch CUDA kernels and call GPU libraries.
• A new device code linker enables developers to link external GPU code and build libraries of GPU functions.
• NVIDIA Nsight Eclipse Edition enables you to develop, debug and optimize CUDA code all in one IDE for Linux and Mac OS.
• GPUDirect Support for RDMA provides direct communication between GPUs in different cluster nodes
As a demonstration of the power of Dynamic Parallelism and device code linking, CUDA 5 includes a device-callable version of the CUBLAS linear algebra library, so threads already running on the GPU can invoke CUBLAS functions on the GPU. Read the rest of this entry »