Partnering with NVIDIA and Microsoft, this four-day CUDA training course is designed for GPU programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the many-core processing capabilities of the GPU.
Chai is a new managed platform for GPGPU. It is a free and open source clean room workalike of the PeakStream platform. While not production-ready, the just-released alpha version is able to compile and run non-trivial PeakStream demo code on AMD and NVIDIA GPUs (e.g. conjugate gradient).
Chai combines an application virtual machine, garbage collection, auto-tuning JIT compiler, and high level array programming language implemented as an embedded domain-specific language in C++. The JIT back-end uses expectation-maximization to auto-tune and generate vectorized OpenCL. The JIT includes auto-tuned model families for GEMM and GEMV. Although originally developed for AMD GPUs, these parameterized kernel families also generalize to NVIDIA GPUs.
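To make the idea of auto-tuning a parameterized kernel family concrete, here is a minimal sketch in plain Python. It is not Chai's code: Chai's JIT uses expectation-maximization and generates OpenCL, whereas this sketch simply times every candidate and keeps the fastest. The names `blocked_gemm` and `autotune`, and the candidate block sizes, are hypothetical and chosen only for illustration.

```python
import time


def blocked_gemm(a, b, block):
    """Multiply matrices a (n x k) and b (k x m) using square tiles of size `block`.

    `block` is the tunable parameter: every value yields the same result,
    but the running time differs.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


def autotune(a, b, candidates=(2, 4, 8, 16)):
    """Exhaustive search (an assumption, not Chai's method): time each
    candidate block size once and return the fastest."""
    best_block, best_time = None, float("inf")
    for block in candidates:
        start = time.perf_counter()
        blocked_gemm(a, b, block)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block
```

A real auto-tuner would repeat each measurement and search a much larger parameter space, which is where a statistical method such as the one Chai uses becomes worthwhile.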
OpenCL Studio integrates OpenCL and OpenGL into a single development environment for high-performance computing. The feature-rich editor, interactive scripting language and extensible plug-in architecture support the rapid development of complex parallel algorithms and accompanying visualizations. Version 2.0 now conforms to the Lua plug-in architecture and closely integrates the open-source libCL parallel algorithm library. A complete version of OpenCL Studio is freely available for download at www.opencldev.com, including instructional videos and technology showcases.
VMD is a popular molecular visualization and analysis program used by thousands of researchers worldwide. VMD accelerates many of the most computationally demanding visualization and analysis features using GPU computing techniques, resulting in improved performance and new capabilities beyond what is possible using only conventional multi-core CPUs. VMD 1.9.1 advances these capabilities further with a CUDA implementation of the new QuickSurf molecular surface representation, enabling smooth interactive animation of moderate-sized biomolecular complexes consisting of a few hundred thousand to one million atoms, and allowing interactive display of molecular surfaces for static structures of very large complexes containing tens of millions of atoms, e.g. large virus capsids.
More information: http://www.ks.uiuc.edu/Research/vmd/vmd-1.9.1/
CLOGS is a library for higher-level operations on top of the OpenCL C++ API. It is designed to integrate with other OpenCL code, including synchronization using OpenCL events. Currently only two operations are supported: radix sorting and exclusive scan. Radix sort supports all the unsigned integral types as keys, and all the built-in scalar and vector types suitable for storage in buffers as values. Scan supports all the integral types. It also supports vector types, which allows for limited multi-scan capabilities.
Version 1.0 of the library has just been released. The home page is http://clogs.sourceforge.net/
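For readers unfamiliar with the scan operation mentioned above, the following is a minimal sequential sketch in Python of what an exclusive scan computes, and of how scanning vector-typed elements amounts to several independent scans at once. It does not use CLOGS or OpenCL; the function names `exclusive_scan` and `exclusive_scan_vec` are hypothetical and serve only to illustrate the semantics.

```python
def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1], so out[0] is 0."""
    out = []
    running = 0
    for v in values:
        out.append(running)  # record the sum of everything before this element
        running += v
    return out


def exclusive_scan_vec(rows):
    """Component-wise exclusive scan over equal-length tuples.

    Each tuple component is scanned independently, which is the sense in
    which vector element types give a limited form of multi-scan.
    """
    if not rows:
        return []
    scanned_columns = [exclusive_scan(col) for col in zip(*rows)]
    return [tuple(components) for components in zip(*scanned_columns)]
```

For example, `exclusive_scan([3, 1, 4, 1, 5])` returns `[0, 3, 4, 8, 9]`. A GPU library computes the same result in parallel rather than with a single running total.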
A new GPU and high-performance computing meetup group has been formed in Pune, India. The informal special interest group will bring together GPU users from all fields and experience levels in India, including academicians, researchers, scientists, device manufacturers, system integrators, service providers and all early adopters of HPC & GPU computing. The group, hosted on Meetup.com, will provide HPC and GPU computing enthusiasts in India a comprehensive platform to track industry trends and engage with each other, discussing the latest developments in the field.
The group will have a core group of key academicians to lead and moderate discussions. The site will feature a bank of research papers, case studies and posts on the latest GPU-related technological developments. The meetup group will also encourage users to engage and interact over group chats and web conferences. You can find the group at
Today NVIDIA released CUDA 4.1, including a new CUDA Toolkit, SDK, Visual Profiler, Parallel Nsight IDE and NVIDIA device driver.
CUDA 4.1 makes it easier to accelerate scientific research with GPUs, with key features including:
- a redesigned Visual Profiler with automated performance analysis and expert guidance;
- a new LLVM-based compiler that generates up to 10% faster code; and
- 1000+ new imaging and signal processing functions in the NPP library.
The cuSPARSE library included with CUDA 4.1 has a new tridiagonal solver and 2x faster sparse matrix-vector multiplication using the ELL hybrid format, and the cuRAND library included with CUDA 4.1 has two new random number generators.
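As background on the ELL format mentioned above, here is a minimal sketch in plain Python of ELL storage and the matching sparse matrix-vector product. It is not cuSPARSE code and does not use its API; the helper names `dense_to_ell` and `ell_spmv` are hypothetical. It also shows plain ELL only: a hybrid layout additionally moves the entries of unusually long rows into a separate coordinate list so that the padded width stays small.

```python
def dense_to_ell(dense):
    """Convert a dense matrix (list of rows) to ELL storage.

    Every row is padded to the same width, the maximum number of nonzeros
    in any row. Padded slots hold value 0.0 and column index 0.
    """
    width = max((sum(1 for x in row if x != 0) for row in dense), default=0)
    cols = [[0] * width for _ in dense]
    vals = [[0.0] * width for _ in dense]
    for i, row in enumerate(dense):
        k = 0
        for j, x in enumerate(row):
            if x != 0:
                cols[i][k] = j
                vals[i][k] = float(x)
                k += 1
    return cols, vals


def ell_spmv(cols, vals, x):
    """Compute y = A @ x for a matrix A stored in ELL form."""
    y = []
    for row_cols, row_vals in zip(cols, vals):
        acc = 0.0
        for j, a in zip(row_cols, row_vals):
            acc += a * x[j]  # padded slots have a == 0.0 and add nothing
        y.append(acc)
    return y
```

Because every row has the same length, a GPU can assign one thread per row and read the arrays with regular, predictable access patterns, which is the usual motivation for the format.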
CLCC, the light-weight and flexible utility for integrating OpenCL source builds into your project, has just been updated to version 0.3.0. This version allows developers to save compiled binaries as object files for distribution with their programs and adds a series of options to select specific target platform/device combinations. Documentation and further information is available at http://clcc.sourceforge.net.
The SpeedIt team recently compared and benchmarked the SpMV performance of CUSPARSE 4.0, CUSP 0.2.0 and SpeedIT 2.0 on 23 randomly chosen matrices from the University of Florida Sparse Matrix Collection. Comparisons were done on a Tesla C2050 in single and double precision. The full report is available at http://wp.me/p1ZihD-1.
Partnering with NVIDIA and Microsoft, this four-day course is designed for programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU.
Delivered by Acceleware’s developers, who provide real-world experience and examples, the training comprises classroom lectures and hands-on tutorials. Each student will be supplied with a laptop equipped with NVIDIA GPUs for the duration of the course. Small class sizes maximize learning and ensure a personal educational experience.
Register before January 13 and receive $250 off your course fee!
Enter promotional code AXTEB2012