NVIDIA Parallel Nsight Now Shipping
July 21st, 2010
NVIDIA today announced the release of NVIDIA Parallel Nsight software, the industry’s first development environment for GPU-accelerated applications that works with Microsoft Visual Studio. “By adding functionality specifically for GPU Computing developers, Parallel Nsight makes the power of the GPU more accessible than ever before,” said Sanford Russell, GM of GPU Computing at NVIDIA. NVIDIA Parallel Nsight features a CUDA C/C++ debugger and application performance analyzer, and a graphics debugger and inspector. It supports Windows HPC Server 2008, Windows 7 and Windows Vista. Download Parallel Nsight here.
OpenCurrent v1.1.0 released
June 18th, 2010
OpenCurrent version 1.1.0 has been released. OpenCurrent is a library for solving certain types of PDEs over 3D Cartesian grids. It supports single and double precision, and includes solvers for Poisson equations, diffusion, and incompressible Navier-Stokes.
New features:
- Multi-GPU communication library
- Multi-GPU versions of Multigrid solver, Incompressible Navier-Stokes solver, and more
- NetCDF support now optional
- Support for Fermi/CUDA 3.0
- Numerous bug fixes and enhancements
Get it here: http://code.google.com/p/opencurrent/downloads/list
HOOMD-blue 0.9.0 released
May 20th, 2010
HOOMD-blue stands for Highly Optimized Object-oriented Many-particle Dynamics -- Blue Edition. It performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to dozens of processor cores on a fast cluster.
HOOMD-blue 0.9.0 is a major new release. Highlights include:
- Support for Fermi generation GPUs
- Performance enhancements
- New pair potentials
- Particle data is now accessible from hoomd scripts
- Binary format dump files for simulation restarts
- Numerous small enhancements to enable easily restartable jobs
- 2D simulations are now possible
- Integration methods can now be applied to specified groups of particles
- All IMD commands issued by VMD are now understood
- … and more
HOOMD-blue 0.9.0 is available for download under an open source license.
OpenCL Studio 1.0 beta released
April 5th, 2010
Geist Software Labs has released the first version of OpenCL Studio for beta testing. OpenCL Studio combines OpenCL and OpenGL into a single integrated development environment that allows you to visualize OpenCL computation using powerful 3D rendering techniques. The editor hides much of the complexity of the underlying APIs while still providing flexibility via the Lua scripting language. Integrated source code editors and debugging capabilities for OpenCL, GLSL, and Lua, as well as a toolbox of 2D user interface widgets, provide a framework for a wide range of parallel programming solutions.
rCUDA 1.0 released
April 5th, 2010
The GAP (Universidad Politécnica de Valencia, Spain) and HPCA (Universidad Jaume I, Spain) research groups are proud to announce the public release of rCUDA 1.0. The rCUDA Framework enables the concurrent usage of CUDA-compatible devices remotely by employing the sockets API for communication between clients and servers. Thus, it can be useful in three different environments:
- Clusters. To reduce the number of GPUs installed in High Performance Clusters. This leads to energy savings, as well as other related savings like acquisition costs, maintenance, space, cooling, etc.
- Academia. In low performance networks, to offer access to a few high performance GPUs concurrently to all the students.
- Virtual Machines. To enable the access to the CUDA facilities on the physical machine.
The current version of rCUDA (v1.0) implements all functions in the CUDA Runtime API version 2.3, excluding OpenGL and Direct3D interoperability. rCUDA 1.0 targets the Linux OS (for 32- and 64-bit architectures) on both client and server sides. The framework is free for any purpose under the terms and conditions of the GNU GPL/LGPL (where applicable) licenses.
For additional information, visit the rCUDA web page or Antonio Peña’s webpage.
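The client/server forwarding idea described above can be illustrated with a small, self-contained sketch. It is not rCUDA's wire protocol or API: the message format (length-prefixed JSON), the `RemoteDevice` class, and the functions in `REMOTE_FUNCTIONS` are all hypothetical stand-ins, and the "server" is a thread in the same process connected through a socket pair rather than a machine with a GPU.

```python
import json
import socket
import struct
import threading


def send_msg(sock, obj):
    """Send one JSON message with a 4-byte big-endian length prefix."""
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(data)) + data)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


# Hypothetical stand-ins for work a server would run on its local GPU.
REMOTE_FUNCTIONS = {
    "vector_add": lambda a, b: [x + y for x, y in zip(a, b)],
    "scale": lambda a, k: [x * k for x in a],
}


def serve(sock):
    """Server loop: execute each requested call and send back the result."""
    with sock:
        while True:
            try:
                request = recv_msg(sock)
            except ConnectionError:
                break  # client disconnected
            func = REMOTE_FUNCTIONS[request["call"]]
            send_msg(sock, {"result": func(*request["args"])})


class RemoteDevice:
    """Client-side stub: forwards each call over the socket and waits for the reply."""

    def __init__(self, sock):
        self._sock = sock

    def call(self, name, *args):
        send_msg(self._sock, {"call": name, "args": list(args)})
        return recv_msg(self._sock)["result"]

    def close(self):
        self._sock.close()


# Connect a client and an in-process "server" through a socket pair.
client_sock, server_sock = socket.socketpair()
server = threading.Thread(target=serve, args=(server_sock,), daemon=True)
server.start()

device = RemoteDevice(client_sock)
total = device.call("vector_add", [1, 2, 3], [10, 20, 30])
scaled = device.call("scale", total, 2)
device.close()
server.join(timeout=5)
```

The client code never touches the "device" directly; every operation is a request/reply round trip, which is why network latency and bandwidth matter for this style of remote execution.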
SpeedIT Toolkit 0.9.1 released
March 26th, 2010
The SpeedIT Tools library provides a set of accelerated solvers for sparse linear systems of equations. Speedups of more than an order of magnitude are achieved on a single reasonably priced, CUDA-capable NVIDIA Graphics Processing Unit (GPU) using proprietary advanced optimization techniques. The library can be used in a wide spectrum of domains arising from problems with underlying 2D and 3D geometry, such as computational fluid dynamics, electromagnetics, thermodynamics, materials, acoustics, computer vision and graphics, robotics, semiconductor devices and structural engineering. The library can also be used for problems without a defined geometry, such as quantum chemistry, statistics, power networks and other graphs, and chemical process simulation. All computations are performed in single or double floating-point precision. Two versions of the SpeedIT toolkit have been released: the classic version provides a conjugate gradient solver, and the extreme edition provides optimized CG and BiCGSTAB solvers, a diagonal preconditioner, memory management, and heuristic-based analysis of input matrices.
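For readers unfamiliar with the solver named above, here is a minimal CPU sketch of the conjugate gradient method with a diagonal (Jacobi) preconditioner. It is a textbook formulation in plain Python on a small dense matrix, not SpeedIT's implementation or API; the function names `pcg`, `matvec`, and `dot` are chosen for this example only.

```python
def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A using CG
    with a diagonal (Jacobi) preconditioner M = diag(A)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # M^-1
    x = [0.0] * n
    r = list(b)                                   # residual b - A x, with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break                                 # residual norm small enough
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small symmetric positive-definite test system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg(A, b)
```

The dominant cost per iteration is the matrix-vector product, which is the operation a GPU sparse solver would typically accelerate.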
Palix Technologies launches ANDSolver beta program
March 23rd, 2010
Palix Technologies has introduced a new Computational Fluid Dynamics (CFD) product called ANDSolver that has been designed from the ground up to use Graphics Processing Units (GPUs) for fast and efficient aerodynamic analysis. Although developing and running applications on multiple CPUs is a well-established practice for high performance science and engineering simulations, a newer trend towards using GPUs for computation promises faster results with lower hardware acquisition and operating costs. ANDSolver delivers on that promise with up to a 10x speedup compared to a typical quad-core CPU. This level of performance is unique in that it is achieved on unstructured meshes, which have traditionally not been considered amenable to GPUs because of their memory access patterns. However, thanks to an innovative algorithm design that maximizes performance on the NVIDIA CUDA architecture, the ease and flexibility of unstructured meshing can now be used on high-performance, cost-effective GPUs.
A limited number of additional registrants will be accepted prior to our first production release in Q2 2010. More information about our current beta testing program can be found at http://www.palixtech.com.
CUDA 3.0 toolkit released
March 20th, 2010
NVIDIA has released version 3.0 of the CUDA Toolkit, providing developers with tools to prepare for the upcoming Fermi-based GPUs. Highlights of this release include:
- Support for the new Fermi architecture, with:
  - Native 64-bit GPU support
  - Multiple Copy Engine support
  - ECC reporting
  - Concurrent Kernel Execution
  - Fermi HW debugging support in cuda-gdb
  - Fermi HW profiling support for CUDA C and OpenCL in Visual Profiler
- C++ Class Inheritance and Template Inheritance support for increased programmer productivity
- A new unified interoperability API for Direct3D and OpenGL, with support for:
  - OpenGL texture interop
  - Direct3D 11 interop support
- CUDA Driver / Runtime Buffer Interoperability, which allows applications using the CUDA Driver API to also use libraries implemented using the CUDA C Runtime such as CUFFT and CUBLAS.
Swan: A simple tool for porting CUDA to OpenCL
March 9th, 2010
Swan is a small tool that aids the reversible conversion of existing CUDA codebases to OpenCL. Its main features are the translation of CUDA kernel source code to OpenCL, and a common API that abstracts both CUDA and OpenCL runtimes. Swan preserves the convenience of the CUDA <<< grid, block >>> kernel launch syntax by generating C source code for kernel entry-point functions. Possible uses include:
- Evaluating OpenCL performance of an existing CUDA code
- Maintaining a dual-target OpenCL and CUDA code
- Reducing dependence on NVCC when compiling host code
- Supporting multiple CUDA compute capabilities in a single binary
Swan is developed by the MultiscaleLab, Barcelona, and is available under the GPL2 license.
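To make the idea of a kernel entry-point function concrete, the sketch below emulates on the CPU what a launch with a grid size and a block size means: an ordinary function receives the launch configuration plus the kernel arguments and runs the kernel body once per (block, thread) pair. This is only an illustration in Python of the launch semantics, restricted to one dimension; the names `launch` and `saxpy_kernel` are hypothetical and do not reflect the C code Swan actually generates.

```python
def launch(kernel, grid, block, *args):
    """Hypothetical entry point: run `kernel` once for every thread
    of every block, passing the indices a GPU would provide."""
    for block_idx in range(grid):
        for thread_idx in range(block):
            kernel(block_idx, block, thread_idx, *args)


def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """Kernel body: each logical thread updates at most one element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:                               # guard the partially filled last block
        y[i] = a * x[i] + y[i]


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n

block = 4
grid = (n + block - 1) // block  # enough blocks to cover all n elements

# Plays the role of: saxpy_kernel<<< grid, block >>>(n, 2.0, x, y)
launch(saxpy_kernel, grid, block, n, 2.0, x, y)
```

Because the launch is an ordinary function call, host code written this way does not need a compiler that understands the `<<< >>>` syntax.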
Easy GPU programming with GMAC
March 1st, 2010
GMAC (Global Memory for ACcelerators) is a user-level library that implements an Asymmetric Distributed Shared Memory (ADSM) model to be used by CUDA programs. An ADSM model allows CPU code to access data hosted in accelerator (GPU) memory. In this model, a single pointer is used for data structures accessed on both the CPU and the GPU, and the coherency of the data is transparently handled by the library. Moreover, data allocated with GMAC can be accessed by all the host threads of the program. This makes your code simpler and cleaner. GMAC currently supports programs written in CUDA, but OpenCL support is planned.
A paper describing the Asymmetric Distributed Shared Memory model and its implementation in GMAC has been accepted at the ASPLOS XV conference. GMAC is being developed by the Operating System Group at the Universitat Politecnica de Catalunya and the IMPACT Research Group at the University of Illinois. Pre-compiled binary packages, the source code, documentation and examples are available at the project website.
(Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro and Wen-mei Hwu, “An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems”, accepted in: Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2010), March 2010.)
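The "single pointer with transparent coherency" idea can be pictured with a toy model. The sketch below is not GMAC's API or its actual coherence protocol: `SharedArray`, its ownership flag, and the `transfers` log are assumptions made for illustration, and the "accelerator" is just a second Python list. It only shows how one handle can hide the copies between two memories.

```python
class SharedArray:
    """Toy model: one logical array backed by a host copy and an
    accelerator copy, with transfers triggered implicitly."""

    def __init__(self, size):
        self._host = [0.0] * size
        self._device = [0.0] * size
        self._owner = "host"   # side holding the up-to-date copy
        self.transfers = []    # log of implicit copies, for inspection

    def _acquire(self, side):
        """Make `side` hold the current data, copying only if needed."""
        if self._owner == side:
            return
        if side == "device":
            self._device[:] = self._host
            self.transfers.append("host->device")
        else:
            self._host[:] = self._device
            self.transfers.append("device->host")
        self._owner = side

    def __len__(self):
        return len(self._host)

    def __getitem__(self, i):
        self._acquire("host")  # CPU read: fetch results if the device ran last
        return self._host[i]

    def __setitem__(self, i, value):
        self._acquire("host")
        self._host[i] = value

    def run_kernel(self, kernel):
        """Stand-in for a kernel launch operating on the device copy."""
        self._acquire("device")
        kernel(self._device)


def double_all(data):
    """Stand-in for accelerator code."""
    for i in range(len(data)):
        data[i] *= 2.0


a = SharedArray(4)
for i in range(len(a)):
    a[i] = float(i + 1)                  # CPU writes through the single handle
a.run_kernel(double_all)                 # data moves to the "device" implicitly
result = [a[i] for i in range(len(a))]   # CPU reads; data moves back implicitly
```

The calling code contains no explicit copy calls; the two transfers recorded in `a.transfers` happen as a side effect of accessing the array from the other side.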