February 1st, 2011
This new report covers all the performance improvements in the latest CUDA Toolkit 3.2 release, and compares CUDA parallel math library performance vs. commonly used CPU libraries.
Learn about the performance advantages of using the CUDA parallel math libraries for FFT, BLAS, sparse matrix operations, and random number generation.
Posted in Developer Resources | Tags: FFT, Linear Algebra, NVIDIA CUDA, Random Number Generation, Sparse Linear Systems
January 23rd, 2011
Abstract:
We implemented a GPU-based parallel code to perform Monte Carlo simulations of the two-dimensional q-state Potts model. The algorithm is based on a checkerboard update scheme and assigns independent random number generators to each thread (one thread per spin). The implementation allows simulating systems of up to ~10^9 spins with an average time per spin flip of 0.147ns on the fastest GPU card tested, representing a speedup of up to 155x compared with an optimized serial code running on a standard CPU. The possibility of performing high-speed simulations at large enough system sizes allowed us to provide positive numerical evidence for the existence of metastability in very large systems based on Binder’s criterion, namely, on whether or not specific heat singularities exist at spinodal temperatures different from the transition temperature.
(Ezequiel E. Ferrero, Juan Pablo De Francesco, Nicolás Wolovick and Sergio A. Cannas: “q-state Potts model metastability study using optimized GPU-based Monte Carlo algorithms”. [arXiv:1101.0876] [code and additional information])
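The checkerboard scheme named in the abstract can be illustrated with a small serial sketch. This is not the authors' GPU code: the function names, the Metropolis acceptance rule, and the use of one Python `random.Random` object per site (standing in for one generator per thread) are assumptions made for illustration. The lattice side must be even so that, with periodic boundaries, sites of one color only have neighbors of the other color.

```python
import math
import random

def checkerboard_sweep(spins, q, beta, rngs):
    """One Metropolis sweep of a periodic L x L q-state Potts lattice (L even).

    Sites are split into two checkerboard colors. A site has no neighbors
    of its own color, so on a GPU all sites of one color could be updated
    concurrently; here they are simply visited in a loop.
    """
    L = len(spins)
    for color in (0, 1):
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != color:
                    continue
                rng = rngs[i][j]  # hypothetical per-site generator (one per "thread")
                old = spins[i][j]
                new = rng.randrange(q)
                neighbors = (
                    spins[(i - 1) % L][j], spins[(i + 1) % L][j],
                    spins[i][(j - 1) % L], spins[i][(j + 1) % L],
                )
                # Potts energy: -1 for every neighbor in the same state.
                delta_e = (sum(n == old for n in neighbors)
                           - sum(n == new for n in neighbors))
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = new

def potts_energy(spins):
    """Total energy, counting each nearest-neighbor bond once."""
    L = len(spins)
    e = 0
    for i in range(L):
        for j in range(L):
            e -= spins[i][j] == spins[(i + 1) % L][j]
            e -= spins[i][j] == spins[i][(j + 1) % L]
    return e
```

Calling `checkerboard_sweep` repeatedly at a large `beta` (low temperature) from a random start should drive `potts_energy` downward, since uphill moves are then almost never accepted.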
Posted in Research | Tags: Monte Carlo, NVIDIA CUDA, Papers
January 23rd, 2011
Tina’s Random Number Generator Library (TRNG) version 4.11 has been released. TRNG is a state-of-the-art open-source C++ pseudo-random number generator library for sequential and parallel Monte Carlo simulations. Its design principles are based on a proposal for an extensible random number generator facility that will be part of the forthcoming revision of the ISO C++ standard. The TRNG library features an object-oriented design, is easy to use and has been optimized for speed. Its implementation does not depend on any communication library or hardware architecture. TRNG is suited for shared-memory as well as distributed-memory computers and may be used in various parallel programming environments, e.g. the Message Passing Interface (MPI) standard or OpenMP. A notable new feature of TRNG release 4.11 is CUDA support. All generators implemented by TRNG have been subjected to thorough statistical tests in sequential and parallel setups. Download and further information: http://trng.berlios.de/
Posted in Developer Resources | Tags: Libraries, Monte Carlo, NVIDIA CUDA, Open Source, Random Number Generation
January 12th, 2011
Abstract:
Although trivial background subtraction (BGS) algorithms (e.g. frame differencing, running average…) can perform quite fast, they are not robust enough to be used in various computer vision problems. More complex algorithms usually give better results, but are too slow to be applied to real-time systems. We propose an improved version of the Extended Gaussian mixture model that utilizes the computational power of Graphics Processing Units (GPUs) to achieve real-time performance. Experiments show that our implementation running on a low-end GeForce 9600GT GPU provides at least a 10x speedup. The frame rate is greater than 50 frames per second (fps) for most of the tests, even on HD video formats.
(Vu Pham, Phong Vo, Vu Thanh Hung and Le Hoai Bac: “GPU Implementation of Extended Gaussian Mixture Model for Background Subtraction”. IEEE International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010. [DOI] [code and additional information])
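For context, the "trivial" running-average baseline that the abstract contrasts with the Gaussian mixture model can be sketched in a few lines. This is not the paper's method; the function names, the learning rate `alpha`, and the threshold are hypothetical, and frames are plain nested lists of grayscale values.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]

def foreground_mask(background, frame, threshold=30.0):
    """Mark pixels that differ from the background by more than the threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]
```

Every pixel is processed independently of the others, which is why even this simple scheme maps naturally onto one GPU thread per pixel. A single global threshold is also what makes it fragile under lighting changes, the weakness that mixture models aim to address.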
Posted in Developer Resources, Research | Tags: Computer Vision, Image Processing, NVIDIA CUDA, Papers, Real-Time
December 14th, 2010
The “Beta 2” version of GPU.NET, a new product by TidePowerd, has recently been released. It allows developers to write GPU-based code in C# or other .NET-supported languages. GPU.NET beta is available for public download, and the full documentation and several example projects are available online.
Posted in Developer Resources | Tags: .NET, NVIDIA CUDA, Programming Languages
December 14th, 2010
MAGMA 1.0 RC1 is now available, including the MAGMA sources. MAGMA 1.0 RC1 is intended for a single CUDA-enabled NVIDIA GPU. It extends version 0.2 by adding support for Fermi GPUs (see the sample performance results for LU, QR, and Cholesky).
Included are routines for the following algorithms:
- LU, QR, and Cholesky factorizations in both real and complex arithmetic (single and double);
- Linear solvers based on LU, QR, and Cholesky in both real and complex arithmetic (single and double);
- Mixed-precision iterative refinement solvers based on LU, QR, and Cholesky in both real and complex arithmetic;
- MAGMA BLAS in real arithmetic (single and double), including gemm, gemv, symv, and trsm.
See the MAGMA homepage for a download link.
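To make the idea behind the mixed-precision iterative refinement solvers listed above concrete, here is a small sketch of the general technique: factor and solve in low precision, then correct the solution using residuals computed in high precision. It does not use or mirror the MAGMA API. All function names are hypothetical, and single precision is simulated by rounding Python doubles through `struct`.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_f32(a):
    """In-place style LU with partial pivoting; every result is rounded to float32."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm

def lu_solve_f32(lu, perm, b):
    """Forward and back substitution in simulated single precision."""
    n = len(lu)
    y = [f32(b[p]) for p in perm]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y

def solve_refined(a, b, iterations=5):
    """Single-precision LU solve followed by double-precision refinement."""
    lu, perm = lu_factor_f32(a)
    x = lu_solve_f32(lu, perm, b)
    for _ in range(iterations):
        # Residual in double precision, correction from the cheap float32 factors.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
        d = lu_solve_f32(lu, perm, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

The expensive factorization runs once in the fast low precision, while each refinement step costs only a matrix-vector product and two triangular solves. For well-conditioned systems, a few steps are typically enough to reach double-precision accuracy.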
Posted in Developer Resources | Tags: Dense Linear Algebra, Linear Algebra, Numerical Algorithms, NVIDIA CUDA
November 27th, 2010
The OpenFOAM SpeedIT plugin version 1.1 has been released under the GPL License. The most important new features are:
- Multi-GPU support
- Tested on Fermi architecture (GTX460 and Tesla C2050)
- Automated submission of the domain to the GPU cards (using decomposePar from OpenFOAM)
- Optimized submission of computational tasks to the best GPU card in the system for any number of computational threads
- Plugin picks the most powerful GPU card for single-thread cases
The OpenFOAM SpeedIT plugin is available at http://speedit.vratis.com.
Posted in Developer Resources | Tags: Linear Algebra, Multi-GPU, NVIDIA CUDA, Physics Simulation, Tools
November 27th, 2010
A new major release of rCUDA™ (Remote CUDA), the open-source package that allows CUDA calls to be executed on remote GPUs, is now available. The major improvements in the new version are:
- Updated API to 3.1
- Server now uses Runtime API when possible (CUDA >= 3.1 required)
- Introduced support for the most common CUBLAS routines
- Fixed some bugs
- Added AF_UNIX sockets support to enhance performance on local executions
- Added some load balancing capabilities to the server
- General performance improvements
- Officially added Fermi support
Further information is available from the rCUDA™ webpages http://www.gap.upv.es/rCUDA and http://www.hpca.uji.es/rCUDA.
Posted in Developer Resources | Tags: Clusters, Libraries, Multi-GPU, NVIDIA CUDA, Tools
November 22nd, 2010
CUDA 3.2 has been released and can be downloaded from http://developer.nvidia.com/object/cuda_3_2_downloads.html. New features include:
New and Improved CUDA Libraries
- CUBLAS performance improved 50% to 300% on Fermi architecture GPUs, for matrix multiplication of all datatypes and transpose variations
- CUFFT performance tuned for radix-3, -5, and -7 transform sizes on Fermi architecture GPUs, now 2x to 10x faster than MKL
- New CUSPARSE library of GPU-accelerated sparse matrix routines for sparse/sparse and dense/sparse operations delivers 5x to 30x faster performance than MKL
- New CURAND library of GPU-accelerated random number generation (RNG) routines, supporting Sobol quasi-random and XORWOW pseudo-random routines at 10x to 20x faster than similar routines in MKL
- H.264 encode/decode libraries now included in the CUDA Toolkit
CUDA Driver & CUDA C Runtime
- Support for new 6GB Quadro and Tesla products
- New support for enabling high performance Tesla Compute Cluster (TCC) mode on Tesla GPUs in Windows desktop workstations
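For readers unfamiliar with the XORWOW generator mentioned in the CURAND item above, the following sketch implements the xorwow recurrence as published by Marsaglia ("Xorshift RNGs", 2003). It is not CURAND code and is not guaranteed to reproduce CURAND's output streams: the class name, the default state constants (taken from Marsaglia's example), and the float conversion are assumptions for illustration, and the library's own seeding procedure is not modeled.

```python
MASK32 = 0xFFFFFFFF

class Xorwow:
    """Marsaglia's xorwow: a 160-bit xorshift state plus a Weyl counter."""

    def __init__(self,
                 state=(123456789, 362436069, 521288629, 88675123, 5783321),
                 counter=6615241):
        # The five xorshift words must not all be zero.
        self.x, self.y, self.z, self.w, self.v = state
        self.d = counter

    def next_u32(self):
        """Return the next 32-bit unsigned integer."""
        t = self.x ^ (self.x >> 2)
        # Shift the state words down by one position.
        self.x, self.y, self.z, self.w = self.y, self.z, self.w, self.v
        self.v = (self.v ^ (self.v << 4)) & MASK32
        self.v ^= (t ^ (t << 1)) & MASK32
        # Weyl sequence: add an odd constant modulo 2^32.
        self.d = (self.d + 362437) & MASK32
        return (self.d + self.v) & MASK32

    def next_float(self):
        """Return a float in [0, 1) built from one 32-bit draw."""
        return self.next_u32() / 2.0 ** 32
```

Each draw needs only a handful of shifts, XORs, and additions on a small per-generator state, which is part of what makes this family attractive when every GPU thread carries its own generator.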
Posted in Developer Resources | Tags: NVIDIA CUDA
November 21st, 2010
Abstract:
The Euler-Lagrange (EL) framework is the most widely-used strategy for solving variational optic flow methods. We present the first approach that solves the EL equations of state-of-the-art methods on sequences with 640×480 pixels in near-realtime on GPUs. This performance is achieved by combining two ideas: (i) We extend the recently proposed Fast Explicit Diffusion (FED) scheme to optic flow, and additionally embed it into a coarse-to-fine strategy. (ii) We parallelise our complete algorithm on a GPU, where a careful optimisation of global memory operations and an efficient use of on-chip memory guarantee a good performance. Applying our approach to the variational ‘Complementary Optic Flow’ method (Zimmer et al. (2009)), we obtain highly accurate flow fields in less than a second. This currently constitutes the fastest method in the top 10 of the widely used Middlebury benchmark.
(Pascal Gwosdek, Henning Zimmer, Sven Grewenig, Andrés Bruhn and Joachim Weickert: “A Highly Efficient GPU Implementation for Variational Optic Flow Based on the Euler-Lagrange Framework”, Proceedings of the ECCV Workshop for Computer Vision with GPUs, Sep 2010.) [Project webpage with PDF, sources and additional information]
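As background on the Fast Explicit Diffusion (FED) scheme mentioned in the abstract, the sketch below shows the idea in its simplest setting, 1-D linear diffusion. It is not the authors' optic flow solver. The step-size formula is the one given in the FED literature, stated here from memory rather than from this abstract, and the function names and reflecting boundary treatment are assumptions. A cycle of n explicit steps uses varying step sizes, some of which exceed the usual stability limit `tau_max`, yet the cycle as a whole remains stable.

```python
import math

def fed_step_sizes(n, tau_max):
    """Step sizes of one FED cycle of length n (tau_max = explicit stability limit)."""
    return [
        tau_max / (2.0 * math.cos(math.pi * (2 * i + 1) / (4 * n + 2)) ** 2)
        for i in range(n)
    ]

def fed_cycle(u, n, tau_max=0.5):
    """Apply one FED cycle of 1-D linear diffusion (grid spacing 1) to the list u."""
    for tau in fed_step_sizes(n, tau_max):
        padded = [u[0]] + u + [u[-1]]  # reflecting boundaries
        u = [
            padded[j + 1] + tau * (padded[j] - 2.0 * padded[j + 1] + padded[j + 2])
            for j in range(len(u))
        ]
    return u
```

The n steps of a cycle cover a total diffusion time of `tau_max * (n * n + n) / 3`, which grows quadratically in n, whereas n steps at the stability limit would cover only `n * tau_max`. In this 1-D setting with `tau_max = 0.5`, one cycle acts like a box filter of width 2n + 1.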
Posted in Research | Tags: Computer Vision, NVIDIA CUDA, Papers