A new major release of rCUDA™ (Remote CUDA), the open-source package that allows CUDA calls to be executed on remote GPUs, is now available. The major improvements in the new version are:
- Updated API to CUDA 3.1
- Server now uses Runtime API when possible (CUDA >= 3.1 required)
- Introduced support for the most common CUBLAS routines
- Fixed some bugs
- Added AF_UNIX sockets support to enhance performance on local executions
- Added some load balancing capabilities to the server
- General performance improvements
- Officially added Fermi support
Further information is available from the rCUDA™ webpages http://www.gap.upv.es/rCUDA and http://www.hpca.uji.es/rCUDA.
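rCUDA works by forwarding API calls from the client to a server process that owns the GPU, which is why a local AF_UNIX transport can cut overhead when client and server share a machine. The toy sketch below illustrates the call-forwarding idea in Python; the wire format, function names, and the `saxpy` stand-in are illustrative only, not rCUDA's actual protocol.

```python
import json
import os
import socket
import tempfile
import threading

SOCK_PATH = os.path.join(tempfile.mkdtemp(), "rpc.sock")

# Stand-in for a GPU-side routine the real server would dispatch
# through the CUDA Runtime API.
def saxpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

def serve_one(srv):
    # Accept a single connection, decode one request, reply, and exit.
    conn, _ = srv.accept()
    with conn:
        req = json.loads(conn.makefile("r").readline())
        res = saxpy(req["a"], req["x"], req["y"])
        conn.sendall((json.dumps(res) + "\n").encode())

srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(SOCK_PATH)
srv.listen(1)
t = threading.Thread(target=serve_one, args=(srv,))
t.start()

# Client side: marshal the call, ship it over the local AF_UNIX
# socket, and read back the result.
cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cli.connect(SOCK_PATH)
cli.sendall((json.dumps({"a": 2.0, "x": [1.0, 2.0], "y": [10.0, 20.0]}) + "\n").encode())
result = json.loads(cli.makefile("r").readline())
cli.close()
t.join()
print(result)  # [12.0, 24.0]
```

A local-socket transport like this avoids the TCP/IP stack entirely, which is the motivation behind the AF_UNIX support listed above.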
For workloads with abundant parallelism, GPUs deliver higher peak computational throughput than latency-oriented CPUs. Key insights of this article:
- Throughput-oriented processors tackle problems where parallelism is abundant, yielding design decisions different from more traditional latency-oriented processors.
- Due to their design, programming throughput-oriented processors requires much more emphasis on parallelism and scalability than programming sequential processors.
- GPUs are the leading exemplars of modern throughput-oriented architecture, providing a ubiquitous commodity platform for exploring throughput-oriented programming.
(Michael Garland and David B. Kirk, “Understanding throughput-oriented architectures”, Communications of the ACM 53(11), 58-66, Nov. 2010. [DOI])
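The article's contrast can be made concrete with the canonical SAXPY operation: every output element is independent, so a throughput-oriented processor can assign one thread per element, while a latency-oriented CPU iterates. A plain-Python sketch of the data-parallel formulation (the sequential list comprehension stands in for a parallel launch):

```python
# SAXPY: y[i] = a * x[i] + y[i].  Each element depends on no other,
# which is exactly the "abundant parallelism" the article describes.
def saxpy_element(i, a, x, y):
    return a * x[i] + y[i]   # body of one hypothetical GPU thread

def saxpy(a, x, y):
    # Sequential stand-in for a parallel launch: map the per-element
    # kernel over all indices.  On a GPU these run concurrently.
    return [saxpy_element(i, a, x, y) for i in range(len(x))]

print(saxpy(2.0, [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # [2.5, 4.5, 6.5]
```

Because there is no loop-carried dependence, the formulation scales to as many threads as the hardware offers, which is the scalability emphasis the article highlights.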
Papers are solicited for the 2011 Symposium on Application Accelerators in High-Performance Computing. Presentations from technology developers and the academic user community are invited on the following topics:
- novel accelerator processors, systems, and architectures
- integration of accelerators with high-performance computing systems
- programming models for accelerator-based computing
- languages and compilers for accelerator-based computing
- run-time environments, profiling and debugging tools for accelerator-based computing
- scientific and engineering applications that use application accelerators
In addition to the general session, submissions are invited for the following domain-specific topics:
- Computational chemistry on accelerators (Chair: TBD)
- Lattice QCD (Chair: Steven Gottlieb, Indiana University, Bloomington)
- Weather and climate modeling (Chair: John Michalakes, National Renewable Energy Laboratory)
- Bioinformatics (Chair: TBD)
Submissions are due May 6, 2011, and more information can be found at the symposium website www.saahpc.org.
CUDA 3.2 has been released and can be downloaded from http://developer.nvidia.com/object/cuda_3_2_downloads.html. New features include:
New and Improved CUDA Libraries
- CUBLAS performance improved 50% to 300% on Fermi architecture GPUs, for matrix multiplication of all datatypes and transpose variations
- CUFFT performance tuned for radix-3, -5, and -7 transform sizes on Fermi architecture GPUs, now 2x to 10x faster than MKL
- New CUSPARSE library of GPU-accelerated sparse matrix routines for sparse/sparse and dense/sparse operations delivers 5x to 30x faster performance than MKL
- New CURAND library of GPU-accelerated random number generation (RNG) routines, supporting Sobol quasi-random and XORWOW pseudo-random routines at 10x to 20x faster than similar routines in MKL
- H.264 encode/decode libraries now included in the CUDA Toolkit
CUDA Driver & CUDA C Runtime
- Support for new 6GB Quadro and Tesla products
- New support for enabling high-performance Tesla Compute Cluster (TCC) mode on Tesla GPUs in Windows desktop workstations
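CURAND's XORWOW pseudo-random generator is based on Marsaglia's xorwow algorithm, whose tiny per-thread state (five 32-bit xorshift words plus a Weyl counter) is what lets every GPU thread advance its own stream independently. A plain-Python sketch of one xorwow step follows; the seed constants are Marsaglia's published defaults, not necessarily CURAND's own seeding scheme.

```python
class Xorwow:
    """Marsaglia's xorwow: five 32-bit xorshift words plus a Weyl counter."""
    MASK = 0xFFFFFFFF  # keep arithmetic in 32 bits

    def __init__(self, x=123456789, y=362436069, z=521288629,
                 w=88675123, v=5783321, d=6615241):
        self.x, self.y, self.z, self.w, self.v, self.d = x, y, z, w, v, d

    def next_u32(self):
        t = (self.x ^ (self.x >> 2)) & self.MASK
        # Shift the xorshift pipeline along by one word.
        self.x, self.y, self.z, self.w = self.y, self.z, self.w, self.v
        self.v = ((self.v ^ ((self.v << 4) & self.MASK))
                  ^ (t ^ ((t << 1) & self.MASK))) & self.MASK
        # Weyl sequence: a simple counter added to break up xorshift artifacts.
        self.d = (self.d + 362437) & self.MASK
        return (self.d + self.v) & self.MASK

rng = Xorwow()
samples = [rng.next_u32() for _ in range(4)]
print(samples)  # four deterministic 32-bit values
```

The generator is fully deterministic given its seed state, so two instances seeded identically produce identical streams, which is what makes per-thread substreams reproducible.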
From a recent announcement:
We are excited to announce the immediate availability of Cluster GPU Instances for Amazon EC2, a new instance type designed to deliver the power of GPU processing in the cloud. GPUs are increasingly being used to accelerate the performance of many general purpose computing problems. However, for many organizations, GPU processing has been out of reach due to the unique infrastructural challenges and high cost of the technology. Amazon Cluster GPU Instances remove this barrier by providing developers and businesses immediate access to the highly tuned compute performance of GPUs with no upfront investment or long-term commitment.
Learn more about the new Cluster GPU instances for Amazon EC2 and their use in running HPC applications.
Also, community support is becoming available; see for instance this blog post about SCG-Ruby on EC2 instances.
We are pleased to announce High-Performance Graphics 2011. High Performance Graphics is the leading international forum for performance-oriented graphics systems research including innovative algorithms, efficient implementations, and hardware architecture. The conference brings together researchers, engineers, and architects to discuss the complex interactions of massively parallel hardware, novel programming models, efficient graphics algorithms, and innovative applications. High Performance Graphics was founded in 2009 to synthesize and expand on two important and well-respected conferences in computer graphics:
- Graphics Hardware: an annual conference focusing on graphics hardware, architecture, and systems since 1986; and
- Interactive Ray Tracing: an innovative symposium begun in 2006 focusing on the emerging field of interactive ray tracing and global illumination techniques.
By combining and expanding these two communities, we bring to authors and attendees the best of both fields and a conference covering a broad range of interactive 3D graphics systems and algorithm research.
Sponsored by ACM SIGGRAPH and Eurographics (pending)
The program features three days of paper and industry presentations, with ample time for discussions during breaks, lunches, and the conference banquet.
The conference, which will take place on August 5-7, is co-located with ACM SIGGRAPH 2011 in Vancouver, Canada.
The conference website is located at http://www.highperformancegraphics.org/
We invite original and innovative performance-oriented contributions from all areas of graphics, including hardware architectures, rendering, physics, animation, AI, simulation, and data structures.
The Euler-Lagrange (EL) framework is the most widely used strategy for solving variational optic flow methods. We present the first approach that solves the EL equations of state-of-the-art methods on sequences of 640×480 pixels in near real-time on GPUs. This performance is achieved by combining two ideas: (i) we extend the recently proposed Fast Explicit Diffusion (FED) scheme to optic flow, and additionally embed it into a coarse-to-fine strategy; (ii) we parallelise our complete algorithm on a GPU, where a careful optimisation of global memory operations and an efficient use of on-chip memory guarantee good performance. Applying our approach to the variational ‘Complementary Optic Flow’ method (Zimmer et al. 2009), we obtain highly accurate flow fields in less than a second. This currently constitutes the fastest method in the top 10 of the widely used Middlebury benchmark.
(Pascal Gwosdek, Henning Zimmer, Sven Grewenig, Andrés Bruhn and Joachim Weickert: “A Highly Efficient GPU Implementation for Variational Optic Flow Based on the Euler-Lagrange Framework”, Proceedings of the ECCV Workshop for Computer Vision with GPUs, Sep 2010.) [Project webpage with PDF, sources and additional information]
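The FED idea underlying the paper is to run cycles of explicit diffusion steps whose varying step sizes, taken together, cover a long diffusion time while each cycle as a whole remains stable. A small sketch of the step-size computation, assuming the published FED formula of Grewenig, Weickert, and Bruhn (2010); the choice of `n` and `tau_max` below is illustrative:

```python
import math

def fed_steps(n, tau_max):
    """Step sizes of one FED cycle with n explicit steps.

    Individual steps may exceed the explicit stability limit tau_max,
    but the full cycle is stable and advances the diffusion time by
    tau_max * (n**2 + n) / 3, i.e. quadratically in n rather than
    linearly as with n uniform stable steps.
    """
    return [tau_max / (2.0 * math.cos(math.pi * (2 * i + 1) / (4 * n + 2)) ** 2)
            for i in range(n)]

taus = fed_steps(10, 0.25)
print(sum(taus), 0.25 * (10**2 + 10) / 3)  # both ~9.1667
```

This quadratic growth in covered diffusion time per cycle is what makes FED attractive for the coarse-to-fine optic flow solver described above.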
The application period for the NVIDIA Graduate Fellowship Program is now open. We are currently accepting applications for the 2011-2012 academic year. The deadline to apply is 11:59PM PST on February 3, 2011.
NVIDIA has long believed that investing in university talent is beneficial to the industry and key to our continued growth and success. The NVIDIA Graduate Fellowship Program provides funding to Ph.D. students who are researching topics that will lead to major advances in the graphics and high-performance computing industries, and are investigating innovative ways of leveraging the power of the GPU. We select students each year who have the talent, aptitude and initiative to work closely with us early in their careers. Recipients not only receive crucial funding for their research, but are able to conduct groundbreaking work with access to NVIDIA products, technology and some of the most talented minds in the field.
For complete details including application instructions, requirements, benefits, and eligibility, visit the NVIDIA Graduate Fellowship website.
The latest TOP500 list of the world’s fastest supercomputers, released November 15th, demonstrates that GPUs are being adopted on a large scale in the HPC space. Three of the top five machines (#1 and #3 in China, and #4 in Japan) feature NVIDIA Tesla GPUs. The list also confirms the expected result that the new GPU-based Tianhe-1a machine from China has ousted Jaguar from the top spot.
More details at top500.org.
From a press release:
AUSTIN, Texas — Financial institutions are turning to graphics processing unit (GPU) computing for real economic and performance benefits. Fast and accurate derivatives pricing model development and accelerated execution speeds are crucial for today’s derivatives marketplace. SciComp Inc. has enhanced SciFinance®, its flagship derivatives pricing software, to help quantitative developers further shorten Monte Carlo derivatives pricing model development time and create models with faster execution speeds. SciFinance® now features support for NVIDIA® Tesla™ 20-series GPUs and CUDA™ 3.0.
“The mathematical problems of pricing derivatives are tailor-made for GPU computing, and Monte Carlo simulations enjoy some of the fastest speed-ups on GPUs: from 50 to over 300 times faster compared to serial code,” said Curt Randall, executive vice president of SciComp. “This execution speed increase makes it feasible to replace grid solutions (CPUs and interconnects) with a GPU system. GPU costs are a tiny percentage of the cost of a grid solution and offer radical reductions in both footprint and power consumption.”
SciFinance takes advantage of new GPU hardware and software from NVIDIA.
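The speed-ups quoted above come from the embarrassingly parallel structure of Monte Carlo pricing: every simulated path is independent, so on a GPU each thread can own a path. A CPU reference sketch in Python pricing a European call under geometric Brownian motion, checked against the Black-Scholes closed form; the parameters are illustrative, not from SciFinance:

```python
import math
import random

def mc_european_call(s0, k, r, sigma, t, n_paths, seed=42):
    """Average discounted payoff over independent GBM terminal values.
    Paths are independent, so a GPU maps one thread to one path."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - k, 0.0)
    return math.exp(-r * t) * total / n_paths

def bs_call(s0, k, r, sigma, t):
    """Black-Scholes closed form, used here only as a correctness check."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * phi(d1) - k * math.exp(-r * t) * phi(d2)

mc = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 200_000)
exact = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(mc, exact)  # MC estimate close to the ~10.45 closed-form value
```

Because the per-path loop has no dependencies, the same computation parallelises directly across GPU threads, which is where the quoted 50x-300x figures for Monte Carlo workloads come from.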