CUDA 4.1 Released

January 26th, 2012

Today NVIDIA released CUDA 4.1, including a new CUDA Toolkit, SDK, Visual Profiler, Parallel Nsight IDE and NVIDIA device driver.

CUDA 4.1 makes it easier to accelerate scientific research with GPUs. Key features include:

  • a redesigned Visual Profiler with automated performance analysis and expert guidance;
  • a new LLVM-based compiler that generates up to 10% faster code; and
  • 1000+ new imaging and signal processing functions in the NPP library.

The CUSPARSE library included with CUDA 4.1 adds a new tridiagonal solver and 2x faster sparse matrix-vector multiplication using the ELL hybrid (HYB) format, and the CURAND library included with CUDA 4.1 adds two new random number generators.
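For readers unfamiliar with the ELL layout mentioned above, here is a minimal pure-Python sketch of ELLPACK storage and the matrix-vector product it enables. It is not the CUSPARSE API (which is a C library); the names `EllMatrix`, `to_ell` and `ell_spmv` are hypothetical. The hybrid (HYB) format additionally stores the entries of unusually long rows in a separate COO part, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class EllMatrix:
    """Sparse matrix in ELLPACK (ELL) form (hypothetical illustration).

    Every row is padded to the same number of slots. Arrays are stored
    column-major: slot k of row i lives at index k * rows + i, the layout a
    GPU kernel with one thread per row can read with coalesced accesses.
    A column index of -1 marks a padding slot.
    """
    rows: int
    slots_per_row: int
    col_idx: list[int]
    values: list[float]


def to_ell(dense: list[list[float]]) -> EllMatrix:
    """Convert a small dense matrix to ELL form (for illustration only)."""
    rows = len(dense)
    nz_rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    width = max((len(r) for r in nz_rows), default=0)
    col_idx = [-1] * (rows * width)
    values = [0.0] * (rows * width)
    for i, entries in enumerate(nz_rows):
        for k, (j, v) in enumerate(entries):
            col_idx[k * rows + i] = j
            values[k * rows + i] = v
    return EllMatrix(rows, width, col_idx, values)


def ell_spmv(a: EllMatrix, x: list[float]) -> list[float]:
    """Compute y = A x; on a GPU each outer iteration would be one thread."""
    y = [0.0] * a.rows
    for i in range(a.rows):
        total = 0.0
        for k in range(a.slots_per_row):
            pos = k * a.rows + i
            j = a.col_idx[pos]
            if j >= 0:  # skip padding slots
                total += a.values[pos] * x[j]
        y[i] = total
    return y
```

For example, `ell_spmv(to_ell([[4, 1, 0], [0, 3, 0], [2, 0, 5]]), [1, 2, 3])` returns `[6.0, 6.0, 17.0]`. The padding is what makes ELL fast on GPUs when row lengths are similar and wasteful when they are not, which is why the hybrid format moves the overflow elsewhere.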

NVIDIA Parallel Nsight Now Shipping

July 21st, 2010

NVIDIA today announced the release of NVIDIA Parallel Nsight software, the industry’s first development environment for GPU-accelerated applications that works with Microsoft Visual Studio. “By adding functionality specifically for GPU Computing developers, Parallel Nsight makes the power of the GPU more accessible than ever before,” said Sanford Russell, GM of GPU Computing at NVIDIA. NVIDIA Parallel Nsight features a CUDA C/C++ debugger and application performance analyzer, and a graphics debugger and inspector. NVIDIA Parallel Nsight supports Windows HPC Server 2008, Windows 7 and Windows Vista. Parallel Nsight is available for download.

ATI Stream Profiler v1.3 Released

May 20th, 2010

Advanced Micro Devices (AMD) recently released ATI Stream Profiler version 1.3. ATI Stream Profiler is a Microsoft® Visual Studio® integrated runtime profiler that gathers performance data from the GPU as your OpenCL™ application runs. This information can then be used by developers to discover where the bottlenecks are in their OpenCL™ application and find ways to optimize their application’s performance.

Features of the tool include:

  • Measure the execution time of an OpenCL kernel
  • Query the hardware performance counters on ATI Radeon graphics cards
  • Display the memory traffic to and from the GPU
  • Compare multiple runs (sessions) of the same or different programs
  • Store the profile data for each run in a CSV file
  • Display the IL and ISA (hardware disassembly) code of the OpenCL kernel

Modeling GPU-CPU Workloads and Systems

March 20th, 2010

Abstract:

Heterogeneous systems, systems with multiple processors tailored for specialized tasks, are challenging programming environments. While it may be possible for domain experts to optimize a high performance application for a very specific and well documented system, it may not perform as well or even function on a different system. Developers who have less experience with either the application domain or the system architecture may devote a significant effort to writing a program that merely functions correctly. We believe that a comprehensive analysis and modeling framework is necessary to ease application development and automate program optimization on heterogeneous platforms.

This paper reports on an empirical evaluation of 25 CUDA applications on four GPUs and three CPUs, leveraging the Ocelot dynamic compiler infrastructure which can execute and instrument the same CUDA applications on either target. Using a combination of instrumentation and statistical analysis, we record 37 different metrics for each application and use them to derive relationships between program behavior and performance on heterogeneous processors. These relationships are then fed into a modeling framework that attempts to predict the performance of similar classes of applications on different processors. Most significantly, this study identifies several non-intuitive relationships between program characteristics and demonstrates that it is possible to accurately model CUDA kernel performance using only metrics that are available before a kernel is executed.

(Andrew Kerr, Gregory Diamos and Sudhakar Yalamanchili: “Modeling GPU-CPU Workloads and Systems”. Proceedings of the Third Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3), Pittsburgh, PA. Apr. 2010.)
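The idea of predicting kernel performance from metrics known before launch can be illustrated with a deliberately tiny model: an ordinary least-squares line relating one pre-execution metric to measured run time. This is a hypothetical sketch, not the modeling framework from the paper, which draws on many metrics; `fit_line` and `predict` are illustrative names.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of ys ~ slope * xs + intercept.

    xs: one pre-execution metric per profiled kernel (hypothetical choice).
    ys: the measured run time of each of those kernels.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("metric values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope: float, intercept: float, metric: float) -> float:
    """Estimate the run time of a kernel whose pre-execution metric is `metric`."""
    return slope * metric + intercept
```

Given metric values and timings from earlier runs, `fit_line(metrics, times)` followed by `predict(...)` yields an estimate for an unseen kernel; a realistic model would combine several metrics and be validated on applications it was not fitted to.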

NVIDIA Introduces Nexus Integrated GPU/CPU Development Environment for Microsoft Visual Studio

October 4th, 2009

From the press release:

NVIDIA Corp. today introduced NVIDIA® Nexus, the industry’s first development environment for massively parallel computing that is integrated into Microsoft Visual Studio, the world’s most popular development environment for Windows-based solutions and Web applications and services.

“NVIDIA Nexus is going to improve programmer productivity immediately,” said Tarek El Dokor at Edge 3 Technologies. “An integrated GPU and CPU development solution is something Edge 3 has needed for a long time. The fact that it’s integrated into the Visual Studio development environment drastically reduces the learning curve.”

NVIDIA Nexus radically improves productivity by enabling developers of GPU computing applications to use the popular Microsoft Visual Studio-based tools and workflow in a transparent manner, without having to create a separate version of the application that incorporates diagnostic software calls. NVIDIA Nexus also includes the ability to run the code remotely on a different computer. Nexus includes advanced tools for simultaneously analyzing efficiency, performance, and speed of both the graphics processing unit (GPU) and central processing unit (CPU) to give developers immediate insight into how co-processing affects their applications.

Nexus is composed of three components: a debugger, a performance analyzer and a graphics inspector.


NVIDIA Releases First OpenCL GPU Performance Profiler and Best Practices Guide

September 9th, 2009

The OpenCL Visual Profiler is now available to all NVIDIA GPU Computing Registered developers, and will be included in the next public release of the CUDA Toolkit. Professional developers and researchers are invited to apply for the GPU Computing Registered Developer program.

The OpenCL Best Practices Guide is already publicly available on CUDA Zone.

Details from the press release:

Leveraging the extensive performance instrumentation in NVIDIA’s OpenCL drivers and hardware performance signals designed into NVIDIA GPUs, the OpenCL Visual Profiler provides developers with insight into performance bottlenecks and opportunities for optimization.

Key features include:

  • Profiling of actual hardware signals, kernel efficiency, and instruction issue rate
  • Timing of memory copies between system memory and GPU dedicated memory
  • Customizable graphs to help developers focus in on problem areas
  • Basic auto-analysis to reveal warp serialization problems
  • Easy import/export to CSV for custom analysis

NVIDIA has also prepared a helpful OpenCL Best Practices Guide designed to help OpenCL developers programming for the CUDA architecture implement high performance parallel algorithms and understand best practices for GPU Computing.
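Warp serialization of the kind the auto-analysis reports is commonly caused by shared-memory bank conflicts. The following Python sketch models them under simplifying assumptions: 32-bit words, 16 banks serving a half-warp (as on compute capability 1.x GPUs; Fermi-class GPUs use 32 banks), and reads of the same word broadcast to every requesting thread. The function names are hypothetical; the profiler itself reads hardware counters rather than using a model like this.

```python
from collections import defaultdict


def bank_conflict_degree(word_addresses: list[int], num_banks: int = 16) -> int:
    """Return how many serialized passes one shared-memory request needs.

    Simplified model: word w lives in bank w % num_banks, and threads that
    read the *same* word are served together (broadcast). The degree is the
    largest number of distinct words that any single bank must deliver.
    """
    words_per_bank: dict[int, set[int]] = defaultdict(set)
    for w in word_addresses:
        words_per_bank[w % num_banks].add(w)
    return max((len(ws) for ws in words_per_bank.values()), default=0)


def strided_access(threads: int, stride: int) -> list[int]:
    """Word index read by each thread for the pattern shared[tid * stride]."""
    return [tid * stride for tid in range(threads)]
```

With 16 threads, a stride of 1 or any odd stride gives a degree of 1 (conflict-free), a stride of 2 gives 2, and a stride of 16 serializes the request into 16 passes. Padding a shared array by one word per row is the usual way to turn the bad strides into odd ones.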


gDEBugger for Apple Mac OS X – Beta Program

January 22nd, 2009

Graphic Remedy is proud to announce the upcoming release of gDEBugger for Mac OS X. This new product brings all of gDEBugger’s debugging and profiling abilities to the Mac OpenGL developers’ world. gDEBugger Mac will help OS X OpenGL developers optimize their application performance: find graphics pipeline bottlenecks, reduce graphics memory consumption, locate and remove redundant OpenGL calls and graphics memory leaks, and much more. Visit the gDEBugger Mac home page to join the Beta Program, see screenshots and get more details.

gDEBugger, an OpenGL and OpenGL ES debugger and profiler, traces application activity on top of the OpenGL API, and lets programmers see what is happening within the graphics system implementation to find bugs and optimize OpenGL application performance. gDEBugger runs on Windows, Linux and Mac OS X operating systems.
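To make “redundant OpenGL calls” concrete, this sketch scans a recorded call trace and flags state-setting calls that set a value the state already holds. The trace format (a list of `(function name, arguments)` tuples) and all helper names are hypothetical, only three kinds of state are tracked, and texture units selected with `glActiveTexture` are ignored; it is not gDEBugger’s implementation.

```python
# Hypothetical trace entry: (OpenGL function name, tuple of arguments).
Call = tuple[str, tuple]


def _state_effect(call: Call):
    """Map a tracked state-setting call to (state key, new value), else None."""
    name, args = call
    if name == "glBindTexture":  # args: (target, texture); texture units ignored
        return ("texture_binding", args[0]), args[1]
    if name in ("glEnable", "glDisable"):  # args: (capability,)
        return ("capability", args[0]), name == "glEnable"
    if name == "glUseProgram":  # args: (program,)
        return ("program",), args[0]
    return None  # draw calls and untracked state


def find_redundant_calls(trace: list[Call]) -> list[int]:
    """Return indices of calls that set a state to the value it already holds.

    State is unknown until first set in the trace, so the first call for
    each key is never flagged.
    """
    current: dict = {}
    redundant: list[int] = []
    for i, call in enumerate(trace):
        effect = _state_effect(call)
        if effect is None:
            continue
        key, value = effect
        if key in current and current[key] == value:
            redundant.append(i)
        else:
            current[key] = value
    return redundant
```

For a trace that binds the same texture twice in a row, the second `glBindTexture` index is reported; removing such calls saves driver overhead without changing what is rendered.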

gDEBugger LINUX – Public Beta Available!

September 4th, 2007

gDEBugger is an OpenGL debugger and profiler. It provides the application behavior information a developer needs to find bugs and to optimize application performance. gDEBugger Linux brings all of gDEBugger’s debugging and profiling abilities to the Linux OpenGL developers’ world. gDEBugger Linux is now available as a final beta version. This version includes all of gDEBugger’s features and supports the Linux i386 and x86_64 architectures. The official version of gDEBugger Linux will be released shortly after Graphic Remedy receives feedback from the field and fixes any reported issues. (http://www.gremedy.com/gDEBuggerLinux.php)

gDEBugger V3.0 Supports OpenGL 2.1 and adds ATI Hardware Performance Counters Integration

November 7th, 2006

Graphic Remedy is proud to announce the release of gDEBugger Version 3.0. This new major version supports the OpenGL 2.1 standard and integrates ATI Hardware Performance Counters (Percentage Hardware busy, Transform Clip Lighting unit busy, etc.). These counters are displayed in the Performance Graph and Performance Dashboard Views. V3.0 also adds the option of Floating Licenses with a dedicated License Server. The new version can be downloaded from http://www.gremedy.com/download.php.

Free gDEBugger License for Academic Users

October 4th, 2006

The OpenGL ARB and Graphic Remedy have crafted an Academic Program to make the full-featured gDEBugger OpenGL debug toolkit available for use in your daily work and research, free of charge! gDEBugger is a powerful OpenGL and OpenGL ES debugger and profiler delivering one of the most intuitive OpenGL development toolkits available for graphics application developers. The ARB-Graphic Remedy Academic Program will run for one year, during which time any OpenGL developer who is able to confirm they are in academia will receive an Academic gDEBugger License from Graphic Remedy at no cost. This license will be valid for one year and will include all gDEBugger software updates as they become available. Academic licensees may also optionally purchase an annual support contract for the software at a reduced rate. For further information, visit:
http://academic.gremedy.com and
http://www.opengl.org/pipeline/article/vol001_3/
