AMD CodeXL: comprehensive developer tool suite for heterogeneous compute

October 9th, 2012

AMD CodeXL is a new unified developer tool suite that enables developers to harness the benefits of CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL™ kernel analysis capabilities, making heterogeneous computing more accessible to software developers. AMD CodeXL is available for free, both as a Visual Studio® extension and as a standalone user interface application for Windows® and Linux®.

AMD CodeXL increases developer productivity by helping developers identify programming errors and performance issues in their applications quickly and easily. Developers can now debug, profile and analyze their applications with a full system-wide view on AMD APUs, GPUs and CPUs.

The AMD CodeXL user group (requires registration) lets users interact with the CodeXL team, provide feedback, get support and participate in beta surveys.

Panoptes: A Binary Translation Framework for CUDA

May 22nd, 2012

Traditional CPU-based computing environments offer a variety of binary instrumentation frameworks. Instrumentation and analysis tools for GPU environments to date have been more limited. Panoptes is a binary instrumentation framework for CUDA that targets the GPU. By exploiting the GPU to run modified kernels, computationally-intensive programs can be run at the native parallelism of the device during analysis. To demonstrate its instrumentation capabilities, we currently implement a memory addressability and validity checker that targets CUDA programs.

Panoptes traces targeted programs by library interposition at runtime.

CUDA 4.1 Released

January 26th, 2012

Today NVIDIA released CUDA 4.1, including a new CUDA Toolkit, SDK, Visual Profiler, Parallel Nsight IDE and NVIDIA device driver.

CUDA 4.1 makes it easier to accelerate scientific research with GPUs. Key features include:

  • a redesigned Visual Profiler with automated performance analysis and expert guidance;
  • a new LLVM-based compiler that generates up to 10% faster code; and
  • 1000+ new imaging and signal processing functions in the NPP library.

The CUSPARSE library included with CUDA 4.1 has a new tridiagonal solver and 2x faster sparse matrix-vector multiplication using the ELL hybrid format, and the CURAND library has two new random number generators.
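To make the ELL storage idea concrete, here is a minimal CPU reference sketch in plain Python. It is not CUSPARSE code and says nothing about how the library implements its solver; the function names `dense_to_ell` and `ell_spmv` and the `-1` padding marker are assumptions chosen for illustration. ELL stores the same number of entries for every row, padding short rows, which gives the regular access pattern that suits GPUs.

```python
def dense_to_ell(dense, pad_col=-1):
    """Convert a dense matrix (list of rows) to ELL column/value arrays.

    Every row is padded to the length of the longest row; padded slots
    carry the column index pad_col and the value 0.0.
    """
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    width = max((len(r) for r in rows), default=0)
    cols = [[j for j, _ in r] + [pad_col] * (width - len(r)) for r in rows]
    vals = [[float(v) for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    return cols, vals, width


def ell_spmv(cols, vals, x):
    """Compute y = A * x for a matrix A stored in ELL form."""
    y = []
    for row_cols, row_vals in zip(cols, vals):
        acc = 0.0
        for j, v in zip(row_cols, row_vals):
            if j >= 0:  # skip padded slots
                acc += v * x[j]
        y.append(acc)
    return y


A = [[4, 0, 0],
     [0, 5, 6],
     [7, 0, 8]]
cols, vals, width = dense_to_ell(A)
y = ell_spmv(cols, vals, [1.0, 2.0, 3.0])
```

On a GPU, each row of the outer loop would typically be handled by its own thread; the sequential loop here only shows the data layout.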

GPU-Ocelot 2.0 Released

February 8th, 2011

Ocelot 2.0.969 brings CUDA 3.2 and Fermi support to a stable release. Ocelot is a BSD-licensed open source implementation of the CUDA runtime, a PTX emulator, and a mid-level PTX compiler.

Here is a feature list for 2.0.969:

  • PTX 2.2 and Fermi device support: Floating point results should be within the ULP limits in the PTX ISA manual. Over 500 unit tests verify that the behaviour matches NVIDIA devices.
  • Four target device types: A functional PTX emulator. A PTX to LLVM to x86/ARM JIT. A PTX to CAL JIT for AMD devices (beta). A PTX to PTX JIT for NVIDIA devices.
  • A full-featured PTX 2.2 IR: An analysis/optimization pass interface over PTX (Control flow graph, dataflow graph, dominator/postdominator trees, structured control tree). Optimizations can be plugged in as modules.
  • Correctness checking tools: A memory checker (detects unaligned and out of bounds accesses). A race detector. An interactive debugger (allows stepping through PTX instructions).
  • An instruction trace analyzer interface: Allows user-defined modules to receive callbacks when PTX instructions are executed. Can be used to compute metrics over applications or perform correctness checks.
  • A CUDA API frontend: Existing CUDA programs can be directly linked against Ocelot. Device pointers can be shared across host threads. Multiple devices can be controlled from the same host thread (cudaSetDevice can be called multiple times).
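The kind of check a memory checker performs (unaligned and out-of-bounds accesses) can be sketched in a few lines. This is an illustrative sketch only, not Ocelot's implementation or interface: the function name `check_access`, the `(base, length)` allocation table and the returned strings are all hypothetical, and it assumes an access must be aligned to its own size.

```python
def check_access(allocations, address, size):
    """Classify a memory access of `size` bytes at `address`.

    allocations is a list of (base, length) tuples describing the
    regions that are currently valid (hypothetical bookkeeping).
    """
    # Assumption: an access must be naturally aligned to its size.
    if address % size != 0:
        return "unaligned"
    # The whole access must fall inside a single allocation.
    for base, length in allocations:
        if base <= address and address + size <= base + length:
            return "ok"
    return "out of bounds"


allocations = [(0x1000, 64)]  # one 64-byte buffer
results = [
    check_access(allocations, 0x1000, 4),  # first word of the buffer
    check_access(allocations, 0x1002, 4),  # not 4-byte aligned
    check_access(allocations, 0x1040, 4),  # one past the end
]
```

An emulator can run a check of this shape on every load and store because it sees each address before the access happens.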

Ocelot is available under a BSD license at http://code.google.com/p/gpuocelot.

    NVIDIA Parallel Nsight Now Shipping

    July 21st, 2010

    NVIDIA today announced the release of NVIDIA Parallel Nsight software, the industry’s first development environment for GPU-accelerated applications that works with Microsoft Visual Studio. “By adding functionality specifically for GPU Computing developers, Parallel Nsight makes the power of the GPU more accessible than ever before,” said Sanford Russell, GM of GPU Computing at NVIDIA. NVIDIA Parallel Nsight features a CUDA C/C++ debugger and application performance analyzer, and a graphics debugger and inspector. NVIDIA Parallel Nsight supports Windows HPC Server 2008, Windows 7 and Windows Vista.

    CUDA 3.0 toolkit released

    March 20th, 2010

    NVIDIA has released version 3.0 of the CUDA Toolkit, providing developers with tools to prepare for the upcoming Fermi-based GPUs. Highlights of this release include:

    • Support for the new Fermi architecture, with:
      • Native 64-bit GPU support
      • Multiple Copy Engine support
      • ECC reporting
      • Concurrent Kernel Execution
      • Fermi HW debugging support in cuda-gdb
      • Fermi HW profiling support for CUDA C and OpenCL in Visual Profiler
    • C++ Class Inheritance and Template Inheritance support for increased programmer productivity
    • A new unified interoperability API for Direct3D and OpenGL, with support for:
      • OpenGL texture interop
      • Direct3D 11 interop support
      • CUDA Driver / Runtime Buffer Interoperability, which allows applications using the CUDA Driver API to also use libraries implemented using the CUDA C Runtime such as CUFFT and CUBLAS.

    gDEBugger v5.5: AMD (ATI) GPU Performance Counters Integration

    February 21st, 2010

    Graphic Remedy is proud to announce the release of gDEBugger Version 5.5 for Windows, Linux, Mac OS X and iPhone.

    This version introduces integration with AMD GPU performance counters. AMD graphics hardware and driver performance counters are displayed in gDEBugger’s Performance Graph and Performance Dashboard views, allowing developers to optimize their applications for AMD (ATI) graphics hardware.

    AMD performance counters are available on Windows when using ATI Radeon™ HD 2000 series or newer hardware with Catalyst™ 9.12 or newer.

    This version also includes a large number of bug fixes and stability improvements.

    gDEBugger for OpenCL – Beta Program

    February 10th, 2010

    Graphic Remedy is proud to announce the upcoming release of gDEBugger for OpenCL on Windows, Mac OS X and Linux. This new product will bring gDEBugger’s advanced Debugging, Profiling and Memory Analysis abilities to the OpenCL developer’s world, helping OpenCL developers find bugs and optimize parallel computing application performance and memory consumption.

    To join the Free Beta Program, see screenshots and more details, please visit http://www.gremedy.com/gDEBuggerCL.php.

    gDEBugger CL enables OpenCL developers to:

    NVIDIA Introduces Nexus Integrated GPU/CPU Development Environment for Microsoft Visual Studio

    October 4th, 2009

    From the press release:

    NVIDIA Corp. today introduced NVIDIA® Nexus, the industry’s first development environment for massively parallel computing that is integrated into Microsoft Visual Studio, the world’s most popular development environment for Windows-based solutions and Web applications and services.

    “NVIDIA Nexus is going to improve programmer productivity immediately,” said Tarek El Dokor at Edge 3 Technologies. “An integrated GPU and CPU development solution is something Edge 3 has needed for a long time. The fact that it’s integrated into the Visual Studio development environment drastically reduces the learning curve.”

    NVIDIA Nexus radically improves productivity by enabling developers of GPU computing applications to use the popular Microsoft Visual Studio-based tools and workflow in a transparent manner, without having to create a separate version of the application that incorporates diagnostic software calls. NVIDIA Nexus also includes the ability to run the code remotely on a different computer. Nexus includes advanced tools for simultaneously analyzing efficiency, performance, and speed of both the graphics processing unit (GPU) and central processing unit (CPU) to give developers immediate insight into how co-processing affects their applications.

    Nexus is composed of three components:

    Read the rest of this entry »

    GPUocelot – A Binary Translator Framework for GPGPU

    July 30th, 2009

    Ocelot, developed at Georgia Tech, seeks to develop a set of tools that enable low-level analysis of GPGPU applications, as well as providing a JIT compiler for generic architectures. Ocelot currently provides an implementation of the NVIDIA CUDA runtime, capable of running the entire CUDA 2.2 and 2.1 SDKs.

    Ocelot features include a memory checker similar to valgrind, detection mechanisms for non-coalesced memory accesses, full device emulation, and a number of useful debugging and performance tuning features. The Roadmap lists future developments.
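As a rough illustration of what detecting non-coalesced accesses involves, the sketch below counts how many memory segments the threads of one warp touch. It is a simplification and not Ocelot's mechanism: the 128-byte segment size, the 32-thread warp, the 4-byte access size and the function names are all assumptions, and real coalescing rules depend on the GPU architecture.

```python
def segments_touched(addresses, segment_bytes=128):
    """Number of distinct memory segments covered by a set of addresses."""
    return len({addr // segment_bytes for addr in addresses})


def is_coalesced(addresses, access_bytes=4, segment_bytes=128):
    """True if the accesses touch no more segments than strictly needed."""
    total = len(addresses) * access_bytes
    needed = (total + segment_bytes - 1) // segment_bytes  # ceiling division
    return segments_touched(addresses, segment_bytes) <= needed


WARP = 32  # assumed warp size
contiguous = [0x2000 + 4 * t for t in range(WARP)]  # thread t reads word t
strided = [0x2000 + 128 * t for t in range(WARP)]   # one segment per thread

report = {
    "contiguous": (segments_touched(contiguous), is_coalesced(contiguous)),
    "strided": (segments_touched(strided), is_coalesced(strided)),
}
```

Under these assumptions the contiguous pattern is served from a single segment, while the strided pattern touches a separate segment for every thread.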

    Ocelot is available at Google Code, and a number of papers have been published.
