Amazon announces GPUs for Cloud Computing

November 22nd, 2010

From a recent announcement:

We are excited to announce the immediate availability of Cluster GPU Instances for Amazon EC2, a new instance type designed to deliver the power of GPU processing in the cloud. GPUs are increasingly being used to accelerate the performance of many general purpose computing problems. However, for many organizations, GPU processing has been out of reach due to the unique infrastructural challenges and high cost of the technology. Amazon Cluster GPU Instances remove this barrier by providing developers and businesses immediate access to the highly tuned compute performance of GPUs with no upfront investment or long-term commitment.

Learn more about the new Cluster GPU instances for Amazon EC2 and their use in running HPC applications.

Also, community support is becoming available; see, for instance, this blog post about SCG-Ruby on EC2 instances.

GPU Technology Conference Session Archive Available

November 1st, 2010

All talks from the 2010 GPU Technology Conference (as well as archived presentations from GTC 2009) are now available from NVIDIA.

For those who missed this year’s GPU Technology Conference (GTC), and those who attended but had a hard time choosing between all the concurrent sessions, NVIDIA has publicly released streamed recordings, video and slides from most GTC sessions.

There is content available for all types of programmers and developers. Those just getting started programming GPUs may want to take a look at the pre-conference tutorials, which provide an in-depth look at topics such as CUDA C, OpenCL, OpenGL and Parallel Nsight.

GPU Systems Releases MATLAB CPU-GPU Support

October 27th, 2010

From a recent press release:

GPU Systems releases Matlab language bindings for Libra SDK – heterogeneous compute platform. Libra 1.2 version with runtime compiler and environment supports x86/x64 backends, OpenGL, OpenCL and CUDA compute backends. This release brings full BLAS 1,2,3 matrix/vector, dense/sparse, real/complex, single/double math library and extended functionality to Matlab computing platform executing on x86 CPUs & GPUs from AMD and NVIDIA.

Examples:

ACUSIM Software Releases Latest Version of AcuSolve CFD Solver

October 27th, 2010
ACUSim vortex shedding


From a recent press release:

ACUSIM Software, Inc., a leader in computational fluid dynamics (CFD) technology and solutions, today announced the immediate availability of AcuSolve™ 1.8, the latest version of ACUSIM’s leading general-purpose, finite-element based CFD solver. ACUSIM will demonstrate AcuSolve 1.8 during two free webinars, taking place at 9:30 a.m. – 10:30 a.m. ET and 6:30 p.m. – 7:30 p.m. ET, on Oct. 26, 2010, at http://www.acusim.com/html/events.html.

Used by designers and research engineers with all levels of expertise, AcuSolve is highly differentiated by its accelerated speed, robustness, accuracy and multiphysics/multidisciplinary capabilities. Contributing to its robustness is the product’s Galerkin/Least-Square (GLS) finite element formulation and novel iterative linear equation solver for the fully coupled equation system. The combination of these two powerful technologies provides a highly stable and efficient solver, capable of handling unstructured meshes with tight boundary layers automatically generated from complex industrial geometries.

MATLAB Adds GPU Support

October 13th, 2010

Michael Feldman of HPCWire writes:

MATLAB users with a taste for GPU computing now have a perfect reason to move up to the latest version. Release R2010b adds native GPGPU support that allows users to harness NVIDIA graphics processors for engineering and scientific computing. The new capability is provided within the Parallel Computing Toolbox and Distributed Computing Server.

Full details of MATLAB Release R2010b are available on the MathWorks site. Information on other numerical packages accelerated using NVIDIA CUDA is available on NVIDIA’s site.

[Editor's Note: as pointed out in the comments by John Melonakos (from AccelerEyes), it may be worth checking out how MATLAB R2010b GPU support currently compares to AccelerEyes Jacket.]

Introducing the OpenCL™ Programming Webinar Series

October 12th, 2010

This webinar series is designed to help advance your OpenCL programming knowledge. Experts from AMD will cover both beginning and advanced topics, starting with the basics of parallel and heterogeneous computing and an introduction to OpenCL, then progressing to more advanced topics such as performance optimization techniques and real-world case studies.

This webinar describes how heterogeneous computing fits into the parallel computing paradigm, what problems it solves and what opportunities it presents.

A Fast GEMM Implementation on a Cypress GPU

October 12th, 2010

Abstract:

We present benchmark results of optimized dense matrix multiplication kernels for a Cypress GPU. We write general matrix multiply (GEMM) kernels for single (SP), double (DP) and double-double (DDP) precision. Our SGEMM and DGEMM kernels show 73% and 87% of the theoretical performance of the GPU, respectively. Currently, our SGEMM and DGEMM kernels are fastest with one GPU chip to our knowledge. Furthermore, the performance of our matrix multiply kernel in DDP is 31 Gflop/s. This performance in DDP is more than 200 times faster than the performance in DDP on single core of a recent CPU (with mpack version 0.6.5). We describe our GEMM kernels with main focus on the SGEMM implementation since all GEMM kernels share common programming and optimization techniques. While a conventional wisdom of GPU programming recommends us to heavily use shared memory on GPUs, we show that texture cache is very effective on the Cypress architecture.

(N. Nakasato: “A Fast GEMM Implementation on a Cypress GPU”, 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10) November 2010. A sample program is available at http://github.com/dadeba/dgemm_cypress)
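To make the quantities in the abstract concrete, here is a minimal pure-Python sketch (not the paper's GPU kernels) of three ideas it relies on: the GEMM operation itself, the conventional way a Gflop/s figure and a fraction of peak are derived from matrix dimensions and run time, and double-double arithmetic, which stores a value as an unevaluated sum of two doubles. All function names are hypothetical, and the double-double addition shown is the simple textbook variant, which may differ from what the paper or mpack uses.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM on row-major lists: alpha * (a @ b) + beta * c."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def gemm_gflops(m, n, k, seconds):
    """Rate in Gflop/s using the conventional 2*m*n*k operation count."""
    return 2.0 * m * n * k / seconds / 1e9


def fraction_of_peak(achieved_gflops, peak_gflops):
    """Efficiency figure of the kind quoted as a percentage of peak."""
    return achieved_gflops / peak_gflops


def two_sum(x, y):
    """Knuth's error-free addition: s + err equals x + y exactly."""
    s = x + y
    y_virtual = s - x
    err = (x - (s - y_virtual)) + (y - y_virtual)
    return s, err


def dd_add(a, b):
    """Add two double-double values given as (hi, lo) pairs (simple variant)."""
    s, e = two_sum(a[0], b[0])
    e += a[1] + b[1]
    return two_sum(s, e)  # renormalize so that hi carries the leading bits


# A plain double drops the small term; the double-double pair keeps it.
plain = 1.0 + 1e-20
pair = dd_add((1.0, 0.0), (1e-20, 0.0))
print(plain == 1.0, pair)
```

Each double-double operation expands into several ordinary floating-point operations, which is one reason DDP rates are much lower than DP rates on the same hardware.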

HOOMD-blue 0.9.1 release

October 12th, 2010

HOOMD-blue performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics (DPD) simulations of polymers, and crystallization of metals.

HOOMD-blue 0.9.1 adds many new features. Highlights include:

  • 10 to 50 percent faster performance over 0.9.0
  • DPD (Dissipative Particle Dynamics) capability
  • EAM (Embedded Atom Method) capability
  • Removed limitation on number of exclusions
  • Support for compute 2.1 devices (such as the GTX 460)
  • Support for CUDA 3.1
  • and more

HOOMD-blue 0.9.1 is available for download under an open source license. Check out the quick start tutorial to get started, or check out the full documentation to see everything it can do.

Thrust v1.3 release

October 7th, 2010

Thrust v1.3, an open-source template library for CUDA applications, has been released. Modeled after the C++ Standard Template Library (STL), Thrust brings a familiar abstraction layer to the realm of GPU computing.

Version 1.3 adds several new features, including:

  • a state-of-the-art sorting implementation, recently featured on Slashdot.
  • performance improvements to stream compaction and reduction
  • robust error reporting and failure detection
  • support for CUDA 3.2 and GF104-based GPUs
  • search algorithms
  • and more!

Get started with Thrust today! First download Thrust v1.3 and then follow the online quick-start guide. Refer to the online documentation for a complete list of features. Many concrete examples and a set of introductory slides are also available.
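For readers unfamiliar with the primitives named in the feature list, the sketch below shows in plain sequential Python what sorting, reduction, stream compaction and searching compute. It is only an illustration of the semantics: Thrust itself is a C++ template library that runs these operations in parallel on the GPU, and the function names here are hypothetical, not Thrust's API.

```python
import bisect
import operator
from functools import reduce


def reduce_sum(values, init=0):
    """Reduction: combine all elements into one value with a binary operator."""
    return reduce(operator.add, values, init)


def compact(values, keep):
    """Stream compaction: keep elements satisfying a predicate, preserving order."""
    return [v for v in values if keep(v)]


def lower_bound(sorted_values, x):
    """Search: index of the first element not less than x in a sorted sequence."""
    return bisect.bisect_left(sorted_values, x)


data = [5, -2, 7, 0, -9, 3]
ordered = sorted(data)                        # sorting
total = reduce_sum(data)                      # reduction
positives = compact(data, lambda v: v > 0)    # stream compaction
position = lower_bound(ordered, 3)            # searching
print(ordered, total, positives, position)
```

In an STL-style library these operations are typically expressed over iterator ranges rather than Python lists, which is the abstraction Thrust borrows from the C++ standard library.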

PyCULA: Python Bindings for CULA GPGPU LAPACK

September 30th, 2010

PyCULA is a module providing transparent PyCUDA and ctypes based Python bindings for CULAtools LAPACK by Louis Theran and Garrett Wright of Temple University. It provides support for mixing PyCUDA-style kernel code with CULA device functions and also has a complete set of ctypes wrappers for CULA.

Key Features Include:

  • Reduce Memory Leaks by using Automatic Memory Management (via PyCUDA)
  • Utilize both simple NumPy style and GPUArray manual device style interfaces.
  • Supports mixing LAPACK via CULA with your Custom Kernels.
  • Combine seamlessly with handy Python modules like SQL, gzip, SciPy, R, etc.
  • Develop, Debug, Optimize, and Get Help right at the interactive command line.

The PyCULA 0.9a4 alpha release is available at http://pypi.python.org/pypi/PyCULA/0.9a4. PyCULA was developed as part of the ASU/Temple Zeolite Project, which is supported by CDI-I grant DMR 0835586 to Igor Rivin and M. M. J. Treacy.
