You are here: Home » Archives for Supercomputing
May 31st, 2010
The June 2010 Top500 list of the world’s fastest supercomputers was released this week at ISC 2010. While the US Jaguar supercomputer (located at the Department of Energy’s Oak Ridge Leadership Computing Facility) retained the top spot in Linpack performance, a Chinese cluster called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and NVIDIA Tesla C2050 GPUs is now the fastest in theoretical peak performance at 2.98 PFlop/s and No. 2 with a Linpack performance of 1.271 PFlop/s. This is the highest rank a GPU-accelerated system, or a Chinese system, has ever achieved on the Top500 list.
For more information, visit www.TOP500.org.
Posted in Events, Press | Tags: Supercomputing, Top500 | 1 Comment
December 2nd, 2009
High throughput architectures for HPC seem likely to emphasize many cores with deep multithreading, wide SIMD, and sophisticated memory hierarchies. GPUs present one example, and their high throughput has led a number of researchers to port computationally intensive applications to NVIDIA’s CUDA architecture.
This session explored the art of performance tuning for CUDA using several case studies. Topics included profiling to identify bottlenecks, effective use of the GPU’s memory hierarchy and DRAM interface to maximize bandwidth, data versus task parallelism, and avoiding SIMD divergence. Many of the lessons learned in the context of CUDA are likely to apply to other many-core architectures used in HPC applications.
Posted in Events | Tags: Birds-of-a-Feather, Conferences, NVIDIA CUDA, Supercomputing | Write a comment
November 30th, 2009
The presentation slides from the Supercomputing 2009 full-day tutorial “High-Performance Computing with CUDA” are now available at http://gpgpu.org/sc2009.
Abstract:
NVIDIA’s CUDA is a general-purpose architecture for writing highly parallel applications. CUDA provides several key abstractions—a hierarchy of thread blocks, shared memory, and barrier synchronization—for scalable high-performance parallel computing. Scientists throughout industry and academia use CUDA to achieve dramatic speedups on production and research codes. The CUDA architecture supports many languages, programming environments, and libraries including C, Fortran, OpenCL, DirectX Compute, Python, Matlab, FFT, LAPACK, etc.
In this tutorial NVIDIA engineers will partner with academic and industrial researchers to present CUDA and discuss its advanced use for science and engineering domains. The morning session will introduce CUDA programming, motivate its use with many brief examples from different HPC domains, and discuss tools and programming environments. The afternoon will discuss advanced issues such as optimization and sophisticated algorithms/data structures, closing with real-world case studies from domain scientists using CUDA for computational biophysics, fluid dynamics, seismic imaging, and theoretical physics.
Posted in Developer Resources, Events | Tags: Conferences, High-Performance Computing, NVIDIA CUDA, Supercomputing, Tutorials & Courses | Write a comment
November 30th, 2009
24th International Conference on Supercomputing (ICS’10)
June 1-4, 2010
Epochal Tsukuba (Tsukuba International Congress Center)
Tsukuba, Japan
Sponsored by ACM/SIGARCH
ICS is the premier international forum for the presentation of research results in high-performance computing systems. In 2010 the conference will be held at the Epochal Tsukuba (Tsukuba International Congress Center) in Tsukuba City, the largest high-tech and academic
city in Japan.
Papers are solicited on all aspects of research, development, and application of high-performance experimental and commercial systems. Special emphasis will be given to work that leads to better understanding of the implications of the new era of million-scale parallelism and Exa-scale performance; including (but not limited to): Read the rest of this entry »
Posted in Events, Research | Tags: Conferences, High-Performance Computing, Supercomputing | Write a comment
August 26th, 2009
Abstract:
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will even require in the order of exaflops, and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware. This is done to increase flexibility and to reduce development efforts. Examples include e-VLBI and LOFAR.
In this paper, we evaluate the correlator algorithm on multi-core CPUs and many-core architectures, such as NVIDIA and ATI GPUs, and the Cell/B.E. The correlator is a streaming, real-time application, and is much more I/O intensive than applications that are typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene/P supercomputer. We investigate performance, power efficiency, and programmability. We identify several important architectural problems which cause architectures to perform suboptimally. Our findings are applicable to data-intensive applications in general. Read the rest of this entry »
Posted in Research | Tags: Astronomy, Cross-correlation, Radio Astronomy, Supercomputing | Write a comment
August 23rd, 2009
Abstract:
Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns. We explore SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes. The techniques we propose are efficient, successfully utilizing large percentages of peak bandwidth. Furthermore, they deliver excellent total throughput, averaging 16 GFLOP/s and 10 GFLOP/s in double precision for structured grid and unstructured mesh matrices, respectively, on a GeForce GTX 285. This is roughly 2.8 times the throughput previously achieved on Cell BE and more than 10 times that of a quad-core Intel Clovertown system.
(“Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors“. Nathan Bell and Michael Garland, in “Proc. Supercomputing ’09″, August 2009.)
Posted in Research | Tags: NVIDIA CUDA, Papers, Sparse Linear Systems, Supercomputing | Write a comment
June 4th, 2009
The goal of this workshop, held at the National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, was to help computational scientists in the geosciences, computational chemistry, and astronomy and astrophysics communities take full advantage of emerging high-performance computing resources based on computational accelerators, such as clusters with GPUs and Cell processors.
Slides are now available online and cover a wide range of topics including
- GPU and Cell programming tutorials
- GPU and Cell technology
- Accelerator programming, clusters, frameworks and building blocks such as sparse matrix-vector products, tree-based algorithms and in particular accelerator integration into large-scale established code bases
- Case studies and posters from geosciences, computational chemistry and astronomy/astrophysics such as the simulation of earthquakes, molecular dynamics, solar radiation, tsunamis, weather predictions, climate modeling and n-body systems as well as Monte-Carlo, Euler, Navier-Stokes and Lattice-Boltzmann type of simulations
(National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign: Path to Petascale workshop presentations, organized by Wen-mei Hwu, Volodymyr Kindratenko, Robert Wilhelmson, Todd Martínez and Robert Brunner)
Posted in Events, Research | Tags: Astronomy, Astrophysics, Clusters, Computational Chemistry, geosciences, Supercomputing, Workshops | Write a comment
April 13th, 2009
The workshop “Path to PetaScale: Adapting GEO/CHEM/ASTRO Applications for Accelerators and Accelerator Clusters” was held at the National Center for Supercomputing Applications (NCSA), University of Illinois Urbana-Champaign, on April 2-3, 2009. This workshop, sponsored by NSF and NCSA, helped computational scientists in the geosciences, computational chemistry, and astronomy and astrophysics communities take full advantage of emerging high-performance computing accelerators such as GPUs and Cell processors. The workshop consisted of joint technology sessions during the first day and domain-specific sessions on the second day. Slides from the presentations are now online.
Posted in Events | Tags: Astronomy, Astrophysics, Computational Chemistry, geosciences, Supercomputing, Workshops | Write a comment
March 11th, 2009
Abstract:
This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors. As a case study, we design and implement the “DCGN” API on NVIDIA GPUs that is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. We use a method that also allows the data-parallel processors to run autonomously from user-written CPU code. In order to facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that a tolerable amount of overhead is incurred, somewhere between one and five percent depending on the application, and indicate the locations where this overhead accumulates. We conclude that with innovations in chipsets and drivers, this overhead will be mitigated and provide similar performance to typical CPU based MPI implementations while providing fully-dynamic communication.
(Jeff A. Stuart and John D. Owens, Message Passing on Data-Parallel Architectures, Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium)
Posted in Research | Tags: APIs, NVIDIA CUDA, Papers, Programming Languages, Supercomputing, Tools | Write a comment
November 19th, 2008
This is a GPGPU event a long time in the making. Since the advent of general-purpose APIs and compilers for GPUs it has been predicted that GPUs would one day be used to help boost the performance of Supercomputers. With the latest release of the Top500 Supercomputer list, that prediction has become a reality.
More details from an NVIDIA press release:
NVIDIA Tesla Powers 29th Most Powerful Supercomputer in the World
SC08—AUSTIN, TX—NOVEMBER 17, 2008—The Tokyo Institute of Technology (Tokyo Tech) today announced a collaboration with NVIDIA to use NVIDIA® Tesla™ GPUs to boost the computational horsepower of its TSUBAME supercomputer. Through the addition of 170 Tesla S1070 1U systems, the TSUBAME supercomputer now delivers nearly 170 TFLOPS of theoretical peak performance, as well as 77.48 TFLOPS of measured Linpack performance, placing it, again, amongst the top ranks in the world’s Top 500 Supercomputers.
“Tokyo Tech is constantly investigating future computing platforms and it had become clear to us that to make the next major leap in performance, TSUBAME had to adopt GPU computing technologies,” said Satoshi Matsuoka, division director of the Global Scientific Information and Computing Center at Tokyo Tech. “In testing our key applications, the Tesla GPUs delivered speed-ups that we had never seen before, sometimes even orders of magnitude – a tremendous competitive boost for our scientists and engineers in reducing their time to solution.”
Speaking to the ease of implementation, Matsuoka continued,
“The entire upgrade was carried out in 1 week, and the TSUBAME supercomputer remained live throughout. This is an unprecedented feat in top-level supercomputing.”
Read the rest of this entry »
Posted in Business, Press, Research | Tags: NVIDIA Tesla, Supercomputing | Write a comment