January 6th, 2012
January 4th, 2012
High Performance Graphics is the leading international forum for performance-oriented graphics systems research, including innovative algorithms, efficient implementations, and hardware architecture. The conference brings together researchers, engineers, and architects to discuss the complex interactions of massively parallel hardware, novel programming models, efficient graphics algorithms, and novel applications. High Performance Graphics was founded in 2009 to synthesize and broaden the scope of two important and well-respected conferences in computer graphics: Graphics Hardware and Interactive Ray Tracing.
HPG 2012 is co-sponsored by Eurographics and ACM SIGGRAPH and will take place on June 25-27 in Paris, France, co-located with the Eurographics Symposium on Rendering. We invite original and innovative performance-oriented contributions from all areas of graphics, including hardware architectures, rendering, physics, animation, simulation, and data structures, with topics including (but not limited to): interactive rendering pipelines (hardware or software); interactive rendering algorithms (hardware or software); graphics hardware and systems; languages and compilation; parallel computing for graphics; and mobile graphics. Please see the conference website for the full CFP.
December 19th, 2011
This workshop is organized by Horacio Pérez-Sánchez and José M. Cecilia and takes place in conjunction with the International Conference on Modeling & Applied Simulation (MAS 2012). Its goal is to explore the use of emerging parallel computing architectures, as well as High Performance Computing systems (supercomputers, clusters, grids), for the simulation of relevant biological systems. We welcome papers not submitted elsewhere for review, on topics including but not limited to:
- Parallel stochastic simulation
- Biological and Numerical parallel computing
- Parallel and distributed architectures
- Emerging processing architectures (e.g. GPUs, FPGAs, mixed CPU-GPU or CPU-FPGA)
- Parallel model checking techniques
- Parallel algorithms for biological analysis
- Cluster and Grid Deployment for systems biology
- Tools and applications
- Biologically inspired algorithms
More details, including dates, deadlines and submission instructions, are available on the workshop web page.
December 14th, 2011
HOOMD-blue performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and crystallization of metals.
HOOMD-blue 0.10.0 adds many new features.
December 7th, 2011
In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P).
Benchmark results are presented for problem classes A to C, and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions far exceeds that of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.
(Pennycook, S.J., Hammond, S.D., Mudalige, G.R., Wright, S.A. and Jarvis, S.A.: “On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures”, The Computer Journal (in press) [DOI] [PREPRINT])
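The dependency structure that makes these codes "wavefront" applications can be illustrated with a minimal Python sketch. It assumes a hypothetical 2D recurrence in which each cell depends on its north and west neighbours (the LU benchmark itself sweeps a 3D grid with a more complex update); `update`, `sweep_sequential` and `sweep_wavefront` are illustrative names, not code from the paper.

```python
"""Minimal sketch of a 2D wavefront sweep (hypothetical recurrence, not the LU solver)."""


def update(up: float, left: float) -> float:
    # Hypothetical cell update: each cell depends on its north and west neighbours,
    # mimicking the data dependency of a lower-triangular sweep.
    return 0.5 * (up + left) + 1.0


def sweep_sequential(n: int) -> list[list[float]]:
    # Reference: a plain row-major sweep, which honours the dependencies trivially.
    a = [[0.0] * (n + 1) for _ in range(n + 1)]  # row 0 and column 0 are boundary values
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = update(a[i - 1][j], a[i][j - 1])
    return a


def sweep_wavefront(n: int) -> list[list[float]]:
    # Wavefront order: every cell on the anti-diagonal i + j == d depends only on
    # diagonal d - 1, so all cells of one diagonal could be updated in parallel
    # (for example, one GPU thread per cell and one kernel launch per diagonal).
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for d in range(2, 2 * n + 1):
        cells = [(i, d - i) for i in range(max(1, d - n), min(n, d - 1) + 1)]
        # Compute the whole diagonal from the previous one, then write it back,
        # as a data-parallel step would.
        new_vals = [update(a[i - 1][j], a[i][j - 1]) for i, j in cells]
        for (i, j), v in zip(cells, new_vals):
            a[i][j] = v
    return a


if __name__ == "__main__":
    # Both orders perform identical arithmetic, so the results match exactly.
    assert sweep_wavefront(8) == sweep_sequential(8)
```

Cells on one anti-diagonal are independent, so a GPU port can update each diagonal in a single data-parallel step. The short diagonals at the start and end of each sweep expose little parallelism, which is one likely contributor to the gap between theoretical and sustained performance on this class of code.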
November 20th, 2011
Since the last WCCM (Sydney 2009), where we organized a similarly themed minisymposium, the scientific and engineering communities have gained much experience in using GPU hardware for their applications. The number of publications addressing GPU applications has skyrocketed, and researchers have developed a shared understanding of how to implement numerical methods on this architecture. Moreover, three of the five fastest computers in the world, as measured by the Top500 list, are now GPU-based systems. There is much discussion of GPUs playing a leading role in the exascale computing world. In summary, this topic is of wide interest; frankly, it is all the rage. This minisymposium will bring together presentations from top researchers around the world who use GPU hardware for applications in all branches of computational mechanics. We encourage contributions that address innovative methods for using GPUs efficiently, studies of how numerical methods must adapt to the hardware, and perspectives on the future of GPUs as we advance toward exascale.
WCCM 2012 will be held in São Paulo, Brazil, 8–13 July 2012. The abstract submission deadline is December 31, 2011. More information: http://www.wccm2012.com, http://barbagroup.bu.edu/Barba_group/Events.html.
November 16th, 2011
The 4th Workshop on using Emerging Parallel Architectures (WEPA 2012) is held in conjunction with the International Conference on Computational Science (ICCS 2012), Omaha, Nebraska, June 2-4, 2012.
The computing landscape has undergone significant transformation with the emergence of more powerful processing elements such as GPUs, FPGAs, and multi-cores. On the multi-core front, Moore's Law has extended beyond the single-processor boundary, with the prediction that the number of cores will double every 18 months. Going forward, the primary means of gaining processor performance will be parallelism. Multi-core technology has visibly penetrated the global market. According to the latest Top500 lists, the HPC landscape has evolved from supercomputer systems into large clusters of dual- or quad-core processors. Furthermore, GPUs, FPGAs and multi-cores have been shown to be formidable computing alternatives, with certain classes of applications seeing more than an order-of-magnitude improvement over their general-purpose processor (GPP) counterparts. Therefore, future computational science centers will employ resources such as FPGA and GPU architectures as co-processors, offloading appropriate compute-intensive portions of applications from the servers.
November 16th, 2011
A GPU-based parallel star retrieval method is proposed to improve the efficiency of searching for stars in a star catalogue during computer simulation, especially when the FOV (field of view) is large. In the proposed algorithm, the catalogue stars are first classified and stored in different zones using a latitude and longitude zoning method. Based on the easily accessible star catalogue, the star zones covered by the FOV can be computed exactly by constructing a spherical triangle around the FOV, which effectively reduces the search scope. Finally, CUDA is used to retrieve stars from those zones in parallel on the GPU. Experimental results show that, compared with a CPU-oriented implementation, the proposed algorithm achieves speedups of up to tens of times, and the processing time stays at the millisecond level even for a large FOV and a wide star magnitude span, meeting the requirements of real-time simulation.
(Chao Li, Liqiang Zhang, Jiaze Wu, and Changwen Zheng, “Parallel Accelerating for Star Catalogue Retrieval Algorithm using GPUs”, Journal of Astronautics, 2012)
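The zoning idea can be sketched in a few lines of Python. This is a minimal CPU illustration under stated assumptions, not the authors' implementation: it assumes a circular FOV of angular radius `radius`, a hypothetical fixed zone size `ZONE_DEG`, and stars stored as `(ra, dec, magnitude)` tuples in degrees. The right ascension span is taken from the right-angled spherical triangle formed by the celestial pole, the FOV centre and the point where a meridian touches the FOV boundary; the final per-star distance test is the part a CUDA version would run with one thread per candidate.

```python
import math
from collections import defaultdict

ZONE_DEG = 10.0  # hypothetical zone size in both RA and Dec (degrees)
N_RA = int(360 / ZONE_DEG)


def zone_of(ra: float, dec: float) -> tuple[int, int]:
    # Latitude/longitude zoning: map (RA, Dec) to a (ra_bin, dec_bin) cell.
    ra_bin = int((ra % 360.0) // ZONE_DEG)
    dec_bin = int(min(dec + 90.0, 179.999999) // ZONE_DEG)  # keep Dec = +90 in the last bin
    return ra_bin, dec_bin


def build_index(stars):
    # Store each star in the zone that contains it.
    index = defaultdict(list)
    for star in stars:
        index[zone_of(star[0], star[1])].append(star)
    return index


def zones_for_fov(ra0: float, dec0: float, radius: float):
    # Declination span of the FOV, clipped at the poles.
    dec_lo, dec_hi = max(-90.0, dec0 - radius), min(90.0, dec0 + radius)
    dec_bins = range(zone_of(0.0, dec_lo)[1], zone_of(0.0, dec_hi)[1] + 1)
    if abs(dec0) + radius >= 90.0:
        ra_bins = list(range(N_RA))  # the FOV reaches a pole: every RA is covered
    else:
        # Right-angled spherical triangle: sin(half_span) = sin(radius) / cos(dec0).
        half = math.degrees(math.asin(
            math.sin(math.radians(radius)) / math.cos(math.radians(dec0))))
        first = math.floor((ra0 - half) / ZONE_DEG)
        last = math.floor((ra0 + half) / ZONE_DEG)
        ra_bins = sorted({b % N_RA for b in range(first, last + 1)})  # handles RA wrap-around
    return [(a, d) for d in dec_bins for a in ra_bins]


def ang_sep(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    # Great-circle separation in degrees (haversine formula).
    p1, p2 = math.radians(dec1), math.radians(dec2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def stars_in_fov(index, ra0: float, dec0: float, radius: float):
    # Only stars in covered zones are tested; this filter is the data-parallel step.
    candidates = [s for z in zones_for_fov(ra0, dec0, radius) for s in index.get(z, [])]
    return [s for s in candidates if ang_sep(ra0, dec0, s[0], s[1]) <= radius]
```

For example, with `idx = build_index(stars)`, the call `stars_in_fov(idx, 0.0, 0.0, 2.0)` inspects only four zones on either side of the RA = 0 wrap-around instead of the whole catalogue.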
November 14th, 2011
To test the function and performance of a star sensor on the ground, a fast method for simulating star maps is presented. The algorithm adopts the instantaneous coordinates of stars and improves star searching efficiency by optimizing the zone partitioning method for the star catalogue. We overcome the low accuracy of the latitude and longitude spans covered by the FOV by proposing a new spherical right-angled triangle method, which greatly reduces the search scope; meanwhile, a simulation model for star brightness is built based on the adopted star catalogue. A simulation study demonstrates the algorithm. The proposed approach meets the requirements of a wide magnitude range and a short simulation period.
(Chao Li, Changwen Zheng, Jiaze Wu, and Liqiang Zhang, "A fast algorithm of simulating star map for star sensor", Proceedings of the 3rd IEEE International Conference on Computer and Network Technology (IEEE ICCNT), 2011)
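The abstract does not give the exact formulas, so the following is only a plausible reconstruction of the geometry and brightness relations such a method could rest on. It assumes a circular FOV of angular radius r centred at declination δ0; the right-angled spherical triangle is formed by the celestial pole P, the FOV centre C, and the tangent point T where a meridian just touches the FOV boundary (right angle at T). For brightness, it assumes the standard Pogson magnitude relation; the paper's actual brightness model may differ.

```latex
% Right-angled spherical triangle P-C-T (right angle at T):
% hypotenuse PC = 90^\circ - \delta_0, leg CT = r, angle at P = \Delta\alpha (half RA span)
\sin r = \sin(90^\circ - \delta_0)\,\sin\Delta\alpha
\;\Longrightarrow\;
\Delta\alpha = \arcsin\!\left(\frac{\sin r}{\cos\delta_0}\right),
\qquad |\delta_0| + r < 90^\circ

% Declination span covered by the FOV
\delta \in [\,\delta_0 - r,\ \delta_0 + r\,]

% Pogson relation: flux of a star of magnitude m relative to a reference magnitude m_0
\frac{F}{F_0} = 10^{-0.4\,(m - m_0)}
```

When |δ0| + r ≥ 90°, the FOV contains a celestial pole and every right ascension must be searched.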
November 14th, 2011
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are a major building block of dense linear algebra (DLA) libraries, and therefore have to be highly optimized. We present some techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs. In particular, Pointer Redirecting, a set of GPU-specific optimization techniques, allows us to easily remove performance oscillations associated with problem dimensions not divisible by fixed blocking sizes. For example, applied to the matrix-matrix multiplication routines, depending on the hardware configuration and routine parameters, this can lead to algorithms that are two times faster. Similarly, matrix-vector multiplication can be accelerated more than two times in both single and double precision arithmetic. Additionally, GPU-specific acceleration techniques are applied to develop new kernels (e.g. syrk, symv) that are up to 20x faster than the currently available kernels. We present these kernels and also show their acceleration effect on higher-level dense linear algebra routines. The accelerated kernels are now freely available through the MAGMA BLAS library.
(R. Nath, S. Tomov and J. Dongarra: “Accelerating GPU Kernels for Dense Linear Algebra”, VECPAR 2010. [PDF])
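The idea behind pointer redirecting can be emulated on the CPU. This is a simplified sketch, not MAGMA code: a hypothetical fixed blocking size `BLOCK` stands in for the GPU thread-block tile, and each loop iteration over `(li, lj)` stands in for one GPU thread. Instead of branching around out-of-range rows and columns, those "threads" are redirected to read the last valid row or column, so every thread executes the same loop; their redundant results are simply not stored.

```python
BLOCK = 4  # hypothetical fixed blocking size (tile edge)


def gemm_redirected(A, B):
    # C = A * B with a full grid of BLOCK x BLOCK tiles, even when the matrix
    # dimensions are not multiples of BLOCK.
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    tiles_m = -(-m // BLOCK)  # ceiling division
    tiles_n = -(-n // BLOCK)
    for ti in range(tiles_m):
        for tj in range(tiles_n):
            for li in range(BLOCK):          # each (li, lj) plays the role of one thread
                for lj in range(BLOCK):
                    i, j = ti * BLOCK + li, tj * BLOCK + lj
                    # Redirect out-of-range indices to the last valid row/column,
                    # so the inner loop runs with no bounds checks.
                    ri, rj = min(i, m - 1), min(j, n - 1)
                    acc = 0.0
                    for p in range(k):
                        acc += A[ri][p] * B[p][rj]
                    # Only in-range results are written back; redirected work is discarded.
                    if i < m and j < n:
                        C[i][j] = acc
    return C
```

The only conditional left is the final store guard; the inner product, where the time is spent, follows the same path for every thread, which avoids the divergent, dimension-dependent slow paths that cause the performance oscillations the abstract describes.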
We present an improved matrix–matrix multiplication routine (General Matrix Multiply [GEMM]) in the MAGMA BLAS library that targets NVIDIA Fermi graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA). We show how to modify the previous MAGMA GEMM kernels to make more efficient use of Fermi's new architectural features, most notably its extended memory hierarchy and larger memory sizes. The improved kernels run at up to 300 GFlop/s in double precision and up to 645 GFlop/s in single precision arithmetic (on a C2050), which is 58% and 63% of the theoretical peak, respectively. We compare the improved kernels with the currently available version in CUBLAS 3.1. Further, we show the effect of the new kernels on higher-level dense linear algebra (DLA) routines such as the one-sided matrix factorizations, and compare their performance with corresponding, currently available routines running on homogeneous multicore systems.
(R. Nath, S. Tomov and J. Dongarra: "An Improved MAGMA GEMM For Fermi Graphics Processing Units", International Journal of High Performance Computing Applications, 24(4), 511-515, 2010. [DOI] [PREPRINT])
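The quoted efficiencies can be checked with a short calculation. This assumes the published Tesla C2050 specifications (448 CUDA cores at 1.15 GHz, one fused multiply-add, i.e. 2 flops, per core per cycle in single precision, and double precision at half the single-precision rate); these figures are not stated in the abstract itself.

```latex
P_{\mathrm{SP}} = 448 \times 1.15\ \mathrm{GHz} \times 2\ \tfrac{\mathrm{flop}}{\mathrm{cycle}}
  \approx 1030\ \mathrm{GFlop/s},
\qquad
P_{\mathrm{DP}} = \tfrac{1}{2}\,P_{\mathrm{SP}} \approx 515\ \mathrm{GFlop/s}

\frac{300}{515} \approx 0.58 \ \text{(double precision)},
\qquad
\frac{645}{1030} \approx 0.63 \ \text{(single precision)}
```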