A GPU-based parallel star retrieval method is proposed to improve the efficiency of searching for stars in a star catalogue during computer simulation, especially when the FOV (field of view) is large. In the proposed algorithm, the stars in the catalogue are first classified and stored in different zones using a latitude and longitude zoning method. Based on the easily accessible star catalogue, the star zones covered by the FOV can then be computed exactly by constructing a spherical triangle around the FOV, which effectively reduces the search scope. Finally, the CUDA computing architecture is used to retrieve stars from those zones in parallel on the GPU. Experimental results show that, compared with a CPU-oriented implementation, the proposed algorithm achieves speedups of up to tens of times, and the processing time stays at the millisecond level even with a large FOV and a wide star magnitude span. This meets the requirements of real-time simulation.
(Chao Li, Liqiang Zhang, Jiaze Wu, and Changwen Zheng, “Parallel Accelerating for Star Catalogue Retrieval Algorithm using GPUs”, Journal of Astronautics, 2012)
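A minimal sketch of the zoning-and-retrieval idea, written in Python for readability rather than CUDA. The 10° zone size, the (ra, dec, mag) tuple layout and all function names are hypothetical. The RA span of a circular FOV uses the standard bound arcsin(sin r / cos δ0), which may differ in detail from the authors' spherical-triangle construction. The inner per-star test is the part a GPU version would map to one thread per candidate star.

```python
import math
from collections import defaultdict

ZONE_DEG = 10.0                  # hypothetical zone size (degrees)
N_DEC = int(180 / ZONE_DEG)      # number of declination bands
N_RA = int(360 / ZONE_DEG)       # number of right-ascension bands

def zone_of(ra, dec):
    """Map (ra, dec) in degrees to a (dec_band, ra_band) zone index."""
    i = min(int((dec + 90.0) // ZONE_DEG), N_DEC - 1)  # dec = +90 goes in the top band
    j = int((ra % 360.0) // ZONE_DEG) % N_RA
    return i, j

def build_zones(stars):
    """Bucket catalogue entries (ra, dec, mag) by zone; done once, up front."""
    zones = defaultdict(list)
    for star in stars:
        zones[zone_of(star[0], star[1])].append(star)
    return zones

def ang_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (spherical law of cosines)."""
    a1, d1, a2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    c = math.sin(d1) * math.sin(d2) + math.cos(d1) * math.cos(d2) * math.cos(a1 - a2)
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def candidate_zones(ra0, dec0, r):
    """Zones that can contain stars within r degrees of the FOV centre."""
    i_lo = zone_of(0.0, max(dec0 - r, -90.0))[0]
    i_hi = zone_of(0.0, min(dec0 + r, 90.0))[0]
    if abs(dec0) + r >= 90.0:
        ra_bands = range(N_RA)  # FOV contains a pole: every RA band is needed
    else:
        ratio = min(1.0, math.sin(math.radians(r)) / math.cos(math.radians(dec0)))
        half = math.degrees(math.asin(ratio))  # largest RA offset inside the FOV
        j_lo = math.floor((ra0 - half) / ZONE_DEG)
        j_hi = math.floor((ra0 + half) / ZONE_DEG)
        ra_bands = {j % N_RA for j in range(j_lo, j_hi + 1)}  # handles the 0/360 wrap
    return [(i, j) for i in range(i_lo, i_hi + 1) for j in ra_bands]

def retrieve(zones, ra0, dec0, r, mag_limit):
    """Return stars no fainter than mag_limit inside the circular FOV."""
    found = []
    for zone in candidate_zones(ra0, dec0, r):
        # On a GPU, each star in these zones would be tested by its own thread.
        for ra, dec, mag in zones.get(zone, ()):
            if mag <= mag_limit and ang_sep(ra0, dec0, ra, dec) <= r:
                found.append((ra, dec, mag))
    return found
```

Usage: build the zone table once with `zones = build_zones(stars)`, then call `retrieve(zones, ra0, dec0, r, mag_limit)` per simulated frame; only the stars in the returned candidate zones are ever tested.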
To test the function and performance of star sensors on the ground, a fast method for simulating star maps is presented. The algorithm uses the instantaneous coordinates of stars and improves star-search efficiency by optimizing the zone partitioning of the star catalogue. We overcome the low accuracy in computing the latitude and longitude spans covered by the FOV by proposing a new spherical right-angled triangle method, which greatly reduces the search scope; in addition, a simulation model for star brightness is built from the adopted star catalogue. A simulation study demonstrates the algorithm. The proposed approach meets the requirements of a wide magnitude range and a short simulation period.
(Chao Li, Changwen Zheng, Jiaze Wu, and Liqiang Zhang, “A fast algorithm of simulating star map for star sensor”, Proceedings of the 3rd IEEE International Conference on Computer and Network Technology (IEEE ICCNT), 2011)
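One common way to build such a right-angled spherical triangle, assuming a circular FOV of angular radius r centred at right ascension α0 and declination δ0 (the papers may model the FOV differently), takes the celestial pole P, the FOV centre C, and the point T where a meridian just touches the FOV boundary. Because the meridian PT is tangent to the FOV circle at T, the angle at T is 90°, and Napier's rule for right spherical triangles (sine of a leg equals sine of the hypotenuse times sine of the opposite angle) gives the RA half-width directly:

```latex
\begin{aligned}
PC &= 90^\circ - |\delta_0|, \qquad CT = r, \qquad \angle PTC = 90^\circ, \qquad \angle CPT = \Delta\alpha \\
\sin(CT) &= \sin(PC)\,\sin(\angle CPT) \\
\sin r &= \cos\delta_0\,\sin\Delta\alpha
\;\;\Longrightarrow\;\;
\Delta\alpha = \arcsin\!\left(\frac{\sin r}{\cos\delta_0}\right), \qquad |\delta_0| + r < 90^\circ \\
\delta &\in [\delta_0 - r,\; \delta_0 + r], \qquad \alpha \in [\alpha_0 - \Delta\alpha,\; \alpha_0 + \Delta\alpha]
\end{aligned}
```

For a southern-hemisphere FOV the south pole plays the role of P, and the result is unchanged because cos δ0 is even. When |δ0| + r ≥ 90° the FOV contains a pole, and every RA zone in the declination band must be searched.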
We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for processing signals from “Large-N” arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implemented efficiently on Nvidia’s Fermi architecture, sustaining up to 79% of the peak single-precision floating-point throughput. We compare the performance obtained with hardware- and software-managed caches and observe significantly better performance for the latter. The reported performance relies on a multi-level data tiling strategy in memory and on a pipelined algorithm that overlaps computation with the transfer of data from host to device memory. The speed of code development, flexibility, and low cost of GPU implementations compared with ASIC and FPGA implementations have the potential to greatly shorten the cycle of correlator development and deployment, in cases where some penalty in power consumption can be tolerated.
(M. A. Clark, P. C. La Plante, L. J. Greenhill: “Accelerating Radio Astronomy Cross-Correlation with Graphics Processing Units”, July 2011. Preprint on arXiv; sources on GitHub.)
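The X-engine's core operation is accumulating the cross-product x_i · conj(x_j) over time for every pair of inputs. The Python sketch below is a CPU illustration with hypothetical names, not the authors' CUDA kernel. It shows the loop reordering that tiling introduces: the triangular baseline matrix is split into tile × tile blocks, so each block reuses one slice of each input's samples for many products. A GPU kernel would stage those slices in registers or shared memory.

```python
def x_engine(samples):
    """Reference X-engine for one channel.

    samples[t][i] is the complex sample of input i at time t. Returns the
    upper triangle (j >= i, autocorrelations included) of the visibility matrix.
    """
    n = len(samples[0])
    vis = [[0j] * n for _ in range(n)]
    for row in samples:                  # one time sample across all inputs
        for i in range(n):
            for j in range(i, n):
                vis[i][j] += row[i] * row[j].conjugate()
    return vis

def x_engine_tiled(samples, tile):
    """Same result, but outputs are computed one tile x tile block at a time."""
    n = len(samples[0])
    vis = [[0j] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(i0, n, tile):    # only blocks on or above the diagonal
            for row in samples:
                a = row[i0:i0 + tile]    # data a GPU thread block would cache
                b = row[j0:j0 + tile]
                for di, xi in enumerate(a):
                    for dj, xj in enumerate(b):
                        i, j = i0 + di, j0 + dj
                        if j >= i:       # diagonal blocks: skip the lower half
                            vis[i][j] += xi * xj.conjugate()
    return vis
```

Each input slice is loaded once per block and reused for up to tile² products, which is the data-reuse effect that tiling aims for. Both functions accumulate each (i, j) entry in the same time order, so they return identical results.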
Modern graphics processing units (GPUs) are inexpensive commodity hardware that offer Tflop/s theoretical computing capacity. GPUs are well suited to many compute-intensive tasks, including digital signal processing. We describe the implementation and performance of a GPU-based digital correlator for radio astronomy. The correlator is implemented using the NVIDIA CUDA development environment. We evaluate three design options on two generations of NVIDIA hardware. The different designs use the internal registers, shared memory, and multiprocessors in different ways. We find that, on recent generations of hardware, optimal performance is achieved with the design that minimizes global memory reads. The GPU-based correlator outperforms a single-threaded CPU equivalent by a factor of 60 for a 32-antenna array, and runs on commodity PC hardware. The extra compute capability provided by the GPU maximizes the correlation capability of a PC while retaining the fast development time associated with standard hardware, networking, and programming languages. In this way, a GPU-based correlation system occupies a middle ground in the design space between high-performance custom-built hardware and pure CPU-based software correlation. The correlator was deployed at the Murchison Widefield Array 32-antenna prototype system, where it ran in real time for extended periods. We briefly describe the data capture, streaming, and correlation system for the prototype array.
(Randall B. Wayth, Lincoln J. Greenhill, and Frank H. Briggs. “A GPU-based Real-time Software Correlation System for the Murchison Widefield Array Prototype”. Publications of the Astronomical Society of the Pacific, 121:857–865, 2009 August.)
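Software correlators commonly follow an FX structure: an F stage splits each antenna's voltage stream into frequency channels, and an X stage cross-multiplies the channels of every antenna pair. The sketch below assumes a plain DFT per block of samples in place of the polyphase filterbank a production system would normally use; the function names and data layout are hypothetical, and this is not the MWA prototype's code.

```python
import cmath

def dft(block):
    """Naive DFT used here as a stand-in channelizer (F stage)."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def fx_correlate(inputs, nchan):
    """inputs[i] is the sample stream of antenna i (all streams equal length).

    Returns vis[c][i][j] for channel c, upper triangle only (j >= i).
    """
    n_in = len(inputs)
    n_blocks = len(inputs[0]) // nchan
    vis = [[[0j] * n_in for _ in range(n_in)] for _ in range(nchan)]
    for b in range(n_blocks):
        # F stage: channelize one block of nchan samples for every antenna.
        spectra = [dft(x[b * nchan:(b + 1) * nchan]) for x in inputs]
        # X stage: accumulate cross-products channel by channel.
        for c in range(nchan):
            for i in range(n_in):
                for j in range(i, n_in):
                    vis[c][i][j] += spectra[i][c] * spectra[j][c].conjugate()
    return vis
```

A complex tone placed in channel 3 of an 8-channel split shows up only in `vis[3]`, and a 90° phase offset between two antennas appears as the phase of the cross term `vis[3][0][1]`.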
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will even require on the order of exaflops, and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware, which increases flexibility and reduces development effort. Examples include e-VLBI and LOFAR.
In this paper, we evaluate the correlator algorithm on multi-core CPUs and many-core architectures, such as NVIDIA and ATI GPUs, and the Cell/B.E. The correlator is a streaming, real-time application, and is much more I/O intensive than applications that are typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene/P supercomputer. We investigate performance, power efficiency, and programmability. We identify several important architectural problems which cause architectures to perform suboptimally. Our findings are applicable to data-intensive applications in general.
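The quadratic growth in compute, and the reason the correlator stresses I/O and memory bandwidth, can be made concrete with a back-of-the-envelope count. The sketch assumes 8 real floating-point operations per complex multiply-accumulate (6 for the complex product, 2 for the accumulation) and 8 bytes per input sample (single-precision complex). Both values are illustrative choices, and the function names are hypothetical.

```python
def n_baselines(n_inputs):
    """Input pairs including autocorrelations: N(N+1)/2."""
    return n_inputs * (n_inputs + 1) // 2

def flops_per_sample(n_inputs, flops_per_cmac=8):
    """Work per time sample per channel: one complex multiply-accumulate per baseline."""
    return flops_per_cmac * n_baselines(n_inputs)

def arithmetic_intensity(n_inputs, bytes_per_sample=8, flops_per_cmac=8):
    """Flops per byte of input, assuming each sample is read from memory exactly once."""
    return flops_per_sample(n_inputs, flops_per_cmac) / (n_inputs * bytes_per_sample)
```

With these defaults the intensity works out to (N + 1) / 2 flops per byte, e.g. 16.5 for 32 inputs. That is an upper bound reached only if every sample is loaded once; implementations that reload samples from memory fall well below it, which is why memory-access patterns dominate correlator performance on many-core hardware.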
The goal of this workshop, held at the National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, was to help computational scientists in the geosciences, computational chemistry, and astronomy and astrophysics communities take full advantage of emerging high-performance computing resources based on computational accelerators, such as clusters with GPUs and Cell processors.
Slides are now available online and cover a wide range of topics including
- GPU and Cell programming tutorials
- GPU and Cell technology
- Accelerator programming, clusters, frameworks and building blocks such as sparse matrix-vector products, tree-based algorithms and in particular accelerator integration into large-scale established code bases
- Case studies and posters from the geosciences, computational chemistry and astronomy/astrophysics, such as simulations of earthquakes, molecular dynamics, solar radiation, tsunamis, weather prediction, climate modeling and n-body systems, as well as Monte-Carlo, Euler, Navier-Stokes and Lattice-Boltzmann-type simulations
(National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign: Path to Petascale workshop presentations, organized by Wen-mei Hwu, Volodymyr Kindratenko, Robert Wilhelmson, Todd Martínez and Robert Brunner)
The workshop “Path to PetaScale: Adapting GEO/CHEM/ASTRO Applications for Accelerators and Accelerator Clusters” was held at the National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign, on April 2-3, 2009. This workshop, sponsored by NSF and NCSA, helped computational scientists in the geosciences, computational chemistry, and astronomy and astrophysics communities take full advantage of emerging high-performance computing accelerators such as GPUs and Cell processors. The workshop consisted of joint technology sessions on the first day and domain-specific sessions on the second day. Slides from the presentations are now online.
This work addresses the fundamental problem of accelerating FFT computation with GPUs in order to apply it to adaptive optics, which is key to obtaining maximum performance from the planned ground-based Extremely Large Telescopes. A method for efficiently adapting the FFT to the underlying architecture of GPUs is given. The authors derive a novel FFT method that alternates radix-2 and radix-4 decomposition of the two-dimensional domain to make the most of the Multiple Render Targets extension, devising a very unusual Pease 8-data “butterfly” in the process. (J.G. Marichal-Hernandez, J.M. Rodriguez-Ramos, F. Rosa, La Laguna University: “Modal Fourier wavefront reconstruction using GPUs”, to appear in Journal of Electronic Imaging.)
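To illustrate what mixing radix-2 and radix-4 stages means, here is a one-dimensional recursive FFT that splits by 4 whenever the length allows and falls back to a radix-2 split otherwise. This is a simpler mixed-radix scheme, not the authors' two-dimensional, Multiple-Render-Target Pease formulation, and the combine step is written as a direct sum rather than an optimized butterfly. A 2D transform would apply it along rows and then along columns.

```python
import cmath

def fft(x):
    """Recursive decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    r = 4 if n % 4 == 0 else 2             # prefer radix-4, fall back to radix-2
    m = n // r
    subs = [fft(x[q::r]) for q in range(r)]  # DFTs of the r decimated subsequences
    out = []
    for k in range(n):
        # Combine: X[k] = sum over q of W_n^(q*k) * F_q[k mod m], with W_n = exp(-2*pi*i/n)
        out.append(sum(cmath.exp(-2j * cmath.pi * q * k / n) * subs[q][k % m]
                       for q in range(r)))
    return out
```

For a length of 8 the recursion does one radix-4 split followed by a radix-2 split of each length-2 piece. For lengths that are powers of 4 it uses radix-4 throughout, which needs fewer passes over the data than pure radix-2.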