Posts
Jan, 26
Weak execution ordering – exploiting iterative methods on many-core GPUs
On NVIDIA’s many-core GPUs, there is no synchronization function among parallel thread blocks. When fine-grained data communication and synchronization are required for large-scale parallel programs executed by multiple thread blocks, frequent host synchronizations are necessary, and they incur significant overhead. In this paper, we investigate a class of applications which uses a chaotic […]
Jan, 26
A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations
Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture, there are usually halo regions that need to be updated and exchanged among different processing elements (PEs). In addition, synchronization is often used to signal the completion of […]
Jan, 26
Architectural Support for the Stream Execution Model on General-Purpose Processors
There has recently been much interest in stream processing, both in industry (e.g., Cell, NVIDIA G80, ATI R580) and academia (e.g., Stanford Merrimac, MIT RAW), with stream programs becoming increasingly popular for both media and more general-purpose computing. Although a special style of programming called stream programming is needed to target these stream architectures, huge […]
Jan, 26
Correlating Radio Astronomy Signals with Many-Core Hardware
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is […]
Jan, 26
Adaptable particle-in-cell algorithms for graphical processing units
We developed new parameterized Particle-in-Cell algorithms and data structures for emerging multi-core and many-core architectures. Four parameters allow tuning of this PIC code to different hardware configurations. Particles are kept ordered at each time step. The first application of these algorithms is to NVIDIA Graphical Processing Units, where speedups of about 15-25 compared to an […]
Jan, 26
Hierarchical Agglomerative Clustering Using Graphics Processor with Compute Unified Device Architecture
We explore the use of today’s high-end graphics processing units on desktops to perform hierarchical agglomerative clustering with NVIDIA’s Compute Unified Device Architecture (CUDA). Although the advancement in graphics cards has made the gaming industry flourish, there is a lot more to be gained in the field of scientific computing, high performance computing […]
Jan, 26
Simulating flows of incompressible and weakly compressible fluids on multicore hybrid computer systems
A logically simple algorithm based on explicit schemes for modeling flows of incompressible and weakly compressible fluids is considered. The hyperbolic variant of the quasi-gas dynamic system of equations is used as a mathematical model. An ingenious computer cluster based on NVIDIA GPUs is used for the computations.
Jan, 26
Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem
Modern video cards and game consoles typically have much better performance-to-price ratios than general-purpose CPUs. The parallel processing capabilities of game hardware are well suited to high-throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.
Jan, 26
Efficient Bayesian inference in stochastic chemical kinetic models using graphical processing units
A goal of systems biology is to understand the dynamics of intracellular systems. Stochastic chemical kinetic models are often utilized to accurately capture the stochastic nature of these systems due to low numbers of molecules. Collecting system data allows for estimation of stochastic chemical kinetic rate parameters. We describe a well-known, but typically impractical data […]
Jan, 25
Fast Schedulability Analysis Using Commodity Graphics Hardware
In this paper we explore the possibility of using commodity graphics processing units (GPUs) to speed up standard schedulability analysis algorithms. Our long-term goal is to exploit GPUs to accelerate common electronic design automation algorithms, most of which tend to be computationally expensive. Our main contribution in this paper is a reformulation of a standard demand […]
Jan, 25
Molecular dynamics simulation of the supercooled Al melt on GPUs
The method of molecular dynamics (MD) is widely used to study static and dynamic properties of condensed matter [1]. In particular, an approach to studying the relaxation of metastable states has been developed [2]. These states play an essential role in impulse loading processes such as shock compression, laser ablation, etc. Herewith we report on […]
Jan, 25
Molecular dynamics simulations of the relaxation processes in the condensed matter on GPUs
We report on a simulation technique and benchmarks for molecular dynamics simulations of relaxation processes in solids and liquids using graphics processing units (GPUs). The implementation of a many-body potential such as the embedded atom method (EAM) on GPUs is discussed. The benchmarks obtained with the LAMMPS and HOOMD packages for simple Lennard-Jones liquids and […]