## Posts

Nov, 5

### Iterative induced dipoles computation for molecular mechanics on GPUs

In this work, we present a first step towards the efficient implementation of polarizable molecular mechanics force fields with GPU acceleration. The computational bottleneck of such applications is found in the treatment of electrostatics, where higher-order multipoles and a self-consistent treatment of polarization effects are needed. We have coded these sections, for the case of […]

Nov, 5

### The Scalable Heterogeneous Computing (SHOC) benchmark suite

Scalable heterogeneous computing systems, which are composed of a mix of compute devices, such as commodity multicore processors, graphics processors, reconfigurable processors, and others, are gaining attention as one approach to continuing performance improvement while managing the new challenge of energy efficiency. As these systems become more common, it is important to be able to […]

Nov, 5

### Designing efficient sorting algorithms for manycore GPUs

We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort reported in the literature, and is up to 4 times faster than the graphics-based GPUSort. It is also highly competitive with CPU […]
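For readers unfamiliar with radix sort, the following is a minimal sequential sketch in Python, not the CUDA implementation from the paper. It only illustrates the three phases that a least-significant-digit radix sort repeats per pass (digit histogram, exclusive prefix sum, stable scatter), which are the steps a GPU version would typically parallelize. The function name `radix_sort` and the parameters `bits_per_pass` and `key_bits` are assumptions chosen for illustration.

```python
def radix_sort(keys, bits_per_pass=4, key_bits=32):
    """LSD radix sort for non-negative integers below 2**key_bits.

    Illustrative only: names and defaults are hypothetical, not taken
    from the paper.
    """
    radix = 1 << bits_per_pass
    mask = radix - 1
    src = list(keys)

    for shift in range(0, key_bits, bits_per_pass):
        # Phase 1: histogram of the current digit.
        counts = [0] * radix
        for k in src:
            counts[(k >> shift) & mask] += 1

        # Phase 2: exclusive prefix sum gives each digit's start offset.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]

        # Phase 3: stable scatter into the output buffer.
        dst = [0] * len(src)
        for k in src:
            d = (k >> shift) & mask
            dst[offsets[d]] = k
            offsets[d] += 1

        src = dst

    return src


if __name__ == "__main__":
    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Keeping each pass stable is what lets the later, more significant digits preserve the order established by earlier passes.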

Nov, 5

### Best-effort semantic document search on GPUs

Semantic indexing is a popular technique used to access and organize large amounts of unstructured text data. We describe an optimized implementation of semantic indexing and document search on manycore GPU platforms. We observed that a parallel implementation of semantic indexing on a 128-core Tesla C870 GPU is only 2.4X faster than a sequential implementation […]

Nov, 5

### Sparse matrix solvers on the GPU: conjugate gradients and multigrid

Many computer graphics applications require high-intensity numerical simulation. We show that such computations can be performed efficiently on the GPU, which we regard as a full-function streaming processor with high floating-point performance. We implemented two basic, broadly useful computational kernels: a sparse matrix conjugate gradient solver and a regular-grid multigrid solver. Real-time applications […]
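As a rough illustration of the first kernel named above, here is a minimal CPU sketch of the conjugate gradient method in plain Python. It is not the paper's GPU code. It assumes a symmetric positive-definite matrix stored in compressed sparse row (CSR) form; the helper names `csr_matvec`, `dot`, and `conjugate_gradient`, and the parameters `tol` and `max_iter`, are hypothetical.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            s += values[j] * x[col_idx[j]]
        y[i] = s
    return y


def dot(a, b):
    """Dense dot product."""
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0 initially
    p = list(r)              # first search direction
    rs_old = dot(r, r)

    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:   # stop once the residual norm is small
            break
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new

    return x


if __name__ == "__main__":
    # A = [[4, 1], [1, 3]], b = [1, 2]
    print(conjugate_gradient([4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4], [1.0, 2.0]))
```

Each iteration consists of one sparse matrix-vector product, two dot products, and three vector updates, all of which are data-parallel operations.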

Nov, 5

### A Survey of General-Purpose Computation on Graphics Hardware

The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with […]

Nov, 5

### NVIDIA Tesla: A Unified Graphics and Computing Architecture

To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture. Its scalable parallel array of processors is massively multithreaded and programmable in C or via graphics APIs.

Nov, 5

### Accelerator: using data parallelism to program GPUs for general-purpose uses

GPUs are difficult to program for general-purpose uses. Programmers can either learn graphics APIs and convert their applications to use graphics pipeline operations or they can use stream programming abstractions of GPUs. We describe Accelerator, a system that uses data parallelism to program GPUs for general-purpose uses instead. Programmers use a conventional imperative programming language […]

Nov, 5

### A translation system for enabling data mining applications on GPUs

Modern GPUs offer much computing power at a very modest cost. Even though CUDA and other related recent developments are accelerating the use of GPUs for general purpose applications, several challenges still remain in programming the GPUs. Thus, it is clearly desirable to be able to program GPUs using a higher-level interface. In this paper, […]

Nov, 5

### A Task Parallel Algorithm for Computing the Costs of All-Pairs Shortest Paths on the CUDA-Compatible GPU

This paper proposes a fast method for computing the costs of all-pairs shortest paths (APSPs) on the graphics processing unit (GPU). The proposed method is implemented using compute unified device architecture (CUDA), which offers us a development environment for performing general-purpose computation on the GPU. Our method is based on Harish’s iterative algorithm that computes […]

Nov, 5

### Lattice SU(2) on GPU’s

We discuss the CUDA approach to the simulation of pure gauge Lattice SU(2). CUDA is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU with single precision. Analysis with single and multiple GPUs, using CUDA and OpenMP, are also […]

Nov, 5

### ECM on Graphics Cards

This paper reports record-setting performance for the elliptic-curve method of integer factorization: for example, 926.11 curves/second for ECM stage 1 with B1=8192 for 280-bit integers on a single PC. The state-of-the-art GMP-ECM software handles 124.71 curves/second for ECM stage 1 with B1=8192 for 280-bit integers using all four cores of a 2.4 GHz Core 2 Quad […]