
Posts

Nov, 24

Scalable Simulation of 3D Wave Propagation in Semi-Infinite Domains Using the Finite Difference Method on a GPU Based Cluster

The scattering of acoustic waves has been of practical interest for the petroleum industry, mainly in the determination of new oil deposits. A family of computational models that represent this phenomenon is based on finite difference methods. Simulating these phenomena incurs a high computational cost and requires large amounts of memory. In this […]
Nov, 24

The Optimization of Algorithms in the Process of Temporal Data Mining Using the Compute Unified Device Architecture

Considering the importance and usefulness of real-time data mining, researchers' interest in discovering new hardware architectures that can manage and process large volumes of data has increased significantly in recent years. In this paper the performance of algorithms for temporal data mining that are implemented in the new Compute Unified Device Architecture […]
Nov, 24

Accelerating QDP++/Chroma on GPUs

Extensions to the C++ implementation of the QCD Data Parallel Interface are provided enabling acceleration of expression evaluation on NVIDIA GPUs. Single expressions are off-loaded to the device memory and execution domain leveraging the Portable Expression Template Engine and using Just-in-Time compilation techniques. Memory management is automated by a software implementation of a cache controlling […]
Nov, 24

A GPU-Enabled, High-Resolution Cosmological Microlensing Parameter Survey

In the era of synoptic surveys, the number of known gravitationally lensed quasars is set to increase by over an order of magnitude. These new discoveries will enable a move from single-quasar studies to investigations of statistical samples, presenting new opportunities to test theoretical models for the structure of quasar accretion discs and broad emission […]
Nov, 23

Automated architecture-aware mapping of streaming applications onto GPUs

Graphics Processing Units (GPUs) are made up of many streaming multiprocessors, each consisting of processing cores that interleave the execution of a large number of threads. Groups of threads – called warps and wavefronts, respectively, in NVIDIA and AMD literature – are selected by the hardware scheduler and executed in lockstep on the available […]
Nov, 23

A Parallel Deconvolution Algorithm in Perfusion Imaging

In this paper, we present the implementation of a deconvolution algorithm for brain perfusion quantification on GPGPUs (general-purpose graphics processing units) using the CUDA programming model. GPUs originated as dedicated co-processors for graphics generation, but modern GPUs have evolved into more general processors capable of executing scientific computations. It provides a […]
Nov, 23

Real-World Constraints of GPUs in Real-Time Systems

Graphics processing units (GPUs) are becoming increasingly important in today’s platforms as their increased generality allows them to be used as powerful coprocessors. In this paper, we explore possible applications for GPUs in real-time systems, discuss the limitations and constraints imposed by current GPU technology, and present a summary of our research addressing many […]
Nov, 23

Soren: Adaptive MapReduce for Programmable GPUs

In recent years the MapReduce programming model has been widely used for developing parallel data-intensive applications. As a result of its popularity, there exist many implementations of the MapReduce model on different parallel architectures including on massively parallel programmable GPUs. A basic challenge in implementing a MapReduce runtime system is the wide diversity of applications […]
Nov, 23

Towards solving the Table Maker’s Dilemma on GPU

Since 1985, the IEEE 754 standard has defined formats, rounding modes and basic operations for floating-point arithmetic. In 2008 the standard was extended, and recommendations were added about the rounding of some elementary functions such as trigonometric functions (cosine, sine, tangent and their inverses), exponentials, and logarithms. However, to guarantee the exact rounding of […]
Nov, 23

Accelerating Protein Sequence Search in a Heterogeneous Computing System

The “Basic Local Alignment Search Tool” (BLAST) is arguably the most widely used computational tool in bioinformatics. However, the computational power required for routine BLAST analysis has been outstripping Moore’s Law due to the exponential growth in the size of the genomic sequence databases that BLAST searches. To address this issue, we propose […]
Nov, 23

Building-Blocks for Performance Oriented DSLs

Domain-specific languages raise the level of abstraction in software development. While it is evident that programmers can more easily reason about very high-level programs, the same holds for compilers only if the compiler has an accurate model of the application domain and the underlying target platform. Since mapping high-level, general-purpose languages to modern, heterogeneous hardware […]
Nov, 23

TEG: GPU Performance Estimation Using a Timing Model

Modern Graphics Processing Units (GPUs) offer significant performance speedup over conventional processors. Programming GPUs for general-purpose applications has become an important research area. The CUDA programming model provides a C-like interface and is widely accepted. However, since hardware vendors do not disclose enough underlying architecture details, programmers have to optimize their applications without fully […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
