Posts

Sep, 14

On the Validation and Applications of a Parallel Flexible Multi-Body Dynamics Implementation

This work discusses how a flexible body formalism, specifically, the Absolute Nodal Coordinate Formulation (ANCF), is combined with the Discrete Element Method (DEM) and the Newmark implicit integration method to address many-body dynamics problems; i.e., problems with hundreds of thousands of rigid and deformable bodies. DEM is used to model friction and contact between elements, […]
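The Newmark implicit integration mentioned above can be illustrated on a much simpler system than an ANCF/DEM model. The sketch below assumes a single-degree-of-freedom linear spring-mass-damper, m*a + c*v + k*u = f, and the common average-acceleration parameters β = 1/4, γ = 1/2. The function names are hypothetical and nothing here is taken from the authors' implementation. For a linear system the implicit equation for the new acceleration reduces to one division; a nonlinear flexible-body model would need a Newton iteration at that step instead.

```python
# Hypothetical sketch: Newmark-beta integration of a 1-DOF linear system
#   m*a + c*v + k*u = f(t)
# Assumption: average-acceleration parameters beta = 1/4, gamma = 1/2.


def newmark_step(u, v, a, dt, m, c, k, f_next, beta=0.25, gamma=0.5):
    """Advance (u, v, a) from t_n to t_{n+1} = t_n + dt."""
    # Predictor: contributions known from the state at t_n.
    u_pred = u + dt * v + dt * dt * (0.5 - beta) * a
    v_pred = v + dt * (1.0 - gamma) * a
    # Implicit step: impose equilibrium at t_{n+1}; for a linear system this is
    # a single division (a nonlinear model would need Newton iterations here).
    m_eff = m + gamma * dt * c + beta * dt * dt * k
    a_new = (f_next - c * v_pred - k * u_pred) / m_eff
    # Corrector: finish the displacement and velocity updates.
    u_new = u_pred + beta * dt * dt * a_new
    v_new = v_pred + gamma * dt * a_new
    return u_new, v_new, a_new


def simulate_free_vibration(m=1.0, k=4.0, u0=1.0, v0=0.0, dt=0.05, steps=1000):
    """Undamped, unforced oscillator; returns the final state and energy history."""
    u, v = u0, v0
    a = -k * u / m  # initial acceleration consistent with the equation of motion
    energies = [0.5 * m * v * v + 0.5 * k * u * u]
    for _ in range(steps):
        u, v, a = newmark_step(u, v, a, dt, m, 0.0, k, 0.0)
        energies.append(0.5 * m * v * v + 0.5 * k * u * u)
    return (u, v, a), energies


(u, v, a), energies = simulate_free_vibration()
print(f"relative energy drift: {abs(energies[-1] - energies[0]) / energies[0]:.1e}")
```

With c = 0 and f = 0 the average-acceleration variant conserves the discrete energy 0.5*m*v² + 0.5*k*u² up to round-off and is unconditionally stable for linear problems, which is one reason it is a popular default.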
Sep, 14

Accelerating Sparse Matrix Kernels on Graphics Processing Units

With microprocessor clock speeds having levelled off, general-purpose computing on Graphics Processing Units (GPGPU) has shown some promise for the future of High Performance Computing (HPC). This can be largely attributed to the performance per unit cost, the performance per watt, and CUDA, a programming model for GPUs. For instance, for under $400, one […]
Sep, 13

Data Sorting Using Graphics Processing Units

Graphics processing units (GPUs) have been increasingly used for general-purpose computation in recent years. GPU-accelerated applications are found in both scientific and commercial domains. Sorting is considered one of the most important operations in many applications, so its efficient implementation is essential for overall application performance. This paper presents an effort […]
Sep, 13

GPU Fluid Simulation using Smoothed Particle Hydrodynamics

In this paper we present an overview of our implementation of a fluid simulation technique called "Smoothed Particle Hydrodynamics". Our implementation uses a hybrid CPU+GPU hash-based data structure to provide quick lookups of each particle's nearest neighbors and to improve memory access patterns. In our discussion we begin with a brief overview of the Navier-Stokes equations […]
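A hash-based nearest-neighbor lookup of the kind described above can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the authors' hybrid CPU+GPU structure: it works in 2D, uses a cell size equal to the smoothing length h, and keys a Python dictionary by integer cell coordinates, so any particle within distance h of a query must lie in the 3×3 block of cells around it. The names cell_of, build_grid and neighbors are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sketch of a uniform-grid neighbor search for SPH in 2D.
# Assumptions: cell size equals the smoothing length h, and a Python dict keyed
# by integer cell coordinates stands in for a GPU-side hash table.


def cell_of(x, y, h):
    """Integer coordinates of the grid cell containing point (x, y)."""
    return math.floor(x / h), math.floor(y / h)


def build_grid(positions, h):
    """Map each occupied cell to the list of particle indices inside it."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[cell_of(x, y, h)].append(i)
    return grid


def neighbors(positions, grid, h, i):
    """Sorted indices j != i with |p_j - p_i| < h, scanning only the 3x3 nearby cells."""
    x, y = positions[i]
    cx, cy = cell_of(x, y, h)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # .get avoids inserting empty lists into the defaultdict.
            for j in grid.get((cx + dx, cy + dy), ()):
                px, py = positions[j]
                if j != i and (px - x) ** 2 + (py - y) ** 2 < h * h:
                    found.append(j)
    return sorted(found)


pts = [(0.1, 0.1), (0.5, 0.2), (2.0, 2.0), (0.9, 0.9)]
grid = build_grid(pts, h=1.0)
print(neighbors(pts, grid, 1.0, 0))  # prints [1]
```

A GPU version would more typically sort particle indices by a hashed cell index so that particles sharing a cell sit next to each other in memory, which is how such a structure can also improve memory access patterns.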
Sep, 13

Exploring Heterogeneous Scheduling using the Task-Centric Programming Model

Computer architecture technology is moving towards more heterogeneous solutions, which will contain a number of processing units with different capabilities that may increase the performance of the system as a whole. However, with increased performance comes increased complexity; complexity that is now barely handled in homogeneous multiprocessing systems. The present study tries to solve a […]
Sep, 13

Multi-GPU implementation of the NICAM atmospheric model

Climate simulation models are used for a variety of scientific problems, and the accuracy of climate prognoses is mostly limited by the resolution of the models. Finer resolution results in more accurate prognoses but, at the same time, significantly increases computational complexity. This explains the increasing interest in High Performance Computing (HPC) and GPU […]
Sep, 13

Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU

The use of GPUs has been very beneficial in accelerating dense linear algebra (DLA) computational kernels. Many high performance numerical libraries like CUBLAS, MAGMA, and CULA provide BLAS and LAPACK implementations on GPUs as well as hybrid computations involving both CPUs and GPUs. GPUs usually achieve better performance than CPUs for compute-bound operations, especially those […]
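The distinction between compute-bound and memory-bound kernels can be made concrete with a back-of-the-envelope arithmetic-intensity estimate (floating-point operations per byte of memory traffic). The sketch assumes double precision (8 bytes per element), square n × n matrices, and idealized traffic in which every operand crosses memory exactly once; real kernels usually move more data than this. The function names are hypothetical.

```python
# Hypothetical back-of-the-envelope estimate of arithmetic intensity (FLOP/byte).
# Assumptions: double precision (8-byte elements), square n x n matrices, and
# idealized traffic where every operand is read or written exactly once.


def gemv_intensity(n, bytes_per_elem=8):
    """y = A @ x: 2*n*n FLOPs; A and x are read, y is written."""
    flops = 2 * n * n
    bytes_moved = bytes_per_elem * (n * n + 2 * n)
    return flops / bytes_moved


def gemm_intensity(n, bytes_per_elem=8):
    """C = A @ B: 2*n**3 FLOPs; A and B are read, C is written."""
    flops = 2 * n ** 3
    bytes_moved = bytes_per_elem * 3 * n * n
    return flops / bytes_moved


for n in (256, 1024, 4096):
    print(n, round(gemv_intensity(n), 3), round(gemm_intensity(n), 1))
```

Under these assumptions matrix-vector multiplication stays just below 0.25 FLOP/byte for any n, while matrix-matrix multiplication grows as n/12 (about 85 FLOP/byte at n = 1024). On a device whose ratio of peak FLOP/s to memory bandwidth is well above 0.25 FLOP/byte, as is typical of GPUs, GEMV is therefore limited by bandwidth rather than arithmetic.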
Sep, 12

Hotspot Analysis Based Partial CUDA Acceleration of HMMER 3.0 on GPGPUs

With the introduction of many-core GPUs, there is widespread interest in using GPUs to accelerate non-graphics applications such as bioinformatics, energy, finance and several research areas. Even though the GPUs provide highly parallel processing capability, the communication interface between CPU and GPU could be a performance bottleneck due to heavy data transfer. If data transfer […]
Sep, 12

Fast Determination of the Number of Endmembers for Real-Time Hyperspectral Unmixing on GPUs

Spectral unmixing is a very important task for remotely sensed hyperspectral data exploitation. It amounts to identifying a set of spectrally pure components (called endmembers) and their associated per-pixel coverage fractions (called abundances). A challenging problem in spectral unmixing is how to determine the number of endmembers in a given scene. Several automatic techniques exist […]
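The definitions of endmembers and abundances can be illustrated under the widely used linear mixing model, in which each pixel spectrum is an abundance-weighted sum of endmember spectra, with non-negative fractions that sum to one. The linear model is an assumption here, not necessarily the one used in the paper. With two known endmembers the constrained least-squares abundance has a closed form, computed by the hypothetical function unmix_two; the spectra are made-up values.

```python
# Hypothetical sketch: abundance estimation for two known endmembers under the
# linear mixing model  pixel ~= a * e1 + (1 - a) * e2,  with 0 <= a <= 1.
# Assumption: the linear model and the two endmember spectra are given.


def unmix_two(pixel, e1, e2):
    """Return abundance fractions (a, 1 - a) minimizing ||pixel - a*e1 - (1-a)*e2||^2."""
    d = [p - q for p, q in zip(e1, e2)]      # e1 - e2
    r = [p - q for p, q in zip(pixel, e2)]   # pixel - e2
    denom = sum(x * x for x in d)
    if denom == 0.0:
        raise ValueError("endmembers must be distinct")
    a = sum(x * y for x, y in zip(r, d)) / denom  # unconstrained 1-D least squares
    # For a 1-D convex quadratic, clipping gives the constrained minimizer, which
    # enforces non-negativity; sum-to-one holds by construction.
    a = min(1.0, max(0.0, a))
    return a, 1.0 - a


e1 = [0.1, 0.5, 0.9, 0.3]   # made-up 4-band spectra, for illustration only
e2 = [0.6, 0.2, 0.1, 0.8]
mixed = [0.3 * p + 0.7 * q for p, q in zip(e1, e2)]
print(unmix_two(mixed, e1, e2))
```

Determining how many endmembers are present, the problem the paper targets, is harder; this sketch simply assumes the endmembers are already known.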
Sep, 12

Mapping a Dataflow Programming Model onto Heterogeneous Architectures

This thesis describes and evaluates how extending Intel’s Concurrent Collections (CnC) programming model can address the problem of hybrid programming with high performance and low energy consumption, while retaining the ease of use of data-flow programming. The CnC model is a declarative, dynamic, lightweight, task-based parallel programming model and is implicitly deterministic by enforcing […]
Sep, 12

Medusa: Simplified Graph Processing on GPUs

Graphs are the de facto data structures for many applications, and efficient graph processing is essential for application performance. Recently, the graphics processing unit (GPU) has been adopted to accelerate various graph processing algorithms such as BFS and shortest path. However, it is difficult to write correct and efficient GPU programs and even […]
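BFS, one of the algorithms named above, is commonly parallelized on GPUs in a level-synchronous way: all vertices of the current frontier are expanded together before the next level starts. The sequential sketch below shows that structure over a graph stored in compressed sparse row (CSR) form. It is not Medusa's programming interface; the function name and the CSR layout are assumptions for illustration.

```python
# Hypothetical sketch: level-synchronous BFS over a CSR graph.
# row_offsets[v] .. row_offsets[v + 1] delimits v's out-edges in col_indices.


def bfs_levels(row_offsets, col_indices, source):
    """Return the BFS depth of every vertex from source (-1 if unreachable)."""
    n = len(row_offsets) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # A GPU version would typically map this loop to one thread per frontier
        # vertex (or per edge) and build next_frontier in parallel.
        for v in frontier:
            for e in range(row_offsets[v], row_offsets[v + 1]):
                w = col_indices[e]
                if level[w] == -1:  # first visit fixes the depth
                    level[w] = depth
                    next_frontier.append(w)
        frontier = next_frontier
    return level


# Edges 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is isolated.
print(bfs_levels([0, 2, 3, 4, 5, 5, 5], [1, 2, 3, 3, 4], 0))  # prints [0, 1, 1, 2, 3, -1]
```

In a parallel version the check-and-set on level[w] must be made atomic, or the code must tolerate benign duplicates in the next frontier, because several threads can discover the same vertex at once.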
Sep, 12

Dynamical heterogeneities as fingerprints of a backbone structure in Potts models

We investigate slow non-equilibrium dynamical processes in the two-dimensional $q$-state Potts model with both ferromagnetic and $\pm J$ couplings. Dynamical properties are characterized by means of the mean-flipping time distribution. This quantity is known for clearly unveiling dynamical heterogeneities. Using a two-times protocol, we characterize the different time scales observed and relate them to growth processes […]
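A toy version of the quantity described above may help fix ideas. The sketch assumes Metropolis dynamics with random sequential updates on a periodic L × L lattice, couplings drawn as ±1 with equal probability (only the $\pm J$ case, not the ferromagnetic one), and defines a site's mean flipping time as the number of Monte Carlo sweeps divided by the number of times that spin changed state. The function name, all parameter values, and this definition are assumptions; the authors' protocol may differ.

```python
import math
import random

# Hypothetical sketch: Metropolis dynamics of a 2D q-state Potts model with
# random +/-J couplings, H = -sum_<ij> J_ij * delta(s_i, s_j), periodic L x L.
# Assumption: mean flipping time of a site = sweeps / (number of its flips).


def potts_mean_flip_times(L=8, q=3, T=0.5, sweeps=200, seed=1):
    rng = random.Random(seed)
    s = [[rng.randrange(q) for _ in range(L)] for _ in range(L)]
    # Jr[i][j]: bond to the right neighbour; Jd[i][j]: bond to the neighbour below.
    Jr = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    Jd = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    flips = [[0] * L for _ in range(L)]

    def local_energy(i, j, state):
        """Energy of the four bonds at (i, j) if that site held `state`."""
        nbrs = (
            (Jr[i][j], s[i][(j + 1) % L]),
            (Jr[i][(j - 1) % L], s[i][(j - 1) % L]),
            (Jd[i][j], s[(i + 1) % L][j]),
            (Jd[(i - 1) % L][j], s[(i - 1) % L][j]),
        )
        return -sum(J for J, t in nbrs if t == state)

    for _ in range(sweeps):
        for _ in range(L * L):  # one sweep = L*L random single-site attempts
            i, j = rng.randrange(L), rng.randrange(L)
            new = rng.randrange(q - 1)
            if new >= s[i][j]:
                new += 1  # uniform proposal over the q - 1 other states
            dE = local_energy(i, j, new) - local_energy(i, j, s[i][j])
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                s[i][j] = new
                flips[i][j] += 1
    # math.inf marks sites that never flipped during the run.
    return [sweeps / f if f else math.inf for row in flips for f in row]


times = potts_mean_flip_times()
frozen = sum(1 for t in times if t == math.inf)
print(len(times) - frozen, "of", len(times), "sites flipped at least once")
```

Histogramming the returned times gives a mean-flipping time distribution; a broad distribution, with some sites flipping rarely or never, would be the kind of dynamical heterogeneity discussed above.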

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
