Posts

Sep, 8

GPU Computation in Bioinspired Algorithms: A Review

Bioinspired methods usually need a large amount of computational resources. For this reason, parallelization is an interesting alternative for decreasing execution time while still providing accurate results. Recently there has been growing interest in developing parallel algorithms using graphics processing units (GPUs), also referred to as GPU computation. Advances […]
Sep, 8

Towards GPGPU Assisted Computing in Virtualized Environments

General Purpose Computation on Graphics Processing Units (GPGPU) makes it possible to use the massive computing power of modern graphics cards for generic high-performance computing. However, the new virtualization technologies will typically not support high-performance graphics cards, and as a consequence GPGPU resources cannot be used in typical virtualization setups. In this paper we […]
Sep, 8

Implementing Independent Component Analysis in General-Purpose GPU Architectures

New computational architectures, such as multi-core processors and graphics processing units (GPUs), pose challenges to application developers. Although in the case of general-purpose GPU programming, environments and toolkits such as CUDA and OpenCL have simplified application development, different ways of thinking about memory access, storage, and program execution are required. This paper presents a strategy […]
Sep, 8

Automatic OpenCL Device Characterization: Guiding Optimized Kernel Design

The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using a single unified programming interface and language. While the standard guarantees portability of functionality for complying applications and platforms, performance portability on such a diverse set of hardware is limited. Devices may vary significantly in memory architecture as well as […]
Sep, 8

Accelerating Clustering Coefficient Calculations on a GPU Using OPENCL

The growth in multicore CPUs and the emergence of powerful manycore GPUs have led to a proliferation of parallel applications. Many applications are not straightforward to parallelize. This paper examines the performance of a parallelized implementation for calculating measurements of complex networks. We present an algorithm for calculating a topological feature of complex networks, the clustering coefficient, […]
Sep, 7

Pegasus: coordinated scheduling for virtualized accelerator-based systems

Heterogeneous multi-cores (platforms comprised of both general-purpose and accelerator cores) are becoming increasingly common. While applications wish to freely utilize all cores present on such platforms, operating systems continue to view accelerators as specialized devices. The Pegasus system described in this paper uses an alternative approach that offers a uniform resource usage model for all cores […]
Sep, 7

GPU-Based approaches for multiobjective local search algorithms. A case study: the flowshop scheduling problem

Multiobjective local search algorithms are efficient methods for solving complex problems in science and industry. Even though these heuristics significantly reduce the computational time needed to explore the solution search space, that cost remains exorbitant when very large problem instances are to be solved. As a result, the use of graphics processing units […]
Sep, 7

Automatic CPU-GPU communication management and optimization

The performance benefits of GPU parallelism can be enormous, but unlocking this performance potential is challenging. The applicability and performance of GPU parallelizations are limited by the complexities of CPU-GPU communication. To address these communication problems, this paper presents the first fully automatic system for managing and optimizing CPU-GPU communication. This system, called the CPU-GPU […]
Sep, 7

High performance computation and interactive display of molecular orbitals on GPUs and multi-core CPUs

The visualization of molecular orbitals (MOs) is important for analyzing the results of quantum chemistry simulations. The functions describing the MOs are computed on a three-dimensional lattice, and the resulting data can then be used for plotting isocontours or isosurfaces for visualization as well as for other types of analyses. Existing software packages that render […]
Sep, 7

MacroSS: macro-SIMDization of streaming applications

SIMD (Single Instruction, Multiple Data) engines are an essential part of the processors in various computing markets, from servers to the embedded domain. Although SIMD-enabled architectures have the capability of boosting the performance of many application domains by exploiting data-level parallelism, it is very challenging for compilers and also programmers to identify and transform parts […]
Sep, 7

CUDA-level performance with python-level productivity for Gaussian mixture model applications

Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large computationally intensive problems must eventually be recoded in a low-level language such as C or Fortran by expert programmers in order to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware […]
Sep, 7

Chameleon: Virtualizing idle acceleration cores of a heterogeneous multicore processor for caching and prefetching

Heterogeneous multicore processors have emerged as an energy- and area-efficient architectural solution to improving performance for domain-specific applications such as those with a plethora of data-level parallelism. These processors typically contain a large number of small, compute-centric cores for acceleration while keeping one or two high-performance ILP cores on the die to guarantee single-thread performance. […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
