
Posts

Mar, 29

Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System’s Perspective

Multicore machines equipped with accelerators are becoming increasingly popular in the High Performance Computing ecosystem. Hybrid architectures provide significantly improved energy efficiency, so they are likely to become widespread in the Manycore era. However, the complexity these architectures introduce directly affects programmability, making it crucial to provide portable abstractions […]
Mar, 29

A computing origami: Optimized code generation for emerging parallel platforms

This thesis deals with code generation for parallel applications on emerging platforms, in particular FPGA and GPU-based platforms. These platforms expose a large design space, throughout which performance is affected by significant architectural idiosyncrasies. In this context, generating efficient code is a global optimization problem. The code generation methods described in this thesis apply to […]
Mar, 29

Multicore Processing for Clustering Algorithms

Data mining algorithms such as classification and clustering are central to the future of computing, though they require multidimensional data processing. Multicore processors are now commonly used together with GPUs. Most programming languages do not provide multiprocessing facilities, so processing resources are wasted. Clustering and classification algorithms are particularly resource-intensive. In this paper we present strategies […]
Mar, 29

A Massively Parallel Approach for Nonlinear Interdependency Analysis of Multivariate Signals with GPGPU

Nonlinear interdependency (NLI) analysis is an effective method for measuring synchronization among brain regions, which is an important feature of both normal and abnormal brain function. However, its practical application has long been hampered by the extremely high computational complexity of NLI algorithms. We developed a massively parallel approach to address this problem. The […]
Mar, 29

Machine Learning for Predictive Auto-Tuning with Boosted Regression Trees

Auto-tuning is a widely used and effective technique for optimizing a parametrized GPU code template for a particular computation on particular hardware. Its drawback is that thorough or exhaustive auto-tuning requires compiling many kernels and calling each one many times; this process is slow. Furthermore, library abstraction boundaries provide operations such as image filtering and […]
Mar, 28

Auto-tuning a High-Level Language Targeted to GPU Codes

Determining the best set of optimizations to apply to a kernel to be executed on the graphics processing unit (GPU) is a challenging problem. There are large sets of possible optimization configurations that can be applied, and many applications have multiple kernels. Each kernel may require a specific configuration to achieve the best performance, and […]
Mar, 28

Accelerating the FDTD Method Using SSE and Graphics Processing Units

The Finite-Difference Time-Domain (FDTD) method is a computational technique for modelling the behaviour of electromagnetic waves in three-dimensional space. When executed to solve real-world problems the FDTD method is characterised by long execution times involving a large amount of data organised into matrices. The FDTD method exhibits ample data parallelism, and parallel computing techniques are […]
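To make the data-parallelism claim concrete, here is a minimal one-dimensional FDTD sketch in normalized units (Courant number 1, free space, a hard Gaussian source). The function name `fdtd_1d`, the grid size and the source shape are illustrative assumptions, not the authors' SSE or GPU implementation. Within each half-step, every cell's update reads only values of the other field, so each inner loop could in principle be split across SIMD lanes or GPU threads.

```python
import math


def fdtd_1d(n_cells, n_steps, source_pos):
    """Hypothetical 1D FDTD (Yee) loop in normalized units with Courant number 1."""
    ez = [0.0] * n_cells  # electric field samples
    hy = [0.0] * n_cells  # magnetic field samples, staggered by half a cell
    for t in range(n_steps):
        # Update H from the spatial difference of E. Each hy[i] reads only ez,
        # so every iteration of this loop is independent (data parallel).
        for i in range(n_cells - 1):
            hy[i] += ez[i + 1] - ez[i]
        # Update E from the spatial difference of H; again independent per cell.
        for i in range(1, n_cells):
            ez[i] += hy[i] - hy[i - 1]
        # Hard source: overwrite one E sample with a Gaussian pulse (assumed shape).
        ez[source_pos] = math.exp(-(((t - 30) / 10) ** 2))
    return ez, hy


if __name__ == "__main__":
    ez, hy = fdtd_1d(n_cells=200, n_steps=20, source_pos=50)
    # With Courant number 1 the disturbance advances at most one cell per step,
    # so cells far from the source are still zero after 20 steps.
    print(max(abs(v) for v in ez[100:]))
```

Real solvers work in three dimensions on six field components, but the structure is the same: alternating sweeps of independent per-cell updates over large arrays.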
Mar, 28

Systematic construction, verification and implementation methodology for LDPC codes

In this article, a novel and systematic low-density parity-check (LDPC) code construction, verification and implementation methodology is proposed. The methodology consists of a simulated-annealing-based LDPC code constructor, a GPU-based high-speed code selector, an ant-colony-optimization-based pipeline scheduler and an FPGA-based hardware implementer. Compared with traditional approaches, this methodology […]
Mar, 28

Fast, parallel and secure cryptography algorithm using Lorenz’s attractor

A novel cryptography method based on the Lorenz attractor chaotic system is presented. The proposed algorithm is secure and fast, making it practical for general use. We introduce the chaotic operation mode, which provides an interaction among the password, the message and a chaotic system. It ensures that the algorithm yields a secure codification, even if […]
Mar, 28

Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems

We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and multi-GPU systems to support dense matrix computations efficiently. The main idea is that we treat a heterogeneous system as a distributed-memory machine, and use a heterogeneous multi-level block cyclic distribution method to allocate data to the host and […]
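The idea of spreading matrix blocks over devices of unequal speed can be illustrated with a weighted one-dimensional block-cyclic mapping. This is a simplified, hypothetical sketch: the function `weighted_block_cyclic` and the integer speed weights are assumptions, and it does not reproduce the paper's multi-level scheme. Each device appears in the repeating pattern as many times as its weight, so a GPU with weight 3 receives three blocks for every one assigned to a CPU with weight 1.

```python
def weighted_block_cyclic(n_blocks, weights):
    """Map block indices 0..n_blocks-1 to device ids with a weighted cyclic pattern.

    weights[d] is a hypothetical positive integer proportional to device d's speed.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive integers")
    # One cycle lists device d exactly weights[d] times, in device order.
    pattern = [d for d, w in enumerate(weights) for _ in range(w)]
    # Block b is owned by the device at position (b mod cycle length).
    return [pattern[b % len(pattern)] for b in range(n_blocks)]


if __name__ == "__main__":
    # One CPU (weight 1) and one GPU (weight 3).
    print(weighted_block_cyclic(8, [1, 3]))  # [0, 1, 1, 1, 0, 1, 1, 1]
    # Equal weights reduce to the classic block-cyclic rule owner(b) = b mod P.
    print(weighted_block_cyclic(7, [1, 1, 1]))  # [0, 1, 2, 0, 1, 2, 0]
```

Grouping each device's share into consecutive blocks keeps its data contiguous within a cycle; interleaving the pattern instead would be an equally valid design choice.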
Mar, 27

Improving Performance of OpenCL on CPUs

Data-parallel languages like OpenCL and CUDA are an important means to exploit the computational power of today’s computing devices. In this paper, we deal with two aspects of implementing such languages on CPUs: First, we present a static analysis and an accompanying optimization to exclude code regions from control-flow to data-flow conversion, which is the […]
Mar, 27

Practical and Theoretical Aspects of a Parallel Twig Join Algorithm for XML Processing using a GPGPU

With growing amounts of data and increasing demand for fast query processing, efficient database operations remain a challenge. A common approach is to leverage parallel hardware platforms. With the introduction of general-purpose GPU (Graphics Processing Unit) computing, massively parallel hardware has become available in commodity systems. XML is based on […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
