Posts

May, 10

Fine-grain Task Aggregation and Coordination on GPUs

In general-purpose graphics processing unit (GPGPU) computing, data is processed by concurrent threads executing the same function. This model, dubbed single-instruction/multiple-thread (SIMT), requires programmers to coordinate the synchronous execution of similar operations across thousands of data elements. To alleviate this programmer burden, Gaster and Howes outlined the channel abstraction, which facilitates dynamically aggregating asynchronously produced […]
May, 10

A Study of the Parallelization of Hybrid SAT Solver using CUDA

A SAT solver is an algorithm that finds a solution to a given problem expressed in CNF (Conjunctive Normal Form). Recently, SAT solver studies have focused on cryptography. The purpose of this paper is to construct the framework of a parallel SAT solver that can be applied to cryptanalysis. First, we transform an […]
May, 10

GAMUT: GPU accelerated microRNA analysis to uncover target genes through CUDA-miRanda

BACKGROUND: Non-coding sequences such as microRNAs have important roles in disease processes. Computational microRNA target identification (CMTI) is becoming increasingly important since traditional experimental methods for target identification pose many difficulties. These methods are time-consuming, costly, and often need guidance from computational methods to narrow down candidate genes anyway. However, most CMTI methods are computationally […]
May, 10

3D Objects Tracking by GPGPU-Enhanced Particle Filter Algorithms

Object tracking methods have been widely used in video surveillance, motion monitoring, robotics, and other fields. The particle filter is one of the promising methods, but it is difficult to apply to real-time object tracking because of its high computational cost. In order to reduce the processing cost without sacrificing the tracking […]
May, 9

Automatic Scheduling of Compute Kernels Across Heterogeneous Architectures

The world of high-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors due to the power, memory and instruction-level parallelism walls. All trends point towards increased processor heterogeneity as a means for increasing application performance, from smartphones to servers. These various architectures are designed for different types […]
May, 9

Applying Source Level Auto-Vectorization to Aparapi Java

Ever since chip manufacturers hit the power wall preventing them from increasing processor clock speed, there has been an increased push towards parallelism for performance improvements. This parallelism comes in the form of both data parallel single instruction multiple data (SIMD) instructions, as well as parallel compute cores in both central processing units (CPUs) and […]
May, 9

Acceleration of LSB Algorithm in GPU

This paper presents a method for accelerating the LSB (Least Significant Bit) algorithm on the GPU (Graphics Processing Unit) using a programming model called CUDA. CUDA is a state-of-the-art parallel computing architecture developed by NVIDIA. CUDA allows programmers to access the GPU directly by invoking kernels. In Image Steganography, parallelization of computations to a […]
May, 9

GPU Implementation of Parallel Support Vector Machine Algorithm with Applications to Detection Intruder

Network anomaly detection technology based on the support vector machine (SVM) can efficiently detect unknown attacks or variants of known attacks; however, it cannot be used to detect large-scale intrusion scenarios because of its computational time demands. The graphics processing unit (GPU) offers multithreading and powerful parallel processing capability. Based […]
May, 9

Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models

We present a machine learning framework for modeling protein dynamics. Our approach uses L1-regularized, reversible hidden Markov models to understand large protein datasets generated via molecular dynamics simulations. Our model is motivated by three design principles: (1) the requirement of massive scalability; (2) the need to adhere to relevant physical law; and (3) the necessity […]
May, 9

7th International Conference on Advanced Computer Theory and Engineering, ICACTE 2014

Submission Deadline: 2014-06-05. Publication: All accepted papers of ICACTE 2014 will be published in the conference proceedings, under an ISBN reference by ASME Press, which will be included in the ASME Digital Library, and the publisher will send the proceedings to be reviewed by Ei Compendex, ISI Proceedings and other major indexing services. Call […]
May, 7

Managing the Topology of Heterogeneous Cluster Nodes with Hardware Locality (hwloc)

Modern computing platforms are increasingly complex, with multiple cores, shared caches, and NUMA architectures. Parallel application developers have to take locality into account before they can expect good efficiency on these platforms. Thus there is a strong need for a portable tool that gathers and exposes this information. The Hardware Locality project (hwloc) offers a tree […]
May, 7

3D data denoising via Non-Local means filter by using parallel GPU strategies

The Non-Local Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity has led researchers to develop parallel programming approaches and to use massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times by […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors