Posts

May, 9

Automatic Scheduling of Compute Kernels Across Heterogeneous Architectures

The world of high-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors due to the power, memory and instruction-level parallelism walls. All trends point towards increased processor heterogeneity as a means for increasing application performance, from smartphones to servers. These various architectures are designed for different types […]
May, 9

Applying Source Level Auto-Vectorization to Aparapi Java

Ever since chip manufacturers hit the power wall preventing them from increasing processor clock speed, there has been an increased push towards parallelism for performance improvements. This parallelism comes in the form of both data parallel single instruction multiple data (SIMD) instructions, as well as parallel compute cores in both central processing units (CPUs) and […]
May, 9

Acceleration of LSB Algorithm in GPU

This paper presents a method for accelerating the LSB (Least Significant Bit) algorithm on a GPU (Graphics Processing Unit) using the CUDA programming model. CUDA is a state-of-the-art parallel computing architecture developed by NVIDIA. CUDA allows programmers to access the GPU directly by invoking kernels. In image steganography, parallelization of computations to a […]
May, 9

GPU Implementation of Parallel Support Vector Machine Algorithm with Applications to Detection Intruder

Network anomaly detection technology based on the support vector machine (SVM) can efficiently detect unknown attacks or variants of known attacks; however, it cannot be used to detect large-scale intrusion scenarios because of the computation time it demands. The graphics processing unit (GPU) offers multithreading and powerful parallel processing capability. Based […]
May, 9

Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models

We present a machine learning framework for modeling protein dynamics. Our approach uses L1-regularized, reversible hidden Markov models to understand large protein datasets generated via molecular dynamics simulations. Our model is motivated by three design principles: (1) the requirement of massive scalability; (2) the need to adhere to relevant physical law; and (3) the necessity […]
May, 9

7th International Conference on Advanced Computer Theory and Engineering, ICACTE 2014

Submission Deadline: 2014-06-05. Publication: All accepted papers of ICACTE 2014 will be published in the conference proceedings, under an ISBN reference by ASME Press, which will be included in the ASME Digital Library, and the publisher will send the proceedings for review by Ei Compendex, ISI Proceedings and other major indexing services. Call […]
May, 7

Managing the Topology of Heterogeneous Cluster Nodes with Hardware Locality (hwloc)

Modern computing platforms are increasingly complex, with multiple cores, shared caches, and NUMA architectures. Parallel application developers have to take locality into account before they can expect good efficiency on these platforms. Thus there is a strong need for a portable tool that gathers and exposes this information. The Hardware Locality project (hwloc) offers a tree […]
May, 7

3D data denoising via Non-Local means filter by using parallel GPU strategies

The Non-Local Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity has led researchers to develop parallel programming approaches and to use massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times by […]
May, 7

Simulation of earthquake sloshing loads in a nuclear reactor

Modelling of sloshing flow inside a Lead-cooled Fast Nuclear Reactor during an earthquake is conducted, focusing on the evaluation of the loads caused by the fluid on the structure. AQUAgpusph, a free, OpenCL-accelerated SPH code, has been used. This tool is analysed, including a performance comparison with some available GPU-accelerated SPH codes, […]
May, 7

Learning Sparse Recurrent Neural Networks in Language Modeling

In the context of statistical language modeling, we explored the task of learning an Elman network with sparse weight matrices, as a pilot study towards learning a sparsely connected fully recurrent neural network, which would be potentially useful in many cases. We also explored how efficient and scalable it can be in practice. In particular, […]
May, 7

Evolution of a double-front Rayleigh-Taylor system using a GPU-based high resolution thermal Lattice-Boltzmann model

We study the turbulent evolution originating from a system subjected to a Rayleigh-Taylor instability with a double density at high resolution in a two-dimensional geometry using a highly optimized thermal Lattice Boltzmann code for GPUs. The novelty of our investigation stems from the initial condition, given by the superposition of three layers with three […]
May, 7

Scaling Performance of FFT Computation on an Industrial Integrated GPU Co-processor: Experiments with Algorithm Adaptation

Recent Intel processors (IvyBridge, Haswell) contain an embedded on-chip GPU in addition to the main CPU. In this work we consider the issue of efficiently mapping Fast Fourier Transform computation onto such coprocessor units. To achieve this, we pursue three goals. First, we want to study half-systematic ways to adjust the actual variant […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors