
Posts

Jun, 10

CELES: CUDA-accelerated simulation of electromagnetic scattering by large ensembles of spheres

CELES is a freely available MATLAB toolbox to simulate light scattering by many spherical particles. Aiming at high computational performance, CELES leverages block-diagonal preconditioning, a lookup-table approach to evaluate costly functions, and massively parallel execution on NVIDIA graphics processing units using the CUDA computing platform. The combination of these techniques makes it possible to efficiently address large […]
Jun, 5

Neneta: Heterogeneous Computing Complex-Valued Neural Network Framework

Due to increased demand for computational efficiency in the training, validation and testing of artificial neural networks, many open source software frameworks have emerged. The GPU programming model of choice in such frameworks is almost exclusively CUDA. Also symptomatic is the lack of support for complex-valued neural networks. With our research going exactly in that […]
Jun, 5

Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters

Traditional speedup models, such as Amdahl’s, facilitate the study of the impact of running parallel workloads on manycore systems. However, these models are typically based on software characteristics, assuming ideal hardware behaviors. As such, the applicability of these models for energy and/or performance-driven system optimization is limited by two factors. Firstly, speedup cannot be measured […]
Jun, 5

Program Acceleration in a Heterogeneous Computing Environment Using OpenCL, FPGA, and CPU

Reaching the so-called "performance wall" in 2004 inspired innovative approaches to performance improvement. Parallel programming, distributed computing, and System on a Chip (SoC) design drove change. Hardware acceleration in mainstream computing systems brought significant improvement in the performance of applications targeted directly to a specific hardware platform. Targeting a single hardware platform, however, typically requires […]
Jun, 5

UT-OCL: An OpenCL Framework for Embedded Systems Using Xilinx FPGAs

The number of heterogeneous components on a System-on-Chip (SoC) has continued to increase. Software developers leverage these heterogeneous systems by using high-level languages to enable the execution of applications. For the application to execute correctly, hardware support for features and constructs of the programming model needs to be incorporated into the system. OpenCL is a […]
Jun, 5

A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit

Multi-start algorithms are a common and effective tool for metaheuristic searches. In this paper we amplify multi-start capabilities by employing the parallel processing power of the graphics processing unit (GPU) to quickly generate a diverse starting set of solutions for the Unconstrained Binary Quadratic Optimization Problem, which are evaluated and used to implement screening methods […]
Jun, 1

Fast MPEG-CDVS Encoder with GPU-CPU Hybrid Computing

The Compact Descriptors for Visual Search (CDVS) standard from the ISO/IEC Moving Picture Experts Group (MPEG) has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of the CDVS encoder unfortunately hinders its wide deployment in industry for large-scale visual search. In […]
Jun, 1

A performance spectrum for parallel computational frameworks that solve PDEs

Important computational physics problems are often large-scale in nature, and it is highly desirable to have robust, high-performing computational frameworks that can quickly address these problems. However, it is no trivial task to determine whether a computational framework is performing efficiently or is scalable. The aim of this paper is to present various […]
Jun, 1

SparkJNI: A Reference Design for a Heterogeneous Apache Spark Framework

The digital era’s requirements pose many challenges related to deployment, implementation and efficient resource utilization in modern hybrid computing infrastructures. In light of the recent improvements in computing units, the de facto structure of a high-performance computing cluster, ordinarily consisting of CPUs only, is superseded by heterogeneous architectures (comprised of GPUs, FPGAs and DSPs) which offer […]
Jun, 1

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even higher demand on fast convolution. The high computation throughput and memory bandwidth of graphics processing units (GPUs) make GPUs a natural choice for accelerating […]
Jun, 1

A Unified Optimization Approach for Sparse Tensor Operations on GPUs

Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly-optimized GPU-based implementations of sparse tensor operations are rare. The irregular computation patterns and sparsity structures as well as the large memory footprints of sparse tensor operations make […]
May, 24

Accelerating Discrete Wavelet Transforms on GPUs

The two-dimensional discrete wavelet transform has a huge number of applications in image processing. Several papers have so far compared the performance of this transform on graphics processing units (GPUs). However, all of them dealt only with lifting and convolution computation schemes. In this paper, we show that corresponding horizontal and vertical lifting parts of the […]

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
