Posts

Apr, 2

A characterization and analysis of PTX kernels

General purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effective approach for accelerating data- and compute-intensive applications. It has been driven by the introduction of C-based programming environments such as NVIDIA’s CUDA, OpenCL, and Intel’s Ct. While significant effort has been focused on developing and evaluating applications and software tools, comparatively […]
Apr, 2

Parallel computing with CUDA

Summary form only given. NVIDIA’s CUDA architecture provides a powerful platform for writing highly parallel programs. By providing simple abstractions for hierarchical thread organization, memories, and synchronization, the CUDA programming model allows programmers to write scalable programs without the burden of learning a multitude of new programming constructs. The CUDA architecture can support many languages […]
Apr, 1

Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures

New high performance computing (HPC) applications must now cope with scalability across an increasing number of nodes and with the programming of special accelerator hardware. The hybrid composition of large computing systems adds a new dimension of complexity to software development. This paper presents a novel approach to gain insight into accelerator interaction and utilization without […]
Apr, 1

Message Passing Interface support for the runtime adaptive multi-processor system-on-chip RAMPSoC

Parallel processor architectures are a promising solution to provide the required computing performance for current and future high performance applications. Certainly, the impact on the computational power of such a parallel computer system is related to the inherent parallelism of the algorithm to be implemented. The implementation of an algorithm onto a parallel computer architecture, […]
Apr, 1

A Dynamic Resource Management and Scheduling Environment for Embedded Multimedia and Communications Platforms

We present a framework, OpenCLosE, for dynamic resource management and scheduling of applications written in Open Computing Language (OpenCL) for heterogeneous multimedia and graphics platforms, such as those found in multimedia smartphones and automotive infotainment clusters. We describe the design of a resource manager and master scheduler for the OpenCLosE environment that allows efficient realization […]
Apr, 1

Poster: GPU-accelerated artificial neural network for QSAR modeling

Here, we present a GPU-accelerated OpenCL implementation of a back-propagation artificial neural network for the creation of QSAR models for drug discovery and virtual high-throughput screening. A QSAR model for HSD achieved an enrichment of 5.9 and area under the curve of 0.83 on an independent data set which signifies sufficient predictive ability for virtual […]
Apr, 1

Efficient PageRank and SpMV Computation on AMD GPUs

Google’s famous PageRank algorithm is widely used to determine the importance of web pages in search engines. Given the large number of web pages on the World Wide Web, efficient computation of PageRank becomes a challenging problem. We accelerated the power method for computing PageRank on AMD GPUs. The core component of the power method […]
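
The power method named in this abstract can be sketched on the CPU as follows. This is a minimal illustration, not the authors' AMD GPU implementation: the function name `pagerank`, the damping factor of 0.85, the tolerance, and the adjacency-list input format are all assumptions made for the example. Each iteration amounts to one sparse matrix-vector product (the SpMV of the title), written here as a loop over outgoing links.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-method PageRank over an adjacency list (hypothetical helper).

    links maps each node to the list of nodes it links to.
    damping=0.85 is a conventional choice, not a value from the abstract.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Sparse matrix-vector product: push each node's rank along its links.
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

Calling `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` returns a dictionary of ranks that sums to 1, with `c` ranked above `b` because it receives links from both other pages.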
Apr, 1

Optimizing Smith-Waterman algorithm on Graphics Processing Unit

Local sequence alignment is an important task in bioinformatics. The most widely used algorithm, Smith-Waterman, has quadratic time complexity, which is time-consuming, especially in large biological database searches. Many attempts have been made to accelerate Smith-Waterman using parallel architectures. In this paper, a parallel implementation of the Smith-Waterman algorithm will be presented. This […]
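
The quadratic cost mentioned above comes from filling a score table with one cell per pair of positions in the two sequences. A minimal serial sketch is shown below; it is not the paper's GPU implementation, and the function name `smith_waterman_score`, the scoring values (match 2, mismatch -1, gap -1), and the linear gap penalty are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (hypothetical helper).

    Uses a linear gap penalty and keeps only two rows of the table,
    so time is O(len(a) * len(b)) and memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Cells on the same anti-diagonal of the table do not depend on each other, which is the property parallel implementations typically exploit.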
Mar, 28

Programming Massively Parallel Architectures using MARTE: a Case Study

Nowadays, several industrial applications are being ported to parallel architectures. These applications take advantage of the potential parallelism provided by multiple core processors. Many-core processors, especially GPUs (Graphics Processing Units), have led the race of floating-point performance since 2003. While the performance improvement of general-purpose microprocessors has slowed significantly, the GPUs have continued to […]
Mar, 18

Using Parallel Computing for the Display and Simulation of the Space Debris Environment

Parallelism is becoming the leading paradigm in today’s computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating […]
Mar, 17

Language virtualization for heterogeneous parallel computing

As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low-level programming models (e.g. OpenMP, MPI, CUDA, OpenCL). However, these models do little to shield developers from the difficult problems of parallelization, data decomposition and machine-specific details. Most programmers are having a difficult time using these programming models […]
Mar, 9

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

Ocelot is a dynamic compilation framework designed to map the explicitly data parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (PTX) to many-core processors that leverages the Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. […]

* * *

* * *

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors

Contact us: