
Posts

Mar, 7

2014 the 4th International Conference on Computer and Communication Devices, ICCCD 2014

2014-05-20 All accepted papers will be published in a volume of the International Journal of Computer Theory and Engineering (IJCTE) and will be included in the Electronic Journals Library, EBSCO, Engineering & Technology Digital Library, Google Scholar, INSPEC, Ulrich’s Periodicals Directory, Crossref, ProQuest, WorldCat, and EI (INSPEC, IET). Topics: Analog and Mixed-Signal IC Design and Testing; RF […]
Mar, 6

OCCA: A unified approach to multi-threading languages

The inability to predict lasting languages and architectures led us to develop OCCA, a C++ library focused on host-device interaction. Using run-time compilation and macro expansions, the result is a novel single kernel language that expands to multiple threading languages. Currently, OCCA supports device kernel expansions for the OpenMP, OpenCL, and CUDA platforms. Computational results […]
Mar, 6

Hybrid Framework for pairwise DNA Sequence Alignment Using the CUDA compatible GPU

This paper provides a novel framework for accelerating the solution of the pairwise DNA sequence alignment problem using the CUDA parallel paradigm available on NVIDIA GPUs. The main idea is to implement a new algorithm that assigns different nucleotide weights on the GPU, then merges the matching subsequences on the CPU to get the optimum […]
Mar, 6

Code Optimization and Scaling of the Astrophysics Software Gadget on Intel Xeon Phi

This whitepaper reports our investigation into the porting, optimization and subsequent performance of the astrophysics software package GADGET on the Intel Xeon Phi. The GADGET code is intended for cosmological N-body/SPH simulations to solve a wide range of astrophysical tasks. The test cases within the project were simulations of galaxy systems. A performance analysis of […]
Mar, 6

Performance Tradeoff Spectrum of Integer and Floating Point Applications Kernels on Various GPUs

Floating point precision and performance and the ratio of floating point units to integer processing elements on a graphics processing unit accelerator all continue to present complex tradeoffs for optimising core utilisation on modern devices. We investigate various hybrid CPU and GPU combinations using a range of different GPU models occupying different points in this […]
Mar, 6

Performance Analysis for GPU-based Ray-triangle Algorithms

Several algorithms have been proposed during the past years to solve the ray-triangle intersection test. In this paper we collect the most prominent solutions and describe how to parallelize them on modern programmable graphics processing units (GPUs) by means of NVIDIA CUDA. This paper also provides a comprehensive performance analysis based on several optional features […]
Mar, 6

Efficient and Scalable Parallel Zonal Statistics on Large-Scale Species Occurrence Data on GPUs

Analyzing how species are distributed on the Earth has been one of the fundamental questions at the intersection of environmental sciences, geosciences and biological sciences. With world-wide data contributions, more than 375 million species occurrence records for nearly 1.5 million species have been deposited in the Global Biodiversity Information Facility (GBIF) data portal. The sheer […]
Mar, 4

2014 3rd International Conference on Knowledge and Education Technology, ICKET 2014

2014-05-01 All ICKET 2014 papers will be published in the International Journal of Information and Education Technology (ISSN: 2010-3689), and all papers will be indexed by Engineering & Technology Digital Library, Google Scholar, Crossref and ProQuest. Topics: Information Technology and Applications; Augmented and Virtual Reality; Computer Human Interaction; Cyber Security; Data Structure and Algorithm; Distributed and Parallel Computing […]
Mar, 4

QuickProbs – A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

Multiple sequence alignment is a crucial task in a number of biological analyses such as secondary structure prediction, domain searching, and phylogeny. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness is obtained at the expense of computational time. In this paper we present QuickProbs, a variant of MSAProbs customised for graphics processors. We […]
Mar, 4

On-Demand Source Code Generation & Scheduling Optimised Parallel Applications on Heterogeneous Platforms

Scheduling application tasks across heterogeneous clusters is a growing problem, particularly when new upgraded components are added to a parallel computing system that may have originally been homogeneous. We describe how automatic and just-in-time source code generation techniques can be used to make the best parallel decomposition for whatever resource is available in a heterogeneous […]
Mar, 4

Computational Experiments in Markov Chain Monte Carlo

In this thesis, I investigate computational questions in Markov chain Monte Carlo (MCMC). I study a new MCMC method called the stretch move ensemble sampler [3]. I examine the performance of this algorithm in terms of acceptance rates, autocorrelation time and compute performance. The thesis describes a parallel implementation of the algorithm […]
Mar, 4

Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis

GPGPU (general purpose computing on graphics processing units) programming is one interesting way to increase performance; unfortunately it is not easily done, because extensive knowledge of the GPU’s architecture is required to write programs that are faster than CPU programs. Obsidian is an embedded domain specific language for writing GPGPU kernels, which tries to make […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
