Posts
Apr, 14
Breadth First Search Vectorization on the Intel Xeon Phi
Breadth First Search (BFS) is a building block for graph algorithms and has recently been used for large scale analysis of information in a variety of applications including social networks, graph databases and web searching. Due to its importance, a number of different parallel programming models and architectures have been exploited to optimize the BFS. […]
Apr, 14
High-level GPU programming in Julia
GPUs are popular devices for accelerating scientific calculations. However, as GPU code is usually written in low-level languages, it breaks the abstractions of high-level languages popular with scientific programmers. To overcome this, we present a framework for CUDA GPU programming in the high-level Julia programming language. This framework compiles Julia source code for GPU execution, […]
Apr, 14
GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring
The Fisher vector has been widely used in many multimedia retrieval and visual recognition applications with good performance. However, its computational complexity prevents its use in real-time video monitoring. In this work, we proposed and implemented GPU-FV, a fast Fisher vector extraction method with the help of modern GPUs. The challenge of implementing Fisher vector on […]
Apr, 14
GPIC – GPU Power Iteration Cluster
This work presents a new clustering algorithm, the GPIC, a Graphics Processing Unit (GPU) accelerated algorithm for Power Iteration Clustering (PIC). Our algorithm is based on the original PIC proposal, adapted to take advantage of the GPU architecture while maintaining the algorithm's original properties. The proposed method was compared against the serial and parallel Spark implementation, […]
Apr, 14
A smooth particle hydrodynamics code to model collisions between solid, self-gravitating objects
Modern graphics processing units (GPUs) lead to a major increase in the performance of astrophysical simulations. Owing to the different nature of the GPU architecture compared to traditional central processing units (CPUs) such as the x86 architecture, existing numerical codes cannot be easily migrated to run on GPUs. Here, we present a new implementation […]
Apr, 12
CUED-RNNLM – An Open-Source Toolkit for Efficient Training and Evaluation of Recurrent Neural Network Language Models
In recent years, recurrent neural network language models (RNNLMs) have become increasingly popular for a range of applications including speech recognition. However, the training of RNNLMs is computationally expensive, which limits the quantity of data, and size of network, that can be used. In order to fully exploit the power of RNNLMs, efficient training implementations […]
Apr, 12
Efficient Parallel Implementation for Single Block Orthogonal Dictionary Learning
Dictionary training for sparse representations involves dealing with large chunks of data and complex algorithms that result in time-consuming tasks. In this paper we propose an improved parallel version of the single block orthogonal dictionary learning algorithm that reduces the representation error and improves the execution time. Our solution targets OpenCL-capable graphical device units […]
Apr, 12
Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing
Accelerators, such as Graphics Processing Units (GPUs), are popular components of modern parallel systems. Their energy-efficient performance makes them attractive components for modern data center nodes. However, they lack control for fair resource sharing amongst multiple users. This paper presents a runtime and Just In Time compiler that enables resource sharing control and software managed […]
Apr, 12
Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments
Massively data-parallel processors, Graphics Processing Units (GPUs) in particular, have recently entered the mainstream of general-purpose computing as powerful hardware accelerators for a wide range of applications including databases, medical informatics, and big data analytics. However, despite their performance benefit and cost effectiveness, the utilization of GPUs in production systems remains limited. A […]
Apr, 12
Real-Time Computation of Parameter Fitting and Image Reconstruction Using Graphical Processing Units
In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper […]
Apr, 9
GIFT: A Real-time and Scalable 3D Shape Search Engine
Projective analysis is an important solution for 3D shape retrieval, since human visual perceptions of 3D shapes rely on various 2D observations from different viewpoints. Although multiple informative and discriminative views are utilized, most projection-based retrieval systems suffer from heavy computational cost, and thus cannot satisfy the basic requirement of scalability for search engines. In […]
Apr, 9
Monte-Carlo Black-Scholes Implementation using OpenCL Standard
OpenCL is a standard parallel language based on the C language. It lets users take full advantage of, and retain the flexibility of, a high-level language. In this paper, we explore the use of the OpenCL language to implement complex designs on FPGAs by describing the design in a high-level abstraction language. […]