Posts
Dec, 12
A Common GPU n-Dimensional Array for Python and C
Currently, there are multiple incompatible array/matrix/n-dimensional base object implementations for GPUs. This hinders the sharing of GPU code and causes duplicate development work. This paper proposes and presents a first version of a common GPU n-dimensional array (tensor) named GpuNdArray that works with both CUDA and OpenCL. It will be usable from Python, C and possibly […]
Dec, 12
Bringing Parallel Performance to Python with Domain-Specific Selective Embedded Just-in-Time Specialization
Today’s productivity programmers, such as scientists who need to write code to do science, are typically forced to choose between productive and maintainable code with modest performance (e.g. Python plus native libraries such as SciPy [SciPy]) or complex, brittle, hardware-specific code that entangles application logic with performance concerns but runs two to three orders of […]
Dec, 11
Self-Supervised Clustering for Codebook Construction: An Application to Object Localization
Approaches to object localization based on codebooks do not exploit the dependencies between appearance and geometric information present in training data. This work addresses the problem of computing a codebook tailored to the task of localization by applying regularization based on geometric information. We present a novel method, the Regularized Combined Partitional-Agglomerative clustering, which extends […]
Dec, 11
Aquila: An Open-Source GPU-Accelerated Toolkit for Cognitive Robotics Research
This paper presents Aquila, a novel open-source software toolkit developed as part of the iTalk and RobotDoC projects. The software provides many different tools and biologically inspired systems that are useful for cognitive robotics research. Aquila addresses the need for high-performance robot control by adopting the latest parallel processing paradigm based on the NVIDIA CUDA […]
Dec, 11
Gyrokinetic Toroidal Simulations on Leading Multi- and Manycore HPC Systems
The gyrokinetic Particle-in-Cell (PIC) method is a critical computational tool enabling petascale fusion simulation research. In this work, we present novel multi- and manycore-centric optimizations to enhance performance of GTC, a PIC-based production code for studying plasma microturbulence in tokamak devices. Our optimizations encompass all six GTC sub-routines and include multi-level particle and grid decompositions […]
Dec, 11
Accelerating Swarm Intelligence Algorithms with GPU-Computing
Swarm intelligence describes the ability of groups of social animals and insects to exhibit highly organized and complex problem-solving behaviors that allow the group as a whole to accomplish tasks which are beyond the capabilities of any individual. This phenomenon found in nature is the inspiration for swarm intelligence algorithms — systems that utilize the […]
Dec, 11
Fast Face Detection Using Graphics Processor
Fast face detection is one of the key components of various computer vision applications. The Viola-Jones algorithm provides good, fast detection for low- and medium-resolution images. This paper proposes a new and fast approach to real-time face detection. The proposed method includes enhanced Haar-like features and uses SVM for training […]
Dec, 11
A Dynamic Approach to Weighted Suffix Tree Construction Algorithm
At present, the weighted suffix tree is considered one of the most important data structures for analyzing molecular weighted sequences. Although a parallel algorithm based on static partitioning exists for constructing weighted suffix trees, it takes a significant amount of time for very long weighted DNA sequences. However, in our […]
Dec, 11
Generalizing Execution of Vectorizable Computations by Generating Vector Oriented Byte Code
Computer simulations, which are widely used in both academia and industry, often work on very large data sets. This makes them well suited for harvesting the computing power of modern, highly parallel computing systems, such as GPUs, clusters and vector processors. The challenge lies in the fact that these systems must be programmed […]
Dec, 11
Data analysis and 3D evolution in High Energy Physics using graphic processor
One of the main challenges in High Energy Physics (HEP) is the fast analysis of large amounts of experimental and simulated data. For example, the amount of data generated at the Large Hadron Collider (LHC) is estimated to reach 1 petabyte per year. The time taken to analyze the data and to obtain fast results depends on […]
Dec, 11
ALICE HLT High Speed Tracking on GPU
The on-line event reconstruction in ALICE is performed by the High Level Trigger, which should process up to 2000 events per second in proton-proton collisions and up to 300 central events per second in heavy-ion collisions, corresponding to an input data stream of 30 GB/s. In order to fulfill the time requirements, a fast on-line […]
Dec, 11
Evaluating graph coloring on GPUs
This paper evaluates features of graph coloring algorithms implemented on graphics processing units (GPUs), comparing coloring heuristics and thread decompositions. As compared to prior work on graph coloring for other parallel architectures, we find that the large number of cores and relatively high global memory bandwidth of a GPU lead to different strategies for the […]