Apr, 20

Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)

Graphics cards, traditionally designed as accelerators for computer graphics, have evolved to support more general-purpose computation. General Purpose Graphical Processing Units (GPGPUs) are now being used as highly efficient, cost-effective platforms for executing certain simulation applications. While most of these applications belong to the category of timestepped simulations, little is known about the applicability of […]
Apr, 20

An efficient GPU implementation of the revised simplex method

The computational power provided by the massive parallelism of modern graphics processing units (GPUs) has moved increasingly into focus over the past few years. In particular, general purpose computing on GPUs (GPGPU) is attracting attention among researchers and practitioners alike. Yet GPGPU research is still in its infancy, and a major challenge is to rearrange […]
Apr, 20

Tutorial 3: Methodologies and Performance Impacts of General Purpose Computing on GPUs

Graphics Processing Units (GPUs) have been applied to graphics applications to render realistic perspectives of virtual scenes, especially in the entertainment market. Driven by market demand for super-high-definition scenes at high frame rates that simulate physical phenomena naturally in visualization applications, the last decade brought drastic performance improvements in GPUs. […]
Apr, 20

Design and implementation of software-managed caches for multicores with local memory

Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies from different types of memory accesses add overhead and adversely affect instruction scheduling. Instead, the accelerator cores have internal local memory to place their code and data. Programmers of such […]
Apr, 20

Compressing Floating-Point Number Stream for Numerical Applications

A cluster of commodity computers and general-purpose computers with accelerators such as GPGPUs are now common platforms to solve computationally intensive tasks like scientific simulations. Both technologies provide users with high performance at relatively low cost. However, the low bandwidth of interconnect compared to the computing performance hinders efficient operation of both cluster and accelerator […]
Apr, 19

HPP-Controller: An intra-node controller designed for connecting heterogeneous CPUs

Heterogeneity is considered a solution for scaling supercomputers to petascale. Many systems composed of general-purpose CPUs and special processing units such as Cells, GPGPUs and FPGAs have been implemented. In these systems, the CPU needs to interact with special processing units to process data together, thus communications between these heterogeneous processing units become […]
Apr, 19

Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?

To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing has the potential to achieve greater energy efficiency by combining traditional processors with unconventional cores (U-cores) such as custom logic, FPGAs, or GPGPUs. Although U-cores are effective at increasing performance, their benefits can also […]
Apr, 19

Parallel Approaches for SWAMP Sequence Alignment

This document is a summary and overview of several approaches to implement the local sequence alignment algorithms known as SWAMP and SWAMP+ on commercially available hardware. Using a Smith-Waterman style of alignment, these parallel algorithms have several innovative extensions that take advantage of the ASC associative computing model while maintaining speed, accuracy, and producing a […]
Apr, 19

A Hybrid Analytical DRAM Performance Model

As process technology scales, the number of transistors that can fit in a unit area has increased exponentially. Processor throughput, memory storage, and memory throughput have all been increasing at an exponential pace. As such, DRAM has become an ever-tightening bottleneck for applications with irregular memory access patterns. Computer architects in industry sometimes use ad […]
Apr, 19

Extending the Scalability of Single Chip Stream Processors with On-chip Caches

As semiconductor scaling continues, more transistors can be put onto the same chip despite growing challenges in clock frequency scaling. Stream processor architectures can make effective use of these additional resources for appropriate applications. However, it is important that programmer effort be amortized across future generations of stream processor architectures. Current industry projections suggest a […]
Apr, 19

A Task-centric Memory Model for Scalable Accelerator Architectures

This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit 1000-core processors. In these workloads, we observe data sharing and communication patterns that can be leveraged in the design of memory systems for future 1000-core processors. Based on these insights, […]
Apr, 19

Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware

Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in today’s desktop and notebook computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL projects should be uploaded via the User dashboard (see the instructions and example there); terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
