Edoardo Paone
Parallel programming is a skill which software engineers no longer can do without, since multi- and many-core architectures have been widely adopted for general-purpose computing platforms. In 2006 Intel introduced the first multi-core processor on the consumer market and, at the same time, NVIDIA unveiled CUDA, a programming paradigm to exploit Graphics Processing Units (GPUs) […]
View View   Download Download (PDF)   
Lai-Huei Wang
Dataflow models are widely used for expressing the functionality of digital signal processing (DSP) applications due to their useful features, such as providing formal mechanisms for description of application functionality, imposing minimal data-dependency constraints in specifications, and exposing task and data level parallelism effectively. Due to the increased complexity of dynamics in modern DSP applications, […]
View View   Download Download (PDF)   
Hana Park, Jungmin So, Young-Woong Ko, Jeong-Gun Lee
Power and energy consumptions are also becoming important design criteria. Consequently, software designs have to consider the power/energy consumptions together with performance when they are developing software. In this paper, we explore a design space exploration with a commercial GPU: nVidia GTX 660 for investigating the best configuration of a kernel grid structure in a […]
View View   Download Download (PDF)   
Wenhao Jia, Kelly A. Shaw, Margaret Martonosi
Graphics processing units (GPUs) are in increasingly wide use, but significant hurdles lie in selecting the appropriate algorithms, runtime parameter settings, and hardware configurations to achieve power and performance goals with them. Exploring hardware and software choices requires time-consuming simulations or extensive real-system measurements. While some auto-tuning support has been proposed, it is often narrow […]
View View   Download Download (PDF)   
Junying Chen
This work explored the design considerations on the real-time medical ultrasound adaptive beamformer implementations using different computing platforms: CPU, GPU and FPGA. Adaptive beamforming has been well considered as an advanced solution for improving the image quality of medical ultrasound imaging machines. Although it provides promising improvements in lateral resolution, image contrast and imaging penetration […]
View View   Download Download (PDF)   
Lai-Huei Wang, Chung-Ching Shen, Gunasekaran Seetharaman, Kannappan Palaniappan, Shuvra S. Bhattacharyya
Multidimensional synchronous dataflow (MDSDF) provides an effective model of computation for a variety of multidimensional DSP systems that have static dataflow structures. In this paper, we develop new methods for optimized implementation of MDSDF graphs on embedded platforms that employ multiple levels of parallelism to enhance performance at different levels of granularity. Our approach allows […]
View View   Download Download (PDF)   
Mahesh Nanjundappa, Anirudh Kaushik, Hiren D. Patel, Sandeep K. Shukla
Recent developments in graphics processing unit (GPU) technology has invigorated an interest in using GPUs for accelerating the simulation of SystemC models. SystemC is extensively used for design space exploration, and early performance analysis of hardware systems. SystemC’s reference implementation of the simulation kernel supports a single-threaded simulation kernel. However, modern computing platforms offer substantially […]
View View   Download Download (PDF)   
David Sheffield, Michael Anderson, Kurt Keutzer
We present Three Fingered Jack, a highly productive approach to mapping vectorizable applications to the FPGA. Our system applies traditional dependence analysis and reordering transformations to a restricted set of Python loop nests. It does this to uncover parallelism and divide computation between multiple parallel processing elements (PEs) that are automatically generated through high-level synthesis […]
Muhammad Shafiq
The basic concept behind the architecture of a general purpose CPU core conforms well to a serial programming model. The integration of more cores on a single chip helped CPUs in running parts of a program in parallel. However, the utilization of huge parallelism available from many high performance applications and the corresponding data is […]
View View   Download Download (PDF)   
G. Falcao, M. Owaida, D. Novo, M. Purnaprajna, N. Bellas, C.D. Antonopoulos, G. Karakonstantis, A. Burg, P. Ienne
Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs using simulations that can take weeks to months to complete. For example, designers of special purpose chips need to explore parameters such as the optimal bit width and data representation. This is the […]
View View   Download Download (PDF)   
Georg Kunz, Daniel Schemmel, James Gross, Klaus Wehrle
Developing complex technical systems requires a systematic exploration of the given design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires an extensive number of simulation runs. This in turn results in excessive runtime demands which severely hamper thorough design […]
View View   Download Download (PDF)   
Shivani Raghav, Martino Ruggiero, David Atienza, Christian Pinto, Andrea Marongiu, Luca Benini
Future architectures will feature hundreds to thousands of simple processors and on-chip memories connected through a network-on-chip. Architectural simulators will remain primary tools for design space exploration, performance (and power) evaluation of these massively parallel architectures. However, architectural simulation performance is a serious concern, as virtual platforms and simulation technology are not able to tackle […]
View View   Download Download (PDF)   
Page 1 of 212

* * *

* * *

Like us on Facebook

HGPU group

230 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1425 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: