Category Archives: Development

Calling CUDA Kernels from C++

I had a need to call a CUDA kernel from inside a C++ class. When I went looking through the NVIDIA examples and the wealth of information on the internet, I could not find a clear solution to my problem. So, let me state plainly how to accomplish this:

  1. Create a ".cu" file for your kernels;
  2. For each kernel you would like to call from C++, create a wrapper that can be called from inside your class;
  3. Inside this wrapper, call your kernel function;
  4. Include the prototype for your wrapper in the header file of your class, but not inside the class definition.

An example of the ".cu" file and the class header are shown below. One thing you apparently do not need to do (particularly in Visual Studio 2010) is include the ".cu" file in your class. As long as it is part of your project, everything should work just fine.

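// The ".cu" file
#ifndef _PROJECT_KERNELS
#define _PROJECT_KERNELS

__global__ void kernel(double Param1, double Param2){
    // Do Kernel Stuff here
}

// Wrapper with C linkage so that the C++ class can call it
extern "C" void functionName(double Param1, double Param2){
    // Nb (blocks) and Nt (threads per block) are placeholder values;
    // set them to suit your problem
    const int Nb = 1;
    const int Nt = 256;
    kernel<<<Nb, Nt>>>(Param1, Param2);
}
#endif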
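
// The class header
#ifndef MYCLASS_H_
#define MYCLASS_H_

#include <cuda.h>
#include "cuda_runtime.h"

// Prototype for the wrapper: in the header, but outside the class definition
extern "C" void functionName(double Param1, double Param2);

class myClass{
public:
    myClass();
};
#endif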
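
To see how the C++ side fits together, here is a minimal sketch of a class method calling the wrapper. The run method and the callCount counter are hypothetical names added for illustration. The wrapper body is a CPU-only stand-in so the sketch compiles without nvcc; in a real project that definition lives in the ".cu" file and launches the kernel.

```cpp
// Prototype for the wrapper: declared outside the class, with C linkage,
// just as it would appear in the class header.
extern "C" void functionName(double Param1, double Param2);

// Hypothetical counter, used only so this sketch can show the call happened.
static int callCount = 0;

// CPU-only stand-in for the wrapper (assumption: in a real project this
// definition lives in the ".cu" file and launches the kernel instead).
extern "C" void functionName(double Param1, double Param2){
    (void)Param1;
    (void)Param2;
    ++callCount;
}

class myClass{
public:
    myClass() {}

    // Hypothetical member function that forwards its arguments to the wrapper.
    void run(double Param1, double Param2){
        functionName(Param1, Param2);
    }
};
```

Constructing a myClass and calling run(1.0, 2.0) goes through functionName exactly once. The class itself never sees any CUDA syntax, which is why it can be compiled as ordinary C++.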

Coding Guidelines: Finding the Art in the Science - Part 2

After writing Part 1 of this post, I received a very awesome Christmas gift - "The Pragmatic Programmer: From Journeyman to Master" by Andrew Hunt & Dave Thomas. While I've only dug into the first few chapters of this book, I like the majority of what is said. One section of Chapter 1 really struck a chord with me (section 6, where the authors speak about communication of various types), and reading the book spurred me to take a look around pragprog.com, something that I haven't done for quite a while.

While at pragprog.com, I stumbled across their magazine, PragPub. In the September 2009 issue there is a wonderful article titled "Beauty in Code." The entire article deals with understanding the function and meaning of some unique snippets of code. While this is a fun challenge (I may use this in the classroom someday), it makes a solid point regarding development that is very well stated in the article:

We translate user stories into source code. How much of it survives the operation? More importantly, how much of it gets lost in the translation?

The issue being dealt with here is clarity. Clarity in code is vital and adds directly to the beauty of what is being developed in both form and function. While I could rant on this topic for quite some time, I will simply let this simmer for a while.

I leave you with two excellent quotes. The first is from the article I've discussed and the other is from Java.next #4: Immutability.

If you do it right, the beauty you see is multifaceted. If the code (as it runs) is as beautiful as the code (as it is written), then you are singing it just right.

Languages are not about what they make possible, but about what they make beautiful.

Coding Guidelines: Finding the Art in the Science - Part 1

Recently, I wrote a post about an article titled, "Software Engineering for Scientists". One of the strong points made in that article was that

There’s 60 years of existing scientific knowledge buried in code and it’s extremely difficult to extract. As scientists write new code, they need to clearly express intent in a way that doesn’t affect code performance.

This reinforces the idea that there is a huge issue concerning the writing of clear, concise, and understandable code in industry and academia. In light of this idea, I've recently published an article with Henry Ledgard titled "Coding Guidelines: Finding the Art in the Science" that can be found here and here. The article itself is focused on the role that aesthetics plays when writing code. This is perhaps summed up best by the first few lines of the article:

Computer science is both a science and an art. Its scientific aspects range from the theory of computation and algorithmic studies to code design and program architecture. Yet, when it comes time for implementation, there is a combination of artistic flare, nuanced style, and technical prowess that separates good code from great code.

This article, as opposed to those that lay out strict coding guidelines to increase performance and maintainability, is focused on developing code that is easier to read by focusing on the aesthetic appeal of the code. And, while it has not been studied yet, code that appeals to developers in this fashion should lead to 1) easier maintenance, 2) increased productivity, and 3) a shorter period of acclimation when working with an unfamiliar code base.

In the paper, we present 6 distinct guidelines that should be followed as much as possible:

  1. Consider a program as a table;
  2. Let simple English be your guide;
  3. Rely on context to simplify code;
  4. Use white space to show structure;
  5. Let decision structures speak for themselves; and
  6. Focus on the code, not the comments.

Some of these ideas are controversial and some people will see a few things that we have presented as impractical. That is why we have presented our ideas as "guidelines", not "standards". This is a flexible set of ideas that can be molded to fit individuals and corporations alike.

Beyond this, the use of aesthetics in coding suggests that there are some general principles that will aid in making code appealing to nearly everyone, though there will be some nuances between different developers.

One thing I would like to mention is that some folks have pointed out some errors in the Figures (particularly Figure 1). I would encourage readers not to lose sight of the "forest for the trees" in this situation as requests have been made to fix these minor errors. This is particularly true as the guidelines are intended to be IDE, language, and syntax neutral. Thus, the visual appeal (and the point) of the examples should be apparent regardless.


Particle Swarm Optimization (PSO) in Matlab

Here is a very simple version of PSO in Matlab. PSO is a very popular, population-based metaheuristic algorithm that mimics swarming behavior and swarm intelligence in order to solve optimization problems.

The code below is intended to get you started working with PSO in Matlab or Octave. Best efforts were made to keep the code clean and easy to understand. Feel free to play with it and contact me with any questions.

Click Here to Download PSO.m
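
For readers who want to see the shape of the algorithm before opening the download, here is a minimal global-best PSO sketch. It is written in C++ rather than Matlab and is not a port of PSO.m. The sphere objective, the coefficient values, and all of the names are assumptions picked for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Objective to minimise (assumption: the sphere function, chosen only
// because its minimum of 0 at the origin is easy to check).
double sphere(const std::vector<double>& x){
    double sum = 0.0;
    for (double xi : x) sum += xi * xi;
    return sum;
}

struct Particle{
    std::vector<double> position, velocity, bestPosition;
    double bestValue;
};

// Global-best PSO. Returns the best objective value found.
// w (inertia), c1 and c2 (acceleration coefficients) are commonly used
// textbook values, not values taken from PSO.m.
double pso(std::size_t dims, std::size_t swarmSize, std::size_t iterations, unsigned seed){
    assert(dims > 0 && swarmSize > 0);
    const double w = 0.7298, c1 = 1.49618, c2 = 1.49618;
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> initial(-5.0, 5.0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);

    std::vector<Particle> swarm(swarmSize);
    std::vector<double> globalBest;
    double globalBestValue = 0.0;

    // Scatter the particles at random and record the starting bests.
    for (std::size_t i = 0; i < swarmSize; ++i){
        Particle& p = swarm[i];
        p.position.resize(dims);
        p.velocity.assign(dims, 0.0);
        for (double& xi : p.position) xi = initial(rng);
        p.bestPosition = p.position;
        p.bestValue = sphere(p.position);
        if (i == 0 || p.bestValue < globalBestValue){
            globalBestValue = p.bestValue;
            globalBest = p.bestPosition;
        }
    }

    for (std::size_t it = 0; it < iterations; ++it){
        for (Particle& p : swarm){
            // Pull each particle toward its own best and the swarm's best.
            for (std::size_t d = 0; d < dims; ++d){
                p.velocity[d] = w * p.velocity[d]
                              + c1 * unit(rng) * (p.bestPosition[d] - p.position[d])
                              + c2 * unit(rng) * (globalBest[d] - p.position[d]);
                p.position[d] += p.velocity[d];
            }
            const double value = sphere(p.position);
            if (value < p.bestValue){
                p.bestValue = value;
                p.bestPosition = p.position;
            }
            if (value < globalBestValue){
                globalBestValue = value;
                globalBest = p.position;
            }
        }
    }
    return globalBestValue;
}
```

A call such as pso(2, 20, 200, 42u) runs 20 particles in two dimensions for 200 iterations. Each particle remembers its own best position, the swarm shares one global best, and the velocity update blends the two with a little randomness.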

Genetic Algorithms on GPU using CUDA

Some references for GA on GPU. If you know of any further resources, please contact me.

Downloads: PDF | Bibtex

[1] Q. Yu, C. Chen, and Z. Pan, “Parallel genetic algorithms on programmable graphics hardware,” in Lecture Notes in Computer Science 3612. Springer, 2005, p. 1051.

[2] P. Pospichal and J. Jaros, “GPU-based Acceleration of the Genetic Algorithm,” in Proceedings of GECCO 2009, 2009.

[3] A. Munawar, M. Wahib, M. Munetomo, and K. Akama, “Hybrid of genetic algorithm and local search to solve max-sat problem using NVIDIA CUDA framework,” Genetic Programming and Evolvable Machines, vol. 10, pp. 391–415, 2009.

[4] S. Debattisti, N. Marlat, L. Mussi, and S. Cagnoni, “Implementation of a Simple Genetic Algorithm within the CUDA Architecture,” in Proceedings of GECCO 2009, 2009.

[5] S. Zhang and Z. He, “Implementation of parallel genetic algorithm based on cuda,” in Advances in Computation and Intelligence, ser. Lecture Notes in Computer Science, Z. Cai, Z. Li, Z. Kang, and Y. Liu, Eds. Springer Berlin / Heidelberg, 2009, vol. 5821, pp. 24–30.

[6] S. Tsutsui and N. Fujimoto, “Solving quadratic assignment problems by genetic algorithms with gpu computation: a case study,” in Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, ser. GECCO ’09. New York, NY, USA: ACM, 2009, pp. 2523–2530.

[7] P. Vidal and E. Alba, “A multi-gpu implementation of a cellular genetic algorithm,” in 2010 IEEE Congress on Evolutionary Computation (CEC), July 2010, pp. 1–7.

[8] P. Vidal and E. Alba, “Cellular genetic algorithm on graphic processing units,” in Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), ser. Studies in Computational Intelligence, J. González, D. Pelta, C. Cruz, G. Terrazas, and N. Krasnogor, Eds. Springer Berlin/Heidelberg, 2010, vol. 284, pp. 223–232.

[9] R. Arora, R. Tulshyan, and K. Deb, “Parallelization of binary and real-coded genetic algorithms on GPU using CUDA,” in IEEE Congress on Evolutionary Computation, 2010, pp. 1–8.

[10] N. Fujimoto and S. Tsutsui, “A highly-parallel tsp solver for a gpu computing platform,” in Proceedings of the 7th international conference on Numerical methods and applications, ser. NMA’10. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 264–271.

Artificial Immune Optimization on the GPU using CUDA

Some references for AIS on GPU. If you know of any further resources, please contact me.

Downloads: PDF | Bibtex

[1] J. Zhao, Q. Liu, W. Wang, Z. Wei, and P. Shi, “A parallel immune algorithm for traveling salesman problem and its application on cold rolling scheduling,” Information Sciences, vol. 181, no. 7, pp. 1212–1223, 2011.

[2] J. Li, L. Zhang, and L. Liu, “A Parallel Immune Algorithm Based on Fine-grained Model with GPU-Acceleration,” in Fourth International Conference on Innovative Computing, Information, and Control, 2009, pp. 683–686.

Ant Colony Optimization on GPU using CUDA

Some references for ACO on GPU. If you know of any further resources, please contact me.

Downloads: PDF | Bibtex

[1] J. Li, X. Hu, Z. Pang, and K. Qian, “A Parallel Ant Colony Optimization Algorithm based on Fine-Grained Model with GPU-Acceleration,” International Journal of Innovative Computing, Information, and Control, vol. 5, no. 11(A), November 2009.

[2] Y.-S. You, “Parallel ant system for traveling salesman problem on GPUs,” in Proceedings of GECCO 2009, 2009.

[3] S. Sanci, “A Parallel Algorithm for Flight Route Planning on GPU using CUDA,” Master’s thesis, Middle East Technical University, Turkey, 2010.

[4] J. M. Cecilia, J. M. García, M. Ujaldon, A. Nisbet, and M. Amos, “Parallelization strategies for ant colony optimisation on GPUs,” Computing Research Repository, 2011.

Moore's vs. May's

I just read Douglas Eadline's post at Linux Magazine and found it really fascinating. The best tidbit to chew on is:

This is where all the hardware excitement meets the cold reality of parallel programming. We are all familiar with “Moore’s Law” (the transistor density doubles every two years). Many people have probably not heard of “May’s Law.” (for the purist, you can substitute “trend” for “law”). In any case, David May states his law as follows, “Software efficiency halves every 18 months, compensating for Moore’s Law.” Think about it. Every new generation of hardware

Moore's Law - The number of transistors on a chip doubles roughly every 2 years

May's Law - Software efficiency halves every 18 months, compensating for Moore’s Law.

Chew on that.
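
One way to chew on it is to put the two periods side by side. The sketch below simply multiplies the two trends together, using 24 months for Moore's Law and 18 months for May's Law as stated above. The function names are hypothetical, and treating transistor density as if it mapped one-to-one onto raw speed is a deliberate oversimplification.

```cpp
#include <cassert>
#include <cmath>

// Relative transistor density after `months`, doubling every 24 months.
double mooreFactor(double months){
    return std::pow(2.0, months / 24.0);
}

// Relative software efficiency after `months`, halving every 18 months.
double mayFactor(double months){
    return std::pow(2.0, -months / 18.0);
}

// Net effect if the two trends are simply multiplied together
// (assumption: transistor density maps one-to-one onto raw speed).
double netFactor(double months){
    assert(months >= 0.0);
    return mooreFactor(months) * mayFactor(months);
}
```

Under these assumptions, netFactor(72.0) works out to 8 × 1/16 = 0.5. Taken literally with these two periods, the software loss slightly outpaces the hardware gain, which is a little worse than merely "compensating".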

Installing the MATLAB plug-in for CUDA in Ubuntu 10.04

I had previously installed CUDA and CUDA SDK on my computer running Ubuntu 10.04. Matlab is also installed on the computer, and some of the students are currently using libSVM for data classification research. One of the major problems that we are having is that the data sets are so large that libSVM is simply too slow to use. The solution - let's add cuSVM to the installation so they can speed up their code easily.

I planned on following the instructions here where the first step is "Download Matlab plug-in for Cuda (Version 1.1) from http://developer.nvidia.com/object/matlab_cuda.html". I decided that I would not only download the Matlab plug-in for CUDA, but install and test it as well, and I ran into a couple of issues along the way. Here are the steps that I followed:

  1. Download the Matlab plug-in for CUDA and extract it
  2. Open the Makefile for editing
  3. Add a line at the very beginning of the file that specifies the current directory. In my example this is:
    CWD=/path/to/my/home/directory/Matlab_Cuda_1.1
  4. If you are on a 64-bit system (as I am), make sure that line 6 reads as:
    INCLUDELIB  = -L$(CUDAHOME)/lib64 -lcufft  -Wl,-rpath,$(CUDAHOME)/lib64
  5. Change line 11 (export Matlab=/usr/local/matlab) so that it matches your Matlab installation. In my case this was /opt/Matlab
  6. On lines 38, 45, and 52 you will find references to a script called nvopts.sh. When I first ran this Makefile, I continually got errors telling me that nvopts.sh did not exist. How to fix this? Add the correct path to the nvopts.sh reference. I did this by following the instructions in step 3 and then I adjusted lines 39, 46, and 53 to read as follows:
    $(NVMEX) -f $(CWD)/nvopts.sh $(INCLUDEDIR) $(INCLUDELIB)
  7. Save the Makefile
  8. Open the file called nvmex
  9. Comment out lines 1448-1454. Just in case I have the line numbers slightly wrong, these lines read like this:
    if [ "$compile_only" != "1" ]; then
        if [ "$gateway_lang" = "C" ]; then
        	files="$files $MATLAB/extern/src/mexversion.c"
        else
            files="$files $MATLAB/extern/lib/$Arch/version4.o"
        fi
    fi

    and should now look like

    #if [ "$compile_only" != "1" ]; then
    #    if [ "$gateway_lang" = "C" ]; then
    #    	files="$files $MATLAB/extern/src/mexversion.c"
    #    else
    #        files="$files $MATLAB/extern/lib/$Arch/version4.o"
    #    fi
    #fi
  10. Save nvmex
  11. In the case that your paths are not set properly (mine are not), open up nvopts.sh. Make sure that the calls to nvcc on lines 56, 87, and 164 are set correctly. I changed these from
    CC='nvcc'

    to

    CC='/usr/local/cuda/bin/nvcc'
  12. Save nvopts.sh
  13. Open the terminal in the directory where your Makefile is and type 'make all'. Enjoy!

NS-2 and Classifiers

I am currently adding a custom classifier to NS2 for some research. To start, I thought that I would build a small model and install a classifier just to see how things worked. Easy? No, nothing with NS2 is easy, simple, or clean (at least this is the impression that I am left with). I started with a section of code that looked like:

#Create three nodes
set n0 [$ns node]
set n1 [$ns node]
set n2 [$ns node]
 
set clsfr [new Classifier/Hash/Dest 32]
set rm1 [new RtModule/Base]
$n1 insert-entry $rm1 $clsfr
 
#Create a duplex link between the nodes
$ns duplex-link $n0 $n1 1Mb 10ms DropTail
$ns duplex-link $n1 $n2 1Mb 10ms DropTail
 
#Create a UDP agent and attach it to node n0
set udp0 [new Agent/UDP]
$ns attach-agent $n0 $udp0
 
# Create a CBR traffic source and attach it to udp0
set cbr0 [new Application/Traffic/CBR]
$cbr0 set packetSize_ 500
$cbr0 set interval_ 0.005
$cbr0 attach-agent $udp0
 
#Create a Null agent (a traffic sink) and attach it to node n2
set null0 [new Agent/Null]
$ns attach-agent $n2 $null0

Apparently the problem with this code is the order in which things are added. If you run this code, the following error (or something similar) will be produced:

--- Classfier::no-slot{} default handler (tcl/lib/ns-lib.tcl) ---
	_o19: no target for slot -1
	_o19 type: Classifier/Hash/Dest
content dump:
classifier _o19
	0 offset
	0 shift
	2147483647 mask
	0 slots
	-1 default
---------- Finished standard no-slot{} default handler ----------

This is a common error in NS2 that appears to be caused by things not being connected correctly. The problem with the above example is that the nodes are not properly connected before the new routing module and classifier are added. The proper implementation should look like:

#Create three nodes
set n0 [$ns node]
set n1 [$ns node]
set n2 [$ns node]
 
#Create a duplex link between the nodes
$ns duplex-link $n0 $n1 1Mb 10ms DropTail
$ns duplex-link $n1 $n2 1Mb 10ms DropTail
 
set clsfr [new Classifier/Hash/Dest 32]
set rm1 [new RtModule/Base]
$n1 insert-entry $rm1 $clsfr
 
#Create a UDP agent and attach it to node n0
set udp0 [new Agent/UDP]
$ns attach-agent $n0 $udp0
 
# Create a CBR traffic source and attach it to udp0
set cbr0 [new Application/Traffic/CBR]
$cbr0 set packetSize_ 500
$cbr0 set interval_ 0.005
$cbr0 attach-agent $udp0
 
#Create a Null agent (a traffic sink) and attach it to node n2
set null0 [new Agent/Null]
$ns attach-agent $n2 $null0
 
#Connect the traffic source with the traffic sink
$ns connect $udp0 $null0
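
The content dump in the error earlier in this post shows a classifier with 0 slots and a default of -1. In other words, it was asked for a target before any had been installed. Here is a toy model of that lookup, as a sketch only: the class and method names are hypothetical, and this is not NS2's implementation.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Toy stand-in for a destination classifier: maps a destination address to
// the name of the next object. This is NOT NS2 source code, only a model
// of the lookup that the error dump describes.
class ToyDestClassifier{
public:
    // Install a target for a destination address.
    void install(int dest, const std::string& target){
        slots_[dest] = target;
    }

    // Set the fallback target used when no slot matches.
    void setDefault(const std::string& target){
        default_ = target;
        hasDefault_ = true;
    }

    // Look up the target for a packet's destination. With no matching slot
    // and no default there is nowhere to send the packet, so give up.
    std::string find(int dest) const{
        std::map<int, std::string>::const_iterator it = slots_.find(dest);
        if (it != slots_.end()) return it->second;
        if (hasDefault_) return default_;
        throw std::runtime_error("no target for slot -1");
    }

private:
    std::map<int, std::string> slots_;
    std::string default_;
    bool hasDefault_ = false;
};
```

In this model, calling find on a freshly created classifier throws, while calling it after install returns the installed target. That mirrors the difference between the two scripts: in the second one, the links exist before the classifier is inserted.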