It’s Wonderful When Things are Going Right

Today, I’m in the process of starting a new research project, and so far things are going very well. I’m hopeful that this trend will continue, and that the project can be completed, written up and submitted to a journal in just a few weeks. Now, the point of this post is not simply to boast about how well things are going (though I do think it’s important to celebrate life’s little victories). The point I would like to make is about why things are going well, building on yesterday’s post about things I’ve learned about programming.

First, let me describe the project briefly (this actually goes on for a while, and isn’t completely relevant to the main point of the post, so if you don’t want to know the details, skip ahead a few paragraphs to get to the point). As I’ve mentioned in the past, I study cosmology and the large scale structure of the Universe, and I use numerical methods to help me in that pursuit. The new project deals with the fact that galaxies in our Universe are moving. Since our primary tool for measuring the distance to a galaxy is its spectroscopic redshift, the Doppler shift from that motion throws off our distance measurements. See, what we want to measure is the cosmological redshift, which is due to the fact that space-time is expanding as the light from the galaxy is traveling towards us. Given some model for our Universe’s expansion history, this redshift is directly related to the distance the light has traveled, which gives us the distance to the galaxy. However, the galaxy that emitted the light is most likely moving through space as well, and not simply being dragged along with the cosmological expansion. This peculiar motion, as it is called, also shifts the wavelengths of light, either towards the blue if the peculiar motion is towards us, or towards the red if it is away from us.

The problem we have is that there is no reliable way of distinguishing between the two effects without building in more model-based assumptions. So, the measured redshift is simply used to calculate a distance, and our 3D picture of the Universe is skewed by these peculiar motions of galaxies. These are generally referred to as redshift space distortions (RSD), and they create the famous (or infamous) Fingers-of-God effect, since the distortions only occur along our line of sight to the galaxy. They also create more subtle effects, but the main point is that we have a distorted view of the Universe. Now, instead of trying to get rid of these effects (which is done in some cases), we can actually try to measure the level of distortion. Since the distortions are caused by peculiar velocities, which in turn are due to the gravitational forces between the galaxies, the distortions let us test gravitational theory on the largest scales imaginable.

Sounds great, but measuring the distortions is not an easy task, and my current project is basically examining some of the difficulties involved (it may grow to encompass more, but it’s still early in the project). To study all of this, I need to create mock catalogs of galaxies with velocities, which I can do via code that I’ve named LNKNLogs for Log-Normal mocK aNisotropic cataLogs. This code uses the log-normal method (a very fast way of generating point distributions that, to low order, resemble our Universe). Since it generates a density field in the process, it uses that field to compute the gravitational field, which in linear theory is simply proportional to the peculiar velocity, so you can easily generate some physically reasonable velocities for galaxies.

In order to generate these mock catalogs, we have to have a particular type of galaxy tracer in mind, which will have a particular bias, a measure of how clumped together the tracers are with respect to the dark matter in the Universe. The tracers we have in mind are typically referred to as bright galaxies, and the survey that we are going to mimic gave an equation for the bias of these tracers that they used in their models. In order to use this equation, I needed to solve an integral equation at least 5 times, and it was better to solve it many more times than that so that I could generate a nice, smooth, cubic spline of the data. The reason I needed to do this was that the equation gave me bias as a function of redshift, and I needed to map out the bias for the redshift range we will be modeling in our mocks. Integrating an equation by hand that many times was of course out of the question, so I wrote a short C++ program that used GSL (the GNU Scientific Library) to do the integration. In the end the code was 61 lines (it could be even shorter, but I have some white-space here and there to help separate some blocks of code), it finishes almost immediately after hitting enter on the command line to run it, and it works exactly as intended. I only had one compilation error, which was an easy fix as it was caused by copying and pasting something and forgetting to alter a variable name.

That program is a textbook case of things going very well. It took very little time to write, and worked right away. While it is definitely important to analyze what happened when things go wrong, it can be as beneficial, if not more so, to do that when things go right. Why did this work so well when other times there are many struggles? What made this case special? In what ways was it really no different from any other case? Asking these questions can help set you up for more success in the future.

Time for introspection

Let’s start with what made this small piece of code different than others:

  1. It was a small piece of code, which means not many places to make mistakes.
  2. It didn’t require me to do anything new.
  3. I already had the knowledge of how to solve the problem.

So, elaborating a bit point by point: since the code was only 61 lines long, it also wasn’t very complex, and there were very few places to make mistakes. In longer pieces of code, the chance of a typo or mistake being missed increases. In complex code that contains many source files and links a lot of libraries, it can take a long time to track down what’s causing problems (which is why bugs can persist in commercial software for a long time). While I had never written a piece of code to solve this exact problem, my past experience meant that I just had to do things I had done before, but with subtle differences. I have used GSL to do integration before, so I just had to have it integrate a different function. I have a parameter library already written, so I just had to use it to make the code I was writing more general (i.e. able to take different model parameters). Lastly, given the brief equation, I knew exactly where to look to expand it out to a solvable equation to plug into a GSL function to integrate. All of this contributed to the code being very easy to write.

So, what made this no different than other codes:

  1. It needed to be accurate.
  2. It needed to be fast.
  3. It needed to be developed quickly.
  4. It should be reusable so that in the future, you don’t even need to write it.

Most of the time, the above will be true for all programs you need to write (though, sometimes it’s okay to ignore number 3 in favor of numbers 1, 2 and 4). The first goes without saying. If your code isn’t accurate (i.e. doing what it should be) then it’s worthless. The second is very desirable. After all, you don’t want to be waiting around forever for your code to calculate something when the whole point is that a computer should be doing it very quickly. The third is just a reflection of the fact that most of the time, you just need something to solve a problem and you need it now (again, you don’t want to be wasting too much time that could be spent doing other things). Lastly, if you aim to make it reusable now, it may take a couple of extra minutes, but it could save you time in the future since you can reuse the same program for a variety of different problems. My code can calculate b(z)D(z) = c where c is a constant for any value of c, and since D(z) depends on \Omega_{\mathrm{M}} and \Omega_{\Lambda}, it can also take any value of those parameters via the parameter file (which is parsed by my custom HARPPI library).
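To give a sense of what such a calculation looks like, here is a minimal sketch. It is not the actual 61-line program. It assumes that D(z) is the standard linear growth factor for a universe containing matter, curvature and a cosmological constant, normalized so that D(0) = 1, and it swaps GSL’s integrator for a hand-written composite Simpson’s rule so that it compiles without any external library. All function names here are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Dimensionless Hubble rate E(a) = H(a)/H0 for matter + curvature + Lambda.
double hubble_E(double a, double omega_m, double omega_l) {
    const double omega_k = 1.0 - omega_m - omega_l;
    return std::sqrt(omega_m / (a * a * a) + omega_k / (a * a) + omega_l);
}

// Integrand 1 / (a E(a))^3, which tends to 0 as a -> 0 when omega_m > 0.
double growth_integrand(double a, double omega_m, double omega_l) {
    if (a <= 0.0) return 0.0;
    const double aE = a * hubble_E(a, omega_m, omega_l);
    return 1.0 / (aE * aE * aE);
}

// Unnormalized growth factor E(a) * integral_0^a da' / (a' E(a'))^3,
// evaluated with composite Simpson's rule on n (even) intervals.
double growth_unnormalized(double a, double omega_m, double omega_l, int n = 2000) {
    assert(n > 0 && n % 2 == 0);
    const double h = a / n;
    double sum = growth_integrand(0.0, omega_m, omega_l) +
                 growth_integrand(a, omega_m, omega_l);
    for (int i = 1; i < n; ++i) {
        sum += (i % 2 ? 4.0 : 2.0) * growth_integrand(i * h, omega_m, omega_l);
    }
    return hubble_E(a, omega_m, omega_l) * sum * h / 3.0;
}

// Growth factor at redshift z, normalized to D(0) = 1.
double growth_factor(double z, double omega_m, double omega_l) {
    return growth_unnormalized(1.0 / (1.0 + z), omega_m, omega_l) /
           growth_unnormalized(1.0, omega_m, omega_l);
}

// Bias from b(z) D(z) = c.
double bias(double z, double c, double omega_m, double omega_l) {
    return c / growth_factor(z, omega_m, omega_l);
}

// Bias on n_z evenly spaced redshifts in [z_min, z_max], ready to be
// handed to a spline routine.
std::vector<double> tabulate_bias(double z_min, double z_max, int n_z,
                                  double c, double omega_m, double omega_l) {
    assert(n_z >= 2);
    std::vector<double> table(n_z);
    const double dz = (z_max - z_min) / (n_z - 1);
    for (int i = 0; i < n_z; ++i) {
        table[i] = bias(z_min + i * dz, c, omega_m, omega_l);
    }
    return table;
}
```

A handy sanity check for code like this: with \Omega_{\mathrm{M}} = 1 and \Omega_{\Lambda} = 0 the growth factor reduces to D(z) = 1/(1+z), so the numerical result can be compared against an exact answer before trusting it for other parameter values.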

So, what can I learn from this that will help me have more success in the future? We obviously can’t prepare ourselves by learning everything we need to know to solve every problem, whether that be from the coding side or the problem solving side. I just happened to be lucky that I was prepared this time. But, there are some general lessons to take away here. First, I didn’t have to write custom parameter parsing functions (like I did for longer than I should have) because I have now created a custom library for that, which is set up on my system in a way that I can simply #include <harppi.h> in the cpp file, and then link the library with -lharppi when compiling with g++. This highlights how writing reusable, flexible code to solve problems you come across frequently helps you in the long run. Sure, you may spend a while creating some library of functions, but once you have them you can easily use those functions over and over again. Also, because some really smart people have written the GSL, there are lots of tools that I can simply use without having to write custom code. So, the lesson here is don’t reinvent the wheel. Try your best to recognize when custom code you are writing is something that should be very reusable, and create a library with those functions in it so you can access them easily later. Try to learn about useful libraries that have already been developed, and use them when you can.

You might be asking, how do I know when I might need to create reusable code? When in doubt, make it reusable. By that I mean, avoid hard coded values as much as possible and use a parameter file to pass in any numeric values you might need. That way, at the very least, the code can do the same calculation for any values of the constants. Then, if you do find yourself running that code a lot, move it to a custom library. Place the library in a local lib directory and the associated header file in a local include directory, with both directories added to your path variables as needed, so that including it in future code is very simple.

So, the success of even a simple piece of code can help you figure out strategies for the future. Introspection is something we should all strive to do more.

