Friday, November 18, 2011

Strategies for Successful Coding

I was never exposed to the world of coding before I started graduate school. I thought it was mostly for computer programmers, but apparently statisticians do it a lot. Earlier this semester I started learning Stata, and I must confess, I went slightly crazy learning it. I spent hours staring at the data, and none of the code I wrote for cleaning and analyzing it made sense to me. The weekly assignments were due every Monday at 9 am, and I have never been more traumatized than at the prospect of spending 14-15 hours every weekend coding. I am perhaps one of those outlier cases with an extremely slow learning curve, and some things just don’t make sense to me unless I draw diagrams and flowcharts. However, I am beginning to see light at the end of the tunnel. You will never know how much joy a simple 20-line program running successfully can bring unless you have spent 4 hours writing it and staring at the data, wondering why it would not run. In the process I picked up some strategies that have worked for me. I might be rehashing things that already seem obvious to you, but I will share my wisdom nevertheless.

1. Organize

Be meticulous about organizing your data. Make folders and subfolders, but do not overdo it to the point that it creates more work for you. Unlike Indian parents who take credit for naming their babies something that will take them years to learn to pronounce or spell, keep names simple and avoid underscores if you can. If naming a file M9ScoreSummary suffices, do not name it Mathematics_Grade9_Score_Summary. You will waste time typing a long name every time, and you will significantly increase your chances of making mistakes. Keep a separate notebook as a key to what each short name stands for, lest you forget at some point. The more time and effort you put into organizing your initial data, the less time you will lose hunting for things later. Most importantly, don’t rely on your brain to remember things. Write them down.
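As a small complement to the paper notebook, part of that key can also live inside the dataset itself. The sketch below is only an illustration: the variable name m9score and the three scores are made up, and it assumes you are happy for Stata to write M9ScoreSummary.dta into the current working directory.

```stata
* Start from an empty dataset with three made-up observations
clear
set obs 3
generate m9score = 70 + 5 * _n      // invented scores, for illustration only

* Short names for typing, long descriptions for remembering
label variable m9score "Mathematics, grade 9, score"
label data "Grade 9 mathematics score summary"

* describe prints each short name next to its description
describe

* The short file name from the example above
save M9ScoreSummary, replace
```

The short name is what you type every day, and the label is what reminds you, months later, what the short name meant.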

2. Engage

Imagine spending a good week learning to code and getting your code running, and then going away on a month-long trip to Timbuktu. Chances are that nothing will make sense to you when you are back. You spent all this time and effort climbing the learning curve, and now it is all gone. The more you do it, the better you get at it. So in the initial stages of learning, spend a few hours on it every day. Remember how, as a child, your mommy insisted you spend at least two hours solving math problems every day, and first thing in the morning if it was a weekend? Not that all of you went on to become math majors or math professors. However, since the learning is so application-oriented and requires you to build skills in observing, analyzing, and following the logic, and to gain some dexterity, you should practice every day during the learning phase.

3. Attention to detail

There is a lot that can go wrong over a missed semicolon, an extra underscore, or simply typing an N for an M (the same reason why the more succinct your naming system is, the better). Don’t run code blindly; have a clear reason for every command you run. Don’t use the “cd” command unless you know it is meant to change directories, or else you will keep looking for your file in a random folder all day long. Remember that the “i” command overwrites your original file, so always save the result under a different name, like “i_different”, if you do not want to mess with your original dataset. Pay attention to the coloring, too. It once took me six hours to figure out that my code would not run because all my numbers were colored red (string variables) and not black (numeric variables). Learning a coding language is no different from learning any other language. It is very intuitive and logical. There is a reason your teachers taught you to begin every sentence with a capital (upper case) letter, end it with a full stop, and use punctuation. Every bit of code you feed into Stata means something. Stata is not crazy (although I have often alleged it to be), and it will not spew output if you get even a single character wrong. What is more, even if it does spew output, there is no guarantee that the output is correct. So use your brain, and pay attention to minute details.
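Here is a minimal sketch of those checks, using the auto dataset that ships with Stata. The variables price_str and price_num and the file name auto_clean are made up for illustration, and the string copy of price is created on purpose, only to mimic the red-numbers problem.

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Know which folder you are in before reading or saving anything
pwd

* Make a string copy of price to mimic numbers that were read in as text
tostring price, generate(price_str)

* The storage type column shows str# for strings and byte/int/float for numbers
describe price price_str

* summarize reports 0 observations for a string variable
summarize price_str

* Convert the text back to numbers in a new variable, leaving the original alone
destring price_str, generate(price_num)
confirm numeric variable price_num

* Save under a new name so the original dataset is never overwritten
save auto_clean, replace
```

Here the replace option can only overwrite an earlier auto_clean.dta, never the dataset you started from.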

4. Seek help

Learn to look for help whenever you are stuck. It is great to cogitate and analyze issues in your head, but staring at numbers can get so overwhelming that by the time you have figured out a solution, you will be too tired to do anything with the solution. Sometimes you overlook a single missing command that makes all the difference, and a fresh pair of eyes looking at the data spots it right away. Google is a wonderful resource, and so are colleagues and professors.

5. Work hard, and work smart

Learn to use the various tools that make your life easy. Why wash clothes by hand when you have access to a washing machine? Don’t write a thousand lines of code if you can get away with a hundred. Learn to use loops, macros, egen commands, foreach commands, and the other tools that save you work. I resisted them for the longest time because they did not seem intuitive at first, and they looked scary. My code did not run when I used them, the data got messed up, and I gave up. Eventually I sat with my professor for three hours and figured it out (somewhat). The three hours you put into learning these tools are going to save you 300 hours of future work and 3,000 lines of code. I see it as the difference between calculating solutions by hand and learning to use a calculator. First, you learn the entire process of doing calculations by hand. Then you have the added responsibility of learning how to operate a calculator. You decide it is not worth your time (especially if you have deadlines) and continue to calculate things by hand. Here is my advice. Be thorough about how to calculate without a calculator. Then invest some more time in getting used to a calculator. This way, even if you make mistakes, you will have developed the intuition to go back and see what went wrong. If you only knew how to use a calculator, you would never be able to function without one, or to detect errors when you ran into them.
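To make the idea concrete, here is a rough sketch of a loop doing the work of several near-identical lines. It uses the auto dataset that ships with Stata; the macro name vars and the new variable names (mean_ and dev_ followed by the variable name) are chosen only for illustration.

```stata
sysuse auto, clear

* Written by hand, each variable needs its own pair of lines:
*   egen mean_price = mean(price), by(foreign)
*   generate dev_price = price - mean_price
*   ...and the same again for mpg, weight, and so on

* A local macro holds the list of variables once
local vars price mpg weight

* foreach repeats the same commands for every name in the list
foreach v of local vars {
    egen mean_`v' = mean(`v'), by(foreign)    // group mean, domestic vs foreign
    generate dev_`v' = `v' - mean_`v'         // distance from the group mean
}

* Check the result: each deviation should average roughly zero
summarize dev_price dev_mpg dev_weight
```

Run the whole block in one go from a do-file. A local macro disappears once the chunk that defined it has finished running, so running the loop on its own afterwards would find an empty list.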

My biggest lesson, and my advice, is to learn to play around with data. There is no learning greater than the kind that comes from playing around with systems, making mistakes, going back to fix them, and training yourself using structured resources (like professors, forums, and books) and a little bit of external guidance every now and then. Remember, learning to code is not research. It is just a tool that helps you do research. At the end of the day you are still dependent on your brain and your ability to analyze.

Happy coding.



Badri said...

Ah! You are a great teacher in the making......

NG said...

And if you need more, here is a post I wrote last year :)

Jeevan Baretto said...

Also commenting after every couple steps helps in the long run and other programmers can use your code.