Wednesday, September 27, 2006

Amateur Science

A couple weeks ago, I was talking to someone about the research I did in grad school. Briefly, my research involved computer-aided design of antibodies. While describing my research, I realized that the things I worked on using a relatively powerful computer and high-end software a mere 5-10 years ago could now be accomplished on a cheap home computer, most of it using free software.

I've often thought about going back to grad school, but this realization pointed me to an interesting possibility. Why not try some computational research on my own, self-funded, to see if I could get a research paper published?

The first thing I thought to try was something I had thought of at the end of my time in grad school. One of the big mysteries left in biology is how proteins fold. If proteins simply sampled the possible shapes they could fold into one at a time, it's said that it would take longer than the age of the universe for a typical protein to fold (an argument known as Levinthal's paradox). But within a cell, proteins fold into their proper shape within milliseconds, or, at longest, hours. And the shape of proteins is important; it's what determines what they do in cells, and what proteins do is what determines what living things do. Another important note on all this is that the sequence of amino acids in the protein determines the final shape of the protein.
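The "longer than the age of the universe" claim can be checked with some back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not figures from this post: a 100-amino-acid protein, 3 possible conformations per amino acid, and 10^13 shapes sampled per second. The function name is made up for this sketch.

```python
# Rough estimate of how long a purely random search through protein shapes would take.
# All default values are assumptions chosen for illustration only.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def random_search_years(residues=100, states_per_residue=3, samples_per_second=1e13):
    """Years needed to try every conformation once, one at a time."""
    conformations = states_per_residue ** residues  # e.g. 3**100 possible shapes
    seconds = conformations / samples_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = random_search_years()
    print(f"Exhaustive search: about {years:.1e} years")
    print(f"Age of the universe: about {AGE_OF_UNIVERSE_YEARS:.1e} years")
```

Even with these fairly generous assumptions, the exhaustive search comes out many orders of magnitude longer than the age of the universe, which is why proteins must be doing something smarter than trying shapes one by one.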

One thing that's likely to happen during protein folding is that certain parts of the protein quickly fold, and this quick folding brings other parts closer together, which then interact to fold the protein into its final shape. So, I thought it would be interesting to search through the known protein structures, and see if any short sequences of amino acids (say, combinations of 3 amino acids) tend to have a standard shape in these known proteins. With 20 possible amino acids, and 3 positions, that comes out to 20 * 20 * 20 = 8000 possible 3-amino-acid combinations. It's a lot of things to check, but that's what computers are good at.
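As a minimal sketch of the bookkeeping involved, the snippet below enumerates the 8000 combinations and counts how often each one appears in a protein sequence. It only looks at sequence, not shape; comparing the 3D shapes would need structure data (for example, from Protein Data Bank files), which is not shown here. The function names and the example sequence are made up for illustration.

```python
from collections import Counter
from itertools import product

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_tripeptides():
    """Return every possible 3-amino-acid combination as a string."""
    return ["".join(combo) for combo in product(AMINO_ACIDS, repeat=3)]


def count_tripeptides(sequence):
    """Count each overlapping 3-amino-acid window in a protein sequence.

    Windows containing anything other than the 20 standard letters are skipped.
    """
    counts = Counter()
    for i in range(len(sequence) - 2):
        window = sequence[i:i + 3]
        if all(letter in AMINO_ACIDS for letter in window):
            counts[window] += 1
    return counts


if __name__ == "__main__":
    print(len(all_tripeptides()))        # 20 * 20 * 20 combinations
    print(count_tripeptides("ACDACD"))   # hypothetical toy sequence
```

A real version of the project would keep, for each window, the backbone geometry from the matching structure rather than just a count, and then ask whether windows with the same three letters cluster around one shape.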

So, before starting on this project, I decided to check PubMed to see what had been done since I looked into this in 2000 or so. And, of course, in December of 2002, somebody published a paper about exactly what I was planning to do.

So, that one was out (well, mostly; I still think I might give it a try, but just as an exercise in programming and to test the replicability of their data). While I was at PubMed, I decided to poke around and see if any other ideas fell out.

So then I started thinking about evolution, and in particular about the human genome. The human genome contains about 3 billion DNA base pairs. New DNA gets into the human genome through two routes, as far as we know: duplication of existing DNA (through things like copying errors or transposons), or incorporation of viral DNA into the genome (as when a retrovirus integrates its DNA; the analogous process in bacteria is called lysogeny). By tracing the relationships between genes within the genome, using the same techniques we use to trace relationships between genes in different organisms, it should be possible to trace the evolutionary history of every gene in the human genome (with the possible exceptions of the virally introduced genes and genes that diverged too long ago for us to recognize their relationship). I thought that would be an interesting thing to try.

So, of course, did several other people. For example, this mob worked together to map human chromosome 18. Others have done similar things on other parts of the human genome. Not only had people beaten me to the punch again, but the job was way harder than my computer is likely to be able to handle.

So... I'm still not sure exactly what I'm going to look into. I still plan to do this, but I've decided the first step is to read a bit more to catch up. If you know of any computational biology work that it might be interesting to look into, feel free to tell me about it in the comments.


Die Anyway said...

Jon, first a little story and then a comment on how it is apropos to your blog entry.

This past weekend my wife and I camped at a local park. The campground was nearly empty but around 11:00pm as we were crawling into bed, a van pulled up to the camp site next to ours and began unloading camping equipment and setting up tents. As best I could tell there were 3 men, 2 women, and 1 or 2 kids. They left the van headlights on so that they could see to set up camp, began popping a few brews and shouting at each other about effing this and effing that. One particular loudmouth complained about his effing parents, his effing brother, his effing time in Juvie, his effing time in effing prison, the fact that one of the girls brought him a soda when he effing well wanted another beer, etc., etc. By 2:30am the battery in the van had gone dead and they wanted to drive into town to get cigs so they went hollering around the campground for someone to give them a jump start. By 4:00am we had heard the whole sorry story of their low-class lives before they finally gave up and we got a few hours of sleep. In any case, it made me wonder how we ever manage to put up a space station, build airplanes, computers and MRIs, cure diseases, build skyscrapers, etc. These people pretty much made me lose any small amount of faith in humanity that I might have possessed.
Then I read your column and my faith was restored. There are intelligent, ambitious, creative people out there after all. My only hope is that you can overcome the drag of those low-life leeches.

Jon the Geek said...