Monday, August 24, 2009

Mad Science Monday, 8/24/2009

Today I'm taking part in a blog hop to wish a happy birthday to UnderstandBlue. I met UnderstandBlue through my sister, Stampin Libby. We took part in the first ever "This Is What a Tweetup Is, Libby" tweetup at Phil's Icehouse (warning: that site makes annoying noises).

So, given that I met UnderstandBlue through Twitter, and I'm taking part in a blog hop, it seemed like a good day to look into the science behind how crap spreads around the internets.

Mad Reference: David Liben-Nowell and Jon Kleinberg. (2008) "Tracing information flow on a global scale using Internet chain-letter data." PNAS 105(12): 4633-4638. doi: 10.1073/pnas.0708471105 (full text available free online)

Mad Background: In 1976, Richard Dawkins introduced the word "meme" in his book The Selfish Gene as a way to describe the cultural equivalent of a gene. A meme is a replicator, like a gene; it carries an idea, but must be copied to be transmitted. While genes copy through DNA replication, memes copy by being repeated. They still seem to evolve by natural selection, though; as they're copied, sometimes they change a little, and the ones that "work" better (from the meme's perspective, at least) spread.

Mad observations: The word "meme" is, itself, a meme, and has gotten a lot of use lately, specifically in the form of Internet memes. It seems entirely random, though, which things take off on the internet, and which fizzle. But science has a knack for finding patterns and explanations in the seemingly random. Maybe models developed for the spread of diseases will work.

Mad hypothesis: Perhaps internet memes spread "with a rapid, epidemic-style fan-out." If internet chain letters spread like disease epidemics, most recipients should pass a letter on to several people, most of whom pass it on to several more, and so on.

Mad experiment: The researchers used an online petition that spread mainly in 2002-2003. Each recipient of this petition was asked to add their name to the end of the petition, and then forward it on to their friends. The researchers collected 637 copies of the petition from mailing-list archives, each representing a distinct chain of participants, totaling 18,119 distinct signatories. They repeated this procedure for another petition that circulated in 1995. They used the multiple copies of each petition to construct a tree diagram, tracing the routes that the letter traveled. This was complicated by "noise," such as rearrangements of the list of names, deletions, insertions, mutations (changing a name on the list to a political figure, for example), and even hybridization, when a user apparently received two copies of the petition and merged them together (interestingly, these are all things that happen with genes). Both petitions resulted in similar structures.

They then followed all of this up by computationally modeling different patterns of forwarding and seeing which pattern (if any) matched the observed trees. The models also covered which copies end up posted somewhere the researchers could find them, i.e., they took into account the fact that the researchers couldn't see everything.
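For the code-minded, here is a rough sketch of the tree-building idea: if every copy of the petition is a list of names in signing order, then each signer's "parent" is whoever signed just before them, and copies that share a beginning merge into one tree. This is my own toy version, not the researchers' code; the names and the `build_tree` function are made up for illustration, and it ignores all of the "noise" (rearrangements, deletions, merged copies) that made the real job hard.

```python
def build_tree(copies):
    """Merge signature lists into one tree, stored as a child -> parent map.

    Assumption: each copy lists names in signing order with no noise,
    so a signer's parent is simply the name listed just before theirs.
    """
    parent = {}
    for names in copies:
        prev = None  # None marks the root (the first signer has no parent)
        for name in names:
            # Keep the first parent seen for each name; real data would
            # need to resolve conflicting copies instead.
            parent.setdefault(name, prev)
            prev = name
    return parent


# Three made-up copies of the same petition that share their first signers.
copies = [
    ["ann", "bob", "cat", "dan"],
    ["ann", "bob", "eve"],
    ["ann", "bob", "cat", "fay", "gus"],
]
parent = build_tree(copies)
for name, signed_after in parent.items():
    print(name, "<-", signed_after)
```

Each copy is one root-to-leaf path, so the more copies you collect, the more branches of the tree you can see.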

They all laughed, but: The hypothesis that these internet memes would spread in a similar manner to disease epidemics was (at least for these two examples) disproven just from the initial tree constructions. The trees were much longer than they would be for disease epidemics (the average distance between a given individual and the "root" of the tree was much longer than for diseases), and more than 90% of the nodes had exactly one child (ie, most people only spread the meme to one person).
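To make those two measurements concrete, here is a small sketch that computes them for a tree stored as a child -> parent map. The `tree_stats` function and the two tiny example trees are my own inventions, not data from the paper: one is a long chain (the shape the petitions had) and one is an epidemic-style fan-out.

```python
from collections import Counter


def tree_stats(parent):
    """Return (fraction of nodes with exactly one child, mean depth).

    `parent` maps each node to its parent; the root maps to None.
    """
    # Count how many children each node has.
    children = Counter(p for p in parent.values() if p is not None)

    def depth(node):
        # Walk up to the root, counting steps.
        steps = 0
        while parent[node] is not None:
            node = parent[node]
            steps += 1
        return steps

    n = len(parent)
    single = sum(1 for node in parent if children[node] == 1)
    mean_depth = sum(depth(node) for node in parent) / n
    return single / n, mean_depth


# A long, skinny chain: everyone passes the letter to exactly one person.
chain = {"a": None, "b": "a", "c": "b", "d": "c", "e": "d"}
# A fan-out: one person passes the letter to everyone else.
fanout = {"r": None, "w": "r", "x": "r", "y": "r", "z": "r"}

print(tree_stats(chain))   # (0.8, 2.0)
print(tree_stats(fanout))  # (0.0, 0.8)
```

Same number of people in both trees, but the chain has mostly single-child nodes and sits much farther from its root on average, which is the pattern the researchers saw.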

The modeling experiments showed that two parameters had to be added to the disease model in order to get results that matched the petitions. First, not all recipients respond in the same amount of time. Some respond right away, and some take months to respond. This would be similar to a disease with a very widely varying incubation period (the memes don't match real diseases because real diseases don't have such widely varying incubations). Second, some recipients would send the meme back to either the person they got it from or the people the original person sent it to. This also doesn't happen in quite the same way for real diseases, and thus doesn't match known disease spreading patterns.
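Here is a toy simulation of the first ingredient only (widely varying response times); it leaves out the send-it-back behavior entirely. It is a sketch under my own assumptions, not the model from the paper: the contact network (`make_graph`, a ring of neighbors plus a few random ties), the log-normal delays, and all of the parameter values are made up. Everyone signs the first copy that reaches them, waits a while, and then forwards it to all of their contacts.

```python
import heapq
import random
from collections import deque


def make_graph(n, k, rng):
    """Hypothetical contact network: a ring where each person knows k
    neighbors on each side, plus a handful of random long-range ties."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for _ in range(n // 10):
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def spread(adj, delay, root=0):
    """Simulate forwarding; return each signer's depth in the resulting tree.

    A person signs the first copy to arrive, waits delay(), then forwards
    that copy to every contact. Later copies are ignored.
    """
    depth = {}
    events = [(0.0, root, 0)]  # (arrival time, recipient, depth on arrival)
    while events:
        t, node, d = heapq.heappop(events)
        if node in depth:
            continue  # already signed an earlier copy
        depth[node] = d
        send_time = t + delay()
        for friend in adj[node]:
            if friend not in depth:
                heapq.heappush(events, (send_time, friend, d + 1))
    return depth


def bfs_depth(adj, root=0):
    """Shortest-path distance from the root, i.e. the shallowest possible tree."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for friend in adj[node]:
            if friend not in dist:
                dist[friend] = dist[node] + 1
                queue.append(friend)
    return dist


def mean(depths):
    return sum(depths.values()) / len(depths)


rng = random.Random(1)
adj = make_graph(200, 2, rng)
prompt = spread(adj, lambda: 1.0)                      # everyone responds equally fast
varied = spread(adj, lambda: rng.lognormvariate(0, 2))  # some respond fast, some very slowly
print("mean depth, identical delays:", mean(prompt))
print("mean depth, varied delays:   ", mean(varied))
```

When everyone takes the same amount of time, the letter reaches each person by the shortest route. With varied delays, a copy that took a long chain of quick responders can beat the copy coming by the short route through one slow responder, so nobody ends up closer to the root and many people end up farther from it.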

Of course, this was just a model. It would take more research to determine if the model was correct.

Mad follow-up: Researchers in Spain did the additional research. They started a meme through the IBM company newsletter, and were able to more exactly track its spread, and they found that the spread matched the model's prediction. Moreover, given a small set of initial data on the spread, they were able to predict how far and fast the information would spread (by calculating the parameters used by the model).

Mad engineering applications: Combined with the Spanish research, this is getting close to a way to construct messages to spread far and wide (such as, for example, your Manifesto on Why Everyone Should Bow to Your Will). It definitely isn't there yet; they can predict how far a meme will spread given initial information about its spread, but they can't predict it before the meme is released into the wild. Still, that ability makes more experiments possible. Researchers don't have to wait until a meme has finished spreading to see how effective it is; they only need the initial data, so now they can construct slight variants of a meme and see what makes some spread better than others. Perhaps soon we will know what to include in your Manifesto to get it out there.

BTW, if you want to see the rest of the UnderstandBlue Birthday Blog Hop, start here.


Lydia said...

WOW!!!!!! Is that true???

I've seen meme everywhere, but no one has ever explained it before. That is incredible. I'm stunned (and not) that it is replicatable though. I have a fractally, chaos theory feeling right now, which is a great feeling on a girl's birthday! I'm so glad I got to meet you, and I'm honored to have such a brainy birthday blog!!!

I also love your scanned face every time I see it in my Twitter feed!!!

Thank you!!!

Leslie Hanna said...

I LOVE this post! So geeky, yet in a very strange way, it satisfies the geek in me to know WHY WHY WHY. Thank you so much for playing!!!

Die Anyway said...

>"(ie, most people only spread the meme to one person)."

This seems to imply that there are some "Typhoid Mary"s out there who cause the bulk of the spread. So when you send out your manifesto you need to either identify some targets who will behave that way or you need to figure out a way to duplicate that behavior. Also, I wonder if this relates in any way at all with you-tube videos that "go viral"? That seems to me to be a whole different mechanism than the chain-letter thing.

Jon Harmon said...

Die Anyway:
On "Typhoid Mary"s: It's probably at least partly that, but part of it is also that users who receive the forward and ignore it are "invisible," so really I should have said that >90% only "infected" one person; they may have sent it to dozens, but maybe those dozens already had it, or did not sign it and send it on.

I think it'll turn out to be mostly similar for YouTube and other "viral" things. They still spread through sharing, just not necessarily through email. I think the "step back" stuff would still apply, too. For most diseases, once you've had the disease, you're safe. But if someone sends you a "viral" YouTube video that you've already seen, that might remind you to send it to a different group, etc.

Of course, now wheels are turning on how to do some research in this arena myself...