My DNA on my iPod?

June 19, 2008

If I wanted to store all the information contained in my DNA, how much space would it take up? Could I copy it onto my iPod and take it everywhere with me?

DNA is a double-stranded molecule. That means it is composed of two strands, i.e. two sequences of nucleotides (which, once again, are represented by the four letters A, C, G and T). Those two strands are complementary, facing each other. A natural measure of the length of DNA is the “base pair”, or “bp”. A base pair is just one nucleotide and its complement on the other strand.

The human genome contains about 3 billion base pairs. Since there are four nucleotides, anyone who wants to store it in a computer format has to translate it into binary code, with 1’s and 0’s.

For example, let’s say:

  • A=00
  • C=01
  • G=10
  • T=11
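As a rough illustration of this mapping, here is a minimal Python sketch that packs a nucleotide string at two bits per letter, so that four letters fit in one 8-bit byte. The names `CODE`, `pack` and `unpack` are made up for this example, and padding a short final chunk with zero bits is just one possible convention.

```python
# Hypothetical 2-bit code, matching the table above.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"  # index in this string == 2-bit value of the letter


def pack(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four letters per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for letter in chunk:
            byte = (byte << 2) | CODE[letter]
        # Left-align a short final chunk; the unused low bits stay zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` letters from packed bytes."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(LETTERS[(byte >> shift) & 0b11])
    return "".join(letters[:length])
```

The original length has to be kept alongside the packed bytes, because the padding bits in the last byte would otherwise decode as extra A’s.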

Two bits are needed to code four letters, so each letter requires two bits of information. Eight bits make a byte, the standard unit of computer memory size. Don’t be afraid of the two lines of utterly simple math that lie below… In bytes, my genome requires:

3,000,000,000 bp x 2 bits / 8 = 750,000,000 bytes

A megabyte, or MB, is taken here to be 2^20 bytes. In MB, my genome would need:

750,000,000 / 2^20 = 715.2557 MB
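The same two lines of arithmetic can be checked with a few lines of Python. The constant names are invented for this sketch, and the 3 billion figure is the round estimate used in this post, not an exact count.

```python
BASE_PAIRS = 3_000_000_000   # approximate size of the human genome
BITS_PER_BASE = 2            # A, C, G, T each coded on two bits
BITS_PER_BYTE = 8

size_bytes = BASE_PAIRS * BITS_PER_BASE // BITS_PER_BYTE
size_mb = size_bytes / 2**20  # megabyte taken as 2^20 bytes, as above

print(size_bytes)         # 750000000
print(round(size_mb, 4))  # 715.2557
```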

We can now answer the initial question: YES! I could store my whole genome on any iPod, even the smallest one. However, it wouldn’t fit on a standard CD-R, which holds only 700 MB.

For the most tenacious among you, I should point out that we made the (sound) assumption that only one of the two strands of the DNA molecule needs to be recorded. The two strands are redundant, so storing both is pointless. But if you wanted to do so anyway, note that the smallest iPod model (the 1 GB Shuffle) could not hold your genome!
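To see why the second strand carries no new information, here is a small Python sketch that rebuilds the facing strand from the stored one, position by position, and then checks the doubled size against 1 GB. The names `complement` and `PAIRING` are hypothetical, and 1 GB is assumed here to mean 1024 MB of 2^20 bytes each.

```python
# A pairs with T, and C pairs with G, on the facing strand.
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the facing strand, letter by letter."""
    return strand.translate(PAIRING)


one_strand_mb = 750_000_000 / 2**20   # single strand, from the math above
both_strands_mb = 2 * one_strand_mb   # storing the redundant strand too
fits_on_1gb = both_strands_mb <= 1024

print(complement("GATTACA"))  # CTAATGT
print(fits_on_1gb)            # False
```

Applying `complement` twice gives back the original strand, which is exactly the sense in which the two strands are redundant.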

Be cautious when choosing your next iPod!


  • 1. K3nt1  |  June 20, 2008 at 12:13 pm

    Just for my personal knowledge:
    Are the different nucleotides present in the same proportions ?
    If this is not the case, one could use a shorter code for the most frequent ones and a (slightly) longer code for the less frequent ones. And maybe my old CD burner could then prove itself useful after all.. 🙂

  • 2. Personomics  |  June 20, 2008 at 2:12 pm

    Excellent point! No, the nucleotides don’t appear in the same proportions. It depends somewhat on the species considered.

    Within the human genome, we have roughly the following proportions:


    Note that there is an explanation for the similar proportions of G and C (around 19% each) and of A and T (around 30% each): the bonds between the two DNA strands are much more likely to form between C and G on the one hand, and between A and T on the other, than in any other combination such as A-G or C-T.

  • 3. Andrew Yates  |  July 8, 2008 at 11:57 pm

    It depends on whether you are storing a reference genome or an individual’s genome. Humans are diploid, meaning they have a pair of each chromosome. So, you’d need about 2GB for YOUR genome, and about 1GB for a sample genome.

    K3nt1: rather than nucleotides, observe the disparity in levels of dinucleotides (2 bases). I’ll write about this on Think Gene later this week. Thanks for the prompt!


