Forensic genomics isn’t exactly the sexiest field of study. The only headline you may have heard of was the 2012 case involving more than 20,000 mishandled drug samples in the state of New York. Another headline, dated 2015, made less of a splash: the FBI found that 96% of its own hair sample analyses (dated between 1985 and 1999) turned out to be wrong. Perhaps one of the main problems is that in the United States, to this day, there does not seem to be a consensus on acceptable forensic methodology.

The major advances in forensics can be traced back to the discovery of a technique called Sanger sequencing, which allows scientists to ‘read’ (sequence, in the jargon) the letters that make up our DNA. Our DNA is made up of four such letters, A, T, C and G, each representing a unique molecule called a nucleotide. Strung one after the other, a long sequence (a very, very long sequence of over 3.3 billion, in fact) of these four nucleotides is how all our genetic information is encoded.

In 1990, a collection of governments and universities launched the Human Genome Project. With an expected duration of 15 years, the goal was to read the entire human genome, all 3.3 billion nucleotides of it. The project finished ahead of schedule, and some of its discoveries were real surprises: the previous consensus was that the human genome encoded at least 50,000 genes, possibly up to 150,000, but it turns out there are about 20,000. More surprisingly, perhaps, genes only make up about 1.5% of our DNA. In recent years we have started to understand the function of parts of the remaining 98.5%, but a large chunk of it still has no function that we know of and has come to be known as ‘junk DNA’. As more DNA was sequenced, there were more surprises: human beings share 40% of our DNA with worms, 98% with chimpanzees, and, on average, two different human beings have 99.9% of their DNA in common. One could therefore wonder how labs can claim to ID someone’s DNA with the accuracy they typically claim to achieve.
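For a sense of scale, here is a rough back-of-the-envelope sketch in Python, using only the figures quoted above, of how many positions that remaining 0.1% actually represents:

```python
# Back-of-the-envelope: even at 99.9% similarity, two genomes of
# ~3.3 billion nucleotides differ at millions of positions.
genome_length = 3_300_000_000   # nucleotides, the figure quoted above
shared_fraction = 0.999

differing_positions = genome_length * (1 - shared_fraction)
print(f"Differing positions: ~{differing_positions:,.0f}")
# ~3,300,000 positions: it is this variable sliver of the genome that
# forensic fingerprinting techniques try to exploit.
```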

Since the mid-1990s, forensic labs have been using this newfound knowledge to provide criminal investigators with DNA fingerprinting, which allegedly allowed individuals to be identified with laser-like accuracy. The first technique used was Restriction Fragment Length Polymorphism (RFLP). RFLP uses part of the 98.5% of the genome with no known function for DNA testing: among other things, this poorly understood part of our DNA contains VNTRs (Variable Number of Tandem Repeats), sequences of genomic bases repeated up to a few thousand times. Though every one of us carries a large number of VNTRs, they vary quite a bit from person to person. By chopping up and comparing a panel of VNTRs (usually around 10 to 15), one can obtain a relatively unique fingerprint: the probability that two unrelated people carry exactly the same versions of all of them is minute, on the order of one in millions. Polymerase Chain Reaction (PCR), a technique which amplifies a specific genomic locus, means that investigators can obtain enough DNA for a fingerprint even from the tiniest of hair bulbs left at a crime scene.
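To see why a panel of 10 to 15 loci is enough, here is a minimal sketch of the so-called ‘product rule’ that underlies such probability estimates; the per-locus frequencies below are invented purely for illustration, not taken from any real database:

```python
# Illustrative sketch of the 'product rule' behind VNTR fingerprints:
# if the loci are inherited independently, the chance that an unrelated
# person matches at every one of them is the product of the per-locus
# match frequencies. The frequencies below are made up for illustration.
locus_match_frequencies = [0.20, 0.15, 0.25, 0.10, 0.30,
                           0.20, 0.15, 0.25, 0.10, 0.30]  # ten loci

random_match_probability = 1.0
for freq in locus_match_frequencies:
    random_match_probability *= freq

print(f"Random match probability: ~1 in {1 / random_match_probability:,.0f}")
# With ten moderately variable loci this already lands in the
# 'one in millions' range mentioned above.
```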

The controversial use of statistics in trials involving RFLP, as well as a raft of lab errors, meant that the technique was abandoned early on. Though a committee of the National Research Council attempted to give guidance on the statistical means by which RFLP data should be analysed, news that the lab error rate was at least one in a hundred precipitated the decline of RFLP analysis. Both defence attorneys and criminal prosecutors would frequently misuse the statistical analysis that comes with RFLP fingerprints to bias juries and judges in favour of or against the evidence. RFLP was soon replaced by the study of Short Tandem Repeats (STRs). STRs are short, repeated sequences of bases; crime labs rely on the fact that unrelated individuals carry different numbers of repeats at each STR locus to create a more objective comparison between two DNA samples. STR analyses are used to this day, and though they are less controversial than RFLP analyses, by no means is there a consensus on their use in trials.
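A rough sketch of the counting idea behind STR comparison is shown below; the sequences and the ‘GATA’ motif are made up for illustration, and real typing kits measure fragment lengths at many standardised loci rather than reading raw sequence like this:

```python
# Minimal sketch of the idea behind STR comparison: count how many times
# a short motif repeats back-to-back at a locus, then compare the counts
# between two samples. Sequences and the 'GATA' motif are invented.
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Longest run of consecutive copies of `motif` in `sequence`."""
    best = run = 0
    i = 0
    while i <= len(sequence) - len(motif):
        if sequence[i:i + len(motif)] == motif:
            run += 1
            best = max(best, run)
            i += len(motif)
        else:
            run = 0
            i += 1
    return best

crime_scene = "CCTA" + "GATA" * 11 + "TTGC"
suspect     = "CCTA" + "GATA" * 11 + "TTGC"

print(count_tandem_repeats(crime_scene, "GATA"))  # 11
print(count_tandem_repeats(suspect, "GATA"))      # 11 -> same allele at this locus
```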

The advent of new techniques to sequence DNA has led to exciting new opportunities for the field of forensics. The idea is relatively simple: the genome is shredded into small segments of 50 to 150 nucleotides, which are amplified and sequenced independently. A computer then processes the billions of short sequences, matching them against reference sequences stored in data banks to recreate the whole sequence of the human genome. Scientists around the world now sequence whole genomes routinely, usually for $800 or less. With the cost coming down every year, one can be forgiven for hoping that we may finally be able to use DNA identification reliably.
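Conceptually, the matching step looks something like the toy sketch below; real pipelines use indexed aligners (tools such as BWA or Bowtie) and tolerate sequencing errors, whereas this illustration, with a made-up reference and made-up reads, only looks for exact hits:

```python
# Toy sketch of the 'matching reads to a reference' step of sequencing.
# The reference and the short reads below are invented for illustration;
# we simply look for exact substring hits.
reference = "ACGTTAGCCGATAGGCTTAACGGATCCGTTAGC"
reads = ["TAGCCGATAG", "TTAACGGATC", "CGTTAGC"]

for read in reads:
    position = reference.find(read)
    if position >= 0:
        print(f"{read} maps to reference position {position}")
    else:
        print(f"{read} does not map (sequencing error or novel variant?)")
```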

The superstar company of genomics, Illumina, recently developed a new machine, the MiSeq FGx Forensic Genomics System, which, according to a couple of recent papers in Forensic Science International: Genetics, could do away with the uncertainty involved in DNA forensics. The machine promises to analyse large swaths of a person’s genome and accurately match them against a genome to be tested; one could conceivably imagine a fast-approaching day when DNA matching could be 100% accurate.

Excitingly, however, new sequencing technology promises to rev up other forensic fields as well. In a 2014 Nature Communications paper, Dr De Wit and colleagues were the first to provide proof of concept that genomics can be used for forensics on a large scale. The researchers examined the cause of a large-scale die-off of an edible sea snail observed off the coast of California, a die-off so extensive that crabs, sea urchins and other invertebrates in the area were affected as well. By comparing genomic data from surviving sea snails to that of snails from an unaffected population, the team found a significantly high number of mutations in a gene whose product is known to be targeted by a toxin, yessotoxin (YTX), produced by a certain alga. Most interestingly, when the team tested the waters for YTX and other toxins, none were present at significantly toxic levels; in other words, genomics made it possible to identify the culprit where no other known technology had managed this feat.
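The kind of comparison described above can be pictured with the toy example below; the counts are invented and the actual study’s analysis was considerably more involved, but the underlying question, whether mutations in a candidate gene are over-represented among the survivors, is the same:

```python
# Rough sketch of a survivors-versus-unaffected comparison. The counts
# below are invented for illustration only.
from scipy.stats import fisher_exact

#                       carries mutation   no mutation
survivors             = [38,               12]
unaffected_population = [9,                41]

odds_ratio, p_value = fisher_exact([survivors, unaffected_population])
print(f"odds ratio: {odds_ratio:.1f}, p-value: {p_value:.2e}")
# A very small p-value would flag the gene as a candidate, pointing the
# investigation towards whatever its protein product interacts with.
```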

Finally, because a cell’s DNA often makes it possible to trace back origin and ancestry, genomic forensics could also be of use in the study of microbes and viruses (although many viral genomes are made of RNA rather than DNA, they can still be investigated in similar ways). Why is this interesting, and more so than the study of sea snails? A paper published last year in Virology showed that genomics allows viral origins to be traced, which would be a huge boost when a new strain of supervirus threatens a pandemic: quickly identifying the virus, and which known viruses are most closely related to it, promises to speed up the production of antiviral medications.
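As a toy illustration of the ‘which known virus is this new strain closest to?’ question, the sketch below scores pairwise identity between made-up sequences; real analyses build phylogenetic trees from whole genomes, but the intuition is similar:

```python
# Toy sketch: find the known strain most similar to a new isolate by
# pairwise identity over aligned, equal-length sequences (all made up).
known_strains = {
    "strain_A": "ATGGCACGTTTAGC",
    "strain_B": "ATGGCTCGATTAGC",
    "strain_C": "TTGGCACGTATAGG",
}
new_isolate = "ATGGCTCGATTCGC"

def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

closest = max(known_strains, key=lambda name: identity(known_strains[name], new_isolate))
print(f"Closest known relative: {closest} "
      f"({identity(known_strains[closest], new_isolate):.0%} identity)")
```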

Granted, considering the difference in cost between STR analysis and next-generation sequencing, it may still take a while before crime labs and others take up the new tech, but we’re definitely due an upgrade.

About the author

HUGO LAROSE is a PhD student at the University of Cambridge. In his research, he uses next-generation sequencing in the hopes of gaining new insights into paediatric tumours (lymphomas).