Mining big data highlights importance of ‘junk DNA’

An exploratory bioinformatics investigation suggests ‘junk DNA’ plays a critical role in early embryonic development and species evolution.



A physicist-turned-bioinformatician from South Dakota State University in the United States has conducted a large-scale, open-ended genomic data-mining study to gain insights into early embryonic development. His results suggest that genetic components called transposable elements, or TEs, play a crucial role in early development and should be further investigated.

An organism’s genome contains all the information needed for its formation. Some genetic components, like TEs, use a copy-and-paste mechanism to duplicate themselves and move around within the genome. TEs make up just under 50 per cent of the human genome, but for many years, some scientists discounted them as ‘junk DNA’ that is only there to fill space. This study by Xijin Ge in South Dakota supports other recent investigations demonstrating the opposite is true: TEs are in fact vital for development and evolution.

Ge conducted an exploratory bioinformatics investigation (EBI) into genomic big data from studies of early embryonic development. Specifically, he focused on gene regulation in pre-implantation mouse embryos.

“For two years, I spent my free time using all available bioinformatics tools and data, especially single-cell RNA-sequencing data, to understand how early embryos develop from fertilized eggs,” says Ge. “With the massive amount of genomic data available, there is much to be gained from re-analysing it.”

Unlike traditional hypothesis-driven research, EBI allows the data to tell its own story, helping scientists to frame future research questions. By examining multiple RNA-sequencing datasets, Ge found that TE transcription, the process of copying a TE’s DNA sequence into RNA, can directly influence the expression of neighbouring genes and thus the dynamics of gene expression in early embryos. Ge also identified possible regulatory proteins that bind to DNA motifs provided by TEs.

“Transposons are essential in shaping gene regulatory networks during evolution,” says Ge. “They provide a copy-and-paste mechanism to re-use and re-shape genetic logic, similar to the way in which computer programmers copy and revise old codes. TEs can introduce harmful mutations, but they drastically enhance genetic diversity for the population, and may be necessary for adaptation and survival, especially for species that reproduce slowly.”

Ge calls for new, targeted studies using the insights gained from bioinformatics. “Ultimately, my goal was to gain as many actionable insights as possible from existing data, so that scientists can build a more complete picture of early embryonic development.”


  1. Ge, S.X. Exploratory bioinformatics investigation reveals importance of ‘junk’ DNA in early embryo development. BMC Genomics 18, 200 (2017).
