Technology drives scientific development. With the advance and application of new technologies such as high-throughput next-generation sequencing, high-precision quantitative mass spectrometry, high-resolution cryo-electron microscopy, and molecular imaging, biological research has entered a post-genomic era. Its primary goals are to discover and interpret functional elements across the whole genome, to reveal the structures of supercomplexes and complex molecular machines, and to understand how genetic changes relate to disease at the level of system-wide, quantitative interactions among components. This "new" biology gives us new perspectives and tools for solving traditional problems and raises our understanding of the complexity of biological systems to a new level, but it also brings new and more challenging problems.

We are aware of these issues, challenges and opportunities. We have also seen that these high-throughput, high-precision, high-resolution technologies have accumulated, and continue to accumulate, large amounts of data. We foresee that the "new" biology will increasingly rely on technologies such as high-performance computing and big data analytics to study biological problems at unprecedented depth, breadth and scope. We believe that, given the complexity of biological systems and the growing volume of data, machine intelligence will play an important role in the study of biological problems.

Our laboratory works in the emerging field of structural systems biology. We combine approaches from structural biology, genomics, machine learning, and big data analysis to study major, cutting-edge biological problems. Computation, especially machine learning, is our core tool, and we test our hypotheses on a unique experimental platform. The laboratory is also interested in developing new computational and experimental techniques.

1、Structure of biomacromolecules, especially RNA, and its relationship to function

Sequence determines structure, and structure determines function. The central questions of structural bioinformatics are how to accurately predict the structure of a biomacromolecule (protein, RNA or DNA) from its sequence, how to compare structures and analyze the relationships among elements in structure space, and how to predict function from structure.

Structure-specific chemical modifications that block reverse transcription can be used to probe RNA structure. After the RNA is modified, reverse transcriptase is added, and sequencing is then used to locate the positions where reverse transcription stops, which mark the modification sites. Since the modification is structure-specific, we can infer the structural properties of each modification site accordingly. We developed a new method for determining RNA structures across the transcriptome inside cells (icSHAPE: in vivo click SHAPE) in the Howard Chang Laboratory at Stanford. This method uses a newly synthesized, cell-permeable small molecule, NAI-N3, which selectively modifies RNA according to the structural environment of the 2'-OH group on the sugar ring. Because NAI-N3 has an additional azide side chain, modified RNA molecules can be chemically coupled to a handle and purified, which greatly reduces sequencing background and improves the accuracy of structure detection. Our preliminary results in mouse embryonic stem cells demonstrate the superior signal-to-noise ratio, accuracy and validity of the new method, and reveal the relationship between RNA structure and function at the level of the whole transcriptome.
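The idea of reading structure from reverse transcription stops can be illustrated with a small sketch. This is not the icSHAPE pipeline: the function name `rt_stop_scores`, the library-size normalization, and the scaling to a 0 to 1 range are all assumptions made for illustration. It also assumes that stop counts have already been assigned to the modified nucleotide and that a matched untreated control library is available.

```python
from typing import List


def rt_stop_scores(treated: List[int], control: List[int]) -> List[float]:
    """Turn per-nucleotide RT stop counts into a relative score in [0, 1].

    Hypothetical simplification: stop counts are converted to fractions of
    each library, the control fraction is subtracted as background, negative
    values are clipped to zero, and the result is scaled by its maximum.
    """
    if len(treated) != len(control):
        raise ValueError("treated and control must cover the same positions")
    if not treated:
        return []

    # Guard against empty libraries so the division below is always defined.
    treated_total = sum(treated) or 1
    control_total = sum(control) or 1

    # Excess of stops in the modified sample over the untreated background.
    enrichment = [
        max(t / treated_total - c / control_total, 0.0)
        for t, c in zip(treated, control)
    ]

    top = max(enrichment)
    if top == 0.0:
        return [0.0] * len(enrichment)
    return [e / top for e in enrichment]


# Toy transcript of four nucleotides: position 4 stops far more often in the
# treated sample than in the control, so it receives the highest score.
scores = rt_stop_scores([0, 10, 0, 30], [4, 2, 2, 2])
```

Higher scores suggest positions that were modified more often, which for a reagent of this kind points to a more flexible, likely unpaired nucleotide. A real analysis would also handle sequencing depth, replicates and outliers.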

Our laboratory will develop more efficient algorithms and software to predict and model the structures of RNA molecules inside cells, based on in vivo experimental information. Most current prediction methods rely on computational folding alone and cannot capture the structural state inside the cell. Our research will reveal the structures that RNA molecules adopt as they perform their functions, and their conformational changes between states. On this basis, the laboratory will analyze the relationships among protein, RNA and other structures, as well as the properties of structure space. In particular, we will develop effective methods to predict or discover protein functional sites, RNA functional motifs and similar features, to further elucidate the principles governing the structure of biological macromolecules.

2、Biomacromolecular interaction network

Biological macromolecules (proteins, RNA, and DNA) carry out their functions by interacting with other macromolecules, and the patterns of these interactions are structurally conserved. Recently, we developed a method, PrePPI, that uses the conserved patterns of interacting structures to accurately predict protein-protein interactions genome-wide. We will further develop the PrePPI algorithm and apply it to more, and newer, protein-protein interaction networks. For certain species and systems, we will collaborate to develop a new method that optimizes structural models with experimental information (such as electron microscopy data), in order to reveal the principles governing the evolution and dynamics of protein interaction networks in different environments, as well as the fine structure of supercomplexes.
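One common way to combine several kinds of evidence for an interaction, structural and otherwise, is to multiply their likelihood ratios under a naive-Bayes independence assumption. The sketch below illustrates that general idea only. How PrePPI scores interactions is not described here, and the function name, the example likelihood ratios and the prior odds are all hypothetical.

```python
from math import prod
from typing import Sequence


def interaction_probability(likelihood_ratios: Sequence[float],
                            prior_odds: float) -> float:
    """Posterior probability that two proteins interact.

    Each likelihood ratio is P(evidence | interact) / P(evidence | not
    interact) for one evidence source. The sources are assumed to be
    independent (a naive-Bayes simplification), so the ratios multiply.
    """
    if prior_odds < 0 or any(lr < 0 for lr in likelihood_ratios):
        raise ValueError("odds and likelihood ratios must be non-negative")
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical example: a structural-modeling score worth a likelihood ratio
# of 10 and a co-expression score worth 5, with prior odds of 1 in 100.
p = interaction_probability([10.0, 5.0], prior_odds=0.01)
```

With these made-up numbers the posterior odds are 0.5, so the pair is still more likely not to interact than to interact. The example shows why strong, independent lines of evidence are needed when true interactions are rare among all possible protein pairs.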

Similar structure-based modeling methods can be applied to predicting protein-RNA interactions to improve accuracy and sensitivity. The difficulty is that our knowledge of protein-RNA interactions is very limited. We participated in the development of an experimental technique (ChIRP-MS) that uses tiling oligonucleotide probes and protein mass spectrometry to detect the proteins that interact with a specific RNA. Unlike the popular protein-centric CLIP experiments, ChIRP is RNA-centric and can be used to investigate in depth the interactions and functions of particular RNAs (especially ncRNAs) that have important functions or are abnormal in disease.

3、Evolution and functional classification of long non-coding RNAs

A growing number of functional studies have found that long non-coding RNAs (lncRNAs) are involved in many biological processes, including epigenetic regulation. In recent years in particular, studies have found that abnormal expression, mis-splicing or even single-base mutations of lncRNAs are closely associated with human diseases, including cancer.

Comparing sequences between species and analyzing their evolution helps to understand sequence-function relationships. This includes identifying sequence segments or features that matter for lncRNA function, identifying homologous lncRNAs that perform the same function in different species, and grouping lncRNAs with similar functional mechanisms. However, because lncRNA sequences are poorly conserved and annotation is incomplete across species, it is very difficult to study the evolution and function of lncRNAs systematically.

In our previous work, we successfully identified orthologs of the lncRNA roX in Drosophila, despite their low sequence identity, based on evolutionary collinearity (synteny) and the distribution pattern of motifs in the sequence. The laboratory will extend this approach: using genome-wide comparative analysis that includes more lncRNA features, combined with machine learning and evolutionary analysis, we will systematically study the evolution of lncRNAs across vertebrate taxa and identify potential orthologous lncRNAs. Our research aims to establish a relatively complete lncRNA analysis pipeline and to provide new clues for understanding the evolution of vertebrate lncRNAs, especially the conserved elements in their sequences and their possible functional significance.
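The two signals mentioned above, conserved gene neighborhood and a shared motif pattern, can be combined in a very simple filter. The sketch below is only an illustration of that combination and not the published analysis: the `Locus` type, the rule of sharing at least one orthologous flanking gene, the motif string and the `min_motifs` threshold are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Locus:
    """A lncRNA locus with its two neighboring protein-coding genes."""
    name: str
    flanking_genes: Tuple[str, str]
    sequence: str


def count_motif(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def is_syntenic(a: Locus, b: Locus, gene_orthologs: Dict[str, str]) -> bool:
    """True if a flanking gene of a has its ortholog flanking b."""
    mapped = {gene_orthologs.get(gene) for gene in a.flanking_genes}
    mapped.discard(None)
    return bool(mapped & set(b.flanking_genes))


def is_candidate_ortholog(a: Locus, b: Locus,
                          gene_orthologs: Dict[str, str],
                          motif: str, min_motifs: int = 2) -> bool:
    """Require both conserved neighborhood and repeated motif occurrences."""
    return (is_syntenic(a, b, gene_orthologs)
            and count_motif(a.sequence, motif) >= min_motifs
            and count_motif(b.sequence, motif) >= min_motifs)


# Toy data: the two loci share little sequence but sit next to orthologous
# genes (g1 and h1) and both carry the placeholder motif "GTT" twice.
locus_a = Locus("lnc_a", ("g1", "g2"), "GTTACGTT")
locus_b = Locus("lnc_b", ("h1", "h9"), "CCGTTGTT")
locus_c = Locus("lnc_c", ("x1", "x2"), "GTTGTTGTT")
orthologs = {"g1": "h1"}

match_ab = is_candidate_ortholog(locus_a, locus_b, orthologs, "GTT")
match_ac = is_candidate_ortholog(locus_a, locus_c, orthologs, "GTT")
```

Here `locus_c` carries the motif but lies in an unrelated neighborhood, so it is rejected. In practice each criterion would be scored rather than thresholded, which is where additional features and machine learning come in.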

4、Molecular mechanisms of RNA viruses

The genetic material of RNA viruses is RNA. RNA viruses are more variable than DNA viruses because their replication lacks error-correction mechanisms. They include many viruses that cause common infectious diseases, such as influenza virus, HIV, the SARS and MERS coronaviruses, and Ebola virus, as well as most plant viruses. These viruses have caused heavy loss of life and enormous economic damage to human society. Our laboratory will combine the structures and molecular interaction networks of RNA and proteins to study the molecular mechanisms by which RNA viruses invade cells and cause infectious disease, and to explore possible therapeutic approaches.

5、Complex diseases and precision medicine

There is growing evidence that protein-protein interaction networks with structural information can help us understand the mechanisms of complex diseases with greater precision. With structural information, we can predict the different consequences of different gene mutations at higher resolution. Compared with the limited number of protein complexes in the PDB, the global protein-protein and protein-RNA interaction networks built by structural modeling will greatly enhance our ability to identify mutations that affect protein structure and interactions. In particular, cancer genome research has found millions of cancer-related mutation sites. Our analysis shows that, even though most of these studies use exome sequencing, which focuses on protein-coding regions, more than a quarter of the mutations are either in non-coding regions or do not change the encoded protein. Such recurrent mutations in the cancer genome cannot be understood at the protein level, but our research suggests that they may cause disease by altering RNA structure and perturbing the protein-RNA interaction network.