388836521 Applications of Bioinformatics
Computer Programming Lab (CSC 170L)
Norfolk State University
Subject: Bioinformatics
Lesson: Applications of Bioinformatics
Lesson Developer: Arun Jagannath
College/ Department: Sri Venkateswara College, University of Delhi
Table of Contents
Chapter: Applications of Bioinformatics
Introduction
Bioinformatics and Genomics
Genome and genome sequencing projects
The Human Genome Project (HGP)
Gene prediction methods
Comparative genomics and functional genomics
Pharmacogenomics
Next Generation Sequencing
Bioinformatics and Protein Structure Prediction
Levels of protein architecture
Explosion in the growth of biological sequence and structure data
Computational approaches to protein structure prediction
Bioinformatics in Drug Discovery and Development
Drug Discovery & Development: A difficult and expensive problem
Where computational techniques are used?
Target identification and validation
Target Structure Prediction
Binding Site Identification and Characterization
Introduction
Bioinformatics is the application of information technology (IT) to biological data. It helps us understand biological processes and involves the development and application of computational techniques to analyse and interpret biological problems. Major research efforts in bioinformatics and computational biology include sequence alignment, genome annotation, protein structure prediction and drug discovery. The challenges facing bioinformatics and its areas of potential application are shown below.
Figure : Challenges in bioinformatics Source: Author
Figure: Major research areas in bioinformatics Source: Author
Bioinformatics and Genomics
Genome and Genome Sequencing Projects
The word "genome" was coined by Hans Winkler, from the German "Genom", as early as 1920. The total DNA present in a given cell is called its genome. In most human cells, the genome is packed into two sets of chromosomes, one set inherited from the mother and the other from the father. The human genome comprises about 3 billion base pairs of DNA. The four nucleotides (letters) that make up DNA are A, T, G and C. Just as the letters of the alphabet form the words and sentences that tell a story in a book, the four bases A, T, G and C spell out the information in our genomes.
Genomics is the study of genomes, the complete genetic material of an organism. Genome studies include determining the complete DNA sequence of a genome and annotating its genes to understand the structural and functional aspects of the genome.
The Human Genome Project (HGP) aimed to determine the complete genome sequence in a human cell. It also aimed to identify and map the genes and the non-gene regions of the human genome.
Some key findings of the draft (2001) and complete (2004) human genome sequences included:
- The total number of genes in the human genome was estimated to be around 20,500.
- Gene expression studies helped us understand some diseases and disorders in humans.
- Primate-specific genes were identified in the human genome.
- Some vertebrate-specific protein families were identified.
- The role of so-called junk DNA began to be elucidated.
- It was estimated that only 483 targets in the human body accounted for all the pharmaceutical drugs on the global market.
Figure: The Human Genome Project Source: TIME Magazine
How was the whole genome sequenced?
The human genome was sequenced by two different methods: hierarchical genome shotgun (HGS) sequencing and whole genome shotgun (WGS) sequencing.
Figure: Two approaches for genome sequencing in the Human Genome Project
Why do we want to determine the sequence of DNA of an organism?
Homology-based gene prediction tools
| Name | Algorithm | Organism | URL |
| --- | --- | --- | --- |
| GeneWise | Dynamic Programming | Human | sanger.ac/resources/software/ |
| ORFgene2 | Dynamic Programming | Human, mouse, Drosophila, Arabidopsis | itb.cnr/sun/webgene/ |
| PredictGenes | Dynamic Programming | Invertebrates, vertebrates, plants | cbrg.ethz/Server |
| PROCRUSTES | Dynamic Programming | Vertebrates | www-hto.usc/software/procrustes/ |
Ab initio based gene prediction tools
| Name | Algorithm | Organism | URL |
| --- | --- | --- | --- |
| GeneMark | Hidden Markov Model | Prokaryotes, eukaryotes | opal.biology.gatech/GeneMark/ |
| GENEFINDER | Dynamic Programming | Human, mouse, Drosophila, yeast | rulai.cshl/tools/genefinder/ |
| GENSCAN | Hidden Markov Model, Dynamic Programming | Vertebrates, maize, Arabidopsis | genes.mit/GENSCANinfo.html |
| GRAIL | Dynamic Programming, Neural Network | Human, mouse, Arabidopsis, Drosophila | compbio.ornl/Grail-bin/EmptyGrailForm |
| HMMgene | CHMM | Vertebrates, C | cbs.dtu/services/HMMgene/hmmgene1_1 |
| ChemGenome | Physicochemical Model | Prokaryotes, eukaryotes | scfbio-iitd.res/chemgenome |
| Genie | Hidden Markov Model, Dynamic Programming | Drosophila, human | fruitfly/seq_tools/genie.html |
| GeneParser | Dynamic Programming, Neural Network | Vertebrates | home.cc.umanitoba/~psgendb/birchdoc/package/GENEPARSER |
Comparative genomics and Functional Genomics
Comparative genomics is the analysis and comparison of genomes from two or more different organisms. It is used to gain a better understanding of how a species has evolved and to study the phylogenetic relationships among different organisms. One of the most widely used sequence similarity tools available in the public domain is the Basic Local Alignment Search Tool (BLAST). BLAST is a set of programs that find regions of local similarity between a query sequence and the sequences it is compared against, for both nucleotide and protein sequences.
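To make the idea of local sequence alignment more concrete, here is a minimal sketch of the Smith-Waterman algorithm, the exact dynamic-programming method for local alignment. It is not how BLAST itself works (BLAST uses a faster heuristic), and the scoring values (match +2, mismatch -1, gap -2) and the example sequences are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    The scoring scheme is an illustrative assumption, not a standard matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(0, diag, up, left)  # 0 restarts the alignment
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Hypothetical toy sequences sharing the segment "ACGT".
    print(smith_waterman("TTTACGTGGG", "CCACGTAA"))
```

Because the table is filled cell by cell, the cost grows with the product of the two sequence lengths, which is one reason heuristic tools are preferred for searching large databases.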
Metagenomics is also known as environmental genomics or community genomics. It provides solutions to fundamental questions in microbial ecology and genomic analysis of microorganisms.
INTERESTING FACTS ON GENOMICS
- Every cell of the human body contains a complete set of DNA that makes up the genome, with the exception of egg and sperm cells, which carry half of the human genome.
- Some cells, such as red blood cells, have no DNA at all.
- The sequencing of the human genome was completed in 2003.
- Both female (blood) and male (sperm) samples were processed for the human genome sequencing project.
- Genetic variation among humans, chimpanzees and gorillas shows that humans are more closely related to chimpanzees than to gorillas.
- A major part of our DNA, whose function is unknown, is referred to as junk DNA.
- The human genome is 3 billion bases of DNA organized into 46 chromosomes (22 pairs of autosomes and 1 pair of sex chromosomes).
- It would take about a century just to recite the complete sequence at a rate of one letter per second, 24 hours a day.
- Our DNA differs from one person to another by only 0.2 percent (1 in 500 bases).
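Two of the figures above can be checked with simple arithmetic. The sketch below assumes a genome of exactly 3 billion letters and a year of 365.25 days; the function names are made up for this example.

```python
GENOME_BASES = 3_000_000_000             # 3 billion letters, as stated in the text
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: average year length


def years_to_recite(bases, letters_per_second=1.0):
    """Years needed to read the sequence aloud without any breaks."""
    return bases / letters_per_second / SECONDS_PER_YEAR


def percent_difference(one_in_n):
    """Convert a '1 in N bases' difference into a percentage."""
    return 100.0 / one_in_n


if __name__ == "__main__":
    # Roughly 95 years, i.e. about a century.
    print(f"{years_to_recite(GENOME_BASES):.1f} years")
    # 1 difference in 500 bases corresponds to 0.2 percent.
    print(f"{percent_difference(500)} percent")
```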
Next Generation Sequencing
The advancement of molecular biology has been driven principally by the ability to sequence DNA. Over the past eight years, massively parallel sequencing platforms have transformed the field, reducing the cost of sequencing more than twofold. Previously, Sanger sequencing ('first-generation' sequencing technology) was the sole conventional technique used to sequence the genomes of several organisms. In contrast, next-generation sequencing (NGS) platforms rely on high-throughput, massively parallel sequencing, in which millions of DNA fragments from a single sample are sequenced simultaneously. This makes it possible to sequence an entire genome in less than a day. The speed, accessibility and cost of the newer sequencing technologies have accelerated present-day biomedical research.
These technologies have large-scale applications that extend beyond genome sequencing. The NGS platforms most commonly used in research and diagnostic laboratories today are the Life Technologies Ion Torrent Personal Genome Machine (PGM), the Illumina MiSeq and the Roche 454 Genome Sequencer. NGS platforms rapidly generate sequencing read data on the gigabase scale. Analysing NGS data is therefore a major challenge: it can be time-consuming and requires advanced skills to extract the maximum amount of accurate information from the sequence data. A massive computational effort, together with in-depth biological knowledge, is needed to interpret the enormous volume of NGS data.
Table: Next-Generation Sequencing Platforms
Next-generation sequencing technologies employ different techniques, but all share the ability to sequence more DNA base pairs per sequencing run than earlier methods such as Sanger sequencing.
| Manufacturer | Technique | Run time | Per-read base length | Cost (in 000s) |
| --- | --- | --- | --- | --- |
| Helicos | Reversible terminator | 8 days | 32 | $ |
| Illumina | Reversible terminator | 4–9 days | 75–100 | $500–900 |
| Ion Torrent | Real-time | <1 day | 964 | $ |
| Roche/454 | Pyrosequencing | <1 day | 330 | $500–700 |
| SOLiD | Sequencing by ligation | 7–14 days | 50 | $600–700 |
Source: Adapted from Mol Ecol Resour 2011;11:759–69; Nat Rev Genet 2010;11:31–46; Am J Clin Path 2011;136:527–39.
Bioinformatics and Protein Structure Prediction
Proteins are linear polymers of amino acids joined by peptide bonds. Every protein adopts a unique three-dimensional structure to form its native state. It is this native 3D structure
Figure: Growth of structures in PDB. The red bars indicate the cumulative total number of structures, while the blue bars indicate the number of structures for that particular year.
Source: rcsb
Computational approaches to protein structure prediction
There are three different computational approaches to predicting the 3D structure of a protein:
- Comparative Protein Modeling or Homology Modeling
Homology modeling predicts the structure of a protein on the assumption that homologous proteins share very similar structures because, over the course of evolution, structure is more conserved than amino acid sequence. A model is therefore generated from a good alignment between the query sequence and a template of known structure. In general, a model can be predicted when the sequence identity is more than 30%; the higher the sequence identity, the more accurate the model.
- Protein Threading
If the query shows no detectable sequence similarity to a protein of known structure, threading (fold recognition) is used to model the protein. Threading predicts the structure of a protein by matching its sequence to each member of a library of known folds and checking whether there is a statistically significant fit with any of them.
- Ab initio method
Ab initio protein modeling is a database-independent approach based on exploring the physical properties of amino acids rather than previously solved structures. It assumes that the native structure of a protein has the global minimum free energy.
Table: Some protein structure prediction software tools
| Tool | Prediction method |
| --- | --- |
| 3D-JIGSAW | Homology Modeling |
| CPHModel | Homology Modeling |
| SWISSMODEL | Homology Modeling |
| ESyPred3D | Homology Modeling |
| MODELLER | Homology Modeling |
| PHYRE | Threading or Fold Recognition |
| BHAGEERATH | Ab initio method |
| I-TASSER | Ab initio method |
| ROBETTA | Ab initio method |
| Rosetta@home | Ab initio method |
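The choice between the three approaches can be sketched as a simple decision rule built on the 30% sequence identity guideline mentioned for homology modeling. This is only an illustration: the identity definition used here (identical residues divided by aligned, gap-free columns) is one of several in use, and the function names, the `fold_match_found` flag and the toy sequences are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must already be aligned (same length, '-' marks a gap).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def suggest_method(identity, fold_match_found=False, threshold=30.0):
    """Suggest a modeling approach from the identity to the best template.

    threshold=30.0 follows the rule of thumb in the text; fold_match_found
    stands in for the outcome of a fold-recognition search.
    """
    if identity > threshold:
        return "homology modeling"
    if fold_match_found:
        return "threading"
    return "ab initio"


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    identity = percent_identity("MKT-AYIAKQR", "MKTLAYLA-QR")
    print(round(identity, 1), suggest_method(identity))
```

In practice the decision also depends on alignment coverage and template quality, which this sketch ignores.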
Bioinformatics in Drug Discovery and Development
Computer-aided drug design (CADD) is a popular term that describes the many computational approaches used at various stages of a drug design project. It includes the development of online repositories of chemical compounds for generating hits, programs for prefiltering compounds by their physicochemical characteristics, and tools for the systematic assessment of potential lead candidates before they are synthesized and tested in animal models.
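As an illustration of prefiltering compounds by physicochemical characteristics, the sketch below applies Lipinski's rule of five, a widely used drug-likeness filter that the text does not name explicitly. The `Compound` class, the compound names and all property values are hypothetical and serve only to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float       # molecular weight in g/mol
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(c):
    """Count how many of Lipinski's four criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def prefilter(compounds, max_violations=1):
    """Keep compounds with at most `max_violations` rule-of-five violations."""
    return [c for c in compounds if lipinski_violations(c) <= max_violations]


if __name__ == "__main__":
    # Hypothetical library entries with made-up property values.
    library = [
        Compound("cmpd_A", 350.0, 2.1, 2, 5),
        Compound("cmpd_B", 720.0, 6.3, 6, 12),
        Compound("cmpd_C", 510.0, 3.0, 1, 4),
    ]
    print([c.name for c in prefilter(library)])
```

A real pipeline would compute these properties from chemical structures with a cheminformatics toolkit rather than entering them by hand.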
Target Identification and Validation
The identification of new drug targets implicated in disease remains one of the major challenges in the drug discovery process. Target identification can be carried out by classical biochemical methods or by computational systems biology approaches. Target validation involves evaluating a biomolecule physiologically and pharmacologically, at the molecular, cellular or whole-organism level. It has been reported that all current drugs with a known mode of action act through 324 distinct molecular drug targets.
Figure : Classification of drug targets. More than 60% of the drug targets are membrane receptor proteins and enzymes.
Source: Author
Table: Some enzymes as drug targets and the drugs developed.
| Enzyme | Drug |
| --- | --- |
| Cyclooxygenase | Aspirin |
| Angiotensin converting enzyme | Captopril |
| Dihydrofolate reductase | Methotrexate |
| HIV protease | Saquinavir |
| Xanthine oxidase | Allopurinol |
| Carbonic anhydrase | Acetazolamide |
| Reverse transcriptase | AZT (Retrovir) |
Target Structure Prediction
The drug targets generally selected for drug discovery are proteins. Most protein structures have been determined experimentally by X-ray crystallography or NMR spectroscopy. The structure of a protein target can also be modeled computationally using one or a combination of three approaches: homology modeling, threading and ab initio methods.
Figure: A model of the dehydrogenase reductase SDR family 7B (DHRS7B) protein predicted by homology modeling (SWISS-MODEL). Source: Author