
Subject: Bioinformatics

Lesson: Applications of Bioinformatics

Lesson Developer: Arun Jagannath

College/ Department: Sri Venkateswara College, University of Delhi

Chapter: Applications of Bioinformatics

• Introduction
• Bioinformatics and Genomics
  • Genome and genome sequencing projects
  • The Human Genome Project (HGP)
  • Gene prediction methods
  • Comparative genomics and functional genomics
  • Pharmacogenomics
  • Next Generation Sequencing
• Bioinformatics and Protein Structure Prediction
  • Levels of protein architecture
  • Explosion in the growth of biological sequence and structure data
  • Computational approaches to protein structure prediction
• Bioinformatics in Drug Discovery and Development
  • Drug Discovery & Development: A difficult and expensive problem
  • Where are computational techniques used?
  • Target identification and validation
  • Target Structure Prediction
  • Binding Site Identification and Characterization

Introduction

Bioinformatics is the application of information technology (IT) to biological data. It helps us understand biological processes and involves the development and application of computational techniques to analyse and interpret biological problems. Major research efforts in bioinformatics and computational biology include sequence alignment, genome annotation, protein structure prediction and drug discovery. The challenges facing bioinformatics and its areas of potential application are shown below.

Figure : Challenges in bioinformatics Source: Author

Figure: Major research areas in bioinformatics Source: Author

Bioinformatics and Genomics

Genome and Genome Sequencing Projects

The word "genome" was coined by Hans Winkler in 1920, from the German “Genom”. The genome is the total DNA present in a given cell. In most human cells, the genome is packaged into two sets of chromosomes, one set inherited from the mother and one from the father. Each set comprises about 3 billion base pairs of DNA. DNA is made up of four nucleotides (letters): A, T, G, and C. Just as the letters of the alphabet combine into words that tell a story in a book, the four bases A, T, G and C spell out the information in our genomes.

Genomics is the study of genomes, the complete genetic material of an organism. Genome studies include determining the complete DNA sequence of a genome and annotating its genes in order to understand the structural and functional aspects of the genome.

The Human Genome Project (HGP) aimed to determine the complete genome sequence in a human cell. It also aimed at identifying and mapping the genes and the non-gene regions in the human genome.

Some key findings of the draft (2001) and complete (2004) human genome sequences included:

• The total number of genes in the human genome was estimated to be around 20,500.
• Gene expression studies helped us understand some human diseases and disorders.
• Primate-specific genes were identified in the human genome.
• Some vertebrate-specific protein families were identified.
• The role of junk DNA began to be elucidated.
• It was estimated that only 483 targets in the human body accounted for all the pharmaceutical drugs on the global market.

Figure: The Human Genome Project Source: TIME Magazine

How was the whole genome sequenced?

The human genome was sequenced by two different methods: Hierarchical Genome Shotgun (HGS) sequencing and Whole Genome Shotgun (WGS) sequencing.
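Both methods rest on the shotgun idea: the DNA is broken into overlapping fragments, the fragments are sequenced, and the original sequence is reassembled computationally from the overlaps. The following is only a toy sketch of that reassembly step, assuming error-free reads and a simple greedy merge. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are illustrative choices, not taken from any real assembler, and the assemblers used in the Human Genome Project were far more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads drawn from the made-up sequence ATGGCGTACGTTAGC
contigs = greedy_assemble(["CGTTAGC", "ATGGCGTA", "CGTACGTT"])
print(contigs)
```

Greedy merging can be misled by repeated sequence, which is one reason a hierarchical strategy first maps large clones and only then shotgun-sequences each clone.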

Figure: Two approaches for genome sequencing in the Human Genome Project

Why do we want to determine the sequence of DNA of an organism?

Homology based gene prediction tools

Name | Algorithm | Organism | URL
GeneWise | Dynamic Programming | Human | sanger.ac/resources/software/
ORFgene2 | Dynamic Programming | Human, mouse, Drosophila, Arabidopsis | itb.cnr/sun/webgene/
PredictGenes | Dynamic Programming | Invertebrates, vertebrates, plants | cbrg.ethz/Server
PROCRUSTES | Dynamic Programming | Vertebrates | www-hto.usc/software/procrustes/
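In contrast to homology-based tools, which rely on similarity to known genes, ab initio gene finders work from signals in the sequence itself. The simplest such signal is an open reading frame (ORF): a start codon followed in the same frame by a stop codon. The sketch below is a minimal illustration of that idea only, scanning the three forward-strand frames; it is not how any of the tools tabulated in this lesson is implemented, and the function name `find_orfs` and the `min_codons` cutoff are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is the 0-based index of the A in ATG; end is exclusive and
    includes the stop codon. min_codons counts ATG but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Made-up sequence containing one ORF: ATG AAA TTT GGG TAA
print(find_orfs("CCATGAAATTTGGGTAACC"))
```

Real gene finders add statistical models of codon usage, splice sites and other signals on top of this kind of scan, particularly for eukaryotic genes with introns.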

Ab initio based gene prediction tools

Name | Algorithm | Organism | URL
GeneMark | Hidden Markov Model | Prokaryotes, eukaryotes | opal.biology.gatech/GeneMark/
GENEFINDER | Dynamic Programming | Human, mouse, Drosophila, yeast | rulai.cshl/tools/genefinder/
GENSCAN | Hidden Markov Model, Dynamic Programming | Vertebrates, maize, Arabidopsis | genes.mit/GENSCANinfo.html
GRAIL | Dynamic Programming, Neural Network | Human, mouse, Arabidopsis, Drosophila | compbio.ornl/Grail-bin/EmptyGrailForm
HMMgene | CHMM | Vertebrates, C | cbs.dtu/services/HMMgene/hmmgene1_1
ChemGenome | Physicochemical Model | Prokaryotes, eukaryotes | scfbio-iitd.res/chemgenome
Genie | Hidden Markov Model, Dynamic Programming | Drosophila, human | fruitfly/seq_tools/genie.html
GeneParser | Dynamic Programming, Neural Network | Vertebrates | home.cc.umanitoba/~psgendb/birchdoc/package/GENEPARSER

Comparative genomics and Functional Genomics

Comparative genomics is the analysis and comparison of genomes from two or more different organisms. It is used to gain a better understanding of how a species has evolved and to study phylogenetic relationships among different organisms. One of the most widely used sequence similarity tools available in the public domain is the Basic Local Alignment Search Tool (BLAST). BLAST is a set of programs that find regions of local similarity between a query sequence (nucleotide or protein) and the sequences in a database.
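The local alignment that gives BLAST its name can be illustrated with the Smith-Waterman dynamic programming recurrence, which finds the best-scoring local alignment between two sequences exactly. BLAST itself is a faster heuristic that approximates this result, so the sketch below is not BLAST's algorithm. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b, using a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can start anywhere
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared segment ACG is found despite unrelated flanking sequence
print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

Because the score is reset to zero whenever it would go negative, dissimilar flanking regions do not penalise a conserved segment, which is what makes local alignment suitable for comparing genes embedded in otherwise divergent genomes.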

environmental genomics or community genomics. It provides solutions to fundamental questions in microbial ecology and genomic analysis of microorganisms.

INTERESTING FACTS ON GENOMICS

• Every cell of the human body contains a complete set of DNA that makes up the genome, with the exception of egg and sperm cells, which carry half of the human genome.
• Some cells, such as red blood cells, have no DNA at all.
• The sequencing of the human genome was completed in 2003. Both female (blood) and male (sperm) samples were processed for the human genome sequencing project.
• Genetic variation among humans, chimpanzees and gorillas shows that humans are genetically closer to chimpanzees than to gorillas.
• A major part of our DNA whose function is unknown is referred to as junk DNA.
• The human genome is 3 billion bases of DNA packaged into 46 chromosomes (22 pairs of autosomes and 1 pair of sex chromosomes). It would take about a century to recite the complete sequence at a rate of one letter per second, 24 hours a day.
• Our DNA differs from person to person by only 0.2 percent (1 in 500 bases).
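The figures in this list can be checked with simple arithmetic. The short calculation below uses only numbers quoted in the text (3 billion bases, one letter per second, 1 differing base in 500) and assumes a 365-day year.

```python
GENOME_BASES = 3_000_000_000           # genome size quoted in the text
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # assumes a 365-day year

# Reciting one letter per second, around the clock
years_to_recite = GENOME_BASES / SECONDS_PER_YEAR

# "1 in 500 bases" expressed as a percentage and as a base count
percent_difference = 100 * 1 / 500
differing_bases = GENOME_BASES // 500

print(f"{years_to_recite:.1f} years to recite the genome")
print(f"{percent_difference} percent difference, about {differing_bases:,} bases")
```

The recitation works out to roughly 95 years, consistent with "about a century", and 1 in 500 bases corresponds to 0.2 percent, or about 6 million differing bases.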

Next Generation Sequencing

The advancement of molecular biology has been driven principally by the ability to sequence DNA. Over the past eight years, massively parallel sequencing platforms have transformed the field by reducing the cost of sequencing more than two-fold. Previously, Sanger sequencing (‘first-generation’ sequencing technology) was the sole conventional technique used to sequence the genomes of several organisms. In contrast, NGS platforms rely on high-throughput, massively parallel sequencing, in which millions of DNA fragments from a single sample are sequenced simultaneously. NGS makes it possible to sequence an entire genome in less than a day. The speed, accessibility and cost of the newer sequencing technologies have accelerated present-day biomedical research.

These technologies have large-scale applications that extend beyond genome sequencing. The NGS platforms most regularly used in research and diagnostic labs today are the Life Technologies Ion Torrent Personal Genome Machine (PGM), the Illumina MiSeq, and the Roche 454 Genome Sequencer. NGS platforms rapidly generate sequencing read data on the gigabase scale. Analysing NGS data is therefore the major challenge, as it can be time-consuming and requires advanced skills to extract the maximum accurate information from the sequence data. A massive computational effort, along with in-depth biological knowledge, is needed to interpret the enormous volume of NGS data.

Table: Next-Generation Sequencing Platforms

Bioinformatics and Protein Structure Prediction

Proteins are linear polymers of amino acids joined by peptide bonds. Every protein adopts a unique three-dimensional structure to form its native state. It is this native 3D structure

Next-generation sequencing technologies employ different techniques, but all share the ability to sequence more DNA base pairs per sequencing run than earlier methods such as Sanger sequencing.

Manufacturer | Technique | Run Time | Per Read Base Length | Cost (in 000s)
Helicos | Reversible Terminator | 8 days | 32 | $
Illumina | Reversible Terminator | 4–9 days | 75–100 | $500–900
Ion Torrent | Real-time | <1 day | 964 | $
Roche/454 | Pyrosequencing | <1 day | 330 | $500–700
SOLiD | Sequencing By Ligation | 7–14 days | 50 | $600–700

Adapted from Mol Ecol Resour 2011;11:759–69; Nat Rev Genet 2010;11:31–46; Am J Clin Path 2011;136:527–39.

Figure: Growth of structures in PDB. The red bars show the cumulative total number of structures, while the blue bars show the number of structures added to PDB in that particular year.

Source: rcsb

Computational approaches to protein structure prediction

There are three different methods of protein 3D structure prediction using computational approaches

Comparative Protein Modeling or Homology Modeling

Homology modeling predicts the structure of a protein on the assumption that homologous proteins share very similar structures because, during the course of evolution, structures are more conserved than amino acid sequences. A model is therefore built from a good alignment between the query sequence and a template of known structure. In general, a model can be predicted when the sequence identity is more than 30%. The more closely homologous the sequences, the more accurate the model.
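The 30% rule of thumb can be applied directly to an alignment between the query and a candidate template. The sketch below computes percent identity over the aligned, non-gap columns and compares it with that threshold. Definitions of sequence identity vary in what they use as the denominator, so this is one reasonable convention and not the method of any particular tool; the function names and the example sequences are made up for illustration.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of aligned (non-gap) column pairs that are identical."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def usable_template(aligned_query, aligned_template, threshold=30.0):
    """True when identity exceeds the rule-of-thumb threshold from the text."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Hypothetical pre-aligned sequences; '-' marks a gap
print(percent_identity("MKT-LV", "MRTALV"))
print(usable_template("MKT-LV", "MRTALV"))
```

In practice the alignment itself would come from a search against proteins of known structure, and template quality and coverage matter as well as identity.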

Table: Some protein structure prediction software tools

Tool | Prediction method
3D-JIGSAW | Homology Modeling
CPHModel | Homology Modeling
SWISS-MODEL | Homology Modeling
ESyPred3D | Homology Modeling
MODELLER | Homology Modeling
PHYRE | Threading or Fold Recognition
BHAGEERATH | Ab initio method
I-TASSER | Ab initio method
ROBETTA | Ab initio method
Rosetta@home | Ab initio method

Protein Threading

If the query sequence shows no detectable similarity to any sequence of known structure, threading, or fold recognition, is employed to model the protein. Threading predicts the structure of a protein by matching its sequence to each member of a library of known folds and checking whether there is a statistically significant fit with any of them.
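The matching step can be pictured with a deliberately simplified scoring scheme. In the toy sketch below, each fold in the library is reduced to a string of position environments ('B' for buried, 'E' for exposed), and a sequence fits a fold well when hydrophobic residues fall on buried positions and polar residues on exposed ones. Everything here is an assumption made for illustration: the hydrophobic residue set, the +1/-1 scoring, the fold names and the requirement that sequence and fold have equal length. Real threading programs use far richer energy functions, allow gaps, and assess the statistical significance of the fit, which this sketch omits.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed classification, for illustration only


def fit_score(sequence, burial_pattern):
    """+1 when a residue suits its position's environment, -1 otherwise."""
    if len(sequence) != len(burial_pattern):
        raise ValueError("toy model requires equal lengths (no gaps)")
    score = 0
    for residue, env in zip(sequence, burial_pattern):
        suits = (residue in HYDROPHOBIC) == (env == "B")
        score += 1 if suits else -1
    return score


def rank_folds(sequence, fold_library):
    """Score the sequence against every fold and sort from best to worst."""
    scores = {name: fit_score(sequence, pattern)
              for name, pattern in fold_library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-fold library and query sequence
library = {"fold_a": "BBEEBBEE", "fold_b": "EEEEBBBB"}
ranking = rank_folds("LVKDIFSE", library)
print(ranking)
```

The best-ranked fold would then be taken as the structural template, provided its score stands out clearly from the scores expected by chance.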

Ab initio method

Ab initio protein modeling is a database-independent approach based on exploring the physical properties of amino acids rather than previously solved structures. It assumes that a protein's native structure corresponds to its global free-energy minimum.

Computer-aided drug design (CADD) is a popular term that describes the many computational approaches used at various stages of a drug design project. It includes the development of online repositories of chemical compounds for generating hits, programs for prefiltering compounds with remarkable physicochemical characteristics, and tools for the systematic assessment of potential lead candidates before they are synthesized and tested in animal models.

Target Identification and Validation

The identification of new drug targets implicated in disease remains one of the major challenges in the drug discovery process. Target identification can be carried out by classical biochemical methods or computational systems biology approaches. Target validation includes evaluating a biomolecule physiologically and pharmacologically and also at the molecular, cellular, or whole organism level. It has been reported that all current drugs with a known mode-of-action act through 324 distinct molecular drug targets.

Figure : Classification of drug targets. More than 60% of the drug targets are membrane receptor proteins and enzymes.

Source: Author

Table: Some enzymes as drug targets and drugs developed.

Enzyme | Drug
Cyclooxygenase | Aspirin
Angiotensin converting enzyme | Captopril
Dihydrofolate reductase | Methotrexate
HIV protease | Saquinavir
Xanthine oxidase | Allopurinol
Carbonic anhydrase | Acetazolamide
Reverse transcriptase | AZT (Retrovir)

Target Structure Prediction

The targets generally selected for drug discovery are proteins. Most known protein structures have been determined experimentally by X-ray crystallography or NMR spectroscopy. The structure of a protein target can also be modeled computationally using one or a combination of three approaches: homology modelling, threading and ab initio modelling.

Figure: A model of the dehydrogenase reductase SDR family 7B (DHRS7B) protein predicted by homology modeling ( SWISS-MODEL). Source: Author
