Course No. 8 "Microbial Genome Analysis"


Problem 1

We have the following goals in analyzing microbial genomic sequences:

  •     Identify organisms with complete genome sequences

  •     Obtain information about microbes such as genome size, natural environment, and pathogenicity

  •     Download information about proteins encoded by a particular genome

  •     Compare genomes of various microbes

  •     Identify genes that are present in two organisms and/or absent in a third organism

  •     Identify neighboring genes in multiple organisms that share a gene of interest

  •     Find summary information and pathway maps for microbial genes

NCBI’s Genome Database

From the main page of the database, click on the Microbes link to get to the Genome Project page for prokaryotes. Click on the Prokaryotic Projects link, then click on the Prokaryotes tab.

To access entries for a particular organism, such as Escherichia coli, use the Search by organism box at the top. Click on the Escherichia coli link. Click on the genome Annotation Report link. The link in the Proteins column or the See Protein Details link provides detailed information about the proteins annotated on the genome. Click on the See Protein Details link.

Click on the Return to Genome Overview link. Click on the Genome Project Report link to access information about Escherichia coli genomes, such as their size and GC content, and links to the genome and protein sequences. Note the different genome sizes for different subspecies/strains. Click on the genome Annotation Report link.
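The genome size and GC content reported on this page can also be computed directly from a downloaded sequence. The following is a minimal sketch, assuming the sequence is already available as a plain string; the function name genome_stats and the toy sequence are illustrative and are not taken from NCBI.

```python
def genome_stats(sequence: str) -> tuple[int, float]:
    """Return (length in bp, GC fraction) for a nucleotide sequence.

    Ambiguous bases such as N count toward the length but not toward GC.
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    gc = seq.count("G") + seq.count("C")
    return length, gc / length


# Toy sequence for illustration only; this is not real E. coli data.
toy = "ATGCGCGTTAACCGGT"
size, gc_fraction = genome_stats(toy)
print(f"{size} bp, GC content {gc_fraction:.1%}")
```

Running the same function on the sequences of two strains gives a quick check of the size differences noted above.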

Scroll down to the entry for O157:H7 str. TW14359. Click on the PRJNA59235 link and then on the more link in the description. Note the differences listed that distinguish pathogenic O157:H7 from non-pathogenic K-12.

Integrated Microbial Genomes (IMG)

Go to the IMG site.

Select the Find Genes tab and then click on the Phylogenetic Profilers > Single Genes option.

Select the radio button in the first column “Find Genes In” for Escherichia coli O157:H7 str. TW14359, second column “With Homologs In” for Escherichia coli O157:H7 Sakai, and the third column “Without Homologs In” for Escherichia coli str. K-12 substr. MG1655. The first two are pathogenic and the third is not.

Click on the Go button at the bottom of the page.
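The selection made in the three columns amounts to a set query: keep the genes of the first genome that have a homolog in the second genome and none in the third. The sketch below illustrates only that logic. The homolog table, the gene names, and the function profile_genes are hypothetical, and it makes no claim about how IMG computes homology.

```python
def profile_genes(homologs, with_in, without_in):
    """Return query genes that have a homolog in every genome in `with_in`
    and in none of the genomes in `without_in`."""
    selected = []
    for gene, genomes in homologs.items():
        if with_in <= genomes and not (without_in & genomes):
            selected.append(gene)
    return sorted(selected)


# Hypothetical table: gene in the query genome -> genomes that contain a homolog.
homologs = {
    "gene_a": {"Sakai", "K-12"},  # shared with both, so excluded
    "gene_b": {"Sakai"},          # shared only with the pathogen, so kept
    "gene_c": set(),              # unique to the query genome, so excluded
    "gene_d": {"K-12"},           # shared only with K-12, so excluded
}
hits = profile_genes(homologs, with_in={"Sakai"}, without_in={"K-12"})
print(hits)
```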

The Summary Statistics table provides access to functional classifications of these proteins based on COG, Enzyme, Pfam, etc. The larger table on the page provides additional information for each protein, such as its identifier, length, and functional classifications. From the Summary Statistics table, select the COG functional category to study proteins specific to this pathogenic strain.

Clicking on the number to the right of the "Intracellular trafficking, secretion, and vesicular transport" category label may reveal proteins associated with pathogenicity.
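The per-category numbers in such a summary are simple tallies of genes by functional category. The following minimal sketch assumes the gene list has been exported as records with a list of category labels; the record layout, the gene identifiers, and the function count_by_category are hypothetical.

```python
from collections import Counter


def count_by_category(genes):
    """Count genes per functional category; a gene may carry several labels."""
    counts = Counter()
    for gene in genes:
        for category in gene["categories"]:
            counts[category] += 1
    return counts


TRAFFICKING = "Intracellular trafficking, secretion, and vesicular transport"

# Hypothetical records for illustration; these are not IMG results.
genes = [
    {"id": "g1", "categories": [TRAFFICKING]},
    {"id": "g2", "categories": [TRAFFICKING, "Cell motility"]},
    {"id": "g3", "categories": ["Transcription"]},
]
summary = count_by_category(genes)
for category, n in summary.most_common():
    print(f"{n:3d}  {category}")
```

Because a gene can belong to more than one category, the tallies can add up to more than the number of genes.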

The list includes fimbrial proteins, adhesins, type II secretory proteins, hemolysin activator protein, etc. These proteins contribute to the pathogenicity of E. coli. From the protein list, select the protein labeled "putative adhesin" with gene_id 644924025.

This will lead you to a Gene Detail page with information about the gene, such as links to its DNA and protein sequences, function and domains, neighboring genes (neighborhood), and conserved neighborhood. Click on the “Show ortholog neighborhood regions” link.

The result shows that this gene (red bar in the middle) is present in other pathogenic O157 strains. The result also displays some of the differences among the pathogenic strains. For example, the rhsA protein in the rhs element, shown by a purple bar, is present in other pathogenic O157:H7 strains but absent in str. TW14359.
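The comparison made by eye in the neighborhood view can be expressed as a difference between the sets of genes flanking the gene of interest in two genomes. The sketch below assumes each genome is given as an ordered list of ortholog labels; the labels, the flank size, and both function names are hypothetical and do not reflect IMG's implementation.

```python
def neighborhood(gene_order, gene, flank=2):
    """Return the genes within `flank` positions of `gene` in an ordered list."""
    i = gene_order.index(gene)  # raises ValueError if the gene is absent
    return gene_order[max(0, i - flank): i + flank + 1]


def missing_neighbors(reference, other, gene, flank=2):
    """Genes flanking `gene` in `reference` that do not flank it in `other`."""
    ref_set = set(neighborhood(reference, gene, flank))
    other_set = set(neighborhood(other, gene, flank))
    return sorted(ref_set - other_set)


# Hypothetical gene orders (ortholog labels); not taken from IMG.
strain_1 = ["o1", "o2", "target", "o3", "o4"]
strain_2 = ["o1", "o2", "target", "o4", "o5"]
print(missing_neighbors(strain_1, strain_2, "target"))
```

Swapping the two arguments reports the neighbors that are present only in the second strain.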

MetaCyc and BioCyc

Access the BioCyc web page. Type ‘acid resistance’ in the search box in the top right-hand corner and click on the “Quick Search” button.

The result lists two major acid resistance pathways, which depend on the availability of the amino acids glutamate and arginine. Click on the "arginine dependent acid resistance" link.

This opens the pathway page. One can opt to see ‘more detail’ (detailed structure and biochemistry of the enzymatic reactions) or ‘less detail’ (overview of the pathway). Click on the ‘Species Comparison’ button to study whether this pathway is present in other species. Click on the adiA link in the Operons column. Click on the Select Allowed Organisms button.

Select four enteric pathogens associated with food-borne infections: E. coli O157:H7 str. Sakai, E. coli O157:H7 str. TW14359, Shigella boydii Sb227, and Salmonella enterica serovar Typhimurium str. LT2 (keeping K-12 substr. MG1655 selected).

Click on the OK button at the bottom of the page. Then click on the ‘Align in Multi-Gene Browser’ button. All three components of arginine dependent acid resistance (adiA, arginine decarboxylase; adiY, AraC-type transcriptional regulator; and adiC/yjdB, arginine-agmatine transporter) are present in a near-identical arrangement in all five organisms. Although this pathway was originally considered to be absent in Shigella and Salmonella, it was later discovered that an arginine dependent acid resistance pathway is indeed operative in Salmonella in response to different physiological stimuli.


Questions, Comments:  Medha Bhagwat, PhD