Course No. 12 "Gene Expression Omnibus (GEO)"


Problem 1

Gene Expression Omnibus

This class demonstrates how to search for an expression record in GEO, obtain a list of differentially expressed genes and find information about their pathway enrichment.

Topics to be covered include:

  • Types of databases (GEO DataSets and GEO Profiles)

  • Types of entries in GEO DataSets (Platform, Sample, Series and DataSet)

  • Searching options for GEO DataSets

  • Obtaining a list of differentially expressed genes for an experiment (using the analysis tools in a GEO DataSet or using GEO2R)

  • Links for accessing or downloading data, profiles and pathway enrichment

Access the NCBI home page.  Click on the Genes and Expression link on the left side of the page.  Notice the three listings: Gene Expression Omnibus (GEO) Database, Gene Expression Omnibus (GEO) DataSets and Gene Expression Omnibus (GEO) Profiles.  Click on the Gene Expression Omnibus (GEO) Database link.  Click on the Overview link listed under Documentation.  Note the different types of submitted entries and NCBI curated records, and their accession number prefixes.  Also note the contents of the two databases, GEO DataSets and GEO Profiles.
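
The accession number prefixes can be summarized in a short Python sketch. The mapping below (GPL, GSM, GSE, GDS) reflects the standard GEO prefixes; the function name geo_record_type is hypothetical and is used here only for illustration.

```python
# Standard GEO accession prefixes and the record types they denote.
GEO_PREFIXES = {
    "GPL": "Platform (submitted entry)",
    "GSM": "Sample (submitted entry)",
    "GSE": "Series (submitted entry)",
    "GDS": "DataSet (NCBI curated record)",
}


def geo_record_type(accession: str) -> str:
    """Return the record type for a GEO accession such as 'GSE35696'.

    Hypothetical helper: it only inspects the three-letter prefix and
    checks that the remainder is numeric.
    """
    acc = accession.strip().upper()
    prefix, digits = acc[:3], acc[3:]
    if prefix not in GEO_PREFIXES or not digits.isdigit():
        raise ValueError(f"not a recognised GEO accession: {accession!r}")
    return GEO_PREFIXES[prefix]


print(geo_record_type("GSE35696"))
print(geo_record_type("GDS4388"))
```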

Searching entries in GEO DataSets and downloading data

Go back to the GEO home page.  Click on the Search for studies at GEO DataSets link.  Click on the Advanced link.  Note the various options listed under All Fields for restricting your query, such as DataSet Type, Entry Type and Subset Variable Type.  Use the Show index list link to list the options available under a field.  In this example, however, we will not use any restriction.  Go back to the DataSets main page.

Type “breast cancer”, including the double quotes, in the Search box at the top and click on the Search button.  Note the number of different entry types, study types and organisms listed on the search results page.  Restrict the search by adding "AND leukemia inhibitory factor" (without the quotes) after "breast cancer" in the search box and clicking on the Search button.
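
The same query can also be sent programmatically. The following is a minimal sketch, assuming the NCBI E-utilities ESearch service with db=gds (the Entrez name of GEO DataSets); the helper name build_gds_search_url and the retmax default are illustrative choices, not part of this exercise.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gds_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that queries GEO DataSets (db=gds).

    Hypothetical helper: it only constructs the URL and does not
    contact NCBI.
    """
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# The query typed into the Search box in the step above.
query = '"breast cancer" AND leukemia inhibitory factor'
url = build_gds_search_url(query)
print(url)
```

Opening the resulting URL (for example with urllib.request.urlopen) should return a JSON document listing the matching record identifiers; the web interface remains the reference for the steps in this class.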

Note the first entry “Leukemia inhibitory factor effect on Sin3a-silenced MCF7 breast cancer cell line”.  Note the links to its Platform and Series records and the link to download data.  Click on the Series GSE35696 link.  Note the summary, overall design, number of samples and links to various download options.  Click on the Query GEO DataSets for GSE35696 link at the top of the page to access all 14 associated entries: 1 curated DataSet and the submitter-provided entries (1 Series, 1 Platform and 11 Samples).  Note the links to GEO Profiles in the GDS4388 entry and the Analyze with GEO2R link in the GSE35696 entry.
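
One of the download options on a Series page is the Series Matrix file. The sketch below builds its address, assuming the usual layout of the GEO download site, in which the last three digits of the accession are replaced by "nnn" in the parent directory name. The function name series_matrix_url is hypothetical, and Series that span several platforms use a different file name, so the link on the Series page should be treated as authoritative.

```python
import re


def series_matrix_url(gse: str) -> str:
    """Return the assumed Series Matrix download URL for a GSE accession.

    Assumption: single-platform Series, standard GEO directory layout.
    """
    match = re.fullmatch(r"GSE(\d+)", gse.strip().upper())
    if match is None:
        raise ValueError(f"not a Series accession: {gse!r}")
    digits = match.group(1)
    # e.g. GSE35696 -> GSE35nnn; accessions with up to 3 digits -> GSEnnn
    stub = "GSE" + digits[:-3] + "nnn"
    return (
        "https://ftp.ncbi.nlm.nih.gov/geo/series/"
        f"{stub}/GSE{digits}/matrix/GSE{digits}_series_matrix.txt.gz"
    )


print(series_matrix_url("GSE35696"))
```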

Analyzing data using tools in GEO DataSet

Click on the title “Leukemia inhibitory factor effect on Sin3a-silenced MCF7 breast cancer cell line” to access the DataSet Record GDS4388.  To get information about the samples, color coding and value distribution, click on the Experiment design and value distribution link and then on the details link.

Go back to the DataSet page.  To obtain a list of differentially expressed genes, use the Compare 2 sets of samples link.  Select the test and a significance level of 0.01.  Click on Select which samples to put in Group A and Group B.  Assign samples to Group A and Group B by clicking on them (SIN3A knockdown in Group A and control in Group B).  Click on the OK button.  Click on Query Group A vs. B.  You can download the profile data by using the link at the top.  (Links to the top 200 genes with similar profiles can be obtained from the Profile neighbours link.)  To sort the results page, choose Subgroup effect under Display Settings and click on the Apply button.  Information about pathways enriched in these genes can be obtained by using the Find pathways link.  Alternatively, the gene list (without fold changes) can be downloaded using the Database menu under Find related data (Select -> Gene) for input into the pathway analysis resource of your choice.
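
The idea behind comparing two sets of samples can be illustrated with a small, self-contained sketch. It does not reproduce the tests offered by GEO; as a stand-in it uses an exact two-sided permutation test on the difference in group means, with the same 0.01 cutoff. The gene names and expression values below are invented for illustration only.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Every way of splitting the pooled values into groups of the
    original sizes is enumerated; the p-value is the fraction of splits
    whose absolute mean difference is at least the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if diff >= observed - 1e-9:  # tolerance for float rounding
            hits += 1
    return hits / count


# Invented log2 expression values: (Group A, Group B) per gene.
toy_data = {
    "geneX": ([9.1, 9.3, 9.0, 9.4, 9.2], [7.0, 7.2, 6.9, 7.1, 7.3]),
    "geneY": ([8.0, 8.2, 7.9, 8.1, 8.3], [8.2, 8.0, 8.1, 7.8, 8.3]),
}

significant = [
    gene for gene, (a, b) in toy_data.items()
    if permutation_p_value(a, b) < 0.01
]
print(significant)
```

With five samples per group there are 252 possible splits, so the smallest attainable two-sided p-value is 2/252, just under 0.01; smaller groups cannot reach that cutoff with this particular test.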

Go back to the DataSet page.  Click on the Cluster heatmaps link.  Select the method and click on the Display button.  Select a particular area of interest and double click.  Search for a gene of interest such as PTPRG. 

Go back to the DataSet page.  Click on the Find Genes link.  Type PTPR* in the box next to Find gene name or symbol and click on Go.  The asterisk is a wildcard, so the search matches every gene symbol that begins with PTPR.
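
The effect of the trailing asterisk can be mimicked in a few lines of Python. This is only a sketch of prefix-style wildcard matching using the standard fnmatch module, not a description of how GEO implements its search; the symbol list is an arbitrary illustration and the function name find_genes is hypothetical.

```python
from fnmatch import fnmatchcase


def find_genes(pattern, symbols):
    """Return the symbols matching a shell-style wildcard pattern.

    Matching is made case-insensitive by upper-casing both sides.
    """
    pat = pattern.upper()
    return [s for s in symbols if fnmatchcase(s.upper(), pat)]


# Arbitrary example symbols; only PTPRG appears in this exercise.
symbols = ["PTPRG", "PTPRA", "PTPN1", "TP53"]
matches = find_genes("PTPR*", symbols)
print(matches)
```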

GEO2R:  The analysis links above are present only in the curated DataSet.  You can instead use the GEO2R link provided on the Series page to obtain a list of differentially expressed genes.  Access the GSE35696 page.  Click on the GEO2R link and then on the Define groups link to define groups.  Sort the sample names by clicking on Treatment.  Select samples by clicking on one sample and dragging.  Assign them to a group by clicking on the appropriate group.  Use the default options or choose options by clicking on the Options tab.  Click on Top 250 to view the top-ranked genes or on Save all results to save the full results table.
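
A saved results table can then be filtered outside the browser. The sketch below assumes a tab-delimited file with a header row that includes columns named ID and adj.P.Val; check these names against your own download, since they are an assumption here. The function name significant_rows is hypothetical and the two example rows are invented.

```python
import csv
import io


def significant_rows(handle, alpha=0.01, p_column="adj.P.Val"):
    """Return rows of a tab-delimited table whose p-value is below alpha.

    Rows with a missing or non-numeric value in p_column are skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        try:
            p_value = float(row[p_column])
        except (TypeError, ValueError):
            continue  # e.g. empty field or "NA"
        if p_value < alpha:
            kept.append(row)
    return kept


# Invented example standing in for a saved results file.
example = (
    "ID\tadj.P.Val\tlogFC\tGene.symbol\n"
    "probe_1\t0.002\t1.8\tGENE_A\n"
    "probe_2\t0.200\t0.1\tGENE_B\n"
    "probe_3\tNA\t0.0\tGENE_C\n"
)
hits = significant_rows(io.StringIO(example))
print([row["ID"] for row in hits])
```

For a real file, pass an open file object (for example open("results.tsv", newline="")) in place of the io.StringIO wrapper; the file name here is a placeholder.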


Questions, Comments:  Medha Bhagwat, PhD