
Course No. 10 "Gene Expression Microarray Data Analysis"



Problem 2


Psoriasis is an inflammatory skin disease that is usually chronic.  Its molecular mechanisms were not well understood, which made it a good candidate for study with microarray technology.  Researchers took samples from both affected and unaffected skin regions of patients with psoriasis.  In this problem, students find genes differentially expressed between these two regions using microarray data from the NCBI public database Gene Expression Omnibus (GEO), under the identifier GSE2737 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2737). The data are analyzed with BRB-Array Tools (http://linus.nci.nih.gov/BRB-ArrayTools.html) from NCI.


The steps are as follows (steps 1-3 can be skipped if these files are already provided).


  1. Download data from the GEO page http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2737 , under the section “Supplementary file”, by clicking on the link “http”.  (A direct link to the data is provided here: Skin_PsoriasisInactiveVsActive.zip.)  Extract the data from the zip file.
  2. Annotating the genes:  Unlike Problem 1, we will let BRB-Array Tools find the annotation file for us.  This process can be time consuming, but these arrays have far fewer genes, so the software should need only around ten minutes when we get to this step. (Alternatively, one could download the annotation file from the GEO platform link or the vendor web site and import it.)
  3. Create an experiment description table describing the experiment.  Use the file names of the microarray data files as the array identifiers in the first column (file name extensions such as “.cel” should not be included).  The second column should contain the skin type (active or inactive).  This information can be found at the GEO web site for GSE2737 under the heading “Samples”: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2737 .  Use only the Skin_Psoriasis Inactive and Active samples.  Alternatively, one can use the file provided here: ExpDescFile2.xlsx.
  4. Import and preprocess the data:   Open Excel and click on “Add-Ins” to access “ArrayTools”.  Click on “ArrayTools”, and select “Import data” and then “Data import wizard”.  In the Data Import Wizard, select “Affymetrix .CEL Data” as the “Data Type”.  Select “The expression data are in separate files stored in one folder” as the “File Type”.  Click on the browse button. Find the Desktop folder “Skin_PsoriasisActiveVsInactive”.  Select this folder and click “OK”. In the “Data Import Wizard”, also click “OK”.  Click “Yes” when asked if 8 arrays are the correct number of arrays.  Select “justRMA” as the “method to analyze your Affymetrix CEL files”, and then click “OK”.
  5. Annotate the data:  Continuing with the “Options for Annotation”, select “Use Bioconductor packages for annotation”, and click “OK”.  At this point you may be asked whether to install a package from Bioconductor.  If this occurs, click “Yes”.
  6. Open the experiment description file:  Continuing with the Experiment Descriptor File (ExpDescFile2.xlsx), click the “Browse” button.  Select the Desktop file ExpDescFile2.xlsx and click “Open”.  Then click the button “Next”.
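
The experiment description table from step 3 can also be built with a short script rather than by hand. The following Python sketch is only an illustration: the file names and skin-type labels are hypothetical placeholders (the real ones come from the GEO “Samples” list), and the column header “Array” is an assumption. The header “Type” matches the class variable selected later in the analysis.

```python
import csv
import io
from pathlib import PurePath


def build_descriptor_rows(cel_file_names, skin_type_by_array):
    """Return a header row plus one (array id, skin type) row per CEL file."""
    rows = [("Array", "Type")]  # "Array" is an assumed header name
    for name in sorted(cel_file_names):
        array_id = PurePath(name).stem  # file name without the ".CEL" extension
        rows.append((array_id, skin_type_by_array[array_id]))
    return rows


def to_tab_delimited(rows):
    """Serialize the rows as tab-delimited text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical file names and labels, for illustration only.
example_files = ["sample_A.CEL", "sample_B.CEL"]
example_types = {"sample_A": "Active", "sample_B": "Inactive"}

descriptor_text = to_tab_delimited(
    build_descriptor_rows(example_files, example_types)
)
print(descriptor_text)
```

The resulting text can be pasted into a spreadsheet and saved in the format that the import wizard expects.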


  1. Manage the output file location:  Name the project folder by typing “SkinPsoriasis-Project” into the text box to the right of “Project folder”.  Name the project by typing “SkinPsoriasisProject.xls” into the text box to the right of “Project name”.  Then click the button “Next”.
  2. Filtering:  Note that the analysis of Affymetrix data does not use the interfaces for “1. Spot filters” or “2. Normalization” except for two default parameters.  Click on “3.  Gene filters”.  Accept the defaults by clicking the button “OK”.  This executes the data preparation stage of the analysis.  Acknowledge the “number of genes passing the filtering and subsetting criteria” by clicking on the “OK” button.  Click the “Yes” button for “Annotate genes?”  Acknowledge the number of arrays and the number of arrays for which data are shown by clicking on the button “OK”.  Check the import of the annotation file by selecting the Excel worksheet labeled “Gene identifiers”, whose tab is at the bottom of the Excel workbook. (Portions of the data preparation stage can take ten minutes or more, depending on the number of one-time downloads of Bioconductor packages required.)  To see the results without waiting, download and unzip this file, Skin_Psoriasis –Project.zip.  Then open the folder Skin_Psoriasis –Project, and click on the file Problem2_Project.xlsx.
  3. Quality control:  Next, perform a quality control step, such as clustering the samples, to ensure that they group into the expected sets.  Click on the “ArrayTools” menu, and select “Graphics” and then “Visualization of samples”.  For “Class variable for coloring rotating scatterplot”, select “Type”.  Accept the rest of the defaults.  Click on the button “OK”.  Note that samples cluster according to skin type, with the Active group colored green and the Inactive group colored blue.  Click on the button “Close 3D plot”.
  4. Statistical tests:  Next, run statistical tests to find genes that are expressed differently between the two sets of samples.  Go to the “ArrayTools” menu, and select “Analysis wizard”.  Click on the button “Gene Finding”.  Select “Single label (e.g. Affy)”.  Click on the button “Comparing Classes”.  Click on the button “Class Comparison”.  Click on the “OK” button.  Under the “Experiment Design” section, use the menu to select the “Column defining classes” as “Type”.  Under “Find gene lists determined by”, choose “Restriction on proportion of false discoveries”.  Accept the rest of the defaults.  Click on the “OK” button to execute the gene finding calculation.  The results will appear in a web browser such as Internet Explorer or Firefox.  The table of genes can be copied and pasted into a spreadsheet application such as Excel.  For further analysis, it is useful to save the list of Affymetrix “ProbeSet” identifiers from this table into a file (file SkinPsoriasisGeneListIDsOnly.txt).  Please note that the well-known psoriatic markers TGM1, IVL, CSTA, FABP5, and SPRR are up-regulated, as described on the GEO page from the download step, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2737.
  5.  Visualization: “View a clustered heatmap of significant genes” by clicking on the link with that name on the results page under the section “Contents”.
  6. Functional analysis:  Use the web application DAVID (http://david.abcc.ncifcrf.gov) from the National Institute of Allergy and Infectious Diseases to aid in the functional interpretation of the list of differentially expressed genes.
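
The IDs-only file mentioned in the statistical tests step can be produced from the copied results table with a few lines of code. The sketch below assumes the table was saved as tab-delimited text with a column named “ProbeSet”; the second column name and the probe set identifiers shown are made-up placeholders, not values from this data set.

```python
import csv
import io


def extract_probe_set_ids(table_text, column="ProbeSet"):
    """Return the unique, non-empty values of one column, in first-seen order."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    ids = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value and value not in ids:
            ids.append(value)
    return ids


# Placeholder table; real rows come from the class comparison output.
example_table = (
    "ProbeSet\tSymbol\n"
    "0001_at\tGENE1\n"
    "0002_at\tGENE2\n"
    "0001_at\tGENE1\n"
)

ids = extract_probe_set_ids(example_table)
print("\n".join(ids))

# To save one identifier per line for upload (uncomment to write the file):
# from pathlib import Path
# Path("SkinPsoriasisGeneListIDsOnly.txt").write_text("\n".join(ids) + "\n")
```

Writing one identifier per line keeps the file suitable for the “Choose From a File” upload option used in the DAVID steps.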


1.      Click on the “Start Analysis” link.  To upload the list of differentially expressed genes, use option “B. Choose From a File” under the tab “Upload”.  Click on “Browse”.  Select the Desktop file “SkinPsoriasisGeneListIDsOnly.txt”.  Click the “Open” button. 

2.      Ensure that “Step 2: Select Identifier” shows the menu item “AFFYMETRIX_3PRIME_IVT_ID”. 

3.      Under “Step 3: List Type”, choose “Gene List”. 

4.      For “Step 4: Submit List”, click the button “Submit List”.  Next select the tab “Background”.  Under this “Population Manager”, go to the section “Affymetrix 3’ IVT Backgrounds” and look for “Human Genome U95A Array”.  (It is listed below a “U133B” array.)  Select “Human Genome U95A Array”.  On the “List” tab, under the section “Select to limit annotations by one or more species” select only “Homo sapiens”.  Click on the button “Select Species”. 

5.      Note the name of the successfully submitted gene list and background listed on the right of the newly loaded page under “Analysis Wizard”.  The section is labeled “Step 1. Successfully submitted gene list”.  Now the analysis can be executed. 

6.      Gene Ontology:  Click on the link “Functional Annotation Tool”.  Shown in dark red are the DAVID annotation categories and in parentheses, the number of annotations in each category which are associated with genes from the submitted list. 

7.      Click on the button “Functional Annotation Clustering”.  The result is a list of annotation terms, each with a count of how many genes from the list are associated with the term. 

8.      Pathway:  Click on the link “Pathways”.  Then click on the “Chart” button to the right of “KEGG_PATHWAY”.  Finally, click on the link “Proteasome” in the first row of the table to the right of “KEGG_PATHWAY”.  Note the genes from the list shown in red. Why would so many proteasome genes have higher expression in regions active with psoriasis?  Note also in the KEGG_PATHWAY Proteasome figure that some of the genes are involved in the “Formation of immunoproteasomes” function (see the lower part of the figure).

9.      Return to the microarray data analysis output (from the class comparison step) to verify that the genes labeled proteasome (such as PSMA3, PSMD11, PSMB6, and PSMB8) have higher expression in the Active regions (Class 1 in the output) than in the Inactive regions.
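
The check in step 9 amounts to comparing the class means for each gene. As a rough sketch, assuming the expression values are on a log2 scale (as RMA-processed data normally are), the difference of the class means is a log2 fold change, and a positive value means higher expression in the Active group. The gene names and numbers below are invented for illustration and are not taken from the GSE2737 output.

```python
from statistics import mean


def log2_fold_change(active_values, inactive_values):
    """Difference of class means; values are assumed to be log2 intensities."""
    return mean(active_values) - mean(inactive_values)


def higher_in_active(expression_by_gene):
    """Map each gene to True when its mean log2 expression is higher in Active."""
    return {
        gene: log2_fold_change(active, inactive) > 0
        for gene, (active, inactive) in expression_by_gene.items()
    }


# Invented log2 values: (Active samples, Inactive samples) per gene.
example_expression = {
    "GENE_X": ([9.0, 9.5, 10.0, 9.5], [8.0, 8.5, 8.0, 8.5]),
    "GENE_Y": ([7.0, 7.0], [7.5, 7.5]),
}

change = log2_fold_change(*example_expression["GENE_X"])
print(f"GENE_X log2 fold change: {change:.2f} (ratio {2 ** change:.2f})")
print(higher_in_active(example_expression))
```

This is only a descriptive comparison; the statistical significance of each difference comes from the class comparison output itself.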



Questions, Comments:  Medha Bhagwat, PhD