Psoriasis is an inflammatory skin disease, usually chronic in nature. The molecular mechanisms
of the disease were not well known, making it a good candidate for study by
microarray technology. Researchers took samples from both
affected and unaffected regions of patients with psoriasis. Students
will find genes differentially expressed between these two regions using
microarray data from the NCBI public database, Gene Expression Omnibus
(GEO) with identifier GSE2737 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2737).
They will use BRB-Array Tools (http://linus.nci.nih.gov/BRB-ArrayTools.html)
from NCI to analyze the data.
The steps are as follows (steps A-C can be skipped if these files are already available):
- Download the data from the GEO page, under the
section “Supplementary file”, by clicking on the link “http”. (A direct link to the data is
provided here: Skin_PsoriasisInactiveVsActive.zip.) Extract the data from the zip file.
the genes: Unlike Problem 1, we will let
BRB-Array Tools find the annotation file for us. This process
can be time consuming, but these arrays have many fewer genes.
It should take the software around ten minutes when we get to this
step. (Alternatively, one could use the GEO platform link or the
vendor web site and import this annotation file.)
- Create an
experiment description table describing
the experiment, using the file names of the microarray
data files as the array identifiers in the first column (file name
extensions such as “.cel” should not be
included). The second column should contain the skin type
(active or inactive). This information can be found at the GEO
web site for GSE2737 under the heading “Samples”.
Use only the Skin_Psoriasis Inactive
and Active samples.
Alternatively, one can use the file provided here -
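The table described in this step can also be drafted with a short script. The following is only a minimal Python sketch: the file names, the `example_types` mapping, and the column header "Array" are hypothetical placeholders, and the real active/inactive assignments must be taken from the GEO “Samples” listing. The header "Type" matches the column name selected later in the analysis.

```python
import csv
import io
from pathlib import PurePath


def build_descriptor_rows(file_names, skin_type_by_array):
    """Return a header row plus one (array id, skin type) row per data file."""
    rows = [("Array", "Type")]  # "Array" is an assumed header name
    for name in sorted(file_names):
        array_id = PurePath(name).stem  # drop the extension, e.g. ".CEL"
        rows.append((array_id, skin_type_by_array[array_id]))
    return rows


def to_tab_delimited(rows):
    """Serialize the rows as tab-delimited text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical file names and labels, for illustration only.
example_files = ["sampleA.CEL", "sampleB.CEL"]
example_types = {"sampleA": "active", "sampleB": "inactive"}

descriptor_text = to_tab_delimited(
    build_descriptor_rows(example_files, example_types)
)
print(descriptor_text)
```

A missing label raises a `KeyError`, which helps catch arrays that were left out of the mapping.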
- Import and
preprocess the data: Open Excel
and click on “Add-Ins” to access “ArrayTools”. Click on
“ArrayTools”, and select “Import data” and then “Data import
wizard”. In the Data Import Wizard, set “Data Type” to
“Affymetrix .CEL Data”. Select “File Type” as “The expression
data are in separate files stored in one folder”. Click on the
browse button. Find the Desktop folder “Skin_PsoriasisActiveVsInactive”.
Select this folder and click “OK”. In the “Data Import Wizard”,
also click “OK”. Click “Yes” when asked if 8 arrays are the
correct number of arrays. Select “justRMA”
as the “method to analyze your Affymetrix CEL files”, and then click
- Annotate the
data: Continuing with the “Options for
Annotation”, select “Use Bioconductor
packages for annotation”, and click “OK”. At this point you may
be asked whether to install a package from Bioconductor.
If this occurs, click “Yes”.
- Open the
experiment description file:
Continuing with the Experiment Descriptor File (ExpDescFile2.xlsx), click the “Browse” button.
Select the Desktop file ExpDescFile2.xlsx and click “Open”. Then
click the button “Next”.
- Manage the
output file location:
Name the project folder by typing “SkinPsoriasis-Project”
into the text box to the right of “Project folder”. Name the
project by typing “SkinPsoriasisProject.xls” into the text box to the
right of “Project name”. Then click the “Next” button.
- Filtering: Note
that the analysis of Affymetrix data does not use the interfaces for
“1. Spot filters” or “2. Normalization” except for two default
parameters. Click on “3. Gene filters”. Accept the
defaults by clicking the button
“OK”. This executes the data preparation stage of the
analysis. Acknowledge the “number of genes passing the filtering
and subsetting criteria” by clicking on the
“OK” button. Click the “Yes” button for “Annotate genes?”
Acknowledge the number of arrays and the number of arrays for which
data is shown by clicking on the button “OK”. Check the import
of the annotation file by selecting the Excel worksheet labeled “Gene
identifiers”. Its tab is at the bottom of the Excel workbook.
(Portions of the data preparation stage can take ten minutes or more,
depending on the number of one-time downloads of Bioconductor packages.)
To see the results without waiting, download and unzip this file,
–Project.zip. Then open the folder Skin_Psoriasis –Project, and click
on the file Problem2_Project.xlsx.
- A quality check,
such as clustering the samples to ensure that they group into the
expected sets, is performed next. Click on the “ArrayTools” menu,
and select “Graphics” and then “Visualization of samples”. For
“Class variable for coloring rotating scatterplot”, select “Type”.
Accept the rest of the defaults. Click on the button “OK”.
Note that samples cluster according to skin types, with the Active
group colored green and the Inactive group colored blue. Click
on the button “Close 3D plot”.
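The idea behind this check, that samples of the same skin type should resemble each other more than samples of the other type, can be illustrated outside the tool. The Python sketch below is a simple stand-in and not the method BRB-ArrayTools uses for its 3D plot: for each sample it finds the most correlated other sample and reports how often that neighbor has the same class. The sample names and expression values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def nearest_neighbor_agreement(profiles, labels):
    """Fraction of samples whose most-correlated other sample shares their class."""
    names = list(profiles)
    agree = 0
    for name in names:
        others = [other for other in names if other != name]
        best = max(others, key=lambda o: pearson(profiles[name], profiles[o]))
        agree += labels[best] == labels[name]
    return agree / len(names)


# Made-up log-scale values for four genes in four hypothetical samples.
toy_profiles = {
    "a1": [8.0, 7.5, 3.0, 2.5],
    "a2": [8.2, 7.9, 3.1, 2.2],
    "i1": [3.0, 2.8, 7.9, 8.1],
    "i2": [2.7, 3.1, 8.3, 7.8],
}
toy_labels = {"a1": "active", "a2": "active", "i1": "inactive", "i2": "inactive"}

agreement = nearest_neighbor_agreement(toy_profiles, toy_labels)
print(f"nearest-neighbor class agreement: {agreement:.2f}")
```

An agreement well below 1 would suggest mislabeled or outlying arrays that deserve a closer look before gene finding.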
to find genes that are expressed differently between the two sets of
samples are performed. Go to the “ArrayTools” menu, and select
“Analysis wizard”. Click on the button “Gene Finding”.
Select “Single label (e.g. Affy)”.
Click on the button “Comparing Classes”. Click on the button
“Class Comparison”. Click on the “OK” button. Under the
“Experiment Design” section, use the menu to select the “Column
defining classes” as “Type”. Under “Find gene lists determined
by”, choose “Restriction on proportion of false discoveries”.
Accept the rest of the defaults. Click on the “OK” button to
execute the gene finding calculation. Note that the results will
appear in a web browser such as Internet Explorer or Firefox.
The table of genes can be copied and pasted into a
spreadsheet application such as Excel. For further analysis, it
is useful to save the list of Affymetrix “ProbeSet”
identifiers from this table into a file (file SkinPsoriasisGeneListIDsOnly.txt). Please note that the well-known
psoriatic markers TGM1, IVL, CSTA, FABP5, and SPRR are up-regulated as
given in step A,
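To give a feel for what restricting the proportion of false discoveries means, the Python sketch below applies the Benjamini-Hochberg step-up procedure to a handful of per-probe-set p-values and then formats the selected identifiers one per line, as in an IDs-only file. This is a swapped-in, commonly used technique and is not claimed to be the calculation BRB-ArrayTools performs for this option; the probe set names, p-values, and the 0.1 threshold are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return the IDs selected by the Benjamini-Hochberg step-up procedure."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that meets the bound
    return [probe_id for probe_id, _ in ranked[:cutoff_rank]]


# Hypothetical probe set IDs and p-values, for illustration only.
toy_p_values = {
    "probe_1": 0.001,
    "probe_2": 0.012,
    "probe_3": 0.030,
    "probe_4": 0.200,
    "probe_5": 0.650,
}

selected = benjamini_hochberg(toy_p_values, fdr=0.1)

# One identifier per line, the layout expected for an IDs-only text file.
ids_only_text = "\n".join(selected) + "\n"
print(ids_only_text)
```

Writing `ids_only_text` to a plain text file would give a list that can be uploaded for functional analysis.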
- Visualization: “View a
clustered heatmap of significant genes” by
clicking on the link with that name on the results page under the
analysis: Use a web application called DAVID (http://david.abcc.ncifcrf.gov) from the
National Institute of Allergy and Infectious Diseases to aid in the
functional interpretation of the list of differentially expressed
genes.
Click on the “Start Analysis” link. To upload the list of differentially
expressed genes, use option “B. Choose From a
File” under the tab “Upload”. Click on “Browse”. Select the
Desktop file “SkinPsoriasisGeneListIDsOnly.txt”. Click the “Open”
that “Step 2: Select Identifier” shows the menu item
Under “Step 3: List Type”, choose “Gene List”.
Under “Step 4: Submit List”, click the button “Submit List”. Next select
the tab “Background”. Under this “Population Manager”, go to the
section “Affymetrix 3’ IVT Backgrounds” and look for “Human Genome U95A
Array”. (It is listed below a “U133B” array.) Select “Human
Genome U95A Array”. On the “List” tab, under the section “Select to
limit annotations by one or more species” select only “Homo sapiens”.
Click on the button “Select Species”.
the name of the successfully submitted gene list and background listed on
the right of the newly loaded page under “Analysis Wizard”. The section is labeled
“Step 1. Successfully submitted gene list”. Now the analysis can be
Gene Ontology: Click on the link “Functional Annotation
Tool”. Shown in dark red are the DAVID annotation categories and in
parentheses, the number of annotations in each category which are
associated with genes from the submitted list.
Click on the button “Functional Annotation Clustering”. The result is a list of
annotation terms, including a count of how many genes from the list are
associated with each term.
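The counts reported for each term become meaningful when compared with the chosen background array. As a rough illustration only, the Python sketch below computes a plain upper-tail hypergeometric probability for one term; DAVID's own scoring may differ from this, and all the numbers here are hypothetical.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """Probability of at least `list_hits` term genes in a random gene list.

    The list is treated as `list_size` genes drawn without replacement from a
    background of `background_size` genes, `background_hits` of which carry
    the annotation term.
    """
    max_hits = min(list_size, background_hits)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 5 listed genes carry a term that
# annotates 4 of the 20 genes in the background.
toy_p = enrichment_p_value(
    list_hits=3, list_size=5, background_hits=4, background_size=20
)
print(f"enrichment p-value: {toy_p:.4f}")
```

Choosing the array actually used in the experiment as the background matters because `background_hits` and `background_size` both change with it, and so does the resulting probability.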
Pathway: Click on the link “Pathways”. Then click on the “Chart” button to the
right of “KEGG_PATHWAY”. Finally,
click on the link “Proteasome” in the first row of the table to the right
of “KEGG_PATHWAY”. Note the genes from the list shown in
red. Why would so many proteasome genes have higher expression in
regions active with psoriasis? Note also in the KEGG_PATHWAY
Proteasome figure that some of the genes are involved in the “Formation of
immunoproteasomes” function (see the lower part of the figure).
Return to the
microarray data analysis output (from step J) to verify that the Active
genes (Class 1 in the output) labeled proteasome (such as PSMA3, PSMD11,
PSMB6, and PSMB8) are higher in expression than when found in the Inactive