The NOTCH signaling pathway
mediates intercellular communication through a family of NOTCH proteins
that take part in gene regulation. Researchers created a peptide, SAHM1,
which disrupts this pathway. Students will find genes affected by this
disruption using microarray data from the NCBI public database, Gene
Expression Omnibus (GEO) with identifier GSE18198. They will use
BRB-Array Tools from NCI to analyze the data.
The steps are as
follows (steps A-C can be skipped if these files are already provided):
A. Download the data from the GEO page for GSE18198,
under the section “Supplementary file”, by clicking on the link there.
(A direct link to the data is
provided here - KOPT-K1Subset.zip). Extract the data from the zip file.
B. Download the gene annotation file from the GEO page
under the section “Platforms” by clicking on the link GPL570. The page
for GPL570, the platform used in this study, has a link at the bottom called
“Download full table…”. This link can be
used to download the annotation file. (A direct link to the data
is provided here -
C. Create an experiment description table with information about the experiment,
using the file names of the microarray data
files as the array identifiers in the first column (file name extensions
such as “.cel” should not be included).
The second column should contain the name of the treatment. This
information can be found at the GEO web site for GSE18198 under the
. Use only the KOPT-K1 cell line.
Alternatively, one can use the file provided here -
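
The experiment description table from step C can also be assembled with a short script. The following is a minimal sketch, not part of BRB-ArrayTools: the function names, the column header “Array”, and the tab-delimited output are assumptions made for illustration (the tutorial itself uses an Excel file), and the treatment of each array still has to be looked up by hand on the GEO page.

```python
import csv
from pathlib import Path


def build_descriptor_rows(cel_names, treatment_by_array):
    """Return header plus one (array id, treatment) row per CEL file.

    cel_names: file names such as "sample.CEL" (hypothetical names).
    treatment_by_array: dict mapping the array id (file name without
    its final extension) to a treatment label such as "DMSO" or "SAHM1".
    """
    rows = [("Array", "Treatment")]  # "Array" header is an assumption
    for name in sorted(cel_names):
        array_id = Path(name).stem  # drops only the final extension
        rows.append((array_id, treatment_by_array[array_id]))
    return rows


def write_descriptor(rows, out_path):
    """Write the rows as a tab-delimited text file."""
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)
```

For example, `build_descriptor_rows(["a.cel", "b.CEL"], {"a": "DMSO", "b": "SAHM1"})` gives a header row followed by one row per array, with the “.cel” extension removed as step C requires.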
- Import and
preprocess the data: Open Excel and click
on “Add-Ins” to access “ArrayTools”. Click on
“ArrayTools”, and select “Import data” and then
“Data import wizard”. In the Data Import Wizard, select “Data Type” as “Affymetrix .CEL Data”. Select “File Type” as “The
expression data are in separate files stored in one folder”. Click on
the browse button. Find the Desktop folder “KOPT-K1Subset”. Select this
folder and click “OK”. In the “Data Import Wizard”, also click “OK”.
Click “Yes” when asked if 6 arrays are the correct number of arrays.
Select “justRMA” as the “method to analyze your Affymetrix CEL files”, and then click “OK”.
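
The “justRMA” option refers to the RMA (Robust Multi-array Average) method, which is usually described as background correction, quantile normalization, and probe set summarization. To give a feel for the quantile normalization idea only, here is a toy sketch. It is not the BRB-ArrayTools or Bioconductor implementation; the function name and the input layout (one list of intensities per array) are assumptions, and ties are handled naively.

```python
def quantile_normalize(columns):
    """Force every array (column) to share the same value distribution.

    columns: list of equal-length lists, one list of intensities per array.
    Returns new lists; the k-th smallest value of every array is replaced
    by the mean of the k-th smallest values across all arrays.
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across arrays at each rank.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns)
        for k in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Positions of the values ordered from smallest to largest.
        order = sorted(range(n_rows), key=lambda i: col[i])
        new_col = [0.0] * n_rows
        for rank, idx in enumerate(order):
            new_col[idx] = reference[rank]
        normalized.append(new_col)
    return normalized
```

After this step each array contains the same set of values, only in a different order, which is what makes intensities comparable across arrays.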
- Annotate the data: Continuing with the “Options for
Annotation”, select “Import your own annotation file”, and click
“OK”. At this point you may get a question about installing a
package from BioConductor. If this
occurs, click “Yes”. Next, for “Please specify the location of your
gene identifiers”, select “The identifiers are stored in a separate
file.” Click “Browse” to select your “Gene Identifiers
file”. On the Desktop, select the file
HG-U133_Plus_2na32AnnotBrief.txt and click “Open”. For
the “Gene Name, Title, or Description”, select “Col 5: Gene
Title”. For the “GenBank Accession”,
select “Col 2: Representative Public ID”. For the “Map Location”,
select “Col 4: Alignments”. Then click the button “Next”.
- Import the experiment description file:
Continuing with the “Experiment Descriptor File”,
click the “Browse” button. Select the Desktop file ExpDecFile.xls
and click “Open”. Then click the button “Next”.
- Specify the output file location: Name the
project folder by typing “KOPTK1-Project” into the text box to the right
of “Project folder”. Name the project by typing “KOPTK1Project.xls”
into the text box to the right of “Project name”. Then click the button
“Next”. Please wait a few minutes
for the large data sets to upload.
Also, currently on Windows XP with BRB-ArrayTools
4.2, feedback does not appear until one clicks on the window.
- Filtering: Note that
the analysis of Affymetrix data does not use the
interfaces for “1. Spot filters” or “2. Normalization” except for two default
parameters. Click on “3. Gene filters”. Accept the defaults
by clicking the button “OK”. This executes the data preparation stage of
the analysis. Acknowledge the “number of genes passing the filtering
and subsetting criteria” by clicking on the “OK”
button. Acknowledge the number of arrays and the number of arrays for
which data is shown by clicking on the button “OK”. Check the import of
the annotation file by selecting the Excel worksheet labeled “Gene
identifiers”. Its tab is at the bottom of the Excel workbook.
- Quality control, such as clustering the samples to ensure that they group into the
expected sets, is performed next. Click on the “ArrayTools”
menu, and select “Graphics” and then “Visualization of samples”.
For “Class variable for coloring rotating scatterplot”, select
“Treatment”. Accept the rest of the defaults. Click on the
button “OK”. Note that samples cluster according to treatment
groups, with the DMSO group colored blue and the SAHM1 group colored
green. Click on the button “Close 3D plot”.
- Statistical tests to find genes that are expressed differently between the two sets of
samples are performed next. Go to the “ArrayTools”
menu, and select “Analysis wizard”. Click on the button “Gene
Finding”. Select “Single label (e.g. Affy)”.
Click on the button “Comparing Classes”. Click on the button
“Class Comparison”. Click on the “OK” button. Under the
“Experiment Design” section, use the menu to select the “Column defining
classes” as “Treatment”. Under “Find gene lists determined by”,
choose “Restriction on proportion of false discoveries”. Accept
the rest of the defaults. Click on the “OK” button to execute the
gene finding calculation. The results will appear in a
web browser such as Internet Explorer or Firefox. The
table of genes can be copied and pasted into a spreadsheet application
such as Excel. For further analysis, it is useful to save the list
of Affymetrix “ProbeSet”
identifiers from this table into a file (file KOPT-K1GenesListIDsOnly.txt).
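
Saving the identifier list can also be scripted. The sketch below assumes the results table has been pasted into a tab-delimited text file whose header row contains a column named “ProbeSet”; the function names and the input layout are assumptions for illustration, not something produced by BRB-ArrayTools itself.

```python
import csv
import io


def extract_probeset_ids(table_text, column="ProbeSet"):
    """Return the unique, non-empty identifiers from one column.

    table_text: the tab-delimited table as a string, header row first.
    The order of first appearance is preserved.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    ids, seen = [], set()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value and value not in seen:
            seen.add(value)
            ids.append(value)
    return ids


def write_id_list(ids, out_path):
    """Write one identifier per line, the layout DAVID accepts for upload."""
    with open(out_path, "w") as handle:
        handle.write("\n".join(ids) + "\n")
```

The resulting file contains only identifiers such as 1557543_at, one per line, with no header.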
- Visualization: “View a clustered heatmap
of significant genes” by clicking on the link with that name on the
results page under the section “Contents”.
- Functional analysis: Use a web application called DAVID (http://david.abcc.ncifcrf.gov)
from the National Institute of Allergy and Infectious Diseases to aid
in the functional interpretation of the list of differentially expressed
genes.
Click on the
“Start Analysis” link. To upload the list of differentially
expressed genes, use option “B. Choose From a
File” under the tab “Upload”. Click on “Browse”. Select the
Desktop file “KOPT-K1GeneListIDsOnly.txt”. Click the “Open” button.
Check the menu item shown under “Step 2: Select Identifier”. For “Step
3: List Type”, choose “Gene List”.
For “Step 4:
Submit List”, click the button “Submit List”. Next select the tab
“Background”. Under this “Population Manager”, go to the section
“Affymetrix 3’ IVT Backgrounds” and look for
“Human Genome U133 Plus 2 Array”. (It is
listed below one of the “Focus” arrays.) Select “Human Genome
U133 Plus 2 Array”. On the “List” tab, under the section “Select
to limit annotations by one or more species” select only “Homo
sapiens”. Click on the button “Select Species”.
Note the name
of the successfully submitted gene list and background listed on the
right under “Step 1. Successfully submitted gene list”. Now the
analysis can be executed.
Click on the link “Functional Annotation Tool”. Shown in dark red
are the DAVID annotation categories and in parentheses, the number of
annotations in each category which are associated with genes from the list.
Click on the
button “Functional Annotation Clustering”. This is a list of
annotation terms, including a count of how many genes from the list are
associated with the term.
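
The gene counts per term only become meaningful when compared against the chosen background (here the Human Genome U133 Plus 2 Array). One common way to make that comparison is a hypergeometric tail probability, sketched below. This is a generic illustration and not necessarily the exact statistic DAVID reports; the function name and all the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(list_hits, list_size, background_hits, background_size):
    """Probability of seeing at least `list_hits` annotated genes by chance.

    list_hits: genes in the submitted list that carry the term.
    list_size: total genes in the submitted list.
    background_hits: genes in the background that carry the term.
    background_size: total genes in the background.
    """
    total = comb(background_size, list_size)
    max_hits = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 2 of 2 listed genes carry a term that
# 5 of 10 background genes carry.
p_value = hypergeom_upper_tail(2, 2, 5, 10)
```

A small value suggests the term is over-represented in the list relative to the background, which is why selecting the correct array as the background matters.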
To see a heatmap representing associations of the genes and
terms click on the heatmap icon. Click the
“Run” button, if asked if you want to run the application. Click
on the “Yes” button if given a security warning. On the resulting
heatmap page, click on the “Zoom Out”
link. In the heatmap, choose a gene
from the descriptions on the right of the map. For example,
ribosomal protein L38 has only one association with a term.
Hovering over the green square in the row to the left of this gene
description highlights the associated term in the list at the bottom
of the table. It is “GO:0006412~translation”.
- Pathway: Go back to the “Functional Annotation
Result” tab of the web browser. This is the page that showed in
dark red the DAVID annotation categories. Click on the button,
“Functional Annotation Table”. This table gives a list of all of
the terms associated with each gene.
Go back to
the “Functional Annotation Result” window of the web browser (the page
with the DAVID annotation categories shown in dark red). Find the
annotation category “Pathways”, and click on the “+” to the left of it
to expand the category. Locate the “KEGG_PATHWAY” term and click
on the blue bar to the right of the “Chart” button. In this
“Functional Annotation Table”, under the section “Notch homolog 2
(Drosophila)”, click on the link “Notch Signaling Pathway”. Note
the genes from the list shown in red.
Return to the
microarray data analysis output web page obtained in step 10 to verify that
treatment with SAHM1 (Class 2 in the output) caused a disruption in
this pathway, possibly decreasing the expression of the Notch 2
(1557543_at) and Deltex (227336_at, DTX1)
genes shown in the figure.
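
The direction of change for a probe set can be checked by comparing the class means. Because RMA values are on a log2 scale, the difference of the means is a log2 fold change. The sketch below uses made-up intensities, not values from GSE18198, and the function names are assumptions.

```python
from statistics import mean


def log2_fold_change(control_values, treated_values):
    """Difference of class means for log2-scale (e.g. RMA) intensities.

    A negative result means lower expression in the treated class.
    """
    return mean(treated_values) - mean(control_values)


def direction(log2_fc):
    """Label the sign of a log2 fold change."""
    if log2_fc < 0:
        return "decreased"
    if log2_fc > 0:
        return "increased"
    return "unchanged"


# Hypothetical log2 intensities for one probe set (3 DMSO, 3 SAHM1 arrays).
dmso = [8.0, 8.2, 7.8]
sahm1 = [7.0, 7.1, 6.9]
change = log2_fold_change(dmso, sahm1)
label = direction(change)
```

With these hypothetical numbers the change is about -1 on the log2 scale, that is, roughly half the expression in the SAHM1 class.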