SNOMAD - Standardization and NOrmalization of MicroArray Data
 

The SNOMAD gene expression data analysis tools were developed by Carlo Colantuoni and George W. Henry in the laboratory of Jonathan Pevsner (Johns Hopkins School of Medicine, Department of Neuroscience and Kennedy Krieger Research Institute, Department of Neurology) and Scott Zeger (Johns Hopkins School of Public Health).

SNOMAD is Copyrighted (C) 2000 by Carlo Colantuoni, George Henry, Jonathan Pevsner and is distributed under the terms of the GNU General Public License and comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions: see details.

SNOMAD consists of a collection of algorithms directed at the normalization and standardization of DNA microarray data. The majority of the transformations within SNOMAD are directed at the refinement of paired microarray data, that is, two sets of element signal intensities generated in two individual hybridizations using a one-channel microarray technology (e.g., radioactivity-based systems), or two sets of intensities generated in a single two-channel fluorescent experiment (two-color, simultaneous hybridization). The freely available R statistical language was used to develop the SNOMAD tools. This page allows you to apply any of the SNOMAD transformations to your own array data without any programming expertise or downloading of additional software. Using the interface below, you may select any number of the SNOMAD transformations to be performed on an example dataset or on your own microarray dataset. The YELLOW regions of this page highlight where your input is needed.

DATA INPUT
You may either USE AN EXAMPLE DATASET or UPLOAD YOUR OWN DATASET.
If you want to use an example dataset to see how SNOMAD works: If you want to upload your own microarray dataset for analysis:
You can choose from three different example datasets. Each requires a different group of SNOMAD transformations. Here are the example datafiles you can choose:

1: Research Genetics Human GeneFilters Dataset. This example dataset includes two sets of element signal intensities, derived from two individual radioactive hybridizations (one experimental, one control) to cDNAs spotted on replicate nylon filters.

2: Incyte Genomics UniGEM v 1.0 Dataset. This example dataset includes two sets of element signal intensities, derived from a single simultaneous two-color fluorescence hybridization to cDNAs spotted on a glass slide.

3: Affymetrix GeneChip Dataset. This example dataset includes two sets of element signal intensities, derived from two individual fluorescence hybridizations to photolithographically synthesized oligonucleotides anchored to a silicon chip.

You will select the transformations for the example file you choose by checking the appropriate checkboxes below. Check both the "Perform This Transformation" and "Graph This Transformation" checkboxes indicated for the specific example dataset you select (see below).
The datafile you upload must be a tab-delimited text file (.txt). MS Excel and other file types can easily be saved as .txt files using the "Save as ..." command. This file must contain two columns of raw element signal intensities which you would like to compare. Do not include any negative intensity values in this file. For Incyte data this could be the P1Signal and P2Signal columns (raw intensities, not "Balanced"). For Affymetrix data this could be two columns of Average Difference values. All rows containing information not derived from elements representing genes (e.g., control spots, total genomic DNA, etc.) should be removed from the datafile prior to uploading.

Optional are two columns of local background measurements (required for transformation #1 below), and two columns of X and Y coordinates describing the spatial location of array elements (required for transformation #2 optional below). The transformed data will be returned to you along with all the columns in your originally uploaded file, so any GenBank numbers, UniGene Identifiers, gene names or other array element descriptors will remain associated with both your raw and transformed data.

Each row of the datafile should contain information pertaining to the identical array element in a two-channel dataset, or to corresponding elements in two one-channel datasets. All ratios will be defined as ONE / TWO, such that positive log ratios and Z-scores indicate greater expression in sample ONE, and negative values indicate greater expression in sample TWO.
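
As a rough illustration of the file layout described above, the following Python sketch reads two intensity columns from tab-delimited text using 1-based column numbers and forms the ONE / TWO ratios. The function name `read_pairs`, the gene labels, and the sample values are hypothetical; this is not the SNOMAD upload code.

```python
import csv
import io

def read_pairs(text, col_one, col_two, has_header=True):
    """Read two intensity columns (1-based column numbers) from tab-delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if has_header:
        rows = rows[1:]  # skip the header line
    one = [float(r[col_one - 1]) for r in rows]
    two = [float(r[col_two - 1]) for r in rows]
    if any(v < 0 for v in one + two):
        raise ValueError("negative intensity values are not allowed")
    return rows, one, two

# Hypothetical three-column file: a gene label plus two raw intensity columns.
sample = "gene\tP1Signal\tP2Signal\nA\t200\t100\nB\t50\t400\n"
rows, one, two = read_pairs(sample, col_one=2, col_two=3)
ratios = [a / b for a, b in zip(one, two)]  # ratios are defined as ONE / TWO
```

Because the original rows are kept alongside the numeric columns, any descriptor columns can be written back out next to the transformed values.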

Use an example dataset to see how SNOMAD works. Upload your own microarray dataset for transformation.


1: Research Genetics Human GeneFilters Dataset.
Below select Transformations #2 and #2 optional

This transformation will illustrate and correct hybridization artifacts resulting from the radioactive hybridization + washing process (manifested as non-uniform background intensities). This helps avoid the misidentification of differentially expressed genes due to non-uniform contribution of background intensity.

2: Incyte Genomics UniGEM v 1.0 Dataset.
Below select Transformations #2,3,4, and 5

Global Mean Normalization (see Transformation #2 below) ensures that the average intensity for the two samples will be equal (or, the average ratio between the two samples will be equal to 1, or 0 on a log scale). However, it is often the case that the global mean ratio is 1 while the local mean ratio varies greatly across the range of element signal intensities (see Transformation #5 below). Transformation #5 will ensure that the mean expression ratio is 1 across the entire range of absolute expression values.

3: Affymetrix GeneChip Dataset.
Below select Transformations #2,3,4,5, and 6

It is often the case that variance in expression ratios is non-uniform across the range of element signal intensities (see Transformation #6 below). Under these conditions, two elements with equal expression ratios may represent expression changes of different significance. Transformation #6 will ensure that the variance is uniform across the range of element intensities, and hence that expression ratios are comparable at all signal intensities.

Select the file you wish to upload for transformation: 

Check this box if the first line of your datafile contains header information:

Specify the column numbers in your datafile (leftmost column being 1) where each of these data types resides:

Required: ONE intensities, TWO intensities
Optional: ONE background, TWO background, X, Y

You MUST include X and Y coordinates in the uploaded datafile and specify their column number in the datafile if you wish to apply Transformation #2 optional: Local Mean Normalization Across a Microarray Surface.

You MUST include background values in the uploaded datafile and specify their column number in the datafile if you wish to apply Transformation #1 Background Subtraction.


OVERVIEW OF THE SNOMAD TRANSFORMATIONS
The red numbers in the image at left correspond to the numbers of the individual transformations listed below. Select the transformations you wish to perform and/or graph by checking the "Perform This Transformation" checkbox in YELLOW where each transformation is described below. If you select certain transformations, some additional parameters are required. These are listed directly below the name of each transformation in yellow. Suggested default values are already in place, but may be customized. In the description of each transformation below, the text at left describes the transformation and the image at right is its graphical illustration. The RED arrow indicates the transformation of the data. The region of each image highlighted in GREEN indicates the graphic which will be generated if the "Graph this Transformation" box is selected.
1 BACKGROUND SUBTRACTION
Perform This Transformation
Graph This Transformation
This function will subtract background intensities from element signal intensities as defined in the datafile specified above. If a single background value is to be subtracted from each element intensity, simply repeat this single value in all rows of the background column.

Element Intensity - Element Background Intensity =
Background Corrected Intensity
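
The subtraction above can be sketched in a few lines of Python. The function name `subtract_background` and the numbers are illustrative assumptions, not part of SNOMAD itself.

```python
def subtract_background(intensities, backgrounds):
    """Subtract each element's background from its signal intensity."""
    if len(intensities) != len(backgrounds):
        raise ValueError("intensity and background columns must have equal length")
    return [i - b for i, b in zip(intensities, backgrounds)]

one = [520.0, 130.0, 75.0]
one_background = [20.0, 30.0, 25.0]
corrected = subtract_background(one, one_background)

# A single background value is handled by repeating it in every row.
corrected_single = subtract_background(one, [25.0] * len(one))
```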

2 GLOBAL MEAN NORMALIZATION
Perform This Transformation
Graph This Transformation
This transformation will divide each element signal intensity in a set (column) of intensities by the mean intensity for that set of elements. This global mean normalization will be performed for both sets of intensities separately.

Element Intensity / Global Mean Element Intensity =
Global Mean Normalized Intensity
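
A minimal Python sketch of this step, applied to each column separately, might look like the following. The function name `global_mean_normalize` and the example values are assumptions for illustration.

```python
from statistics import mean

def global_mean_normalize(intensities):
    """Divide every intensity in one column by that column's mean."""
    m = mean(intensities)
    return [v / m for v in intensities]

one = [100.0, 200.0, 300.0]
two = [10.0, 20.0, 60.0]
one_norm = global_mean_normalize(one)  # normalized on its own
two_norm = global_mean_normalize(two)  # normalized on its own
```

After this step each column has a mean of 1, so the two sets of intensities are on a common scale even if the raw signals differed by a constant factor.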

2 optional LOCAL MEAN NORMALIZATION ACROSS A MICROARRAY SURFACE
Perform This Transformation
Graph This Transformation
Span:  Trim:
This transform can detect and/or correct artifacts which are spatially systematic across the surface of a microarray. This includes artifacts generated in the robotic printing of arrays and hybridization artifacts (such as non-uniform background intensity values - example at right).

This transformation is described in the publication here, and requires X and Y coordinates for each microarray element (see "DATA INPUT" above). It should be used in combination with global mean normalization (Transform #2 above), and may be substituted for or used in combination with the background subtraction (Transform #1 above). This transformation can be used in a quality control capacity, to detect artifacts across the surface of arrays. Inspection of the graphical representation of the difference between the local mean for the two arrays is most useful for this purpose. Alternatively, this transformation can be used to correct spatially systematic artifacts as they vary across an array surface.

Graphing this transformation will produce representations of the local element intensity as it varies across the surface of each of the microarrays. A representation of the differences between this local mean for the two arrays will also be produced. In addition, a new Intensity vs. Intensity scatterplot will be produced, showing intensities following this normalization.

Element Intensity / Local Mean Element Intensity =
Local Mean Normalized Intensity


This is similar to the Global Mean Normalization (see Transformation #2 above). However, "Global Mean Element Intensity" is a single value used to normalize element intensities at all array positions, while "Local Mean Element Intensity" is a smooth function, estimated locally across the 2-dimensional array surface and used to normalize array elements at corresponding positions on the array surface.

The "loess" function in the R statistical language is used to calculate the function which estimates the local mean signal intensity across the array surface. The smoothness of this function (i.e. the size of the "window" used in calculating the local mean) is controlled by the "span" parameter, which can be input above. A larger span creates a larger "window" and a smoother function for the estimation of the local mean. A smaller span creates a smaller "window" and a function which follows local changes more closely - beware of overfitting.

The loess function used here is "robust" in that it is insensitive to a small fraction of extreme or outlying values in the calculation of the local mean. The "trim" parameter, which can also be input above, sets the value of this fraction of extreme values which the function will ignore.
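
To make the idea of a spatially local mean concrete, here is a simplified Python sketch. It is not R's loess: it uses a plain tricube-weighted average over the nearest fraction `span` of elements, with no local regression and no robustness or "trim" handling. The function name `local_mean_2d` and the synthetic 4 x 4 array are assumptions.

```python
import math

def local_mean_2d(xs, ys, values, span=0.3):
    """Tricube-weighted mean of the nearest `span` fraction of elements."""
    n = len(values)
    k = min(n, max(2, math.ceil(span * n)))  # window size in elements
    local = []
    for x0, y0 in zip(xs, ys):
        near = sorted(
            (math.hypot(x - x0, y - y0), v) for x, y, v in zip(xs, ys, values)
        )[:k]
        h = near[-1][0] or 1.0  # window radius; guard against zero
        weights = [(1.0 - (d / h) ** 3) ** 3 for d, _ in near]
        total = sum(w * v for w, (_, v) in zip(weights, near))
        local.append(total / sum(weights))
    return local

# Synthetic 4 x 4 array whose intensity drifts upward along X (a spatial artifact).
xs = [float(x) for x in range(4) for _ in range(4)]
ys = [float(y) for _ in range(4) for y in range(4)]
values = [100.0 + 50.0 * x for x in xs]
local = local_mean_2d(xs, ys, values)
normalized = [v / m for v, m in zip(values, local)]  # intensity / local mean
```

Raising `span` widens the window and smooths the estimated surface; lowering it lets the estimate follow local changes more closely, at the risk of overfitting.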

The data depicted at right were derived from two experiments using Research Genetics' Human GeneFilter microarray technology (a single-channel, radioactive hybridization technology), using total RNA derived from postmortem human brain labeled with 33P.


3 LOGARITHMIC TRANSFORMATION
Perform This Transformation
Graph This Transformation
Logarithmic Base:
This function will perform a logarithmic transformation of all intensities in both sets of element intensities. You can specify the base of the logarithmic transformation above.
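
In Python this step could be sketched as below, with the base passed as a parameter. The function name `log_transform` is an assumption. Note that intensities must be strictly positive, since the logarithm of zero or a negative number is undefined.

```python
import math

def log_transform(intensities, base=2.0):
    """Take the logarithm of every intensity in the chosen base."""
    return [math.log(v, base) for v in intensities]

one = [8.0, 2.0, 1.0]
one_log2 = log_transform(one, base=2.0)
one_log10 = log_transform([100.0, 10.0], base=10.0)
```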

4 CALCULATE MEAN LOG (INTENSITIES) and LOG (RATIOS)
Perform This Transformation
Graph This Transformation
For each pair of element intensities, this function will calculate the Mean Log(Intensity) (X axis) and Log(Ratio) (Y axis):

X axis: A measure of mean gene expression level in the two experiments: Mean(Log(Intensity)) = Log(Geometric Mean Intensity)

Y axis: A measure of differential gene expression between the two samples: Log(ONE/TWO) = Log(ONE) - Log(TWO)

If you choose to perform this transformation, you must also include the logarithmic transformation of your data (Transformation #3 above).
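
The two axes can be computed together, as in this short Python sketch. The function name `mean_log_and_log_ratio` and the sample intensities are illustrative assumptions.

```python
import math

def mean_log_and_log_ratio(one, two, base=2.0):
    """Return (mean log intensity, log ratio ONE/TWO) for each element pair."""
    pairs = []
    for a, b in zip(one, two):
        log_a, log_b = math.log(a, base), math.log(b, base)
        pairs.append(((log_a + log_b) / 2.0, log_a - log_b))
    return pairs

one = [8.0, 2.0]
two = [2.0, 8.0]
pairs = mean_log_and_log_ratio(one, two)
# The mean of the logs equals the log of the geometric mean sqrt(ONE * TWO).
geo = [math.log(math.sqrt(a * b), 2.0) for a, b in zip(one, two)]
```

Both elements here have the same mean expression level (X axis) but opposite log ratios (Y axis): the first is higher in sample ONE, the second in sample TWO.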
5 LOCAL MEAN NORMALIZATION ACROSS ELEMENT SIGNAL INTENSITY
Perform This Transformation
Graph This Transformation
Span:  Trim:
This function will perform a mean normalization using a mean intensity which is calculated locally across the range of gene expression levels (i.e. X axis). We have also called this "Balancing" of gene expression ratios. This ensures that the mean expression ratio between the samples being compared is 1 (0 on the log scale) at all points across the range of element signal intensities, i.e. at all points across the X axis.

Log (Ratio) - Local Mean Log (Ratio) =
Residual = Corrected Log (Ratio)

The "loess" function in the R statistical language is used to calculate the local mean expression ratio (Y axis) across the range of expression levels (X axis). The "residuals" from this loess fit (i.e. the distance on the Y axis between a datapoint and the loess fit, see figure at right) are used as the corrected Log(Ratio) values (bottom plot at right). The "span" variable defined above determines the smoothness of the robust local regression which estimates this local mean Log(Ratio). The "trim" variable defined above determines the proportion of the most extreme values in the data which are ignored in the estimation of the local mean Log(Ratio).
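
The residual idea can be illustrated with a simplified stand-in for loess. The Python sketch below fits a tricube-weighted local straight line at each point and subtracts the fit; it does not reproduce R's loess and has no robustness or "trim" handling. The function name `local_linear_fit` and the synthetic data are assumptions.

```python
import math

def local_linear_fit(x, y, span=0.5):
    """Tricube-weighted local straight-line fit evaluated at every x."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))  # window size in points
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1.0  # window half-width
        w = [max(0.0, 1.0 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Synthetic data: the log ratio drifts with mean log intensity (a pure bias).
mean_log = [float(i) for i in range(10)]
log_ratio = [0.2 * m - 1.0 for m in mean_log]
local_mean = local_linear_fit(mean_log, log_ratio)
corrected = [r - f for r, f in zip(log_ratio, local_mean)]  # residuals
```

Because the drift in this example is entirely systematic, the residuals are essentially zero at every intensity; in real data, genuinely differentially expressed elements would remain as non-zero residuals.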

The data depicted at right were derived from a single experiment using Incyte's UniGEM V 1.0 cDNA microarray technology, using mRNA derived from postmortem human brain labeled with the fluorescent dyes Cy3 and Cy5 (a two-channel, simultaneous fluorescent hybridization technology). This dataset is used as example dataset #2 at the top of this page where you can explore different example datasets.
6 LOCAL VARIANCE CORRECTION ACROSS ELEMENT SIGNAL INTENSITY
Perform This Transformation
Graph This Transformation
Span:  Trim:
This function will correct for differences in the variance of Log(Ratios) across the range of gene expression levels. This is done by dividing each Log(Ratio) value by the locally calculated standard deviation of the Log(Ratio) values. Because the variance correction entails division by the standard deviation, the resulting values are Z-scores (i.e. in standard deviation units). Because the standard deviation is calculated locally across gene expression level (X axis), the Z-scores reflect values relative to the amount of variance in the Log(Ratio) values at particular points along the X axis, i.e. variance in the log(ratio) values at particular expression levels.

Corrected Log (Ratio) / Local Standard Deviation =
Local Z-Score

The "loess" function in the R statistical language is used to calculate the local standard deviation in Log(Ratio) values (Y axis) across the range of expression levels (X axis). This is done individually for the positive and negative Log(ratios). The "span" variable defined above determines the smoothness of the robust local regression which estimates the local standard deviation. The "trim" variable defined above determines the proportion of the most extreme values in the data which are ignored in the estimation of the local standard deviation.
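
As a simplified illustration, the Python sketch below estimates a local spread for positive and negative Log(Ratio) values separately and divides each value by it. It replaces the robust loess fit with a plain tricube-weighted root-mean-square and ignores "trim"; the function names `local_spread` and `local_z_scores` and the synthetic data are assumptions.

```python
import math

def local_spread(x, y, span=0.5):
    """Tricube-weighted local root-mean-square of y around each x."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))  # window size in points
    out = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1.0
        w = [max(0.0, 1.0 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        out.append(math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)) / sum(w)))
    return out

def local_z_scores(x, y, span=0.5):
    """Divide each log ratio by the local spread of same-sign log ratios."""
    z = [0.0] * len(y)
    for sign in (1, -1):  # positive and negative values handled separately
        idx = [i for i, v in enumerate(y) if v * sign > 0]
        if not idx:
            continue
        spread = local_spread([x[i] for i in idx], [y[i] for i in idx], span)
        for i, s in zip(idx, spread):
            z[i] = y[i] / s
    return z

# Synthetic data: large scatter at low intensity, small scatter at high intensity.
mean_log = [1.0, 1.1, 1.2, 1.3, 9.0, 9.1, 9.2, 9.3]
log_ratio = [2.0, -2.0, 2.0, -2.0, 0.5, -0.5, 0.5, -0.5]
z = local_z_scores(mean_log, log_ratio)
```

In this example a Log(Ratio) of 2 at low intensity and one of 0.5 at high intensity end up with the same Z-score, because each is typical of the scatter at its own expression level.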

The data depicted at right were derived from two experiments using the Affymetrix GeneChip oligonucleotide microarray technology, using RNA derived from human leukocytes (a single-channel, fluorescent hybridization technology). Affymetrix data are ideal for the illustration of the variance correction, as the variance in Log(Ratio) values (Y axis) varies drastically across the range of element signal intensities (X axis). This dataset is used as example dataset #3 at the top of this page where you can explore different example datasets.

DATA OUTPUT
SNOMAD OUTPUT
Send my web browser a new HTML page with my requested images and a link to my SNOMAD output datafile (~2 minutes).
Email me a text file containing my transformed data. Enter your full email address here: 

If you want your results sent to multiple addresses, enter them all in the above textbox separated by commas. Ex: "joe@ab.com, mary@cd.gov, bobo@ef.edu"

What do you want us to name the output file?

This name is optional, but useful if you send multiple jobs to SNOMAD so that you can distinguish them from one another when they are sent to you.

SUBMIT REQUEST