On MPD's SNP data retrieval utility page,
specify a genomic region, the desired mouse strains or data set, and any desired filtering.
You'll then be able to preview your result on
the web page and download your entire result as CSV.
If you'd like to adjust and rerun your query, just use your browser's Back button to return to the form.
Chromosomal coordinates are GRCm38 / mm10. Functional annotation is from dbSNP build 142.
Please contact us to register interest in any of the following data sets:
Chicago1 8200+ locations, 58 inbred strains (2009)
CNB1 1300+ locations, 18 inbred strains (2010)
JAXSNP1 2000+ locations, 107 inbred strains (2007)
UNC-MUGA1 7400+ locations, 19 inbred strains (2011)
WUSTL1 2300+ locations, 16 LGXSM strains (2006)
Screen shot of example result
Any call cells that are white / empty indicate "no data available".
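For anyone post-processing the downloaded CSV, the "empty means no data" rule can be sketched as follows. This is only an illustration: the function name read_calls and the column names in the sample (chr, bp, and two strain columns) are assumptions, not the documented layout of the MPD export.

```python
import csv
import io


def read_calls(csv_text):
    """Parse CSV text into a list of dicts, mapping empty cells to None.

    Hypothetical helper: the real MPD export may use different columns.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = {}
        for key, value in row.items():
            if key is None:
                continue  # skip overflow fields from malformed rows
            text = (value or "").strip()
            # An empty cell means "no data available" for that call.
            cleaned[key] = text if text else None
        rows.append(cleaned)
    return rows


# Made-up sample data, for illustration only.
sample = "chr,bp,A/J,DBA/2J\n1,3000000,A,\n1,3000100,,G\n"
rows = read_calls(sample)
```

Representing missing calls as None keeps them distinct from any real call value when comparing strains later.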
Specifying genes | markers | regions
Enter a gene symbol or chromosome coordinate range. Click on the Show examples link to see various possibilities.
As noted there, multiple items can be supplied, and
it's also possible to retrieve an entire chromosome or entire genome (except for the largest data sets).
Click on Include additional flank to add upstream and/or downstream flank to the retrieval basepair range for
each requested gene or marker.
When searching on gene / marker symbols, MGI's current nomenclature is recognized, and retrieved locations are based on MGI's current coordinates.
MGI batch query can be used to get current gene symbols.
Please note that this MGI gene information sometimes differs from the dbSNP 142 SNP annotation.
Here's how to use MGI batch query to be sure all symbols in your set of genes are current:
1. Go to MGI batch query
2. Copy-paste your gene list into the tool
3. For Type, select "All Symbols/ Synonyms/ Homologs"
4. Check the box for Genome Location
5. Click Search
6. On the result page click on Excel File
7. Open the Excel result and see column D
8. Copy-paste column D to your destination.
Choosing mouse strains
If you selected a specific data set you'll get a pulldown of available strains. Otherwise you'll be typing into
an input field with search-suggest. Either way, after choosing a strain click on the green button to add it to the query.
If you're not sure which strains to choose, type in CC8, which provides a good default.
Polymorphism filtering with strain groups A and B
The most common use of this feature is to limit your result to only the locations where the call for one strain differs from that of another.
To do this, choose a strain for Group A and a different strain for Group B.
If either strain's call is "No data" or "Het" then that location is ineligible for comparison.
You can also have two or more strains in Group A and/or Group B, in which case
a given location is considered polymorphic if all usable calls in Group A differ from all usable calls in Group B
(the calls in Group A must be uniform and likewise for Group B).
Cells that are "No data" or "Het" are ignored; if a group ends up with no usable calls then that location is ineligible for comparison.
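The group comparison rule described above can be sketched in a few lines. This is an illustrative reading of the rule, not MPD's actual implementation; the function names and the exact strings used to mark unusable calls ("No data", "Het", empty) are assumptions.

```python
# Assumed markers for calls that cannot be compared.
UNUSABLE = {None, "", "No data", "Het"}


def usable_calls(calls):
    """Return the set of distinct usable calls in a group."""
    return {c for c in calls if c not in UNUSABLE}


def is_polymorphic(group_a, group_b):
    """Apply the Group A / Group B rule to one location.

    Returns None when the location is ineligible (a group has no
    usable calls), otherwise True or False.
    """
    a = usable_calls(group_a)
    b = usable_calls(group_b)
    if not a or not b:
        return None  # ineligible for comparison
    # Each group must be uniform, and the two uniform calls must differ.
    return len(a) == 1 and len(b) == 1 and a != b


print(is_polymorphic(["A", "A", "No data"], ["G", "Het"]))  # True
print(is_polymorphic(["A", "G"], ["T"]))                    # False
print(is_polymorphic(["Het"], ["A"]))                       # None
```

Returning None for ineligible locations keeps "not comparable" separate from "compared and found identical".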
Interpreting dbSNP 142 functional annotation
Using a variation effect prediction algorithm, NCBI dbSNP has annotated basepair locations within genes as intronic, coding, or several other classes.
In your results, MPD represents these annotations using an abbreviated notation:
code : gene
Certain function codes (Cn, Cs) have additional information appended to this construct; see below for details.
The dbSNP 142 annotation gene names/locations sometimes differ from the MGI gene information found elsewhere in MPD.
Some basepair locations have multiple, differing annotations due to more than one transcript covering the location.
In these cases MPD lists the above construct for each unique instance seen, separated by a space.
On the other hand, some basepair locations have no functional annotation at all, either because the location
is not within any gene, or because the location is not present in NCBI dbSNP mouse build 142.
You can click on the dbSNP RS number linkout for further details on these annotations.
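If you process the annotation column programmatically, splitting it into its individual constructs might look like the sketch below. It assumes each construct is written without internal spaces (for example Cs:GeneB) and that constructs are separated by a single space as described above. The function name parse_annotations and the gene names are hypothetical, and any information appended for Cn or Cs is left unparsed in the second element.

```python
def parse_annotations(cell):
    """Split an annotation cell into (code, remainder) pairs.

    Assumes constructs look like "code:gene" with no internal spaces.
    The remainder holds the gene plus any appended information, kept raw.
    """
    results = []
    for token in cell.split():
        code, sep, rest = token.partition(":")
        if not sep:
            continue  # not a code:gene construct; skip it
        results.append((code, rest))
    return results


# Placeholder gene names, for illustration only.
pairs = parse_annotations("Cn:GeneA Cs:GeneB")
empty = parse_annotations("")
```

An empty cell yields an empty list, which matches locations that have no functional annotation.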
MPD function code | dbSNP equivalent term | Definition and Sequence Ontology link
A sequence variant, that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved. In MPD result displays, two amino acid codes (reference and variant) are appended to this annotation, followed by the amino acid position.
A sequence variant where there is no resulting change to the encoded amino acid. In MPD result displays, one amino acid code (the reference) is appended to this annotation, followed by the amino acid position.
A transcript variant occurring within an intron.
A transcript variant of a non coding RNA gene.
A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript. Amino acid info is shown similarly to Cn above.
A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript. Amino acid info is shown similarly to Cn above.
A splice variant that changes the 2 base region at the 3' end of an intron.
A splice variant that changes the 2 base region at the 5' end of an intron.
(indels only) An attribute describing a sequence that contains a mutation involving the deletion or insertion of one or more bases, where this number is not divisible by 3.
(indels only) An indel variation with a length that is a multiple of 3 bp, not causing a frameshift (no SO term).
Including indels in the result
One data set (Sanger) includes indels (small insertions or deletions that are larger than the usual one basepair size of SNPs).
You can opt to include these in your result using the checkbox near the bottom of the form.
MPD does not retain specific allele sequences for indels. Rather, we encode them so that the C57BL/6J reference is always 0
and the other reported variants are assigned 1, 2, and so on. To see the actual allele sequence, use the
linkout to Sanger VCF.
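The numeric encoding can be illustrated with a short sketch. The text above only states that the reference is 0 and other variants get 1, 2, and so on; numbering variants in order of first appearance is an assumption made here, as are the function name and the made-up allele sequences.

```python
def encode_indel_alleles(reference, strain_alleles):
    """Map each strain's allele to an integer code at one location.

    The reference allele is always 0. Other alleles are numbered 1, 2, ...
    in order of first appearance (an assumption, not documented behavior).
    Missing alleles (None) stay None.
    """
    codes = {reference: 0}
    encoded = {}
    for strain, allele in strain_alleles.items():
        if allele is None:
            encoded[strain] = None
            continue
        if allele not in codes:
            codes[allele] = len(codes)  # next unused integer
        encoded[strain] = codes[allele]
    return encoded


# Made-up allele sequences, for illustration only.
example = encode_indel_alleles(
    "AT",
    {"C57BL/6J": "AT", "A/J": "ATT", "DBA/2J": "A", "BALB/cJ": "ATT"},
)
```

Under these assumptions, strains sharing the same non-reference allele receive the same code, so the codes can be compared across strains at a given location even though the sequences themselves are not retained.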
Click on the rs number cells to go to NCBI dbSNP for more information on the specific location.
The Sanger VCF linkouts are no longer provided.
This column (to the far right) is useful when several genomic regions are supplied,
as a way to differentiate the requested regions in the result.