This vignette describes how to use the variant
endpoints, which store information about the SNPs used in the SMARTER
genotype datasets.
Assembly versions
One of the aims of this project is to manage genotypes across different
assembly versions. This means collecting data from different assemblies
(depending on when the data were generated), from different sources
(Affymetrix, Illumina, WGS) and in different file formats. Genotypes
are normalized so that they are consistent across data sources and stored
in one genotype file per species.
Currently four assembly versions are managed, two for the sheep
dataset and two for the goat dataset. Information about the data sources
of each assembly can be retrieved from the backend info
endpoint through the get_smarter_info()
function:
ARS1 | ARS1 | manifest
CHI1 | CHI1.0 | SNPchiMp v.3
OAR3 | Oar_v3.1 | SNPchiMp v.3
OAR4 | Oar_v4.0 | SNPchiMp v.3
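The table above could be produced with a call along these lines. This is a minimal sketch: it assumes the package is installed under the name smarterapi and that get_smarter_info() returns a list with a working_assemblies element, as suggested later in this vignette. The call needs network access to the SMARTER backend.

```r
# Name of the element we expect in the info response (assumption).
wanted <- "working_assemblies"

# Assumption: the package providing get_smarter_info() is called "smarterapi".
if (requireNamespace("smarterapi", quietly = TRUE)) {
  info <- smarterapi::get_smarter_info()
  # Show the assemblies and their data sources.
  print(info[[wanted]])
}
```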
Collect data from an assembly
get_smarter_variants() has two mandatory parameters,
species and assembly, and accepts additional parameters
(see one of the variant endpoints for more information).
For example, you can search variants by SNP name or by rs ID
(if the latter exists):
snp <- get_smarter_variants(
  species = "Goat",
  assembly = "ARS1",
  query = list(
    name = "snp12965-scaffold1499-3295573"
  )
)
Affx-122936132 | 8 | 2021-01-08 | T/C | BOT | A/G | manifest | 66621364 | BOT | ARS1 | snp12965-scaffold1499-3295573 | AffymetrixAxiomGoatv2 | rs119103301
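The same SNP could presumably be searched by its rs ID instead of its name. The sketch below assumes the filter is called rs_id in the query list, which is not confirmed by this vignette, and that the package is installed under the name smarterapi.

```r
# Assumption: the rs ID filter is passed as `rs_id` in the query list.
rs_query <- list(rs_id = "rs119103301")

# Assumption: the package providing get_smarter_variants() is "smarterapi".
if (requireNamespace("smarterapi", quietly = TRUE)) {
  snp_by_rs <- smarterapi::get_smarter_variants(
    species = "Goat",
    assembly = "ARS1",
    query = rs_query
  )
  print(snp_by_rs)
}
```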
Please refer to the working_assemblies
entry returned by get_smarter_info()
for the list of assemblies
supported by the SMARTER-database.
Data that come from SNPchiMp v.3, like the
Sheep OAR3
assembly, support the illumina
forward attribute. For example, the following SNP:
IlluminaOvineHDSNP | C/T | 18 | T/C | T/C | bottom | A/G | SNPchiMp v.3 | 64249211 | ss896246690 | forward | Oar_v3.1 | oar3_OAR18_64249211 | rs10721092
This SNP is T/C on the forward strand of OAR3, which means that the
reversed probe is aligned to the genome (as you can infer from the
bottom illumina strand attribute of this SNP). Genotypes in the
SMARTER-database are converted using the illumina top coding convention,
so you will find this SNP as A/G in the SMARTER-database, while on the
reference sequence it is T/C.
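The relationship between the two codings of this SNP can be checked by hand: the top and bottom strands are complementary, so complementing each T/C allele gives A/G. The helper below, complement_alleles(), is a hypothetical illustration and not part of the package, and it is not necessarily how the SMARTER-database performs the conversion.

```r
# Hypothetical helper: complement every base in an allele string.
# Separators such as "/" are left untouched.
complement_alleles <- function(alleles) {
  chartr("ACGT", "TGCA", alleles)
}

# Alleles as reported on the bottom strand for the SNP above.
forward_alleles <- "T/C"
top_alleles <- complement_alleles(forward_alleles)
print(top_alleles)
```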
Fetch Variants by region
The variant endpoints support queries by region, using the
<chromosome>:<start>-<end>
format. For
example:
Affx-1111832849 | IlluminaGoatSNP50,AffymetrixAxiomGoatv2 | 1 | 2021-01-08 | T/C | BOT | A/G | manifest | 18960 | TOP | ARS1 | 1_18960_AF-PAKI | NULL
Affx-122941307 | IlluminaGoatSNP50,AffymetrixAxiomGoatv2 | 1 | 2021-01-08 | T/C | BOT | A/G | manifest | 47271 | BOT | ARS1 | snp14099-scaffold1560-920888 | rs268246860
NA | IlluminaGoatSNP50 | 1 | 2021-01-08 | T/C | BOT | A/G | manifest | 63095 | BOT | ARS1 | GoatD01.029677 | NULL
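A region query of this kind might be written as follows. This is a sketch under assumptions: the filter name region, the helper make_region() and the region 1:1-100000 are illustrative choices (the region simply covers the positions listed above), and the package is assumed to be installed as smarterapi.

```r
# Hypothetical helper: format a region as <chromosome>:<start>-<end>.
make_region <- function(chromosome, start, end) {
  sprintf("%s:%d-%d", chromosome, as.integer(start), as.integer(end))
}

# Assumption: the region filter is passed as `region` in the query list.
region_query <- list(region = make_region("1", 1, 100000))

# Assumption: the package providing get_smarter_variants() is "smarterapi".
if (requireNamespace("smarterapi", quietly = TRUE)) {
  region_variants <- smarterapi::get_smarter_variants(
    species = "Goat",
    assembly = "ARS1",
    query = region_query
  )
  print(region_variants)
}
```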
Fetch Variants by chip name
You can download all the variants for a given chip. Please
consider that this can require a lot of time and memory, since we store
more than 600K SNPs in the SMARTER-database. First, collect the
available chips from the SMARTER-database, for example for the Sheep
species:
illumina | 54241 | IlluminaOvineSNP50 | Sheep
illumina | 605998 | IlluminaOvineHDSNP | Sheep
affymetrix | 56793 | AffymetrixAxiomOviCan | Sheep
affymetrix | 49702 | AffymetrixAxiomBGovisNP | Sheep
affymetrix | 60379 | AffymetrixAxiomBGovis2 | Sheep
NA | 0 | WholeGenomeSequencing | Sheep
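The chip list above could be collected with a call like the one sketched here. Both the function name get_smarter_supportedchips() and the species filter are assumptions that this vignette does not confirm, as is the package name smarterapi, so check the package reference before relying on them.

```r
# Assumption: chips can be filtered by species through the query list.
chip_query <- list(species = "Sheep")

# Assumption: the package is "smarterapi" and exposes
# get_smarter_supportedchips() with a `query` argument.
if (requireNamespace("smarterapi", quietly = TRUE)) {
  chips <- smarterapi::get_smarter_supportedchips(query = chip_query)
  print(chips)
}
```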
Then collect all the SNPs for a given chip by providing its SMARTER
chip name. Note that for this chip you will download more than
50K SNPs, which will take a long time:
variants <- get_smarter_variants(
  species = "Sheep",
  assembly = "OAR3",
  query = list(
    chip_name = "IlluminaOvineSNP50"
  )
)