This vignette explains how to properly cite SMARTER data when they are used
in your research. SMARTER data are composed of background datasets and
foreground datasets: the former derive from other public sources, while
the latter were generated by the SMARTER project.
Ideally, when using SMARTER data, foreground datasets should be
cited using the SMARTER-database publication (see below), while
background datasets should be cited using the original
publication(s) from which the data were taken. In this vignette we provide
some tips for retrieving citation information with the
smarterapi package.
Loading the package
When the smarterapi package is loaded with library(smarterapi), a message
is shown to remind users to cite SMARTER data if used in publications.
You can also retrieve citation information using the
citation() function:
citation("smarterapi")
#> To cite SMARTER-database in publications use:
#>
#> Paolo Cozzi, Arianna Manunza, Johanna Ramirez-Diaz, Valentina
#> Tsartsianidou, Konstantinos Gkagkavouzis, Pablo Peraza, Anna Maria
#> Johansson, Juan José Arranz, Fernando Freire, Szilvia Kusza, Filippo
#> Biscarini, Lucy Peters, Gwenola Tosser-Klopp, Gabriel Ciappesoni,
#> Alexandros Triantafyllidis, Rachel Rupp, Bertrand Servin, Alessandra
#> Stella, SMARTER-database: a tool to integrate SNP array datasets for
#> sheep and goat breeds, Gigabyte, 2024
#> https://doi.org/10.46471/gigabyte.139
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Article{,
#> title = {SMARTER-database: a tool to integrate SNP array datasets for sheep and goat breeds},
#> author = {Paolo Cozzi and Arianna Manunza and Johanna Ramirez-Diaz and Valentina Tsartsianidou and Konstantinos Gkagkavouzis and Pablo Peraza and Anna Maria Johansson and Juan José Arranz and Fernando Freire and Szilvia Kusza and Filippo Biscarini and Lucy Peters and Gwenola Tosser-Klopp and Gabriel Ciappesoni and Alexandros Triantafyllidis and Rachel Rupp and Bertrand Servin and Alessandra Stella},
#> journal = {GigaByte},
#> year = {2024},
#> url = {https://github.com/cnr-ibba/SMARTER-database},
#> doi = {10.46471/gigabyte.139},
#> }

This gives you the reference to the SMARTER-database publication, which should be used when citing SMARTER data, in particular the foreground datasets.
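The same reference can also be obtained programmatically, for instance to paste it into a manuscript or to store it in a .bib file. The sketch below relies only on the base R utils functions citation(), format() and toBibtex(); package_reference() is a hypothetical helper written for this example and is not part of smarterapi.

```r
# Hypothetical helper (not part of smarterapi): return the citation of an
# installed package either as plain text or as BibTeX lines.
package_reference <- function(pkg = "smarterapi", style = c("text", "bibtex")) {
  style <- match.arg(style)
  cit <- utils::citation(pkg)
  if (style == "text") {
    # One formatted reference per entry of the package's CITATION file
    format(cit, style = "text")
  } else {
    # toBibtex() returns the BibTeX entries line by line
    as.character(utils::toBibtex(cit))
  }
}

# Usage, assuming smarterapi is installed:
# cat(package_reference(), sep = "\n")
# writeLines(package_reference(style = "bibtex"), "smarter_database.bib")
```

The text style suits a reference list in a word processor, while the BibTeX lines can be written straight to a file.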
Retrieving citation information for samples
For background datasets, the original publication is tracked
in the doi field of each dataset. You can therefore retrieve citation
information for the samples you are using by joining sample data with
dataset data. For example, let's retrieve the Italian sheep samples and
then join them with dataset information using the dataset_id field:
italian_sheeps <- get_smarter_samples(
  species = "Sheep",
  query = list(country = "Italy")
)
datasets <- get_smarter_datasets(
  query = list(species = "Sheep")
)
italian_sheeps_with_datasets <- dplyr::inner_join(
  datasets,
  italian_sheeps,
  by = dplyr::join_by(`_id.$oid` == `dataset_id.$oid`)
)

For simplicity, let's keep only the breed and doi columns, so that we
know which publication to cite for each breed. Since the same breed can
be present in multiple datasets, and the same dataset can include
multiple breeds, we also keep only distinct rows. Note that because both
the dataset and sample tables have a breed column, the joined data frame
has breed.x and breed.y columns, coming respectively from the left
(datasets) and right (samples) side of the join. We are interested in
the breed.y column, which we rename to breed_name for clarity:
italian_breeds_doi <- italian_sheeps_with_datasets %>%
  dplyr::select(breed.y, doi) %>%
  dplyr::rename(breed_name = breed.y) %>%
  dplyr::distinct()
italian_breeds_doi %>% head()
#> breed_name doi
#> 1 Altamurana https://doi.org/10.1371/journal.pbio.1001258
#> 2 Comisana https://doi.org/10.1371/journal.pbio.1001258
#> 3 Leccese https://doi.org/10.1371/journal.pbio.1001258
#> 4 SardinianAncestralBlack https://doi.org/10.1371/journal.pbio.1001258
#> 5 Sardinian mouflon https://doi.org/10.1038/s41598-017-07382-7
#> 6 Sarda sheep https://doi.org/10.1038/s41598-017-07382-7

The smarterapi package includes utility functions to
help users retrieve citation information. These functions use
memoization: the result for a DOI that has already been queried is
cached, so repeated queries for the same DOI are answered without
repeating the lookup. For example, you can use the
get_full_citation() function to retrieve the full citation
for a given DOI, and get_short_citation() to
retrieve a short citation. Let's apply these functions to our
data frame:
italian_breeds_citations <- italian_breeds_doi %>%
  dplyr::mutate(
    full_citation = sapply(doi, get_full_citation),
    short_citation = sapply(doi, get_short_citation)
  )
italian_breeds_citations %>%
dplyr::select(breed_name, short_citation) %>%
head()
#> breed_name short_citation
#> 1 Altamurana Kijas et al. 2012
#> 2 Comisana Kijas et al. 2012
#> 3 Leccese Kijas et al. 2012
#> 4 SardinianAncestralBlack Kijas et al. 2012
#> 5 Sardinian mouflon Barbato et al. 2017
#> 6 Sarda sheep Barbato et al. 2017

In the example above we retrieved both the full and the short citation
for each breed from its doi field, but displayed only the breed name and
short citation for simplicity. The full citation is meant for your
bibliography, while the short citation can be used in the main text of
your publication. These are the full citations for all the Italian sheep
breeds we collected:
| short_citation | full_citation |
|---|---|
| Kijas et al. 2012 | Kijas, J. W., Lenstra, J. A., Hayes, B., Boitard, S., Porto Neto, L. R., San Cristobal, M., Servin, B., McCulloch, R., Whan, V., Gietzen, K., Paiva, S., Barendse, W., Ciani, E., Raadsma, H., McEwan, J., & Dalrymple, B. (2012). Genome-Wide Analysis of the World’s Sheep Breeds Reveals High Levels of Historic Mixture and Strong Recent Selection. PLoS Biology, 10(2), e1001258. https://doi.org/10.1371/journal.pbio.1001258 |
| Barbato et al. 2017 | Barbato, M., Hailer, F., Orozco-terWengel, P., Kijas, J., Mereu, P., Cabras, P., Mazza, R., Pirastru, M., & Bruford, M. W. (2017). Genomic signatures of adaptive introgression from European mouflon into domestic sheep. Scientific Reports, 7(1). https://doi.org/10.1038/s41598-017-07382-7 |
| Ciani et al. 2020 | Ciani, E., Mastrangelo, S., Da Silva, A., Marroni, F., Ferenčaković, M., Ajmone-Marsan, P., Baird, H., Barbato, M., Colli, L., Delvento, C., Dovenski, T., Gorjanc, G., Hall, S. J. G., Hoda, A., Li, M.-H., Marković, B., McEwan, J., … Lenstra, J. A. (2020). On the origin of European sheep as revealed by the diversity of the Balkan breeds and by optimizing population-genetic analysis tools. Genetics Selection Evolution, 52(1). https://doi.org/10.1186/s12711-020-00545-7 |
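Because the citation helpers are memoized, applying them with sapply() to every row stays cheap even when many breeds share a DOI; in the output above, four breeds all point to Kijas et al. 2012. The base R sketch below illustrates the general idea with an environment used as a cache. memoize_simple() and slow_lookup() are hypothetical names, and this is not smarterapi's actual implementation.

```r
# Minimal memoization sketch (hypothetical helper, not smarterapi's code):
# results are stored in an environment keyed by the argument, so a repeated
# call with the same DOI returns the cached value instead of recomputing it.
memoize_simple <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in for a slow remote query; it counts how many times it really runs
lookup_calls <- 0
slow_lookup <- function(doi) {
  lookup_calls <<- lookup_calls + 1
  paste("citation for", doi)
}

cached_lookup <- memoize_simple(slow_lookup)
dois <- c(
  "https://doi.org/10.1371/journal.pbio.1001258",
  "https://doi.org/10.1371/journal.pbio.1001258",
  "https://doi.org/10.1038/s41598-017-07382-7"
)
citations <- sapply(dois, cached_lookup)
lookup_calls  # 2: the repeated DOI is served from the cache
```

If you prefer not to rely on caching, an equivalent approach is to call the helpers once per unique DOI, for example on dplyr::distinct(italian_breeds_doi, doi), and join the results back to the breeds.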
Collect citations in BibTeX format
Citation information can also be retrieved in BibTeX format, which is
useful when preparing your bibliography with LaTeX. The
get_bibtex_citation() function returns the BibTeX entry
for a given DOI:
# Retrieve BibTeX citations, querying each DOI only once
bibtex_entries <- italian_breeds_doi %>%
  dplyr::distinct(doi) %>%
  dplyr::mutate(bibtex_citation = sapply(doi, get_bibtex_citation)) %>%
  dplyr::pull(bibtex_citation)

# Collect all entries into a single string, separated by blank lines
bibtex_content <- paste(bibtex_entries, collapse = "\n\n")

# Save to a .bib file
writeLines(bibtex_content, "italian_breeds.bib")
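Since foreground datasets should be cited through the SMARTER-database publication, it can be convenient to keep that reference in the same .bib file as the background publications. A minimal sketch, assuming the bibtex_entries vector created above; write_combined_bib() and its key argument are hypothetical and not part of smarterapi. Entries produced by citation() have an empty citation key (printed as `@Article{,`), so the helper fills one in, because a key is needed to cite the entry from LaTeX.

```r
# Hypothetical helper (not part of smarterapi): write a foreground reference
# (a citation/bibentry object) together with background BibTeX strings into
# a single .bib file, dropping exact duplicate entries.
write_combined_bib <- function(foreground, background, path,
                               key = "smarter_database") {
  fg_lines <- as.character(utils::toBibtex(foreground))
  # Fill in an empty key such as "@Article{," so the entry can be cited
  fg_lines <- sub("^(@[A-Za-z]+\\{),$", paste0("\\1", key, ","), fg_lines)
  fg_entry <- paste(fg_lines, collapse = "\n")
  entries <- unique(c(fg_entry, background))
  writeLines(paste(entries, collapse = "\n\n"), path)
  invisible(entries)
}

# Usage, assuming smarterapi is installed and bibtex_entries exists:
# write_combined_bib(citation("smarterapi"), bibtex_entries, "italian_breeds.bib")
```

You can then cite the SMARTER-database publication with the chosen key (smarter_database in this sketch) alongside the keys of the background publications.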