This vignette explains how to cite SMARTER data properly in your research work. SMARTER data consist of background datasets and foreground datasets: the former derive from other public sources, while the latter were generated by the SMARTER project. When using SMARTER data, cite foreground datasets through the SMARTER-database publication (see below) and background datasets through the original publication(s) from which the data were taken. This vignette provides some tips for retrieving citation information with the smarterapi package.

Loading the package

Attaching the package with library(smarterapi) shows a message reminding users to cite SMARTER data when it is used in publications.

You can also retrieve citation information using the citation() function:

citation("smarterapi")
#> To cite SMARTER-database in publications use:
#> 
#>   Paolo Cozzi, Arianna Manunza, Johanna Ramirez-Diaz, Valentina
#>   Tsartsianidou, Konstantinos Gkagkavouzis, Pablo Peraza, Anna Maria
#>   Johansson, Juan José Arranz, Fernando Freire, Szilvia Kusza, Filippo
#>   Biscarini, Lucy Peters, Gwenola Tosser-Klopp, Gabriel Ciappesoni,
#>   Alexandros Triantafyllidis, Rachel Rupp, Bertrand Servin, Alessandra
#>   Stella, SMARTER-database: a tool to integrate SNP array datasets for
#>   sheep and goat breeds, Gigabyte, 2024
#>   https://doi.org/10.46471/gigabyte.139
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {SMARTER-database: a tool to integrate SNP array datasets for sheep and goat breeds},
#>     author = {Paolo Cozzi and Arianna Manunza and Johanna Ramirez-Diaz and Valentina Tsartsianidou and Konstantinos Gkagkavouzis and Pablo Peraza and Anna Maria Johansson and Juan José Arranz and Fernando Freire and Szilvia Kusza and Filippo Biscarini and Lucy Peters and Gwenola Tosser-Klopp and Gabriel Ciappesoni and Alexandros Triantafyllidis and Rachel Rupp and Bertrand Servin and Alessandra Stella},
#>     journal = {GigaByte},
#>     year = {2024},
#>     url = {https://github.com/cnr-ibba/SMARTER-database},
#>     doi = {10.46471/gigabyte.139},
#>   }

This is the reference to the SMARTER-database publication, to be used when citing SMARTER data and foreground datasets in particular.
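If you keep your references in a .bib file, the same citation can be exported with base R utilities. The following is a minimal sketch, assuming smarterapi is installed; the helper name export_package_citation() and the output file name are illustrative and not part of smarterapi. Note that the entry key is empty in the output above (`@Article{,`), so you will likely want to add one before citing the entry.

```r
# Hypothetical helper (not part of smarterapi): write the citation of an
# installed package to a BibTeX file using base R utilities only.
export_package_citation <- function(package, path) {
  # utils::citation() returns the citation object of the package;
  # utils::toBibtex() converts it to BibTeX lines
  bibtex_lines <- utils::toBibtex(utils::citation(package))
  writeLines(bibtex_lines, path)
  invisible(path)
}

# Usage (assumes smarterapi is installed):
# export_package_citation("smarterapi", "smarter-database.bib")
```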

Retrieving citation information for samples

For background datasets, the original publication is tracked in the doi field of each dataset. You can retrieve citation information for the samples you are using by merging sample data with dataset data. For example, let’s retrieve Italian sheep samples and then merge them with dataset information, matching the dataset_id field of each sample to the _id field of its dataset:

italian_sheeps <- get_smarter_samples(
  species = "Sheep",
  query = list(country = "Italy")
)

datasets <- get_smarter_datasets(
  query = list(species = "Sheep")
)

italian_sheeps_with_datasets <- dplyr::inner_join(
  datasets,
  italian_sheeps,
  by = dplyr::join_by(`_id.$oid` == `dataset_id.$oid`)
)

For simplicity, select only a few columns: keep the breed together with the doi, to know which publication to cite for each breed used. The same breed can be present in multiple datasets, and the same dataset can include multiple breeds, so keep only distinct rows. Note that both the dataset and sample tables have a breed column, so the joined data frame has breed.x and breed.y columns, coming from the left side (datasets) and the right side (samples) of the join respectively. We are interested in the breed.y column, which we rename to breed_name for clarity:

italian_breeds_doi <- italian_sheeps_with_datasets %>%
  dplyr::select(breed.y, doi) %>%
  dplyr::rename(breed_name = breed.y) %>%
  dplyr::distinct()

italian_breeds_doi %>% head()
#>                breed_name                                          doi
#> 1              Altamurana https://doi.org/10.1371/journal.pbio.1001258
#> 2                Comisana https://doi.org/10.1371/journal.pbio.1001258
#> 3                 Leccese https://doi.org/10.1371/journal.pbio.1001258
#> 4 SardinianAncestralBlack https://doi.org/10.1371/journal.pbio.1001258
#> 5       Sardinian mouflon   https://doi.org/10.1038/s41598-017-07382-7
#> 6             Sarda sheep   https://doi.org/10.1038/s41598-017-07382-7
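The breed.x / breed.y naming can be reproduced on toy data. The sketch below swaps dplyr::inner_join() for base R merge(), which applies the same default .x/.y suffixes; all values and the simplified column names (id, dataset_id) are made up for illustration and do not come from the SMARTER database.

```r
# Toy stand-ins for the dataset and sample tables (hypothetical values)
toy_datasets <- data.frame(
  id = c("d1", "d2"),
  breed = c("Dataset breed A", "Dataset breed B"),
  doi = c("doi-a", "doi-b"),
  stringsAsFactors = FALSE
)
toy_samples <- data.frame(
  dataset_id = c("d1", "d1", "d2"),
  breed = c("Breed one", "Breed two", "Breed one"),
  stringsAsFactors = FALSE
)

# Both tables have a `breed` column, so the merged result carries
# breed.x (from toy_datasets) and breed.y (from toy_samples)
toy_joined <- merge(toy_datasets, toy_samples, by.x = "id", by.y = "dataset_id")

# Keep the sample-level breed with its DOI, one row per distinct pair
toy_breeds_doi <- unique(toy_joined[, c("breed.y", "doi")])
names(toy_breeds_doi)[1] <- "breed_name"
toy_breeds_doi
```

Here "Breed one" appears in both toy datasets, so it is kept twice, once per DOI: this is the case where a breed needs more than one citation.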

The smarterapi package includes utility functions to help users retrieve citation information. These functions use memoization to speed up repeated queries. For example, get_full_citation() retrieves the full citation for a given DOI, and get_short_citation() retrieves a short one. Let’s apply these functions to our data frame:

italian_breeds_citations <- italian_breeds_doi %>%
  dplyr::mutate(
    full_citation = sapply(doi, get_full_citation),
    short_citation = sapply(doi, get_short_citation)
  )

italian_breeds_citations %>%
  dplyr::select(breed_name, short_citation) %>%
  head()
#>                breed_name      short_citation
#> 1              Altamurana   Kijas et al. 2012
#> 2                Comisana   Kijas et al. 2012
#> 3                 Leccese   Kijas et al. 2012
#> 4 SardinianAncestralBlack   Kijas et al. 2012
#> 5       Sardinian mouflon Barbato et al. 2017
#> 6             Sarda sheep Barbato et al. 2017

In the example above we retrieved both full and short citations for each breed using the doi field. We displayed only the breed name and short citation for simplicity. You can use the full citation when preparing your bibliography, while the short citation can be used in the main text of your publication. Let’s see how to display the full citations for all the Italian sheep breeds we collected:

italian_breeds_citations %>%
  dplyr::select(short_citation, full_citation) %>%
  dplyr::distinct()
| short_citation | full_citation |
|---|---|
| Kijas et al. 2012 | Kijas, J. W., Lenstra, J. A., Hayes, B., Boitard, S., Porto Neto, L. R., San Cristobal, M., Servin, B., McCulloch, R., Whan, V., Gietzen, K., Paiva, S., Barendse, W., Ciani, E., Raadsma, H., McEwan, J., & Dalrymple, B. (2012). Genome-Wide Analysis of the World’s Sheep Breeds Reveals High Levels of Historic Mixture and Strong Recent Selection. PLoS Biology, 10(2), e1001258. https://doi.org/10.1371/journal.pbio.1001258 |
| Barbato et al. 2017 | Barbato, M., Hailer, F., Orozco-terWengel, P., Kijas, J., Mereu, P., Cabras, P., Mazza, R., Pirastru, M., & Bruford, M. W. (2017). Genomic signatures of adaptive introgression from European mouflon into domestic sheep. Scientific Reports, 7(1). https://doi.org/10.1038/s41598-017-07382-7 |
| Ciani et al. 2020 | Ciani, E., Mastrangelo, S., Da Silva, A., Marroni, F., Ferenčaković, M., Ajmone-Marsan, P., Baird, H., Barbato, M., Colli, L., Delvento, C., Dovenski, T., Gorjanc, G., Hall, S. J. G., Hoda, A., Li, M.-H., Marković, B., McEwan, J., … Lenstra, J. A. (2020). On the origin of European sheep as revealed by the diversity of the Balkan breeds and by optimizing population-genetic analysis tools. Genetics Selection Evolution, 52(1). https://doi.org/10.1186/s12711-020-00545-7 |
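Memoization pays off in this workflow because many breeds share the same DOI, so the same lookup would otherwise be repeated row after row. The following is a minimal base R sketch of the idea. It is not the smarterapi implementation: memoize_by_key() and slow_lookup() are hypothetical names, and the lookup is only a stand-in for a network request.

```r
# Hypothetical helper: wrap a one-argument function so that each distinct
# key is computed once and later calls reuse the stored result
memoize_by_key <- function(fun) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fun(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in for a slow lookup; counts how often it really runs
n_calls <- 0
slow_lookup <- function(doi) {
  n_calls <<- n_calls + 1
  paste("citation for", doi)
}

fast_lookup <- memoize_by_key(slow_lookup)
dois <- c("doi-a", "doi-a", "doi-b", "doi-a")
results <- vapply(dois, fast_lookup, character(1), USE.NAMES = FALSE)
n_calls  # the underlying lookup ran once per distinct DOI
```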

Collect citations in BibTeX format

You can also retrieve citations in BibTeX format, which is useful when your bibliography is managed with BibTeX. The get_bibtex_citation() function returns the BibTeX entry for a given DOI:

# retrieve bibtex citations
bibtex_entries <- italian_breeds_doi %>%
  dplyr::mutate(bibtex_citation = sapply(doi, get_bibtex_citation)) %>%
  dplyr::select(bibtex_citation) %>%
  dplyr::distinct() %>%
  dplyr::pull(bibtex_citation)

# collect all entries into a single string
bibtex_content <- paste(bibtex_entries, collapse = "\n\n")

# Save to a .bib file
writeLines(bibtex_content, "italian_breeds.bib")
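Before using the file, it can be worth checking that it contains one entry per distinct DOI. The following is a minimal base R sketch that uses two made-up BibTeX entries and a temporary file in place of the real get_bibtex_citation() output; count_bibtex_entries() is a hypothetical helper, not part of smarterapi.

```r
# Hypothetical helper: count the entries in a .bib file by looking for
# lines that open an entry, such as "@article{"
count_bibtex_entries <- function(path) {
  lines <- readLines(path, warn = FALSE)
  sum(grepl("^[[:space:]]*@[[:alnum:]]+[[:space:]]*\\{", lines))
}

# Made-up entries standing in for the output of get_bibtex_citation()
toy_entries <- c(
  "@article{example_one,\n  title = {First made-up entry},\n  year = {2000}\n}",
  "@article{example_two,\n  title = {Second made-up entry},\n  year = {2001}\n}"
)

# Write them the same way as above, then count what was written
toy_path <- tempfile(fileext = ".bib")
writeLines(paste(toy_entries, collapse = "\n\n"), toy_path)
n_entries <- count_bibtex_entries(toy_path)
```

With real data, the same count could be compared against the number of distinct DOIs in italian_breeds_doi to spot lookups that returned nothing.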