specleanr: Detecting Environmental Outliers in Data Analysis Pipelines

A framework for detecting and handling outliers in data analysis workflows. Outlier detection is a statistical technique that highlights records with suspiciously high or low values. Outlier detection in distribution models was initiated by Chapman (1991) (available at <https://www.researchgate.net/publication/332537800_Quality_control_and_validation_of_point-sourced_environmental_resource_data>), who developed the reverse jackknifing method. The concept was later developed further and incorporated into several R packages, including 'flexsdm' (Velazco et al., 2022, <doi:10.1111/2041-210X.13874>) and 'biogeo' (Robertson et al., 2016, <doi:10.1111/ecog.02118>). We compiled outlier detection methods from the literature, including those described in Dastjerdy et al. (2023) <doi:10.3390/geotechnics3020022> and Liu et al. (2008) <doi:10.1109/ICDM.2008.17>. This package introduces an ensembling approach, in which multiple outlier detection methods are combined to decide whether a record is flagged as an absolute outlier. The approach applies to general data analysis as well as to the development of species distribution models.
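
The ensembling idea in the description can be illustrated with a minimal base-R sketch. It is not specleanr's API: the helper names (flag_iqr, flag_zscore, flag_hampel, ensemble_flags), the three detectors chosen, the cutoffs, and the 0.5 vote threshold are all assumptions made for illustration. Each detector votes on every record, and a record is treated as an absolute outlier when at least half of the detectors flag it.

```r
# Hypothetical helpers for illustration only; none of these are specleanr functions.

# Tukey's rule: flag values beyond k * IQR from the quartiles.
flag_iqr <- function(x, k = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Classical z-score: flag values more than `cutoff` standard deviations from the mean.
flag_zscore <- function(x, cutoff = 3) {
  abs(x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE) > cutoff
}

# Hampel-style rule: flag values more than `cutoff` scaled MADs from the median.
# stats::mad() applies the 1.4826 consistency constant by default.
flag_hampel <- function(x, cutoff = 3) {
  abs(x - stats::median(x, na.rm = TRUE)) / stats::mad(x, na.rm = TRUE) > cutoff
}

# Combine the detectors: `share` is the fraction of methods flagging each record.
# The threshold of 0.5 is an assumed value, not one taken from the package.
ensemble_flags <- function(x, threshold = 0.5) {
  votes <- cbind(
    iqr    = flag_iqr(x),
    zscore = flag_zscore(x),
    hampel = flag_hampel(x)
  )
  share <- rowMeans(votes)
  data.frame(value = x, share = share, absolute_outlier = share >= threshold)
}

# Made-up water temperatures with one suspicious record.
temps <- c(12.1, 12.8, 13.0, 13.4, 13.9, 14.2, 14.6, 15.1, 15.3, 48.0)
result <- ensemble_flags(temps)
print(result)
```

In this toy sample the z-score rule does not flag the value 48.0, because a single extreme value inflates the standard deviation (with ten observations no z-score can exceed about 2.85), while the IQR and Hampel rules both flag it. The vote therefore still marks the record, which is the kind of disagreement between methods that an ensemble is meant to absorb.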

Version: 1.0.0
Depends: R (≥ 4.1.0)
Imports: cluster, dbscan, e1071, isotree, methods, utils, robust, robustbase, usdm, mgcv
Suggests: dplyr, knitr, rmarkdown, testthat (≥ 3.0.0), ggplot2, ggpmisc, tibble, rinat, rvertnet, rgbif, curl, rfishbase (≥ 5.0.1), sf, terra, tidytext, scatterplot3d
Published: 2025-11-25
DOI: 10.32614/CRAN.package.specleanr (may not be active yet)
Author: Anthony Basooma [aut, cre], Thomas Hein [ctb, fnd, ths], Astrid Schmidt-Kloiber [ctb, fnd, dtc], Merret Buurman [ctb], Sami Domisch [ctb], Martin Tschikof [ctb], Florian Borgwardt [ctb, fnd]
Maintainer: Anthony Basooma <anthony.basooma at boku.ac.at>
BugReports: https://github.com/AnthonyBasooma/specleanr/issues
License: GPL (≥ 3)
URL: https://anthonybasooma.github.io/specleanr/
NeedsCompilation: no
Materials: README
CRAN checks: specleanr results

Documentation:

Reference manual: specleanr.html , specleanr.pdf
Vignettes: Detecting environmental outliers in species distribution models for plants (source, R code)
Flag outliers based on species ecological ranges (source, R code)
Environmental outlier detection with bootstrapping and principal component analysis (source, R code)
General outlier detection for univariate datasets (source, R code)
Optimising the LOESS method used for automatic identification of the outlier thresholds (source, R code)

Downloads:

Package source: specleanr_1.0.0.tar.gz
Windows binaries: r-devel: not available, r-release: not available, r-oldrel: not available
macOS binaries: r-release (arm64): specleanr_1.0.0.tgz, r-oldrel (arm64): specleanr_1.0.0.tgz, r-release (x86_64): not available, r-oldrel (x86_64): specleanr_1.0.0.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=specleanr to link to this page.