Type: Package
Title: A Comprehensive Package for Sediment Source Unmixing
Version: 2.0
Encoding: UTF-8
Date: 2025-08-01
Description: This package quantifies the provenance of sediments in a catchment or study area. Based on a characterization of the sediment sources and the end sediment mixtures, a mixing model algorithm is applied to the sediment mixtures to estimate the relative contribution of each potential source. The package includes several graphs to help users in their data understanding, such as box plots, correlation, PCA, and LDA graphs. In addition, new developments such as the Consensus Ranking (CR), Consistent Tracer Selection (CTS), and Linear Variability Propagation (LVP) methods are included to correctly apply the fingerprinting technique and increase dataset and model understanding. A new method based on Conservative Balance (CB) method has also been included to enable the use of isotopic tracers.
License: GPL (≥ 3)
URL: https://github.com/eead-csic-eesa/fingerPro
Depends: R (≥ 3.5)
Imports: Rcpp (≥ 0.11.3), klaR (≥ 0.6-12), ggplot2 (≥ 2.2.1), GGally (≥ 1.3.2), plyr (≥ 1.8.4), MASS (≥ 7.3-45), reshape (≥ 0.8.7), grid (≥ 3.1.1), gridExtra (≥ 2.3), scales (≥ 0.5.0), car (≥ 3.0.0), RcppProgress (≥ 0.4), Ternary (≥ 1.2.2), dplyr (≥ 1.0.7), crayon (≥ 1.4.2), plotly (≥ 4.10.3), rgl (≥ 1.2.8)
LinkingTo: Rcpp, RcppGSL, RcppProgress
RoxygenNote: 7.2.3
NeedsCompilation: yes
Packaged: 2025-08-26 13:24:43 UTC; r1052262
Author: Borja Latorre [aut, cre], Ivan Lizaga [aut], Leticia Gaspar [aut], Leticia Palazon [aut], Ana Navas [aut], Vince Q Vu [ctb]
Maintainer: Borja Latorre <borja.latorre@csic.es>
Repository: CRAN
Date/Publication: 2025-08-27 12:00:14 UTC
A comprehensive package for sediment source unmixing
Description
This package quantifies the provenance of sediments in a catchment or study area. Based on a characterization of the sediment sources and the end sediment mixtures, a mixing model algorithm is applied to the sediment mixtures to estimate the relative contribution of each potential source. The package includes several graphs to help users understand their data, such as box plots, correlation, PCA, and LDA graphs. In addition, new developments such as the Consensus Ranking (CR), Consistent Tracer Selection (CTS), and Linear Variability Propagation (LVP) methods are included to correctly apply the fingerprinting technique and increase dataset and model understanding. A new Conservative Balance (CB) method has also been included to enable the use of isotopic tracers.
Legal Deposits
FingerPro R. An R package for sediment source fingerprinting (computer program). Authors: Iván Lizaga, Borja Latorre, Leticia Gaspar, Ana María Navas. (EEAD-CSIC). Notarial Act No. 3758 (José Periel Martín), 18/10/2019. Representative of CSIC: Javier Echave Oria.
FingerPro. Model for environmental mixture analysis (computer program). Authors: Leticia Palazón, Borja Latorre, Ana María Navas. (EEAD-CSIC). Notarial Act No. 4021 (Pedro Antonio Mateos Salgado), 21/07/2017. Representative of CSIC: Javier Echave Oria.
Author(s)
Borja Latorre
Ivan Lizaga
Leticia Gaspar
Leticia Palazon
Ana Navas
Maintainer: Erosion, and Soil and Water Evaluation (Research Group) fingerpro@eead.csic.es
See Also
Useful links:
https://github.com/eead-csic-eesa/fingerPro
Examples
# Load the 'fingerPro' package to access its functions.
library('fingerPro')
################################################################################
########################## EXPLORATORY DATA ###################################
################################################################################
# Load the example dataset for a 3-source mixing problem.
data <- read.csv(system.file("extdata", "example_geo_3s_raw.csv", package = "fingerPro") )
# Verify the structure and integrity of the loaded dataset.
check_database(data)
# Create a box and whisker plot to visualize the distribution of each tracer.
# box_plot <- box_plot(data)
# Generate a correlation plot to examine relationships between tracers.
# correlation_plot(data)
# Perform and plot Linear Discriminant Analysis (LDA) to visualize group separation.
# LDA_plot <- LDA_plot(data)
# Perform and plot Principal Component Analysis (PCA) for dimensionality reduction.
# PCA_plot(data)
# Perform a Kruskal-Wallis (KW) test, keeping tracers with p-values below 0.05.
# output_KW <- KW_test(data, pvalue=0.05)
# Perform Discriminant Function Analysis (DFA) (significance level set to 0.1).
# output_DFA <- DFA_test(data, niveau = 0.1)
################################################################################
#################### FingerPro TRACER SELECTION ##############################
################################################################################
# Individual Tracer Analysis (ITA) to get descriptive statistics for each tracer.
# output_ITA <- individual_tracer_analysis(data)
# Calculate the Conservativeness Index (CI) for each tracer based on ITA results.
# output_CI <- CI(output_ITA)
# Generate ternary diagrams to visualize source contribution of tracers.
# ternary_diagram(output_ITA, tracers = c(1:9), rows = 3, cols = 3, solution = NA)
# Perform the Range Test (RT) to identify non-conservative tracers.
# output_RT <- range_test(data)
# Calculate a Consensus Ranking (CR) score to identify the most reliable tracers.
# output_CR <- CR(data, debates = 1000)
# Extract all possible minimal tracer combinations (seeds) for further evaluation.
# This helps identify the most discriminant subsets of tracers.
# output_CTS_seeds <- CTS_seeds(data, iter = 1000)
# Evaluate the mathematical consistency of a specific tracer combination using the
# Consistent Tracer Selection (CTS) error.
# The user must select a row from the CTS_seeds output based on these criteria
# (see the help for CTS_error): positive apportionments (or, if negative, close to zero),
# a high percentage of physically feasible solutions, and low dispersion.
# e.g. select row 1: solution = output_CTS_seeds[1, ]
# output_CTS <- CTS_error(data, solution = output_CTS_seeds[1,])
#### OPTIMUM TRACER SELECTION
# Merge the results from CTS, CR, and CI into a single summary data frame.
# output_data_summary <- merge(output_CTS, output_CR, by = "tracer")
# output_data_summary <- merge(output_data_summary, output_CI, by = "tracer")
# Filter the summary data to select only the most robust tracers.
# The criteria are a low CTS error (< 0.05) and a high CR score (> 80).
# output_data_TracerSelection <- output_data_summary[output_data_summary$CTS_err < 0.05 &
# output_data_summary$CR_score > 80, ]
################################################################################
######################## U N M I X I N G #####################################
################################################################################
# Reload the original data to ensure the analysis starts from the full dataset.
# data <- read.csv(system.file("extdata", "example_geo_3s_raw.csv", package = "fingerPro") )
# Select only the tracers identified as optimal in the previous step.
# data <- select_tracers(data, output_data_TracerSelection[, 1])
# Run the unmixing model to estimate source apportionment.
# output_UNMIX <- unmix(data)
# Plot the unmixing results using a violin plot to visualize source contributions.
# plot_results(output_UNMIX)
# Plot the unmixing results using a density plot for an alternative visualization.
# plot_results(output_UNMIX, violin = FALSE)
Conservative Balance (CB) Method for Isotopic Tracer Analysis
Description
This function transforms isotopic ratio and content data of individual tracers in a dataset into virtual elemental tracers, which can then be combined with classical tracers and analyzed with standard unmixing models.
Usage
CB_method(data)
Arguments
data |
A data frame containing the isotopic tracer characteristics of sediment sources and mixtures. The data should be correctly formatted for isotopic analysis, including both isotopic ratio and isotopic content. Users should ensure their data is in a valid format by using the check_database() function before running the CB method. |
Details
The Conservative Balance (CB) method provides a novel, physically-based framework for analyzing isotopic tracers in sediment fingerprinting.
The core of the method is an exact transformation that combines the isotopic ratio and isotopic content into a virtual elemental tracer. This approach has two key advantages: it allows isotopic tracers to be analyzed using classical unmixing models, and it enables their combined use with elemental tracers to potentially increase the discriminant capacity of the fingerprinting analysis.
This function implements the simplified approximation of the CB transformation, assuming that the isotopic ratio is much smaller than 1. The calculation is performed for both averaged and non-averaged datasets.
A key feature of this transformation is that the tracer values for the mixture are set to zero. This follows directly from the method: the transformation works with the difference between each sample's isotopic ratio and the mixture's isotopic ratio, so for the mixture itself (its own value minus itself) the result is zero.
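The zero mixture value can be illustrated with a minimal base R sketch. It assumes the small-ratio mass balance, in which both the content and the product ratio * content mix linearly, so that a virtual tracer of the form content * (ratio - mixture ratio) balances to zero. The variable names, the toy numbers, and the sign convention are assumptions made for illustration; this is not the internal code of CB_method().

```r
# Hypothetical toy data: one isotopic ratio and its content for three
# sources, plus a mixture built from known (assumed) weights.
w       <- c(S1 = 0.5, S2 = 0.3, S3 = 0.2)
ratio   <- c(S1 = 0.010, S2 = 0.012, S3 = 0.015)
content <- c(S1 = 20, S2 = 35, S3 = 50)

# Small-ratio mass balance: content and ratio * content both mix linearly.
content_mix <- sum(w * content)
ratio_mix   <- sum(w * ratio * content) / content_mix

# Assumed form of the virtual tracer: content * (ratio - mixture ratio).
virtual_sources <- content * (ratio - ratio_mix)
virtual_mixture <- content_mix * (ratio_mix - ratio_mix)  # mixture minus itself

# The virtual tracer satisfies a classical linear mass balance.
balance <- sum(w * virtual_sources)
print(virtual_sources)
print(c(virtual_mixture = virtual_mixture, balance = balance))
```

Because the virtual tracer obeys the same linear balance as an elemental tracer, it can sit next to elemental tracers in a standard unmixing model.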
Value
A data frame where isotopic tracers have been converted into scalar virtual tracers for further analysis. After the transformation, the mixture's row will have tracer values of zero.
References
Lizaga, I., Latorre, B., Gaspar, L., & Navas, A. (2022). Combined use of fingerprinting and tracing. Science of The Total Environment, 832, 154834.
Compute Conservativeness Index (CI) for individual tracers
Description
This function calculates the Conservativeness Index (CI) for each tracer based on the results of an individual tracer analysis.
The CI was adapted from its original definition to better describe the conservativeness of tracers in a high-dimensional space of multiple sources. The predicted source contributions from each tracer are first calculated and characterized by their centroid. Then, the CI is calculated as the percentage of solutions with conservative apportionments (0 <= wi <= 1) relative to the centroid position. Unlike the previous definition, this new definition does not penalize tracers with dominant apportionments from one source and distributions close to a vertex of the physical space.
Usage
CI(ita)
Arguments
ita |
A list of data frames, where each data frame contains the predicted apportionments for a specific tracer, as obtained from the 'individual_tracer_analysis' function. |
Value
A data frame containing the CI value for each tracer.
References
Lizaga, I., Latorre, B., Bodé, S., Gaspar, L., Boeckx, P., & Navas, A. (2024). Combining isotopic and elemental tracers for enhanced sediment source partitioning in complex catchments. *Journal of Hydrology*, 631, 130768. https://doi.org/10.1016/j.jhydrol.2024.130768
Lizaga, I., Latorre, B., Gaspar, L., & Navas, A. (2020). Consensus ranking as a method to identify non-conservative and dissenting tracers in fingerprinting studies. *Science of The Total Environment*, *720*, 137537. https://doi.org/10.1016/j.scitotenv.2020.137537
Consensus Ranking (CR) method for tracer selection
Description
This function computes the Consensus Ranking (CR) method, an ensemble technique to identify non-conservative and dissenting tracers in sediment fingerprinting studies. The method combines predictions from single-tracer models and is based on a scoring function derived from a series of random "debates" between tracers.
Usage
CR(data, debates = 1000, seed = 123456)
Arguments
data |
A data frame containing sediment source and mixture data. Users should ensure their data is in a valid format by using the 'check_database()' function before running the CR method. |
debates |
An integer specifying the target number of debates each tracer should participate in. The function will run until each tracer has participated in at least this many debates. |
seed |
An integer used to initialize the random number generator for reproducibility. |
Details
The Consensus Ranking method is based on a series of random debates to test the compatibility of tracers. In each debate, a random subset of tracers is selected. The size of this subset is determined by the number of sources, corresponding to the minimum number of equations needed to overdetermine the unmixing model.
For each debate, a least-squares method is used to find a solution to the overdetermined mass balance equations. The consensus of the debate is measured by the mathematical compatibility of the tracers, specifically using the Root Mean Square Error (RMSE) of the mass balance equations. The tracer whose exclusion from the debate results in the lowest RMSE is identified as the "dissenting" tracer for that round.
This process is repeated for a specified number of debates. Each tracer accumulates a count of total participations and a count of lost debates (being identified as dissenting). The final CR score is a quantitative measure of consensus, calculated as '100 - (lost debates / total debates) * 100'.
A low CR score indicates that a tracer frequently disrupts the consensus and is considered a non-conservative or dissenting tracer. Conversely, a high CR score suggests the tracer is in frequent agreement with the others, making it a reliable and conservative tracer for the unmixing model. This method is robust and does not require pre-screening or filtering of tracers.
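As a small illustration of the scoring step only, the sketch below applies the formula '100 - (lost debates / total debates) * 100' to a table of debate tallies. The tracer names and counts are invented, and the column names (total, lost, CR_score) are hypothetical; the debates themselves are not simulated here.

```r
# Hypothetical debate tallies for four tracers (values invented).
tally <- data.frame(
  tracer = c("Ba", "Fe", "Pb", "Sr"),
  total  = c(1000, 1020, 1005, 1010),  # debates each tracer took part in
  lost   = c(50, 612, 201, 0),         # debates in which it was the dissenter
  stringsAsFactors = FALSE
)

# CR score: 100 - (lost debates / total debates) * 100.
tally$CR_score <- 100 - (tally$lost / tally$total) * 100

# Rank tracers from highest to lowest consensus.
ranking <- tally[order(tally$CR_score, decreasing = TRUE), ]
print(ranking)
```

In this toy table the tracer that never loses a debate ranks first, and the one that loses most of its debates ranks last.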
Value
A data frame containing the CR score for each tracer. The score, ranging from 0 to 100, indicates the tracer's rank in terms of consensus and conservativeness. Tracers are ordered by their score in descending order, with the most conservative tracers having high scores and dissenting tracers having low scores.
References
Lizaga, I., Latorre, B., Gaspar, L., & Navas, A. (2020). Consensus ranking as a method to identify non-conservative and dissenting tracers in fingerprinting studies. *Science of The Total Environment*, *720*, 137537. https://doi.org/10.1016/j.scitotenv.2020.137537
Evaluate the mathematical consistency of a tracer selection for an apportionment solution.
Description
This function assesses the mathematical consistency of a tracer selection for an apportionment result by computing the normalized error between the predicted and observed tracer concentrations in the virtual mixture. A low normalized error for all tracers indicates a consistent tracer selection. This function can be used to (A) extend a minimal tracer combination obtained from the 'CTS_seeds' function while ensuring its mathematical consistency, in order to select the optimum tracers for unmixing, and (B) diagnose problems in the results of fingerprinting models.
Usage
CTS_error(data, solution)
Arguments
data |
A data frame containing the characteristics of sediment sources and mixtures. |
solution |
A data frame or vector containing the apportionment values for each source. If a data frame, the user must select a row from the CTS_seeds output based on the following criteria: apportionment values should be positive (or, if negative, close to zero), the percentage of physically feasible solutions (percent_physical) should be high, and the dispersion should be low, indicating higher discriminant capacity. If a vector, it must contain the weights for each source, in the same order as they appear in the data. |
Details
The function calculates a normalized error for each tracer to assess the consistency of a given apportionment solution. The method involves first computing a "virtual mixture" by using the proposed apportionment values to perform a weighted average of the source tracer concentrations. The error for each tracer is then the difference between the tracer concentration in the real mixture and the virtual mixture. This error is normalized by the range of the tracer, which is estimated from the extremes of the sources' confidence intervals.
A low normalized error for all tracers (e.g., below a predefined threshold such as 0.05) indicates a mathematically consistent tracer selection. If most tracers show low errors while a few have high errors, it suggests that those tracers may be non-conservative or less influential on the model's result. Conversely, high normalized errors in most tracers indicate mathematical inconsistency and can point to the existence of multiple partial solutions in the dataset.
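The calculation described above can be sketched in base R as follows. The source means, the mixture, and the apportionment are invented. As a simplification, the tracer range is taken from the source means rather than from the extremes of the sources' confidence intervals. This is an illustrative sketch, not the code of CTS_error().

```r
# Hypothetical averaged data: rows are sources, columns are tracers.
src_mean <- rbind(S1 = c(Ba = 100, Fe = 20, Sr = 5),
                  S2 = c(Ba = 200, Fe = 40, Sr = 9),
                  S3 = c(Ba = 300, Fe = 30, Sr = 12))
mixture  <- c(Ba = 190, Fe = 29.4, Sr = 11.1)  # Sr deliberately inconsistent
w        <- c(S1 = 0.4, S2 = 0.3, S3 = 0.3)    # apportionment to evaluate

# Virtual mixture: weighted average of the source tracer values.
virtual <- drop(w %*% src_mean)

# Simplified tracer range: spread of the source means (an assumption here).
rng <- apply(src_mean, 2, function(x) diff(range(x)))

# Normalized error between the real and the virtual mixture.
err <- (mixture - virtual) / rng

# Tracers exceeding an example threshold of 0.05.
flagged <- names(err)[abs(err) > 0.05]
print(round(err, 3))
print(flagged)
```

Here two tracers agree with the proposed apportionment and one stands out, which is the pattern the text associates with a non-conservative or less influential tracer.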
Value
A data frame containing the normalized error for each tracer.
References
Latorre, B., Lizaga, I., Gaspar, L., & Navas, A. (2021). A novel method for analysing consistency and unravelling multiple solutions in sediment fingerprinting. *Science of The Total Environment*, *789*, 147804.
Extract all possible minimal tracer combinations to identify the most discriminant.
Description
This function generates a list of all possible minimal tracer combinations and serves as a crucial initial step (a "seed") in building a consistent tracer selection within a sediment fingerprinting study. This analysis systematically explores various minimal tracer combinations and solves the resulting determined systems of equations to assess the **variability** of each combination. The **dispersion of the solution** directly reflects the **discriminant capacity** of each tracer combination: a lower dispersion indicates a higher discriminant capacity. While traditional methods like Discriminant Function Analysis (DFA) also identify discriminant tracer combinations, this function provides solutions that are **not restricted to the physically feasible space (0 < wi < 1)**. This unconstrained approach is valuable for identifying problematic tracer selections that might otherwise be masked when using constrained unmixing models, as discussed by Latorre et al. (2021).
Usage
CTS_seeds(data, iter = 1000, seed = 123456)
Arguments
data |
Data frame containing sediment source and mixtures. Users should ensure their data is in a valid format by using the check_database() function before running this function. |
iter |
The number of iterations for the variability analysis. Increase 'iter' to improve the reliability and accuracy of the results. A sufficient number of iterations is reached when the output no longer changes significantly with further increases. |
seed |
An integer value used to initialize the random number generator. Setting a seed ensures that the sequence of random numbers generated during the unmixing is reproducible. This is useful for debugging, testing, and comparing results across different runs. If no seed is provided, a random seed will be generated. |
Details
The Consistent Tracer Selection (CTS) method, as described by Latorre et al. (2021), begins by considering all possible sets of $n-1$ tracers, where $n$ is the number of sources. Each of these sets forms a determined system of linear equations that can be solved. To account for the variability within the sources, each tracer set is iteratively solved. This process involves sampling the source average values from a t-distribution, reflecting the discrepancy between the true mean and the measured mean due to finite observations. The maximum dispersion observed in the average apportionments for each tracer set is then used as a criterion to rank them, with lower dispersion indicating higher discriminant capacity. This initial step is crucial for identifying multiple discriminant solutions within the dataset, a problem often unexplored by traditional tracer selection methods.
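A minimal base R sketch of this seeding step for three sources is shown below. All data are invented, the helper solve_pair() is hypothetical, and the assumed 5 % standard deviation and 10 samples per source are illustrative choices. It is meant to show the idea (enumerate sets of $n-1$ tracers, solve each determined system, resample source means from a t-distribution, rank by dispersion), not to reproduce CTS_seeds().

```r
set.seed(1)  # reproducible sketch

# Hypothetical source statistics (3 sources x 3 tracers) and one mixture.
src_mean <- rbind(S1 = c(Ba = 100, Fe = 20, Sr = 5),
                  S2 = c(Ba = 200, Fe = 40, Sr = 9),
                  S3 = c(Ba = 300, Fe = 30, Sr = 12))
src_sd   <- src_mean * 0.05            # assumed 5 % standard deviation
n_obs    <- 10                         # assumed samples per source
w_true   <- c(0.4, 0.3, 0.3)
mixture  <- drop(w_true %*% src_mean)

# Two mass balances plus the sum-to-one constraint give three equations
# for three weights: a determined linear system.
solve_pair <- function(means, mix, pair) {
  A <- rbind(t(means[, pair]), 1)
  b <- c(mix[pair], 1)
  solve(A, b)
}

pairs <- combn(colnames(src_mean), 2)  # all sets of n - 1 = 2 tracers
seeds <- apply(pairs, 2, function(pair) {
  # Resample the source means from a t-distribution around the measured means.
  sims <- replicate(200, {
    noise <- matrix(rt(length(src_mean), df = n_obs - 1), nrow(src_mean))
    solve_pair(src_mean + noise * src_sd / sqrt(n_obs), mixture, pair)
  })
  c(w = solve_pair(src_mean, mixture, pair),
    max_sd = max(apply(sims, 1, sd)))
})
colnames(seeds) <- apply(pairs, 2, paste, collapse = " ")

# One row per tracer pair, lowest dispersion first.
seeds <- t(seeds)[order(seeds["max_sd", ]), ]
print(round(seeds, 3))
```

Every pair recovers the same weights from the noise-free means, but the dispersion under resampling differs between pairs, which is what the ranking exploits.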
Value
The function returns a data frame summarizing all possible tracer combinations. For a scenario with three sources, the data frame includes the following columns: 'tracers', 'w1', 'w2', 'w3', 'percent_physical', 'sd_w1', 'sd_w2', 'sd_w3', and 'max_sd_wi'. Each row represents a tracer combination, detailing its corresponding solution ($w_i$), the percentage of solutions that are physically feasible ($0 < w_i < 1$), the standard deviation of each apportionment (sd_wi), and the maximum dispersion among all sources (max_sd_wi). The rows are sorted by increasing dispersion, so the combination with the lowest dispersion appears first. This highlights the most discriminant combinations.
References
Latorre, B., Lizaga, I., Gaspar, L., & Navas, A. (2021). A novel method for analysing consistency and unravelling multiple solutions in sediment fingerprinting. *Science of The Total Environment*, *789*, 147804.
Discriminant Function Analysis (DFA) Test
Description
Performs a stepwise forward variable selection using the Wilks' lambda criterion to identify the most discriminant tracers in a dataset.
Usage
DFA_test(data, niveau = 0.1)
Arguments
data |
A data frame containing the characteristics of sediment sources and mixtures. |
niveau |
A numeric value specifying the significance level for the approximate F-test decision. |
Value
A data frame containing only the tracers that pass the DFA test.
Kruskal-Wallis rank sum test
Description
This function removes from the original data frame the properties (tracers) that do not show statistically significant differences between sources.
Usage
KW_test(data, pvalue = 0.05)
Arguments
data |
Data frame containing source and mixture data |
pvalue |
p-value threshold below which differences between sources are considered significant |
Value
Data frame only containing the variables that pass the Kruskal-Wallis test
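The filtering idea can be sketched with kruskal.test() from base R's stats package. The toy data frame and its column names are invented, and how KW_test() handles the mixture rows is not shown here; this is only an illustration of testing each tracer across the source groups and keeping those below the threshold.

```r
# Hypothetical raw data: two tracers measured in three sources.
toy <- data.frame(
  samples = rep(c("S1", "S2", "S3"), each = 8),
  Ba = c(100 + 1:8, 150 + 1:8, 200 + 1:8),                  # separates sources
  Fe = c(21 + 3 * (0:7), 22 + 3 * (0:7), 23 + 3 * (0:7)),   # overlaps fully
  stringsAsFactors = FALSE
)
tracers <- c("Ba", "Fe")

# Kruskal-Wallis p-value of each tracer across the source groups.
p_values <- sapply(toy[tracers], function(x) {
  kruskal.test(x, factor(toy$samples))$p.value
})

# Keep only the tracers whose p-value is below the chosen threshold.
passing  <- names(p_values)[p_values < 0.05]
filtered <- toy[c("samples", passing)]
print(p_values)
print(names(filtered))
```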
Linear discriminant analysis chart
Description
The function performs a linear discriminant analysis and displays the data in the relevant dimensions.
Usage
LDA_plot(data, P3D = FALSE, text = TRUE, colors = NULL, interactive = FALSE)
Arguments
data |
Data frame containing source and mixtures data |
P3D |
Boolean to switch between a two-dimensional and a three-dimensional chart |
text |
Boolean controlling whether the identification number of each sample point is shown in the plot |
colors |
Custom set of colors to use in the plot |
interactive |
Boolean to determine whether the plot should be interactive |
Principal component analysis chart
Description
The function performs a principal component analysis on the given data matrix and, using the vqv.ggbiplot package, displays a biplot of the results grouped by source to support the user's decisions.
Usage
PCA_plot(data, components = c(1, 2), colors = NULL)
Arguments
data |
Data frame containing source and mixtures data |
components |
Numeric vector containing the indices of the two principal components to plot |
colors |
Vector of colors to use for the groups in the plot |
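The numerical step behind such a chart can be sketched with prcomp() from base R. The toy data are invented, and centering and scaling the tracers is an assumption of this sketch; the preprocessing and plotting performed by PCA_plot() may differ.

```r
# Hypothetical raw data: three tracers measured in three sources.
toy <- data.frame(
  samples = rep(c("S1", "S2", "S3"), each = 4),
  Ba = c(100, 104, 98, 102, 150, 155, 149, 152, 200, 203, 198, 205),
  Fe = c(20, 22, 19, 21, 40, 38, 41, 39, 30, 31, 29, 32),
  Sr = c(5, 6, 5.5, 4.8, 9, 9.4, 8.7, 9.1, 12, 11.5, 12.3, 12.1),
  stringsAsFactors = FALSE
)

# Principal component analysis on centered and scaled tracers (assumed).
pca <- prcomp(toy[, c("Ba", "Fe", "Sr")], center = TRUE, scale. = TRUE)

# Scores on the two chosen components, as a biplot would display them.
components <- c(1, 2)
scores     <- pca$x[, components]

# Proportion of variance explained by each component.
explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(explained[components], 3))

# Group centroids in the plane of the chosen components.
centroids <- aggregate(as.data.frame(scores),
                       by = list(samples = toy$samples), FUN = mean)
print(centroids)
```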
Box and whiskers plot for sediment tracers
Description
This function creates a series of box and whisker plots to visualize the distribution and variability of individual tracers within a dataset. It is designed to work with sediment source and mixture data, automatically adapting to averaged or raw data formats.
Usage
box_plot(data, tracers = NULL, ncol = 3, colors = NULL)
Arguments
data |
A data frame containing sediment source and mixture data. Users should ensure their data is in a valid format by using the check_database() function before running this function. |
tracers |
A numeric vector specifying the column indices of the tracers to be plotted. The index 1 corresponds to the first tracer column after the sample ID and group columns. If NULL (the default), plots will be generated for all tracer columns. |
ncol |
An integer specifying the number of charts to display per row in the final plot layout. |
colors |
A character vector of colors to use for the box plots. The colors are applied sequentially to each group (sources and mixture). |
Details
This function is a wrapper for ggplot2 that automates the creation of a series of box plots, one for each tracer. The function first checks if the input data is averaged, and if so, converts it to a virtual raw dataset using the raw_dataset() function to enable the box plot visualization.
Each plot displays the distribution of a single tracer, with different groups (sources and mixtures) represented by separate box plots. In addition to the standard five-number summary (median, hinges, and whiskers), the function also overlays the sample count and the mean value for each group, providing a more detailed summary of the data.
The final output is a multi-panel plot arranged in a grid, with an optional legend depending on the input data.
Verify the integrity of a sediment unmixing database
Description
This function automatically infers the type of sediment database ("raw", "averaged", or "isotopic") based on its column names and verifies its integrity. It validates column names and their order to ensure data is correctly structured for subsequent package functions.
To retain conservative tracers for subsequent analyses, it is recommended to perform a minimal dataset cleaning beforehand:
Replace BDL (below detection limit) entries with a small positive number.
Exclude tracers whose mixture value is BDL or zero.
Optionally, remove tracers with predominantly BDL values.
**Database 'raw' format:** This database contains individual measurements for scalar tracers. It must have the following columns in order:
ID: Unique identifier for each sample.
samples: A categorical column identifying each source and mixture. The unique value representing the mixture must appear last. In cases with multiple mixture samples, they must all share the same mixture name but will be distinguished by unique entries in the ID column.
tracer1, tracer2, ...: Columns for each tracer measurement.
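As an illustration of the 'raw' layout, the sketch below builds a small data frame and runs a few structural checks in base R. The tracer names, the mixture label "Mix", and the checks themselves are hypothetical; they mirror the rules listed above but are not the validation code of check_database().

```r
# Hypothetical 'raw' database: two sources and one mixture, two tracers.
raw <- data.frame(
  ID      = 1:7,
  samples = c("S1", "S1", "S1", "S2", "S2", "S2", "Mix"),
  Ba      = c(100, 104, 98, 150, 155, 149, 126),
  Fe      = c(20, 22, 19, 40, 38, 41, 30),
  stringsAsFactors = FALSE
)

# Illustrative structural checks (not the package's own validation).
groups <- unique(raw$samples)
checks <- c(
  columns_in_order  = identical(names(raw)[1:2], c("ID", "samples")),
  unique_ids        = !anyDuplicated(raw$ID),
  groups_contiguous = !is.unsorted(match(raw$samples, groups)),
  mixture_last      = groups[length(groups)] == "Mix",
  numeric_tracers   = all(sapply(raw[-(1:2)], is.numeric))
)
print(checks)
```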
**Database 'isotopic raw' format:** This database contains individual measurements for isotopic tracers, which require both ratio and content data. It must have the following columns in order:
ID: Unique identifier for each sample.
samples: A categorical column identifying each source and mixture. The unique value representing the mixture must appear last. In cases with multiple mixture samples, they must all share the same mixture name but will be distinguished by unique entries in the ID column.
ratio1, ratio2, ...: Columns with the isotopic ratio values for each tracer.
cont_ratio1, cont_ratio2, ...: Columns with the corresponding content (concentration) values for each tracer.
**Database 'averaged' format:** This database contains statistical summaries of the scalar tracer data. It must have the following columns in order:
ID: Unique identifier for each sample.
samples: A categorical column identifying each source and mixture. The unique value representing the mixture must appear last. In cases with multiple mixture samples, they must all share the same mixture name but will be distinguished by unique entries in the ID column.
mean_tracer1, mean_tracer2, ...: Columns with the mean value for each tracer.
sd_tracer1, sd_tracer2, ...: Columns with the standard deviation for each tracer.
n: The number of measurements used to calculate the mean and standard deviation.
**Database 'isotopic averaged' format:** This database contains statistical summaries for isotopic tracers. It must have the following columns in order:
ID: Unique identifier for each sample.
samples: A categorical column identifying each source and mixture. The unique value representing the mixture must appear last. In cases with multiple mixture samples, they must all share the same mixture name but will be distinguished by unique entries in the ID column.
mean_ratio1, mean_ratio2, ...: Columns with the mean isotopic ratio values.
mean_cont_ratio1, mean_cont_ratio2, ...: Columns with the mean isotopic content values.
sd_ratio1, sd_ratio2, ...: Columns with the standard deviation of the isotopic ratio values.
sd_cont_ratio1, sd_cont_ratio2, ...: Columns with the standard deviation of the isotopic content values.
n: The number of measurements.
Usage
check_database(data)
Arguments
data |
A data frame to be checked. |
Value
A logical value ('TRUE' if the database is valid, 'FALSE' otherwise). If the check fails, the function will also print a descriptive error message.
Correlation matrix chart
Description
The function displays a correlation matrix of the properties, split by source, to support the user's decisions.
Usage
correlation_plot(
  data,
  columns = c(1:ncol(data) - 1),
  mixtures = FALSE,
  nmixtures = 1,
  colors = NULL
)
Arguments
data |
Data frame containing sediment source and mixture data. Users should ensure their data is in a valid format by using the check_database() function before running this function. |
columns |
Numeric vector containing the index of the columns in the chart (the first column refers to the grouping variable) |
mixtures |
Boolean to include or exclude the mixture samples in the chart |
nmixtures |
Number of mixtures in the dataset |
colors |
Vector of colors to use for the scatterplot |
Biplot for Principal Components using ggplot2
Description
Biplot for Principal Components using ggplot2
Usage
ggbiplot(
  pcobj,
  choices = 1:2,
  scale = 1,
  pc.biplot = TRUE,
  obs.scale = 1 - scale,
  var.scale = scale,
  groups = NULL,
  ellipse = FALSE,
  ellipse.prob = 0.68,
  labels = NULL,
  labels.size = 3,
  alpha = 1,
  var.axes = TRUE,
  circle = FALSE,
  circle.prob = 0.69,
  varname.size = 3,
  varname.adjust = 1.5,
  varname.abbrev = FALSE,
  ...
)
Arguments
pcobj |
an object returned by prcomp() or princomp() |
choices |
which PCs to plot |
scale |
covariance biplot (scale = 1), form biplot (scale = 0). When scale = 1, the inner product between the variables approximates the covariance and the distance between the points approximates the Mahalanobis distance. |
pc.biplot |
for compatibility with biplot.princomp() |
obs.scale |
scale factor to apply to observations |
var.scale |
scale factor to apply to variables |
groups |
optional factor variable indicating the groups that the observations belong to. If provided the points will be colored according to groups |
ellipse |
draw a normal data ellipse for each group? |
ellipse.prob |
size of the ellipse in Normal probability |
labels |
optional vector of labels for the observations |
labels.size |
size of the text used for the labels |
alpha |
alpha transparency value for the points (0 = transparent, 1 = opaque) |
var.axes |
draw arrows for the variables? |
circle |
draw a correlation circle? (only applies when prcomp was called with scale = TRUE and when var.scale = 1) |
circle.prob |
size of the circle in Normal probability |
varname.size |
size of the text for variable names |
varname.adjust |
adjustment factor for the placement of the variable names, >= 1 means farther from the arrow |
varname.abbrev |
whether or not to abbreviate the variable names |
... |
... |
Value
a ggplot2 plot
Individual tracer analysis
Description
This function computes the distribution of apportionments compatible with each individual tracer in the dataset, providing insights into the tracer's discriminant capacity and conservativeness. The method assesses the contribution of a single tracer to an unmixing model by solving a determined system of equations for each tracer.
Usage
individual_tracer_analysis(
  data,
  completion_method = "virtual",
  iter = 5000,
  seed = 123456L
)
Arguments
data |
A data frame containing the characteristics of sediment sources and mixtures. Users should ensure their data is in a valid format by using the 'check_database()' function before running the individual tracer analysis. |
completion_method |
A character string specifying the method for selecting the required remaining tracers to form a determined system of equations. Possible values are: "virtual": Fabricate remaining tracers virtually using generated random numbers. This method is valuable for an initial assessment of the tracer's consistency without the influence of other tracers from the dataset. "random": Randomly select remaining tracers from the dataset to complete the system. This method is useful for understanding how the tracer behaves when paired with others from the dataset. |
iter |
The number of iterations for the variability analysis. Increase 'iter' to improve the reliability and accuracy of the results. A sufficient number of iterations is reached when the output no longer changes significantly with further increases. |
seed |
An integer used to initialize the random number generator for reproducibility. Setting a seed ensures that the sequence of random numbers generated during the unmixing is reproducible. This is useful for debugging, testing, and comparing results across different runs. |
Details
The function performs an individual tracer analysis to evaluate the conservativeness and discriminant capacity of each tracer. For each tracer, it constructs a determined system of linear equations by combining it with a minimal set of other tracers.
There are two methods for completing this minimal set:
1. The **"virtual" method** fabricates the remaining tracers by randomly generating values. This approach isolates the tracer of interest from the influence of other measured tracers.
2. The **"random" method** randomly selects the remaining tracers from the available dataset, providing an assessment of how the tracer performs in combination with others.
Value
A list of data frames, where each data frame contains the predicted apportionments for a specific tracer. The last element of the list is a data frame containing the **Consistency Index (CI)** for each tracer.
References
Lizaga, I., Latorre, B., Gaspar, L., & Navas, A. (2020). Consensus ranking as a method to identify non-conservative and dissenting tracers in fingerprinting studies. *Science of The Total Environment*, *720*, 137537. https://doi.org/10.1016/j.scitotenv.2020.137537
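The idea of completing a determined system with a virtual tracer can be sketched in base R for three sources. This is only an illustration under stated assumptions, not the package's implementation: the source values 's', the mixture value 'm', and the way the virtual tracer is drawn (uniform random numbers for both its source values and its mixture value) are all hypothetical.

```r
set.seed(123456)

# Hypothetical values of ONE tracer: mean in each source and in the mixture.
s <- c(S1 = 10, S2 = 20, S3 = 40)
m <- 19
iter <- 200

# Each iteration completes the system with one virtual tracer (assumption:
# uniform random source values 'v' and a uniform random mixture value 'u'),
# then solves the 3 x 3 system:
#   tracer equation, virtual tracer equation, and sum of weights = 1.
sol <- t(replicate(iter, {
  v <- runif(3)
  u <- runif(1)
  A <- rbind(s, v, 1)
  solve(A, c(m, u, 1))
}))
colnames(sol) <- names(s)

# Share of solutions that are physical (all weights within [0, 1]).
# This is a simple summary for illustration, not the package's Consistency Index.
physical <- mean(apply(sol, 1, function(w) all(w >= 0 & w <= 1)))
print(head(sol))
print(physical)
```

Every row of 'sol' satisfies the equation of the tracer of interest and sums to 1, so the spread of the rows shows which apportionments are compatible with that single tracer.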
Input sediment mixtures
Description
The function selects and extracts the sediment mixtures from the raw dataset.
Usage
inputMixture(data)
Arguments
data |
Data frame containing source and mixture data |
Input sediment sources
Description
The function selects and extracts the source samples from the dataset.
Usage
inputSource(data, na.omit = T)
Arguments
data |
Data frame containing source and mixture data |
na.omit |
Logical value indicating whether NA values are omitted when computing the mean and SD |
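The effect of omitting NA values when summarising sources can be sketched in base R. This is a minimal illustration with hypothetical data and column names ('source', 'tracer1'); it does not reproduce the internals of inputSource.

```r
# Hypothetical raw source samples with one missing measurement.
raw <- data.frame(
  source  = c("S1", "S1", "S1", "S2", "S2"),
  tracer1 = c(10, NA, 12, 20, 24)
)

# Per-source mean and SD, dropping NA values (analogous to na.omit = T).
means <- tapply(raw$tracer1, raw$source, mean, na.rm = TRUE)
sds   <- tapply(raw$tracer1, raw$source, sd, na.rm = TRUE)

# Without na.rm, any source containing an NA yields NA.
means_with_na <- tapply(raw$tracer1, raw$source, mean)

print(means)
print(sds)
print(means_with_na)
```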
Displays the results of an unmixing analysis
Description
This function generates a plot showing the relative contribution of sediment sources to each mixture. The output of the unmix function should be used as input for this function.
Usage
plot_results(
data,
violin = T,
bounds = c(0, 1),
scaled = T,
y_high = 1,
colors = NULL,
ncol = 1
)
Arguments
data |
A data frame, typically the output from the unmix function. |
violin |
A logical value. If |
bounds |
A numeric vector of length 2 specifying the lower and upper bounds for the data. |
scaled |
A logical value. If |
y_high |
The maximum value for the y-axis. |
colors |
A character vector of colors to use for the plots. |
ncol |
The number of plots per row. |
Range test
Description
This function excludes the properties whose values in the sediment mixture(s) fall outside the range defined by the minimum and maximum values in the sediment sources.
Usage
range_test(data)
Arguments
data |
Data frame containing source and mixture data |
Value
Data frame containing sediment sources and mixtures
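The range criterion can be sketched in base R as follows. This is an illustration with hypothetical data, not the package's code; in particular, it assumes the minimum and maximum are taken over the individual source samples.

```r
# Hypothetical source samples and a single mixture.
sources <- data.frame(
  source  = c("S1", "S1", "S2", "S2"),
  tracer1 = c(10, 12, 20, 22),
  tracer2 = c(5, 6, 7, 8)
)
mixture <- c(tracer1 = 15, tracer2 = 9)

# A tracer passes when the mixture value lies within the source range.
tracers <- names(mixture)
in_range <- sapply(tracers, function(tr) {
  mixture[[tr]] >= min(sources[[tr]]) && mixture[[tr]] <= max(sources[[tr]])
})

kept <- tracers[in_range]
print(in_range)
print(kept)
```

Here 'tracer2' is dropped because the mixture value (9) exceeds the largest source value (8).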
Build a raw dataset from averaged data
Description
Generates a raw (non-averaged) dataset by sampling individual observations from the mean and standard deviation values provided in an averaged input data frame. For each source, it generates 'n' observations for each tracer by sampling from a normal distribution using the provided mean and standard deviation. Mixture data is appended directly without sampling.
Usage
raw_dataset(data)
Arguments
data |
A data frame containing averaged source and mixture data. It is expected to have columns for tracer means (prefixed with "mean_"), standard deviations (prefixed with "sd_"), and a column "n" indicating the number of observations for each source. |
Value
A data frame representing the raw, non-averaged dataset, with each row corresponding to an individual observation.
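The sampling step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the "mean_", "sd_" and "n" column names follow the documented format, while the 'source' identifier column and the values are hypothetical.

```r
set.seed(1)

# Hypothetical averaged source data for a single tracer.
averaged <- data.frame(
  source       = c("S1", "S2"),
  mean_tracer1 = c(10, 20),
  sd_tracer1   = c(1, 2),
  n            = c(4, 6)
)

# For each source, draw 'n' observations from a normal distribution
# with the given mean and standard deviation.
raw <- do.call(rbind, lapply(seq_len(nrow(averaged)), function(i) {
  data.frame(
    source  = averaged$source[i],
    tracer1 = rnorm(averaged$n[i],
                    mean = averaged$mean_tracer1[i],
                    sd   = averaged$sd_tracer1[i])
  )
}))

print(raw)
```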
Select specific tracers from a data frame
Description
This function allows you to select a subset of tracer columns from a data frame. It is designed to work with both isotopic and non-isotopic datasets, and also with both averaged and raw data formats.
Usage
select_tracers(data, tracers)
Arguments
data |
A data frame containing tracer data. |
tracers |
A character vector of tracers to select. |
Value
A data frame containing only the specified tracer columns. The returned columns will be selected based on the data format. For non-isotopic and raw data, it selects the tracer columns (e.g., "tracer1"). For non-isotopic and averaged data, it selects the mean and standard deviation columns (e.g., "mean_tracer1", "sd_tracer1"). For isotopic and raw data, it selects the tracer and its corresponding concentration column (e.g., "tracer1", "cont_tracer1"). For isotopic and averaged data, it selects the mean and standard deviation for both the tracer and its concentration (e.g., "mean_tracer1", "mean_cont_tracer1", "sd_tracer1", "sd_cont_tracer1").
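The column naming rules above can be summarised with a small helper. 'tracer_columns' is a hypothetical function written for illustration only; it is not part of the package and only builds the expected column names for each data format.

```r
# Hypothetical helper: expected column names for the selected tracers.
tracer_columns <- function(tracers, isotopic = FALSE, averaged = FALSE) {
  cols <- tracers
  if (isotopic) {
    # Pair each tracer with its concentration column ("cont_" prefix).
    cols <- c(rbind(tracers, paste0("cont_", tracers)))
  }
  if (averaged) {
    # Averaged data store a mean and a standard deviation per column.
    cols <- c(paste0("mean_", cols), paste0("sd_", cols))
  }
  cols
}

print(tracer_columns("tracer1"))
print(tracer_columns("tracer1", averaged = TRUE))
print(tracer_columns("tracer1", isotopic = TRUE))
print(tracer_columns("tracer1", isotopic = TRUE, averaged = TRUE))
```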
Visualize individual tracer analysis as ternary diagrams
Description
This function creates ternary diagrams to visualize the results of the individual tracer analysis. Each ternary diagram represents the predicted apportionments for a specific tracer.
Usage
ternary_diagram(data, tracers = c(1:2), rows = 1, cols = 2, solution = NA)
Arguments
data |
A data frame containing the results from the individual tracer analysis function. |
tracers |
A vector specifying the indices of the tracers to be displayed. |
rows |
An integer specifying the number of rows in the grid of ternary diagrams. |
cols |
An integer specifying the number of columns in the grid of ternary diagrams. |
solution |
A vector containing an optional reference solution for visual comparison. |
Value
A grid of ternary diagrams, each representing the predicted apportionments for a specific tracer. If there are three sources, the function generates one ternary triangle for each tracer. If there are four sources, the function generates six triangles for each tracer. The six triangles represent the following source combinations at their vertices:
1. (S1, S2, S3+S4)
2. (S2, S3, S1+S4)
3. (S3, S4, S1+S2)
4. (S4, S1, S2+S3)
5. (S1, S3, S2+S4)
6. (S2, S4, S1+S3)
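For four sources, the six source combinations listed above amount to keeping two sources as vertices and merging the other two into the third vertex. The following base R sketch illustrates this with a hypothetical apportionment 'w'; it is not the plotting code used by the package.

```r
# Hypothetical four-source apportionment (sums to 1).
w <- c(S1 = 0.1, S2 = 0.2, S3 = 0.3, S4 = 0.4)

# Pairs of sources kept as individual vertices, in the documented order.
pairs <- list(c(1, 2), c(2, 3), c(3, 4), c(4, 1), c(1, 3), c(2, 4))

# Third vertex: sum of the two remaining sources.
triangles <- t(sapply(pairs, function(p) {
  c(w[[p[1]]], w[[p[2]]], sum(w[-p]))
}))
colnames(triangles) <- c("vertex1", "vertex2", "vertex3_merged")

print(triangles)
```

Each row still sums to 1, so every row is a valid point in a ternary diagram.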
Unmix sediment mixtures
Description
This function assesses the relative contribution of potential sediment sources to each sediment mixture in a dataset using a mass balance approach. It supports both unconstrained and constrained optimization, allowing for different methods of handling source variability.
Usage
unmix(
data,
iter = 1000L,
variability = "SEM",
lvp = TRUE,
constrained = FALSE,
resolution = NA,
seed = 123456L
)
Arguments
data |
Data frame containing sediment source and mixture data. Users should ensure their data is in a valid format by using the check_database() function before running the unmixing process. |
iter |
The number of iterations for the variability analysis. Increase 'iter' to improve the reliability and accuracy of the results. A sufficient number of iterations is reached when the output no longer changes significantly with further increases. |
variability |
A character string specifying the type of variability to calculate. Possible values are "SD" for Standard Deviation or "SEM" for Standard Error of the Mean. |
lvp |
A logical value to switch between classical variability analysis (lvp = FALSE) and Linear Variability Propagation (lvp = TRUE). LVP is a more accurate method for calculating uncertainty in unmixing models under high variability and extreme source apportionments. |
constrained |
A logical value indicating whether the optimization should be constrained to physical solutions. If constrained = TRUE, the optimization will be restricted to solutions where all source contributions are within the range of 0 to 1. If constrained = FALSE, the optimization is unconstrained. |
resolution |
An integer specifying the number of samples used in each hypercube dimension for constrained optimization. This parameter is only used when constrained = TRUE and is required to perform the analysis. |
seed |
An integer value used to initialize the random number generator. Setting a seed ensures that the sequence of random numbers generated during the unmixing is reproducible. This is useful for debugging, testing, and comparing results across different runs. If no seed is provided, a random seed will be generated. |
Value
A data frame containing the relative contributions of the sediment sources to each sediment mixture, across all iterations. The second and third rows of the result correspond to the solution for the central or mean value of the sources. The output includes an ID column to identify each mixture, a GOF (Goodness of Fit) column, and columns for each source showing their calculated contributions.
References
Latorre, B., Lizaga, I., Gaspar, L., & Navas, A. (2025). Evaluating the Impact of High Source Variability and Extreme Contributing Sources on Sediment Fingerprinting Models. *Water Resources Management*, *1-15*. https://doi.org/10.1007/s11269-025-04169-8
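The mass balance behind an unconstrained solution can be sketched in base R as a least-squares problem: each tracer contributes one equation (the weighted sum of source values equals the mixture value), plus one equation forcing the weights to sum to 1. This is an illustration under stated assumptions, not the algorithm implemented in unmix: the data are hypothetical, source variability is ignored, and scaling each tracer by its range across sources is a choice made here for the example.

```r
# Hypothetical source means: rows are sources, columns are tracers.
sources <- matrix(
  c(10, 200, 1.5,
    20, 100, 3.0,
    40,  50, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), c("tracer1", "tracer2", "tracer3"))
)
# Hypothetical mixture, built here from the weights 0.5, 0.3 and 0.2.
mixture <- c(tracer1 = 19, tracer2 = 140, tracer3 = 1.75)

# Scale each tracer by its range across sources so no tracer dominates.
rng <- apply(sources, 2, function(x) diff(range(x)))

# One row per tracer plus a final row for the sum-to-one condition.
A <- rbind(t(sources) / rng, 1)
b <- c(mixture / rng, 1)

# Least-squares solution of the overdetermined system (unconstrained).
w <- qr.solve(A, b)
print(round(w, 4))
```

Because the solution is unconstrained, real data can yield weights below 0 or above 1; such values are not excluded in this sketch.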
Create a virtual sediment mixture
Description
This function generates a virtual sediment mixture based on the characteristics of existing sediment sources and a set of user-defined apportionment weights. It effectively simulates a mixture with known source contributions.
Usage
virtual_mixture(data, weights)
Arguments
data |
A data frame containing the characteristics of the sediment sources. Users should ensure their data is in a valid format by using the 'check_database()' function before running this function. |
weights |
A numeric vector representing the proportional contributions (apportionment values) of each source to the virtual mixture. The order of weights in the vector must correspond to the order of sources in 'data'. The sum of 'weights' should ideally equal 1. |
Details
A virtual mixture is a hypothetical sediment sample created by mathematically combining the tracer characteristics of known sources according to specified proportions ('weights'). This is a powerful tool in sediment fingerprinting for:
**Consistency Checks**: Comparing observed mixture data against a virtual mixture can help assess the consistency of a dataset or the validity of an unmixing solution.
**Scenario Testing**: Simulating mixtures under different hypothetical source contributions to understand how changes might affect sediment composition.
**Model Validation**: Generating known virtual mixtures to test the accuracy and performance of unmixing models.
The function calculates the tracer values for the virtual mixture by taking the weighted average of the corresponding tracer values from each source.
Value
A data frame representing the virtual mixture. This data frame will have the same structure as a single row for a mixture in your input 'data', but with tracer values calculated based on the provided 'weights'.
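The weighted average described above can be sketched in base R. The source values and weights are hypothetical, and the matrix layout is a simplification for illustration rather than the data format expected by virtual_mixture.

```r
# Hypothetical source means: rows are sources, columns are tracers.
sources <- matrix(
  c(10, 200, 1.5,
    20, 100, 3.0,
    40,  50, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), c("tracer1", "tracer2", "tracer3"))
)

# Apportionment weights, in the same order as the sources.
weights <- c(0.5, 0.3, 0.2)
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Each tracer of the virtual mixture is the weighted average of the sources.
virtual <- drop(weights %*% sources)
print(virtual)
```

For example, tracer1 of the virtual mixture is 0.5 * 10 + 0.3 * 20 + 0.2 * 40 = 19.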
Save the results
Description
The function saves the results in the workspace file, both for all the sediment mixture samples together and for each sediment mixture sample separately.
Usage
write_results(data)
Arguments
data |
Data frame containing the relative contribution of the potential sediment sources for each sediment mixture in the dataset |