Provides tools for detecting XOR-like patterns in variable pairs. Includes visualizations for pattern exploration.
Traditional feature selection methods often miss complex non-linear relationships where variables interact to produce class differences. The `detectXOR` package specifically targets XOR patterns - relationships where class discrimination only emerges through variable interactions, not individual variables alone.
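To make the idea concrete, here is a minimal base R sketch on a small synthetic grid (illustrative data, not the package's `XOR_data`). It shows that neither variable separates the classes on its own, while their interaction does. All object names are assumptions made for this example.

```r
# Synthetic XOR layout: class 1 when x and y have opposite signs
grid <- expand.grid(x = c(-2, -1, 1, 2), y = c(-2, -1, 1, 2))
grid$class <- as.integer(grid$x * grid$y < 0)

# Per-class means of each single variable are identical
mean_x_by_class <- tapply(grid$x, grid$class, mean)
mean_y_by_class <- tapply(grid$y, grid$class, mean)

# Accuracy of a rule based on x alone versus a rule based on the interaction
acc_x <- mean((grid$x > 0) == (grid$class == 1))
acc_interaction <- mean((grid$x * grid$y < 0) == (grid$class == 1))

print(mean_x_by_class)
print(c(x_only = acc_x, interaction = acc_interaction))
```

The single-variable rule performs at chance level on this grid, whereas the sign of the product recovers the class exactly, which is the kind of structure a univariate filter would overlook.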
- XOR pattern detection - Statistical identification using χ² and Wilcoxon tests
- Correlation analysis - Class-wise Kendall τ coefficients
- Visualization - Spaghetti plots and decision boundary visualizations
- Parallel processing - Multi-core acceleration for large datasets
- Robust statistics - Winsorization and scaling options for outlier handling
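As an illustration of why class-wise Kendall τ coefficients are informative, the following base R sketch uses synthetic data (an assumption for this example, not package data) in which the two classes follow opposite diagonals. It relies only on `stats::cor()` and is not the package's internal code.

```r
# Synthetic data: class 0 lies on the rising diagonal,
# class 1 on the falling diagonal
d <- data.frame(
  x = c(1:5, 1:5),
  y = c(1:5, 5:1),
  class = rep(c(0, 1), each = 5)
)

# Kendall tau computed separately within each class
tau_by_class <- sapply(split(d, d$class), function(g) {
  cor(g$x, g$y, method = "kendall")
})

# Kendall tau computed on the pooled data, ignoring class
tau_pooled <- cor(d$x, d$y, method = "kendall")

print(round(tau_by_class, 3))
print(round(tau_pooled, 3))
```

Within each class the correlation is strong but of opposite sign, while the pooled coefficient cancels out, so a pooled analysis would hide the relationship.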
Install the development version from GitHub:
# Install devtools if needed
if (!requireNamespace("devtools", quietly = TRUE)) { install.packages("devtools") }
# Install detectXOR
devtools::install_github("JornLotsch/detectXOR")
The package requires R ≥ 3.5.0 and depends on:

- `dplyr`, `tibble` (data manipulation)
- `ggplot2`, `ggh4x`, `scales` (visualization)
- `future`, `future.apply`, `pbmcapply`, `parallel` (parallel processing)
- `reshape2`, `glue` (data processing and string manipulation)
- `DescTools` (statistical tools)
- Base R packages: `stats`, `utils`, `methods`, `grDevices`

Optional packages (suggested):

- `testthat`, `knitr`, `rmarkdown` (development and documentation)
- `doParallel`, `foreach` (additional parallel processing options)
library(detectXOR)
# Load example data
data(XOR_data)
# Detect XOR patterns with default settings
results <- detectXOR(XOR_data, class_col = "class")

# View summary
print(results$results_df)

# Detection with custom thresholds and parallel processing
results <- detect_xor(
  data = XOR_data,
  class_col = "class",
  p_threshold = 0.01,
  tau_threshold = 0.4,
  max_cores = 4,
  extreme_handling = "winsorize",
  scale_data = TRUE
)
### `detectXOR()` - Main detection function

Parameter | Type | Default | Description |
---|---|---|---|
`data` | data.frame | required | Input dataset with variables and class column |
`class_col` | character | `"class"` | Name of the class/target variable column |
`check_tau` | logical | `TRUE` | Compute class-wise Kendall τ correlations |
`compute_axes_parallel_significance` | logical | `TRUE` | Perform group-wise Wilcoxon tests |
`p_threshold` | numeric | `0.05` | Significance threshold for statistical tests |
`tau_threshold` | numeric | `0.3` | Minimum absolute τ for "strong" correlation |
`abs_diff_threshold` | numeric | `20` | Minimum absolute difference for practical significance |
`split_method` | character | `"quantile"` | Tile splitting method: `"quantile"` or `"range"` |
`max_cores` | integer | `NULL` | Maximum cores for parallel processing (auto-detect if `NULL`) |
`extreme_handling` | character | `"winsorize"` | Outlier handling: `"winsorize"`, `"remove"`, or `"none"` |
`winsor_limits` | numeric vector | `c(0.05, 0.95)` | Winsorization percentiles |
`scale_data` | logical | `TRUE` | Standardize variables before analysis |
`use_complete` | logical | `TRUE` | Use only complete cases (remove NA values) |
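The effect of winsorization at the 5th and 95th percentiles followed by standardization can be sketched in base R as follows. The helper `winsorize()` is hypothetical and written only for this example; the package may use a different implementation internally.

```r
# Hypothetical helper: clamp values to the given quantile limits
winsorize <- function(x, limits = c(0.05, 0.95)) {
  bounds <- quantile(x, probs = limits, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Synthetic vector with one extreme value
x <- c(1:99, 1000)

# Clamp extremes, then standardize to mean 0 and standard deviation 1
x_wins <- winsorize(x)
x_scaled <- as.numeric(scale(x_wins))

print(range(x))
print(range(x_wins))
```

Clamping instead of removing keeps the sample size unchanged, and the extreme value no longer dominates the mean and standard deviation used for scaling.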
The `detectXOR()` function returns a list with two components:

### `results_df` - Summary data frame

Column | Description |
---|---|
`var1`, `var2` | Variable pair names |
`xor_shape_detected` | Logical: XOR pattern identified |
`chi_sq_p_value` | χ² test p-value for tile independence |
`tau_class_0`, `tau_class_1` | Class-wise Kendall τ coefficients |
`tau_difference` | Absolute difference between class τ values |
`wilcox_p_x`, `wilcox_p_y` | Wilcoxon test p-values for each axis |
`significant_wilcox` | Logical: significant group differences detected |
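A χ² test for tile independence can be pictured with the base R sketch below. It assumes a simple two-by-two tiling at the median of each variable on synthetic data; the package's actual tiling and test construction are not reproduced here, and all names are illustrative.

```r
# Synthetic XOR layout with enough points per tile for the chi-squared test
vals <- c(-4:-1, 1:4)
d <- expand.grid(x = vals, y = vals)
d$class <- as.integer(d$x * d$y < 0)

# Split each variable into two tiles at its median (a quantile-based split)
d$x_tile <- cut(d$x, breaks = c(-Inf, median(d$x), Inf), labels = c("low", "high"))
d$y_tile <- cut(d$y, breaks = c(-Inf, median(d$y), Inf), labels = c("low", "high"))
d$tile <- interaction(d$x_tile, d$y_tile)

# Cross-tabulate tile membership against class and test for independence
counts <- table(d$tile, d$class)
test <- chisq.test(counts)

print(counts)
print(test$p.value)
```

In this layout each tile contains a single class, so class membership depends strongly on the tile and the test rejects independence.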
### `pair_list` - Detailed results

Contains comprehensive analysis for each variable pair including:

- Tile pattern analysis results
- Statistical test outputs
- Processed data subsets
- Intermediate calculations
Function | Description | Key Parameters |
---|---|---|
`generate_spaghetti_plot_from_results()` | Creates connected line plots showing variable trajectories for XOR-detected pairs | `results`, `data`, `class_col`, `scale_data = TRUE` |
`generate_xy_plot_from_results()` | Generates scatter plots with decision boundary lines for detected XOR patterns | `results`, `data`, `class_col`, `scale_data = TRUE`, `quantile_lines = c(1/3, 2/3)`, `line_method = "quantile"` |
Both functions return ggplot objects that can be displayed or saved manually.
# Generate plots
generate_spaghetti_plot_from_results(results, XOR_data)
generate_xy_plot_from_results(results, XOR_data)
Function | Description | Key Parameters |
---|---|---|
`generate_xor_reportConsole()` | Creates console-friendly formatted report with optional plots | `results`, `data`, `class_col`, `scale_data = TRUE`, `show_plots = TRUE` |
`generate_xor_reportHTML()` | Generates comprehensive HTML report with interactive elements | `results`, `data`, `class_col`, `output_file`, `open_browser = TRUE` |
# Generate formatted report
generate_xor_reportHTML(results, XOR_data, class_col = "class")
By default (`open_browser = TRUE`), the report is automatically opened in the system's default web browser.
- `future::multisession` for parallel processing
- `pbmcapply::pbmclapply` with fork-based parallelism
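Because fork-based parallelism is not available on Windows, a common pattern is to choose the backend by platform. The sketch below illustrates that pattern with the base `parallel` package only, swapping in `mclapply()` and `parLapply()` for the two backends named above; it is an assumption about the general approach, not the package's own dispatch code.

```r
library(parallel)

# Toy task standing in for a per-pair analysis
square <- function(i) i^2
n_cores <- 2L

if (.Platform$OS.type == "windows") {
  # Session-based workers: separate R processes started as a cluster
  cl <- makeCluster(n_cores)
  out <- parLapply(cl, 1:8, square)
  stopCluster(cl)
} else {
  # Fork-based workers: child processes share the parent's memory
  out <- mclapply(1:8, square, mc.cores = n_cores)
}

print(unlist(out))
```

Both branches return results in input order, so downstream code does not need to know which backend was used.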
detectXOR/
├── R/         # Package source code
├── man/       # Package documentation
├── data/      # Example dataset
├── issues/    # Problem reporting
└── analyses   # Files used to generate or plot publication data sets (not in library)
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests on GitHub.

## License

GPL-3

## Citation
For citation details or to request a formal publication reference, please contact the maintainer.