Rule Extraction Methods

SolePostHoc.RuleExtraction.intrees - Method
intrees(model::Union{AbstractModel,DecisionForest}, X, y::AbstractVector{<:Label}; kwargs...)::DecisionList

Return a decision list that approximates the behavior of the input model on the specified supervised dataset. The set of relevant and non-redundant rules in the decision list is obtained by means of rule selection, rule pruning, and sequential covering (STEL).

References

  • Deng, Houtao. "Interpreting tree ensembles with intrees." International Journal of Data Science and Analytics 7.4 (2019): 277-287.

Keyword Arguments

  • prune_rules::Bool=true: whether to prune the extracted rules
  • pruning_s::Union{Float64,Nothing}=nothing: parameter that limits the denominator in the pruning metric calculation
  • pruning_decay_threshold::Union{Float64,Nothing}=nothing: threshold used during pruning to decide whether to remove a conjunct from a rule
  • rule_selection_method::Symbol=:CBC: rule selection method. Currently only :CBC is supported
  • rule_complexity_metric::Symbol=:natoms: metric used to estimate rule complexity
  • max_rules::Int=-1: maximum number of rules in the final decision list (excluding default rule). Use -1 for unlimited rules.
  • min_coverage::Union{Float64,Nothing}=nothing: minimum rule coverage for STEL
  • See modalextractrules keyword arguments...

Although the method was originally presented for forests, it is extended here to work with any symbolic model.
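The sequential covering step can be pictured with a small, self-contained sketch. It is not the package's STEL implementation: the rule representation (named tuples with hypothetical name, predicate, and label fields), the helper names, and the tie-breaking are all assumptions made for illustration. At each round, the rule with the lowest error on the still-uncovered instances is appended to the list and the instances it covers are removed.

```julia
# Majority label of a collection; ties go to the label seen first.
function majority(labels)
    best, best_n = first(labels), 0
    for l in unique(labels)
        n = count(==(l), labels)
        if n > best_n
            best, best_n = l, n
        end
    end
    return best
end

# Toy sequential covering over hypothetical rules (name, predicate, label).
function sequential_covering(rules, X, y; min_coverage = 0.0, max_rules = -1)
    remaining = collect(eachindex(y))   # instances not yet covered
    candidates = collect(rules)
    selected = empty(candidates)
    while !isempty(remaining) && !isempty(candidates) &&
          (max_rules < 0 || length(selected) < max_rules)
        best, best_err, best_cov = 0, Inf, Int[]
        for (i, r) in enumerate(candidates)
            covered = [j for j in remaining if r.predicate(X[j])]
            # Skip rules that cover too little of what is left.
            length(covered) / length(remaining) > min_coverage || continue
            err = count(j -> y[j] != r.label, covered) / length(covered)
            if err < best_err
                best, best_err, best_cov = i, err, covered
            end
        end
        best == 0 && break              # no candidate covers enough instances
        push!(selected, candidates[best])
        deleteat!(candidates, best)
        setdiff!(remaining, best_cov)   # drop the newly covered instances
    end
    # Default rule: majority label of what is left (or of all data if none).
    default_label = majority(isempty(remaining) ? y : y[remaining])
    return selected, default_label
end

# Toy one-dimensional dataset and three candidate rules.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = ["a", "a", "a", "b", "b", "b"]
rules = [
    (name = "x < 3.5", predicate = x -> x < 3.5, label = "a"),
    (name = "x > 2.5", predicate = x -> x > 2.5, label = "b"),
    (name = "x < 1.5", predicate = x -> x < 1.5, label = "a"),
]
selected, default_label = sequential_covering(rules, X, y)
println([r.name for r in selected], " default: ", default_label)
```

The third rule is never selected because the first one already covers its only instance, which is the redundancy-removal effect that sequential covering is meant to provide.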

See also AbstractModel, DecisionList, listrules, rulemetrics.


Lumen

SolePostHoc.RuleExtraction.Lumen.lumen - Method
lumen(model; config_args...) -> LumenResult
lumen(model, config::LumenConfig) -> LumenResult

Logic-driven Unified Minimal Extractor of Notions (LUMEN): Extract and minimize logical rules from decision tree models into interpretable DNF formulas.

LUMEN implements a comprehensive pipeline for converting decision tree models into interpretable logical rules. The algorithm extracts the underlying decision logic, constructs truth tables, and applies advanced minimization techniques to produce compact, human-readable rule sets.

Method Signatures

Keyword Arguments Interface

lumen(model; minimization_scheme=:mitespresso, vertical=1.0, horizontal=1.0, ...)

Convenient interface using keyword arguments with automatic config construction.

Configuration Object Interface

lumen(model, config::LumenConfig)

Advanced interface using pre-constructed configuration for complex scenarios.

Arguments

Required Arguments

  • model: Decision tree model to analyze
    • Single trees: Individual decision tree models
    • Ensembles: Random forests, gradient boosting, etc.
    • Supported formats: DecisionTree.jl, SoleModels framework

Configuration (via keywords or LumenConfig)

  • minimization_scheme::Symbol = :mitespresso: DNF minimization algorithm
  • vertical::Float64 = 1.0: Instance coverage parameter α ∈ (0,1]
  • horizontal::Float64 = 1.0: Feature coverage parameter β ∈ (0,1]
  • ott_mode::Bool = false: Enable memory-optimized processing
  • silent::Bool = false: Suppress progress output
  • return_info::Bool = true: Include detailed metadata in results

Returns

LumenResult containing:

  • decision_set: Collection of minimized logical rules
  • info: Metadata including statistics and unminimized rules
  • processing_time: Total algorithm execution time

Algorithm Pipeline

Phase 1: Model Analysis and Rule Extraction

Input Model → Rule Extraction → Logical Rule Set
  • Analyzes model structure (single tree vs ensemble)
  • Extracts decision paths as logical rules
  • Handles different model types with appropriate strategies

Phase 2: Alphabet Construction and Atom Processing

Logical Rules → Atom Extraction → Logical Alphabet
  • Identifies atomic logical conditions
  • Constructs vocabulary for formula building
  • Validates feature support and operator compatibility

Phase 3: Truth Table Generation

Model + Alphabet → Truth Combinations → Labeled Examples
  • Generates systematic input combinations
  • Evaluates model on each combination
  • Creates correspondence between inputs and outputs

Phase 4: DNF Construction and Minimization

Truth Table → DNF Formulas → Minimized Rules
  • Constructs DNF formulas for each decision class
  • Applies advanced minimization algorithms
  • Converts back to interpretable rule format
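Phases 3 and 4 can be illustrated with a toy sketch. It makes several simplifying assumptions that are not taken from the LUMEN source: the atom names are invented, the "model" is a plain Boolean function over atom truth values rather than a tree evaluated on feature values, and no minimization is performed. It only shows how enumerating truth combinations yields one (unminimized) DNF per class.

```julia
# Hypothetical alphabet: two threshold atoms.
atom_names = ["petal_length < 2.5", "petal_width < 1.8"]

# Toy stand-in for a model: maps atom truth values to a class label.
toy_model(v) = v[1] ? "setosa" : (v[2] ? "versicolor" : "virginica")

# Phase 3: enumerate all 2^k truth combinations and label each one.
function truth_table(model, k)
    combos = Iterators.product(ntuple(_ -> (false, true), k)...)
    return [(collect(bits), model(collect(bits))) for bits in vec(collect(combos))]
end

# Phase 4 (construction only): one conjunction per row, grouped by class.
function dnf_by_class(table, names)
    dnf = Dict{String,Vector{String}}()
    for (v, label) in table
        literals = (b ? n : "¬(" * n * ")" for (b, n) in zip(v, names))
        push!(get!(dnf, string(label), String[]), join(literals, " ∧ "))
    end
    return dnf
end

table = truth_table(toy_model, length(atom_names))
dnf = dnf_by_class(table, atom_names)
for (label, terms) in dnf
    println(label, " ⇐ ", join(terms, "  ∨  "))
end
```

In this toy case the class "setosa" gets two terms that differ only in the second atom, so a minimizer could collapse them into the single literal petal_length < 2.5. That collapse is the kind of reduction the minimization step is there to find.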

Performance Characteristics

Computational Complexity

  • Time: O(2^k × n × d) where k=features, n=instances, d=tree depth
  • Space: O(k × r) where r=number of rules
  • Scalability: Optimized modes available for large datasets
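The exponential factor dominates quickly. As a rough illustration (assuming the exponent counts the Boolean inputs of an exhaustive truth table, which is an interpretation rather than something stated above), the number of rows doubles with every additional input:

```julia
# Number of rows in an exhaustive truth table over k Boolean inputs.
truth_table_rows(k) = 2^k

for k in (10, 20, 30)
    println("k = ", k, ": ", truth_table_rows(k), " rows")
end
```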

Memory Usage

  • Standard mode: Suitable for typical datasets (< 20 features)
  • Optimized mode: Memory-efficient processing for large problems
  • Streaming capability: Future versions may support streaming processing

Advanced Features

Custom Processing

# Custom alphabet filtering for domain expertise
custom_filter = alphabet -> remove_irrelevant_features(alphabet)
config = LumenConfig(filteralphabetcallback = custom_filter)
result = lumen(model, config)

Performance Tuning

# Memory-optimized processing for large datasets
config = LumenConfig(ott_mode = true, vertical = 0.8)

# Speed-optimized processing with basic minimization
config = LumenConfig(minimization_scheme = :abc, silent = true)

Analysis and Debugging

# Full information retention for analysis
config = LumenConfig(return_info = true, controllo = true)
result = lumen(model, config)

# Access detailed statistics  
println("Rules before minimization: $(length(result.info.unminimized_ds.rules))")
println("Rules after minimization: $(length(result.decision_set.rules))")

Error Handling

The algorithm implements comprehensive error handling:

Configuration Validation

  • Parameter range checking (coverage parameters must be ∈ (0,1])
  • Algorithm availability verification
  • Consistency validation across parameters

Processing Errors

  • Graceful handling of minimization failures
  • Fallback strategies for problematic formulas
  • Detailed error reporting with context

Model Compatibility

  • Automatic detection of supported model types
  • Clear error messages for unsupported formats
  • Suggestions for model preprocessing

Examples

Basic Usage

# Simple rule extraction with default settings
model = build_tree(X, y)
result = lumen(model)
println("Extracted $(length(result.decision_set.rules)) rules")

Advanced Configuration

# Customized processing for complex scenarios
config = LumenConfig(
    minimization_scheme = :boom,        # Aggressive minimization
    vertical = 0.9,                     # High instance coverage  
    horizontal = 0.8,                   # Moderate feature coverage
    ott_mode = true,                    # Memory optimization
    return_info = true                  # Full information retention
)
result = lumen(large_ensemble, config)

Performance Analysis

# Detailed performance and quality analysis
using Statistics  # provides mean

result = lumen(model, LumenConfig(return_info = true))

# Analyze minimization effectiveness
stats = result.info.vectPrePostNumber
total_reduction = sum(pre - post for (pre, post) in stats)
avg_compression = mean(pre / post for (pre, post) in stats)

println("Total term reduction: $total_reduction")
println("Average compression ratio: $(round(avg_compression, digits=2))x")
println("Processing time: $(result.processing_time) seconds")

Implementation Notes

Design Principles

  1. Modularity: Each phase is independently testable and extensible
  2. Configurability: Extensive customization without code modification
  3. Performance: Multiple optimization strategies for different scenarios
  4. Robustness: Comprehensive error handling and validation
  5. Usability: Clean interfaces with sensible defaults

Extensibility Points

  • New minimization algorithms: Add via Val() dispatch system
  • Custom model types: Extend rule extraction strategies
  • Domain-specific processing: Custom alphabet filters and apply functions
  • Output formats: Additional result formatters and exporters
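The Val() dispatch mechanism mentioned in the first extensibility point can be sketched in plain Julia. The function name minimize_terms and the :dedup scheme below are hypothetical and do not come from the package. The sketch only shows the general pattern, in which a Symbol is lifted to a type so that each scheme becomes its own method.

```julia
# Hypothetical entry point: lift the Symbol to a type and dispatch on it.
minimize_terms(terms, scheme::Symbol) = minimize_terms(terms, Val(scheme))

# Fallback for schemes that have no dedicated method.
minimize_terms(terms, ::Val{S}) where {S} =
    throw(ArgumentError("unsupported minimization scheme: $S"))

# Adding a new "algorithm" means adding one method. This toy one only
# removes duplicate terms.
minimize_terms(terms, ::Val{:dedup}) = unique(terms)

println(minimize_terms(["a ∧ b", "c", "a ∧ b"], :dedup))
```

With this pattern, supporting another scheme does not require editing a central if/else chain. An unknown Symbol falls through to the generic method and produces an error that names it.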

See also: LumenConfig, LumenResult, extract_rules, minimize_formula

SolePostHoc.RuleExtraction.Lumen.LumenConfig - Type
LumenConfig

Configuration parameters for the Logic-driven Unified Minimal Extractor of Notions (LUMEN) algorithm.

This struct encapsulates all configuration options for the LUMEN algorithm, providing a clean interface with automatic validation and sensible defaults. It uses Julia's @kwdef macro to enable keyword-based construction with default values.

Fields

Core Algorithm Parameters

  • minimization_scheme::Symbol = :mitespresso: The DNF minimization algorithm to use
    • :mitespresso: Advanced minimization with a good balance of speed and quality
    • :boom: Minimization with the BOOM minimizer
    • :abc: Minimization with the Berkeley ABC framework

Coverage Parameters

  • vertical::Float64 = 1.0: Vertical coverage parameter (α) ∈ (0.0, 1.0]. Controls how many instances must be covered by the extracted rules
  • horizontal::Float64 = 1.0: Horizontal coverage parameter (β) ∈ (0.0, 1.0]. Controls the breadth of rule coverage across the feature space (% of different thresholds)

Processing Modes

  • ott_mode::Bool = false: Optimized truth table processing. When true, uses memory-efficient and time-efficient algorithms for large datasets
  • controllo::Bool = false: Enable validation mode. Compares results between different processing methods for correctness verification

Customization Options

  • minimization_kwargs::NamedTuple = (;): Additional parameters for minimization algorithms
  • filteralphabetcallback = identity: Custom function to filter/modify the logical alphabet
  • apply_function = nothing: Custom function for model application. If nothing, it is determined automatically from the model type (with SoleModels)

Output Control

  • silent::Bool = false: Suppress progress and diagnostic output
  • return_info::Bool = true: Include additional metadata in results
  • vetImportance::Vector = []: Vector for tracking feature importance values

Testing and Debugging

  • testott = nothing: Special testing mode for optimization validation
  • alphabetcontroll = nothing: Special mode for alphabet analysis only

Constructor Validation

The constructor automatically validates parameters and throws descriptive errors:

  • Coverage parameters must be in range (0.0, 1.0]
  • Minimization scheme must be supported
  • Inconsistent parameter combinations are caught early
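The combination of @kwdef defaults and constructor-time validation can be sketched as follows. This is not the actual LumenConfig definition: SketchConfig is a hypothetical, reduced struct with only three of the documented fields, and the error messages are invented. The checks mirror the first two rules listed above.

```julia
Base.@kwdef struct SketchConfig
    minimization_scheme::Symbol = :mitespresso
    vertical::Float64 = 1.0
    horizontal::Float64 = 1.0

    # Inner constructor with one positional argument per field, as required
    # for the keyword constructor generated by @kwdef to reach it.
    function SketchConfig(minimization_scheme, vertical, horizontal)
        minimization_scheme in (:mitespresso, :boom, :abc) ||
            throw(ArgumentError("unsupported minimization scheme: $minimization_scheme"))
        for (name, value) in (("vertical", vertical), ("horizontal", horizontal))
            0.0 < value <= 1.0 ||
                throw(ArgumentError("$name must be in (0.0, 1.0], got $value"))
        end
        return new(minimization_scheme, vertical, horizontal)
    end
end

cfg = SketchConfig(vertical = 0.8)
println(cfg)
```

Because validation lives in the inner constructor, an invalid configuration cannot be built at all, so downstream code does not have to re-check ranges.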

Examples

# Basic usage with defaults
config = LumenConfig()

# Customized configuration
config = LumenConfig(
    minimization_scheme = :abc,
    vertical = 0.8,
    horizontal = 0.9,
    silent = true
)

# Advanced configuration with custom processing
config = LumenConfig(
    ott_mode = true,
    minimization_kwargs = (max_iterations = 1000,),
    filteralphabetcallback = my_custom_filter
)

See also: lumen, LumenResult, validate_config

SolePostHoc.RuleExtraction.Lumen.LumenResult - Type
LumenResult

Comprehensive result structure containing extracted logical rules and associated metadata.

This immutable struct encapsulates all outputs from the LUMEN algorithm, providing a clean and extensible interface for accessing results. The design follows the principle of returning rich, self-documenting results rather than simple tuples.

Fields

  • decision_set::DecisionSet: The primary output, a collection of minimized logical rules. Each rule consists of a logical formula (antecedent) and a decision outcome (consequent)
  • info::NamedTuple: Extensible metadata container with algorithm-specific information. Common fields include:
    • vectPrePostNumber: Vector of (pre, post) minimization term counts
    • unminimized_ds: Original decision set before minimization (if requested)
    • processing_time: Total algorithm execution time
    • feature_importance: Feature ranking information (if available)
  • processing_time::Float64: Total processing time in seconds, measured from algorithm start to completion. Useful for performance analysis

Constructors

Two constructors are provided for different use cases:

# Full constructor - for complete results with metadata
LumenResult(decision_set, info_tuple, processing_time)

# Minimal constructor - when only rules are available  
LumenResult(decision_set)  # info=empty, processing_time=0.0
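A minimal sketch of how such a pair of constructors can be laid out is shown below. SketchResult is a hypothetical stand-in, not the package's LumenResult. The decision set is left as a type parameter so the example runs without SoleModels, and the metadata values are invented.

```julia
# Hypothetical, reduced analogue of a result struct with three fields.
struct SketchResult{D}
    decision_set::D
    info::NamedTuple
    processing_time::Float64
end

# Minimal constructor: empty metadata and zero processing time.
SketchResult(decision_set) = SketchResult(decision_set, (;), 0.0)

full = SketchResult(["rule 1", "rule 2"], (vectPrePostNumber = [(4, 2)],), 0.5)
minimal = SketchResult(["rule 1"])

# Metadata fields are optional, so callers can probe with haskey first.
println(haskey(full.info, :vectPrePostNumber), " ", haskey(minimal.info, :vectPrePostNumber))
```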

Design Rationale

This structured approach provides several advantages over returning raw tuples:

  1. Self-documentation: Field names clearly indicate content
  2. Type safety: Field types are declared, so a value of the wrong type is rejected when the result is constructed
  3. Extensibility: Easy to add new fields without breaking existing code
  4. IDE support: Autocompletion and inline documentation
  5. Backward compatibility: Old code can still access fields by name

Examples

# Basic usage
result = lumen(model, config)
rules = result.decision_set
println("Extracted $(length(rules.rules)) rules in $(result.processing_time)s")

# Accessing metadata
if haskey(result.info, :vectPrePostNumber)
    stats = result.info.vectPrePostNumber
    total_reduction = sum(pre - post for (pre, post) in stats)
    println("Reduced formula complexity by $total_reduction terms")
end

# Comparing minimized vs original rules
if haskey(result.info, :unminimized_ds)
    original_rules = result.info.unminimized_ds
    println("Minimization: $(length(original_rules.rules)) → $(length(result.decision_set.rules))")
end

See also: lumen, LumenConfig, DecisionSet


REFNE

SolePostHoc.RuleExtraction.REFNE.refne - Method
refne(m, Xmin, Xmax; L=100, perc=1.0, max_depth=-1, n_subfeatures=-1, 
      partial_sampling=0.7, min_samples_leaf=5, min_samples_split=2, 
      min_purity_increase=0.0, seed=3)

Extract interpretable rules from a trained neural network ensemble using decision tree approximation.

This implementation follows the REFNE-a (Rule Extraction From Neural Network Ensemble) algorithm, which approximates complex neural network behavior with an interpretable decision tree model.

Arguments

  • m: Trained neural network model to extract rules from
  • Xmin: Minimum values for each input feature
  • Xmax: Maximum values for each input feature
  • L: Number of samples to generate in the synthetic dataset (default: 100)
  • perc: Percentage of generated samples to use (default: 1.0)
  • max_depth: Maximum depth of the decision tree (default: -1, unlimited)
  • n_subfeatures: Number of features to consider at each split (default: -1, all)
  • partial_sampling: Fraction of samples used for each tree (default: 0.7)
  • min_samples_leaf: Minimum number of samples required at a leaf node (default: 5)
  • min_samples_split: Minimum number of samples required to split a node (default: 2)
  • min_purity_increase: Minimum purity increase required for a split (default: 0.0)
  • seed: Random seed for reproducibility (default: 3)

Returns

  • A forest of decision trees representing the extracted rules

Description

The algorithm works by:

  1. Generating a synthetic dataset spanning the input space
  2. Using the neural network to label these samples
  3. Training a decision tree to approximate the neural network's behavior
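Steps 1 and 2 can be sketched without any external package. The sketch below assumes uniform random sampling inside the box defined by Xmin and Xmax, which may differ from the sampling scheme that refne actually uses. The function synthetic_dataset and the toy_network stand-in are hypothetical names. Step 3 (fitting the tree) is omitted because it depends on a decision tree library.

```julia
using Random

# Step 1: draw L points uniformly inside [Xmin, Xmax) per feature.
# Step 2: label every point with the black-box model.
function synthetic_dataset(blackbox, Xmin, Xmax; L = 100, seed = 3)
    length(Xmin) == length(Xmax) ||
        throw(DimensionMismatch("Xmin and Xmax must have the same length"))
    rng = MersenneTwister(seed)          # fixed seed for reproducibility
    d = length(Xmin)
    X = Matrix{Float64}(undef, L, d)
    for j in 1:d
        X[:, j] .= Xmin[j] .+ rand(rng, L) .* (Xmax[j] - Xmin[j])
    end
    y = [blackbox(view(X, i, :)) for i in 1:L]
    return X, y
end

# Toy stand-in for a trained network: any callable on a feature vector.
toy_network(x) = x[1] + x[2] > 1.0 ? "pos" : "neg"

X, y = synthetic_dataset(toy_network, [0.0, 0.0], [1.0, 2.0]; L = 50)
println(size(X), " ", length(y))
```

The resulting (X, y) pair is an ordinary supervised dataset, so any tree learner can be trained on it to obtain the interpretable approximation.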

References

  • Zhou, Zhi-Hua, et al. "Extracting Symbolic Rules from Trained Neural Network Ensembles."

Example

```julia
model = loaddecisiontree_model()
refne(model, Xmin, Xmax)
```

See also AbstractModel, DecisionList, listrules, rulemetrics.


TREPAN

BATrees

SolePostHoc.RuleExtraction.BATrees.batrees - Function
batrees(f; dataset_name="iris", num_trees=10, max_depth=10, dsOutput=true)

Builds and trains a set of binary decision trees using the specified model f.

Arguments

  • f: A SoleForest.
  • dataset_name::String: The name of the dataset to be used. Default is "iris".
  • num_trees::Int: The number of trees to be built. Default is 10.
  • max_depth::Int: The maximum depth of each tree. Default is 10.
  • dsOutput::Bool: A flag indicating whether to return the dsStruct output. Default is true. If false, the result is returned as a single tree.

Returns

  • If dsOutput is true, returns the result as a DecisionSet ds.
  • If dsOutput is false, returns the result as a SoleTree t.

Example


RULECOSIPLUS

SolePostHoc.RuleExtraction.RULECOSIPLUS.rulecosiplus - Method
rulecosiplus(ensemble::Any, X_train::Any, y_train::Any)

Extract interpretable rules from decision tree ensembles using the RuleCOSI+ algorithm.

This function implements the RuleCOSI+ methodology for rule extraction from trained ensemble classifiers, producing a simplified and interpretable rule-based model. The method combines and simplifies rules extracted from individual trees in the ensemble to create a more compact and understandable decision list.

Reference

Obregon, J. (2022). RuleCOSI+: Rule extraction for interpreting classification tree ensembles. Information Fusion, 89, 355-381. Available at: https://www.sciencedirect.com/science/article/pii/S1566253522001129

Arguments

  • ensemble::Any: A trained ensemble classifier (e.g., Random Forest, Gradient Boosting) that will be serialized and converted to a compatible format for rule extraction.
  • X_train::Any: Training feature data. Can be a DataFrame or Matrix. If DataFrame, column names will be preserved in the extracted rules; otherwise, generic names (V1, V2, ...) will be generated.
  • y_train::Any: Training target labels corresponding to X_train. Will be converted to string format for processing.

Returns

  • DecisionList: A simplified decision list containing the extracted and combined rules from the ensemble, suitable for interpretable classification.

Details

The function performs the following steps:

  1. Converts input data to appropriate matrix format
  2. Generates or extracts feature column names
  3. Serializes the Julia ensemble to a Python-compatible format
  4. Builds an sklearn-compatible model using the serialized ensemble
  5. Applies RuleCOSI+ algorithm with the following default parameters:
    • metric="fi": Optimization metric for rule combination
    • n_estimators=100: Number of estimators considered
    • tree_max_depth=100: Maximum depth of trees
    • conf_threshold=0.25 (α): Confidence threshold for rule filtering
    • cov_threshold=0.1 (β): Coverage threshold for rule filtering
    • verbose=2: Detailed output during processing
  6. Extracts and converts rules to a decision list format

Configuration

The algorithm uses fixed parameters optimized for interpretability:

  • Confidence threshold (α) = 0.25: Rules below this confidence are discarded
  • Coverage threshold (β) = 0.1: Rules covering fewer samples are excluded
  • Maximum rules = max(20, n_classes × 5): Adaptive limit based on problem complexity
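The effect of these three fixed parameters can be illustrated with a toy sketch. It does not call the RuleCOSI+ implementation. The rule records (named tuples with hypothetical confidence and coverage fields) and the treatment of values exactly at a threshold (kept here) are assumptions.

```julia
# Adaptive rule limit, as stated in the configuration above.
max_rules(n_classes) = max(20, n_classes * 5)

# Keep only rules that reach both thresholds (boundary handling assumed).
function filter_rules(rules; conf_threshold = 0.25, cov_threshold = 0.1)
    return [r for r in rules
            if r.confidence >= conf_threshold && r.coverage >= cov_threshold]
end

# Invented example rules.
rules = [
    (name = "r1", confidence = 0.90, coverage = 0.40),
    (name = "r2", confidence = 0.20, coverage = 0.50),  # too unreliable
    (name = "r3", confidence = 0.80, coverage = 0.05),  # covers too little
]
kept = filter_rules(rules)
println([r.name for r in kept], " limit: ", max_rules(3))
```

For problems with four classes or fewer the limit stays at 20, and the class-dependent term only takes over from five classes upward.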

Example

# Assuming you have a trained ensemble and training data
ensemble = ... # your trained ensemble
X_train = ... # training features
y_train = ... # training labels

# Extract interpretable rules
decision_list = rulecosiplus(ensemble, X_train, y_train)

Notes

  • The function prints diagnostic information including the number of trees and dataset statistics
  • Raw rules are displayed before conversion to decision list format
  • Requires Python interoperability and the RuleCOSI implementation
  • The resulting decision list provides an interpretable alternative to the original ensemble