Rule Extraction Methods
SolePostHoc.RuleExtraction.intrees — Method
intrees(model::Union{AbstractModel,DecisionForest}, X, y::AbstractVector{<:Label}; kwargs...)::DecisionList
Return a decision list that approximates the behavior of the input model on the specified supervised dataset. The relevant, non-redundant rules in the decision list are obtained by means of rule selection, rule pruning, and sequential covering (STEL).
References
- Deng, Houtao. "Interpreting tree ensembles with inTrees." International Journal of Data Science and Analytics 7.4 (2019): 277-287.
Keyword Arguments
- prune_rules::Bool=true: whether to prune the rules
- pruning_s::Union{Float64,Nothing}=nothing: parameter that limits the denominator in the pruning metric calculation
- pruning_decay_threshold::Union{Float64,Nothing}=nothing: threshold used in pruning to decide whether to remove a conjunct from the rule
- rule_selection_method::Symbol=:CBC: rule selection method. Currently only :CBC is supported.
- rule_complexity_metric::Symbol=:natoms: metric used to estimate rule complexity
- max_rules::Int=-1: maximum number of rules in the final decision list (excluding the default rule). Use -1 for unlimited rules.
- min_coverage::Union{Float64,Nothing}=nothing: minimum rule coverage for STEL
- See modalextractrules keyword arguments...
Although the method was originally presented for forests, it is extended here to work with any symbolic model.
See also AbstractModel, DecisionList, listrules, rulemetrics.
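The sequential covering step mentioned above can be sketched in plain Julia. This is only an illustration under simplifying assumptions: `ToyRule`, `rule_error`, `majority`, and `sequential_covering` are hypothetical names that are not part of SolePostHoc, rules are ranked by error alone (the published STEL procedure also considers frequency and rule length), and the package's own implementation may differ.

```julia
# Hypothetical toy rule: a predicate over one instance plus the class it predicts.
struct ToyRule
    name::String
    covers::Function   # instance -> Bool
    label::String
end

# Error rate of a rule on the instances it covers (Inf when it covers nothing).
function rule_error(rule::ToyRule, X, y)
    covered = [i for i in eachindex(X) if rule.covers(X[i])]
    isempty(covered) && return Inf
    return count(i -> y[i] != rule.label, covered) / length(covered)
end

# Most frequent label in y (ties broken by label order for determinism).
function majority(y)
    counts = Dict{eltype(y),Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    return first(sort(collect(counts); by = p -> (-p.second, p.first))).first
end

# Greedy covering: pick the lowest-error rule, drop what it covers, repeat.
function sequential_covering(rules::Vector{ToyRule}, X, y; max_rules::Int = -1)
    fallback = majority(y)            # default class if every instance gets covered
    X, y = copy(X), copy(y)
    candidates = copy(rules)
    selected = ToyRule[]
    while !isempty(X) && !isempty(candidates) &&
          (max_rules < 0 || length(selected) < max_rules)
        errors = [rule_error(r, X, y) for r in candidates]
        best = argmin(errors)
        isfinite(errors[best]) || break   # no remaining rule covers anything
        rule = candidates[best]
        push!(selected, rule)
        deleteat!(candidates, best)
        keep = [!rule.covers(x) for x in X]
        X, y = X[keep], y[keep]
    end
    default = isempty(y) ? fallback : majority(y)
    return selected, default
end

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = ["a", "a", "a", "b", "b", "c"]
rules = [
    ToyRule("x <= 3 => a", x -> x <= 3, "a"),
    ToyRule("x >= 4 => b", x -> x >= 4, "b"),
    ToyRule("x <= 5 => a", x -> x <= 5, "a"),
]
selected, default = sequential_covering(rules, X, y)
```

The `max_rules` keyword in the sketch mirrors the documented convention that -1 means no limit and that the default rule is not counted.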
SolePostHoc.RuleExtraction.BATreesRuleExtractor — Type
Extract rules from a symbolic model using batrees.
See also modalextractrules, RuleExtractor.
SolePostHoc.RuleExtraction.InTreesRuleExtractor — Type
Extract rules from a symbolic model using intrees.
See also modalextractrules, RuleExtractor.
SolePostHoc.RuleExtraction.LumenRuleExtractor — Type
Extract rules from a symbolic model using lumen.
See also modalextractrules, RuleExtractor.
SolePostHoc.RuleExtraction.REFNERuleExtractor — Type
Extract rules from a symbolic model using SolePostHoc.RuleExtraction.REFNE.
See also modalextractrules, RuleExtractor.
SolePostHoc.RuleExtraction.RULECOSIPLUSRuleExtractor — Type
Extract rules from a symbolic model using SolePostHoc.RuleExtraction.RULECOSIPLUS.
See also modalextractrules, RuleExtractor.
SolePostHoc.RuleExtraction.TREPANRuleExtractor — Type
Extract rules from a symbolic model using SolePostHoc.RuleExtraction.TREPAN.
See also modalextractrules, RuleExtractor.
Lumen
SolePostHoc.RuleExtraction.Lumen.lumen — Method
lumen(model; config_args...) -> LumenResult
lumen(model, config::LumenConfig) -> LumenResult
Logic-driven Unified Minimal Extractor of Notions (LUMEN): Extract and minimize logical rules from decision tree models into interpretable DNF formulas.
LUMEN implements a comprehensive pipeline for converting decision tree models into interpretable logical rules. The algorithm extracts the underlying decision logic, constructs truth tables, and applies advanced minimization techniques to produce compact, human-readable rule sets.
Method Signatures
Keyword Arguments Interface
lumen(model; minimization_scheme=:mitespresso, vertical=1.0, horizontal=1.0, ...)
Convenient interface using keyword arguments with automatic config construction.
Configuration Object Interface
lumen(model, config::LumenConfig)
Advanced interface using pre-constructed configuration for complex scenarios.
Arguments
Required Arguments
- model: Decision tree model to analyze
  - Single trees: Individual decision tree models
  - Ensembles: Random forests, gradient boosting, etc.
  - Supported formats: DecisionTree.jl, SoleModels framework
Configuration (via keywords or LumenConfig)
- minimization_scheme::Symbol = :mitespresso: DNF minimization algorithm
- vertical::Float64 = 1.0: Instance coverage parameter α ∈ (0,1]
- horizontal::Float64 = 1.0: Feature coverage parameter β ∈ (0,1]
- ott_mode::Bool = false: Enable memory-optimized processing
- silent::Bool = false: Suppress progress output
- return_info::Bool = true: Include detailed metadata in results
Returns
LumenResult containing:
- decision_set: Collection of minimized logical rules
- info: Metadata including statistics and unminimized rules
- processing_time: Total algorithm execution time
Algorithm Pipeline
Phase 1: Model Analysis and Rule Extraction
Input Model → Rule Extraction → Logical Rule Set
- Analyzes model structure (single tree vs ensemble)
- Extracts decision paths as logical rules
- Handles different model types with appropriate strategies
Phase 2: Alphabet Construction and Atom Processing
Logical Rules → Atom Extraction → Logical Alphabet
- Identifies atomic logical conditions
- Constructs vocabulary for formula building
- Validates feature support and operator compatibility
Phase 3: Truth Table Generation
Model + Alphabet → Truth Combinations → Labeled Examples
- Generates systematic input combinations
- Evaluates model on each combination
- Creates correspondence between inputs and outputs
Phase 4: DNF Construction and Minimization
Truth Table → DNF Formulas → Minimized Rules
- Constructs DNF formulas for each decision class
- Applies advanced minimization algorithms
- Converts back to interpretable rule format
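Phases 3 and 4 can be illustrated with a small, self-contained sketch. It is not LUMEN's implementation: `toy_model`, `truth_table`, and `dnf_by_class` are hypothetical names, the "model" is a plain function over atom truth values, and the minimization step is left out entirely.

```julia
# Atoms as they might come out of Phase 2 (hypothetical thresholds).
atoms = ["x1 < 2.5", "x2 < 1.0"]

# Toy stand-in for a tree: takes the truth value of each atom, returns a class.
toy_model(v) = v[1] ? "A" : (v[2] ? "B" : "A")

# Phase 3: enumerate all 2^k truth assignments and label each with the model.
function truth_table(model, k::Int)
    rows = Vector{Tuple{Vector{Bool},String}}()
    for i in 0:(2^k - 1)
        assignment = [(i >> (j - 1)) & 1 == 1 for j in 1:k]
        push!(rows, (assignment, model(assignment)))
    end
    return rows
end

# Phase 4 (without minimization): one conjunction per row, one disjunction per class.
function dnf_by_class(rows, atoms)
    terms = Dict{String,Vector{String}}()
    for (assignment, label) in rows
        literals = [value ? atom : "¬($atom)" for (atom, value) in zip(atoms, assignment)]
        push!(get!(terms, label, String[]), join(literals, " ∧ "))
    end
    return Dict(label => join(("(" * t * ")" for t in ts), " ∨ ") for (label, ts) in terms)
end

rows = truth_table(toy_model, length(atoms))
dnf = dnf_by_class(rows, atoms)
```

The table has 2^k rows for k atoms, which is why the number of atoms dominates the cost of this stage and why a minimizer is applied to the resulting formulas.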
Performance Characteristics
Computational Complexity
- Time: O(2^k × n × d) where k=features, n=instances, d=tree depth
- Space: O(k × r) where r=number of rules
- Scalability: Optimized modes available for large datasets
Memory Usage
- Standard mode: Suitable for typical datasets (< 20 features)
- Optimized mode: Memory-efficient processing for large problems
- Streaming capability: Future versions may support streaming processing
Advanced Features
Custom Processing
# Custom alphabet filtering for domain expertise
custom_filter = alphabet -> remove_irrelevant_features(alphabet)
config = LumenConfig(filteralphabetcallback = custom_filter)
result = lumen(model, config)
Performance Tuning
# Memory-optimized processing for large datasets
config = LumenConfig(ott_mode = true, vertical = 0.8)
# Speed-optimized processing with basic minimization
config = LumenConfig(minimization_scheme = :abc, silent = true)
Analysis and Debugging
# Full information retention for analysis
config = LumenConfig(return_info = true, controllo = true)
result = lumen(model, config)
# Access detailed statistics
println("Rules before minimization: $(length(result.info.unminimized_ds.rules))")
println("Rules after minimization: $(length(result.decision_set.rules))")
Error Handling
The algorithm implements comprehensive error handling:
Configuration Validation
- Parameter range checking (coverage parameters must be ∈ (0,1])
- Algorithm availability verification
- Consistency validation across parameters
Processing Errors
- Graceful handling of minimization failures
- Fallback strategies for problematic formulas
- Detailed error reporting with context
Model Compatibility
- Automatic detection of supported model types
- Clear error messages for unsupported formats
- Suggestions for model preprocessing
Examples
Basic Usage
# Simple rule extraction with default settings
model = build_tree(X, y)
result = lumen(model)
println("Extracted $(length(result.decision_set.rules)) rules")
Advanced Configuration
# Customized processing for complex scenarios
config = LumenConfig(
    minimization_scheme = :boom, # Aggressive minimization
    vertical = 0.9,              # High instance coverage
    horizontal = 0.8,            # Moderate feature coverage
    ott_mode = true,             # Memory optimization
    return_info = true           # Full information retention
)
result = lumen(large_ensemble, config)
Performance Analysis
using Statistics  # provides mean
# Detailed performance and quality analysis
result = lumen(model, LumenConfig(return_info = true))
# Analyze minimization effectiveness
stats = result.info.vectPrePostNumber
total_reduction = sum(pre - post for (pre, post) in stats)
avg_compression = mean(pre / post for (pre, post) in stats)
println("Total term reduction: $total_reduction")
println("Average compression ratio: $(round(avg_compression, digits=2))x")
println("Processing time: $(result.processing_time) seconds")
Implementation Notes
Design Principles
- Modularity: Each phase is independently testable and extensible
- Configurability: Extensive customization without code modification
- Performance: Multiple optimization strategies for different scenarios
- Robustness: Comprehensive error handling and validation
- Usability: Clean interfaces with sensible defaults
Extensibility Points
- New minimization algorithms: Add via Val() dispatch system
- Custom model types: Extend rule extraction strategies
- Domain-specific processing: Custom alphabet filters and apply functions
- Output formats: Additional result formatters and exporters
See also: LumenConfig, LumenResult, extract_rules, minimize_formula
SolePostHoc.RuleExtraction.Lumen.LumenConfig — Type
LumenConfig
Configuration parameters for the Logic-driven Unified Minimal Extractor of Notions (LUMEN) algorithm.
This struct encapsulates all configuration options for the LUMEN algorithm, providing a clean interface with automatic validation and sensible defaults. It uses Julia's @kwdef macro to enable keyword-based construction with default values.
Fields
Core Algorithm Parameters
- minimization_scheme::Symbol = :AlgorithmName: The DNF minimization algorithm to use
  - :mitespresso: Advanced minimization with a good balance of speed and quality
  - :boom: Boom minimizer
  - :abc: Minimization with the Berkeley framework
Coverage Parameters
- vertical::Float64 = 1.0: Vertical coverage parameter (α) ∈ (0.0, 1.0]. Controls how many instances must be covered by extracted rules.
- horizontal::Float64 = 1.0: Horizontal coverage parameter (β) ∈ (0.0, 1.0]. Controls the breadth of rule coverage across feature space (% of different thresholds).
Processing Modes
- ott_mode::Bool = false: Optimized truth table processing. When true, uses memory-efficient and time-efficient algorithms for large datasets.
- controllo::Bool = false: Enable validation mode. Compares results between different processing methods for correctness verification.
Customization Options
- minimization_kwargs::NamedTuple = (;): Additional parameters for minimization algorithms
- filteralphabetcallback = identity: Custom function to filter/modify the logical alphabet
- apply_function = nothing: Custom function for model application. If nothing, automatically determined based on model type (with SoleModels).
Output Control
- silent::Bool = false: Suppress progress and diagnostic output
- return_info::Bool = true: Include additional metadata in results
- vetImportance::Vector = []: Vector for tracking feature importance values
Testing and Debugging
- testott = nothing: Special testing mode for optimization validation
- alphabetcontroll = nothing: Special mode for alphabet analysis only
Constructor Validation
The constructor automatically validates parameters and throws descriptive errors:
- Coverage parameters must be in range (0.0, 1.0]
- Minimization scheme must be supported
- Inconsistent parameter combinations are caught early
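One common way to get this kind of constructor-time validation together with keyword defaults is an inner constructor inside a `Base.@kwdef` struct. The sketch below is an assumption about the general pattern, not LumenConfig's actual source: `SketchConfig` is a hypothetical type with only three of the documented fields, and the error messages are made up.

```julia
# Hypothetical, reduced configuration type illustrating validation with @kwdef.
Base.@kwdef struct SketchConfig
    minimization_scheme::Symbol = :mitespresso
    vertical::Float64 = 1.0
    horizontal::Float64 = 1.0

    # Inner constructor with one positional argument per field, as @kwdef expects.
    function SketchConfig(minimization_scheme, vertical, horizontal)
        minimization_scheme in (:mitespresso, :boom, :abc) ||
            throw(ArgumentError("unsupported minimization scheme: $minimization_scheme"))
        0.0 < vertical <= 1.0 ||
            throw(ArgumentError("vertical must be in (0, 1], got $vertical"))
        0.0 < horizontal <= 1.0 ||
            throw(ArgumentError("horizontal must be in (0, 1], got $horizontal"))
        return new(minimization_scheme, vertical, horizontal)
    end
end

# Returns true when constructing with the given keywords raises an ArgumentError.
function rejects(; kwargs...)
    try
        SketchConfig(; kwargs...)
        return false
    catch err
        return err isa ArgumentError
    end
end

ok = SketchConfig(vertical = 0.8, minimization_scheme = :abc)
```

Because the checks live in the inner constructor, an invalid value is caught whether the struct is built with keywords or positionally.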
Examples
# Basic usage with defaults
config = LumenConfig()
# Customized configuration
config = LumenConfig(
    minimization_scheme = :abc,
    vertical = 0.8,
    horizontal = 0.9,
    silent = true
)
# Advanced configuration with custom processing
config = LumenConfig(
    ott_mode = true,
    minimization_kwargs = (max_iterations = 1000,),
    filteralphabetcallback = my_custom_filter
)
See also: lumen, LumenResult, validate_config
SolePostHoc.RuleExtraction.Lumen.LumenResult — Type
LumenResult
Comprehensive result structure containing extracted logical rules and associated metadata.
This immutable struct encapsulates all outputs from the LUMEN algorithm, providing a clean and extensible interface for accessing results. The design follows the principle of returning rich, self-documenting results rather than simple tuples.
Fields
- decision_set::DecisionSet: The primary output, a collection of minimized logical rules. Each rule consists of a logical formula (antecedent) and a decision outcome (consequent).
- info::NamedTuple: Extensible metadata container with algorithm-specific information. Common fields include:
  - vectPrePostNumber: Vector of (pre, post) minimization term counts
  - unminimized_ds: Original decision set before minimization (if requested)
  - processing_time: Total algorithm execution time
  - feature_importance: Feature ranking information (if available)
- processing_time::Float64: Total processing time in seconds. Measured from algorithm start to completion, useful for performance analysis.
Constructors
Two constructors are provided for different use cases:
# Full constructor - for complete results with metadata
LumenResult(decision_set, info_tuple, processing_time)
# Minimal constructor - when only rules are available
LumenResult(decision_set) # info=empty, processing_time=0.0
Design Rationale
This structured approach provides several advantages over returning raw tuples:
- Self-documentation: Field names clearly indicate content
- Type safety: Julia's type system validates structure at compile time
- Extensibility: Easy to add new fields without breaking existing code
- IDE support: Autocompletion and inline documentation
- Backward compatibility: Old code can still access fields by name
Examples
# Basic usage
result = lumen(model, config)
rules = result.decision_set
println("Extracted $(length(rules.rules)) rules in $(result.processing_time)s")
# Accessing metadata
if haskey(result.info, :vectPrePostNumber)
    stats = result.info.vectPrePostNumber
    total_reduction = sum(pre - post for (pre, post) in stats)
    println("Reduced formula complexity by $total_reduction terms")
end
# Comparing minimized vs original rules
if haskey(result.info, :unminimized_ds)
    original_rules = result.info.unminimized_ds
    println("Minimization: $(length(original_rules.rules)) → $(length(result.decision_set.rules))")
end
See also: lumen, LumenConfig, DecisionSet
REFNE
SolePostHoc.RuleExtraction.REFNE.refne — Method
refne(m, Xmin, Xmax; L=100, perc=1.0, max_depth=-1, n_subfeatures=-1,
      partial_sampling=0.7, min_samples_leaf=5, min_samples_split=2,
      min_purity_increase=0.0, seed=3)
Extract interpretable rules from a trained neural network ensemble using decision tree approximation.
This implementation follows the REFNE-a (Rule Extraction From Neural Network Ensemble) algorithm, which approximates complex neural network behavior with an interpretable decision tree model.
Arguments
- m: Trained neural network model to extract rules from
- Xmin: Minimum values for each input feature
- Xmax: Maximum values for each input feature
- L: Number of samples to generate in the synthetic dataset (default: 100)
- perc: Percentage of generated samples to use (default: 1.0)
- max_depth: Maximum depth of the decision tree (default: -1, unlimited)
- n_subfeatures: Number of features to consider at each split (default: -1, all)
- partial_sampling: Fraction of samples used for each tree (default: 0.7)
- min_samples_leaf: Minimum number of samples required at a leaf node (default: 5)
- min_samples_split: Minimum number of samples required to split a node (default: 2)
- min_purity_increase: Minimum purity increase required for a split (default: 0.0)
- seed: Random seed for reproducibility (default: 3)
Returns
- A forest of decision trees representing the extracted rules
Description
The algorithm works by:
- Generating a synthetic dataset spanning the input space
- Using the neural network to label these samples
- Training a decision tree to approximate the neural network's behavior
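The first two steps can be sketched with the standard library alone. The sketch assumes uniform sampling inside the box defined by Xmin and Xmax, which may not match how refne actually generates its dataset; `sample_box` and `black_box` are hypothetical names, and `black_box` stands in for the trained network. The third step (fitting a tree to the labeled samples) needs a tree learner and is not shown.

```julia
using Random

# Draw L points uniformly inside the box [Xmin[j], Xmax[j]] for each feature j.
function sample_box(rng::AbstractRNG, Xmin::AbstractVector, Xmax::AbstractVector, L::Int)
    length(Xmin) == length(Xmax) ||
        throw(DimensionMismatch("Xmin and Xmax must have the same length"))
    d = length(Xmin)
    X = Matrix{Float64}(undef, L, d)
    for j in 1:d
        X[:, j] .= Xmin[j] .+ (Xmax[j] - Xmin[j]) .* rand(rng, L)
    end
    return X
end

# Hypothetical stand-in for the trained network: any function row -> label works.
black_box(x) = x[1] + x[2] > 1.0 ? "pos" : "neg"

rng = MersenneTwister(3)                       # fixed seed for reproducibility
Xsyn = sample_box(rng, [0.0, 0.0], [1.0, 2.0], 100)
ysyn = [black_box(view(Xsyn, i, :)) for i in 1:size(Xsyn, 1)]
```

The pair (Xsyn, ysyn) is what a tree learner would then be trained on, so the resulting tree imitates the black box rather than the original training labels.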
References
- Zhou, Zhi-Hua, et al. "Extracting Symbolic Rules from Trained Neural Network Ensembles."
Example
```julia
model = loaddecisiontree_model()
refne(model, Xmin, Xmax)
```
See also AbstractModel, DecisionList, listrules, rulemetrics.
TREPAN
SolePostHoc.RuleExtraction.TREPAN.trepan — Method
- Mark W. Craven, et al. "Extracting Tree-Structured Representations of Trained Networks"
BATrees
SolePostHoc.RuleExtraction.BATrees.batrees — Function
batrees(f; dataset_name="iris", num_trees=10, max_depth=10, dsOutput=true)
Builds and trains a set of binary decision trees OR using the specified function f.
Arguments
- f: A SoleForest.
- dataset_name::String: The name of the dataset to be used. Default is "iris".
- num_trees::Int: The number of trees to be built. Default is 10.
- max_depth::Int: The maximum depth of each tree. Default is 10.
- dsOutput::Bool: A flag indicating whether to return the dsStruct output. Default is true. If false, the resulting single tree is returned.
Returns
- If dsOutput is true, returns the result as a DecisionSet ds.
- If dsOutput is false, returns the result as a SoleTree t.
Example
RULECOSIPLUS
SolePostHoc.RuleExtraction.RULECOSIPLUS.rulecosiplus — Method
rulecosiplus(ensemble::Any, X_train::Any, y_train::Any)
Extract interpretable rules from decision tree ensembles using the RuleCOSI+ algorithm.
This function implements the RuleCOSI+ methodology for rule extraction from trained ensemble classifiers, producing a simplified and interpretable rule-based model. The method combines and simplifies rules extracted from individual trees in the ensemble to create a more compact and understandable decision list.
Reference
Obregon, J. (2022). RuleCOSI+: Rule extraction for interpreting classification tree ensembles. Information Fusion, 89, 355-381. Available at: https://www.sciencedirect.com/science/article/pii/S1566253522001129
Arguments
- ensemble::Any: A trained ensemble classifier (e.g., Random Forest, Gradient Boosting) that will be serialized and converted to a compatible format for rule extraction.
- X_train::Any: Training feature data. Can be a DataFrame or Matrix. If DataFrame, column names will be preserved in the extracted rules; otherwise, generic names (V1, V2, ...) will be generated.
- y_train::Any: Training target labels corresponding to X_train. Will be converted to string format for processing.
Returns
- DecisionList: A simplified decision list containing the extracted and combined rules from the ensemble, suitable for interpretable classification.
Details
The function performs the following steps:
- Converts input data to appropriate matrix format
- Generates or extracts feature column names
- Serializes the Julia ensemble to a Python-compatible format
- Builds an sklearn-compatible model using the serialized ensemble
- Applies the RuleCOSI+ algorithm with the following default parameters:
  - metric="fi": Optimization metric for rule combination
  - n_estimators=100: Number of estimators considered
  - tree_max_depth=100: Maximum depth of trees
  - conf_threshold=0.25 (α): Confidence threshold for rule filtering
  - cov_threshold=0.1 (β): Coverage threshold for rule filtering
  - verbose=2: Detailed output during processing
- Extracts and converts rules to a decision list format
Configuration
The algorithm uses fixed parameters optimized for interpretability:
- Confidence threshold (α) = 0.25: Rules below this confidence are discarded
- Coverage threshold (β) = 0.1: Rules covering fewer samples are excluded
- Maximum rules = max(20, n_classes × 5): Adaptive limit based on problem complexity
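To make the three fixed settings concrete, the following sketch applies them to hand-written rule statistics. It only illustrates the arithmetic described above: `RuleStats`, `max_rules`, and `filter_rules` are hypothetical names, the thresholds are treated as inclusive lower bounds (an assumption), and the real filtering happens inside the RuleCOSI+ implementation that rulecosiplus calls.

```julia
# Hypothetical rule summary holding only the two statistics the thresholds use.
struct RuleStats
    name::String
    confidence::Float64   # fraction of covered samples classified correctly
    coverage::Float64     # fraction of training samples the rule covers
end

# Adaptive cap on the number of rules, as stated in the Configuration list.
max_rules(n_classes::Integer) = max(20, n_classes * 5)

# Keep rules that reach both thresholds (α for confidence, β for coverage).
function filter_rules(rules; conf_threshold = 0.25, cov_threshold = 0.1)
    return [r for r in rules if r.confidence >= conf_threshold && r.coverage >= cov_threshold]
end

candidates = [
    RuleStats("r1", 0.90, 0.40),   # kept
    RuleStats("r2", 0.20, 0.50),   # dropped: confidence below 0.25
    RuleStats("r3", 0.80, 0.05),   # dropped: coverage below 0.1
]
kept = filter_rules(candidates)
```

With these formulas the cap stays at 20 rules until a problem has more than four classes, after which it grows by five rules per class.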
Example
# Assuming you have a trained ensemble and training data
ensemble = ... # your trained ensemble
X_train = ... # training features
y_train = ... # training labels
# Extract interpretable rules
decision_list = rulecosiplus(ensemble, X_train, y_train)
Notes
- The function prints diagnostic information including the number of trees and dataset statistics
- Raw rules are displayed before conversion to decision list format
- Requires Python interoperability and the RuleCOSI implementation
- The resulting decision list provides an interpretable alternative to the original ensemble