Rule Extraction Methods
SolePostHoc.RuleExtraction.intrees — Method
```julia
intrees(model::Union{AbstractModel,DecisionForest}, X, y::AbstractVector{<:Label}; kwargs...)::DecisionList
```

Return a decision list that approximates the behavior of the input model on the specified supervised dataset. The set of relevant and non-redundant rules in the decision list is obtained by means of rule selection, rule pruning, and sequential covering (STEL).
References
- Deng, Houtao. "Interpreting tree ensembles with intrees." International Journal of Data Science and Analytics 7.4 (2019): 277-287.
Keyword Arguments
- `prune_rules::Bool=true`: whether to prune the rules or not
- `pruning_s::Union{Float64,Nothing}=nothing`: parameter that limits the denominator in the pruning metric calculation
- `pruning_decay_threshold::Union{Float64,Nothing}=nothing`: threshold used during pruning to decide whether to remove a conjunct from a rule
- `rule_selection_method::Symbol=:CBC`: rule selection method. Currently only `:CBC` is supported
- `rule_complexity_metric::Symbol=:natoms`: metric used to estimate rule complexity
- `max_rules::Int=-1`: maximum number of rules in the final decision list (excluding the default rule). Use -1 for unlimited rules
- `min_coverage::Union{Float64,Nothing}=nothing`: minimum rule coverage for STEL
- See `extractrules` keyword arguments...
Although the method was originally presented for forests, it is hereby extended to work with any symbolic model.
See also AbstractModel, DecisionList, listrules, rulemetrics.
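A minimal usage sketch, assuming a trained symbolic `model` together with a supervised dataset `X`, `y` already in scope (all data names are illustrative):

```julia
# Distill the model into a compact decision list;
# keyword arguments are the ones documented above.
dl = intrees(model, X, y; prune_rules = true, max_rules = 10)
```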
SolePostHoc.RuleExtraction.BATreesRuleExtractor — Type
Extract rules from a symbolic model using batrees.
See also extractrules, RuleExtractor.
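A hedged sketch of how this extractor type is typically used, assuming the generic `extractrules(extractor, model, ...)` entry point referenced above (the exact argument list may differ; `model` is an illustrative trained symbolic model):

```julia
extractor = BATreesRuleExtractor()
ds = extractrules(extractor, model)
```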
SolePostHoc.RuleExtraction.InTreesRuleExtractor — Type
```julia
InTreesRuleExtractor(; kwargs...)
```

Create a rule extractor based on the InTrees method.
Keyword Arguments
- `prune_rules::Bool=true`: whether to prune the rules or not
- `pruning_s::Union{Float64,Nothing}=nothing`: parameter that limits the denominator in the pruning metric calculation
- `pruning_decay_threshold::Union{Float64,Nothing}=nothing`: threshold used during pruning to decide whether to remove a conjunct from a rule
- `rule_selection_method::Symbol=:CBC`: rule selection method. Currently only `:CBC` is supported
- `rule_complexity_metric::Symbol=:natoms`: metric used to estimate rule complexity
- `min_coverage::Union{Float64,Nothing}=nothing`: minimum rule coverage for STEL
- `rng::AbstractRNG=Random.TaskLocalRNG()`: RNG used for any randomized steps (e.g., feature selection)
See also intrees.
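A construction sketch showing the keyword arguments above (the extractor would then be passed to a generic rule-extraction entry point, per the see-also references; all values are illustrative):

```julia
using Random

extractor = InTreesRuleExtractor(;
    prune_rules = true,
    rule_selection_method = :CBC,
    rule_complexity_metric = :natoms,
    rng = Random.MersenneTwister(42),
)
```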
SolePostHoc.RuleExtraction.LumenRuleExtractor — Type
Extract rules from a symbolic model using SolePostHoc.RuleExtraction.Lumen.
See also extractrules, RuleExtractor.
SolePostHoc.RuleExtraction.REFNERuleExtractor — Type
Extract rules from a symbolic model using SolePostHoc.RuleExtraction.REFNE.
See also extractrules, RuleExtractor.
SolePostHoc.RuleExtraction.RULECOSIPLUSRuleExtractor — Type
Extract rules from a symbolic model using SolePostHoc.RuleExtraction.RULECOSIPLUS.
See also extractrules, RuleExtractor.
SolePostHoc.RuleExtraction.TREPANRuleExtractor — Type
Extract rules from a symbolic model using SolePostHoc.RuleExtraction.TREPAN.
See also extractrules, RuleExtractor.
Lumen
SolePostHoc.RuleExtraction.Lumen.lumen — Method
```julia
lumen(config::LumenConfig, model::SM.AbstractModel) -> SM.DecisionSet
```

Core single-model entry point for the LUMEN algorithm.
Extracts a minimized DecisionSet from model using the parameters encoded in config.
Pipeline
- Build `ExtractRulesData` from `config` and `model` (atom extraction, truth-table enumeration, per-class grouping).
- For each class, call `run_minimization` on the derived atom vectors.
- Filter out classes for which no formula could be produced.
- Wrap the minimized formulas in `SM.Rule` objects and return a `DecisionSet`.
Arguments
- `config::LumenConfig`: Algorithm configuration (minimization scheme, depth, etc.).
- `model::SM.AbstractModel`: A single decision-tree model.
Returns
- `SM.DecisionSet`: The minimized rule set.
```julia
lumen(config::LumenConfig, model::Vector{SM.AbstractModel}) -> LumenResult
```

Batch variant: applies `lumen(config, m)` to every model in the vector and collects the results into a `LumenResult`.

```julia
lumen(model::SM.AbstractModel, args...; kwargs...) -> SM.DecisionSet
```

Convenience wrapper: constructs a `LumenConfig` from keyword arguments and delegates to `lumen(config, model)`.

```julia
lumen(model::Vector{SM.AbstractModel}, args...; kwargs...) -> LumenResult
```

Convenience wrapper for a vector of models: constructs a `LumenConfig` from keyword arguments and maps over the vector.
Examples
```julia
# Single model with default settings
ds = lumen(my_tree)

# Single model with custom minimization scheme
ds = lumen(my_tree; minimization_scheme=:mitespresso, depth=0.8)

# Explicit config object
config = LumenConfig(minimization_scheme=:abc, depth=0.7)
ds = lumen(config, my_tree)

# Batch processing
results = lumen(config, [tree1, tree2, tree3])
```

See also: LumenConfig, LumenResult, ExtractRulesData
SolePostHoc.RuleExtraction.Lumen.LumenConfig — Type
```julia
LumenConfig <: AbstractConfig
```

Configuration object for the LUMEN rule-extraction algorithm.
Bundles every tunable parameter into a single, validated, immutable struct. All fields are set through the keyword constructor, which performs range validation and resolves the correct minimizer binary before storing anything.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `minimization_scheme` | `Symbol` | `:abc` | DNF minimization algorithm to use. |
| `binary` | `String` | (auto) | Absolute path to the minimizer executable, resolved automatically from `minimization_scheme`. |
| `depth` | `Float64` | `1.0` | Fraction of each tree's BFS-ordered atoms to include ∈ (0, 1]. 1.0 uses the full alphabet. |
| `vertical` | `Float64` | `1.0` | Instance-coverage parameter α ∈ (0, 1]. |
| `horizontal` | `Float64` | `1.0` | Feature-coverage parameter β ∈ (0, 1]. |
| `minimization_kwargs` | `NamedTuple` | `(;)` | Extra keyword arguments forwarded verbatim to the chosen minimizer. |
| `filt_alphabet` | `Base.Callable` | `identity` | Optional callback applied to the logical alphabet before rule extraction. |
| `apply_function` | `Base.Callable` | `SM.apply` | Function used to evaluate the model on generated input combinations. |
| `importance` | `Vector` | `Float64[]` | Feature-importance weights; influences rule construction when non-empty. |
| `check_opt` | `Bool` | `false` | When true, validates the OTT optimisation against the standard algorithm. |
| `check_alphabet` | `Bool` | `false` | When true, runs alphabet-analysis diagnostics instead of full extraction. |
Supported minimization schemes
| Scheme | Backend | Notes |
|---|---|---|
| `:mitespresso` | MIT Espresso | Balanced speed / quality. |
| `:boom` | BOOM | Aggressive minimisation. |
| `:abc` | Berkeley ABC | Fast, moderate compression. |
| `:abc_balanced` | Berkeley ABC | Balanced ABC variant. |
| `:abc_thorough` | Berkeley ABC | Thorough ABC variant. |
| `:quine` | Quine–McCluskey | Exact minimisation. |
| `:quine_naive` | Quine–McCluskey | Naïve variant, educational use. |
Validation
The constructor throws `ArgumentError` when:
- Any of `vertical`, `depth`, or `horizontal` is outside (0.0, 1.0].
- `minimization_scheme` is not one of the supported symbols listed above.
Examples
```julia
# Default configuration
cfg = LumenConfig()

# Custom scheme and coverage parameters
cfg = LumenConfig(
    minimization_scheme = :mitespresso,
    depth = 0.7,
    vertical = 0.9,
    horizontal = 0.8,
)

# Pass extra kwargs to the minimizer and use a custom alphabet filter
cfg = LumenConfig(
    minimization_scheme = :abc,
    minimization_kwargs = (timeout = 30,),
    filt_alphabet = alph -> my_filter(alph),
)
```

See also: lumen, LumenResult, AbstractConfig
SolePostHoc.RuleExtraction.Lumen.LumenResult — Type
```julia
LumenResult
```

Lightweight container for the output produced by lumen.
Fields
- `decision_set::DecisionSet`: The minimized rule set extracted from the model.
- `info::NamedTuple`: Auxiliary metadata. Empty `(;)` when not requested.
Constructors
```julia
LumenResult(decision_set, info)   # Full construction with metadata.
LumenResult(decision_set)         # Convenience constructor; info defaults to (;).
```

Examples
```julia
result = lumen(model)
rules = result.decision_set
meta = result.info  # NamedTuple – may be empty
```

See also: lumen, LumenConfig
REFNE
SolePostHoc.RuleExtraction.REFNE.refne — Method
```julia
refne(m, Xmin, Xmax; L=100, perc=1.0, max_depth=-1, n_subfeatures=-1,
      partial_sampling=0.7, min_samples_leaf=5, min_samples_split=2,
      min_purity_increase=0.0, seed=3)
```

Extract interpretable rules from a trained neural network ensemble using decision tree approximation.
This implementation follows the REFNE-a (Rule Extraction From Neural Network Ensemble) algorithm, which approximates complex neural network behavior with an interpretable decision tree model.
Arguments
- `m`: Trained neural network model to extract rules from
- `Xmin`: Minimum values for each input feature
- `Xmax`: Maximum values for each input feature
- `L`: Number of samples to generate in the synthetic dataset (default: 100)
- `perc`: Percentage of generated samples to use (default: 1.0)
- `max_depth`: Maximum depth of the decision tree (default: -1, unlimited)
- `n_subfeatures`: Number of features to consider at each split (default: -1, all)
- `partial_sampling`: Fraction of samples used for each tree (default: 0.7)
- `min_samples_leaf`: Minimum number of samples required at a leaf node (default: 5)
- `min_samples_split`: Minimum number of samples required to split a node (default: 2)
- `min_purity_increase`: Minimum purity increase required for a split (default: 0.0)
- `seed`: Random seed for reproducibility (default: 3)
Returns
- A forest of decision trees representing the extracted rules
Description
The algorithm works by:
- Generating a synthetic dataset spanning the input space
- Using the neural network to label these samples
- Training a decision tree to approximate the neural network's behavior
References
- Zhou, Zhi-Hua, et al. "Extracting Symbolic Rules from Trained Neural Network Ensembles"
Example
```julia
model = load_decision_tree_model()
refne(model, Xmin, Xmax)
```

See also AbstractModel, DecisionList, listrules, rulemetrics.
TREPAN
SolePostHoc.RuleExtraction.TREPAN.trepan — Method
- Mark W. Craven, et al. "Extracting Tree-Structured Representations of Trained Networks"
BATrees
SolePostHoc.RuleExtraction.BATrees.batrees — Function
```julia
batrees(f; dataset_name="iris", num_trees=10, max_depth=10, dsOutput=true)
```

Builds and trains a set of binary decision trees from the specified forest `f`.
Arguments
- `f`: A SoleForest.
- `dataset_name::String`: The name of the dataset to be used. Default is `"iris"`.
- `num_trees::Int`: The number of trees to be built. Default is `10`.
- `max_depth::Int`: The maximum depth of each tree. Default is `10`.
- `dsOutput::Bool`: Whether to return the `DecisionSet` output. Default is `true`; if `false`, a single tree is returned.
Returns
- If `dsOutput` is `true`, the result is a `DecisionSet` `ds`.
- If `dsOutput` is `false`, the result is a SoleTree `t`.
Example
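A sketch of both output modes, assuming `my_forest` is a trained SoleForest (illustrative name):

```julia
# DecisionSet output (default)
ds = batrees(my_forest; num_trees = 10, max_depth = 10)

# Single-tree output
t = batrees(my_forest; dsOutput = false)
```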
RULECOSIPLUS
SolePostHoc.RuleExtraction.RULECOSIPLUS.rulecosiplus — Method
```julia
rulecosiplus(ensemble::Any, X_train::Any, y_train::Any)
```

Extract interpretable rules from decision tree ensembles using the RuleCOSI+ algorithm.
This function implements the RuleCOSI+ methodology for rule extraction from trained ensemble classifiers, producing a simplified and interpretable rule-based model. The method combines and simplifies rules extracted from individual trees in the ensemble to create a more compact and understandable decision list.
Reference
Obregon, J. (2022). RuleCOSI+: Rule extraction for interpreting classification tree ensembles. Information Fusion, 89, 355-381. Available at: https://www.sciencedirect.com/science/article/pii/S1566253522001129
Arguments
- `ensemble::Any`: A trained ensemble classifier (e.g., Random Forest, Gradient Boosting) that will be serialized and converted to a compatible format for rule extraction.
- `X_train::Any`: Training feature data. Can be a DataFrame or Matrix. If a DataFrame, column names will be preserved in the extracted rules; otherwise, generic names (V1, V2, ...) will be generated.
- `y_train::Any`: Training target labels corresponding to `X_train`. Will be converted to string format for processing.
Returns
- `DecisionList`: A simplified decision list containing the extracted and combined rules from the ensemble, suitable for interpretable classification.
Details
The function performs the following steps:
- Converts input data to appropriate matrix format
- Generates or extracts feature column names
- Serializes the Julia ensemble to a Python-compatible format
- Builds an sklearn-compatible model using the serialized ensemble
- Applies RuleCOSI+ algorithm with the following default parameters:
  - `metric="fi"`: Optimization metric for rule combination
  - `n_estimators=100`: Number of estimators considered
  - `tree_max_depth=100`: Maximum depth of trees
  - `conf_threshold=0.25` (α): Confidence threshold for rule filtering
  - `cov_threshold=0.1` (β): Coverage threshold for rule filtering
  - `verbose=2`: Detailed output during processing
- Extracts and converts rules to a decision list format
Configuration
The algorithm uses fixed parameters optimized for interpretability:
- Confidence threshold (α) = 0.25: Rules below this confidence are discarded
- Coverage threshold (β) = 0.1: Rules covering fewer samples are excluded
- Maximum rules = max(20, n_classes × 5): Adaptive limit based on problem complexity
Example
```julia
# Assuming you have a trained ensemble and training data
ensemble = ...  # your trained ensemble
X_train = ...   # training features
y_train = ...   # training labels

# Extract interpretable rules
decision_list = rulecosiplus(ensemble, X_train, y_train)
```

Notes
- The function prints diagnostic information including the number of trees and dataset statistics
- Raw rules are displayed before conversion to decision list format
- Requires Python interoperability and the RuleCOSI implementation
- The resulting decision list provides an interpretable alternative to the original ensemble