Rule Extraction Methods

SolePostHoc.RuleExtraction.intreesMethod
intrees(model::Union{AbstractModel,DecisionForest}, X, y::AbstractVector{<:Label}; kwargs...)::DecisionList

Return a decision list which approximates the behavior of the input model on the specified supervised dataset. The set of relevant and non-redundant rules in the decision list are obtained by means of rule selection, rule pruning, and sequential covering (stel).

References

  • Deng, Houtao. "Interpreting tree ensembles with intrees." International Journal of Data Science and Analytics 7.4 (2019): 277-287.

Keyword Arguments

  • prune_rules::Bool=true: whether to prune the extracted rules
  • pruning_s::Union{Float64,Nothing}=nothing: parameter bounding the denominator in the pruning metric calculation
  • pruning_decay_threshold::Union{Float64,Nothing}=nothing: threshold used during pruning to decide whether a conjunct is removed from a rule
  • rule_selection_method::Symbol=:CBC: rule selection method. Currently only supports :CBC
  • rule_complexity_metric::Symbol=:natoms: metric used to estimate rule complexity
  • max_rules::Int=-1: maximum number of rules in the final decision list (excluding default rule). Use -1 for unlimited rules.
  • min_coverage::Union{Float64,Nothing}=nothing: minimum rule coverage for stel
  • See extractrules keyword arguments...

Although the method was originally presented for forests, it is here extended to work with any symbolic model.
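
A minimal usage sketch (assuming `forest` is a trained DecisionForest and `X`, `y` the supervised data it was trained on; the keyword values are illustrative):

```julia
# Extract a compact decision list approximating the forest's behavior.
dl = intrees(forest, X, y;
    max_rules    = 20,     # cap the final list (excluding the default rule)
    min_coverage = 0.01,   # stel: require each rule to cover at least 1% of instances
)
```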

See also AbstractModel, DecisionList, listrules, rulemetrics.

source
SolePostHoc.RuleExtraction.InTreesRuleExtractorType
InTreesRuleExtractor(; kwargs...)

Create a rule extractor based on the InTrees method.

Keyword Arguments

  • prune_rules::Bool=true: whether to prune the extracted rules
  • pruning_s::Union{Float64,Nothing}=nothing: parameter bounding the denominator in the pruning metric calculation
  • pruning_decay_threshold::Union{Float64,Nothing}=nothing: threshold used during pruning to decide whether a conjunct is removed from a rule
  • rule_selection_method::Symbol=:CBC: rule selection method. Currently only supports :CBC
  • rule_complexity_metric::Symbol=:natoms: metric used to estimate rule complexity
  • min_coverage::Union{Float64,Nothing}=nothing: minimum rule coverage for stel
  • rng::AbstractRNG=Random.TaskLocalRNG(): RNG used for any randomized steps (e.g., feature selection)
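
A construction sketch (the specific values are illustrative; the keyword names follow the list above):

```julia
using Random

# Configure an InTrees extractor with explicit pruning and coverage settings.
extractor = InTreesRuleExtractor(;
    prune_rules             = true,
    pruning_decay_threshold = 0.05,               # drop conjuncts whose removal barely hurts
    min_coverage            = 0.01,               # stel: discard rules covering < 1% of instances
    rng                     = MersenneTwister(42) # reproducible randomized steps
)
```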

See also intrees.

source

Lumen

SolePostHoc.RuleExtraction.Lumen.lumenMethod
lumen(config::LumenConfig, model::SM.AbstractModel) -> SM.DecisionSet

Core single-model entry point for the LUMEN algorithm.

Extracts a minimized DecisionSet from model using the parameters encoded in config.

Pipeline

  1. Build ExtractRulesData from config and model (atom extraction, truth-table enumeration, per-class grouping).
  2. For each class, call run_minimization on the derived atom vectors.
  3. Filter out classes for which no formula could be produced.
  4. Wrap the minimized formulas in SM.Rule objects and return a DecisionSet.

Arguments

  • config::LumenConfig: Algorithm configuration (minimization scheme, depth, etc.).
  • model::SM.AbstractModel: A single decision-tree model.

Returns

  • SM.DecisionSet: The minimized rule set.

lumen(config::LumenConfig, model::Vector{SM.AbstractModel}) -> LumenResult

Batch variant: applies lumen(config, m) to every model in the vector and collects the results into a LumenResult.


lumen(model::SM.AbstractModel, args...; kwargs...) -> SM.DecisionSet

Convenience wrapper: constructs a LumenConfig from keyword arguments and delegates to lumen(config, model).


lumen(model::Vector{SM.AbstractModel}, args...; kwargs...) -> LumenResult

Convenience wrapper for vector of models: constructs LumenConfig from keyword arguments and maps over the vector.

Examples

# Single model with default settings
ds = lumen(my_tree)

# Single model with custom minimization scheme
ds = lumen(my_tree; minimization_scheme=:mitespresso, depth=0.8)

# Explicit config object
config = LumenConfig(minimization_scheme=:abc, depth=0.7)
ds = lumen(config, my_tree)

# Batch processing
results = lumen(config, [tree1, tree2, tree3])

See also: LumenConfig, LumenResult, ExtractRulesData

source
SolePostHoc.RuleExtraction.Lumen.LumenConfigType
LumenConfig <: AbstractConfig

Configuration object for the LUMEN rule-extraction algorithm.

Bundles every tunable parameter into a single, validated, immutable struct. All fields are set through the keyword constructor, which performs range validation and resolves the correct minimizer binary before storing anything.

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| minimization_scheme | Symbol | :abc | DNF minimization algorithm to use. |
| binary | String | (auto) | Absolute path to the minimizer executable, resolved automatically from minimization_scheme. |
| depth | Float64 | 1.0 | Fraction of each tree's BFS-ordered atoms to include, ∈ (0, 1]; 1.0 uses the full alphabet. |
| vertical | Float64 | 1.0 | Instance-coverage parameter α ∈ (0, 1]. |
| horizontal | Float64 | 1.0 | Feature-coverage parameter β ∈ (0, 1]. |
| minimization_kwargs | NamedTuple | (;) | Extra keyword arguments forwarded verbatim to the chosen minimizer. |
| filt_alphabet | Base.Callable | identity | Optional callback applied to the logical alphabet before rule extraction. |
| apply_function | Base.Callable | SM.apply | Function used to evaluate the model on generated input combinations. |
| importance | Vector | Float64[] | Feature-importance weights; influences rule construction when non-empty. |
| check_opt | Bool | false | When true, validates the OTT optimisation against the standard algorithm. |
| check_alphabet | Bool | false | When true, runs alphabet-analysis diagnostics instead of full extraction. |

Supported minimization schemes

| Scheme | Backend | Notes |
| --- | --- | --- |
| :mitespresso | MIT Espresso | Balanced speed / quality. |
| :boom | BOOM | Aggressive minimisation. |
| :abc | Berkeley ABC | Fast, moderate compression. |
| :abc_balanced | Berkeley ABC | Balanced ABC variant. |
| :abc_thorough | Berkeley ABC | Thorough ABC variant. |
| :quine | Quine–McCluskey | Exact minimisation. |
| :quine_naive | Quine–McCluskey | Naïve variant, educational use. |

Validation

The constructor throws ArgumentError when:

  • Any of vertical, depth, or horizontal is outside (0.0, 1.0].
  • minimization_scheme is not one of the supported symbols listed above.

Examples

# Default configuration
cfg = LumenConfig()

# Custom scheme and coverage parameters
cfg = LumenConfig(
    minimization_scheme = :mitespresso,
    depth               = 0.7,
    vertical            = 0.9,
    horizontal          = 0.8,
)

# Pass extra kwargs to the minimizer and use a custom alphabet filter
cfg = LumenConfig(
    minimization_scheme  = :abc,
    minimization_kwargs  = (timeout = 30,),
    filt_alphabet        = alph -> my_filter(alph),
)

See also: lumen, LumenResult, AbstractConfig

source
SolePostHoc.RuleExtraction.Lumen.LumenResultType
LumenResult

Lightweight container for the output produced by lumen.

Fields

  • decision_set::DecisionSet: The minimized rule set extracted from the model.
  • info::NamedTuple: Auxiliary metadata. Empty (;) when not requested.

Constructors

LumenResult(decision_set, info)   # Full construction with metadata.
LumenResult(decision_set)         # Convenience constructor; info defaults to (;).

Examples

result = lumen(model)
rules  = result.decision_set
meta   = result.info          # NamedTuple – may be empty

See also: lumen, LumenConfig

source

REFNE

SolePostHoc.RuleExtraction.REFNE.refneMethod
refne(m, Xmin, Xmax; L=100, perc=1.0, max_depth=-1, n_subfeatures=-1, 
      partial_sampling=0.7, min_samples_leaf=5, min_samples_split=2, 
      min_purity_increase=0.0, seed=3)

Extract interpretable rules from a trained neural network ensemble using decision tree approximation.

This implementation follows the REFNE-a (Rule Extraction From Neural Network Ensemble) algorithm, which approximates complex neural network behavior with an interpretable decision tree model.

Arguments

  • m: Trained neural network model to extract rules from
  • Xmin: Minimum values for each input feature
  • Xmax: Maximum values for each input feature
  • L: Number of samples to generate in the synthetic dataset (default: 100)
  • perc: Percentage of generated samples to use (default: 1.0)
  • max_depth: Maximum depth of the decision tree (default: -1, unlimited)
  • n_subfeatures: Number of features to consider at each split (default: -1, all)
  • partial_sampling: Fraction of samples used for each tree (default: 0.7)
  • min_samples_leaf: Minimum number of samples required at a leaf node (default: 5)
  • min_samples_split: Minimum number of samples required to split a node (default: 2)
  • min_purity_increase: Minimum purity increase required for a split (default: 0.0)
  • seed: Random seed for reproducibility (default: 3)

Returns

  • A forest of decision trees representing the extracted rules

Description

The algorithm works by:

  1. Generating a synthetic dataset spanning the input space
  2. Using the neural network to label these samples
  3. Training a decision tree to approximate the neural network's behavior
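
Step 1 can be sketched as uniform sampling within the per-feature bounds (a minimal illustration, not the package's internal code; the bounds below are made-up, iris-like values):

```julia
# Uniform synthetic sampling in the box [Xmin, Xmax] (step 1 of the pipeline).
Xmin = [4.3, 2.0, 1.0, 0.1]   # hypothetical per-feature minima
Xmax = [7.9, 4.4, 6.9, 2.5]   # hypothetical per-feature maxima
L = 100                       # number of synthetic samples to generate

X_synth = [Xmin[j] + rand() * (Xmax[j] - Xmin[j]) for i in 1:L, j in eachindex(Xmin)]
# Step 2 would then label X_synth with the neural model m,
# and step 3 fits a decision tree on (X_synth, labels).
```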

References

  • Zhou, Zhi-Hua, Yuan Jiang, and Shi-Fu Chen. "Extracting symbolic rules from trained neural network ensembles." AI Communications 16.1 (2003): 3-15.

Example

model = ... # your trained neural network (or ensemble) to approximate
refne(model, Xmin, Xmax)

See also AbstractModel, DecisionList, listrules, rulemetrics.

source

TREPAN

BATrees

SolePostHoc.RuleExtraction.BATrees.batreesFunction
batrees(f; dataset_name="iris", num_trees=10, max_depth=10, dsOutput=true)

Build and train a set of binary decision trees approximating the given model f.

Arguments

  • f: A SoleForest.
  • dataset_name::String: The name of the dataset to be used. Default is "iris".
  • num_trees::Int: The number of trees to be built. Default is 10.
  • max_depth::Int: The maximum depth of each tree. Default is 10.
  • dsOutput::Bool: Whether to return the result as a DecisionSet. Default is true. If false, a single tree is returned.

Returns

  • If dsOutput is true, a DecisionSet ds.
  • If dsOutput is false, a single SoleTree t.

Example
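
A minimal sketch (assuming `f` is a trained SoleForest; the keyword values are illustrative):

```julia
# Distill the forest `f` into a DecisionSet (the default output form).
ds = batrees(f; num_trees = 10, max_depth = 10)

# Return a single distilled tree instead of a DecisionSet.
t = batrees(f; dsOutput = false)
```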

source

RULECOSIPLUS

SolePostHoc.RuleExtraction.RULECOSIPLUS.rulecosiplusMethod
rulecosiplus(ensemble::Any, X_train::Any, y_train::Any)

Extract interpretable rules from decision tree ensembles using the RuleCOSI+ algorithm.

This function implements the RuleCOSI+ methodology for rule extraction from trained ensemble classifiers, producing a simplified and interpretable rule-based model. The method combines and simplifies rules extracted from individual trees in the ensemble to create a more compact and understandable decision list.

Reference

Obregon, J. (2022). RuleCOSI+: Rule extraction for interpreting classification tree ensembles. Information Fusion, 89, 355-381. Available at: https://www.sciencedirect.com/science/article/pii/S1566253522001129

Arguments

  • ensemble::Any: A trained ensemble classifier (e.g., Random Forest, Gradient Boosting) that will be serialized and converted to a compatible format for rule extraction.
  • X_train::Any: Training feature data. Can be a DataFrame or Matrix. If DataFrame, column names will be preserved in the extracted rules; otherwise, generic names (V1, V2, ...) will be generated.
  • y_train::Any: Training target labels corresponding to X_train. Will be converted to string format for processing.

Returns

  • DecisionList: A simplified decision list containing the extracted and combined rules from the ensemble, suitable for interpretable classification.

Details

The function performs the following steps:

  1. Converts input data to appropriate matrix format
  2. Generates or extracts feature column names
  3. Serializes the Julia ensemble to a Python-compatible format
  4. Builds an sklearn-compatible model using the serialized ensemble
  5. Applies RuleCOSI+ algorithm with the following default parameters:
    • metric="fi": Optimization metric for rule combination
    • n_estimators=100: Number of estimators considered
    • tree_max_depth=100: Maximum depth of trees
    • conf_threshold=0.25 (α): Confidence threshold for rule filtering
    • cov_threshold=0.1 (β): Coverage threshold for rule filtering
    • verbose=2: Detailed output during processing
  6. Extracts and converts rules to a decision list format

Configuration

The algorithm uses fixed parameters optimized for interpretability:

  • Confidence threshold (α) = 0.25: Rules below this confidence are discarded
  • Coverage threshold (β) = 0.1: Rules covering fewer samples are excluded
  • Maximum rules = max(20, n_classes × 5): Adaptive limit based on problem complexity

Example

# Assuming you have a trained ensemble and training data
ensemble = ... # your trained ensemble
X_train = ... # training features
y_train = ... # training labels

# Extract interpretable rules
decision_list = rulecosiplus(ensemble, X_train, y_train)

Notes

  • The function prints diagnostic information including the number of trees and dataset statistics
  • Raw rules are displayed before conversion to decision list format
  • Requires Python interoperability and the RuleCOSI implementation
  • The resulting decision list provides an interpretable alternative to the original ensemble

source