ModalDecisionTrees
Welcome to the documentation for ModalDecisionTrees.
ModalDecisionTrees.AbstractDecision
ModalDecisionTrees.AbstractDecisionInternal
ModalDecisionTrees.AbstractDecisionLeaf
ModalDecisionTrees.AbstractNode
ModalDecisionTrees.AbstractScalarLogiset
ModalDecisionTrees.DTNode
ModalDecisionTrees.InfoNode
ModalDecisionTrees.MCARTState
ModalDecisionTrees.MLJInterface.ModalDecisionTree
ModalDecisionTrees.MLJInterface.ModalRandomForest
ModalDecisionTrees.RestrictedMCARTState
AbstractTrees.children
AbstractTrees.printnode
ModalDecisionTrees.build_forest
ModalDecisionTrees.build_stump
ModalDecisionTrees.build_tree
ModalDecisionTrees.limit_threshold_domain
ModalDecisionTrees.modalstep
ModalDecisionTrees.squashtoleaf
ModalDecisionTrees.wrap
ModalDecisionTrees.DTNode — Type
Union type for the internal and leaf nodes of a decision tree.
ModalDecisionTrees.AbstractDecision — Type
A decision is an object placed at an internal decision node; it determines how the instances are routed to the node's left or right child.
ModalDecisionTrees.AbstractDecisionInternal — Type
Abstract type for internal decision nodes of a decision tree.
ModalDecisionTrees.AbstractDecisionLeaf — Type
Abstract type for leaves in a decision tree.
ModalDecisionTrees.AbstractNode — Type
Abstract type for nodes in a decision tree.
ModalDecisionTrees.AbstractScalarLogiset — Type
Logical datasets with scalar features.
ModalDecisionTrees.InfoNode — Type
InfoNode{T,S}
InfoLeaf{T}
These types are introduced so that additional information not present in a ModalDecisionTree structure (for example, the names of the variables) can be used for visualization. This additional information is stored in the info field of these types, which is a NamedTuple, so it can hold arbitrary information beyond variable names. In analogy to the type definitions of ModalDecisionTree, the generic type S is the type of the variable values used within a node as a threshold for the splits between its children, and T is the type of the output (basically, a Number or a String).
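A minimal sketch of storing and retrieving such information (it assumes a tree mdt obtained from build_tree, and the variable names are hypothetical; see wrap below):
# `mdt` is assumed to be a tree returned by `build_tree`.
wdc = wrap(mdt, (modality_variable_names = [["x", "y"]],))
wdc.info.modality_variable_names   # [["x", "y"]]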
ModalDecisionTrees.MCARTState — Type
Recursion state for ModalCART (see the paper "On The Foundations of Modal Decision Trees").
ModalDecisionTrees.RestrictedMCARTState — Type
Restricted recursion state for ModalCART, holding the vector of current worlds for each instance and modality. (Documentation TODO.)
AbstractTrees.children — Method
children(node::InfoNode)
Return the children of the given node. In the case of a ModalDecisionTree there are always exactly two children, because the model produces binary trees in which every internal node has exactly one left and one right child. children is used for tree traversal. The additional information info is carried over from node to its children.
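As a hedged sketch (assuming a wrapped tree wdc obtained via wrap, documented below), defining children unlocks the generic iterators of AbstractTrees:
using AbstractTrees
# `wdc` is assumed to be an InfoNode obtained via `wrap` (see below).
for n in PreOrderDFS(wdc)      # depth-first traversal, driven by `children`
    # each `n` is an InfoNode or an InfoLeaf carrying its `info` NamedTuple
end
collect(Leaves(wdc))           # gather all leaves of the tree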
AbstractTrees.printnode — Method
printnode(io::IO, node::InfoNode)
printnode(io::IO, leaf::InfoLeaf)
Write a printable representation of node or leaf to the output stream io. If node.info/leaf.info has a field called modality_variable_names, it is expected to be an array of arrays of variable names corresponding to the variable names used in the tree nodes; note that there are two layers of reference, because variables are grouped into modalities (see MLJ's docs for ModalDecisionTree: @doc ModalDecisionTree). These names will be used for printing instead of the variable ids.
Note that the left subtree of any split node represents the 'yes' branch, while the right subtree represents the 'no' branch. print_tree outputs the left subtree first, and the right subtree below it.
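A hedged sketch of pretty-printing with variable names (mdt and the names here are hypothetical; see wrap below):
using AbstractTrees
vnames = [["temperature", "humidity"]]   # one array of names per modality
wdc = wrap(mdt, (modality_variable_names = vnames,))
print_tree(wdc)   # left ('yes') subtree first, right ('no') subtree below it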
ModalDecisionTrees.build_forest — Method
build_stump(X, Y, W = nothing; kwargs...)
build_tree(X, Y, W = nothing; kwargs...)
build_forest(X, Y, W = nothing; kwargs...)
Train a decision stump (i.e., a decision tree of depth 1), a decision tree, or a random forest model on the logiset X, with labels Y and weights W.
ModalDecisionTrees.build_stump — Method
build_stump(X, Y, W = nothing; kwargs...)
build_tree(X, Y, W = nothing; kwargs...)
build_forest(X, Y, W = nothing; kwargs...)
Train a decision stump (i.e., a decision tree of depth 1), a decision tree, or a random forest model on the logiset X, with labels Y and weights W.
ModalDecisionTrees.build_tree — Method
build_stump(X, Y, W = nothing; kwargs...)
build_tree(X, Y, W = nothing; kwargs...)
build_forest(X, Y, W = nothing; kwargs...)
Train a decision stump (i.e., a decision tree of depth 1), a decision tree, or a random forest model on the logiset X, with labels Y and weights W.
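A hedged sketch of the native interface, assuming X is already a scalar logiset (see AbstractScalarLogiset above) and Y a vector of labels:
# `X` is assumed to be a scalar logiset, `Y` a label vector.
stump  = build_stump(X, Y)        # decision tree of depth 1
tree   = build_tree(X, Y)         # full decision tree
forest = build_forest(X, Y)       # random forest
W      = fill(1.0, length(Y))     # optional instance weights
tree_w = build_tree(X, Y, W)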
ModalDecisionTrees.limit_threshold_domain — Method
References:
- "Generalizing Boundary Points"
- "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning"
ModalDecisionTrees.modalstep — Method
Perform the modal step, that is, evaluate an existential modal formula on a set of worlds, computing the new world set.
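As a clarifying note in standard modal-logic notation (not taken verbatim from the source): evaluating an existential formula ⟨R⟩φ from a current world set W yields the new world set
W′ = { w′ : w R w′ for some w ∈ W, and w′ ⊨ φ },
that is, the worlds reachable through the relation R from some current world, and at which φ holds.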
ModalDecisionTrees.squashtoleaf — Method
Squashes a vector of DTNodes into a single leaf using bestguess.
ModalDecisionTrees.wrap — Function
wrap(node::MDT.DTInternal, info = NamedTuple())
wrap(leaf::MDT.AbstractDecisionLeaf, info = NamedTuple())
Add the additional information info to each node (or leaf) and wrap both in an InfoNode/InfoLeaf. Typically, a node or a leaf is obtained by creating a decision tree using either the native interface of ModalDecisionTrees.jl or one of the other interfaces available for this package (e.g., MLJ; see their docs for further details). Using the function build_tree of the native interface returns such an object. To use a ModalDecisionTree mdt (obtained this way) with the abstraction layer provided by the AbstractTrees interface implemented here, and optionally add variable names (modality_variable_names, an array of arrays of strings), use the following syntax:
wdc = wrap(mdt)
wdc = wrap(mdt, (modality_variable_names = modality_variable_names, ))
In the first case, mdt just gets wrapped and no information is added; the second case adds variable names. Note that the trailing comma is needed in order to create a NamedTuple.
ModalDecisionTrees.MLJInterface.ModalDecisionTree — Type
ModalDecisionTree
A model type for constructing a modal decision tree, based on ModalDecisionTrees.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
ModalDecisionTree = @load ModalDecisionTree pkg=ModalDecisionTrees
Do model = ModalDecisionTree() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in ModalDecisionTree(max_depth=...).
ModalDecisionTree implements Modal C4.5. This classification and regression algorithm, originally presented in Manzella et al. (2021), "Interval Temporal Random Forests with an Application to COVID-19 Diagnosis" (10.4230/LIPIcs.TIME.2021.7), is an extension of the CART and C4.5 decision tree learning algorithms that leverages the expressive power of modal logics of time and space to perform temporal/spatial reasoning on non-scalar data, such as time-series and images.
The symbolic, probabilistic model is able to extract logical descriptions of the data in terms of logical formulas (see SoleLogics.jl) on atoms that are scalar conditions on the variables (or features); for example, min[V2] ≥ 10, that is, "the minimum of variable 2 is not less than 10". As such, the model is suitable for tasks that involve non-scalar data but require some level of interpretable and transparent modeling. At the moment, the only loss functions available are Shannon's entropy (classification) and variance (regression).
Training data
In MLJ or MLJBase, bind an instance model to data with mach = machine(model, X, y), where
- X: any table of input features (e.g., a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, OrderedFactor, or any 0-, 1-, or 2-dimensional array with elements of these scitypes; check column scitypes with schema(X)
- y: the target, which can be any AbstractVector whose element scitype is Multiclass, Continuous, Finite, or Textual; check the scitype with scitype(y)
Train the machine with fit!(mach).
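A brief sketch of the binding step (X and y are assumed to satisfy the scitype requirements above):
using MLJ
schema(X)     # check per-column scitypes of the feature table
scitype(y)    # check the scitype of the target
mach = machine(ModalDecisionTree(), X, y)
fit!(mach)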
Hyper-parameters
- max_depth=-1: Maximum depth of the decision tree (-1 = any)
- min_samples_leaf=4: Minimum number of samples required at each leaf
- min_purity_increase=0.002: Minimum value for the loss function needed for a split
- max_purity_at_leaf=Inf: Maximum purity allowed at a leaf; a leaf reaching this purity is not split further
- max_modal_depth=-1: Maximum modal depth of the decision tree (-1 = any). When this depth is reached, only propositional decisions are taken.
- n_subfeatures=0: Number of features to select at random at each node (0 for all), or a Function that outputs this number, given the number of available features
- feature=[minimum, maximum]: Feature functions to be used by the tree to mine scalar conditions (e.g., minimum[V2] ≥ 10)
- featvaltype=Float64: Output type for feature functions, when it cannot be inferred (e.g., with custom feature functions provided)
- relations=nothing: Relations that the model uses to look for patterns; it can be a symbol in [:IA, :IA3, :IA7, :RCC5, :RCC8], where :IA stands for Allen's Interval Algebra (13 relations in 1D, 169 relations in 2D), :IA3 and :IA7 are coarser fragments with 3 and 7 relations, respectively, and :RCC5 and :RCC8 are Region Connection Calculus algebras with 5 and 8 topological operators, respectively. Relations from :IA, :IA3, and :IA7 capture directional aspects of the relative arrangement of two intervals in time (or rectangles in a 2D space), while relations from :RCC5 and :RCC8 only capture topological aspects and are therefore rotation- and flip-invariant. This hyper-parameter defaults to no relations for adimensional data, and to IA7 interval relations for 1- and 2-dimensional data.
- initconditions=nothing: Initial conditions for evaluating modal decisions at the root; it can be a symbol in [:startwithglobal, :startatcenter]. :startwithglobal forces the first decision to be a global decision (e.g., ⟨G⟩ (minimum[V2] ≥ 10), which translates to "there exists a region where the minimum of variable 2 is at least 10"). :startatcenter forces the first decision to be evaluated on the smallest central world, that is, the central value of a time-series, or the central pixel of an image. This hyper-parameter defaults to :startwithglobal.
- downsize=true: Whether to perform automatic downsizing by means of a moving average. In fact, this algorithm has high complexity (both time and space), and can only handle small time-series (< 100 points) and small images (< 10 × 10 pixels). When set to true, automatic downsizing is performed; when it is an NTuple of Integers, dimensional data is downsized to match that size.
- print_progress=false: Set to true for a progress bar
- post_prune=false: Set to true for post-fit pruning
- merge_purity_threshold=1.0: (Post-pruning) merge leaves having combined purity >= merge_purity_threshold
- display_depth=5: Max depth to show when displaying the tree(s)
- rng=Random.GLOBAL_RNG: Random number generator or seed
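For instance, a construction sketch overriding a few of the defaults listed above:
model = ModalDecisionTree(
    max_depth = 6,
    min_samples_leaf = 4,
    relations = :IA7,
    initconditions = :startwithglobal,
)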
Operations
- predict(mach, Xnew): return predictions of the target given features Xnew, having the same scitype as X above.
Fitted parameters
The fields of fitted_params(mach) are:
- model: the tree object, as returned by the core algorithm
- var_grouping: the adopted grouping of the features encountered in training, in an order consistent with the output of printmodel. The MLJ interface can currently deal with scalar, temporal and spatial features, but has one limitation, and one tricky procedure for handling them at the same time. The limitation is that temporal and spatial features must be uniform in size across the instances (the algorithm will automatically throw away features that do not satisfy this constraint). As for the tricky procedure: before the learning phase, features are divided into groups (referred to as modalities) according to each variable's channel size, that is, the size of the vector or matrix. For example, if X is multimodal, and has three temporal features :x, :y, :z with 10, 10 and 20 points, respectively, plus three spatial features :R, :G, :B with the same size of 5 × 5 pixels, the algorithm assumes that :x and :y share a temporal axis, that :R, :G, :B share two spatial axes, and that :z does not share any axis with any other variable. As a result, the model will group the features into three modalities:
  - {1} [:x, :y]
  - {2} [:z]
  - {3} [:R, :G, :B]
  and var_grouping will be [["x", "y"], ["z"], ["R", "G", "B"]].
Report
The fields of report(mach) are:
- printmodel: method to print a pretty representation of the fitted model, with a single argument, the tree depth. Interpreting the tree requires an understanding of how the current MLJ interface of ModalDecisionTrees.jl handles features of different modalities; see var_grouping above. Note that the split conditions (or decisions) in the tree are relativized to a specific modality, whose number is shown.
- var_grouping: the adopted grouping of the features encountered in training, in an order consistent with the output of printmodel. See var_grouping above.
- feature_importance_by_count: a simple count of the occurrences of each feature across the model, in an order consistent with var_grouping.
- classes_seen: list of target classes actually observed in training.
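A short usage sketch of the fitted parameters and report fields above (mach is a fitted machine, as in the Examples below):
fp = fitted_params(mach)
fp.model                          # the raw tree object
fp.var_grouping                   # e.g., [["x", "y"], ["z"], ["R", "G", "B"]]
r = report(mach)
r.printmodel(5)                   # pretty-print the tree up to depth 5
r.feature_importance_by_count     # per-feature occurrence counts
r.classes_seen                    # classes observed during training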
Examples
using MLJ
using ModalDecisionTrees
using Random
tree = ModalDecisionTree(min_samples_leaf=4)
# Load an example dataset (a temporal one)
X, y = ModalDecisionTrees.load_japanesevowels()
N = length(y)
mach = machine(tree, X, y)
# Split dataset
p = randperm(N)
train_idxs, test_idxs = p[1:round(Int, N*.8)], p[round(Int, N*.8)+1:end]
# Fit
fit!(mach, rows=train_idxs)
# Perform predictions, compute accuracy
yhat = predict_mode(mach, X[test_idxs,:])
accuracy = MLJ.accuracy(yhat, y[test_idxs])
# Access raw model
fitted_params(mach).model
report(mach).printmodel(3)
"{1} ⟨G⟩ (max[coefficient1] <= 0.883491) 3 : 91/512 (conf = 0.1777)
✔ {1} ⟨G⟩ (max[coefficient9] <= -0.157292) 3 : 89/287 (conf = 0.3101)
│✔ {1} ⟨L̅⟩ (max[coefficient6] <= -0.504503) 3 : 89/209 (conf = 0.4258)
││✔ {1} ⟨A⟩ (max[coefficient3] <= 0.220312) 3 : 81/93 (conf = 0.8710)
[...]
││✘ {1} ⟨L̅⟩ (max[coefficient1] <= 0.493004) 8 : 47/116 (conf = 0.4052)
[...]
│✘ {1} ⟨A⟩ (max[coefficient2] <= -0.285645) 7 : 41/78 (conf = 0.5256)
│ ✔ {1} min[coefficient3] >= 0.002931 4 : 34/36 (conf = 0.9444)
[...]
│ ✘ {1} ⟨G⟩ (min[coefficient5] >= 0.18312) 7 : 39/42 (conf = 0.9286)
[...]
✘ {1} ⟨G⟩ (max[coefficient3] <= 0.006087) 5 : 51/225 (conf = 0.2267)
✔ {1} ⟨D⟩ (max[coefficient2] <= -0.301233) 5 : 51/102 (conf = 0.5000)
│✔ {1} ⟨D̅⟩ (max[coefficient3] <= -0.123654) 5 : 51/65 (conf = 0.7846)
[...]
│✘ {1} ⟨G⟩ (max[coefficient9] <= -0.146962) 7 : 16/37 (conf = 0.4324)
[...]
✘ {1} ⟨G⟩ (max[coefficient9] <= -0.424346) 1 : 47/123 (conf = 0.3821)
✔ {1} min[coefficient1] >= 1.181048 6 : 39/40 (conf = 0.9750)
[...]
✘ {1} ⟨G⟩ (min[coefficient4] >= -0.472485) 1 : 47/83 (conf = 0.5663)
[...]"
ModalDecisionTrees.MLJInterface.ModalRandomForest — Type
ModalRandomForest
A model type for constructing a modal random forest, based on ModalDecisionTrees.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
ModalRandomForest = @load ModalRandomForest pkg=ModalDecisionTrees
Do model = ModalRandomForest() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in ModalRandomForest(sampling_fraction=...).
ModalRandomForest implements the standard random forest algorithm, originally published in Breiman, L. (2001), "Random Forests", Machine Learning, vol. 45, pp. 5–32, based on Modal C4.5. This classification and regression algorithm, originally presented in Manzella et al. (2021), "Interval Temporal Random Forests with an Application to COVID-19 Diagnosis" (10.4230/LIPIcs.TIME.2021.7), is an extension of the CART and C4.5 decision tree learning algorithms that leverages the expressive power of modal logics of time and space to perform temporal/spatial reasoning on non-scalar data, such as time-series and images.
The symbolic, probabilistic model is able to extract logical descriptions of the data in terms of logical formulas (see SoleLogics.jl) on atoms that are scalar conditions on the variables (or features); for example, min[V2] ≥ 10, that is, "the minimum of variable 2 is not less than 10". As such, the model is suitable for tasks that involve non-scalar data but require some level of interpretable and transparent modeling. At the moment, the only loss functions available are Shannon's entropy (classification) and variance (regression).
Training data
In MLJ or MLJBase, bind an instance model to data with mach = machine(model, X, y), where
- X: any table of input features (e.g., a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, OrderedFactor, or any 0-, 1-, or 2-dimensional array with elements of these scitypes; check column scitypes with schema(X)
- y: the target, which can be any AbstractVector whose element scitype is Multiclass, Continuous, Finite, or Textual; check the scitype with scitype(y)
Train the machine with fit!(mach).
Hyper-parameters
- ntrees=50: Number of trees in the forest
- sampling_fraction=0.7: Fraction of samples to train each tree on
- n_subrelations=identity: Number of relations to randomly select at any point of the tree. Must be a function of the number of the available relations. It defaults to identity, that is, consider all available relations.
- n_subfeatures=x -> ceil(Int64, sqrt(x)): Number of features to select at random at each node. Must be a function of the number of the available features. It defaults to x -> ceil(Int64, sqrt(x)), that is, consider only about the square root of the available features.
- max_depth=-1: Maximum depth of the decision trees (-1 = any)
- min_samples_leaf=1: Minimum number of samples required at each leaf
- min_purity_increase=-Inf: Minimum value for the loss function needed for a split
- max_purity_at_leaf=Inf: Maximum purity allowed at a leaf; a leaf reaching this purity is not split further
- max_modal_depth=-1: Maximum modal depth of the decision trees (-1 = any). When this depth is reached, only propositional decisions are taken.
- feature=[minimum, maximum]: Feature functions to be used by the trees to mine scalar conditions (e.g., minimum[V2] ≥ 10)
- featvaltype=Float64: Output type for feature functions, when it cannot be inferred (e.g., with custom feature functions provided)
- relations=nothing: Relations that the model uses to look for patterns; it can be a symbol in [:IA, :IA3, :IA7, :RCC5, :RCC8], where :IA stands for Allen's Interval Algebra (13 relations in 1D, 169 relations in 2D), :IA3 and :IA7 are coarser fragments with 3 and 7 relations, respectively, and :RCC5 and :RCC8 are Region Connection Calculus algebras with 5 and 8 topological operators, respectively. Relations from :IA, :IA3, and :IA7 capture directional aspects of the relative arrangement of two intervals in time (or rectangles in a 2D space), while relations from :RCC5 and :RCC8 only capture topological aspects and are therefore rotation- and flip-invariant. This hyper-parameter defaults to no relations for adimensional data, and to IA7 interval relations for 1- and 2-dimensional data.
- initconditions=nothing: Initial conditions for evaluating modal decisions at the root; it can be a symbol in [:startwithglobal, :startatcenter]. :startwithglobal forces the first decision to be a global decision (e.g., ⟨G⟩ (minimum[V2] ≥ 10), which translates to "there exists a region where the minimum of variable 2 is at least 10"). :startatcenter forces the first decision to be evaluated on the smallest central world, that is, the central value of a time-series, or the central pixel of an image. This hyper-parameter defaults to :startwithglobal.
- downsize=true: Whether to perform automatic downsizing by means of a moving average. In fact, this algorithm has high complexity (both time and space), and can only handle small time-series (< 100 points) and small images (< 10 × 10 pixels). When set to true, automatic downsizing is performed; when it is an NTuple of Integers, dimensional data is downsized to match that size.
- print_progress=false: Set to true for a progress bar
- post_prune=false: Set to true for post-fit pruning
- merge_purity_threshold=1.0: (Post-pruning) merge leaves having combined purity >= merge_purity_threshold
- display_depth=5: Max depth to show when displaying the tree(s)
- rng=Random.GLOBAL_RNG: Random number generator or seed
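For instance, a construction sketch using hyper-parameters from the list above:
model = ModalRandomForest(
    ntrees = 50,
    sampling_fraction = 0.7,
    n_subfeatures = x -> ceil(Int64, sqrt(x)),
    rng = 1,   # an integer seed is accepted in place of an RNG
)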
Operations
- predict(mach, Xnew): return predictions of the target given features Xnew, having the same scitype as X above. Predictions are probabilistic, but uncalibrated.
- predict_mode(mach, Xnew): instead return the mode of each prediction above.
Fitted parameters
The fields of fitted_params(mach) are:
- model: the forest object, as returned by the core algorithm
- var_grouping: the adopted grouping of the features encountered in training, in an order consistent with the output of printmodel. The MLJ interface can currently deal with scalar, temporal and spatial features, but has one limitation, and one tricky procedure for handling them at the same time. The limitation is that temporal and spatial features must be uniform in size across the instances (the algorithm will automatically throw away features that do not satisfy this constraint). As for the tricky procedure: before the learning phase, features are divided into groups (referred to as modalities) according to each variable's channel size, that is, the size of the vector or matrix. For example, if X is multimodal, and has three temporal features :x, :y, :z with 10, 10 and 20 points, respectively, plus three spatial features :R, :G, :B with the same size of 5 × 5 pixels, the algorithm assumes that :x and :y share a temporal axis, that :R, :G, :B share two spatial axes, and that :z does not share any axis with any other variable. As a result, the model will group the features into three modalities:
  - {1} [:x, :y]
  - {2} [:z]
  - {3} [:R, :G, :B]
  and var_grouping will be [["x", "y"], ["z"], ["R", "G", "B"]].
Report
The fields of report(mach) are:
- printmodel: method to print a pretty representation of the fitted model, with a single argument, the depth of the trees. Interpreting the trees requires an understanding of how the current MLJ interface of ModalDecisionTrees.jl handles features of different modalities; see var_grouping above. Note that the split conditions (or decisions) in the trees are relativized to a specific frame, whose number is shown.
- var_grouping: the adopted grouping of the features encountered in training, in an order consistent with the output of printmodel. See var_grouping above.
- feature_importance_by_count: a simple count of the occurrences of each feature across the model, in an order consistent with var_grouping.
- classes_seen: list of target classes actually observed in training.
Examples
using MLJ
using ModalDecisionTrees
using Random
forest = ModalRandomForest(ntrees = 50)
# Load an example dataset (a temporal one)
X, y = ModalDecisionTrees.load_japanesevowels()
N = length(y)
mach = machine(forest, X, y)
# Split dataset
p = randperm(N)
train_idxs, test_idxs = p[1:round(Int, N*.8)], p[round(Int, N*.8)+1:end]
# Fit
fit!(mach, rows=train_idxs)
# Perform predictions, compute accuracy
Xnew = X[test_idxs,:]
ynew = predict_mode(mach, Xnew) # point predictions
accuracy = MLJ.accuracy(ynew, y[test_idxs])
yhat = predict(mach, Xnew) # probabilistic predictions
pdf.(yhat, "1") # probabilities for one of the classes ("1")
# Access raw model
fitted_params(mach).model
report(mach).printmodel(3) # Note that the output here can be quite large.