Association rule mining with modal logic

Symbolic modal learning is a branch of machine learning which deals with training classical symbolic machine learning models (e.g., list and set of rules, decision trees, random forests, association rules, etc.) but substituting propositional logic with a more expressive logical formalism (yet, computationally more affordable than first order logic), that is, a specific kind of modal logic.

In the context of this package, modal logic helps us highlight complex relations hidden in data, especially in unstructured data, consisting of graph-like relational data, time series, spatial databases, text, etc. For more information about the modal symbolic learning, we suggest reading the main page of Sole.jl framework and SoleLogics.jl.

The idea is to discretize complex data into relational objects called Kripke models, each of which consists of many propositional models called worlds, and expliciting the relations between worlds. In this way, it is possible to mine complex Itemset, including a certain subset of Item that are true on a target world, but also modally enhanced items that are true on related worlds.

A picture is worth a thousand words. Here you are a slightly more complex example, with respect to the one at the top of Getting started section. We consider this monovariate time series:

Monovariate time series.

We want to encode a graph-like structure from the data above. We could think of various strategies, one of which is to consider every contiguous subsequence in the time series and model it as a set of intervals.

Monovariate time series sliced into intervals.

At this point, we can see every resulting blue signal as a propositional model, on which items may be evaluated as true or false. In the modal logic jargon, this is exactly a Kripke model.

Kripke model resulting from the image above.

After fixing a set of suitable relations, we express them with arcs in the structure. Without defining them, we graphically present some possible relations between intervals. The one below is the begins relation.

Begins relation.

Conversely, this one is the ends relation.

Ends relation.

When an interval is completely included in another one, then we say that it happens during the other one.

During relation.

Finally, we say that an interval comes just after another one if its end coincides with the beginning of the other one.

After relation.

Every relation $R$ can be declined in an existential or a universal way. In the former (latter) case, given an item $p$, we say that $<R>p$ is true on $w$ if at least one world (all worlds) $w'$ is such that $wRw'$ and $p$ is true on $w'$. Such relations can be encoded thanks to SoleLogics.jl; in particular, we use diamond(relation_name) to indicate an existential modality while box(relation_name) to indicate universal ones:

myitem = ScalarCondition(VariableDistance(1, [1,2,3]), <=, 1.0) |> diamond(IA_L)

Meaningfulness measures

We already introduced lsupport, gsupport, lconfidence and gconfidence in the Getting started section. Other measures that are already built into the package, are the following; note how their definition always depends on the notion of lsupport, and they are always organized in both local and global versions.

Local means that a measure is designed to be applied within a modal instance (a Kripke model), while global keywords denotes the fact that the computation is performed across instances.

ModalAssociationRules.lliftFunction
function llift(
    rule::ARule,
    ith_instance::LogicalInstance;
    miner::Union{Nothing,AbstractMiner}=nothing
)::Float64

Compute the local lift for the given rule.

Local lift measures how far from independence are rule's antecedent and consequent on a modal logic instance.

Given an ARule X ⇒ Y, if local lift value is around 1, then this means that P(X ⋃ Y) = P(X)P(Y), and hence, the two Itemsets X and Y are independant. If value is greater than (lower than) 1, then this means that X and Y are dependant and positively (negatively) correlated Itemsets.

If a miner is provided, then its internal state is updated and used to leverage memoization.

See also AbstractMiner, antecedent, ARule, glift, LogicalInstance, llift, Threshold.

source
ModalAssociationRules.lconvictionFunction
function lconviction(
    rule::ARule,
    ith_instance::LogicalInstance;
    miner::Union{Nothing,AbstractMiner}=nothing
)::Float64

Compute the local conviction for the given rule.

Conviction attempts to measure the degree of implication of a rule. It's value ranges from 0 to +∞. Unlike lift, conviction is sensitive to rule direction; like lift, values far from 1 indicate interesting rules.

If a miner is provided, then its internal state is updated and used to leverage memoization.

See also AbstractMiner, antecedent, ARule, LogicalInstance, llift, Threshold.

source

In general, we can define new meaningfulness measures by leveraging the following macros, ensuring to avoid solving repeated subproblems when computing the measure.

ModalAssociationRules.@localmeasureMacro
macro localmeasure(measname, measlogic)

Build a generic local meaningfulness measure, levering the optimizations provided by any AbstractMiner.

Arguments

  • measname: the name of the local measure you are defining (e.g., lsupport);
  • measlogic: a lambda function whose arguments are (itemset, data, ith_instance, miner) -

see the note below to know more about this.

Note

When defining a new local measure, you only need to write its essential logic through a lambda function (itemset, X, ith_instance, miner).

In particular, itemset is an Itemset, X is a reference to the dataset, ith_instance is an integer defining on which instance you want to compute the measure, and miner is the AbstractMiner in which you want to save the measure.

Also, miner argument can be used to leverage its miningstate structure. A complete example of the logic behind local support is shown below:

_lsupport_logic = (itemset, X, ith_instance, miner) -> begin
    # vector representing on which world an Itemset holds
    wmask = [
        check(formula(itemset), X, ith_instance, w) for w in allworlds(X, ith_instance)]

    # return the result enriched with more informations, that will eventually will be
    # used if miner's miningstate has specific fields (e.g., :worldmask).
    return Dict(
        :measure => count(wmask) / length(wmask),
        :worldmask => wmask,
    )
end

See also AbstractMiner, hasminingstate, lsupport, miningstate.

source
ModalAssociationRules.@globalmeasureMacro
macro globalmeasure(measname, measlogic)

Build a generic global meaningfulness measure, levering the optimizations provided by any AbstractMiner.

Arguments

  • measname: the name of the global measure you are defining (e.g., gsupport);
  • measlogic: a lambda function whose arguments are (rule, X, threshold, miner) - see the

note below to know more about this.

Note

When defining a new global measure, you only need to write its essential logic through a lambda function (itemset, X, ith_instance, miner).

In particular, itemset is an Itemset, X is a reference to the dataset and miner is the AbstractMiner in which you want to save the measure.

Also, miner argument can be used to leverage its miningstate structure. A complete example of the logic behind global support is shown below:

_gsupport_logic = (itemset, X, threshold, miner) -> begin
    _measure = sum([
        lsupport(itemset, getinstance(X, ith_instance), miner) >= threshold
        for ith_instance in 1:ninstances(X)
    ]) / ninstances(X)

    # at the moment, no `miningstate` fields in miner are leveraged
    return Dict(:measure => _measure)
end

See also AbstractMiner, hasminingstate, gsupport, miningstate.

source

You can identify which is the local (global) counterpart of a global (local) meaningfulness measure with the following utility dispatches.

ModalAssociationRules.islocalofMethod
islocalof(::Function, ::Function)::Bool

Twin method of isglobalof.

Trait to indicate that a local meaningfulness measure is used as subroutine in a global measure.

For example, islocalof(lsupport, gsupport) is true, and isglobalof(gsupport, lsupport) is false.

Warning

When implementing a custom meaningfulness measure, make sure to implement both islocalof/isglobalof and localof/globalof. This is fundamental to guarantee the correct behavior of some methods, such as getlocalthreshold. Alternatively, you can simply use the macro @linkmeas.

See also getlocalthreshold, gsupport, isglobalof, linkmeas, lsupport.

source

Of course, you can even link your measures local-to-global and viceversa. See also findmeasure.

Mining Algorithms

The core of every association rule mining workflow consists of extracting the frequent patterns from data. To explore the search space of all such patterns, we can use a "level-wise" approach, based on breadth-first search algorithm, called apriori, or a "vertical data format" approach, based on a depth-first search, called eclat.

Missing docstring.

Missing docstring for apriori. Check Documenter's build log for details.

ModalAssociationRules.eclatFunction
eclat(miner::Miner, X::MineableData; verbose::Bool=true)::Nothing

Examples

julia> using ModalAssociationRules julia> Xdf, y = loadNATOPS(); julia> Xdf1havecommand = Xdf[1:30, :] julia> X1 = scalarlogiset(Xdf1have_command)

julia> manualp = Atom(ScalarCondition(VariableMin(1), >, -0.5)) julia> manualq = Atom(ScalarCondition(VariableMin(2), <=, -2.2)) julia> manualr = Atom(ScalarCondition(VariableMin(3), >, -3.6)) julia> manuallp = box(IAL)(manualp) julia> manuallq = diamond(IAL)(manualq) julia> manuallr = box(IAL)(manualr) julia> julia> manualitems = Vector{Item}([manualp, manualq, manualr, manuallp, manuallq, manual_lr])

julia> 1items = Vector{Item}([manualp, manualq, manuallp, manuallq]) julia> 1itemsetmeasures = [(gsupport, 0.1, 0.1)] julia> 1rulemeasures = [(gconfidence, 0.2, 0.2)]

julia> eclatminer = Miner(X1, eclat, _1items, 1itemsetmeasures, 1rulemeasures)

julia> mine!(eclat_miner)

See also anchored_eclat, Miner.

source

We suggest to run fpgrowth, as it is the most performant algorithm among those implemented at the moment of writing. It is based on a compression strategy, based on iteratively "projecting" the initial dataset; projections are particular slicings of the initial dataset, keeping only the information needed for continuining the mining process.

Missing docstring.

Missing docstring for fpgrowth. Check Documenter's build log for details.

Missing docstring.

Missing docstring for FPTree. Check Documenter's build log for details.

Mining Policies

It is possible to limit the action of the mining, to force an AbstractMiner to only consider a subset of the available data.

Missing docstring.

Missing docstring for worldfilter. Check Documenter's build log for details.

We can also constrain the generation of new itemsets and rules by defining a vector of policies. Every policy is a closure returning a custom parameterized checker for a certain constraint.

For what regards itemsets, the following dispatches are available:

Missing docstring.

Missing docstring for itemset_policies. Check Documenter's build log for details.

Missing docstring.

Missing docstring for islimited_length_itemset. Check Documenter's build log for details.

Missing docstring.

Missing docstring for isanchored_itemset. Check Documenter's build log for details.

Missing docstring.

Missing docstring for isdimensionally_coherent_itemset. Check Documenter's build log for details.

The following are referred to association rules:

Missing docstring.

Missing docstring for arule_policies. Check Documenter's build log for details.

Missing docstring.

Missing docstring for islimited_length_arule. Check Documenter's build log for details.

Missing docstring.

Missing docstring for isanchored_arule. Check Documenter's build log for details.

Missing docstring.

Missing docstring for isheterogeneous_arule. Check Documenter's build log for details.

To apply the policies, call the following.

Missing docstring.

Missing docstring for Base.filter!(targets::Vector{Union{ARule,Itemset}}, policies_pool::Vector{Function}). Check Documenter's build log for details.

Anchored semantics

To ensure the mining process is fair when dealing with modal operators, we must ensure that the miner is compliant with anchored semantics constraints.

To understand why the fairness of the frequent pattern extraction process is not guaranteed in the modal scenario, we suggest reading this short paper.

Missing docstring.

Missing docstring for isanchored_miner. Check Documenter's build log for details.

Missing docstring.

Missing docstring for anchored_apriori. Check Documenter's build log for details.

Missing docstring.

Missing docstring for anchored_fpgrowth. Check Documenter's build log for details.

Missing docstring.

Missing docstring for anchored_eclat. Check Documenter's build log for details.

Each algorithm above is simply a small wrapper around anchored_semantics:

Missing docstring.

Missing docstring for anchored_semantics. Check Documenter's build log for details.

Utilities

The following utilities often involve performing some combinatoric trick between Itemsets and ARules, and might be useful to avoid reinventing the wheel.

Missing docstring.

Missing docstring for combine_items. Check Documenter's build log for details.

Missing docstring.

Missing docstring for grow_prune. Check Documenter's build log for details.

Missing docstring.

Missing docstring for anchored_grow_prune. Check Documenter's build log for details.

These are three common types definition appearing during modal association rule mining.

We can enrich the MiningState of an AbstractMiner by using the following trait, depending on the specific algorithm and the kind of data we want to handle.

Missing docstring.

Missing docstring for initminingstate(::typeof(fpgrowth), ::MineableData). Check Documenter's build log for details.