Getting started

In this introductory section, you will learn about the main building blocks of ModalAssociationRules.jl. Also if a good picture about association rule mining (ARM, from now onwards) is given during the documentation, to make the most out of this guide we suggest reading the following articles:

The above introduce two important algorithms, which are also built-in into this package. Moreover, the latter one is the state-of-the-art in the field of ARM.

Further on in the documentation, the potential of ModalAssociationRules.jl will emerge: this package's raison d'être is to generalize the already existing ARM algorithms to modal logics, which are more expressive than propositional ones and computationally less expensive than first-order logic. If you are new to Sole.jl and you want to learn more about modal logic, please have a look at SoleLogics.jl for a general overview on the topic, or follow this documentation and return to this link if needed.

Core definitions

One Item is just a logical formula, which can be interpreted by a certain model. At the moment, here, we don't care about how models are represented by Sole.jl under the hood, or how the checking algorithm works: what matters is that Items are manipulated by ARM algorithms, which try to find which conjunctions between items are most statistically significant.

ModalAssociationRules.ItemsetType
const Itemset{I<:Item} = Vector{I}

Vector collecting multiple Items.

Semantically, Itemsets represent Items (that is, formulas) conjunctions.

Note

In the context of association rule mining, interesting itemsets are manipulated to discover interesting relations between Items, in the form of association rules (ARule).

Interestingness is established through a set of MeaningfulnessMeasure.

Details

Itemsets are implemented as a vector for two reasons:

  1. lookup is faster

when the collection is small (an itemset is unlikely to consist of more than 100 items);

  1. most of the time, we want to keep an ordering between items while searching for

interesting itemsets.

Examples

julia> p = ScalarCondition(VariableMin(1), >, 1.0)  |> Atom |> Item
min[V1] > 1.0
julia> q = ScalarCondition(VariableMin(2), >=, 0.0) |> Atom |> Item
min[V2] ≥ 0.0

julia> pq = Itemset([p,q])
julia> qp = Itemset([q,p])

julia> pq == qp
true
julia> pq === qp
false

julia> r = ScalarCondition(VariableMax(3), <=, 2.0) |> Atom |> Item
max[V3] ≤ 2.0
julia> pqr = [pq; r];

julia> pq in pqr
true

julia> formula(pqr) |> syntaxstring
"(min[V1] > 1.0) ∧ (min[V2] ≥ 0.0) ∧ (max[V3] ≤ 2.0)"

See also ARule, formula, gsupport, Item, lsupport, MeaningfulnessMeasure.

source

Notice that one Itemset could be a set, but actually it is a vector: this is because, often, ARM algorithms need to establish an order between items in itemsets to work efficiently. To convert an Itemset in its conjunctive normla form we simply call formula.

In general, an Itemset behaves exactly like you would expect a Vector{Item} would do. At the end of the day, the only difference is that manipulating an Itemset, for example through push! or union, guarantees the wrapped items always keep the same sorting.

Enough about Itemsets. Our final goal is to produce association rules.

ModalAssociationRules.ARuleType
const ARule = Tuple{Itemset,Itemset}

An association rule represents a frequent and meaningful co-occurrence relationship of the form "X ⇒ Y", between two Itemsets X and Y, where X ∩ Y = ∅, respectively called antecedent and consequent.

Note

Extracting all the ARule "hidden" in the data is the main purpose of association rule mining (ARM).

Given an itemset Z containing atleast two Items (|Z| ≥ 2), it can be partitioned in two (sub)itemsets X and Y; trying all the possible binary partitioning of Z is a systematic way to generate ARules.

The general framework always followed by ARM techniques is to, firstly, generate all the frequent itemsets considering a set of MeaningfulnessMeasure specifically tailored to work with Itemsets.

Thereafter, all the association rules are generated by testing all the combinations of frequent itemsets against another set of MeaningfulnessMeasure, this time designed to capture how interesting a rule is.

See also antecedent, consequent, gconfidence, Itemset, lconfidence, MeaningfulnessMeasure.

source

To print an ARule enriched with more informations (at the moment, this is everything we need to know), we can use the following.

Sometimes we could be interested in writing a function that consider a generic entity obtained through an association rule mining algorithm (frequent itemsets and, of course, association rules). Think about a dictionary mapping some extracted pattern to metadata. We call that generic entity "an ARM subject", and the following union type comes in help.

Measures

To establish when an ARMSubject is interesting, we need meaningfulness measures.

ModalAssociationRules.MeaningfulnessMeasureType
const MeaningfulnessMeasure = Tuple{Function, Threshold, Threshold}

To fully understand this description, we suggest reading this article.

In the classic propositional case scenario, we can think each instance as a propositional interpretation, or equivalently, as a Kripke frame containing only one world. In this setting, a meaningfulness measure indicates how many times a specific property of an Itemset (or an ARule) is satisfied.

The most important meaningfulness measure is support, defined as "the number of instances in a dataset that satisfy an itemset" (it is defined similarly for association rules, where we consider the itemset obtained by combining both rule's antecedent and consequent). Other meaningfulness measures can be defined in function of support.

In the context of modal logic, where the instances of a dataset are relational objects called Kripke frames, every meaningfulness measure must capture two aspects: how much an Itemset or an ARule is meaningful within an instance, and how much the same object is meaningful across all the instances, that is, how many times it resulted meaningful within an instance. Note that those two aspects coincide in the propositional scenario.

When a meaningfulness measure is applied locally within an instance, it is said to be "local". Otherwise, it is said to be "global". For example, local support is defined as "the number of worlds within an instance, which satisfy an itemset". To define global support we need to define a minimum local support threshold sl which is a real number between 0 and 1. Now, we can say that global support is "the number of instances for which local support overpassed the minimum local support threshold".

As in the propositional setting, more meaningfulness measures can be defined starting from support, but now they must respect the local/global dichotomy.

We now have all the ingredients to understand this type definition. A MeaningfulnessMeasure is a tuple composed of a global meaningfulness measure, a local threshold used internally, and a global threshold we would like our itemsets (rules) to overpass.

See also gconfidence, gsupport, lsupport, lconfidence.

source
ModalAssociationRules.islocalofMethod
islocalof(::Function, ::Function)::Bool

Twin method of isglobalof.

Trait to indicate that a local meaningfulness measure is used as subroutine in a global measure.

For example, islocalof(lsupport, gsupport) is true, and isglobalof(gsupport, lsupport) is false.

Warning

When implementing a custom meaningfulness measure, make sure to implement both islocalof/isglobalof and localof/globalof. This is fundamental to guarantee the correct behavior of some methods, such as getlocalthreshold. Alternatively, you can simply use the macro @linkmeas.

See also getlocalthreshold, gsupport, isglobalof, linkmeas, lsupport.

source

The following are little data structures which will return useful later, when you will read about how a dataset is mined, looking for ARMSubjects.

What follows is a list of the already built-in meaningfulness measures. In the Hands on section you will learn how to implement your own measure. More information are available in the Modal generalization section.

ModalAssociationRules.gsupportFunction
function gsupport(
    itemset::Itemset,
    X::SupportedLogiset,
    threshold::Threshold;
    miner::Union{Nothing,AbstractMiner}=nothing
)::Float64

Compute the global support for the given itemset on a logiset X, considering threshold as the threshold for the local support called internally.

Global support is the ratio between the number of LogicalInstances in a SupportedLogiset for which the local support, lsupport, is greater than a Threshold, and the total number of instances in the same logiset.

If a miner is provided, then its internal state is updated and used to leverage memoization.

See also Miner, lsupport, LogicalInstance, Itemset, SupportedLogiset, Threshold.

source
ModalAssociationRules.lconfidenceFunction
function lconfidence(
    rule::ARule,
    ith_instance::LogicalInstance;
    miner::Union{Nothing,AbstractMiner}=nothing
)::Float64

Compute the local confidence for the given rule.

Local confidence is the ratio between lsupport of an ARule on a LogicalInstance and the lsupport of the antecedent of the same rule.

If a miner is provided, then its internal state is updated and used to leverage memoization.

See also AbstractMiner, antecedent, ARule, LogicalInstance, lsupport, Threshold.

source
ModalAssociationRules.gconfidenceFunction
function gconfidence(
    rule::ARule,
    X::SupportedLogiset,
    threshold::Threshold;
    miner::Union{Nothing,AbstractMiner}=nothing
)::Float64

Compute the global confidence for the given rule on a logiset X, considering threshold as the threshold for the global support called internally.

Global confidence is the ratio between gsupport of an ARule on a SupportedLogiset and the gsupport of the only antecedent of the same rule.

If a miner is provided, then its internal state is updated and used to leverage memoization.

See also antecedent, ARule, AbstractMiner, gsupport, SupportedLogiset.

source

Mining structures

Finally, we are ready to start mining. To do so, we can create a custom AbstractMiner type.

ModalAssociationRules.AbstractMinerType

Any entity capable of perform association rule mining.

Interface

Each new concrete miner structure must define the following getters and setters. Actually, depending on its purposes, a structure may partially implement these dispatches. For example, Miner does completely implement the interface while Bulldozer does not.

  • data(miner::AbstractMiner)

  • items(miner::AbstractMiner)

  • algorithm(miner::AbstractMiner)

  • freqitems(miner::AbstractMiner)

  • arules(miner::AbstractMiner)

  • itemsetmeasures(miner::AbstractMiner)

  • arulemeasures(miner::AbstractMiner)

  • localmemo(miner::AbstractMiner)

  • localmemo!(miner::AbstractMiner)

  • globalmemo(miner::AbstractMiner)

  • globalmemo!(miner::AbstractMiner)

  • worldfilter(miner::AbstractMiner)

  • itemset_policies(miner::AbstractMiner)

  • arule_policies(miner::AbstractMiner)

  • miningstate(miner::AbstractMiner)

  • miningstate!(miner::AbstractMiner)

  • info(miner::AbstractMiner)

See also Miner, Bulldozer.

source

The main implementation of such an interface is embodied by the Miner object. To mine using a Miner, we just need to specify which dataset we are working with, together with a mining function, a vector of initial Items, and the MeaningfulnessMeasures to establish ARMSubject interestingness.

ModalAssociationRules.MinerType
struct Miner{
    D<:MineableData,
    I<:Item
} <: AbstractMiner
    X::D                            # target dataset

    algorithm::Function             # algorithm used to perform extraction

    items::Vector{I}                # alphabet

    # meaningfulness measures
    itemset_constrained_measures::Vector{<:MeaningfulnessMeasure}
    arule_constrained_measures::Vector{<:MeaningfulnessMeasure}

    freqitems::Vector{Itemset}      # collected frequent itemsets
    arules::Vector{ARule}           # collected association rules

    localmemo::LmeasMemo            # local memoization structure
    globalmemo::GmeasMemo           # global memoization structure

    worldfilter::Union{Nothing,WorldFilter} # metarules about world filterings
    itemset_policies::Vector{<:Function}    # metarules about itemsets mining
    arule_policies::Vector{<:Function}      # metarules about arules mining

    miningstate::MiningState        # mining algorithm miningstate (see documentation)

    info::Info                      # general informations

    # locks on memoization and miningstate structures
    lmemolock::ReentrantLock
    gmemolock::ReentrantLock
    miningstatelock::ReentrantLock
end

Concrete AbstractMiner containing both the data, the logic and the parameterization to perform association rule mining in the modal setting.

Examples

julia> using ModalAssociationRules
julia> using SoleData

# Load NATOPS DataFrame
julia> X_df, y = load_NATOPS();

# Convert NATOPS DataFrame to a Logiset
julia> X = scalarlogiset(X_df)

# Prepare some propositional atoms
julia> p = Atom(ScalarCondition(VariableMin(1), >, -0.5))
julia> q = Atom(ScalarCondition(VariableMin(2), <=, -2.2))
julia> r = Atom(ScalarCondition(VariableMin(3), >, -3.6))

# Prepare modal atoms using later relationship - see [`SoleLogics.IntervalRelation`](@ref))
julia> lp = box(IA_L)(p)
julia> lq = diamond(IA_L)(q)
julia> lr = box(IA_L)(r)

# Compose a vector of items, regrouping the atoms defined before
julia> my_alphabet = Vector{Item}([p, q, r, lp, lq, lr])

# Establish which meaningfulness measures you want to define the notion of itemset and
# association rule holding on an instance and on a modal dataset
julia> my_itemsetmeasures = [(gsupport, 0.1, 0.1)]
julia> my_rulemeasures = [(gconfidence, 0.1, 0.1)]

# (optional) Establish a filter to iterate the worlds in a generic modal instance
julia> my_worldfilter = SoleLogics.FunctionalWorldFilter(
        x -> length(x) >= 3 && length(x) <= 10, Interval{Int}
    )

# (optional) Establish a policy to further restrict itemsets that can be considered frequent
julia> my_itemset_policies = [islimited_length_itemset()]

# (optional) Establish a policy to further restrict rules that can be considered
# association rules
julia> my_arule_policies = [
        islimited_length_arule(), isanchored_arule(), isheterogeneous_arule()
    ]

# Create an association rule miner wrapping `fpgrowth` algorithm - see [`fpgrowth`](@ref);
julia> miner = Miner(X, fpgrowth, my_alphabet,
        my_itemsetmeasures, my_rulemeasures,
        worldfilter=my_worldfilter,
        itemset_policies=my_itemset_policies,
        arule_policies=my_arule_policies
    )

# We mine using mine!
# (optional) We could pass kwargs to forward them to the mining algorithm
julia> mine!(miner)

# Print all the mined association rules
julia> for arule in generaterules(miner)
    println(arule)
end

See also ARule, Bulldozer, MeaningfulnessMeasure, Info, isanchored_arule, isheterogeneous_arule, islimited_length_arule(), islimited_length_itemset(), Item, Itemset, GmeasMemo, LmeasMemo, MiningState, SoleLogics.WorldFilter.

source

Let us see which getters and setters are available for Miner.

After a Miner ends mining (we will see how to mine in a second), frequent Itemsets and ARule are accessible through the getters below.

To start the mining algorithm, simply call the following:

The mining call returns an ARule generator. Since the extracted rules could be several, it's up to you to collect all the rules in a step or arule_analysis them lazily, collecting them one at a time. You can also call the mining function ignoring it's return value, and then generate the rules later by calling the following.

During both the mining and the rules generation phases, the values returned by MeaningfulnessMeasure applied on a certain ARMSubject are saved (memoized) inside the Miner. Thanks to the methods hereafter, a Miner can avoid useless recomputations.

Miner customization

A Miner also contains two fields to keep additional information, those are info and miningstate.

The info field in Miner is a dictionary used to store extra information about the miner, such as statistics about mining. Currently, since the package is still being developed, the info field only contains a flag indicating whether the miner has been used for mining or not.

When writing your own mining algorithm, or when mining with a particular kind of dataset, you might need to specialize the Miner, keeping, for example, custom metadata and data structures. To specialize a Miner, you can fill a MiningState structure to fit your needs.

Parallelization

To support parallel mining, we provide a Bulldozer miner, that is, a lightweight copy of Miner which mines a specific section of the data in its own thread.

ModalAssociationRules.BulldozerType
struct Bulldozer{
    I<:Item,
    IMEAS<:MeaningfulnessMeasure
} <: AbstractMiner
    # data mineable by the Bulldozer
    data::D

    # original instance ids associated with the current slice of data
    # if this is 5:10, this this means that the first instance of the slice is
    # the original fifth and so on.
    instancesrange::UnitRange{<:Integer}

    # alphabet
    items::Vector{I}

    # measures associated with mined itemsets
    itemsetmeasures::Vector{<:MeaningfulnessMeasure}

    # meaningfulness measures memoization structure
    localmemo::LmeasMemo

    # special fields related to mining algorithms
    worldfilter::Union{Nothing,WorldFilter}
    itemset_policies::Vector{<:Function}
    miningstate::MiningState

    # locks on data, memoization structure and miningstate structure
    datalock::ReentrantLock
    memolock::ReentrantLock
    miningstatelock::ReentrantLock
}

Concrete AbstractMiner specialized to mine a single modal instance.

Bulldozer's interface is similar to Miner's one, but contains only the essential fields necessary to work locally within a Kripke structure, and is designed to be thread-safe.

Note

Bulldozers are designed to easily implement multi-threaded mining algorithms. When doing so, you can use a monolithic miner structure to collect the initial parameterization, map the computation on many bulldozers, each of which can be easily constructed from the miner itself, and then reduce the results together.

See also AbstractMiner, Miner.

source
ModalAssociationRules.miningstateMethod
miningstate(bulldozer::Bulldozer)::MiningState
miningstate(bulldozer::Bulldozer, key::Symbol)::Any
miningstate(bulldozer::Bulldozer, key::Symbol, inner_key)::Any

Getter for the customizable dictionary wrapped by a Bulldozer.

See also [miningstate!(::Bulldozer)].

source