Getting started

In this introductory section you will learn about the main building blocks of ModalAssociationRules.jl. Although this documentation gives a good picture of association rule mining (ARM, from now on), to make the most of this guide we suggest reading the following articles:

The articles above introduce two important algorithms, both of which are built into this package. Moreover, the latter is the state of the art in ARM.

Further on in the documentation, the potential of ModalAssociationRules.jl will emerge: this package's raison d'être is to generalize the existing ARM algorithms to modal logics, which are more expressive than propositional logic and computationally less expensive than first-order logic. If you are new to Sole.jl and want to learn more about modal logic, please have a look at SoleLogics.jl for a general overview of the topic, or follow this documentation and return to that link if needed.

Core definitions

An Item is just a logical formula that can be interpreted by a certain model. For now, we do not care how models are represented by Sole.jl under the hood, or how the checking algorithm works: what matters is that Items are manipulated by ARM algorithms, which try to find which conjunctions of items are most statistically significant.
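To fix ideas without relying on Sole.jl internals, here is a plain-Julia sketch in which an item is modeled as a Boolean predicate evaluated on a single world, and an itemset holds when the conjunction of its items holds. ToyWorld and holds are hypothetical names; real Items are SoleLogics formulas checked through SoleLogics.check.

```julia
# Toy stand-ins (hypothetical, not SoleLogics types): a world maps variable
# names to values, and an item is a Boolean predicate evaluated on a world.
const ToyWorld = Dict{Symbol,Float64}

# An itemset holds in a world iff the conjunction of its items holds there.
holds(itemset::AbstractVector, w::ToyWorld) = all(item -> item(w), itemset)

p = w -> w[:x] > -0.5     # in the spirit of "min of variable 1 > -0.5"
q = w -> w[:y] <= -2.2    # in the spirit of "min of variable 2 <= -2.2"

w1 = ToyWorld(:x => 0.3, :y => -3.0)
w2 = ToyWorld(:x => -1.0, :y => -3.0)

holds([p, q], w1)   # true: both items hold
holds([p, q], w2)   # false: p does not hold
```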

ModalAssociationRules.Item (Type)
const Item = SoleLogics.Formula

Fundamental type in the context of association rule mining. An Item is a logical formula that can be checked on models (see SoleLogics.check).

The purpose of association rule mining (ARM) is to discover interesting relations between Items, grouped into Itemsets, to generate association rules (ARule).

Interestingness is established through a set of MeaningfulnessMeasures.

See also SoleLogics.check, gconfidence, lsupport, MeaningfulnessMeasure, SoleLogics.Formula.

ModalAssociationRules.Itemset (Type)
struct Itemset
    items::Vector{Item}
end

Collection of unique Items.

Given a MeaningfulnessMeasure meas and a threshold t to be exceeded, an itemset itemset is said to be meaningful with respect to meas if and only if meas(itemset) > t.

Generally speaking, the meaningfulness (or interestingness) of an itemset is directly correlated with its frequency in the data: intuitively, when a pattern recurs in the data, it is a candidate to be interesting.

Every association rule mining algorithm aims to find frequent itemsets by applying meaningfulness measures such as local and global support (lsupport and gsupport, respectively).

Frequent itemsets are then used to generate association rules (ARule).

Note

Despite being implemented as a vector, an Itemset behaves like a set. Lookup is faster, and keeping the items internally sorted is essential for mining algorithms to work.

In other words, it is guaranteed that if two Itemsets are created with the same content, regardless of the order of their items, their hash is the same.

See also ARule, gsupport, Item, lsupport, MeaningfulnessMeasure.


Notice that an Itemset could have been a set, but it is actually a vector: ARM algorithms often need to establish an order among the items of an itemset to work efficiently. To convert an Itemset into a single conjunctive formula, we simply call toformula.

In general, an Itemset behaves exactly as you would expect a Vector{Item} to behave. The only difference is that manipulating an Itemset, for example through push! or union, guarantees that the wrapped items always stay sorted.
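The sorted-vector idea can be sketched in plain Julia. ToyItemset below is hypothetical and is not the package's Itemset. It uses Strings as items so that they can be ordered with isless, and it only illustrates how sorting on construction, push! and union makes insertion order irrelevant for equality and hashing.

```julia
# Hypothetical sorted, duplicate-free itemset; items are Strings for simplicity.
struct ToyItemset
    items::Vector{String}
    # Sort and deduplicate once, at construction time.
    ToyItemset(items::Vector{String}) = new(sort!(unique(items)))
end

# Equality and hashing only look at the (sorted) content.
Base.:(==)(a::ToyItemset, b::ToyItemset) = a.items == b.items
Base.hash(s::ToyItemset, h::UInt) = hash(s.items, hash(:ToyItemset, h))

# Membership via binary search, possible because items are kept sorted.
Base.in(item::String, s::ToyItemset) = insorted(item, s.items)

# Insert while preserving order; duplicates are ignored.
function Base.push!(s::ToyItemset, item::String)
    i = searchsortedfirst(s.items, item)
    if i > length(s.items) || s.items[i] != item
        insert!(s.items, i, item)
    end
    return s
end

# The union of two itemsets is re-sorted by the constructor.
Base.union(a::ToyItemset, b::ToyItemset) = ToyItemset(vcat(a.items, b.items))

a = ToyItemset(["q", "p", "r"])
b = ToyItemset(["r", "p", "q", "p"])
a == b && hash(a) == hash(b)   # true
```

In this sketch, lookup costs a binary search instead of a linear scan, at the price of insertions that shift elements.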

Enough about Itemsets. Our final goal is to produce association rules.

ModalAssociationRules.ARule (Type)
const ARule = Tuple{Itemset,Itemset}

An association rule represents a strong and meaningful co-occurrence relationship between two Itemsets, called antecedent and consequent, whose intersection is empty.

Extracting all the ARules "hidden" in the data is the main purpose of ARM.

The general framework followed by ARM techniques is to first generate all the frequent itemsets, using a set of MeaningfulnessMeasures specifically tailored to work with Itemsets. Then, association rules are generated by testing all the combinations of frequent itemsets against another set of MeaningfulnessMeasures, this time designed to capture how "reliable" a rule is.

See also antecedent, consequent, gconfidence, Itemset, lconfidence, MeaningfulnessMeasure.

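The two-phase framework can be illustrated on a toy propositional dataset in which each instance is simply the set of items that hold in it. This sketch assumes the classic propositional definitions of support and confidence, where the confidence of X ⇒ Y is support(X ∪ Y) / support(X). These are not the package's lconfidence/gconfidence. The hypothetical rules_from enumerates every split of one frequent itemset into a non-empty antecedent and a disjoint, non-empty consequent.

```julia
# Toy propositional dataset: each instance is the set of items true in it.
data = [Set([:p, :q, :r]), Set([:p, :q]), Set([:p, :r]), Set([:q, :r]), Set([:p, :q, :r])]

# Classic propositional support: fraction of instances containing every item.
support(items, data) = count(t -> issubset(items, t), data) / length(data)

# Classic confidence of the rule ant => cons.
confidence(ant, cons, data) = support(union(ant, cons), data) / support(ant, data)

# Hypothetical helper: split a frequent itemset into (antecedent, consequent)
# pairs and keep only the rules whose confidence reaches `minconf`.
function rules_from(itemset::Vector{Symbol}, data, minconf)
    rules = Tuple{Vector{Symbol},Vector{Symbol}}[]
    n = length(itemset)
    for mask in 1:(2^n - 2)                      # every non-empty proper subset
        ant = [itemset[i] for i in 1:n if ((mask >> (i - 1)) & 1) == 1]
        cons = setdiff(itemset, ant)             # disjoint from ant by construction
        if confidence(ant, cons, data) >= minconf
            push!(rules, (ant, cons))
        end
    end
    return rules
end

rules_from([:p, :q], data, 0.7)   # [([:p], [:q]), ([:q], [:p])], confidence about 0.75 each
```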

To print an ARule enriched with more information (for now, this is all we need to know), we can use the following.

Sometimes we might want to write a function that considers a generic entity obtained through an association rule mining algorithm (frequent itemsets and, of course, association rules). Think about a dictionary mapping some extracted patterns to metadata. We call such a generic entity "an ARM subject", and the following union type comes in handy.
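A union type lets a single function accept both kinds of extracted patterns while dispatch can still tell them apart. The sketch below models itemsets and rules with plain vectors and tuples. SItemset, SRule, SSubject, kind and describe are hypothetical names that mirror the role of Itemset, ARule and ARMSubject, not their actual definitions.

```julia
# Hypothetical stand-ins: an itemset is a vector of items, a rule a pair of itemsets.
const SItemset = Vector{Symbol}
const SRule = Tuple{SItemset,SItemset}

# One name for "anything an ARM algorithm can extract".
const SSubject = Union{SItemset,SRule}

# Dispatch still distinguishes the two cases when needed.
kind(::SItemset) = "itemset"
kind(::SRule) = "rule"

# A dictionary mapping extracted patterns (of either kind) to some metadata.
meta = Dict{SSubject,Float64}([:p, :q] => 0.6, ([:p], [:q]) => 0.75)

describe(s::SSubject) = "$(kind(s)) with value $(meta[s])"

describe([:p, :q])        # "itemset with value 0.6"
describe(([:p], [:q]))    # "rule with value 0.75"
```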

Measures

To establish when an ARMSubject is interesting, we need meaningfulness measures.

ModalAssociationRules.MeaningfulnessMeasure (Type)
const MeaningfulnessMeasure = Tuple{Function, Threshold, Threshold}

In the classic propositional scenario, where each instance of a Logiset consists of a single world (i.e., it is a propositional interpretation), a meaningfulness measure is simply a function that measures how many times a property of an Itemset or an ARule holds across all the instances of the dataset.

In the context of modal logic, where the instances of a dataset are relational objects, every meaningfulness measure must capture two aspects: how meaningful an Itemset or an ARule is inside a single instance, and how meaningful the same object is across all the instances.

For this reason, we can think of a meaningfulness measure as a matryoshka composed of an external global measure and an internal local measure. The global measure tests in how many instances the local measure exceeds a local threshold. At the end of the process, a global threshold is used to establish whether the global measure is actually meaningful. (Note that this generalizes the propositional scenario, where it is enough to apply a single measure across instances.)

Therefore, a MeaningfulnessMeasure is a tuple composed of a global meaningfulness measure, a local threshold and a global threshold.

See also gconfidence, gsupport, lconfidence, lsupport.

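The matryoshka structure can be sketched in plain Julia under simplifying assumptions. Each instance is a vector of worlds, and each world is the set of items true there. The local measure is the fraction of an instance's worlds where the whole itemset holds. The global measure is the fraction of instances whose local value exceeds the local threshold. lsup, gsup and meaningful are hypothetical names, not the package's lsupport and gsupport, and the strict comparisons follow the meas(itemset) > t convention of the Itemset docstring.

```julia
# Toy dataset (hypothetical): an instance is a vector of worlds,
# a world is the set of items that are true in it.
const World = Set{Symbol}
const Instance = Vector{World}

# Local measure: fraction of the worlds of one instance where all items hold.
lsup(itemset, inst::Instance) = count(w -> issubset(itemset, w), inst) / length(inst)

# Global measure: fraction of instances whose local measure exceeds `lthr`.
gsup(itemset, data::Vector{Instance}, lthr) =
    count(inst -> lsup(itemset, inst) > lthr, data) / length(data)

# (global measure, local threshold, global threshold), as in MeaningfulnessMeasure.
const ToyMeasure = Tuple{Function,Float64,Float64}

function meaningful(itemset, data::Vector{Instance}, m::ToyMeasure)
    gmeas, lthr, gthr = m
    return gmeas(itemset, data, lthr) > gthr
end

data = Instance[
    [World([:p, :q]), World([:p]), World([:q])],          # {p, q} holds in 1/3 of the worlds
    [World([:p, :q]), World([:p, :q]), World([:r])],      # 2/3
    [World([:r]), World([:q]), World([:p, :q, :r])],      # 1/3
]

gsup([:p, :q], data, 0.5)                      # only the second instance passes: 1/3
meaningful([:p, :q], data, (gsup, 0.5, 0.3))   # true, since 1/3 > 0.3

# Propositional special case: one world per instance, local values are 0 or 1.
prop = Instance[[World([:p, :q])], [World([:p])], [World([:p, :q])]]
gsup([:p, :q], prop, 0.0)                      # plain support: 2/3
```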
ModalAssociationRules.islocalof (Method)
islocalof(::Function, ::Function)::Bool

Twin method of isglobalof.

Trait indicating that a local meaningfulness measure is used as a subroutine in a global measure.

For example, islocalof(lsupport, gsupport) is true, while islocalof(gsupport, lsupport) is false.

Warning

When implementing a custom meaningfulness measure, make sure to implement both traits if necessary. This is fundamental to guarantee the correct behavior of some methods, such as getlocalthreshold.

See also getlocalthreshold, gsupport, isglobalof, lsupport.

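The trait pair can be mimicked with ordinary multiple dispatch on function types. In this sketch, islocal_of, isglobal_of, local_threshold and the two toy measures are hypothetical. The fallback to false is an assumption suggested by the islocalof(::Function, ::Function)::Bool signature, not a copy of the package source. The sketch also shows why both twins matter: a missing method makes the corresponding query silently fall back to false, which breaks helpers in the spirit of getlocalthreshold.

```julia
# Two toy measures (hypothetical placeholders for a local and a global measure).
toy_lmeas(x) = x
toy_gmeas(x) = x

# Fallbacks: two arbitrary functions are not related.
islocal_of(::Function, ::Function) = false
isglobal_of(::Function, ::Function) = false

# Declare the relation in both directions ("twin" traits).
islocal_of(::typeof(toy_lmeas), ::typeof(toy_gmeas)) = true
isglobal_of(::typeof(toy_gmeas), ::typeof(toy_lmeas)) = true

# Hypothetical helper in the spirit of getlocalthreshold: given the
# (global measure, local threshold, global threshold) tuples of a miner,
# find the local threshold associated with `lmeas`.
function local_threshold(measures, lmeas::Function)
    for (gmeas, lthr, _) in measures
        islocal_of(lmeas, gmeas) && return lthr
    end
    error("no global measure uses $(lmeas) as its local measure")
end

local_threshold([(toy_gmeas, 0.2, 0.1)], toy_lmeas)   # 0.2
```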

The following are small data structures that will prove useful later, when you read about how a dataset is mined looking for ARMSubjects.

What follows is a list of the built-in meaningfulness measures. In the Hands on section you will learn how to implement your own measure.

lsupport(itemset::Itemset, logi_instance::LogicalInstance; miner::Union{Nothing,Miner}=nothing)

gsupport(itemset::Itemset, X::SupportedLogiset, threshold::Threshold; miner::Union{Nothing,Miner}=nothing)

lconfidence(rule::ARule, logi_instance::LogicalInstance; miner::Union{Nothing,Miner}=nothing)

gconfidence(rule::ARule, X::SupportedLogiset, threshold::Threshold; miner::Union{Nothing,Miner}=nothing)

Mining structures

Finally, we are ready to start mining. To do so, we need to create a Miner object, specifying the dataset we are working with, a mining function, a vector of initial Items, and the MeaningfulnessMeasures used to establish ARMSubject interestingness.

ModalAssociationRules.Miner (Type)
struct Miner{
    DATA<:AbstractDataset,
    MINALGO<:Function,
    I<:Item,
    IMEAS<:MeaningfulnessMeasure,
    RMEAS<:MeaningfulnessMeasure
}
    X::DATA                             # target dataset
    algorithm::MINALGO                  # algorithm used to perform extraction
    items::Vector{I}                    # items considered during the extraction

                                        # meaningfulness measures
    item_constrained_measures::Vector{IMEAS}
    rule_constrained_measures::Vector{RMEAS}

    freqitems::Vector{Itemset}          # collected frequent itemsets
    arules::Vector{ARule}               # collected association rules

    lmemo::LmeasMemo                    # local memoization structure
    gmemo::GmeasMemo                    # global memoization structure

    powerups::Powerup                   # mining algorithm powerups (see documentation)
    info::Info                          # general information
end

Machine learning model interface to perform association rule extraction.

Examples

julia> using ModalAssociationRules
julia> using SoleData

# Load NATOPS DataFrame
julia> X_df, y = load_arff_dataset("NATOPS");

# Convert NATOPS DataFrame to a Logiset
julia> X = scalarlogiset(X_df)

# Prepare some propositional atoms
julia> p = Atom(ScalarCondition(UnivariateMin(1), >, -0.5))
julia> q = Atom(ScalarCondition(UnivariateMin(2), <=, -2.2))
julia> r = Atom(ScalarCondition(UnivariateMin(3), >, -3.6))

# Prepare modal atoms using the later relation (see SoleLogics.IntervalRelation)
julia> lp = box(IA_L)(p)
julia> lq = diamond(IA_L)(q)
julia> lr = boxlater(r)

# Compose a vector of items, regrouping the atoms defined before
julia> manual_alphabet = Vector{Item}([p, q, r, lp, lq, lr])

# Create an association rule miner wrapping the `fpgrowth` algorithm (see `fpgrowth`);
# note that meaningfulness measures are not specified and, thus, default to those
# passed explicitly in the call below.
julia> miner = Miner(X, fpgrowth(), manual_alphabet)

# Create an association rule miner, explicitly specifying global meaningfulness measures
# with their local and global thresholds, both for `Itemset`s and `ARule`s.
julia> miner = Miner(X, fpgrowth(), manual_alphabet,
    [(gsupport, 0.1, 0.1)], [(gconfidence, 0.2, 0.2)])

# Consider the dataset and learning algorithm wrapped by `miner` (resp., `X` and `fpgrowth`).
# Mine the frequent itemsets, that is, those for which item measures are large enough.
# Then iterate the generator returned by `mine!` to enumerate association rules.
julia> for arule in ModalAssociationRules.mine!(miner)
    println(arule)
end

See also ARule, apriori, MeaningfulnessMeasure, Itemset, GmeasMemo, LmeasMemo.


Let us see which getters and setters are available for Miner.

getlocalthreshold_integer(miner::Miner, meas::Function, contributorslength::Int64)

getglobalthreshold_integer(miner::Miner, meas::Function, ninstances::Int64)

After a Miner finishes mining (we will see how to mine in a second), frequent Itemsets and ARules are accessible through the getters below.

Here is how to start mining.

The mining call returns an ARule generator. Since the extracted rules could be numerous, it is up to you to collect all of them in one step or to analyze them lazily, one at a time. You can also call the mining function ignoring its return value, and then generate the rules later by calling the following.
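The difference between consuming a rule generator lazily and collecting it eagerly can be seen with a plain Julia generator. toy_mine is a hypothetical stand-in that filters precomputed candidate rules by a confidence threshold. It is not the package's mine!, and the counter exists only to make the laziness visible.

```julia
const evaluated = Ref(0)   # how many candidate rules have been examined so far

# Hypothetical stand-in for a mining call returning a lazy generator of rules:
# candidates are examined only when the generator is iterated.
toy_mine(candidates; minconf = 0.5) =
    (r for r in candidates if (evaluated[] += 1; r.conf >= minconf))

candidates = [
    (ant = [:p], cons = [:q], conf = 0.75),
    (ant = [:q], cons = [:r], conf = 0.40),
    (ant = [:r], cons = [:p], conf = 0.60),
]

gen = toy_mine(candidates)
n_before = evaluated[]        # 0: building the generator computes nothing

firstrule = first(gen)        # lazy: examine candidates only until one is accepted
n_after_first = evaluated[]   # 1

allrules = collect(gen)       # eager: iterate again from the start, keep all accepted rules
```

With real data, the lazy route lets you stop as soon as you have seen enough rules, while collect pays the full cost up front.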

During both the mining and the rule generation phases, the values returned by MeaningfulnessMeasures applied to a certain ARMSubject are saved (memoized) inside the Miner. Thanks to the methods hereafter, a Miner can avoid useless recomputations.
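The idea behind memoization can be sketched with a dictionary keyed by measure, itemset and instance. LocalMemo, memo_lsupport! and the key layout are hypothetical and do not reproduce the package's LmeasMemo or its memoization methods. The measure itself is a toy fraction-of-worlds local support.

```julia
# Hypothetical local memo: (measure name, sorted itemset, instance index) => value.
const LocalMemo = Dict{Tuple{Symbol,Vector{Symbol},Int},Float64}

const computations = Ref(0)   # counts actual (non-memoized) evaluations

# Toy local support with memoization: compute on a cache miss, reuse on a hit.
function memo_lsupport!(memo::LocalMemo, itemset::Vector{Symbol}, data, i::Int)
    key = (:lsupport, sort(itemset), i)      # sorting makes item order irrelevant
    return get!(memo, key) do
        computations[] += 1
        inst = data[i]                        # inst: vector of worlds (sets of items)
        count(w -> issubset(itemset, w), inst) / length(inst)
    end
end

data = [[Set([:p, :q]), Set([:p])], [Set([:q])]]
memo = LocalMemo()

v1 = memo_lsupport!(memo, [:q, :p], data, 1)   # miss: computed, 0.5
v2 = memo_lsupport!(memo, [:p, :q], data, 1)   # hit: same key, no recomputation
```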

Miner customization

A Miner also contains two fields to keep additional information: info and powerups.

The info field in Miner is a dictionary used to store extra information about the miner, such as statistics about mining. Currently, since the package is still under development, the info field only contains a flag indicating whether the miner has been used for mining or not.

ModalAssociationRules.info (Method)
info(miner::Miner)::Info
info(miner::Miner, key::Symbol)

Getter for the entire additional information field inside a miner, or one of its specific entries.

See also Miner, Powerup.

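A two-method getter over a dictionary field is easy to reproduce. ToyMiner, toy_train! and the :istrained key are hypothetical. The sketch only mirrors the shape of info(miner) and info(miner, key), not the package's Info type.

```julia
# Hypothetical miner carrying an `info` dictionary (not the package's Miner).
struct ToyMiner
    info::Dict{Symbol,Any}
end

# Start with a single flag saying the miner has not been used yet.
ToyMiner() = ToyMiner(Dict{Symbol,Any}(:istrained => false))

# Getter for the whole dictionary, or for one of its entries.
info(m::ToyMiner) = m.info
info(m::ToyMiner, key::Symbol) = m.info[key]

# Hypothetical mining entry point: records that mining happened.
function toy_train!(m::ToyMiner)
    # ... the actual extraction would run here ...
    m.info[:istrained] = true
    return m
end

m = ToyMiner()
info(m, :istrained)   # false
toy_train!(m)
info(m, :istrained)   # true
```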

When writing your own mining algorithm, or when mining a particular kind of dataset, you might need to specialize the Miner, keeping, for example, custom metadata and data structures. To specialize a Miner, you can fill a Powerup structure to fit your needs.
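One way to picture a powerup is as an extensible slot where a custom algorithm stores its own auxiliary structures, built once and then reused. SketchMiner, haspowerup, encode! and the :itemindex key are hypothetical, and the real Powerup type may be organized differently.

```julia
# Hypothetical miner with a `powerups` slot for algorithm-specific data.
struct SketchMiner
    items::Vector{Symbol}
    powerups::Dict{Symbol,Any}
end
SketchMiner(items::Vector{Symbol}) = SketchMiner(items, Dict{Symbol,Any}())

haspowerup(m::SketchMiner, key::Symbol) = haskey(m.powerups, key)

# Hypothetical custom algorithm that encodes itemsets as bitmasks. It needs an
# item => bit position index, which is built on first use and kept as a powerup.
function encode!(m::SketchMiner, itemset::Vector{Symbol})
    idx = get!(m.powerups, :itemindex) do
        Dict(item => i - 1 for (i, item) in enumerate(m.items))
    end
    return foldl(|, (UInt(1) << idx[it] for it in itemset); init = UInt(0))
end

m = SketchMiner([:p, :q, :r])
encode!(m, [:p, :r])        # bits 0 and 2 set, i.e. 5
haspowerup(m, :itemindex)   # true: the index is now cached in the miner
```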