Getting started
In this introductory section you will learn about the main building blocks of ModalAssociationRules.jl. Although the documentation gives a good picture of association rule mining (ARM from now on), to make the most of this guide we suggest reading the following articles:
The articles above introduce two important algorithms, both of which are built into this package. The latter is the state of the art in the field of ARM.
Further on in the documentation, the potential of ModalAssociationRules.jl will emerge: this package's raison d'être is to generalize existing ARM algorithms to modal logics, which are more expressive than propositional logic and computationally less expensive than first-order logic. If you are new to Sole.jl and want to learn more about modal logic, please have a look at SoleLogics.jl for a general overview of the topic, or follow this documentation and return to this link if needed.
Core definitions
An Item is just a logical formula, which can be interpreted by a certain model. For now, we do not care how models are represented by Sole.jl under the hood, or how the checking algorithm works: what matters is that Items are manipulated by ARM algorithms, which try to find which conjunctions of items are the most statistically significant.
ModalAssociationRules.Item — Type
const Item = SoleLogics.Formula
Fundamental type in the context of association rule mining. An Item is a logical formula, which can be checked on models through SoleLogics.check.
The purpose of association rule mining (ARM) is to discover interesting relations between Items, grouped into Itemsets, to generate association rules (ARule).
Interestingness is established through a set of MeaningfulnessMeasures.
See also SoleLogics.check, gconfidence, lsupport, MeaningfulnessMeasure, SoleLogics.Formula.
ModalAssociationRules.Itemset — Type
struct Itemset
    items::Vector{Item}
end
Collection of unique Items.
Given a MeaningfulnessMeasure meas and a threshold t to be exceeded, an itemset itemset is said to be meaningful with respect to meas if and only if meas(itemset) > t.
Generally speaking, the meaningfulness (or interestingness) of an itemset is directly correlated to its frequency in the data: intuitively, when a pattern recurs in the data, it is a candidate to be interesting.
Every association rule mining algorithm aims to find frequent itemsets by applying meaningfulness measures such as local and global support, respectively lsupport and gsupport.
Frequent itemsets are then used to generate association rules (ARule).
Despite being implemented as a vector, an Itemset behaves like a set. Lookup is faster, and the internal sorting of the items is essential to make mining algorithms work.
In other words, it is guaranteed that if two Itemsets are created with the same content, regardless of the order of their items, then their hash is the same.
See also ARule, gsupport, Item, lsupport, MeaningfulnessMeasure.
Notice that an Itemset could be a set, but it is actually a vector: this is because ARM algorithms often need to establish an order between the items of an itemset to work efficiently. To convert an Itemset into its conjunctive normal form, we simply call toformula.
ModalAssociationRules.toformula — Function
toformula(itemset::Itemset)
Conjunctive normal form of the Items contained in itemset.
See also Item, Itemset, SoleLogics.LeftmostConjunctiveForm.
In general, an Itemset behaves exactly as you would expect a Vector{Item} to. The only difference is that manipulating an Itemset, for example through push! or union, guarantees that the wrapped items always keep the same sorting.
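The idea of a vector that behaves like a set can be illustrated with a minimal sketch in plain Julia. This is not the package's implementation: SortedItemset is a hypothetical type, and its items are Symbols rather than logical formulas. It only shows how a canonical ordering makes equality and hashing independent of insertion order.

```julia
# Hypothetical stand-in for Itemset: items are plain Symbols instead of formulas.
struct SortedItemset
    items::Vector{Symbol}
    # Inner constructor: drop duplicates and keep a canonical (sorted) order.
    SortedItemset(items::Vector{Symbol}) = new(sort(unique(items)))
end

# Two itemsets are equal, and hash alike, when their canonical vectors coincide.
Base.:(==)(a::SortedItemset, b::SortedItemset) = a.items == b.items
Base.hash(a::SortedItemset, h::UInt) = hash(a.items, h)

# Insertion that preserves both uniqueness and ordering.
function Base.push!(s::SortedItemset, item::Symbol)
    idx = searchsortedfirst(s.items, item)
    if idx > length(s.items) || s.items[idx] != item
        insert!(s.items, idx, item)
    end
    return s
end

a = SortedItemset([:q, :p, :r])
b = SortedItemset([:r, :q, :p, :p])   # different order, one duplicate
push!(a, :o)
push!(b, :o)
```

Under these assumptions, a and b compare equal and can be used interchangeably as dictionary keys, which is the property the mining algorithms rely on.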
Enough about Itemsets. Our final goal is to produce association rules.
ModalAssociationRules.ARule — Type
const ARule = Tuple{Itemset,Itemset}
An association rule represents a strong and meaningful co-occurrence relationship between two Itemsets, called antecedent and consequent, whose intersection is empty.
Extracting all the ARules "hidden" in the data is the main purpose of ARM.
The general framework followed by ARM techniques is to first generate all the frequent itemsets, considering a set of MeaningfulnessMeasures specifically tailored to work with Itemsets. Thereafter, all the association rules are generated by testing all the combinations of frequent itemsets against another set of MeaningfulnessMeasures, this time designed to capture how "reliable" a rule is.
See also antecedent, consequent, gconfidence, Itemset, lconfidence, MeaningfulnessMeasure.
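To make the antecedent/consequent structure concrete, here is a small sketch in plain Julia. It assumes the classic textbook strategy of splitting one frequent itemset into two disjoint, non-empty parts; the package may enumerate candidates differently. ToyItemset, ToyRule, toy_antecedent, toy_consequent and candidate_rules are all hypothetical names.

```julia
# Hypothetical simplification: itemsets are vectors of Symbols, rules are pairs.
const ToyItemset = Vector{Symbol}
const ToyRule = Tuple{ToyItemset,ToyItemset}

toy_antecedent(rule::ToyRule) = first(rule)
toy_consequent(rule::ToyRule) = last(rule)

# Enumerate every rule X => Y where X and Y are non-empty, disjoint,
# and together contain exactly the items of `itemset`.
function candidate_rules(itemset::ToyItemset)
    rules = ToyRule[]
    n = length(itemset)
    # Each bitmask selects the antecedent; 0 and 2^n - 1 are skipped so that
    # neither side of the rule is empty.
    for mask in 1:(2^n - 2)
        ante = Symbol[itemset[i] for i in 1:n if (mask >> (i - 1)) & 1 == 1]
        cons = Symbol[itemset[i] for i in 1:n if (mask >> (i - 1)) & 1 == 0]
        push!(rules, (ante, cons))
    end
    return rules
end

rules = candidate_rules([:p, :q, :r])
```

Each candidate would then be kept or discarded according to rule-oriented measures such as confidence.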
ModalAssociationRules.content — Method
content(rule::ARule)::Tuple{Itemset,Itemset}
Getter for the content of an ARule, that is, both its antecedent and its consequent.
See also antecedent, ARule, consequent, Itemset.
ModalAssociationRules.antecedent — Method
ModalAssociationRules.consequent — Method
To print an ARule enriched with more information (at the moment, this is everything we need to know), we can use the following.
ModalAssociationRules.analyze — Method
analyze(arule::ARule, miner::Miner; io::IO=stdout, localities=false)
Print an ARule analysis to the console, including the values of the related meaningfulness measures.
Sometimes we may want to write a function that handles a generic entity obtained through an association rule mining algorithm (frequent itemsets and, of course, association rules). Think of a dictionary mapping each extracted pattern to its metadata. We call that generic entity "an ARM subject", and the following union type comes in handy.
ModalAssociationRules.ARMSubject — Type
ARMSubject = Union{ARule,Itemset}
Memoizable types for association rule mining (ARM).
See also GmeasMemo, GmeasMemoKey, LmeasMemo, LmeasMemoKey.
Measures
To establish when an ARMSubject is interesting, we need meaningfulness measures.
ModalAssociationRules.Threshold — Type
const Threshold = Float64
Threshold value for meaningfulness measures.
See also gconfidence, gsupport, lconfidence, lsupport.
ModalAssociationRules.MeaningfulnessMeasure — Type
const MeaningfulnessMeasure = Tuple{Function, Threshold, Threshold}
In the classic propositional scenario, where each instance of a Logiset is composed of just a single world (it is a propositional interpretation), a meaningfulness measure is simply a function which measures how many times a property of an Itemset or an ARule is respected across all the instances of the dataset.
In the context of modal logic, where the instances of a dataset are relational objects, every meaningfulness measure must capture two aspects: how meaningful an Itemset or an ARule is inside an instance, and how meaningful the same object is across all the instances.
For this reason, we can think of a meaningfulness measure as a matryoshka composed of an external global measure and an internal local measure. The global measure tests for how many instances the local measure exceeds a local threshold. At the end of the process, a global threshold is used to establish whether the value of the global measure is actually meaningful. (Note that this generalizes the propositional scenario, where it is enough to just apply a measure across instances.)
Therefore, a MeaningfulnessMeasure is a tuple composed of a global meaningfulness measure, a local threshold and a global threshold.
See also gconfidence, gsupport, lconfidence, lsupport.
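The matryoshka structure described above can be sketched in plain Julia on toy data. Everything here is an assumption made for illustration: World, Instance, toy_lsupport and toy_gsupport are hypothetical names, both measures are normalized as fractions, and a world is reduced to the set of items that hold in it. The actual lsupport and gsupport work on logisets and may differ in detail.

```julia
# Hypothetical toy data: an instance is a vector of worlds, and each world is
# the set of items that hold in it.
const World = Set{Symbol}
const Instance = Vector{World}

# Local measure: fraction of worlds of one instance where every item holds.
function toy_lsupport(itemset::Vector{Symbol}, instance::Instance)
    return count(w -> all(in(w), itemset), instance) / length(instance)
end

# Global measure: fraction of instances whose local support exceeds
# the local threshold.
function toy_gsupport(itemset::Vector{Symbol}, dataset::Vector{Instance},
                      lthreshold::Float64)
    return count(i -> toy_lsupport(itemset, i) > lthreshold, dataset) /
           length(dataset)
end

# A measure bundled as (global measure, local threshold, global threshold).
toy_measure = (toy_gsupport, 0.5, 0.5)

dataset = Instance[
    [World([:p, :q]), World([:p, :q]), World([:p])],  # local support 2/3
    [World([:p]), World([:q]), World([:p, :q])],      # local support 1/3
    [World([:p, :q]), World([:p, :q])],               # local support 1
]

gmeas, lthreshold, gthreshold = toy_measure
value = gmeas([:p, :q], dataset, lthreshold)   # 2 of 3 instances pass locally
is_meaningful = value > gthreshold
```

In the propositional case every instance has a single world, so the local step degenerates to a yes/no check and only the global threshold matters.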
ModalAssociationRules.islocalof — Method
islocalof(::Function, ::Function)::Bool
Twin method of isglobalof.
Trait to indicate that a local meaningfulness measure is used as a subroutine in a global measure.
For example, islocalof(lsupport, gsupport) is true, and so is the twin call isglobalof(gsupport, lsupport).
When implementing a custom meaningfulness measure, make sure to implement both traits if necessary. This is fundamental to guarantee the correct behavior of some methods, such as getlocalthreshold.
See also getlocalthreshold, gsupport, isglobalof, lsupport.
ModalAssociationRules.localof — Method
localof(::Function)
Return the local measure associated with the given one.
See also islocalof, isglobalof, globalof.
ModalAssociationRules.isglobalof — Method
isglobalof(::Function, ::Function)::Bool
Twin trait of islocalof.
See also getlocalthreshold, gsupport, islocalof, lsupport.
ModalAssociationRules.globalof — Method
globalof(::Function) = nothing
Return the global measure associated with the given one.
See also islocalof, isglobalof, localof.
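Traits of this kind are commonly expressed in Julia by dispatching on the singleton type of each function. The sketch below shows the pattern with hypothetical names (my_lmeas, my_gmeas and the toy_ prefixed traits). It assumes that the twin trait holds for the reversed pair of arguments, and it does not reproduce how the package links its own measures.

```julia
# Hypothetical measures; only their identity matters here.
my_lmeas(args...) = 0.0
my_gmeas(args...) = 0.0

# Fallbacks: unrelated functions are neither local nor global of each other.
toy_islocalof(::Function, ::Function) = false
toy_isglobalof(::Function, ::Function) = false
toy_localof(::Function) = nothing
toy_globalof(::Function) = nothing

# Link the pair by dispatching on the singleton type of each function.
toy_islocalof(::typeof(my_lmeas), ::typeof(my_gmeas)) = true
toy_isglobalof(::typeof(my_gmeas), ::typeof(my_lmeas)) = true
toy_localof(::typeof(my_gmeas)) = my_lmeas
toy_globalof(::typeof(my_lmeas)) = my_gmeas
```

With these four specialized methods in place, generic code can go from a global measure to its local counterpart (and back) without hard-coding measure names.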
The following are small data structures that will come in handy later, when you read about how a dataset is mined in search of ARMSubjects.
ModalAssociationRules.LmeasMemoKey — Type
const LmeasMemoKey = Tuple{Symbol,ARMSubject,Int64}
Key of an LmeasMemo dictionary. Represents a local meaningfulness measure name (as a Symbol), an ARMSubject, and the index of the dataset instance where the measure is applied.
See also LmeasMemo, ARMSubject.
ModalAssociationRules.LmeasMemo — Type
const LmeasMemo = Dict{LmeasMemoKey,Threshold}
Association between a local measure of an ARMSubject on a specific dataset instance, and its value.
See also LmeasMemoKey, ARMSubject.
ModalAssociationRules.GmeasMemoKey — Type
const GmeasMemoKey = Tuple{Symbol,ARMSubject}
Key of a GmeasMemo dictionary. Represents a global meaningfulness measure name (as a Symbol) and an ARMSubject.
See also GmeasMemo, ARMSubject.
ModalAssociationRules.GmeasMemo — Type
const GmeasMemo = Dict{GmeasMemoKey,Threshold}
Association between a global measure of an ARMSubject on a dataset, and its value.
The reference to the dataset is not made explicit here, since GmeasMemo is intended to be used as a memoization structure inside Miner objects, which already know the dataset they are working with.
See also GmeasMemoKey, ARMSubject.
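The following plain-Julia sketch shows how tuple-keyed dictionaries of this shape can serve as memoization tables. ToySubject, the key aliases, expensive_measure and memoized_gmeas are hypothetical stand-ins; only the key layouts mirror the definitions above.

```julia
const ToySubject = Vector{Symbol}                   # stand-in for ARMSubject
const ToyLmemoKey = Tuple{Symbol,ToySubject,Int64}  # (measure, subject, instance)
const ToyGmemoKey = Tuple{Symbol,ToySubject}        # (measure, subject)

lmemo = Dict{ToyLmemoKey,Float64}()
gmemo = Dict{ToyGmemoKey,Float64}()

# Counter used only to show how many times the measure is really computed.
ncalls = Ref(0)
function expensive_measure(subject::ToySubject)
    ncalls[] += 1
    return length(subject) / 10
end

# Look the value up first; compute and store it only on a miss.
function memoized_gmeas(subject::ToySubject)
    return get!(gmemo, (:toy_gmeas, subject)) do
        expensive_measure(subject)
    end
end

# A local entry also records the index of the instance it refers to.
lmemo[(:toy_lmeas, [:p, :q], 1)] = 0.75

first_value = memoized_gmeas([:p, :q])
second_value = memoized_gmeas([:p, :q])   # served from gmemo, no recomputation
```

Because vectors hash by content, two equal subjects built at different times hit the same entry.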
What follows is a list of the built-in meaningfulness measures. In the Hands on section you will learn how to implement your own measure.
lsupport(itemset::Itemset, logi_instance::LogicalInstance; miner::Union{Nothing,Miner}=nothing)
gsupport(itemset::Itemset, X::SupportedLogiset, threshold::Threshold; miner::Union{Nothing,Miner}=nothing)
lconfidence(rule::ARule, logi_instance::LogicalInstance; miner::Union{Nothing,Miner}=nothing)
gconfidence(rule::ARule, X::SupportedLogiset, threshold::Threshold; miner::Union{Nothing,Miner}=nothing)
Mining structures
Finally, we are ready to start mining. To do so, we need to create a Miner object. We just need to specify which dataset we are working with, together with a mining function, a vector of initial Items, and the MeaningfulnessMeasures used to establish the interestingness of an ARMSubject.
ModalAssociationRules.Miner — Type
struct Miner{
    DATA<:AbstractDataset,
    MINALGO<:Function,
    I<:Item,
    IMEAS<:MeaningfulnessMeasure,
    RMEAS<:MeaningfulnessMeasure
}
    X::DATA                     # target dataset
    algorithm::MINALGO          # algorithm used to perform extraction
    items::Vector{I}            # items considered during the extraction
    # meaningfulness measures
    item_constrained_measures::Vector{IMEAS}
    rule_constrained_measures::Vector{RMEAS}
    freqitems::Vector{Itemset}  # collected frequent itemsets
    arules::Vector{ARule}       # collected association rules
    lmemo::LmeasMemo            # local memoization structure
    gmemo::GmeasMemo            # global memoization structure
    powerups::Powerup           # mining algorithm powerups (see documentation)
    info::Info                  # general information
end
Machine learning model interface to perform association rules extraction.
Examples
julia> using ModalAssociationRules
julia> using SoleData
# Load NATOPS DataFrame
julia> X_df, y = load_arff_dataset("NATOPS");
# Convert NATOPS DataFrame to a Logiset
julia> X = scalarlogiset(X_df)
# Prepare some propositional atoms
julia> p = Atom(ScalarCondition(UnivariateMin(1), >, -0.5))
julia> q = Atom(ScalarCondition(UnivariateMin(2), <=, -2.2))
julia> r = Atom(ScalarCondition(UnivariateMin(3), >, -3.6))
# Prepare modal atoms using the later relation - see [`SoleLogics.IntervalRelation`](@ref)
julia> lp = box(IA_L)(p)
julia> lq = diamond(IA_L)(q)
julia> lr = boxlater(r)
# Compose a vector of items, grouping the atoms defined before
julia> manual_alphabet = Vector{Item}([p, q, r, lp, lq, lr])
# Create an association rule miner wrapping the `fpgrowth` algorithm - see [`fpgrowth`](@ref);
# note that meaningfulness measures are not specified and, thus, default to those in the
# call below.
julia> miner = Miner(X, fpgrowth(), manual_alphabet)
# Create an association rule miner, explicitly specifying global meaningfulness measures
# with their local and global thresholds, both for [`Itemset`](@ref)s and [`ARule`](@ref)s.
julia> miner = Miner(X, fpgrowth(), manual_alphabet,
           [(gsupport, 0.1, 0.1)], [(gconfidence, 0.2, 0.2)])
# Consider the dataset and learning algorithm wrapped by `miner` (resp., `X` and `fpgrowth`).
# Mine the frequent itemsets, that is, those for which item measures are large enough.
# Then iterate the generator returned by [`mine!`](@ref) to enumerate association rules.
julia> for arule in ModalAssociationRules.mine!(miner)
           println(arule)
       end
See also ARule, apriori, MeaningfulnessMeasure, Itemset, GmeasMemo, LmeasMemo.
Let us see which getters and setters are available for Miner.
ModalAssociationRules.dataset — Method
dataset(miner::Miner)::AbstractDataset
Getter for the dataset wrapped by miners.
ModalAssociationRules.algorithm — Method
ModalAssociationRules.items — Method
ModalAssociationRules.measures — Method
measures(miner::Miner)::Vector{<:MeaningfulnessMeasure}
Return all the MeaningfulnessMeasures wrapped by miner.
See also MeaningfulnessMeasure, Miner.
ModalAssociationRules.findmeasure — Method
findmeasure(
    miner::Miner,
    meas::Function;
    recognizer::Function=islocalof
)::MeaningfulnessMeasure
Retrieve the MeaningfulnessMeasure associated with meas.
See also isglobalof, islocalof, MeaningfulnessMeasure, Miner.
ModalAssociationRules.itemsetmeasures — Method
itemsetmeasures(miner::Miner)::Vector{<:MeaningfulnessMeasure}
Return the MeaningfulnessMeasures tailored to work with Itemsets, loaded inside miner.
See also Itemset, MeaningfulnessMeasure, Miner.
ModalAssociationRules.additemmeas — Method
additemmeas(miner::Miner, measure::MeaningfulnessMeasure)
Add a new measure to miner's itemsetmeasures.
See also addrulemeas, Miner, rulemeasures.
ModalAssociationRules.rulemeasures — Method
rulemeasures(miner::Miner)::Vector{<:MeaningfulnessMeasure}
Return the MeaningfulnessMeasures tailored to work with ARules, loaded inside miner.
See also Miner, ARule, MeaningfulnessMeasure.
ModalAssociationRules.addrulemeas — Method
addrulemeas(miner::Miner, measure::MeaningfulnessMeasure)
Add a new measure to miner's rulemeasures.
See also itemsetmeasures, Miner, rulemeasures.
ModalAssociationRules.getlocalthreshold — Method
getlocalthreshold(miner::Miner, meas::Function)::Threshold
Getter for the Threshold associated with the function wrapped by some MeaningfulnessMeasure tailored to work locally (that is, analyzing "the inside" of a dataset's instances) in miner.
See also Miner, MeaningfulnessMeasure, Threshold.
getlocalthreshold_integer(miner::Miner, meas::Function, contributorslength::Int64)
ModalAssociationRules.getglobalthreshold — Method
getglobalthreshold(miner::Miner, meas::Function)::Threshold
Getter for the Threshold associated with the function wrapped by some MeaningfulnessMeasure tailored to work globally (that is, measuring the behavior of a specific local measure across all the instances of a dataset) in miner.
See also Miner, MeaningfulnessMeasure, Threshold.
getglobalthreshold_integer(miner::Miner, meas::Function, ninstances::Int64)
After a Miner ends mining (we will see how to mine in a second), frequent Itemsets and ARules are accessible through the getters below.
ModalAssociationRules.freqitems — Method
ModalAssociationRules.arules — Method
Here is how to start mining.
ModalAssociationRules.mine! — Method
mine!(miner::Miner)
Synonym for ModalAssociationRules.apply!(miner, dataset(miner)).
See also ARule, Itemset, ModalAssociationRules.apply.
ModalAssociationRules.apply! — Method
apply!(miner::Miner, X::AbstractDataset)
Extract association rules in the dataset referenced by miner, saving the interesting Itemsets inside miner. Then, return a generator of ARules.
The mining call returns an ARule generator. Since many rules may be extracted, it is up to you to collect all of them at once or to analyze them lazily, one at a time. You can also call the mining function ignoring its return value, and then generate the rules later by calling the following.
ModalAssociationRules.generaterules! — Method
generaterules!(miner::Miner; kwargs...)
Return a generator of ARules, given an already trained Miner.
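The difference between consuming a rule generator lazily and collecting it at once can be shown with a plain-Julia sketch. It does not call the package: make_rule and toy_rules are hypothetical helpers, and a "rule" is simply the first item of an itemset paired with the remaining ones.

```julia
# Counter used only to show when rules are actually built.
produced = Ref(0)

# Hypothetical rule builder: first item => remaining items.
function make_rule(itemset::Vector{Symbol})
    produced[] += 1
    return (itemset[1:1], itemset[2:end])
end

# A generator expression: nothing is computed until it is iterated.
toy_rules(frequent) =
    (make_rule(itemset) for itemset in frequent if length(itemset) > 1)

frequent = [[:p], [:p, :q], [:p, :q, :r]]

gen = toy_rules(frequent)       # no rule has been built yet
count_before = produced[]
first_rule = first(gen)         # builds exactly one rule
count_after_first = produced[]

all_rules = collect(toy_rules(frequent))   # builds every rule at once
```

Iterating lazily keeps memory usage flat when the number of rules is large, while collect is convenient when the whole result is needed for sorting or reporting.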
During both the mining and the rule generation phases, the values returned by a MeaningfulnessMeasure applied to a certain ARMSubject are saved (memoized) inside the Miner. Thanks to the methods below, a Miner can avoid useless recomputations.
ModalAssociationRules.localmemo — Method
localmemo(miner::Miner)::LmeasMemo
localmemo(miner::Miner, key::LmeasMemoKey)
Return the local memoization structure inside miner, or a specific entry if an LmeasMemoKey is provided.
See also Miner, LmeasMemo, LmeasMemoKey.
ModalAssociationRules.localmemo! — Method
localmemo!(miner::Miner, key::LmeasMemoKey, val::Threshold)
Setter for a specific entry key inside the local memoization structure wrapped by miner.
See also Miner, LmeasMemo, LmeasMemoKey.
ModalAssociationRules.globalmemo — Method
globalmemo(miner::Miner)::GmeasMemo
globalmemo(miner::Miner, key::GmeasMemoKey)
Return the global memoization structure inside miner, or a specific entry if a GmeasMemoKey is provided.
See also Miner, GmeasMemo, GmeasMemoKey.
ModalAssociationRules.globalmemo! — Method
globalmemo!(miner::Miner, key::GmeasMemoKey, val::Threshold)
Setter for a specific entry key inside the global memoization structure wrapped by miner.
See also Miner, GmeasMemo, GmeasMemoKey.
Miner customization
A Miner also contains two fields to keep additional information: info and powerups.
The info field in Miner is a dictionary used to store extra information about the miner, such as statistics about mining. Currently, since the package is still being developed, the info field only contains a flag indicating whether the miner has been used for mining or not.
ModalAssociationRules.Info — Type
const Info = Dict{Symbol,Any}
Generic setting storage inside Miner structures.
ModalAssociationRules.info — Method
info(miner::Miner)::Info
info(miner::Miner, key::Symbol)
Getter for the entire additional information field inside a miner, or one of its specific entries.
ModalAssociationRules.info! — Method
info!(miner::Miner, key::Symbol, val)
Setter for the content of a specific field of miner's info.
ModalAssociationRules.hasinfo — Method
hasinfo(miner::Miner, key::Symbol)
Return whether miner's additional information field contains an entry key.
See also Miner.
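The getter, setter and membership pattern documented above can be mimicked in a few lines of plain Julia. This is only an illustrative sketch: ToyMiner and the toy_ prefixed functions are hypothetical, and the :istrained key is an assumed name for the "already used for mining" flag, not necessarily the one the package uses.

```julia
const ToyInfo = Dict{Symbol,Any}

# Hypothetical miner reduced to its info dictionary.
struct ToyMiner
    info::ToyInfo
end

# Assumed default: a single flag telling whether mining already happened.
ToyMiner() = ToyMiner(ToyInfo(:istrained => false))

toy_info(m::ToyMiner) = m.info                              # whole dictionary
toy_info(m::ToyMiner, key::Symbol) = m.info[key]            # single entry
toy_info!(m::ToyMiner, key::Symbol, val) = (m.info[key] = val)
toy_hasinfo(m::ToyMiner, key::Symbol) = haskey(m.info, key)

miner = ToyMiner()
was_trained = toy_info(miner, :istrained)
toy_info!(miner, :istrained, true)
toy_info!(miner, :nrules, 42)    # any extra statistic can be stored as well
```

Since the values are typed as Any, the same dictionary can hold flags, counters or timing statistics without changing the miner's type.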
When writing your own mining algorithm, or when mining a particular kind of dataset, you might need to specialize the Miner, keeping, for example, custom metadata and data structures. To specialize a Miner, you can fill a Powerup structure to fit your needs.
ModalAssociationRules.Powerup — Type
const Powerup = Dict{Symbol,Any}
Additional information associated with an ARMSubject that can be used to specialize a Miner, augmenting its capabilities.
To understand how to specialize a Miner, see haspowerup, initpowerups, powerups, powerups!.
ModalAssociationRules.powerups — Method
powerups(miner::Miner)::Powerup
powerups(miner::Miner, key::Symbol)
Getter for the entire powerups structure currently loaded in miner, or a specific powerup.
See also haspowerup, initpowerups, Miner, Powerup.
ModalAssociationRules.powerups! — Method
powerups!(miner::Miner, key::Symbol, val)
Setter for the content of a specific field of miner's powerups.
See also haspowerup, initpowerups, Miner, Powerup.
ModalAssociationRules.haspowerup — Method
haspowerup(miner::Miner, key::Symbol)
Return whether miner's powerups field contains an entry key.
ModalAssociationRules.initpowerups — Method
initpowerups(::Function, ::AbstractDataset)
This defines how the powerups field of a Miner is filled to optimize the mining.