Manipulation
Modalities
MultiData.addmodality! (Method)
addmodality!(md, indices)
addmodality!(md, index)
addmodality!(md, variable_names)
addmodality!(md, variable_name)
Create a new modality in a multimodal dataset using the variables at indices or index, and return the dataset itself.
Instead of indices or index, the variable name(s) can be used.
Note: to add a new modality with new variables see insertmodality!.
Arguments
md is a MultiDataset;
indices is an AbstractVector{Integer} that indicates which indices of the multimodal dataset's corresponding dataframe to add to the new modality;
index is an Integer that indicates the index of the multimodal dataset's corresponding dataframe to add to the new modality;
variable_names is an AbstractVector{Symbol} that indicates which variables of the multimodal dataset's corresponding dataframe to add to the new modality;
variable_name is a Symbol that indicates the variable of the multimodal dataset's corresponding dataframe to add to the new modality.
Examples
julia> df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])
2×5 DataFrame
Row │ name age sex height weight
│ String Int64 Char Int64 Int64
─────┼─────────────────────────────────────
1 │ Python 25 M 180 80
2 │ Julia 26 F 175 60
julia> md = MultiDataset([[1]], df)
● MultiDataset
└─ dimensionalities: (0,)
- Modality 1 / 1
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
- Spare variables
└─ dimensionality: 0
2×4 SubDataFrame
Row │ age sex height weight
│ Int64 Char Int64 Int64
─────┼─────────────────────────────
1 │ 25 M 180 80
2 │ 26 F 175 60
julia> addmodality!(md, [:age, :sex])
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
- Modality 2 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ age sex
│ Int64 Char
─────┼─────────────
1 │ 25 M
2 │ 26 F
- Spare variables
└─ dimensionality: 0
2×2 SubDataFrame
Row │ height weight
│ Int64 Int64
─────┼────────────────
1 │ 180 80
2 │ 175 60
julia> addmodality!(md, 5)
● MultiDataset
└─ dimensionalities: (0, 0, 0)
- Modality 1 / 3
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
- Modality 2 / 3
└─ dimensionality: 0
2×2 SubDataFrame
Row │ age sex
│ Int64 Char
─────┼─────────────
1 │ 25 M
2 │ 26 F
- Modality 3 / 3
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 60
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ height
│ Int64
─────┼────────
1 │ 180
2 │ 175
MultiData.addvariable_tomodality! (Method)
addvariable_tomodality!(md, i_modality, var_index)
addvariable_tomodality!(md, i_modality, var_indices)
addvariable_tomodality!(md, i_modality, var_name)
addvariable_tomodality!(md, i_modality, var_names)
Add the variable at index var_index to the modality at index i_modality in a multimodal dataset, and return the dataset. Instead of var_index, the variable name can be used. Multiple variables can be added to the modality at once using var_indices or var_names.
Note: this function cannot add a variable to a new modality, only to an existing one. To add a new modality use addmodality! instead.
Arguments
md is a MultiDataset;
i_modality is an Integer indicating the modality to which the variable(s) will be added;
var_index is an Integer that indicates the index of the variable to add to a specific modality of the multimodal dataset;
var_indices is an AbstractVector{Integer} indicating the indices of the variables to add to a specific modality of the multimodal dataset;
var_name is a Symbol indicating the name of the variable to add to a specific modality of the multimodal dataset;
var_names is an AbstractVector{Symbol} indicating the names of the variables to add to a specific modality of the multimodal dataset.
Examples
julia> df = DataFrame(:name => ["Python", "Julia"],
:age => [25, 26],
:sex => ['M', 'F'],
:height => [180, 175],
:weight => [80, 60])
2×5 DataFrame
Row │ name age sex height weight
│ String Int64 Char Int64 Int64
─────┼─────────────────────────────────────
1 │ Python 25 M 180 80
2 │ Julia 26 F 175 60
julia> md = MultiDataset([[1, 2],[3]], df)
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
- Spare variables
└─ dimensionality: 0
2×2 SubDataFrame
Row │ height weight
│ Int64 Int64
─────┼────────────────
1 │ 180 80
2 │ 175 60
julia> addvariable_tomodality!(md, 1, [4,5])
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×4 SubDataFrame
Row │ name age height weight
│ String Int64 Int64 Int64
─────┼───────────────────────────────
1 │ Python 25 180 80
2 │ Julia 26 175 60
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> addvariable_tomodality!(md, 2, [:name,:weight])
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×4 SubDataFrame
Row │ name age height weight
│ String Int64 Int64 Int64
─────┼───────────────────────────────
1 │ Python 25 180 80
2 │ Julia 26 175 60
- Modality 2 / 2
└─ dimensionality: 0
2×3 SubDataFrame
Row │ sex name weight
│ Char String Int64
─────┼──────────────────────
1 │ M Python 80
2 │ F Julia 60
MultiData.dropmodalities! (Method)
dropmodalities!(md, indices)
dropmodalities!(md, index)
Remove the modality at index (or the modalities at indices) from a multimodal dataset while dropping all variables in it, and return the dataset itself.
Note: if the dropped variables are contained in other modalities they will also be removed from them. This can lead to the removal of additional modalities beyond the ones requested.
If the intention is to remove a modality without dropping the variables use removemodality! instead.
Arguments
md is a MultiDataset;
index is an Integer indicating the index of the modality to drop;
indices is an AbstractVector{Integer} indicating the indices of the modalities to drop.
Examples
julia> df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])
2×5 DataFrame
Row │ name age sex height weight
│ String Int64 Char Int64 Int64
─────┼─────────────────────────────────────
1 │ Python 25 M 180 80
2 │ Julia 26 F 175 60
julia> md = MultiDataset([[1, 2],[3,4],[5],[2,3]], df)
● MultiDataset
└─ dimensionalities: (0, 0, 0, 0)
- Modality 1 / 4
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 4
└─ dimensionality: 0
2×2 SubDataFrame
Row │ sex height
│ Char Int64
─────┼──────────────
1 │ M 180
2 │ F 175
- Modality 3 / 4
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 60
- Modality 4 / 4
└─ dimensionality: 0
2×2 SubDataFrame
Row │ age sex
│ Int64 Char
─────┼─────────────
1 │ 25 M
2 │ 26 F
julia> dropmodalities!(md, [2,3])
[ Info: Variable 3 was last variable of modality 2: removing modality
[ Info: Variable 3 was last variable of modality 2: removing modality
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 25
2 │ 26
julia> dropmodalities!(md, 2)
[ Info: Variable 2 was last variable of modality 2: removing modality
● MultiDataset
└─ dimensionalities: (0,)
- Modality 1 / 1
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
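The difference between dropping and removing a modality can be sketched with a toy model in plain Julia. This is not MultiData's implementation: the helper names drop_modalities, remove_modalities, and spare are hypothetical, and a dataset is reduced to a list of modalities, each a list of column indices into one table. The inputs mirror the groupings used in the dropmodalities! and removemodality! examples on this page.

```julia
# Toy model (not MultiData's implementation): a dataset descriptor is a list of
# modalities, each a list of column indices into one underlying table.

# Removing modalities only forgets the groupings; every column stays in the table.
remove_modalities(groups, to_remove) =
    [g for (i, g) in enumerate(groups) if !(i in to_remove)]

# Dropping modalities also deletes their columns, so other modalities can shrink
# or vanish, and the indices of the surviving columns shift left.
function drop_modalities(groups, to_drop)
    dropped = sort(unique(reduce(vcat, groups[to_drop]; init = Int[])))
    survivors = Vector{Int}[]
    for (i, g) in enumerate(groups)
        i in to_drop && continue
        kept = [c for c in g if !(c in dropped)]
        isempty(kept) && continue              # an emptied modality disappears
        push!(survivors, [c - count(<(c), dropped) for c in kept])
    end
    return survivors
end

# Columns that belong to no modality (the "spare variables").
spare(groups, ncols) = setdiff(1:ncols, reduce(vcat, groups; init = Int[]))

dropped_result = drop_modalities([[1, 2], [3, 4], [5], [2, 3]], [2, 3])
removed_result = remove_modalities([[1, 2], [3], [4], [5]], [3])
println(dropped_result)             # [[1, 2], [2]]
println(removed_result)             # [[1, 2], [3], [5]]
println(spare(removed_result, 5))   # [4]
```

In the toy model, dropping modalities 2 and 3 shrinks the fourth modality to a single column, while removing a modality leaves its column behind as a spare one.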
MultiData.eachmodality (Method)
eachmodality(md)
Return a (lazy) iterator to the modalities of a multimodal dataset.
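A short usage sketch, assuming the DataFrames and MultiData packages are installed and that the iterator yields each modality as a sub-dataframe, as modality does in the examples on this page. The helper modality_columns and the names df, md, and m are illustrative, not part of the package.

```julia
using DataFrames, MultiData

df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'])
md = MultiDataset([[1, 2], [3]], df)

# Hypothetical helper: walk the modalities in order and record the column
# names of each one. Only plain iteration is assumed of eachmodality(md).
function modality_columns(md)
    cols = Vector{Vector{String}}()
    for m in eachmodality(md)
        push!(cols, names(m))   # names comes from DataFrames
    end
    return cols
end

cols = modality_columns(md)
println(cols)
```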
MultiData.insertmodality! (Function)
insertmodality!(md, col, new_modality, existing_variables)
insertmodality!(md, new_modality, existing_variables)
Insert new_modality as a new modality into a multimodal dataset, and return the dataset. Existing variables can be added to the new modality while adding it to the dataset by passing the corresponding indices as existing_variables. If col is specified, the variables will be inserted starting at index col.
Arguments
md is a MultiDataset;
col is an Integer indicating the column at which to insert the columns of new_modality;
new_modality is an AbstractDataFrame which will be added to the multimodal dataset as a sub-dataframe of a new modality;
existing_variables is an AbstractVector{Integer} or AbstractVector{Symbol}. It indicates which variables of the multimodal dataset's internal dataframe structure to insert in the new modality.
Examples
julia> df = DataFrame(
:name => ["Python", "Julia"],
:stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]
)
2×2 DataFrame
Row │ name stat1
│ String Array…
─────┼───────────────────────────────────────────
1 │ Python [0.841471, 0.909297, 0.14112, -0…
2 │ Julia [0.540302, -0.416147, -0.989992,…
julia> md = MultiDataset(df; group = :all)
● MultiDataset
└─ dimensionalities: (0, 1)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
- Modality 2 / 2
└─ dimensionality: 1
2×1 SubDataFrame
Row │ stat1
│ Array…
─────┼───────────────────────────────────
1 │ [0.841471, 0.909297, 0.14112, -0…
2 │ [0.540302, -0.416147, -0.989992,…
julia> insertmodality!(md, DataFrame(:age => [30, 9]))
● MultiDataset
└─ dimensionalities: (0, 1, 0)
- Modality 1 / 3
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
- Modality 2 / 3
└─ dimensionality: 1
2×1 SubDataFrame
Row │ stat1
│ Array…
─────┼───────────────────────────────────
1 │ [0.841471, 0.909297, 0.14112, -0…
2 │ [0.540302, -0.416147, -0.989992,…
- Modality 3 / 3
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 30
2 │ 9
julia> md.data
2×3 DataFrame
Row │ name stat1 age
│ String Array… Int64
─────┼──────────────────────────────────────────────────
1 │ Python [0.841471, 0.909297, 0.14112, -0… 30
2 │ Julia [0.540302, -0.416147, -0.989992,… 9
or, selecting the column:
julia> df = DataFrame(
:name => ["Python", "Julia"],
:stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]
)
2×2 DataFrame
Row │ name stat1
│ String Array…
─────┼───────────────────────────────────────────
1 │ Python [0.841471, 0.909297, 0.14112, -0…
2 │ Julia [0.540302, -0.416147, -0.989992,…
julia> md = MultiDataset(df; group = :all)
● MultiDataset
└─ dimensionalities: (0, 1)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
- Modality 2 / 2
└─ dimensionality: 1
2×1 SubDataFrame
Row │ stat1
│ Array…
─────┼───────────────────────────────────
1 │ [0.841471, 0.909297, 0.14112, -0…
2 │ [0.540302, -0.416147, -0.989992,…
julia> insertmodality!(md, 2, DataFrame(:age => [30, 9]))
● MultiDataset
└─ dimensionalities: (1, 0)
- Modality 1 / 2
└─ dimensionality: 1
2×1 SubDataFrame
Row │ stat1
│ Array…
─────┼───────────────────────────────────
1 │ [0.841471, 0.909297, 0.14112, -0…
2 │ [0.540302, -0.416147, -0.989992,…
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 30
2 │ 9
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
julia> md.data
2×3 DataFrame
Row │ name age stat1
│ String Int64 Array…
─────┼──────────────────────────────────────────────────
1 │ Python 30 [0.841471, 0.909297, 0.14112, -0…
2 │ Julia 9 [0.540302, -0.416147, -0.989992,…
or, adding an existing variable:
julia> df = DataFrame(
:name => ["Python", "Julia"],
:stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]
)
2×2 DataFrame
Row │ name stat1
│ String Array…
─────┼───────────────────────────────────────────
1 │ Python [0.841471, 0.909297, 0.14112, -0…
2 │ Julia [0.540302, -0.416147, -0.989992,…
julia> md = MultiDataset([[2]], df)
● MultiDataset
└─ dimensionalities: (1,)
- Modality 1 / 1
└─ dimensionality: 1
2×1 SubDataFrame
Row │ stat1
│ Array…
─────┼───────────────────────────────────
1 │ [0.841471, 0.909297, 0.14112, -0…
2 │ [0.540302, -0.416147, -0.989992,…
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
julia> insertmodality!(md, DataFrame(:age => [30, 9]), [1])
● MultiDataset
└─ dimensionalities: (1, 0)
- Modality 1 / 2
└─ dimensionality: 1
2×1 SubDataFrame
Row │ stat1
│ Array…
─────┼───────────────────────────────────
1 │ [0.841471, 0.909297, 0.14112, -0…
2 │ [0.540302, -0.416147, -0.989992,…
- Modality 2 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ age name
│ Int64 String
─────┼───────────────
1 │ 30 Python
2 │ 9 Julia
MultiData.keeponlymodalities! (Method)
TODO
MultiData.modality (Method)
modality(md, i)
Return the i-th modality of a multimodal dataset.
modality(md, indices)
Return a Vector containing the modalities at indices of a multimodal dataset.
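A minimal usage sketch of both call forms, assuming the DataFrames and MultiData packages are installed. The names df, md, mod2, and mods are illustrative.

```julia
using DataFrames, MultiData

df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'])
md = MultiDataset([[1, 2], [3]], df)

mod2 = modality(md, 2)        # a single modality (the one holding :sex)
mods = modality(md, [1, 2])   # a Vector with modalities 1 and 2

println(names(mod2))          # column names of the second modality
println(length(mods))         # number of modalities returned
```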
MultiData.nmodalities (Method)
nmodalities(md)
Return the number of modalities of a multimodal dataset.
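A minimal usage sketch, assuming the DataFrames and MultiData packages are installed. It uses nmodalities as a loop bound together with nvariables(md, i), which is documented further down this page. The names df and md are illustrative.

```julia
using DataFrames, MultiData

df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'])
md = MultiDataset([[1, 2], [3]], df)

n = nmodalities(md)

# Visit every modality by index and report how many variables it holds.
for i in 1:n
    println("modality ", i, " has ", nvariables(md, i), " variable(s)")
end
```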
MultiData.removemodality! (Method)
removemodality!(md, indices)
removemodality!(md, index)
Remove the modality at index (or the modalities at indices) from a multimodal dataset, and return the dataset. The variables of a removed modality stay in the dataset; those that no longer belong to any modality become spare variables.
Note: to completely remove a modality and all variables in it use dropmodalities! instead.
Arguments
md is a MultiDataset;
index is an Integer that indicates which modality to remove from the multimodal dataset;
indices is an AbstractVector{Integer} that indicates the modalities to remove from the multimodal dataset.
Examples
julia> df = DataFrame(:name => ["Python", "Julia"],
:age => [25, 26],
:sex => ['M', 'F'],
:height => [180, 175],
:weight => [80, 60])
2×5 DataFrame
Row │ name age sex height weight
│ String Int64 Char Int64 Int64
─────┼─────────────────────────────────────
1 │ Python 25 M 180 80
2 │ Julia 26 F 175 60
julia> md = MultiDataset([[1, 2],[3],[4],[5]], df)
● MultiDataset
└─ dimensionalities: (0, 0, 0, 0)
- Modality 1 / 4
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 4
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
- Modality 3 / 4
└─ dimensionality: 0
2×1 SubDataFrame
Row │ height
│ Int64
─────┼────────
1 │ 180
2 │ 175
- Modality 4 / 4
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 60
julia> removemodality!(md, [3])
● MultiDataset
└─ dimensionalities: (0, 0, 0)
- Modality 1 / 3
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 3
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
- Modality 3 / 3
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 60
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ height
│ Int64
─────┼────────
1 │ 180
2 │ 175
julia> removemodality!(md, [1,2])
● MultiDataset
└─ dimensionalities: (0,)
- Modality 1 / 1
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 60
- Spare variables
└─ dimensionality: 0
2×4 SubDataFrame
Row │ name age sex height
│ String Int64 Char Int64
─────┼─────────────────────────────
1 │ Python 25 M 180
2 │ Julia 26 F 175
MultiData.removevariable_frommodality! (Method)
removevariable_frommodality!(md, i_modality, var_indices)
removevariable_frommodality!(md, i_modality, var_index)
removevariable_frommodality!(md, i_modality, var_name)
removevariable_frommodality!(md, i_modality, var_names)
Remove the variable at index var_index from the modality at index i_modality in a multimodal dataset, and return the dataset itself.
Instead of var_index, the variable name can be used. Multiple variables can be removed from the modality at once by passing a Vector of Symbols (for names) or a Vector of integers (for indices) as the last argument.
Note: when all variables are removed from a modality, the modality itself is removed.
Arguments
md is a MultiDataset;
i_modality is an Integer indicating the modality from which the variable(s) will be removed;
var_index is an Integer that indicates the index of the variable to remove from a specific modality of the multimodal dataset;
var_indices is an AbstractVector{Integer} indicating the indices of the variables to remove from a specific modality of the multimodal dataset;
var_name is a Symbol indicating the name of the variable to remove from a specific modality of the multimodal dataset;
var_names is an AbstractVector{Symbol} indicating the names of the variables to remove from a specific modality of the multimodal dataset.
Examples
julia> df = DataFrame(:name => ["Python", "Julia"],
:age => [25, 26],
:sex => ['M', 'F'],
:height => [180, 175],
:weight => [80, 60])
2×5 DataFrame
Row │ name age sex height weight
│ String Int64 Char Int64 Int64
─────┼─────────────────────────────────────
1 │ Python 25 M 180 80
2 │ Julia 26 F 175 60
julia> md = MultiDataset([[1,2,4],[2,3,4],[5]], df)
● MultiDataset
└─ dimensionalities: (0, 0, 0)
- Modality 1 / 3
└─ dimensionality: 0
2×3 SubDataFrame
Row │ name age height
│ String Int64 Int64
─────┼───────────────────────
1 │ Python 25 180
2 │ Julia 26 175
- Modality 2 / 3
└─ dimensionality: 0
2×3 SubDataFrame
Row │ age sex height
│ Int64 Char Int64
─────┼─────────────────────
1 │ 25 M 180
2 │ 26 F 175
- Modality 3 / 3
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 60
julia> removevariable_frommodality!(md, 3, 5)
[ Info: Variable 5 was last variable of modality 3: removing modality
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×3 SubDataFrame
Row │ name age height
│ String Int64 Int64
─────┼───────────────────────
1 │ Python 25 180
2 │ Julia 26 175
- Modality 2 / 2
└─ dimensionality: 0
2×3 SubDataFrame
Row │ age sex height
│ Int64 Char Int64
─────┼─────────────────────
1 │ 25 M 180
2 │ 26 F 175
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 60
julia> removevariable_frommodality!(md, 1, :age)
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name height
│ String Int64
─────┼────────────────
1 │ Python 180
2 │ Julia 175
- Modality 2 / 2
└─ dimensionality: 0
2×3 SubDataFrame
Row │ age sex height
│ Int64 Char Int64
─────┼─────────────────────
1 │ 25 M 180
2 │ 26 F 175
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 60
julia> removevariable_frommodality!(md, 2, [3,4])
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name height
│ String Int64
─────┼────────────────
1 │ Python 180
2 │ Julia 175
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 25
2 │ 26
- Spare variables
└─ dimensionality: 0
2×2 SubDataFrame
Row │ sex weight
│ Char Int64
─────┼──────────────
1 │ M 80
2 │ F 60
julia> removevariable_frommodality!(md, 1, [:name,:height])
[ Info: Variable 4 was last variable of modality 1: removing modality
● MultiDataset
└─ dimensionalities: (0,)
- Modality 1 / 1
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 25
2 │ 26
- Spare variables
└─ dimensionality: 0
2×4 SubDataFrame
Row │ name sex height weight
│ String Char Int64 Int64
─────┼──────────────────────────────
1 │ Python M 180 80
2 │ Julia F 175 60
Variables
MultiData.dropsparevariables! (Method)
dropsparevariables!(md)
Drop all variables that are not contained in any of the modalities in a multimodal dataset.
Arguments
md is a MultiDataset, the structure from which the spare variables will be dropped.
Examples
julia> md = MultiDataset([[1]], DataFrame(:age => [30, 9], :name => ["Python", "Julia"]))
● MultiDataset
└─ dimensionalities: (0,)
- Modality 1 / 1
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 30
2 │ 9
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
julia> dropsparevariables!(md)
2×1 DataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
MultiData.dropvariables! (Method)
dropvariables!(md, i)
dropvariables!(md, variable_name)
dropvariables!(md, indices)
dropvariables!(md, variable_names)
dropvariables!(md, i_modality, indices)
dropvariables!(md, i_modality, variable_names)
Drop the i-th variable (or the variables given by name, by a vector of indices, or by a vector of names) from a multimodal dataset, and return the dataset itself.
Arguments
md is a MultiDataset;
i is an Integer that indicates the index of the variable to drop;
variable_name is a Symbol that indicates the variable to drop;
indices is an AbstractVector{Integer} that indicates the indices of the variables to drop;
variable_names is an AbstractVector{Symbol} that indicates the variables to drop;
i_modality is the index of a modality; if this argument is specified, indices are considered relative to the i_modality-th modality.
Examples
julia> md = MultiDataset([[1, 2],[3, 4, 5]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×3 SubDataFrame
Row │ sex height weight
│ Char Int64 Int64
─────┼──────────────────────
1 │ M 180 80
2 │ F 175 60
julia> dropvariables!(md, 4)
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ sex weight
│ Char Int64
─────┼──────────────
1 │ M 80
2 │ F 60
julia> dropvariables!(md, :name)
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 25
2 │ 26
- Modality 2 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ sex weight
│ Char Int64
─────┼──────────────
1 │ M 80
2 │ F 60
julia> dropvariables!(md, [1,3])
[ Info: Variable 1 was last variable of modality 1: removing modality
● MultiDataset
└─ dimensionalities: (0,)
- Modality 1 / 1
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
TODO: To be reviewed
MultiData.hasvariables (Method)
hasvariables(df, variable_name)
hasvariables(md, i_modality, variable_name)
hasvariables(md, variable_name)
hasvariables(df, variable_names)
hasvariables(md, i_modality, variable_names)
hasvariables(md, variable_names)
Check whether a multimodal dataset contains a variable named variable_name.
Instead of a single variable name, a Vector of names can be passed. In this case, the function returns true only if md contains all the specified variables.
Arguments
df is an AbstractDataFrame, one of the two structures in which the presence of the variable can be checked;
md is an AbstractMultiDataset, the other structure in which the presence of the variable can be checked;
variable_name is a Symbol indicating the variable whose existence is to be verified;
i_modality is an Integer indicating in which modality to look for the variable.
Examples
julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> hasvariables(md, :age)
true
julia> hasvariables(md.data, :name)
true
julia> hasvariables(md, :height)
false
julia> hasvariables(md, 1, :sex)
false
julia> hasvariables(md, 2, :sex)
true
julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> hasvariables(md, [:sex, :age])
true
julia> hasvariables(md, 1, [:sex])
false
julia> hasvariables(md, 2, [:sex])
true
julia> hasvariables(md.data, [:name, :sex])
true
MultiData.insertvariables! (Method)
insertvariables!(md, col, index, values)
insertvariables!(md, index, values)
insertvariables!(md, col, index, value)
insertvariables!(md, index, value)
Insert a variable with the given name (index) into a multimodal dataset.
Each inserted variable is added as a spare variable.
Arguments
md is an AbstractMultiDataset;
col is an Integer indicating the position at which to insert the new variable. If no col is passed, the new variable will be placed last in md's underlying dataframe structure;
index is a Symbol and denotes the name of the variable to insert. Duplicated variable names will be renamed to avoid conflicts: see the makeunique argument of insertcols! in the DataFrames documentation;
values is an AbstractVector that indicates the values for the newly inserted variable. The length of values should match ninstances(md);
value is a single value for the new variable. If a single value is passed as the last argument, it will be copied and used for each instance in the dataset.
Examples
julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> insertvariables!(md, :weight, [80, 75])
2×4 DataFrame
Row │ name age sex weight
│ String Int64 Char Int64
─────┼─────────────────────────────
1 │ Python 25 M 80
2 │ Julia 26 F 75
julia> md
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 75
julia> insertvariables!(md, 2, :height, 180)
2×5 DataFrame
Row │ name height age sex weight
│ String Int64 Int64 Char Int64
─────┼─────────────────────────────────────
1 │ Python 180 25 M 80
2 │ Julia 180 26 F 75
julia> insertvariables!(md, :hair, ["brown", "blonde"])
2×6 DataFrame
Row │ name height age sex weight hair
│ String Int64 Int64 Char Int64 String
─────┼─────────────────────────────────────────────
1 │ Python 180 25 M 80 brown
2 │ Julia 180 26 F 75 blonde
julia> md
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
- Spare variables
└─ dimensionality: 0
2×3 SubDataFrame
Row │ height weight hair
│ Int64 Int64 String
─────┼────────────────────────
1 │ 180 80 brown
2 │ 180 75 blonde
MultiData.keeponlyvariables! (Method)
keeponlyvariables!(md, indices)
keeponlyvariables!(md, variable_names)
Drop all variables that do not correspond to the indices in indices (or to the names in variable_names) from a multimodal dataset.
Note: if the dropped variables are contained in some modality they will also be removed from it; as a side effect, this can lead to the removal of modalities.
Arguments
md is a MultiDataset;
indices is an AbstractVector{Integer} that indicates which indices to keep in the multimodal dataset;
variable_names is an AbstractVector{Symbol} that indicates which variables to keep in the multimodal dataset.
Examples
julia> md = MultiDataset([[1, 2],[3, 4, 5],[5]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))
● MultiDataset
└─ dimensionalities: (0, 0, 0)
- Modality 1 / 3
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 3
└─ dimensionality: 0
2×3 SubDataFrame
Row │ sex height weight
│ Char Int64 Int64
─────┼──────────────────────
1 │ M 180 80
2 │ F 175 60
- Modality 3 / 3
└─ dimensionality: 0
2×1 SubDataFrame
Row │ weight
│ Int64
─────┼────────
1 │ 80
2 │ 60
julia> keeponlyvariables!(md, [1,3,4])
[ Info: Variable 5 was last variable of modality 3: removing modality
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
- Modality 2 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ sex height
│ Char Int64
─────┼──────────────
1 │ M 180
2 │ F 175
julia> keeponlyvariables!(md, [:name, :sex])
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
TODO: review
MultiData.nvariables (Method)
nvariables(md)
nvariables(md, i)
Return the number of variables in a multimodal dataset.
If an index i is passed as second argument, then the number of variables of the i-th modality is returned.
Alternatively, nvariables can be called on a single modality.
Arguments
md is a MultiDataset;
i (optional) is an Integer indicating the modality of the multimodal dataset whose number of variables you want to know.
Examples
julia> md = MultiDataset([[1],[2]], DataFrame(:age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 25
2 │ 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> nvariables(md)
2
julia> nvariables(md, 2)
1
julia> mod2 = modality(md, 2)
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> nvariables(mod2)
1
julia> md = MultiDataset([[1, 2],[3, 4, 5]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×3 SubDataFrame
Row │ sex height weight
│ Char Int64 Int64
─────┼──────────────────────
1 │ M 180 80
2 │ F 175 60
julia> nvariables(md)
5
julia> nvariables(md, 2)
3
julia> mod2 = modality(md,2)
2×3 SubDataFrame
Row │ sex height weight
│ Char Int64 Int64
─────┼──────────────────────
1 │ M 180 80
2 │ F 175 60
julia> nvariables(mod2)
3
MultiData.sparevariables (Method)
sparevariables(md)
Return the indices of all the variables that are not contained in any of the modalities of a multimodal dataset.
Arguments
md is a MultiDataset, the dataset whose spare-variable indices are returned.
Examples
julia> md = MultiDataset([[1],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 25
2 │ 26
julia> md.data
2×3 DataFrame
Row │ name age sex
│ String Int64 Char
─────┼─────────────────────
1 │ Python 25 M
2 │ Julia 26 F
julia> sparevariables(md)
1-element Vector{Int64}:
2
MultiData.variableindex (Method)
variableindex(df, variable_name)
variableindex(md, i_modality, variable_name)
variableindex(md, variable_name)
Return the index of the variable. When i_modality is passed, the function returns the index of the variable in the sub-dataframe of the modality identified by i_modality. It returns 0 when the variable is not contained in the modality identified by i_modality.
Arguments
df is an AbstractDataFrame;
md is an AbstractMultiDataset;
variable_name is a Symbol indicating the variable whose index you want to know;
i_modality is an Integer indicating the modality within which to look up the index of the variable.
Examples
julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×2 SubDataFrame
Row │ name age
│ String Int64
─────┼───────────────
1 │ Python 25
2 │ Julia 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> md.data
2×3 DataFrame
Row │ name age sex
│ String Int64 Char
─────┼─────────────────────
1 │ Python 25 M
2 │ Julia 26 F
julia> variableindex(md, :age)
2
julia> variableindex(md, :sex)
3
julia> variableindex(md, 1, :name)
1
julia> variableindex(md, 2, :name)
0
julia> variableindex(md, 2, :sex)
1
julia> variableindex(md.data, :age)
2
MultiData.variables — Method
variables(md, i)
Return the names, as Symbols, of the variables in a multimodal dataset.
When called on an object of type MultiDataset, a Dict is returned that maps each modality index to an AbstractVector{Symbol}.
Note: the order of the variable names is guaranteed to match the order of the variables in the modality.
If an index i is passed as second argument, the names of the variables of the i-th modality are returned as an AbstractVector.
Alternatively, variables can be called on a single modality.
Arguments
md is a MultiDataset;
i is an Integer indicating from which modality of the multimodal dataset to get the names of the variables.
Examples
julia> md = MultiDataset([[2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 25
2 │ 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
- Spare variables
└─ dimensionality: 0
2×1 SubDataFrame
Row │ name
│ String
─────┼────────
1 │ Python
2 │ Julia
julia> variables(md)
Dict{Integer, AbstractVector{Symbol}} with 2 entries:
2 => [:sex]
1 => [:age]
julia> variables(md, 2)
1-element Vector{Symbol}:
:sex
julia> variables(md, 1)
1-element Vector{Symbol}:
:age
julia> mod2 = modality(md, 2)
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> variables(mod2)
1-element Vector{Symbol}:
:sex
Instances
MultiData.deleteinstances! — Method
deleteinstances!(md, i)
Remove the i-th instance in a multimodal dataset, and return the dataset itself.
deleteinstances!(md, i_instances)
Remove the instances at i_instances in a multimodal dataset, and return the dataset itself.
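The two forms above can be sketched as follows. This is a minimal illustration rather than an excerpt from the package: it assumes that DataFrames and MultiData are installed, that ninstances is available after `using MultiData` (as in the other examples on this page), and the toy data is hypothetical.

```julia
using DataFrames, MultiData

# Hypothetical toy dataset: four instances, one variable per modality.
df = DataFrame(:name => ["Python", "Julia", "Rust", "Go"], :age => [25, 26, 27, 28])
md = MultiDataset([[1], [2]], df)

# Single-index form: remove the first instance.
deleteinstances!(md, 1)
n_after_single = ninstances(md)

# Vector form: remove the instances currently at positions 1 and 2.
deleteinstances!(md, [1, 2])
n_after_vector = ninstances(md)
```

Because instances are removed from the dataset itself, the indices passed to the second call refer to the positions left after the first removal.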
MultiData.instance — Method
instance(md, i)
Return the i-th instance in a multimodal dataset.
instance(md, i_modality, i_instance)
Return the i_instance-th instance in a multimodal dataset with only variables from the i_modality-th modality.
instance(md, i_instances)
Return the instances at i_instances in a multimodal dataset.
instance(md, i_modality, i_instances)
Return the instances at i_instances in a multimodal dataset with only variables from the i_modality-th modality.
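The four call forms can be compared side by side. The following is a hedged sketch, assuming DataFrames and MultiData are installed; the dataset is a hypothetical one, modeled on the examples used elsewhere on this page, and the variable names on the left-hand side are illustrative only.

```julia
using DataFrames, MultiData

# Hypothetical dataset: modality 1 holds :name and :age, modality 2 holds :sex.
df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'])
md = MultiDataset([[1, 2], [3]], df)

# Second instance, with all the variables of the dataset.
row = instance(md, 2)

# Second instance, restricted to the variables of modality 2.
row_mod2 = instance(md, 2, 2)

# Several instances at once, with all the variables.
rows = instance(md, [1, 2])

# Several instances at once, restricted to the variables of modality 1.
rows_mod1 = instance(md, 1, [1, 2])
```

Note the argument order: when a modality is given, i_modality comes before the instance index or indices.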
MultiData.keeponlyinstances! — Method
keeponlyinstances!(md, i_instances)
Remove from a multimodal dataset all instances whose index does not appear in i_instances.
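As a brief illustration of the description above, the sketch below keeps two of four instances. It assumes DataFrames and MultiData are installed and that ninstances is available after `using MultiData`; the data is hypothetical.

```julia
using DataFrames, MultiData

# Hypothetical toy dataset with four instances.
df = DataFrame(:name => ["Python", "Julia", "Rust", "Go"], :age => [25, 26, 27, 28])
md = MultiDataset([[1], [2]], df)

# Keep only the second and fourth instances; every other instance is removed.
keeponlyinstances!(md, [2, 4])
n_kept = ninstances(md)
```

This is the complement of deleteinstances!: the indices listed are the ones that survive.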
MultiData.pushinstances! — Method
pushinstances!(md, instance)
Add an instance to a multimodal dataset, and return the dataset itself.
The instance can be a DataFrameRow or an AbstractVector, but in both cases the number and type of variables should match those of the dataset.
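Both accepted instance types can be sketched as follows. This is a minimal example under stated assumptions: DataFrames and MultiData are installed, ninstances is available after `using MultiData`, and the data, including the helper dataframe `other`, is hypothetical.

```julia
using DataFrames, MultiData

# Hypothetical toy dataset with two variables: a String and an Int.
df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26])
md = MultiDataset([[1], [2]], df)

# AbstractVector form: one value per variable, in the dataframe's column order.
pushinstances!(md, ["Rust", 27])

# DataFrameRow form: here the row is taken from another dataframe
# that has the same variables and types.
other = DataFrame(:name => ["Go"], :age => [28])
pushinstances!(md, other[1, :])

n_total = ninstances(md)
```

In both cases the new instance supplies a value for every variable of the dataset, not only for the variables that belong to a modality.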
SoleBase.ninstances — Method
ninstances(md)
Return the number of instances in a multimodal dataset.
Examples
julia> md = MultiDataset([[1],[2]],DataFrame(:age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
└─ dimensionalities: (0, 0)
- Modality 1 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ age
│ Int64
─────┼───────
1 │ 25
2 │ 26
- Modality 2 / 2
└─ dimensionality: 0
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> mod2 = modality(md, 2)
2×1 SubDataFrame
Row │ sex
│ Char
─────┼──────
1 │ M
2 │ F
julia> ninstances(md) == ninstances(mod2) == 2
true