Manipulation

Modalities

MultiData.addmodality! - Method
addmodality!(md, indices)
addmodality!(md, index)
addmodality!(md, variable_names)
addmodality!(md, variable_name)

Create a new modality in a multimodal dataset using variables at indices or index, and return the dataset itself.

Variable name(s) can be used instead of indices or a single index.

Note: to add a new modality with new variables see insertmodality!.

Arguments

  • md is a MultiDataset;
  • indices is an AbstractVector{Integer} that indicates which indices of the multimodal dataset's corresponding dataframe to add to the new modality;
  • index is an Integer that indicates the index of the multimodal dataset's corresponding dataframe to add to the new modality;
  • variable_names is an AbstractVector{Symbol} that indicates which variables of the multimodal dataset's corresponding dataframe to add to the new modality;
  • variable_name is a Symbol that indicates the variable of the multimodal dataset's corresponding dataframe to add to the new modality.

Examples

julia> df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])
2×5 DataFrame
 Row │ name    age    sex   height  weight
     │ String  Int64  Char  Int64   Int64
─────┼─────────────────────────────────────
   1 │ Python     25  M        180      80
   2 │ Julia      26  F        175      60

julia> md = MultiDataset([[1]], df)
● MultiDataset
   └─ dimensionalities: (0,)
- Modality 1 / 1
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
- Spare variables
   └─ dimensionality: 0
2×4 SubDataFrame
 Row │ age    sex   height  weight
     │ Int64  Char  Int64   Int64
─────┼─────────────────────────────
   1 │    25  M        180      80
   2 │    26  F        175      60


julia> addmodality!(md, [:age, :sex])
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
- Modality 2 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ age    sex
     │ Int64  Char
─────┼─────────────
   1 │    25  M
   2 │    26  F
- Spare variables
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ height  weight
     │ Int64   Int64
─────┼────────────────
   1 │    180      80
   2 │    175      60


julia> addmodality!(md, 5)
● MultiDataset
   └─ dimensionalities: (0, 0, 0)
- Modality 1 / 3
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
- Modality 2 / 3
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ age    sex
     │ Int64  Char
─────┼─────────────
   1 │    25  M
   2 │    26  F
- Modality 3 / 3
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     60
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ height
     │ Int64
─────┼────────
   1 │    180
   2 │    175
MultiData.addvariable_tomodality! - Method
addvariable_tomodality!(md, i_modality, var_index)
addvariable_tomodality!(md, i_modality, var_indices)
addvariable_tomodality!(md, i_modality, var_name)
addvariable_tomodality!(md, i_modality, var_names)

Add the variable at index var_index to the modality at index i_modality in a multimodal dataset, and return the dataset. The variable name var_name can be used instead of var_index. Multiple variables can be added to the modality at once using var_indices or var_names.

Note: The function does not allow you to add a variable to a new modality, but only to add it to an existing modality. To add a new modality use addmodality! instead.

Arguments

  • md is a MultiDataset;
  • i_modality is an Integer indicating the modality to which the variable(s) will be added;
  • var_index is an Integer that indicates the index of the variable to add to a specific modality of the multimodal dataset;
  • var_indices is an AbstractVector{Integer} indicating the indices of the variables to add to a specific modality of the multimodal dataset;
  • var_name is a Symbol indicating the name of the variable to add to a specific modality of the multimodal dataset;
  • var_names is an AbstractVector{Symbol} indicating the names of the variables to add to a specific modality of the multimodal dataset.

Examples

julia> df = DataFrame(:name => ["Python", "Julia"],
                      :age => [25, 26],
                      :sex => ['M', 'F'],
                      :height => [180, 175],
                      :weight => [80, 60])
2×5 DataFrame
 Row │ name    age    sex   height  weight
     │ String  Int64  Char  Int64   Int64
─────┼─────────────────────────────────────
   1 │ Python     25  M        180      80
   2 │ Julia      26  F        175      60

julia> md = MultiDataset([[1, 2],[3]], df)
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F
- Spare variables
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ height  weight
     │ Int64   Int64
─────┼────────────────
   1 │    180      80
   2 │    175      60

julia> addvariable_tomodality!(md, 1, [4,5])
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×4 SubDataFrame
 Row │ name    age    height  weight
     │ String  Int64  Int64   Int64
─────┼───────────────────────────────
   1 │ Python     25     180      80
   2 │ Julia      26     175      60
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

julia> addvariable_tomodality!(md, 2, [:name,:weight])
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×4 SubDataFrame
 Row │ name    age    height  weight
     │ String  Int64  Int64   Int64
─────┼───────────────────────────────
   1 │ Python     25     180      80
   2 │ Julia      26     175      60
- Modality 2 / 2
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ sex   name    weight
     │ Char  String  Int64
─────┼──────────────────────
   1 │ M     Python      80
   2 │ F     Julia       60
MultiData.dropmodalities! - Method
dropmodalities!(md, indices)
dropmodalities!(md, index)

Remove the modality at index (or the modalities at indices) from a multimodal dataset, dropping all variables in it, and return the dataset itself.

Note: if the dropped variables are also contained in other modalities, they are removed from those modalities as well. This can lead to the removal of modalities other than the ones requested.

If the intention is to remove a modality without dropping the variables use removemodality! instead.

Arguments

  • md is a MultiDataset;
  • index is an Integer indicating the index of the modality to drop;
  • indices is an AbstractVector{Integer} indicating the indices of the modalities to drop.

Examples

julia> df = DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60])
2×5 DataFrame
 Row │ name    age    sex   height  weight
     │ String  Int64  Char  Int64   Int64
─────┼─────────────────────────────────────
   1 │ Python     25  M        180      80
   2 │ Julia      26  F        175      60

julia> md = MultiDataset([[1, 2],[3,4],[5],[2,3]], df)
● MultiDataset
   └─ dimensionalities: (0, 0, 0, 0)
- Modality 1 / 4
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 4
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ sex   height
     │ Char  Int64
─────┼──────────────
   1 │ M        180
   2 │ F        175
- Modality 3 / 4
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     60
- Modality 4 / 4
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ age    sex
     │ Int64  Char
─────┼─────────────
   1 │    25  M
   2 │    26  F

julia> dropmodalities!(md, [2,3])
[ Info: Variable 3 was last variable of modality 2: removing modality
[ Info: Variable 3 was last variable of modality 2: removing modality
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    25
   2 │    26

julia> dropmodalities!(md, 2)
[ Info: Variable 2 was last variable of modality 2: removing modality
● MultiDataset
   └─ dimensionalities: (0,)
- Modality 1 / 1
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
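
The difference between dropmodalities! and removemodality! can be pictured with a toy model in plain Julia. This is only a sketch under assumptions, not MultiData's implementation: ToyDataset, toy_spare, toy_remove, and toy_drop are hypothetical names, and a dataset is reduced to a list of column names plus a grouping of column indices.

```julia
# Toy model only: hypothetical names, not part of MultiData.
struct ToyDataset
    columns::Vector{Symbol}        # variable names, in column order
    grouping::Vector{Vector{Int}}  # one vector of column indices per modality
end

# Columns that belong to no modality ("spare variables").
function toy_spare(ds::ToyDataset)
    used = reduce(vcat, ds.grouping; init = Int[])
    return setdiff(1:length(ds.columns), used)
end

# Analogue of removemodality!: forget the groupings, keep every column.
function toy_remove(ds::ToyDataset, indices::Vector{Int})
    kept = Vector{Int}[g for (k, g) in enumerate(ds.grouping) if !(k in indices)]
    return ToyDataset(copy(ds.columns), kept)
end

# Analogue of dropmodalities!: delete the columns themselves, purge them
# from every modality, discard emptied modalities, and renumber the rest.
function toy_drop(ds::ToyDataset, indices::Vector{Int})
    gone = unique(reduce(vcat, ds.grouping[indices]; init = Int[]))
    kept = setdiff(1:length(ds.columns), gone)
    renumber = Dict(old => new for (new, old) in enumerate(kept))
    grouping = Vector{Int}[Int[renumber[v] for v in g if !(v in gone)]
                           for g in ds.grouping]
    return ToyDataset(ds.columns[kept], filter(!isempty, grouping))
end

ds = ToyDataset([:name, :age, :sex, :height, :weight],
                [[1, 2], [3, 4], [5], [2, 3]])
removed = toy_remove(ds, [2, 3])
dropped = toy_drop(ds, [2, 3])
```

In this sketch toy_drop(ds, [2, 3]) leaves only the columns name and age, grouped as [[1, 2], [2]], which mirrors the two modalities shown in the dropmodalities! example. toy_remove(ds, [2, 3]) keeps all five columns and leaves height and weight as spare.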
MultiData.insertmodality! - Function
insertmodality!(md, col, new_modality, existing_variables)
insertmodality!(md, new_modality, existing_variables)

Insert new_modality as a new modality into a multimodal dataset, and return the dataset. Existing variables can be added to the new modality at the same time by passing their indices (or names) as existing_variables. If col is specified, the new variables are inserted starting at column index col.

Arguments

  • md is a MultiDataset;
  • col is an Integer indicating the column in which to insert the columns of new_modality;
  • new_modality is an AbstractDataFrame which will be added to the multimodal dataset as a sub-dataframe of a new modality;
  • existing_variables is an AbstractVector{Integer} or AbstractVector{Symbol}. It indicates which variables of the multimodal dataset internal dataframe structure to insert in the new modality.

Examples

julia> df = DataFrame(
           :name => ["Python", "Julia"],
           :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]
       )
2×2 DataFrame
 Row │ name    stat1
     │ String  Array…
─────┼───────────────────────────────────────────
   1 │ Python  [0.841471, 0.909297, 0.14112, -0…
   2 │ Julia   [0.540302, -0.416147, -0.989992,…

julia> md = MultiDataset(df; group = :all)
● MultiDataset
   └─ dimensionalities: (0, 1)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
- Modality 2 / 2
   └─ dimensionality: 1
2×1 SubDataFrame
 Row │ stat1
     │ Array…
─────┼───────────────────────────────────
   1 │ [0.841471, 0.909297, 0.14112, -0…
   2 │ [0.540302, -0.416147, -0.989992,…

julia> insertmodality!(md, DataFrame(:age => [30, 9]))
● MultiDataset
   └─ dimensionalities: (0, 1, 0)
- Modality 1 / 3
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
- Modality 2 / 3
   └─ dimensionality: 1
2×1 SubDataFrame
 Row │ stat1
     │ Array…
─────┼───────────────────────────────────
   1 │ [0.841471, 0.909297, 0.14112, -0…
   2 │ [0.540302, -0.416147, -0.989992,…
- Modality 3 / 3
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    30
   2 │     9

julia> md.data
2×3 DataFrame
 Row │ name    stat1                              age
     │ String  Array…                             Int64
─────┼──────────────────────────────────────────────────
   1 │ Python  [0.841471, 0.909297, 0.14112, -0…     30
   2 │ Julia   [0.540302, -0.416147, -0.989992,…      9

or, selecting the column:

julia> df = DataFrame(
           :name => ["Python", "Julia"],
           :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]
       )
2×2 DataFrame
 Row │ name    stat1
     │ String  Array…
─────┼───────────────────────────────────────────
   1 │ Python  [0.841471, 0.909297, 0.14112, -0…
   2 │ Julia   [0.540302, -0.416147, -0.989992,…

julia> md = MultiDataset(df; group = :all)
● MultiDataset
   └─ dimensionalities: (0, 1)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
- Modality 2 / 2
   └─ dimensionality: 1
2×1 SubDataFrame
 Row │ stat1
     │ Array…
─────┼───────────────────────────────────
   1 │ [0.841471, 0.909297, 0.14112, -0…
   2 │ [0.540302, -0.416147, -0.989992,…

julia> insertmodality!(md, 2, DataFrame(:age => [30, 9]))
● MultiDataset
   └─ dimensionalities: (1, 0)
- Modality 1 / 2
   └─ dimensionality: 1
2×1 SubDataFrame
 Row │ stat1
     │ Array…
─────┼───────────────────────────────────
   1 │ [0.841471, 0.909297, 0.14112, -0…
   2 │ [0.540302, -0.416147, -0.989992,…
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    30
   2 │     9
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia

julia> md.data
2×3 DataFrame
 Row │ name    age    stat1
     │ String  Int64  Array…
─────┼──────────────────────────────────────────────────
   1 │ Python     30  [0.841471, 0.909297, 0.14112, -0…
   2 │ Julia       9  [0.540302, -0.416147, -0.989992,…

or, adding an existing variable:

julia> df = DataFrame(
           :name => ["Python", "Julia"],
           :stat1 => [[sin(i) for i in 1:50000], [cos(i) for i in 1:50000]]
       )
2×2 DataFrame
 Row │ name    stat1
     │ String  Array…
─────┼───────────────────────────────────────────
   1 │ Python  [0.841471, 0.909297, 0.14112, -0…
   2 │ Julia   [0.540302, -0.416147, -0.989992,…

julia> md = MultiDataset([[2]], df)
● MultiDataset
   └─ dimensionalities: (1,)
- Modality 1 / 1
   └─ dimensionality: 1
2×1 SubDataFrame
 Row │ stat1
     │ Array…
─────┼───────────────────────────────────
   1 │ [0.841471, 0.909297, 0.14112, -0…
   2 │ [0.540302, -0.416147, -0.989992,…
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia


julia> insertmodality!(md, DataFrame(:age => [30, 9]); existing_variables = [1])
● MultiDataset
   └─ dimensionalities: (1, 0)
- Modality 1 / 2
   └─ dimensionality: 1
2×1 SubDataFrame
 Row │ stat1
     │ Array…
─────┼───────────────────────────────────
   1 │ [0.841471, 0.909297, 0.14112, -0…
   2 │ [0.540302, -0.416147, -0.989992,…
- Modality 2 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ age    name
     │ Int64  String
─────┼───────────────
   1 │    30  Python
   2 │     9  Julia
MultiData.modality - Method
modality(md, i)

Return the i-th modality of a multimodal dataset.

modality(md, indices)

Return a Vector of modalities at indices of a multimodal dataset.

MultiData.removemodality! - Method
removemodality!(md, indices)
removemodality!(md, index)

Remove the modality at index (or the modalities at indices) from a multimodal dataset, and return the dataset. The variables of a removed modality stay in the dataset.

Note: to completely remove a modality and all variables in it use dropmodalities! instead.

Arguments

  • md is a MultiDataset;
  • index is an Integer that indicates which modality to remove from the multimodal dataset;
  • indices is an AbstractVector{Integer} that indicates the modalities to remove from the multimodal dataset.

Examples

julia> df = DataFrame(:name => ["Python", "Julia"],
                      :age => [25, 26],
                      :sex => ['M', 'F'],
                      :height => [180, 175],
                      :weight => [80, 60])
2×5 DataFrame
 Row │ name    age    sex   height  weight
     │ String  Int64  Char  Int64   Int64
─────┼─────────────────────────────────────
   1 │ Python     25  M        180      80
   2 │ Julia      26  F        175      60

julia> md = MultiDataset([[1, 2],[3],[4],[5]], df)
● MultiDataset
   └─ dimensionalities: (0, 0, 0, 0)
- Modality 1 / 4
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 4
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F
- Modality 3 / 4
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ height
     │ Int64
─────┼────────
   1 │    180
   2 │    175
- Modality 4 / 4
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     60

julia> removemodality!(md, [3])
● MultiDataset
   └─ dimensionalities: (0, 0, 0)
- Modality 1 / 3
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 3
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F
- Modality 3 / 3
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     60
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ height
     │ Int64
─────┼────────
   1 │    180
   2 │    175

julia> removemodality!(md, [1,2])
● MultiDataset
   └─ dimensionalities: (0,)
- Modality 1 / 1
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     60
- Spare variables
   └─ dimensionality: 0
2×4 SubDataFrame
 Row │ name    age    sex   height
     │ String  Int64  Char  Int64
─────┼─────────────────────────────
   1 │ Python     25  M        180
   2 │ Julia      26  F        175
MultiData.removevariable_frommodality! - Method
removevariable_frommodality!(md, i_modality, var_indices)
removevariable_frommodality!(md, i_modality, var_index)
removevariable_frommodality!(md, i_modality, var_name)
removevariable_frommodality!(md, i_modality, var_names)

Remove the variable at index var_index from the modality at index i_modality in a multimodal dataset, and return the dataset itself.

The variable name can be used instead of var_index. Multiple variables can be removed from the modality at once by passing a Vector of Symbols (for names) or a Vector of integers (for indices) as the last argument.

Note: when all variables are removed from a modality, the modality itself is removed.

Arguments

  • md is a MultiDataset;
  • i_modality is an Integer indicating the modality from which the variable(s) will be dropped;
  • var_index is an Integer that indicates the index of the variable to drop from a specific modality of the multimodal dataset;
  • var_indices is an AbstractVector{Integer} indicating the indices of the variables to drop from a specific modality of the multimodal dataset;
  • var_name is a Symbol indicating the name of the variable to drop from a specific modality of the multimodal dataset;
  • var_names is an AbstractVector{Symbol} indicating the names of the variables to drop from a specific modality of the multimodal dataset.

Examples

julia> df = DataFrame(:name => ["Python", "Julia"],
                      :age => [25, 26],
                      :sex => ['M', 'F'],
                      :height => [180, 175],
                      :weight => [80, 60])
2×5 DataFrame
 Row │ name    age    sex   height  weight
     │ String  Int64  Char  Int64   Int64
─────┼─────────────────────────────────────
   1 │ Python     25  M        180      80
   2 │ Julia      26  F        175      60

julia> md = MultiDataset([[1,2,4],[2,3,4],[5]], df)
● MultiDataset
   └─ dimensionalities: (0, 0, 0)
- Modality 1 / 3
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ name    age    height
     │ String  Int64  Int64
─────┼───────────────────────
   1 │ Python     25     180
   2 │ Julia      26     175
- Modality 2 / 3
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ age    sex   height
     │ Int64  Char  Int64
─────┼─────────────────────
   1 │    25  M        180
   2 │    26  F        175
- Modality 3 / 3
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     60

julia> removevariable_frommodality!(md, 3, 5)
[ Info: Variable 5 was last variable of modality 3: removing modality
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ name    age    height
     │ String  Int64  Int64
─────┼───────────────────────
   1 │ Python     25     180
   2 │ Julia      26     175
- Modality 2 / 2
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ age    sex   height
     │ Int64  Char  Int64
─────┼─────────────────────
   1 │    25  M        180
   2 │    26  F        175
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     60

julia> removevariable_frommodality!(md, 1, :age)
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    height
     │ String  Int64
─────┼────────────────
   1 │ Python     180
   2 │ Julia      175
- Modality 2 / 2
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ age    sex   height
     │ Int64  Char  Int64
─────┼─────────────────────
   1 │    25  M        180
   2 │    26  F        175
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     60

julia> removevariable_frommodality!(md, 2, [3,4])
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    height
     │ String  Int64
─────┼────────────────
   1 │ Python     180
   2 │ Julia      175
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    25
   2 │    26
- Spare variables
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ sex   weight
     │ Char  Int64
─────┼──────────────
   1 │ M         80
   2 │ F         60

julia> removevariable_frommodality!(md, 1, [:name,:height])
[ Info: Variable 4 was last variable of modality 1: removing modality
● MultiDataset
   └─ dimensionalities: (0,)
- Modality 1 / 1
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    25
   2 │    26
- Spare variables
   └─ dimensionality: 0
2×4 SubDataFrame
 Row │ name    sex   height  weight
     │ String  Char  Int64   Int64
─────┼──────────────────────────────
   1 │ Python  M        180      80
   2 │ Julia   F        175      60

Variables

MultiData.dropsparevariables! - Method
dropsparevariables!(md)

Drop all variables that are not contained in any of the modalities in a multimodal dataset.

Arguments

  • md is the MultiDataset from which the spare variables will be dropped.

Examples

julia> md = MultiDataset([[1]], DataFrame(:age => [30, 9], :name => ["Python", "Julia"]))
● MultiDataset
   └─ dimensionalities: (0,)
- Modality 1 / 1
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    30
   2 │     9
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia


julia> dropsparevariables!(md)
2×1 DataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
MultiData.dropvariables! - Method
dropvariables!(md, i)
dropvariables!(md, variable_name)
dropvariables!(md, indices)
dropvariables!(md, variable_names)
dropvariables!(md, i_modality, indices)
dropvariables!(md, i_modality, variable_names)

Drop the i-th variable from a multimodal dataset, and return the dataset itself.

Arguments

  • md is a MultiDataset;
  • i is an Integer that indicates the index of the variable to drop;
  • variable_name is a Symbol that indicates the variable to drop;
  • indices is an AbstractVector{Integer} that indicates the indices of the variables to drop;
  • variable_names is an AbstractVector{Symbol} that indicates the variables to drop;
  • i_modality is the index of a modality; if this argument is specified, indices are considered relative to the i_modality-th modality.

Examples

julia> md = MultiDataset([[1, 2],[3, 4, 5]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ sex   height  weight
     │ Char  Int64   Int64
─────┼──────────────────────
   1 │ M        180      80
   2 │ F        175      60

julia> dropvariables!(md, 4)
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ sex   weight
     │ Char  Int64
─────┼──────────────
   1 │ M         80
   2 │ F         60

julia> dropvariables!(md, :name)
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    25
   2 │    26
- Modality 2 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ sex   weight
     │ Char  Int64
─────┼──────────────
   1 │ M         80
   2 │ F         60

julia> dropvariables!(md, [1,3])
[ Info: Variable 1 was last variable of modality 1: removing modality
● MultiDataset
   └─ dimensionalities: (0,)
- Modality 1 / 1
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

TODO: To be reviewed

MultiData.hasvariables - Method
hasvariables(df, variable_name)
hasvariables(md, i_modality, variable_name)
hasvariables(md, variable_name)
hasvariables(df, variable_names)
hasvariables(md, i_modality, variable_names)
hasvariables(md, variable_names)

Check whether a multimodal dataset contains a variable named variable_name.

Instead of a single variable name a Vector of names can be passed. If this is the case, this function will return true only if md contains all the specified variables.

Arguments

  • df is an AbstractDataFrame, one of the two structures in which the presence of the variable can be checked;
  • md is an AbstractMultiDataset, the other structure in which the presence of the variable can be checked;
  • variable_name is a Symbol indicating the variable whose presence is to be checked;
  • i_modality is an Integer indicating in which modality to look for the variable.

Examples

julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

julia> hasvariables(md, :age)
true

julia> hasvariables(md.data, :name)
true

julia> hasvariables(md, :height)
false

julia> hasvariables(md, 1, :sex)
false

julia> hasvariables(md, 2, :sex)
true

julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

julia> hasvariables(md, [:sex, :age])
true

julia> hasvariables(md, 1, [:sex])
false

julia> hasvariables(md, 2, [:sex])
true

julia> hasvariables(md.data, [:name, :sex])
true
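
The "all names must be present" rule can be sketched in plain Julia. This is a toy model under assumptions, not MultiData's implementation: columns, grouping, as_names, and toy_has are hypothetical names standing in for the dataset's variables, its modality grouping, and the check itself.

```julia
# Toy model only: hypothetical names, not part of MultiData.
columns = [:name, :age, :sex]   # variables of the underlying table
grouping = [[1, 2], [3]]        # column indices of each modality

# A single name is treated as a one-element collection.
as_names(name::Symbol) = [name]
as_names(names::AbstractVector{Symbol}) = names

# True only if every requested name is present in the whole table ...
toy_has(names) = issubset(as_names(names), columns)
# ... or in the columns of the i-th modality.
toy_has(i::Integer, names) = issubset(as_names(names), columns[grouping[i]])

toy_has([:sex, :age])      # both names exist in the table
toy_has(1, :sex)           # :sex is not part of modality 1
toy_has([:name, :height])  # :height is missing, so the whole check fails
```

Using issubset makes the vector form a conjunction: one missing name is enough for the result to be false.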
MultiData.insertvariables! - Method
insertvariables!(md, col, index, values)
insertvariables!(md, index, values)
insertvariables!(md, col, index, value)
insertvariables!(md, index, value)

Insert a variable named index into a multimodal dataset.

Note: each inserted variable is added as a spare variable.

Arguments

  • md is an AbstractMultiDataset;
  • col is an Integer indicating in which position to insert the new variable. If no col is passed, the new variable will be placed last in the md's underlying dataframe structure;
  • index is a Symbol denoting the name of the variable to insert. Duplicated variable names will be renamed to avoid conflicts: see the makeunique argument of insertcols! in the DataFrames documentation;
  • values is an AbstractVector that indicates the values for the newly inserted variable. The length of values should match ninstances(md);
  • value is a single value for the new variable. If a single value is passed as a last argument this will be copied and used for each instance in the dataset.

Examples

julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

julia> insertvariables!(md, :weight, [80, 75])
2×4 DataFrame
 Row │ name    age    sex   weight
     │ String  Int64  Char  Int64
─────┼─────────────────────────────
   1 │ Python     25  M         80
   2 │ Julia      26  F         75

julia> md
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     75

julia> insertvariables!(md, 2, :height, 180)
2×5 DataFrame
 Row │ name    height  age    sex   weight
     │ String  Int64   Int64  Char  Int64
─────┼─────────────────────────────────────
   1 │ Python     180     25  M         80
   2 │ Julia      180     26  F         75

julia> insertvariables!(md, :hair, ["brown", "blonde"])
2×6 DataFrame
 Row │ name    height  age    sex   weight  hair
     │ String  Int64   Int64  Char  Int64   String
─────┼─────────────────────────────────────────────
   1 │ Python     180     25  M         80  brown
   2 │ Julia      180     26  F         75  blonde

julia> md
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F
- Spare variables
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ height  weight  hair
     │ Int64   Int64   String
─────┼────────────────────────
   1 │    180      80  brown
   2 │    180      75  blonde
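
In the example above, inserting height at column 2 moves age and sex one position to the right, yet both modalities still show the same variables. A toy model in plain Julia illustrates the bookkeeping this requires. It is a sketch under assumptions, not MultiData's implementation: toy_insert and the variables around it are hypothetical, and the index shift is inferred from the printed output only.

```julia
# Toy model only: hypothetical names, not part of MultiData.
columns = [:name, :age, :sex, :weight]  # column order before the insertion
grouping = [[1, 2], [3]]                # modalities as column indices

# Insert `name` at position `col` and shift every grouped index that sits
# at or after `col`, so each modality keeps pointing at the same variables.
function toy_insert(columns, grouping, col, name)
    new_columns = copy(columns)
    insert!(new_columns, col, name)
    new_grouping = [[v >= col ? v + 1 : v for v in g] for g in grouping]
    return new_columns, new_grouping
end

new_columns, new_grouping = toy_insert(columns, grouping, 2, :height)

# A single value is repeated once per instance, as described for `value`.
ninstances = 2
height_values = fill(180, ninstances)
```

The new column is referenced by no modality, so in this model it ends up spare, in line with the note in the description.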
MultiData.keeponlyvariables! - Method
keeponlyvariables!(md, indices)
keeponlyvariables!(md, variable_names)

Drop from a multimodal dataset all variables except those at the given indices. Variable names can be used instead of indices.

Note: if a dropped variable is contained in one or more modalities, it is removed from them as well; as a side effect, this can lead to the removal of modalities.

Arguments

  • md is a MultiDataset;
  • indices is an AbstractVector{Integer} that indicates which indices to keep in the multimodal dataset;
  • variable_names is an AbstractVector{Symbol} that indicates which variables to keep in the multimodal dataset.

Examples

julia> md = MultiDataset([[1, 2],[3, 4, 5],[5]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))
● MultiDataset
   └─ dimensionalities: (0, 0, 0)
- Modality 1 / 3
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 3
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ sex   height  weight
     │ Char  Int64   Int64
─────┼──────────────────────
   1 │ M        180      80
   2 │ F        175      60
- Modality 3 / 3
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ weight
     │ Int64
─────┼────────
   1 │     80
   2 │     60

julia> keeponlyvariables!(md, [1,3,4])
[ Info: Variable 5 was last variable of modality 3: removing modality
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
- Modality 2 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ sex   height
     │ Char  Int64
─────┼──────────────
   1 │ M        180
   2 │ F        175

julia> keeponlyvariables!(md, [:name, :sex])
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

source
MultiData.nvariablesMethod
nvariables(md)
nvariables(md, i)

Return the number of variables in a multimodal dataset.

If an index i is passed as second argument, then the number of variables of the i-th modality is returned.

Alternatively, nvariables can be called on a single modality.

Arguments

  • md is a MultiDataset;
  • i (optional) is an Integer indicating the modality of the multimodal dataset whose number of variables you want to know.

Examples

julia> md = MultiDataset([[1],[2]], DataFrame(:age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    25
   2 │    26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F


julia> nvariables(md)
2

julia> nvariables(md, 2)
1

julia> mod2 = modality(md, 2)
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

julia> nvariables(mod2)
1

julia> md = MultiDataset([[1, 2],[3, 4, 5]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'], :height => [180, 175], :weight => [80, 60]))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×3 SubDataFrame
 Row │ sex   height  weight
     │ Char  Int64   Int64
─────┼──────────────────────
   1 │ M        180      80
   2 │ F        175      60

julia> nvariables(md)
5

julia> nvariables(md, 2)
3

julia> mod2 = modality(md,2)
2×3 SubDataFrame
 Row │ sex   height  weight
     │ Char  Int64   Int64
─────┼──────────────────────
   1 │ M        180      80
   2 │ F        175      60

julia> nvariables(mod2)
3
source
MultiData.sparevariablesMethod
sparevariables(md)

Return the indices of all the variables that are not contained in any of the modalities of a multimodal dataset.

Arguments

  • md is a MultiDataset whose spare variable indices are to be returned.

Examples

julia> md = MultiDataset([[1],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    25
   2 │    26

julia> md.data
2×3 DataFrame
 Row │ name    age    sex
     │ String  Int64  Char
─────┼─────────────────────
   1 │ Python     25  M
   2 │ Julia      26  F

julia> sparevariables(md)
1-element Vector{Int64}:
 2
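
The relationship between modalities and spare variables can be sketched in plain Julia, without MultiData. This is only an illustration of the definition above, not the package's implementation: it assumes a dataset is described by its number of columns and a vector of index groups, and spare_indices is a hypothetical helper name.

```julia
# Hypothetical helper (not part of MultiData): return the column indices
# in 1:ncols that do not appear in any modality.
function spare_indices(
    ncols::Integer,
    grouped_variables::AbstractVector{<:AbstractVector{<:Integer}}
)
    # Collect every index used by at least one modality.
    used = isempty(grouped_variables) ? Int[] : reduce(union, grouped_variables)
    # Whatever is left over is "spare".
    return setdiff(1:ncols, used)
end

# Same layout as the example above: 3 columns, modalities [1] and [3].
spare = spare_indices(3, [[1], [3]])
```

With this layout, only column 2 (age) is left outside every modality, which agrees with the result of sparevariables(md) shown above.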
source
MultiData.variableindexMethod
variableindex(df, variable_name)
variableindex(md, i_modality, variable_name)
variableindex(md, variable_name)

Return the index of the variable. When i_modality is passed, the function returns the index of the variable in the sub-dataframe of the modality identified by i_modality. It returns 0 when the variable is not contained in the modality identified by i_modality.

Arguments

  • df is an AbstractDataFrame;
  • md is an AbstractMultiDataset;
  • variable_name is a Symbol indicating the variable whose index you want to know;
  • i_modality is an Integer indicating the modality in which to look up the index of the variable.

Examples

julia> md = MultiDataset([[1, 2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×2 SubDataFrame
 Row │ name    age
     │ String  Int64
─────┼───────────────
   1 │ Python     25
   2 │ Julia      26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

julia> md.data
2×3 DataFrame
 Row │ name    age    sex
     │ String  Int64  Char
─────┼─────────────────────
   1 │ Python     25  M
   2 │ Julia      26  F

julia> variableindex(md, :age)
2

julia> variableindex(md, :sex)
3

julia> variableindex(md, 1, :name)
1

julia> variableindex(md, 2, :name)
0

julia> variableindex(md, 2, :sex)
1

julia> variableindex(md.data, :age)
2
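
Because variableindex returns 0 when a variable is not in the given modality, it can double as a membership check. The sketch below assumes MultiData and DataFrames are installed; inmodality is a hypothetical helper name introduced only for illustration.

```julia
using MultiData, DataFrames

md = MultiDataset(
    [[1, 2], [3]],
    DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'])
)

# Hypothetical helper (not part of MultiData): true when `name` belongs to
# modality `i_modality`. It relies on the documented convention that a
# variable missing from the modality yields index 0.
inmodality(md, i_modality, name) = variableindex(md, i_modality, name) != 0

in_first = inmodality(md, 1, :name)    # :name is part of modality 1
in_second = inmodality(md, 2, :name)   # :name is not part of modality 2
```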
source
MultiData.variablesMethod
variables(md)
variables(md, i)

Return the names as Symbols of the variables in a multimodal dataset.

When called on an object of type MultiDataset, a Dict is returned that maps each modality index to an AbstractVector{Symbol}.

Note: the order of the variable names is guaranteed to match the order of the variables in the modality.

If an index i is passed as second argument, then the names of the variables of the i-th modality are returned as an AbstractVector.

Alternatively, variables can be called on a single modality.

Arguments

  • md is a MultiDataset;
  • i is an Integer indicating from which modality of the multimodal dataset to get the names of the variables.

Examples

julia> md = MultiDataset([[2],[3]], DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    25
   2 │    26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F
- Spare variables
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ name
     │ String
─────┼────────
   1 │ Python
   2 │ Julia

julia> variables(md)
Dict{Integer, AbstractVector{Symbol}} with 2 entries:
  2 => [:sex]
  1 => [:age]

julia> variables(md, 2)
1-element Vector{Symbol}:
 :sex

julia> variables(md, 1)
1-element Vector{Symbol}:
 :age

julia> mod2 = modality(md, 2)
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

julia> variables(mod2)
1-element Vector{Symbol}:
 :sex
source

Instances

MultiData.deleteinstances!Method
deleteinstances!(md, i)

Remove the i-th instance in a multimodal dataset, and return the dataset itself.

deleteinstances!(md, i_instances)

Remove the instances at i_instances in a multimodal dataset, and return the dataset itself.
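
A minimal sketch of both forms, assuming MultiData and DataFrames are installed and that ninstances is available after using MultiData (as in the ninstances example in this section). The data is made up for illustration.

```julia
using MultiData, DataFrames

md = MultiDataset(
    [[1, 2], [3]],
    DataFrame(
        :name => ["Python", "Julia", "Rust", "Go"],
        :age => [25, 26, 9, 14],
        :sex => ['M', 'F', 'M', 'F']
    )
)

# Single index: remove the 4th instance.
deleteinstances!(md, 4)
n_after_single = ninstances(md)

# Vector of indices: remove the 1st and 2nd of the remaining instances.
deleteinstances!(md, [1, 2])
n_after_vector = ninstances(md)
```

Note that indices refer to the dataset as it is at the time of the call, so after the first deletion the remaining instances are numbered 1 to 3.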

source
MultiData.instanceMethod
instance(md, i)

Return the i-th instance in a multimodal dataset.

instance(md, i_modality, i_instance)

Return the i_instance-th instance in a multimodal dataset with only variables from the i_modality-th modality.

instance(md, i_instances)

Return instances at i_instances in a multimodal dataset.

instance(md, i_modality, i_instances)

Return the instances at i_instances in a multimodal dataset with only variables from the i_modality-th modality.
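
The four call forms can be compared side by side. This is a sketch assuming MultiData and DataFrames are installed; it only inspects the column names of each result with names from DataFrames, and makes no claim about the concrete return types.

```julia
using MultiData, DataFrames

md = MultiDataset(
    [[1, 2], [3]],
    DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'])
)

row = instance(md, 2)                 # 2nd instance, all variables
row_mod2 = instance(md, 2, 1)         # 1st instance, variables of modality 2 only
rows = instance(md, [1, 2])           # instances 1 and 2, all variables
rows_mod1 = instance(md, 1, [1, 2])   # instances 1 and 2, variables of modality 1 only

# Column names show which variables each call exposes.
cols_row = names(row)
cols_row_mod2 = names(row_mod2)
cols_rows_mod1 = names(rows_mod1)
```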

source
MultiData.keeponlyinstances!Method
keeponlyinstances!(md, i_instances)

Remove from a multimodal dataset all instances whose index does not appear in i_instances.
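
A short sketch, assuming MultiData and DataFrames are installed and that the underlying dataframe is reachable as md.data (as in the examples of this section). The data is made up for illustration.

```julia
using MultiData, DataFrames

md = MultiDataset(
    [[1, 2], [3]],
    DataFrame(
        :name => ["Python", "Julia", "Rust"],
        :age => [25, 26, 9],
        :sex => ['M', 'F', 'M']
    )
)

# Keep only the 1st and 3rd instances; the 2nd one is removed.
keeponlyinstances!(md, [1, 3])

remaining = md.data.name
```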

source
MultiData.pushinstances!Method
pushinstances!(md, instance)

Add an instance to a multimodal dataset, and return the dataset itself.

The instance can be a DataFrameRow or an AbstractVector; in both cases, the number and types of its values should match the variables of the dataset.
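
Both accepted forms can be sketched as follows, assuming MultiData and DataFrames are installed. The DataFrameRow is taken from a separate, made-up one-row dataframe whose columns match those of the dataset.

```julia
using MultiData, DataFrames

md = MultiDataset(
    [[1, 2], [3]],
    DataFrame(:name => ["Python", "Julia"], :age => [25, 26], :sex => ['M', 'F'])
)

# As an AbstractVector: values listed in column order (name, age, sex).
pushinstances!(md, ["Rust", 9, 'M'])

# As a DataFrameRow: a row of another dataframe with matching columns.
other = DataFrame(:name => ["Go"], :age => [14], :sex => ['F'])
pushinstances!(md, other[1, :])

n_total = ninstances(md)
```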

source
SoleBase.ninstancesMethod
ninstances(md)

Return the number of instances in a multimodal dataset.

Examples

julia> md = MultiDataset([[1],[2]],DataFrame(:age => [25, 26], :sex => ['M', 'F']))
● MultiDataset
   └─ dimensionalities: (0, 0)
- Modality 1 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ age
     │ Int64
─────┼───────
   1 │    25
   2 │    26
- Modality 2 / 2
   └─ dimensionality: 0
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

julia> mod2 = modality(md, 2)
2×1 SubDataFrame
 Row │ sex
     │ Char
─────┼──────
   1 │ M
   2 │ F

julia> ninstances(md) == ninstances(mod2) == 2
true
source