Description

Just like DataFrames, MultiDatasets can be described using the method describe:

julia> ts_cos = [cos(i) for i in 1:50000];

julia> ts_sin = [sin(i) for i in 1:50000];

julia> df_data = DataFrame(
                         :id => [1, 2],
                         :age => [30, 9],
                         :name => ["Python", "Julia"],
                         :stat => [deepcopy(ts_sin), deepcopy(ts_cos)]
                     );

julia> md = MultiDataset([[2,3], [4]], df_data);

julia> description = describe(md)
2-element Vector{DataFrame}:
 2×7 DataFrame
 Row │ variable  mean    min    median  max     nmissing  eltype   
     │ Symbol    Union…  Any    Union…  Any     Int64     DataType
─────┼─────────────────────────────────────────────────────────────
   1 │ age       19.5    9      19.5    30             0  Int64
   2 │ name              Julia          Python         0  String
 1×7 DataFrame
 Row │ Variables  mean                               min                      ⋯
     │ Symbol      Array…                             Array…                   ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ stat        AbstractFloat[8.63372e-6; -2.848…  AbstractFloat[-1.0; -1.0 ⋯
                                                               5 columns omitted

The describe implementation for MultiDatasets tries to pick the statistical measures best suited to the type of data each modality contains.

In the example, the 2nd modality contains variables (just one here) of type Vector{Float64}. Because these vectors are time series, it was described with the 22 well-known features from the Catch22.jl package, plus maximum, minimum and mean.
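To give a rough idea of the three non-Catch22 measures, the sketch below computes the maximum, minimum and mean of the two series using only Julia's Statistics standard library. It is an illustration, not the package's implementation: `basic_summary` is a hypothetical helper name, the measures are taken over each whole series, and the Catch22 features are left out.

```julia
using Statistics  # standard library, provides `mean`

# Same series as in the example above.
ts_sin = [sin(i) for i in 1:50000]
ts_cos = [cos(i) for i in 1:50000]

# Hypothetical helper (not part of the package): the three basic measures
# mentioned above, computed over a whole series.
basic_summary(ts::AbstractVector{<:Real}) =
    (max = maximum(ts), min = minimum(ts), mean = mean(ts))

s_sin = basic_summary(ts_sin)
s_cos = basic_summary(ts_cos)
```

Both results are named tuples, so individual measures can be read as `s_sin.mean` or `s_cos.max`.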

DataAPI.describe - Function
describe(md; t = fill([(1, 0, 0)], nmodalities(md)), kwargs...)

Return descriptive statistics for an AbstractMultiDataset as a Vector of new DataFrames where each row represents a variable and each column a summary statistic.

Arguments

  • md: the AbstractMultiDataset;
  • t: a vector of nmodalities(md) elements, where the i-th element is a vector as long as the dimensionality of the i-th modality. Each element of the innermost vector is a tuple of arguments for paa.

For the other keyword arguments, see the documentation of the DataFrames.describe function.
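The nesting of t can be sketched in plain Julia, without loading the package. The modality count below is an assumed value matching the two-modality dataset built earlier, and the tuple `(4, 0, 0)` is a hypothetical setting chosen only to show how one modality can differ from the others; the meaning of each tuple entry is defined by paa and is not restated here.

```julia
# Assumed value: the dataset in the example above has 2 modalities.
n_modalities = 2

# Default from the signature: one `(1, 0, 0)` tuple for every modality.
t_default = fill([(1, 0, 0)], n_modalities)

# `fill` stores the same inner vector in every slot, so a comprehension is
# safer when the settings of a single modality will be changed in place.
t_custom = [[(1, 0, 0)] for _ in 1:n_modalities]
t_custom[2][1] = (4, 0, 0)  # hypothetical arguments for the 2nd modality only
```

A vector built this way would then be passed through the keyword argument from the signature, as in `describe(md; t = t_custom)`.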

Examples

TODO: examples
