Description
Just like DataFrames, MultiDatasets can be described using the describe method:
julia> ts_cos = [cos(i) for i in 1:50000];
julia> ts_sin = [sin(i) for i in 1:50000];
julia> df_data = DataFrame(
           :id => [1, 2],
           :age => [30, 9],
           :name => ["Python", "Julia"],
           :stat => [deepcopy(ts_sin), deepcopy(ts_cos)]
       );
julia> md = MultiDataset([[2,3], [4]], df_data);
julia> description = describe(md)
2-element Vector{DataFrame}:
 2×7 DataFrame
 Row │ variable  mean    min    median  max     nmissing  eltype
     │ Symbol    Union…  Any    Union…  Any     Int64     DataType
─────┼─────────────────────────────────────────────────────────────
   1 │ age       19.5    9      19.5    30             0  Int64
   2 │ name              Julia          Python         0  String
 1×7 DataFrame
 Row │ Variables  mean                               min                       ⋯
     │ Symbol     Array…                             Array…                    ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ stat       AbstractFloat[8.63372e-6; -2.848…  AbstractFloat[-1.0; -1.0  ⋯
                                                                5 columns omitted
The describe implementation for MultiDatasets tries to choose the statistical measures best suited to the type of data each modality contains.
In the example, the second modality, whose variables (just one here) hold data of type Vector{Float64}, is treated as time-series data: it is described using the well-known 22 features from the Catch22.jl package, plus maximum, minimum and mean.
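As a rough illustration of the three basic measures mentioned above, the following sketch computes mean, minimum and maximum for each time series using only the Statistics standard library. The helper name basic_summary is hypothetical and is not part of the package; the Catch22.jl features are deliberately left out, and this is not how describe is implemented internally.

```julia
using Statistics

# Same series as in the example above.
ts_sin = [sin(i) for i in 1:50000]
ts_cos = [cos(i) for i in 1:50000]

# Hypothetical helper (not part of the package): per-series summary
# with the three basic measures named in the text.
basic_summary(ts) = (mean = mean(ts), min = minimum(ts), max = maximum(ts))

# One summary per instance of the `stat` variable.
stats = [basic_summary(ts) for ts in (ts_sin, ts_cos)]
```

Each entry of stats is a named tuple, so stats[1].mean gives the mean of the sine series.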
DataAPI.describe — Function
describe(md; t = fill([(1, 0, 0)], nmodalities(md)), kwargs...)
Return descriptive statistics for an AbstractMultiDataset as a Vector of new DataFrames, where each row represents a variable and each column a summary statistic.
Arguments
md: the AbstractMultiDataset;
t: a vector of nmodalities elements, where the i-th element is a vector as long as the dimensionality of the i-th modality. Each element of these inner vectors is a tuple of arguments for paa.
For the other keyword arguments, see the documentation of the DataFrames.describe function.
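To make the shape of t more concrete, here is a minimal sketch in plain Julia that builds the default value from the signature and an independent, editable copy of it. The variable n_modalities is a hypothetical stand-in for nmodalities(md), and the meaning of the individual tuple entries is not assumed here: they are simply forwarded to paa.

```julia
# Hypothetical stand-in for nmodalities(md); the dataset in the
# description example above has two modalities.
n_modalities = 2

# Default from the signature: one vector per modality, each holding a
# single (1, 0, 0) tuple of paa arguments.
t_default = fill([(1, 0, 0)], n_modalities)

# fill stores the same inner vector object in every slot, so mutating
# one entry would affect all of them. A comprehension builds
# independent inner vectors, which is safer when customising a single
# modality.
t_custom = [[(1, 0, 0)] for _ in 1:n_modalities]
```

Under these assumptions, a value built this way could then be passed as describe(md; t = t_custom).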
Examples
TODO: examples