What is the equivalent to pandas dataframe info() in Deedle?

Python's pandas library allows getting info() on a data frame.

For example:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Name           30 non-null     object 
 1   PhoneNumber    30 non-null     object 
 2   City           30 non-null     object 
 3   Address        30 non-null     object 
 4   PostalCode     30 non-null     object 
 5   BirthDate      30 non-null     object 
 6   Income         26 non-null     float64
 7   CreditLimit    30 non-null     object 
 8   MaritalStatus  24 non-null     object 
dtypes: float64(1), object(8)
memory usage: 2.2+ KB

Is there an equivalent for Deedle's data frames, something that gives an overview of missing values and the inferred column types?

Accepted answer:

There isn't a single function that does this. It would be a nice addition to the library, if you wanted to consider sending a pull request.

The following gets all the information you would need:

// Assumes `df` is an existing Deedle frame
open Deedle

// Prints column names and types, with data preview
df.Print(true)

// Print key range of rows (or key sequence if it is not ordered)
if df.RowIndex.IsOrdered then printfn "%A" df.RowIndex.KeyRange
else printfn "%A" df.RowIndex.Keys

// Get access to the data of the frame so that we can inspect the columns
let dt = df.GetFrameData()
for n, (ty, vec) in Seq.zip dt.ColumnKeys dt.Columns do
  // Print name, type of column
  printf "%A %A" n ty
  // Query the internal data storage to see if it uses
  // an array of optional values (may have nulls) or not
  match vec.Data with
  | Vectors.VectorData.DenseList _ -> printfn " (no nulls)"
  | _ -> printfn " (nulls)"
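
The snippet above only reports whether a column's storage may contain nulls. As a complement, the per-column count of non-missing values can also be read through the public series functions. The following is a minimal sketch, not part of the original answer: the `Person` record and its sample rows are hypothetical, and it assumes that Deedle treats `nan` in a float column as a missing value when the frame is built with `Frame.ofRecords`.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical sample data; nan stands in for a missing Income value
// (assumption: Deedle records it as missing when building the frame).
type Person = { Name: string; Income: float }

let df =
    Frame.ofRecords
        [ { Name = "Ann"; Income = 52000.0 }
          { Name = "Bob"; Income = nan }
          { Name = "Cy"; Income = 48000.0 } ]

// For each column, print the number of non-missing values next to
// the number of row keys, similar to the "Non-Null Count" in pandas.
for name, col in Series.observations df.Columns do
    printfn "%-10s %d non-null of %d" name (Series.countValues col) (Series.countKeys col)
```

This avoids matching on the internal vector representation, but unlike `GetFrameData` it does not report the stored column type.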

Another answer:

Based on Thomas's suggestion (thank you!) I modified it slightly to produce an output similar to pandas:

open Deedle

let info (df: Deedle.Frame<'a,'b>) =
    // Get access to the underlying column data of the frame
    let dt = df.GetFrameData()

    // Count the values that are present (not missing)
    let countOptionalValues d =
        d
        |> Seq.filter (
            function
            | OptionalValue.Present _ -> true
            | _ -> false
        )
        |> Seq.length

    // Build one record per column: key, non-null count and stored type
    Seq.zip dt.ColumnKeys dt.Columns
    |> Seq.map (fun (col, (ty, vec)) ->
        {|
            Column = col
            ``Non-Null Count`` =
                match vec.Data with
                | Vectors.VectorData.DenseList d -> $"%i{d |> Seq.length} non-null"
                | Vectors.VectorData.SparseList d -> $"%i{d |> countOptionalValues} non-null"
                | Vectors.VectorData.Sequence d -> $"%i{d |> countOptionalValues} non-null"
            Dtype = ty
        |}
    )

[Image: pandas output]

[Image: Deedle output]