Why does this Deedle DataFrame have duplicate columns?

I created a Deedle DataFrame with this code:

#r "nuget: Deedle"
open Deedle

type Person = { Name: string; Age: int; Gender: string }

let people =
    [ { Name = "Alice"; Age = 25; Gender = "Female" }
      { Name = "Bob"; Age = 30; Gender = "Male" }
      { Name = "Carol"; Age = 22; Gender = "Female" }
      { Name = "David"; Age = 35; Gender = "Male" } ]

let df = Frame.ofRecords people

It should have four rows and three columns. However,

df.ColumnCount;;
val it: int = 6

There are three additional columns: Name@, Age@ and Gender@. Why is this happening?

I tried separating the elements of the list that defines 'people' with semicolons, and also with commas, but neither helped.

Answer by Gebb:

This is a quirk of dotnet fsi. This code

open System.Reflection

type Person = { Name: string }
typeof<Person>.GetFields(
  BindingFlags.Public ||| BindingFlags.NonPublic ||| BindingFlags.Instance)
|> Seq.head
|> (fun x -> printfn "field: [ %A ] \
        is public: [ %A ] \
        attrs: [ %A ]" x (x.IsPublic) (x.Attributes))

produces the following output in FSI:

field: [ System.String Name@ ] is public: [ true ] attrs: [ Public ]

but different output in dotnet repl and in VS Code with the Polyglot Notebooks extension:

field: [ System.String Name@ ] is public: [ false ] attrs: [ Assembly ]

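When FSI creates the record type, it makes the compiler-generated backing fields (Name@, Age@, Gender@) public. Frame.ofRecords then picks them up as columns in addition to the Name, Age and Gender properties, which is where the three extra columns come from.

The likely explanation is that, by default, FSI may emit a separate assembly for each input it evaluates, and code in one of those assemblies must be able to access all members of types defined in another.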
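To see which members a reflection-based loader such as Frame.ofRecords has to choose from, the following stdlib-only sketch lists the record's public instance properties and public instance fields separately. Based on the outputs quoted above, the properties should always be Name, Age and Gender, while the public-fields list would be expected to contain Name@, Age@ and Gender@ in dotnet fsi with multiemit on and to be empty where the backing fields are internal (Assembly). This is a diagnostic illustration, not how Deedle itself is implemented.

```fsharp
open System.Reflection

type Person = { Name: string; Age: int; Gender: string }

// Public instance properties: the record fields as F# code sees them.
let props =
    typeof<Person>.GetProperties(BindingFlags.Public ||| BindingFlags.Instance)
    |> Array.map (fun p -> p.Name)

// Public instance fields: only the compiler-generated backing fields
// (Name@, Age@, Gender@) can appear here, and only when they are public.
let publicFields =
    typeof<Person>.GetFields(BindingFlags.Public ||| BindingFlags.Instance)
    |> Array.map (fun f -> f.Name)

printfn "properties: %A" props
printfn "public fields: %A" publicFields
```

Running it under dotnet fsi and under dotnet fsi --multiemit- is a quick way to compare the two behaviors described above.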

This behavior can be turned off with a command-line option (the following is an excerpt from the output of dotnet fsi --help):

--multiemit[+|-]     Emit multiple assemblies (on by default)

That is, run dotnet fsi --multiemit- instead of dotnet fsi.
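If changing how fsi is launched is not an option (for example in a notebook), one possible workaround, which is not part of the answer above, is to drop the extra columns after loading. The sketch below assumes that the unwanted columns are exactly those whose keys end in "@", as in the question, and uses Deedle's Frame.filterCols to keep the rest; with multiemit off it simply leaves the three columns unchanged.

```fsharp
#r "nuget: Deedle"
open Deedle

type Person = { Name: string; Age: int; Gender: string }

let people =
    [ { Name = "Alice"; Age = 25; Gender = "Female" }
      { Name = "Bob"; Age = 30; Gender = "Male" }
      { Name = "Carol"; Age = 22; Gender = "Female" }
      { Name = "David"; Age = 35; Gender = "Male" } ]

// Keep only columns whose key does not end in '@', i.e. drop the
// compiler-generated backing fields that FSI may expose as public.
let df =
    Frame.ofRecords people
    |> Frame.filterCols (fun (key: string) _ -> not (key.EndsWith("@")))

printfn "%d columns: %A" df.ColumnCount (List.ofSeq df.ColumnKeys)
```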
