I have a table whose key column contains duplicate values. I would like to drop the duplicate rows, keeping only the first row for each key.
open System.IO
open Deedle

let data = "A;B\na;1\nb;\nb;2\nc;3"
let bytes = System.Text.Encoding.UTF8.GetBytes data
let stream = new MemoryStream(bytes)
let df =
    Frame.ReadCsv(
        stream = stream,
        separators = ";",
        hasHeaders = true
    )
df.Print()
     A B
0 -> a 1
1 -> b <missing>
2 -> b 2
3 -> c 3
The result should be:
     A B
0 -> a 1
1 -> b <missing>
2 -> c 3
I have tried applyLevel, but it gives me the first value of each column, not the first row of each group:
let df1 =
    df
    |> Frame.groupRowsByString "A"
    |> Frame.applyLevel fst (fun s -> s |> Series.firstValue)
df1.Print()
     A B
a -> a 1
b -> b 2   <- wrong
c -> c 3
This is essentially a duplicate of a previous SO question. The short answer is:
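A minimal sketch of one way to do it, assuming the sample data from the question and Deedle's Frame.nest, Frame.take and Frame.unnest. The name firstRows is only illustrative, and the final Frame.mapRowKeys snd is the call mentioned below:

```fsharp
#r "nuget: Deedle"
open System.IO
open Deedle

// Same sample data as in the question.
let data = "A;B\na;1\nb;\nb;2\nc;3"
let stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes data)
let df = Frame.ReadCsv(stream = stream, separators = ";", hasHeaders = true)

let firstRows =
    df
    |> Frame.groupRowsByString "A"      // row keys become (A value, original index)
    |> Frame.nest                       // one sub-frame per distinct A value
    |> Series.mapValues (Frame.take 1)  // keep only the first row of each sub-frame
    |> Frame.unnest                     // back to one frame keyed by (A value, index)
    |> Frame.mapRowKeys snd             // drop the group key, keep the original index

firstRows.Print()
```

Taking the first row of each nested frame keeps that row intact, missing values included, which is why b keeps <missing> rather than picking up 2 from the second b row.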
The output is:
     A B
0 -> a 1
1 -> b <missing>
3 -> c 3
I've added a call to Frame.mapRowKeys at the end to match your desired output as closely as possible. Note that the actual output differs slightly from your expected output, because row 3 -> c 3 has original index 3 instead of 2. I think this is more correct, but you can renumber the rows if necessary.

The referenced question has more details.
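If consecutive row numbers are wanted, as in the expected output of the question, one option is Deedle's Frame.indexRowsOrdinally. The sketch below uses a small hypothetical stand-in frame (deduped, with row keys 0, 1 and 3) rather than the real result:

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical stand-in for the deduplicated frame, with row keys 0, 1 and 3.
let deduped =
    frame [ "A" => series [ 0 => "a"; 1 => "b"; 3 => "c" ] ]

// Replace the row keys with 0, 1, 2, ... in the current row order.
let renumbered = deduped |> Frame.indexRowsOrdinally

renumbered.Print()
```

This discards the original indices, so it is best applied as the last step, once they are no longer needed.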