Convert a dataframe into a ped object in R (pedtools)


I have this dataframe in R. It has the structure of a pedigree dataframe, with the id, fid, mid and sex columns.

pedigree <- structure(list(id = c(212, 214, 263, 266, 273, 274, 275, 279, 
280, 281, 286, 287, 312, 313, 314, 315, 316, 317, 318, 319, 320, 
321, 322, 323, 324, 325, 326, 327, 332, 333, 334, 335, 336, 337, 
338, 339, 340, 341, 346, 347, 348, 349, 389, 390, 391, 392, 413, 
414, 415, 416, 466, 475, 476, 477, 478, 479, 480, 483, 486, 487, 
491, 492, 493, 494, 498, 501, 502, 506, 507, 508, 509, 510, 511, 
512, 513, 514, 518, 519, 542, 543, 544, 545, 546, 547, 551, 552, 
553, 554, 555, 556, 564, 565, 568, 569, 570, 575, 576, 579, 580, 
584, 585, 586, 589, 590, 593, 595, 596, 597, 598, 599, 614, 615, 
616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 653, 654, 662, 
663, 671, 672, 673, 674, 675, 676, 681, 682, 683, 684, 688, 689, 
693, 694, 695, 696, 697, 698, 701, 702, 703, 704, 709, 710, 715, 
716, 718, 720, 721, 722, 723, 724, 725, 726, 727, 730, 731, 736, 
737, 738, 739, 740, 744, 745, 842, 843, 874, 875, 884, 885, 886, 
887, 889, 890, 894, 895, 896, 897, 898, 903, 905, 906, 907, 908, 
909, 910, 911, 912, 913, 914, 915, 917, 925, 926, 927, 928, 929, 
931, 932, 936, 965, 999, 1000, 1006, 1007, 1041, 1043, 1044, 
1046, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1099, 1100, 
1101, 1321, 1322, 1368, 1551, 1552, 1553, 1554, 1555), fid = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 326, 326, 326, 326, 279, 320, 320, 320, 320, 320, 320, 
320, 320, 320, 324, 324, 324, 324, 322, 322, 322, 324, 324, 324, 
324, 324, 324, 324, 324, 324, 318, 318, 326, 326, 326, 326, 326, 
326, 326, 326, 326, 326, 326, 326, 332, 332, 287, 287, 287, 287, 
287, 286, 286, 346, 346, 346, 348, 348, 348, 326, 326, 326, 326, 
326, 332, 332, 320, 320, 320, 320, 320, 287, 346, 346, 346, 346, 
273, 273, 273, 273, 266, 334, 334, 334, 334, 334, 336, 336, 336, 
336, 336, 336, 334, 334, 334, 334, 334, 334, 338, 338, 338, 338, 
340, 340, 340, 338, 338, 334, 334, 334, 334, 334, 334, 334, 334, 
314, 314, 314, 314, 314, 314, 314, 312, 312, 0, 0, 286, 286, 
314, 314, 314, 314, 314, 314, 334, 334, 334, 334, 334, 389, 389, 
389, 389, 389, 389, 389, 389, 389, 389, 389, 389, 338, 332, 332, 
332, 332, 332, 332, 332, 346, 274, 391, 391, 391, 391, 0, 0, 
0, 0, 316, 316, 316, 316, 316, 316, 316, 316, 842, 842, 842, 
1041, 1041, 1041, 1043, 1043, 1043, 1043, 1043), mid = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 327, 327, 327, 327, 275, 321, 321, 321, 321, 321, 321, 
321, 321, 321, 325, 325, 325, 325, 323, 323, 323, 325, 325, 325, 
325, 325, 325, 325, 325, 325, 319, 319, 327, 327, 327, 327, 327, 
327, 327, 327, 327, 327, 327, 327, 333, 333, 212, 212, 212, 212, 
212, 214, 214, 347, 347, 347, 349, 349, 349, 327, 327, 327, 327, 
327, 333, 333, 321, 321, 321, 321, 321, 212, 347, 347, 347, 347, 
281, 281, 281, 281, 263, 335, 335, 335, 335, 335, 337, 337, 337, 
337, 337, 337, 335, 335, 335, 335, 335, 335, 339, 339, 339, 339, 
341, 341, 341, 339, 339, 335, 335, 335, 335, 335, 335, 335, 335, 
315, 315, 315, 315, 315, 315, 315, 313, 313, 0, 0, 214, 214, 
315, 315, 315, 315, 315, 315, 335, 335, 335, 335, 335, 390, 390, 
390, 390, 390, 390, 390, 390, 390, 390, 390, 390, 339, 333, 333, 
333, 333, 333, 333, 333, 347, 280, 392, 392, 392, 392, 0, 0, 
0, 0, 317, 317, 317, 317, 317, 317, 317, 317, 843, 843, 843, 
1044, 1044, 1044, 1046, 1046, 1046, 1046, 1046), sex = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L), levels = c("1", "2"), class = "factor")), row.names = c(NA, 
-234L), class = c("tbl_df", "tbl", "data.frame"))

This is the structure, where there are 234 individuals:

str(pedigree)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   234 obs. of  4 variables:
 $ id : num  212 214 263 266 273 274 275 279 280 281 ...
 $ fid: num  0 0 0 0 0 0 0 0 0 0 ...
 $ mid: num  0 0 0 0 0 0 0 0 0 0 ...
 $ sex: Factor w/ 2 levels "1","2": 1 1 1 2 2 2 1 2 1 1 ...

I am trying to do a pedigree analysis by using pedtools.

To convert this dataframe into a ped object, I call as.ped(pedigree).

However, the call fails with a malformed pedigree error:

as.ped(pedigree)
Error: Malformed pedigree.
 Individual 287 is female, but appear as the father of 568
 Individual 212 is male, but appear as the mother of 568

I checked the ids 568, 287 and 212, but everything seems properly assigned: 287 is the mother of 568 (it is included in fid), and 212 is the father of 568 (it is included in mid).

As a convention, 1 refers to males and 2 to females.

What might be happening?


There are 2 best solutions below

4
KSkoczek On

I checked the ids 568, 287 and 212, but everything is properly assigned. This means that 287 is the mother of 568 (it is included in fid) and similarly with 287.

Looking at your dataset, the record for 568 states:

# A tibble: 1 x 4
     id   fid   mid sex  
  <dbl> <dbl> <dbl> <fct>
1   568   287   212 1    

287 is in the fid column, not the mid column as you state. There is an error somewhere in the data: either fid and mid have been switched here, or the sex values of 287 and 212 have been swapped.

Edit: On further inspection, several records indicate 287 as the father and 212 as the mother, specifically:

# A tibble: 6 x 4
     id   fid   mid sex  
  <dbl> <dbl> <dbl> <fct>
1   568   287   212 1    
2   569   287   212 1    
3   570   287   212 2    
4   575   287   212 1    
5   576   287   212 2    
6   621   287   212 2   

This may indicate that the sex values for 287 and 212 are incorrect (rather than fid and mid being swapped across several records), but you will need to examine your data source (or processing pipeline) to confirm.
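To tell the two explanations apart, one option is to look up the recorded sex of every individual that appears in fid or mid. The following is only a sketch in base R on a toy subset built from rows shown in the question; the names ped_df, father_sex, mother_sex and bad are made up for illustration and are not part of pedtools.

```r
# Toy subset: the two parents and three of their children from the question
ped_df <- data.frame(
  id  = c(212, 287, 568, 569, 570),
  fid = c(0, 0, 287, 287, 287),
  mid = c(0, 0, 212, 212, 212),
  sex = factor(c(1, 2, 1, 1, 2), levels = c("1", "2"))
)

# Recorded sex of each row's father and mother (NA for founders coded as 0)
father_sex <- ped_df$sex[match(ped_df$fid, ped_df$id)]
mother_sex <- ped_df$sex[match(ped_df$mid, ped_df$id)]

# Rows whose father is not coded male (1) or whose mother is not coded female (2);
# which() silently drops the NA comparisons produced by founders
bad <- which(father_sex != "1" | mother_sex != "2")
ped_df[bad, ]

# Cross-tabulate to see whether the mismatch is isolated or systematic
table(father_sex = father_sex, mother_sex = mother_sex, useNA = "ifany")
```

Run on the full data frame, a table in which nearly all fathers are coded 2 and nearly all mothers are coded 1 would point to swapped fid/mid columns rather than a couple of wrong sex values.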

0
Laura On

The problem is that males (1) are assigned as mothers (2) and females are assigned as fathers. R only returns the error for the first case it evaluates.

You can swap the fid and mid column names using colnames and then rerun as.ped:

colnames(pedigree) = c("id", "mid", "fid", "sex")
as.ped(pedigree)

You can also change the column names in the data frame directly.
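An alternative that does not depend on the column order is to swap the two columns by name. This is a minimal sketch in base R on a toy subset of the question's rows; ped_df and fixed are hypothetical names, and it assumes the fid and mid columns really are swapped for every record.

```r
# Toy subset: the two parents and three of their children from the question
ped_df <- data.frame(
  id  = c(212, 287, 568, 569, 570),
  fid = c(0, 0, 287, 287, 287),
  mid = c(0, 0, 212, 212, 212),
  sex = factor(c(1, 2, 1, 1, 2), levels = c("1", "2"))
)

# Copy the frame and exchange the contents of fid and mid.
# The assignment is positional: fid receives the old mid, mid receives the old fid.
fixed <- ped_df
fixed[c("fid", "mid")] <- ped_df[c("mid", "fid")]

fixed
```

The resulting data frame keeps the id, fid, mid, sex layout, so it can be passed to as.ped() in the same way as before.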