How do I combine two RDS files that have both common and different column names?


I have two RDS files, seqtab22 and seqtab23. In each of them the column names are unique DNA sequences and the row names are unique samples. Many of the column names (DNA sequences) are identical across the two files, while the rest occur in only one of them.
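To make the overlap concrete, here is a minimal sketch of how the shared and file-specific column names could be counted. The matrices `toy_22` and `toy_23` are hypothetical stand-ins for seqtab22 and seqtab23, with short made-up sequences as column names.

```r
# Hypothetical toy tables: rows are samples, columns are DNA sequences
toy_22 <- matrix(c(1L, 2L, 3L, 4L), nrow = 2,
                 dimnames = list(c("s1", "s2"), c("AAA", "CCC")))
toy_23 <- matrix(c(5L, 6L, 7L, 8L), nrow = 2,
                 dimnames = list(c("s3", "s4"), c("AAA", "GGG")))

# Sequences present in both tables, and in only one of them
shared  <- intersect(colnames(toy_22), colnames(toy_23))
only_22 <- setdiff(colnames(toy_22), colnames(toy_23))
only_23 <- setdiff(colnames(toy_23), colnames(toy_22))

cat("shared:", length(shared),
    "| only in 22:", length(only_22),
    "| only in 23:", length(only_23), "\n")
```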

I want to combine them into a single .rds file in which the row names (unique samples) from both files are stacked and, for each row name:

Column names that are identical in seqtab22 and seqtab23 are merged into one column holding their respective values. For example, if column A (DNA sequence A) exists in both files, the final file should have a single column A rather than two columns both named A.

Column names that occur in only one file can be combined normally, as there is no risk of duplication for them.
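The result described above can be sketched in base R on toy data. This is only an illustration under stated assumptions: `toy_22` and `toy_23` are hypothetical stand-ins for the real tables, sample names are assumed not to repeat across files, and a sequence missing from one file is assumed to count as 0 for that file's samples (the question does not say how gaps should be filled).

```r
# Hypothetical toy tables: rows are samples, columns are DNA sequences
toy_22 <- matrix(c(1L, 2L, 3L, 4L), nrow = 2,
                 dimnames = list(c("s1", "s2"), c("AAA", "CCC")))
toy_23 <- matrix(c(5L, 6L, 7L, 8L), nrow = 2,
                 dimnames = list(c("s3", "s4"), c("AAA", "GGG")))

# One column per distinct sequence, one row per sample
all_seqs    <- union(colnames(toy_22), colnames(toy_23))
all_samples <- c(rownames(toy_22), rownames(toy_23))

# Start from zeros (assumption: absent sequence means a count of 0)
combined <- matrix(0L, nrow = length(all_samples), ncol = length(all_seqs),
                   dimnames = list(all_samples, all_seqs))

# Copy each table into its own rows and columns by name
combined[rownames(toy_22), colnames(toy_22)] <- toy_22
combined[rownames(toy_23), colnames(toy_23)] <- toy_23

# Write the result to a single .rds file (a temporary path here)
out_path <- tempfile(fileext = ".rds")
saveRDS(combined, out_path)
```

Because the assignment is done by row and column name, a shared sequence such as "AAA" ends up in one column regardless of its position in either input.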

I also have two other metadata files in CSV format, and they seem pretty easy to combine. This worked for the CSV files.


seqtab22=readRDS("C:/Users/susan/Desktop/Ben/seqtab2022.rds")
seqtab23=readRDS("C:/Users/susan/Desktop/Ben/seqtab2023.rds")

df1 <- as.data.frame(seqtab22)
df2 <- as.data.frame(seqtab23)

First stupid approach!

# Combine the two datasets
combined_seqtab <- rbind(df1, df2)
df <- combined_seqtab
Error in rbind(...) :
number of columns of matrices must match (see arg 2)

I do not expect the number of unique DNA sequences to be the same in both files, as they come from two different batches. Second stupid approach (I am new to R, so I had to get help from GPT).

# Check column names of both dataframes
colnames_22 <- colnames(df1)
colnames_23 <- colnames(df2)

# Merge the dataframes based on column names
if (identical(colnames_22, colnames_23)) {
  # If column names are identical, merge the values into the same column
  merged_df <- bind_rows(df1, df2)
} else {
  # If column names are different, merge them so that different column names exist as separate columns
  merged_df <- bind_cols(df1, df2)
}

# View the merged dataframe
print(merged_df) 

Error : object 'famdf2' not found
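For comparison, here is a hedged sketch of how `dplyr::bind_rows()` behaves when the column names only partly overlap: it matches columns by name and fills the gaps with NA, so no `identical()` check or `bind_cols()` branch is involved. The data frames `df_a` and `df_b` are hypothetical toy data, the helper column name `sample` is made up for illustration, and replacing NA with 0 is an assumption about how missing sequences should be treated.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data frames: row names are samples, columns are sequences
df_a <- data.frame(AAA = c(1, 2), CCC = c(3, 4), row.names = c("s1", "s2"))
df_b <- data.frame(AAA = c(5, 6), GGG = c(7, 8), row.names = c("s3", "s4"))

merged <- bind_rows(
  # Move row names into a regular column so they survive the bind
  rownames_to_column(df_a, "sample"),
  rownames_to_column(df_b, "sample")
) %>%
  # Columns missing from one input come back as NA; assume that means 0
  mutate(across(-sample, ~ replace(.x, is.na(.x), 0))) %>%
  # Restore the sample names as row names
  column_to_rownames("sample")

print(merged)
```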

These are some of the libraries that are loaded:

library(tidyverse)
library(dplyr)
library(patchwork)
