I have a pipeline that works perfectly without transposition, but everything goes wrong as soon as I need to transpose my data.
My data are loaded from an Excel file. I have over 29,000 variables, so I can't store them as columns in the spreadsheet (Excel can't handle that many). They have to be rows, which then need to be transposed in R, plus some further tweaking to adjust the row/column names as required by the rest of the pipeline.
Here's the first bit of the script:
library(readxl)
library(tidyverse)
library(factoextra)
library(ggrepel)
library(ggplot2)
library(ggfortify)
library(ggforce)
library(dplyr)
library(data.table)
init <- read_excel("D:/Bureau/Bioinformatique/All_transcripts.xlsx") #Load data
temp <- t(init) #Transpose data as too many columns for excel
temp2 <- data.frame(rownames(temp), temp) #Move row names to the dataframe - to be used later
colnames(temp2) <- temp2[1,] #Set row 1 as columns names
data <- temp2[-1,] #Remove row 1
rownames(data) <- NULL #Transform row names into numbers from 1 to 12
numdata <- data[,-1:-2] #Remove columns 1 and 2, which are group names and not numeric values
numdata <- lapply(numdata, as.numeric) #Set everything as numeric values
prcomp(numdata, scale. = TRUE, center = TRUE)
pca<-prcomp(numdata, scale. = TRUE, center = TRUE)
Some more details:
str(numdata)
List of 29701
$ ENST00000000233_10: num [1:12] 2 7 6 9 5 23 2 5 8 8 ...
$ ENST00000000412_8 : num [1:12] 1 2 1 1 0 1 0 0 1 1 ...
$ ENST00000001008_6 : num [1:12] 0 1 0 0 0 0 0 0 1 0 ...
$ ENST00000002125_9 : num [1:12] 0 1 2 1 2 0 0 1 2 0 ...
$ ENST00000002165_11: num [1:12] 0 0 0 1 0 1 0 0 0 0 ...
$ ENST00000004103_8 : num [1:12] 0 0 0 0 0 0 0 0 2 0 ...
is.numeric(numdata)
[1] FALSE
prcomp(numdata, scale. = TRUE, center = TRUE)
Erreur dans colMeans(x, na.rm = TRUE) : 'x' doit être numérique (= Error in [...] : 'x' must be numeric)
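A minimal sketch that seems to reproduce the same symptom on a tiny made-up table (the names `toy_chr`, `as_list` and `toy_num` and all values are hypothetical, not my real data). It assumes the relevant point is that `lapply()` returns a plain list rather than a data frame or matrix:

```r
# Made-up stand-in for numdata before conversion: every column is text
toy_chr <- data.frame(a = c("2", "7", "6", "9"),
                      b = c("1", "2", "1", "0"),
                      stringsAsFactors = FALSE)

# lapply() converts each column but hands back a plain list
as_list <- lapply(toy_chr, as.numeric)
is.list(as_list)        # TRUE
is.numeric(as_list)     # FALSE, as in the output above

# prcomp() on that list is expected to fail; capture the error instead of stopping
err <- tryCatch(prcomp(as_list, center = TRUE, scale. = TRUE),
                error = function(e) e)

# Assigning into toy_num[] keeps the data frame structure while converting columns
toy_num <- toy_chr
toy_num[] <- lapply(toy_chr, as.numeric)
all(sapply(toy_num, is.numeric))   # TRUE
pca_toy <- prcomp(toy_num, center = TRUE, scale. = TRUE)
```

On this toy table the list version fails while the data frame version runs, which may or may not be the whole story for the full dataset.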
Without lapply(), every column is reported as 'chr' for some reason. I have checked for missing data, NAs, leading zeros, and so on, and couldn't find anything. Curiously, with smaller datasets that don't need transposing, it works and I don't even need the lapply(), even though the data frames for data and numdata look identical. After testing quite a lot of other things, the issue seems to come from what happens in t().
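To illustrate the t() behaviour, here is a small sketch on a made-up table (the name `toy`, its columns and its values are hypothetical). It assumes the sheet has at least one text column, such as the transcript IDs, sitting next to the counts:

```r
# Made-up table: one text column (IDs) plus two numeric columns
toy <- data.frame(id = c("ENST_A", "ENST_B", "ENST_C"),
                  s1 = c(2, 1, 0),
                  s2 = c(7, 2, 1),
                  stringsAsFactors = FALSE)

# t() converts the data frame to a matrix first, and a matrix has a single type
m <- t(toy)
typeof(m)        # "character": the text column forces every cell to text

# Transposing only the numeric columns keeps the numbers as numbers
m_num <- t(toy[, -1])
typeof(m_num)    # "double"
```

This would explain why everything shows up as 'chr' after transposing, even though the cells themselves contain nothing but numbers.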
I initially had dots in the column names and changed them to "_", tried changing the data format in Excel, looked for any non-numeric values, and tried a small made-up spreadsheet, which worked without any issue. I also tried deleting the first row/column entirely, without touching the headers, to remove all cells containing text. I always got the same error message.