I am struggling with the `loess` function in R. I have a dataframe on which I would like to perform locally weighted polynomial regression. Each ‘Gene’ is associated with a ‘Count’ (log10-transformed), which gives information about the gene's expression. Each Gene is also associated with an ‘Integrity’ measurement (range 0-100), which tells you the quality of the ‘Count’ measurement for that gene. As a general principle, the higher the ‘Integrity’, the more reliable the ‘Count’ for that gene. Below is a sample chunk of the dataframe:

| Gene | Integrity (0-100) | Count (log10) |
|---|---|---|
| ENSG00000198786.2 | 96.6937 | 3.55279 |
| ENSG00000210194.1 | 96.68682 | 1.39794 |
| ENSG00000212907.2 | 94.81709 | 2.396199 |
| ENSG00000198886.2 | 93.87207 | 3.61595 |
| ENSG00000198727.2 | 89.08319 | 3.238548 |
| ENSG00000198804.2 | 88.82048 | 3.78326 |
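
To make the question reproducible, the sample rows above can be entered as an R data frame. Six rows are far too few to fit `loess` with `span = 0.1`, so this only mirrors the column layout of the real data; the name `df` is chosen to match the calls used later in the question.

```r
# The six sample rows from the table; Count is already log10-transformed
df <- data.frame(
  Gene = c("ENSG00000198786.2", "ENSG00000210194.1", "ENSG00000212907.2",
           "ENSG00000198886.2", "ENSG00000198727.2", "ENSG00000198804.2"),
  Integrity = c(96.6937, 96.68682, 94.81709, 93.87207, 89.08319, 88.82048),
  Count = c(3.55279, 1.39794, 2.396199, 3.61595, 3.238548, 3.78326)
)
str(df)
```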

I would like to use loess to predict the ‘true’ value of genes with low ‘Integrity’ values, since their Counts are less reliable.
I) Should I pre-process my dataframe in order to apply loess correctly? In the plethora of examples I looked at, the points follow a sinusoidal distribution (A), while my dataset seems to be distributed in a ‘rollercoaster’-like fashion (B).
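
On the pre-processing point, a minimal sketch on simulated data (the data, `span` and variable names are assumptions for illustration, not the settings behind the figures) suggesting that `loess` itself does not need the rows sorted by the predictor: fitting the same rows in a different order gives the same curve. Row order only matters when the fitted values are drawn with `lines()`.

```r
set.seed(2)
n <- 150
# Hypothetical data, rows deliberately in random Integrity order
df <- data.frame(Integrity = runif(n, 0, 100))
df$Count <- 1.5 + 0.02 * df$Integrity + rnorm(n, sd = 0.4)

fit <- loess(Count ~ Integrity, data = df, degree = 2, span = 0.5)

# Refit on the same rows, shuffled into a different order
shuffled <- df[sample(n), ]
fit_shuffled <- loess(Count ~ Integrity, data = shuffled, degree = 2, span = 0.5)

# Predict both fits on a common, sorted grid of Integrity values
grid <- data.frame(Integrity = seq(min(df$Integrity), max(df$Integrity),
                                   length.out = 100))
pred <- predict(fit, newdata = grid)
pred_shuffled <- predict(fit_shuffled, newdata = grid)
all.equal(pred, pred_shuffled)  # row order does not change the fitted curve

# For drawing, sort by x first, because lines() joins points in the given order:
# ord <- order(df$Integrity)
# plot(Count ~ Integrity, data = df); lines(df$Integrity[ord], fitted(fit)[ord])
```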
II) How should I run loess? I cannot work out the correct syntax for weighting the observations differently. Should it be:
1. `loess(Count ~ Integrity, data = df)`, with no prior weights
2. `loess(Count ~ 1:nrow(df), data = df, weights = Integrity)`
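
A sketch comparing the two options on simulated data (the data, `Index` column and `span = 0.3` are hypothetical choices). Per the `?loess` help page, `stats::loess` does accept a `weights` argument ("optional weights for each case"); it is evaluated in `data` the same way as in `lm()`, so the bare column name works. Fitting option 2 with and without those weights shows that they change the curve.

```r
set.seed(1)
n <- 200
# Hypothetical stand-in for the real data: Integrity in 0-100, log10 Count
df <- data.frame(Integrity = runif(n, 0, 100))
df$Count <- 2 + sin(df$Integrity / 15) + rnorm(n, sd = 0.3)
df$Index <- seq_len(nrow(df))  # row position, equivalent to 1:nrow(df)

# Option 1: Count as a smooth function of Integrity, equal prior weights
fit1 <- loess(Count ~ Integrity, data = df, degree = 2, span = 0.3)

# Option 2: Count against row position, Integrity used as prior (case) weights
fit2  <- loess(Count ~ Index, data = df, weights = Integrity, degree = 2, span = 0.3)
fit2u <- loess(Count ~ Index, data = df, degree = 2, span = 0.3)  # same, unweighted

# The prior weights are combined with the distance-based (tricube) weights of
# each local fit, so the weighted and unweighted curves differ
max(abs(fitted(fit2) - fitted(fit2u)))
```

Note that the two options answer different questions: option 1 smooths Count along the Integrity axis, while option 2 smooths Count along the (arbitrary) row order and only uses Integrity to decide how much each gene counts.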
I performed several tests. Figs. C and D were produced with `loess` (stats), Figs. E and F with `weightedLowess` (limma). I used two different packages because my reading of the `loess` documentation was that the weights are set based on the x distance between points, whereas `weightedLowess` lets the user supply prior weights for the regression. Below is the basic syntax I used to perform the regressions and generate the images.
- C) `loess(Count ~ Integrity, data = df, degree = 2, span = 0.1)`
- D) `loess(Count ~ 1:nrow(df), data = df, weights = Integrity, degree = 2, span = 0.1)`
- E) `weightedLowess(x = 1:nrow(df), y = df$Count, weights = df$Integrity, span = 0.1)`
- F) `weightedLowess(x = 1:nrow(df), y = order(df$Count), weights = df$Integrity, span = 0.1)`
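
A self-contained sketch of calls C to F on simulated data (the data and the `idx` variable are made up for illustration; E and F only run if the Bioconductor package limma is installed). Both functions expect the weights as a numeric vector, not as the string `'Integrity'`. One detail worth checking in F: `order()` returns the row indices that would sort `Count`, not the sorted `Count` values; `sort()` is what returns the values.

```r
set.seed(4)
n <- 300
# Hypothetical data: Integrity in 0-100, log10 Count with a wavy trend
df <- data.frame(Integrity = runif(n, 0, 100))
df$Count <- 2 + sin(df$Integrity / 15) + rnorm(n, sd = 0.3)
idx <- seq_len(nrow(df))  # row position, as in 1:nrow(df)

fitC <- loess(Count ~ Integrity, data = df, degree = 2, span = 0.1)
fitD <- loess(Count ~ idx, data = df, weights = Integrity, degree = 2, span = 0.1)

if (requireNamespace("limma", quietly = TRUE)) {
  fitE <- limma::weightedLowess(x = idx, y = df$Count,
                                weights = df$Integrity, span = 0.1)
  fitF <- limma::weightedLowess(x = idx, y = order(df$Count),
                                weights = df$Integrity, span = 0.1)
}

# order() gives a permutation of indices; sort() gives the sorted values
x <- c(3.55279, 1.39794, 2.396199, 3.61595, 3.238548, 3.78326)
order(x)                         # 2 3 5 1 4 6
identical(x[order(x)], sort(x))  # TRUE
```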
Please find attached the images cited in the question.