Problem printing Russian characters when rendering a .docx with R Markdown

I am trying to render a .docx file with R Markdown from a document whose data frame contains Russian (Cyrillic) text:

```yaml
---
params: 
title: "Encoding Issue"
output:
 bookdown::word_document2:
   reference_docx: Word_template.docx
classoption: a4paper
always_allow_html: yes
lang: ru
---
```

```{r data}
df <- data.frame(x = 
c("Не хватка медикаментов",
"Далеко ехать",
"Опасность на дорогах к мед.учереждению",
"Only one family physician")
)
Encoding(df$x) <- "UTF-8"
```

```{r cat}
cat(df$x)
```

```
�� ������ ������������ ������ ����� ��������� �� ������� � ���.����������� Only one family physician
```

```{r print}
print(df$x)
```

```
[1] "\xcd\xe5 \xf5\xe2\xe0\xf2\xea\xe0 \xec\xe5\xe4\xe8\xea\xe0\xec\xe5\xed\xf2\xee\xe2"
[2] "\xc4\xe0\xeb\xe5\xea\xee \xe5\xf5\xe0\xf2\xfc"
[3] "\xce\xef\xe0\xf1\xed\xee\xf1\xf2\xfc \xed\xe0 \xe4\xee\xf0\xee\xe3\xe0\xf5 \xea \xec\xe5\xe4.\xf3\xf7\xe5\xf0\xe5\xe6\xe4\xe5\xed\xe8\xfe"
[4] "Only one family physician"
```

In the rendered .docx file, the printed result appears as:

```
[1] "<U+041D><U+0435> <U+0445><U+0432><U+0430><U+0442><U+043A><U+0430> <U+043C><U+0435><U+0434><U+0438><U+043A><U+0430><U+043C><U+0435><U+043D><U+0442><U+043E><U+0432>"
[2] "<U+0414><U+0430><U+043B><U+0435><U+043A><U+043E> <U+0435><U+0445><U+0430><U+0442><U+044C>"
[3] "<U+041E><U+043F><U+0430><U+0441><U+043D><U+043E><U+0441><U+0442><U+044C> <U+043D><U+0430> <U+0434><U+043E><U+0440><U+043E><U+0433><U+0430><U+0445> <U+043A> <U+043C><U+0435><U+0434>.<U+0443><U+0447><U+0435><U+0440><U+0435><U+0436><U+0434><U+0435><U+043D><U+0438><U+044E>"
[4] "Only one family physician"
```
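The `<U+XXXX>` sequences are R's escaped notation for Unicode code points, and they name the expected Cyrillic letters (`U+041D` is "Н", `U+0435` is "е"). So the characters were not corrupted on the way to the .docx; they were printed in escaped form. A quick base-R check, added here as an illustration rather than taken from the post, decodes the first two escapes with `intToUtf8()`:

```r
# Code points taken from the first two escapes in the .docx output: <U+041D><U+0435>
codes <- c(0x041D, 0x0435)

# intToUtf8() turns the code points back into a single UTF-8 string
decoded <- intToUtf8(codes)
print(decoded)

# Same letters written with \u escapes, which R parses the same way in any locale
identical(decoded, "\u041d\u0435")  # TRUE
```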

`Sys.getlocale()` returns:

```
"LC_COLLATE=Russian_Russia.1251;LC_CTYPE=Russian_Russia.1251;LC_MONETARY=Russian_Russia.1251;LC_NUMERIC=C;LC_TIME=Russian_Russia.1251"
```

Where could the encoding issue originate, and is there any way to render the .docx file with the correct characters? The expected output is:

```
Не хватка медикаментов Далеко ехать Опасность на дорогах к мед.учереждению Only one family physician
```

I have also tried `Sys.setlocale("LC_CTYPE", "English")`. The .docx template is set to UTF-8, and the R Markdown document sets `options(encoding = "UTF-8")`.
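The `\x..` escapes in the `print()` output are Windows-1251 (CP1251) bytes, the native encoding of the `Russian_Russia.1251` locale: `\xcd\xe5`, for instance, is "Не" in CP1251. `Encoding(df$x) <- "UTF-8"` only changes the declared encoding of those bytes and does not convert them, so R then treats them as (invalid) UTF-8. The base-R sketch below is an illustration, not code from the post; it builds the two CP1251 bytes by hand so it behaves the same in any locale, and contrasts relabelling with converting via `iconv()`. The variable names are made up for the example.

```r
# "Не" as raw Windows-1251 (CP1251) bytes, matching "\xcd\xe5" in the print() output
cp1251 <- rawToChar(as.raw(c(0xCD, 0xE5)))

# Relabelling: Encoding<- changes only the declared encoding, not the bytes
relabelled <- cp1251
Encoding(relabelled) <- "UTF-8"
validUTF8(relabelled)   # FALSE: CP1251 bytes are not valid UTF-8

# Converting: iconv() rewrites the bytes from CP1251 to UTF-8
converted <- iconv(cp1251, from = "CP1251", to = "UTF-8")
validUTF8(converted)    # TRUE
charToRaw(converted)    # raw bytes d0 9d d0 b5
identical(converted, "\u041d\u0435")
```

Under this reading, the `Encoding()` line, rather than the data itself, is a likely source of the `\x..` output.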

Answer by manro

You should use `enc2utf8()` in this situation:

```{r data}
df <- data.frame(x = 
c("Нехватка медикаментов",
"Далеко ехать",
"Опасность на дорогах к мед.учреждению",
"Only one family physician")
)
```

```{r}
# enc2utf8() re-encodes strings from the native encoding (CP1251 on this system) to UTF-8
enc2utf8(df$x)
```
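The chunk above only prints the converted vector. If the converted text is needed later in the document (in a table, say), one option is to assign it back to the column. A minimal sketch, not part of the original answer; `stringsAsFactors = FALSE` is set explicitly because R versions before 4.0 would otherwise create a factor, and `enc2utf8()` accepts only character vectors:

```r
df <- data.frame(
  x = c("Нехватка медикаментов",
        "Далеко ехать",
        "Опасность на дорогах к мед.учреждению",
        "Only one family physician"),
  stringsAsFactors = FALSE  # keep a character column; enc2utf8() rejects factors
)

# Convert once and store the result, instead of only printing enc2utf8(df$x).
# Strings in the native encoding are re-encoded to UTF-8; ASCII and
# UTF-8-marked strings are left as they are.
df$x <- enc2utf8(df$x)

print(df$x)
```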