I am running into a problem when uploading a PDF that contains Chinese text. When I read the PDF content, the Chinese text is missing from the extracted output; where it should appear, I only get newline characters (\n). I am using pdf-lib and pdf-parse.
import fs from "fs";
import pdfParse from "pdf-parse";

// Parse a PDF file and log the extracted text
async function parsePDF(filePath) {
  try {
    // Read the PDF file into a Buffer
    const dataBuffer = await fs.promises.readFile(filePath);
    // Extract the text from the buffer using pdf-parse
    const data = await pdfParse(dataBuffer);
    // Access the parsed text, which should contain the Chinese characters
    const chineseText = data.text;
    console.log(chineseText);
  } catch (error) {
    console.error("Error parsing PDF:", error);
  }
}

// Example usage
const filePath = "path/to/your/pdf/file.pdf";
parsePDF(filePath);
This is what I get when I read the text from the file:
'\n\nFBA\n\nFBA: *****\n\n\n\n\n\n\n\n\n\n\nFBA17PG60THDU000001\nSingle SKU\n\n\n\n\n\nFBA\n\nFBA: EVEO LLC\n\n\n\n\n\n\n\n\n\n\nFBA17PG60THDU000002\nSingle SKU\n\n\n\n\n\nFBA\n\n
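To make the symptom easier to check, here is a small sketch that counts how many Han (Chinese) characters and how many newlines appear in the extracted string. The helper name `summarizeExtractedText` is hypothetical and not part of pdf-parse or pdf-lib; it only inspects the string that comes back in `data.text`.

```javascript
// Hypothetical helper (not part of pdf-parse): summarizes how much of the
// extracted text is Han script versus newline characters.
function summarizeExtractedText(text) {
  // \p{Script=Han} matches Han ideographs; the "u" flag is required for it.
  const hanMatches = text.match(/\p{Script=Han}/gu) || [];
  const newlineMatches = text.match(/\n/g) || [];
  return {
    length: text.length,
    hanCount: hanMatches.length,
    newlineCount: newlineMatches.length,
  };
}

// Example with a fragment of the output shown above
console.log(summarizeExtractedText("\n\nFBA\n\nFBA: *****\n"));
```

If `hanCount` is 0 for a PDF that visibly contains Chinese text, the characters were dropped during extraction rather than being garbled at display time.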
Does anyone know how to read the Chinese text correctly?
