C++: How to read any file into std::string
I have a bunch of text files that I want to read into a std::string; some of them are UCS-2 encoded and some are UTF-8 encoded. How do I read them into a std::string? I just want to read any text file into a std::string. Do I have to convert them?
Asked by extreme001
How they are read depends on what your OS supports and the locale you're using.
If you just naïvely read in files without touching your locale, and their encoding does not match the locale your C++ standard library is using, you may run into difficulties. The same goes for single-byte versus multibyte character sets.
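A minimal sketch of the locale-independent first step, assuming you want the file's bytes unchanged in a std::string and will deal with the encoding afterwards. read_file_bytes is a hypothetical helper name. Opening with std::ios::binary disables newline translation, and a char-based file stream with the default locale performs no character conversion, so a UCS-2 file arrives as its raw two-byte units, zero bytes included.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the file's contents byte-for-byte.
// std::ios::binary turns off newline translation, and a char-based file
// stream with the default locale performs no character conversion.
std::string read_file_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    // Copy every byte from the stream buffer into the string.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The resulting string is just bytes: printing it or searching it as text only makes sense once you know (or have guessed) the encoding.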
There's no reliable way to tell what encoding a file uses before reading it (metadata may be wrong), so the general strategy is to try reading it in the most common formats first, and then retry with different formats if that fails (i.e. an invalid character is encountered). Even then the result may be ambiguous. This is a deceptively complex problem; you run into the same issue parsing HTML with unusual character sets.
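The try-the-common-formats-first strategy can be sketched as a heuristic, not a complete detector: check for a byte order mark (BOM), then check whether the bytes are well-formed UTF-8, and otherwise give up. The Guess enum and both function names are hypothetical. The result can still be ambiguous: a pure-ASCII file is valid UTF-8 and also valid in most legacy single-byte encodings, and a UCS-2 file without a BOM is not recognised at all.

```cpp
#include <cstddef>
#include <string>

// Hypothetical labels for this sketch; not a standard type.
enum class Guess { Utf8Bom, Utf16LE, Utf16BE, Utf8, Unknown };

// True if s is well-formed UTF-8: correct lead/continuation bytes,
// no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (c < 0x80) { ++i; continue; }                 // ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                               // invalid lead byte
        if (i + len > n) return false;                   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;       // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}

// BOM check first, then UTF-8 validation. UTF-32 BOMs are not handled.
Guess guess_encoding(const std::string& bytes) {
    auto starts = [&bytes](const char* bom, std::size_t len) {
        return bytes.size() >= len && bytes.compare(0, len, bom, len) == 0;
    };
    if (starts("\xEF\xBB\xBF", 3)) return Guess::Utf8Bom;
    if (starts("\xFF\xFE", 2)) return Guess::Utf16LE;
    if (starts("\xFE\xFF", 2)) return Guess::Utf16BE;
    if (is_valid_utf8(bytes)) return Guess::Utf8;
    return Guess::Unknown;   // e.g. a legacy 8-bit code page; retry with another guess
}
```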
In general, there are two sets of stream I/O classes available: narrow ones based on char (std::ifstream, std::cin) and wide ones based on wchar_t (std::wifstream, std::wcin). Support for this functionality is deeply platform-specific, though, so if you're using an English-localized OS with no extra character support installed, multibyte character sets may not be supported by C++ directly without an external library.
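If you want everything to end up in one std::string, one option that needs no external library is to convert UCS-2 input to UTF-8 yourself. A minimal sketch, assuming the data is little-endian UTF-16 (UCS-2 is the subset without surrogate pairs) with any BOM already stripped; utf16le_to_utf8 is a hypothetical name. It decodes 16-bit units, combines surrogate pairs, and re-encodes each code point as one to four UTF-8 bytes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts little-endian UTF-16/UCS-2 bytes to UTF-8.
// Assumes any BOM was removed beforehand. Odd byte counts and unpaired
// surrogates are reported as errors.
std::string utf16le_to_utf8(const std::string& bytes) {
    if (bytes.size() % 2 != 0) throw std::runtime_error("odd byte count");
    // Reads the 16-bit unit starting at byte offset i (low byte first).
    auto unit = [&bytes](std::size_t i) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        return static_cast<char32_t>(lo | (hi << 8));
    };
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        char32_t cp = unit(i);
        if (cp >= 0xD800 && cp <= 0xDBFF) {              // high surrogate
            if (i + 2 >= bytes.size()) throw std::runtime_error("truncated surrogate pair");
            char32_t low = unit(i + 2);
            if (low < 0xDC00 || low > 0xDFFF) throw std::runtime_error("unpaired surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            i += 2;                                      // consumed two units
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired surrogate");
        }
        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Big-endian input would only need the two bytes swapped inside unit.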
The standard library also provides wide counterparts of cin and cout, prefixed with a w (wcin, wcout, wcerr), which separate the streams by character width.
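A minimal sketch of the wide stream types, using only standard headers: std::wostream, std::wcout and std::wostringstream all carry wchar_t rather than char. greet and greeting are hypothetical helpers. Keep in mind that wchar_t is 16 bits on Windows and 32 bits on most Unix-like systems, and that mixing std::cout and std::wcout on the same console is best avoided.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Writes to any wide (wchar_t) stream: std::wcout, std::wofstream,
// std::wostringstream, ... Passing a narrow std::ostream will not compile.
void greet(std::wostream& os, const std::wstring& name) {
    os << L"Hello, " << name << L'\n';
}

// Same output, captured in a wide string instead of printed.
std::wstring greeting(const std::wstring& name) {
    std::wostringstream ss;
    greet(ss, name);
    return ss.str();
}
```

With <iostream> included, greet(std::wcout, L"world") prints the greeting through the wide console stream.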
This works as you'd expect. If your code uses Microsoft's generic-text mappings (TCHAR, _tmain and the _t-prefixed functions from <tchar.h>), you'll also have to #define _UNICODE so that they resolve to the wide versions.
As a side note, Windows splits many of its API calls into two versions: one that takes a narrow string in the current ANSI code page and one that takes a wide (UTF-16) string, e.g. CreateProcessA vs. CreateProcessW. The UNICODE macro selects which of the two the plain name CreateProcess refers to.
So, to summarize: I/O functionality is split along character width and locale. To give you a more targeted answer, I'd need to know more about your goals. Take a look at C++'s locale support to get a better idea, specifically the locale functions in ios_base: imbue and getloc.
There isn't currently a good way to handle these problems with widely deployed versions of C++, though I understand upcoming versions of C++ alleviate these issues.
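A minimal sketch of imbue and getloc, assuming you want to switch a stream to a named locale when the system provides it and fall back to the classic "C" locale otherwise. imbue_or_classic is a hypothetical name, and which locale names exist (for example "en_US.UTF-8") depends on your platform. For file streams, imbue before the first read or write, since the stream buffer's locale is what drives character conversion.

```cpp
#include <ios>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: imbue `stream` with the locale called `name`, or with
// the classic "C" locale if this system does not provide that name
// (std::locale's string constructor throws std::runtime_error then).
// Taking std::basic_ios rather than std::ios_base matters: basic_ios::imbue
// also imbues the stream buffer, whose locale controls file conversion.
// Returns the name of the locale the stream ends up with, via getloc().
template <class CharT, class Traits>
std::string imbue_or_classic(std::basic_ios<CharT, Traits>& stream,
                             const std::string& name) {
    try {
        stream.imbue(std::locale(name));
    } catch (const std::runtime_error&) {
        stream.imbue(std::locale::classic());
    }
    return stream.getloc().name();
}
```

Passing an empty name, e.g. imbue_or_classic(some_wifstream, ""), requests the user's preferred locale from the environment; whether that locale then decodes UTF-8 or UCS-2 input correctly in a wide stream still depends on the platform.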