PowerShell Script to Replace Text in Text File, but not save to new file


I am trying to replace text in a large text file (5 GB). I found the script below. It outputs to a new file.

powershell -Command "(gc myFile.txt) -replace 'foo', 'bar' | Out-File -encoding ASCII myFile.txt"

I am trying not to create another file because of the size. How do I replace text in a text file without saving it to a new file?


There are 2 best solutions below

4
Re1ter On

Try using Set-Content instead of Out-File:

powershell -Command "(gc myFile.txt) -replace 'foo', 'bar' | set-content myFile.txt -encoding ASCII"
0
mklement0 On

How do I replace text in a text file without saving it to a new file?

The command in your question:

(gc myFile.txt) -replace 'foo', 'bar' | Out-File -encoding ASCII myFile.txt

does write to the same file, although it rewrites the entire content of the file.

It can do so because enclosing the gc (Get-Content) call in (...) ensures that the entire content is read into memory first, before Out-File opens the file for writing.

While this may be slow and memory-intensive and bears a hypothetical risk of data loss (if writing back is interrupted, the file is left incomplete), it does allow you to write back to the same file.

Note that with an ASCII-encoded input file that is 5 GB in size, you'll need 10 GB+ of available memory just to read the lines of the file into memory (on disk, each character is a single byte; in memory, every [char] instance occupies 2 bytes), plus a varying amount of additional memory depending on how many lines are affected by actual replacements (-replace passes a given input string through as-is if there's no match, so only lines that change require a new string).
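
To make the arithmetic concrete, here is a minimal sketch of that lower bound. The function name Get-MinimumStringMemoryGB and its parameter are made up for illustration, and the result ignores per-string object overhead, so treat it as a floor rather than a prediction.

```powershell
# Hypothetical helper: rough lower bound (in GB) on the memory needed to hold
# an ASCII-encoded file as .NET strings.
function Get-MinimumStringMemoryGB {
    param([Parameter(Mandatory)][long] $FileSizeBytes)
    # Each single-byte ASCII character becomes a 2-byte [char] in memory.
    ($FileSizeBytes * 2) / 1GB
}

Get-MinimumStringMemoryGB -FileSizeBytes 5GB   # 10
```

For a real file, the size could be passed as (Get-Item myFile.txt).Length.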

If you do have enough memory, you can greatly speed up your command as follows (leaving the powershell -Command call aside):

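# -ReadCount 0 reads all lines into a single array instead of streaming them one by one;
# passing the result via -Value avoids the pipeline altogether.
Set-Content -Encoding Ascii myFile.txt -Value (
  (Get-Content -ReadCount 0 myFile.txt) -replace 'foo', 'bar'
)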

The only way to avoid having to rewrite the file in full is if (a) the search regex always matches a fixed number of characters, and (b) the substitution (replacement) text has exactly the same length as the match. If the file uses a variable-width encoding such as UTF-8, the encoded forms of both strings must also have exactly the same byte count.

You could then read the file in binary mode, and overwrite only the bytes of interest.

Needless to say, this would require much more effort.
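
As a rough illustration of what that effort might look like, below is a sketch under several assumptions: the file is ASCII (or another single-byte encoding), the search term is a literal string rather than a regex, and both strings have the same length. The function name Edit-FileInPlace, its parameters, and the 4 MB chunk size are hypothetical choices, not something from the answer above. The file is read in chunks, and only the matching bytes are overwritten, so the file size never changes and memory use stays bounded.

```powershell
# Hypothetical helper, not part of the original answer: overwrites each
# occurrence of a literal ASCII string with a same-length replacement, in place.
function Edit-FileInPlace {
    param(
        [Parameter(Mandatory)][string] $Path,
        [Parameter(Mandatory)][string] $Search,       # literal text, not a regex
        [Parameter(Mandatory)][string] $Replacement,  # must encode to the same byte count
        [int] $ChunkSize = 4MB                        # assumed buffer size
    )

    $find = [System.Text.Encoding]::ASCII.GetBytes($Search)
    $repl = [System.Text.Encoding]::ASCII.GetBytes($Replacement)
    if ($find.Length -eq 0 -or $find.Length -ne $repl.Length) {
        throw 'Search and replacement must be non-empty and equal in byte length.'
    }

    # ISO-8859-1 maps every byte to exactly one [char], so string indices equal byte offsets.
    $latin1 = [System.Text.Encoding]::GetEncoding(28591)
    $needle = $latin1.GetString($find)
    $buffer = New-Object byte[] ([Math]::Max($ChunkSize, 2 * $find.Length))
    $count  = 0

    # Resolve against PowerShell's current location, which may differ from .NET's.
    $fullPath = Convert-Path -LiteralPath $Path
    $stream = [System.IO.File]::Open($fullPath, [System.IO.FileMode]::Open, [System.IO.FileAccess]::ReadWrite)
    try {
        [long] $offset = 0   # file offset of $buffer[0]
        while ($true) {
            # Fill the buffer from the current offset (Read may return fewer bytes than requested).
            $stream.Position = $offset
            $filled = 0
            while ($filled -lt $buffer.Length) {
                $n = $stream.Read($buffer, $filled, $buffer.Length - $filled)
                if ($n -eq 0) { break }
                $filled += $n
            }
            if ($filled -lt $find.Length) { break }

            # Overwrite each match where it sits; the file length never changes.
            $text = $latin1.GetString($buffer, 0, $filled)
            $next = 0
            while (($i = $text.IndexOf($needle, $next, [StringComparison]::Ordinal)) -ge 0) {
                $stream.Position = $offset + $i
                $stream.Write($repl, 0, $repl.Length)
                $count++
                $next = $i + $find.Length
            }

            if ($filled -lt $buffer.Length) { break }   # reached end of file
            # Re-read the last few bytes so a match straddling two chunks is found,
            # but never rescan bytes that were just overwritten.
            $offset += [Math]::Max($filled - $find.Length + 1, $next)
        }
    }
    finally {
        $stream.Dispose()
    }
    $count   # number of replacements made
}
```

A call such as Edit-FileInPlace -Path myFile.txt -Search 'foo' -Replacement 'bar' would return the number of replacements made. Because the file is modified directly, an interrupted run leaves it partially updated, so the hypothetical risk of data loss mentioned above applies here too.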