Regex/method to comment out lines containing Japanese text

I have a large text file in the format below.

{
    "glossary": {
        "title": "example glossary",
        cm="私は今プログラミングーをしています"; 
        "text2": "example glossary",
        cm="私はABあああをしています"
}

I need to comment out every line that contains Japanese characters. Each such line starts with four or more tabs, and the tab count varies from line to line. I need to change the above file as shown below:

{
    "glossary": {
        "title": "example glossary",
        */cm="私は今プログラミングーをしています";*/
        "text2": "example glossary",
        */cm="私はABあああをしています";*/
}

Environment:

★ I can run a batch file.

★ I can run a VB script.

★ I can use the Sakura Editor. (preferred)

★ I cannot use/download 3rd party software.

Things I have tried:

■ Using regex ➞ I tried to replace the Japanese text with "" using the regex \p{Hiragana}, then \p{Katakana}, and then \p{Han}, but some symbols still remained.

■ Using VBScript ➞ I tried to read the text file line by line and mark each matching line with "*/". I don't know why, but it replaced the whole file. The code I used is below:

Set objFSO = CreateObject("Scripting.FileSystemObject")
If objFSO.FileExists("C:\Users\s162138\Desktop\test.txt") Then
    Set objFile = objFSO.OpenTextFile("C:\Users\s162138\Desktop\test.txt", 1)

    Do Until objFile.AtEndOfStream
        strLine = objFile.ReadLine
        If strNextLine = "cm=*" Then
            strLine = "text" + strLine + "text"
        End If

        strNewText = strLine + vbCrLf
    Loop
    Set objFile = Nothing

    Set objFile = objFSO.OpenTextFile("C:\Users\s162138\Desktop\test.txt", 2)
    objFile.Write strNewText
    Set objFile = Nothing
End If

I would be grateful if anyone could help me out.

There is 1 solution below

Ryszard Czech

Use the Japanese regex provided at https://gist.github.com/ryanmcgrath/982242 like this:

^([ \t]*)(.*?(?:[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]|[\uFF00-\uFFEF]|[\u4E00-\u9FAF]|[\u2605-\u2606]|[\u2190-\u2195]|\u203B).*?)([ \t]*)$

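Replace with $1/*$2*/$3. If your tool matches against the whole file at once rather than line by line, enable multiline mode so that ^ and $ anchor at each line.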
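As a rough illustration of how the pattern and replacement behave, here is a minimal Python sketch. Python is used only to demonstrate the regex and is not one of the tools listed in the question; the function name comment_japanese_lines and the sample text are made up for the example. Python writes backreferences as \1 rather than $1, and re.MULTILINE is needed so that ^ and $ anchor at each line.

```python
import re

# The pattern from the answer, split across two raw strings for readability.
# re.MULTILINE makes ^ and $ match at every line, not just at the ends of the text.
JAPANESE_LINE = re.compile(
    r"^([ \t]*)(.*?(?:[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]"
    r"|[\uFF00-\uFFEF]|[\u4E00-\u9FAF]|[\u2605-\u2606]|[\u2190-\u2195]|\u203B).*?)([ \t]*)$",
    re.MULTILINE,
)


def comment_japanese_lines(text):
    """Wrap each line that contains a character from the answer's ranges in /* ... */.

    Assumes lines end with "\n" only; a stray "\r" would end up inside the comment.
    """
    # \1 = leading whitespace, \2 = line content, \3 = trailing whitespace
    return JAPANESE_LINE.sub(r"\1/*\2*/\3", text)


if __name__ == "__main__":
    # Hypothetical sample modelled on the file in the question.
    sample = (
        "{\n"
        '\t"title": "example glossary",\n'
        '\t\t\t\tcm="私は今プログラミングーをしています"; \n'
        '\t"text2": "example glossary",\n'
        '\t\tcm="私はABあああをしています"\n'
        "}\n"
    )
    print(comment_japanese_lines(sample))
```

In this sample only the two cm= lines are wrapped; leading and trailing whitespace stays outside the comment markers, and lines without Japanese characters are left untouched.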

EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the line in multiline mode
                           (otherwise the beginning of the string)
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [ \t]*                   any character of: ' ', '\t' (tab) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      [\u3000-\u303F]          CJK symbols and punctuation
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u3040-\u309F]          hiragana
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u30A0-\u30FF]          katakana
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\uFF00-\uFFEF]          full-width Roman characters and half-width
                               katakana
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u4E00-\u9FAF]          Common and uncommon kanji
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u2605-\u2606]          Stars
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u2190-\u2195]          arrows
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      \u203B                   reference mark (※)
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    [ \t]*                   any character of: ' ', '\t' (tab) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  $                        the end of the line in multiline mode (otherwise
                           before an optional \n, and the end of the string)
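
To see which row of the table a given character falls under, the ranges can be written out as plain code-point intervals. The Python sketch below is illustrative only: the intervals are copied from the pattern, the labels are shortened versions of the table's descriptions, and classify and first_japanese are hypothetical helper names that are not part of the answer.

```python
# (first code point, last code point, label), in the same order as the pattern.
RANGES = [
    (0x3000, 0x303F, "punctuation"),
    (0x3040, 0x309F, "hiragana"),
    (0x30A0, 0x30FF, "katakana"),
    (0xFF00, 0xFFEF, "full-width roman / half-width katakana"),
    (0x4E00, 0x9FAF, "kanji"),
    (0x2605, 0x2606, "stars"),
    (0x2190, 0x2195, "arrows"),
    (0x203B, 0x203B, "reference mark"),
]


def classify(ch):
    """Return the label of the first range containing ch, or None if no range does."""
    code_point = ord(ch)
    for low, high, label in RANGES:
        if low <= code_point <= high:
            return label
    return None


def first_japanese(line):
    """Return (index, character, label) for the first matching character, or None."""
    for index, ch in enumerate(line):
        label = classify(ch)
        if label is not None:
            return index, ch, label
    return None


if __name__ == "__main__":
    for ch in "私はープA★":
        print(repr(ch), hex(ord(ch)), classify(ch))
```

A line is commented out by the regex exactly when first_japanese would find something on it, since the pattern only requires one character from any of these intervals.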