Change italic filenames in my Markdown to use backtick code


After this discussion about using backtick code for filenames in Markdown, I now need to go back through all my .md files and change every occurrence of *some.filename.php* to *`some.filename.php`*.

I'm sure some sort of capture-group arguments like $1 and $2 would work in sed or awk, but I have no idea how, or which tool is best. Here are the criteria I'm up against:

  • All filenames are inside asterisk italics and are immediately followed by the closing asterisk *: *some.filename.php*
  • Other words and a space may or may not precede the filename inside the asterisk italics: *asterisk italics then some-filename.php*
  • All filenames end with an extension, but not always .php. They could end with .xml or .css or .whatthefork.
  • Some filenames have more than one period . in them, some have only one.

Example of my text:

*Read through index.php*

*Notice the changes below:*

  - *some.file.php*
  - *and anotherphpfile.php*
  - *don't miss this.four.word.file.amp*
  - *backup.txt*
  - *index2.html*
  - *style3.css*
  - *action.js*

*Look at XML in my-feed.xml*

What can I run against each .md file so that every filename in its contents gets wrapped in backticks, like so:

*Read through `index.php`*

*Notice the changes below:*

  - *`some.file.php`*
  - *and `anotherphpfile.php`*
  - *don't miss `this.four.word.file.amp`*
  - *`backup.txt`*
  - *`index2.html`*
  - *`style3.css`*
  - *`action.js`*

*Look at XML in `my-feed.xml`*

3 answers

Ed Morton (accepted answer)

This might be what you want, using a sed that supports -E to enable EREs, such as GNU or BSD sed:

$ sed -E 's/(\*)([^*]*[[:space:]])?([[:alnum:]_][[:alnum:]_.-]*\.[[:alnum:]_-]+)(\*)/\1\2`\3`\4/g' file
*Read through `index.php`*

*Notice the changes below:*

  - *`some.file.php`*
  - *and `anotherphpfile.php`*
  - *don't miss `this.four.word.file.amp`*
  - *`backup.txt`*
  - *`index2.html`*
  - *`style3.css`*
  - *`action.js`*

*Look at XML in `my-feed.xml`*

But be aware that Unix filenames can contain *s or white space (including newlines), and in fact any character except / and NUL. The regexp above would fail to match such names; it only matches filenames that look like the ones in your example.
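To see that limitation concretely, here is a quick check of the first regexp on two made-up lines. The function name `wrap_filenames` and both sample names are hypothetical, not from the question. A name containing a character outside `[[:alnum:]_.-]` is left untouched, and for a name containing a space only the part after the space is wrapped:

```shell
# Ed Morton's first regexp, wrapped in a (hypothetical) helper function.
wrap_filenames() {
  sed -E 's/(\*)([^*]*[[:space:]])?([[:alnum:]_][[:alnum:]_.-]*\.[[:alnum:]_-]+)(\*)/\1\2`\3`\4/g' "$@"
}

# Two made-up names that do not look like the question's examples:
printf '%s\n' '*c++.notes.txt*' '*my file.php*' | wrap_filenames
# prints:
# *c++.notes.txt*     <- "+" is not in [[:alnum:]_.-], so nothing is wrapped
# *my `file.php`*     <- the space splits the name; only "file.php" is wrapped
```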

This would accept filenames containing any characters except * and white space, since, given input like *and anotherphpfile.php*, there's no way to tell whether "and" is part of the filename if filenames can contain white space. Be aware that it might also match other strings, in contexts you haven't shown us, that you don't want to match:

$ sed -E 's/(\*)([^*]*[[:space:]])?([^*]+\.[^*]+)(\*)/\1\2`\3`\4/g' file
*Read through `index.php`*

*Notice the changes below:*

  - *`some.file.php`*
  - *and `anotherphpfile.php`*
  - *don't miss `this.four.word.file.amp`*
  - *`backup.txt`*
  - *`index2.html`*
  - *`style3.css`*
  - *`action.js`*

*Look at XML in `my-feed.xml`*
tax evader

Does this sed regex pattern provide your expected result?

sed 's/[a-zA-Z0-9.-]*\.[a-zA-Z]*/`&`/g'
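A quick check of this pattern (the helper name `wrap_dotted` and the sample sentence are hypothetical) shows that it works on the question's lines, but because it is not tied to the asterisks it also wraps any word that ends with a period in ordinary prose:

```shell
# tax evader's pattern in a (hypothetical) helper; it matches any run of
# letters, digits, dots and hyphens that contains a dot, italic or not.
wrap_dotted() {
  sed 's/[a-zA-Z0-9.-]*\.[a-zA-Z]*/`&`/g' "$@"
}

printf '%s\n' '*Read through index.php*' 'This is a sentence.' | wrap_dotted
# prints:
# *Read through `index.php`*
# This is a `sentence.`
```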
Daweo

I would harness GNU AWK for this task in the following way. Let file.txt content be

*Read through index.php*

*Notice the changes below:*

  - *some.file.php*
  - *and anotherphpfile.php*
  - *don't miss this.four.word.file.amp*
  - *backup.txt*
  - *index2.html*
  - *style3.css*
  - *action.js*

*Look at XML in my-feed.xml*

then

awk 'match($0,/[^[:space:]*]*[.][^[:space:]*]+[*]/){$0=substr($0,1,RSTART-1) "`" substr($0,RSTART,RLENGTH-1) "`*"}{print}' file.txt

gives output

*Read through `index.php`*

*Notice the changes below:*

  - *`some.file.php`*
  - *and `anotherphpfile.php`*
  - *don't miss `this.four.word.file.amp`*
  - *`backup.txt`*
  - *`index2.html`*
  - *`style3.css`*
  - *`action.js`*

*Look at XML in `my-feed.xml`*

Explanation: I crafted a regular expression to match the filename and the trailing asterisk. It is as follows:

  • zero-or-more characters other than white-space and *
  • literal dot
  • one-or-more characters other than white-space and *
  • literal asterisk

Then I use the String Functions match and substr. If there is a match, I alter the line to what is before the match, followed by a backtick, followed by the match with its last character (which is *) dropped, followed by backtick-asterisk. I print every line. Disclaimer: this solution assumes each line has no more than 2 *.

(tested in GNU Awk 5.1.0)
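Because match() is applied only once per line, only the first filename on a line gets wrapped, which is why that disclaimer matters. One possible way around it is to repeat the same match()/substr() step on the remainder of the line. This is a sketch assuming an awk that supports POSIX character classes; the function name `wrap_all` is hypothetical, and excluding backticks from the filename characters is my own addition so that a second run leaves already-wrapped names alone:

```shell
# Daweo's match()/substr() approach, repeated in a loop so that every
# "name.ext*" on a line is wrapped, not just the first one.
# Assumption: backticks are excluded from filename characters, so names that
# are already wrapped are not wrapped again on a re-run.
wrap_all() {
  awk '{
    rest = $0; out = ""
    while (match(rest, /[^[:space:]*`]*[.][^[:space:]*`]+[*]/)) {
      # text before the match, then the filename (minus its trailing *) in backticks
      out = out substr(rest, 1, RSTART - 1) "`" substr(rest, RSTART, RLENGTH - 1) "`*"
      # continue scanning after the closing *
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }' "$@"
}

printf '%s\n' '*see a.php* and *b.css*' | wrap_all
# prints: *see `a.php`* and *`b.css`*
```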