Situation & Problem
1.
e.g.:
Say you have a paragraph.
The word sentence is broken into sente-nce by a hyphen.
Imagine you have this sample sentence, which is a very long sente-
nce that has a word being broken down with a hyphen.
2.
How can I detect that the word sente-nce has been broken by a hyphen, and correct it to sentence?
note:
Is there any library I can use to do that (preferably Java / Python / any software)?
Using a simple regex to match all (\w)-(\w) and replace with $1$2 won't work in all cases.
e.g.: Imagine you have the word event-driven; it will become eventdriven, which is undesired.
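To make the problem concrete, here is a minimal Java sketch of that naive replacement. The class and method names (NaiveDehyphenDemo, naiveJoin) are made up for illustration; only the regex and the replacement string come from the note above.

```java
import java.util.regex.Pattern;

class NaiveDehyphenDemo {
    // Naive approach: drop every hyphen that sits between two word characters.
    static final Pattern HYPHEN_BETWEEN_WORD_CHARS = Pattern.compile("(\\w)-(\\w)");

    // Hypothetical helper name; applies the (\w)-(\w) -> $1$2 replacement.
    static String naiveJoin(String text) {
        return HYPHEN_BETWEEN_WORD_CHARS.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(naiveJoin("sente-nce"));    // sentence (wanted)
        System.out.println(naiveJoin("event-driven")); // eventdriven (not wanted)
    }
}
```

The regex alone cannot tell a broken word from a legitimate compound, so some extra knowledge about valid words is needed.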
Solution (may not be the best)
logic & usage
/*
@logic::
  regex match all words containing a hyphen -
  loop: check whether each of those words is correct by using a dictionary
  & fix it if the hyphen is misplaced
@to_use::
  put your dictionary in
    Path path = Paths.get("words_alpha.txt");   <= https://github.com/dwyl/english-words
  put the sentence to autocorrect in
    content_TESTING
  execute & get the output
@note::
  depending on the quality of the dictionary, the results may not be good.
@note::
  if your words contain "space or newline \n" -> modify the regex in
    String str_RegexPattern = "([a-zA-Z]+)-([a-zA-Z]+)";
@note::
  this is not fully tested yet
*/
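The logic in the comment block can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested implementation: the class and method names are hypothetical, the dictionary is a tiny in-memory set instead of words_alpha.txt, the pattern adds \s* so that a line break after the hyphen is also matched, and the rule "join only when the joined form is a dictionary word" is one simple way to decide. It needs Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DictionaryDehyphenSketch {
    // Variation on str_RegexPattern: "\\s*" also allows whitespace or a newline after the hyphen.
    static final Pattern HYPHENATED = Pattern.compile("([a-zA-Z]+)-\\s*([a-zA-Z]+)");

    // Hypothetical helper: joins the two halves only if the joined form is a known word.
    static String fix(String text, Set<String> dictionary) {
        Matcher m = HYPHENATED.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String joined = m.group(1) + m.group(2);
            boolean known = dictionary.contains(joined.toLowerCase(Locale.ROOT));
            // Keep the original text untouched when the joined form is not a word.
            String replacement = known ? joined : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: a stand-in for the real word list.
        Set<String> dictionary = Set.of("sentence", "event", "driven");
        System.out.println(fix("a very long sente-\nnce that", dictionary));
        System.out.println(fix("an event-driven design", dictionary));
    }
}
```

With a real word list, the set could be built from the file, for example with new HashSet<>(Files.readAllLines(Paths.get("words_alpha.txt"))). Words that are valid both with and without the hyphen (such as re-sign and resign) would still be joined by this rule, so the result depends on the dictionary and the text.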
code
input
output