Word comparison in a line of text in C#


Hi, I am using C# in my project and I am trying to get output like the one described below.

 string str1 = "Cat meet's a dog has";
 string str2 = "Cat meet's a dog and a bird";

 string[] str1Words = str1.ToLower().Split(' ');
 string[] str2Words = str2.ToLower().Split(' ');

 var uniqueWords = str2Words
   .Except(str1Words)
   .Concat(str1Words.Except(str2Words))
   .ToList();

This gives me the output has, and, a, bird, which is correct, but what I would like is something like the following:

has - present in first string not present in second string

and a bird - not present in first string but present in second string

For example, a second use case:

String S1 = "Added";
String S2 = "Edited";

Here the output should be:

Added - present in first string not present in second string

Edited - not present in first string but present in second string

I would like some indication of which words are present in the first string and not in the second, and which are present in the second and not in the first. The comparison should be word by word rather than character by character. Can someone please help me with this? Any help would be appreciated. Thanks.


There are 2 best solutions below

Dmitry Bychenko On BEST ANSWER

I suggest matching words with the help of a regular expression, where a word is a sequence of letters and apostrophes. Please note that splitting on spaces doesn't take punctuation into account, so "cat", "cat," and "cat!" would be considered three different words. Then query the matches for the two given strings:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions; 

...

private static readonly Regex WordsRegex = new Regex(@"[\p{L}']+"); 

// 1 - in text1, 2 - in text2, 3 - in both text1 and text2 
private static List<(string word, int presentAt)> MyWords(string text1, string text2) {
  HashSet<string> words1 = WordsRegex
    .Matches(text1)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToHashSet(StringComparer.OrdinalIgnoreCase);

  HashSet<string> words2 = WordsRegex
    .Matches(text2)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToHashSet(StringComparer.OrdinalIgnoreCase);

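  // Union must use the same case-insensitive comparer as the sets;
  // otherwise "Cat" and "cat" would be reported as two separate words
  return words1
    .Union(words2, StringComparer.OrdinalIgnoreCase)
    .Select(word => (word, presentAt: (words1.Contains(word) ? 1 : 0) |
                                      (words2.Contains(word) ? 2 : 0)))
    .ToList();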
}

Demo:

string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";
    
var result = MyWords(str1, str2);
    
var report = string.Join(Environment.NewLine, result);
    
Console.Write(report);

Output:

(Cat, 3)         # 3: in both str1 and str2 
(meet's, 3)      # 3: in both str1 and str2
(a, 3)           # 3: in both str1 and str2
(dog, 3)         # 3: in both str1 and str2 
(has, 1)         # 1: in str1 only
(and, 2)         # 2: in str2 only
(bird, 2)        # 2: in str2 only 
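
The note above about punctuation can be illustrated with a small sketch. It assumes the same word pattern as WordsRegex; the names sample, bySplit and byRegex are made up for this example only:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sample = "cat cat, and cat!";

// Splitting on spaces keeps punctuation attached to the words,
// so "cat", "cat," and "cat!" stay distinct.
string[] bySplit = sample.Split(' ').Distinct().ToArray();

// Matching runs of letters and apostrophes drops the punctuation,
// so every "cat" collapses into the same word.
string[] byRegex = Regex.Matches(sample, @"[\p{L}']+")
    .Cast<Match>()
    .Select(m => m.Value)
    .Distinct()
    .ToArray();

Console.WriteLine(string.Join(" | ", bySplit));
Console.WriteLine(string.Join(" | ", byRegex));
```

Here the split version yields four distinct tokens, while the regex version yields only cat and and.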


If you want a wordy output:

string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";

var result = MyWords(str1, str2);

string[] options = new string[] {
  "not present",
  "present in first string not present in second string",
  "not present in first string but present in second string",
  "present in first string and present in second string"
};
        
var report = string.Join(Environment.NewLine, result
  .Select(pair => $"{pair.word} - {options[pair.presentAt]}"));

Console.Write(report);

Output:

Cat - present in first string and present in second string
meet's - present in first string and present in second string
a - present in first string and present in second string
dog - present in first string and present in second string
has - present in first string not present in second string
and - not present in first string but present in second string
bird - not present in first string but present in second string
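
The question asks for one line per category rather than one line per word. A minimal sketch of that, assuming a result list shaped like the demo output above: the literal list is a hypothetical stand-in for the value returned by MyWords, and labels and lines are illustrative names, not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for MyWords(str1, str2) from the demo.
var result = new List<(string word, int presentAt)> {
  ("Cat", 3), ("meet's", 3), ("a", 3), ("dog", 3),
  ("has", 1), ("and", 2), ("bird", 2)
};

// Descriptions only for the "one side only" categories.
var labels = new Dictionary<int, string> {
  [1] = "present in first string not present in second string",
  [2] = "not present in first string but present in second string"
};

// One line per category; words found in both strings (presentAt == 3) are skipped.
var lines = result
  .Where(pair => labels.ContainsKey(pair.presentAt))
  .GroupBy(pair => pair.presentAt)
  .OrderBy(group => group.Key)
  .Select(group => string.Join(" ", group.Select(pair => pair.word))
                   + " - " + labels[group.Key])
  .ToList();

Console.Write(string.Join(Environment.NewLine, lines));
```

Because the words are compared as sets, the second line reads "and bird" rather than "and a bird": the word a occurs in both strings.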
Lajos Arpad On
str2Words.Except(str1Words)

finds the words in str2Words that are not in str1Words.

str1Words.Except(str2Words)

finds the words in str1Words that are not in str2Words.

Since you need the two results separately, avoid concatenating them. Instead, use string.Join on each of them to get space-separated results, and append the "present" description that you planned for each.
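
A minimal sketch of that suggestion, reusing the strings and the splitting from the question. The names onlyInFirst and onlyInSecond are made up for this example:

```csharp
using System;
using System.Linq;

string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";

string[] str1Words = str1.ToLower().Split(' ');
string[] str2Words = str2.ToLower().Split(' ');

// Keep the two differences separate instead of concatenating them.
string onlyInFirst = string.Join(" ", str1Words.Except(str2Words));
string onlyInSecond = string.Join(" ", str2Words.Except(str1Words));

Console.WriteLine($"{onlyInFirst} - present in first string not present in second string");
Console.WriteLine($"{onlyInSecond} - not present in first string but present in second string");
```

Except is a set operation, so a word that occurs in both strings (such as a) is dropped from both results. For these inputs the second line starts with "and bird", not "and a bird".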