This seems like a very difficult problem, possibly impossible to solve without brute force testing every possibility (I think it is probably NP-hard), and I'm not even sure that is is, or can be, well-defined.
I want to write a program, but I don't need code per se, more a general algorithm (or meta-code) to help me figure out what the code should do, exactly.
So, I have a list of elements, which are all explicitly known. Let's say the fifty U.S. states. And I have an old set of sets of these elements, such as { {Alabama, Mississippi, Louisiana}, {New Hampshire, Delaware}, {Florida, California}, {Washington, Wisconsin, Wyoming}, ... } etc.
The end goal is, given a different set of sets of the same elements, rate it (for comparison with others like it), giving it a single numerical value for how alike the two sets of sets are.
The second set might be: { {Wisconsin, Wyoming, Washington}, {Florida, Delaware, Louisiana}, {Mississippi, Alabama, New Hampshire}, {California} }.
Order doesn't matter, either within the sets or the sets of sets.
Some further specifications are:
- There will always be the same number of [non-empty] sets.
- The sets themselves can contain any number of elements greater than zero - obviously not too high that that would force other sets to be empty.
- The same elements are always present, never any missing.
I need to first line up the sets so that they best match, and then rate those individual matches, and produce a total matching value for the entire set of sets.
So, in the second listing, {Mississippi, Alabama, New Hampshire} presumably should line up with "Alabama, Mississippi, Louisiana" from the original listing, and get a score of 2/3 or something...
Whereas the set {Wisconsin, Wyoming, Washington} would definitely line up with the identical set {Washington, Wisconsin, Wyoming} from the original listing, and get a score of 1, the highest value possible.
And then those scores should be - added? multiplied? to get a total score for the set of sets. (If added, then the maximum possible value would be the constant number of sets. If multiplied, then the maximum possible value would be 1. I'm not sure which is better; I could use both.)
Any help? Please???
I am stuck, and don't know where to start.