SpamAssassin rules explanation


I have a small problem with SpamAssassin: I can't find the documentation for its rules.

For example, for the rule MIME_HTML_MOSTLY I have this link: https://wiki.apache.org/spamassassin/Rules/MIME_HTML_MOSTLY but the documentation there is apparently no longer available, and I could not find the new link.

Could you please help me find the new wiki link?

Thanks in advance.

There are 2 best solutions below

AChichi On BEST ANSWER

Here is what SpamAssassin support told me:

The wiki was mostly migrated to the ASF Confluence instance recently and is now at https://cwiki.apache.org/confluence/display/SPAMASSASSIN/. The old rules descriptions (which had not been maintained since v3.3) were not migrated, as they were largely outdated where they were not redundant.

I don't have a definitive reference for the decision to stop maintaining rule descriptions on the wiki, so there may be a more correct explanation out there in the heads of the people who were on the PMC at the time. However, my view is that this was the right decision because of how the default rules are managed. Rules can shift in and out of the update channel based on the automated QA process, and there is a continuous trickle of new rules, rule changes, and rule deletions coming from the development team that get integrated (or not) via RuleQA. There was never a functional process for maintaining the wiki pages for rules properly in conjunction with that continuous change process, and the descriptions were mostly not much more illuminating than the 'describe' lines in the rules files.

Adam Katz On

Not all rules are documented on the SpamAssassin wiki; there are far too many of them for that. You can get automated efficacy data for MIME_HTML_MOSTLY from the SpamAssassin Rule QA system, but not the definition.

The current definition for that rule (discounting translations) from rules/20_body_tests.cf is:

# … line 139 (quite likely to change)
body MIME_HTML_MOSTLY       eval:check_mime_multipart_ratio('0.00','0.01')
describe MIME_HTML_MOSTLY   Multipart message mostly text/html MIME
# … rules/50_scores.cf line 616 (also quite likely to change)
score MIME_HTML_MOSTLY 0.1
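
If you have the rules files on disk, the same lines can be pulled out for any rule name. What follows is only a sketch in Python, not something that ships with SpamAssassin: the helper name `find_rule_lines` is made up for illustration, and the directory you pass in is an assumption (use wherever your installation or checkout keeps its .cf files).

```python
import re
from pathlib import Path


def find_rule_lines(rules_dir, rule_name):
    """Return (file name, line number, text) for each non-comment line naming rule_name."""
    # Match the rule name as a whole word, so that MIME_HTML_MOSTLY does not
    # also match a longer name that merely starts with it.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(rule_name) + r"(?![A-Za-z0-9_])"
    )
    hits = []
    # Only the top level of rules_dir is scanned, in file-name order.
    for path in sorted(Path(rules_dir).glob("*.cf")):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                text = line.strip()
                # Skip blank lines and full-line comments.
                if text and not text.startswith("#") and pattern.search(text):
                    hits.append((path.name, number, text))
    return hits
```

Calling `find_rule_lines(rules_dir, "MIME_HTML_MOSTLY")` on a directory containing the two files quoted above should return the `body`, `describe`, and `score` lines, each with its current line number, which avoids relying on line numbers that drift.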

This is an eval rule, so you'll have to look at the Perl code to see exactly what it's doing.

In lib/Mail/SpamAssassin/Plugin/MIMEEval.pm, you'll find:

# … line 214
sub check_mime_multipart_ratio {
  my ($self, $pms, undef, $min, $max) = @_;

  $self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
  return 0 unless exists $pms->{mime_multipart_ratio};
  return ($pms->{mime_multipart_ratio} >= $min &&
      $pms->{mime_multipart_ratio} < $max);
}

# … line 491
    if (defined($text) && defined($html) && $html > 0) {
      $pms->{mime_multipart_ratio} = ($text / $html);
    }

This means the rule matches when the ratio of the text MIME part's length to the HTML MIME part's length is at least 0 and below 0.01, that is, when the text part is less than 1% of the size of the HTML part.
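
To make the bounds concrete, here is a small Python restatement of the two Perl fragments quoted above. It is a sketch for illustration only, not SpamAssassin code: the function names are invented, and the lengths in the usage note are made-up numbers.

```python
def mime_multipart_ratio(text_len, html_len):
    """Text length divided by HTML length, or None when no ratio is set."""
    # Mirrors the quoted Perl: the ratio is only stored when both lengths
    # are defined and the HTML length is greater than zero.
    if text_len is None or html_len is None or html_len <= 0:
        return None
    return text_len / html_len


def ratio_in_range(ratio, minimum, maximum):
    """True when minimum <= ratio < maximum; False when there is no ratio."""
    # Mirrors check_mime_multipart_ratio: a missing ratio never matches,
    # the lower bound is inclusive and the upper bound is exclusive.
    if ratio is None:
        return False
    return minimum <= ratio < maximum
```

With the rule's arguments of 0.00 and 0.01, a hypothetical message with a 50-byte text part and a 10000-byte HTML part has a ratio of 0.005 and matches. A 500-byte text part against the same HTML part gives 0.05 and does not match. An empty text part gives a ratio of 0, which still matches because the lower bound is inclusive.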

(Line numbers are from the current trunk repository, not a release. The code shouldn't change much, but the line numbers likely will, especially within the .cf files.)