Let's define two overlapping sets: (1) cases of criminal activity, and (being generous) activity that ought to be criminal, actually conducted or condoned under government auspices; (2) claimed cases of such activity propagated via informal channels, whether true or not, accurate or not.
We can probably agree (a) that most of (1) are not the subject of (2), and are never found out, let alone prosecuted, in large part because (2) would often be the only means available to expose them, and such claims are not taken seriously. We can probably agree (b) that most of (1), whether included in (2) or not, amount to petty corruption with no systematic intent to subvert government policy, or to harm specific groups beyond robbing them blind and suppressing evidence of it. We can probably agree (c) that identifying, stopping, and (who knows?) even prosecuting cases of (1), whether in (2) or not, ought to make for better government. We can probably agree (d) that the cases of (1) not described in (b) are both the most harmful to good government and the most vigorously defended against exposure.
Of the items in (2), we would like to identify those also in (1). If statistical properties could distinguish them reliably enough, we might be able to focus enough attention on the real cases to do something about them. One problem is that any big-enough and more-or-less accountable government operates research programs devoted (with the best of intentions, always!) to keeping (1) out of serious public attention: by misleading assertions in public media, and by exaggerating or inserting recognizably false or merely distracting details into (2). Research into how to identify cases of (1) within (2) is most easily applied to exactly such purposes. What we need are statistical tests that not only identify the (1) subset of (2), but that also resist being corrupted, and preferably expose the attempt.
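That last requirement, tests that resist contamination and expose the tampering, can be illustrated with a standard result from robust statistics: a sample mean can be dragged arbitrarily far by a handful of planted outliers, while a median largely cannot, so a large gap between the two is itself a signal that something was injected. The "credibility scores" and thresholds below are purely hypothetical illustrations of that general idea, not a proposed methodology.

```python
import random
import statistics

random.seed(0)

# Hypothetical "credibility scores" for a batch of circulating claims
# (illustration only; no real scoring method is implied).
genuine = [random.gauss(0.0, 1.0) for _ in range(200)]

# An adversary inserts a few recognizably false or distracting claims
# with extreme scores, hoping to discredit or skew the whole batch.
planted = [8.0 + random.gauss(0.0, 0.5) for _ in range(20)]
contaminated = genuine + planted

def tamper_gap(scores):
    """Gap between mean and median. The median has a 50% breakdown
    point while the mean has 0%, so injected outliers move the mean
    but not the median -- a large gap flags the contamination."""
    return abs(statistics.mean(scores) - statistics.median(scores))

print(f"clean gap:        {tamper_gap(genuine):.3f}")
print(f"contaminated gap: {tamper_gap(contaminated):.3f}")
```

The point is not this particular estimator, but the design principle: pairing a fragile statistic with a robust one turns an attempted corruption of the data into evidence of the attempt.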
(As posted on technologyreview.com)