Democratic societies are built around the principle of free and fair elections, that each citizen’s vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies certain statistical consequences for the polling results which can be used to identify election irregularities. Using a suitable data collapse, we find that vote distributions of elections with alleged fraud show a kurtosis a hundred times larger than that of normal elections. As an example we show that reported irregularities in the 2011 Duma election are indeed well explained by systematic ballot stuffing and develop a parametric model quantifying the extent to which fraudulent mechanisms are present. We show that if specific statistical properties are present in an election, the results do not represent the will of the people. We formulate a parametric test detecting these statistical properties in election results. For demonstration the model is also applied to election outcomes of several other countries.
Free and fair elections are the cornerstone of every democratic society [1]. A central characteristic of elections being free and fair is that each citizen’s vote counts equally. However, Joseph Stalin already believed that “The people who cast the votes decide nothing. The people who count them decide everything.” How can it be distinguished whether an election outcome represents the will of the people or the will of the counters?
Elections are fascinating, large-scale social experiments. A country is segmented into a usually large number of electoral districts. Each district represents a standardized experiment where each citizen articulates his/her political preference via a ballot. Despite differences in, e.g., income levels, religions, and ethnicities across the populations in these districts, outcomes of these experiments have been shown to follow certain universal statistical laws [2, 3]. Huge deviations from these expected distributions have been reported for the votes for United Russia, the winning party in the 2011 Duma election [4, 5].
In general, using an appropriate re-scaling of election data, the distributions of votes and turnout are approximately Gaussian [3]. Let $W_i$ be the number of votes for the winning party and $N_i$ the number of voters in electoral district $i$; then the logarithmic vote rate is $\nu_i = \log\frac{N_i - W_i}{W_i}$. In figure 2 we show the distribution of $\nu_i$ over all electoral districts. To first order the data from different countries collapse to a Gaussian. Clearly
the data
for
line.
Skewness and kurtosis are listed for each data set in table SII, confirming these observations quantitatively. Most strikingly, the kurtosis of the distributions for Russia (2003, 2007 and 2011) and
orders of magnitude from each other country. The only reasonable conclusion from this is that the voting results in
or
processes than other countries.
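As a minimal sketch of the rescaling discussed here, the following Python snippet computes the logarithmic vote rate, rescales it to zero mean and unit variance, and returns skewness and kurtosis. It assumes the rate is taken as log((N_i − W_i)/W_i), so that the argument of the logarithm is positive. It also assumes that districts with W_i = 0 or W_i = N_i are dropped because the logarithm is undefined there, and that kurtosis is the plain (non-excess) fourth standardized moment, which equals 3 for a Gaussian. The function names and the toy numbers are hypothetical and are not taken from the election data.

```python
import math

def log_vote_rate(winner_votes, voters):
    """Return nu_i = log((N_i - W_i) / W_i) for every district with 0 < W_i < N_i.

    Assumption: districts where the logarithm is undefined are skipped.
    """
    return [math.log((n - w) / w)
            for w, n in zip(winner_votes, voters)
            if 0 < w < n]

def rescale_and_moments(values):
    """Rescale to zero mean and unit variance; return (z, skewness, kurtosis)."""
    count = len(values)
    mean = sum(values) / count
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / count)
    z = [(x - mean) / std for x in values]
    skewness = sum(t ** 3 for t in z) / count
    kurtosis = sum(t ** 4 for t in z) / count  # non-excess: 3 for a Gaussian
    return z, skewness, kurtosis

# Toy data (hypothetical, not from any election); the last district is skipped.
winner_votes = [50, 30, 70, 100]
voters = [100, 100, 100, 100]
nu = log_vote_rate(winner_votes, voters)
z, skewness, kurtosis = rescale_and_moments(nu)
```

Note that dropping districts with W_i = N_i removes exactly the districts reporting 100% of votes for the winner, so a real analysis would need to state how such districts are treated.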
However, such distributions only reveal part of the story, and a different representation of the data becomes helpful to gain a deeper understanding. Figure 1 shows a 2-d histogram of the number of electoral districts for a given fraction of voter turnout (x-axis) and for the percentage of votes for the winning party (y-axis). Results are shown for recent parliamentary elections in
presidential
elections in the
obtained from official election homepages of the respective countries; for more details and more election results, see SOM.
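A fingerprint of this kind can be sketched as a plain 2-d count grid. The Python example below is an illustration only: it assumes per-district counts of registered voters, valid ballots and winner votes, takes the vote percentage relative to valid ballots, and caps turnout at 100% when a district reports more valid ballots than registered voters. The function name, the bin count and the toy districts are hypothetical.

```python
def election_fingerprint(electorate, valid_votes, winner_votes, bins=100):
    """Count districts per (turnout, vote share) cell.

    grid[y][x] is the number of districts whose turnout falls in bin x
    and whose winner vote share falls in bin y.
    """
    grid = [[0] * bins for _ in range(bins)]
    for n, v, w in zip(electorate, valid_votes, winner_votes):
        if n <= 0 or v <= 0:
            continue  # skip empty districts (assumption)
        turnout = min(v / n, 1.0)  # cap at 100% turnout
        share = w / v              # fraction of valid votes for the winner
        x = min(int(turnout * bins), bins - 1)
        y = min(int(share * bins), bins - 1)
        grid[y][x] += 1
    return grid

# Toy districts (hypothetical): a typical one, one with 100% turnout and
# 100% of votes, and one reporting more ballots than registered voters.
grid = election_fingerprint(
    electorate=[100, 100, 100],
    valid_votes=[50, 100, 120],
    winner_votes=[25, 100, 60],
    bins=10,
)
```

The resulting grid can be passed to any heat-map plotting routine; the upper right cell collects the districts with complete turnout and all votes for the winner.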
These figures can be interpreted as fingerprints of several processes and mechanisms leading to the overall election results. For
these fingerprints are immediately seen to differ from the other countries. In particular there is a large number of districts (thousands) with 100% turnout and at the same time 100% of votes for the winning party.
The shape of these irregularities can be understood by assuming the presence of ballot stuffing. This means that bundles of ballots with votes for one party are stuffed into the urns. Videos purportedly documenting these practices are openly available on online platforms [6–8]. In one case the urn is already filled with ballots before the elections start, e.g. [6]; in other cases members of the election commission are caught filling out ballots, e.g. [7]. In yet another case the pens in the polling stations are shown to be erasable, e.g. [8]. Are these incidents non-representative exceptions or the rule?
We develop a parametric model to quantify the extent of ballot stuffing for a given party to explain the election fingerprints in figure 1. The distributions for
and
levels of
turnout and votes, smeared towards the upper right parts of the plot. The second peak is situated in the vicinity of the 100% turnout, 100% votes point. This suggests two modes of fraud mechanisms, incremental and extreme fraud. Incremental fraud means that with a given rate ballots for one party are added to the urn and votes for other parties are replaced. This occurs within a fraction fi of electoral districts. In the election fingerprints in figure 1 these districts are shifted to the upper right. Extreme fraud corresponds to reporting nearly all votes for a single party with an almost complete voter turnout. This happens in a fraction fe of districts, which form a second cluster near 100% turnout and votes for the incumbent party.
For simplicity, in the model we assume that within each electoral district turnout and voter preferences follow a Gaussian distribution with the mean and standard deviation taken from the actual sample, see figure S2. With probability fi (fe) the incremental (extreme) fraud mechanisms are then applied. Note that if more detailed assumptions are made about possible mechanisms leading to large-scale heterogeneities in the data such as city-country differences in turnout (
(
mate of fi.
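The two fraud modes can be illustrated with a small simulation. The sketch below is not the fitted model of this paper; it is a simplified stand-in under explicitly assumed rules. Each district draws a fair turnout and vote share from Gaussians. With probability f_e (extreme) or f_i (incremental) a stuffing intensity x is drawn. A fraction x of the non-voters is then counted as votes for the winner, and the same fraction x of the votes for other parties is reassigned to the winner. The half-Gaussian form for x, the parameter sigma_inc = 0.2, and all function names are assumptions made for illustration; only the value 0.075 for the extreme fraud width is quoted from the supporting material.

```python
import random

def clip(value, lo=0.0, hi=1.0):
    return max(lo, min(hi, value))

def simulate_districts(n, a_mean, a_std, v_mean, v_std,
                       f_i=0.0, f_e=0.0,
                       sigma_inc=0.2, sigma_x=0.075, seed=0):
    """Return a list of (turnout, winner vote share) pairs, one per district."""
    rng = random.Random(seed)
    districts = []
    for _ in range(n):
        a = clip(rng.gauss(a_mean, a_std), 0.01, 1.0)  # fair turnout
        v = clip(rng.gauss(v_mean, v_std))             # fair winner share
        u = rng.random()
        if u < f_e:
            # extreme fraud: intensity close to 1 (assumed half-Gaussian)
            x = clip(1.0 - abs(rng.gauss(0.0, sigma_x)))
        elif u < f_e + f_i:
            # incremental fraud: smaller intensity (assumed half-Gaussian)
            x = clip(abs(rng.gauss(0.0, sigma_inc)))
        else:
            x = 0.0  # fair district
        # winner votes as a fraction of the electorate after stuffing
        winner = a * v + x * (1.0 - a) + x * a * (1.0 - v)
        turnout = a + x * (1.0 - a)
        districts.append((turnout, winner / turnout))
    return districts

fair = simulate_districts(2000, 0.6, 0.1, 0.5, 0.1)
rigged = simulate_districts(2000, 0.6, 0.1, 0.5, 0.1, f_e=1.0)
```

With f_i = f_e = 0 the districts stay clustered around the fair means, while raising f_e moves districts toward the corner of complete turnout and near-unanimous votes.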
Figure 3 compares the observed and modeled fingerprint plots for the winning parties in
fi = fe = 0 (fair elections) and for best fits to the data (see SOM) for fi and fe. To describe the smearing from the main peak to the upper right corner, an incremental fraud probability around fi = 0.64 is needed for the case of United
districts.
In the second peak around the 100% turnout scenario there are roughly 3,000 districts with 100% of votes for United Russia, representing an electorate of more than two million people. Best fits yield fe = 0.05, i.e. five percent of all electoral districts experience extreme fraud. A more detailed comparison of the model performance for the Russian parliamentary elections of 2003, 2007 and 2011 is found in figure S3. The fraud parameters for the
and fe = 0.01.
The scale of election irregularities can be visualized with the cumulative number of votes as a function of the turnout, figure 4. For each turnout level the total number of votes from districts with this or a lower turnout is shown. Each curve corresponds to the respective election winner in a different country. Normally these cumulative distributions level off and form a plateau from the party’s maximal vote count on. Again this is not the case for
creased extreme fraud toward the right end of the distribution (red circles).
to form a
plateau.
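The cumulative curve described here can be computed directly from district-level data. The Python sketch below sums, for each turnout level, the winner's votes over all districts with that turnout or a lower one. The function name, the grid of turnout levels and the toy numbers are hypothetical.

```python
def cumulative_votes_by_turnout(turnouts, winner_votes, steps=100):
    """Return (levels, totals): totals[k] is the winner's vote count summed
    over all districts whose turnout is at most levels[k]."""
    levels = [k / steps for k in range(steps + 1)]
    pairs = sorted(zip(turnouts, winner_votes))
    totals = []
    running = 0
    idx = 0
    for level in levels:
        # add every district not yet counted whose turnout is <= this level
        while idx < len(pairs) and pairs[idx][0] <= level:
            running += pairs[idx][1]
            idx += 1
        totals.append(running)
    return levels, totals

# Toy example (hypothetical): three districts at 50%, 75% and 100% turnout.
levels, totals = cumulative_votes_by_turnout(
    turnouts=[0.5, 0.75, 1.0],
    winner_votes=[10, 20, 30],
    steps=4,
)
```

A curve that flattens well before complete turnout corresponds to the plateau discussed above, whereas a jump in the last few turnout levels corresponds to the boost attributed to extreme fraud.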
It is imperative to emphasize that the shape of the fingerprints in figure 1 will deviate from pure 2-d Gaussian distributions due to non-fraudulent mechanisms, such as heterogeneities in the population or voter mobilization, see SOM. However, these can under no circumstances explain the mode of extreme fraud. A bad forgery is the ultimate insult¹.
It can be said with near certainty that an election does not represent the will of the people if a substantial fraction (fe) of districts reports a 100% turnout with almost all votes for a single party, and/or if any significant deviations from the sigmoid form in the cumulative distribution of votes versus turnout are observed. Another indicator of systematic fraudulent or irregular voting behavior is a kurtosis of the logarithmic vote rate distribution of the order of several hundred.
Should such signals be detected it is tempting to invoke G.B. Shaw, who held that “[d]emocracy is a form of government that substitutes election by the incompetent many for appointment by the corrupt few.”
FIG. 1. Election fingerprints: 2-d histograms of the number of electoral districts for a given voter turnout (x-axis) and the percentage of votes (y-axis) for the winning party (or candidate) in recent elections from eight different countries (from left to right, top to bottom:
represents the number of electoral districts. Districts usually cluster around a given turnout and voting level. In
and
region of the plots, reaching a second peak at 100% turnout and 100% of votes (red circles). In
is smeared out into two directions (indicative of voter mobilization due to controversies surrounding the True Finns). In the
rural and urban areas (see SOM).
FIG.
tions in different countries is to present the distributions of the logarithmic vote rates νi of the winning parties as rescaled distributions with zero mean and unit variance [3]. Large deviations from other countries can be seen for
FIG. 3. Comparison of observed and modeled 2-d histograms for (top to bottom)
left column shows the actual election fingerprints, the middle column shows a fit with the fraud model. The column to the right shows the expected model outcome of fair elections (i.e. absence of fraudulent mechanisms, fi = fe = 0). For
The results for
the model assuming a large number of fraudulent districts.
FIG. 4. The ballot stuffing mechanism can be visualized by considering the cumulative number of votes as a function of turnout. Each country’s election winner is represented by a curve which typically takes the shape of a sigmoid function reaching a plateau. In contrast to the other countries,
and
show a pronounced increase (boost) close to complete turnout. Both irregularities are indicative of the two ballot stuffing modes being present.
SUPPORTING ONLINE MATERIAL
The data
Descriptive statistics and official sources of the election results are shown in table SI. The raw data will be made available for download at http://www.complex-systems.meduniwien.ac.at/. Each data set reports election results of parliamentary
(
or
presidential (
level. In the rare circumstances where electoral districts report more valid ballots than registered voters, we work with a turnout of 100%. With the exception of the
data, each country reports the number of registered voters and valid ballots for each party and district. For the
population on district level, which was estimated to be the same as the population above 18 years, available at http://census.gov. Fingerprints for the 2000 US presidential elections are shown in figure S1 for both candidates for districts from the entire
only. There are no irregularities discernible.
Model
A country is separated into $n$ electoral districts $i$, each having an electorate of $N_i$ people and in total $V_i$ valid votes. The fraction of valid votes for the winning party in district $i$ is denoted $v_i$. The average turnout over all districts, $\bar{a}$, is given by $\bar{a} = \frac{1}{n}\sum_i (V_i/N_i)$ with standard deviation $s_a$; the mean fraction of votes $\bar{v}$ for the winning party is $\bar{v} = \frac{1}{n}\sum_i v_i$ with standard deviation $s_v$. The mean values $\bar{a}$ and $\bar{v}$ are typically close to but not identical to the values which maximize the empirical distribution function of turnout and votes over all districts. Let $v$ be the vote fraction at which the empirical distribution function assumes its (first local) maximum (rounded to whole percent), see figure S2. Similarly $a$ is the turnout where the empirical distribution function of turnouts $a_i$ takes its (first local) maximum. The distributions for turnout and votes are extremely skewed to the right for
standard deviations in these countries, see table SI. To account for this a ’left-sided’ (’right-sided’) mean deviation $\sigma_v^L$ ($\sigma_v^R$) from $v$ is introduced. $\sigma_v^R$ can be regarded as the incremental fraud width, a measurable parameter quantifying how intense the vote stuffing is. This contributes to the ’smearing out’ of the main peaks in the election fingerprints, see figure
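larger $\sigma_v^R$, the more inflated the vote results due to urn stuffing, in contrast to $\sigma_v^L$ which quantifies the scatter of the voters’ actual preferences. They can be estimated from the data by
$$\sigma_v^L = \sqrt{\left\langle (v_i - v)^2 \right\rangle_{v_i < v}} , \qquad (1)$$
$$\sigma_v^R = \sqrt{\left\langle (v_i - v)^2 \right\rangle_{v_i > v}} . \qquad (2)$$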
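Estimators (1) and (2) are root mean square deviations from the peak position $v$, taken separately over districts below and above $v$. The Python sketch below illustrates this on toy data. As a simplification it locates $v$ as the most populated whole-percent bin (the global maximum, with ties resolved toward the smaller value) instead of the first local maximum; the function names and the numbers are hypothetical.

```python
import math
from collections import Counter

def peak_position(shares):
    """Most frequent vote share after rounding to whole percent
    (simplification: global maximum instead of the first local maximum)."""
    counts = Counter(round(100 * s) for s in shares)
    percent = max(counts, key=lambda p: (counts[p], -p))
    return percent / 100

def one_sided_widths(shares, v):
    """Return (sigma_L, sigma_R): RMS deviation from v over shares below / above v."""
    left = [(s - v) ** 2 for s in shares if s < v]
    right = [(s - v) ** 2 for s in shares if s > v]
    sigma_left = math.sqrt(sum(left) / len(left)) if left else 0.0
    sigma_right = math.sqrt(sum(right) / len(right)) if right else 0.0
    return sigma_left, sigma_right

# Toy vote shares (hypothetical): a peak at 50% and a long right tail.
shares = [0.4, 0.5, 0.5, 0.5, 0.6, 0.9]
v = peak_position(shares)
sigma_left, sigma_right = one_sided_widths(shares, v)
```

In this toy sample the right-sided width exceeds the left-sided one because of the single district at 90%, which is the kind of asymmetry the right-sided deviation is meant to capture.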
Similarly the extreme fraud width $\sigma_x$ can be estimated, i.e. the width of the peak around 100% votes. We found that $\sigma_x = 0.075$ describes all encountered vote distribu-
FIG. 1. Turnout against percentage of votes for Bush (left column) and Gore (right) in the 2000 US presidential elections. Results are shown for all districts in the
for districts from
fraudulent mechanisms discernible in the fingerprints.
FIG.
function
shows how $v$, $\sigma_v^L$, $\sigma_v^R$ and $\sigma_x$ are derived from the election results. $v$ is the position of the maximum of the distribution function. $\sigma_v^L$ measures the distribution width of values to the left of $v$, i.e. smaller than $v$. The incremental fraud width $\sigma_v^R$ measures the distribution width of values to the right of $v$, i.e. larger than $v$. The extreme fraud width $\sigma_x$ is the width of the peak at 100% votes.