Dataset Review

Free and Fair Elections: A New Database, by Sylvia Bishop and Anke Hoeffler

Review by Bryce Williams-Tuggle,
Vanderbilt University

A new, theoretically driven dataset on election quality with coverage from 1975-2012, including approximately 180 African elections.

Bishop and Hoeffler (2016) have made a sizable contribution to the literature by compiling an original dataset that evaluates the quality of elections around the world from 1975 to 2012. They motivate their research endeavor by stating that “the holding of elections has become universal but only about half of all elections are free and fair…[Nevertheless,] existing datasets either provide information on election quality for a large number of elections but offer little detail, or they provide very detailed information for a small number of elections.” They see this as problematic and thus seek to provide a richer data source on election quality around the world than was previously available to researchers.

To this end, their data are primarily composed of ten new indicators: seven related to pre-election events and three related to election-day or post-election events. While their research agenda is global in scope, their data include approximately 180 elections from across Africa for which they have coverage on at least some of their indicators of election quality.1 The data are freely available for academic research from the Centre for the Study of African Economies (CSAE) at the University of Oxford through its website.

Perhaps one of the most interesting aspects of their data is working out how best to use them. For their part, Bishop and Hoeffler (2016) perform some preliminary analyses to determine how things such as aid/trade flows, international observers, election type, democratization timing, and constitutional constraints affect whether elections are free, fair, or jointly free and fair. While these analyses are compelling, I think the authors don’t make enough of what their data can offer to researchers.

First and foremost, their classification scheme for grouping their electoral quality indicators is self-consciously theoretically driven, but I wonder if this is not problematic. Specifically, they group their first seven indicators2 as those related to whether or not an election is “free” in nature, and their final three indicators3 as those related to whether or not an election is “fair.” For each indicator, there is a series of criteria that determine whether an election is positively evaluated on that measure, and they code each indicator dichotomously: election observations that violate none of the selection criteria are coded as a 1, and those that violate any are coded as a 0. From there they code elections as ‘free’ if they receive a score of 1 for at least four of the seven related indicators, and as ‘fair’ if they receive a score of 1 for at least two of the three related indicators.
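The thresholds just described can be illustrated with a minimal Python sketch. The function names, the ordering of the indicators, and the treatment of missing values as "not fulfilled" are assumptions made for illustration; this is not the authors' own code.

```python
from typing import Dict, Optional, Sequence


def count_met(scores: Sequence[Optional[int]]) -> int:
    """Count indicators coded 1; missing values (None) add nothing."""
    return sum(1 for s in scores if s == 1)


def classify(indicators: Sequence[Optional[int]]) -> Dict[str, bool]:
    """Apply the 4-of-7 ('free') and 2-of-3 ('fair') thresholds.

    Assumed layout: positions 0-6 hold the seven 'free' indicators,
    positions 7-9 hold the three 'fair' indicators.
    """
    if len(indicators) != 10:
        raise ValueError("expected ten indicators")
    free_scores, fair_scores = indicators[:7], indicators[7:]
    return {
        "free": count_met(free_scores) >= 4,
        "fair": count_met(fair_scores) >= 2,
    }


# Hypothetical election: four of seven 'free' and two of three 'fair' indicators met.
example = classify([1, 1, 0, 1, 1, 0, 0, 1, 0, 1])
```

Under this sketch the hypothetical election is classified as both free and fair, since it just reaches each threshold.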

From this starting point, I think there are a few issues to note. First, they construct their free and fair measures from a count of the indicators on which an election fulfills the criteria, but this count includes observations with too much missing data to be able to reach the threshold. That is, while the counts exclude observations with missing data on all of the indicators, they include observations with missing data on up to nine of the ten indicators. An observation can therefore be coded as ‘unfree’ or ‘unfair’ if it has a score of 0 on one of the relevant indicators and missing data on all of the others.

While this might be sensible if missingness reflects something theoretically relevant about the elections in question, whether or not this is the case is not entirely clear. To improve upon this, I think the ‘unfree’ and ‘unfair’ measures should be limited to those observations for which there are enough data to at least theoretically fulfill the selection criteria. Thus, there must be data for at least four of the seven relevant indicators for the unfree measure, and for at least two of the three for the unfair measure. Constructing the measures in this way does have the cost of decreasing the number of elections covered by at least one of the two measures from 890 to 722 observations, but it has the advantage of increasing the congruence of the empirical measure with the concept.
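One way to express this restriction is sketched below in Python. The function name and the use of None for missing data are illustrative assumptions; the idea is simply that an election is left unclassified unless enough indicators are observed for the threshold to be reachable.

```python
from typing import Optional, Sequence


def classify_if_enough_data(
    scores: Sequence[Optional[int]], threshold: int
) -> Optional[bool]:
    """Return True/False only when at least `threshold` indicators are observed.

    With fewer observed indicators the election could never reach the
    threshold, so it is left unclassified (None) rather than coded as failing.
    """
    observed = [s for s in scores if s is not None]
    if len(observed) < threshold:
        return None
    return sum(observed) >= threshold


# 'Free' dimension (threshold of four): one observed 0 and six missing values.
sparse = classify_if_enough_data([0, None, None, None, None, None, None], 4)

# 'Fair' dimension (threshold of two): two observed indicators, both met.
fair = classify_if_enough_data([1, 1, None], 2)
```

Here the sparse observation stays unclassified instead of being coded as unfree, while the second observation has enough data to be coded as fair.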

This minor point aside, while I applaud Bishop and Hoeffler (2016) for their theoretical approach to the construction of their free and fair measures, it strikes me that there is room for a more inductive approach to the analysis of the data. Specifically, while it might be the case that ‘free’ and ‘fair’ represent different theoretical dimensions of electoral quality, each in need of its own analysis, it might also be the case that they are both related to one underlying dimension. If so, collapsing their rich data into a series of either-or dummies could obscure some pretty interesting variation.

To wit, as a quick first peek at whether or not these items scale together, I performed a test of average inter-item consistency on the ten measures of electoral quality included in the data, both broken down by the free and fair dimensions and on the full set of indicators. The resulting statistic for such a test, Cronbach’s α, typically ranges from 0 to 1, and reliability coefficients above .8 are generally regarded as high and indicative of items that scale together well. While the measures for the free and fair dimensions do scale together quite nicely, with reliability coefficients of .855 and .833 respectively, they actually scale better together than separately: an analysis of all ten measures together yields a reliability coefficient of .890.
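For readers unfamiliar with the statistic, the standard formula is α = k/(k-1) × (1 - Σ item variances / variance of the summed scale), where k is the number of items. The Python sketch below implements that textbook formula on made-up data. It assumes complete cases (no missing values), which may differ from how missing data were handled in the analysis reported here, and the function name is illustrative.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete cases.

    Each row is one election; each column is one item (indicator).
    Uses sample variances throughout.
    """
    if len(rows) < 2:
        raise ValueError("need at least two observations")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")
    item_vars: List[float] = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("summed scale has zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up data: four elections scored on three dichotomous items.
toy = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
alpha = cronbach_alpha(toy)
```

The toy data are purely illustrative and are not drawn from the Bishop and Hoeffler dataset.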

Some important caveats are needed here, though, as the tests performed above come with several empirical assumptions. First, the test cannot parse whether or not there are multiple underlying dimensions per se, but rather just that the items on average correlate highly with one another. Relatedly, the test also implies a linear relationship between the measures and weights them all equally in order to produce any resulting scales. To disentangle them further would require a more sophisticated analysis than is within the scope of this review, such as a principal component analysis (PCA), but these initial results hopefully indicate the necessity of such an endeavor. Further, more than a purely empirical endeavor, such an analysis of these rich data might also uncover meaningful theoretical dimensions of electoral quality that could serve to enhance future research for scholars of democracy.

Nevertheless, as a first cut of the data, I composed a scale from the full set of ten indicators that denotes the average electoral quality for each election in the data for which there were data for at least four of the ten indicators.4 The resulting electoral quality index ranges from 0 to 1, where a value of 0 denotes an extremely low-quality election and a score of 1 a high-quality election. Table 3 shows the average electoral quality of Africa as a whole as well as several specific nations in the data, along with the number of elections for which there are enough data to compute the scale described above. Additionally, Figure 1 conveys similar information graphically by showing the average electoral quality in Africa across time as well as the number of elections within each year that compose that average. As the figure conveys, there is a trend toward more data availability/elections across the continent, but this increased sample size troublingly appears to coincide with generally lower-quality elections.
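The construction of this index can be sketched in a few lines of Python. The function name and the use of None for missing indicators are assumptions for illustration; the logic follows the description above, averaging whichever indicators are observed, provided at least four are.

```python
from typing import Optional, Sequence


def quality_index(
    indicators: Sequence[Optional[int]], min_observed: int = 4
) -> Optional[float]:
    """Mean of the observed 0/1 indicators, or None if too few are observed."""
    observed = [s for s in indicators if s is not None]
    if len(observed) < min_observed:
        return None
    return sum(observed) / len(observed)


# Hypothetical election with four observed indicators, two of them met.
partial = quality_index([1, 1, 0, 0, None, None, None, None, None, None])

# Hypothetical election with only three observed indicators: no index computed.
too_sparse = quality_index([1, 1, 1, None, None, None, None, None, None, None])
```

The first hypothetical election receives an index of 0.5; the second is excluded because it falls below the four-indicator minimum.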

As a final note, the data in their current state, as briefly mentioned above, are a series of indicator variables composed from a complex set of selection criteria. For each of the indicators, there are 3-5 selection criteria that a given election must meet in order to be coded as having fulfilled that indicator.

While these indicators are useful, I wonder if it might not be more interesting to have access to finer-grained data that allow for ordinality in each of the indicators. That is, for each of their ten indicators, does a given election fulfill none, some, or all of the selection criteria? Especially if these data are going to be used to create more sophisticated scales of electoral quality, that richness would likely prove quite useful.
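As an illustration of what such an ordinal coding might look like, the following Python sketch maps a set of criterion-level judgments to a three-level score. It is hypothetical throughout: the function name and the 0/1/2 coding are invented here, and it presumes criterion-level information that the released data do not appear to include.

```python
from typing import Sequence


def ordinal_score(criteria_met: Sequence[bool]) -> int:
    """Return 0 if no criteria are met, 1 if some are, 2 if all are."""
    if not criteria_met:
        raise ValueError("at least one criterion is required")
    if all(criteria_met):
        return 2
    if any(criteria_met):
        return 1
    return 0


# Hypothetical indicator with four criteria, three of which are met.
# Under the existing dichotomous coding this election would score 0.
some = ordinal_score([True, True, True, False])
```

The example election meets three of four criteria and so scores 1 ("some"), a distinction that the dichotomous coding cannot register.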

In sum, then, the original dataset compiled from primary source documents by Bishop and Hoeffler (2016) represents a significant contribution to the academic community and merits further exploration. First and foremost, I think the data might be improved by backing out their constituent parts to create a series of ordinal indicators of election quality that can then be interrogated to see how well, and in what ways, they scale together. From there, they can be easily paired with a variety of other publicly available data to explore both the determinants and downstream consequences of low/high electoral integrity.


  1. Specifically, they have data on at least four of their ten indicators in the following countries (number of elections in parentheses): Algeria (8); Benin (5); Botswana (7); Burkina Faso (3); Burundi (3); Cameroon (4); CAR (4); Chad (3); Comoros (3); DRC (2); Congo (3); Cote d’Ivoire (4); Djibouti (4); Egypt (2); Equatorial Guinea (3); Gabon (2); The Gambia (6); Ghana (5); Guinea (4); Guinea-Bissau (3); Kenya (4); Lesotho (5); Liberia (4); Malawi (4); Mali (4); Mauritania (5); Mauritius (8); Morocco (1); Mozambique (3); Namibia (4); Niger (4); Nigeria (7); Rwanda (2); Senegal (7); Seychelles (3); Sierra Leone (3); South Africa (8); Sudan (3); Tanzania (4); Togo (5); Tunisia (3); Uganda (4); Zambia (6); Zimbabwe (4)

  2. Issues related to a state’s legal framework, electoral management bodies, electoral rights, voter register, ballot access, campaign process, and media access.

  3. Those pertaining to a state’s voting process, role of election officials, and counting of votes.

  4. That is, if there were four indicators with data for a given observation, the average would be composed of those four. Likewise, if there were data for five indicators for a given observation, the average would be composed of those five, and so on up to observations that had data for all ten indicators.


Variables Included in the Data & Their Selection Criteria:

  • Variable 1 – Legal framework:
    • Citizens are constitutionally guaranteed the right to vote.
    • Citizens are constitutionally guaranteed the right to run for office.
    • Laws governing the electoral process are not changed just before the election.
    • Elections are held at regular intervals.
  • Variable 2 – Electoral management bodies (EMB):
    • Election boundaries are set so that no candidate/party is favoured (no gerrymandering) (de facto).
    • EMBs are held accountable to election law and abide by it.
    • EMBs are independent and impartial.
    • EMBs have sufficient time to organize elections (i.e. no snap election).
    • Decisions made by and complaints made to the EMBs are subject to review and possible reversal.
  • Variable 3 – Electoral rights:
    • Equal suffrage is in place for citizens of voting age (e.g. no voter group is systematically disadvantaged) (de facto).
    • Equal and effective access to polling stations is in place.
    • Any limitations on voting are based on internationally recognizable and acceptable norms.
    • Voters have been informed effectively about how and where to vote.
  • Variable 4 – Voter register:
    • Voter registers are up to date for the election taking place.
    • Voter registers are accurate: without false names, lack of correct names of individuals, inclusion of name of non-eligible voters (e.g. the dead or children) and multiple entries.
    • Voters are able to easily and effectively register to vote and can meet the necessary requirements on time.
  • Variable 5 – Ballot access:
    • Citizens eligible to stand are able to compete in the election (de facto).
    • Parties/candidates get equitable treatment when applying for office.
    • Any rejections of candidature are based on internationally recognizable and acceptable norms.
    • No one candidate gets over 75% of the votes.
  • Variable 6 – Campaign process:
    • No violence, bribery, intimidation or any other inequitable treatment of voters occurs during the process.
    • No violence, bribery, intimidation or any other inequitable treatment of candidates occurs during the process.
    • Campaigns are free from government interference and the candidates are able to freely express themselves by holding rallies, etc.
    • Campaign finance: i. Prohibition on use of government resources other than that provided to all candidates; ii. Without massive financial advantages for the incumbents.
  • Variable 7 – Media access:
    • All parties/candidates are provided with access to the media.
    • All parties/candidates have equitable treatment and time on government owned media and the ruling party does not get disproportionately large media coverage in the name of news/editorial coverage.
    • Freedom of speech is preserved.
  • Variable 8 – Voting process:
    • Votes are cast by a secret ballot.
    • Voters are practically limited to one vote per person (de facto).
    • Adequate security is in place for both the voters and the ballots.
    • Balloting is done without ballot box stuffing, multiple voting, destruction of valid ballots, officer voting, or manipulation of votes cast outside the polling place.
    • Voting occurs without intervention of any agent.
  • Variable 9 – Role of officials:
    • The officials adhere to the election procedures (e.g. they have been trained adequately; know which procedures to follow; do not interfere in the voting process; file complaints made to them, etc.).
    • Unauthorized persons are barred from entering the polling station (e.g. army members).
    • No campaigning is done within the polling station.
    • Transparency is in place: all parties are able to have observers in the station.
    • International Election Observers can view all parts of the voting process.
  • Variable 10 – Counting of votes:
    • Tabulation of votes can be tracked from polling stations up through intermediate centres and to the final processing station.
    • Entire counting process is observed by more than one group.
    • No rules on what constitutes a valid ballot that favour one candidate/party.
    • No evidence for fraud in any way (e.g. no inflation of election results by polling officials, no tampering with the ballot boxes during the counting or movement, etc.).