Crowdsourcing a Democracy Index: An Update

(Part 1 of possibly several, depending on time and mood)

A couple of months ago, I set up a democracy ranking website using the Allourideas software as part of a class project to crowdsource a democracy index (which has now been completed; more on that project in an upcoming post). The site works by presenting the user with a random comparison between two countries and asking them to vote on which of these countries was more democratic in 2010.

The 100 or so students in my class started the ball rolling, and their responses generated an initial democracy index that had a correlation of about 0.62 with the Freedom in the World index produced by Freedom House: respectable but not great. The post describing the initial results got some links from Mark Belinsky, the Allourideas blog, and Jonathan Bernstein, which increased the number of votes substantially. In fact, as of this writing, the website has registered 4,402 (valid) votes, from about 203 different IP addresses, mostly in the USA, New Zealand, and Australia.

4,402 valid votes means at most 4,402 distinct comparisons out of 36,672 possible ordered comparisons among 192 countries (192 × 191, counting "A vs. B" and "B vs. A" separately; there are 18,336 distinct pairs if order is ignored), or at most about 12% of all possible comparisons (most comparisons have appeared only once, but a few have appeared a couple of times). How has the increase in the number of voters changed the generated index? And how does it compare with the current Freedom House index for 2010? As we shall see, the extra votes appear to have improved the crowdsourced index considerably.
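The coverage arithmetic can be checked in a few lines of Python. This is a sketch that assumes, as the figure of 36,672 implies, that showing A against B counts as a different comparison from showing B against A (192 × 191); if the order of presentation is ignored there are half as many possible pairs, and the same votes cover up to roughly twice the share.

```python
from math import comb, perm

N_COUNTRIES = 192
VALID_VOTES = 4402

ordered_pairs = perm(N_COUNTRIES, 2)    # "A vs. B" and "B vs. A" counted separately
unordered_pairs = comb(N_COUNTRIES, 2)  # each pair of countries counted once

# Each vote covers at most one new comparison, so these are upper bounds.
max_ordered_coverage = VALID_VOTES / ordered_pairs
max_unordered_coverage = VALID_VOTES / unordered_pairs

print(ordered_pairs, unordered_pairs)   # 36672 18336
print(f"{max_ordered_coverage:.1%}")    # 12.0%
print(f"{max_unordered_coverage:.1%}")  # 24.0%
```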

[Map: democracy scores generated by the "crowd", i.e., the voters in the exercise; darker is more democratic.]

[Scatterplot: crowdsourced scores compared with Freedom House scores for 2010.]

The Y axis represents the score generated by the Allourideas software: basically, the probability (expressed as a percentage) that the country would prevail in a comparison with a randomly selected country. For example, given previous votes, the Allourideas software predicts that Denmark (the highest-ranked country) has a 96% chance of prevailing in a “more democratic” comparison with another randomly selected country for 2010, whereas North Korea (the lowest-ranked country) has only a 5% chance of prevailing in this comparison. The X axis represents the sum of the Freedom House Political Rights (PR) and Civil Liberties (CL) scores for last year (from the “Freedom in the World 2011” report), reversed and shifted so that 0 is least democratic and 12 is most democratic, i.e., 14 - (PR + CL). The correlation between Freedom House and the crowdsourced index is a fairly high 0.84 (about as high as the correlation between the combined Freedom House score and the Polity2 score for 2008: 0.87). But how good is this, really? What do these scores really represent?
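A minimal sketch of the two axes, assuming votes are stored as (winner, loser) pairs. The crowd score here is a simple smoothed win rate: because the two countries in each comparison are drawn at random, the share of comparisons a country wins estimates its chance of beating a randomly selected country. This is a stand-in for illustration, not necessarily the model Allourideas itself uses; the function names and the add-one smoothing are my own assumptions. The Freedom House function reverses the ratings as 14 - (PR + CL).

```python
from collections import Counter


def crowd_scores(votes):
    """Smoothed win rate (0-100) per country from (winner, loser) vote pairs.

    Add-one smoothing (one pseudo-win, one pseudo-loss) keeps countries with
    few votes away from 0 and 100. An illustrative assumption, not
    necessarily the scoring model Allourideas uses.
    """
    wins, losses = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        losses[loser] += 1
    countries = set(wins) | set(losses)
    return {c: 100 * (wins[c] + 1) / (wins[c] + losses[c] + 2) for c in countries}


def fh_score(pr, cl):
    """Reverse and shift Freedom House ratings (each 1 = most free, 7 = least
    free) so that 0 is least democratic and 12 is most democratic."""
    if not (1 <= pr <= 7 and 1 <= cl <= 7):
        raise ValueError("PR and CL ratings must be between 1 and 7")
    return 14 - (pr + cl)


# Toy votes for illustration only (not the actual data).
votes = [("Denmark", "North Korea"), ("Denmark", "Russia"), ("Russia", "North Korea")]
scores = crowd_scores(votes)
print(scores["Denmark"], scores["Russia"], scores["North Korea"])  # 75.0 50.0 25.0
print(fh_score(1, 1), fh_score(7, 7))  # 12 0
```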

At the extremes, judgments of democracy appear to be “easy”: Freedom House and the crowd converge. For example, among countries that Freedom House classifies as “Free,” only six countries (including Benin, Israel, Mongolia, Sao Tome and Principe, and Suriname) receive a score of 40 or below from the “crowd,” the highest score received by any country that Freedom House classifies as “Not Free” (Russia). But in the middle there is a fair amount of overlap (just as with expert-coded indexes, whose high levels of correlation are driven by the “extreme” cases – clear democracies or clear dictatorships). Some of these disagreements could be attributed to the relative obscurity of some of the countries involved, given the location of the voters in this exercise (few people know much about Benin, and in any case the index got no votes from Africa), but some of the disagreements seem to have more to do with the average conceptual model used by the crowd (e.g., the case of Israel). The crowd would seem to weigh the treatment of Palestinians more heavily than Freedom House does in its (implicit) judgment of Israel’s democracy. This is unsurprising, since the website does not ask participants to stick to a particular “model” of democracy; the average model or concept of democracy to which the crowd appears to be converging seems to be slightly different from the model used by Freedom House.

We can try to figure out where the crowd differs most from Freedom House by running a simple regression of Freedom House’s score on the score produced by the crowd and treating the residuals from the model as a measure of “lack of fit.” This extremely simple model accounts for about 69% of the variance in the Freedom House scores on the basis of the crowdsourced score; we can improve the fit (to 72%) by adding a measure of “uncertainty” as a control (the number of times a country appeared in an “I don’t know” event, divided by the total number of times it appeared in any comparison). What (I think) we’re doing here is basically trying to predict Freedom House’s index on the basis of the crowdsourced judgment plus a measure of the subjective uncertainty of the participants. The results are of some interest: for example, participants in the exercise appear to think that Venezuela, Honduras, and Papua New Guinea have higher levels of democracy than Freedom House thinks, and that Sierra Leone, Lithuania, Israel, Mongolia, Kuwait, Kiribati, Benin, and Mauritius have lower levels of democracy than Freedom House thinks.
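The residual analysis can be sketched without external libraries. The sketch below regresses the Freedom House score on the crowd score and the "I don't know" share by ordinary least squares (solved through the normal equations, my choice for a dependency-free example), then ranks countries by residual. A negative residual means Freedom House rates the country lower than the crowd score predicts, i.e., the crowd is more generous. The helper names, country labels, and numbers are hypothetical, invented only to check the arithmetic.

```python
def dont_know_share(dont_know_events, appearances):
    """Share of a country's appearances that ended in an "I don't know" event."""
    return dont_know_events / appearances if appearances else 0.0


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(predictors, y):
    """OLS with an intercept via the normal equations; returns (coefs, residuals, R^2)."""
    X = [[1.0, *row] for row in predictors]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    y_mean = sum(y) / len(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - sum(e * e for e in residuals) / ss_tot
    return coefs, residuals, r2


# Hypothetical toy data: (crowd score, "I don't know" share) per country, and a
# Freedom House score constructed as exactly 1 + 2*crowd + 30*share.
countries = ["A", "B", "C", "D", "E"]
predictors = [(0, 0.1), (1, 0.0), (2, 0.2), (3, 0.1), (4, 0.3)]
fh = [4, 3, 11, 10, 18]
coefs, residuals, r2 = ols(predictors, fh)

# Most negative residual first: where the crowd is more generous than Freedom House.
ranked = sorted(zip(countries, residuals), key=lambda t: t[1])
```

Because the toy Freedom House scores are an exact linear function of the predictors, the fit recovers coefficients of about (1, 2, 30) with residuals near zero; with real data the interesting part is the ordering in `ranked`.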

A more interesting test, however, would be to do what Pemstein, Meserve, and Melton do with existing measures of democracy. Their work treats existing indexes of democracy as (noisy) measurements of the true level of democracy and attempts to estimate their error bounds by aggregating their information in a specific way. I might try to do this later (I need to learn to use their software, and might only have time in a few weeks), though it is worth noting that a simple correlation of the crowdsourced score for 2010 with the “Unified Democracy Scores” Pemstein et al. produce for 2008 by aggregating the information from all available indexes is an amazing 0.87, and a simple regression of one on the other has an R² of 0.76. So the crowdsourced index seems to be doing something much like what the Unified Democracy Scores are doing: averaging different models of democracy and different "perspectives" on each country.
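The step from a correlation of 0.87 to an R² of 0.76 is no coincidence: in a regression with a single predictor, R² equals the square of the Pearson correlation (0.87² ≈ 0.757). A quick numerical check on invented data:

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simple_r2(xs, ys):
    """R^2 of the least-squares fit y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented data
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson(xs, ys)
print(round(r ** 2, 6) == round(simple_r2(xs, ys), 6))  # True
print(round(0.87 ** 2, 3))  # 0.757
```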

This all assumes, however, that there is something to be measured – a true level of democracy, which is only loosely captured by existing models. On this view, existing indexes of democracy reflect different interpretations of the concept of democracy, plus some noise due to imperfect information and the vagaries of judgment; they each involve a “fixed” bias due to potential misinterpretation of the concept, plus the uncertainty involved in trying to apply the concept to a messy reality whose features are not always easy to discern (try figuring out the level of civil rights violations in the Central African Republic compared with Peru in 2010, quick!). The crowdsourced index actually goes further and averages the different interpretations of democracy of every participant, just as the Unified Democracy Scores aggregate the different “models” of democracy used by different existing indexes. To the extent that the crowd’s models converge to the true model of democracy, the crowdsourced index should also eliminate that “bias” due to misinterpretation. But it is not clear that there is a true model, or that the crowd would converge to it even if one existed: the crowdsourced index may have a higher bias (total amount of misinterpretation of the concept) than the indexes created by professional organizations. (And this conceptual bias might shift if more people from other countries voted; I’d really love to get more votes from Africa and Asia.)
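The measurement story above can be made concrete with a toy simulation, with every parameter invented: each rater reports the true level of democracy plus a conceptual bias shared by the whole crowd, a conceptual bias of their own, and random judgment noise. Averaging many raters washes out the individual parts but leaves the shared bias untouched, which is exactly the worry about a crowd that converges on the "wrong" model.

```python
import random


def simulate_crowd(true_level, shared_bias, n_raters, seed=0):
    """Average of n_raters noisy ratings of a single country.

    Each rating = true level + shared (crowd-wide) conceptual bias
                  + rater-specific conceptual bias + judgment noise.
    The unit-variance Gaussian terms are illustrative assumptions.
    """
    rng = random.Random(seed)
    ratings = [
        true_level + shared_bias + rng.gauss(0, 1) + rng.gauss(0, 1)
        for _ in range(n_raters)
    ]
    return sum(ratings) / n_raters


TRUE_LEVEL = 5.0
SHARED_BIAS = 0.5
crowd_mean = simulate_crowd(TRUE_LEVEL, SHARED_BIAS, n_raters=2000)
# The error should land near the shared bias of 0.5, not near zero.
print(f"error vs. truth: {crowd_mean - TRUE_LEVEL:+.2f}")
```

Setting `SHARED_BIAS` to zero in this toy makes the crowd average converge on the true level, which corresponds to the optimistic case in which the crowd's models converge to the true model.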

Even if there is no true model of democracy, it would be interesting to “reverse-engineer” the crowd’s implicit model by trying to figure out its components. (What do people weigh most, when thinking about democracy? Violations of civil liberties? Elections? Opportunities for participation? Economic opportunities?). One could do this, I suppose, by trying to predict the crowdsourced scores from linear combinations of independently gathered measures of elections, civil liberties, etc.; some form of factor analysis might help here? My feeling is that the crowd weighs economic “outcomes” more than experts do (so that crowdsourced assessments of democracy will be correlated with perceptions of how well a country is doing, like GDP growth), but I haven’t tried to investigate that possibility.
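One simple first step toward reverse-engineering the crowd's weights is to compare standardized regression coefficients. With two components (say, hypothetical measures of elections and of civil liberties), the standardized coefficients follow directly from the three pairwise correlations. This is only a sketch: the component names and data are placeholders, and a serious attempt would need more components and a full multiple regression or factor analysis.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def standardized_weights(crowd, comp1, comp2):
    """Standardized OLS coefficients of crowd ~ comp1 + comp2.

    Uses the two-predictor identities
        beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
        beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)
    which require the components not to be perfectly correlated.
    """
    r_y1 = pearson(crowd, comp1)
    r_y2 = pearson(crowd, comp2)
    r_12 = pearson(comp1, comp2)
    denom = 1 - r_12 ** 2
    if denom <= 1e-12:
        raise ValueError("components are (almost) perfectly collinear")
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# Hypothetical component scores for five countries.
elections = [1.0, 2.0, 3.0, 4.0, 5.0]
liberties = [2.0, 1.0, 4.0, 3.0, 5.0]
# A crowd that (in this toy) looks only at elections.
crowd = [3 * e + 1 for e in elections]
w_elections, w_liberties = standardized_weights(crowd, elections, liberties)
```

Here the crowd score is built from elections alone, so the elections weight comes out at 1 and the civil-liberties weight at 0, even though the two components are themselves correlated (r = 0.8).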

It would also be interesting to repeat the exercise by asking people to stick to a particular model of democracy (e.g., Freedom House’s checklist, or the checklist developed by my students – more on that later). And it would be great if the Allourideas software had an option that allowed a voter to indicate that two countries are equal in their level of democracy (I think one could do this, but then I would have to modify the client; right now, the only way of signalling this is to click on the “I don’t know” button). Perhaps next year I will try some of these possibilities. All in all, it seems that crowdsourcing a democracy index produces reasonable results, and might produce even better results if the crowdsourcing is done with slightly more controls. (E.g., one could imagine using Amazon's "Mechanical Turk" and a specific model of democracy for generating data on particular years.) I would nevertheless be interested in thoughts/further analysis from my more statistically sophisticated readers.

In an upcoming post I will explain how my students produced an index of democracy for 2010, 1995, and 1980, and how that crowdsourced effort compares with other existing indexes. (Short version: pretty well).

Nauseating Displays of Loyalty (Towards a General Theory of Sycophancy and Related Phenomena)

An anonymous reader points me to a very interesting paper by Victor Shih on "'Nauseating' Displays of Loyalty: Monitoring the Factional Bargain through Ideological Campaigns" (Journal of Politics 2008, vol. 70(4) pp. 1177-1192 [ungated]):
Autocrats, as factional patrons, only find out the true loyalty of clients during a serious political challenge, when they are least able to enforce the factional bargain. In autocracies with norms against cults of personalities, public, exaggerated praises may constitute an alternative way for clients to signal loyalty credibly. By suffering the social cost of being despised by others, sycophants credibly signal their affinity to a particular leader, thus deterring factional rivals from recruiting them into an alternative coalition. This article develops a measure of such displays of loyalty in China through content analysis of provincial newspapers between 2000 and 2004. OLS and PCSE estimations are used to inquire whether provincial faction members were more likely to echo an ideological campaign launched by their patron. Further analysis explores whether faction members in rich and poor localities echoed the campaign in different ways. The findings suggest that ideological campaigns function as radars that allow senior leaders to discern the loyalty of faction members.
The argument here is in interesting contrast to what I was trying to say in the post on cults of personality. The problem with cults of personality is that the "signal" of loyalty the dictator gets from followers is often uninformative: if everyone says that the dictator is a god, then the dictator cannot distinguish who is loyal (who will stand by him in a crisis, or at least not rebel if given the opportunity) and who is not. Mere praise in such circumstances is "cheap talk." So the leader has an incentive to develop some ways of making praise costly if it is to serve as a signal of loyalty (where loyalty is understood as a certain level of commitment to support the dictator, or at least not to support challengers). But where can this cost come from?

In my post on cults of personality, I argued that the cost comes precisely from the very dynamics of the strategic situation: because the dictator knows that the extravagant praise is uninformative as a signal of loyalty, he demands ever more bizarre performances, and in particular demands that one denounce those who show insufficient enthusiasm for the ever more bizarre performances. To the extent that most people do find it costly to deny reality and denounce others (especially if those others are friends and family), the signals retain some information about the level of commitment of the population to support him, or at least to acquiesce in his rule (given also the costs of not praising the dictator). The level of extorted praise serves as a gauge of the effectiveness of extortion. (Especially when the extorted praise includes denunciations of others: this is what it means in practice to support the dictator, i.e., to be loyal. It has little to do with liking the dictator).

To be sure, as Bernard Guerrero notes here in an interesting response, it is possible that what happens is that you get a sort of "arms race," where ever more bizarre performances are required as old performances lose their information content (because everyone eventually does them). Yet it does not necessarily follow that the signals from the cult lose all their informational value immediately; and as many dictators well know, a cult of personality has to be constantly refreshed. Propaganda is never-ending work. Moreover, even if the cult does not work well as a gauge of support, it can still produce loyalty directly (if some fraction of those exposed to it come to believe in the leader's charisma, which increases their commitment to support him) and it can prevent coordination, so that even if people actually hate the dictator, the cult still prevents them from plotting to overthrow him because they can't gauge other people's feelings. (For a somewhat different if related take on this, emphasizing the ways in which cults implicate the population into supporting the ruler even when they do not actually believe in the leader's charisma, see Lisa Wedeen's superb piece on Syria's Hafez al-Assad and his cult of personality, also recommended to me by a reader. The anecdote that opens the piece is priceless).

Which of these functions of the cult of personality as a tactic of power (gauging loyalty, producing loyalty, and preventing coordination) is most important is a complex question, whose answer probably depends on particular features of the strategic situation facing the dictator. (I'm writing a paper on the topic, so I hope to come to more definite conclusions in the future.) I suspect, however, that the direct production of loyalty is the least important function; it seems exceedingly unlikely that calling Assad père "the World's greatest dentist," as a friend told me apparently happened in Syria in the 80s, was ever seriously intended to persuade people of his charisma. Moreover, I think (for reasons that will become clearer in a second) that cults of personality may be most useful to the dictator when he fears revolutionary threats (threats from outside the ruling elite) more than he fears coups (threats from inside the ruling elite), perhaps because he has been able to sufficiently consolidate his power at the expense of this elite. (Though there's a chicken-and-egg problem here, for the cult of personality also seems useful as a tactic to consolidate power, as appears to have happened in Mao's China and Qaddhafi's Libya.) There is, after all, a tension between the loyalty-gauging and the coordination-prevention uses of the cult, because the cult works best to prevent coordination when the costs of not praising the dictator are much higher than the costs of praising him, whereas it works best to gauge loyalty when the costs of praising him are not insignificant (though both costs could be and normally are high: not praising may entail jail or worse, but praising may entail denouncing loved ones or engaging in humiliating behavior). This means that the dictator may wish to relax the cult if he needs to gauge the loyalty of his close followers (who will help him against his people) more than he needs to prevent coordination among them.
One might add that dictators don't always need very precise knowledge of the level of loyalty of the general population (and at any rate there are often other indicators of their likely level of loyalty, like protests, informers, surveys, the level of unemployment, etc.), in which case the coordination-prevention and loyalty-production functions of the cult become more useful vis-à-vis the general population than its loyalty-gauging uses.
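The tension between gauging loyalty and preventing coordination can be illustrated with a deliberately crude toy model (my own construction for illustration, not a model from the post or from Shih): each follower has a private level of commitment `a` between 0 and 1, praising costs `c_praise` (humiliation, denouncing others), and staying silent costs `c_silence` (jail or worse). A follower praises when the payoff from praising, a - c_praise, is at least the payoff from silence, -c_silence, i.e., when a + c_silence >= c_praise.

```python
def who_praises(commitments, c_praise, c_silence):
    """Followers who praise in the toy model: a - c_praise >= -c_silence."""
    return [a for a in commitments if a + c_silence >= c_praise]


def informativeness(commitments, c_praise, c_silence):
    """Gap in mean commitment between praisers and non-praisers.

    Returns 0.0 when praise separates no one (everyone or no one praises),
    i.e., when observing praise tells the dictator nothing.
    """
    praisers = who_praises(commitments, c_praise, c_silence)
    silent = [a for a in commitments if a + c_silence < c_praise]
    if not praisers or not silent:
        return 0.0
    return sum(praisers) / len(praisers) - sum(silent) / len(silent)


followers = [0.1, 0.3, 0.5, 0.7, 0.9]  # invented commitment levels

# Harsh punishment for silence, cheap praise: universal praise, no information.
print(len(who_praises(followers, c_praise=0.4, c_silence=1.0)))  # 5
print(informativeness(followers, c_praise=0.4, c_silence=1.0))   # 0.0

# Costly praise, mild punishment for silence: only the committed praise.
print(who_praises(followers, c_praise=0.4, c_silence=0.0))       # [0.5, 0.7, 0.9]
```

In this toy, praise tells the dictator something only when c_praise - c_silence falls inside the range of commitments; making silence very costly maximizes conformity (and so hampers coordination) at the price of making praise uninformative.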

Shih's paper nevertheless helps us understand how mechanisms similar to the cult of personality can help autocratic leaders gauge the loyalty of their close followers (not so much of the population as a whole). His focus is on the "ideological campaigns" that one sees in many communist countries, and especially in China, such as the "Three Represents" campaign during Jiang Zemin's tenure (opening the party to businessmen), or the "Harmonious Society" campaign that is still going on. Such campaigns typically present the thoughts of some particular leader as some momentous and utterly brilliant contribution to philosophy, and they constitute a standing invitation to sycophants, who say things like this:
‘Comrade Jiang Zemin’s thought concerning the "Three Represents" is like a giant building that overlooks the whole situation and contains rich content and deep meanings. It is a creative usage and development of Marxist theory and is strongly theoretical, scientific, creative, and practical.’ (Yang Yongliang, the vice-secretary of Hubei, quoted by Shih.)
But how is this sort of thing useful to leaders? The problem a leader faces here is that he needs to cultivate his supporters by paying them in various forms; but until the chips are down, he does not necessarily know who will in fact help him in such circumstances, because there are no regular opportunities to test their loyalty (like elections in democracies), and after a crisis he may not be around to punish actual disloyalty. So the leader really does need to gauge the loyalty of his clients if he fears potential revolt from below or attacks from other factions, but even extravagant praise does not reliably indicate a credible commitment to support him in times of crisis.

Shih argues that in modern (post-Mao) China extravagant praise has retained its informational value as a signal of loyalty precisely because top leaders have supported norms against cults of personality (a norm that existed before Mao consolidated his power and which was supported by the top leadership after he died as a preventive measure against attempts to concentrate power in similar ways). When there is a norm against cults of personality, the stigma of violating it (and being known as a groveller) is a sufficient cost to ensure that the "praise" really is a credible signal of loyalty to a patron, especially when there are few other options to provide credible signals of loyalty (like, e.g., providing business opportunities for the leader's family or extending extravagant "hospitality" to the leader when he comes to visit your city). The norm seems to exist not only (or perhaps not at all) to prevent concentrations of power, but because top leaders gather useful information from its violation. So leaders launch "ideological" campaigns (like the "Three Represents" campaign) in order to see who will violate the norm against cults of personality.

This is a very clever piece of research. The key fact that Shih exploits to support his thesis is the degree of variation in the extent to which ideological campaigns are echoed by party newspapers around China. In particular, he shows that during the "Three Represents" campaign, newspapers in provinces linked to Jiang Zemin's clients were much more likely to echo it than other newspapers, but only if the province's apparatchiks had few other means to signal support. So party newspapers in richer provinces (like Shanghai), whose leaders could offer Jiang other signals of support (like business opportunities for his family members or special hospitality when he came to visit), were less likely to exhibit "nauseating" displays of loyalty (the phrase comes from one of the people Shih interviewed, and reflects the anti-cult of personality norm current in today's China) than party newspapers in poorer provinces (which were more dependent on central government support), allowing Jiang to keep tabs on the loyalty of his poorer clients. And in provinces that were not linked to his faction, there were far fewer nauseating displays of loyalty. (One could quibble with a few things. For one, I am unsure how good Shih's measures of faction membership, i.e., of whether a province's leaders can be counted as part of Jiang's faction, are. But I'm no China specialist. And there is a question as to how useful those extreme displays of loyalty really are to the leader.)
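The logic of this test can be illustrated with a difference-in-differences of cell means: how much more faction-linked provinces echo the campaign than non-linked ones in poor provinces, minus the same gap in rich provinces. In a saturated two-by-two model, this contrast equals the coefficient on the faction × poor interaction in an OLS regression that includes both dummies. The numbers below are invented purely to show the computation; Shih's actual estimates come from OLS and PCSE models on his newspaper content-analysis data, which this sketch does not reproduce.

```python
from collections import defaultdict


def cell_means(rows):
    """Mean echo count for each (faction_linked, poor) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for faction_linked, poor, echoes in rows:
        sums[(faction_linked, poor)] += echoes
        counts[(faction_linked, poor)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def interaction_contrast(rows):
    """(faction minus non-faction gap in poor provinces) - (same gap in rich provinces)."""
    m = cell_means(rows)
    gap_poor = m[(True, True)] - m[(False, True)]
    gap_rich = m[(True, False)] - m[(False, False)]
    return gap_poor - gap_rich


# Invented data: (linked to Jiang's faction?, poor province?, campaign articles echoed)
rows = [
    (True, True, 10), (True, True, 12),
    (True, False, 4), (True, False, 6),
    (False, True, 3), (False, True, 5),
    (False, False, 2), (False, False, 4),
]
print(interaction_contrast(rows))  # 5.0
```

A positive contrast in this toy corresponds to the pattern Shih reports: faction membership raises echoing mainly where clients lack other ways to signal loyalty.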

The more interesting general point that comes out of these sorts of studies, for me, is how little traditional ideas about "legitimacy" matter for explaining support in all sorts of regimes. Support seems explainable in many cases as a result of signalling equilibria, whereas the traditional Weberian ideas about traditional, charismatic, and rational legitimacy seem to play little role. In fact, I have a hunch - not well developed - that one could understand what is traditionally called "legitimacy" in terms of various sorts of signalling equilibria, and not much would be lost. But that would require a much longer post to explain, and perhaps a paper.