Don’t Hate the Requester; Hate the Game: An economic dilettante’s take on Mturk ethics
Maybe they are just not very vocal, but I’ve never come across an Amazon Mechanical Turk (mturk) worker who lauds the business practices of mturk. In my experience as both a requester and a worker on mturk, the ethical problems revolve around two primary concerns: 1) workers are paid a pittance and 2) requesters have free rein to reject work without justification. The second problem is hard to address. It basically comes down to “who watches the watchers.” Tools like TurkOpticon put some power into the worker’s hands, but probably not enough. However, it also seems to be the smaller problem. People may occasionally get shafted, but in the big picture the effects seem small compared to the low payment rate (not to say it doesn’t suck, though).
The low payment rate is an interesting moral question because it is somewhat unique, at least to those involved. Opponents argue that a sizable percentage of workers use mturk as a primary or secondary income, and they should be earning a living wage. Proponents argue that mturk was not meant to be used that way. Unlike the Walmart wage debate, no requester is trying to hire a worker, even part-time. They would say they are merely putting a task out there. If it pays too little then no one will accept it; forcing a higher payment rate would simply mean requesters would go elsewhere to get their data. Everyone loses. This reasoning is not infallible, and the rhetorical arguments can go on forever.
Instead, I have been thinking more about the economic argument. I’ve covered about 2 weeks’ worth of game theory in my economic principles of marketing course (essentially a whirlwind microecon 1 and 2 course covered in like 5 weeks). So naturally, I am now an expert ready to analyze important, broad, real-world topics. (Side note: if you didn’t catch that, I’m not an expert and don’t claim that what I write below is at all the way a world-class economist would analyze the situation… hell, I may have even made a wrong assumption. If you can do better, join the discussion.)
At the most shallow level, mturk, like all employee-employer relationships, is like a Prisoners’ Dilemma game. The incentives in place push workers to not work hard, and push employers to pay low wages. In the table below, workers can choose to work or loaf (i.e. give junk work), and requesters choose to pay a high wage or a low wage. Each party chooses a strategy at the same time, with perfect knowledge of the game. By perfect knowledge, I mean that both the employer and employee know the payoffs to each party. Thus, each can work out what the other person will do given the strategy they pick.
| |Work|Loaf|
|---|---|---|
|High wage|v-h, h-a|-h, h|
|Low wage|v-l, l-a|-l, l|
The first amount in each cell is the outcome for the requester; the amount after the comma is the outcome for the worker. v=value of work, h=high wage, l=low wage, a=effort of working.
The equilibrium of this game (i.e. the point where neither player has an incentive to deviate from that strategy) is for the worker to loaf and the requester to pay a low wage. This is not how mturk operates, but it provides a baseline to analyze against. The low wages seem to exist, but the loafing does not. If it did, no one would use it for data purposes. However, these are the things people worry about because, without checks, the prisoners’ dilemma, where everyone is worse off, would occur.
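The equilibrium claim above can be checked by brute force. Below is a minimal sketch that enumerates the four cells of the first table and keeps those where neither side gains by switching on their own. The numeric values for v, h, l, and a are made up for illustration (they are my assumption, not figures from mturk), and `pure_nash` is a hypothetical helper name.

```python
from itertools import product

# Illustrative numbers only (assumptions, not from the post): v > h > l > a > 0.
v, h, l, a = 10.0, 4.0, 1.0, 0.5

requester_moves = ["high", "low"]
worker_moves = ["work", "loaf"]

# payoffs[(requester_move, worker_move)] = (requester_payoff, worker_payoff)
payoffs = {
    ("high", "work"): (v - h, h - a),
    ("high", "loaf"): (-h, h),
    ("low", "work"): (v - l, l - a),
    ("low", "loaf"): (-l, l),
}


def pure_nash(payoffs, requester_moves, worker_moves):
    """Return the cells where neither player gains by deviating unilaterally."""
    equilibria = []
    for r, w in product(requester_moves, worker_moves):
        req_pay, wrk_pay = payoffs[(r, w)]
        # Requester compares against other rows in the same column.
        req_ok = all(payoffs[(r2, w)][0] <= req_pay for r2 in requester_moves)
        # Worker compares against other columns in the same row.
        wrk_ok = all(payoffs[(r, w2)][1] <= wrk_pay for w2 in worker_moves)
        if req_ok and wrk_ok:
            equilibria.append((r, w))
    return equilibria


print(pure_nash(payoffs, requester_moves, worker_moves))
```

With these particular numbers the only surviving cell is low wage and loaf, which matches the baseline described above.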
Something that differentiates mturk is the ability for requesters to reject workers’ submissions and not pay them. Forgetting that requesters can be evil and reject everyone for no reason, we’ll assume that requesters have sufficient reason and have to put some effort into checking. In this case, workers can still work or loaf, but requesters can pay high and check, pay high and not check, pay low and check, or pay low and not check. Additionally, loafing has costs as well (it takes some time to complete a HIT with bad work, but not as much as it takes to complete a HIT with good work; hence d < a). These costs were not included before because they did not affect the equilibrium. Last, checking has costs as well (e.g. time taken to add an attention filter to a survey or get correct answers to compare to, research validity costs of making workers think you are watching them, time taken to actually check and reject bad work, etc.). This cost is reflected in the cells where the requester checks, whether or not the worker loafed.
| |Work|Loaf|
|---|---|---|
|High wage, check|v-h-c, h-a|-c, -d|
|High wage, not check|v-h, h-a|-h, h-d|
|Low wage, check|v-l-c, l-a|-c, -d|
|Low wage, not check|v-l, l-a|-l, l-d|
Outcomes are in the same order. v, h, l, and a are the same. c=cost of checking, d=cost of loafing; d<a, c<v
Here there is no pure strategy Nash equilibrium (i.e. at no point in the table do both sides have no incentive to choose a different strategy). If the worker works, the requester’s strategy is to pay low and not check (that cell has the highest payoff to the requester). However, if the requester is going to do that (which the worker is aware of given perfect knowledge), then the worker would loaf since they do less work and get the same amount of money. Knowing that is the case, the requester would check (as long as checking costs less than the wage it saves), thus not paying the worker. Knowing the requester would check, the worker would work. Finally, knowing that, the requester wouldn’t check anymore. This unending cycle means each party’s strategy is chosen probabilistically (i.e. a mixed strategy Nash equilibrium) to maximize payoffs given beliefs about how likely an actor is to choose a particular strategy. This explains why some workers loaf and some work very hard. The loafers are playing a probability that they will be paid even though they did poor work. That probability exists because the requester is also playing a probability. Instead of checking work for each HIT, they only check some. Thus they save on the costs of checking, and still provide some incentive to give good work. It is possible to work out these probabilities as a function of the costs and payments involved, but instead of going through all that math, I will make two reasonable assumptions to change the game to make it slightly easier to solve.
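For anyone curious what the skipped math might look like, here is a rough sketch. It first confirms by enumeration that no cell of the second table is a pure strategy equilibrium, then computes mixing probabilities from two indifference conditions. The numbers are invented for illustration, and the formulas are my own working under an extra assumption: the requester only mixes between the two low-wage rows, since in this table a high-wage row never pays the requester more than its low-wage counterpart. It also assumes c < l and l > a - d. The function names are hypothetical.

```python
from itertools import product

# Illustrative numbers only (assumptions, not from the post).
# They satisfy d < a, c < v, c < l and l > a - d.
v, h, l = 10.0, 4.0, 2.0   # value of work, high wage, low wage
a, d, c = 1.0, 0.5, 0.5    # effort of working, cost of loafing, cost of checking

requester_moves = ["high, check", "high, not check", "low, check", "low, not check"]
worker_moves = ["work", "loaf"]

# payoffs[(requester_move, worker_move)] = (requester_payoff, worker_payoff)
payoffs = {
    ("high, check", "work"): (v - h - c, h - a),
    ("high, check", "loaf"): (-c, -d),
    ("high, not check", "work"): (v - h, h - a),
    ("high, not check", "loaf"): (-h, h - d),
    ("low, check", "work"): (v - l - c, l - a),
    ("low, check", "loaf"): (-c, -d),
    ("low, not check", "work"): (v - l, l - a),
    ("low, not check", "loaf"): (-l, l - d),
}


def pure_nash(payoffs, requester_moves, worker_moves):
    """Return the cells where neither player gains by deviating unilaterally."""
    equilibria = []
    for r, w in product(requester_moves, worker_moves):
        req_pay, wrk_pay = payoffs[(r, w)]
        req_ok = all(payoffs[(r2, w)][0] <= req_pay for r2 in requester_moves)
        wrk_ok = all(payoffs[(r, w2)][1] <= wrk_pay for w2 in worker_moves)
        if req_ok and wrk_ok:
            equilibria.append((r, w))
    return equilibria


def mixed_equilibrium(l, a, d, c):
    """Mixing probabilities, assuming the requester mixes only over low-wage rows.

    Worker is indifferent when l - a == (1 - q) * l - d, so q = (a - d) / l.
    Requester is indifferent when c == (1 - p) * l, so p = 1 - c / l.
    """
    q_check = (a - d) / l   # probability the requester checks
    p_work = 1 - c / l      # probability the worker works
    return p_work, q_check


def expected_requester_payoff(move, p_work):
    """Requester's expected payoff for one row when the worker works with p_work."""
    work_pay = payoffs[(move, "work")][0]
    loaf_pay = payoffs[(move, "loaf")][0]
    return p_work * work_pay + (1 - p_work) * loaf_pay


print(pure_nash(payoffs, requester_moves, worker_moves))
p_work, q_check = mixed_equilibrium(l, a, d, c)
print(p_work, q_check)
for move in requester_moves:
    print(move, expected_requester_payoff(move, p_work))
```

With these made-up numbers the enumeration comes back empty, the worker works 75% of the time, and the requester checks 25% of the time. The two low-wage rows then give the requester the same expected payoff, and both high-wage rows give less, which is why the sketch ignores them when mixing.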
First, the cost of checking is rather low, especially for research experiments. Adding an attention filter requires little effort, and most people have a research assistant updating credit (thus there is no time component). Treating c as zero means the check and not check rows in each wage level pay the same to the requester when the worker works (v-h or v-l). Second, the cost of not working is sometimes as high as the cost of working. The amount of time saved is often negligible, and some people (those not doing it as a primary or secondary income) find the cognitive challenge of some research studies stimulating (that is, the studies that don’t kill you with boredom), making the cost of doing good work less. Thus if we treat a and d as equal, they can also be removed from the columns. This last assumption is less believable, but it also may not affect the equilibrium in a meaningful way.
After these assumptions are made, a pure strategy equilibrium exists (i.e. one of the cells is clearly preferred, and people do not choose probabilistically anymore). That strategy is for workers to work and for requesters to pay a low wage and check. This equilibrium somewhat reflects the state of mturk. Most workers do good work. Most requesters check to make sure they do, and 99% of HITs pay below the minimum wage if computed at an hourly rate. One interesting point is that after the assumptions stated above are made, the low wage-check and low wage-no check cells are identical when the worker works. This means that while all the actors can assume requesters will check, requesters may not actually check, and only they would know (since information about checking is only known to workers if they do not work). After talking with lots of other researchers who use mturk, this reflects the state of the game. They check when it’s easy to do so, but find it isn’t always necessary and often do not check.
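The simplified game can be enumerated the same way. The sketch below sets c to zero and d equal to a, as assumed above, and lists every cell where neither side can do strictly better by switching alone. The numbers are again made up for illustration, and `pure_nash` is a hypothetical helper name.

```python
from itertools import product

# Illustrative numbers only (assumptions, not from the post).
v, h, l = 10.0, 4.0, 2.0   # value of work, high wage, low wage
a = 1.0                    # effort of working
d = a                      # simplifying assumption: loafing costs as much as working
c = 0.0                    # simplifying assumption: checking is free

requester_moves = ["high, check", "high, not check", "low, check", "low, not check"]
worker_moves = ["work", "loaf"]

# payoffs[(requester_move, worker_move)] = (requester_payoff, worker_payoff)
payoffs = {
    ("high, check", "work"): (v - h - c, h - a),
    ("high, check", "loaf"): (-c, -d),
    ("high, not check", "work"): (v - h, h - a),
    ("high, not check", "loaf"): (-h, h - d),
    ("low, check", "work"): (v - l - c, l - a),
    ("low, check", "loaf"): (-c, -d),
    ("low, not check", "work"): (v - l, l - a),
    ("low, not check", "loaf"): (-l, l - d),
}


def pure_nash(payoffs, requester_moves, worker_moves):
    """Return cells where no player can do strictly better by deviating alone."""
    equilibria = []
    for r, w in product(requester_moves, worker_moves):
        req_pay, wrk_pay = payoffs[(r, w)]
        req_ok = all(payoffs[(r2, w)][0] <= req_pay for r2 in requester_moves)
        wrk_ok = all(payoffs[(r, w2)][1] <= wrk_pay for w2 in worker_moves)
        if req_ok and wrk_ok:
            equilibria.append((r, w))
    return equilibria


print(pure_nash(payoffs, requester_moves, worker_moves))
```

With these numbers the enumeration returns both low-wage cells in the work column. In the check cell the worker strictly prefers working; in the not check cell the worker is merely indifferent, so that cell only holds up as long as workers believe they might be checked.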
Recall that two hazards existed in mturk. Workers can turn in poor work, and requesters can pay a low wage. Absent the ability to reject HITs by checking the work of workers, a Prisoners’ Dilemma existed where workers turned in poor work in exchange for a small wage. When the ability to reject was introduced, the optimal strategy for workers was to turn in good work. Rejection thus solves one of the hazards of mturk. This ends up being beneficial for both parties since workers get paid more on average and requesters get a high value from the data received.
However, low wages still exist. For those who do mturk for fun, that might not be such a bad thing. There is some intrinsic reward, as I mentioned earlier, and it may be better than other ways to waste time. However, given the large number of people who use mturk as a source of income, the ethical dilemma still exists. What can be done to push the equilibrium to the high wage cells? If costs of checking were different for the two levels (i.e. if for some reason it was cheaper to check the work when paying high wages compared to low wages), it may be enough to make the low wage payoff lower. However, it is unclear if this is feasible. A better possibility is to have different values of work for low wage and high wage. If the value of v in the high wage cells were, say, double that of v in the low wage cells, the high wage payoff v-h-c may be greater than the low wage payoff v-l-c. What does that look like in reality?

1) If you can help it, don’t complete underpaid HITs. If it takes 4 times longer to get a HIT done at 25 cents than at 50 cents, people will pay 50 cents. This is because the value (v) at that low wage is much lower, given that people usually want their HITs completed quickly.

2) Hopefully this happens naturally, and not purposefully, but poorer data (i.e. more noise, more loafers, etc.) that comes from cheap HITs is likely to cause higher payments. Again, the value of a cheap HIT becomes low because the data isn’t usable or it takes time to weed through the poor responses. This could backfire, though, by causing requesters to look elsewhere for their data collection needs.

3) Really focus when you complete a high paying HIT. Confirm the researchers’ belief that if they pay better they get better data (this is a generally accepted assumption). I’ve seen some respondents complete a $5 HIT that takes 10 minutes in under a minute, solely because (I assume) they think there is a chance it’ll sneak by. Seeing data get worse as cost goes up instantly lowers the payment rate (i.e. the value of work at high payment rates becomes lower than the value of work at low payment rates). The example above is very isolated, however.
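The comparison behind all three suggestions can be written as a one-line check: once the value of work differs by wage level, the requester prefers the high wage whenever the extra value exceeds the extra pay. This is only a sketch. The names `v_high` and `v_low` are my own labels for the two values of v, the 50 cent and 25 cent wages come from the example above, and the dollar values of the work are invented.

```python
def preferred_wage(v_high, v_low, h, l, c=0.0):
    """Return which wage level gives the requester the larger payoff.

    Compares v_high - h - c against v_low - l - c, i.e. the 'check' cells
    in the work column, with a separate value of work for each wage level.
    """
    high_payoff = v_high - h - c
    low_payoff = v_low - l - c
    return "high" if high_payoff > low_payoff else "low"


h, l = 0.50, 0.25  # wages from the 25 cent vs 50 cent example

# Hypothetical values of the work to the requester, in dollars.
print(preferred_wage(v_high=2.0, v_low=1.0, h=h, l=l))  # value doubles at the high wage
print(preferred_wage(v_high=1.0, v_low=1.0, h=h, l=l))  # value is the same at both wages
```

The checking cost c cancels out when it is the same at both wage levels, so the outcome depends only on whether v_high - v_low is larger than h - l.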
This simple economic analysis of mturk tells me that payment levels are where they are due to the incentive structure of mturk. Rejection keeps workers delivering good information, but giving good information regardless of payment causes low wages. Enforcing a type of minimum wage on HITs likely would push requesters out of the market, and return them to labs and expensive national online panel companies. Thus the primary way workers can help boost the wage level is to not complete low-wage HITs and to ensure high-wage HITs are completed well. Seemingly obvious conclusions, I know, but it was fun figuring out the game.