the:behavioral:lab

Don’t Hate the Requester; Hate the Game: An economic dilettante’s take on Mturk ethics

Maybe they are just not very vocal, but I’ve never come across an Amazon Mechanical Turk (mturk) worker who lauds the business practices of mturk. In my experience as both a requester and a worker on mturk, I find the ethical problems revolve around two primary concerns: 1) workers are paid a pittance, and 2) requesters have free rein to reject work without justification. The second problem is hard to address; it basically comes down to “who watches the watchers.” Tools like TurkOpticon put some power into workers’ hands, but probably not enough. However, it also seems to be a smaller problem. People may occasionally get shafted, but in the big picture the effects seem small compared to the low payment rate (which is not to say it doesn’t suck).

The low payment rate is an interesting moral question because it is somewhat unique, at least to those involved. Opponents argue that a sizable percentage of workers use mturk as a primary or secondary income, and that they should be earning a living wage. Proponents argue that mturk was not meant to be used that way. Unlike in the Walmart wage debate, no requester is trying to hire a worker, even part-time. Proponents would say requesters are merely putting a task out there: if it pays too little, no one will accept it, and forcing a higher payment rate would simply push requesters elsewhere to get their data. Everyone loses. This reasoning is not infallible, and the rhetorical arguments can go on forever.

Instead, I have been thinking more about the economic argument. I’ve covered about two weeks’ worth of game theory in my economic principles of marketing course (essentially a whirlwind microecon 1 and 2 course covered in about five weeks). So naturally, I am now an expert ready to analyze important, broad, real-world topics. (Side note: if you didn’t catch that, I’m not an expert, and I don’t claim that what I write below is at all how a world-class economist would analyze the situation… hell, I may have even made a wrong assumption. If you can do better, join the discussion.)

At the most shallow level, mturk, like all employee-employer relationships, is like a Prisoners’ Dilemma game. The incentives push workers not to work hard and push employers to pay low wages. In the table below, workers can choose to work or loaf (i.e., turn in junk work), and requesters choose to pay a high wage or a low wage. Each party chooses a strategy at the same time, with perfect knowledge of the payoffs. By that I mean both the employer and the employee know what each party earns in every cell, so each knows how the other will respond to whatever strategy they pick.

Requester \ Worker | Work     | Loaf
High wage          | v-h, h-a | -h, h
Low wage           | v-l, l-a | -l, l

The first number in each cell is the requester’s payoff, and the number after the comma is the worker’s payoff. v = value of the work, h = high wage, l = low wage, a = effort (cost) of working.

The equilibrium of this game (i.e., the cell where neither player has an incentive to deviate from their strategy) is for the worker to loaf and the requester to pay a low wage. Loafing pays the worker more at either wage (h > h-a and l > l-a). A low wage pays the requester more whether the worker works or loafs. This is not how mturk operates, but it provides a baseline to analyze against. The low wages exist, but the loafing does not; if it did, no one would use mturk for data. Still, this is what people worry about, because without checks the Prisoners’ Dilemma outcome, where everyone is worse off, would occur.
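
To make the equilibrium claim concrete, here is a small sketch. It encodes the first table with hypothetical numbers (v = 10, h = 6, l = 3, a = 2). These values are my own assumption, chosen only so that v > h > l > 0, a > 0 and h - a > l. The code then checks every cell for a profitable one-sided deviation. The helper name `pure_nash` is illustrative and does not come from any library.

```python
from itertools import product

def pure_nash(rows, cols, payoff):
    """Return every (row, col) cell where neither player gains by deviating alone.

    payoff[(row, col)] is a (requester, worker) pair. Ties count as best
    responses, so weak equilibria are included.
    """
    equilibria = []
    for r, c in product(rows, cols):
        req, wrk = payoff[(r, c)]
        requester_ok = all(req >= payoff[(r2, c)][0] for r2 in rows)
        worker_ok = all(wrk >= payoff[(r, c2)][1] for c2 in cols)
        if requester_ok and worker_ok:
            equilibria.append((r, c))
    return equilibria

# Hypothetical payoffs (not from the post): v > h > l > 0, a > 0, h - a > l.
v, h, l, a = 10, 6, 3, 2
rows = ["High wage", "Low wage"]
cols = ["Work", "Loaf"]
payoff = {  # (requester, worker)
    ("High wage", "Work"): (v - h, h - a),
    ("High wage", "Loaf"): (-h, h),
    ("Low wage", "Work"): (v - l, l - a),
    ("Low wage", "Loaf"): (-l, l),
}

print(pure_nash(rows, cols, payoff))  # [('Low wage', 'Loaf')]
# Both sides would prefer (High wage, Work) to the equilibrium cell.
print(payoff[("High wage", "Work")], payoff[("Low wage", "Loaf")])  # (4, 4) (-3, 3)
```

The enumeration lands on (Low wage, Loaf). Yet (High wage, Work) would leave both sides better off, and that gap is what makes this a Prisoners’ Dilemma.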

What differentiates mturk is that requesters can reject workers’ submissions and not pay them. Setting aside requesters who are evil and reject everyone for no reason, we’ll assume that requesters reject only with sufficient reason and have to put some effort into checking. In this case, workers can still work or loaf, but requesters now have four options: pay high and check, pay high and not check, pay low and check, or pay low and not check.

Loafing has costs too: it takes some time to complete a HIT with bad work, though less than it takes to complete a HIT with good work (hence d < a). These costs were left out of the first table because they did not affect its equilibrium. Finally, checking has costs, for example:
- time taken to add an attention filter to a survey or to get correct answers to compare against;
- the research-validity cost of making workers think you are watching them;
- time taken to actually check and reject bad work.

This cost appears in every cell where the requester checks, whether or not the worker loafed. A loafer who is checked is rejected and receives no pay.

Requester \ Worker   | Work       | Loaf
High wage, check     | v-h-c, h-a | -c, -d
High wage, not check | v-h, h-a   | -h, h-d
Low wage, check      | v-l-c, l-a | -c, -d
Low wage, not check  | v-l, l-a   | -l, l-d

Payoffs are listed in the same order (requester, worker), and v, h, l, and a mean the same as before. c = cost of checking, d = cost of loafing; d < a and c < v.
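
The second table can be checked by listing each side’s best responses. The sketch below uses hypothetical values (v = 10, h = 6, l = 3, a = 2, c = 0.5, d = 1.5). These are my assumptions: they satisfy d < a and c < v, and also c < l and a - d < l. The function names are illustrative, not from any library.

```python
# Hypothetical numbers: d < a, c < v, and also c < l and a - d < l.
v, h, l, a, c, d = 10, 6, 3, 2, 0.5, 1.5

rows = ["High, check", "High, not check", "Low, check", "Low, not check"]
cols = ["Work", "Loaf"]
payoff = {  # (requester, worker)
    ("High, check", "Work"): (v - h - c, h - a),
    ("High, check", "Loaf"): (-c, -d),        # loafer is rejected and not paid
    ("High, not check", "Work"): (v - h, h - a),
    ("High, not check", "Loaf"): (-h, h - d),
    ("Low, check", "Work"): (v - l - c, l - a),
    ("Low, check", "Loaf"): (-c, -d),         # loafer is rejected and not paid
    ("Low, not check", "Work"): (v - l, l - a),
    ("Low, not check", "Loaf"): (-l, l - d),
}

def requester_best(col):
    """Requester strategies (ties included) that maximize payoff against col."""
    best = max(payoff[(r, col)][0] for r in rows)
    return {r for r in rows if payoff[(r, col)][0] == best}

def worker_best(row):
    """Worker strategies (ties included) that maximize payoff against row."""
    best = max(payoff[(row, c2)][1] for c2 in cols)
    return {c2 for c2 in cols if payoff[(row, c2)][1] == best}

for col in cols:
    print(col, "->", sorted(requester_best(col)))
for row in rows:
    print(row, "->", sorted(worker_best(row)))

# A pure equilibrium needs each strategy to be a best response to the other.
pure = [(r, c2) for r in rows for c2 in cols
        if r in requester_best(c2) and c2 in worker_best(r)]
print("Pure equilibria:", pure)  # Pure equilibria: []
```

With these numbers the best responses chase each other in a loop. Against Work the requester picks Low, not check. Against that the worker loafs. Against Loaf the requester checks, and against checking the worker works. So no cell is a mutual best response.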

Here there is no pure strategy Nash equilibrium (i.e., no cell in the table where neither side has an incentive to switch strategies). The argument runs as a cycle:
1) If the worker works, the requester’s best response is to pay low and not check (that cell has the highest payoff to the requester).
2) If the requester does that (which the worker knows, given perfect knowledge), the worker would loaf, since loafing takes less effort and pays the same.
3) Knowing that, the requester would check and not pay the loafing worker (as long as checking costs less than the wage it saves, c < l).
4) Knowing the requester would check, the worker would work (as long as the low wage covers the extra effort of working, l - a > -d).
5) Finally, knowing that, the requester would stop checking, and the cycle starts over.

Because this cycle never settles, each party chooses its strategy probabilistically (i.e., a mixed strategy Nash equilibrium). Each maximizes its payoff given beliefs about how likely the other is to choose each strategy. This explains why some workers loaf and others work very hard. The loafers are betting that they will be paid even though they did poor work. That bet can pay off because the requester is also playing probabilistically: instead of checking the work on every HIT, they check only some. Thus they save on the costs of checking while still providing some incentive to turn in good work. It is possible to work out these probabilities as a function of the costs and payments involved. Instead of going through all that math, though, I will make two reasonable assumptions that change the game to make it slightly easier to solve.
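
Working the probabilities out is shorter than it looks if one extra step is taken first. This step is mine, not part of the original argument. High, not check is strictly dominated by Low, not check, and High, check is weakly dominated by Low, check. So the mixed equilibrium can be sketched within the low-wage rows alone, with each side mixing so that the other side is indifferent. The code below solves the two indifference conditions exactly with Python’s `fractions.Fraction`. It uses hypothetical values (v = 10, l = 3, a = 2, c = 1/2, d = 3/2) that are assumptions, not figures from the post. The function name is illustrative.

```python
from fractions import Fraction as F

def mixed_equilibrium(l, a, c, d):
    """Mixed equilibrium of the low-wage rows (check vs. not check, work vs. loaf).

    Returns (p_check, q_work). Requires 0 < c < l and 0 < a - d < l so that
    both probabilities lie strictly between 0 and 1.
    """
    if not (0 < c < l and 0 < a - d < l):
        raise ValueError("need 0 < c < l and 0 < a - d < l")
    # Requester indifferent: q*(v - l) - c == q*(v - l) - (1 - q)*l  =>  q = 1 - c/l
    q_work = 1 - c / l
    # Worker indifferent: l - a == p*(-d) + (1 - p)*(l - d)  =>  p = (a - d)/l
    p_check = (a - d) / l
    return p_check, q_work

# Hypothetical values (not from the post).
v, l, a, c, d = F(10), F(3), F(2), F(1, 2), F(3, 2)
p, q = mixed_equilibrium(l, a, c, d)
print(p, q)  # 1/6 5/6

# Sanity check: at these probabilities each side is exactly indifferent.
assert q * (v - l) - c == q * (v - l) - (1 - q) * l   # check vs. not check
assert l - a == p * (-d) + (1 - p) * (l - d)          # work vs. loaf
```

Under these assumed numbers, the requester checks about one HIT in six and about one worker in six loafs. In this sketch the checking rate depends only on the worker’s effort saving relative to the wage, (a - d)/l. The loafing rate depends only on the requester’s checking cost relative to the wage, c/l.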

First, the cost of checking is rather low, especially for research experiments. Adding an attention filter requires little effort, and most people have a research assistant updating credit (so there is no time component). Treating c as zero means the check and not-check rows at each wage level pay the requester the same when the worker works (v-h or v-l). Second, the cost of loafing is sometimes as high as the cost of working. The time saved is often negligible. Also, some people (those not doing it as a primary or secondary income) find the cognitive challenge of some research studies stimulating (that is, the studies that don’t kill you with boredom), which lowers the cost of doing good work. Thus, if we treat a and d as equal, they can also be removed from the columns. Subtracting the same amount from every one of the worker’s payoffs does not change which strategy the worker prefers. This last assumption is less believable, but it also may not affect the equilibrium in a meaningful way.

After these assumptions are made, a pure strategy equilibrium exists (i.e., one cell is clearly preferred, and people no longer choose probabilistically). That equilibrium is for workers to work and for requesters to pay a low wage and check. This somewhat reflects the state of mturk: most workers do good work, most requesters check to make sure they do, and 99% of HITs pay below the minimum wage when computed at an hourly rate. One interesting point is that, once these assumptions are made, the low wage-check and low wage-not check cells are identical when the worker works. So while everyone can assume requesters will check, requesters may not actually check, and only they would know. Workers only find out whether a requester checks when they don’t do the work. After talking with lots of other researchers who use mturk, I think this matches the state of the game: they check when it’s easy to do so, but find it isn’t always necessary and often do not check.
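
Re-running a best-response check with the two simplifying assumptions plugged in (c = 0 and a = d) shows pure equilibria appearing. The numbers (v = 10, h = 6, l = 3, a = d = 2) are hypothetical. `pure_equilibria` is an illustrative helper, not a library function.

```python
def pure_equilibria(rows, cols, payoff):
    """Cells where each player's strategy is a best response (ties allowed)."""
    found = []
    for r in rows:
        for col in cols:
            req, wrk = payoff[(r, col)]
            requester_ok = all(req >= payoff[(r2, col)][0] for r2 in rows)
            worker_ok = all(wrk >= payoff[(r, c2)][1] for c2 in cols)
            if requester_ok and worker_ok:
                found.append((r, col))
    return found

# Hypothetical numbers with the post's simplifications: c = 0 and a = d.
v, h, l, a, c = 10, 6, 3, 2, 0
d = a

rows = ["High, check", "High, not check", "Low, check", "Low, not check"]
cols = ["Work", "Loaf"]
payoff = {  # (requester, worker)
    ("High, check", "Work"): (v - h - c, h - a),
    ("High, check", "Loaf"): (-c, -d),
    ("High, not check", "Work"): (v - h, h - a),
    ("High, not check", "Loaf"): (-h, h - d),
    ("Low, check", "Work"): (v - l - c, l - a),
    ("Low, check", "Loaf"): (-c, -d),
    ("Low, not check", "Work"): (v - l, l - a),
    ("Low, not check", "Loaf"): (-l, l - d),
}

print(pure_equilibria(rows, cols, payoff))
# [('Low, check', 'Work'), ('Low, not check', 'Work')]
```

Both low-wage cells come out as equilibria with the worker working, which matches the observation that requesters earn the same whether or not they check. In the not-check cell, the worker is only weakly willing to work, since with a = d loafing pays exactly the same.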

Recall that mturk has two hazards: workers can turn in poor work, and requesters can pay a low wage. Without the ability to reject HITs after checking the work, a Prisoners’ Dilemma exists in which workers turn in poor work in exchange for a small wage. Once the ability to reject is introduced, the optimal strategy for workers is to turn in good work. Rejection thus solves one of mturk’s hazards. This ends up benefiting both parties, since workers get paid more on average and requesters get high value from the data they receive.

However, low wages still exist. For those who do mturk for fun, that might not be such a bad thing. There is some intrinsic reward, as I mentioned earlier, and it may be better than other ways to waste time. However, given the large number of people who use mturk as a source of income, the ethical dilemma remains. What can be done to push the equilibrium to the high wage cells? Suppose the costs of checking differed between the two wage levels (i.e., for some reason it were cheaper to check the work when paying high wages than when paying low wages). That may be enough to make the low wage payoff lower, but it is unclear whether this is feasible. A better possibility is for the work to have different values at low and high wages. If the value v in the high wage cells were, say, double the value v in the low wage cells, v-h-c could be greater than v-l-c.

What does that look like in reality?

1) If you can help it, don’t complete underpaid HITs. If a HIT takes four times longer to get done at 25 cents than at 50 cents, requesters will pay 50 cents. The reason is that the value (v) at the low wage is much lower, given that people usually want their HITs completed quickly.

2) Hopefully this happens naturally rather than purposefully, but the poorer data (i.e., more noise, more loafers, etc.) that comes from cheap HITs is likely to push payments up. Again, the value of a cheap HIT drops, either because the data isn’t usable or because it takes time to weed out the poor responses. This could backfire, though, by causing requesters to look elsewhere for their data collection needs.

3) Really focus when you complete a high paying HIT. Confirm the researchers’ belief that paying better gets them better data (a generally accepted assumption). I’ve seen some respondents complete a $5 HIT that takes 10 minutes in under a minute, solely because (I assume) they think there is a chance it’ll sneak by. Seeing data get worse as pay goes up instantly lowers the payment rate (i.e., the value of work at high payment rates becomes lower than the value of work at low payment rates). The example above is very isolated, however.
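
Both levers in this paragraph reduce to one comparison of the requester’s payoff when the worker works: v_high - h - c_high versus v_low - l - c_low. Rearranged, the high wage wins when (v_high - v_low) + (c_low - c_high) > h - l. The sketch below is a minimal illustration with hypothetical wages (h = 6, l = 3) and values. Splitting v and c into high- and low-wage versions is the change the paragraph proposes; the function name is illustrative.

```python
def prefers_high_wage(v_high, v_low, h, l, c_high=0, c_low=0):
    """True if paying the high wage beats the low wage for the requester,
    assuming the worker works in both cases (as in the checked equilibrium).
    Equivalent to (v_high - v_low) + (c_low - c_high) > h - l."""
    return v_high - h - c_high > v_low - l - c_low

h, l = 6, 3  # hypothetical wages

# Same value and checking cost at both wages: the low wage wins (4 < 7).
print(prefers_high_wage(v_high=10, v_low=10, h=h, l=l))  # False
# High-wage work worth double: 14 > 7, so the high wage wins.
print(prefers_high_wage(v_high=20, v_low=10, h=h, l=l))  # True
# A value gap only equal to the wage gap (13 - 10 == 6 - 3) is not enough.
print(prefers_high_wage(v_high=13, v_low=10, h=h, l=l))  # False
```

The strict inequality is deliberate: when the value gap exactly equals the wage gap, the requester is indifferent, and nothing pushes the market toward the high wage.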

This simple economic analysis tells me that payment levels are where they are because of mturk’s incentive structure. Rejection stops workers from turning in bad information, but because workers deliver good information regardless of payment, wages stay low. Enforcing some type of minimum wage on HITs would likely push requesters out of the market and back to labs and expensive national online panel companies. Thus, the primary way workers can help boost the wage level is to not complete low-wage HITs and to make sure high-wage HITs are completed well. Seemingly obvious conclusions, I know, but it was fun figuring out the game.



11 thoughts on “Don’t Hate the Requester; Hate the Game: An economic dilettante’s take on Mturk ethics”

  1. Hi Andrew,

    This post was linked to on turkopticon-discuss, the mailing list for Turkopticon.

    Thank you for this simple but helpful analysis. In my capacity as a graduate student, I am in the process of (slowly, between other things) writing a game-theory-inspired model of an abstract crowd work market that tries to get at these (and some other) questions. It’s nice to see other people thinking along similar lines—and your model is much more concise than mine 🙂

    Best,

    six

    • Thanks for the comment. While mine is more concise, and despite that I haven’t seen it, I’m sure your model is much more accurate. I’d love to have a look when you have a shareable version ready.

  2. CaliBboy said:

    “Thus the primary way workers can help boost the wage level is to not complete low-wage HITs and ensure high-wage HITs are completed well. ”

    Nice theory, but not in any way based upon reality. That theory could work in an environment with a limited number of workers; Mturk is not that environment. Any US resident can join Mturk at any time, so new workers join the workforce every day.

    Moreover, because of the system Amazon designed, these workers are only allowed to complete low-wage HITs. That ensures there will always be a workforce preyed upon by cheap requesters.

    “Enforcing a type of minimum wage on HITs likely would push requesters out of the market, and return them to labs and expensive national online panel companies.”

    This is logic I don’t fully understand. A researcher has a number of options to find participants for a study:

    a) Use only a finite group of undergraduates, which for most researchers will not be diverse enough to yield solid results. However, there is little to no cost for this.

    b) Hire a private research firm to find participants, which can cost hundreds to thousands of dollars ($5 to $100 per person).

    c) Try to find participants in your local vicinity to join your study. The cost for this can range from $5 to $30 per person, and it is extremely time-consuming.

    d) Use Mturk, where there are thousands of willing participants eager to complete your study for a small minimum wage of 10 cents per minute.

    With those choices, I don’t see how trying to ensure all researchers pay a bare minimum wage will drive them out. Really? A researcher would rather pay $30 per person to a private research firm than pay 10 cents per minute on Mturk? Furthermore, I just don’t understand how so many researchers are downright cheap and are appalled that people only completed their study for money, when it reads in big bold letters on the mturk homepage that “Mechanical Turk is a marketplace for work”.

    • Yes. The prescriptions are probably a bit strong. However, I wouldn’t characterize the mturk workforce as unlimited. Any US resident can join, but what fraction are willing to complete work for such a low wage? Mturk has been around for years, yet it only contains a small percentage of the US population. Naturally some percentage hasn’t heard of mturk yet, but the percentage of people willing to join and complete work is fairly low. Also, the number of requesters (especially those who post survey and experimental research hits) is increasing at a large rate. Assuming the number of HITs grows faster than the workforce, wages should go up naturally. Not completing cheap hits would hasten that process. This can be illustrated by the doubling of the average payment for research-related HITs in the last 5 years (10-minute studies used to average 25 cents per completion; it’s now about 50 cents).

      Using an outside research company is pretty appealing to a lot of researchers. Consider that even with the current pay differential, national panel companies are used in more published research than mturk. Quadruple what you have to pay an mturk worker, and mturk doesn’t look nearly as appealing. National panels are better for getting representative samples, and the data is considered “better” (I doubt it really is, but reality isn’t as important as perception). Also, an increase in payments to mturk participants will have tax implications that most universities ignore right now. The assumption is that no worker gets paid $600 by a single institution, since it would take like 1200 HITs to do that now. When that number gets down to 80, a lot of universities will simply ban mturk usage, since the cost of handling the paperwork for hundreds or thousands of private contractors (which is what workers are considered) would likely be astronomical.

  3. I can’t “thank” for a comment like you can “thank” for posts on Turker Nation, but if I could, I would thank CaliBboy for the previous comment 🙂

  4. I’m keen on models but not sure your equilibrium has face validity as it’s not what I observe on MTurk. Low wages, sure, but not quality work with monitoring at those low wages.

    Why is it that I post reasonably paying hits (by MTurk standards, paying about $4-5 an hour on average, which seems above many HITs but lower than the 10¢/min Six suggests), include quality check questions, and reject workers that miss those (painfully obvious) questions? I shouldn’t get any, but I get quite a few on some surveys, sometimes up to 10% of the workforce just clicking randomly through questions without reading them. Then enraged rejected workers (who methinks doth protest too much) post angry negative reviews to TO about how I lied about their work and am cheating them…some equilibrium!

    • Yeah, I’m not exactly super-proud of this model (though I think this is my most-read post). I’m clearly not an economist. I suppose some kind of Bayesian model that incorporates beliefs about requesters or about the likelihood of getting caught, and that also incorporates the fact that workers can erase their rejection rate by starting a new account, would be more accurate. I’ve talked with a bunch of people about payment, and the consensus is that increasing payment makes data come in faster (to a point) but doesn’t necessarily make it better (i.e., less random variance, etc.). I think that makes sense, since I’m sure most good workers think they are (and probably are) giving their best attention, and high payments are likely to attract poor workers who think the expected value of doing nothing and receiving a large payment is high.

      • BillB said:

        I agree. Higher pay will not lead, in and of itself, to higher quality on a per-worker basis, as loafing workers will respond to those incentives just as well as attentive workers.

        One of my main points, perhaps not emphasized enough, is the risks to requesters of getting negative reviews from rejecting bad workers on Turkopticon. There is an interesting, though of course not complete, symmetry between TO and MTurk. On MTurk bad requesters can reject work unfairly, stealing the worker’s labor. On TO bad workers (say, who were fairly rejected for doing crap work) can retaliate and rate requesters unfairly. In each case, there is no recourse for the affected party: the worker goes unpaid by the bad requester and the good requester cannot demonstrate the anonymous reviewer is in fact a bad actor.

        One thing you didn’t address (and neither did CaliBboy) is that Amazon has addressed these dilemmas in part by creating the Master Worker category. It’s necessarily opaque to qualify as otherwise loafing workers would simply target the qualification criteria. I suspect the Master qualification is not a “trick” of finding a secret test posted by Amazon, but rather something granted to long-time workers who have an astounding number of completed hits and an astonishingly low rejection rate, like 10,000 hits completed (50,000?) and 99.9% accepted. What better way to prove that you’re a dependable worker than by demonstrating it time and time again?

        As an aside that may not be interesting to all readers here, the economic rationale of the Master Worker “barrier to entry,” whatever it may be, is similar to that argued for the qualification of London cabbies with “the knowledge”–an exhaustive test of every street in London that takes prospective cabbies two full years to master. While GPS has rendered the knowledge unnecessary, it serves two functions: 1) maintaining the high price of cabs in London, much to the chagrin of riders, and 2) perhaps underappreciated by the customer, it virtually guarantees that cabbies will be professional and dedicated. In other words, good workers. If it took 5 pounds and 5 minutes to get a taxi license in London, what might happen to the quality of service?

      • Anonymous said:

        In response to BillB’s comment, the premise of the Masters qualification sounds great, but the reality is that a lot of workers with many more approved HITs than that and very high approval rates haven’t received Masters yet; and some workers have received Masters when they had relatively low amounts of approved HITs (as low as 3000 or less), less-than-ideal approval rates (as low as 97% or less), etc.

  5. Anonymous: Do you have any evidence to support your claim besides posts to Internet discussion boards?

  6. The problem is that the convenience of Turk (you can do it when you want from your couch) has resulted in a supply-demand imbalance. People are willing to do these HITs at the current pay rate, and the reasons vary from person to person, but the demand from workers is there. The only solution is more requesters… a LOT more requesters. If the pool of requesters grew and the worker pool remained about the same, requesters would be competing for the same workers and would have to raise their pay. So the question is, how do we attract more requesters?
