Archive for the month “August, 2012”

How attention filter format affects responses on Mturk

Attention filters, or known-answer questions, are the most common way to gauge a responder’s level of attention or effort. If a question whose answer is given to the participant or easily known (e.g. 2+2 = ?, or telling the answer in the instructions before the question) is answered incorrectly, you have a good idea that the person was breezing through, and either they should not be paid or you should not use their data.

When I first started using Mturk, the attention filter fail rate was about 5% in my surveys, which is phenomenal for the cost. I attributed this to the fact that people on Mturk are worried about not getting paid, so more attention is put into work. However, I then had a string of surveys where the fail rate was around 50%. This is terrible, and it is justification not to publish results even if they are good. I decided to try to figure out the cause.

A few weeks ago I wrote about how workers sometimes don’t attend to instructions in order to make the survey go faster. Workers are used to quick tasks, and repetitive multiple choice questions are therefore better than long instruction sets. I ran two studies, each containing the same attention filter. The filter gave the answer in a paragraph before the question, then asked a simple question whose true answer was different from the one given in the instructions, like “Answer ‘8’ to the following question. 2+2=___.” The question was always bolded, but the instructions were bold in one study and not bold in the other. When bold, the fail rate was 7%. When not bold, the fail rate was 45%. The simple act of making the instructions look like instructions made people not read them.

Because of this, it is important both to construct attention filters so that they measure what you want to measure (e.g. do you want to know if instructions are being read or if question text is being read) and to format instructions so that they look like questions.

Questions get read, instructions don’t.
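To make the scoring concrete, here is a minimal sketch of how an “Answer ‘8’” style filter could be scored. The function name fail_rate and both response lists are made up for illustration; they are not the data from the two studies.

```python
def fail_rate(responses, instructed_answer):
    """Fraction of responses that did not give the instructed answer."""
    if not responses:
        raise ValueError("no responses to score")
    failures = sum(1 for r in responses if r.strip() != instructed_answer)
    return failures / len(responses)


# Hypothetical responses to "Answer '8' to the following question. 2+2=___"
# (illustrative only, not data from the studies described above)
bold_instructions = ["8", "8", "4", "8", "8", "8", "8", "8", "8", "8"]
plain_instructions = ["4", "8", "4", "8", "4", "8", "8", "4", "8", "8"]

print(f"bold instructions: {fail_rate(bold_instructions, '8'):.0%} failed")
print(f"plain instructions: {fail_rate(plain_instructions, '8'):.0%} failed")
```

Anyone who answers “4” read the question but not the instructions, which is exactly the behavior this kind of filter is built to catch.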


The kinds of experiments that work on Mturk

In my never-ending attempt to push colleagues into the Mturk world, I often tell them how I have never had a study that worked on another population and did not work on the Mturk worker population. However, inevitably my advertisements prove false and I get someone who says that they got some weird results. After studying the types of experiments conducted, I noticed something that should have been obvious from the beginning. Workers seem to be really good at short iterative tasks, or longer tasks that have objective measurements. Thus my decision making work, which asks multiple choice questions or asks people to click some buttons, watch some numbers appear and then judge them, works really well. It is akin to tasks like photo tagging. Something short over and over again is easy to pay attention to. Similarly, asking someone to write out how they would solve a problem gets good results. Anyone can look at a paragraph on how to choose between two insurance options and tell if the person put effort into it or not.

However, complex imagined scenarios tend to not work very well. A worker is not used to keeping 2 pages worth of instructions memorized to imagine what it would be like to be a doctor figuring out whether to prescribe a patient drug X, which has both a higher cure rate and a higher premature death rate, or drug Y, which guarantees immediate safety but may not cure the disease. In general I think it should be assumed that workers are under high cognitive load at all times. The worker may be taking your survey at work, and while answering a question they may also be trying to gauge if their boss is walking by. If cognitive load factors into your variables or you think a high amount of attention is needed, you would do better to use a different recruitment method or pay a significantly higher amount of money. Last, if you are using Mturk, I would recommend using mechanisms to force attention to instructions, like not allowing the Next button to appear for a minute or two in Qualtrics.

What makes the Turker click

When selling the benefits of Amazon Mechanical Turk (Mturk) to people, I usually just have to say “faster and cheaper” and the sales pitch is over. Last year, I got 12,000 people to do a task that paid a penny in about 90 minutes. Sometimes I would spend 3 hours getting 20 subjects to take a survey at UCLA. That 12,000 subject task was a special case, however. It took maybe 20 seconds to complete, and involved looking at a picture and then typing something. On average I tell people they will get their subjects in about 24 hours. This typically is ~200 people doing a 10 minute task for $0.50.

Lately, though, I have had some surprises. Consider the following three studies. One was posted on a Tuesday, one on a Thursday, and one on a Monday. All use the same posting template with the same number of participants needed and the same payment, but each goes to a different external survey site. They also had slightly different titles. The Monday post took ~4 days. The Thursday post is still going and is projected to take 6 days. The Tuesday post took 4 hours. When the researchers ask me why it’s taking so long, I have to act surprised and tell them to just sit it out. Then they worry that the added time means something is wrong and their data won’t be very good. I wonder the same things.

While I don’t have answers to why those HITs acted differently, I have found a few tips to maximize response rates. (Note: sometimes below I will generalize thoughts to “they” when really what I’m saying is “me.” In addition to being a master requester, I am an avid Mturk worker.)

1. Pay at least $0.50. With some exceptions, I found 50 cents to be a good price. It’s more than most HITs out there, but surveys are widespread now and people know they may not make as much answering questions as tagging images or pasting Wikipedia URLs. Most workers set a minimum payment for their searches. When I run studies at $0.25 I expect it to take 2 weeks to get enough subjects. The difference is exponential. If you are not pretesting or piloting, consider something even higher. For studies that are published I try to pay $1.00 for a 15 minute study. Still under minimum wage, but around 4x the Mturk average. When people see a $1.00 post, they jump at it, and that desire to participate translates into a better research participant.

2. Obvious, but use as many keywords as you can think of. Workers tend toward repetition. A person will log on, see a photo tagging HIT and spend the next hour tagging photos and searching for more photo tagging HITs. Similarly, some people don’t want something so monotonous and they search for surveys. Make sure “survey” is somewhere in your keywords. Start brainstorming and use a lot. People only find

3. Difficult, but try to offer multiple HITs in each batch. Most researcher requesters think of their postings as a single entity. That is not how Mturk was designed. Mturk was designed to take a template and input data and generate many individual HITs from them, like taking a template and 1000 images and generating a batch of HITs for tagging all 1000 of those images without having to create the 1000 separate HITs individually. Mturk by default orders search results by the number of HITs available. That means if you only have 1 HIT available (i.e. each single person can only do 1 survey) then you will be lower down the list and get fewer workers. Offering multiple HITs is not easy though. You have to design a scenario where you can have someone do two or more tasks but they can stop at 1 if they want. This works well if you have multiple projects going and want to run multiple studies at the same time. If you can’t do this, then you have to get really creative. One thing to avoid is what I have seen several times. A survey completes and I try to move on to the next. It then tells me that I cannot do more than 1 HIT in the batch. The requester keeps track of workers and really only wants them to do one of the surveys, but posts multiple so they are higher up in the search. This is bad for reputation… which brings me to.

4. Keep a good reputation. All requesters know about the stats of workers. Things like rejection rates, etc. A lot of requesters don’t know about sites like Turker Nation and Turkopticon. Turker Nation is a forum for workers, where among other things they talk about the quality of requesters. Even more useful for workers to gauge the quality of a requester is Turkopticon. This is a browser extension that allows users to hover over the names of requesters and see the Turkopticon ratings for that requester. Turkopticon keeps ratings for Generosity, Speed of payment, Fairness, and Communicativity. So answer your emails, pay people quickly (don’t wait a week for it to auto-pay), give real reasons to reject (or better yet, be lenient with your $0.50 and give a warning if you think they tried but still failed), and don’t pay 10 cents for an hour-long task.

5. Consider the time. A few years ago I did a survey and discovered that most workers do tasks during their work day as a way to waste time and get paid, instead of wasting time on Facebook and getting fired. This has probably changed some in the last 3 years, but I still find that posting early in the week and in the morning gives you an edge over posting on a Friday night.
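As a side note on tip 1, the pay figures are easier to compare once they are converted to an hourly rate. This is a minimal sketch; the function name hourly_rate is my own, and the two examples use the $0.50 for a 10 minute task and $1.00 for a 15 minute study mentioned above.

```python
def hourly_rate(payment_dollars, minutes):
    """Effective hourly pay for a task that takes the given number of minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return payment_dollars * 60 / minutes


# $0.50 for a 10 minute task
print(hourly_rate(0.50, 10))  # 3.0

# $1.00 for a 15 minute study
print(hourly_rate(1.00, 15))  # 4.0
```

Running the same calculation on your own HIT before posting is a quick sanity check on whether the price is likely to clear the minimum that workers set in their searches.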

There are dozens more examples. I will probably make this a continuing series, but don’t hesitate to share your ideas. I still have no explanation for the disparity in times recently and would love to hear what you have to say.

You’d think they’d make it easier

I am often in need of the separate components of a regression function. For instance, differences in utility functions can be easily measured by comparing the exponents of the functions’ power regressions. Excel can calculate the power regression and place the line on a chart. For some reason though, they did not give an easy way to get that data in a formula. If I felt like programming my own functions I imagine =powreg(x’s,y’s,[desiredDataToReturn]) would definitely be in there. Since it’s not, I’m usually forced to go to Google. Unfortunately this appears to not be too widespread of a problem, or an easy solution exists that I am not aware of, because I am always at a loss even after a Google search. Of course Excel help is useless. They can’t bother to document that functions have more uses than they appear (=ln() is also an array function? I guess you need to buy a book to learn these things).

At first I was forced to look up and convert the power law equations into an Excel function. If you are looking for practice with Excel, have fun; here’s the equation for the exponent in a power function:

[Image: Power law equation for exponent. Caption: It would be easy if Excel had a better way to manage parentheses]

Just kidding. Here’s the formula: =(count(x array)*sum(ln(x array)*ln(y array))-sum(ln(x array))*sum(ln(y array)))/(count(x array)*sum(power(ln(x array),2))-power(sum(ln(x array)),2)). Then hit CTRL+Shift+Enter, NOT just Enter. Quick explanation: ctrl+shift+enter enters the sums and natural logs as array functions, which is how it calculates xi instead of just x. You can check the result against a trendline inserted in a chart.
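If you want to check the arithmetic outside Excel, the same sums can be sketched in a few lines of Python. This is only an illustration: the function name power_exponent and the sample data are made up, and the data is generated from y = 3 * x^0.5 so the exponent that comes back should be very close to 0.5.

```python
import math


def power_exponent(xs, ys):
    """Exponent b of the least-squares power fit y = a * x**b.

    Same sums as the Excel formula: every x and y is replaced by its
    natural log, then the usual slope formula is applied.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized lists with at least 2 points")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    sum_lx = sum(log_x)
    sum_ly = sum(log_y)
    sum_lx_ly = sum(lx * ly for lx, ly in zip(log_x, log_y))
    sum_lx_sq = sum(lx * lx for lx in log_x)
    return (n * sum_lx_ly - sum_lx * sum_ly) / (n * sum_lx_sq - sum_lx ** 2)


# Made-up data that follows y = 3 * x**0.5 exactly
xs = [1, 2, 3, 4, 5]
ys = [3 * x ** 0.5 for x in xs]

print(round(power_exponent(xs, ys), 6))
```

The number printed should agree with what the Excel formula returns for the same x and y columns, and with the exponent shown on a chart trendline.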

Since that way is really no fun and typing that out (or even now copying and pasting and then editing the arrays) is excruciating, I decided to see what else Excel has to offer. The =linest() function looks like it’s no help at first since it only does a straight-line regression, but then I thought, “What if I just change the space the regression is playing in?” To skip some 12th grade calculus (I really don’t even know if what I’m doing is calculus), I’ll simply say a linear regression on a log scale is a power regression on the linear scale. Simply put, take the natural log of all your coordinates when plugging them into your formula.

=linest(ln(y array),ln(x array),true,false)

This formula gives you a power regression where =linest(y array, x array, true, false) gives you the linear regression. Note that =linest() takes the y values first, and the third argument has to be true; false forces the intercept to zero, which would pin the power function’s coefficient at 1. If you want the exponent, you could do the ctrl+shift+enter approach I mentioned above, but if you like using Excel arrays in a more user-friendly format, use the =index() function. =linest() returns a single row: the exponent is in column 1 (=index() counts from 1, not 0) and the log of the coefficient is in column 2.

=index(linest(ln(y array),ln(x array),true,false),1,1)

This gives you the exponent. Below is the coefficient. You have to take the coefficient out of the log scale and back to the linear scale first though, which is what =exp() does.

=exp(index(linest(ln(y array),ln(x array),true,false),1,2))

Easy enough. I suppose programming a separate formula for this and every other regression style is superfluous, but why couldn’t Microsoft just put this in their Help document? It literally adds maybe 2 lines, as opposed to the 100 lines you’ve read here if you got this far.
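For anyone who wants to see the log trick outside Excel, here is a rough Python sketch of the same idea: fit a straight line to (ln x, ln y), read the exponent off the slope, and take exp() of the intercept to get the coefficient. The names linear_fit and power_fit are hypothetical helpers written for this example, and the data is made up to follow y = 3 * x^0.5.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line through the points: returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def power_fit(xs, ys):
    """Fit y = a * x**b by running a linear fit on the logged data.

    Returns (exponent b, coefficient a).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    slope, intercept = linear_fit(log_x, log_y)
    # The slope in log space is the exponent; the intercept is ln(a)
    return slope, math.exp(intercept)


# Made-up data that follows y = 3 * x**0.5 exactly
xs = [1, 2, 3, 4, 5]
ys = [3 * x ** 0.5 for x in xs]

exponent, coefficient = power_fit(xs, ys)
print(round(exponent, 6), round(coefficient, 6))
```

The intercept is estimated here rather than forced to zero, which is what lets the coefficient come out as something other than 1.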
