Archive for the tag “Mechanical Turk”

How to email multiple MTurk workers

I frequently want to contact Mechanical Turk workers who have completed studies for me. Sometimes they say they are interested in the study and want more debriefing info. Sometimes I just want to thank them for notifying me of a typo or error that I didn’t catch. Sometimes I want to offer a follow-up opportunity to people who have participated before. It is a violation of Amazon Mechanical Turk policies to require workers to divulge their email address or other identifying information. Some people (including me) have avoided outright breaking that policy by entering the grey area of “asking” for the email address but not making HIT completion contingent on entering a valid address. Another way to contact workers would be to use the Requester User Interface (RUI). However, it’s difficult to A) figure out whether it is possible to contact workers through the RUI, B) figure out how to do it, and C) send emails to multiple workers, since the RUI requires you to do it one at a time.

MTurk does allow you to batch process emails to workers, but only through its API. The API allows people to build their own software to use MTurk features, including more features than are available on the website. One such feature is sending emails, and letting a computer send out 500 emails instead of doing it manually saves a lot of time. The API requires using the .NET Framework, which is only implementable in languages I don’t have much experience with. My goal was to create a web service to allow anyone to log in and send emails. However, earlier this summer I came across an R library on GitHub that implemented many, if not all, of the API’s resources in a single package. That library has recently (9/8/14) been published on CRAN (making it easier to find and use). I want to thank Thomas Leeper for saving me the time of both learning Ruby and programming a web service in Ruby. Since Mr. Leeper (Dr. Leeper I believe, actually) implemented it in R, it won’t function as a web service, unless someone feels like setting up an R server with it. However, it is probably better this way. Using the MTurk API requires telling the program your Amazon Web Services (AWS) credentials. With those credentials someone can have full access to your entire AWS account, including credit cards on Amazon Payments and thousands of other pieces of information you’d rather not let people have. Thus, security is a concern, and using the MTurk API does carry some risks that you should be willing to accept before using it.

Below is a step-by-step guide to emailing workers. It is written assuming you have no knowledge of the R programming language.

  1. Install R and RStudio. RStudio is technically optional, but the rest of the instructions assume you have it.
  2. Open RStudio.
  3. In the console (lower-left is the default location), you should see a > on the bottom. Place the cursor there and paste the following:

     install.packages("MTurkR")

    then hit Enter. The console should show the steps of installing (downloading, etc.) until it says that the package was successfully unpacked.

  4. Create a new R script by clicking File > New File > R Script (or Ctrl+Shift+N on PCs).
  5. Copy the following code into the script:
     library(MTurkR)  
     credentials(keypair=c("AccessKeyID goes here","SecretAccessKey goes here"))  
     contact(  
        subjects = "Email subject line goes here",  
        msgs = "Email body goes here",  
        workers = "WorkerId goes here"  
     )

    #Note: The contact function above is set up to contact a single worker. You can send multiple workers the same message, or customized messages to each worker, in a single function call. To do so, you use vectors. For our purposes, think of a vector as a list. If you have a list of 100 email subject lines, 100 email bodies, and 100 worker IDs, R can send 100 custom emails. If you have a list of 1 email subject line, 1 email body, and 100 worker IDs, R will send the same email to 100 people. It can get more complicated, but I don’t know how it would be applicable for this purpose. To create a vector, you use the c() function. Enclose all your text (e.g. worker IDs) in quotation marks and separate them with commas, like below.

     contact(  
        subjects = c("Email subject 1", "Email subject 2", "..."),  
        msgs = c("Email body 1", "Email body 2", "..."),  
        workers = c("WorkerId 1","Worker Id 2", "...")  
     )
  6. In the line that starts with credentials, replace “AccessKeyID goes here” with your AWS access key and “SecretAccessKey goes here” with your secret access key. To find your credentials, do the following:
    1.  Log in to the AWS Security Manager with the log in info for your Mturk account.
    2. Open the Access Keys panel.
    3. Click “Create New Access Keys.”
    4. Copy the Access Key and Secret Access Key and paste into the R script
      1. Note to those who know more than I do. It is my understanding that AWS no longer lets you view Secret Access Keys and you have to create a new one in order to see it. If there is a way to view a Secret Access Key without creating a new one, please email me or leave a comment.
  7. Save your R script. You will edit this file each time you want to send an email out, so put it somewhere you remember.
  8. To send an email, edit one of the two “contact” functions. The first one is for sending a single email to a single worker (note you can only send emails to workers who have done work for you). The second one is for sending multiple emails (e.g. multiple emails to the same worker [not recommended], the same email to multiple workers, or distinct emails to multiple workers). I’ll assume you want to send one email to multiple workers, so the instructions below edit the second contact() function.
    1. The parameter “subjects” is the subject line. You are assigning a list of subjects to that variable. If the list has only one entry, everyone will get the same subject. If you have 500 workers, you can create 500 separate subjects if you want. You just separate them by a comma. For the subject and every other item you will edit, make sure everything is enclosed within quotation marks.
    2. The parameter “msgs” is the body of the email. Similarly, you can create a list of separate email bodies or just put in one.
    3. The parameter “workers” is the WorkerId of the workers you want to contact. You can get worker IDs from the RUI if you don’t collect them yourself. Simply log into the Requester site, and go to Manage, then open a HIT posting, then download the CSV of the data. In the CSV are a lot of columns. One is labeled WorkerID. Entries in that column are what you are looking for. You can also view worker IDs in the main Review Results page. They’re in one of the default columns of results or, if they are not showing, you can click “Customize View” to have them show.
    4. Assuming you want to send one email to a list of workers, the contact function should look something like this
      1.  contact(  
            subjects = c("Email subject"),  
            msgs = c("Email body"),  
            workers = c("WorkerId 1","Worker Id 2", "...")  
          )
  9. Execute the code.
    1. Place the cursor on the first line (starts with library) or highlight the entire line. Hit CTRL+Enter (for PCs) or Command+Enter (for Macs, I think). This loads the MTurkR library.
    2. Place the cursor on the line that starts with credentials or highlight the entire line. Hit CTRL+Enter or Command+Enter. This stores your credentials temporarily, to use when the program communicates with Amazon’s servers.
    3. Highlight the entire contact function you wish to execute (i.e. from the c in contact to the final parenthesis after all the worker IDs). Hit CTRL+Enter.
    4. Look at the console to see if everything executed fine. You should see a list of workers that were notified. If some weren’t, it’ll tell you why (e.g. worker ID invalid, worker hasn’t completed work for you, etc.). If you see an error, it may give you a hint as to why. For instance, perhaps you are missing a quotation mark surrounding your entries. Perhaps you are missing a comma between worker IDs.

Once everything has executed, you are finished. You can close the file, then reopen it next time, edit it again (step 8), then execute the new code (step 9). You only need to install the MTurkR package once, but you need to load it every time (the library command) and enter your credentials every time (the credentials function).
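If you have many workers, typing the worker IDs into the c() vector by hand gets tedious. Below is a rough sketch, in JavaScript (run with something like Node.js, which is my assumption, not part of the original workflow), that pulls one column out of the downloaded results CSV and prints it as text you can paste into the workers argument. The function names parseCsv and workerVector are made up for illustration, and the column name is passed in because you should check the exact header in your own file.

```javascript
// Sketch only: pull one column out of CSV text and format it as an R c() vector.
// Handles quoted fields (commas and doubled quotes inside quotes).
function parseCsv(text) {
  const rows = [];
  let row = [], field = '', inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;
      else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ',') { row.push(field); field = ''; }
    else if (ch === '\n') { row.push(field); rows.push(row); row = []; field = ''; }
    else if (ch !== '\r') field += ch;
  }
  if (field !== '' || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// columnName is whatever the worker ID column is called in your file.
function workerVector(csvText, columnName) {
  const [header, ...records] = parseCsv(csvText);
  if (!header) throw new Error('CSV is empty');
  const col = header.indexOf(columnName);
  if (col < 0) throw new Error('Column not found: ' + columnName);
  // Drop blanks and duplicates so nobody is emailed twice.
  const ids = [...new Set(records.map((r) => r[col]).filter(Boolean))];
  return 'c(' + ids.map((id) => JSON.stringify(id)).join(', ') + ')';
}

// Hypothetical two-row results file, not real data.
const csv = 'HITId,WorkerId,Answer.comments\nH1,W1,"nice, thanks"\nH1,W2,\n';
console.log(workerVector(csv, 'WorkerId')); // c("W1", "W2")
```

The quoting relies on worker IDs being plain letters and digits; anything more exotic would need checking against R's string rules.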

There are A LOT of other things the MTurkR library can do, such as setting up qualifications and qualification tests, publishing and managing HITs, etc. I may tinker with other features and post my findings, but for now, MTurkR is a great resource even if you only use it for contacting workers.


What makes the Turker click

When selling the benefits of Amazon Mechanical Turk (MTurk) to people, I usually just have to say “faster and cheaper” and the sales pitch is over. Last year, I got 12,000 people to do a task that paid a penny in about 90 minutes. Sometimes I would spend 3 hours getting 20 subjects to take a survey at UCLA. That 12,000-subject task was a special case, however. It took maybe 20 seconds to complete, and involved looking at a picture and then typing something. On average I tell people they will get their subjects in about 24 hours. This is typically ~200 people doing a 10-minute task for $0.50.

Lately, though, I have had some surprises. Consider the following three studies. One was posted on a Tuesday, one on a Thursday, and one on a Monday. All use the same posting template with the same number of participants needed and the same payment, but go to a different external survey site. They also had slightly different titles. The Monday post took ~4 days. The Thursday post is still going and is projected to take 6 days. The Tuesday post took 4 hours. When the researchers ask me why it’s taking so long, I have to act surprised and tell them to just sit it out. Then they worry that the added time means something is wrong and their data won’t be very good. I wonder the same things.

While I don’t have answers to why those HITs acted differently, I have found a few tips to maximize response rates. (Note: sometimes below I will generalize thoughts to “they” when really what I’m saying is “me.” In addition to being a master requester, I am an avid Mturk worker.)

1. Pay at least $0.50. With some exceptions, I have found 50 cents to be a good price. It’s more than most HITs out there, but surveys are widespread now and people know they may not make as much answering questions as tagging images or pasting Wikipedia URLs. Most workers set a minimum payment for their searches. When I run studies at $0.25 I expect it to take 2 weeks to get enough subjects. The difference is exponential. If you are not pretesting or piloting, consider something even higher. For studies that are published I try to pay $1.00 for a 15-minute study. That is still under minimum wage, but around 4x the MTurk average. When people see a $1.00 post, they jump at it, and that desire to participate translates into a better research participant.

2. Obvious, but use as many keywords as you can think of. Workers tend toward repetition. A person will log on, see a photo tagging HIT and spend the next hour tagging photos and searching for more photo tagging HITs. Similarly, some people don’t want something so monotonous and they search for surveys. Make sure “survey” is somewhere in your keywords. Start brainstorming and use a lot. People only find

3. Difficult, but try to offer multiple HITs in each batch. Most researcher requesters think of their postings as a single entity. That is not how MTurk was designed. MTurk was designed to take a template and input data and generate many individual HITs from that data, like taking a template and 1000 images and generating a batch of HITs for tagging all 1000 of those images without having to create the 1000 separate HITs individually. MTurk by default orders search results by the number of HITs available. That means if you only have 1 HIT available (i.e. each person can only do 1 survey) then you will be lower down the list and get fewer workers. Offering multiple HITs is not easy though. You have to design a scenario where someone can do two or more tasks but can stop at 1 if they want. This works well if you have multiple projects going and want to run multiple studies at the same time. If you can’t do this, then you have to get really creative. One thing to avoid is something I have seen several times. A survey completes and I try to move on to the next. It then tells me that I cannot do more than 1 HIT in the batch. The requester keeps track of workers and really only wants them to do one of the surveys, but posts multiple so they are higher up in the search. This is bad for reputation… which brings me to the next tip.

4. Keep a good reputation. All requesters know about the stats of workers: things like rejection rates, etc. A lot of requesters don’t know about sites like Turker Nation and Turkopticon. Turker Nation is a forum for workers, where among other things they talk about the quality of requesters. Even more useful for workers to gauge the quality of a requester is Turkopticon. This is a browser extension that allows users to hover over the names of requesters and see the Turkopticon ratings for that requester. Turkopticon keeps ratings for Generosity, Speed of payment, Fairness, and Communicativity. So answer your emails, pay people quickly (don’t wait a week for it to auto-pay), give real reasons to reject (or better yet, be lenient with your $0.50 and give a warning if you think they tried but still failed), and don’t pay 10 cents for an hour-long task.

5. Consider the time. A few years ago I did a survey and discovered that most workers do tasks during their work day as a way to waste time and get paid, instead of wasting time on Facebook and getting fired. This has probably changed some in the last 3 years, but I still find that posting early in the week and in the morning gives you an edge over posting at night on Friday.
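To put the pay figures from tip 1 in hourly terms, here is a back-of-the-envelope calculation as a small JavaScript sketch. The function name hourlyRate is made up for illustration, and it assumes workers take exactly the advertised time, which they often won’t.

```javascript
// Sketch only: convert a per-task payment into an effective hourly rate.
function hourlyRate(paymentDollars, minutes) {
  return paymentDollars * (60 / minutes);
}

console.log(hourlyRate(1.00, 15)); // 4  -> $1.00 for a 15-minute study is $4/hour
console.log(hourlyRate(0.50, 10)); // 3  -> $0.50 for a 10-minute task is $3/hour
```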

There are dozens more examples. I will probably make this a continuing series, but don’t hesitate to share your ideas. I still have no explanation for the recent disparity in times and would love to hear what you have to say.

The death of the confirmation code… hopefully

The confirmation code is an interesting psychological experiment in and of itself. We receive them for every digital transaction that we take part in, from purchasing a book to paying for dinner with a credit card. In our minds that number is our guarantee that what we just experienced actually happened and that we did not merely purchase that copy of 50 Shades of Grey (what??? I mean A Song of Ice and Fire!) in our imaginations. Without one, in some instances we can survive. Who looks at that confirmation code on their credit card receipts after all? However, in others we seem to be lost, uncertain of the future and whether we can go on. Or so it seems in Amazon Mechanical Turk.

When I helped kickstart the use of MTurk at the Anderson School of Management at UCLA a few years ago, we needed a way to connect the data file on MTurk with the data file for our experiments. I’m sure I’m not the first person to think of this simple solution, but I suggested we assign every participant a unique code and have the participant enter it in the MTurk HIT before submitting it. It caught on there, and from the looks of the landscape it’s how everyone else does it too. It’s so ubiquitous, in fact, that when I stopped using them in favor of what is, in my opinion, a better practice, I started receiving 30 emails per study from workers asking where their confirmation code was.

The problem with the confirmation code in MTurk is the error rate. Sometimes as many as 10% of people enter the code wrong. Also, it’s a pain to give each participant a unique code, and it’s annoying to carry these extra columns in the data files. I wanted to eliminate human copy & paste error, and use data that is already available. To do this I started storing each participant’s Amazon Worker ID in my experiment data files. The ID is automatically stored in the MTurk data file, and sending it to and storing it in Qualtrics or other online survey systems is a breeze. The only problem, other than the uncertain feeling of the missing confirmation code, is that the worker has to accept the HIT before you can access the ID. This is only a small problem.

Below is a simple JavaScript snippet that creates a hyperlink that only sends people to a URL if they have already accepted the HIT. Once the HIT is accepted and the link is clicked, the worker’s workerId is appended to the URL as a query string variable. The code can be copied and pasted into the Source window of the MTurk HIT template. Then you simply have to replace the surveyUrl with the URL of your survey. The code is purposefully rudimentary (for JavaScript at least) so anyone can use it. If you know JavaScript, you can edit the code how you like, using buttons and events in place of the anchor, etc.

<script type='text/javascript'>
  var surveyUrl="";//The url you want to send people to.
  function gotoSurvey()
  {
    var href=window.location.href;//Get the url of the loading document (in mturk this is the iframe the HIT content is in, not the url of the page itself)
    var queryString={};//Create an empty object to dump query string variables in.
    href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"),function($0, $1, $2, $3){queryString[$1] = $3;});//Use a regular expression as well as the nifty alternative second parameter for the String.replace() method to dump all query variables into the queryString object
    if (queryString['workerId'])
    {
      window.open(addQueryVar(surveyUrl,'workerId',queryString['workerId']),'survey_window');//add workerId variable to URL and open popup. Edit the link text in case of popup blockers
      document.getElementById('survey_anchor').innerHTML="If your popup blocker prevented the survey window from opening, disable it and click this link again.";
    }
    else
    {
      //No workerId variable. Worker hasn't accepted HIT -> ask to accept
      alert("You have not accepted the HIT yet. Please do so before clicking this link");
      document.getElementById('survey_anchor').innerHTML="Please accept the HIT, then click this link again";
    }
  }
  function addQueryVar(url, name, value)
  {
    //Find anchor in URL since you can't add the query string after an anchor
    var fragmentStart = url.indexOf('#');
    if (fragmentStart < 0) fragmentStart = url.length;//no anchor so add variable to end
    var urlBeforeFragment = url.substring(0, fragmentStart);
    return urlBeforeFragment+(urlBeforeFragment.indexOf('?') < 0 ? '?' : '&')+encodeURIComponent(name)+'='+encodeURIComponent(value)+url.substring(fragmentStart);//if there are already query string variables add the new variable with &, if not, add with ?, making sure to URI encode the content.
  }
</script>
<a id="survey_anchor" name="survey_anchor" href="javascript:gotoSurvey()">Click here to go to the survey</a>
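If you only need to support newer browsers, the same idea can be sketched with the standard URL and URLSearchParams objects instead of a regular expression. This is an alternative sketch, not the code I use in my HITs; the function name buildSurveyLink and the example URLs are made up for illustration.

```javascript
// Sketch only: builds the survey link for a worker, or returns null when the
// HIT has not been accepted (no workerId in the page URL).
function buildSurveyLink(pageHref, surveyUrl) {
  const workerId = new URL(pageHref).searchParams.get('workerId');
  if (!workerId) return null; // preview mode: worker has not accepted the HIT
  const target = new URL(surveyUrl);
  // set() URL-encodes the value and keeps any existing query variables and #fragment
  target.searchParams.set('workerId', workerId);
  return target.toString();
}

// Hypothetical example URLs, not real MTurk or survey addresses.
const accepted = 'https://example.com/hit?assignmentId=A1&workerId=W123';
const preview = 'https://example.com/hit?assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE';
console.log(buildSurveyLink(accepted, 'https://survey.example.com/s?x=1#top'));
// https://survey.example.com/s?x=1&workerId=W123#top
console.log(buildSurveyLink(preview, 'https://survey.example.com/s?x=1#top'));
// null
```

In a HIT you would pass window.location.href as pageHref and hand the result to window.open(), showing the “please accept the HIT” message when the result is null.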

You no longer have to have subjects enter anything in your Mturk HITs before submitting them. Mturk does require you have at least one input in your code, so I usually put a comment box: <textarea id="comments"></textarea>, and let the users tell me about typos and stuff that I missed.

Storing the ID in Qualtrics is easy. All you have to do is create an embedded data element in the Survey Flow menu. Name the embedded data field workerId (all lowercase except for the I). You are done (leave the value unset). It will grab the value from the URL and store it in your Qualtrics file. The workerId field is already stored in the Mturk data file, so simply sort them, or use VLOOKUP in Excel, or however else you transfer who to accept and who to reject to Mturk.
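If you would rather script the matching than use VLOOKUP, the logic is just a set lookup. Here is a minimal sketch in JavaScript, assuming you have already read the worker ID columns from both files into arrays; the function name splitByCompletion and the sample IDs are made up for illustration.

```javascript
// Sketch only: decide which MTurk submissions have a matching survey response.
// mturkWorkerIds: worker IDs from the MTurk results file.
// surveyWorkerIds: workerId values stored by the survey tool.
function splitByCompletion(mturkWorkerIds, surveyWorkerIds) {
  // Trim whitespace so stray spaces in either file don't cause false mismatches.
  const completed = new Set(surveyWorkerIds.map((id) => id.trim()));
  const approve = [];
  const review = [];
  for (const rawId of mturkWorkerIds) {
    const id = rawId.trim();
    (completed.has(id) ? approve : review).push(id);
  }
  return { approve, review };
}

// Hypothetical IDs for illustration.
const result = splitByCompletion(['W1', 'W2', 'W3'], ['W3', 'W1 ']);
console.log(result.approve); // W1 and W3 have survey data
console.log(result.review);  // W2 submitted the HIT but has no survey record
```

I call the second list “review” rather than “reject” on purpose: a missing record may be a survey glitch rather than a worker who skipped the study.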

While confirmation codes psychologically tell us that something actually happened and there is a documented way to look it up, in MTurk they are fraught with problems. I can only hope that by showing people some code that will make their research lives easier, it will also help me by reducing the number of worried emails I receive each time I post a study.
