A better way to ask for currency responses in Qualtrics

Willingness to Pay (WTP) questions are ubiquitous in marketing research (my field), as well as many others. When asking for WTP in Qualtrics, you normally have to settle for some sub-par work-arounds. When asking in an open-answer text box, you can leave no validation, but you risk getting a lot of gibberish that isn’t quite missing data but isn’t quite usable data either. If you add a numerical validator, participants may get annoyed at the inevitable error messages. They try to type something like $300, but to a computer, that’s not a number (it contains the dollar sign, so it gets interpreted as a character string). You could also use something like slider bars, but those set artificial anchors that could bias results.

I solve this problem with regular expressions. If you are not familiar with regular expressions, they are a way of describing a pattern within a string of characters. For instance, an email address is any number of letters, numbers, and certain special characters, followed by the @ symbol, followed by any number of strings of letters, numbers, and hyphens that are optionally separated by periods, followed by a period and a letter string of between 2 and 4 characters (it’s actually a little more complicated, but that covers 99% of email addresses). In regular expression coding, with case-insensitive matching, that is represented by the following: ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$

If you think it looks complicated, it’s because it is. After 7 years of programming, I still use cheat sheets when I write regular expressions. Recently, I decided I was sick of using Qualtrics’ numeric validator and inserting the instruction “Only use numbers and optionally a decimal point. Do not use dollar signs or commas.” every time I asked for a response in currency format. Qualtrics allows you to create your own validations, and one option is to match the inputted text to a regular expression. With some writing, some tinkering, some forum searches, some more tinkering, and a lot of testing, I settled on the following expression to validate US currency (it is not difficult to change this to other formats).

    ^[+-]?\$?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?$

That regular expression optionally allows the person to specify positive or negative values ([+-]?), optionally followed by a dollar sign (\$?). The next part is a little complicated. The point of it is to optionally allow a thousands separator (typically a comma in the US). I defined that as 1 to 3 digits ([0-9]{1,3}), followed by groups that each consist of an optional comma and exactly 3 digits ( (?:,?[0-9]{3}) ). That group can repeat any number of times, including zero ( * ). Optionally, it allows decimals by requiring a period followed by exactly 2 digits ( (?:\.[0-9]{2})? ). The ^ at the beginning says that whatever is typed has to start there. The $ at the end says whatever is typed has to end there. Both together mean that the entirety of what they type has to match what is between the two symbols. Otherwise someone could type “asdlk;fjasdf$300.00” and it would validate, because it can find a valid string within the whole of the text.

I explained it out in detail in case you are in a different country with different currency symbols and formats. To change the dollar sign, just change the first dollar sign to your currency symbol (other dollar signs have special meaning and have nothing to do with currency, so leave them). If you don’t use commas as thousands separators, but use something else, change the comma within ?:,? to your thousands separator. If you don’t use thousands separators at all, you could just leave it the way it is assuming no one would use them if they don’t know they exist, or you could remove the ?:,? entirely. If you use something besides a period as a decimal indicator (e.g. a comma) replace the \. in ?:\. with your decimal indicator (e.g. ?:,).
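If you would rather not edit the pattern by hand, the substitutions described above can be sketched as a small JavaScript helper. This is only an illustration: the function names escapeRegex and buildCurrencyRegex are made up for this example, and Qualtrics itself just needs the finished pattern pasted in.

```javascript
// Escape characters that have a special meaning inside a regular expression,
// so that a symbol such as "$" or "." is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the same pattern as in the post for a different currency format.
// All three parameters are plain strings, e.g. ("$", ",", ".") for the US.
function buildCurrencyRegex(symbol, thousandsSep, decimalSep) {
  const sym = escapeRegex(symbol);
  const thou = escapeRegex(thousandsSep);
  const dec = escapeRegex(decimalSep);
  return new RegExp(
    "^[+-]?(?:" + sym + ")?[0-9]{1,3}(?:(?:" + thou + ")?[0-9]{3})*(?:" + dec + "[0-9]{2})?$"
  );
}

const us = buildCurrencyRegex("$", ",", ".");
const euro = buildCurrencyRegex("€", ".", ",");

console.log(us.source);                // the pattern you would paste into Qualtrics
console.log(us.test("$1,234.50"));     // true
console.log(euro.test("€1.234,50"));   // true
console.log(euro.test("€1,234.50"));   // false
```

The symbol and separators are wrapped in their own groups so that a multi-character symbol stays optional as a whole.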

The regular expression above matches as many formats as I could think of.
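A quick way to see what the pattern accepts is to run it against a handful of inputs. The sketch below uses JavaScript's regex engine rather than Qualtrics, and the sample strings are my own picks for illustration, so treat it as a sanity check rather than a guarantee of how Qualtrics will behave.

```javascript
// The validation pattern from this post, written as a JavaScript regex literal.
const usCurrency = /^[+-]?\$?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?$/;

// Sample inputs chosen for illustration only.
const samples = [
  "300", "$300", "$3,000", "3000.00", "-$1,234,567.89", "+45.50",
  "$300.5", "300 dollars", "asdlk;fjasdf$300.00",
];

for (const s of samples) {
  console.log(s.padEnd(22), usCurrency.test(s) ? "valid" : "rejected");
}
```

The first six samples validate. The last three are rejected: one has a single decimal digit, one has trailing text, and one has leading text.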


It isn’t perfect, however. Someone could put in a value like $3000000,000, and it would match, even though with only 1 thousands separator it is hard to know what the person meant. I couldn’t figure out a way to require proper thousands separators if they are used. However, this kind of problem would be so rare that I can’t imagine ever seeing it.
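For what it is worth, one possible way to tighten this is to offer two alternatives: either properly grouped digits with a mandatory comma before every group of three, or plain digits with no commas at all. This variant is not the expression I use and I have not tested it in Qualtrics; the name strictCurrency is just for this sketch.

```javascript
// Hypothetical stricter variant: commas, if used at all, must be used everywhere.
//   [0-9]{1,3}(?:,[0-9]{3})+   properly grouped, e.g. 3,000,000
//   [0-9]+                     no separators at all, e.g. 3000000
const strictCurrency = /^[+-]?\$?(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]{2})?$/;

console.log(strictCurrency.test("$3,000,000"));    // true
console.log(strictCurrency.test("3000000.00"));    // true
console.log(strictCurrency.test("$3000000,000"));  // false
console.log(strictCurrency.test("12,34"));         // false
```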

To use it, click on your text entry question. In the Validation Type area of the menu on the right, select Custom Validation. The logic should read IF [your question] [there’s only 1 option for the second drop down menu] [MATCHES REGEX] [^[+-]?\$?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?$]. Note that the outer brackets are only there to separate the different drop down and text entry menus. Do not leave those brackets in place when pasting in the regular expression.

The final piece of the puzzle is how the data is stored. You could do some JavaScript work to have Qualtrics save the currency as a number. However, Excel is good at noticing currency and changing it to a number, so I figured I’d do the conversion in the data cleaning phase rather than adding complicated JavaScript to every question.
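If you do want the conversion in code, the core of it is small. The sketch below is plain JavaScript and does not touch any Qualtrics API. It assumes the text has already passed the validation above, and the function name currencyToNumber is made up for this example.

```javascript
// Convert a validated US currency string such as "$1,234.50" to a number.
// Assumes the input already matched the validation pattern from this post.
function currencyToNumber(text) {
  // Drop the dollar sign and thousands separators; keep sign, digits, decimal point.
  const cleaned = text.replace(/[$,]/g, "");
  return Number(cleaned);
}

console.log(currencyToNumber("$1,234.50")); // 1234.5
console.log(currencyToNumber("-$300"));     // -300
console.log(currencyToNumber("+45.50"));    // 45.5
```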

Random thoughts: No, America doesn’t format their dates incorrectly

It’s a common criticism of America that the way we format dates is illogical. It’s so common that I now roll my eyes when I see images like this on Facebook:

If you are unaware of the criticism, the month, day, year format (M-D-Y) can be rephrased as, “Medium size unit, small size unit, large size unit.” Those outside of the US typically format dates as D-M-Y or Y-M-D, because it follows the logical progression of the sizes of the units. Days are smaller than months which are smaller than years! Our forefathers (or whoever established the M-D-Y format) must have been idiots! However, I would bet that most people can’t really argue as to why size of the unit is the best way to order the format. To provide some support for why the M-D-Y format makes sense, I offer the argument that the American format orders the units by importance of the information the unit conveys, in common usage.

How often do you write a date without the year? If you are telling someone a deadline next month, you may write it as, “Send me the file by 10/15 (October 15th).” The year is often meaningless. People can assume by context that you are talking about the current year. Similarly, people often write dates without the day of the month. “Yes, we’re getting married in 10/15 (October of 2015).” Again, the context tells the listener that they are giving a general time, and the day of the month doesn’t matter. Now, how many times have you seen a date without the month, such as “I’m moving on the 10th of the month in 2015?” The only instance I can think of is when giving a single unit of time (e.g. “Get it back to me by the 15th” or “We’re getting married in 2015”). However, in these cases formatting doesn’t really matter since there is only one format. The logical conclusion is that across the several formats of dates, it makes sense to always put the month first, since it is the only constant piece of information.

The units also carry different amounts of information. For instance, if you remove yourself from a specific time, days of the month carry virtually no information. Saying “the 15th of the month” is virtually meaningless outside of where that is relative to the current date (e.g. “that’s 2 weeks from now”). The 1st of the month is associated with rent payments, etc., but I would argue that such information is used infrequently. Months, however, include information about the seasons and other activities that recur yearly. If you are in school, August or September are probably associated with going back to school. December is associated with holidays. When you are thinking about celebrating someone’s birthday, the month generally carries the most important information. Sure, the day of the month is necessary, and the year tells you their age, but the month helps you determine how far away it is, when to start planning, who else to group the birthday with (e.g. if you have a single office birthday party once a month), etc.

I, of course, am not saying MM-DD-YYYY is the best format. Instead, as Howard Moskowitz may say, “There is no best format, only best formats.” When dealing with recurring events, the month is often the most important piece of information, and learning that information first makes sense. However, the year is often very important, so the international standard of Y-M-D makes the most sense in many cases (especially on computers, where it is the easiest to sort by date). Really, though, I’m not even advocating for M-D-Y’s normativity. Instead, I argue that when people don’t understand why something is the way it is (especially if that thing differs from what you expect), it is better to put some thought into figuring it out than to assume everyone is dumb.

How to email multiple MTurk workers

I frequently want to contact Mechanical Turk workers who have completed studies for me. Sometimes they say they are interested in the study and want more debriefing info. Sometimes I just want to thank them for notifying me of a typo or error that I didn’t catch. Sometimes I want to offer a follow-up opportunity to people who have participated before. It is a violation of Amazon Mechanical Turk policies to require workers to divulge their email address or other identifying information. Some people (including me) have avoided outright breaking that policy by entering the grey area of “asking” for the email address but not making HIT completion contingent on entering a valid address. Another way to contact workers would be to use the Requester User Interface (RUI). However, it’s difficult to A) figure out whether it is possible to contact workers through the RUI, B) figure out how to do it, and C) send emails to multiple workers, since the RUI requires you to do it one at a time.

MTurk does allow you to batch process emails to workers, but only through their API. The API allows people to build their own software to utilize MTurk features, including more features than what is available on the website. One such feature is sending emails, and allowing a computer to send out 500 emails as opposed to doing it manually saves a lot of time. The API requires using the .NET Framework, which is only implementable in languages I don’t have much experience with. My goal was to create a web service to allow anyone to log in and send emails. However, earlier this summer I came across an R library on GitHub that implemented many, if not all, of the API’s resources in a single package. That library has recently (9/8/14) been published on CRAN (making it easier to find and use). I want to thank Thomas Leeper for saving me the time of both learning Ruby and programming a web service in Ruby. Since Mr. Leeper (Dr. Leeper I believe, actually) implemented it in R, it won’t function as a web service, unless someone feels like setting up an R server with it. However, it is probably better this way. Using the MTurk API requires telling the program your Amazon Web Services (AWS) credentials. With those credentials someone can have full access to your entire AWS account, including credit cards on Amazon Payments and thousands of other pieces of information you’d rather not let people have. Thus, security is a concern, and using the MTurk API does carry some risks that you should be willing to accept before using it.

Below is a step-by-step guide to emailing workers. It is written assuming you have no knowledge of the R programming language.

  1. Install R and RStudio. RStudio is technically optional, but the rest of the instructions assume you have it.
  2. Open Rstudio.
  3. In the console (lower-left is the default location), you should see a > on the bottom. Place the cursor there and paste the following:
    install.packages("MTurkR")
    then hit Enter. The console should show the steps of installing (downloading, etc.) until it says that the package was successfully unpacked.
  4. Create a new R script by clicking File > New File > R Script (or Ctrl+Shift+N on PCs).
  5. Copy the following code into the script:

    library(MTurkR)
    credentials(keypair=c("AccessKeyID goes here","SecretAccessKey goes here"))

    contact(subjects="Email subject line goes here",
            msgs="Email body goes here",
            workers="WorkerId goes here")

    #Note: In the function call below, you can have multiple workers, but only one email subject and body. If so, the same email will go to all the workers.
    contact(subjects=c("Email subject 1", "Email subject 2", "…"),
            msgs=c("Email body 1", "Email body 2", "…"),
            workers=c("WorkerId 1", "WorkerId 2", "…"))

  6. In the line that starts with credentials, replace AccessKeyId goes here with your AWS access key and SecretAccessKey with your secret access key. To find your credentials, do the following:
    1.  Log in to the AWS Security Manager with the log in info for your Mturk account.
    2. Open the Access Keys panel.
    3. Click “Create New Access Keys.”
    4. Copy the Access Key and Secret Access Key and paste into the R script
      1. Note to those who know more than I do. It is my understanding that AWS no longer lets you view Secret Access Keys and you have to create a new one in order to see it. If there is a way to view a Secret Access Key without creating a new one, please email me or leave a comment.
  7. Save your R script. You will edit this file each time you want to send an email out, so put it somewhere you remember.
  8. To send an email, edit one of the two “contact” functions. The first one is for sending a single email to a single worker (note you can only send emails to workers who have done work for you). The second one is for sending multiple emails (e.g. multiple emails to the same worker [not recommended], the same email to multiple workers, or distinct emails to multiple workers). I’ll assume you want to send one email to multiple workers, so the instructions below edit the second contact() function.
    1. The parameter “subjects” is the subject line. You are assigning a list of subjects to that variable. If the list has only one entry, everyone will get the same subject. If you have 500 workers, you can create 500 separate subjects if you want. You just separate them by a comma. For the subject and every other item you will edit, make sure everything is enclosed within quotation marks.
    2. The parameter “msgs” is the body of the email. Similarly, you can create a list of separate email bodies or just put in one.
    3. The parameter “workers” is the WorkerId of the workers you want to contact. You can get worker IDs from the RUI if you don’t collect them yourself. Simply log into the Requester site, and go to Manage, then open a HIT posting, then download the CSV of the data. In the CSV are a lot of columns. One is labeled WorkerID. Entries in that column are what you are looking for. You can also view worker IDs in the main Review Results page. They’re in one of the default columns of results or, if they are not showing, you can click “Customize View” to have them show.
    4. Assuming you want to send one email to a list of workers, the contact function should look something like this:
        contact(subjects=c("Test email"),
                msgs=c("This is a test email from Mturk"),
                workers=c("ID1", "ID2", "ID3"))
  9. Execute the code.
    1. Place the cursor on the first line (starts with library) or highlight the entire line. Hit CTRL+Enter (for PCs) or Command+Enter (for Macs, I think). This loads the MTurkR library.
    2. Place the cursor on the line that starts with credentials or highlight the entire line. Hit CTRL+Enter or Command+Enter. This stores your credentials temporarily, to use when the program communicates with Amazon’s servers.
    3. Highlight the entire contact function you wish to execute (i.e. from the c in contact to the final parenthesis after all the worker IDs). Hit CTRL+Enter.
    4. Look at the console to see if everything executed fine. You should see a list of workers that were notified. If some weren’t, it’ll tell you why (e.g. worker ID invalid, worker hasn’t completed work for you, etc.). If you see an error, it may give you a hint as to why. For instance, perhaps you are missing a quotation mark surrounding your entries. Perhaps you are missing a comma between worker IDs.

Once everything has executed, you are finished. You can close the file, then reopen it next time, edit it again (step 8), then execute the new code (step 9). You only need to install the MTurkR package once, but you need to load it every time (the library command) and enter your credentials every time (the credentials function).

There are A LOT of other things the MTurkR library can do, such as setting up qualifications as well as qualification tests, publishing and managing HITs, etc. I may tinker with other features and post my findings, but for now, MTurkR is a great resource even if you only use it for contacting workers.

Preventing Mturk workers from prematurely submitting HITs (by altering the submit button’s functionality)

If you’ve ever used the Mturk web interface for creating HITs, you know it’s lacking in several areas. One that can be particularly annoying is the submit button that gets added to the end of every template you make. It looks like a plain HTML submit button, and is a half inch away from your normal content. I used to receive emails constantly from people who accidentally submitted their HIT before completing it. Never one to cry over 25 cents, I approved them, but answering those emails took too much time.

My initial solution was simple. I put about 10 blank lines at the end of my HIT template, followed by a horizontal bar, followed by instructions to only click the submit button once the HIT is complete. That stopped about 99% of the emails, which to me was a success. This is still all I do for my HITs, and if you are having this problem, just add the HTML code below to the end of your HITs.

 <br /><br /><br /><br /><br /><br /><br /><br /><br /><br />
 <hr />
 <p>After you have completed everything in this HIT, click the submit button below.</p>

However, a friend recently wanted something more foolproof. I can’t remember the exact reasoning, but my friend wanted the submit button gone entirely. If you need 100% assurance (good luck; you’ll never get it), you can try the method below. The logic is to first hide the submit button, which is a simple CSS modification. Once you want the user to submit the HIT, you can “click” the submit button for them by using the JavaScript click() function. What’s in the middle is really up to you. The code below replaces the submit button with a different button. That button queries a database which records when workers complete tasks within HITs. If a task is complete but the HIT is not submitted, the script submits the HIT. If the task is not completed, it alerts the user that they have not finished the task yet. This naturally assumes you have a database that records this information. Likely you do not. However, if you use confirmation codes or secret end-of-survey passwords, you can use JavaScript to check if the passcode is correct or not. If it is incorrect, they are either trying to cheat you out of 25 cents, or forgot the passcode (which is, of course, a problem).

 //Requires jQuery. If you want to do the selection and the AJAX by hand, feel free, but it's much easier for me to just use jQuery.
 $(document).ready(function(){
   //Hide submit button
   $('#submitButton').hide();
   //Create new submit button with validator function when clicked
   $("<button/>").attr({'id':'newSubmit','type':'button'}).text("Check Submission").click(function(){
     //Get location, which includes the worker ID, create an object to hold query string variables, and parse the query string
     var href=window.location.href.toString();
     var qs={};
     href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"),function($0, $1, $2, $3){qs[$1] = $3;});
     //The url below would be whatever script you use to query the database. The query string includes the worker ID and the surveyId,
     //so you know who completed what. The callback=? is jQuery's method of doing JSON with padding, which is necessary due to the
     //cross-domain scripting (since this will be implemented in mturk). surveyId must be defined elsewhere in your template.
     var url=""+qs['wid']+"&surveyId="+surveyId+"&callback=?";
     $.getJSON(url,function(data){
       //Assumption: your script returns an object with a "completed" flag. Adjust this test to whatever your script returns.
       if(data.completed){
         //Listed as completed in the database
         $('#submitButton').click();
       }else{
         //Listed as something other than completed
         alert("Our databases indicate you have not completed the survey yet. Please make sure you have completed every portion of the HIT before submitting.");
       }
     });
   }).insertBefore('#submitButton'); //Assumption: put the new button where the original one sits
 });
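The query-string parsing step in the code above can be tried on its own outside the browser. The sketch below wraps the same regular expression in a function and runs it on a made-up URL. The example.com address and the function name parseQueryString are placeholders for illustration; "wid" is simply the parameter name the code above reads.

```javascript
// Same regular expression as in the click handler above, wrapped in a function.
function parseQueryString(href) {
  const qs = {};
  href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"), function ($0, $1, $2, $3) {
    qs[$1] = $3; // $1 is the key, $3 is the value (undefined when there is no "=")
  });
  return qs;
}

// Hypothetical URL, for illustration only.
const example = "https://example.com/hit?wid=A1B2C3&surveyId=42";
const parsed = parseQueryString(example);

console.log(parsed.wid);      // A1B2C3
console.log(parsed.surveyId); // 42

// Built-in alternative that avoids the regular expression entirely.
const viaUrl = new URL(example).searchParams.get("wid");
console.log(viaUrl);          // A1B2C3
```

Note that the regex approach also stores the part of the URL before the ? as a key with an undefined value, and it does not decode percent-encoded characters. Neither matters for a plain worker ID.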

As I mentioned, there are a lot of problems with this kind of implementation. You’re risking screwing over a good worker to save yourself 50 cents, since unless the person is good at manipulating your HTML directly in their browser console (which if you didn’t know is very easy to do) they cannot submit the HIT. One quick fix for that is to replace the line of code toward the end with a confirm() message instead of an alert() message. The confirm message would tell the user that it looks like they haven’t completed the HIT yet, but they can submit anyway if they want to. That code would look like this:

 //Replace the line of code with the alert() function with these two lines of code.  
 var sub=confirm("Our databases indicate you have not completed the survey yet. Please make sure you have completed every portion of the HIT before submitting. If you think this is an error, and you have completed the HIT, click OK to submit the HIT. Click Cancel to continue working on the HIT.");  
 if(sub) $('#submitButton').click();  

That’s about all I can say about preventing workers from submitting HITs before they complete the task. Again, in my experience this is a non-problem. Simple HTML formatting fixes it. If someone is nefarious, there is not a lot you can do to stop him or her from stealing your payment other than rejecting workers after the fact. My philosophy is to provide the best experience I can for the workers, and I suppose adding the confirm() option isn’t a bad idea. I’ll leave you with an alternative function that validates based on a password you give your workers at the end of your survey. It uses the confirm() method, not the alert() method, since I think completely preventing someone from submitting a HIT will do more harm than good.

 //To use this function, copy it and replace the function that is currently an argument in the click() function in the code above
 function(){
      //Someone can look up the password in the console, since they can see the JavaScript, but it's doubtful. Again, nothing's ever foolproof.
      var password = "PUT YOUR PASSWORD HERE";
      //Get password entered by user. The code assumes the text box which the user puts the password in has the id: password_input
      var pass = $('#password_input').val();
      //If nothing was entered, alert user and exit function
      if(pass=="") {alert("Please enter the password given to you in the survey before submitting the HIT.");return;}
      //Submit right away if the password matches; otherwise ask the user whether to submit anyway
      var submit = (pass==password) ? true : confirm('The password you entered does not match the password displayed at the end of the survey. Please make sure the survey is 100% complete before submitting the HIT. If you would like to submit anyway, click OK. However, you run the risk of having your work rejected. If you would like to continue working on the HIT, click Cancel.');
      if(submit) $('#submitButton').click();
 }

The Rise and Fall of Kitchen Gadgets

As I’m sure the greats before me did, I get a lot of my research inspiration while shopping at Goodwill. Recently, I had just purchased a replacement cover for my ice cream maker when I was surprised to see my exact model sitting on a shelf. A week later there were 2 of them. At another store there were 3. All. The. Same. Model. Why? I tested one out. It spun. That’s all an ice cream maker does really. The cold part isn’t electric and can’t really break. Well, it turns out the Cuisinart ICE-20 was insanely popular back in 2009. In 2013, apparently not so much. People are literally giving them away. My wife commented that they are the new bread machine. Everyone has one, one day. Everyone donates it to Goodwill the next.

Ice cream makers aren’t the first kitchen gadgets to have a meteoric rise and fall. Like I just said, bread machines were huge in the 90s. Electric knives made quite a splash in 1981. Fondue sets were famously over-popular in the 1970s. All these still exist today, but are hardly considered must-haves. So what caused their popularity and subsequent rejection? Impossible to answer. All of these occurred during food fads that required their use, but who knows which causes which (the fad or the popularity). Also, all these examples reflect what Alton Brown calls unitaskers. They are gadgets that are meant for doing one thing, and can’t really do much else. You can try to argue that by that definition, a knife is a unitasker since it only cuts, but you’d be wrong. Knives also stab. Seriously though, you can cut so many things in so many different ways that it doesn’t really qualify. Ice cream makers, on the other hand, really just make ice cream. Sorbet is doable, but that’s just ice cream without the fat. Products that do only one thing will tend to have a sharp but short period of popularity.

The examples above also reflect consumers’ desire to make their lives simpler or cheaper. Making food at home tends to be cheaper than going out. Dumping flour and water into a machine and having bread pop out 2 hours later is far easier than my 15-hour sourdough bread procedure (way better though… way better). However, the gadgets may not offer much help. Ice cream makers do something you can’t do otherwise, but the machine rarely gets cold enough, affecting quality. And the process of making the custard to put into the machine is pretty difficult itself. Bread machines are bulky, hard to put away, hard to get back out, and no one wants one left on the countertop taking up nearly an acre of counter real estate. Thus, products that are meant to make consumers’ lives easier but really don’t will also suffer from unitasker syndrome.

After those hypotheses, I am left with one question. Why do people actually get rid of them? Is it just a spring cleaning thing, or do people get rid of fad gadgets faster than they discard other items? My guess is the latter, but this will have to be borne out through data.

So what will be the next kitchen gadget I can walk into Goodwill and buy slightly used (or maybe untouched, still in the box) for 1/10 the original price? Sodastream comes to mind. It only makes soda. It saves you at most like 50 cents against a steep start-up investment. The only thing missing is the bizarre fad.

Don’t Hate the Requester; Hate the Game: An economic dilettante’s take on Mturk ethics

Maybe they are just not very vocal, but I’ve never come across an Amazon Mechanical Turk (mturk) worker who lauds the business practices of mturk. In my experience as both a requester and worker on mturk, I find ethical problems revolve around two primary concerns: 1) Workers are paid a pittance and 2) Requesters have free rein to reject work without justification. The second problem is hard to address. It basically comes down to “who watches the watchers.” Tools like TurkOpticon put some power into the workers’ hands, but probably not enough. However, it also seems to not be as large of a problem. People may occasionally get shafted, but in the big picture the effects seem to be small compared to the low payment rate (not to say it doesn’t suck though).

The low payment rate is an interesting moral question because it is somewhat unique, at least to those involved. Opponents argue that a sizable percentage of workers use mturk as a primary or secondary income, and they should be earning a living wage. Proponents argue that mturk was not meant to be used that way. Unlike the Walmart wage debate, no requester is trying to hire a worker even at part-time. They would say they are merely putting a task out there. If it pays too little then no one will accept it; forcing a higher payment rate would simply mean they would go elsewhere to get their data. Everyone loses. This reasoning is not infallible, and the rhetorical arguments can go on forever.

Instead, I have been thinking more about the economic argument. I’ve covered about 2 weeks’ worth of game theory in my economic principles of marketing course (essentially a whirlwind microecon 1 and 2 course covered in like 5 weeks). So naturally, I am now an expert ready to analyze important, broad, real-world topics. (Side note: if you didn’t catch that, I’m not an expert and don’t claim that what I write below is at all the way a world-class economist would analyze the situation… hell, I may have even made a wrong assumption. If you can do better, join the discussion.)

At the most shallow level, mturk, like all employee-employer relationships, is like a Prisoners’ Dilemma game. The incentives in place push for workers to not work hard, and push for employers to pay low wages. In the table below, workers can choose to work or loaf (i.e. give junk work), and requesters choose to pay a high wage or a low wage. Each party chooses a strategy at the same time with perfect knowledge of the other person’s strategy. By perfect knowledge, I mean that both the employer and employee know the payoffs to each party. Thus, they know what the other person will do given the strategy they pick.

Requester / Worker | Work     | Loaf
High wage          | v-h, h-a | -h, h
Low wage           | v-l, l-a | -l, l

The first amount is the outcome for the requester, after the comma is the outcome to the worker. v=value of work, h=high wage, l=low wage, a=effort of working.

The equilibrium of this game (i.e. the point where neither player has an incentive to deviate from that strategy) is for the worker to loaf and the requester to pay a low wage. This is not how mturk operates, but it provides a baseline to analyze against. The low wages seem to exist, but the loafing does not. If it did, no one would use it for data purposes. However, these are the things people worry about because without checks the prisoners’ dilemma, where everyone is worse off, would occur.
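To make the equilibrium concrete, here is a small JavaScript sketch that plugs numbers into the table and searches every cell for one where neither side gains by switching. The numbers are arbitrary picks of mine (any values with h > l and a > 0 tell the same story), and the function name pureNashEquilibria is made up for this example.

```javascript
// Illustrative numbers only; they are not estimates of real mturk values.
const v = 10, h = 5, l = 2, a = 1;

// payoffs[row][col] = [requester payoff, worker payoff]
const rows = ["High wage", "Low wage"];
const cols = ["Work", "Loaf"];
const payoffs = [
  [[v - h, h - a], [-h, h]],
  [[v - l, l - a], [-l, l]],
];

// A cell is a pure-strategy Nash equilibrium when neither player can do
// strictly better by switching strategy while the other player stays put.
function pureNashEquilibria(rowNames, colNames, table) {
  const found = [];
  for (let r = 0; r < rowNames.length; r++) {
    for (let c = 0; c < colNames.length; c++) {
      const [reqPay, workPay] = table[r][c];
      // Requester compares rows within the same column.
      const requesterHappy = table.every((row) => row[c][0] <= reqPay);
      // Worker compares columns within the same row.
      const workerHappy = table[r].every((cell) => cell[1] <= workPay);
      if (requesterHappy && workerHappy) {
        found.push(rowNames[r] + " / " + colNames[c]);
      }
    }
  }
  return found;
}

console.log(pureNashEquilibria(rows, cols, payoffs)); // [ 'Low wage / Loaf' ]
```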

Something that differentiates mturk is the ability for requesters to reject workers’ submissions and not pay them. Forgetting that requesters can be evil and reject everyone for no reason, we’ll assume that requesters have sufficient reason and have to put some effort into checking. In this case, workers can still work or loaf, but requesters can pay high and check, pay high and not check, pay low and check, or pay low and not check. Additionally, loafing has costs as well (it takes some time to complete a HIT with bad work, but not as much as it takes to complete a HIT with good work, hence d < a). They were not included before because they did not affect the equilibrium. Last, checking has costs as well (e.g. time taken to add an attention filter to a survey or get correct answers to compare to, research validity costs of making workers think you are watching them, time taken to actually check and reject bad work, etc.). This cost is reflected in the cells where the requester checks whether the worker loafed or not.

Requester / Worker   | Work       | Loaf
High wage, check     | v-h-c, h-a | -c, -d
High wage, not check | v-h, h-a   | -h, h-d
Low wage, check      | v-l-c, l-a | -c, -d
Low wage, not check  | v-l, l-a   | -l, l-d

Outcomes are in the same order. v, h, l, and a are the same. c=cost of checking, d=cost of loafing; d<a, c<v

Here there is no pure strategy Nash equilibrium (i.e. at no point in the table do both sides have no incentive to choose a different strategy). If the worker works, the requester’s strategy is to pay low and not check (that cell has the highest payoff to the requester). However, if the requester is going to do that (which the worker is aware of given perfect knowledge), then the worker would loaf since they do less work and get the same amount of money. Knowing that is the case, the requester would check, thus not paying the worker. Knowing the requester would check, the worker would work. Finally, knowing that, the requester wouldn’t check anymore. This unending cycle means each party’s strategy is chosen probabilistically (i.e. a mixed strategy Nash equilibrium) to maximize payoffs given beliefs about how likely an actor is to choose a particular strategy. This explains why some workers loaf and some work very hard. The loafers are playing a probability that they will be paid even though they did poor work. That probability exists because the requester is also playing a probability. Instead of checking work for each HIT, they only check some. Thus they save on the costs of checking, and provide some incentive to give good work. It is possible to work out these probabilities as a function of the costs and payments involved, but instead of going through all that math, I will make two reasonable assumptions to change the game to make it slightly easier to solve.

First, the cost of checking is rather low, especially for research experiments. Adding an attention filter requires little effort, and most people have a research assistant updating credit (thus there is no time component). Treating c as zero means the check and not check rows in each wage level pay the same to the requester when the worker works (v-h or v-l). Second, the cost of not working is sometimes as high as the cost of working. The amount of time saved is often negligible, and some people (those not doing it as a primary or secondary income) find the cognitive challenge of some research studies stimulating (that is, the studies that don’t kill you with boredom), making the cost of doing good work lower. Thus if we treat a and d as equal they can also be removed from the columns. This last assumption is less believable, but it also may not affect the equilibrium in a meaningful way.

After these assumptions are made, a pure strategy equilibrium exists (i.e. one of the cells is clearly preferred, and people do not choose probabilistically anymore). That strategy is for workers to work and for requesters to pay a low wage and check. This equilibrium somewhat reflects the state of mturk. Most workers do good work. Most requesters check to make sure they do, and 99% of HITs pay below the minimum wage if computed at an hourly rate. One interesting point is that after the assumptions stated above are made, the low wage-check and low wage-no check cells are identical. This means that while all the actors can assume requesters will check, requesters may not actually check, and only they would know (since information about checking is only known to workers if they do not work). After talking with lots of other researchers who use mturk, this reflects the state of the game. They check when it’s easy to do so, but find it isn’t always necessary and often do not check.
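
To sanity-check the two claims above (no pure strategy equilibrium in the full game, and one once c = 0 and a = d), here is a minimal JavaScript sketch that brute-forces best responses over the table. The function names and every numeric value are illustrative assumptions, not estimates of real mturk costs. The numbers are chosen so that h > l, d < a, c < v, and also c < l, which the "requester would check" step of the cycle relies on. Each cell is read as (requester payoff, worker payoff), and ties count as best responses.

```javascript
// Build the 4x2 payoff table from the post. Each cell is [requester, worker].
// All parameter values passed in below are illustrative assumptions.
function buildGame(p) {
  return {
    "High wage, check":     { Work: [p.v - p.h - p.c, p.h - p.a], Loaf: [-p.c, -p.d] },
    "High wage, not check": { Work: [p.v - p.h,       p.h - p.a], Loaf: [-p.h, p.h - p.d] },
    "Low wage, check":      { Work: [p.v - p.l - p.c, p.l - p.a], Loaf: [-p.c, -p.d] },
    "Low wage, not check":  { Work: [p.v - p.l,       p.l - p.a], Loaf: [-p.l, p.l - p.d] }
  };
}

// A cell is a pure strategy Nash equilibrium when neither side can do
// strictly better by switching while the other side stays put.
function pureNashEquilibria(game) {
  var rows = Object.keys(game);
  var cols = ["Work", "Loaf"];
  var found = [];
  rows.forEach(function (row) {
    cols.forEach(function (col) {
      var requesterBest = rows.every(function (otherRow) {
        return game[otherRow][col][0] <= game[row][col][0];
      });
      var workerBest = cols.every(function (otherCol) {
        return game[row][otherCol][1] <= game[row][col][1];
      });
      if (requesterBest && workerBest) {
        found.push(row + " / " + col);
      }
    });
  });
  return found;
}

// Full game: checking and loafing both have costs
var full = pureNashEquilibria(buildGame({ v: 10, h: 5, l: 2, a: 1, d: 0.5, c: 0.5 }));
// Simplified game: c = 0 and d = a
var simplified = pureNashEquilibria(buildGame({ v: 10, h: 5, l: 2, a: 1, d: 1, c: 0 }));

console.log(full);
console.log(simplified);
```

With these numbers the first list is empty and the second contains the two low wage cells in the Work column. The "not check" cell shows up only because the worker is exactly indifferent between working and loafing there, which matches the point that the two low wage cells look identical once the assumptions are made.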

Recall that two hazards existed in mturk. Workers can turn in poor work, and requesters can pay a low wage. Absent the ability to reject HITs by checking the work of workers, a Prisoner’s Dilemma existed where workers turned in poor work in exchange for a small wage. When the ability to reject was introduced, the optimal strategy for workers was to turn in good work. Rejection thus solves for one of the hazards of mturk. This ends up being beneficial for both parties since workers get paid more on average and requesters get a high value from the data received.

However, low wages still exist. For those who do mturk for fun, that might not be such a bad thing. There is some intrinsic reward, as I mentioned earlier, and it may be better than other ways to waste time. However, given the large number of people who use mturk as a source of income, the ethical dilemma still exists. What can be done to push the equilibrium to the high wage cells? If costs of checking were different for the two levels (i.e. if for some reason it was cheaper to check the work when paying high wages compared to low wages), it may be enough to make the low wage payoff lower. However, it is unclear if this is feasible. A better possibility is to have different values of work for low wage and high wage. If the value of v in the high wage cells were, say, double that of v in the low wage cells, v-h-c may be greater than v-l-c. What does that look like in reality?

1) If you can help it, don’t complete underpaid HITs. If it takes 4 times longer to get a HIT done at 25 cents than at 50 cents, people will pay 50 cents. This is because the value (v) at that low wage is much lower given that people usually want their HITs completed quickly.

2) Hopefully this happens naturally, and not purposefully, but poorer data (i.e. more noise, more loafers, etc.) that comes from cheap HITs is likely to cause higher payments. Again, the value of a cheap HIT becomes low because the data isn’t usable or it takes time to weed through the poor responses. This could backfire, though, by causing requesters to look elsewhere for data collection needs.

3) Really focus when you complete a high paying HIT. Confirm the researchers’ belief that if they pay better they get better data (this is a generally accepted assumption). I’ve seen some respondents complete a $5 HIT that should take 10 minutes in under a minute, solely because (I assume) they think there is a chance it’ll sneak by. Seeing data get worse as cost goes up instantly lowers the payment rate (i.e. the value of work at high payment rates becomes lower than the value of work at low payment rates). The example above is very isolated, however.

This simple economic analysis of mturk tells me that payment levels are where they are due to the incentive structure of mturk. Rejection prevents workers from not delivering good information, but giving good information regardless of payment causes low wages. Enforcing a type of minimum wage on HITs likely would push requesters out of the market, and return them to labs and expensive national online panel companies. Thus the primary way workers can help boost the wage level is to not complete low-wage HITs and ensure high-wage HITs are completed well. Seemingly obvious conclusions I know, but it was fun figuring out the game.

Social Consequences of Unforced Compliance: Reluctantly Joining Social Media

Twitter was always for overshare-ers to me. Instagram was wannabe creative-types without effort or talent. Klout? The attention whore’s measuring stick. One week ago, I started using all of those. After 7 or so days of being an oversharing, wannabe creative-type, attention whore I’ve noticed interesting behavior and changes in attitudes.

My social media awakening was less abrupt than I let on. I have had Friendster and LiveJournal accounts since 2003. I was also an early MySpace and Facebook adopter. The purpose of these tools was to connect personally with friends and family. Facebook still serves that purpose to me. I live far from virtually everyone I’ve ever known, and it’s nice to know they exist. Facebook is not an external tool to me though. When a friend’s account becomes a marketing extension of the company they work for, the prejudice (and pride) run deep.

My current thrust was spurred on by a Professor’s (Pete McGraw) repeated discussions about the importance of external visibility for research. Projects used to be important based on citation counts. In the future, real-world impact is likely to be of much greater importance. This wasn’t entirely new to me, but my thinking previously was, “Well, I’ll be super impactful when I come up with my popular audience book idea.”

In the span of about five minutes my thoughts went from, “I should probably update my blog more regularly” to “Maybe also dust off the dormant Twitter account and broadcast the blog better.” Then the ball started rolling really fast. “I guess I should look at my Klout score and see if improving it leads to more people reading the blog.” “Instagram impacts Klout a lot? I said never, but since the app is free let’s go for it.”

Festinger’s Cognitive Consequences of Forced Compliance basically proposed that being forced to commit an act, given certain conditions, changes attitudes to be in line with the behavior. Since my behavior wasn’t forced, did my attitudes change in those few minutes, which then changed my behavior, or does voluntarily changing behavior lead to the same (or stronger) attitude change compared to forced compliance?

I don’t really desire to answer that question. All I know is the attitudes certainly changed. I enjoy having additional channels to interact with friends. I enjoy cultivating a professional online persona. I enjoy seeing a larger readership of my blog.

Self-Reflective Things I’ve Learned Thus Far

  • I am old. I realize this is a very cliched joke to make when you are in your twenties and comparing yourself to teenagers, so that’s not really what I am doing. Instead, I’m making the slightly less cliched joke of pointing out when you are becoming everything you made fun of your parents about. Learning the norms of social networks is difficult. Sometimes I have these out of body experiences where I’m looking at myself like I’m watching a child learning to ice skate (or a grown 27 year old man who grew up in New England and never learned to skate learn to skate… blog pending). Luckily I’m cognizant of this, so I’m not too grandpa-on-facebook-like.
  • Instagram is not nearly as douchey as I had always thought. I’ve always felt like I don’t document important events enough (note this is nothing like the recent research on over-documenting… I don’t even have pictures from my honeymoon up the California coast), and Instagram provides a way to do that with double the positive reinforcement: friends connect, Klout score increases. Side note: Headlines like this one recently posted on Instagram’s Twitter account will always make me roll my eyes: “Top 10 Photographers [really???] on Instagram”
  • Interestingly, after joining Instagram and Klout, I felt the need to make fun of that fact. Internally, I apparently couldn’t accept the statement “I’m joining Instagram.” It had to be something like, “Even I think I’m crazy and stupid, but I’m joining Instagram anyway.” This is probably already documented, but the need for some semblance of attitudinal or behavioral consistency even when you are performing a completely inconsistent action seems like an interesting phenomenon (assuming it’s not just me).

How to randomize or shuffle an array in Qualtrics

Qualtrics does many things right. However, its vast capabilities sometimes make me think it can do things that it can’t. Unlike SurveyMonkey, where I just assume it can’t do anything, sometimes I think Qualtrics can do everything. Luckily, Qualtrics’ JavaScript capabilities mean that if you know some coding, you can do a lot of the things you thought were impossible.

Randomizing arrays of numbers is something Qualtrics can’t do (easily) without JavaScript. Technically, you can create a randomizer, create X branch elements inside the randomizer, set each branch to automatically occur (first set an embedded data element named a to the value 1, then set the branches to occur if a=1), create an embedded data element in each branch, and set the randomizer to randomly show X elements evenly. Pretty difficult, and even that won’t do everything. You would still need to add however you implement the random number, or use some piped text code to add that number to a different element, so you can end up with a full array after the randomizer ends. Regardless, it is difficult.

In Javascript the code is this (here it is on Github):

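function shuffleArray(array) {
  // Fisher-Yates shuffle: walk backwards through the array, swapping each
  // element with a randomly chosen element at or before its position
  for (var i = array.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1)); // random index from 0 to i
    var temp = array[i];
    array[i] = array[j];
    array[j] = temp;
  }
  return array; // shuffled in place, and also returned for convenience
}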

In a question in Qualtrics, you simply insert this code into the JavaScript editor. One way or another create your array (type it in, grab it from the question text, etc.; if you need help doing this let me know in the comments), shuffle it using this code, then one way or another use it. You can set the array to an embedded data element using the SurveyEngine.setEmbeddedData() function that is already available in the Qualtrics API (mentioned here). You can add it as text to a question (I think that function is mentioned in the previous link; if not, it’s another SurveyEngine function). Again, the implementation options are nearly endless. If you’ve gotten this far and don’t know how to use your randomized array, again, let me know in the comments.
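
As one possible way to wire the pieces together, here is a hedged sketch. It assumes the code runs in a question’s JavaScript editor, where the Qualtrics object and its SurveyEngine functions are available. The embedded data field name shuffledOrder and the example array are made up for illustration, and the field would also need to exist in your survey flow. Outside Qualtrics, the guard simply skips the API calls.

```javascript
// Fisher-Yates shuffle (same approach as the shuffleArray function in this post)
function shuffleArray(array) {
  for (var i = array.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1));
    var temp = array[i];
    array[i] = array[j];
    array[j] = temp;
  }
  return array;
}

// Hypothetical example array; replace with your own values
var conditions = [1, 2, 3, 4, 5];

// slice() makes a copy so the original order stays intact
var shuffled = shuffleArray(conditions.slice());

// Embedded data holds text, so store the array as a comma-separated string
var stored = shuffled.join(",");

if (typeof Qualtrics !== "undefined") {
  Qualtrics.SurveyEngine.addOnload(function () {
    // "shuffledOrder" is an assumed field name, not something Qualtrics defines
    Qualtrics.SurveyEngine.setEmbeddedData("shuffledOrder", stored);
  });
}
```

Later questions can then pipe in the stored string and split it on commas to recover the shuffled order.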

Changing Mturk submit button functionality & A new way to prevent duplicate workers on separate HITs

Previously, I’ve posted some simple ways to prevent workers from completing your mturk HITs because they already completed an identical one last week or last month, etc. (here, here, and here). Sometimes keeping track of long lists of mturk workers who have completed your previous HITs is difficult. Editing, copying, and pasting a list of 1000 workers that took your survey last time and can’t take it this time takes a while and is prone to mistakes. The best method I can suggest when this is difficult is to start databasing your workers. This is how I manage my Mturk participants, but it requires a decent amount of knowledge of JavaScript, PHP, and MySQL. If that is not your cup of tea, I have been trying to work on an especially simple method using cookies, though it is by definition imperfect.

Cookies are simply small bits of data that a webpage can store on your computer, then later read back. It’s how websites “remember” that you are logged in, for instance. Using a small amount of JavaScript, you can set a cookie that is named after your study. The existence of the cookie with that particular name means that the worker completed that study already. When you post a study later and want to exclude people who took your first study, your HIT will have some JavaScript in it that searches for that cookie you created earlier. If it exists, the person took your first study and is excluded; if it doesn’t, they can take your new study. There is no copying and pasting, and no long files full of worker IDs.

However, as I said, it is imperfect. Some users turn off cookies, preventing you from setting the cookie. Some users clear their cookies regularly, deleting the cookie you created. Cookies are computer based, not user based, so a user can go to a different computer and still take your new study. Also, a second person can try taking your study on the same computer and be prevented from doing so because the cookie applies to anyone using that computer, not the specific person. Given the large number of mturk workers, and the base rates for the problems I just mentioned, I would guess that this method would prevent about 90% of workers who you don’t want taking your studies from doing so. It may be more or less. The key factors are how many people regularly clear their cookies or complete HITs on multiple computers.

Here is the commented code you can copy and paste into your HIT template window. It first loads jQuery, then adds an event listener to the Submit button, then checks if a cookie exists with the same name as the cookie that gets created when the submit button is clicked. All you need to change is the studyName value to something unique for your study. However, what happens on line 20 is up to you. Right now it simply alerts the user that they have taken your study before. Something more useful would be to prevent any other content from loading and tell them to not accept, or to return, the HIT. Since exactly how it gets implemented can change from HIT to HIT, I did not specify a procedure beyond alerting the user. The cookies are set to last a maximum of 1 year. You can hypothetically make it longer, but people have likely deleted their cookies by then anyway.

Github code here

1:  <script src=""></script>  
2:  <script type="text/javascript">  
3:  //this code assumes jQuery is implemented, though it would not be difficult to do this with straight javascript. .ready() assures the code runs after the submit button is added  
4:  $(document).ready(function () {  
5:    var studyName="dm102513";//replace this value with a unique, acceptable cookie name... no spaces and stick with alphanumerics and underscores  
6:    $("#submitButton").click(function(){  
7:        var exdate=new Date();  
8:        exdate.setDate(exdate.getDate() + 365);  
9:     document.cookie=studyName+"=1; expires="+exdate.toUTCString()+"; path=/";//When the user clicks the Submit button, it creates a cookie that you can read later as having already completed this survey  
10:    });  
11:    var i,x,y,ARRcookies=document.cookie.split(";");//variables necessary to read the cookies. Last variables separates all available cookie names into an array we will loop through to find our cookie  
12:    for (i=0;i<ARRcookies.length;i++){  
13:     x=ARRcookies[i].substr(0,ARRcookies[i].indexOf("="));//x=cookie name  
14:     y=ARRcookies[i].substr(ARRcookies[i].indexOf("=")+1);//y=cookie value  
15:     x=x.replace(/^\s+|\s+$/g,"");//Remove white space  
16:     if (x==studyName && y==1){  
17:             //If your cookie exists AND the value is 1, they have taken your survey before.  
18:             //Here you would put code that somehow prevents them from doing anything for your HIT, like not loading content etc.  
19:             //However, for our purposes here, I just put a simple alert in. Change this to whatever code works for you.  
20:       alert("You are not eligible for this HIT because you have already completed an identical HIT.");  
21:     }  
22:    }  
23:  });  
24:  </script>  
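
The comment in the code notes that jQuery is not strictly required. Below is a rough plain JavaScript sketch of the same idea, split into two small helpers so the cookie logic can be tried outside a browser. The helper names buildStudyCookie, hasStudyCookie, and onReady are made up for this sketch, the element id submitButton is taken from the code above, and replacing the page body is just one assumed way of blocking repeat workers.

```javascript
// Build the cookie string that marks a study as completed.
// days is the lifetime; now is optional and only there to make testing easier.
function buildStudyCookie(studyName, days, now) {
  var expires = new Date((now || new Date()).getTime());
  expires.setDate(expires.getDate() + days);
  return studyName + "=1; expires=" + expires.toUTCString() + "; path=/";
}

// Return true if cookieString (same format as document.cookie) contains studyName=1
function hasStudyCookie(cookieString, studyName) {
  return cookieString.split(";").some(function (pair) {
    var idx = pair.indexOf("=");
    if (idx === -1) { return false; }
    var name = pair.slice(0, idx).trim();
    var value = pair.slice(idx + 1).trim();
    return name === studyName && value === "1";
  });
}

// Run fn once the page is parsed, whether or not that has already happened
function onReady(fn) {
  if (document.readyState === "loading") {
    document.addEventListener("DOMContentLoaded", fn);
  } else {
    fn();
  }
}

// Browser-only wiring; skipped when there is no document (e.g. in Node)
if (typeof document !== "undefined") {
  onReady(function () {
    var studyName = "dm102513"; // replace with your own unique name
    if (hasStudyCookie(document.cookie, studyName)) {
      // Assumed blocking strategy: replace the page content with a message
      document.body.innerHTML = "<p>You have already completed an identical HIT. Please return this HIT.</p>";
      return;
    }
    var button = document.getElementById("submitButton");
    if (button) {
      button.addEventListener("click", function () {
        document.cookie = buildStudyCookie(studyName, 365);
      });
    }
  });
}
```

The same caveats as before apply: this only works for workers who keep cookies and use the same browser.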

Allow only specified workers to complete your Mturk HITs

The other day I posted some code that allows mturk requesters to exclude workers based on a black list (here). This is typically used when someone already completed your survey, but for one reason or another you can’t just extend your already created HIT. The code in there actually started as the inverse. Instead of wanting to block certain people, a colleague wanted to only include certain people. Rather than dealing with the under-powered qualification system (under-powered for mturk web interface users at least), I figured it would be easy to just create a white list of worker IDs, and exclude everyone else from the HIT. This way seemed faster than somehow creating a qualification test, or some other method of excluding/including workers.

Again, this is very flexible and easily changeable. I, for instance, don’t rely on client-side programs (i.e. JavaScript) for validating users. Instead of checking the ID against a list declared in JavaScript, I send the data to a server-side PHP script which returns the validation result and, if validated, the URL to send workers to. The changes needed for this are to remove the url variable initialization, and to make an AJAX request instead of the check() function call when the HIT is accepted.
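
A rough sketch of that server-side variant might look like the following. Everything specific here is an assumption for illustration: the endpoint path /validate.php, the JSON response shape with valid and url fields, and the helper names. It uses the browser’s built-in fetch function rather than any particular AJAX library, and it expects the idcheck element from the code in this post to be on the page.

```javascript
// Hypothetical endpoint; the real script name and location are up to you
var VALIDATE_ENDPOINT = "/validate.php";

// Build the request URL for a worker ID (encoded so odd characters are safe)
function buildValidationUrl(workerId) {
  return VALIDATE_ENDPOINT + "?workerId=" + encodeURIComponent(workerId);
}

// Turn the assumed server response into the HTML shown to the worker
function messageForResponse(response) {
  if (response && response.valid === true && typeof response.url === "string") {
    return "<p>To proceed to the survey <a href='" + response.url +
      "' target='mturksurvey'>click here</a></p>";
  }
  return "<p>Your ID is not on the list of qualified workers. Please return the HIT.</p>";
}

// Would take the place of the check(id) call made once the HIT is accepted
function checkOnServer(workerId) {
  return fetch(buildValidationUrl(workerId))
    .then(function (res) { return res.json(); })
    .then(function (data) {
      document.getElementById("idcheck").innerHTML = messageForResponse(data);
    });
}
```

Because the list of IDs never reaches the browser, workers cannot read it from the page source, which is the main reason to prefer this over a list declared in JavaScript.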

The code below is copy-and-pastable. Just change the Title, Instructions, target URL (where to send workers who are allowed), and create a white list of worker IDs. See the post linked at the top for how, but you will need to wrap all IDs in quotation marks, and separate them by a comma. Last, change the conditional instructions to tell the worker what to do when they are on/off the list, etc.

Here’s the code (also here):

<h1>TITLE - change this</h1>
<p>INSTRUCTIONS - change this</p>
<div id="idcheck">&nbsp;</div>
<p><textarea cols="80" name="comment" rows="3"></textarea></p>
<script type='text/javascript'>
var url="URL GOES HERE";//The URL that the user will go to after they are checked (i.e. the survey url). Change this.
var workers=new Array("test1","test2","test3");//This is your white list. Change this. All IDs must be enclosed in quotation marks (single or double), and separated by a comma
//Below is a utility function that checks if a given value is in an array. This will be used to check if the id (given value) is in the white list (an array)
Array.prototype.contains = function(k) {
  for(var p in this)
    if(this[p] === k)
      return true;
  return false;
}
//HTML instructions and form for ID input
var naCheck="<p>To see if you are qualified for this survey, enter worker ID below and click \'Check ID\'</p><p><textarea rows='1' cols='20' id='idinput'></textarea></p><div id='ok' onclick='check(false)' style='border: 3px solid black;cursor: hand; cursor: pointer; background-color:gray;color:white;width:70px;height:20px;text-align:center;margin-left:10px;'>Check ID</div>";
var href=window.location.href.toString();//Get URL of mturk webpage which may or may not include the worker ID
var queryString={};//storage variable for query string variables
href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"),function($0, $1, $2, $3){queryString[$1] = $3;});//Populates variable named queryString (created above) with the actual queryString variable names and values
//If queryString contains a variable named workerId, check if the white list contains the value of that variable. If not, ask the user to input their ID.
if(queryString['workerId']!=undefined)
{
     check(queryString['workerId']);//Check ID against white list
}
else
{
     document.getElementById('idcheck').innerHTML=naCheck;//Add HTML necessary to ask for id input
}
//This function will check the ID given against the white list, or if not given will check the ID input into the text box against the white list
function check(id){
     if(id==false){
          if(workers.contains(document.getElementById('idinput').value))
          {
               //Input ID found on list -- ask to accept, when this happens the page reloads and the program starts over
               document.getElementById("idcheck").innerHTML="Your ID is on the list of qualified workers. Please accept the HIT to continue.";
          }
          else
          {
               //Input ID not found on list -- ask not to accept
               document.getElementById("idcheck").innerHTML="Your ID is not on the list of qualified workers. Please do not accept the HIT.";
          }
     }
     else
     {
          if(workers.contains(id))
          {
               //Actual ID found on list -- send to survey. Appending with & assumes the survey URL already contains a query string.
               document.getElementById("idcheck").innerHTML="<p>To proceed to the survey <a href='"+url+'&workerId='+id+"' target='mturksurvey'>click here</a></p>";
          }
          else
          {
               //Actual ID not found on list -- Ask to return HIT
               document.getElementById("idcheck").innerHTML="Your ID is not on the list of qualified workers. Please return the HIT. If you previously entered your ID and were told to accept the HIT, you likely did not enter the correct worker ID.";
          }
     }
}
</script>
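
On current browsers, the query string parsing and the list lookup can be written more compactly. This is an alternative sketch rather than a drop-in replacement for the code above: it assumes the built-in URL and Set objects are available, and the helper names getWorkerId and isAllowed, the example IDs, and the example address are all hypothetical.

```javascript
// White list as a Set for direct lookups; the IDs here are placeholders
var allowedWorkers = new Set(["test1", "test2", "test3"]);

// Pull workerId out of a full page address; returns null when it is absent
function getWorkerId(href) {
  return new URL(href).searchParams.get("workerId");
}

// True only when an ID was found and it is on the white list
function isAllowed(workerId) {
  return workerId !== null && allowedWorkers.has(workerId.trim());
}

// Example with a made-up address; in a HIT you would pass window.location.href
var exampleHref = "https://example.com/hit?assignmentId=abc&workerId=test2";
console.log(isAllowed(getWorkerId(exampleHref)));
```

This also avoids adding a contains method to Array.prototype, which can surprise other scripts on the page that loop over arrays with for...in.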
