Preventing Mturk workers from prematurely submitting HITs (by altering the submit button’s functionality)

If you’ve ever used the Mturk web interface for creating HITs, you know it’s lacking in several areas. One that can be particularly annoying is the submit button that gets added to the end of every template you make. It looks like a plain HTML submit button, and is a half inch away from your normal content. I used to receive emails constantly from people who accidentally submitted their HIT before completing it. Never one to cry over 25 cents, I approved them, but answering those emails took too much time.

My initial solution was simple. I put about 10 blank lines at the end of my HIT template, followed by a horizontal bar, followed by instructions to only click the submit button once the HIT is complete. That stopped about 99% of the emails, which to me was a success. This is still all I do for my HITs, and if you are having this problem, just add the HTML code below to the end of your HITs.

 <br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><hr />
 <p>After you have completed everything in this HIT, click the submit button below.</p>

However, a friend recently wanted something more foolproof. I can’t remember the exact reasoning, but my friend wanted the submit button gone entirely. If you need 100% assurance (good luck, you’ll never get it), you can try the method below. The logic is to first hide the submit button, which is a simple CSS modification. Once you want the user to submit the HIT, you can “click” the submit button for them by using the JavaScript click() function. What’s in the middle is really up to you. The code below replaces the submit button with a different button. That button queries a database which records when workers complete tasks within HITs. If the task is complete but the HIT is not submitted, the script submits the HIT. If the task is not complete, it alerts the user that they have not finished the task yet. This naturally assumes you have a database that records this information. Likely you do not. However, if you use confirmation codes or secret end-of-survey passwords, you can use JavaScript to check whether the passcode is correct. If it is incorrect, they are either trying to cheat you out of 25 cents, or forgot the passcode (which is, of course, a problem).

 //Requires jQuery. If you want to do the selection and the AJAX by hand, feel free, but it's much easier for me to just use jQuery.
 $(document).ready(function(){
   //Hide submit button
   $('#submitButton').hide();
   //Create new submit button with validator function when clicked
   $("<button/>").attr({'id':'newSubmit','type':'button'}).text("Check Submission").click(function(){
     //Get location, which includes the worker ID, create an object to hold query string variables, and parse the query string
     var href=window.location.href.toString();
     var qs={};
     href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"),function($0, $1, $2, $3){qs[$1] = $3;});
     //The url below would be whatever script you use to query the database (the base of the URL is left blank here, and surveyId must be defined elsewhere).
     //The query string includes the worker ID and the surveyId, so you know who completed what. The callback=? is jQuery's method of doing
     //JSON with padding, which is necessary due to the cross-domain scripting (since this will be implemented in mturk)
     var url=""+qs['wid']+"&surveyId="+surveyId+"&callback=?";
     //Assumption: the script answers with JSON such as {"status":"completed"}. Change the test below to match whatever your script returns.
     $.getJSON(url,function(data){
       if(data.status=="completed"){
         //Listed as completed in the database
         $('#submitButton').click();
       }else{
         //Listed as something other than completed
         alert("Our databases indicate you have not completed the survey yet. Please make sure you have completed every portion of the HIT before submitting.");
       }
     });
   }).insertAfter('#submitButton');
 });

As I mentioned, there are a lot of problems with this kind of implementation. You’re risking screwing over a good worker to save yourself 50 cents, since unless the person is good at manipulating your HTML directly in their browser console (which if you didn’t know is very easy to do) they cannot submit the HIT. One quick fix for that is to replace the line of code toward the end with a confirm() message instead of an alert() message. The confirm message would tell the user that it looks like they haven’t completed the HIT yet, but they can submit anyway if they want to. That code would look like this:

 //Replace the line of code with the alert() function with these two lines of code.  
 var sub=confirm("Our databases indicate you have not completed the survey yet. Please make sure you have completed every portion of the HIT before submitting. If you think this is an error, and you have completed the HIT, click OK to submit the HIT. Click Cancel to continue working on the HIT.");  
 if(sub) $('#submitButton').click();  

That’s about all I can say about preventing workers from submitting HITs before they complete the task. Again, in my experience this is a non-problem. Simple HTML formatting fixes it. If someone is nefarious, there is not a lot you can do to stop him or her from stealing your payment other than rejecting workers after the fact. My philosophy is to provide the best experience I can for the workers, and I suppose adding the confirm() option isn’t a bad idea. I’ll leave you with an alternative function that validates based on a password you give your workers at the end of your survey. It uses confirm() rather than alert(), since I think completely preventing someone from submitting a HIT will do more harm than good.

 //To use this function, pass it to click() in the code above in place of the function that is currently the argument, i.e. .click(checkPassword)
 function checkPassword(){
       //Someone can look up the password in the console, since they can see the JavaScript, but it's doubtful. Again, nothing's ever foolproof.
       var password = "PUT YOUR PASSWORD HERE";
       //Get password entered by user. The code assumes the text box the user types the password into has the id: password_input
       var pass = $('#password_input').val();
       //If nothing was entered, alert user and exit function
       if(pass=="") {alert("Please enter the password given to you in the survey before submitting the HIT.");return;}
       //Boolean that decides whether to submit: true on a match, otherwise whatever the worker answers in the confirm() box
       var submit = (pass==password) ? true : confirm('The password you entered does not match the password displayed at the end of the survey. Please make sure the survey is 100% complete before submitting the HIT. If you would like to submit anyway, click OK. However, you run the risk of having your work rejected. If you would like to continue working on the HIT, click Cancel.');
       if(submit) $('#submitButton').click();
 }

The Rise and Fall of Kitchen Gadgets

As I’m sure the greats before me did, I get a lot of my research inspiration while shopping at Goodwill. Recently, I was purchasing a replacement cover for my ice cream maker when I was surprised to see my exact model sitting on a shelf. A week later there were 2 of them. At another store there were 3. All. The. Same. Model. Why? I tested one out. It spun. That’s all an ice cream maker does really. The cold part isn’t electric and can’t really break. Well, it turns out the Cuisinart ICE-20 was insanely popular back in 2009. In 2013, apparently not so much. People are literally giving them away. My wife commented that they are the new bread machine. Everyone has one, one day. Everyone donates it to Goodwill the next.

Ice cream makers aren’t the first kitchen gadgets to have a meteoric rise and fall. Like I just said, bread machines were huge in the 90s. Electric knives made quite a splash in 1981. Fondue sets were famously over-popular in the 1970s. All these still exist today, but are hardly considered must-haves. So what caused their popularity and subsequent rejection? Impossible to answer. All of these occurred during food fads that required their use, but who knows which causes which (the fad or the popularity). Also, all these examples reflect what Alton Brown calls uni-taskers. They are gadgets that are meant for doing one thing, and can’t really do much else. You can try to argue that by that definition, a knife is a uni-tasker since it only cuts, but you’d be wrong. Knives also stab. Seriously though, you can cut so many things in so many different ways that it doesn’t really qualify. Ice cream makers, on the other hand, really just make ice cream. Sorbet is doable, but that’s just ice cream without the fat. Products that do only one thing will tend to have a sharp but short period of popularity.

The examples above also reflect consumers’ desire to make their lives simpler or cheaper. Making food at home tends to be cheaper than going out. Dumping flour and water into a machine and having bread pop out 2 hours later is far easier than my 15-hour sourdough bread procedure (way better though… way better). However, the gadgets may not offer much help. Ice cream makers do something you can’t do otherwise, but the machine rarely gets cold enough, affecting quality. And the process of making the custard to put into the machine is pretty difficult itself. Bread machines are bulky, hard to put away, hard to get back out, and no one wants one left on the counter-top taking up nearly an acre of counter real estate. Thus, products that are meant to make consumers’ lives easier but really don’t will also suffer from uni-tasker syndrome.

After those hypotheses, I am left with one question. Why do people actually get rid of them? Is it just a spring cleaning thing, or do people get rid of fad gadgets faster than they discard other items? My guess is the latter, but this will have to be borne out through data.

So what will be the next slightly used (or maybe untouched, still in the box) kitchen gadget I can walk into Goodwill and buy for 1/10 the original price? Sodastream comes to mind. It only makes soda. It saves you at most like 50 cents up against a steep start-up investment. The only thing missing is the bizarre fad.

Don’t Hate the Requester; Hate the Game: An economic dilettante’s take on Mturk ethics

Maybe they are just not very vocal, but I’ve never come across an Amazon Mechanical Turk (mturk) worker who lauds the business practices of mturk. In my experience as both a requester and worker on mturk, I find ethical problems revolve around two primary concerns: 1) workers are paid a pittance and 2) requesters have free rein to reject work without justification. The second problem is hard to address. It basically comes down to “who watches the watchers.” Tools like TurkOpticon put some power into the worker’s hands, but probably not enough. However, it also seems to be the smaller problem. People may occasionally get shafted, but in the big picture the effects seem to be small compared to the low payment rate (not to say it doesn’t suck though).

The low payment rate is an interesting moral question because it is somewhat unique, at least to those involved. Opponents argue that a sizable percentage of workers use mturk as a primary or secondary income, and they should be earning a living wage. Proponents argue that mturk was not meant to be used that way. Unlike the Walmart wage debate, no requester is trying to hire a worker even part-time. They would say they are merely putting a task out there. If it pays too little then no one will accept it; forcing a higher payment rate would simply mean they would go elsewhere to get their data. Everyone loses. This reasoning is not infallible, and the rhetorical arguments can go on forever.

Instead, I have been thinking more about the economic argument. I’ve covered about 2 weeks worth of game theory in my economic principles of marketing course (essentially a whirlwind microecon 1 and 2 course covered in like 5 weeks). So naturally, I am now an expert ready to analyze important, broad, real-world topics. (Side note: if you didn’t catch that, I’m not an expert and don’t claim that what I write below is at all the way a world-class economist would analyze the situation… hell, I may have even made a wrong assumption. If you can do better, join the discussion).

At the most shallow level, mturk, like all employee-employer relationships, is like a Prisoners’ Dilemma game. The incentives in place push workers to not work hard, and push employers to pay low wages. In the table below, workers can choose to work or loaf (i.e. give junk work), and requesters choose to pay a high wage or a low wage. Each party chooses a strategy at the same time with perfect knowledge of the game. By perfect knowledge, I mean that both the employer and employee know the payoffs to each party. Thus, each can work out what the other will do given the strategy they pick.

| | Work | Loaf |
|---|---|---|
| High wage | v-h, h-a | -h, h |
| Low wage | v-l, l-a | -l, l |

In each cell, the first amount is the outcome for the requester and the amount after the comma is the outcome for the worker. v=value of work, h=high wage, l=low wage, a=effort of working.

The equilibrium of this game (i.e. the point where neither player has an incentive to deviate from that strategy) is for the worker to loaf and the requester to pay a low wage. This is not how mturk operates, but it provides a baseline to analyze against. The low wages seem to exist, but the loafing does not. If it did, no one would use it for data purposes. However, these are the things people worry about because, without checks, the prisoners’ dilemma outcome, where everyone is worse off, would occur.
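The equilibrium claim can be checked mechanically. Below is a minimal sketch that plugs made-up numbers into the table and looks for cells where neither player gains by switching alone. The values v=10, h=5, l=2, a=1 are purely illustrative assumptions (not estimates from mturk), and the function name pureNashEquilibria is hypothetical.

```javascript
// Illustrative numbers only (assumptions, not estimates from mturk)
var v = 10, h = 5, l = 2, a = 1;

// payoffs[row][col] = [requester payoff, worker payoff]
var rows = ["High wage", "Low wage"];
var cols = ["Work", "Loaf"];
var payoffs = [
  [[v - h, h - a], [-h, h]],
  [[v - l, l - a], [-l, l]]
];

// Return every cell where neither player can do strictly better by switching alone
function pureNashEquilibria(payoffs, rows, cols) {
  var found = [];
  for (var r = 0; r < rows.length; r++) {
    for (var c = 0; c < cols.length; c++) {
      var requesterStays = true, workerStays = true;
      // The requester compares the other rows within the same column
      for (var r2 = 0; r2 < rows.length; r2++) {
        if (payoffs[r2][c][0] > payoffs[r][c][0]) requesterStays = false;
      }
      // The worker compares the other columns within the same row
      for (var c2 = 0; c2 < cols.length; c2++) {
        if (payoffs[r][c2][1] > payoffs[r][c][1]) workerStays = false;
      }
      if (requesterStays && workerStays) found.push(rows[r] + " / " + cols[c]);
    }
  }
  return found;
}

var equilibria = pureNashEquilibria(payoffs, rows, cols);
console.log(equilibria);
```

With these numbers the only cell that survives is low wage and loaf, which matches the reasoning above. Any other values with h > l and a > 0 should give the same cell.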

Something that differentiates mturk is the ability for requesters to reject workers’ submissions and not pay them. Forgetting that requesters can be evil and reject everyone for no reason, we’ll assume that requesters have sufficient reason and have to put some effort into checking. In this case, workers can still work or loaf, but requesters can pay high and check, pay high and not check, pay low and check, and pay low and not check. Additionally, loafing has costs as well (it takes some time to complete a HIT with bad work, but not as much as it takes to complete a HIT with good work, hence d < a). These costs were not included before because they did not affect the equilibrium. Last, checking has costs as well (e.g. time taken to add an attention filter to a survey or get correct answers to compare to, research validity costs of making workers think you are watching them, time taken to actually check and reject bad work, etc.). This cost is reflected in the cells where the requester checks whether the worker loafed.

| | Work | Loaf |
|---|---|---|
| High wage, check | v-h-c, h-a | -c, -d |
| High wage, not check | v-h, h-a | -h, h-d |
| Low wage, check | v-l-c, l-a | -c, -d |
| Low wage, not check | v-l, l-a | -l, l-d |

Outcomes are in the same order. v, h, l, and a are the same. c=cost of checking, d=cost of loafing; d<a, c<v

Here there is no pure strategy Nash equilibrium (i.e. at no point in the table do both sides have no incentive to choose a different strategy). If the worker works, the requester’s strategy is to pay low and not check (that cell has the highest payoff to the requester). However, if the requester is going to do that (which the worker is aware of given perfect knowledge), then the worker would loaf since they do less work and get the same amount of money. Knowing that is the case, the requester would check, thus not paying the worker. Knowing the requester would check, the worker would work. Finally, knowing that, the requester wouldn’t check anymore. This unending cycle means each party’s strategy is chosen probabilistically (i.e. a mixed strategy Nash equilibrium) to maximize payoffs given beliefs about how likely an actor is to choose a particular strategy. This explains why some workers loaf and some work very hard. The loafers are playing a probability that they will be paid even though they did poor work. That probability exists because the requester is also playing a probability. Instead of checking work for each HIT, they only check some. Thus they save on the costs of checking, and still provide some incentive to give good work. It is possible to work out these probabilities as a function of the costs and payments involved, but instead of going through all that math, I will make two reasonable assumptions to change the game to make it slightly easier to solve.
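For the curious, here is a rough sketch of what those probabilities look like, under some extra assumptions that are not part of the analysis above. Paying the high wage never does better for the requester than paying the low wage, so only the two low-wage rows are used. Let p be the probability that the requester checks and q the probability that the worker works, and assume c < l and a - d < l so that both land between 0 and 1. Each side mixes so that the other side is indifferent between its two options.

```latex
% Worker is indifferent between working and loafing:
%   work pays l - a;  loaf pays -d if checked (prob. p), l - d if not (prob. 1 - p)
\begin{align*}
l - a &= p(-d) + (1-p)(l - d) = (1-p)\,l - d \\
\Rightarrow\quad p &= \frac{a - d}{l}
\end{align*}

% Requester is indifferent between checking and not checking:
%   left side is "low wage, check", right side is "low wage, not check"
\begin{align*}
q(v - l - c) + (1-q)(-c) &= q(v - l) + (1-q)(-l) \\
q(v - l) - c &= q(v - l) - (1-q)\,l \\
\Rightarrow\quad q &= 1 - \frac{c}{l}
\end{align*}
```

Read this way, the requester checks more often when good work costs much more effort than junk work (a - d large relative to l), and workers loaf more often when checking is expensive relative to the wage (c/l large).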

First, the cost of checking is rather low, especially for research experiments. Adding an attention filter requires little effort, and most people have a research assistant updating credit (thus there is no time component). Treating c as zero means the check and not check rows in each wage level pay the same to the requester (v-h or v-l). Second, the cost of not working is sometimes as high as the cost of working. The amount of time saved is often negligible, and some people (those not doing it as a primary or secondary income) find the cognitive challenge of some research studies stimulating (that is the studies that don’t kill you with boredom), making the cost of doing good work less. Thus if we treat a and d as equal they can also be removed from the columns. This last assumption is less believable, but it also may not affect the equilibrium in a meaningful way.

After these assumptions are made, a pure strategy equilibrium exists (i.e. one of the cells is clearly preferred, and people do not choose probabilistically anymore). That strategy is for workers to work and for requesters to pay a low wage and check. This equilibrium somewhat reflects the state of mturk. Most workers do good work. Most requesters check to make sure they do, and 99% of HITs pay below the minimum wage if computed at an hourly rate. One interesting point is that after the assumptions stated above are made, the low wage-check and low wage-no check cells are identical. This means that while all the actors can assume requesters will check, requesters may not check, and only they would know (since information about checking is only known to workers if they do not work). After talking with lots of other researchers who use mturk, I think this reflects the state of the game. They check when it’s easy to do so, but find it isn’t always necessary and often do not check.
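To see the effect of the two assumptions, here is a small sketch that builds the four-row table from its parameters and searches for pure strategy equilibria, once with costs of checking and loafing included and once with c = 0 and a = d. All numbers are illustrative assumptions, and buildGame and pureEquilibria are hypothetical helper names.

```javascript
// Build the 4x2 table from the text. Each cell is [requester payoff, worker payoff].
function buildGame(p) {
  return {
    rows: ["High wage, check", "High wage, not check", "Low wage, check", "Low wage, not check"],
    cols: ["Work", "Loaf"],
    payoffs: [
      [[p.v - p.h - p.c, p.h - p.a], [-p.c, -p.d]],
      [[p.v - p.h,       p.h - p.a], [-p.h, p.h - p.d]],
      [[p.v - p.l - p.c, p.l - p.a], [-p.c, -p.d]],
      [[p.v - p.l,       p.l - p.a], [-p.l, p.l - p.d]]
    ]
  };
}

// A cell is an equilibrium when neither side can do strictly better by switching alone
function pureEquilibria(game) {
  var found = [];
  game.payoffs.forEach(function (row, r) {
    row.forEach(function (cell, c) {
      // The requester looks down the column; the worker looks across the row
      var requesterStays = game.payoffs.every(function (other) { return other[c][0] <= cell[0]; });
      var workerStays = row.every(function (other) { return other[1] <= cell[1]; });
      if (requesterStays && workerStays) found.push(game.rows[r] + " / " + game.cols[c]);
    });
  });
  return found;
}

// Illustrative values only: d < a and c < l, as in the table's legend and the cycle described above
var full = pureEquilibria(buildGame({ v: 10, h: 5, l: 2, a: 1, d: 0.5, c: 0.5 }));

// The two simplifying assumptions: checking is free (c = 0) and loafing costs as much as working (a = d)
var simplified = pureEquilibria(buildGame({ v: 10, h: 5, l: 2, a: 1, d: 1, c: 0 }));

console.log(full);
console.log(simplified);
```

With these numbers the full game has no pure strategy equilibrium, while the simplified game has two cells that qualify, both with the worker working: low wage with checking and low wage without checking. The second one only qualifies because the worker is exactly indifferent there, which is the "only the requester would know" situation described above.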

Recall that two hazards existed in mturk. Workers can turn in poor work, and requesters can pay a low wage. Absent the ability to reject HITs by checking the work of workers, a Prisoner’s Dilemma existed where workers turned in poor work in exchange for a small wage. When the ability to reject was introduced, the optimal strategy for workers was to turn in good work. Rejection thus solves for one of the hazards of mturk. This ends up being beneficial for both parties since workers get paid more on average and requesters get a high value from the data received.

However, low wages still exist. For those who do mturk for fun, that might not be such a bad thing. There is some intrinsic reward, as I mentioned earlier, and it may be better than other ways to waste time. However, given the large number of people who use mturk as a source of income, the ethical dilemma still exists. What can be done to push the equilibrium to the high wage cells? If costs of checking were different for the two levels (i.e. if for some reason it was cheaper to check the work when paying high wages compared to low wages), it may be enough to make the low wage payoff lower. However, it is unclear if this is feasible. A better possibility is to have different values of work for low wage and high wage. If the value of v in the high wage cells were, say, double that of v in the low wage cells, v-h-c may be greater than v-l-c. What does that look like in reality?

1) If you can help it, don’t complete underpaid HITs. If it takes 4 times longer to get a HIT done at 25 cents than at 50 cents, people will pay 50 cents. This is because the value (v) at that low wage is much lower given that people usually want their HITs completed quickly.

2) Hopefully this happens naturally, and not purposefully, but poorer data (i.e. more noise, more loafers, etc.) that comes from cheap HITs is likely to cause higher payments. Again, the value of a cheap HIT becomes low because the data isn’t usable or it takes time to weed through the poor responses. This could backfire though by causing requesters to look elsewhere for data collection needs.

3) Really focus when you complete a high paying HIT. Confirm the researchers’ thoughts that if they pay better they get better data (this is a generally accepted assumption). I’ve seen some respondents who complete a $5 HIT that takes 10 minutes in under a minute solely because (I assume) they think there is a chance it’ll sneak by. Seeing data get worse as cost goes up instantly lowers the payment rate (i.e. the value of work at high payment rates becomes lower than the value of work at low payment rates). The example above is very isolated however.

This simple economic analysis of mturk tells me that payment levels are where they are due to the incentive structure of mturk. Rejection prevents workers from not delivering good information, but giving good information regardless of payment causes low wages. Enforcing a type of minimum wage on HITs likely would push requesters out of the market, and return them to labs and expensive national online panel companies. Thus the primary way workers can help boost the wage level is to not complete low-wage HITs and ensure high-wage HITs are completed well. Seemingly obvious conclusions I know, but it was fun figuring out the game.

Social Consequences of Unforced Compliance: Reluctantly Joining Social Media

Twitter was always for overshare-ers to me. Instagram was wannabe creative-types without effort or talent. Klout? The attention whore’s measuring stick. One week ago, I started using all of those. After 7 or so days of being an oversharing, wannabe creative-type, attention whore I’ve noticed interesting behavior and changes in attitudes.

My social media awakening was less abrupt than I let on. I had a Friendster and LiveJournal since 2003. I was also an early MySpace and Facebook adopter. The purpose of these tools was to connect personally with friends and family. Facebook still serves that purpose to me. I live far from virtually everyone I’ve ever known, and it’s nice to know they exist. Facebook is not an external tool to me though. When a friend’s account becomes a marketing extension of the company they work for, the prejudice (and pride) run deep.

My current thrust was spurred on by a Professor’s (Pete McGraw) repeated discussions about the importance of external visibility for research. Projects used to be important based on citation counts. In the future, real-world impact is likely to be of much greater importance. This wasn’t entirely new to me, but my thinking previously was, “Well, I’ll be super impactful when I come up with my popular audience book idea.”

In the span of about five minutes my thoughts went from, “I should probably update my blog more regularly” to “Maybe also dust off the dormant Twitter account and broadcast the blog better.” Then the ball started rolling really fast. “I guess I should look at my Klout score and see if improving it leads to more people reading the blog.” “Instagram impacts Klout a lot? I said never, but since the app is free let’s go for it.”

Festinger’s Cognitive Consequences of Forced Compliance basically proposed that being forced to commit an act, given certain conditions, changes attitudes to be in line with the behavior. Since my behavior wasn’t forced, did my attitudes change in those few minutes, which then changed my behavior, or does voluntarily changing behavior lead to the same (or stronger) attitude change compared to forced compliance?

I don’t really desire to answer that question. All I know is the attitudes certainly changed. I enjoy having additional channels to interact with friends. I enjoy cultivating a professional online persona. I enjoy seeing a larger readership of my blog.

Self-Reflective Things I’ve Learned Thus Far

  • I am old. I realize this is a very cliched joke to make when you are in your twenties and comparing yourself to teenagers, so that’s not really what I am doing. Instead, I’m making the slightly less cliched joke of pointing out when you are becoming everything you made fun of your parents about. Learning the norms of social networks is difficult. Sometimes I have these out of body experiences where I’m looking at myself like I’m watching a child learning to ice skate (or a grown 27 year old man who grew up in New England and never learned to skate learn to skate… blog pending). Luckily I’m cognizant of this, so I’m not too grandpa-on-facebook-like.
  • Instagram is not nearly as douchey as I had always thought. I’ve always felt like I don’t document important events enough (note this is nothing like the recent research in over-documenting… I don’t even have pictures from my honeymoon up the California coast), and Instagram provides a way to do that with double the positive reinforcement: friends connect, Klout score increases. Side note: Headlines like this one recently posted on Instagram’s Twitter account will always make me roll my eyes: “Top 10 Photographers [really???] on Instagram”
  • Interestingly, after joining Instagram and Klout, I felt the need to make fun of that fact. Internally, I apparently couldn’t accept the statement “I’m joining Instagram.” It had to be something like, “Even I think I’m crazy and stupid, but I’m joining Instagram anyway.” This is probably already documented, but the need for some semblance of attitudinal or behavioral consistency even when you are performing a completely inconsistent action seems like an interesting phenomenon (assuming it’s not just me).

How to randomize or shuffle an array in Qualtrics

Qualtrics does many things right. However, its vast capabilities sometimes make me think it can do things that it can’t. Unlike SurveyMonkey where I just assume it can’t do anything, sometimes I think Qualtrics can do everything. Luckily, Qualtrics’ JavaScript capabilities make it so that if you know some coding, you can do a lot of the things you thought were impossible.

Randomizing arrays of numbers is something Qualtrics can’t do (easily) without JavaScript. Technically, you can create a randomizer, inside the randomizer create X branch elements, set each element to automatically occur (previously set an embedded data element named a, set the value to 1, then set the branches to occur if a=1), create an embedded data element in each branch, and set the randomizer to randomly show X elements evenly. Pretty difficult, and even that won’t do everything. You would need to add in however you implement the random number or use some piped text code to add that number to a different element so you can end up with a full array after the randomizer ends. Regardless, it is difficult.

In Javascript the code is this (here it is on Github):

 function shuffleArray(array) {
   for (var i = array.length - 1; i > 0; i--) {
     var j = Math.floor(Math.random() * (i + 1));
     var temp = array[i];
     array[i] = array[j];
     array[j] = temp;
   }
   return array;
 }

In a question in Qualtrics, you simply insert this code into the JavaScript editor. One way or another create your array (type it in, grab it from the question text, etc…. if you need help doing this let me know in the comments), shuffle it using this code, then one way or another use it. You can set the array to an embedded data element using the SurveyEngine.setEmbeddedData() function that is already available in the Qualtrics API (mentioned here). You can add it as text to a question (I think that function is mentioned in the previous link; if not, it’s another SurveyEngine function). Again, the implementation options are infinite. If you’ve gotten this far, and don’t know how to use your randomized array, again, let me know in the comments.
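As one possible way to put the pieces together, here is a minimal sketch that creates an array, shuffles a copy of it, and stores the result as a comma-separated embedded data value. The array contents and the embedded data name shuffledOrder are made-up examples. The Qualtrics calls are wrapped in a check so the snippet also runs outside of Qualtrics for testing.

```javascript
// Same shuffle function as shown above
function shuffleArray(array) {
  for (var i = array.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1));
    var temp = array[i];
    array[i] = array[j];
    array[j] = temp;
  }
  return array;
}

// Example array typed in by hand; yours could come from anywhere
var items = [1, 2, 3, 4, 5];

// slice() makes a copy first, so the original order in items is left alone
var shuffled = shuffleArray(items.slice());

// Only runs inside a Qualtrics question, where the Qualtrics object exists
if (typeof Qualtrics !== "undefined") {
  Qualtrics.SurveyEngine.addOnload(function () {
    // "shuffledOrder" is a hypothetical embedded data name; use your own
    Qualtrics.SurveyEngine.setEmbeddedData("shuffledOrder", shuffled.join(","));
  });
}
```

Storing the array as a joined string is a design choice made here for simplicity, since embedded data holds text. A later question can split the string on commas to get the array back.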

Changing Mturk submit button functionality & A new way to prevent duplicate workers on separate HITs

Previously, I’ve posted some simple ways to prevent workers from completing your mturk HITs because they already completed an identical one last week or last month, etc. (here, here, and here). Sometimes keeping track of long lists of mturk workers who have completed your previous HITs is difficult. Editing, copying, and pasting a list of 1000 workers that took your survey last time and can’t take it this time takes a while and is prone to mistakes. The best method I can suggest when this is difficult is to start databasing your workers. This is how I manage my Mturk participants, but it requires a decent amount of knowledge of JavaScript, PHP, and MySQL. If that is not your cup of tea, I have been trying to work on an especially simple method using cookies, though it is by definition imperfect.

Cookies are simply small bits of data that a webpage can store on your computer, then later read back. It’s how websites “remember” that you are logged in, for instance. Using a small amount of JavaScript, you can set a cookie that is named after your study. The existence of the cookie with that particular name means that the worker completed that study already. When you post a study later and want to exclude people who took your first study, your HIT will have some JavaScript in it that searches for that cookie you created earlier. If it exists, the person took your first study and is excluded. If it doesn’t, they can take your new study. There is no copying and pasting, and no long files full of worker IDs.

However, as I said, it is imperfect. Some users turn off cookies, preventing you from setting the cookie. Some users clear their cookies regularly, deleting the cookie you created. Cookies are computer based, not user based, so a user can go to a different computer and still take your new study. Also, a second person can try taking your study on the same computer and be prevented from doing so because the cookie applies to anyone using that computer, not the specific person. Given the large number of mturk workers, and the base-rates for the problems I just mentioned, I would guess that this method would prevent about 90% of workers who you don’t want taking your studies from doing so. It may be more or less. The key factors are how many people regularly clear their cookies or complete HITs on multiple computers.

Here is the commented code you can copy and paste into your HIT template window. It first loads jQuery, then adds an event listener to the Submit button, then checks if a cookie exists with the same name as the cookie that gets created when the submit button is clicked. All you need to change is the studyName value to something unique for your study. However, what happens on line 20 is up to you. Right now it simply alerts the user that they have taken your study before. Something more useful would be to prevent any other content from loading and tell them to not accept (or to return) the HIT. Since exactly how it gets implemented can change from HIT to HIT, I did not specify a procedure beyond alerting the user. The cookies are set to last a maximum of 1 year. You can hypothetically make it longer, but likely people have deleted their cookies by then anyway.

Github code here

1:  <script src=""></script>  
2:  <script type="text/javascript">  
3:  //this code assumes jQuery is implemented, though it would not be difficult to do this with straight javascript. .ready() assures the code runs after the submit button is added  
4:  $(document).ready(function () {  
5:    var studyName="dm102513";//replace this value with a unique, acceptable cookie name... no spaces and stick with alphanumerics and underscores  
6:    $("#submitButton").click(function(){  
7:        var exdate=new Date();  
8:        exdate.setDate(exdate.getDate() + 365);  
9:     document.cookie=studyName+"=1; expires="+exdate.toUTCString()+"; path=/";//When the user clicks the Submit button, it creates a cookie that you can read later as having already completed this survey  
10:    });  
11:    var i,x,y,ARRcookies=document.cookie.split(";");//variables necessary to read the cookies. The last variable splits all available cookies into an array we will loop through to find our cookie  
12:    for (i=0;i<ARRcookies.length;i++){  
13:     x=ARRcookies[i].substr(0,ARRcookies[i].indexOf("="));//x=cookie name  
14:     y=ARRcookies[i].substr(ARRcookies[i].indexOf("=")+1);//y=cookie value  
15:     x=x.replace(/^\s+|\s+$/g,"");//Remove white space  
16:     if (x==studyName && y==1){  
17:             //If your cookie exists AND the value is 1, they have taken your survey before.  
18:             //Here you would put code that somehow prevents them from doing anything for your HIT, like not loading content etc.  
19:             //However, for our purposes here, I just put a simple alert in. Change this to whatever code works for you.  
20:       alert("You are not eligible for this HIT because you have already completed an identical HIT.");  
21:     }  
22:    }  
23:  });  
24:  </script>  
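
If you want to reuse the cookie check elsewhere, the loop can be pulled out into a small function. This is only a sketch of the same idea: the function name hasCompleted is my own invention, and it takes the cookie string as an argument so it can be tried outside a browser. It has not been tested inside a HIT.

```javascript
// Returns true when cookieString (same format as document.cookie) contains
// a cookie called `name` whose value is "1". The name hasCompleted is hypothetical.
function hasCompleted(cookieString, name) {
  var pairs = cookieString.split(";");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    if (eq === -1) continue; // skip empty or malformed entries
    var key = pairs[i].substring(0, eq).replace(/^\s+|\s+$/g, "");
    var value = pairs[i].substring(eq + 1).replace(/^\s+|\s+$/g, "");
    if (key === name && value === "1") return true;
  }
  return false;
}

// In the HIT you would call hasCompleted(document.cookie, studyName)
console.log(hasCompleted("foo=bar; dm102513=1", "dm102513")); // true
console.log(hasCompleted("foo=bar", "dm102513")); // false
```

Keeping the check in one function means whatever you decide to do with returning workers (alert, hide content, etc.) lives in a single if statement.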

Allow only specified workers to complete your Mturk HITs

The other day I posted some code that allows Mturk requesters to exclude workers based on a black list (here). This is typically used when someone already completed your survey, but for one reason or another you can’t just extend your already created HIT. The code in there actually started as the inverse. Instead of wanting to block certain people, a colleague wanted to only include certain people. Rather than dealing with the under-powered qualification system (under-powered for Mturk web interface users at least), I figured it would be easy to just create a white list of worker IDs, and exclude everyone else from the HIT. This way seemed faster than somehow creating a qualification test, or some other method of excluding/including workers.

Again, this is very flexible and easily changeable. I, for instance, don’t rely on client-side programs (i.e. JavaScript) for validating users. Instead of checking the ID against a list declared in JavaScript, I send the data to a server-side PHP script which returns the validation result and, if validated, the URL to send workers to. The changes needed for this are to remove the url variable initialization, and to make an AJAX request instead of the check() function call when the HIT is accepted.

This code below is copy-and-pastable. Just change the Title, Instructions, target URL (where to send workers who are allowed), and create a white list of worker IDs. See the post linked at the top for how, but you will need to wrap all ids in quotation marks, and separate them by a comma. Last, change the conditional instructions to tell the worker what to do when they are on/off the list, etc.

Here’s the code (also here):

<h1>TITLE - change this</h1>
<p>INSTRUCTIONS - change this</p>
<div id="idcheck">&nbsp;</div>
<p><textarea cols="80" name="comment" rows="3"></textarea></p>
<script type='text/javascript'>
var url="URL GOES HERE";//The URL that the user will go to after they are checked (i.e. the survey url). Change this. Note that "&workerId=" gets appended below, so the URL should already contain a query string (a "?")
var workers=new Array("test1","test1","test3");//This is your white list. Change this. All IDs must be enclosed in quotation marks (single or double), and separated by a comma
//Below is a utility function that checks if a given value is in an array. This will be used to check if the id (given value) is in the white list (an array)
Array.prototype.contains = function(k) {
  for(var p=0;p<this.length;p++)//index loop, so properties added to Array.prototype (like this function) are not visited
    if(this[p] === k)
      return true;
  return false;
}
//HTML instructions and form for ID input
var naCheck="<p>To see if you are qualified for this survey, enter worker ID below and click \'Check ID\'</p><p><textarea rows='1' cols='20' id='idinput'></textarea></p><div id='ok' onclick='check(false)' style='border: 3px solid black;cursor: hand; cursor: pointer; background-color:gray;color:white;width:70px;height:20px;text-align:center;margin-left:10px;'>Check ID</div>";
var href=window.location.href.toString();//Get URL of mturk webpage which may or may not include the worker ID
var queryString={};//storage variable for query string variables
href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"),function($0, $1, $2, $3){queryString[$1] = $3;});//Populates variable named queryString (created above) with the actual queryString variable names and values
//If queryString contains a variable named workerId, check if the white list contains the value of that variable, if not, ask the user to input their ID.
if(queryString['workerId']!=undefined)
{
     check(queryString['workerId']);//Check ID against white list
}
else
{
     document.getElementById('idcheck').innerHTML=naCheck;//Add HTML necessary to ask for id input
}
//This function will check the ID given against the white list, or if not given will check the ID input into the text box against the white list
function check(id){
     if(id===false){
          if(workers.contains(document.getElementById('idinput').value))
          {
               //Input ID found on list -- ask to accept, when this happens the page reloads and the program starts over
               document.getElementById("idcheck").innerHTML="Your ID is on the list of qualified workers. Please accept the HIT to continue.";
          }
          else
          {
               //Input ID not found on list -- ask not to accept
               document.getElementById("idcheck").innerHTML="Your ID is not on the list of qualified workers. Please do not accept the HIT.";
          }
     }
     else
     {
          if(workers.contains(id))
          {
               //Actual ID found on list -- send to survey
               document.getElementById("idcheck").innerHTML="<p>To proceed to the survey <a href='"+url+'&workerId='+id+"' target='mturksurvey'>click here</a></p>";
          }
          else
          {
               //Actual ID not found on list -- Ask to return HIT
               document.getElementById("idcheck").innerHTML="Your ID is not on the list of qualified workers. Please return the HIT. If you previously entered your ID and were told to accept the HIT, you likely did not enter the correct worker ID.";
          }
     }
}
</script>

Excluding Mturk workers from your HITs (not just from your Qualtrics surveys)

My method for excluding Mturk workers from Qualtrics surveys (found here) is rather easy to use, but is pretty much limited to Qualtrics and Survey Gizmo. It provides helpful insight into features of the two systems, but I thought I’d develop a method to help the broader Mturk user (and those who use Survey Monkey). This is a rather simple JavaScript based system for telling workers whether or not you have them on a black list. On the black list means they have done your task before (e.g. taken your survey). Off the list means they are AOK.

The system is flexible; there is plenty about it you can change. I put in the complete HTML simply so people who want to can just copy and paste, changing the necessary parameters. There are plenty of instructions (basically everything within double quotation marks) that you can change if you don’t like how they are worded. It is also designed for survey use where your worker accepts the HIT and is immediately sent to an outside webpage (i.e. the survey). If you need the user to do something else, you’ll have to change what happens in the last conditional of the check() function.

The logic of the system is:
1) User enters
2) System checks if HIT is accepted (by seeing if there is a workerId query string variable in URL)
3) If not accepted
a. Ask for ID input
b. Check given ID against black list
c. If on list, tell not to accept. If not on list, tell to accept
d. Accepting HIT reloads the page, starting program over again (i.e. start back at 1)
4) If accepted
a. Get workerId variable from query string in URL
b. Check actual ID against black list
c. If on list, ask to return HIT (i.e. they lied about their ID before or did not follow instructions). If not on list, give hyperlink and instructions for going to your webpage (e.g. survey).

There are 4 parameters that need to be changed. The first two are simple HTML edits. Just replace the title and instructions with your own title and instructions. This can be done in the rich text editor if you wish after pasting this code into the Source code area of the HIT template creator. The third parameter is the URL to send users to. Just paste your Survey Monkey, Qualtrics, or some other URL between the double quotation marks. The last parameter is a little more difficult. You need to create a quotation-mark enclosed, comma-separated list of IDs. Typically you do this in Excel by downloading your previous data files, compiling a list of workerIds in a column, then using the concatenate function in the column next to it to add quotation marks and commas (e.g. putting =CONCATENATE("""",A1,""",") in cell B1 will do what is needed, then copy the formula down for each row). Copy and paste that list between the parentheses of the workers variable, removing the trailing comma after the last ID.

Last, note that this code will send the worker ID to your survey. In Qualtrics you can store the ID by creating an embedded data element and naming it workerId.
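
If you would rather not use Excel, the same quoted list can be built with a few lines of JavaScript. This is only a sketch: the function name toQuotedList and the sample IDs are made up, and it assumes you have one worker ID per line (e.g. a column copied out of a data file).

```javascript
// Turns text with one worker ID per line into "id1","id2",...
// toQuotedList is a hypothetical helper name, not part of the HIT code.
function toQuotedList(columnText) {
  return columnText
    .split(/\r?\n/)                                   // one entry per line
    .map(function (id) { return id.trim(); })         // drop stray spaces
    .filter(function (id) { return id.length > 0; })  // drop blank lines
    .map(function (id) { return '"' + id + '"'; })    // wrap in quotation marks
    .join(",");                                       // no trailing comma
}

console.log(toQuotedList("A1EXAMPLE\nA2EXAMPLE\n")); // "A1EXAMPLE","A2EXAMPLE"
```

The output can be pasted directly between the parentheses of the workers variable.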

Here is the commented code. Or you can get it from GitHub:

<h1>TITLE - change this</h1>
<p>INSTRUCTIONS - change this</p>
<div id="idcheck">&nbsp;</div>
<p><textarea cols="80" name="comment" rows="3"></textarea></p>
<script type='text/javascript'>
var url="URL GOES HERE";//The URL that the user will go to after they are checked (i.e. the survey url). Change this. Note that "&workerId=" gets appended below, so the URL should already contain a query string (a "?")
var workers=new Array("test1","test1","test3");//This is your black list. Change this. All IDs must be enclosed in quotation marks (single or double), and separated by a comma
//Below is a utility function that checks if a given value is in an array. This will be used to check if the id (given value) is in the black list (an array)
Array.prototype.contains = function(k) {
  for(var p=0;p<this.length;p++)//index loop, so properties added to Array.prototype (like this function) are not visited
    if(this[p] === k)
      return true;
  return false;
}
//HTML instructions and form for ID input
var naCheck="<p>To see if you have completed this survey already, enter worker ID below and click \'Check ID\'</p><p><textarea rows='1' cols='20' id='idinput'></textarea></p><div id='ok' onclick='check(false)' style='border: 3px solid black;cursor: hand; cursor: pointer; background-color:gray;color:white;width:70px;height:20px;text-align:center;margin-left:10px;'>Check ID</div>";
var href=window.location.href.toString();//Get URL of mturk webpage which may or may not include the worker ID
var queryString={};//storage variable for query string variables
href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"),function($0, $1, $2, $3){queryString[$1] = $3;});//Populates variable named queryString (created above) with the actual queryString variable names and values
//If queryString contains a variable named workerId, check if the black list contains the value of that variable, if not, ask the user to input their ID.
if(queryString['workerId']!=undefined)
{
     check(queryString['workerId']);//Check ID against black list
}
else
{
     document.getElementById('idcheck').innerHTML=naCheck;//Add HTML necessary to ask for id input
}
//This function will check the ID given against the black list, or if not given will check the ID input into the text box against the black list
function check(id){
     if(id===false){
          if(workers.contains(document.getElementById('idinput').value))
          {
               //Input ID found on list -- taken the survey before, ask not to accept
               document.getElementById("idcheck").innerHTML="Your ID is on the list of workers who have completed this survey already. Please do not accept the HIT. Thank you for your previous participation.";
          }
          else
          {
               //Input ID not found on list -- ask to accept, when this happens it will start the program over again, this time checking their actual ID not their inputted ID, which should be the same
               document.getElementById("idcheck").innerHTML="Your ID is not on the list of workers who have completed this survey. Please accept the HIT to continue.";
          }
     }
     else
     {
          if(workers.contains(id))
          {
               //Actual ID found on list -- ask to return HIT (this should rarely happen, if ever... it basically means the person did not follow instructions or tried to lie beforehand)
               document.getElementById("idcheck").innerHTML="Your ID is on the list of workers who have completed this survey already. Please return the HIT. If you previously entered your ID and were told to accept the HIT, you likely did not enter the correct worker ID.";
          }
          else
          {
               //Actual ID not found on list -- send to survey
               document.getElementById("idcheck").innerHTML="<p>To proceed to the survey <a href='"+url+'&workerId='+id+"' target='mturksurvey'>click here</a></p>";
          }
     }
}
</script>

Using YouTube for Experimental Video Display

[Headnote: WordPress, geniuses that they are, removes iframes and auto-embeds YouTube URLs, even if they are escaped and within code tags. Therefore, in my code here I use jframes and link to utube. If you want to just copy and paste code, make sure you search and replace jframe to iframe and utube to youtube and &amp; to &.]

One of the most common problems I help people solve is how to display video for experimental purposes. YouTube contains links to other videos, information about the title and uploader, etc. Experimenters often don’t want to show that. There may be a better service for online video display for experiments (please mention them in the comments), but I find manipulating the YouTube player to be easy enough, and it serves most of my purposes. Below is an email I wrote to a person who needed help with this very problem. Since I could not help the person in person, I included a lot of additional information to help understand what is really going on. For the advanced reader, ignore what I oversimplify, but feel free to correct what I got completely wrong. If you have a background in HTML you can probably just skip to the end. Also see the headnote as to why the URLs don’t work.

Q: How can I display videos in Qualtrics? Youtube includes rewind buttons and other things I don’t want.

A: Your best bet is using YouTube and manipulating the player. If you have no experience with web development, what you need to do is probably completely new to you, but not too difficult to learn. Here’s an overview, skip to the end (the query string section) if you already know HTML and URL structures:

Embedding a youtube player inside Qualtrics is done using an Inline Frame (IFrame for short).

Ex. code

<jframe width="560" height="315" src="//" frameborder="0" allowfullscreen></jframe>

An iframe is an element on a webpage that itself is a webpage (i.e. a webpage within a webpage). That iframe webpage can be something you create, or it can be something from somewhere completely different, like a YouTube video. The webpage you load for YouTube videos is different than the one you normally view videos on. It is a page that only contains the video. Hence, on your page (your qualtrics survey) it looks like all you are doing is putting in a video player. Know that what you are really doing is putting in a separate web page that contains a video player.

That iframe element above has some attributes. The first is the element type, “iframe.” This tells the browser how to display it and how it functions. The second and third are width and height. This is the width and height of your frame. The YouTube player in your target webpage is set to take up the entire page. Changing the values after width and height will change the size of your YouTube player (note: if the YouTube player was not set to take up the entire page it is on, changing this will not change the YouTube player size, just the size of your iframe element on your page; useless info for now, but may be important in the future). Skipping the src attribute for now, the last attribute is allowfullscreen. There is no value associated with it because it is simply a binary property. If it has this attribute, it is true, and each browser that supports HTML5 has a way of allowing your iframe to be made into full screen mode. If you remove this, it can’t be made into full screen mode. For experimental control, I usually remove this.

The src attribute tells the iframe where to load your target webpage from. If you copy and paste that URL above into your browser you will get a page that is entirely taken up by your YouTube video (the URL I chose is for a different Michotte task video I just searched for). It is not a complete URL for a couple of reasons. 1) There are some things missing (e.g. http or https) for purposes beyond what we need to talk about today. 2) Some elements of URLs are not necessary each time. Below is a URL with every possible element. I will go over the elements.

The official syntax is below, and below that is a slightly more readable version




The scheme or protocol is usually http or https but there are others like ftp. Not important, just use whatever youtube spits out (which will match whatever the page you are loading the video on uses).

The domain contains three sub-components: sub-domain, domain, and top-level domain. In the example, calendar is a subdomain. Basically, it is a section of the Google website. Google is the domain. This is the primary identifier. com is the top-level domain, used for organizing websites by purpose. Pretty much meaningless now (though it was important in 1992, I’m sure).

Ports you don’t have to worry about, if you are interested just go to Wikipedia.

The filepath is just like a file path on your computer except it is the path on a server. This is oversimplified, but basically how it works. There are folders on the server, and to find your file it will go through the path you tell it to. The file is usually a web page. The YouTube example at the top is in a folder called embed. Within the embed folder is a folder called e_jKNlC2YKo. There is no file name. This is because there is a file inside that folder called index.html (or index.php or any of several file extensions). When no filename is specified it opens the index page. If no index page exists, the server may make one for you (if it is configured to do so). It is just a list of files in the folder.

Skipping one section, the last part is the fragment id. This can have many different meanings (again, go to Wikipedia for all of them). You won’t use it now, but basically what it’ll typically do is take you to a specific part of a website. For instance, when you are on a Wikipedia article (e.g. Albert Michotte’s) and click on a section of the table of contents, it takes you to that part of the article. How it does that is by adding a fragment identifier to the URL (e.g. #Early_work), which takes you to the section of the article (the HTML element with the attribute id=”Early_work”).

The query string section is what is important for manipulating the player. A query string contains data that gets passed to the web page and is used to manipulate that page. There are variables and values. The query string starts with a question mark (?), then a variable name, an equals sign and a variable value (technically the equals sign and value are optional). If you have two or more variables, you separate them with an ampersand (&). For instance, the example at the top passes the page an id, which could be the id associated with your account. That tells the page to load content specific to your account. Then there is a variable for whether or not you are logged in. Adding random data doesn’t do anything. The webpage has to be programmed to use data that may be in the query string. YouTube is programmed with certain variables that it can get from the query string. These variables change how the YouTube player operates. Here is a page that has all the parameters:
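
To make the variable/value structure concrete, here is a rough sketch of how a page might read its own query string. The function name parseQueryString, the example.com address, and the id and loggedin variables are all made up for illustration; real pages (including YouTube) do this in their own way.

```javascript
// Splits the part of a URL between "?" and "#" into name/value pairs.
// parseQueryString is a hypothetical helper, written only for illustration.
function parseQueryString(url) {
  var result = {};
  var start = url.indexOf("?");
  if (start === -1) return result; // no query string at all
  var end = url.indexOf("#", start); // the fragment id is not part of the query
  var query = end === -1 ? url.substring(start + 1) : url.substring(start + 1, end);
  var pairs = query.split("&");
  for (var i = 0; i < pairs.length; i++) {
    if (pairs[i] === "") continue;
    var eq = pairs[i].indexOf("=");
    // The equals sign and value are optional, so a bare name gets an empty value
    var name = eq === -1 ? pairs[i] : pairs[i].substring(0, eq);
    var value = eq === -1 ? "" : pairs[i].substring(eq + 1);
    result[decodeURIComponent(name)] = decodeURIComponent(value);
  }
  return result;
}

var parsed = parseQueryString("https://example.com/page?id=123&loggedin=true#top");
console.log(parsed.id, parsed.loggedin); // 123 true
```

Note that every value comes back as text. The page decides what, if anything, to do with it.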

What you want to do is set controls to 0 (which removes them). This allows a person to click the play button to start a video, but not rewind the video (they can rewatch it though after everything is completed). There are others that are good for experimental control purposes. Setting modestbranding to 1 uses a small YouTube logo instead of barraging the viewer with the knowledge that they are on YouTube and not somewhere else. Setting showinfo to 0 removes the name of the video and who uploaded it. Setting rel to 0 removes suggested videos from the end of your video (you don’t want people viewing a funny cat video after your experimental task).

So your Youtube url looks like this:

And your code for embedding the video into Qualtrics is this:

<jframe src="//;modestbranding=1&amp;showinfo=0&amp;rel=0" height="315" width="560" allowfullscreen="" frameborder="0"></jframe>
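
If you embed several videos, it can be easier to build the URL from a list of parameters than to type it by hand. Below is a minimal sketch. The function name buildEmbedUrl and the placeholder VIDEO_ID are my own, and YouTube changes which parameters it honors over time, so check the parameter page before relying on any of them.

```javascript
// Builds a YouTube embed URL from a video id and an object of player parameters.
// buildEmbedUrl and "VIDEO_ID" are placeholders for illustration only.
function buildEmbedUrl(videoId, params) {
  var parts = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      parts.push(encodeURIComponent(name) + "=" + encodeURIComponent(params[name]));
    }
  }
  var base = "https://www.youtube.com/embed/" + encodeURIComponent(videoId);
  // "?" starts the query string, "&" separates the variables
  return parts.length ? base + "?" + parts.join("&") : base;
}

console.log(buildEmbedUrl("VIDEO_ID", { controls: 0, modestbranding: 1, showinfo: 0, rel: 0 }));
// https://www.youtube.com/embed/VIDEO_ID?controls=0&modestbranding=1&showinfo=0&rel=0
```

When you paste the result into an HTML attribute, each & should be written as &amp; as in the embed code above.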

The only way to prevent people from rewatching a video is to have it autoplay (add &autoplay=1 to your URL), and tell Qualtrics to auto-advance the page after X seconds (X being however long your video is). You can easily do this using the Timing question. There is a chance though that it takes a while for a person to load the YouTube video. The page will still advance after, say, 5 seconds, even if it took the computer 3 seconds to load the video, and the person only saw 2 seconds of the video. You can control for this by asking people if they had any technical difficulties and just removing their rows of data if they did.

Pausing is still possible. If you want to prevent pausing, the only way I currently know how to do it is to put an invisible div HTML element on top of your YouTube player. You also have to edit the player to change what is called the wmode. This isn’t in the docs, but it sets the Adobe Flash version of the player to act like any other HTML element. Normally Flash videos appear on top of anything else (we need a div on top of everything else). The wmode should always be set first in your query string. Not sure why, but all the forums I found said to do it that way. The div element needs to have some CSS properties set (done in the style attribute of the HTML tag). As you’ll see below, you need to set the positioning type (either relative or absolute), the actual positioning (top and left), the z-index (telling it to display on top of elements with a lower z-index value) and the transparency (using both the Internet Explorer style and the style every other browser uses).

So your embedded player code looks like this:

<jframe src="//;controls=0&amp;modestbranding=1&amp;showinfo=0&amp;rel=0" height="315" width="560" allowfullscreen="" frameborder="0"></jframe>

Your div looks like this (and should be pasted exactly below your YouTube player):

<div style="position: relative;top:-315px;left:0px;width: 560px;height:315px;background-color: white;z-index:2;opacity:0.0;filter: alpha(opacity = 0)"></div>

Or, if you use absolute positioning, like this:


<div style="position: absolute;top:0px;left:0px;width: 560px;height:315px;background-color: white;z-index:2;opacity:0.0;filter: alpha(opacity = 0)"></div>

The difference between absolute and relative is that relative will by default go wherever it is placed, and then you tell it to go up 315 pixels (top= -315, or -315 pixels from where the top normally is). Absolute starts at the 0 point of the parent element (in this case another div element that contains both your YouTube player and your cover div; that containing div needs its own position set, e.g. position: relative, for this to work). Since the YouTube player displays at the 0 point, you don’t need to change the top or left style of your div. You can use either; in some scenarios, one might make more sense than the other. Width and height have to be set to exactly the width and height of your iframe. Background-color doesn’t really matter since it’s transparent. Z-index has to be higher than the z-index of your iframe. By default the iframe’s z-index is auto (effectively 0), so 2 works here. If you want to be sure, set it to 50000. Opacity is the opacity for current versions of all browsers. Filter: alpha(opacity) is for IE 8 and earlier, which a lot of people still use. Opacity goes from 0-1, filter:alpha(opacity) goes from 0-100.
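
Because the cover div has to match the iframe’s width and height exactly, it can help to generate it from the same two numbers. The sketch below does that for either positioning type. The function name coverDivHtml is invented for this example, and it only produces the div, not the player or the containing div.

```javascript
// Returns the HTML for a transparent cover div matching a player of the given size.
// positioning is "relative" (div is shifted up over the player) or "absolute".
// coverDivHtml is a hypothetical helper name.
function coverDivHtml(width, height, positioning) {
  var top = positioning === "relative" ? -height : 0; // relative: move up by the player height
  var style = [
    "position:" + positioning,
    "top:" + top + "px",
    "left:0px",
    "width:" + width + "px",
    "height:" + height + "px",
    "background-color:white",
    "z-index:2",
    "opacity:0",               // current browsers, scale 0-1
    "filter:alpha(opacity=0)"  // IE 8 and earlier, scale 0-100
  ].join(";");
  return '<div style="' + style + '"></div>';
}

console.log(coverDivHtml(560, 315, "relative"));
console.log(coverDivHtml(560, 315, "absolute"));
```

If you later resize the player, you only have to change the two numbers in one place.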

Sorting a 3 dimensional array in Excel

Never learned Visual Basic. Never thought I would use Excel enough to learn VBA. Then someone asked me to alphabetically order 500 individual rows of what can be characterized as a 3 dimensional array. Rather than spend 5 hours doing that by hand I spent 6 learning VBA and created a macro for it.

A quick overview: I received a file with about 500 rows. Each row had 220 4-column groupings of data (i.e. columns 1-4 go together and must stay together when sorting, 5-8 go together, etc.). This is probably the most logical way to put a 3 dimensional array in Excel. It makes it hard to sort though, since Excel can’t do the grouping very well. I solve this by concatenating the groupings with a delimiter. I then sort, create columns, and split the concatenated cells into columns again. To run the macro you need data with no headers and 4 columns per grouping. If your data is different you’ll need to edit the parameters in the macro.

Sub SortData()
  Application.ScreenUpdating = False
  'concatenate data using ! as delimiter, clearing previous contents of cells
  For rowx = 1 To Cells(Rows.Count, 1).End(xlUp).Row
    For colx = 1 To Cells(rowx, Columns.Count).End(xlToLeft).Column Step 4
      Cells(rowx, colx) = Cells(rowx, colx).Value() & "!" & Cells(rowx, colx + 1).Value() & "!" & Cells(rowx, colx + 2).Value() & "!" & Cells(rowx, colx + 3).Value()
      Cells(rowx, colx + 1).ClearContents
      Cells(rowx, colx + 2).ClearContents
      Cells(rowx, colx + 3).ClearContents
    Next colx
  Next rowx
  'Sort Rows Individually (the data has no headers, so Header is xlNo; blank cells end up at the right)
  For r = 1 To Cells(Rows.Count, 1).End(xlUp).Row
    Range(Cells(r, 1), Cells(r, Columns.Count)).Select
    Selection.Sort Key1:=Cells(r, 2), Order1:=xlAscending, Header:=xlNo, _
    OrderCustom:=1, MatchCase:=False, Orientation:=xlLeftToRight, _
    DataOption1:=xlSortNormal
  Next r
  'Insert columns after each entry (the column count is taken from row 1, so row 1 should have as many groupings as any other row)
  For colx = 2 To Cells(1, Columns.Count).End(xlToLeft).Column
    Columns((colx - 1) * 4 - 2).Insert Shift:=xlToRight
    Columns((colx - 1) * 4 - 2).Insert Shift:=xlToRight
    Columns((colx - 1) * 4 - 2).Insert Shift:=xlToRight
  Next
  'Split cells using ! as delimiter
  For colx = 1 To Cells(1, Columns.Count).End(xlToLeft).Column Step 4
    Columns(colx).Select
    Selection.TextToColumns Destination:=Cells(1, colx), DataType:=xlDelimited, _
      TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
      Semicolon:=False, Comma:=False, Space:=False, Other:=True, OtherChar _
      :="!", FieldInfo:=Array(Array(1, 1), Array(2, 1), Array(3, 1), Array(4, 1)), _
      TrailingMinusNumbers:=True
  Next
  Application.ScreenUpdating = True
  Range("A1").Select
End Sub
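
For anyone who finds the macro hard to follow, here is the same concatenate-sort-split idea for a single row, written as a short JavaScript sketch rather than VBA. The function name sortRowByGroups and the sample row are made up, and the comparison here is a simple case-insensitive text comparison, which will not match Excel’s sort order in every case.

```javascript
// Sorts one row of data by groups of `groupSize` cells, keeping each group together.
// sortRowByGroups is a hypothetical name; this mirrors the macro's idea, not its exact behavior.
function sortRowByGroups(row, groupSize) {
  var groups = [];
  for (var i = 0; i < row.length; i += groupSize) {
    groups.push(row.slice(i, i + groupSize)); // cells 1-4, 5-8, ... stay together
  }
  groups.sort(function (a, b) {
    // Compare the groups as if they were concatenated with "!" like the macro does
    var ka = a.join("!").toLowerCase();
    var kb = b.join("!").toLowerCase();
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
  return [].concat.apply([], groups); // flatten back into one row
}

var row = ["pear", 1, 2, 3, "apple", 4, 5, 6];
console.log(JSON.stringify(sortRowByGroups(row, 4))); // ["apple",4,5,6,"pear",1,2,3]
```

The macro has to take the detour through concatenated cells because Excel’s sort works on single cells. In a language with nested arrays the groups can simply be sorted directly.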
