the:behavioral:lab

Allow only specified workers to complete your Mturk HITs

The other day I posted some code that allows mturk requesters to exclude workers based on a black list (here). This is typically used when someone already completed your survey, but for one reason or another you can’t just extend your already created HIT. The code in there actually started as the inverse. Instead of wanting to block certain people, a colleague wanted to only include certain people. Rather than dealing with the under-powered qualification system (under-powered for mturk web interface users, at least), I figured it would be easy to just create a white list of worker IDs and exclude everyone else from the HIT. This seemed faster than creating a qualification test or some other method of including/excluding workers.

Again, this is very flexible and easily changeable. I, for instance, don’t rely on client-side programs (i.e. JavaScript) for validating users, since a list declared in JavaScript is visible to anyone who views the page source. Instead of checking the ID against a list declared in JavaScript, I send the data to a server-side PHP script. That script returns whether the ID validated and, if it did, the URL to send the worker to. To do the same, remove the url variable initialization and replace the check() call made when the HIT is accepted with an AJAX request.
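A minimal sketch of that server-side variant follows. It assumes a hypothetical validation endpoint that answers with JSON such as {"valid": true, "url": "..."}. The post does not show the actual PHP script or its reply format. Because the HIT page is served from MTurk, the endpoint would need to be an absolute URL on your own server that allows cross-origin requests. The two pure functions can be tested outside the browser. validateOnServer() only runs in a browser, where XMLHttpRequest exists.

```javascript
// ASSUMPTION: the endpoint and the reply shape {"valid": true|false, "url": "..."} are hypothetical.

// Build the GET request URL that carries the worker ID to the validation script.
function buildValidationUrl(endpoint, workerId) {
  const sep = endpoint.indexOf("?") === -1 ? "?" : "&"; // start or extend the query string
  return endpoint + sep + "workerId=" + encodeURIComponent(workerId);
}

// Interpret the server's reply; only trust a survey URL when the ID validated.
function parseValidationReply(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (e) {
    return { valid: false, url: null }; // a malformed reply counts as "not validated"
  }
  if (data && data.valid === true && typeof data.url === "string") {
    return { valid: true, url: data.url };
  }
  return { valid: false, url: null };
}

// Browser only: this would replace the check(queryString['workerId']) call once the HIT is accepted.
function validateOnServer(endpoint, workerId, onResult) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", buildValidationUrl(endpoint, workerId));
  xhr.onload = function () {
    onResult(xhr.status === 200 ? parseValidationReply(xhr.responseText) : { valid: false, url: null });
  };
  xhr.onerror = function () {
    onResult({ valid: false, url: null }); // network failure: treat as not validated
  };
  xhr.send();
}
```

The callback would then write either the survey link (using the returned url) or the "not qualified" message into the idcheck div.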

The code below is copy-and-pastable. Just change the title, instructions and target URL (where to send workers who are allowed), and create a white list of worker IDs. See the post linked at the top for how. In short, wrap every ID in quotation marks and separate them with commas. Last, change the conditional instructions to tell workers what to do when they are on or off the list.

Here’s the code (also here):

<h1>TITLE - change this</h1>
<p>INSTRUCTIONS - change this</p>
<div id="idcheck">&nbsp;</div>
<p><textarea cols="80" name="comment" rows="3"></textarea></p>
<script type='text/javascript'>
var url="URL GOES HERE";//The URL that the user will go to after they are checked (i.e. the survey url). Change this.
var workers=new Array("test1","test2","test3");//This is your white list. Change this. All IDs must be enclosed in quotation marks (single or double) and separated by commas
//Utility function that checks if a given value is in an array. Used to check if the id (given value) is in the white list (an array).
//It loops by index rather than with for...in, which would also visit any properties added to Array.prototype.
function listContains(arr, k) {
  for(var i = 0; i < arr.length; i++)
    if(arr[i] === k)
      return true;
  return false;
}
//HTML instructions and form for ID input
var naCheck="<p>To see if you are qualified for this survey, enter your worker ID below and click \'Check ID\'</p><p><textarea rows='1' cols='20' id='idinput'></textarea></p><div id='ok' onclick='check(false)' style='border: 3px solid black;cursor: hand; cursor: pointer; background-color:gray;color:white;width:70px;height:20px;text-align:center;margin-left:10px;'>Check ID</div>";
var href=window.location.href.toString();//Get URL of mturk webpage which may or may not include the worker ID
var queryString={};//storage variable for query string variables
href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"),function($0, $1, $2, $3){queryString[$1] = $3;});//Populates variable named queryString (created above) with the actual queryString variable names and values
//If queryString contains a variable named workerId, check whether the white list contains its value; if not, ask the user to input their ID.
if(queryString['workerId']!=undefined)
{
     check(queryString['workerId']);//Check ID against white list
}
else
{
     document.getElementById('idcheck').innerHTML=naCheck;//Add HTML necessary to ask for id input
}
//This function checks the given ID against the white list, or, if none is given (id === false), checks the ID typed into the text box
function check(id){
     if(id===false){
          var typedId=document.getElementById('idinput').value.replace(/^\s+|\s+$/g,'');//Strip spaces/line breaks picked up when pasting the ID
          if(listContains(workers, typedId))
          {
               //Input ID found on list -- ask to accept, when this happens the page reloads and the program starts over
               document.getElementById("idcheck").innerHTML="Your ID is on the list of qualified workers. Please accept the HIT to continue.";
          }
          else
          {
               //Input ID not found on list -- ask not to accept
               document.getElementById("idcheck").innerHTML="Your ID is not on the list of qualified workers. Please do not accept the HIT.";
          }
     }
     else
     {
          if(listContains(workers, id))
          {
               //Actual ID found on list -- send to survey
               var sep=(url.indexOf('?')==-1)?'?':'&';//Start a query string with ? unless the survey URL already has one
               document.getElementById("idcheck").innerHTML="<p>To proceed to the survey <a href='"+url+sep+"workerId="+encodeURIComponent(id)+"' target='mturksurvey'>click here</a></p>";
          }
          else
          {
               //Actual ID not found on list -- Ask to return HIT
               document.getElementById("idcheck").innerHTML="Your ID is not on the list of qualified workers. Please return the HIT. If you previously entered your ID and were told to accept the HIT, you likely did not enter the correct worker ID.";
          }
     }
}
</script>

Excluding Mturk workers from your HITs (not just from your Qualtrics surveys)

My method for excluding Mturk workers from Qualtrics surveys (found here) is rather easy to use, but it is pretty much limited to Qualtrics and Survey Gizmo. It provides helpful insight into features of the two systems, but I thought I’d develop a method to help the broader Mturk user (and those who use Survey Monkey). This is a rather simple JavaScript-based system for telling workers whether or not you have them on a black list. Being on the black list means they have done your task before (e.g. taken your survey). Being off the list means they are AOK.

The system is flexible; there is plenty about it you can change. I included the complete HTML so people can just copy and paste it, changing only the necessary parameters. There are plenty of instructions (basically everything within double quotation marks) that you can reword if you don’t like them. It is also designed for survey use, where your worker accepts the HIT and is immediately sent to an outside webpage (i.e. the survey). If you need the worker to do something else, you’ll have to change what happens in the last conditional of the check() function.

The logic of the system is:
1) User enters
2) System checks if HIT is accepted (by seeing if there is a workerId query string variable in URL)
3) If not accepted
a. Ask for ID input
b. Check given ID against black list
c. If on list, tell not to accept. If not on list, tell to accept
d. Accepting HIT reloads the page, starting program over again (i.e. start back at 1)
4) If accepted
a. Get workerId variable from query string in URL
b. Check actual ID against black list
c. If on list, ask to return HIT (i.e. they lied about their ID before or did not follow instructions). If not on list, give hyperlink and instructions for going to your webpage (e.g. survey).
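The steps above can be sketched as one DOM-free function, which makes the branching easy to read and test on its own. The function name decide() and the returned labels are hypothetical. The actual template writes messages into the idcheck div instead.

```javascript
// queryWorkerId: workerId from the URL (undefined before the HIT is accepted)
// typedId: ID typed into the text box (undefined if nothing was checked yet)
function decide(queryWorkerId, typedId, blackList) {
  const onList = (id) => blackList.indexOf(id) !== -1;
  if (queryWorkerId === undefined) {
    // Step 3: HIT not accepted yet
    if (typedId === undefined) return "ask-for-id"; // 3a
    return onList(typedId) ? "do-not-accept" : "accept"; // 3b-3c
  }
  // Step 4: HIT accepted, so check the real ID from the URL
  return onList(queryWorkerId) ? "return-hit" : "show-survey-link"; // 4b-4c
}
```

For the white-list version, the outcomes in each branch are simply swapped.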

There are 4 parameters that need to be changed. The first two are simple HTML edits: replace the title and instructions with your own. You can do this in the rich text editor, if you wish, after pasting this code into the Source code area of the HIT template creator. The third parameter is the URL to send users to. Just paste your Survey Monkey, Qualtrics, or other URL between the double quotation marks. The last parameter is a little more difficult. You need to create a comma-separated list of IDs, each enclosed in quotation marks. Typically you do this in Excel:

1) Download your previous data files.
2) Compile the workerIds in one column (say column A).
3) In the column next to it, add the quotation marks and commas with CONCATENATE. For example, put =CONCATENATE(CHAR(34),A1,CHAR(34),",") in cell B1, then copy the formula down for each row. CHAR(34) is the double quotation mark; typing it directly would end the formula’s text.
4) Copy and paste the resulting list between the parentheses of the workers variable, and delete the comma after the last ID.
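If Excel is not handy, the same list can be built with a few lines of JavaScript (for example, in the browser console). This is a hypothetical helper, not part of the template. Paste in a column of IDs, one per line, and it returns the quoted, comma-separated text. Blank lines and duplicates are removed, and there is no trailing comma.

```javascript
// Turn newline-separated worker IDs into the text that goes inside new Array(...).
function toQuotedIdList(text) {
  const seen = new Set();
  const ids = [];
  for (const line of text.split(/\r?\n/)) {
    const id = line.trim(); // drop stray spaces from copy/paste
    if (id !== "" && !seen.has(id)) { // skip blank lines and duplicate IDs
      seen.add(id);
      ids.push(JSON.stringify(id)); // wraps the ID in double quotes (escaping any inside it)
    }
  }
  return ids.join(",");
}
```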

Last, note that this code sends the worker ID to your survey as a workerId query string variable. In Qualtrics you can store the ID by creating an embedded data element named workerId.
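Here is a sketch of building that link with the browser’s built-in URL class. It assumes the survey URL is absolute, i.e. it starts with http:// or https://. surveyLinkFor() is a hypothetical name. The URL class adds ? or & as needed and encodes the ID, so the link stays valid even when the survey URL already carries query string variables.

```javascript
// Add workerId to a survey URL, keeping any query string variables it already has.
function surveyLinkFor(surveyUrl, workerId) {
  const link = new URL(surveyUrl); // throws if surveyUrl is not an absolute URL
  link.searchParams.set("workerId", workerId); // adds ? or & as needed and encodes the value
  return link.toString();
}
```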

Here is the commented code. Or you can get it from GitHub: https://gist.github.com/TheBehavioralLab/6770997

<h1>TITLE - change this</h1>
<p>INSTRUCTIONS - change this</p>
<div id="idcheck">&nbsp;</div>
<p><textarea cols="80" name="comment" rows="3"></textarea></p>
<script type='text/javascript'>
var url="URL GOES HERE";//The URL that the user will go to after they are checked (i.e. the survey url). Change this.
var workers=new Array("test1","test2","test3");//This is your black list. Change this. All IDs must be enclosed in quotation marks (single or double) and separated by commas
//Utility function that checks if a given value is in an array. Used to check if the id (given value) is in the black list (an array).
//It loops by index rather than with for...in, which would also visit any properties added to Array.prototype.
function listContains(arr, k) {
  for(var i = 0; i < arr.length; i++)
    if(arr[i] === k)
      return true;
  return false;
}
//HTML instructions and form for ID input
var naCheck="<p>To see if you have completed this survey already, enter your worker ID below and click \'Check ID\'</p><p><textarea rows='1' cols='20' id='idinput'></textarea></p><div id='ok' onclick='check(false)' style='border: 3px solid black;cursor: hand; cursor: pointer; background-color:gray;color:white;width:70px;height:20px;text-align:center;margin-left:10px;'>Check ID</div>";
var href=window.location.href.toString();//Get URL of mturk webpage which may or may not include the worker ID
var queryString={};//storage variable for query string variables
href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"),function($0, $1, $2, $3){queryString[$1] = $3;});//Populates variable named queryString (created above) with the actual queryString variable names and values
//If queryString contains a variable named workerId, check whether the black list contains its value; if not, ask the user to input their ID.
if(queryString['workerId']!=undefined)
{
     check(queryString['workerId']);//Check ID against black list
}
else
{
     document.getElementById('idcheck').innerHTML=naCheck;//Add HTML necessary to ask for id input
}
//This function checks the given ID against the black list, or, if none is given (id === false), checks the ID typed into the text box
function check(id){
     if(id===false){
          var typedId=document.getElementById('idinput').value.replace(/^\s+|\s+$/g,'');//Strip spaces/line breaks picked up when pasting the ID
          if(listContains(workers, typedId))
          {
               //Input ID found on list -- taken the survey before, ask not to accept
               document.getElementById("idcheck").innerHTML="Your ID is on the list of workers who have completed this survey already. Please do not accept the HIT. Thank you for your previous participation.";
          }
          else
          {
               //Input ID not found on list -- ask to accept, when this happens it will start the program over again, this time checking their actual ID not their inputted ID, which should be the same
               document.getElementById("idcheck").innerHTML="Your ID is not on the list of workers who have completed this survey. Please accept the HIT to continue.";
          }
     }
     else
     {
          if(listContains(workers, id))
          {
               //Actual ID found on list -- ask to return HIT (this should rarely happen, if ever; it basically means the person did not follow instructions or lied beforehand)
               document.getElementById("idcheck").innerHTML="Your ID is on the list of workers who have completed this survey already. Please return the HIT. If you previously entered your ID and were told to accept the HIT, you likely did not enter the correct worker ID.";
          }
          else
          {
               //Actual ID not found on list -- send to survey
               var sep=(url.indexOf('?')==-1)?'?':'&';//Start a query string with ? unless the survey URL already has one
               document.getElementById("idcheck").innerHTML="<p>To proceed to the survey <a href='"+url+sep+"workerId="+encodeURIComponent(id)+"' target='mturksurvey'>click here</a></p>";
          }
     }
}
</script>

Using YouTube for Experimental Video Display

[Headnote: WordPress, geniuses that they are, removes iframes and auto-embeds youtube urls, even if they are escaped and within code tags. Therefore, in my code here I use jframes and link to utube. If you want to just copy and paste code, make sure you search and replace jframe to iframe and utube to youtube and &amp; to &.]

One of the most common problems I help people solve is how to display video for experimental purposes. The YouTube player shows links to other videos, information about the title and uploader, etc., and experimenters don’t want to show that. There may be a better service for online video display for experiments (please mention them in the comments), but I find manipulating the YouTube player easy enough, and it serves most of my purposes. Below is an email I wrote to a person who needed help with this very problem. Since I could not help the person face to face, I included a lot of additional information to help explain what is really going on. Advanced readers can ignore what I oversimplify, but feel free to correct what I got completely wrong. If you have a background in HTML you can probably just skip to the end. Also see the headnote above as to why the URLs don’t work.

Q: How can I display videos in Qualtrics? Youtube includes rewind buttons and other things I don’t want.

A: Your best bet is using YouTube and manipulating the player. If you have no experience with web development, what you need to do is probably completely new to you, but not too difficult to learn. Here’s an overview, skip to the end (the query string section) if you already know HTML and URL structures:

Embedding a youtube player inside Qualtrics is done using an Inline Frame (IFrame for short).

Ex. code

<jframe width="560" height="315" src="//www.utube.com/embed/e_jKNlC2YKo" frameborder="0" allowfullscreen></jframe>

An iframe is an element on a webpage that itself is a webpage (i.e. a webpage within a webpage). That iframe webpage can be something you create, or it can be something from somewhere completely different, like a YouTube video. The webpage you load for YouTube videos is different from the one you normally view videos on. It is a page that only contains the video. Hence, on your page (your Qualtrics survey) it looks like all you are doing is putting in a video player. What you are really doing is putting in a separate web page that contains a video player.

That iframe element above has some attributes. First comes the element type, “iframe.” This tells the browser how to display it and how it functions. Next are width and height, which set the width and height of your frame in pixels. The YouTube player in your target webpage is set to take up the entire page, so changing the values after width and height will change the size of your YouTube player. (Note: if the YouTube player were not set to take up the entire page it is on, changing these would only change the size of the iframe element on your page, not the player. That is useless info for now, but it may be important in the future.) Skipping the src attribute for now, frameborder="0" simply removes the border around the frame. The last attribute is allowfullscreen. There is no value associated with it because it is a boolean property. If the attribute is present, it is true, and each browser that supports HTML5 has a way of making your iframe full screen. If you remove it, the player can’t be made full screen. For experimental control, I usually remove this.

The src attribute tells the iframe where to load your target webpage from. If you copy and paste that URL into your browser, you will get a page that is entirely taken up by your YouTube video (the URL I chose is for a different Michotte task video I just searched for). It is not a complete URL, for a couple of reasons:

1) Some things are missing, such as the scheme (http or https). A URL that starts with // is protocol-relative, so the browser fills in whatever scheme the surrounding page uses.
2) Some elements of URLs are not necessary every time.

Below is a URL with every possible element. I will go over the elements.

https://calendar.google.com:443/folderName/anotherFolderName/fileName.html?id=1234&loggedin=true#main_content

The official syntax is below, and below that is a slightly more readable version

scheme://domain:port/path?query_string#fragment_id

aka

protocol://subdomain.domain.top-level_domain:port/filepath?query_variable=query_value#fragment_id
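To see these parts concretely, the example URL can be pulled apart with the URL class that browsers (and Node.js) provide. Note that the port comes back empty: 443 is the default port for https, so the parser drops it.

```javascript
// Split the example URL into the parts named above.
const example = new URL(
  "https://calendar.google.com:443/folderName/anotherFolderName/fileName.html?id=1234&loggedin=true#main_content"
);
console.log(example.protocol); // https:  (the scheme, with its colon)
console.log(example.hostname); // calendar.google.com  (subdomain.domain.top-level domain)
console.log(example.port);     // prints an empty line: 443 is the https default, so it is dropped
console.log(example.pathname); // /folderName/anotherFolderName/fileName.html
console.log(example.search);   // ?id=1234&loggedin=true
console.log(example.hash);     // #main_content
```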

The scheme or protocol is usually http or https but there are others like ftp. Not important, just use whatever youtube spits out (which will match whatever the page you are loading the video on uses).

The domain contains three sub-components: subdomain, domain, and top-level domain. In the example, calendar is the subdomain; basically, it is a section of the Google website. google is the domain, the primary identifier. com is the top-level domain, used for organizing websites by purpose. It is pretty much meaningless now (though it was important in 1992, I’m sure).

Ports you don’t have to worry about, if you are interested just go to Wikipedia.

The filepath is just like a file path on your computer, except it is the path on a server. This is oversimplified, but it is basically how it works: there are folders on the server, and the server goes through the path you give it to find the file you are trying to load. The file is usually a web page. The YouTube example at the top is in a folder called embed. Within the embed folder is a folder called e_jKNlC2YKo. There is no file name. When no filename is specified, the server opens the folder’s index page (index.html, index.php, or one of several other file extensions). If no index page exists, some servers generate a directory listing (just a list of the files in the folder), while others return an error.

Skipping one section, the last part is the fragment id. This has many different meanings (again, see Wikipedia for all of them). You won’t use it now, but what it typically does is take you to a specific part of a web page. For instance, when you are on a Wikipedia article (e.g. Albert Michotte’s, http://en.wikipedia.org/wiki/Albert_Michotte) and click on a section in the table of contents, it takes you to that part of the article. It does that by adding a fragment identifier to the URL (e.g. #Early_work), which takes you to that section of the article (the HTML element with the attribute id="Early_work").

The query string section is what is important for manipulating the player. A query string contains data that gets passed to the web page and is used to manipulate that page. There are variables and values. The query string starts with a question mark (?), then a variable name, an equals sign and a variable value (technically the equals sign and value are optional). If you have two or more variables, you separate them with an ampersand (&). For instance, the example URL above passes an id (which could be the ID associated with your account), telling the page to load content specific to your account. It also passes a loggedin variable saying whether or not you are logged in. Adding random data doesn’t do anything; the web page has to be programmed to use data that may be in the query string. YouTube is programmed with certain variables that it can get from the query string, and these variables change how the YouTube player operates. Here is a page that has all the parameters: https://developers.google.com/youtube/player_parameters#Parameters
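Here is how a page might read query string variables in JavaScript, using the URLSearchParams class available in modern browsers and Node.js instead of a hand-written regular expression. queryVars() is a hypothetical helper. Every value comes back as a string, and a variable with no equals sign gets an empty string.

```javascript
// Turn a URL's query string into a plain object of variable names and values.
function queryVars(href) {
  return Object.fromEntries(new URL(href).searchParams);
}

const vars = queryVars("https://calendar.google.com/fileName.html?id=1234&loggedin=true#main_content");
console.log(vars.id);       // 1234 (a string, not a number)
console.log(vars.loggedin); // true (also a string)
```

On an MTurk HIT page, queryVars(window.location.href).workerId would read the workerId variable that the templates above look for.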

What you want to do is set controls to 0, which removes the controls. This allows a person to click the play button to start a video, but not to rewind it (though they can rewatch it after it has finished). Other parameters are also useful for experimental control:

- Setting modestbranding to 1 uses a small YouTube logo instead of barraging the viewer with the knowledge that they are on YouTube and not somewhere else.
- Setting showinfo to 0 removes the name of the video and who uploaded it.
- Setting rel to 0 removes suggested videos from the end of your video (you don’t want people viewing a funny cat video after your experimental task).

So your Youtube url looks like this:

http://www.utube.com/embed/e_jKNlC2YKo?controls=0&modestbranding=1&showinfo=0&rel=0
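The same URL can be assembled from a video ID and a set of parameters. This sketch uses a hypothetical embedUrl() helper and the real youtube.com domain rather than the utube placeholder. YouTube has retired or changed some player parameters since this was written (showinfo, for example, is no longer honored), so check them against the parameters page linked above.

```javascript
// Assemble a YouTube embed URL from a video ID and player parameters.
function embedUrl(videoId, params) {
  const query = new URLSearchParams(params).toString(); // keeps the order the keys are listed in
  return "https://www.youtube.com/embed/" + encodeURIComponent(videoId) + (query ? "?" + query : "");
}

const src = embedUrl("e_jKNlC2YKo", { controls: 0, modestbranding: 1, showinfo: 0, rel: 0 });
console.log(src);
// https://www.youtube.com/embed/e_jKNlC2YKo?controls=0&modestbranding=1&showinfo=0&rel=0
```

Because the key order is preserved, listing wmode first in the object puts it first in the query string, as the pausing advice below recommends.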

And your code for embedding the video into Qualtrics is this:

<jframe src="//www.utube.com/embed/e_jKNlC2YKo?controls=0&amp;modestbranding=1&amp;showinfo=0&amp;rel=0" height="315" width="560" allowfullscreen="" frameborder="0"></jframe>

The only way to prevent people from rewatching a video is to have it autoplay (add &autoplay=1 to your URL) and tell Qualtrics to auto-advance the page after X seconds (X being however long your video is). You can easily do this using the Timing question. There is a chance, though, that it takes a while for a person to load the YouTube video. The page will still advance after, say, 5 seconds, even if it took the computer 3 seconds to load the video and the person only saw 2 seconds of it. You can control for this by asking people whether they had any technical difficulties and removing their rows of data if they did.

Pausing is still possible. If you want to prevent pausing, the only way I currently know of is to put an invisible div HTML element on top of your YouTube player. You also have to edit the player to change what is called the wmode. This isn’t in the docs, but it sets the Adobe Flash version of the player to act like any other HTML element. Normally Flash videos appear on top of everything else, and we need the div on top of everything else. The wmode should always be set first in your query string. Not sure why, but all the forums I found said to do it that way. The div element needs to have some CSS properties set (done in the style attribute of the HTML tag). As you’ll see below, you need to set:

- the positioning type (either relative or absolute);
- the actual positioning (top and left);
- the z-index (elements with a higher z-index are displayed on top of elements with a lower one);
- the transparency (using both the Internet Explorer and the every-other-browser transparency styles).

So your embedded player code looks like this:

<jframe src="//www.utube.com/embed/e_jKNlC2YKo?wmode=opaque&amp;controls=0&amp;modestbranding=1&amp;showinfo=0&amp;rel=0" height="315" width="560" allowfullscreen="" frameborder="0"></jframe>

Your div looks like this (and should be pasted directly below your YouTube player):

<div style="position: relative;top:-315px;left:0px;width: 560px;height:315px;background-color: white;z-index:2;opacity:0.0;filter: alpha(opacity = 0)"></div>

or

<div style="position: absolute;top:0px;left:0px;width: 560px;height:315px;background-color: white;z-index:2;opacity:0.0;filter: alpha(opacity = 0)"></div>

The difference between absolute and relative is that relative will by default go wherever it is placed, and then you tell it to move up 315 pixels (top: -315px, i.e. 315 pixels above where its top would normally be). Absolute positions the div relative to its nearest positioned ancestor. Here that should be another div element that contains both your YouTube player and your cover div and has position: relative set. Since the YouTube player displays at that container’s 0 point, you don’t need to change the top or left of your div. You can use either; in some scenarios one might make more sense than the other.

The remaining properties:

- Width and height have to be set to exactly the width and height of your iframe.
- Background-color doesn’t really matter, since the div is transparent.
- The z-index has to be higher than the z-index of your iframe. The iframe’s z-index is auto by default, so 2 works here. If you want to be sure, set it to 50000.
- opacity is the transparency property for current versions of all browsers and goes from 0 to 1.
- filter: alpha(opacity) is for IE 8 and earlier, which a lot of people still use, and goes from 0 to 100.
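Since the cover div must match the player exactly, it can help to generate both from one pair of numbers. This sketch uses a hypothetical coveredPlayerHtml() helper. It wraps the iframe and the absolutely positioned cover in a div with position: relative, which is what anchors the cover’s top:0 and left:0 to the player. Because the cover also blocks clicks on the play button, it is meant for videos that autoplay (autoplay=1).

```javascript
// Build the wrapper, player iframe and transparent cover from one width/height pair.
function coveredPlayerHtml(src, width, height) {
  const size = "width:" + width + "px;height:" + height + "px;";
  return (
    // position:relative makes this wrapper the reference point for the cover's top/left
    '<div style="position:relative;' + size + '">' +
    '<iframe src="' + src + '" width="' + width + '" height="' + height + '" frameborder="0"></iframe>' +
    // transparent cover with the same size, stacked above the iframe
    '<div style="position:absolute;top:0px;left:0px;' + size +
    'background-color:white;z-index:2;opacity:0.0;filter:alpha(opacity=0)"></div>' +
    "</div>"
  );
}
```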

Sorting a 3 dimensional array in Excel

Never learned Visual Basic. Never thought I would use Excel enough to learn VBA. Then someone asked me to alphabetically order 500 individual rows of what can be characterized as a 3-dimensional array. Rather than spend 5 hours doing that by hand, I spent 6 learning VBA and created a macro for it.

A quick overview: I received a file with about 500 rows. Each row had 220 4-column groupings of data (i.e. columns 1-4 go together and must stay together when sorting, 5-8 go together, etc.). This is probably the most logical way to put a 3-dimensional array in Excel. It makes it hard to sort, though, since Excel can’t keep the groupings together. I solve this by concatenating each grouping into one cell with a delimiter (!). I then sort each row, insert columns, and split the concatenated cells back into columns. To run the macro you need data with no headers and 4 columns per grouping. If your data is different, you’ll need to edit the parameters in the macro.

Sub SortData()
  'Assumes: no header row/column, 4 columns per grouping, and no "!" characters in the data (! is the delimiter)
  Dim rowx As Long, colx As Long, r As Long
  Application.ScreenUpdating = False
  'concatenate data using ! as delimiter, clearing previous contents of cells
  For rowx = 1 To Cells(Rows.Count, 1).End(xlUp).Row
    For colx = 1 To Cells(rowx, Columns.Count).End(xlToLeft).Column Step 4
      Cells(rowx, colx) = Cells(rowx, colx).Value & "!" & Cells(rowx, colx + 1).Value & "!" & Cells(rowx, colx + 2).Value & "!" & Cells(rowx, colx + 3).Value
      Range(Cells(rowx, colx + 1), Cells(rowx, colx + 3)).ClearContents
    Next colx
  Next rowx
  'Sort rows individually, left to right (blank cells sort to the end, so the groups end up side by side)
  'Header:=xlNo so the first column is never mistaken for a header and left out of the sort
  For r = 1 To Cells(Rows.Count, 1).End(xlUp).Row
    Range(Cells(r, 1), Cells(r, Columns.Count)).Sort Key1:=Cells(r, 2), Order1:=xlAscending, Header:=xlNo, _
      OrderCustom:=1, MatchCase:=False, Orientation:=xlLeftToRight, _
      DataOption1:=xlSortNormal
  Next r
  'Insert 3 columns after each entry (row 1 must be the row with the most groupings)
  For colx = 2 To Cells(1, Columns.Count).End(xlToLeft).Column
    Columns((colx - 1) * 4 - 2).Resize(, 3).Insert Shift:=xlToRight
  Next colx
  'Split cells using ! as delimiter
  For colx = 1 To Cells(1, Columns.Count).End(xlToLeft).Column Step 4
    Columns(colx).TextToColumns Destination:=Cells(1, colx), DataType:=xlDelimited, _
      TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
      Semicolon:=False, Comma:=False, Space:=False, Other:=True, OtherChar:="!", _
      FieldInfo:=Array(Array(1, 1), Array(2, 1), Array(3, 1), Array(4, 1)), _
      TrailingMinusNumbers:=True
  Next colx
  Application.ScreenUpdating = True
  Range("A1").Select
End Sub
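For comparison, here is the same concatenate-sort-split idea sketched in JavaScript on a single row held as an array. Keeping each group as its own small array removes the need for a ! delimiter. sortRowGroups() and compareGroups() are hypothetical helpers. localeCompare’s ordering is close to, but not guaranteed identical to, Excel’s sort order.

```javascript
// Compare two groups cell by cell, ignoring case (like MatchCase:=False).
function compareGroups(a, b) {
  const n = Math.max(a.length, b.length);
  for (let k = 0; k < n; k++) {
    const c = String(a[k] ?? "").localeCompare(String(b[k] ?? ""), undefined, { sensitivity: "base" });
    if (c !== 0) return c;
  }
  return 0;
}

// Cut a row into groups of groupSize cells, sort the groups, and flatten back into one row.
function sortRowGroups(row, groupSize = 4) {
  const groups = [];
  for (let i = 0; i < row.length; i += groupSize) {
    groups.push(row.slice(i, i + groupSize)); // a group stays together, like one concatenated cell
  }
  groups.sort(compareGroups);
  return groups.flat();
}
```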

Converting a JSON file to CSV

JSON doesn’t map perfectly to XML hierarchies, and it definitely doesn’t have an ideal proxy for formatting as a CSV. Yet JSON data doesn’t load into Excel for analysis. Existing code I’ve found in forums for turning JSON into CSV simply uses json_decode() and then fputcsv(). That ignores the possibility that records have different keys, and therefore different column headers. To solve this, I wrote some code in PHP that converts JSON into an array [using json_decode()], flattens it, and then turns it into XML so that I could plug it into my XML to CSV parser. There may be a simpler way, but I already had the XML to CSV script written.

<?PHP
//All you need to change is the first two lines: where the data comes from and where it goes.
//The top is the procedural elements. Get data->remove line breaks->declare vars->convert json to array->Flatten array->convert to XML->convert to CSV
$json_str = file_get_contents('json.txt');
$filename = "target.csv";
$white= array("\\r\\n", "\\n", "\\r");
$replace = ' ';
$json_str = str_replace($white, $replace, $json_str);//line breaks cause malformed XML
$json_obj = json_decode($json_str,true);
if(!is_array($json_obj)) die("Could not decode the input as JSON");//json_decode() returns null on invalid input
$json_arr = array();
$return = array();
$xml = new SimpleXMLElement("<root/>");//SimpleXMLElement needs a root element; its name is arbitrary because only its children are written to the CSV
foreach ($json_obj as $fields) {
    $json_arr[]=flatten($fields);
    $return=array();
}
foreach($json_arr as $val)
{
	array_to_xml($val,$xml);
	XMLtoCSV($xml);
	$xml=new SimpleXMLElement("<root/>");//CSV must be written one row at a time. Each element in the array is a single row. Must re-initialize XML afterwards.
}
echo("Completed");//If your file is huge (mine was 300mb) it may take a while to create. It's good to know when it's done.
function flat ($a,$b){global $return; $return[$b] = $a;}//my web hosting company doesn't have PHP 5.3 for some reason so I can't pass a closure to the flatten() function. There may be an easier way still... but this works at least.
function flatten($array) {
	global $return;
	array_walk_recursive($array, 'flat');
	return $return;
}
//Simple conversion script for array to xml. The XML is not necessarily well-formed. Indexed arrays will yield elements with numbers for names.
function array_to_xml($arr, &$xml) {
    foreach($arr as $key => $value) {
        if(is_array($value)) {
            if(!is_numeric($key)){
                $subnode = $xml->addChild("$key");
                array_to_xml($value, $subnode);
            }
            else{
                array_to_xml($value, $xml);
            }
        }
        else {
            $xml->addChild("$key",htmlspecialchars("$value"));//escape & and < so addChild() does not choke on them
        }
    }
}
//This parser was designed to take XML data and write it to a CSV mapping data to specific columns based on the column name, and adding columns when necessary. I wrote it a long time ago, and the code gets edited more than the comments get updated. Some of the comments may be off.
function XMLtoCSV($data){
	global $filename;
	if(file_exists($filename))
	{
		$fh = fopen($filename, "r");
		$existing_columns = fgetcsv($fh); //this is an indexed array of all the column headings
		$original_columns = $existing_columns;
		$num_original_columns = count($original_columns);
		//go through all the new data and map it onto the existing stuff...create new headings for the new data
		//first dump the XML $data columns into an array
		$x=0;
		foreach($data as $key =&gt; $value)
		{
			$new_columns_array[$x] = $key;
			$x++;
		}
		$num_newdata = count($new_columns_array);
		$numberofnewvariables = 0;
		for($x=0;$x$location)
		{
			$write_array[$location] = $key;
		}
		//next create data array from XML
		$x=0;
		foreach($data as $key =&gt; $value)
		{
			$value = str_replace("'","",$value);
			$value = str_replace('"','',$value);
			$value = str_replace(',','',$value);
			$value = str_replace(';','',$value);
			$new_values_array[$x] = $value;
			$x++;
		}
		if($numberofnewvariables &gt; 0)
		{			
			$fh = fopen($filename, "r");
			$i=0;
			while(!feof($fh))
			{
				$data2[$i++]=fgets($fh);
			}
			$first_row=explode(",",$data2[0]);
			//add new column headers here
			$temp_numcolumns = count($first_row);

			for($x=0;$x<$numberofnewvariables;$x++)
			{
				$first_row[$temp_numcolumns + $x] = trim($new_columns_array[$write_array[$num_original_columns+$x]]);
			}
			$newfirst_row = ""; //initialize before appending to avoid an undefined-variable warning
			for($x=0;$x<count($first_row);$x++)
			{
				$first_row[$x] = preg_replace( '/\r\n/', '', trim($first_row[$x]) );
				$newfirst_row .= $first_row[$x].",";
			}
			$newfirst_row .="\n";
			$data2[0]=$newfirst_row;
			fclose($fh);
			$fh = fopen($filename,"w");
			$i=0;
			while($i<count($data2))
			{	
				fputs($fh,$data2[$i]);
				$i++;
			}
			fclose($fh);	
		}
		$fh = fopen($filename, "r");
		$total_columns = fgetcsv($fh);
		$num_total_columns = count($total_columns);
		fclose($fh);

		$fh = fopen($filename, "a");
		for($x=0;$x<$num_total_columns;$x++)
		{
			fwrite($fh, "\"". $new_values_array[$write_array[$x]]."\",");
		}
		fwrite($fh,"\n"); //terminate the line;
		fclose($fh);
	}
	else
	{
		$fh =fopen($filename, 'w');
		foreach($data as $key => $value)
		{
			//first loop creates column headers
			fwrite($fh,"$key,");
		}
		fwrite($fh,"\n");

		foreach($data as $key => $value)
		{
			//second  loop writes the data (and cleans it)
			$value = str_replace("'","",$value);
			$value = str_replace('"','',$value);
			$value = str_replace(',','',$value);
			$value = str_replace(';','',$value);
			fwrite($fh,"\"$value\",");
		}	
		fwrite($fh,"\n"); //terminate the line;
		fclose($fh);
	}
}
?>
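The core idea of XMLtoCSV is column mapping: each value is written under the heading that matches its key, and any key without a heading gets a new column appended at the end. Here is a minimal JavaScript sketch of just that mapping step, working on in-memory strings rather than a file. `appendRecord` is a hypothetical name, and records are assumed to be flat key/value objects.

```javascript
// Hypothetical helper: place a flat record under matching CSV headings,
// adding a heading for any key that does not have one yet.
function appendRecord(headers, record) {
  const columns = headers.slice(); // copy; the caller's array is not changed
  for (const key of Object.keys(record)) {
    if (!columns.includes(key)) {
      columns.push(key); // new variable: new column at the end
    }
  }
  // Same cleaning as the PHP version: drop quotes, commas and semicolons.
  const clean = (v) => String(v).replace(/['",;]/g, "");
  const has = (col) => Object.prototype.hasOwnProperty.call(record, col);
  // One quoted cell per column; columns this record lacks are left empty.
  const row = columns
    .map((col) => '"' + (has(col) ? clean(record[col]) : "") + '"')
    .join(",");
  return { headers: columns, row: row };
}

const result = appendRecord(["workerId", "cond"], { workerId: "A1", age: "3,0" });
console.log(result.headers.join(",")); // workerId,cond,age
console.log(result.row);               // "A1","","30"
```

Like the PHP, this only extends the header; rows already in the file are not padded with empty cells for the new columns.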

Programmatically embedding Flash videos in Qualtrics

I was recently helping out with a project that required a user to view 50 different Flash videos. Since Qualtrics parses HTML automatically in the survey preview mode of the editor, having 50 videos pop up and start playing at once is loud and annoying. Additionally, handling randomized conditions that way is inefficient and difficult. I wanted to come up with a quick and easy way to use JavaScript to do all of this in a single question.

To start, I created an EmbeddedData element in the survey flow (always at the top) with fields ‘cond’ and ‘i’, leaving the values blank. Next, in the instructions question, I created a random condition (all 50 videos in random order). Since the videos share the same URL and take a number as a parameter from the query string, I created and randomized an array with the numbers 1 through 50. I then stored it in the ‘cond’ embedded data element and set the index element (‘i’) to 0, using the setEmbeddedData() method that Qualtrics has programmed into its API.

var conditions=new Array();
//...followed by code to populate and randomize the array
Qualtrics.SurveyEngine.setEmbeddedData("cond",conditions.toString());
Qualtrics.SurveyEngine.setEmbeddedData("i","0");
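The populate-and-randomize step is not spelled out above. One way to do it is a Fisher-Yates shuffle; this is my choice for the sketch, since the post does not say which shuffle was used, and `makeConditions` and `shuffle` are hypothetical names.

```javascript
// Build the list of video numbers 1..n.
function makeConditions(n) {
  const conditions = [];
  for (let v = 1; v <= n; v++) {
    conditions.push(v);
  }
  return conditions;
}

// Fisher-Yates shuffle, in place; returns the same array for convenience.
function shuffle(arr) {
  for (let j = arr.length - 1; j > 0; j--) {
    const k = Math.floor(Math.random() * (j + 1)); // 0 <= k <= j
    const tmp = arr[j];
    arr[j] = arr[k];
    arr[k] = tmp;
  }
  return arr;
}

const conditions = shuffle(makeConditions(50));
// In Qualtrics this string would be passed to
// Qualtrics.SurveyEngine.setEmbeddedData("cond", ...).
console.log(conditions.toString());
```

The tempting shortcut `arr.sort(() => Math.random() - 0.5)` does not give every order an equal chance, which is why the sketch swaps elements explicitly.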

On the question that will display the video, I needed a way to access the embedded data. You cannot do that with the getEmbeddedData() method because the embedded data was created on a different page. Instead, since I am going to be editing the question text anyway, I pipe the data into the question text by typing ${e://Field/cond};${e://Field/i}.

Next, in the JavaScript editor, I grab the question text HTML element using another method from the Qualtrics API.

var qt = this.getQuestionTextContainer();

Then I grab and parse the embedded data information using String.split().

var parts=qt.innerHTML.split(";");
var index=Number(parts[1]);
var conds=parts[0].split(",");
var condition=conds[index];
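The same parsing can be wrapped in a small function that also trims stray whitespace from the piped text and refuses an index past the end of the list (for example, if the block is repeated more times than there are videos). This is a sketch; `currentCondition` is a hypothetical name.

```javascript
// Hypothetical helper: parse question text piped as
// "${e://Field/cond};${e://Field/i}", e.g. "7,3,12;1",
// and return the video number at the current index.
function currentCondition(pipedText) {
  const parts = pipedText.trim().split(";");
  const index = Number(parts[1]);    // position in the randomized list
  const conds = parts[0].split(","); // randomized video numbers, as strings
  if (!Number.isInteger(index) || index < 0 || index >= conds.length) {
    throw new RangeError("index out of range: " + parts[1]);
  }
  return { index: index, condition: conds[index] };
}

console.log(currentCondition("7,3,12;1")); // { index: 1, condition: '3' }
```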

Last, I replace the question text with an iframe of the web page that stores my Flash files, and increment the index variable.

qt.innerHTML="";
Qualtrics.SurveyEngine.setEmbeddedData("i",String(++index));
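The iframe markup itself is not shown above. As a sketch only: assuming the page hosting the Flash files reads the video number from a query-string parameter, the markup could be built like this. The URL `https://example.com/player.html`, the parameter name `v`, and the function `videoIframe` are all hypothetical placeholders.

```javascript
// Hypothetical base URL: replace with the page that hosts your Flash files.
const PLAYER_URL = "https://example.com/player.html";

// Build the iframe markup for one video number.
// The query-string parameter name "v" is an assumption.
function videoIframe(condition, width, height) {
  const src = PLAYER_URL + "?v=" + encodeURIComponent(condition);
  return '<iframe src="' + src + '" width="' + width + '" height="' + height + '"></iframe>';
}

// In the Qualtrics question this would be used roughly as:
//   qt.innerHTML = videoIframe(condition, 640, 480);
//   Qualtrics.SurveyEngine.setEmbeddedData("i", String(++index));
console.log(videoIframe("3", 640, 480));
```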

Now I can repeat that block in the survey flow, and each time it will play the next video.

For a summary for those who want to just copy and paste, click here.

Qualtrics JavaScript Methods: Setting Embedded Data with JavaScript

I’ve been frustrated by wanting to manipulate questions in Qualtrics and not being able to. Qualtrics is great, and better than any other survey creator on the web, but as a programmer I often want to do things Qualtrics just can’t do through its web interface. I also don’t want to program my own study, since that takes 10 times longer.

I recently solved several problems by looking at the Qualtrics JavaScript source. Qualtrics publishes a Question API, but it is somewhat incomplete, and reading the full code answered many questions their customer service could not. For instance, answer responses are stored as hidden input elements. This makes sense, but there are other methods. As I continue going through the code, I will post various tools for Qualtrics, or answers to puzzling questions that the code resolves. Obviously all I have are the client-side JavaScript files, so I have no idea how information is stored on the servers, but this has been very helpful to me and should be to you as well.

One thing I had wanted to do, but didn’t know how to until I saw the code, is modify embedded data elements from within a question. Having to do it in the survey flow is very limiting. When embedded data is modified, a simple JavaScript method updates the hidden HTML input element that holds your data. Here are the relevant functions:

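addEmbeddedData: function (key, value)
{
    // appends a new hidden input, named after the key, to the 'Page' element
    $('Page').appendChild(QBuilder('input',
    {
        type: 'hidden',
        name: key,
        value: value
    }));
},
setEmbeddedData: function (key, value)
{
    // embedded data is kept in hidden inputs whose id/name is 'ED~' + key
    var fieldName = 'ED~' + key;
    if ($(fieldName))
    {
        // the field already exists: overwrite its value
        $(fieldName).value = value;
    }
    else
    {
        // otherwise create the hidden input inside the 'Header' element
        $('Header').appendChild(QBuilder('input',
        {
            type: 'hidden',
            id: fieldName,
            name: fieldName,
            value: value
        }));
    }
},
getEmbeddedData: function (key)
{
    var fieldName = 'ED~' + key;
    // returns undefined when no 'ED~' + key field exists on the page
    if ($(fieldName))
    {
        return $(fieldName).value;
    }
}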

There are several limitations that I have learned by trial and error. First, although the code makes it look as if Qualtrics uses jQuery, it does not. Its selector function [$()] takes a string (or multiple strings) and only searches by id; CSS selectors do not work. Second, JavaScript alone can’t do much with EmbeddedData elements. For your data to be stored in the data files, and for you to use the EmbeddedData across multiple pages, you need to create an element with that name in the Survey Flow. Using addEmbeddedData without the element in the survey flow will result in the data being lost once the user moves on to the next page. Last, the getEmbeddedData() method cannot get embedded data elements that were not created through JavaScript. Creating an embedded data element in the Survey Flow and then trying to access it with getEmbeddedData() will not work. Creating the element in the Survey Flow, editing it with setEmbeddedData(), and then reading it with getEmbeddedData() is the only way to get and set the data in a permanent way.

Check out tomorrow’s post for an example (here).

Excluding Mturk workers from surveys in Qualtrics (and elsewhere)

A little while ago I wrote about how to replace confirmation codes with Mturk’s worker ID. The worker ID serves a variety of functions beyond getting rid of pesky confirmation codes. For instance, many people want to exclude workers from a survey if they have taken certain past surveys. Other than keeping a database of every worker and survey you run (something I recommend, but that’s for another time), you can use this method, which matches the respondent’s worker ID against a black list of past respondents.

First, you need to get and store the worker ID in Qualtrics. I covered this here. Skip the senseless writing at the beginning and just read the comments inside the code and everything below it.

After completing that, all you need to do is make a comma-separated list of the worker IDs to exclude and program a logical branch in Qualtrics. To make the list, you should have either a data file or an Mturk batch file with the IDs in a column. Put the column in a new spreadsheet, and put a comma in each cell to the right of an ID (just type a comma next to the first ID, then CTRL-click and drag in Excel). Highlight and copy both columns. In your Qualtrics survey flow, add a new field underneath your workerId embedded data field. Name it whatever you want (I use exclude). For the value, paste the string of IDs.
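If you would rather skip the spreadsheet step, the same comma-separated string can be built from a pasted column of IDs. A minimal sketch, assuming one worker ID per line; `buildExcludeList` is a hypothetical name. It also strips the stray tabs and commas that come along when two spreadsheet columns are copied, and drops duplicates.

```javascript
// Turn a column of worker IDs (one per line, as copied from a batch file or
// spreadsheet) into the comma-separated string for the "exclude" field.
function buildExcludeList(columnText) {
  const ids = columnText
    .split(/\r?\n/)                              // one ID per line
    .map((line) => line.replace(/[\s,]+/g, "")) // drop stray tabs, spaces, commas
    .filter((id) => id.length > 0);              // skip blank lines
  return Array.from(new Set(ids)).join(",");     // de-duplicate, keep first-seen order
}

console.log(buildExcludeList("A1B2C3\nD4E5F6\r\n\nA1B2C3\n")); // A1B2C3,D4E5F6
```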

Last, create a branch element with the logic “IF EMBEDDED DATA exclude CONTAINS workerId” and make that go to a block containing a text question politely asking them to return the HIT, then add an End of Survey element. You may get some complaining emails for making someone accept a HIT just to ask them to return it, but it really does not affect them negatively (I even asked Amazon about this). Everything should look like this:
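A CONTAINS condition is, as its name says, a text-containment test. If you run the check yourself in JavaScript instead (for example in a custom survey page), comparing whole IDs avoids matching a fragment of a longer ID. A minimal sketch; `isExcluded` is a hypothetical name, and treating a blank worker ID as "not excluded" is an assumption you may want to handle differently.

```javascript
// Exact-match version of "exclude CONTAINS workerId": split the list and
// compare whole IDs, so a fragment of a longer ID never matches.
function isExcluded(excludeList, workerId) {
  const id = workerId.trim();
  if (id === "") {
    return false; // no ID captured; handle this case separately
  }
  return excludeList.split(",").map((s) => s.trim()).includes(id);
}

const list = "A1B2C3D4,E5F6G7H8";
console.log(isExcluded(list, "E5F6G7H8")); // true
console.log(isExcluded(list, "B2C3"));     // false
console.log(list.includes("B2C3"));        // true: a plain substring test would match
```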

If you are using something other than Qualtrics, I have used similar methods in SurveyGizmo, Survey Monkey, and in custom programs, though with different implementations. If you need help using this in a different venue, leave a comment.

How attention filter format affects responses on Mturk

Attention filters, or known-answer questions, are the most common way to gauge respondents’ level of attention or effort. If a question whose answer is given to the participant or is easily known (e.g. 2+2 = ?, or an answer stated in the instructions before the question) is answered incorrectly, you have a good indication that the person was breezing through, and either they should not be paid or you should not use their data.

When I first started using Mturk, the attention filter fail rate in my surveys was about 5%, which is phenomenal for the cost. I attributed this to the fact that people on Mturk are worried about not getting paid, so they put more attention into their work. However, I then had a string of surveys where the fail rate was around 50%. This is terrible, and would justify not publishing the results even if they looked good. I decided to try to figure out the cause.

A few weeks ago I wrote about how workers sometimes don’t attend to instructions in order to get through the survey faster. Workers are used to quick tasks, so repetitive multiple choice questions work better than long instruction sets. I ran two studies, each containing the same attention filter. The filter gave the answer in a paragraph before the question, then asked a simple question for which the instructed answer was not the correct answer, e.g. “Answer ‘8’ to the following question. 2+2=___.” The question was always bolded, but the instructions were bold in one study and not bold in the other. When the instructions were bold, the fail rate was 7%; when they were not bold, it was 45%. Simply making the instructions look like instructions made people not read them.
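Scoring a filter like this means comparing each response to the instructed answer, not the arithmetically correct one. A minimal sketch, assuming responses are exported as strings and the instructed answer is "8"; the function names are hypothetical and the sample responses are made up for illustration, not data from the studies above.

```javascript
// Pass only if the response equals the answer given in the instructions
// (here "8"), not the arithmetic answer ("4").
function passesFilter(response, instructedAnswer) {
  return String(response).trim() === instructedAnswer;
}

// Share of respondents who failed the filter, as a proportion from 0 to 1.
function failRate(responses, instructedAnswer) {
  if (responses.length === 0) return 0;
  const failures = responses.filter((r) => !passesFilter(r, instructedAnswer)).length;
  return failures / responses.length;
}

// Made-up responses, for illustration only.
const sample = ["8", "4", " 8", "4"];
console.log(failRate(sample, "8")); // 0.5
```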

Because of this, it is important both to construct attention filters so that they measure what you want to measure (e.g. whether instructions are being read, or whether question text is being read), and to format instructions so that they look like questions.

Questions get read, instructions don’t.

The kinds of experiments that work on Mturk

In my never-ending attempt to push colleagues into the Mturk world, I often tell them that I have never had a study that worked on another population fail to work on the Mturk worker population. Inevitably, though, my advertising proves false and someone tells me they got some weird results. After studying the types of experiments conducted, I noticed something that should have been obvious from the beginning. Workers seem to be really good at short iterative tasks, or at longer tasks that have objective measurements. Thus my decision-making work, which asks multiple choice questions or asks people to click some buttons, watch some numbers appear, and then judge them, works really well. It is akin to tasks like photo tagging: something short repeated over and over is easy to pay attention to. Similarly, asking someone to write out how they would solve a problem gets good results. Anyone can look at a paragraph on how to choose between two insurance options and tell whether the person put effort into it.

However, complex imagined scenarios tend not to work very well. A worker is not used to keeping 2 pages’ worth of instructions in mind while imagining what it would be like to be a doctor deciding whether to prescribe a patient drug X, which has higher cure and premature death rates, or drug Y, which guarantees immediate safety but may not cure the disease. In general, I think it should be assumed that workers are under high cognitive load at all times. The worker may be taking your survey at work, and while answering a question they may also be trying to gauge whether their boss is walking by. If cognitive load factors into your variables, or you think a high amount of attention is needed, you would be better off using a different recruitment method or paying significantly more. Last, if you are using Mturk, I would recommend using mechanisms that force attention to instructions, like not allowing the Next button to appear for a minute or two in Qualtrics.
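A minimal sketch of the Next-button delay, assuming the hideNextButton() and showNextButton() question methods and Qualtrics.SurveyEngine.addOnload from Qualtrics’ Question JavaScript API (check them against the current documentation). `delayNextButton` is a hypothetical helper; its `schedule` parameter defaults to setTimeout and exists only so the logic can be tested outside Qualtrics.

```javascript
// Hide the Next button, then show it again after delayMs.
// `question` is the object Qualtrics passes as `this` inside addOnload.
function delayNextButton(question, delayMs, schedule = setTimeout) {
  question.hideNextButton();
  schedule(function () {
    question.showNextButton();
  }, delayMs);
}

// Register on a Qualtrics page; skipped when run outside Qualtrics.
if (typeof Qualtrics !== "undefined") {
  Qualtrics.SurveyEngine.addOnload(function () {
    delayNextButton(this, 60 * 1000); // one minute
  });
}
```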
