the:behavioral:lab

Archive for the month “October, 2012”

Converting a JSON file to CSV

JSON doesn’t map perfectly to XML hierarchies, and definitely doesn’t have an ideal proxy for formatting as a CSV. However, JSON data doesn’t load into Excel for analysis purposes. Existing code I’ve found in forums for turning JSON into CSV simply uses json_decode() and then used fputcsv(), which ignores if the columns have different headers. To solve this, I wrote some code in PHP that converts JSON into an array [using json_decode()], then into XML so that I could plug into into my XML to CSV parser. There may be a simpler way, but I already had the XML to CSV script written.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
<?PHP
//All you need to change is the first two lines, when the data comes from and where it goes.
//The top is the procedural elements. Get data-&gt;remove line breaks-&gt;declare vars-&gt;convert json to array-&gt;Flatten array-&gt;convert to XML-&gt;convert to CSV
$json_str = file_get_contents('json.txt');
$filename = "target.csv";
$white= array("\\r\\n", "\\n", "\\r");
$replace = ' ';
$json_str = str_replace($white, $replace, $json_str);//line breaks cause malformed XML
$json_obj = json_decode($json_str,true);
$json_arr = array();
$return = array();
$xml = new SimpleXMLElement("");
foreach ($json_obj as $fields) {
    $json_arr[]=flatten($fields);
    $return=array();
}
foreach($json_arr as $val)
{
	array_to_xml($val,$xml);
	XMLtoCSV($xml);
	$xml=new SimpleXMLElement("");//CSV must be written one row at a time. Each element in the array is single row. Must re-initialize XML afterwards.
}
echo("Completed");//If your file is huge (mine was 300mb) it may take a while to create. Its good to know when it's done.
function flat ($a,$b){global $return; $return[$b] = $a;}//my web hosting company doesn't have PHP 5.3 for some reason so I can't pass a closure to the flatten() function. There may be an easier way still... but this works at least.
function flatten($array) {
	global $return;
	array_walk_recursive($array, 'flat');
	return $return;
}
//Simple conversion script for array to xml. The XML is not necessary well-formed. Indexed arrays will yield elements with numbers for names.
function array_to_xml($arr, &amp;$xml) {
    foreach($arr as $key =&gt; $value) {
        if(is_array($value)) {
            if(!is_numeric($key)){
                $subnode = $xml-&gt;addChild("$key");
                array_to_xml($value, $subnode);
            }
            else{
                array_to_xml($value, $xml);
            }
        }
        else {
            $xml-&gt;addChild("$key","$value");
        }
    }
}
//This parser was designed to take XML data and write it to a CSV mapping data to specific columns based on the column name, and adding columns when necessary. I wrote it a long time ago, and the code gets edited more than the comments get updated. Some of the comments may be off.
function XMLtoCSV($data){
	global $filename;
	if(file_exists($filename))
	{
		$fh = fopen($filename, "r");
		$existing_columns = fgetcsv($fh); //this is an indexed array of all the column headings
		$original_columns = $existing_columns;
		$num_original_columns = count($original_columns);
		//go through all the new data and map it onto the existing stuff...create new headings for the new data
		//first dump the XML $data columns into an array
		$x=0;
		foreach($data as $key =&gt; $value)
		{
			$new_columns_array[$x] = $key;
			$x++;
		}
		$num_newdata = count($new_columns_array);
		$numberofnewvariables = 0;
		for($x=0;$x$location)
		{
			$write_array[$location] = $key;
		}
		//next create data array from XML
		$x=0;
		foreach($data as $key =&gt; $value)
		{
			$value = str_replace("'","",$value);
			$value = str_replace('"','',$value);
			$value = str_replace(',','',$value);
			$value = str_replace(';','',$value);
			$new_values_array[$x] = $value;
			$x++;
		}
		if($numberofnewvariables &gt; 0)
		{			
			$fh = fopen($filename, "r");
			$i=0;
			while(!feof($fh))
			{
				$data2[$i++]=fgets($fh);
			}
			$first_row=explode(",",$data2[0]);
			//add new column headers here
			$temp_numcolumns = count($first_row);

			for($x=0;$x&lt;$numberofnewvariables;$x++)
			{
				$first_row[$temp_numcolumns + $x] = trim($new_columns_array[$write_array[$num_original_columns+$x]]);
			}
			for($x=0;$x&lt;count($first_row);$x++)
			{
				$first_row[$x] = preg_replace( '/\r\n/', '', trim($first_row[$x]) );
				$newfirst_row .= $first_row[$x].",";
			}
			$newfirst_row .="\n";
			$data2[0]=$newfirst_row;
			fclose($fh);
			$fh = fopen($filename,"w");
			$i=0;
			while($i&lt;count($data2))
			{	
				fputs($fh,$data2[$i]);
				$i++;
			}
			fclose($fh);	
		}
		$fh = fopen($filename, "r");
		$total_columns = fgetcsv($fh);
		$num_total_columns = count($total_columns);
		fclose($fh);

		$fh = fopen($filename, "a");
		for($x=0;$x&lt;$num_total_columns;$x++) 		{ 			fwrite($fh, "\"". $new_values_array[$write_array[$x]]."\","); 		 		} 		fwrite($fh,"\n"); //terminate the line; 		fclose($fh); 	} 	else 	{ 		$fh =fopen($filename, 'w'); 		foreach($data as $key =&gt; $value)
		{
			//first loop creates column headers
			fwrite($fh,"$key,");
		}
		fwrite($fh,"\n");

		foreach($data as $key =&gt; $value)
		{
			//second  loop writes the data (and cleans it)
			$value = str_replace("'","",$value);
			$value = str_replace('"','',$value);
			$value = str_replace(',','',$value);
			$value = str_replace(';','',$value);
			fwrite($fh,"\"$value\",");
		}	
		fwrite($fh,"\n"); //terminate the line;
		fclose($fh);
	}
}
?>
Advertisements

Programmatically embedding Flash videos in Qualtrics

I recently was helping out with a project that required a user to view 50 different flash videos. Since Qualtrics parses HTML automatically in the survey preview mode of the editor, having 50 videos pop up and start playing at once is loud and annoying. Additionally, it is inefficient and difficult to handle randomizing conditions. I wanted to come up with a quick and easy way to use JavaScript to do this in a single question.

To start, I created an EmbeddedData element in the survey flow (always on top) with fields ‘cond’ and ‘i’, leaving the values blank. Next in the instructions question, I created a random condition (all 50 videos randomized). Since the videos have the same URL and take a number as a parameter from the query string, I created and randomized and array with numbers 1 through 50. I then updated the ‘cond’ embedded data element and assigned the index element (‘i’) to 0, using the setEmbeddedData() method that Qualtrics has programmed into their API.

var conditions=new array();
//followed by to populate and randomize the array
Qualtrics.SurveyEngine.setEmbeddedData("cond",conditions.toString());
Qualtrics.SurveyEngine.setEmbeddedData("i","0");

On the question that will display the video, I needed a way to access the embedded data. You cannot do that with the getEmbeddData() method because the embedded data was created on a different page. Instead, since I am going to be editing the question text anyway, I pipe the data into the question text, typing in ${e://Field/cond};${e://Field/i}.

Next, I go to the JavaScript editor, I grab the question text HTML element using another method from the Qualtrics API.

var qt = this.getQuestionTextContainer();

Then I grab and parse the embedded data information using String.split().

var parts=qt.innerHTML.split(";");
var index=Number(parts[1]);
var conds=parts[0].split(",");
var condition=conds[index];

Last, I replace the question text with an iframe of the web page that stores my Flash files, and increment the index variable.

qt.innerHTML="";
Qualtrics.SurveyEngine.setEmbeddedData("i",String(++index));

Now I can repeat that block in the survey flow, and each time it will play the next video.

For a summary for those who want to just copy and paste, click here.

Qualtrics JavaScript Methods: Setting Embedded Data with JavaScript

I’ve been frustrated trying to manipulate questions in Qualtrics and not being able to do so. Qualtrics is great and better than any other survey creator on the webs, but as a programmer, I often want to do things Qualtrics just can’t do on its webpage. I also don’t want to program my own study since that takes 10 times longer.

I recently was able to solve several problems by looking at the Qualtrics JavaScript source. Qualtrics publishes a Question API, but it is slightly incomplete. Looking at the full code answered so many questions that their customer service was not able to. For instance, answer responses are stored as hidden input elements. This makes sense, but there are other methods. As I continue to go through the code, I will be posting intermittently various tools to use in Qualtrics or answering puzzling questions I have had that are solved by looking at the programming. Obviously all I have are the client side JavaScript files, so I have no idea how information is stored on servers, etc., but this has been very helpful to me  and should be to you as well.

One function I have wanted to do but didn’t know how until I saw the code is modify embedded data elements within a question. Having to do it in the survey flow is very limiting. When it is modified, a simple JavaScript method is called which updates the HTML input element for your data. Here is the two functions:

addEmbeddedData: function (key, value)
{
$('Page').appendChild(QBuilder('input',
{
type: 'hidden',
name: key,
value: value
}));
},
setEmbeddedData: function (key, value)
{
var fieldName = 'ED~' + key;
if ($(fieldName))
{
$(fieldName).value = value;
}
else
{
$('Header').appendChild(QBuilder('input',
{
type: 'hidden',
id: fieldName,
name: fieldName,
value: value
}));
}
getEmbeddedData: function (key)
{
var fieldName = 'ED~' + key;
if ($(fieldName))
{
return $(fieldName).value;
}
}

There are several limitations that I have learned from trial and error. First, while after looking it the code, it appears like Qualtrics implements jQuery, they in fact do not. Their selector function [$()] takes a string (or multiple strings) and only searches by id. CSS selectors do not work. Second, JavaScript alone can’t do much with EmbeddedData elements. For your data to be store in the datafiles, and for you to use the EmbeddedData across multiple pages, you need to create an element with that name in the Survey Flow. Using addEmbeddedData without the element in the survey flow will result in the data being lost after the user moves on to the next page. Last, the getEmbeddedData() method cannot get embedded data elements that were not created through JavaScript. Creating an embedded data element in the Survey Flow and then trying to access it using getEmbeddedData() will not work. Creating an embedded data element in the survey flow, editing it using setEmbeddedData, then getting it using getEmbeddedData is the only way to get and set the data in a permanent way.

Check out tomorrow’s post for an example (here).

Excluding Mturk workers from surveys in Qualtrics (and elsewhere)

A little while ago I wrote about how to replace the use of confirmation codes with the use of Mturk’s worker ID. It really serves a variety of function other than getting rid of pesky confirmation codes. For instance, many people want to exclude workers from a survey if they have done certain past surveys. Other than data-basing every worker and survey you run (something I recommend, but that’s for another time), you can use this method which matches the respondents worker ID against a black list of past respondents.

First, you need to get and store the worker ID in Qualtrics. I covered this here. Skip the senseless writing at the beginning and just read the comments inside the code and everything below it.

After completing that, all you need to do is make a comma separated list of your worker IDs to exclude, and program a logical branch in Qualtrics. To make the list, you should either have a data file or Mturk batch file with the IDs in a column. Put the column in a new spreadsheeet, and put a comma in each cell to the right of an ID (just type a comma next to the first ID then CTRL click and drag in Excel). Highlight and copy both columns. In your Qualtrics survey flow, add a new field underneath your workerId embedded data field. Name it whatever you want, but I use exclude. For the value, paste the string of IDs.

Last, create a branch element with the logic “IF EMBEDDED DATA exclude CONTAINS workerId” and make that go to a block containing a text question politely asking them to return the HIT, then add an End of Survey element. You may get some complaining emails for making someone accept a HIT just to ask them to return it, but it really does not affect them negatively (I even asked Amazon about this). Everything should look like this:

If you are using something other than Qualtrics, I have used similar methods in SurveyGizmo, Survey Monkey, and in custom programs, though with different implementations. If you need help using this in a different venue, leave a comment.

Post Navigation