the:behavioral:lab

Sorting a 3 dimensional array in Excel

Never learned visual basic. Never thought I would use Excel enough to learn VBA. Then someone asked my to alphabetically order 500 individual rows of what can be characterized as a 3 dimensional array. Rather than spend 5 hours doing that by hand I spent 6 learning VBA and created a macro for it.

A quick overview: I received a file with about 500 rows. Each row had 220, 4-column groupings of data (i.e. column 1-4 go together and must stay together when sorting, 5-9 go together, etc.). This is probably the most logical way to put a 3 dimensional array in Excel. It makes it hard to sort though, since Excel can’t do the grouping very well. I solve this by concatenating the groupings with a delimiter. I then sort, create columns, and split the concatenated cells into columns again. To run the macro you need data with no headers and 4 columns per grouping. If your data is different you’ll need to edit the parameters in the macro.

1:  Sub SortData()  
2:    Application.ScreenUpdating = False  
3:    'concatenate data using ! as delimiter, clearing previous contents of cells  
4:    For rowx = 1 To Cells(Rows.Count, 1).End(xlUp).Row  
5:      For colx = 1 To Cells(rowx, Columns.Count).End(xlToLeft).Column Step 4  
6:        Cells(rowx, colx) = Cells(rowx, colx).Value() & "!" & Cells(rowx, colx + 1).Value() & "!" & Cells(rowx, colx + 2).Value() & "!" & Cells(rowx, colx + 3).Value()  
7:        Cells(rowx, colx + 1).ClearContents  
8:        Cells(rowx, colx + 2).ClearContents  
9:        Cells(rowx, colx + 3).ClearContents  
10:      Next colx  
11:    Next rowx  
12:    'Sort Rows Individually  
13:    For r = 1 To Cells(Rows.Count, 1).End(xlUp).Row  
14:      Range(Cells(r, 1), Cells(r, Columns.Count)).Select  
15:      Selection.Sort Key1:=Cells(r, 2), Order1:=xlAscending, Header:=xlGuess, _  
16:      OrderCustom:=1, MatchCase:=False, Orientation:=xlLeftToRight, _  
17:      DataOption1:=xlSortNormal  
18:    Next r  
19:    'Insert columns after each entry  
20:    For colx = 2 To Cells(1, Columns.Count).End(xlToLeft).Column  
21:      Columns((colx - 1) * 4 - 2).Insert Shift:=xlToRight  
22:      Columns((colx - 1) * 4 - 2).Insert Shift:=xlToRight  
23:      Columns((colx - 1) * 4 - 2).Insert Shift:=xlToRight  
24:    Next  
25:    'Split cells using ! as delimiter  
26:    For colx = 1 To Cells(1, Columns.Count).End(xlToLeft).Column Step 4  
27:    Columns(colx).Select  
28:    Selection.TextToColumns Destination:=Cells(1, colx), DataType:=xlDelimited, _  
29:      TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _  
30:      Semicolon:=False, Comma:=False, Space:=False, Other:=True, OtherChar _  
31:      :="!", FieldInfo:=Array(Array(1, 1), Array(2, 1), Array(3, 1), Array(4, 1)), _  
32:      TrailingMinusNumbers:=True  
33:    Next  
34:    Application.ScreenUpdating = True  
35:    Range("A1").Select  
36:  End Sub  

Converting a JSON file to CSV

JSON doesn’t map perfectly to XML hierarchies, and definitely doesn’t have an ideal proxy for formatting as a CSV. However, JSON data doesn’t load into Excel for analysis purposes. Existing code I’ve found in forums for turning JSON into CSV simply uses json_decode() and then used fputcsv(), which ignores if the columns have different headers. To solve this, I wrote some code in PHP that converts JSON into an array [using json_decode()], then into XML so that I could plug into into my XML to CSV parser. There may be a simpler way, but I already had the XML to CSV script written.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
<?PHP
//All you need to change is the first two lines, when the data comes from and where it goes.
//The top is the procedural elements. Get data-&gt;remove line breaks-&gt;declare vars-&gt;convert json to array-&gt;Flatten array-&gt;convert to XML-&gt;convert to CSV
$json_str = file_get_contents('json.txt');
$filename = "target.csv";
$white= array("\\r\\n", "\\n", "\\r");
$replace = ' ';
$json_str = str_replace($white, $replace, $json_str);//line breaks cause malformed XML
$json_obj = json_decode($json_str,true);
$json_arr = array();
$return = array();
$xml = new SimpleXMLElement("");
foreach ($json_obj as $fields) {
    $json_arr[]=flatten($fields);
    $return=array();
}
foreach($json_arr as $val)
{
	array_to_xml($val,$xml);
	XMLtoCSV($xml);
	$xml=new SimpleXMLElement("");//CSV must be written one row at a time. Each element in the array is single row. Must re-initialize XML afterwards.
}
echo("Completed");//If your file is huge (mine was 300mb) it may take a while to create. Its good to know when it's done.
function flat ($a,$b){global $return; $return[$b] = $a;}//my web hosting company doesn't have PHP 5.3 for some reason so I can't pass a closure to the flatten() function. There may be an easier way still... but this works at least.
function flatten($array) {
	global $return;
	array_walk_recursive($array, 'flat');
	return $return;
}
//Simple conversion script for array to xml. The XML is not necessary well-formed. Indexed arrays will yield elements with numbers for names.
function array_to_xml($arr, &amp;$xml) {
    foreach($arr as $key =&gt; $value) {
        if(is_array($value)) {
            if(!is_numeric($key)){
                $subnode = $xml-&gt;addChild("$key");
                array_to_xml($value, $subnode);
            }
            else{
                array_to_xml($value, $xml);
            }
        }
        else {
            $xml-&gt;addChild("$key","$value");
        }
    }
}
//This parser was designed to take XML data and write it to a CSV mapping data to specific columns based on the column name, and adding columns when necessary. I wrote it a long time ago, and the code gets edited more than the comments get updated. Some of the comments may be off.
function XMLtoCSV($data){
	global $filename;
	if(file_exists($filename))
	{
		$fh = fopen($filename, "r");
		$existing_columns = fgetcsv($fh); //this is an indexed array of all the column headings
		$original_columns = $existing_columns;
		$num_original_columns = count($original_columns);
		//go through all the new data and map it onto the existing stuff...create new headings for the new data
		//first dump the XML $data columns into an array
		$x=0;
		foreach($data as $key =&gt; $value)
		{
			$new_columns_array[$x] = $key;
			$x++;
		}
		$num_newdata = count($new_columns_array);
		$numberofnewvariables = 0;
		for($x=0;$x$location)
		{
			$write_array[$location] = $key;
		}
		//next create data array from XML
		$x=0;
		foreach($data as $key =&gt; $value)
		{
			$value = str_replace("'","",$value);
			$value = str_replace('"','',$value);
			$value = str_replace(',','',$value);
			$value = str_replace(';','',$value);
			$new_values_array[$x] = $value;
			$x++;
		}
		if($numberofnewvariables &gt; 0)
		{			
			$fh = fopen($filename, "r");
			$i=0;
			while(!feof($fh))
			{
				$data2[$i++]=fgets($fh);
			}
			$first_row=explode(",",$data2[0]);
			//add new column headers here
			$temp_numcolumns = count($first_row);

			for($x=0;$x&lt;$numberofnewvariables;$x++)
			{
				$first_row[$temp_numcolumns + $x] = trim($new_columns_array[$write_array[$num_original_columns+$x]]);
			}
			for($x=0;$x&lt;count($first_row);$x++)
			{
				$first_row[$x] = preg_replace( '/\r\n/', '', trim($first_row[$x]) );
				$newfirst_row .= $first_row[$x].",";
			}
			$newfirst_row .="\n";
			$data2[0]=$newfirst_row;
			fclose($fh);
			$fh = fopen($filename,"w");
			$i=0;
			while($i&lt;count($data2))
			{	
				fputs($fh,$data2[$i]);
				$i++;
			}
			fclose($fh);	
		}
		$fh = fopen($filename, "r");
		$total_columns = fgetcsv($fh);
		$num_total_columns = count($total_columns);
		fclose($fh);

		$fh = fopen($filename, "a");
		for($x=0;$x&lt;$num_total_columns;$x++) 		{ 			fwrite($fh, "\"". $new_values_array[$write_array[$x]]."\","); 		 		} 		fwrite($fh,"\n"); //terminate the line; 		fclose($fh); 	} 	else 	{ 		$fh =fopen($filename, 'w'); 		foreach($data as $key =&gt; $value)
		{
			//first loop creates column headers
			fwrite($fh,"$key,");
		}
		fwrite($fh,"\n");

		foreach($data as $key =&gt; $value)
		{
			//second  loop writes the data (and cleans it)
			$value = str_replace("'","",$value);
			$value = str_replace('"','',$value);
			$value = str_replace(',','',$value);
			$value = str_replace(';','',$value);
			fwrite($fh,"\"$value\",");
		}	
		fwrite($fh,"\n"); //terminate the line;
		fclose($fh);
	}
}
?>

Programmatically embedding Flash videos in Qualtrics

I recently was helping out with a project that required a user to view 50 different flash videos. Since Qualtrics parses HTML automatically in the survey preview mode of the editor, having 50 videos pop up and start playing at once is loud and annoying. Additionally, it is inefficient and difficult to handle randomizing conditions. I wanted to come up with a quick and easy way to use JavaScript to do this in a single question.

To start, I created an EmbeddedData element in the survey flow (always on top) with fields ‘cond’ and ‘i’, leaving the values blank. Next in the instructions question, I created a random condition (all 50 videos randomized). Since the videos have the same URL and take a number as a parameter from the query string, I created and randomized and array with numbers 1 through 50. I then updated the ‘cond’ embedded data element and assigned the index element (‘i’) to 0, using the setEmbeddedData() method that Qualtrics has programmed into their API.

var conditions=new array();
//followed by to populate and randomize the array
Qualtrics.SurveyEngine.setEmbeddedData("cond",conditions.toString());
Qualtrics.SurveyEngine.setEmbeddedData("i","0");

On the question that will display the video, I needed a way to access the embedded data. You cannot do that with the getEmbeddData() method because the embedded data was created on a different page. Instead, since I am going to be editing the question text anyway, I pipe the data into the question text, typing in ${e://Field/cond};${e://Field/i}.

Next, I go to the JavaScript editor, I grab the question text HTML element using another method from the Qualtrics API.

var qt = this.getQuestionTextContainer();

Then I grab and parse the embedded data information using String.split().

var parts=qt.innerHTML.split(";");
var index=Number(parts[1]);
var conds=parts[0].split(",");
var condition=conds[index];

Last, I replace the question text with an iframe of the web page that stores my Flash files, and increment the index variable.

qt.innerHTML="";
Qualtrics.SurveyEngine.setEmbeddedData("i",String(++index));

Now I can repeat that block in the survey flow, and each time it will play the next video.

For a summary for those who want to just copy and paste, click here.

Qualtrics JavaScript Methods: Setting Embedded Data with JavaScript

I’ve been frustrated trying to manipulate questions in Qualtrics and not being able to do so. Qualtrics is great and better than any other survey creator on the webs, but as a programmer, I often want to do things Qualtrics just can’t do on its webpage. I also don’t want to program my own study since that takes 10 times longer.

I recently was able to solve several problems by looking at the Qualtrics JavaScript source. Qualtrics publishes a Question API, but it is slightly incomplete. Looking at the full code answered so many questions that their customer service was not able to. For instance, answer responses are stored as hidden input elements. This makes sense, but there are other methods. As I continue to go through the code, I will be posting intermittently various tools to use in Qualtrics or answering puzzling questions I have had that are solved by looking at the programming. Obviously all I have are the client side JavaScript files, so I have no idea how information is stored on servers, etc., but this has been very helpful to me  and should be to you as well.

One function I have wanted to do but didn’t know how until I saw the code is modify embedded data elements within a question. Having to do it in the survey flow is very limiting. When it is modified, a simple JavaScript method is called which updates the HTML input element for your data. Here is the two functions:

addEmbeddedData: function (key, value)
{
$('Page').appendChild(QBuilder('input',
{
type: 'hidden',
name: key,
value: value
}));
},
setEmbeddedData: function (key, value)
{
var fieldName = 'ED~' + key;
if ($(fieldName))
{
$(fieldName).value = value;
}
else
{
$('Header').appendChild(QBuilder('input',
{
type: 'hidden',
id: fieldName,
name: fieldName,
value: value
}));
}
getEmbeddedData: function (key)
{
var fieldName = 'ED~' + key;
if ($(fieldName))
{
return $(fieldName).value;
}
}

There are several limitations that I have learned from trial and error. First, while after looking it the code, it appears like Qualtrics implements jQuery, they in fact do not. Their selector function [$()] takes a string (or multiple strings) and only searches by id. CSS selectors do not work. Second, JavaScript alone can’t do much with EmbeddedData elements. For your data to be store in the datafiles, and for you to use the EmbeddedData across multiple pages, you need to create an element with that name in the Survey Flow. Using addEmbeddedData without the element in the survey flow will result in the data being lost after the user moves on to the next page. Last, the getEmbeddedData() method cannot get embedded data elements that were not created through JavaScript. Creating an embedded data element in the Survey Flow and then trying to access it using getEmbeddedData() will not work. Creating an embedded data element in the survey flow, editing it using setEmbeddedData, then getting it using getEmbeddedData is the only way to get and set the data in a permanent way.

Check out tomorrow’s post for an example.

Excluding Mturk workers from surveys in Qualtrics (and elsewhere)

A little while ago I wrote about how to replace the use of confirmation codes with the use of Mturk’s worker ID. It really serves a variety of function other than getting rid of pesky confirmation codes. For instance, many people want to exclude workers from a survey if they have done certain past surveys. Other than data-basing every worker and survey you run (something I recommend, but that’s for another time), you can use this method which matches the respondents worker ID against a black list of past respondents.

First, you need to get and store the worker ID in Qualtrics. I covered this here. Skip the senseless writing at the beginning and just read the comments inside the code and everything below it.

After completing that, all you need to do is make a comma separated list of your worker IDs to exclude, and program a logical branch in Qualtrics. To make the list, you should either have a data file or Mturk batch file with the IDs in a column. Put the column in a new spreadsheeet, and put a comma in each cell to the right of an ID (just type a comma next to the first ID then CTRL click and drag in Excel). Highlight and copy both columns. In your Qualtrics survey flow, add a new field underneath your workerId embedded data field. Name it whatever you want, but I use exclude. For the value, paste the string of IDs.

Last, create a branch element with the logic “IF EMBEDDED DATA exclude CONTAINS workerId” and make that go to a block containing a text question politely asking them to return the HIT, then add an End of Survey element. You may get some complaining emails for making someone accept a HIT just to ask them to return it, but it really does not affect them negatively (I even asked Amazon about this). Everything should look like this:

If you are using something other than Qualtrics, I have used similar methods in SurveyGizmo, Survey Monkey, and in custom programs, though with different implementations. If you need help using this in a different venue, leave a comment.

How attention filter format affects responses on Mturk

Attention filters or known-answers are the most common way to gauge the responders level of attention or effort. If a question where the answer is given to the participant or easily known (e.g. 2+2 = ?; or telling the answer in instructions before the question) is answered incorrectly, you have a good idea that the person was breezing through and they either should not be paid or you should not use their data.

When I first started using Mturk, the attention filter fail rate was about 5% in my surveys, which is phenomenal for the cost. I attributed this to the fact that people on Mturk are worried about not getting paid, so more attention is put into work. However then I had a string of surveys where the fail rate was around 50%. This is terrible and justification to not publish results even if they are good. I decided to try to figure out the cause.

A few weeks ago I wrote about how workers sometimes don’t attend to instructions in order to make the survey go faster. Workers are used to quick tasks, and repetitive multiple choice questions are therefore better than long instruction sets. I ran two studies and each contained the same attention filter. The filter was one that gave the answer in a paragraph before the question. Then asked a simple question where the correct answer (the one given) would not be the correct answer to the question. Like “Answer ’8′ to the following question. 2+2=___.” The question was always bolded, but the instructions were bold in one study and not bold in another. When bold the fail rate was 7%. When not bold the fail rate was 45%. The simple act of making the instructions look like instructions made people not read them.

Because of this, it is important both to construct attention filters in a way that they measure what you want to measure (e.g. do you want to know if instructions are being read or if question text is being read) and it is important to format instructions in a way that they look like questions.

Questions get read, instructions don’t.

The kinds of experiments that work on Mturk

In my never ending attempt to push colleagues into the Mturk world, I often tell them how I have never had a study that worked on another population and not work on the Mturk worker population. However, inevitably my advertisements prove false and I get someone who says that they got some weird results. After studying the types of experiments conducted, I noticed something that should have been obvious from the beginning. Workers seem to be really good at short iterative tasks, or longer tasks that have objective measurements. Thus my decision making work which asks multiple choice questions or asks people to click some buttons  and watch some numbers appear and then judge them works really well. It is akin to tasks like photo tagging. Something short over and over again is easy to pay attention to. Similarly, asking someone to write out how they would solve a problem gets good results. Anyone can look at a paragraph on how to choose between two insurance options and tell if the person put effort into it or not.

However, complex imagined scenarios tend to not work very well. A worker is not used to keeping 2 pages worth of instructions memorized to imagine what it would be like to be a Doctor figuring out whether to prescribe a patient drug X which has a higher cure and premature death rate or drug Y which guarantees immediate safety, but may not cure the disease. In general I think it should be assumed that workers are under high-cognitive load at all times. The worker may be taking your survey at work and while answering a question they may also be trying to gauge if their boss is walking by. If cognitive load factors into your variables or you think a high amount of attention is needed, you will be better to use a different recruitment method or pay a significantly higher amount of money. Last, if you are using Mturk, I would recommend using mechanisms to force attention to instructions like not allowing the Next button to appear for a minute or two in Qualtrics.

What makes the Turker click

When selling the benefits of Amazon Mechanical Turk (Mturk) to people, I usually just have to say “faster and cheaper” and the sales pitch is over. Last year, I got 12,000 people to do a task that paid a penny in about 90 minutes. Sometimes I would spend 3 hour getting 20 subjects to take a survey at UCLA. That 12,000 subject task was a special case, however. It took maybe 20 seconds to complete, and involved looking at a picture and then typing something. On average I tell people they will get their subjects in about 24 hours. This typically is ~200 people doing a 10 minute task for at $0.50.

Lately, though, I have had some surprises. Consider the following three studies. 1 posted on a Tuesday, 1 on a Thursday, 1 on a Monday. All use the same posting template with the same number of participants needed and same payment, but go to a different external survey site. They also had slightly different titles. The Monday post took ~4 days. The Thursday post is still going and is projected to take 6 days. The Tuesday post took 4 hours. When the researchers ask me why its taking so long, I have to act surprised and tell them to just sit it out. Then they worry that the added time means something is wrong and their data won’t be very good. I wonder the same things.

While I don’t have answers to why those HITs acted differently, I have found a few tips to maximize response rates. (Note: sometimes below I will generalize thoughts to “they” when really what I’m saying is “me.” In addition to being a master requester, I am an avid Mturk worker.)

1. Pay at least $0.50. With some exceptions, I found 50 cents to be a good price. It’s more than most HITs out there, but surveys are widespread now and people know they may not make as much answering questions than tagging images or pasting Wikipedia URLs. Most workers set a minimum payment for their searches. When I run studies at $0.25 I expect it to take 2 weeks to get enough subjects. The difference is exponential. If you are not pretesting or piloting consider something even higher. For studies that are published I try to pay $1.00 for a 15 minute study. Still under minimum wage, but around 4x the Mturk average. When people see a $1.00 post, they jump at it, and that desire to participate translates into a better research participant.

2. Obvious but use as many keywords as you can think of. Workers tend toward repetition. A person will log on, see a photo tagging HIT and spend the next hour tagging photos and searching for more photo tagging HITs. Similarly, some people don’t want something so monotonous and they search for surveys. Make sure survey is somewhere in your keywords. Start brainstorming and use a lot. People only find

3. Difficult but try to offer multiple HITs in each batch. Most researcher requesters think of their postings a single entity. That is not how Mturk was designed. Mturk was designed to take a template and input and generate many individual HITs from that data, like taking a template and 1000 images and generating a batch of HITs for tagging all 1000 of those images without having to create the 1000 separate HITs individually. Mturk by default orders search results by the number of HITs available. That means if you only have 1 HIT available (i.e. each single person can only do 1 survey) then you will be lower down the list and get fewer workers. Offering multiple HITs is not easy though. You have to design a scenario where you can have someone do two or more tasks but they can stop at 1 if they want. A good idea is if you have multiple projects going and want to run multiple studies at the same time. If you can’t do this, then you have to get really creative. One thing to avoid is what I have seen several times.  A survey completes and I try to move on to the next. It then tells me that I cannot do more than 1 HIT in the batch. The requester keeps track of workers and really only wants them to do one of the surveys but posts multiple so they are higher up in the search. This is bad for reputation… which brings me to.

4. Keep a good reputation. All requesters know about the stats of workers. Things like rejection rates, etc. A lot of requesters don’t know about sites like Turker Nation and Turkopticon. Turker Nation is a forum for workers, where among other things they talk about the quality of requesters. Even more useful for workers to gauge the quality of a requester is Turkopticon. This is a browser extension that allows users to hover over the names of requesters and see the Turkopticon ratings for that requester. Turkopticon keeps rating for Generosity, Speed of payment, Fairness, and Communicativity. So answer your emails, pay people quickly (don’t wait a week for it to auto-pay), give real reasons to reject (or better yet, be lenient with you $0.50 and give a warning if you think they tried but still failed), and don’t pay 10 cents for an hour-long task.

5. Consider the time. A few years ago I did a survey and discovered that most workers do tasks during their work day as a way to waste time and get paid instead of waste time and get fired on Facebook. This probably has changed some in the last 3 years, but I still find posting early in the week and in the morning gives you an edge over posting at night on Friday.

There are dozens of more examples. I will probably make this a continuing series, but don’t hesitate to share your ideas. I still have no explanation for the disparity in times recently and would love to hear what you have to say.

You’d think they’d make it easier

I am often in need of the separate components of regression function. For instance, differences in utility functions can be easily measured by comparing the exponents of a the functions’ power regressions. Excel can calculate the power regression and place the line on a chart. For some reason though, they did not give an easy way to get that data in a formula. If I felt like programming my own functions I imagine =powreg(x’s,y’s,[desiredDataToReturn]) would definitely be in there. Since it’s not I’m usually forced to go to Google. Unfortunately this appears to not be to wide-spread of a problem, or any easy solution exists that I am not aware of, because I am always at a loss even after a Google search. Of course Excel help is useless. They can’t bother to document that functions have more uses than they appear (=ln() is also an array function? I guess you need to buy a book to learn these things).

At first I was forced to look up and convert the power law equations into an Excel function. If you are looking for practice with Excel, have fun; here’s the equation for the exponent in a power function:

Power law equation for exponent

It would be easy if Excel had a better way to manage parentheses

Just kidding. Here’s the formula: =(count(x array)*sum(ln(x array)*ln(y array))-sum(ln(x array)*sum(ln(y array))/(count(x array)*sum(power(ln(x array),2))-power(sum(ln(x array),2)). Then hit CTRL+Shift+Enter NOT just enter. Quick explanation, ctrl+shift+enter enters the sums and natural logs as array functions which is how it calculates xi instead of just x. You can check the result against a trendline inserted in a chart.

Since that way is really no fun and typing that out (or even now copy and pasting and then editing the arrays) is excruciating, I decided to see what else Excel has to offer. The =linest() function looks like it’s no help at first since it only does a straight line regressions, but then I thought, “What if I just change the space the regression is playing in?” To skip some 12th grade calculus (I really don’t even know if what I’m doing is calculus), I’ll simply say a linear regression of on a log scale is a power regression in the linear scale. Simply put take the natural log of all your coordinates when plugging them into your formula.

=linest(ln(x array),ln(y array),false,false)

This formula gives you a power regression where =linest(x array, y array, false,false) gives you the linear regression. If you want the exponent, you could do the ctrl+shift+enter approach I mentioned above, but if you like using Excel arrays in a more user-friendly format, use the =index() function. The exponent is the first index (or 0 since it’s a 0-based array) and the coefficient is the second index.

=index(linest(ln(x array),ln(y array),false,false),0)

This gives you the exponent. Below is the coefficient. You have to inverse the coefficient (take it out of the log scale and back to the linear scale) first though.

=exp(index(linest(ln(x array),ln(y array),false,false),1)

Easy enough. I suppose programming a separate formula for this and every other regression style is superfluous, but why couldn’t Microsoft just put this in their Help document. It literally adds maybe 2 lines, as opposed to the 100 lines you’ve read here if you go this far.

Do you feel pre-determined … punk?

During a particularly unfortunate (and melodramatic) time, I once told someone that only bad things happen to me. Anything good thing was not happening to me directly, but merely a by-product of something good happening to someone else. As you can imagine, I was not particularly happy, but don’t worry I got over it quickly. I even had a statistical justification. If, say, a trillion random things happen to a person in their life, on average half might end up good and half might end up bad. But what’s the standard deviation? If chance events are normally distributed, there is a likelihood someone, probably me, has at least 95% of those trillion random things turn out bad.

Obviously I didn’t really believe this, or at least to that extent. However, plenty of people out there have some opinion on their own personal luck. During a study where I asked people to make choices between gambles, I decided to ask them why they chose what they did. Most gave mathematical-type answers like the expected value of one was better than the other. A significant number, however, said they didn’t want to pick the risky gamble because they are they are not lucky.

So I began to think what other areas of decision making are affected by a person’s thoughts on luck. Gambling is an obvious one, and to an extent well-researched. Unlucky people tend to stop gambling after a string of losses, while lucky people tend to keep going. However, is this how people make their investment decisions? Or even their traffic decisions? I used to be able to take either the freeway home from work at UCLA or side-streets. Freeways can be faster, but in traffic its variable and you may hit a dead-stop. Side-streets at least have you moving all the time, that is except for at stop lights. I hear people discuss this same problem all the time, all over the country. Do I personally take side streets only when I think I am unlucky? Do I drink the 2-day-expired milk only when I am feeling good about my cosmic chances?

This is going to be the next branch of research I delve into. Can thoughts on luck be used as an individual difference measure to explain decision making behavior. Do attitudes toward luck change over time? Is the attitude merely conscious or does it affect choices even when a person is not thinking about whether or not they are lucky? In the end I hope to finally answer whether the bank robber in Dirty Harry really wasn’t feeling lucky or whether he was just making a rational choice.

Post Navigation

Follow

Get every new post delivered to your Inbox.

Join 338 other followers