the:behavioral:lab

Excluding Mturk workers from surveys in Qualtrics (and elsewhere)

(**Update: I no longer actively maintain this blog. The information below may be out of date, but hopefully is still helpful in getting you closer to solving your problem.**)

A little while ago I wrote about how to replace confirmation codes with Mturk’s worker ID. It serves a variety of functions beyond getting rid of pesky confirmation codes. For instance, many people want to exclude workers from a survey if they have done certain past surveys. Other than data-basing every worker and survey you run (something I recommend, but that’s for another time), you can use this method, which matches the respondent’s worker ID against a black list of past respondents.

First, you need to get and store the worker ID in Qualtrics. I covered this here. Skip the senseless writing at the beginning and just read the comments inside the code and everything below it.

After completing that, all you need to do is make a comma separated list of the worker IDs to exclude and program a logical branch in Qualtrics. To make the list, you should have either a data file or an Mturk batch file with the IDs in a column. Put the column in a new spreadsheet, and put a comma in each cell to the right of an ID (just type a comma next to the first ID, then CTRL click and drag in Excel). Highlight and copy both columns. In your Qualtrics survey flow, add a new field underneath your workerId embedded data field. Name it whatever you want, but I use exclude. For the value, paste the string of IDs.
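
If you would rather skip the spreadsheet step, the same list can be built with a few lines of JavaScript. This is only a rough sketch: `buildExcludeList` is a made-up helper name, the IDs are fake, and it assumes you have already pulled the worker IDs out of your data file into an array.

```javascript
// Hypothetical helper: turns an array of worker IDs into the
// comma separated string you paste into the exclude field.
function buildExcludeList(workerIds) {
  const seen = new Set();
  const cleaned = [];
  for (const rawId of workerIds) {
    const id = String(rawId).trim(); // drop stray spaces from copy-paste
    if (id !== "" && !seen.has(id)) { // skip blanks and duplicates
      seen.add(id);
      cleaned.push(id);
    }
  }
  return cleaned.join(",");
}

// Made-up IDs for illustration only.
const excludeValue = buildExcludeList(["A1EXAMPLE1", " A2EXAMPLE2 ", "A1EXAMPLE1", ""]);
console.log(excludeValue); // A1EXAMPLE1,A2EXAMPLE2
```

The printed string is what would go into the value of the exclude field.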

Last, create a branch element with the logic “IF EMBEDDED DATA exclude CONTAINS workerId” and make that go to a block containing a text question politely asking them to return the HIT, then add an End of Survey element. You may get some complaining emails for making someone accept a HIT just to ask them to return it, but it really does not affect them negatively (I even asked Amazon about this). Everything should look like this:

If you are using something other than Qualtrics, I have used similar methods in SurveyGizmo, Survey Monkey, and in custom programs, though with different implementations. If you need help using this in a different venue, leave a comment.

21 thoughts on “Excluding Mturk workers from surveys in Qualtrics (and elsewhere)”

  1. Hello,

    Thank you for posting this. It was very helpful and it looks cool the way you can show duplicated workers a different block of texts. Is it possible to do the same thing on surveygizmo? I am trying to screen MTurk workers who have worked on my HIT in the first batch so they won’t take the HIT again.

    Again, thank you!

    Thuy-vy

    • I’ve only done this in Survey Gizmo once, and a long time ago, but what I have below seems to still work.

      I know you can set it to accept query string variables, so you can use this and this to send and store the user’s ID. I can’t seem to find an operator equivalent to “contains” in Qualtrics, though “is in list” looks like it should be similar. However, you can also use a regular expression, which is pretty easy.

      First you need to make a regular expression that means a list of separate values (if you don’t know what a regular expression is, you don’t need to for this purpose, but it’s still good to know, so Google it). To do that, take each worker ID and put a pipe next to it (vertical bar, shift-backslash, or |). You can do this in Excel with the concatenate formula, assuming you have your worker IDs in a spreadsheet from a previous HIT. The pipe operator in regular expressions works like it does in most programming languages: it’s a logical OR. It will match either the left OR the right operand (or, if you have 3 or more operands, any one of them). So you should have something that looks like this: workerId1|workerId2|workerId3 and so on.

      Then your logic should look like this: IF [select the variable with the worker ID in it] MATCHES REGEX PATTERN [paste your pattern e.g. 1|2|3|4|5] do something. I don’t know the best place to put the logic, but I’m sure you can figure that out. Let me know if you need any more help.
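
For anyone who prefers to generate the pattern rather than build it in Excel, here is a minimal sketch in JavaScript. The helper names and IDs are made up. The `^(?:...)$` anchors are my own addition, not something the survey tool requires: I don't know whether its regex match is anchored, and without anchors a listed ID would also match inside a longer ID.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds an anchored alternation such as ^(?:id1|id2|id3)$
function buildIdPattern(workerIds) {
  return "^(?:" + workerIds.map(escapeRegex).join("|") + ")$";
}

// Made-up IDs for illustration only.
const idPattern = buildIdPattern(["A1EXAMPLE1", "A2EXAMPLE2", "A3EXAMPLE3"]);
const idRegex = new RegExp(idPattern);

console.log(idPattern);                   // ^(?:A1EXAMPLE1|A2EXAMPLE2|A3EXAMPLE3)$
console.log(idRegex.test("A2EXAMPLE2"));  // true: the ID is on the list
console.log(idRegex.test("A2EXAMPLE22")); // false: it only contains a listed ID
```

If the anchors cause trouble in your survey tool, drop them and you are back to the plain id1|id2|id3 form described above.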

      Andrew Long

  2. Hi there,
    This is a great resource! Thanks so much for sharing your knowledge.

    I’ve been trying to implement your method, and I’m wondering if you might have some advice. I’ve got everything set up as instructed, and when I only use one MTurk ID in the “exclude” embedded data field, it works perfectly. However, when I input multiple IDs using the comma separator as you outline in your post, the exclusion stops working.

    Any ideas as to why this might be occurring?

    Thanks again,
    Crys

    • I am assuming you are using Qualtrics. If you are not, then you’ll have to tell me what you are using, and I may know how to help.

      I am thinking it has to be something with your survey logic. The list you create does not really need to be a comma separated list; that is just easier to read. Qualtrics treats your entire value as a single string and looks for substrings. Thus 1,2,3,4 = 1, 2, 3, 4 = 1 2 3 4 = 1234. This presents a limitation of this method, since 1234 can be seen as a single value (one 4-digit user ID), as 4 values (four 1-digit user IDs), or as anything in between. Thus not only do IDs have to be unique, one’s ID cannot be a substring of another’s ID (e.g. a person can’t have an ID of 1 while another person has an ID of 12). This shouldn’t be a problem with Mturk, but some Mturk IDs are longer than others. I am not sure of the overlap.
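
To make the substring issue concrete, here is a small JavaScript sketch. `containsCheck` only imitates the substring behavior described above (it is not Qualtrics code), `exactCheck` is a stricter alternative you could use in a custom program, and the IDs are made up.

```javascript
// Imitates a substring style CONTAINS check on the whole exclude string.
function containsCheck(excludeString, workerId) {
  return excludeString.includes(workerId);
}

// Stricter alternative: split the list and compare whole IDs.
function exactCheck(excludeString, workerId) {
  return excludeString
    .split(",")
    .map((id) => id.trim())
    .includes(workerId);
}

const excludeString = "A12345,B67890"; // made-up IDs

console.log(containsCheck(excludeString, "A123")); // true: A123 is inside A12345
console.log(exactCheck(excludeString, "A123"));    // false: A123 is not a whole ID on the list
console.log(exactCheck(excludeString, "B67890"));  // true: whole ID is on the list
```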

      That does not answer your question, but does rule out the biggest possibility, that how you are creating your list matters (which it doesn’t). I really don’t know why 1 ID would work but multiple wouldn’t. Perhaps something else changed in your survey.

      I would try to trace the entire process. If you are following my method exactly, you are taking an ID from the URL parameters. On your first question, pipe that ID into the question text so you can check that Qualtrics is taking in the right info. For example, if you named your input embedded data element workerId, put ${e://Field/workerId} in your first question (assuming you put that embedded data element before your first question block). You can do the same thing for your exclude data element. If it’s called exclude, use this code: ${e://Field/exclude}. If that is all correct, double check your CONTAINS logic and check that it all makes sense (IF exclude IS CONTAINS workerid will exclude people that match the list, assuming you send them to an end of survey element. Perhaps you accidentally sent them to the main survey). Also look for typos. Next, test your survey with an ID that you know is NOT in the list and see where it takes you in the survey (it should be your inclusion question block). Then check an ID that you know IS in the exclusion list. This should take you to the exclusion question block.

      Let me know what you find. If you can’t get it to work, you can take screen shots of your survey flow and send them to me.

      • Hello,
        Thank you so much for your swift and thorough response! I took an eagle-eyed look through my programming, and it turns out I had the exclusion branch set up incorrectly. Rather than having it set as “If Exclude is Contains ${e://Field/exclude}” I had it set as “If Exclude is Equal to ${e://Field/exclude}” This caused the survey to look at the MTurk IDs as a whole, rather than as comma delimited values.

        Thank you again for your quick answer. Luckily I had your screenshots to help me catch my own user-error.

        Regards,
        Crys

      • Subhan said:

        Hi, first I would really like to thank you for this and the post regarding auto-retrieval of workerId. I’m using Qualtrics and mturk for the first time, for my master’s thesis, and they have solved many headaches. However, I have stumbled upon something that I think might be of interest to you, and maybe you have a solution or experienced advice. I apologize for the long post, but hope you bear with me on this.

        I stumbled upon the fact that Qualtrics seems to be sensitive to commas and spaces during the screening process/if-condition. In other words, it does NOT treat 1,2,3,4 = 1, 2, 3, 4 = 1 2 3 4 = 1234, but rather treats them as distinct specifications of the exclusion list. If you e.g. type [space]2 as workerId, it will fail the screening test (exclude contains workerId) in the first and last specification, so the worker will be sent to “Default Question Block” in your screenshot. The same thing happens if the worker types 12 and your exclusion list is 1,2 or 1[space]2. So I thought I could solve this by using your script.

        However, the reason I stumbled upon that in the first place was that I read in an instruction sheet that I should ensure that the workerIds from Qualtrics did not have an underscore at the beginning, e.g. “_A1234”, before I tried to match them with the ones from mturk. They had encountered that relatively often in their dataset and interpreted it as a space recoded by Qualtrics to an underscore, but after a test today it turns out Qualtrics doesn’t do that! I ran two tests where I entered [space]A1234 and _A1234 as workerId (not via the script, but in a field where you type in the ID at the beginning of the survey). Both were able to bypass the screening, and when I exported the dataset the first came out as [space]A1234 and the other as _A1234. So it seems that maybe some workers type _A1234 on purpose to bypass the screening tests. The author had just manually erased the underscores (thinking they were spaces caused by copy-paste when workers insert their ID) and then matched and paid out, confident that Qualtrics had screened duplicates correctly.

        Then I tested your script (in sandbox) and after accepting the HIT the first time, I manually changed the URL from url&workerId=A1234 (which got rejected due to duplicate) to url&workerId=_A1234 and was able to take the whole survey! Same thing if I added space, which changed to url&workerId=%20A1234. I’ve seen on some forums that workers share survey links that have similar format, but if I was able to figure out how to bypass screening after testing mturk for 4 days, so can they. So now I’m thinking one solution might be to use the HTTP referer verification in Qualtrics to force worker to come directly from the HIT preview, but I’m not sure what url to type in here? I saw your note in the script that the url is in the frame and not the one for the page, but I don’t know how to write code or figure out how to find it. I see that you use it to get the workerId, but how can I get it to show it to me, so I know what to type in Qualtrics? And do you think that might do the trick? I know I can just do another screening ex post, but then there’s no point in doing the first screening if I can’t trust it.

      • What I meant by Qualtrics treating 1,2,3,4, and 1234, etc. the same is in the value of your exclusion list, not the value of the worker id. Some people see a comma separated list and think Qualtrics will treat it like an array and iterate over the various members. Instead Qualtrics treats it like a string and looks for a matching substring, thus the problem of having an id of 12345, and searching for an id 123. Even though 123 is not in the list, 12345 is, and qualtrics will return TRUE because it found 123 in the string. Of course qualtrics treats the worker id you input as is. So, yes, 12 1 2 and 1_2 are different input values.
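
One way to blunt the space and underscore trick, at least when matching IDs ex post or in a custom program, is to clean up the ID before comparing it. The sketch below is only an illustration: `normalizeWorkerId` is a made-up helper, and it rests on my assumption that real worker IDs contain only letters and digits.

```javascript
// Hypothetical helper: cleans up a worker ID before it is compared to the list.
// Assumption: real worker IDs contain only letters and digits,
// so every other character (spaces, underscores, etc.) is removed.
function normalizeWorkerId(rawId) {
  return String(rawId).replace(/[^A-Za-z0-9]/g, "");
}

console.log(normalizeWorkerId(" A1234")); // A1234
console.log(normalizeWorkerId("_A1234")); // A1234
console.log(normalizeWorkerId("A1234"));  // A1234 (already clean, unchanged)
```

This does nothing for the substring problem itself; it only makes padded variants of an ID compare equal to the clean one.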

        I’ve never worked with the HTTP referer verification in Qualtrics, but I also think it may not be the best idea, as I can’t seem to find the referer from mturk in the header I’m using while testing. Perhaps Amazon blocks it. That is easy to do with an iframe that was created dynamically. We already know the HITs are contained within iframes, and it seems likely that the iframe is created dynamically. Instead, what you may want to do is use Javascript in your HIT to do the exclusions. This is still vulnerable to particularly savvy Turkers who may know how to edit Javascript on the fly, but that seems like a very unlikely scenario, and you still have your list that you can go through by hand just in case. I have a blog post for doing this in Javascript: https://thebehaviorallab.wordpress.com/2013/09/30/excluding-mturk-workers-from-your-hits-not-just-from-your-qualtrics-surveys/

        Or, if you just want the JS code, it is on GitHub:


        <h1>TITLE – change this</h1>
        <p>INSTRUCTIONS – change this</p>
        <div id="idcheck">&nbsp;</div>
        <p><textarea cols="80" name="comment" rows="3"></textarea></p>
        <script type='text/javascript'>
        // The URL the worker goes to after passing the check (i.e. the survey URL). Change this.
        var url = "URL GOES HERE";
        // This is your black list. Change this. All IDs must be enclosed in quotation marks (single or double) and separated by commas.
        var workers = ["test1", "test2", "test3"];

        // Utility function that checks whether a given ID is in the black list (an array).
        function isOnBlackList(id) {
          return workers.indexOf(id) !== -1;
        }

        // HTML instructions and form for ID input
        var naCheck = "<p>To see if you have completed this survey already, enter your worker ID below and click 'Check ID'</p><p><textarea rows='1' cols='20' id='idinput'></textarea></p><div id='ok' onclick='check(false)' style='border: 3px solid black; cursor: pointer; background-color:gray;color:white;width:70px;height:20px;text-align:center;margin-left:10px;'>Check ID</div>";

        // Get the URL of the Mturk page, which may or may not include the worker ID
        var href = window.location.href.toString();
        // Storage for the query string variable names and values
        var queryString = {};
        // Populate queryString with the actual query string variable names and values
        href.replace(new RegExp("([^?=&]+)(=([^&]*))?", "g"), function ($0, $1, $2, $3) { queryString[$1] = $3; });

        // If the query string has a non-empty variable named workerId, check it against the black list; if not, ask the user to input their ID.
        if (queryString['workerId'] !== undefined && queryString['workerId'] !== "") {
          check(queryString['workerId']); // Check ID against black list
        } else {
          document.getElementById('idcheck').innerHTML = naCheck; // Add the HTML that asks for ID input
        }

        // Checks the given ID against the black list. When called with false, checks the ID typed into the text box instead.
        function check(id) {
          var box = document.getElementById("idcheck");
          if (id === false) {
            if (isOnBlackList(document.getElementById('idinput').value)) {
              // Typed ID found on list: they have taken the survey before, so ask them not to accept
              box.innerHTML = "Your ID is on the list of workers who have completed this survey already. Please do not accept the HIT. Thank you for your previous participation.";
            } else {
              // Typed ID not found on list: ask them to accept. Accepting reloads the page and runs this check again with their actual ID, which should be the same as the typed one
              box.innerHTML = "Your ID is not on the list of workers who have completed this survey. Please accept the HIT to continue.";
            }
          } else {
            if (isOnBlackList(id)) {
              // Actual ID found on list: ask them to return the HIT (this should rarely happen, if ever; it basically means the person did not follow instructions or tried to lie beforehand)
              box.innerHTML = "Your ID is on the list of workers who have completed this survey already. Please return the HIT. If you previously entered your ID and were told to accept the HIT, you likely did not enter the correct worker ID.";
            } else {
              // Actual ID not found on list: send to survey. Use ? if the survey URL has no query string yet, otherwise &
              var separator = url.indexOf('?') === -1 ? '?' : '&';
              box.innerHTML = "<p>To proceed to the survey <a href='" + url + separator + "workerId=" + id + "' target='mturksurvey'>click here</a>.</p>";
            }
          }
        }
        </script>

        To make it even more difficult for people trying to get around your code, after you edit the javascript you can minify that code here:
        http://jscompress.com/

        So someone has to spend an hour figuring out the code before they edit it and take your survey. Probably better to just move on to the next HIT at that point.

        However, in the end, there is nothing you can do to prevent someone who really wants to from changing the value sent from the client, and therefore you’ll never have a 100% guarantee that no one fell through the cracks. You said doing an ex post verification makes the initial verification pointless. I disagree. The initial check catches the dozens who see that they don’t qualify and move on to the next HIT. The ex post check catches the few (if any) who find a way to skirt your process. If your research is similar to mine, 1 or 2 people out of 2 or 3 hundred participants probably won’t skew the effects, and wasting 50 cents or a dollar isn’t that bad of a problem. However, if it is essential to prevent people who participated before, always test ex post.
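
The ex post check is easy to script once you have both lists of IDs. Here is a minimal sketch in JavaScript; `findRepeatWorkers` is a made-up helper, the IDs are fake, and ignoring case and surrounding spaces is my own choice, not a requirement.

```javascript
// Hypothetical helper: returns the IDs from the new data that also appear on the black list.
// The comparison ignores upper/lower case and surrounding spaces.
function findRepeatWorkers(blacklist, completedIds) {
  const blocked = new Set(blacklist.map((id) => id.trim().toUpperCase()));
  return completedIds.filter((id) => blocked.has(id.trim().toUpperCase()));
}

// Made-up IDs for illustration only.
const blacklistIds = ["A1EXAMPLE1", "A2EXAMPLE2"];
const newBatchIds = ["A3EXAMPLE3", " a2example2", "A4EXAMPLE4"];

const repeats = findRepeatWorkers(blacklistIds, newBatchIds);
console.log(repeats.length); // 1
console.log(repeats);        // the one entry that matches A2EXAMPLE2, as it was entered
```

The function returns the IDs exactly as they appear in the new data, so you can find those rows in your export.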

      • Subhan said:

        Appreciate your feedback. It helps to know exactly how qualtrics is treating the exclusion list and workerId input.

        After a bit of googling on javascript I tried writing document.write(window.location) as a new script after your autoretrieve script. When I put the HIT up in sandbox it gave me a long url like workersandbox.mturkcontent.com/dynamic/(hitid, assignmentid, workerId etc) after accepting the HIT. Is this the dynamic thing you talk about? I inserted workersandbox.mturkcontent.com (without the subsequent stuff) in the HTTP referer verification in qualtrics (survey options) and it seems to work. Tried doing the same on a live HIT I put out and it returned mturkcontent.com(…), so I think that might work. However, I don’t have a fully set up worker account, so I couldn’t accept and check.

        Accessing the survey from a direct link gets stopped, but from the HIT it works fine. Tried safari and firefox, deleted cookies etc., and it worked. If I try to resume a previous session I still get access from a direct link because the cookie seems to override the http referer if I have been granted access once before. But of course, if I delete the cookie I have to start over again from the HIT preview. So I might get duplicates in the qualtrics database, but that’s not such a big problem.

  3. Hi,

    Any tips for how to do this in SurveyMonkey?

    Thanks for sharing!

    • Any possible way to do it in Survey Monkey requires a license I don’t have, so I haven’t been able to try anything. A platinum license is required to take in query string variables, which is necessary for taking in an mturk ID straight from mturk. However, I don’t know if Survey Monkey lets you do anything with what they call “custom variables.”

      A different technique that is easier to try but I don’t think is implementable in survey monkey is having the participant input their ID, and using skip logic based on what they input. SurveyMonkey, from what I remember, only allows skip logic on multiple choice and scale questions.

      Another solution I have for you (assuming you are using mturk), is an in-mturk way of excluding people. The logic of it is:
      1) See if worker ID query string variable exists (it does if the hit is accepted and does not if it is not accepted).
      2) If no ID variable exists, ask user to input ID.
      3) Check ID against black list. Notify to accept HIT (if not on black list) or not to accept HIT (if on black list)
      4) If ID variable exists, check that variable against black list. Notify to proceed to survey or return HIT.
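
The four steps boil down to one small decision. Here is a stripped-down sketch of just that decision in JavaScript, without any of the page handling; the function name and the result labels are made up for illustration.

```javascript
// Hypothetical helper mirroring the four steps above.
// queryId: worker ID from the query string, or undefined if the HIT is not accepted yet.
// typedId: ID typed in by the worker, or undefined if nothing has been entered.
function decideNextStep(blacklist, queryId, typedId) {
  if (queryId !== undefined) {
    // Step 4: the HIT is accepted, so check the actual ID.
    return blacklist.includes(queryId) ? "return-hit" : "go-to-survey";
  }
  if (typedId === undefined) {
    // Step 2: no ID available yet, so ask for one.
    return "ask-for-id";
  }
  // Step 3: check the typed ID and tell the worker whether to accept.
  return blacklist.includes(typedId) ? "do-not-accept" : "accept-hit";
}

const previousWorkers = ["test1", "test2", "test3"]; // made-up IDs

console.log(decideNextStep(previousWorkers, undefined, undefined)); // ask-for-id
console.log(decideNextStep(previousWorkers, undefined, "test2"));   // do-not-accept
console.log(decideNextStep(previousWorkers, undefined, "new1"));    // accept-hit
console.log(decideNextStep(previousWorkers, "test2", undefined));   // return-hit
console.log(decideNextStep(previousWorkers, "new1", undefined));    // go-to-survey
```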

      This means that people have to accept your HIT to go to your survey. After they accept the HIT, you can programmatically access their worker ID, which prevents people from lying or accidentally taking your survey multiple times.

      I decided to just post this as another blog post, so here is the link: https://thebehaviorallab.wordpress.com/2013/09/30/excluding-mturk-workers-from-your-hits-not-just-from-your-qualtrics-surveys/

      You can read the actual post, but here is what you have to do. Paste this code (and only this) into the Source area of your HIT (second tab, hit Source). Then change the title and instructions. You can technically do this in the normal text editor after pasting it into the source window also. Then in the source window you need to change the URL variable. Paste your Survey Monkey URL (or any URL, if non-Survey Monkey users are reading this) between the double quotation marks, replacing the placeholder text. Last, you need to create a comma separated list of worker IDs. Each worker ID has to be wrapped in quotation marks (single or double) and separated by a comma. I put three fake IDs in there to show you how it should look. You can create this in Excel pretty easily.

      Last, if you want, you can change the conditional instructions. Any text that is within quotation marks will potentially be shown to the user. You can judge from what the text currently says what condition occurred to show that text. (e.g. in the function called check, the first text is for people who are on the black list and have not yet accepted the HIT).

  4. Pingback: Excluding Mturk workers from your HITs (not just from your Qualtrics surveys) | the:behavioral:lab

  5. Pingback: Changing Mturk submit button functionality & A new to prevent duplicate workers on separate HITs | the:behavioral:lab

  6. Thank you for the guide! This addresses my issue very well, but I am wondering if you can confirm a modification. My goal is simply to prevent workers who took the HIT in batch 1 from taking the HIT in batch 2. I did not plan ahead effectively, and issued a confirmation code in Qualtrics.

    Does it make sense to edit the HIT to make it consistent with this guide, then take the workerIDs directly from mturk to exclude them? My apologies for the confusion. Thanks!

    • Sorry for the delay. I took a long holiday. Yes I think it makes sense to use this guide if you found it simple enough to follow. To get the worker IDs you just download the CSV data from batch 1 (let me know if you don’t know where to find this), and use the workerId column from that file.

  7. Lexi said:

    I’m conducting a psychology study using MTurk and Qualtrics and have very little background in coding. I have been having workers input their MTurk IDs as the first question on the survey. How would I go about using the inputted data instead of the string at the end of the URL to filter out workers who already participated in the first round of the study? Thanks

    • You can grab question responses using what I’ll call piped text codes. They look something like this: ${q://QID59/ChoiceGroup/SelectedChoices}. The only reliable way I know to get them is to go into a question (i.e. click on the question text), then click “Piped Text,” and select what you need. Since you are using a text input question, you’ll hover over the question you want to get the response from (it should be the first question in the list), and then there should be two options: “question text,” and a second one that shows the actual question text (e.g. “Please type your worker ID”). You want to click the second one. Your code should look like this: ${q://QID60/ChoiceTextEntryValue} but with a different QID number (the QID number is the thing I don’t know how to get other than by using the piped text feature). Copy the code that gets put into the question text, then delete it from the question (otherwise it will actually show in the survey).

      Then follow the instructions in the blog post, but where it says IF exclude CONTAINS workerId, the branch conditional should instead read IF exclude CONTAINS ${q://QID60/ChoiceTextEntryValue}.

      You may run into problems with things like white space (someone put a space before or after their ID) and possibly with case-sensitivity. You may want to create your list of IDs using both lower-case and capital letters (which you can do in Excel using the =UPPER and =LOWER functions). Let me know if things don’t test out well.
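
If Excel is not handy, the upper-case and lower-case copies can be generated with a short script. This is only a sketch with a made-up helper name and a fake ID, and it does not cover mixed-case typing such as a1B2c3.

```javascript
// Hypothetical helper: builds a comma separated list that holds each ID
// as given, in upper case, and in lower case (duplicates are dropped).
function withCaseVariants(workerIds) {
  const variants = new Set();
  for (const id of workerIds) {
    variants.add(id);
    variants.add(id.toUpperCase());
    variants.add(id.toLowerCase());
  }
  return Array.from(variants).join(",");
}

console.log(withCaseVariants(["A1b2C3"])); // A1b2C3,A1B2C3,a1b2c3
```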
