Flagging respondents

Using flag_resp() to create and compare flagging strategies

One use case for response quality indicators is to flag responses that are potentially of low quality. resquin provides the function flag_resp() to create a data frame of logical values (TRUE and FALSE) according to user-defined cut-off values on response quality indicators. A respondent who receives a TRUE value is flagged as suspicious. A respondent who receives a FALSE value is deemed unsuspicious.

The strength of flag_resp() lies in its ability to quickly create and compare multiple flagging strategies, as the following example illustrates:

Suppose we use data on response styles to decide whether respondents are low-quality responders on the 15-item nep scale. We can use resp_styles() to calculate response style indices for each respondent.

library(resquin)
nep_resp_styles <- resp_styles(
  x = nep,
  scale_min = 1, # minimum response option
  scale_max = 5, # maximum response option
  min_valid_responses = 1) # default, excludes respondents with any missing value

summary(nep_resp_styles)
#> 
#> ── Averages of response quality indicators
#>  MRS  ARS  DRS  ERS NERS 
#> 0.16 0.55 0.30 0.24 0.76
#> 
#> ── Quantiles of response quality indicators
#> # A tibble: 5 × 6
#>   quantiles   MRS   ARS   DRS   ERS  NERS
#>   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0%         0     0     0     0     0   
#> 2 25%        0.07  0.47  0.2   0.07  0.6 
#> 3 50%        0.13  0.53  0.33  0.2   0.8 
#> 4 75%        0.2   0.6   0.4   0.4   0.93
#> 5 100%       0.93  1     0.73  1     1

In the first example, we will consider the acquiescence response style (ARS). ARS represents the tendency of respondents to agree with questions regardless of their content. Because the nep scale includes both positively and negatively keyed items, high ARS values plausibly reflect this behavior: respondents who are more concerned about nature should choose higher response options on the positively keyed items and lower response options on the negatively keyed items. Choosing high response options throughout is substantively inconsistent and may be caused by acquiescence.

A first idea could be to flag respondents who have more than 80% of their responses in the ARS category.

first_flagging <- flag_resp(nep_resp_styles,
                            ARS > 0.8)

summary(first_flagging)
#> 
#> ── Number of respondents flagged (Total N: 1222)
#> ARS > 0.8 
#>        33

We can see that 33 respondents are flagged as suspicious, as their ARS score is above 0.8.
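
To make the cut-off concrete, the following base R sketch recomputes an ARS-like score by hand on toy data and applies the same rule. It assumes that ARS is the share of a respondent's answers above the scale midpoint. The data frame toy, its item names, and the variable names are hypothetical and not part of resquin.

```r
# Toy responses on a 1-5 scale: one row per respondent, one column per item.
toy <- data.frame(
  q1 = c(4, 5, 2, 5),
  q2 = c(5, 4, 3, 4),
  q3 = c(4, 5, 1, 2),
  q4 = c(5, 2, 4, 5),
  q5 = c(4, 3, 2, 3)
)

# Assumption: ARS is the proportion of responses above the scale midpoint.
scale_midpoint <- (1 + 5) / 2
ars <- rowMeans(toy > scale_midpoint)

# Same rule as in the flag_resp() call: flag respondents with ARS > 0.8.
flagged <- ars > 0.8

ars
flagged
sum(flagged) # number of flagged respondents
```

Only the first toy respondent agrees with every item and is therefore flagged.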

Using two flagging strategies

In a second step, we might also be interested in flagging respondents who choose the same response option repeatedly. We can use resp_patterns() to compute the longest string length indicator. This indicator gives the length of the longest run of identical consecutive response options. We will flag respondents who have a longest string length of 8 or more. We keep the ARS flagging strategy in place to compare it with the new one.

nep_resp_patterns <- resp_patterns(nep)
nep_resp_patterns_resp_styles <- cbind(nep_resp_styles,nep_resp_patterns[,-1])

second_flagging <- flag_resp(nep_resp_patterns_resp_styles,
                             ARS > 0.8,
                             longest_string_length >= 8)
summary(second_flagging)
#> 
#> ── Number of respondents flagged (Total N: 1222)
#>                  ARS > 0.8 longest_string_length >= 8 
#>                         33                         19
#> 
#> ── Agreement between flagging strategies
#> 
#> 
#> Flag                         longest_string_length >= 8   ARS > 0.8 
#> ---------------------------  ---------------------------  ----------
#> longest_string_length >= 8   19                                     
#> ARS > 0.8                    9                            33

We can see that 19 respondents have a longest string length greater than or equal to 8. The output also contains an agreement matrix between the flagging strategies. In the second row of the first column, we can see that the two flagging strategies agree on 9 flagged respondents. Together, the two strategies would flag 33 + 19 - 9 = 43 of the 1222 respondents.
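
As an illustration of what the longest string length indicator measures, here is a minimal base R sketch built on rle(). The helper longest_string() and the toy matrix are hypothetical. Returning NA for respondents with any missing value is an assumption made for this sketch, not a statement about how resp_patterns() is implemented.

```r
# Hypothetical helper: length of the longest run of identical consecutive answers.
longest_string <- function(x) {
  if (anyNA(x)) return(NA_integer_) # assumption: incomplete respondents get NA
  max(rle(x)$lengths)
}

# Toy responses: one row per respondent, one column per item.
toy <- rbind(
  c(4, 4, 4, 4, 2, 5),
  c(1, 2, 3, 4, 5, 1),
  c(3, 3, 5, 5, 5, 3)
)

lsl <- apply(toy, 1, longest_string)
lsl

# Flag respondents whose longest run reaches a cut-off (3 for this short toy scale).
lsl >= 3
```

The first and third toy respondents repeat an answer at least three times in a row and would be flagged under this toy cut-off.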

It is also possible to join multiple flagging expressions with the & or | operator.

flag_resp(nep_resp_patterns_resp_styles,
          ARS > 0.8,
          longest_string_length >= 8,
          ARS > 0.8 | longest_string_length >= 8) |> 
  summary()
#> 
#> ── Number of respondents flagged (Total N: 1222)
#>                              ARS > 0.8             longest_string_length >= 8 
#>                                     33                                     19 
#> ARS > 0.8 | longest_string_length >= 8 
#>                                     43
#> 
#> ── Agreement between flagging strategies
#> 
#> 
#> Flag                                     ARS > 0.8 | longest_string_length >= 8   longest_string_length >= 8   ARS > 0.8 
#> ---------------------------------------  ---------------------------------------  ---------------------------  ----------
#> ARS > 0.8 | longest_string_length >= 8   43                                                                              
#> longest_string_length >= 8               19                                       19                                     
#> ARS > 0.8                                33                                       9                            33
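
The relationship between the individual counts, the overlap, and the combined count can be reproduced by hand. The sketch below uses two hypothetical logical flag columns without missing values and base R only. It illustrates the arithmetic behind an agreement matrix and is not resquin's implementation.

```r
# Two toy flagging strategies for six respondents (hypothetical data).
flags <- data.frame(
  ars_high    = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  long_string = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Convert to a 0/1 matrix. crossprod() then counts, for each pair of
# strategies, the respondents flagged by both. The diagonal holds the
# number of respondents flagged by each strategy on its own.
flag_matrix <- as.matrix(flags) * 1
agreement <- crossprod(flag_matrix)
agreement

# Respondents flagged by at least one strategy (the | operator).
n_union <- sum(flags$ars_high | flags$long_string)

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
n_union == agreement["ars_high", "ars_high"] +
  agreement["long_string", "long_string"] -
  agreement["ars_high", "long_string"]
```

The same inclusion-exclusion arithmetic gives 33 + 19 - 9 = 43 for the nep example.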

Comparing three flagging strategies, with one coming from a different source

We can use any vector of logical values (i.e. TRUE and FALSE) that has one element per row of the nep data frame and compare it with the indicators provided by resquin. In the following example, we create a random vector of logical values and add it to the data frame from the last example.

random_vector <- sample(c(FALSE, TRUE), nrow(nep_resp_styles), replace = TRUE) # one value per respondent
random_vector[is.na(nep_resp_styles$ARS)] <- NA # Add missing data as in the other data frames

# combine the resquin indicators with the external indicator, one row per respondent
external_indicator_data <- cbind(
  nep_resp_patterns_resp_styles,
  new_indicator = random_vector)

flag_resp(external_indicator_data,
          ARS > 0.8,
          longest_string_length >= 8,
          new_indicator == T) |> 
  summary()
#> 
#> ── Number of respondents flagged (Total N: 1222)
#>                  ARS > 0.8 longest_string_length >= 8 
#>                         33                         19 
#>         new_indicator == T 
#>                        374
#> 
#> ── Agreement between flagging strategies
#> 
#> 
#> Flag                         new_indicator == T   longest_string_length >= 8   ARS > 0.8 
#> ---------------------------  -------------------  ---------------------------  ----------
#> new_indicator == T           374                                                         
#> longest_string_length >= 8   9                    19                                     
#> ARS > 0.8                    16                   9                            33

The external indicator new_indicator is now included in the output of the summary function and can be compared with the other indicators.
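
When bringing in an external indicator, it helps to check that it lines up with the resquin indicators before combining them. The following base R sketch uses a toy ARS vector. All names and the seed are hypothetical and serve only to make the sketch reproducible.

```r
set.seed(42) # hypothetical seed, only for reproducibility of this sketch

# Toy ARS values with missing entries for incomplete respondents.
ars <- c(0.9, NA, 0.4, 0.85, 0.2, NA)

# External logical indicator with exactly one value per respondent.
external_flag <- sample(c(FALSE, TRUE), size = length(ars), replace = TRUE)

# Mirror the missing-data pattern of the existing indicator.
external_flag[is.na(ars)] <- NA

# Sanity checks before combining the vectors with cbind().
stopifnot(length(external_flag) == length(ars))
stopifnot(identical(is.na(external_flag), is.na(ars)))
```

Drawing the vector with the same length as the indicator data avoids relying on recycling or silent vector extension.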

Filtering respondents

The output of flag_resp() can be used to filter out the flagged respondents. It is a data frame with an id column and one column of logical values per flagging expression:

flag_df <- flag_resp(
  nep_resp_patterns_resp_styles,
  ARS > 0.8,
  longest_string_length >= 8,
  ARS > 0.8 | longest_string_length >= 8) 

flag_df
#> # A data frame: 1,222 × 4
#>       id `ARS > 0.8` `longest_string_length >= 8` ARS > 0.8 | longest_string_l…¹
#>    <int> <lgl>       <lgl>                        <lgl>                         
#>  1     1 FALSE       FALSE                        FALSE                         
#>  2     2 FALSE       FALSE                        FALSE                         
#>  3     3 FALSE       FALSE                        FALSE                         
#>  4     4 FALSE       FALSE                        FALSE                         
#>  5     5 FALSE       FALSE                        FALSE                         
#>  6     6 NA          NA                           NA                            
#>  7     7 FALSE       FALSE                        FALSE                         
#>  8     8 FALSE       FALSE                        FALSE                         
#>  9     9 FALSE       FALSE                        FALSE                         
#> 10    10 FALSE       FALSE                        FALSE                         
#> # ℹ 1,212 more rows
#> # ℹ abbreviated name: ¹​`ARS > 0.8 | longest_string_length >= 8`

We can use these columns to filter respondents from the original nep dataset. For example, we can exclude the flagged respondents.

# Exclude the 33 flagged respondents with ARS > 0.8
nep[!flag_df$`ARS > 0.8`,] |> 
  na.omit() #exclude respondents with missing values
#> # A tibble: 904 × 15
#>    bczd005a bczd006a bczd007a bczd008a bczd009a bczd010a bczd011a bczd012a
#>       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#>  1        4        4        5        2        4        4        5        2
#>  2        5        2        4        2        5        2        4        1
#>  3        4        2        2        2        4        2        4        2
#>  4        2        2        4        4        3        4        3        2
#>  5        2        2        5        5        4        4        5        4
#>  6        5        2        5        2        5        4        4        2
#>  7        5        2        5        3        4        4        5        1
#>  8        5        2        4        4        4        3        5        2
#>  9        1        3        5        5        4        5        5        1
#> 10        4        2        4        2        5        3        2        2
#> # ℹ 894 more rows
#> # ℹ 7 more variables: bczd013a <dbl>, bczd014a <dbl>, bczd015a <dbl>,
#> #   bczd016a <dbl>, bczd017a <dbl>, bczd018a <dbl>, bczd019a <dbl>
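
Because the flag columns can contain NA, it is worth being explicit about how missing flags are treated when subsetting. The base R sketch below shows three options on a hypothetical toy data frame. It is not tied to the nep data.

```r
toy <- data.frame(id = 1:5, score = c(10, 20, 30, 40, 50))
flag <- c(FALSE, TRUE, NA, FALSE, TRUE) # NA: flag could not be determined

# Plain logical subsetting turns an NA index into a row of NAs,
# which is why na.omit() is applied afterwards in the nep example.
kept_with_na_rows <- toy[!flag, ]

# which() drops NA, so flagged and undetermined respondents are both removed.
kept_strict <- toy[which(!flag), ]

# %in% never returns NA, so only respondents flagged TRUE are removed.
kept_lenient <- toy[!(flag %in% TRUE), ]

kept_strict$id
kept_lenient$id
```

Which variant is appropriate depends on whether respondents without a computable indicator should stay in the analysis sample.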

Alternatively, we can keep only the flagged respondents.

# Extract only the 33 flagged respondents with ARS > 0.8
nep[flag_df$`ARS > 0.8`,] |> 
  na.omit()
#> # A tibble: 33 × 15
#>    bczd005a bczd006a bczd007a bczd008a bczd009a bczd010a bczd011a bczd012a
#>       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#>  1        4        3        4        4        4        4        4        4
#>  2        4        4        5        5        4        4        5        3
#>  3        3        4        4        4        4        4        4        4
#>  4        4        4        4        4        4        4        4        4
#>  5        4        4        4        5        4        4        5        4
#>  6        4        4        5        3        4        4        4        4
#>  7        4        4        5        4        5        4        5        4
#>  8        4        4        3        5        4        4        4        4
#>  9        4        2        5        4        5        4        5        4
#> 10        4        4        4        4        4        4        4        4
#> # ℹ 23 more rows
#> # ℹ 7 more variables: bczd013a <dbl>, bczd014a <dbl>, bczd015a <dbl>,
#> #   bczd016a <dbl>, bczd017a <dbl>, bczd018a <dbl>, bczd019a <dbl>

Note that you can also use the id column in flag_df to join flag_df to your original data.
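
A join by id could look like the following base R sketch. It assumes that id corresponds to the row position in the data passed to the indicator functions, so a matching id column is first added to the survey data. The data frames survey and flags are hypothetical stand-ins for nep and flag_df.

```r
# Hypothetical survey data without an id column.
survey <- data.frame(item1 = c(4, 2, 5), item2 = c(4, 3, 1))

# Assumption: id in the flag data frame is the row position of each respondent.
survey$id <- seq_len(nrow(survey))

# Hypothetical flag data frame with an id column and one logical flag.
flags <- data.frame(id = 1:3, high_ars = c(TRUE, FALSE, NA))

# Left join: keep every survey row and attach its flag.
joined <- merge(survey, flags, by = "id", all.x = TRUE)
joined
```

Joining by id keeps the flags attached to the correct respondents even if the survey data are later sorted or filtered.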