1 Data requirements

1.1 General requirements

  • Currently ShapleyVIC applies to binary, ordinal and continuous outcomes.
  • Code binary outcomes as 0/1, and ordinal outcomes as integers starting from 0.
  • Variable names must not contain spaces or special characters (e.g., [, ], (, ), ,). Replace them with _.
  • Variable centering/standardization is not required.
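
The naming and coding rules above can be applied with a few lines of pandas. The following is a minimal sketch, not part of ShapleyVIC itself: the helper names `clean_variable_names` and `code_outcome_from_zero`, the example columns, and the choice to replace every character other than letters, digits and _ are assumptions made for illustration.

```python
import re
import pandas as pd

def clean_variable_names(df):
    """Return a copy of df where every character in a column name that is
    not a letter, digit or underscore is replaced with an underscore."""
    cleaned = [re.sub(r"[^0-9A-Za-z_]", "_", str(name)) for name in df.columns]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("Cleaning produced duplicate variable names")
    out = df.copy()
    out.columns = cleaned
    return out

def code_outcome_from_zero(y, ordered_levels):
    """Map outcome levels to integers 0, 1, 2, ... in the order given."""
    mapping = {level: code for code, level in enumerate(ordered_levels)}
    coded = y.map(mapping)
    if coded.isna().any():
        raise ValueError("Outcome contains levels not listed in ordered_levels")
    return coded.astype(int)

# Hypothetical example data
raw = pd.DataFrame({
    "age (years)": [54, 61, 47],
    "bp[systolic]": [120, 135, 128],
    "severity": ["mild", "severe", "moderate"],
})
clean = clean_variable_names(raw)
clean["severity"] = code_outcome_from_zero(
    clean["severity"], ["mild", "moderate", "severe"]
)
print(list(clean.columns))
print(clean["severity"].tolist())
```

Raising an error on duplicate names is a deliberate choice here: two different original names can collapse to the same cleaned name, and silently keeping both would make later results ambiguous.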

1.2 Missing values and sparsity

  • Handle missing entries appropriately before applying ShapleyVIC. Missing entries are not supported.
  • Check the data distribution and handle data sparsity before applying ShapleyVIC. Data sparsity may increase run time and lead to unstable results.
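
One way to run these checks is a per-variable summary of missing entries and rare categories. The sketch below assumes the data are held in a pandas data frame; the function name `data_quality_report` and the cut-offs `min_count` and `max_levels` are hypothetical and should be adapted to the data at hand.

```python
import numpy as np
import pandas as pd

def data_quality_report(df, min_count=10, max_levels=10):
    """Summarise missing entries and sparse categories for each variable.

    A variable is treated as categorical if it is non-numeric or has at
    most max_levels distinct values (an illustrative rule, not a standard).
    It is flagged as sparse if its rarest category has fewer than
    min_count observations.
    """
    rows = []
    for name in df.columns:
        col = df[name]
        n_levels = col.nunique(dropna=True)
        is_categorical = (
            not pd.api.types.is_numeric_dtype(col)
        ) or n_levels <= max_levels
        smallest = None
        if is_categorical and n_levels > 0:
            smallest = int(col.value_counts().min())
        rows.append({
            "variable": name,
            "n_missing": int(col.isna().sum()),
            "n_levels": int(n_levels),
            "smallest_level_count": smallest,
            "sparse": smallest is not None and smallest < min_count,
        })
    return pd.DataFrame(rows).set_index("variable")

# Hypothetical example data: one missing age, one rare ward
df = pd.DataFrame({
    "age": [54.0, 61.0, np.nan, 47.0, 70.0, 66.0],
    "ward": ["A", "A", "A", "A", "A", "B"],
})
report = data_quality_report(df, min_count=2, max_levels=3)
print(report)
```

Variables with a non-zero `n_missing` need to be handled (for example by imputation or exclusion) before analysis, and variables flagged as `sparse` are candidates for merging categories or removal.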

1.3 Additional pre-processing for high-dimensional data

  • Although theoretically permissible, it is not advisable to apply ShapleyVIC to data with a large number of variables.
  • Screen out variables with low importance (e.g., based on p-values from univariable or multivariable analysis) to reduce the dimension (e.g., to fewer than 50 variables) before applying ShapleyVIC.
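
As an illustration of univariable screening, the sketch below ranks numeric variables by the p-value of a Welch two-sample t-test against a binary outcome and keeps at most 50 of them. The t-test is only a stand-in for whichever univariable or multivariable analysis suits the data, and the function name `screen_by_univariable_p`, the threshold `alpha` and the example data are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from scipy import stats

def screen_by_univariable_p(X, y, alpha=0.05, max_vars=50):
    """Return (selected variable names, all p-values sorted ascending).

    Each numeric variable in X is compared between the y == 0 and y == 1
    groups with a Welch t-test; variables with p < alpha are kept, up to
    max_vars of them, smallest p-values first.
    """
    p_values = {}
    for name in X.columns:
        group0 = X.loc[y == 0, name]
        group1 = X.loc[y == 1, name]
        p_values[name] = stats.ttest_ind(group0, group1, equal_var=False).pvalue
    p_values = pd.Series(p_values).sort_values()
    selected = p_values[p_values < alpha].index[:max_vars].tolist()
    return selected, p_values

# Hypothetical example: x_signal differs between outcome groups,
# x_noise has exactly the same values in both groups.
y = pd.Series([0] * 50 + [1] * 50)
X = pd.DataFrame({
    "x_signal": y * 5 + np.tile(np.arange(50) % 5, 2),
    "x_noise": np.tile(np.arange(50), 2),
})
selected, p_values = screen_by_univariable_p(X, y)
print(selected)
```

Categorical variables would need a different test (for example a chi-squared test), and the retained variables should still be reviewed for subject-matter relevance.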

1.4 General suggestions on the size of explanation set

  • A larger number of variables generally requires a larger explanation set for stable results.
  • Increasing the size of the explanation set and/or the number of variables increases the time required to compute ShapleyVIC values.
  • Using more than 3500 samples in the explanation set leads to long run times and is generally not recommended.
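
A simple way to respect the size suggestion is to draw the explanation set as a random sample and warn when the requested size exceeds 3500. This is a minimal sketch assuming the candidate rows are in a pandas data frame; `draw_explanation_set`, the seed and the example data are hypothetical and not part of ShapleyVIC.

```python
import warnings
import pandas as pd

MAX_EXPLANATION_ROWS = 3500  # upper bound suggested in the text above

def draw_explanation_set(df, n_rows, seed=1234):
    """Randomly sample up to n_rows rows of df for the explanation set."""
    if n_rows > MAX_EXPLANATION_ROWS:
        warnings.warn(
            f"{n_rows} rows requested; more than {MAX_EXPLANATION_ROWS} "
            "is expected to lead to long run times."
        )
    # Never request more rows than are available
    n_rows = min(n_rows, len(df))
    return df.sample(n=n_rows, random_state=seed)

# Hypothetical example data with 10000 candidate rows
df = pd.DataFrame({"x": range(10000)})
explanation = draw_explanation_set(df, 3500)
print(len(explanation))
```

Fixing the seed keeps the explanation set reproducible across runs, which makes it easier to compare results when the number of variables or the sample size is changed.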