3 Descriptive analysis

Before building AutoScore models, users can use our package and code to conduct descriptive analyses (e.g., univariable and multivariable analysis) of data with binary, survival, or ordinal outcomes.

3.1 Binary outcome

  • Compute the descriptive table (usually Table 1 in the medical literature) for the dataset.
library(AutoScore)
library(knitr)
data("sample_data")
# AutoScore expects the outcome column to be named "label"
names(sample_data)[names(sample_data) == "Mortality_inpatient"] <- "label"
compute_descriptive_table(sample_data)
Overall FALSE TRUE p test
n 20000 18412 1588
Vital_A (mean (SD)) 85.40 (15.23) 84.81 (15.11) 92.25 (14.98) <0.001
Vital_B (mean (SD)) 119.13 (16.72) 119.57 (16.67) 114.01 (16.49) <0.001
Vital_C (mean (SD)) 61.15 (10.81) 61.50 (10.77) 57.13 (10.48) <0.001
Vital_D (mean (SD)) 78.41 (11.14) 78.74 (11.10) 74.60 (10.95) <0.001
Vital_E (mean (SD)) 18.57 (3.92) 18.35 (3.86) 21.13 (3.74) <0.001
Vital_F (mean (SD)) 36.84 (0.59) 36.84 (0.59) 36.79 (0.57) 0.001
Vital_G (mean (SD)) 97.17 (1.98) 97.20 (1.97) 96.76 (2.03) <0.001
Lab_A (mean (SD)) 138.24 (41.69) 137.36 (41.73) 148.42 (39.80) <0.001
Lab_B (mean (SD)) 14.12 (3.44) 13.94 (3.39) 16.23 (3.31) <0.001
Lab_C (mean (SD)) 24.04 (4.34) 24.18 (4.31) 22.47 (4.39) <0.001
Lab_D (mean (SD)) 1.55 (1.30) 1.52 (1.29) 1.86 (1.35) <0.001
Lab_E (mean (SD)) 104.56 (5.54) 104.62 (5.53) 103.90 (5.52) <0.001
Lab_F (mean (SD)) 32.80 (5.53) 32.94 (5.51) 31.11 (5.47) <0.001
Lab_G (mean (SD)) 11.04 (1.97) 11.11 (1.96) 10.30 (1.97) <0.001
Lab_H (mean (SD)) 2.09 (1.14) 2.03 (1.12) 2.82 (1.15) <0.001
Lab_I (mean (SD)) 229.36 (113.73) 230.81 (113.62) 212.51 (113.65) <0.001
Lab_J (mean (SD)) 4.23 (0.62) 4.22 (0.61) 4.28 (0.63) <0.001
Lab_K (mean (SD)) 25.92 (18.29) 24.85 (17.88) 38.34 (18.52) <0.001
Lab_L (mean (SD)) 138.27 (4.25) 138.29 (4.24) 138.07 (4.39) 0.051
Lab_M (mean (SD)) 12.28 (8.27) 12.04 (8.21) 14.98 (8.60) <0.001
Age (mean (SD)) 62.46 (16.29) 61.62 (16.12) 72.21 (15.12) <0.001
label = TRUE (%) 1588 (7.9) 0 (0.0) 1588 (100.0) <0.001
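Each continuous row of the table above reports mean (SD) overall and by outcome group, followed by a p value. As a rough illustration only, the base R sketch below rebuilds one such row for simulated data. The helpers `describe_by_group()` and `fmt_p()` and the `toy` data are hypothetical, and using a pooled-variance t-test (which gives the same p value as a one-way ANOVA when there are two groups) is an assumption, not a description of how `compute_descriptive_table()` works internally.

```r
# Hypothetical helpers (not part of AutoScore)
fmt_p <- function(p) ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))

# One "mean (SD)" Table 1 row, stratified by a binary outcome
describe_by_group <- function(x, group) {
  fmt <- function(v) sprintf("%.2f (%.2f)", mean(v), sd(v))
  # Assumption: pooled-variance t-test (same p value as one-way ANOVA for 2 groups)
  p <- t.test(x ~ group, var.equal = TRUE)$p.value
  c(Overall = fmt(x), sapply(split(x, group), fmt), p = fmt_p(p))
}

# Simulated toy data, not sample_data
set.seed(1)
toy <- data.frame(
  Age = c(rnorm(90, 60, 15), rnorm(10, 72, 15)),
  label = rep(c(FALSE, TRUE), times = c(90, 10))
)
describe_by_group(toy$Age, toy$label)
```

The result is a named character vector with the same column order as the table (Overall, FALSE, TRUE, p).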
  • Perform univariable analysis and generate the result table with (unadjusted) odds ratios.
uni_table <- compute_uni_variable_table(sample_data)
kable(uni_table)
OR p value
Vital_A 1.033(1.03-1.037) <0.001
Vital_B 0.98(0.977-0.983) <0.001
Vital_C 0.963(0.958-0.968) <0.001
Vital_D 0.967(0.963-0.972) <0.001
Vital_E 1.209(1.192-1.226) <0.001
Vital_F 0.867(0.794-0.946) 0.001
Vital_G 0.897(0.875-0.92) <0.001
Lab_A 1.006(1.005-1.008) <0.001
Lab_B 1.222(1.202-1.241) <0.001
Lab_C 0.912(0.902-0.923) <0.001
Lab_D 1.208(1.163-1.254) <0.001
Lab_E 0.977(0.968-0.986) <0.001
Lab_F 0.942(0.933-0.95) <0.001
Lab_G 0.81(0.788-0.831) <0.001
Lab_H 1.815(1.734-1.9) <0.001
Lab_I 0.999(0.998-0.999) <0.001
Lab_J 1.176(1.082-1.278) <0.001
Lab_K 1.039(1.036-1.042) <0.001
Lab_L 0.988(0.976-1) 0.051
Lab_M 1.042(1.036-1.048) <0.001
Age 1.042(1.039-1.046) <0.001
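Unadjusted odds ratios of this kind are commonly obtained by fitting one logistic regression per predictor and exponentiating its coefficient. The sketch below does this with base R's `glm()`, assuming Wald 95% confidence intervals from `confint.default()`. It is an illustration under those assumptions, not AutoScore's implementation; `uni_or_sketch()`, `fmt_ci()`, `fmt_p()` and the simulated `toy` data (with a hypothetical `Lab_X` column) are made up for this example.

```r
# Hypothetical formatting helpers mimicking the "OR(lower-upper)" style above
fmt_p <- function(p) ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))
fmt_ci <- function(est, lo, hi) {
  paste0(round(est, 3), "(", round(lo, 3), "-", round(hi, 3), ")")
}

# Sketch: one logistic regression per predictor; assumes the outcome is "label"
uni_or_sketch <- function(df) {
  predictors <- setdiff(names(df), "label")
  rows <- lapply(predictors, function(v) {
    fit <- glm(as.formula(paste("label ~", v)), data = df, family = binomial)
    ci <- exp(confint.default(fit)[v, ])   # Wald 95% CI on the odds-ratio scale
    p <- summary(fit)$coefficients[v, "Pr(>|z|)"]
    data.frame(OR = fmt_ci(exp(coef(fit)[[v]]), ci[[1]], ci[[2]]),
               p_value = fmt_p(p), row.names = v)
  })
  do.call(rbind, rows)
}

# Simulated toy data with hypothetical predictors
set.seed(42)
n <- 500
toy <- data.frame(Age = rnorm(n, 62, 16), Lab_X = rnorm(n, 14, 3))
toy$label <- rbinom(n, 1, plogis(-6 + 0.06 * toy$Age + 0.05 * toy$Lab_X))
uni_or_sketch(toy)
```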
  • Perform multivariable analysis and generate the result table with adjusted odds ratios.
multi_table <- compute_multi_variable_table(sample_data)
kable(multi_table)
adjusted_OR p value
Vital_A 1.032(1.027-1.037) <0.001
Vital_B 0.976(0.97-0.983) <0.001
Vital_C 0.958(0.945-0.971) <0.001
Vital_D 1.049(1.031-1.067) <0.001
Vital_E 1.153(1.133-1.173) <0.001
Vital_F 0.844(0.757-0.942) 0.002
Vital_G 0.995(0.965-1.027) 0.774
Lab_A 1.001(1-1.002) 0.184
Lab_B 1.111(1.067-1.156) <0.001
Lab_C 0.946(0.912-0.981) 0.003
Lab_D 0.774(0.728-0.823) <0.001
Lab_E 0.928(0.897-0.96) <0.001
Lab_F 1.122(1.081-1.165) <0.001
Lab_G 0.636(0.572-0.707) <0.001
Lab_H 1.615(1.526-1.71) <0.001
Lab_I 0.997(0.997-0.998) <0.001
Lab_J 0.73(0.654-0.814) <0.001
Lab_K 1.033(1.029-1.038) <0.001
Lab_L 1.04(1.005-1.077) 0.026
Lab_M 1.029(1.022-1.037) <0.001
Age 1.047(1.042-1.051) <0.001
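For adjusted odds ratios, a single logistic regression containing all predictors is fitted and each coefficient is exponentiated. A minimal `glm()` sketch follows, again assuming Wald 95% intervals. `multi_or_sketch()` and the simulated data are hypothetical, and this is not presented as the internals of `compute_multi_variable_table()`.

```r
fmt_p <- function(p) ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))

# Sketch: one logistic model with every predictor; the intercept row is dropped
multi_or_sketch <- function(df) {
  fit <- glm(label ~ ., data = df, family = binomial)
  est <- exp(cbind(coef(fit), confint.default(fit)))[-1, , drop = FALSE]
  p <- summary(fit)$coefficients[-1, "Pr(>|z|)"]
  data.frame(
    adjusted_OR = paste0(round(est[, 1], 3), "(", round(est[, 2], 3), "-",
                         round(est[, 3], 3), ")"),
    p_value = fmt_p(p),
    row.names = rownames(est)
  )
}

# Simulated toy data with hypothetical predictors
set.seed(42)
n <- 500
toy <- data.frame(Age = rnorm(n, 62, 16), Lab_X = rnorm(n, 14, 3))
toy$label <- rbinom(n, 1, plogis(-6 + 0.06 * toy$Age + 0.05 * toy$Lab_X))
multi_or_sketch(toy)
```

Because every predictor shares one model, each adjusted OR is conditional on the others, which is why values such as Vital_D or Lab_D change direction between the two tables above.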

3.2 Survival outcome

  • Compute the descriptive table (usually Table 1 in the medical literature) for the data with a survival outcome.
data("sample_data_survival")
compute_descriptive_table(sample_data_survival)
Overall FALSE TRUE p test
n 20000 5350 14650
Vital_A (mean (SD)) 85.40 (15.23) 80.14 (14.64) 87.33 (14.99) <0.001
Vital_B (mean (SD)) 119.13 (16.72) 123.03 (16.45) 117.70 (16.59) <0.001
Vital_C (mean (SD)) 61.15 (10.81) 64.47 (10.54) 59.94 (10.66) <0.001
Vital_D (mean (SD)) 78.41 (11.14) 81.41 (10.87) 77.31 (11.04) <0.001
Vital_E (mean (SD)) 18.57 (3.92) 16.46 (3.64) 19.35 (3.74) <0.001
Vital_F (mean (SD)) 36.84 (0.59) 36.88 (0.59) 36.82 (0.59) <0.001
Vital_G (mean (SD)) 97.17 (1.98) 97.49 (1.92) 97.05 (1.99) <0.001
Lab_A (mean (SD)) 138.24 (41.69) 129.87 (41.42) 141.29 (41.36) <0.001
Lab_B (mean (SD)) 14.12 (3.44) 12.40 (3.25) 14.75 (3.29) <0.001
Lab_C (mean (SD)) 24.04 (4.34) 25.25 (4.22) 23.60 (4.29) <0.001
Lab_D (mean (SD)) 1.55 (1.30) 1.30 (1.22) 1.64 (1.32) <0.001
Lab_E (mean (SD)) 104.56 (5.54) 105.11 (5.59) 104.36 (5.50) <0.001
Lab_F (mean (SD)) 32.80 (5.53) 34.05 (5.43) 32.34 (5.50) <0.001
Lab_G (mean (SD)) 11.04 (1.97) 11.62 (1.93) 10.83 (1.95) <0.001
Lab_H (mean (SD)) 2.09 (1.14) 1.52 (1.00) 2.30 (1.12) <0.001
Lab_I (mean (SD)) 229.36 (113.73) 244.73 (114.08) 223.74 (113.08) <0.001
Lab_J (mean (SD)) 4.23 (0.62) 4.19 (0.62) 4.24 (0.61) <0.001
Lab_K (mean (SD)) 25.92 (18.29) 16.56 (15.10) 29.34 (18.17) <0.001
Lab_L (mean (SD)) 138.27 (4.25) 138.40 (4.29) 138.22 (4.23) 0.008
Lab_M (mean (SD)) 12.28 (8.27) 10.27 (7.79) 13.01 (8.32) <0.001
Age (mean (SD)) 62.46 (16.29) 54.06 (15.27) 65.53 (15.56) <0.001
label_status = TRUE (%) 14650 (73.2) 0 (0.0) 14650 (100.0) <0.001
label_time (mean (SD)) 70.03 (19.27) 91.00 (0.00) 62.37 (16.97) <0.001
  • Perform univariable analysis and generate the result table with (unadjusted) hazard ratios. In the printed output, the column headed OR contains these hazard ratios.
uni_table_survival <- compute_uni_variable_table_survival(sample_data_survival)
kable(uni_table_survival)
OR p value
Vital_A 1.021(1.02-1.022) <0.001
Vital_B 0.988(0.987-0.989) <0.001
Vital_C 0.976(0.974-0.977) <0.001
Vital_D 0.979(0.978-0.98) <0.001
Vital_E 1.139(1.134-1.144) <0.001
Vital_F 0.904(0.879-0.929) <0.001
Vital_G 0.932(0.924-0.939) <0.001
Lab_A 1.004(1.004-1.005) <0.001
Lab_B 1.145(1.139-1.15) <0.001
Lab_C 0.945(0.942-0.949) <0.001
Lab_D 1.136(1.122-1.15) <0.001
Lab_E 0.984(0.981-0.987) <0.001
Lab_F 0.964(0.961-0.967) <0.001
Lab_G 0.876(0.868-0.883) <0.001
Lab_H 1.518(1.496-1.541) <0.001
Lab_I 0.999(0.999-0.999) <0.001
Lab_J 1.088(1.059-1.117) <0.001
Lab_K 1.027(1.026-1.028) <0.001
Lab_L 0.993(0.989-0.997) <0.001
Lab_M 1.027(1.025-1.029) <0.001
Age 1.03(1.029-1.031) <0.001
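Unadjusted hazard ratios can be sketched as one Cox proportional hazards model per predictor using `coxph()` from the survival package. The sketch assumes outcome columns named `label_time` and `label_status`, as in `sample_data_survival`; `uni_hr_sketch()`, `fmt_p()` and the simulated `toy` data (with a hypothetical `Lab_X` column and censoring at t = 90) are illustrative assumptions, not AutoScore's code.

```r
library(survival)

fmt_p <- function(p) ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))

# Sketch: one Cox model per predictor
uni_hr_sketch <- function(df) {
  predictors <- setdiff(names(df), c("label_time", "label_status"))
  rows <- lapply(predictors, function(v) {
    fit <- coxph(as.formula(paste("Surv(label_time, label_status) ~", v)), data = df)
    s <- summary(fit)
    ci <- s$conf.int[v, ]   # exp(coef), exp(-coef), lower .95, upper .95
    data.frame(
      HR = paste0(round(ci[["exp(coef)"]], 3), "(", round(ci[["lower .95"]], 3),
                  "-", round(ci[["upper .95"]], 3), ")"),
      p_value = fmt_p(s$coefficients[v, "Pr(>|z|)"]),
      row.names = v
    )
  })
  do.call(rbind, rows)
}

# Simulated toy data: exponential event times, censored at t = 90
set.seed(7)
n <- 400
toy <- data.frame(Age = rnorm(n, 62, 16), Lab_X = rnorm(n, 14, 3))
event_time <- rexp(n, rate = exp(-5 + 0.03 * toy$Age))
toy$label_time <- pmin(event_time, 90)
toy$label_status <- event_time <= 90   # TRUE = event observed
uni_hr_sketch(toy)
```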
  • Perform multivariable analysis and generate the result table with adjusted hazard ratios. In the printed output, the column headed adjusted_OR contains these adjusted hazard ratios.
multi_table_survival <- compute_multi_variable_table_survival(sample_data_survival)
kable(multi_table_survival)
adjusted_OR p value
Vital_A 1.031(1.03-1.032) <0.001
Vital_B 0.975(0.973-0.977) <0.001
Vital_C 0.955(0.951-0.959) <0.001
Vital_D 1.053(1.048-1.058) <0.001
Vital_E 1.155(1.149-1.16) <0.001
Vital_F 0.84(0.815-0.866) <0.001
Vital_G 0.99(0.981-0.999) 0.022
Lab_A 1.001(1-1.001) <0.001
Lab_B 1.115(1.103-1.127) <0.001
Lab_C 0.952(0.943-0.962) <0.001
Lab_D 0.778(0.765-0.792) <0.001
Lab_E 0.934(0.925-0.943) <0.001
Lab_F 1.143(1.131-1.155) <0.001
Lab_G 0.603(0.586-0.621) <0.001
Lab_H 1.614(1.588-1.64) <0.001
Lab_I 0.997(0.997-0.997) <0.001
Lab_J 0.737(0.715-0.76) <0.001
Lab_K 1.032(1.031-1.034) <0.001
Lab_L 1.034(1.024-1.044) <0.001
Lab_M 1.03(1.028-1.032) <0.001
Age 1.046(1.045-1.047) <0.001
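The adjusted hazard ratios correspond to a single Cox model with all predictors. In the sketch below, the formula `Surv(label_time, label_status) ~ .` uses every remaining column as a covariate, since variables that appear on the left-hand side are excluded from the dot. `multi_hr_sketch()` and the simulated data are hypothetical.

```r
library(survival)

fmt_p <- function(p) ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))

# Sketch: one Cox model containing every predictor
multi_hr_sketch <- function(df) {
  fit <- coxph(Surv(label_time, label_status) ~ ., data = df)
  s <- summary(fit)
  ci <- s$conf.int
  data.frame(
    adjusted_HR = paste0(round(ci[, "exp(coef)"], 3), "(", round(ci[, "lower .95"], 3),
                         "-", round(ci[, "upper .95"], 3), ")"),
    p_value = fmt_p(s$coefficients[, "Pr(>|z|)"]),
    row.names = rownames(ci)
  )
}

# Simulated toy data: exponential event times, censored at t = 90
set.seed(7)
n <- 400
toy <- data.frame(Age = rnorm(n, 62, 16), Lab_X = rnorm(n, 14, 3))
event_time <- rexp(n, rate = exp(-5 + 0.03 * toy$Age))
toy$label_time <- pmin(event_time, 90)
toy$label_status <- event_time <= 90   # TRUE = event observed
multi_hr_sketch(toy)
```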

3.3 Ordinal outcome

  • Compute the descriptive table (usually Table 1 in the medical literature) for the dataset.
data("sample_data_ordinal")
compute_descriptive_table(sample_data_ordinal)
Overall 1 2 3 p test
n 20000 16360 2449 1191
label (%) <0.001
1 16360 (81.8) 16360 (100.0) 0 ( 0.0) 0 ( 0.0)
2 2449 (12.2) 0 ( 0.0) 2449 (100.0) 0 ( 0.0)
3 1191 ( 6.0) 0 ( 0.0) 0 ( 0.0) 1191 (100.0)
Age (mean (SD)) 61.68 (18.19) 60.64 (18.37) 65.37 (16.64) 68.32 (16.18) <0.001
Gender = MALE (%) 9863 (49.3) 8109 ( 49.6) 1173 ( 47.9) 581 ( 48.8) 0.284
Util_A (%) 0.626
P1 3750 (18.8) 3082 ( 18.8) 437 ( 17.8) 231 ( 19.4)
P2 11307 (56.5) 9218 ( 56.3) 1413 ( 57.7) 676 ( 56.8)
P3 and P4 4943 (24.7) 4060 ( 24.8) 599 ( 24.5) 284 ( 23.8)
Util_B (mean (SD)) 0.93 (2.20) 0.78 (1.98) 1.40 (2.73) 1.96 (3.18) <0.001
Util_C (mean (SD)) 3.54 (8.73) 3.55 (8.77) 3.38 (7.83) 3.60 (9.90) 0.632
Util_D (mean (SD)) 2.76 (1.70) 2.80 (1.71) 2.66 (1.69) 2.49 (1.63) <0.001
Comorb_A = 1 (%) 1555 ( 7.8) 888 ( 5.4) 348 ( 14.2) 319 ( 26.8) <0.001
Comorb_B = 1 (%) 2599 (13.0) 2094 ( 12.8) 336 ( 13.7) 169 ( 14.2) 0.202
Comorb_C = 1 (%) 526 ( 2.6) 414 ( 2.5) 70 ( 2.9) 42 ( 3.5) 0.088
Comorb_D = 1 (%) 1887 ( 9.4) 1538 ( 9.4) 242 ( 9.9) 107 ( 9.0) 0.645
Comorb_E = 1 (%) 310 ( 1.6) 253 ( 1.5) 43 ( 1.8) 14 ( 1.2) 0.411
Lab_A (mean (SD)) 146.85 (199.74) 143.75 (198.13) 153.25 (196.98) 176.29 (223.44) <0.001
Lab_B (mean (SD)) 4.15 (0.68) 4.15 (0.68) 4.17 (0.69) 4.14 (0.66) 0.326
Lab_C (mean (SD)) 135.15 (4.81) 135.16 (4.81) 135.06 (4.92) 135.18 (4.47) 0.607
Vital_A (mean (SD)) 82.67 (17.10) 82.11 (16.78) 84.45 (18.40) 86.65 (17.92) <0.001
Vital_B (mean (SD)) 17.86 (1.82) 17.86 (1.81) 17.86 (1.88) 17.86 (1.84) 0.995
Vital_C (mean (SD)) 97.96 (3.26) 97.97 (3.06) 97.92 (4.07) 97.93 (3.91) 0.741
Vital_D (mean (SD)) 71.23 (13.51) 71.21 (13.48) 71.35 (13.70) 71.30 (13.49) 0.877
Vital_E (mean (SD)) 133.47 (25.27) 134.13 (25.15) 130.87 (25.45) 129.74 (25.91) <0.001
Vital_F (mean (SD)) 22.82 (3.53) 22.86 (3.46) 22.73 (3.76) 22.36 (3.94) <0.001
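Categorical rows such as "Gender = MALE (%)" report n (%) overall and within each outcome level. The base R sketch below rebuilds one such row for simulated data, assuming a chi-squared test of independence for the p value; that choice of test, the helper `describe_level()` and the `toy` data are assumptions for illustration only.

```r
fmt_p <- function(p) ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))

# Hypothetical helper: "n (%)" for one level of x, overall and by outcome group
describe_level <- function(x, group, level) {
  tab <- table(x, group)
  counts <- c(Overall = sum(tab[level, ]), tab[level, ])
  totals <- c(Overall = sum(tab), colSums(tab))
  cells <- sprintf("%d (%.1f)", as.integer(counts), 100 * counts / totals)
  # Assumption: chi-squared test of independence across all outcome levels
  c(setNames(cells, names(counts)), p = fmt_p(chisq.test(tab)$p.value))
}

# Simulated toy data with a three-level ordinal outcome
set.seed(3)
n <- 300
toy <- data.frame(
  Gender = sample(c("FEMALE", "MALE"), n, replace = TRUE),
  label = factor(sample(1:3, n, replace = TRUE, prob = c(0.80, 0.13, 0.07)))
)
describe_level(toy$Gender, toy$label, "MALE")
```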
  • Perform univariable analysis and generate the result table.
  • By default (link = "logit"), the table reports unadjusted odds ratios from the commonly used proportional odds model. If another link function is selected (e.g., "cloglog", which corresponds to the proportional hazards model, or "probit"), the table reports the exponentiated coefficients instead.
Important: Use the same link parameter throughout the descriptive analysis and model-building steps.

link <- "logit"  # e.g., "logit" (default), "probit" or "cloglog"; reuse it when building the model
uni_table_ordinal <- compute_uni_variable_table_ordinal(sample_data_ordinal, link = link)
kable(uni_table_ordinal)
OR p value
Age 1.019 (1.017 - 1.021) <0.001
GenderMALE 0.948 (0.883 - 1.019) 0.147
Util_AP2 1.040 (0.946 - 1.145) 0.421
Util_AP3 and P4 0.998 (0.894 - 1.115) 0.974
Util_B 1.136 (1.121 - 1.153) <0.001
Util_C 0.999 (0.994 - 1.003) 0.564
Util_D 0.929 (0.908 - 0.950) <0.001
Comorb_A1 4.154 (3.737 - 4.616) <0.001
Comorb_B1 1.099 (0.989 - 1.218) 0.076
Comorb_C1 1.236 (0.997 - 1.520) 0.049
Comorb_D1 1.017 (0.899 - 1.147) 0.791
Comorb_E1 0.994 (0.739 - 1.315) 0.969
Lab_A 1.000 (1.000 - 1.001) <0.001
Lab_B 1.010 (0.958 - 1.064) 0.717
Lab_C 0.998 (0.990 - 1.005) 0.534
Vital_A 1.010 (1.008 - 1.012) <0.001
Vital_B 0.999 (0.980 - 1.019) 0.956
Vital_C 0.996 (0.986 - 1.007) 0.451
Vital_D 1.001 (0.998 - 1.003) 0.622
Vital_E 0.994 (0.993 - 0.996) <0.001
Vital_F 0.979 (0.969 - 0.989) <0.001
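A cumulative link model per predictor can be sketched with `MASS::polr()`, whose `method` argument accepts "logistic", "probit" and "cloglog" (among others), so the sketch maps the link names used in this section onto those values. Whether AutoScore fits its ordinal models with the same routine is not assumed here, so estimates may differ; `polr_method()`, `exp_coef_table()`, `uni_ordinal_sketch()` and the simulated `toy` data are hypothetical. The exponentiated slope is an odds ratio only for the logistic method, which matches the description above.

```r
library(MASS)   # polr(): proportional odds / cumulative link models

fmt_p <- function(p) ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))

# Map the link names used in this section to polr() method names
polr_method <- function(link) {
  switch(link, logit = "logistic", probit = "probit", cloglog = "cloglog",
         stop("unsupported link: ", link))
}

# Exponentiated slopes with Wald 95% CIs and p values from a fitted polr model
exp_coef_table <- function(fit) {
  beta <- coef(fit)                          # slopes only, no cut-points
  se <- sqrt(diag(vcov(fit)))[names(beta)]
  z <- qnorm(0.975)
  data.frame(
    estimate = sprintf("%.3f (%.3f - %.3f)", exp(beta),
                       exp(beta - z * se), exp(beta + z * se)),
    p_value = fmt_p(2 * pnorm(-abs(beta / se))),
    row.names = names(beta)
  )
}

# Sketch: one polr model per predictor; assumes an ordered factor "label"
uni_ordinal_sketch <- function(df, link = "logit") {
  predictors <- setdiff(names(df), "label")
  rows <- lapply(predictors, function(v) {
    fit <- polr(as.formula(paste("label ~", v)), data = df,
                method = polr_method(link), Hess = TRUE)
    exp_coef_table(fit)
  })
  do.call(rbind, rows)
}

# Simulated toy data with a three-level ordinal outcome
set.seed(11)
n <- 600
toy <- data.frame(
  Age = rnorm(n, 62, 18),
  Gender = factor(sample(c("FEMALE", "MALE"), n, replace = TRUE))
)
latent <- 0.03 * toy$Age + rlogis(n)
toy$label <- cut(latent, breaks = c(-Inf, 3, 4, Inf),
                 labels = c("1", "2", "3"), ordered_result = TRUE)
uni_ordinal_sketch(toy, link = "logit")
```

As in the table above, a factor predictor yields one row per non-reference level (here GenderMALE).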
  • Perform multivariable analysis and generate the result table, with adjusted odds ratios from a proportional odds model by default.
multi_table_ordinal <- compute_multi_variable_table_ordinal(sample_data_ordinal, link = link)
kable(multi_table_ordinal)
adjusted_OR p value
Age 1.020 (1.018 - 1.023) <0.001
GenderMALE 0.944 (0.876 - 1.017) 0.128
Util_AP2 1.020 (0.924 - 1.127) 0.695
Util_AP3 and P4 0.970 (0.865 - 1.088) 0.603
Util_B 1.144 (1.127 - 1.160) <0.001
Util_C 0.998 (0.994 - 1.002) 0.390
Util_D 0.924 (0.902 - 0.945) <0.001
Comorb_A1 4.497 (4.034 - 5.011) <0.001
Comorb_B1 1.057 (0.948 - 1.177) 0.317
Comorb_C1 1.287 (1.032 - 1.593) 0.023
Comorb_D1 1.039 (0.915 - 1.177) 0.552
Comorb_E1 0.963 (0.706 - 1.291) 0.808
Lab_A 1.000 (1.000 - 1.001) <0.001
Lab_B 1.007 (0.953 - 1.063) 0.808
Lab_C 0.995 (0.988 - 1.003) 0.234
Vital_A 1.011 (1.009 - 1.013) <0.001
Vital_B 1.004 (0.984 - 1.024) 0.708
Vital_C 0.995 (0.985 - 1.007) 0.388
Vital_D 1.001 (0.998 - 1.003) 0.654
Vital_E 0.993 (0.992 - 0.995) <0.001
Vital_F 0.978 (0.968 - 0.988) <0.001
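The multivariable version fits one cumulative link model containing all predictors. The self-contained sketch below again uses `MASS::polr()` with the same assumed link-to-method mapping; `multi_ordinal_sketch()` and the simulated data are hypothetical and are not AutoScore's implementation.

```r
library(MASS)

fmt_p <- function(p) ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))

# Sketch: one polr model with every predictor; assumes an ordered factor "label"
multi_ordinal_sketch <- function(df, link = "logit") {
  method <- switch(link, logit = "logistic", probit = "probit",
                   cloglog = "cloglog", stop("unsupported link: ", link))
  fit <- polr(label ~ ., data = df, method = method, Hess = TRUE)
  beta <- coef(fit)                          # slopes only, no cut-points
  se <- sqrt(diag(vcov(fit)))[names(beta)]
  z <- qnorm(0.975)
  data.frame(
    adjusted_estimate = sprintf("%.3f (%.3f - %.3f)", exp(beta),
                                exp(beta - z * se), exp(beta + z * se)),
    p_value = fmt_p(2 * pnorm(-abs(beta / se))),
    row.names = names(beta)
  )
}

# Simulated toy data with a three-level ordinal outcome
set.seed(11)
n <- 600
toy <- data.frame(
  Age = rnorm(n, 62, 18),
  Gender = factor(sample(c("FEMALE", "MALE"), n, replace = TRUE))
)
latent <- 0.03 * toy$Age + rlogis(n)
toy$label <- cut(latent, breaks = c(-Inf, 3, 4, Inf),
                 labels = c("1", "2", "3"), ordered_result = TRUE)
multi_ordinal_sketch(toy, link = "logit")
```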