Computer Vision News - April 2020

This figure from Christoph Molnar's book (licensed under a Creative Commons license) shows the PDP of two features at once: the predicted cancer probability as a function of age and number of pregnancies, including the interaction between the two. As the example carefully notes, a correlation, even where one exists, may not be causal. You can see on the plot that for ages below 25 and for women with fewer than two pregnancies, the predicted cancer risk is much lower.

The main advantage of PDPs is that, when the feature for which you computed the PDP is not correlated with the other features, they perfectly represent how that feature influences the prediction on average. A disadvantage is that a partial dependence function can realistically include at most two features, because anything beyond that is hard to plot and read.

OK, let's now go back to our heart model! The code that plots the PDP for the number of major vessels is:

    # dt, model and X_test come from the earlier steps of this article
    import matplotlib.pyplot as plt
    from pdpbox import pdp

    # every column except the label is a model feature
    base_features = dt.columns.values.tolist()
    base_features.remove('target')

    feat_name = 'num_major_vessels'
    pdp_dist = pdp.pdp_isolate(model=model, dataset=X_test,
                               model_features=base_features, feature=feat_name)
    pdp.pdp_plot(pdp_dist, feat_name)
    plt.show()

This produces the following plot, which shows that, as the number of major blood vessels increases, the probability of heart disease decreases. This helpful approach to the plot comes from Rob Harrand's model, built on the Heart Disease UCI data.
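To make the "on average" part concrete, here is a minimal sketch of what a one-feature partial dependence curve computes: for each candidate value of the feature, overwrite that feature in every row, predict, and average the predictions. This is not the pdpbox implementation and not the heart model from this article. The `toy_predict` function, the `rows` data and the `partial_dependence` helper are all hypothetical stand-ins chosen only for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Copy every row, overwriting only the feature of interest.
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(row) for row in modified))
    return curve


def toy_predict(row: Row) -> float:
    """Hypothetical model: risk drops with more vessels, rises slightly with age."""
    return 0.9 - 0.2 * row["num_major_vessels"] + 0.001 * row["age"]


# Hypothetical data, not taken from the Heart Disease UCI dataset.
rows = [
    {"age": 40, "num_major_vessels": 0},
    {"age": 50, "num_major_vessels": 1},
    {"age": 60, "num_major_vessels": 3},
]

grid = [0, 1, 2, 3]
curve = partial_dependence(toy_predict, rows, "num_major_vessels", grid)

for value, avg in zip(grid, curve):
    print(f"num_major_vessels={value}: average prediction {avg:.2f}")
```

Because the other features keep their observed values while one feature is overwritten, the curve can mix in unrealistic combinations when features are correlated, which is why the advantage described above only holds for uncorrelated features.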
