Hi,
I have just run a simple data set through a model to predict a simple true or false value (i.e. a binary output).
The Lift Chart/Mining Legend in Analysis Services shows three results – Score, Population Correct (%), and Predict Probability (%)
Population Correct, I believe, is the percentage of predictions the model got right out of the total number of predictions it tried to make. Is this correct?
However, I can’t work out how the other two are derived, in particular the 'Score'. To give a live example, the results were as follows:
Model            Score   Pop Correct   Pred Probability
Decision Trees   0.83    76.59%        54.28%
Neural Network   0.75    67.63%        50.05%
Ideal Model              100.00%
Can anyone help with this and give a detailed explanation?
Many thanks,
S Rajput
Hi
The Predict Probability is the probability of the most popular prediction state for the model.
The Score is the log scatter score for a scatter plot of (x=Actual Value, y=Predicted Value), where each point has an associated probability. Many worked examples of this calculation are available if you want to look up the mathematical details.
Hope this helps.
Shuvro
So does a larger score mean a better fit for the model? How is a scatter score calculated?
Here are some more details about the scatter score calculation:
This score is the (geometric) mean score of all the points constituting the scatter plot.
Here is how it works.
0) Each point on a scatter plot corresponds to a test case and has the form (a,b(M)), where a is the actual attribute value for the case and b(M) is the value predicted using the model M;
1) First define score(a,b(M)) for *one* individual scatter point. To do this, compare (a,b(M)) with the best prediction we can make without using any model, which is of course marginalMean; so
score(a,b(M)) = likelihood ( b(M), given a, given M) / likelihood( marginalMean, given a)
2) Then average out all the scores across the entire scatter plot;
Technically, it was simpler to average them as a geometric mean:
score = ( Product[ score(a,b) | forall (a,b) in the scatter plot])^(1/n), where n is the number of points in the scatter plot.
So that’s how it is currently done.
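As a rough illustration of step 2, here is a minimal Python sketch of the geometric mean of per-point scores. The function name geometric_mean_score and the sample scores are made up for this example and are not taken from Analysis Services. The sketch averages in log space, which is mathematically the same as (Product of scores)^(1/n) but avoids overflow or underflow when there are many points.

```python
import math


def geometric_mean_score(point_scores):
    """Return (s_1 * s_2 * ... * s_n) ** (1 / n) for positive scores.

    Hypothetical helper for illustration only; it is computed via logs so
    that a long product does not overflow or underflow.
    """
    if not point_scores:
        raise ValueError("need at least one scatter point")
    if any(s <= 0 for s in point_scores):
        raise ValueError("per-point scores must be positive")
    log_sum = sum(math.log(s) for s in point_scores)
    return math.exp(log_sum / len(point_scores))


# Made-up per-point scores, one per test case in the scatter plot.
scores = [2.0, 0.5, 4.0, 1.0]
overall = geometric_mean_score(scores)
```

Read against the definition in step 1, a per-point score above 1 means the model did better than the marginal baseline at that point, so under this reading a larger overall score indicates a better fit.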
Details:
The statistical meaning of the individual point score is the predictive lift for the model M measured at that point. It is essentially the same notion as the Predict Likelihood fraction for the case contributed by the continuous attribute for which the scatter plot has been built.
The specific formula for this is score(a,b) = pdf( N(a, predictStdev), b) / pdf( N(a, marginalStdev), marginalMean)
where N(mu, sigma) is the normal distribution with mean mu and standard deviation sigma, and
pdf(N(mu, sigma), x) is its probability density function, given by exp(-(x-mu)^2/(2*sigma^2))/(sqrt(2*Pi)*sigma).
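The per-point formula can be sketched in Python as follows. This is only an illustration of the ratio described above, not the Analysis Services implementation: the function names normal_pdf and point_score and the sample numbers are assumptions made for the example, and the density used is the standard normal density exp(-(x-mu)^2/(2*sigma^2))/(sigma*sqrt(2*Pi)).

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma) evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def point_score(actual, predicted, predict_stdev, marginal_mean, marginal_stdev):
    """score(a,b) = pdf(N(a, predictStdev), b) / pdf(N(a, marginalStdev), marginalMean).

    Hypothetical helper: the parameter names mirror the formula in the post.
    """
    model_likelihood = normal_pdf(predicted, actual, predict_stdev)
    baseline_likelihood = normal_pdf(marginal_mean, actual, marginal_stdev)
    return model_likelihood / baseline_likelihood


# Made-up case: the model and the marginal mean both hit the actual value,
# but the model's predicted spread is half the marginal spread.
example = point_score(
    actual=10.0,
    predicted=10.0,
    predict_stdev=1.0,
    marginal_mean=10.0,
    marginal_stdev=2.0,
)
```

In this made-up case both densities are evaluated at their peak, so the ratio reduces to marginal_stdev / predict_stdev: the tighter prediction is rewarded even though both guesses land on the actual value.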