Wednesday, March 7, 2012

How important is specifying the variable distributions?

I am using Decision Trees, Naive Bayes, and Neural Net to predict customer profitability. Unfortunately, none of these makes good predictions on our data.

How much improvement can be gained by specifying the distribution of the input variables in the mining model?

Really, not that much. If NN and DT aren't predicting profitability very well, I would look at which input attributes you are using and how you are defining profitability. For example, is it a Boolean (profitable vs. not profitable)? Or is it a continuous value, e.g. "How much profit will I gain?" Are there derived variables you could create to help, e.g. total number of product purchases, average number of purchases per transaction, etc.?
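As a rough illustration of the derived variables mentioned above, here is a minimal Python sketch that computes total purchases and average purchases per transaction for each customer. The row layout (customer_id, transaction_id, quantity), the sample rows, and the function name derive_features are all assumptions made for the example; they are not from the poster's data or from any mining tool.

```python
from collections import defaultdict

# Hypothetical transaction rows: (customer_id, transaction_id, quantity).
# These values are made up purely for illustration.
transactions = [
    ("c1", "t1", 2),
    ("c1", "t1", 1),
    ("c1", "t2", 3),
    ("c2", "t3", 1),
]

def derive_features(rows):
    """Return per-customer derived inputs:
    total purchases and average purchases per transaction."""
    totals = defaultdict(int)   # customer -> total quantity purchased
    tx_ids = defaultdict(set)   # customer -> distinct transaction ids
    for customer_id, transaction_id, quantity in rows:
        totals[customer_id] += quantity
        tx_ids[customer_id].add(transaction_id)
    return {
        customer: {
            "total_purchases": totals[customer],
            "avg_per_transaction": totals[customer] / len(tx_ids[customer]),
        }
        for customer in totals
    }

features = derive_features(transactions)
```

The resulting columns could then be added as extra input attributes alongside the raw ones, to see whether any of the models pick up on them.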

Thanks, Jamie.

To answer your questions, we have bucketed total profitability, and are trying to predict the bucket.
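For readers unfamiliar with this setup, bucketing a continuous profit figure into a discrete target might look something like the sketch below. The bucket edges and labels are invented for the example and are not the ones used in this thread.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries and labels (assumptions, not the poster's).
# A value v falls in "loss" if v < 0, "low" if 0 <= v < 100,
# "medium" if 100 <= v < 1000, and "high" otherwise.
BUCKET_EDGES = [0.0, 100.0, 1000.0]
BUCKET_LABELS = ["loss", "low", "medium", "high"]

def profit_bucket(total_profit):
    """Map a continuous total profit to a discrete bucket label."""
    return BUCKET_LABELS[bisect_right(BUCKET_EDGES, total_profit)]
```

Where the edges are placed changes how many customers land in each bucket, so it may be worth checking the class balance before judging the models.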

We tend to think, as you suggested, that either we aren't using the correct set of inputs, or else that profitability just isn't predictable from the data we have available.
