Friday, March 9, 2012

How does linear regression choose its regressors?

I would like to understand the algorithm that the linear regression method uses to choose the regressors in the model from a list of possible regressors.

I think that it is different from the common methods used in statistics like stepwise, forward or backward.

Laura Lerner

When you build the model using the wizard, all continuous input columns are flagged as regressors.

You can check this by looking at the Modeling Flags property of each input column in the designer.

(edit) Jamie points out to me that the modeling flags only indicate potential regressors. The actual regressors are determined by the internals of the algorithm. We have no public documentation available on that process, but we can aim to provide more detail in future docs or whitepapers.

|||

I would like to know how the algorithm works, whether there are parameters I can change to influence the results.

Laura Lerner

|||

Laura Lerner wrote:

I would like to know how the algorithm works, whether there are parameters I can change to influence the results.

Laura Lerner

If you wish to make a specific input column a regressor for linear regression, you can do that by changing the Modeling Flag property of that column to Regressor.

I am wondering if there is a specific reason that you would want to influence the choice of regressors rather than explicitly setting them. That would be good to know, to inform our future plans.

Thanks

|||

For my purposes, I would like to see the p-values associated with the regressors so that I can decide for myself which variables to include. Furthermore, to avoid running into multicollinearity, it would be useful to see a correlation matrix as part of the regression analysis, and computed Variance Inflation Factors would definitely help as well. If we are providing a wish list :), at least a graph of a sample of the residuals would also be great to have.
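To make the multicollinearity check concrete, here is a minimal sketch of computing a correlation matrix and Variance Inflation Factors outside the tool, in plain Python. It is not part of the product and says nothing about how the built-in algorithm selects regressors. It relies on the standard identity that the VIF of each regressor equals the matching diagonal entry of the inverse correlation matrix. The function names and the sample data are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def variance_inflation_factors(columns):
    """VIF of each column, read off the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]

# Hypothetical candidate regressors: x2 is roughly 2 * x1, x3 is a separate series.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]

corr = correlation_matrix([x1, x2, x3])
vifs = variance_inflation_factors([x1, x2, x3])
for name, vif in zip(["x1", "x2", "x3"], vifs):
    print(name, round(vif, 2))
```

With this sample data the VIFs for x1 and x2 come out far above 10, a commonly quoted rule-of-thumb threshold, which is the kind of signal I would use to drop one of the pair before fitting.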

Best Regards,

|||

Thanks! I am quite happy to take wishes. Whether or when I can grant them is a different matter, but the more information we have on what people want - and, very importantly, why - the better.

I have had a number of requests for computed Variance Inflation Factors.

Feel free to contact me offline donald_dot_farmer_at_microsoft_dot_com if you want to discuss potential features and business cases in more detail.

Thanks

|||

You can use the FORCE_REGRESSOR parameter to guarantee that the algorithm will use a particular regressor, regardless of the method it uses to determine applicability. The target of FORCE_REGRESSOR does have to be marked as a regressor, though.

|||

I am building several automatic linear regressions using code and not the interface.

I give the model hundreds of variables to choose from, and the result is a linear regression with very few variables.

First, I would like to understand how it chooses the variables and whether there are parameters that I can use to influence the number of variables the linear regression will choose.

I used SAS for many years, and there I know that I can use parameters such as the significance level, the number of variables to include, etc.

Laura Lerner
