# Abstracts

### A logical difficulty with regression analysis: the estimation of non-existent parameters

**Robin Willink**

Industrial Research Ltd.

‘All models are wrong’ (Box), so any model used in a regression problem can only provide an approximation to
the unknown function *f*(*x*). Therefore, the parameters of the model do not all represent quantities that actually
exist and the quantities ‘estimated’ by the calculated regression coefficients are not all properly defined. So
‘parameter estimation’ is a misnomer and the values of the parameters are actually ‘chosen’. Furthermore,
confidence intervals and credible intervals often quoted for the non-existent quantities have no legitimate
meaning.

We describe this logical problem in the context of univariate linear regression. Subsequently, we identify
quantities that do actually exist and are efficiently estimated by the ordinary least-squares coefficients. The
problem of genuine interest is often the estimation of *f*(*x*), not the choice of values for the parameters of some
approximating function. So we also present a method of estimating *f*(*x*) that takes some account of the error
incurred by choosing a model. Lastly, we identify other misleading terminology in mathematics and statistics.

**Session 1a**, Statistical Methodology: 10:50–11:10, Room 446
