
Statistics 2 Computer Practical 2


1. Instructions

  1. Reports should be handed in as a single PDF file using Blackboard, by noon on the due date. RMarkdown, image, Word, and zip files, for example, will not be marked.
  2. You can work alone or in a group with up to two other people.
  3. One person per group should hand in a report. The names and student numbers of all group members must be on the first page.
  4. Your answers should combine text and code snippets in R. It is recommended that you use RMarkdown to prepare your reports, since this is typically easier for students, but this is not mandatory.
  5. You must explain what you are doing clearly to obtain full marks on each question. You can use comments (which start with #) to annotate your code. Mathematical derivations can be written using LaTeX commands in RMarkdown or on paper, with a photo then appended to the end of the PDF being submitted.
  6. This practical counts for 10% of your assessment for Statistics 2.

2. Populations of world cities

The data are the population sizes of the world's cities and towns, according to the World Cities Database.

To load the raw data and remove missing (NA) entries, we run the following commands.

raw.data <- read.csv("worldcities.csv")
raw.populations <- raw.data$population
populations <- raw.populations[!is.na(raw.populations)]
sum(populations)
## [1] 5076372338

There are around 5.076 billion people counted, which is about 3 billion short of the roughly 8 billion people in the world. In any case we can try to understand the distribution of population sizes for the data we do have. Now we remove populations under 10000.

populations <- populations[populations>=10000]
length(populations)
## [1] 38615
sum(populations)
## [1] 5030436026

We can see that removing the smaller towns still leaves us considering around 5.03 billion people, and we have observations of population sizes from 38615 cities.

To finish cleaning the data, we now divide populations by 10000 and subtract 1, so that the values start from 0 and we are essentially counting population sizes in units of 10000.

populations <- populations/10000-1

3. Visualizing the data

The data have a large range, and so visualizing a density estimate on the original scale is not very informative. Instead, and purely for the purpose of visualization, we can plot the density of the logarithms of the observations.

plot(density(log(populations)))
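
To confirm the large range noted above (a quick check, not part of the original practical), one can inspect the spread of the cleaned data directly:

range(populations)   # smallest and largest values on the transformed scale
summary(populations) # five-number summary and the mean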

4. Model

We will model populations as realizations of independent and identically distributed random variables from a member of the family of distributions with density

\begin{equation}
f(x;\theta) = \frac{a b^a}{(x+b)^{a+1}}\mathbb{I}(x \geq 0),
\end{equation}

with two-dimensional statistical parameter $\theta=(a,b) \in (0,\infty)^2$. This is a regular statistical model.

This PDF is non-negative and integrates to $1$ because for any $b>0$ and $t \in \mathbb{R}$, we have

\begin{equation}
\int_{0}^{\infty}\frac{1}{(x+b)^{t}}\,{\rm d}x=
\begin{cases}
\frac{1}{(t-1)b^{t-1}} & t>1 \\
\infty & t\leq 1.
\end{cases}
\end{equation}

We shall write $X \sim {\rm Dist}(a,b)$ to mean that $X$ is a random variable with density (1). To simulate a vector of realizations of independent ${\rm Dist}(a,b)$ random variables, we can use the following code. If you are interested, this is an application of inverse transform sampling.

rdist <- function(n, a, b) {
  # Inverse transform sampling: apply the inverse CDF to n Uniform(0,1) draws.
  u <- runif(n)
  (u^(-1/a)-1)*b
}
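
To see why this works (a short sketch using the identity (2)), note that the CDF of ${\rm Dist}(a,b)$ is

F(x;\theta)=\int_{0}^{x}\frac{ab^{a}}{(t+b)^{a+1}}\,{\rm d}t = 1-\left(\frac{b}{x+b}\right)^{a}, \qquad x \geq 0.

Setting $F(x) = 1-u$ and solving for $x$ gives $x = b(u^{-1/a}-1)$; since $1-U \sim {\rm Uniform}(0,1)$ whenever $U \sim {\rm Uniform}(0,1)$, applying this map to uniform draws produces realizations with density (1), which is exactly what rdist returns.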

To evaluate the PDF at each point in a vector, we can use the following code. If the log argument is TRUE then the log density is returned rather than the density, as is usual in R code.

ddist <- function(xs,a,b,log=FALSE) {
  if (log) {
    # log f(x; theta) = log(a) + a log(b) - (a+1) log(x+b)
    log(a) + a*log(b) - (a+1)*log(xs+b)
  } else {
    a*b^a/(xs+b)^(a+1)
  }
}
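
As a quick sanity check of ddist (not part of the practical; the parameter values $a=2$, $b=3$ are arbitrary), the density should integrate to one over $[0,\infty)$ and the log branch should agree with the logarithm of the density:

integrate(function(x) ddist(x, 2, 3), lower = 0, upper = Inf)  # should be ~1
all.equal(ddist(1:5, 2, 3, log = TRUE), log(ddist(1:5, 2, 3))) # should be TRUE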

5. Estimation

Using (2) it is quite easy to show that the first and second moments of $X \sim {\rm Dist}(\cdot\,;\theta)$ are

\begin{align*}
\mathbb{E}(X;\theta) =
\begin{cases}
\frac{b}{a-1} \qquad & a>1, \\
\infty & a \leq 1,
\end{cases}
\end{align*}

and

\begin{align*}
\mathbb{E}(X^2;\theta) =
\begin{cases}
\frac{2b^2}{(a-1)(a-2)} \qquad & a>2, \\
\infty & a \leq 2.
\end{cases}
\end{align*}
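
To illustrate the kind of calculation involved, here is a sketch of the first moment for $a>1$: writing $x = (x+b) - b$ and applying (2) twice,

\mathbb{E}(X;\theta)=\int_{0}^{\infty}x\,\frac{ab^{a}}{(x+b)^{a+1}}\,{\rm d}x = ab^{a}\int_{0}^{\infty}\frac{{\rm d}x}{(x+b)^{a}} - ab^{a+1}\int_{0}^{\infty}\frac{{\rm d}x}{(x+b)^{a+1}} = \frac{ab}{a-1} - b = \frac{b}{a-1}.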

Question 1. [2 marks] Derive the method of moments estimators of $a$ and $b$. Report the estimates for this dataset.

I suggest that you write a function mom.estimate that takes as input some data and returns a vector containing the estimated values of $a$ and $b$. For example, something like this:

mom.estimate <- function(xs) {
  ## your code here
  a.hat <- 0 # obviously wrong
  b.hat <- 0 # obviously wrong
  c(a.hat, b.hat)
}

--- Solution ---

Your solution here.

--- End of Solution ---

You can check if your estimate makes sense by considering a large dataset with known parameters.

fake.data <- rdist(100000, 10, 20)
fake.theta.hat <- mom.estimate(fake.data)
fake.theta.hat
## [1] 0 0

Having computed the method of moments estimate for the populations data, we can visualize the fitted density on the log scale. Even if you have computed the estimates properly, the fit should not look great. As you go through the rest of the practical, you may be able to guess why this is.

On the logarithmic scale, we can plot the fitted density by using the transformed PDF suggested by Exercise 2.1 of the lecture notes, i.e. if $Y=\log(X)$ and $f_X$ is the PDF for $X$ then the PDF for $Y$ is

f_Y(y) = f_X(\exp(y)) \exp(y).

theta.mom <- mom.estimate(populations)
plot(density(log(populations)),ylim=c(0,0.3))
zs <- seq(-10,10,0.01)
lines(zs,ddist(exp(zs),theta.mom[1],theta.mom[2])*exp(zs),col="red")

Question 2. [2 marks] Show that for fixed $b$, the log-likelihood is maximized by taking

\begin{equation}
a=\frac{n}{\sum_{i=1}^n \{ \log (x_i + b) - \log(b) \} }.
\end{equation}

--- Solution ---

Your solution here.

--- End of Solution ---

The log-likelihood cannot be maximized analytically, so we will use a numerical optimization procedure. In particular, we will make use of the fact that we can maximize a function of $b$ only, since we know that for any $b$ the maximizing $a$ is given by equation (3) above.

ml.estimate <- function(xs) {
  n <- length(xs)
  # For fixed b, the maximizing a, from equation (3).
  get.a <- function(b) {
    n/(sum(log(xs+b))-n*log(b))
  }
  # Profile log-likelihood: the log-likelihood evaluated at (get.a(b), b).
  ell <- function(b) {
    lb <- log(b)
    a <- get.a(b)
    la <- log(a)
    n*la+a*n*lb-(a+1)*sum(log(xs+b))
  }
  # Maximize over b; fnscale = -1 makes optim maximize rather than minimize.
  out <- optim(1, ell, control = list(fnscale=-1), method="Brent", lower=0, upper=100)
  c(get.a(out$par),out$par)
}
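
Before applying ml.estimate to the populations data, it is sensible to check it on simulated data with known parameters, in the same spirit as the earlier check of mom.estimate (the parameter values below are arbitrary):

set.seed(1)
fake.data <- rdist(100000, 10, 20)
ml.estimate(fake.data) # should be close to (10, 20)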

Now we can visualize the fitted density using the ML estimate. It looks pretty good!

theta.ml <- ml.estimate(populations)
plot(density(log(populations)))
zs <- seq(-10,10,0.01)
lines(zs,ddist(exp(zs),theta.ml[1],theta.ml[2])*exp(zs),col="red")

6. Confidence Intervals

In lectures we have seen that for regular statistical models with a one-dimensional parameter $\theta$, the ML estimator $\hat{\theta}_n$ is asymptotically normal with

\sqrt{nI(\theta)}(\hat{\theta}_n - \theta) \to_{\mathcal{D}(\cdot;\theta)} Z \sim N(0,1).

This convergence in distribution justifies the construction of Wald confidence intervals for $\theta$.

In this computer practical, the statistical model has a two-dimensional parameter. Under appropriate regularity assumptions, the ML estimator $\hat{\theta}_n$ is asymptotically (multivariate) normal in the sense that

\sqrt{n}(\hat{\theta}_{n}-\theta) \to_{\mathcal{D}(\cdot;\theta)} W \sim N(0,I(\theta)^{-1}),

where $I(\theta)$ is the Fisher information matrix

I(\theta) = - \mathbb{E}[\nabla^2 \ell(\theta; X_1) ; \theta],

and the expectation is taken element-wise. That is, the $ij$th entry of $I(\theta)$ is the negative expectation of the corresponding $ij$th second-order partial derivative of the log-likelihood associated with 1 observation. Notice that $W$ is a two-dimensional normal random vector with mean $0$ and variance-covariance matrix $I(\theta)^{-1}$.
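
Concretely, for this model with $\theta=(a,b)$, the definition above reads

I(\theta) = -\left(\begin{array}{cc} \mathbb{E}\left[\frac{\partial^{2}\ell(\theta;X_{1})}{\partial a^{2}};\theta\right] & \mathbb{E}\left[\frac{\partial^{2}\ell(\theta;X_{1})}{\partial a\partial b};\theta\right]\\ \mathbb{E}\left[\frac{\partial^{2}\ell(\theta;X_{1})}{\partial b\partial a};\theta\right] & \mathbb{E}\left[\frac{\partial^{2}\ell(\theta;X_{1})}{\partial b^{2}};\theta\right] \end{array}\right).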

One can deduce from this multivariate asymptotic normality that for $j \in \{1,2\}$,

\sqrt{n}(\hat{\theta}_{n,j}-\theta_{j}) \to_{\mathcal{D}(\cdot;\theta)} W_j \sim N(0,(I(\theta)^{-1})_{jj}),

which we can rewrite as

\sqrt{\frac{n}{(I(\theta)^{-1})_{jj}}}\,(\hat{\theta}_{n,j}-\theta_{j}) \to_{\mathcal{D}(\cdot;\theta)} Z \sim N(0,1),

where $\hat{\theta}_{n,j}$ denotes the $j$th component of $\hat{\theta}_n$ and $\theta_j$ denotes the $j$th component of $\theta$.

Notice that $(I(\theta)^{-1})_{jj}$ is the $j$th diagonal entry of the inverse of the Fisher information matrix, and is not in general equal to $(I(\theta)_{jj})^{-1}$, the inverse of the $j$th diagonal entry of the Fisher information matrix. In R you can compute numerically the inverse of a matrix using the solve command.
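
For example (a small illustration with an arbitrary matrix, not part of the practical):

M <- matrix(c(2, 1, 1, 2), 2, 2)
solve(M)       # the inverse of M
solve(M) %*% M # recovers the 2x2 identity matrix, up to rounding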

As in the one-dimensional parameter case, one can replace $I(\theta)$ with $I(\hat{\theta}_n)$ if $\hat{\theta}_n$ is a consistent sequence of estimators of $\theta$ to obtain

\sqrt{\frac{n}{(I(\hat{\theta}_n)^{-1})_{jj}}}\,(\hat{\theta}_{n,j}-\theta_{j}) \to_{\mathcal{D}(\cdot;\theta)} Z \sim N(0,1).
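
In particular (a standard consequence of the limit above, stated here for convenience), an asymptotically exact $1-\alpha$ Wald confidence interval for $\theta_j$ has endpoints

\hat{\theta}_{n,j} \pm z_{1-\alpha/2}\sqrt{\frac{(I(\hat{\theta}_{n})^{-1})_{jj}}{n}},

where $z_{1-\alpha/2}$ denotes the $1-\alpha/2$ quantile of the standard normal distribution (qnorm(1-alpha/2) in R).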

Question 3. [2 marks] Show that the Fisher information for one observation is

I(\theta)=\left(\begin{array}{cc} \frac{1}{a^{2}} & -\frac{1}{b(a+1)}\\ -\frac{1}{b(a+1)} & \frac{a}{b^{2}(a+2)} \end{array}\right),

and compute the Fisher information for $\theta = \hat{\theta}_{\rm ML}$.

I recommend that you write a function to compute the Fisher information for any value of $\theta$, structured like this:

info <- function(theta) {
  a <- theta[1]
  b <- theta[2]
  # your code
  i11 <- 1 # obviously wrong
  i12 <- 0 # obviously wrong
  i21 <- 0 # obviously wrong
  i22 <- 1 # obviously wrong
  mtx <- matrix(c(i11,i21,i12,i22),2,2)
  return(mtx)
}

--- Solution ---

Your solution here.

--- End of Solution ---

Question 4. [2 marks] Compute and report the endpoints of the observed 98% Wald confidence intervals for $a$ and $b$ individually.

I recommend that you write a function that computes the observed confidence intervals' end points, which should call the info function.

compute.CI.endpoints <- function(xs,alpha) {
  # your code
  L.a <- 0 # obviously wrong
  U.a <- 1 # obviously wrong
  L.b <- 0 # obviously wrong
  U.b <- 1 # obviously wrong
  list(a=c(L.a,U.a),b=c(L.b,U.b))
}

You may find it helpful to know that you can use the solve function to compute the inverse of a matrix.

--- Solution ---

Your solution here.

--- End of Solution ---

Question 5. [2 marks] Using many simulations, approximate the coverage of the asymptotically exact $1-\alpha$ confidence intervals corresponding to your answer to Question 3, when $(a,b) = (0.9,1.28)$ and $\alpha=0.02$, for $n \in \{10,100,1000\}$. Comment on the coverage of the confidence intervals and explain in words why the approximate coverage you report is approximate.

I recommend that you write a function

approximate.coverage <- function(n,a,b,alpha,trials) {
  # your code
  coverage.a <- 0 # hopefully wrong
  coverage.b <- 0 # hopefully wrong
  c(coverage.a,coverage.b)
}

--- Solution ---

Your solution here.

--- End of Solution ---

7. Epilogue

This computer practical involves a slightly more complicated distribution for the data than the standard distributions with one-dimensional statistical parameters typically used for communicating the basic ideas. The use of more sophisticated models is important when applying statistical ideas to real-world data. As in this case, it is not uncommon that the ML estimate has to be computed numerically. Similarly, most real-world models have several statistical parameters, and the use of two parameters here is therefore a gentle introduction to what you would do in more realistic statistical analyses.

Demo 1: Deriving the Fisher Information

To derive the Fisher information matrix in detail, we start from the definition of the model.

Given the density $f(x;\theta) = \frac{a b^a}{(x+b)^{a+1}}\mathbb{I}(x \geq 0)$, we write down the log-likelihood for a single observation, compute its first- and second-order partial derivatives with respect to the parameters $a$ and $b$, and then take the negative expectations of the second-order derivatives to obtain the entries of the Fisher information matrix.

The detailed steps are as follows.

Step 1: The log-likelihood

The log-likelihood for a single observation is the logarithm of $f(x;\theta)$:

\ell(\theta; x) = \log(f(x;\theta)) = \log(a) + a\log(b) - (a+1)\log(x+b)

Step 2: First-order partial derivatives

The partial derivative with respect to $a$ is:

\frac{\partial \ell(\theta; x)}{\partial a} = \frac{1}{a} + \log(b) - \log(x+b)

The partial derivative with respect to $b$ is:

\frac{\partial \ell(\theta; x)}{\partial b} = \frac{a}{b} - \frac{a+1}{x+b}

Step 3: Second-order partial derivatives

The second derivative with respect to $a$ is:

\frac{\partial^2 \ell(\theta; x)}{\partial a^2} = -\frac{1}{a^2}

The second derivative with respect to $b$ is:

\frac{\partial^2 \ell(\theta; x)}{\partial b^2} = -\frac{a}{b^2} + \frac{a+1}{(x+b)^2}

The mixed second derivative with respect to $a$ and $b$ is:

\frac{\partial^2 \ell(\theta; x)}{\partial a \partial b} = \frac{\partial^2 \ell(\theta; x)}{\partial b \partial a} = \frac{1}{b} - \frac{1}{x+b}

Step 4: Expectations of the second derivatives

We now take the negative expectations of the second-order derivatives. The key fact is that, by the same integral identity (2) used to show that the density integrates to one, for any $k>0$ and $X \sim {\rm Dist}(a,b)$,

\mathbb{E}\left[(X+b)^{-k};\theta\right] = \int_{0}^{\infty}\frac{ab^{a}}{(x+b)^{a+k+1}}\,{\rm d}x = \frac{a}{(a+k)\,b^{k}}.

For $a$:

\mathbb{E}\left[-\frac{\partial^2 \ell(\theta; X)}{\partial a^2};\theta\right] = \frac{1}{a^2}

For $b$, applying the identity with $k=2$:

\mathbb{E}\left[-\frac{\partial^2 \ell(\theta; X)}{\partial b^2};\theta\right] = \frac{a}{b^2} - \frac{a(a+1)}{(a+2)\,b^{2}} = \frac{a}{b^{2}(a+2)}

For the mixed derivative, applying the identity with $k=1$:

\mathbb{E}\left[-\frac{\partial^2 \ell(\theta; X)}{\partial a \partial b};\theta\right] = -\frac{1}{b} + \frac{a}{(a+1)\,b} = -\frac{1}{b(a+1)}

Step 5: Assembling the Fisher information matrix

Finally, the Fisher information matrix $I(\theta)$ is:

I(\theta)=\left(\begin{array}{cc} \frac{1}{a^{2}} & -\frac{1}{b(a+1)}\\ -\frac{1}{b(a+1)} & \frac{a}{b^{2}(a+2)} \end{array}\right),

which agrees with the matrix stated in Question 3.
