# Econometric Python Lab Assignment 1

## # 1. Problems

1. (0 points) Please type your code and answers into a Jupyter notebook. All visualizations should be properly labelled. Submit the notebook as a PDF.

2. (3.5 points) Use the bwght dataset from the Wooldridge Python module to answer the following questions. You can find the documentation for the data online here. Import this data into your notebook.

(a) (1.5 points) How many women are in the sample? What proportion of women with a family income higher than \$50,000 are smokers? What proportion of women with a family income less than \$20,000 are smokers?

(b) (1 point) Generate a table of summary statistics for the dataframe. What is the average number of cigarettes smoked in a day? Is the mean a good measure of the typical woman's smoking habits? If not, explain why and state whether there is a better measure.

(c) (1 point) Find the mode of fatheduc in the sample. Why are only 1,192 observations used to compute this statistic?

3. (5.5 points) Use the bwght dataset from the Wooldridge Python module to answer the following questions.

(a) (1 point) Generate two different histograms of bwght using the Sturges and FD (Freedman-Diaconis) binning methods. Explain the strengths and weaknesses of each method when applied to bwght.

(b) (1 point) Create a histogram of bwght using either sturges or fd to choose the number of bins. Overlay a density curve.

(c) (2 points) Using a Q-Q plot, do you believe bwght is approximately normally distributed? Why or why not? What about family income?

(d) (1.5 points) Create a boxplot conditioning on whether or not the mother was a smoker. Do you observe any differences? If so, what are they?

4. (6 points) Use the bwght dataset from the Wooldridge Python module to answer the following questions.

(a) (2 points) Estimate the parameters for the following simple regression:

$\large \widehat{bwght} = \hat{\beta}_0 + \hat{\beta}_1 \times packs$

Report the intercept and slope. What do these tell you about the association between cigarette use and birth weight?

(b) (2 points) What is the predicted value of birthweight when packs = 0? When packs = 2? What is the interpretation of the intercept?

(c) (1 point) Verify the residuals of this regression sum (approximately) to zero.

(d) (1 point) Using a scatter plot, show the observed values against the values predicted by a regression.

## # 2. Solution

### # a.

$\large \text{proportion} = \frac{\text{number of women with family income below \$20,000 who smoke}}{\text{number of women with family income below \$20,000}}$

low_income_smokers = len(bwght[(bwght['faminc'] < 20) & (bwght['cigs'] > 0)]) / len(bwght[bwght['faminc'] < 20])
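The same ratio can be written as the mean of a boolean column, which makes it easy to reuse for the high-income group in part (a). The sketch below is illustrative only: `smoker_share` is a hypothetical helper name, and the `toy` frame contains made-up values so the block runs without the dataset. It assumes, as the filter above does, that `faminc` is recorded in thousands of dollars and that a smoker is anyone with `cigs > 0`.

```python
import pandas as pd


def smoker_share(df: pd.DataFrame, mask: pd.Series) -> float:
    """Share of the rows selected by `mask` that have cigs > 0.

    Hypothetical helper, not part of the assignment. Returns NaN
    when `mask` selects no rows.
    """
    subset = df.loc[mask]
    # The mean of a boolean Series is the fraction of True values.
    return float((subset["cigs"] > 0).mean())


# Made-up values for illustration only; NOT the real bwght data.
# faminc is assumed to be in thousands of dollars.
toy = pd.DataFrame(
    {
        "faminc": [10.0, 15.0, 18.5, 30.0, 55.0, 65.0],
        "cigs": [0, 10, 5, 0, 0, 20],
    }
)

low_income_smokers = smoker_share(toy, toy["faminc"] < 20)
high_income_smokers = smoker_share(toy, toy["faminc"] > 50)

print(f"income < $20,000: {low_income_smokers:.3f}")
print(f"income > $50,000: {high_income_smokers:.3f}")
```

To apply this to the assignment data, replace `toy` with the loaded `bwght` dataframe; `len(bwght)` then gives the number of women in the sample.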


