Ma 3/103:  Introduction to Probability and Statistics
Winter 2016-17
MWF 10:00 AM // Baxter Lecture Hall

## ANNOUNCEMENTS

### Final Correction

I have posted a correction to the final exam. The likelihood function for problem two should read: $$n_0 \ln\bigl(\pi + (1-\pi) e^{-\mu}\bigr) + (n-n_0) \ln(1-\pi) - (n-n_0) \mu + \ln(\mu) \sum_{i=1}^n k_i - \sum_{i=1}^n \ln k_i !$$

Note that the coefficient on $$\sum_{i=1}^n k_i$$ is $$\ln(\mu)$$ and not $$\mu$$. It makes a big difference. Thanks to Ethan for catching it early.
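For anyone who wants to check the corrected expression numerically, here is a minimal Python sketch. It assumes that $$n_0$$ is the number of observations equal to zero and that the $$k_i$$ are the observed counts; the function name `log_likelihood` and the sample data are hypothetical and do not come from the exam.

```python
import math


def log_likelihood(pi, mu, counts):
    """Evaluate the corrected log-likelihood from the announcement.

    Assumptions (not stated in the announcement): `counts` holds the
    observed k_i, and n_0 is the number of those that equal zero.
    Requires 0 <= pi < 1 and mu > 0.
    """
    n = len(counts)
    n0 = sum(1 for k in counts if k == 0)
    return (
        n0 * math.log(pi + (1 - pi) * math.exp(-mu))
        + (n - n0) * math.log(1 - pi)
        - (n - n0) * mu
        + math.log(mu) * sum(counts)                # coefficient is ln(mu), not mu
        - sum(math.lgamma(k + 1) for k in counts)   # ln(k!) = lgamma(k + 1)
    )


# Hypothetical data, for illustration only.
sample_counts = [0, 0, 1, 3, 2]
print(log_likelihood(0.3, 1.5, sample_counts))
```

Using `math.lgamma(k + 1)` for $$\ln k_i!$$ avoids computing large factorials directly.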

### Final and Review Session

The final exam will be posted online around noon on Friday, March 10, and be due Thursday, March 16, by 4:00 pm. There will be a review session hosted by one of the TAs on Saturday, March 11, at 1:00 pm in Baxter Lecture Hall.

The exam will concentrate on statistics, and you will need access to a computer with decent software. Because programming can be frustrating, there will be no time limit on the exam. But it shouldn't take more than a few hours.

## COURSE DESCRIPTION

Introduction to the fundamental ideas and techniques of probability theory and statistical inference.

Probability will be covered in the first half of the term (using Pitman) and statistics (using Larsen and Marx) in the second half (see below for information regarding textbooks). Main topics covered are:

• Properties of probability
• Independence, conditional probability, Bayes' Law
• Random variables, distributions, densities, and expectation
• Joint distributions, marginals, covariance, correlation
• The Law of Large Numbers
• The Central Limit Theorem
• Order statistics
• Important distributions
• Bernoulli, Binomial
• Uniform
• Normal (Gaussian)
• Exponential, Poisson
• Gamma, Beta, Chi-square
• Conjugate prior/posterior pairs
• Introduction to stochastic processes
• Random walk
• Markov chains
• Martingales
• Estimation of parameters
• Consistency, unbiasedness
• Maximum likelihood estimation
• Confidence intervals
• Cramér–Rao lower bound
• Testing statistical hypotheses
• Significance tests
• Likelihood ratio tests
• Monotone Likelihood Ratio Property and the Neyman–Pearson Lemma
• Type I and Type II errors
• Power and assurance
• Critical values
• Specification tests
• Kolmogorov–Smirnov
• chi-square test, Fisher's exact test
• Linear regression analysis
• Gauss–Markov Theorem
• ANOVA
• Nonparametric tests
• Wilcoxon, Mann–Whitney, Kruskal–Wallis
• Spearman rank correlation
• Introduction to Bayesian approaches

## PREREQUISITES

Ma 1abc. In addition, some familiarity with a scientific computing language or program (e.g., Mathematica, Matlab, NumPy, Octave, R) is assumed.

## SCHEDULE

MWF 10:00 - 10:55, Baxter Lecture Hall.

## Contact Information

Instructor:
Kim Border, 205 Baxter Hall, x4218, kcborder@caltech.edu
Office Hours:
Fridays, 1:30–3:00 pm
William Chan, wcchan@caltech.edu
Office Hours:
Saturday, 1 - 2 pm
160 Sloan
Course Secretary:
Meagan Heirwegh, 253 Sloan, x4335, heirwegm@caltech.edu

## TA's

| Section | TA | Office Hours | Section Meeting |
|---|---|---|---|
| Section 1 | Liyang Yang | Sundays, 8 - 9 PM, 155 Sloan, 626-395-4081 | Thursdays, 9:00 AM, 153 Sloan |
| Section 2 | William Chan | Saturday, 1 - 2 PM, 160 Sloan, 626-395-4324 | Thursdays, 9:00 AM, 159 Sloan |
| Section 3 | William Chan | Saturday, 2 - 3 PM, 160 Sloan, 626-395-4324 | Thursdays, 10:00 AM, B127 GCL |
| Section 4 | Hélène Rochais | Fridays, 6 - 7 PM, 155 Sloan, 626-395-4081 | Thursdays, 10:00 AM, 269 Lauritsen |
| Section 5 | Jack Tao | Fridays, 5 - 6 PM, 155 Sloan, 608-338-2788 | Thursdays, 10:00 AM, 102 Steele |
| Section 7 | Jane Panangaden | Mondays, 5 - 6 PM, 155 Sloan, 626-395-4081 | Thursdays, 1:00 PM, 153 Sloan |
| Section 8 | Juhyun Kim | Saturdays, 3 - 4 PM, 155 Sloan, 626-395-4081 | Thursdays, 1:00 PM, B127 GCL |
| Section 9 | Lingfei Yi | Sundays, 5 - 6 PM, 155 Sloan, 626-395-7172 | Thursdays, 2:30 - 3:25 PM, 115 BCK |

## POLICIES

Late Work:

As a rule, late work is not accepted. This is to protect the TAs, who are talented, hardworking students, just as you are. At the discretion of the Head TA, late homework turned in on the day it is due, but after the 4:00 pm deadline, will be accepted with a 25% penalty. If there are extenuating circumstances, you must notify the Head TA by midnight the night before it is due, and you must get a note from the Dean supporting the extension. As partial compensation, your lowest homework score will be discarded.

Your course grade will be based on the weekly homework (40%), the midterm (25% or 35%), and the final (35% or 25%). The weights on the final and midterm will put the greater weight on the better exam. In computing the homework average, your lowest homework score will be dropped. (Since homework assignments vary by weight, a modified Kazatkin algorithm will be used to determine which score to drop.)
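The weighting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: scores are percentages, each homework is weighted by its possible points, and the dropped assignment is found by brute force (try dropping each one and keep the best result). The brute-force rule is a stand-in and is not the modified Kazatkin algorithm mentioned above; the function names are hypothetical.

```python
def course_grade(homework_avg, midterm, final):
    """Combine percentages: homework 40%, better exam 35%, other exam 25%."""
    better, worse = max(midterm, final), min(midterm, final)
    return 0.40 * homework_avg + 0.35 * better + 0.25 * worse


def homework_average_dropping_one(earned, possible):
    """Points-weighted homework percentage after dropping one assignment.

    Assumption: the assignment dropped is the one whose removal gives the
    highest remaining average (brute force), which is NOT necessarily the
    algorithm used in the course. Needs at least two assignments.
    """
    if len(earned) != len(possible) or len(earned) < 2:
        raise ValueError("need matching lists with at least two assignments")
    total_earned, total_possible = sum(earned), sum(possible)
    best = max(
        (total_earned - e) / (total_possible - p)
        for e, p in zip(earned, possible)
    )
    return 100.0 * best


# Hypothetical scores, for illustration only.
hw = homework_average_dropping_one([10, 5, 40], [10, 10, 50])
print(course_grade(hw, midterm=80, final=70))
```

Note that with unequal weights, the assignment with the lowest raw score is not always the one whose removal helps the average most, which is why the sketch tries each candidate in turn.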

This year I am continuing the following practice. Each assignment will contain zero or more optional exercises. They are optional in the following sense: Grades will be calculated without taking the optional exercises into account, but the maximum grade will be an A. If you want an A+, you will have to earn an A and also accumulate sufficiently many optional points. No collaboration is allowed on optional exercises.

As this course is for a letter grade, no one will be excused from the final.

Homework:

Homework will typically be due at 4:00 pm on Mondays in the appropriate homework box outside 253 Sloan. (If Monday is a holiday [which happens twice this term], homework will be due on Tuesday. Assignment 0 is a major exception.) Problems (and later solutions) will be posted on this course webpage. You are encouraged to start the homework well in advance of the due date in order not to risk missing the deadline. Homework is turned in to locked boxes, so it can safely be submitted as soon as it is completed.

Collaboration:

Collaboration is allowed on the homework, but your write-up must be in your own words and may not be copied. The exception is that no collaboration is allowed on optional exercises. Collaboration is not allowed on the exams. Please ask for clarification if anything is unclear.

*Information is subject to change*

## TEXTBOOKS

The required textbooks for the course are:

• Pitman. 1993. Probability. Springer, New York, Berlin, and Heidelberg. ISBN: 0-387-97974-8.
• Larsen and Marx. 2012. An Introduction to Mathematical Statistics and Its Applications, fifth edition. Prentice Hall. ISBN: 0-321-69394-9.

There will be additional readings from time to time, either as handouts or articles available on line.

There are other books that you may find useful for this course or perhaps later in life. Here are, in no particular order, some of my recommendations.

• . 2015. Probability and Statistical Inference. Pearson, Boston. ISBN: 978-0-321-92327-1. This is a nicely written introduction that I am evaluating to see if it can replace the two books above.
• . 2015. Statistics Done Wrong: The Woefully Complete Guide. No Starch Press, San Francisco. ISBN: 978-1-59327-620-1. This short (129 pages) book is written for scientists and covers many common misinterpretations of statistical methods and results in the analysis of scientific data.
• . 2011. Choosing and Using Statistics: A Biologist's Guide. Wiley-Blackwell. ISBN: 978-1-4051-9839-4. This is a cookbook and reference geared toward biologists, but is a useful reference for almost everyone.
• . 2015. Using and Understanding Medical Statistics. Karger, Basel. ISBN: 978-3-318-05458-3.
• . 2008. Basic Probability Theory. Dover, Mineola, New York. Reprint of the 1970 edition published by John Wiley and Sons. ISBN: 0-486-46628-0. This book, being published by Dover, is very affordable. (I think it's still just under \$20.) The first chapter, especially sections 1.4 through 1.7, is very good at explaining how to count for combinatorial problems.
• . 2012. Knowing the Odds: An Introduction to Probability. American Mathematical Society, Providence, Rhode Island. ISBN: 978-0-8218-8532-1. I almost used this as the textbook for the course, but decided to stay with the status quo.
• . 2003. Elementary Probability Theory with Stochastic Processes and an Introduction to Mathematical Finance. Springer-Verlag, New York, Heidelberg, and Berlin. ISBN: 978-0-387-95578-0. This is a very well-written introduction to probability theory. Chapter 3 on counting is especially good.
• . 1995. The Pleasures of Probability. Springer-Verlag, New York, Berlin, and Heidelberg. ISBN: 0-387-94415-X. Another good introduction to probability theory, but a bit too eccentric to use as the main text for this course.

Modern statistical practice is computationally intensive, but this course is not especially so. But you will have to use computers to do some of the assignments. Many of the people on campus that I have talked to recommend the statistical programming language R (the open source alternative to AT&T's S). Mathematica 9 and later claims to be highly integrated with R, but I haven't tried it yet. Others I have talked to rave about NumPy, an extension of Python that provides much of the functionality of Matlab. Still others continue to use other packages because they have invested a lot of effort in learning to use them. (I myself use Mathematica because I started using it in 1992, so my recommendation of R falls into the category of “do as I say, not as I do.”) My son recommends R and this video as an endorsement.

Here are a couple of highly recommended books on R that I mostly have not read. But I find the first two to be useful.

• . 2011. R Primer. Chapman & Hall/CRC Press. Available for on-line reading from the Caltech Library.
• . 2011. R Cookbook. O'Reilly Media. ISBN: 978-0-596-80915-7.
• . 2012. R in a Nutshell, 2nd edition. O'Reilly Media. ISBN: 978-1449312084.
• . 2009. A Beginner's Guide to R. Springer Science+Business Media, New York. ISBN: 978-0-387-93836-3.
• . 2008. Introductory Statistics with R, second edition. Springer Science+Business Media, New York. ISBN: 978-0-387-79053-4

## LECTURE NOTES & SUPPLEMENTS

I usually revise these notes following the lectures and even later, so while you may want to look them over before the lecture, you may not want to print them out till later. The dates/times in parentheses reflect the most recent version.

## HOMEWORK

| Due Date | Assignment | Solutions |
|---|---|---|
| Thursday, January 5, 8:00 pm (online) | Assignment 0 | N/A |
| Monday, January 9, 4:00 pm | Assignment 1 | Sample solutions |
| Tuesday, January 17, 4:00 pm | Assignment 2 | Sample solutions |
| Monday, January 23, 4:00 pm | Assignment 3 | Sample solutions |
| Tuesday, January 31, 4:00 pm | Assignment 4 | Sample solutions |
| Tuesday, February 14, 4:00 pm | Assignment 5 | Sample solutions |
| Tuesday, February 21, 4:00 pm | Assignment 6 | Sample solutions |
| Tuesday, February 28, 4:00 pm | Assignment 7 | Sample solutions |
| Tuesday, March 7, 4:00 pm | Assignment 8 | |

## EXAMS

| Due Date | Exam | Solutions |
|---|---|---|
| Wednesday, February 8, 4:00 pm | Midquarter Exam | Histogram of scores; Solutions |
| Thursday, March 16, 4:00 pm | Final Exam | |