ANNOUNCEMENTS
Final Correction
I have posted a correction to the final exam. The likelihood function for problem two should read: \(n_0 \ln\bigl(\pi + (1-\pi) e^{-\mu}\bigr) + (n-n_0) \ln(1-\pi) - (n-n_0) \mu + \ln(\mu) \sum_{i=1}^n k_i - \sum_{i=1}^n \ln k_i !\).
Note that the coefficient on \(\sum_{i=1}^n k_i\) is \(\ln(\mu)\) and not \(\mu\). It makes a big difference. Thanks to Ethan for catching it early.
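As a check on the algebra, here is a minimal Python sketch that evaluates the expression numerically. It rests on assumptions not stated in the announcement: that the expression is the log-likelihood of a zero-inflated Poisson sample, that \(n_0\) is the number of observations equal to zero, and that \(0 < \pi < 1\) and \(\mu > 0\). The function name is hypothetical.

```python
import math

def zip_log_likelihood(counts, pi, mu):
    """Evaluate the corrected expression for observed counts k_1, ..., k_n.

    Assumption: n_0 is the number of zero observations, 0 < pi < 1, mu > 0.
    """
    n = len(counts)
    n0 = sum(1 for k in counts if k == 0)
    sum_k = sum(counts)
    # ln(k!) computed as lgamma(k + 1) to avoid large factorials
    sum_log_fact = sum(math.lgamma(k + 1) for k in counts)
    return (n0 * math.log(pi + (1 - pi) * math.exp(-mu))
            + (n - n0) * math.log(1 - pi)
            - (n - n0) * mu
            + math.log(mu) * sum_k      # coefficient is ln(mu), not mu
            - sum_log_fact)
```

Comparing this value with a term-by-term sum of log probabilities for a small sample is a quick way to confirm the signs.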
Final and Review Session
The final exam will be posted online around noon on Friday, March 10, and be due Thursday, March 16, by 4:00 pm. There will be a review session hosted by one of the TAs on Saturday, March 11, at 1:00 pm in Baxter Lecture Hall.
The exam will concentrate on statistics, and you will need access to a computer with decent software. Because programming can be frustrating, there will be no time limit on the exam. But it shouldn't take more than a few hours.
COURSE DESCRIPTION
Introduction to the fundamental ideas and techniques of probability theory and statistical inference.
Probability will be covered in the first half of the term (using Pitman) and statistics (using Larsen and Marx) in the second half (see below for information regarding textbooks). Main topics covered are:
 Properties of probability
 Independence, conditional probability, Bayes' Law
 Random variables, distributions, densities, and expectation
 Joint distributions, marginals, covariance, correlation
 The Law of Large Numbers
 The Central Limit Theorem
 Order statistics
 Important distributions
 Bernoulli, Binomial
 Uniform
 Normal (Gaussian)
 Exponential, Poisson
 Gamma, Beta, Chi-square
 Conjugate prior/posterior pairs
 Introduction to stochastic processes
 Random walk
 Markov chains
 Martingales
 Estimation of parameters
 Consistency, unbiasedness
 Maximum likelihood estimation
 Confidence intervals
 Cramér–Rao lower bound
 Testing statistical hypotheses
 Significance tests
 Likelihood ratio tests
 Monotone Likelihood Ratio Property and the Neyman–Pearson Lemma
 Type I and Type II errors
 Power and assurance
 Critical values
 Specification tests
 Kolmogorov–Smirnov
 Chi-square test, Fisher's exact test
 Linear regression analysis
 Gauss–Markov Theorem
 ANOVA
 Nonparametric tests
 Wilcoxon, Mann–Whitney, Kruskal–Wallis
 Spearman rank correlation
 Significance tests
 Introduction to Bayesian approaches
PREREQUISITES
Ma 1abc. In addition, some familiarity with a scientific computing language or program (e.g., Mathematica, Matlab, NumPy, Octave, R) is assumed.
Contact Information
 Instructor:
 Kim Border, 205 Baxter Hall, x4218, kcborder@caltech.edu
 Office Hours:
 Fridays, 1:30–3:00 pm
 Lead TA:
 William Chan, wcchan@caltech.edu
 Office Hours:
 Saturday, 1–2 pm, 160 Sloan
 Course Secretary:
 Meagan Heirwegh, 253 Sloan, x4335, heirwegm@caltech.edu
TAs
Section 1  Liyang Yang  Office Hours: Sundays, 8–9 PM, 155 Sloan, 626-395-4081  Thursdays, 9:00 AM, 153 Sloan
Section 2  William Chan  Office Hours: Saturdays, 1–2 PM, 160 Sloan, 626-395-4324  Thursdays, 9:00 AM, 159 Sloan
Section 3  William Chan  Office Hours: Saturdays, 2–3 PM, 160 Sloan, 626-395-4324  Thursdays, 10:00 AM, B127 GCL
Section 4  Hélène Rochais  Office Hours: Fridays, 6–7 PM, 155 Sloan, 626-395-4081  Thursdays, 10:00 AM, 269 Lauritsen
Section 5  Jack Tao  Office Hours: Fridays, 5–6 PM, 155 Sloan, 608-338-2788  Thursdays, 10:00 AM, 102 Steele
Section 7  Jane Panangaden  Office Hours: Mondays, 5–6 PM, 155 Sloan, 626-395-4081  Thursdays, 1:00 PM, 153 Sloan
Section 8  Juhyun Kim  Office Hours: Saturdays, 3–4 PM, 155 Sloan, 626-395-4081  Thursdays, 1:00 PM, B127 GCL
Section 9  Lingfei Yi  Office Hours: Sundays, 5–6 PM, 155 Sloan, 626-395-7172  Thursdays, 2:30–3:25 PM, 115 BCK
POLICIES
 Late Work:

As a rule, late work is not accepted. This is to protect the TAs, who are talented, hardworking students, just as you are. At the discretion of the Head TA, late homework turned in the day it is due, but after the 4:00 pm deadline, will be accepted with a 25% penalty. If there are extenuating circumstances, you must notify the Head TA by midnight the night before it is due and you must get a note from the Dean supporting the extension. As partial compensation, your lowest homework score will be discarded.
 Grading:

Your course grade will be based on the weekly homework (40%), the midterm (25% or 35%), and the final (35% or 25%). Whichever exam you score better on receives the greater weight (35%). In computing the homework average, your lowest homework score will be dropped. (Since homework assignments vary by weight, a modified Kazatkin algorithm will be used to determine which score to drop.)
This year I am continuing the following practice. Each assignment will contain zero or more optional exercises. They are optional in the following sense: Grades will be calculated without taking the optional exercises into account, but the maximum grade will be an A. If you want an A+, you will have to earn an A and also accumulate sufficiently many optional points. No collaboration is allowed on optional exercises.
As this course is for a letter grade, no one will be excused from the final.
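The exam weighting under Grading can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, all scores are assumed to be percentages on a 0 to 100 scale, and the homework average is assumed to have the lowest score already dropped. The rule for choosing which homework to drop and the mapping from score to letter grade are not modeled.

```python
def course_score(homework_avg, midterm, final):
    """Combine percentage scores using the stated weights.

    Homework counts 40%; the better of the two exams counts 35%
    and the other exam counts 25%.
    """
    better = max(midterm, final)
    worse = min(midterm, final)
    return 0.40 * homework_avg + 0.35 * better + 0.25 * worse
```

For example, a homework average of 90 with exam scores of 70 and 80 gives 0.40(90) + 0.35(80) + 0.25(70) = 81.5, regardless of which exam was the midterm.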
 Homework:

Homework will typically be due at 4:00 pm on Mondays in the appropriate homework box outside 253 Sloan. (If Monday is a holiday [which happens twice this term] homework will be due on Tuesday. Assignment 0 is a major exception.) Problems (and later solutions) will be posted on this course webpage. You are encouraged to start the homework well in advance of the due date in order not to risk missing the deadline. Homework is turned in to locked boxes, so it can safely be submitted as soon as it is completed.
 Collaboration:

Collaboration is allowed on the homework, but your writeup must be in your own words and may not be copied. The exception is that no collaboration is allowed on optional exercises. Collaboration is not allowed on the exams. Please ask for clarification if anything is unclear.
*Information is subject to change*
TEXTBOOKS
The required textbooks for the course are:
 Jim Pitman. 1993. Probability. Springer, New York, Berlin, and Heidelberg. ISBN: 0387979748.
 Richard J. Larsen and Morris L. Marx. 2012. An Introduction to Mathematical Statistics and Its Applications, fifth edition. Prentice Hall. ISBN: 0321693949.
There will be additional readings from time to time, either as handouts or articles available online.
There are other books that you may find useful for this course or perhaps later in life. Here are, in no particular order, some of my recommendations.
 Robert V. Hogg, Elliot A. Tanis, and Dale Zimmerman. 2015. Probability and Statistical Inference. Pearson, Boston. ISBN: 9780321923271. This is a nicely written introduction that I am evaluating to see if it can replace the two books above.
 Alex Reinhart. 2015. Statistics Done Wrong: The Woefully Complete Guide. No Starch Press, San Francisco. ISBN: 9781593276201. This short (129 pages) book is written for scientists and covers many common misinterpretations of statistical methods and results in the analysis of scientific data.
 Calvin Dytham. 2011. Choosing and Using Statistics: A Biologist's Guide. Wiley-Blackwell. ISBN: 9781405198394. This is a cookbook and reference geared toward biologists, but is a useful reference for almost everyone.
 David E. Matthews and Vernon T. Farewell. 2015. Using and Understanding Medical Statistics. Karger, Basel. ISBN: 9783318054583.
 Robert B. Ash. 2008. Basic Probability Theory. Dover, Mineola, New York. Reprint of the 1970 edition published by John Wiley and Sons. ISBN: 0486466280. This book, being published by Dover, is very affordable. (I think it's still just under $20.) The first chapter, especially sections 1.4 through 1.7, is very good at explaining how to count for combinatorial problems.
 John B. Walsh. 2012. Knowing the Odds: An Introduction to Probability. American Mathematical Society, Providence, Rhode Island. ISBN: 9780821885321. I almost used this as the textbook for the course, but decided to stay with the status quo.
 Kai Lai Chung and Farid AitSahlia. 2003. Elementary Probability Theory with Stochastic Processes and an Introduction to Mathematical Finance. Springer-Verlag, New York, Heidelberg, and Berlin. ISBN: 9780387955780. This is a very well-written introduction to probability theory. Chapter 3 on counting is especially good.
 Richard Isaac. 1995. The Pleasures of Probability. Springer-Verlag, New York, Berlin, and Heidelberg. ISBN: 038794415X. Another good introduction to probability theory, but a bit too eccentric to use as the main text for this course.
Modern statistical practice is computationally intensive, but this course is not especially so. But you will have to use computers to do some of the assignments. Many of the people on campus that I have talked to recommend the statistical programming language R (the open source alternative to AT&T's S). Mathematica 9 and later claims to be highly integrated with R, but I haven't tried it yet. Others I have talked to rave about NumPy, an extension of Python that provides much of the functionality of Matlab. Still others continue to use other packages because they have invested a lot of effort in learning to use them. (I myself use Mathematica because I started using it in 1992, so my recommendation of R falls into the category of “do as I say, not as I do.”)
My son recommends R and this video as an endorsement.
Here are a couple of highly recommended books on R that I mostly have not read. But I find the first two to be useful.
 Claus Thorn Ekstrom. 2011. R Primer. Chapman & Hall/CRC Press. Available for online reading from the Caltech Library.
 Paul Teetor. 2011. R Cookbook. O'Reilly Media. ISBN: 9780596809157.
 Joseph Adler. 2012. R in a Nutshell, 2nd edition. O'Reilly Media. ISBN: 9781449312084.
 Alain F. Zuur, Elena N. Ieno, and Erik H. W. G. Meesters. 2009. A Beginner's Guide to R. Springer Science+Business Media, New York. ISBN: 9780387938363.
 Peter Dalgaard. 2008. Introductory Statistics with R, second edition. Springer Science+Business Media, New York. ISBN: 9780387790534.
LECTURE NOTES & SUPPLEMENTS
I usually revise these notes following the lectures and even later, so while you may want to look them over before the lecture, you may not want to print them out till later. The date/times in parentheses reflect the most recent version.
 Lecture 1
 Lecture 2
 Lecture 3
 Lecture 4
 Lecture 5
 Lecture 6
 Lecture 7
 Lecture 8
 Lecture 9
 Lecture 10
 Lecture 11
 Lecture 12
 Lecture 13
 Lecture 14
 Lecture 15
 Lecture 16
 Lecture 17
 Lecture 18
 Lecture 19
 Lecture 20
 Lecture 21
 Lecture 22
 Lecture 23
 Lecture 24
 Lecture 24 Slides
 Lecture 25
 Lecture 25 Slides
 Lecture 26
 Lecture 27 (tentative)
HOMEWORK
Due Date  Assignment  Solutions 

Thursday, January 5, 8:00 pm online  Assignment 0  N/A 
Monday, January 9, 4:00 pm  Assignment 1  Sample solutions 
Tuesday, January 17, 4:00 pm  Assignment 2  Sample solutions 
Monday, January 23, 4:00 pm  Assignment 3  Sample solutions 
Tuesday, January 31, 4:00 pm  Assignment 4  Sample solutions 
Tuesday, February 14, 4:00 pm  Assignment 5  Sample solutions 
Tuesday, February 21, 4:00 pm  Assignment 6  Sample solutions 
Tuesday, February 28, 4:00 pm  Assignment 7  Sample solutions 
Tuesday, March 7, 4:00 pm  Assignment 8 
EXAMS
Due Date  Exam  Solutions 

Wednesday, February 8, 4:00 pm  Midquarter Exam  Histogram of scores, Solutions 
Thursday, March 16, 4:00 pm  Final Exam 