ANNOUNCEMENTS
Final Correction
I have posted a correction to the final exam. The log-likelihood function for problem two should read: \(n_0 \ln\bigl(\pi + (1-\pi) e^{-\mu}\bigr) + (n-n_0) \ln(1-\pi) - (n-n_0)\mu + \ln(\mu) \sum_{i=1}^n k_i - \sum_{i=1}^n \ln(k_i!)\).
Note that the coefficient on \(\sum_{i=1}^n k_i\) is \(\ln(\mu)\), not \(\mu\). It makes a big difference. Thanks to Ethan for catching it early.
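If you want a quick numerical sanity check of the corrected expression, here is a minimal sketch in Python/NumPy (one of the packages suggested under Textbooks below). The function name zip_loglik and the sample data are my own, purely for illustration; this is not part of the exam.

```python
import numpy as np
from math import lgamma

def zip_loglik(pi, mu, k):
    """Evaluate the corrected log-likelihood from the announcement above.
    pi and mu are scalars; k is the list of observed counts k_1, ..., k_n.
    (Hypothetical helper, only for checking the formula numerically.)"""
    k = np.asarray(k)
    n = len(k)
    n0 = int(np.sum(k == 0))
    # sum_i ln(k_i!) computed via lgamma(k_i + 1); zero counts contribute 0
    sum_log_fact = sum(lgamma(ki + 1) for ki in k)
    return (n0 * np.log(pi + (1 - pi) * np.exp(-mu))
            + (n - n0) * np.log(1 - pi)
            - (n - n0) * mu
            + np.log(mu) * k.sum()      # the coefficient is ln(mu), not mu
            - sum_log_fact)

# example call with made-up data
print(zip_loglik(0.3, 2.0, [0, 0, 3, 1, 4, 0, 2]))
```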
Final and Review Session
The final exam will be posted online around noon on Friday, March 10, and be due Thursday, March 16, by 4:00 pm. There will be a review session hosted by one of the TAs on Saturday, March 11, at 1:00 pm in Baxter Lecture Hall.
The exam will concentrate on statistics, and you will need access to a computer with decent software. Because programming can be frustrating, there will be no time limit on the exam. But it shouldn't take more than a few hours.
COURSE DESCRIPTION
Introduction to the fundamental ideas and techniques of probability theory and statistical inference.
Probability will be covered in the first half of the term (using Pitman) and statistics (using Larsen and Marx) in the second half (see below for information regarding textbooks). Main topics covered are:
- Properties of probability
- Independence, conditional probability, Bayes' Law
- Random variables, distributions, densities, and expectation
- Joint distributions, marginals, covariance, correlation
- The Law of Large Numbers
- The Central Limit Theorem
- Order statistics
- Important distributions
- Bernoulli, Binomial
- Uniform
- Normal (Gaussian)
- Exponential, Poisson
- Gamma, Beta, Chi-square
- Conjugate prior/posterior pairs
- Introduction to stochastic processes
- Random walk
- Markov chains
- Martingales
- Estimation of parameters
- Consistency, unbiasedness
- Maximum likelihood estimation
- Confidence intervals
- Cramér–Rao lower bound
- Testing statistical hypotheses
- Significance tests
- Likelihood ratio tests
- Monotone Likelihood Ratio Property and the Neyman–Pearson Lemma
- Type I and Type II errors
- Power and assurance
- Critical values
- Specification tests
- Kolmogorov–Smirnov
- Chi-square test, Fisher's exact test
- Linear regression analysis
- Gauss–Markov Theorem
- ANOVA
- Nonparametric tests
- Wilcoxon, Mann–Whitney, Kruskal–Wallis
- Spearman rank correlation
- Significance tests
- Introduction to Bayesian approaches
PREREQUISITES
Ma 1abc. In addition, some familiarity with a scientific computing language or program (e.g., Mathematica, Matlab, NumPy, Octave, R) is assumed.
Contact Information
- Instructor:
- Kim Border, 205 Baxter Hall, x4218, kcborder@caltech.edu
- Office Hours:
- Fridays, 1:30–3:00 pm
- Lead TA:
- William Chan, wcchan@caltech.edu
- Office Hours:
- Saturdays, 1–2 pm, 160 Sloan
- Course Secretary:
- Meagan Heirwegh, 253 Sloan, x4335, heirwegm@caltech.edu
TAs
Section | TA | Office Hours | Section Meeting |
---|---|---|---|
1 | Liyang Yang | Sundays, 8–9 pm, 155 Sloan, 626-395-4081 | Thursdays, 9:00 am, 153 Sloan |
2 | William Chan | Saturdays, 1–2 pm, 160 Sloan, 626-395-4324 | Thursdays, 9:00 am, 159 Sloan |
3 | William Chan | Saturdays, 2–3 pm, 160 Sloan, 626-395-4324 | Thursdays, 10:00 am, B127 GCL |
4 | Hélène Rochais | Fridays, 6–7 pm, 155 Sloan, 626-395-4081 | Thursdays, 10:00 am, 269 Lauritsen |
5 | Jack Tao | Fridays, 5–6 pm, 155 Sloan, 608-338-2788 | Thursdays, 10:00 am, 102 Steele |
7 | Jane Panangaden | Mondays, 5–6 pm, 155 Sloan, 626-395-4081 | Thursdays, 1:00 pm, 153 Sloan |
8 | Juhyun Kim | Saturdays, 3–4 pm, 155 Sloan, 626-395-4081 | Thursdays, 1:00 pm, B127 GCL |
9 | Lingfei Yi | Sundays, 5–6 pm, 155 Sloan, 626-395-7172 | Thursdays, 2:30–3:25 pm, 115 BCK |
POLICIES
- Late Work:
-
As a rule, late work is not accepted. This is to protect the TAs, who are talented, hardworking students, just as you are. At the discretion of the Head TA, late homework turned in on the day it is due, but after the 4:00 pm deadline, will be accepted with a 25% penalty. If there are extenuating circumstances, you must notify the Head TA by midnight the night before the homework is due, and you must get a note from the Dean supporting the extension. As partial compensation, your lowest homework score will be discarded.
- Grading:
-
Your course grade will be based on the weekly homework (40%), the midterm (25% or 35%), and the final (35% or 25%); the greater of the two exam weights will go to whichever exam you score better on. In computing the homework average, your lowest homework score will be dropped. (Since homework assignments vary in weight, a modified Kazatkin algorithm will be used to determine which score to drop; one possible reading of the calculation is sketched at the end of this section.)
This year I am continuing the following practice. Each assignment will contain zero or more optional exercises. They are optional in the following sense: grades will be calculated without taking the optional exercises into account, but the maximum grade will be an A. If you want an A+, you will have to earn an A and also accumulate sufficiently many optional points. No collaboration is allowed on optional exercises.
As this course is for a letter grade, no one will be excused from the final.
- Homework:
-
Homework will typically be due at 4:00 pm on Mondays in the appropriate homework box outside 253 Sloan. (If Monday is a holiday, which happens twice this term, homework will be due on Tuesday. Assignment 0 is a major exception.) Problems (and later solutions) will be posted on this course webpage. You are encouraged to start the homework well in advance of the due date so as not to risk missing the deadline. Homework is turned in to locked boxes, so it can safely be submitted as soon as it is completed.
- Collaboration:
-
Collaboration is allowed on the homework, but your write-up must be in your own words and may not be copied. The exception is that no collaboration is allowed on optional exercises. Collaboration is not allowed on the exams. Please ask for clarification if anything is unclear.
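Since the page does not spell out the exam weighting or the drop rule in detail, here is a minimal sketch of one natural reading, purely as an illustration: the better exam receives the 35% weight, and the dropped assignment is whichever one maximizes the resulting weighted homework average. The function names, the made-up scores, and the notion of per-assignment weights are my own assumptions, not the official algorithm.

```python
def weighted_hw_average(scores, weights, drop_index=None):
    """Weighted homework average, optionally dropping one assignment.
    scores and weights are parallel lists; this is an illustrative guess
    at the policy, not the official drop rule."""
    kept = [(s, w) for i, (s, w) in enumerate(zip(scores, weights)) if i != drop_index]
    return sum(s * w for s, w in kept) / sum(w for _, w in kept)

def course_grade(hw_scores, hw_weights, midterm, final):
    """Homework 40%; the better exam gets 35% and the other 25% (assumed reading)."""
    # drop whichever single assignment maximizes the homework average
    best_hw = max(weighted_hw_average(hw_scores, hw_weights, i)
                  for i in range(len(hw_scores)))
    hi_exam, lo_exam = max(midterm, final), min(midterm, final)
    return 0.40 * best_hw + 0.35 * hi_exam + 0.25 * lo_exam

# example with made-up scores (0-100 scale) and assignment weights
print(course_grade([80, 95, 60, 90], [1, 1, 2, 1], midterm=72, final=88))
```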
*Information is subject to change*
TEXTBOOKS
The required textbooks for the course are:
- Jim Pitman. 1993. Probability. Springer, New York, Berlin, and Heidelberg. ISBN: 0-387-97974-8.
- Richard J. Larsen and Morris L. Marx. 2012. An Introduction to Mathematical Statistics and Its Applications, fifth edition. Prentice Hall. ISBN: 0-321-69394-9.
There will be additional readings from time to time, either as handouts or as articles available online.
There are other books that you may find useful for this course or perhaps later in life. Here are, in no particular order, some of my recommendations.
- Robert V. Hogg, Elliot A. Tanis, and Dale Zimmerman. 2015. Probability and Statistical Inference. Pearson, Boston. ISBN: 978-0-321-92327-1. This is a nicely written introduction that I am evaluating to see if it can replace the two books above.
- Alex Reinhart. 2015. Statistics Done Wrong: The Woefully Complete Guide. No Starch Press, San Francisco. ISBN: 978-1-59327-620-1. This short (129 pages) book is written for scientists and covers many common misinterpretations of statistical methods and results in the analysis of scientific data.
- Calvin Dytham. 2011. Choosing and Using Statistics: A Biologist's Guide. Wiley-Blackwell. ISBN: 978-1-4051-9839-4. This is a cookbook and reference geared toward biologists, but it is useful for almost everyone.
- David E. Matthews and Vernon T. Farewell. 2015. Using and Understanding Medical Statistics. Karger, Basel. ISBN: 978-3-318-05458-3.
- Robert B. Ash. 2008. Basic Probability Theory. Dover, Mineola, New York. Reprint of the 1970 edition published by John Wiley and Sons. ISBN: 0-486-46628-0. This book, being published by Dover, is very affordable. (I think it's still just under $20.) The first chapter, especially sections 1.4 through 1.7, is very good at explaining how to count for combinatorial problems.
- John B. Walsh. 2012. Knowing the Odds: An Introduction to Probability. American Mathematical Society, Providence, Rhode Island. ISBN: 978-0-8218-8532-1. I almost used this as the textbook for the course, but decided to stay with the status quo.
- Kai Lai Chung and Farid Ait-Sahlia. 2003. Elementary Probability Theory with Stochastic Processes and an Introduction to Mathematical Finance. Springer-Verlag, New York, Heidelberg, and Berlin. ISBN: 978-0-387-95578-0. This is a very well-written introduction to probability theory. Chapter 3 on counting is especially good.
- Richard Isaac. 1995. The Pleasures of Probability. Springer-Verlag, New York, Berlin, and Heidelberg. ISBN: 0-387-94415-X. Another good introduction to probability theory, but a bit too eccentric to use as the main text for this course.
Modern statistical practice is computationally intensive, but this course is not especially so. You will, however, have to use computers to do some of the assignments. Many of the people on campus that I have talked to recommend the statistical programming language R (the open-source alternative to AT&T's S). Mathematica 9 and later claims to be highly integrated with R, but I haven't tried it yet. Others I have talked to rave about NumPy, an extension of Python that provides much of the functionality of Matlab. Still others continue to use other packages because they have invested a lot of effort in learning to use them. (I myself use Mathematica because I started using it in 1992, so my recommendation of R falls into the category of “do as I say, not as I do.”) My son recommends R and offers this video as an endorsement.
Here are some highly recommended books on R, most of which I have not read, although I find the first two useful.
- Claus Thorn Ekstrom. 2011. R Primer. Chapman & Hall/CRC Press. Available for on-line reading from the Caltech Library.
- Paul Teetor. 2011. R Cookbook. O'Reilly Media. ISBN: 978-0-596-80915-7.
- Joseph Adler. 2012. R in a Nutshell, 2nd edition. O'Reilly Media. ISBN: 978-1449312084.
- Alain F. Zuur, Elena N. Ieno, and Erik H. W. G. Meesters. 2009. A Beginner's Guide to R. Springer Science+Business Media, New York. ISBN: 978-0-387-93836-3.
- Peter Dalgaard. 2008. Introductory Statistics with R, second edition. Springer Science+Business Media, New York. ISBN: 978-0-387-79053-4.
LECTURE NOTES & SUPPLEMENTS
I usually revise these notes following the lectures, and even later, so while you may want to look them over before the lecture, you may not want to print them out until later. The dates/times in parentheses reflect the most recent version.
- Lecture 1
- Lecture 2
- Lecture 3
- Lecture 4
- Lecture 5
- Lecture 6
- Lecture 7
- Lecture 8
- Lecture 9
- Lecture 10
- Lecture 11
- Lecture 12
- Lecture 13
- Lecture 14
- Lecture 15
- Lecture 16
- Lecture 17
- Lecture 18
- Lecture 19
- Lecture 20
- Lecture 21
- Lecture 22
- Lecture 23
- Lecture 24
- Lecture 24 Slides
- Lecture 25
- Lecture 25 Slides
- Lecture 26
- Lecture 27 (tentative)
HOMEWORK
Due Date | Assignment | Solutions |
---|---|---|
Thursday, January 5, 8:00 pm (online) | Assignment 0 | N/A |
Monday, January 9, 4:00 pm | Assignment 1 | |
Tuesday, January 17, 4:00 pm | Assignment 2 | |
Monday, January 23, 4:00 pm | Assignment 3 | |
Tuesday, January 31, 4:00 pm | Assignment 4 | |
Tuesday, February 14, 4:00 pm | Assignment 5 | |
Tuesday, February 21, 4:00 pm | Assignment 6 | |
Tuesday, February 28, 4:00 pm | Assignment 7 | |
Tuesday, March 7, 4:00 pm | Assignment 8 | |
EXAMS
Due Date | Exam | Solutions |
---|---|---|
Wednesday, February 8, 4:00 pm | Midquarter Exam | Histogram of scores |
Thursday, March 16, 4:00 pm | Final Exam | |