STATS 606

Course project

The project is intended to engage you in a non-trivial application of computation and/or optimization methods in statistics/data science. We encourage you to combine the project with your research or a personal project. You may complete the project in teams of 3 or 4 students. The deliverables are

  1. a project proposal due at noon ET on Fri, Feb 28,
  2. a draft report (for peer review) due at noon ET on Mon, Apr 7,
  3. a peer review due at noon ET on Mon, Apr 14, and
  4. a final report due at noon ET on Mon, Apr 21.

The overall grade on the final project is a combination of the grades on the deliverables: 10% project proposal, 10% peer review, 80% final report. Although the draft report does not contribute to the overall (project) grade, teams that do not submit a draft report will not be assigned a project to review (so they will receive zero credit for the peer review).

Coming up with a project

Most projects fall into one of three types:

Before starting work on a project, make sure it is novel; i.e. the project should fill a gap in the literature (e.g. there are no recent review papers on the same topic). After coming up with a project idea, first do a literature review to gauge the novelty of the idea, then discuss the idea with the course staff.

The following guidelines on the deliverables are intentionally vague (to allow the widest variety of projects). Please ask the course staff about guidelines specific to your project.

Project proposal

The proposal is due at noon ET on Fri, Feb 28. It is intended to get you started on the project and solicit feedback from the course staff. It should be no more than 2 pages in NeurIPS format (excluding the contributions section, references, and any appendices). Please follow the guidelines for mathematical writing. The proposal must include

  1. the title of the project,
  2. the names of the team members,
  3. a description of the task and your (tentative) approach: What is the problem you are tackling? What dataset(s) and algorithm(s) will you use? What are some expected challenges? If your project entails data collection, describe the data collection protocol. What methods/metrics will you use to evaluate the performance of your approach? If you are working on a well-studied problem, describe baseline methods that you will compare against.
  4. a review of the relevant literature (see guidelines for the related work section in the final report below), and
  5. a to-do list for the draft report: if your project entails data collection, then we expect you to have collected all the data; if your project uses pre-processed data (e.g. from Kaggle), then we expect experimental results (e.g. performance on baselines).

The goal of the project proposal is similar to that of a grant proposal. A good proposal should convince the course staff that (i) the project is worth doing (e.g. if the original authors provide code that reproduces the results, then it is not worth reproducing the results again), and (ii) you can complete the project. For (i), the best way is to review related work in the area and identify the gap that your project intends to fill. For (ii), the best way is to start working on the project. This helps you identify issues that may arise and propose solutions.

Draft report and peer review

The draft report is due at noon ET on Mon, Apr 7. It is intended to solicit feedback from your classmates, so it should be close to the final report. We expect the draft report to include some experimental results demonstrating the efficacy of the approach on the task. Please follow the guidelines for mathematical writing.

The peer review process is double-blind; i.e. the reviewer(s) are hidden from the author(s) and vice versa. Thus the draft report must be anonymous; i.e. do not include team member names in the draft report. The peer review of your assigned project is due at noon ET on Mon, Apr 14. It is intended to provide constructive feedback to your classmates. The review should include the following sections (adapted from NeurIPS reviewer guidelines):

  1. Summary: Briefly summarize the paper and its contributions. This is not the place to critique the paper; the authors should generally agree with a well-written summary.
  2. Strengths and Weaknesses: Please provide a thorough assessment of the strengths and weaknesses of the paper, touching on each of the following aspects:
    • originality: Are the tasks or methods new? Is the work a novel combination of well-known techniques? (This can be valuable!) Is it clear how this work differs from previous contributions? Is related work adequately cited?
    • quality: Is the submission technically sound? Are claims well supported (e.g., by theoretical analysis or experimental results)? Are the methods used appropriate? Is this a complete piece of work or work in progress? Are the authors careful and honest about the strengths and weaknesses of their work?
    • clarity: Is the submission clearly written? Is it well organized? (If not, please make constructive suggestions for improving its clarity.) An expert reader should be able to easily reproduce the results in a well-written paper.
  3. Questions: Please list any questions and suggestions for the authors. Think of the things where a response from the author can change your opinion, clarify a confusion or address a limitation.
  4. Limitations: Have the authors adequately addressed the limitations and potential negative societal impact of their work? If not, please include constructive suggestions for improvement.

Final report

The final report is due at noon ET on Mon, Apr 21. It should be no more than 8 pages in NeurIPS format (excluding the contributions section, references, and any appendices). The report should include (but is not limited to) the following sections (adapted from Stanford’s CS 229 final report guidelines):

  1. Introduction (0.5 to 1 page): Explain the problem and why it is important. Clearly state what the inputs and outputs are (e.g. our algorithm accepts a histopathological image as input and predicts whether the central region contains any tumor tissue).
  2. Related work (0.5 to 1 page): You should find relevant papers, group them into categories based on their approaches, discuss their strengths and weaknesses, and compare them with your approach. Which approaches were clever/good? What is the state-of-the-art? You should cite at least a dozen relevant papers. Google Scholar is very useful for finding relevant papers.
  3. Dataset and features (0.5 to 1 page): Describe the dataset you are using. How many training/validation/test examples do you have? Did you preprocess the data in any way? What features did you extract? Space permitting, show some examples from your dataset.
  4. Methods (1 to 2 pages): Describe your learning pipeline, including any algorithm(s). Make sure to include relevant mathematical details. For each algorithm, give a short description (1 to 2 paragraphs) of how it works. If you are using cutting edge or niche algorithms (or any algorithm not covered in class), provide enough detail so that your classmates can understand the algorithm. You should also describe how you chose (hyper)parameters (e.g. what was your mini-batch size and why).
  5. Experiments/Results (1 to 2 pages): Present your results with a mixture of tables and plots. For example, if you are solving a classification problem, you should include a confusion matrix or ROC/precision-recall curves (with the corresponding AUC/AUPRC values). Your figures should include legends and axis labels, and have font sizes that are legible when printed. Make sure to describe (mathematically if necessary) any metrics you report and refer to any figures/tables in your main text. You should have both quantitative and qualitative results.
  6. Conclusion/Discussion (1 to 2 pages): Summarize your report and reiterate the main points. What worked and what didn’t work (and why)? Discuss the advantages and disadvantages of your method (e.g. provide examples of where your algorithm failed/succeeded). For future work, how can the method be improved?
  7. Contributions: If you are working on a project as part of a team, you must also include a contributions section at the end of the report (where acknowledgements usually appear) describing the contributions of the team members. If there are discrepancies among the contributions of the team members, the grades will be adjusted. This section does not count towards the page limit.
  8. References: Include citations for: (1) any papers mentioned in the related work section, (2) papers describing algorithms that you used which were not covered in class, (3) references for datasets and software. Any reference format that includes author(s), title, conference/journal, year is acceptable.
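
As an illustration of the classification metrics mentioned under Experiments/Results, here is a minimal Python sketch that computes a binary confusion matrix and the area under the ROC curve on synthetic data. The function names, the toy data, the fixed seed, and the 0.5 decision threshold are all assumptions made for this example; they are not course requirements, and in practice a library such as scikit-learn would typically be used instead.

```python
import random


def confusion_matrix(y_true, y_pred):
    """Return the counts (tn, fp, fn, tp) for binary labels in {0, 1}."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive scores higher than a randomly chosen
    negative (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic example (illustrative only): a fixed seed keeps the output
# reproducible, in the spirit of the code-submission requirement.
rng = random.Random(606)
y_true = [rng.randint(0, 1) for _ in range(200)]
scores = [0.3 * t + 0.7 * rng.random() for t in y_true]  # noisy scores
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
print(f"confusion matrix: tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"AUC: {roc_auc(y_true, scores):.3f}")
```

The pairwise AUC above takes time proportional to the number of positive-negative pairs, which is fine for a small illustration; a rank-based computation is the usual choice for large datasets.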

You must submit, on Canvas, a PDF file of your project report and a repository (e.g. on GitHub) of (properly commented) code that reproduces any computer output in the project report. The report must be typeset. Submissions will be evaluated on three aspects: