
The Gradient: Perspectives on AI

Ryan Tibshirani: Statistics, Nonparametric Regression, Conformal Prediction

Thu Apr 25 2024
Statistics, Machine Learning, Non-Parametric Regression, Trend Filtering, Conformal Prediction, Data Quality

Description

This episode explores the intersection of statistics and machine learning, covering research focus, non-parametric regression, trend filtering, conformal prediction, and assessing data quality in epidemic tracking. The guest shares insights on the differences between the statistics and ML communities, the challenges and limitations of conformal prediction, and the importance of diverse perspectives in collaborative projects. He also highlights the need for openness in research and discusses future research directions.

Insights

Statistics and machine learning influence each other

Both statistics and machine learning benefit from each other's energy and approaches, with each discipline influencing the other and contributing to advancements in the field.

Conformal prediction provides uncertainty quantification

Conformal prediction is a method for quantifying uncertainty in machine learning models, providing a prediction set for Y given X. It offers probabilistic guarantees without relying on assumptions or asymptotics.

Challenges of conformal prediction

Conformal prediction faces challenges in handling non-IID data, improving conditional coverage, and avoiding incentivizing subpar prediction functions. Assumptions like IID and exchangeability may not hold in real-world scenarios.

Assessing data quality in epidemic tracking

Measuring data quality in epidemic tracking and forecasting is complex due to various factors. There is no foolproof method, leading to challenges and potential distrust in available methods.

Chapters

  1. Introduction to Statistics and AI
  2. Research Focus and Openness in ML Community
  3. Statistics and Machine Learning
  4. Non-Parametric Regression and Locally Adaptive Methods
  5. Trend Filtering and Neural Networks
  6. Basis Functions on Graphs and Regularization Techniques
  7. Discrete Splines and Conformal Prediction
  8. Conformal Prediction and its Applications
  9. Challenges and Limitations of Conformal Prediction
  10. Improving Conditional Coverage and Handling Non-IID Data
  11. Accommodating Non-Exchangeable Data and Online Cases
  12. Adaptive Control Mechanism and Future Research Directions
  13. Large-Scale Survey for Tracking and Forecasting during the Pandemic
  14. Assessing Data Quality in Epidemic Tracking and Forecasting

Introduction to Statistics and AI

00:00 - 07:32

  • The AI world has drawn heavily from the statistics community, but there are differences in what researchers focus on.
  • Ryan Tibshirani, a statistician, discusses his path into statistics and how he chose to pursue a statistics PhD over a math PhD.
  • Ryan's introduction to statistics came through applied work during summer internships at Stanford.
  • Statistics is considered broad and offers opportunities for mathematical, applied, and computational approaches.
  • Because Ryan's PhD program was in the same department as his father, he had to prove himself independently without relying on his father's reputation.
  • Despite initial delicate moments, Ryan found being in the same field as his father supportive and positive.
  • As a junior faculty member, Ryan has been invited to prestigious events due to his own accomplishments but occasionally gets mistaken for his father.

Research Focus and Openness in ML Community

07:06 - 15:03

  • Ryan describes how he cultivates his taste as a researcher, choosing which problems to work on based on personal interest, the beauty of the problem, and perceived importance within the community.
  • Rather than following a specific long-term agenda, he relies on two axes - personal interest and perceived importance to the community - to guide his research focus.
  • There is a discussion about maintaining openness in research within the ML community despite trends like chasing large language models, with emphasis on working on different problems and avoiding following mainstream trends.
  • A comparison is made between the statistics and ML communities regarding paper length, with observations that statistics papers tend to be longer. The differences in how each community approaches knowledge production, publication norms, and directions for research are highlighted.

Statistics and Machine Learning

14:41 - 22:50

  • In statistics and machine learning, there is a distinction between prediction and inference, with prediction dominating in machine learning while inference has grown in importance.
  • The language used in statistics and machine learning shapes the way ideas are expressed and worked on, leading to different areas of focus.
  • Statisticians tend to be more skeptical by nature than the more action-oriented "doer" culture of machine learning, which leads to differences in how research is approached and communicated within each discipline.
  • Both statistics and machine learning benefit from each other's energy and approaches, with each discipline influencing the other and contributing to advancements in the field.

Non-Parametric Regression and Locally Adaptive Methods

22:26 - 30:39

  • Astrophysicists and machine learning colleagues collaborated effectively on a project, showcasing the importance of diverse perspectives.
  • The discussion led to focusing on non-parametric regression papers related to trend filtering.
  • Non-parametric regression allows for predicting responses without assuming a specific parametric relationship between variables.
  • Locally adaptive methods in non-parametric regression adjust to varying levels of smoothness in trends, unlike traditional methods like spline estimators.
  • Locally adaptive methods, such as kernel smoothers with varying bandwidths, can handle differences in local smoothness effectively.

Trend Filtering and Neural Networks

30:10 - 38:28

  • Trend filtering is a computationally efficient, locally adaptive estimator defined in discrete time that can adapt efficiently to varying local smoothness in trends.
  • There is an interesting connection between trend filtering and neural networks as both are locally adaptive non-parametric regression methods.
  • In problem formulation, there are synthesis and analysis frameworks where synthesis involves specifying basis functions while analysis starts with a complete parameterization and penalizes unwanted behaviors.
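The analysis framework described above can be sketched concretely. This is a hedged illustration, not the method from the episode: the penalty is built from a discrete difference operator D, and an l2-penalized smoother (Hodrick-Prescott style, chosen here only because it has a closed-form solution) stands in for the l1 penalty that true trend filtering uses.

```python
import numpy as np

def diff_matrix(n, order=2):
    """Discrete difference operator D of the given order. In the analysis
    framework, trend filtering of order k penalizes ||D theta||_1 with D the
    (k+1)-th difference matrix."""
    D = np.eye(n)
    for _ in range(order):
        D = np.diff(D, axis=0)  # each pass takes first differences of rows
    return D

def l2_trend_smoother(y, lam=10.0, order=2):
    """An l2 stand-in for trend filtering: minimize
    ||y - theta||^2 + lam * ||D theta||^2, solved in closed form.
    True trend filtering uses the l1 penalty instead, which yields
    piecewise-polynomial fits with adaptively chosen knots but requires a
    convex optimization solver."""
    n = len(y)
    D = diff_matrix(n, order)
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 3, 100)) + 0.1 * rng.normal(size=100)
theta = l2_trend_smoother(y)  # a smoothed version of the noisy signal
```

Swapping the squared penalty for an absolute-value penalty is exactly the synthesis-vs-analysis distinction in action: the estimator is specified by what behavior is penalized, not by an explicit basis.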

Basis Functions on Graphs and Regularization Techniques

38:10 - 46:14

  • Basis functions on graphs can be challenging to define without embedding the graph into Euclidean space
  • Penalties can incentivize behaviors on graphs, such as taking differences between neighboring nodes
  • Neural networks operate within a synthesis framework by constructing basis functions through composition of simple functions
  • Regularization techniques in neural networks can optimize over functions and move away from certain behaviors
  • The paper discussed involves discrete splines, trend filtering, and numerical analysis properties
  • Splines are smooth piecewise polynomials that are useful for estimating trends and have applications in statistics
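The neighboring-node differences mentioned above can be written with an edge incidence matrix. A minimal sketch (the path graph and node values are illustrative assumptions, not from the episode):

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """Edge incidence matrix B: row e has -1 at the edge's tail and +1 at its
    head, so B @ theta stacks the differences theta[j] - theta[i] over all
    edges (i, j)."""
    B = np.zeros((len(edges), n_nodes))
    for e, (i, j) in enumerate(edges):
        B[e, i], B[e, j] = -1.0, 1.0
    return B

# A 4-node path graph. The penalty ||B @ theta||_1 (graph trend filtering /
# fused-lasso style) incentivizes neighboring nodes to share values, with no
# Euclidean embedding of the graph required.
edges = [(0, 1), (1, 2), (2, 3)]
B = incidence_matrix(4, edges)
theta = np.array([1.0, 1.0, 5.0, 5.0])
penalty = np.abs(B @ theta).sum()  # only the (1, 2) edge contributes
```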

Discrete Splines and Conformal Prediction

45:48 - 53:33

  • Divided differences, a concept dating back to Newton, are used in discrete splines as an alternative to derivatives.
  • Discrete splines and traditional splines offer similar approximation rates and complexity in estimating functions.
  • Conformal prediction, developed by Vladimir Vovk and collaborators, is a method for quantifying uncertainty in machine learning models.
  • Conformal prediction was influenced by early conversations with Vladimir Vapnik.
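The divided-difference recursion mentioned above is easy to state concretely. Below is a standard in-place version of Newton's table (the quadratic example function is an assumption for illustration):

```python
def divided_differences(xs, ys):
    """Newton's divided differences: coefs[k] = f[x0, ..., xk], computed via
    the recursion f[xi..xj] = (f[x_{i+1}..xj] - f[xi..x_{j-1}]) / (xj - xi).
    These are the coefficients of the Newton form of the interpolating
    polynomial, and the discrete-spline analogue of derivatives."""
    coefs = list(ys)
    n = len(xs)
    for level in range(1, n):
        # Update in place from the bottom so lower-order entries survive.
        for i in range(n - 1, level - 1, -1):
            coefs[i] = (coefs[i] - coefs[i - 1]) / (xs[i] - xs[i - level])
    return coefs

# For f(x) = x^2 the second divided differences are constant (equal to 1,
# the leading coefficient), mirroring the role of f''(x)/2!.
xs = [0.0, 1.0, 2.0, 3.0]
coefs = divided_differences(xs, [x * x for x in xs])
```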

Conformal Prediction and its Applications

53:04 - 1:00:51

  • The conformal prediction method was developed in the late 1990s and has since gained popularity in machine learning.
  • Conformal prediction is favored in machine learning for being light on assumptions compared to traditional inference techniques in statistics.
  • The method connects to old statistical ideas like permutation testing and constructing confidence intervals by inverting hypothesis tests.
  • Conformal prediction provides a prediction set for Y given X, which can be valuable in various domains like epidemiological forecasting.
  • In conformal prediction, a point prediction is transformed into a set of plausible values by dividing the data into proper training and calibration sets.
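The split described above can be sketched in a few lines. This is a minimal illustration of split conformal prediction, not the speakers' code; `MeanModel` is a deliberately trivial stand-in predictor (any object with `fit`/`predict` works):

```python
import numpy as np

class MeanModel:
    """Hypothetical stand-in predictor: always predicts the training mean."""
    def fit(self, X, y):
        self.mu = float(np.mean(y))
    def predict(self, X):
        return np.full(len(np.atleast_2d(X)), self.mu)

def split_conformal_interval(model, X_train, y_train, X_cal, y_cal, x_test, alpha=0.1):
    """Split conformal prediction: fit on the proper training set, compute
    absolute residuals on the calibration set, and return an interval whose
    half-width is the (1 - alpha) empirical quantile of those residuals (with
    the finite-sample (n + 1) correction). Under exchangeability, the interval
    covers y_test with probability >= 1 - alpha, marginally over X."""
    model.fit(X_train, y_train)
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    k = int(np.ceil((1 - alpha) * (n + 1)))
    q = np.sort(scores)[min(k, n) - 1]
    pred = model.predict(np.atleast_2d(x_test))[0]
    return pred - q, pred + q

rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 1)), rng.normal(size=200)
lo, hi = split_conformal_interval(MeanModel(), X[:100], y[:100], X[100:], y[100:], X[:1])
```

Note that the guarantee holds no matter how bad `MeanModel` is; a poor predictor simply produces a wider interval, which is the incentive issue raised later in the episode.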

Challenges and Limitations of Conformal Prediction

1:00:22 - 1:08:03

  • The process of conformal prediction involves dividing a set into a proper training set and a calibration set to evaluate prediction errors.
  • Conformal prediction aims to ensure that the error on a test point falls within the smallest 90% of errors in the calibration set.
  • The strength of conformal prediction lies in its ability to provide guaranteed coverage for test predictions without assumptions or asymptotics.
  • A limitation of conformal prediction is its assumption that training and test data are IID, which may not hold in real-world scenarios like time series data or different populations.
  • Another limitation is that the coverage guarantee in conformal prediction is marginal over features, providing an average coverage over feature values at test time.

Improving Conditional Coverage and Handling Non-IID Data

1:07:33 - 1:15:32

  • Conformal prediction provides a 95% prediction interval with 95% coverage on average, but cannot guarantee coverage conditional on specific features.
  • Extensions of conformal prediction aim to improve conditional coverage and address non-IID data, but do not offer foolproof solutions for all scenarios.
  • There is concern that conformal prediction's probabilistic guarantees may incentivize the use of subpar prediction functions, leading to larger intervals.
  • The size of the prediction interval is influenced by the quality of the predictor used, with better predictors resulting in smaller intervals.
  • Assumptions like IID and exchangeability in conformal prediction can be limiting and may not hold in real-world scenarios, prompting research into methods beyond exchangeability.

Accommodating Non-Exchangeable Data and Online Cases

1:15:15 - 1:23:39

  • Exchangeability is a strong assumption in conformal prediction, with efforts to accommodate non-exchangeable data sequences
  • Conformal prediction can be approached in batch and online cases, each requiring different methods and considerations
  • In the online case, the focus is on empirical coverage over a sequence rather than probabilistic guarantees for individual test points
  • Methods in conformal prediction can adapt to arbitrary distribution shifts in the online setting
  • Weighting calibration points based on similarity to test points can impact coverage loss in non-IID data scenarios
  • Quantifying the impact of calibration point weighting on coverage loss provides insights into handling distribution shifts
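The weighting idea above can be sketched as a weighted calibration quantile, in the spirit of work on conformal prediction beyond exchangeability. The recency-decay weights and numbers here are illustrative assumptions:

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, alpha=0.1):
    """Weighted calibration quantile: each calibration score gets a weight
    (e.g. decaying with distance in time from the test point), plus weight 1
    on a +infinity placeholder score for the test point itself. The (1 - alpha)
    quantile of this weighted distribution gives the interval half-width; the
    coverage loss is then bounded by how much weight falls on calibration
    points whose distribution has drifted from the test point's."""
    scores = np.append(scores, np.inf)   # placeholder score for the test point
    weights = np.append(weights, 1.0)
    order = np.argsort(scores)
    p = np.cumsum(weights[order]) / weights.sum()
    return scores[order][np.searchsorted(p, 1 - alpha)]

# Recency weighting for a time series: newer calibration points count more.
scores = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
weights = 0.9 ** np.arange(len(scores))[::-1]  # oldest point gets the smallest weight
q = weighted_conformal_quantile(scores, weights, alpha=0.2)
```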

Adaptive Control Mechanism and Future Research Directions

1:23:21 - 1:30:59

  • The speaker's method involves running random forests with squared error as the error metric and aiming for 90% coverage.
  • In conformal prediction, a knob is adjusted to achieve the desired coverage level, even if the algorithm is miscalibrated.
  • An adaptive control mechanism adjusts the algorithm's error level to ensure coverage in prediction sets.
  • Future research directions in conformal prediction include exploring non-IID cases and integrating it into practical pipelines for decision-making.
  • A survey conducted by the Delphi group in collaboration with Facebook gathered data related to COVID-19 symptoms, behaviors, demographics, and comorbidities.
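The adjustment knob described above resembles adaptive conformal inference (Gibbs and Candès). A minimal sketch of that style of update rule, with the step size `gamma` as an assumed tuning parameter:

```python
def adaptive_alpha_updates(cover_seq, alpha=0.1, gamma=0.05):
    """Adaptive update of the working miscoverage level:
    alpha_{t+1} = alpha_t + gamma * (alpha - err_t), where err_t = 1 if the
    prediction set missed the truth at step t and 0 otherwise. The knob
    alpha_t drops after misses (demanding wider sets) and creeps back up after
    covers, so long-run empirical coverage tracks 1 - alpha even when the base
    algorithm is miscalibrated."""
    a = alpha
    trajectory = []
    for covered in cover_seq:
        err = 0.0 if covered else 1.0
        a = a + gamma * (alpha - err)
        trajectory.append(a)
    return trajectory

# Two misses push alpha_t down; subsequent covers nudge it back toward nominal.
traj = adaptive_alpha_updates([False, False, True, True, True])
```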

Large-Scale Survey for Tracking and Forecasting during the Pandemic

1:30:39 - 1:39:05

  • The survey discussed in the transcript was a collaborative effort involving researchers, government officials, and various institutions, with 11 revisions made during the pandemic.
  • The survey had approximately 30 million respondents over the course of the pandemic, averaging over 50,000 respondents per day.
  • It was primarily used for tracking, forecasting, and policy implications during the pandemic.
  • Estimates of vaccine uptake based on the survey were found to be biased upwards due to non-response bias.
  • While some criticized the survey for biased estimates, it was argued that it could still be valuable for comparing changes over time rather than absolute quantities.

Assessing Data Quality in Epidemic Tracking and Forecasting

1:38:46 - 1:46:29

  • Data defect correlation is used to characterize data sets and understand their limitations
  • Effective sample size calculation helps in assessing the quality of a dataset
  • Measuring data quality in epidemic tracking and forecasting poses challenges due to various factors like resource costs and objections to different quantification methods
  • Assessing leading indicators and data quality involves using different methods like correlation, Granger causality, and forecasting models, each with limitations
  • There is no foolproof method for assessing data quality, leading to complexities and potential distrust in available methods
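The data defect correlation and effective sample size above come from Xiao-Li Meng's analysis of big-data surveys. A hedged sketch using the simple form of the identity (finite-population correction ignored; the simulated population and response mechanism are assumptions for illustration):

```python
import numpy as np

def data_defect_correlation(included, outcome):
    """Data defect correlation: the correlation, across the full population,
    between the response indicator R and the outcome G. It is zero in
    expectation under random sampling and nonzero when who responds is tied
    to what they would report (e.g. vaccinated people answering more often)."""
    return np.corrcoef(included.astype(float), outcome)[0, 1]

def effective_sample_size(rho, n, N):
    """Approximate effective sample size from Meng's identity: a biased sample
    of size n with defect correlation rho is roughly worth a simple random
    sample of size f / (1 - f) / rho^2, where f = n / N."""
    f = n / N
    return f / (1 - f) / rho**2

rng = np.random.default_rng(0)
N = 100_000
outcome = rng.normal(size=N)
# Response probability mildly tied to the outcome -> nonzero defect correlation.
p = 0.01 + 0.01 * (outcome > 0)
included = rng.random(N) < p
rho = data_defect_correlation(included, outcome)
n_eff = effective_sample_size(rho, included.sum(), N)
```

Even a tiny defect correlation collapses the effective sample size here to a small fraction of the roughly 1,500 responses, which is the mechanism behind the upwardly biased vaccine-uptake estimates discussed above.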