Glossary
A reference companion covering statistics, econometrics, business data solutions and policy studies.
99 terms
A
ARDL Bounds Testing
A cointegration testing approach developed by Pesaran, Shin, and Smith (2001) that does not require all variables to share the same order of integration. The procedure uses F- and t-statistics on lagged levels within an autoregressive distributed lag (ARDL) error-correction model. Two sets of critical values form upper and lower bounds. If the test statistic exceeds the upper bound, the null of no level relationship is rejected.
Akaike's Information Criterion (AIC)
A model selection criterion that balances goodness of fit against model complexity. Given a set of candidate models, the one with the lowest AIC is preferred. AIC rewards fit through a likelihood term but penalises the number of estimated parameters to discourage overfitting. The penalty in AIC is smaller than in the Bayesian Information Criterion (BIC), making AIC relatively more tolerant of complex models. AIC is widely used in stepwise regression and time-series model selection (e.g., choosing lag orders in a VAR).
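For a linear model with Gaussian errors, AIC reduces (up to an additive constant) to n·ln(SSE/n) + 2k, where k counts estimated parameters. A minimal sketch with hypothetical fit results, purely for illustration:

```python
import math

def aic_gaussian(sse, n, k):
    """AIC for a Gaussian linear model, up to an additive constant
    that cancels when comparing models fitted to the same data."""
    return n * math.log(sse / n) + 2 * k

# Hypothetical results on the same n = 100 observations: model B fits
# slightly better (lower SSE) but pays a penalty for two extra parameters.
aic_a = aic_gaussian(sse=250.0, n=100, k=3)
aic_b = aic_gaussian(sse=240.0, n=100, k=5)
best = "B" if aic_b < aic_a else "A"  # prefer the lower AIC
```

With these numbers the fit improvement narrowly outweighs the penalty, so model B is preferred; with a smaller SSE gain the simpler model would win.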
Alternative Hypothesis
The statement a statistical hypothesis test aims to establish evidence for. If the data provide sufficient grounds to reject the null hypothesis, the conclusion favours the alternative. Whether the test is one-sided or two-sided depends on how the alternative is formulated. Denoted Ha.
Augmented Dickey-Fuller (ADF) Test
A widely used unit-root test that augments the basic Dickey-Fuller regression with lagged first differences of the dependent variable to account for serial correlation in the error term. The ADF test is typically a first step in any time-series analysis to determine whether differencing is needed to achieve stationarity.
Autocorrelation
The degree of similarity between observations as a function of the lag separating them in time or space. A positively autocorrelated series exhibits values that are more alike when they are close together than when far apart. Autocorrelation is central to time-series analysis.
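The lag-k sample autocorrelation can be computed directly from the definition (lagged autocovariance over the series variance); a small sketch on illustrative data:

```python
def autocorr(x, lag):
    """Sample autocorrelation at a given lag: lagged autocovariance
    divided by the series variance."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

# A steadily rising series: neighbouring values are similar, so the
# lag-1 autocorrelation is strongly positive.
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
r1 = autocorr(trend, 1)  # 0.625
```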
B
Bayesian Inference
A statistical framework in which probability expresses a degree of belief rather than a long-run frequency. Bayesian methods update a prior distribution for parameters using observed data via Bayes' theorem to obtain a posterior distribution. Bayesian inference is increasingly used in policy evaluation, macroeconomic modelling, and decision analysis.
Bayesian Information Criterion (BIC)
A model selection criterion similar to AIC that trades off goodness of fit against complexity. BIC's penalty for each additional parameter is larger than AIC's, so BIC tends to favour simpler models, especially in large samples.
Bias
The systematic difference between a statistic's expected value and the true population parameter it estimates. An estimator is unbiased if, on average across repeated samples, it equals the parameter. Bias can arise from model misspecification, measurement error, sample selection, or omitted variables.
Block Bootstrap
A resampling technique designed for dependent (time-series) data. Instead of resampling individual observations, the block bootstrap resamples contiguous blocks to preserve the serial dependence structure within each block.
Bootstrap
A broad class of resampling methods that estimate the sampling distribution of a statistic by repeatedly drawing samples (with replacement) from the observed data. Bootstrapping is used to construct confidence intervals, estimate standard errors, and perform hypothesis tests.
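A minimal percentile-bootstrap sketch for a confidence interval on the mean (toy data and 2,000 resamples, chosen purely for illustration):

```python
import random

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for any statistic:
    resample with replacement, recompute the statistic, take quantiles."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(data) for _ in data]) for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

data = [2.1, 3.4, 2.8, 5.0, 3.9, 4.2, 2.5, 3.3, 4.8, 3.1]
lo, hi = bootstrap_ci(data, lambda s: sum(s) / len(s))
```

The interval brackets the sample mean; the same function works unchanged for the median or any other statistic passed as `stat`.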
Box Plot
A graphical summary of a dataset's distribution. The box spans the interquartile range (IQR), with a line at the median. Whiskers typically extend to 1.5 × IQR beyond the quartiles, and points beyond the whiskers are plotted individually as potential outliers.
C
Categorical Variable
A variable that takes a limited, fixed set of distinct values representing categories. Examples include gender, blood type, or region. Categorical data are typically summarised in contingency tables and analysed with chi-squared tests or logistic regression.
Coefficient of Variation (CV)
The ratio of the standard deviation to the arithmetic mean of a dataset, often expressed as a percentage. It provides a dimensionless measure of relative variability, making it useful for comparing dispersion across variables measured on different scales.
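A sketch of the computation, comparing dispersion across two hypothetical variables on different scales:

```python
import math

def cv_percent(x):
    """Coefficient of variation: sample standard deviation over the
    mean, expressed as a percentage."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return 100 * sd / mean

# Illustrative data: the raw standard deviations are not comparable
# (cm vs kg), but the CVs are.
heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [55.0, 70.0, 80.0, 95.0]
cv_h = cv_percent(heights_cm)  # ~7.8%
cv_w = cv_percent(weights_kg)  # ~22.4%
```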
Cointegration
A statistical property of two or more non-stationary time series that share a common stochastic trend, such that a specific linear combination of them is stationary. Cointegration is foundational in macroeconomic and financial modelling.
Confidence Interval
A range of values, computed from sample data, that is expected to contain the true population parameter with a specified probability (the confidence level). Confidence intervals quantify estimation uncertainty and are more informative than point estimates alone.
Confidence Level
The probability that a confidence interval contains the true parameter value. Common choices are 90%, 95%, and 99%. Higher confidence levels produce wider intervals.
Connectedness
A framework, pioneered by Diebold and Yilmaz (2012, 2014), for measuring how shocks transmit across variables in a system using forecast-error variance decompositions from a vector autoregression.
Correlation
A measure of the strength and direction of the linear association between two variables, ranging from -1 (perfect negative) to +1 (perfect positive). Correlation does not imply causation.
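The Pearson correlation coefficient follows directly from its definition (covariance over the product of standard deviations); a sketch on toy data showing the two extremes:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(x, [3.0, 5.0, 7.0, 9.0, 11.0])  # exact line: r = +1
r_neg = pearson_r(x, [9.0, 7.0, 5.0, 3.0, 1.0])   # exact line: r = -1
```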
Cox Proportional-Hazards Regression
A semi-parametric survival model that relates the hazard (instantaneous event rate) to one or more explanatory variables through a log-linear function. The model assumes proportional hazards.
Critical Region
The set of values of a test statistic for which the null hypothesis is rejected. It is defined so that, under the null, the probability of the test statistic falling in this region equals the chosen significance level.
Cross-Quantilogram
A measure introduced by Han, Linton, Oka, and Whang (2016) for detecting and quantifying directional predictability between two time series at specific quantiles.
D
Difference-in-Differences (DiD)
A quasi-experimental research design that estimates a causal effect by comparing changes over time between a treatment group and a control group. DiD is one of the most widely used methods in policy evaluation.
Divergence
In statistics and information theory, divergence is a measure of how one probability distribution differs from another. The Kullback-Leibler (KL) divergence is the most common variant.
E
Effect Size
A quantitative measure of the magnitude of a difference or relationship observed in the data. Common effect-size statistics include Cohen's d, the odds ratio, and R squared.
Endogeneity
A condition in which an explanatory variable in a regression model is correlated with the error term, leading to biased and inconsistent estimates. Common causes include omitted-variable bias, measurement error, and simultaneous causality.
Exogeneity
A property of a variable indicating that it is determined outside the model and is not correlated with the model's error term. Understanding exogeneity is critical for determining whether a model's estimates can be given a causal or structural interpretation.
F
Finite Population Correction
An adjustment to the standard error applied when the sample represents more than about 5% of the population. It accounts for the added precision that comes from sampling a large fraction of a finite population.
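The usual correction factor is sqrt((N - n) / (N - 1)), applied multiplicatively to the standard error; a sketch with illustrative population and sample sizes:

```python
import math

def fpc(N, n):
    """Finite population correction: multiply the usual standard error
    by this factor when sampling without replacement from a finite
    population of size N."""
    return math.sqrt((N - n) / (N - 1))

# Sampling 5% of the population barely changes the SE;
# sampling half of it shrinks the SE substantially.
small_fraction = fpc(10_000, 500)    # close to 1
large_fraction = fpc(10_000, 5_000)  # around 0.71
```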
Forecast-Error Variance Decomposition (FEVD)
A tool from VAR analysis that breaks down the forecast-error variance of each variable into the proportions attributable to shocks from each variable in the system.
G
GARCH
A class of time-series models that capture the tendency of financial returns to exhibit volatility clustering. The basic GARCH(1,1) model specifies the conditional variance as a function of past squared residuals and past conditional variances. GARCH models are essential in risk management, option pricing, and volatility forecasting.
Generalised Linear Model (GLM)
A flexible extension of ordinary linear regression that accommodates response variables with non-Normal error distributions. Logistic regression (binary outcomes) and Poisson regression (count data) are common special cases.
Gini Coefficient
A measure of inequality in a distribution, equal to half the relative mean absolute difference. It ranges from 0 (perfect equality) to 1 (maximal inequality). Most commonly applied to income or wealth distributions.
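The definition translates directly into a (naive, O(n²)) computation; a sketch with toy incomes:

```python
def gini(x):
    """Gini coefficient as half the relative mean absolute difference:
    mean of all pairwise |xi - xj|, divided by twice the mean."""
    n = len(x)
    mean = sum(x) / n
    mad = sum(abs(a - b) for a in x for b in x) / (n * n)
    return mad / (2 * mean)

equal   = gini([5.0, 5.0, 5.0, 5.0])  # perfect equality: 0
extreme = gini([0.0, 0.0, 0.0, 1.0])  # one person holds everything: 0.75
```

For n observations where one person holds everything, the Gini is (n - 1) / n, approaching 1 as n grows.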
Granger Causality
A concept of predictive causality: variable X is said to Granger-cause variable Y if past values of X contain information that helps predict Y beyond what is contained in Y's own past. Granger causality does not establish true causation in the structural sense.
Graph Theory
The mathematical study of networks composed of nodes (vertices) and edges (links). In econometrics and financial risk, graph-theoretic tools underpin network analysis of connectedness.
H
Heteroscedasticity
A condition in which the variance of the error term in a regression model is not constant across observations. When present, OLS estimates remain unbiased but standard errors become unreliable.
Histogram
A graphical representation of the distribution of a continuous variable, constructed by dividing the data range into bins and plotting the frequency of observations in each bin as adjacent rectangles.
Hypothesis Testing
A formal procedure for deciding between two competing statements (the null and alternative hypotheses) using sample data. Hypothesis testing is fundamental to virtually all empirical research.
I
Impulse Response Function (IRF)
A function that traces the dynamic effect of a one-unit shock to one variable in a VAR system on all variables over subsequent time periods. IRFs are essential for understanding the propagation and persistence of shocks.
Independent and Identically Distributed (IID)
A foundational assumption in many statistical methods requiring that observations are mutually independent and drawn from the same probability distribution.
Instrumental Variables (IV)
A method for obtaining consistent estimates when an explanatory variable is endogenous. An instrument is a variable that is correlated with the endogenous regressor (relevance) but uncorrelated with the error term (exclusion restriction).
Interquartile Range (IQR)
The difference between the 75th percentile and the 25th percentile of a dataset. It captures the spread of the middle 50% of the data and is robust to outliers.
K
Key Driver Analysis
A set of analytical techniques that identify which factors most strongly influence a business outcome. Methods include regression-based importance, Shapley values, and machine-learning-based feature importance.
L
Likert Scale
A symmetric, ordered response scale commonly used in surveys to measure attitudes or opinions. A typical five-point Likert scale ranges from Strongly Agree to Strongly Disagree with a neutral midpoint.
Log-Rank Test
A non-parametric hypothesis test used in survival analysis to compare the event-time distributions across two or more groups. It accounts for right-censored observations.
Logistic Transformation (Logit)
The transformation that maps a probability p to the log-odds: log(p / (1 - p)). This maps a bounded quantity to the entire real line, enabling linear modelling of binary outcomes.
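The logit and its inverse (the logistic function) form a round trip between probabilities and the real line; a minimal sketch:

```python
import math

def logit(p):
    """Map a probability in (0, 1) to log-odds on the real line."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Inverse (logistic) transformation: log-odds back to probability."""
    return 1 / (1 + math.exp(-z))

z = logit(0.8)          # log(4), about 1.386
p = inv_logit(z)        # recovers 0.8
midpoint = logit(0.5)   # p = 0.5 maps to log-odds of exactly 0
```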
M
Margin of Error
The maximum expected difference between the true population parameter and a sample estimate, typically expressed as plus or minus a percentage. A smaller margin of error demands a larger sample.
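For a sample proportion, the familiar formula is z·sqrt(p(1 - p)/n); a sketch with illustrative poll sizes showing why halving the margin requires quadrupling the sample:

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion
    at roughly 95% confidence (z = 1.96)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# A 1,000-person poll at p = 0.5 carries roughly +/- 3.1 points;
# quadrupling the sample size halves the margin.
moe_1000 = margin_of_error(0.5, 1000)
moe_4000 = margin_of_error(0.5, 4000)
```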
Maximum Entropy
A principle from information theory stating that the probability distribution which best represents current knowledge is the one with the highest entropy subject to the constraints imposed by observed data.
Maximum Likelihood Estimation (MLE)
An estimation method that selects parameter values maximising the likelihood function. MLE is the dominant estimation paradigm in modern statistics. Under regularity conditions, MLEs are consistent, asymptotically efficient, and asymptotically normal.
Mean (Arithmetic)
The sum of all values in a dataset divided by the number of observations. It is the most common measure of central tendency but is sensitive to outliers.
Mean Difference
The arithmetic mean of all pairwise absolute differences between observations. It is a measure of dispersion that gives less weight to extreme deviations compared with the standard deviation.
Median
The middle value of an ordered dataset. As a measure of central tendency, the median is more robust to outliers than the arithmetic mean, making it preferred for skewed distributions.
Mode
The most frequently occurring value in a dataset. It is the only measure of central tendency applicable to nominal (categorical) data.
Modern Portfolio Theory (MPT)
A framework introduced by Markowitz (1952) for constructing portfolios that maximise expected return for a given level of risk. MPT formalises diversification by showing that portfolio risk depends critically on the covariances between assets.
Monte Carlo Simulation
A computational technique that uses repeated random sampling to approximate the distribution of a quantity that is analytically intractable. In econometrics, Monte Carlo experiments study finite-sample properties of estimators and test statistics.
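A toy example of the idea: approximate a probability that is known in closed form (two fair dice summing to 7, true value 1/6) by simulating many draws:

```python
import random

# Monte Carlo estimate of P(two fair dice sum to 7); exact value is 1/6.
rng = random.Random(0)  # fixed seed for reproducibility
n_draws = 200_000
hits = sum(
    1 for _ in range(n_draws)
    if rng.randint(1, 6) + rng.randint(1, 6) == 7
)
estimate = hits / n_draws  # lands close to 0.1667
```

The simulation error shrinks at rate 1/sqrt(n_draws), which is why Monte Carlo studies typically use many thousands of replications.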
Multicollinearity
A condition in regression analysis where two or more predictor variables are highly correlated, making it difficult to isolate the individual effect of each. Variance inflation factors (VIFs) are commonly used for detection.
N
Nominal Variable
A type of categorical variable whose categories have labels but no inherent order. Examples include blood type, country of origin, and industry classification.
Normal Distribution
A continuous probability distribution characterised by its symmetric bell-shaped curve. The Normal distribution is central to statistics because the Central Limit Theorem guarantees that the sampling distribution of the mean approaches normality as the sample size grows.
Null Hypothesis
The default statement assumed to be true in a hypothesis test until the data provide sufficient evidence against it. It typically asserts no effect, no difference, or no association. Denoted H0.
O
Odds
The ratio of the probability of an event occurring to the probability of it not occurring: p / (1 - p). Odds are used extensively in logistic regression and medical statistics.
Odds Ratio
The ratio of the odds of an outcome in one group to the odds in another. An odds ratio of 1 indicates no association. Odds ratios are the natural effect measure in logistic regression.
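From a 2×2 table of counts the odds ratio is (a/b) / (c/d); a sketch with hypothetical counts:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table of counts:
                  event   no event
    group 1         a        b
    group 2         c        d
    """
    return (a / b) / (c / d)

# Hypothetical study: 20 of 100 in group 1 had the event (odds 20/80),
# versus 10 of 100 in group 2 (odds 10/90).
or_example = odds_ratio(20, 80, 10, 90)  # 0.25 / 0.111... = 2.25
```

Note the odds ratio (2.25 here) exceeds the corresponding risk ratio (0.20 / 0.10 = 2.0); the two only approximate each other when the event is rare.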
Ordinal Variable
A type of categorical variable whose categories have a meaningful order or ranking but where the intervals between categories are not necessarily equal.
Ordinary Least Squares (OLS)
The most widely used regression estimation method, which chooses coefficient estimates that minimise the sum of squared residuals. Under the Gauss-Markov assumptions, OLS is the Best Linear Unbiased Estimator (BLUE). Read more
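For a single regressor the least-squares solution has a closed form; a sketch on noiseless toy data, which OLS recovers exactly:

```python
def ols_simple(x, y):
    """Closed-form OLS intercept and slope for one regressor:
    slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - beta * mx, beta  # (intercept, slope)

# Data generated from y = 3 + 2x: the fitted line recovers it exactly.
alpha, beta = ols_simple([0.0, 1.0, 2.0, 3.0], [3.0, 5.0, 7.0, 9.0])
```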
P
P-value
The probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the data, assuming the null hypothesis is true. The p-value does not measure the probability that the null hypothesis is true.
Panel Data
A dataset that follows the same set of cross-sectional units over multiple time periods. Panel data combine cross-sectional variation with time-series dynamics, enabling researchers to control for unobserved heterogeneity.
Percentile
The value below which a given percentage of observations falls. The 50th percentile is the median, the 25th is the lower quartile, and the 75th is the upper quartile.
Phillips-Perron (PP) Test
A unit-root test that uses a non-parametric correction for serial correlation and heteroscedasticity rather than adding lagged differences. It is a useful complement to the ADF test.
Polychoric Correlation
An estimate of the correlation between two continuous latent variables underlying a pair of observed ordinal variables. Used in factor analysis and structural equation modelling.
Power (Statistical)
The probability that a hypothesis test correctly rejects the null hypothesis when a true effect exists. Power = 1 - beta. Adequate power (commonly 80% or higher) requires sufficient sample size.
Probability Density Function (PDF)
A function that describes the relative likelihood for a continuous random variable to take a given value. The total area under the density curve equals one.
Propensity Score Matching
A quasi-experimental technique that estimates the causal effect of a treatment by matching treated and untreated units with similar propensity scores. Widely used in policy evaluation and health economics.
Q
Quantile
A value that divides an ordered dataset into equal-sized subsets. Quantiles are robust to outliers and form the basis of quantile regression.
Quantile Regression
A regression framework that models the conditional quantiles of a response variable rather than the conditional mean. Quantile regression is robust to outliers and does not require assumptions about the error distribution.
Quantile VAR (QVAR)
An extension of the standard VAR model that allows dynamic interactions among variables to vary across quantiles of the distribution. Particularly relevant for stress testing and growth-at-risk exercises.
Quartile
The three values that divide an ordered dataset into four equal parts. Quartiles underpin the interquartile range and box plots.
R
ROC Area Under the Curve (AUC)
A summary statistic representing the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. AUC = 1.0 indicates perfect discrimination, while AUC = 0.5 is no better than random guessing.
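The probabilistic definition can be computed directly by comparing every (positive, negative) score pair, with ties counting half:

```python
# AUC as the fraction of (positive, negative) pairs in which the
# positive instance receives the higher score (ties count 0.5).
def auc(pos_scores, neg_scores):
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 0.8888...
```

This O(n·m) pairwise form is fine for illustration; production implementations rank the scores instead.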
Range
The difference between the maximum and minimum values in a dataset. It is the simplest measure of spread but is highly sensitive to outliers.
Receiver Operating Characteristic (ROC) Curve
A plot of the true positive rate against the false positive rate across all classification thresholds. The area under the curve (AUC) summarises overall discriminative performance.
Regression Discontinuity Design (RDD)
A quasi-experimental method that exploits a threshold in an assignment variable to estimate causal effects. RDD is considered one of the most credible non-experimental designs.
Regularisation
A set of techniques that prevent overfitting by adding a penalty term to the model's loss function. Common forms include L1 (Lasso) and L2 (Ridge) regularisation.
Relative Mean Difference
The mean absolute difference between all pairs of values, divided by the arithmetic mean. It is directly related to the Gini coefficient, which equals half the relative mean difference.
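A small sketch using the convention that the mean absolute difference averages over all n² ordered pairs (including self-pairs), which is the convention under which the Gini coefficient equals exactly half the relative mean difference:

```python
from itertools import combinations
from statistics import mean

def relative_mean_difference(data):
    """Mean absolute difference over all n^2 ordered pairs, divided
    by the arithmetic mean of the data."""
    n = len(data)
    # combinations gives unordered pairs; double to count both orders.
    mad = sum(abs(a - b) for a, b in combinations(data, 2)) * 2 / (n * n)
    return mad / mean(data)

incomes = [1, 2, 3, 4]
rmd = relative_mean_difference(incomes)
print(rmd)      # 0.5
print(rmd / 2)  # 0.25  (the Gini coefficient)
```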
Robust Standard Errors
Standard error estimates that remain valid under violations of classical assumptions such as homoscedasticity or independence.
S
Sensitivity Testing
The study of how variation in the inputs or assumptions of a model affects its output. Sensitivity analysis is used to assess model robustness.
Significance Level
The pre-specified probability threshold below which the null hypothesis is rejected. Common choices are 0.05 and 0.01. It also equals the type I error rate.
Skewness
A measure of the asymmetry of a distribution. Right-skewed (positive) distributions have a longer right tail. Highly skewed data may call for transformations or non-parametric techniques.
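A sketch of the moment-based (population) skewness, E[(x - mu)^3] / sigma^3, which is zero for symmetric data and positive when the right tail is longer:

```python
import statistics

def skewness(data):
    """Moment-based (population) skewness: E[(x - mu)^3] / sigma^3."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    n = len(data)
    return sum((x - mu) ** 3 for x in data) / (n * sigma ** 3)

print(skewness([1, 2, 3, 4, 5]))   # 0.0 (symmetric)
print(skewness([1, 1, 2, 2, 10]))  # positive (long right tail)
```

Note that sample skewness estimators often apply a small-sample correction factor; the population formula above is the simplest variant.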
Spurious Regression
A phenomenon in which a regression between two independent non-stationary time series produces apparently significant results even though no genuine relationship exists. Cointegration tests and differencing are the standard safeguards.
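The effect is easy to reproduce by simulation: two independent random walks frequently show large sample correlations purely by chance. A hypothetical sketch (the seed and sample size are arbitrary):

```python
import random

random.seed(42)

def random_walk(n):
    """Cumulative sum of independent standard-normal shocks."""
    x, path = 0.0, []
    for _ in range(n):
        x += random.gauss(0, 1)
        path.append(x)
    return path

def correlation(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

x = random_walk(500)
y = random_walk(500)
# Often far from zero despite the series being independent by construction.
print(correlation(x, y))
```

Differencing both series before regressing, or testing for cointegration, removes the spurious inference.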
Standard Deviation
The square root of the variance, measuring how dispersed values are around the mean. It shares the same units as the original data, making it more interpretable than the variance.
Standard Error
The standard deviation of the sampling distribution of a statistic. It quantifies how precisely the sample estimate represents the corresponding population parameter.
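For the most common case, the sample mean, the standard error is the sample standard deviation divided by the square root of the sample size:

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [12, 15, 11, 14, 13, 15, 12, 14]
print(standard_error(sample))
```

Larger samples shrink the standard error, reflecting greater precision of the estimate.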
Stationarity
A time series is weakly stationary if its mean, variance, and autocovariance structure do not change over time. Stationarity is a critical assumption in most time-series models.
Stationary Bootstrap
A resampling method designed for stationary time series that constructs pseudo-time series by concatenating blocks of random length drawn from a geometric distribution.
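A minimal sketch of one resample, following the usual scheme of geometric block lengths with mean 1/p and circular wrap-around at the end of the series:

```python
import random

def stationary_bootstrap(series, p, seed=None):
    """One pseudo-series: concatenate blocks starting at random
    positions, with geometric block lengths (mean 1/p), wrapping
    around the end of the original series."""
    rng = random.Random(seed)
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(n)
        length = 1
        while rng.random() >= p:  # geometric: P(len = k) = (1-p)^(k-1) p
            length += 1
        for k in range(length):
            out.append(series[(start + k) % n])
            if len(out) == n:
                break
    return out

resample = stationary_bootstrap(list(range(20)), p=0.2, seed=1)
print(resample)
```

Repeating this many times and recomputing the statistic of interest on each resample yields bootstrap standard errors that respect the serial dependence of the data.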
Statistical Significance
A result is statistically significant when the p-value falls below the pre-determined significance level. Statistical significance does not necessarily imply practical importance.
Stepwise Regression
A systematic approach to variable selection that iteratively adds or removes predictors based on a criterion such as AIC, BIC, or p-values.
Structural Breaks
Abrupt shifts in the parameters of a statistical model. Detecting structural breaks is vital in macroeconomic and policy analysis, where regime changes alter the data-generating process.
Survival Analysis
A branch of statistics focused on modelling the time until an event occurs. Key tools include Kaplan-Meier survival curves, log-rank tests, and Cox proportional-hazards regression.
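A minimal sketch of the Kaplan-Meier estimator: at each distinct event time the survival probability is multiplied by (1 - deaths / number at risk), while censored observations simply leave the risk set. The follow-up data below are hypothetical.

```python
def kaplan_meier(times, events):
    """times: observation times; events: 1 = event occurred, 0 = censored.
    Returns [(time, survival_probability)] at each event time."""
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    survival, curve = 1.0, []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = at_t = 0
        while i < len(order) and order[i][0] == t:  # group ties at time t
            at_t += 1
            deaths += order[i][1]
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= at_t  # events and censored both leave the risk set
    return curve

# Hypothetical follow-up data: 0 marks a censored observation.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

The resulting step function drops only at event times, which is why Kaplan-Meier curves have their characteristic staircase shape.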
T
Type I and Type II Errors
A Type I error (false positive) occurs when the null hypothesis is incorrectly rejected. A Type II error (false negative) occurs when a false null hypothesis is not rejected. Reducing one type typically increases the other.
U
Unit Root
A characteristic of a time series whose autoregressive polynomial has a root equal to one, implying that shocks have a permanent effect and the series is non-stationary.
Univariate Analysis
The analysis of a single variable at a time, focusing on describing its distribution, central tendency, and dispersion. Univariate analysis is typically the first stage of data exploration.
V
Variance
The arithmetic mean of the squared deviations from the mean. It measures the spread of a distribution. The variance is the square of the standard deviation.
Variance Decomposition
See Forecast-Error Variance Decomposition (FEVD). In the connectedness literature, variance decompositions from a VAR can be interpreted as a network showing how shocks transmit across variables.
Vector Autoregression (VAR)
A multivariate time-series model in which each variable is regressed on its own lags and the lags of every other variable in the system. VARs are foundational in macroeconomic forecasting and financial-connectedness research.
Volatility Spillover
The transmission of volatility (risk) from one market or asset to another. Diebold and Yilmaz (2012) proposed a framework based on generalised VAR forecast-error variance decompositions to measure total and directional volatility spillovers.