F-Statistic Explanation
User: Thava Alagu (thavamuni@gmail.com)
Created: 3/25/2025 14:12
what is F statistics ?
The F-statistic is a ratio of two variances used in statistical analysis, particularly in ANOVA (Analysis of Variance) and regression analysis. It helps determine whether group means are significantly different from each other or if a regression model is statistically significant.
$$ F = \frac{\text{variance between groups}}{\text{variance within groups}} $$

or in terms of mean squares:

$$ F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)} $$

where:

- $SS_{\text{between}}, SS_{\text{within}}$ = between-group and within-group sums of squares
- $k$ = number of groups
- $N$ = total number of observations
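To make this concrete, here is a minimal sketch of a one-way ANOVA F-test in Python using SciPy's f_oneway (the three groups below are made-up illustrative data):

from scipy.stats import f_oneway

# Three hypothetical groups (illustrative data only)
group_a = [23, 25, 27, 22, 26]
group_b = [30, 31, 29, 32, 28]
group_c = [24, 26, 25, 27, 23]

# f_oneway performs a one-way ANOVA and returns (F-statistic, p-value)
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")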
Would you like a fuller worked example of an F-test calculation? 🚀
How is F critical value calculated ?
The F-critical value is a threshold from the F-distribution that helps determine whether to reject the null hypothesis in an F-test. It depends on three key factors:
1. Significance level (α) – typically 0.05 (5%) or 0.01 (1%).
2. Degrees of freedom (df1, df2) – df1 is the numerator (between-group) degrees of freedom, df2 the denominator (within-group) degrees of freedom.
3. An F-distribution table or statistical function – the critical value is read from a table or computed in software.

To find it:

1. Identify df1 and df2 for your test.
2. Choose a significance level (α), e.g. 0.05.
3. Use an F-table or statistical software: look up the value where the df1 column meets the df2 row.
Or use Python/R:
Python (SciPy):
from scipy.stats import f
alpha = 0.05 # 5% significance level
df1 = 3 # Example numerator df
df2 = 20 # Example denominator df
f_critical = f.ppf(1 - alpha, df1, df2)
print(f_critical)
R:
qf(0.95, 3, 20)  # alpha = 0.05, df1 = 3, df2 = 20, as in the Python example
Example: suppose we run a one-way ANOVA with k = 4 groups and N = 24 total observations, so the degrees of freedom are:

- df1 = k − 1 = 3
- df2 = N − k = 20

From an F-table or Python, the F-critical value ≈ 3.10 at α = 0.05. If the computed F-statistic exceeds 3.10, we reject the null hypothesis.
Would you like me to compute an F-critical value for your specific case? 📊
There is an assumption that the F-statistic itself follows certain distribution -- what is the shape of such distribution ?
The F-statistic follows the F-distribution, which is a right-skewed (positively skewed) distribution. The shape of the F-distribution depends on the degrees of freedom (df1, df2).
Key properties:

- Right-skewed: the distribution has a long right tail, especially when the degrees of freedom are small.
- Non-negative: F is a ratio of variances, which cannot be negative, so F ≥ 0.
- Depends on degrees of freedom: small df1 and df2 give a strongly skewed shape; larger values make it more symmetric and bell-shaped.
- Mean of the F-distribution: the expected value is

  $$ E[F] = \frac{df_2}{df_2 - 2}, \quad df_2 > 2 $$

  As df1 and df2 increase, the F-distribution approaches a normal distribution.
Here’s how the F-distribution looks for different degrees of freedom:
| df1 | df2 | Shape of F-Distribution |
|---|---|---|
| 1 | 5 | Highly skewed right |
| 2 | 10 | Moderately skewed |
| 10 | 30 | Nearly normal |
| 30 | 100 | Almost symmetric |
🔹 Higher df → More normal-like
🔹 Lower df → More skewed
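Here's a minimal Python sketch that plots these shapes (assuming matplotlib and SciPy are available; the (df1, df2) pairs come from the table above):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

x = np.linspace(0.01, 4, 400)

# (df1, df2) pairs from the table above
for df1, df2 in [(1, 5), (2, 10), (10, 30), (30, 100)]:
    plt.plot(x, f.pdf(x, df1, df2), label=f"df1={df1}, df2={df2}")

plt.title("F-distribution for different degrees of freedom")
plt.xlabel("F value")
plt.ylabel("Density")
plt.legend()
plt.show()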
Would you like an R snippet to visualize the F-distribution as well? 📊
What is T statistic
The t-statistic is a measure used in t-tests to determine if there is a significant difference between the means of two groups (or between a sample mean and a hypothesized value). It is used to test hypotheses when sample sizes are small and the population variance is unknown.
For a one-sample t-test (comparing a sample mean $\bar{x}$ to a hypothesized population mean $\mu$):

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$

where:

- $\bar{x}$ = sample mean
- $\mu$ = hypothesized population mean
- $s$ = sample standard deviation
- $n$ = sample size

For a two-sample t-test (comparing means of two independent samples):

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

where:

- $\bar{x}_1, \bar{x}_2$ = sample means
- $s_1^2, s_2^2$ = sample variances
- $n_1, n_2$ = sample sizes
A larger absolute t-value means a greater difference between groups relative to the variability in the data.

The p-value (calculated from the t-distribution) determines significance: if p < α (e.g. 0.05), the difference is statistically significant. A sketch of the two-sample case in code follows.
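A minimal sketch of the two-sample case (made-up data; equal_var=False requests Welch's test, which matches the unpooled formula above):

from scipy.stats import ttest_ind

# Hypothetical independent samples (illustrative only)
sample_a = [5.1, 4.9, 5.3, 5.0, 5.2]
sample_b = [4.6, 4.8, 4.5, 4.7, 4.9]

# Welch's t-test: does not assume equal variances
t_stat, p_value = ttest_ind(sample_a, sample_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")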
Would you like an example of a t-test calculation? 📊
yes
Let’s say a company claims that the average salary of employees is $50,000 per year. We take a random sample of n = 10 employees and find:

- Sample mean: $\bar{x}$ = $48,800
- Sample standard deviation: $s$ = $2,000

We want to test whether the actual mean salary differs significantly from $50,000 (H₀: μ = 50,000 vs. H₁: μ ≠ 50,000).
Using the formula:

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{48{,}800 - 50{,}000}{2{,}000 / \sqrt{10}} \approx -1.90 $$

For a one-tailed t-test at α = 0.05 with df = n − 1 = 9, we look up the t-table or use Python:
from scipy.stats import t
alpha = 0.05
df = 9
t_critical = t.ppf(1 - alpha, df)
print(t_critical) # Output: 1.833
The one-tailed critical t-value is 1.833; for the two-tailed test appropriate here ("differs from"), it is ±2.262.
Since |t| < t-critical (1.90 < 2.262), we fail to reject the null hypothesis.
Conclusion: There is no significant evidence that the true mean salary is different from $50,000.
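We can verify this in Python with scipy.stats.ttest_1samp. The sample below is illustrative, constructed so its mean is $48,800 and its standard deviation is $2,000, matching the example: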
from scipy.stats import ttest_1samp

# Illustrative sample with mean 48,800 and s = 2,000,
# matching the worked example above
sample_data = [51800, 45800, 50800, 46800, 50800,
               46800, 49800, 47800, 48800, 48800]
pop_mean = 50000

t_stat, p_value = ttest_1samp(sample_data, pop_mean)
print(f"T-Statistic: {t_stat:.3f}, P-Value: {p_value:.3f}")
# t ≈ -1.897, p ≈ 0.090 → fail to reject H0 at α = 0.05
Would you like a two-sample t-test example as well? 🚀
what is p-value in the context of F-Statistic or T-Statistic ?
The p-value is a probability that measures the strength of evidence against the null hypothesis (H₀). It is the probability of observing a test statistic at least as extreme as the one computed, assuming H₀ is true.

- Small p-value (p < 0.05) → strong evidence against H₀ → reject H₀.
- Large p-value (p > 0.05) → weak evidence against H₀ → fail to reject H₀.

Common thresholds: 0.05 (standard), 0.01 (strict), 0.10 (lenient).
For a t-test, the p-value represents the probability of observing a t-statistic as extreme as the calculated one if the null hypothesis (H₀: the means are equal) is true.

Example: for t = 2.1 with df = 10, the two-tailed p-value ≈ 0.06, so at α = 0.05 we would fail to reject H₀.

For an F-test (as in ANOVA), the p-value is the probability of observing an F-statistic as large as the calculated one if the group means are all equal (H₀ true).

Example: for F = 4.2 with df1 = 3, df2 = 20, the p-value ≈ 0.02, so at α = 0.05 we would reject H₀.

In Python, for the t-test example:
from scipy.stats import t
t_stat = 2.1
df = 10
p_value = 2 * (1 - t.cdf(t_stat, df)) # Two-tailed test
print(f"P-Value: {p_value:.4f}")
from scipy.stats import f
F_stat = 4.2
df1, df2 = 3, 20
p_value = 1 - f.cdf(F_stat, df1, df2)
print(f"P-Value: {p_value:.4f}")
Would you like an example where we compute p-values from real data? 🚀
what is chi-square test
The Chi-Square Test (χ² test) is a statistical test used to determine whether there is a significant association between categorical variables. It is commonly used in contingency tables, goodness-of-fit tests, and tests of independence.
There are two main types:

1. Chi-Square Goodness-of-Fit Test – checks whether the observed frequencies of a single categorical variable match an expected distribution.
2. Chi-Square Test for Independence (Contingency Table Test) – checks whether two categorical variables are associated.

Both use the same statistic:

$$ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} $$

where:

- $O_i$ = observed frequency in category i
- $E_i$ = expected frequency in category i under H₀
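Before the independence example, here is a quick sketch of the goodness-of-fit variant (hypothetical counts from 120 rolls of a die; by default scipy.stats.chisquare tests against equal expected frequencies, i.e. a fair die):

from scipy.stats import chisquare

# Hypothetical counts of faces 1-6 from 120 rolls; H0: the die is fair,
# so each face is expected 20 times
observed = [18, 24, 16, 22, 19, 21]

chi2, p = chisquare(observed)
print(f"Chi-Square: {chi2:.2f}, P-Value: {p:.4f}")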
Let's say we survey 200 people about their preference for Tea or Coffee based on their Gender (Male/Female).
| | Tea | Coffee | Total |
|---|---|---|---|
| Male | 30 | 70 | 100 |
| Female | 50 | 50 | 100 |
| Total | 80 | 120 | 200 |
We test whether Gender and Beverage Choice are independent.
Step 1 – Expected frequencies, using $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$:

- Male & Tea: $E = (100 \times 80) / 200 = 40$
- Male & Coffee: $E = (100 \times 120) / 200 = 60$
- Female & Tea: $E = (100 \times 80) / 200 = 40$
- Female & Coffee: $E = (100 \times 120) / 200 = 60$

Step 2 – Chi-square statistic:

$$ \chi^2 = \frac{(30-40)^2}{40} + \frac{(70-60)^2}{60} + \frac{(50-40)^2}{40} + \frac{(50-60)^2}{60} \approx 8.33 $$

With $df = (2-1)(2-1) = 1$, the critical value at α = 0.05 is 3.84. Since 8.33 > 3.84, we reject H₀: gender and beverage choice are not independent.

Using Python:
import scipy.stats as stats
import numpy as np
# Observed data
observed = np.array([[30, 70], [50, 50]])
# Perform Chi-Square test. Note: chi2_contingency applies Yates'
# continuity correction to 2x2 tables by default; correction=False
# matches the hand calculation above (chi-square ≈ 8.33)
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"Chi-Square Statistic: {chi2:.2f}")
print(f"P-Value: {p:.4f}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies:\n", expected)
Would you like an example for a goodness-of-fit test? 🚀