Unit 3: Evaluating Models
What is Model Evaluation?
Model evaluation means checking how well a machine learning model performs using different evaluation metrics.
✔ Why is it important?
Helps us know if the model is performing well.
Works like a report card for the AI model.
Gives feedback → so we can improve the model.
Helps us select the best model.
Why Do We Need Model Evaluation?
Model evaluation
Tells the strengths and weaknesses of a model
Shows how well a model will work on future / unseen data
Helps build reliable and trustworthy AI systems
Is a necessary step before using the model in real life
Just like a school report card helps students improve,
model evaluation helps AI models improve.
Train–Test Split (Evaluation Technique)
✔ What is train-test split?
It is a method to check a model’s performance by dividing the dataset into:
Training set → Used to teach the model
Testing set → Used to check the model
✔ Why is train-test split needed?
To check how the model performs on new data
To avoid overfitting (when a model memorizes the training data and fails on new data)
To estimate future performance
To build a model that predicts correctly for unseen cases
✔ Key Point:
You should not test the model on the same data used for training.
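A minimal sketch of a train–test split in Python, assuming scikit-learn and a small made-up dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# A small made-up dataset, just for illustration.
X, y = make_classification(n_samples=200, random_state=42)

# Keep 80% of the rows for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)                            # teach the model on the training set
print(accuracy_score(y_test, model.predict(X_test)))   # check it on unseen test data
```

The model never sees the 20% test rows during training, so the score above estimates how it will behave on new data.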
Accuracy and Error
✔ Bob & Billy Example (Simple Understanding)
Entry fee = ₹500
Bob brings ₹300 → error = 500 – 300 = 200
Billy brings ₹550 → error = 550 – 500 = 50
Billy is more accurate because he is closer to the correct amount.
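The same idea as a tiny Python sketch, using the numbers from the example above:

```python
entry_fee = 500                # actual value
bob, billy = 300, 550          # amounts ("predictions") brought by Bob and Billy

print(abs(entry_fee - bob))    # error = 200
print(abs(entry_fee - billy))  # error = 50 → smaller error, so Billy is more accurate
```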
What is Accuracy?
Accuracy tells us how many predictions the model got correct.

Accuracy = Correct Predictions / Total Predictions
✔ Higher accuracy = better model performance.
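A minimal sketch of the accuracy formula in Python, using made-up lists of actual and predicted labels:

```python
actual    = ["Yes", "No", "Yes", "Yes", "No"]
predicted = ["Yes", "No", "No",  "Yes", "No"]

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)   # Correct Predictions / Total Predictions
print(accuracy)                    # 4 correct out of 5 → 0.8 (80%)
```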
What is Error?
Error is the difference between the predicted value and actual value.
It shows how wrong the model’s prediction is.
✔ Goal: Minimize error
✔ Example:
If the model says “no disease” but the person actually has a disease → this is an error.
Evaluation helps choose the best model and avoid overfitting.
EVALUATION METRICS FOR CLASSIFICATION
What is Classification?
Classification means sorting or grouping items into different categories.
Simple Example:
You are in a supermarket with two trolleys:
One trolley → fruits & vegetables
Another trolley → grocery items
You are classifying items into:
Fruits/Vegetables
Grocery
Definition
Classification is a machine learning task where the model predicts a class label based on input data.
Examples of Classification
Predicting whether an item is a vegetable or grocery
Email: spam or not spam
Predicting whether a patient has a disease (Yes / No)
Credit card fraud detection (Fraud / Not Fraud)
Classification Metrics
These are methods to measure how good a classification model is.
Common metrics are:
Confusion Matrix
Accuracy
Precision
Recall
F1 Score
Confusion Matrix
A confusion matrix shows:
Actual values on the Y-axis
Predicted values on the X-axis
It contains four important terms:
| | Predicted Yes | Predicted No |
|---|---|---|
| Actual Yes | TP | FN |
| Actual No | FP | TN |
Meaning of the Four Terms
1. True Positive (TP)
Model predicted Yes → actually Yes
Example: You predicted a person has the disease, and they really have it.
2. True Negative (TN)
Model predicted No → actually No
Example: You predicted a person does not have the disease, and they are healthy.
3. False Positive (FP)
Model predicted Yes → actually No
Wrong prediction (Type-I error)
Example: You predicted a person has the disease, but they are healthy.
4. False Negative (FN)
Model predicted No → actually Yes
Very dangerous (Type-II error)
Example: You predicted a patient is healthy, but they actually have the disease.
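As a sketch, the four terms can be read off scikit-learn's confusion_matrix; the actual and predicted labels below are made up for illustration, with 1 = Yes and 0 = No:

```python
from sklearn.metrics import confusion_matrix

# 1 = Yes (has the disease), 0 = No (healthy) — made-up labels for illustration
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

# With labels ordered [0, 1], ravel() returns TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(actual, predicted, labels=[0, 1]).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)   # TP: 3 TN: 3 FP: 1 FN: 1
```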
Accuracy
Accuracy tells us how many predictions were correct.
✔ When is accuracy useful?
When the dataset is balanced (roughly equal number of Yes/No examples)
⚠ When NOT to use accuracy?
When the dataset is unbalanced
Example in textbook:
900 Yes, 100 No
Model predicts everything “Yes” → still 90% accuracy (but it's useless)
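The same example worked out as a small sketch:

```python
actual_yes, actual_no = 900, 100    # unbalanced dataset from the textbook example
correct = actual_yes                # a model that always predicts "Yes" gets only the Yes cases right
accuracy = correct / (actual_yes + actual_no)
print(accuracy)                     # 0.9 → 90% accuracy, yet every actual "No" is misclassified
```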
Precision
Precision tells us, out of all predicted YES, how many were actually YES.
✔ When to use Precision?
When False Positives (FP) must be minimized.
Example use case: Satellite Launch
Predicting bad weather as good weather is dangerous
FP must be avoided → use Precision
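Precision = TP / (TP + FP). A minimal sketch with made-up counts:

```python
tp, fp = 40, 10                 # made-up counts, just for illustration
precision = tp / (tp + fp)      # Precision = TP / (TP + FP)
print(precision)                # 0.8 → 80% of the predicted "Yes" cases were really "Yes"
```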
Recall (Sensitivity / True Positive Rate)
Recall tells us, out of all actual YES, how many were predicted correctly.
✔ When to use Recall?
When False Negatives (FN) must be minimized.
Example use case: COVID-19 diagnosis
Predicting a sick person as healthy (FN) is dangerous
Recall must be high → use Recall
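Recall = TP / (TP + FN). A minimal sketch with made-up counts:

```python
tp, fn = 40, 20                 # made-up counts, just for illustration
recall = tp / (tp + fn)         # Recall = TP / (TP + FN)
print(round(recall, 2))         # 0.67 → the model found about 67% of all actual "Yes" cases
```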
F1 Score
F1 Score combines both Precision and Recall into one value.
✔ When to use F1 Score?
The dataset is unbalanced
You can't decide whether FP or FN is more important
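F1 Score is the harmonic mean of Precision and Recall: F1 = 2 × Precision × Recall / (Precision + Recall). A small sketch continuing the made-up numbers above:

```python
precision, recall = 0.8, 0.67                        # made-up values from the sketches above
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
print(round(f1, 2))                                  # 0.73 → one number that balances both metrics
```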
Classification Metric Selection – Summary Table
| Scenario | Most Important Metric | Why? |
|---|---|---|
| Satellite launch (Bad weather predicted as good = dangerous) | Precision | Avoid FP |
| COVID-19 detection (Sick person predicted as healthy = risky) | Recall | Avoid FN |
| Fraud detection | Recall | Missing a fraud (FN) is costly |
| Balanced dataset | Accuracy | All errors equally important |
| Unbalanced dataset & unsure | F1 Score | Balanced measure |
Ethical Concerns in Model Evaluation
While evaluating a model, we must consider:
✔ Fairness
Ensure the model is not biased toward any group.
✔ Privacy
Do not expose personal or sensitive data.
✔ Transparency
The decision-making logic should be explainable.
✔ Avoid Harm
Wrong predictions should not result in physical, emotional, or financial harm.
✔ Accountability
Developers must take responsibility for model errors.