How JobGuard works
A machine learning system trained on roughly 18,000 real and fake job postings to protect job seekers from scams.
The problem we're solving
Every day, thousands of job seekers fall victim to fraudulent job postings. These scams steal personal information, charge upfront fees, and can waste months of a victim's time. JobGuard uses machine learning to flag likely scams in seconds, before you apply.
How the ML model works
The model uses a two-stage pipeline. First, a TF-IDF vectorizer converts the job text into 15,000 numerical features, each one representing how important a word or phrase is in that posting. Then a Logistic Regression classifier weighs each feature and outputs the probability that the posting is fake.
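As a rough illustration of the second stage, the sketch below shows how a logistic regression turns weighted features into a probability. The function name `fake_probability`, the weights, and the feature values are all hypothetical, chosen only to show the arithmetic; they are not JobGuard's learned coefficients.

```python
import math

def fake_probability(features, weights, bias):
    """Return the logistic-regression probability that a posting is fake.

    features: dict mapping a term to its TF-IDF value in this posting
    weights:  dict mapping a term to its learned coefficient
    bias:     the learned intercept
    """
    # Weighted sum over the terms present in the posting; unknown terms add 0.
    z = bias + sum(value * weights.get(term, 0.0)
                   for term, value in features.items())
    # Sigmoid, written in a form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients: positive pushes towards "fake", negative towards "real".
weights = {"no experience": 2.0, "apply now": 1.5, "bachelor degree": -1.8}
posting = {"no experience": 0.6, "apply now": 0.5}

probability = fake_probability(posting, weights, bias=-1.0)
print(f"Probability of being fake: {probability:.0%}")
```

Terms with positive weights raise the score and terms with negative weights lower it, which is why a posting can be read off as a sum of individual signals.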
Text cleaning
All seven fields (title, company, location, salary, description, requirements, benefits) are combined into one text string. HTML tags and special characters are removed.
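A minimal sketch of this cleaning step, using only the Python standard library. The function name `clean_posting`, the dictionary layout, and the lowercasing step are assumptions made for illustration; the exact rules JobGuard applies may differ.

```python
import html
import re

# The seven fields named above, in the order they are joined.
FIELDS = ("title", "company", "location", "salary",
          "description", "requirements", "benefits")

def clean_posting(posting):
    """Merge the seven fields into one lowercase string without markup."""
    # Missing or empty fields contribute nothing.
    text = " ".join(str(posting.get(field) or "") for field in FIELDS)
    text = re.sub(r"<[^>]+>", " ", text)               # drop HTML tags
    text = html.unescape(text)                         # decode entities such as &amp;
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # drop special characters
    return re.sub(r"\s+", " ", text).strip()           # collapse whitespace

example = {
    "title": "Data Entry Clerk",
    "description": "<p>Work from home &amp; earn $5,000/week!</p>",
}
cleaned = clean_posting(example)
print(cleaned)  # data entry clerk work from home earn 5 000 week
```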
TF-IDF vectorizer
Converts text into 15,000 numbers using unigrams (single words) and bigrams (two-word phrases). Bigrams like "no experience" and "apply now" are far stronger signals than single words alone.
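To make the idea concrete, here is a small pure-Python sketch of TF-IDF over unigrams and bigrams. It assumes one common smoothed IDF variant with length normalisation, and it does not apply the 15,000-feature cap; the helper names `ngrams` and `tfidf` are hypothetical, and a production vectorizer will differ in detail.

```python
import math
from collections import Counter

def ngrams(text):
    """Return the unigrams and bigrams of a whitespace-tokenised string."""
    words = text.split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams

def tfidf(documents):
    """Return one {term: weight} dict per document."""
    counts = [Counter(ngrams(doc)) for doc in documents]
    n_docs = len(documents)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for count in counts for term in count)
    vectors = []
    for count in counts:
        # Assumed weighting: term count times a smoothed inverse document frequency.
        vec = {term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
               for term, tf in count.items()}
        # Scale each document to unit length so long postings do not dominate.
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors

docs = ["no experience required apply now",
        "five years experience required"]
vectors = tfidf(docs)

# "experience" occurs in both postings, "no experience" in only one,
# so the bigram receives the larger weight in the first posting.
print(vectors[0]["no experience"] > vectors[0]["experience"])
```

The bigram is rarer across the collection than either of its words, so it carries more weight, which matches the intuition that phrases separate scams from real postings better than isolated words.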
Logistic Regression
Trained with class_weight='balanced' to handle the imbalanced dataset (only 4.8% fake), so that mistakes on the rare fake postings count for more during training. Outputs a probability score, which is the confidence percentage you see in the result.
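The effect of balanced weighting can be sketched with the usual heuristic, in which each class is weighted by n_samples / (n_classes * class_count). The class counts below are approximations derived from the figures quoted in this document (4.8% of 17,880 postings), not exact dataset counts, and the function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to how often it occurs."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Approximate counts implied by "4.8% fake" out of 17,880 postings.
n_total = 17_880
n_fake = round(0.048 * n_total)
labels = ["fake"] * n_fake + ["real"] * (n_total - n_fake)

class_weights = balanced_class_weights(labels)
for label, weight in class_weights.items():
    print(f"{label}: {weight:.2f}")
```

Under these assumed counts each fake posting weighs roughly twenty times as much as a real one, so both classes contribute equally to the training loss overall.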
Signals the model looks for
Fake job signals
- "No experience required"
- Unrealistically high salary
- "Immediate start", "apply now"
- Vague company descriptions
- "No background check"
- Weekly PayPal payments
- Generic job descriptions
Real job signals
- Specific degree requirements
- Market-aligned salary ranges
- Years of experience specified
- Verifiable company names
- Standard benefits (401k, health)
- Technical skills listed
- Professional language throughout
The dataset
This project uses EMSCAD, the Employment Scam Aegean Dataset, which is available on Kaggle. It contains 17,880 real-world job postings collected between 2012 and 2014, each labeled as legitimate or fraudulent by human reviewers.
Tech stack
Important disclaimer
JobGuard is an AI-powered tool that reaches 97.4% accuracy on the test dataset. Its results are not guaranteed, and it should not be your only method of verifying a job posting. Always research the company independently, never pay upfront fees, and never share sensitive personal information before verifying a company's legitimacy.