About this project

How JobGuard works

A machine learning system trained and tested on 17,880 real and fake job postings to protect job seekers from scams.

The problem we're solving

Every day, thousands of job seekers fall victim to fraudulent job postings. These scams steal personal information, charge upfront fees, and can waste months of a person's time. JobGuard uses machine learning to flag likely scams in under a second, before you apply.

17,880 jobs in the dataset
97.4% model accuracy
4.8% fraud rate in the dataset
< 1 s detection speed

How the ML model works

After the job text is cleaned, the model uses a two-stage pipeline. First, a TF-IDF vectorizer converts the text into 15,000 numerical features, each one representing how important a word or phrase is in that posting. Then a Logistic Regression classifier weighs each feature and outputs the probability that the job is fake.
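As a rough illustration of the second stage, the sketch below shows how a logistic regression turns weighted features into a probability. The feature names, weights, and bias are made-up values chosen for illustration; they are not JobGuard's learned coefficients.

```python
import math

# Hypothetical weights: positive values push the score toward "fake".
WEIGHTS = {"no experience": 2.1, "work from home": 1.4, "degree": -1.2}
BIAS = -3.0  # hypothetical intercept


def fake_probability(features: dict[str, float]) -> float:
    """Return P(fake) = sigmoid(bias + sum(weight * feature value))."""
    score = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


# TF-IDF-style feature values for two imaginary postings.
suspicious = fake_probability({"no experience": 0.6, "work from home": 0.5})
ordinary = fake_probability({"degree": 0.7})
```

With these assumed weights, the posting containing the scam-like phrases receives a higher probability than the one mentioning a degree.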

1. Text cleaning

All 7 fields (title, company, location, salary, description, requirements, benefits) are combined into one text string. HTML tags and special characters are then removed.
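A minimal sketch of this cleaning step, using only the Python standard library. The dictionary keys, the lowercasing, and the exact regular expressions are assumptions made for illustration; the project's real cleaning code may differ.

```python
import html
import re

# Assumed field names, in the order listed above.
FIELDS = ["title", "company", "location", "salary", "description", "requirements", "benefits"]


def clean_posting(posting: dict[str, str]) -> str:
    """Join the seven fields, then strip HTML tags and special characters."""
    text = " ".join(posting.get(field) or "" for field in FIELDS)
    text = re.sub(r"<[^>]+>", " ", text)              # drop HTML tags
    text = html.unescape(text)                        # decode entities such as &amp;
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # keep letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()          # collapse repeated whitespace


example = clean_posting({
    "title": "Data Entry Clerk",
    "description": "<p>Work from home &amp; earn $500/day!</p>",
})
# example == "data entry clerk work from home earn 500 day"
```

Missing fields are treated as empty strings, so a posting with only a title and description still produces usable text.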

2. TF-IDF vectorizer

Converts the text into 15,000 numbers using unigrams (single words) and bigrams (two-word phrases). Bigrams like "no experience" and "work from home" often carry a stronger signal than either word alone.
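To make the idea concrete, here is a simplified, hand-rolled TF-IDF over unigrams and bigrams. It is a sketch of the technique rather than the project's vectorizer: it skips the 15,000-feature cap and vector normalization, and the three sample documents are invented.

```python
import math
from collections import Counter


def ngrams(text: str) -> list[str]:
    """Return all unigrams and bigrams of a whitespace-tokenized text."""
    words = text.split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Weight each n-gram by term count times smoothed inverse document frequency."""
    counts = [Counter(ngrams(doc)) for doc in docs]
    # Number of documents that contain each term (each term counted once per document).
    doc_freq = Counter(term for c in counts for term in c)
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


docs = [
    "no experience required work from home",
    "five years experience required",
    "work on site experience required",
]
vectors = tfidf(docs)
```

In this toy corpus the word "experience" appears in every document, so it gets the lowest possible weight, while the bigram "no experience" appears in only one and is weighted higher. That is the sense in which a bigram can separate postings that share the same single words.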

3. Logistic Regression

Trained with class_weight='balanced' to handle the imbalanced dataset (only 4.8% fake), so that the rare fake postings are not outweighed by the real ones. Outputs a probability score, which is the confidence percentage you see in the result.
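The effect of class_weight='balanced' can be sketched with the formula scikit-learn documents for it: n_samples / (n_classes * class_count). The example uses the full-dataset counts reported on this page; the actual training split would give slightly different numbers, so treat the values as approximate.

```python
def balanced_weights(counts: dict[str, int]) -> dict[str, float]:
    """weight = n_samples / (n_classes * class_count); rarer classes weigh more."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Class counts for the whole dataset, as listed on this page.
weights = balanced_weights({"real": 17_014, "fake": 866})

# How many times more a single fake posting counts than a single real one.
ratio = weights["fake"] / weights["real"]
```

Under these counts, each fake posting counts roughly 20 times as much as a real one during training, which keeps the classifier from simply predicting "real" for everything.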

Signals the model looks for

Fake job signals

  • "No experience required"
  • Unrealistically high salary
  • "Immediate start", "apply now"
  • Vague company descriptions
  • "No background check"
  • Weekly PayPal payments
  • Generic job descriptions

Real job signals

  • Specific degree requirements
  • Market-aligned salary ranges
  • Years of experience specified
  • Verifiable company names
  • Standard benefits (401k, health)
  • Technical skills listed
  • Professional language throughout

The dataset

This project uses EMSCAD, the Employment Scam Aegean Dataset, as published on Kaggle. It contains 17,880 real-world job postings collected between 2012 and 2014, labeled as legitimate or fraudulent by human reviewers.

Dataset name: EMSCAD (Real or Fake Job Postings)
Source: Kaggle (shivamb)
Total records: 17,880 job postings
Fake postings: 866 (4.84%)
Real postings: 17,014 (95.16%)
Train / test split: 80% train, 20% test
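The figures above can be checked with a few lines of arithmetic. The sketch assumes a plain 80/20 split of the 17,880 records; whether the split was stratified, and how many fake postings landed in each part, is not stated here.

```python
TOTAL, FAKE = 17_880, 866
REAL = TOTAL - FAKE

test_size = round(TOTAL * 0.20)    # postings held out for evaluation
train_size = TOTAL - test_size     # postings used for training
fake_rate = FAKE / TOTAL           # share of fraudulent postings
baseline_accuracy = REAL / TOTAL   # accuracy of always answering "real"

print(f"train={train_size}, test={test_size}")
print(f"fake rate={fake_rate:.2%}, always-real baseline={baseline_accuracy:.2%}")
```

Because 95.16% of postings are real, a model that always answers "real" would already score about 95% accuracy on a similarly distributed test set. The reported 97.4% is best read against that baseline.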

Tech stack

Frontend: Next.js 14, CSS Modules, TypeScript, React Hook Form
Backend: FastAPI, Python 3.11, Uvicorn, Pydantic
ML / Data: scikit-learn, pandas, numpy, joblib
Deployment: Vercel, Render, GitHub, CI/CD

Important disclaimer

JobGuard is an AI-powered tool that reached 97.4% accuracy on the test dataset. Its verdicts are not guaranteed to be correct, and it should not be your only method of verifying a job posting. Always research the company independently, never pay upfront fees, and never share sensitive personal information before verifying a company's legitimacy.

Ready to check a job?

Paste any job listing and get an instant verdict.

Analyze a job posting