# Instagram Scraper - Usage Guide
Complete guide to using the Instagram scraper with all available workflows.
## 🚀 Quick Start
### 1. Full Workflow (Recommended)
The most comprehensive workflow that uses all scraper functions:
```powershell
# Windows PowerShell
$env:INSTAGRAM_USERNAME="your_username"
$env:INSTAGRAM_PASSWORD="your_password"
$env:TARGET_USERNAME="instagram"
$env:MAX_FOLLOWING="20"
$env:MAX_PROFILES="5"
$env:MODE="full"
node server.js
```
**What happens:**

- 🔐 **Login** - Logs into Instagram with human-like behavior
- 💾 **Save Session** - Extracts and saves cookies to `session_cookies.json`
- 🌐 **Browse** - Simulates random mouse movements and scrolling
- 👥 **Fetch Followings** - Gets the following list using API interception
- 👤 **Scrape Profiles** - Scrapes detailed data for each profile
- 📁 **Save Data** - Creates JSON files with all collected data
**Output files:**

- `followings_[username]_[timestamp].json` - Full following list
- `profiles_[username]_[timestamp].json` - Detailed profile data
- `session_cookies.json` - Reusable session cookies
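The output files are plain JSON, so they are easy to post-process. A minimal sketch of loading one in Node; the timestamped filename below is a placeholder for whatever your run actually creates:

```javascript
// Sketch: load a followings output file (filename is a placeholder).
// Fields match the "Output Format" section below.
const fs = require("fs");

const data = JSON.parse(
  fs.readFileSync("followings_instagram_2025-10-31T12-00-00.json", "utf8")
);
console.log(`${data.totalFollowings} followings of ${data.targetUsername}`);
```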
### 2. Simple Workflow
Uses the built-in `scrapeWorkflow()` function:
$env:MODE="simple"
node server.js
**What it does:**

- Combines login, following fetch, and profile scraping
- Produces a single output file with all data
- Offers less granular control, but is simpler to run
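If you prefer to call it from your own script instead of via `MODE=simple`, a minimal sketch follows. It assumes `scrapeWorkflow()` is exported from `scraper.js` (the module Example 4 imports from; adjust the path to your project) and follows the argument order documented under Workflow Details:

```javascript
// Sketch: invoke the simple workflow directly.
// The export location is an assumption; adjust the require path as needed.
const { scrapeWorkflow } = require("./scraper.js");

(async () => {
  const results = await scrapeWorkflow(
    { username: "your_username", password: "your_password" }, // creds
    "instagram", // targetUsername
    null,        // proxy (none)
    20           // maxFollowing
  );
  console.log(results);
})();
```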
### 3. Scheduled Workflow
Runs scraping on a schedule using `cronJobs()`:
$env:MODE="scheduled"
$env:SCRAPE_INTERVAL="60" # Minutes between runs
$env:MAX_RUNS="5" # Stop after 5 runs
node server.js
**Use case:** Monitor a profile's followings over time.
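Scheduled mode is roughly equivalent to wiring the pieces together yourself with `cronJobs()`. A sketch, assuming both functions are exported from `server.js` (an assumption; adjust to your project):

```javascript
// Sketch: re-run the full workflow every hour, five times total.
// Note: cronJobs() takes seconds, while SCRAPE_INTERVAL is in minutes.
const { fullScrapingWorkflow, cronJobs } = require("./server.js");

cronJobs(fullScrapingWorkflow, 60 * 60, 5); // every 3600 s, stop after 5 runs
```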
## 📋 Environment Variables
| Variable | Description | Default | Example |
|---|---|---|---|
| `INSTAGRAM_USERNAME` | Your Instagram username | `your_username` | `john_doe` |
| `INSTAGRAM_PASSWORD` | Your Instagram password | `your_password` | `MySecureP@ss` |
| `TARGET_USERNAME` | Profile to scrape | `instagram` | `cristiano` |
| `MAX_FOLLOWING` | Max followings to fetch | `20` | `100` |
| `MAX_PROFILES` | Max profiles to scrape | `5` | `50` |
| `PROXY` | Proxy server | None | `proxy.com:8080` |
| `MODE` | Workflow type | `full` | `simple`, `scheduled` |
| `SCRAPE_INTERVAL` | Minutes between runs (scheduled mode) | `60` | `30` |
| `MAX_RUNS` | Max runs (scheduled mode) | `5` | `10` |
## 🎯 Workflow Details
### Full Workflow Step-by-Step
```javascript
async function fullScrapingWorkflow() {
  // Step 1: Login
  const { browser, page } = await login(credentials, proxy);

  // Step 2: Extract session
  const session = await extractSession(page);

  // Step 3: Simulate browsing
  await simulateHumanBehavior(page, { mouseMovements: 5, scrolls: 3 });

  // Step 4: Get followings list
  const followingsData = await getFollowingsList(
    page,
    targetUsername,
    maxFollowing
  );

  // Step 5: Scrape individual profiles
  for (const username of followingsData.usernames) {
    const profileData = await scrapeProfile(page, username);
    // ... takes breaks every 3 profiles
  }

  // Step 6: Save all data
  // ... creates JSON files
}
```
### What Each Function Does
#### `login(credentials, proxy)`

- Launches browser with stealth mode
- Sets anti-detection headers
- Simulates human login behavior
- Returns `{ browser, page }`
#### `extractSession(page)`

- Gets all cookies from the current session
- Returns `{ cookies: [...] }`
- Save the result for session reuse (see the sketch below)
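A sketch of persisting the extracted session to disk, as the full workflow does; run it inside an async function after login:

```javascript
// Sketch: save the current session's cookies for later reuse
// (see "Session Reuse" under Performance Tips).
const fs = require("fs");

const session = await extractSession(page); // { cookies: [...] }
fs.writeFileSync("session_cookies.json", JSON.stringify(session, null, 2));
```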
#### `simulateHumanBehavior(page, options)`

- Random mouse movements
- Random scrolling
- Mimics real user behavior
- Options: `{ mouseMovements, scrolls, randomClicks }`
#### `getFollowingsList(page, username, maxUsers)`

- Navigates to the profile
- Clicks the "following" button
- Intercepts Instagram API responses
- Returns `{ usernames: [...], fullData: [...] }`
**Full data includes:**
```json
{
  "pk": "310285748",
  "username": "example_user",
  "full_name": "Example User",
  "profile_pic_url": "https://...",
  "is_verified": true,
  "is_private": false,
  "fbid_v2": "...",
  "latest_reel_media": 1761853039
}
```
#### `scrapeProfile(page, username)`

- Navigates to the profile
- Intercepts the API endpoint
- Falls back to DOM scraping if needed
- Returns detailed profile data
**Profile data includes:**
```json
{
  "username": "example_user",
  "full_name": "Example User",
  "bio": "Biography text...",
  "followerCount": 15000,
  "followingCount": 500,
  "postsCount": 100,
  "is_verified": true,
  "is_private": false,
  "is_business_account": true,
  "email": "contact@example.com",
  "phone": "+1234567890"
}
```
#### `scrapeWorkflow(creds, targetUsername, proxy, maxFollowing)`

- Complete workflow in one function
- Combines all of the steps above
- Returns aggregated results
#### `cronJobs(fn, intervalSec, stopAfter)`

- Runs a function on an interval
- Returns a stop function (see the sketch below)
- Used for scheduled scraping
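The returned stop function lets you cancel a schedule early. A sketch, assuming the exports shown in the examples elsewhere in this guide:

```javascript
// Sketch: schedule up to 24 hourly runs, but cancel after 2 hours
// regardless of how many runs have completed.
const stop = cronJobs(fullScrapingWorkflow, 60 * 60, 24);
setTimeout(() => stop(), 2 * 60 * 60 * 1000); // cancel early
```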
## 💡 Usage Examples
### Example 1: Scrape a Top Influencer's Followings
```powershell
$env:INSTAGRAM_USERNAME="your_account"
$env:INSTAGRAM_PASSWORD="your_password"
$env:TARGET_USERNAME="cristiano"
$env:MAX_FOLLOWING="100"
$env:MAX_PROFILES="20"
node server.js
```
### Example 2: Monitor a Competitor Every Hour
```powershell
$env:TARGET_USERNAME="competitor_account"
$env:MODE="scheduled"
$env:SCRAPE_INTERVAL="60"
$env:MAX_RUNS="24" # Run for 24 hours
node server.js
```
### Example 3: Scrape Multiple Accounts
Create `scrape-multiple.js`:
```javascript
const { fullScrapingWorkflow } = require("./server.js");

const targets = ["account1", "account2", "account3"];

async function scrapeAll() {
  for (const target of targets) {
    process.env.TARGET_USERNAME = target;
    await fullScrapingWorkflow();
    // Wait between accounts
    await new Promise((r) => setTimeout(r, 300000)); // 5 minutes
  }
}

scrapeAll();
```
### Example 4: Custom Workflow with Your Logic
```javascript
const { login, getFollowingsList, scrapeProfile } = require("./scraper.js");

async function myCustomWorkflow() {
  // Login once
  const { browser, page } = await login({
    username: "your_username",
    password: "your_password",
  });

  try {
    // Get followings from multiple accounts
    const accounts = ["account1", "account2"];
    for (const account of accounts) {
      const followings = await getFollowingsList(page, account, 50);

      // Filter verified users only
      const verified = followings.fullData.filter((u) => u.is_verified);

      // Scrape verified profiles
      for (const user of verified) {
        const profile = await scrapeProfile(page, user.username);

        // Custom logic: save only if business account
        if (profile.is_business_account) {
          console.log(`Business: ${profile.username} - ${profile.email}`);
        }
      }
    }
  } finally {
    await browser.close();
  }
}

myCustomWorkflow();
```
## 🔍 Output Format
### Followings Data
```json
{
  "targetUsername": "instagram",
  "scrapedAt": "2025-10-31T12:00:00.000Z",
  "totalFollowings": 20,
  "followings": [
    {
      "pk": "123456",
      "username": "user1",
      "full_name": "User One",
      "is_verified": true,
      ...
    }
  ]
}
```
### Profiles Data
```json
{
  "targetUsername": "instagram",
  "scrapedAt": "2025-10-31T12:00:00.000Z",
  "totalProfiles": 5,
  "profiles": [
    {
      "username": "user1",
      "followerCount": 50000,
      "email": "contact@user1.com",
      ...
    }
  ]
}
```
## ⚡ Performance Tips
### 1. Optimize Delays
```javascript
// Faster (more aggressive, higher block risk)
await randomSleep(1000, 2000);

// Balanced (recommended)
await randomSleep(2500, 6000);

// Safer (slower but less likely to be blocked)
await randomSleep(5000, 10000);
```
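`randomSleep` is the project's jittered delay helper. If you need the same behavior in your own scripts, a minimal sketch follows; the repo's actual implementation may differ:

```javascript
// Sketch: resolve after a random delay between minMs and maxMs.
function randomSleep(minMs, maxMs) {
  const delayMs = minMs + Math.random() * (maxMs - minMs);
  return new Promise((resolve) => setTimeout(resolve, delayMs));
}
```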
### 2. Batch Processing
Scrape in batches to avoid overwhelming Instagram:
```javascript
const batchSize = 10;
for (let i = 0; i < usernames.length; i += batchSize) {
  const batch = usernames.slice(i, i + batchSize);

  // Scrape the batch
  for (const username of batch) {
    await scrapeProfile(page, username);
  }

  // Long break between batches
  await randomSleep(60000, 120000); // 1-2 minutes
}
```
### 3. Session Reuse
Reuse cookies to avoid logging in repeatedly:
```javascript
const fs = require("fs");

const savedCookies = JSON.parse(fs.readFileSync("session_cookies.json", "utf8"));
await page.setCookie(...savedCookies.cookies);
```
## 🚨 Common Issues
"Rate limited (429)"
✅ Solution: Exponential backoff is automatic. If persistent:
- Reduce MAX_FOLLOWING and MAX_PROFILES
- Increase delays
- Add residential proxies
"Login failed"
- Check credentials
- Instagram may require verification
- Try from your home IP first
"No data captured"
- Instagram changed their API structure
- Check if
doc_idvalues need updating - DOM fallback should still work
### Blocked on cloud servers

❌ **Problem:** Using datacenter IPs

✅ **Solution:** Get residential proxies (see `ANTI-BOT-RECOMMENDATIONS.md`)
## 📊 Best Practices
- **Start Small:** Test with `MAX_FOLLOWING=5`, `MAX_PROFILES=2`
- **Use Residential Proxies:** Critical for production use
- **Respect Rate Limits:** ~200 requests/hour per IP
- **Save Sessions:** Reuse cookies to avoid repeated logins
- **Monitor Logs:** Watch for 429 errors
- **Add Randomness:** Vary delays and patterns
- **Take Breaks:** Schedule longer breaks every N profiles (see the sketch below)
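For "Take Breaks", a sketch of the break-every-N pattern the full workflow uses (it pauses every 3 profiles); the delay values here are illustrative, not the repo's exact numbers:

```javascript
// Sketch: short random delay after every profile, longer break every 3rd.
let scraped = 0;
for (const username of usernames) {
  await scrapeProfile(page, username);
  scraped++;
  if (scraped % 3 === 0) {
    await randomSleep(30000, 60000); // longer break, 30-60 s (illustrative)
  } else {
    await randomSleep(2500, 6000); // balanced per-profile delay
  }
}
```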
## 🎓 Learning Path
1. **Start:** Run `MODE=simple` with small numbers
2. **Understand:** Read the logs and output files
3. **Customize:** Modify `MAX_FOLLOWING` and `MAX_PROFILES`
4. **Advanced:** Use `MODE=full` for complete control
5. **Production:** Add proxies and session management
**Need help?** Check:

- `ANTI-BOT-RECOMMENDATIONS.md`
- `EXPONENTIAL-BACKOFF.md`
- Test script: `node test-retry.js`