
Instagram Scraper - Usage Guide

Complete guide to using the Instagram scraper with all available workflows.

🚀 Quick Start

1. Full Workflow

The most comprehensive workflow, using all scraper functions:

# Windows PowerShell
$env:INSTAGRAM_USERNAME="your_username"
$env:INSTAGRAM_PASSWORD="your_password"
$env:TARGET_USERNAME="instagram"
$env:MAX_FOLLOWING="20"
$env:MAX_PROFILES="5"
$env:MODE="full"

node server.js

What happens:

  1. 🔐 Login - Logs into Instagram with human-like behavior
  2. 💾 Save Session - Extracts and saves cookies to session_cookies.json
  3. 🌐 Browse - Simulates random mouse movements and scrolling
  4. 👥 Fetch Followings - Gets following list using API interception
  5. 👤 Scrape Profiles - Scrapes detailed data for each profile
  6. 📁 Save Data - Creates JSON files with all collected data

Output files:

  • followings_[username]_[timestamp].json - Full following list
  • profiles_[username]_[timestamp].json - Detailed profile data
  • session_cookies.json - Reusable session cookies

2. Simple Workflow

Uses the built-in scrapeWorkflow() function:

$env:MODE="simple"
node server.js

What it does:

  • Combines login + following fetch + profile scraping
  • Single output file with all data
  • Less granular control but simpler

3. Scheduled Workflow

Runs scraping on a schedule using cronJobs():

$env:MODE="scheduled"
$env:SCRAPE_INTERVAL="60"  # Minutes between runs
$env:MAX_RUNS="5"          # Stop after 5 runs
node server.js

Use case: Monitor a profile's followings over time

📋 Environment Variables

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| INSTAGRAM_USERNAME | Your Instagram username | your_username | john_doe |
| INSTAGRAM_PASSWORD | Your Instagram password | your_password | MySecureP@ss |
| TARGET_USERNAME | Profile to scrape | instagram | cristiano |
| MAX_FOLLOWING | Max followings to fetch | 20 | 100 |
| MAX_PROFILES | Max profiles to scrape | 5 | 50 |
| PROXY | Proxy server | None | proxy.com:8080 |
| MODE | Workflow type | full | simple, scheduled |
| SCRAPE_INTERVAL | Minutes between runs (scheduled mode) | 60 | 30 |
| MAX_RUNS | Max runs (scheduled mode) | 5 | 10 |
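
A minimal sketch of how server.js might assemble these into a config object (the variable names and defaults come from the table above; the object shape itself is illustrative, not the actual implementation):

const config = {
  credentials: {
    username: process.env.INSTAGRAM_USERNAME || "your_username",
    password: process.env.INSTAGRAM_PASSWORD || "your_password",
  },
  targetUsername: process.env.TARGET_USERNAME || "instagram",
  maxFollowing: parseInt(process.env.MAX_FOLLOWING || "20", 10),
  maxProfiles: parseInt(process.env.MAX_PROFILES || "5", 10),
  proxy: process.env.PROXY || null, // e.g. "proxy.com:8080"
  mode: process.env.MODE || "full", // "full" | "simple" | "scheduled"
};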

🎯 Workflow Details

Full Workflow Step-by-Step

async function fullScrapingWorkflow() {
  // Step 1: Login
  const { browser, page } = await login(credentials, proxy);

  // Step 2: Extract session
  const session = await extractSession(page);

  // Step 3: Simulate browsing
  await simulateHumanBehavior(page, { mouseMovements: 5, scrolls: 3 });

  // Step 4: Get followings list
  const followingsData = await getFollowingsList(
    page,
    targetUsername,
    maxFollowing
  );

  // Step 5: Scrape individual profiles
  for (const username of followingsData.usernames) {
    const profileData = await scrapeProfile(page, username);
    // ... takes breaks every 3 profiles
  }

  // Step 6: Save all data
  // ... creates JSON files
}

What Each Function Does

login(credentials, proxy)

  • Launches browser with stealth mode
  • Sets anti-detection headers
  • Simulates human login behavior
  • Returns { browser, page }

extractSession(page)

  • Gets all cookies from current session
  • Returns { cookies: [...] }
  • Save for session reuse
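
A minimal sketch of saving the extracted session for later reuse (the file name matches the session_cookies.json output listed above):

const fs = require("fs");

const session = await extractSession(page); // { cookies: [...] }
fs.writeFileSync("session_cookies.json", JSON.stringify(session, null, 2));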

simulateHumanBehavior(page, options)

  • Random mouse movements
  • Random scrolling
  • Mimics real user behavior
  • Options: { mouseMovements, scrolls, randomClicks }

getFollowingsList(page, username, maxUsers)

  • Navigates to profile
  • Clicks "following" button
  • Intercepts Instagram API responses
  • Returns { usernames: [...], fullData: [...] }

Full data includes:

{
  "pk": "310285748",
  "username": "example_user",
  "full_name": "Example User",
  "profile_pic_url": "https://...",
  "is_verified": true,
  "is_private": false,
  "fbid_v2": "...",
  "latest_reel_media": 1761853039
}

scrapeProfile(page, username)

  • Navigates to profile
  • Intercepts API endpoint
  • Falls back to DOM scraping if needed
  • Returns detailed profile data

Profile data includes:

{
  "username": "example_user",
  "full_name": "Example User",
  "bio": "Biography text...",
  "followerCount": 15000,
  "followingCount": 500,
  "postsCount": 100,
  "is_verified": true,
  "is_private": false,
  "is_business_account": true,
  "email": "contact@example.com",
  "phone": "+1234567890"
}

scrapeWorkflow(creds, targetUsername, proxy, maxFollowing)

  • Complete workflow in one function
  • Combines all steps above
  • Returns aggregated results
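
A minimal usage sketch; it assumes scrapeWorkflow is exported from scraper.js, like the functions imported in Example 4 below:

const { scrapeWorkflow } = require("./scraper.js");

(async () => {
  const results = await scrapeWorkflow(
    { username: "your_username", password: "your_password" }, // creds
    "instagram", // targetUsername
    null,        // proxy (none)
    20           // maxFollowing
  );
  console.log(results);
})();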

cronJobs(fn, intervalSec, stopAfter)

  • Runs function on interval
  • Returns stop function
  • Used for scheduled scraping
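
A minimal sketch of scheduling the simple workflow with cronJobs; it assumes both functions are exported from scraper.js and that intervalSec is in seconds (SCRAPE_INTERVAL is documented in minutes, so multiply by 60 here):

const { cronJobs, scrapeWorkflow } = require("./scraper.js");

const creds = { username: "your_username", password: "your_password" };

// Run hourly, stop after 5 runs; cronJobs returns a stop function
const stop = cronJobs(
  () => scrapeWorkflow(creds, "instagram", null, 20),
  60 * 60, // intervalSec: 60 minutes
  5        // stopAfter: 5 runs
);

// Call stop() to cancel the schedule early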

💡 Usage Examples

Example 1: Scrape a Top Influencer's Followings

$env:INSTAGRAM_USERNAME="your_account"
$env:INSTAGRAM_PASSWORD="your_password"
$env:TARGET_USERNAME="cristiano"
$env:MAX_FOLLOWING="100"
$env:MAX_PROFILES="20"
node server.js

Example 2: Monitor Competitor Every Hour

$env:TARGET_USERNAME="competitor_account"
$env:MODE="scheduled"
$env:SCRAPE_INTERVAL="60"
$env:MAX_RUNS="24"  # Run for 24 hours
node server.js

Example 3: Scrape Multiple Accounts

Create scrape-multiple.js:

const { fullScrapingWorkflow } = require("./server.js");

const targets = ["account1", "account2", "account3"];

async function scrapeAll() {
  for (const target of targets) {
    process.env.TARGET_USERNAME = target;
    await fullScrapingWorkflow();

    // Wait between accounts
    await new Promise((r) => setTimeout(r, 300000)); // 5 minutes
  }
}

scrapeAll();

Example 4: Custom Workflow with Your Logic

const { login, getFollowingsList, scrapeProfile } = require("./scraper.js");

async function myCustomWorkflow() {
  // Login once
  const { browser, page } = await login({
    username: "your_username",
    password: "your_password",
  });

  try {
    // Get followings from multiple accounts
    const accounts = ["account1", "account2"];

    for (const account of accounts) {
      const followings = await getFollowingsList(page, account, 50);

      // Filter verified users only
      const verified = followings.fullData.filter((u) => u.is_verified);

      // Scrape verified profiles
      for (const user of verified) {
        const profile = await scrapeProfile(page, user.username);

        // Custom logic: save only if business account
        if (profile.is_business_account) {
          console.log(`Business: ${profile.username} - ${profile.email}`);
        }
      }
    }
  } finally {
    await browser.close();
  }
}

myCustomWorkflow();

🔍 Output Format

Followings Data

{
  "targetUsername": "instagram",
  "scrapedAt": "2025-10-31T12:00:00.000Z",
  "totalFollowings": 20,
  "followings": [
    {
      "pk": "123456",
      "username": "user1",
      "full_name": "User One",
      "is_verified": true,
      ...
    }
  ]
}

Profiles Data

{
  "targetUsername": "instagram",
  "scrapedAt": "2025-10-31T12:00:00.000Z",
  "totalProfiles": 5,
  "profiles": [
    {
      "username": "user1",
      "followerCount": 50000,
      "email": "contact@user1.com",
      ...
    }
  ]
}

⚡ Performance Tips

1. Optimize Delays

// Faster (more aggressive, higher block risk)
await randomSleep(1000, 2000);

// Balanced (recommended)
await randomSleep(2500, 6000);

// Safer (slower but less likely to be blocked)
await randomSleep(5000, 10000);
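
randomSleep is used throughout these examples; if you need it in a standalone script, a minimal implementation (an assumption, in case scraper.js does not export one) looks like:

function randomSleep(minMs, maxMs) {
  // Sleep for a random duration between minMs and maxMs milliseconds
  const delayMs = minMs + Math.random() * (maxMs - minMs);
  return new Promise((resolve) => setTimeout(resolve, delayMs));
}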

2. Batch Processing

Scrape in batches to avoid overwhelming Instagram:

const batchSize = 10;
for (let i = 0; i < usernames.length; i += batchSize) {
  const batch = usernames.slice(i, i + batchSize);

  // Scrape the batch sequentially
  for (const username of batch) {
    await scrapeProfile(page, username);
  }

  // Long break between batches
  await randomSleep(60000, 120000); // 1-2 minutes
}

3. Session Reuse

Reuse cookies to avoid logging in repeatedly:

const fs = require("fs");

const savedCookies = JSON.parse(fs.readFileSync("session_cookies.json", "utf8"));
await page.setCookie(...savedCookies.cookies);

🚨 Common Issues

"Rate limited (429)"

Solution: Exponential backoff is automatic. If persistent:

  • Reduce MAX_FOLLOWING and MAX_PROFILES
  • Increase delays
  • Add residential proxies

"Login failed"

  • Check credentials
  • Instagram may require verification
  • Try from your home IP first

"No data captured"

  • Instagram changed their API structure
  • Check if doc_id values need updating
  • DOM fallback should still work

Blocked on cloud servers

Problem: Using datacenter IPs
Solution: Get residential proxies (see ANTI-BOT-RECOMMENDATIONS.md)
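
A minimal sketch of passing a proxy to login; the host:port string follows the PROXY example in the table above, and the exact format login accepts is an assumption:

const { browser, page } = await login(
  { username: "your_username", password: "your_password" },
  "proxy.com:8080" // residential proxy, host:port
);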

📊 Best Practices

  1. Start Small: Test with MAX_FOLLOWING=5, MAX_PROFILES=2
  2. Use Residential Proxies: Critical for production use
  3. Respect Rate Limits: ~200 requests/hour per IP
  4. Save Sessions: Reuse cookies to avoid repeated logins
  5. Monitor Logs: Watch for 429 errors
  6. Add Randomness: Vary delays and patterns
  7. Take Breaks: Schedule longer breaks every N profiles (see the sketch below)
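
A minimal sketch of the break pattern from item 7, using the every-3-profiles cadence mentioned in the full workflow (the 30-60 second break length is illustrative):

let scraped = 0;
for (const username of usernames) {
  await scrapeProfile(page, username);
  scraped++;

  // Longer pause every 3 profiles
  if (scraped % 3 === 0) {
    await randomSleep(30000, 60000); // illustrative 30-60 s break
  }
}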

🎓 Learning Path

  1. Start: Run MODE=simple with small numbers
  2. Understand: Read the logs and output files
  3. Customize: Modify MAX_FOLLOWING and MAX_PROFILES
  4. Advanced: Use MODE=full for complete control
  5. Production: Add proxies and session management

Need help? Check ANTI-BOT-RECOMMENDATIONS.md for proxy setup and anti-detection guidance.