feat: Instagram scraper with GraphQL API integration

- Automated followings list extraction via API interception
- Profile scraping using GraphQL endpoint interception
- DOM fallback for edge cases
- Performance timing for all operations
- Anti-bot measures and human-like behavior simulation

This commit is contained in:
2025-10-31 23:06:06 +05:45
parent ba2dcec881
commit 6f4f37bee5
8 changed files with 3474 additions and 0 deletions

6
.gitignore vendored

@@ -136,3 +136,9 @@ dist
.yarn/install-state.gz
.pnp.*
# Instagram scraper sensitive files
session_cookies.json
*.json
!package.json
!package-lock.json

179
ANTI-BOT-RECOMMENDATIONS.md Normal file

@@ -0,0 +1,179 @@
# Instagram Scraper - Anti-Bot Detection Recommendations
Based on [Scrapfly's Instagram Scraping Guide](https://scrapfly.io/blog/posts/how-to-scrape-instagram)
## ✅ Already Implemented
1. **Puppeteer Stealth Plugin** - Bypasses basic browser detection
2. **Random User Agents** - Different browser signatures
3. **Human-like behaviors**:
- Mouse movements
- Random scrolling
- Variable delays (2.5-6 seconds between profiles)
- Typing delays
- Breaks every 10 profiles
4. **Variable viewport sizes** - Randomized window dimensions
5. **Network payload interception** - Capturing API responses instead of DOM scraping
6. **Critical headers** - Including `x-ig-app-id: 936619743392459`
## ⚠️ Critical Improvements Needed
### 1. **Residential Proxies** (MOST IMPORTANT)
**Status**: ❌ Not implemented
**Issue**:
- Datacenter IPs (AWS, Google Cloud, etc.) are **blocked instantly** by Instagram
- Your current setup will be detected as soon as you deploy to any cloud server
**Solution**:
```javascript
const browser = await puppeteer.launch({
headless: true,
args: [
"--proxy-server=residential-proxy-provider.com:port",
// Residential proxies required - NOT datacenter
],
});
```
**Recommended Proxy Providers**:
- Bright Data (formerly Luminati)
- Oxylabs
- Smartproxy
- GeoSurf
**Requirements**:
- Must be residential IPs (from real ISPs like Comcast, AT&T)
- Rotate IPs every 5-10 minutes (sticky sessions); a rotation sketch follows this list
- Each IP allows ~200 requests/hour
- Cost: ~$10-15 per GB
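A minimal rotation sketch; the endpoints, credentials, and the `launchWithRotatingProxy` helper below are placeholders, so substitute whatever your provider actually supplies:
```javascript
const puppeteer = require("puppeteer-extra");

// Hypothetical sticky-session endpoints from a residential provider
const PROXIES = [
  { server: "res.example-proxy.com:10001", username: "user", password: "pass" },
  { server: "res.example-proxy.com:10002", username: "user", password: "pass" },
  { server: "res.example-proxy.com:10003", username: "user", password: "pass" },
];

async function launchWithRotatingProxy() {
  // Pick a different sticky session for each run
  const proxy = PROXIES[Math.floor(Math.random() * PROXIES.length)];
  const browser = await puppeteer.launch({
    headless: true,
    args: [`--proxy-server=${proxy.server}`],
  });
  const page = await browser.newPage();
  // Most residential providers require authentication on the connection
  await page.authenticate({
    username: proxy.username,
    password: proxy.password,
  });
  return { browser, page };
}
```
Picking a different sticky session per run helps keep any single residential IP under the ~200 requests/hour ceiling.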
### 2. **Rate Limit Handling with Exponential Backoff**
**Status**: ✅ Implemented (see `withRetry()` and `handleRateLimitedRequest()` in `utils.js`)
**Current**: Random delays plus exponential backoff with jitter on 429s, 5xx responses, and network errors
**Reference pattern**:
```javascript
async function makeRequest(fn, retries = 3) {
for (let i = 0; i < retries; i++) {
try {
return await fn();
} catch (error) {
if (error.status === 429 && i < retries - 1) {
const delay = Math.pow(2, i) * 2000; // 2s, 4s, 8s
console.log(`Rate limited, waiting ${delay}ms...`);
await new Promise((res) => setTimeout(res, delay));
continue;
}
throw error;
}
}
}
```
### 3. **Session Cookies Management**
**Status**: ⚠️ Partial - cookies are saved via `extractSession()` and reloaded by `loginWithSession()`, but session rotation is not implemented
**Issue**: Creating new sessions repeatedly looks suspicious
**Solution**:
- Save cookies after login
- Reuse cookies across multiple scraping sessions
- Rotate sessions periodically (a rotation sketch follows the snippet below)
```javascript
// Save cookies after login
const cookies = await extractSession(page);
fs.writeFileSync("session.json", JSON.stringify(cookies));
// Reuse cookies in next session
const savedCookies = JSON.parse(fs.readFileSync("session.json"));
await page.setCookie(...savedCookies.cookies);
```
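For the rotation point above, a minimal sketch that assumes several cookie files have already been saved under a `sessions/` directory (the directory layout and both helper names are illustrative):
```javascript
const fs = require("fs");
const path = require("path");

// Pick one of several previously saved cookie files so that no single
// session is reused for every run
function pickSessionFile(dir = "sessions") {
  const files = fs.readdirSync(dir).filter((f) => f.endsWith(".json"));
  if (files.length === 0) return null;
  return path.join(dir, files[Math.floor(Math.random() * files.length)]);
}

async function applySavedSession(page) {
  const file = pickSessionFile();
  if (!file) return false;
  const saved = JSON.parse(fs.readFileSync(file, "utf-8"));
  await page.setCookie(...saved.cookies);
  return true;
}
```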
### 4. **Realistic Browsing Patterns**
**Status**: ✅ Implemented but can improve
**Additional improvements** (a sketch follows this list):
- Visit homepage before going to target profile
- Occasionally view posts/stories during following list scraping
- Don't always scrape in the same order (randomize)
- Add occasional "browsing breaks" of 30-60 seconds
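A sketch of how these ideas could sit on top of the existing helpers in `utils.js`; the `browseCasually` wrapper and the 1-in-5 break probability are illustrative choices, not part of the current code:
```javascript
const { randomSleep, simulateHumanBehavior } = require("./utils.js");

// Visit profiles in a random order and occasionally take a longer
// "browsing break" instead of scraping back-to-back
function shuffle(items) {
  return [...items].sort(() => Math.random() - 0.5);
}

async function browseCasually(page, usernames, scrapeFn) {
  // Start from the homepage instead of jumping straight to a profile
  await page.goto("https://www.instagram.com/", { waitUntil: "networkidle2" });
  await simulateHumanBehavior(page, { mouseMovements: 3, scrolls: 2 });

  for (const username of shuffle(usernames)) {
    await scrapeFn(page, username);
    // Roughly 1 in 5 profiles, pause for 30-60 seconds and wander around
    if (Math.random() < 0.2) {
      await simulateHumanBehavior(page, { mouseMovements: 4, scrolls: 2 });
      await randomSleep(30000, 60000);
    }
  }
}
```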
### 5. **Monitor doc_id Changes**
**Status**: ❌ Not monitoring
**Issue**: Instagram changes GraphQL `doc_id` values every 2-4 weeks
**Current doc_ids** (as of article):
- Profile posts: `9310670392322965`
- Post details: `8845758582119845`
- Reels: `25981206651899035`
**Solution** (a monitoring sketch follows this list):
- Monitor Instagram's GraphQL requests in browser DevTools
- Update when API calls start failing
- Or use a service like Scrapfly that auto-updates
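A small monitoring sketch: it logs whichever `doc_id` values Instagram's web client sends while you browse a logged-in page, so they can be compared against the values listed above (exactly where `doc_id` appears in each request can vary, so treat the parsing as an assumption):
```javascript
// Collect doc_id values from outgoing GraphQL requests on an existing page
const seenDocIds = new Set();

page.on("request", (request) => {
  const url = request.url();
  if (!url.includes("/graphql/query")) return;
  // doc_id may show up in the query string or in the POST body
  const source = `${url}&${request.postData() || ""}`;
  const match = source.match(/doc_id=(\d+)/);
  if (match && !seenDocIds.has(match[1])) {
    seenDocIds.add(match[1]);
    console.log(`GraphQL doc_id observed: ${match[1]} (${request.method()})`);
  }
});
```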
## 📊 Instagram's Blocking Layers
1. **IP Quality Check** → Blocks datacenter IPs instantly
2. **TLS Fingerprinting** → Detects non-browser tools (Puppeteer Stealth helps)
3. **Rate Limiting** → ~200 requests/hour per IP
4. **Behavioral Detection** → Flags unnatural patterns
## 🎯 Priority Implementation Order
1. **HIGH PRIORITY**: Add residential proxy support
2. ✅ **DONE**: Exponential backoff for 429 errors (see EXPONENTIAL-BACKOFF.md)
3. **MEDIUM**: Rotate session cookies (basic persistence and reuse already in place)
4. **MEDIUM**: Add doc_id monitoring system
5. **LOW**: Additional browsing pattern randomization
## 💰 Cost Estimates (for 10,000 profiles)
- **Proxy bandwidth**: ~750 MB
- **Cost**: $7.50-$11.25 in residential proxy fees
- **With Proxy Saver**: $5.25-$7.88 (30-50% savings)
## 🚨 Legal Considerations
- Only scrape **publicly available** data
- Respect rate limits
- Don't store PII of EU citizens without GDPR compliance
- Add delays to avoid damaging Instagram's servers
- Check Instagram's Terms of Service
## 📚 Additional Resources
- [Scrapfly Instagram Scraper](https://github.com/scrapfly/scrapfly-scrapers/tree/main/instagram-scraper) - Open source reference
- [Instagram GraphQL Endpoint Documentation](https://scrapfly.io/blog/posts/how-to-scrape-instagram#how-instagrams-scraping-api-works)
- [Proxy comparison guide](https://scrapfly.io/blog/best-proxy-providers-for-web-scraping)
## ⚡ Quick Wins
Things you can implement immediately:
1. ✅ Critical headers added (x-ig-app-id)
2. ✅ Human simulation functions integrated
3. ✅ Exponential backoff added (see EXPONENTIAL-BACKOFF.md)
4. ✅ Cookie persistence implemented (`loginWithSession()` reuses `session_cookies.json`)
5. Research residential proxy providers (1 hour)
---
**Bottom Line**: Without residential proxies, this scraper will be blocked immediately on any cloud infrastructure. That's the #1 priority to address.

407
USAGE-GUIDE.md Normal file

@@ -0,0 +1,407 @@
# Instagram Scraper - Usage Guide
Complete guide to using the Instagram scraper with all available workflows.
## 🚀 Quick Start
### 1. Full Workflow (Recommended)
The most comprehensive workflow that uses all scraper functions:
```bash
# Windows PowerShell
$env:INSTAGRAM_USERNAME="your_username"
$env:INSTAGRAM_PASSWORD="your_password"
$env:TARGET_USERNAME="instagram"
$env:MAX_FOLLOWING="20"
$env:MAX_PROFILES="5"
$env:MODE="full"
node server.js
```
**What happens:**
1. 🔐 **Login** - Logs into Instagram with human-like behavior
2. 💾 **Save Session** - Extracts and saves cookies to `session_cookies.json`
3. 🌐 **Browse** - Simulates random mouse movements and scrolling
4. 👥 **Fetch Followings** - Gets following list using API interception
5. 👤 **Scrape Profiles** - Scrapes detailed data for each profile
6. 📁 **Save Data** - Creates JSON files with all collected data
**Output files:**
- `followings_[username]_[timestamp].json` - Full following list
- `profiles_[username]_[timestamp].json` - Detailed profile data
- `session_cookies.json` - Reusable session cookies
### 2. Simple Workflow
Uses the built-in `scrapeWorkflow()` function:
```bash
$env:MODE="simple"
node server.js
```
**What it does:**
- Combines login + following fetch + profile scraping
- Single output file with all data
- Less granular control but simpler
### 3. Scheduled Workflow
Runs scraping on a schedule using `cronJobs()`:
```bash
$env:MODE="scheduled"
$env:SCRAPE_INTERVAL="60" # Minutes between runs
$env:MAX_RUNS="5" # Stop after 5 runs
node server.js
```
**Use case:** Monitor a profile's followings over time
## 📋 Environment Variables
| Variable | Description | Default | Example |
| -------------------- | ------------------------------------- | --------------- | --------------------- |
| `INSTAGRAM_USERNAME` | Your Instagram username | `your_username` | `john_doe` |
| `INSTAGRAM_PASSWORD` | Your Instagram password | `your_password` | `MySecureP@ss` |
| `TARGET_USERNAME` | Profile to scrape | `instagram` | `cristiano` |
| `MAX_FOLLOWING` | Max followings to fetch | `20` | `100` |
| `MAX_PROFILES` | Max profiles to scrape | `5` | `50` |
| `PROXY` | Proxy server | `None` | `proxy.com:8080` |
| `MODE` | Workflow type | `full` | `simple`, `scheduled` |
| `SCRAPE_INTERVAL` | Minutes between runs (scheduled mode) | `60` | `30` |
| `MAX_RUNS` | Max runs (scheduled mode) | `5` | `10` |
## 🎯 Workflow Details
### Full Workflow Step-by-Step
```javascript
async function fullScrapingWorkflow() {
  // Step 1: Login (reuses a saved session when available)
  const { browser, page } = await loginWithSession(credentials, proxy, true);
// Step 2: Extract session
const session = await extractSession(page);
// Step 3: Simulate browsing
await simulateHumanBehavior(page, { mouseMovements: 5, scrolls: 3 });
// Step 4: Get followings list
const followingsData = await getFollowingsList(
page,
targetUsername,
maxFollowing
);
// Step 5: Scrape individual profiles
for (const username of followingsData.usernames) {
const profileData = await scrapeProfile(page, username);
// ... takes breaks every 3 profiles
}
// Step 6: Save all data
// ... creates JSON files
}
```
### What Each Function Does
#### `loginWithSession(credentials, proxy, useExistingSession)`
- Launches browser with stealth mode, a random user agent, and a spoofed timezone
- Reuses cookies from `session_cookies.json` when available
- Simulates human login behavior when a fresh login is needed
- Returns `{ browser, page, sessionReused }`
#### `extractSession(page)`
- Gets all cookies from current session
- Returns `{ cookies: [...] }`
- Save for session reuse
#### `simulateHumanBehavior(page, options)`
- Random mouse movements
- Random scrolling
- Mimics real user behavior
- Options: `{ mouseMovements, scrolls, randomClicks }`
#### `getFollowingsList(page, username, maxUsers)`
- Navigates to profile
- Clicks "following" button
- Intercepts Instagram API responses
- Returns `{ usernames: [...], fullData: [...] }`
**Full data includes:**
```json
{
"pk": "310285748",
"username": "example_user",
"full_name": "Example User",
"profile_pic_url": "https://...",
"is_verified": true,
"is_private": false,
"fbid_v2": "...",
"latest_reel_media": 1761853039
}
```
#### `scrapeProfile(page, username)`
- Navigates to profile
- Intercepts API endpoint
- Falls back to DOM scraping if needed
- Returns detailed profile data
**Profile data includes:**
```json
{
"username": "example_user",
"full_name": "Example User",
"bio": "Biography text...",
"followerCount": 15000,
"followingCount": 500,
"postsCount": 100,
"is_verified": true,
"is_private": false,
"is_business_account": true,
"email": "contact@example.com",
"phone": "+1234567890"
}
```
#### `scrapeWorkflow(creds, targetUsername, proxy, maxFollowing)`
- Complete workflow in one function
- Combines all steps above
- Returns aggregated results
#### `cronJobs(fn, intervalSec, stopAfter)`
- Runs function on interval
- Returns stop function
- Used for scheduled scraping (minimal usage sketch below)
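A minimal usage sketch; the callback body is a placeholder:
```javascript
const { cronJobs } = require("./scraper.js");

(async () => {
  // Run the callback every 30 minutes and stop automatically after 4 runs
  const stop = await cronJobs(
    async () => {
      console.log(`Scheduled run at ${new Date().toISOString()}`);
      // ...call scrapeWorkflow() or any custom logic here
    },
    30 * 60, // intervalSec
    4 // stopAfter
  );
  // stop() can be called at any time to cancel the remaining runs
})();
```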
## 💡 Usage Examples
### Example 1: Scrape Top Influencer's Followers
```bash
$env:INSTAGRAM_USERNAME="your_account"
$env:INSTAGRAM_PASSWORD="your_password"
$env:TARGET_USERNAME="cristiano"
$env:MAX_FOLLOWING="100"
$env:MAX_PROFILES="20"
node server.js
```
### Example 2: Monitor Competitor Every Hour
```bash
$env:TARGET_USERNAME="competitor_account"
$env:MODE="scheduled"
$env:SCRAPE_INTERVAL="60"
$env:MAX_RUNS="24" # Run for 24 hours
node server.js
```
### Example 3: Scrape Multiple Accounts
Create `scrape-multiple.js`:
```javascript
const { fullScrapingWorkflow } = require("./server.js");
const targets = ["account1", "account2", "account3"];
async function scrapeAll() {
for (const target of targets) {
process.env.TARGET_USERNAME = target;
await fullScrapingWorkflow();
// Wait between accounts
await new Promise((r) => setTimeout(r, 300000)); // 5 minutes
}
}
scrapeAll();
```
### Example 4: Custom Workflow with Your Logic
```javascript
const {
  loginWithSession,
  getFollowingsList,
  scrapeProfile,
} = require("./scraper.js");
async function myCustomWorkflow() {
  // Login once (reuses a saved session when available)
  const { browser, page } = await loginWithSession({
username: "your_username",
password: "your_password",
});
try {
// Get followings from multiple accounts
const accounts = ["account1", "account2"];
for (const account of accounts) {
const followings = await getFollowingsList(page, account, 50);
// Filter verified users only
const verified = followings.fullData.filter((u) => u.is_verified);
// Scrape verified profiles
for (const user of verified) {
const profile = await scrapeProfile(page, user.username);
// Custom logic: save only if business account
if (profile.is_business_account) {
console.log(`Business: ${profile.username} - ${profile.email}`);
}
}
}
} finally {
await browser.close();
}
}
myCustomWorkflow();
```
## 🔍 Output Format
### Followings Data
```json
{
"targetUsername": "instagram",
"scrapedAt": "2025-10-31T12:00:00.000Z",
"totalFollowings": 20,
"followings": [
{
"pk": "123456",
"username": "user1",
"full_name": "User One",
"is_verified": true,
...
}
]
}
```
### Profiles Data
```json
{
"targetUsername": "instagram",
"scrapedAt": "2025-10-31T12:00:00.000Z",
"totalProfiles": 5,
"profiles": [
{
"username": "user1",
"followerCount": 50000,
"email": "contact@user1.com",
...
}
]
}
```
## ⚡ Performance Tips
### 1. Optimize Delays
```javascript
// Faster (more aggressive, higher block risk)
await randomSleep(1000, 2000);
// Balanced (recommended)
await randomSleep(2500, 6000);
// Safer (slower but less likely to be blocked)
await randomSleep(5000, 10000);
```
### 2. Batch Processing
Scrape in batches to avoid overwhelming Instagram:
```javascript
const batchSize = 10;
for (let i = 0; i < usernames.length; i += batchSize) {
const batch = usernames.slice(i, i + batchSize);
// Scrape batch
// Long break between batches
await randomSleep(60000, 120000); // 1-2 minutes
}
```
### 3. Session Reuse
Reuse cookies to avoid logging in repeatedly:
```javascript
const savedCookies = JSON.parse(fs.readFileSync("session_cookies.json"));
await page.setCookie(...savedCookies.cookies);
```
## 🚨 Common Issues
### "Rate limited (429)"
**Solution**: Exponential backoff is automatic. If persistent:
- Reduce MAX_FOLLOWING and MAX_PROFILES
- Increase delays
- Add residential proxies
### "Login failed"
- Check credentials
- Instagram may require verification
- Try from your home IP first
### "No data captured"
- Instagram changed their API structure
- Check if `doc_id` values need updating
- DOM fallback should still work
### Blocked on cloud servers
**Problem**: Using datacenter IPs
**Solution**: Get residential proxies (see ANTI-BOT-RECOMMENDATIONS.md)
## 📊 Best Practices
1. **Start Small**: Test with MAX_FOLLOWING=5, MAX_PROFILES=2
2. **Use Residential Proxies**: Critical for production use
3. **Respect Rate Limits**: ~200 requests/hour per IP
4. **Save Sessions**: Reuse cookies to avoid repeated logins
5. **Monitor Logs**: Watch for 429 errors
6. **Add Randomness**: Vary delays and patterns
7. **Take Breaks**: Schedule longer breaks every N profiles
## 🎓 Learning Path
1. **Start**: Run `MODE=simple` with small numbers
2. **Understand**: Read the logs and output files
3. **Customize**: Modify `MAX_FOLLOWING` and `MAX_PROFILES`
4. **Advanced**: Use `MODE=full` for complete control
5. **Production**: Add proxies and session management
---
**Need help?** Check:
- [ANTI-BOT-RECOMMENDATIONS.md](./ANTI-BOT-RECOMMENDATIONS.md)
- [EXPONENTIAL-BACKOFF.md](./EXPONENTIAL-BACKOFF.md)
- Test script: `node test-retry.js`

1648
package-lock.json generated Normal file

File diff suppressed because it is too large

9
package.json Normal file

@@ -0,0 +1,9 @@
{
"dependencies": {
"dotenv": "^17.2.3",
"puppeteer": "^24.27.0",
"puppeteer-extra": "^3.3.6",
"puppeteer-extra-plugin-stealth": "^2.11.2",
"random-useragent": "^0.5.0"
}
}

723
scraper.js Normal file

@@ -0,0 +1,723 @@
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
const randomUseragent = require("random-useragent");
const fs = require("fs");
const {
randomSleep,
simulateHumanBehavior,
handleRateLimitedRequest,
} = require("./utils.js");
puppeteer.use(StealthPlugin());
const INSTAGRAM_URL = "https://www.instagram.com";
const SESSION_FILE = "session_cookies.json";
async function loginWithSession(
{ username, password },
proxy = null,
useExistingSession = true
) {
const browserArgs = [];
if (proxy) browserArgs.push(`--proxy-server=${proxy}`);
const userAgent = randomUseragent.getRandom();
const browser = await puppeteer.launch({
headless: false,
args: browserArgs,
});
const page = await browser.newPage();
await page.setUserAgent(userAgent);
// Set a large viewport to ensure modal behavior (Instagram shows modals on desktop/large screens)
await page.setViewport({
width: 1920, // Standard desktop width
height: 1080, // Standard desktop height
});
// Set browser timezone
await page.evaluateOnNewDocument(() => {
Object.defineProperty(Intl.DateTimeFormat.prototype, "resolvedOptions", {
value: function () {
return { timeZone: "America/New_York" };
},
});
});
// Monitor for rate limit responses
page.on("response", (response) => {
if (response.status() === 429) {
console.log(
`WARNING: Rate limit detected (429) on ${response
.url()
.substring(0, 80)}...`
);
}
});
// Try to load existing session if available
if (useExistingSession && fs.existsSync(SESSION_FILE)) {
try {
console.log("Found existing session, attempting to reuse...");
const sessionData = JSON.parse(fs.readFileSync(SESSION_FILE, "utf-8"));
if (sessionData.cookies && sessionData.cookies.length > 0) {
await page.setCookie(...sessionData.cookies);
console.log(
`Loaded ${sessionData.cookies.length} cookies from session`
);
// Navigate to Instagram to check if session is valid
await page.goto(INSTAGRAM_URL, { waitUntil: "networkidle2" });
await randomSleep(2000, 3000);
// Check if we're logged in by looking for profile link or login page
const isLoggedIn = await page.evaluate(() => {
// If we see login/signup links, we're not logged in
const loginLink = document.querySelector(
'a[href="/accounts/login/"]'
);
return !loginLink;
});
if (isLoggedIn) {
console.log("Session is valid! Skipping login.");
return { browser, page, sessionReused: true };
} else {
console.log("Session expired, proceeding with fresh login...");
}
}
} catch (error) {
console.log("Failed to load session, proceeding with fresh login...");
}
}
// Fresh login flow
return await performLogin(page, { username, password }, browser);
}
async function performLogin(page, { username, password }, browser) {
// Navigate to login page
await handleRateLimitedRequest(
page,
async () => {
await page.goto(`${INSTAGRAM_URL}/accounts/login/`, {
waitUntil: "networkidle2",
});
},
"during login page load"
);
console.log("Waiting for login form to appear...");
// Wait for the actual login form to load
await page.waitForSelector('input[name="username"]', {
visible: true,
timeout: 60000,
});
console.log("Login form loaded!");
// Simulate human behavior
await simulateHumanBehavior(page, { mouseMovements: 3, scrolls: 1 });
await randomSleep(500, 1000);
await page.type('input[name="username"]', username, { delay: 130 });
await randomSleep(300, 700);
await page.type('input[name="password"]', password, { delay: 120 });
await simulateHumanBehavior(page, { mouseMovements: 2, scrolls: 0 });
await randomSleep(500, 1000);
await Promise.all([
page.click('button[type="submit"]'),
page.waitForNavigation({ waitUntil: "networkidle2" }),
]);
await randomSleep(1000, 2000);
return { browser, page, sessionReused: false };
}
async function extractSession(page) {
// Return cookies/session tokens for reuse
const cookies = await page.cookies();
return { cookies };
}
async function getFollowingsList(page, targetUsername, maxUsers = 100) {
const followingData = [];
const followingUsernames = [];
let requestCount = 0;
const requestsPerBatch = 12; // Instagram typically returns ~12 users per request
// Set up response listener to capture API responses (no need for request interception)
  const followingResponseHandler = async (response) => {
const url = response.url();
// Intercept the following list API endpoint
if (url.includes("/friendships/") && url.includes("/following/")) {
try {
const json = await response.json();
// Check for rate limit in response
if (json.status === "fail" || json.message?.includes("rate limit")) {
console.log("WARNING: Rate limit detected in API response");
return;
}
if (json.users && Array.isArray(json.users)) {
json.users.forEach((user) => {
if (followingData.length < maxUsers) {
followingData.push({
pk: user.pk,
pk_id: user.pk_id,
username: user.username,
full_name: user.full_name,
profile_pic_url: user.profile_pic_url,
is_verified: user.is_verified,
is_private: user.is_private,
fbid_v2: user.fbid_v2,
latest_reel_media: user.latest_reel_media,
account_badges: user.account_badges,
});
followingUsernames.push(user.username);
}
});
requestCount++;
console.log(
`Captured ${followingData.length} users so far (Request #${requestCount})...`
);
}
} catch (err) {
// Not JSON or parsing error, ignore
}
}
  };
  page.on("response", followingResponseHandler);
await handleRateLimitedRequest(
page,
async () => {
await page.goto(`${INSTAGRAM_URL}/${targetUsername}/`, {
waitUntil: "networkidle2",
});
},
`while loading profile @${targetUsername}`
);
// Simulate browsing the profile before clicking following
await simulateHumanBehavior(page, { mouseMovements: 4, scrolls: 2 });
await randomSleep(1000, 2000);
await page.waitForSelector('a[href$="/following/"]', { timeout: 10000 });
// Hover over the following link before clicking
await page.hover('a[href$="/following/"]');
await randomSleep(300, 600);
await page.click('a[href$="/following/"]');
// Wait for either modal or page navigation
await randomSleep(1500, 2500);
// Detect if modal opened or if we navigated to a new page
const layoutType = await page.evaluate(() => {
const hasModal = !!document.querySelector('div[role="dialog"]');
const urlHasFollowing = window.location.pathname.includes("/following");
return { hasModal, urlHasFollowing };
});
if (layoutType.hasModal) {
console.log("Following modal opened (desktop layout)");
} else if (layoutType.urlHasFollowing) {
console.log("Navigated to following page (mobile/small viewport layout)");
} else {
console.log("Warning: Could not detect following list layout");
}
// Wait for the list content to load
await randomSleep(1500, 2500);
// Verify we can see the list items
const hasListItems = await page.evaluate(() => {
return (
document.querySelectorAll('div.x1qnrgzn, a[href*="following"]').length > 0
);
});
if (hasListItems) {
console.log("Following list loaded successfully");
} else {
console.log("Warning: List items not detected, but continuing...");
}
// Scroll to load more users while simulating human behavior
const totalRequests = Math.ceil(maxUsers / requestsPerBatch);
let scrollAttempts = 0;
const maxScrollAttempts = Math.min(totalRequests * 3, 50000); // Cap at 50k attempts
let lastDataLength = 0;
let noNewDataCount = 0;
console.log(
`Will attempt to scroll up to ${maxScrollAttempts} times to reach ${maxUsers} users...`
);
while (
followingData.length < maxUsers &&
scrollAttempts < maxScrollAttempts
) {
// Check if we're still getting new data
if (followingData.length === lastDataLength) {
noNewDataCount++;
// If no new data after 8 consecutive scroll attempts, we've reached the end
if (noNewDataCount >= 8) {
console.log(
`No new data after ${noNewDataCount} attempts. Reached end of list.`
);
break;
}
if (noNewDataCount % 3 === 0) {
console.log(
`Still at ${followingData.length} users after ${noNewDataCount} scrolls...`
);
}
} else {
if (noNewDataCount > 0) {
console.log(
`Got new data! Now at ${followingData.length} users (was stuck for ${noNewDataCount} attempts)`
);
}
noNewDataCount = 0; // Reset counter when we get new data
lastDataLength = followingData.length;
}
// Every ~12 users loaded (one request completed), simulate human behavior
if (
requestCount > 0 &&
requestCount % Math.max(1, Math.ceil(totalRequests / 5)) === 0
) {
await simulateHumanBehavior(page, {
mouseMovements: 2,
scrolls: 0, // We're manually controlling scroll below
});
}
// Occasionally move mouse while scrolling
if (scrollAttempts % 5 === 0) {
const viewport = await page.viewport();
await page.mouse.move(
Math.floor(Math.random() * viewport.width),
Math.floor(Math.random() * viewport.height),
{ steps: 10 }
);
}
// Scroll the dialog's scrollable container - comprehensive approach
const scrollResult = await page.evaluate(() => {
// Find the scrollable container inside the dialog
const dialog = document.querySelector('div[role="dialog"]');
if (!dialog) {
return { success: false, error: "No dialog found", scrolled: false };
}
// Look for the scrollable div - it has overflow: hidden auto
const scrollableElements = dialog.querySelectorAll("div");
let scrollContainer = null;
for (const elem of scrollableElements) {
const style = window.getComputedStyle(elem);
const overflow = style.overflow || style.overflowY;
// Check if element is scrollable
if (
(overflow === "auto" || overflow === "scroll") &&
elem.scrollHeight > elem.clientHeight
) {
scrollContainer = elem;
break;
}
}
if (!scrollContainer) {
// Fallback: try specific class from your HTML
scrollContainer =
dialog.querySelector("div.x6nl9eh") ||
dialog.querySelector('div[style*="overflow"]');
}
if (!scrollContainer) {
return {
success: false,
error: "No scrollable container found",
scrolled: false,
};
}
const oldScrollTop = scrollContainer.scrollTop;
const scrollHeight = scrollContainer.scrollHeight;
const clientHeight = scrollContainer.clientHeight;
// Scroll down
scrollContainer.scrollTop += 400 + Math.floor(Math.random() * 200);
const newScrollTop = scrollContainer.scrollTop;
const actuallyScrolled = newScrollTop > oldScrollTop;
const atBottom = scrollHeight - newScrollTop - clientHeight < 50;
return {
success: true,
scrolled: actuallyScrolled,
atBottom: atBottom,
scrollTop: newScrollTop,
scrollHeight: scrollHeight,
};
});
if (!scrollResult.success) {
console.log(`Scroll error: ${scrollResult.error}`);
// Try alternative: scroll the page itself
await page.evaluate(() => window.scrollBy(0, 300));
} else if (!scrollResult.scrolled) {
console.log("Reached scroll bottom - cannot scroll further");
}
// Check if we've reached the bottom and loading indicator is visible
const loadingStatus = await page.evaluate(() => {
const loader = document.querySelector('svg[aria-label="Loading..."]');
if (!loader) {
return { exists: false, visible: false, reachedBottom: true };
}
// Check if loader is in viewport (visible)
const rect = loader.getBoundingClientRect();
const isVisible =
rect.top >= 0 &&
rect.left >= 0 &&
rect.bottom <= window.innerHeight &&
rect.right <= window.innerWidth;
return { exists: true, visible: isVisible, reachedBottom: isVisible };
});
if (!loadingStatus.exists) {
// No loading indicator at all - might have reached the actual end
console.log("No loading indicator found - may have reached end of list");
} else if (loadingStatus.visible) {
// Loader is visible, meaning we've scrolled to it
console.log("Loading indicator visible, waiting for more data...");
await randomSleep(2500, 3500); // Wait longer for Instagram to load more
} else {
// Loader exists but not visible yet, keep scrolling
await randomSleep(1500, 2500);
}
scrollAttempts++;
// Progress update every 50 scrolls
if (scrollAttempts % 50 === 0) {
console.log(
`Progress: ${followingData.length} users captured after ${scrollAttempts} scroll attempts...`
);
}
}
  // Detach the listener so repeated calls don't stack handlers on the page
  page.off("response", followingResponseHandler);
  console.log(`Total users captured: ${followingData.length}`);
return {
usernames: followingUsernames.slice(0, maxUsers),
fullData: followingData.slice(0, maxUsers),
};
}
async function scrapeProfile(page, username) {
console.log(`Scraping profile: @${username}`);
let profileData = { username };
let dataCapture = false;
// Set up response listener to intercept API calls
const responseHandler = async (response) => {
const url = response.url();
try {
// Check for GraphQL or REST API endpoints
if (
url.includes("/api/v1/users/web_profile_info/") ||
url.includes("/graphql/query")
) {
const contentType = response.headers()["content-type"] || "";
if (!contentType.includes("json")) return;
const json = await response.json();
// Handle web_profile_info endpoint (REST API)
if (url.includes("web_profile_info") && json.data?.user) {
if (dataCapture) return; // Already captured, skip duplicate
const user = json.data.user;
profileData = {
username: user.username,
full_name: user.full_name,
bio: user.biography || "",
followerCount: user.edge_followed_by?.count || 0,
followingCount: user.edge_follow?.count || 0,
profile_pic_url:
user.hd_profile_pic_url_info?.url || user.profile_pic_url,
is_verified: user.is_verified,
is_private: user.is_private,
is_business: user.is_business_account,
category: user.category_name,
external_url: user.external_url,
email: null,
phone: null,
};
// Extract email/phone from bio
if (profileData.bio) {
const emailMatch = profileData.bio.match(
/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
);
profileData.email = emailMatch ? emailMatch[0] : null;
const phoneMatch = profileData.bio.match(
/(\+\d{1,3}[- ]?)?\d{10,14}/
);
profileData.phone = phoneMatch ? phoneMatch[0] : null;
}
dataCapture = true;
}
// Handle GraphQL endpoint
else if (url.includes("graphql") && json.data?.user) {
if (dataCapture) return; // Already captured, skip duplicate
const user = json.data.user;
profileData = {
username: user.username,
full_name: user.full_name,
bio: user.biography || "",
followerCount: user.follower_count || 0,
followingCount: user.following_count || 0,
profile_pic_url:
user.hd_profile_pic_url_info?.url || user.profile_pic_url,
is_verified: user.is_verified,
is_private: user.is_private,
is_business: user.is_business_account || user.is_business,
category: user.category_name || user.category,
external_url: user.external_url,
email: null,
phone: null,
};
// Extract email/phone from bio
if (profileData.bio) {
const emailMatch = profileData.bio.match(
/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
);
profileData.email = emailMatch ? emailMatch[0] : null;
const phoneMatch = profileData.bio.match(
/(\+\d{1,3}[- ]?)?\d{10,14}/
);
profileData.phone = phoneMatch ? phoneMatch[0] : null;
}
dataCapture = true;
}
}
} catch (e) {
// Ignore errors from parsing non-JSON responses
}
};
page.on("response", responseHandler);
// Navigate to profile page
await handleRateLimitedRequest(
page,
async () => {
await page.goto(`${INSTAGRAM_URL}/${username}/`, {
waitUntil: "domcontentloaded",
});
},
`while loading profile @${username}`
);
// Wait for API calls to complete
await randomSleep(2000, 3000);
// Remove listener
page.off("response", responseHandler);
// If API capture worked, return the data
if (dataCapture) {
return profileData;
}
// Otherwise, fall back to DOM scraping
console.log(`⚠️ API capture failed for @${username}, using DOM fallback...`);
return await scrapeProfileFallback(page, username);
}
// Fallback function using DOM scraping
async function scrapeProfileFallback(page, username) {
console.log(`Using DOM scraping for @${username}...`);
const domData = await page.evaluate(() => {
// Try multiple selectors for bio
let bio = "";
const bioSelectors = [
"span._ap3a._aaco._aacu._aacx._aad7._aade", // Updated bio class (2025)
"span._ap3a._aaco._aacu._aacx._aad6._aade", // Previous bio class
"div._aacl._aaco._aacu._aacx._aad7._aade", // Alternative bio with _aad7
"div._aacl._aaco._aacu._aacx._aad6._aade", // Alternative bio with _aad6
"h1 + div span", // Bio after username
"header section div span", // Generic header bio
'div.x7a106z span[dir="auto"]', // Bio container with dir attribute
];
for (const selector of bioSelectors) {
const elem = document.querySelector(selector);
if (elem && elem.innerText && elem.innerText.length > 3) {
bio = elem.innerText;
break;
}
}
// Get follower/following counts using href-based selectors (stable)
let followerCount = 0;
let followingCount = 0;
// Method 1: Find by href (most reliable)
const followersLink = document.querySelector('a[href*="/followers/"]');
const followingLink = document.querySelector('a[href*="/following/"]');
if (followersLink) {
const text = followersLink.innerText || followersLink.textContent || "";
const match = text.match(/[\d,\.]+/);
if (match) {
followerCount = match[0].replace(/,/g, "").replace(/\./g, "");
}
}
if (followingLink) {
const text = followingLink.innerText || followingLink.textContent || "";
const match = text.match(/[\d,\.]+/);
if (match) {
followingCount = match[0].replace(/,/g, "").replace(/\./g, "");
}
}
// Alternative: Look in meta tags if href method fails
if (!followerCount) {
const metaContent =
document.querySelector('meta[property="og:description"]')?.content ||
"";
const followerMatch = metaContent.match(/([\d,\.KMB]+)\s+Followers/i);
const followingMatch = metaContent.match(/([\d,\.KMB]+)\s+Following/i);
if (followerMatch) followerCount = followerMatch[1].replace(/,/g, "");
if (followingMatch) followingCount = followingMatch[1].replace(/,/g, "");
}
// Extract email/phone from bio
let emailMatch = bio.match(
/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
);
let email = emailMatch ? emailMatch[0] : null;
let phoneMatch = bio.match(/(\+\d{1,3}[- ]?)?\d{10,14}/);
let phone = phoneMatch ? phoneMatch[0] : null;
return {
bio,
followerCount: parseInt(followerCount) || 0,
followingCount: parseInt(followingCount) || 0,
email,
phone,
};
});
return {
username,
...domData,
};
}
async function cronJobs(fn, intervalSec, stopAfter = 0) {
let runCount = 0;
let stop = false;
const timer = setInterval(async () => {
if (stop || (stopAfter && runCount >= stopAfter)) {
clearInterval(timer);
return;
}
await fn();
runCount++;
}, intervalSec * 1000);
return () => {
stop = true;
};
}
async function scrapeWorkflow(
creds,
targetUsername,
proxy = null,
maxFollowingToScrape = 10
) {
  const { browser, page } = await loginWithSession(creds, proxy);
try {
// Extract current session details for persistence
const session = await extractSession(page);
// Grab followings with full data
const followingsData = await getFollowingsList(
page,
targetUsername,
maxFollowingToScrape
);
console.log(
`Processing ${followingsData.usernames.length} following accounts...`
);
for (let i = 0; i < followingsData.usernames.length; i++) {
// Add occasional longer breaks to simulate human behavior
if (i > 0 && i % 10 === 0) {
console.log(`Taking a human-like break after ${i} profiles...`);
await simulateHumanBehavior(page, { mouseMovements: 5, scrolls: 3 });
await randomSleep(5000, 10000); // Longer break every 10 profiles
}
const profileInfo = await scrapeProfile(
page,
followingsData.usernames[i]
);
console.log(JSON.stringify(profileInfo));
// Implement rate limiting + anti-bot sleep
await randomSleep(2500, 6000);
}
// Optionally return the full data for further processing
return {
session,
followingsFullData: followingsData.fullData,
scrapedProfiles: followingsData.usernames.length,
};
} catch (err) {
console.error("Scrape error:", err);
} finally {
await browser.close();
}
}
module.exports = {
loginWithSession,
extractSession,
scrapeWorkflow,
getFollowingsList,
scrapeProfile,
cronJobs,
};

356
server.js Normal file

@@ -0,0 +1,356 @@
const {
loginWithSession,
extractSession,
scrapeWorkflow,
getFollowingsList,
scrapeProfile,
cronJobs,
} = require("./scraper.js");
const { randomSleep, simulateHumanBehavior } = require("./utils.js");
const fs = require("fs");
require("dotenv").config();
// Full workflow: Login, browse, scrape followings and profiles
async function fullScrapingWorkflow() {
console.log("Starting Instagram Full Scraping Workflow...\n");
// Start total timer
const totalStartTime = Date.now();
const credentials = {
username: process.env.INSTAGRAM_USERNAME || "your_username",
password: process.env.INSTAGRAM_PASSWORD || "your_password",
};
const targetUsername = process.env.TARGET_USERNAME || "instagram";
const maxFollowing = parseInt(process.env.MAX_FOLLOWING || "20", 10);
const maxProfilesToScrape = parseInt(process.env.MAX_PROFILES || "5", 10);
const proxy = process.env.PROXY || null;
let browser, page;
try {
console.log("Configuration:");
console.log(` Target: @${targetUsername}`);
console.log(` Max following to fetch: ${maxFollowing}`);
console.log(` Max profiles to scrape: ${maxProfilesToScrape}`);
console.log(` Proxy: ${proxy || "None"}\n`);
// Step 1: Login (with session reuse)
console.log("Step 1: Logging in to Instagram...");
const loginResult = await loginWithSession(credentials, proxy, true);
browser = loginResult.browser;
page = loginResult.page;
if (loginResult.sessionReused) {
console.log("Reused existing session!\n");
} else {
console.log("Fresh login successful!\n");
}
// Step 2: Extract and save session
console.log("Step 2: Extracting session cookies...");
const session = await extractSession(page);
fs.writeFileSync("session_cookies.json", JSON.stringify(session, null, 2));
console.log(`Session saved (${session.cookies.length} cookies)\n`);
// Step 3: Simulate browsing before scraping
console.log("Step 3: Simulating human browsing behavior...");
await simulateHumanBehavior(page, { mouseMovements: 5, scrolls: 3 });
await randomSleep(2000, 4000);
console.log("Browsing simulation complete\n");
// Step 4: Get followings list
console.log(`👥 Step 4: Fetching following list for @${targetUsername}...`);
const followingsStartTime = Date.now();
const followingsData = await getFollowingsList(
page,
targetUsername,
maxFollowing
);
const followingsEndTime = Date.now();
const followingsTime = (
(followingsEndTime - followingsStartTime) /
1000
).toFixed(2);
console.log(
`✓ Captured ${followingsData.fullData.length} followings in ${followingsTime}s\n`
);
// Save followings data
const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
const followingsFile = `followings_${targetUsername}_${timestamp}.json`;
fs.writeFileSync(
followingsFile,
JSON.stringify(
{
targetUsername,
scrapedAt: new Date().toISOString(),
totalFollowings: followingsData.fullData.length,
followings: followingsData.fullData,
},
null,
2
)
);
console.log(`Followings data saved to: ${followingsFile}\n`);
// Step 5: Scrape individual profiles
console.log(
`📊 Step 5: Scraping ${maxProfilesToScrape} individual profiles...`
);
const profilesStartTime = Date.now();
const profilesData = [];
const usernamesToScrape = followingsData.usernames.slice(
0,
maxProfilesToScrape
);
for (let i = 0; i < usernamesToScrape.length; i++) {
const username = usernamesToScrape[i];
console.log(
` [${i + 1}/${usernamesToScrape.length}] Scraping @${username}...`
);
try {
const profileData = await scrapeProfile(page, username);
profilesData.push(profileData);
console.log(` @${username}: ${profileData.followerCount} followers`);
// Human-like delay between profiles
await randomSleep(3000, 6000);
// Take a longer break every 3 profiles
if ((i + 1) % 3 === 0 && i < usernamesToScrape.length - 1) {
console.log(" ⏸ Taking a human-like break...");
await simulateHumanBehavior(page, { mouseMovements: 4, scrolls: 2 });
await randomSleep(8000, 12000);
}
} catch (error) {
console.log(` Failed to scrape @${username}: ${error.message}`);
}
}
const profilesEndTime = Date.now();
const profilesTime = ((profilesEndTime - profilesStartTime) / 1000).toFixed(
2
);
console.log(
`\n✓ Scraped ${profilesData.length} profiles in ${profilesTime}s\n`
);
// Step 6: Save profiles data
console.log("Step 6: Saving profile data...");
const profilesFile = `profiles_${targetUsername}_${timestamp}.json`;
fs.writeFileSync(
profilesFile,
JSON.stringify(
{
targetUsername,
scrapedAt: new Date().toISOString(),
totalProfiles: profilesData.length,
profiles: profilesData,
},
null,
2
)
);
console.log(`Profiles data saved to: ${profilesFile}\n`);
// Calculate total time
const totalEndTime = Date.now();
const totalTime = ((totalEndTime - totalStartTime) / 1000).toFixed(2);
const totalMinutes = Math.floor(totalTime / 60);
const totalSeconds = (totalTime % 60).toFixed(2);
// Step 7: Summary
console.log("=".repeat(60));
console.log("📊 SCRAPING SUMMARY");
console.log("=".repeat(60));
console.log(`✓ Logged in successfully`);
console.log(`✓ Session cookies saved`);
console.log(
`${followingsData.fullData.length} followings captured in ${followingsTime}s`
);
console.log(
`${profilesData.length} profiles scraped in ${profilesTime}s`
);
console.log(`\n📁 Files created:`);
console.log(`${followingsFile}`);
console.log(`${profilesFile}`);
console.log(` • session_cookies.json`);
console.log(
`\n⏱️ Total execution time: ${totalMinutes}m ${totalSeconds}s`
);
console.log("=".repeat(60) + "\n");
return {
success: true,
followingsCount: followingsData.fullData.length,
profilesCount: profilesData.length,
followingsData: followingsData.fullData,
profilesData,
session,
timings: {
followingsTime: parseFloat(followingsTime),
profilesTime: parseFloat(profilesTime),
totalTime: parseFloat(totalTime),
},
};
} catch (error) {
console.error("\nScraping workflow failed:");
console.error(error.message);
console.error(error.stack);
throw error;
} finally {
if (browser) {
console.log("Closing browser...");
await browser.close();
console.log("Browser closed\n");
}
}
}
// Alternative: Use the built-in scrapeWorkflow function
async function simpleWorkflow() {
console.log("Starting Simple Scraping Workflow (using scrapeWorkflow)...\n");
const credentials = {
username: process.env.INSTAGRAM_USERNAME || "your_username",
password: process.env.INSTAGRAM_PASSWORD || "your_password",
};
const targetUsername = process.env.TARGET_USERNAME || "instagram";
const maxFollowing = parseInt(process.env.MAX_FOLLOWING || "20", 10);
const proxy = process.env.PROXY || null;
try {
console.log(`Target: @${targetUsername}`);
console.log(`Max following to scrape: ${maxFollowing}`);
console.log(`Using proxy: ${proxy || "None"}\n`);
const result = await scrapeWorkflow(
credentials,
targetUsername,
proxy,
maxFollowing
);
console.log("\nScraping completed successfully!");
console.log(`Total profiles scraped: ${result.scrapedProfiles}`);
console.log(
`Full following data captured: ${result.followingsFullData.length} users`
);
// Save the data
if (result.followingsFullData.length > 0) {
const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
const filename = `scraped_data_${targetUsername}_${timestamp}.json`;
fs.writeFileSync(
filename,
JSON.stringify(
{
targetUsername,
scrapedAt: new Date().toISOString(),
totalUsers: result.followingsFullData.length,
data: result.followingsFullData,
},
null,
2
)
);
console.log(`Data saved to: ${filename}`);
}
return result;
} catch (error) {
console.error("\nScraping failed:");
console.error(error.message);
throw error;
}
}
// Scheduled scraping with cron
async function scheduledScraping() {
console.log("Starting Scheduled Scraping...\n");
const credentials = {
username: process.env.INSTAGRAM_USERNAME || "your_username",
password: process.env.INSTAGRAM_PASSWORD || "your_password",
};
const targetUsername = process.env.TARGET_USERNAME || "instagram";
const intervalMinutes = parseInt(process.env.SCRAPE_INTERVAL || "60", 10);
const maxRuns = parseInt(process.env.MAX_RUNS || "5", 10);
console.log(
`Will scrape @${targetUsername} every ${intervalMinutes} minutes`
);
console.log(`Maximum runs: ${maxRuns}\n`);
let runCount = 0;
const stopCron = await cronJobs(
async () => {
runCount++;
console.log(`\n${"=".repeat(60)}`);
console.log(
`📅 Scheduled Run #${runCount} - ${new Date().toLocaleString()}`
);
console.log("=".repeat(60));
try {
await simpleWorkflow();
} catch (error) {
console.error(`Run #${runCount} failed:`, error.message);
}
if (runCount >= maxRuns) {
console.log(`\nCompleted ${maxRuns} scheduled runs. Stopping...`);
process.exit(0);
}
},
intervalMinutes * 60, // Convert to seconds
maxRuns
);
console.log("Cron job started. Press Ctrl+C to stop.\n");
}
// Main entry point
if (require.main === module) {
const mode = process.env.MODE || "full"; // full, simple, or scheduled
console.log(`Mode: ${mode}\n`);
let workflow;
if (mode === "simple") {
workflow = simpleWorkflow();
} else if (mode === "scheduled") {
workflow = scheduledScraping();
} else {
workflow = fullScrapingWorkflow();
}
  workflow
    .then(() => {
      // In scheduled mode the promise resolves as soon as the cron job is set
      // up; exiting here would kill the interval before any run fires
      if (mode !== "scheduled") {
        console.log("All done!");
        process.exit(0);
      }
    })
.catch((err) => {
console.error("\nFatal error:", err);
process.exit(1);
});
}
module.exports = {
fullScrapingWorkflow,
simpleWorkflow,
scheduledScraping,
};

146
utils.js Normal file

@@ -0,0 +1,146 @@
function randomSleep(minMs = 2000, maxMs = 5000) {
const delay = Math.floor(Math.random() * (maxMs - minMs + 1)) + minMs;
return new Promise((res) => setTimeout(res, delay));
}
async function humanLikeMouseMovement(page, steps = 10) {
// Simulate human-like mouse movements across the page
const viewport = await page.viewport();
const width = viewport.width;
const height = viewport.height;
for (let i = 0; i < steps; i++) {
const x = Math.floor(Math.random() * width);
const y = Math.floor(Math.random() * height);
await page.mouse.move(x, y, { steps: Math.floor(Math.random() * 10) + 5 });
await randomSleep(100, 500);
}
}
async function randomScroll(page, scrollCount = 3) {
// Perform random scrolling to simulate human behavior
for (let i = 0; i < scrollCount; i++) {
const scrollAmount = Math.floor(Math.random() * 300) + 100;
await page.evaluate((amount) => {
window.scrollBy(0, amount);
}, scrollAmount);
await randomSleep(800, 1500);
}
}
async function simulateHumanBehavior(page, options = {}) {
// Combined function to simulate various human-like behaviors
const { mouseMovements = 5, scrolls = 2, randomClicks = false } = options;
// Random mouse movements
if (mouseMovements > 0) {
await humanLikeMouseMovement(page, mouseMovements);
}
// Random scrolling
if (scrolls > 0) {
await randomScroll(page, scrolls);
}
  // Optional: Hover over a random non-interactive element (move only, never
  // click, so no navigation or action is triggered)
  if (randomClicks) {
    try {
      const rect = await page.evaluate(() => {
        const elements = document.querySelectorAll("div, span, p");
        if (elements.length === 0) return null;
        const randomElement =
          elements[Math.floor(Math.random() * elements.length)];
        const { x, y, width, height } = randomElement.getBoundingClientRect();
        return { x, y, width, height };
      });
      if (rect) {
        await page.mouse.move(
          rect.x + rect.width / 2,
          rect.y + rect.height / 2,
          { steps: 8 }
        );
      }
    } catch (err) {
      // Ignore errors from random element selection
    }
  }
await randomSleep(500, 1000);
}
async function withRetry(fn, options = {}) {
const {
maxRetries = 3,
initialDelay = 2000,
maxDelay = 30000,
shouldRetry = (error) => true,
} = options;
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await fn();
} catch (error) {
const isLastAttempt = attempt === maxRetries - 1;
// Check if we should retry this error
if (!shouldRetry(error) || isLastAttempt) {
throw error;
}
// Calculate exponential backoff delay: 2s, 4s, 8s, 16s, 30s (capped)
const exponentialDelay = Math.min(
initialDelay * Math.pow(2, attempt),
maxDelay
);
// Add jitter (randomize ±20%) to avoid thundering herd
const jitter = exponentialDelay * (0.8 + Math.random() * 0.4);
const delay = Math.floor(jitter);
console.log(
`Retry attempt ${attempt + 1}/${maxRetries} after ${delay}ms delay...`
);
console.log(`Error: ${error.message || error}`);
await randomSleep(delay, delay);
}
}
}
async function handleRateLimitedRequest(page, requestFn, context = "") {
return withRetry(requestFn, {
maxRetries: 5,
initialDelay: 2000,
maxDelay: 60000,
shouldRetry: (error) => {
// Retry on rate limit (429) or temporary errors
if (error.status === 429 || error.statusCode === 429) {
console.log(`Rate limited (429) ${context}. Backing off...`);
return true;
}
// Retry on 5xx server errors
if (error.status >= 500 || error.statusCode >= 500) {
console.log(
`Server error (${
error.status || error.statusCode
}) ${context}. Retrying...`
);
return true;
}
// Retry on network errors
if (error.code === "ECONNRESET" || error.code === "ETIMEDOUT") {
console.log(`Network error (${error.code}) ${context}. Retrying...`);
return true;
}
// Don't retry on client errors (4xx except 429)
return false;
},
});
}
module.exports = {
randomSleep,
humanLikeMouseMovement,
randomScroll,
simulateHumanBehavior,
withRetry,
handleRateLimitedRequest,
};