Puppeteer

Scraping Browser is a high-performance, serverless platform for extracting data from dynamic websites. Through its Puppeteer integration, developers can run, manage, and monitor headless browsers without maintaining their own servers, which makes web automation and data collection more efficient.

Installing Necessary Libraries

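First, install the Scrapeless SDK (@scrapeless-ai/sdk), which the examples below import, together with puppeteer-core, a lightweight version of Puppeteer that does not download a bundled browser and is designed for connecting to existing browser instances:

npm install @scrapeless-ai/sdk puppeteer-core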

Writing Code to Connect to Scraping Browser

In your Puppeteer code, connect to Scraping Browser using the following:

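const { Puppeteer } = require('@scrapeless-ai/sdk');

(async () => {
    // Start a remote browser session on Scraping Browser
    const browser = await Puppeteer.connect({
        session_name: 'sdk_test',   // name used to identify the session
        session_ttl: 180,           // session time-to-live
        proxy_country: 'US',        // route traffic through a US proxy
        session_recording: true,    // record the session
        defaultViewport: null       // keep the remote browser's own viewport
    });

    try {
        const page = await browser.newPage();
        await page.goto('https://www.scrapeless.com');
        console.log(await page.title());
    } finally {
        // Close the browser even if navigation fails, so the session is not left open
        await browser.close();
    }
})();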
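Because every session should be closed once its work is done, it can help to wrap the connect/close cycle in one place. The sketch below is a hypothetical helper, not part of the SDK: withScrapingBrowser takes a connect function (assumed to behave like Puppeteer.connect from @scrapeless-ai/sdk), the session options, and a task, and always closes the browser afterwards. The connect function is passed in rather than imported so the helper does not depend on the SDK directly.

```javascript
// Hypothetical helper (not part of @scrapeless-ai/sdk): connect, run one task,
// and always close the browser afterwards.
// `connect` is assumed to behave like Puppeteer.connect from the SDK.
async function withScrapingBrowser(connect, options, task) {
  const browser = await connect(options);
  try {
    // Hand the connected browser to the caller's task and return its result
    return await task(browser);
  } finally {
    // Runs whether the task resolved or threw
    await browser.close();
  }
}

// Usage (needs a Scrapeless account and network access):
// const { Puppeteer } = require('@scrapeless-ai/sdk');
// withScrapingBrowser(
//   (opts) => Puppeteer.connect(opts),
//   { session_name: 'sdk_test', session_ttl: 180, proxy_country: 'US' },
//   async (browser) => {
//     const page = await browser.newPage();
//     await page.goto('https://www.scrapeless.com');
//     return page.title();
//   }
// ).then(console.log);
```

Passing (opts) => Puppeteer.connect(opts) instead of Puppeteer.connect keeps the method bound to its object.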

Pages opened this way run on Scraping Browser's infrastructure, which provides scalability, IP rotation, and global access through country-specific proxies (selected with proxy_country).

Practical Examples

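Here are some common Puppeteer operations once you are connected to Scraping Browser. Each snippet assumes a browser obtained from Puppeteer.connect() as shown above and runs inside an async function: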

Navigation and Page Content Extraction

// ... connect to Scraping Browser as shown above
const page = await browser.newPage();
await page.goto('https://www.example.com');
console.log(await page.title());
const html = await page.content();
console.log(html);
await browser.close();
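Instead of dumping the whole HTML, you can pull out specific elements. The sketch below uses Puppeteer's page.$$eval, which runs a callback in the page against all elements matching a selector, to collect link targets. The extractLinks function is a hypothetical helper for illustration.

```javascript
// Hypothetical helper: collect the unique link targets on a page.
// page.$$eval runs the callback in the browser with every element matching
// the selector and returns the (serializable) result to Node.
async function extractLinks(page, url) {
  await page.goto(url);
  const links = await page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));
  // Drop duplicates while keeping first-seen order
  return [...new Set(links)];
}

// Usage: const links = await extractLinks(page, 'https://www.example.com');
```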

Taking Screenshots

// ... connect to Scraping Browser as shown above
const page = await browser.newPage();
await page.goto('https://www.example.com');
await page.screenshot({ path: 'example.png' });
console.log('Screenshot saved as example.png');
await browser.close();
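By default page.screenshot captures only the visible viewport. The sketch below passes Puppeteer's fullPage: true option to capture the whole scrollable page and derives the file name from the URL's hostname. Both function names are hypothetical.

```javascript
// Hypothetical helper: build a filesystem-safe file name from a URL's hostname
function screenshotPathFor(url) {
  const { hostname } = new URL(url);
  return `${hostname.replace(/[^a-z0-9.-]/gi, '_')}.png`;
}

// Navigate, then capture the entire scrollable page rather than just the viewport
async function captureFullPage(page, url) {
  await page.goto(url);
  const path = screenshotPathFor(url);
  await page.screenshot({ path, fullPage: true });
  return path;
}

// Usage: const saved = await captureFullPage(page, 'https://www.example.com');
```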

Running Custom Code

// ... connect to Scraping Browser as shown above
const page = await browser.newPage();
await page.goto('https://www.example.com');
const result = await page.evaluate(() => document.title);
console.log('Page title:', result);
await browser.close();
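page.evaluate can also take arguments: extra parameters after the function are serialized and passed to it inside the page, and the return value must be serializable. The hypothetical readText helper below uses this to read the trimmed text of the first element matching a selector.

```javascript
// Hypothetical helper: read the trimmed text of the first element matching `selector`.
// The callback runs in the page; `selector` is passed in as an evaluate argument.
async function readText(page, selector) {
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    // Return null rather than throwing when nothing matches
    return el ? el.textContent.trim() : null;
  }, selector);
}

// Usage: const heading = await readText(page, 'h1');
```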

Simulating a Mouse Click

const { createPuppeteerCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPuppeteerCDPSession(page);
await cdpSession.realClick('button[type="submit"]');
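When a click submits a form or follows a link, the usual Puppeteer pattern is to start page.waitForNavigation before clicking, so the navigation event is not missed. The sketch below combines that with the SDK's realClick; it assumes the click actually triggers a navigation (otherwise waitForNavigation will time out), and clickAndWaitForNavigation is a hypothetical name.

```javascript
// Hypothetical helper: click via the CDP session and wait for the resulting navigation.
// Both promises are started together, so the wait is already listening when the click lands.
async function clickAndWaitForNavigation(page, cdpSession, selector) {
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'domcontentloaded' }),
    cdpSession.realClick(selector),
  ]);
  return response;
}

// Usage: await clickAndWaitForNavigation(page, cdpSession, 'button[type="submit"]');
```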

Simulating Keyboard Input

const { createPuppeteerCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPuppeteerCDPSession(page);
await cdpSession.realFill('#login-email', 'scrapeless@gmail.com');
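realFill and realClick can be chained into a simple form flow. The sketch below is a hypothetical login helper: '#login-email' and 'button[type="submit"]' come from the examples above, while '#login-password' is an assumed placeholder selector, so all three can be overridden for a real page.

```javascript
// Hypothetical login flow built from realFill and realClick.
// The default selectors are placeholders; '#login-password' in particular is assumed.
async function logIn(cdpSession, { email, password }, selectors = {}) {
  const {
    emailInput = '#login-email',
    passwordInput = '#login-password',
    submitButton = 'button[type="submit"]',
  } = selectors;
  await cdpSession.realFill(emailInput, email);
  await cdpSession.realFill(passwordInput, password);
  await cdpSession.realClick(submitButton);
}

// Usage: await logIn(cdpSession, { email: process.env.LOGIN_EMAIL, password: process.env.LOGIN_PASSWORD });
```

Reading credentials from environment variables (LOGIN_EMAIL and LOGIN_PASSWORD are illustrative names) keeps them out of the script.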

Getting the Live URL of the Current Page with Scrapeless Agent

const { createPuppeteerCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPuppeteerCDPSession(page);
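const { error, liveURL } = await cdpSession.liveURL();
// Check `error` before using `liveURL`
if (error) console.error('Failed to get live URL:', error);
else console.log('Live URL:', liveURL);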

Solving an Image Captcha

const { createPuppeteerCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPuppeteerCDPSession(page);
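await cdpSession.imageToText({
  imageSelector: '.captcha__image',       // element that shows the captcha image
  inputSelector: 'input[name="captcha"]', // input that receives the recognized text
  timeout: 30000,                         // how long to wait before giving up
});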
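Captcha recognition does not always succeed on the first attempt. The sketch below is a generic bounded-retry helper (a hypothetical name, not an SDK function) that re-runs an async operation a fixed number of times with a pause between attempts. It assumes imageToText rejects when it fails; if it reports failure some other way, the operation passed in would need to throw on that result itself.

```javascript
// Generic bounded retry: re-run `operation` up to `attempts` times,
// waiting `delayMs` between tries, and rethrow the last error if all fail.
async function retry(operation, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Pause only if another attempt remains
      if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage (assumes imageToText rejects on failure):
// await retry(() => cdpSession.imageToText({
//   imageSelector: '.captcha__image',
//   inputSelector: 'input[name="captcha"]',
//   timeout: 30000,
// }), { attempts: 3 });
```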

Disabling Automatic Captcha Solving

const { createPuppeteerCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPuppeteerCDPSession(page);
await cdpSession.disableCaptchaAutoSolve();

Manually Triggering Captcha Solving

const { createPuppeteerCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPuppeteerCDPSession(page);
await cdpSession.solveCaptcha();

Waiting for a Captcha to Be Detected on the Page

const { createPuppeteerCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPuppeteerCDPSession(page);
await cdpSession.waitCaptchaDetected({ timeout: 30000 });

Waiting for a Captcha to Be Solved (Successfully or Not)

const { createPuppeteerCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPuppeteerCDPSession(page);
await cdpSession.waitCaptchaSolved({ timeout: 30000 });
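The captcha calls above can be combined into a manual-solving flow: turn off auto-solve, load the page, wait until a captcha appears, trigger solving, then wait for the outcome. The sketch below uses only those calls; solveCaptchaManually is a hypothetical name, and it assumes the wait calls reject on timeout and that the result of waitCaptchaSolved is worth returning to the caller, since their return values are not documented here.

```javascript
// Hypothetical manual-solving flow using only the CDP session calls shown above.
async function solveCaptchaManually(page, cdpSession, url, timeout = 30000) {
  // Disable auto-solve first so solving happens only when requested below
  await cdpSession.disableCaptchaAutoSolve();
  await page.goto(url);
  await cdpSession.waitCaptchaDetected({ timeout });
  await cdpSession.solveCaptcha();
  // Hand back whatever waitCaptchaSolved resolves with
  return cdpSession.waitCaptchaSolved({ timeout });
}

// Usage: const outcome = await solveCaptchaManually(page, cdpSession, 'https://www.example.com');
```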
