Playwright

Scraping Browser is a serverless platform for extracting data from dynamic websites. It integrates with Playwright, so developers can run, manage, and monitor headless browsers without provisioning dedicated servers.

Installing Necessary Libraries

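First, install playwright-core, a lightweight version of Playwright used to connect to existing browser instances, together with the Scrapeless SDK (@scrapeless-ai/sdk) that the examples below import:

npm install playwright-core @scrapeless-ai/sdk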

Writing Code to Connect to Scraping Browser

In your Playwright code, connect to Scraping Browser using the following:

const { Playwright } = require('@scrapeless-ai/sdk');
 
(async () => {
    const browser = await Playwright.connect({
        apiKey: 'Your API key',
        sessionName: 'sdk_test',
        sessionTTL: 180,
        proxyCountry: 'US',
        sessionRecording: true,
    });
 
    const context = browser.contexts()[0];
    const page = await context.newPage();
 
    await page.goto('https://www.scrapeless.com');
    console.log(await page.title());
    await browser.close();
})();
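
If a navigation or extraction step throws, the script above exits before it reaches browser.close(). A minimal sketch of one way to guard against that is shown below. withBrowser is a hypothetical helper, not part of the SDK; it takes the connect function as an argument so it can wrap whatever Playwright.connect call you already use. The SCRAPELESS_API_KEY environment variable in the usage comment is likewise an assumed name.

```javascript
// Hypothetical helper (not part of @scrapeless-ai/sdk): runs `task` with a
// connected browser and closes the browser even if `task` throws.
async function withBrowser(connect, task) {
    const browser = await connect();
    try {
        return await task(browser);
    } finally {
        // Runs on both success and failure.
        await browser.close();
    }
}

// Usage sketch, assuming the connect options shown above and an API key
// stored in an environment variable (assumed name):
//
// const { Playwright } = require('@scrapeless-ai/sdk');
// withBrowser(
//     () => Playwright.connect({ apiKey: process.env.SCRAPELESS_API_KEY }),
//     async (browser) => {
//         const page = await browser.contexts()[0].newPage();
//         await page.goto('https://www.scrapeless.com');
//         return page.title();
//     }
// ).then(console.log);
```

Because the cleanup lives in a finally block, the error from the task still propagates to the caller after the browser has been closed.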

This allows you to leverage Scraping Browser’s infrastructure, including scalability, IP rotation, and global access.

Practical Examples

Here are some common Playwright operations after integrating Scraping Browser:

Navigation and Page Content Extraction

const page = await browser.newPage();
await page.goto('https://www.example.com');
console.log(await page.title());
const html = await page.content();
console.log(html);
await browser.close();
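
On dynamic sites, page.content() called right after page.goto() can return the HTML before client-side rendering has finished. One way to handle this, sketched below, is to wait for an element that signals the content is ready. fetchRenderedHtml and its readySelector parameter are hypothetical names used for illustration; goto, waitForSelector, and content are standard Playwright page methods.

```javascript
// Hypothetical helper: navigates, waits for a selector that signals the
// dynamic content has rendered, then returns the page HTML.
async function fetchRenderedHtml(page, url, readySelector, timeoutMs = 30000) {
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs });
    // waitForSelector rejects with a timeout error if the element never appears.
    await page.waitForSelector(readySelector, { timeout: timeoutMs });
    return page.content();
}

// Usage sketch (the 'h1' selector is only an example):
// const html = await fetchRenderedHtml(page, 'https://www.example.com', 'h1');
```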

Taking Screenshots

const context = browser.contexts()[0];
const page = await context.newPage();
await page.goto('https://www.example.com');
await page.screenshot({ path: 'example.png' });
console.log('Screenshot saved as example.png');
await browser.close();

Running Custom Code

const context = browser.contexts()[0];
const page = await context.newPage();
await page.goto('https://www.example.com');
const result = await page.evaluate(() => document.title);
console.log('Page title:', result);
await browser.close();
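
Custom code can also run over a set of elements rather than the whole document. The sketch below uses Playwright's locator.evaluateAll to collect link text and URLs. extractLinks is a hypothetical helper, and the default a[href] selector is only an example.

```javascript
// Hypothetical helper: returns [{ text, href }] for every element matching
// `selector`. The callback runs inside the page, so it can only use values
// available there, and its return value must be serializable.
async function extractLinks(page, selector = 'a[href]') {
    return page.locator(selector).evaluateAll((elements) =>
        elements.map((el) => ({
            text: (el.textContent || '').trim(),
            href: el.href,
        }))
    );
}

// Usage sketch:
// const links = await extractLinks(page);
// console.log(links);
```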

Simulate a mouse click

const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.realClick('button[type="submit"]');

Simulate keyboard input

const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.realFill('#login-email', 'scrapeless@gmail.com');
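
The mouse and keyboard calls can be used together, for example to fill a form field and then submit it. The following sketch chains realFill and realClick with the same argument shapes as the examples above. fillAndSubmit is a hypothetical wrapper, and the selectors in the usage comment are the ones from those examples, which will differ on a real site.

```javascript
// Hypothetical wrapper around the realFill and realClick calls.
// `cdpSession` is the object returned by createPlaywrightCDPSession(page).
async function fillAndSubmit(cdpSession, { inputSelector, value, submitSelector }) {
    // Type into the field first, then click, so the form is submitted with the value.
    await cdpSession.realFill(inputSelector, value);
    await cdpSession.realClick(submitSelector);
}

// Usage sketch:
// await fillAndSubmit(cdpSession, {
//     inputSelector: '#login-email',
//     value: 'scrapeless@gmail.com',
//     submitSelector: 'button[type="submit"]',
// });
```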

Get the current page URL using Scrapeless Agent

const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
const { error, liveURL } = await cdpSession.liveURL();
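
The destructured result above carries both an error and a liveURL field, so it is worth checking error before using the URL. A minimal sketch, assuming error is empty (null or undefined) on success; getLiveUrl is a hypothetical helper name:

```javascript
// Hypothetical helper: returns the live URL, or throws if the call reported an error.
// Assumption: `error` is null/undefined when the call succeeds.
async function getLiveUrl(cdpSession) {
    const { error, liveURL } = await cdpSession.liveURL();
    if (error) {
        // `error` may be a string or an Error-like object; handle both.
        throw new Error(`liveURL failed: ${error.message || error}`);
    }
    return liveURL;
}

// Usage sketch:
// const url = await getLiveUrl(cdpSession);
// console.log('Live URL:', url);
```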

Solve Image Captcha

const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.imageToText({
  imageSelector: '.captcha__image',
  inputSelector: 'input[name="captcha"]',
  timeout: 30000,
});

Disable automatic captcha solving

const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.disableCaptchaAutoSolve();

Manually solve a captcha with specified options

const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.solveCaptcha();

Wait for a captcha to be detected on the page

const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.waitCaptchaDetected({ timeout: 30000 });

Wait for a captcha to be solved (either successfully or unsuccessfully)

const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.waitCaptchaSolved({ timeout: 30000 });
