Playwright
Scraping Browser is a serverless platform for extracting data from dynamic websites. It integrates with Playwright, so developers can run, manage, and monitor headless browsers without dedicated server resources.
Installing Necessary Libraries
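First, install playwright-core, the Playwright package without bundled browsers, which is used to connect to existing browser instances, together with the Scrapeless SDK that the examples in this guide import:
npm install playwright-core @scrapeless-ai/sdk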
Writing Code to Connect to Scraping Browser
In your Playwright code, connect to Scraping Browser using the following:
const { Playwright } = require('@scrapeless-ai/sdk');
(async () => {
  const browser = await Playwright.connect({
    session_name: 'sdk_test',
    session_ttl: 180,
    proxy_country: 'US',
    session_recording: true,
    defaultViewport: null
  });
  const context = browser.contexts()[0];
  const page = await context.newPage();
  await page.goto('https://www.scrapeless.com');
  console.log(await page.title());
  await browser.close();
})();
This allows you to leverage Scraping Browser’s infrastructure, including scalability, IP rotation, and global access.
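The connection example closes the browser only on the success path, so an error during navigation could leave the remote session open. A minimal sketch of a wrapper that always closes the browser follows; `withBrowser` is a hypothetical helper name, not part of the SDK, and it takes the connect call as a function so it can be exercised without a live session.

```javascript
// Hypothetical helper (not part of @scrapeless-ai/sdk): runs `task` against a
// connected browser and always closes the browser, even if `task` throws.
async function withBrowser(connect, task) {
  const browser = await connect();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Usage sketch, assuming the connection options shown earlier:
//
// const { Playwright } = require('@scrapeless-ai/sdk');
// const title = await withBrowser(
//   () => Playwright.connect({ session_name: 'sdk_test', session_ttl: 180 }),
//   async (browser) => {
//     const page = await browser.contexts()[0].newPage();
//     await page.goto('https://www.scrapeless.com');
//     return page.title();
//   }
// );
```

Passing `connect` as a function keeps the helper independent of any particular set of connection options.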
Practical Examples
Here are some common Playwright operations after integrating Scraping Browser. Each snippet assumes that browser is already connected as shown above:
- Navigation and Page Content Extraction
const page = await browser.newPage();
await page.goto('https://www.example.com');
console.log(await page.title());
const html = await page.content();
console.log(html);
await browser.close();
- Taking Screenshots
const context = browser.contexts()[0];
const page = await context.newPage();
await page.goto('https://www.example.com');
await page.screenshot({ path: 'example.png' });
console.log('Screenshot saved as example.png');
await browser.close();
- Running Custom Code
const context = browser.contexts()[0];
const page = await context.newPage();
await page.goto('https://www.example.com');
const result = await page.evaluate(() => document.title);
console.log('Page title:', result);
await browser.close();
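The callback given to page.evaluate runs inside the page, so it cannot see variables from the Node.js script; values have to be passed in as the second argument. A small sketch of that pattern, where `collectLinks` is a hypothetical helper name and the selector is up to the caller:

```javascript
// Hypothetical helper: returns the href of every element matching `selector`.
async function collectLinks(page, selector) {
  // The callback executes in the browser page, so `selector` is handed over
  // explicitly as the second argument of page.evaluate.
  return page.evaluate(
    (sel) => Array.from(document.querySelectorAll(sel), (el) => el.href),
    selector
  );
}

// Usage sketch:
// const links = await collectLinks(page, 'a[href]');
// console.log(links);
```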
- Simulating a Mouse Click
const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.realClick('button[type="submit"]');
- Simulating Keyboard Input
const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.realFill('#login-email', 'scrapeless@gmail.com');
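The click and keyboard snippets are typically used together, for example to fill a form field and then submit it. The sketch below chains the two calls in order; `fillAndSubmit` is a hypothetical helper name, and the selectors are the ones from the snippets above, which would need to match the target page.

```javascript
// Hypothetical helper: types an email into the login field, then clicks the
// submit button, using the realFill and realClick calls shown above.
async function fillAndSubmit(cdpSession, email) {
  // Fill first so the form has a value before it is submitted.
  await cdpSession.realFill('#login-email', email);
  await cdpSession.realClick('button[type="submit"]');
}

// Usage sketch:
// const cdpSession = await createPlaywrightCDPSession(page);
// await fillAndSubmit(cdpSession, 'scrapeless@gmail.com');
```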
- Get the current page URL using Scrapeless Agent
const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
const { error, liveURL } = await cdpSession.liveURL();
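Because liveURL resolves to an object with both error and liveURL fields, callers probably want to check error before using the URL. A minimal sketch, assuming error is empty on success (the snippet above does not confirm this); `getLiveUrl` is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the live URL or throws if the call reported an
// error. Assumes `error` is falsy when the call succeeds.
async function getLiveUrl(cdpSession) {
  const { error, liveURL } = await cdpSession.liveURL();
  if (error) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Could not get live URL: ${reason}`);
  }
  return liveURL;
}

// Usage sketch:
// const url = await getLiveUrl(cdpSession);
// console.log('Live URL:', url);
```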
- Solve Image Captcha
const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.imageToText({
  imageSelector: '.captcha__image',
  inputSelector: 'input[name="captcha"]',
  timeout: 30000,
});
- Disable automatic captcha solving
const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.disableCaptchaAutoSolve();
- Manually solve a captcha with specified options
const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.solveCaptcha();
- Wait for a captcha to be detected on the page
const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.waitCaptchaDetected({ timeout: 30000 });
- Wait for a captcha to be solved (either successfully or unsuccessfully)
const { createPlaywrightCDPSession } = require('@scrapeless-ai/sdk');
// ... connect to Scraping Browser as shown above
const cdpSession = await createPlaywrightCDPSession(page);
await cdpSession.waitCaptchaSolved({ timeout: 30000 });
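The two wait calls can be combined into one step that first waits for detection and then for the solve attempt to finish. The sketch below makes two assumptions that the snippets above do not confirm: that either call rejects when its timeout elapses, and that their resolved values are worth passing through unchanged. `waitForCaptcha` is a hypothetical helper name.

```javascript
// Hypothetical helper: waits for a captcha to be detected and then for the
// solve attempt to finish. The resolved values are returned uninterpreted,
// since their exact shape is not shown in this guide.
async function waitForCaptcha(cdpSession, timeout = 30000) {
  try {
    const detected = await cdpSession.waitCaptchaDetected({ timeout });
    const solved = await cdpSession.waitCaptchaSolved({ timeout });
    return { completed: true, detected, solved };
  } catch (err) {
    // Assumption: a timeout or other failure surfaces as a rejected promise.
    return { completed: false, error: err };
  }
}

// Usage sketch:
// const outcome = await waitForCaptcha(cdpSession, 30000);
// if (!outcome.completed) console.error('Captcha wait failed:', outcome.error);
```

Returning a result object instead of throwing lets the calling script decide whether a missing captcha is an error or simply means the page loaded without a challenge.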