Integration
Puppeteer
Scraping Browser provides a high-performance serverless platform that simplifies data extraction from dynamic websites. Through its seamless integration with Puppeteer, developers can run, manage, and monitor headless browsers without dedicated server resources, enabling efficient web automation and data collection.
Install Necessary Libraries
First, install puppeteer-core, the lightweight version of Puppeteer designed to connect to existing browser instances:
npm install puppeteer-core
Write Code to Connect to Scraping Browser
In your Puppeteer code, connect to Scraping Browser using the following method (replace APIKey in the connection URL with your actual API key):
const puppeteer = require('puppeteer-core');

const connectionURL = 'wss://browser.scrapeless.com/browser?token=APIKey&session_ttl=180&proxy_country=ANY';

(async () => {
  const browser = await puppeteer.connect({ browserWSEndpoint: connectionURL });
  const page = await browser.newPage();
  await page.goto('https://www.scrapeless.com');
  console.log(await page.title());
  await browser.close();
})();
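Because the connection string embeds your API key and per-session settings, one option is to build it programmatically instead of hard-coding it. The sketch below assumes the query parameter names shown in the URL above (token, session_ttl, proxy_country); the environment variable name SCRAPELESS_API_KEY is an illustrative assumption, not part of the documented API.

```javascript
// Sketch: assemble the Scraping Browser connection URL from parts.
// Parameter names (token, session_ttl, proxy_country) mirror the
// example URL above.
function buildConnectionURL(apiKey, { sessionTTL = 180, proxyCountry = 'ANY' } = {}) {
  const url = new URL('wss://browser.scrapeless.com/browser');
  url.searchParams.set('token', apiKey);
  url.searchParams.set('session_ttl', String(sessionTTL));
  url.searchParams.set('proxy_country', proxyCountry);
  return url.toString();
}
```

You could then connect with, for example, `puppeteer.connect({ browserWSEndpoint: buildConnectionURL(process.env.SCRAPELESS_API_KEY) })`, keeping the key out of source control.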
In this way, you can take advantage of Scraping Browser’s infrastructure, including scalability, IP rotation, and global access.
Practical Examples
Here are some common Puppeteer operations after integrating with Scraping Browser:
- Navigation and Page Content Extraction
const page = await browser.newPage();
await page.goto('https://www.example.com');
console.log(await page.title());
const html = await page.content();
console.log(html);
await browser.close();
- Capturing Screenshots
const page = await browser.newPage();
await page.goto('https://www.example.com');
await page.screenshot({ path: 'example.png' });
console.log('Screenshot saved as example.png');
await browser.close();
- Running Custom Scripts
const page = await browser.newPage();
await page.goto('https://www.example.com');
const result = await page.evaluate(() => document.title);
console.log('Page title:', result);
await browser.close();
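Each snippet above ends with `await browser.close()`, so an exception partway through would skip the close call and leave the remote session open. A minimal sketch of a try/finally wrapper that guarantees the browser is closed either way (the helper name `withPage` is our own, not a Puppeteer API):

```javascript
// Sketch: run an extraction callback against a fresh page, closing the
// browser whether the callback succeeds or throws.
async function withPage(browser, fn) {
  const page = await browser.newPage();
  try {
    return await fn(page);
  } finally {
    await browser.close();
  }
}
```

For example: `const title = await withPage(browser, async (page) => { await page.goto('https://www.example.com'); return page.title(); });`.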
Playwright
Scraping Browser provides a high-performance serverless platform that simplifies data extraction from dynamic websites. Through its seamless integration with Playwright, developers can run, manage, and monitor headless browsers without dedicated server resources, enabling efficient web automation and data collection.
Install Necessary Libraries
First, install playwright-core, the lightweight version of Playwright used to connect to existing browser instances:
npm install playwright-core
Write Code to Connect to Scraping Browser
In your Playwright code, connect to Scraping Browser using the following method (replace APIKey in the connection URL with your actual API key):
const { chromium } = require('playwright-core');

const connectionURL = 'wss://browser.scrapeless.com/browser?token=APIKey&session_ttl=180&proxy_country=ANY';

(async () => {
  const browser = await chromium.connectOverCDP(connectionURL);
  const page = await browser.newPage();
  await page.goto('https://www.scrapeless.com');
  console.log(await page.title());
  await browser.close();
})();
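Connecting over CDP to a remote endpoint can fail transiently, for example if the session is still being provisioned, so one hedged option is to retry the connection a few times before giving up. The helper below is a generic sketch of that pattern, not part of Playwright's or Scrapeless's API:

```javascript
// Sketch: call a connect function up to `attempts` times, waiting
// `delayMs` between failures, and rethrow the last error if all fail.
async function connectWithRetry(connect, attempts = 3, delayMs = 1000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Usage would look like `const browser = await connectWithRetry(() => chromium.connectOverCDP(connectionURL));`.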
This allows you to leverage Scraping Browser’s infrastructure, including scalability, IP rotation, and global access.
Practical Examples
Here are some common Playwright operations after integrating with Scraping Browser:
- Navigation and Page Content Extraction
const page = await browser.newPage();
await page.goto('https://www.example.com');
console.log(await page.title());
const html = await page.content();
console.log(html);
await browser.close();
- Capturing Screenshots
const page = await browser.newPage();
await page.goto('https://www.example.com');
await page.screenshot({ path: 'example.png' });
console.log('Screenshot saved as example.png');
await browser.close();
- Running Custom Scripts
const page = await browser.newPage();
await page.goto('https://www.example.com');
const result = await page.evaluate(() => document.title);
console.log('Page title:', result);
await browser.close();
These examples demonstrate how to use playwright-core to connect to and control Scraping Browser.