Puppeteer
Scraping Browser is a serverless platform for extracting data from dynamic websites. It integrates with Puppeteer, so developers can run, manage, and monitor headless browsers for web automation and data collection without maintaining a dedicated server.
Installing Necessary Libraries
First, install puppeteer-core, a lightweight version of Puppeteer designed for connecting to existing browser instances:
npm install puppeteer-core
Writing Code to Connect to Scraping Browser
In your Puppeteer code, connect to Scraping Browser over its WebSocket endpoint, replacing APIKey in the URL with your own API token:
const puppeteer = require('puppeteer-core');
const connectionURL = 'wss://browser.scrapeless.com/browser?token=APIKey&session_ttl=180&proxy_country=ANY';
(async () => {
  // Attach to the remote browser instead of launching a local one.
  const browser = await puppeteer.connect({ browserWSEndpoint: connectionURL });
  const page = await browser.newPage();
  await page.goto('https://www.scrapeless.com');
  console.log(await page.title());
  // Close the remote browser when finished.
  await browser.close();
})();
Once connected, the script runs against Scraping Browser’s hosted infrastructure, which provides scalability, IP rotation, and global access.
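The connection URL above hard-codes the token and its query parameters. As a minimal sketch, the URL could instead be assembled from its parts so the token stays out of source control. The helper name `buildConnectionURL` and the environment variable `SCRAPELESS_API_KEY` are assumptions made for illustration. Only the endpoint and the `token`, `session_ttl`, and `proxy_country` parameters come from the example above.

```javascript
// Hypothetical helper (not part of any SDK): assembles the WebSocket
// endpoint from the query parameters shown in the connection example.
function buildConnectionURL(token, { sessionTtl = 180, proxyCountry = 'ANY' } = {}) {
  if (!token) {
    throw new Error('An API token is required');
  }
  // URLSearchParams percent-encodes values and keeps insertion order.
  const params = new URLSearchParams({
    token,
    session_ttl: String(sessionTtl),
    proxy_country: proxyCountry,
  });
  return `wss://browser.scrapeless.com/browser?${params.toString()}`;
}

// Example usage (SCRAPELESS_API_KEY is an assumed variable name):
// const connectionURL = buildConnectionURL(process.env.SCRAPELESS_API_KEY);
// const browser = await puppeteer.connect({ browserWSEndpoint: connectionURL });
```

Throwing when the token is missing surfaces a configuration problem before any connection attempt is made.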
Practical Examples
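Here are some common Puppeteer operations after integrating Scraping Browser. Each snippet assumes that browser is the connected instance from the previous section and that the code runs inside an async function. Each one also ends by closing the browser, so reconnect before running the next snippet: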
- Navigation and Page Content Extraction
const page = await browser.newPage();
await page.goto('https://www.example.com');
console.log(await page.title());
const html = await page.content();
console.log(html);
await browser.close();
- Taking Screenshots
const page = await browser.newPage();
await page.goto('https://www.example.com');
await page.screenshot({ path: 'example.png' });
console.log('Screenshot saved as example.png');
await browser.close();
- Running Custom Code
const page = await browser.newPage();
await page.goto('https://www.example.com');
const result = await page.evaluate(() => document.title);
console.log('Page title:', result);
await browser.close();
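Each snippet above closes the browser when it finishes, which ends the session. When several operations should share one session, a small wrapper can open a page, run a task, and close only that page. The following is a sketch under that assumption. The names `withPage` and `collectPageInfo` are hypothetical helpers, not part of Puppeteer or Scraping Browser. They rely only on the `newPage`, `goto`, `title`, `content`, and `close` calls.

```javascript
// Hypothetical helper: runs `task` with a fresh page and always closes
// that page afterwards, even if the task throws.
async function withPage(browser, task) {
  const page = await browser.newPage();
  try {
    return await task(page);
  } finally {
    await page.close();
  }
}

// Hypothetical example task: combines the navigation and content
// extraction steps into a single result object.
async function collectPageInfo(browser, url) {
  return withPage(browser, async (page) => {
    await page.goto(url);
    const title = await page.title();
    const html = await page.content();
    return { title, htmlLength: html.length };
  });
}

// Example usage inside an async function, with `browser` already connected:
// const info = await collectPageInfo(browser, 'https://www.example.com');
// console.log(info.title, info.htmlLength);
// await browser.close(); // close once, after all tasks have finished
```

Because the page is closed in a `finally` block, a failed navigation does not leave a stray tab open in the remote session.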