Supported CAPTCHAs

reCaptcha

Scrapeless Scraping Browser only solves reCaptcha v2 automatically for you; any follow-up steps are yours to implement.

Cloudflare

  • Cloudflare Turnstile
  • Cloudflare Challenge

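Scrapeless Scraping Browser only solves Turnstile or Challenge automatically for you; any follow-up steps are yours to implement. For a detailed walkthrough of handling Cloudflare challenges (including obtaining cf_clearance), see: https://www.scrapeless.com/en/blog/cloudflare-challenge-bypass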
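
As a rough illustration of the follow-up work that is left to you, the sketch below picks the cf_clearance cookie out of a cookie list once a challenge has been passed. `findCfClearance` is a hypothetical helper, not part of Scrapeless; it assumes cookie objects with the `name`, `value`, and `domain` fields that Puppeteer's `page.cookies()` returns.

```javascript
// Hypothetical helper (not a Scrapeless API): returns the cf_clearance
// cookie value, or null if no matching cookie is present.
function findCfClearance(cookies, domain) {
  const match = cookies.find(
    (c) => c.name === "cf_clearance" && (!domain || c.domain.endsWith(domain))
  );
  return match ? match.value : null;
}

// Usage with a connected Puppeteer page that has already passed the challenge
// ("example.com" is a placeholder domain):
//   const clearance = findCfClearance(await page.cookies(), "example.com");
```

The optional `domain` argument matters when the session holds cookies for several sites, since each site gets its own cf_clearance value.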

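Solving Example

When you connect to the browser and visit a target site, Scrapeless solves CAPTCHAs automatically. You still need to confirm that the CAPTCHA was actually solved. The simple example below visits the target site, listens for the Captcha.solveFinished CDP event to confirm the solve, and then takes a screenshot of the page for verification.

The example defines two main functions:

  • addCaptchaListener: listens for CAPTCHA events in the browser session
  • onCaptchaFinished: waits until CAPTCHA solving has finished

Supported CAPTCHA list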

  • reCaptcha v2
  • Cloudflare Turnstile
  • Cloudflare 5s Challenge
  • AWS Challenge
import puppeteer from "puppeteer-core";
import EventEmitter from 'events';
const emitter = new EventEmitter()
const scrapelessUrl = 'wss://browser.scrapeless.com/browser?token=your_api_key&session_ttl=180&proxy_country=ANY';
 
export async function example(url) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: scrapelessUrl,
    defaultViewport: null
  });
  console.log("Connected to Scrapeless browser");
  try {
    const page = await browser.newPage();
    // Listen for captcha events
    console.debug("addCaptchaListener: Start listening for captcha events");
    await addCaptchaListener(page);
    console.log("Navigating to URL:", url);
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30000 });
    console.log("onCaptchaFinished: Waiting for captcha solving to finish...");
    await onCaptchaFinished();
    // Screenshot for debugging
    console.debug("Taking screenshot of the final page...");
    await page.screenshot({ path: 'screenshot.png', fullPage: true });
  } catch (error) {
    console.error(error);
  } finally {
    await browser.close();
    console.log("Browser closed");
  }
}
 
async function addCaptchaListener(page) {
  const client = await page.createCDPSession();
  client.on("Captcha.detected", (msg) => {
    console.debug("Captcha.detected: ", msg);
  });
  client.on("Captcha.solveFinished", async (msg) => {
    console.debug("Captcha.solveFinished: ", msg);
    emitter.emit("Captcha.solveFinished", msg);
    client.removeAllListeners()
  });
}
 
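async function onCaptchaFinished(timeout = 60_000) {
  let timer;
  try {
    return await Promise.race([
      // once() removes the listener after the first event
      new Promise((resolve) => emitter.once("Captcha.solveFinished", resolve)),
      new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error("Timed out waiting for Captcha.solveFinished")), timeout);
      })
    ]);
  } finally {
    // Clear the timer so it does not keep the process alive after a solve
    clearTimeout(timer);
  }
}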
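
The connection URL in the example carries its options as query parameters. One way to assemble it without hand-editing the string is sketched below. `buildScrapelessUrl` is a hypothetical helper, and only the three parameters that appear in the example (`token`, `session_ttl`, `proxy_country`) are assumed to exist.

```javascript
// Hypothetical helper: builds the WebSocket endpoint from the parameters
// shown in the example URL. Defaults mirror the values used there.
function buildScrapelessUrl({ token, sessionTtl = 180, proxyCountry = "ANY" }) {
  const url = new URL("wss://browser.scrapeless.com/browser");
  url.searchParams.set("token", token);
  url.searchParams.set("session_ttl", String(sessionTtl));
  url.searchParams.set("proxy_country", proxyCountry);
  return url.toString();
}

// Usage: pass the result as browserWSEndpoint to puppeteer.connect().
//   const scrapelessUrl = buildScrapelessUrl({ token: "your_api_key" });
```

Using `URLSearchParams` also takes care of escaping if a value ever contains characters such as `&` or `=`.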

reCaptcha Example

Call the example function above to verify that reCaptcha is solved automatically.

 example('https://recaptcha-demo.appspot.com/recaptcha-v2-checkbox-explicit.php');
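
Since the steps after solving are left to you, a common next step is reading the response token so it can be submitted with the form. The sketch below is one possible way to do that. `readRecaptchaToken` is a hypothetical helper; it assumes the page uses the standard reCAPTCHA v2 client API (`grecaptcha.getResponse()`) or the hidden `g-recaptcha-response` textarea, which may not hold for every site.

```javascript
// Hypothetical helper: reads the reCAPTCHA v2 response token from the page.
// Returns null when neither the client API nor the hidden textarea has a token.
async function readRecaptchaToken(page) {
  return page.evaluate(() => {
    try {
      if (window.grecaptcha && typeof window.grecaptcha.getResponse === "function") {
        const token = window.grecaptcha.getResponse();
        if (token) return token;
      }
    } catch (err) {
      // getResponse() can throw if no widget has been rendered; fall through
    }
    const field = document.querySelector('textarea[name="g-recaptcha-response"]');
    return field && field.value ? field.value : null;
  });
}

// Usage, after onCaptchaFinished() has resolved:
//   const token = await readRecaptchaToken(page);
```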

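Cloudflare Turnstile Example

Call the example function above to verify that Cloudflare Turnstile is solved automatically.

 example('https://www.scrapingcourse.com/login/cf-turnstile');

A successful Cloudflare Turnstile solve can be confirmed not only by listening for the Captcha.solveFinished CDP event, but also by polling window.turnstile.getResponse() until it returns a token. Here is a complete example: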

import puppeteer from "puppeteer-core";
const scrapelessUrl = 'wss://browser.scrapeless.com/browser?token=your_api_key&session_ttl=180&proxy_country=ANY';
 
export async function turnstileExample(url) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: scrapelessUrl,
    defaultViewport: null
  });
  console.log("Connected to Scrapeless browser");
  try {
    const page = await browser.newPage();
    console.log("Navigating to URL:", url);
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30000 });
    console.log("waitTurnstile: Waiting for the Turnstile token...");
    await waitTurnstile(page);
    // Screenshot for debugging
    console.debug("Taking screenshot of the final page...");
    await page.screenshot({ path: 'screenshot.png', fullPage: true });
  } catch (error) {
    console.error(error);
  } finally {
    await browser.close();
    console.log("Browser closed");
  }
}
 
async function waitTurnstile(page) {
    await page.waitForFunction(() => {
        return window.turnstile && window.turnstile.getResponse();
    });
    const token = await page.evaluate(() => {
        return window.turnstile.getResponse();
    });
    console.log("Cloudflare Turnstile token:", token);
}
 
turnstileExample('https://www.scrapingcourse.com/login/cf-turnstile');
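
The `waitTurnstile` function above relies on Puppeteer's default timeout and evaluates `getResponse()` twice. A possible variant with an explicit timeout that returns the token directly is sketched below. `waitTurnstileToken` and its default values are illustrative assumptions, not part of the Scrapeless API.

```javascript
// Hypothetical variant of waitTurnstile: bounded wait that returns the token.
async function waitTurnstileToken(page, timeout = 60_000) {
  // waitForFunction resolves with a handle to the first truthy return value,
  // which here is the Turnstile token string.
  const handle = await page.waitForFunction(
    () => window.turnstile && window.turnstile.getResponse(),
    { timeout, polling: 500 } // re-check every 500 ms until the timeout
  );
  return handle.jsonValue();
}

// Usage inside turnstileExample, in place of waitTurnstile(page):
//   const token = await waitTurnstileToken(page, 45_000);
```

If no token appears within the timeout, `waitForFunction` rejects, so the surrounding try/catch in the example still reports the failure.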

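Cloudflare Challenge Example

Cloudflare Challenge is a special case: the challenge is not always triggered, and when it is not, waiting for the solve through the CDP event times out. Waiting for an element that appears on the page once the challenge has been passed is therefore more reliable. Here is a complete example: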

import puppeteer from "puppeteer-core";
const scrapelessUrl = 'wss://browser.scrapeless.com/browser?token=your_api_key&session_ttl=180&proxy_country=ANY';
 
export async function challengeExample(url) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: scrapelessUrl,
    defaultViewport: null
  });
  console.log("Connected to Scrapeless browser");
  try {
    const page = await browser.newPage();
    console.log("Navigating to URL:", url);
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30000 });
    console.log("waitChallenge: Waiting for the post-challenge element...");
    await waitChallenge(page, 'main.page-content .challenge-info');
    // Screenshot for debugging
    console.debug("Taking screenshot of the final page...");
    await page.screenshot({ path: 'screenshot.png', fullPage: true });
  } catch (error) {
    console.error(error);
  } finally {
    await browser.close();
    console.log("Browser closed");
  }
}
 
async function waitChallenge(page, selector) {
    await page.waitForSelector(selector);
    console.log("Cloudflare Challenge completed");
}
 
challengeExample('https://www.scrapingcourse.com/cloudflare-challenge');
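
Because the challenge may or may not be triggered, the two signals can also be combined so that whichever arrives first wins. The sketch below shows one way to do this. `waitChallengeEither` is a hypothetical helper; `solved` is assumed to be a promise such as the one returned by `onCaptchaFinished()` in the first example, which in turn requires `addCaptchaListener(page)` to have been called before navigation.

```javascript
// Hypothetical helper: resolves with "event" or "selector", whichever signal
// arrives first, and rejects only if both fail (for example, both time out).
async function waitChallengeEither(page, selector, solved, timeout = 60_000) {
  return Promise.any([
    solved.then(() => "event"),
    page.waitForSelector(selector, { timeout }).then(() => "selector"),
  ]);
}

// Usage (selector taken from the example above):
//   const how = await waitChallengeEither(
//     page, 'main.page-content .challenge-info', onCaptchaFinished()
//   );
//   console.log("Challenge passed, confirmed by:", how);
```

`Promise.any` is used rather than `Promise.race` so that a timeout on the CDP event alone does not fail the wait while the selector check is still pending.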