Supported CAPTCHAs
reCaptcha
The Scrapeless Scraping Browser only solves reCaptcha v2 for you automatically; any steps after that must be implemented yourself.
Cloudflare
- Cloudflare Turnstile
- Cloudflare Challenge
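The Scrapeless Scraping Browser only solves Turnstile or Challenge for you automatically; any steps after that must be implemented yourself. For detailed practice on handling Cloudflare challenges (including obtaining cf_clearance), see: https://www.scrapeless.com/en/blog/cloudflare-challenge-bypass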
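Once a Challenge has been passed, Cloudflare normally keeps the clearance in a cookie named cf_clearance. The minimal sketch below only shows how you might pick that cookie out of a cookie list; it assumes the list has the { name, value, domain } shape returned by Puppeteer's page.cookies(), and findCfClearance is a hypothetical helper name, not a Scrapeless API.

```javascript
// Hypothetical helper (not part of Scrapeless): return the cf_clearance cookie value or null.
// Assumes each cookie has the { name, value, domain } shape of Puppeteer's page.cookies() output.
function findCfClearance(cookies, domainSuffix = "") {
  const match = cookies.find(
    (cookie) => cookie.name === "cf_clearance" && cookie.domain.endsWith(domainSuffix)
  );
  return match ? match.value : null;
}

// Usage with a hand-written cookie list, so no browser is needed
const sampleCookies = [
  { name: "session", value: "abc", domain: "www.scrapingcourse.com" },
  { name: "cf_clearance", value: "xyz123", domain: ".scrapingcourse.com" },
];
console.log(findCfClearance(sampleCookies, "scrapingcourse.com")); // xyz123
```

In a live session you would pass await page.cookies() instead of the sample list, after the challenge page has loaded; whether the cookie is present at all depends on the site.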
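Solving example
When we connect to the browser and visit the target site, Scrapeless solves the CAPTCHA automatically. However, we still need to make sure the CAPTCHA was actually solved. Here is a simple example: it visits the target site and confirms that the CAPTCHA was solved by listening for the Captcha.solveFinished CDP event. Finally, it takes a screenshot of the page for verification.
This example defines two main methods:
- addCaptchaListener: listens for captcha events in the browser session
- onCaptchaFinished: waits for captcha solving to finish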
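The two methods above split one job: subscribe to an event early, then wait for it with an upper time limit. As a sketch under stated assumptions, the same pattern fits in one generic helper; waitForEvent is a hypothetical name, and it only assumes the source object has on() and off() methods, which both Node's EventEmitter and Puppeteer's CDPSession provide.

```javascript
// Hypothetical helper: resolve with the first `eventName` payload emitted by `source`,
// or reject after `timeoutMs`. `source` only needs on() and off() methods.
function waitForEvent(source, eventName, timeoutMs = 60_000) {
  return new Promise((resolve, reject) => {
    const onEvent = (msg) => {
      clearTimeout(timer); // the event arrived in time: cancel the timeout
      source.off(eventName, onEvent); // wait for one event only
      resolve(msg);
    };
    const timer = setTimeout(() => {
      source.off(eventName, onEvent); // do not leave a stale listener behind
      reject(new Error(`Timeout waiting for ${eventName}`));
    }, timeoutMs);
    source.on(eventName, onEvent);
  });
}
```

With a CDP session you might create const solved = waitForEvent(client, "Captcha.solveFinished") before page.goto() and await solved afterwards; creating the promise before navigation means an event fired during page load is not missed.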
Supported CAPTCHA list
- reCaptcha v2
- Cloudflare Turnstile
- Cloudflare 5s Challenge
- AWS Challenge
import puppeteer from "puppeteer-core";
import EventEmitter from "events";

// Relays Captcha.solveFinished from the CDP session to onCaptchaFinished()
const emitter = new EventEmitter();
const scrapelessUrl = 'wss://browser.scrapeless.com/browser?token=your_api_key&session_ttl=180&proxy_country=ANY';

export async function example(url) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: scrapelessUrl,
    defaultViewport: null
  });
  console.log("Connected to Scrapeless browser");
  try {
    const page = await browser.newPage();
    // Listen for captcha events before navigating, so no event is missed
    console.debug("addCaptchaListener: Start listening for captcha events");
    await addCaptchaListener(page);
    // Start waiting before goto(): the captcha may be solved while the page is still loading
    const captchaFinished = onCaptchaFinished();
    captchaFinished.catch(() => {}); // prevents an unhandled rejection if goto() fails first
    console.log("Navigating to URL:", url);
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30000 });
    console.log("onCaptchaFinished: Waiting for captcha solving to finish...");
    await captchaFinished;
    // Screenshot for debugging
    console.debug("Taking screenshot of the final page...");
    await page.screenshot({ path: 'screenshot.png', fullPage: true });
  } catch (error) {
    console.error(error);
  } finally {
    await browser.close();
    console.log("Browser closed");
  }
}

// Subscribes to the Scrapeless captcha CDP events for this page
async function addCaptchaListener(page) {
  const client = await page.createCDPSession();
  client.on("Captcha.detected", (msg) => {
    console.debug("Captcha.detected: ", msg);
  });
  client.on("Captcha.solveFinished", (msg) => {
    console.debug("Captcha.solveFinished: ", msg);
    emitter.emit("Captcha.solveFinished", msg);
    // Stop listening once the first captcha has been solved
    client.removeAllListeners();
  });
}

// Resolves with the Captcha.solveFinished payload, or rejects after `timeout` ms
function onCaptchaFinished(timeout = 60_000) {
  let timer;
  return Promise.race([
    new Promise((resolve) => {
      // once(): avoids piling up listeners across calls
      emitter.once("Captcha.solveFinished", resolve);
    }),
    new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error("Timeout waiting for Captcha.solveFinished")), timeout);
    })
  ]).finally(() => clearTimeout(timer)); // clear the timer so it does not keep the process alive
}
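The examples hard-code the API key inside scrapelessUrl. A minimal sketch that assembles the same endpoint from its parts, so the key can be read from an environment variable instead; buildScrapelessUrl and the SCRAPELESS_API_KEY variable name are assumptions, and only the three query parameters already used in the examples (token, session_ttl, proxy_country) are set.

```javascript
// Hypothetical helper: build the Scrapeless connection URL used in the examples.
// Only the query parameters that appear in those examples are set.
function buildScrapelessUrl({ token, sessionTtl = 180, proxyCountry = "ANY" } = {}) {
  if (!token) throw new Error("Missing Scrapeless API token");
  const url = new URL("wss://browser.scrapeless.com/browser");
  url.searchParams.set("token", token); // URLSearchParams encodes special characters
  url.searchParams.set("session_ttl", String(sessionTtl));
  url.searchParams.set("proxy_country", proxyCountry);
  return url.toString();
}

// Usage (the SCRAPELESS_API_KEY variable name is an assumption):
// const scrapelessUrl = buildScrapelessUrl({ token: process.env.SCRAPELESS_API_KEY });
console.log(buildScrapelessUrl({ token: "your_api_key" }));
// wss://browser.scrapeless.com/browser?token=your_api_key&session_ttl=180&proxy_country=ANY
```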
reCaptcha example
Call the example method from the code above to verify that reCaptcha is solved automatically.
example('https://recaptcha-demo.appspot.com/recaptcha-v2-checkbox-explicit.php');
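Besides the Captcha.solveFinished event, a solved reCAPTCHA v2 widget writes its response token into a hidden textarea named g-recaptcha-response. The sketch below reads it; readRecaptchaToken is a hypothetical helper, and because its doc parameter defaults to the page's document, the same function could be passed to page.waitForFunction() or page.evaluate(). If the page contains several widgets, it only looks at the first one.

```javascript
// Hypothetical in-page check: return the reCAPTCHA v2 response token, or null if not solved yet.
// `doc` defaults to the page's `document` when the function runs inside the browser.
function readRecaptchaToken(doc = document) {
  const field = doc.querySelector('textarea[name="g-recaptcha-response"]');
  return field && field.value ? field.value : null;
}

// Usage outside a browser with a stand-in document object
const fakeDocument = {
  querySelector: (selector) =>
    selector === 'textarea[name="g-recaptcha-response"]' ? { value: "sample-token" } : null,
};
console.log(readRecaptchaToken(fakeDocument)); // sample-token
```

In a Puppeteer session this might look like await page.waitForFunction(readRecaptchaToken, { timeout: 60_000 }) followed by const token = await page.evaluate(readRecaptchaToken).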
Cloudflare Turnstile example
Call the example method from the code above to verify that Cloudflare Turnstile is solved automatically.
example('https://www.scrapingcourse.com/login/cf-turnstile');
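Successful solving of Cloudflare Turnstile can be confirmed not only by listening for the Captcha.solveFinished CDP event, but also by checking window.turnstile.getResponse(), which returns the token once the widget has been solved. Here is a complete example: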
import puppeteer from "puppeteer-core";

const scrapelessUrl = 'wss://browser.scrapeless.com/browser?token=your_api_key&session_ttl=180&proxy_country=ANY';

export async function turnstileExample(url) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: scrapelessUrl,
    defaultViewport: null
  });
  console.log("Connected to Scrapeless browser");
  try {
    const page = await browser.newPage();
    console.log("Navigating to URL:", url);
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30000 });
    console.log("waitTurnstile: Waiting for Turnstile to be solved...");
    await waitTurnstile(page);
    // Screenshot for debugging
    console.debug("Taking screenshot of the final page...");
    await page.screenshot({ path: 'screenshot.png', fullPage: true });
  } catch (error) {
    console.error(error);
  } finally {
    await browser.close();
    console.log("Browser closed");
  }
}

// Waits until the Turnstile widget returns a non-empty token, then logs and returns it.
// The 60 s timeout matches the CDP example; waitForFunction's default is 30 s.
async function waitTurnstile(page, timeout = 60_000) {
  await page.waitForFunction(() => {
    return window.turnstile && window.turnstile.getResponse();
  }, { timeout });
  const token = await page.evaluate(() => {
    return window.turnstile.getResponse();
  });
  console.log("Cloudflare Turnstile token:", token);
  return token;
}
turnstileExample('https://www.scrapingcourse.com/login/cf-turnstile');
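page.waitForFunction keeps re-evaluating its predicate until it returns a truthy value or the timeout expires. The sketch below imitates that waiting logic in plain Node.js so it can be tried without a browser; it is not Puppeteer's implementation (which by default polls inside the page on every animation frame), and pollUntil and the fake turnstile object are hypothetical.

```javascript
// Hypothetical helper (not Puppeteer's implementation): call `check` every `intervalMs`
// until it returns a truthy value, then resolve with that value; reject after `timeoutMs`.
function pollUntil(check, { intervalMs = 100, timeoutMs = 30_000 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const tick = () => {
      let value;
      try {
        value = check();
      } catch (err) {
        reject(err); // in this sketch a throwing check ends the wait
        return;
      }
      if (value) {
        resolve(value);
      } else if (Date.now() >= deadline) {
        reject(new Error("Timeout waiting for condition"));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}

// Usage with a stand-in for window.turnstile whose token appears after about 50 ms
const fakeWindow = {
  turnstile: { token: "", getResponse() { return this.token; } },
};
setTimeout(() => { fakeWindow.turnstile.token = "sample-turnstile-token"; }, 50);
pollUntil(() => fakeWindow.turnstile && fakeWindow.turnstile.getResponse(), { intervalMs: 10, timeoutMs: 1000 })
  .then((token) => console.log("Cloudflare Turnstile token:", token));
```

The empty string returned before the widget is solved is falsy, which is why the predicate in waitTurnstile can simply return getResponse()'s result.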
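Cloudflare Challenge example
Cloudflare Challenge is a special case: the challenge is not always triggered, and when it is not, waiting for the Captcha.solveFinished CDP event simply times out. Waiting for an element that only appears on the page after the challenge has been passed is therefore more reliable. Here is a complete example: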
import puppeteer from "puppeteer-core";

const scrapelessUrl = 'wss://browser.scrapeless.com/browser?token=your_api_key&session_ttl=180&proxy_country=ANY';

export async function challengeExample(url) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: scrapelessUrl,
    defaultViewport: null
  });
  console.log("Connected to Scrapeless browser");
  try {
    const page = await browser.newPage();
    console.log("Navigating to URL:", url);
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30000 });
    console.log("waitChallenge: Waiting for the Challenge to be passed...");
    // This element is only rendered once the challenge page has been passed
    await waitChallenge(page, 'main.page-content .challenge-info');
    // Screenshot for debugging
    console.debug("Taking screenshot of the final page...");
    await page.screenshot({ path: 'screenshot.png', fullPage: true });
  } catch (error) {
    console.error(error);
  } finally {
    await browser.close();
    console.log("Browser closed");
  }
}

// Resolves once `selector` appears; waitForSelector's default timeout is 30 s
async function waitChallenge(page, selector, timeout = 60_000) {
  await page.waitForSelector(selector, { timeout });
  console.log("Cloudflare Challenge completed");
}
challengeExample('https://www.scrapingcourse.com/cloudflare-challenge');
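Because the Challenge is not always shown, one option is to start both waits at once and continue with whichever succeeds first: the Captcha.solveFinished event or the post-challenge selector. A minimal sketch using Promise.any (Node.js 15 or later); firstToSucceed is a hypothetical name, and the promises in the usage are stand-ins for onCaptchaFinished() and page.waitForSelector(...).

```javascript
// Hypothetical helper: start several waits and resolve with { name, value } of the first that succeeds.
// Rejections (e.g. a CDP event that never fires and times out) are ignored unless every
// waiter rejects, in which case Promise.any rejects with an AggregateError.
function firstToSucceed(waiters) {
  return Promise.any(
    Object.entries(waiters).map(([name, promise]) =>
      Promise.resolve(promise).then((value) => ({ name, value }))
    )
  );
}

// Usage with stand-in promises: the event wait times out, the selector wait succeeds
const delay = (ms, value) => new Promise((resolve) => setTimeout(resolve, ms, value));
const neverSolved = new Promise((_, reject) => setTimeout(reject, 20, new Error("Timeout")));
firstToSucceed({ captchaEvent: neverSolved, selector: delay(50, "element found") })
  .then(({ name }) => console.log("continued via:", name)); // continued via: selector
```

The losing wait keeps running until it settles on its own (onCaptchaFinished(), for instance, until its timeout), so its timer is not cancelled by this sketch.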