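JS Render

The Universal Scraping API is a web content retrieval service that supports complex page rendering and interaction scenarios.

See our API documentation for details.

Request structure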

{
  "actor": "unlocker.webunlocker",
  "proxy": {
    "country": "ANY",
    "url": ""
  },
  "input": {
    "url": "string",
    "jsRender": {
      "enabled": true,
      "headless": true,
      "waitUntil": "domcontentloaded",
      "instructions": [],
      "block": {
        "resources": [],
        "urls": []
      },
      "response": {
        "type": "html",
        "options": {}
      }
    }
  }
}

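Core features

JavaScript rendering

JavaScript rendering handles dynamically loaded content and SPAs (single-page applications). It runs a full browser environment, which supports more complex page interaction and rendering requirements.

When input.jsRender.enabled=true, the request is made through a browser.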

{
  "actor": "unlocker.webunlocker",
  "proxy": {
    "country": "ANY"
  },
  "input": {
    "url": "https://example.com/",
    "jsRender": {
      "enabled": true
    }
  }
}
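
As a minimal sketch of the request body above, the helper below assembles the payload in Node.js. The name buildJsRenderPayload is hypothetical and not part of any SDK; only the field names (actor, proxy.country, input.url, input.jsRender) come from this document.

```javascript
// Hypothetical helper: builds a request body with JS rendering switched on.
// Field names follow the request structure shown in this document.
function buildJsRenderPayload(url, jsRenderOptions = {}, country = "ANY") {
    if (typeof url !== "string" || url.length === 0) {
        throw new TypeError("url must be a non-empty string");
    }
    return {
        actor: "unlocker.webunlocker",
        proxy: { country },
        input: {
            url,
            // enabled is written last so the options cannot switch rendering off
            jsRender: { ...jsRenderOptions, enabled: true }
        }
    };
}

const payload = buildJsRenderPayload("https://example.com/", {
    waitUntil: "domcontentloaded",
    response: { type: "markdown" }
});
console.log(JSON.stringify(payload, null, 2));
```

The resulting object can be passed as the request body in the axios examples later in this document.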

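JavaScript instructions

The API provides a set of JavaScript instructions that let you interact with web pages dynamically.

With these instructions you can click elements, fill in forms, submit forms, or wait for specific elements to appear, which covers tasks such as clicking a "Read more" button or submitting a form.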

{
  "actor": "unlocker.webunlocker",
  "input": {
    "url": "https://example.com",
    "jsRender": {
      "enabled": true,
      "instructions": [
        {
          "waitFor": [
            ".dynamic-content",
            30000
          ]
          // Wait for an element
        },
        {
          "click": [
            "#load-more",
            1000
          ]
          // Click an element
        },
        {
          "fill": [
            "#search-input",
            "search term"
          ]
          // Fill a form field
        },
        {
          "keyboard": [
            "press",
            "Enter"
          ]
          // Simulate a key press
        },
        {
          "evaluate": "window.scrollTo(0, document.body.scrollHeight)"
          // Run custom JS
        }
      ]
    }
  }
}

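The following are common actions you can perform with JavaScript instructions:

JavaScript instruction reference

| Instruction | Syntax | Description | Example |
| --- | --- | --- | --- |
| waitFor | [selector, timeout] | Wait for an element to appear | {"waitFor": [".content", 30000]} |
| click | [selector, delay] | Click an element | {"click": [".button", 1000]} |
| fill | [selector, value] | Fill a form field | {"fill": ["#input", "text"]} |
| wait | milliseconds | Wait for a fixed time | {"wait": 2000} |
| evaluate | javascript code | Execute JS code | {"evaluate": "console.log('test')"} |
| keyboard | [action, value, delay?] | Keyboard operation | See the keyboard operations table below |

Keyboard operations

| Operation | Syntax | Description | Example |
| --- | --- | --- | --- |
| Press key | ["press", keyInput] | Press the specified keyInput | {"keyboard": ["press", "Enter"]} |
| Type text | ["type", text, delay?] | Type text, with an optional delay | {"keyboard": ["type", "Hello", 20]} |
| Key down | ["down", key] | Hold a key down | {"keyboard": ["down", "Shift"]} |
| Key up | ["up", key] | Release a key | {"keyboard": ["up", "Shift"]} |

Supported special KeyInput types: https://pptr.dev/api/puppeteer.keyinput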
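
The instruction shapes in the two reference tables can be generated with small builder functions, which keeps argument order consistent across a script. This is only a sketch: the instr object and its method names are hypothetical conveniences, while the emitted JSON shapes are the ones documented in this section.

```javascript
// Hypothetical builders that return instruction objects in the documented shapes.
const instr = {
    waitFor: (selector, timeout) => ({ waitFor: [selector, timeout] }),
    click: (selector, delay) => ({ click: [selector, delay] }),
    fill: (selector, value) => ({ fill: [selector, value] }),
    wait: (milliseconds) => ({ wait: milliseconds }),
    evaluate: (code) => ({ evaluate: code }),
    // delay is optional and is only appended when the caller supplies it
    keyboard: (action, value, delay) => ({
        keyboard: delay === undefined ? [action, value] : [action, value, delay]
    })
};

// Search flow: wait for the input, fill it, press Enter, wait for results, scroll.
const instructions = [
    instr.waitFor("#search-input", 30000),
    instr.fill("#search-input", "search term"),
    instr.keyboard("press", "Enter"),
    instr.waitFor(".dynamic-content", 30000),
    instr.evaluate("window.scrollTo(0, document.body.scrollHeight)")
];

console.log(JSON.stringify(instructions, null, 2));
```

The instructions array would go under input.jsRender.instructions in the request body.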

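Response types

You can set the response type with the input.jsRender.response.type parameter. Allowed values are html | plaintext | markdown | png | jpeg | network | content; the default is html.

| Type | Description |
| --- | --- |
| html | The page's raw HTML as an escaped string; supports CSS selectors |
| plaintext | The page's plain text as a string; supports CSS selectors |
| markdown | The page as an escaped Markdown string; supports CSS selectors |
| png | A base64-encoded string of the page captured in PNG format; supports CSS selectors |
| jpeg | A base64-encoded string of the page captured in JPEG format; supports CSS selectors |
| network | Enables network request capture: collects all XHR and fetch requests made during page load and returns their details as an escaped JSON string; does not support CSS selectors |
| content | Filters data from the page content and returns the result as an escaped JSON string |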
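
Because the data field is a text string for some response types and a base64 string for others, a small dispatcher can decide how to decode it before writing to disk. The sketch below assumes the file extensions used by the examples in this document; toFile and EXTENSIONS are hypothetical names.

```javascript
// File extension per response type, following the examples in this document.
const EXTENSIONS = {
    html: "html",
    plaintext: "txt",
    markdown: "md",
    png: "png",
    jpeg: "jpeg",
    network: "json",
    content: "json"
};

// Convert the `data` string of a response into a filename plus raw bytes.
// Image types are base64-decoded; every other type is stored as UTF-8 text.
function toFile(type, data, baseName = "response") {
    const extension = EXTENSIONS[type];
    if (!extension) {
        throw new RangeError(`Unsupported response type: ${type}`);
    }
    const isImage = type === "png" || type === "jpeg";
    return {
        filename: `${baseName}.${extension}`,
        bytes: Buffer.from(data, isImage ? "base64" : "utf8")
    };
}

const file = toFile("markdown", "# Example Domain");
console.log(file.filename, file.bytes.length);
```

The returned bytes can be written with fs.writeFileSync(file.filename, file.bytes).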

Details follow:

HTML

Extracts the page's HTML content and returns it as an escaped HTML string. It works best on purely static pages.

Add input.jsRender.response.type=html to the request.

const axios = require('axios');
const fs = require('fs');
 
(async () => {
    const payload = {
        actor: "unlocker.webunlocker",
        proxy: {
            country: "ANY"
        },
        input: {
            url: "https://www.example.com",
            jsRender: {
                enabled: true,
                response: {
                    type: "html"
                }
            }
        }
    };
 
    const response = await axios.post("https://api.scrapeless.com/api/v2/unlocker/request", payload, {
        headers: {
            "x-api-token": "API Key",
            "Content-Type": "application/json"
        },
        timeout: 60000
    });
 
    if (response.data?.code === 200) {
        fs.writeFileSync('response.html', response.data.data, 'utf8');
    }
})();

The response contains the page content as HTML text.

{
    "code": 200,
    "data": "<!DOCTYPE html><html><head>\n    <title>Example Domain</title>\n\n    <meta charset=\"utf-8\">\n    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n    <style type=\"text/css\">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n\n\n</body></html>"
}

Example content of the saved HTML file:

<!DOCTYPE html><html><head>
    <title>Example Domain</title>
 
    <meta charset="utf-8">
    <meta http-equiv="Content-type" content="text/html; charset=utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
 
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>
</head>
 
<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
 
</body></html>

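Plain text

Plain text is an output option that returns the scraped content as plain text rather than HTML or Markdown. It is useful when you need a concise, unformatted version of the content, without any HTML tags or Markdown formatting, and it simplifies content extraction for text processing or analysis.

Add input.jsRender.response.type=plaintext to the request.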

const axios = require('axios');
const fs = require('fs');
 
(async () => {
    const payload = {
        actor: "unlocker.webunlocker",
        proxy: {
            country: "ANY"
        },
        input: {
            url: "https://www.example.com",
            jsRender: {
                enabled: true,
                response: {
                    type: "plaintext"
                }
            }
        }
    };
 
    const response = await axios.post("https://api.scrapeless.com/api/v2/unlocker/request", payload, {
        headers: {
            "x-api-token": "API Key",
            "Content-Type": "application/json"
        },
        timeout: 60000
    });
 
    if (response.data?.code === 200) {
        fs.writeFileSync('response.txt', response.data.data, 'utf8');
    }
})();

The response contains the page's plain text as a string, as in the example below.

{
    "code": 200,
    "data": "Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\nMore information..."
}

Example content of the saved txt file:

Example Domain

This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

More information...

Markdown

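Extracts the page content in Markdown format, which works best on purely static Markdown pages. The Universal Scraping API returns the content as Markdown, which makes it easier to read and process.

Add input.jsRender.response.type=markdown to the request.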

const axios = require('axios');
const fs = require('fs');
 
(async () => {
    const payload = {
        actor: "unlocker.webunlocker",
        proxy: {
            country: "ANY"
        },
        input: {
            url: "https://www.example.com",
            jsRender: {
                enabled: true,
                response: {
                    type: "markdown"
                }
            }
        }
    };
 
    const response = await axios.post("https://api.scrapeless.com/api/v2/unlocker/request", payload, {
        headers: {
            "x-api-token": "API Key",
            "Content-Type": "application/json"
        },
        timeout: 60000
    });
 
    if (response.data?.code === 200) {
        fs.writeFileSync('response.md', response.data.data, 'utf8');
    }
})();

The response contains the content as Markdown text.

{
    "code": 200,
    "data": "# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)"
}

Example content of the saved Markdown file:

# Example Domain
 
This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
 
[More information...](https://www.iana.org/domains/example)
 

PNG/JPEG

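You can capture a screenshot of the target page and have the image returned in PNG or JPEG format. When the response type is png or jpeg, use the input.jsRender.response.options.fullPage=true parameter to request a full-page screenshot.

Add input.jsRender.response.type=png or input.jsRender.response.type=jpeg to the request.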

const axios = require('axios');
const fs = require('fs');
 
(async () => {
    const payload = {
        actor: "unlocker.webunlocker",
        proxy: {
            country: "ANY"
        },
        input: {
            url: "https://www.example.com",
            jsRender: {
                enabled: true,
                response: {
                    type: "png" // png or jpeg
                }
            }
        }
    };
 
    const response = await axios.post("https://api.scrapeless.com/api/v2/unlocker/request", payload, {
        headers: {
            "x-api-token": "API Key",
            "Content-Type": "application/json"
        },
        timeout: 60000
    });
 
    if (response.data?.code === 200) {
        fs.writeFileSync('response.png', Buffer.from(response.data.data, 'base64'));
    }
})();

The response contains a base64-encoded string of the PNG or JPEG image.

{
    "code": 200,
    "data": "iVBORw0KGgo..."
}
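
The sketch below shows a jsRender fragment that requests a full-page screenshot through response.options.fullPage, plus a local check that the decoded data really is an image. The PNG and JPEG magic bytes are standard file signatures; detectImageType is a hypothetical helper name, and the sample strings are built locally rather than taken from the API.

```javascript
// jsRender fragment requesting a full-page PNG screenshot.
const screenshotJsRender = {
    enabled: true,
    response: {
        type: "png",
        options: { fullPage: true }
    }
};

// Standard file signatures: PNG starts with 89 50 4E 47 0D 0A 1A 0A, JPEG with FF D8 FF.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const JPEG_PREFIX = Buffer.from([0xff, 0xd8, 0xff]);

// Decode the base64 `data` field and report which image format it holds, if any.
function detectImageType(base64Data) {
    const bytes = Buffer.from(base64Data, "base64");
    if (bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)) return "png";
    if (bytes.subarray(0, JPEG_PREFIX.length).equals(JPEG_PREFIX)) return "jpeg";
    return null;
}

// Local sample data: just the signatures, encoded the same way the API encodes images.
const samplePng = PNG_SIGNATURE.toString("base64");
const sampleJpeg = JPEG_PREFIX.toString("base64");
console.log(detectImageType(samplePng), detectImageType(sampleJpeg));
```

Running such a check before fs.writeFileSync helps catch the case where the file extension does not match the returned format.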

Network

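When input.jsRender.response.type=network, all network requests of type XHR and fetch made during page load are captured. The network request data is then returned as an escaped JSON string. It includes the URL, request method, response status code, headers, response body, and more.

If a request or response body contains binary content, a very large body, or non-text data, the raw content is not returned directly. Instead, it is replaced with the placeholder string [Preview not available ...]. You can use the input.jsRender.response.options parameter to filter results by URL, request method, and status code.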

const axios = require('axios');
const fs = require('fs');
 
(async () => {
    const payload = {
        actor: "unlocker.webunlocker",
        proxy: {
            country: "ANY"
        },
        input: {
            url: "https://www.example.com",
            jsRender: {
                enabled: true,
                response: {
                    type: "network",
                    options: {
                        "urls": [
                            "/example"
                        ],
                        "status": [
                            200
                        ],
                        "methods": [
                            "get"
                        ]
                    }
                }
            }
        }
    };
 
    const response = await axios.post("https://api.scrapeless.com/api/v2/unlocker/request", payload, {
        headers: {
            "x-api-token": "API Key",
            "Content-Type": "application/json"
        },
        timeout: 60000
    });
 
    if (response.data?.code === 200) {
        fs.writeFileSync('response.json', response.data.data, 'utf8');
    }
})();

The response contains the data as an escaped JSON string:

{
    "code": 200,
    "data": "[{\"url\":\"https://www.tiktok.com/api/explore/item_list/...]"
}

An example of the JSON result:

[
  {
    "url": "https://www.tiktok.com/api/explore/item_list/?WebIdLastTime=1752724401&aid=1988&app_language=en&app_name=tiktok_web&browser_language=en&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F135.0.0.0%20Safari%2F537.36&categoryType=120&channel=tiktok_web&clientABVersions=70508271%2C73485602%2C73547759%2C73720540%2C73810951%2C73814854%2C73848867%2C73866686%2C73944035%2C73969557%2C73990102%2C74048200%2C74129613%2C74148345%2C74157215%2C74163128%2C74176097%2C74195789%2C74213192%2C74241848%2C70405643%2C71057832%2C71200802%2C72361743%2C73171280%2C73208420&cookie_enabled=true&count=8&data_collection_enabled=false&device_id=7527893946556515853&device_platform=web_pc&enable_cache=true&focus_state=true&history_len=2&is_fullscreen=false&is_page_visible=true&language=en&odinId=7527893969448764429&os=windows&priority_region=&referer=&region=US&screen_height=1440&screen_width=3440&tz_name=America%2FNew_York&user_is_login=false&webcast_language=en",
    "method": "GET",
    "resourceType": "fetch",
    "status": 200,
    "timestamp": 1752724403206,
    "payload": null,
    "requestReaders": {
      "sec-ch-ua-platform": "\"Windows\"",
      "referer": "https://www.tiktok.com/explore",
      "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36",
      "sec-ch-ua": "\"Google Chrome\";v=\"135\", \"Not-A.Brand\";v=\"8\", \"Chromium\";v=\"135\"",
      "sec-ch-ua-mobile": "?0",
      "accept": "*/*",
      "cookie": "tt_csrf_token=KO5LUsj8-r2G0Lcbmx_RqngSiFd_VRcPiaeY; ttwid=1%7CQapzXhCnEqiLXypjvNK3iX65g9iXPk_Jpj4GGLdqNRY%7C1752724401%7Cfb5daf3940529652ba613376c011e990b0afede828ad52c4e0c14f1c422bea61; tt_chain_token=jIGTF0ppLEuXKFGayjhpyg=="
    },
    "responseHeaders": {
      "access-control-expose-headers": "x-tt-traceflag,x-tt-logid",
      "bd-tt-error-code": "0",
      "cache-control": "max-age=1800, must-revalidate",
      "content-encoding": "br",
      "content-length": "30900",
      "content-type": "application/json; charset=utf-8",
      "date": "Thu, 17 Jul 2025 03:53:23 GMT",
      "expires": "Thu, 17 Jul 2025 03:53:23 GMT",
      "pragma": "no-cache",
      "server": "nginx",
      "server-timing": "cdn-cache; desc=HIT, edge; dur=0, origin; dur=0\ninner; dur=387",
      "tt_stable": "1",
      "x-akamai-request-id": "42a8e99d",
      "x-cache": "TCP_MEM_HIT from a23-47-221-69.deploy.akamaitechnologies.com (AkamaiGHost/22.2.0-c471f2b4819e3aa253dfcc21bfdfd452) (-)",
      "x-ms-token": "YAOuylbpReZ5gTM1PP8mwsmWMCxWprQ4oHRNuZQgKsADY7HTftSBu6W9raVm6PKyp-1mXt9Q6CIs0BHLRxozI_uNNEOWSvkaxFyunXX-54aBUvkuHBe2id6bY0cB",
      "x-tt-logid": "20250717035100C13E2314287BF101E7D9",
      "x-tt-trace-host": "0102b37aa413a15dcf9191a3f676ab4b78d5ba03a6d109c921ad21a607e80d7a40dbc340eb8c009458e52488a06a1b874047a91b63eb21ce08d01175dca60742a8bdf12f766710e93ed82ca68be07bf95a053639c5cedca212d37246317d611b65",
      "x-tt-trace-id": "00-250717035100C13E2314287BF101E7D9-729F56051B790290-00",
      "x-tt-trace-tag": "id=16;cdn-cache=hit;type=static"
    },
    "responseBody": {
      "data": "omitted"
    },
    "responseSize": 249297,
    "error": null
  }
]
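
Since the network result arrives as a JSON string, it needs one JSON.parse call before the entries can be inspected. The sketch below parses it and filters entries on the client side by URL fragment, status, and method. The helper name filterNetworkEntries is hypothetical, the field names (url, method, status) follow the sample entry in this section, and the sample array is local test data rather than real API output.

```javascript
// Parse the JSON string from `data` and keep entries that match all conditions.
// Method comparison is case-insensitive, since requests use "get" and results use "GET".
function filterNetworkEntries(rawData, { urlIncludes = "", statuses = [200], methods = ["GET"] } = {}) {
    const entries = JSON.parse(rawData);
    if (!Array.isArray(entries)) {
        throw new TypeError("expected a JSON array of network entries");
    }
    const wantedMethods = methods.map((method) => method.toUpperCase());
    return entries.filter((entry) =>
        String(entry.url).includes(urlIncludes) &&
        statuses.includes(entry.status) &&
        wantedMethods.includes(String(entry.method).toUpperCase())
    );
}

// Local sample standing in for response.data.data.
const sampleData = JSON.stringify([
    { url: "https://example.com/api/items", method: "GET", resourceType: "fetch", status: 200 },
    { url: "https://example.com/api/track", method: "POST", resourceType: "xhr", status: 204 }
]);

const matches = filterNetworkEntries(sampleData, { urlIncludes: "/api/items" });
console.log(matches.map((entry) => entry.url));
```

Filtering in the request through response.options keeps the payload smaller; a client-side pass like this one is useful when the same capture is examined several ways.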

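Content

When input.jsRender.response.type=content, JSON-formatted data is filtered from the page content, and the response is always a JSON string. The input.jsRender.response.options.outputs parameter lets you define exactly which data types to extract from the scraped page, so that you retrieve only the information you need. This reduces processing time and keeps the focus on the data relevant to your use case.

If input.jsRender.response.options.outputs is empty, all outputs are returned. The available outputs are:

phone_numbers, headings, images, audios, videos, links, menus, hashtags, emails, metadata, tables, favicon

See the code below for detailed usage.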

const axios = require('axios');
const fs = require('fs');
 
(async () => {
    // Configuration
    const url = "https://api.scrapeless.com/api/v2/unlocker/request";
    const token = "API Key";
 
    const headers = {"x-api-token": token, "Content-Type": "application/json"};
 
    const payload = {
        actor: "unlocker.webunlocker",
        proxy: {
            country: "ANY"
        },
        input: {
            url: "https://www.example.com",
            jsRender: {
                enabled: true,
                response: {
                    type: "content",
                    options: {
                        outputs: [
                            "phone_numbers",
                            "headings",
                            "images",
                            "audios",
                            "videos",
                            "links",
                            "menus",
                            "hashtags",
                            "emails",
                            "metadata",
                            "tables",
                            "favicon"
                        ]
                    }
                }
            }
        }
    };
 
    try {
        const response = await axios.post(url, payload, {headers, timeout: 60000});
 
        if (response.status !== 200) {
            throw new Error(`HTTP Error: ${response.status}`);
        }
 
        const data = response.data;
        if (data.code !== 200) {
            // Serialize the body so the message shows its contents, not [object Object]
            throw new Error(`API Error: ${JSON.stringify(data)}`);
        }
 
        const content = data.data || '';
 
        // Save and return the result
        fs.writeFileSync('response.json', content, 'utf8');
        console.log('✅ Success! Content saved as response.json');
 
        return JSON.parse(content);
 
    } catch (error) {
        console.error('❌ Error:', error.message);
        throw error;
    }
})();

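Here are some examples:

Emails

Extracts email addresses in standard format, such as example@example.com, using CSS selectors and regular expressions.

Example: outputs=emails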

{
    "code": 200,
    "data": "{\"emails\":[\"market@scrapeless.com\"]}"
}
 
Phone numbers

Extracts phone numbers using CSS selectors and regular expressions, focusing on links that use the tel: protocol.

Example: outputs=phone_numbers

{
    "code": 200,
    "data": "{ \"phone_numbers\": [ \"+1-111-111-111\" ] }"
}
 
Headings

Extracts heading text from the H1 through H6 elements in the HTML.

Example: outputs=headings

{
    "code": 200,
    "data": "{\"headings\":[\"Example Domain\"]}"
}
 
Images

Extracts image sources from img tags. Only the src attribute is returned.

Example: outputs=images

{
    "code": 200,
    "data": "{\"images\":[\"https://www.scrapeless.com/_next/image?url=%2Fassets%2Fimages%2Ftoolkit%2Flight%2Fimg-2.png&w=750&q=100\"]}"
}
 
Audio

Extracts audio sources from source elements inside audio tags. Only the src attribute is returned.

Example: outputs=audios

{
    "code": 200,
    "data": "{\"audios\":[\"https://example.com/audio.mp3\"]}"
}
 
Videos

Extracts video sources from source elements inside video tags. Only the src attribute is returned.

Example: outputs=videos

{
    "code": 200,
    "data": "{\"videos\":[\"https://example.com/video.mp4\"]}"
}
 
Links

Extracts URLs from a tags. Only the href attribute is returned.

Example: outputs=links

{
    "code": 200,
    "data": "{\"links\":[\"https://app.scrapeless.com/landing/guide\",\"https://www.scrapeless.com/en\",\"https://www.scrapeless.com/en/pricing\",\"https://docs.scrapeless.com/\",\"https://backend.scrapeless.com/app/api\",\"https://www.producthunt.com/posts/scrapeless-deep-serpapi\",\"https://www.g2.com/products/scrapeless/reviews\",\"https://www.trustpilot.com/review/scrapeless.com\",\"https://slashdot.org/software/p/Scrapeless/\",\"https://tekpon.com/software/scrapeless/reviews/\",\"https://www.scrapeless.com/en/product/deep-serp-api\",\"https://www.scrapeless.com/en/product/scraping-browser\",\"https://www.scrapeless.com/en/product/scraping-api\",\"https://www.scrapeless.com/en/product/universal-scraping-api\",\"https://www.scrapeless.com/en/solutions/e-commerce\",\"https://www.scrapeless.com/en/solutions/seo\",\"https://www.scrapeless.com/en/solutions/real-estate\",\"https://www.scrapeless.com/en/solutions/travel-hotel-airline\",\"https://www.scrapeless.com/en/solutions/social-media\",\"https://www.scrapeless.com/en/solutions/market-research\",\"https://www.scrapeless.com/en/blog\",\"https://www.scrapeless.com/en/blog/deep-serp-api-online\",\"https://www.scrapeless.com/en/blog/scrapeless-web-scraping-toolkit\",\"https://www.scrapeless.com/en/blog/google-shopping-scrape\",\"https://backend.scrapeless.com/app/api/v1/public/links/github\",\"https://backend.scrapeless.com/app/api/v1/public/links/youtube\",\"mailto:market@scrapeless.com\",\"https://www.scrapeless.com/en/ai-agent\",\"https://browserless.scrapeless.com/\",\"https://www.scrapeless.com/en/solutions/temu\",\"https://www.scrapeless.com/en/solutions/walmart\",\"https://www.scrapeless.com/en/solutions/shopee\",\"https://www.scrapeless.com/en/solutions/lazada\",\"https://www.scrapeless.com/en/solutions/amazon\",\"https://www.scrapeless.com/en/solutions/google-trends\",\"https://www.scrapeless.com/en/solutions/google-search\",\"https://www.scrapeless.com/en/solutions/airbnb\",\"https://www.scrapeless.com/en/solutions/scoot\",\"https://www.scrapeless.com/en/solutions/latam\",\"https://www.scrapeless.com/en/solutions/localiza\",\"https://www.scrapeless.com/en/solutions/tiktok\",\"https://www.scrapeless.com/en/solutions/instagram\",\"https://www.scrapeless.com/en/integration\",\"https://www.scrapeless.com/en/faq\",\"https://www.scrapeless.com/en/glossary\",\"https://www.scrapeless.com/en/legal/privacy-policy\",\"https://www.scrapeless.com/en/legal/terms\",\"https://www.scrapeless.com/en/legal/terms#refund-policy\",\"https://www.scrapeless.com/en/legal/check-your-data\",\"https://backend.scrapeless.com/app/api/v1/public/links/discord\"]}"
}
 
Menus

Extracts menu items from li elements inside menu tags.

Example: outputs=menus

{
    "code": 200,
    "data": "{\"links\":[ \"Coffee\", \"Tea\", \"Milk\" ]}"
}
 
Hashtags

Extracts hashtags using a regular expression that matches typical hashtag patterns, such as #example.

Example: outputs=hashtags

{
    "code": 200,
    "data": "{\"hashtags\":[\"#docsearch\",\"#search\"]}"
}
 
Metadata

Extracts meta information from meta tags in the head section, returning the name and content attributes in the format name: content.

Example: outputs=metadata

{
    "code": 200,
    "data": "{\"metadata\":[\"viewport: width=device-width, initial-scale=1\",\"description: Scrapeless is the best full-stack web scraping toolkit offering Scraping API, Scraping Browser\"]}"
}
 
Tables

Extracts data from table elements and returns it in JSON format, including dimensions, headings, and content.

Example: outputs=tables

{
    "code": 200,
    "data": "{\"tables\":[{\"dimensions\":{\"rows\":7,\"columns\":3,\"heading\":true},\"heading\":[\"Company\",\"Contact\",\"Country\"],\"content\":[{\"Company\":\"Alfreds Futterkiste\",\"Contact\":\"Maria Anders\",\"Country\":\"Germany\"},{\"Company\":\"Centro comercial Moctezuma\",\"Contact\":\"Francisco Chang\",\"Country\":\"Mexico\"},{\"Company\":\"Ernst Handel\",\"Contact\":\"Roland Mendel\",\"Country\":\"Austria\"},{\"Company\":\"Island Trading\",\"Contact\":\"Helen Bennett\",\"Country\":\"UK\"},{\"Company\":\"Laughing Bacchus Winecellars\",\"Contact\":\"Yoshi Tannamuri\",\"Country\":\"Canada\"},{\"Company\":\"Magazzini Alimentari Riuniti\",\"Contact\":\"Giovanni Rovelli\",\"Country\":\"Italy\"}]},{\"dimensions\":{\"rows\":11,\"columns\":2,\"heading\":true},\"heading\":[\"Tag\",\"Description\"],\"content\":[{\"Tag\":\"<table>\",\"Description\":\"Defines a table\"},{\"Tag\":\"<th>\",\"Description\":\"Defines a header cell in a table\"},{\"Tag\":\"<tr>\",\"Description\":\"Defines a row in a table\"},{\"Tag\":\"<td>\",\"Description\":\"Defines a cell in a table\"},{\"Tag\":\"<caption>\",\"Description\":\"Defines a table caption\"},{\"Tag\":\"<colgroup>\",\"Description\":\"Specifies a group of one or more columns in a table for formatting\"},{\"Tag\":\"<col>\",\"Description\":\"Specifies column properties for each column within a <colgroup> element\"},{\"Tag\":\"<thead>\",\"Description\":\"Groups the header content in a table\"},{\"Tag\":\"<tbody>\",\"Description\":\"Groups the body content in a table\"},{\"Tag\":\"<tfoot>\",\"Description\":\"Groups the footer content in a table\"}]}]}"
}
 
Favicon

Extracts the favicon URL from link elements in the HTML head section.

Example: outputs=favicon

{
    "code": 200,
    "data": "{\"favicon\":\"https://www.scrapeless.com/favicon.ico\"}"
}
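
Every content example returns data as a JSON string, so a client typically validates the requested outputs up front and parses the string afterwards. The following is a minimal sketch of that pattern: buildContentResponse and parseContent are hypothetical helper names, and the list of valid outputs is the one given in this section.

```javascript
// Output names listed in this section.
const CONTENT_OUTPUTS = [
    "phone_numbers", "headings", "images", "audios", "videos", "links",
    "menus", "hashtags", "emails", "metadata", "tables", "favicon"
];

// Build the `response` fragment, rejecting names that are not in the documented list.
// An empty array is passed through unchanged, which requests all outputs.
function buildContentResponse(outputs = []) {
    const unknown = outputs.filter((name) => !CONTENT_OUTPUTS.includes(name));
    if (unknown.length > 0) {
        throw new RangeError(`Unknown outputs: ${unknown.join(", ")}`);
    }
    return { type: "content", options: { outputs } };
}

// The `data` field is a JSON string, so it needs exactly one JSON.parse call.
function parseContent(data) {
    return JSON.parse(data);
}

const responseConfig = buildContentResponse(["emails", "headings"]);
const parsed = parseContent("{\"emails\":[\"market@scrapeless.com\"]}");
console.log(responseConfig.options.outputs, parsed.emails);
```

Validating names locally turns a typo such as "email" into an immediate error instead of a silently missing key in the result.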
 

Resource control

A resource loading control system for optimizing performance and bandwidth usage.

{
  "actor": "unlocker.webunlocker",
  "proxy": {
    "country": "ANY",
    "url": ""
  },
  "input": {
    "url": "string",
    "jsRender": {
      "enabled": true,
      "block": {
        "resources": [
          "Image",
          "Font",
          "Stylesheet",
          "Script"
        ],
        "urls": [
          // Optional: block by URL pattern
          "*.analytics.com/*",
          "*/ads/*"
        ]
      }
    }
  }
}

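Complete resource type reference:

| Resource type | Description | Impact |
| --- | --- | --- |
| Document | Main document and iframes | Core page content |
| Stylesheet | CSS files | Page styling and layout |
| Image | Images and icons | Visual content |
| Media | Audio and video resources | Multimedia content |
| Font | Web fonts | Text rendering |
| Script | JavaScript files | Page functionality |
| TextTrack | Video subtitles and captions | Media accessibility |
| XHR | XMLHttpRequest calls | Legacy asynchronous requests |
| Fetch | Fetch API requests | Modern asynchronous requests |
| Prefetch | Prefetched resources | Performance optimization |
| EventSource | Server-sent events | Real-time updates |
| WebSocket | WebSocket connections | Two-way communication |
| Manifest | Web app manifest | PWA configuration |
| SignedExchange | Signed HTTP exchanges | Content authenticity |
| Ping | Ping requests | Analytics and tracking |
| CSPViolationReport | CSP violation reports | Security monitoring |
| Preflight | CORS preflight requests | Cross-origin security |
| Other | Uncategorized resources | Miscellaneous |

Usage example: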

{
  "actor": "unlocker.webunlocker",
  "proxy": {
    "country": "ANY",
    "url": ""
  },
  "input": {
    "url": "string",
    "jsRender": {
      "enabled": true,
      "block": {
        "resources": [
          "Image",
          "Font",
          "Stylesheet",
          "Script",
          "Media",
          "Ping",
          "Prefetch"
        ]
      }
    }
  }
}

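Best practices for resource blocking:

  1. Performance optimization

    • Use resource blocking with care, and block unnecessary resources to speed up loading
    • Consider blocking Prefetch and Ping to reduce network usage
    • Keep Document and critical Script resources unblocked
  2. Bandwidth management

    • For bandwidth-heavy pages, block Image and Media
    • Consider blocking Font so that system fonts are used
  3. Stability

    • Implement a request retry mechanism
    • Add error handling logic
    • Use waitFor instead of a fixed wait
  4. Resource efficiency

    • Load resources on demand
    • Close unnecessary connections promptly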
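
One way to implement the retry recommendation is a wrapper with exponential backoff. The sketch below is an assumption about how a client might do this, not something the API prescribes: withRetry and backoffDelay are hypothetical names, the delay values are arbitrary, and the demo uses a local task that fails twice instead of a real network call.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `task` until it resolves, allowing up to `retries` extra attempts.
// The last error is rethrown when every attempt fails.
async function withRetry(task, { retries = 3, baseMs = 500, maxMs = 8000 } = {}) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            return await task(attempt);
        } catch (error) {
            lastError = error;
            if (attempt < retries) {
                await sleep(backoffDelay(attempt, baseMs, maxMs));
            }
        }
    }
    throw lastError;
}

// Demo with a local task that fails twice before succeeding (no network involved).
let calls = 0;
withRetry(async () => {
    calls += 1;
    if (calls < 3) throw new Error("temporary failure");
    return "ok";
}, { baseMs: 10 }).then((result) => console.log(result, "after", calls, "calls"));
```

In practice the task would wrap the axios.post call, so that timeouts and transient errors trigger another attempt.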

Note: Resource type strings are case-sensitive. Use the exact values shown in the reference table.
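
Because the type names are case-sensitive, it can help to normalize user input to the exact spelling before building the block list. The following sketch does that on the client side; normalizeResourceTypes is a hypothetical helper, and the list of names is copied from the resource type reference in this document.

```javascript
// Resource type names exactly as spelled in the reference table.
const RESOURCE_TYPES = [
    "Document", "Stylesheet", "Image", "Media", "Font", "Script", "TextTrack",
    "XHR", "Fetch", "Prefetch", "EventSource", "WebSocket", "Manifest",
    "SignedExchange", "Ping", "CSPViolationReport", "Preflight", "Other"
];

// Map any casing of a known name to its canonical spelling; reject unknown names.
function normalizeResourceTypes(names) {
    const byLowerCase = new Map(RESOURCE_TYPES.map((type) => [type.toLowerCase(), type]));
    return names.map((name) => {
        const canonical = byLowerCase.get(String(name).toLowerCase());
        if (!canonical) {
            throw new RangeError(`Unknown resource type: ${name}`);
        }
        return canonical;
    });
}

const block = { resources: normalizeResourceTypes(["image", "FONT", "Ping", "xhr"]) };
console.log(block.resources);
```

The resulting block object can be placed under input.jsRender.block in the request body.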