๐ŸŽ Surprise Discount: Enjoy 90% Off Your Subscription!

  • Pricing
  • Documentation
EN
Contact

ยฉ 2025 NST LABS TECH LTD. ALL RIGHTS RESERVED

Products

Anti-Detect Browser
Nstbrowser RPA
Cloudflare Bypass
Browserless
Web Unblocker

Solutions

Cloud Fingerprint Browser
Multi-Account Management
Web Scraping & Automation
Anti-Detection Bot

Resources

Pricing
Download
RPA Marketplace
Affiliate Program
Partners
Blog
Release Notes

Support

Contact

Documentation

Legal

Terms
Privacy Policy
Cookies Policy

ProductsSolutionsResourcesSupportLegal

ProductsSolutionsResources

SupportLegal

ยฉ 2025 NST LABS TECH LTD. ALL RIGHTS RESERVED

Back to Blog
Browserless web scraping
Browserless

How to do Web Scraping with Browserless in the Cloud?

Unlock the power of cloud browsers! Discover why Browserless is your go-to for seamless web scraping and learn how to get started in the cloud!
Sep 24, 2024Robin Brown

What Are Cloud Browsers?

Cloud Browser is a cloud-based combination of a web browser application with a virtualized container that implements the concept of remote browser isolation. Developers can automate web browsers using popular tools such as Playwright and Puppeteer for web scraping and web testing.

Cloud Browser executes commands from the network in a secure container, separate from the user endpoint, and accessed through a remote display protocol. As a result, browser applications are more centralized, easier to manage, more cost-effective, scalable, and protected.

Why Scrape With Browserless?

Browserless is a powerful cloud-based solution for seamless browser automation, web scraping, and testing. The primary goal of Browserless is to simplify and scale up web browser automation tasks like testing and scraping and this can be approached in one of two ways:

  • Running multiple browser instances on Docker or Kubernetes.
  • Using existing Web Driver proxy servers like Selenium Grid.

๐Ÿ‘‰ Learn more about Browserless

How to do web scraping with Browserless in the Cloud?

Step1: Preparation

Before we start, we need to have a Browserless service. Using Browserless can solve complex web crawling and large-scale automation tasks, and it has now achieved fully managed cloud deployment.

Browserless adopts a browser-centric approach, provides powerful headless deployment capabilities, and provides higher performance and reliability. For more information about Browserless, you can get the document to learn more.

  1. Get the API KEY and go to the Browserless menu page of the Nstbrowser client, or you can go to the Nstbrowser client to access
Get the API KEY

Do you have any wonderful ideas and doubts about web scraping and Browserless?
Let's see what other developers are sharing on Discord and Telegram!

Step2: Confirm crawling target

Before we start, we need to make sure what we are going to crawl. In the following example, we try to crawl the movie titles in IMDb's Top 250 Movies. After opening the page:

  1. Wait for the page to load normally and locate the page to the movie title under IMDb Top 250 Movies
  2. Open the debug console and identify the html element of the movie title
  3. Use your favorite library to get the movie title
IMDb's Top 250

Step3: Start crawling

Everything is ready, start crawling! We choose to use the powerful Cloud Browserless provided by Nstbrowser to crawl the above content. Below we will list some of the common libraries used together.

Puppeteer

If you haven't chosen a library yet, we highly recommend Puppeteer because it is very active and has many maintainers. It is also built by Chrome developers, so it is one of the highest-quality libraries.

  • Install puppeteer-core
Bash Copy
# pnpm
pnpm i puppeteer-core
# yarn
yarn add puppeteer-core
# npm
npm i --save puppeteer-core
  • Code script
JavaScript Copy
import puppeteer from "puppeteer-core";

const token = "your api key"; // 'your proxy'

const config = {
  proxy: 'your proxy', // required; input format: schema://user:password@host:port eg: http://user:password@localhost:8080
  // platform: 'windows', // support: windows, mac, linux
  // kernel: 'chromium', // only support: chromium
  // kernelMilestone: '128', // support: 128
  // args: {
  //     "--proxy-bypass-list": "detect.nstbrowser.io"
  // }, // browser args
  // fingerprint: {
  //     userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.6613.85 Safari/537.36', // userAgent supportted since v0.15.0
  // },
};

const query = new URLSearchParams({
  token: token, // required
  config: JSON.stringify(config),
});

const browserWSEndpoint = `https://less.nstbrowser.io/connect?${query.toString()}`;
// Connect browserless
const browser = await puppeteer.connect({
  browserWSEndpoint,
  defaultViewport: null,
})
console.info('Connected!');

// Create new page
const page = await browser.newPage()

// Visit IMDb top 250 page
await page.goto('https://www.imdb.com/chart/top/?ref_=nv_mv_250')

// Wait for the movie list to load
await page.waitForSelector('.ipc-metadata-list')

// Get a list of movie titles
const moviesList = await page.$$eval('.ipc-metadata-list h3.ipc-title__text', nodes => nodes.map(node => node.textContent));

console.log('[IMDb Top 250 Movies]===>', moviesList);

// Close browser
await browser.close();

Congratulation! We have finished our scraping task. You can see the result of 250 movies in the console:

scraping result

Playwright

It is an active open-source project with a large number of contributors. Playwright was developed by Microsoft and supports multiple browsers (Chromium, Firefox, and WebKit) and multiple programming languages (Nodejs, Python, .NET, and Java), making it one of the most versatile and high-quality browser automation tools.

Use in Nodejs

  • Install Playwright
Bash Copy
# pnpm
pnpm create playwright
# yarn
yarn create playwright
# npm
npm init playwright@latest
  • Code script
JavaScript Copy
import { chromium } from 'playwright'

const token = "your api key"; // 'your proxy'

const config = {
  proxy: 'your proxy', // required; input format: schema://user:password@host:port eg: http://user:password@localhost:8080
  // platform: 'windows', // support: windows, mac, linux
  // kernel: 'chromium', // only support: chromium
  // kernelMilestone: '128', // support: 128
  // args: {
  //     "--proxy-bypass-list": "detect.nstbrowser.io"
  // }, // browser args
  // fingerprint: {
  //     userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.6613.85 Safari/537.36', // userAgent supportted since v0.15.0
  // },
};

const query = new URLSearchParams({
  token: token, // required
  config: JSON.stringify(config),
});

const browserWSEndpoint = `ws://less.nstbrowser.io/connect?${query.toString()}`;
// Connect browserless
const browser = await chromium.connectOverCDP(browserWSEndpoint);
console.info('Connected!');

// Create new page
const page = await browser.newPage();

// Visit IMDb's top 250 page
await page.goto('https://www.imdb.com/chart/top/?ref_=nv_mv_250');

// Wait for movie list to load
await page.waitForSelector('.ipc-metadata-list');

// Get a list of movie titles
const moviesList = await page.$$eval('.ipc-metadata-list h3.ipc-title__text', nodes => nodes.map(node => node.textContent));

console.log('[IMDb Top 250 Movies]===>', moviesList);

// Close browser
await browser.close();

Here we can figure out the lists after running our project:

scraping result

Use in Python

  • Install Playwright
Bash Copy
pip install pytest-playwright
  • Code script
Python Copy
from playwright.sync_api import sync_playwright
from urllib.parse import urlencode
import json

token = "your api key"  # 'your proxy'

config = {
    "proxy": "your proxy",  # required; input format: schema://user:password@host:port eg: http://user:password@localhost:8080
    # platform: 'windows', // support: windows, mac, linux
    # kernel: 'chromium', // only support: chromium
    # kernelMilestone: '128', // support: 128
    # args: {
    #     "--proxy-bypass-list": "detect.nstbrowser.io"
    # }, // browser args
    # fingerprint: {
    #     userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.6613.85 Safari/537.36', // userAgent supportted since v0.15.0
    # },
}

query = urlencode({"token": token, "config": json.dumps(config)})

browser_ws_endpoint = f"ws://less.nstbrowser.io/connect?{query}"

def scrape_imdb_top_250():
    with sync_playwright() as p:
        # Connect browserless
        browser = p.chromium.connect_over_cdp(browser_ws_endpoint)
        print("Connected!")

        # Create new page
        page = browser.new_page()

        # Visit IMDb top 250 page
        page.goto("https://www.imdb.com/chart/top/?ref_=nv_mv_250")

        # Wait for movie list to load
        page.wait_for_selector(".ipc-metadata-list")

        # Get a list of movie titles
        movies_list = page.eval_on_selector_all(
            ".ipc-metadata-list h3.ipc-title__text",
            "nodes => nodes.map(node => node.textContent)",
        )

        print("[IMDb Top 250 Movies]===>", movies_list)

        # Close browser
        browser.close()

scrape_imdb_top_250()

Of course, the following is the scraping result:

scraping result

Select your favorite language and library, execute the corresponding script, and you can see the crawled results!

Step4: View the Browserless Dashboard

You can view all statistics for recent requests and remaining session time in the Browserless menu of the Nstbrowser client.

View the Browserless Dashboard

๐Ÿ‘‰ Download Nstbrowser Now!

5 Tips to Bypass Scraper Blocking with Browserless?

There are some key differences between headless browsers and regular browsers. The anti-bot service detects headless browsers by spotting these differences.

Bypassing scraper blocking with Browserless involves several key strategies:

  1. Use Headless Browsers: Browserless allows you to run headless browsers like Puppeteer, which can mimic real user behavior and reduce detection risks.
  2. Rotate User Agents: Randomizing user agents can help avoid blocking by making your requests appear to come from different browsers.
  3. Implement Proxy Rotation: Utilize rotating proxies to change IP addresses frequently, minimizing the chances of being blocked by target websites.
  4. Throttle Requests: Mimic human-like browsing behavior by introducing delays between requests, preventing rapid-fire requests that trigger security measures.
  5. Manage Cookies and Sessions: Properly handle cookies and session data to maintain continuity and avoid detection, as persistent sessions are less likely to be flagged.

What are the differences Between Browserless and Headless Browser?

  1. Definition:

Browserless: A service or platform that provides cloud-based browser automation, enabling users to run headless browsers without managing the underlying infrastructure.

Headless Browser: A web browser without a graphical user interface (GUI) that can be run in a command-line environment, useful for automating web tasks.

  1. Management:

Browserless: Offers managed services, abstracting the complexity of setting up and maintaining browser instances. Users interact with an API to initiate tasks.

Headless Browser: Requires users to set up and manage the browser environment, including dependencies and configurations.

  1. Scalability:

Browserless: Designed for scalability, allowing users to run multiple browser instances in parallel easily, leveraging cloud resources.

Headless Browser: Scaling may involve more manual work, such as managing multiple instances and server resources.

  1. Use Cases:

Browserless: Ideal for developers and teams needing scalable automation for web scraping, testing, and data extraction without infrastructure concerns.

Headless Browser: Suitable for developers who prefer direct control over their automation tasks and are comfortable managing their own environment.

  1. Integration:

Browserless: Often provides integrations with various tools and services, making it easier to incorporate into existing workflows.

Headless Browser: Requires custom integration efforts to connect with other tools or services.

It's a Wrap

How to do web scraping with browserless in the Cloud? We have already explored 4 detailed steps. Just choose your favorite library and then scrape the data you need.

In addition, we have also learned that:

  • What is a Cloud browser?
  • Benefits of Browserless
  • 5 effective to avoid blocking

Start using Browserless to simplify everything!

More
Running headless Chrome in the cloud for scalable web scraping
Headless BrowserWeb ScrapingBrowserless
How to Run Headless Chrome in the Cloud for Large-Scale Scraping
How to Run Headless Chrome in the Cloud for Large-Scale Scraping
Sep 02, 2025Robin Brown
Playwright vs. Puppeteer for Web Scraping
BrowserlessHeadless BrowserWeb Scraping
Playwright vs. Puppeteer for Web Scraping: Choosing Your Champion
Playwright vs. Puppeteer for Web Scraping: Choosing Your Champion
Sep 01, 2025Carlos Rivera
Puppeteer Stealth Plugin for Undetectable Web Scraping
BrowserlessHeadless BrowserWeb Scraping
Mastering Stealth: Leveraging Puppeteer Stealth Plugin for Undetectable Web Scraping
Mastering Stealth: Leveraging Puppeteer Stealth Plugin for Undetectable Web Scraping
Sep 01, 2025Robin Brown
The Best Headless Browser
Headless BrowserBrowserlessWeb Scraping
The Best Headless Browsers for Web Scraping: A Comprehensive Guide
The Best Headless Browsers for Web Scraping: A Comprehensive Guide
Sep 01, 2025Carlos Rivera
http-2-bypass
Browserless
What Is HTTP/2 Fingerprinting and How to Bypass It?
Learn how to bypass HTTP/2 fingerprinting in web scraping with six powerful methods, from using real browsers to cloud-based Browserless. Stay undetected against modern anti-bot defenses.
Jun 03, 2025Carlos Rivera
Load Browser Extensions in Nstbrowser Docker
Browserless
How to Load Browser Extensions in Nstbrowser Docker?
Learn 2 methods for uploading extensions in Nstbrowser and the steps to launch them in Docker.
Mar 19, 2025Carlos Rivera
Catalogue