Playwright Web Scraping in Docker with Browserless

Running Playwright in Docker is a real game-changer! What are the basics of Docker, and how do you do Playwright web scraping in Docker? Find out in this blog.
Aug 23, 2024 · Robin Brown

Getting Familiar with Docker

What is Docker? Do you remember how the shipping industry adopted standardized containers in the mid-20th century?

The introduction of containers allowed people to use large machinery such as cranes to load and unload goods. This greatly reduced shipping time and costs, making cargo transportation faster and more efficient than ever before, and it contributed greatly to the prosperity of the shipping industry.

Docker is like the shipping container of software development: it introduces a standardized, container-based approach to packaging, distributing, and running applications, and it has greatly accelerated how software is built and shipped.

In simple terms, Docker is a tool that allows developers to easily deploy their applications into sandboxes (called Docker containers) that run on the host operating system (typically Linux).
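
For example, a single command is enough to drop into a throwaway sandbox. A minimal sketch, using the public ubuntu image (the tag is just an illustrative choice):

Bash
# Start an interactive, isolated container; --rm discards it when the shell exits
docker run --rm -it ubuntu:22.04 bash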

What Benefits Does Docker Provide?

  • Isolated Execution Environment
  • Simplified Testing
  • Reliability and Consistency at Scale
  • Faster Feature Delivery
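
As a quick sketch of the consistency point: the same image yields the same runtime on any machine that can run Docker (the public node:20-slim image here is just an illustrative choice):

Bash
# The Node.js version is pinned by the image, not by whatever the host has installed
docker run --rm node:20-slim node -e "console.log(process.version)"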

What is Browserless?

Browserless is a cloud-based clustered browser solution tailored for efficient browser automation, web scraping, and testing.


Built on Nstbrowser’s fingerprint library, it offers random fingerprint switching for seamless data collection and automation. With its strong cloud infrastructure, Browserless enables easy access to multiple browser instances, streamlining the management of automation tasks.

Do you have ideas or questions about web scraping and Browserless?
Let's see what other developers are sharing on Discord and Telegram!

How Do You Use Nstbrowser's Docker Image?

Before you start using Nstbrowser's Docker image, make sure you've completed the following steps:

  1. Ensure you have Docker set up. If Docker isn't installed on your system yet, please follow the instructions in Docker Docs.
  2. Obtain a Nstbrowser account. If you haven't registered yet, you can sign up on the official Nstbrowser website.
  3. After your account is ready, visit the API Key page in Nstbrowser to retrieve your API key.

Once these preparations are done, you can pull the Nstbrowser Docker image using the following command:

Bash
# Pull the Browserless image
docker pull nstbrowser/browserless:0.0.1-beta

# Run the Browserless image
docker run -it -e TOKEN=xxx -e SERVER_PORT=8848 -p 8848:8848 --name nstbrowserless nstbrowser/browserless:0.0.1-beta

# After running, you can use the docker ps command to check whether the container is running properly
docker ps
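If docker ps doesn't show the container, docker logs is a quick way to see why it exited (standard Docker, not specific to Nstbrowser):

Bash
# Print the container's output to diagnose startup failures
docker logs nstbrowserless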

Crawling Dynamic Websites with Playwright in Docker

Now, let's figure out how to use Playwright with the Nstbrowser Docker image to crawl dynamic websites!

Here is a step-by-step walkthrough of a simple example.

Step 1: Determine Crawl Targets

First, we must identify the information we want to crawl. In this example, we'll crawl all the article titles on Hacker News and store the results locally.

Step 2: Writing a Playwright Script

Next, it's time to write a Playwright script to crawl our target. The script uses Playwright to open the target page and retrieve the article titles.

In Step 4 we'll connect this script to the Browserless port in our Docker container; for now, here's the basic version:

JavaScript
import { chromium } from 'playwright';

// Start the browser (run this file as an ES module, e.g. with a .mjs extension)
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com/');

// Wait for the titles to load
await page.waitForSelector('.titleline > a');

// Grab the article titles
const titles = await page.$$eval('.titleline > a', elements =>
  elements.map(el => el.innerText)
);

// Output the captured titles
console.log('Article titles:');
titles.forEach((title, index) => console.log(`${index + 1}: ${title}`));

// Close the browser
await browser.close();

Step 3: Project Initialization

Before scraping with Browserless, we also need to initialize a new project and install Playwright:

Bash
cd ~
mkdir playwright-docker
cd playwright-docker
npm init -y
npm install playwright

Step 4: Connect to Browserless and Start Crawling

Scraping time! Now let's connect Playwright to the Browserless port in the Docker container.

  • Prepare the configuration for the connection:
JavaScript
  const host = 'host.docker.internal:8848';
  const config = {
    once: true,
    headless: true, // Set headless mode
    autoClose: true,
    args: { '--disable-gpu': '', '--no-sandbox': '' }, // browser args should be a dictionary
    fingerprint: {
      name: '',
      platform: 'mac',
      kernel: 'chromium',
      kernelMilestone: 124,
      hardwareConcurrency: 8,
      deviceMemory: 8,
    },
  };
  const browserWSEndpoint = `ws://${host}/ws/connect?${encodeURIComponent(
    JSON.stringify(config)
  )}`;
  • Connect via connectOverCDP:
JavaScript
const { chromium } = require('playwright');

async function execPlaywright() {
  const browser = await chromium.connectOverCDP(browserWSEndpoint);
  const context = await browser.newContext();
  const page = await context.newPage();
}
  • Scrape the web:
JavaScript
const { chromium } = require('playwright');

async function execPlaywright() {
  try {
    const browser = await chromium.connectOverCDP(browserWSEndpoint);
    const context = await browser.newContext();
    const page = await context.newPage();

    // Navigate to the target URL
    await page.goto('https://news.ycombinator.com/');

    // Wait for element to load
    await page.waitForSelector('.titleline > a')

    // Grab the article title
    const titles = await page.$$eval('.titleline > a', elements =>
      elements.map(el => el.innerText)
    );

    // Output the captured title
    console.log('Article title:');
    titles.forEach((title, index) => console.log(`${index + 1}: ${title}`));

    // Close the browser
    await browser.close();
  } catch (err) {
    console.error('launch', err);
  }
}

execPlaywright();

Step 5: Storing Crawl Results

For subsequent data analysis, you can use Node.js's built-in fs module to write the data to a JSON file. Here is a simple utility function:

JavaScript
import fs from 'fs';

// Save as a JSON file
function saveObjectToJson(obj, filename) {
  const jsonString = JSON.stringify(obj, null, 2);
  fs.writeFile(filename, jsonString, 'utf8', (err) => {
    err ? console.error(err) : console.log(`File saved successfully: ${filename}`);
  });
}

The complete code is below. After running it, you'll find the Hacker News_log.json file in the script's working directory, containing all the crawled results!

JavaScript
const fs = require('fs')
const { chromium } = require('playwright');

const host = 'host.docker.internal:8848';
const config = {
  once: true,
  headless: true, // Set headless mode
  autoClose: true,
  args: { '--disable-gpu': '', '--no-sandbox': '' }, // browser args should be a dictionary
  fingerprint: {
    name: '',
    platform: 'mac',
    kernel: 'chromium',
    kernelMilestone: 124,
    hardwareConcurrency: 8,
    deviceMemory: 8,
  },
};
const browserWSEndpoint = `ws://${host}/ws/connect?${encodeURIComponent(
  JSON.stringify(config)
)}`;

async function execPlaywright() {
  try {
    const browser = await chromium.connectOverCDP(browserWSEndpoint);
    const context = await browser.newContext();
    const page = await context.newPage();

    // Navigate to the target URL
    await page.goto('https://news.ycombinator.com/');

    // Wait for element to load
    await page.waitForSelector('.titleline > a')

    // Grab the article title
    const titles = await page.$$eval('.titleline > a', elements =>
      elements.map(el => el.innerText)
    );

    // Output the captured title
    console.log('Article title:');
    titles.forEach((title, index) => console.log(`${index + 1}: ${title}`));

    // data storage
    saveObjectToJson(titles.map((title, index) => ({ [index + 1]: title })), 'Hacker News_log.json')

    // Close the browser
    await browser.close();
  } catch (err) {
    console.error('launch', err);
  }
}

// Save as a JSON file
function saveObjectToJson(obj, filename) {
  const jsonString = JSON.stringify(obj, null, 2);
  fs.writeFile(filename, jsonString, 'utf8', (err) => {
    err ? console.error(err) : console.log(`File saved successfully: ${filename}`);
  });
}

execPlaywright();
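
Assuming you saved the script as index.js in the playwright-docker directory (the filename is just an example) and the Browserless container from earlier is still running, you can run it with Node:

Bash
# Run the scraper against the Browserless container started earlier
node index.js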

6 Key Components of Docker

  • Docker Engine: the core of Docker; it handles the creation and management of containers.
  • Dockerfile: a file written in Docker's DSL (Domain-Specific Language) that contains the instructions for building a Docker image. Write the instructions in order, as the Docker daemon executes them from top to bottom.
  • Docker Image: a read-only template containing the instructions, application code, and dependencies used to create a Docker container. An image consists of multiple layered files for executing code in a container.
  • Docker Container: a runtime instance of a Docker image. It lets developers package an application with everything it needs, such as libraries and other dependencies.
  • Docker Hub: a cloud-based repository for finding and sharing container images. You can push images to public or private registries to store and share them.
  • Docker Registry: a storage and distribution system for Docker images, where images can be stored publicly or privately.
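
To see how the Dockerfile, image, and container relate, here's a minimal sketch; the app and file names are hypothetical:

Bash
# Write a minimal Dockerfile for a tiny Node.js app (hypothetical index.js)
cat > Dockerfile <<'EOF'
FROM node:20-slim
WORKDIR /app
COPY index.js .
CMD ["node", "index.js"]
EOF

# Build an image from the Dockerfile, then run a container from that image
docker build -t playwright-docker-demo .
docker run --rm playwright-docker-demo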

6 Important Docker Commands

  • docker run: starts a container from an image; you can specify runtime options and a command to execute.
  • docker pull: downloads a container image from a container registry to the local machine.
  • docker ps: displays running containers and key information such as container ID, image used, and status.
  • docker stop: gracefully stops a running container and shuts down the processes inside it.
  • docker start: restarts a stopped container, resuming it from its previous state.
  • docker login: logs in to a Docker registry so that you can access private repositories.
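
Put together, here's what those commands look like with the Browserless image used earlier in this post (the container name and tag match the earlier docker run example):

Bash
docker pull nstbrowser/browserless:0.0.1-beta   # fetch the image from the registry
docker run -it -e TOKEN=xxx -e SERVER_PORT=8848 -p 8848:8848 --name nstbrowserless nstbrowser/browserless:0.0.1-beta
docker ps                                       # confirm the container is running
docker stop nstbrowserless                      # stop it gracefully
docker start nstbrowserless                     # bring it back up
docker login                                    # authenticate to access private repositories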

Ending Thoughts

We've covered everything you need for Playwright web scraping in Docker! You've learned:

  • The basics of Docker
  • An introduction to Browserless and how to use it
  • How to do web scraping with Browserless

Overall, Docker helps streamline the process and ensures that Playwright runs in a consistent environment with the services and data it needs for efficient web scraping.

Now start discovering more about the mysteries of Browserless!
