Setup and Prerequisites

Before you start building scrapers, you need to set up your development environment and understand the testing workflow.

Environment Setup

1. Create Environment File

Create a .env file in the root of the repository with the following variables:

MOVIE_WEB_TMDB_API_KEY = "your_tmdb_api_key_here"
MOVIE_WEB_PROXY_URL = "https://your-proxy-url.com"  # Optional

Getting a TMDB API Key:

  1. Create an account at TheMovieDB
  2. Go to Settings > API
  3. Request an API key (choose "Developer" for free usage)
  4. Use the provided key in your .env file

Proxy URL (Optional):

  • Useful for testing scrapers that require proxy access
  • Can help bypass geographical restrictions during development
  • If not provided, the library will use default proxy services
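To catch a missing or placeholder key before running any scraper, a small check along these lines can help. This is a minimal sketch, not the loader the library uses: `parseEnv` and `missingRequired` are hypothetical helpers, and the parser handles only the simple `KEY = "value"  # comment` form shown above.

```typescript
// Hypothetical helper: parse simple KEY = "value" lines from .env text.
// Only covers the form shown in this guide; not a full dotenv implementation.
function parseEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and full-line comments.
    if (line === "" || line.charAt(0) === "#") continue;
    const match = line.match(/^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/);
    if (!match) continue;
    const key = match[1] ?? "";
    let value = (match[2] ?? "").trim();
    const quoted = value.match(/^"([^"]*)"|^'([^']*)'/);
    if (quoted) {
      // Quoted value: keep what is inside the quotes, drop any trailing comment.
      value = quoted[1] ?? quoted[2] ?? "";
    } else {
      // Unquoted value: strip a trailing "# comment" if present.
      value = value.replace(/\s+#.*$/, "").trim();
    }
    result[key] = value;
  }
  return result;
}

// Hypothetical helper: names of required variables that are unset
// or still hold the placeholder text from this guide.
function missingRequired(env: Record<string, string>): string[] {
  const required = ["MOVIE_WEB_TMDB_API_KEY"];
  return required.filter((name) => {
    const value = env[name];
    return !value || value === "your_tmdb_api_key_here";
  });
}

const sampleEnvText = [
  'MOVIE_WEB_TMDB_API_KEY = "abc123"',
  'MOVIE_WEB_PROXY_URL = "https://your-proxy-url.com"  # Optional',
].join("\n");

console.log(missingRequired(parseEnv(sampleEnvText))); // prints [] for this sample
```

The proxy URL is deliberately left out of the required list, since the guide marks it as optional.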

2. Install Dependencies

Install all required dependencies:

pnpm install

Familiarize Yourself with the CLI

The library provides a CLI tool that's essential for testing scrapers during development. Scrapers are inherently unreliable, so unit tests aren't practical for them; the CLI is your primary testing tool.

Interactive Mode

The easiest way to test is using interactive mode:

pnpm cli

This will prompt you for:

  • Fetcher mode (native, node-fetch, browser)
  • Scraper ID (source or embed)
  • TMDB ID for the content (for sources)
  • Embed URL (for testing embeds directly)
  • Season/episode numbers (for TV shows)

Command Line Mode

For repeatability and automation, you can specify arguments directly:

# Get help with all available options
pnpm cli --help

# Test a movie scraper
pnpm cli --source-id catflix --tmdb-id 11527

# Test a TV show scraper (Arcane S1E1)
pnpm cli --source-id zoechip --tmdb-id 94605 --season 1 --episode 1

# Test an embed scraper directly with a URL
pnpm cli --source-id turbovid --url "https://turbovid.eu/embed/DjncbDBEmbLW"
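If you keep a list of test cases, the commands above can be generated rather than typed by hand. The sketch below uses only the flags shown in this guide (`--fetcher`, `--source-id`, `--tmdb-id`, `--season`, `--episode`, `--url`); the `CliTestCase` type and `buildCliArgs` function are hypothetical names introduced for illustration and are not part of the library.

```typescript
type FetcherMode = "native" | "node-fetch" | "browser";

// Hypothetical description of one CLI test run.
interface CliTestCase {
  sourceId: string;
  fetcher?: FetcherMode;
  tmdbId?: number; // for source scrapers
  season?: number; // TV shows only
  episode?: number; // TV shows only
  url?: string; // for testing an embed scraper directly
}

// Build the argument list that follows "pnpm" for a given test case.
function buildCliArgs(tc: CliTestCase): string[] {
  const args: string[] = ["cli"];
  if (tc.fetcher !== undefined) args.push("--fetcher", tc.fetcher);
  args.push("--source-id", tc.sourceId);

  // Embed test: a URL is given instead of a TMDB ID.
  if (tc.url !== undefined) {
    args.push("--url", tc.url);
    return args;
  }

  if (tc.tmdbId === undefined) {
    throw new Error("Either tmdbId or url must be provided");
  }
  args.push("--tmdb-id", String(tc.tmdbId));

  // Season and episode are only added when both are present.
  if (tc.season !== undefined && tc.episode !== undefined) {
    args.push("--season", String(tc.season), "--episode", String(tc.episode));
  }
  return args;
}

const arcane: CliTestCase = { sourceId: "zoechip", tmdbId: 94605, season: 1, episode: 1 };
console.log(["pnpm", ...buildCliArgs(arcane)].join(" "));
// prints: pnpm cli --source-id zoechip --tmdb-id 94605 --season 1 --episode 1
```

Returning an array instead of one string keeps each argument separate, which avoids quoting problems if the list is later passed to a process spawner.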

Common CLI Examples

# Popular test cases
pnpm cli --source-id catflix --tmdb-id 11527        # The Shining
pnpm cli --source-id embedsu --tmdb-id 129          # Spirited Away
pnpm cli --source-id vidsrc --tmdb-id 94605 --season 1 --episode 1    # Arcane S1E1

# Testing different fetcher modes
pnpm cli --fetcher native --source-id catflix --tmdb-id 11527
pnpm cli --fetcher browser --source-id catflix --tmdb-id 11527

Fetcher Options

The CLI supports different fetcher modes:

  • native: Uses Node.js built-in fetch (undici) - fastest
  • node-fetch: Uses the node-fetch library
  • browser: Starts headless Chrome for a browser-like environment

The browser fetcher requires running pnpm build first; otherwise you'll get outdated results.

Understanding CLI Output

Source Scraper Output (Returns Embeds)

pnpm cli --source-id catflix --tmdb-id 11527

Example output:

{
  embeds: [
    {
      embedId: 'turbovid',
      url: 'https://turbovid.eu/embed/DjncbDBEmbLW'
    }
  ]
}

Embed Scraper Output (Returns Streams)

pnpm cli --source-id turbovid --url "https://turbovid.eu/embed/DjncbDBEmbLW"

Example output:

{
  stream: [
    {
      type: 'hls',
      id: 'primary',
      playlist: 'https://proxy.fifthwit.net/m3u8-proxy?url=https%3A%2F%2Fqueenselti.pro%2Fwrofm%2Fuwu.m3u8&headers=%7B%22referer%22%3A%22https%3A%2F%2Fturbovid.eu%2F%22%2C%22origin%22%3A%22https%3A%2F%2Fturbovid.eu%22%7D',
      flags: [],
      captions: []
    }
  ]
}

Notice the proxied URL: The createM3U8ProxyUrl() function creates URLs like https://proxy.fifthwit.net/m3u8-proxy?url=...&headers=... to handle protected streams. Read more about this in Advanced Concepts.
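When debugging, it can be useful to see which upstream playlist and headers are wrapped inside a proxied URL. The sketch below is inferred only from the example URL above, assuming the proxy carries a `url` query parameter and a JSON-encoded `headers` parameter; `decodeProxiedPlaylist` is a hypothetical helper, and the real behavior of createM3U8ProxyUrl() may differ.

```typescript
interface ProxiedPlaylist {
  upstreamUrl: string;
  headers: Record<string, string>;
}

// Hypothetical helper: unpack the "url" and "headers" query parameters
// from a proxied playlist URL shaped like the example in this guide.
function decodeProxiedPlaylist(playlist: string): ProxiedPlaylist {
  const parsed = new URL(playlist);

  // searchParams.get() already percent-decodes the value.
  const upstreamUrl = parsed.searchParams.get("url");
  if (upstreamUrl === null) {
    throw new Error("Proxied URL has no 'url' parameter");
  }

  // Treat a missing "headers" parameter as an empty header set.
  const rawHeaders = parsed.searchParams.get("headers");
  const headers =
    rawHeaders === null ? {} : (JSON.parse(rawHeaders) as Record<string, string>);

  return { upstreamUrl, headers };
}

const examplePlaylist =
  "https://proxy.fifthwit.net/m3u8-proxy?url=https%3A%2F%2Fqueenselti.pro%2Fwrofm%2Fuwu.m3u8" +
  "&headers=%7B%22referer%22%3A%22https%3A%2F%2Fturbovid.eu%2F%22%2C%22origin%22%3A%22https%3A%2F%2Fturbovid.eu%22%7D";

const decoded = decodeProxiedPlaylist(examplePlaylist);
console.log(decoded.upstreamUrl); // prints: https://queenselti.pro/wrofm/uwu.m3u8
console.log(decoded.headers.referer); // prints: https://turbovid.eu/
```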

Interactive Mode Flow

pnpm cli
✔ Select a fetcher mode · native
✔ Select a source · catflix  
✔ TMDB ID · 11527
✔ Media type · movie
✓ Done!
{
  embeds: [
    {
      embedId: 'turbovid',
      url: 'https://turbovid.eu/embed/DjncbDBEmbLW'
    }
  ]
}
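The outputs in this section come in two shapes: source scrapers return `embeds`, and embed scrapers return `stream`. If you handle result objects in your own scripts, type guards like the ones below can tell them apart. The interfaces are assumptions inferred from the example outputs only and are not the library's actual type definitions; all names here are hypothetical. The CLI prints JavaScript object notation rather than JSON, so this sketch works on objects in code and does not parse the printed text.

```typescript
// Shapes inferred from the example outputs in this guide (assumptions).
interface EmbedRef {
  embedId: string;
  url: string;
}
interface SourceResult {
  embeds: EmbedRef[];
}
interface StreamEntry {
  type: string;
  id: string;
  playlist?: string;
  flags: unknown[];
  captions: unknown[];
}
interface EmbedResult {
  stream: StreamEntry[];
}

function isSourceResult(value: unknown): value is SourceResult {
  return (
    typeof value === "object" &&
    value !== null &&
    Array.isArray((value as { embeds?: unknown }).embeds)
  );
}

function isEmbedResult(value: unknown): value is EmbedResult {
  return (
    typeof value === "object" &&
    value !== null &&
    Array.isArray((value as { stream?: unknown }).stream)
  );
}

// Produce a one-line summary of a scraper result.
function summarize(value: unknown): string {
  if (isSourceResult(value)) {
    const ids = value.embeds.map((e) => e.embedId).join(", ");
    return `source result with ${value.embeds.length} embed(s): ${ids}`;
  }
  if (isEmbedResult(value)) {
    const types = value.stream.map((s) => s.type).join(", ");
    return `embed result with ${value.stream.length} stream(s): ${types}`;
  }
  return "unrecognized output";
}

console.log(
  summarize({
    embeds: [{ embedId: "turbovid", url: "https://turbovid.eu/embed/DjncbDBEmbLW" }],
  })
);
// prints: source result with 1 embed(s): turbovid
```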

Development Workflow

  1. Setup: Create the .env file and install dependencies
  2. Research: Study the target website's structure and player technology
  3. Code: Build your scraper following the established patterns
  4. Register: Add the scraper to all.ts with a unique rank
  5. Test: Use the CLI to test with multiple different movies and TV shows
  6. Iterate: Fix issues and improve reliability
  7. Submit: Create a pull request with thorough testing documentation
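Since the Register step requires a unique rank, a quick collision check can save a review round-trip. This is a sketch under assumptions: the `RankedScraper` shape and `findRankCollisions` are hypothetical, the ranks in the usage example are made up, and none of this reflects how all.ts is actually structured.

```typescript
// Hypothetical minimal view of a registered scraper: just an ID and a rank.
interface RankedScraper {
  id: string;
  rank: number;
}

// Group scraper IDs by rank and return only the ranks used more than once.
function findRankCollisions(scrapers: RankedScraper[]): Map<number, string[]> {
  const byRank = new Map<number, string[]>();
  for (const scraper of scrapers) {
    const ids = byRank.get(scraper.rank) ?? [];
    ids.push(scraper.id);
    byRank.set(scraper.rank, ids);
  }

  const collisions = new Map<number, string[]>();
  byRank.forEach((ids, rank) => {
    if (ids.length > 1) collisions.set(rank, ids);
  });
  return collisions;
}

// Illustrative data only: these ranks are invented for the example.
const registered: RankedScraper[] = [
  { id: "catflix", rank: 160 },
  { id: "zoechip", rank: 150 },
  { id: "my-new-scraper", rank: 150 },
];

findRankCollisions(registered).forEach((ids, rank) => {
  console.log(`rank ${rank} is shared by: ${ids.join(", ")}`);
});
// prints: rank 150 is shared by: zoechip, my-new-scraper
```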

Next Steps

Once your environment is set up:

  1. Read Provider System Overview to understand how scrapers work
  2. Learn Building Scrapers for detailed implementation guide
  3. Check Advanced Concepts for error handling and best practices

Always test your scrapers with multiple different movies and TV shows to ensure reliability across different content types.