Sources vs Embeds

Understanding the difference between sources and embeds is crucial for building scrapers effectively. They work together in a two-stage pipeline to extract playable video streams.

The Two-Stage Pipeline

User Request → Source Scraper → What did source find?
                                      ↓
                               ┌─────────────┐
                               ↓             ↓
                        Direct Stream    Embed URLs
                               ↓             ↓
                          Play Video    Embed Scraper
                                            ↓
                                     Extract Stream
                                            ↓
                                      Play Video

Flow Breakdown:

User requests content (movie/TV show)
Source scraper searches the target website
Source returns either:
- Direct streams → Ready to play immediately
- Embed URLs → Need further processing
Embed scraper (if needed) extracts streams from player URLs
Final result → Playable video stream

Sources: The Content Finders

Sources are the first stage - they find content on websites and return either:

Direct video streams (ready to play)
Embed URLs that need further processing

Example: Autoembed Source

// From src/providers/sources/autoembed.ts
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Call an API to find video sources
  const data = await ctx.proxiedFetcher(`/api/getVideoSource`, {
    baseUrl: 'https://tom.autoembed.cc',
    query: { type: mediaType, id }
  });

  // 2. Return embed URLs for further processing
  return {
    embeds: [{
      embedId: 'autoembed-english',  // Points to an embed scraper
      url: data.videoSource          // URL that embed will process
    }]
  };
}

What this source does:

Queries an API with TMDB ID
Gets back a video source URL
Returns it as an embed for the autoembed-english embed scraper to handle

Example: Catflix Source

// From src/providers/sources/catflix.ts  
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Build URL to the movie/show page
  const watchPageUrl = `${baseUrl}/movie/${mediaTitle}-${movieId}`;
  
  // 2. Scrape the page for embedded player URLs
  const watchPage = await ctx.proxiedFetcher(watchPageUrl);
  const $ = load(watchPage);
  
  // 3. Extract and decode the embed URL
  const mainOriginMatch = scriptData.data.match(/main_origin = "(.*?)";/);
  const decodedUrl = atob(mainOriginMatch[1]);

  // 4. Return embed URL for turbovid embed to process
  return {
    embeds: [{
      embedId: 'turbovid',          // Points to turbovid embed scraper
      url: decodedUrl               // Turbovid player URL
    }]
  };
}

What this source does:

Scrapes a streaming website
Finds encoded embed player URLs in the page source
Decodes the URL and returns it for the turbovid embed scraper

Embeds: The Stream Extractors

Embeds are the second stage - they take URLs from sources and extract the actual playable video streams. Each embed type knows how to handle a specific player or service.

Example: Autoembed Embed (Simple)

// From src/providers/embeds/autoembed.ts
async scrape(ctx) {
  // The URL from the source is already a direct HLS playlist
  return {
    stream: [{
      id: 'primary',
      type: 'hls',
      playlist: ctx.url,  // Use the URL directly as HLS playlist
      flags: [flags.CORS_ALLOWED],
      captions: []
    }]
  };
}

What this embed does:

Takes the URL from autoembed source
Treats it as a direct HLS playlist (no further processing needed)
Returns it as a playable stream

Example: Turbovid Embed (Complex)

// From src/providers/embeds/turbovid.ts
async scrape(ctx) {
  // 1. Fetch the turbovid player page
  const embedPage = await ctx.proxiedFetcher(ctx.url);
  
  // 2. Extract encryption keys from the page
  const apkey = embedPage.match(/const\s+apkey\s*=\s*"(.*?)";/)?.[1];
  const xxid = embedPage.match(/const\s+xxid\s*=\s*"(.*?)";/)?.[1];
  
  // 3. Get decryption key from API
  const encodedJuiceKey = JSON.parse(
    await ctx.proxiedFetcher('/api/cucked/juice_key', { baseUrl })
  ).juice;
  
  // 4. Get encrypted playlist data
  const data = JSON.parse(
    await ctx.proxiedFetcher('/api/cucked/the_juice_v2/', {
      baseUrl, query: { [apkey]: xxid }
    })
  ).data;
  
  // 5. Decrypt the playlist URL
  const playlist = decrypt(data, atob(encodedJuiceKey));
  
  // 6. Return proxied stream (handles CORS/headers)
  return {
    stream: [{
      type: 'hls',
      id: 'primary',
      playlist: createM3U8ProxyUrl(playlist, ctx.features, streamHeaders),
      headers: streamHeaders,
      flags: [], captions: []
    }]
  };
}

What this embed does:

Takes turbovid player URL from catflix source
Performs complex extraction: fetches page → gets keys → decrypts data
Returns the final HLS playlist with proper proxy handling

Key Differences

Sources	Embeds
Find content on websites	Extract streams from players
Return embed URLs OR direct streams	Always return direct streams
Handle website navigation/search	Handle player-specific extraction
Can return multiple server options	Process one specific player type
Example: "Find Avengers on Catflix"	Example: "Extract stream from Turbovid player"

Why This Separation?

1. Reusability

Multiple sources can use the same embed:

// Both catflix and other sources can return turbovid embeds
{ embedId: 'turbovid', url: 'https://turbovid.com/player123' }

2. Multiple Server Options

Sources can provide backup servers:

return {
  embeds: [
    { embedId: 'turbovid', url: 'https://turbovid.com/player123' },
    { embedId: 'vidcloud', url: 'https://vidcloud.co/embed456' },
    { embedId: 'dood', url: 'https://dood.watch/789' }
  ]
};

3. Language/Quality Variants

Sources can offer different options:

return {
  embeds: [
    { embedId: 'autoembed-english', url: streamUrl },
    { embedId: 'autoembed-spanish', url: streamUrlEs },
    { embedId: 'autoembed-hindi', url: streamUrlHi }
  ]
};

4. Specialization

Sources specialize in website structures and search
Embeds specialize in player technologies and decryption

How They Work Together

Flow Example: Finding "Spirited Away"

Source (catflix):
- Searches catflix.su for "Spirited Away"
- Finds movie page with embedded player
- Extracts turbovid URL: https://turbovid.com/embed/abc123
- Returns: { embedId: 'turbovid', url: 'https://turbovid.com/embed/abc123' }
Embed (turbovid):
- Receives the turbovid URL
- Scrapes the player page for encryption keys
- Decrypts the actual HLS playlist URL
- Returns: { stream: [{ playlist: 'https://cdn.example.com/movie.m3u8' }] }
Result: User can now play the video stream

Error Handling Chain

If the embed fails to extract a stream:

// Source provides multiple backup options
return {
  embeds: [
    { embedId: 'turbovid', url: url1 },     // Try first
    { embedId: 'mixdrop', url: url2 },      // Fallback 1  
    { embedId: 'dood', url: url3 }          // Fallback 2
  ]
};

The system tries each embed in rank order until one succeeds.

Best Practices

For Sources:

Provide multiple embed options when possible
Use descriptive embed IDs that match existing embeds
Handle both movies and TV shows (combo scraper pattern)
Return direct streams when embed processing isn't needed

For Embeds:

Focus on one player type per embed
Handle errors gracefully with clear error messages
Use proxy functions for protected streams
Include proper headers and flags

Registration:

// In src/providers/all.ts
export function gatherAllSources(): Array<Sourcerer> {
  return [catflixScraper, autoembedScraper, /* ... */];
}

export function gatherAllEmbeds(): Array<Embed> {
  return [turbovidScraper, autoembedEnglishScraper, /* ... */];
}

Both sources and embeds must be registered in all.ts to be available for use.

In DepthBuilding Scrapers

In DepthFlags