Sources vs Embeds
Understanding the difference between sources and embeds is crucial for building scrapers effectively. They work together in a two-stage pipeline to extract playable video streams.
The Two-Stage Pipeline
User Request → Source Scraper → What did the source find?
                                          ↓
                                   ┌──────┴──────┐
                                   ↓             ↓
                             Direct Stream   Embed URLs
                                   ↓             ↓
                              Play Video    Embed Scraper
                                                 ↓
                                           Extract Stream
                                                 ↓
                                             Play Video
Flow Breakdown:
- User requests content (movie/TV show)
- Source scraper searches the target website
- Source returns either:
  - Direct streams → Ready to play immediately
  - Embed URLs → Need further processing
- Embed scraper (if needed) extracts streams from player URLs
- Final result → Playable video stream
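The flow above can be sketched in a few lines. This is only an illustration: the types `Stream`, `EmbedRef`, `SourceOutput`, and `EmbedScraper`, and the `resolve` function, are simplified stand-ins invented here, not the library's real definitions. Real scrapers also receive a context object, which is left out.

```typescript
// Simplified stand-in types; the real library's types carry more fields.
type Stream = { id: string; type: 'hls'; playlist: string };
type EmbedRef = { embedId: string; url: string };
type SourceOutput = { embeds: EmbedRef[]; stream?: Stream[] };
type EmbedScraper = (url: string) => Promise<Stream[]>;

// Hypothetical resolver: stage-one output in, playable streams out.
async function resolve(
  output: SourceOutput,
  embedScrapers: Record<string, EmbedScraper>,
): Promise<Stream[]> {
  // Direct streams are ready to play; no embed stage is needed.
  if (output.stream && output.stream.length > 0) return output.stream;

  // Otherwise hand the first embed URL to the matching embed scraper.
  const first = output.embeds[0];
  if (!first) throw new Error('source returned neither streams nor embeds');
  const scraper = embedScrapers[first.embedId];
  if (!scraper) throw new Error(`no embed scraper registered for ${first.embedId}`);
  return scraper(first.url);
}
```

The sketch only looks at the first embed to keep the branching visible; trying further embeds on failure is a separate concern.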
Sources: The Content Finders
Sources are the first stage - they find content on websites and return either:
- Direct video streams (ready to play)
- Embed URLs that need further processing
Example: Autoembed Source
// From src/providers/sources/autoembed.ts
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Call an API to find video sources
  const data = await ctx.proxiedFetcher(`/api/getVideoSource`, {
    baseUrl: 'https://tom.autoembed.cc',
    query: { type: mediaType, id }
  });
  // 2. Return embed URLs for further processing
  return {
    embeds: [{
      embedId: 'autoembed-english', // Points to an embed scraper
      url: data.videoSource // URL that embed will process
    }]
  };
}
What this source does:
- Queries an API with TMDB ID
- Gets back a video source URL
- Returns it as an embed for the autoembed-english embed scraper to handle
Example: Catflix Source
// From src/providers/sources/catflix.ts
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Build URL to the movie/show page
  const watchPageUrl = `${baseUrl}/movie/${mediaTitle}-${movieId}`;
  // 2. Scrape the page for embedded player URLs
  const watchPage = await ctx.proxiedFetcher(watchPageUrl);
  const $ = load(watchPage);
  // 3. Extract and decode the embed URL
  const mainOriginMatch = scriptData.data.match(/main_origin = "(.*?)";/);
  const decodedUrl = atob(mainOriginMatch[1]);
  // 4. Return embed URL for turbovid embed to process
  return {
    embeds: [{
      embedId: 'turbovid', // Points to turbovid embed scraper
      url: decodedUrl // Turbovid player URL
    }]
  };
}
What this source does:
- Scrapes a streaming website
- Finds encoded embed player URLs in the page source
- Decodes the URL and returns it for the turbovid embed scraper
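The decode step is easy to get wrong when the page changes and the marker is missing. A minimal sketch of a guarded version follows; `extractMainOrigin` is a hypothetical helper written for this page, not a function from the repository, and it assumes the inline script text is already available as a string.

```typescript
// Hypothetical helper: pull the base64-encoded player URL out of inline script text.
// Returns null instead of throwing when the marker is missing.
function extractMainOrigin(scriptText: string): string | null {
  const match = scriptText.match(/main_origin = "(.*?)";/);
  if (!match) return null;
  // atob decodes base64; it is a global in browsers and in Node.js 16+.
  return atob(match[1]);
}

// Build a sample script line the same way the site would encode it.
const sample = `var main_origin = "${btoa('https://turbovid.com/embed/abc123')}";`;
console.log(extractMainOrigin(sample)); // https://turbovid.com/embed/abc123
console.log(extractMainOrigin('no marker here')); // null
```

Returning `null` lets the caller decide whether a missing marker is an error or simply means the title is unavailable.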
Embeds: The Stream Extractors
Embeds are the second stage - they take URLs from sources and extract the actual playable video streams. Each embed type knows how to handle a specific player or service.
Example: Autoembed Embed (Simple)
// From src/providers/embeds/autoembed.ts
async scrape(ctx) {
  // The URL from the source is already a direct HLS playlist
  return {
    stream: [{
      id: 'primary',
      type: 'hls',
      playlist: ctx.url, // Use the URL directly as HLS playlist
      flags: [flags.CORS_ALLOWED],
      captions: []
    }]
  };
}
What this embed does:
- Takes the URL from autoembed source
- Treats it as a direct HLS playlist (no further processing needed)
- Returns it as a playable stream
Example: Turbovid Embed (Complex)
// From src/providers/embeds/turbovid.ts
async scrape(ctx) {
  // 1. Fetch the turbovid player page
  const embedPage = await ctx.proxiedFetcher(ctx.url);
  // 2. Extract encryption keys from the page
  const apkey = embedPage.match(/const\s+apkey\s*=\s*"(.*?)";/)?.[1];
  const xxid = embedPage.match(/const\s+xxid\s*=\s*"(.*?)";/)?.[1];
  // 3. Get decryption key from API
  const encodedJuiceKey = JSON.parse(
    await ctx.proxiedFetcher('/api/cucked/juice_key', { baseUrl })
  ).juice;
  // 4. Get encrypted playlist data
  const data = JSON.parse(
    await ctx.proxiedFetcher('/api/cucked/the_juice_v2/', {
      baseUrl, query: { [apkey]: xxid }
    })
  ).data;
  // 5. Decrypt the playlist URL
  const playlist = decrypt(data, atob(encodedJuiceKey));
  // 6. Return proxied stream (handles CORS/headers)
  return {
    stream: [{
      type: 'hls',
      id: 'primary',
      playlist: createM3U8ProxyUrl(playlist, ctx.features, streamHeaders),
      headers: streamHeaders,
      flags: [], captions: []
    }]
  };
}
What this embed does:
- Takes turbovid player URL from catflix source
- Performs complex extraction: fetches page → gets keys → decrypts data
- Returns the final HLS playlist with proper proxy handling
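In the excerpt, `apkey` and `xxid` are `undefined` if the page layout changes, and the failure only surfaces later in the API call. One way to fail early is sketched below. `extractConst` is a hypothetical helper, not part of the repository, and it assumes the name passed in is a plain identifier.

```typescript
// Hypothetical helper: read `const <name> = "<value>";` from page HTML,
// failing loudly when the expected declaration is not present.
function extractConst(page: string, name: string): string {
  // `name` is assumed to be a plain identifier, so it is not regex-escaped here.
  const pattern = new RegExp(`const\\s+${name}\\s*=\\s*"(.*?)";`);
  const value = page.match(pattern)?.[1];
  if (value === undefined) {
    throw new Error(`could not find "${name}" in player page`);
  }
  return value;
}

const page = '<script>const apkey = "k1"; const xxid = "v2";</script>';
console.log(extractConst(page, 'apkey')); // k1
console.log(extractConst(page, 'xxid')); // v2
```

An error that names the missing constant makes it clear which part of the player page changed.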
Key Differences
| Sources | Embeds |
|---|---|
| Find content on websites | Extract streams from players |
| Return embed URLs OR direct streams | Always return direct streams |
| Handle website navigation/search | Handle player-specific extraction |
| Can return multiple server options | Process one specific player type |
| Example: "Find Avengers on Catflix" | Example: "Extract stream from Turbovid player" |
Why This Separation?
1. Reusability
Multiple sources can use the same embed:
// Both catflix and other sources can return turbovid embeds
{ embedId: 'turbovid', url: 'https://turbovid.com/player123' }
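As a rough illustration of why this works, the sketch below resolves two embed references from different sources to the same registered embed. The `EmbedDef` shape, the `embedsById` map, and the second source are assumptions made for the example; how the library actually performs the lookup is not shown on this page.

```typescript
// Illustrative shapes only; the real Embed type has more fields.
type EmbedRef = { embedId: string; url: string };
type EmbedDef = { id: string; name: string };

// One embed definition, registered once.
const embedsById = new Map<string, EmbedDef>([
  ['turbovid', { id: 'turbovid', name: 'Turbovid' }],
]);

// Two hypothetical sources that happen to find the same player type.
const fromCatflix: EmbedRef = { embedId: 'turbovid', url: 'https://turbovid.com/player123' };
const fromOtherSource: EmbedRef = { embedId: 'turbovid', url: 'https://turbovid.com/player456' };

// Both references resolve to the same embed definition.
const a = embedsById.get(fromCatflix.embedId);
const b = embedsById.get(fromOtherSource.embedId);
console.log(a === b); // true
```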
2. Multiple Server Options
Sources can provide backup servers:
return {
  embeds: [
    { embedId: 'turbovid', url: 'https://turbovid.com/player123' },
    { embedId: 'vidcloud', url: 'https://vidcloud.co/embed456' },
    { embedId: 'dood', url: 'https://dood.watch/789' }
  ]
};
3. Language/Quality Variants
Sources can offer different options:
return {
  embeds: [
    { embedId: 'autoembed-english', url: streamUrl },
    { embedId: 'autoembed-spanish', url: streamUrlEs },
    { embedId: 'autoembed-hindi', url: streamUrlHi }
  ]
};
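When a site does not offer every language for every title, the list can be built from whatever URLs were found. A small sketch, assuming the `autoembed-<language>` ID pattern from the example above; `embedsForLanguages` is a hypothetical helper, not repository code.

```typescript
type EmbedRef = { embedId: string; url: string };

// Hypothetical helper: one embed per language that actually has a URL.
function embedsForLanguages(urls: Record<string, string | undefined>): EmbedRef[] {
  const embeds: EmbedRef[] = [];
  for (const [language, url] of Object.entries(urls)) {
    if (!url) continue; // skip languages the site did not provide
    embeds.push({ embedId: `autoembed-${language}`, url });
  }
  return embeds;
}

const embeds = embedsForLanguages({
  english: 'https://example.com/en.m3u8',
  spanish: undefined,
  hindi: 'https://example.com/hi.m3u8',
});
console.log(embeds.map((e) => e.embedId)); // [ 'autoembed-english', 'autoembed-hindi' ]
```

Each generated ID still has to match a registered embed scraper.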
4. Specialization
- Sources specialize in website structures and search
- Embeds specialize in player technologies and decryption
How They Work Together
Flow Example: Finding "Spirited Away"
- Source (catflix):
  - Searches catflix.su for "Spirited Away"
  - Finds movie page with embedded player
  - Extracts turbovid URL: https://turbovid.com/embed/abc123
  - Returns: { embedId: 'turbovid', url: 'https://turbovid.com/embed/abc123' }
- Embed (turbovid):
  - Receives the turbovid URL
  - Scrapes the player page for encryption keys
  - Decrypts the actual HLS playlist URL
  - Returns: { stream: [{ playlist: 'https://cdn.example.com/movie.m3u8' }] }
- Result: User can now play the video stream
Error Handling Chain
If an embed fails to extract a stream, the other embeds returned by the source act as fallbacks:
// Source provides multiple backup options
return {
  embeds: [
    { embedId: 'turbovid', url: url1 }, // Try first
    { embedId: 'mixdrop', url: url2 }, // Fallback 1
    { embedId: 'dood', url: url3 } // Fallback 2
  ]
};
The system tries each embed in rank order until one succeeds.
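The fallback behaviour can be approximated as a loop with a try/catch around each attempt. This is a sketch under stated assumptions, not the library's runner: the `EmbedDef` shape and `firstWorkingStream` are invented here, and "higher rank is tried first" is an assumption about how rank order is interpreted.

```typescript
type Stream = { id: string; type: 'hls'; playlist: string };
type EmbedRef = { embedId: string; url: string };
// Hypothetical embed shape: `rank` decides the order, `scrape` does the work.
type EmbedDef = { id: string; rank: number; scrape: (url: string) => Promise<Stream[]> };

// Try each embed until one yields a stream. Assumption: higher rank goes first.
async function firstWorkingStream(refs: EmbedRef[], defs: EmbedDef[]): Promise<Stream[]> {
  const byId = new Map(defs.map((d) => [d.id, d] as const));
  const candidates = refs
    .map((ref) => ({ ref, def: byId.get(ref.embedId) }))
    // Drop references whose embed scraper is not registered.
    .filter((c): c is { ref: EmbedRef; def: EmbedDef } => c.def !== undefined)
    .sort((x, y) => y.def.rank - x.def.rank);

  const failures: string[] = [];
  for (const { ref, def } of candidates) {
    try {
      const streams = await def.scrape(ref.url);
      if (streams.length > 0) return streams;
      failures.push(`${def.id}: no streams`);
    } catch (err) {
      // Record the failure and move on to the next embed.
      failures.push(`${def.id}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all embeds failed (${failures.join('; ')})`);
}
```

Collecting the per-embed failure messages keeps the final error useful when every option fails.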
Best Practices
For Sources:
- Provide multiple embed options when possible
- Use descriptive embed IDs that match existing embeds
- Handle both movies and TV shows (combo scraper pattern)
- Return direct streams when embed processing isn't needed
For Embeds:
- Focus on one player type per embed
- Handle errors gracefully with clear error messages
- Use proxy functions for protected streams
- Include proper headers and flags
Registration:
// In src/providers/all.ts
export function gatherAllSources(): Array<Sourcerer> {
  return [catflixScraper, autoembedScraper, /* ... */];
}
export function gatherAllEmbeds(): Array<Embed> {
  return [turbovidScraper, autoembedEnglishScraper, /* ... */];
}
Both sources and embeds must be registered in all.ts to be available for use.
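A common mistake is returning an `embedId` that no registered embed handles. A check along the following lines could catch that in a test. `findUnregisteredEmbedIds` and the `Registered` type are hypothetical and exist only for this sketch; the real `Embed` type has more fields than `id`.

```typescript
// Minimal stand-in for a registered scraper; only the `id` matters here.
type Registered = { id: string };

// Hypothetical check: which referenced embed IDs have no registered embed?
function findUnregisteredEmbedIds(referenced: string[], embeds: Registered[]): string[] {
  const known = new Set(embeds.map((e) => e.id));
  // De-duplicate first so each missing ID is reported once.
  return Array.from(new Set(referenced)).filter((id) => !known.has(id));
}

const registeredEmbeds: Registered[] = [{ id: 'turbovid' }, { id: 'autoembed-english' }];
console.log(findUnregisteredEmbedIds(['turbovid', 'mixdrop', 'mixdrop'], registeredEmbeds)); // [ 'mixdrop' ]
```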