
Metadata Extraction

Understand how HubSpot Deploy reads and processes your metadata from HubSpot portals and Git repositories. This guide explains what happens during extraction and why it matters for your workflows.


Overview

Metadata extraction is the process of reading your HubSpot configuration (workflows, forms, properties, etc.) and preparing it for comparison and deployment.

When you create a comparison or backup, the system:

  1. Reads metadata from the HubSpot API or a Git repository
  2. Converts it to a portable format that works across environments
  3. Stores it for comparison and deployment

This happens automatically in the background, but understanding the process helps you:

  • Know what to expect during extraction
  • Understand why it takes time
  • Troubleshoot issues when they occur

What Gets Extracted?

From HubSpot Portals

The system reads your portal configuration through the HubSpot API:

CRM Configuration:

  • Custom objects and properties
  • Standard object properties
  • Property groups
  • Pipelines and stages
  • Association labels

Marketing Assets:

  • Workflows
  • Forms
  • Email templates
  • Landing pages
  • Site pages
  • Blog posts
  • CTAs
  • Campaigns

Lists and Segmentation:

  • Contact lists
  • Company lists
  • Deal lists
  • Custom object lists

Sales Tools:

  • Sequences
  • Quote templates

Users and Teams:

  • Owners (users)
  • Teams

See Metadata Types for complete details.

From Git Repositories

The system reads JSON/YAML files from your repository:

my-hubspot-metadata/
├── workflows/
│   ├── welcome-workflow.json
│   └── nurture-workflow.json
├── forms/
│   ├── contact-form.json
│   └── demo-request.json
├── custom_objects/
│   └── deals-extended.json
└── properties/
    ├── contact-properties.json
    └── company-properties.json

Each file represents one metadata item in the same format HubSpot uses.


How Extraction Works

Step 1: Reading Data

From HubSpot:

  • Connects using your OAuth or Private App credentials
  • Fetches metadata through the HubSpot API
  • Handles pagination for large datasets (e.g., 1000+ workflows)
  • Respects API rate limits automatically
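Pagination and rate-limit handling can be sketched as a generator. This is a minimal sketch, not HubSpot Deploy's actual code: it assumes the cursor-style response shape that HubSpot's CRM v3 list endpoints use (a `results` array, plus `paging.next.after` when more pages exist), and `fetch_page` and `RateLimited` are hypothetical stand-ins for an HTTP call that raises when the API answers with HTTP 429.

```python
import time
from typing import Callable, Iterator, Optional


class RateLimited(Exception):
    """Hypothetical: raised by fetch_page when the API responds with HTTP 429."""


def paginate(
    fetch_page: Callable[[Optional[str]], dict],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every item from a cursor-paginated endpoint, backing off on rate limits."""
    after: Optional[str] = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(after)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise
                # Exponential backoff: 1s, 2s, 4s, ... before retrying the same page.
                sleep(base_delay * (2 ** attempt))
        yield from page.get("results", [])
        # Assumed shape: {"results": [...], "paging": {"next": {"after": "<cursor>"}}}
        after = page.get("paging", {}).get("next", {}).get("after")
        if after is None:
            return


# Demo: a fake two-page endpoint that is rate-limited once on the second request.
_pages = {
    None: {"results": ["wf-1", "wf-2"], "paging": {"next": {"after": "p2"}}},
    "p2": {"results": ["wf-3"]},
}
_calls = {"n": 0}


def fake_fetch(after: Optional[str]) -> dict:
    _calls["n"] += 1
    if _calls["n"] == 2:
        raise RateLimited()
    return _pages[after]


print(list(paginate(fake_fetch, sleep=lambda s: None)))  # ['wf-1', 'wf-2', 'wf-3']
```

Injecting `fetch_page` and `sleep` keeps the loop testable without network access. Exponential backoff is one common way to respect rate limits; the product may use a different strategy.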

From Git:

  • Clones or pulls your repository
  • Reads JSON/YAML files from directories
  • Parses each file into metadata objects
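Reading a repository laid out like the tree above could look like the following sketch. It assumes one JSON file per item, with the parent directory name used as the metadata type. `load_repository` is a hypothetical helper, and YAML files are ignored here because parsing them would need a third-party library such as PyYAML.

```python
import json
from pathlib import Path


def load_repository(root: str) -> dict[str, dict[str, dict]]:
    """Map each top-level directory (metadata type) to {item name: parsed JSON}."""
    metadata: dict[str, dict[str, dict]] = {}
    # Only files exactly one level deep, e.g. workflows/welcome-workflow.json.
    for path in sorted(Path(root).glob("*/*.json")):
        type_name = path.parent.name  # e.g. "workflows"
        item_name = path.stem         # e.g. "welcome-workflow"
        with path.open(encoding="utf-8") as f:
            metadata.setdefault(type_name, {})[item_name] = json.load(f)
    return metadata
```

With the example layout, `load_repository("my-hubspot-metadata")["forms"]["contact-form"]` would return the parsed contents of `forms/contact-form.json`.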

Step 2: Making It Portable

The Problem: HubSpot uses numeric IDs that are different in each portal.

For example, the same owner might be:

  • ID 12345 in Production
  • ID 67890 in Staging

Compared as-is, the same owner would show up as a difference, and a deployed ID could point to the wrong user or to no user at all.

The Solution: Keep the ID, but add a portable, human-readable reference (a URN) next to it.

Before (portal-specific):

"ownerId": "12345"

After (portable):

"ownerId": "12345"
"__stable__ownerId": "urn:hubspot:users:email:owner@example.com"

Now the system can:

  • Compare workflows between portals (even if owner IDs differ)
  • Deploy workflows to any portal (matching by email, not ID)
  • Show you meaningful names instead of cryptic IDs

See URN Management for details.
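The ID-to-URN idea can be sketched in two directions: at extraction, add a `__stable__ownerId` next to each `ownerId`; at deployment, look the email up among the target portal's owners to get that portal's ID. The `__stable__` key and the URN format come from the example above; the function names and the plain dictionaries standing in for each portal's owner list are hypothetical.

```python
from typing import Any

URN_PREFIX = "urn:hubspot:users:email:"


def add_stable_owner_refs(node: Any, email_by_id: dict[str, str]) -> Any:
    """Return a copy of node with a __stable__ownerId next to every known ownerId."""
    if isinstance(node, list):
        return [add_stable_owner_refs(item, email_by_id) for item in node]
    if not isinstance(node, dict):
        return node
    out = {key: add_stable_owner_refs(val, email_by_id) for key, val in node.items()}
    owner_id = node.get("ownerId")
    if owner_id is not None and str(owner_id) in email_by_id:
        out["__stable__ownerId"] = URN_PREFIX + email_by_id[str(owner_id)]
    return out


def resolve_owner_ids(node: Any, id_by_email: dict[str, str]) -> Any:
    """Return a copy of node whose ownerIds are rewritten for the target portal.

    Raises KeyError if a referenced owner does not exist in the target portal.
    """
    if isinstance(node, list):
        return [resolve_owner_ids(item, id_by_email) for item in node]
    if not isinstance(node, dict):
        return node
    out = {key: resolve_owner_ids(val, id_by_email) for key, val in node.items()}
    urn = node.get("__stable__ownerId")
    if isinstance(urn, str) and urn.startswith(URN_PREFIX):
        email = urn[len(URN_PREFIX):]
        out["ownerId"] = id_by_email[email]  # match by email, not by numeric ID
    return out


workflow = {"name": "Welcome", "ownerId": "12345"}
prod_owners = {"12345": "owner@example.com"}     # production: ID -> email
staging_owners = {"owner@example.com": "67890"}  # staging: email -> ID
portable = add_stable_owner_refs(workflow, prod_owners)
print(resolve_owner_ids(portable, staging_owners)["ownerId"])  # 67890
```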

Step 3: Converting to YAML

Why YAML?

YAML is a human-readable format that is well suited to configuration:

  • Easy to read and review
  • Produces clean, line-by-line diffs in Git
  • Widely used for infrastructure-as-code

Example:

"name": "Contact Form"
"formType": "HUBSPOT"
"submitText": "Submit"
"fields":
  - "name": "email"
    "label": "Email Address"
    "required": true
  - "name": "firstname"
    "label": "First Name"
    "required": false

You can view this YAML in the comparison diff viewer to see exactly what changed.
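To show how nested metadata maps onto the YAML layout above, here is a minimal, dependency-free emitter. It is an illustration only: a real implementation would more likely use a YAML library, and `to_yaml_lines` is a hypothetical name. It relies on the fact that JSON strings, numbers, `true`/`false`, and `null` are also valid YAML scalars, which is why keys and strings come out double-quoted as in the example.

```python
import json
from typing import Any


def to_yaml_lines(value: Any, indent: int = 0) -> list[str]:
    """Render dicts, lists, and scalars as block-style YAML lines."""
    pad = " " * indent
    if isinstance(value, dict) and value:
        lines = []
        for key, val in value.items():
            if isinstance(val, (dict, list)) and val:
                lines.append(f"{pad}{json.dumps(key)}:")
                lines.extend(to_yaml_lines(val, indent + 2))
            else:
                lines.append(f"{pad}{json.dumps(key)}: {json.dumps(val)}")
        return lines
    if isinstance(value, list) and value:
        lines = []
        for item in value:
            item_lines = to_yaml_lines(item, indent + 2)
            # Turn the item's first line into a "- " entry at this level;
            # its remaining lines stay aligned under the dash.
            item_lines[0] = pad + "- " + item_lines[0][indent + 2:]
            lines.extend(item_lines)
        return lines
    # Scalars, empty dicts ({}) and empty lists ([]) use JSON/YAML flow syntax.
    return [pad + json.dumps(value)]


form = {
    "name": "Contact Form",
    "formType": "HUBSPOT",
    "submitText": "Submit",
    "fields": [
        {"name": "email", "label": "Email Address", "required": True},
        {"name": "firstname", "label": "First Name", "required": False},
    ],
}
print("\n".join(to_yaml_lines(form)))
```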

Step 4: Detecting Changes

The system calculates a "fingerprint" (checksum) for each item to quickly detect changes:

  • Same fingerprint = No changes, skip it
  • Different fingerprint = Something changed, show in comparison

This makes re-extraction much faster because unchanged items are skipped.
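One way to compute such a fingerprint is to hash a canonical encoding of each item, so that reordered keys don't count as a change. The sketch assumes SHA-256 over sorted-key JSON; the document doesn't say which checksum HubSpot Deploy uses, and `fingerprint` and `changed_items` are hypothetical names.

```python
import hashlib
import json


def fingerprint(item: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order doesn't change the hash."""
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_items(previous: dict[str, str], current: dict[str, dict]) -> list[str]:
    """Names of items that are new or whose fingerprint differs from the last run."""
    return [name for name, item in current.items()
            if previous.get(name) != fingerprint(item)]


last_run = {"contact-form": fingerprint({"name": "Contact Form", "submitText": "Submit"})}
now = {
    "contact-form": {"submitText": "Submit", "name": "Contact Form"},  # same content, new key order
    "demo-request": {"name": "Demo Request"},                         # new item
}
print(changed_items(last_run, now))  # ['demo-request']
```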


Extraction Order

Metadata types are extracted in a specific order to handle dependencies:

Phase 1: Foundation (no dependencies)

  1. Owners
  2. Custom and standard objects
  3. Email templates

Phase 2: Marketing (depends on Phase 1)

  4. Campaigns
  5. Forms
  6. CTAs

Phase 3: Automation (depends on Phases 1 and 2)

  7. Lists
  8. Workflows
  9. Sequences

Phase 4: Content (independent)

  10. Landing pages
  11. Site pages
  12. Blog posts

Phase 5: CRM (depends on objects)

  13. Pipelines
  14. Property groups
  15. Association definitions
  16. Quote templates

Why this order?

Some metadata types reference others. For example:

  • Workflows reference owners, lists, and objects
  • Forms reference campaigns
  • CTAs reference campaigns, forms, and emails

By extracting in dependency order, the system can properly convert all references to portable URNs.
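Ordering types so that each one comes after everything it references is a topological sort. The sketch below uses Python's standard `graphlib.TopologicalSorter` on a hypothetical graph that encodes only the references listed above; the product's actual phases also place types that this sketch leaves unconnected.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: metadata type -> types it references (from the examples above).
DEPENDENCIES: dict[str, set[str]] = {
    "owners": set(),
    "objects": set(),
    "email_templates": set(),
    "campaigns": set(),
    "lists": set(),
    "forms": {"campaigns"},
    "ctas": {"campaigns", "forms", "email_templates"},
    "workflows": {"owners", "lists", "objects"},
}


def extraction_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every type appears after the types it references."""
    return list(TopologicalSorter(deps).static_order())


print(extraction_order(DEPENDENCIES))
```

`TopologicalSorter` raises `graphlib.CycleError` if two types reference each other. Its `get_ready()` and `done()` methods could also produce phase-like batches whose members don't depend on each other.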


Progress Tracking

Real-Time Status

During extraction, you'll see real-time progress for each metadata type:

Status indicators:

  • ⏳ Not populated: Waiting to start
  • 🔄 Populating: Currently extracting
  • ✅ Populated: Successfully extracted
  • ⚠️ Skipped (missing scopes): OAuth permissions missing
  • Error: Extraction failed

Example progress:

✅ Owners (15 items)
✅ Custom Objects (3 items)
🔄 Workflows (extracting...)
⏳ Forms (waiting...)
⚠️ Sequences (missing scopes)

What Affects Speed?

Portal size:

  • Small portal (fewer than 100 items): 1-2 minutes
  • Medium portal (100-1000 items): 3-5 minutes
  • Large portal (more than 1000 items): 5-15 minutes

Factors:

  • Number of metadata types selected
  • Amount of data in each type
  • HubSpot API rate limits (burst and daily limits, which vary by subscription tier and app type)
  • Network latency

Tip: If you only need specific metadata types, extract only those instead of "all" to save time.


OAuth Scopes and Permissions

Scope Validation

Before extracting each metadata type, the system checks if you have the required OAuth scopes.

If scopes are missing:

  • Metadata type is skipped
  • Marked as "skipped (missing scopes)"
  • Extraction continues with other types
  • You'll see a warning in the UI

Common scenarios:

Full access (all scopes granted):

✅ Workflows
✅ Forms
✅ Sequences
✅ All metadata types available

⚠️ Limited access (some scopes missing):

✅ Workflows
✅ Forms
⚠️ Sequences (missing sales-email-read scope)

Required Scopes by Type

Metadata Type      Required Scope
Owners             crm.objects.owners.read
Custom Objects     crm.schemas.custom.read
Workflows          automation
Forms              forms
Email Templates    content
Lists              crm.lists.read
CTAs               content
Campaigns          content
Landing Pages      content
Site Pages         content
Blog Posts         content
Sequences          sales-email-read
Quote Templates    crm.objects.quotes.read

Solution: If metadata types are skipped, re-authenticate your connection with the required scopes.

See OAuth Scopes for complete details.
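The skip-instead-of-fail behavior described under Scope Validation can be sketched as a lookup against the table above. The type identifiers and `plan_extraction` are hypothetical; the scope strings are copied from the table.

```python
REQUIRED_SCOPES = {
    "owners": "crm.objects.owners.read",
    "custom_objects": "crm.schemas.custom.read",
    "workflows": "automation",
    "forms": "forms",
    "email_templates": "content",
    "lists": "crm.lists.read",
    "ctas": "content",
    "campaigns": "content",
    "landing_pages": "content",
    "site_pages": "content",
    "blog_posts": "content",
    "sequences": "sales-email-read",
    "quote_templates": "crm.objects.quotes.read",
}


def plan_extraction(requested: list[str], granted: set[str]) -> dict[str, list[str]]:
    """Split requested types into ones to extract and ones skipped for missing scopes."""
    plan: dict[str, list[str]] = {"extract": [], "skipped (missing scopes)": []}
    for type_name in requested:
        scope = REQUIRED_SCOPES[type_name]
        bucket = "extract" if scope in granted else "skipped (missing scopes)"
        plan[bucket].append(type_name)  # extraction continues with the other types
    return plan


granted = {"automation", "forms", "content"}
print(plan_extraction(["workflows", "forms", "sequences"], granted))
# {'extract': ['workflows', 'forms'], 'skipped (missing scopes)': ['sequences']}
```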


Viewing Extracted Metadata

Instance Observer

After extraction, view your portal's metadata in Instance Observer:

  1. Navigate to Connections
  2. Click on your portal
  3. Go to Metadata tab

You'll see a read-only view of all extracted metadata organized by type.

See Instance Observer for details.

Comparison Diff Viewer

When comparing two sources, view the YAML diff:

  1. Create a comparison
  2. Wait for extraction to complete
  3. Click on any item to see side-by-side YAML comparison

The diff shows:

  • Green: Added in target
  • Red: Removed from target
  • Yellow: Modified between source and target
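Given one fingerprint (checksum) per item on each side, as described under Step 4, the three colors can be derived with set operations. This sketch assumes "added" means present only in the target, "removed" means present only in the source, and "modified" means present on both sides with different fingerprints; the product may define these relative to the deployment direction differently, and `classify` is a hypothetical name.

```python
def classify(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Bucket item names by comparing name -> fingerprint maps for both sides."""
    return {
        # Green: only in the target.
        "added": sorted(target.keys() - source.keys()),
        # Red: only in the source, i.e. missing from the target.
        "removed": sorted(source.keys() - target.keys()),
        # Yellow: on both sides, but the content differs.
        "modified": sorted(k for k in source.keys() & target.keys()
                           if source[k] != target[k]),
    }


src = {"welcome-workflow": "a1", "contact-form": "b2"}
tgt = {"welcome-workflow": "a9", "demo-request": "c3"}
print(classify(src, tgt))
# {'added': ['demo-request'], 'removed': ['contact-form'], 'modified': ['welcome-workflow']}
```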

Common Workflows

Initial Portal Extraction

Goal: Extract all metadata from a portal for the first time

  1. Navigate to Connections
  2. Click Connect HubSpot
  3. Authorize with all required scopes
  4. Wait for automatic extraction
  5. View metadata in Instance Observer

Time: 5-15 minutes for a typical portal

Comparison Extraction

Goal: Extract metadata for a specific comparison

  1. Navigate to Comparisons
  2. Click New Comparison
  3. Select source and target
  4. Click Initialize Comparison
  5. Wait for both sides to extract
  6. Review differences in diff viewer

Time: 3-10 minutes per side

Re-Extraction

Goal: Update metadata after portal changes

  1. Navigate to comparison or Instance Observer
  2. Click Refresh or Re-extract
  3. Wait for extraction
  4. View updated metadata

Time: 1-5 minutes (faster due to change detection)


Troubleshooting

Extraction Stuck in "Populating"

Problem: Metadata type shows "populating" but never completes

Possible causes:

  • Large dataset taking time
  • API rate limit reached
  • Network issues

Solution:

  1. Wait patiently: Large portals can take 10-15 minutes
  2. Check progress: Look for item counts increasing
  3. Refresh the page: The UI sometimes doesn't update
  4. Try again: If truly stuck after 20 minutes, cancel and retry

Some Metadata Types Skipped

Problem: Metadata types marked as "skipped (missing scopes)"

Cause: Your connection doesn't have required OAuth permissions

Solution:

  1. Navigate to Connections
  2. Click on your portal
  3. Click Re-authenticate
  4. Grant all requested scopes
  5. Retry extraction

Example: If "Sequences" is skipped, you need the sales-email-read scope.

See OAuth Scopes for required scopes.

Extraction Very Slow

Problem: Extraction takes longer than expected

Factors affecting speed:

  • Portal size: More items = more time
  • Metadata types: Extracting "all" takes longer than specific types
  • API rate limits: HubSpot caps requests per time window (burst and daily limits)
  • Network: Slow connection affects speed

Expected times:

  • Small portal (fewer than 100 items): 1-2 minutes
  • Medium portal (100-1000 items): 3-5 minutes
  • Large portal (more than 1000 items): 5-15 minutes
  • Enterprise portal (more than 5000 items): 15-30 minutes

Optimization tips:

  • Extract only needed metadata types
  • Avoid multiple concurrent extractions
  • Ensure stable internet connection

"Failed to Extract" Error

Problem: Extraction fails with error message

Common causes:

1. Connection expired:

  • Solution: Re-authenticate your connection

2. Missing permissions:

  • Solution: Grant required OAuth scopes

3. API rate limit:

  • Solution: Wait 10 minutes and retry

4. Network timeout:

  • Solution: Check internet connection and retry

5. HubSpot API issue:

  • Solution: Check the HubSpot status page and retry later

Can't See Extracted Metadata

Problem: Extraction completed but can't find metadata

Check these locations:

For comparisons:

  1. Navigate to Comparisons
  2. Find your comparison
  3. Click View
  4. Metadata shown in diff viewer

For portal-level:

  1. Navigate to Connections
  2. Click on your portal
  3. Go to Metadata tab
  4. Browse extracted metadata

Differences Look Wrong

Problem: Comparison shows unexpected differences

Possible causes:

1. Stale data:

  • Solution: Click Refresh to re-extract

2. Manual portal changes:

  • Solution: This is expected! The diff shows what changed.

3. Different environments:

  • Solution: Production and staging are supposed to differ

4. Timing:

  • Solution: Extract both sides close together in time

Best Practices

When to Extract

Before deployments:

  • Always extract fresh data before deploying
  • Ensures you see latest changes
  • Prevents deploying stale configuration

After manual changes:

  • Re-extract after editing in HubSpot UI
  • Updates comparison with your changes
  • Enables accurate drift detection

Regular schedule:

  • Daily for active portals
  • Weekly for stable portals
  • Enables drift detection and audit trail

Scope Management

Initial connection:

  • Grant every scope required by the metadata types you plan to extract
  • Easier than re-authenticating later
  • Enables full metadata extraction

Review skipped types:

  • Check which metadata types were skipped
  • Understand what you're missing
  • Re-authenticate if needed

Least privilege:

  • Only grant scopes you actually use
  • Reduces security risk
  • Simplifies permission management

Performance Tips

Extract selectively:

  • Don't always extract "all" metadata types
  • Choose specific types you need
  • Saves time and resources

Avoid concurrent extractions:

  • Don't extract multiple comparisons simultaneously
  • Can hit API rate limits
  • Slows down all extractions

Clean up old comparisons:

  • Delete comparisons you no longer need
  • Reduces database size
  • Improves overall performance


Next Steps