Metadata Extraction

Understand how HubSpot Deploy reads and processes your metadata from HubSpot portals and Git repositories. This guide explains what happens during extraction and why it matters for your workflows.

Overview

Metadata extraction is the process of reading your HubSpot configuration (workflows, forms, properties, etc.) and preparing it for comparison and deployment.

When you create a comparison or backup, the system:

Reads metadata from the HubSpot API or a Git repository
Converts it to a portable format that works across environments
Stores it for comparison and deployment

This happens automatically in the background, but understanding the process helps you:

Know what to expect during extraction
Understand why it takes time
Troubleshoot issues when they occur

What Gets Extracted?

From HubSpot Portals

The system reads your portal configuration through the HubSpot API:

CRM Configuration:

Custom objects and properties
Standard object properties
Property groups
Pipelines and stages
Association labels

Marketing Assets:

Workflows
Forms
Email templates
Landing pages
Site pages
Blog posts
CTAs
Campaigns

Lists and Segmentation:

Contact lists
Company lists
Deal lists
Custom object lists

Sales Tools:

Sequences
Quote templates

Users and Teams:

Owners (users)
Teams

See Metadata Types for complete details.

From Git Repositories

The system reads JSON/YAML files from your repository:

my-hubspot-metadata/
├── workflows/
│   ├── welcome-workflow.json
│   └── nurture-workflow.json
├── forms/
│   ├── contact-form.json
│   └── demo-request.json
├── custom_objects/
│   └── deals-extended.json
└── properties/
    ├── contact-properties.json
    └── company-properties.json

Each file represents one metadata item in the same format HubSpot uses.

How Extraction Works

Step 1: Reading Data

From HubSpot:

Connects using your OAuth or Private App credentials
Fetches metadata through HubSpot API
Handles pagination for large datasets (e.g., 1000+ workflows)
Respects API rate limits automatically
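
The pagination step can be sketched as a cursor loop. This is a minimal illustration, not HubSpot Deploy's actual code: it assumes a list endpoint that returns `results` plus a `paging.next.after` cursor, and `fetch_page` and `fake_fetch` are hypothetical stand-ins for the real HTTP call.

```python
from typing import Callable, Iterator, Optional

# Assumed page shape: {"results": [...], "paging": {"next": {"after": "<cursor>"}}}
Page = dict


def iter_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every item, following the `after` cursor until no next page is reported."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        yield from page.get("results", [])
        after = page.get("paging", {}).get("next", {}).get("after")
        if after is None:
            break


def fake_fetch(after: Optional[str]) -> Page:
    """In-memory stand-in for an HTTP call: serves 5 items in pages of 2."""
    items = [{"id": str(i)} for i in range(5)]
    start = int(after or 0)
    page: Page = {"results": items[start:start + 2]}
    if start + 2 < len(items):
        page["paging"] = {"next": {"after": str(start + 2)}}
    return page


all_items = list(iter_all(fake_fetch))
```

Because the loop only stops when the response carries no cursor, the caller never needs to know the total item count in advance.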

From Git:

Clones or pulls your repository
Reads JSON/YAML files from directories
Parses each file into metadata objects
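
Reading the checked-out files might look like the sketch below. It assumes the repository is already on disk with one directory per metadata type, as in the tree shown earlier, and it handles only JSON because YAML parsing needs a third-party library. The function name `load_metadata` is hypothetical.

```python
import json
from pathlib import Path


def load_metadata(repo_root: Path) -> dict[str, list[dict]]:
    """Group parsed JSON files by their parent directory name (e.g. 'workflows')."""
    by_type: dict[str, list[dict]] = {}
    # Sorted so the result is deterministic regardless of filesystem order.
    for path in sorted(repo_root.glob("*/*.json")):
        with path.open(encoding="utf-8") as fh:
            by_type.setdefault(path.parent.name, []).append(json.load(fh))
    return by_type
```

Calling `load_metadata(Path("my-hubspot-metadata"))` on the example tree would return keys such as `workflows`, `forms`, `custom_objects`, and `properties`.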

Step 2: Making It Portable

The Problem: HubSpot uses numeric IDs that are different in each portal.

For example, the same owner might be:

ID 12345 in Production
ID 67890 in Staging

Comparing or deploying by raw ID therefore fails: the same owner looks like two different values.

The Solution: Convert IDs to human-readable names (URNs).

Before (portal-specific):

"ownerId": "12345"

After (portable):

"ownerId": "12345"
"__stable__ownerId": "urn:hubspot:users:email:[email protected]"

Now the system can:

Compare workflows between portals (even if owner IDs differ)
Deploy workflows to any portal (matching by email, not ID)
Show you meaningful names instead of cryptic IDs

See URN Management for details.
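
As a rough sketch of the idea, the function below adds a `__stable__ownerId` field next to the raw ID, using the URN shape shown above. The lookup tables, the email address, and the function name are hypothetical; the real conversion covers many more reference types.

```python
from typing import Optional


def add_stable_owner(item: dict, owner_emails: dict[str, str]) -> dict:
    """Return a copy of item with a `__stable__ownerId` URN next to the raw ID."""
    out = dict(item)
    owner_id: Optional[str] = item.get("ownerId")
    email = owner_emails.get(owner_id) if owner_id is not None else None
    if email is not None:
        out["__stable__ownerId"] = f"urn:hubspot:users:email:{email}"
    return out


# Hypothetical ID-to-email tables, one per portal.
production_owners = {"12345": "owner@example.com"}
staging_owners = {"67890": "owner@example.com"}

prod = add_stable_owner({"name": "Welcome", "ownerId": "12345"}, production_owners)
stage = add_stable_owner({"name": "Welcome", "ownerId": "67890"}, staging_owners)

# The raw IDs differ, but the stable URNs match.
same_owner = prod["__stable__ownerId"] == stage["__stable__ownerId"]
```

Keeping the original `ownerId` alongside the URN means nothing is lost for the source portal, while comparison and deployment can key on the stable field.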

Step 3: Converting to YAML

Why YAML?

YAML is a human-readable format that is well suited to configuration:

Easy to read and understand
Works great with Git (clean diffs)
Industry standard for infrastructure-as-code

Example:

"name": "Contact Form"
"formType": "HUBSPOT"
"submitText": "Submit"
"fields":
  - "name": "email"
    "label": "Email Address"
    "required": true
  - "name": "firstname"
    "label": "First Name"
    "required": false

You can view this YAML in the comparison diff viewer to see exactly what changed.

Step 4: Detecting Changes

The system calculates a "fingerprint" (checksum) for each item to quickly detect changes:

Same fingerprint = No changes, skip it
Different fingerprint = Something changed, show in comparison

This makes re-extraction much faster because unchanged items are skipped.
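
One way to compute such a fingerprint is to hash a canonical serialization of each item. The sketch below assumes SHA-256 over sorted-key JSON; the source does not say which algorithm or encoding the product uses, and the function names are illustrative.

```python
import hashlib
import json


def fingerprint(item: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_keys(previous: dict[str, str], current: dict[str, dict]) -> list[str]:
    """Names of items whose fingerprint is new or differs from the stored one."""
    return [name for name, item in current.items() if previous.get(name) != fingerprint(item)]


stored = {"contact-form": fingerprint({"name": "Contact Form", "submitText": "Submit"})}
latest = {
    # Same content as before, only the key order differs.
    "contact-form": {"submitText": "Submit", "name": "Contact Form"},
    "demo-request": {"name": "Demo Request"},
}
to_process = changed_keys(stored, latest)
```

Sorting keys before hashing matters: without it, two items with identical content but different key order would produce different fingerprints and be reprocessed needlessly.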

Extraction Order

Metadata types are extracted in a specific order to handle dependencies:

Phase 1: Foundation (no dependencies)

1. Owners
2. Custom and standard objects
3. Email templates

Phase 2: Marketing (depends on Phase 1)

4. Campaigns
5. Forms
6. CTAs

Phase 3: Automation (depends on Phase 1 & 2)

7. Lists
8. Workflows
9. Sequences

Phase 4: Content (independent)

10. Landing pages
11. Site pages
12. Blog posts

Phase 5: CRM (depends on objects)

13. Pipelines
14. Property groups
15. Association definitions
16. Quote templates

Why this order?

Some metadata types reference others. For example:

Workflows reference owners, lists, and objects
Forms reference campaigns
CTAs reference campaigns, forms, and emails

By extracting in dependency order, the system can properly convert all references to portable URNs.
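
Dependency ordering of this kind is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) and encodes only the three example references listed above, reading "emails" as email templates. The type keys are illustrative and the graph is not the product's full dependency map.

```python
from graphlib import TopologicalSorter

# Each type maps to the set of types it references (its prerequisites).
references = {
    "owners": set(),
    "objects": set(),
    "email_templates": set(),
    "campaigns": set(),
    "lists": set(),
    "forms": {"campaigns"},
    "ctas": {"campaigns", "forms", "email_templates"},
    "workflows": {"owners", "lists", "objects"},
}

# static_order() yields every prerequisite before the types that depend on it.
order = list(TopologicalSorter(references).static_order())
```

Any order the sorter produces is valid as long as prerequisites come first, so the exact position of independent types can vary. `TopologicalSorter` also raises `CycleError` if two types reference each other, which would surface a modeling mistake early.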

Progress Tracking

Real-Time Status

During extraction, you'll see real-time progress for each metadata type:

Status indicators:

⏳ Not populated: Waiting to start
🔄 Populating: Currently extracting
✅ Populated: Successfully extracted
⚠️ Skipped (missing scopes): OAuth permissions missing
❌ Error: Extraction failed

Example progress:

✅ Owners (15 items)
✅ Custom Objects (3 items)
🔄 Workflows (extracting...)
⏳ Forms (waiting...)
⚠️ Sequences (missing scopes)

What Affects Speed?

Portal size:

Small portal (less than 100 items): 1-2 minutes
Medium portal (100-1000 items): 3-5 minutes
Large portal (more than 1000 items): 5-15 minutes

Factors:

Number of metadata types selected
Amount of data in each type
HubSpot API rate limits (burst and daily quotas, which vary by app type and subscription tier)
Network latency

Tip: If you only need specific metadata types, extract only those instead of "all" to save time.

OAuth Scopes and Permissions

Scope Validation

Before extracting each metadata type, the system checks if you have the required OAuth scopes.

If scopes are missing:

Metadata type is skipped
Marked as "skipped (missing scopes)"
Extraction continues with other types
You'll see a warning in the UI

Common scenarios:

✅ Full access (all scopes granted):

✅ Workflows
✅ Forms
✅ Sequences
✅ All metadata types available

⚠️ Limited access (some scopes missing):

✅ Workflows
✅ Forms
⚠️ Sequences (missing sales-email-read scope)

Required Scopes by Type

Metadata Type	Required Scope
Owners	`crm.objects.owners.read`
Custom Objects	`crm.schemas.custom.read`
Workflows	`automation`
Forms	`forms`
Email Templates	`content`
Lists	`crm.lists.read`
CTAs	`content`
Campaigns	`content`
Landing Pages	`content`
Site Pages	`content`
Blog Posts	`content`
Sequences	`sales-email-read`
Quote Templates	`crm.objects.quotes.read`

Solution: If metadata types are skipped, re-authenticate your connection with the required scopes.

See OAuth Scopes for complete details.
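
The skip-or-extract decision can be sketched with the table above as a lookup. The mapping is copied from that table; the function name and return shape are illustrative assumptions, not the product's API.

```python
REQUIRED_SCOPES = {
    "Owners": "crm.objects.owners.read",
    "Custom Objects": "crm.schemas.custom.read",
    "Workflows": "automation",
    "Forms": "forms",
    "Email Templates": "content",
    "Lists": "crm.lists.read",
    "CTAs": "content",
    "Campaigns": "content",
    "Landing Pages": "content",
    "Site Pages": "content",
    "Blog Posts": "content",
    "Sequences": "sales-email-read",
    "Quote Templates": "crm.objects.quotes.read",
}


def plan_extraction(selected: list[str], granted: set[str]) -> tuple[list[str], dict[str, str]]:
    """Split selected types into extractable ones and skipped ones (with the missing scope)."""
    to_extract: list[str] = []
    skipped: dict[str, str] = {}
    for metadata_type in selected:
        scope = REQUIRED_SCOPES[metadata_type]
        if scope in granted:
            to_extract.append(metadata_type)
        else:
            skipped[metadata_type] = scope
    return to_extract, skipped


ready, skipped = plan_extraction(["Workflows", "Forms", "Sequences"], {"automation", "forms"})
```

Here `ready` holds Workflows and Forms, while `skipped` records that Sequences is missing `sales-email-read`, matching the limited-access scenario described earlier.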

Viewing Extracted Metadata

Instance Observer

After extraction, view your portal's metadata in Instance Observer:

Navigate to Connections
Click on your portal
Go to Metadata tab

You'll see a read-only view of all extracted metadata organized by type.

See Instance Observer for details.

Comparison Diff Viewer

When comparing two sources, view the YAML diff:

Create a comparison
Wait for extraction to complete
Click on any item to see side-by-side YAML comparison

The diff shows:

Green: Added in target
Red: Removed from target
Yellow: Modified between source and target
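
The three colors correspond to a simple classification of fields. The sketch below compares only top-level keys of two parsed items; the real viewer works on YAML text, and the `redirect` field is a made-up example.

```python
def classify(source: dict, target: dict) -> dict[str, str]:
    """Label each differing top-level key as added, removed, or modified."""
    labels: dict[str, str] = {}
    for key in sorted(source.keys() | target.keys()):
        if key not in source:
            labels[key] = "added"      # green: present only in target
        elif key not in target:
            labels[key] = "removed"    # red: present only in source
        elif source[key] != target[key]:
            labels[key] = "modified"   # yellow: present in both, values differ
    return labels


source = {"name": "Contact Form", "formType": "HUBSPOT", "submitText": "Submit"}
target = {"name": "Contact Form", "submitText": "Send", "redirect": "/thanks"}
diff = classify(source, target)
```

Unchanged keys such as `name` are omitted from the result, which mirrors how a diff view draws attention only to what differs.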

Common Workflows

Initial Portal Extraction

Goal: Extract all metadata from a portal for the first time

Navigate to Connections
Click Connect HubSpot
Authorize with all required scopes
Wait for automatic extraction
View metadata in Instance Observer

Time: 5-15 minutes for a typical portal

Comparison Extraction

Goal: Extract metadata for a specific comparison

Navigate to Comparisons
Click New Comparison
Select source and target
Click Initialize Comparison
Wait for both sides to extract
Review differences in diff viewer

Time: 3-10 minutes per side

Re-Extraction

Goal: Update metadata after portal changes

Navigate to comparison or Instance Observer
Click Refresh or Re-extract
Wait for extraction
View updated metadata

Time: 1-5 minutes (faster due to change detection)

Troubleshooting

Extraction Stuck in "Populating"

Problem: Metadata type shows "populating" but never completes

Possible causes:

Large dataset taking time
API rate limit reached
Network issues

Solution:

Wait patiently: Large portals can take 10-15 minutes
Check progress: Look for item counts increasing
Refresh page: Sometimes UI doesn't update
Try again: If truly stuck after 20 minutes, cancel and retry

Some Metadata Types Skipped

Problem: Metadata types marked as "skipped (missing scopes)"

Cause: Your connection doesn't have required OAuth permissions

Solution:

Navigate to Connections
Click on your portal
Click Re-authenticate
Grant all requested scopes
Retry extraction

Example: If "Sequences" is skipped, you need the sales-email-read scope.

See OAuth Scopes for required scopes.

Extraction Very Slow

Problem: Extraction takes longer than expected

Factors affecting speed:

Portal size: More items = more time
Metadata types: Extracting "all" takes longer than specific types
API rate limits: HubSpot limits requests per day
Network: Slow connection affects speed

Expected times:

Small portal (less than 100 items): 1-2 minutes
Medium portal (100-1000 items): 3-5 minutes
Large portal (more than 1000 items): 5-15 minutes
Enterprise portal (more than 5000 items): 15-30 minutes

Optimization tips:

Extract only needed metadata types
Avoid multiple concurrent extractions
Ensure stable internet connection

"Failed to Extract" Error

Problem: Extraction fails with error message

Common causes:

1. Connection expired:

Solution: Re-authenticate your connection

2. Missing permissions:

Solution: Grant required OAuth scopes

3. API rate limit:

Solution: Wait 10 minutes and retry

4. Network timeout:

Solution: Check internet connection and retry

5. HubSpot API issue:

Solution: Check HubSpot status page, retry later
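
If you script your own calls against the HubSpot API alongside the tool, the "wait and retry" advice for rate limits can be automated with exponential backoff. This is a generic sketch: `RateLimited` is a hypothetical exception that your own request function would raise on an HTTP 429 response, and the delays are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by a caller-supplied request function on HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait each time; re-raise when retries run out."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the function can be exercised in tests without actually waiting.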

Can't See Extracted Metadata

Problem: Extraction completed but can't find metadata

Check these locations:

For comparisons:

Navigate to Comparisons
Find your comparison
Click View
Metadata shown in diff viewer

For portal-level:

Navigate to Connections
Click on your portal
Go to Metadata tab
Browse extracted metadata

Differences Look Wrong

Problem: Comparison shows unexpected differences

Possible causes:

1. Stale data:

Solution: Click Refresh to re-extract

2. Manual portal changes:

Solution: This is expected! The diff shows what changed.

3. Different environments:

Solution: Production and staging are supposed to differ

4. Timing:

Solution: Extract both sides close together in time

Best Practices

When to Extract

Before deployments:

Always extract fresh data before deploying
Ensures you see latest changes
Prevents deploying stale configuration

After manual changes:

Re-extract after editing in HubSpot UI
Updates comparison with your changes
Enables accurate drift detection

Regular schedule:

Daily for active portals
Weekly for stable portals
Enables drift detection and audit trail

Scope Management

Initial connection:

Grant the scopes for every metadata type you plan to extract
Avoids re-authenticating later
Enables extraction of all the types you need

Review skipped types:

Check which metadata types were skipped
Understand what you're missing
Re-authenticate if needed

Least privilege:

Only grant scopes you actually use
Reduces security risk
Simplifies permission management

Performance Tips

Extract selectively:

Don't always extract "all" metadata types
Choose specific types you need
Saves time and resources

Avoid concurrent extractions:

Don't extract multiple comparisons simultaneously
Can hit API rate limits
Slows down all extractions

Clean up old comparisons:

Delete comparisons you no longer need
Reduces database size
Improves overall performance

Related Features

URN Management: Understanding portable references
Comparisons: Using extracted metadata
Deployments: Deploying changes
OAuth Scopes: Required permissions
Instance Observer: Viewing metadata
Metadata Types: Supported types

Next Steps

Learn about URN Management for portable references
Create your first comparison
Understand deployment process
Set up drift detection
