Extraction Failures
Debugging common cases where extraction times out, or succeeds but produces incomplete or unexpected results.
Timeout Errors
Symptoms
Causes
- Slow API response: large clusters, many topics
- Network latency: distance from the API region, poor connectivity
- Overloaded backend: Confluent Cloud API under load
Solutions
Reduce Extraction Scope
Extract one cluster at a time:
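If you drive extraction from code, one way to keep each request small is to loop over clusters and bound every run with its own timeout. The sketch below is a minimal illustration, not part of the LineageBridge API: `extract_cluster` is a hypothetical coroutine you supply that restricts extraction to a single cluster, and the 300-second default is an arbitrary assumption.

```python
import asyncio
from typing import Awaitable, Callable


async def extract_each(
    cluster_ids: list[str],
    extract_cluster: Callable[[str], Awaitable[object]],  # hypothetical wrapper you supply
    timeout_s: float = 300.0,  # assumed default, tune for your environment
) -> tuple[dict[str, object], dict[str, str]]:
    """Run one extraction per cluster, recording failures instead of aborting."""
    results: dict[str, object] = {}
    failures: dict[str, str] = {}
    for cluster_id in cluster_ids:
        try:
            # Bound each cluster separately so one slow cluster cannot stall the rest.
            results[cluster_id] = await asyncio.wait_for(
                extract_cluster(cluster_id), timeout=timeout_s
            )
        except asyncio.TimeoutError:
            failures[cluster_id] = f"timed out after {timeout_s}s"
        except Exception as exc:  # keep going and report the error at the end
            failures[cluster_id] = repr(exc)
    return results, failures
```

The `failures` mapping tells you which clusters to retry on their own, possibly with heavy extractors disabled.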
Disable Heavy Extractors
Metrics and Stream Catalog can be slow:
# In UI: uncheck "Metrics" and "Stream Catalog" in sidebar
# Or disable via orchestrator in code
await run_extraction(
    settings,
    environment_ids=["env-abc123"],
    enable_metrics=False,
    enable_stream_catalog=False,
)
Split Extraction and Enrichment
# Extract without enrichment
uv run lineage-bridge-extract --env env-abc123 --no-enrich --output graph_raw.json
# Enrich separately
uv run lineage-bridge-extract --enrich-only --output graph_raw.json
Check Network
Test latency to API endpoints:
ping api.confluent.cloud
curl -w "@-" -o /dev/null -s https://api.confluent.cloud <<'EOF'
time_namelookup: %{time_namelookup}s\n
time_connect: %{time_connect}s\n
time_appconnect: %{time_appconnect}s\n
time_redirect: %{time_redirect}s\n
time_pretransfer: %{time_pretransfer}s\n
time_starttransfer: %{time_starttransfer}s\n
----------\n
time_total: %{time_total}s\n
EOF
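The same check can be run from Python if curl is not available. This is a rough sketch using only the standard library: `measure_latency` and `http_probe` are illustrative names, and the numbers include TLS setup on every attempt, so treat them as an upper bound rather than a precise breakdown like the curl output above.

```python
import statistics
import time
import urllib.error
import urllib.request
from typing import Callable


def measure_latency(probe: Callable[[], object], attempts: int = 5) -> dict[str, float]:
    """Call `probe` several times and summarise wall-clock latency in seconds."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        probe()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def http_probe(url: str = "https://api.confluent.cloud", timeout_s: float = 10.0) -> Callable[[], object]:
    """Build a probe that issues one HTTPS request to `url`."""

    def probe() -> object:
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            # The server answered with an error status, so the round trip still completed.
            return err.code

    return probe
```

Calling `measure_latency(http_probe())` returns the minimum, median, and maximum round-trip time; a median of several seconds points at the network rather than at the extractors.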
Missing Topics
Symptoms
- Graph shows no topics
- Graph shows some topics but not all
- Specific topics are missing
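To pin down which topics are missing, compare the topic names you expect against the topic node IDs in the graph. The sketch below assumes topic node IDs start with `confluent:kafka_topic:` and end with the topic name, as in the ID `confluent:kafka_topic:env-abc123:orders` shown in the validation warnings on this page; how you obtain the list of node IDs from your graph is left to you.

```python
def missing_topics(expected: set[str], node_ids: list[str]) -> set[str]:
    """Return expected topic names that have no kafka_topic node in the graph."""
    prefix = "confluent:kafka_topic:"  # assumed ID layout, taken from the sample warnings
    # Kafka topic names cannot contain ':', so the last segment is the topic name.
    found = {nid.rsplit(":", 1)[-1] for nid in node_ids if nid.startswith(prefix)}
    return expected - found
```

An empty result means every expected topic was extracted; anything else is the list to check against cluster filters, credentials, and permissions.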
Causes
- KafkaAdmin extractor disabled: the extractor is not running
- Wrong cluster credentials: using a Cloud API key instead of a cluster key
- Cluster filter mismatch: the topic exists in a different cluster
- Permission issues: the API key lacks read permissions
Solutions
Enable KafkaAdmin Extractor
Check in UI sidebar or CLI:
await run_extraction(
    settings,
    environment_ids=["env-abc123"],
    enable_kafka_admin=True,  # Ensure this is True
)
Verify Cluster Credentials
Some clusters require cluster-scoped API keys:
Create a cluster API key:
Remove Cluster Filter
If using --cluster, remove it to see all clusters:
Check Permissions
Verify API key has read permissions:
Test with Confluent CLI:
Missing Connectors
Symptoms
- Graph shows topics but no connectors
- Some connectors are missing
Causes
- Connect extractor disabled
- Wrong environment ID
- Connector in different cluster
Solutions
Enable Connect Extractor
Verify Environment
List connectors in environment:
Check Connector Status
The connector may be in a FAILED or DELETED state:
Missing ksqlDB Queries
Symptoms
- Graph shows topics but no ksqlDB queries
- Some queries are missing
Causes
- ksqlDB extractor disabled
- Wrong ksqlDB credentials
- ksqlDB cluster not found
- Query is not persistent (only persistent queries are extracted)
Solutions
Enable ksqlDB Extractor
Verify ksqlDB Cluster
List ksqlDB clusters:
Check Query Type
Only persistent queries are extracted (CTAS, CSAS):
-- This is extracted
CREATE STREAM enriched_orders AS
    SELECT * FROM orders WHERE amount > 100;
-- This is NOT extracted (transient query)
SELECT * FROM orders;
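If you have the statement text at hand, a quick pattern check can tell you whether a query is the kind that gets extracted. The following is a heuristic sketch only: it uses a regular expression rather than a real ksqlDB parser, the function name is made up for this example, and it covers just the two forms named above (CSAS and CTAS).

```python
import re

# CREATE [OR REPLACE] STREAM|TABLE [IF NOT EXISTS] <name> ... AS SELECT ...
_CSAS_CTAS = re.compile(
    r"^\s*CREATE\s+(?:OR\s+REPLACE\s+)?(?:STREAM|TABLE)\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?\S+.*?\bAS\s+SELECT\b",
    re.IGNORECASE | re.DOTALL,
)


def is_csas_or_ctas(statement: str) -> bool:
    """Heuristically detect CREATE STREAM/TABLE ... AS SELECT statements."""
    return _CSAS_CTAS.match(statement) is not None
```

A plain `SELECT`, or a `CREATE STREAM` that only declares columns over an existing topic, returns `False` and should not be expected in the graph.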
List persistent queries:
Missing Flink Jobs
Symptoms
- Graph shows topics but no Flink statements
- Some statements are missing
Causes
- Flink extractor disabled
- Wrong organization ID
- Flink credentials missing
- Statement is STOPPED or DELETED
Solutions
Enable Flink Extractor
Verify Organization ID
The Flink API requires an organization ID:
# Check logs for organization ID
uv run lineage-bridge-extract --env env-abc123 | grep "organization"
If missing:
The organization ID is auto-discovered from cluster metadata. If discovery fails, that is a bug; please report it.
Check Statement Status
Only RUNNING statements are extracted:
Missing Schemas
Symptoms
- Graph shows topics but no schema nodes
- HAS_SCHEMA edges are missing
- Schema enrichment skipped
Causes
- Schema Registry extractor disabled
- Schema Registry endpoint not found
- Schema Registry credentials missing
- Topics don't use schemas (e.g., JSON, Avro with embedded schema)
Solutions
Enable Schema Registry Extractor
Verify Schema Registry Endpoint
Check logs for:
If not found:
No Schema Registry endpoint found for env-abc123.
Set LINEAGE_BRIDGE_SCHEMA_REGISTRY_API_KEY in .env
or check Stream Governance is enabled.
Manually set endpoint:
# .env
LINEAGE_BRIDGE_SCHEMA_REGISTRY_ENDPOINT=https://psrc-abc123.us-east-1.aws.confluent.cloud
LINEAGE_BRIDGE_SCHEMA_REGISTRY_API_KEY=SR_KEY
LINEAGE_BRIDGE_SCHEMA_REGISTRY_API_SECRET=SR_SECRET
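A quick way to confirm these settings are actually visible is to check them before running extraction. This sketch assumes the variables are exported into the process environment; values that live only in a `.env` file read by the tool itself will not show up in `os.environ`. The helper name is illustrative.

```python
import os
from typing import Iterable, Mapping, Optional

SCHEMA_REGISTRY_VARS = (
    "LINEAGE_BRIDGE_SCHEMA_REGISTRY_ENDPOINT",
    "LINEAGE_BRIDGE_SCHEMA_REGISTRY_API_KEY",
    "LINEAGE_BRIDGE_SCHEMA_REGISTRY_API_SECRET",
)


def unset_settings(
    names: Iterable[str] = SCHEMA_REGISTRY_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the variable names that are missing or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]
```

Calling `unset_settings()` with no arguments checks the current process; pass a different tuple of names to check other credentials, such as the Databricks variables.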
Check Schema Exists
List schemas:
Topics without schemas won't have HAS_SCHEMA edges.
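To see which topics ended up without a schema, list the nodes that no `HAS_SCHEMA` edge touches. The sketch below assumes you can load the exported graph into a list of node IDs and a list of `(source, target, edge_type)` tuples; that in-memory shape is an assumption for illustration, not the documented graph format. Because the edge direction is not stated here, either endpoint counts as linked.

```python
def nodes_without_edge_type(
    node_ids: list[str],
    edges: list[tuple[str, str, str]],  # assumed shape: (source, target, edge_type)
    edge_type: str,
) -> list[str]:
    """Return node IDs that are not an endpoint of any edge of `edge_type`."""
    linked: set[str] = set()
    for src, dst, etype in edges:
        if etype == edge_type:
            linked.update((src, dst))
    return [nid for nid in node_ids if nid not in linked]
```

Pass only the topic node IDs and `"HAS_SCHEMA"` to list schemaless topics; the same helper works with `"MATERIALIZES"` for topics that have no catalog table.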
Missing Catalog Tables
Symptoms
- Graph shows topics but no UC/Glue/BigQuery tables
- MATERIALIZES edges are missing
- Tableflow extraction skipped
Causes
- Tableflow extractor disabled
- No Tableflow integrations configured
- Catalog credentials missing
- Table not mapped to topic
Solutions
Enable Tableflow Extractor
Verify Tableflow Integrations
Check if Tableflow is configured in Confluent Cloud:
- Confluent Cloud UI → Environment → Tableflow
- Verify integrations exist (Unity Catalog, AWS Glue, etc.)
- Verify topic-table mappings exist
Verify Catalog Credentials
Databricks UC:
# .env
LINEAGE_BRIDGE_DATABRICKS_WORKSPACE_URL=https://myworkspace.cloud.databricks.com
LINEAGE_BRIDGE_DATABRICKS_TOKEN=dapi...
AWS Glue:
Google BigQuery:
Test credentials:
# Databricks
curl -H "Authorization: Bearer $LINEAGE_BRIDGE_DATABRICKS_TOKEN" \
"$LINEAGE_BRIDGE_DATABRICKS_WORKSPACE_URL/api/2.0/sql/warehouses"
# AWS
aws glue get-databases --region us-east-1
# Google
bq ls $LINEAGE_BRIDGE_GCP_PROJECT_ID:
Orphan Nodes
Symptoms
Graph validation warnings:
Causes
- Topic has no producers or consumers
- Producers/consumers in disabled extractors
- Expected behavior: not all topics have lineage
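To review orphans outside the UI, list every node that no edge touches. This sketch assumes the graph is available as a list of node IDs plus a list of `(source, target)` pairs; those shapes and the function name are illustrative assumptions rather than the tool's own API.

```python
def find_orphans(node_ids: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Return node IDs that do not appear as an endpoint of any edge."""
    connected: set[str] = set()
    for src, dst in edges:
        connected.add(src)
        connected.add(dst)
    return sorted(set(node_ids) - connected)
```

Comparing the result before and after enabling more extractors shows which orphans were caused by a partial extraction and which topics genuinely have no producers or consumers.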
Solutions
Enable All Extractors
Orphan topics may have producers/consumers in disabled extractors:
await run_extraction(
    settings,
    environment_ids=["env-abc123"],
    enable_connect=True,
    enable_ksqldb=True,
    enable_flink=True,
)
Check Topic Usage
Topic may genuinely have no producers or consumers:
# Confirm the topic exists and inspect its configuration
confluent kafka topic describe my-topic --cluster lkc-xyz789
# Check connectors (--cluster takes the Kafka cluster ID)
confluent connect list --cluster lkc-xyz789
Ignore Warnings
Orphan nodes are expected in partial extractions. The graph is still valid and usable.
To hide orphans in UI, use the filter:
Dangling Edges
Symptoms
Graph validation warnings:
Dangling edge src: confluent:connector:env-abc123:lkc-xyz789:mysql-source
Dangling edge dst: confluent:kafka_topic:env-abc123:orders
Causes
- Bug in extractor: an edge references a non-existent node
- Race condition: a node was deleted between extraction phases
- Cluster filter: an edge references a node in a filtered-out cluster
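To find out which endpoint is missing for each dangling edge, check every edge against the set of known node IDs. As with any offline check on an exported graph, this sketch assumes you have the node IDs as a list and the edges as `(source, target)` pairs; both shapes and the function name are assumptions made for the example.

```python
def find_dangling(
    node_ids: list[str], edges: list[tuple[str, str]]
) -> list[tuple[tuple[str, str], list[str]]]:
    """Return each edge with a missing endpoint, paired with the missing IDs."""
    known = set(node_ids)
    dangling = []
    for src, dst in edges:
        missing = [end for end in (src, dst) if end not in known]
        if missing:
            dangling.append(((src, dst), missing))
    return dangling
```

If the missing IDs all belong to one cluster, a cluster filter is the likely cause; missing IDs scattered across clusters are more consistent with a race or an extractor bug.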
Solutions
Enable Full Extraction
Remove cluster filters:
Report Bug
Dangling edges should not occur. If you see this consistently, it's a bug:
- Export the graph: graph.to_json_file("graph.json")
- Open a GitHub issue with the graph JSON and extraction logs
Workaround
The graph is still usable. Dangling edges are ignored during rendering.
Partial Extraction Results
Symptoms
- Some extractors return empty results
- Graph is smaller than expected
- Multiple warnings in logs
Causes
- Extractor failures: API errors, timeouts
- Missing credentials: some extractors are skipped
- Resource not configured: no connectors, ksqlDB, etc.
Solutions
Review Logs
Look for warnings:
Common warnings:
Warning: No Schema Registry endpoint found
Warning: Extractor 'Connect' failed: 403 Forbidden
Warning: Could not determine organization ID for Flink
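For long logs, it can help to collapse repeated warnings into counts. The sketch below assumes each warning line contains the literal text `Warning:` as in the samples above; if your log format differs, adjust the marker. The function name is illustrative.

```python
from collections import Counter
from typing import Iterable


def summarize_warnings(log_lines: Iterable[str], marker: str = "Warning:") -> Counter:
    """Count distinct warning messages found in `log_lines`."""
    counts: Counter = Counter()
    for line in log_lines:
        # partition() leaves `sep` empty when the marker is absent from the line.
        _, sep, message = line.partition(marker)
        if sep:
            counts[message.strip()] += 1
    return counts
```

`summarize_warnings(open("extract.log")).most_common()` would list the most frequent warnings first, where `extract.log` is a placeholder for wherever you saved the output.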
Enable Debug Logging
Extract in Stages
Extract different resource types separately to isolate failures:
# Extract topics only
await run_extraction(
    settings,
    environment_ids=["env-abc123"],
    enable_connect=False,
    enable_ksqldb=False,
    enable_flink=False,
    enable_schema_registry=False,
)
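To isolate a failing extractor, you can go one step further and enable the optional extractors one at a time. This sketch only builds the keyword arguments; the flag names are the ones used in the examples on this page, while the helper itself and the idea of iterating over them are assumptions, not a documented workflow.

```python
# Flag names taken from the run_extraction examples on this page.
EXTRACTOR_FLAGS = (
    "enable_connect",
    "enable_ksqldb",
    "enable_flink",
    "enable_schema_registry",
)


def staged_flags(flags: tuple[str, ...] = EXTRACTOR_FLAGS) -> list[dict[str, bool]]:
    """Build one kwargs dict per stage, each with exactly one extractor enabled."""
    return [{flag: flag == active for flag in flags} for active in flags]


# Intended use inside your own async code (not executed here):
#     for kwargs in staged_flags():
#         await run_extraction(settings, environment_ids=["env-abc123"], **kwargs)
```

The first stage that logs a warning or returns an empty result identifies the extractor to investigate.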
Next Steps
- API Errors - HTTP error codes
- Credential Issues - Authentication problems
- Performance - Optimization tips