API Errors
Common HTTP status codes and how LineageBridge handles them.
Error Handling Strategy
LineageBridge uses a fail-safe extraction strategy:
- Retryable errors (429, 500, 502, 503, 504) - Automatic retry with exponential backoff
- Non-retryable errors (400, 401, 403, 404) - Log warning, return empty results, continue extraction
- Timeout errors - Log warning, return empty results, continue extraction
This ensures partial extraction succeeds even if some APIs fail.
400 Bad Request
Description
The request was malformed or contained invalid parameters.
Common Causes
- Invalid environment ID or cluster ID
- Malformed query parameters
- Invalid JSON payload
LineageBridge Behavior
Logs warning and returns empty results:
Extractor 'KafkaAdmin' got 400 Bad Request.
The API key may not have access to this environment,
or the API parameters are invalid.
Solutions
-
Verify environment ID:
-
Verify cluster ID:
-
Check API key scope:
-
Enable debug logging:
401 Unauthorized
Description
The API key or token is missing, invalid, or expired.
Common Causes
- Missing credentials in
.env - Expired API key or token
- Wrong API key type (Cloud vs cluster-scoped)
LineageBridge Behavior
Logs helpful error message:
Extractor 'KafkaAdmin:lkc-xyz789' got 401 Unauthorized.
This likely means a cluster-scoped API key is needed.
Set LINEAGE_BRIDGE_KAFKA_API_KEY in .env.
Solutions
See Credential Issues: 401 Unauthorized.
403 Forbidden
Description
The API key lacks required permissions.
Common Causes
- Insufficient RBAC roles
- API key scoped to wrong resource
- Organization-level restrictions
LineageBridge Behavior
Logs warning:
Solutions
See Credential Issues: 403 Forbidden.
404 Not Found
Description
The requested resource does not exist.
Common Causes
- Wrong environment ID or cluster ID
- Resource was deleted
- Wrong API endpoint URL
LineageBridge Behavior
Logs warning and continues:
Extractor 'SchemaRegistry' got 404 Not Found.
Schema Registry may not be enabled in this environment.
Solutions
-
Verify the resource exists:
-
Check Schema Registry status:
-
Manually set SR endpoint:
429 Too Many Requests
Description
Rate limit exceeded. The client is sending too many requests.
Common Causes
- Excessive API calls in short time window
- Multiple concurrent extractions
- Organization-wide rate limit hit
LineageBridge Behavior
Automatic retry with exponential backoff: - Retry 1: Wait 1 second - Retry 2: Wait 2 seconds - Retry 3: Wait 4 seconds - Respects Retry-After header if present
Log message:
Solutions
- Wait for rate limit to reset - Automatic with retries
- Reduce concurrent extractors - Disable unused extractors in UI
- Increase retry backoff - Not currently configurable
- Use caching - Enable graph caching:
Rate Limit Details
Confluent Cloud API rate limits (approximate): - Cloud API: 1000 requests/hour per API key - Kafka REST: 1000 requests/hour per cluster - Schema Registry: 1000 requests/hour per cluster
500 Internal Server Error
Description
The API server encountered an unexpected condition.
Common Causes
- Transient server error
- Backend service outage
- Malformed data in upstream system
LineageBridge Behavior
Automatic retry with exponential backoff (same as 429).
Log message:
Solutions
- Wait for automatic retry - LineageBridge retries up to 3 times
- Check Confluent Cloud status - https://status.confluent.cloud
- Try again later - If retries fail, wait a few minutes and re-run extraction
- Report persistent errors - If error persists >1 hour, contact Confluent support
502 Bad Gateway
Description
The API gateway received an invalid response from upstream.
Common Causes
- Backend service restart or deployment
- Network issues between gateway and backend
- Timeout in upstream service
LineageBridge Behavior
Automatic retry (same as 500).
Solutions
Same as 500 Internal Server Error.
503 Service Unavailable
Description
The API server is temporarily unable to handle requests.
Common Causes
- Planned maintenance
- Overloaded backend
- Resource exhaustion
LineageBridge Behavior
Automatic retry (same as 500).
Log message:
Retryable status 503 for POST /ksqldb/v2/clusters/lksqlc-abc/ksql (attempt 2/4)
Sleeping 2.0s before retry
Solutions
- Check maintenance windows - Confluent Cloud status page
- Retry with backoff - Automatic
- Reduce load - Disable heavy extractors (metrics, stream catalog)
504 Gateway Timeout
Description
The API gateway timed out waiting for upstream response.
Common Causes
- Long-running query in backend
- Database query timeout
- Network latency
LineageBridge Behavior
Automatic retry (same as 500).
Solutions
-
Reduce query scope - Filter by specific cluster:
-
Increase timeout - Not currently configurable (30s per request, 120s per extractor)
-
Split extraction - Extract clusters separately and merge graphs
Timeout Errors (Client-Side)
Description
LineageBridge timed out waiting for API response.
Timeouts
- Per-request timeout: 30 seconds (httpx client)
- Per-extractor timeout: 120 seconds (orchestrator)
Common Causes
- Slow network connection
- Large response payload (1000s of resources)
- Overloaded API server
LineageBridge Behavior
Logs warning and continues:
Solutions
-
Check network connectivity:
-
Reduce extraction scope:
-
Disable slow extractors:
- Metrics (optional, can be slow for large clusters)
-
Stream Catalog (optional)
-
Extract in stages:
Retry Configuration
Default Policy
Backoff delay: 1.0 * (2 ** attempt) seconds (1s, 2s, 4s)
Customization
Retry configuration is not currently exposed via environment variables. To customize:
-
Edit
lineage_bridge/clients/base.py: -
Or create a custom client class extending
ConfluentClient.
Error Recovery Workflow
graph TD
A[API Request] --> B{HTTP Status}
B -->|200-299| C[Success]
B -->|429, 5xx| D[Retryable Error]
B -->|400, 401, 403, 404| E[Non-Retryable Error]
B -->|Timeout| F[Timeout Error]
D --> G{Retry Count < 3?}
G -->|Yes| H[Exponential Backoff]
H --> A
G -->|No| I[Log Warning]
E --> I
F --> I
I --> J[Return Empty Results]
J --> K[Continue Extraction]
C --> L[Return Results]
L --> K Debugging API Errors
Enable Debug Logging
This logs all HTTP requests and responses:
DEBUG Request GET /kafka/v3/clusters params={'environment': 'env-abc123'} attempt=1
DEBUG Response 200 GET /kafka/v3/clusters (0.234s)
Inspect Raw Responses
Use curl to test API endpoints directly:
# Cloud API
curl -u "$LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY:$LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET" \
"https://api.confluent.cloud/org/v2/environments"
# Kafka REST API
curl -u "$LINEAGE_BRIDGE_KAFKA_API_KEY:$LINEAGE_BRIDGE_KAFKA_API_SECRET" \
"https://pkc-abc123.us-east-1.aws.confluent.cloud:443/kafka/v3/clusters/lkc-xyz789/topics"
Check API Status
- Confluent Cloud: https://status.confluent.cloud
- Databricks: https://status.databricks.com
- AWS: https://status.aws.amazon.com
- Google Cloud: https://status.cloud.google.com
Next Steps
- Credential Issues - 401/403 error solutions
- Extraction Failures - Missing data debugging
- Performance - Timeout and slowness optimization