CLI Tools Reference
LineageBridge provides command-line tools for lineage extraction, monitoring, UI access, and API serving.
Quick Start Script
For first-time users, the quickstart script provides a one-line demo:
curl -fsSL https://raw.githubusercontent.com/takabayashi/lineage-bridge/main/scripts/quickstart.sh | bash
This automatically installs dependencies, launches the UI with a sample graph, and opens your browser. No credentials needed—perfect for evaluation.
Or run locally via Make:
Overview
| Command | Purpose | Entry Point |
|---|---|---|
| lineage-bridge-extract | Extract lineage and export to JSON | lineage_bridge.extractors.orchestrator:main |
| lineage-bridge-watch | Monitor for changes and auto-extract | lineage_bridge.watcher.cli:main |
| lineage-bridge-ui | Launch Streamlit UI | lineage_bridge.ui.app:run |
| lineage-bridge-api | Start REST API server | lineage_bridge.api.main:main |
All commands are installed as scripts via pyproject.toml and are available after installation.
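To confirm that the scripts landed on the PATH after installation, the check can be scripted. The following is a minimal sketch using only the standard library; the helper name `missing_commands` is hypothetical and not part of LineageBridge.

```python
import shutil

# The four console scripts listed in the Overview table.
COMMANDS = (
    "lineage-bridge-extract",
    "lineage-bridge-watch",
    "lineage-bridge-ui",
    "lineage-bridge-api",
)


def missing_commands(commands=COMMANDS):
    """Return the commands that cannot be found on the current PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands()
    print("All commands found" if not missing else f"Not on PATH: {missing}")
```

An empty result suggests the installation registered all four entry points in the active environment.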
lineage-bridge-extract
Extract lineage from Confluent Cloud and export as JSON.
Synopsis
Required Arguments
| Flag | Description |
|---|---|
| --env ENV_ID | Environment ID to scan. Repeatable for multiple environments. |
Optional Arguments
| Flag | Type | Default | Description |
|---|---|---|---|
| --cluster CLUSTER_ID | string | None | Cluster ID filter. Repeatable. If omitted, scans all clusters. |
| --output PATH | string | ./lineage_graph.json | Output JSON file path. |
| --no-enrich | flag | false | Skip catalog and metrics enrichment (extraction only). |
| --enrich-only | flag | false | Enrich an existing graph file (reads from --output path). |
| --push-lineage | flag | false | Push lineage metadata to Databricks UC tables after extraction. |
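When the extractor is driven from a Python script, the flags in the tables above map naturally onto an argument list. This is a sketch under that assumption: `build_extract_command` and `run_extract` are hypothetical helpers, and only the flag names come from this reference.

```python
import subprocess


def build_extract_command(env_ids, cluster_ids=(), output=None,
                          no_enrich=False, enrich_only=False,
                          push_lineage=False):
    """Assemble an argv list for lineage-bridge-extract from the documented flags."""
    if not env_ids:
        raise ValueError("at least one --env value is required")
    cmd = ["lineage-bridge-extract"]
    for env_id in env_ids:          # --env is repeatable
        cmd += ["--env", env_id]
    for cluster_id in cluster_ids:  # --cluster is repeatable
        cmd += ["--cluster", cluster_id]
    if output is not None:          # otherwise the CLI default path applies
        cmd += ["--output", output]
    if no_enrich:
        cmd.append("--no-enrich")
    if enrich_only:
        cmd.append("--enrich-only")
    if push_lineage:
        cmd.append("--push-lineage")
    return cmd


def run_extract(**kwargs):
    """Run the command and return its exit code (requires the CLI to be installed)."""
    return subprocess.run(build_extract_command(**kwargs)).returncode
```

Passing the arguments as a list avoids shell quoting issues; for example, `build_extract_command(["env-abc123"], output="lineage.json")` yields the equivalent of `lineage-bridge-extract --env env-abc123 --output lineage.json`.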
Examples
Basic Extraction
Extract from one environment, save to default path:
Output:
Multi-Environment Extraction
Extract from multiple environments:
Cluster Filter
Extract only specific clusters:
Extract Without Enrichment
Skip catalog enrichment and metrics (faster, but less metadata):
Enrich Existing Graph
Backfill catalog metadata and metrics on a previously extracted graph:
This is useful when:
- You want to add catalog data after extraction
- Catalog credentials were unavailable during initial extraction
- You want to refresh metrics without re-extracting topology
Push to Databricks
Extract and push lineage to Unity Catalog:
Requires:
- LINEAGE_BRIDGE_DATABRICKS_WORKSPACE_URL
- LINEAGE_BRIDGE_DATABRICKS_TOKEN
- LINEAGE_BRIDGE_DATABRICKS_WAREHOUSE_ID (or auto-discovery)
Environment Variables
See Configuration for all settings. Key variables:
- LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY: Cloud API key (required)
- LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET: Cloud API secret (required)
- LINEAGE_BRIDGE_KAFKA_API_KEY: Cluster-scoped API key (recommended)
- LINEAGE_BRIDGE_KAFKA_API_SECRET: Cluster-scoped API secret (recommended)
- LINEAGE_BRIDGE_ENABLE_METRICS: Set to true to include throughput metrics
- LINEAGE_BRIDGE_LOG_LEVEL: DEBUG, INFO, WARNING, or ERROR
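A wrapper script may want to fail fast when the required variables are absent. The sketch below is illustrative only: `unset_variables` is a hypothetical helper, and it does not reflect how LineageBridge itself loads its settings.

```python
import os

# Variable names taken from the list above.
REQUIRED = (
    "LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY",
    "LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET",
)
RECOMMENDED = (
    "LINEAGE_BRIDGE_KAFKA_API_KEY",
    "LINEAGE_BRIDGE_KAFKA_API_SECRET",
)


def unset_variables(names, env=None):
    """Return the names that are missing or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in unset_variables(REQUIRED):
        print(f"missing required variable: {name}")
    for name in unset_variables(RECOMMENDED):
        print(f"missing recommended variable: {name}")
```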
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Interrupted (Ctrl+C) |
| 2 | Extraction failed (exception) |
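Automation that wraps the extractor can branch on these codes. A minimal sketch, assuming the command is launched with `subprocess`; the names `describe_exit_code` and `run_and_report` are hypothetical.

```python
import subprocess
import sys

# Meanings copied from the exit code table above.
EXIT_MEANINGS = {
    0: "Success",
    1: "Interrupted (Ctrl+C)",
    2: "Extraction failed (exception)",
}


def describe_exit_code(code):
    """Translate an exit code into the documented meaning."""
    return EXIT_MEANINGS.get(code, f"Undocumented exit code {code}")


def run_and_report(argv):
    """Run a command, print the meaning of its exit code to stderr, and return the code."""
    result = subprocess.run(argv)
    print(describe_exit_code(result.returncode), file=sys.stderr)
    return result.returncode
```

For example, `run_and_report(["lineage-bridge-extract", "--env", "env-abc123"])` would let a scheduler distinguish an interrupted run from a failed one.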
Extraction Phases
The extractor runs up to five phases in order; the fifth (Metrics) runs only when metrics are enabled:
- KafkaAdmin: topic inventory and consumer groups (sequential per cluster)
- Connect, ksqlDB, Flink: transformation edges (parallel)
- SchemaRegistry, StreamCatalog: schema and metadata enrichment (parallel)
- Tableflow: topic → table → catalog mapping
- Metrics: throughput enrichment (optional, if LINEAGE_BRIDGE_ENABLE_METRICS=true)
Each phase logs progress:
[INFO] Phase 1/4: Extracting Kafka topics & consumer groups
[INFO] Phase 2/4: Extracting connectors, ksqlDB, Flink
[INFO] Phase 3/4: Enriching with schemas & catalog metadata
[INFO] Phase 4/4: Extracting Tableflow & catalog integrations
[INFO] Done: 142 nodes, 238 edges (CONFLUENT: 120, DATABRICKS: 15, AWS: 7)
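The ordering described in this section (phases run one after another, with the extractors inside phases 2 and 3 running in parallel) can be modelled with `asyncio`. This is a toy model, not the actual orchestrator: whether LineageBridge uses `asyncio` is an assumption, and `run_extractor` is a stand-in that does no real work.

```python
import asyncio


async def run_extractor(name, completed):
    """Stand-in for a real extractor: yield control once, then record completion."""
    await asyncio.sleep(0)
    completed.append(name)


async def run_pipeline(enable_metrics=False):
    """Run the phases in order; extractors within a phase run concurrently."""
    completed = []
    await run_extractor("KafkaAdmin", completed)
    await asyncio.gather(
        *(run_extractor(n, completed) for n in ("Connect", "ksqlDB", "Flink"))
    )
    await asyncio.gather(
        *(run_extractor(n, completed) for n in ("SchemaRegistry", "StreamCatalog"))
    )
    await run_extractor("Tableflow", completed)
    if enable_metrics:  # optional final phase
        await run_extractor("Metrics", completed)
    return completed


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(enable_metrics=True)))
```

Each `gather` call acts as a barrier: no extractor from a later phase starts until every extractor in the current phase has finished.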
Output Format
The JSON output is a serialized LineageGraph object with this structure:
{
"nodes": [
{
"id": "confluent:kafka_topic:env-abc123:orders",
"node_type": "KAFKA_TOPIC",
"system": "CONFLUENT",
"qualified_name": "orders",
"display_name": "orders",
"environment_id": "env-abc123",
"cluster_id": "lkc-xyz789",
"attributes": { ... }
}
],
"edges": [
{
"src_id": "confluent:connector:env-abc123:postgres-source",
"dst_id": "confluent:kafka_topic:env-abc123:orders",
"edge_type": "PRODUCES"
}
]
}
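Because the output is plain JSON, it can be inspected with the standard library alone. The sketch below summarizes a graph that follows the structure shown above; the inline sample is trimmed, and the `CONNECTOR` node type in it is an assumed value used for illustration.

```python
import json
from collections import Counter

# Trimmed sample following the documented structure.
# "CONNECTOR" is an assumed node_type, not confirmed by this reference.
SAMPLE = """
{
  "nodes": [
    {"id": "confluent:kafka_topic:env-abc123:orders",
     "node_type": "KAFKA_TOPIC", "system": "CONFLUENT"},
    {"id": "confluent:connector:env-abc123:postgres-source",
     "node_type": "CONNECTOR", "system": "CONFLUENT"}
  ],
  "edges": [
    {"src_id": "confluent:connector:env-abc123:postgres-source",
     "dst_id": "confluent:kafka_topic:env-abc123:orders",
     "edge_type": "PRODUCES"}
  ]
}
"""


def summarize(graph):
    """Count nodes and edges, grouped by system and edge type."""
    return {
        "nodes": len(graph["nodes"]),
        "edges": len(graph["edges"]),
        "by_system": dict(Counter(n["system"] for n in graph["nodes"])),
        "by_edge_type": dict(Counter(e["edge_type"] for e in graph["edges"])),
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

To summarize a real extraction, load the file written to the `--output` path with `json.load` and pass the result to `summarize`.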
lineage-bridge-watch
Monitor Confluent Cloud for lineage changes and trigger automatic extraction.
Synopsis
Required Arguments
| Flag | Description |
|---|---|
| --env ENV_ID | Environment ID to monitor. Repeatable for multiple environments. |
Optional Arguments
| Flag | Type | Default | Description |
|---|---|---|---|
| --cluster CLUSTER_ID | string | None | Cluster ID filter. Repeatable. |
| --cooldown SECONDS | float | 30.0 | Seconds to wait after last change before triggering extraction. |
| --poll-interval SECONDS | float | 10.0 | Seconds between REST API polls. |
| --push-uc | flag | false | Push lineage to Databricks UC after each extraction. |
| --push-glue | flag | false | Push lineage to AWS Glue after each extraction. |
Examples
Basic Watcher (Polling Mode)
Monitor one environment with default settings:
Output:
The watcher will:
- Poll every 10 seconds
- Detect changes (topic create/delete, connector updates, etc.)
- Wait 30 seconds after the last change
- Trigger extraction and save to ./lineage_graph.json
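The cooldown behaves like a debounce: each new change restarts the wait, so a burst of changes results in a single extraction. The following is a simplified model of that rule rather than the watcher's implementation; it ignores polling granularity and extraction duration, and `extraction_times` is a hypothetical name.

```python
def extraction_times(change_times, cooldown=30.0):
    """Given change timestamps in seconds, return when extractions would trigger.

    An extraction fires `cooldown` seconds after a change, unless another
    change arrives first, in which case the wait restarts from that change.
    """
    triggers = []
    last_change = None
    for t in sorted(change_times):
        # The previous burst ended if the quiet gap reached the cooldown.
        if last_change is not None and t - last_change >= cooldown:
            triggers.append(last_change + cooldown)
        last_change = t
    if last_change is not None:
        triggers.append(last_change + cooldown)
    return triggers


if __name__ == "__main__":
    # Three changes in a burst, then one isolated change.
    print(extraction_times([0, 10, 25, 100]))
```

With the default 30-second cooldown, changes at 0, 10, and 25 seconds collapse into one extraction at 55 seconds, and the change at 100 seconds triggers a second one at 130 seconds.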
Custom Cooldown
Increase cooldown to reduce extraction frequency:
Use longer cooldowns (60-120s) when:
- Changes are frequent but extraction is expensive
- You want to batch multiple changes into one extraction
Fast Polling
Poll more frequently:
Shorter intervals (5-10s) are useful when:
- You need near-real-time lineage updates
- Changes are infrequent but must be detected quickly
Auto-Push to Databricks
Extract and push lineage to Unity Catalog after each change:
Requires Databricks credentials in environment.
Multi-Environment Monitoring
Watch multiple environments:
Watcher States
| State | Description |
|---|---|
| WATCHING | Actively polling or consuming events |
| COOLDOWN | Change detected, waiting for cooldown period to expire |
| EXTRACTING | Running extraction pipeline |
| STOPPED | Watcher terminated |
Event Detection
The watcher monitors these lineage-relevant events:
- Topic creation/deletion
- Connector creation/update/deletion
- ksqlDB cluster creation/deletion
- Flink statement creation/update/deletion
- Schema registration
Environment Variables
See Configuration for all extraction-related settings (credentials, enrichment, etc.).
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success (Ctrl+C) |
The watcher runs indefinitely until interrupted.
lineage-bridge-ui
Launch the interactive Streamlit UI.
Synopsis
No command-line arguments. Configuration is via environment variables.
What It Does
This command is a wrapper that runs:
Access
After launch, the UI is available at http://localhost:8501 (Streamlit's default port) unless a different port is configured.
Streamlit will automatically open your browser.
UI Features
- Connection panel — Connect to Confluent Cloud
- Infrastructure selector — Choose environments and clusters
- Extraction controls — Configure and run extraction
- Graph visualization — Interactive DAG with filters
- Node details — Inspect nodes with deep links
- Watcher controls — Start/stop change detection
- Export/import — Save and load graphs as JSON
See Streamlit UI Guide for detailed walkthrough.
Environment Variables
The UI reads all LineageBridge settings from environment variables. See Configuration.
Required for connection:
- LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY
- LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET
Optional for full functionality:
- Cluster-scoped API keys
- Schema Registry credentials
- Databricks/AWS/GCP credentials
Port Configuration
To change the default port:
Or set in ~/.streamlit/config.toml:
lineage-bridge-api
Start the REST API server for programmatic access.
Synopsis
No command-line arguments. Configuration is via environment variables.
What It Does
Starts a FastAPI + Uvicorn server with 25 REST endpoints for programmatic lineage access.
Key endpoints:
- GET /api/v1/health: Health check (no auth)
- POST /api/v1/lineage/events: Ingest OpenLineage events
- GET /api/v1/graphs: Manage lineage graphs
- POST /api/v1/tasks/extract: Trigger async extraction
See API Reference for all 25 endpoints.
Access
Start Here
After starting the server, visit http://localhost:8000/docs in your browser for interactive API testing!
API Base URL: http://localhost:8000/api/v1/
Interactive Documentation:
- Swagger UI: http://localhost:8000/docs (recommended)
- ReDoc: http://localhost:8000/redoc
- OpenAPI Spec: http://localhost:8000/openapi.json
Environment Variables
See Configuration. Key variables:
| Variable | Default | Description |
|---|---|---|
| LINEAGE_BRIDGE_API_HOST | 0.0.0.0 | Bind address |
| LINEAGE_BRIDGE_API_PORT | 8000 | Port |
| LINEAGE_BRIDGE_API_KEY | None | Optional API key for authentication |
All extraction and catalog credentials also apply.
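The defaults in the table can be mirrored in client-side tooling, for instance to work out which address a locally started server is listening on. A sketch only: `api_settings` is a hypothetical helper and says nothing about how the server itself parses its configuration.

```python
import os


def api_settings(env=None):
    """Resolve the API host, port, and key, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("LINEAGE_BRIDGE_API_HOST", "0.0.0.0"),
        "port": int(env.get("LINEAGE_BRIDGE_API_PORT", "8000")),
        "api_key": env.get("LINEAGE_BRIDGE_API_KEY"),  # None means no key is set
    }


if __name__ == "__main__":
    settings = api_settings()
    print(f"API expected on {settings['host']}:{settings['port']}")
```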
Authentication
If LINEAGE_BRIDGE_API_KEY is set, all endpoints require the key in the X-API-Key header:
If unset, the API is unauthenticated (suitable for internal networks only).
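From Python, the header can be attached with the standard library's `urllib.request`. This is a minimal sketch assuming the default base URL; `build_request` and `get_json` are hypothetical helpers, and only the header name and the health endpoint path come from this reference.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1/"


def build_request(path, api_key=None):
    """Build a GET request, adding the X-API-Key header when a key is given."""
    headers = {"X-API-Key": api_key} if api_key else {}
    return urllib.request.Request(BASE_URL + path.lstrip("/"), headers=headers)


def get_json(path, api_key=None):
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(path, api_key), timeout=10) as resp:
        return json.load(resp)
```

Note that `urllib` stores the header name as `X-api-key`; HTTP header names are case-insensitive, so the server should treat it the same as `X-API-Key`.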
Example Usage
List Environments
Response:
{
"environments": [
{
"id": "env-abc123",
"name": "Production",
"stream_governance": { "package": "ESSENTIALS" }
}
]
}
Trigger Extraction
curl -X POST http://localhost:8000/api/v1/tasks/extract \
-H "Content-Type: application/json" \
-d '{
"environment_ids": ["env-abc123"],
"enable_metrics": false
}'
Response:
Retrieve Graph
Returns the most recently extracted graph.
See API Reference for full endpoint documentation.
Common Patterns
CI/CD Pipeline
Extract lineage in a GitHub Actions workflow:
- name: Extract Lineage
run: |
lineage-bridge-extract \
--env ${{ secrets.CONFLUENT_ENV_ID }} \
--output lineage.json
env:
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY: ${{ secrets.API_KEY }}
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET: ${{ secrets.API_SECRET }}
- name: Upload Artifact
uses: actions/upload-artifact@v4
with:
name: lineage-graph
path: lineage.json
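A pipeline might also sanity-check the graph before uploading it. The sketch below flags edges whose endpoints are not present in the node list, using the field names from the Output Format section. Treating such edges as errors is an assumption, since a graph may legitimately reference nodes outside the extracted scope; `dangling_edges` and `main` are hypothetical names.

```python
import json
import sys


def dangling_edges(graph):
    """Return edges whose src_id or dst_id does not match any node id."""
    node_ids = {node["id"] for node in graph["nodes"]}
    return [
        edge for edge in graph["edges"]
        if edge["src_id"] not in node_ids or edge["dst_id"] not in node_ids
    ]


def main(path):
    """Check a graph file; return 1 if dangling edges are found, else 0."""
    with open(path, encoding="utf-8") as fh:
        graph = json.load(fh)
    bad = dangling_edges(graph)
    for edge in bad:
        print(f"dangling edge: {edge['src_id']} -> {edge['dst_id']}", file=sys.stderr)
    return 1 if bad else 0
```

A workflow step could call `main("lineage.json")` and pass the result to `sys.exit` so that a malformed graph fails the job.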
Scheduled Extraction
Run extraction daily via cron:
# crontab -e
# Each cron entry must be a single line; cron does not support backslash continuations.
0 2 * * * cd /path/to/lineage-bridge && uv run lineage-bridge-extract --env env-abc123 --output /data/lineage-$(date +\%Y\%m\%d).json
Watcher as systemd Service
Create /etc/systemd/system/lineage-bridge-watcher.service:
[Unit]
Description=LineageBridge Watcher
After=network.target
[Service]
Type=simple
User=lineage
WorkingDirectory=/opt/lineage-bridge
EnvironmentFile=/opt/lineage-bridge/.env
ExecStart=/usr/local/bin/lineage-bridge-watch --env env-abc123 --cooldown 60
Restart=on-failure
[Install]
WantedBy=multi-user.target
Enable and start:
Docker Compose
Extract via Docker:
services:
extract:
image: lineage-bridge:latest
command: lineage-bridge-extract --env env-abc123 --output /output/lineage_graph.json
env_file: .env
volumes:
- ./output:/output
Run:
Next Steps
- Visual exploration? See Streamlit UI Guide
- Graph interaction? See Graph Visualization Guide
- Auto-update? See Change Detection Guide
- API integration? See API Reference