Streamlit UI Guide
The LineageBridge Streamlit UI is the primary interactive interface for discovering, extracting, visualizing, and exploring stream lineage from Confluent Cloud.
Overview
The UI is a single-page Streamlit application with:
- Sidebar — Connection, infrastructure selection, extractors, filters, and legend
- Main area — Graph visualization and change watcher tabs
- Node details — Slide-in panel with attributes, neighbors, and deep links
Launch the UI:
Access at: http://localhost:8501
Getting Started
1. Connect to Confluent Cloud
Sidebar → Setup → Connection
On first launch, the Connection expander is open. You need:
- LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY: Cloud-level API key
- LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET: Cloud-level API secret
If credentials are found in .env, the UI shows:
Click Connect to authenticate. The UI will:
- Validate credentials
- List environments in your organization
- Cache selections for future sessions
Once connected, the Connection expander collapses automatically.
2. Select Infrastructure
Sidebar → Setup → Infrastructure
After connecting, expand Infrastructure to select:
- Environment(s) — One or more Confluent Cloud environments
- Cluster(s) — Filter to specific Kafka clusters (optional)
The UI auto-discovers:
- All environments your API key can access
- All Kafka clusters in the selected environment(s)
- Schema Registry endpoints (if Stream Governance is enabled)
Example:
Environments: ✓ Production (env-abc123)
✓ Staging (env-def456)
☐ Dev (env-ghi789)
Clusters: All (default)
Selections are cached locally for the next session.
3. Extract Lineage
Sidebar → Extraction
Click Extract Lineage to run the extraction pipeline. The UI shows progress:
Phase 1/4: Extracting Kafka topics & consumer groups
Phase 2/4: Extracting connectors, ksqlDB, Flink
Phase 3/4: Enriching with schemas & catalog metadata
Phase 4/4: Extracting Tableflow & catalog integrations
Done: 142 nodes, 238 edges (CONFLUENT: 120, DATABRICKS: 15, AWS: 7)
Extraction time: Typically 10-30 seconds per environment, depending on:
- Number of clusters
- Number of topics, connectors, and queries
- Whether catalog enrichment is enabled
4. Explore the Graph
After extraction, the main area shows the Lineage Graph tab with:
- Stats bar — Node count, edge count, environments, clusters, pipelines
- Interactive graph — Drag, zoom, click to inspect
- Detail panel — Attributes, neighbors, and deep links (on node click)
See Graph Visualization Guide for interaction details.
Sidebar Reference
The sidebar is organized into sections:
Setup
Connection
Status when connected:
Click Disconnect to clear credentials and reset the connection. This is useful when:
- Switching between different Confluent Cloud accounts
- Troubleshooting authentication issues
- Testing with different credential sets
Status when disconnected:
No credentials found.
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY=...
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET=...
Expand to view masked credentials or reconnect. The UI shows the first 8 characters of your API key for verification.
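The masking behavior described above can be sketched in a few lines. This is a minimal illustration only: the function name `mask_api_key`, the `*` mask character, and the sample key are assumptions, not the UI's actual code.

```python
def mask_api_key(key: str, visible: int = 8, mask_char: str = "*") -> str:
    """Return the key with only its first `visible` characters readable.

    Hypothetical helper: the real UI may render the masked part differently.
    """
    shown = key[:visible]
    # Replace every remaining character with the mask character.
    hidden = mask_char * max(len(key) - visible, 0)
    return shown + hidden


# Made-up 16-character key, used only for illustration.
print(mask_api_key("ABCDEFGH12345678"))  # ABCDEFGH********
```

Note that a key of 8 characters or fewer would be shown in full by this sketch, so a real implementation would likely want a stricter rule for short values.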
Infrastructure
Environment Selector
Multi-select for environments:
Cluster Selector
Optional filter:
Or specific clusters:
Discover Button
Click Discover Environments to refresh the list from Confluent Cloud.
Extraction
Extractors
Sidebar → Extraction → Extractors
Toggle individual extractors on/off:
| Extractor | Extracts | Default |
|---|---|---|
| Kafka Admin | Topics, consumer groups | Always on (required) |
| Connect | Connectors, external datasets | ✓ On |
| ksqlDB | Persistent queries | ✓ On |
| Flink | Flink SQL statements | ✓ On |
| Schema Registry | Schemas | ✓ On |
| Stream Catalog | Tags, business metadata | ✓ On |
| Tableflow | Topic → table mappings | ✓ On |
| Catalog Enrichment | UC/Glue table metadata | ✓ On |
| Metrics | Throughput, consumer lag | ☐ Off (slower) |
When to toggle extractors:
- Connect/ksqlDB/Flink off — You only care about raw Kafka topology
- Schema Registry off — No schemas in use
- Stream Catalog off — Not using tags/business metadata
- Tableflow off — No lakehouse integration
- Catalog Enrichment off — Faster extraction, less metadata
- Metrics on — Live throughput data (adds ~5-10s per cluster)
Actions
Extract Lineage — Run the full pipeline
Enrich Existing Graph — Backfill catalog metadata and metrics on the current graph (no re-extraction)
Load Demo Graph — Generate a sample graph for UI exploration (no Confluent credentials needed)
Publish
After extraction, the Publish section appears with catalog-specific controls.
Databricks
Sidebar → Publish → Databricks
Push lineage metadata to Unity Catalog tables:
- Set Table Properties — Write lineage as TBLPROPERTIES
- Set Table Comments — Write lineage as COMMENT
- Create Bridge Table — Write full lineage to a dedicated table
Requires:
- LINEAGE_BRIDGE_DATABRICKS_WORKSPACE_URL
- LINEAGE_BRIDGE_DATABRICKS_TOKEN
- LINEAGE_BRIDGE_DATABRICKS_WAREHOUSE_ID (or auto-discovery)
Click Push Lineage to Databricks to write metadata.
AWS Glue
Sidebar → Publish → AWS Glue
Push lineage metadata to Glue tables:
- Set Table Parameters — Write lineage as table parameters
- Set Table Description — Write lineage as description
Requires:
- LINEAGE_BRIDGE_AWS_REGION
- AWS credentials (boto3 default credential chain)
Click Push Lineage to Glue to write metadata.
Google Data Lineage
Sidebar → Publish → Google Data Lineage
Push lineage as OpenLineage events to the Google Data Lineage API.
Requires:
- LINEAGE_BRIDGE_GCP_PROJECT_ID
- LINEAGE_BRIDGE_GCP_LOCATION
- Google Application Default Credentials
Click Push Lineage to Google to send events.
AWS DataZone
Sidebar → Publish → AWS DataZone
Push lineage as OpenLineage events to AWS DataZone data products.
Requires:
- LINEAGE_BRIDGE_AWS_DATAZONE_DOMAIN_ID
- LINEAGE_BRIDGE_AWS_DATAZONE_PROJECT_ID
- LINEAGE_BRIDGE_AWS_REGION
- AWS credentials (boto3 default credential chain)
Click Push Lineage to DataZone to send events.
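Before pushing, it can help to confirm that the variables each publish target requires are set. The sketch below collects the variable names listed in the sections above into one lookup. The target keys (`databricks`, `glue`, `google`, `datazone`) and the helper name `missing_settings` are hypothetical. It does not check cloud credentials (boto3 chain, Application Default Credentials), and it leaves out LINEAGE_BRIDGE_DATABRICKS_WAREHOUSE_ID because that value can be auto-discovered.

```python
import os

# Variable names come from this guide; the dictionary keys are illustrative.
REQUIRED_SETTINGS = {
    "databricks": (
        "LINEAGE_BRIDGE_DATABRICKS_WORKSPACE_URL",
        "LINEAGE_BRIDGE_DATABRICKS_TOKEN",
    ),
    "glue": ("LINEAGE_BRIDGE_AWS_REGION",),
    "google": (
        "LINEAGE_BRIDGE_GCP_PROJECT_ID",
        "LINEAGE_BRIDGE_GCP_LOCATION",
    ),
    "datazone": (
        "LINEAGE_BRIDGE_AWS_DATAZONE_DOMAIN_ID",
        "LINEAGE_BRIDGE_AWS_DATAZONE_PROJECT_ID",
        "LINEAGE_BRIDGE_AWS_REGION",
    ),
}


def missing_settings(target, env=None):
    """Return the required variables that are unset or empty for a target."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS[target] if not env.get(name)]


# Passing an explicit mapping keeps the check independent of the real environment.
print(missing_settings("glue", {}))  # ['LINEAGE_BRIDGE_AWS_REGION']
```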
Graph
After extraction, the Graph section appears with filters and legend.
Filters
Sidebar → Graph → Filters
Environment Filter
Or filter to one environment:
Cluster Filter
Or filter to one cluster:
Search
Enter any substring to filter nodes by qualified name.
Focus Node + Hop Limit
When a node is selected and focus is enabled, the graph shows only:
- The selected node
- Nodes within N hops (upstream and downstream)
Hide Disconnected Nodes
When enabled, hides nodes with no edges (useful for large graphs).
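Conceptually, this filter keeps only nodes that appear as an endpoint of at least one edge. The sketch below shows that idea, assuming nodes are plain ID strings and edges are `(source, target)` pairs. The real graph model is likely richer, and `hide_disconnected` is a hypothetical name.

```python
def hide_disconnected(node_ids, edges):
    """Keep only nodes that are an endpoint of at least one edge.

    Assumes `edges` is an iterable of (source_id, target_id) pairs.
    """
    connected = set()
    for source, target in edges:
        connected.add(source)
        connected.add(target)
    # Preserve the original node order for stable rendering.
    return [node_id for node_id in node_ids if node_id in connected]


nodes = ["orders", "orders-sink", "unused-topic"]
edges = [("orders", "orders-sink")]
print(hide_disconnected(nodes, edges))  # ['orders', 'orders-sink']
```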
Node Type Filters
Toggle visibility by node type:
☑ Kafka Topics
☑ Connectors
☑ ksqlDB Queries
☑ Flink Jobs
☑ Tableflow Tables
☑ UC Tables
☑ Glue Tables
☑ Schemas
☑ External Datasets
☑ Consumer Groups
Uncheck to hide nodes of that type.
Legend
Sidebar → Graph → Legend
Node Types
Color-coded list of node types with icons:
🔷 Kafka Topic (blue)
🔌 Connector (green)
🔄 ksqlDB Query (purple)
⚡ Flink Job (orange)
📊 Tableflow Table (teal)
🏛 UC Table (Databricks orange)
🗄 Glue Table (AWS orange)
📝 Schema (gray)
🌐 External Dataset (blue-gray)
👥 Consumer Group (indigo)
Edge Types
Edge direction and style:
→ PRODUCES (solid blue)
→ CONSUMES (solid green)
→ TRANSFORMS (solid purple)
→ MATERIALIZES (solid orange)
⤏ HAS_SCHEMA (dashed gray)
⤏ MEMBER_OF (dashed indigo)
Data
Load Data
Sidebar → Data → Load Data
Import JSON
Upload a previously exported lineage graph:
After upload, the graph loads automatically.
Export JSON
Download the current graph:
Downloads as lineage_graph.json.
Load Demo Graph
Generate a sample graph for testing:
No Confluent credentials needed — useful for:
- UI demos
- Screenshot generation
- Feature testing
Main Area
The main area has two tabs:
Lineage Graph
The Lineage Graph tab shows:
Header
Stats Bar
- Pipelines — Count of end-to-end flows (source connector → topic → sink connector)
Graph Canvas
Interactive vis.js graph with:
- Drag — Move nodes
- Zoom — Mouse wheel or pinch
- Click — Select a node (opens detail panel)
- Shift+drag — Region select (multi-select)
- Double-click — Center on node
Positions are persisted in browser localStorage.
Node Detail Panel
When a node is selected, the right side shows:
──────────────────────────────────
📋 orders
──────────────────────────────────
Type: Kafka Topic
System: CONFLUENT
Environment: Production (env-abc123)
Cluster: lkc-xyz789
Attributes:
partitions: 6
replication_factor: 3
retention_ms: 604800000
Upstream (2):
→ postgres-source (Connector)
→ customer-stream (ksqlDB Query)
Downstream (3):
→ orders-sink (Connector)
→ order-enrichment (Flink Job)
→ analytics-cg (Consumer Group)
🔗 View in Confluent Cloud
[✕ Close]
Deep Links:
- Kafka topics → Confluent Cloud topic page
- Connectors → Confluent Cloud connector page
- UC tables → Databricks table explorer
- Glue tables → AWS Glue console
Click Close or select another node to dismiss.
Change Watcher
The Change Watcher tab provides controls for the background watcher.
See Change Detection Guide for details.
Features
Auto-Save Selections
The UI caches these selections in ~/.lineage_bridge/ui_cache.json:
- Selected environment IDs
- Selected cluster IDs
- Extractor toggles
- Filter states
On next launch, selections are restored.
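For scripting around the cache (for example, inspecting or pre-seeding it), a small load/save pair like the one below may be useful. Only the file path comes from this guide. The function names and the example key `environment_ids` are assumptions about the layout, so check an actual cache file before relying on any key.

```python
import json
from pathlib import Path

# Path documented in this guide.
DEFAULT_CACHE_PATH = Path.home() / ".lineage_bridge" / "ui_cache.json"


def load_cache(path=DEFAULT_CACHE_PATH):
    """Return cached selections, or an empty dict if the file is missing or invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def save_cache(selections, path=DEFAULT_CACHE_PATH):
    """Write selections as JSON, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(selections, indent=2), encoding="utf-8")
```

Falling back to an empty dict on a missing or corrupt file means a broken cache never blocks startup; the selections simply revert to defaults.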
Graph Layout Persistence
Node positions are saved in browser localStorage (keyed by graph version). The layout persists across:
- Browser refreshes
- UI restarts
- Graph reloads (same topology)
To reset positions, use the reset control under Sidebar → Graph → Filters. This clears saved positions and re-computes the DAG layout.
Extraction Progress
During extraction, a status indicator shows:
⏳ Extracting...
Phase 1/4: Extracting Kafka topics & consumer groups
✓ KafkaAdmin:lkc-xyz789
Phase 2/4: Extracting connectors, ksqlDB, Flink
✓ Connect:lkc-xyz789
⚠ Warning: Extractor 'ksqlDB' got 401 Unauthorized
✓ Flink
Phase 3/4: Enriching with schemas & catalog metadata
✓ SchemaRegistry
✓ StreamCatalog
Phase 4/4: Extracting Tableflow & catalog integrations
✓ Tableflow
Done: 142 nodes, 238 edges (CONFLUENT: 120, DATABRICKS: 15, AWS: 7)
Warnings are logged but do not fail extraction.
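The "warn but continue" behavior follows a common pattern: run each extractor independently, record any failure as a warning, and keep going. The sketch below illustrates that pattern only. `run_extractors`, the callable-per-extractor shape, and the warning text are assumptions, not LineageBridge's internal API.

```python
def run_extractors(extractors):
    """Run each extractor; collect failures as warnings instead of aborting.

    `extractors` maps a display name to a zero-argument callable.
    """
    results, warnings = {}, []
    for name, extract in extractors.items():
        try:
            results[name] = extract()
        except Exception as exc:  # one failing extractor must not stop the rest
            warnings.append(f"Extractor '{name}' failed: {exc}")
    return results, warnings


def kafka_admin():
    return ["orders"]


def ksqldb():
    raise PermissionError("401 Unauthorized")


results, warnings = run_extractors({"KafkaAdmin": kafka_admin, "ksqlDB": ksqldb})
print(results)   # {'KafkaAdmin': ['orders']}
print(warnings)  # ["Extractor 'ksqlDB' failed: 401 Unauthorized"]
```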
Focus Mode
Sidebar → Graph → Filters → Focus on selected node
When enabled:
- Click a node in the graph
- The graph filters to show only:
  - The selected node
  - Nodes within N hops (upstream and downstream)
- Adjust hop radius to expand/contract the neighborhood
Use cases:
- Trace upstream lineage for a specific table
- See downstream consumers of a topic
- Isolate a single pipeline
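The hop-limited neighborhood can be computed with two breadth-first searches, one following edges forward (downstream) and one following them backward (upstream). The sketch below is one reasonable interpretation of "within N hops (upstream and downstream)". The function names and the `(source, target)` edge representation are assumptions, and the UI's own implementation may differ.

```python
from collections import deque


def _reach(start, adjacency, max_hops):
    """Breadth-first search limited to `max_hops` steps from `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


def focus_neighborhood(edges, selected, hops):
    """Return the selected node plus everything within `hops` up- or downstream."""
    downstream, upstream = {}, {}
    for source, target in edges:
        downstream.setdefault(source, []).append(target)
        upstream.setdefault(target, []).append(source)
    return _reach(selected, downstream, hops) | _reach(selected, upstream, hops)


edges = [
    ("postgres-source", "orders"),
    ("orders", "order-enrichment"),
    ("order-enrichment", "enriched-orders"),
    ("orders", "analytics-cg"),
]
print(sorted(focus_neighborhood(edges, "orders", 1)))
```

With a hop radius of 1 the example keeps `orders` and its three direct neighbors; raising the radius to 2 also brings in `enriched-orders`.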
Demo Mode
Click Load Demo Graph to generate a sample graph:
- 30+ nodes across all node types
- Realistic topology (connectors → topics → transformations → tables)
- No Confluent credentials needed
Perfect for:
- UI walkthroughs
- Screenshot generation
- Feature demos
Tips & Tricks
Performance
- Large graphs (500+ nodes)? Enable Hide Disconnected Nodes and use filters
- Slow extraction? Disable Metrics extractor (saves 5-10s per cluster)
- Too many edges? Filter by node type to reduce visual clutter
Keyboard Shortcuts
| Key | Action |
|---|---|
| Shift+drag | Region select |
| Esc | Clear selection |
| Ctrl/Cmd + mouse wheel | Zoom in/out |
Multi-Select Workflow
- Hold Shift
- Drag a rectangle around nodes
- All nodes in the rectangle are selected
- Detail panel shows the first selected node
Export for CI/CD
Extract in the UI, then export JSON for automation:
- Extract lineage
- Click Export JSON in header
- Use the JSON in scripts, CI/CD, or API calls
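As a starting point for such scripts, the sketch below loads an exported file and prints a summary in the same shape as the extraction "Done" line. The JSON layout is an assumption: it presumes top-level `nodes` and `edges` lists and a `system` field on each node. Inspect your own lineage_graph.json and adjust the keys before relying on it.

```python
import json
from collections import Counter


def load_graph(path="lineage_graph.json"):
    """Read an exported graph file (the default filename is from this guide)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(graph):
    """Summarize node/edge counts, grouped by the assumed `system` field."""
    nodes = graph.get("nodes", [])
    edges = graph.get("edges", [])
    by_system = Counter(node.get("system", "UNKNOWN") for node in nodes)
    parts = ", ".join(f"{name}: {count}" for name, count in by_system.most_common())
    return f"{len(nodes)} nodes, {len(edges)} edges ({parts})"


# In-memory sample with the assumed layout; replace with load_graph() in practice.
sample = {
    "nodes": [
        {"id": "orders", "system": "CONFLUENT"},
        {"id": "orders-sink", "system": "CONFLUENT"},
        {"id": "orders_table", "system": "AWS"},
    ],
    "edges": [{"source": "orders", "target": "orders-sink"}],
}
print(summarize(sample))  # 3 nodes, 1 edges (CONFLUENT: 2, AWS: 1)
```

A CI job could compare this summary between runs to flag unexpected changes in graph size.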
Refresh After Changes
If you make changes in Confluent Cloud (new topics, connectors, etc.):
- Go to Sidebar → Extraction
- Click Extract Lineage to refresh
- Or enable the Change Watcher for automatic updates
Troubleshooting
"No credentials found"
Solution: Add credentials to .env:
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY=your-key
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET=your-secret
Restart the UI.
"401 Unauthorized"
Cause: The Cloud-level API key cannot access cluster-scoped resources (topics, ksqlDB, etc.).
Solution: Add a cluster-scoped API key, or use auto-provisioning (see Configuration).
"No Schema Registry endpoint found"
Cause: Stream Governance is not enabled for the environment
Solution:
- Enable Stream Governance in Confluent Cloud
- Or set LINEAGE_BRIDGE_SCHEMA_REGISTRY_ENDPOINT manually
Graph is empty after extraction
Check:
- Do you have topics in the selected clusters?
- Are all extractors enabled?
- Check console logs for errors (sidebar → Extraction → progress panel)
Positions reset after every load
Cause: Graph topology changed (new nodes, removed nodes)
Behavior: Layout resets when the set of node IDs changes. Positions persist only for identical graphs.
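One way to picture this behavior is a storage key derived from the set of node IDs: the same IDs in any order give the same key, while adding or removing a node gives a new key with no saved positions behind it. The sketch below illustrates that idea in Python. The UI stores positions in the browser, so the hashing scheme and the name `graph_version` here are purely hypothetical.

```python
import hashlib


def graph_version(node_ids):
    """Derive a short, order-independent key from a collection of node IDs."""
    # Sorting and de-duplicating makes the key depend only on the ID set.
    joined = "\n".join(sorted(set(node_ids)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


same_a = graph_version(["orders", "orders-sink"])
same_b = graph_version(["orders-sink", "orders"])
changed = graph_version(["orders", "orders-sink", "payments"])
print(same_a == same_b)   # True: identical node set, saved layout can be reused
print(same_a == changed)  # False: topology changed, layout starts fresh
```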
UI is slow with large graphs
Solutions:
- Enable Hide Disconnected Nodes
- Filter by environment/cluster
- Use Focus on selected node with low hop radius
- Disable unused node types
Next Steps
- Learn graph interaction: See Graph Visualization Guide
- Automate extraction: See CLI Tools Reference
- Set up auto-update: See Change Detection Guide
- Push lineage to catalogs: See Catalog Integration Guides