Demo Environments

Welcome to the LineageBridge demo playground. We've built three production-grade demos that showcase end-to-end lineage extraction from Confluent Cloud to different data catalogs. Each demo provisions a complete Kafka environment with streaming transformations and catalog integrations, letting you see LineageBridge in action within minutes.

Quick Start

Pick your preferred data catalog and spin up a live environment:

Unity Catalog (Databricks)AWS GlueBigQuery

cd infra/demos/uc
make demo-up      # Provisions everything (~15 min)
cd ../../..
uv run lineage-bridge-extract
uv run streamlit run lineage_bridge/ui/app.py

cd infra/demos/glue
make demo-up      # Provisions everything (~12 min)
cd ../../..
uv run lineage-bridge-extract
uv run streamlit run lineage_bridge/ui/app.py

cd infra/demos/bigquery
make demo-up      # Provisions everything (~10 min)
cd ../../..
uv run lineage-bridge-extract
uv run streamlit run lineage_bridge/ui/app.py

Each command provisions a live Confluent Cloud environment with real streaming data, runs extraction, and launches the interactive lineage graph UI.

Demo Comparison

Choose the demo that matches your data stack or evaluation criteria:

Feature	Unity Catalog	AWS Glue	BigQuery
Catalog	Databricks Unity Catalog	AWS Glue Data Catalog	Google BigQuery
Cloud	AWS (us-east-1)	AWS (us-east-1)	GCP (us-east1)
Integration	Tableflow → Delta Lake	Tableflow → Iceberg	BigQuery Sink Connector
Topics	orders_v2, customers_v2	orders_v2, customers_v2	orders_v2, customers_v2
Flink Jobs	enriched_orders, order_stats	enriched_orders, order_stats	enriched_orders, order_stats
ksqlDB	high_value_orders stream	-	-
Additional Sinks	PostgreSQL RDS (enriched)	-	-
Databricks Job	customer_order_summary (Delta)	-	-
Est. Monthly Cost	~$711	~$211	~$211
Terraform Resources	~55	~30	~22
Complexity	High (richest lineage)	Medium	Low (simplest setup)

Lineage Depth

Unity Catalog: Full multi-hop lineage from PostgreSQL sources through Kafka topics, Flink SQL transformations, ksqlDB streams, Tableflow tables, and Databricks Delta tables. Includes scheduled Databricks notebook jobs that create derived analytics tables.
AWS Glue: Kafka → Flink → Tableflow (Iceberg) → Glue Data Catalog. Clean lineage showing how Confluent Tableflow materializes Kafka topics as Iceberg tables registered in AWS Glue.
BigQuery: Kafka → Flink → BigQuery Sink Connector. Simplest path from streaming data to a cloud data warehouse with native BigQuery lineage metadata.

Architecture Overview

All demos share a common Confluent Cloud foundation built via the confluent-core Terraform module:

graph LR
    subgraph "Confluent Cloud (all demos)"
        DG1[Datagen: Orders] -->|produces| T1[orders_v2]
        DG2[Datagen: Customers] -->|produces| T2[customers_v2]
        T1 -->|consumes| F1[Flink: enriched_orders]
        T2 -->|consumes| F1
        T1 -->|consumes| F2[Flink: order_stats]
        F1 -->|produces| T3[enriched_orders]
        F2 -->|produces| T4[order_stats]
    end

    T3 -.UC demo.-> TF1[Tableflow: Delta]
    T4 -.UC demo.-> TF2[Tableflow: Delta]
    TF1 --> UC[Unity Catalog]
    TF2 --> UC

    T3 -.Glue demo.-> TF3[Tableflow: Iceberg]
    T4 -.Glue demo.-> TF4[Tableflow: Iceberg]
    TF3 --> GLUE[AWS Glue]
    TF4 --> GLUE

    T3 -.BQ demo.-> BQC1[BQ Sink Connector]
    T4 -.BQ demo.-> BQC2[BQ Sink Connector]
    BQC1 --> BQ[BigQuery]
    BQC2 --> BQ

    style UC fill:#ff6b35
    style GLUE fill:#ff9a00
    style BQ fill:#4285f4

What Gets Provisioned

Each demo creates:

Confluent Cloud Environment — Isolated namespace with Stream Governance (Essentials package)
Kafka Cluster — Single-zone Basic cluster (AWS or GCP based on demo)
Service Account + RBAC — Service principal with CloudClusterAdmin + EnvironmentAdmin roles
API Keys — Kafka, Schema Registry, Flink, Tableflow (auto-rotated credentials)
Source Connectors — Two Datagen connectors producing orders and customers with realistic schemas
Flink Compute Pool — Serverless Flink runtime (5 CFUs) for SQL transformations
Flink SQL Statements — Streaming JOIN (enriched_orders) + windowed aggregation (order_stats)
Catalog-Specific Resources — Tableflow integrations, sink connectors, external storage, IAM roles

Prerequisites

Before running any demo, ensure you have these tools and accounts configured:

CLI Tools

All demos require:

Terraform >= 1.5 (Download)
Confluent CLI (Install) — must be logged in: confluent login --save

Demo-specific tools:

Unity CatalogAWS GlueBigQuery

AWS CLI (Install) — aws configure or aws sso login
Databricks CLI (Install) — databricks configure

AWS CLI (Install) — aws configure or aws sso login

gcloud CLI (Install) — gcloud auth login and gcloud auth application-default login

Cloud Accounts

You'll need active accounts with sufficient permissions:

Demo	Accounts Required	Key Permissions
Unity Catalog	Confluent Cloud, AWS, Databricks	Confluent: create environments/clusters. AWS: IAM roles, S3, RDS. Databricks: create catalogs, external locations, notebooks, jobs
AWS Glue	Confluent Cloud, AWS	Confluent: create environments/clusters. AWS: IAM roles, S3, Glue Data Catalog
BigQuery	Confluent Cloud, GCP	Confluent: create environments/clusters (GCP region). GCP: BigQuery datasets, service accounts

Credentials Setup

Each demo includes an interactive make setup command that auto-detects credentials from your environment and prompts for missing values. You can also manually create a terraform.tfvars file in the demo directory (see terraform.tfvars.example in each demo).

Cost Estimates

Monthly costs assuming 24x7 operation with continuous data generation:

Unity Catalog Demo (~$711/month)

Confluent Cloud Kafka cluster (Basic, AWS): ~$80
Confluent Flink compute pool (5 CFUs): ~$450
Confluent ksqlDB cluster (4 CSUs): ~$32
Confluent Tableflow BYOB (Delta): ~$25
AWS RDS PostgreSQL (db.t4g.micro): ~$15
AWS S3 storage (minimal data volume): ~$5
Datagen source connectors (2): included
Databricks workspace + compute (minimal): ~$100

AWS Glue Demo (~$211/month)

Confluent Cloud Kafka cluster (Basic, AWS): ~$80
Confluent Flink compute pool (5 CFUs): ~$450
Confluent Tableflow BYOB (Iceberg): ~$25
AWS S3 storage (minimal data volume): ~$5
AWS Glue Data Catalog (minimal API calls): ~$1
Datagen source connectors (2): included

BigQuery Demo (~$211/month)

Confluent Cloud Kafka cluster (Basic, GCP): ~$80
Confluent Flink compute pool (5 CFUs): ~$450
BigQuery storage (minimal data volume): ~$1
BigQuery streaming inserts (minimal): ~$5
GCP service account: free
Datagen source connectors (2): included

Cost Reduction Tips

Run demos only when actively testing (use make demo-down to destroy resources)
Flink compute pools account for ~60% of costs — pause them when not in use via Confluent Cloud Console
Use spot/preemptible instances for Databricks jobs in the UC demo
All demos use Basic clusters and minimal compute to keep costs low

Expected Lineage Graph

After running extraction, you'll see a directed graph with nodes representing:

Kafka Topics — Source data streams (orders_v2, customers_v2, enriched_orders, order_stats, high_value_orders)
Connectors — Datagen sources, PostgreSQL sink, BigQuery sink (depending on demo)
Flink Jobs — SQL transformations (enriched_orders join, order_stats aggregation)
ksqlDB Queries — Stream filtering (high_value_orders, UC demo only)
Tableflow Tables — Delta or Iceberg tables (UC/Glue demos)
Catalog Tables — Unity Catalog tables, Glue tables, or BigQuery tables
Schemas — Avro schemas registered in Schema Registry
Consumer Groups — Active consumer group subscriptions

Edges represent lineage relationships:

PRODUCES — Data producers (connectors → topics, Flink → topics)
CONSUMES — Data consumers (Flink/ksqlDB ← topics)
TRANSFORMS — Derived data (Flink job transforms input topics)
MATERIALIZES — Tables created from topics (Tableflow → catalog tables)
HAS_SCHEMA — Schema associations (topics ↔ schemas)

Validation Steps

After provisioning, validate your demo environment:

Confluent Cloud ConsoleCatalog UILineageBridge UI

Navigate to Environments — you should see a new environment named lb-{demo}-{random}
Open the Kafka cluster — verify topics: lineage_bridge.orders_v2, lineage_bridge.customers_v2, lineage_bridge.enriched_orders, lineage_bridge.order_stats
Check Connectors — two Datagen connectors should be in RUNNING state
Open Flink — two statements should be RUNNING (enriched_orders, order_stats)
Inspect Schema Registry — schemas registered for all topics

Unity Catalog:

Open Databricks workspace → Data Explorer → find catalog lb_uc_{random} → schema lkc_{cluster} → tables: lineage_bridge_orders_v2, lineage_bridge_customers_v2, customer_order_summary

AWS Glue:

AWS Console → Glue → Data Catalog → Databases → find lkc_{cluster} → Tables: lineage_bridge_orders_v2, lineage_bridge_customers_v2, lineage_bridge_order_stats

BigQuery:

GCP Console → BigQuery → lineage_bridge dataset → Tables: lineage_bridge_enriched_orders, lineage_bridge_order_stats

Run extraction and open the UI:

uv run lineage-bridge-extract
uv run streamlit run lineage_bridge/ui/app.py

Verify graph rendering with node types filtered, hierarchical layout, and clickable nodes showing metadata panels.

Teardown

Clean up resources to stop incurring costs:

cd infra/demos/{uc|glue|bigquery}
make demo-down

This destroys all Terraform-managed resources including Confluent Cloud clusters, cloud storage, IAM roles, and catalog integrations.

Orphaned Resources

Some catalog resources may be created dynamically by Tableflow or connectors outside Terraform state. The UC demo includes a pre-destroy cleanup script (scripts/cleanup-catalog.sh) that removes auto-created Databricks schemas. For Glue/BigQuery, manually verify no orphaned tables remain after teardown.

Troubleshooting

Terraform Apply Fails

Symptom: Error: timeout while waiting for state to become 'RUNNING'

Cause: Confluent Cloud resources (Flink pools, connectors, ksqlDB clusters) can take 5-15 minutes to provision.

Fix: Terraform automatically retries. If it fails after 30 minutes, check Confluent Cloud Console for resource state and re-run terraform apply.

Datagen Connectors Not Producing Data

Symptom: Topics exist but have no data.

Cause: Datagen connectors may be paused or in FAILED state.

Fix:

# Via Confluent CLI
confluent kafka topic consume lineage_bridge.orders_v2 --from-beginning --max-messages 5

# If empty, restart connector via UI or CLI
confluent connect cluster list  # Find connector IDs
confluent connect cluster pause <connector-id>
confluent connect cluster resume <connector-id>

Tableflow Tables Not Appearing in Catalog

Symptom: Kafka topics exist, Flink jobs running, but no tables in Unity Catalog / Glue / BigQuery.

Cause: Tableflow registration can take 3-5 minutes after topic creation.

Fix: Wait for the health check script (scripts/wait-for-ready.sh) to complete during provisioning. Manually verify:

# Check Tableflow topic status
curl -u "$CONFLUENT_TABLEFLOW_API_KEY:$CONFLUENT_TABLEFLOW_API_SECRET" \
  "https://api.confluent.cloud/tableflow/v1/topics?environment=$ENV_ID&kafka_cluster=$CLUSTER_ID"

IAM Role Trust Relationship Errors (UC/Glue demos)

Symptom: AssumeRole failed: AccessDenied

Cause: IAM trust policy not fully propagated (AWS eventual consistency).

Fix: Terraform includes 30-60s sleep delays between trust policy updates. If errors persist, re-run terraform apply after 5 minutes.

Next Steps

Unity Catalog Demo Guide → — Full walkthrough for the UC demo with Databricks integration
AWS Glue Demo Guide → — Iceberg tables in AWS Glue Data Catalog
BigQuery Demo Guide → — Streaming sink to BigQuery with Data Lineage API
Production Deployment → — Adapt these demos for your production environments