Configuration
Let's get your credentials set up. LineageBridge uses environment variables for all the sensitive stuff—API keys, tokens, secrets. This keeps your credentials out of code and version control.
How Configuration Works
We keep it simple with a few principles:
- Credentials only - We only store API keys and secrets, not runtime settings
- Environment variables - Everything uses the
LINEAGE_BRIDGE_prefix - .env file support - Drop your credentials in a
.envfile and we'll load them automatically - Smart fallbacks - If you don't provide service-specific keys, we fall back to your cloud-level credentials
Quick Setup (Start Here!)
Step 1: Create Your .env File
Copy our example to get started:
Step 2: Add Your Confluent Cloud Credentials
Open .env in your favorite editor and add your credentials:
# This is all you need to get started
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY=your-cloud-api-key
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET=your-cloud-api-secret
That's it! LineageBridge will use these cloud-level credentials for everything. You can add more specific credentials later if needed.
Keep Your Secrets Safe
Never commit .env files to git. We've already added them to .gitignore, but double-check before committing.
Confluent Cloud Credentials
Cloud-Level API Key (Required)
You need one cloud-level API key to get started. This lets LineageBridge discover your environments, clusters, and services. You'll need OrgAdmin or EnvironmentAdmin permissions.
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY=ABC123DEF456
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET=xyz789secretABC...
Here's how to create one:
- Log in to Confluent Cloud
- Go to Administration → API Keys
- Click + Add key
- Select Cloud resource as the scope
- Choose My account (or create a service account for production)
- Copy the key and secret immediately—Confluent won't show them again!
Service-Scoped Credentials (Optional)
Want more granular control? You can provide dedicated API keys for each Confluent service. If you don't, we'll just use your cloud API key as a fallback—works great for most cases.
Global Kafka credentials (applies to all clusters unless you override per-cluster):
LINEAGE_BRIDGE_SCHEMA_REGISTRY_ENDPOINT=https://psrc-xxxxx.us-east-1.aws.confluent.cloud
LINEAGE_BRIDGE_SCHEMA_REGISTRY_API_KEY=sr-key
LINEAGE_BRIDGE_SCHEMA_REGISTRY_API_SECRET=sr-secret
Auto-Discovery
We auto-discover the Schema Registry endpoint, so you usually don't need to set this. Only needed if you have multiple Schema Registry clusters.
We auto-discover ksqlDB endpoints from your clusters.
Per-Cluster Credentials (Advanced)
Got different API keys for each Kafka cluster? No problem. You can provide them as a JSON map:
LINEAGE_BRIDGE_CLUSTER_CREDENTIALS={"lkc-abc123":{"api_key":"key1","api_secret":"secret1"},"lkc-def456":{"api_key":"key2","api_secret":"secret2"}}
Here's that same config formatted for readability (don't actually put newlines in your .env):
{
"lkc-abc123": {
"api_key": "cluster1-key",
"api_secret": "cluster1-secret"
},
"lkc-def456": {
"api_key": "cluster2-key",
"api_secret": "cluster2-secret"
}
}
Easier in the UI
You can also add per-cluster credentials in the Streamlit UI sidebar instead of wrestling with JSON in your .env file.
How Credential Fallbacks Work
When LineageBridge needs to talk to a Kafka cluster, it looks for credentials in this order:
graph LR
A[Per-cluster credential] -->|Not found| B[Global Kafka credential]
B -->|Not found| C[Cloud-level credential]
C -->|Found!| D[Connect to cluster] - Per-cluster credential - from the
CLUSTER_CREDENTIALSJSON map - Global Kafka credential - from
KAFKA_API_KEY/KAFKA_API_SECRET - Cloud-level credential - from
CONFLUENT_CLOUD_API_KEY(the ultimate fallback)
Data Catalog Credentials
Want to connect lineage to your data catalog? Here's how to set up each one.
Databricks Unity Catalog
To hook up Databricks UC:
LINEAGE_BRIDGE_DATABRICKS_WORKSPACE_URL=https://myworkspace.cloud.databricks.com
LINEAGE_BRIDGE_DATABRICKS_TOKEN=dapi123456789abcdef
LINEAGE_BRIDGE_DATABRICKS_WAREHOUSE_ID=abc123def456
Where to find these:
This is your Databricks workspace URL. It looks like:
Just copy it from your browser when you're logged into Databricks.
Create a personal access token:
- In Databricks, go to User Settings (click your email in top right)
- Click Developer → Access Tokens
- Click Generate new token
- Copy it immediately—you won't see it again!
Production Tip
For production, use a service principal token instead of your personal token. That way the integration doesn't break when you leave the company.
Find your SQL Warehouse ID:
- Go to SQL Warehouses in Databricks
- Click on the warehouse you want to use
- Go to Connection Details
- Copy the Server Hostname - the ID is in there (format:
abc123def456)
AWS Glue
For AWS Glue, we use the standard AWS credential chain—same as any other AWS tool:
You'll need these AWS permissions:
glue:GetDatabaseglue:GetTableglue:UpdateTable
Set up your AWS credentials using one of these methods:
The easiest way if you have the AWS CLI installed:
Follow the prompts to enter your credentials.
If you're running on EC2, ECS, or Lambda, use an IAM role. No credentials needed—AWS handles it automatically.
Google Data Lineage
For Google Cloud Data Lineage:
Authentication works via Google's Application Default Credentials (ADC):
Make sure these APIs are enabled in your GCP project:
- Data Lineage API
- BigQuery API (if you're using BigQuery connectors)
REST API Configuration
If you plan to run the LineageBridge REST API:
LINEAGE_BRIDGE_API_KEY=your-api-key-here
LINEAGE_BRIDGE_API_HOST=0.0.0.0
LINEAGE_BRIDGE_API_PORT=8000
- API_KEY: Optional authentication key. If unset, the API runs without authentication.
- API_HOST: Bind address (default:
0.0.0.0for all interfaces) - API_PORT: Port number (default:
8000)
Production Deployment
Always set API_KEY in production to protect your lineage data.
Logging
Control log verbosity:
Valid levels:
DEBUG- Verbose output including API requests and responsesINFO- Standard output (default)WARNING- Warnings and errors onlyERROR- Errors only
Complete .env Example
Here's what a fully-configured .env looks like with all the bells and whistles:
# ============================================================================
# LineageBridge - Minimal Configuration
# ============================================================================
# This is all you need to get started
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY=ABC123DEF456
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET=verylongsecretstring123
# Optional: control log verbosity
LINEAGE_BRIDGE_LOG_LEVEL=INFO
# ============================================================================
# LineageBridge - With Databricks Unity Catalog
# ============================================================================
# Confluent Cloud (required)
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY=ABC123DEF456
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET=verylongsecretstring123
# Databricks Unity Catalog
LINEAGE_BRIDGE_DATABRICKS_WORKSPACE_URL=https://dbc-a1b2c3d4-5678.cloud.databricks.com
LINEAGE_BRIDGE_DATABRICKS_TOKEN=dapi123456789abcdef
LINEAGE_BRIDGE_DATABRICKS_WAREHOUSE_ID=abc123def456
# Logging
LINEAGE_BRIDGE_LOG_LEVEL=INFO
# ============================================================================
# LineageBridge - Full Configuration
# ============================================================================
# ── Confluent Cloud (required) ──────────────────────────────────────────
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY=ABC123DEF456
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET=verylongsecretstring123
# ── Confluent Services (optional - falls back to cloud key) ─────────────
LINEAGE_BRIDGE_KAFKA_API_KEY=kafka-cluster-key
LINEAGE_BRIDGE_KAFKA_API_SECRET=kafka-cluster-secret
LINEAGE_BRIDGE_SCHEMA_REGISTRY_ENDPOINT=https://psrc-xxxxx.us-east-1.aws.confluent.cloud
LINEAGE_BRIDGE_SCHEMA_REGISTRY_API_KEY=sr-key
LINEAGE_BRIDGE_SCHEMA_REGISTRY_API_SECRET=sr-secret
LINEAGE_BRIDGE_KSQLDB_API_KEY=ksql-key
LINEAGE_BRIDGE_KSQLDB_API_SECRET=ksql-secret
LINEAGE_BRIDGE_FLINK_API_KEY=flink-key
LINEAGE_BRIDGE_FLINK_API_SECRET=flink-secret
LINEAGE_BRIDGE_TABLEFLOW_API_KEY=tableflow-key
LINEAGE_BRIDGE_TABLEFLOW_API_SECRET=tableflow-secret
# ── Per-Cluster Credentials (optional) ──────────────────────────────────
# LINEAGE_BRIDGE_CLUSTER_CREDENTIALS={"lkc-abc123":{"api_key":"key1","api_secret":"secret1"}}
# ── Databricks Unity Catalog (optional) ─────────────────────────────────
LINEAGE_BRIDGE_DATABRICKS_WORKSPACE_URL=https://myworkspace.cloud.databricks.com
LINEAGE_BRIDGE_DATABRICKS_TOKEN=dapi123456789abcdef
LINEAGE_BRIDGE_DATABRICKS_WAREHOUSE_ID=abc123def456
# ── AWS Glue (optional) ──────────────────────────────────────────────────
LINEAGE_BRIDGE_AWS_REGION=us-east-1
# ── Google Cloud (optional) ──────────────────────────────────────────────
LINEAGE_BRIDGE_GCP_PROJECT_ID=my-gcp-project
LINEAGE_BRIDGE_GCP_LOCATION=us-central1
# ── REST API (optional) ──────────────────────────────────────────────────
LINEAGE_BRIDGE_API_KEY=my-secure-api-key
LINEAGE_BRIDGE_API_HOST=0.0.0.0
LINEAGE_BRIDGE_API_PORT=8000
# ── Logging ──────────────────────────────────────────────────────────────
LINEAGE_BRIDGE_LOG_LEVEL=INFO
Environment-Specific Configuration
For multiple environments (dev, staging, prod), use different .env files:
Load a specific env file:
Or use dotenv directly:
Security Best Practices
Here are some tips to keep your credentials safe:
Never Commit Credentials
Make sure .env is in your .gitignore (we've already done this, but always verify):
Common Mistake
Don't accidentally commit .env.production or .env.local—the wildcard .env.* protects you.
Use Least Privilege
Give API keys only the permissions they actually need:
- Use read-only keys for LineageBridge (we don't modify your data)
- Don't use OrgAdmin keys if EnvironmentAdmin will do
- Scope keys to specific environments when possible
Rotate Credentials Regularly
Change your API keys and tokens periodically, especially:
- After someone leaves the team
- If credentials might have been exposed (chat logs, screenshots, etc.)
- Every 90 days as good security hygiene
Use Service Accounts in Production
For production deployments, use machine accounts instead of personal credentials:
Use service accounts instead of your personal API keys:
- In Confluent Cloud, go to Accounts & access
- Create a service account (e.g.,
lineage-bridge-prod) - Generate API keys for that service account
- Use those keys in your production
.env
Use service principals instead of personal access tokens:
- Create a service principal in Databricks
- Grant it the necessary permissions
- Generate a token for the service principal
- Use that token in your
.env
Use IAM roles instead of access keys when running on AWS infrastructure (EC2, ECS, Lambda):
- Attach an IAM role to your compute resource
- Grant the role
glue:*permissions - No credentials needed in
.env!
Encrypt Secrets at Rest (Production)
For production deployments, consider using a secrets manager:
- AWS Secrets Manager - Great if you're on AWS
- HashiCorp Vault - Works everywhere
- Kubernetes Secrets - If you're running on K8s (enable encryption at rest)
Then load secrets at runtime instead of using .env files.
Next Steps
Now that your credentials are configured:
- Quickstart → - Extract your first lineage graph
- Credential Management → - Advanced credential management patterns
Storage Backend
LineageBridge persists graphs / extraction tasks / OpenLineage events / watchers via a pluggable storage layer. Pick a backend with LINEAGE_BRIDGE_STORAGE__BACKEND (note the double underscore — it's the pydantic-settings nested-config delimiter):
# Default — process-local, lost on restart. Fine for the UI demo and tests.
LINEAGE_BRIDGE_STORAGE__BACKEND=memory
# JSON files under LINEAGE_BRIDGE_STORAGE__PATH (default: ~/.lineage_bridge/storage).
# Each graph is one file; events.jsonl is append-only. Cross-process safe via flock.
LINEAGE_BRIDGE_STORAGE__BACKEND=file
LINEAGE_BRIDGE_STORAGE__PATH=~/.lineage_bridge/storage
# Single-file SQLite at {path}/storage.db. WAL mode, durable across restarts.
# Recommended for the watcher (the watcher's daemon-mode state lives here so
# the UI sees it after a restart).
LINEAGE_BRIDGE_STORAGE__BACKEND=sqlite
LINEAGE_BRIDGE_STORAGE__PATH=~/.lineage_bridge/storage
REST API Endpoint (UI ↔ API)
When the Streamlit UI talks to the FastAPI process (e.g. for the watcher controls), it resolves the URL from LINEAGE_BRIDGE_API_URL. Default is http://127.0.0.1:8000. Set it when the API runs in a separate container:
Troubleshooting
Credential Not Found Errors
If you see "Cloud API key not configured":
- Verify
.envexists in the project root - Check variable names use the
LINEAGE_BRIDGE_prefix - Ensure no extra spaces around
=in.env - Try explicitly loading:
export $(cat .env | xargs)
Authentication Failures
If you get 401 Unauthorized errors:
- Verify credentials are valid in Confluent Cloud UI
- Check the API key has appropriate permissions
- Ensure you're using the correct credential type (cloud vs. cluster-scoped)
Schema Registry Issues
If Schema Registry extraction fails:
- Ensure the endpoint URL is correct
- Verify the API key has Schema Registry access
- Try omitting the endpoint to use auto-discovery
For more help, see Credential Issues.