Docker Deployment
Run LineageBridge in Docker containers for production deployments, reproducible extractions, and multi-stage workflows.
Overview
LineageBridge provides a multi-stage Dockerfile and Docker Compose profiles for different use cases:
- ui — Run the Streamlit UI in a container
- extract — One-shot extraction, then exit
- watch — Continuous change-detection watcher
- api — REST API server (future)
This guide shows you how to build images, run containers, and deploy at scale.
Quick Start
Build the Image
$ docker build -t lineage-bridge:latest -f infra/docker/Dockerfile .
Run this from the project root. It builds the image using the Dockerfile at infra/docker/Dockerfile.
Expected output:
[+] Building 45.2s (12/12) FINISHED
=> [builder 1/4] FROM python:3.11-slim
=> [builder 2/4] COPY pyproject.toml .
=> [builder 3/4] COPY lineage_bridge/ lineage_bridge/
=> [builder 4/4] RUN pip install build && python -m build
=> [stage-1 1/5] FROM python:3.11-slim
=> [stage-1 2/5] COPY --from=builder /app/dist/*.whl /tmp/
=> [stage-1 3/5] RUN pip install /tmp/*.whl && rm /tmp/*.whl
=> [stage-1 4/5] COPY lineage_bridge/ lineage_bridge/
=> [stage-1 5/5] RUN mkdir -p /app/data
=> naming to docker.io/library/lineage-bridge:latest
Run the UI
$ docker compose -f infra/docker/docker-compose.yml --profile ui up
This starts the Streamlit UI at http://localhost:8501.
Expected output:
Run Extraction
$ docker compose -f infra/docker/docker-compose.yml --profile extract up
This runs a one-shot extraction and saves the graph to ./data/lineage_graph.json.
Expected output:
[+] Running 1/1
✔ Container infra-extract-1 Started
[INFO] Discovering environments...
[INFO] Found 2 environments
[INFO] Extracting from env-abc123...
[INFO] Extraction complete (87 nodes, 134 edges)
[INFO] Graph saved to /app/data/lineage_graph.json
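To sanity-check the saved file without opening the UI, a short script can load it and report its size. This is a sketch: the graph's JSON schema is not documented here, so it assumes (hypothetically) that nodes and edges are lists under top-level `nodes` and `edges` keys, and reports `None` rather than guessing when they are absent. The function name `summarize_graph` is illustrative.

```python
import json
import sys
from pathlib import Path


def summarize_graph(path: Path) -> dict:
    """Load a lineage graph JSON file and return a small summary.

    ASSUMPTION: nodes and edges are stored as lists under top-level "nodes"
    and "edges" keys. If a key is missing, its count is reported as None.
    """
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return {"top_level_keys": [], "nodes": None, "edges": None}
    summary = {"top_level_keys": sorted(data)}
    for key in ("nodes", "edges"):
        value = data.get(key)
        summary[key] = len(value) if isinstance(value, list) else None
    return summary


if __name__ == "__main__":
    # Default path is the host side of the ./data bind mount.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("data/lineage_graph.json")
    if target.is_file():
        print(summarize_graph(target))
    else:
        print(f"{target} not found; run an extraction first")
```

If the counts match the "Extraction complete" log line, the file on the host is the one the container wrote.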
Run the Watcher
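$ LINEAGE_BRIDGE_WATCH_ENV=env-abc123 docker compose -f infra/docker/docker-compose.yml --profile watch up
This starts the change-detection watcher in the foreground. Press Ctrl+C to stop.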
Expected output:
[+] Running 1/1
✔ Container infra-watch-1 Started
Polling Confluent Cloud every 10s (cooldown: 30s)
Press Ctrl+C to stop
Docker Compose Profiles
The docker-compose.yml file defines three profiles:
Profile: ui
Purpose: Run the Streamlit UI.
Service definition:
ui:
  build:
    context: ../..
    dockerfile: infra/docker/Dockerfile
  ports:
    - "8501:8501"
  env_file: ../../.env
  environment:
    - LINEAGE_BRIDGE_EXTRACT_OUTPUT_PATH=/app/data/lineage_graph.json
  volumes:
    - ../../data:/app/data
  profiles:
    - ui
Key features:
- Exposes port 8501 (Streamlit default)
- Mounts ./data for graph persistence
- Loads credentials from .env
Run it:
$ docker compose -f infra/docker/docker-compose.yml --profile ui up
Profile: extract
Purpose: One-shot extraction, then exit.
Service definition:
extract:
  build:
    context: ../..
    dockerfile: infra/docker/Dockerfile
  command: ["lineage-bridge-extract"]
  env_file: ../../.env
  environment:
    - LINEAGE_BRIDGE_EXTRACT_OUTPUT_PATH=/app/data/lineage_graph.json
  volumes:
    - ../../data:/app/data
  profiles:
    - extract
Key features:
- Runs lineage-bridge-extract and exits
- Saves output to ./data/lineage_graph.json
- Use run --rm to remove the container after completion
Run it:
$ docker compose -f infra/docker/docker-compose.yml --profile extract run --rm extract
Profile: watch
Purpose: Continuous change-detection watcher.
Service definition:
watch:
  build:
    context: ../..
    dockerfile: infra/docker/Dockerfile
  command:
    - lineage-bridge-watch
    - --env
    - ${LINEAGE_BRIDGE_WATCH_ENV:-env-change-me}
    - --cooldown
    - "${LINEAGE_BRIDGE_WATCH_COOLDOWN:-30}"
  env_file: ../../.env
  volumes:
    - ../../data:/app/data
  profiles:
    - watch
Key features:
- Polls Confluent Cloud every 10 seconds (default)
- Re-extracts lineage 30 seconds after the last detected change (cooldown)
- Runs in the foreground (use docker compose up -d to run it in the background)
Run it:
$ LINEAGE_BRIDGE_WATCH_ENV=env-abc123 docker compose -f infra/docker/docker-compose.yml --profile watch up
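The polling and cooldown behavior listed under Key features works like a debounce: a burst of changes produces a single re-extraction once the cooldown has passed since the most recent change. The sketch below models that rule with a simulated clock so it can be reasoned about without Confluent Cloud. The class and function names are hypothetical, and this is an illustration of the idea, not the watcher's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class CooldownDebouncer:
    """Hypothetical poll-plus-cooldown rule (not LineageBridge's code).

    Each poll reports whether a change was seen. Extraction fires only once
    `cooldown` seconds have passed since the most recent change.
    """

    cooldown: float = 30.0
    last_change: Optional[float] = None

    def on_poll(self, now: float, changed: bool) -> bool:
        """Record one poll result; return True if extraction should run now."""
        if changed:
            # A new change restarts the cooldown window.
            self.last_change = now
            return False
        if self.last_change is not None and now - self.last_change >= self.cooldown:
            self.last_change = None  # the pending change is now handled
            return True
        return False


def simulate(change_times: Set[float], interval: float = 10.0,
             cooldown: float = 30.0, duration: float = 120.0) -> List[float]:
    """Poll every `interval` seconds; return the times re-extraction fires."""
    debouncer = CooldownDebouncer(cooldown=cooldown)
    fired: List[float] = []
    t = 0.0
    while t <= duration:
        if debouncer.on_poll(t, t in change_times):
            fired.append(t)
        t += interval
    return fired


if __name__ == "__main__":
    # Changes seen at 10s and 20s collapse into one extraction at 50s.
    print(simulate({10.0, 20.0}))  # [50.0]
```

With the defaults (10-second polls, 30-second cooldown), a change detected at 20 seconds triggers extraction on the first poll at least 30 seconds later, which is 50 seconds.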
Environment Variables
Required
All services need a Confluent Cloud API key and secret:
# In .env file
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY=your-cloud-api-key
LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET=your-cloud-api-secret
Optional
Override extraction settings:
# Output path (default: /app/data/lineage_graph.json)
LINEAGE_BRIDGE_EXTRACT_OUTPUT_PATH=/app/data/custom.json
# Watcher settings
LINEAGE_BRIDGE_WATCH_ENV=env-abc123
LINEAGE_BRIDGE_WATCH_COOLDOWN=60
# Catalog credentials (for enrichment)
LINEAGE_BRIDGE_DATABRICKS_WORKSPACE_URL=https://myworkspace.cloud.databricks.com
LINEAGE_BRIDGE_DATABRICKS_TOKEN=dapi...
LINEAGE_BRIDGE_AWS_REGION=us-east-1
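Before starting a container, it can help to confirm that the required keys are present and no longer hold the placeholder values shown above. The following is a simplified sketch: the function names are hypothetical, and the parser handles only plain KEY=VALUE lines and # comments, not quoting or variable expansion.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = (
    "LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_KEY",
    "LINEAGE_BRIDGE_CONFLUENT_CLOUD_API_SECRET",
)
# Placeholder values taken from the examples in this guide.
PLACEHOLDERS = {"your-cloud-api-key", "your-cloud-api-secret"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments.

    Deliberately simplified: no quote handling or variable expansion.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def problem_keys(values: Dict[str, str]) -> List[str]:
    """Return required keys that are missing, empty, or still a placeholder."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key] in PLACEHOLDERS
    ]


if __name__ == "__main__":
    env_path = Path(".env")  # run from the project root
    if not env_path.is_file():
        print(".env not found in the current directory")
    else:
        problems = problem_keys(parse_env(env_path.read_text(encoding="utf-8")))
        print("OK" if not problems else "Fix: " + ", ".join(problems))
```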
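Passing Environment Variables
Compose substitutes shell variables into ${...} references in docker-compose.yml, which is how the watch service receives LINEAGE_BRIDGE_WATCH_ENV and LINEAGE_BRIDGE_WATCH_COOLDOWN. Set them inline to override them for a single command:
$ LINEAGE_BRIDGE_WATCH_ENV=env-abc123 LINEAGE_BRIDGE_WATCH_COOLDOWN=60 docker compose -f infra/docker/docker-compose.yml --profile watch up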
Volume Mounts
Data Directory
The ./data directory is mounted to /app/data in the container:
volumes:
  - ../../data:/app/data
What it contains:
- lineage_graph.json: extracted lineage graph
- lineage_graph_uc.json: Unity Catalog (UC) lineage export (if using Databricks)
- Temporary files (auto-cleaned)
Persistence: Data survives container restarts and removals.
.env File
The .env file is loaded via env_file:
env_file: ../../.env
Security note: Do NOT mount .env as a volume in production. Use Docker secrets or environment variables instead.
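Both ../../data and ../../.env are resolved by Compose against the directory that contains docker-compose.yml (by default), not against the directory you run the command from. The sketch below reproduces that lexical resolution with posixpath and maps a container path back to its host path through a bind mount. Function names are hypothetical and POSIX-style paths are assumed.

```python
import posixpath


def resolve_from_compose(compose_file: str, relative: str) -> str:
    """Resolve a relative path from docker-compose.yml against the file's directory.

    Compose resolves relative paths (build context, env_file, bind-mount
    sources) against the compose file's directory by default, not the shell's
    working directory. The result is relative to wherever compose_file is.
    """
    return posixpath.normpath(posixpath.join(posixpath.dirname(compose_file), relative))


def host_path_for(container_path: str, volume: str, compose_file: str) -> str:
    """Map a path inside the container to the host path behind a bind mount."""
    source, target = volume.split(":")[:2]  # ignores options such as ":ro"
    rel = posixpath.relpath(container_path, target)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{container_path} is not under {target}")
    return posixpath.normpath(posixpath.join(resolve_from_compose(compose_file, source), rel))


if __name__ == "__main__":
    compose = "infra/docker/docker-compose.yml"  # relative to the project root
    print(resolve_from_compose(compose, "../../.env"))  # .env
    print(host_path_for("/app/data/lineage_graph.json", "../../data:/app/data", compose))
    # data/lineage_graph.json
```

Both results are relative to the project root, which is why .env and data/ must live there.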
Dockerfile Deep Dive
The Dockerfile uses a multi-stage build to minimize image size.
Stage 1: Builder
FROM python:3.11-slim AS builder
WORKDIR /app
COPY pyproject.toml .
COPY lineage_bridge/ lineage_bridge/
RUN pip install --no-cache-dir build && \
    python -m build --wheel --outdir /app/dist
What it does:
- Copies the source code and pyproject.toml
- Builds a wheel package
- Stores it in /app/dist
Stage 2: Runtime
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /app/dist/*.whl /tmp/
RUN pip install --no-cache-dir /tmp/*.whl && rm /tmp/*.whl
COPY lineage_bridge/ lineage_bridge/
RUN mkdir -p /app/data
RUN mkdir -p /root/.streamlit && \
    printf '[server]\nheadless = true\nport = 8501\naddress = "0.0.0.0"\n\n[browser]\ngatherUsageStats = false\n' \
    > /root/.streamlit/config.toml
EXPOSE 8501
CMD ["streamlit", "run", "lineage_bridge/ui/app.py"]
What it does:
- Installs the wheel from the builder stage
- Copies source code (needed for Streamlit to import modules)
- Creates the /app/data directory
- Configures Streamlit for headless mode
- Sets the default command to run the UI
Image size: ~500 MB (Python 3.11 base + dependencies)
Multi-Stage Deployment
Development: Local Extraction + Containerized UI
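Extract locally, view in Docker:
$ LINEAGE_BRIDGE_EXTRACT_OUTPUT_PATH=data/lineage_graph.json lineage-bridge-extract
$ docker compose -f infra/docker/docker-compose.yml --profile ui up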
Why: Faster iteration during development (no image rebuild).
Staging: Scheduled Extraction + Persistent UI
Run extraction on a cron schedule, keep UI running:
# Cron job (runs every hour)
0 * * * * cd /path/to/lineage-bridge && make docker-extract
# Long-running UI
$ docker compose -f infra/docker/docker-compose.yml --profile ui up -d
Why: Fresh lineage data every hour, always-on UI.
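With an hourly cron job, a silently failing extraction leaves the UI showing old data. A minimal staleness check on the output file's modification time can catch that. This is a sketch: the 15-minute grace period on top of the hourly schedule is an assumption to tune, and `is_stale` is a hypothetical name.

```python
import time
from pathlib import Path
from typing import Optional

# Hourly cron schedule plus a 15-minute grace period for the run itself.
# The grace period is an assumption; tune it to your extraction time.
MAX_AGE_SECONDS = 60 * 60 + 15 * 60


def is_stale(path: Path, max_age: float = MAX_AGE_SECONDS,
             now: Optional[float] = None) -> bool:
    """Return True if the graph file is missing or older than max_age seconds."""
    if not path.is_file():
        return True
    current = time.time() if now is None else now
    return current - path.stat().st_mtime > max_age


if __name__ == "__main__":
    graph = Path("data/lineage_graph.json")  # host side of the ./data mount
    print("stale" if is_stale(graph) else "fresh")
```

The optional `now` argument exists only so the check can be exercised with fixed timestamps.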
Production: Watcher + UI
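Run both watcher and UI:
$ LINEAGE_BRIDGE_WATCH_ENV=env-abc123 docker compose -f infra/docker/docker-compose.yml --profile ui --profile watch up -d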
Why: Automatic re-extraction on changes, always-on UI.
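Compose decides which services to start from the set of active profiles. A small model of that rule, using the three profiles from this guide, shows why activating ui and watch together starts exactly those two services. This is an illustration of the documented rule, not Compose's own code, and the function name is hypothetical.

```python
from typing import Dict, List, Set

# Profiles as declared for each service in infra/docker/docker-compose.yml.
SERVICE_PROFILES: Dict[str, List[str]] = {
    "ui": ["ui"],
    "extract": ["extract"],
    "watch": ["watch"],
}


def enabled_services(service_profiles: Dict[str, List[str]], active: Set[str]) -> List[str]:
    """Apply Compose's profile rule to decide which services start.

    A service with no profiles is always enabled; a service with profiles is
    enabled only if at least one of them is active.
    """
    return sorted(
        name
        for name, profiles in service_profiles.items()
        if not profiles or any(p in active for p in profiles)
    )


if __name__ == "__main__":
    print(enabled_services(SERVICE_PROFILES, {"ui", "watch"}))  # ['ui', 'watch']
```

Compose also enables a profiled service when you name it explicitly on the command line (as with run --rm extract); the sketch leaves that case out.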
Resource Limits
Memory
LineageBridge is memory-efficient, but large graphs can use significant memory during rendering.
Recommended limits:
- Extract: 512 MB (sufficient for most clusters)
- UI: 1 GB (supports large graphs with 1000+ nodes)
- Watch: 512 MB (minimal overhead)
Set limits in docker-compose.yml (shown for the ui service):
ui:
  deploy:
    resources:
      limits:
        memory: 1G
CPU
LineageBridge is I/O-bound (network calls to Confluent Cloud). CPU usage is low.
Recommended limits:
- Extract: 0.5 CPU
- UI: 1 CPU (for responsive UI)
- Watch: 0.5 CPU
Set limits in docker-compose.yml (shown for the ui service):
ui:
  deploy:
    resources:
      limits:
        cpus: "1.0"
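When several services share one host, their limits add up. A quick tally of the recommended values above helps size the host; it assumes 1 GB is counted as 1024 MB, matching Docker's binary memory units. `total_limits` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple

# Recommended limits from this guide (memory in MB, CPU in cores).
# ASSUMPTION: 1 GB is counted as 1024 MB, matching Docker's binary units.
RECOMMENDED: Dict[str, Dict[str, float]] = {
    "extract": {"memory_mb": 512, "cpus": 0.5},
    "ui": {"memory_mb": 1024, "cpus": 1.0},
    "watch": {"memory_mb": 512, "cpus": 0.5},
}


def total_limits(services: Iterable[str]) -> Tuple[float, float]:
    """Sum memory (MB) and CPU limits for services running at the same time."""
    names = list(services)
    memory = sum(RECOMMENDED[name]["memory_mb"] for name in names)
    cpus = sum(RECOMMENDED[name]["cpus"] for name in names)
    return memory, cpus


if __name__ == "__main__":
    # Production layout from this guide: watcher plus UI.
    print(total_limits(["ui", "watch"]))  # (1536, 1.5)
```

The watcher-plus-UI layout therefore needs about 1.5 GB of memory and 1.5 CPUs at its limits.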
Optimization
Reduce Image Size
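- Use Alpine-based Python: replace python:3.11-slim with python:3.11-alpine in the Dockerfile.
  Trade-off: smaller image (~200 MB) but longer build time, because some dependencies must be compiled.
- Dev dependencies are already excluded: the wheel includes only runtime dependencies, so dev tools such as pytest and ruff never reach the image.
- Multi-arch builds: build for ARM64 and AMD64 with Buildx:
  $ docker buildx build --platform linux/amd64,linux/arm64 -t lineage-bridge:latest -f infra/docker/Dockerfile .
  Multi-platform builds typically need a docker-container builder (docker buildx create --use) and are pushed to a registry with --push.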
Cache Docker Layers
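Build with BuildKit caching. The BUILDKIT_INLINE_CACHE build argument embeds cache metadata in the image, which --cache-from needs in order to reuse its layers:
$ DOCKER_BUILDKIT=1 docker build --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from lineage-bridge:latest -t lineage-bridge:latest -f infra/docker/Dockerfile .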
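Or use a cache mount for pip in the Dockerfile (drop --no-cache-dir, otherwise pip never writes to the cache):
RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/*.whl && rm /tmp/*.whl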
Troubleshooting
"Could not find .env file"
Cause: .env doesn't exist or is not in the expected location.
Solution:
- Create .env from the example template.
- Verify the path in docker-compose.yml:
  env_file: ../../.env
  This path is relative to the docker-compose.yml file, not your current directory.
"Permission denied: /app/data/lineage_graph.json"
Cause: The ./data directory doesn't exist or has incorrect permissions.
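Solution:
- Create the directory (from the project root):
  $ mkdir -p data
- Set permissions on ./data so the container can write to it.
- Verify the mount in docker-compose.yml:
  volumes:
    - ../../data:/app/data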
"Streamlit is not accessible at localhost:8501"
Cause: The UI service didn't start, or the port is already in use.
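Solution:
- Check the container logs:
  $ docker compose -f infra/docker/docker-compose.yml --profile ui logs ui
- Check whether port 8501 is in use:
  $ lsof -i :8501
- Change the host port in docker-compose.yml, then open http://localhost:8502:
  ports:
    - "8502:8501"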
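Where lsof is unavailable, a plain TCP connection attempt answers the same question: whether anything is listening on the port. This sketch uses only the standard socket module; `port_accepting` is a hypothetical name, and a True result shows that some process is listening, not that it is Streamlit.

```python
import socket


def port_accepting(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only shows that a process is listening; it does not prove that
    the listener is Streamlit.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("8501 in use" if port_accepting("127.0.0.1", 8501) else "8501 free")
```

Run it before starting the ui profile to detect a port conflict, and again afterwards to confirm the container published the port.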
"Watcher exits immediately"
Cause: Missing LINEAGE_BRIDGE_WATCH_ENV environment variable.
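Solution:
- Set the environment variable:
  $ export LINEAGE_BRIDGE_WATCH_ENV=env-abc123
- Or pass it directly:
  $ LINEAGE_BRIDGE_WATCH_ENV=env-abc123 docker compose -f infra/docker/docker-compose.yml --profile watch up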
"Image build fails: no such file or directory"
Cause: Docker build context is incorrect.
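Solution:
- Verify the build context in docker-compose.yml:
  build:
    context: ../..
    dockerfile: infra/docker/Dockerfile
  The context is ../.. (the project root), not infra/docker.
- Build manually from the project root to debug:
  $ docker build -t lineage-bridge:latest -f infra/docker/Dockerfile .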
Next Steps
- Integrate with CI/CD to automate Docker builds and deployments
- Manage credentials securely using Docker secrets
- Set up multi-environment extraction in Docker containers