Contributing to LineageBridge

Thank you for your interest in contributing to LineageBridge! We welcome contributions of all kinds - bug reports, feature requests, documentation improvements, and code contributions.

Quick Start

Read the Development Setup guide
Check the Testing guide for running tests
If adding new extractors, see Adding Extractors
Submit your PR following the guidelines below

Code of Conduct

We are committed to providing a welcoming and inclusive environment. Please be respectful and constructive in all interactions.

How to Contribute

Reporting Bugs

If you find a bug, please open a GitHub issue with:

Clear title: Describe the issue concisely
Expected behavior: What you expected to happen
Actual behavior: What actually happened
Reproduction steps: Step-by-step instructions to reproduce the issue
Environment details: Python version, OS, LineageBridge version
Logs: Any relevant error messages or stack traces

Example:

Title: FlinkClient fails to parse TUMBLE windowing functions

Expected: Flink SQL with TUMBLE(TABLE ...) should extract table as source
Actual: Parser raises IndexError on TUMBLE clause
Steps: Run extraction on environment env-xyz with Flink statement flink-abc123
Environment: Python 3.11.5, macOS 14.2, LineageBridge 0.4.0
Logs: [paste stack trace]

Requesting Features

Feature requests are welcome! Please open an issue describing:

Use case: What problem does this solve?
Proposed solution: How should it work?
Alternatives considered: Other approaches you've thought about
Impact: Who would benefit from this feature?

Contributing Code

We follow a standard fork-and-pull-request workflow:

Fork the repository on GitHub

Clone your fork locally:

git clone https://github.com/YOUR-USERNAME/lineage-bridge.git
cd lineage-bridge

Create a feature branch from main:
```
git checkout -b feature/my-feature
```

Make your changes and commit with clear messages:

git add -A
git commit -m "Add support for Flink HOP windowing"

Run tests and linting before pushing:
```
make test
make lint
```
Push to your fork:
```
git push origin feature/my-feature
```
Open a Pull Request on GitHub

Pull Request Guidelines

Before Submitting

Run tests: make test or uv run pytest tests/ -v
Check linting: make lint or uv run ruff check .
Format code: make format or uv run ruff format .
Update documentation if adding features or changing behavior
Add tests for new functionality

PR Description

Your PR description should include:

What changed: Brief summary of the changes
Why: Rationale and motivation
How: Implementation approach (for complex changes)
Testing: How you tested the changes
Breaking changes: Note any API changes or migrations needed

Example:

## What
Add support for Google BigQuery as a catalog provider

## Why
Users want to materialize Kafka topics to BigQuery via Dataflow
and track lineage in BigQuery's native metadata system.

## How
- Implemented BigQueryProvider following the CatalogProvider protocol
- Added BigQuery client using google-cloud-bigquery SDK
- Set table descriptions and labels via BigQuery API

## Testing
- Unit tests with mocked BigQuery client
- Integration test against real BigQuery project (manual)
- Added fixture data in tests/fixtures/bigquery_tables.json

## Breaking Changes
None - purely additive.

Review Process

All PRs require review before merging
Reviewers may request changes - please respond constructively
Once approved, a maintainer will merge your PR
Keep PRs focused - one feature or fix per PR

Areas Needing Contribution

Here are some areas where we'd especially welcome contributions:

New Catalog Providers

Google BigQuery: Materialize to BigQuery via Dataflow
Snowflake: Support Snowpipe streaming
Azure Data Explorer: Kafka Connect integration
Iceberg REST Catalog: Direct lineage to Iceberg tables

See Adding Extractors for guidance.

Enhanced Extractors

Schema Registry: Subject references, schema evolution tracking
ksqlDB: Stream-table joins, windowing functions
Flink: Complex window patterns, catalog integrations
Tableflow: Multi-topic joins, transformation tracking

UI Improvements

Graph layout algorithms (force-directed, hierarchical)
Advanced filtering (by tags, metadata, date ranges)
Export formats (SVG, PNG, Mermaid diagram)
Diff view for lineage changes over time

Testing & Infrastructure

Integration test coverage for all extractors
Performance benchmarks for large graphs
Docker Compose local development setup
CI/CD pipeline improvements

Development Workflow

Setting Up Your Environment

See Development Setup for detailed instructions.

Quick version:

# Install dependencies
uv pip install -e ".[dev]"

# Run tests
make test

# Run UI locally
make ui

# Run extraction
make extract

Coding Standards

Python 3.11+ required
Type hints on all public functions
Docstrings for public modules, classes, functions
Ruff for linting and formatting (line length: 100 chars)
pytest for tests (asyncio_mode = "auto")

Commit Messages

Write clear, descriptive commit messages:

Use present tense: "Add feature" not "Added feature"
Keep first line under 72 characters
Add details in the body if needed

Good:

Add HOP windowing support to FlinkClient

Extends the Flink SQL parser to recognize HOP window functions
and extract the source table from HOP(TABLE <name>, ...).

Avoid:

Fix stuff
Updated flink
wip

Community & Support

GitHub Issues: Bug reports and feature requests
GitHub Discussions: Questions and general discussion
Pull Requests: Code contributions

License

By contributing to LineageBridge, you agree that your contributions will be licensed under the Apache License 2.0.

Ready to contribute? Start with the Development Setup guide, or browse open issues tagged with good-first-issue on GitHub.