


Marquez Monthly Community Meeting

The Marquez Community Meeting occurs on the fourth Thursday of each month. Meetings are held on Zoom.

Next meeting: April 28, 2022

March 31, 2022

Attendees:

TSC:

  • Willy Lulciuc, Co-creator of Marquez
  • Michael Collado, Staff Engineer, Astronomer
  • Julien Le Dem, Chief Architect, Astronomer
  • Peter Hicks, Senior Engineer, Astronomer

And:

  • Ross Turk, Sr. Director of Community, Astronomer
  • Minkyu Park, Senior Engineer, Astronomer
  • John Thomas, Support Engineer, Astronomer
  • Michael Robinson, Developer Relations Engineer, Astronomer
  • Howard Yoo, Staff Product Manager, Astronomer

Agenda:

  • Website update
  • Backlog and roadmap discussion
  • Open discussion

Meeting:

Notes:

Announcements [Michael R.]

  • Marquez stickers are now available: https://www.astronomer.io/datakin-swag
  • Willy and Julien gave a talk on OpenLineage, Airflow and Marquez at Data Council Austin on March 23
  • The project's GitHub star count stands at 983. Have you starred the project yet?
  • Reaching 1,000 stars is a requirement for graduation from the LF AI Foundation. The project is close to meeting all graduation requirements, so a formal application should be possible soon.

Website [Ross]

  • The project now has a new website.
  • Fittingly, the website is itself an open-source project; PRs are welcome.
  • Tech: Gatsby, GitHub Projects
  • Dev: run yarn deploy to work on the site
  • Plans: a blog page. Post proposals are welcome; share them in Slack or open a PR.

Backlog and roadmap [Willy]

  • Issue: PRs are currently driven by a small team (e.g., Peter's view for dataset versions, Pawel's lifecycle PR).
  • How can we get the broader community involved? We want people to have more input into, and control over, the issues we take up.
  • Solution: GitHub's roadmap feature. Milestones and releases are visible there; choose Marquez on the Projects tab.
  • Process: review issues monthly, move them to the roadmap, then release.
  • Question from Howard about how to propose new features
  • Follow-up work: discuss how to prioritize issues; document how to label new issues (e.g., as "features")
  • Comment from Michael C.: in addition to adding new issues, it's possible to add new columns to the roadmap.

Open discussion

  • Michael C.: please note issue #1928: supporting job grouping and hierarchy.
    • Problem: the project does not track parent/child job relationships, despite this nomenclature being used in OpenLineage to describe related jobs.
    • Proposal: add a parent_job_id column, of type UUID, to both the jobs table and the runs table.
  • Michael R.: please note that the meeting typically takes place on the 4th Thursday of each month.
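To make the proposal in issue #1928 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema is hypothetical and simplified (only a jobs table, with UUIDs stored as TEXT because SQLite has no UUID type), not Marquez's actual schema, and the job names are illustrative. It shows how a nullable parent_job_id column lets a recursive query recover a job's full hierarchy:

```python
import sqlite3
import uuid

# Hypothetical, simplified schema: only the columns needed to show the idea.
# UUIDs are stored as TEXT because SQLite has no native UUID type.
SCHEMA = """
CREATE TABLE jobs (
    uuid          TEXT PRIMARY KEY,
    name          TEXT NOT NULL,
    parent_job_id TEXT REFERENCES jobs(uuid)  -- NULL for top-level jobs
);
"""


def job_path(conn: sqlite3.Connection, job_id: str) -> list[str]:
    """Return job names from the top-level ancestor down to job_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(uuid, name, parent_job_id, depth) AS (
            SELECT uuid, name, parent_job_id, 0 FROM jobs WHERE uuid = ?
            UNION ALL
            SELECT j.uuid, j.name, j.parent_job_id, a.depth + 1
            FROM jobs j JOIN ancestors a ON j.uuid = a.parent_job_id
        )
        SELECT name FROM ancestors ORDER BY depth DESC
        """,
        (job_id,),
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# An Airflow-style hierarchy: a DAG (parent job) with one task (child job).
dag_id, task_id = str(uuid.uuid4()), str(uuid.uuid4())
conn.execute("INSERT INTO jobs VALUES (?, ?, NULL)", (dag_id, "etl_dag"))
conn.execute("INSERT INTO jobs VALUES (?, ?, ?)", (task_id, "etl_dag.load", dag_id))

print(job_path(conn, task_id))  # ['etl_dag', 'etl_dag.load']
```

Adding the same column to the runs table, as proposed, would allow a similar walk starting from a run.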

February 24, 2022

Attendees:

TSC:

  • Willy Lulciuc, Co-creator of Marquez
  • Michael Collado, Staff Engineer, Datakin

And:

  • Minkyu Park, Senior Engineer, Datakin
  • Michael Robinson, Developer Relations Engineer, Datakin
  • Ross Turk, VP of Marketing, Datakin

Agenda:

  • Review of the integrations that create runs and associate metadata with them (now replaced by OpenLineage)
  • Demo: How to collect OpenLineage events with the lineage API to send metadata to Marquez
  • Demo: OL Java client
  • Dataset lifecycle management
  • Open discussion
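The lineage API demo isn't written up in these notes. As a rough sketch of the idea, the snippet below builds a minimal OpenLineage run event and prepares a POST to Marquez's /api/v1/lineage endpoint using only the Python standard library. It assumes a Marquez API at http://localhost:5000; the namespace, job and dataset names, and producer URI are placeholders, and the exact set of required fields depends on the OpenLineage spec version, so check the spec before relying on it.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone

# Assumption: a Marquez API reachable at this URL; adjust for your deployment.
MARQUEZ_URL = "http://localhost:5000"


def build_run_event(event_type: str, run_id: str, namespace: str, job_name: str) -> dict:
    """Build a minimal OpenLineage run event (optional facets omitted)."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        # Hypothetical input and output datasets, for illustration only.
        "inputs": [{"namespace": namespace, "name": "public.orders"}],
        "outputs": [{"namespace": namespace, "name": "public.orders_7_days"}],
        # Placeholder URI identifying whatever emits the event.
        "producer": "https://example.com/my-producer",
    }


def lineage_request(event: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST of the event to /api/v1/lineage."""
    return urllib.request.Request(
        f"{MARQUEZ_URL}/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


run_id = str(uuid.uuid4())
req = lineage_request(build_run_event("START", run_id, "my_namespace", "my_job"))
# urllib.request.urlopen(req)  # uncomment to send to a running Marquez instance
```

A real job would typically send a START event when it begins and a COMPLETE event with the same runId when it finishes.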

Meeting:

Notes:

Announcements [Willy]

  • The release date of 0.21.0 is now February 28
  • Confusion in the community about which Java client to use is being addressed in OpenLineage PR #480
    • We hope to have this merged for the next OL release

Integrations and OL demo [Willy]

  • OL Airflow integration
    • Available at openlineage.io/integration/, where you can also find instructions for installing and configuring it
    • requirements.txt needs to install Airflow
    • Set the OpenLineage URL to your local Marquez instance
    • Marquez is moving towards using a task listener to pull metadata in real time
    • For now, use the OL Airflow DAG
    • You can still use the OL backend, though it has limitations
  • Spark integration
    • When running the spark-submit command, you need to provide configuration that specifies the extra listener (thanks to Michael C. for his work on this)
    • Point the host to your deployment
    • See the OL website for more details (openlineage.io/integration/spark-spark)
  • Upcoming: Flink and Kafka
  • Your feedback on these integrations is appreciated
  • Switching over to OL to collect metadata gives your platform many connections to use
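As a sketch of the Spark configuration described above, the helper below (a hypothetical function, not part of any OpenLineage tooling) assembles a spark-submit command that registers the OpenLineage listener and points it at a Marquez deployment. The property names follow the OpenLineage Spark integration as documented around early 2022; later releases changed some keys, so treat them as assumptions and confirm against the current docs. The application name, Marquez URL, namespace, and package version are placeholders.

```python
import shlex

# Listener class provided by the OpenLineage Spark integration.
LISTENER = "io.openlineage.spark.agent.OpenLineageSparkListener"


def spark_submit_args(app: str, marquez_host: str, namespace: str, ol_package: str) -> list[str]:
    """Build a spark-submit command that registers the OpenLineage listener."""
    # Assumption: early-2022 property names; newer releases use different keys.
    conf = {
        "spark.jars.packages": ol_package,        # pulls in the OpenLineage Spark jar
        "spark.extraListeners": LISTENER,         # the extra listener
        "spark.openlineage.host": marquez_host,   # point the host at your deployment
        "spark.openlineage.namespace": namespace,
    }
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args + [app]


cmd = spark_submit_args(
    "my_job.py",                                   # hypothetical application
    "http://localhost:5000",                       # hypothetical Marquez URL
    "spark_integration",                           # hypothetical namespace
    "io.openlineage:openlineage-spark:<version>",  # fill in a real version
)
print(shlex.join(cmd))
```

Passing each property with --conf keeps the listener setup out of application code, which is what lets the integration capture lineage from existing jobs.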

OL Java client demo [Willy]

Dataset lifecycle management [Willy]

  • Marquez can now capture changes to dataset names
  • The community had voiced a desire for this feature
  • Marquez now supports soft deletes of datasets
  • See PR #1847
  • Lifecycle support is now more concrete: you can see the phases datasets go through
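Soft deletion generally means flagging a record as deleted instead of removing it, so history that references it stays intact. The sketch below illustrates that general pattern with sqlite3; the is_deleted column and the helper functions are hypothetical and do not describe Marquez's actual schema or the change in PR #1847.

```python
import sqlite3

# Hypothetical schema: is_deleted marks a dataset as soft-deleted,
# so its row is kept rather than dropped.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasets (name TEXT PRIMARY KEY, is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO datasets (name) VALUES (?)", [("orders",), ("customers",)])


def soft_delete(conn: sqlite3.Connection, name: str) -> None:
    """Hide a dataset without removing its row."""
    conn.execute("UPDATE datasets SET is_deleted = 1 WHERE name = ?", (name,))


def list_datasets(conn: sqlite3.Connection) -> list[str]:
    """List only datasets that have not been soft-deleted."""
    rows = conn.execute("SELECT name FROM datasets WHERE is_deleted = 0 ORDER BY name")
    return [name for (name,) in rows]


soft_delete(conn, "orders")
print(list_datasets(conn))  # ['customers']
# The "orders" row still exists, so it could be restored by resetting the flag.
```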

Open discussion

  • Julien and Willy will be speaking in person at the Data Council conference in Austin next month (March 23-24)
  • Michael C. will be presenting virtually on Spark at the Subsurface LIVE conference (March 2-3)

January 27, 2022

Attendees:

TSC:

  • Willy Lulciuc, Co-creator of Marquez
  • Julien Le Dem, CTO of Datakin
  • Michael Collado, Staff Engineer, Datakin
  • Peter Hicks, Senior Engineer, Datakin
  • Kevin Mellott, Assistant Director of Data Engineering, Northwestern Mutual

And:

  • Ross Turk, VP of Marketing, Datakin
  • Minkyu Park, Senior Engineer, Datakin
  • John Thomas, Support Engineer, Datakin
  • Michael Robinson, Developer Relations Engineer, Datakin

Agenda:

  • Overview of recent Marquez releases [Willy]
    • Marquez release 0.21.0 overview
      • Upgrade to Java 17
  • Migrating integrations to OpenLineage [Willy]
  • Cloud-based development instance of Marquez via Gitpod [Peter]
  • Open discussion

Meeting:

Notes:

0.21.0 overview [Willy]

  • Features:
    • Bug fixes
    • Removal of excess code
    • Upgrade to Java 17
      • API image migrated
      • Eclipse Temurin integrated
      • All CI deployments updated to support Java 17
  • Discussion [Kevin, Willy, Michael C.]:
    • Supporting the Java client on a lower Java version may be possible
    • Proposed: schedule a separate meeting about this

Migrating integrations to OpenLineage [Willy]

  • The Spark library in Marquez is now deprecated
  • The OpenLineage Spark integration is recommended going forward
    • review the docs on how to configure your instance
    • remember to add the underscore to marquez_airflow
  • The OpenLineage integration supports a task listener
    • workaround: import DAG from OpenLineage
  • See the changelog: environment variables for the Airflow instance have changed

Cloud-based development instance of Marquez [Peter]

  • Enabled by the Gitpod integration
  • Docker image in the cloud with Marquez and the UI
  • Ideal for those not ready to install everything locally or who are having issues with their OS
  • Fast (30 seconds), eliminates risk
  • API also available
  • Can be made private or public
  • Big advantage: shareable within organizations via URL
  • Supports everything one could do locally in VS Code or a similar IDE
  • Discussion [Willy, Peter, Kevin, Julien]:
    • common use case: potential users want to see metadata from their org and share the tool
    • potential side effect: increase in Docker pulls
    • availability of metrics unknown
    • email address required

Open Discussion

  • Advantages of a possible move from CircleCI to GitHub Actions
    • CircleCI downsides: outages, billing issues [Willy]
    • Julien proposed: eventually moving to GitHub Actions after running both in parallel
    • Kevin asked to experiment with GitHub Actions and report back
  • Issue #1800: add support for table operations reported from OpenLineage
    • Formal solution needed [Willy]
    • Willy proposed: deploy in two modes and use flags (Julien agreed)
  • NodeID
    • An easy win: add a field that returns a nodeID [Willy]
    • Willy proposed: prioritize this in the next release
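The notes don't specify what the proposed nodeID field would contain. Assuming it follows the type:namespace:name format that Marquez's GET /api/v1/lineage endpoint accepts in its nodeId parameter, a node ID and the corresponding lineage query could be built as follows. The helper names are hypothetical, and the namespace and dataset are example values.

```python
from urllib.parse import urlencode


def node_id(node_type: str, namespace: str, name: str) -> str:
    """Compose a node ID in the assumed "<type>:<namespace>:<name>" format."""
    if node_type not in ("dataset", "job"):
        raise ValueError(f"unsupported node type: {node_type}")
    return f"{node_type}:{namespace}:{name}"


def lineage_url(base_url: str, nid: str) -> str:
    """Build a lineage query URL for a node; urlencode percent-encodes the colons."""
    return f"{base_url}/api/v1/lineage?{urlencode({'nodeId': nid})}"


nid = node_id("dataset", "food_delivery", "public.delivery_7_days")
print(nid)  # dataset:food_delivery:public.delivery_7_days
print(lineage_url("http://localhost:5000", nid))
# http://localhost:5000/api/v1/lineage?nodeId=dataset%3Afood_delivery%3Apublic.delivery_7_days
```

Returning such an ID directly in API responses would save clients from composing it themselves.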

Marquez Workflow Group Calendar Overview

Effective March 22, 2019, group calendars are managed within LF AI Foundation Groups.io subgroups (mail lists), each with its own group calendar. Meeting invites from these group calendars are sent to the applicable subgroup (mail list). To see the various group calendars, you must subscribe to them:

View Instructions on How to Subscribe to LF AI Group Calendars

For detailed information on LF AI meeting management processes view this page: LF AI Foundation - Community Meetings and Calendars



Marquez Meetings List

  • Schedule: Day of Week (frequency) 00:00 AM/PM - 00:00 AM/PM (timezone)
  • Title: Meeting Title (Zoom Account Used)
  • Owner: Meeting Owner/Moderator
  • Subgroup (mail list): marquez-mail-list@lists.lfai.foundation
  • Purpose: Meeting Purpose
  • Dial In Link: Zoom Name: https://zoom.us/...