DataHub Releases
Summary
Version | Release Date | Links |
---|---|---|
v0.12.0 | 2023-10-25 | Release Notes, View on GitHub |
v0.11.0 | 2023-09-08 | Release Notes, View on GitHub |
v0.10.5 | 2023-08-02 | Release Notes, View on GitHub |
v0.10.4 | 2023-06-09 | View on GitHub |
v0.10.3 | 2023-05-25 | View on GitHub |
v0.10.2 | 2023-04-13 | View on GitHub |
v0.10.1 | 2023-03-23 | View on GitHub |
v0.10.0 | 2023-02-07 | View on GitHub |
v0.9.6.1 | 2023-01-31 | View on GitHub |
v0.9.6 | 2023-01-13 | View on GitHub |
v0.9.5 | 2022-12-23 | View on GitHub |
v0.9.4 | 2022-12-20 | View on GitHub |
v0.9.3 | 2022-11-30 | View on GitHub |
v0.9.2 | 2022-11-04 | View on GitHub |
v0.9.1 | 2022-10-31 | View on GitHub |
v0.9.0 | 2022-10-11 | View on GitHub |
v0.8.45 | 2022-09-23 | View on GitHub |
v0.8.44 | 2022-09-01 | View on GitHub |
v0.8.43 | 2022-08-09 | View on GitHub |
v0.8.42 | 2022-08-03 | View on GitHub |
v0.8.41 | 2022-07-15 | View on GitHub |
v0.8.40 | 2022-06-30 | View on GitHub |
v0.8.39 | 2022-06-24 | View on GitHub |
v0.8.38 | 2022-06-09 | View on GitHub |
v0.8.37 | 2022-06-09 | View on GitHub |
v0.8.36 | 2022-06-02 | View on GitHub |
v0.8.35 | 2022-05-18 | View on GitHub |
v0.8.34 | 2022-05-04 | View on GitHub |
v0.8.33 | 2022-04-15 | View on GitHub |
v0.8.32 | 2022-04-04 | View on GitHub |
v0.12.0
Released on 2023-10-25 by @pedro93.
v0.12.0 Release Highlights
User Experience
Nested Domains
Nested Domains are here! This provides flexibility in organizing your entities within Domains to match the unique organizational structure of your company. <img width="1209" alt="CleanShot 2023-10-27 at 14 30 43@2x" src="https://github.com/datahub-project/datahub/assets/15873986/07e6754c-95cd-4552-8120-50bb2d3fa9ce">
DataHub Chrome Extension Improvements
The Acryl DataHub Chrome extension now supports PowerBI! This is a super powerful way for your business users to gain DataHub-specific insights directly in the BI tools they use most. Additionally, we now support making edits back to DataHub Entities directly from the Chrome extension.
Access Management Tab for Datasets
Shoutout to @Ramendra761 from the PayPal Team for contributing a new Access Management tab in Dataset Entity pages! The aim of this feature is to enable users to view the required roles for accessing the Dataset, as defined by Roles and/or Policies in the organization’s Access Management System. It also introduces the ability to request access directly from the page. <img width="912" alt="CleanShot 2023-10-27 at 14 09 51@2x" src="https://github.com/datahub-project/datahub/assets/15873986/29d7bdda-864f-4cf8-bd7a-5be46413bba8">
Metadata Ingestion
Miscellaneous Improvements
- Sampling-Based Profiling: You can now configure sampling-based profiling to address query performance concerns in Snowflake and BigQuery
- Kafka Connect > Snowflake: We now support automatically defining lineage between the two platforms
- Athena: Support for complex and nested schemas
Column-Level Lineage
We are incubating CLL support for the following:
- Airflow plugin v2 now supports automatic extraction of CLL for certain operators, removing the need to annotate DAGs
- dbt
- Redshift
- PowerBI (support for Column-Level Lineage for M-Query)
Incubating Sources
- MLflow
- Teradata
- Unity Catalog Notebooks
- DynamoDB
Developer Experience
- Data Contracts: v0.12.0 introduces underlying models and CLI; UI support to follow
- We now support creating custom models without requiring a fork of the main DataHub project
- Updates to support OpenSearch 2.x and alternate Postgres db in postgres-setup
Other Notable Changes
- Session token configuration has changed: all previously created session tokens will be invalid, and users will be prompted to log in again. The expiration time has also been shortened, which may result in more login prompts with the default settings. There should be no other interruption due to this change.
Breaking Changes
- #9044 - GraphQL APIs for adding ownership now expect either an `ownershipTypeUrn` referencing a custom ownership type or a (deprecated) `type`. Where before adding an ownership without a concrete type was allowed, this is no longer the case. For simplicity you can use the `type` parameter, which will get translated to a custom ownership type internally if one exists for the type being added.
- #9010 - In the Redshift source's config, `incremental_lineage` is now off by default.
- #8810 - Removed support for SQLAlchemy 1.3.x. Only SQLAlchemy 1.4.x is supported now.
- #8942 - Removed the `urn:li:corpuser:datahub` owner for the `Measure`, `Dimension` and `Temporal` tags emitted by the Looker and LookML source connectors.
- #8853 - The Airflow plugin no longer supports Airflow 2.0.x or Python 3.7. See the docs for more details.
- #8853 - Introduced the Airflow plugin v2. If you're using Airflow 2.3+, the v2 plugin will be enabled by default, and so you'll need to switch your requirements to include `pip install 'acryl-datahub-airflow-plugin[plugin-v2]'`. To continue using the v1 plugin, set the `DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN` environment variable to `true`.
- #8943 - The Unity Catalog ingestion source has a new option `include_metastore`, which will cause all urns to be changed when disabled. This is currently enabled by default to preserve compatibility, but will be disabled by default and then removed in the future. If stateful ingestion is enabled, simply setting `include_metastore: false` will perform all required cleanup. Otherwise, we recommend soft deleting all databricks data via the DataHub CLI: `datahub delete --platform databricks --soft` and then reingesting with `include_metastore: false`.
- #8846 - Changed enum values in resource filters used by policies. `RESOURCE_TYPE` became `TYPE` and `RESOURCE_URN` became `URN`. Any existing policies using these filters (i.e. defined for particular `urns` or `types` such as `dataset`) need to be upgraded manually, for example by retrieving their respective `dataHubPolicyInfo` aspect and changing the filter section, i.e.
"resources": {
"filter": {
"criteria": [
{
"field": "RESOURCE_TYPE",
"condition": "EQUALS",
"values": [
"dataset"
]
}
]
}
into
"resources": {
"filter": {
"criteria": [
{
"field": "TYPE",
"condition": "EQUALS",
"values": [
"dataset"
]
}
]
}
for example, using the `datahub put` command. Policies can also be removed and re-created via the UI.
- #9077 - The BigQuery ingestion source by default sets `match_fully_qualified_names: true`. This means that any `dataset_pattern` or `schema_pattern` specified will be matched on the fully qualified dataset name, i.e. `<project_name>.<dataset_name>`. We attempt to support the old pattern format by prepending `.*\\.` to dataset patterns lacking a period, so in most cases this should not cause any issues. However, if you have a complex dataset pattern, we recommend you manually convert it to the fully qualified format to avoid any potential issues.
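Two of the manual migrations above (#8846 and #9077) are mechanical enough to sketch in code. The following is a minimal, unofficial illustration in Python, not DataHub's own implementation: `upgrade_policy_filter` and `qualify_dataset_pattern` are hypothetical helper names, the policy is assumed to be available as a plain dict shaped like the `dataHubPolicyInfo` fragment shown above, and the pattern helper simply follows the "prepend to patterns lacking a period" rule as described.

```python
import copy
import re

# Enum renames described in #8846 (old value -> new value).
RENAMED_FIELDS = {"RESOURCE_TYPE": "TYPE", "RESOURCE_URN": "URN"}


def upgrade_policy_filter(policy_info: dict) -> dict:
    """Return a copy of a dataHubPolicyInfo-shaped dict with renamed filter fields.

    Hypothetical helper: it only rewrites resources.filter.criteria[*].field
    and leaves everything else untouched.
    """
    upgraded = copy.deepcopy(policy_info)
    resources = upgraded.get("resources") or {}
    criteria = (resources.get("filter") or {}).get("criteria") or []
    for criterion in criteria:
        old_field = criterion.get("field")
        if old_field in RENAMED_FIELDS:
            criterion["field"] = RENAMED_FIELDS[old_field]
    return upgraded


def qualify_dataset_pattern(pattern: str) -> str:
    """Prefix a pattern that lacks a period so it can match <project_name>.<dataset_name>.

    Hypothetical helper mirroring the rule described in #9077. Patterns that
    already contain a period are returned unchanged; complex patterns (for
    example, ones anchored with ^) should be converted by hand instead.
    """
    if "." in pattern:
        return pattern
    return r".*\." + pattern


old_policy = {
    "resources": {
        "filter": {
            "criteria": [
                {"field": "RESOURCE_TYPE", "condition": "EQUALS", "values": ["dataset"]}
            ]
        }
    }
}
new_policy = upgrade_policy_filter(old_policy)
print(new_policy["resources"]["filter"]["criteria"][0]["field"])  # TYPE

print(qualify_dataset_pattern("sales"))  # .*\.sales
print(bool(re.match(qualify_dataset_pattern("sales"), "my-project.sales")))  # True
```

The upgraded dict would still need to be written back to DataHub, for example with the `datahub put` command mentioned above; that step is not shown here.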
What's Changed
- feat(UI): AccessManagement UI to access the role metadata for a dataset by @Ramendra761 in https://github.com/datahub-project/datahub/pull/8541
- Glossary Navigation Cypress test by @kkorchak in https://github.com/datahub-project/datahub/pull/8804
- ci: upgrade python to 3.10 for builds by @hsheth2 in https://github.com/datahub-project/datahub/pull/8808
- feat(ingestion/looker): Add view file-path as option in view_naming_pattern config by @siddiquebagwan-gslab in https://github.com/datahub-project/datahub/pull/8713
- feat(upgrade): add ability to provide a startingOffset for RestoreIndices by @ukayani in https://github.com/datahub-project/datahub/pull/8539
- fix(index): Do not override the search analyzer for ngram fields by @iprentic in https://github.com/datahub-project/datahub/pull/8818
- test(managed_ingestion): fix managed ingestion test by fixing actions… by @david-leifker in https://github.com/datahub-project/datahub/pull/8820
- docs: add 0.11 docs to docs site by @hsheth2 in https://github.com/datahub-project/datahub/pull/8813
- docs(release): Update updating-datahub.md for 0.11.0 release by @iprentic in https://github.com/datahub-project/datahub/pull/8821
- fix(ingest/mssql): Add UNIQUEIDENTIFIER data type as String by @cjm98332 in https://github.com/datahub-project/datahub/pull/8642
- build(ingest): upgrade to sqlalchemy 1.4, drop 1.3 support by @mayurinehate in https://github.com/datahub-project/datahub/pull/8810
- fix(ingest): use epoch 1 for dev build versions by @hsheth2 in https://github.com/datahub-project/datahub/pull/8824
- ci: make wheel builds more robust by @hsheth2 in https://github.com/datahub-project/datahub/pull/8815
- feat(cli): fix upload ingest cli endpoint by @pedro93 in https://github.com/datahub-project/datahub/pull/8826
- docs(transformer): fix names in sample code of 'pattern_add_dataset_domain' by @Starkie in https://github.com/datahub-project/datahub/pull/8755
- fix(siblingsHook): check number of dbtUpstreams instead of all upStreams by @ethan-cartwright in https://github.com/datahub-project/datahub/pull/8817
- fix(java) Update DataProductMapper to always return a name by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8832
- build(ingest): Bump jsonschema for Python >= 3.8 by @asikowitz in https://github.com/datahub-project/datahub/pull/8836
- feat(ingest/rest-emitter): Do not raise error on retry failure to get better error messages by @asikowitz in https://github.com/datahub-project/datahub/pull/8837
- ci: add markdown-link-check by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8771
- docs(managed datahub): release notes 0.2.11 by @anshbansal in https://github.com/datahub-project/datahub/pull/8830
- build(ingest): Remove constraint on jsonschema for Python >= 3.8 by @asikowitz in https://github.com/datahub-project/datahub/pull/8842
- fix(build): clean task cleanup generated src by @anshbansal in https://github.com/datahub-project/datahub/pull/8844
- feat(ci): disable ingestion smoke build by @anshbansal in https://github.com/datahub-project/datahub/pull/8845
- fix: fix quickstart page by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8784
- feat(bigquery): add better timers around every API call by @mayurinehate in https://github.com/datahub-project/datahub/pull/8626
- feat(ingestion/dynamodb): Add DynamoDB as new metadata ingestion source by @TonyOuyangGit in https://github.com/datahub-project/datahub/pull/8768
- feat(ingest/bigquery): support bigquery profiling with sampling by @mayurinehate in https://github.com/datahub-project/datahub/pull/8794
- Fix for edit_documentation and glossary_navigation cypress tests by @kkorchak in https://github.com/datahub-project/datahub/pull/8838
- feat(ui/java) Update domains to be nested by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8841
- dcs(ml-models): enhancing ml model documentation by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8848
- logging(lineage): adding some lineage explorer and impact analysis logging by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8849
- fix(gms): lower telemetry error log level by @hsheth2 in https://github.com/datahub-project/datahub/pull/8860
- fix(datahub-gms) usage stats queryRange API's Authorization error for Dataset Owners by @siladitya2 in https://github.com/datahub-project/datahub/pull/8819
- docs(observability): Add Custom Assertion user guide by @zmcnellis in https://github.com/datahub-project/datahub/pull/8854
- fix(airflow): fix provider loading exception by @hsheth2 in https://github.com/datahub-project/datahub/pull/8861
- Fix glossary_navigation.js by @kkorchak in https://github.com/datahub-project/datahub/pull/8864
- Managing Secrets Cypress test by @kkorchak in https://github.com/datahub-project/datahub/pull/8863
- feat(ui) Make certain things disabled if read only mode is enabled by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8870
- fix(ingest): fix mode lint error by @mayurinehate in https://github.com/datahub-project/datahub/pull/8875
- feat(search): update to support OpenSearch 2.x by @david-leifker in https://github.com/datahub-project/datahub/pull/8852
- docs(observability): Custom Assertion user guide updates by @zmcnellis in https://github.com/datahub-project/datahub/pull/8878
- feat(ingest): bump acryl-sqlglot by @hsheth2 in https://github.com/datahub-project/datahub/pull/8882
- feat(ingest): bulk fetch schema info for schema resolver by @mayurinehate in https://github.com/datahub-project/datahub/pull/8865
- fix(docs): remove link-checker from CI by @hsheth2 in https://github.com/datahub-project/datahub/pull/8883
- feat(entity-client): enable client side cache for entity-client and usage-client by @david-leifker in https://github.com/datahub-project/datahub/pull/8877
- docs: add homepage ctas by @jeffmerrick in https://github.com/datahub-project/datahub/pull/8866
- fix(ingest/bigquery): show report in output by @hsheth2 in https://github.com/datahub-project/datahub/pull/8867
- fix(docker): support alternate postgres db in postgres-setup by @hsheth2 in https://github.com/datahub-project/datahub/pull/8800
- feat(python): support custom models without forking by @hsheth2 in https://github.com/datahub-project/datahub/pull/8774
- fix(docs): fixes link to developers guides by @sgomezvillamor in https://github.com/datahub-project/datahub/pull/8809
- docs(authorization): correct policies example by @siladitya2 in https://github.com/datahub-project/datahub/pull/8833
- fix(report): too long report causes MSG_SIZE_TOO_LARGE in kafka by @sgomezvillamor in https://github.com/datahub-project/datahub/pull/8857
- docs(ingest/lookml): add guide on debugging lkml parse errors by @hsheth2 in https://github.com/datahub-project/datahub/pull/8890
- feat(ingest/kafka): support metadata mapping from kafka avro schemas by @mayurinehate in https://github.com/datahub-project/datahub/pull/8825
- feat(ingest/kafka-connect): Lineage for Kafka Connect > Snowflake by @shubhamjagtap639 in https://github.com/datahub-project/datahub/pull/8811
- fix(test): fix test execution by @david-leifker in https://github.com/datahub-project/datahub/pull/8889
- feat(ingest/snowflake): allow shares config without platform instance by @mayurinehate in https://github.com/datahub-project/datahub/pull/8803
- fix(ingest): bound types-requests by @hsheth2 in https://github.com/datahub-project/datahub/pull/8895
- fix(build): run codegen when building datahub-ingestion image by @hsheth2 in https://github.com/datahub-project/datahub/pull/8869
- fix(ingest/s3): Converting windows style path to posix one on local fs by @treff7es in https://github.com/datahub-project/datahub/pull/8757
- fix(docs): Rebranding custom to custom SQL by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8896
- docs(observability): Freshness Assertion Operation Types by @zmcnellis in https://github.com/datahub-project/datahub/pull/8907
- doc(ingestion): looker & lookml ingestion guide by @siddiquebagwan in https://github.com/datahub-project/datahub/pull/8006
- fix(ingest): bump typing-extensions by @hsheth2 in https://github.com/datahub-project/datahub/pull/8897
- feat(metadata-ingestion): implement mlflow source by @hariishaa in https://github.com/datahub-project/datahub/pull/7971
- feat(docs): Update ownership-types image urls by @pedro93 in https://github.com/datahub-project/datahub/pull/8905
- docs(website): style tweaks for readability and more open spacing by @jeffmerrick in https://github.com/datahub-project/datahub/pull/8876
- build(ingest/databricks): Relax databricks-sdk pin by @asikowitz in https://github.com/datahub-project/datahub/pull/8855
- test(ingest/delta-lake): Fix minio test for new version of delta-lake by @asikowitz in https://github.com/datahub-project/datahub/pull/8914
- doc: fix title of the ui ingestion guide & remove browse.md by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8916
- refactor(ingest/bigquery): Clarify table / view queries by @asikowitz in https://github.com/datahub-project/datahub/pull/8913
- refactor(ingest/graph): Factor out filter logic by @asikowitz in https://github.com/datahub-project/datahub/pull/8888
- fix(docker): move base image to `-base` tag, full image to head by @david-leifker in https://github.com/datahub-project/datahub/pull/8919
- fix(docker): slim tags by @david-leifker in https://github.com/datahub-project/datahub/pull/8922
- ci: Docker slim tag fix by @david-leifker in https://github.com/datahub-project/datahub/pull/8925
- refactor(misc): testngJava fix, systemrestli client, cache key fix, e… by @david-leifker in https://github.com/datahub-project/datahub/pull/8926
- feat(openapi): openapi v2 updates by @david-leifker in https://github.com/datahub-project/datahub/pull/8927
- fix(data-product): show data product card on home page by @Endtry in https://github.com/datahub-project/datahub/pull/8924
- fix(graphql): support additional types in scrollAcrossEntities by @hsheth2 in https://github.com/datahub-project/datahub/pull/8891
- docs: update cta links for acryl by @hsheth2 in https://github.com/datahub-project/datahub/pull/8908
- feat(docs): Corrects release version for custom ownership types. by @pedro93 in https://github.com/datahub-project/datahub/pull/8847
- docs: fix typo in impact-analysis.md by @Erik-McKelvey in https://github.com/datahub-project/datahub/pull/8915
- feat(chrom-ext-editable): set readOnly to false so that side navigati… by @Endtry in https://github.com/datahub-project/datahub/pull/8930
- fix(client): use value for RelationshipDirection by @eboneil in https://github.com/datahub-project/datahub/pull/8912
- fix(fine-grained lineage) CLL for datajob downstreams by @eboneil in https://github.com/datahub-project/datahub/pull/8937
- fix(ingest): refactor test markers + fix disk space issues in CI by @hsheth2 in https://github.com/datahub-project/datahub/pull/8938
- fix(cli): make quickstart docker compose up command more robust by @hsheth2 in https://github.com/datahub-project/datahub/pull/8929
- feat(transfomer): add transformer to get ownership from tags by @anshbansal in https://github.com/datahub-project/datahub/pull/8748
- docs(lineage): Lineage docs refactoring by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8899
- feat(ingestion/powerbi): column level lineage extraction for M-Query by @siddiquebagwan-gslab in https://github.com/datahub-project/datahub/pull/8796
- feat(ingest/airflow): airflow plugin v2 by @hsheth2 in https://github.com/datahub-project/datahub/pull/8853
- feat(ingest/snowflake): initialize schema resolver from datahub for l… by @mayurinehate in https://github.com/datahub-project/datahub/pull/8903
- feat(bigquery): excluding projects without any datasets from ingestion by @upendrao in https://github.com/datahub-project/datahub/pull/8535
- feat(ingest/unity): Ingest notebooks and their lineage by @asikowitz in https://github.com/datahub-project/datahub/pull/8940
- test(ingest/unity): Add Unity Catalog memory performance testing by @asikowitz in https://github.com/datahub-project/datahub/pull/8932
- doc: DataHubUpgradeHistory_v1 by @sgomezvillamor in https://github.com/datahub-project/datahub/pull/8918
- fix: fix typo on aws guide by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8944
- feat(dbt-ingestion): add documentation link from dbt source to institutionalMemory by @ethan-cartwright in https://github.com/datahub-project/datahub/pull/8686
- refactor(style): Improve search bar input focus + styling by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8955
- feat: data contracts models + CLI by @hsheth2 in https://github.com/datahub-project/datahub/pull/8923
- ci: tweak ci runs to decrease wait time of devs by @anshbansal in https://github.com/datahub-project/datahub/pull/8945
- docs(ingest): add permissions required for athena ingestion by @mayurinehate in https://github.com/datahub-project/datahub/pull/8948
- feat(ingestion/dynamodb): implement pagination for list_tables by @jinlintt in https://github.com/datahub-project/datahub/pull/8910
- feat(ci): enable ci to run on PR-s targeting all branches by @shirshanka in https://github.com/datahub-project/datahub/pull/8933
- feat(ingest/dbt): support `use_compiled_code` and `test_warnings_are_errors` by @hsheth2 in https://github.com/datahub-project/datahub/pull/8956
- refactor(boot): increases wait timeout for servlets initialization by @PatrickfBraz in https://github.com/datahub-project/datahub/pull/8947
- fix(ingest/unity): Remove metastore from ingestion and urns; standardize platform instance; add notebook filter by @asikowitz in https://github.com/datahub-project/datahub/pull/8943
- fix: add retry for fetch_url by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8958
- feat(ingest/unity): Use ThreadPoolExecutor for CLL by @asikowitz in https://github.com/datahub-project/datahub/pull/8952
- feat(ingest/snowflake): support profiling with sampling by @mayurinehate in https://github.com/datahub-project/datahub/pull/8902
- Manage Access Tokens Cypress test by @kkorchak in https://github.com/datahub-project/datahub/pull/8936
- Nested domains cypress test by @kkorchak in https://github.com/datahub-project/datahub/pull/8879
- feat(models/assertion): Add SQL Assertions by @asikowitz in https://github.com/datahub-project/datahub/pull/8969
- feat(ingest): incremental lineage source helper by @mayurinehate in https://github.com/datahub-project/datahub/pull/8941
- feat(ingest): refactor + simplify incremental lineage helper by @mayurinehate in https://github.com/datahub-project/datahub/pull/8976
- fix(lint): run black, isort by @anshbansal in https://github.com/datahub-project/datahub/pull/8978
- fix(setup): drop older table if exists by @anshbansal in https://github.com/datahub-project/datahub/pull/8979
- feat(ingest/tableau): Allow parsing of database name from fullName by @asikowitz in https://github.com/datahub-project/datahub/pull/8981
- feat(auth): add data platform instance field resolver provider by @amanda-her in https://github.com/datahub-project/datahub/pull/8828
- feat(graphql): Added datafetcher for DataPlatformInstance entity by @siladitya2 in https://github.com/datahub-project/datahub/pull/8935
- feat(config): configurable bootstrap policies file by @sgomezvillamor in https://github.com/datahub-project/datahub/pull/8812
- feat(ingestion/redshift): CLL support in redshift by @siddiquebagwan-gslab in https://github.com/datahub-project/datahub/pull/8921
- fix(ingest): Fix postgres lineage within views by @harsha-mandadi-4026 in https://github.com/datahub-project/datahub/pull/8906
- refactor(ingest/dbt): move dbt tests logic to dedicated file by @hsheth2 in https://github.com/datahub-project/datahub/pull/8984
- fix(ingest/snowflake): fix sample fraction for very large tables by @mayurinehate in https://github.com/datahub-project/datahub/pull/8988
- fix: Display generic not found page for corp groups that do not exist by @jayasimhankv in https://github.com/datahub-project/datahub/pull/8880
- fix(ingest/looker): stop emitting tag owner by @sgomezvillamor in https://github.com/datahub-project/datahub/pull/8942
- feat(ingest): add output schema inference for sql parser by @hsheth2 in https://github.com/datahub-project/datahub/pull/8989
- fix(ingest/bigquery): Fix shard regexp to match without underscore as well by @treff7es in https://github.com/datahub-project/datahub/pull/8934
- feat(ingestion): Adding config option to auto lowercase dataset urns by @treff7es in https://github.com/datahub-project/datahub/pull/8928
- feat(ingest/s3): support .gzip and fix decompression bug by @hsheth2 in https://github.com/datahub-project/datahub/pull/8990
- feat(ingestion): Adds support for memory profiling by @pedro93 in https://github.com/datahub-project/datahub/pull/8856
- feat(auth): add group membership field resolver provider by @amanda-her in https://github.com/datahub-project/datahub/pull/8846
- Query plus filter search test by @kkorchak in https://github.com/datahub-project/datahub/pull/8993
- feat(ingest/teradata): Teradata source by @treff7es in https://github.com/datahub-project/datahub/pull/8977
- ci(ingest): update base requirements by @anshbansal in https://github.com/datahub-project/datahub/pull/8995
- docs(Acryl DataHub): release notes for 0.2.12 by @anshbansal in https://github.com/datahub-project/datahub/pull/9006
- feat(cli/datacontract): Add data quality assertion support by @asikowitz in https://github.com/datahub-project/datahub/pull/8968
- feat(ingest/teradata): view parsing by @treff7es in https://github.com/datahub-project/datahub/pull/9005
- Adding missing sqlparser libs to setup.py by @treff7es in https://github.com/datahub-project/datahub/pull/9015
- feat(graphql): support filtering based on greater than/less than criteria by @iprentic in https://github.com/datahub-project/datahub/pull/9001
- build(ingest): remove ratelimiter dependency by @mayurinehate in https://github.com/datahub-project/datahub/pull/9008
- build(ingest/redshift): Add sqlglot dependency by @asikowitz in https://github.com/datahub-project/datahub/pull/9021
- feat(ingest/teradata): Add option to not use file backed dict for view definitions by @asikowitz in https://github.com/datahub-project/datahub/pull/9024
- feat(ingest/unity-catalog): Support external S3 lineage by @asikowitz in https://github.com/datahub-project/datahub/pull/9025
- fix(ingest) - Fix file backed collection temp directory removal by @treff7es in https://github.com/datahub-project/datahub/pull/9027
- add dependency level to scrollAcrossLineage search results by @ethan-cartwright in https://github.com/datahub-project/datahub/pull/9016
- add create dataproduct example by @ethan-cartwright in https://github.com/datahub-project/datahub/pull/9009
- Download Lineage Results Cypress Test by @kkorchak in https://github.com/datahub-project/datahub/pull/9017
- fix(ingest/bigquery): Remove table name restrictions (allow $ and @) by @asikowitz in https://github.com/datahub-project/datahub/pull/9030
- chore(docker): update base images to alpine 3.18 by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8967
- fix(frontend): update cookie module by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8862
- docs(datahub-lite): Fix recipe by @asikowitz in https://github.com/datahub-project/datahub/pull/9023
- fix(ingest): fix typo in parsing list of groups by @mayurinehate in https://github.com/datahub-project/datahub/pull/9037
- feat(ingestion/Vertica): Fixed vertica integration test Updated vertica dialect by @vishalkSimplify in https://github.com/datahub-project/datahub/pull/9011
- fix(ingest/sqlalchemy): Fix URL parsing when sqlalchemy_uri provided by @asikowitz in https://github.com/datahub-project/datahub/pull/9032
- feature(ingest/athena): introduce support for complex and nested schemas in Athena by @bossenti in https://github.com/datahub-project/datahub/pull/8137
- docs: adding documentation for deployment of DataHub on Azure by @Saketh-Mahesh in https://github.com/datahub-project/datahub/pull/8612
- feat(frontend/ingestion): Support flagged / warning / connection failure statuses; add recipe by @asikowitz in https://github.com/datahub-project/datahub/pull/8920
- feat(avro): upgrade avro to 1.11 by @RyanHolstien in https://github.com/datahub-project/datahub/pull/9031
- fix(search): Detect field type for use in defining the sort order by @iprentic in https://github.com/datahub-project/datahub/pull/8992
- fix(api): Add preceding / to get index sizes path by @iprentic in https://github.com/datahub-project/datahub/pull/9043
- fix(search): Apply SearchFlags passed in through to scroll queries by @iprentic in https://github.com/datahub-project/datahub/pull/9041
- fix(ownership): Corrects validation of ownership type and makes it consistent across graphQL calls by @pedro93 in https://github.com/datahub-project/datahub/pull/9044
- docs(protobuf) Update messaging around nesting messages by @eboneil in https://github.com/datahub-project/datahub/pull/9048
- Use data-testids for glossary_navigation and dataset_ownership tests by @kkorchak in https://github.com/datahub-project/datahub/pull/9033
- test(ingest/delta-lake): Fix integration tests by @asikowitz in https://github.com/datahub-project/datahub/pull/9056
- Ingestion source creation cypress test by @kkorchak in https://github.com/datahub-project/datahub/pull/8850
- docs: fix lineage capability annotations by @hsheth2 in https://github.com/datahub-project/datahub/pull/8954
- Added more data-testid usage for edit_documentation and managing_secr… by @kkorchak in https://github.com/datahub-project/datahub/pull/9060
- fix(search): fix mapping builder bug by @david-leifker in https://github.com/datahub-project/datahub/pull/9062
- feat(ingestion): Adds more advanced configurations for runtime debugging by @pedro93 in https://github.com/datahub-project/datahub/pull/8998
- feat(ingest/s3): S3 add partition to schema by @treff7es in https://github.com/datahub-project/datahub/pull/8900
- feat(frontend): Remove debug flag from start script by @pedro93 in https://github.com/datahub-project/datahub/pull/9075
- feat(sqlparser): parse create DDL statements by @hsheth2 in https://github.com/datahub-project/datahub/pull/9002
- docs(ingest): update to get_workunits_internal by @eboneil in https://github.com/datahub-project/datahub/pull/9054
- Column level lineage and path test by @kkorchak in https://github.com/datahub-project/datahub/pull/8822
- refactor(ingest): Move sqlalchemy import out of sql_types.py by @asikowitz in https://github.com/datahub-project/datahub/pull/9065
- fix(ingest): add releases link by @hsheth2 in https://github.com/datahub-project/datahub/pull/9014
- fix(ingest/bigquery): Correctly apply table pattern to read events; fix end time calculation; deprecate match_fully_qualified_names by @asikowitz in https://github.com/datahub-project/datahub/pull/9077
- feat(sqlparser): extract CLL from `update`s by @hsheth2 in https://github.com/datahub-project/datahub/pull/9078
- fix(ui): Fixes handling of resources filters in UI by @skrydal in https://github.com/datahub-project/datahub/pull/9087
- docs(ingest/bigquery): Add docs for breaking change: match_fully_qualified_names by @asikowitz in https://github.com/datahub-project/datahub/pull/9094
- docs(update): Added info on breaking change for policies by @skrydal in https://github.com/datahub-project/datahub/pull/9093
- docs: add luckyorange script to head by @yoonhyejin in https://github.com/datahub-project/datahub/pull/9080
- design: refactor docs navbar by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8975
- fix(ingest): update athena type mapping by @hsheth2 in https://github.com/datahub-project/datahub/pull/9061
- feat(ingest/datahub-source): Allow ingesting aspects from the entitiesV2 API by @asikowitz in https://github.com/datahub-project/datahub/pull/9089
- feat(ingestion/redshift): support auto_incremental_lineage by @siddiquebagwan-gslab in https://github.com/datahub-project/datahub/pull/9010
- feat(auth): Add backwards compatible field resolver by @pedro93 in https://github.com/datahub-project/datahub/pull/9096
- build(gradle): Support IntelliJ 2023.2.3 by @asikowitz in https://github.com/datahub-project/datahub/pull/9034
- build(ingest): Bump avro pin: security vulnerability by @asikowitz in https://github.com/datahub-project/datahub/pull/9042
- fix(ingestion/redshift): fix schema field data type mappings by @siddiquebagwan-gslab in https://github.com/datahub-project/datahub/pull/9053
- fix(datahub-protobuf): add check if nested field is reserved by @dyhn78 in https://github.com/datahub-project/datahub/pull/9058
- fix(ingest): better handling around sink errors by @hsheth2 in https://github.com/datahub-project/datahub/pull/9003
- feat(ingest/bigquery): Attempt to support raw dataset pattern by @asikowitz in https://github.com/datahub-project/datahub/pull/9109
- docs(observability): Column Assertion user guide by @zmcnellis in https://github.com/datahub-project/datahub/pull/9106
New Contributors
- @Ramendra761 made their first contribution in https://github.com/datahub-project/datahub/pull/8541
- @ukayani made their first contribution in https://github.com/datahub-project/datahub/pull/8539
- @cjm98332 made their first contribution in https://github.com/datahub-project/datahub/pull/8642
- @ethan-cartwright made their first contribution in https://github.com/datahub-project/datahub/pull/8817
- @hariishaa made their first contribution in https://github.com/datahub-project/datahub/pull/7971
- @Endtry made their first contribution in https://github.com/datahub-project/datahub/pull/8924
- @Erik-McKelvey made their first contribution in https://github.com/datahub-project/datahub/pull/8915
- @upendrao made their first contribution in https://github.com/datahub-project/datahub/pull/8535
- @jayasimhankv made their first contribution in https://github.com/datahub-project/datahub/pull/8880
- @Saketh-Mahesh made their first contribution in https://github.com/datahub-project/datahub/pull/8612
- @dyhn78 made their first contribution in https://github.com/datahub-project/datahub/pull/9058
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.11.0...v0.12.0
v0.11.0
Released on 2023-09-08 by @iprentic.
Release Highlights
Potential Downtime
This release introduces substantial improvements to search ranking which require reindexing indices.
During the reindexing:
- a system-update job will set indices to read-only and create a backup/clone of each index
- new components will be prevented from start-up until the reindex completes
- Helm deployments will go into read-only mode and new ingestion runs will fail
This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, expect it to take 1 hour for every 2.3 million entities. After the reindex is complete, please check your ingestion runs and re-run any that did not complete.
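As a rough planning aid, the estimate above can be turned into a small calculation. This is only a sketch of the arithmetic: the function name and the use of the 5-minute figure as a floor are assumptions made for illustration, and actual reindex time depends on your cluster.

```python
# Rough reindex-downtime estimate based on the figures quoted in these notes.
# Assumption: the "5 minutes" lower bound is applied as a floor; the notes only
# give it as the low end of the observed range.
ENTITIES_PER_HOUR = 2_300_000  # "1 hour for every 2.3 million entities"
MIN_MINUTES = 5.0              # low end of the range quoted above


def estimate_reindex_minutes(entity_count: int) -> float:
    """Return an approximate reindex duration in minutes (hypothetical helper)."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    linear_estimate = entity_count / ENTITIES_PER_HOUR * 60
    return max(MIN_MINUTES, linear_estimate)


if __name__ == "__main__":
    for count in (100_000, 2_300_000, 10_000_000):
        minutes = estimate_reindex_minutes(count)
        print(f"{count:>12,} entities: ~{minutes:.0f} min")
```

Treat the result as an order-of-magnitude guide when scheduling the upgrade window, not as a guarantee.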
User Experience
New Search and Browse Experience
We have some really exciting improvements to the DataHub user experience in this release! The new search and browse experience, which was first made available in the previous release behind a feature flag, is now on by default. Check out our release notes for v0.10.5 to get more information and documentation on this new Browse experience.
<div> <a href="https://www.loom.com/share/10a5de90e7084e98b3a84fa1dc83a825"> <p> Learn all about the new Search and Browse experience! </p> </a> <a href="https://www.loom.com/share/10a5de90e7084e98b3a84fa1dc83a825"> <img style="max-width:300px;" src="https://cdn.loom.com/sessions/thumbnails/10a5de90e7084e98b3a84fa1dc83a825-with-play.gif"> </a> </div>
Improvements to Search
In addition to the ranking changes mentioned above, this release highlights the matched fields in search results so you can understand why they match your query. You can also sort your results alphabetically or by last updated time, in addition to relevance, and we now suggest a correction if your query has a typo in it.
<div> <a href="https://www.loom.com/share/97abf74703d04457b96da3fed041089d"> <p>See the Search improvements in action!</p> </a> <a href="https://www.loom.com/share/97abf74703d04457b96da3fed041089d"> <img style="max-width:300px;" src="https://cdn.loom.com/sessions/thumbnails/97abf74703d04457b96da3fed041089d-1693606777695-with-play.gif"> </a> </div>
Manage Home Page Posts
This release lets you create and delete pinned announcements on your DataHub homepage! If you have the “Manage Home Page Posts” platform privilege, you’ll see a new section in settings called “Home Page Posts” where you can create and delete text posts and link posts that your users see on the home page.
OpenAPI Endpoints Expanded
The OpenAPI entity and aspect endpoints have been expanded to improve the developer experience when using this API, with additional aspects to be added in the near future.
Metadata ingestion
Added support for the Confluent S3 Sink Connector, extraction of stored procedures and jobs from MSSQL, and Snowflake shares. Additionally, the SQL parsing source now converts query logs into column-level lineage (CLL) and usage.
Developer Experience
The CLI now supports recursive deletes.
Versioned documentation
Starting from this release, we support versioned documentation on the DataHub docs site! Select the version you’re on and browse docs specifically at that version.
Performance Improvements
- Batching of default aspects on initial ingestion (SQL)
- Improvements to multi-threading. Ingestion recipes, if previously reduced to 1 thread, can be restored to the 15 thread default.
- Gradle 7 upgrade moderately improves build speed
- DataHub Ingestion slim images reduced in size by 2GB+
Important Bug Fixes
- Glue Schema Registry fixed
Deprecation Notice
- MAE Events are no longer produced. MAE events have been deprecated for over a year.
What's Changed
- feat(ingest/presto-on-hive): enable partition key for presto-on-hive by @zheyu001 in https://github.com/datahub-project/datahub/pull/8380
- feat(classification): allow parallelisation to reduce time by @mayurinehate in https://github.com/datahub-project/datahub/pull/8368
- feat(ingest): Add metabase database id to platform instance mapping by @k-popov in https://github.com/datahub-project/datahub/pull/8359
- feat(ingest): add ability to read other method types than GET for OAS ingest recipes by @jsmilkstein in https://github.com/datahub-project/datahub/pull/8303
- fix(ingest): fix data platform urn in dataset_urn_to_key and dataset_key_to_urn by @Masterchen09 in https://github.com/datahub-project/datahub/pull/8209
- fix(ingest/s3): wrong sorting in case of multi-partition key by @anshbansal in https://github.com/datahub-project/datahub/pull/8536
- fix(ingest/presto): fix presto on hive test failures by @hsheth2 in https://github.com/datahub-project/datahub/pull/8548
- Cypress test for managing groups by @kkorchak in https://github.com/datahub-project/datahub/pull/8520
- feat(ingest/kafka-connect): add support for Confluent S3 Sink Connector by @tusharm in https://github.com/datahub-project/datahub/pull/8298
- Variable rename - Allows deselection of members in add members modal for a group by @Sukeerthi31 in https://github.com/datahub-project/datahub/pull/8529
- fix(ingest/s3): catch no such bucket exception instead of failing by @anshbansal in https://github.com/datahub-project/datahub/pull/8549
- fix(ingest): add tableau sqlglot dep by @hsheth2 in https://github.com/datahub-project/datahub/pull/8552
- fix(ingetion/mssql): convert dataset urns to lowercase by @siddiquebagwan in https://github.com/datahub-project/datahub/pull/8551
- Fix flaky add_user smoke test by @kkorchak in https://github.com/datahub-project/datahub/pull/8471
- feat(ci): use docker registry cache by @hsheth2 in https://github.com/datahub-project/datahub/pull/8544
- fix(glue): restore glue configurations by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8533
- build(release): Update files for 0.10.5 release by @iprentic in https://github.com/datahub-project/datahub/pull/8556
- docs(release): Update updating-datahub.md for 0.10.5 release by @iprentic in https://github.com/datahub-project/datahub/pull/8557
- feat(ingestion/snowflake): use user email-id in urn generation for top users stat by @siddiquebagwan in https://github.com/datahub-project/datahub/pull/8513
- docs(development.md): Minor grammatical error by @PauloGoncalvesLima in https://github.com/datahub-project/datahub/pull/8558
- fix(usage): Update index lifecycle policy to not delete old datahub usage events by @iprentic in https://github.com/datahub-project/datahub/pull/8565
- fix(ui): Simplify background color for Entity Health Status popover by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8559
- fix: add --write args on pre-commit prettier by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8560
- docs(observe): Add feature doc for Freshness Assertions by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8547
- docs(updating): add details on Unified Search & Browse experience by @maggiehays in https://github.com/datahub-project/datahub/pull/8568
- fix: fix features section by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8571
- feat(ingest): allow lower freq profiling based on date of month/day of week by @anshbansal in https://github.com/datahub-project/datahub/pull/8489
- fix(stats): default to 3 months by @anshbansal in https://github.com/datahub-project/datahub/pull/8566
- fix(aspect): count query only for relevant aspect index by @iprentic in https://github.com/datahub-project/datahub/pull/8569
- feat(quickstart): bump quickstart start periods more by @hsheth2 in https://github.com/datahub-project/datahub/pull/8573
- Origin/cypress test for managing policies by @kkorchak in https://github.com/datahub-project/datahub/pull/8554
- feat(ui) Show source documentation when editing entity documentation by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8516
- fix(ingest): handle redaction of configs with int keys by @hsheth2 in https://github.com/datahub-project/datahub/pull/8545
- fix(ingest/snowflake): maintain qualified name casing, do not lowercase by @mayurinehate in https://github.com/datahub-project/datahub/pull/8574
- feat(docs): add github repo links to readme and docs by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8422
- feat(ebean): Add metric in ebean aspect DAO for failed tries, as well as failed operation… by @iprentic in https://github.com/datahub-project/datahub/pull/8576
- refactor(search) Use search across multiple-entities API, deprecate Aggregator classes by @iprentic in https://github.com/datahub-project/datahub/pull/8498
- feat(siblings): dont show multiple platform icons if the siblings are ghost nodes by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8543
- docs(lineage): Add description to make_lineage_mce by @eboneil in https://github.com/datahub-project/datahub/pull/8596
- doc(ingest/log): failure log at pipeline level document by @anshbansal in https://github.com/datahub-project/datahub/pull/8591
- Dataset ownership test by @kkorchak in https://github.com/datahub-project/datahub/pull/8583
- doc(release): release notes for 0.2.10 by @anshbansal in https://github.com/datahub-project/datahub/pull/8599
- docs(release): fix typo by @anshbansal in https://github.com/datahub-project/datahub/pull/8600
- feat(ui): apply views to: domains, containers, terms by @eboneil in https://github.com/datahub-project/datahub/pull/8572
- feat(search): embedded view dropdown by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8598
- fix(ingest/file): remove `entity_type_counts` and `aspect_counts` by @hsheth2 in https://github.com/datahub-project/datahub/pull/8586
- fix(ingest): use hive pure_sasl variant by @hsheth2 in https://github.com/datahub-project/datahub/pull/8570
- Feat(ingest/ldap)fix list index out of range error by @alplatonov in https://github.com/datahub-project/datahub/pull/8525
- harden autocomplete test by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8603
- feat(ui/graphql) Add ability to sort search results from search results page by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8595
- fix(ingest): Add client_certificate_path for rest client cert instead of ca_certif… by @mkamalas in https://github.com/datahub-project/datahub/pull/8581
- refactor(graphql): extract code into metadata-io part 1 by @anshbansal in https://github.com/datahub-project/datahub/pull/8607
- docs(ingest): update s3 and gcs doc with concept mapping by @mayurinehate in https://github.com/datahub-project/datahub/pull/8575
- Fix(ingestion/clickhouse) move to two tier sqlalchemy by @alplatonov in https://github.com/datahub-project/datahub/pull/8300
- fix(cypress): attempt to fix autocomplete test by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8619
- fix(cleanup): cleanup of 2 sub-modules by @anshbansal in https://github.com/datahub-project/datahub/pull/8616
- docs(ingsetion/csv-enricher): fix sample csv mentioned in Docstrings by @siddiquebagwan in https://github.com/datahub-project/datahub/pull/8432
- feat(ingest): allow relative start time config by @mayurinehate in https://github.com/datahub-project/datahub/pull/8562
- fix(ingest/airflow): make inlets work again by @hsheth2 in https://github.com/datahub-project/datahub/pull/8631
- feat(ingest/s3): Adding option to pass in any spark config property to s3 source by @treff7es in https://github.com/datahub-project/datahub/pull/8621
- feat(impact analysis): allow deep linking of url params in impact analysis by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8617
- feat(ui) Display combined sibling results in search + 2 minor updates by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8602
- feat(ui) Display consistent search results in embedded searches by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8597
- feat(ingest): Add DataHub source by @asikowitz in https://github.com/datahub-project/datahub/pull/8561
- fix(ingest/okta): fix event_loop RuntimeError with nested asyncio by @skrydal in https://github.com/datahub-project/datahub/pull/8637
- fix(ingest/kafka): use SchemaReference properties instead of dict access by @Deepankarkr in https://github.com/datahub-project/datahub/pull/8615
- feat(ingestion/ldap): flag to ingest ldap users with email instead of username by @Deepankarkr in https://github.com/datahub-project/datahub/pull/8606
- Combine siblings in autocomplete by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8610
- fix(ingest): avoid mutable defaults in powerbi dataclass by @hsheth2 in https://github.com/datahub-project/datahub/pull/8609
- chore(spring): upgrade minor versions of spring components by @david-leifker in https://github.com/datahub-project/datahub/pull/8627
- docs(quickstart): quickstart documentation, clarification on production by @david-leifker in https://github.com/datahub-project/datahub/pull/8628
- feat(datahub-ingestion): refactor datahub ingestion slim images by @david-leifker in https://github.com/datahub-project/datahub/pull/8515
- bug(8584): emit data_platform_instance aspect if the config has platform_instance by @jinlintt in https://github.com/datahub-project/datahub/pull/8585
- chore(snappy): fix snappy version constraint by @david-leifker in https://github.com/datahub-project/datahub/pull/8629
- chore(hazelcast): update hazelcast version by @david-leifker in https://github.com/datahub-project/datahub/pull/8633
- feat(graphql) Support exists operator in GraphQL Search API by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8652
- [fix][health ui] Removing ghost 0 for health signals on search cards by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8587
- fix(data products): removing data products filter in search as its not indexed on entity documents by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8650
- feat(ingest/bigquery): add tag to BigQuery clustering columns by @ANich in https://github.com/datahub-project/datahub/pull/8495
- fix(ingest/snowflake): fix usage enum bug by @hsheth2 in https://github.com/datahub-project/datahub/pull/8649
- feat(ingest/dbt-cloud): use job-based graphql queries by @hsheth2 in https://github.com/datahub-project/datahub/pull/8647
- Add and remove documentation and link for dataset by @kkorchak in https://github.com/datahub-project/datahub/pull/8604
- Lineage column level test by @kkorchak in https://github.com/datahub-project/datahub/pull/8641
- tests(search): search golden tests by @eboneil in https://github.com/datahub-project/datahub/pull/8605
- Add test case for dataset deprecation test by @kkorchak in https://github.com/datahub-project/datahub/pull/8646
- docs(ingest/kafka-connect): add details on platform instance mapping by @mayurinehate in https://github.com/datahub-project/datahub/pull/8654
- docs(ingest/airflow): add `capture_executions` to docs by @hsheth2 in https://github.com/datahub-project/datahub/pull/8662
- Fix a few view select issues by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8670
- feat(search): Add word gram analyzer for name fields by @iprentic in https://github.com/datahub-project/datahub/pull/8611
- fix(docker): misc docker fixes by @david-leifker in https://github.com/datahub-project/datahub/pull/8677
- tests(search): more golden tests by @eboneil in https://github.com/datahub-project/datahub/pull/8683
- test(ingest/vertica): Skip integration test failing CI; support arm Macs by @asikowitz in https://github.com/datahub-project/datahub/pull/8694
- ci: add `needs_artifact_download` output for ingestion image by @hsheth2 in https://github.com/datahub-project/datahub/pull/8695
- logs(ingestion/unity): Hide stack trace on sql parse failure logs by @asikowitz in https://github.com/datahub-project/datahub/pull/8657
- feat(ingestion/powerbi): support multiple tables as upstream in native SQL parsing by @siddiquebagwan-gslab in https://github.com/datahub-project/datahub/pull/8592
- build(ingest): Bump pydantic pin by @asikowitz in https://github.com/datahub-project/datahub/pull/8660
- remove(ingest/snowflake): Remove legacy snowflake lineage by @asikowitz in https://github.com/datahub-project/datahub/pull/8653
- fix(ingest/ldap): Handle case when 'objectClass' not in attrs by @asikowitz in https://github.com/datahub-project/datahub/pull/8658
- fix(ui) Remove new Role entity from searchable entity types by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8655
- fix(java) Use alias for name search sorting and fix missing mappings by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8648
- feat(ui) Create page for managing home page posts by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8707
- fix(ingest/powerbi): add sqlglot python dep by @hsheth2 in https://github.com/datahub-project/datahub/pull/8704
- ci(ingest): make ingestion caching rules correct by @hsheth2 in https://github.com/datahub-project/datahub/pull/8685
- fix(cleanup): cleanup of 1 sub-module by @anshbansal in https://github.com/datahub-project/datahub/pull/8678
- fix(policies): fix concurrent modification exception by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8681
- fix(ingest/bigquery): Add config option to create DataPlatformInstance, default off by @asikowitz in https://github.com/datahub-project/datahub/pull/8659
- feat(ingest/looker): Record observed lineage timestamps for Looker and LookML sources by @ANich in https://github.com/datahub-project/datahub/pull/7735
- feat(ingest/mssql): load jobs and stored procedures by @RChygir in https://github.com/datahub-project/datahub/pull/5363
- fix(ingestion/kafka-connect): update retrieval of database name in Debezium SQL Server by @Starkie in https://github.com/datahub-project/datahub/pull/8608
- feat(ingest/snowflake): tables from snowflake shares as siblings by @mayurinehate in https://github.com/datahub-project/datahub/pull/8531
- feat(ingest/sql-queries): Add sql queries source, SqlParsingBuilder, sqlglot_lineage performance optimizations by @asikowitz in https://github.com/datahub-project/datahub/pull/8494
- highlight matched fields in search results by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8651
- Add links to glossary term cards without counts by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8705
- fix non sibling document links by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8724
- refactor(policies): Rename edit all privilege to edit entity by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8722
- feat(java/ui) Add search suggestions to our search experience by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8710
- fix(cypress) Fix login.js cypress test by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8719
- Fixes for faling login.js and managing_groups.js Cypress tests by @kkorchak in https://github.com/datahub-project/datahub/pull/8725
- fix(kafka-setup): remove dependency confluent docker utils by @lix-mms in https://github.com/datahub-project/datahub/pull/8715
- docs(docs): add native versioning by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8714
- config(ingest/rest): Update rest sink defaults to retry more often by @asikowitz in https://github.com/datahub-project/datahub/pull/8729
- chore(jackson): update to released version of jackson by @david-leifker in https://github.com/datahub-project/datahub/pull/8674
- fix(examples): fix typo in business glossary bootstrap yml by @mayurinehate in https://github.com/datahub-project/datahub/pull/8703
- fix(schemaRegistry): change api servlet check to only apply to internal to fix glue support by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8693
- fix(ingest): stateful redundant run skip handler by @mayurinehate in https://github.com/datahub-project/datahub/pull/8467
- fix(superset): get alternate platform value if sqlalchemy_uri param is missing by @akhil7philip in https://github.com/datahub-project/datahub/pull/8667
- feat(ingest): support writing configs to files by @hsheth2 in https://github.com/datahub-project/datahub/pull/8696
- feat(search): De-duplicate scale factors across entities by @iprentic in https://github.com/datahub-project/datahub/pull/8718
- test(lineage): Add test for scroll across lineage by @iprentic in https://github.com/datahub-project/datahub/pull/8728
- feat(ingest/metabase): detect source table for cards sourced from other cards by @k-popov in https://github.com/datahub-project/datahub/pull/8577
- (ingestion) bug fix: emit platform instance aspect for dataset in Databricks ingestion by @jinlintt in https://github.com/datahub-project/datahub/pull/8671
- feat(config): Turn on new search & browse experience by default by @iprentic in https://github.com/datahub-project/datahub/pull/8737
- chore(ingest/s3) Bump Deequ and Pyspark version by @treff7es in https://github.com/datahub-project/datahub/pull/8638
- docs(ingest/openapi): Downgrade status from CERTIFIED to INCUBATING by @asikowitz in https://github.com/datahub-project/datahub/pull/8736
- feat(health): Adding Entity Health Status to the Lineage Graph View by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8739
- build(ingest): Pin mypy-boto3-sagemaker directly by @asikowitz in https://github.com/datahub-project/datahub/pull/8746
- feat(ingest/datahub): Improvements, bug fixes, and docs by @asikowitz in https://github.com/datahub-project/datahub/pull/8735
- docs(obseve): Adding Volume Assertion Guide by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8706
- fix(ingest/okta): Removed code closing okta's event_loop by @skrydal in https://github.com/datahub-project/datahub/pull/8675
- fix(highlight): disable full name highlight by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8750
- fix(ui): hide pages from web crawlers by @hsheth2 in https://github.com/datahub-project/datahub/pull/8738
- docs: add index pages for feature/deployment guides by @hsheth2 in https://github.com/datahub-project/datahub/pull/8723
- feat(docs): move versioned_sidebars to static-assets by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8743
- docs(observe): DataHub Operation freshness assertion guide by @zmcnellis in https://github.com/datahub-project/datahub/pull/8749
- feat(cli): support recursive deletes by @hsheth2 in https://github.com/datahub-project/datahub/pull/8709
- fix(ingest/bigquery): Handle null view_definition; remove view definition hash ids by @asikowitz in https://github.com/datahub-project/datahub/pull/8747
- feat(ingest/usage): Make cumulative query character limit configurable by @asikowitz in https://github.com/datahub-project/datahub/pull/8751
- fix(ingest/athena): Fixing db container id by @treff7es in https://github.com/datahub-project/datahub/pull/8689
- feat(systemMetadata): add pipeline names to system metadata by @hsheth2 in https://github.com/datahub-project/datahub/pull/8684
- ci: separate airflow build and test by @mayurinehate in https://github.com/datahub-project/datahub/pull/8688
- fix(ingest/athena): fix container linting by @hsheth2 in https://github.com/datahub-project/datahub/pull/8761
- fix(datahub-frontend) Give permission for start.sh so it can run by @rtekal in https://github.com/datahub-project/datahub/pull/8594
- feat(sql-parser): schema-aware output column casing by @hsheth2 in https://github.com/datahub-project/datahub/pull/8760
- fix(ingest/bigquery): Filter out fine grained lineage with no upstreams by @asikowitz in https://github.com/datahub-project/datahub/pull/8758
- feat(iceberg): Upgrade Iceberg ingestion source to pyiceberg 0.4.0 by @cccs-eric in https://github.com/datahub-project/datahub/pull/8357
- Allow frontend to use http proxy by @githendrik in https://github.com/datahub-project/datahub/pull/8691
- docs(observe): Dataset Profile volume assertion guide by @zmcnellis in https://github.com/datahub-project/datahub/pull/8764
- docs:fix broken img links under managed-datahub by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8769
- fix:small typo on graphql tutorial by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8741
- refactor(build): upgrade to gradle 7 & guava update by @david-leifker in https://github.com/datahub-project/datahub/pull/8745
- fix(siblings): space icons out by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8767
- chore(build): upgrade gradle wrapper by @hsheth2 in https://github.com/datahub-project/datahub/pull/8776
- feat(EntityService): batched transactions and ebean updates by @david-leifker in https://github.com/datahub-project/datahub/pull/8456
- fix(frontend): Fix"Logout with OIDC not working" by @FirKys in https://github.com/datahub-project/datahub/pull/8773
- docs:upgrade docusaurus version by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8770
- fix:change global graph url to static-assets by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8742
- doc(tests): fix endpoint param to push results by @anshbansal in https://github.com/datahub-project/datahub/pull/8783
- fix(elastic): improve error handling for profiling by @anshbansal in https://github.com/datahub-project/datahub/pull/8785
- chore(analytics): bump version by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8786
- docs(session): add documentation for session token duration and fix default by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8791
- fix(ingest/datahub): Support postgres; build(postgres): Modernize postgres docker setup by @asikowitz in https://github.com/datahub-project/datahub/pull/8762
- feat(airflow-plugin): add package type information by @mayurinehate in https://github.com/datahub-project/datahub/pull/8795
- feat(systemMetadata): Adding a lastRunId field system metadata by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8672
- added support for group-owners in dataflow entities by @dnks23 in https://github.com/datahub-project/datahub/pull/8154
- fix(ingest/tableau): fix tableau native CLL for snowflake, add type annotations by @mayurinehate in https://github.com/datahub-project/datahub/pull/8779
- fix(ingest/bigquery): fix partition and median queries for profiling by @mayurinehate in https://github.com/datahub-project/datahub/pull/8778
- docs: add datahub source to integrations page by @hsheth2 in https://github.com/datahub-project/datahub/pull/8787
- chore(ingest): upgrade sqlglot fork by @hsheth2 in https://github.com/datahub-project/datahub/pull/8775
- docs: minor fix on versioning navbar and dropdown by @jeffmerrick in https://github.com/datahub-project/datahub/pull/8790
- feat(ingest): drop sql_metadata parser by @hsheth2 in https://github.com/datahub-project/datahub/pull/8765
- fix(ingest): drop `wrap_aspect_as_workunit` method by @hsheth2 in https://github.com/datahub-project/datahub/pull/8766
- feat(search): Also de-duplicate the field queries based on field names by @iprentic in https://github.com/datahub-project/datahub/pull/8788
- feat(openapi): entity endpoints & analytics raw by @david-leifker in https://github.com/datahub-project/datahub/pull/8537
- docs(db-retention): update with default setting by @david-leifker in https://github.com/datahub-project/datahub/pull/8797
- fix(custom-search): fix custom search to be able to use unquoted query by @david-leifker in https://github.com/datahub-project/datahub/pull/8805
- feat: add feedback widget by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8732
- fix(gms): Fixed Recently Viewed section for users with '@' in the URN. by @skrydal in https://github.com/datahub-project/datahub/pull/8754
- fix(spark-test): upgrade gradle and fix spark smoke test by @david-leifker in https://github.com/datahub-project/datahub/pull/8777
New Contributors
- @zheyu001 made their first contribution in https://github.com/datahub-project/datahub/pull/8380
- @jsmilkstein made their first contribution in https://github.com/datahub-project/datahub/pull/8303
- @tusharm made their first contribution in https://github.com/datahub-project/datahub/pull/8298
- @PauloGoncalvesLima made their first contribution in https://github.com/datahub-project/datahub/pull/8558
- @Deepankarkr made their first contribution in https://github.com/datahub-project/datahub/pull/8615
- @ANich made their first contribution in https://github.com/datahub-project/datahub/pull/8495
- @siddiquebagwan-gslab made their first contribution in https://github.com/datahub-project/datahub/pull/8592
- @RChygir made their first contribution in https://github.com/datahub-project/datahub/pull/5363
- @Starkie made their first contribution in https://github.com/datahub-project/datahub/pull/8608
- @akhil7philip made their first contribution in https://github.com/datahub-project/datahub/pull/8667
- @zmcnellis made their first contribution in https://github.com/datahub-project/datahub/pull/8749
- @githendrik made their first contribution in https://github.com/datahub-project/datahub/pull/8691
- @FirKys made their first contribution in https://github.com/datahub-project/datahub/pull/8773
- @dnks23 made their first contribution in https://github.com/datahub-project/datahub/pull/8154
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.10.5...v0.11.0
v0.10.5
Released on 2023-08-02 by @david-leifker.
Release Highlights
NEW: Unified Search and Browse Experience
It’s here, it’s here! We are incredibly excited to roll out our re-designed, streamlined Search and Browse experience. End-users now have a one-stop-shop to search for specific data entities and browse across systems, making it easier than ever to find the most relevant and meaningful resources within DataHub.
Check out the screenshot below and get a full walk-through in this video!
<img width="1041" alt="CleanShot 2023-08-03 at 14 47 55@2x" src="https://github.com/datahub-project/datahub/assets/15873986/2f47d033-6c2b-483a-951d-e6d6b807f0d0">
User Experience
- Column-Level Lineage (CLL) visualization update: you can now visualize CLL relationships through DataJobs (i.e. Airflow DAGs)
- Unique Glossary Terms: We now prevent creating duplicate Glossary Term names within a Term Group
- Domains: You can now configure the Documentation tab to be the default landing page within a Domain
- Formatting updates to Row Count to make large numbers more human-readable (e.g. 3283337 is shown as 3.2M)
- Stats Tab: Y-axis scale now dynamically set to reflect the minimum & maximum values, improving readability
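To illustrate the kind of Row Count abbreviation described in the list above, here is a minimal sketch. It is not DataHub's UI code; the function name, the suffix set, and the choice to truncate rather than round (picked only because it reproduces the 3283337 example in these notes) are all assumptions.

```python
# Hypothetical helper showing one way to abbreviate large counts.
# Assumption: values are truncated (not rounded) to one decimal place.
_SUFFIXES = (
    (1_000_000_000, "B"),
    (1_000_000, "M"),
    (1_000, "K"),
)


def abbreviate_count(n: int) -> str:
    """Return a compact, human-readable form of a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    for threshold, suffix in _SUFFIXES:
        if n >= threshold:
            # Integer arithmetic avoids float rounding surprises.
            tenths = n * 10 // threshold
            return f"{tenths // 10}.{tenths % 10}{suffix}"
    return str(n)


if __name__ == "__main__":
    for value in (999, 1_500, 3_283_337):
        print(value, "->", abbreviate_count(value))
```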
Metadata ingestion
Ingestion Enhancements:
- BigQuery: Set `platform_instance` using project_id
- PowerBI: Ingest datasets not used in visualizations (tiles/pages)
- Kafka Connect: Ability to set `platform_instance`
- Nifi: Support for basic auth
- Presto on Hive: Extract all table properties from Hive Metastore
- Elasticsearch: Support for basic profiling
- Add advanced configuration for LDAP manager ingestion
Lineage Improvements:
- Schema-aware SQL parsing to derive column-level lineage
- Column-level lineage support for BigQuery, Tableau, and Snowflake View definitions
- Snowflake: Extract Snowpipe S3 lineage
Developer Experience
- Fine-grained ownership policies
- PATCH support for DataJob Inputs/Outputs
- New endpoints to extract size of time-series indices and truncate/cleanup time-series indices in Elasticsearch; support for bulk-deletes
- Initial support for exception reporting via Sentry
- New OpenAPI endpoint to get Task Status
- SDK: Easily generate container URNs
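To make the container URN item more concrete, here is a minimal, standard-library-only sketch of how a deterministic container URN can be derived from a set of identifying fields. It does not use the DataHub SDK: the `make_container_urn_sketch` name, the key fields, and the MD5-over-sorted-JSON hashing scheme are assumptions for illustration only.

```python
import hashlib
import json


def make_container_urn_sketch(key: dict) -> str:
    """Build a container URN from a dict of identifying fields (hypothetical helper)."""
    # Serialize with sorted keys so the same fields always hash to the same GUID.
    canonical = json.dumps(key, separators=(",", ":"), sort_keys=True)
    guid = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"urn:li:container:{guid}"


# Field names below are illustrative, not a required schema.
urn = make_container_urn_sketch(
    {"platform": "snowflake", "instance": "prod", "database": "analytics"}
)
print(urn)
```

Because the GUID depends only on the key's contents, repeated ingestion runs that describe the same container would produce the same URN under this scheme.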
Docs
- Improvements to our File-Based Lineage doc, specifically focused on Fine-Grained Lineage config components (link)
- Code examples of how to manage Posts within DataHub (link)
- Guide to generating custom browse paths for the new search experience (link)
What's Changed
- refractor(classification): datahub classifier init by @mayurinehate in https://github.com/datahub-project/datahub/pull/8193
- fix(glue): fix typo in reported warning, report with flow_urn by @mayurinehate in https://github.com/datahub-project/datahub/pull/8138
- fix(ingest/delta-lake): fix CI issues due to delta lake version bump by @mayurinehate in https://github.com/datahub-project/datahub/pull/8215
- Upgrade kafka and its dependencies to 3.4 in docker compose by @jinlintt in https://github.com/datahub-project/datahub/pull/8161
- chore(release): update default cli for managed ingestion by @pedro93 in https://github.com/datahub-project/datahub/pull/8226
- fix(ownership): Corrects graphQL resolver for entity operations by @pedro93 in https://github.com/datahub-project/datahub/pull/8219
- fix(cli/quickstart): handle docker hangs gracefully by @hsheth2 in https://github.com/datahub-project/datahub/pull/8211
- fix(cli): make quickstart robust to docker race conditions by @hsheth2 in https://github.com/datahub-project/datahub/pull/8233
- fix(search): tag/term should filter for both entity and field level by @anshbansal in https://github.com/datahub-project/datahub/pull/7881
- docs(tests): document test eval endpoint by @anshbansal in https://github.com/datahub-project/datahub/pull/8227
- feat(ingest/bigquery_v2): enable platform instance using project id by @asikowitz in https://github.com/datahub-project/datahub/pull/8216
- feat(stats): make rowcount more human readable by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8232
- docs(es): Update aws deploy docs to correct ElasticSearch version by @iprentic in https://github.com/datahub-project/datahub/pull/8240
- feat(sdk): support patches as MCPs in file source by @hsheth2 in https://github.com/datahub-project/datahub/pull/8220
- fix(apiAuth): add resources where applicable and update docs by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8234
- feat(patch): support datajob input output by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8190
- feat(ingest/unity): Set external url for containers and datasets by @asikowitz in https://github.com/datahub-project/datahub/pull/8238
- docs(airflow): add docs on custom operators by @matthew-coudert-cko in https://github.com/datahub-project/datahub/pull/7913
- chore(release): update datahub upgrade docs by @pedro93 in https://github.com/datahub-project/datahub/pull/8228
- fix(ingestion/tableau): Remove unused field documentViewId by @mohdsiddique in https://github.com/datahub-project/datahub/pull/8225
- feat(ui): create fast path for immediate processing of ui sourced changes by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8200
- fix(ingest/druid) Handling gracefully if no table returned in a schema by @treff7es in https://github.com/datahub-project/datahub/pull/8203
- fix(kafka-setup): bump kafka version by @david-leifker in https://github.com/datahub-project/datahub/pull/8245
- feat(ingestion/powerbi): Ingest datasets not used in PowerBI visualization(tiles/pages) by @mohdsiddique in https://github.com/datahub-project/datahub/pull/8212
- fix(sdk/dataflow): deprecate cluster and use env and platform_instance instead by @shubhamjagtap639 in https://github.com/datahub-project/datahub/pull/8201
- fix(ingest): pass platform correctly to browse path v2 helper by @asikowitz in https://github.com/datahub-project/datahub/pull/8244
- feat(search): Supporting Aggregations for hasX fields by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8241
- fix(ingest): Call validator on the base urn as well as aspect components when ingesting by @iprentic in https://github.com/datahub-project/datahub/pull/8250
- docs(website): adjust markprompt z-index so it's not covered by nav by @jeffmerrick in https://github.com/datahub-project/datahub/pull/8255
- fix(patch): Fix exception when using default patch for patching missing aspects by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8221
- fix(custom-search): revert underscore as quoted by @david-leifker in https://github.com/datahub-project/datahub/pull/8163
- chore(ci): add back optional static sleep for tests by @anshbansal in https://github.com/datahub-project/datahub/pull/8258
- chore(checkbox): darken all checkboxes by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8248
- chore(assertions): catch any exception on assertion delete by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8247
- feat(opensearch): Rollover usage events at a file size rather than time-based manner by @iprentic in https://github.com/datahub-project/datahub/pull/8182
- fix(ingest/okta): Set default of okta_profile_to_username_attr to email by @asikowitz in https://github.com/datahub-project/datahub/pull/8263
- feat(ui) Update Search & Browse to be a unified experience by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8235
- fix(ingest/tableau): split table columns query from datasources query by @mayurinehate in https://github.com/datahub-project/datahub/pull/8217
- fix(ingest/okta): Set default of okta connector to match OIDC defaults by @anshbansal in https://github.com/datahub-project/datahub/pull/8272
- feat(elasticsearch): Add endpoint for getting the size of timeseries indices by @iprentic in https://github.com/datahub-project/datahub/pull/8265
- feat(ingest/delete-cli): Add configurable batch size; update docs by @asikowitz in https://github.com/datahub-project/datahub/pull/8274
- fix aggregation sorting in browsev2 sidebar by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8276
- Support de-selecting browse paths by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8242
- feat(cli): Initial support for sending exceptions to Sentry by @treff7es in https://github.com/datahub-project/datahub/pull/7172
- fix(ingestion/powerbi): use admin api resolver to fetch modified workspaces by @mohdsiddique in https://github.com/datahub-project/datahub/pull/8273
- fix: dbt-athena types mapping for complex types by @svdimchenko in https://github.com/datahub-project/datahub/pull/8264
- feat(graphql) Prevent duplicate glossary term names within a group by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8187
- Add retries to JavaEntityClient:deleteReferencesTo by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8268
- feat(ingest): Create zero usage aspects by @asikowitz in https://github.com/datahub-project/datahub/pull/8205
- fix(docs) Update Chrome extension docs to reflect current reality by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8284
- refactor(validations): Add URL-based Routing to Dataset Validations Tab by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8254
- fix(metadata-io): retry transactions on serialization errors when using a PostgreSQL database by @Masterchen09 in https://github.com/datahub-project/datahub/pull/8278
- docs(ingest/lineage): Update fine grained file lineage docs by @eboneil in https://github.com/datahub-project/datahub/pull/8283
- docs(posts): add examples by @abiwill in https://github.com/datahub-project/datahub/pull/7688
- chore(deprecate): remove legacy sql table by @david-leifker in https://github.com/datahub-project/datahub/pull/8253
- fix(ingest/csv-enricher): Adding extra check in csv enricher to ignore non-urn urns by @treff7es in https://github.com/datahub-project/datahub/pull/8169
- tests(urn): Add tests for more cases of invalid urns by @iprentic in https://github.com/datahub-project/datahub/pull/8285
- feat(search): add search annotations for profile aspect by @anshbansal in https://github.com/datahub-project/datahub/pull/8282
- fix(ingest/snowflake): snowflake profiling geometry type by @mayurinehate in https://github.com/datahub-project/datahub/pull/8279
- refactor(unity): Remove databricks_cli and cleanup by @asikowitz in https://github.com/datahub-project/datahub/pull/8249
- Sidebar local storage setting + toggle tooltip by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8288
- fix(ui) Fix UI issues with self-referencing column level lineage by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8296
- feat(ui) Add ability to view CLL through DataJobs in lineage visualization by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8281
- docs(business glossary) Update business glossary docs by @eboneil in https://github.com/datahub-project/datahub/pull/8287
- docs(graphql): add developer guide for adding a new graphql endpoint by @iprentic in https://github.com/datahub-project/datahub/pull/8297
- fix(test): consolidate mae-consumer test entity registry by @david-leifker in https://github.com/datahub-project/datahub/pull/8309
- fix(ingestion) Fixes producing MAE events with browsePathsV2 aspect by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8304
- fix(embed): set embed url to false for tableau config by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8308
- fix(embed): hide chart & dashboard previews if not for looker by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8307
- fix(ingest/unity): Pin databricks-sdk and update docs by @asikowitz in https://github.com/datahub-project/datahub/pull/8293
- fix(ui) Only show search and browse V2 onboarding steps if flag is on by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8315
- fix(ingest/looker): Fix typo on ViewField creation for measures by @asikowitz in https://github.com/datahub-project/datahub/pull/8318
- docs(managed datahub): docs for v0.2.9 by @anshbansal in https://github.com/datahub-project/datahub/pull/8323
- feat(ingest/snowflake): snowpipe s3 lineage by @mayurinehate in https://github.com/datahub-project/datahub/pull/8262
- fix(ingest/postgres): fix profiling errors, skip json type column by @mayurinehate in https://github.com/datahub-project/datahub/pull/8291
- tests(elasticsearch): Add fixture test for basic scroll functionality by @iprentic in https://github.com/datahub-project/datahub/pull/8321
- feat(tableau): add config knobs for excluding external links from tableau by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8314
- fix(documentation): remove links from associatedUrn by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8319
- fix(browsev2): improved error handling by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8326
- fix(search) Add facets list to our cache key to avoid cache collisions by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8327
- feat(elasticsearch): Add rest.li endpoint that does truncation cleanup of a timeseries index by @iprentic in https://github.com/datahub-project/datahub/pull/8277
- Container link in browse v2 sidebar by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8305
- fix(browse): try to prevent overlapping pagination calls by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8329
- feat(usage): add max width to users tooltip by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8335
- feat(usagestats): Optimize elasticsearch query for usage stats aggregations by @iprentic in https://github.com/datahub-project/datahub/pull/8333
- feat(ingest): add YamlFileUpdater utility by @hsheth2 in https://github.com/datahub-project/datahub/pull/8266
- feat(ui) Show Acryl information with button and banner behind flag by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8330
- test(ingest/trino): xfail test to unblock CI by @asikowitz in https://github.com/datahub-project/datahub/pull/8340
- fix(restli): Add docs for get task status, and fix hostname regex by @iprentic in https://github.com/datahub-project/datahub/pull/8341
- docs(lineage): add read lineage example by @eboneil in https://github.com/datahub-project/datahub/pull/8322
- fix(async): submit additional default aspects only when not in async mode by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8320
- feat(auth): Fine grained ownership policies by @skrydal in https://github.com/datahub-project/datahub/pull/7499
- fix(ingest/s3): Fix for flaky s3 test - uploading s3 files in consistent order by @treff7es in https://github.com/datahub-project/datahub/pull/8367
- fix(ingest/airflow): Remove info log on import by @fjmacagno in https://github.com/datahub-project/datahub/pull/8246
- fix(ui) Update copy of the demo site acryl banner by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8370
- test(ingest/mysql): Configure sql_server tests for arm64 by @asikowitz in https://github.com/datahub-project/datahub/pull/8360
- fix(browse): filter entities by whether they might exist in the instance by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8355
- ci(docs): add missing deps for lxml package for vercel by @hsheth2 in https://github.com/datahub-project/datahub/pull/8372
- feat(browsepathv2): enable incremental update browsepath by @david-leifker in https://github.com/datahub-project/datahub/pull/8354
- chore(smoke-test): use a more recent ingestion cli version in tests by @david-leifker in https://github.com/datahub-project/datahub/pull/8374
- feat(stats): show size in bytes and scale at y=min by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8375
- fix(schema-registry): fix internal schema reg with custom duhe topic … by @david-leifker in https://github.com/datahub-project/datahub/pull/8371
- fix(java) Add try catch block when backfilling browse v2 by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8377
- feat(ingest): Add advanced configuration for LDAP manager ingestion by @bda618 in https://github.com/datahub-project/datahub/pull/7784
- fix(ingest): update pydantic helpers to address unique name issue by @mayurinehate in https://github.com/datahub-project/datahub/pull/8324
- fix(cli): local variable reference before assignment by @segun-s in https://github.com/datahub-project/datahub/pull/8222
- feat(ingest): Turn on browse path v2 creation by @asikowitz in https://github.com/datahub-project/datahub/pull/8342
- chore(ingest/delta-lake): cleanup import error handling by @hsheth2 in https://github.com/datahub-project/datahub/pull/8230
- test(ingest/nifi): Configure nifi tests for arm64 by @asikowitz in https://github.com/datahub-project/datahub/pull/8363
- build(ingest): Pin pydeequ to unblock CI by @asikowitz in https://github.com/datahub-project/datahub/pull/8381
- fix(ingest/sql-common): Fix profile_table_level_only by @asikowitz in https://github.com/datahub-project/datahub/pull/8331
- feat(ingest): schema-aware SQL parsing for column-level lineage by @hsheth2 in https://github.com/datahub-project/datahub/pull/8334
- fix(config) Set search and browse flags default off by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8378
- test(ingest/kafka): Configure kafka connect tests for arm64 by @asikowitz in https://github.com/datahub-project/datahub/pull/8362
- fix(ui): fix a too much recursion error when column lineage is highlighted by @Masterchen09 in https://github.com/datahub-project/datahub/pull/8207
- fix(ingest/s3): Deequ import rearragement by @treff7es in https://github.com/datahub-project/datahub/pull/8389
- feat(ingest): Add disable flag for TopicRecordNameStrategy by @segun-s in https://github.com/datahub-project/datahub/pull/8224
- refactor(graphql): make graphql engine extensible by @shirshanka in https://github.com/datahub-project/datahub/pull/8394
- feat(ui) Allow a configurable default tab for domain entity profile page by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8316
- test(ingest): Aspect level golden file comparison by @asikowitz in https://github.com/datahub-project/datahub/pull/8310
- test(ingest/airflow): Fix test for airflow 2.6.3 by @asikowitz in https://github.com/datahub-project/datahub/pull/8393
- feat(ingest/bigquery): support column-level lineage by @hsheth2 in https://github.com/datahub-project/datahub/pull/8382
- build(ingest): Inline import testing utils for check cli by @asikowitz in https://github.com/datahub-project/datahub/pull/8400
- refactor(ui): uniform ordering of items on the entities sidebar section by @sudhakarast in https://github.com/datahub-project/datahub/pull/8365
- test(ingest/testing-utils): Add back delta info ignore path by @asikowitz in https://github.com/datahub-project/datahub/pull/8402
- fix(ingest/bigquery): skip self-references when generating lineage by @hsheth2 in https://github.com/datahub-project/datahub/pull/8403
- feat(ingest): datamodel to ingest organisation role metadata for a dataset by @sheeru in https://github.com/datahub-project/datahub/pull/8267
- test(ingest/kafka-connect): Attempt to fix flaky test by @asikowitz in https://github.com/datahub-project/datahub/pull/8404
- feat(ingest/dbt-cloud): reduce graphql query complexity by @hsheth2 in https://github.com/datahub-project/datahub/pull/8390
- fix(ingest/snowflake): fix azure cloud region ids in external url by @mayurinehate in https://github.com/datahub-project/datahub/pull/8376
- feat(elasticsearch): Implement optimization to use reindexing instead… by @iprentic in https://github.com/datahub-project/datahub/pull/8352
- feat(ingest/presto-on-hive): Extracting all the table properties from Hive Metastore by @treff7es in https://github.com/datahub-project/datahub/pull/8348
- feat(openapi): Add openapi endpoint for getting task status by @iprentic in https://github.com/datahub-project/datahub/pull/8391
- feat(ingest/airflow): able to set `platform_instance` in `Dataset` by @dungdm93 in https://github.com/datahub-project/datahub/pull/8313
- test(ingest/minio): Configure delta lake minio tests for arm64 by @asikowitz in https://github.com/datahub-project/datahub/pull/8364
- docs(ingest): Add warning for Python 3.7 deprecation by @asikowitz in https://github.com/datahub-project/datahub/pull/8411
- fix(ingest/tableau): graceful handling of get all datasources failure… by @mayurinehate in https://github.com/datahub-project/datahub/pull/8406
- fix(owner): Corrects ownership aspect generation during update operations by @pedro93 in https://github.com/datahub-project/datahub/pull/8399
- chore(stats): change default stats lookback by @anshbansal in https://github.com/datahub-project/datahub/pull/8408
- feat(ingest/kafka-connect): allow setting platform_instance for kafka… by @mayurinehate in https://github.com/datahub-project/datahub/pull/8299
- fix(ingestion/powerbi): increment msal version by @mohdsiddique in https://github.com/datahub-project/datahub/pull/8385
- docs(perf-test) Update README by @eboneil in https://github.com/datahub-project/datahub/pull/8410
- fix(ingest/s3): fix test flakiness by @treff7es in https://github.com/datahub-project/datahub/pull/8416
- fix(ingest): tweak ingestion exit codes by @hsheth2 in https://github.com/datahub-project/datahub/pull/8418
- build(ingest/boto3): Update boto3-stubs to fix CI by @asikowitz in https://github.com/datahub-project/datahub/pull/8425
- feat(ingest/snowflake): View CLL from sql parsing of view definition by @asikowitz in https://github.com/datahub-project/datahub/pull/8419
- fix(ingest/snowflake): Add sqlglot as snowflake dependency by @asikowitz in https://github.com/datahub-project/datahub/pull/8427
- fix(schema-reg): allow other response codes from schema registry check by @david-leifker in https://github.com/datahub-project/datahub/pull/8302
- fix: add docs on update description via graphQL by @yoonhyejin in https://github.com/datahub-project/datahub/pull/8289
- docs(databricks/spark-lineage): Fix incorrect statement by @asikowitz in https://github.com/datahub-project/datahub/pull/8423
- feat(browsev2): styling updates and select platform by @joshuaeilers in https://github.com/datahub-project/datahub/pull/8428
- fix(ui ingestion): fixing issue where stale fields could stick around when changing recipes by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8421
- ci: workarounds for pyyaml installation by @hsheth2 in https://github.com/datahub-project/datahub/pull/8435
- build(ingest/boto3): Update boto3-stubs to fix CI by @asikowitz in https://github.com/datahub-project/datahub/pull/8452
- fix(ingestion-redshift): Fix Redshift ingestion logs by @arunvasudevan in https://github.com/datahub-project/datahub/pull/8454
- fix(ingest/bigquery): make sql parsing more robust by @hsheth2 in https://github.com/datahub-project/datahub/pull/8450
- fix(GreatExpections): AssertionRunEventClass does not match the examp… by @JifeiMei in https://github.com/datahub-project/datahub/pull/8243
- chore(ingest): hide ignore old/new state options by @hsheth2 in https://github.com/datahub-project/datahub/pull/8438
- docs(env): add env vars authentication by @david-leifker in https://github.com/datahub-project/datahub/pull/8436
- feat(graphql-plugins): add ability for plugins to call back to core e… by @shirshanka in https://github.com/datahub-project/datahub/pull/8449
- feat(io): refactor metadata-io module by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8306
- feat(ingest/mysql): Add estimate row count for mysql by @eboneil in https://github.com/datahub-project/datahub/pull/8420
- ingest(elasticsearch): add basic profiling by @anshbansal in https://github.com/datahub-project/datahub/pull/8351
- feat(ingest/lookml): fail when nothing was produced by @hsheth2 in https://github.com/datahub-project/datahub/pull/8464
- chore(ingest): drop bigquery-beta and snowflake-beta aliases by @hsheth2 in https://github.com/datahub-project/datahub/pull/8451
- feat(ingest/nifi): add support for basic auth in nifi by @mayurinehate in https://github.com/datahub-project/datahub/pull/8457
- Fix query_tab test that was failing on CI run by @kkorchak in https://github.com/datahub-project/datahub/pull/8463
- ingest(mysql): add storage bytes information by @anshbansal in https://github.com/datahub-project/datahub/pull/8294
- fix(cache) Fix caching bug with new search filters by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8434
- fix(browseV2) Escape forward slashes in browse v2 query by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8446
- fix(ingestion/powerbi-report-srever): handle requests.exceptions.JSONDecodeError by @mohdsiddique in https://github.com/datahub-project/datahub/pull/8442
- feat(sdk): easily generate container urns by @hsheth2 in https://github.com/datahub-project/datahub/pull/8198
- Update presto-on-hive URN in data_platforms.json by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8484
- fix(mysql): getting table name correctly by @anshbansal in https://github.com/datahub-project/datahub/pull/8476
- feat(ingest/elastic): reduce number of calls made by @anshbansal in https://github.com/datahub-project/datahub/pull/8477
- refactor(search): Support searching multiple entities in search() as in scroll() by @iprentic in https://github.com/datahub-project/datahub/pull/8461
- fix(ingest): Generate browse paths v2 for more sources; properly pass platform_instance by @asikowitz in https://github.com/datahub-project/datahub/pull/8501
- chore(ingest): add example of training metric/hyper parameters by @anshbansal in https://github.com/datahub-project/datahub/pull/8491
- feat(ingest): enable pipeline reporting by default by @hsheth2 in https://github.com/datahub-project/datahub/pull/8472
- feat(docs) Add guide for generating browsePathsV2 aspects by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8448
- fix(browsepathv2): default browse path with empty space by @anshbansal in https://github.com/datahub-project/datahub/pull/8503
- docs: add docs on sqlglot lineage by @hsheth2 in https://github.com/datahub-project/datahub/pull/8482
- feat(search ui): Adding support for pluggable filter rendering by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8455
- fix(ingest): hint at --update-golden-files option when tests fail by @hsheth2 in https://github.com/datahub-project/datahub/pull/8507
- ci: fix commandLine usage in build.gradle by @hsheth2 in https://github.com/datahub-project/datahub/pull/8510
- fix(ui) Fix broken dataPlatformInstance references in browseV2 by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8485
- fix(dataProduct) Show entity count excluding soft deleted entities by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8444
- feat(ui): Adding support for rendering assertion health status in Dataset Search Card, Search Preview, etc. by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8460
- docs(ingest/bigquery): add permissions to profile google drive backed… by @mayurinehate in https://github.com/datahub-project/datahub/pull/8490
- chore(ingest/tableau): miscellaneous cleanup refractor by @mayurinehate in https://github.com/datahub-project/datahub/pull/8417
- docs(ingest/lookml): clarify connection map config by @hsheth2 in https://github.com/datahub-project/datahub/pull/8508
- config(ebean): add ebean retry configuration by @david-leifker in https://github.com/datahub-project/datahub/pull/8500
- fix(ingest): respect max_threads for ingestion reporter by @hsheth2 in https://github.com/datahub-project/datahub/pull/8521
- chore(ingest): bump sqllineage and sqlparse by @hsheth2 in https://github.com/datahub-project/datahub/pull/8481
- fix(search): fix lightning cache enable logic by @david-leifker in https://github.com/datahub-project/datahub/pull/8522
- docs(docker): document docker container dependency tree by @david-leifker in https://github.com/datahub-project/datahub/pull/8496
- feat(lineage): Apply search flags to scroll query in LineageSearchService by @iprentic in https://github.com/datahub-project/datahub/pull/8518
- feat(search): Throw exception instead of returning an empty response from scroll in an error case by @iprentic in https://github.com/datahub-project/datahub/pull/8517
- fix(gms): GMS hang when upgrade image #8270 by @yangjiandan in https://github.com/datahub-project/datahub/pull/8271
- fix(ui): Allows deselection of members in add members modal for a group by @Sukeerthi31 in https://github.com/datahub-project/datahub/pull/8349
- fix(ui) Remove initial redirect logic from frontend by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8401
- fix(sso) - Add redirect_uri to authenticate route on 401 error by @mkamalas in https://github.com/datahub-project/datahub/pull/8346
- fix(auth): ignore case when comparing http headers by @lix-mms in https://github.com/datahub-project/datahub/pull/8356
- fix(ui): use locale lowercase when filtering columns of an entity in the lineage by @Masterchen09 in https://github.com/datahub-project/datahub/pull/8213
- feat(elasticsearch): allow bulk delete by @david-leifker in https://github.com/datahub-project/datahub/pull/8424
- feat(metrics): add metrics for aspect write and bytes by @david-leifker in https://github.com/datahub-project/datahub/pull/8526
- fix(ingest/build): Fix sagemaker mypy and flake8 issues by @treff7es in https://github.com/datahub-project/datahub/pull/8530
- feat(siblings): hiding non-existant siblings in FE by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8528
- fix(ingest): pin boto3-stubs in CI by @hsheth2 in https://github.com/datahub-project/datahub/pull/8527
- docs: small update to homepage by @shirshanka in https://github.com/datahub-project/datahub/pull/8483
- fix(ingest): remove duplication of tags by @anshbansal in https://github.com/datahub-project/datahub/pull/8532
- ci: reduce git fetch depth by @hsheth2 in https://github.com/datahub-project/datahub/pull/8473
- feat(ingest/vertica): performance improvement and bug fixes by @vishalkSimplify in https://github.com/datahub-project/datahub/pull/8328
- test(ingest): test case statements with sql parser by @hsheth2 in https://github.com/datahub-project/datahub/pull/8437
- feat(ingestion/tableau): support column level lineage for custom sql by @mohdsiddique in https://github.com/datahub-project/datahub/pull/8466
- fix(ingest/json-schema): convert non-string enums to strings by @benjamin-awd in https://github.com/datahub-project/datahub/pull/8479
- feat(browseV2): add browseV2 logic to system update by @RyanHolstien in https://github.com/datahub-project/datahub/pull/8506
- feat(cli): Adds ability to upload recipes to DataHub's UI by @pedro93 in https://github.com/datahub-project/datahub/pull/8317
- feat(presto-on-hive): allow v1 fieldpaths in the presto-on-hive source by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8474
- fix(ui) Make multiple small updates to new search and browse by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/8524
- feat(search): Allow aggregating on facets that are not explicitly part of default filter set by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/8540
- fix(test): increase siblings.js test stability by @david-leifker in https://github.com/datahub-project/datahub/pull/8542
New Contributors
- @matthew-coudert-cko made their first contribution in https://github.com/datahub-project/datahub/pull/7913
- @eboneil made their first contribution in https://github.com/datahub-project/datahub/pull/8283
- @fjmacagno made their first contribution in https://github.com/datahub-project/datahub/pull/8246
- @segun-s made their first contribution in https://github.com/datahub-project/datahub/pull/8222
- @sudhakarast made their first contribution in https://github.com/datahub-project/datahub/pull/8365
- @sheeru made their first contribution in https://github.com/datahub-project/datahub/pull/8267
- @dungdm93 made their first contribution in https://github.com/datahub-project/datahub/pull/8313
- @JifeiMei made their first contribution in https://github.com/datahub-project/datahub/pull/8243
- @kkorchak made their first contribution in https://github.com/datahub-project/datahub/pull/8463
- @Sukeerthi31 made their first contribution in https://github.com/datahub-project/datahub/pull/8349
- @lix-mms made their first contribution in https://github.com/datahub-project/datahub/pull/8356
- @benjamin-awd made their first contribution in https://github.com/datahub-project/datahub/pull/8479
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.10.4...v0.10.5
v0.10.4
Released on 2023-06-09 by @pedro93.
View the release notes for v0.10.4 on GitHub.
v0.10.3
Released on 2023-05-25 by @iprentic.
View the release notes for v0.10.3 on GitHub.
DataHub v0.10.2
Released on 2023-04-13 by @iprentic.
View the release notes for DataHub v0.10.2 on GitHub.
DataHub v0.10.1
Released on 2023-03-23 by @aditya-radhakrishnan.
View the release notes for DataHub v0.10.1 on GitHub.
DataHub v0.10.0
Released on 2023-02-07 by @david-leifker.
View the release notes for DataHub v0.10.0 on GitHub.
DataHub v0.9.6.1
Released on 2023-01-31 by @david-leifker.
View the release notes for DataHub v0.9.6.1 on GitHub.
DataHub v0.9.6
Released on 2023-01-13 by @maggiehays.
View the release notes for DataHub v0.9.6 on GitHub.
DataHub v0.9.5
Released on 2022-12-23 by @jjoyce0510.
View the release notes for DataHub v0.9.5 on GitHub.
[Known Issues] DataHub v0.9.4
Released on 2022-12-20 by @maggiehays.
View the release notes for [Known Issues] DataHub v0.9.4 on GitHub.
DataHub v0.9.3
Released on 2022-11-30 by @maggiehays.
View the release notes for DataHub v0.9.3 on GitHub.
DataHub v0.9.2
Released on 2022-11-04 by @maggiehays.
View the release notes for DataHub v0.9.2 on GitHub.
DataHub v0.9.1
Released on 2022-10-31 by @maggiehays.
View the release notes for DataHub v0.9.1 on GitHub.
DataHub v0.9.0
Released on 2022-10-11 by @szalai1.
View the release notes for DataHub v0.9.0 on GitHub.
DataHub v0.8.45
Released on 2022-09-23 by @gabe-lyons.
View the release notes for DataHub v0.8.45 on GitHub.
DataHub v0.8.44
Released on 2022-09-01 by @jjoyce0510.
View the release notes for DataHub v0.8.44 on GitHub.
DataHub v0.8.43
Released on 2022-08-09 by @maggiehays.
View the release notes for DataHub v0.8.43 on GitHub.
v0.8.42
Released on 2022-08-03 by @gabe-lyons.
View the release notes for v0.8.42 on GitHub.
v0.8.41
Released on 2022-07-15 by @anshbansal.
View the release notes for v0.8.41 on GitHub.
v0.8.40
Released on 2022-06-30 by @gabe-lyons.
View the release notes for v0.8.40 on GitHub.
v0.8.39
Released on 2022-06-24 by @maggiehays.
View the release notes for v0.8.39 on GitHub.
[!] DataHub v0.8.38
Released on 2022-06-09 by @jjoyce0510.
View the release notes for [!] DataHub v0.8.38 on GitHub.
[!] DataHub v0.8.37
Released on 2022-06-09 by @jjoyce0510.
View the release notes for [!] DataHub v0.8.37 on GitHub.
DataHub v0.8.36
Released on 2022-06-02 by @treff7es.
View the release notes for DataHub v0.8.36 on GitHub.
[!] DataHub v0.8.35
Released on 2022-05-18 by @dexter-mh-lee.
View the release notes for [!] DataHub v0.8.35 on GitHub.
v0.8.34
Released on 2022-05-04 by @maggiehays.
View the release notes for v0.8.34 on GitHub.
DataHub v0.8.33
Released on 2022-04-15 by @dexter-mh-lee.
View the release notes for DataHub v0.8.33 on GitHub.
DataHub v0.8.32
Released on 2022-04-04 by @dexter-mh-lee.
View the release notes for DataHub v0.8.32 on GitHub.