Simplified the Metadata model overview page #46

Open
wants to merge 1 commit into
base: main
1 change: 1 addition & 0 deletions mkdocs.yaml
@@ -123,6 +123,7 @@ nav:
- "Overview": metadata/overview.md
- "Entities & Attributes": metadata/entities.md
- "Standards": metadata/standards.md
- "Data Dictionary Overview": metadata/data_dictionary_overview
- "Data Dictionary": metadata/data_dictionary
- "CLI Tools": cli_tools
- "FAQ": faq.md
Binary file added user_docs/assets/img/Overview_fig.png
25 changes: 3 additions & 22 deletions user_docs/metadata/overview.md
@@ -1,26 +1,7 @@
# The GHGA Metadata Model
## **Glossary**
- **Entity**: An Entity holds characteristics of a real-world object. Example: The Individual entity is described by the information (properties) for sex, year of birth and height.

- Synonyms: class, table, object
The GHGA metadata model aims to facilitate comprehensive submissions that maximize the amount of collected metadata without creating friction on the submitter side, enabling (reusable) submissions of different types of -omics data into GHGA. The schema consists of **Research Metadata** and **Administrative Metadata**. **Research Metadata** aims at maximising the reusability and FAIRness of the data, while **Administrative Metadata** focuses on managing the resources, such as creation or acquisition of the data, rights management, and disposition. The schema also differentiates between file types depending on whether they were generated through primary analysis (**Research Data File**) or secondary analysis (**Process Data File**), or provide supplementary information to the classes (**Supporting File**).

- **Property**: A Property is a single characteristic that can be used in combination with other characteristics to describe a real-world object. Example: The combination of the properties sex, year of birth and height describes the (real-world object) entity Individual.
The GHGA metadata model follows several internationally recognized concepts, standards, and resources to provide a metadata schema for sharing data in a standardized and harmonized fashion. For further details, see <https://zenodo.org/records/8341224>.

- Synonyms: attribute, element, field, slot

- **FAIR**: Findable, Accessible, Interoperable, Reusable

## **Introduction**
The German Human Genome-Phenome Archive (GHGA) provides a nation-wide resource for archiving, accessing and sharing multi-omics data produced and processed in research and health care initiatives in Germany. GHGA aims to bring these data together and make them easier to find for secondary use by adopting and adhering to [FAIR data principles](https://doi.org/10.1038/sdata.2016.18). To meet the domain-specific requirements, we developed the GHGA Metadata Schema, a schema for representing information pertaining to various aspects of our data.

This documentation serves as the description and reasoning behind the Metadata Model of GHGA, which encapsulates the metadata schema, its technical implementation, and resources to support submission of metadata. The Archive function of GHGA is envisioned to handle a wide variety of omics and research data. The GHGA metadata model aims at facilitating comprehensive submissions that maximize the amount of collected metadata without creating friction on the submitter side, enabling (reusable) submissions of different types of -omics data into GHGA. This metadata model can satisfy the heterogeneous needs of submitters while maintaining the FAIR principles, interoperability with EGA and facilitating streamlined user journeys.

Classes in the schema can be grouped into **Research Metadata** and **Administrative Metadata** based on the information they capture. The **Research Metadata** aims at maximising the reusability and FAIRness of the data, while the **Administrative Metadata** focuses on managing the resources, such as creation or acquisition of the data, rights management, and disposition. The Research Metadata classes include *Individual*, *Biospecimen/Sample*, *Experiment*, *Experiment Method*, *Analysis* and *Analysis Method*. The Administrative Metadata classes are *Dataset*, *Data Access Policy*, *Data Access Committee*, *Publication*, and *Study*.

The model also differentiates between three file types:

- **Research Data File**: A file which results from the omics experiment, such as sequencing of a sample.
- **Process Data File**: A file that is generated as output from an analysis performed on a *Research Data File*, such as alignment or processing.
- **Supporting File**: A file that provides further information about an *Individual*, *Experiment Method* or *Analysis Method*. These could be unstructured protocols or structured information, such as Phenopackets or BioCompute Objects.
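
The grouping of classes and file types described above can be sketched in a few lines of Python. This is only an illustration: the names `MetadataGroup`, `FileType`, `CLASS_GROUPS`, `SUPPORTING_FILE_TARGETS` and `classes_in` are hypothetical and are not part of the GHGA schema or its implementation.

```python
from enum import Enum


class MetadataGroup(Enum):
    """The two groups of classes named in the overview."""
    RESEARCH = "Research Metadata"
    ADMINISTRATIVE = "Administrative Metadata"


class FileType(Enum):
    """The three file types named in the overview."""
    RESEARCH_DATA_FILE = "Research Data File"  # output of the omics experiment
    PROCESS_DATA_FILE = "Process Data File"    # output of an analysis on a Research Data File
    SUPPORTING_FILE = "Supporting File"        # further information about selected classes


# Hypothetical lookup table: class name -> group, as listed in the text.
CLASS_GROUPS = {
    "Individual": MetadataGroup.RESEARCH,
    "Biospecimen/Sample": MetadataGroup.RESEARCH,
    "Experiment": MetadataGroup.RESEARCH,
    "Experiment Method": MetadataGroup.RESEARCH,
    "Analysis": MetadataGroup.RESEARCH,
    "Analysis Method": MetadataGroup.RESEARCH,
    "Dataset": MetadataGroup.ADMINISTRATIVE,
    "Data Access Policy": MetadataGroup.ADMINISTRATIVE,
    "Data Access Committee": MetadataGroup.ADMINISTRATIVE,
    "Publication": MetadataGroup.ADMINISTRATIVE,
    "Study": MetadataGroup.ADMINISTRATIVE,
}

# Classes that a Supporting File may describe, according to the text.
SUPPORTING_FILE_TARGETS = frozenset(
    {"Individual", "Experiment Method", "Analysis Method"}
)


def classes_in(group: MetadataGroup) -> list[str]:
    """Return the class names that belong to the given group, in listing order."""
    return [name for name, g in CLASS_GROUPS.items() if g is group]


if __name__ == "__main__":
    for group in MetadataGroup:
        print(f"{group.value}: {', '.join(classes_in(group))}")
```

A lookup table keeps the sketch close to the prose; the actual schema defines these classes with far more detail than a name and a group.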

Furthermore, we provide data submitters with a Submission Spreadsheet so that they can easily deposit their data within GHGA.
![GHGA Metadata Model Overview](../assets/img/Overview_fig.png)