The key objective of the task area Federated Repositories is to make all relevant research objects and their relations findable through a semantic search portal. We will contribute the main components of an overarching, single NFDI portal. Our approach is guided by the insight that bibliographic search presupposes specific knowledge about an object (e.g. title, year, creator) and therefore often fails to meet real needs.
In practice, people want to accomplish a task and want to know whether, or how, it has been done before. Objects such as algorithms, software artefacts, metadata standards, data formats, publications, instruments, electronic lab notebooks, repositories, or single steps in a lab experiment all serve a purpose. Because physicists typically have a do-it-yourself mindset, NFDI4Phys wants to offer a platform for registering such solutions to specific problems, making them findable, accessible, and reusable far beyond the current system of publications and repositories like GitHub. Small technical steps in particular, which are "scientifically uninteresting" but save valuable time, can hardly be found with current systems. Our NFDI4Phys Repository will connect to existing and planned services such as re3data, the TIB Terminology Service, and the algorithm database of MaRDI, while unifying them under a single point of entry.
Goals of the federated NFDI4Phys Repository are to enable:
- search, access, and analysis of data resources without worrying about the physical location or file format;
- effortless, ideally one-click archiving of data resources together with the corresponding metadata (in close cooperation with FAIR Laboratories);
- easy access to related material;
- tracking whether, how, and how often data resources have been used, and assuring their authorship.
To enable efficient content-based search and access to the federated NFDI4Phys Repository, data resources have to be annotated with semantic metadata according to the requirements above. Semantic metadata enables content-based search: instead of relying solely on predefined vocabularies, which only retrieve resources whose terms the user already knows, queries can be complemented by a similarity-driven search over descriptions. Semantic metadata will also enable low-maintenance data integration and interoperability. At the highest level, descriptions of available services and data repositories will focus on their usage. The central NFDI4Phys portal will serve as the main point of access to NFDI4Phys services and resources and will enable query federation over all individual repositories connected to NFDI4Phys.
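To illustrate the difference between vocabulary lookup and similarity-driven search by description, the following minimal sketch ranks registry entries by the cosine similarity between a free-text query and each entry's purpose description. It is a lexical stand-in only: the registry entries, identifiers, and functions are hypothetical, and a production portal would more likely compare ontology annotations or learned text embeddings rather than raw word counts.

```python
import math
import re
from collections import Counter

# Hypothetical registry entries: identifier -> free-text description of purpose.
REGISTRY = {
    "fit-lorentzian": "fit a lorentzian line shape to a measured resonance spectrum",
    "hdf5-to-csv": "convert hdf5 detector files into csv tables",
    "stage-driver": "python driver to move a motorized translation stage",
}


def tokenize(text: str) -> Counter:
    """Lower-case the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, registry: dict, top_k: int = 2) -> list:
    """Rank registry entries by how similar their description is to the query."""
    q = tokenize(query)
    scored = [(rid, cosine(q, tokenize(desc))) for rid, desc in registry.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # The user describes what they want to do, not the name of the tool.
    print(search("how to fit a resonance spectrum", REGISTRY))
```

The query never mentions "lorentzian", yet the fitting entry ranks first because its description overlaps most with the stated purpose.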
The NFDI4Phys Repository will act as a registry for data and metadata transformations as well as for lab devices, enabling findability, accessibility, and reuse of public data and speeding up work in the lab. We will start filling the registry together with all our users to reach a critical mass for individual components (such as devices).
Authentication and authorization are a central challenge in federated infrastructures. A user identity management is needed that enables cryptographic proofs of authorship, provides the historical record of the data (data provenance), and grants access to data and computational resources across organizations. The data and computing services currently available to the NFDI4Phys community are highly decentralized and mostly operated by researchers themselves or by local infrastructure providers. Moreover, these infrastructures are generally available only to researchers of the local institution. To realize the vision of a common, distributed data space, each participant must be able to contribute data resources as well as parts of their storage and compute infrastructure.
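One common building block for a trustworthy provenance record is a hash chain, in which each entry stores the hash of its predecessor so that any later modification becomes detectable. The sketch below assumes a hypothetical `ProvenanceLog` class with plain-string actor names. It makes the history tamper-evident only; proving *who* wrote an entry would additionally require digital signatures bound to the federated user identity, which this sketch does not attempt.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: dict) -> str:
    """SHA-256 over a canonical (key-sorted) JSON encoding of the record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical append-only, hash-chained history of actions on data resources."""
    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str, resource: str) -> str:
        """Add an entry linked to the previous one and return its hash."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"actor": actor, "action": action, "resource": resource, "prev": prev}
        record["hash"] = _digest(record)
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute every hash and check the links; False if anything was altered."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("actor", "action", "resource", "prev")}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = ProvenanceLog()
    log.append("alice", "create", "dataset-1")
    log.append("bob", "derive", "dataset-1")
    print(log.verify())
```

Changing any field of an earlier entry, for example its actor, breaks the recomputed hash and `verify()` returns `False`.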
A reliable federated NFDI4Phys Repository will be realized in close cooperation with Task Area Metadata and Ontologies, based on a data space that draws on a linked-data knowledge graph, (meta)data formats, and protocols and interfaces that meet the requirements of the community. Our solution will ensure long-term, sustainable technical operation of the main portal, whose functionality relies mainly on trustworthy individual providers of information and storage.