Hi, I am Sara El-Gebali, PhD

Metadata Specialist at DataCite

I am a dedicated Project Lead, Product Owner, and Research Data Manager with a Ph.D. in cancer research. My journey has been driven by a passion for transforming data into meaningful insights, and I have been fortunate to work on projects that align with this goal.
Currently, as a Metadata Specialist at DataCite, I contribute to strategic planning, promote and refine the DataCite metadata schema, and ensure its seamless integration into broader PID infrastructures. I also perform quality assurance checks and provide training and support to enhance metadata quality.
Before joining DataCite, I worked at SciLifeLab-DataCentre, where I led the implementation of open-source technical solutions. I also had the pleasure of leading a team of developers and data stewards in an agile environment. My certification as a product owner from scrum.org has been instrumental in this role.
In my previous position as Unit Head for Research Data Management at the Max Delbrück Centre for Molecular Medicine (MDC), I helped develop strategies, policies, and procedures to support open research.
One of the projects I am proud of is leading the development of the Open Data module for NASA's OpenCore project. It was a unique opportunity to coordinate efforts across the globe.
Co-founding FAIRPoints and serving on the steering committee of the FAIR Digital Objects Forum have allowed me to engage with a community that shares my commitment to the FAIR data principles.
I also mentor at Open Life Science and actively participate in various communities, promoting Open & FAIR Research practices and advocating for equality, diversity, and inclusion.
If you are interested in discussing FAIR data principles, open science, or data-driven research practices, I would love to connect.

Professional Scrum Product Owner™ I (PSPO I)
Open Data Module Lead
Carpentries Instructor
Leadership
Open Science
Teaching

Skills

Experiences

DataCite

November 2023 - Present

Germany-Remote

DataCite is a world-leading provider of persistent identifier services to help make research outputs and resources findable, citable, connected, and reused globally.

Metadata Specialist

November 2023 - Present

Responsibilities:
  • Champion the promotion, refinement, and integration of the DataCite metadata schema across broader PID infrastructures.
  • Conduct quality assurance checks and metric analysis to elevate metadata quality.
  • Contribute to the strategic planning and concept development within the PID4NFDI project.
  • Provide expert training, guidance, and support.

FAIRPoints

2022 - Present

Remote

An event series highlighting pragmatic, community-developed measures for implementing the FAIR (Findable, Accessible, Interoperable, Reusable) data principles.

Co-Founder

2022 - Present

Responsibilities:
  • Initiated and currently co-lead FAIRPoints, a pivotal platform fostering discussions and training on FAIR principles globally.
  • Organize international webinars and workshops aimed at advancing the implementation of data management standards.

FAIR Digital Objects Forum

Remote

The FDO Forum is proposed as a neutral place where policymakers, researchers, and technology experts can meet to exchange information about all relevant aspects of FDOs.

Steering Committee Member

2022 - Present

Responsibilities:
  • Engage in strategic planning and initiative execution to promote the adoption of FAIR Digital Objects in research and data management.
  • Develop guidelines and standards to improve the interoperability, accessibility, and reusability of digital objects.

Open Life Science

2023 - Present

Remote

OLS is a not-for-profit organisation dedicated to capacity building and diversifying leadership in research worldwide. We imagine a future where research is accessible, inclusive, and equitable for everyone.

Governance Committee Member

2023 - Present

Responsibilities:
  • Advise on policy and direction within Open Life Science to promote ethical and inclusive open science practices.
  • Mentor and guide early-career researchers in implementing effective open science principles within their projects.

NASA Transform to Open Science (TOPS)

Remote

Transform to Open Science (TOPS) is a NASA initiative designed to rapidly transform agencies, organizations, and communities toward an inclusive culture of open science. TOPS is part of NASA's Open-Source Science Initiative.

Open Data Module Lead

2022

Responsibilities:
  • Led the development of the Open Data module within the NASA OpenCore project, enhancing data accessibility and openness.

RDA & EOSC Future

August 2021 - Present

Remote

RDA/EOSC Future Domain Ambassador

August 2021 - Present

Responsibilities:
  • Deliver a series of “Ask Me Anything”-style events featuring keynote speakers from RDA and EOSC groups, focused on RDA activities and EOSC solutions for FAIR implementation and open science practices.

SciLifeLab-DataCentre

August 2021 - October 2023

Uppsala-Sweden

Project Leader for Metadata and Curation

August 2021 - October 2023

Responsibilities:
  • Directed the implementation of multiple open-source technical solutions, notably the Digital Research Hub for Biosamples metadata handling.
  • Promoted Open & FAIR practices extensively through strategic outreach and training programs.
  • As Agile Product Owner, managed a dynamic team of developers and data stewards.

Max Delbrück Centre for Molecular Medicine (MDC)

Berlin-Germany

Head of the Research Data Management Team

2020 - 2021

Responsibilities:
  • Established and led the research data management unit, crafting policies and strategies that strengthened research data governance and support.
  • Supported and advised researchers, providing training and workshops.

Cambridge-UK

Scientific Database Curator

2016 - 2019

Responsibilities:
  • Curated extensive protein sequence datasets at Pfam.
  • Managed bioinformatics resources for protein biology.
  • Taught the use of bioinformatics resources for protein sequence analysis.
  • Contributed to regulatory and governance processes as a member of the Staff Association committee.

Heidelberg-Germany

Scientific Data Editor

2015 - 2016

Responsibilities:
  • Conducted in-depth analysis and interpretation of scientific content to create structured and machine-readable metadata.
  • Coordinated activities with software and bioinformatics teams to develop databases and user interfaces.
  • Developed workflows for editorial and production processes and created new avenues for promoting the SourceData project.

Education

MSc Molecular Pathology and Genomics
BSc Molecular Biology
International Baccalaureate

Projects

OpenCIDER
Founder 2020 - Present

Open Computational Inclusion & Digital Equity Resource.

RDM workshop
Developer 2020

Research Data Management workshop material

Publications

Ten simple rules for pushing boundaries of inclusion at academic events.
PLoS Comput Biol. 1 March 2024

Inclusion at academic events is crucial for ensuring diversity, building networks, and avoiding academic siloing. These ten simple rules aim to raise awareness and provide actionable suggestions to support Equity, Diversity, and Inclusivity practices in academic events.

NASA TOPS Open Science 101
Zenodo 6 December 2023

This is the initial v1.0.0 release of NASA’s Open Science 101 curriculum converted into Markdown from openscience101.org. The open science curriculum will introduce those beginning their open science journey to important definitions, tools, and resources; and provide participants at all levels recommendations on best practices.

SourceData: a semantic platform for curating and searching figures
Nature Methods 31 October 2017

SourceData is a platform that allows researchers and publishers to share scientific figures and (when available) underlying source data in a way that is machine-readable and findable.

Harmonizing Metadata Across Disciplines – Bioschemas and the DataCite Metadata Schema
DataCite 7 March 2024

The blog post highlights the integration of Bioschemas with the DataCite Metadata Schema to enhance metadata interoperability across various scientific disciplines. This collaboration aims to improve the findability and usability of life sciences data by standardizing metadata descriptions and facilitating cross-domain data sharing.

DataCite metadata training
Zenodo 9 April 2024

DataCite metadata training covering the DataCite Metadata Schema, the DataCite metadata journey, and making connections with DataCite metadata.

RDA EOSC Future Spotlight
RDA website 7 June 2020

RDA/EOSC Future Domain Ambassador Spotlight: Sara El-Gebali, Life Science.

InterPro in 2017: beyond protein family and domain annotations.
Nucleic Acids Res. 4 January 2017

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro’s predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.

InterPro in 2019: improving coverage, classification and access to protein sequence annotations
Nucleic Acids Res. 8 January 2019

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro’s sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.

Overcoming the obstacles to data sharing
Science|Business 14 September 2023

The article from Science|Business highlights the hurdles to effective data sharing, despite the push for open science. These challenges include variations across disciplines, legal and ethical concerns, and insufficient incentives. The Research Data Alliance (RDA) is tackling these issues by promoting FAIR data principles, offering guidelines, and supporting researchers with training and infrastructure. The RDA also emphasizes the role of domain ambassadors in fostering cross-disciplinary collaboration and making data sharing more practical and appealing for researchers.

Developing a mapping process framework
Zenodo 30 April 2024

FAIR-IMPACT conducts workshops on mapping and crosswalks within the community, aiming to standardise the mapping process by identifying common components and approaches used in community practices. These workshops consist of practical sessions, presentations on initial experiences, and feedback on the framework, followed by applying the framework to mapping exercises. Semantic artefacts, such as ontologies and vocabularies, are crucial for data interoperability and for implementing the FAIR principles. FAIR-IMPACT focuses on making mappings shareable and reusable, elevating them to first-class citizens in the FAIR data world. In previous FAIR-IMPACT workshops, we explored the motivation behind mappings and methodologies for making them FAIR. In this workshop, we aim to formalise a mapping process framework based on community practices and trial its implementation. The video recording is available on the official event page: https://fair-impact.eu/events/fair-impact-events/developing-mapping-process-framework.

ELIXIR FAIR Training Handbook
Zenodo 19 March 2024

Train-the-trainer handbook for making training materials FAIR

FAIR Digital Objects Participation Guidelines
Zenodo 18 December 2023

FAIR Digital Objects Forum, Diversity Statement & Participation Guidelines

Inclusion and Digital Equity From Theory to Practice
Zenodo 25 July 2023

Invited presentation, "Inclusion and Digital Equity from Theory to Practice", delivered at ISCB2023.

A New Odyssey: Pioneering the Future of Scientific Progress Through Open Collaboration
Zenodo 24 July 2023

The way forward with inclusive Open Science
Zenodo 10 March 2023

Presentation slides used during the AAAS 2023 panel discussion session "Foster Inclusion in Scientific Communities through Shared Data": https://aaas.confex.com/aaas/2023/meetingapp.cgi/Paper/31052

Data Management Planning
Zenodo 19 July 2022

BHKi Seminars (Bioinformatics, Software Engineering and Data Management): Data Management Planning, 19 July 2022

FAIR4Software workshop material
Zenodo 23 March 2022

Material for a 1.5-hour workshop on the FAIR principles as applied to software, covering what is currently known and best practices for FAIRness.

The road beyond Open
Zenodo 9 May 2022

Presentation for Open Science Pathways in the Earth, Space, and Life Sciences: https://www.scilifelab.se/event/agu-scilifelab/

BOSSConf_2022_Research_Data_Management
Zenodo 26 April 2022

Presentation on Open Science, FAIR principles, Research Data Management, and FAIR Software, delivered during BOSSConf 2022

Research Data Management 1 day workshop
Zenodo 25 February 2021

Materials for a one-day workshop on Research Data Management basics

Accomplishments

Data Visualizations with Kibana
Udemy May 2024

This course covered: fundamentals of Kibana; securing Kibana (users, roles, and spaces); creating basic and advanced visualizations; Kibana Query Language (KQL); creating and interacting with dashboards; and reporting and alerting.

Complete Guide to Elasticsearch
Udemy May 2024

This course covered: how to build a powerful search engine with Elasticsearch; the theory of Elasticsearch and how it works under the hood; how to write complex search queries; and advanced knowledge of Elasticsearch concepts and terminology.

FAIR-IMPACT for Signposting and RO-Crate

Awarded a FAIR-IMPACT grant to contribute to initiatives aimed at enhancing the findability and accessibility of research outputs through improved Signposting and the use of the RO-Crate metadata standard.

Wellcome Genome Campus Award for Best Practice in Supporting Equality and Diversity in Science

Awarded for outstanding contributions to promoting equality and diversity within the scientific community, highlighting efforts to create an inclusive environment that fosters diversity in scientific research and education.

Brave the Shave Campaign

Participated in the Brave the Shave campaign, raising significant funds for Macmillan Cancer Support, and also contributed to the Little Princess Trust by donating hair for wigs for children undergoing cancer treatment.

Volunteer at Refugee Camps in Lesvos

Contributed to humanitarian efforts by volunteering at refugee camps, providing support and aid to displaced individuals.

Open Science Community Egypt
OSCE 2023

Played a pivotal role in establishing the Open Science Community in Egypt. Provided strategic advice and support, facilitating workshops and seminars to train local researchers in open science methodologies, thereby promoting open science practices among Egyptian researchers.

eLife Innovation Leaders 2020 Mentorship Program
eLife June 2020

eLife Innovation Leaders is a new open leadership training and mentorship programme designed for innovators developing prototypes or community projects to improve open science and research communication.

Podcasts and Media Appearances

FAIRDataPodcast

Podbean Interview on Open Science

RSEng Dev Stories

imakefoss