Marie Schmit

Bioinformatics and software engineer


Experience

Research Engineer in Bioinformatics

Institut Pasteur (FR, Paris)
Team Web Integration, Scientific Programmation and Reproducibility

Generation and aggregation of metadata on life-science workflows. Deployment of a scalable knowledge base enabling efficient metadata retrieval for the ShareFAIR project, applying best practices for code reproducibility. Technical integration and lead. Support for project coordination (organisation, scheduling, communication).

Tools: Python, Docker, SPARQL, SQL, JavaScript, Bash

November 2023 - Present

Master's Thesis

Cranfield University (UK, Cranfield)
Bioinformatics Department

Optimisation, improvement and implementation of new functionalities in a Java application (mapoptics) for DNA optical maps visualisation. Improvement and implementation of a comparative genomic pipeline. Integration of a server and its Docker image for genomic data comparison.

Tools: Java, Docker, Bash

May 2023 - Sept 2023

Internship, Low-Code Developer

Siemens Healthineers (DE, Erlangen)

Building application prototypes using low-code tools (Power Apps and Power Automate). Preparation, moderation and follow-up of workshops with stakeholders from different departments of Siemens Healthineers using the design thinking approach. Analysis of technical feasibility, user research, problem framing.

Tools: Power Apps, Power Automate, Agile Methods

March 2022 - Aug 2022

Internship, Software Developer

AddixWare (FR, Aix-en-Provence)

Implementation of IT tools optimising the search for candidates.

Tools: Python (PyPDF2, Pandas, JSON, tika, docx), Django, HTML, CSS

Jan 2021

Education

MSc Applied Bioinformatics

Cranfield University (UK, Cranfield)

Programming applied to biology: bioinformatics with Python (Biopython, seaborn), data analysis and statistics using R, Java, machine learning using R. Gene sequencing with next-generation sequencing (NGS) and third-generation sequencing (3GS). Epigenetics, proteomics, metabolomics (R, QIIME2). Front-end and back-end software development: data integration and interaction (SQLite, JavaScript, Node.js, HTML).

Group project: Building a pipeline to assess the quality of organelle assemblies and annotations (workflow creation with Snakemake). Building a script to automatically curate organelle assemblies by assessing the quality of the predicted coding sequences (Biopython, BCBio).

Sept 2022 - Sept 2023

System engineer in Computer Science and Microelectronics (ISMIN)

Ecole des Mines de Saint-Etienne (FR, Gardanne)

Computer science (C and C++, SQL, Python, machine learning), algorithms, graphs and optimisation, statistics and probability, microelectronics (embedded C, assembly), management.

Sept 2020 - Sept 2023

Preparatory classes MPSI / MP (Maths and Physics)

Lycée Jean-Baptiste Kléber (FR, Strasbourg)

Intensive preparation for the national competitive entrance exams to French engineering schools.

Sept 2018 - Sept 2020

Baccalauréat scientifique

Lycée Louis Casimir Teyssier (FR, Bitche)

Mention "very good"

Activities: European class (German), MATh.en.JEANS, Latin

Sept 2015 - Sept 2018

Skills

Programming Languages & Tools
Bioinformatics
Platforms
Soft skills
Office software
Languages


Publications & Posters

A Standards-Based Knowledge Graph that Bridges Scientific Workflows, Run-Time Provenance, and Tool Registries
Authors: Marie Schmit, Ulysse Le Clanche, George Marchment, Sarah Cohen-Boulakia, Olivier Dameron, Alban Gaignard, Frédéric Lemoine, Hervé Ménager

SWAT4HCLS 2026, Mar 2026, Amsterdam, Netherlands (International conference)

Life science workflows are now prevalent for implementing, executing, and sharing complex data analyses, increasing their scalability and reproducibility. Adhering to the FAIR principles for software further reinforces their reproducibility and the reliability of their results. To maximize their FAIRness, consistent and standardised annotations are critical across several levels: workflows, individual steps, software tools, and input/output data. Such comprehensive metadata make workflows easier to understand, reuse and reproduce, while keeping track of the provenance of their results. However, a unified, queryable knowledge framework that integrates workflows with enriched metadata is lacking. To address this, we developed an integrated workflow knowledge base, that consolidates FAIR metadata from diverse sources and workflow languages into a standardised graph-based representation. It leverages established ontologies and standards (e.g. EDAM, schema.org) to enrich metadata, and link the workflow structure with its execution traces. Our approach provides FAIR-compliant metadata of publicly available pipelines, enabling queries at every granularity level, while accounting for the quality of source data annotation.
2026

BioFlow-Insight: facilitating reuse of Nextflow workflows with structure reconstruction and visualization
Authors: George Marchment, Bryan Brancotte, Marie Schmit, Frédéric Lemoine, Sarah Cohen-Boulakia
doi:10.1093/nargab/lqae092

NAR Genomics and Bioinformatics, Volume 6, Issue 3, September 2024

Bioinformatics workflows are increasingly used for sharing analyses, serving as a cornerstone for enhancing the reproducibility and shareability of bioinformatics analyses. In particular, Nextflow is a commonly used workflow system, permitting the creation of large workflows while offering substantial flexibility. An increasing number of Nextflow workflows are being shared on repositories such as GitHub. However, this tremendous opportunity to reuse existing code remains largely underutilized. In cause, the increasing complexity of workflows constitute a major obstacle to code reuse. Consequently, there is a rising need for tools that can help bioinformaticians extract valuable information from their own and others’ workflows. To facilitate workflow inspection and reuse, we developed BioFlow-Insight to automatically analyze the code of Nextflow workflows and generate useful information, particularly in the form of visual graphs depicting the workflow’s structure and representing its individual analysis steps. BioFlow-Insight is an open-source tool, available as both a command-line interface and a web service. It is accessible at https://pypi.org/project/bioflow-insight/ and https://bioflow-insight.pasteur.cloud/.
2024

Interests & Volunteering

Volunteering
Communication manager, co-founder - SolidarISMIN (student charity)
2021-2023

Communication within the campus (students, administration). Management of the association's social media accounts. Written and visual communication.
Fundraising, student awareness and mobilisation, organisation of events and partnerships for various associations (Aïda, Règles élémentaires, Association Neurofibromatoses et Recklinghausen).
Supervision and coordination.

Logistics manager - FEI (Forum Entreprises ISMIN)
2021-2022

Material management of the ISMIN Business Forum: booking and setting up rooms and equipment, organising the buffet and tea room, scheduling the team.

Communication manager - Gala
2021-2022

Communication with graduates and their families, internal communication within the campus.
Organisation of the School Gala, on the occasion of the graduation ceremony.

Student sports office
2021-2022

Organisational support

Société Protectrice des Animaux (SPA)
2021

Volunteer help at a dog shelter

Interests