In the BERGAMOS project, we are collaborating with the DBCLS in Japan on biomedical text mining and misinformation detection.
Upon returning from Japan, I worked as a research assistant in the OntoGene group, programming in Python to improve our entity recognition pipeline. I also assisted in teaching introductory courses on information extraction and text mining, giving the occasional lecture and designing exercises.
Through a research grant issued by the Japan Society for the Promotion of Science, and building on my previous involvement with the OntoGene group, I had the opportunity to become a visiting researcher at the DBCLS, working on various text mining projects such as PubAnnotation.
Building upon my master's thesis and PubAnnotation's ability to obtain annotations from third-party services, I implemented a web service providing dependency parsing on demand. While this service was designed with PubAnnotation in mind, it is independent and open for any use.
My focus was on information extraction, HCI (particularly sustainable HCI), and big data. Below is a selection of projects and papers I worked on during this course.
The bachelor's programme, offered jointly by UZH and ETH, provided me with a strong basis in computer science, including some computer graphics, and an overview of the field of neuroinformatics.
This project at INI allowed me to participate in an investigation of the utility of a GFP (green fluorescent protein), following the experiment from the very beginning: injecting the virus, perfusion, sample preparation, and evaluation.
The Swiss Monitoring of Adverse Drug Events (SwissMADE) project is part of the SNSF-funded Smarter Health Care initiative, which aims to improve health services for the public. Its goal is to use text mining on electronic patient reports to automatically detect adverse drug events in hospitalised elderly patients who received anti-thrombotic drugs. The project is the first of its kind in Switzerland: the data is provided by four hospitals from both the German- and French-speaking parts of Switzerland, none of which had previously released electronic patient records for research, making extraction and anonymisation of records one of the major challenges of the project.
The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We present a publicly available pipeline that performs named entity recognition and normalisation in parallel, helping researchers find relevant publications and aiding downstream NLP tasks such as text summarisation. Our approach combines a dictionary-based system, chosen for its high recall, with two BioBERT-based models, chosen for their accuracy; their outputs are merged according to different strategies depending on the entity type. In addition, we use a manually crafted dictionary to increase performance on new concepts related to COVID-19. We have previously evaluated our work on the CRAFT corpus, and we make the output of our pipeline available on two visualisation platforms.
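To illustrate the kind of type-dependent combination strategy described above, here is a minimal sketch in Python. It is not the actual pipeline code: the span format (start, end, entity type) and the set of types where the model output is preferred are assumptions made for the example.

```python
def merge_annotations(dict_spans, model_spans,
                      prefer_model_for=("gene", "disease")):
    """Combine dictionary-based and model-based NER spans.

    For entity types listed in `prefer_model_for`, the model output wins
    on overlap; for all other types, the (higher-recall) dictionary span
    is kept and conflicting model spans are dropped.
    Spans are (start, end, entity_type) tuples with exclusive end.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    merged = list(model_spans)
    for span in dict_spans:
        clashes = [m for m in merged if overlaps(span, m)]
        if not clashes:
            merged.append(span)  # no conflict: keep the high-recall hit
        elif span[2] not in prefer_model_for:
            # dictionary wins for this type: replace conflicting model spans
            merged = [m for m in merged if not overlaps(span, m)]
            merged.append(span)
    return sorted(merged)
```

A real system would additionally have to reconcile differing entity-type labels and normalised concept identifiers, but the overlap-resolution core looks much like this.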
We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: automatic classification of adverse effect mentions in tweets (Task 1) and generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps.
Dependency parsing is a component in many text analysis pipelines. However, its performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve both the speed and the quality of dependency parses. As part of BLAH5, we built a web service that delivers improved dependency parses by taking into account named entity annotations obtained from third-party services. Our evaluation shows improved parse quality and better speed.
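One simple way a parser can exploit entity annotations is to collapse each multi-token named entity into a single token before parsing, so that complex terminology is treated as one syntactic unit. The following is a hedged sketch of that idea only, not the BLAH5 service itself; function and variable names are illustrative.

```python
def merge_entity_tokens(tokens, entity_spans):
    """Collapse multi-token named entities into single tokens.

    `tokens` is a list of strings; `entity_spans` is a list of
    (start, end) token indices with exclusive end. The merged token
    joins the entity's words with underscores, so a parser sees e.g.
    "tumor_necrosis_factor" as one unit.
    """
    spans = sorted(entity_spans)
    merged, i = [], 0
    while i < len(tokens):
        span = next((s for s in spans if s[0] == i), None)
        if span:
            merged.append("_".join(tokens[span[0]:span[1]]))
            i = span[1]
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

Fewer tokens also mean less work for the parser, which is one plausible source of the speed gains mentioned above.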
We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step.
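The look-up strategy with normalization for spelling variants might be sketched roughly as follows. This is a toy illustration, not OGER++'s actual matching code; the normalization rule and the concept identifier are assumptions.

```python
def normalize(term):
    """Crude variant normalization: lowercase and keep only
    alphanumeric characters, so "Interleukin-6", "interleukin 6"
    and "IL6"-style hyphen/space variants can collide on one key."""
    return "".join(ch for ch in term.lower() if ch.isalnum())

class DictionaryAnnotator:
    """Toy dictionary-based annotator: exact look-up on normalized
    surface forms, mapping each hit to a concept identifier."""

    def __init__(self, entries):
        # entries: iterable of (term, concept_id) pairs
        self.index = {normalize(term): cid for term, cid in entries}

    def lookup(self, term):
        # Returns the concept identifier, or None if no entry matches.
        return self.index.get(normalize(term))
```

The real system pairs such a look-up with a corpus-based disambiguation classifier, which this sketch omits entirely.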
This paper presents an approach to high-performance extraction of biomedical entities from the literature, built by combining a high-recall dictionary-based technique with a high-precision machine-learning filtering step. The technique is evaluated on the CRAFT corpus. We present the performance obtained, analyze the errors, and propose possible follow-up work.