In the BERGAMOS project, we are collaborating with the DBCLS in Japan on biomedical text mining and misinformation detection.
Upon returning from Japan, I worked as a research assistant in the OntoGene group, programming in Python to improve our entity recognition pipeline. I also assisted in teaching introductory courses on information extraction and text mining, giving the occasional lecture and designing exercises.
Through a research grant issued by the Japan Society for the Promotion of Science, and building on my previous involvement with the OntoGene group, I had the opportunity to become a visiting researcher at the DBCLS, working on various text mining projects such as PubAnnotation.
Building upon my master's thesis and PubAnnotation's ability to obtain annotations from third-party services, I implemented a web service providing dependency parsing on demand. While this service was designed with PubAnnotation in mind, it is independent and open for any use.
My focus was on information extraction, HCI (particularly sustainable HCI), and big data. Below is a selection of projects and papers I worked on during this course.
The bachelor's programme, offered jointly by UZH and ETH, provided me with a strong basis in computer science, including some computer graphics, and an overview of the field of neuroinformatics.
This project at INI allowed me to participate in an investigation of the utility of a GFP (green fluorescent protein), following the experiment from the very beginning: injecting the virus, perfusion, sample preparation, and evaluation.
The Swiss Monitoring of Adverse Drug Events (SwissMADE) project is part of the SNSF-funded Smarter Health Care initiative, which aims to improve health services for the public. Its goal is to use text mining on electronic patient reports to automatically detect adverse drug events in hospitalised elderly patients who received anti-thrombotic drugs. The project is the first of its kind in Switzerland: the data is provided by four hospitals from both the German- and French-speaking parts of Switzerland, none of which had previously released electronic patient records for research, making extraction and anonymisation of records one of the major challenges of the project.
The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We present a publicly available pipeline that performs named entity recognition and normalisation in parallel, helping researchers find relevant publications and aiding downstream NLP tasks such as text summarisation. Our approach combines a dictionary-based system, chosen for its high recall, with two BioBERT-based models, chosen for their accuracy; their outputs are merged according to different strategies depending on the entity type. In addition, we use a manually crafted dictionary to increase performance on new concepts related to COVID-19. We have previously evaluated our work on the CRAFT corpus, and we make the output of our pipeline available on two visualisation platforms.
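To illustrate the kind of type-dependent combination strategy described above, here is a minimal sketch in Python. It is not the actual pipeline code: the span format (start, end, entity type) and the set of types where the model output is preferred are assumptions made for the example.

```python
def merge_annotations(dict_spans, model_spans,
                      prefer_model_for=("gene", "disease")):
    """Combine dictionary-based and model-based NER spans.

    For entity types listed in `prefer_model_for`, the model output wins
    on overlap; for all other types, the (higher-recall) dictionary span
    is kept and conflicting model spans are dropped.
    Spans are (start, end, entity_type) tuples with exclusive end.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    merged = list(model_spans)
    for span in dict_spans:
        clashes = [m for m in merged if overlaps(span, m)]
        if not clashes:
            merged.append(span)  # no conflict: keep the high-recall hit
        elif span[2] not in prefer_model_for:
            # dictionary wins for this type: replace conflicting model spans
            merged = [m for m in merged if not overlaps(span, m)]
            merged.append(span)
    return sorted(merged)
```

A real system would additionally have to reconcile differing entity-type labels and normalised concept identifiers, but the overlap-resolution core looks much like this.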
We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: automatic classification of adverse effect mentions in tweets (Task 1) and generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps.
Dependency parsing is a component in many text analysis pipelines. However, its performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve both the speed and the quality of dependency parses. As part of BLAH5, we built a web service that delivers improved dependency parses by taking into account named entity annotations obtained from third-party services. Our evaluation shows improved parse quality and better speed.
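One simple way a parser can exploit entity annotations is to collapse each multi-token named entity into a single token before parsing, so that complex terminology is treated as one syntactic unit. The following is a hedged sketch of that idea only, not the BLAH5 service itself; function and variable names are illustrative.

```python
def merge_entity_tokens(tokens, entity_spans):
    """Collapse multi-token named entities into single tokens.

    `tokens` is a list of strings; `entity_spans` is a list of
    (start, end) token indices with exclusive end. The merged token
    joins the entity's words with underscores, so a parser sees e.g.
    "tumor_necrosis_factor" as one unit.
    """
    spans = sorted(entity_spans)
    merged, i = [], 0
    while i < len(tokens):
        span = next((s for s in spans if s[0] == i), None)
        if span:
            merged.append("_".join(tokens[span[0]:span[1]]))
            i = span[1]
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

Fewer tokens also mean less work for the parser, which is one plausible source of the speed gains mentioned above.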
We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step.
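The look-up strategy with normalization for spelling variants might be sketched roughly as follows. This is a toy illustration, not OGER++'s actual matching code; the normalization rule and the concept identifier are assumptions.

```python
def normalize(term):
    """Crude variant normalization: lowercase and keep only
    alphanumeric characters, so "Interleukin-6", "interleukin 6"
    and "IL6"-style hyphen/space variants can collide on one key."""
    return "".join(ch for ch in term.lower() if ch.isalnum())

class DictionaryAnnotator:
    """Toy dictionary-based annotator: exact look-up on normalized
    surface forms, mapping each hit to a concept identifier."""

    def __init__(self, entries):
        # entries: iterable of (term, concept_id) pairs
        self.index = {normalize(term): cid for term, cid in entries}

    def lookup(self, term):
        # Returns the concept identifier, or None if no entry matches.
        return self.index.get(normalize(term))
```

The real system pairs such a look-up with a corpus-based disambiguation classifier, which this sketch omits entirely.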
This paper presents an approach to high-performance extraction of biomedical entities from the literature, built by combining a high-recall dictionary-based technique with a high-precision machine-learning filtering step. The technique is evaluated on the CRAFT corpus. We present the performance obtained, analyze the errors, and propose possible follow-up work.