Robotics and Semantic Systems

Computer Science | LTH | Lund University



CS MSc Thesis Presentation 6 February 2023


From: 2023-02-06 13:15 to 14:00
Place: E:4130 (Lucas)
Contact: birger [dot] swahn [at] cs [dot] lth [dot] se

One Computer Science MSc thesis to be presented on 6 February

On Monday, 6 February, there will be a master's thesis presentation in Computer Science at Lund University, Faculty of Engineering.

The presentation will take place in room E:4130 (Lucas).

Note to potential opponents: Register as an opponent to the presentation of your choice by sending an email to the examiner for that presentation. Do not forget to specify which presentation you are registering for! The number of opponents may be limited (often to two), so you may have to choose another presentation if you register too late. Registrations are individual, just as the oppositions are. More instructions are found on this page.

13:15-14:00 in E:4130 (Lucas)

Presenter: Silke Kylberg
Title: Optimizing End-to-End Neural Speaker Diarization for Swedish Customer Service Conversations
Examiner: Pierre Nugues
Supervisors: Dennis Medved (LTH), Ludwig Engström (Voxo)

Speaker diarization is a method used to answer the question "who spoke when" in an audio recording. The applications vary from movies to telephone calls, and in combination with a speech recognition system, speaker diarization can be used to enrich speech-to-text transcription with speaker labels. However, speaker diarization often requires a lot of training data. In this thesis, we investigated how to train the EEND-vector Clustering model with different types of datasets to achieve good diarization performance for Swedish customer service calls. The model was trained with English and Swedish non-domain-specific simulations, Swedish domain-specific simulations, and real Swedish telephone conversations annotated with voice activity detection (VAD). Thereafter, real Swedish telephone conversations, annotated using VAD, were used both for fine-tuning the pre-trained models and for evaluating model performance. The best performance was reached when fine-tuning the model pre-trained with Swedish and English non-domain-specific simulations, with a diarization error rate (DER) of 12.87%.
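For readers unfamiliar with the DER figure quoted in the abstract, the following is a minimal illustrative sketch of how a diarization error rate can be computed. It is not the evaluation code used in the thesis. It assumes a simplified setting: audio is split into fixed-length frames, each frame has at most one speaker (no overlapping speech), and silence is marked as None. The function name and the example labels are hypothetical. DER is taken as missed speech plus false alarms plus speaker confusion, divided by the total amount of reference speech, after finding the best one-to-one mapping between hypothesis and reference speaker labels.

```python
from itertools import permutations


def diarization_error_rate(reference, hypothesis):
    """Frame-level DER for non-overlapping speech (illustrative only).

    reference, hypothesis: equal-length sequences of per-frame speaker
    labels, where None marks a non-speech frame.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")

    frames = list(zip(reference, hypothesis))
    total_speech = sum(r is not None for r, _ in frames)
    if total_speech == 0:
        raise ValueError("reference contains no speech")

    # Speech in the reference that the system labelled as silence.
    missed = sum(r is not None and h is None for r, h in frames)
    # Silence in the reference that the system labelled as speech.
    false_alarm = sum(r is None and h is not None for r, h in frames)

    # Frames where both agree that someone is speaking.
    both = [(r, h) for r, h in frames if r is not None and h is not None]
    ref_speakers = list({r for r, _ in both})
    hyp_speakers = list({h for _, h in both})

    # Hypothesis labels are arbitrary, so try every one-to-one mapping
    # onto reference speakers and keep the one with the most matches.
    # Padding with None lets surplus hypothesis speakers stay unmapped.
    size = max(len(ref_speakers), len(hyp_speakers))
    padded_ref = ref_speakers + [None] * (size - len(ref_speakers))
    best_correct = 0
    for perm in permutations(padded_ref, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)
    confusion = len(both) - best_correct

    return (missed + false_alarm + confusion) / total_speech


# Hypothetical 10-frame example: one missed frame, one false alarm,
# and one frame attributed to the wrong speaker, over 7 speech frames.
reference = ["A", "A", "A", None, "B", "B", "B", "B", None, None]
hypothesis = ["s1", "s1", None, None, "s2", "s2", "s1", "s2", "s2", None]
der = diarization_error_rate(reference, hypothesis)
print(f"DER = {der:.2%}")
```

Evaluations in practice usually work on time durations rather than fixed frames and must also handle overlapping speech, so this sketch only conveys the structure of the metric.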

Link to popular science summary: To be updated