
Robotics and Semantic Systems

Computer Science | LTH | Lund University


Events

CS MSc Thesis Presentation 6 February 2025

Lecture

From: 2025-02-06 13:15 to 14:00
Place: E:2405 (Glasburen)
Contact: birger [dot] swahn [at] cs [dot] lth [dot] se


One Computer Science MSc thesis to be presented on 6 February

On Thursday, 6 February, there will be a master's thesis presentation in Computer Science at Lund University, Faculty of Engineering.

The presentation will take place in E:2405 (Glasburen).

Note to potential opponents: Register as an opponent to the presentation of your choice by sending an email to the examiner for that presentation (firstname.lastname@cs.lth.se). Do not forget to specify the presentation you register for! Note that the number of opponents may be limited (often to two), so you might be forced to choose another presentation if you register too late. Registrations are individual, just as the oppositions are! More instructions are found on this page.


13:15-14:00 in E:2405 (Glasburen)

Presenter: Valentin Haara
Title: Enhancing Function Matching in Binary Code with Machine Learning Models
Examiner: Jonas Skeppstedt
Supervisor: Marcus Klang (LTH)

This thesis focuses on improving existing function-matching methods and proposing new techniques. The project explores different machine learning-based approaches and representations for comparing functions extracted from binary code. By refining the binary extraction process and introducing alternative function representations, this work improves the efficiency and accuracy of function matching. The proposed methods are evaluated on multiple datasets, including real-world Electronic Control Unit (ECU) binaries and open-source programs, and show significant improvements in execution time and scalability over existing brute-force approaches.

The most promising model combines Term Frequency-Inverse Document Frequency with truncated Singular Value Decomposition (TF-IDF/tSVD), using the function representation "data flow 2". It found the correct function among 6.5k functions in 82 per cent of the 1.7k runs. In 90 per cent of the runs, the correct function was among the top 10 hits.
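To illustrate how such top-1 and top-10 figures can be computed, the following is a minimal sketch of TF-IDF-based function ranking, not the implementation used in the thesis. It assumes each function is represented as a list of tokens (the token names below are hypothetical and do not reflect the "data flow 2" representation), ranks known functions by cosine similarity to a query, and reports the share of queries whose correct function is among the top k hits. The truncated SVD step is left out to keep the sketch dependency-free; in practice it would reduce the TF-IDF vectors to a lower dimension before the comparison.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute smoothed IDF weights from a list of token lists."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, so every weight stays positive.
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def vectorize(tokens, idf):
    """Build a unit-length sparse TF-IDF vector; unseen tokens are ignored."""
    counts = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {tok: w / norm for tok, w in vec.items()}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def rank(query_tokens, corpus_vectors, idf):
    """Return known function names, most similar to the query first."""
    query = vectorize(query_tokens, idf)
    return sorted(corpus_vectors,
                  key=lambda name: cosine(query, corpus_vectors[name]),
                  reverse=True)


def top_k_hit_rate(queries, corpus_vectors, idf, k):
    """Share of queries whose expected function is among the top k hits."""
    hits = sum(1 for expected, tokens in queries
               if expected in rank(tokens, corpus_vectors, idf)[:k])
    return hits / len(queries)


# Hypothetical token sequences standing in for extracted function features.
known = {
    "checksum": ["load", "xor", "shift", "xor", "store", "loop"],
    "copy_buffer": ["load", "store", "add", "cmp", "loop"],
    "parse_header": ["load", "cmp", "branch", "call", "ret"],
}
queries = [
    ("checksum", ["load", "xor", "xor", "shift", "loop"]),
    ("parse_header", ["cmp", "branch", "call", "load"]),
]

idf = fit_idf(list(known.values()))
corpus_vectors = {name: vectorize(toks, idf) for name, toks in known.items()}

if __name__ == "__main__":
    print("top-1 hit rate:", top_k_hit_rate(queries, corpus_vectors, idf, k=1))
```

With this framing, the 82 per cent figure corresponds to a top-1 hit rate and the 90 per cent figure to a top-10 hit rate over the set of query functions.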

Link to popular science summary: To be uploaded