
Master's thesis in Data & AI: Scaling a Machine Learning Data Quality Framework for Generalizability (Info Support, Veenendaal)


Requirements

  • Full-time

Offer

  • Permanent employment
  • €1,000 per month (gross)
  • Company car
 

Vacancy in brief

Veenendaal
Enjoy professional guidance, training sessions, knowledge events, and two vacation days per month. The task involves extending and testing a data quality framework on varied datasets and AI models, including LLMs. The goal is to develop a tool to help evaluate and improve data for complex AI systems. Interests include data science and statistics. Read on to discover how this opportunity can transform your career.
 

About the company

Info Support
Direct employer
 

Full vacancy text

Challenging assignment with a choice of €1000 compensation, €500 plus a lease car, or €600 plus housing. You also get professional guidance, training sessions, knowledge events, brainstorming with colleagues and 2 vacation days per month.

We know that the quality of a training dataset is an important indicator of model performance, but how well does a data quality framework developed on a simple machine learning task generalize to real-world ML scenarios? In this thesis, you’ll extend and test an existing framework on diverse datasets, models and tasks, including LLMs. You’ll explore new quality dimensions and benchmark generalizability, building towards a practical tool that helps teams evaluate and improve their data in complex AI pipelines.

💡 Areas of Interest: data quality, machine learning, LLMs, statistics, data science

The impact of data quality on machine learning performance is well established, yet most frameworks are tested only in limited, controlled environments. Previously, we developed an automatic data quality framework (Automatic Assessment of Dataset Quality for ML), which showed promising results by quantifying data quality across three core dimensions: completeness, consistency and accuracy using synthetic data and a small set of machine learning models.
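To make the three core dimensions concrete, here is a minimal sketch of how each might be scored on a small tabular dataset. The function names and the exact definitions (fraction of filled cells, fraction of rows passing a rule, fraction of values matching a trusted reference) are illustrative assumptions and not the metrics used in the existing framework.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # hypothetical row format: column name -> value


def completeness(rows: List[Row], columns: List[str]) -> float:
    """Fraction of cells that are present (not None)."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(1 for r in rows for c in columns if r.get(c) is not None)
    return filled / total


def consistency(rows: List[Row], rule: Callable[[Row], bool]) -> float:
    """Fraction of rows that satisfy a user-supplied validity rule."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if rule(r)) / len(rows)


def accuracy(rows: List[Row], reference: List[Any], column: str) -> float:
    """Fraction of values in `column` that match a trusted reference list."""
    if not rows:
        return 1.0
    matches = sum(1 for r, ref in zip(rows, reference) if r.get(column) == ref)
    return matches / len(rows)
```

Each score lies between 0 and 1, so scores for different dimensions can be reported side by side for the same dataset.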

However, real-world systems operate in far more varied and complex contexts. Today’s AI models range from classical algorithms to advanced LLMs, and datasets span structured tables, text, sensor streams, and more. Without validating how such a framework performs across these environments, its insights remain confined to the lab. The question is not just "does it work?" but "how well does it generalize?"


The Assignment

This thesis explores the generalizability of the developed data quality assessment framework across a wide spectrum of machine learning use cases. You will:

· Extend dataset coverage using both real-world and synthetic data from diverse domains (e.g., healthcare, finance, social media, e-commerce, public benchmarks).

· Diversify task types, including classification, regression and clustering.

· Broaden algorithmic scope by comparing a range of machine learning models.

· Evaluate the role of LLMs in assessing data quality across different data domains.

· Compare model responses to varying data quality degradations across model sizes and architectures.

· Add quality dimensions such as uniqueness, timeliness, accessibility, believability and statistical measures.

· Benchmark generalizability by measuring the framework’s reliability across tasks, models, and datasets.

The final deliverable is an empirically validated, modular extension of the original framework - capable of guiding users towards improving their datasets for machine learning.

About Info Support

Info Support specializes in custom software, data/AI solutions, management, and training, and is active in the Finance, Industry, Agriculture, Food & Retail, Mobility & Public, and Healthcare sectors. We provide solid and innovative solutions for complex and critical software issues. Our headquarters are located in Veenendaal (NL) and Mechelen (BE). Info Support currently employs approximately 500 people.

Info Support's working method is characterized by a number of core values: solidity, integrity, craftsmanship, and passion. These core values are intertwined in our work and the way we interact with each other.

To ensure that all employees are always up to date with the latest developments, Info Support has an in-house knowledge center that eagerly satisfies the hunger for more or different knowledge and skills.

B2 language proficiency in Dutch is required.
