30.04, 17:00 - 18:00, USI East Campus, Room D1.13

Abstract: Retrieval-Augmented Generation (RAG) systems have become the default way we deploy LLMs over proprietary knowledge, yet the way we test them has not kept pace. Most existing frameworks score answers one query at a time against a fixed snapshot of the corpus, which is useful for benchmarking but a long way from testing the system as a whole. In this talk I want to take a software-testing view of RAG systems and ask two questions. First, is our test suite actually exercising the retriever, or are we just hitting the same popular chunks over and over? I’ll introduce Chunk Coverage, an oracle-independent adequacy criterion. Second, what happens when the corpus moves under your feet: updates, stale versions, OCR noise, format drift? I’ll talk about metamorphic testing for RAG.
Chair: Carmen Armenti

Università della Svizzera italiana
Jinhan Kim is a postdoc at the TAU lab at Università della Svizzera italiana (USI), led by Prof. Paolo Tonella. He completed his Ph.D. at KAIST under the supervision of Prof. Shin Yoo. His research bridges Software Engineering (SE) and Artificial Intelligence (AI), focusing on the testing and reliability of AI systems. He develops principled methods to assess and improve the robustness of complex systems, from traditional software to LLMs, aiming to make them reliable and transparent for deployment in safety-critical domains. More information is available at https://jinhan.me/