Abusing GitLab CI/CD to build a data engineering pipeline


Francesco Bresciani
Software Engineer at CodeLounge
Data engineering projects often start as a collection of small, individual projects. Over time these projects grow in number and complexity, and the processing pipeline becomes a serious affair. As the projects multiply and become more complex, so does the infrastructure that supports them. In this talk I’ll show an unconventional approach to managing the execution of such a pipeline, one that leverages CI/CD technologies. This approach ensures the correct execution and validation of a data pipeline that we at CodeLounge built to collect and process more than 4 million entries from the Swiss Registry of Commerce.

Francesco is a Software Engineer at CodeLounge, the center for software research & development, part of the Software Institute – USI, Lugano. He received his BSc in Computer Science Engineering in 2019 from SUPSI, Lugano. Before joining CodeLounge in 2021, Francesco worked for two startups, where he contributed to the frontend development of two web applications at different stages of maturity, one in the field of marketing and the other in the trading of raw materials. At CodeLounge, his responsibilities cover a broader spectrum of tasks, including data engineering, data analysis, and visual analytics.

In February 2019, the Software Institute started its SI Seminar Series. Every Thursday afternoon, a researcher of the Institute gives a short public talk on a software engineering topic of their choice. Examples include, but are not limited to, interesting new papers, seminal papers, personal research overviews, discussions of preliminary research ideas, tutorials, and small experiments.

