Program

Learn about this year’s Hi! PARIS Summer School Program!

Event schedule

Keynotes

This year’s Hi! PARIS Summer School will host four Keynote addresses.
More information on each keynote’s topic and speaker can be found below.

Keynote 1
Generative AI for image applications
Date: July 3, 2023
Start: 3:30 pm
End: 4:30 pm
Abstract:

The main goal of this talk is to give the participants a comprehensive overview of the different methods of generative modeling, with special attention to their application in image generation. The talk will cover different techniques and models from this field.

We will first look at state-of-the-art image models such as Variational AutoEncoders (VAEs), Generative Adversarial Networks (GANs), and Normalizing Flows (NFs). These models have revolutionized the field and have been instrumental in generating realistic images. In addition, we will explore their applications in conditional image generation so that participants understand how these techniques can be used in practical scenarios.

We will then turn to an emerging competitor in the generative modeling field: diffusion models, also known as Score-Based Generative Models (SGMs). We will gain an in-depth understanding of the mathematical principles underlying these models, including the time-reversal of stochastic processes. We will also explore the connections between diffusion models and Regularized Optimal Transport and their applications in control theory.

Keynote 2
Who Benefits from the Data Economy? 
Date: July 4, 2023
Start: 1:00 pm
End: 2:00 pm
Abstract:

In the public debate around privacy and the data economy, several claims have been made concerning the benefits that multiple stakeholders may accrue from the collection and analysis of consumer data. How many of those claims are empirically validated by independent research? I will review prior work and present a series of ongoing studies that aim at understanding and estimating how the economic value extracted from consumer data is being allocated to different stakeholders, and the way privacy protection may influence those allocations.

Keynote 3
Leveraging the Structure of Data
Date: July 5, 2023
Start: 1:00 pm
End: 2:00 pm
Abstract:

Although predictions from machine learning models influence more and more of our lives, the standard way of posing an ML problem has remained relatively unchanged for decades. In the search for better models, a new and popular family of techniques (sometimes called Graph Machine Learning) has emerged. These techniques expand beyond the features of an individual entity and instead pull information from its relationships. The methods offer a tantalizing way of improving task performance by leveraging previously unused information. However, there is no free lunch: these models can be more complex, more difficult to train, and harder to interpret. This talk will discuss the fundamentals of graph machine learning, a few models, and some insights from years of real-world applications.

Keynote 4
Harnessing AI for Organizational Theory Research, While Avoiding its Pitfalls
Date: July 6, 2023
Start: 1:00 pm
End: 2:00 pm
Abstract:

This talk will explore the intersections of AI, organizational theory, and managerial practice. I will highlight emerging methods for measuring different facets of culture, cognition, and social networks in organizations and illustrate how they can shed new light on the drivers of individual, group, and organizational performance. I will also discuss potential pitfalls of AI in organizational research and some admittedly incomplete and evolving thinking on how to avoid them. Applications of these methods to such topics as post-merger cultural integration, cognitive diversity in groups, organizational identification, and innovation will be discussed.

Tutorials

This year’s Hi! PARIS Summer School tutorials are meant to teach participants the best programming practices in AI and Machine Learning. Tutorials will be organized in two parallel tracks: Track A – Data Science and AI for Business and Society, and Track B – Theory and Methods of AI and Data Science.

Both tracks will be held simultaneously during the week, and participants will be able to choose between the two tracks for each of the six tutorials.
More information on each tutorial can be found below.

Track A – Data Science and AI for Business and Society

Tutorial 1.A
Image Recognition Using Deep Learning: Implementation and Application
Date: July 3, 2023
Start: 11:00 am
End: 3:00 pm
Abstract:

This 3-hour module will offer a hands-on introduction to deep-learning-based image recognition tools. Participants will gain familiarity with preparing and importing images into software (Python) and applying one of the foundational deep learning architectures to classify the images and create vector representations. We will discuss different applications of the output of deep learning tools to extract managerial and scientific insights. In particular, the course will discuss how these tools can be used to create large-scale measures of constructs that have otherwise proven elusive or susceptible to measurement bias.

Requirements:

  • Basic knowledge of linear algebra is helpful but not required.
  • Basic knowledge of Python (e.g. libraries such as pandas and numpy) is helpful but not required.
  • Basic familiarity with standard OLS regression models: you should be familiar with what it means to estimate relationships between variables using OLS models.
  • A Gmail account is required to open the Google Colab notebooks, which will be shared before the class.

Tutorial 2.A
Data in Finance: FinTech Lending
Date: July 4, 2023
Start: 8:30 am
End: 12:00 pm
Abstract:

This tutorial includes a short lecture followed by an interactive game in which participants play the role of a FinTech lender. Context: Banks and insurers increasingly use alternative data and machine learning to screen consumers and price products. For example, a FinTech using digital footprints to predict default will have a competitive edge over traditional banks. However, there are important pitfalls to avoid when using alternative data and machine learning to score consumers, such as the winner’s curse and discrimination. This tutorial and its interactive game provide an introduction to these issues.

Requirements:

Multivariate statistical analysis, in particular OLS/logit regressions and/or machine learning methods.

Tutorial 3.A
Understanding the New Research Paradigm: ML and AI Techniques in Economic and Management Research
Date: July 4, 2023
Start: 2:00 pm
End: 5:30 pm
Abstract:

The capabilities of machines are advancing rapidly, with examples such as ChatGPT’s human-like reasoning and creativity, Copilot’s capacity to become our peer programmer, Facebook’s facial recognition technology, and Google’s new AI and ML frameworks like TensorFlow. With these advancements, researchers now have a large toolset of approaches to perform data-driven research and provide insights that were previously infeasible. But, as researchers, how will these advancements change our research identity and the nature of our research? For instance, face recognition algorithms do not follow predetermined rules, based on human understanding, for detecting the pixel combinations that make up a face. Instead, these algorithms use a vast dataset of labeled photos to estimate a function f(x), which predicts the presence y of a face based on pixels x. This approach has similarities to econometrics and raises important questions, which we will address in this workshop. Specifically, we will answer three questions: (a) Are these algorithms simply utilizing conventional methods to process extensive and innovative datasets? (b) If these are new empirical tools, how do they relate to existing knowledge? and (c) How can we as researchers incorporate these methods into our own research?

The first half of the workshop will be an interactive lecture, in which we will examine the background and implications of ML and AI techniques for economic research. The second half will be a hands-on exercise, in which we will develop a data-driven research question using these new and advanced computational techniques. The idea here is to see the amazing power that we now have in conceptualizing new constructs and finding interesting insights.

Hands-on exercise – With the rise of generative AI technologies, the extent to which these algorithms (which are trained on human-created ground truths) can produce highly novel ideas that are useful for organizations remains an open question (Jago 2019, Amabile 2020). Additionally, with claims that these AI technologies can mimic human emotions and behavior (Raj et al. 2023), there is an increasing need to understand the implications for the future of software development: whether such AI technologies can become our peer programmers, and what their impact might be on the creativity and quality of the software produced.

As a first step towards answering these questions, this exercise will try to extract the perceptual attributes (such as novelty and usefulness) of a given sample of software code. A sample dataset of code commits from GitHub will be provided, and our task will be to evaluate them computationally using the ChatGPT API and study their antecedents.

Documents – Recommended reading

  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. 2009. “ImageNet: A Large-Scale Hierarchical Image Database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, June, pp. 248–255. (https://doi.org/10.1109/CVPRW.2009.5206848).
  • Choudhury, P., Wang, D., Carlson, N. A., and Khanna, T. 2019. “Machine Learning Approaches to Facial and Text Analysis: Discovering CEO Oral Communication Styles,” Strategic Management Journal (40:11), pp. 1705–1732. (https://doi.org/10.1002/smj.3067).

Requirements:

Basic to intermediate knowledge of Python is sufficient. The hands-on exercise will be done in Python using a Jupyter notebook.

A ChatGPT API key is required. Please create an account with OpenAI and generate your ChatGPT API key. Once the account is created, the API key can be found at this link: https://platform.openai.com/account/api-keys. For more information, please refer to https://platform.openai.com/docs/api-reference/authentication

Tutorial 4.A
What to expect of the EU AI Regulation? Challenges and Compliance
Date: July 5, 2023
Start: 8:30 am
End: 12:00 pm
Abstract:

This workshop will examine how the future European AI Regulation will affect artificial intelligence applications to be launched on the market, and which requirements they must meet to comply with it. What products and services are considered artificial intelligence under the future European AI Regulation? And how should AI applications be classified under the different levels of risk established by the Regulation, which determine the rules applicable to them? Starting from these questions, this tutorial will examine in depth two case studies involving high-risk AI applications, which must fulfil different requirements. Most of the general requirements established by the Regulation (e.g., transparency, explainability, fairness, robustness, accuracy) are further specified by technical standards stipulated by European standardization organizations for specific contexts. We will compare the risk management system created by the EU AI Regulation with the voluntary NIST AI Risk Management Framework prevailing in the US and discuss whether and to what extent the AI Regulation creates a regulatory model that expands beyond Europe.

Tutorial 5.A
Introduction to Structural Causal Models and Directed Acyclic Graphs
Date: July 6, 2023
Start: 8:30 am
End: 12:00 pm
Abstract:

This workshop introduces causal inference applications using Structural Causal Models (SCMs) and Directed Acyclic Graphs (DAGs). In the first half, participants will learn the basics of do-calculus and the critical role of mediators, confounders, and colliders, and will explore the rising significance of causal machine learning in industry practice. Further, Causal Discovery (CD) and Large Language Models (LLMs) will be used to demonstrate how the identification of causal relationships in observational data can be supported by algorithmic induction. The second half of the workshop is dedicated to practical applications, wherein attendees will gain hands-on experience with the DoWhy library, running Python code on Colab.
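To make the role of confounders and colliders concrete, here is a minimal simulation sketch. It is not taken from the tutorial material and does not use DoWhy: the linear data-generating processes, the coefficients, and the helper names (slope, residuals, adjusted_slope) are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random
import statistics


def slope(x, y):
    """Return the OLS slope of y regressed on x (intercept included)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def residuals(x, y):
    """Return what is left of y after removing its linear dependence on x."""
    b = slope(x, y)
    a = statistics.fmean(y) - b * statistics.fmean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def adjusted_slope(x, y, z):
    """Slope of y on x after partialling z out of both (Frisch-Waugh-Lovell)."""
    return slope(residuals(z, x), residuals(z, y))


rng = random.Random(0)
n = 20000

# Hypothetical confounded system: Z -> X, Z -> Y, and X -> Y with true effect 1.0.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]
y = [1.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
confounded_naive = slope(x, y)               # biased: population value is 2.2
confounded_adjusted = adjusted_slope(x, y, z)  # adjusting for Z recovers about 1.0

# Hypothetical collider system: X2 and Y2 are independent, both cause C.
x2 = [rng.gauss(0, 1) for _ in range(n)]
y2 = [rng.gauss(0, 1) for _ in range(n)]
c = [a + b + rng.gauss(0, 1) for a, b in zip(x2, y2)]
collider_naive = slope(x2, y2)                 # about 0: no causal link
collider_adjusted = adjusted_slope(x2, y2, c)  # spurious: population value is -0.5

print(f"confounder: naive={confounded_naive:.2f}, adjusted={confounded_adjusted:.2f}")
print(f"collider:   naive={collider_naive:.2f}, adjusted={collider_adjusted:.2f}")
```

The contrast is the point of the sketch: adjusting for a confounder removes bias, while adjusting for a collider introduces an association that does not exist in the data-generating process.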

Requirements:

A working Google account to run Colab

Tutorial 6.A
Anomaly Detection for Building Trust in Organizations and Marketplaces 
Date: July 6, 2023
Start: 2:00 pm
End: 5:30 pm
Abstract:

Anomaly detection is a critical tool for organizations and marketplaces looking to build trust and mitigate risks. In this tutorial, we will explore the key concepts and techniques involved in anomaly detection and demonstrate how they can be leveraged for high-stakes tasks such as communication surveillance, transaction monitoring, anti-money laundering, and insider trading detection. We will focus on the use of unsupervised machine learning and graph-based techniques, explaining why they are particularly effective for detecting anomalies when we face information constraints or data complexity. Through practical examples and hands-on experience, participants will gain a deeper understanding of how to design anomaly detection solutions for building trust in organizations and marketplaces and how to implement those solutions in real-world scenarios.
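As a rough illustration of unsupervised anomaly detection, the sketch below flags outliers with a robust z-score based on the median and the median absolute deviation (MAD). This is a simple baseline chosen here for illustration, not the method taught in the tutorial, and it does not cover graph-based techniques. The transaction amounts, the function names, and the conventional 3.5 threshold are assumptions.

```python
import statistics


def robust_z_scores(values):
    """Return median/MAD-based z-scores, which are less distorted by outliers
    than scores based on the mean and standard deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined for this sample")
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # when the bulk of the data is approximately normal.
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values, threshold=3.5):
    """Return the indices whose absolute robust z-score exceeds the threshold."""
    return [i for i, z in enumerate(robust_z_scores(values)) if abs(z) > threshold]


# Hypothetical transaction amounts with one unusually large payment.
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 980, 51]
flagged = flag_anomalies(amounts)
print(flagged)  # [8]
```

No labels are needed: the method only assumes that most observations are normal and that anomalies lie far from the bulk of the data.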

Requirements:

Fundamentals of probability, statistics, and linear algebra; Basic knowledge of Python and Jupyter Notebook

Track B – Theory and Methods of AI and Data Science

Tutorial 1.B
Learning on messy tabular data
Date: July 3, 2023
Start: 11:00 am
End: 3:00 pm
Abstract:

Many if not most data science projects are run on tabular data: data from one or multiple tables with columns of diverse nature. Tabular data comes with its own challenges: many entries are of discrete nature (categories or entities), entries may be missing, and the data may need to be enriched by joining multiple tables. Additional data-integration challenges arise when the tables are assembled across different sources and come with different conventions. In this lecture I will present various machine-learning methods dedicated to such data. I will illustrate these methods with examples using the dirty-cat and scikit-learn Python packages.

Requirements:
  • Basic knowledge of machine learning (the fit/predict pipeline, model selection and validation)
  • Working knowledge of central data-science tools in Python: pandas and scikit-learn

Tutorial 2.B
Multi-label learning
Date: July 4, 2023
Start: 8:30 am
End: 12:00 pm
Abstract:

Multi-label learning aims to build models that provide potentially multiple labels for each data point (unlike multi-class classification, which provides a single class label per instance). A growing number of applications in data science and machine learning involve the multi-label setting, including image and text classification, time-series forecasting, localization and tracking, missing-value imputation, recommender systems, and many other kinds of structured-output problems. There is a broad selection of multi-label learning methods, including approaches that leverage ‘off-the-shelf’ (classical, multi-class) models, as well as deep neural network architectures. The first part of this tutorial is a lecture that introduces some of these methods and discusses particular cases of interest such as weak/partial labels, questions of model interpretability and scalability, and the intersection with other areas of machine learning such as sequential, multi-task and transfer learning. In the second part of the tutorial, we will take a hands-on approach with Python.
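The difference from multi-class classification shows up already in how targets are represented and scored. The following sketch, which is an illustration and not part of the tutorial material, encodes label sets as binary indicator rows and computes two common multi-label metrics. The label names, the example predictions, and the function names are hypothetical.

```python
LABELS = ["beach", "sunset", "people"]  # hypothetical label vocabulary


def to_indicator(label_sets, labels=LABELS):
    """Encode each set of labels as a 0/1 row, one column per label."""
    return [[1 if name in label_set else 0 for name in labels]
            for label_set in label_sets]


def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of instances whose full label set is predicted exactly."""
    exact = sum(row_t == row_p for row_t, row_p in zip(y_true, y_pred))
    return exact / len(y_true)


# An instance may carry several labels, one label, or none at all.
y_true = to_indicator([{"beach", "sunset"}, {"people"}, set(), {"beach", "people"}])
y_pred = to_indicator([{"beach"}, {"people"}, {"sunset"}, {"beach", "people"}])

print(hamming_loss(y_true, y_pred))     # 2 wrong decisions out of 12
print(subset_accuracy(y_true, y_pred))  # 0.5
```

Hamming loss gives partial credit for partly correct label sets, whereas subset accuracy counts a prediction as correct only if every label matches, so the two can rank models quite differently.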

Requirements:

Familiarity with the fundamentals of machine learning, especially classification; and Python (knowledge of its libraries such as numpy, scikit-learn, and pytorch would be helpful too).

Tutorial 3.B
The Power of Multimodal Generative AI: Text and Image Generation
Date: July 4, 2023
Start: 2:00 pm
End: 5:00 pm
Abstract:

This talk explores two groundbreaking aspects of generative AI: text generation and image generation. We delve into large language models (LLMs) capable of engaging in dynamic conversations. We also uncover diffusion models, a remarkable technique for generating realistic and diverse images. We discuss potential applications, challenges, and ethical considerations surrounding these advancements in generative AI. Attendees will gain insights into the latest developments and be inspired to push the boundaries of human-machine collaboration.

Requirements:

TBA

Tutorial 4.B
Self-supervised learning in computer vision and medical imaging
Date: July 6, 2023
Start: 8:30 am
End: 12:00 pm
Abstract:

Many tasks in Computer Vision and Medical Imaging, such as object detection, image classification, or semantic segmentation, have reached astonishing results in recent years. This has been possible mainly because large (N > 10^6) and labeled datasets were available. When dealing with small, labeled datasets, a common strategy consists of pre-training a model on a large dataset and then transferring it to the small target dataset. This is commonly called Transfer Learning. Supervised pre-training, namely using a large, labeled dataset such as ImageNet, is the de facto standard technique. However, recent studies have shown that its usefulness, namely feature reuse, is important only when there is a high visual similarity between pre-training and target domains, namely a small domain gap. This might not be the case in many applications, in particular when using 3D data or in Medical Imaging. To reduce the domain gap, several self-supervised pre-training strategies have recently emerged. They leverage annotation-free pretext tasks to provide surrogate supervision signals for feature learning.

In the first part of this tutorial, you will learn the most important and widely used self-supervised strategies for computer vision and medical imaging. In particular, we will study contrastive learning thoroughly using a geometric approach. In the second part, you will test the methods on both toy examples and real data using PyTorch.

Requirements:

– Basic knowledge of Machine Learning and Deep Learning
– Basic knowledge of Image Processing
– Familiarity with Python, in particular numpy, matplotlib, scikit-learn, and pytorch

Tutorial 5.B
What are the challenges of neural models for sentiment analysis?
Date: July 6, 2023
Start: 8:30 am
End: 12:00 pm
Abstract:

In this tutorial, you will be familiarised with the task of sentiment analysis and its challenges. You will then study the different neural architectures that can be used to tackle this task, including learning text representations and building different classification pipelines. You will also learn how to incorporate knowledge about sentiments from sociolinguistics and psychology into these architectures.

Requirements:

PyTorch; basic knowledge of neural models

Tutorial 6.B
Fairness in Machine Learning
Date: July 6, 2023
Start: 2:00 pm
End: 5:00 pm
Abstract:

Many machine learning algorithms are used to make decisions that affect the lives of humans (e.g., for hiring or criminal risk assessment). Yet in the past they have often been observed to discriminate against certain demographic groups. In this tutorial, we will introduce the problem of fairness in machine learning, with some examples from different learning tasks. We will then study the different methods to train fair learning algorithms, for the case of classification. We will work through a hands-on example on the well-known COMPAS dataset.
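As an illustration of how discrimination by a classifier can be quantified, the sketch below computes two widely used group fairness gaps: the demographic parity difference (gap in positive prediction rates) and the equal opportunity difference (gap in true positive rates). It is a toy example under stated assumptions: the groups, labels, predictions, and function names are made up, and the data is not drawn from COMPAS.

```python
def rate(values):
    """Share of positive (1) entries in a non-empty list of 0/1 values."""
    return sum(values) / len(values)


def positive_rate(y_pred, group, g):
    """Share of members of group g who receive a positive prediction."""
    return rate([p for p, a in zip(y_pred, group) if a == g])


def true_positive_rate(y_true, y_pred, group, g):
    """Share of truly positive members of group g who are predicted positive."""
    return rate([p for t, p, a in zip(y_true, y_pred, group) if a == g and t == 1])


# Hypothetical sensitive attribute, true outcomes, and classifier decisions.
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

demographic_parity_diff = (positive_rate(y_pred, group, "a")
                           - positive_rate(y_pred, group, "b"))
equal_opportunity_diff = (true_positive_rate(y_true, y_pred, group, "a")
                          - true_positive_rate(y_true, y_pred, group, "b"))

print(demographic_parity_diff)  # 0.75 - 0.25 = 0.5
print(equal_opportunity_diff)   # 1.0 - 0.5 = 0.5
```

A value of zero for either gap would mean the two groups are treated alike under that criterion; the two criteria generally cannot all be satisfied at once, which is one reason several notions of fairness coexist.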

Requirements:

Basics of machine learning (theory and practice with Python)

Academic Round Table

AI and Data Science in Contemporary Society

The Academic round table will be an opportunity for the audience to learn about the views and research of four leading academic scholars from around the globe on the role of AI and Data Science in contemporary society.

The panel will be moderated by Prof. Ioana Manolescu from INRIA, France. After an opening introduction by the moderator, each of the panel members will be invited to give a 5-minute presentation of their research in relation to the evolving landscape of AI and Data Science. This will be followed by a moderated and interactive discussion on the topic to identify future areas of interest for researchers and practitioners. The audience will have the opportunity to put their questions to the panelists, which is expected to stimulate an exciting discussion on the subject.

Academic speakers:

  • Alessandro ACQUISTI – Carnegie Mellon University
  • Bryan PEROZZI – Stony Brook University & Google
  • Sameer B. SRIVASTAVA – University of California, Berkeley
  • Gaël RICHARD – Hi! PARIS Scientific co-Director and Télécom Paris, IP Paris

Industry Round Table

Opportunities and Challenges with AI and Data Science

The Industry round table is composed of representatives from Hi! PARIS Corporate Donors. This event is an opportunity for the audience to learn about cutting-edge AI and Data Science initiatives being taken by each of the participating companies.

After an opening introduction by the panel moderators, each of the industry panel members will be invited to provide a 5-minute presentation about AI and Data Science initiatives being taken by their companies and the challenges they are currently facing. This will be followed by a jointly moderated session, led by Prof. Federica De Stefano and Prof. Klaus Miller from HEC Paris, to identify areas of practical interest that can spawn impactful research. The audience will have the opportunity to ask their questions to the panelists. The industry panel will be an interactive event with an opportunity to open communication channels for further research opportunities between the industry and academia.

Industry speakers:

  • Bruno DAUNAY  – AI Lead at Leonard, VINCI’s Innovation and prospective hub
  • Imen EL KAROUI – Data Intelligence Director at KERING
  • Sébastien GOURVENEC – Scientific/technical management & expertise in R&D for industrial environments at TotalEnergies
  • Jeremy HARROCH – VP Capgemini Invent, in charge of Quantmetry x Capgemini Industrial Project at Capgemini
  • Stéphane LANNUZEL – Beauty Tech Program Director at L’Oréal
  • Claude LE PAPE-GARDEUX – Intelligence, Optimization & Analytics Fellow at Schneider Electric
  • Laurent NIZARD – Head of AI Solutions and Data Science at Rexel

Practical Research Tips Session by the Hi! PARIS Engineering Team

Learn how to develop your own Python Package!

The practical research tips tutorial by the Hi! PARIS Engineering Team will teach participants how to develop a Python package. In this course, participants will learn about package structure, documentation, and maintaining Python code style using flake8 to turn loose code into convenient packages. To speed up package development, we will use cookiecutter to create package skeletons and templates. Participants will also learn how to use setuptools and twine to build and publish their packages to PyPI (a repository of software for the Python programming language).

Awais Hussain SANI

Machine Learning Research Engineer

Hi! PARIS Engineering Team

PhD in Signal Processing and Telecommunication. 10 years of experience working in Electronics, Telecommunication and Machine Learning. Expertise in managing infrastructure and in developing and deploying machine learning and deep learning models.

Laurène DAVID

Machine Learning Research Engineer

Hi! PARIS Engineering Team

Graduate of University Paris 1 Panthéon-Sorbonne in Data Science and Applied Mathematics, with previous experience at REXEL as a Data Analyst and at Withings as a Machine Learning Research Intern.

Student-oriented Practical Session by the Hi! PARIS Engineering Team

Learn how to build a machine learning pipeline and track run performance!

The student-oriented practical session will be presented by the Hi! PARIS Engineering Team. This course provides a comprehensive introduction to data versioning, model management, and AI deployment using DVC, MLflow, and AI technologies. Participants will learn the fundamentals of these tools and gain practical skills to effectively track and manage data, models, and experiments in AI projects and make their research or work reproducible.

Gaëtan Brison

Machine Learning Research Engineer

Hi! PARIS Engineering Team

MSc in Data Science at EDHEC Business School & Computer Science at Georgia Tech. Previous experience working for Amazon in the forecasting department and for CNRS in natural language processing.

Shreshtha Shaurya

Machine Learning Research Engineer Intern

Hi! PARIS Engineering Team

MSc in Data Science at EDHEC Business School. Previous experience working as a data analyst for a real estate start-up, helping build image recognition for its application.

Poster Session

Posters will be displayed in Galerie Nord, HEC Paris Campus, from the poster session on Day 2 (Tuesday 4 July, 5:30-6:30pm), when they will be presented, until the poster award on Day 4 (Thursday 6 July, 5:30-6:30pm).

Please note. Posters must be printed by your own means. There will be no printing on site. 

Format. The preferred format of the poster is A0 paper, portrait mode (height: 119 cm, width: 84 cm). We will provide you with pins or tape to hang your poster on the wall.

Guidelines. For your convenience, see below some guidelines for poster presentation borrowed from the ICML Conference.
There are many great guides to making accessible and inclusive talks and posters; we advise everyone to consider all the points made in the RECSYS guidelines, the ACM guide, and the W3C guide.

We would like to highlight the following items:

  1. Keep your posters clear, simple, and uncrowded. Use large, sans-serif fonts, with ample white space between sentences and paragraphs. Use bold for emphasis (instead of italics, underline, or capitalization), and avoid special text effects (e.g., shadows).
  2. Choose high contrast colors; dark text on a cream background works best.
  3. Avoid flashing text or graphics. For any graphics, add a brief text description of the graphic right next to it.
  4. Choose color schemes that can be easily identified by people with all types of color vision and do not rely on color to convey a message (see How to Design for Color Blindness and Color Universal Design for further details).
  5. Use examples that are understandable and respectful to a diverse, multicultural audience.

You can find an example of a good poster and an example of a poor poster here: https://guides.nyu.edu/posters

Social Event

Two social events are scheduled as part of the Hi! PARIS Summer School 2023:

  • Day 1 (Monday 3 July, 6:00-7:00pm) – Opening welcome cocktail in Galerie Nord, HEC Paris Campus
  • Day 3 (Wednesday 5 July, 6:00-9:00pm) – Cocktail at HEC Paris Le Château