Data Challenge – MSSISS 2024

MSSISS 2022 Data Challenge

Detroit Restaurant Health Violations

Organized in collaboration with the Michigan Data Science Team

While outbreaks of foodborne illnesses from manufacturing often make national headlines through large food recalls, smaller, local incidents in restaurants receive much less publicity. Nonetheless, these represent a large proportion of foodborne illnesses: in its 2014 annual report, the CDC finds that 65% (485/742) of outbreaks originate from restaurants, accounting for 44% (4,780/10,895) of all illnesses.

Restaurant food safety is therefore of paramount importance to public health, and is generally maintained through regular inspections by local (county/city) health departments. The city of Detroit has recently published rich datasets of restaurant inspections conducted over the past few years. The main objective of this challenge is to study these datasets to help public health officials be more effective in preventing restaurant-based foodborne illnesses. 

The MSSISS 2022 Data Challenge consists of two competitions; participating teams can compete in either or both. The first competition is a straightforward metric-based prediction task; the second is an open-ended exploratory analysis in which teams choose one or more research questions to study.

The challenge focuses on the following three datasets: a repository of Detroit restaurants, inspections conducted by the Detroit Health Department between August 2016 and October 2020, and violations cited during these inspections. The datasets, along with detailed descriptions, can be found on the associated Kaggle competition website.

Prediction Competition

In this first component, teams will be given a list of restaurants and asked to predict, for each one, the probability of being cited for at least one priority violation during 2019, based on 2016-2018 data. Details can be found on the associated Kaggle competition website.
The competition is invitation-only: click here to join!

For this competition, teams will not be allowed to pull in any external data as part of the development process. Top teams will be asked to provide code reproducing their results to be eligible for awards. 
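One way to check a model locally before submitting is to mirror the task on earlier years: build features from 2016-2017 and a binary label from 2018. The sketch below illustrates this under stated assumptions. The record keys `restaurant_id`, `year` and `is_priority`, the function names, and the toy records are all hypothetical; the actual column names are documented on the Kaggle competition website and may differ.

```python
def priority_labels(restaurant_ids, violations, target_year):
    """Return {restaurant_id: 0 or 1}; 1 means at least one priority
    violation was recorded for that restaurant in target_year."""
    cited = {
        v["restaurant_id"]
        for v in violations
        if v["year"] == target_year and v["is_priority"]
    }
    return {rid: int(rid in cited) for rid in restaurant_ids}


def prior_priority_counts(restaurant_ids, violations, first_year, last_year):
    """Count priority violations per restaurant between first_year and
    last_year (inclusive). A deliberately simple example feature."""
    counts = {rid: 0 for rid in restaurant_ids}
    for v in violations:
        in_window = first_year <= v["year"] <= last_year
        if in_window and v["is_priority"] and v["restaurant_id"] in counts:
            counts[v["restaurant_id"]] += 1
    return counts


# Toy records with hypothetical field names (not real Detroit data).
violations = [
    {"restaurant_id": "A", "year": 2016, "is_priority": True},
    {"restaurant_id": "A", "year": 2018, "is_priority": True},
    {"restaurant_id": "B", "year": 2017, "is_priority": False},
    {"restaurant_id": "B", "year": 2018, "is_priority": False},
    {"restaurant_id": "C", "year": 2017, "is_priority": True},
]
restaurants = ["A", "B", "C"]

# Features use only 2016-2017; the label uses only 2018.
features = prior_priority_counts(restaurants, violations, 2016, 2017)
labels = priority_labels(restaurants, violations, 2018)
```

Keeping the feature window strictly before the label year avoids leaking information from the period being predicted, which matches how the 2019 predictions must rely on 2016-2018 data only.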



Predictions

March 6th, 2022

Submit on Kaggle here

Reproducible code*

March 9th, 2022

Methodology presentation*

March 10/11th, 2022


*winners only

Evaluation & Awards

The prediction competition will be objectively assessed through a performance metric (AUC). Teams with the best submissions will be invited to present their methodology during MSSISS, where they will also receive their prize (TBA).
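For teams who want to estimate this metric on their own validation data, here is a minimal sketch of the AUC, assuming it refers to the area under the ROC curve (the exact scoring implementation is defined on the Kaggle site, not here). It uses the pairwise formulation: the AUC is the fraction of (positive, negative) pairs in which the positive case receives the higher predicted probability, with ties counted as one half. The function name and the toy values are illustrative only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes
    scores: iterable of predicted probabilities, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values).
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.4, 0.35, 0.2, 0.35]
score = auc(y_true, y_prob)
```

This double loop is quadratic in the number of restaurants, so it is meant for illustration; on larger validation sets a library routine such as `sklearn.metrics.roc_auc_score` would be the more practical choice. Because the AUC depends only on the ranking of the predictions, rescaling the probabilities monotonically does not change the score.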

Exploratory Analysis Competition

In this second component, you will use your scientific instincts, statistical skills and creativity to answer one or more research questions (listed below). There is no single correct answer to any of the questions, and different interpretations can lead to significantly different approaches. Final analyses of invited teams will be presented during MSSISS as oral or poster presentations, and a progress report will be requested halfway through the competition.

You are encouraged to bring in external data for any of the questions and to further explore questions you think of that are related to your selected topic(s).

Teams should consider the full datasets available on the City of Detroit Open Data Portal:


Progress report

February 20th, 2022
  • Submitted to the organizers directly
  • 1-2 pages
  • Selected questions
  • Summary of current progress
  • Next steps and goals

Analysis presentation

March 10/11th, 2022
  • During MSSISS 2022
  • Oral or poster presentation (TBA)

Evaluation & Awards

Judges will evaluate the presentations based on the following criteria:

  • Relevant exploration and visualization of the data
  • Formalization into appropriate statistical problems
  • Pertinence, adequacy and practicality of the proposed recommendations
  • Balance of simplicity, interpretability and explanatory power
  • Justification of analysis choices
  • Statistical rigor
  • General communication and presentation

Outstanding analyses will be awarded prizes (TBA).

Research Questions


Complaints and crowdsourcing

Most restaurant inspections are “routine inspections,” but some originate from a complaint by customers to the Detroit Health Department. Investigate whether these complaints are useful in identifying potential violations. In particular, do inspections triggered by complaints lead to more citations and/or more severe citations? Alternatively, when customers notice some food safety concern, they may complain on social media to directly warn future customers about the issue. Investigate whether social media posts and reviews (e.g., Google, Yelp, Twitter) could help inspectors identify restaurants violating the health code. See this paper or that paper for examples.
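As one possible starting point for the first part of this question, the share of inspections with at least one citation can be compared between complaint-triggered and routine inspections. The sketch below uses a two-proportion z-test, which is only one of many reasonable choices. The function name and the counts are made up for illustration and are not taken from the Detroit datasets.

```python
import math


def two_proportion_ztest(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    hits_a / n_a: e.g. complaint-triggered inspections with a citation
    hits_b / n_b: e.g. routine inspections with a citation
    Returns (z statistic, two-sided p-value).
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts, for illustration only.
z, p_value = two_proportion_ztest(hits_a=60, n_a=100, hits_b=450, n_b=1000)
```

A test like this treats inspections as independent and ignores differences between the restaurants that attract complaints and those that do not, so it is better read as a first descriptive check than as evidence of a causal effect.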



Identify predictors of violations such as, but not limited to, location (neighborhood, street), time (seasonality, trend over years), establishment type (restaurant/school, delivery only, food truck), food served, complexity level, inspections (time since last, past violations), whether it is a chain and which chain it is. For features identified as predictive, explore how they are associated with violations. Furthermore, explore what these features may indicate about bias or fairness in inspections.


Inspector consistency

In a blog post, Stephanie Quesnelle from Data Driven Detroit finds that inspectors are highly variable in the number of violations per inspection. Investigate what this observation indicates about the consistency, effectiveness and fairness of inspectors. Consider this together with the predictors task above: are there any features, such as location or cuisine, that disproportionately affect inspectors' citation behavior?



The onset of the Covid-19 pandemic in March of 2020 has had drastic effects on restaurant operations (e.g., closure of dine-in services, increased health and cleanliness requirements). Determine if and how evidence of the pandemic is present in the data. Furthermore, explore how local policy changes (e.g., required contact tracing, employee case-related closure, testing and vaccine availability for front-line workers) have affected inspections.


Prevention and correction

Citing restaurants for violations, even benign ones, could have a preventive effect. Analyze this claim by investigating if and how the number and importance of violations depend on the number and importance of prior citations, or by identifying predictors of consecutive violations. Furthermore, for priority violations, inspectors may conduct follow-up inspections to ensure a restaurant has corrected the error. Investigate the conditions under which these inspections are conducted and resolved (or not).


Your own theme

While exploring the three datasets, you may encounter additional research questions. If you would like to work on your own questions, feel free to contact us to get them approved for the competition!


To register your team for the MSSISS 2022 Data Challenge, please fill out the following form:

Teams are limited to five members, and teams of one are allowed. The first team member registered will be taken as the team captain for communications between the organizers and participants. You can register your team for either or both competitions within the form.

If you do not yet have a team but wish to participate, you can sign up as a free agent below, and we will add you to an open team or form additional teams if needed. 

Important Dates & Deadlines

January 24th: Data challenge begins, registration begins
February 14th: Registration ends
February 20th: Progress report submission deadline
March 6th: Predictions submission deadline
March 10–11: Presentations and results during MSSISS


The MSSISS organizing committee would like to thank Renee Li from MDST for her considerable contribution to this project.

We would like to thank Stephanie Quesnelle from Data Driven Detroit for suggesting the dataset to us and for helping us design the competition.

We would also like to thank Dr. Sean Meyer and Dr. Jonathan Gryak from MIDAS for general advice.


Data Challenge Organizing Committee

Simon Fontaine and Renee Li
