For the practical assignment of the Text Mining course at the Department of Data Science and AI at Maastricht University, Francesco Gibellini decided to follow his heart and text-mine a large data set of Italian recipes. As his data source, he used the well-known Italian website Giallo Zafferano (https://www.giallozafferano.it/), which contains thousands of Italian recipes from various regions.
After various forms of pre-processing and normalization, the first experiment was to cluster the text of the recipes using techniques such as k-NN and Non-Negative Matrix Factorization (NMF). Interestingly, the clusters immediately reflected the main structure of Italian cuisine: the recipes were grouped automatically into clearly distinct groups of Primi (first course), Secondi (main course) and Dolci (dessert), which was quite a promising result.
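To make the NMF step a little more concrete, here is a minimal sketch in plain Python. It is not Francesco's code: the post does not say which library or settings were used, the four toy recipes are invented for illustration, and the helper names (nmf, matmul, transpose, frobenius_error) are hypothetical. It factors a small recipe-by-word count matrix with the standard multiplicative update rules and assigns each recipe to its strongest component.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def frobenius_error(V, W, H):
    """Squared reconstruction error between V and the product W*H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, rank, iterations=200, seed=0):
    """Factor a non-negative matrix V (recipes x words) into W (recipes x rank)
    and H (rank x words) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    eps = 1e-9  # guards against division by zero
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        WtV = matmul(transpose(W), V)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[k][j] * WtV[k][j] / (WtWH[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][k] * VHt[i][k] / (WHHt[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

# Toy data, invented for this sketch (not taken from the real data set).
recipes = {
    "risotto ai funghi": "riso brodo funghi burro parmigiano",
    "risotto allo zafferano": "riso brodo zafferano burro parmigiano",
    "tiramisu": "mascarpone uova zucchero caffe savoiardi",
    "panna cotta": "panna zucchero vaniglia gelatina",
}
vocab = sorted({w for text in recipes.values() for w in text.split()})
V = [[text.split().count(w) for w in vocab] for text in recipes.values()]

W, H = nmf(V, rank=2)
# Each recipe is assigned to the component with the largest weight in its row of W.
labels = [max(range(2), key=lambda k: row[k]) for row in W]
for name, label in zip(recipes, labels):
    print(label, name)
```

On a real corpus one would normally use TF-IDF weights instead of raw counts and a library implementation such as scikit-learn's NMF class; the pure-Python version here only keeps the sketch self-contained.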
A more fine-grained clustering on ingredients produced even more interesting results: all recipes using Gnocchi, Risotto and Pasta were clearly grouped together as related recipes.
In a final experiment, a closer look was taken at the ingredients of the recipes in order to answer the question: “What goes well with what? Which ingredients pair better than others?”
For each recipe, the ingredients were identified using named entity extraction. Next, the different textual occurrences of each ingredient were normalized. Normalization was not a trivial task, as textual occurrences such as ‘Salmone Scozzese 150g’ (‘Scottish salmon 150g’) had to be normalized to ‘Salmone’.
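As a rough idea of what such a normalization step can look like, here is a small sketch. Note that it swaps in a much simpler technique than the named entity extraction used in the project: it strips quantities with a regular expression and then looks up the remaining words in a short list of canonical ingredient names. The CANONICAL list, the QUANTITY pattern and the function name normalize_ingredient are all assumptions made for illustration.

```python
import re

# Hypothetical list of canonical ingredient names; a real project would need
# a far larger vocabulary (or a trained entity extractor).
CANONICAL = {"salmone", "pomodoro", "parmigiano", "burro"}

# Matches quantities such as "150g", "50 g", "1,5 kg" or "200 ml".
QUANTITY = re.compile(r"\b\d+(?:[.,]\d+)?\s*(?:g|kg|ml|l|cl)\b", re.IGNORECASE)

def normalize_ingredient(mention):
    """Map a raw ingredient mention to a canonical name, or None if unknown."""
    text = QUANTITY.sub(" ", mention.lower())
    for token in re.findall(r"[a-zàèéìòù]+", text):
        if token in CANONICAL:
            return token.capitalize()
    return None

print(normalize_ingredient("Salmone Scozzese 150g"))  # Salmone
```

A lookup like this only handles single-word ingredients that are already in the list; multi-word names, plurals and spelling variants are where the task stops being trivial.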
Once this task was completed, a network representation of around 900 ingredients, many of which had 100+ pairings, could be created. This resulted in some interesting insights, as shown below for the ingredient ginger.
Some more detail is provided below.
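For readers who want to try this themselves, a pairing network of this kind can be approximated by counting how often two normalized ingredients occur in the same recipe. The sketch below assumes that is how pairings are defined (the post does not spell it out), and the three toy recipes and the function name build_pairing_network are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_pairing_network(recipes):
    """Return {ingredient: Counter({partner: number of shared recipes})}."""
    network = defaultdict(Counter)
    for ingredients in recipes:
        # Each unordered pair of distinct ingredients in a recipe is one pairing.
        for a, b in combinations(sorted(set(ingredients)), 2):
            network[a][b] += 1
            network[b][a] += 1
    return network

# Toy recipes with already normalized ingredients (invented for this sketch).
recipes = [
    ["Salmone", "Zenzero", "Limone"],
    ["Zenzero", "Limone", "Miele"],
    ["Broccoli", "Aglio", "Limone"],
]

network = build_pairing_network(recipes)
print(network["Zenzero"].most_common(1))  # [('Limone', 2)]
```

The counts serve as edge weights, so the most frequent partners of an ingredient can be read off directly with most_common.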
Such a network shows popular combinations, but additional clustering techniques can also help us find combinations that are less obvious and that could in fact be unique new pairings with surprising new tastes! See the one below for surprising broccoli combinations:
Maybe we can even use this to have a computer program come up with new recipes, or to assist chefs around the world in creating new Italian recipes!
Well done Francesco, great project!