Using data mining to track critical dietary patterns and lifestyle diseases
Many chronic diseases are diet-related. A team at BFH Wirtschaft is pursuing a novel approach with multidisciplinary research: the scientists, including our author, are using data mining methods to uncover meaningful rules about the influence of nutrition on chronic diseases.
Dietary patterns play an important role in health. Supermarkets, health practitioners, sports organisations and governments have long taken the issue seriously, and more and more people are looking at their eating habits. But often they are not aware of the characteristics, limitations and especially the ingredients of their food. There are now various diets and fitness programmes that individuals follow to help them with a healthy diet and lifestyle. However, they often do not include a critical analysis of how the respective diet influences chronic diseases. This is where the research idea comes in: to use big data to show the influence of food on chronic diseases.
For this purpose, the collected Swiss national nutrition data (menuCH) were linked with Swiss demographic and health data to form a comprehensive database. The resulting database is used to discover dietary patterns that are linked to chronic lifestyle diseases. To extract and reveal such hidden patterns, we use data mining techniques. Data mining is increasingly used in data analysis; it emerged from computer science and differs from traditional statistical analysis. Traditional statistical techniques are typically designed to build evidence for or against a hypothesis from a more limited data set, so statistical analysis examines the validity of a hypothesis by performing tests on data that may have been collected for that purpose. Data mining techniques, on the other hand, are used less to build confidence in a hypothesis than to extract previously unknown relationships from the data set. Data mining is therefore a hypothesis-free method of data analysis: it can use statistical methods as tools, but it does not start from a hypothesis to be confirmed or rejected.
Lifestyle diseases are diseases that become more prevalent as countries become more industrialised and populations age. They include obesity, hypertension, heart disease, type 2 diabetes, cancer, mental disorders and many others. They differ from infectious diseases, also called communicable diseases (CD), in that they are not infectious and do not spread from person to person; they are often due to dietary behaviour. Lifestyle diseases are therefore classified as non-communicable diseases (NCD). According to the World Health Organization (WHO, 2018), the growing epidemic of chronic diseases affecting both developed and developing countries is linked to dietary and lifestyle changes. The rapidly increasing burden of chronic diseases is a major determinant of global public health.
Figure 1: Common chronic diseases
Knowledge discovery in nutrition research
Data is being produced at an exponential rate, especially as storage capacity is now virtually unlimited. Data mining is increasingly used in data analysis as an emerging multidisciplinary field that draws on, among others:
- machine learning,
- information retrieval.
However, the most tedious task in a data mining project is to extract, transform and load the data. This process, called data preprocessing, is very time- and resource-intensive and accounts for almost three fifths of the effort of such a project. According to Fayyad et al. (1996, p. 40 f.), data mining can be divided into four steps: 1- Collect data sources, 2- Clean and integrate data sources, 3- Use data mining methods to discover new rules, and 4- Interpret the rules to create new knowledge. Figure 2 shows these steps.
Collecting data sources
Nutrition database (menuCH)
The National Nutrition Survey menuCH (BLV, 2021) provides for the first time representative data on food consumption and dietary habits of the population living in Switzerland. Nutrition and exercise have a direct influence on health and quality of life. From January 2014 to February 2015, around 2000 people in the Swiss resident population were surveyed. Men and women aged between 18 and 75 years provided information on their food consumption, cooking, eating and exercise, as well as on their demographic characteristics. The survey was conducted in two steps: first as a written questionnaire, then orally by telephone. The questionnaire provides information on eating, drinking and cooking behaviour as well as on the intake of additives and salts, avoided foods and the reasons for avoiding them. In addition, it covers basic knowledge of healthy eating, activity patterns, body measurements, weight satisfaction, dietary behaviour and the social structure of the respondents. The oral interview provides information on the interview and its context, on age and body measurements, and on the food consumed (preparation, category, nutritional values, quantity and timing of food intake). In addition, there is a demographic classification of the respondents: telephone number, year of birth, age group, gender, relationship status, nationality, country of birth, household size and residence in the major Swiss regions.
Every five years, the Swiss Health Survey of the Swiss Federal Statistical Office collects data from approximately 21,500 Swiss citizens, asking them categorised questions about their health problems using a dual approach consisting of a questionnaire and a detailed telephone interview. In our study, only the data from the telephone survey were retrieved (BFS, 2021). The topics most relevant to health and nutrition were extracted and reduced to a table with nine topics: alcohol consumption, ageing problems, disability, cholesterol, chronic diseases, diabetes, drug use, nutrition and health status.
Figure 2: Data mining steps according to Fayyad et al.
Clean and integrate data sources
In our previous study (Mewes, 2021), we created an integrated database of menuCH and Swiss health data. Our multidisciplinary team consisted of a health and nutrition specialist, who enabled us to appropriately assess, select and summarise the properties of the available attributes into different categories, and an informatics specialist for data mining. The aim of this categorisation was to create four to eight sub-categories for each category. For the health data, categories were created for blood pressure, cholesterol levels, diabetes and alcohol consumption. Blood pressure was divided into six categories; cholesterol, diabetes and alcohol consumption were each divided into four. As an example, the alcohol consumption data was divided as follows:
- Daily alcohol consumption up to 18 grams,
- Daily alcohol consumption of more than 18 and up to 23 grams,
- Daily alcohol consumption of more than 23 and up to 28 grams,
- Daily alcohol consumption of more than 28 grams.
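The categorisation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual implementation: the function name `alcohol_category` and the label strings are hypothetical, and only the three thresholds (18, 23 and 28 grams per day) come from the list above.

```python
from bisect import bisect_left

# Upper bounds (grams of alcohol per day) of the first three categories,
# taken from the list above. Values above the last bound fall into the
# open-ended fourth category.
ALCOHOL_BOUNDS = [18, 23, 28]

# Hypothetical labels, chosen for this sketch only.
ALCOHOL_LABELS = ["up to 18 g", ">18-23 g", ">23-28 g", ">28 g"]


def alcohol_category(grams_per_day: float) -> str:
    """Map a daily alcohol intake in grams to one of the four categories."""
    if grams_per_day < 0:
        raise ValueError("daily intake cannot be negative")
    # bisect_left returns the index of the first bound >= the value,
    # so a value exactly on a bound stays in the lower category.
    return ALCOHOL_LABELS[bisect_left(ALCOHOL_BOUNDS, grams_per_day)]
```

Using `bisect_left` keeps each upper bound inclusive, which matches the wording "up to 18 grams" and "> 18-23 grams".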
After defining the categories for each chronic disease and the menuCH attributes, the data were transferred to an integrated relational database according to the appropriate categorisation. Five demographic characteristics available in both databases (gender, age group, household, marital status and language) were used to link the two databases into an integrated relational database.
We applied a data mining method based on the Apriori algorithm (Aggarwal & Yu, 1999) to the integrated Swiss diet and health database to obtain rules showing the impact of dietary habits on the selected chronic diseases: hypertension, diabetes and high cholesterol. The following section describes some association rules and their interpretations for the selected chronic diseases.
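As a rough illustration of such a link, the following sketch joins two toy tables on the five shared demographic attributes using an in-memory SQLite database. The table names, column names and rows are invented for this example and do not reflect the actual schema or data of the study.

```python
import sqlite3

# The five demographic attributes named in the text; the exact column
# names are assumptions made for this sketch.
KEYS = ["gender", "age_group", "household", "marital_status", "language"]


def link_tables(nutrition_rows, health_rows):
    """Join toy nutrition and health rows on the shared demographic keys."""
    con = sqlite3.connect(":memory:")
    cols = ", ".join(KEYS)
    con.execute(f"CREATE TABLE nutrition ({cols}, vegetable_intake)")
    con.execute(f"CREATE TABLE health ({cols}, cholesterol)")
    con.executemany("INSERT INTO nutrition VALUES (?, ?, ?, ?, ?, ?)", nutrition_rows)
    con.executemany("INSERT INTO health VALUES (?, ?, ?, ?, ?, ?)", health_rows)

    # Inner join: keep only combinations of demographic values that
    # occur in both tables.
    on_clause = " AND ".join(f"n.{k} = h.{k}" for k in KEYS)
    select_keys = ", ".join(f"n.{k}" for k in KEYS)
    rows = con.execute(
        f"SELECT {select_keys}, n.vegetable_intake, h.cholesterol "
        f"FROM nutrition AS n JOIN health AS h ON {on_clause}"
    ).fetchall()
    con.close()
    return rows


# Invented toy rows, for illustration only.
nutrition_rows = [
    ("f", "35-49", "2", "married", "de", "high"),
    ("m", "50-64", "1", "single", "fr", "low"),
]
health_rows = [
    ("f", "35-49", "2", "married", "de", "normal"),
    ("m", "18-34", "1", "single", "it", "elevated"),
]

linked = link_tables(nutrition_rows, health_rows)
```

Because the two surveys cover different respondents, a join of this kind links groups with the same demographic profile, not individual people.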
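To make the idea of association rule mining more concrete, here is a minimal, self-contained sketch of an Apriori-style search. It assumes the usual definitions: the support of an itemset is the share of records that contain it, and the confidence of a rule A → B is support(A ∪ B) / support(A). The function names, thresholds and toy records are invented for illustration and are not the rules or data of the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many records contain each candidate itemset.
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def derive_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already available.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


# Invented toy records: one set of categorised attributes per respondent.
records = [
    {"vegetables=high", "smoker=no", "cholesterol=normal"},
    {"vegetables=high", "smoker=no", "cholesterol=normal"},
    {"vegetables=high", "smoker=yes", "cholesterol=normal"},
    {"vegetables=low", "smoker=yes", "cholesterol=high"},
]

frequent = apriori(records, min_support=0.5)
rules = derive_rules(frequent, min_confidence=0.9)
```

Note that with a minimum support of 0.5 the rare values `vegetables=low` and `cholesterol=high` are discarded in the first pass. Characteristics with a small share of the sample can therefore drop out early unless the threshold is lowered or the items are weighted.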
Association rules and interpretations
|Hypertension||Hypertension is a disease of the organ axis heart – vessels – kidneys or lungs. The rules in the study showed an association of hypertension and of normal blood pressure with certain characteristics (food intake, smoking, number of hot meals). In one group, food intake is sufficient to maintain health. Cholesterol and fats are apparently not absorbed in excess, so vascular damage and, apparently, obesity are avoided. In another group of hypertensive patients with an impaired cardiovascular-kidney function axis, which included elderly people, dietary supplements could improve energy production (ATP). Previous smoking could have impaired lung function, so the pulmonary circulation could also be under high pressure. Obesity should be reduced in this group.|
|Diabetes||It is not clear from the menuCH data how the meals are composed and how much the study participants consume. Only the disease was associated with characteristics, not the health status. Therefore, we obtained rules related to carbohydrate-glucose metabolism. The rules found indicated that, compared to glucose intake, smoking has little significance for glucose metabolism. The prognosis may worsen only in consumptive diseases. A balanced diet was found in a group of type 1 diabetics, but increased food intake was found in type 2 diabetics. The presumed type 1 diabetics deliberately eat a balanced diet with glucose, do not need supplements and do not smoke. If they were type 2 diabetics, they would have ingested too much food in the past and developed obesity, leading to type 2 diabetes with the dreaded complication of “metabolic syndrome”. These type 2 diabetics are usually not aware of their diet. Food supplements (vitamins, trace elements) would only have an added value if they were malnourished.|
|Cholesterol||Hypercholesterolaemia is a disorder of fat metabolism. Cholesterol can be biosynthesised entirely within the body: a chain of reactions starting from unused glucose, or its degradation product acetyl-CoA, ends with cholesterol. Therapeutically, this synthetic pathway can be interrupted with statins. The second possible cause of hypercholesterolaemia is increased external intake (a high-fat diet, especially animal fats). Here, associations of ill and non-ill people with the same characteristics were investigated. One group with a normal cholesterol level eats a lot of vegetables, which also contain the necessary micronutrients. If bread or other carbohydrates are not consumed excessively, the body’s own cholesterol production remains low. Previous smoking did not seem to have caused vascular changes that would worsen in combination with hypercholesterolaemia. A second group with normal cholesterol ate like the previous group but did not smoke. Arteriosclerotic changes to the vascular walls caused by nicotine consumption can be excluded, and there is no cardiovascular risk. A further group suffers from hypercholesterolaemia: here, vegetable consumption is insufficient or started too late, or the cholesterol is caused by too much carbohydrate intake. In another group, regular hot meals with (hopefully) a balanced composition and non-smoking behaviour significantly reduce the risk of atherosclerosis. Here, the vascular walls should be less altered.|
Summary and future research
The interpretation of the derived rules reveals interesting aspects of the selected Swiss population. In general, the dietary habits of the Swiss are reasonable in relation to chronic diseases. The results show that the derived rules are relevant for only a very small part of the sample. Furthermore, the different values of the independent nutritional characteristics are evenly distributed across the rules, which can be interpreted to mean that the majority of the sample population follows the latest nutritional standards, smokes little and engages in regular physical activity. Nevertheless, a small percentage of the sample has chronic diseases due to unhealthy diets. Further studies should weight the characteristics so that those with a small overall share in the population do not drop out prematurely during data mining.
- Aggarwal, C. C. & Yu, P. S. (1999). Data mining techniques for associations, clustering and classification. In N. Zhong & L. Zhou (Eds.), Methodologies for Knowledge Discovery and Data Mining (pp. 13-23). Springer Berlin Heidelberg.
- FSVO, Federal Food Safety and Veterinary Office. (2021). menuCH – National Nutrition Survey. https://www.blv.admin.ch/blv/de/home/lebensmittel-und-ernaehrung/ernaehrung/menuch.html
- Swiss Federal Statistical Office. (16 February 2021). Swiss Health Survey. https://www.bfs.admin.ch/bfs/de/home/statistiken/gesundheit/erhebungen/sgb.html
- Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996). From Data Mining to Knowledge Discovery in Databases, 27-34. https://doi.org/10.1145/240455.240464
- Mewes, I., Jenzer, H. & Einsele, F. (2021). A Study about Discovery of Critical Food Consumption Patterns Linked with Lifestyle Diseases for Swiss Population using Data Mining Methods. Online HealthInf, BIOSTEC - International Joint Conference on Biomedical Eng. Systems and Technologies.
- Mewes, I., Jenzer, H. & Einsele, F. (2021). Building an Integrated Relational Database from Swiss Nutrition National Survey and Swiss Health Datasets for Data Mining Purposes. World Academy of Science, Engineering and Technology, International Journal of Health and Medical Engineering, Vol. 15, No. 1.
- World Health Organization. (2018). Noncommunicable diseases: country profiles 2018. World Health Organization.
- Ilona Mewes completed her Bachelor’s degree in Business Information Systems at BFH Wirtschaft. She wrote the two papers as part of her case work (term paper) and her bachelor’s thesis under the supervision of Prof. Dr. Farshideh Einsele and Helena Jenzer.
- Prof. Dr. pharm. Helena Jenzer was Head of Research at BFH Health until 2020. She heads the hospital pharmacy of the Psychiatric University Hospital Zurich.