Researchers in India have carried out a data mining exercise to determine which risk factors contribute most to an individual's chances of suffering a heart attack. Writing in the International Journal of Biomedical Engineering and Technology, they confirm that the usual suspects (high blood cholesterol, alcohol intake and passive smoking) play the most crucial roles in “severe”, “moderate” and “mild” cardiac risk, respectively.
Subhagata Chattopadhyay of the Camellia Institute of Engineering in Kolkata adds that men aged between 48 and 60 years are exposed to severe and moderate risk by virtue of their age and gender, respectively, whereas women over 50 years old are affected by mild risk in the absence of the other factors.
Medical prognosis is a highly subjective art, as is determining the risk of particular health events, such as heart attack. After all, clinical history, symptoms and signs rarely follow a linear path, and their interpretation at the individual level by a doctor does not usually conform to the rules of epidemiology: personal intuition, emotions, logic and experience all conspire to confound the conclusion drawn for each patient at a given time under a particular set of circumstances.
The use of computational data mining techniques, which allow researchers to extract interesting and meaningful information from real-life clinical data, could remove at least some of the subjectivity of clinical prognosis and allow epidemiology to work more precisely at the patient level. Data mining approaches have been tried before. However, they often have an inherent problem: the classification of the data for information retrieval is based on decision making learnt from examples set by doctors, and so they incorporate the very subjectivity that Chattopadhyay hopes to avoid with his approach.
He used 300 real-world sample patient cases with various levels of cardiac risk (mild, moderate and severe) and mined the data based on twelve known predisposing factors, including age, gender, alcohol abuse, cholesterol level, smoking (active and passive), physical inactivity, obesity, diabetes, family history, and prior cardiac event. He then built a risk model that revealed specific risk factors associated with heart attack risk.
“The essence of this work essentially lies in the introduction of clustering techniques instead of purely statistical modeling, where the latter has its own limitations in ‘data-model fitting’ compared to the former that is more flexible,” Chattopadhyay explains. “The reliability of the data used should be checked, and this has been done in this work to increase its authenticity. I reviewed several papers on epidemiological research, where I’m yet to see these methodologies used.”
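For readers curious what "clustering" means in practice, the sketch below groups patient-like records so that similar ones share a label, without any doctor-supplied examples. It is only an illustration: the article does not say which clustering algorithm, feature encoding or software Chattopadhyay used. The choice of k-means, the two made-up features, the function names and the sample records are all assumptions, and none of the numbers come from the study.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: returns (labels, centroids).

    Illustrative only; k-means is an assumed stand-in for whatever
    clustering method the study actually used.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct records
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every record to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned records.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Invented records: (age scaled to 0..1, cholesterol scaled to 0..1).
    records = [
        (0.20, 0.15), (0.25, 0.20), (0.30, 0.10),
        (0.55, 0.50), (0.60, 0.55), (0.50, 0.45),
        (0.85, 0.90), (0.90, 0.95), (0.80, 0.85),
    ]
    labels, centroids = kmeans(records, k=3)
    for record, label in zip(records, labels):
        print(record, "-> cluster", label)
```

The cluster numbers carry no meaning by themselves. Reading a cluster as "mild", "moderate" or "severe" would be a separate interpretive step, made by inspecting which records fall into each group.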