Our research interests lie in machine learning, big biomedical data, and clinical genomics. A tremendous amount of clinical data is currently being gathered from a wide variety of sources and in many forms, including structured and unstructured data, electronic health records (EHR), and biomolecular, genetic, and wearable-device data. There is therefore a need to develop new machine learning and deep learning methods that can extract new insights from these data and help advance personalized medicine.
When a physician reviews laboratory test results, the current practice is to scan the list and look for normal or abnormal values. Such a marginal view, which considers each test in isolation, can miss information that only appears in higher dimensions. For example, all tests might fall within their normal ranges yet all lie near the extreme edges of those ranges, a pattern that may still warrant the physician's attention. We are developing a novel method to estimate the ‘normality’ of a set of laboratory tests as a whole. To address this challenge, we compare hundreds of thousands of test results.
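To illustrate why a joint view can flag what a per-test view misses, the sketch below uses a classical technique, the squared Mahalanobis distance, rather than our own method. It assumes a synthetic reference population of 200,000 profiles with 10 standardized, independent test values, and treats |z| < 2 as each test's "normal range"; real laboratory values are correlated and skewed and would need appropriate transformation first. The function names are hypothetical.

```python
import numpy as np

def fit_reference(reference: np.ndarray):
    """Estimate the mean vector and inverse covariance of reference profiles (rows = patients)."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    return mean, np.linalg.inv(cov)

def mahalanobis_sq(profiles: np.ndarray, mean: np.ndarray, inv_cov: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row of `profiles` from the reference mean."""
    diff = np.atleast_2d(profiles) - mean
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

def abnormality_percentile(profile, reference, mean, inv_cov) -> float:
    """Fraction of reference profiles at least as far from the mean as `profile`."""
    ref_d2 = mahalanobis_sq(reference, mean, inv_cov)
    d2 = mahalanobis_sq(profile, mean, inv_cov)[0]
    return float(np.mean(ref_d2 >= d2))

rng = np.random.default_rng(0)
n_tests = 10
# Hypothetical reference population: standardized, independent test values.
reference = rng.standard_normal((200_000, n_tests))
mean, inv_cov = fit_reference(reference)

# Every test lies inside its marginal "normal range" |z| < 2, but all sit near the edge.
edge_profile = np.full(n_tests, 1.9)
# A typical profile close to the population mean.
typical_profile = np.zeros(n_tests)

print(np.all(np.abs(edge_profile) < 2))  # True: no single test would be flagged
print(abnormality_percentile(edge_profile, reference, mean, inv_cov))
print(abnormality_percentile(typical_profile, reference, mean, inv_cov))
```

The edge profile passes every per-test check, yet only a tiny fraction of the reference population lies as far from the mean jointly, while the typical profile sits near the center. Because the covariance is estimated from data, the same distance would also account for correlations between tests.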
We are developing a model for mortality prediction in the intensive care unit (ICU). In the ICU, new data are collected frequently, since many patients are connected to monitors and clinical care is delivered at a high rate. These data can help alert the staff to patients who are at higher risk of a complication that requires special attention. Such models have been developed before, but they suffer from biases. One bias we aim to overcome concerns the time of prediction. For example, a patient who is already critically ill may be “easier” to predict, and a model trained only on such samples will be of little use in real-world practice.
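One common way to reduce prediction-time bias is to draw training samples at several time points spread across each ICU stay, instead of a single point shortly before death or discharge. The sketch below contrasts the two sampling schemes on toy data. The `Stay` record, the 6-hour offset, the 4-hour minimum observation window, and the function names are all hypothetical choices for illustration, not the design of our model; feature extraction and the model itself are omitted.

```python
import random
from dataclasses import dataclass

@dataclass
class Stay:
    """Hypothetical minimal record of one ICU stay."""
    stay_id: str
    length_hours: float  # time from ICU admission to death or discharge
    died: bool           # outcome label: in-ICU mortality

def end_of_stay_samples(stays, offset_hours=6.0):
    """Biased scheme: one prediction point a fixed offset before the outcome."""
    return [(s.stay_id, max(0.0, s.length_hours - offset_hours), s.died) for s in stays]

def spread_time_samples(stays, per_stay=5, min_hours=4.0, seed=0):
    """Draw prediction times uniformly over each stay, after a minimum observation window."""
    rng = random.Random(seed)
    samples = []
    for s in stays:
        if s.length_hours <= min_hours:
            continue  # too little data has accumulated to make any prediction
        for _ in range(per_stay):
            t = rng.uniform(min_hours, s.length_hours)
            samples.append((s.stay_id, t, s.died))
    return samples

stays = [Stay("a", 72.0, True), Stay("b", 30.0, False), Stay("c", 120.0, True)]
lengths = {s.stay_id: s.length_hours for s in stays}

biased = end_of_stay_samples(stays)
spread = spread_time_samples(stays)

# Hours remaining between each prediction point and the outcome.
biased_gaps = [lengths[sid] - t for sid, t, _ in biased]
spread_gaps = [lengths[sid] - t for sid, t, _ in spread]
print(biased_gaps)  # [6.0, 6.0, 6.0]
print(round(min(spread_gaps), 1), round(max(spread_gaps), 1))
```

Under the end-of-stay scheme every sample sits exactly six hours before the outcome, so a model can lean on how sick patients look at that late stage; the spread scheme exposes it to earlier, less obvious states. Since samples from the same stay are correlated, training and test sets would still need to be split by patient.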
A person’s genome encodes their risk for a variety of diseases. The number of ‘features’ that can be derived from the genome is huge, as it consists of a sequence of 3 billion ‘letters’. Even if we restrict ourselves to common variants, there are still millions of them. Not only is the number of possible features huge, but the effect of each feature is typically small. As a result, statistical power tends to be very low. We are developing new methods that can learn from such ultra-high-dimensional data.
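To make the power problem concrete, the following worked sketch combines two standard approximations: a Bonferroni correction over an assumed one million independent common variants, which yields the conventional genome-wide threshold of 5×10⁻⁸, and the normal-approximation power of a single-variant test for a quantitative trait, with non-centrality n·r²/(1 - r²). The effect size (0.05% of trait variance explained), the sample sizes, and the function names are hypothetical; the code uses only Python's standard library.

```python
from math import sqrt
from statistics import NormalDist

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at `alpha`."""
    return alpha / n_tests

def single_variant_power(n: int, r2: float, alpha: float) -> float:
    """Approximate power of a two-sided 1-df association test for a quantitative trait.

    Uses the normal approximation with non-centrality n * r2 / (1 - r2),
    where r2 is the fraction of trait variance explained by the variant.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = sqrt(n * r2 / (1 - r2))
    # Probability that the shifted test statistic lands in either rejection tail.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

genome_wide = bonferroni_threshold(0.05, 1_000_000)  # assumed number of independent tests
effect = 0.0005  # hypothetical variant explaining 0.05% of trait variance

for n in (10_000, 100_000):
    nominal = single_variant_power(n, effect, 0.05)
    corrected = single_variant_power(n, effect, genome_wide)
    print(n, round(nominal, 3), round(corrected, 3))
```

With 10,000 samples, such a variant reaches nominal significance (p < 0.05) more often than not but almost never survives the genome-wide threshold; roughly ten times as many samples are needed before detection becomes likely.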
Clinical practice and guidelines tend to change over time as new methods and knowledge accumulate. One such example is the guidelines for redo coronary artery bypass grafting (CABG) surgery. Over the years, the recommendations shifted in favor of alternatives such as coronary angioplasty, yet the safety of redo CABG improved dramatically while the guidelines remained unchanged. Using data from over 1.5 million CABG surgeries, we showed that the risk of redo CABG has decreased and in recent years is comparable to that of first-time CABG surgery.