Seminars of the Focus Area Complex Systems

Prof. Dr. C. Beta, Prof. Dr. K. Dethloff, Prof. Dr. R. Engbert, Prof. Dr. M. Holschneider, Prof. Dr. W. Huisinga, Prof. Dr. Ralf Metzler, Prof. Dr. A. Pikovsky, Prof. Dr. S. Reich, Prof. Dr. M. Rosenblum, Prof. Dr. G. Rüdiger, Prof. Dr. T. Scheffer, Prof. Dr. F. Scherbaum, Prof. Dr. J. Selbig, Prof. Dr. F. Spahn


Speaker: Silvana Gromöller, Max Planck Institute of Infection Biology, Berlin and University of Potsdam

Title: Combining multiplatform data - Classification of combined multiplatform data for biomarker studies in tuberculosis * Bioinformatics Affinity Seminar

Time: Wed, July 3, 2013, 10am

Place: MPI für Molekulare Pflanzenphysiologie, Room 0.21 in The Box

Tuberculosis (TB) is still a major global health problem. In 2011, 8.7 million incident cases and 1.4 million deaths from TB were reported. The lack of a suitable diagnostic test leads to insufficient diagnosis, which is a major reason why TB still causes more than a million deaths per year. There is therefore a need for more sensitive, specific and cost-effective diagnostic tests. High-throughput studies may lead to sensitive, specific biomarkers for TB. The combined information content of several different high-throughput data sets can give deeper insight into the whole biological system and can reveal distinct aspects of the host response. In classification, combining different data sets could increase the accuracy compared to using only one data set. Accordingly, there is an increasing demand for algorithms and tools for combining such multiplatform data sets.

In this work, an algorithm is designed to combine different multiplatform data sets. It is called "Data Fusion with PLS-DA and PCA" (DFPD) and combines different types of data using supervised and unsupervised machine learning approaches. Machine learning methods serve two goals here: first, combining the data, and second, classifying the samples. To test the performance of the algorithm, two different sets of multiplatform data are used: cytokine and metabolic profiles with a sample size of 99, and microRNA, mRNA and methylation data sets with a sample size of 32 from whole blood cells. The microRNA, mRNA and methylation data sets are divided into measurements from two cell subtypes (monocytes and neutrophils), giving 16 samples per cell type. Combining microRNA and mRNA data with the novel DFPD algorithm increased the classification accuracy to a level that is not matched by classification on any single data set.
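As a rough illustration of the kind of pipeline the name DFPD suggests, the following Python sketch compresses each data block with PCA, concatenates the resulting scores, and classifies the fused matrix with a two-class PLS-DA. It is not the DFPD algorithm itself, whose details are not given in this abstract. The synthetic data, the block sizes, the numbers of components and all function names are assumptions made for illustration only.

```python
import numpy as np


def zscore(X):
    """Scale every column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(X, n_components):
    """Unsupervised step: project the samples onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    # For the SVD of the centered matrix, the columns of U * S are the PCA scores.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def fit_pls_da(X, y, n_components):
    """Supervised step: two-class PLS-DA, i.e. PLS1 (NIPALS) on 0/1 labels."""
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weight vector: covariance of X with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # sample scores for this component
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients
    return coef, x_mean, y_mean


def predict_pls_da(model, X):
    """Assign class 1 when the predicted response exceeds 0.5."""
    coef, x_mean, y_mean = model
    return ((X - x_mean) @ coef + y_mean > 0.5).astype(int)


# Synthetic stand-ins for two platforms measured on the same 40 samples.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
block_a = rng.normal(size=(40, 30))
block_b = rng.normal(size=(40, 50))
block_a[:, :8] += 3.0 * labels[:, None]   # a few class-dependent features
block_b[:, :8] += 3.0 * labels[:, None]

# Fuse the blocks at the score level, then classify the fused matrix.
fused = np.hstack([pca_scores(zscore(b), 3) for b in (block_a, block_b)])
model = fit_pls_da(fused, labels, n_components=2)
accuracy = float(np.mean(predict_pls_da(model, fused) == labels))
print(f"training accuracy on fused scores: {accuracy:.2f}")
```

The accuracy printed here is computed on the training samples of a toy data set. With sample sizes as small as those mentioned above, a real comparison between fused and single data sets would need cross-validation, with the PCA step refitted inside each fold.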

Furthermore, regularized Canonical Correlation Analysis (rCCA) and sparse Partial Least Squares in canonical mode (sPLS-can) are used to find bidirectional relationships between two sets of multiplatform data for biomarker studies in tuberculosis.
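For readers unfamiliar with rCCA, the idea can be sketched in a few lines of Python: a ridge term is added to each within-set covariance matrix so that the canonical directions remain computable when there are more variables than samples. This is a generic textbook formulation in NumPy, not the implementation used in the presented work. The function names, the regularization values and the synthetic data are assumptions.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def rcca(X, Y, lam_x, lam_y, n_components):
    """Ridge-regularized CCA; returns canonical variates and correlations."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    # Within-set covariances with a ridge penalty on the diagonal.
    Cxx = Xc.T @ Xc / (n - 1) + lam_x * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + lam_y * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the
    # regularized canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :n_components]       # canonical weights for X
    B = Ky @ Vt.T[:, :n_components]    # canonical weights for Y
    return Xc @ A, Yc @ B, s[:n_components]


# Two synthetic data sets on the same 30 samples, sharing one latent signal.
rng = np.random.default_rng(1)
latent = rng.normal(size=30)
X = rng.normal(size=(30, 40))
Y = rng.normal(size=(30, 50))
X[:, :3] += 3.0 * latent[:, None]
Y[:, :3] += 3.0 * latent[:, None]

x_scores, y_scores, rho = rcca(X, Y, lam_x=0.5, lam_y=0.5, n_components=2)
corr = float(np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1])
print(f"first regularized canonical correlation: {rho[0]:.2f}")
print(f"sample correlation of the first pair of variates: {corr:.2f}")
```

In practice the two regularization parameters are usually chosen by cross-validation rather than fixed by hand as in this sketch.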


Udo Schwarz, Zentrum für Dynamik komplexer Systeme,
Universität Potsdam, Campus Golm Karl-Liebknecht-Str. 24, 14476 Potsdam, building 28, room 2.107
Phone: (+49-331) 977-1658, Fax : (+49-331) 977-1045

Email: Udo.Schwarz AT
