Jan Mielniczuk (http://www.ipipan.waw.pl/staff/j.mielniczuk/) is a full professor and head of the Department of Artificial Intelligence at the Institute of Computer Science, Polish Academy of Sciences, and a professor at the Faculty of Mathematics and Information Science of the Warsaw University of Technology. He received the M.Sc. (1981) and Ph.D. (1985) degrees from the University of Warsaw and the Dr. Habil. (1996) degree from the Institute of Mathematics, Polish Academy of Sciences. In 2009 he received the title of Professor. He has been with the Institute of Computer Science since 1981. Part of his research and teaching activity was conducted abroad, including several visits to the University of Michigan, Ann Arbor, and to Rice University, Houston. Recently he has been the coordinator of the Ph.D. programme "Information Technologies: Research and their Interdisciplinary Applications", with more than 40 Ph.D. students enrolled.
His main research contributions concern computational statistics and data mining, in particular time series modelling and prediction, inference for high-dimensional and misspecified data, model selection, computer-intensive methods, asymptotic analysis and quantification of dependence. He is the author or coauthor of two books and numerous research articles published, among others, in Journal of Machine Learning Research, The Annals of Statistics, Statistica Sinica, Neural Networks, IEEE Transactions on Information Theory, Computational Statistics & Data Analysis and Bernoulli.
His currently taught courses include those on data mining, time series analysis and applied statistics.
He is an elected member of the Committee of Mathematics of the Polish Academy of Sciences and of the European Regional Committee of the Bernoulli Society.
Title: Selection of active predictors for misspecified binary model
Selection of active predictors in high-dimensional regression problems plays a pivotal role in contemporary data mining and statistical inference. However, properties of frequently applied selection procedures, such as consistent choice of an active set, usually rely strongly on the assumption that the data follow a specific model.
In the presentation we address this problem and discuss general setups in which estimation procedures can approximately recover the direction of the true parameter vector and estimate its support consistently. This explains the sometimes observed phenomenon that certain procedures work well even when the underlying data-generating mechanism is misspecified, e.g. when methods constructed for linear models are applied to binary regression. The basic reasoning was discovered long ago by D. Brillinger and P. Ruud but is scarcely known in the data mining community.
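The Brillinger/Ruud observation can be illustrated numerically: with Gaussian predictors, the ordinary-least-squares fit to a binary (logistic) response estimates the true coefficient vector up to a scalar factor, so its direction is recovered. The sketch below is only an illustration of that phenomenon under these assumptions; data sizes and coefficients are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20000, 5
X = rng.standard_normal((n, p))  # Gaussian predictors: the key assumption
beta = np.array([1.0, -2.0, 0.5, 0.0, 0.0])

# Binary response generated by a logistic model.
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)

# Misspecified fit: ordinary least squares on the binary response
# (centering y absorbs the intercept).
b_ols, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)

# Brillinger/Ruud: b_ols is approximately c * beta for a scalar c > 0,
# so the two vectors are nearly collinear.
cos = b_ols @ beta / (np.linalg.norm(b_ols) * np.linalg.norm(beta))
print(f"cosine similarity between OLS estimate and true beta: {cos:.4f}")
```

In particular the support of beta (its nonzero coordinates) can be read off the misspecified estimate, which is what makes selection procedures robust to this kind of model error.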
As a particular application we introduce a two-stage selection procedure which first screens predictors using the LASSO method for logistic regression and then chooses the final model by optimizing a Generalized Information Criterion over the ensuing hierarchical family. We discuss its properties, in particular the fact that in the case of misspecification it picks with large probability a model which approximates the Kullback-Leibler projection (in the average sense) onto the family of logistic regressions.