Data mining day

Mercredi 12 Juin 2013
Centre de Développement des Technologies Avancées (CDTA)

9h00-10h00 : Revisiting maximal pattern mining problems

Jean-Marc Petit - LIRIS, INSA de Lyon, France

10h00-11h00 : Frequent string mining from large scale noisy data

Takeaki Uno - NII, Tokyo, Japon

11h30-12h30 : Une approche globale pour la fouille de graphes

Yahia Lebbah- Université d'Oran, Algérie

14h00-15h30 : Discussion : Datamining and related areas

Revisiting maximal pattern mining problems

Professeur Jean-Marc Petit - LIRIS, INSA de Lyon, France

Abstract. The framework of Mannila and Toivonen \cite{MT97} classifies pattern mining problems that are (isomorphically) equivalent to frequent itemset mining (FIM). Moreover, the notion of dualization turns out to be crucial to characterize the complexity of maximal patterns mining problems. Nevertheless, many pattern mining problems cannot benefit from this framework since the isomorphism requirement is too restrictive for many patterns such as sequences, episodes or graphs to mention a few. In this setting, our ambition is to extend their framework to take into account such complex patterns and studying their complexity. First, we identify and classify pattern structures, which are not boolean lattices, for which the dualization problem remains quasi-polynomial. We introduce convex embedding and poset reduction as key notions to obtain a new classification of pattern mining problems. Second, we point out how to adapt the seminal Dualize & Advance algorithm for these classes. Last but not least, we point out how known instances of complex pattern mining problems, including sequences and conjunctive queries, fit into our framework.

Biography.

Jean-Marc Petit is a professor in computer science at the engineering school INSA Lyon. From september 1997 to august 2005, he was an associate professor at the University Blaise Pascal, Clermont-Ferrand. He received his PhD in Computer Science from the University Claude Bernard in 1996 and graduated in computer science from the University Claude Bernard and ENS Lyon ("Magistère") in 1993. Since 2005, he is the head of the “pervasive information system” group (15 members) and has developed scientific partnerships with major companies (e.g. EDF R&D or Alcatel- Lucent) on new database applications. His main research interests concern databases and data mining and leaded three main scientific projects: DBA Compnion, Generules and iZi.

Retour au programme de la journée

Frequent string mining from large scale noisy data

Takeaki Uno - NII, Tokyo, Japon

Abstract.

Finding frequent string is a fundamental issue in data mining and information processing. The problem is actually easy from computation side, thanks to suffix array algorithms. However, once we allow errors for the occurrence, i.e., if we consider a string appears at a place if the string is *similar* (in short distance) to the substring beginning from the place, it turns to be very difficult problem. The reason is that quite many short strings can appear in many places if we allow errors of at most some constant k; on the other hand, if we allow errors according to the length, the monotone property is violated, and search algorithm will not be efficient. In the existing studies, we can say that no algorithm is efficient for this task. Instead of usual hill climbing approach, here we propose an approach based on similarity detection. In the approach, we first find all pairs of similar strings, and then find frequent strings according to the groups of similar strings. We also explain how to compute the similarity efficiency. As the result, from genome sequence we can extract frequent strings that are very long compared to existing researches, in quite short time, say 10 min.

Biography.

Takeaki Uno received the PhD degree (Doctor of Science) from Department of Systems Science, Tokyo Institute of Technology Japan, 1998. He was an assistant professor in Department of Industrial and Management Science in Tokyo Institute of Technology from 1998 to 2001, and have been an associate professor of National Institute of Informatics Japan, from 2001. His research topic is discrete algorithms, especially enumeration algorithms, algorithms on graph classes, and data mining algorithms. On the theoretical part, he studies low degree polynomial time algorithms, and hardness proofs. In the application area, he works on the paradigm of constructing practically efficient algorithms for large scale data that are data oriented and theoretically supported. In an international frequent pattern mining competition in 2004 he won the best implementation award. He got Young Scientists' Prize of The Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in Japan, 2010.

Retour au programme de la journée

Une approche globale pour la fouille de graphes

Yahia Lebbah- Université d'Oran, Algérie

Abstract. Nous introduisons la problématique de la fouille de graphes et ses différentes approches de résolution, en mettant en avant les étapes coûteuses dans le processus de génération des sous-graphes fréquents dans une base de graphes. Nous verrons notamment que la fouille de graphes fait appel aux problématiques combinatoires de l'isomorphisme de sous-graphes et de graphes. Nous montrerons comment relâcher des problèmes combinatoires difficiles de la fouille de graphes, pour se ramener à une méthode globale qui génère un encadrement rigoureux des sous-graphes fréquents. Nous montrerons aussi une évaluation expérimentale de l'ensemble de l'approche, en explicitant le type de graphes où des gains intéressants sont obtenus.

Biography.

Yahia Lebbah received the PhD degree in computer science from Ecole des Mines de Nantes, France, 1999, and then joined university of Oran in Algeria as assistant professor. Since 2009, he is a professor. From 2010, he is the head of the LITIO laboratory in computer science at the university of Oran. He is also conducting the creation of the high performance computing center at the same university. He has a long time collaboration with the CeP team, I3S/CNRS laboratory at the university of Nice - Sophia Antipolis in France, where he is associate researcher. He has collaborated with COPRIN team at INRIA Sophia Antipolis and Celtique team at INRIA Rennes. His main research area is constraint programming (CP). He has particularly worked on solving constraints and continuous optimisation problems. His current research interests is the use of CP and optimisation techniques in global optimization, graph mining, software verification and multicriteria decision making.

Retour au programme de la journée