Contact Information


Department of Pure and Applied Mathematics,
University of Padova
Office number 720
c/o Torre Archimede
via Trieste, 63
35121 Padova, Italy

Office phone number: 0498271459

Email: dasan[]math.unipd.it (replace [] with @)

 

Publications

Book Chapters

"Self-Organizing Maps for Structured Domains:
Theory, Models, and Learning of Kernels",
F. Aiolli, G. Da San Martino, M. Hagenbuchner, A. Sperduti. In Innovations in Neural Information Paradigms and Applications, Studies in Computational Intelligence (2009). Original publication available at www.springerlink.com 
bibtex download article
(abstract)

Abstract: Self-Organizing Maps (SOMs) are a form of Machine Learning methods which are popularly applied as a tool to either cluster vectorial information, or to produce a topology preserving projection of high dimensional data vectors onto a low dimensional (often 2-dimensional) display space [20]. A SOM is generally trained unsupervised. The computational complexity of the underlying algorithms grows linearly with the size and number of inputs, which renders the SOM suitable for data mining tasks. The standard SOM algorithm is defined on input domains involving fixed sized data vectors. It is however recognized that many problem domains are naturally represented by structured data which are more complex than fixed sized vectors. Just to give some examples, in speech recognition, data is available in the form of variable length temporal vectors, while in Chemistry data is most naturally represented through molecular graphs.Moreover, numerous data mining tasks provide structural information which may be important to consider during the processing. For example, document mining in the world wide web involves both inter-document structure due to the formatting or hypertext structure, and intra-document structure due to hyperlink or reference dependencies. Note that any model capable of dealing with graphs can be used also in applications involving vectors, sequences, and trees, since these are special cases of graphs.

International Journals

"Learning Non-sparse Kernels by Self-Organizing Maps for Structured Data", F. Aiolli, G. Da San Martino, M. Hagenbuchner, A. Sperduti. In IEEE Transactions on Neural Networks (2009).  bibtex download article
(abstract)

Abstract: The development of neural network models able to encode structured input, and the more recent definition of kernels for structures, makes it possible to directly apply machine learning approaches to generic structured data. However, the effectiveness of a kernel can depend on its sparsity with respect to a specific dataset. In fact, the accuracy of a kernel method typically reduces as the kernel sparsity increases. The sparsity problem is particularly common in structured domains involving discrete variables which may take on many different values. In this paper, we explore this issue on two well known kernels for trees, and propose to face it by recurring to Self-Organizing Maps for Structures. Specifically, we show that a suitable combination of the two approaches, obtained by defining a new class of kernels based on the activation map of a Self-Organizing Map for Structures, can be effective in avoiding the sparsity problem and results in a system that can be significantly more accurate for categorization tasks on structured data. The effectiveness of the proposed approach is demonstrated experimentally on two relatively large corpora of XML formatted data and a dataset of user sessions extracted from website logs.

International Conferences

Year: 2011 - 2006

"Improving biomarker list stability by integration of biological knowledge in the learning process", T. Sanavia, F. Aiolli, G. Da San Martino, A. Bisognin, B. Di Camillo. ISMB 2011.  bibtex (abstract)

Abstract: n/a

"Extending Tree Kernels with Topological Information", F. Aiolli, G. Da San Martino, A. Sperduti. ICANN 2011.  bibtex download article
(abstract)

Abstract: The denition of appropriate kernel functions is crucial for the performance of a kernel method. In many of the state-of-the-art kernels for trees, matching substructures are considered independently from their position within the trees. However, when a match happens in similar positions, more strength could reasonably be given to it. Here, we give a systematic way to enrich a large class of tree kernels with this kind of information without aecting, in almost all cases, the worst case computational complexity. Experimental results show the eectiveness of the proposed approach.

"Sparsity Issues in Self-Organizing-Maps for Structures", M. Hagenbuchner, G. Da San Martino, A.C. Tsoi, A. Sperduti. ESANN 2011.  bibtex download article
(abstract)

Abstract: Recent developments with Self-Organizing Maps (SOMs) produced methods capable of clustering graph structured data onto a fixed dimensional display space. These methods have been applied successfully to a number of benchmark problems and produced state-of-the-art results. This paper discusses a limitation of the most powerful version of these SOMs, known as probability measure graph SOMs (PMGraphSOMs), viz., the sparsity induced by processing a large number of small graphs, which prevents a successful application of PMGraphSOM to such problems. An approach using the idea of compactifying the generated state space to address this sparsity problem is proposed. An application to an established benchmark problem, viz., the Mutag dataset in toxicology will show that the proposed method is effective when dealing with a large number of small graphs. Hence, this work fills a gap between the processing of a number of small graphs, and the processing of densely connected graphs using PMGraphSOMs.

"A New Tree Kernel Based on SOM-SD", F. Aiolli, G. Da San Martino, A. Sperduti. ICANN 2010.  bibtex download article
(abstract)

Abstract: Many different paradigms have been studied in the past to treat tree structured data, including kernel and neural based approaches. However, both types of methods have their own drawbacks. Kernels typically can only cope with discrete labels and tend to be sparse. On the other side, SOM-SD, an extension of the SOM for structured data, is unsupervised and Markovian, i.e. the representation of a subtree does not consider where the subtree appears in a tree. In this paper, we present a hybrid approach which tries to overcome these problems. In particular, we propose a new kernel based on SOM-SD which adds information about the relative position of subtrees (the route) to the activation of the nodes in such a way to discriminate even those subtrees originally encoded by the same prototypes. Experiments have been performed against two well known benchmark datasets with promising results.

"Route Kernels for Trees", F. Aiolli, G. Da San Martino, A. Sperduti. ICML 2009.  bibtex download article
(abstract)

Abstract: Almost all tree kernels proposed in the literature match substructures without taking into account their "relative positioning" with respect to one another. In this paper, we propose a novel family of kernels which explicitly focus on this type of information. Specifically, after defining a family of tree kernels based on routes between nodes, we present an efficient implementation for a member of this family. Experimental results on four different datasets show that our method is able to reach state of the art performances, obtaining in some cases performances better than computationally more demanding tree kernels.

"A Kernel Method for the Optimization of the Margin Distribution", F. Aiolli, G. Da San Martino, A. Sperduti. ICANN 2008.  bibtex download article (abstract)

Abstract: Recent results in theoretical machine learning seem to suggest that nice properties of the margin distribution over a training set turns out in a good performance of a classifier. The same principle has been already used in SVM and other kernel based methods as the associated optimization problems try to maximize the minimum of these margins. In this paper, we propose a kernel based method for the direct optimization of the margin distribution (KM-OMD). The method is motivated and analyzed from a game theoretical perspective. A quite efficient optimization algorithm is then proposed. Experimental results over a standard benchmark of 13 datasets have clearly shown state-of-the-art performances.

"Kernelized Self-Organizing Maps for Structured Data", F. Aiolli, G. Da San Martino, A. Sperduti, and M. Hagenbuchner. ESANN 2007.  bibtex download article (abstract)

Abstract: The suitability of the well known kernels for trees, and the lesser known Self-Organizing Map for Structures for categorization tasks on structured data is investigated in this paper. It is shown that a suitable combination of the two approaches, by defining new kernels on the activation map of a Self-Organizing Map for Structures, can result in a system that is significantly more accurate for categorization tasks on structured data. The effectiveness of the proposed approach is demonstrated experimentally on a relatively large corpus of XML formatted data.

"Efficient Kernel-Based Learning for Trees", F. Aiolli, G. Da San Martino, A. Sperduti, and A. Moschitti. CIDM 2007.  bibtex download article (abstract)

Abstract: Kernel methods are effective approaches to the modeling of structured objects in learning algorithms. Their major drawback is the typically high computational complexity of kernel functions. This prevents the application of computational demanding algorithms, e.g. Support Vector Machines, on large datasets. Consequently, on-line learning approaches are required. Moreover, to facilitate the application of kernel methods on structured data, additional efficiency optimization should be carried out. In this paper, we propose Direct Acyclic Graphs to reduce the computational burden and storage requirements by representing common structures and feature vectors. We show the benefit of our approach for the perceptron algorithm using tree and polynomial kernels. The experiments on a quite extensive dataset of about one million of instances show that our model makes the use of kernels for trees practical. From the accuracy point of view, the possibility of using large amount of data has allowed us to reach the state-of-the-art on the automatic detection of Semantic Role Labeling as defined in the Conference on Natural Language Learning shared task.

"Fast On-line Kernel Learning for Trees", F. Aiolli, G. Da San Martino, A. Sperduti, and A. Moschitti. ICDM 2006 (Short Paper).  bibtex download article (abstract)

Abstract: Kernel methods have been shown to be very effective for applications requiring the modeling of structured objects. However they usually are too computational demanding to be applied to complex learning algorithms, e.g. Support Vector Machines. Consequently, in order to apply kernels to large amount of structured data, we need fast on-line algorithms along with an efficiency optimization of kernel functions. In this paper we optimize tree kernel computation by representing set of trees with minimal Direct Acyclic Graphs (DAGs). This reduces the storage requirements and speeds up the computation as the kernel function over DAGs may correspond to the evaluation of a large number of trees. The experiments conducted show that substantial computational savings can be obtained for the perceptron algorithm.

"A new Swarm Intelligence Coordination Model Inspired by Collective Prey Retrieval and its Application to Image Alignment", G. Da San Martino, F. A. Cardillo, and A. Starita. PPSN06 2006.  bibtex download article (abstract)

Abstract: Swarm Intelligence is the emergent collective intelligence of groups of simple agents acting almost independently. Algorithms following this paradigm have many desirable properties: flexibility, decentralized control, robustness, fault tolerance. This paper presents a novel agent coordination model inspired by the way ants collectively transport large preys. In our model a swarm of agents, each having a different destination to reach, moves with no centralized control in the direction indicated by the majority of agents keeping its initial shape. The model is used to build an algorithm for the problems of image alignment and image matching. The novelty of the approach and its effectiveness are discussed.

"A new Swarm Intelligence Algorithm for Image Alignment", G. Da San Martino, F. A. Cardillo, and A. Starita. MEDSIP06 2006.  bibtex download article (abstract)

Abstract: Swarm Intelligence is the emergent collective intelligence of groups composed of simple and quasi-independent agents. Algorithms following this paradigm show many desirable properties: flexibility, decentralized control, robustness, fault tolerance. This paper presents a new agent coordination model loosely inspired by the way some species of ants collectively transport large preys. The model is used to build a Swarm Intelligence algorithm for the problem of image alignment. The novelty of the approach and its effectiveness are discussed.

Other Publications

"Stable Feature Selection for Biomarker Discovery: Use of Biological Information", T. Sanavia, F. Aiolli, G. Da San Martino, A. Bisognin, B. Di Camillo. BITS Annual Meeting 2011.  bibtex download article (abstract)

Abstract: Studies on the transcriptome done with microarray data have experienced a huge diffusion and the analysis of these data has been promising for the identification of key genes which can be used as diagnostic/prognostic markers for the disease. In particular, supervised classification analysis has been largely applied to address this issue and the state-of-the-art machine learning methods have demonstrated to give solutions with empirically good accuracy. However, the stability of the selected biomarkers is also a very important task. If an accurate system tends to select the same biomarkers in different independent experiments, then it is more likely that the selected biomarkers are the right ones. In general, there are two stability issues arising in gene expression classification and analysis. First, it would be desirable that classifiers obtained using different training data do not change too much (low variance of the classifiers). Since training data are often limited, predictive models obtained from different datasets can be extremely different. Secondly, since the number of features is generally very high, then features can be combined in many different ways to give solutions able to explain data. As a consequence, many possible sets of features can be considered relevant to the task and equally good in terms of accuracy. This characteristic makes the process of selecting the set of relevant features which are relevant for a classification task a very hard problem. A possible approach to improve list stability is to integrate biological information from genomic databases in the learning process.

"Mining Structured Data", G. Da San Martino and A. Sperduti. Computational Intelligence Magazine 2010.  bibtex download article (abstract)

Abstract: In many application domains, the amount of available data increased so much that humans need help from automatic computerized methods for extracting relevant information. Moreover, it is becoming more and more common to store data that possess inherently structural or relational characteristics. These type of data is best represented by graphs, which can very naturally represent entities, their attributes, and their relationships to other entities. In this article, we review the state of art in graph mining, and we present advances in processing trees and graphs by two Computational Intelligence classes of methods, namely Neural Networks and Kernel Methods.

"Kernel Methods for Tree Structured Data (Ph.D. Thesis)" G. Da San Martino. Technical Report, UBLCS-2009-04, Department of Computer Science, University of Bologna.  bibtex download article (abstract)

Abstract: Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree structured data two issues become relevant: kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too few information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving as the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernel, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required both in learning and classification phases. Such a complexity can sometimes prevents the kernel application in scenarios involving large amount of data. This thesis proposes three contributions for resolving the above issues of kernel for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. Convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted at reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Direct Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.

 

Teaching

Year 2012-2013

Modulo di informatica teorica per il corso:

Year 2011-2012

Modulo di informatica teorica per il corso:

Year 2007-2008

Didattica di supporto per il corso:

  • Fondamenti di Informatica, corso di laurea in Chimica Industriale

Year 2006-2007

Didattica di supporto per il corso:

  • Fondamenti di Informatica, corso di laurea in Chimica Industriale
  • Informatica di base, corso di laurea in Biologia

Software

SVM-Light Add-ons

Various kernels for trees and utilities implemented as modules of Alessandro Moschitti's SVM-lighT-TK 1.2 software, which, in turn, is based on SVM-Light 5.01 by Thorsten Joachims.

Biomarkers

A software for integrating biological knowledge, in the form of similarity matrices, into the learning process.

Python Tree Kernels

A set of classes to handle trees and compute kernel functions on them.