Advanced search
Start date

Data classification in complex networks via pattern conformation, data importance and structural optimization

Full text
Murillo Guimarães Carneiro
Total Authors: 1
Document type: Doctoral Thesis
Press: São Carlos.
Institution: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:
Examining board members:
Zhao Liang; Fabricio Aparecido Breve; Elbert Einstein Nehrer Macau; Renato Tinós; Jo Ueyama
Advisor: Zhao Liang

Data classification is a machine learning and data mining task in which a classifier is trained over a set of labeled data instances in such a way that the labels of new instances can be predicted. Traditionally, classification techniques define decision boundaries in the data space according to the physical features of a training set and a new data item is classified by verifying its relative position to the boundaries. Such kind of classification, which is only based on the physical attributes of the data, makes traditional techniques unable to detect semantic relationship existing among the data such as the pattern formation, for instance. On the other hand, recent works have shown the use of complex networks is a promissing way to capture spatial, topological and functional relationships of the data, as the network representation unifies structure, dynamic and functions of the networked system. In this thesis, the main objective is the development of methods and heuristics based on complex networks for data classification. The main contributions comprise the concepts of pattern conformation, data importance and network structural optimization. For pattern conformation, in which complex networks are employed to estimate the membership of a test item according to the data formation pattern, we present, in this thesis, a simple hybrid technique where physical and topological associations are produced from the same network. For data importance, we present a technique which considers the individual importance of the data items in order to determine the label of a given test item. The concept of importance here is derived from PageRank formulation, the ranking measure behind the Googles search engine used to calculate the importance of webpages. For network structural optimization, we present a bioinspired framework, which is able to build up the network while optimizing a task-oriented quality function such as classification, dimension reduction, etc. The last investigation presented in this thesis exploits the graph representation and its hability to detect classes of arbitrary distributions for the task of semantic role diffusion. In all investigations, a wide range of experiments in artificial and real-world data sets, and many comparisons with well-known and widely used techniques are also presented. In summary, the experimental results reveal that the advantages and new concepts provided by the use of networks represent relevant contributions to the areas of classification, learning systems and complex networks. (AU)

FAPESP's process: 12/07926-3 - Evolutionary Algorithms to Semantic Role Labeling
Grantee:Murillo Guimarães Carneiro
Support type: Scholarships in Brazil - Doctorate