# Abstracts of HIS02 Presentations

An Efficient Algorithm in Optimal Partition Problem for Trees
Mounir Asseraf - ESIEA Recherche; school of engineer
To address the inherent computational complexity of constructing optimal trees, this paper presents an efficient procedure for finding the optimal partition for categorical variables. The attribute selection metric used for this optimisation is the Kolmogorov-Smirnov criterion adapted to discrete variables. The algorithm converges to the globally optimal solution in polynomial time of degree three. We compare its time complexity with that of other classical algorithms and show that there is a significant difference in the time required to find the optimal partition.

A Railway Interlocking Safety Verification System Based on Abductive Paraconsistent Logic Programming
Kazumi Nakamatsu - Himeji Institute of Technology
Jair Abe - Paulista University
Atsuyuki Suzuki - Shizuoka University
We introduce a safety verification system for railway interlocking based on a paraconsistent logic program called an Extended Vector Annotated Logic Program (EVALP for short) and its abductive procedure called SLDNFAÉˆ, which is an extended version of SLDNFA by Denecker and De Schreye.

Eigenspace-based Face Recognition: A Comparative Study of Different Hybrid Approaches
Pablo Navarrete - Universidad de Chile
Javier Ruiz-del-Solar - Universidad de Chile
Different eigenspace-based approaches have been proposed for the recognition of faces. They differ mostly in the kind of projection method being used, in the projection algorithm being employed, in the use of simple or differential images before/after projection, and in the similarity matching criterion or classification method employed. Statistical, neural, fuzzy and evolutionary algorithms are used in the implementation of those systems. The aim of this paper is to present an independent, comparative study between some of these hybrid eigenspace-based approaches. This study considers theoretical aspects as well as simulations performed using a small face database (Yale Face Database) and a large face database (FERET).

Pattern Recognition with Ultrasonic Sensor Using Classification Methods
Naïma Ait Oufroukh - Cemif Complex Systems Group
Etienne Colle - Cemif Complex Systems Group
This paper describes a binaural ultrasonic sensor for mobile robot recognition of simple objects with non-parametric methods (K nearest neighbours and a neural network). These methods identify and then exploit echo features defined as characteristics of the simple objects. Feature selection is also studied, and the reduction of parameters is obtained with several methods: Sequential Backward Selection, Sequential Forward Selection and Branch and Bound. The correlation between parameters is verified with the correlation circle given by principal component analysis. The results of feature selection and classification are presented.

A Color Pattern Recognition Problem based on the Multiple Classes Random Neural Network Model
Jose Aguilar - Universidad de los Andes
The purpose of this paper is to describe the use of the multiple classes random neural network model to recognize patterns having different colors. We propose a learning algorithm for the recognition of color patterns based upon the non-linear equations of the multiple classes random neural network model using gradient descent of a quadratic error function. In addition, we propose a progressive retrieval process with adaptive threshold value. The experimental evaluation shows that our approach provides good results.

Sparse Distributed Memory with Adaptive Threshold
Jose Aguilar - Universidad de los Andes
Niryaska Perozo - Universidad de los Andes
Sparse Distributed Memory is a content-addressable, associative memory technique which relies on close memory items tending to be clustered together, with some abstraction and blurring of details. This paper discusses the limitations of the original model. Then, we propose a method which improves Sparse Distributed Memory efficiency through an adaptive threshold. The results obtained are good and promising.

The Use of a Back Propagation Neural Network to Determine the Load Distribution on a Component
Ramin Amali - University of the West of England
John Vinney - University of the West of England
Siamak Noroozi - University of the West of England
Vikash Patel - University of the West of England
A method that combines a Back Propagation Neural Network (BPNN) with the data obtained using Finite Element Analysis (FEA) is introduced in this paper as an approach to solving inverse problems. This paper presents the feasibility of this approach. It demonstrates that the approach works under laboratory or controlled conditions. FEA results are used to train the BPNN. The component used is a simple cantilever plate resembling an aircraft wing. Once trained, the approximate load distribution solution to any problem, bound by the training envelope, can be obtained quickly and accurately.

Complete Algorithm to realize CI Model-based Control and Monitoring Strategies on Microcontroller Systems
Klaus-Dietrich Kramer - Hochschule Harz - University of Applied Sciences
Steffen Patzwahl - Hochschule Harz - University of Applied Sciences
Thomas Nacke - Institute for Bioprocessing and Analytical Measurement Techniques e.V.
This paper describes a complete algorithm for the realization of model-based control and monitoring strategies adopting computational intelligence (CI) strategies to microcontroller systems. Process interfaces, data mining and realization of process models and controller structures are subjected to detailed discussion. A further key issue is exporting the complete algorithm to target hardware systems with microcontrollers and/or digital signal processors. The paper outlines a software tool for user-friendly implementation of the complete algorithm. The application presented is a fuzzy control (FC) system to control substrate supply to biogas pilot plants.

Performance-guided Neural Network for Rapidly Self-Organising Active Network Management
Sin Wee Lee - Computational Intelligence Research Group, Leeds Metropolitan University
Dominic Palmer-Brown - Computational Intelligence Research Group, Leeds Metropolitan University
Jonathan Tepper - The Nottingham Trent University
Christopher Roadknight - BTexact Technologies
A neural network architecture is introduced for the real-time learning of input sequences using external performance feedback. The target problem domain suggests the use of Adaptive Resonance Theory (ART) networks [1] that are able to function in a robust and fast real-time adaptive active network environment, where user requests and new proxylets (services) are constantly being introduced over time [2,3]. The architecture learns, self-organises and self-stabilises in response to the user requests and maps the requests according to the types of proxylets available. However, the ART1 architecture and the original algorithm are modified to incorporate an external feedback mechanism whereby the performance of the system is fed into the network periodically. This modification, namely the 'snap-drift' algorithm, uses fast convergent, minimalist learning (snap) when the overall network performance has been poor and slow learning (drift towards user request input patterns) when the performance has been good. A key concern of the research is to devise a mechanism that effectively searches for alternative solutions to the ones that have already been tried, guided simultaneously by the input data (bottom-up information) and the performance feedback (top-down information). Preliminary simulations evaluate the two-tiered architecture using a simple operating environment consisting of simulated training and test data.

An Automated Hybrid Reasoning System for Forecasting
Florentino Fernandez Riverola - University of Vigo
Juan Manuel Corchado Rodriguez - University of Salamanca
A hybrid neuro-symbolic problem solving model is presented in which the aim is to forecast parameters of a complex and dynamic environment in an unsupervised way. In situations in which the rules that determine a system are unknown, the prediction of the parameter values that determine the characteristic behaviour of the system can be a problematic task. The proposed system employs a case-based reasoning model to wrap a growing cell structures network, a radial basis function network and a set of Sugeno fuzzy models to provide an accurate prediction. Each of these techniques is used in a different stage of the reasoning cycle of the case-based reasoning system to retrieve, to adapt and to review the proposed solution to the problem. This system has been used to predict the red tides that appear in the coastal waters of the north west of the Iberian Peninsula. The results obtained from those experiments are presented.

Rule Extraction from Bagged Neural Networks
Guido Bologna - Swiss Institute of Bioinformatics
This work presents a technique to extract rules from ensembles of "Discretized Interpretable Multi Layer Perceptrons" (DIMLPs) based on the characterization of discriminant hyper-planes. Experiments were carried out on 25 classification problems using single DIMLP networks and bagged ensembles. It turned out that rule sets extracted from bagged DIMLPs were on average significantly more accurate than single networks (78.0% versus 76.4%), and slightly less complex. Finally, rules were slightly more accurate than those generated from ensembles of C4.5 decision trees (78.0% versus 77.8%), while exhibiting significantly smaller complexity in almost all classification problems.

Jigsawing: A Method to Create Virtual Examples in OCR data
Vishwanathan S.V.N - Dept of Comp. Sci and Automation, Indian Institute of Science
Narasimha Murty, M - Dept of Comp. Sci and Automation, Indian Institute of Science
In this theoretical note we propose the use of a suffix tree on square matrices for compact representation of a set of training patterns. We show how a test pattern can be generated by "jigsawing" various regions from different training patterns. This in turn leads us naturally to a compact data dependent representation of a test pattern which we call the "description tree". We envisage the use of the description tree in a variety of applications including nearest neighbor classifiers, data dependent distance norms, kernel methods and syntactic pattern recognition. We provide statistical learning theory based arguments to show that our method generates valid virtual examples and hence will lead to better classification accuracy.

Designing Not-So-Dull Virtual Dolls
Fabio Zambetta - Dipartimento di Informatica, Università degli Studi di Bari
Graziano Catucci - Dipartimento di Informatica, Università degli Studi di Bari
Intelligent virtual agents exhibiting autonomous behavior rather than mere reactions to users' actions are going to become a major requirement for modern web sites. In this paper we present SAMIR, a system conceived to design intelligent agents with a 3D animated look as a front-end, to enhance the user interaction with the web applications it is embedded into.

Model-Based Restoration of Short-Exposure Solar Images
Michal Haindl - Institute of Information Theory and Automation
Stanislava Simberova - Astronomical Institute
This paper presents a derivation of a fast recursive filter for image restoration when the degradation obeys a linear degradation model with an unknown, possibly non-homogeneous point-spread function. It is assumed that for every ideal undegraded image several degraded observed images are available. Pixels in the vicinity of steep discontinuities are left unrestored to minimize the restoration blurring effect. The degraded image is assumed to follow a causal simultaneous multidimensional regressive model and the point-spread function is estimated using the local least-square estimate.

Spiking Neurons in Clustering of Diabetic Retinopathy Data
Krzysztof Cios - University of Colorado at Denver
William Jackson - Barbara Davis Center for Juvenile Diabetes
Waldemar Swiercz - University of Colorado at Boulder
Laura Springhetti - Barbara Davis Center for Juvenile Diabetes
In simple models of biological neurons the output does not depend on time. In contrast, the spiking neuron model, in response to external stimulation, generates a series of action potentials (spikes). In the paper we use MacGregor's spiking neuron model, and the Temporal Correlation Learning (TCL) rule to update synaptic connections between spiking neurons. The network of such neurons, with the TCL rule, was shown to be capable of clustering, without the need of specifying the number of clusters. In this paper the network of spiking neurons is used for finding clusters in eye images of diabetic retinopathy patients. Diabetic retinopathy is a disease caused by diabetes that, if not treated, can lead to major loss of vision.

Condition Monitoring, Root Cause Analysis and Decision Support on Urgency of Actions
Galia Weidl - ABB Corporate Research & Malardalen University
Anders Madsen - Hugin Expert A/S
Erik Dahlquist - ABB Process Industries & Malardalen University
We discuss the use of a hybrid system utilizing Object Oriented Bayesian networks and influence diagrams for probabilistic reasoning under uncertainties in industrial process operations. The Bayesian networks are used for condition monitoring and root cause analysis of process operation. The recommended decision sequence of corrective actions and observations is obtained following the "myopic" approach. The BN inference on the most probable root cause is used in an influence diagram for taking decisions on urgency of corrective actions vs. delivery deadline. The built-in chain of causality from root cause to process faults can provide the user with an explanation facility and a simulation tool for the effect of intended actions.

Towards a Unique Framework to Describe and Compare Diagnosis Approaches
Cecilia Zanni - LSIS
Marc Le Goc - LSIS
Claudia Frydman - LSIS
This paper introduces a unique framework to describe the different approaches to intelligent monitoring and diagnosis present in the literature. First, the state of the art in diagnosis is reviewed; then we propose a framework for analyzing the approaches presented so far, based on the KADS standard for development of knowledge based systems. Finally, we present our conclusions on the conceptual level of description of these systems, which lead us to state a general structure for them.

UniLR: An Automated Fuzzy Legal Reasoner
Dharmendra Sharma - University of Canberra
Automating legal reasoning is an interesting problem in artificial intelligence. In this paper, the problem of deciding on student disciplinary cases is studied, the various characteristics of the problem are identified and a prototype rule-based expert system that uses fuzzy reasoning is developed for the problem. The main motivation for the work is to study the difficulties in automating legal reasoning so that a sound verdict is reached for a small but intricate domain. Some results are presented and the experience gained from the project is discussed. Future work is discussed on strengthening the prototype to include a formal case specification and an interaction language, and its drawing (through automated clustering) and use of relevant information from a base of previous cases. The UniLR prototype has been successfully tested on sample data.

A Fuzzy Relatedness Measure for Determining Interestingness of Association Rules
Balasubramaniam Shekar - Indian Institute of Management Bangalore
Rajesh Natarajan - Indian Institute of Management Bangalore
In Knowledge Discovery in Databases (KDD) / Data Mining literature, 'interestingness' measures are used to rank rules according to the 'interest' a particular rule is expected to evoke in a user. In this paper, we introduce an aspect of interestingness called 'item-relatedness' to determine interestingness of item-pairs occurring in association rules. In actuality, association rules that contain weakly-related item-pairs are the ones that are interesting. We elucidate and quantify three different types of item-relatedness. Relationships corresponding to item-relatedness proposed by us are shown to be captured by paths in a 'fuzzy taxonomy' (an extension of the concept hierarchy tree). We then combine these measures of item-relatedness to arrive at a total-relatedness measure. This total relatedness measure appropriately combines each aspect of relatedness-relationships among items. We finally demonstrate the efficacy of this total measure on a sample taxonomy. We analyse the results and explain intuitive correspondences between numerical results and reality.

Experimental Evaluation of the PLA-Based Permutation-Scheduling
Joanna JEDRZEJOWICZ - Institute of Mathematics, Gdańsk University
Piotr JEDRZEJOWICZ - Department of Information Systems, Gdynia Maritime University
The paper proposes implementations of the population learning algorithm (PLA) for solving three well-known NP-hard permutation-scheduling problems. PLA is a recently developed method belonging to the class of population-based algorithms and used for solving difficult optimization problems. The first of the discussed problems involves scheduling tasks on a single machine against a common due date with earliness and tardiness penalties. The second is known as the permutation flow shop problem. The third one involves scheduling tasks on a single machine with total weighted tardiness as a criterion. To evaluate the proposed implementations, computational experiments have been carried out. Experiments involved solving available sets of benchmark problems and comparing the results with the optimum or best-known solutions. PLA has found better upper bounds on several benchmark instances.
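To make the third criterion concrete, the sketch below evaluates total weighted tardiness for one permutation of tasks on a single machine. It is only an illustration of the objective function, not the PLA itself; the function name, argument names and the three-task instance are assumptions introduced here.

```python
from itertools import permutations
from typing import Sequence


def total_weighted_tardiness(
    order: Sequence[int],
    proc: Sequence[float],
    due: Sequence[float],
    weight: Sequence[float],
) -> float:
    """Sum of weight[j] * max(0, completion time of j - due[j]) over all tasks."""
    clock = 0.0
    total = 0.0
    for j in order:
        clock += proc[j]  # task j completes once its processing time has elapsed
        total += weight[j] * max(0.0, clock - due[j])
    return total


# Hypothetical three-task instance (not taken from any benchmark set).
proc = [3, 2, 4]
due = [4, 2, 9]
weight = [1, 2, 1]

# Exhaustive search is feasible only for tiny instances; it is used here
# purely to show what a permutation-based search is trying to minimise.
best = min(permutations(range(3)),
           key=lambda p: total_weighted_tardiness(p, proc, due, weight))
```

A population-based method would call such an evaluation function on each candidate permutation instead of enumerating all of them.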

Set Approximation Quality Measures in the Variable Precision Rough Set Model
Wojciech Ziarko - University of Regina
The article introduces the basic notions of the variable precision rough set model (VPRS) including the parametric definitions of lower approximation, boundary and negative regions of a set. The main focus of the article is on the evaluation of the resulting set approximations and probabilistic decision tables using a number of proposed probabilistic measures. The application of the measures to evaluation of probabilistic decision tables is illustrated with a comprehensive example.
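As a rough illustration of the regions named above, the following sketch uses one common single-threshold formulation of VPRS: an equivalence class goes to the lower approximation when the conditional probability of the target set within the class is at least beta, to the negative region when it is at most 1 - beta, and to the boundary otherwise. The function name, the data layout and the choice of a single threshold are assumptions; the article's parametric definitions may differ.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple


def vprs_regions(
    objects: Dict[Hashable, Tuple],
    target: Set[Hashable],
    beta: float,
) -> Tuple[Set[Hashable], Set[Hashable], Set[Hashable]]:
    """Return (lower approximation, boundary, negative region) of `target`.

    `objects` maps each object id to its tuple of condition attribute values;
    objects with identical tuples form one equivalence class.
    """
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta is assumed to lie in (0.5, 1]")

    # Group objects into equivalence classes by their attribute description.
    classes = defaultdict(set)
    for obj, description in objects.items():
        classes[description].add(obj)

    lower, boundary, negative = set(), set(), set()
    for members in classes.values():
        # Conditional probability of the target set given this class.
        p = len(members & target) / len(members)
        if p >= beta:
            lower |= members
        elif p <= 1.0 - beta:
            negative |= members
        else:
            boundary |= members
    return lower, boundary, negative


# Hypothetical data: three equivalence classes with target rates 0.8, 0.0 and 0.5.
objects = {i: ("A",) for i in range(1, 6)}
objects.update({i: ("B",) for i in range(6, 9)})
objects.update({i: ("C",) for i in range(9, 11)})
target = {1, 2, 3, 4, 9}
```

With beta = 1.0 the sketch reduces to the classical rough set regions, where a class must lie entirely inside the target set to enter the lower approximation.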

Alliance Formation with Several Coordinators
Vladimir Marik - Czech Technical University
Viktor Mashkov - Czech Technical University
The practical issue of alliance formation is considered as a subproblem of a more general coalition formation problem in the context of multi-agent systems. The paper describes the process of creating alliances with specific restrictions applied. The intended alliance is formed by several independent coordinators which exchange information during the alliance formation process. The main goal of the research is to highlight the problems that occur when an alliance is formed by several coordinators and to sketch ways of solving them. The proposed approach integrates current results from both multi-agent research and complex discrete system diagnostics.

Clustering Web User Interests Using Self Organising Maps
Xiaozhe Wang - School of Business Systems, Faculty of Information Technology
Kate A. Smith - School of Business Systems, Faculty of Information Technology
This paper presents an approach to clustering a Web user's interests represented as text term maps using the unsupervised Neural Networks algorithm (Self Organising Map) from the records in the particular user's history file. Self Organising Map is a good tool for clustering the text data set into a low-dimensional regular grid that can be visualised as maps labeled with text terms. In this research, an experiment was carried out to find the User Interests Term Map in which all associated terms could be grouped in the same cluster. The text terms based on the Web user's interests could be applied to the intelligent Web query search in future work.

Web Traffic Mining Using a Concurrent Neuro-Fuzzy Approach
Xiaozhe Wang - School of Business Systems, Faculty of Information Technology
Ajith Abraham - School of Business Systems, Faculty of Information Technology
Kate Smith - School of Business Systems, Faculty of Information Technology
Web servers play a crucial role in conveying knowledge and information to end users. With the popularity of the WWW, discovering hidden information about users and usage patterns is critical for determining effective marketing strategies, optimising server usage and accommodating future growth. Many of the currently available server analysis tools provide only statistical data without much useful information. Mining useful information becomes a challenging task when the user traffic volume is enormous and keeps on growing. In this paper, we propose a concurrent neuro-fuzzy model to analyse useful information from the available statistical/text data from the Web log analyser. We made use of the cluster information generated by Self Organising Map (SOM) for data analysis and a Fuzzy Inference System (FIS) to forecast the daily and hourly traffic volume. Empirical results clearly demonstrate that the proposed hybrid technique is efficient and could be extended to other Web environments.

A Data Preparation Bayesian Approach for a Clustering Genetic Algorithm
Estevam Hruschka Jr. - COPPE / Universidade Federal do Rio de Janeiro
Eduardo Hruschka - Universidade Tuiuti do Paraná
Nelson Ebecken - COPPE / Universidade Federal do Rio de Janeiro
The substitution of missing values is an important task in data mining applications and it can be performed by means of many methods. This work describes the use of the Bayesian algorithm K2 as a data preparation tool for a clustering genetic algorithm. We illustrate the proposed method by means of simulations in three datasets: Ruspini, 200 Randomly Generated and Wisconsin Breast Cancer. The obtained results show that the Bayesian substitution method is adequate in the Clustering Genetic Algorithm context.

Using Genetic Algorithms for Minimizing the Production Costs of Hollow Core Slabs
Vanessa Castilho - Departamento de Engenharia de Estruturas - EESC - USP
Maria do Carmo Nicoletti - Departamento de Computação - UFSCar
Mounir El Debs - Departamento de Engenharia de Estruturas - EESC - USP
Genetic algorithms (GAs) are adaptive methods based on the genetic process of biological organisms that have been successfully applied to a variety of tasks, in areas such as function optimization, parameter tuning, learning, etc. The main goal of this work is to investigate a set of selection strategies which are used by a typical GA for minimizing the cost function of prefabricated hollow core slabs and to demonstrate the GA's robustness over a conventional method with respect to the complexity of the problem and quality of the solution.

MAYBE - Multi-Agent Yield-Based Engineering: Improve Training in the Emergency Room Chain
Gouarderes Sophie - Université de Pau et des Pays de l'Adour
Gouarderes Guy - IUT Bayonne
Delpy Philippe - Centre Hospitalier de la Cote Basque
This paper describes a method of multi-agent analysis and design for reactive, real-time information systems for complex and risky applications in medicine. To meet the specific needs of emergency healthcare units (spatio-temporal deployment of heterogeneous tasks, non-determinism of actors, and self-organization in an unpredictable and/or disrupted environment), we propose MAYBE - Multi-Agent Yield-Based Engineering. MAYBE is a solution that makes it possible for the agents to evolve and adapt by instantiation in different contexts. This paper details the various stages of the methodology applied to an emergency case, in parallel with the computerization of the process. It also compares the issues with other current work.

Analyzing the founder effect in simulated evolutionary processes using gene expression programming
Candida Ferreira - Gepsoft
Gene expression programming is a genotype/phenotype system that evolves computer programs encoded in linear chromosomes of fixed length. The interplay between genotype (chromosomes) and phenotype (expression trees) is made possible by the structural and functional organization of the linear chromosomes. This organization allows the unconstrained operation of important genetic operators such as mutation, transposition, and recombination. Although simple, the genotype/phenotype system of gene expression programming can provide some insights into natural evolutionary processes. In this work the question of the initial diversity in evolving populations of computer programs is addressed by analyzing populations undergoing either mutation or recombination. The results presented here show that populations undergoing mutation recover practically undisturbed from evolutionary bottlenecks whereas populations undergoing recombination alone depend considerably on the size of the founder population and are unable to evolve efficiently if subjected to really tight bottlenecks.

Embedding human reasoning in soft computing
Vive Kumar - Simon Fraser University, Surrey, Canada
Most contemporary computing systems situate humans only as the end users. Alternatively, human-in-the-loop computing systems employ humans to buttress the system when faced with hard tasks, which enables human reasoning to be used as part of the computational process. This paper discusses how human reasoning can be embedded as part of a Soft Computing environment and examines its implications. It exemplifies the arguments using a prototype help recommender system and highlights how pedagogical recommendations are made with the integration of human reasoning with system reasoning mechanisms. It also highlights a number of research fronts that result from embedding human reasoning to complement other reasoning mechanisms that are present in Soft Computing.

A Study of K-Nearest Neighbour as an Imputation Method
Gustavo Batista - University of São Paulo - USP
Maria Monard - University of São Paulo - USP
Data quality is a major concern in Machine Learning and other correlated areas such as Knowledge Discovery from Databases (KDD). As most Machine Learning algorithms induce knowledge strictly from data, the quality of the knowledge extracted is largely determined by the quality of the underlying data. One relevant problem in data quality is the presence of missing data. Despite the frequent occurrence of missing data, many Machine Learning algorithms handle missing data in a rather naive way. Missing data treatment should be carefully thought out, otherwise bias might be introduced into the knowledge induced. In this work, we analyse the use of the k-nearest neighbour as an imputation method. Imputation is a term that denotes a procedure that replaces the missing values in a data set by some plausible values. Our analysis indicates that missing data imputation based on the k-nearest neighbour algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data.
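The idea of k-nearest neighbour imputation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: all attributes are taken to be numeric, distance is the root mean squared difference over the attributes both rows have observed, and a missing value is replaced by the mean of that attribute among the k nearest rows that have it. The function name and the sample data are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Return a copy of `rows` with each None replaced by a k-NN estimate."""

    def distance(a: Row, b: Row) -> float:
        # Compare only attributes that are observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which attribute j is observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:  # leave the gap if no comparable donor exists
                completed[i][j] = sum(nearest) / len(nearest)
    return completed


# Hypothetical data: the last row is missing its third attribute.
data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 50.0],
    [1.0, 2.0, None],
]
filled = knn_impute(data, k=2)
```

Distances are computed against the original rows rather than the partially filled copy, so earlier imputations do not influence later ones.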

Nonlinear Principal Component Analysis to Preserve the Order of Principal Components
Ryo Saegusa - Dept. of Applied Physics, Waseda University
Shuji Hashimoto - Dept. of Applied Physics, Waseda University
Principal component analysis (PCA) is an effective method of linear dimensional reduction. Because of its simplicity in theory and implementation, it is often used for analysis in various disciplines. However, because of its linearity, PCA is not always suitable, and has redundancy in expressing data. To overcome this problem, some methods of nonlinear PCA have been proposed. However, most of the methods have drawbacks, such that the number of principal components must be predetermined, and also the order of the generated principal components is not explicitly given. In this paper, we propose a hierarchical neural network model composed of a number of multi-layered perceptrons to perform nonlinear PCA that preserves the order of the principal components. Moreover, our method does not need to know the number of the principal components in advance. The effectiveness of the proposed model will be shown through experiments.

A Focus and Constraint-Based Genetic Algorithm for Interactive Directed Graph Drawing
Hugo do Nascimento - School of Information Technologies, University of Sydney
Peter Eades - School of Information Technologies, University of Sydney
This paper presents a user-driven genetic algorithm for directed graph drawing. An interactive framework is considered where users can focus the algorithm on regions of the drawing that need major improvement, or include domain knowledge as layout constraints. The paper describes how focus and user constraints are managed by the genetic algorithm. The combination of the user's skills with automatic tools allows a more flexible and efficient optimization method, when compared to traditional non-interactive genetic algorithms. Issues regarding memory usage, processing time, solution representation and convergence are discussed here.

Identifying Patterns of Corporate Tax Payment
Maria Cleci Martins - University of Gloucestershire
Iria Garaffa - Universidade Luterana do Brasil
This research investigates the possibility of identifying patterns of corporate tax payment behaviour in the Government database on companies. The aim is to reveal outliers, that is, companies whose tax payment behaviour deviates from the average. Whether or not the outliers are evading taxes will be determined by on-site inspections, which are under development. We are particularly interested in analysing whether there is correlation between the dependent variable (the tax paid in earlier years) and the independent variables, such as total revenue and number of employees. A preliminary data analysis shows that there is strong correlation between those variables, and therefore the data might be appropriate for the task. However, patterns are only unveiled after using a two-step methodology: Clustering and Backpropagation Neural Networks (BNN). Clusters of similar companies are investigated separately using BNN. Analysis of results from the Artificial Neural Networks (ANN) was based on the comparison between the forecast and the true amount of tax paid by companies each year.

Dialogue Act Connectionist Detection in a Spoken Dialogue System
Emilio Sanchis - Dpto. Sistemas Informaticos y Computacion. Universidad Politecnica de Valencia
Maria Jose Castro - Dpto. Sistemas Informaticos y Computacion. Universidad Politecnica de Valencia
We present an approach to dialogue act detection within the framework of a domain-specific dialogue system. The task consists of answering telephone queries about train timetables, prices and services for long distance trains in Spanish. In this system, the meaning of the user utterances is represented by means of "dialogue acts", which determine the type of communication of the user turn, and by their associated "case-frames", which supply the data of the utterance. We focus on classifying a user turn, given the utterance, into a specific class of dialogue act by using multilayer perceptrons. This classification can help in the subsequent processes of understanding and dialogue management. Results of experiments with the correct transcriptions of the user utterances (text data) and with the sequences of words obtained from the recognition process (speech data) are presented.

Factor Analysis with Qualitative Factors as Fuzzy Numbers
Elisabeth Rakus-Andersson - Blekinge Institute of Technology
Leszek Zakrzewski - System Research Institute of Polish Academy of Sciences (PhD studies)
The classical version of Factor Analysis is seldom used in the case of qualitative factors. We now propose a fuzzy interpretation of the method, which gives the possibility to investigate the strength of the factor influence on a tested variable. By assuming that fuzzy numbers in the L-R form represent the variable and the factors, which are the qualitative parameters, we are able to perform all the operations when following the Factor Analysis algorithm. The conception of a new fuzzy space with particularly defined op

Determining The Degree of Generalization Using An Incremental Learning Algorithm
Pablo Zegers - Universidad de los Andes
Malur Sundareshan - University of Arizona
Any Learning Machine (LM) trained with examples poses the same problem: how to determine whether the LM has achieved an acceptable level of generalization or not. This work presents a training method that uses the data set in an incremental manner such that it is possible to determine when the behavior displayed by the LM during the learning stage truthfully represents its future behavior when confronted by unseen data samples. The method uses the set of samples in an efficient way, which allows discarding all those samples not really needed for the training process. The new training procedure, which will be called the "Incremental Training Algorithm", is based on a theoretical result that is proven using recent developments in statistical learning theory. A key aspect of this analysis involves identification of three distinct stages through which the learning process normally proceeds, which in turn can be translated into a systematic procedure for determining the generalization level achieved during training. It must be emphasized that the presented algorithm is general and independent of the architecture of the LM and the specific training algorithm used. Hence it is applicable to a broad class of supervised learning problems and not restricted to the example presented in this work.

Using a Helper FFN to Represent the Cost Function for Training DRNNs by Gradient Descent
Roelof Brouwer - University College of the Cariboo
This research is concerned with a gradient descent training algorithm for a target network that makes use of a helper feed-forward network (FFN) to represent the performance function used in training the target network. A helper FFN (HFFN) is trained because the mathematical form of the performance function for the target network in terms of the trainable parameters, P, is not known, yet data for the relationship can be generated. The transfer function of the HFFN provides a differentiable function for the performance function of the parameter vector, P, for the target network, allowing gradient search methods to be used for finding the optimum P. The method is applied to the training of discrete recurrent networks (DRNNs) that are used as a tool for classification of temporal sequences of characters.

An Empirical Comparison of Kernel Selection for Support Vector Machines
Shawkat Ali - Monash University
Ajith Abraham - Oklahoma State University
The Support Vector Machine (SVM) has gained much attention as an efficient pattern recognition tool, primarily for two-class problems, by finding a decision surface determined by certain points of the learning set, termed Support Vectors (SVs). In this paper, we examine the discrimination ability of SVM on two-class classification problems with different kernel settings. We also compare SVM with three other popular learning algorithms, namely Naive Bayes, C4.5 and neural networks, in terms of accuracy and computational complexity. Our studies reveal that SVM is the best choice for classification and that the polynomial kernel is the best choice among the kernels compared.

Recognizing Malicious Intention in an Intrusion Detection Process
Fabien Autrel - ONERA Toulouse
Alexandre Miège - ENST Paris
Salem Benferhat - IRIT Toulouse
Generally, the intruder must perform several actions, organized in an intrusion scenario, to achieve his or her malicious objective. We argue that intrusion scenarios can be modelled as a planning process and we suggest modelling a malicious objective as an attempt to violate a given security requirement. Our proposal is then to extend the definition of attack correlation presented in [Cup02] to correlate attacks with intrusion objectives. This notion is useful to decide if a sequence of correlated actions can lead to a security requirement violation. This approach provides the security administrator with a global view of what happens in the system. In particular, it controls unobserved actions through hypothesis generation, clusters repeated actions in a single scenario, recognizes intruders that are changing their intrusion objectives and efficiently detects variations of an intrusion scenario. This approach can also be used to eliminate a category of false positives that correspond to false attacks, that is, actions that are not further correlated to an intrusion objective.

3D-CG Avatar Motion Design by means of Interactive Evolutionary Computation
Hitoshi Iba - The University of Tokyo
Nao Tokui - The University of Tokyo
Hiromi Wakaki - The University of Tokyo
The motion of 3D-CG avatars is now used in many games and movies, but it is not easy to generate human motion. With the increasing spread of the Internet, users also want to use various expressions on the web. However, users without specialist skills cannot create human motion. In light of the foregoing, a system with which users can create human motion in a readily available environment is required. This paper describes a new approach to generating human motion more easily and semi-automatically by means of Interactive Evolutionary Computation (IEC). In our system, the profile of the avatar is based on the Humanoid Animation standard to ease its adoption.

An imperfect string matching experience using deformed fuzzy automata
Jose Javier Astrain - Universidad Publica de Navarra
Jose Ramon Garitagoitia - Universidad Publica de Navarra
Jesus Villadangos - Universidad Publica de Navarra
Federico Fariña - Universidad Publica de Navarra
This paper presents a string matching experience using deformed fuzzy automata for the recognition of imperfect strings. We propose an algorithm based on a deformed fuzzy automaton that calculates a similarity value between strings having an unlimited number of edit errors. Different selections of the fuzzy operators for computing the deformed fuzzy automaton transitions yield different string similarity definitions. The parameters determining the behavior of the deformed fuzzy automaton are selected via genetic algorithms.

Panoramic View System for Extracting Key Sentences based on Viewpoints and an Application to a Search Engine
Wataru Sunayama - Osaka University
Masahiko Yachida - Osaka University
Since there are many resources on the WWW, it has become natural for us to extract available information from them. We would like to look at many documents in order to get useful information. Although summaries are useful condensations of documents, a document can be summarized from various viewpoints. Therefore, if the viewpoint of a summary differs from the user's, the user cannot grasp the contents of the document correctly and has to read the document through to the end. In this paper, we present a system which makes a summary based on the user's viewpoint, as given by the user's search keywords.

Reconstruction of conditional distribution field based on empirical data
Chervonenkis Alexey - Royal Holloway University of London (London)
In this paper, the problem of estimating a conditional distribution at a current point on the basis of measured values of a random field at a set of sampling points is considered. An estimator is sought in a form similar to the Parzen estimate of a distribution function, but with coefficients depending on the distance between the current point and the sampling points. A theoretical foundation is given for the optimal choice of these coefficients. Applications of the theory to learning theory and to practical problems are considered.

Hybrid Evolutionary Multi-Objective Optimization Algorithms
Hisao Ishibuchi - Osaka Prefecture University
Tadashi Yoshida - Osaka Prefecture University
This paper examines how the search ability of evolutionary multi-objective optimization (EMO) algorithms can be improved by the hybridization with local search through computational experiments on multi-objective permutation flowshop scheduling problems. The task of EMO algorithms is to find a variety of non-dominated solutions of multi-objective optimization problems. First we describe our multi-objective genetic local search (MOGLS) algorithm, which is the hybridization of a simple EMO algorithm with local search. Next we discuss some implementation issues of local search in our MOGLS algorithm such as the choice of initial (i.e., starting) solutions for local search and a termination condition of local search. Then we implement hybrid EMO algorithms using well-known EMO algorithms: SPEA and NSGA-II. Finally we compare those EMO algorithms with their hybrid versions through computational experiments. Experimental results show that the hybridization with local search can improve the search ability of the EMO algorithms when local search is appropriately implemented in their hybrid versions.

A Permutation Based Genetic Algorithm for RNA Secondary Structure Prediction
Kay C. Wiese - Simon Fraser University Surrey
Edward Glen - Simon Fraser University Surrey
This paper presents a permutation based genetic algorithm (GA) to predict the secondary structure of RNA molecules. More specifically the proposed algorithm predicts which specific canonical base pairs will form hydrogen bonds and build helices, also known as stem loops. Since RNA is involved in both transcription and translation and also has catalytic and structural roles in the cell, knowing its structure is of fundamental importance since it will determine the function of the RNA molecule. We introduce a GA where a permutation is used to encode the secondary structure of RNA molecules. We discuss initial results on RNA sequences of lengths 76 and 785 nucleotides and present several improvements to the algorithm. We show that with a higher selection intensity through the Keep-Best Reproduction operator and 1-elitism the best results (i.e. the structures with the lowest free energy) are achieved.

A Trainable Classifier via k Nearest Neighbors
Abdelouahid Lyhyaoui - Universidad Carlos III de Madrid
Jeronimo Arenas-Garcia - Universidad Carlos III de Madrid
Angel Navia-Vazquez - Universidad Carlos III de Madrid
This paper introduces a new classifier derived from a variant of the k-Nearest Neighbor (kNN) rule. This classification scheme, which we call kNN Learning Vector Classifier (kNN-LVC), has a similar architecture to that of Learning Vector Quantizers (LVQs). In fact, both methods place in the observation space a set of centroids or prototypes with a limited area of influence; however, our approach finds optimal prototypes by optimizing a new discriminant function that considers the k nearest prototypes to a sample. Among the characteristics of kNN-LVC are its localized nature, easy training and interpretation, small storage requirements, and very competitive performance. The proposed technique is benchmarked against other classifiers such as kNN and LVQ. Experiments show good generalization capabilities and efficacy of our approach on datasets with a sufficient number of samples in relation to dimensionality.

Frequent Flyer Points Calculator: More Than Just a Table Lookup
Grace Rumantir - School of Multimedia Systems - Monash University
The frequent flyer program is a popular promotional tool used by most major airlines in the world. The trend of airlines forming alliances with other airlines to expand their service base means that the potential routes that one can take to get from one city to another have increased exponentially. Airline customers, however, are typically provided with a table which only shows the list of direct flights, or flights with a transit point, that can be booked using frequent flyer points. The online frequent flyer calculators available on the internet seem to work from this incomplete table. To optimise the use of available frequent flyer points and to provide satisfactory automated customer service, a better frequent flyer points calculator, which accesses the complete map of the available routes and the distance of each sector of a route, needs to be built. This paper shows the use of standard graph algorithms to build such a frequent flyer points calculator.
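
As an illustration of the kind of standard graph algorithm the abstract refers to, the following is a minimal sketch that finds the shortest total distance between two cities over a map of sectors using Dijkstra's algorithm. The abstract does not say which algorithms the paper uses; the function name `shortest_route`, the city codes, and the distances are hypothetical and chosen only for this example.

```python
import heapq


def shortest_route(sectors, origin, destination):
    """Return (total_distance, route) for the shortest path between two cities.

    `sectors` is a list of (city_a, city_b, distance) tuples, assumed to be
    flyable in both directions. Returns (inf, []) if no route exists.
    """
    # Build an undirected adjacency list from the sector table.
    graph = {}
    for a, b, dist in sectors:
        graph.setdefault(a, []).append((b, dist))
        graph.setdefault(b, []).append((a, dist))

    best = {origin: 0}          # best known distance to each city
    previous = {}               # predecessor on the best known route
    queue = [(0, origin)]       # priority queue ordered by distance

    while queue:
        dist, city = heapq.heappop(queue)
        if city == destination:
            break
        if dist > best.get(city, float("inf")):
            continue            # stale queue entry, a shorter path was found
        for neighbour, length in graph.get(city, []):
            candidate = dist + length
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = city
                heapq.heappush(queue, (candidate, neighbour))

    if destination not in best:
        return float("inf"), []

    # Walk the predecessor links back from the destination.
    route = [destination]
    while route[-1] != origin:
        route.append(previous[route[-1]])
    route.reverse()
    return best[destination], route


# Hypothetical sector table: city codes and distances are made up.
sectors = [("AAA", "BBB", 100), ("BBB", "CCC", 100), ("AAA", "CCC", 500)]
distance, route = shortest_route(sectors, "AAA", "CCC")
```

With this made-up table, the two-sector route through BBB (total distance 200) is preferred over the direct sector of length 500, which is the kind of result a lookup table of direct flights alone would miss.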

Towards landscape analyses to inform the design of hybrid local search for the multiobjective quadratic assignment problem
Joshua Knowles - Free University of Brussels
David Corne - University of Reading
The quadratic assignment problem (QAP) is a very difficult and practically relevant combinatorial optimization problem which has attracted much research effort. Local search (LS) moves can be quickly evaluated on the QAP, and hence favoured methods tend to be hybrids of global optimization schemes and LS. Here we introduce the multiobjective QAP (mQAP), where $m \geq 2$ distinct QAPs must be minimized simultaneously over the same permutation space, and hence we require a set of solutions approximating the Pareto front (PF). We argue that the best way to organise a hybrid LS for the mQAP will depend on details of the multiobjective fitness landscape. By using various techniques and measures to probe the landscapes of mQAPs, we attempt to find evidence for the relative ease with which the following can be done by LS: approach the PF from a random initial solution, or search along or close to the PF itself. On the basis of such explorations, we hope to design an appropriate hybrid LS for this problem. The paper contributes a number of landscape measurement methods that we believe are generally appropriate for multiobjective combinatorial optimization.

Adaptive Support Vector Classifications
Zhenqiu Liu - The University of Tennessee
Ying Xu - Oak Ridge National Laboratory
The support vector machine (SVM) was originally designed for regression and binary classification. It promises to give good generalization and has been applied to various tasks. The basic idea behind SVM is to perform the classification by solving a nonlinear (quadratic) programming problem. In this paper, we concentrate on adaptive support vector classification problems. Since there are many parameters in the kernel functions of SVM, tuning the smooth parameters can certainly improve the performance of classification. The general literature on SVM has not discussed in detail the subject of tuning the various user-defined parameters. In this paper, we explore the trade-off between maximum margin and classification errors and estimate the best kernel parameters. Toy and real-life data are used in the experiments.

An XML-based specification of Fuzzy Logic Controllers
Dario Mastropasqua - Dipartimento di Informatica, Universita' degli Studi di Bari
Nicola Mosca - Dipartimento di Informatica, Universita' degli Studi di Bari
Fabio Zambetta - Dipartimento di Informatica, Universita' degli Studi di Bari
Since their introduction, Fuzzy Systems have proven their usefulness in various fields, including linear and nonlinear control, pattern recognition, financial systems and data analysis. Current fuzzy systems, however, vary in supported capabilities, rule representation and storage. These differences, which are closely interconnected, cause many problems when a whole set of fuzzy rules has to be ported from one system to another. To solve this portability issue, the International Electrotechnical Commission (IEC) formed a specific committee to propose a standard format to represent fuzzy systems, named Fuzzy Control Language (FCL). Subsequently, in another context, another research group proposed a standardized system to represent Evolutionary Computation Systems using an XML-based grammar. Based on this idea, and encouraged by the growing popularity of the Internet, we propose a mapping of the FCL grammar and capabilities onto a standardized XML format, and at the same time present a number of extensions that can further enhance the expressive power of the FCL language.

Mercer Kernel Based Learning For Fault Detection
Bernardete Ribeiro - University of Coimbra
Paulo Carvalho - University of Coimbra
This paper proposes a Mercer kernel based learning approach for classification problems using Minkowski's norm. A comprehensive comparative study of the main characteristics of the support vector algorithm using various values of the $\alpha$ parameter in the norm's definition is presented. Special emphasis is laid on kernel machine accuracy evaluation and model complexity using Gaussian kernels. Experimental results are given concerning a real application dealing with classification of part defects in an injection molding machine for the plastics industry. Also, future research directions are outlined. Keywords: Kernel learning, Support Vector Machines, Fault Detection.
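
For reference, the Minkowski norm of order $\alpha$ of a vector $x \in \mathbb{R}^n$ has the standard form shown first below. The second expression, a Gaussian kernel built on the Minkowski distance with a width parameter $\sigma$, is only an assumed illustration of how the norm might enter the kernel; the abstract does not state the kernel formula.

```latex
\|x\|_{\alpha} = \Bigl( \sum_{i=1}^{n} |x_i|^{\alpha} \Bigr)^{1/\alpha},
\qquad
k(x, y) = \exp\!\Bigl( -\frac{\|x - y\|_{\alpha}^{2}}{2\sigma^{2}} \Bigr)
```

Setting $\alpha = 2$ recovers the Euclidean norm and, in the assumed form, the usual Gaussian kernel.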

2D-Histogram Lookup for Low-contrast Fault Processing
Mario Koeppen - Fraunhofer IPK
Raul Vicente Garcia - Fraunhofer IPK
Xiufen Liu - Fraunhofer IPK
Bertram Nickolay - Fraunhofer IPK
This paper presents a framework for low-contrast texture fault processing based on 2D-Histogram lookup. 2D-Histogram lookup is a variant of the 2D-Lookup operation. For 2D-Lookup, in two differently processed versions of the same image, grayvalue pairs from the same location in both images are replaced by the corresponding entry in a given two-dimensional matrix. The two operations and the matrix have to be provided for full algorithm specification. For 2D-Histogram lookup, the matrix is derived from the 2D-Histogram of both processed images. The main advantage of using the 2D-Histogram is the darkening of rarely occurring structures in the image, while highly probable image structures become bright. The so-processed images are then given to a 2D-Lookup procedure for automatic filter generation. For low-contrast texture faults, i.e. faults which are hard to separate from the background texture, the approach shows better performance in fault region detection than the approach of 2D-Lookup adaptation without 2D-Histogram lookup. For handwriting separation from textured background, no improvement was obtained.
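
The 2D-Lookup and 2D-Histogram lookup operations described above can be sketched as follows. This is a minimal illustration on small integer images stored as nested lists; the function names and the linear scaling of histogram counts to gray values are assumptions made for this example, not details taken from the paper.

```python
def histogram_2d(img_a, img_b, levels):
    """Count how often each grayvalue pair (a, b) occurs at the same location."""
    hist = [[0] * levels for _ in range(levels)]
    for row_a, row_b in zip(img_a, img_b):
        for ga, gb in zip(row_a, row_b):
            hist[ga][gb] += 1
    return hist


def lookup_2d(img_a, img_b, matrix):
    """2D-Lookup: replace each grayvalue pair by the matrix entry it indexes."""
    return [
        [matrix[ga][gb] for ga, gb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def histogram_lookup_2d(img_a, img_b, levels, max_gray=255):
    """2D-Histogram lookup: use the (scaled) 2D histogram as the lookup matrix.

    Assumption: counts are scaled linearly so that the most frequent pair
    maps to max_gray; rare pairs therefore come out dark.
    """
    hist = histogram_2d(img_a, img_b, levels)
    peak = max(max(row) for row in hist) or 1  # avoid division by zero
    matrix = [[(count * max_gray) // peak for count in row] for row in hist]
    return lookup_2d(img_a, img_b, matrix)


# Two tiny "processed versions" of the same image with 2 gray levels.
img_a = [[0, 0], [0, 1]]
img_b = [[0, 0], [0, 1]]
result = histogram_lookup_2d(img_a, img_b, levels=2)
```

In this toy input the pair (0, 0) occurs three times and (1, 1) once, so the frequent structure is mapped to the brightest value and the rare one to a darker value, matching the behaviour described in the abstract.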

A Trial Method to Create a Natural Interaction in Interactive Genetic Algorithm
Futoshi Sugimoto - Toyo University
Masahide Yoneyama - Toyo University
We have been developing a hybrid fitness assignment strategy to realize a natural interaction in IGA. The strategy allows a user to select some individuals and evaluate a grade that shows how the selected individual resembles a target image. In this paper, we will show a method to compose fitness when a user selects two individuals in the hybrid fitness assignment strategy. It is known that better performance is obtained when two individuals are selected in the generations limited with a condition. The condition is equivalent to the actual situation in which it is difficult for a user to select only one individual. The hybrid strategy is useful to realize a more natural interaction in the actual situation.

Balancing Volume, Quality and Freshness in Web Crawling
Ricardo Baeza-Yates - Center for Web Research, Dept. of CS, Univ. of Chile
Carlos Castillo - Center for Web Research, Dept. of CS, Univ. of Chile
We describe crawling software designed for high-performance, large-scale information discovery and gathering on the Web. This crawler allows the administrator to seek a balance between the volume of a Web collection and its freshness, and also provides flexibility for defining a quality metric to prioritize certain pages.

A Neural Network Model of Rule-guided Behavior
Tetsuto Minami - Graduate School of Informatics, Kyoto University
Toshio Inui - Graduate School of Informatics, Kyoto University
The flexibility of our behavior is mainly due to our ability to abstract rules from one situation and apply them to other situations. To examine the system for such rule-guided behavior, we proposed a neural network model of rule-guided behavior and simulated the physiological experiments of a rule-guided delayed matching-to-sample task (Wallis et al., 2001). Our model was constructed through neural system identification (Zipser, 1993), and a fully recurrent neural network model was optimized to perform a rule-guided delayed task. In the model's hidden layer, rule-selective units as in Wallis et al. (2001) were found, and an examination of connection weights substantiated that rule-selective neurons maintain encoded rule information and indirectly contribute to rule-guided responses. The simulation results predict functional interactions among neurons exhibiting various task-related activities.

Active Learning Using One-class Classification
Ibrahim Gokcen - Tulane University
Jing Peng - Tulane University
Bill Buckles - Tulane University
Active learning aims at minimizing the number of labeled examples and at the same time reaching the optimum as fast as possible. In this paper, we propose a new parameterization for active learning, which is described based on the idea of one-class classification. We demonstrate the use of this parameterization by proposing a simple heuristic and a volume reduction metric for active learning. Empirical results on a variety of data sets show that our metrics outperform or are comparable with other proposed active learning metrics.

A Large Benchmark Dataset for Web Document Clustering
Mark Sinka - University of Reading
David Corne - University of Reading
Targeting useful and relevant information on the WWW is a topical and highly complicated research area. A thriving research effort that feeds into this area is document clustering, which overlaps closely with areas usually known as text classification and text categorisation. A foundational aspect of such research (which has been proven over and over again in other research disciplines) is the use of standard datasets, against which different techniques can be properly benchmarked and assessed in comparison to each other. We note herein that, so far in this broad area of research, as many datasets have been used as research papers written, thus making it difficult to reason about the relative performance of different categorisation/clustering techniques used in different papers. In this paper we propose a standard dataset with a variety of properties suitable for a wide range of clustering and related experiments. We describe how the dataset was generated, provide a pointer to it, and encourage its access and use. We also illustrate the use of part of the dataset by establishing benchmark results for simple k-means clustering, comparing the relative performance of k-means on a pair of close categories and a pair of distant categories. We naturally find that performance is better on the pair of distant categories; however, the experiments reveal that although stop-word removal is confirmed as helpful, word-stemming is (perhaps counter to intuition) not necessarily always recommended on distant categories.

Enhancing Real-World Applicability by Providing Confidence-in-Prediction in the XCS Classifier System
Phillip Dixon - University of Reading
David Corne - University of Reading
Martin Oates - Evosolve Ltd
Classifier systems are machine learning ruleset-discovery systems; the XCS classifier system has been found in recent years to compete well with rival machine learning systems on difficult benchmarks, and is now being intensively researched for real world applications. A problem common to all such systems is their accuracy on unseen data, and hence their real-world performance. This is not surprising, and standard methods such as cross-validation and early-stopping are commonly used in training to assess likely performance on unseen data. However, an additional and related issue is the confidence we can have in a prediction as a function of our confidence in the inputs. Predictions which lie on the boundary between two differing outcomes (e.g. the system may say 'malignant' in response to an input pattern, but a slight difference in that pattern might cause it to respond 'benign') must somehow be identified and questioned on their validity. We describe a technique which takes a ruleset learned by XCS (or another system), and provides highly useful confidence information when predictions are made with that ruleset. Further, we describe a measure which enables confidence-in-prediction behaviour to be assessed for different rulesets. Using this measure, called zero tolerance performance, we find that small, succinct and general rulesets produce better confidence-in-prediction performance. This militates against direct use of classifier system learned rulesets, since these tend to be very large, but it validates recent research which is successfully applying post-processing techniques which quickly reduce such a ruleset to a much smaller but equally accurate one. Development and testing is done on the standard Wisconsin Breast Cancer database.

Comparison of Fuzzy Rule Selection Criteria for Classification Problems
Hisao Ishibuchi - Department of Industrial Engineering
Takashi Yamamoto - Department of Industrial Engineering
This paper compares heuristic rule selection criteria in fuzzy rule extraction for classification problems. Using several heuristic criteria, we examine the performance of extracted fuzzy rules through computer simulations on four data sets (glass, Wisconsin breast cancer, wine, and sonar). Simulation results show that better results are obtained from composite criteria of the confidence and support measures than the individual use of those measures. It is also suggested that genetic algorithm-based rule selection can improve the classification ability of extracted fuzzy rules by searching for good rule combinations. This result shows the importance of taking into account the combinatorial effect (i.e., interaction) of extracted fuzzy rules when we design fuzzy rule-based systems.

A self-growing probabilistic decision-based neural network with applications to anchor/speaker identification
S.S. Cheng - Dept. of Computer Engineering, National Chiao-Tung University
Y.H. Chen - Dept. of Computer Engineering, National Chiao-Tung University
C.L. Tseng - Dept. of Computer Engineering, National Chiao-Tung University
H.T. Pao - Department of Management Science, National Chiao-Tung University
In this paper, we propose a new learning algorithm for a mixture Gaussian based neural network, called the Self-growing Probabilistic Decision-based Neural Network (SPDNN), for better density function estimation and pattern classification. We also developed a new Self-growing Mixture Gaussian Learning (SMGL) algorithm that is able to find the natural number of components based on a self-growing validity measure, the Bayesian Information Criterion (BIC). It starts with a single component randomly initialized in the feature space and grows adaptively during the learning process until the most appropriate number of components is found. In our experiments on anchor/speaker identification, we have observed noticeable improvement over various model-based or vector quantization-based classification schemes. Key Words: Self-growing Probabilistic Decision-based Neural Networks (SPDNN), Supervised learning, Competitive learning, Unsupervised learning, Validity measure.

Document Oriented Modeling of Cellular Automata
Christian Veenhuis - Fraunhofer Institute for Production Systems and Design Technology
Mario Köppen - Fraunhofer Institute for Production Systems and Design Technology
This paper proposes a document-oriented modeling concept for cellular automata (CA), which supports the simple and rapid design of a huge variety of cellular automata. This modeling concept is realized as a domain-specific modeling language derived from XML (eXtensible Markup Language). XML is in general considered as the future for internet documents and data exchange. The main concept behind XML is to separate the content of a document from its layout (its appearance). The presented modeling concept uses a document for describing a whole cellular automaton. Just as the content of a document is separated from its layout, the abstract cellular automaton is separated from a concrete implementation and programming language. Everyone can create and use XSL(T) stylesheets for translating cellular-automaton documents into ready-to-use source code (covering the adequate cellular-automaton functionality) as well as for documentation and exchange of the realised CA.

Scene-Based Nonuniformity Correction Method Using the Inverse Covariance Form of the Kalman Filter
Sergio Torres - Universidad de Concepcion
Jorge Pezoa - Universidad de Concepcion
A scene-based algorithm for nonuniformity correction (NUC) in focal-plane array (FPA) detectors has been developed. The NUC technique is based on the inverse covariance form (ICF) of the Kalman filter. The gain and the offset of each detector of the FPA are modeled by discrete-time Gauss-Markov processes. These parameters are taken as constant within a given sequence of frames, corresponding to a certain time and operational conditions, but they randomly drift from one sequence to another in response to new operational conditions. For each detector and each sequence of frames, the ICF filter input is an observation vector consisting of the detector's read-out values. The output of the ICF filter for any sequence of infrared frames is the detectors' gain and offset. The efficacy of the ICF of the Kalman filter in compensating for nonuniformity noise in infrared imagery is demonstrated using sequences of infrared imagery with both artificial nonuniformity and artificial drift in the detectors' parameters. It is shown that the ICF filter and the Kalman filter generate similar reductions of nonuniformity. However, the ICF filter compensates the noisy images with fewer operations per pixel and per frame than the Kalman filter.

Adaptive Bias Compensation for Non-Uniformity Correction on Infrared Focal Plane Array Detectors
Esteban Vera - Universidad de Concepcion
Rodrigo Reeves - Universidad de Concepcion
Sergio Torres - Universidad de Concepcion
The non-uniform response in infrared focal plane array (IRFPA) detectors produces corrupted images with a fixed-pattern noise. In this paper we present a new adaptive scene-based non-uniformity correction (NUC) technique. The method simultaneously estimates the detectors' parameters and performs the non-uniformity compensation using a neural approach and a Kalman estimator on a frame-by-frame recursive basis. Each detector's output is connected to its own inverse model: a single 1-input linear neuron. The neuron bias is directly related to the detector's offset, and has the property of being softly adapted using simple learning rules, choosing a suitable error measure to fit the NUC objective. The proposed method has been tested with sequences of real infrared data taken with an InSb IRFPA, reaching high correction levels, reducing the fixed-pattern noise, and obtaining an effective frame-by-frame adaptive estimation of each detector's offset.

Combining Classifiers with Multimethod Approach
Mitja Lenic - University of Maribor
Peter Kokol - University of Maribor
The automatic induction of classifiers from examples is an important technique used in data mining. One of the problems encountered is how to induce a good classifier without overfitting. Although there is a lot of research going on in this field, the research is mainly focused on a specific machine learning method or on a specific combination of those methods. In this paper, a multimethod approach to combining classifiers is presented that combines the advantages of single methods and at the same time avoids their disadvantages. It applies different methods, each of which may contain inherent limitations, to the same knowledge model, with the expectation that the combined multiple methods may produce better results.

SADISCO: A Scalable Agent Discovery and Composition Mechanism
James Nolan - George Mason University
Arun Sood - George Mason University
Robert Simon - George Mason University
Peer-to-peer systems have recently gained popularity as a way to share files amongst distributed users. Such an approach can be applied to the discovery of distributed software agents. In this paper, we introduce a scalable agent discovery mechanism that utilizes a semantic layer on top of traditional middleware, and forms a hierarchy representing the types of agents on the network. The approach is used to support the composition of meta-agents, or "an agent of agents", to build distributed applications. Our results show that the SADISCO approach scales well and allows users to discover agents with little or no a priori information.

Selection of Models for Time Series Prediction via Meta-Learning
Ricardo Prudêncio - Centro de Informática, Universidade Federal de Pernambuco
Teresa Ludermir - Centro de Informática, Universidade Federal de Pernambuco
In this work, we propose the use of meta-learning techniques in the task of selecting models for time series prediction. In our approach, a machine learning algorithm generates symbolic knowledge used for choosing a better model to predict a time series, according to the features of the series. In the implemented prototype, a decision tree is used for selecting between the Simple Exponential Smoothing model and the Time-Delay Neural Network, for predicting stationary time series. Our experiments revealed encouraging results.
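
For readers unfamiliar with the first of the two candidate models named above, Simple Exponential Smoothing updates a single level as s_t = alpha * x_t + (1 - alpha) * s_(t-1) and uses the last level as the forecast. The sketch below is only an illustration of that standard formula. The smoothing constant `alpha`, the initialisation with the first observation and the function name are assumptions, and the decision tree that chooses between models is not reproduced.

```python
def ses_forecast(series, alpha=0.5):
    """One-step-ahead forecast by Simple Exponential Smoothing.

    Level update: s_t = alpha * x_t + (1 - alpha) * s_{t-1},
    initialised with the first observation (a common convention,
    assumed here rather than taken from the paper).
    """
    if not series:
        raise ValueError("series must be non-empty")
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```

For example, `ses_forecast([0, 10, 10], alpha=0.5)` smooths 0 to 5.0 and then to 7.5.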

Bi-Directional Flow of Information in the Softboard Architecture
Silvio Macedo - Imperial College London
Ebrahim Mamdani - Imperial College London
This paper addresses the issue of integrating top-down and bottom-up processing of information in modular hybrid systems. Backed by computational and neurological evidence, we show that there are important advantages to be gained from supporting bi-directional flow of information, especially in the context of intelligent information processing in real-world applications. We consider modular hybrid systems from a granular computing perspective and propose an approach based on evidential reasoning to integrate both top-down and bottom-up processes. The implementation of the model in the Softboard framework, an experimental distributed hybrid architecture, is presented and its application to intelligent filtering and retrieval of multimedia is illustrated.

Combining Genetic Algorithms and Neural Networks to Build a Signal Pattern Classifier
Carla Purdy - ECECS Dept., University of Cincinnati
In this paper we show how genetic algorithms and neural networks are combined to build a high performance Signal Pattern Classifier (GNSPC). Signal patterns are intrinsic to many sensor-based systems. The goal of GNSPC is to differentiate among large numbers of signal pattern classes with low classification cost and high classification performance. Classification performance is measured by the correct classification of noisy signal patterns despite using pure signal patterns for building the classifier. GNSPC is basically a decision tree classifier with similarity classification rules. The rules are used to test the similarity of signal patterns. A combination of a genetic algorithm and a neural network is used to find the best rules for the decision tree. This combination provides powerful classification capabilities with great tuning flexibility for either performance or cost-efficiency. Learning techniques are employed to set the genetic algorithm global parameters and to obtain training data for the neural network.

Design of Strong Causal Fitness Functions
Sandra Hirche - Technische Universität Berlin
Ivan Santibanez-Koref - Technische Universität Berlin
Ivo Boblan - Technische Universität Berlin
A new kind of fitness function for controller optimization is presented. These new fitness functions are postulated to be strongly causal, so that better behaviour can be achieved during the optimization process.

Voice Codification Using Self Organizing Maps as Data Mining Tool
Juan Velásquez - RCAST, The University of Tokyo
Hiroshi Yasuda - RCAST, The University of Tokyo
Terumasa Aoki - RCAST, The University of Tokyo
Richard Weber - DII, The University of Chile
Voice transmission plays a crucial role in many applications, such as telecommunications. One way to increase the efficiency of voice transmission is to use a codification that compresses the signal to be transmitted. Such compression assumes a data set of basic forms whose combination produces the voice signal. Generally this data set is organized as an array of data known as a codebook. The codebook is constructed by a vector quantization process, which consists of finding the most representative vectors within a set. A data structure is then created that stores these vectors, also known as centers. Given a codebook with the most representative basic forms, the problem becomes one of taking a piece of voice, looking up its position in the codebook and transmitting that position. Since the receiver has the same data structure, the voice can be synthesized. The difficulty lies in the codebook search, which can be expensive in terms of computation and other resources and must run in real time, a characteristic that is fundamental in some services, for example in telecommunications. In this work we present a new algorithm to construct and traverse codebooks by applying a data mining tool, self-organizing maps, to a database of human voices. This algorithm produces a codebook structure with a proximity relation between its elements, reducing the problem to a local search, which decreases compression time and reduces the rate of transmitted bits.
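
The codebook lookup described above can be sketched as follows. This is a minimal illustration of vector quantization with a full search and a neighbourhood-limited search, not the authors' algorithm. The one-dimensional ordering of the codebook, the `radius` parameter, the idea of starting the local search at the previously transmitted index, and all names are assumptions.

```python
def nearest_codeword(codebook, vector):
    """Full search: index of the codeword closest to `vector` (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, vector))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


def local_search(codebook, vector, start, radius=1):
    """Search only the indices within `radius` of `start`.

    Assumes neighbouring indices hold similar codewords, as in a
    one-dimensional self-organizing map; the function itself does not
    enforce that ordering.
    """
    lo = max(0, start - radius)
    hi = min(len(codebook), start + radius + 1)
    best = nearest_codeword(codebook[lo:hi], vector)
    return lo + best


# Toy codebook of 2-D codewords ordered along one dimension.
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
index = nearest_codeword(codebook, (1.9, 2.2))  # the sender transmits only this index
decoded = codebook[index]                       # the receiver looks it up
next_index = local_search(codebook, (2.8, 3.1), start=index)
```

The local search examines at most `2 * radius + 1` codewords instead of the whole codebook, which is where a saving in search time would come from if consecutive voice segments map to nearby codewords.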

Feature Extraction by Distance Neural Network in Classification Tasks
Cristian Maturana - Consorcio S.A.
Richard Weber - Department of Industrial Engineering, University of Chile
Feature extraction is an important task for data mining applications, in particular for classification. On the one hand it leads to a better understanding of the relations between features and classification results. On the other hand it helps to perform fast classifications with a reduced number of features. Principal component analysis (PCA) is one of the most widely used techniques for linear feature extraction. Recently, neural networks have been proposed for non-linear feature extraction, for example non-linear principal component analysis (NLPCA), outperforming PCA in many cases. These approaches have in common that they reduce the error made when reproducing the initial feature space from the reduced space. We present a new approach which instead tries to preserve the distribution of patterns from the original space in the reduced space. This model, called dNN (Distance Neural Network), provides very good results in several cases, outperforming PCA, and proves to be competitive with the best non-linear techniques for feature extraction.

Revealing Feature Interactions in Classification Tasks
Derek Partridge - Department of Computer Science, University of Exeter
Shuang Cang - Department of Computer Science, University of Exeter
This paper presents a contribution to the theory of optimal feature-subset selection associated with pattern recognition or classification tasks. It extends the theory of Mutual Information (MI) to deal with the difficulties introduced by feature interaction. The essential contribution is to permit MI calculations between sets of features and the target class such that all interactions between the features in the chosen set are taken into account in the MI value produced. In order to accomplish this extension from traditionally pairwise (i.e., feature-feature or feature-class) MI computation we have developed algorithms to transform any continuous-valued feature into a discrete-valued one, and to transform any set of discrete-valued features into a 'composite' feature suitable for the necessary MI calculations. We have built these algorithms into classical forwards and backwards sequential search procedures, and these provide an initial survey of the interactions present among features within a given set of feature vectors. If no significant feature interaction is present then the features have effectively been ranked, and the optimal subset selection problem has been solved. When feature interactions are present, the initial survey will indicate where and what interactions are present and will suggest, if necessary, further probes with the extended MI algorithm to reveal their full nature. We demonstrate the effectiveness of the extended MI algorithm on a number of examples that have been presented as problematic in the literature, especially in the feature-selection literature that has employed MI theory.
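
To illustrate why MI between a set of features and the class can reveal interactions that pairwise MI misses, the sketch below computes empirical MI for an XOR problem. Forming the composite feature by tupling the discrete values is an assumption about what a composite feature could look like, not the authors' transformation, and the discretisation of continuous features is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """MI in bits between two discrete sequences, from empirical frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def composite(*features):
    """Combine discrete features into one feature whose values are tuples."""
    return list(zip(*features))


# XOR: neither feature alone carries information about the class,
# but the pair of features determines it completely.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
cls = [0, 1, 1, 0]
mi_f1 = mutual_information(f1, cls)
mi_f2 = mutual_information(f2, cls)
mi_both = mutual_information(composite(f1, f2), cls)
```

Here each single feature has zero MI with the class while the composite feature has one bit, so a ranking based only on feature-class MI would discard both features.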

HyCAR - A Robust Hybrid Control Architecture for Autonomous Robots
Farlei Heinen - UNISINOS - Mestrado em Computacao Aplicada (PIPCA)
Fernando Osorio - UNISINOS - Mestrado em Computacao Aplicada (PIPCA)
This work presents a new hybrid architecture applied to autonomous mobile robot control - HyCAR (Hybrid Control for Autonomous Robots). This architecture provides robust control for robots as they become able to operate and adapt themselves to different environments and conditions. We designed this new hybrid control architecture, integrating the two main techniques used in robotic control (deliberative and reactive control) and the most important environment representation techniques (grids, geometric and topological maps), through a three-layer architecture approach (vital, functional and deliberative layers). To guarantee the robustness of our control system, we also integrated a localization module based on the Monte Carlo localization method. This localization module plays an important role in our control system, and supplies a solid base for the control and navigation of autonomous mobile robots. In order to validate our control architecture, a realistic simulator of mobile robots was implemented (SimRob3D), allowing the practical use of the proposed system. We implemented several three-dimensional environment models, as well as diverse sensor and kinematic models found in actual robots. Our simulation results demonstrated that the control system is able to determine the mobile robot position in a partially known environment, considering local or global localization, and also to determine if the robot needs to re-localize itself given an incorrect localization. In navigation tasks the robot was able to plan and follow self-generated trajectories in a dynamic environment, which can include several unexpected static and mobile obstacles. We also demonstrated that with the integration of topological and grid information we improved planning algorithm execution.

The Evolutionary Learning Rule for System Identification in Adaptive Finite Impulse Filters
Oscar Montiel - CITEDI-IPN
Oscar Castillo - Tijuana Institute of Technology
Roberto Sepulveda - CITEDI-IPN
Patricia Melin - Tijuana Institute of Technology
In this paper, we propose an approach for integrating evolutionary computation, applied to the problem of system identification, into the well-known theory of statistical signal processing. Mathematical expressions are developed to justify the learning rule in the adaptive process when a Breeder Genetic Algorithm is used as the optimization technique. We include an analysis of errors, energy measures, and stability.

Learnable Topic-specific Web Crawler
Niran Angkawattanawit - Massive Information and Knowledge Engineering Research Group, Department of Computer Engineering, Faculty of Engineering, KASETSART University, THAILAND
Arnon Rungsawang - Massive Information and Knowledge Engineering Research Group, Department of Computer Engineering, Faculty of Engineering, KASETSART University, THAILAND
A topic-specific web crawler collects web pages relevant to topics of interest from the Internet. Much previous research has focused on web page crawling algorithms. The main purpose of those algorithms is to gather as many relevant web pages as possible, and most of them describe only the first crawl. However, some important questions have not been addressed, such as how the crawler should behave during subsequent crawls, whether the crawling process can be done incrementally, and how to track changes to web pages. In this paper, we present an algorithm that covers in detail both the first and the subsequent crawls. To make the next crawl efficient, we keep the log of the previous crawl to build several knowledge bases: starting URLs, topic keywords and URL prediction. These knowledge bases form the experience of the topic-specific web crawler, which it uses to produce better results in the next crawl.

Simulating an Information Ecosystem within the WWW
Reginald L. Walker - University of California at Los Angeles
The design focus of the Tocorime Apicu integrated search engine builds upon new approaches and techniques associated with evolutionary computation to improve the precision and recall mechanisms of existing information retrieval systems within popular search engines. The interactions of the four major components of engines are facilitated through the use of a hierarchical communication topology which partitions the nodes of a distributed computing system into subclusters. The hierarchical communication topology is based on an information ecosystem modeled upon and incorporating the social structure of honeybees, which provides mechanisms for the efficient sharing of information.

Accurate Human Face Extraction using Genetic Algorithm and Subspace Method
Makoto Murakami - Toyo University
Masahide Yoneyama - Toyo University
Katsuhiko Shirai - Waseda University
The subspace method, which expresses facial images efficiently by linear transformation into a lower-dimensional subspace, has wide application in face recognition, e.g. identification and facial pose detection. As a preprocessing step, this method requires accurate extraction of the human face area, but extraction is affected by lighting conditions, varied backgrounds, individual variation and so on, so the method has not yet been put into practical use. In this paper we examine the subspace method by comparing search spaces, apply a genetic algorithm to face extraction, and show that effective results were obtained.

Noise and Elitism in Evolutionary Computation
Tuvik Beker - The Hebrew University of Jerusalem
Lilach Hadany - Stanford University
Evolutionary Computation applications often involve a high degree of noise in the evaluation of solution performance. In the Genetic Algorithms literature, this has usually been considered a limitation, and attempts to reduce this noise seem to be the common practice. After introducing a proper measure for the performance of evolutionary techniques, we show that for complex fitness landscapes, noise reduction is not necessarily helpful, and sometimes is in fact harmful. Based on insights gained from biology, we suggest a simple form of Elitist selection, termed Group Elitism, which has a strong and robust positive effect on the performance of noisy algorithms. We discuss the notion of Elitism as an extreme form of Fitness-Associated Variation, and show that Group Elitism efficiently deals with the harmful effects of evaluation noise without compromising its benefits in evading local adaptive maxima.

Latent Semantic Indexing Based on Factor Analysis
Noriaki Kawamae - NTT Information Sharing Platform Laboratories
The main purpose of this paper is to propose a novel statistical approach to latent semantic indexing that simultaneously maps documents and terms into a latent semantic space. This approach can index documents more effectively than the vector space model (VSM). Latent semantic indexing (LSI), which is based on singular value decomposition (SVD), and probabilistic latent semantic indexing (PLSI) have already been proposed to overcome problems in document indexing, but critical problems remain. In contrast to LSI and PLSI, our proposed method uses a more meaningful, robust statistical model based on factor analysis and information theory. As a result, this model can solve the remaining critical problems in LSI and PLSI. Experimental results with a test collection showed that our method is superior to LSI and PLSI from the viewpoints of information retrieval and classification. We also propose a new term weighting method based on entropy.

A spatial dimension for searching the world-wide web
Andrea Rodriguez - University of Concepcion, CHILE
Few studies have explored spatial relations as they constrain the searching of documents in the World Wide Web (WWW). This paper presents the theoretical basis for spatial searching of Web documents. It reviews spatial reasoning concepts associated with spatial relations and describes a model for organizing and deriving spatial relations based on a hierarchical structure of the space, that is, a conceptual model of the space in terms of connected regions. Using a case study, this work presents guidelines for how this model can be used to extend current techniques for searching Web documents to answer queries that are constrained by spatial relations.

Building Yearbooks with RDF
Ernesto Krsulovic - University of Chile
Claudio Gutierrez - University of Chile
We present a simple application of semantic integration using the RDF model of metadata, namely, the construction and maintenance of a yearbook. It can be built and used by organizations which already have their information on the Web and need to keep yearbooks to support advanced search facilities. Unlike traditional approaches, ours ensures wide interoperability, extensibility and historical recording by using RDF and a decentralized approach.

A non-deterministic versus deterministic algorithm for searching spatial configurations
Mary-Carmen Jarur - University of Concepcion
Andrea Rodriguez - University of Concepcion
A deterministic approach to searching spatial configurations can be computationally demanding, since it requires permuting combinations of possible objects stored in a database in order to satisfy particular spatial constraints. In addition, while deterministic approaches may find the best solution, they can also miss possible solutions due to local optimum effects. In this paper, we present an approach to searching spatial configurations that exploits the characteristics of genetic algorithms to find solutions within a time frame. The novelty of our approach lies in the combination of genetic algorithms with a heuristic operator and an indexing schema for handling binary spatial constraints. Experimental results compare a genetic algorithm with a deterministic one and show when a genetic algorithm is advantageous, depending on the type and complexity of the user query.

Parallel Text Query Processing using Composite Inverted Lists
Mauricio Marin - University of Magallanes
The inverted lists strategy is frequently used as an index data structure for very large textual databases. Its implementation and comparative performance have been studied in sequential and parallel applications. In the latter, where studies are relatively few, there has been a sort of "which-is-better" discussion about two alternative parallel realizations of the basic data structure and algorithms. We suggest that a mix of the two is actually a better alternative. Depending on the workload generated by the users, the composite inverted lists algorithm we propose in this paper can operate either as a local or global inverted list, or both at the same time.
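
The two parallel realizations mentioned above are commonly understood as partitioning by document (local lists) and partitioning by term (global lists). The sketch below shows that distinction on toy data. The round-robin assignment, whitespace tokenisation and all names are assumptions made for the example, and the composite algorithm itself is not reproduced.

```python
from collections import defaultdict


def build_inverted_lists(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def local_partition(docs, n_procs):
    """Local lists: split the documents, then build one index per processor."""
    parts = [dict() for _ in range(n_procs)]
    for doc_id, text in docs.items():
        parts[doc_id % n_procs][doc_id] = text
    return [build_inverted_lists(p) for p in parts]


def global_partition(docs, n_procs):
    """Global lists: build one index, then split its terms across processors."""
    full = build_inverted_lists(docs)
    parts = [dict() for _ in range(n_procs)]
    for i, term in enumerate(sorted(full)):
        parts[i % n_procs][term] = full[term]
    return parts


docs = {0: "parallel text query", 1: "text index", 2: "parallel index", 3: "query text"}
local = local_partition(docs, 2)
global_ = global_partition(docs, 2)
```

With local lists a query term such as "text" must be sent to every processor and the partial lists merged; with global lists the whole list for "text" lives on one processor, which concentrates the work for popular terms.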

Performance Based Feature Identification for Intrusion Detection Using Support Vector Machines
Srinivas Mukkamala - New Mexico Institute of Mining and Technology
Andrew Sung - New Mexico Institute of Mining and Technology
Intrusion detection is a critical component of secure information systems. This paper addresses the issue of identifying important input features in building an intrusion detection system (IDS). Since elimination of the insignificant and/or useless inputs leads to a simplification of the problem, faster and more accurate detection may result. Feature ranking and selection, therefore, is an important issue in intrusion detection. Since support vector machines (SVMs) tend to scale better and run faster than neural networks with higher accuracy for intrusion detection, we apply the technique of deleting one feature at a time to perform experiments on SVMs to rank the importance of input features for the DARPA collected intrusion data. Important features for each of the 5 classes of intrusion patterns in the DARPA data are identified. It is shown that SVM-based IDS using a reduced number of features can deliver enhanced or comparable performance. An IDS for class-specific detection based on five SVMs is proposed.
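
The delete-one-feature-at-a-time ranking described above can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM, and training-set accuracy stands in for whatever performance measures the paper uses. Both substitutions, the toy data and all names are assumptions.

```python
def centroid_accuracy(rows, labels, keep):
    """Training-set accuracy of a nearest-centroid classifier on features `keep`.

    A stand-in for the SVM used in the paper, chosen only to keep the
    sketch free of external dependencies.
    """
    sums, counts = {}, {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(keep))
        for j, f in enumerate(keep):
            acc[j] += row[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(row):
        return min(
            centroids,
            key=lambda y: sum((row[f] - c) ** 2 for f, c in zip(keep, centroids[y])),
        )

    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def rank_by_deletion(rows, labels):
    """Rank features by the accuracy drop caused by deleting each one in turn."""
    all_features = list(range(len(rows[0])))
    baseline = centroid_accuracy(rows, labels, all_features)
    drops = {}
    for f in all_features:
        kept = [g for g in all_features if g != f]
        drops[f] = baseline - centroid_accuracy(rows, labels, kept)
    # Largest drop first: deleting an important feature hurts the most.
    return sorted(all_features, key=lambda f: drops[f], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 is noise.
rows = [(0.0, 0.3), (0.1, 0.9), (0.2, 0.1), (1.0, 0.8), (0.9, 0.2), (1.1, 0.4)]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
ranking = rank_by_deletion(rows, labels)
```

On the toy data, deleting feature 0 halves the accuracy while deleting feature 1 changes nothing, so feature 0 is ranked first.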

Linguistic Hedges: a Quantifier Based Approach
Martine De Cock - Dept. of Applied Mathematics and Computer Science, Ghent University

We present an entirely new approach for the representation of intensifying and weakening linguistic hedges in fuzzy set theory, which is primarily based on a crisp ordering relation associated with the term that is modified, as well as on a fuzzy quantifier. With this technique we can generate membership functions for both atomic and modified linguistic terms. We prove that our model respects semantic entailment and we show that it surpasses traditional approaches, such as powering and shifting modifiers, on the intuitive level and on the level of applicability.

Ensembles in Practice: Prediction Estimation, Multi-Feature and Noisy Data
Stefan Zemke - DSV, Stockholm University/KTH

This paper addresses four practical ensemble applications: time series prediction, estimating accuracy, and dealing with multiple features and with noisy data. The intent is to refer a practitioner to ensemble solutions exploiting the specificity of the application area.

Self-Organized Data and Image Retrieval as a Consequence of Inter-Dynamic Synergistic Relationships in Artificial Ant Colonies
Vitorino Ramos - CVRM – GeoSystems Centre, Technical Univ. of Lisbon (IST)
Fernando Muge - CVRM – GeoSystems Centre, Technical Univ. of Lisbon (IST)
Pedro Pina - CVRM – GeoSystems Centre, Technical Univ. of Lisbon (IST)

Social insects provide us with a powerful metaphor to create decentralized systems of simple interacting, and often mobile, agents. The emergent collective intelligence of social insects, known as swarm intelligence, resides not in complex individual abilities but rather in networks of interactions that exist among individuals and between individuals and their environment. The study of ant colony behavior and of ants' self-organizing capabilities is of interest to knowledge retrieval/management and decision support systems sciences, because it provides models of distributed adaptive organization which are useful to solve difficult optimization, classification, and distributed control problems, among others. In the present work we overview some models derived from the observation of real ants, emphasizing the role played by stigmergy as a distributed communication paradigm, and we present a novel strategy (ACLUSTER) to tackle unsupervised exploratory data analysis as well as data retrieval problems. Moreover, to our knowledge, this is also the first application of ant systems to digital image retrieval problems. Nevertheless, the present algorithm could be applied to any type of numeric data.