Schreck, T., Schneidewind, J., Keim, D.
Modern data analysis applications generate, store, and process massive amounts of data. This data is not limited to raw textual or numeric records; typical applications also have to deal with complex data such as biometric or multimedia data. Most intelligent data analysis methods require an appropriate data representation to calculate similarity scores between data instances. Feature vectors are a generic way of describing complex data by vectors of characteristic numeric features, and they support important applications such as clustering, classification, and similarity search. Calculating appropriate feature vectors for a given data type is a challenging task. Determining good feature vector extractors usually involves experimentation and the use of supervised information. However, such experimentation is usually expensive, and supervised information is often data-dependent. We address the important feature selection problem with a novel approach based on the analysis of certain feature space images. We develop two image-based analysis techniques for automatically analyzing the discrimination power of feature spaces. We evaluate the techniques on a comprehensive feature selection benchmark, demonstrating the effectiveness of our analysis and its potential toward automatically addressing the feature selection problem.
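To make the notion of feature vectors and similarity scores concrete, the following minimal sketch maps variable-length data objects to fixed-length vectors and runs a nearest-neighbor similarity search over them. It is an illustration only and not one of the image-based techniques developed in this work: the histogram extractor, the Euclidean distance, the toy data, and the names `histogram_features`, `euclidean`, and `nearest` are all assumptions chosen for brevity.

```python
import math

def histogram_features(values, bins=4, lo=0.0, hi=1.0):
    """Hypothetical extractor: map a non-empty sequence of numbers in
    [lo, hi] to a fixed-length, normalized histogram (the feature vector)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

def euclidean(a, b):
    """Distance between two feature vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, database):
    """Similarity search: index of the database vector closest to the query."""
    return min(range(len(database)), key=lambda i: euclidean(query, database[i]))

# Toy "complex" objects of differing lengths (assumed data, not from the benchmark).
objects = [
    [0.1, 0.15, 0.2, 0.05],          # mostly low values
    [0.9, 0.95, 0.8, 0.85, 0.99],    # mostly high values
    [0.1, 0.4, 0.6, 0.9],            # evenly spread values
]
database = [histogram_features(o) for o in objects]

query = histogram_features([0.05, 0.1, 0.22, 0.12, 0.3])
best = nearest(query, database)
print("most similar object:", best)
```

How well such a search separates objects depends entirely on the chosen extractor, which is the feature selection problem addressed here: a different extractor (for example, more bins or a different statistic) yields a different feature space with different discrimination power.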