# Feature-Based Data Analysis

## by Tino Weinkauf • April 11, 2012 • Highlights

Computing power increases constantly. While the fastest supercomputer in 1993 had a performance of 59.7 GFlop/s, we are now clearly in the petascale era, with twenty installations breaching the 1 PFlop/s barrier (see TOP500 Supercomputer Sites). Along with the computing power, the size and complexity of simulation results are increasing as well. In many cases the simulated data sets are at least four-dimensional, e.g. with three spatial dimensions and time. In some cases even higher-dimensional data sets are produced by considering additional parameters.

## The Need to Master the Amount of Information

Due to the sheer size of the data sets alone, it is favorable, if not necessary, to automate at least parts of the analysis. One way to achieve this is to extract features. A *feature*, as used in the following, is a mathematically well-defined geometric object (point, line, surface, …). Its definition and interpretation depend on the underlying application, but it usually represents important structures (e.g. vortex, stagnation point) or changes to such structures (events, bifurcations). An automated extraction of features aids an analysis in a number of ways:

**reduction of information**

The human brain can grasp primarily three dimensions, and current hardware has only limited capabilities for displaying them. Hence, only parts of the massive and complex simulation results can be visualized at once. Feature extraction reduces the amount of data to a small set of geometric objects. Furthermore, quantifying the extracted features makes it possible to build a feature hierarchy, which leads to even simpler representations, e.g. by filtering out the less important features.

**target-oriented study**

Feature extraction is used to automatically find interesting parts of the data, e.g. where certain structural changes occur. This can guide the user in the manual exploration of a data set. It makes it possible to concentrate on certain aspects of the data, leading to a target-oriented, application-dependent study of the most important structures of a data set.

**shifting the analysis to the supercomputer**

In some cases it is preferable to analyze the data on the supercomputer alongside the simulation: for example, if the data set is too large to be handled efficiently by commodity hardware, or if the analysis results have some influence on the simulation itself (e.g. simulation steering). Feature extraction is easily automated and is therefore well suited to batch jobs on supercomputers.

**interactive visualization**

The resulting feature set is usually small enough to be handled and displayed interactively with commodity hardware.

**more objective analysis**

In contrast to most other visualization methods, feature extraction techniques usually depend on fewer parameters or even no parameters at all. Hence, the interpretation of the results depends less on a user-defined parametrization (e.g. isovalue, transfer function).

**faster analysis**

The time needed by the user is reduced since parts of the analysis are automated and the manual part is interactive.
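
The feature hierarchy described under reduction of information can be illustrated with a minimal sketch. It assumes each extracted feature already carries a scalar importance value; the `Feature` class, the `importance` field, and the threshold are hypothetical names chosen for illustration, not part of any particular extraction tool.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    importance: float  # hypothetical scalar measure, e.g. a strength value


def filter_features(features, threshold):
    """Keep features with importance >= threshold, strongest first."""
    kept = [f for f in features if f.importance >= threshold]
    return sorted(kept, key=lambda f: f.importance, reverse=True)


# Made-up example data: three features with assumed importance values.
features = [
    Feature("vortex A", 0.9),
    Feature("vortex B", 0.2),
    Feature("stagnation point", 0.5),
]
coarse = filter_features(features, 0.4)  # drops the weak "vortex B"
```

Raising the threshold step by step yields successively simpler representations of the same data set.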

Feature-based data analysis encompasses a number of different approaches. In the following, we will concentrate on the feature-based analysis of vector fields, with the main focus on flow fields, which play a vital role in many areas. Examples are combustion chambers, turbomachinery and aircraft design in industry, as well as blood flow in medicine. Furthermore, we will show an application of topological data analysis, an important subfield of feature-based data analysis.

## Feature-Based Analysis of Fluid Flows

A feature-based analysis opens up new possibilities in comparison to other visualization techniques. Consider the flow behind a cylinder as shown in the figure below. The so-called von Kármán vortex street develops behind the cylinder and is clearly depicted by the line integral convolution. Using an animation, the temporal movement of these vortices can be depicted by this method as well. However, it is incapable of *distinguishing* between different vortices and of *measuring* their path, velocity, or lifetime. The feature-based vortex analysis, on the other hand, allows this, since the vortices have been extracted as distinct geometric objects. Furthermore, it is possible to quantify these objects in terms of importance or strength, and to filter accordingly.
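
Once a vortex is available as a distinct object, such measurements become simple arithmetic. The following is a minimal sketch under the assumption that a single vortex has already been tracked as a time-sorted list of `(t, x, y)` samples; the function name, the data layout, and the sample values are hypothetical.

```python
import math


def track_statistics(track):
    """Summarize one tracked vortex given (t, x, y) samples sorted by time."""
    lifetime = track[-1][0] - track[0][0]
    # Sum the straight-line distances between consecutive samples.
    path_length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (_, x0, y0), (_, x1, y1) in zip(track, track[1:])
    )
    mean_speed = path_length / lifetime if lifetime > 0 else 0.0
    return {
        "lifetime": lifetime,
        "path_length": path_length,
        "mean_speed": mean_speed,
    }


# Made-up track: the vortex moves by (3, 4) per unit of time.
track = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 6.0, 8.0)]
stats = track_statistics(track)
```

For this made-up track, the vortex lives for 2 time units, travels a path of length 10, and therefore moves at a mean speed of 5.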

## Topological Data Analysis

The analysis is based on the extraction and examination of topological structures of vector fields. Topology is a well-researched field of mathematics. It makes it possible to condense a data set to its structural skeleton. For vector fields, this means segmenting the domain into regions of different flow behavior. Consider a simple 2D vector field as shown above. A topological analysis always starts with the extraction of so-called critical points, which are the zeros of the vector field. There are different flow patterns around critical points, which allow them to be classified into sources, sinks, and saddles. While the flow behavior around sources and sinks is uniformly either outflow or inflow, the flow around saddles is a mixture of both. Certain stream lines around saddles separate these different areas. They are called separation lines (separatrices). This leads to a complete segmentation of the domain into regions of different flow behavior.
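
The classification step can be sketched for the simplest case, a linear 2D field v(p) = A p + b, where the Jacobian is the constant matrix A. The sketch below uses the standard trace/determinant criterion for 2x2 matrices: a negative determinant means eigenvalues of opposite sign (saddle), while a positive determinant with positive or negative trace means outflow (source) or inflow (sink). The function names and the extra labels `"center"` and `"degenerate"` are additions for illustration; real data would require locating zeros numerically on a grid first.

```python
def classify_critical_point(jacobian, eps=1e-12):
    """Classify a 2D critical point from its 2x2 Jacobian ((a, b), (c, d))."""
    (a, b), (c, d) = jacobian
    det = a * d - b * c
    trace = a + d
    if abs(det) <= eps:
        return "degenerate"  # a zero eigenvalue: not a first-order point
    if det < 0:
        return "saddle"      # real eigenvalues of opposite sign
    if trace > eps:
        return "source"      # both eigenvalues have positive real part
    if trace < -eps:
        return "sink"        # both eigenvalues have negative real part
    return "center"          # purely imaginary eigenvalues


def critical_point_of_linear_field(A, b):
    """Solve A p = -b by Cramer's rule for the zero of v(p) = A p + b."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("no isolated critical point")
    r0, r1 = -b[0], -b[1]
    return ((r0 * a22 - a12 * r1) / det, (a11 * r1 - a21 * r0) / det)


# Example field v(x, y) = (x - 1, -(y - 2)): outflow in x, inflow in y.
A = ((1.0, 0.0), (0.0, -1.0))
b = (-1.0, 2.0)
point = critical_point_of_linear_field(A, b)
kind = classify_critical_point(A)
```

For this example field, the critical point lies at (1, 2) and is a saddle, since the flow moves away from it along x and toward it along y.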

Topology can be used in a number of ways to foster the analysis of flows. For example, the topology of the velocity field of a flow can be seen as a condensed representation of the stream lines and may therefore serve as a skeletal, simplified representation of the flow. However, such an analysis depends on the reference frame and may therefore not be applicable in all situations. As another application, the topological structures of certain derived vector fields describe centers of high strain or vortex activity, leading to a Galilean-invariant analysis of important flow processes like mixing.
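
The dependence on the reference frame can be made explicit with a short derivation. As an assumption for illustration, let an observer move with a constant velocity $\mathbf{c}$ (the symbols $\mathbf{c}$ and $\mathbf{x}'$ are introduced here and do not appear in the original study):

$$
\begin{aligned}
\mathbf{x}' &= \mathbf{x} - \mathbf{c}\,t, \\
\mathbf{v}'(\mathbf{x}', t) &= \mathbf{v}(\mathbf{x}' + \mathbf{c}\,t,\, t) - \mathbf{c}, \\
\mathbf{v}'(\mathbf{x}', t) = \mathbf{0} &\iff \mathbf{v}(\mathbf{x}, t) = \mathbf{c}, \\
\nabla_{\mathbf{x}'} \mathbf{v}' &= \nabla_{\mathbf{x}} \mathbf{v}.
\end{aligned}
$$

The third line shows that the critical points of the velocity field move when the observer changes, while the last line shows that the velocity gradient does not. Quantities built only from the velocity gradient are therefore Galilean invariant, which is one way a derived field can avoid the reference-frame problem.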

The figure below shows an example where a topology-based analysis has been successfully applied to explain the impact of an active flow control technique at an airfoil. This study has been carried out within the DFG Collaborative Research Center SFB 557 together with Bert Günther and Bernd R. Noack from the Technical University Berlin as well as Hans-Christian Hege from Zuse Institute Berlin. Considerable effort goes into shaping the flow around an airfoil in order to increase the desired lift and to reduce the parasitic drag. In this example, these performance enhancements are achieved by controlling the flow separation at the rear flap using periodic air injection (figure (a)). Uncovering the underlying physics was a necessary step in order to choose optimal values for frequency, intensity, and angle of the injection. Based on this, the lift could be raised by 11.2%.

The vortex structures have been extracted as topological separatrices of the pressure gradient and denote lines of minimal pressure. Figure (c) shows parts of the topological skeleton of the pressure gradient. The separation lines have been quantified based on pressure, and weak vortices have subsequently been filtered out. The result is shown in figures (d), (e), (f), where the impact of the frequency of air injection on the vortex structures can be studied. Note that this is a five-dimensional data set consisting of three spatial dimensions, time, and the parameter dimension.

Raising the frequency causes a reduction of the lower vortex, which is a necessary condition for gaining lift. However, higher frequencies (F+ > 0.6) are not beneficial to the lift. Using a visual comparison of the vortex structures at different frequencies, we found that new vortex structures are induced by the air injection itself. This has a negative effect on the pressure ratio and consequently on the lift. Especially at higher frequencies, the excitation dominates the natural flow structures and induces long-living, almost two-dimensional vortices in fast succession at the top of the rear flap (figure (f)). In contrast, the induced vortices at F+ = 0.6 dissolve quickly and are therefore less influential. Our topology-based analysis of the vortex structures contributed to the physical understanding of the flow structures and was a substantial part of the optimal choice of parameters.

## More information

- Paper: Saddle Connectors – An Approach to Visualizing the Topological Skeleton of Complex 3D Vector Fields
- Paper: Topological Methods for 2D Time-Dependent Vector Fields Based on Stream Lines and Path Lines
- Paper: Extraction of Parallel Vector Surfaces in 3D Time-Dependent Fields and Application to Vortex Core Line Tracking
- Paper: Feature-based Analysis of a Multi-Parameter Flow Simulation
- Thesis: Extraction of Topological Structures in 2D and 3D Vector Fields
- Summary of the thesis in an article for the Gesellschaft für Informatik: Extraktion topologischer Strukturen von 2D- und 3D-Vektorfeldern (in German)