Statistics and analytical methods


In quantitative studies, the underlying sample plays a central role. It is an essential building block in the research project that contributes significantly to its quality. For this reason, sampling is a topic of constant research, optimization and innovation at infas.

infas uses a variety of scientifically established methods, often in combination. The aim is to apply the optimal sampling concept for each individual research project, in terms of both content and cost. The sampling methods that infas regularly uses in social science studies are described below.

The renowned Arbeitskreis Deutscher Markt- und Sozialforschungsinstitute e.V. (ADM) enables infas to draw random samples based on an up-to-date sampling frame.

infas is not only a member of the industry association ADM but is also active in its sampling working group and has access to the most up-to-date sampling frame. This makes infas one of the few social research institutes that can use the ADM sampling frame, which is recognized and often required in tenders, for social science surveys. The frame includes several sample networks for face-to-face surveys using the random route approach. In addition, the frames for fixed-network samples and mobile samples are updated annually.

Studies, some with infas involvement, confirm that around one in two lines is now no longer listed in generally accessible registers. The proportion of unlisted fixed-network lines is highest in large cities. In addition, younger people, single households and people with a low level of education are more often not listed in the telephone directory than older people or multi-person households.

In addition to listed numbers (those entered in telephone directories), we therefore use randomly generated numbers in sampling and include them with the corresponding inclusion probabilities. In Germany, Häder/Gabler developed a solution for this purpose that has been tested in practice on a large scale.

Both the sampling frame for fixed-network numbers and the sampling frame for mobile numbers are based on the Häder/Gabler method. In this procedure, fixed-network and mobile telephone numbers are generated synthetically, since generally accessible directories such as telephone books offer only an incomplete sampling frame, even for the fixed network.
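The idea of synthetic number generation can be illustrated with a minimal sketch. It assumes a simplified block-based scheme: listed numbers are truncated to stems by dropping the last two digits, and every possible number within a stem that contains at least one listed number enters the frame. The function names, the block size of 100 and the example numbers are illustrative assumptions and not the ADM implementation.

```python
import random

def build_stems(listed_numbers, block_digits=2):
    """Return the sorted stems (number minus its last digits) that hold a listed number."""
    return sorted({number[:-block_digits] for number in listed_numbers})

def generate_frame(stems, block_digits=2):
    """Expand every stem into all numbers of its block, listed or unlisted."""
    block_size = 10 ** block_digits
    return [
        f"{stem}{suffix:0{block_digits}d}"
        for stem in stems
        for suffix in range(block_size)
    ]

def draw_numbers(frame, n, seed=None):
    """Draw n distinct numbers from the frame with equal probability."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical listed numbers (area code plus subscriber number).
listed = ["0228123456", "0228123499", "030555501"]
frame = generate_frame(build_stems(listed))
sample = draw_numbers(frame, 10, seed=42)
```

Because unlisted numbers in the same block are generated alongside listed ones, households without a directory entry also receive a positive selection probability in this sketch.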

A more recent challenge is posed by households that can only be reached via mobile lines but not via a fixed-network line (so-called “mobile-only” households). The dual-frame method makes it possible to include this subgroup in a sample as well.

Mobile-only households have an inclusion probability of zero when the sample is drawn from the fixed-network sampling frame. According to recent surveys, this affects at least 20 percent of the population and about 30 percent of households. Since these households differ significantly in composition from those with a fixed-network line, this is a systematic coverage problem that cannot be ignored in many surveys.

The gap is closed by an additional mobile phone sample from synthetically generated mobile phone numbers. A sampling frame for this is also provided by the ADM telephone sampling working group. The mobile numbers are also generated numbers, as only very few mobile numbers are listed in a directory.

In this dual frame approach, the sample is thus drawn not from one frame, but from two frames that together cover the population completely: one with telephone numbers exclusively from the fixed network and the other with numbers from mobile communications. Appropriate design weighting is then required to merge the two samples.

Only the vanishingly small proportion of people who have neither a fixed-network line nor a mobile phone cannot be included in the sample using this approach. Following the procedure proposed by Häder/Gabler, the two samples can in principle be merged like any sample from two frames.
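How design weights for the merged sample can arise is shown in the following minimal sketch. It assumes a commonly used approximation in which a person's overall inclusion probability is the sum of the probabilities from the two frames: in the fixed-network frame a household is reached via one of its numbers and one of its target persons is then selected, while a mobile number is treated as personal. All parameter names and figures are hypothetical and do not represent infas's weighting model.

```python
def inclusion_probability(k_fixed, k_mobile, household_size,
                          n_fixed, frame_fixed, n_mobile, frame_mobile):
    """Approximate dual-frame inclusion probability of one person.

    k_fixed / k_mobile: numbers under which the person can be reached per frame
    household_size:     target persons in the household (fixed-network selection)
    n_* / frame_*:      numbers drawn and total numbers in each frame
    """
    # Fixed network: the household is hit, then one target person is chosen.
    p_fixed = k_fixed * (n_fixed / frame_fixed) / household_size
    # Mobile: the number is assumed to belong to exactly one person.
    p_mobile = k_mobile * (n_mobile / frame_mobile)
    return p_fixed + p_mobile

def design_weight(probability):
    """Design weight as the inverse of the inclusion probability."""
    return 1.0 / probability

# Hypothetical mobile-only respondent: no fixed-network line, one mobile number.
p = inclusion_probability(0, 1, 2, 7_000, 10_000_000, 3_000, 30_000_000)
w = design_weight(p)
```

A mobile-only person contributes nothing through the fixed-network term, so the inclusion probability comes from the mobile frame alone and the design weight is correspondingly larger.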

One challenge with the dual-frame approach is the mixing ratio of the two samples, that is, the question of how large the shares of mobile numbers and fixed-network numbers should be. Detailed simulation calculations conducted by infas, using different weighting models for merging the two samples, show that there is an optimal mixing ratio in the realized sample at which the weighting factors (design weights) have the lowest variance and the weights are most effective. From a cost perspective, alternative mixing ratios are conceivable in which the increase in the variance of the weighting factors, and consequently the increase in sampling error, is comparatively small.
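The link between the variance of the design weights and the effectiveness of the weighting can be made concrete with a small sketch. It assumes Kish's approximation of the effective sample size, n_eff = (sum of w)^2 / (sum of w^2), as the measure of effectiveness. This is a standard measure, but the text does not state which one was used in the infas simulations, and the weights below are invented.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size for a set of weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def effectiveness(weights):
    """Share of the nominal sample size that remains after weighting."""
    return effective_sample_size(weights) / len(weights)

# Two hypothetical realized samples of equal size with different weight spread.
balanced_mix = [0.9, 1.0, 1.1, 1.0]
unbalanced_mix = [0.4, 0.4, 0.4, 2.8]

print(effectiveness(balanced_mix))    # close to 1: little precision is lost
print(effectiveness(unbalanced_mix))  # clearly below 1: weights vary strongly
```

Comparing this figure across candidate mixing ratios is one simple way to see where the loss of precision from weighting stays small.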

For most studies today, the dual-frame approach to sampling is called for. It is the only way to ensure that all segments of the population are included in the sample. In cooperation with ADM, infas has helped to ensure that the dual-frame method has been scientifically tested and is becoming increasingly established in market and social research.

In stratified selection, the population is divided into individual subgroups (strata). Independent random samples are drawn from each (largely homogeneous) stratum. The stratification characteristics should be related to the object of investigation.

We use stratified selection procedures (not to be confused with quota sampling) for a number of samples. By default, for example, fixed-network samples and population register samples are stratified according to regional characteristics (federal state, counties, etc.) and regional structural characteristics (municipality size classes, regional structure types, etc.). To calculate the result parameters, the results for the individual strata must be weighted according to the strata's respective shares of the population.

A major advantage of stratified sampling is the reduction of sampling error, which makes the results more accurate. In addition, statements can be made within the individual strata. Stratification can be either proportional, following the distribution of units in the population, or disproportional, with a distribution that deviates from the population.
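The weighting of stratum results by their population shares can be sketched as follows. The example assumes the standard stratified estimator of a mean, in which each stratum mean is multiplied by the stratum's share of the population. The strata sizes and responses are invented for illustration.

```python
def stratified_mean(strata):
    """Estimate a population mean from (population_size, sample_values) pairs."""
    population = sum(size for size, _ in strata)
    return sum(
        (size / population) * (sum(values) / len(values))
        for size, values in strata
    )

# Hypothetical disproportional design: both strata have four interviews,
# although the first stratum holds 80 percent of the population.
strata = [
    (8000, [1, 1, 0, 1]),  # large stratum, sample mean 0.75
    (2000, [0, 0, 1, 0]),  # small stratum, sample mean 0.25
]

weighted = stratified_mean(strata)                          # 0.8 * 0.75 + 0.2 * 0.25
unweighted = sum(sum(v) for _, v in strata) / sum(len(v) for _, v in strata)
```

In this disproportional example the unweighted mean of all eight interviews would understate the large stratum, which is why the stratum results have to be weighted before they are combined.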

In cluster sampling, the population is divided into subpopulations (clusters), which are, however, usually smaller than the strata in stratified sampling. The survey is then conducted only in randomly selected clusters.

Membership in a cluster does not result from a systematically selected characteristic, but from the available options for dividing up the sampling units. The clusters are often regionally delimited. Within the selected clusters, random samples are again drawn for the survey.

infas uses cluster samples by default in population register samples, where municipalities or neighborhoods define the clusters.

Multistage sampling is a random selection procedure in which samples of the survey units to be included are drawn in two or more selection stages.

For this purpose, a sample is first drawn from the population, usually at a higher hierarchical level. A further sample is then drawn from this sample. Multistage selection is useful if the population is hierarchically structured, for example, at the levels of federal government and states, or of states, counties and municipalities.

In practice, infas often uses a combination of cluster, stratified and ADM samples. For example, a population register sample is a two-stage, stratified cluster sample. In the first stage, municipalities and sample points are randomly selected using the PPS method (Probability Proportional to Size), with the number of target units in the municipality as the measure of size (MOS). In the second stage, the same number of individuals is randomly selected within each of the selected sample points (usually via systematic random selection). The selection of municipalities and sample points in the first stage is usually stratified according to regional and regional structural characteristics.
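The two-stage logic can be illustrated with a minimal sketch. It assumes systematic PPS selection along the cumulated measures of size in the first stage and a fixed number of persons per sample point in the second stage. The function names and municipality sizes are hypothetical, and stratification is left out for brevity.

```python
import random

def pps_systematic(sizes, n, seed=None):
    """Select n units with probability proportional to size (systematic PPS).

    Returns the indices of the selected units. A unit larger than the
    sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    step = sum(sizes) / n                  # sampling interval on the cumulated scale
    start = rng.uniform(0, step)           # random start within the first interval
    selected, cumulated, index = [], 0, 0
    for i in range(n):
        point = start + i * step
        # Advance to the unit whose cumulated size range contains the point.
        while index < len(sizes) - 1 and cumulated + sizes[index] <= point:
            cumulated += sizes[index]
            index += 1
        selected.append(index)
    return selected

def overall_probability(n_points, size, total_size, persons_per_point):
    """Inclusion probability of a person across both stages."""
    first_stage = n_points * size / total_size   # PPS selection of the sample point
    second_stage = persons_per_point / size      # equal take within the point
    return first_stage * second_stage

# Hypothetical municipalities with their numbers of target persons.
sizes = [100, 400, 200, 300]
points = pps_systematic(sizes, n=2, seed=7)
```

Under these assumptions the size of the municipality cancels out across the two stages, so every person has the same overall inclusion probability regardless of where they live.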

The population register sample is considered the gold standard in empirical social research. infas is one of the few social research institutes that can draw nationwide population register samples for scientific surveys.

The population register sample is based on a random selection from local population registers. The data are then converted into a cluster sample. Population register samples have numerous advantages: First, data quality and data completeness can be considered very high. Second, data on non-participants are known, so that non-response analyses can be conducted.

To implement a population register sample, the participating municipalities have to be contacted and the data obtained and merged. infas is one of only a few social research institutes in Germany with the resources and experience to successfully request address draws from numerous registration offices simultaneously and to create a total sample from the data.

Together with its sister company infas 360 GmbH, infas is the only social research institute in Germany to date to use Small Area Statistics for sample design.

While classical sampling methods provide the basis for reliable statistical statements on large areas, small-scale analyses require elaborate empirical investigations. The problem is generally the size of the available samples. If these are spatially disaggregated, i.e. broken down geographically, only small subsamples with low statistical power remain in the subareas, so that the classical statistical estimation methods produce very high standard errors.
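Why disaggregation inflates standard errors can be seen from a short calculation. The sketch assumes simple random sampling and the textbook standard error of a proportion, sqrt(p(1 - p) / n). The sample size and the number of subareas are invented for illustration.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of an estimated proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical survey: 2,000 interviews spread evenly over 400 subareas.
total_interviews = 2000
subareas = 400

national = standard_error_of_proportion(0.5, total_interviews)
per_area = standard_error_of_proportion(0.5, total_interviews / subareas)

# The standard error grows with the square root of the number of subareas.
inflation = per_area / national
```

With only a handful of interviews per subarea, the direct estimate is too imprecise to be useful, which motivates the inclusion of external small-area data.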

Small-area geographical references and the inclusion of external data make it possible to obtain representative samples even for very demanding or complex populations and for small-area analyses. Small Area Statistics can also be used to find specific target groups with significantly reduced screening effort. Last but not least, the reliability and validity of realized surveys can be checked (see also Multidata based studies).

As a data specialist for geo-referenced inventory information, the sister company infas 360 provides the necessary data and analyses for Small Area Statistics, Small Area Methods and Smart Research.

Statistical analysis methods

In addition to classic statistical analysis methods, which are daily practice at infas, our statistics department is continuously on the lookout for new methods, tests them and integrates them into its portfolio.

An efficient statistics department at infas ensures that the best statistical methods are always used in the empirical studies of the social research institute. In addition to all types of established statistical analyses, infas continuously scans and reviews new academic developments and integrates them into the existing statistical portfolio. When applying statistical methods, the research question defines the statistical analysis and not vice versa.

A selection of the statistical methods regularly used at infas:

  • linear, ordinal, multinomial, and logistic regression analyses,
  • variance, factor, and cluster analyses,
  • mixed (hierarchical or multi-level) models,
  • panel regressions,
  • effect analytic procedures (matching procedures, difference estimators, instrumental variables),
  • structural equation models,
  • small-area estimation methods.
