For those techniques where hyperparameters need to be selected, we used a leave-one-out strategy on the test material.

Gender Recognition

Gender recognition is a subtask in the general field of authorship recognition and profiling, which has reached maturity in the last decades for an overview, see e. This apparently colours not only the discussion topics, which might be expected, but also the general language use.

Top ranking females in SVR on token unigrams, with ranks and scores for SVR with various feature types.

The unigrams do not judge him to write in an extremely female way, but all other feature types do. However, it does not manage to achieve good results with the principal components that were best for the other two systems.

Before being used in comparisons, all feature counts were normalized to counts per words, and then transformed to Z-scores with regard to the average and standard deviation within each feature. Figures 1, 2, and 3 show accuracy measurements for the token unigrams, token bigrams, and normalized character 5-grams, for all three systems at various numbers of principal components.
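The normalization described above (relative counts followed by per-feature Z-scores) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy data, the scaling constant `per=1000`, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def normalize_counts(counts, n_words, per=1000):
    """Scale raw feature counts to counts per `per` words.

    `per` is an assumed constant; the source does not state the value here.
    """
    return [c * per / n_words for c in counts]


def z_scores(rows):
    """Column-wise Z-scores: (value - feature mean) / feature std.

    Uses the population standard deviation (an assumption). Features with
    zero variance are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical toy data: raw counts of 3 features for 4 authors,
# together with the number of words produced by each author.
raw = [[10, 0, 5], [20, 2, 5], [30, 4, 5], [40, 6, 5]]
lengths = [1000, 1000, 2000, 2000]

relative = [normalize_counts(r, n) for r, n in zip(raw, lengths)]
scaled = z_scores(relative)
```

After this transformation every feature has mean 0 and (unless it is constant) standard deviation 1 across authors, so features with very different raw frequencies become comparable before they enter a classifier or a principal component analysis.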

