Feature Selection
Feature selection is the process of selecting a group of terms from a training set and using them as features.
Doing this has two main benefits. First, it decreases the number of features, which makes training and classifying quicker. Second, it can increase classification accuracy by removing noisy features, which helps prevent over-fitting.
I have recently released a library containing 3 different feature selection algorithms, so I am going to focus on them.
Chi Squared
Chi Squared is used in statistics to test the independence of two events. In our case, the two events are the occurrence of a term and the occurrence of a class. What the equation spits back at us is a measure of how much the expected and observed counts differ from each other.
The higher the result, the more dependent the two events are on each other (i.e. the occurrence of a term makes the occurrence of the class more or less likely).
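The equation looks like this:

$$\chi^2(t, c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} \frac{(N_{e_t e_c} - E_{e_t e_c})^2}{E_{e_t e_c}}$$
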
N is the observed frequency and E is the expected frequency.
e_t is the occurrence of a term, which can be true or false (1, 0).
e_c is the occurrence of a class, which can be true or false (1, 0).
Therefore N11 (e_t = 1, e_c = 1) is the observed count for the occurrence of a term AND a class.
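To make the definitions above concrete, here is a minimal sketch of the calculation in Python. It is not the code from my library. The function name `chi_squared` and its four count arguments are made up for illustration, and it assumes the expected frequency for each cell is estimated from the row and column totals, which is the usual approach for a 2x2 table.

```python
def chi_squared(n11, n10, n01, n00):
    """Chi-squared score for one term and one class.

    Hypothetical argument names (first digit is e_t, second is e_c):
      n11: documents that contain the term and are in the class
      n10: documents that contain the term and are not in the class
      n01: documents that lack the term and are in the class
      n00: documents that lack the term and are not in the class
    """
    n = n11 + n10 + n01 + n00
    observed = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    term_totals = {1: n11 + n10, 0: n01 + n00}
    class_totals = {1: n11 + n01, 0: n10 + n00}

    score = 0.0
    for (et, ec), n_obs in observed.items():
        # Expected count if the term and the class were independent.
        expected = term_totals[et] * class_totals[ec] / n
        if expected > 0:  # skip empty rows/columns to avoid dividing by zero
            score += (n_obs - expected) ** 2 / expected
    return score


if __name__ == "__main__":
    # A term that only ever appears in the class scores high,
    # a term spread evenly across classes scores zero.
    print(chi_squared(10, 0, 0, 10))
    print(chi_squared(10, 10, 10, 10))
```

To select features you would compute this score for every term against a class and keep the terms with the highest scores.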
Mutual Information
Mutual information measures how much the presence or absence of a term contributes to making the correct classification.
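As a rough illustration, the sketch below uses the standard definition of mutual information between two binary events (term present or absent, class present or absent), estimated from the same four counts used for Chi Squared. This is an assumption on my part about which form to show, and the function name `mutual_information` is hypothetical rather than something taken from the library.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between a term and a class.

    Counts follow the same convention as Chi Squared: the first
    digit is the term occurrence, the second is the class occurrence.
    """
    n = n11 + n10 + n01 + n00
    observed = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    term_totals = {1: n11 + n10, 0: n01 + n00}
    class_totals = {1: n11 + n01, 0: n10 + n00}

    score = 0.0
    for (et, ec), n_obs in observed.items():
        if n_obs == 0:
            continue  # a cell with zero count contributes nothing
        joint = n_obs / n
        # Ratio of the joint probability to the product of the marginals.
        ratio = n * n_obs / (term_totals[et] * class_totals[ec])
        score += joint * math.log2(ratio)
    return score


if __name__ == "__main__":
    print(mutual_information(10, 0, 0, 10))    # term perfectly predicts the class
    print(mutual_information(10, 10, 10, 10))  # term tells us nothing
```

A score of zero means the term carries no information about the class, and higher scores mean the term is more useful for the classification.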

Frequency Based
Frequency Based is simply selecting the terms that occur most often in a class.
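A minimal sketch of that idea is shown below. The function name `top_terms_by_frequency` and its parameters are hypothetical, and it assumes documents are already tokenised into lists of terms and that every occurrence of a term is counted (you could count each term once per document instead).

```python
from collections import Counter


def top_terms_by_frequency(documents, labels, target_class, k):
    """Return the k terms that occur most often in target_class.

    documents: list of token lists, one per document
    labels:    list of class labels, aligned with documents
    """
    counts = Counter()
    for tokens, label in zip(documents, labels):
        if label == target_class:
            counts.update(tokens)  # count every occurrence of every term
    return [term for term, _ in counts.most_common(k)]


if __name__ == "__main__":
    docs = [
        ["cheap", "pills", "cheap"],
        ["meeting", "cheap"],
        ["pills", "offer", "cheap"],
    ]
    labels = ["spam", "ham", "spam"]
    print(top_terms_by_frequency(docs, labels, "spam", 2))
```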
Here is the code for the library (1)