# Support Vector Machines review

This blog post is a review of support vector machines (SVMs).

• Maximize the distance
• Hinge loss and slack variable
• Primal and dual forms
• Kernel trick
• Transductive SVM

## Maximize the distance

The SVM was put forward by Cortes and Vapnik (1995) [1].
For a set of labeled points, we want to find a decision boundary that separates the two classes with the largest possible margin.
Consider two points lying on the margin hyperplanes $wx - b = 1$ and $wx - b = -1$ respectively; the distance between these two hyperplanes is

$$\frac{2}{\|w\|}.$$

Maximizing this distance is equivalent to minimizing $\|w\|$.
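As a quick numerical sketch (toy numbers of my own, not from the post), the margin width $2/\|w\|$ can be checked directly:

```python
import numpy as np

# Toy weight vector (illustrative only). The two margin hyperplanes
# w.x - b = 1 and w.x - b = -1 are 2 / ||w|| apart.
w = np.array([3.0, 4.0])

margin_width = 2.0 / np.linalg.norm(w)
print(margin_width)  # 2 / 5 = 0.4
```

Shrinking $\|w\|$ (here from 5 to, say, 1) widens this gap, which is exactly why the optimization minimizes $\|w\|$.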

## Hinge loss and slack variable

The data are not always linearly separable. Introduce the hinge loss (the slack variable $\xi_i$)

$$\xi_i = \max\bigl(0,\, 1 - y_i (w x_i - b)\bigr),$$

which is 0 when the data point is on the correct side of its margin (i.e. $w x_i - b \geq 1$ when $y_i = 1$, or $w x_i - b \leq -1$ when $y_i = -1$), and greater than 0 otherwise.
Now we have a quadratic programming problem:

$$\min_{w,\, b,\, \xi} \;\; \frac{1}{n} \sum_{i=1}^{n} \xi_i + \lambda \|w\|^2,$$

subject to $y_i (w x_i - b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for all $i$.
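The hinge loss and the primal objective can be sketched in a few lines of NumPy (function names and the regularization form are my own, purely illustrative):

```python
import numpy as np

def hinge_losses(w, b, X, y):
    # Slack xi_i = max(0, 1 - y_i (w.x_i - b)): zero for points on the
    # correct side of their margin hyperplane, positive otherwise.
    return np.maximum(0.0, 1.0 - y * (X @ w - b))

def primal_objective(w, b, X, y, lam):
    # (1/n) sum_i xi_i + lambda ||w||^2
    return hinge_losses(w, b, X, y).mean() + lam * (w @ w)
```

A point beyond its margin hyperplane has slack 0, so well-separated data contribute nothing beyond the regularization term.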

## Primal and dual forms

The above is the primal formulation. Its dual form (the Lagrangian dual of the quadratic program) is:

$$\max_{c_1, \ldots, c_n} \;\; \sum_{i=1}^{n} c_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i c_i \, (x_i \cdot x_j) \, y_j c_j,$$

subject to $\sum_i c_i y_i = 0$ and $0 \leq c_i \leq \frac{1}{2n\lambda}$ for all $i$,
where the coefficients $c_i$ define the weights through $w = \sum_i c_i y_i x_i$.
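The relation $w = \sum_i c_i y_i x_i$ is easy to illustrate; the coefficients below are picked by hand for a toy case, not produced by an actual QP solver:

```python
import numpy as np

X = np.array([[1.0, 1.0], [-1.0, -1.0]])  # two toy support vectors
y = np.array([1.0, -1.0])
c = np.array([0.5, 0.5])                  # hypothetical dual coefficients

# Primal weights recovered from the dual solution: w = sum_i c_i y_i x_i
w = (c * y) @ X

# The equality constraint sum_i c_i y_i = 0 holds here: 0.5 - 0.5 = 0
balance = (c * y).sum()
```

Only points with $c_i > 0$ (the support vectors) contribute to $w$, which is where the method gets its name.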

## Kernel trick

In the dual form, we can replace the inner product $x_i \cdot x_j$ with a kernel function $k(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$, which computes the inner product in a higher-dimensional feature space without ever constructing $\varphi(x)$ explicitly.
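A small sanity check of the trick (quadratic kernel, toy vectors of my own): the kernel value matches the inner product of the explicit feature maps, so the dual never needs to build them.

```python
import numpy as np

# Quadratic kernel k(x, z) = (x . z)^2. For 2-D inputs its explicit
# feature map is phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
def quad_kernel(x, z):
    return (x @ z) ** 2

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(quad_kernel(x, z), phi(x) @ phi(z))  # both equal (1*3 + 2*0.5)^2 = 16
```

The kernel evaluation costs one 2-D dot product, while the explicit map lives in 3 dimensions; for higher-order kernels the saving grows quickly.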

## Transductive SVM

Transductive learning applies when there is very little labeled data: during training, the model is allowed to look at the unlabeled data, and it is then tested on exactly those unlabeled points. As a reminder, do not select models based on any information derived from the unlabeled data's labels.
The transductive SVM was proposed by Joachims (1999) [2]. A simplified version of the algorithm is as follows:
(1) Train an SVM on the labeled data;
(2) Use it to label the test data;
(3) If there exist two newly labeled points whose estimated labels differ, whose slack variables are both greater than 0, and whose slack variables sum to more than 2 (i.e. the decision boundary is not very "confident" about these two points), toggle the estimated labels of both points;
(4) Repeat steps (1)-(3) until some stopping criterion is satisfied.
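The loop above can be sketched roughly as follows. The inner solver here is a plain sub-gradient linear SVM, not the SVM-light solver Joachims used, and all names and hyperparameters are illustrative.

```python
import numpy as np

def train_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    # Sub-gradient descent on (1/n) sum_i max(0, 1 - y_i(w.x_i - b)) + lam ||w||^2
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w - b) < 1          # points with positive slack
        grad_w = 2 * lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def transductive_svm(X_lab, y_lab, X_unlab, rounds=20):
    w, b = train_svm(X_lab, y_lab)            # (1) train on the labeled data
    y_hat = np.sign(X_unlab @ w - b)          # (2) label the test data
    y_hat[y_hat == 0] = 1.0
    for _ in range(rounds):                   # (4) repeat until no pair swaps
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_hat])
        w, b = train_svm(X_all, y_all)
        slack = np.maximum(0.0, 1.0 - y_hat * (X_unlab @ w - b))
        swapped = False
        for i in range(len(y_hat)):           # (3) find a low-confidence pair
            for j in range(i + 1, len(y_hat)):
                if (y_hat[i] != y_hat[j] and slack[i] > 0 and slack[j] > 0
                        and slack[i] + slack[j] > 2):
                    y_hat[i], y_hat[j] = y_hat[j], y_hat[i]
                    swapped = True
                    break
            if swapped:
                break
        if not swapped:
            break                             # stopping criterion: no swap found
    return w, b, y_hat
```

The swap condition $\xi_i + \xi_j > 2$ with differing labels means both points sit well inside the wrong side of the margin, so exchanging their labels can only lower the combined slack.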

This version can be implemented fairly easily. There are other variants of transductive SVMs; e.g., this GitHub repo contains some good examples.

References:

[1] Cortes, C. and Vapnik, V. (1995). Support-vector networks. *Machine Learning*, 20(3), 273–297.
[2] Joachims, T. (1999). Transductive inference for text classification using support vector machines. *ICML 1999*.