## Wednesday, July 17, 2013

There is a very nice wrapper for libconfig for Python, right here: https://github.com/keimoon/python-libconfig

(This is connected to the previous post on libconfig here.)

## Saturday, April 6, 2013

### Why class balancing happens 'automatically' in AdaBoost

This is a very interesting characteristic of AdaBoost that explains why, in general, it is not necessary to balance the initial weights when training on very skewed datasets.

I will use the notation from the AdaBoost Wikipedia article.

First, assume that we have $N_p$ positive samples and $N_n$ negative samples, with $N_n \gg N_p$, i.e., a skewed dataset. Let's call the ratio between the number of negative and positive samples $K = \frac{N_n}{N_p}$.

Now, assume that we are at the first iteration, with uniform weights $D_{t=1}(i) = \frac{1}{m}$ over the $m$ training samples. At iteration $t=1$ the goal is to find the weak learner

$h_{t} = \underset{h_{t} \in \mathcal{H}}{\operatorname{argmax}} \; \left\vert 0.5 - \epsilon_{t}\right\vert$, with $\epsilon_{t} = \sum_{i=1}^{m} D_{t}(i)I(y_i \ne h_{t}(x_{i}))$

Since all weights are equal, $\epsilon_t$ can be expressed as $\epsilon_{t} = \frac{1}{m} \sum_{i=1}^{m} I(y_i \ne h_{t}(x_{i}))$, so it is just the unweighted misclassification error.

Now comes the interesting part: assume that the learners in the weak learner pool are very weak, so that

*the lowest misclassification error $\epsilon_t$ is achieved by a weak learner that classifies all samples as negative*. (This seems unreasonable, but I have seen it happen frequently on skewed datasets.)

With this in mind, $\epsilon_1 = \frac{N_p}{N_p + N_n}$. With simple math we can compute the value of $\alpha_1=\frac{1}{2} \log \frac{1 - \epsilon_1}{\epsilon_1}$ and update the distribution $D$ for iteration $t=2$.
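Spelling out that simple math: plugging $\epsilon_1$ into the definition of $\alpha_1$ gives

$$\alpha_1 = \frac{1}{2} \log \frac{1 - \epsilon_1}{\epsilon_1} = \frac{1}{2} \log \frac{N_n}{N_p} = \frac{1}{2} \log K$$

The AdaBoost update is $D_{t+1}(i) \propto D_{t}(i)\, e^{-\alpha_t y_i h_t(x_i)}$. The all-negative learner misclassifies every positive sample, so each positive weight is multiplied by $e^{\alpha_1} = \sqrt{K}$, while each negative weight is multiplied by $e^{-\alpha_1} = 1/\sqrt{K}$. We therefore obtain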

$$\frac{D_{t=2}(+)}{D_{t=2}(-)} = \frac{N_n}{N_p} = K$$

where $D_{t=2}(+)$ and $D_{t=2}(-)$ are the weights of an individual positive and negative sample, respectively.

This means that **the effect of the first weak learner is to balance the dataset, so that the sum of the positive weights equals the sum of the negative ones**. An equivalent interpretation is that the first weak learner just adds a constant to the predictive function, namely $-\alpha_1$ in this particular case.

However, there are some differences with respect to balancing the weights up front:

- The first weak learner is 'wasted' and could be avoided.
- If shrinkage is introduced, it is likely that many weak learners will be 'wasted' this way. Though 'wasted' is not quite the right word, since they would still be useful and help minimize the training loss.
- The fact that the first weak learner returns -1 for all training samples does not imply that it will do so for unseen samples. The implications of this are hard to tell, though.
- I am assuming that a weak learner can return the same value for all training samples, which is the case for a decision stump, but may not be the case for other types of weak learners (e.g., an SVM).
- ...and I am pretty sure there are other differences.
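Not from the original derivation, but here is a quick numerical sanity check of the balancing effect, using arbitrary illustrative sample counts:

```cpp
// Checks that after an all-negative first weak learner, the total
// positive weight equals the total negative weight (unnormalized).
#include <cmath>
#include <iostream>

int main() {
    const double Np = 50.0, Nn = 5000.0;  // skewed dataset, K = 100
    const double m = Np + Nn;

    // Error of the weak learner that labels every sample as negative.
    const double eps1 = Np / m;
    const double alpha1 = 0.5 * std::log((1.0 - eps1) / eps1);

    // AdaBoost reweighting: misclassified positives are scaled by
    // e^{+alpha_1}, correctly classified negatives by e^{-alpha_1}.
    const double w_pos = (1.0 / m) * std::exp(alpha1);   // per positive sample
    const double w_neg = (1.0 / m) * std::exp(-alpha1);  // per negative sample

    std::cout << "per-sample ratio w_pos/w_neg = " << w_pos / w_neg
              << " (K = " << Nn / Np << ")\n"
              << "total positive weight = " << Np * w_pos
              << ", total negative weight = " << Nn * w_neg << "\n";
    return 0;
}
```

Both totals come out equal, and the per-sample ratio is exactly $K$, matching the derivation above.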

Cheers!

## Tuesday, February 19, 2013

### C++11: compile-time lookup-table/array with constexpr

I've been trying a few of the new features of C++11, wondering how difficult it would be to create a compile-time self-populated array.

Here is a piece of code that creates an array containing the values of sin(0.2x) + cos(0.5x), computed entirely at compile time, and then prints it in ASCII form to stdout. The trickiest part is the variadic template, which computes the pack of indices from 0 to N-1.
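As a rough sketch of the idea: std::sin and std::cos are not constexpr in standard C++11, so the version below substitutes hand-rolled Taylor-series approximations (wrap, pow_, fact_, sin_ and cos_ are illustrative helpers of mine; the original code may have relied on a compiler extension instead):

```cpp
// Compile-time lookup table of sin(0.2 x) + cos(0.5 x) in C++11.
#include <array>
#include <cstddef>
#include <iostream>
#include <string>

constexpr double PI = 3.141592653589793;

// C++11 constexpr functions are limited to a single return statement,
// hence the recursive one-liners below.
constexpr double wrap(double x) {  // reduce x >= 0 into (-PI, PI]
    return x > PI ? wrap(x - 2.0 * PI) : x;
}
constexpr double pow_(double x, int n) {
    return n == 0 ? 1.0 : x * pow_(x, n - 1);
}
constexpr double fact_(int n) {
    return n <= 1 ? 1.0 : n * fact_(n - 1);
}
constexpr double sin_(double x, int k = 0) {  // truncated Taylor series
    return k > 9 ? 0.0 : (k % 2 ? -1.0 : 1.0) * pow_(x, 2 * k + 1) / fact_(2 * k + 1) + sin_(x, k + 1);
}
constexpr double cos_(double x, int k = 0) {
    return k > 9 ? 0.0 : (k % 2 ? -1.0 : 1.0) * pow_(x, 2 * k) / fact_(2 * k) + cos_(x, k + 1);
}
constexpr double f(std::size_t i) {  // the function being tabulated
    return sin_(wrap(0.2 * i)) + cos_(wrap(0.5 * i));
}

// The tricky part: a variadic template that builds the index pack
// 0..N-1 (C++11 has no std::index_sequence yet).
template <std::size_t... Is> struct indices {};
template <std::size_t N, std::size_t... Is>
struct build_indices : build_indices<N - 1, N - 1, Is...> {};
template <std::size_t... Is>
struct build_indices<0, Is...> { typedef indices<Is...> type; };

template <std::size_t N, std::size_t... Is>
constexpr std::array<double, N> make_table_impl(indices<Is...>) {
    return {{ f(Is)... }};  // expands to { f(0), f(1), ..., f(N-1) }
}
template <std::size_t N>
constexpr std::array<double, N> make_table() {
    return make_table_impl<N>(typename build_indices<N>::type());
}

int main() {
    // Evaluated by the compiler; the binary just holds the data.
    constexpr std::array<double, 32> table = make_table<32>();
    for (double v : table) {  // crude ASCII plot, one row per entry
        const int col = static_cast<int>((v + 2.0) * 15.0);
        std::cout << std::string(col > 0 ? col : 0, ' ') << "*\n";
    }
    return 0;
}
```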

You can see the output it produces here.
