## Advent of Code 2018 – 25 days of coding

The 2018 edition of Advent of Code starts on December 1st. For those who don’t know it: Advent of Code is a programming competition in which the authors release one programming problem every day at midnight EST (UTC-5), which is 6:00 a.m. in Germany.

The difficulty of the problems varies from day to day, and most of them are about developing algorithms from detailed descriptions. If you’re interested in what the problems look like, check AOC 2017. You can implement your solution in any language you prefer. You don’t submit your code, only the output your algorithm produces for an input given to you with the problem description (this input is different for every user, so you cannot just copy the answer from others).

## Use inotifywait and rsync to automatically push code to a remote server without git (Tips for usage with PyCharm included)

I have written a little helper script that I use whenever I want to write code locally but run it remotely. This is useful, for example, when I cannot run the code locally because it needs one or more GPUs or is very computationally intensive.

One possibility would be to use git and push/pull each change manually, but that is obviously too much effort for small changes (like typo fixes). Another alternative is to run rsync manually after each change. But as I am lazy, I want rsync to run automatically whenever any file in my project changes.
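The idea can be sketched in Python as a thin wrapper around the two command-line tools. This is a minimal illustration and not the helper script mentioned above: the paths, the function names and the chosen rsync/inotifywait options are assumptions for the example, and both tools must be installed on the local machine.

```python
import subprocess

# Hypothetical paths for illustration; replace with your own project and server.
LOCAL_DIR = "./my_project/"
REMOTE_TARGET = "user@gpu-server:~/my_project/"


def build_rsync_command(local_dir, remote_target):
    """Return the rsync call that mirrors local_dir to remote_target."""
    # -a: archive mode (recursive, keeps metadata), -z: compress during transfer,
    # --delete: remove remote files that no longer exist locally.
    return ["rsync", "-az", "--delete", local_dir, remote_target]


def build_inotifywait_command(local_dir):
    """Return the inotifywait call that blocks until something in local_dir changes."""
    # -r: watch recursively, -q: quiet output, -e: events to wait for.
    return ["inotifywait", "-r", "-q", "-e", "modify,create,delete,move", local_dir]


def watch_and_sync(local_dir=LOCAL_DIR, remote_target=REMOTE_TARGET):
    """Sync once, then sync again after every detected change (until interrupted)."""
    subprocess.run(build_rsync_command(local_dir, remote_target), check=True)
    while True:
        # Without monitor mode, inotifywait exits after the first matching event,
        # so this call simply blocks until a file changes.
        subprocess.run(build_inotifywait_command(local_dir), check=True)
        subprocess.run(build_rsync_command(local_dir, remote_target), check=True)
```

Calling `watch_and_sync()` starts the loop; stop it with Ctrl+C. Because `--delete` removes remote files that are missing locally, it is worth double-checking the target path before the first run.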

## Why are precision, recall and F1 score equal when using micro averaging in a multi-class problem?

In a recent project I was wondering why I get the exact same value for precision, recall and the F1 score when using scikit-learn’s metrics. The project is about a simple classification problem where the input is mapped to exactly $$1$$ of $$n$$ classes. I was using micro averaging for the metric functions, which means the following according to sklearn’s documentation:

Calculate metrics globally by counting the total true positives, false negatives and false positives.

According to the documentation this behaviour is correct:

Note that for “micro”-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while “weighted” averaging may produce an F-score that is not between precision and recall.

After thinking about it a bit I figured out why this is the case. In this article, I will explain the reasons.
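The effect can be reproduced without scikit-learn by counting the global totals by hand. The sketch below is only an illustration with made-up labels (`micro_scores`, `y_true` and `y_pred` are hypothetical names): when every sample has exactly one true and one predicted class, each misclassified sample adds one false positive (for the predicted class) and one false negative (for the true class), so both totals are always equal.

```python
def micro_scores(y_true, y_pred):
    """Compute micro-averaged precision, recall and F1 from global counts."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly predicted this class
            elif pred == label:
                fp += 1  # predicted this class, but it was another one
            elif truth == label:
                fn += 1  # missed this class
    if tp == 0:
        return 0.0, 0.0, 0.0  # avoid dividing by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels for a 3-class problem: 4 of 6 predictions are correct.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 1, 1, 0]
precision, recall, f1 = micro_scores(y_true, y_pred)
print(precision, recall, f1)
```

Here the two wrong predictions produce two false positives and two false negatives, so precision and recall are both 4/6, and F1, being their harmonic mean, takes the same value.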

## Lemmatize whole sentences with Python and nltk’s WordNetLemmatizer

Lemmatization is the process of converting words (e.g. in a sentence) to their base form (lemma) while respecting their context. For example, the sentence “You are not better than me” would become “You be not good than me”. This is useful in NLP preprocessing, for example when training doc2vec models. The Python module nltk.stem contains a class called WordNetLemmatizer. In order to use it, one must provide both the word and its part-of-speech tag (adjective, noun, verb, …) because lemmatization is highly dependent on context.
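One way to combine tagging and lemmatization is sketched below. It is a minimal example under assumptions, not necessarily the approach of the full article: the helper names `penn_to_wordnet` and `lemmatize_sentence` are made up, the tag mapping is deliberately coarse, and nltk must be installed together with its tokenizer, tagger and WordNet data (via `nltk.download`).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as returned by nltk.pos_tag) to a WordNet POS code."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"  # everything else is treated as a noun, the lemmatizer's default


def lemmatize_sentence(sentence):
    """Tokenize, POS-tag and lemmatize a sentence, returning the joined lemmas."""
    # Imported here so the mapping helper above can be used without nltk installed.
    from nltk import pos_tag, word_tokenize
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    tagged = pos_tag(word_tokenize(sentence))
    return " ".join(
        lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag)) for word, tag in tagged
    )
```

With the data packages in place, `lemmatize_sentence("You are not better than me")` can be used to try the example sentence from above; the exact result depends on the tags the tagger assigns.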

## NLP: Approaches for Sentence Embeddings (Overview)

In 2013, Mikolov et al. published ‘Distributed Representations of Words and Phrases and their Compositionality’, a paper about a new approach to representing words as dense vectors. This was an improvement over the alternative, representing words as one-hot vectors, because these dense vector embeddings encode some of the meaning of the words they represent. In other words, words with similar meanings are close to each other in the vector space of the embedding. For example, “blue” would be close to “red” but far from “cat”. A commonly used name for their approach is word2vec.
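“Close” is usually measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; they are not real word2vec embeddings, which are learned from text and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, chosen by hand so that the two colours point in a
# similar direction and the animal points elsewhere.
embeddings = {
    "blue": [0.9, 0.1, 0.0],
    "red": [0.8, 0.2, 0.1],
    "cat": [0.0, 0.1, 0.9],
}

print(cosine_similarity(embeddings["blue"], embeddings["red"]))
print(cosine_similarity(embeddings["blue"], embeddings["cat"]))
```

With these toy values the first similarity is much larger than the second, which mirrors the “blue”/“red”/“cat” example.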

## Calculate power set (set of all subsets) in Python without recursion

If you want to calculate the set of all subsets of a set (also called the power set), you can either choose a recursive approach or try this iterative approach, which is faster than the recursive one.

    def get_subsets(fullset):
        listrep = list(fullset)

        subsets = []
        # Every integer i in [0, 2**n) encodes one subset as a bit mask.
        for i in range(2**len(listrep)):
            subset = []
            for k in range(len(listrep)):
                # Include element k if bit k of i is set.
                if i & (1 << k):
                    subset.append(listrep[k])
            subsets.append(subset)

        return subsets

    subsets = get_subsets({1, 2, 3, 4})
    print(subsets)
    print(len(subsets))

You can also find a shorter version at the end of the article, but to understand the principle the algorithm above is more suitable.
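For comparison, one compact way to build the power set uses `itertools` from the standard library. This is a sketch and not necessarily the shorter version referred to above; the name `get_subsets_short` is made up for the example.

```python
from itertools import chain, combinations


def get_subsets_short(fullset):
    """Return all subsets of fullset as lists, ordered by subset size."""
    items = list(fullset)
    # combinations(items, r) yields every subset of size r; chain them for r = 0..n.
    all_sizes = (combinations(items, r) for r in range(len(items) + 1))
    return [list(combo) for combo in chain.from_iterable(all_sizes)]


subsets = get_subsets_short({1, 2, 3, 4})
print(len(subsets))  # 16, i.e. 2**4
```

Unlike the bit-mask version, this one groups the subsets by size, starting with the empty set.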

## Useful Python code snippets and language properties

In this article I want to share a few Python code snippets that can help you write short and efficient code. I tested them with Python 3.5.2 on Ubuntu 16.04.