Programming – Simon's blog

Resources to learn how to use LLMs (Large Language Models)

Posted August 19, 2023, updated September 15, 2023, by Simon

Weights & Biases: Building LLM powered apps

This is a course by Weights & Biases that explains the basics of tokenization and how to build apps using LLM APIs: https://www.wandb.courses/courses/building-llm-powered-apps

Useful notebooks: https://github.com/wandb/edu/tree/971ef92ee35ecaaf6b5a4902d11804540af60879/llm-apps-course/notebooks

Example chatbot implementation that can answer questions about documentation in the form of Markdown files: https://github.com/wandb/edu/tree/971ef92ee35ecaaf6b5a4902d11804540af60879/llm-apps-course/src

Coursera / deeplearning.ai – Generative AI with Large Language Models

This course costs 39 pounds but is worth the price. It covers many topics of the LLM lifecycle in three weeks of videos, labs and quizzes. Lots of papers and articles for deeper knowledge are also provided!

Topics that are covered are:

Transformer architectures (encoder-decoder, encoder only, decoder only) and their use cases
Pre-training for the different transformer variants
Fine-tuning (parameter-efficient fine-tuning (PEFT), low-rank adaptation (LoRA), prompt tuning)
Distributed training
Quantization
Model evaluation
Reinforcement learning from human feedback (RLHF)
Program-aided language models (PAL)
and others

You can enroll here: https://www.coursera.org/learn/generative-ai-with-llms/

Python 3: Recursively print structured tree including hierarchy markers using depth-first search

Posted September 5, 2020 by Simon, 3 Comments

Printing a tree in Python is easy if the parent-child relationships do not need to be visualized, i.e. if it is enough to print all nodes with an indentation that depends on their level within the tree.

To keep the code simple, let’s first define a basic tree structure by creating a Node class that holds a value x and can have an arbitrary number of child nodes:

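class Node(object):
    def __init__(self, x, children=None):
        self.x = x
        # Avoid a shared mutable default argument: each node gets its own list.
        self.children = children if children is not None else []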

To print all nodes of a tree using depth-first search, only a few lines are required:

def printTree(root, level=0):
    print("  " * level, root.x)
    for child in root.children:
        printTree(child, level + 1)

# tree = Node(..., children=[Node(..., ...), Node(..., ...)])  # See the end of the article for the bigger structure used in the examples.
printTree(tree)
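To try the traversal without the larger structure, here is a minimal sketch. The small tree and the helper name render_tree are made up for illustration. The helper returns the indented lines instead of printing them directly, which makes the result easier to inspect:

```python
class Node(object):
    def __init__(self, x, children=None):
        self.x = x
        self.children = children if children is not None else []


def render_tree(root, level=0):
    """Return one indented line per node, visiting nodes depth-first."""
    lines = ["  " * level + str(root.x)]
    for child in root.children:
        lines.extend(render_tree(child, level + 1))
    return lines


# Hypothetical example tree, not the one used in the article.
small_tree = Node("root", [
    Node("a", [Node("a1"), Node("a2")]),
    Node("b"),
])

print("\n".join(render_tree(small_tree)))
```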

However, the output can be hard to read. When the tree has more than a few levels, it is challenging to see the relationship between parent and child nodes. A definition of the following tree is given at the end of this article if you want to try it yourself. For now, just focus on the output:
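As a rough illustration of what hierarchy markers add, here is a minimal sketch. It is not the article's implementation: the helper tree_lines, the box-drawing characters and the small example tree are assumptions chosen for illustration.

```python
class Node(object):
    def __init__(self, x, children=None):
        self.x = x
        self.children = children if children is not None else []


def tree_lines(node, prefix="", is_last=True, is_root=True):
    """Depth-first traversal returning one line per node with branch markers."""
    if is_root:
        lines = [str(node.x)]
        child_prefix = ""
    else:
        # The last child gets a corner, all other children get a tee.
        lines = [prefix + ("└── " if is_last else "├── ") + str(node.x)]
        # Below a last child nothing continues, so pad with spaces instead of a bar.
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        lines.extend(tree_lines(child, child_prefix, last, is_root=False))
    return lines


# Hypothetical example tree, not the one used in the article.
marker_tree = Node("root", [
    Node("a", [Node("a1"), Node("a2")]),
    Node("b", [Node("b1")]),
])

print("\n".join(tree_lines(marker_tree)))
# root
# ├── a
# │   ├── a1
# │   └── a2
# └── b
#     └── b1
```

The prefix string carries the vertical bars of all ancestors that still have siblings to come, which is why each recursive call only needs to know whether its own node is the last child.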

Advent of Code 2018 – 25 days of coding

Posted November 30, 2018, updated December 7, 2018, by Simon

On December 1st the 2018 edition of Advent of Code will start. For those who don’t know what Advent of Code is: it is a programming competition where the authors release one programming problem every day at midnight EST/UTC-5 (6:00 a.m. in Germany).

The difficulty of the problems varies from day to day, and it’s mostly about developing algorithms based on detailed descriptions. If you’re interested in what the problems look like, check AOC 2017. You can implement your solution in any language you prefer. You don’t submit your code, only the output of your algorithm for an input that is given to you in the problem description (this input is different for every user, so you cannot just copy the answer from others).

Use inotifywait and rsync to automatically push code to a remote server without git (Tips for usage with PyCharm included)

Posted November 29, 2018 by Simon, 2 Comments

I have written a little helper script that I use whenever I want to write code locally but run it remotely. This is useful, for example, when I cannot run the code locally because it needs one or more GPUs or is very computationally intensive.

One possibility would be to use git and push/pull each change manually. But this would obviously be too much effort for small changes (like typo fixes). Another alternative is to run rsync manually after each change. But as I am lazy, I want to run rsync automatically whenever any file in my project changes.
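The helper script itself is not part of this excerpt. The watch-then-sync idea can be sketched in Python as follows; this is an assumption-laden illustration, not the author's script. The function names, the chosen event list and the rsync flags (-a for archive mode, -z for compression) are my own choices, and both inotifywait and rsync must be installed for sync_on_change to work.

```python
import subprocess

# inotifywait event names that are treated as "a file changed" in this sketch.
WATCH_EVENTS = "modify,create,delete,move"


def build_watch_command(local_dir):
    """Command that blocks until something inside local_dir changes."""
    # -r: watch recursively, -q: quiet, -e: restrict to the listed events.
    return ["inotifywait", "-r", "-q", "-e", WATCH_EVENTS, local_dir]


def build_rsync_command(local_dir, remote_target):
    """Command that copies local_dir to remote_target (e.g. user@host:/path)."""
    # Trailing slash: copy the directory's contents, not the directory itself.
    source = local_dir.rstrip("/") + "/"
    return ["rsync", "-az", source, remote_target]


def sync_on_change(local_dir, remote_target):
    """Push once, then push again after every detected change (runs forever)."""
    while True:
        subprocess.run(build_rsync_command(local_dir, remote_target), check=True)
        subprocess.run(build_watch_command(local_dir), check=True)


# Hypothetical usage (paths and host are placeholders):
# sync_on_change("/home/me/project", "user@gpu-server:/home/user/project")
```

Without the -m flag, inotifywait exits after the first matching event, so the loop alternates between one sync and one wait.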

Lemmatize whole sentences with Python and nltk’s WordNetLemmatizer

Posted June 29, 2018, updated July 2, 2018, by Simon

Lemmatization is the process of converting words (e.g. in a sentence) to their lemmas (base forms) while respecting their context. For example, the sentence “You are not better than me” would become “You be not good than me”. This is useful in NLP preprocessing, for example to train doc2vec models. The Python module nltk.stem contains a class called WordNetLemmatizer. In order to use it, one must provide both the word and its part-of-speech tag (adjective, noun, verb, …) because lemmatization is highly dependent on context.
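A minimal sketch of how this could look, not necessarily the approach of the full article. The helper names penn_to_wordnet and lemmatize_sentence are made up for illustration. It assumes that nltk is installed and that the tokenizer, tagger and WordNet data have already been fetched with nltk.download (the exact resource names depend on the nltk version).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as returned by nltk.pos_tag) to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, which is also the lemmatizer's default


def lemmatize_sentence(sentence):
    """Tokenize, tag and lemmatize a sentence word by word."""
    # Imported lazily so that the tag mapping above works without nltk installed.
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return " ".join(
        lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag)) for word, tag in tagged
    )


# Hypothetical usage (requires the nltk data mentioned above):
# print(lemmatize_sentence("You are not better than me"))
```

The result depends on the tags the tagger assigns, so a word tagged differently than expected may be left unchanged.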

Calculate power set (set of all subsets) in Python without recursion

Posted December 10, 2017, updated September 5, 2020, by Simon, 8 Comments

If you want to calculate the set of all subsets of a set (also called the power set), you can either choose a recursive approach or try this iterative approach, which is faster than the recursive one.

def get_subsets(fullset):
  listrep = list(fullset)

  subsets = []
  # Each i in [0, 2**n) is a bitmask: bit k set means listrep[k] is in the subset.
  for i in range(2**len(listrep)):
    subset = []
    for k in range(len(listrep)):
      if i & (1 << k):
        subset.append(listrep[k])
    subsets.append(subset)

  return subsets

subsets = get_subsets(set([1,2,3,4]))
print(subsets)
print(len(subsets))

You can also find a shorter version at the end of the article, but the algorithm above is better suited to understanding the principle.
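For comparison, a compact variant can be built on itertools. This is a sketch and not necessarily the shorter version the article refers to; the name get_subsets_short is made up for illustration. It produces the same subsets as the bitmask loop, but grouped by size rather than by bitmask value.

```python
from itertools import chain, combinations


def get_subsets_short(fullset):
    """Return all subsets of fullset as lists, ordered by subset size."""
    items = list(fullset)
    # combinations(items, r) yields every subset of size r; chain them for r = 0..n.
    all_sizes = (combinations(items, r) for r in range(len(items) + 1))
    return [list(subset) for subset in chain.from_iterable(all_sizes)]


subsets_short = get_subsets_short({1, 2, 3, 4})
print(len(subsets_short))  # 16, i.e. 2**4
```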

Useful Python code snippets and language properties

Posted December 6, 2017, updated August 8, 2018, by Simon

In this article I want to share a few Python code snippets that can help with writing short and efficient code. I tested them with Python 3.5.2 on Ubuntu 16.04.

I will keep updating this article.