
Learning Algorithms with a Transformer

This article demonstrates training a transformer model to run a multi-step algorithm that sorts lists of numbers. My previous article demonstrated adding lists of numbers by showing the transformer the input numbers and asking it to generate their sum. This simple approach can also be applied to sorting lists; however, as the lists grow longer, the task becomes increasingly complex, and it is not feasible for the model to spit out the solution in one go. Instead, the model needs to be taught how to solve the problem using a multi-stage algorithm, just as a human would. The model below is trained to generate the steps of bubble sort, allowing it to break the problem into smaller steps that it executes one after another.
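To make the idea concrete, here is a minimal sketch (my own illustration, not the actual training code) of how such intermediate supervision targets could be produced: run bubble sort on the input list and record the list after every pass, so the model learns to emit each step rather than the final answer in one go.

```python
def bubble_sort_steps(xs):
    """Run bubble sort and return the list state after each pass."""
    xs = list(xs)
    steps = []
    for _ in range(len(xs)):
        swapped = False
        for i in range(len(xs) - 1):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                swapped = True
        steps.append(list(xs))
        if not swapped:
            # The last pass makes no swaps; it confirms the list is sorted.
            break
    return steps

print(bubble_sort_steps([3, 1, 2]))  # → [[1, 2, 3], [1, 2, 3]]
```

Note that the final pass always repeats the sorted list without swapping anything, which is how bubble sort detects that it is done.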


The training code for this model can be found on GitHub.

The end of each bubble sort iteration is denoted by the number 10. The model is trained to output an 11 before and after the final iteration, distinguishing the final sorted list from the intermediate steps. It’s worth noting that the model in the demo isn’t 100% accurate; maybe you can find a list that it will keep sorting forever! Humans often make mistakes when following algorithms too, but they can check over their work and correct it if needed. This model only gets one shot to get it right.
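The separator scheme described above can be sketched as follows (the function name and exact layout are my assumptions): digits act as ordinary tokens, 10 terminates each intermediate pass, and 11 brackets the final sorted list.

```python
SEP = 10    # end-of-iteration marker
FINAL = 11  # brackets the final sorted list

def encode_target(steps):
    """Flatten a list of bubble sort passes into one token sequence."""
    tokens = []
    for step in steps[:-1]:
        tokens.extend(step)
        tokens.append(SEP)
    tokens.append(FINAL)
    tokens.extend(steps[-1])
    tokens.append(FINAL)
    return tokens

# Passes for sorting [3, 1, 2]: one swap pass, then the confirming pass.
print(encode_target([[1, 2, 3], [1, 2, 3]]))
# → [1, 2, 3, 10, 11, 1, 2, 3, 11]
```

Because 11 appears only around the last pass, a consumer of the model's output can stop decoding as soon as the second 11 is generated.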

Bubble sort isn’t the fastest sorting algorithm out there; it would be interesting to train a model that runs a more efficient algorithm, such as merge sort. In fact, it would be even better if the model could learn its own sorting algorithm all by itself! I’m hoping to train models that can learn algorithms from scratch, trained only on examples of the algorithm’s expected inputs and outputs.

This post is licensed under CC BY 4.0 by the author.
