One of the most interesting recent developments in natural language processing is the T5 family of language models. These are transformer-based sequence-to-sequence models trained on multiple different tasks. The main insight is that training on many different tasks gives the model a broader context for tokens and helps it statistically approximate how words are used in natural language.
Tasks can be specified by providing an input context, often with a “command” prefix, and then predicting an output sequence. This Google AI blog article contains an excellent description of how multiple tasks are specified.
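As a concrete illustration, here is a minimal sketch of what a prefixed task looks like in practice, assuming the Hugging Face transformers library and the publicly available t5-small checkpoint (the original T5 releases may use a different interface):

```python
# Minimal sketch: specifying a task via a "command" prefix on the input text.
# Assumes the Hugging Face `transformers` library and the `t5-small` checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The prefix tells the model which task to perform on the rest of the input.
text = "translate English to German: The house is wonderful."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Swapping the prefix (for example “summarize:” or “cola sentence:”) switches the task without changing the model or the decoding code, which is the core of the text-to-text framing.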
Naturally, these models are attractive to anyone working on NLP. The drawback, however, is that they are large and require very significant computational resources, beyond those of most developers’ desktop machines. Fortunately, Google makes its Colaboratory platform free to use (with some resource limitations). It is an ideal platform for experimenting with this family of models to see how they perform on a specific application.
When I did so for a text-classification problem, I had to piece together information from a few different sources and try a few different approaches, so I decided to put together a mini tutorial on how to fine-tune a T5 model.
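To give a sense of what the tutorial covers, here is a minimal fine-tuning sketch, assuming the Hugging Face transformers and PyTorch stack; the “classify sentiment:” prefix, the toy examples, and the hyperparameters are illustrative, not the article’s actual setup:

```python
# Minimal sketch: fine-tuning T5 for text classification by casting the task
# as text-to-text, i.e. the class label is predicted as an output string.
# Assumes the Hugging Face `transformers` library and PyTorch.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy training pairs: (prefixed input text, label as text).
examples = [
    ("classify sentiment: the movie was great", "positive"),
    ("classify sentiment: a dull, lifeless film", "negative"),
]

model.train()
for text, label in examples:
    inputs = tokenizer(text, return_tensors="pt")
    targets = tokenizer(label, return_tensors="pt")
    # The model computes the cross-entropy loss against the label tokens.
    loss = model(**inputs, labels=targets.input_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

A real run would batch the data, train for several epochs on a GPU (such as the one Colab provides), and evaluate by generating the label string and comparing it to the gold label.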