ServiceNow Research

Leveraging Human Preferences to Master Poetry

Abstract

Large language models have been fine-tuned to generate poetry via supervised learning on datasets of relevant examples. However, these models often fail to produce high-quality output that respects the structure expected for a specific poem type. For instance, generated haikus may contain toxic language, be off-topic or incoherent, and fail to follow the typical 5-7-5 syllable meter. In this work, we investigate whether it is possible to learn an objective function that quantifies haiku quality from human feedback, and whether this reward function can be used to improve haiku generation through reinforcement learning.
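The reward function in this work is learned from human feedback, but one structural property it would need to capture is the 5-7-5 syllable meter. As a rough illustration only, the sketch below implements a heuristic rule-based meter check; the vowel-group syllable counter and the function names are our own assumptions, not part of the paper.

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: count groups of consecutive vowels, dropping a
    # silent trailing "e" ("haze" -> 1) while keeping at least one.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)

def meter_reward(haiku: str, target=(5, 7, 5)) -> float:
    # Reward in [0, 1]: fraction of lines whose estimated syllable
    # count matches the 5-7-5 pattern; 0 if the line count is wrong.
    lines = [l for l in haiku.strip().splitlines() if l.strip()]
    if len(lines) != len(target):
        return 0.0
    hits = sum(
        sum(count_syllables(w) for w in re.findall(r"[a-zA-Z']+", line)) == t
        for line, t in zip(lines, target)
    )
    return hits / len(target)
```

A learned reward model would replace this heuristic with preferences elicited from annotators, but a signal of this shape could serve as a sanity check or an auxiliary term during reinforcement learning.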

Publication
AAAI Workshops
Rafael Pardinas
Applied Research Scientist

Applied Research Scientist at AI Frontier Research, located in London, United Kingdom.

Gabriel Huang
Research Scientist

Research Scientist at AI Frontier Research, located in Montreal, QC, Canada.

David Vazquez
Director of AI Research

Director of AI Research at AI Research Management, located in Montreal, QC, Canada.

Alexandre Piche
Research Scientist

Research Scientist at AI Frontier Research, located in Montreal, QC, Canada.