---
parent: Decision Records
nav_order: 18
---
# Use regular expression to split multiple-sentence titles

## Context and Problem Statement

Some entry titles consist of multiple sentences, for example "Whose Music? A Sociology of Musical Language". To format such titles correctly in [Sentence Case](https://en.wiktionary.org/wiki/sentence_case) or [Title Case](https://en.wiktionary.org/wiki/title_case#English), the title must first be split into sentences, and each sentence must then be processed individually.

## Considered Options

* [Regular expression](https://docs.oracle.com/javase/tutorial/essential/regex/)
* [OpenNLP](https://opennlp.apache.org/)
* [ICU4J](https://web.archive.org/web/20210413013221/http://site.icu-project.org/home)

## Decision Outcome

Chosen option: "Regular expression", because we can use Java's built-in classes (`Pattern`, `Matcher`) instead of adding additional dependencies.

### Positive Consequences

* Fewer dependencies on third-party libraries
* Smaller project size (ICU4J is very large)
* No need for model data (OpenNLP is a machine-learning-based toolkit and needs a trained model to work properly)

### Negative Consequences

* Regular expressions cannot cover every case, so splitting may not be accurate for every title
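
The chosen option can be illustrated with a minimal sketch. The class name `TitleSentenceSplitter`, the method `split`, and the pattern itself are assumptions made for illustration; they are not taken from the project's actual implementation. The sketch treats any whitespace that follows `.`, `!`, or `?` as a sentence boundary and relies only on `java.util.regex.Pattern`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not the project's actual class.
final class TitleSentenceSplitter {

    // Assumed boundary rule: whitespace preceded by '.', '!' or '?'.
    // The lookbehind keeps the punctuation attached to the preceding sentence.
    private static final Pattern SENTENCE_BOUNDARY = Pattern.compile("(?<=[.!?])\\s+");

    private TitleSentenceSplitter() {
    }

    // Splits a title into sentences so that each one can be cased separately.
    static List<String> split(String title) {
        return Arrays.asList(SENTENCE_BOUNDARY.split(title.trim()));
    }
}
```

With this pattern, `TitleSentenceSplitter.split("Whose Music? A Sociology of Musical Language")` yields the two sentences "Whose Music?" and "A Sociology of Musical Language". The same rule also splits "Dr. Strangelove" into "Dr." and "Strangelove", which is the kind of inaccuracy listed under the negative consequences.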