June 6th ABAIR Meeting
Highlights
- DCU text corpus finally accessed, but data duplication is an issue.
- Attempting to improve recognition by using LLMs to generate diverse sentences with Irish placenames.
- Problems with the DCU text corpus: duplication. Only half of it is usable (38.9 million words).
- Recognition struggles with proper nouns, which are not well covered in the corpora. Plan: take placenames from logainm and use LLMs to generate text containing them.
- Chatted with Master's supervisor: no issue with the problematic data.
- Liam suggests mixing some of the shorter and longer data so the model doesn't overfit. Varied lengths may help.
- Generated sentences containing placenames using LLMs.
- Introduced mol.abair.ie
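The duplication problem in the DCU corpus can be illustrated with a minimal sketch of exact-duplicate removal. It assumes the corpus is plain text with one sentence per line, and the normalisation (collapsing whitespace and casefolding) is a hypothetical choice for illustration, not the method used by the team. Catching near-duplicates would need something more, such as MinHash.

```python
import hashlib
from typing import Iterable, Iterator


def normalise(line: str) -> str:
    """Collapse runs of whitespace and casefold, so trivially different copies match."""
    return " ".join(line.split()).casefold()


def dedupe_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each non-empty line the first time its normalised form is seen."""
    seen: set[bytes] = set()
    for line in lines:
        key = normalise(line)
        if not key:
            continue  # skip blank lines
        # Store 20-byte digests rather than full sentences to keep memory down.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        if digest in seen:
            continue
        seen.add(digest)
        yield line.rstrip("\n")


def word_count(lines: Iterable[str]) -> int:
    """Count whitespace-separated words across all lines."""
    return sum(len(line.split()) for line in lines)


if __name__ == "__main__":
    sample = [
        "Tá an aimsir go breá inniu.\n",
        "Tá an aimsir go  breá inniu.\n",  # extra space: same after normalising
        "Chuaigh sé go Gaillimh.\n",
        "\n",
        "tá an aimsir go breá inniu.\n",  # lower-case copy: same after casefolding
    ]
    unique = list(dedupe_lines(sample))
    print(unique)
    print(word_count(sample), "words before,", word_count(unique), "after")
```

Comparing word counts before and after deduplication is a quick way to see how much of a corpus is actually unique.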
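Liam's suggestion to mix shorter and longer data could be sketched as follows. This is only one simple scheme, assumed for illustration: utterances are split at the median duration into a short half and a long half, each half is shuffled, and the two are interleaved so training never sees long runs of a single length. The `Utterance` record and its fields are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical record: an utterance ID and its audio duration in seconds.
    utt_id: str
    duration: float


def mix_by_length(utts: list[Utterance], seed: int = 0) -> list[Utterance]:
    """Interleave shuffled short and long utterances (split at the median duration)."""
    rng = random.Random(seed)
    ordered = sorted(utts, key=lambda u: u.duration)
    mid = len(ordered) // 2
    short, long_ = ordered[:mid], ordered[mid:]
    rng.shuffle(short)
    rng.shuffle(long_)
    mixed: list[Utterance] = []
    # Alternate one long, one short; whichever half is bigger supplies the tail.
    for i in range(max(len(short), len(long_))):
        if i < len(long_):
            mixed.append(long_[i])
        if i < len(short):
            mixed.append(short[i])
    return mixed


if __name__ == "__main__":
    utts = [Utterance(f"utt{i}", d) for i, d in enumerate([1.2, 9.8, 3.1, 14.0, 2.5, 6.7])]
    for u in mix_by_length(utts, seed=42):
        print(u.utt_id, u.duration)
```

A median split is the crudest choice; more buckets, or sampling a fixed ratio of short to long items per batch, would be natural refinements.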