Summary of "Interpretace HR dat pomocí jazykových modelů" ("Interpreting HR Data Using Language Models") – Luděk Kopáček, Martin Koryťák [MPN seminar, 6 November 2024]

Main Ideas and Concepts

Introduction to the Seminar: The seminar focuses on the interpretation of HR data using language models, presented by Luděk Kopáček and Martin Koryťák from Workday.
Overview of Workday:
- Workday develops software for large companies, focusing on finance and HR.
- The Prague branch specializes in extended analytics, combining data analysis with machine learning and natural language processing.
Understanding HR Data:
- HR data begins with the application process and evolves through hiring and employee records.
- It encompasses various metrics related to employee skills, recruitment processes, and organizational dynamics.
Data Analysis and Storytelling:
- Workday employs an analytical tool called "storyteller" to identify business patterns and anomalies in HR data.
- The goal is to translate complex data insights into understandable narratives for business users.
Language Models in Data Interpretation:
- The earlier approach interpreted data through fixed text templates, and the resulting sentences were often confusing.
- The shift towards using language models aims to generate more natural and context-aware narratives.
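To make the template-based starting point concrete, here is a minimal sketch of how a fixed sentence pattern is filled with values. The record fields (`metric`, `segment`, `value`, `baseline`) and the template wording are illustrative assumptions, not Workday's actual data model or templates.

```python
# Hypothetical insight record; the field names are illustrative assumptions.
insight = {
    "metric": "attrition rate",
    "segment": "Engineering",
    "value": 0.18,
    "baseline": 0.11,
}

# One fixed sentence pattern shared by every insight of this kind.
TEMPLATE = "The {metric} for {segment} is {value:.0%}, compared with {baseline:.0%} overall."


def render_template(record):
    """Fill the fixed sentence pattern with the values of one insight."""
    return TEMPLATE.format(**record)


print(render_template(insight))
```

Every insight rendered this way has the same sentence shape regardless of context, which is the rigidity that generated narratives are meant to avoid.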
Technical Insights on Language Models:
- An introduction to transformers and their use in language models.
- Emphasis on tokenization and on how language models generate text iteratively, one token at a time.
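The iterative generation loop can be illustrated with a toy sketch. This is not a transformer: a whitespace tokenizer and a table of bigram counts stand in for subword tokenization and a trained network, and the tiny corpus is made up. Only the shape of the loop (tokenize, predict the next token, append, repeat) carries over.

```python
def tokenize(text):
    """Toy tokenizer: lowercase whitespace split (real models use subword units)."""
    return text.lower().split()


def train_bigram_counts(corpus):
    """Count which token follows which; a stand-in for a trained model."""
    counts = {}
    for sentence in corpus:
        tokens = tokenize(sentence) + ["<eos>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts.setdefault(prev, {})
            counts[prev][nxt] = counts[prev].get(nxt, 0) + 1
    return counts


def generate(counts, prompt, max_new_tokens=10):
    """Greedy decoding: repeatedly append the most frequent next token."""
    tokens = tokenize(prompt)
    if not tokens:
        return ""
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # the toy model has never seen this token
        next_token = max(followers, key=followers.get)
        if next_token == "<eos>":
            break  # the model predicts the end of the sentence
        tokens.append(next_token)
    return " ".join(tokens)


corpus = [
    "attrition rose in engineering",
    "attrition rose in sales",
    "hiring slowed in engineering",
]
counts = train_bigram_counts(corpus)
print(generate(counts, "hiring"))  # hiring slowed in engineering
```

Unlike this sketch, a transformer conditions each prediction on the whole preceding context rather than on the last token alone.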
Challenges and Solutions in Model Training:
- Experience with open-source language models and the challenges encountered, including the need for fine-tuning and effective prompt engineering.
- Few-shot learning and LoRA (Low-Rank Adaptation) are discussed as ways to adapt models when little training data is available.
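The idea behind LoRA can be sketched in a few lines: the original weight matrix W stays frozen and only two small matrices A and B are trained, so the effective weight becomes W + (alpha / r) · B·A. The sketch below uses plain Python lists and made-up dimensions; it shows the arithmetic and the parameter saving, not a training loop or any particular library's API.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A; W is frozen, only A and B would be trained."""
    r = len(a)  # the rank is the number of rows of A
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Illustrative sizes only; real layers are far larger.
d_out, d_in, rank = 64, 64, 4
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]     # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable, r x d_in
b = [[0.0] * rank for _ in range(d_out)]                               # trainable, d_out x r

full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(full_params, lora_params)  # 4096 512

# With B initialised to zero, the adapted layer starts out identical to the original.
assert lora_effective_weight(w, a, b, alpha=8) == w
```

The lower the rank relative to the layer size, the fewer parameters need to be trained and stored, which is what makes the approach attractive for smaller models and limited data.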
Evaluation and Quality Control:
- The importance of stable outputs, fast responses, and outputs that differ meaningfully when the input differs.
- Strategies for error detection and correction in generated text, including using secondary models for verification.
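A generate-then-verify loop might look like the sketch below. The verifier here is a simple rule (every number in the text must occur in the source facts) standing in for the secondary model mentioned in the talk; the function names, the fact fields, and the canned drafts are all hypothetical.

```python
import re


def extract_numbers(text):
    """Pull numeric literals such as 18 or 11.5 out of a text."""
    return {float(m) for m in re.findall(r"\d+(?:\.\d+)?", text)}


def numbers_are_grounded(narrative, facts):
    """Stand-in verifier: every number in the narrative must appear in the facts."""
    allowed = {float(v) for v in facts.values() if isinstance(v, (int, float))}
    return extract_numbers(narrative) <= allowed


def generate_verified(generate, verify, facts, max_attempts=3):
    """Call the generator until the verifier accepts its output or attempts run out."""
    for _ in range(max_attempts):
        narrative = generate(facts)
        if verify(narrative, facts):
            return narrative
    return None  # the caller decides how to fall back


facts = {"segment": "Engineering", "attrition_pct": 18, "baseline_pct": 11}

# Canned drafts play the role of the language model; the first one swaps two digits.
drafts = iter([
    "Attrition in Engineering reached 81%, above the 11% baseline.",
    "Attrition in Engineering reached 18%, above the 11% baseline.",
])
result = generate_verified(lambda f: next(drafts), numbers_are_grounded, facts)
print(result)
```

Passing the generator and verifier in as plain callables keeps the loop independent of whichever models are actually used for each role.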
Lessons Learned:
- The significance of data quality in model training.
- The rapid evolution of language models necessitates continuous adaptation and integration into products.

Methodology and Instructions

Using Language Models:
- Shift from template-based outputs to direct generation using language models.
- Apply few-shot learning to adapt the model with only a handful of examples.
- Utilize LoRA for efficient fine-tuning of smaller models.
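One common reading of few-shot learning is to place a handful of worked examples directly in the prompt. The sketch below assembles such a prompt; the instruction wording, the `Facts:` / `Narrative:` layout, and the example pairs are assumptions for illustration, not the prompts used in the talk.

```python
def build_few_shot_prompt(examples, new_facts):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    parts = ["Turn each set of HR facts into one sentence for a business user.", ""]
    for facts, narrative in examples:
        parts += [f"Facts: {facts}", f"Narrative: {narrative}", ""]
    # The prompt ends with an open slot that the model is expected to complete.
    parts += [f"Facts: {new_facts}", "Narrative:"]
    return "\n".join(parts)


# Made-up example pairs of (facts, desired narrative).
examples = [
    (
        "attrition, Engineering, 18% vs 11% overall",
        "Engineering is losing people faster than the company as a whole: 18% versus 11%.",
    ),
    (
        "time to hire, Sales, 62 days vs 45 days overall",
        "Filling a Sales role takes 62 days, more than two weeks longer than the 45-day average.",
    ),
]
prompt = build_few_shot_prompt(examples, "headcount, Support, 140 vs 120 last year")
print(prompt)
```

The same example pairs could also serve as training data for fine-tuning, so collecting them once supports both routes.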
Error Detection:
- Use a secondary model to verify outputs and correct errors.
- Employ statistical significance tests to identify meaningful insights in data.
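As one possible example of such a test, the sketch below runs a two-sided two-proportion z-test to check whether a team's attrition differs from the rest of the company by more than chance would explain. The summary does not say which tests Workday uses, and the counts and the 0.05 threshold are made up.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: leavers out of headcount in one team vs the rest of the company.
z, p = two_proportion_z_test(36, 200, 220, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")

# Only differences that pass the threshold would be surfaced as insights.
is_insight = p < 0.05
```

The function assumes the pooled proportion is strictly between 0 and 1; otherwise the standard error is zero and the division fails.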
Model Evaluation:
- Focus on response stability and speed during text generation.
- Ensure that different inputs yield clearly different outputs, which improves the user experience.
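Stability and differentiation can both be approximated with a text-similarity score: repeated runs on the same input should score high, runs on different inputs should score lower. The sketch below uses Jaccard similarity over word sets as a deliberately crude stand-in metric, and the sample outputs are invented.

```python
from itertools import combinations


def jaccard(text_a, text_b):
    """Share of distinct lowercase words that two texts have in common."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def mean_pairwise_similarity(texts):
    """Average Jaccard similarity over all pairs; needs at least two texts."""
    pairs = list(combinations(texts, 2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


# Hypothetical outputs: three runs on the same input ...
same_input_runs = [
    "attrition in engineering rose to 18%",
    "attrition in engineering rose to 18%",
    "engineering attrition rose to 18%",
]
# ... and one run on each of three different inputs.
different_input_runs = [
    "attrition in engineering rose to 18%",
    "time to hire in sales reached 62 days",
    "support headcount grew to 140",
]

stability = mean_pairwise_similarity(same_input_runs)
cross_input_overlap = mean_pairwise_similarity(different_input_runs)
print(f"stability={stability:.2f} cross-input overlap={cross_input_overlap:.2f}")
```

A high stability score combined with a low cross-input overlap suggests the model is both consistent and responsive to its input; response speed would be measured separately, for instance by timing each generation call.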

Speakers

Luděk Kopáček: Co-presenter from Workday, discussing the company's background and the application of language models.
Martin Koryťák: Co-presenter from Workday, focusing on the technical journey and challenges encountered with language models.