
Machine Learning Research
Synthetic Data Factory: AgentInstruct, a framework for generating diverse synthetic data for LLM fine-tuning
Researchers increasingly fine-tune models on synthetic data, but generated datasets may lack sufficient diversity. New work used agentic workflows to produce more varied synthetic datasets.