
Projects
Advanced Reasoning Benchmark (ARB) Project
Explore advanced prompting methods and measure how well LLMs solve academic problems.
Multimodal Dataset Project
Build a dataset for training multimodal agents.
Minerva-OCR Project
Build an OCR model, comparable in quality to proprietary OCR solutions, that draws a bounding box around every word in an image and labels each box with the corresponding word.
Exploring video pretraining for sample-efficient RL (aka #video-rl)
Improve the pretraining of video models for sample-efficient RL agents.
Non-English GPTs and better datasets and tokenizers (aka #polylingual)
Build higher-quality non-English datasets and tokenizers, and fine-tune non-English GPTs from a pretrained English GPT.
Learning Machine Learning (aka #learning)
Bring people up to speed on state-of-the-art (SoTA) machine learning.
