---
title: README
emoji: 📚
colorFrom: yellow
colorTo: red
sdk: static
pinned: false
---

# About Us

MosaicML’s mission is to make efficient training of ML models accessible. We continually productionize state-of-the-art research on efficient model training and study how these methods combine, so that model training is ✨ as optimized as possible ✨. These findings are baked into our highly efficient model training stack, the MosaicML platform.

If you have questions, please feel free to reach out to us on [Twitter](https://twitter.com/mosaicml), by [Email](mailto:community@mosaicml.com), or join our [Slack channel](https://join.slack.com/t/mosaicml-community/shared_invite/zt-w0tiddn9-WGTlRpfjcO9J5jyrMub1dg)!

# [LLM Foundry](https://github.com/mosaicml/llm-foundry/tree/main)

This repo contains code for training, finetuning, evaluating, and deploying LLMs for inference with [Composer](https://github.com/mosaicml/composer) and the [MosaicML platform](https://www.mosaicml.com/training).

# [Composer Library](https://github.com/mosaicml/composer)

The open source Composer library makes it easy to train models faster at the algorithmic level. It is built on top of PyTorch. Use our collection of speedup methods in your own training loop or, for the best experience, with our Composer trainer.

# [StreamingDataset](https://github.com/mosaicml/streaming)

Fast, accurate streaming of training data from cloud storage.

We built StreamingDataset to make training on large datasets from cloud storage as fast, cheap, and scalable as possible. It’s specially designed for multi-node, distributed training of large models, maximizing correctness guarantees, performance, and ease of use. Now you can train efficiently anywhere, independent of where your training data is stored. Just stream in the data you need, when you need it. To learn more about why we built StreamingDataset, read our [announcement blog](https://www.mosaicml.com/blog/mosaicml-streamingdataset).
StreamingDataset is compatible with any data type, including images, text, video, and multimodal data. It supports major cloud storage providers (AWS, OCI, and GCS are supported today; Azure is coming soon) and is designed as a drop-in replacement for your PyTorch [IterableDataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset) class, so it integrates seamlessly into your existing training workflows.

# [MosaicML Examples Repo](https://github.com/mosaicml/examples)

This repo contains reference examples for training ML models quickly and to high accuracy. It's designed to be easily forked and modified. It currently features the following examples:

* [ResNet-50 + ImageNet](https://github.com/mosaicml/examples#resnet-50--imagenet)
* [DeeplabV3 + ADE20k](https://github.com/mosaicml/examples#deeplabv3--ade20k)
* [GPT / Large Language Models](https://github.com/mosaicml/examples#large-language-models-llms)
* [BERT](https://github.com/mosaicml/examples#bert)

# [MosaicML Platform](https://mcli.docs.mosaicml.com/en/latest/getting_started/installation.html)

The proprietary MosaicML Platform enables you to easily train large AI models on your data, in your secure environment. With the MosaicML Platform, you can train large AI models at scale with a single command. We handle the rest: orchestration, efficiency, node failures, and infrastructure.

Our platform is fully interoperable, cloud agnostic, and enterprise proven. It also integrates seamlessly with your existing workflows, experiment trackers, and data pipelines.