So excited to finally see an LLM repo that provides the code AND the data used to train the model. We need to demand Open Data as well as Open Source in the field of machine learning/GenAI. Tip of the hat to the team behind OpenCoder: https://github.com/OpenCoder-llm/OpenCoder-llm