
| Footnotes | * The GPT-4 technical report contains no detail about "architecture (including model size), hardware, training compute, dataset construction, [or] training method" due to the "competitive landscape and the safety implications" of large LMs. ** A token can be either a word or sub-word, generated using a byte-pair encoding (BPE). BPE creates a smaller vocabulary by breaking down words into smaller sub-word units. It iteratively merges the most frequent pair of consecutive bytes in a corpus of text, gradually reducing the number of unique byte sequences until a desired vocabulary size is reached. |
| References | [1] Brown, Tom, et al. "Language models are few-shot learners." Advances in neural information processing systems 33 (2020): 1877-1901. [2] Huang, Shaohan, et al. "Language Is Not All You Need: Aligning Perception with Language Models." arXiv preprint arXiv:2302.14045 (2023). [3] Alayrac, Jean-Baptiste, et al. "Flamingo: a visual language model for few-shot learning." arXiv preprint arXiv:2204.14198 (2022). [4] Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022). [5] Radford, Alec, et al. "Robust speech recognition via large-scale weak supervision." arXiv preprint arXiv:2212.04356 (2022). [6] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017). [7] Hao, Yaru, et al. "Language models are general-purpose interfaces." arXiv preprint arXiv:2206.06336 (2022). [8] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020). [9] Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International conference on machine learning. PMLR, 2021. [10] Ouyang, Long, et al. "Training language models to follow instructions with human feedback." arXiv preprint arXiv:2203.02155 (2022). [11] Bai, Yuntao, et al. "Constitutional AI: Harmlessness from AI Feedback." arXiv preprint arXiv:2212.08073 (2022). [12] OpenAI. "GPT-4 Technical Report", OpenAI (2023) [13] Wei, Jason, et al. "Chain of thought prompting elicits reasoning in large language models." arXiv preprint arXiv:2201.11903 (2022). |
| Author | John Hughes & Lawrence Atkins |
| Acknowledgements | Edward Rees, Ellena Reid, Liam Steadman, Markus Hennerbichler |




![[alt: Concentric circles radiate outward from a central orange icon with a white Speechmatics logo. The background is dark blue, enhancing the orange glow. A thin green line runs horizontally across the lower part of the image.]](/_next/image?url=https%3A%2F%2Fimages.ctfassets.net%2Fyze1aysi0225%2F4jGjYveRLo3sKjzBzMIXXa%2F11e90a40df418658e9c15cb1ecff4e4b%2FBlog_image-wide-carousel.webp&w=3840&q=75)
![[alt: Logo design featuring the text "SPEECHMATICS" alongside a stylized logo for "Cekura," set against a soft green background with subtle curved lines.]](/_next/image?url=https%3A%2F%2Fimages.ctfassets.net%2Fyze1aysi0225%2F39N1Yr95B2jvfd7JKGihq0%2F7b1ca5f8d5db0235b64829dcab16b96a%2FSpeechmatics_partners_with_Cekura-wide-carousel.webp&w=3840&q=75)