This paper from Meta, dated November 18, 2025, details the development, training, and evaluation of Segment Anything Model 3 (SAM 3), a promptable segmentation model for images and videos. A major focus is the creation of the Segment Anything with Concepts (SA-Co) benchmark, built with a multi-stage data engine that combines noisy pseudo-labels, human annotators, and AI verifiers to produce high-quality, large-scale training data with broad ontological coverage of concepts. The paper also covers model architecture components, such as temporal disambiguation strategies for multi-object tracking in videos and an ambiguity head that handles phrases with multiple valid interpretations. Finally, it presents extensive quantitative results comparing SAM 3's performance against state-of-the-art models on tasks such as instance segmentation and object counting. Source: https://scontent-sjc6-1.xx.fbcdn.net/v/t39.2365-6/586037495_2236299700208804_3520531923593328648_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=nmZfwAXlWFIQ7kNvwGuKXcX&_nc_oc=Adnm9S5A81iwt1v5NK0_vEawxh12xF9LXksgiuxyQBYKt0QgFzDZlMMCfu1GtGLRR7g&_nc_zt=14&_nc_ht=scontent-sjc6-1.xx&_nc_gid=1CWvrmVm88pkpnwup5jdnA&oh=00_AfjvGlCU_0PFdvGqnjcfyQuKxfa3Qz18c_452htHpqMptw&oe=69251C89