Why "Better Model" is Not the Same as "Better Outcome"
Foundation models have moved fast and SAM 3 is genuinely a step forward on segmentation. But a deployed vision system is more than a model. It is camera discovery and stream handling, frame sampling, scheduling across many cameras on modest hardware, event definitions that match how a site actually works, deduplication and noise control, evidence routing into the tools the team already uses, multi-site management, and a path from "I had an idea" to "the agent is running" that does not require a research team.
Most of the value in a production deployment lives in those layers. The model is one component. Plenty of organisations have stalled trying to take a brilliant model from a research demo to a system that survives on a site for six months without an engineer touching it.
The Hardware Reality
SAM 3 at 848M parameters is not something you run on a small NUC under the till in the back office. It expects GPU-class compute. That is the right design choice for a research-grade foundation model and the wrong shape of bill for most real sites. Manako composes specialist skills sized to the hardware that actually exists in the field. We choose the skill on the model versus latency versus cost trade-off per task, and we make that choice for you.
How Manako Uses Models Like SAM 3
A Vision Agent is not one model trying to do everything. It is a composition of specialist vision skills, each trained to do one thing extremely well. When you describe what you need watched for, the platform selects the exact skills the job requires and composes them. Some skills are best served by a small, fast specialist that fits on cheap hardware. Some are best served by a foundation model where the value justifies the cost. The platform makes that choice based on accuracy, latency and the hardware you actually have, not on which model is in the press cycle.
Manako benchmarks its skills against independent ground truth. SAM 3 is one of the public references in those benchmarks. We are not in competition with the model. We compose with it where it is the right tool.
Where Each is the Right Call
Choose SAM 3 When:
- You are an ML team building a vision product and you want a powerful segmentation foundation to fine-tune on.
- You need raw segmentation masks and you have the engineering capacity to ship them into production.
- You are doing research or labelling at scale and want a strong zero-shot starting point.
- You have the GPU budget and the people to maintain a production CV pipeline.
Choose Manako When:
- You have cameras. You have a problem. You want it solved this week, on the hardware that already exists at the site.
- You do not want to staff up an ML team to deploy what should be a configuration.
- You want the model choice to be made by the platform based on the job and the hardware, not by your engineers based on Twitter.
- You need the evidence to land with your duty manager, not in a SQL table somebody has to query.
Conclusion
SAM 3 is a model. Manako is a product. We benchmark against SAM 3 because it is one of the best public references in segmentation. We do not compete with it. We compose with it where the hardware and the job allow.

