Who here is interested in talking about optimization? 🤖

My research is in submodular optimization for multi-robot coordination, and I am applying these methods to coordinate teams of aerial robots to film moving actors such as animal groups or athletes in team sports!

Thread 👇

Many perception objectives are submodular. For example, if you want to maximize coverage over the surface of a group of moving actors, you get a submodular objective: each additional camera view tends to overlap what is already covered, so marginal gains diminish.

From there, you can develop efficient and distributed algorithms to coordinate the team to maximize the quality of views.
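To make that concrete, here is a minimal sketch of sequential greedy assignment for a submodular coverage objective, in the spirit of the methods above. The toy coverage model, candidate views, and all names are invented for illustration, not taken from any real system:

```python
# Minimal sketch: sequential greedy assignment for a submodular
# coverage objective. Each robot, in turn, picks the candidate view
# with the largest *marginal* gain given the views chosen so far.
# (All names and the toy coverage model are illustrative.)

def coverage(views):
    """Submodular objective: count of distinct surface faces covered."""
    covered = set()
    for faces in views:
        covered |= faces
    return len(covered)

def sequential_greedy(candidates_per_robot):
    """candidates_per_robot: one list of candidate views per robot,
    each view given as the set of faces it would cover."""
    chosen = []
    for candidates in candidates_per_robot:
        base = coverage(chosen)
        # Pick the candidate with the largest marginal coverage gain.
        best = max(candidates, key=lambda v: coverage(chosen + [v]) - base)
        chosen.append(best)
    return chosen

# Toy example: three robots filming an actor whose surface has faces 0-5.
robots = [
    [{0, 1, 2}, {0, 1}],  # robot 1's candidate views
    [{0, 1, 2}, {3, 4}],  # robot 2's
    [{0, 1}, {4, 5}],     # robot 3's
]
plan = sequential_greedy(robots)
print(plan)            # [{0, 1, 2}, {3, 4}, {4, 5}]
print(coverage(plan))  # 6 -- all faces covered
```

Part of the appeal is that, for monotone submodular objectives, this kind of sequential greedy selection comes with constant-factor suboptimality guarantees despite being this simple.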

This even works well in pretty general cases, such as optimizing receding-horizon trajectories (e.g. dynamic motions) with path costs, terminal costs/constraints, and (some) collision-avoidance constraints. Likewise, camera and sensor models can incorporate noise or raytracing against a complex environment without violating the underlying assumptions.
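As a rough sketch (my own notation, not from this thread), that receding-horizon problem might be written as:

```latex
% Sketch of a receding-horizon formulation (illustrative notation):
% each robot i selects a trajectory x_i over the planning horizon;
% J is a monotone submodular view reward over the set of trajectories,
% and c_path, c_term are path and terminal costs.
\max_{x_1,\dots,x_n}\;
  J\big(\{x_i\}_{i=1}^{n}\big)
  - \sum_{i=1}^{n}\big(c_{\mathrm{path}}(x_i) + c_{\mathrm{term}}(x_i)\big)
\quad\text{s.t.}\quad
  x_i \in \mathcal{X}_{\mathrm{dyn}} \cap \mathcal{X}_{\mathrm{free}}
  \;\;\forall i
```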

However, there is another part to this story. While the methods I work with are powerful and often easy to implement, they are studied primarily from a theoretical perspective. And while these methods have been implemented on multi-robot systems, to the best of my knowledge these implementations have rarely demonstrated significant impacts on task performance.

The reasons why this happens are a little fuzzy. In my understanding, though, it often comes down to implicit coordination. Applications that involve generating or maintaining a shared model (e.g. tracking targets or mapping a building) often exhibit implicit coordination simply through sharing models or sensor data: knowing what the other robots have already observed is often enough to make a good decision, because good performance can be achieved just by observing something else.

This is where problems like multi-robot videography come in. In these settings, you want to optimize immediate views, and your rewards for observations at one moment might be independent of observations at any other time.

The result is that you end up with system dynamics quite different from those in other domains. In videography applications, robots without coordination tend to collapse to common equilibria (e.g. everyone films the front of the most important person).

So explicit coordination via methods like greedy submodular optimization ends up being critical to task performance (e.g. for maintaining a diversity of viewpoints that evenly cover a group of people).
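Here is a toy contrast of those two regimes (all numbers and view names invented for illustration): without coordination, every robot independently picks its single best view and they all collapse onto the same shot; with sequential greedy selection on marginal gains, the viewpoints spread out.

```python
# Toy contrast (invented numbers and view names): uncoordinated robots
# collapse onto the same best view; greedy selection on marginal
# coverage gains spreads them out.

views = {"front": {0, 1, 2}, "left": {2, 3}, "right": {4, 5}}
num_robots = 3

# Without coordination: each robot maximizes its own coverage in isolation.
solo = [max(views, key=lambda v: len(views[v])) for _ in range(num_robots)]
print(solo)  # ['front', 'front', 'front'] -- everyone films the front

# With coordination: sequential greedy on marginal coverage.
covered, plan = set(), []
for _ in range(num_robots):
    best = max(views, key=lambda v: len(views[v] - covered))
    plan.append(best)
    covered |= views[best]
print(plan)  # ['front', 'right', 'left'] -- diverse, even coverage
```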

And, this is the gist of what I have been working on for the last year or so 😄🤖🛩️