Abstract: Human action understanding serves as a foundational pillar in the field of intelligent motion perception. Skeletons serve as a modality- and device-agnostic representation for human modeling, ...
Perception Encoder (PE) is the core vision stack in Meta’s Perception Models project. It is a family of encoders for images, video, and audio that reaches state of the art on many vision and audio ...
Abstract: The rapid expansion of aerial vehicle applications in the low-altitude economy (LAE) requires reliable scene understanding to support safe and effective urban operations. However, existing ...