Every event nowadays, from social gatherings to terrorist attacks, is massively captured by multiple cameras and instantly uploaded to the internet. Although several of these images may carry timestamps embedded in their metadata, such timestamps are not always correct, due to incorrect registration or even deliberate alteration, and are therefore not trustworthy enough to infer time information and reconstruct the order of the facts. Nonetheless, in a forensic investigation, mining temporal information is still indispensable to fully understand and reconstruct such an event, as well as to fact-check and organize the flood of information regarding the incident. Beyond digital forensics, temporal information can help researchers explain how human interaction shapes our environment and how these interactions change over time, as well as the development and consequences of natural or human events, such as the passage of a hurricane, a military conflict, or the evolution of cities over the years. Temporal reasoning is also an important tool to understand how human trends, behaviors, and tastes have evolved, e.g., in fashion and architectural styles.

Even though time has a single direction, its passage can be perceived in different ways. The visual evidence used to identify the flow of time between two moments of the same scene strongly depends on the scene's semantics, the elements present in it, and the amount of time elapsed between those moments. The most common hints are dynamic elements (e.g., the movement of objects and people through the scene), differences in illumination and shadows caused by the sun's movement across the sky, the presence or absence of weather effects (e.g., snow, fog, rain), variations caused by the seasons (e.g., trees losing their leaves in the fall), and signs of human intervention (e.g., the construction of a building or a landmark).
In this vein, the objective of this work is to propose techniques capable of reasoning about time using visual information present in pictures and videos. More specifically, we envision methods capable of inferring the chronological order of a group of media depicting the same semantic context.

We have split this task into smaller ones according to the time span covered by the media pieces. The importance of a particular piece of visual evidence when inferring temporal order strongly depends on how much time has passed between captures. Considering this, we plan to investigate both how to model this evidence and how to weight it during inference. We will explore traditional hand-crafted features (i.e., features designed with domain knowledge to capture a particular characteristic of the image) and data-driven techniques (i.e., methods that learn the important features directly from the raw pixels) suited to each proposed time span and type of evidence.

In this work, we will also investigate how the volume of previous knowledge about the context being analyzed, i.e., the amount of training samples from that specific place or event of interest, affects the temporal prediction. In this sense, we intend to explore whether a more precise prediction can be achieved by devising a solution specialized in a particular context, or whether a method trained with experience from multiple contexts obtains better results.

Finally, this proposal is part of the thematic project "DéjàVu: Feature-Space-Time coherence from heterogeneous data for media integrity analytics and active interpretation of events", which aims at organizing, synchronizing, and harnessing information from multiple media sources to better understand what happened before, during, and shortly after an event. Our proposal fits into the core of DéjàVu, with the goal of chronologically organizing the available imagery to better comprehend the order of the facts.
Additionally, we aim at mining temporal information that other DéjàVu techniques can leverage in their pipelines.