Computer Vision for Spatio-temporal Analysis of Internet Photo Collections
Author: Kevin Matzen
Total Pages: 272
Release: 2016
The advent of the digital camera, and subsequently the smartphone, has ushered in an unprecedented age of photography in which a large fraction of the human population carries an Internet-enabled camera in their pocket. Perhaps more remarkable is that many of these people are willing to share their experiences publicly with the rest of the world by uploading photos to social media platforms such as Flickr, Facebook, and Instagram, on the order of billions of photos per day. However, making use of these photos in a shared setting that leverages the uniquely visual nature of this medium is non-trivial. What does this photo contain? Where was it taken? What was the structure of the scene? What trends can we observe? What is the story behind this photo? These are the sorts of questions one might like to answer by applying computer vision to an immense corpus of imagery, effectively peering through billions of windows into the world with automation.
This dissertation presents three methods for such large-scale analyses. Each helps a human analyst understand properties of the real world related to space and time by automatically analyzing, cataloging, and visualizing Internet-scale photo collections. The methods are evaluated on millions or tens of millions of photos, but are designed to scale horizontally on modern data processing platforms.
First, I present a method for aggregating millions of photographs of a physical space in the world (e.g., a city) and building a 3D reconstruction with time-varying appearance that captures the evolution of that space over time. This method includes a segmentation algorithm that recovers temporally consistent elements, and these elements can in turn be detected in new imagery to predict when a photo was taken.
Next, I present a method that aggregates millions of photographs and uses state-of-the-art convolutional neural networks to mine small discriminative patches from the imagery. The patches are discriminative in the sense that, if the goal is to classify an image into one of two categories, these patches could be used in lieu of the full image. Experiments validate the method on a wide variety of datasets and tasks; of most relevance is its use in showing an analyst which visual elements distinguish one city from another by building architecture, or one location or time period from another by observed fashion style.
Finally, I present a method, and an in-depth case study, for using millions of photographs to identify and visualize fashion style trends across the world and across years.
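To make the idea of dating a photo from detected scene elements concrete, here is a minimal sketch under a simplifying assumption that is not taken from the dissertation: each temporally consistent element is summarized by a single lifespan interval (for example, the years a storefront sign existed), and a photo's date must lie in the intersection of the lifespans of every element detected in it. The `Interval` type and `estimate_date_range` function are hypothetical names for illustration; the actual method may reason probabilistically rather than by hard interval intersection.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical representation: an element is visible from `start` to `end`
# (inclusive), e.g. the first and last year a billboard appeared in the scene.
Interval = Tuple[int, int]


def estimate_date_range(detected: Iterable[Interval]) -> Optional[Interval]:
    """Intersect the lifespans of all elements detected in one photo.

    Returns the (earliest, latest) date consistent with every detection,
    or None if nothing was detected or the detections contradict each other.
    """
    lo: Optional[int] = None
    hi: Optional[int] = None
    for start, end in detected:
        # The photo cannot predate the newest element or postdate the oldest removal.
        lo = start if lo is None else max(lo, start)
        hi = end if hi is None else min(hi, end)
    if lo is None or hi is None or lo > hi:
        return None
    return (lo, hi)


if __name__ == "__main__":
    # A sign visible 2004-2012 and an awning visible 2009-2015 narrow the
    # photo's date to 2009-2012.
    print(estimate_date_range([(2004, 2012), (2009, 2015)]))
```

An empty intersection is reported as `None` rather than guessed at, since it usually signals a false detection that a fuller model would need to down-weight.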
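One common way to score how "discriminative" a candidate patch is between two image categories is nearest-neighbor purity: a patch whose closest matches in feature space come almost entirely from its own category is a good stand-in for that category. The sketch below shows this generic technique; it is not claimed to be the dissertation's mining procedure. It assumes patch descriptors (for example, CNN activations) have already been extracted, and `rank_patches_by_purity` is a hypothetical name.

```python
from typing import Tuple

import numpy as np


def rank_patches_by_purity(
    features: np.ndarray, labels: np.ndarray, k: int = 5
) -> Tuple[np.ndarray, np.ndarray]:
    """Rank positive-class patches by the purity of their k nearest neighbors.

    features: (n, d) array of patch descriptors (assumed precomputed).
    labels:   (n,) array of 0/1 category labels of each patch's source image.
    Returns (indices of positive patches, purity scores), best first.
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T
    np.fill_diagonal(sims, -np.inf)  # a patch is not its own neighbor

    pos = np.flatnonzero(labels == 1)
    # Indices of the k most similar patches for each positive patch.
    nn = np.argsort(-sims[pos], axis=1)[:, :k]
    # Purity: fraction of those neighbors that also come from the positive class.
    purity = labels[nn].mean(axis=1)

    order = np.argsort(-purity, kind="stable")
    return pos[order], purity[order]


if __name__ == "__main__":
    features = np.array([
        [1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.95, 0.05],  # positive cluster
        [0.0, 1.0], [0.1, 0.9], [0.0, 1.1],                # negative cluster
        [0.1, 1.0],                                        # positive, but looks negative
    ])
    labels = np.array([1, 1, 1, 1, 0, 0, 0, 1])
    idx, scores = rank_patches_by_purity(features, labels, k=2)
    print(idx, scores)
```

In the toy data, the last positive patch sits among negative patches, so its neighbors are impure and it should rank last; patches like it would be poor substitutes for the full image in a two-way classification.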
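Measuring a style "trend" across places and years requires some normalization, because the number of photos uploaded per city and per year varies widely. A minimal sketch, assuming a hypothetical upstream classifier has already labeled each detected person with a single style, is to report each style's share of detections within a (city, year) bucket rather than its raw count. The record format and the `style_share` function are illustrative assumptions, not the dissertation's pipeline.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# Hypothetical input: one record per detected person, (city, year, style).
Record = Tuple[str, int, str]
Bucket = Tuple[str, int]


def style_share(records: Iterable[Record]) -> Dict[Bucket, Dict[str, float]]:
    """Fraction of detections in each (city, year) bucket showing each style.

    Dividing by the bucket total keeps growth in upload volume from
    masquerading as growth in a style's popularity.
    """
    counts: DefaultDict[Bucket, Counter] = defaultdict(Counter)
    for city, year, style in records:
        counts[(city, year)][style] += 1

    shares: Dict[Bucket, Dict[str, float]] = {}
    for bucket, c in counts.items():
        total = sum(c.values())
        shares[bucket] = {style: n / total for style, n in c.items()}
    return shares


if __name__ == "__main__":
    records = [
        ("Paris", 2013, "scarf"), ("Paris", 2013, "hat"),
        ("Paris", 2014, "scarf"), ("Paris", 2014, "scarf"),
        ("Paris", 2014, "scarf"), ("Paris", 2014, "hat"),
    ]
    for bucket, share in sorted(style_share(records).items()):
        print(bucket, share)
```

Plotting one style's share against year, one line per city, gives a simple trend visualization of the kind the abstract describes.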