Thursday, November 29, 2018

Google+ Migration - Part VI: Location, Location, Location!


Before we focus on putting all the pieces together, here is a small, optional excursion into how to make use of the location information contained in G+ posts.

We should consider carefully whether and how we want to include geo location information, as there might be privacy and safety implications. For sensitive locations, it can make sense to use the position of a nearby landmark instead, or to add some random noise to the location coordinates.
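A minimal sketch of the random-noise option, assuming a flat-earth approximation that is good enough for offsets of a few hundred metres away from the poles. The names fuzz_location and max_offset_m are hypothetical; the function picks a point uniformly inside a disc around the original coordinates.

```python
import math
import random

# Approximate metres per degree of latitude (about 111 km); precise enough for noise.
METERS_PER_DEGREE = 111_320.0

def fuzz_location(lat, lon, max_offset_m=500.0, rng=random):
    """Shift (lat, lon) by a random offset of at most max_offset_m metres."""
    # sqrt() makes the point uniformly distributed over the disc's area.
    distance = max_offset_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    d_north = distance * math.cos(bearing)
    d_east = distance * math.sin(bearing)
    # Degrees of longitude shrink with the cosine of the latitude.
    new_lat = lat + d_north / METERS_PER_DEGREE
    new_lon = lon + d_east / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return new_lat, new_lon

# Example: move a point by up to 300 m (seeded so the result is repeatable).
print(fuzz_location(40.7414688, -74.0033873, max_offset_m=300.0, rng=random.Random(42)))
```

Note that the displayName and physicalAddress fields of a takeout location record may still contain the exact street address, so they would need to be replaced as well when the coordinates are fuzzed.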

Many of my public photo-sharing posts contain the location where, or near where, the photos were taken. Diaspora* posts can contain a location tag as well, but it does not seem to be very informative, and the diaspy API currently does not support adding a location to a post.

Instead, we can process the location information contained in the post JSON files of the takeout archive and extract information that we can use to format the new posts.
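As an alternative to feeding file names on standard input, a small generator could walk the takeout directory directly. This sketch assumes that each post is stored as its own JSON file somewhere below that directory; iter_post_locations is a hypothetical helper, not part of any takeout tooling.

```python
import json
from pathlib import Path

def iter_post_locations(takeout_dir):
    """Yield (path, location) for every post JSON file that has a location."""
    for path in sorted(Path(takeout_dir).rglob('*.json')):
        with open(path, encoding='utf-8') as f:
            try:
                post = json.load(f)
            except (json.JSONDecodeError, UnicodeDecodeError):
                continue  # skip files that are not valid UTF-8 JSON
        if isinstance(post, dict) and 'location' in post:
            yield path, post['location']
```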

In particular, we want to include a link to the corresponding location on OpenStreetMap and generate some additional hashtags from the location information, e.g. the country or city the post relates to.

Using the latitude and longitude coordinates from the location info, we can link directly to the corresponding location on OpenStreetMap or other online mapping services. A location record in the takeout JSON looks like this:

"location": {
  "latitude": 40.7414688,
  "longitude": -74.0033873,
  "displayName": "111 8th Ave",
  "physicalAddress": "111 8th Ave, New York, NY 10011, USA"
}
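openstreetmap.org also accepts mlat and mlon parameters that, as far as I know, place a marker at the given point, while the #map=zoom/lat/lon fragment sets the view. A sketch of building such a link, assuming Markdown output as used for the diaspora* posts; osm_marker_url and markdown_link are hypothetical helpers:

```python
from urllib.parse import urlencode

def osm_marker_url(lat, lon, zoom=17):
    """Build an openstreetmap.org URL centred on (lat, lon) with a marker."""
    # mlat/mlon place the marker; the #map fragment sets zoom and map centre.
    query = urlencode({'mlat': lat, 'mlon': lon})
    return 'https://www.openstreetmap.org/?%s#map=%d/%s/%s' % (query, zoom, lat, lon)

def markdown_link(text, url):
    """Format a Markdown link, escaping brackets that would break the syntax."""
    text = text.replace('[', '\\[').replace(']', '\\]')
    return '[%s](%s)' % (text, url)
```

For the record above, markdown_link('111 8th Ave', osm_marker_url(40.7414688, -74.0033873)) links to https://www.openstreetmap.org/?mlat=40.7414688&mlon=-74.0033873#map=17/40.7414688/-74.0033873.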

To extract hierarchical location information such as the country or city, we call the OpenStreetMap reverse-geocoding API (Nominatim) with the coordinates to find the nearest recorded address of that point. To simplify calling the web API, we can use the geopy library (install it, for example, with pip install geopy).
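Under the hood, this is roughly a plain HTTP request to Nominatim's /reverse endpoint. The sketch below uses only the standard library; reverse_url and reverse_geocode are hypothetical names, and the one-request-per-second throttle and the identifying User-Agent header follow Nominatim's usage policy for the public server.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

NOMINATIM_REVERSE = 'https://nominatim.openstreetmap.org/reverse'

def reverse_url(lat, lon, zoom=18):
    """Build a Nominatim reverse-geocoding URL for the given coordinates."""
    params = {'format': 'jsonv2', 'lat': lat, 'lon': lon,
              'zoom': zoom, 'addressdetails': 1}
    return NOMINATIM_REVERSE + '?' + urlencode(params)

_last_request = 0.0

def reverse_geocode(lat, lon, user_agent='gplus_migration'):
    """Return the 'address' dict of the nearest Nominatim match, or {}."""
    global _last_request
    # Wait so that consecutive requests are at least one second apart.
    wait = 1.0 - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    request = urllib.request.Request(reverse_url(lat, lon),
                                     headers={'User-Agent': user_agent})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    _last_request = time.monotonic()
    # Failed lookups return an 'error' key instead of an 'address'.
    return data.get('address', {})
```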

From various components of the address, we can generate location hashtags that help define the context of the post. Using the additional pycountry module, which contains a library of canonical country names indexed by ISO 3166 country codes, is entirely optional but helps to create more consistent labels.
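To make the tag generation testable without network access, the address handling can be separated from the geocoding call. A sketch, assuming a Nominatim-style address dict such as the 'address' part of the reverse-geocoding result. pycountry is treated as optional: when it is missing or does not know the code, we fall back to the geocoder's own country field, which may be localised. All function names here are hypothetical.

```python
import re

try:
    import pycountry  # optional: canonical ISO 3166 country names
except ImportError:
    pycountry = None

def to_hashtag(name):
    # Hashtags end at spaces and most punctuation, so drop those characters.
    return re.sub(r'\W', '', name)

def country_name(cc):
    """Canonical country name for an ISO 3166 alpha-2 code, or None."""
    if pycountry is None:
        return None
    try:
        country = pycountry.countries.get(alpha_2=cc)
    except KeyError:  # older pycountry versions raise instead of returning None
        return None
    return country.name if country is not None else None

def hashtags_from_address(addr, lookup=country_name):
    """Turn a Nominatim-style address dict into a list of hashtag strings."""
    tags = []
    cc = addr.get('country_code', '').upper()
    if cc:
        tags.append(cc)
        # Prefer the canonical name; fall back to the geocoder's country field.
        name = lookup(cc) or addr.get('country')
        if name:
            tags.append(to_hashtag(name))
    for key in ('city', 'town', 'village'):
        if key in addr:
            tags.append(to_hashtag(addr[key]))
    return tags
```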

For the location record above, we can generate the following additional content snippets:

#US #UnitedStates #NYC
[111 8th Ave](https://www.openstreetmap.org/?lat=40.7414688&lon=-74.0033873&zoom=17)

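#!/usr/bin/env python3

import json
import re
import sys

import geopy.geocoders
import pycountry
from geopy.extra.rate_limiter import RateLimiter

geocoder = geopy.geocoders.Nominatim(user_agent='gplus_migration', timeout=None)
# Nominatim's usage policy allows at most one request per second.
reverse = RateLimiter(geocoder.reverse, min_delay_seconds=1)

def to_hashtag(name):
  # Hashtags end at spaces and punctuation, so remove those characters.
  return re.sub(r'\W', '', name)

def get_location_hashtags(loc):
  hashtags = []
  if 'latitude' in loc and 'longitude' in loc:
    # Look up the nearest recorded address for the coordinates.
    result = reverse((loc['latitude'], loc['longitude']))
    addr = result.raw.get('address', {}) if result is not None else {}
    cc = addr.get('country_code', '').upper()
    if cc:
      hashtags.append(cc)
      try:
        country = pycountry.countries.get(alpha_2=cc)
      except KeyError:  # older pycountry versions raise for unknown codes
        country = None
      if country is not None:
        hashtags.append(to_hashtag(country.name))
    for location in ['city', 'town', 'village']:
      if location in addr:
        hashtags.append(to_hashtag(addr[location]))
  return hashtags

def get_location_link(loc):
  # Markdown link to the location on OpenStreetMap, or None if data is missing.
  if 'latitude' in loc and 'longitude' in loc and 'displayName' in loc:
    map_url = ('https://www.openstreetmap.org/?lat=%s&lon=%s&zoom=17' % (loc['latitude'], loc['longitude']))
    return '[%s](%s)' % (loc['displayName'], map_url)
  return None

# Print non-ASCII place names correctly (Python 3.7+).
sys.stdout.reconfigure(encoding='utf-8')

# Read one takeout JSON filename per line from standard input.
for filename in sys.stdin:
  filename = filename.strip()
  if not filename:
    continue
  with open(filename, encoding='utf-8') as f:
    post = json.load(f)
  if 'location' in post:
    hashtags = get_location_hashtags(post['location'])
    if hashtags:
      print(' '.join('#' + tag for tag in hashtags))
    link = get_location_link(post['location'])
    if link:
      print(link)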