Sunday, November 18, 2018

Google+ Migration - Part IV: Visibility Scope & Filtering

<- Part III: Content Transformation

Circles, and with them the ability to share different content with different sets of people, were one of the big differentiators of Google+ over other platforms at the time, which typically had a fixed sharing model and visibility scope.

Circles were based on the observation that most people in real life interact with several "social circles" and often would not want these circles to mix. The idea of Google+ was that it should be possible to manage all these different circles under a single online identity (which should also match the "real name" identity of our government's civil registry).

It turns out that while the observation of disjoint social circles was correct, most users prefer to use different platforms and online identities to make sure their circles don't inadvertently mix. Google+ tried hard to make sharing scopes obvious and unsurprising, but the model remained complex and hard to understand, and accidents were only ever one or two mouse-clicks away.

Nevertheless, many takeout archives may contain posts that were intended for very different audiences, with visibility settings that may still matter deeply to users. Here we present a tool that helps to analyze the sharing scopes present in a takeout archive and to partition its content by selecting any subset of them.

The access control section (ACL) of each post has grown even more complex over time with the introduction of communities and collections. In particular, there seem to be the following distinct ways of defining the visibility of a post (some of which can be combined):
  • Public
  • Shared with all my circles
  • Shared with my extended circles (users in all my circles and their circles, presumably)
  • Shared with a particular circle
  • Shared with a particular user
  • Part of a collection (private or public)
  • Part of a community (closed or public)
Since my archive does not contain all these combinations, the code for processing the JSON definition of the post sharing and visibility scope is based on the following inferred schema definition. Please report if you encounter any deviation from this structure.
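For orientation, the structure that the parse_acl function in the script at the end of this post expects can be sketched as a Python dictionary. This is reconstructed only from the keys that parse_acl reads, not from any official documentation. All IDs and display names are invented placeholders, the resource name prefixes other than circles/ and communities/ are guesses, and a real post would normally contain only a subset of these sections. The helper unexpected_keys is hypothetical and merely illustrates one way to spot deviations worth reporting.

```python
# Top-level sections of 'postAcl' that parse_acl() knows about.
KNOWN_SECTIONS = ('visibleToStandardAcl', 'collectionAcl',
                  'communityAcl', 'eventAcl')

# Hypothetical 'postAcl' value. Keys mirror what parse_acl() reads;
# every ID and display name is an invented placeholder.
EXAMPLE_POST_ACL = {
    'visibleToStandardAcl': {
        'circles': [
            {'type': 'CIRCLE_TYPE_PUBLIC'},
            {'type': 'CIRCLE_TYPE_USER_CIRCLE',
             'resourceName': 'circles/EXAMPLE-CIRCLE-ID',
             'displayName': 'Example circle'},
        ],
        'users': [
            {'resourceName': 'users/EXAMPLE-USER-ID',
             'displayName': 'Example user'},
        ],
    },
    'collectionAcl': {
        'collection': {'resourceName': 'collections/EXAMPLE-COLLECTION-ID',
                       'displayName': 'Example collection'},
    },
    'communityAcl': {
        'community': {'resourceName': 'communities/EXAMPLE-COMMUNITY-ID',
                      'displayName': 'Example community'},
        'users': [],
    },
    'eventAcl': {
        'event': {'resourceName': 'events/EXAMPLE-EVENT-ID',
                  'displayName': 'Example event'},
    },
}


def unexpected_keys(acl):
    """Return the sorted top-level keys of a postAcl dict that are not known."""
    return sorted(key for key in acl if key not in KNOWN_SECTIONS)
```

Running unexpected_keys over the postAcl of every post in an archive would list any sections that the inferred structure does not yet cover.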

After saving the Python code below in a file, e.g. post_filter.py, and making it executable (chmod +x post_filter.py), we can start by analyzing the visibility scopes that exist in a list of post archive files:

$ ls ~/Desktop/Takeout/Google+\ Stream/Posts/*.json | ./post_filter.py
1249 - PUBLIC 
227 - CIRCLE (Personal): circles/117832126248716550930-4eaf56378h22b473 
26 - ALL CIRCLES 
20 - COMMUNITY (Alte Städte / Old Towns): communities/103604153020461235235
15 - EXTENDED CIRCLES 
9 - COMMUNITY (Raspberry Pi): communities/113390432655174294208 
1 - COMMUNITY (Google+ Mass Migration): communities/112164273001338979772 
1 - COMMUNITY (Free Open Source Software in Schools): communities/100565566447202592471

For my own purposes, I would consider all public posts as well as posts to public communities as essentially public and any posts that were restricted to any circles as essentially private. By carefully copying the community IDs from the output above, we can create the following filter condition to select only the filenames of these essentially public posts from the archive:

$ ls ~/Desktop/Takeout/Google+\ Stream/Posts/*.json | ./post_filter.py --public --id communities/113390432655174294208 --id communities/103604153020461235235 --id communities/112164273001338979772
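The selection rule behind this command is an "any of" match: a post is selected as soon as one of its visibility entries is named by a flag or by an --id value. The following is a minimal standalone sketch of that rule, assuming entries are (kind, resource name, display name) tuples as produced by parse_acl. The function name post_matches and the placeholder circle ID are made up for illustration.

```python
# Scope names that carry no ID of their own, as printed by post_filter.py.
NAMED_SCOPES = ('PUBLIC', 'ALL CIRCLES', 'EXTENDED CIRCLES')


def post_matches(entries, scopes):
    """Return True if any (kind, resource_name, display_name) entry is selected."""
    for kind, resource_name, _display_name in entries:
        # --public, --circles and --ext-circles select by scope name.
        if kind in NAMED_SCOPES and kind in scopes:
            return True
        # --id selects by the resource name of a circle, community, etc.
        if resource_name in scopes:
            return True
    return False


# Scopes as they would result from: --public --id communities/113390432655174294208
scopes = {'PUBLIC', 'communities/113390432655174294208'}

community_post = [('COMMUNITY', 'communities/113390432655174294208', 'Raspberry Pi')]
circle_post = [('CIRCLE', 'circles/EXAMPLE-CIRCLE-ID', 'Personal')]

print(post_matches(community_post, scopes))  # True
print(post_matches(circle_post, scopes))     # False
```

Because the match is a union, adding more --id arguments can only widen the selection, never narrow it.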

We can then use the resulting list of filenames to process only posts that are meant to be public. In a similar way, we could also extract posts that were shared with a particular circle or community, e.g. to assist in building a joint post archive for a particular community across its members.
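As a rough sketch of such a downstream step, a consumer could read the filtered filenames line by line and load each post for further processing. The function name load_posts is hypothetical, the sketch targets Python 3, and it assumes the takeout JSON files are UTF-8 encoded.

```python
import json


def load_posts(filenames):
    """Yield (filename, post) pairs for each non-empty line in filenames."""
    for line in filenames:
        filename = line.strip()
        if not filename:
            continue  # Skip blank lines in the list.
        # Assumption: takeout JSON files are UTF-8 encoded.
        with open(filename, encoding='utf-8') as f:
            yield filename, json.load(f)
```

In a script at the end of the pipe, iterating with "for filename, post in load_posts(sys.stdin):" would then visit exactly the posts selected by post_filter.py.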

#!/usr/bin/env python

import argparse
import codecs
import json
import sys

class Visibility:
  PUBLIC = 'PUBLIC'
  CIRCLES = 'ALL CIRCLES'
  EXTENDED = 'EXTENDED CIRCLES'
  CIRCLE = 'CIRCLE'
  COLLECTION = 'COLLECTION'
  COMMUNITY = 'COMMUNITY'
  USER = 'USER'
  EVENT = 'EVENT'

def parse_acl(acl):
  result = []

  # Post is public or has a visibility defined by circles and/or users.
  if 'visibleToStandardAcl' in acl:
    if 'circles' in acl['visibleToStandardAcl']:
      for circle in acl['visibleToStandardAcl']['circles']:
        if circle['type'] == 'CIRCLE_TYPE_PUBLIC':
          result.append((Visibility.PUBLIC, None, None))
        elif circle['type'] == 'CIRCLE_TYPE_YOUR_CIRCLES':
          result.append((Visibility.CIRCLES, None, None))
        elif circle['type'] == 'CIRCLE_TYPE_EXTENDED_CIRCLES':
          result.append((Visibility.EXTENDED, None, None))
        elif circle['type'] == 'CIRCLE_TYPE_USER_CIRCLE':
          result.append((Visibility.CIRCLE, circle['resourceName'], circle.get('displayName', '')))
    if 'users' in acl['visibleToStandardAcl']:
      for user in acl['visibleToStandardAcl']['users']:
        result.append((Visibility.USER, user['resourceName'], user.get('displayName', '-')))

  # Post is part of a collection (could be public or private).
  if 'collectionAcl' in acl:
    collection = acl['collectionAcl']['collection']
    result.append((Visibility.COLLECTION, collection['resourceName'], collection.get('displayName', '-')))

  # Post is part of a community (could be public or closed).
  if 'communityAcl' in acl:
    community = acl['communityAcl']['community']
    result.append((Visibility.COMMUNITY, community['resourceName'], community.get('displayName', '-')))
    if 'users' in acl['communityAcl']:
      for user in acl['communityAcl']['users']:
        result.append((Visibility.USER, user['resourceName'], user.get('displayName', '-')))

  # Post is part of an event.
  if 'eventAcl' in acl:
    event = acl['eventAcl']['event']
    result.append((Visibility.EVENT, event['resourceName'], event.get('displayName', '-')))

  return result


#---------------------------------------------------------
parser = argparse.ArgumentParser(description='Filter G+ post JSON file by visibility')
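parser.add_argument('--public', dest='scopes', action='append_const', const=Visibility.PUBLIC,
                    help='select public posts')
parser.add_argument('--circles', dest='scopes', action='append_const', const=Visibility.CIRCLES,
                    help='select posts shared with all circles')
parser.add_argument('--ext-circles', dest='scopes', action='append_const', const=Visibility.EXTENDED,
                    help='select posts shared with extended circles')
parser.add_argument('--id', dest='scopes', action='append',
                    help='select posts by resource name, e.g. communities/<id>')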

args = parser.parse_args()
scopes = frozenset(args.scopes) if args.scopes is not None else frozenset()

stats = {}
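for filename in sys.stdin:
  filename = filename.strip()
  if not filename:
    continue  # Skip blank lines in the input list.
  # Close each file after reading; the JSON is assumed to be UTF-8 encoded.
  with codecs.open(filename, encoding='utf-8') as f:
    post = json.load(f)
  acls = parse_acl(post['postAcl'])
  for acl in acls:
    if len(scopes) == 0:
      # No filter given: count how often each visibility scope occurs.
      stats[acl] = stats.get(acl, 0) + 1
    else:
      by_name = acl[0] in (Visibility.PUBLIC, Visibility.CIRCLES, Visibility.EXTENDED) and acl[0] in scopes
      if by_name or acl[1] in scopes:
        # Print each filename at most once, even if several scopes match.
        print (filename)
        break
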
if len(scopes) == 0:
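  # Display names may contain non-ASCII characters, so force UTF-8 output.
  # On Python 3 the writer must wrap the underlying byte stream.
  sys.stdout = codecs.getwriter('utf8')(getattr(sys.stdout, 'buffer', sys.stdout))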
  for item in sorted(stats.items(), reverse=True, key=lambda x: x[1]):
    if item[0][0] in (Visibility.PUBLIC, Visibility.CIRCLES, Visibility.EXTENDED):
      print ('%d - %s' % (item[1], item[0][0]))
    else:
      print ('%d - %s (%s):\t %s' % (item[1], item[0][0], item[0][2], item[0][1]))