Saturday, April 20, 2019

Email to Diaspora* posting Bot

What I still miss the most after moving from G+ to Diaspora* for my casual public social network posting is a well-integrated mobile app for posting on the go.

The main use-case for me is posting photos on the go, which I now mostly take on my cellphone and minimally process with Google Photos.

One of the problems with the mobile app for Diaspora* (Dandelion in the case of Android) is that the size limit for photo uploads is quite small compared to the resolution of today's cellphone cameras. There is also not much point in uploading high-resolution images for purely on-screen consumption to an infrastructure managed by volunteers on a shoestring budget. I also liked the ability to geo-tag mobile posts by explicitly selecting a nearby landmark, which obfuscates the current location a bit.

For a few weeks now, I have been sharing my account with a G+ archive bot that is uploading recycled posts from the takeout archive (see here for the first part of the series describing the process). I like the structured formatting and meta-data tags that come from automated processing, and since my bot seems to be getting more likes than I do, I am thinking: why not keep it around?

I am a heavy email user, and email clients are well integrated into the sharing functions of both the Android and iOS mobile platforms. Since the posting bot is already using a free web-mail account for error reporting, it would be easy to use the same account for sending emails to the bot for post-processing and posting. Only emails originating from my own address(es) should be converted into a post. Thanks to DKIM domain authentication used by most major email providers today, we can somewhat trust the authenticity of the sender information in the header.
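
One detail worth care when checking senders is to compare the parsed address exactly rather than testing for a substring of the From header. The following is a minimal sketch of just that comparison, using only the standard library; the function name is_authorized and the example.com addresses are made up for illustration, and the DKIM signature check is a separate step not shown here.

```python
from email.utils import parseaddr

def is_authorized(from_header, authorized_senders):
  """Return True only if the parsed sender address exactly matches an allowed one."""
  # parseaddr splits 'Name <user@host>' into ('Name', 'user@host').
  address = parseaddr(from_header or '')[1].lower()
  return bool(address) and address in {s.lower() for s in authorized_senders}

# Hypothetical addresses, for illustration only.
allowed = ['me@example.com']
print(is_authorized('Me <me@example.com>', allowed))
print(is_authorized('Mallory <notme@example.com>', allowed))
```

The second call is rejected even though the allowed address is a substring of the sender's address.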

This new bot uses the POP3 protocol to access the inbox of the hosted email account, download the emails, check the senders and extract the plain text and image attachment parts in particular. If available, Exif GPS data is extracted from the images and reverse-geocoded using OpenStreetMap to the rough neighborhood of where the image was taken (see previous post). The images are rotated and scaled to a maximum size for upload. Some simple, hard-coded "business rules" are used to generate additional hashtags for some common use-cases, primarily photo sharing or link sharing.
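
To illustrate how the text and image parts of a message can be picked apart, here is a small offline sketch that builds a test message with the standard library and walks its MIME parts. The helper summarize_parts and the fake attachment bytes are assumptions for illustration; a real run would parse the bytes retrieved over POP3 instead.

```python
from email.message import EmailMessage

FAKE_JPEG = b'not really a JPEG'  # placeholder bytes, not a valid image

def summarize_parts(msg):
  """Return (content type, decoded payload size) for each non-container part."""
  summary = []
  for part in msg.walk():
    if part.is_multipart():
      continue  # containers such as multipart/mixed carry no payload themselves
    payload = part.get_payload(decode=True) or b''
    summary.append((part.get_content_type(), len(payload)))
  return summary

# Build a message similar to what a mobile mail client might send.
msg = EmailMessage()
msg['Subject'] = 'Test post'
msg.set_content('Hello from the road')
msg.add_alternative('<p>Hello from the road</p>', subtype='html')
msg.add_attachment(FAKE_JPEG, maintype='image', subtype='jpeg', filename='photo.jpg')

for content_type, size in summarize_parts(msg):
  print(content_type, size)
```

Mail clients typically send the same text twice, as text/plain and as text/html, so a processor has to decide which of the two alternatives to use.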

The post is then staged in the same format and directory structure as for the takeout archive processor so that the same posting bot can be re-used.
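
As a rough sketch of that layout: each message ends up in a directory named after its timestamp, nested in a directory for the day, holding content.md and the img_N.jpg files. The snippet below derives such a path from a Date header. It uses the standard library's parsedate_to_datetime instead of dateutil to stay dependency-free, and the function name staging_dir_for is made up for illustration.

```python
import os
from email.utils import parsedate_to_datetime

ISO_DATE = '%Y%m%d'
ISO_DATETIME = ISO_DATE + '_%H%M%S'

def staging_dir_for(date_header, staging_root):
  """Map an RFC 2822 Date header to <root>/<YYYYMMDD>/<YYYYMMDD_HHMMSS>."""
  timestamp = parsedate_to_datetime(date_header)
  return os.path.join(staging_root,
                      timestamp.strftime(ISO_DATE),
                      timestamp.strftime(ISO_DATETIME))

print(staging_dir_for('Sat, 20 Apr 2019 14:05:09 +0200', 'staging'))
```

Two messages with the same Date header, down to the second, would map to the same directory, which seems an acceptable limitation for a single-user bot.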

Similarly, we can run the new combination of email processor and Diaspora* exporter from the crontab on a Raspberry Pi or some other Linux-based always-on server platform:
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
19 * * * * /home/pi/mail_bot/mail_bot.sh
Where the mail_bot.sh script is as follows:
#!/bin/sh

cd /home/pi/mail_bot
./mail_bot.py --login-info=./logins.json --staging-dir=./staging  --mail-errors
/home/pi/post_bot/post_bot.py --staging-dir=./staging --login-info=./logins.json --mail-errors
The email processing component is in mail_bot.py below. It depends on the module exif2hashtag.py from the previous post as well as on the additional packages dateutil, dkimpy, html2text and PIL/Pillow, which can again be installed with pip3 install python-dateutil dkimpy html2text Pillow.

The mail section in the logins.json file requires two additional fields: 'pop-server' with the name or address of the email account's POP3 service and 'authorized-senders' with a list of email addresses whose messages will be transformed into Diaspora* posts.
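
For orientation, a mail section with all the keys that mail_bot.py reads might look like the example embedded in the sketch below. All host names, addresses and the password are placeholders, the helper missing_mail_keys is made up for illustration, and any other sections that logins.json may contain for the posting bot are left out.

```python
import json

# Keys that mail_bot.py looks up in the 'mail' section.
REQUIRED_MAIL_KEYS = ('smtp-server', 'username', 'password', 'recipient',
                      'pop-server', 'authorized-senders')

# Placeholder values only; substitute the details of the real account.
EXAMPLE_LOGINS = """
{
  "mail": {
    "smtp-server": "smtp.example.com",
    "pop-server": "pop.example.com",
    "username": "bot@example.com",
    "password": "not-a-real-password",
    "recipient": "me@example.com",
    "authorized-senders": ["me@example.com"]
  }
}
"""

def missing_mail_keys(login_info):
  """List the required keys that are absent from the 'mail' section."""
  mail = login_info.get('mail', {})
  return [key for key in REQUIRED_MAIL_KEYS if key not in mail]

login_info = json.loads(EXAMPLE_LOGINS)
print(missing_mail_keys(login_info))
```

Checking the configuration up front like this gives a clearer error than a KeyError in the middle of a cron run.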
#!/usr/bin/env python3

import argparse
import datetime
import email
import email.header
import email.utils
from email.mime.text import MIMEText
import io
from io import StringIO
from io import BytesIO
import json
import logging
import logging.handlers
import os
import poplib
import shutil
import smtplib
import sys

import dateutil.parser
import dkim
import html2text
import PIL.Image 

import exif2hashtag

ISO_DATE = '%Y%m%d'
ISO_DATETIME = ISO_DATE + '_%H%M%S'

# Extra hashtags for the sites I might be posting links from a mobile reader.
SITES = {
  'www.republik.ch' : ['Republik', 'News', 'media', 'lang_de', 'CH', 'Switzerland'],
  'www.tagesanzeiger.ch' : ['Tagesanzeiger', 'news', 'media', 'lang_de', 'CH', 'Switzerland'],
  'www.youtube.com' : ['YouTube'],
  'wikipedia.org' : ['Wikipedia'],
  'blog.kugelfish.com' : ['Blog', 'mywork', 'CC-BY', 'technology', 'programming'],
}

def send_error_message(txt, email_info):
  """Send a crash/error message to a configured email address."""
  server = smtplib.SMTP(email_info['smtp-server'])
  server.starttls()
  server.login(email_info['username'], email_info['password'])
  msg = MIMEText(txt)
  msg['From'] = email_info['username'] 
  msg['To'] =  email_info['recipient']
  msg['Subject'] = 'error message from %s on %s' % ('mail-bot', os.uname()[1])
  server.sendmail(email_info['username'], email_info['recipient'], msg.as_string())
  server.quit()

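def validate(authorized_senders, sender, msg):
  """Check DKIM message signature and whether message is from an approved sender."""
  if not sender or not dkim.verify(msg):
    return False
  # Compare the parsed address exactly; a substring match would also accept
  # e.g. 'notme@example.com' when 'me@example.com' is authorized.
  address = email.utils.parseaddr(sender)[1].lower()
  return address in [s.lower() for s in authorized_senders]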

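def header_decode(hdr):
  """Decode RFC2047 headers into unicode strings."""
  if hdr is None:
    return ''
  parts = []
  # A header can consist of several fragments with different encodings.
  for value, enc in email.header.decode_header(hdr):
    if isinstance(value, bytes):
      parts.append(value.decode(enc or 'ascii', errors='replace'))
    else:
      parts.append(value)
  return ''.join(parts)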

def export_image(img, outdir, num, max_size):
  """Reformat and stage image for posting to diaspora."""
  exif_info = exif2hashtag.get_exif_info(img)
  gps_info = exif2hashtag.get_gps_info(exif_info)
  latlon = exif2hashtag.get_latlon(gps_info)
  orientation = exif_info.get('Orientation', None)
  if orientation:
    if orientation == 3:
      img=img.rotate(180, expand=True)
    elif orientation == 6:
      img=img.rotate(270, expand=True)
    elif orientation == 8:
      img=img.rotate(90, expand=True)

  destination = os.path.join(outdir, 'img_%d.jpg' % num)
  source_size = max(img.size[0], img.size[1])
  if max_size and source_size >= max_size:
    scale = float(max_size) / float(source_size)
    img = img.resize((int(img.size[0] * scale), int(img.size[1] * scale)), PIL.Image.LANCZOS)
  img.save(destination, 'JPEG')
  return exif2hashtag.get_location_hashtags(latlon)


def export_message(msg, outdir, image_size):
  """Stage message for posting to diaspora."""
  hashtags = ['mailbot']
  content = []
  title = header_decode(msg.get('Subject'))
  if title:
    content.append('### ' + title)
    content.append('')
  img_count = 0
  for part in msg.walk():
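    if part.get_content_type() == 'text/html':
      # Use the charset declared for the part, falling back to UTF-8.
      charset = part.get_content_charset() or 'utf-8'
      txt = part.get_payload(decode=True).decode(charset, errors='replace')
      for site, tags in SITES.items():
        if site in txt:
          hashtags.extend(tags)
      converter = html2text.HTML2Text()
      converter.ignore_links = True
      converter.body_width = 0
      content.append(converter.handle(txt))
    elif part.get_content_type() == 'text/plain':
      # Nothing is done with the plain-text alternative; the text/html part
      # is converted to text instead.
      pass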
    elif part.get_content_type() == 'image/jpeg':
      img_count += 1
      data = BytesIO()
      data.write(part.get_payload(decode=True))
      data.seek(0)
      img = PIL.Image.open(data)
      for tag in export_image(img, outdir, img_count, image_size):
        if tag not in hashtags:
          hashtags.append(tag)
  
  if img_count > 0:
    hashtags = ['photo', 'photography', 'foto',  'myphoto', 'CC-BY'] + hashtags

  if hashtags:
    content.append(' '.join(('#' + tag for tag in hashtags)))

  # The with-block closes the file even if writing fails.
  with io.open(os.path.join(outdir, 'content.md'), 'w', encoding='utf-8') as content_file:
    content_file.write('\n'.join(content))

#---------------------
parser = argparse.ArgumentParser(description='Stage emails from authorized senders as Diaspora* posts')
parser.add_argument('--staging-dir', dest='staging_dir', action='store', required=True)
parser.add_argument('--login-info', dest='login_info', action='store', required=True)
parser.add_argument('--image-size', dest='image_size', action='store', type=int, default=1024)
parser.add_argument('--mail-errors', dest='mail', action='store_true')

args = parser.parse_args()

# Set up logging to both syslog and a memory buffer.
log_buffer = StringIO()    
logging.basicConfig(stream=log_buffer, level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
syslog = logging.handlers.SysLogHandler(address='/dev/log')
syslog.setFormatter(logging.Formatter('diaspora-mail-bot: %(levelname)s %(message)s'))
logging.getLogger().addHandler(syslog)

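# Defined before the try block so that the error handler below can always refer to it.
login_info = {}
try:
  # Load login/authentication data from a separate file.
  with open(args.login_info) as login_file:
    login_info = json.load(login_file)
  email_info = login_info['mail']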

  pop3 = poplib.POP3_SSL(email_info['pop-server'])
  pop3.user(email_info['username'])
  auth = pop3.pass_(email_info['password'])
  msg_count = pop3.stat()[0]

  logging.info('%d new messages on %s' % (msg_count, email_info['pop-server']))

  for msg_num in range(1, msg_count + 1):
    msg_txt = b'\n'.join(pop3.retr(msg_num)[1])
    msg = email.message_from_bytes(msg_txt)
    sender = msg.get('From')
    subject = msg.get('Subject')

    if not validate(email_info['authorized-senders'], sender, msg_txt):
      logging.info('dropping message from unauthorized sender "%s" - subject: "%s"' % (sender, subject))
      pop3.dele(msg_num)
      continue

    timestamp = dateutil.parser.parse(msg.get('Date'))
    outdir = os.path.join(args.staging_dir, timestamp.strftime(ISO_DATE), timestamp.strftime(ISO_DATETIME))
    if not os.path.exists(outdir):
      os.makedirs(outdir)

    try:
      export_message(msg, outdir, args.image_size)
      pop3.dele(msg_num)
    except:
      logging.info('error exporting msg %d - deleting directory %s' % (msg_num, outdir))
      shutil.rmtree(outdir, ignore_errors=True)
      raise
  pop3.quit()

except (KeyboardInterrupt, SystemExit):
  sys.exit(1) 
except Exception as e:
  logging.exception('error in main loop')
  if args.mail and 'mail' in login_info:
    send_error_message(log_buffer.getvalue(), login_info['mail'])
  sys.exit(1)


Thursday, April 18, 2019

Extracting location information from Photos

Photos exported from digital cameras often contain meta-data in Exif format (Exchangeable Image File Format). For images taken with cellphone cameras, this info typically also includes (GPS) location information of where the photo was taken.

Inspired by this previous post on the mapping of GPS lat/lon coordinates from Google+ location data to a rough description of the location, we could also use the location encoded in the photo itself.

We are again using the reverse geocoding service from OpenStreetMap to find the names of the country and locality that contain the GPS coordinates.

For the purpose of public posting, reducing the accuracy of the GPS location to the granularity of the city, town or village provides some increased confidentiality about where the picture was taken. Raw GPS data, with its potentially meter/centimeter resolution, generally allows pinpointing the location down to a building and street address.
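
The coarsening step can be sketched offline as follows, assuming an address dictionary shaped like the 'address' part of a reverse-geocoding result. The sample dictionary is hand-written for illustration and not a real response, and the sketch takes the country name from the response's 'country' field rather than looking it up by ISO code. Only the country and locality fields are used; street-level details such as road and house number are ignored.

```python
def hashtags_from_address(raw):
  """Build coarse location hashtags from a reverse-geocoding result dictionary."""
  address = raw.get('address', {})
  tags = []
  code = address.get('country_code')
  if code:
    tags.append(code.upper())
  country = address.get('country')
  if country:
    tags.append(country.replace(' ', ''))
  # Keep only the locality level, whichever of these keys is present.
  for key in ('city', 'town', 'village'):
    if key in address:
      tags.append(address[key].replace(' ', ''))
  return tags

# Hand-written sample, not an actual geocoder response.
sample = {
  'address': {
    'house_number': '1',
    'road': 'Sample Road',
    'village': 'Sample Village',
    'country': 'Switzerland',
    'country_code': 'ch',
  }
}
print(hashtags_from_address(sample))
```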

Fractional numbers are represented as ratios of integers in Exif. For example, the number 0.5 could be encoded as the tuple (5, 10). The coordinates in the Exif location meta-data are represented in the DMS (Degrees Minutes Seconds) format, which needs to be converted into the DD (decimal degree) format used by most GIS systems including OpenStreetMap.
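
As a worked example of the conversion, the sketch below turns DMS values given as (numerator, denominator) pairs into decimal degrees using exact fractions. The function name dms_to_decimal and the coordinates are made up for illustration: 47° 22' 30" N becomes 47 + 22/60 + 30/3600 = 47.375, and southern or western hemispheres get a negative sign.

```python
from fractions import Fraction

def dms_to_decimal(dms, hemisphere):
  """Convert ((d_num, d_den), (m_num, m_den), (s_num, s_den)) to decimal degrees."""
  degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
  value = degrees + minutes / 60 + seconds / 3600
  # South and west are negative in the decimal degree convention.
  return float(-value if hemisphere in ('S', 'W') else value)

# Illustrative coordinates; seconds are encoded as 2400/100 = 24.
lat = dms_to_decimal(((47, 1), (22, 1), (30, 1)), 'N')
lon = dms_to_decimal(((8, 1), (32, 1), (2400, 100)), 'E')
print(lat, lon)
```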


#!/usr/bin/env python3

import sys

import geopy
import PIL.Image 
import PIL.ExifTags
import pycountry

geocoder = geopy.geocoders.Nominatim(user_agent='gplus_migration', timeout=None)

def get_location_hashtags(latlon):
  """Reverse geo-code lat/lon coordinates into ISO-code / country / municipality names."""
  hashtags = []
  if latlon:
    location = geocoder.reverse((latlon[0], latlon[1]))
    # The geocoder may find no match, e.g. for coordinates at sea.
    addr = location.raw.get('address') if location else None
    if addr:
      cc = addr.get('country_code', '').upper()
      if cc:
        hashtags.append(cc)
        country = pycountry.countries.get(alpha_2=cc)
        if country:
          hashtags.append(country.name.replace(' ', ''))
      for key in ['city', 'town', 'village']:
        if key in addr:
          hashtags.append(addr[key].replace(' ', ''))
  return hashtags

def get_exif_info(img):
  """Decode Exif data in image."""
  ret = {}
  info = img._getexif()
  if not info:
    return ret
  for tag, value in info.items():
    decoded = PIL.ExifTags.TAGS.get(tag, tag)
    ret[decoded] = value
  return ret

def get_gps_info(info):
  """Decode GPSInfo sub-tags in Exif data."""
  ret = {}
  if not info or not 'GPSInfo' in info:
    return ret
  for tag, value in info['GPSInfo'].items():
    decoded = PIL.ExifTags.GPSTAGS.get(tag, tag)
    ret[decoded] = value
  return ret

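def degrees_from_ratios(ratios):
  """Convert from Exif d/m/s array of ratios to floating point representation."""
  # Depending on the Pillow version, each entry is either a (numerator,
  # denominator) tuple or a rational object that converts to float directly.
  f = [float(r[0]) / float(r[1]) if isinstance(r, tuple) else float(r)
       for r in ratios]
  return f[0] + f[1] / 60.0 + f[2] / 3600.0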

def get_latlon(gps_info):
  """Extract the GPS coordinates from the GPS Exif data and convert into fractional coordinates."""
  lat = gps_info.get('GPSLatitude', None)
  lat_hemi = gps_info.get('GPSLatitudeRef', None)
  lon = gps_info.get('GPSLongitude', None)
  lon_hemi = gps_info.get('GPSLongitudeRef', None)
  if lat and lat_hemi and lon and lon_hemi:
    return (degrees_from_ratios(lat) * (-1 if lat_hemi == 'S' else 1),
            degrees_from_ratios(lon) * (-1 if lon_hemi == 'W' else 1))
  else:
    return None

def get_camera(info):
  """Get Camera make & model as another example of Exif data."""
  if 'Make' in info and 'Model' in info:
    return '%s %s' % (info['Make'], info['Model'])
  else:
    return None
  
#------------------------------------------------------

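# Only run the command-line loop when executed directly, so that importing this
# module (e.g. from mail_bot.py) does not try to open sys.argv entries as images.
if __name__ == '__main__':
  for filename in sys.argv[1:]:
    image = PIL.Image.open(filename)
    exif_info = get_exif_info(image)
    gps_info = get_gps_info(exif_info)
    latlon = get_latlon(gps_info)
    print('%s : %s %s' % (filename, get_camera(exif_info), get_location_hashtags(latlon)))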