Saturday, April 20, 2019

Email to Diaspora* posting Bot

What I still miss the most after moving from G+ to Diaspora* for my casual public social network posting is a well-integrated mobile app for posting on the go.

The main use-case for me is posting photos on the go, which I now mostly take on my cellphone and minimally process with Google Photos.

One of the problems with the mobile app for Diaspora* (Dandelion in the case of Android) is that the size limit for photo uploads is quite small compared to the resolution of today's cellphone cameras. There is also not much point in uploading high-resolution images for purely on-screen consumption to an infrastructure managed by volunteers on a shoestring budget. I also liked the ability to geo-tag mobile posts by explicitly selecting a nearby landmark, which obfuscates the current location a bit.

For a few weeks now, I have been sharing my account with a G+ archive bot that is uploading recycled posts from the takeout archive (see here for the first part of the series describing the process). I like the structured formatting and meta-data tags that come from automated processing, and since my bot seems to be getting more likes than I do, I am thinking: why not keep it around?

I am a heavy email user, and email clients are well integrated into the sharing functions of both the Android and iOS mobile platforms. Since the posting bot is already using a free web-mail account for error reporting, it would be easy to use the same account for sending emails to the bot for post-processing and posting. Only emails originating from my own address(es) should be converted into a post. Thanks to the DKIM domain authentication used by most major email providers today, we can somewhat trust the authenticity of the sender information in the header.
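
As a minimal sketch of the "only my own addresses" rule, the bare address can be pulled out of the From header and compared exactly against an allow list. The function name is_authorized and the example.com addresses are made up for illustration, and the DKIM signature check is a separate step that is not shown here.

```python
from email.utils import parseaddr


def is_authorized(from_header, authorized_senders):
    """Return True if the bare address in a From header is on the allow list."""
    # parseaddr() splits 'Name <user@host>' into ('Name', 'user@host').
    address = parseaddr(from_header or "")[1].lower()
    allowed = {s.lower() for s in authorized_senders}
    return bool(address) and address in allowed


# Hypothetical addresses, for illustration only.
senders = ["me@example.com"]
print(is_authorized("Me <me@example.com>", senders))      # True
print(is_authorized("Eve <notme@example.com>", senders))  # False
```

Comparing the whole address rather than looking for a substring avoids accepting a look-alike such as notme@example.com.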

This new bot uses the POP3 protocol to access the inbox of the hosted email account, download the emails, check the senders and extract the plain text and image attachment parts in particular. If available, Exif GPS data is extracted from the images and reverse-geocoded using OpenStreetMap to the rough neighborhood of where the image was taken (see previous post). The images are rotated and scaled to a maximum size for upload. Some simple, hard-coded "business rules" are used to generate additional hashtags for some common use-cases, primarily photo sharing or link sharing.
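
To make the scaling step concrete, here is a small sketch of how the target dimensions can be computed so that the longer side does not exceed a maximum. The helper name scaled_size is hypothetical, and it uses integer arithmetic where export_image() in the script multiplies by a float scale factor; 1024 is the script's default for --image-size.

```python
def scaled_size(width, height, max_size):
    """Return (width, height) shrunk so the longer side is at most max_size."""
    longest = max(width, height)
    if not max_size or longest <= max_size:
        return width, height  # already small enough, keep as is
    # Integer arithmetic avoids float rounding; never shrink a side to zero.
    return (max(1, width * max_size // longest),
            max(1, height * max_size // longest))


print(scaled_size(4032, 3024, 1024))  # (1024, 768)
```

The resulting tuple is what would be passed to the resize() call of a PIL image.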

The post is then staged in the same format and directory structure as for the takeout archive processor so that the same posting bot can be re-used.
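
The staging layout used by the script is one directory per message, named after the message date, holding a content.md file and any img_N.jpg images. The sketch below derives that directory from a Date header. It uses the standard library's parsedate_to_datetime in place of the dateutil parser used in the script, and the function name staging_path is made up for illustration.

```python
import os
from email.utils import parsedate_to_datetime

ISO_DATE = "%Y%m%d"
ISO_DATETIME = ISO_DATE + "_%H%M%S"


def staging_path(staging_dir, date_header):
    """Map an RFC 2822 Date header to <staging>/<YYYYMMDD>/<YYYYMMDD_HHMMSS>."""
    timestamp = parsedate_to_datetime(date_header)
    return os.path.join(staging_dir,
                        timestamp.strftime(ISO_DATE),
                        timestamp.strftime(ISO_DATETIME))


print(staging_path("staging", "Sat, 20 Apr 2019 10:15:30 +0200"))
```

Note that the timestamp is formatted in the time zone given in the header, so two messages sent in the same second would map to the same directory.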

Similarly, we can run the new combination of email processor and Diaspora* exporter from the crontab on a Raspberry Pi or some other Linux-based always-on server platform:
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
19 * * * * /home/pi/mail_bot/mail_bot.sh
Where the mail_bot.sh script is as follows:
#!/bin/sh

# Stop here if the working directory is missing, since the paths below are relative.
cd /home/pi/mail_bot || exit 1
./mail_bot.py --login-info=./logins.json --staging-dir=./staging  --mail-errors
/home/pi/post_bot/post_bot.py --staging-dir=./staging --login-info=./logins.json --mail-errors
The email processing component is in mail_bot.py below. It depends on the module exif2hashtag.py from the previous post as well as on the additional packages dateutil, dkimpy, html2text and PIL/Pillow, which can again be installed with pip3 install python-dateutil dkimpy html2text Pillow.

The mail section in the logins.json file requires two additional fields: 'pop-server' with the name or address of the email account's POP3 service and 'authorized-senders' with a list of email addresses whose messages will be transformed into Diaspora* posts.
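
For orientation, here is a sketch of what the mail section might look like, together with a small check for missing fields. The key names are the ones the script reads; all values are placeholders, the helper missing_mail_keys is hypothetical, and any other sections that logins.json contains for the posting bot are left out.

```python
import json

# Keys that mail_bot.py reads from the 'mail' section.
REQUIRED_MAIL_KEYS = ("smtp-server", "username", "password", "recipient",
                      "pop-server", "authorized-senders")


def missing_mail_keys(login_info):
    """Return the required 'mail' keys that are absent from the config."""
    mail = login_info.get("mail", {})
    return [key for key in REQUIRED_MAIL_KEYS if key not in mail]


# Placeholder values only; real host names and credentials will differ.
example = json.loads("""
{
  "mail": {
    "smtp-server": "smtp.example.com",
    "username": "bot@example.com",
    "password": "secret",
    "recipient": "me@example.com",
    "pop-server": "pop.example.com",
    "authorized-senders": ["me@example.com"]
  }
}
""")
print(missing_mail_keys(example))  # []
```
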
#!/usr/bin/env python3

import argparse
import datetime
import email
import email.header
import email.utils
from email.mime.text import MIMEText
import io
from io import StringIO
from io import BytesIO
import json
import logging
import logging.handlers
import os
import poplib
import shutil
import smtplib
import sys

import dateutil.parser
import dkim
import html2text
import PIL.Image 

import exif2hashtag

ISO_DATE = '%Y%m%d'
ISO_DATETIME = ISO_DATE + '_%H%M%S'

# Extra hashtags for the sites I might be posting links from a mobile reader.
SITES = {
  'www.republik.ch' : ['Republik', 'News', 'media', 'lang_de', 'CH', 'Switzerland'],
  'www.tagesanzeiger.ch' : ['Tagesanzeiger', 'news', 'media', 'lang_de', 'CH', 'Switzerland'],
  'www.youtube.com' : ['YouTube'],
  'wikipedia.org' : ['Wikipedia'],
  'blog.kugelfish.com' : ['Blog', 'mywork', 'CC-BY', 'technology', 'programming'],
}

def send_error_message(txt, email_info):
  """Send a crash/error message to a configured email address."""
  server = smtplib.SMTP(email_info['smtp-server'])
  server.starttls()
  server.login(email_info['username'], email_info['password'])
  msg = MIMEText(txt)
  msg['From'] = email_info['username'] 
  msg['To'] =  email_info['recipient']
  msg['Subject'] = 'error message from %s on %s' % ('mail-bot', os.uname()[1])
  server.sendmail(email_info['username'], email_info['recipient'], msg.as_string())
  server.quit()

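def validate(authorized_senders, sender, msg):
  """Check DKIM message signature and whether message is from an approved sender."""
  if not dkim.verify(msg):
    return False
  # Compare the bare address exactly (ignoring case); a substring test would
  # also accept e.g. "notme@example.com" for "me@example.com".
  address = email.utils.parseaddr(sender or '')[1].lower()
  return address in (s.lower() for s in authorized_senders)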

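def header_decode(hdr):
  """Decode RFC2047 headers into unicode strings."""
  if hdr is None:
    return ''
  # Only the first fragment of the header is decoded.
  value, enc = email.header.decode_header(hdr)[0]
  if isinstance(value, bytes):
    # Fragments can be bytes even without a declared charset.
    return value.decode(enc or 'ascii', errors='replace')
  return value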

def export_image(img, outdir, num, max_size):
  """Reformat and stage image for posting to diaspora."""
  exif_info = exif2hashtag.get_exif_info(img)
  gps_info = exif2hashtag.get_gps_info(exif_info)
  latlon = exif2hashtag.get_latlon(gps_info)
  orientation = exif_info.get('Orientation', None)
  if orientation:
    if orientation == 3:
      img=img.rotate(180, expand=True)
    elif orientation == 6:
      img=img.rotate(270, expand=True)
    elif orientation == 8:
      img=img.rotate(90, expand=True)

  destination = os.path.join(outdir, 'img_%d.jpg' % num)
  source_size = max(img.size[0], img.size[1])
  if max_size and source_size >= max_size:
    scale = float(max_size) / float(source_size)
    img = img.resize((int(img.size[0] * scale), int(img.size[1] * scale)), PIL.Image.LANCZOS)
  img.save(destination, 'JPEG')
  return exif2hashtag.get_location_hashtags(latlon)


def export_message(msg, outdir, image_size):
  """Stage message for posting to diaspora."""
  hashtags = ['mailbot']
  content = []
  title = header_decode(msg.get('Subject'))
  if title:
    content.append('### ' + title)
    content.append('')
  img_count = 0
  for part in msg.walk():
    content_type = part.get_content_type()
    if content_type == 'text/html':
      # Use the charset declared for this part, falling back to UTF-8.
      charset = part.get_content_charset() or 'utf-8'
      txt = part.get_payload(decode=True).decode(charset, errors='replace')
      for site, tags in SITES.items():
        if site in txt:
          hashtags.extend(tags)
      converter = html2text.HTML2Text()
      converter.ignore_links = True
      converter.body_width = 0
      content.append(converter.handle(txt))
    elif content_type == 'text/plain':
      # Assumption: plain-text alternatives are skipped so that the body,
      # which is taken from the HTML part above, is not duplicated.
      pass
    elif content_type == 'image/jpeg':
      img_count += 1
      img = PIL.Image.open(BytesIO(part.get_payload(decode=True)))
      for tag in export_image(img, outdir, img_count, image_size):
        if tag not in hashtags:
          hashtags.append(tag)
  
  if img_count > 0:
    hashtags = ['photo', 'photography', 'foto',  'myphoto', 'CC-BY'] + hashtags

  if hashtags:
    content.append(' '.join(('#' + tag for tag in hashtags)))

  with io.open(os.path.join(outdir, 'content.md'), 'w', encoding='utf-8') as content_file:
    content_file.write('\n'.join(content))

#---------------------
parser = argparse.ArgumentParser(description='Stage emails from authorized senders as Diaspora* posts')
parser.add_argument('--staging-dir', dest='staging_dir', action='store', required=True)
parser.add_argument('--login-info', dest='login_info', action='store', required=True)
parser.add_argument('--image-size', dest='image_size', action='store', type=int, default=1024)
parser.add_argument('--mail-errors', dest='mail', action='store_true')

args = parser.parse_args()

# Set up logging to both syslog and a memory buffer.
log_buffer = StringIO()    
logging.basicConfig(stream=log_buffer, level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
syslog = logging.handlers.SysLogHandler(address='/dev/log')
syslog.setFormatter(logging.Formatter('diaspora-mail-bot: %(levelname)s %(message)s'))
logging.getLogger().addHandler(syslog)

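# Defined before the try block so the error handler below can always refer to it.
login_info = {}
try:
  # Load login/authentication data from a separate file.
  with open(args.login_info) as f:
    login_info = json.load(f)
  email_info = login_info['mail']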

  pop3 = poplib.POP3_SSL(email_info['pop-server'])
  pop3.user(email_info['username'])
  auth = pop3.pass_(email_info['password'])
  msg_count = pop3.stat()[0]

  logging.info('%d new messages on %s' % (msg_count, email_info['pop-server']))

  for msg_num in range(1, msg_count + 1):
    msg_txt = b'\n'.join(pop3.retr(msg_num)[1])
    msg = email.message_from_bytes(msg_txt)
    sender = msg.get('From')
    subject = msg.get('Subject')

    if not validate(email_info['authorized-senders'], sender, msg_txt):
      logging.info('dropping message from unauthorized sender "%s" - subject: "%s"' % (sender, subject))
      pop3.dele(msg_num)
      continue

    timestamp = dateutil.parser.parse(msg.get('Date'))
    outdir = os.path.join(args.staging_dir, timestamp.strftime(ISO_DATE), timestamp.strftime(ISO_DATETIME))
    if not os.path.exists(outdir):
      os.makedirs(outdir)

    try:
      export_message(msg, outdir, args.image_size)
      pop3.dele(msg_num)
    except:
      logging.info('error exporting msg %d - deleting directory %s' % (msg_num, outdir))
      shutil.rmtree(outdir, ignore_errors=True)
      raise
  pop3.quit()

except (KeyboardInterrupt, SystemExit):
  sys.exit(1) 
except Exception as e:
  logging.exception('error in main loop')
  if args.mail and 'mail' in login_info:
    send_error_message(log_buffer.getvalue(), login_info['mail'])
  sys.exit(1)
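
One possible refinement: header_decode() in the listing above looks only at the first fragment returned by decode_header(), so a subject that mixes encoded and plain fragments would be cut short. A sketch of an alternative that joins all fragments with the standard library's make_header() follows; the function name decode_full_header is made up for illustration and is not part of the script.

```python
from email.header import decode_header, make_header


def decode_full_header(hdr):
    """Decode every RFC 2047 fragment of a header into one str."""
    if hdr is None:
        return ""
    # make_header() reassembles all (value, charset) pairs into one Header.
    return str(make_header(decode_header(hdr)))


print(decode_full_header("=?utf-8?b?SGVsbG8=?="))  # Hello
```
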

