I would like to write a new BridgeDB Distributor. Bridge distribution via Twitter direct messages would be a new channel, and one that would be costly to censor because of the high collateral damage involved. I would therefore write a Twitter bot implemented as a BridgeDB Distributor, with an optional rate control mechanism. I would also prepare the way for other distributors by writing reusable code, and hopefully by starting work on another bot-like distributor.
I would like to write a new BridgeDB Distributor: one that acts as a Twitter bot, responding to direct messages sent by Twitter users. I have been discussing this project a bit with isis and sysrqb, and I think I have a realistic plan for a useful new distributor, as well as ideas for possible further expansion, should the critical deliverables be met in good time.
The main plan is:
write a Twitter bot that responds to PMs (what Twitter calls "direct messages"):
produce and populate a new bridge hashring
be able to test hashring using fake bridge descriptors
write Twitter bot: having received a direct message "get bridges", respond by giving out a few bridge lines: get bridges nearest to hashed(twitter_handle) in the twitter hashring
written using Python Twisted, by incorporating the RESTful Twitter API
isis wrote SSL certificate chain verification for Twisted; I would reuse this.
the new distributor would be a new class inheriting from bridgedb.Dist.Distributor
send the response in multiple messages if needed: if the bridge lines in a response include fingerprints (and/or passwords), about 2 bridge lines fit per message, so the response would be split on bridge-line boundaries as needed
use multiple Twitter access tokens if possible, if we conclude that there might be problems with Twitter per-token POST rate limiting
as of now, we can see that Twitter only allows sending direct messages to those who follow the sender:
some accounts get a feature that lets them choose whether to receive messages from people they do not follow
this may depend on the number of followers
consider asking Twitter to enable this (the plan assumes this is not possible, but this would make things easier for the users)
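we can catch "follow" events. So as of now, the plan is to implement the following workflow: the user follows the bot, the bot follows the user back, the user sends a direct message, and the bot responds with a direct message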
be able to parse messages with keywords matching pluggable transports (e.g. "get obfs3 bridges", "get fte bridges"), give appropriate bridges when possible
provide meaningful error messages
be able to 'talk' in several (most relevant) languages? (if at all needed, this should be a core feature.) I am thinking that the instructions (response to 'help') and error messages should be carefully crafted
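The request parsing and message splitting described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the names `parse_request`, `split_into_messages` and `KNOWN_TRANSPORTS` are hypothetical, the transport set contains just the two examples mentioned above, and the 140-character default for `max_len` is an assumption about the direct message length limit.

```python
import re

# Hypothetical transport keywords; obfs3 and fte are the examples from the plan.
KNOWN_TRANSPORTS = {"obfs3", "fte"}

# Matches "get bridges" and "get <transport> bridges", case-insensitively.
REQUEST_RE = re.compile(r"^\s*get\s+(?:(\S+)\s+)?bridges\s*$", re.IGNORECASE)


def parse_request(text):
    """Return (True, transport_or_None) or (False, error_message)."""
    match = REQUEST_RE.match(text)
    if match is None:
        return False, 'Sorry, I did not understand that. Try "get bridges".'
    transport = match.group(1)
    if transport is None:
        return True, None  # plain "get bridges"
    transport = transport.lower()
    if transport not in KNOWN_TRANSPORTS:
        return False, 'Sorry, I do not know the transport "%s".' % transport
    return True, transport


def split_into_messages(bridge_lines, max_len=140):
    """Pack whole bridge lines into as few messages as possible.

    max_len is an assumed per-message limit. A bridge line is never
    split across two messages.
    """
    messages = []
    current = ""
    for line in bridge_lines:
        if len(line) > max_len:
            raise ValueError("bridge line does not fit into one message")
        candidate = line if not current else current + "\n" + line
        if len(candidate) <= max_len:
            current = candidate
        else:
            messages.append(current)
            current = line
    if current:
        messages.append(current)
    return messages
```

With bridge lines of about 60 characters each (address, port and a 40-character fingerprint), two lines fit into one 140-character message, which matches the estimate above.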
write a rate limiting/control mechanism for the bot:
this includes further discussion of whether it is needed at all, i.e. of how easy it is to create new handles and use them to get bridges in bulk
the idea here is to write a working rate limiting mechanism anyway, and be able to turn it on/off
as of now, the plan would be to serve CAPTCHAs via Twitter Media CDN (we assume that if Twitter is not blocked, its CDNs are not, either)
Twitter direct messages can display images. What this would mean for the end user is that they would see the image in their message, and would respond directly (and privately), as if having a normal conversation
reuse parts of IPBasedDistributor, reCAPTCHA code, where applicable
how can we serve CAPTCHAs (e.g. if generated via GIMP) most efficiently? As of now, I do not see a scalable and decent way to serve them on-demand via Twitter CDN. Can we use some other CDN to 'proxy' the images? Overall, this does not sound like an elegant idea.
we can pre-upload everything to e.g. Twitter CDN. Alternatively, we can upload on demand and remember which images have already been uploaded (in case this kind of 'cache' expires.)
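A minimal sketch of a rate control mechanism that can be switched on and off, assuming CAPTCHAs that were uploaded in advance and are referred to by some media identifier. The class name `CaptchaGate`, its methods and the media identifiers are all hypothetical; how the image is actually uploaded and attached to a direct message is left out.

```python
import random


class CaptchaGate(object):
    """Toggleable CAPTCHA gate keyed by handle (illustrative only)."""

    def __init__(self, challenges, enabled=True, rng=None):
        # challenges maps a (hypothetical) media id of a pre-uploaded
        # CAPTCHA image to the answer expected for it.
        self.challenges = dict(challenges)
        self.enabled = enabled
        self.pending = {}  # handle -> media id awaiting an answer
        self.rng = rng or random.Random()

    def on_request(self, handle):
        """Return ("serve", None) or ("challenge", media_id)."""
        if not self.enabled:
            return ("serve", None)  # rate control switched off
        media_id = self.rng.choice(sorted(self.challenges))
        self.pending[handle] = media_id
        return ("challenge", media_id)

    def on_answer(self, handle, answer):
        """Return True if the answer solves the handle's pending challenge.

        A challenge can be answered only once; a wrong answer means the
        user has to request bridges (and a new challenge) again.
        """
        media_id = self.pending.pop(handle, None)
        if media_id is None:
            return False
        return answer.strip().lower() == self.challenges[media_id].lower()
```

With `enabled=False` every request is served immediately, so the same bot code can run with or without rate control while the question of whether it is needed is still open.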
Further expansion plans would be (at least a subset of these is very desirable, but outside of critical deliverable scope, unless we decide otherwise):
write code or refactor code in a way that would make it easy to be reused by other distributors
for example, an XMPP+OTR distributor is highly desirable
it would be great to make sure the actual distributor and bot parts are mostly ready for it (i.e. would require as little change as possible). The actual XMPP+OTR handler will be more difficult, but can be done in pure Python
would probably run on a separate machine from bridgedb. bridgedb would handle only highly sanitized input: most likely, specific requests into the hashring, with an authentication token coming from the other machine/instance.
ideally: write all or part of the XMPP+OTR distributor
discuss IRC distributor options and nuances (especially rate control). This is easier to implement in and of itself
discuss WhatsApp distributor development options. Censoring WhatsApp is presumably undesirable for a censor
consider and, if needed, implement alternative rate control mechanisms for the Twitter Distributor
consider including more info in direct message responses, e.g. links to Tor download mirrors (sometimes that is where the effective censorship 'bottleneck' is) and/or torrent magnet links?
reach a definite conclusion on whether rate control for the Twitter distributor is needed. Perhaps it is needed "sometimes":
discuss, and ideally help with, using the bridgedb API from within tor-launcher
In terms of general code architecture, the idea would be to write a new generic bridgedb.Dist.Distributor with a hashring for 'handles'. IRC would reuse most of this; XMPP and other federated communication systems might inherit from it and extend or override it to have subrings per domain/network, etc. (This could also be done for Twitter. Do we need it? Probably not, but it is worth discussing and thinking about a bit.)
Twitter distributor would subclass this handle-based distributor, and implement actual bot functionality via Twisted. Parts of existing code can be reused.
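A rough sketch of this architecture, under several assumptions: the classes below are stand-alone stand-ins (in BridgeDB the generic class would inherit from bridgedb.Dist.Distributor, which is not imported here), the names `HandleHashring`, `HandleDistributor` and `TwitterDistributor` are hypothetical, and "nearest" is interpreted as the next positions clockwise on an HMAC-SHA1 ring.

```python
import bisect
import hashlib
import hmac


class HandleHashring(object):
    """Keyed hashring mapping handles to bridge lines (illustrative only)."""

    def __init__(self, key):
        self.key = key  # secret HMAC key, as bytes
        self.positions = []  # sorted list of (ring position, bridge line)

    def _hash(self, data):
        digest = hmac.new(self.key, data.encode("utf-8"), hashlib.sha1)
        return digest.hexdigest()

    def insert(self, bridge_id, bridge_line):
        bisect.insort(self.positions, (self._hash(bridge_id), bridge_line))

    def get_bridges(self, handle, n=3):
        """Return up to n bridge lines clockwise from hash(handle)."""
        if not self.positions:
            return []
        start = bisect.bisect_left(self.positions, (self._hash(handle), ""))
        size = len(self.positions)
        return [self.positions[(start + i) % size][1]
                for i in range(min(n, size))]


class HandleDistributor(object):
    """Generic handle-based distributor (stand-in for a Distributor subclass)."""

    def __init__(self, key):
        self.ring = HandleHashring(key)

    def normalize(self, handle):
        return handle.strip().lower()

    def get_bridges_for(self, handle, n=3):
        return self.ring.get_bridges(self.normalize(handle), n)


class TwitterDistributor(HandleDistributor):
    """Twitter-specific handling: '@Alice' and 'alice' are the same user."""

    def normalize(self, handle):
        return handle.strip().lstrip("@").lower()
```

The ring can be populated with fake bridge descriptors for testing, e.g. `dist.ring.insert("fake-bridge-1", "192.0.2.1:443")`, after which `dist.get_bridges_for("@Alice")` returns the same bridge lines on every call. A distributor for a federated system could override `get_bridges_for` to pick a subring per domain first.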
Discussion points:
Rough timeline (to be discussed later on / made more concrete as needed):
write a working Twitter bot PoC with the user-follow->bot-follow->user-send-DM->bot-send-DM flow
=> as soon as possible, ideally in the coming days
=> done, at https://twitter.com/wfntestacct
generic 'handle' distributor (test out hashring / get familiar with this)
working Twitter bot as a subclassed 'handle' distributor
=> until June
rate control mechanism (first approximation of / something that we can test out)
=> until July (this is conservative; hopefully earlier; but allows for all sorts of snags, and delay from before, if any)
[27th June is mid-term evaluation deadline]
a working, clean, robust version of Twitter distributor, with deliverables and features as discussed with developers
=> until August (this is again a tad conservative, but allows for expansion, lots of refactoring and discussion, etc.)
tests, documentation (hopefully some or a lot of this before August)
=> 11th August [this is 'soft pencils down' date]
whichever of the expansion plans we decide should fit into the GSoC scope would go here (or earlier, provided all is well with the core deliverables)
=> 18th August [this is 'hard pencils down' date]
Working PoC for a bridge-distributor-twitter-bot: https://github.com/wfn/twidibot (as of now, it can be interacted with here: https://twitter.com/wfntestacct)
torsearch backend code from last year is at https://github.com/wfn/torsearch. (Probably most of the work went into the solutions to the nasty bottlenecks, in the *.sql scripts, and into the onionoo_api query logic.) More code samples can be provided.
I would like to continue with my efforts to help the Tor community, and to develop for the Tor Project.
Besides Tor, nothing substantial in terms of free software development. Active free software user and supporter. Experience in using tools of the (open source) trade.
My plan is (and I have spent time making sure this is possible) to be able to devote as much time to GSoC this summer as last summer (if not more.) I will be doing a "0.3"-time (basically quarter-time) job (which is mostly remote) for my faculty (light sysadmin/programming.) No academic obligations throughout the whole coding period, though.
Yes, I think so. It should become a natural part of the BridgeDB codebase, but if all goes well, it will get deployed and actually used. First $x months will probably require at least some (if not continuous) attention. In addition, further work on BridgeDB definitely possible. In any case, no plans on disappearing!
Bi-weekly reports to @tor-dev, further discussion on @tor-dev (or with wider community, e.g. @tor-talk) or privately. Problems, etc. discussed via email; also via IRC for more synchronous discussion.
Vilnius University, philosophy undergraduate, 3rd year.
I would be very excited to be able to focus my efforts on Tor once again! I hope to continue to be involved.
This is the only GSoC project I am applying to.
Contact:
kostas at jakeliunas period com
XMPP: phistopheles at jabber period org
IRC: wfn
4096R/0E5DCE45