Technical Design Notes
StoryMaker Implementation Notes
I. Overview
The StoryMaker mobile application comprises two major feature sets: the Media Creator and the Interactive Trainer. These are not necessarily the names presented to the user, and the user experience blends the two, moving back and forth between them, but in terms of development and engineering they should be thought of as two separate modules of functionality that expose their capabilities to each other as APIs (or, within Android, as Intents).
All of the specific feature requirements (the MUSTs, SHOULDs and MAYs) are outlined in the Statement of Work agreed upon between the Guardian Project and Free Press Unlimited. These have been translated into issue tickets in GP's project tracker (https://dev.guardianproject.info/projects/wrapp), which have been assigned to development sprints with targeted due dates. Everything in this document is intended to be developed against, and remain applicable to, those requirements and timelines.
It should also be noted that audio production is as primary a task as video and photo within this application. Since much of our work on ObscuraCam/InformaCam has focused on video and photos, we have a natural bias toward, and experience with, developing apps that handle those media types. For this project, expansion into audio is required, so we must develop a set of code and capabilities that match what we can already do with video and photos. Fortunately, the media processing libraries we are using (mainly FFMPEG) handle audio nearly as well as video, so for the most part we only need to develop new user interface code for audio processing.
A. Security by Design
All of the features must be implemented in a manner that is secure by design, working within a threat model based on the idea that the device the app is running on may be physically compromised and used to incriminate either the operator of the device or the subjects of their published content. As part of developing this application, we should also develop a set of guidelines for how a target device should be configured within the capabilities of its base operating system, and the application should be tested with that configuration. For example, if full disk encryption on Android ICS/JB with a PIN lock screen is the configuration we expect the app to run in for proper security, the app should be fully tested in that configuration.
II. Media Creator
This module contains three major feature areas: Production, Rendering and Publishing.
A. PRODUCTION: Provide user interface and functionality to enable users to produce media output (video, audio, photos) based on defined project templates.
It must be emphasized that we are not producing an all-purpose video or audio editor type tool with a traditional timeline-and-tracks user interface. The goal is to provide a guided experience that helps the user create a quality output with certain standardized characteristics. It might be thought of as the difference between using an all-purpose word processor and a blog posting system.
The templates define a sequence of segments, with a specific set of parameters for each segment. The segment parameters include length, title, filters, transitions, and other data necessary to understand how the segment is to be composed and rendered with the other segments in the sequence. The intent is for templates to be defined in a standard format such as XML or JSON, so that they can be easily created with text editor tools and updated or distributed from a desktop or web server.
Here is an initial concept of what a template JSON description might look like:
{
  "Name": "Interview",
  "Segments": [
    {
      "Name": "Opening Shot",
      "Length": 5,
      "Graphics": ["Lower Third", "Title Card"]
    },
    {
      "Name": "Medium Shot",
      "Length": 15,
      "Processing": "Voice Over"
    }
  ]
}
In this case, the user would be responsible for first loading or recording a five-second clip for the opening shot, and entering text to display either on a title card overlay or as part of a lower-third box. Then they would load or record a fifteen-second clip (or one that would be shortened to fifteen seconds by trimming), and then record a voice-over audio track to play on top of the video.
THIS is the critical capability of the Media Creator app - knowing how to turn a template such as this into a guided set of actions to walk the user through.
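As a rough sketch of how that capability might be implemented, the following Java snippet parses a template like the sample above using Android's built-in org.json classes and turns each segment into a step the guided UI can walk the user through. The class names (SegmentStep, TemplateParser) and the exact key names are hypothetical, chosen only to match the example JSON.

    import org.json.JSONArray;
    import org.json.JSONException;
    import org.json.JSONObject;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical model for one guided step derived from a template segment.
    class SegmentStep {
        String name;        // e.g. "Opening Shot"
        int lengthSeconds;  // target clip length the user must record or trim to
        List<String> graphics = new ArrayList<String>(); // overlays such as "Lower Third"
        String processing;  // optional extra step such as "Voice Over"
    }

    class TemplateParser {
        // Turn a template JSON document into an ordered list of guided steps.
        static List<SegmentStep> parse(String templateJson) throws JSONException {
            JSONObject template = new JSONObject(templateJson);
            JSONArray segments = template.getJSONArray("Segments");
            List<SegmentStep> steps = new ArrayList<SegmentStep>();
            for (int i = 0; i < segments.length(); i++) {
                JSONObject seg = segments.getJSONObject(i);
                SegmentStep step = new SegmentStep();
                step.name = seg.getString("Name");
                step.lengthSeconds = seg.getInt("Length");
                step.processing = seg.optString("Processing", null);
                JSONArray graphics = seg.optJSONArray("Graphics");
                if (graphics != null) {
                    for (int g = 0; g < graphics.length(); g++) {
                        step.graphics.add(graphics.getString(g));
                    }
                }
                steps.add(step);
            }
            return steps;
        }
    }

The guided UI would then simply present the steps in order, prompting the user to record or load a clip of the required length and applying any graphics or processing the segment calls for.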
B. RENDERING: Processing of projects to export rendered media files.
This process is handled primarily by the FFMPEG for Android library, which provides an Android Java API for combining multiple media clips together with specified filters, and exporting them in a specified media format. This library handles video, audio and image processing, in a variety of combinations.
The project content and configuration set by the user is converted into an output media file for export to storage media, local playback, and upload to a remote server. The FFMPEG for Android project (which perhaps should be renamed the Android Open Media Processor?) is the primary library intended to be used to fulfill this capability. Beyond the base configuration of FFMPEG, we will need to add a number of video and audio filters, for actions such as rendering text, drawing filled boxes, crossfading audio, and combining multiple, different video and audio tracks. We should also explore which built-in media processing capabilities of the newer Android operating systems, such as the Stagefright framework, can be taken advantage of.
The types of common media processing tasks the rendering engine is required to perform (a sketch of the corresponding ffmpeg arguments follows the list):
1) Assemble a series of video or audio clips together, with each clip trimmed to a specified start and stop time.
2) Overlay new audio tracks onto an existing video file.
3) Export into a specific format, with a specified bitrate, framerate, or other quality parameters.
4) Provide basic video processing capabilities such as adjusting brightness, contrast or color.
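As a rough, non-authoritative sketch, the Java snippet below shows the kind of ffmpeg argument lists these tasks map to. The file names are placeholders, and the wrapper API used to actually execute the arguments (the FFMPEG for Android library) is not shown, since its exact interface is outside the scope of this note.

    public class FfmpegArgs {

        // Task 1 (per clip): trim a single clip to a specified start time and duration.
        // The trimmed pieces can then be joined with ffmpeg's concat demuxer
        // ("-f", "concat", "-i", "list.txt", "-c", "copy", "assembled.mp4").
        static final String[] TRIM_CLIP = {
            "-y", "-i", "clip1.mp4",
            "-ss", "2", "-t", "15",   // start at 2 seconds, keep 15 seconds
            "-c", "copy",             // no re-encode while trimming
            "trimmed1.mp4"
        };

        // Task 2: overlay a new voice-over audio track on an existing video file,
        // keeping the original video stream untouched.
        static final String[] OVERLAY_AUDIO = {
            "-y", "-i", "trimmed1.mp4", "-i", "voiceover.wav",
            "-map", "0:v", "-map", "1:a",
            "-c:v", "copy", "-shortest",
            "with_voiceover.mp4"
        };

        // Task 3: export to a specific format with specified bitrate and frame rate.
        static final String[] EXPORT = {
            "-y", "-i", "assembled.mp4",
            "-b:v", "1500k", "-r", "30",
            "-b:a", "128k",
            "final.mp4"
        };

        // Task 4: basic image adjustments would be added as ffmpeg video filters
        // (e.g. an eq-style filter passed via "-vf"), once those filters are compiled in.
    }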
On top of all of this rendering, a suitable user interface must be developed to provide feedback to the user. A render will take at least ten minutes, and more likely closer to thirty, depending upon the length of the overall project and the resolution of the source content. An overall progress bar should be presented to the user, as well as an indication of the progress through each step. If possible, the entire rendering process should be moved to a background task, and the progress should be shown in the Android notification bar area. Once rendering is complete, a notification should be displayed that the user can tap to open the media file for playback and optional upload.
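One way to structure that, sketched below under the assumption of an IntentService-based background job, is to post and update a progress notification as each render step finishes. The class name, step count, and the renderStep() placeholder are hypothetical.

    import android.app.IntentService;
    import android.app.NotificationManager;
    import android.content.Context;
    import android.content.Intent;
    import android.support.v4.app.NotificationCompat;

    // Hypothetical background service that runs a render job and reports progress
    // through the notification bar; the actual FFMPEG render call is a placeholder.
    public class RenderService extends IntentService {

        private static final int NOTIFY_ID = 1;

        public RenderService() {
            super("RenderService");
        }

        @Override
        protected void onHandleIntent(Intent intent) {
            NotificationManager nm =
                    (NotificationManager) getSystemService(Context.NOTIFICATION_SERVICE);
            NotificationCompat.Builder builder = new NotificationCompat.Builder(this)
                    .setSmallIcon(android.R.drawable.stat_sys_upload)
                    .setContentTitle("StoryMaker")
                    .setContentText("Rendering project...")
                    .setOngoing(true);

            int totalSteps = 10; // e.g. one step per segment plus the final export
            for (int step = 0; step < totalSteps; step++) {
                // renderStep(step) would invoke the FFMPEG wrapper for this segment.
                builder.setProgress(totalSteps, step, false);
                nm.notify(NOTIFY_ID, builder.build());
            }

            // Replace the progress notification with a tappable "complete" one.
            builder.setOngoing(false)
                   .setProgress(0, 0, false)
                   .setContentText("Render complete - tap to play");
            nm.notify(NOTIFY_ID, builder.build());
        }
    }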
C. PUBLISHING: Handle uploading of produced media to remote media servers, and notification of StoryMaker aggregation service.
Once the media is processed and rendered out to a file, the user must be able to securely upload the file to a remote media hosting service. This upload service must handle working over very high latency, low bandwidth connections (EDGE/2.5G), and support resuming in case an upload does not complete on the first (or fifth) try. The service must also work over Tor if Orbot is available on the device, or otherwise through any active VPN connection that may be running on the device. Needless to say (but I will say it), it must also work over Wifi and 3G. Beyond the upload of the media itself, the upload feature will also ping a configured remote web service with the location of the upload. This web service is used to aggregate all StoryMaker uploads into a monitoring and evaluation platform, where trainers and experienced media creators can provide feedback on the work. The development of that server-side functionality is beyond the scope of this document.
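A minimal sketch of the client side of such an upload is shown below, assuming Orbot's local SOCKS proxy on 127.0.0.1:9050 when Tor is in use, and a hosting service that accepts a byte-offset resume signaled via a Content-Range header. Both the endpoint behavior and the resume convention are assumptions that depend on the chosen media server; the real implementation would also need retry/backoff logic.

    import java.io.FileInputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.URL;

    public class ResumableUploader {

        public void upload(java.io.File media, URL endpoint, long alreadyUploaded, boolean useTor)
                throws java.io.IOException {
            // Route through Orbot's local SOCKS proxy when Tor should be used.
            Proxy proxy = useTor
                    ? new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("127.0.0.1", 9050))
                    : Proxy.NO_PROXY;

            HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection(proxy);
            conn.setDoOutput(true);
            conn.setRequestMethod("PUT");
            conn.setChunkedStreamingMode(8192); // avoid buffering the whole file in memory
            conn.setRequestProperty("Content-Range",
                    "bytes " + alreadyUploaded + "-" + (media.length() - 1) + "/" + media.length());

            FileInputStream in = new FileInputStream(media);
            OutputStream out = conn.getOutputStream();
            try {
                in.skip(alreadyUploaded); // resume: skip the bytes the server already has
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            } finally {
                out.close();
                in.close();
            }
            int status = conn.getResponseCode(); // caller decides whether to retry or resume
        }
    }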
III. Interactive Trainer
A. Overview
Rather than just providing an app for media creation and publishing and offering training sessions or PDF "How to" documents, the intention of StoryMaker from the beginning has been to integrate the education, training, and learning experience deep into the core of the application itself. Think of complex video games that begin by walking you through an initial training level or mission that helps you figure out the controls to jump, turn, dive, activate buttons, and so on. Think of this less as an app with a built-in tutorial, and more as an integrated path the user takes to be taught how to be the most effective storyteller they can be using their smartphone. While the content will have specific technical and technique-oriented aspects, it will also cover media journalism and narrative filmmaking.
On to the technical bits... the Interactive Trainer is more than just a place where "how to" text and graphics are displayed. It is intended to be courseware rather than simple documentation. This means two things: 1) it must keep track of the user's progress through the content, and 2) it must provide the ability to link into the media production app at the appropriate place. If a particular lesson is instructing the user how to create a two minute audio interview for a news radio show, there must be a way to jump right into a new project of the type "Short Interview (Audio)" from the trainer content. This would likely be achieved via custom intents or registered URI handlers within the Android app.
B. Lesson Distribution
Lessons are manifested as bundles of HTML5 content, including JavaScript, bitmap and SVG graphics, stored together as a zip file. Lessons will be stored on a configured server, and the listing of available lessons can be published as a JSON or XML document, perhaps simply an RSS feed that can be subscribed to. The app will be responsible for displaying the list of available lessons and downloading them as required, or as necessary based on the progress of the user through the trainer. When a lesson is opened, it will be unpacked into local storage, either permanently or temporarily based on available space. Every lesson will have a unique GUID assigned to it, so that when lessons are added, updated or removed on the server, the client app can detect the change.
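A minimal sketch of the unpacking step, assuming a per-lesson directory named by the lesson's GUID, might look like the following (the helper class name and directory layout are hypothetical):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    // Unpacks a downloaded lesson bundle (a zip of HTML5, JavaScript and graphics)
    // into a per-lesson directory named by its GUID.
    public class LessonUnpacker {

        public static File unpack(InputStream lessonZip, File lessonsRoot, String lessonGuid)
                throws IOException {
            File lessonDir = new File(lessonsRoot, lessonGuid);
            lessonDir.mkdirs();

            ZipInputStream zis = new ZipInputStream(lessonZip);
            try {
                ZipEntry entry;
                byte[] buffer = new byte[8192];
                while ((entry = zis.getNextEntry()) != null) {
                    // NOTE: entry names should be validated against "../" path traversal
                    // before being written, given the security-by-design goals above.
                    File outFile = new File(lessonDir, entry.getName());
                    if (entry.isDirectory()) {
                        outFile.mkdirs();
                        continue;
                    }
                    outFile.getParentFile().mkdirs();
                    FileOutputStream out = new FileOutputStream(outFile);
                    int read;
                    while ((read = zis.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                    out.close();
                    zis.closeEntry();
                }
            } finally {
                zis.close();
            }
            return lessonDir;
        }
    }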
C. Lesson Display
Once the lesson content is unpacked to local storage, a web browser component within the app will be used to display it. On Android this would be the Webkit-based WebView component, and it will likely be based on the Phonegap-originated Apache Cordova project. In addition, the HTML5 content may be best displayed if it is designed around the jQuery Mobile library, which provides mobile device specific user interface components.
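In its simplest form (before any Cordova integration), displaying an unpacked lesson is a matter of pointing a WebView at the bundle's entry page, as in the sketch below; the "index.html" entry file name is an assumption about how lesson bundles are laid out.

    import android.webkit.WebView;
    import java.io.File;

    public class LessonDisplay {
        static void showLesson(WebView webView, File lessonDir) {
            webView.getSettings().setJavaScriptEnabled(true); // lesson content includes JavaScript
            webView.loadUrl("file://" + new File(lessonDir, "index.html").getAbsolutePath());
        }
    }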
D. Trainer<->Creator Interaction
A set of specific URIs will be defined for the WebView container component to handle, such that the HTML5 lesson content can include them to trigger certain actions. Here are some example/proposed URIs:
The "stmkr://" protocol can be used to ensure there is no confusion with standard http or https links.
"stmkr://project/new/interview": this would indicate that a new project should be opened, using the "interview" template.
"stmkr://camera/overlay/ruleofthirds": this would indicate that the built-in camera should be displayed, with the "Rule of Thirds" overlay graphic.
E. Camera Overlay
A unique feature of this app is that it provides guidance to the user on filming quality video shots by showing an overlay graphic on a live camera view. This might be the outline of a person or buildings, some indication of where subjects might stand, or how much headroom there should be. These overlays do not need to continue while the actual filming is in progress; they are displayed more as a preview or setup mode before the actual filming or photo capture takes place. This means that, most likely, we can provide a CameraSurface-type preview UI with the overlay graphics, and then switch to the built-in camera apps for capture. However, we must also consider implementing our own custom camera for media capture if necessary (say, if the transition from the overlay/preview to actual recording is not fast or smooth enough), though this would significantly increase the complexity and risk of the project.
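A rough sketch of the overlay/preview mode, assuming the era-appropriate android.hardware.Camera API, is shown below: a SurfaceView shows the live camera feed and a semi-transparent ImageView (for example a rule-of-thirds grid) is stacked on top of it in a FrameLayout. The activity name and the overlay resource are placeholders.

    import android.app.Activity;
    import android.hardware.Camera;
    import android.os.Bundle;
    import android.view.SurfaceHolder;
    import android.view.SurfaceView;
    import android.widget.FrameLayout;
    import android.widget.ImageView;
    import java.io.IOException;

    public class OverlayPreviewActivity extends Activity implements SurfaceHolder.Callback {

        private Camera camera;

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);

            FrameLayout root = new FrameLayout(this);
            SurfaceView preview = new SurfaceView(this);
            preview.getHolder().addCallback(this);

            ImageView overlay = new ImageView(this);
            // overlay.setImageResource(R.drawable.rule_of_thirds); // placeholder overlay graphic

            root.addView(preview);
            root.addView(overlay); // drawn above the preview surface
            setContentView(root);
        }

        @Override
        public void surfaceCreated(SurfaceHolder holder) {
            camera = Camera.open();
            try {
                camera.setPreviewDisplay(holder);
                camera.startPreview();
            } catch (IOException e) {
                camera.release();
                camera = null;
            }
        }

        @Override
        public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) { }

        @Override
        public void surfaceDestroyed(SurfaceHolder holder) {
            if (camera != null) {
                camera.stopPreview();
                camera.release();
                camera = null;
            }
        }
    }

From this preview activity the app would then hand off to the built-in camera app (or, if needed, a custom capture implementation) once the user has framed the shot.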