Late Nights at the Scriptorium: Interim Results from the Interface Cell of the MONK Project
Stéfan Sinclair, Andrew Macdonald, Matthew Bouchard, Mike Plouffe, Alejandro Giacometti, Amit Kumar, Milena Radzikowska, Stan Ruecker, Piotr Michura, Carlos Fiorentino, Matthew Kirschenbaum, and Catherine Plaisant
The Metadata Offer New Knowledge (MONK) Project is an attempt to leverage emerging text mining, text analysis, and text visualization technologies for use by humanities scholars. Led by PIs John Unsworth and Martin Mueller, the MONK team consists of over 35 researchers at 7 universities in the United States and Canada. The project is organized around five core research areas, or cells, dedicated respectively to data, analytics, users and use cases, collaboration, and interfaces. In this presentation, we will concentrate on several aspects of the work being carried out by the interface cell.
This will be in some senses a project report, describing the current state of our activities, but we also intend to summarize the insights we have gained that we believe may be of benefit to other projects that involve the development of online tools.
Although excellent in many ways, OpenLaszlo (OL) turned out, at least at that time, to provide limited handling of text formatting, since it compiles into a Flash object.
Some of the recurring priorities for the MONK technologies are as follows:
- attractive and slick
- works fast
- capable of animation
- capable of scaling for showing many items
- incremental loading of content
- no download required
- able to host or interact with other technologies (Java, Flash, etc.)
- decent control over typography
- XML or JSON support for data interchange
- ability to simulate state
- ability to log (for testing, user undo, and possibly collaboration)
To help with the evaluation, we attempted to carry out test cases at two levels. First, where it was appropriate to do so (for example, with technologies that include display to the end user), we tried three tests: create a tree structure; display text with non-Roman characters; and produce a simple 3-D animated visualization with different colors. These tests demonstrated some basic functionality and allowed us to determine whether things that we knew we needed to do would be harder or easier in a particular environment.
We also accepted examples that had been done by others, provided that they could satisfy any of the test cases.
We surveyed and experimented with a wide range of potential solutions, including OpenLaszlo, Flash, Adobe Apollo, Adobe Flex, Echo2, GWT, DWR, ZUL, YUI, Ext JS, jQuery, MooTools, jMaki, and RAP.
We also included on our list several solutions that we didn’t find time to address explicitly. These were Prototype, Dojo, script.aculo.us, Tapestry, and ICEfaces.
Among the most significant distinctions were the following:
- client-side vs. server-side management
- embedded objects (Flash, applet, etc.) vs. native browser support (DHTML & AJAX)
- level of programming language (given our current personnel)
As we developed the idea of a MONK workbench, we narrowed our choices down to two main options:
- RAP, an Eclipse-based server-side framework for building rich web applications
The goal of the next part of the presentation is to advocate, albeit cautiously, for the development and use of proxies for large, distributed projects. By “proxy,” we mean a set of calls that the interface developers can use to obtain data in a standardized format.
Whether the data is real or not is a secondary issue, since the proxy exists to isolate the interface developers from issues at the back end. In fact, it can almost be guaranteed that at the beginning of the project, much of the data available to the interface designers will be faked at the proxy, while as the project progresses, iteratively more and more of the fake data will be replaced by access to the real thing.
Although by its nature a proxy layer is technical, its purpose is therefore primarily managerial. By means of this device, we have attempted on the MONK project to isolate the working environments of the different cells, in order to avoid the situation where the critical path for one cell leads through another cell, and the dependent group could potentially be left waiting. From the perspective of project management, a proxy layer seems to be a necessary condition for the success of a distributed team. However, there are a number of factors that complicate the situation.
First, a successful proxy needs to meet a number of technical criteria. It should be in a form where the calls for data are quick and easy to implement, since a typically iterative design process will require an ever-increasing number of different kinds of data. Once implemented, the details of the individual calls should be held stable, since the whole point of the proxy is that the interface developers can work within an environment that doesn’t shift too much on them. A good proxy needs, therefore, to be both flexible and rigid. It should also be easily accessible from the back end, so that swapping in real data for fake data is not problematic. Finally, it should be the consistent one-stop shop for all data needs. This is particularly difficult to ensure in cases where a number of proxy-like calls are available from the native back-end technology. There is always a temptation to make exceptions, and require the interface developers to access data directly from a source. The problem with succumbing to that temptation is that when the data source changes, the interface needs to be modified to accommodate the change.
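The combination of stable calls and swappable data sources described above can be illustrated with a minimal Python sketch. It is not the MONK implementation: the `Proxy` class, the `document_list` call name, and the XML payloads are all hypothetical, chosen only to show how a call can keep its name while the data behind it changes from fake to real.

```python
from typing import Callable, Dict

# Hypothetical handler type: takes call parameters, returns an XML string.
Handler = Callable[[dict], str]


class Proxy:
    """Single entry point through which the interface requests data."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, call: str, handler: Handler) -> None:
        # Registering again under the same name swaps the data source
        # without changing the call that the interface uses.
        self._handlers[call] = handler

    def get(self, call: str, **params) -> str:
        if call not in self._handlers:
            raise KeyError(f"unknown proxy call: {call}")
        return self._handlers[call](params)


def fake_document_list(params: dict) -> str:
    # Canned XML, standing in for data that the back end cannot yet supply.
    return "<documents><document id='d1' title='Placeholder'/></documents>"


def real_document_list(params: dict) -> str:
    # Stand-in for a real back-end query; it only formats a local list here.
    rows = [("d1", "Sample A"), ("d2", "Sample B")]
    items = "".join(f"<document id='{i}' title='{t}'/>" for i, t in rows)
    return f"<documents>{items}</documents>"


proxy = Proxy()

# Early in the project: the interface is developed against fake data.
proxy.register("document_list", fake_document_list)
early = proxy.get("document_list")

# Later: the same call is served by the real source, with no interface change.
proxy.register("document_list", real_document_list)
later = proxy.get("document_list")
```

In this sketch the interface code only ever calls `proxy.get("document_list")`, so replacing the handler touches nothing on the interface side. A direct call to `real_document_list` would be the kind of exception the paragraph above warns against.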
The MONK proxy consists of a set of URLs that return XML. We decided to use XML in part to make troubleshooting easier for human readers, although for performance reasons we are converting most of the data to JSON format at the interface. Eventually, it may become useful to provide JSON directly from the proxy.
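As a rough illustration of the XML-to-JSON step, the following Python sketch converts a small XML response into JSON using only the standard library. The element names and the shape of the resulting JSON are assumptions made for the example; the actual MONK responses and the conversion code used at the interface are not described here and may differ.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element: ET.Element) -> dict:
    """Convert an element into a dict of tag, attributes, text, and children."""
    node = {"tag": element.tag, "attributes": dict(element.attrib)}
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_text: str) -> str:
    """Parse an XML string and serialize the resulting structure as JSON."""
    root = ET.fromstring(xml_text)
    return json.dumps(element_to_dict(root), ensure_ascii=False)


# Hypothetical proxy response, invented for this example.
sample = "<documents><document id='d1'>Sample text</document></documents>"
converted = json.loads(xml_to_json(sample))
```

The mapping chosen here (explicit `tag`, `attributes`, `text`, and `children` keys) is only one of several possible conventions. It is verbose but loses no structure, which may matter less once the proxy can provide JSON directly.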
The final part of this talk will describe our work to date on the tools, tool sets, and workbench. The purpose of the workbench design is to provide maximum flexibility in the selection and reuse of components, together with maximum support and encouragement for the novice user. The main canvas space contains tool sets that each consist of all the tools necessary to accomplish a particular task (Figure 1). So, for example, the FeatureLens stack will contain all the components of FeatureLens. The Search by Example stack will contain all the components necessary to do supervised classification. The networkgraphiny stack holds all the tools required to perform a social network analysis of a text and visualize the results. Each tool set has a unique and distinctive visual reference to the primary analytic of that tool set. To launch a tool set, the user would either drag it into the socket named “drag n’drop a tool set to start,” or else double-click the tool set.
We recognize in MONK that a great deal of functionality can be provided simply by being able to search and sort across collections, although we also seek to make some fairly sophisticated processes available, such as the D2K supervised classification. We also hope to be able to create visualizations that will help to provide various forms of interactive prospect for both the processes and the results. Our strategy in designing tools is therefore to develop in some cases more than one tool that can perform similar functions. The user will then be able to choose a basic, enhanced, or experimental tool at stages in the process where those options are available. For example, we currently have on our list half a dozen tools that provide various ways to search and sort items, either within or across collections. The simplest of these is a hierarchical tree browser, much like the folder system on the desktop. The most sophisticated include the MONK implementation of the mandala browser, and also a faceted browser that dynamically arranges text tiles according to the available metadata about the documents.
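The core operation behind a faceted browser, grouping documents by a chosen metadata field, can be sketched in a few lines of Python. This is an illustrative assumption about how such grouping might work, not a description of the MONK tools; the field names and sample records are invented.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_facet(documents: List[dict], facet: str) -> Dict[str, List[str]]:
    """Group document ids by the value of one metadata field."""
    groups = defaultdict(list)
    for doc in documents:
        # Documents lacking the field are collected under "unknown".
        value = str(doc.get(facet, "unknown"))
        groups[value].append(doc["id"])
    return dict(groups)


# Invented sample metadata records.
documents = [
    {"id": "d1", "genre": "novel", "decade": 1810},
    {"id": "d2", "genre": "play", "decade": 1810},
    {"id": "d3", "genre": "novel"},
]

by_genre = group_by_facet(documents, "genre")
by_decade = group_by_facet(documents, "decade")
```

Re-running the grouping with a different field is what would let a browser of this kind rearrange its tiles dynamically as the user switches facets.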
Figure 1. This MONK workbench sketch shows a variety of tools grouped into toolsets, allowing users to approach the workbench from the perspective of a particular kind of task. Some tasks we hope to support include search by example (also known as supervised classification, using either Naïve Bayes or Support Vector Machines), working with timelines, and studying patterns of relationships.
We also wanted to support expert users who may choose to construct new tool sets, as well as modify existing ones. To make that possible, there is a “new toolset” option that would be launched like any other tool set. There is also a complete list of all tools, grouped by type, across the bottom of the screen. The current list of tools is as follows, although whether we will be able to include all of them in the initial MONK release remains to be seen:
- Select documents using a list
- Select documents using columns
- Select documents using facets
- Select documents using magnets
- Select random documents
- Select documents using a hyper-graph
- View results
- Read text
- Search by example
- Search by taxonomy
- Explore feature list
- Explore feature overview
- Explore relationships
- Construct a timeline
- Visualize sonic colouring
- Explore geography
- Manage project
- Manage workset
- Re-visit workbench history
- View statistics
- View graphs
Sinclair, Stéfan, Andrew Macdonald, Matthew Bouchard, Mike Plouffe, Alejandro Giacometti, Amit Kumar, Milena Radzikowska, Stan Ruecker, Piotr Michura, Carlos Fiorentino, Matthew Kirschenbaum, and Catherine Plaisant. “Late Nights at the Scriptorium: Interim Results from the Interface Cell of the MONK Project.” Panel presented at the Society for Digital Humanities/ Société pour l’étude des médias interactifs annual conference at the 2008 Congress of the Social Sciences and Humanities, University of British Columbia. June 2-3, 2008.