With this latest release, Webrecorder introduces a new way to enable users to capture technically complex web sites: Autopilot.
Autopilot can perform actions on the web page currently loaded in Webrecorder much as a human user would: clicking buttons, scrolling down, expanding sections, and so forth. It does so via “behaviors,” carefully written scripts that are adapted to the specific design and structure of certain web sites. To start with, the Webrecorder team has prepared website-specific behaviors for a few popular sites, listed in our guide, and we hope to expand this list in the future.
When users visit a page for which a website-specific behavior is available, the Autopilot button (located near the upper right corner of the screen) is highlighted in blue. On all other pages, Autopilot is still available but runs the “Default Scrolling” behavior, which performs more general actions that may work on any page, currently scrolling and pressing “play” on embedded video and audio.
Once Autopilot is launched, users can watch it perform the chosen behavior, which runs until it completes, the web site stops loading new content, or the user intentionally stops it.
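To make the “Default Scrolling” behavior and its stop conditions more concrete, here is a minimal Python sketch. It is not Webrecorder's code: real behaviors run as scripts inside the browser, and `SimulatedPage`, `default_scrolling`, and the `max_steps` safety limit are hypothetical names chosen for illustration. The simulated page grows each time it is scrolled to the bottom, mimicking sites that load more content lazily, and the loop ends when the page stops growing, when a stop signal arrives, or when the step cap is reached.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class SimulatedPage:
    """Hypothetical stand-in for a browser tab; not part of Webrecorder."""
    height: int
    # Each entry is (extra_height, new_media) revealed by one lazy-load step.
    batches: List[Tuple[int, List[str]]]
    media: List[str] = field(default_factory=list)
    played: Set[str] = field(default_factory=set)
    scroll_y: int = 0

    def scroll_to_bottom(self) -> None:
        # Jump to the current bottom; the page may then load another batch.
        self.scroll_y = self.height
        if self.batches:
            extra_height, new_media = self.batches.pop(0)
            self.height += extra_height
            self.media.extend(new_media)

    def play_media(self) -> None:
        # "Press play" on every embedded video/audio element seen so far.
        self.played.update(self.media)


def default_scrolling(
    page: SimulatedPage,
    should_stop: Callable[[], bool] = lambda: False,
    max_steps: int = 1000,
) -> int:
    """Scroll until the page stops growing, the user stops it, or max_steps is hit.

    Returns the number of scroll steps performed.
    """
    page.play_media()
    steps = 0
    while steps < max_steps and not should_stop():
        previous_height = page.height
        page.scroll_to_bottom()
        steps += 1
        page.play_media()  # newly loaded content may contain new players
        if page.height == previous_height:
            break  # no new content appeared: the behavior is complete
    return steps


page = SimulatedPage(height=1000, batches=[(500, ["clip-2"]), (500, [])], media=["clip-1"])
print(default_scrolling(page), page.height, sorted(page.played))
# prints: 3 2000 ['clip-1', 'clip-2']
```

The hypothetical `max_steps` cap is a simple guard against pages that keep loading content indefinitely; a real behavior would also have to wait for network activity to settle between scrolls, which this model ignores.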
Expanding human-scale web collecting
Webrecorder’s goal for web archiving has always been to create “high-fidelity” copies of web pages that closely resemble the originals, including their interactive features. The most logical way to achieve this is to let Webrecorder users capture web pages simply by interacting with sites as they are normally meant to be used. However, capturing many of the resources that benefit from this high-fidelity approach meant that web archivists had to click through as many as hundreds of buttons or images, keeping track of each item, to gather all the information they were seeking. This process is, of course, tedious and error-prone.
Autopilot behaviors simulate these human-like interactions on a carefully chosen group of popular sites, and Autopilot works at a human-like pace. While it doesn’t rake in gigabytes of information per minute or branch out to hundreds of web pages in parallel, it does capture important details and context with little to no manual work.
It’s important to note that this process can never be perfect, especially given that most of the web sites being collected undergo technical changes on a regular basis:
As the example of Twitter rolling out a new interface illustrates, an Autopilot behavior might have been created for an older version of a web site and then fail to work with the newly released design. For this reason, each behavior displays the date on which it was last known to be working. As always, users can report errors and issues via the bug button in the Webrecorder user interface, including any Autopilot features that do not work as expected.
Also, every behavior has been created with certain assumptions about what a high-fidelity copy of a web page should contain. These parameters are spelled out in the behavior’s description. We hope that the initial set matches what our users would like to capture, and we welcome feedback via firstname.lastname@example.org.
Finally, Autopilot works within the technical constraints imposed by web browsers, as well as by how each web site operates. For instance, a social media site that shows a user account has been active since 2010 might not allow Autopilot to actually scroll back to that beginning. Limits are also to be expected for very active accounts that contain several thousand posts, tweets, or comments. It is important to understand that Autopilot is not a full substitute for the API-driven access or data export features of social media sites that are frequently employed for “big data” use cases. Instead, it specializes in capturing the web from the perspective of a regular user.
The future of Autopilot
Autopilot is a framework for running behaviors, built with the understanding that updates and expansions will be needed on an ongoing basis. As web sites are constantly redesigned, and web browsers add support for new features while deprecating others, behaviors will require constant attention and tweaks to remain useful to Webrecorder users.
Webrecorder’s goal is to expand Autopilot’s capabilities and keep behaviors working for as long as possible, ideally with the help of open source contributors. We hope to encourage the community to help us create additional website-specific behaviors, as we won’t be able to do it all ourselves! In the coming weeks, we will publish more resources detailing the technical infrastructure of our new behavior system. In the meantime, if you are interested in how the behavior system works, take a look at our preliminary documentation.
Maintenance and growth will require a lot of resources, including bandwidth and skilled labor, so at some point in the future Autopilot usage will have to be metered and may be offered at some financial cost to users. For the feature’s introduction, Webrecorder.io offers Autopilot to all users with few constraints.
Thanks to our testers!
The Webrecorder team is grateful to our amazing group of testers who have given Autopilot a spin and provided invaluable feedback: Penny Baker, Lisa Barrier, Helene Brousseau, Sumitra Duncan, Steven Gentry, Kathryn Gronsbell, Anisa Hawes, Stephen Klein, Caspar Lam, Amye McCarther, Christie Moffatt, Genevieve Milliken, Sami Norling, Andrea Puccio, and Alex Thurman.