Visiting Mozilla

It has been a few months since I completed my Outreachy internship with Mozilla last summer. I’d like to reflect on the opportunities I was given to visit the Mozilla offices in Paris and San Francisco.

One of the benefits of the Outreachy program is the opportunity to travel during the internship. At the invitation of my mentor Adrian, I was able to do some work from Mozilla’s office in Paris. The time I spent in Paris was very constructive. I met other employees at different stages of their careers, some of whom had been through the internship experience themselves. As well as being able to ask these people for advice, I was able to get a sense of the office culture and a feel for what it is like to work in a technical team.

The opportunity to ask questions and bounce ideas around in person was invaluable. I am a big believer in the value of remote working, having watched many people close to me successfully integrate a productive work life into the rest of their life that way. I do, however, think that being present in person every now and then is very important for building the collaborative relationships upon which remote work relies. During the trip to Paris I got a lot of work done on the Socorro webapp, restructuring it so that panels and graphs could be added more easily: a substantial task that required collaboration.

I am impressed with Mozilla’s improvements to their Outreachy programme: they have expanded the number of participants they take, and they now provide laptops (including retroactively to my cohort) and a travel budget on top of the Outreachy travel budget. I happened to be in San Francisco in the autumn last year and was invited to Mozilla’s office in the city to meet Lonnen, who has also worked on Socorro. As well as putting a face to another name, it was great to feel included and valued.

Since completing the Outreachy programme, I have been working hard on my PhD in theoretical systems biology. The Outreachy experience has helped me improve my technical skills and develop the confidence to interact with the open source community, asking questions, filing issues and submitting patches. I am still exploring different programming languages and packages for analysing data (I am mainly working with Julia at the moment), but I can see that plotting interactive graphs in JavaScript is going to be an important component of my work, and my experience working with Mozilla’s Socorro and Metrics Graphics projects is already proving useful.

My Outreachy internship with Mozilla

About me

I recently completed an Outreachy internship with the Socorro team at Mozilla. I wanted to gain experience working at a high-profile tech company and I chose this internship for a couple of reasons. I liked the sound of the Socorro project because it involved mainly front-end work, and I am sympathetic to the goal that Outreachy is trying to achieve: supporting the employment of women and other underrepresented groups in the tech industry.

I am about to finish a Masters in bioinformatics and will be starting a PhD in the field next term. I am most comfortable working with JavaScript and my most recent university work has included writing statistics modules in Node.js and interactive graph tools using D3.

About Socorro

Socorro collects, processes and stores crash data about Firefox and other Mozilla products, so that problems can be identified and solved quickly. Crash-stats is a webapp built in the Django framework that provides users with a view into the crash data, and this is achieved partly through graphs. My job was to work on the Socorro team, guided by my mentor Adrian Gaudebert, to help maintain and improve crash-stats, focussing particularly on the graphs.

When I started the project, the graphs on crash-stats had been added at different points throughout the site’s history, often in response to a particular request from a user. Several different JavaScript graph libraries were being used:

[Image: the old graphs on crash-stats, drawn with several different libraries]

Many of the graphs were in frequent use, but a few were disused or even broken. The plan was for me to redraw the useful graphs using a single graph library, and then to create an interactive tool for drawing customised graphs, as an early step towards a long-term goal of giving users more control over what data is visualised.

Redrawing the graphs

Once we had identified the useful graphs, my first task was to redraw them using Metrics Graphics, a Mozilla-made graph library specialising in time-series graphs. Metrics Graphics is built on top of D3, a powerful graph library that manipulates the DOM based on data. Drawing a graph directly in D3 entails specifying each element of an SVG, resulting in code that is difficult to maintain, whereas Metrics Graphics provides a convenient API, makes many of the basic design decisions for you and adds interactivity.

We encountered a few bugs, which is to be expected from such a new library, but the great thing about using an open source library was that I was able to discuss issues with the team working on it and make upstream fixes myself where necessary. The team responded really quickly, advising me, merging my fixes and adding the issues I didn’t get round to fixing to their milestones. Here are some examples of the updated graphs:

[Image: examples of the graphs redrawn with Metrics Graphics]

The new graphs are more consistent in terms of colour, font, style and interactive behaviour. I prefer the Metrics Graphics philosophy of emphasising the data, and giving the axes and labels a less overbearing presence so that they can serve their purpose of being for reference rather than a major part of the visuals. This does result in Metrics Graphics having some slightly controversial policies, such as making the auto-generated axes shorter than the plots in some situations; however, they are starting to offer options to overrule such decisions. There is still some work to be done: some of the old graphs are still present and there are a few bugs concerning the axis labels and resizing.

Custom graph tool

Another aim of the project was to create a tool for drawing customised graphs on the new signature page. This page, a replacement for the old signature reports page, contains multiple tabs that allow the user to view the data in different ways. One of these tabs already offered customised data aggregations in table format, and we wanted to add a new tab for making graphs of this aggregated data, broken down per day.

As preparation for this new graphs tab, I reorganised the code for the tabs. The idea was to have a more object-orientated structure, centred around the tab and the panel within the tab, each panel containing a separate, customised visualisation of the data (e.g. a table or graph), according to the user’s parameters. Once the new middleware functionality to support this was implemented by the team, it was then relatively straightforward for me to add a tab for drawing graphs on multiple deletable panels, within this framework. Here is an example of the same data presented in table and graph format:

[Image: the same aggregated data shown as a table and as a graph]
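The tab-and-panel structure can be sketched without any DOM code. This is a minimal illustration rather than the crash-stats code: the names `GraphsTab`, `Panel`, `addPanel` and `removePanel` are hypothetical, and each panel only stores the user's parameters instead of drawing a table or graph.

```javascript
// Hypothetical sketch: a tab owns several independent panels, each holding
// the parameters for one visualisation (a table or a graph).
function Panel(id, params) {
  this.id = id;
  this.params = params; // e.g. { field: 'platform' }
}

function GraphsTab() {
  this.panels = [];
  this.nextId = 1;
}

// Create a new panel from the user's parameters and return it.
GraphsTab.prototype.addPanel = function (params) {
  var panel = new Panel(this.nextId++, params);
  this.panels.push(panel);
  return panel;
};

// Delete one panel without touching the others.
GraphsTab.prototype.removePanel = function (id) {
  this.panels = this.panels.filter(function (panel) {
    return panel.id !== id;
  });
};

var tab = new GraphsTab();
var first = tab.addPanel({ field: 'platform' });
tab.addPanel({ field: 'build_id' });
tab.removePanel(first.id);
```

Giving each panel its own id is what makes panels individually deletable; the tab never needs to know what a panel is displaying.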

We faced a few dilemmas, such as the maximum number of lines to display on a line graph. Aggregating crash data on certain fields can result in tables with many rows, and a graph depicting all of these rows could end up looking quite complicated:

[Image: a line graph with a large number of lines]

We chose to display a maximum of four lines on each graph – depicting the four datasets with the highest number of crashes – and give a summary of the other rows as text below the graph, like this:

[Image: a graph showing the top four datasets, with the remaining rows summarised as text below it]

While higher numbers of crashes may be of greater interest, only showing the top four is potentially restrictive, so we have probably not quite found the ideal solution yet. One improvement could be to allow the user to click on and view any dataset.
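The "top four plus a summary" rule is small enough to sketch directly. This assumes each row has already been reduced to a hypothetical `{term, count}` object, where `count` is that row's total number of crashes; the input values below are invented purely for illustration.

```javascript
// Hypothetical sketch: keep the datasets with the most crashes as graph
// lines and summarise the remainder as text.
function splitTopRows(rows, maxLines) {
  // Sort a copy so the caller's array is left untouched.
  var sorted = rows.slice().sort(function (a, b) {
    return b.count - a.count;
  });
  var rest = sorted.slice(maxLines);
  var otherCrashes = rest.reduce(function (sum, row) {
    return sum + row.count;
  }, 0);
  return {
    lines: sorted.slice(0, maxLines),
    summary: rest.length + ' other rows with ' + otherCrashes + ' crashes in total'
  };
}

var result = splitTopRows([
  { term: 'a', count: 5 },
  { term: 'b', count: 50 },
  { term: 'c', count: 12 },
  { term: 'd', count: 3 },
  { term: 'e', count: 30 },
  { term: 'f', count: 8 }
], 4);
// result.lines has terms b, e, c, f
// result.summary is '2 other rows with 8 crashes in total'
```

Letting the user pick any dataset would only mean replacing the `slice(0, maxLines)` selection with the user's choice; the summary logic could stay the same.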

Profile page

The third and final strand of my internship was to create a profile page for signed-in users that would display user-specific information. This information, which includes a list of permissions and summaries of crash reports and API tokens, is currently being displayed on three separate pages. The aim was to create a new page that would display all this information in one place, with a view to adding more personalised information in the future. The new page is visible here to a logged-in user, but is still being tested so it has not replaced the old pages yet.

There is lots to be done to the profile page in the future, and the team have mentioned plans to track what a user is most interested in and automatically show the most relevant graphs in a dashboard-like display.

Going forward

More generally there is plenty of work to be done on making crash-stats more personalisable, and this is the direction the team intend to take in the future, as summarised in this blog post by Adrian. I’ve heard talk of a move to reinforce the JavaScript, which I think would be a very good idea. The site is full of visual components such as panels and interactive components such as forms, that are mostly implemented separately, meaning that they come with different appearances, different behaviour and different bugs. To really sharpen up the user interface and code base in preparation for creating a more interactive site, it would be great if there could be a more library-like approach to the JavaScript, with one universal table, one universal datepicker, etc.

I think even the new Metrics Graphics graphs could be made more universal: at the moment, every time a line graph is drawn certain Metrics Graphics options are chosen to format the graphs according to preferences that work best for Socorro. It would make more sense to have a universal Socorro line graph that already had those options defined, and only pass in what is unique to each graph, such as the data.
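A universal Socorro line graph could be as simple as a set of shared defaults merged with per-graph options. In this sketch `full_width`, `height`, `x_accessor`, `y_accessor`, `area`, `target` and `data` are Metrics Graphics options, but the default values and the names `SOCORRO_LINE_DEFAULTS` and `drawSocorroLineGraph` are hypothetical.

```javascript
// Hypothetical defaults: the real Socorro preferences are not listed here.
var SOCORRO_LINE_DEFAULTS = {
  full_width: true,
  height: 300,
  x_accessor: 'date',
  y_accessor: 'count',
  area: false
};

// Merge per-graph options over the shared defaults without mutating either.
function socorroLineGraphOptions(options) {
  return Object.assign({}, SOCORRO_LINE_DEFAULTS, options);
}

// Draw with Metrics Graphics when it is loaded (i.e. in the browser),
// and return the merged options either way.
function drawSocorroLineGraph(options) {
  var merged = socorroLineGraphOptions(options);
  if (typeof MG !== 'undefined') {
    MG.data_graphic(merged);
  }
  return merged;
}

var opts = drawSocorroLineGraph({ target: '#graph', data: [], y_accessor: 'ratio' });
```

Each call site then only passes what is unique to its graph, and a change to the shared look is made in one place.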

What I have learnt

Before this internship I would not have had the confidence to make a pull request to a team of people who didn’t know me, but during the internship I found myself contributing to two different Mozilla projects and I am looking forward to making further contributions in the future. Before I started working on Socorro I had never worked with Django, and had only worked with Python to a moderate level. It was a steep learning curve to understand how all the different components were interacting and to learn about concepts such as decorators and mocking, but by the end of the project I had managed to make a new Django app.

I’d like to thank the Socorro team for helping, teaching and encouraging me, and particularly Adrian, who was a very attentive and approachable mentor: he met with me every week, travelled to London and Paris to work alongside me and refused to let me make him a cup of tea on my first day, because “that’s not what interns are for”.

Final words about Outreachy

I hear that Mozilla are making some improvements to their Outreachy programme, which I welcome. I would advocate for Outreachy interns to be given all the same privileges as Mozilla interns, including a Mozilla account, which would have allowed me to participate in team meetings and integrate much better. While I appreciate that Outreachy has a lower barrier to entry than the Mozilla internship, I think it’s dangerous to have an outreach programme with fewer privileges and lower expectations, because it automatically categorises the people who are struggling to be employed in the industry as less employable. An outreach programme should get people from underrepresented groups through the door, and then give them the same opportunities as everyone else.

Mozilla could do more to raise awareness of the Outreachy programme among its employees. The Socorro team were wonderful and had had Outreachy interns before, but quite a few people I encountered had never heard of Outreachy. Having to explain to them that I was not a Mozilla intern reinforced the idea that I was a bit of an outsider: the very problem that Outreachy is trying to address.

These issues aside, I had a really great experience and learnt a huge amount, and from what I hear there are some very positive changes coming: I hope these will give future Outreachy interns the same experience as Mozilla interns.

Weeks 11-12

In addition to making new features, I have been fixing bugs related to these new features. There has been a steady trickle of these bugs, some of which have been introduced alongside a new feature, some of which had already existed in an old feature and a couple of which have required upstream fixes in the graph library, Metrics Graphics. One of the latter was quite interesting, so here it is in detail.

One of my colleagues filed a bug about a graph that was displaying some data about the ratio of crashes each day over a couple of weeks. There was a graph and a table that were supposed to be showing the same data, but for some reason they were different. In the table, the ratio of crashes increased dramatically on 6 August:

[Screenshot: the table, with the spike on 6 August]

But in the graph the spike was on 5 August:

[Screenshot: the graph, with the spike on 5 August]

First I tried to reproduce the problem, but when I did, I saw this graph, which agreed with the table:

[Screenshot: the graph as I saw it, with the spike on 6 August]

Strange. How were the graphs being made? The data came from our Python middleware, with each point on the graph being represented by two numbers: one for the ratio and the other for the date, as a time stamp (in this case an integer representing the number of seconds since 1 January 1970, in UTC time). Our JavaScript then passed this data on to Metrics Graphics, but first it converted the dates into JavaScript Date objects, as required by Metrics Graphics. (The value of a Date object is also a time stamp.) Finally Metrics Graphics was drawing the graph, using the library D3 to convert the ratios and time stamps into the pixel coordinates of each point, and to draw suitable axes.
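The conversion step in our JavaScript amounts to very little code. A minimal sketch, assuming the middleware returns `[ratio, seconds]` pairs (the real response layout may differ); the function name `toGraphPoints` and the ratio values are invented for illustration.

```javascript
// Hypothetical input shape: each point from the middleware is
// [ratio, secondsSinceEpoch], with the time stamp in UTC.
function toGraphPoints(points) {
  return points.map(function (point) {
    return {
      ratio: point[0],
      // Date counts milliseconds since the epoch, so multiply seconds by 1000.
      date: new Date(point[1] * 1000)
    };
  });
}

var graphPoints = toGraphPoints([[0.02, 1438732800], [0.09, 1438819200]]);
graphPoints[1].date.toISOString(); // '2015-08-06T00:00:00.000Z'
```

The Date objects hold exactly the same instant everywhere; nothing in this step depends on the viewer's time zone.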

Our respective graphs were being drawn using the same data, with the dates encoded as the same numbers. So why were they different, and what was so special about me that made the graphs work for me but no-one else?

I’m British, that’s what.

It was a time zone issue. The dates in the table were the dates according to UTC time, as were the JavaScript dates we were handing over to Metrics Graphics. But in order to plot the points and draw the axes, D3 was taking the time stamps and converting them to local time, which in the UK is the same as UTC time. (Actually since it’s summer, the UK time and UTC time differ by 1 hour, but this was not enough to affect the dates here. The times on this particular graph are set to midnight, however, so even my graph was slightly wrong: the points should have been aligned with the ticks on the x-axis.)
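The same effect can be reproduced in plain JavaScript, with no graph library involved: one Date gives a different calendar day depending on whether its UTC fields or its local fields are read. This is an illustrative sketch, not the Metrics Graphics or D3 code, and `dayLabel` is a hypothetical helper.

```javascript
var MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Label the calendar day of a Date, reading either its UTC fields or the
// fields for the machine's local time zone.
function dayLabel(date, useUTC) {
  var day = useUTC ? date.getUTCDate() : date.getDate();
  var month = useUTC ? date.getUTCMonth() : date.getMonth();
  return day + ' ' + MONTHS[month];
}

// Midnight UTC on 6 August 2015 (months are zero-based, so 7 is August).
var midnightUTC = new Date(Date.UTC(2015, 7, 6));

dayLabel(midnightUTC, true);  // '6 Aug' in every time zone
dayLabel(midnightUTC, false); // '5 Aug' anywhere west of UTC, such as California
```

East of UTC (including the UK in summer) the local reading still lands on 6 August, which is why the bug was invisible from London.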

The solution was for Metrics Graphics to offer a new option to interpret the time stamps in UTC instead of local time. So now everyone sees this:

[Screenshot: the corrected graph, with the spike on 6 August]

Weeks 9-10

The page I have been working on is just a single page with some tabs, but like all Django apps it has models, views and URLs (all Python), HTML templates, and some CSS to make it look pretty. It also has some JavaScript to control switching between tabs and give interactivity to the content within each tab. This post is about refactoring some of that JavaScript.

Over the weeks I have added two new tabs to the signature page. The functions to load each tab were values of an object, like this:

var tabFunctions = {};

tabFunctions.coolTab = function () {
    // Code for loading this tab
};

tabFunctions.anotherCoolTab = function () {
    // Code for loading this tab
};

When the user clicks on the tab named “coolTab”, tabFunctions.coolTab gets called, and to add a new tab, you add a new function to tabFunctions. (Plus some additional Python, HTML and CSS, but let’s ignore that.) The advantage of doing it this way is that it’s flexible, but the downside is that quite a lot of code is repeated between the different functions in tabFunctions. After I added the first tab, I decided to refactor the code to make it easier to add the second tab.
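The click handling that goes with this object can be sketched as a lookup by tab name. The handler `onTabClick` and the `loaded` array are hypothetical stand-ins; the real code responds to DOM events and loads each tab's content over AJAX.

```javascript
var tabFunctions = {};
var loaded = [];

tabFunctions.coolTab = function () {
  loaded.push('coolTab'); // stand-in for the code that loads this tab
};

tabFunctions.anotherCoolTab = function () {
  loaded.push('anotherCoolTab');
};

// Hypothetical click handler: look up the tab's loading function by name.
function onTabClick(tabName) {
  var loadTab = tabFunctions[tabName];
  if (typeof loadTab !== 'function') {
    throw new Error('No loading function for tab: ' + tabName);
  }
  loadTab();
}

onTabClick('coolTab');
```

The lookup itself never changes when a tab is added, which is the flexibility described above; the repetition lives inside the individual functions.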

The tabs have a lot in common but also differ in some ways, so it made the most sense to create a generic Tab object, which could be extended by coolTab, anotherCoolTab, etc. Each tab consists of a panel that has a heading, a div for controls (so that the user can control the content of the tab) and a div for the content:

<section class="panel">
    <header>
        <h2></h2>
    </header>
    <div class="body">
        <div class="controls"></div>
        <div class="content"></div>
    </div>
</section>

And each tab needs to be able to:

  • Load its controls
  • Load its content initially
  • Load more specific content as specified by the user
  • Show itself (when it is clicked on)
  • Hide itself (when another tab is clicked on)

Loading controls and loading content should be kept separate because when new content is loaded in response to the user setting the controls, you don’t want to reload the controls too. So far the generic tab looks something like this:

var Tab = function (heading) {
    this.heading = heading;
    this.alreadyLoaded = false;
    // Make all the elements and append them together
};

Tab.prototype.loadControls = function () {
    // Make the controls (e.g. a select with some options)
    // Bind an event to call loadContent (e.g. when the user chooses a new option)
};

Tab.prototype.loadContent = function () {
    // Load the content, with either default or user-specified parameters
};

Tab.prototype.showTab = function () {
    if (!this.alreadyLoaded) {
        this.loadControls();
        this.loadContent();
        this.alreadyLoaded = true;
    }
    // Then show the tab
};

Tab.prototype.hideTab = function () {
    // Hide the tab
};
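To check the load-once behaviour of `showTab`, the skeleton can be made runnable outside the browser. In this sketch the `log` array and `visible` flag are hypothetical stand-ins for the real DOM manipulation.

```javascript
var Tab = function (heading) {
  this.heading = heading;
  this.alreadyLoaded = false;
  this.visible = false;
  this.log = []; // records calls in place of real DOM work
};

Tab.prototype.loadControls = function () {
  this.log.push('loadControls');
};

Tab.prototype.loadContent = function () {
  this.log.push('loadContent');
};

Tab.prototype.showTab = function () {
  // Controls and initial content are loaded only the first time.
  if (!this.alreadyLoaded) {
    this.loadControls();
    this.loadContent();
    this.alreadyLoaded = true;
  }
  this.visible = true;
};

Tab.prototype.hideTab = function () {
  this.visible = false;
};

var tab = new Tab('Graphs');
tab.showTab();
tab.hideTab();
tab.showTab();
// tab.log is ['loadControls', 'loadContent']: nothing was reloaded.
```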

To make a specific tab, a new object can be defined that inherits the prototype of the generic Tab object and adjusts the functions loadControls and loadContent. However, it is in loadContent where a lot of the code is repeated, so doing it this way would still result in a lot of unnecessary work. In fact loadContent essentially does three things that can differ slightly between different tabs:

  • Gets parameters (default for the initial load, user-specified for subsequent loads)
  • Builds a URL, which encodes these parameters and encodes which view this tab talks to
  • Makes an AJAX request based on this URL. This is the same for every tab, except for the function that is called if the request is successful.

So loadContent should be adjusted and three new methods added:

Tab.prototype.getParameters = function () {
    // Get and return the parameters
};

Tab.prototype.buildURL = function (params) {
    // Build and return the URL
};

Tab.prototype.onAjaxSuccess = function (data) {
    // Add the new content (the data returned by the request) to the tab
};

Tab.prototype.loadContent = function () {
    var params = this.getParameters();
    var url = this.buildURL(params);
    // Make AJAX request that uses url and calls this.onAjaxSuccess if successful
};

The three new methods should do enough to load a basic tab, but can be altered if a specific tab needs specific behaviour. With this structure, a tab that generates its URL in a slightly unusual way can simply alter buildURL, rather than needing a whole new loadContent function, for example.
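Putting the pieces together, a specific tab can inherit from the generic one with `Object.create` and override only `buildURL`. A sketch under stated assumptions: the URL paths and parameters are hypothetical, and unlike the skeleton above, `loadContent` here takes a `request(url, onSuccess)` function as a parameter in place of a real AJAX call, so that it runs outside the browser.

```javascript
// Generic tab: loadContent is fixed, its three steps can be overridden.
var Tab = function (heading, viewPath) {
  this.heading = heading;
  this.viewPath = viewPath;
  this.content = null;
};

Tab.prototype.getParameters = function () {
  return { days: 7 }; // hypothetical default parameters
};

Tab.prototype.buildURL = function (params) {
  var query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return this.viewPath + '?' + query;
};

Tab.prototype.onAjaxSuccess = function (data) {
  this.content = data;
};

// request(url, onSuccess) stands in for the real AJAX call.
Tab.prototype.loadContent = function (request) {
  var params = this.getParameters();
  var url = this.buildURL(params);
  // bind keeps `this` pointing at the tab inside onAjaxSuccess.
  request(url, this.onAjaxSuccess.bind(this));
};

// A specific tab that only changes how its URL is built.
var GraphsTab = function (heading) {
  Tab.call(this, heading, '/signature/graphs/');
};
GraphsTab.prototype = Object.create(Tab.prototype);
GraphsTab.prototype.constructor = GraphsTab;

GraphsTab.prototype.buildURL = function (params) {
  // Reuse the generic URL and add one extra parameter.
  return Tab.prototype.buildURL.call(this, params) + '&field=platform';
};

// Fake request function for trying this outside the browser.
var requestedURLs = [];
function fakeRequest(url, onSuccess) {
  requestedURLs.push(url);
  onSuccess({ rows: [] });
}

var graphsTab = new GraphsTab('Graphs');
graphsTab.loadContent(fakeRequest);
// requestedURLs[0] is '/signature/graphs/?days=7&field=platform'
```

The `bind` call matters: passing `this.onAjaxSuccess` on its own would lose the tab as `this` once the request calls it back.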

One final layer of complexity was that some tabs behaved differently, for example some loaded graphs while others loaded tables and some had inner panels while others did not. In general this was an exercise in finding the right balance between keeping the generic tab simple but repeating code for the specific tabs, and having configurations and lots of “if” statements complicating the generic tab but keeping the specific tabs simple.

Weeks 7-8

In my previous post I discussed making graphs, or rather redrawing existing graphs using a new library. Since I was only redrawing the graphs, I only discussed the final step for making a graph: feeding the data to the graph library. I have also been making new graphs from scratch, however, which is a bit more complicated. I’ll first describe how a graphing tool works in crash-stats:

1. First some parameters are chosen (either by the user, or by whoever set the default parameters), such as dates of crashes, products that crashed, versions of those products, etc.

[Screenshot: what the user sees during step 1]

2. These parameters are encoded in a URL and sent off to the server. Since we are using Django, the URL also specifies which Django view to send our parameters to.

[Screenshot: what the user sees during steps 2–4]

3. The view function specified by the URL is called. The view does some processing of the parameters and sends them to the middleware API. The view then returns the data it gets back from the middleware as a JSON object.

4. The JavaScript processes this returned data into the format required by the graph library.

5. Now we are ready to do everything in my previous post: hand the data over to Metrics Graphics and receive back a beautifully drawn graph.

[Screenshot: what the user sees after step 5]

So, code-wise, making a new graphing tool involves writing: a form so the user can choose parameters, an AJAX request to be performed when the form is submitted, a new Django view (which will (a) process the parameters, (b) talk to the middleware and (c) return the data), and some functionality to format the data for the graph, draw the graph, and add it to the DOM.
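Two of the client-side pieces, encoding the parameters into a URL and formatting the returned JSON for the graph, might look roughly like this. The view path, the parameter names and the `{"YYYY-MM-DD": count}` response shape are all assumptions for illustration, as are the counts.

```javascript
// Encode parameters (including repeated ones) into a query string for a view.
function buildURL(viewPath, params) {
  var query = new URLSearchParams();
  Object.keys(params).forEach(function (key) {
    // [].concat turns a single value into a one-element array.
    [].concat(params[key]).forEach(function (value) {
      query.append(key, value);
    });
  });
  return viewPath + '?' + query.toString();
}

// Turn a hypothetical JSON response of the form {"YYYY-MM-DD": count}
// into the array of {date, count} objects that a line graph expects.
function toLineData(json) {
  return Object.keys(json).sort().map(function (day) {
    // The explicit 'Z' makes each date midnight UTC rather than local time.
    return { date: new Date(day + 'T00:00:00Z'), count: json[day] };
  });
}

var url = buildURL('/signature/graphs/', {
  product: 'Firefox',
  version: ['40.0', '41.0b1']
});
// '/signature/graphs/?product=Firefox&version=40.0&version=41.0b1'

var lineData = toLineData({ '2015-08-02': 7, '2015-08-01': 4 });
// lineData[0].count is 4: the days are sorted before plotting.
```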

A final note about testing Django views, in particular mocking. The idea of the test is that you check that your view does everything it is supposed to: that it processes the parameters and sends them in the correct form to the middleware, and that it processes the data that it gets back from the middleware correctly. What you don’t want to do here is test the middleware, so you define a function that mocks the call to the middleware. Inside this function you can check that your parameters have been passed through correctly, and you can also ask it to return some data in place of what the middleware would have returned, and then check that the view processes it correctly.
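The real tests are written in Python against Django views; to keep to one language, here is the same idea sketched in JavaScript. Instead of patching, the view receives its middleware call as an argument, so a test can pass in a mock. The view, its parameters and the response shape are all hypothetical.

```javascript
// The view takes the middleware call as an argument, so a test can swap it.
function signatureGraphView(params, callMiddleware) {
  var response = callMiddleware({
    signature: params.signature,
    // Hypothetical processing step: split a comma-separated product list.
    product: params.products.split(',')
  });
  // Hypothetical processing of the middleware's answer.
  return {
    total: response.hits.reduce(function (sum, hit) {
      return sum + hit.count;
    }, 0)
  };
}

// The mock records what the view sent and returns canned data.
var mockCalls = [];
function mockMiddleware(apiParams) {
  mockCalls.push(apiParams);
  return { hits: [{ count: 3 }, { count: 4 }] };
}

var result = signatureGraphView(
  { signature: 'example::signature', products: 'Firefox,Thunderbird' },
  mockMiddleware
);
```

A test then checks both directions: that `mockCalls[0]` holds correctly processed parameters, and that `result` is the correct processing of the canned response.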

Weeks 5-6

This past fortnight I have been getting well and truly stuck into graphs. As you can imagine, there are many graphs on the crash-stats site, displaying various data about browser crashes. At the moment they all look quite different and have different degrees of interactivity, but my task is to redraw them using a single graph library.

I’m not entirely new to making graphs. At university I recently did a project using the JavaScript graph library D3. The project involved drawing interactive chord diagrams to help researchers investigate relationships between genes. D3 was great for this, because it gave me lots of control, meaning that I could easily respond to requests from researchers to change the appearance or add new functionality.

The graphs I have been making at Mozilla are more straightforward, such as bar charts and line graphs, and the problem with using D3 for these is that you have to do a lot of work to achieve something that has been done many times before. So we chose to use MetricsGraphics, a Mozilla-made graph library that uses D3 but hides the complicated mechanics behind a simple API. Of course you pay for this by reducing the control you have over your graphs, but for straightforward graphs this doesn’t matter too much.

To illustrate my point, here is a worked example of a multi-line graph that’s very similar to one I was working on this week. I have altered the data in honour of today’s strike action by train drivers on the London Underground (apologies to any browser-crash-data enthusiasts reading this). I compare the salary of a train driver on the London Underground to that of a doctor, to help clear up some of the questions that were raised in today’s media about the respective salaries for these two professions. First of all, here’s the graph:

[Screenshot: the finished multi-line graph of projected salaries]

And now I’ll show you how easy it was to make. For a multi-line graph, we need two HTML elements: one for the graph and one for the legend explaining which line is which:

<div id="graph" style="width: 600px; height: 200px;"></div>
<div id="legend" style="width: 600px; text-align: center;"></div>

The tube driver salary data (taken from this BBC article) and the doctor salary data (taken from the NHS careers and British Medical Association websites) are structured as an array of data-point objects for each line on the graph:

var graphData = [
    [
        {'date': '2015', 'salary': 49673},
        {'date': '2043', 'salary': 49673}
    ],
    [
        {'date': '2015', 'salary': 22636},
        {'date': '2016', 'salary': 28076},
        {'date': '2024', 'salary': 75249},
        {'date': '2025', 'salary': 77605},
        {'date': '2026', 'salary': 79961},
        {'date': '2027', 'salary': 82318},
        {'date': '2028', 'salary': 84667},
        {'date': '2032', 'salary': 90263},
        {'date': '2038', 'salary': 95860},
        {'date': '2043', 'salary': 101451}
    ]
];

var legend = ['Train driver', 'Doctor'];

All that is left to do is to draw the graph, which involves calling the graph drawing function and passing in an object with some options. Notice how simple this call is, compared to drawing a multi-line graph from scratch in D3.

MG.data_graphic({
    title: 'Projected salaries for a train driver on the London Underground and a doctor over 30 years',
    data: graphData,
    full_width: true,
    target: '#graph',
    x_accessor: 'date',
    y_accessor: 'salary',
    interpolate: 'basic',
    area: false,
    legend: legend,
    legend_target: '#legend'
});
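For a sense of what that options object saves you, here is one of the chores that drawing the same graph directly in D3 would involve: working out the range of each axis across all the lines, so that both lines share one scale. This is a plain JavaScript illustration, not Metrics Graphics internals; `extentAcrossLines` is a hypothetical helper, and the data is just the first and last points of each line above.

```javascript
// Find the smallest and largest value of one field across every line.
function extentAcrossLines(lines, accessor) {
  var values = [];
  lines.forEach(function (line) {
    line.forEach(function (point) {
      // Number() also handles the year strings used for 'date'.
      values.push(Number(point[accessor]));
    });
  });
  return [Math.min.apply(null, values), Math.max.apply(null, values)];
}

var lines = [
  [{ date: '2015', salary: 49673 }, { date: '2043', salary: 49673 }],
  [{ date: '2015', salary: 22636 }, { date: '2043', salary: 101451 }]
];

extentAcrossLines(lines, 'date');   // [2015, 2043]
extentAcrossLines(lines, 'salary'); // [22636, 101451]
```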

It’s not always exactly this straightforward: it’s sometimes been a bit tricky to get the data into the correct format, and I’ve run into a few problems with automatic positioning of axis labels and ticks, but there is no doubt that this library and others like it make life a whole lot easier.
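One typical formatting chore: the dates in the salary example are year strings, whereas a time axis works on Date objects. A minimal sketch in plain JavaScript, assuming each year should sit at 1 January in UTC; `yearsToDates` is a hypothetical helper.

```javascript
// Convert the 'date' field of every point from a year string to a Date
// at midnight UTC on 1 January of that year, leaving the input untouched.
function yearsToDates(lines) {
  return lines.map(function (line) {
    return line.map(function (point) {
      return {
        date: new Date(Date.UTC(Number(point.date), 0, 1)),
        salary: point.salary
      };
    });
  });
}

var converted = yearsToDates([
  [{ date: '2015', salary: 49673 }, { date: '2043', salary: 49673 }]
]);
converted[0][1].date.getUTCFullYear(); // 2043
```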

Weeks 3-4

Over these last two weeks I have been getting stuck into a more substantial task: adding to the signature page. The signature page is where crash hunters can go to find out more information about a particular type of crash. It is organised into tabs, each showing different data, often with a form for further filtering. My task was to add a tab and display a particular graph in it.

Most of my time was spent getting my head around the flow of data between the client, the Django backend and the database. Instead of writing a long essay about what I did (I’ve written enough of those recently), I’ve made a diagrammatic representation of how the graph tab works. (It’s not quite ready to go live yet – there are a few things to iron out, mainly to do with getting the data from the database.)

Incidentally, the graph is drawn using MetricsGraphics.js, a Mozilla-made graph library based on D3 that specialises in time-series graphs. Worth checking out.