Week 8

July 4, 2018

HELLO!!

This week, I ran some experiments and got some good results. Welcome to Week-8 of Google Summer of Code!

In Brief

Task-2 [Under Discussion]:
  • Open issues for unclear metrics

As with last week, Task-2 is still being discussed on the mailing lists.

Task-3 [Almost Done]:
  • Create new chainable functions and classes to calculate the metrics

    • I know, Almost Done is a weird tag, but it’s true. The new_functions are almost good to go; they will be ready as soon as the PR adding them is merged into manuscripts.
    • This week I cleaned up the old code, tested it as a module (earlier the functions were mere scripts), added comments and function descriptions, and overall made the code pretty :P
    • I’ve removed the sample_metrics.ipynb notebook from the manuscripts2 folder and moved it to the Example folder in the main directory. This way we can treat manuscripts2 as a module, just like manuscripts.
    • I also made a mistake: I forgot to update other files related to the new_functions, so the PR is still waiting to be merged. It has all the necessary files now, so that should happen soon, I hope!
  • Related PR: grimoirelab-manuscripts/pull/67

  • Example metrics: Sample_metrics.ipynb

Once this PR is merged into the main repo, we will start creating the toolchain required to actually generate the reports using these functions. I’ll focus on this in the coming week.
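To show what “chainable” means in practice, here is a minimal, stdlib-only sketch of the pattern: each method narrows or groups the data and returns the object, so calls can be strung together into one readable expression. The `CommitQuery` class and its methods are hypothetical, written only for this illustration; they are not the API in the manuscripts PR, and the real functions work on GrimoireLab data rather than an in-memory list of dicts.

```python
from collections import Counter
from datetime import date


class CommitQuery:
    """Hypothetical chainable query over a list of commit dicts (illustration only)."""

    def __init__(self, commits):
        self._commits = list(commits)
        self._group_field = None

    def since(self, start):
        # Keep commits on or after `start`, then return self so calls can be chained.
        self._commits = [c for c in self._commits if c["date"] >= start]
        return self

    def until(self, end):
        # Keep commits strictly before `end`.
        self._commits = [c for c in self._commits if c["date"] < end]
        return self

    def by_field(self, field):
        # Remember which field the terminal count() should aggregate on.
        self._group_field = field
        return self

    def count(self):
        # Terminal call: a plain total, or a Counter per group if by_field() was used.
        if self._group_field is None:
            return len(self._commits)
        return Counter(c[self._group_field] for c in self._commits)


commits = [
    {"author": "alice", "date": date(2018, 6, 1)},
    {"author": "bob", "date": date(2018, 6, 15)},
    {"author": "alice", "date": date(2018, 7, 2)},
]

print(CommitQuery(commits).since(date(2018, 6, 10)).count())  # 2
print(CommitQuery(commits).by_field("author").count())
```

Returning `self` keeps the sketch short; a real implementation might instead return a fresh object from each call, so that a base query can be reused for several metrics without being modified.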

Task-4 [Closed]:
  • Experiment with visualisations: Plotly, Altair, Seaborn

This task was tagged as closed last week. It was to experiment with different plotting libraries and see which ones we can use.

Task-5 [Under Progress]:
  • Make PRs to add fields into grimoire-elk so as to calculate metrics from enriched indices

I discussed this with Jesus, Valerio and Alvaro, and I’ll be working on adding some more metrics to the GitHub data source in grimoire-elk this week.

Task-6 [Almost Done]:
  • Experiment with Altair

In this task, I had to try out Altair to create interactive HTML charts and static PNG images to be added to a PDF. I managed to create both. You can see a sample HTML file here.

Task-7-A [Completed]:
  • To create PDF of sample static visualisations of the current metrics that manuscripts presents.

I was able to create the PDF using matplotlib and the pandas .plot() method. You can find it here.

Task-7-B [Under Progress]:
  • Create the tool chain to generate the whole report using the new functions

This week, I’ll be writing functions to create the above plots and structuring the report using the toolchain. The above PDF was generated using this notebook.

Other news:
  • With the help of Valerio and Jesus, I submitted a proposal to PyCon India, happening in Hyderabad, India on October 5-6. I am super excited!!

  • The PR I created, which calculates the default start date as the earliest of the first-activity dates across all the data sources, has been merged. Yay! PR Source
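The idea behind that PR fits in a few lines: take the date of the first activity in each data source and use the earliest of them as the report’s default start date. The function name, the dict-based input and the dates below are hypothetical, made up for illustration; the merged code gets these dates from the actual data.

```python
from datetime import date


def default_start_date(first_activity_by_source):
    """Return the earliest first-activity date across all data sources.

    `first_activity_by_source` maps a data source name to the date of the
    first activity found in it (a hypothetical input format).
    """
    if not first_activity_by_source:
        raise ValueError("need at least one data source")
    return min(first_activity_by_source.values())


# Made-up dates, for illustration only.
first_dates = {
    "git": date(2016, 3, 14),
    "github": date(2016, 5, 2),
    "mailing_list": date(2015, 11, 30),
}
print(default_start_date(first_dates))  # 2015-11-30
```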

Okay, time to dive in!


In Detail

Task-6:

I’ll discuss Task-6 first, as it was the most time-consuming of all the tasks. In this task, I had to figure out how to create interactive visualisations in Altair.

  • This HTML file is an example of how interactive charts can be made a part of the manuscripts interactive visualisation tool set.

How to use:

In the graphs, the first plot is a timeseries of all the commits in the Project. The Project here consists of manuscripts, perceval and grimoire-elk. Each commit in the first plot is colored by its author, and hovering over a commit shows its details.

You can select a time period by clicking and dragging on the plot. The chart below then changes to show the number of commits inside the area selected in the graph above. The authors change too, i.e., if an author hasn’t made any commits in that time period, they are removed from the chart below. It’s quite cool!
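In Altair, a brush like this is typically an interval selection that the lower chart applies as a filter. As a library-independent sketch of what that filter computes, the plain Python below keeps only the commits inside the selected time window and recounts them per author; authors with no commits in the window simply drop out, just as they do in the chart. The commit records and the `commits_per_author` helper are hypothetical.

```python
from collections import Counter
from datetime import datetime


def commits_per_author(commits, start, end):
    """Count commits per author whose timestamp falls in [start, end]."""
    selected = (c for c in commits if start <= c["date"] <= end)
    return Counter(c["author"] for c in selected)


commits = [
    {"author": "alice", "repo": "perceval", "date": datetime(2018, 5, 3)},
    {"author": "bob", "repo": "grimoire-elk", "date": datetime(2018, 5, 20)},
    {"author": "alice", "repo": "manuscripts", "date": datetime(2018, 6, 8)},
    {"author": "carol", "repo": "manuscripts", "date": datetime(2018, 6, 25)},
]

# Simulate dragging a brush over June 2018: bob has no commits there, so he disappears.
window = commits_per_author(commits, datetime(2018, 6, 1), datetime(2018, 6, 30))
print(dict(window))  # {'alice': 1, 'carol': 1}
```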

If you mess with the second graph, the first one changes. The second chart has the names of the authors on the X axis and the number of commits made by them on the Y axis. Each author’s commits are split by colour, and each colour represents a repository (if an author hasn’t contributed to a repo, that colour won’t appear in their bar). In the second graph, you can click on a colour and the first graph will adjust itself to show only the commits made to that repository within the project.
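The second chart is a stacked bar chart: one bar per author, split by repository. A plain-Python sketch of the data behind it, and of the filter the timeline applies when a repository colour is clicked, might look like this. The records and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def commits_by_author_and_repo(commits):
    """Return {author: {repo: count}}; repos an author never touched are absent."""
    table = defaultdict(lambda: defaultdict(int))
    for c in commits:
        table[c["author"]][c["repo"]] += 1
    # Convert the nested defaultdicts to plain dicts for readability.
    return {author: dict(repos) for author, repos in table.items()}


def commits_in_repo(commits, repo):
    """The commits the timeline keeps after a repository colour is clicked."""
    return [c for c in commits if c["repo"] == repo]


commits = [
    {"author": "alice", "repo": "perceval", "date": datetime(2018, 5, 3)},
    {"author": "alice", "repo": "manuscripts", "date": datetime(2018, 6, 8)},
    {"author": "bob", "repo": "grimoire-elk", "date": datetime(2018, 5, 20)},
    {"author": "alice", "repo": "manuscripts", "date": datetime(2018, 6, 12)},
]

print(commits_by_author_and_repo(commits))
# {'alice': {'perceval': 1, 'manuscripts': 2}, 'bob': {'grimoire-elk': 1}}
print(len(commits_in_repo(commits, "manuscripts")))  # 2
```

A stacked bar chart then maps the outer keys (authors) to the X axis, the counts to the Y axis, and the inner keys (repositories) to colour.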

Using graphs like these, we can get an in-depth view of our project.


  • The second HTML file is more user-oriented. Its first graph shows the number of commits in a single repo, and the second graph shows the distribution of the commits by user.

Like the first HTML, you can select the time period in the first graph and the users who made commits in that time period will be highlighted in the second graph.

Unlike the first HTML file, if you click on a bar in the second graph (each bar represents an author here), you’ll see the distribution of that author’s commits over time. This can be used to spot active authors and to look at authors’ contribution patterns.
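Clicking an author’s bar effectively filters the commits down to that author and bins them over time. Here is a stdlib sketch of that step. Binning by calendar month is an assumption made for this example, not necessarily the granularity used in the HTML file, and the names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def monthly_commits(commits, author):
    """Count one author's commits per (year, month), in chronological order."""
    months = Counter(
        (c["date"].year, c["date"].month) for c in commits if c["author"] == author
    )
    return sorted(months.items())


commits = [
    {"author": "alice", "date": datetime(2018, 5, 3)},
    {"author": "bob", "date": datetime(2018, 5, 20)},
    {"author": "alice", "date": datetime(2018, 6, 8)},
    {"author": "alice", "date": datetime(2018, 6, 12)},
]

print(monthly_commits(commits, "alice"))  # [((2018, 5), 1), ((2018, 6), 2)]
```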

Although this task looks easy, it took a lot of time to figure out the most basic and small things.

I’ll be writing a separate post describing how to use Altair, what brushes are, and how to create these linked graphs. I can’t do it right now because I am short on time :( Please stay tuned!

Task-7:

Okay, I’ve actually divided Task-7 into two parts.

A

The first part is to figure out how to create graphs for the metrics that manuscripts currently produces. I used matplotlib and pandas to create the graphs. I am using the fpdf library to create the PDF files; it makes the process quite simple, but I am still experimenting with it.

You can check out the PDF here. It was generated using the code in this notebook.

The difficult part will be figuring out the sizes of these plots, as each of them presents different information.
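One way to approach the sizing problem is to scale each saved plot so that it fits the printable area of the page while keeping its aspect ratio. The sketch below only does that arithmetic. The A4 dimensions (210 × 297 mm) are standard, but the 10 mm margin and the `fit_to_page` name are assumptions; the returned values are the kind of width and height in mm one could pass as the `w` and `h` arguments of fpdf’s `FPDF.image()`.

```python
def fit_to_page(img_w_px, img_h_px, page_w_mm=210.0, page_h_mm=297.0, margin_mm=10.0):
    """Scale an image to fit inside the page margins, preserving aspect ratio.

    Returns (width_mm, height_mm). The defaults assume an A4 portrait page with a
    10 mm margin on every side (an assumption; adjust to the report layout).
    Only the image's aspect ratio matters, so its pixel size can be anything.
    """
    max_w = page_w_mm - 2 * margin_mm
    max_h = page_h_mm - 2 * margin_mm
    if img_w_px / img_h_px >= max_w / max_h:
        # Relatively wide image: the page width is the binding constraint.
        return max_w, max_w * img_h_px / img_w_px
    # Relatively tall image: the page height is the binding constraint.
    return max_h * img_w_px / img_h_px, max_h


# A wide 1200x600 px chart is limited by the page width: 190 mm wide, 95 mm tall.
print(fit_to_page(1200, 600))  # (190.0, 95.0)
```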

B

For the second part, I’ve again been reading the code that manuscripts currently uses to produce the reports. I think we can follow a similar pattern. I plan on adding a bin/manuscripts2 file, similar to bin/manuscripts, which will calculate the metrics using the classes in manuscripts/derived_classes.py.

I have to think some more though, so this task is still under progress.


Finishing up, tasks for week-9 are as follows:

  • Try to participate in the discussions regarding GMD and other metrics (Task-2)
  • Jesus will be commenting on the fresh commits I made in the PR adding new functions to manuscripts. (Task-3)
  • Create new files (raw and enriched) in grimoire-elk and open corresponding issues for them. These files will reflect the raw and enriched PR data that will be used to calculate the metrics (Task-5 extended)
  • Task-6 is closed (Altair works for us and will be used in the final interactive charts).
  • Task-7-B is to create the reports for the metrics that manuscripts produces right now, using the new functions.

Read you next week!