Empower developers with ML-assisted code review

Waren Long, source{d}.

AI Meetup - April 18th, 2019

About source{d}

sourced.tech

Plan

  1. ML on Code: Origins & Motivation
  2. Lookout
  3. style-analyzer

ML on Code

Software Development Workflow

codacy.com/blog

At Google

Modern Code Review: A Case Study at Google
  A. Bacchelli et al. 2018

25 million PR Review Comments on GitHub

dataset

The Alternative Hypothesis

"Programming languages are inherently harder to write and read... so programmers deliberately write code as unsurprising as possible."

"Code (in all languages) is more predictable than natural language because it is more technical and difficult to learn."

On the Naturalness of Software
  P. Devanbu et al. 2016

Software is bimodal

"Source code is bimodal: it combines a formal algorithmic channel and a natural language channel of identifiers and comments. Because the two channels interact, [...] bimodality is a natural fit for machine learning."

RefiNym: Using Names to Refine Types
  E. Barr et al. 2018

Lookout

When to help?

  • While you type = IDE
  • While you check = CI
  • While you review = PR
  • Periodically, asynchronously
  • Part of the workflow
  • More time to run the models
  • Nice UI
  • High precision score required
  • Longer feedback loop

Goals

Example of Lookout Comment on GitHub

Architecture

Push event

Review event

style-analyzer

Approach

  1. Parse to intermediate representation
  2. Train Decision Tree Forest
  3. Extract production rules
  4. Generate fixes from mismatched predictions
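
Step 4 can be sketched as follows. This is a minimal illustration, not style-analyzer's actual code: `Gap`, `Fix`, `propose_fixes` and `toy_model` are hypothetical names, and the hand-written rule stands in for the trained forest. The idea is only that a fix is proposed wherever the predicted formatting label differs from the one observed in the file.

```python
from typing import Callable, List, NamedTuple


class Gap(NamedTuple):
    """A formatting decision point between two tokens (hypothetical type)."""
    prev_token: str
    next_token: str
    observed: str  # label found in the file, e.g. "space", "newline", "noop"


class Fix(NamedTuple):
    """A proposed change: replace `observed` with `predicted` at `index`."""
    index: int
    observed: str
    predicted: str


def propose_fixes(gaps: List[Gap], predict: Callable[[Gap], str]) -> List[Fix]:
    """Return one fix per gap where the model disagrees with the file."""
    fixes = []
    for i, gap in enumerate(gaps):
        predicted = predict(gap)
        if predicted != gap.observed:
            fixes.append(Fix(i, gap.observed, predicted))
    return fixes


def toy_model(gap: Gap) -> str:
    """Stand-in for the trained model: one hand-written rule (assumption)."""
    if gap.prev_token == "if" and gap.next_token == "(":
        return "space"
    return gap.observed  # no opinion, so agree with the file


gaps = [
    Gap("if", "(", "noop"),   # the file contains `if(`
    Gap("(", "x", "noop"),
    Gap(")", "{", "space"),
]
fixes = propose_fixes(gaps, toy_model)
print(fixes)
```

Only the first gap is flagged, because it is the only place where the stand-in rule and the file disagree.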

Representations of Source Code

Token-level models
→ Raw content

Syntactic models
→ Abstract Syntax Tree (AST)

doc.bblf.sh/architecture
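
To make the two representations concrete, here is a small sketch using Python's standard `tokenize` and `ast` modules on a one-line Python snippet. This is an assumption made for illustration: the slides refer to Babelfish for parsing, and Python's own tokenizer and parser are used here only as a stand-in.

```python
import ast
import io
import tokenize

source = "total = price * 2\n"

# Token-level view: a flat sequence of (type, text) pairs over the raw content.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)
]

# Syntactic view: a tree whose nodes describe structure rather than spelling.
tree = ast.parse(source)
node_types = [type(node).__name__ for node in ast.walk(tree)]

print(tokens)
print(node_types)
```

The token list keeps the order and spelling of the source, while the tree exposes that the statement is an assignment whose value is a binary operation.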

Classes Predicted by style-analyzer

whitespace
tabulation
newline
␣+/␣-   whitespace indentation increase/decrease
→+/→-   tabulation indentation increase/decrease
'/"     single/double quotes
NOOP    empty gaps between non-label nodes
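
A rough sketch of how the whitespace between two tokens might be mapped onto these classes. The function `gap_labels` and the label strings (`"space"`, `"tab+"`, `"noop"`, ...) are hypothetical, the quote classes are left out, and the real tool may encode indentation changes differently.

```python
from typing import List


def gap_labels(gap: str, prev_indent: int) -> List[str]:
    """Turn the whitespace between two tokens into a list of class labels.

    `gap` is assumed to contain only spaces, tabs and newlines.
    `prev_indent` is the indentation width of the previous line.
    """
    if not gap:
        return ["noop"]  # empty gap between two tokens
    if "\n" not in gap:
        return ["tab" if ch == "\t" else "space" for ch in gap]

    labels = ["newline"] * gap.count("\n")
    indent = gap.rsplit("\n", 1)[1]  # whitespace after the last newline
    # Simplification: an empty indent is treated as space-based.
    unit = "tab" if indent.startswith("\t") else "space"
    delta = len(indent) - prev_indent
    if delta > 0:
        labels += [unit + "+"] * delta      # indentation increase
    elif delta < 0:
        labels += [unit + "-"] * (-delta)   # indentation decrease
    return labels


print(gap_labels("", 0))
print(gap_labels("\n    ", 0))
print(gap_labels("\n", 4))
```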

Feature Extraction

Annotated Code Snippet

function classesToArray(value) {
    if (isArray(value)) { return value; }
    if (typeof value ===␣"string"␣) {
        return value.match(rnothtml) || [];
    }
    return [];
}

Explainability is key

Generating Production Rules From Decision Trees
  J.R. Quinlan, 1987
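
The core idea of turning a tree into rules can be sketched by enumerating root-to-leaf paths: each path becomes one IF ... THEN rule. This is only the first step of the cited approach, which also simplifies the resulting rules, and the tree, feature names and thresholds below are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Condition = Tuple[str, str, float]  # (feature, operator, threshold)
Rule = Tuple[Tuple[Condition, ...], str]  # (conditions, predicted label)


@dataclass
class Node:
    """Either an internal split (feature, threshold) or a leaf (label)."""
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when feature <= threshold
    right: Optional["Node"] = None   # branch taken when feature > threshold
    label: Optional[str] = None


def extract_rules(node: Node, path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """Collect one rule per leaf by walking every root-to-leaf path."""
    if node.label is not None:
        return [(path, node.label)]
    return (
        extract_rules(node.left, path + ((node.feature, "<=", node.threshold),))
        + extract_rules(node.right, path + ((node.feature, ">", node.threshold),))
    )


def render(path: Tuple[Condition, ...], label: str) -> str:
    """Format a rule as readable text."""
    body = " AND ".join(f"{f} {op} {t}" for f, op, t in path) or "TRUE"
    return f"IF {body} THEN {label}"


# Hypothetical tree: put a space between `if` and `(`, otherwise nothing.
tree = Node(
    "prev_is_if", 0.5,
    left=Node(label="noop"),
    right=Node(
        "next_is_paren", 0.5,
        left=Node(label="noop"),
        right=Node(label="space"),
    ),
)

rules = [render(path, label) for path, label in extract_rules(tree)]
for rule in rules:
    print(rule)
```

Because each rule is a short conjunction of conditions, it can be shown to a developer alongside the suggested fix.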

Rules

Machine Learning

Evaluation

~95% weighted avg.
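
The slide does not say which metric is averaged. Assuming "weighted avg." means a per-class score averaged with weights proportional to each class's number of samples, the arithmetic looks like this. The scores and counts below are invented purely to show the calculation and are not results from the talk.

```python
from typing import Dict


def weighted_average(scores: Dict[str, float], support: Dict[str, int]) -> float:
    """Average per-class scores, weighting each class by its sample count."""
    total = sum(support[c] for c in scores)
    return sum(scores[c] * support[c] for c in scores) / total


def macro_average(scores: Dict[str, float]) -> float:
    """Plain mean over classes, ignoring how frequent each class is."""
    return sum(scores.values()) / len(scores)


# Made-up numbers for illustration only.
scores = {"space": 0.9, "newline": 0.8, "noop": 1.0}
support = {"space": 10, "newline": 10, "noop": 80}

weighted = weighted_average(scores, support)
macro = macro_average(scores)
print(round(weighted, 2), round(macro, 2))
```

With imbalanced classes the weighted figure is pulled toward the score of the most frequent class, which is worth keeping in mind when reading a single headline number.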

Evaluation improvements

Code as Data and ML on Code Applications

Thank you