How can AI help programmers write better code?

Waren Long, source{d}.

AI & Society - March 4, 2019


About source{d}


  1. MLonCode, Origins & Motivation
  2. Lookout, a Framework for Assisted Code Review
  3. style-analyzer, an Adaptive Code Formatter


Software Development Workflow

At Google

Modern Code Review: A Case Study at Google
  A. Bacchelli et al., 2018

PR Review Comments on GitHub by Topic

dataset extracted from , January 2015

The Machine Learning Perspective

The Alternative Hypothesis

"Programming languages are inherently harder to write and read... so programmers deliberately write code as unsurprising as possible."

"Code (in all languages) is more predictable than natural language because it is more technical and difficult to learn."

On the Naturalness of Software
  P. Devanbu et al., 2016
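
One common way to quantify "predictable" is the cross-entropy of a language model on held-out tokens: fewer bits per token means less surprise. The sketch below is a minimal illustration with an add-one-smoothed bigram model. The function name `bigram_cross_entropy` and the two toy corpora are assumptions made up for this example; they are far too small to prove anything about real code or real prose.

```python
import math
from collections import Counter


def bigram_cross_entropy(train, test):
    """Average bits per predicted token of `test` under an
    add-one-smoothed bigram model estimated from `train`."""
    pairs = list(zip(test, test[1:]))
    if not pairs:
        raise ValueError("test sequence needs at least two tokens")
    vocab_size = len(set(train) | set(test))
    context_counts = Counter(train[:-1])            # how often each token starts a bigram
    bigram_counts = Counter(zip(train, train[1:]))
    bits = 0.0
    for prev, cur in pairs:
        prob = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        bits -= math.log2(prob)
    return bits / len(pairs)


# Hypothetical toy corpora, for illustration only.
code_train = "for i in range ( n ) : print ( i )".split()
code_test = "for j in range ( m ) : print ( j )".split()
text_train = "the quick brown fox jumps over the lazy dog near a quiet river".split()
text_test = "a curious cat sleeps under an old wooden bridge every single night".split()

code_bits = bigram_cross_entropy(code_train, code_test)
text_bits = bigram_cross_entropy(text_train, text_test)
print(f"code: {code_bits:.2f} bits/token, prose: {text_bits:.2f} bits/token")
```

On these toy inputs the code snippet scores lower because its structure repeats even when identifiers change, which is the intuition behind the naturalness claim.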

The Machine Learning Perspective

Software is bimodal

"Source code is bimodal: it combines a formal algorithmic channel and a natural language channel of identifiers and comments. Because the two channels interact, [...] bimodality is a natural fit for machine learning."

RefiNym: Using Names to Refine Types
  E. Barr et al., 2018


When to help?

  • While you type = IDE
  • While you check = CI
  • While you review = PR
  • Periodically, asynchronously
Pros and cons:
  • Part of the workflow
  • More time to run the models
  • Nice UI
  • High precision score required
  • Longer feedback loop


Example of Lookout Comment on GitHub


Push event

Review event
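
The two event types can be pictured as two entry points of an analyzer. The sketch below assumes that a push event is the moment to learn something from the repository and a review event is the moment to comment on changed files. All names in it (`PushEvent`, `ReviewEvent`, `Comment`, `LineLengthAnalyzer`, `on_push`, `on_review`) are hypothetical and are not the Lookout SDK API; the "model" is deliberately trivial.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PushEvent:
    repo: str
    files: Dict[str, str]   # path -> file content at the pushed commit


@dataclass
class ReviewEvent:
    repo: str
    files: Dict[str, str]   # path -> content of files changed in the pull request


@dataclass
class Comment:
    path: str
    line: int
    text: str


class LineLengthAnalyzer:
    """Hypothetical analyzer: learns a per-repository line-length limit."""

    def __init__(self) -> None:
        self.limits: Dict[str, int] = {}

    def on_push(self, event: PushEvent) -> None:
        # "Training": remember the longest line already present in the repository.
        lengths = [len(line)
                   for content in event.files.values()
                   for line in content.splitlines()]
        if lengths:
            self.limits[event.repo] = max(lengths)

    def on_review(self, event: ReviewEvent) -> List[Comment]:
        # "Inference": flag changed lines that exceed the learned limit.
        limit = self.limits.get(event.repo)
        if limit is None:
            return []   # no model yet for this repository
        comments = []
        for path, content in event.files.items():
            for number, line in enumerate(content.splitlines(), start=1):
                if len(line) > limit:
                    comments.append(Comment(
                        path, number,
                        f"Line has {len(line)} characters, "
                        f"existing code stays within {limit}."))
        return comments


analyzer = LineLengthAnalyzer()
analyzer.on_push(PushEvent("demo", {"a.py": "x = 1\ny = 22\n"}))
comments = analyzer.on_review(ReviewEvent("demo", {"b.py": "z = 3\nlong_name = 12345\n"}))
for comment in comments:
    print(f"{comment.path}:{comment.line}: {comment.text}")
```

Separating the two handlers keeps the slow step (learning from the whole repository) out of the review path, which only applies an already-built model.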



  1. Parse to intermediate representation
  2. Train Decision Tree Forest
  3. Extract production rules

Representations of Source Code

Token-level models
→ Raw content

Syntactic models
→ Abstract Syntax Tree (AST)
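
To make the two views concrete, the sketch below shows the same statement as a token sequence and as a list of AST node types. It uses Python's standard `tokenize` and `ast` modules as a stand-in parser, which is an assumption of this example and not the tooling used in the talk; the helper names `token_view` and `ast_view` are hypothetical.

```python
import ast
import io
import tokenize

SOURCE = "a = b * 2\n"

# Token types that carry layout only and no lexeme of interest here.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
          tokenize.INDENT, tokenize.DEDENT}


def token_view(source):
    """Token-level view: the raw lexemes, in source order."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in LAYOUT]


def ast_view(source):
    """Syntactic view: AST node type names in depth-first order."""
    def visit(node):
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from visit(child)
    return list(visit(ast.parse(source)))


print(token_view(SOURCE))   # ['a', '=', 'b', '*', '2']
print(ast_view(SOURCE))
```

The token view keeps every lexeme but no structure; the AST view keeps the structure (an assignment whose value is a binary operation) but drops the exact spelling and spacing of the source.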

Babelfish, a Universal Code Parser

Classes Predicted by style-analyzer

Mixed representation

AST-augmented token stream
with virtual nodes

a = b * 2
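
A rough sketch of the idea for the snippet above: real tokens are kept in order, and the formatting characters between them, which a syntax tree normally discards, are re-inserted as "virtual" nodes. This example makes two assumptions: it uses Python's `tokenize` lexer tokens in place of AST leaves, and the function name `augmented_stream` and the `"VIRTUAL"` label are hypothetical.

```python
import io
import tokenize

SKIPPED = {tokenize.ENDMARKER, tokenize.INDENT, tokenize.DEDENT}
LINE_BREAKS = {tokenize.NEWLINE, tokenize.NL}


def augmented_stream(source):
    """Interleave real tokens with virtual nodes for the whitespace between them."""
    lines = source.splitlines(keepends=True)
    stream = []
    prev_end = (1, 0)   # (row, column) where the previous real token ended
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        # Whitespace gap between the previous token and this one.
        start_col = prev_end[1] if tok.start[0] == prev_end[0] else 0
        gap = lines[tok.start[0] - 1][start_col:tok.start[1]]
        if gap:
            stream.append(("VIRTUAL", gap))
        if tok.type in LINE_BREAKS:
            if tok.string:
                stream.append(("VIRTUAL", tok.string))
        else:
            stream.append((tokenize.tok_name[tok.type], tok.string))
        prev_end = tok.end
    return stream


for kind, text in augmented_stream("a = b * 2\n"):
    print(f"{kind:8} {text!r}")
```

Because every character of the input ends up in exactly one node, concatenating the node texts gives back the original source, so a formatting decision can be modelled as predicting the content of each virtual node.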

Annotated Code Snippet with Style Mistakes

Explainability is key

Generating Production Rules From Decision Trees
  J.R. Quinlan, 1987
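
The first step of turning a decision tree into production rules is to read each root-to-leaf path as one "IF conditions THEN class" rule. The sketch below shows only that path enumeration on a hand-built toy tree; it does not perform the rule simplification described in the cited paper. The classes `Leaf` and `Split`, the feature names, the class labels, and the confidence values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str          # predicted class
    confidence: float   # share of training samples at this leaf with that class


@dataclass
class Split:
    feature: str
    threshold: float
    left: "Node"        # taken when feature <= threshold
    right: "Node"       # taken when feature > threshold


Node = Union[Leaf, Split]
Rule = Tuple[List[str], str, float]


def extract_rules(node: Node, conditions: Tuple[str, ...] = ()) -> List[Rule]:
    """Return one (conditions, label, confidence) rule per root-to-leaf path."""
    if isinstance(node, Leaf):
        return [(list(conditions), node.label, node.confidence)]
    below = conditions + (f"{node.feature} <= {node.threshold}",)
    above = conditions + (f"{node.feature} > {node.threshold}",)
    return extract_rules(node.left, below) + extract_rules(node.right, above)


def format_rule(rule: Rule) -> str:
    conditions, label, confidence = rule
    return f"IF {' AND '.join(conditions)} THEN {label} (confidence {confidence:.2f})"


# Hypothetical toy tree predicting what should follow a token.
tree = Split("prev_is_operator", 0.5,
             Leaf("<none>", 0.90),
             Split("next_is_operand", 0.5,
                   Leaf("<newline>", 0.60),
                   Leaf("<space>", 0.97)))

rules = extract_rules(tree)
for rule in rules:
    print(format_rule(rule))
```

Each printed rule can be shown to a developer next to a flagged line, which is what makes the prediction explainable.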


Machine Learning


~95% weighted avg.

Evaluation improvements

Code as Data and MLonCode Applications


Thank you