Waren Long, source{d}.
Nantes ML Meetup - July 1st, 2019
 
        "Source code is bimodal: it combines a formal algorithmic channel and a natural language channel of identifiers and comments. Because the two channels interact, [...] bimodality is a natural fit for machine learning."
Earl Barr
fMRI scans of skilled programmers show NLP parts of the brain active when reading code
Decoding the representation of code in the brain
  B. Floyd et al. 2017
"Programming languages are inherently harder to write and read... so programmers deliberately write code as unsurprising as possible."
"Code (in all languages) is more predicatble than natural language because it more technical and difficult to learn."
Prem Devanbu at ML4P
On the natualness of Software
  A. Hindle et al. 2012
On the natualness of Software
  A. Hindle et al. 2012
Modeling Vocabulary for Big Code Machine Learning
  R. Robbes et al. 2019
 
		| Natural Language | Code | 
|---|---|
| I shot an elephant in my pyjamas | 
                         | 
|  |  | 
 
        
        
     
         
         
        • Token neighbors
• AST-node neighbors
• AST paths
 
        
    src-d/gemini, source{d}
 
        Public git archive: a big code dataset for all
  V. Markovtsev et al. 2018
 
         
        
     
        code2vec: Learning Distributed Representations of Code
 Alon et. al. 2018
 
        
        
     
         
        
            Assert.NotNull(clazz)
        
        
        
     
        
        
     
        
            def sum_positive(arr, lim):
              sum = 0
              for i in range(lim):
                  if arr[i] > 0:
                      sum += arr[i]
              return sum
        
        
        ~900 nodes/graph
~8k edges/graph
The Graph Neural Network Model
F. Scarselli et. al. 2009
 
         
        The Graph Neural Network Model
F. Scarselli et. al. 2009