Workshop on Research Highlights in Programming Languages

Session 1: Program Synthesis/Machine Learning, Session chair: Deepak D’Souza

 

3:30pm – 4:00pm        Models and Programs: Better Together, Sriram Rajamani, Microsoft Research India

Over the past decade, we have seen ubiquitous adoption of Machine Learning (ML) models across many different application domains. In this talk, we explore the connections between ML models and programs, and argue that there are significant advantages in combining ML models with programmatic representations. We present results from our work across many different Machine Learning perspectives in support of this hypothesis, including Supervised Learning, Bayesian Learning and Reinforcement Learning. In each case, we show advantages in constructing human readable and interpretable programmatic representations of ML models, and in combining tools and techniques from program analysis and program synthesis with machine learning. We conclude with opportunities in using programming language techniques to make ML models efficient, robust, interpretable, and verifiable.

4:00pm – 4:30pm        Quantified Invariants via Syntax-Guided Synthesis, Sumanth Prabhu S, TCS Research Pune and IISc Bangalore

Automated reasoning about programs with arrays is an important problem as arrays are ubiquitous data-structure. However, this often requires invariants that are quantified universally over ranges of array elements. Such invariants are difficult to find and difficult to prove inductive. In this talk, I present our recent algorithm to generate universally quantified invariants, which is based on enumerative search.  The algorithm discovers the invariant in three stages. First, it discovers a range for the invariant by exploiting program syntax. Then, it identifies useful properties about individual elements of arrays and generalizes to the entire range to get candidate invariants. Finally, it filters the wrong candidates by using heuristics and SMT solver. The implementation of the algorithm advances state-of-the-art on a wide range of array-handling programs.

4:30pm – 5:00pm        Resilient Abstraction-Based Controller Design, Stanly Samuel, IISc Bangalore

We consider the computation of resilient controllers for perturbed non-linear dynamical systems w.r.t. linear-time temporal logic specifications. We address this problem through the paradigm of Abstraction-Based Controller Design (ABCD) where a finite state abstraction of the perturbed system dynamics is constructed and utilized for controller synthesis. In this context, our contribution is twofold: (I) We construct abstractions which model the impact of occasional high disturbance spikes on the system via so called disturbance edges. (II) We show that the application of resilient reactive synthesis techniques to these abstract models results in closed loop systems which are optimally resilient to these occasional high disturbance spikes. We have implemented this resilient ABCD workflow on top of SCOTS and showcase our method through multiple robot planning examples.

5:00pm – 5:30pm        IR2Vec: LLVM IR based Scalable Program Embeddings, S. Venkata Keerthy, IIT Hyderabad

In order to overlook the complexities arising out of NP-Completeness, modern compilers use expert-determined heuristics for most of the optimizations. As such heuristics are not designed to give optimal results for all the scenarios, there is an increased scope for machine learning approaches to solve such problems in order to obtain better results. In order to design such a solution, it is important to represent programs in a form that is amenable to learning.

In this work, we propose IR2Vec, a Concise and Scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. It is obtained by combining representation learning methods with flow information to capture the syntax as well as the semantics of the programs. As our infrastructure is based on the Intermediate Representation (IR) of the source code, obtained embeddings are both language and machine independent. The entities of the IR are modeled as relationships, and their representations are learned to form a seed embedding vocabulary. Using this infrastructure, we propose two incremental encodings: Symbolic and Flow-Aware. Symbolic encodings are obtained from the seed embedding vocabulary, and Flow-Aware encodings are obtained by augmenting the Symbolic encodings with the flow information.

We show the effectiveness of our methodology on two optimization tasks (Heterogeneous device mapping and Thread coarsening). Our way of representing the programs enables us to use non-sequential models resulting in orders of magnitude of faster training time. The encodings generated by IR2Vec outperform the existing methods in both the tasks. In particular, our results improve or match the state-of-the-art speedup in 11/14 benchmark-suites in the device mapping task across two platforms and 53/68 benchmarks in the Thread coarsening task across four different platforms. When compared to the other methods, our embeddings are more scalable, is non-data-hungry, and has better Out-Of-Vocabulary characteristics.

Session 2: Software Faults, Testing, and Re-engineering, Session chair: Akash Lal

 

5:45pm – 6:15pm        Diagnosing Software Faults Using Multiverse Analysis, Prantik Chatterjee, IIT Kanpur

Spectrum-based Fault Localization (SFL) approaches aim to efficiently localize faulty components from examining program behavior. This is done by collecting the execution patterns of various combinations of components and the corresponding outcomes into a spectrum. Efficient fault localization depends heavily on the quality of the spectra. Previous approaches, including the current state-of-the-art Density- Diversity-Uniqueness (DDU) approach, attempt to generate “good” test-suites by improving certain structural properties of the spectra. In this work, we propose a different approach, Multiverse Analysis, that considers multiple hypothetical universes, each corresponding to a scenario where one of the components is assumed to be faulty, to generate a spectrum that attempts to reduce the expected worst-case wasted effort over all the universes. Our experiments show that the Multiverse Analysis not just improves the efficiency of fault localization but also achieves better coverage and generates smaller test-suites over DDU, the current state-of-the-art technique. On average, our approach reduces the developer effort over DDU by over 16% for more than 92% of the instances. Further, the improvements over DDU are indeed statistically significant on the paired Wilcoxon Signed-rank test.

6:15pm – 6:45pm        Monolith Decomposition based on Business Functionality Inference, Shivali Agarwal, IBM Research India

The problem of extracting microservices automatically from monolithic applications has seen much interest in recent years. Most of the proposed techniques rely on finding partitions of the source code that align with technical components or with the implicit modules in the existing code structure. While this is acceptable for software modularization and reverse engineering purposes, splitting microservices is better done on the basis of business functionality rather than technical components. In this talk, we will cover a novel approach for monolith decomposition, that maps the implementation structure of a monolith application to a functional structure that in turn can be mapped to business functionality.

We use formal concept analysis, used in literature for feature identification, in a novel way to infer the functional structure of the monolith application. This is preceded with an abstraction of the monolith code into a novel set of code features that capture the functionality relevant code structure and flows, and are largely agnostic of the programming language. These code features are used by the proposed functional decomposition technique. The approach then applies a clustering technique that groups classes together into three types of partitions: 1) candidate microservices, 2) a utility class group, and 3) a group of classes that require significant refactoring to enable a clean microservice architecture. The clustering technique attempts to maximize cohesion and minimize coupling both at the implementation structure level and at the functional structure level. This results in microservice candidates that are naturally aligned with the different business functions exposed by the application, while also exploiting implicit implementation boundaries in the monolithic code.

We have evaluated our approach on multiple Java and .NET applications against three baseline techniques. We will present the results on open benchmark applications like DayTrader and demonstrate the candidate microservice advisor tool released by us for decomposing enterprise legacy applications.

6:45pm – 7:15pm        Flaky Tests: Some Results and Research Challenges, Darko Marinov, UIUC, USA

Testing is the most common approach in practice to check software. Regression testing checks software changes. A key challenge for regression tests is to detect software bugs and fail when a change introduces a bug, signaling to the developer to fix it. However, an emerging challenge is that the tests must also _not_ fail when there is _no_ bug in the change. Unfortunately, some tests, called flaky, can non-deterministically pass or fail on the same software, even when it has no bug. Such flaky tests give false alarms to developers about the existence of bugs. A developer may end up wasting time trying to fix bugs not relevant to the recent changes the developer made. I will present some work done by my group and our collaborators to alleviate the problem of flaky tests. I will also discuss some remaining challenges, especially for researchers in programming languages.