Workshop on Research Highlights in Programming Languages

Session 1: Race Detection, Session chair: K. Narayan Kumar

2:00pm – 2:30pm Static Data Race Detection for Android Models and Programs, Abhishek Vijay Uppar, IISc Bangalore

A data race takes place in an execution when when two conflicting accesses to a memory location occur ” in parallel”. Detecting races in Android applications is challenging due to their non-standard concurrent execution model, involving callbacks, looper threads, and posting of tasks to threads. In this work we model Android applications as a class of Event Driven Programs (EDP) characterized by looper threads with FIFO queues, and posting of tasks to these threads. We extend the notion of disjoint blocks introduced in Data races and static-analysis for interrupt-driven kernels by Chopra et al., to propose a static may happen-in-parallel (MHP) analysis for EDP programs. The analysis is based on the key notion of an abstract task graph which models the posting of tasks from one thread to another in an EDP program. We then give a small set of rules based on the abstract task graph of a program, that soundly captures when two statements may not happen-in-parallel.

We have evaluated our analysis manually on fragments of a few Android apps like OpenTracks, Aard2, etc and the precision appears to be good. We are currently in the process of implementing the analysis in the Flowdroid/Soot framework.

2:30pm – 3:00pm Static Race Detection for Embedded Systems, Rekha Pai, IISc Bangalore

Embedded systems are increasingly employed in safety-critical situations and the presence of races may lead to erroneous behaviour with serious consequences. Detection of races is one step towards proving correctness of programs. Two statements are involved in a data race if they are conflicting accesses to a shared memory location and can happen “simultaneously” or one after another. The problem of detection of races is well studied for classical concurrent systems which has standard synchronization techniques (using lock-unlock) and thread interleaving semantics. Embedded systems typically use non-standard synchronization mechanisms like disabling and enabling interrupts, suspending and resuming scheduler or specific threads, dynamically raising and lowering thread priorities, use of flags, etc. in addition to the standard locks. This brings in complex context switching possibilities which makes it challenging to detect races with high precision.

This talk will outline our recent work aimed at detection of races in non-standard concurrent systems. First, the talk will introduce some non-standard concurrent programming models and the challenges it brings to detection of races. This motivates the new definition of race suitable for such systems. The talk will then focus on the novel idea of disjoint blocks that enables detection of races with high precision.

3:00pm – 3:30pm LLOV: A Fast Static Data-Race Checker for OpenMP Programs, Utpal Bora, IIT Hyderabad

Data races are common source of bugs in parallel programs. Though frameworks such as OpenMP makes it easier to specify parallelism in programs, detecting races within such programs still remains a challenging task. There have been multiple data race checkers developed for pthread based parallel programs. However, very little progress has been made in designing and developing static race checkers for high level parallel programming languages and APIs such as OpenMP. This is evident from the scarcity of non-commercial or open source static race checkers for OpenMP programs. We developed a language agnostic, static data race checker for detecting data races within OpenMP programs. Our tool, LLOV, is based on polyhedral compilation techniques. Precise dependence analysis of the polyhedral model also facilitates LLOV to prove data race freedom in OpenMP programs. LLOV is designed to work on the Intermediate Representation (IR) of the LLVM Compiler, thus, making it language independent. We have tested it on C, C++, and Fortran programs. Designing a language independent static data-race checker is challenging even in a well-designed compiler framework like LLVM, as the high level semantics and language subtleties of the input languages and the parallelization directives are not structurally preserved in the IR. We overcome these hurdles in LLOV by a three pronged approach: (i) reconstructing OpenMP pragmas by carefully analyzing properties and patterns in the IR, (ii) extending the existing infrastructure of Polly to compute exact dependencies in parallel programs, and (iii) application of the Integer Set Library (ISL) for polyhedral operations. We show that the precision, accuracy, and the F1 score of LLOV on well-established benchmarks are comparable to the state-of-the-art dynamic race checkers while being orders of magnitude faster.

Session 2: Security, Session chair: Aseem Rastogi

3:45pm – 4:15pm Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities, Yannis Smaragdakis, University of Athens, Greece

Smart contracts on permissionless blockchains are exposed to inherent security risks due to interactions with untrusted entities. Static analyzers are essential for identifying security risks and avoiding millions of dollars worth of damage. We introduce Ethainter, a security analyzer checking information flow with data sanitization in smart contracts. Ethainter identifies composite attacks that involve an escalation of tainted information, through multiple transactions, leading to severe violations. The analysis scales to the entire blockchain, consisting of hundreds of thousands of unique smart contracts, deployed over millions of accounts. Ethainter is more precise than previous approaches, as we confirm by automatic exploit generation (e.g., destroying over 800 contracts on the Ropsten network) and by manual inspection, showing a very high precision of 82.5% valid warnings for end-to-end vulnerabilities. Ethainter’s balance of precision and completeness offers significant advantages over other tools such as Securify, Securify2, and teEther.

4:15pm – 4:45pm Pifthon: A Flow Secure Analyzer for Imperative Programming Languages, Sandip Ghosal, IIT Bombay

Information flow control (IFC) plays a vital role in controlling dissemination of information among mutually distrusted subjects/objects of a program. For the last two decades, researchers have been building tools/techniques for enforcing information flow security in programming languages or certify a program for a given information flow policy (IFP), which specifies constraints for information propagation. However, the use of existing IFC tools/techniques in real application development or certification is far from reality, due to imprecise flow analysis, difficulty in usability, and unsatisfactory declassification techniques. Precision mainly depends on binding mechanisms of labels with the subjects/objects of a program. Majority of the existing IFC techniques follow either static or dynamic labeling. While static labeling often results in false-positive for program certification, pure dynamic labeling could plausibly declare all programs to be flow secure. We have arrived at a dynamic labeling algorithm (DL) that adopts a hybrid labeling (static+dynamic) approach to analyze programs for flow security. Integrating the DL algorithm with the Readers-Writers Flow Model (RWFM), we have built a compile-time flow analyzer called Pifthon, for a dialect of Python. The flow-analyzer aids in identifying possible misuse of the information in a sequential program corresponding to a given IFP. Pifthon captures forward inter-statement implicit flows and backward flows in loop statements and inherits robust declassification from RWFM that is complete w.r.t. to flow policies. Further, Pifthon has distinct advantages like reduced labeling overhead and flow violation detections, making it quite usable and covers a wide range, including termination and progress-sensitive channels. In this talk, we shall present the labeling algorithm as well as the declassification rule that has been embedded in it, and illustrate the tool on several benchmark programs and provide a comparison with that of existing tools/techniques.

4:45pm – 5:15pm Verse: an EDSL for Cryptographic Primitives, Abhishek Dang, IIT Kanpur

Despite impressive work on code optimisation in modern compilers, cryptographic primitives are still being written in low-level languages like C and even assembly. While performance is an important consideration, the unpredictable nature of modern optimising compilers for high level languages leave primitives written in them vulnerable to various side channel attacks. However, in this bargain we have lost out on the safety, portability, and maintainability that comes naturally with a high-level language. We present Verse (Verse: An EDSL for cryptographic primitives – In the 20th International Symposium on Principles and Practice of Declarative Programming. ACM, 2018), a DSL embedded inside Coq for writing cryptographic primitives, which addresses some of these shortcomings. Verse is a low-level language for writing cryptographic primitives albeit with a relatively high-level interface. A Verse programmer can generate Verse code using functions written in Gallina, the functional programming language underlying Coq. This meta-programming facility can be seen as assembler macros and is used to give many utility functions for coding patterns like looping, register caching etc. These features of Verse make the programming experience remarkably high-level. To the programmer, Verse appears to be a typed low-level language, with facility for defining and using assembler macros and strong compile time type safety. The primary motivation of embedding inside Coq, a proof assistant, was to prove functional properties of Verse code in the same environment where the code is written. The parametric nature of the Verse AST and heavy use of the Coq sectioning mechanism in writing the code makes our job of giving a clean interface for proofs hard. Since the publication in PPDP ’18, we have provided the facility to add annotatations encoding program properties in Verse code itself. Extraction and presentation of proof obligations from annotated code is now very usable. We are working on a newer yet monoid based framework for semantics. This would pave the way for even more features like inline machine code, modular code and proofs and even apparently recursive code.

We hope to make a case for an embedded language as the approach towards narrow programming domains intent upon security and verification.