2012 GSoC LSB projects

Go back to the main GSoC Linux Foundation page

Mailing list: lsb-discuss at lists dot linux-foundation dot org

IRC: #lsb on irc.freenode.net

Workgroup Resources

Code License: GPL/BSD, specs: GNU FDL

Mentors: Alexey Khoroshilov (alexey dot khoroshilov at gmail dot com), Jeff Licquia, Vladimir Rubanov, Denis Silakov (dsilakov at gmail dot com).

Add Dynamic Tests to the Linux Application Checker

Linux Application Checker (AppChecker) is a Web-based tool for analyzing dependencies (libraries and interfaces externally required) of application packages. In particular, AppChecker is used to assure that application uses only LSB interfaces. However, the major disadvantage of the tool at the moment is that it performs only static analysis of required symbols and libraries and just checks if these dependencies are included in LSB.

A more powerful test would be to assure that application uses library functions in the proper way (e.g., doesn't pass NULLs as parameter values if this is forbidden). To be sure, it is impossible to verify all aspects of run-time function invocation, but some checks still can be made. The LSB project has a tool called DynChk which is able to test applications at run-time - see the Linux Symposium report for details.

DynChk is a useful tool, but it covers only several base libraries. In addition, it is a pure command-line tool which is not always convenient in case of large and complex reports.

The aims of this project are:

  • improve DynChk by adding more tests and covering more libraries
  • integrate DynChk into the Linux AppChecker

Desired knowledge: Perl, C, MySQL

Mentors: Denis Silakov, Vladimir Rubanov

Update LSB OLVER Core Test Suite

LSB OLVER core test suite is a part of LSB certification tests. Originally it has been developed for LSB 3.1. LSB specification as well as implementations under test evolved significantly since that time. So the goal of the project is to update the tests, i.e.

  • to fix issues appearing on modern Linux distributions;
  • to accomodate all relevant changes in text of specifications.

Extention of the tests for new LSB Core interfaces is also welcome.

LSB Core 3.1 standard covers the following libraries: libc, libcrypt, libdl, libm, libpthread, librt, libutil, libpam, libz, libncurses.  The interfaces of these libraries are implemented by glibc, Linux-PAM, zlib and ncurses upstream components. OLVER contains tests for 1532 interfaces from these libraries.
 
The OLVER test suite is developed using model-based testing technology UniTESK. The technology follows well-known Design by Contract approach to represent formal specifications. Program contracts consist of preconditions and postconditions of interface operations and data type invariants.  The main advantage of program contracts is that they enable automatic construction of test oracles that check the conformance of the target system behavior to the specifications.

Traceability of formal specifications to text of original LSB/POSIX specifications is supported using the following ideas:

  • each elementary requirement in original specification obtains a unique identifier;
  • each of the requirements is explicitly linked to a fragment of text in the original specification;
  • all checks in preconditions and postconditions are marked by identifiers of corresponding requirements.

See more details and documentation at http://linuxtesting.org/olver
 
Desired knowledge: C, SEC, model based testing
 
Mentor: Alexey Khoroshilov

Formalization of Correct Usage of Kernel Core API

Linux Driver Verification (LDV) program is devoted to developing an integrated framework for device drivers verification and adopting leading edge research results to enhance quality of verification tools. Currently LDV tools are used to check Linux device drivers against several safety rules, each of them formally defines what is correct usage of a particular part of kernel core API. Another way of looking at the safety rules is that a rule represents one or more typical misuse of kernel core API.

By the moment, the framework contains 5 rules formalized only. Nevertheless it has already helped to find and fix more than 50 issues in device drivers of vanilla Linux kernel.

This project (or several ones) is aimed at extending number of rules supported by the framework.

We have a collection of several dozens rules written in informal way, but other rules can be used as well. Formalization process usually includes the following steps:

  • identification of all elements of kernel core API relevant to the rule;
  • writing a finite state machine like formal model that consists of
    • model state (one or more C variables);
    • transitions between states linked to appropriate events in device driver (usually calls of kernel core API);
    • a set of error states (that represents misuse of kernel core API);
  • writing unit tests covering all the interfaces;
  • checking all drivers against the rule, analyzing results, reporting bugs and fixing the rule.

A notation used for formal representation of the rules is C code and AspectJ-like notation to merge the model code with original source code of device driver in appropriate places.

More than one application can be approved. Formalization of ~5 rules is expected from each student.

Desired knowledge: C, aspect-oriented programming, Linux kernel space API
 
Mentor: Alexey Khoroshilov 

Advanced Source Code Instrumentation for the Linux Driver Verification Project

Linux Driver Verification (LDV) program is devoted to developing an integrated framework for device drivers verification and adopting leading edge research results to enhance quality of verification tools. Currently LDV tools are used to check Linux device drivers against several safety rules, each of them formally defines what is correct usage of a particular part of kernel core API. Another way of looking at the safety rules is that a rule represents one or more typical misuse of kernel core API. By the moment, the framework contains 5 rules formalized only. Nevertheless it has already helped to find and fix more than 50 issues in device drivers of vanilla Linux kernel.

Formal model of a safety rule is very similar to a finite state machine written in C programming language. It consists of:

  • model state (one or more C variables);
  • transitions between states linked to appropriate events in device driver (usually calls of kernel core API);
  • a set of error states (that represents misuse of kernel core API).

To link the transitions with events in device driver we use an approach of “aspect-oriented programming”. Each event is described as a pointcut, while a transition is described as an advice (usually it contains C code modifying model state). To apply aspects to source code of device drivers we use a gcc-based C instrumentation framework. Currently it supports several kinds of pointcuts, but some of new safety rules require additional features.

This project is aimed at extension of the instrumentation framework with the most desirable feature that is an extension of advice syntax&semantic with an ability to apply custom transformations of syntax tree to expressions matched by the corresponding pointcut. The instrumentation framework should be extended to parse such aspects and modify drivers source code in the appropriate way. A set of transformations supported should be configurable outside of the instrumentation framework.

As a result of the project at least one safety rule should be formalized with help of the new advanced instrumentation feature. Finding and reporting bugs in Linux kernel will be the best evidence of effectiveness of this work.

Desired knowledge: Aspect-oriented programming, parsers, GCC internals, Linux kernel space API 

Mentor: Alexey Khoroshilov

Redesign LSB Navigator to be more independent from particular DBMS

For a very long time we have been using MySQL DBMS to manage the LSB Database which serves as a centralized source of data about standardized interfaces as well as data about Linux ecosystem. From time to time suggestions were made to try another DBMS (e.g., PostgreSQL), but the thing is that until recently we have never experienced significant problems with MySQL, so nobody has even estimated a complexity of such migration.

However, not long ago we were rebuilding/resurrecting LSB infrastructre (servers, services, etc.) and we have met some significant issues with MySQL (e.g.,
see this LSB Navigator bug). So the question of migration to something other than MySQL raised again, and this time we do have significant reasons to at least try to use another DBMS.

 However, many LSB tools are highly bounded to MySQL and different mysql* functions are used throughout in the code (currently we use Perl MySQL module in Perl tools and PHP mysqli extension in LSB Navigator). Often it is not enough to just replace all occurences of every mysql* function with its analogue for another DBMS. Some additional abstraction layer is required - e.g., DBMS-independent wrappers that will encapsulate all DBMS-specific actions.

We sugget to start with LSB Navigator which is (with more than 30.000 lines of code) the largest product that uses LSB DB. Additional task is to investigate scripts that use LSB DB to generate parts of LSB SDK, libchk, cmdchk and other tools. In general, these scripts are not large and easy to rewrite, but it would be nice to have some DBMS abstraction layer for them, as well.

To test the implementation, a student will have to 'migrate' LSB DB and Navigator to PostgreSQL. One can note that when migrating from one DBMS to another issues can arise with differences in SQL dialects used by different DBMS. Such issues are out of scope of this project. In general, we believe that most of our SQL statements will work with almost every RDBMS, though we haven't had a chance to check - this project should provide us with such a chance.

Desired knowledge: PHP, MySQL, PostgreSQL and general concepts of Relational Databases, Perl

Mentor: Denis Silakov

Implement a Lightweight Data Race Detector for Linux Kernel Modules on x86

Unlike many user-space applications, most of the Linux kernel modules must operate in a multithreaded way. A module is often required to be able to process many requests from other software components at the same time, as well as to service hardware interrupts, etc. Concurrency-related errors in the kernel modules, and data races in particular, may have fatal consequences, so it is very important to have the tools that help reveal such errors.

Although development of a full-fledged data race detector for the Linux kernel is a very difficult task and is probably not suitable for a student's summer project, there is at least one lightweight approach that can be implemented with reasonable effort. "DataCollider" system for Microsoft Windows (http://research.microsoft.com/apps/pubs/default.aspx?id=139266) relies on sampling techniques and uses hardware breakpoints to detect potentially conflicting memory accesses and therefore reveal data races. The system has proven its worth by finding critical bugs in the Windows kernel.

The goal of this project is to implement similar ideas for Linux kernel modules on x86 architecture, both 32 and 64 bit.

The Linux kernel already provides API to manipulate hardware and software breakpoints as well as the decoder of machine instructions that allows to determine instruction boundaries (used in Kprobes). The enhanced version of this decoder developed in Kernel Strider project (http://code.google.com/p/kernel-strider/) can be used to find out how the instructions access memory and, in many cases, to obtain the size of the accessed memory area, which can be helpful when implementing a data-race detector. So the basic blocks for the system to be developed in this project are already in place.

Solid knowledge of C language as well as the skills of Linux kernel module development techniques are required for this project. Prior experience with assembly programming and knowledge of x86 instruction format could be helpful but is not mandatory.

Mentor: Alexey Khoroshilov

Groups: