C++ / Unix Functionality

Here, we review the pertinent features of C/C++ and Unix.

C/C++ Libraries

Glotzilla will implement a precompiled utility library that can be used to aid development of new simulation and data analysis codes in C/C++. Although such libraries will initially be written in C++, the general idea can be extended, in the form of a general API, to other programming languages that support precompiled libraries, such as Fortran 90 and Java. The idea underlying C libraries is straightforward: a common task that is performed by many programmers need not be rewritten each time. Rather, a standard implementation of the task is made readily available via a one-line “#include” directive. For example, consider the C <stdio.h> library, which supplies a set of functions for printing to the screen from a C/C++ program (a process that involves thousands of lines of source code if written from scratch).

#include <stdio.h>

int main(int argc, char **argv)
{
   printf("Hello World!\n");
   return 0;
}

The program above calls the printf function to print the words “Hello World!” to the screen. The printf function is made available to the programmer by placing the directive “#include <stdio.h>” at the top of the file. This form of standardization 1) decreases the size of codes and 2) increases their accuracy and performance, since the shared implementation is widely tested and optimized. The C and C++ standard libraries provide many such facilities for tasks such as managing memory, manipulating data, writing to files, and performing mathematical calculations.
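The same idea applies to simulation and analysis code. As a minimal sketch, the routine below is the kind of small, reusable function that could be placed in a shared library. The name gz_mean is hypothetical and chosen only for illustration; it does not refer to an existing Glotzilla interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical utility routine: arithmetic mean of n samples.
 * In a library, the declaration would live in a header file and the
 * definition in a precompiled object file. */
double gz_mean(const double *values, size_t n)
{
    assert(values != NULL && n > 0);  /* precondition: non-empty input */

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += values[i];
    }
    return sum / (double)n;
}
```

A program that includes the corresponding header could then call gz_mean(samples, count) rather than re-implementing the loop each time.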

Integration with Unix

The vast majority of molecular simulation research is carried out under Unix-based operating systems such as macOS or Linux. Glotzilla will integrate with such environments by providing binaries for performing common simulation and data analysis tasks from the Unix command line. In Unix, a binary (i.e., a compiled set of computer instructions) can be executed by typing its name in the terminal.

host $ ls
Desktop Movies Documents Pictures

The example above demonstrates the execution of the Unix “ls” (list) command, which writes a directory listing to the screen (stdout) when executed. Users can execute custom binaries compiled from C/C++ or other programming languages in an identical fashion.

A particularly powerful feature of Unix-based operating systems is the ability to “pipe” the output (stdout) of one process to the input (stdin) of another. This allows users to create chains of processes linked by their input/output (I/O) streams. Because pipes pass data in memory, chains of processes connected by pipes perform nearly as well as individual executables. The advantage is that linking executables together in a modular fashion yields many more useful combinations of processes than any single executable could provide.

host $ ls
Desktop Movies Documents Pictures

host $ ls | grep Movies
Movies

This general idea is demonstrated in the example above, where a user sends the output of “ls” to the input of “grep” using the pipe operator “|”. The “grep” command acts as a filter, printing only the lines that match the regular expression “Movies”.
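A custom filter of this kind can be sketched in a few lines of C. The function below is an illustration only, not how grep is implemented: it uses plain substring matching rather than regular expressions, the name filter_lines is hypothetical, and lines longer than the fixed buffer are examined in pieces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy every line of `in` that contains the substring `needle` to `out`.
 * Returns the number of lines written. */
size_t filter_lines(FILE *in, FILE *out, const char *needle)
{
    assert(in != NULL && out != NULL && needle != NULL);

    char line[4096];
    size_t matches = 0;

    /* fgets keeps the trailing newline, so fputs reproduces the line as read. */
    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, needle) != NULL) {
            fputs(line, out);
            ++matches;
        }
    }
    return matches;
}
```

A main function that calls filter_lines(stdin, stdout, argv[1]) would turn this into a binary that can be placed after a pipe in the same way as “grep”.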

host $ ls | grep "Movies"
Movies

host $ ls | grep Movies | wc
       1       1       7

In the example above, the output of the “grep” command is passed to the “wc” (word count) command, which writes the number of lines, words, and characters to stdout. This general approach to data manipulation is referred to as the “pipes and filters” paradigm, since the flow of data through the Unix pipeline is similar to the flow of mass through a physical pipeline. Here, each program in the pipeline acts as a filter and passes the data stream to the next program.
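The final stage of such a pipeline can be sketched in the same way. The function below counts lines, words, and characters on an input stream, in the spirit of “wc”. It is a simplified illustration: the names stream_counts and count_stream are hypothetical, and characters are counted as bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Totals accumulated over one input stream. */
typedef struct {
    unsigned long lines;
    unsigned long words;
    unsigned long chars;
} stream_counts;

/* Read `in` to end-of-file and count newlines, whitespace-separated
 * words, and bytes. */
stream_counts count_stream(FILE *in)
{
    assert(in != NULL);

    stream_counts c = {0, 0, 0};
    int in_word = 0;  /* nonzero while inside a run of non-space bytes */
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        ++c.chars;
        if (ch == '\n') {
            ++c.lines;
        }
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            ++c.words;
        }
    }
    return c;
}
```

Applied to the single line “Movies” followed by a newline, this sketch yields 1 line, 1 word, and 7 characters, matching the output of “wc” shown above.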