Remote procedure calls (RPCs) provide a framework for implementing remote access to a system. They create a distributed computing environment that is established and controlled at the procedure level within an application.
In an earlier version of DODS, RPCs were the basis for all data transmission; the data delivery design has since changed to HTTP as the foundation for data transmission. Regardless, RPCs still merit discussion since they frame the concepts that are central to the design of the data delivery component of DODS.
RPC technology simplifies the design of distributed software systems by
providing a means to separate one program into two cooperating processes
using a procedural interface. This is straightforward because many programs
can naturally be broken into two sections along a procedural line separating
a core set of functions and some additional functions special to a particular
application. Figure
shows how RPCs can be used
to split a user program and API into a client program and remote server. The
RPC client and server "stubs" now form the communication interface between
the application program and server. The client stubs encode their arguments
and send them over the network to matching stubs in the server. The server
stubs extract the arguments from the network and compute results based on the
arguments. The server stubs return to the client the results of the
computations.
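The stub exchange described above can be sketched in a few lines. This is a minimal illustration, not the DODS implementation: the procedure `core_add`, the JSON encoding, and the 4-byte length prefix are all assumptions chosen for brevity, and a local socket pair stands in for the network.

```python
import json
import socket
import threading


def send_msg(sock, obj):
    """Encode obj as JSON and send it with a 4-byte length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(len(payload).to_bytes(4, "big") + payload)


def recv_exact(sock, n):
    """Read exactly n bytes from sock."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    size = int.from_bytes(recv_exact(sock, 4), "big")
    return json.loads(recv_exact(sock, size).decode("utf-8"))


def core_add(a, b):
    """Hypothetical core function that lives on the server side."""
    return a + b


def server_stub(sock):
    """Server stub: extract the arguments, compute, return the result."""
    request = recv_msg(sock)
    result = core_add(*request["args"])
    send_msg(sock, {"result": result})


def add(sock, a, b):
    """Client stub: encode the arguments, send them, wait for the reply."""
    send_msg(sock, {"proc": "add", "args": [a, b]})
    return recv_msg(sock)["result"]


def demo():
    """Run one call through a connected socket pair (stand-in for a network)."""
    client_sock, server_sock = socket.socketpair()
    worker = threading.Thread(target=server_stub, args=(server_sock,))
    worker.start()
    try:
        return add(client_sock, 2, 3)
    finally:
        worker.join()
        client_sock.close()
        server_sock.close()
```

Calling `demo()` sends the arguments 2 and 3 through the client stub and returns the sum computed on the server side. The caller of `add` never touches the encoding or the socket protocol, which is the separation the stubs are meant to provide.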
RPC client stubs can serve as replacements for the API functions invoked by an application program. In this case the function of the networking portions of the client and server software is no different from the more general case described in the preceding paragraph. Once arguments are passed into the server stub, it can "compute the result" by calling the API function for which the client stub is a replacement. The server stub then returns as its result the return value of the API call. From the vantage point of the user program, there is no difference between this access and an access to a local file using the standard API library. This holds because the RPC client stubs preserve exactly the semantics of the standard API: there is a one-to-one mapping between function calls in the RPC client stubs (i.e., the client library), the RPC server stubs (the transmission protocol), and the standard implementation of the API. While building a server in this way is guaranteed not to change the semantics of the API, the operational mode of the API has radically changed. Previously it read data from a file; now it reads data from a data server that may be located anywhere on a given computer network (e.g., the Internet).
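The one-to-one mapping can be illustrated with a surrogate function that has the same signature as the call it replaces. Everything named here is hypothetical (`local_read_var`, the `sst.dat` store, the JSON wire format); the transport is a direct function call so that the sketch runs in a single process.

```python
import json

# Hypothetical "standard API": reads a variable from a local store.
LOCAL_FILES = {"sst.dat": {"temp": [14.1, 14.3, 15.0]}}


def local_read_var(path, name):
    """Stand-in for the standard API library call."""
    return LOCAL_FILES[path][name]


def server_stub(request_bytes):
    """Server stub: decode the arguments and call the real API function."""
    request = json.loads(request_bytes)
    value = local_read_var(request["path"], request["name"])
    return json.dumps({"value": value}).encode("utf-8")


def remote_read_var(path, name, transport=server_stub):
    """Client stub: same signature and return value as local_read_var.

    transport stands in for the network; here it calls the server stub
    directly so the sketch runs in one process.
    """
    request = json.dumps({"path": path, "name": name}).encode("utf-8")
    return json.loads(transport(request))["value"]


def user_program(read_var):
    """Unmodified user code: it only knows the API's call signature."""
    values = read_var("sst.dat", "temp")
    return sum(values) / len(values)
```

Passing either `local_read_var` or `remote_read_var` to `user_program` yields the same answer, which mirrors re-linking a program against a surrogate library: the call site does not change, only where the data comes from.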
RPCs are not without drawbacks, however. A major problem with RPC technology is that it is based on a strict request-reply paradigm. While certain data sets work well in that context, others do not. For example, JGOFS provides access to data sets that are functionally relational databases, and thus are of unknown length. A good data server for such a data set would begin returning records to the user program before the server completes sending the result of a given access. However, the request-reply nature of RPCs makes this difficult.
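The difference between the two delivery styles can be sketched with an ordinary function and a generator. This is an illustration only: `scan_records` is a hypothetical server-side scan, and the `log` list merely records the order of events so the timing is visible.

```python
def scan_records(table, predicate, log):
    """Hypothetical server-side scan; logs each row as it is examined."""
    for row in table:
        log.append(("scan", row))
        if predicate(row):
            yield row


def request_reply(table, predicate, log):
    """Strict request-reply: the reply exists only once the scan is done."""
    reply = list(scan_records(table, predicate, log))
    log.append(("reply", reply))
    return reply


def streaming(table, predicate, log):
    """Streaming: each matching record is handed over as soon as it is found."""
    for row in scan_records(table, predicate, log):
        log.append(("send", row))
        yield row
```

With `request_reply`, every row is scanned before the caller sees anything. With `streaming`, the caller receives the first match while the rest of the table is still unread, which is the behavior a strict request-reply RPC makes hard to express.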
Further complicating the use of RPCs is that all network interprocess communications code is nominally contained in the RPC server. Thus the server is responsible for all access logging and all security precautions. While basic access to a network is fairly easy to provide, more advanced functions quickly add to the complexity of the server and can easily dominate the time required to design and build it as well as the effort required to support it.
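A small sketch shows how these responsibilities accumulate inside the server. The host allow-list, the procedure table, and `read_var` are hypothetical names introduced for illustration; a real server would also need authentication, error reporting, and connection handling.

```python
import logging

logger = logging.getLogger("rpc_server_sketch")

ALLOWED_HOSTS = {"10.0.0.5"}  # hypothetical access-control list


def read_var(name):
    """Hypothetical API function exposed by the server."""
    return {"temp": [14.1, 14.3]}[name]


PROCEDURES = {"read_var": read_var}


def handle_request(client_host, proc, args):
    """Server-side dispatch: security check, logging, then the real call."""
    if client_host not in ALLOWED_HOSTS:
        logger.warning("denied %s calling %s", client_host, proc)
        raise PermissionError(f"{client_host} is not allowed")
    logger.info("%s called %s%r", client_host, proc, tuple(args))
    return PROCEDURES[proc](*args)
```

Even in this reduced form, only the last line of `handle_request` does the work the API was written for; the rest is infrastructure that the server author must design and maintain.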
Note
During the early part of the DODS design effort, we built two sample RPC client-stub/server-stub pairs: one for NetCDF and one for JGOFS. While these worked well within the limitations of RPCs, we felt that for certain data sets, particularly those that contained relational data, the strict request-reply nature of RPCs would need to be improved upon. This, combined with the additional work required to support development of full-fledged data servers, caused us to look at other transport mechanisms.
These implementations did, however, effectively demonstrate that we could build surrogates for the DODS-supported APIs and re-link third-party programs with the surrogate libraries. The re-linked user programs could indeed access data via the data servers without any modification whatsoever. The RPC-based libraries could not, however, interoperate with one another.