Version 4 (modified by 12 years ago) ( diff ) | ,
---|
Scrambler Messaging
High-Level Design Goals
Scrambler's messaging architecture was designed around three principles:
- Decoupling: By isolating the lower-level communications from the rest of Scrambler, we can explore and implement alternate messaging algorithms without disrupting large sections of the code.
- Low Message Count: Our experience with AstroBEAR taught us that MPI latency costs scale far worse than bandwidth costs. Consequently, it is better for us to group messages into a few large blocks than it is to send them in many smaller blocks.
- First Arrival Processing: If scrambler is waiting on messages from several processors, it should be able to process whichever message it receives first.
Messaging Architecture
We have modeled Scrambler's messaging architecture after the OSI network model. Each layer of the model is responsible for one aspect of communication.
- Application Layer: The data to be sent or received is generated here.
- Scheduling Layer: Determines which other processes to communicate at this step.
- Parsing/Unparsing Layer: Specifies what data to send to receive from a given process. The sending part of this layer is referred to as parsing, and the receiving part is called unparsing.
- Packing/Unpacking Layer: Bundles data into message blocks on the sending end (this process is called packing) and extracts data from message blocks on the receiving end (a.k.a. unpacking).
- Transmission Layer: Coordinates the actual MPI sends and receives between processors.
Each layer is designed to be independent, and interact only with the layers immediately above and below it. The packing layer, for instance, is called by the parsing layer and calls the transmission layer. Under no circumstances does the parsing layer ever access the transmission layer directly.
In order for this design to work, each layer must assume that the other layers are doing their jobs properly. For example, the packing layer does not decide what to include in the message—it just packs what is given to it by the parsing layer.
Only the last two layers make any MPI calls beyond the occasional MPI_Wtime()
for a timestamp. These communication layers are completely independent of application-level context; they work exactly the same regardless of the current scrambler algorithm stage. In contrast, the first three layers are extremely stage-dependent. This is unsurprising, since these three layers determine the content of the messages that get passed by passed by the communication layers.
How to Use Scrambler Communications
The communications system revolves around two objects: packed messages and message groups. A packed message object manages the information for a single message; this message can either be sent or received, but it must be decided when the message is created. A packed message must also have a level, a remote processor (i.e., a destination), and a tag.
Each packed message also references one or more message blocks. These blocks contain the data to be sent to the remote processor. Packed messages are not limited to one type of data—the MPI_Pack()
command allows us to store integers, reals and double-precision numbers side by side.
Packing Layer
The packing layer operates on messages and message groups. Its purpose is to gather a message's data and pack it into a series of buffers for easy transmission. As its name suggests, this layer makes use of the MPI_Pack()
routine to handle the complications of transferring heterogeneous data. Packed messages typically transfer all the data that the receiving processor requires from the sender for a specific operation and level. Our goal is to have 99% of messages require only one transmission. As such, our maximum buffer size is quite large—5 MB or so. This accomodates virtually all tree data and a fair amount of grid data.
Our packing algorithm starts on the sending side by flattening data into a 1D array and then packing it to a character (byte) array. If the new data would overflow the packing buffer, then we pack as much into the buffer as we can, allocate a fresh buffer and start packing the remainder. The algorithm does not post a send for a buffer until the buffer is closed. Buffers are closed automatically when they are completely filled, but the last buffer must be closed by the programmer. When a sending PackedMessage
object is destroyed, all buffers are automatically closed.
The packing/unpacking layer assumes that the data will be unpacked in the same order that it was packed, but it is the job of the parsing layer to ensure this. The unpacking process is similar to the packing process; data is unpacked from the received buffers into a 1D array, which is then reshaped into an array with the appropriate dimensions. If the buffer is emptied and the array is still waiting for data, then another buffer is assumed to be en route and it is received.