Chapter 4  The MROD


All the sub-detectors of ATLAS are equipped with a "chain" of electronic boards dedicated to the task of collecting the raw (analog) data from the detector modules, converting the raw data into a sub-detector-specific digital format (plus standardised headers and trailers) and presenting the data to the High Level Trigger and DAQ systems (see Chapter 3). The data has to be stored until the trigger system issues a decision about its relevance.

The hardware components of the readout chain are divided into three groups:

  • the Front-End (FE) electronics, which are responsible for digitizing the analog pulses arriving from the detectors and de-randomizing the data in time;
  • the Read-Out Driver (ROD), a detector-specific element which gathers data from the Front-End via one or more data streams and merges them into a ROD data fragment. The ROD fragment is then sent to a ROB via the ROL;
  • the Read-Out Link (ROL), the physical link between the ROD and the ROB; it provides fast data transfer with error detection and flow control mechanisms.

The readout chain is driven externally by the Timing, Trigger and Control (TTC) system, which is part of the Trigger and Data Acquisition system and is used to distribute fast timing, trigger and control signals across the ATLAS experiment.

The Muon Spectrometer is equipped with four types of chambers, each with its own specific readout chain: the RPC chambers, the TGC chambers, the CSC chambers, and the MDT chambers. In this chapter I will describe the MDT readout chain and in particular the prototype ROD, which was developed at NIKHEF.

4.1  The MDT Readout chain

This section will describe briefly how a signal is created in the MDT chambers and how it is processed as it travels along the MDT readout chain.

When a muon hits a drift tube in an MDT chamber, it ionises the gas inside the tube. As a result, small clusters of electrons are formed along the muon trail. These clusters drift in the electric field towards the anode wire. In proximity to the wire, the drifting clusters are subjected to a strong electric field, which amplifies the charge of the clusters through the avalanche effect. Eventually, the electron clusters are collected by the wire and generate a small current. The current signal passes through the Hedgehog boards, which decouple the signal from the high voltage of the MDT chambers and connect to the mezzanine cards. The mezzanine cards amplify and filter the raw analog signal and convert it into digital form --- a hit. The time of arrival of the hit is measured by a Time to Digital Converter (TDC) and encoded with the hit information. On each chamber, a Chamber Service Module (CSM) scans the mezzanine cards and collects the TDC data before forwarding it to the MDT ROD (MROD). The MROD collects data from six CSMs and merges it into a MROD data fragment. The fragments are sent via the ROL to the ROB.


File:Chain.gif

Figure 4.1: Scheme of the MDT readout chain: 24 tubes are serviced by a Hedgehog board on which a mezzanine card is mounted. The CSM module polls data from 18 mezzanine cards and sends it to the MROD via optical link. Each MROD merges data from 6 to 8 CSMs and sends the data to a ROB. In total, 192 MRODs will be present in the ATLAS detector [79].


4.1.1  Hedgehog boards

The Hedgehog boards are passive electric circuits mounted directly on the terminations of the MDT tubes and come in two sorts: the high voltage (HV) boards on one side of the chamber and the read-out (RO) boards on the opposite side. The task of the HV boards is to provide high voltage to the MDT tubes and to terminate the MDT wires with the correct impedance in order to avoid signal reflections. The task of the RO boards is to connect the MDTs' anode wires to the mezzanine boards, decoupling the mezzanine boards from the high voltage of the MDT and preventing cross-talk between the MDT wires.


4.1.2  Mezzanine cards

Mezzanine cards --- also known as TDC cards --- are electronic boards that are mounted on top of the RO Hedgehog boards. The mezzanine cards fulfill four important tasks:
  • amplifying the MDT signal;
  • shaping the MDT signal;
  • converting the analog MDT signal into a digital hit;
  • time-stamping the hit and storing it in a buffer for later retrieval.

These tasks are accomplished by two types of chip mounted on the mezzanine cards: the ASD chip and the AMT chip.


The ASD chip


The ASD (Amplifier, Shaper, Discriminator) chip is divided into three stages (see Figure 4.2): in the Amplifier stage the MDT signal is converted to a differential signal by a pair of pre-amplifiers and then fed into the second, Shaper, stage, where a chain of three amplifiers and RC circuits performs bipolar shaping of the signal to optimise the signal-to-noise ratio and restore the baseline. In the third and final stage, the Discriminator, the most important signal processing takes place. The use of differential signals reduces noise effects, in particular coupling effects between the digital and analog circuitries of the ASD [77].

The Discriminator stage consists of a differential amplifier (DA4) and an Analog to Digital Converter (ADC). The signal arriving from the Shaper stage is split into two parallel circuits and sent to both the DA4 and the ADC.

In the ADC the charge carried by the signal is measured using the Wilkinson "dual slope" technique: the signal pulse charges a capacitor during the first few nanoseconds --- from 10 to 50 ns, user defined --- after the rising edge of the pulse. Then, the capacitor is allowed to discharge. Since the discharge current is constant in time, the time needed to fully discharge the capacitor is proportional to the charge carried by the pulse.
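
The proportionality at the heart of the dual-slope technique can be written down in a few lines of C. The sketch below shows only the arithmetic; the discharge current and time values are illustrative placeholders, not the actual ASD parameters.

/* Sketch of the Wilkinson "dual slope" principle described above.
 * All numeric values are illustrative, not the real ASD parameters. */
#include <stdio.h>

/* The discharge current is constant, so the measured discharge time
 * is directly proportional to the charge collected during the gate. */
static double charge_from_discharge_time(double t_discharge_ns,
                                         double i_discharge_uA)
{
    /* Q = I * t  (1 uA * 1 ns = 1 fC) */
    return i_discharge_uA * t_discharge_ns;   /* charge in fC */
}

int main(void)
{
    double i_discharge = 5.0;    /* uA, hypothetical constant current */
    double t_discharge = 40.0;   /* ns, measured discharge time       */
    printf("pulse charge ~ %.1f fC\n",
           charge_from_discharge_time(t_discharge, i_discharge));
    return 0;
}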

In the DA4, the signal pulse is discriminated against a user-defined threshold. When the signal height first exceeds the preset threshold, a digital pulse is generated, the width of which is determined by the shape and amplitude of the MDT signal. Multiple threshold crossings (and output pulses) are possible. The rising edge of the first (main) pulse contains the timing information for the event [77].

At this point, the user can select which of the two signals (ADC or DA4) is output by the Discriminator stage; two modes are available. In ADC mode (the default), the width of the output pulse represents the charge measured by the ADC integrating the MDT signal over a fixed gate time, while the rising edge carries the timing information. In Time-over-Threshold mode, the DA4 signal --- the length of which is proportional to the total time the MDT signal was above threshold --- is sent to the output drivers; this measurement is also referred to as Leading and Trailing Edge. Multiple threshold crossings are avoided by the introduction of a user-defined dead time. The selected output pulse is converted to a low voltage differential signal (LVDS) and sent to the AMT chip.


File:Asd.gif

Figure 4.2: Scheme of the different stages of the ASD chip. The MDT signal (moving through the picture from the left side to the right side) is first converted to a differential signal in the Amplifier stage, then shaped by three differential amplifiers in the Shaper stage; then either its charge or its time over threshold is encoded in the length of the output pulse by the Discriminator stage. Finally, the output pulse is converted to the LVDS format and sent to the AMT chip (not present in this picture) [77].


The AMT chip


The purpose of the AMT chip is to timestamp the digital hit output by the ASD chip and to match it to the trigger signal arriving from the Level 1 Trigger [78]. The AMT's main component is a ring oscillator coupled with a Phase Locked Loop (PLL). The oscillator generates a 40 MHz clock pulse which is fed into the PLL. The PLL locks in phase with the clock pulse and generates a second signal with double the frequency (80 MHz). This second signal runs in a loop through the PLL circuitry, in such a way that the feedback of the signal at the beginning of the loop stabilises the phase of the signal itself. As the signal travels around the loop, it changes the state of 16 binary switches (called taps). Hence, when the digital hit of the ASD enters the AMT, the status of the taps is latched, giving a 4-bit measurement of the time of arrival of the ASD hit. This measurement is defined as the Fine Time Measurement, and one bit corresponds to one 80 MHz clock period divided by the 16 taps, i.e. 12.5 ns / 16 = 0.78125 nanoseconds.
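
The tap arithmetic can be made concrete with a short sketch: one 80 MHz period (12.5 ns) divided over 16 taps gives the 0.78125 ns bin. Modelling the latched taps as a thermometer code is a simplifying assumption of mine; the actual AMT latching logic is more involved.

/* Sketch of the fine-time ("vernier") measurement described above.
 * The tap readout is modelled as a 16-bit thermometer code; treat
 * this as illustrative, not as the real AMT logic. */
#include <stdio.h>
#include <stdint.h>

#define PLL_PERIOD_NS 12.5   /* one 80 MHz clock period            */
#define N_TAPS        16     /* taps along the ring oscillator     */
/* One tap therefore corresponds to 12.5 ns / 16 = 0.78125 ns.     */

static unsigned vernier_from_taps(uint16_t taps)
{
    unsigned n = 0;
    while (taps & 1u) {   /* count leading 1s of the thermometer code */
        n++;
        taps >>= 1;
    }
    return n & 0xF;       /* 4-bit vernier value */
}

int main(void)
{
    uint16_t latched = 0x003F;  /* hypothetical latched pattern: 6 taps */
    unsigned v = vernier_from_taps(latched);
    printf("vernier = %u -> fine time ~ %.5f ns\n",
           v, v * PLL_PERIOD_NS / N_TAPS);
    return 0;
}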

The AMT chip also includes a Coarse Counter, which provides the Coarse Time Measurement: this time measurement is synchronised with the 40 MHz LHC clock and is thus equivalent to the bunch count identifier (BCID). The coarse counter has a user-programmable offset to allow for trigger latency.

When a hit has been detected in one of the 24 AMT channels, the corresponding channel buffer is selected, the time measurement done with the ring oscillator is encoded into binary form (vernier time), the correct coarse count value is selected and the complete time measurement is written into the 256-word-deep L1 buffer together with a channel identifier. The AMT performs individual timing measurements of both the leading and the trailing edge of the ASD pulse; the user can choose for the timestamps to be sent individually or opt for a paired measurement, in which case the timestamp of the leading edge and the length of the pulse are encoded in a single word. The paired measurement in the AMT is, however, troubled by bugs, and the possibility of performing the encoding in the CSM is currently under study.


File:Amt-time.gif

Figure 4.3: Time measurement in the AMT chip. The 16-tap time measurement of the Ring Oscillator is encoded in the 4-bit vernier time. The vernier time plus the Least Significant Bit of the Coarse Counter become the 5-bit Fine Time Measurement. The remaining 12 high-order bits of the 16-bit timestamp are occupied by the 12 Most Significant Bits of the Coarse Counter, and form the Coarse Time Measurement (figure taken from [78]).


The timestamps stay inside the L1 buffer until a trigger signal reaches the AMT chip. The AMT chip then extracts the Bunch Identifier (BCID) of the trigger signal and looks in the L1 buffer for hits whose coarse time measurement is compatible with the BCID of the trigger signal. A match between the trigger and a hit is detected within a programmable time window (see Figure 4.4): all hits from the trigger time until the trigger time plus the matching window are considered as matching the trigger and processed for output. The maximum trigger latency which can be accommodated by this scheme equals half the maximum coarse time count --- 2^12/2 = 2048 clock cycles ≈ 51 μs.

In addition to the matching window, the search for hits matching a trigger is performed within an extended search window, to guarantee that all matching hits are found even when the hits have not been written into the L1 buffer in strict temporal order. For normal applications it is sufficient to make the search window 8 clock cycles larger than the match window. The search window should be extended for applications with very high hit rates or in case paired measurements of wide pulses are performed (a paired measurement is only written into the L1 buffer when both leading and trailing edge have been measured). The trigger matching can optionally search a time window before the trigger for hits which may have masked hits in the match window because of the dead time of the ASD. A channel having a hit within the specified mask window will set its mask flag.

If an error condition (L1 buffer overflow, Trigger FIFO overflow, memory parity error, etc.) is detected during the trigger matching, a special word with error flags is generated. All data belonging to an event is written into the read-out FIFO, preceded by a header word and followed by a trailer word. The header word contains the Event ID (EVID) and Bunch ID (BCID) of the trigger that matches the data. The trailer word contains the same EVID plus a word count [78].
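
The window logic is easier to see in code. The following toy model classifies L1-buffer hits against a trigger BCID using the match, search and mask windows described above; the window widths and buffer contents are made-up configuration values, not the real AMT settings.

/* Toy model of the trigger matching described above: hit coarse times
 * are compared to the trigger BCID within a match window, an extended
 * search window, and a mask window preceding the trigger. */
#include <stdio.h>

#define COARSE_MOD 4096u   /* the 12-bit coarse counter wraps at 2^12 */

typedef struct { unsigned channel; unsigned coarse; } Hit;

/* Difference on the circular 12-bit coarse time scale. */
static unsigned coarse_diff(unsigned later, unsigned earlier)
{
    return (later - earlier) & (COARSE_MOD - 1u);
}

int main(void)
{
    const unsigned trigger_bcid  = 100;
    const unsigned match_window  = 16;  /* clock cycles, made-up values */
    const unsigned search_window = 24;  /* a few cycles larger          */
    const unsigned mask_window   = 8;   /* cycles before the trigger    */

    Hit l1_buffer[] = { {3, 95}, {3, 104}, {7, 110}, {5, 130} };

    for (size_t i = 0; i < sizeof l1_buffer / sizeof l1_buffer[0]; i++) {
        unsigned after  = coarse_diff(l1_buffer[i].coarse, trigger_bcid);
        unsigned before = coarse_diff(trigger_bcid, l1_buffer[i].coarse);
        if (after < match_window)            /* inside the match window */
            printf("ch %u @ %u: matched\n",
                   l1_buffer[i].channel, l1_buffer[i].coarse);
        else if (after < search_window)      /* scanned, not output     */
            printf("ch %u @ %u: inside search window only\n",
                   l1_buffer[i].channel, l1_buffer[i].coarse);
        else if (before > 0 && before <= mask_window)
            printf("ch %u @ %u: sets the mask flag\n",
                   l1_buffer[i].channel, l1_buffer[i].coarse);
    }
    return 0;
}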

The data, plus the header and trailer words, are encoded with a defined parity to allow the detection of transmission errors, and are sent out of the AMT chip to the CSM module.


File:Amt-window.gif

Figure 4.4: Principle of operation of the L1 buffer. Time increases from left to right. The BCID of the trigger signal is used as a starting point to look for hits matching the trigger. All hits inside the matching window are selected for output. Hits inside the search window are scanned as well, to cover the possibility that high event rates cause matching hits to be written into the L1 buffer outside of the matching window. The matching algorithm also looks for hits preceding the trigger which could have masked hits within the matching window; if any masking hit is found, a warning flag is raised. Hits older than the reject time are deleted from the buffer to limit buffer occupancy. All the windows and the reject time are user-configurable [78].


4.1.3  CSM


The Chamber Service Module (CSM) was developed at the University of Michigan to act as a front-end readout multiplexer, mezzanine initialization controller, trigger and timing distributor, and calibration controller. Initially, the CSM functions were to be performed by the ROD of the muon spectrometer; it was later decided to split the muon ROD into two parts: the CSM, installed on each MDT chamber, and the MROD, which multiplexes the data streams coming via optical links from 6-8 CSMs. This arrangement has obvious advantages: since the CSM is mounted in proximity to the mezzanine cards, no bulky bundles of cables are needed to multiplex the front end. The second advantage is that while the CSM has to be radiation-hard and immune to magnetic fields, the MROD can comfortably sit outside of the experimental area and be subject to less stringent requirements.

The main task of the CSM is to perform simple time-division multiplexing of the data from the 18 connected mezzanines. With this scheme the CSM cyclically polls the read-out FIFOs which receive serial data from the AMT chips. From each of the 18 FIFOs one data word is read, whether it is a hit word or a header/trailer word, and independently of its EVID and BCID. If no data is available from a specific mezzanine, an idle place-holder is inserted in the cycle (see Figure 4.5). Once the cycle is complete, a separator word and two idle words are prepended to the 18 data/place-holder words. Finally, the 21 (18 AMT + 1 separator + 2 idle) words are sent to an optical output encoder and driver, from which they are transferred serially via optical link to the MROD module.
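
A minimal sketch of one such polling cycle is given below. The word encodings and the FIFO stub are hypothetical; only the cycle structure (one separator, two idle words, 18 data/place-holder words) follows the description above.

/* Sketch of one CSM time-division-multiplexing cycle. */
#include <stdint.h>
#include <stdbool.h>

#define N_TDC       18
#define SEPARATOR   0xF0000000u   /* hypothetical marker encodings */
#define IDLE_WORD   0xF1000000u
#define PLACEHOLDER 0xF2000000u

/* Trivial stub standing in for the real AMT read-out FIFOs: returns
 * true and fills *word when the FIFO of this TDC has data. */
static bool tdc_fifo_pop(int tdc, uint32_t *word)
{
    (void)tdc; (void)word;
    return false;                 /* pretend all FIFOs are empty */
}

/* Builds one complete 21-word CSM cycle in out[0..20]. */
void csm_poll_cycle(uint32_t out[21])
{
    out[0] = SEPARATOR;           /* marks the start of the cycle   */
    out[1] = IDLE_WORD;           /* two idle words, used only for  */
    out[2] = IDLE_WORD;           /* optical synchronisation        */
    for (int tdc = 0; tdc < N_TDC; tdc++) {
        uint32_t w;
        /* One word per TDC, hit or header/trailer alike, regardless
         * of its EVID/BCID; ordering is left to the MROD. */
        out[3 + tdc] = tdc_fifo_pop(tdc, &w)
                     ? w
                     : (PLACEHOLDER | (uint32_t)tdc);
    }
}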


File:Csm-poll.gif

Figure 4.5: Serial data flow and event transmission by the CSM. The polling cycle reads one word from each of the 18 TDC FIFOs. If no data is present, a place-holder word is generated for the corresponding TDC. When the cycle is complete, the 18 words plus 3 control words are output serially. The numbers on the figure indicate the EVID. Please note that the CSM reads and outputs the data words irrespective of their Event Identifier. The chronological ordering and fragment building is performed by the MROD [79].


The CSM is composed of six functional blocks: the JTAG interface, the trigger timing and control receiver (TTCrx), the optical transmitter, the serial to parallel receivers, the multiplexer, and the environment monitor. The list below further defines the content of these blocks.

The JTAG Interface
The JTAG Interface consists of an FPGA-based controller plus a connection chain. The purpose of the JTAG interface is to provide the configuration parameters to the CSM components and (in the prototyping phase) to the mezzanines connected to the CSM. The JTAG controller supports the following functions [79]:
  • initialization of the parameters of the CSM and ASD/AMT cards;
  • initialization of the parameters of the TTCrx;
  • controlling the run/reset/resynchronisation of the CSM, TDC, and ASD/AMT.

The JTAG interface receives the configuration bit stream from an ELMB --- Embedded Local Monitor Board, see below --- mounted on the MDT chamber and distributes the bit stream to all the mezzanines connected to the JTAG chain. The JTAG chain is a closed loop; after passing through the mezzanines, the JTAG bit stream returns to the CSM, where it is checked: from the status of the bit stream one can retrieve information about the status of the components linked by the chain.

The TTCrx chip
The TTCrx is a chip designed at CERN for the Trigger and Timing Control systems of the LHC experiments. The TTCrx chip contains an optical receiver, which connects to the optical fiber carrying the TTC signals. The TTCrx fulfils the task of receiving the optical signals from the TTC system, converting them to digital data ("commands"), and sending the commands to the CSM. The CSM then forwards the commands to all the connected mezzanines. Besides the commands, the TTCrx chip locks onto the LHC clock coming from the optical fiber and provides the clock signal to the CSM and mezzanines alike.

The commands that the TTCrx forwards to the CSM include among others:
  • Event Counter Reset (ECR) and Bunch Counter Reset (BCR) --- These commands instruct the mezzanines to reset to zero the Event ID (EVID) and the Bunch ID (BCID).
  • Level 1 Trigger --- After this command, the AMT chips on the mezzanines start searching for data in the L1 buffers.
  • Initiate Calibration Triggers --- The CSM sends to the mezzanines a sequence of calibration trigger pulses, with adjustable delays.


The Serial to Parallel Units
Input data from the mezzanines arrives as serial bit streams that are assembled into 32-bit words under the control of 18 Serial to Parallel circuits, one for each mezzanine. The circuits perform a parity check on the data words to detect transmission errors from the mezzanines. An input multiplexer subsection polls the circuits for data at 40 MHz and transmits any data found to 18 individual FIFOs. A second polling multiplexer scans the individual FIFOs for data and, if data is found, sends the 32-bit data unit to the optical transmitter. If no data is present in the FIFO for the polled TDC, an empty-TDC flag is transmitted for the corresponding mezzanine. The TDC data words and their parity bits are modified so as to include parity information both for the word received from the TDC and for the outgoing word: bit 27 is set to contain a parity error flag if the incoming word from the TDC fails the parity test done within the CSM, and bit 26 of the outgoing TDC word is set so that the parity of the outgoing word is odd (including the bit-27 error flag). The outgoing words are then sent to the Fibre Sequencer for output to the MROD.
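
This parity manipulation can be made concrete with a short sketch. It assumes odd parity on the incoming word as well, which the description above does not state explicitly.

/* Sketch of the bit-26/bit-27 parity handling described above: if the
 * incoming TDC word fails its (assumed odd) parity check, an error
 * flag is set in bit 27, and bit 26 is then chosen so that the
 * outgoing word has odd parity including that flag. */
#include <stdint.h>

static unsigned parity32(uint32_t w)   /* 1 if an odd number of bits set */
{
    w ^= w >> 16; w ^= w >> 8; w ^= w >> 4; w ^= w >> 2; w ^= w >> 1;
    return w & 1u;
}

uint32_t csm_fix_parity(uint32_t in)
{
    uint32_t out = in & ~((1u << 27) | (1u << 26)); /* clear flag + parity */
    if (parity32(in) != 1u)        /* incoming word fails odd parity?     */
        out |= 1u << 27;           /* -> raise the parity error flag      */
    if (parity32(out) != 1u)       /* make the outgoing parity odd,       */
        out |= 1u << 26;           /*    including the bit-27 flag        */
    return out;
}
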
The Fibre Sequencer
The task of the Fibre Sequencer is to collect the TDC words from the polling multiplexer, convert them into optical signals and send them via optical fiber to the MROD. The optical transmitter is based on the CERN-designed radiation-hard GOL chip and a Truelight optical transceiver. This unit can accept 32-bit words at up to 40 MHz for transmission along an optical fiber. The data words are transferred serially in trains of 21 words. At the MROD the 32-bit words are collected, the separator word is checked, the TDC-empty words are removed, and the actual TDC data is stored. The optical transmission idle words do not appear at the receiver output; they serve only as synchronisation characters.
The DCS Analog Monitor
The final subsection provides voltage and temperature monitoring. The cable from each mezzanine card provides a connection to its analog voltage regulator output, its digital voltage regulator output, and to an on-board temperature sensor. These 18 × 3 lines, plus 3 lines from the CSM regulators and temperature sensor, are routed to a 64-channel analog multiplexer and ADC. The 64-channel ADC is a direct copy of the ADC of the standard ELMB (see below).

4.1.4  The ELMB


The Embedded Local Monitor Board [80] is a general-purpose board which interfaces the front-end electronics and environmental sensors with the DCS system (see Chapter 3). The ELMB is composed of a 4 MHz RISC micro-controller, a 64-channel ADC and a CAN transceiver. The main purpose of the ELMB is to monitor the environmental variables of the ATLAS detector: the ADC can digitise analog data coming from different types of sensors, such as temperature probes, Hall probes (to measure the magnetic field of ATLAS) and ammeters (to monitor the power consumption and current stability of the front-end electronics). The digitised data is sent to the DCS system via the CAN bus (an ISO standard bus).

The ELMB has also a role in the configuration of front-end electronics: configuration parameters can be sent via the CAN bus to the ELMB, where the micro-controller distributes the configuration parameters through its digital input/output ports.

Each MDT chamber is equipped with one ELMB board, which sends configuration data to the mezzanines via the CSM, receives data from the current/voltage regulators of the mezzanines, and reads out the temperature and Hall probes mounted on the MDT chamber.

4.2  The MROD

In the ATLAS experiment, each sub-detector has its own flavour of ROD module, and the MROD (MDT-ROD) is the ROD serving the MDT chambers of the Muon Spectrometer.

The MROD is a fast data multiplexer and data formatter: its task is to collect data from 6-8 CSM modules, group the data fragments according to their EVID and merge them into a single pre-formatted ROD fragment, which is then sent downstream to a ROB.

The experimental environment of the ATLAS detector requires the MROD to be fast in order to cope with the data input rate given by the LVL1 trigger: in the worst-case scenario, assuming a high background radiation on the end-cap chambers and 100 kHz of trigger rate, the maximum data throughput is estimated to be about 180 MByte/s, which can be reduced to about 140 MByte/s if lossless zero suppression and event compression techniques are implemented in the MROD [89].

The prototype MROD was developed at NIKHEF with a hybrid FPGA/DSP design; the main features of this design are:

  • time-division multiplexing and formatting of TDC data performed in high-speed FPGAs;
  • network of interconnected DSPs running event managing algorithms;
  • control, status and monitoring performed via VME bus;
  • ease of configuration of the run-time parameters;
  • possibility to connect together more than one MROD board, to accommodate special tower topologies (end-caps);
  • possibility of running different algorithms on the DSPs.

The software running on the prototype MROD was developed entirely at NIKHEF: the FPGA code was written by P. Jansweijer, while the C algorithm running on the DSPs was written in the year 2000 by H. Boterenbrood for the MCRUSH --- an early single-channel prototype of the MROD --- and was adapted by myself to the new hardware during the years 2002/03. The software algorithms described in Sections 4.3 and 4.4 belong to the last version of the code, used during the H8 test-beam in summer 2003. From that moment on, the code development was continued by R. van der Eijk, who developed a C++ implementation.

4.2.1  The SHARC DSP


The SHARC is a Digital Signal Processor manufactured by Analog Devices. Among its most interesting features are six high-speed data transfer links: with a width of 8 bits and a maximum clock frequency of 100 MHz, each link is able to transfer up to 100 MByte/s. The second most important feature of the SHARC is that all the links can be operated in DMA mode, which means that the data transfer between external sources and the internal memory of the SHARC can occur in parallel for all links, and the core processor of the SHARC does not have to spend processing time taking care of the actual data transfer [83].

The performance of the SHARC links is essential for the functionality of the prototype MROD: the ability to read out several data streams in parallel is necessary when high-speed multiplexing is required. While the data transfer is taking place, the 100 MHz core processor is able to manage the data streams, check the data and format the data block. All five SHARCs mounted on the MROD are interconnected via SHARC links (see Figure 4.6); the main flow of data occurs between the MROD-In and the MROD-Out, while the other links are used either for booting the DSPs or for passing status messages.

The SHARC is very flexible and configurable: unlike the FPGA, the SHARC can be programmed in Assembler, C or C++; the binary executables are loaded into the 512 kB memory of the SHARC either by transfer through a SHARC link, by direct memory addressing from an external bus, or by upload from a PROM.

4.2.2  The ALTERA FPGA


The acronym FPGA stands for Field Programmable Gate Array: an FPGA is an integrated chip that consists of a network of logical gates. The connections between the gates are programmable and are described by hardware description language (HDL) code, which is converted into binary format and fed into the FPGA, usually via external memory or EEPROM devices. FPGAs can be used for data manipulation (like DSPs, see Section 4.2.1) and memory management. FPGAs are very fast, but programming them is cumbersome, and fast on-the-fly modification of FPGA firmware is usually impossible.

The MROD uses ALTERA APEX FPGAs [84]. These FPGAs contain about 200,000 programmable gates (100,000 for the MROD-Out FPGA), and feature pre-programmed blocks that simplify the task of the programmer: features like FIFOs or Dual Port RAM management are already implemented in the FPGA.

4.2.3  The ZBT buffer

In order to cope with the input rate of data coming from the S-Link, high-speed memory is needed for buffering purposes. Synchronous SRAM is preferable to asynchronous SRAM since it is generally faster, but in synchronous designs a wait cycle needs to be inserted when switching between reading and writing, slowing down memory access. This can be avoided with ZBT (Zero Bus Turnaround) RAM, which is therefore the type of memory used in the MROD.

The ZBT buffer is configured as dual-ported memory. At one port, data coming from the S-Link is continuously written into the memory by the FPGA, which makes full use of the built-in memory management and address generation of the ALTERA chip. At the other port, the FPGA reads the data out of the ZBT buffer for further processing. The dual-port architecture allows one access every 25 ns on both ports, for a 40 MHz input rate of 32-bit data, which results in a maximum input bandwidth of 160 MByte/s.

4.2.4  The S-link

The S-Link protocol is a standard data transfer and management protocol for point-to-point data communication links, developed at CERN [86]. Links implementing the S-Link protocol can transfer 33-bit-wide words (the extra bit distinguishes command words from data words) at up to 160 MByte/s. The S-Link protocol includes cyclic redundancy checks to signal transmission errors and a hardware flow control mechanism. For each S-Link two electronic boards are needed: a Link Source Card (LSC) which transmits the data and a Link Destination Card (LDC) which receives it. LSC/LDC pairs are available in different implementations, with optical or copper links. The MROD uses an optical implementation: an ODIN (Optical Dual link INterface) LSC is mounted on the MROD-Out to send the data fragments to a ROB, which contains the corresponding ODIN LDC. In the near future the ODIN implementation will be replaced by the optical HOLA implementation [87], which is the standard implementation for ROLs.

4.2.5  The GOLA-link

The GOLA-link is an optical link based on the HOLA S-Link implementation and on the CERN-developed GOL ASIC [88]; the GOL chip is radiation-hard (as opposed to other S-Link implementations), making it suitable to be used in the experimental area of ATLAS. The GOLA-link was chosen to equip the CSM: hence each MROD has six GOLA LDC boards --- two for each MROD-In daughter-board --- which read out the six CSMs connected to the MROD (see Section 4.2.6).

The main drawback of the GOLA implementation is the absence of a mechanism to exert back-pressure on the LSC (such as the mechanism available in the S-Link protocol): the stream of data cannot be stopped, so if the MROD does not read the data fast enough its input buffers are likely to overflow. This possibility, however, has been ruled out both by data flow simulation [89] and by direct measurement.

4.2.6  The MROD board


The topology of the MROD board is outlined in Figure 4.6; a 9U VME64x motherboard (the MROD-Out) carries 3 daughterboards (the MROD-Ins). The MROD-Out contains:

  • one ALTERA FPGA which interfaces with the VME and TIM backplanes (see Section 4.2.7) and with the external bus of the SHARC DSPs;
  • one S-Link Source Card which outputs the data from the MROD;
  • two SHARC DSPs which provide the links that connect with the three daughterboards;
  • a host bus which connects the two SHARCs and the FPGA;
  • six GOLA optical input boards which receive data from the CSMs: despite being mechanically mounted on the MROD-Out, the GOLA links are a component of the MROD-In.

Each MROD-In contains:

  • two ALTERA FPGAs which handle data from the GOLA links and process the data;
  • two ZBT memory buffers which store the CSM data;
  • one SHARC DSP which connects to the FPGAs and to the MROD-Out motherboard.

File:MRODscheme.gif


Figure 4.6: Schematic diagram of the MROD: the MROD-Out motherboard contains one FPGA which drives the interfaces with the VME and TIM backplanes and two SHARC DSPs which provide the links for the transfer of data, message passing and configuration (links marked in red and numbered). The three MROD-In daughterboards each contain two FPGAs which drive the input links and process the data, two memory buffers and a SHARC that collects the data from the FPGAs and sends it to the motherboard. The picture on the left shows the front panel of the MROD, with the six GOLA input cards (two for each MROD-In) and the S-Link output card (mounted on the MROD-Out).


4.2.7  The TIM module


The TIM (TTC Interface Module) is a 9U VME64x module designed to interface the TTC system (see Section 3.4.4) and the CTP system (see Section 3.4.3) with the ROD system.

The TIM receives the LHC clock, fast commands and event ID from the TTC system and forwards them to the RODs with minimum latency. The communication with the RODs takes place via an extra backplane in the ROD crate (see Section 4.2.8). The TIM module also collects the BUSY signals from the RODs via the backplane: this signal notifies that further input of FE data might be problematic for a ROD (for example, its internal buffers are full, or the link with the ROBs is down). The TIM acts as a logical OR gate for the BUSY signals coming from the RODs residing in the crate; the output signal is forwarded to a ROD Busy Module, which generates a veto signal if the BUSY signal lasts longer than a predefined duration. The veto is passed to the Local Trigger Processor (LTP) and eventually to the CTP, which temporarily stops sending triggers until the veto is removed.

The TIM is very useful for testing and commissioning the RODs: it can be programmed by the crate controller to send stand-alone LHC-like clock, trigger signals, fast commands and event ID to the RODs.

4.2.8  The MROD crate controller


The 192 MROD boards of the Muon Spectrometer will reside in 16 crates. Each MROD crate contains a ROD Crate Controller (RCC), 12 MROD boards and a TIM module. The crate controller is a PC contained in a 6U VME module: the model chosen for the ATLAS detector is a Concurrent Technologies module [81], which contains a Pentium-class CPU. The crate controller runs the standard CERN Linux operating system and has an Ethernet interface to allow remote connections. The filesystem of the crate controller resides either on a hard disk or on a remote network filesystem. The task of the crate controller is to boot and configure the MROD and TIM modules present in the crate.

The VME bus appears to the crate controller as a range of memory addresses, with a different subset of the range assigned to each module. The server program mrodsrv was developed to access the SHARCs mounted on the MROD-Out motherboard from the crate controller: the mrodsrv program first reads a binary executable file from the RCC's filesystem, then boots a SHARC with the binary file via the VME backplane --- several instances of the server program run in parallel to boot all the SHARCs of the MROD boards mounted in the crate. Once a SHARC is booted with the desired program, mrodsrv creates in the SHARC's memory a communications buffer which is used for the exchange of messages. The server program provides the SHARCs with a standard text interface and with file I/O, and can be used for text-based interaction between the user and the MROD boards.

The MROD-In is booted by mrodsrv like the MROD-Out; however, in this case the connection between the VME bus and the MROD-In SHARC is not direct: mrodsrv uses the SHARC links between the MROD-Out and the MROD-In to reach the MROD-In and load the program into memory, making good use of the link-boot facility provided by the SHARC hardware.


File:Mrodin.gif


Figure 4.7: Diagram of the MROD-In components --- only one of the MROD-In channels is represented for the sake of clarity. The FPGA reads the data from the GOLA card via the Input FIFO. The data is divided into 18 partitions of the ZBT buffer, and the presence of TDC trailers is checked. Once a trailer is found, its presence is marked in the so-called Tetris register. If one line of the Tetris register is complete --- i.e. all 18 TDC trailers for one event are present --- the selected memory cells are read from the ZBT into the Output FIFO. When all data has entered the Output FIFO, the data length is calculated and put inside the Event Length FIFO to be read by the SHARC. The FPGA drives an endless DMA between the Output FIFO and the SHARC. Transmission errors or missing trailers are signalled by the FPGA to the SHARC via interrupt lines. The data gathered by the SHARC from the two channels is merged and sent to the MROD-Out via SHARC links.


4.3  MROD-In firmware and software


The task of the MROD-In is to sort the data from the CSMs and perform the first level of multiplexing in the MROD: data coming from two CSMs are merged into a single fragment by the MROD-In, which then forwards the fragment to the MROD-Out, where the second level of multiplexing takes place: the three fragments, one from each MROD-In, are merged to form a MROD event fragment.

Execution of the task of the MROD-In is split between its major components: the two FPGAs read the data from the CSMs, store them in a memory buffer and sort the data according to the EVID, while the SHARC merges the data sorted by the FPGAs.

4.3.1  FPGA algorithm

The FPGA algorithm is stored in a memory chip mounted on the MROD-In, which boots the FPGA when the MROD board is powered on. The FPGA is also connected to the MROD-In SHARC for configuration and control: the SHARC can issue several commands to the FPGA; it can, for example, reset it, set it in running mode, or set it in a special test mode in which simulated TDC data is fed into the FPGA to test its correct functioning. The most important configuration parameter sent by the SHARC is the number of TDCs connected to the CSM; its importance will become clear in the next paragraphs.

The FPGA algorithm starts with the FPGA enabling communications with the GOLA link, which carries the data from the CSM. The FPGA partitions the ZBT memory into 18 circular buffers of 8192 words each, 18 being the maximum number of TDCs potentially connected to the CSM. When data starts arriving from the CSM, the FPGA starts dividing the data between the partitions. The CSM sends a repeated cycle of 21 words (see Figure 4.5): the first word, called the "separator", marks the start of a new cycle, and the next two words are optical idle words which are filtered out by the GOLA card and never reach the FPGA. Words 4-21 are recognised as data words by the FPGA and undergo the following treatment:

  1. the FPGA checks the parity of the data word. If it is not correct, the data word is reformatted as an error word: the first nibble of the word is copied onto the second nibble (so that the failed word can be recognised offline) and the first nibble is then overwritten with a user-programmable error code;
  2. if the current data word is a TDC header, the FPGA overwrites the most significant byte with the TDC identifier --- the reasons for this are fully explained in Appendix A;
  3. if the current data word is a TDC hit, nothing special is done;
  4. if the current data word is a TDC trailer, the FPGA extracts the encoded EVID from the word and raises a flag in the Tetris Register for the corresponding EVID and TDC identifier; this flag marks that all the data words for a given EVID have been sent;
  5. if the current data word is a "dummy" word (i.e. the TDC did not send any word and the CSM put a place-holder instead) the FPGA discards it;
  6. if the word is a header, data or trailer word, the FPGA stores it in one of the 18 memory partitions, depending on the order of its arrival after the separator word. So, word number 4 of the CSM stream will go to partition #1 (not counting the optical idle words) and so on, up to word #21, which will be stored in partition #18. The process is illustrated in Figure 4.8.

This sequence is repeated each time the FPGA finds a separator word in the CSM stream; as long as the sequence continues, the Tetris Register fills up with flags marking completed events. The Tetris Register takes its name from the famous Russian game [91]: the register is made of 16 lines, each one reserved for an EVID, and 18 columns, one for each TDC. When a line is filled, i.e. all the TDCs have sent data for that EVID, the data is ready to be output: the total data size is computed and put in an Event Length FIFO to be read by the SHARC, and the data is sent through the Output FIFO to the SHARC via DMA transfer; a Header and a Trailer are added to the data fragment.
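
The following sketch models the Tetris Register in software. The mapping of EVIDs onto the 16 lines (EVID modulo 16) is an assumption of mine; the flag and row-completion logic follows the description above.

/* Software illustration of the Tetris Register: a 16-line by
 * 18-column flag array, one line per EVID (modulo 16 assumed here),
 * one column per TDC.  A full line means all enabled TDCs have
 * delivered their trailer, so the fragment can be output. */
#include <stdint.h>
#include <stdbool.h>

#define TETRIS_LINES 16
#define N_TDC        18

static uint32_t tetris[TETRIS_LINES];             /* one bit per TDC   */
static uint32_t tdc_enabled = (1u << N_TDC) - 1;  /* config from SHARC */

/* Called when a trailer with this EVID arrives from TDC `tdc`. */
void tetris_mark_trailer(unsigned evid, unsigned tdc)
{
    tetris[evid % TETRIS_LINES] |= 1u << tdc;
}

/* True when every enabled TDC has sent its trailer for this EVID;
 * disabled (malfunctioning or absent) TDCs are excluded, as required
 * by the panic-mode discussion below. */
bool tetris_line_complete(unsigned evid)
{
    return (tetris[evid % TETRIS_LINES] & tdc_enabled) == tdc_enabled;
}

void tetris_clear_line(unsigned evid)   /* after the fragment is output */
{
    tetris[evid % TETRIS_LINES] = 0;
}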

The FPGA always outputs data fragments sequentially: in the unfortunate case that one or more TDC trailers are missing, the Tetris row for that EVID would never be complete; however, if the event with EVID+1 is completed, the FPGA detects that some trailers might be missing and outputs both events, marking in the Header word of the incomplete fragment which TDCs did not send their trailer.

Another problem might arise if one of the TDCs "dies" and stops sending data altogether: the Tetris Register will then fill up with incomplete events. The FPGA recovers from this situation by entering "panic mode": if the Tetris Register becomes full, the FPGA outputs the first data fragment, marking it as incomplete, in order to free up space. Although sequentiality is retained in "panic mode", this mode sees the FPGA running with mostly full buffers, so it is not recommended for high data input rates: it is crucial that malfunctioning or disconnected TDCs are excluded from the partitioning algorithm. The configuration parameters received from the SHARC can disable readout for those TDCs and restrict the Tetris mechanisms to functioning ones.


File:Timedivmult.gif


Figure 4.8: Time-division multiplexing in the FPGA: the ZBT memory buffer is sliced into 18 partitions, one for each connected TDC. The picture represents a simple case where only 4 TDCs are connected. When a separator word is received, the FPGA filters it out and starts counting the number of received words. The words are divided between the partitions according to the place they occupy after the separator word. Hence word #1 will be put in partition #1, word #2 in partition #2 and so forth, until a new separator word is found (optical idle words are not represented for simplicity). If an idle (empty) word is found, the FPGA skips the respective partition and filters out the idle word. When the data is ready for output, the partitions are read sequentially, thus ordering the data words according to the TDC identifier. Time-division multiplexing from the CSM point of view is summarised in Figure 4.5.


4.3.2  SHARC algorithm

The algorithm of the MROD-In is organised in a two-loop structure, much like the MROD-Out software.

First, the MROD-In receives from mrodsrv a string of configuration parameters which are used, among many other things, to tell each of the FPGAs how many TDCs are connected to each chamber and producing data. The MROD-In loads the configuration parameters into the FPGAs and can reset them if necessary.

Once the FPGAs are configured, the SHARC sets up the DMA controller which takes care of the data transfer via the external bus from the FPGA to the internal memory of the SHARC. The FPGA is the master of the transfer: it starts transferring data as soon as data is available.

After the configuration is finished, the MROD-In enters the input loop: the MROD-In checks a flag on the FPGA of the first channel. This flag signals that a complete event fragment has been written into the output FIFO of the FPGA; if so, the MROD-In reads from the Event Length FIFO a word which encodes the length of the data fragment and its EVID. The MROD-In stores the EVID and checks whether the number of words transferred to the memory of the SHARC equals the event length --- an extra safety check, since by the time the Event Length FIFO is read, the FPGA is not guaranteed to have finished the data transfer. Once all the data has arrived, the MROD-In updates the data pointers of its memory buffers and checks whether the memory buffers are getting full, in which case it stops reading the Event Length FIFO --- a full FIFO effectively stops the DMA transfer on the FPGA side --- until more memory is freed by the output loop.

The input loop is repeated for both channels; when both channels have sent data fragments with the same EVID, then an event fragment is ready to be merged and sent to the MROD-Out.

The output loop collects the memory addresses that point to the event fragment and fills out the DMA Transfer Control Blocks (TCBs); each of these blocks contains:

  • a memory address to the first word of a data fragment;
  • the number of words to be transferred;
  • the address to the next TCB.

This information is used by the DMA controller to chain the data fragments sequentially and to merge the data from the two channels into a single DMA transfer; the MROD-In takes care of possible roll-overs in the circular buffers by setting up two TCBs for the affected event fragment (see the sketch below). Once the DMA chain is complete, the MROD-In computes the total event length and encodes it, together with the EVID, in a "magic word" which is sent to the MROD-Out to initiate the data output. When the data transfer to the MROD-Out is complete, the MROD-In flags the memory occupied by the transferred fragment as free and checks whether a second event fragment is ready to be output, in which case the chaining procedure is repeated. If no completed event fragment is present, the MROD-In ends the output loop and returns to the main loop. If the MROD-In is operating in debug mode, it can output relevant information to a console screen, such as the number of transferred blocks, execution time, errors and so on.
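
A schematic version of the roll-over handling with two chained TCBs might look as follows; the field layout is illustrative, since the real TCB format is dictated by the SHARC (ADSP-2106x) DMA hardware.

/* Schematic TCB; the real field layout is fixed by the SHARC DMA
 * hardware and is not reproduced here. */
#include <stdint.h>
#include <stddef.h>

typedef struct TCB {
    uint32_t   *addr;     /* first word of this piece of the fragment */
    size_t      nwords;   /* number of words to transfer              */
    struct TCB *next;     /* next block in the chain, NULL at the end */
} TCB;

/* Chains a fragment that may wrap around the end of a circular
 * buffer; a roll-over is handled, as described above, by splitting
 * the fragment over two TCBs.  Returns the number of TCBs used. */
size_t chain_fragment(TCB tcb[2], uint32_t *buf, size_t buf_size,
                      size_t start, size_t nwords)
{
    size_t first = nwords;
    if (start + nwords > buf_size)       /* fragment wraps around */
        first = buf_size - start;

    tcb[0] = (TCB){ buf + start, first, NULL };
    if (first < nwords) {
        tcb[1] = (TCB){ buf, nwords - first, NULL };
        tcb[0].next = &tcb[1];           /* link the two pieces   */
        return 2;
    }
    return 1;
}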

4.4  MROD-Out firmware and software


The MROD-Out board has two SHARCs which can run the same multiplexing algorithm in parallel on the data coming from the MROD-In daughterboards. The MROD-Out algorithm is based on an endless loop, which is divided into two nested sub-loops: the first sub-loop is dedicated to the readout of the data coming from the MROD-Ins, the second to the output to the S-Link LSC.

In the initialization phase, each MROD-Out SHARC is booted by the server program mrodsrv with the binary executable. The server provides the SHARC with a string of command-line arguments which are used as parameters for the algorithm --- for example, the number of connected MROD-Ins. In the initial phase of the program, the MROD-Out configures its hardware to communicate with the 3 MROD-In daughterboards: SHARC links 1, 2 and 3 are set up as 8-bit-wide 40 MHz input links, and the DMA controllers are switched on to drive the data stream from those links. A portion of the MROD-Out internal memory is reserved for buffering the incoming data: for each MROD-In, a circular buffer with 8 slots of 400 words is allocated.

Then the program enters the main loop. In the input phase, the MROD-Out polls the SHARC links connected to the MROD-Ins, looking for the magic word (see Section 4.3), which signals to the MROD-Out that data is ready to be transferred and encodes the data length and the sequential EVID. The MROD-Out decodes the magic word and instructs the DMA controller to start a data transfer for the number of words indicated in the magic word. The MROD-Out reads the EVID and the BCID provided by the TIM module (see Section 4.2.7) and checks that the EVID encoded in the magic word matches the EVID of the TIM. While the data is being transferred into a slot of the circular buffer, the header section of the event fragment is filled with relevant information (see Appendix A): MROD identifier, EVID, BCID, etc. The MROD-Out reads the DMA counter (which counts down from the selected data length to zero) to check whether all data has been transferred; if so, the number of completed data fragments is incremented. The MROD-Out monitors the occupancy level of the circular buffers: if either a single buffer has all its slots occupied or the majority of slots in all buffers are occupied, the MROD-Out sends a BUSY signal to the TIM module, throttling the trigger rate. The input loop is repeated until the number of completed fragments equals the number of connected MROD-Ins, at which point the program enters the output phase.
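
A condensed sketch of the input phase is given below. The bit layout chosen for the magic word and the hardware stubs are hypothetical; only the poll/decode/check/DMA sequence follows the text.

/* Sketch of the MROD-Out input phase described above. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical layout: low 16 bits = length, high 16 bits = EVID.
 * The real MROD encoding is not reproduced here. */
#define MAGIC_LEN(w)   ((w) & 0xFFFFu)
#define MAGIC_EVID(w)  (((w) >> 16) & 0xFFFFu)

/* Trivial stand-ins for the SHARC link, DMA and TIM interfaces. */
static bool link_poll(int link, uint32_t *w) { (void)link; (void)w; return false; }
static void dma_start(int link, uint32_t *dst, unsigned n) { (void)link; (void)dst; (void)n; }
static unsigned tim_evid(void) { return 0; }

void mrod_out_input_phase(int link, uint32_t *slot)
{
    uint32_t magic;
    if (!link_poll(link, &magic))
        return;                           /* no fragment ready yet */

    unsigned len  = MAGIC_LEN(magic);     /* words to transfer     */
    unsigned evid = MAGIC_EVID(magic);
    if (evid != tim_evid())               /* cross-check with TIM  */
        fprintf(stderr, "EVID mismatch on link %d\n", link);

    dma_start(link, slot, len);           /* transfer into a slot  */
}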

In the output loop, the algorithm collects the memory pointers of the buffered data and splits the data into a chain of 256-word-long data chunks. This is done because the output FPGA of the MROD-Out has a 256-word-deep input FIFO which could overflow if data is not output fast enough through the S-Link; hence the MROD-Out checks whether the S-Link FIFO is empty: if it is, it sends the next chunk of the chain; if not, the MROD-Out sends the TIM a BUSY signal and resumes sending chunks once the FIFO has been emptied. The last data chunk of the chain contains the MROD trailer, with the total data length for the six connected chambers and status and error words. When all the chunks of an event have been sent, the MROD-Out flags the data slots as free and ready for data input. The output loop is optimised to immediately send a new chain of data chunks if a new fragment has been completed in the meantime, thus minimizing the occupancy of the circular buffers.
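
The chunking logic can be summarised in a few lines of C; the FIFO test, output routine and BUSY line are stubs for the real FPGA registers.

/* Sketch of the chunked output described above: a fragment is sent
 * to the S-Link in pieces of at most 256 words, waiting for the
 * output FIFO to drain between chunks so it can never overflow. */
#include <stdint.h>
#include <stdbool.h>

#define CHUNK_WORDS 256   /* depth of the output FPGA input FIFO */

/* Trivial stand-ins for the FPGA status bit, the S-Link output
 * and the TIM BUSY line. */
static bool slink_fifo_empty(void) { return true; }
static void slink_push(const uint32_t *w, unsigned n) { (void)w; (void)n; }
static void tim_set_busy(bool busy) { (void)busy; }

void mrod_out_send_fragment(const uint32_t *frag, unsigned nwords)
{
    while (nwords > 0) {
        unsigned n = (nwords < CHUNK_WORDS) ? nwords : CHUNK_WORDS;
        /* If the FIFO has not drained yet, raise BUSY and wait. */
        if (!slink_fifo_empty()) {
            tim_set_busy(true);
            while (!slink_fifo_empty())
                ;                          /* wait for the FIFO   */
            tim_set_busy(false);
        }
        slink_push(frag, n);               /* send the next chunk */
        frag   += n;
        nwords -= n;
    }
}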

In the last section of the main loop, debugging and benchmarking routines can be turned on, for example to dump sample events via the VME crate or to display run statistics on a console screen. However, this feature is limited by the slowness of file I/O and is possible only for trigger rates of the order of 100 Hz.

4.5  The MROD at work

The MROD prototype described in this thesis was used in 2003/2004 in two operating environments: at the cosmic ray test-stand at NIKHEF, and at the H8 test-beam at CERN.

4.5.1  The MROD at NIKHEF's cosmic ray test-stand


The main purpose of the cosmic ray test-stand is to test the functionality and evaluate the performance of the MDT chambers built at the production site at NIKHEF. The set-up is organised as follows: five MDT chambers are stacked vertically in a support frame. The MDT chambers are connected to a gas system which fills the chambers with the standard ATLAS MDT gas mixture: 93%/7% Ar/CO2 at 3 bar pressure. A voltage of 3080 V is applied to the chambers by the High Voltage distribution system. Each chamber is fitted with mezzanine cards and a CSM module; the 5 CSM modules are connected via optical fibers to an MROD.

The trigger system is composed of 4 trigger stations; each station is composed of six scintillators divided into two stacks of three. Each scintillator is read out on a single side by a photomultiplier tube; in each stack one photomultiplier is placed opposite to the other two, to achieve uniformity of response along the length of the stack. The stations are placed perpendicular to the MDT tubes, to obtain information on the second spatial coordinate. All the PMT pulses are fed into a spare mezzanine to retrieve the timing information of the trigger signal; this mezzanine is read out by a CSM which is unattached to any chamber, used solely for the purpose of reading out the trigger mezzanine, and connected to the sixth and remaining input of the MROD.

In each trigger station, an iron slab of about 1 m is positioned between the upper stack and the lower stack. The purpose of the iron slab is to absorb low-energy cosmic rays: since a trigger signal is formed by a coincidence of PMT pulses coming from the upper and lower stacks, absorbed cosmic particles do not form a trigger signal. Soft cosmic rays need to be excluded from the data taking because they are vulnerable to multiple scattering in the MDT chambers, which degrades the measured MDT resolution. The global trigger signal is formed as the logical OR of the trigger pulses of the individual trigger stations. The global trigger is fed via a NIM cable into the TTCvi module (see Section 3.4.4), which converts the trigger signal into optical format and distributes it via optical fibers to all the CSMs.

The data readout chain is the same as explained earlier in this chapter, with a single difference: the data coming out of the MROD is not sent to a ROB. Instead, the data is sent via optical fiber to an S-Link-PCI interface card; this card consists of a standard S-Link LDC mounted in a PCI slot of a PC. Software running on the PC reads out the data arriving from the S-Link, checks it for consistency of the data format and stores it on the hard disk.

The data stored on the PC can be analyzed in two ways:

  • at a low level, checking for data integrity, chamber/trigger efficiency, data throughput;
  • at a high level, reconstructing particle tracks from data, evaluating the R-t relation of the MDT chambers, track reconstruction efficiency, spatial resolution.

The data analysis program present performs low-level analysis of MDT data. A sample output of this program is shown in Figure 4.9.


File:Present.gif


Figure 4.9: Visual output of the MDT data analysis program present. a) TDC spectrum of MDT hits. b) Hits per tube of a single MDT chamber. Two malfunctioning wires are evident in this plot. The "steps" of the histogram represent different mezzanines, each of which reads out 24 tubes. The shape of the histogram is affected by the efficiency and coverage of the trigger system: mezzanines closer to the scintillator trigger have higher hit rates. The overall curved shape of the histogram might be affected by the light collection efficiency in the scintillators. c) Charge spectrum. The charge is measured by the Wilkinson ADC (see Section 4.1.2) and could be used to correct the TDC measurement for time slewing. d) Total length of event fragments.


4.5.2  The MROD at the H8 Test-beam 2003

The MROD was used in the data acquisition system for the 2003 H8 Muon Test-beam. This section will describe the DAQ system, while a full description of the Muon Test-beam layout is included in Section 2.4.

The DAQ software used at H8 was a prototype of the final ATLAS DAQ software, which is structured as a Finite State Machine (FSM): the configuration and running of the different DAQ subsystems is broken up into a finite number of well-defined sequential functional states, common to all subsystems. In this way, the synchronisation of the subsystems is guaranteed: the transition of the DAQ system from state (n) to state (n+1) cannot occur until all subsystems have successfully performed the transition from state (n-1) to state (n) --- this prevents, for example, one part of the system from taking data while another part is still waiting to be configured. The states defined in the ATLAS DAQ are: Absent, Booted, Loaded, Configured, Running, Paused and Stopped.

The DAQ was controlled by a PC located in the Counting Room of the H8 area: the task of this PC was to issue the commands defining state transitions for all the DAQ subsystems. The DAQ controller's knowledge of the H8 detectors was embedded in a Configuration Database (ConfDB), which defined the subsystems making up the H8 DAQ, the protocols needed to communicate with the subsystems, and the configuration parameters needed for each state of the FSM.

The DAQ subsystem pertaining to the MDT chambers was composed of a MROD crate, which contained:

  • one Concurrent Technologies Crate Controller, on which Crate DAQ software was running;
  • two MROD boards; one of the MRODs was collecting data from the six CSMs connected to the MDT chambers in the Barrel section, while the second MROD was collecting data from the six MDT chambers in the End-Cap section;
  • one TIM module, whose task was to multiplex the BUSY signals from the two MRODs and relay them to the trigger system, as outlined in Section 4.2.7;
  • one TTCvi board; this board is part of the TTC timing and triggering system (see Section 3.4.4) and had two tasks: the first task was to convert the electrical signals coming from the central trigger into optical signals to be distributed to the CSMs, while the second task was to provide the CSMs during the initialization stages with a 40 MHz clock signal and with BCID and EVID reset signals;
  • one Corelis JTAG/VME interface card; this was needed since the JTAG programming features of the ELMB board (see Section 4.1.4) were not implemented yet at the time of the test-beam; the task of the Corelis card was to distribute JTAG initialization strings to the CSMs.

The DAQ Controller was connected to the Crate Controller via Ethernet. At the transition from the Absent to the Booted state, the DAQ Controller would remotely start the Crate DAQ software on the Crate Controller. The Crate DAQ software contained all the VME parameters needed to communicate with the boards contained in the crate, plus an ordered list of actions to perform on the boards during state transitions. Once the Crate Controller was booted, it would be ready to receive commands from the DAQ Controller. In the transition from the Booted to the Loaded state, the Crate Controller would instruct the Corelis unit to program the CSMs with the proper JTAG bit-streams, while in the transition from the Loaded to the Configured state, the Crate Controller would invoke multiple instances of the program mrodsrv to boot all the SHARCs on the two MROD boards and configure them with the configuration parameters taken from ConfDB. In the Configured state, the MRODs are idle, waiting for a message from the Crate Controller to move to the Running state --- the communication between the Crate Controller and the MROD occurs through a message pipe on the Crate Controller's filesystem.

When all the modules in the crate are in the Configured state, the DAQ system can be switched into the Running state, where data readout occurs. The transition to the Running state starts with the Crate Controller ordering the MRODs to exit the idle mode and get ready to receive data; then the Crate Controller commands the TTCvi to reset the Bunch ID and the Event ID and to start relaying trigger signals to the CSMs. Upon receiving trigger signals, the CSMs collect data words from the TDCs and send them to the MRODs.

The data fragments built by the MRODs are sent via optical fibers to a ROS PC, which contains two S-Link PCI boards, one for each MROD. The data fragments are stored in memory and checked for errors before being sent to the SFI (Sub Farm Input) PC, which assembles the data fragments from the various ROS PCs into a complete event fragment, in which data from all the Muon subdetectors (MDT, RPC, TGC) are present. The SFI PC and the ROSes are part of the DAQ and implement the Finite State Machine mechanism.

The H8 Muon test-beam of 2003 collected more than 60 million events. This data was used for several studies of the Muon Spectrometer (see Section 2.4). Some of the data was analyzed at NIKHEF with the track reconstruction package MuonRay [95], with electronics mapping and data decoding routines written by myself in order to adapt to the 2003 H8 data format. MuonRay can perform tracking efficiency measurements, track resolution measurements and chamber calibration. The visual output of this software package is shown in Figure 4.10.


File:Event display zy.gif


Figure 4.10: Event display of the package MuonRay. The picture shows a muon track crossing three MDT chambers of the barrel section at H8, taken from 2003 data. On the right, a zoom of the track crossing individual tubes of the three chambers. The black circles measure the track radii in each of the tubes, with the fitted track traced on top.


4.5.3  MROD-X

In Autumn 2003 the MROD was subjected to an extensive technical review by a panel at CERN. The panel pointed out weaknesses in the SHARC links, which are prone to transmission errors when operated at the maximum bandwidth of 8 bits at 80 MHz. The panel suggested replacing the ALTERA FPGAs of the MROD with Xilinx Virtex-II Pro FPGAs. These FPGAs are technically more sophisticated and include the proprietary data transfer protocol RocketIO, which can achieve transfer rates of up to 3 Gbps per link, thus virtually eliminating any risk of bottlenecks. In the MROD-X design, the Xilinx FPGAs mounted on the MROD-Ins communicate directly with the Xilinx FPGA mounted on the MROD-Out, bypassing the SHARCs and thus eliminating the communication problems (see Figure 4.11). This design preserves the topology of the ALTERA-based MROD, requiring only a minimal and inexpensive re-design of the PCB. On the MROD-X, the SHARCs are retained for configuration and monitoring purposes, and can be used as an emergency fall-back data link in case of malfunction of a RocketIO link.


File:Mrod-x-new.gif


Figure 4.11: Diagram of the new MROD-X layout: the dark blue arrows indicate the new RocketIO links between the Xilinx FPGAs. The SHARC links from the previous layout are kept for configuration and monitoring. For a comparison with the old layout, see Figure 4.6.