

 

  • Data analysis tasks are characterised by large CPU time per event
  • This implies a small data flow rate (most of the time is spent calculating rather than doing I/O)
  • Thus we pre-stage and filter the required data first
    • The staging and filtering is part of re-clustering the events, for the needs of either an individual user or an analysis group. It is done in a separate process so as to:
      • free up any central tape or disk drive as soon as possible
      • minimize contention with other users associated with central access to the data
      • start moving "small" rather than "large" datasets around as soon as possible
    • The data to be re-clustered and moved can be determined from a single job for the user or analysis group
  • When processing, we want access to the data to be local in time as well as in space. This means reducing the probability of having to open a container and fill a cache, only to have the cache flushed before many objects in the container are de-referenced.
    • Thus, if the analysis needs to reference raw data or lower-level hits (perhaps because of an unusual condition or analysis requirement), it sets a flag (or stores a tag) that defers access to the required container until later, when sufficiently many other analysis tasks require access to that container.
    • In this way we avoid handling the unusual condition in quasi-real time, during the fast analysis on smaller, or more easily reachable, containers.
  • We move the events into 300 MBytes of local memory on the processor on which they will be analysed
    • Moving 300 MBytes to a local memory takes only about 3000 seconds at 1 Mbps.
    • A "service" for 100 users, with some 100 Mbps in total, should be sufficient.
    • Moving the data to the user in this way takes advantage of many desktops for processing.
    • We can use similar considerations for data movement over a LAN, at say 10 Mbps per user (5 minutes to get 30k events). Running such a service for 100 users would need 1 Gbps.
    • This avoids contention with other users for the same data.
    • Users could get many "chunks" of data per day.
    • We can fit 30000 events of size 10 kBytes into the memory
  • We then start the analysis task on the processor, assumed to be 2000 MIPS.
    • We assume the time to analyse an event is proportional to the time taken to reconstruct it: 20000 MIPS-seconds per 1 MByte raw event.
    • So we need 200 MIPS-seconds for each of our 10 kByte events
    • The analysis therefore runs at a rate of 10 events per second
    • The analysis of the 30000 events is complete in 3000 seconds
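
The figures in the list above can be checked with a short back-of-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, 1 MByte is taken as 1000 kBytes, and the value of 10 bits on the wire per byte (8 data bits plus overhead) is an assumption chosen because it reproduces the quoted 3000 seconds and 5 minutes. With exactly 8 bits per byte the transfer at 1 Mbps would take 2400 seconds.

```python
# Back-of-envelope check of the staging and analysis figures quoted above.
# All names are illustrative; the constants come from the text unless marked.

EVENT_SIZE_KB = 10           # size of one analysis event
MEMORY_MB = 300              # local memory available on the processor
CPU_MIPS = 2000              # assumed processor speed
RECO_MIPS_S_PER_MB = 20000   # MIPS-seconds per 1 MByte raw event
BITS_PER_BYTE_ON_WIRE = 10   # ASSUMPTION: 8 data bits plus overhead


def events_in_memory(memory_mb, event_kb):
    """Number of events that fit in memory (1 MByte = 1000 kBytes assumed)."""
    return memory_mb * 1000 // event_kb


def transfer_seconds(size_mb, link_mbps, bits_per_byte=BITS_PER_BYTE_ON_WIRE):
    """Time to move size_mb MBytes over a link of link_mbps Mbps."""
    return size_mb * bits_per_byte / link_mbps


def mips_seconds_per_event(event_kb):
    """CPU cost per event, scaled linearly with event size."""
    return RECO_MIPS_S_PER_MB * event_kb / 1000


def events_per_second(cpu_mips, event_kb):
    """Analysis rate on a processor of the given speed."""
    return cpu_mips / mips_seconds_per_event(event_kb)


def service_bandwidth_mbps(users, per_user_mbps):
    """Aggregate bandwidth needed to serve all users at once."""
    return users * per_user_mbps


if __name__ == "__main__":
    n_events = events_in_memory(MEMORY_MB, EVENT_SIZE_KB)
    rate = events_per_second(CPU_MIPS, EVENT_SIZE_KB)
    print("events in memory:      ", n_events)
    print("WAN transfer (1 Mbps): ", transfer_seconds(MEMORY_MB, 1), "s")
    print("LAN transfer (10 Mbps):", transfer_seconds(MEMORY_MB, 10), "s")
    print("cost per event:        ", mips_seconds_per_event(EVENT_SIZE_KB), "MIPS-s")
    print("analysis rate:         ", rate, "events/s")
    print("analysis time:         ", n_events / rate, "s")
    print("WAN service, 100 users:", service_bandwidth_mbps(100, 1), "Mbps")
    print("LAN service, 100 users:", service_bandwidth_mbps(100, 10), "Mbps")
```

Under these assumptions the time to stage a 300 MByte chunk at 1 Mbps and the time to analyse it are comparable (both about 3000 seconds), which is consistent with the point that the analysis is CPU-bound rather than I/O-bound.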


 

  02/16/2007 by Julian Bunn, email: Julian.Bunn@caltech.edu