
Setting up the Environment

Setup ATLAS Local Root Base (ALRB). 

export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS

Most users add the first two lines above to their login profile so that they can simply run "setupATLAS" to set up the environment.

Set up FAX and ROOT so that we can access data remotely:

lsetup fax root

Create the usual grid proxy. You will be prompted to enter your password.

voms-proxy-init -voms atlas

Lesson 1: Looking at xAOD

Let us take a look at a typical xAOD file:

mkdir lesson_1; cd ./lesson_1
xrdcp $STORAGEPREFIX/atlas/rucio/valid2:AOD.01482225._000140.pool.root.1 AOD.01482225._000140.pool.root.1
root -l AOD.01482225._000140.pool.root.1
TBrowser a

Use the TBrowser's left pane to open "CollectionTree". Then click on the branch with the extension “Aux” (for example, “AntiKt4TruthJetsAux”) and plot AntiKt4TruthJetsAux.pt.

See further details in the CERN tutorial.

Lesson 2: Using pyROOT to read xAOD

Now we will read the above file in Python and print some electron values (eta and phi).

First, we set up the RootCore environment in the directory "lesson_1" that contains the xAOD file:

rcSetup -u; rcSetup Base,2.0.12

Let's create a new file:

xAODPythonMacro.py
#!/usr/bin/env python
 
# Set up ROOT and RootCore:
import ROOT
ROOT.gROOT.Macro( '$ROOTCOREDIR/scripts/load_packages.C' )
 
ROOT.xAOD.Init() # Initialize the xAOD infrastructure
 
fileName="AOD.01482225._000140.pool.root.1" # Set up the input files
treeName = "CollectionTree" # default when making transient tree anyway
 
f = ROOT.TFile.Open(fileName)
t = ROOT.xAOD.MakeTransientTree( f, treeName) # Make the "transient tree"
 
# Print some information:
print( "Number of input events: %s" % t.GetEntries() )
for entry in xrange( t.GetEntries() ):
   t.GetEntry( entry )
   print( "Processing run #%i, event #%i" % ( t.EventInfo.runNumber(), t.EventInfo.eventNumber() ) )
   print( "Number of electrons: %i" % len( t.ElectronCollection ) )
   for el in t.ElectronCollection:  # loop over electron collection
      print( "  Electron trackParticle eta = %g, phi = %g" %  ( el.trackParticle().eta(), el.trackParticle().phi() ) )
      pass # end for loop over electron collection
   pass # end loop over entries

If you copy this script, fix the indentation (the lines should start at column 0); this is a good way for you to learn the code! Then run it as:

chmod +x xAODPythonMacro.py
./xAODPythonMacro.py

Using this code, one can fill histograms, but it runs slowly. Below we will show how to use compiled C++/ROOT code to run over this file.

How can you find xAOD variable names without using the ROOT TBrowser? Try this:

asetup 19.1.1.1,slc6,gcc47,64,here
checkSG.py AOD.01482225._000140.pool.root.1

You will see a table with the names of the variables.

Lesson 3: Analysis program to read xAOD

Now we will create a C++/ROOT analysis program and run it over this input xAOD file. Start from a new shell and set up the environment, then:

mkdir lesson3
cd lesson3
rcSetup -u; rcSetup Base,2.0.12
rc find_packages  # find needed packages
rc compile        # compiles them

Next we will use a simple example code that runs over multiple remotely accessed files.

Download it from here, unzip it in the same directory, and re-compile everything:

wget https://ci-connect.atlassian.net/wiki/download/attachments/10780723/MyAnalysis3.zip
unzip MyAnalysis3.zip
rc find_packages    # find this package
rc compile          # compiles it

Go to the program's starting directory and create the input file list (since this dataset is rather large, we suggest removing all but a few files from inputdata.txt):

cd MyAnalysis/util  # go to the analysis code
fax-get-gLFNs valid2.117050.PowhegPythia_P2011C_ttbar.digit.AOD.e2657_s1933_s1964_r5534_tid01482225_00 > inputdata.txt 
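To trim inputdata.txt down to a few files for a quick test run, standard shell tools are enough; a minimal sketch (the fabricated file list here stands in for the real gLFN list that fax-get-gLFNs produced):

```shell
# Sketch: keep only the first 3 input files for a quick test run.
# (We fabricate a dummy inputdata.txt here; in practice fax-get-gLFNs created it.)
printf 'file1\nfile2\nfile3\nfile4\nfile5\n' > inputdata.txt
head -n 3 inputdata.txt > inputdata.tmp && mv inputdata.tmp inputdata.txt
wc -l < inputdata.txt
```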

Your analysis is started with testRun.cxx. We pass "submitDir", which will be the output directory containing the ROOT file; you must delete it every time you run the code (or use a different output name). The code runs over the list of files in inputdata.txt. The actual analysis goes into "Root/MyxAODAnalysis.cxx" (called from testRun.cxx).

 testRun submitDir   # runs over all files inside inputdata.txt 

 

Lesson 4: Filling histograms

Start from a new shell and set up the environment, then:

mkdir lesson_4; cd lesson_4 
rcSetup -u; rcSetup Base,2.0.12
rc find_packages  
rc compile      
wget https://ci-connect.atlassian.net/wiki/download/attachments/10780723/MyAnalysis42.zip
unzip MyAnalysis42.zip
rc find_packages    # find this package
rc compile          # compiles it

You will notice that we have made a number of changes to the lesson 3 code. We will fill histograms with the pT of jets and muons. To do this, we made the following changes:

In cmt/Makefile.RootCore we have added:

PACKAGE_DEP = EventLoop xAODRootAccess xAODEventInfo GoodRunsLists xAODJet xAODTrigger xAODEgamma JetSelectorTools JetResolution xAODMuon

Then we have modified:

MyAnalysis/MyxAODAnalysis.h # added new pointers to histograms
Root/MyxAODAnalysis.cxx     # initialized histograms and put loops over jets and muons

As you can see in Root/MyxAODAnalysis.cxx, this analysis also requires a GRL (Good Runs List) file, which you can get here:

curl 'https://ci-connect.atlassian.net/wiki/download/attachments/10780723/data12_8TeV.periodAllYear_DetStatus-v61-pro14-02_DQDefects-00-01-00_PHYS_StandardGRL_All_Good.xml?version=1&modificationDate=1412370575621&api=v2' \
  > ../../data12_8TeV.periodAllYear_DetStatus-v61-pro14-02_DQDefects-00-01-00_PHYS_StandardGRL_All_Good.xml

Go to the program's starting directory and create the input file list (since this dataset is rather large, we suggest removing all but a few files from inputdata.txt):

cd MyAnalysis/util  # go to the analysis code
fax-get-gLFNs valid2.117050.PowhegPythia_P2011C_ttbar.digit.AOD.e2657_s1933_s1964_r5534_tid01482225_00 > inputdata.txt 

Finally, start the analysis:

testRun submitDir   # runs over all files inside inputdata.txt 

Lesson 5: Working on a Tier3 farm (Condor queue)

In this example we will use the HTCondor workload management system to send jobs to a queue on a Tier3 farm.

For this example we will start from the lesson 4 directory, so if you have not done lesson 4, please do it first and verify that your code runs locally.

Start from a new shell and set up the environment, then create this shell script, which will be executed at the beginning of each job on each farm node:

startJob.sh
#!/bin/bash
export RUCIO_ACCOUNT=YOUR_CERN_USERNAME
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
lsetup fax root
 
# Not needed if using `x509userproxy' in the submit file:
#export X509_USER_PROXY=$PWD/your-proxy-file

rcSetup -u; rcSetup Base,2.0.12
rc find_packages
rc compile  

unzip payload.zip
rc find_packages
rc compile  

cd MyAnalysis/util
rm -rf submitDir

# here we create a job specific inputfile list
echo "job $1 from $2"
tot_files=$( cat inputdata.txt | wc -l )
echo "total files: $tot_files"
rem=$(( $tot_files%$2 ))
files_per_job=$(( $tot_files/$2 ))
if [ $rem -ne 0 ]; then
   files_per_job=$(( $files_per_job+1 ))
fi
echo "files per job: $files_per_job"
start_file=$(( $1*$files_per_job+1 ))
end_file=$(( $start_file+$files_per_job-1 ))
echo "start file: $start_file   end file: $end_file"
sed -n $start_file\,${end_file}p inputdata.txt > tmp.txt
cp tmp.txt inputdata.txt

cat inputdata.txt
echo "startdate $(date)"
testRun submitDir 

# here we move the output data back to the start directory. All the files in that directory are copied back by condor.
mv submitDir/hist-sample.root ../../hist-sample_$1.root
echo "enddate $(date)"
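The block of arithmetic in startJob.sh above splits inputdata.txt evenly across the jobs, with any remainder rounding the per-job count up. The same logic can be sketched in Python (using 0-based slice indices instead of sed's 1-based line numbers):

```python
# Sketch of the file-splitting arithmetic in startJob.sh:
# job `proc` (0-based) out of `njobs` gets one contiguous slice of the file list.
def files_for_job(files, proc, njobs):
    tot = len(files)
    per_job = tot // njobs
    if tot % njobs != 0:        # remainder: round the per-job count up
        per_job += 1
    start = proc * per_job      # 0-based start index of this job's slice
    return files[start:start + per_job]

all_files = ["file_%d" % i for i in range(25)]
for proc in range(10):
    print(proc, files_for_job(all_files, proc, 10))
```

With 25 files and 10 jobs, each job gets up to 3 files and the last job's slice may be empty, matching what the sed range does in the script.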

Make sure the RUCIO_ACCOUNT variable is set correctly. Make this file executable, then create the file that describes our job's requirements and that we will give to condor:

job.sub
Jobs=10
getenv         = False
executable     = startJob.sh
output         = MyAnal.$(Process).out
error          = MyAnal.$(Process).err
log            = MyAnal.$(Process).log
arguments = $(Process) $(Jobs) 
environment = "IFlist=$(IFlist)"
transfer_input_files = payload.zip,/tmp/your-proxy-file,MyAnalysis/util/inputdata.txt
universe       = vanilla
+ProjectName="myproject"
# Copy your local $X509_USER_PROXY to the job, transparently
# (Local, Java, and Vanilla universes only.)
x509userproxy  = $ENV(X509_USER_PROXY)
queue $(Jobs)

To access files using FAX, the jobs need a valid grid proxy, which is why we send it with each job. The proxy is the file starting with "x509up", so in both job.sub and startJob.sh you should replace "your-proxy-file" with the name of your grid proxy file; you can find the filename in the environment variable $X509_USER_PROXY. Set a correct project name using the +ProjectName = "myproject" parameter, or write the name of your project in the file $HOME/.ciconnect/defaultproject.
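If $X509_USER_PROXY is unset, note that by default VOMS writes the proxy to /tmp/x509up_u<uid>. A small sketch reconstructing that default path (assumption: standard VOMS defaults; when in doubt, check $X509_USER_PROXY):

```shell
# Sketch: the default grid-proxy location is /tmp/x509up_u<your-numeric-uid>.
proxy_default="/tmp/x509up_u$(id -u)"
echo "$proxy_default"
```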

You need to pack the analysis directory and the GRL file into a payload.zip file:

rc clean
rm -rf RootCoreBin
zip -r payload.zip MyAnalysis data12_8TeV.periodAllYear_DetStatus-v61-pro14-02_DQDefects-00-01-00_PHYS_StandardGRL_All_Good.xml

Now you can submit your jobs for execution and follow their status like this:

~> condor_submit job.sub
Submitting job(s)..........
10 job(s) submitted to cluster 49677.

~> condor_q ivukotic 
-- Submitter: login.atlas.ci-connect.net : <192.170.227.199:60111> : login.atlas.ci-connect.net
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
49677.0   ivukotic       10/9  10:21   0+00:00:11 R  0   0.0  startJob.sh 0     
49677.1   ivukotic       10/9  10:21   0+00:00:11 R  0   0.0  startJob.sh 1     
49677.2   ivukotic       10/9  10:21   0+00:00:11 R  0   0.0  startJob.sh 2     
49677.3   ivukotic       10/9  10:21   0+00:00:11 R  0   0.0  startJob.sh 3     
49677.4   ivukotic       10/9  10:21   0+00:00:11 R  0   0.0  startJob.sh 4     
49677.5   ivukotic       10/9  10:21   0+00:00:11 R  0   0.0  startJob.sh 5     
49677.6   ivukotic       10/9  10:21   0+00:00:11 R  0   0.0  startJob.sh 6     
49677.7   ivukotic       10/9  10:21   0+00:00:11 R  0   0.0  startJob.sh 7     
49677.8   ivukotic       10/9  10:21   0+00:00:11 R  0   0.0  startJob.sh 8     
49677.9   ivukotic       10/9  10:21   0+00:00:10 R  0   0.0  startJob.sh 9     

10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended

When the jobs are done, all of the output files will be sent back to your working directory.

 
