The Top 3 Reasons to Run LSF Command bpeek 2017-08-27T14:31:52+00:00

As an LSF end user, you have several tools for inspecting jobs while they are still running.

One of them is the bpeek command.

bpeek displays the standard output and standard error that an unfinished job has produced up to the moment the command is invoked. Here are three reasons to run it:

  1. It helps you monitor the progress of a job and spot errors early.
  2. It lets you check whether your job is behaving correctly (or is buggy), so you can decide whether to let it finish or kill it and avoid wasting time.
  3. It can show which files are being processed at a given point in time, which helps when troubleshooting real or perceived file system (GPFS, Lustre, Quobyte) issues.

bpeek [-f] [-q queue_name | -m host_name | -J job_name | job_ID | "job_ID[index_list]"]

Options

Option/Flag
    Description

-f
    Displays the output of the job using the command tail -f.

-q queue_name
    Operates on your most recently submitted job in the specified queue.

-m host_name
    Operates on your most recently submitted job that has been dispatched to the specified host.

-J job_name
    Operates on your most recently submitted job that has the specified job name.

    The job name can be up to 4094 characters long. Job names are not unique.

    The wildcard character (*) can be used anywhere within a job name, but cannot appear within array indices. For example, job* returns jobA and jobarray[1]; *AAA*[1] returns the first element in all job arrays with names containing AAA; however, job1[*] does not return anything because the wildcard is within the array index.

job_ID | "job_ID[index_list]"
    Operates on the specified job.

-h
    Prints command usage to stderr and exits.

-V
    Prints LSF release version to stderr and exits.
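The synopsis puts the array form, job_ID[index_list], in quotes for a practical reason: square brackets are glob characters in most shells, so an unquoted argument can be rewritten before bpeek ever sees it. The sketch below illustrates this with echo standing in for bpeek, so it runs without LSF. The array job ID 2574400 and the matching file name are hypothetical.

```shell
#!/bin/sh
# Why "job_ID[index_list]" should be quoted: [ ] is a shell glob pattern.
# echo stands in for bpeek; job ID 2574400 is a hypothetical array job.
workdir=$(mktemp -d)
touch "$workdir/25744001"           # a file that happens to match the pattern

# Unquoted: the shell expands the glob to the matching file name.
unquoted=$(cd "$workdir" && echo 2574400[1])
# Quoted: the argument is passed through unchanged.
quoted=$(cd "$workdir" && echo "2574400[1]")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

rm -rf "$workdir"

# On an LSF host the quoted form would be used like this (not run here):
#   bpeek "2574400[1]"
```

The same caution applies to job names containing the * wildcard passed to -J, since * is also a glob character.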

 

As an example, suppose you submitted a script as a job (job ID 2574378). The script contains an error (perhaps a missing path to a command, or a command that does not exist on the compute node) that does not necessarily cause the script to end or fail outright, but the results it produces may not be correct. In other words, the script worked in your test environment, but bpeek can reveal that it does not work as-is on the cluster, and that you (or your cluster admin) may need to install additional software on the compute nodes.

[bpappas@login.morgan ~]$ bpeek -f 2574378
<< output from stdout >>
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found

The output above shows that a command is repeatedly failing to execute during the job. bpeek is not intended to replace an error log; rather, it is a way to see execution errors in the job while it is still running.
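When the same message repeats many times, it can help to collapse the output into counts. Here is a minimal sketch using the standard sort and uniq utilities. The function name summarize_peek is a hypothetical helper, not part of LSF, and the sample text is a shortened copy of the bpeek output shown earlier so that the block runs without a cluster.

```shell
#!/bin/sh
# Collapse repeated lines from stdin into "count line" pairs, most frequent first.
# summarize_peek is a hypothetical helper name, not an LSF command.
summarize_peek() {
    grep -v '^$' | sort | uniq -c | sort -rn
}

# Sample input: a shortened copy of the bpeek output from the example.
sample='<< output from stdout >>
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found
./date_into_file.sh: line 6: 1: command not found'

summary=$(printf '%s\n' "$sample" | summarize_peek)
printf '%s\n' "$summary"

# On an LSF host (not run here), without -f so the command returns:
#   bpeek 2574378 2>&1 | summarize_peek
```

A single line with a high count makes it easy to see at a glance that one command is failing on every iteration.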