broken track simulation

class module0_flow.misc.broken_track_sim.BrokenTrackSim(**params)

Bases: h5flow.core.H5FlowStage

Generates a realistic broken track distribution by randomly translating reconstructed track hits and removing hits that cross disabled sections of the anode plane.

The algorithm is:

select a random “source” track within the event passing a length selection cut

translate the random track in x,y such that the track is still contained

mask off hits that fall on disabled channels

re-run track reconstruction on new hit distribution

label new tracks as broken according to the overlap of the new track hits with the old source track and the distance of their endpoints
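
The translation and masking steps above can be illustrated with a minimal NumPy sketch. It is not the stage's implementation: the hit positions, the anode bounds, the rectangular disabled region, and the helper names translate_contained and mask_disabled are all hypothetical, and the real stage works from the Geometry and DisabledChannels resources instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hit positions of one source track [mm].
hits_xy = np.array([[0.0, 0.0], [10.0, 5.0], [20.0, 10.0], [30.0, 15.0]])

# Hypothetical square anode bounds and a disabled rectangular region [mm].
anode_min, anode_max = -50.0, 50.0
disabled_x = (15.0, 25.0)
disabled_y = (-50.0, 50.0)


def translate_contained(xy, lo, hi, rng):
    """Draw a random (dx, dy) that keeps every hit within [lo, hi]."""
    dx = rng.uniform(lo - xy[:, 0].min(), hi - xy[:, 0].max())
    dy = rng.uniform(lo - xy[:, 1].min(), hi - xy[:, 1].max())
    return xy + np.array([dx, dy]), dx, dy


def mask_disabled(xy, x_range, y_range):
    """Return a boolean mask that is True for hits outside the disabled region."""
    in_x = (xy[:, 0] >= x_range[0]) & (xy[:, 0] <= x_range[1])
    in_y = (xy[:, 1] >= y_range[0]) & (xy[:, 1] <= y_range[1])
    return ~(in_x & in_y)


new_xy, dx, dy = translate_contained(hits_xy, anode_min, anode_max, rng)
keep = mask_disabled(new_xy, disabled_x, disabled_y)
surviving_hits = new_xy[keep]  # these hits would be passed back to the track reconstruction
```

The surviving hits are what a re-run of the track reconstruction would see. A track that straddles the disabled region comes back as two or more shorter tracks.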

Parameters:

path: str, path to output datasets within HDF5 file
generate_2track_joint_pdf: bool, flag to generate an output .npz file that can be used by the track merging reconstruction
joint_pdf_filename: str, path of output .npz file (if generated)
pdf_bins: list of list, bin description for each parameter in output pdf, each formatted as (log10(min), log10(max), nbins)
rand_track_length_cut: float, track length cut for source track [mm]
broken_track_distance_cut: float, cut on the distance of the 2nd-closest new track endpoint from the closest source endpoint to label a track as broken
tracks_dset_name: str, path to input tracks dataset
hit_drift_dset_name: str, path to charge hit drift data
hits_dset_name: str, path to input charge hits dataset

All of tracks_dset_name, hits_dset_name, and hit_drift_dset_name are required in the cache.

Requires Geometry and DisabledChannels resources in workflow.
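
As an illustration of the pdf_bins format, a (log10(min), log10(max), nbins) entry can be expanded into logarithmically spaced bin edges. The helper name edges_from_spec is hypothetical, and producing nbins + 1 edges (rather than nbins points) is an assumption about how the stage interprets the entry.

```python
import numpy as np


def edges_from_spec(spec):
    """Expand (log10(min), log10(max), nbins) into nbins + 1 log-spaced bin edges."""
    log_min, log_max, nbins = spec
    return np.logspace(log_min, log_max, nbins + 1)


# The one fully specified entry of default_pdf_bins: 30 bins between 10^0 and 10^3.
edges = edges_from_spec((0, 3, 30))
```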

offset datatype (1:1 with event):

id          u4,     unique identifier
dx          f8,     x translation applied to event
dy          f8,     y translation applied to event
i_track     i8,     index of track within event used as source
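
For reference, the offset fields listed above map onto a NumPy structured dtype along these lines. This is a sketch built from the field list only; the name example_offset_dtype and the filled values are illustrative, and the class's own offset_dtype attribute remains the authoritative definition.

```python
import numpy as np

# Structured dtype mirroring the documented offset fields.
example_offset_dtype = np.dtype([
    ('id', 'u4'),       # unique identifier
    ('dx', 'f8'),       # x translation applied to event
    ('dy', 'f8'),       # y translation applied to event
    ('i_track', 'i8'),  # index of track within event used as source
])

# One row per event (illustrative values).
offsets = np.zeros(3, dtype=example_offset_dtype)
offsets['id'] = np.arange(3)
offsets['dx'] = [1.5, -2.0, 0.0]
```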

label datatype (1:1 with new track dataset):

id                              u4,     unique identifier
match                           u1,     1 if new track is matched to the source track
broken                          u1,     1 if new track is broken
neighbor                        i4,     index of neighboring track
hit_frac                        f4,     fraction of hits that came from source track
true_endpoint_d                 f4(2,), minimum distance of endpoints to source track endpoints
neighbor_deflection_angle       f4,     deflection angle of track and its neighbor
neighbor_transverse_sin2theta   f4,     transverse endpoint angle of track to its neighbor
neighbor_missing_length         f4,     missing length of track to its neighbor
neighbor_overlap                f4,     overlap of track and its neighbor
neighbor_sin2theta              f4,     angle of track and its neighbor

The new tracklets dataset datatype is the same as TrackletReconstruction.tracklet_dtype.

apply_translation(hits, rand_x, rand_y)

class_version = 3.1.0

default_pdf_bins = [(), (), (0, 3, 30), (), ()]

find_matching_tracks(new_tracks, rand_tracks, rand_x, rand_y, track_ids, hits_track_idx)

finish(source_name)

generate_random_translation(rand_tracks)

init(source_name)

missing_track_segments = 200

new_track_dtype

new_track_label_dtype

offset_dtype

run(source_name, source_slice, cache)

select_random_track(tracks)

setup_reco()

truth_hit_frac_cut = 0.8

class module0_flow.misc.broken_track_sim.JointPDF(*bins)

Bases: object

fill(*val)

class module0_flow.misc.broken_track_sim.TrackletMerger(**params)

Bases: h5flow.core.H5FlowStage

Merges existing tracks with neighbors based on a multi-dimensional likelihood ratio metric. The observables used in the likelihood estimation are:

sin^2(theta): angle between the two track segments

transverse distance: maximum transverse displacement of track from the axis of the first track [mm]

missing length: length of the line segment between the two closest endpoints that crosses active pixels [mm]

overlap: quadrature sum of 1D overlap of tracks in x, y, and z [mm]

delta-dQ/dx: difference in raw dQ/dx [mV]
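
As an example of the first observable, sin^2(theta) between two straight segments follows from the normalized dot product of their direction vectors. This is a sketch of the geometry only; the function name and its endpoint arguments are hypothetical and do not reproduce the signature of calc_2track_sin2theta.

```python
import numpy as np


def sin2theta_sketch(start0, end0, start1, end1):
    """sin^2 of the angle between two segments given by their 3D endpoints."""
    u = np.asarray(end0, dtype=float) - np.asarray(start0, dtype=float)
    v = np.asarray(end1, dtype=float) - np.asarray(start1, dtype=float)
    # cos^2(theta) = (u.v)^2 / (|u|^2 |v|^2); assumes non-zero segment lengths
    cos2 = np.sum(u * v, axis=-1) ** 2 / (np.sum(u * u, axis=-1) * np.sum(v * v, axis=-1))
    return 1.0 - cos2


parallel = sin2theta_sketch([0, 0, 0], [1, 0, 0], [0, 1, 0], [2, 1, 0])
perpendicular = sin2theta_sketch([0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 1, 0])
diagonal = sin2theta_sketch([0, 0, 0], [1, 0, 0], [0, 0, 0], [1, 1, 0])
```

Because the result depends on the square of the dot product, it is the same whichever way either segment is oriented.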

Requires an input histogram .npz file consisting of 4 arrays:

'{sig}': an array of shape: (N0, N1, ... N4) representing the number of signal events in each bin of the 5 observables

'{sig}_bins': an array of 5 arrays each with shape: Ni+1 representing the bin edges

'{bkg}': an array of shape: (N0, N1, ... N4) representing the number of background events in each bin of the 5 observables

'{bkg}_bins': an array of 5 arrays each with shape: Ni+1 representing the bin edges

The selection is performed by normalizing the input histograms to a PDF, calculating the signal/background likelihood ratio, and rescaling to a normalized metric between 0 and 1. The p-value (or inefficiency) of this metric is calculated based on the signal histogram. The track merging selection cut is applied on this p-value, e.g. a pvalue_cut = 0.05 will result in a 95% selection efficiency for merging neighboring tracks (at least for the sample used to generate the input histograms).
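
The selection described above can be sketched on a toy 1-D histogram. The counts are invented. Setting R = 0 where the background is empty and defining the p-value as the cumulative signal fraction at or below a bin's metric value are both assumptions made for illustration, not confirmed details of the stage.

```python
import numpy as np

# Hypothetical 1-D toy histograms; the real inputs are 5-D.
sig = np.array([0.0, 10.0, 30.0, 60.0])
bkg = np.array([50.0, 30.0, 15.0, 5.0])

# Normalize the counts to PDFs.
sig_pdf = sig / sig.sum()
bkg_pdf = bkg / bkg.sum()

# Likelihood ratio R per bin; bins without background entries get R = 0 (assumption).
R = np.divide(sig_pdf, bkg_pdf, out=np.zeros_like(sig_pdf), where=bkg_pdf > 0)

# Rescale to a metric between 0 and 1.
r = 1.0 - np.exp(-R)

# p-value of each bin: fraction of signal with a metric value at or below
# that of the bin (assumed definition; ties between bins are not handled).
order = np.argsort(r)
p_value = np.empty_like(r)
p_value[order] = np.cumsum(sig_pdf[order])

# Keep bins whose p-value exceeds the cut and measure the signal efficiency.
pvalue_cut = 0.05
selected = p_value > pvalue_cut
efficiency = sig_pdf[selected].sum()
```

With coarse bins the efficiency can only change in steps, so it is at least 1 - pvalue_cut rather than exactly equal to it.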

Parameters:

pdf_filename: str, path to .npz file containing multi-dimensional pdf (more details above)
pdf_sig_name: str, name of array in .npz file containing the “signal” histogram
pdf_bkg_name: str, name of array in .npz file containing the “background” histogram
pvalue_cut: float, p-value/inefficiency used as cut for likelihood ratio
max_neighbors: int, number of neighbor tracks for which to attempt the merge procedure
track_charge_dset_name: str, path to input charge dataset (1:1 with track hits, requires 'q' field)
hit_drift_dset_name: str, path to charge hit drift data
hits_dset_name: str, path to input charge hits dataset
track_hits_dset_name: str, path to input track-referred charge hits dataset
tracks_dset_name: str, path to input track dataset
merged_dset_name: str, path to output track dataset

All of hits_dset_name, hit_drift_dset_name, track_hits_dset_name, and tracks_dset_name are required in the cache.

Requires both Geometry and DisabledChannels resources in workflow.

merged datatype is the same as the TrackletReconstruction.tracklet_dtype.

Example config:

track_merge:
    classname: TrackletMerger
    requires:
     - 'combined/tracklets'
     - name: 'combined/track_hits'
       path: ['combined/tracklets', 'charge/hits']
     - name: 'combined/track_hit_drift'
       path: ['combined/tracklets', 'charge/hits', 'combined/hit_drift']
    params:
        merged_dset_name: 'combined/tracklets/merged'
        hit_drift_dset_name: 'combined/hit_drift'
        hits_dset_name: 'charge/hits'
        track_charge_dset_name: 'charge/hits'
        tracks_dset_name: 'combined/tracklets'
        pdf_filename: 'joint_pdf.npz'
        pvalue_cut: 0.10
        max_neighbors: 5

static calc_2track_deflection_angle(tracks, neighbor)

static calc_2track_missing_length(tracks, neighbor, missing_track_segments, pixel_x, pixel_y, disabled_channel_lut, cathode_region, pixel_pitch=None)

static calc_2track_overlap(tracks, neighbor)

static calc_2track_sin2theta(tracks, neighbor)

static calc_2track_transverse_sin2theta(tracks, neighbor)

cathode_region = 15

class_version = 3.1.0

static closest_trajectories(tracks0, tracks1)

Parameters:

tracks0 – track dtype of shape: (..., M,)
tracks1 – track dtype of shape: (..., M,)

Returns:

start and end points of closest trajectory segments and points of closest approach, shape: (..., M, 3)

static create_groups(mask)

Combine masks of an n x n adjacency matrix such that the mask of row i is equal to the OR of the rows that can be reached from i and the rows that can reach i. E.g.:

arr = [[1,0,1],
       [0,1,0],
       [0,0,1]]
new_arr = create_groups(arr)
new_arr # [[1,0,1],
        #  [0,1,0],
        #  [1,0,1]]

and:

arr = [[0,1,0],
       [0,0,1],
       [1,1,0]]
new_arr = create_groups(arr)
new_arr # [[1,1,1],
        #  [1,1,1],
        #  [0,1,1]]

Parameters: mask – adjacency matrix (shape: (..., n, n))
Returns: updated adjacency matrix (shape: (..., n, n))

default_hit_drift_dset_name = combined/track_hit_drift

default_hits_dset_name = charge/hits

default_max_neighbors = 5

default_merged_dset_name = combined/tracklets/merged

default_pdf_bkg_name = origin

default_pdf_filename = joint_pdf-2_0_1.npz

default_pdf_sig_name = rereco

default_pvalue_cut = 0.1

default_track_charge_dset_name = charge/hits

default_track_hits_dset_name = combined/track_hits

default_tracks_dset_name = combined/tracklets

static find_k_neighbor(tracks, mask=None, k=1)

Find k-th neighbor based on endpoint distance and require no overlap:

tracks is an (N,M) array of tracks

mask is boolean of same shape as tracks

mask true indicates a valid track to search for neighbors

init(source_name)

static load_r_values(filename, sig_key, bkg_key)

Load the N-D pdf histogram from an .npz file. Loads and normalizes the histograms stored under {sig_key} and {bkg_key} with bins stored under {key}_bins to create a PDF. The likelihood ratio (R) is then calculated and converted to a normalized value between 0-1 (r) with the following transformation:

r = 1 - e^(-R)

Bins with 0 entries are assigned an R-value of 0.

Parameters:

filename – path to .npz file with arrays
sig_key – name of “signal” histogram in .npz file
bkg_key – name of “background” histogram in .npz file

Returns:

tuple of r histogram (shape: (N0, N1, ...)), r bins in each dimension (shape: (D, Ni)), an array of possible r values (shape: (1001,)), and corresponding p-values (shape: (1001,))

static make_missing_segment(start1, end1, start2, end2)

merged_dtype

missing_track_segments = 150

static poca(start_xyz0, end_xyz0, start_xyz1, end_xyz1)

Finds the scale factor of the point of closest approach on each of two lines, each defined by two 3D points. The scale factor is a number between 0 and 1 representing the position along the line. To extract the 3D point of closest approach on each line:

s0, s1 = poca(start0, end0, start1, end1) # shape: (N, 1)
poca0 = (1 - s0) * start0 + s0 * end0 # shape: (N, 3)
poca1 = (1 - s1) * start1 + s1 * end1

Parameters: {start,end}_xyz(i) – start/end point of line i, shape: (..., N, 3)
Returns: tuple of scale factors for line segment 0 and 1, shape: (..., N, 1)
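
For intuition, the scale factors can be obtained from the standard closest-approach solution for two lines. The sketch below is one possible formulation, not the module's code: the name poca_sketch is hypothetical, and both the clipping to [0, 1] and the fallback for parallel lines are assumptions.

```python
import numpy as np


def poca_sketch(start0, end0, start1, end1):
    """Scale factors of the points of closest approach, each of shape (..., N, 1)."""
    u = end0 - start0   # direction of line 0
    v = end1 - start1   # direction of line 1
    w = start0 - start1
    a = np.sum(u * u, axis=-1, keepdims=True)
    b = np.sum(u * v, axis=-1, keepdims=True)
    c = np.sum(v * v, axis=-1, keepdims=True)
    d = np.sum(u * w, axis=-1, keepdims=True)
    e = np.sum(v * w, axis=-1, keepdims=True)
    denom = a * c - b * b
    ok = denom > 0
    safe = np.where(ok, denom, 1.0)
    # Fall back to 0 for parallel or degenerate lines; not a true closest point.
    s0 = np.where(ok, (b * e - c * d) / safe, 0.0)
    s1 = np.where(ok, (a * e - b * d) / safe, 0.0)
    # Restrict to the segment between the two defining points (assumption).
    return np.clip(s0, 0.0, 1.0), np.clip(s1, 0.0, 1.0)


start0 = np.array([[0.0, 0.0, 0.0]])
end0 = np.array([[1.0, 0.0, 0.0]])
start1 = np.array([[0.5, -1.0, 1.0]])
end1 = np.array([[0.5, 1.0, 1.0]])

s0, s1 = poca_sketch(start0, end0, start1, end1)
poca0 = (1 - s0) * start0 + s0 * end0
poca1 = (1 - s1) * start1 + s1 * end1
```

Clipping the unconstrained solution is a simplification: when one factor is clipped, the other is not re-optimized, so the result may differ slightly from the true closest points of the two finite segments.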

run(source_name, source_slice, cache)

static score_neighbor(r, r_bins, statistic_bins, p_bins, *params)

Calculates a p-value based on a binned, multi-dimensional PDF

Parameters:

r – likelihood ratio, shape: (N,)*D
r_bins – bin edge for each parameter, shape: (D, N+1)
statistic_bins – bins for statistic, range 0-1, shape: (n,)
p_bins – bins for p value range 0-1, shape: (n,)
*params – array of parameters to use to calculate p-value, requires D parameters in the same sequence as listed in the bins, each with the same shape

Returns:

array of same shape as the params arrays with a p-value between 0-1
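
A lookup of this kind can be sketched as a bin search followed by an interpolation. The sketch below is an assumption about the general approach, not the module's implementation: the name score_sketch is hypothetical, out-of-range parameters are clamped to the edge bins, and the toy p_bins simply equal the statistic values.

```python
import numpy as np


def score_sketch(r, r_bins, statistic_bins, p_bins, *params):
    """Look up the r value of each parameter set and map it to a p-value."""
    # Bin index along each dimension, clamped to the valid range (assumption).
    idx = tuple(
        np.clip(np.digitize(p, edges) - 1, 0, len(edges) - 2)
        for p, edges in zip(params, r_bins)
    )
    r_val = r[idx]
    # Interpolate the p-value at the looked-up statistic value.
    return np.interp(r_val, statistic_bins, p_bins)


# Toy 2-D example with two bins per dimension.
r = np.array([[0.0, 0.5], [0.8, 1.0]])
r_bins = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 2.0])]
statistic_bins = np.linspace(0, 1, 11)
p_bins = statistic_bins.copy()

x = np.array([0.5, 1.5])
y = np.array([1.5, 0.5])
p = score_sketch(r, r_bins, statistic_bins, p_bins, x, y)
```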