COMPSCI 4073 DATA FUNDAMENTALS (H)

(a) One of the features will measure the size of cells. The current software identifies marker points on a cell. A function called marker_pts(size, pts) takes a hypothetical size and a set of observed points and returns how likely the cell is to be that size given those observed points.

Three members of the development team suggest different strategies for estimating the size of a cell.

(i) What types of statistical estimation are Raul, Hugh and Cara suggesting? Give one pro and one con for each general approach. [6]

(ii) One of these models is to be implemented with a probabilistic programming language.

Which of the directed graphs below is a viable model for a probabilistic program estimating the unknown size given some observed points and latent variables spread and epsilon that determine size? Explain your choice.

(b) After processing, cell sizes are binned into five discrete categories. Each slide $i = 1, \dots, N$ from a sample type $j = 1, \dots, N_j$ is processed by the microscope to give each cell $k = 1, \dots, N_k$ on that slide a size category $s_{ijk}$.

(i) Explain how to numerically determine from this data which sample type has the most diverse size categories on average. Do not write code; explain your answer in words and use equations as appropriate. [5]
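One way to make "most diverse" concrete is sketched below. It assumes Shannon entropy is used as the diversity measure and treats $N$ as the number of slides of a sample type; the symbols $p_{ij}(c)$, $H_{ij}$ and $\bar{H}_j$ are introduced here for illustration and do not appear in the question.

```latex
% Empirical PMF of size category c on slide i of sample type j
p_{ij}(c) = \frac{1}{N_k} \sum_{k=1}^{N_k} \mathbb{1}\left[ s_{ijk} = c \right],
\qquad c = 1, \dots, 5

% Shannon entropy of that slide (taking 0 \log 0 = 0)
H_{ij} = -\sum_{c=1}^{5} p_{ij}(c) \log_2 p_{ij}(c)

% Average over the slides of sample type j, then pick the largest
\bar{H}_j = \frac{1}{N} \sum_{i=1}^{N} H_{ij},
\qquad j^{*} = \arg\max_j \bar{H}_j
```

Under this measure, entropy is 0 bits when every cell on a slide falls in one category and reaches $\log_2 5 \approx 2.32$ bits when the five categories are equally common.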

(ii) The team have created a plot showing the cell size counts, but are unhappy with the result. Suggest a problem with the visualisation approach used, and briefly outline a visualisation approach that would better reveal relevant details.

(c) You are building a tracker that tracks the 2D position of a cell under a microscope. This tracker maintains many possible guesses about the location of the cell.

(i) One simple pre-processing step is to divide the camera image by the frame before, to compute a relative change in brightness. Give two circumstances where an IEEE754 exception might be raised during this operation. [2]
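The kinds of floating-point exception involved can be probed with a small sketch like the one below. It assumes frames are float64 arrays; division_flags is a hypothetical helper name, and np.errstate is used to turn NumPy's floating-point warnings into Python exceptions so that each one can be detected in turn.

```python
import numpy as np

def division_flags(frame, previous):
    """Return which floating-point exceptions frame / previous triggers.

    Hypothetical helper for illustration; checks divide-by-zero,
    invalid operation and overflow one at a time.
    """
    frame = np.asarray(frame, dtype=np.float64)
    previous = np.asarray(previous, dtype=np.float64)
    flags = []
    for name in ("divide", "invalid", "over"):
        try:
            # Ignore everything except the one exception being probed.
            with np.errstate(all="ignore", **{name: "raise"}):
                frame / previous
        except FloatingPointError:
            flags.append(name)
    return flags

# A non-zero pixel divided by a zero (black) pixel: divide-by-zero.
print(division_flags([1.0], [0.0]))
# A zero pixel divided by a zero pixel (0/0): invalid operation.
print(division_flags([0.0], [0.0]))
```

A very bright pixel divided by a very small but non-zero value can also overflow, which the same helper reports as "over".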

(ii) The initial version of the tracker tracks 100 cells, and each cell is tracked using 35 guesses. Each guess is a 2D point. The last 60 frames of this data are stored in a NumPy array. Suggest a suitable shape for the array containing the guesses. [2]

(iii) A computer vision engineer writes a function lik_cell(samples, img) that takes an (N, 2) array of guesses and an image and returns an (N,) array of log-likelihoods for each guess given that image.

Write a vectorised NumPy function (i.e. without any explicit loops) expected_locations(guess_array, img) that takes an array with the shape you gave above, and an image. This function should select the last frame of guesses; normalise the likelihoods for each cell to form a probability mass function per cell; and return the expected location of each cell in the last frame using this PMF, as a (100, 2) array. [8]
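A minimal sketch of one possible approach follows. It assumes the guess array has shape (60, 100, 35, 2), ordered as (frames, cells, guesses, xy) with the most recent frame last; other layouts would be equally valid. The lik_cell defined here is only a hypothetical stand-in so that the example runs, not the engineer's function.

```python
import numpy as np

def lik_cell(samples, img):
    # Hypothetical stand-in for the engineer's function: a Gaussian
    # log-likelihood centred on the middle of the image.
    centre = np.array(img.shape[:2], dtype=float) / 2.0
    return -0.5 * np.sum((samples - centre) ** 2, axis=1)

def expected_locations(guess_array, img):
    # Assumed layout: (frames, cells, guesses, 2); last frame is index -1.
    last = guess_array[-1]                                # (cells, guesses, 2)
    n_cells, n_guesses, _ = last.shape
    # lik_cell expects an (N, 2) array, so flatten cells and guesses together.
    log_lik = lik_cell(last.reshape(-1, 2), img).reshape(n_cells, n_guesses)
    # Subtract the per-cell maximum before exponentiating to avoid underflow.
    log_lik = log_lik - log_lik.max(axis=1, keepdims=True)
    pmf = np.exp(log_lik)
    pmf = pmf / pmf.sum(axis=1, keepdims=True)            # each row sums to 1
    # Expected location: PMF-weighted sum of the guesses for each cell.
    return np.sum(pmf[:, :, None] * last, axis=1)         # (cells, 2)
```

Subtracting the maximum log-likelihood changes nothing after normalisation, but it keeps np.exp from underflowing to zero for every guess of a cell.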

Answers to the following questions may involve more than one of these transformations.

(i) In terms of the Layered Grammar of Graphics, is this a faceted or layered plot? State your reasoning briefly. [2]

(ii) Which of these transforms could plausibly be linear? State your reasoning briefly. [2]

(iii) Which of the linear transforms would be represented with a matrix whose eigendecomposition has significantly unequal eigenvalues? State your reasoning briefly. [2]

(iv) Which of the linear transforms would you expect to have all singular values close to 1? State your reasoning briefly. [2]

(v) Which of the linear transforms would you expect to have a determinant close to or equal to zero? State your reasoning briefly, and explain what relevance a determinant of zero would have in operations applied to this transform. [3]

(b) These transformations represent different optical transformations made by a microscope.

To perform analysis of microscope slides, the effects of these optical transforms need to be removed from positions identified on the image itself.

(i) How would you propose doing this “transform removal” for those transforms which you assess to be linear, given a matrix of input points X and a known transform matrix A? State an equation in your answer. [2]
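As a rough illustration, the sketch below assumes that X stores one point per row and that each observed point y was produced from a true point x as y = A x. Under that convention the removal is X A^{-T}, computed here by solving a linear system rather than forming the inverse explicitly. The name remove_transform is hypothetical.

```python
import numpy as np

def remove_transform(X, A):
    """Undo a known linear transform A applied to points.

    Assumes X is (N, d) with one observed point per row and that each
    row y was produced as y = A @ x. Solving A x = y for all points at
    once avoids forming A^{-1} explicitly.
    """
    return np.linalg.solve(A, X.T).T

# Usage: apply a transform to known points, then remove it again.
A = np.array([[2.0, 1.0], [0.0, 1.0]])
original = np.array([[1.0, 2.0], [3.0, 4.0], [-1.0, 0.5]])
observed = original @ A.T
print(remove_transform(observed, A))
```

This only works when A is invertible; np.linalg.solve raises LinAlgError for an exactly singular matrix.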

(ii) The team implementing the microscope have a built-in SVD function in the microscope firmware. Given a matrix A representing an optical transform, explain how this routine could be used to: (a) concretely perform the removal you described above; (b) assess the numerical stability of this removal; and (c) determine if there will be any rotation or skewing of the points when removing the transform. [6]
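The three uses of the SVD can be sketched as follows, using np.linalg.svd as a stand-in for the firmware routine and assuming it returns the same factors U, the singular values and V^T. The function name svd_inverse_report and its return values are illustrative choices, not part of the question.

```python
import numpy as np

def svd_inverse_report(A):
    """Invert A via its SVD and report stability and rotation content.

    Stand-in for the firmware SVD routine, which is assumed to return
    U, the singular values s (largest first) and V^T with A = U diag(s) V^T.
    """
    U, s, Vt = np.linalg.svd(A)
    # (a) Inverse from the factors: A^{-1} = V diag(1/s) U^T.
    A_inv = Vt.T @ np.diag(1.0 / s) @ U.T
    # (b) Condition number: ratio of largest to smallest singular value.
    #     Large values mean small errors in the points are amplified.
    cond = s[0] / s[-1]
    # (c) Orthogonal part of A. If it differs from the identity, A (and
    #     so its inverse) rotates or reflects the points; unequal
    #     singular values indicate non-uniform stretching.
    rotation = U @ Vt
    return A_inv, cond, rotation

# Usage: a 30 degree rotation combined with a uniform scale of 2.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A_inv, cond, rotation = svd_inverse_report(2.0 * R)
print(cond)
print(rotation)
```

If any singular value is zero or close to it, the division 1.0 / s blows up; that is the same situation a condition number check is meant to flag.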

(c) The team propose visualising the quality of the transform removal. They estimate this by imaging a special calibration card with 32 known positions under the microscope, reversing the expected transform, and comparing the resulting positions with the known positions on the calibration card.

(i) Given an input (32, 2) tensor of calibration points pts and an observed set (32, 2) of corrected points observed, write a NumPy function distances to compute the square of the L1 distance of each observation from the original calibration position and return all 32 distances. [3]
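A short sketch of such a function is given below. It reads "square of the L1 distance" literally, as the sum of absolute coordinate differences, squared; the argument order is an assumption.

```python
import numpy as np

def distances(pts, observed):
    # L1 distance per point: sum of absolute coordinate differences.
    l1 = np.sum(np.abs(observed - pts), axis=1)
    # Square it, giving one value per calibration position.
    return l1 ** 2

# Usage: every observed point is offset by (1, 1) from its calibration point.
pts = np.zeros((32, 2))
observed = np.ones((32, 2))
print(distances(pts, observed).shape)
```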

(ii) The team have computed these distances for four different microscopes, each with its own transform. On each microscope, the error has been measured at 12 different focal levels. The team wish to visualise these results, and have produced the plot below showing the mean error. Criticise this plot, and indicate how you would improve it.

(iii) Discuss whether a line plot or step/staircase plot is more appropriate for the mean curve plotted above. [3]

(a) In this analyser, one specific cell is tracked and the overall displacement of the cell is tracked for 10 seconds after agitation stops. The displacement is measured at 60 Hz, the frame rate of the microscope camera.

(i) The cells in a test to be run are known to have an important oscillation at 13Hz. Will these oscillations be detectable with the apparatus available? State your reasoning and any assumptions made. [3]

(ii) The analysis team have 400 seconds of analysis for cells from 80 slides from 32 sample types as a [24000, 80, 32] row-major float64 array. The analysis team want to rearrange this to an array of 10 second contiguous windows for each of the 32 sample types, combining together all slides for that type. Describe the shape and strides of the array before and after this transformation, and explain the steps that would be applied in NumPy to perform this transformation. [8]
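One reading of the target layout is (32, 3200, 600): for each sample type, 80 slides times 40 windows of 600 samples each. Under that assumption, the original float64 array has strides (20480, 256, 8) bytes and the result has strides (15360000, 4800, 8). The sketch below shows the reshape, transpose and copy steps on arrays of any compatible size; to_windows is a hypothetical name.

```python
import numpy as np

def to_windows(data, window=600):
    """Rearrange (samples, slides, types) into (types, slides*windows, window).

    Assumes a C-contiguous input and that whole windows are wanted;
    any trailing partial window is dropped.
    """
    n_samples, n_slides, n_types = data.shape
    n_windows = n_samples // window
    # Split the time axis into (windows, window); this is a view.
    x = data[: n_windows * window].reshape(n_windows, window, n_slides, n_types)
    # Reorder axes to (types, slides, windows, window); still a view,
    # only the strides are permuted.
    x = x.transpose(3, 2, 0, 1)
    # Copy into row-major order so each window is contiguous in memory,
    # then merge the slide and window axes.
    return np.ascontiguousarray(x).reshape(n_types, n_slides * n_windows, window)

# Usage on a small array with the same layout (12 samples, 3 slides, 2 types).
small = np.arange(12 * 3 * 2, dtype=np.float64).reshape(12, 3, 2)
out = to_windows(small, window=4)
print(small.strides, out.shape, out.strides)
```

The copy is unavoidable here: after the transpose the time axis is no longer the fastest varying in memory, so contiguous windows cannot be expressed as a view of the original buffer.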

(iii) For other samples, the important oscillation frequencies are unknown. Describe a procedure to extract the amplitude of different frequencies present in this 60Hz sampled signal and how the result would be interpreted. [3]
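One standard procedure is a discrete Fourier transform of the real signal. The sketch below assumes the 60 Hz sampling rate from the question; amplitude_spectrum is a hypothetical name, and the scaling is chosen so that a sinusoid falling exactly on a frequency bin reports its own amplitude.

```python
import numpy as np

def amplitude_spectrum(signal, fs=60.0):
    """Return (frequencies in Hz, amplitudes) for a real-valued signal."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # 0 Hz up to fs/2 (Nyquist)
    # Scale so a sinusoid of amplitude A on an exact bin reports A.
    amps = 2.0 * np.abs(spectrum) / n
    amps[0] /= 2.0                           # DC appears only once
    if n % 2 == 0:
        amps[-1] /= 2.0                      # so does the Nyquist bin
    return freqs, amps

# Usage: 10 s of a 13 Hz oscillation with amplitude 3, sampled at 60 Hz.
t = np.arange(600) / 60.0
freqs, amps = amplitude_spectrum(3.0 * np.sin(2 * np.pi * 13.0 * t))
print(freqs[np.argmax(amps)], amps.max())
```

With 10 seconds of data the bins are 0.1 Hz apart and run up to 30 Hz, so peaks in the amplitude array can be read off directly as the dominant oscillation frequencies below the Nyquist limit.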

(iv) Given the outputs of this analysis, describe how to generate a pure oscillation at the frequency determined to have maximum amplitude, and describe in high level terms how you would set all relevant parameters of this oscillation from the analysis output. [4]
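As an illustration, the sketch below assumes the analysis is a real FFT and reads the three parameters of a cosine (frequency, amplitude and phase) from the largest non-DC bin. The name dominant_oscillation is hypothetical, and the amplitude scaling assumes the peak is not at the Nyquist bin.

```python
import numpy as np

def dominant_oscillation(signal, fs=60.0):
    """Return frequency, amplitude, phase and a synthesised pure cosine."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Largest-magnitude bin, skipping the DC component at index 0.
    k = 1 + np.argmax(np.abs(spectrum[1:]))
    frequency = freqs[k]                         # from the bin index
    amplitude = 2.0 * np.abs(spectrum[k]) / n    # from the bin magnitude
    phase = np.angle(spectrum[k])                # from the bin's complex angle
    t = np.arange(n) / fs
    wave = amplitude * np.cos(2 * np.pi * frequency * t + phase)
    return frequency, amplitude, phase, wave

# Usage: a 13 Hz component dominates a weaker 5 Hz component.
t = np.arange(600) / 60.0
x = 2.0 * np.cos(2 * np.pi * 13.0 * t + 0.5) + 0.3 * np.cos(2 * np.pi * 5.0 * t)
frequency, amplitude, phase, wave = dominant_oscillation(x)
print(frequency, amplitude, phase)
```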

(b) An analysis procedure has been devised that produces an array of amplitudes at evenly spaced frequencies from an input time series, as shown below. The team want to be able to identify the most significant “peak” and identify its centre frequency, amplitude and width.

To do this, it is proposed to find the maximum value in the amplitudes, fit a quadratic curve to the values close to that maximum, and extract the relevant parameters.

(i) Write a NumPy function slice_peak(xs, w) that finds the index of the largest element of a one-dimensional array xs, and slices out a region w wide on each side of that index (i.e. a slice of length 2*w) and returns it. Assume that this region will never exceed the bounds of the array. [2]
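A minimal sketch follows. It takes the stated slice length of 2*w literally, so the slice runs from w elements before the maximum up to, but not including, the element w after it.

```python
import numpy as np

def slice_peak(xs, w):
    # Index of the largest element (first occurrence if there are ties).
    i = np.argmax(xs)
    # w elements before the peak, the peak itself and w - 1 after it.
    return xs[i - w : i + w]

# Usage: the maximum is at index 3.
xs = np.array([0.0, 1.0, 2.0, 5.0, 2.0, 1.0, 0.0])
print(slice_peak(xs, 2))
```

A slice that is symmetric about the peak would instead need length 2*w + 1, i.e. xs[i - w : i + w + 1].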

(ii) A quadratic peak has the formula

$$y = \max\left(a_2 (x - x_0)^2 + a_1 (x - x_0) + a_0,\ 0\right),$$

where $x_0, a_0, a_1, a_2$ are real-valued parameters, $x_0$ being the centre of the peak.

Describe how an objective function could be derived to determine how well a parameter vector matched a slice of data as returned by slice_peak and suggest a suitable optimisation algorithm that could be applied and any hyperparameters that might need to be tuned and what they would do in your chosen algorithm. Do not write code; explain in words. [6]
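One common choice of objective, given here only as an illustration, is the sum of squared differences between the sliced data and the model evaluated at the same positions. The symbols $\theta$, $x_i$, $y_i$ and $n$ are introduced for this sketch: $(x_i, y_i)$ are the $n$ frequency and amplitude pairs in the slice.

```latex
\theta = (x_0, a_0, a_1, a_2), \qquad
L(\theta) = \sum_{i=1}^{n}
  \left( y_i - \max\left( a_2 (x_i - x_0)^2 + a_1 (x_i - x_0) + a_0,\ 0 \right) \right)^2
```

Smaller values of $L(\theta)$ mean a closer match, and $L(\theta) = 0$ only when the quadratic peak passes through every point of the slice.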

(iii) Fitting a quadratic by optimising this objective function often gives unexpected results, with extreme values for $a_2$ and $a_0$. Describe a way of constraining the optimisation to prefer small values of $a_2$ and $a_0$ that will work with any optimisation algorithm, and suggest how this preference for small values could be adjusted. [4]
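One approach that does not depend on the optimiser is to add a penalty term to the objective. The sketch below assumes a sum-of-squares data term and a squared (L2-style) penalty on a0 and a2; the weight lam and the function names are hypothetical, and the parameter order (x0, a0, a1, a2) follows the list in the question.

```python
import numpy as np

def quadratic_peak(x, theta):
    # theta is ordered (x0, a0, a1, a2).
    x0, a0, a1, a2 = theta
    d = x - x0
    return np.maximum(a2 * d**2 + a1 * d + a0, 0.0)

def penalised_loss(theta, x, y, lam=0.1):
    """Sum of squared errors plus a penalty on the size of a0 and a2.

    lam is an assumed weighting: 0 gives the plain objective, and
    larger values push the optimiser harder towards small a0 and a2.
    """
    residual = y - quadratic_peak(x, theta)
    data_term = np.sum(residual**2)
    _, a0, _, a2 = theta
    penalty = lam * (a0**2 + a2**2)
    return data_term + penalty

# Usage: data generated from known parameters, so the data term is zero.
x = np.linspace(-1.0, 1.0, 9)
theta = (0.0, 2.0, 0.0, -1.0)
y = quadratic_peak(x, theta)
print(penalised_loss(theta, x, y, lam=0.0), penalised_loss(theta, x, y, lam=0.5))
```

Because the penalty is part of the returned scalar, the same function can be handed to any optimiser that minimises a black-box objective.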