# Fundamentals of Computer Vision Project Assignment 2

**1 Motivation**

The goal of this project is to implement forward (3D point to 2D point) and inverse (2D point to 3D ray) camera projection, and to perform triangulation from two cameras to do 3D reconstruction from pairs of matching 2D image points. This project will involve understanding relationships between 2D image coordinates and 3D world coordinates and the chain of transformations that make up the pinhole camera model that was discussed in class. Your specific tasks will be to project 3D coordinates (sets of 3D joint locations on a human body, measured by motion capture equipment) into image pixel coordinates that you can overlay on top of an image, to then convert those 2D points back into 3D viewing rays, and then triangulate the viewing rays of two camera views to recover the original 3D coordinates you started with (or values close to those coordinates).
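
The chain of transformations described above can be sketched directly. Below is a minimal, hypothetical illustration of the pinhole model in MATLAB; the K, R, and t values are made-up placeholders, not the calibration parameters you will be given:

```matlab
% Minimal pinhole-model sketch: forward (3D -> 2D) and inverse (2D -> 3D ray).
% K, R, t below are hypothetical placeholders, NOT the assignment's calibration.
K = [1000 0 320; 0 1000 240; 0 0 1];    % example intrinsic matrix
R = eye(3);                             % example rotation (world -> camera)
t = [0; 0; 5000];                       % example translation

Xw = [100; 200; 300];                   % a 3D point in world coordinates

% Forward projection: world -> camera -> homogeneous pixel coordinates
Xc = R*Xw + t;                          % camera coordinates
xh = K*Xc;
px = xh / xh(3);                        % normalize; px(1:2) is the pixel (u,v)

% Inverse projection: pixel -> viewing ray in world coordinates
ray_cam   = K \ [px(1); px(2); 1];      % back-project through the intrinsics
ray_world = R' * ray_cam;               % rotate the ray into the world frame
ray_world = ray_world / norm(ray_world);
cam_center = -R' * t;                   % camera center in world coordinates
% Every point cam_center + lambda*ray_world (lambda > 0) projects to px.
```

The original point Xw lies on the recovered ray; that is exactly the property the triangulation step exploits later.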

You will be provided:

• 3D point data for each of 12 body joints for a set of motion capture frames recorded of a subject performing a Taiji exercise. The 12 joints represent the shoulders, elbows, wrists, hips, knees, and ankles. Each joint will be provided for a time series that is ~30,000 frames long, representing a 5-minute performance recorded at 100 frames per second in a 3D motion capture lab.

• Camera calibration parameters (intrinsic and extrinsic) for two video cameras that were also recording the performance. Each set of camera parameters contains all information needed to project 3D joint data into pixel coordinates in one of the two camera views.

• An mp4 movie file containing the video frames recorded by each of the two video cameras. The video was recorded at 50 frames per second.

While this project appears to be a simple task at first, you will discover that practical applications have hurdles to overcome. Specifically, each frame of data contains 12 joints, and there are ~30,000 frames to be projected into 2 separate camera coordinate systems. That is roughly 720,000 joint projections into camera views and 360,000 reconstructions back into world coordinates! Furthermore, you will need a very clear understanding of the pinhole camera model that we covered in class to be able to write functions that correctly project from 3D to 2D and back again.
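
Counts of this size are a good argument for vectorizing. As a hedged sketch (again with made-up K, R, t and random stand-in joint data), all 12 joints of a frame can be projected in a few matrix operations rather than a per-point loop:

```matlab
% Vectorized projection sketch: all 12 joints of one frame at once.
% K, R, t and the joint positions are hypothetical stand-ins.
K = [1000 0 320; 0 1000 240; 0 0 1];
R = eye(3);
t = [0; 0; 5000];

Xw = rand(3, 12) * 1000;          % 3x12 matrix: one column per joint
Xc = R*Xw + t;                    % implicit expansion adds t to every column
xh = K*Xc;
uv = xh(1:2,:) ./ xh(3,:);        % 2x12 pixel coordinates, one column per joint
```

Implicit expansion requires MATLAB R2016b or later; on older versions, bsxfun achieves the same effect.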

The specific project outcomes include:

• Experience in Matlab programming

• Understanding intrinsic and extrinsic camera parameters

• Projection of 3D data into 2D image coordinates

• Reconstruction of 3D locations by triangulation from two camera views

• Measurement of 3D reconstruction error

• Practical understanding of epipolar geometry

**2 The Basic Operations**

The following steps will be essential to the successful completion of the project:

1. Input and parsing of the mocap dataset. Read in and properly interpret the 3D joint data.

2. Input and parsing of camera parameters. Read in each set of camera parameters and interpret it with respect to our mathematical camera projection model.

3. Use the camera parameters to project 3D joints into pixel locations in each of the two image coordinate systems.

4. Reconstruct the 3D location of each joint in the world coordinate system from the projected 2D joints you produced in Step 3, using two-camera triangulation.

5. Compute the Euclidean (L²) distance between all joint pairs. This is a per-joint, per-frame L² distance between the original 3D joints and the reconstructed 3D joints, providing a quantitative analysis of the distance between the joint pairs.
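
Step 4's two-camera triangulation can be sketched with the midpoint method: find the closest points on the two viewing rays and average them. The camera centers and ray directions below are hypothetical examples, chosen so the rays intersect at a known point:

```matlab
% Midpoint-triangulation sketch for one joint seen by two cameras.
% c1, c2 are 3x1 camera centers; r1, r2 are 3x1 unit viewing-ray directions.
% All values here are hypothetical examples, not the assignment's data.
c1 = [0; 0; 0];       r1 = [0; 0; 1];
c2 = [1000; 0; 0];    r2 = [-1000; 0; 3000] / norm([-1000; 0; 3000]);

% Closest points: p1 = c1 + a*r1, p2 = c2 + b*r2, with a*r1 - b*r2 = c2 - c1
A  = [r1, -r2];                 % 3x2 least-squares system
ab = A \ (c2 - c1);             % solve for [a; b]
p1 = c1 + ab(1)*r1;
p2 = c2 + ab(2)*r2;
Xhat = (p1 + p2) / 2;           % midpoint estimate of the 3D joint location

% Step 5's per-joint, per-frame error is then a Euclidean distance:
Xtrue = [0; 0; 3000];           % ground-truth point the rays were built from
err = norm(Xhat - Xtrue);       % L2 reconstruction error (near zero here)
```

With real, noisy 2D detections the two rays will not intersect exactly, and err gives the quantitative reconstruction error Step 5 asks for.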

**2.1 Reading the 3D joint data**

The motion capture data is in file Subject4-Session3-Take4_mocapJoints.mat. Once you load it in, you have a 21614x12x4 array of numbers. The first dimension is frame number, the second is joint number, and the last is joint coordinates + confidence score for each joint. Specifically, the following snippet of code will extract x,y,z locations for the joints in a specific mocap frame.

```matlab
mocapFnum = 1000;                    % mocap frame number 1000
x = mocapJoints(mocapFnum,:,1);      % array of 12 X coordinates
y = mocapJoints(mocapFnum,:,2);      % Y coordinates
z = mocapJoints(mocapFnum,:,3);      % Z coordinates
conf = mocapJoints(mocapFnum,:,4);   % confidence values
```

Each joint has a binary “confidence” associated with it. Joints that are not defined in a frame have a confidence of 0. Feel free to ignore any frames that don’t have all confidences equal to 1.
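
A small sketch of that filtering, assuming mocapJoints has been loaded as described (the random array below is only a stand-in so the snippet runs on its own):

```matlab
% Keep only frames in which all 12 joints have confidence 1.
% A random stand-in array replaces the real mocapJoints data here.
mocapJoints = rand(21614, 12, 4);
mocapJoints(:,:,4) = rand(21614, 12) > 0.1;    % fake binary confidences

conf = mocapJoints(:,:,4);                     % 21614x12 confidence matrix
validFrames = find(all(conf == 1, 2));         % frames with every joint defined
fprintf('%d of %d frames are usable\n', numel(validFrames), size(conf,1));
```

Looping over validFrames (instead of all frames) keeps the later projection and triangulation steps from producing undefined results.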

There are 12 joints, in this order:

1 Right shoulder

2 Right elbow

3 Right wrist

4 Left shoulder

5 Left elbow

6 Left wrist

7 Right hip