CIT 594 Solo Project: An Exercise in Software Design

Learning Objectives

In completing this assignment, you will learn how to:

Background

Government agencies such as the Centers for Disease Control can use social media information to get an understanding of the spread of infectious disease. By analyzing the use of words and expressions that appear over time on platforms such as Twitter, Facebook, etc., it is possible to estimate where the disease is affecting people and whether it is spreading or appears to be contained.

In this assignment, you will design and develop an application that analyzes a small set of Twitter data to look for tweets regarding occurrences of the flu and determines the US states in which these tweets occur.

Information about the format of the input data, the functional specification of the application, and the way in which it should be designed follows below. Please be sure to read all the way through before you start programming!

Input Data Format

Your analytical dataset is a collection of potentially flu-related tweets, with some metadata for each. At a minimum, each tweet record will contain the tweet text, the tweet date, and the tweet location in latitude/longitude format. Additional metadata may also be present, but is not needed for your analysis.

Some tweet records may appear in the dataset multiple times with identical data. You should analyze, count, and log each appearance independently, and not worry about trying to detect duplicate tweets.

Tweet records will be provided in two different formats: a tab-separated text file and a JSON file.

Your program will need to automatically select the appropriate parser for a given file based on its type. You may infer the format of a file from its file name extension (the portion of the file name following the last “.”). Note that the provided “flu_tweets.txt” and “flu_tweets.json” files both contain the same set of tweets, just in different formats and with slightly different extraneous metadata.
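As a minimal sketch of the extension rule described above, the following Java class takes everything after the last “.” and maps it to a format. The class name, the enum, and the method names are hypothetical, and treating the extension case-insensitively is an assumption rather than something the specification requires.

```java
import java.util.Locale;

final class TweetFormatSelector {
    /** Hypothetical names for the two supported tweet file formats. */
    enum TweetFormat { TAB_SEPARATED, JSON }

    /** Returns the lower-cased text after the last '.', or "" if there is no '.'. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Maps a file name to a format; rejecting other extensions is an assumption. */
    static TweetFormat formatFor(String fileName) {
        switch (extensionOf(fileName)) {
            case "txt":
                return TweetFormat.TAB_SEPARATED;
            case "json":
                return TweetFormat.JSON;
            default:
                throw new IllegalArgumentException("Unsupported tweet file: " + fileName);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatFor("example.txt"));
        System.out.println(formatFor("example.json"));
    }
}
```

Keeping this decision in one place means the rest of the program can ask for a parser without knowing which file extensions exist.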

Tweets: Tab-Separated

The tab-separated values file for this assignment has “.txt” as its file extension. Each line of the file contains the data for a single tweet. The following is an example of the data for a single tweet:

[41.38, -81.49] 6 2019-01-28 19:02:28 Yay, homework!

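The line contains four tab-separated (“\t”) fields. In the example above, these are the location, an identifier, the date and time, and the tweet text.

The location field is demarcated by square brackets (“[]”), and the latitude and longitude are separated by a comma and a space (“, ”): “[41.38, -81.49]”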
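The line format above could be parsed roughly as follows. This is only a sketch: the field order (location, identifier, date, text) is read off the example line, and the class and field names are hypothetical. How malformed lines should be handled is not covered here, so throwing an exception is just one possible choice.

```java
final class TabSeparatedTweetLine {
    /** Hypothetical holder for the fields the analysis needs. */
    static final class Tweet {
        final double latitude;
        final double longitude;
        final String date;
        final String text;

        Tweet(double latitude, double longitude, String date, String text) {
            this.latitude = latitude;
            this.longitude = longitude;
            this.date = date;
            this.text = text;
        }
    }

    static Tweet parse(String line) {
        // A limit of 4 keeps any stray tab characters inside the tweet text.
        String[] fields = line.split("\t", 4);
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 tab-separated fields: " + line);
        }
        String location = fields[0].trim();
        if (!location.startsWith("[") || !location.endsWith("]")) {
            throw new IllegalArgumentException("Location is not bracketed: " + location);
        }
        // Strip the brackets, then split "41.38, -81.49" on the comma.
        String[] latLon = location.substring(1, location.length() - 1).split(",");
        if (latLon.length != 2) {
            throw new IllegalArgumentException("Expected latitude and longitude: " + location);
        }
        double latitude = Double.parseDouble(latLon[0].trim());
        double longitude = Double.parseDouble(latLon[1].trim());
        // fields[1] is the identifier, which this sketch does not keep.
        return new Tweet(latitude, longitude, fields[2], fields[3]);
    }

    public static void main(String[] args) {
        Tweet tweet = parse("[41.38, -81.49]\t6\t2019-01-28 19:02:28\tYay, homework!");
        System.out.println(tweet.latitude + " " + tweet.longitude + " " + tweet.text);
    }
}
```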

Tweets: JSON

JSON (“JavaScript Object Notation”) is a popular standard for exchanging data on the World Wide Web. For details, see ECMA-404 and RFC 8259. In this assignment and elsewhere, JSON files use the “.json” extension. In brief, JSON files are human-readable text files that encode data in JavaScript syntax (which is also similar to Python syntax). Permissible atomic values are strings, numbers, or one of true, false, or null. All other data is encoded in one of two composite types: “object” and “array”.

JSON objects are effectively maps written in the same syntax as Python dicts: curly braces (“{}”) surrounding comma separated key:value pairs. Arrays are represented as square brackets (“[]”) enclosing a series of comma separated values, like Python lists. In general, JSON allows for arbitrary nesting of composite types.

The JSON tweets file contains an array of tweet objects. In JSON, the example tweet above might look something like:

{
    "location": [41.38, -81.49],
    "identifier": 6,
    "time": "2019-01-28 19:02:28",
    "text": "Yay, homework!"
}

Note that if you open the provided JSON tweet archive in a text editor (which you should do to familiarize yourself with its structure), you may see that the JSON tweet objects contain additional fields beyond the ones that we are using, that certain unused fields are missing, or that the fields are in a different order from the example given above. This should not affect your work on this assignment, as you are not expected to attempt to manually parse the JSON from the file text.

Instead, you will read in the files using a standard JSON processing library, and work with the resulting data structures.

There are numerous Java libraries for reading JSON objects, and numerous tutorials on how to use them. For this assignment, we’re going to be using the JSON.simple library. Use the provided json-simple-1.1.1.jar from the starter files; add it to your project’s build path. A tutorial for this library is available here. Do not put the jar file in your src or bin directories, and do not unpack it. Jars are meant to be used directly.

To repeat: Do not attempt to write your own code to parse the JSON file! It would be extremely time-consuming to get all the details right, and would take you far afield from the focus of this assignment. Only process the JSON file using the provided JSON.simple library.

States

In order to determine the state from which each tweet originated, your program will also need to read a file that contains the latitude and longitude of the center of each of the 50 US states, plus Washington DC, in comma-separated format. Each line of the file contains the data for a single state and contains the name of the state, the latitude, and the longitude. Here is an example:

Alabama,32.7396323,-86.8434593
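A line in this format might be read along these lines. The class and field names are hypothetical, and the sketch assumes that state names never contain a comma, which holds for the example but is not stated explicitly.

```java
final class StateLine {
    /** Hypothetical holder for one state's name and center coordinates. */
    static final class State {
        final String name;
        final double latitude;
        final double longitude;

        State(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    static State parse(String line) {
        // Expected layout: name,latitude,longitude
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected name,latitude,longitude: " + line);
        }
        return new State(
                parts[0].trim(),
                Double.parseDouble(parts[1].trim()),
                Double.parseDouble(parts[2].trim()));
    }

    public static void main(String[] args) {
        State state = parse("Alabama,32.7396323,-86.8434593");
        System.out.println(state.name + " " + state.latitude + " " + state.longitude);
    }
}
```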

Provided Files

Your program will be evaluated using the following input files:

Download the three input files along with json-simple-1.1.1.jar and add them to your project’s root directory so that you can test your program. Identical copies of those files will be used as part of the functional evaluation of your submission. You should, of course, create your own input files for testing.

Functional Specifications

This section describes the specification that your program must follow. Some parts may be underspecified; you are free to interpret those parts any way you like, within reason, but you should ask a member of the instruction staff if you feel that something needs to be clarified. Your program must be written in Java. As with previous assignments, you should use Java 11 for this project since this is the level used by Codio. Your code must not make use of external libraries that are not part of the standard Java 11 API other than the provided JSON.simple library. Do not configure a module for your project (even if your IDE recommends doing so). It’s possible your IDE might generate a module-info.java file even without prompting you; we recommend deleting this.

Runtime arguments

The runtime arguments to the program should specify, in this order:

For example: flu_tweets.json states.csv log.txt

Do not prompt the user for this information! These should be specified when the program is started (e.g. from the command line or using a Run Configuration in your IDE). If you do not know how to do this, please see the documentation here.
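For illustration, runtime arguments reach a Java program through the String[] parameter of main, so no prompting is needed. The sketch below assumes three arguments in the order suggested by the example (tweets file, states file, log file). The class name, the field names, and the choice to reject any other argument count are hypothetical.

```java
final class RuntimeArguments {
    final String tweetsFile;
    final String statesFile;
    final String logFile;

    private RuntimeArguments(String tweetsFile, String statesFile, String logFile) {
        this.tweetsFile = tweetsFile;
        this.statesFile = statesFile;
        this.logFile = logFile;
    }

    /** Assumed order, taken from the example: tweets file, states file, log file. */
    static RuntimeArguments from(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 arguments but got " + args.length);
        }
        return new RuntimeArguments(args[0], args[1], args[2]);
    }

    public static void main(String[] args) {
        try {
            RuntimeArguments parsed = from(args);
            System.out.println("Tweets file: " + parsed.tweetsFile);
            System.out.println("States file: " + parsed.statesFile);
            System.out.println("Log file: " + parsed.logFile);
        } catch (IllegalArgumentException e) {
            // How argument errors should be reported is an assumption in this sketch.
            System.err.println(e.getMessage());
        }
    }
}
```

Validating the arguments once, at startup, keeps the rest of the program free of checks on args.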