5 April, 2024
Dataset:
- twitter_ecommerce.csv contains information on tweets about four Chinese e-commerce
stocks: BABA, JD, PDD, and VIPS.
- stock_price.csv contains the daily price information for the above four stocks.
Tasks:
- Calculate three sentiment scores: neg (fraction of negative words), pos (fraction of positive
words), and vader for each stock at the daily level and assess the pairwise correlations among
them using the Pearson Correlation tool. Comment on how these sentiment measures are
correlated with each other.
- Calculate the daily returns: Return is defined as (Price_t - Price_{t-1}) / Price_{t-1}. Use the
“Adj Close” column as the price. Use the Multi-Row Formula tool to generate the lagged price
column with the following expression: IF [Row-1:Ticker] = [Ticker] THEN [Row-1:Adj Close]
ELSE NULL() ENDIF. Make sure you sort the data appropriately (by Ticker, then by Date)
before applying this expression.
- Join the stock return and tweet data by Ticker and Date and then run the regression of the
same-day stock return on daily tweet intensity (i.e., number of tweets) and the three sentiment
measures. Pool the data of the four stocks together for the regression. Given the distribution
of tweet numbers, add 1 to the number of tweets and then apply the logarithmic transformation
to improve model fit, i.e., log(1+tweet_num). Note that on certain trading
days, there may be no tweets about a specific stock on Twitter. Replace such missing values
of tweet intensity and sentiments with 0. Comment on the regression results and summarize
the key insights. Interpret the regression coefficients, p-values, and R-squared.
- Run the regression of next-day stock return on daily tweet intensity and the three sentiment
measures, i.e., use yesterday’s tweet data to predict today’s stock return. To do this, you
can use the Multi-Row Formula tool to generate a lagged date column (LaggedDate) first in
the stock return dataset. Then in the existing Join tool, you can simply change the Field from
Date to LaggedDate for the stock return side. Comment on the regression results and
summarize the key insights. In your response, highlight how the regression results
for next-day stock return differ from those for same-day stock return.
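The assignment itself must be completed in Alteryx, but the feature-engineering logic in the tasks can be cross-checked outside the workflow. The following is a minimal Python sketch under stated assumptions: dates are ISO-formatted strings (so they sort chronologically), the tweet data has already been aggregated to one row per Ticker and Date, and the names `daily_returns`, `join_with_tweets`, `LaggedPrice`, and `log_tweets` are hypothetical, chosen only for illustration. The sample prices and tweet values are made up and are not taken from the input files.

```python
import math

def daily_returns(price_rows):
    """Add LaggedPrice, LaggedDate, and Return to each price row.

    price_rows: list of dicts with 'Ticker', 'Date' (ISO string), 'Adj Close'.
    Mirrors the Multi-Row Formula: the lag is taken only within the same ticker.
    """
    # Sort by ticker, then date, so the previous row is the previous trading day.
    ordered = sorted(price_rows, key=lambda r: (r["Ticker"], r["Date"]))
    out, prev = [], None
    for row in ordered:
        same_ticker = prev is not None and prev["Ticker"] == row["Ticker"]
        lagged_price = prev["Adj Close"] if same_ticker else None
        lagged_date = prev["Date"] if same_ticker else None
        ret = None
        if lagged_price is not None:
            ret = (row["Adj Close"] - lagged_price) / lagged_price
        out.append({**row, "LaggedPrice": lagged_price,
                    "LaggedDate": lagged_date, "Return": ret})
        prev = row
    return out

def join_with_tweets(return_rows, tweet_rows, date_field="Date"):
    """Left-join returns to daily tweet aggregates on (Ticker, date_field).

    Use date_field="Date" for same-day and "LaggedDate" for next-day returns.
    Days without tweets get 0 for tweet intensity and all sentiment scores.
    """
    tweets = {(t["Ticker"], t["Date"]): t for t in tweet_rows}
    joined = []
    for r in return_rows:
        if r["Return"] is None:  # first day of each ticker has no lagged price
            continue
        t = tweets.get((r["Ticker"], r[date_field]), {})
        joined.append({
            "Ticker": r["Ticker"],
            "Date": r["Date"],
            "Return": r["Return"],
            "log_tweets": math.log(1 + t.get("tweet_num", 0)),
            "neg": t.get("neg", 0.0),
            "pos": t.get("pos", 0.0),
            "vader": t.get("vader", 0.0),
        })
    return joined

# Made-up sample data for illustration only.
prices = [
    {"Ticker": "BABA", "Date": "2024-01-03", "Adj Close": 110.0},
    {"Ticker": "JD", "Date": "2024-01-02", "Adj Close": 50.0},
    {"Ticker": "BABA", "Date": "2024-01-02", "Adj Close": 100.0},
    {"Ticker": "BABA", "Date": "2024-01-04", "Adj Close": 99.0},
    {"Ticker": "JD", "Date": "2024-01-03", "Adj Close": 55.0},
]
tweets = [
    {"Ticker": "BABA", "Date": "2024-01-03", "tweet_num": 9,
     "neg": 0.1, "pos": 0.2, "vader": 0.5},
]

returns = daily_returns(prices)
same_day = join_with_tweets(returns, tweets, date_field="Date")
next_day = join_with_tweets(returns, tweets, date_field="LaggedDate")
```

Switching `date_field` between `"Date"` and `"LaggedDate"` plays the same role as changing the Field on the stock return side of the single Join tool. The resulting rows would then feed the pooled regression of `Return` on `log_tweets`, `neg`, `pos`, and `vader`.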
Deliverables:
- You are required to finish this assignment in Alteryx only.
- Source code of your program in one file (.yxmd for Alteryx) containing the complete analysis,
with annotations explaining each tool and step. Use relative paths for workflow dependencies
in Alteryx so that the grader can run your program without making any changes. There should
be one Join tool in your program, and by changing the specific Field, the grader can run either
the same-day or the next-day return regression.
- A Word document (.docx) containing your responses to each requirement in the Tasks.
- Compress the above two files into a zip file named with your student ID, e.g., 123456.zip.
- You should not make any modifications to the two input files: twitter_ecommerce.csv and
stock_price.csv. Also, DO NOT include these two input csv files in your zip file.
Evaluation Criteria:
- Correctness and completeness of the feature engineering steps implemented in Alteryx.
- Accuracy and thoroughness of the model interpretation of results within Alteryx.
- Quality and clarity of the final report, including insights and conclusions drawn from the
analysis.