#1
Hello peeps. I'm working on making an econometric model of the stock market. I'm trying to unify fundamentals and some basic technical analysis. This is just the beginning, but I figured some of you may be interested in the results. The data consists of large-cap NA companies. As you can see, I'm working on adding more variables. I have sector in the data set, but I haven't made the sector dummies yet.
I really need to find a better screening source with more key ratios. If you know of any, let me know.
Last edited by Eightysixturbo; 06-12-2012 at 11:03 PM.
pardon my 'merican
#2
Well, I upped the ante a little. 720 large caps from Nasdaq and NYSE, started with 32 variables. Disappointing results.
#3
I imagine you'd have some wicked multicollinearity among price/book, price/sales, price/earnings, and PEG in the first regression. Try taking some (or 3) of them out and see what happens. I would say market cap is completely useless and your P/E and PEG aren't that great, but you can't be sure until you check whether you have a problem.
Also, a linear model is much less likely to be a good fit than some non-linear version. You could at least start by using logs. Others here have far more experience with this than I do, though... beng?
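One way to check the multicollinearity suspicion is to compute variance inflation factors (VIFs): regress each predictor on all the others and take 1/(1 - R²). The sketch below uses NumPy on simulated data; the column names and the common rule of thumb that VIFs well above 10 signal trouble are assumptions for illustration, not anything from the regressions in this thread.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (an n x k array)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression: column j on all other columns plus an intercept
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Hypothetical screen data: two price ratios driven by a common factor, one unrelated variable
rng = np.random.default_rng(0)
n = 500
common = rng.normal(size=n)
price_to_book = common + 0.1 * rng.normal(size=n)
price_to_sales = common + 0.1 * rng.normal(size=n)
unrelated = rng.normal(size=n)
X = np.column_stack([price_to_book, price_to_sales, unrelated])

names = ["price_to_book", "price_to_sales", "unrelated"]
vifs = vif(X)
for name, v in zip(names, vifs):
    print(f"{name}: VIF = {v:.1f}")
```

The two ratios that share a common driver get large VIFs while the unrelated column stays near 1, which is the pattern you would look for before deciding which of the price ratios to drop.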
Last edited by jnm2.0t; 06-13-2012 at 03:08 PM.
they're steppin' on my rhythm and they're stealin' all my lines
#4
Your model only explains 14-15% of the variance. Does that fly in your line of work?
Also, you only have 191 and 720 observations. Are these ALL the large-cap NA companies, or were they chosen at random? If so, you need to run a Random Effects model.
#5
You also need to think bigger about what could explain the movement before setting out. What do you really think helps explain how the price reacts? Where are your outside economic variables? I can assure you stock price movements are not explained by variables internal to the company alone.
#6
Line of work? I'm doing this out of my own interest. I understand the adjusted R-squared is low. This is only the beginning for me, and I will post progress. The data for the first regression came from Nasdaq's website, and the second was a screen of large caps.
First, I'd like to run a non-linear regression like you said; I think some of these relationships can be explained by it. Also, maybe my approach of measuring every variable at a single moment isn't working out. After I figure that out, I can find out how to quantify fear and market panic.
Simultaneously, I'm learning how to use Stata because I've given up on SPSS. I might post another, hopefully better, regression tonight.
Last edited by Eightysixturbo; 06-14-2012 at 02:05 PM.
#7
Ok. New data set: all stocks from the Nasdaq. 1126 stocks were eliminated due to missing values. Price is logged. Lesson of the day: don't underestimate the value of Beta. More to come; I might edit this later, as I'm still playing around with the 59 variables I have.
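The two preprocessing steps described here, dropping every stock that has a missing value (listwise deletion) and taking the log of price, can be sketched in plain Python. The tickers, field names, and values below are hypothetical, and the natural log is assumed.

```python
import math

# Hypothetical rows from a stock screen; None marks a missing value
rows = [
    {"ticker": "AAA", "price": 25.0, "beta": 1.2, "pe": 15.0},
    {"ticker": "BBB", "price": 110.0, "beta": None, "pe": 22.0},
    {"ticker": "CCC", "price": 4.5, "beta": 0.8, "pe": None},
    {"ticker": "DDD", "price": 62.0, "beta": 1.0, "pe": 18.0},
]

# Listwise deletion: keep only rows where every field is present
complete = [r for r in rows if all(v is not None for v in r.values())]

# Natural log of price becomes the dependent variable
for r in complete:
    r["log_price"] = math.log(r["price"])

print([r["ticker"] for r in complete])
```

Listwise deletion is the simplest option, but when it removes as many stocks as it did here, it may be worth checking whether the surviving stocks differ systematically from the dropped ones.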
#8
What does a Log(stock price) mean? Son, you have set yourself up for the classic "re-transformation problem".
#10
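To put it very simply, your model will give you the log(AAPL stock price), and that makes no sense in reporting. So you have to exponentiate the log(AAPL stock price) to get to the actual price. When you do this retransformation, simply exponentiating the predicted log price gives a biased estimate of the expected price (it tends to come out too low), and how you correct for that depends on whether the error term is homoscedastic or heteroscedastic.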
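A common fix for the retransformation problem, assuming homoscedastic errors on the log scale, is Duan's smearing estimator: instead of just exponentiating the fitted log price, multiply it by the average of the exponentiated residuals. The residuals and fitted value below are made-up numbers for illustration; with heteroscedastic errors this simple correction is not enough on its own.

```python
import math

def smearing_factor(log_residuals):
    """Duan's smearing factor: mean of exp(residual) from the log-scale regression."""
    return sum(math.exp(e) for e in log_residuals) / len(log_residuals)

# Hypothetical residuals from a regression of ln(price) on the predictors
residuals = [-0.40, -0.10, 0.05, 0.15, 0.30]

fitted_log_price = math.log(50.0)   # the model's prediction on the log scale
naive = math.exp(fitted_log_price)  # plain exponentiation: biased low for the mean price
smeared = naive * smearing_factor(residuals)

print(f"naive: {naive:.2f}, smeared: {smeared:.2f}")
```

Even though these residuals average to zero, the mean of their exponentials exceeds 1, so the smeared prediction sits above the naive one.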
That's the short version. The long version is: what is your end game? To show an army of econometricians, physicists (who work on Wall Street), and statisticians/mathematicians that a simple log-linear regression is a better fit than their proprietary, hyper-complex mixed-effects hierarchical models for predicting future stock prices?
#12
Yes, but generally the more variables you add, the higher your R² will get (in OLS it never goes down) just because the model finds a home for the variance, and that doesn't really make it a good model. You can't just focus on R² as your indication of a good model. There are modeling methods out there, such as autoregressive (AR) and ARMA models, that produce astronomical R² values because they rely on weighted averages of prior values, but that high R² says little about how useful they are in practice.
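The point about R² can be demonstrated by adding predictors that are pure noise: plain R² cannot fall when regressors are added, while adjusted R², 1 - (1 - R²)(n - 1)/(n - k - 1) with k predictors, charges a penalty for every extra term. This is a minimal NumPy sketch on simulated data; the sample size and number of noise columns are arbitrary assumptions.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """OLS with an intercept; return (R^2, adjusted R^2)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)   # one real predictor
noise = rng.normal(size=(n, 20))   # 20 irrelevant predictors

r2_small, adj_small = r2_and_adjusted(x[:, None], y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([x, noise]), y)
print(f"1 predictor:   R2={r2_small:.3f}  adj R2={adj_small:.3f}")
print(f"21 predictors: R2={r2_big:.3f}  adj R2={adj_big:.3f}")
```

The jump in plain R² from the noise columns is exactly the "finds a home for the variance" effect; adjusted R² is one simple guard against it, though it is not a full substitute for out-of-sample testing.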
You're also still using too many closely related variables like volatility month 1 and volatility week 1. Choose 1 of them.
#13
Also, are you using log (base 10) or ln (natural log)?
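The choice of base only rescales the coefficients: since log10(p) = ln(p)/ln(10), a slope estimated on log10 price equals the ln-price slope divided by ln(10), about 2.303, and the fit is otherwise identical. A quick check with made-up betas and prices and a hand-rolled simple regression:

```python
import math

def slope(x, y):
    """Least-squares slope of y on x (simple regression with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical betas and prices
betas = [0.6, 0.9, 1.1, 1.4, 1.8]
prices = [48.0, 40.0, 37.0, 30.0, 22.0]

b_ln = slope(betas, [math.log(p) for p in prices])
b_log10 = slope(betas, [math.log10(p) for p in prices])
print(b_ln / b_log10)  # should be ln(10), about 2.3026
```

The base does matter for interpretation: reading a coefficient as an approximate percent change in price assumes the natural log.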
#14
Last edited by Eightysixturbo; 06-15-2012 at 01:21 PM.
#15
I am not an expert, but I do read some of these regression reports. The log is fine; you just have to change the way you interpret the coefficients. So the coefficient of -0.217 on Beta basically means that a one-unit increase in beta reduces price by roughly 21.7% (assuming a natural log, the exact figure is exp(-0.217) - 1, about -19.5%). Making these sorts of transformations always means you have to change the way you interpret.
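With ln(price) as the dependent variable, reading a coefficient b as a 100·b percent change is an approximation that works best for small b; the exact change in price for a one-unit increase in the regressor is exp(b) - 1. A short check for the -0.217 coefficient discussed here, assuming the log is natural:

```python
import math

def pct_change(coef):
    """Exact percent change in price for a one-unit increase in a regressor,
    when the dependent variable is ln(price)."""
    return 100.0 * (math.exp(coef) - 1.0)

b = -0.217
print(f"approximation: {100 * b:.1f}%")        # -21.7%
print(f"exact:         {pct_change(b):.1f}%")  # about -19.5%
```
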
But more generally, what is your data set? Is it cross-sectional, time series or longitudinal? And over what time period? Whether that adjusted R-square is good or not really depends on the nature of your data set. In time series, a high R-square is pretty much the norm. In longitudinal data, a low R-square is the norm. Also, depending on the data set, you will have to use different estimation techniques. If you have a longitudinal data set, you should not use simple OLS. You might use fixed effects or random effects, but random effects is only justified if your omitted variables (the unobserved firm-specific effects) are uncorrelated with your other right-hand-side variables.
I suspect there is a serious problem here. Why would price respond negatively to beta? Beta measures how strongly a stock's returns move with the market's returns. When beta is big, the stock carries more market risk than an S&P 500 index fund. Why would your returns be lower for holding riskier stocks? That does not jibe with basic common sense or the no. 1 principle in investing: there is a positive tradeoff between risk and return. In other words, you should get paid for bearing risk, which is why risky investments have higher returns on average.
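For reference, the textbook definition of a stock's beta is the covariance of its returns with the market's returns divided by the variance of the market's returns, so a beta above 1 means the stock tends to amplify market moves. A minimal sketch with made-up daily returns (hypothetical numbers, not the thread's data):

```python
def beta(stock_returns, market_returns):
    """Beta = Cov(r_stock, r_market) / Var(r_market), using sample moments."""
    n = len(market_returns)
    ms = sum(stock_returns) / n
    mm = sum(market_returns) / n
    cov = sum((s - ms) * (m - mm) for s, m in zip(stock_returns, market_returns)) / (n - 1)
    var = sum((m - mm) ** 2 for m in market_returns) / (n - 1)
    return cov / var

# Hypothetical daily returns (as decimals)
market = [0.010, -0.005, 0.002, -0.012, 0.007]
stock = [2 * r for r in market]  # a stock that moves exactly twice as much as the market
print(beta(stock, market))       # 2.0, up to floating-point rounding
```
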
Taking a step back, what do you propose to do with these results? Are you using them to forecast price?
#17
Reread vwconvert's post.
To get to the heart of his post, read up on endogeneity and instrumental variables. Endogeneity is when your RH vars are correlated with the error term, for example when a RH var is determined in part by the LH var. So price (or log(price)) determines in part any of your variables that use price. To deal with the situation, you want to find replacement explanatory variables that are correlated with the original RH vars but uncorrelated with the error term (the replacement RH vars are called instruments).
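Two-stage least squares (2SLS) is the standard way to use instruments: regress the endogenous right-hand-side variable on the instrument, then regress the left-hand-side variable on the fitted values from that first stage. The simulated data below is an assumption built so that plain OLS is biased, and the instrument `z` is hypothetical; finding a valid instrument for stock-price data is the hard part. Standard errors from this manual second stage are not correct, which is why dedicated IV routines exist.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, with an intercept prepended."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)                       # instrument: moves x, unrelated to the error u
u = rng.normal(size=n)                       # structural error
x = 0.8 * z + 0.6 * u + rng.normal(size=n)   # x is endogenous: it depends on u
y = 1.0 + 2.0 * x + u                        # true slope on x is 2.0

b_ols = ols(x, y)[1]                         # biased upward because x is correlated with u

# Stage 1: project x onto the instrument
a = ols(z, x)
x_hat = a[0] + a[1] * z
# Stage 2: regress y on the first-stage fitted values
b_2sls = ols(x_hat, y)[1]

print(f"OLS slope:  {b_ols:.3f}")
print(f"2SLS slope: {b_2sls:.3f}")
```

In this setup OLS drifts toward about 2.3, while 2SLS recovers a slope close to the true 2.0.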
#18
Thanks for all of the info and suggestions. I kind of gave up on it for a bit due to being busy and whatnot.
For clarification, the data set is cross-sectional. What I did was take all the Nasdaq stocks and all the available indicators at the end of the day, after trading (and after hours).
So, the data and results represent one day. I chalked up the negative beta coefficient to it perhaps being a bad day for stocks; the volatile ones took a hit, as they sometimes do.
Forecasting price would be an interesting goal but I'm just trying to gain a simple understanding of the market. I'm well aware that there are complex models out there for specific sectors of the market.
DCdave, that is what I feared would be one of the many problems with this regression. Hopefully, I'll figure something out and develop a slew of different and equally complex issues!