Measuring Information Consumption via EDGAR Log Files

EDGAR, the Electronic Data Gathering, Analysis, and Retrieval system, performs automated collection, validation and forwarding of submissions by companies that are required by law to file forms with the U.S. Securities and Exchange Commission (SEC).

In a nutshell, the EDGAR database is a publicly accessible source of securities market information. It lists a variety of filings, such as 10-K filings (annual reports), other regulatory filings and registration statements (e.g. for IPOs). The SEC also publishes log files that record the traffic on EDGAR filings. This means that one can examine the consumption of, or interest in, financial information by measuring how often and how heavily filings are accessed. The Log File Dataset contains variables such as the timestamp of each access, the company identifier code, the filing number and crawling information, in particular whether the request came from an automated crawler (e.g. a Python script or Googlebot) or from a single person querying the database. The dataset ranges from 2003 to 2017, so the amount of data is huge: a single day of log file data is a CSV file of about 3 gigabytes, and the total period adds up to several terabytes. Analyzing the whole Log File Dataset is therefore ambitious, but one can address some interesting research questions by filtering the dataset cleverly.
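As a rough sketch of what such filtering could look like, the snippet below reads a daily log file in chunks with pandas, so the whole file never has to sit in memory at once, drops requests flagged as crawlers and keeps only selected companies. The column names (date, cik, accession, crawler), the convention that crawler equals 0 for non-crawler requests, and the sample rows are assumptions for illustration and should be checked against the documentation of the actual log files.

```python
import io
import pandas as pd

def count_human_accesses(csv_source, ciks, chunksize=1_000_000):
    """Count non-crawler requests per (cik, accession, date) for selected firms.

    Assumes columns named date, cik, accession and crawler (crawler == 0
    meaning the request was not flagged as an automated crawler).
    """
    wanted = [int(c) for c in ciks]
    partial = []
    reader = pd.read_csv(
        csv_source,
        usecols=["date", "cik", "accession", "crawler"],
        chunksize=chunksize,  # process the file piece by piece
    )
    for chunk in reader:
        mask = (chunk["crawler"] == 0) & chunk["cik"].isin(wanted)
        hits = chunk.loc[mask]
        if hits.empty:
            continue
        partial.append(hits.groupby(["cik", "accession", "date"]).size())
    if not partial:
        return pd.Series(dtype="int64", name="accesses")
    # The same key can appear in several chunks, so sum the partial counts.
    total = pd.concat(partial).groupby(level=["cik", "accession", "date"]).sum()
    return total.rename("accesses")

# Made-up sample rows, not real log data.
SAMPLE = """ip,date,time,cik,accession,crawler
1.2.3.abc,2012-05-18,00:00:01,1111111,0000000000-12-000001,0
1.2.3.abd,2012-05-18,00:00:02,1111111,0000000000-12-000001,1
1.2.4.abe,2012-05-18,00:00:03,1111111,0000000000-12-000001,0
1.2.5.abf,2012-05-19,00:00:04,1111111,0000000000-12-000001,0
1.2.6.abg,2012-05-18,00:00:05,2222222,0000000000-12-000002,0
"""

counts = count_human_accesses(io.StringIO(SAMPLE), ciks=[1111111], chunksize=2)
print(counts)
```

For a real daily file, one would pass the file path instead of the StringIO object and repeat the call for each day of interest.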


How much investor attention does a company’s IPO receive, as measured by accesses to its EDGAR registration filings?

Taking only the initial public offering (IPO) of a company into account, one can analyze the traffic on the EDGAR database. By measuring the number of clicks on an IPO registration statement, one can investigate how much attention traders pay to a newly listed company. Heavy traffic implies that the new information is widely consumed. Because registration statements contain crucial new information about risk factors, valuation and dividend policy, investors who want to invest in a recently listed company should pay close attention to these documents. One could examine whether the stock is less volatile during the first months after the offering if the registration statement is accessed frequently and by many users. The reasoning is that if many investors share the same information about the stock, they would not buy too high or sell too low, as the efficient market hypothesis suggests.
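To make the volatility question a bit more concrete, here is a minimal sketch of one possible comparison. It assumes that per-IPO access counts and post-IPO volatility figures are already available; the tickers and numbers below are invented, and splitting at the median access count is just one simple choice among many.

```python
import numpy as np
import pandas as pd

def realized_volatility(prices):
    """Sample standard deviation of daily log returns for one price series."""
    log_returns = np.log(pd.Series(prices, dtype="float64")).diff().dropna()
    return float(log_returns.std(ddof=1))

def compare_by_attention(accesses, volatility):
    """Split IPOs at the median access count and average volatility per group."""
    df = pd.DataFrame({"accesses": accesses, "volatility": volatility})
    high = df["accesses"] > df["accesses"].median()
    group = high.map({True: "high", False: "low"}).rename("attention")
    return df.groupby(group)["volatility"].mean()

# Invented example: access counts on the registration statement and
# volatility over the first months of trading for four hypothetical IPOs.
accesses = pd.Series({"AAA": 100, "BBB": 900, "CCC": 50, "DDD": 1200})
volatility = pd.Series({"AAA": 0.04, "BBB": 0.02, "CCC": 0.06, "DDD": 0.02})

by_attention = compare_by_attention(accesses, volatility)
print(by_attention)
```

A lower average for the high-attention group would be in line with the conjecture above, but on its own it says nothing about causality.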


Further analysis is possible, for example by controlling for other variables such as trading volume or returns on the stock when IPO statements are heavily consumed. Not only the amount of traffic but also the tone of the statement should be considered. Text analysis can be applied to capture the sentiment of the IPO statement. Thus, a variety of research questions can be analyzed using the EDGAR Log File Dataset. However, interpreting the results of such an analysis might be challenging, and the relationships may suffer from endogeneity. It is not yet known whether clicks on IPO filings or the sentiment of the statements are related to financial variables.
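A very simple way to quantify the tone of a statement is a word-count score. The sketch below uses two tiny, made-up word lists purely for illustration; a real analysis would rely on an established finance-specific dictionary and more careful text preprocessing.

```python
import re

# Toy word lists for illustration only, not an established dictionary.
POSITIVE = {"growth", "strong", "profit"}
NEGATIVE = {"risk", "loss", "uncertain"}

def tone(text):
    """Return (positive - negative) word count divided by total word count."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

score = tone("Strong growth despite risk")
print(score)
```

The score lies between -1 and 1, with negative values indicating that words from the negative list dominate.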

Overall, measuring the traffic on IPO registration statements should reflect investors’ attention to, and interest in, investing in a U.S. company. The analysis will be continued…


