Last week Scott Bauguess, Acting Director and Acting Chief Economist of the Securities and Exchange Commission's (SEC) Division of Economic and Risk Analysis, shared insights about how the SEC is leveraging artificial intelligence and machine learning to track, and perhaps predict, emerging risks in the marketplace. In the latest in a series of speeches, Bauguess also described how the SEC is using big data, harnessed with the appropriate processing power and partnered with human intuition, to focus investigative and enforcement resources. While Bauguess and others at the SEC see a bright future for data analytics at the SEC, particularly in identifying emerging trends, Bauguess stressed that the human element remains essential in assessing risk, combating fraud, and bringing or recommending enforcement actions.
The SEC's initial foray into machine learning was sparked by the financial crisis. Using hindsight, coupled with simple word counts and regular expressions, the SEC searched issuer filings to test whether increased use of "credit default swap" in filing documents could have alerted SEC staff to the growing market risk. While the frequency analysis was not particularly impressive, the study highlighted the power of applying text analytics and natural language processing to SEC filings, and the SEC has built on this simple text modeling approach.
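The early approach described above, counting occurrences of a phrase across filings, can be sketched in a few lines. This is a hypothetical illustration using made-up filing text, not actual SEC data or the SEC's actual code; the two-mention flagging threshold is likewise an arbitrary assumption.

```python
import re

# Illustrative filing texts; real filings would be loaded from EDGAR documents.
FILINGS = {
    "issuer_a_2007.txt": (
        "Exposure to credit default swaps grew materially. "
        "Credit default swap positions are detailed in Note 12."
    ),
    "issuer_b_2007.txt": "No material derivative exposure was reported this period.",
}

# Case-insensitive pattern; "swaps?" matches the singular and plural forms.
PATTERN = re.compile(r"credit default swaps?", re.IGNORECASE)

def phrase_counts(filings):
    """Return {filing_name: number of phrase matches} for each filing."""
    return {name: len(PATTERN.findall(text)) for name, text in filings.items()}

counts = phrase_counts(FILINGS)
# Filings with unusually frequent mentions could be surfaced for review.
flagged = [name for name, n in counts.items() if n >= 2]
```

As the post notes, raw frequency alone proved a weak signal, but the same document-ingestion pipeline generalizes naturally to the richer topic models discussed next.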
The SEC has since moved to more sophisticated topic modeling approaches, like latent Dirichlet allocation (LDA), to identify emerging trends in disclosure documents and identify potential risks. Rather than analyzing documents based on user-supplied terms like "credit default swaps", LDA synthesizes large sets of documents and compares the text to language probability distributions in order to organically determine which new topics or terms to track. Thus, LDA not only reports on the frequency and use of new terms in filing documents, but it also determines which new terms warrant future monitoring.
LDA can function without specialized expertise or programming, making it applicable across departments within the SEC (and of course, the private sector). Generally referred to as unsupervised machine learning, this form of analytics also excels at identifying commonalities or connections between documents, or outliers amongst documents, so that SEC staff can triage review. The SEC has used LDA to find connections within the tips, complaints and referrals database, as well as to highlight abnormal disclosures by corporate issuers charged with wrongdoing.
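The outlier-detection use mentioned above can be illustrated with a much simpler stand-in: score each document's average similarity to its peers and flag the least similar one for triage. This uses plain term-frequency cosine similarity rather than LDA topic distributions, and the documents are invented, so treat it as a sketch of the idea only.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "swap exposure derivative risk disclosure",
    "derivative exposure risk swap notes",
    "restated earnings following internal investigation",  # abnormal disclosure
]
vecs = [Counter(d.split()) for d in docs]

# Average similarity of each document to every other document; the lowest
# score marks the document least like its peers, a candidate for review.
scores = [
    sum(cosine(v, w) for j, w in enumerate(vecs) if j != i) / (len(vecs) - 1)
    for i, v in enumerate(vecs)
]
outlier = scores.index(min(scores))
```

In practice the same ranking could be run over LDA topic distributions instead of raw term counts, which is closer to the triage workflow the post describes.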
While part of LDA's appeal is that it does not require human ex ante input, Bauguess also highlighted how SEC staff "supervise" machine learning by incorporating human judgment in algorithms. For example, an unsupervised program will pick up on patterns and trends in language used in various filing documents or data, but a supervised program then applies insight from past cases to determine whether a filing should be further reviewed for distinct forms of fraud or misconduct. In the investment adviser space, the first "unsupervised" computer learning phase identifies topical themes and the degree of negativity in a filing, and the second "supervised" phase incorporates past examination outcomes to predict idiosyncratic risks for each adviser.
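The two-phase idea can be sketched as follows: a crude negativity score stands in for the unsupervised feature-extraction phase, and a logistic regression trained on hypothetical past examination outcomes stands in for the supervised phase. The word list, filings, and labels are all invented for illustration; the SEC's actual features and model are not public.

```python
from sklearn.linear_model import LogisticRegression

# Assumed negative-sentiment lexicon; real systems use far richer features.
NEGATIVE_WORDS = {"loss", "restated", "deficiency", "violation"}

def negativity(text: str) -> float:
    """Fraction of words in the text drawn from the negative lexicon."""
    words = text.lower().split()
    return sum(w in NEGATIVE_WORDS for w in words) / len(words)

# Phase 1 (unsupervised): derive a feature per historical filing.
past_filings = [
    "material loss and restated results with a control deficiency",
    "steady growth and routine disclosures this quarter",
    "violation noted alongside a restated loss",
    "ordinary course operations and stable performance",
]
X = [[negativity(t)] for t in past_filings]
# Phase 2 (supervised): past examination outcomes (1 = problem found).
y = [1, 0, 1, 0]

clf = LogisticRegression().fit(X, y)

# Score a new filing: the model's probability is a risk estimate, not a verdict.
new_filing = "restated earnings reflect a significant loss"
risk = clf.predict_proba([[negativity(new_filing)]])[0][1]
```

The design point mirrors the post: the unsupervised phase needs no labels, while the supervised phase is only as good as the past-outcome labels humans supply, which is one reason the output is a prioritization signal rather than a conclusion.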
The implication for securities enforcement is tangible. For example, trading data has been instrumental to the work of the SEC's Market Abuse Unit, which uses algorithms to identify potential abuse by analyzing massive amounts of trading data. This approach enabled the SEC to identify a Ukraine-based group who settled with the SEC for $30 million after allegedly hacking a newswire service to glean earnings data before public release, and has otherwise originated at least nine distinct insider trading cases.
Thus, machine learning applications have evolved and been significantly more successful than the initial simple "credit default swap" search. But Bauguess also stresses three caveats. First, analytics are only as good as the underlying data. As analytics become increasingly integral to regulators' work, it is important to devote appropriate resources to data and collection design. With electronic filing, much progress has been made, but the potential for future innovation, through programs like the Consolidated Audit Trail (CAT) or the Options Price Reporting Authority (OPRA), is immense.
A related second point is that big data is getting bigger. The SEC webpage received over seven billion views in a year and delivered over two petabytes of data to visitors. Exchanges are scheduled to begin reporting transactions through CAT later this year, and broker-dealers will soon follow. OPRA alone creates two terabytes of data each day. Together, these trends imply that not only will data and collection design be increasingly important, but the methods used to synthesize this data will need to be increasingly efficient.
Last, Bauguess stresses that machine learning is an aid to, but not a replacement for, actual regulators. Unsupervised machine learning gives regulators new insights and is a new tool. But even supervised machine learning is not intended to stand in the place of a human regulator. As Bauguess has previously explained, machine learning and big data focus and ease the job of a regulator, but corroboration and a final review are still critical roles for human intelligence and input. In sum, machine learning, big data and artificial intelligence have changed, and will continue to change, how the SEC approaches data and identifies potential risks to shape enforcement priorities, but an algorithmic prediction alone is not, and should not be, a sufficient basis for an enforcement action.
Special thanks to Nick Dumont and Rachel Jackson for their work on this blog posting.