Big Data in Securities Finance


Chris Benedict
Tom Ashton, Product Specialist

September 19, 2018

The amount of data in the world is increasing exponentially—by some accounts, doubling approximately every 18 months. This phenomenon has given rise to the term “big data,” which some dismiss as just another 21st-century buzzword. Yet those on the cutting edge of technology understand big data’s true potential to transform the way most industries—from finance to healthcare to transportation and beyond—do business.

The objective of big data is not merely to store and maintain vast quantities of data; it is to meaningfully process huge data sets using advanced analytical practices such as predictive analytics and machine learning to unlock information and insight. As technology advances and the costs of data storage and retrieval continue to fall, such projects are no longer prohibitively expensive. Cloud computing has driven costs down further still, allowing firms of all sizes to engage in advanced analytics without significant hardware investment.

Some still view big data as a necessary evil that creates technological challenges and headaches. However, firms that harness and efficiently use big data can realize many benefits: cost savings from identifying more efficient ways to do business, faster decision making, the ability to anticipate and respond to client needs and quicker, more robust error detection and handling, to name but a few.

Big data can be applied to everything from obvious business solutions, such as reducing time to payment in the accounts receivable department, to more complex and potentially life-saving applications, such as predicting lava flow vectors during volcanic eruptions. Big data permeates every aspect of our lives, whether we know it or not.

Combining big data with machine learning brings with it huge transformative potential across many industries, not least the financial sector. Banks are using data to their advantage by training algorithms to react to market trends, track trading volatility and manage assets on behalf of investors. This technology is able to spot trends (and divergences) much more efficiently than humans, and is also able to react in real time, thereby reducing any impact from major events—for example, by tracking and analyzing market volatility during a breaking global incident and halting, or drastically reducing, any trading activity to minimize risk. Banks also utilize these practices to minimize the risk of fraudulent transactions: Historically, banks would rely on analysts running complex SQL queries against massive data warehouses, which could take weeks or even months to provide any meaningful results. With big data, this has changed significantly, as these systems are now able to learn and become more useful as they ingest more data.

Consequently, these systems can pick out events that could suggest untoward behavior, allowing firms to act quickly and investigate, preventing situations from spiraling out of control.

Outside of the financial sector, Google has been utilizing big data for years across its various products. In 2015, Google introduced a “smart reply” functionality in Gmail. This functionality relies on two recurrent neural networks, one to process the incoming mail and the other to predict a set of canned responses based on the original mail’s content. Gmail is not the only product to be on the receiving end of these machine learning concepts, with Google Maps applying deep learning algorithms to its Street View cars to analyze more than 80 billion photos and assist in extracting street names and house numbers, a task which would have been almost impossibly time consuming for mere humans.

In order to conceptualize what big data really is, one must understand the four key characteristics of big data, known as the “Four V’s”:

• Volume: how much data is being stored

• Variety: the different types of data being stored

• Velocity: the speed at which the data is generated and processed

• Veracity: how “clean” the data is


As noted, the volume of data in the world is increasing exponentially. The financial services industry historically has struggled with the issue of big data, and the securities finance market is no exception.

Since 2013, DataLend has accumulated more than 40 terabytes’ worth of securities finance data—the equivalent of more than three million phone books (for those of us who remember what those were). A large volume of data typically causes trouble with velocity, or how quickly that data can be retrieved.


The speed at which data is accumulated is constantly increasing. For example, NYSE is said to capture more than one terabyte of data each day. According to Forbes, around 1.7 megabytes of new information will be created every second for every human on the planet by 2020. Because it is important to return result sets to end users as quickly as possible, a growing volume of data can quickly become a velocity problem.

Challenges around volume and velocity are usually answered by IT departments. Servers can be upgraded, queries can be optimized and network infrastructure can be improved to help alleviate these issues. But veracity and variety are usually tackled from a business perspective.


On the surface, a securities lending transaction looks pretty straightforward: a broker-dealer borrows a security from an agent lender and posts either cash or securities as collateral. However, there is a variety of detail that goes along with that transaction. Trades can be booked as open, or set to end after a specific date. Cash collateral trades can be booked against a variety of currencies, while non-cash collateral trades can be booked against a wide range of collateral, including corporate bonds, money market securities such as Treasury bills and U.S. and international equities, to name a few. The fee or rebate rate associated with each transaction can vary widely, from general collateral (GC) trades of 10 basis points (bps) to extremely hot securities trading with demand spreads of 5,000 bps or more. Firms usually trade with numerous counterparts across many asset classes and geographic regions, necessitating specific settlement instructions. And so on.
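The fee-versus-rebate relationship above can be made concrete. For cash-collateral loans, the lender's effective fee (the demand spread) is commonly approximated as the cash reinvestment benchmark rate minus the rebate paid back to the borrower. This is an illustrative sketch with hypothetical rates, not a description of any particular firm's accrual methodology:

```python
def implied_fee_bps(benchmark_bps: float, rebate_bps: float) -> float:
    """Effective lending fee implied by a cash-collateral rebate,
    approximated as benchmark reinvestment rate minus rebate rate.
    All rates are expressed in basis points."""
    return benchmark_bps - rebate_bps

# A GC loan: benchmark of 200 bps, rebate of 190 bps -> roughly a 10 bps fee
print(implied_fee_bps(200.0, 190.0))    # 10.0

# A hot security: a negative rebate of -4,800 bps against the same
# benchmark implies a demand spread of 5,000 bps
print(implied_fee_bps(200.0, -4800.0))  # 5000.0
```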

All of these characteristics, and many more, need to be stored at the trade level and cross-referenced for performance or compliance reporting across a wide array of views. Since DataLend processes and cleanses over three million transactions per day, the variety of securities lending data stored over time can be quite complex.


The volume, velocity and variety of data are meaningless if the data is not accurate. DataLend strives to ensure the cleanest, most robust and deepest data set possible in securities finance. This is achieved in part by employing multiple data cleansing algorithms to identify and segregate “outlier” trades, or transactions that do not represent normal market conditions based on the intrinsic lending value of a particular security. For example, broker-dealers lending securities to other broker-dealers may charge different fees compared to a traditional lender-to-broker transaction. Broker-to-broker fees can skew the market, so it is best to identify and segregate these transactions and report them as a separate market.

In another example, new loans may be booked incorrectly; algorithms to detect and flag higher-than-normal unit quantities, contract values or fees given current and historical patterns can help eliminate these erroneous values.
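One common way to implement the kind of outlier screen described above is a robust z-score built on the median and median absolute deviation (MAD), which a single erroneous print cannot skew the way a mean and standard deviation can. This is a minimal illustrative sketch, not DataLend's actual cleansing algorithm:

```python
import statistics

def flag_outliers(fees_bps, threshold=3.5):
    """Flag fees far from a security's typical level using a robust
    z-score (median / MAD) so one bad print cannot skew the screen."""
    med = statistics.median(fees_bps)
    mad = statistics.median(abs(f - med) for f in fees_bps)
    if mad == 0:
        # All observations cluster at the median; nothing to flag
        return [False] * len(fees_bps)
    # 0.6745 rescales the MAD to be comparable with a standard deviation
    return [abs(0.6745 * (f - med) / mad) > threshold for f in fees_bps]

fees = [25.0, 27.0, 26.0, 24.0, 900.0]  # last print is likely a booking error
print(flag_outliers(fees))  # [False, False, False, False, True]
```

Trades flagged this way can then be segregated from the benchmark data set and reviewed, rather than silently discarded.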

But the concept of data veracity is not always straightforward. There are times when certain types of valid transactions are viewed differently by different users. For example, fees associated with structured transactions may be seen as not representative of normal market conditions and dismissed as “noise” by some. Other users may deem these transactions of critical importance in their performance reporting metrics.

Market nuances, such as the disparity in fees to borrow Asian equities from onshore versus offshore lenders, can also be captured and segregated to show these differences with more granularity. In such situations, a fifth “V” is often quoted in the big data world: visualization. The ability to quickly retrieve accurate data is pointless if the consumer cannot comprehend it in its end state.

What all this means for the securities finance industry and consumers of market data is greater efficiency and more informed decision making. DataLend’s Web portal provides traders, quantitative analysts, relationship managers and desk heads with the functionality needed to use securities finance big data to their advantage: DataLend pre-calculates a wide array of securities lending metrics across approximately 50,000 individual securities on loan globally and provides trending information for any of them at the click of a button.

Using Big Data to Tell the Full Story

The ability to quickly view historic and current fees or rebate rates, utilization, re-rate trends, transactional-level data and many more metrics provides traders a platform to make better informed trading decisions. Traders and managers can also use this information to see where they are under- or over-performing from a borrowing or lending perspective, allowing them to take corrective action. All of this granular information can be aggregated and distilled to help determine a firm’s most (or least) profitable and efficient trading partners, asset classes and/or markets.
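The distillation step described above is, at its core, an aggregation of trade-level records into per-partner totals. A minimal sketch follows; the field names and figures are hypothetical, not DataLend's actual schema:

```python
from collections import defaultdict

# Hypothetical trade-level records: counterparty and revenue generated
trades = [
    {"counterparty": "Broker A", "revenue": 1200.0},
    {"counterparty": "Broker B", "revenue": 450.0},
    {"counterparty": "Broker A", "revenue": 800.0},
]

# Aggregate revenue by trading partner
totals = defaultdict(float)
for trade in trades:
    totals[trade["counterparty"]] += trade["revenue"]

# Rank partners from most to least profitable
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('Broker A', 2000.0), ('Broker B', 450.0)]
```

The same grouping can be keyed on asset class or market instead of counterparty to surface efficiency across those dimensions.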

Big data in DataLend can assist relationship managers in explaining performance results to their beneficial owner clients through the use of our robust Client Performance Reporting (CPR) tools, showing them performance metrics across asset classes, countries, sectors or even individual securities over customizable, user-defined reporting timeframes. CPR also goes one step further by running data through comprehensive peer comparison algorithms, allowing users not only to see their own performance, but how that performance compares to peers in predefined groups, organized by fiscal location, legal structure or by the type of collateral used.

These metrics can help relationship managers identify revenue opportunities for their beneficial owner clients, allowing them to have a better informed conversation regarding the risks and rewards associated with these opportunities.

Furthermore, securities lending traders are working more closely than ever with their collateral management teams as a result of big data. Previously, trading and collateral information was housed in separate systems, requiring an analyst to run queries to extract and combine that data. Now users can quickly retrieve both sets of information, allowing them to more easily determine if an asset should be used as collateral or lent out.

Using Big Data to Model Possibilities

Agent lenders can use big data to play “what if” when lending on an exclusive basis to estimate how much revenue a beneficial owner’s portfolio might have made had it been lent on a discretionary basis at prevailing market rates. Traders can quickly see varying prices and rates between a depositary receipt and the underlying security trading in the local market to detect arbitrage opportunities.
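A minimal version of such a “what if” estimate multiplies the lent balance by the prevailing market fee, accrued on a conventional 360-day year. The figures below are illustrative, and real estimates would of course reprice the balance and fee day by day:

```python
def estimated_revenue(balance: float, fee_bps: float, days: int) -> float:
    """Revenue a lent balance would have earned at a prevailing market
    fee (in basis points), using a conventional 360-day accrual year."""
    return balance * (fee_bps / 10_000) * days / 360

# $50m on loan at 75 bps for 90 days
print(round(estimated_revenue(50_000_000, 75, 90), 2))  # 93750.0
```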

Quantitative analysts process securities lending data related to hundreds or possibly thousands of underlying assets within an ETF versus the ETF itself to engage in more efficient and accurate statistical arbitrage trading. Traders on both the agent lender and broker-dealer side can run vast quantities of historic securities lending data through proprietary models to anticipate possible rate movements as a result of upcoming earnings surprises. Models related to these and other examples can be run again and again as more data is ingested, evolving to become more accurate over time (an example of machine learning).

Outside of the securities finance world, DataLend’s big data can help asset managers from an investment perspective. The ability to quickly see various securities finance metrics across a portfolio helps to tell a story from both a long and short perspective. Increased utilization, short interest and volume-weighted average fees in a security usually signal increased short-selling pressure.

This information juxtaposed against a backdrop of fundamental and technical analysis may help an asset manager implement a strategy that would mitigate the potential near-term selling pressure. Analyzing recent price action in conjunction with a rising days-to-cover metric may signal an upcoming short squeeze, given the right catalyst. Comparative metrics can suggest a particular security is undervalued (or overbought) relative to its peers.
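The days-to-cover metric referenced above is conventionally computed as short interest divided by average daily trading volume, i.e., how many days of typical trading it would take short sellers to buy back their positions. A quick sketch with illustrative figures:

```python
def days_to_cover(short_interest: float, avg_daily_volume: float) -> float:
    """Days of typical trading needed for short sellers to cover,
    computed as short interest / average daily volume (both in shares)."""
    return short_interest / avg_daily_volume

# 12m shares short against 3m shares of average daily volume
print(days_to_cover(12_000_000, 3_000_000))  # 4.0
```

A rising days-to-cover reading means covering would take longer, which is what makes a squeeze more violent when a catalyst arrives.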

Historically, asset managers have not consumed much securities finance data to help make investment decisions; however, this is changing. More and more firms are gravitating toward non-traditional information sources, or “alt data,” as a means to realize additional alpha, and securities finance data is rife with opportunity.

What’s Next for Big Data

Taking big data a step further, cutting-edge firms are leveraging emerging technologies such as artificial intelligence to incorporate newly amalgamated sources of big data with sophisticated deep learning and predictive analysis applications. We will start to see the true benefits when questions no longer need to be asked to obtain desired information; instead, answers will be found to questions that were never before considered.

With the ever-increasing velocity at which data is being created, zettabytes (1 billion terabytes), yottabytes (1,000 zettabytes) and brontobytes (1,000 yottabytes) will soon become the language with which we describe data volume.

The future of big data may still be evolving, but one thing is for sure: It will transform how organizations operate with and use data. Firms are getting better at interpreting big data, and as a result a huge paradigm shift is underway.