SBC News DATA.BET: efficient sports mapping can revolutionise sports analytics & betting

Oleksii Kulish, Lead Data Scientist at DATA.BET.

Oleksii Kulish, Lead Data Scientist at DATA.BET, writes for SBC News to highlight the importance of mapping systems, and their impact on prediction accuracy and betting strategies.

In the digital age, sports data comes from diverse sources like official league databases and third-party aggregators. This variety presents challenges in ensuring consistency and accuracy across datasets. 

Sports event mapping systems are technological solutions that unify data from multiple sources into a single, coherent database. This article explores how such systems work and why they are essential for modern sports analytics and betting.

What is data mapping?  

Data mapping is the process of associating sports events, tournaments, teams, or even players from different data sources into unified entities (see pic below). Many rely on simple systems that assume identical or similar spelling of proper names. 

However, this approach often faces significant challenges due to variations in spelling, differences in languages, and identical team names that are not explicitly distinguished within certain leagues. 

For instance, it is generally understood that ‘Dynamo’ in the Ukrainian Premier League refers to Dynamo Kyiv. Similarly, a soccer tournament might be called ‘Segunda Liga’ in one source, ‘Liga Portugal 2’ in another, and ‘Liga Portugal SABSEG’ in a third, where SABSEG is the league’s title sponsor.


Without proper mapping, these inconsistencies can disrupt historical event storage and analysis systems and live event platforms. Simple similarity metrics, such as Levenshtein distance or cosine similarity, often fall short in handling these complexities. 

They may struggle to recognize shorthand names like “Na’Vi” (commonly recognized as Natus Vincere) or to resolve ambiguities between identical names in different leagues. Robust mapping requires more advanced techniques that go beyond basic text comparisons to account for contextual and domain-specific nuances.
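To see why plain string similarity falls short, consider a minimal sketch in Python. The helpers `levenshtein` and `name_similarity` are illustrative names and are not part of the DATA.BET system; the example names are the ones discussed above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical, ignoring case."""
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a.lower(), b.lower()) / longest


# Same team, very different strings: the score is low.
short_vs_full = name_similarity("Na'Vi", "Natus Vincere")

# Identical strings score 1.0 even if they refer to different clubs.
ambiguous = name_similarity("Dynamo", "Dynamo")
```

In this sketch the shorthand scores well below any sensible acceptance threshold, while two different clubs that are both listed as “Dynamo” score a perfect 1.0. Text similarity alone cannot separate the two situations.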

The development of such a system is an excellent example of a classic data science challenge. We have designed a solution that automatically maps approximately 98% of competitors and tournaments based on their historical games, performing pairwise comparisons for each pair of data sources.

To ensure 100% mapping accuracy, we have implemented strict constraints and validation checks. Over five years of operation, not a single false positive has been observed. This robust system ensures that even complex cases are handled efficiently, improving both the speed and accuracy of data mapping for sports events.

Data sources

The system is based on historical sports event schedules from various sources. It processes information from individual sports events and requires only a minimum set of inputs to operate efficiently: tournament names, the names of both competitors (which can be teams, individual players, or pairs, as in tennis), event time, and team composition (optional, for player-level mapping).
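The minimum input set can be pictured as a small record type. The sketch below is an assumption about how such a record might look; the class name `EventRecord` and all field names are hypothetical and are not taken from the DATA.BET system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class EventRecord:
    """One sports event as reported by a single data source (hypothetical schema)."""
    source: str                      # which provider reported the event
    tournament: str                  # tournament name as spelled by that provider
    competitor_1: str                # team, individual player, or pair
    competitor_2: str
    start_time: datetime             # scheduled event time
    lineup_1: Tuple[str, ...] = ()   # optional, for player-level mapping
    lineup_2: Tuple[str, ...] = ()


record = EventRecord(
    source="provider_a",
    tournament="Liga Portugal 2",
    competitor_1="Team A",
    competitor_2="Team B",
    start_time=datetime(2024, 5, 1, 18, 0),
)
```

Line-ups default to empty tuples because the article treats team composition as optional.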

While the basic information is sufficient, providing more detailed data can significantly improve the system’s performance. Examples include the total number of kills in a MOBA game, the number of towers destroyed by each team in Dota 2, and round scores in CS.

This extra data not only accelerates the mapping process but also improves the efficiency of mapping new teams and players. By leveraging this enriched dataset, the system can map entities faster and ensure greater adaptability when encountering previously unseen competitors or team compositions.

Data preparation

When integrating new sports or data providers, some initial work is required to ensure the data meets the system’s needs. Most of these checks are automated, and if no issues arise with the mapping, no manual intervention is needed. Preprocessing ensures consistency and readiness for mapping by trimming historical data, normalizing text, and cleaning sport-specific identifiers.

This ensures that the system can smoothly adapt to new sources without significant delays or manual effort, making the process more efficient. 
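As an illustration of the text normalization step, here is a minimal Python sketch. The function name `normalize_name` and the specific rules (accent stripping, case folding, punctuation removal) are assumptions for the example; the article does not specify which rules the real system applies.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Return a canonical form of a team or tournament name (illustrative rules)."""
    # Split accented letters into base letter + combining mark, then drop the marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Case-insensitive comparison.
    text = text.casefold()
    # Replace punctuation with spaces, then collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())
```

For example, `normalize_name("  Atlético   Madrid ")` returns `"atletico madrid"`. Normalization of this kind only removes surface noise; it does not resolve sponsor names or abbreviations.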

Cold start mapping

Cold Start is an approach used for the initial mapping (matching) of sports events when the system does not yet have enough data or records. This is particularly important when new data sources are connected. Since there is insufficient historical data or information for mapping, alternative methods are used, such as:

  • Exact names mapping

The system searches for events where the names of both teams and the tournament match exactly. This is the quickest and most efficient method, especially for large sports where the exact names are critical. In general, it covers approximately 5-10% of competitors, depending on the provider and the sport.

  • Frequency mapping

When exact matches aren’t found, the system can determine matches based on how often two teams appear at the same time. This allows the system to gather a list of potential candidates for mapping, even if there are differences in spelling or the use of IDs instead of human-readable names.

  • Manual mapping

If automatic methods fail, the initial 5% of teams must be manually matched. Once this is done, the system can perform more efficiently in future mappings.
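The two automatic cold start methods above can be sketched in Python as follows. This is an interpretation under stated assumptions, not the production logic: events are plain tuples of (tournament, competitor 1, competitor 2, start time), “appear at the same time” is read as sharing an identical start time, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def exact_name_matches(events_a, events_b):
    """Names from source A whose tournament and both team names also occur in B."""
    keys_b = {(t, frozenset((x, y))) for t, x, y, _ in events_b}
    matched = set()
    for t, x, y, _ in events_a:
        if (t, frozenset((x, y))) in keys_b:  # team order may differ
            matched.update((t, x, y))
    return matched


def frequency_candidates(events_a, events_b):
    """Count how often a name from A and a name from B play at the same time."""
    names_by_time = defaultdict(list)
    for _, x, y, start in events_b:
        names_by_time[start].extend((x, y))
    counts = Counter()
    for _, x, y, start in events_a:
        for name_a in (x, y):
            for name_b in names_by_time.get(start, ()):
                counts[(name_a, name_b)] += 1
    return counts


t1, t2, t3 = (datetime(2024, 5, day, 18) for day in (1, 2, 3))
source_a = [
    ("ESL Pro League", "Natus Vincere", "G2", t1),
    ("ESL Pro League", "Natus Vincere", "FaZe", t2),
    ("Major", "Astralis", "Vitality", t3),
]
source_b = [
    ("ESL PL", "NaVi", "G2 Esports", t1),
    ("ESL PL", "NaVi", "FaZe Clan", t2),
    ("Major", "Vitality", "Astralis", t3),
]

exact = exact_name_matches(source_a, source_b)
counts = frequency_candidates(source_a, source_b)
```

With this toy data, only the “Major” event matches exactly, while the pair (“Natus Vincere”, “NaVi”) co-occurs twice and every other pair only once. The counts produce candidates, not final mappings; ambiguous pairs still need further evidence.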

The core of the mapping system: statistical methods

After obtaining the initial mapping, we apply various methods to extend it. Most of these methods are based on statistical approaches. Here, we introduce the concept of a purification process, akin to Dante’s “Purgatory”.

Considering the human factor, the purification process requires each candidate to reach a specified number of matches before it is considered reliably mapped. This threshold is defined individually for each sport and serves as one of the system’s hyperparameters.
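A minimal Python sketch of such a threshold gate is shown below. The class name `Purgatory`, its methods, and the threshold value of 3 are assumptions made for illustration; the article only states that a per-sport threshold exists.

```python
from collections import Counter


class Purgatory:
    """Holds candidate pairs until they have been observed often enough."""

    def __init__(self, threshold: int):
        self.threshold = threshold      # per-sport hyperparameter
        self.observations = Counter()   # (name_a, name_b) -> supporting matches

    def observe(self, name_a: str, name_b: str) -> bool:
        """Record one supporting match; return True once the pair is released."""
        self.observations[(name_a, name_b)] += 1
        return self.observations[(name_a, name_b)] >= self.threshold


gate = Purgatory(threshold=3)  # hypothetical value
released = [gate.observe("Natus Vincere", "NaVi") for _ in range(3)]
```

Here the pair is held back for the first two supporting matches and released on the third. A higher threshold trades mapping speed for safety.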

Confidence mapper method

This mapper is used when two elements of a sports event (tournament name, team names, or others) have already been mapped. Each candidate for the remaining unmapped element goes through the purification process to ensure accurate mapping.
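One way to picture this rule is the following Python sketch. It makes several simplifying assumptions that are not in the article: each event is a tuple (tournament, competitor 1, competitor 2), the two events have already been paired and list their elements in the same order, and a single dictionary maps names from source A to names in source B.

```python
def confidence_candidate(event_a, event_b, mapping):
    """Return the single unmapped (name_a, name_b) pair, or None.

    A candidate is produced only when all other elements of the two events
    are already mapped to each other.
    """
    unresolved = [(a, b) for a, b in zip(event_a, event_b) if mapping.get(a) != b]
    if len(unresolved) != 1:
        return None                # nothing to learn, or too little is known
    name_a, name_b = unresolved[0]
    if name_a in mapping:
        return None                # already mapped elsewhere: a conflict
    return name_a, name_b


mapping = {"ESL Pro League": "ESL PL", "Natus Vincere": "NaVi"}
candidate = confidence_candidate(
    ("ESL Pro League", "Natus Vincere", "G2"),
    ("ESL PL", "NaVi", "G2 Esports"),
    mapping,
)
```

In this example the tournament and one team are already mapped, so the remaining pair (“G2”, “G2 Esports”) becomes a candidate. Under the article’s design it would then enter the purification process rather than being accepted immediately.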

Mapped vs exact name mapper method

This mapper triggers when one entity is already mapped, and the other has an exact name match.

Iteration mapper method

This method involves the sequential application of previous approaches. By iterating, you gradually associate more and more events with known entities. The process continues as long as new entities are added to the mapping. Once the system exhausts its potential, the process is concluded.
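The iteration can be sketched as a loop that runs until no mapper adds anything new. The Python example below uses one illustrative rule, a simplified reading of the “mapped vs exact name” mapper, and skips the purification step for brevity. All function names, the paired-event tuples, and the single name-to-name dictionary are assumptions.

```python
def mapped_vs_exact(paired_events, mapping):
    """If one element of an event pair is mapped, accept exact-name elements."""
    found = []
    for event_a, event_b in paired_events:
        pairs = list(zip(event_a, event_b))
        if not any(mapping.get(a) == b for a, b in pairs):
            continue  # no anchor: nothing in this event is mapped yet
        found.extend((a, b) for a, b in pairs if a == b and a not in mapping)
    return found


def iterate_mappers(paired_events, initial_mapping, mappers):
    """Apply all mappers repeatedly until a full pass adds no new entity."""
    mapping = dict(initial_mapping)
    while True:
        proposals = [p for mapper in mappers for p in mapper(paired_events, mapping)]
        fresh = {a: b for a, b in proposals if a not in mapping}
        if not fresh:
            return mapping  # potential exhausted
        mapping.update(fresh)


paired_events = [
    (("IEM Cologne", "G2", "FaZe"), ("IEM Koeln", "G2", "FaZe")),
    (("ESL Pro League", "Natus Vincere", "G2"), ("ESL PL", "NaVi", "G2")),
]
result = iterate_mappers(paired_events, {"ESL Pro League": "ESL PL"}, [mapped_vs_exact])
```

In the first pass only “G2” is mapped, through the already mapped tournament in the second event. That new anchor lets the second pass map “FaZe” in the first event, and the third pass finds nothing new and stops. “Natus Vincere” remains unmapped because its names differ, which is where a rule such as the confidence mapper would be needed.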

Manual mapping

The system allows relationships to be added manually. This is useful, for example, when you want to quickly establish a connection between teams instead of waiting for a sufficient number of matches to pass through the purification process, or when an error is detected in the system (such as the creation of a large number of invalid events in one of the data sources).

There are two options available:

  • UI mapping: Users can select which entities (teams, tournaments, players) from one source correspond to entities in another. Alternatively, users can ban a particular relationship. In such cases, even if the system identifies this connection, it will be ignored. This method is especially useful when a data source overwrites an existing entity with new information instead of creating a new one, leading to inconsistencies.
  • Semi-manual mapping: This approach is partly automated. It involves monitoring sources that have already been connected to each other. In these cases, the connection is passed through the purification process for further validation.
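The effect of manual links and bans on the automatic result can be sketched in Python as below. The class `ManualOverrides` and its methods are hypothetical; the sketch only illustrates the behavior described above, where a banned relationship is ignored even if the system identifies it.

```python
class ManualOverrides:
    """Manual links and bans layered on top of the automatic mapping."""

    def __init__(self):
        self.forced = {}     # name_a -> name_b, set by a user
        self.banned = set()  # (name_a, name_b) pairs that must never be used

    def link(self, name_a: str, name_b: str) -> None:
        self.forced[name_a] = name_b

    def ban(self, name_a: str, name_b: str) -> None:
        self.banned.add((name_a, name_b))

    def apply(self, automatic: dict) -> dict:
        """Manual links override automatic ones; banned pairs are dropped."""
        merged = {**automatic, **self.forced}
        return {a: b for a, b in merged.items() if (a, b) not in self.banned}


overrides = ManualOverrides()
overrides.ban("Dynamo", "Dynamo Moscow")
overrides.link("Na'Vi", "Natus Vincere")

final = overrides.apply({"Dynamo": "Dynamo Moscow", "G2": "G2 Esports"})
```

Here the banned “Dynamo” relationship disappears from the final mapping, the manual “Na'Vi” link is added, and the untouched automatic link for “G2” passes through unchanged.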

The sports event mapping system is a crucial element in ensuring the accuracy and consistency of data coming from various sources. By automating the mapping process and combining methods such as exact name matching, frequency mapping, and statistical approaches, the system achieves high efficiency, guaranteeing 99.9% accuracy when integrating data. However, the process also involves verification steps, including manual mapping in cases where automatic methods fail or when quick connections between new teams and tournaments need to be established.

The more data sources connected to the system, the more accurate and complete the event catalog becomes, as pairwise mapping allows for the comparison and reconciliation of different versions of the same events from multiple sources. This approach helps avoid errors related to format discrepancies, spelling differences, or variations in names. It ensures more accurate data interpretation and higher overall system performance.

Additional data provided by users can significantly improve mapping efficiency, enabling the system to adapt more quickly to new sources or teams. The system also accounts for sport-specific identifiers, making it flexible and ready for integration with new data. While the initial stages of mapping may require manual intervention, over time, the system becomes more autonomous and capable of mapping with high accuracy even in new or less commonly used sources.

Overall, this system is vital for sports analytics and betting. It provides users with accurate and timely information about events, tournaments, and teams, which directly impacts the quality of predictions and betting strategies. 

By integrating more data sources and refining its processes, this system has the potential to revolutionize sports analytics and betting, setting new standards for accuracy and adaptability.
