The Importance of Reliable Sources in Threat Intelligence
Active, reliable blocklist sources are essential in threat intelligence: they provide real-time data on malicious IPs, domains, URLs, and files that pose security risks. By regularly updating and verifying these sources, security systems can quickly identify and block emerging threats, reducing the attack surface and safeguarding networks from cyber-attacks. Consistent, trustworthy blocklists allow for proactive threat mitigation, enhancing an organization’s resilience against a constantly evolving threat landscape.
In this report, I describe my process for finding reliable new sources for OneFirewall’s database. I also explain my strategy for assigning values to these sources, such as confidence scores, update frequencies, and thresholds, so that only the most relevant data informs our security measures. I then give an overview of the newly added sources and the role each plays in strengthening threat detection. Finally, I cover other new implementations designed to improve the efficiency and accuracy of data parsing, with a dedicated section on the recent addition of domain feeds, which expands our coverage of potentially harmful web entities.
Following the introduction of new sources, enhancements to the parsing logic, and the implementation of threshold mechanisms, OneFirewall’s database became substantially more comprehensive. The total number of low-risk entries increased by approximately 17%, indicating broader coverage of lower-priority threats. Medium-risk entries increased by nearly 61% and high-risk entries by 46%, reflecting improved identification of moderately severe and severe threats.
Finding Sources
The identification of reliable sources required a structured, multi-step approach, combining broad research and targeted validation:
- Initial Broad Research: I began by investigating how to find reliable sources, focusing on aggregated threat intelligence repositories, open blocklist databases, and community-driven feeds. These broad searches helped uncover a diverse range of potential sources.
- Targeted Evaluation: After compiling a list, I conducted a detailed evaluation of each source. This process included analyzing the credibility of the organization or individual maintaining the source, update frequency, and the type of data provided (e.g., IPs, domains, URLs). To balance the database, I included both established, highly reliable feeds and smaller, niche sources such as GitHub repositories, which often contained unique data.
- Cross-Referencing with Existing Feeds: To ensure that OneFirewall’s database remained efficient and free of redundancy, I cross-referenced newly identified feeds against the existing database. This allowed me to pinpoint gaps and prioritize sources that introduced new, non-overlapping information.
- Validation through Consistency Checks: I cross-referenced blocked IPs between sources, focusing on overlaps with trusted feeds. Higher consistency increased a source’s credibility, while unique data points were further validated before inclusion.
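The cross-referencing and consistency checks above can be illustrated with plain set operations. This is a minimal sketch rather than the actual OneFirewall logic: the function names `overlap_ratio` and `novel_entries` are hypothetical, and the sample addresses come from ranges reserved for documentation.

```python
def overlap_ratio(candidate, trusted):
    """Fraction of the candidate feed's entries that also appear in the trusted set."""
    candidate = set(candidate)
    if not candidate:
        return 0.0
    return len(candidate & set(trusted)) / len(candidate)


def novel_entries(candidate, existing):
    """Entries the candidate feed would add that the existing database lacks."""
    return set(candidate) - set(existing)


# Hypothetical sample data (documentation-reserved IP ranges).
trusted_feed = {"192.0.2.1", "192.0.2.2", "198.51.100.7"}
new_feed = ["192.0.2.1", "198.51.100.7", "203.0.113.9", "203.0.113.10"]

ratio = overlap_ratio(new_feed, trusted_feed)   # share of entries confirmed elsewhere
unique = novel_entries(new_feed, trusted_feed)  # entries needing further validation
```

A higher ratio suggests the candidate agrees with trusted feeds, while the set of novel entries shows what the source would actually add to the database.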
Assigning Values to New Sources
Assigning values such as confidence scores, update frequency, and thresholds to each source was a critical step in integrating them effectively into the system:
- Confidence Scores: Initially, confidence scores were assigned based on perceived reliability, ranging from 0.1 to 0.9. As the database grew, I recalibrated these scores by comparing them with existing OneFirewall data. This process revealed that most new sources required lower scores, typically around 0.2, to align with the broader database. This adjustment ensured consistency and avoided over-reliance on unproven feeds.
- Update Frequency: For each source, I analyzed update frequencies using documentation, GitHub commit logs, or direct monitoring. Sources labeled as “active” but with infrequent or outdated updates were flagged for periodic review. Feeds with large, cumulative blocklists required additional adjustments, as their growing datasets could impact performance. For these, I implemented reduced upload frequencies without compromising data relevance.
- Thresholds: To manage data volume effectively, I introduced a threshold system across all sources. Thresholds were tailored to each source by analyzing its recent upload history and setting limits slightly above the average data size. For example, if a source consistently uploaded around 800 entries, a threshold of 1000 ensured flexibility while preventing overloads. The logic also prioritized the newest data by retaining the last X entries in cases of cutoff, ensuring the latest updates were captured.
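One way to derive a per-source threshold from recent upload history is sketched below. The 25% margin and the rounding to the nearest hundred are assumptions chosen so that the 800-entry example above yields 1000; the report does not state the exact rule used, and `suggest_threshold` is a hypothetical name.

```python
import math


def suggest_threshold(recent_sizes, margin=0.25, round_to=100):
    """Return an entry limit slightly above the average of recent upload sizes.

    margin and round_to are illustrative assumptions, not documented values.
    """
    if not recent_sizes:
        raise ValueError("need at least one recent upload size")
    average = sum(recent_sizes) / len(recent_sizes)
    padded = average * (1 + margin)
    # Round up to a clean multiple so small fluctuations stay under the limit.
    return int(math.ceil(padded / round_to) * round_to)


# Hypothetical upload history for a source that averages 800 entries.
recent_uploads = [790, 805, 812, 798, 795]
threshold = suggest_threshold(recent_uploads)
```

Basing the limit on an average plus headroom lets a source fluctuate naturally while still capping unusually large uploads.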
Newly Added Sources
The inclusion of new feeds significantly enhanced OneFirewall’s database, with approximately 40 new sources added during this phase. These sources were carefully selected for their ability to address gaps in the existing database and provide diverse, actionable threat intelligence. The new additions contributed to a notable increase in OneFirewall’s scope, particularly in medium- and high-risk categories, which grew by 61% and 46%, respectively. Some notable examples include:
- Anti Attacks: This feed has already contributed over 1.5 million unique IPs in the short time it has been active, demonstrating its value as a high-volume source of malicious IPs.
- Specialized Niche Feeds: Community-maintained blocklists that provide unique, often overlooked data points, complementing larger, well-established sources.
- Broad Aggregators: Consolidated repositories that integrate multiple datasets, enhancing the diversity and depth of the threat intelligence database.
Modified Existing Sources
To improve the efficiency and relevance of the existing database, several feeds were reviewed and updated. Key improvements included:
- Confidence Adjustments: Increased confidence values for X sources after validating their consistency with other feeds.
- Threshold Implementation: Applied thresholds to all sources, optimizing their contributions without overloading the system.
New Implementations for Sources in Other Formats
To accommodate sources in diverse formats, I developed new parsing logic:
- Compressed Files
  - Introduced separate parsing logic for .zip and .gz files.
  - Modified the initialization code to integrate these parsers, enabling the system to unpack and process data from compressed archives.
- Domain Feeds
  - Developed a parser to process domain-specific blocklists, enabling OneFirewall to handle malicious domains and extend its coverage beyond IP feeds.
  - This logic was built to complement the existing threat intelligence database and improve correlation between domains and other threat indicators.
- File/Hash Feeds
  - Implemented parsing logic to process feeds containing malicious file hashes (e.g., SHA256, MD5) and filenames.
  - Integrated these feeds with the existing database to strengthen OneFirewall’s ability to detect file-based threats and correlate them with IPs and domains.
- Script Optimization
  - Refactored the overall script and code structure to make execution more streamlined and intuitive.
  - Reorganized and restructured key components to improve readability, reduce redundancies, and enhance performance.
  - These changes increased the efficiency of data processing, making the system more adaptable to evolving requirements.
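For the compressed-file handling described in this section, a minimal sketch using only the Python standard library might look like the following. The function names and the dispatch on file extension are assumptions for illustration; the actual parser names and integration points are not given in this report.

```python
import gzip
import io
import zipfile


def _clean_lines(text):
    """Split text into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_gz_lines(data: bytes):
    """Decompress a .gz payload and return its non-empty text lines."""
    return _clean_lines(gzip.decompress(data).decode("utf-8", errors="replace"))


def read_zip_lines(data: bytes):
    """Return non-empty text lines from every file inside a .zip payload."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            text = archive.read(name).decode("utf-8", errors="replace")
            lines.extend(_clean_lines(text))
    return lines


def read_feed_lines(filename: str, data: bytes):
    """Pick a reader based on the file extension; plain text is the fallback."""
    if filename.endswith(".gz"):
        return read_gz_lines(data)
    if filename.endswith(".zip"):
        return read_zip_lines(data)
    return _clean_lines(data.decode("utf-8", errors="replace"))


# Build small in-memory samples (documentation-reserved IPs) to exercise both paths.
sample = b"192.0.2.1\n192.0.2.2\n\n"
gz_payload = gzip.compress(sample)
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("feed.txt", sample)
zip_payload = zip_buffer.getvalue()

gz_lines = read_feed_lines("feed.gz", gz_payload)
zip_lines = read_feed_lines("feed.zip", zip_payload)
```

Working on in-memory bytes keeps the readers independent of how a feed was downloaded, so the same functions can sit behind any fetch step.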
Threshold Logic and Importance
The introduction of threshold logic addressed the challenges posed by high-volume sources and cumulative blocklists:
- Performance Optimization: Thresholds limited the number of entries each source could contribute per update, preventing system overloads while ensuring relevance.
- Dynamic Tailoring: Thresholds were customized for each source based on its average data volume. This ensured flexibility for natural fluctuations while maintaining system efficiency.
- Prioritizing Recent Data: The logic prioritized the newest entries by retaining the last X elements during cutoff scenarios. This ensured that updates always reflected the latest data while trimming older, less relevant information.
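The cutoff behavior described above can be sketched as a simple slice. This assumes each feed lists its entries oldest first, with new entries appended at the end; that ordering is an assumption and would need to be confirmed per source. `apply_threshold` is a hypothetical name.

```python
def apply_threshold(entries, threshold):
    """Keep at most `threshold` entries, preferring those at the end of the list.

    Assumes the feed is ordered oldest-first, so the tail holds the newest data.
    """
    if threshold <= 0:
        return []  # guard: entries[-0:] would otherwise return the whole list
    return list(entries[-threshold:])


# Hypothetical feed of ten entries, oldest first.
feed = [f"192.0.2.{i}" for i in range(1, 11)]
kept = apply_threshold(feed, 3)
```

A feed shorter than its threshold passes through unchanged, so the limit only takes effect when a source exceeds its usual volume.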
New Feeds for Domains
Domain feeds represent a critical addition to OneFirewall’s capabilities, allowing it to address web-based threats more effectively:
- Logic Development: Introduced parsing logic to extract and validate domain data.
- Integration with Existing Feeds: Domain feeds were cross-referenced with IP blocklists to identify overlaps and strengthen correlations.
- Scalability: The new parser supports domain-specific sources while maintaining compatibility with existing logic.
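A domain parser along these lines could extract and validate entries as sketched below. The regular expression is a deliberately simplified hostname check (it rejects bare IP addresses and internationalized TLDs, for instance), and the handling of comments and hosts-file style lines is an assumption about common blocklist layouts, not a description of the actual OneFirewall parser.

```python
import re

# Simplified hostname pattern: dot-separated labels ending in an alphabetic TLD.
# This is an illustrative assumption, not a complete RFC-compliant validator.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def parse_domain_line(line):
    """Extract a normalized domain from one feed line, or None if invalid."""
    line = line.split("#", 1)[0].strip().lower()  # drop trailing comments
    if not line:
        return None
    # Hosts-file style lines ("0.0.0.0 bad.example.com") keep the domain last.
    token = line.split()[-1].rstrip(".")
    return token if DOMAIN_RE.match(token) else None


def parse_domain_feed(text):
    """Return unique valid domains in first-seen order."""
    seen = set()
    result = []
    for raw in text.splitlines():
        domain = parse_domain_line(raw)
        if domain and domain not in seen:
            seen.add(domain)
            result.append(domain)
    return result


# Hypothetical feed content mixing comments, hosts-style lines, and noise.
sample_feed = """# comment line
Bad.Example.com
0.0.0.0 malware.example.net
not a domain!
192.0.2.1
bad.example.com
"""
domains = parse_domain_feed(sample_feed)
```

Normalizing to lowercase before deduplicating keeps differently cased copies of the same domain from inflating the feed.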
New Feeds for Files/Hashes
The addition of file and hash feeds expands OneFirewall’s capabilities to address file-based threats, covering malicious file hashes (such as SHA256 and MD5) and filenames. To process these feeds, I introduced a parser specifically designed to handle and extract file-related data.
- Logic Development: The parser processes file hashes and names from various source formats, including .json and .txt.
- Integration with Existing Feeds: File feeds were cross-referenced with IP and domain feeds to identify overlaps, improving threat correlation.
- Scalability: The new parser supports file-specific feeds from both open-source and proprietary platforms, strengthening OneFirewall’s protection against file-based attack vectors such as malware and ransomware.
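As an illustration of the file/hash parsing described above, the sketch below classifies hashes by hex length and reads them from .txt and .json input. The JSON layout (an array of objects) and the key names `hash` and `filename` are hypothetical, since feed schemas vary by provider. Length-based classification is only a heuristic, because other digest algorithms share the same lengths.

```python
import hashlib
import json
import re

HEX_RE = re.compile(r"^[0-9a-f]+$")
HASH_LENGTHS = {32: "md5", 64: "sha256"}  # hex digits -> assumed algorithm


def classify_hash(value):
    """Return 'md5' or 'sha256' based on hex length, else None."""
    value = value.strip().lower()
    if not HEX_RE.match(value):
        return None
    return HASH_LENGTHS.get(len(value))


def parse_txt_hashes(text):
    """Parse one hash per line, ignoring comments and invalid lines."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        kind = classify_hash(line) if line else None
        if kind:
            records.append({"type": kind, "hash": line.lower()})
    return records


def parse_json_hashes(text, hash_key="hash", name_key="filename"):
    """Parse a JSON array of objects; the key names are hypothetical."""
    records = []
    for item in json.loads(text):
        value = str(item.get(hash_key, "")).strip().lower()
        kind = classify_hash(value)
        if kind:
            records.append(
                {"type": kind, "hash": value, "filename": item.get(name_key)}
            )
    return records


# Sample digests generated locally so their lengths are guaranteed valid.
md5_value = hashlib.md5(b"sample").hexdigest()
sha_value = hashlib.sha256(b"sample").hexdigest()

txt_records = parse_txt_hashes(
    f"# header\n{md5_value}\nnot-a-hash\n{sha_value.upper()}\n"
)
json_records = parse_json_hashes(
    json.dumps([{"hash": sha_value, "filename": "dropper.exe"}, {"hash": "xyz"}])
)
```

Both readers emit the same record shape, which would make it simpler to cross-reference file indicators with IP and domain data downstream.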