The Importance of Reliable Sources in Threat Intelligence

Active, reliable sources of blocklists are essential in threat intelligence, providing real-time data on malicious IPs, domains, URLs, and files that pose security risks. By regularly updating and verifying these sources, security systems can quickly identify and block emerging threats, reducing the attack surface and safeguarding networks from cyber-attacks. Consistent and trustworthy blocklists allow for proactive threat mitigation, enhancing an organization’s resilience against a constantly evolving threat landscape.

In this report, I will walk through my process of finding reliable new sources to expand OneFirewall’s database. I will also explain my strategy for assigning values to these sources, such as confidence scores, update frequencies, and thresholds, to ensure that only the most relevant data informs our security measures. Then, I will provide a comprehensive overview of all my newly added sources and explain the specific role each plays in strengthening threat detection. Additionally, I’ll cover other new implementations designed to optimize the efficiency and accuracy of data parsing, with final sections dedicated to the recent addition of domain feeds and file/hash feeds, which expand our coverage to potentially harmful web entities and malicious files.

Following the introduction of new sources, enhancements to parsing logic, and implementation of threshold mechanisms, OneFirewall observed a substantial increase in the comprehensiveness of its database. This included an approximate 17% increase in the total number of low-risk entries, which is indicative of broader coverage of lower-priority threats. Additionally, medium- and high-risk entries saw significant growth, with medium-risk entries increasing by nearly 61% and high-risk entries by 46%, highlighting improved identification of moderately severe and severe threats.

Finding Sources

The identification of reliable sources required a structured, multi-step approach, combining broad research and targeted validation:

  1. Initial Broad Research: I began by investigating how to find reliable sources, focusing on aggregated threat intelligence repositories, open blocklist databases, and community-driven feeds. These broad searches helped uncover a diverse range of potential sources.

  2. Targeted Evaluation: After compiling a list, I conducted a detailed evaluation of each source. This process included analyzing the credibility of the organization or individual maintaining the source, update frequency, and the type of data provided (e.g., IPs, domains, URLs). To balance the database, I included both established, highly reliable feeds and smaller, niche sources such as GitHub repositories, which often contained unique data.

  3. Cross-Referencing with Existing Feeds: To ensure that OneFirewall’s database remained efficient and free of redundancy, I cross-referenced newly identified feeds against the existing database. This allowed me to pinpoint gaps and prioritize sources that introduced new, non-overlapping information.

  4. Validation through Consistency Checks: I cross-referenced blocked IPs between sources, focusing on overlaps with trusted feeds. Higher consistency increased a source’s credibility, while unique data points were further validated before inclusion.
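The cross-referencing and consistency checks in steps 3 and 4 can be sketched as simple set operations. This is a minimal illustration, assuming each feed has already been loaded as a set of indicator strings; the function name evaluate_candidate and its ratio fields are hypothetical and are not taken from OneFirewall’s code.

```python
from __future__ import annotations


def evaluate_candidate(candidate: set[str], existing: set[str], trusted: set[str]) -> dict:
    """Summarize what a candidate feed adds and how well it agrees with trusted feeds.

    Hypothetical helper: feeds are assumed to be sets of indicator strings.
    """
    size = len(candidate) or 1  # avoid dividing by zero for an empty feed
    new_entries = candidate - existing  # gaps: indicators not yet in the database
    corroborated = candidate & trusted  # consistency: indicators also listed by trusted feeds
    return {
        "new_entries": new_entries,
        "novelty_ratio": len(new_entries) / size,
        "trusted_overlap_ratio": len(corroborated) / size,
        # Unique, uncorroborated indicators that would need further validation before inclusion
        "needs_review": candidate - existing - trusted,
    }


# Example with documentation-range IPs
report = evaluate_candidate(
    candidate={"198.51.100.7", "203.0.113.9", "192.0.2.44"},
    existing={"192.0.2.44"},
    trusted={"203.0.113.9", "192.0.2.44"},
)
print(report["needs_review"])  # {'198.51.100.7'}
```

A high novelty ratio signals that a feed fills gaps, while a high trusted-overlap ratio supports its credibility; entries in needs_review correspond to the unique data points that step 4 says were validated separately.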

Through this process, I expanded the OneFirewall database by approximately 17% in low-risk entries and achieved significant growth in the medium-risk (61%) and high-risk (46%) categories, while maintaining a balanced mix of well-established and specialized sources. The following bar graphs compare the number of entries at each risk level before and after my changes.

Assigning Values to New Sources

Assigning values such as confidence scores, update frequency, and thresholds to each source was a critical step in integrating them effectively into the system:

  1. Confidence Scores: Initially, confidence scores were assigned based on perceived reliability, ranging from 0.1 to 0.9. As the database grew, I recalibrated these scores by comparing them with existing OneFirewall data. This process revealed that most new sources required lower scores, typically around 0.2, to align with the broader database. This adjustment ensured consistency and avoided over-reliance on unproven feeds.

  2. Update Frequency: For each source, I analyzed update frequencies using documentation, GitHub commit logs, or direct monitoring. Sources labeled as “active” but with infrequent or outdated updates were flagged for periodic review. Feeds with large, cumulative blocklists required additional adjustments, as their growing datasets could impact performance. For these, I implemented reduced upload frequencies without compromising data relevance.

  3. Thresholds: To manage data volume effectively, I introduced a threshold system across all sources. Thresholds were tailored to each source by analyzing its recent upload history and setting limits slightly above the average data size. For example, if a source consistently uploaded around 800 entries, a threshold of 1000 ensured flexibility while preventing overloads. The logic also prioritized the newest data by retaining the last X entries in cases of cutoff, ensuring the latest updates were captured.
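As a rough illustration of the threshold rule in step 3, the sketch below derives a per-source cap from recent upload sizes. The 1.25 headroom factor is an assumption, chosen only so that an average of 800 entries maps to the threshold of 1000 in the example above; derive_threshold and its fallback minimum of 100 are likewise hypothetical.

```python
from __future__ import annotations

import math


def derive_threshold(recent_upload_sizes: list[int], headroom: float = 1.25, minimum: int = 100) -> int:
    """Return a per-source cap set slightly above the source's average upload size.

    headroom and minimum are illustrative defaults, not values taken from OneFirewall.
    """
    if not recent_upload_sizes:
        return minimum  # no upload history yet: fall back to a default cap
    average = sum(recent_upload_sizes) / len(recent_upload_sizes)
    # Round up so the cap never falls below average * headroom
    return max(minimum, math.ceil(average * headroom))
```

For instance, uploads of 780, 810, 790, and 820 entries average 800, which yields a threshold of 1000. Scaling by a factor rather than adding a fixed margin keeps the allowance for natural fluctuation proportional to each source’s typical volume.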

The refined scoring system, combined with the new parsing logic and thresholds, played a critical role in expanding OneFirewall’s database. After implementation, the number of critical-risk entries increased by 168%, nearly tripling in volume. This suggests that the system now captures more high-impact, urgent threats, likely due to enhanced validation and better handling of cumulative blocklists, and it directly improves the platform’s ability to flag and address urgent risks in real time.

Newly Added Sources

The inclusion of new feeds significantly enhanced OneFirewall’s database, with approximately 40 new sources added during this phase. These sources were carefully selected for their ability to address gaps in the existing database and provide diverse, actionable threat intelligence. The new additions contributed to a notable increase in OneFirewall’s scope, particularly in medium- and high-risk categories, which grew by 61% and 46%, respectively.

Some notable examples include:

  • Anti Attacks: This feed has already contributed over 1.5 million unique IPs in the short time it has been active, demonstrating its value as a high-volume source of malicious IPs.

  • Specialized Niche Feeds: Community-maintained blocklists that provide unique, often overlooked data points, complementing larger, well-established sources.

  • Broad Aggregators: Consolidated repositories that integrate multiple datasets, enhancing the diversity and depth of the threat intelligence database.

Each source was rigorously evaluated for consistency, credibility, and relevance. Through validation and cross-referencing, these new feeds were seamlessly integrated, ensuring a comprehensive and balanced dataset capable of addressing threats across all risk categories.

Modified Existing Sources

To improve the efficiency and relevance of the existing database, several feeds were reviewed and updated. Key improvements included:

  • Confidence Adjustments: Increased confidence values for X sources after validating their consistency with other feeds.

  • Threshold Implementation: Applied thresholds to all sources, optimizing their contributions without overloading the system.

  • Reactivations: Reinstated previously dormant sources after ensuring compatibility with new parsing logic.

New Implementations for Sources in Other Formats

To accommodate sources in diverse formats, I developed new parsing logic:

  1. Compressed Files

    • Introduced dedicated parsing logic to handle both .zip and .gz files.

    • Modified the initializing code to integrate these parsers, enabling the system to unpack and process data from compressed archives.

  2. Domain Feeds

    • Developed a parser to process domain-specific blocklists, enabling OneFirewall to handle malicious domains and extend its coverage beyond IP feeds.

    • This logic was built to complement the existing threat intelligence database and improve correlation between domains and other threat indicators.

  3. File/Hash Feeds

    • Implemented parsing logic to process feeds containing malicious file hashes (e.g., SHA256, MD5) and filenames.

    • Integrated these feeds with the existing database to strengthen OneFirewall’s ability to detect file-based threats and correlate them with IPs and domains.

  4. Script Optimization

    • Refactored the overall script and code structure to make execution more streamlined and intuitive.

    • Reorganized and restructured key components to improve readability, reduce redundancies, and enhance performance.

    • These changes increased the efficiency of data processing, making the system more adaptable to evolving requirements.

These implementations broadened OneFirewall’s capabilities, making it more versatile and adaptable to various data sources.
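The compressed-file handling described in item 1 can be approximated with Python’s standard gzip and zipfile modules. This is a sketch, assuming feeds are downloaded as raw bytes and contain UTF-8 text; extract_feed_text is a hypothetical helper, and formats such as .tar.gz would additionally need the tarfile module.

```python
import gzip
import io
import zipfile


def extract_feed_text(payload: bytes, filename: str) -> str:
    """Return the text of a downloaded feed, unpacking .gz or .zip archives first.

    Hypothetical helper: assumes UTF-8 content and decides the format by file extension.
    """
    name = filename.lower()
    if name.endswith(".gz"):
        # A .gz file wraps a single compressed stream
        return gzip.decompress(payload).decode("utf-8", errors="replace")
    if name.endswith(".zip"):
        # A .zip archive may hold several members; concatenate every regular file
        parts = []
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    parts.append(archive.read(info).decode("utf-8", errors="replace"))
        return "\n".join(parts)
    # Anything else is treated as plain text
    return payload.decode("utf-8", errors="replace")
```

Returning plain text lets the existing IP, domain, or hash parsers run unchanged on the unpacked content, which matches the idea of integrating the decompression step into the initializing code rather than into each parser.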

Threshold Logic and Importance

The introduction of threshold logic addressed the challenges posed by high-volume sources and cumulative blocklists:

  1. Performance Optimization: Thresholds limited the number of entries each source could contribute per update, preventing system overloads while ensuring relevance.

  2. Dynamic Tailoring: Thresholds were customized for each source based on its average data volume. This ensured flexibility for natural fluctuations while maintaining system efficiency.

  3. Prioritizing Recent Data: The logic prioritized the newest entries by retaining the last X elements during cutoff scenarios. This ensured that updates always reflected the latest data while trimming older, less relevant information.

The threshold system was applied across all sources, enhancing scalability and performance as the database grew.
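The cutoff behavior from point 3 can be expressed as a slice that keeps the tail of the list. This minimal sketch assumes that entries arrive ordered oldest to newest, as in an append-only cumulative blocklist, and that the retained count X equals the source’s threshold; apply_threshold is a hypothetical name.

```python
from __future__ import annotations


def apply_threshold(entries: list[str], threshold: int) -> list[str]:
    """Cap a source's contribution, keeping the newest entries when the cap is exceeded.

    Assumes entries are ordered oldest to newest, as in an append-only cumulative blocklist.
    """
    if threshold <= 0:
        return []  # guard: entries[-0:] would return the whole list
    if len(entries) <= threshold:
        return list(entries)  # within the cap: keep everything
    return entries[-threshold:]  # over the cap: keep only the last `threshold` entries
```

With a threshold of 1000, a cumulative list of 1,200 entries would be trimmed to its final 1000 lines, so the most recently appended indicators always survive the cutoff.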

New Feeds for Domains

Domain feeds represent a critical addition to OneFirewall’s capabilities, allowing it to address web-based threats more effectively:

  • Logic Development: Introduced parsing logic to extract and validate domain data.

  • Integration with Existing Feeds: Domain feeds were cross-referenced with IP blocklists to identify overlaps and strengthen correlations.

  • Scalability: The new parser supports domain-specific sources while maintaining compatibility with existing logic.

This addition expands OneFirewall’s threat intelligence scope, providing deeper insights into malicious domains and their associated risks.
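A domain parser along these lines needs to strip comments, handle hosts-file entries, and reject strings that are not valid hostnames. The sketch below is one possible approach, not OneFirewall’s actual parser: it assumes plain-text feeds with one domain per line or hosts-file lines beginning with 0.0.0.0 or 127.0.0.1, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending with a hyphen
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")
# Sinkhole addresses that commonly prefix hosts-file style blocklists
_HOSTS_PREFIXES = {"0.0.0.0", "127.0.0.1"}


def is_valid_domain(name: str) -> bool:
    """Check that a lowercase name looks like a fully qualified hostname."""
    if len(name) > 253 or "." not in name:
        return False
    labels = name.split(".")
    # Reject names whose last label is numeric so that bare IPs are not treated as domains
    return all(_LABEL.match(label) for label in labels) and not labels[-1].isdigit()


def parse_domain_feed(text: str) -> list[str]:
    """Extract unique, valid domains from a plain-text or hosts-style blocklist."""
    domains: list[str] = []
    seen: set[str] = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip().lower()  # drop comments and normalize case
        if not line:
            continue
        fields = line.split()
        if len(fields) > 1 and fields[0] in _HOSTS_PREFIXES:
            candidate = fields[1]  # hosts-file line such as "0.0.0.0 bad.example"
        else:
            candidate = fields[0]
        candidate = candidate.rstrip(".")  # tolerate names written with a trailing dot
        if is_valid_domain(candidate) and candidate not in seen:
            seen.add(candidate)
            domains.append(candidate)
    return domains
```

Cross-referencing the extracted domains with IP blocklists, as described above, would additionally require resolving each domain, which this sketch leaves out.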

New Feeds for Files/Hashes

The addition of file and hash feeds expands OneFirewall’s capabilities to address file-based threats, including malicious file hashes (such as SHA256 and MD5) and filenames. To process these feeds, I introduced a parser specifically designed to handle and extract file-related data.

  • Logic Development: The parser processes file hashes and names from various source formats, including .json and .txt.

  • Integration with Existing Feeds: File feeds were cross-referenced with IP and domain feeds to identify overlaps, improving threat correlation.

  • Scalability: The new parser supports file-specific feeds from both open-source and proprietary platforms, strengthening OneFirewall’s protection against file-based attack vectors such as malware and ransomware.

This implementation enhances OneFirewall’s ability to correlate threats across multiple dimensions, providing deeper insights into malicious file activity.
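To illustrate how a file/hash parser might handle both .txt and .json sources, the sketch below classifies hex digests by length (32 characters for MD5, 64 for SHA256) and keeps any accompanying filename. The JSON layout (an array of objects with hash and filename keys), the text layout (a hash optionally followed by a filename), and the function names are all assumptions.

```python
from __future__ import annotations

import json
import re
from typing import Optional

# Hex digest length -> hash type (MD5 = 32 hex characters, SHA256 = 64)
_HASH_TYPES = {32: "md5", 64: "sha256"}
_HEX = re.compile(r"^[0-9a-f]+$")


def classify_hash(value: str) -> Optional[str]:
    """Return "md5" or "sha256" for a well-formed hex digest, else None."""
    value = value.strip().lower()
    return _HASH_TYPES.get(len(value)) if _HEX.match(value) else None


def parse_hash_feed(text: str, fmt: str) -> list[dict]:
    """Parse a .json or .txt hash feed into {"hash", "type", "filename"} records."""
    rows: list[tuple[str, Optional[str]]] = []
    if fmt == "json":
        # Hypothetical schema: [{"hash": "...", "filename": "..."}, ...]
        for item in json.loads(text):
            rows.append((str(item.get("hash", "")), item.get("filename")))
    else:
        # Hypothetical text layout: "<hash> [filename]" per line, "#" for comments
        for line in text.splitlines():
            fields = line.split(None, 1)
            if fields and not fields[0].startswith("#"):
                rows.append((fields[0], fields[1].strip() if len(fields) > 1 else None))
    records = []
    for digest, filename in rows:
        kind = classify_hash(digest)
        if kind:  # skip malformed or unsupported digests
            records.append({"hash": digest.strip().lower(), "type": kind, "filename": filename})
    return records
```

Normalizing every digest to lowercase hex means the same file reported by different feeds collapses to one key, which is what makes correlation with IP and domain records practical.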

Conclusion

Through the systematic addition of new feeds, refined parsing logic, and improved threshold mechanisms, OneFirewall’s threat intelligence database saw significant growth across all risk categories. The database became more robust, with low-risk entries increasing by approximately 17%, medium-risk entries by nearly 61%, high-risk entries by approximately 46%, and critical-risk entries by 168%. These improvements highlight the platform’s enhanced ability to monitor and mitigate a wider spectrum of threats, ultimately strengthening its position as a comprehensive and reliable threat intelligence tool.