ashhadulislam · October 21, 2023 14:13
diff --git a/ProjectDetails.txt b/ProjectDetails.txt
 Project Details Document 
 Machine Learning: Scamming Website detection
 1.	Basic details
 Student name:	Ismail Khuda Bukhsh
 Draft project title:	Machine Learning: Scamming Website detection

 Client organisation:	Some projects are done for a client who provides requirements. It may be that you do not have a client for your project, in which case client information and client contact name can be left blank.
 Client contact name:	

 2.	Degree suitability

 This anti-phishing website project aligns closely with my course, Cybersecurity and Forensic Computing, because it draws on several subjects we have studied. The course covered vulnerability assessment, risk management, incident response, and social engineering, all of which are directly relevant to building and administering a secure and resilient website. Our coursework also covered secure coding practices, techniques for securing user authentication, and secure communication protocols such as HTTPS. In addition, we studied incident response procedures, forensic techniques for investigating phishing incidents, and methods for mitigating the impact of such attacks. This knowledge equips me to contribute to the development of a sustainable phishing detection website.

 3.	The project environment and problem to be solved
 Phishing websites are created with the intention of deceiving users into divulging personal information, including usernames, passwords, credit card details, and social security numbers. Users who provide personal information on such a website may become victims of identity theft, with cybercriminals using that information for unlawful purposes. When users enter financial details on these websites, attackers may use them to make unauthorised transactions or steal funds from their accounts, resulting in financial loss. Such websites also persuade users to reveal login information for online accounts, including email, social media, and shopping platforms. This gives attackers unauthorised access to those accounts and may result in privacy violations, unauthorised actions, or account hijacking. These websites occasionally spread malware to users' devices, jeopardising device security and leading to data breaches, system instability, or illegal access to private data. Users may also suffer reputational harm when their accounts or personal information are stolen: for instance, attackers may exploit a compromised email account to send spam or phishing emails to the victim's contacts.

 This project addresses the problem of phishing websites. Doing so will help safeguard people from falling for these scams, stop the misuse of their personal information, and reduce the financial losses that people and organisations incur through fraudulent actions. It will promote a safer online environment by restoring and maintaining trust in online interactions, and it will help keep confidential information, intellectual property, and sensitive data out of the wrong hands. Addressing the problem also involves educating users about the risks and techniques associated with phishing attacks. This will increase awareness and empower individuals to recognise and report phishing attempts, contributing to a more vigilant and resilient online community.

 The primary audience for an anti-phishing website is internet users: people of all ages and backgrounds who use the internet for online shopping, banking, social networking, and other activities. Anti-phishing websites can also target businesses and organisations of all sizes, including institutions that deal with sensitive data and seek to keep their online environment secure. Local, state, and federal governments are a further viable target market. These websites can help educate citizens and government workers about the dangers of phishing and offer tips on how to safeguard sensitive information.

 The market for anti-phishing websites is expanding. There is a continuing effort to solve the problem of phishing, as illustrated by the existence of groups like the Anti-Phishing Working Group (APWG) and the development of tools and methods to stop phishing attacks. Additionally, the focus on education, reporting, and the development of anti-phishing tools suggests a growing recognition of the importance of protecting individuals and organisations from phishing scams.

 To create a successful anti-phishing website, it is crucial to answer research questions that can direct the website's design, content, and operation. The following must be assessed: the effectiveness of the anti-phishing tools and technologies already in use, the difficulties users encounter when reporting phishing attempts, and the way users perceive and react to phishing attempts. The latest phishing trends and methods employed by attackers, and the target audience's familiarity with phishing attacks, should also be investigated.

 4.	Project aim and objectives

 Aim: The aim of the project is to create a user-accessible website that helps users recognise and avoid phishing incidents. The website will be underpinned by Machine Learning algorithms, continuously updated with the latest data, so that when a user enters a URL it can evaluate the likelihood of that URL being a phishing website.

 Objectives: 

 ●	Research existing anti-phishing websites, tools, and technologies to identify their strengths, weaknesses, and limitations.
 ●	Investigate the extraction of features from URLs.
 ●	Investigate different phishing detection approaches, such as machine learning-based approaches, rule-based systems, and anomaly detection techniques.
 ●	Build and evaluate Machine Learning based algorithms to identify phishing websites.
 ●	Analyse user behaviours, expectations, and preferences in identifying phishing attacks.
 ●	Conduct usability testing throughout the development process.
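
 The objective of extracting features from URLs can be illustrated with a minimal sketch. It uses only the Python standard library and computes a few lexical features that are commonly tried for this kind of task. The function name extract_url_features, the chosen features, and the SUSPICIOUS_WORDS list are illustrative assumptions, not the project's final feature set.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of words often seen in phishing URLs (an illustrative
# assumption, not taken from the project).
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")

# Matches hosts written as a dotted IPv4 address, e.g. 192.168.0.1.
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url: str) -> dict:
    """Return a small dictionary of lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip_address": bool(IPV4_PATTERN.match(host)),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Counts how many of the listed words appear at least once.
        "num_suspicious_words": sum(word in lowered for word in SUSPICIOUS_WORDS),
    }


if __name__ == "__main__":
    print(extract_url_features("http://192.168.0.1/secure-login/verify?id=42"))
```

 A dictionary like this could later be turned into a numeric vector and passed to whichever classifier the project selects. Which features are actually useful is a question for the research phase.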

 5.	Project constraints
 Project constraints for the phishing detection platform typically include limitations and boundaries that may affect the development, deployment, and functionality of the system. Here are the constraints for this project:

 1.	Integration: If the platform needs to integrate with other systems or APIs, constraints can arise from the availability of those integration points and compatibility issues.
 2.	Data Availability: The quality and quantity of phishing data available for training the system can be a constraint. Limited or low-quality data may impact the effectiveness of the platform.
 3.	Scalability: If the platform needs to handle a large volume of data and users, scalability can be a constraint. Ensuring the platform can grow with increased demand is important.
 4.	Performance: The platform must meet certain performance benchmarks, such as response times and accuracy, which can be constrained by the available hardware and infrastructure.
 5.	User Interface: User interface design and user experience (UX) may be constrained by the need for simplicity, accessibility, or compliance with branding guidelines.
 6.	Security: Ensuring the security of the platform is paramount. Constraints may arise from security considerations that limit certain functionality.
 7.	Geographical Constraints: The platform may need to adhere to certain geographical constraints, such as data localisation laws, which affect where data can be stored and processed.

 6.	Facilities and resources
 Various facilities and resources will be needed for the project. Below is a list of the key requirements:

 1.	Development Environment:
 a.	Computer Workstations: Desktop or laptop computers with sufficient processing power and memory for development tasks.
 b.	Operating System: Typically, Linux, macOS, or Windows.
 c.	Integrated Development Environment (IDE): IDEs or code editors such as PyCharm, Visual Studio Code, or Sublime Text for coding.
 2.	Software and Tools:
 a.	Python: The primary programming language for the project.
 b.	Flask: The web framework for building the back end.
 c.	HTML/CSS/JavaScript: For creating the front-end user interface.
 d.	Database Management System: To store and manage data. Popular choices include SQLite, MySQL, or PostgreSQL.
 e.	Version Control System: Such as Git for tracking code changes and collaboration.
 f.	Virtual Environment: Tools like Python's virtualenv for managing project dependencies.
 g.	Text Editor: For editing HTML, CSS, and JavaScript files.
 h.	Web Browsers: For testing and debugging web pages.
 i.	Package Managers: pip for Python packages and npm for JavaScript packages.
 j.	Web Server: For serving the Flask application in a production environment (e.g., Apache).
 k.	Machine learning tools like
 i.	Text Vectorizers
 ii.	Feature extractors
 iii.	Classification models like Random Forest, Decision Trees, SVM and Deep Neural Networks
 3.	Web Hosting and Domain (Optional):
 a.	A web hosting service to deploy the Flask application.
 b.	A domain name to make the platform accessible on the internet.
 4.	Database Server:
 a.	The database will be hosted on Amazon Web Services (AWS).
 5.	Development Frameworks and Libraries:
 a.	Libraries and frameworks, such as SQLAlchemy for database interaction, Flask extensions, and JavaScript libraries.
 6.	Security Tools and Services:
 a.	Security scanning tools to identify and mitigate vulnerabilities.
 b.	SSL/TLS certificates for secure communication.
 c.	Firewall configurations and security practices to protect the application from cyber threats.
 7.	Testing and Quality Assurance:
 a.	Testing environments for debugging and testing code.
 b.	Quality assurance and testing practices to ensure the application's reliability.
 8.	Backup and Recovery Solutions: Regular data backup systems to prevent data loss.
 9.	Project management and collaboration tools: For example, Git for version control.
 10.	Compliance and Legal Resources:
 a.	Ensuring compliance with data protection laws and regulations (e.g., GDPR).
 11.	Hardware and Networking: Reliable internet connectivity for development and deployment.

 7.	Log of risks

 No	Description	Likelihood (high, medium, low)	Impact	Mitigation/Avoidance
 1	Compatibility issues	Medium	Can cause errors and malfunctions, and lead to reduced functionality, data corruption, or security vulnerabilities.	Thoroughly test the website across different browsers, devices, and operating systems.
 2	Data breaches	Low	Data breaches or privacy violations may occur if proper security measures are not in place.	Implement strong encryption for sensitive data and comply with relevant data protection regulations (e.g., GDPR). Regularly monitor and audit the website's security controls.
 3	Phishing attacks targeting the website	High	Hackers may attempt to target the anti-phishing website itself, exploiting any vulnerabilities to gain unauthorised access or compromise user data.	Implement robust security measures such as secure coding practices and regular security audits. Stay up to date with security patches and updates for the frameworks and libraries used.

 8.	Project deliverables
 The project deliverables for this platform are the following:

 1.	Phishing Detection Algorithm: The core deliverable is the phishing detection algorithm or model, which is the heart of the platform. This algorithm should be capable of analysing URLs, emails, or other online content and determining whether they are likely to be phishing attempts.
 2.	User Interface (UI): An intuitive and user-friendly web-based or desktop user interface is essential for interacting with the platform. It allows users to input URLs, view detection results, and configure settings.
 3.	Database: A database to store and manage data related to phishing URLs, user accounts, historical data, and detection results.
 4.	Training Data: A dataset of known phishing and legitimate URLs used to train and fine-tune the detection algorithm. This dataset is critical for the algorithm's accuracy.
 5.	Documentation: Comprehensive documentation, including user manuals, API documentation, and system architecture documentation.
 6.	Testing and Quality Assurance Reports: Reports on testing activities, including unit tests, integration tests, and system tests, as well as quality assurance results.
 7.	Integration Capabilities: If the platform needs to integrate with other systems, APIs, or security tools, the integration components are considered deliverables.
 8.	Training Materials: Educational materials and documentation for users and administrators to understand how to use and manage the platform effectively.
 9.	Project Closure Documentation: Formal project closure reports, including lessons learned, final assessments, and recommendations for future actions.
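
 The training data deliverable above implies holding back part of the labelled URLs so that the detection algorithm can be evaluated on examples it has not seen. A minimal sketch of one way to do this, assuming the dataset is a list of (url, label) pairs, is shown below. The function name stratified_split, the 20% test fraction, and the example URLs are hypothetical choices for illustration only.

```python
import random


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (url, label) pairs into train and test lists.

    The split is done separately for each label so that phishing and
    legitimate URLs keep roughly the same proportions in both lists.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the split repeatable

    # Group the samples by label.
    by_label = {}
    for url, label in samples:
        by_label.setdefault(label, []).append((url, label))

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


if __name__ == "__main__":
    # Placeholder URLs, not real data.
    data = [(f"http://phish{i}.example", "phishing") for i in range(10)]
    data += [(f"https://site{i}.example", "legitimate") for i in range(10)]
    train_set, test_set = stratified_split(data)
    print(len(train_set), len(test_set))
```

 Splitting per label is a design choice: it avoids a test set that happens to contain almost no phishing examples when the dataset is imbalanced.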

 9.	Project approach
 The project begins with project initiation, defining objectives, followed by comprehensive research to understand existing phishing detection techniques and gather project requirements. Subsequent phases involve the design of the platform's architecture, user interface, and database, setting the foundation for development. The project advances with back-end and front-end development, emphasising a robust phishing detection algorithm and user-friendly interface. Integration and testing ensure seamless operation, while advanced development and optimisation refine the algorithm for enhanced performance and accuracy. Documentation and user training materials are prepared to support users effectively, leading to a final presentation and evaluation of project success and lessons learned. 

 Project Management Methodology (PMM): To execute the project well, a variant of the Agile approach called Cowboy, an agile methodology for the solo programmer, will be adopted. Drawing on fundamental Agile principles, the Cowboy approach follows an iterative process in which each cycle enhances the project by adding features and addressing issues from previous cycles. It maintains a backlog of tasks for the entire project, similar to Scrum, with a detailed subset for the current iteration. In Cowboy, project artifacts are kept deliberately simple, capturing the core concepts and system architecture. These artifacts may evolve or even be eliminated as requirements change, so excessive time should not be spent perfecting them. It entails the following:

 1. Solo Development: A single programmer working independently on a project.

 2. Lack of Formal Processes: In cowboy programming, the developer typically takes a more spontaneous approach to development.

 3. High Independence: Cowboy programmers prefer working alone, making quick decisions, and having full control over the entire development process.

 4. Limited Testing: Due to the independent nature of cowboy programming, testing and quality assurance might be less thorough compared to methodologies that emphasize testing at various stages.

 This approach was developed at Virginia Commonwealth University by Ashby Brooks Hollar in 2006.



 10.	Project tasks and timescales

 Phase 1: Project Initiation (Weeks 1-2)

 ●	Week 1: Project Kick-off
 ○	Define project objectives and scope.
 ○	Identify key stakeholders.
 ○	Set initial expectations and goals.
 ●	Week 2: Project Planning
 ○	Develop a detailed project plan.
 ○	Create a work breakdown structure (WBS).
 ○	Set up project management tools.

 Phase 2: Research and Requirements (Weeks 3-6)

 ●	Week 3-4: PID Submission
 ○	Create a comprehensive PID that includes objectives, scope, and initial design concepts.
 ●	Week 5: Requirement Gathering
 ○	Collect and document user requirements.
 ○	Define functional and non-functional requirements.
 ●	Week 6: Literature Review
 ○	Conduct a literature review on phishing detection techniques.
 ○	Explore existing platforms and solutions.

 Phase 3: Design and Architecture (Weeks 7-12)

 ●	Week 7-8: Architecture Design
 ○	Plan the system architecture, including back end, front end, and database components.
 ●	Week 9-10: User Interface Design and Ethics report
 ○	Design the user interface with essential features.
 ○	Create wireframes and mock-ups.
 ○	Complete the ethics review.
 ●	Week 11-12: Database Design
 ○	Design the database schema.
 ○	Define data storage and access patterns.

 Phase 4: Development (Weeks 13-20)

 ●	Week 13-16: Back-End Development
 ○	Begin developing the back-end components.
 ○	Implement the initial phishing detection algorithm.
 ●	Week 17-20: Front-End Development
 ○	Develop the user interface.
 ○	Implement features for URL input, display of detection results, and user management.

 Phase 5: Integration and Testing (Weeks 21-26)
 ●	Week 21-24: Integration
 ○	Integrate the front end with the back end.
 ○	Ensure data flow and communication between components.
 ●	Week 25-26: Unit and Integration Testing
 ○	Perform unit testing on individual components.
 ○	Conduct integration testing to identify and resolve issues.

 Phase 6: Advanced Development and Optimisation (Weeks 27-30)
 ●	Week 27-28: Advanced Algorithm Development
 ○	Enhance the phishing detection algorithm.
 ○	Implement advanced machine learning techniques.
 ●	Week 29-30: Advanced Optimisation
 ○	Optimise the platform for performance, scalability, and security.
 ○	Address any performance bottlenecks.

 Phase 7: Documentation and User Training (Weeks 31-32)
 ●	Week 31: Documentation
 ○	Create user manuals, technical documentation, and system architecture documentation.
 ●	Week 32: User Training and Final Testing
 ○	Develop training materials.
 ○	Conduct final testing, quality assurance, and user acceptance testing.

 Project Conclusion:
 ●	Prepare for the final presentation.
 ●	Document lessons learned.
 ●	Evaluate the project's success and areas for improvement.

 11.	Supervisor meetings
 Project meetings are not restricted to a weekly schedule; their number will vary with the progress of the project. Most commonly there will be one meeting a week, but there may be two in a week if needed, or one every two weeks if less contact is required. All project meetings will be held online with the supervisor.

 Project meetings are essential throughout the project to stay aligned with the project objectives and to seek guidance on any challenges faced.

 12.	Legal, ethical, professional, social issues
 Legal:

 ●	Intellectual Property Rights: 
 The issue: Infringing copyrights, trademarks, or other forms of Intellectual Property (IP) can result in legal action by the IP rights holders. They may go to court to defend their rights, requesting monetary compensation and injunctions prohibiting the improper use of their intellectual property. This may result in costly legal disputes, possible penalties, and damages awarded to the owners of the IP rights. Such behaviour can also put off potential clients, investors, or business partners. Users may be held accountable for copyright infringement if they unintentionally upload copyright-protected content to the website. This may harm the website's credibility and user experience, and it may have legal repercussions for both users and website owners.

 Steps to mitigate the issue: Intellectual property rights must be respected. This includes ensuring that any code, algorithms, or other intellectual property used in the anti-phishing website complies with relevant copyright and licensing requirements. If third-party libraries or frameworks are used, their licences should be properly acknowledged and adhered to.

 ●	Jurisdictional Compliance: 

 The issue: Cybersecurity, data protection, and online fraud prevention are governed by a variety of laws and regulations that differ between jurisdictions. When creating a global anti-phishing website, these differences may cause difficulties. Complying with multiple jurisdictions can be difficult and time-consuming. Some jurisdictions demand that any personal information gathered from their citizens be stored within their borders. This may restrict the capacity to centralise data processing and storage, which could raise expenses and complicate the project. An anti-phishing website may be subject to legal action, fines, or other penalties if it violates the laws of a given jurisdiction.

 Steps to mitigate the issue: The laws and rules of the jurisdictions in which the anti-phishing website will operate must be thoroughly investigated and understood. This includes any special guidelines for preventing online fraud as well as data protection laws and cybersecurity legislation. Strong privacy policies, data protection measures, and security protocols that adhere to the standards set by the relevant authorities must be created. The policies and practices for the anti-phishing website must be periodically reviewed and updated to ensure that they meet any new requirements.

 Ethical:

 ●	Transparency and Informed Consent: 

 The issue: Ethical issues can arise if transparency and informed consent are not adequately addressed in the development of an anti-phishing website. Concerns about privacy and the misuse of personal information may arise if the website gathers more data than is necessary without alerting users, or if the purpose of data gathering is not made clear. Similarly, users' autonomy and control over their data may be jeopardised if they are not given the choice to provide or withhold consent.

 Steps to mitigate the issue: The project must prioritise transparency by offering clear and straightforward privacy policies, terms of service, and data management practices. Before any personal information is collected, users should be properly informed of the reason for and scope of the data collection. This can promote trust, user empowerment, and ethical practices in the development of anti-phishing websites.

 ●	Bias and fairness:

 The issue: If the anti-phishing algorithms are not properly trained or calibrated, they may generate a high number of false positives, flagging legitimate websites as phishing sites. The resulting unfair treatment or discrimination may damage the user experience and trust in the system. Conversely, if the algorithms are ineffective at identifying genuine phishing attacks, they will produce many false negatives and miss harmful websites.
 
 Steps to mitigate the issue: The anti-phishing algorithms must be trained using diverse training data that is representative of various user demographics. It is important to periodically evaluate how well the anti-phishing algorithms are working to spot any unfairness or biases and rectify them. Establishing mechanisms for feedback from users and reporting any issues relating to bias or unjust treatment is necessary. Also, a procedure for recourse if consumers feel they have been treated unfairly should be ensured.
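
 The periodic evaluation described above can be made concrete by tracking the false positive rate and the false negative rate on labelled test data. The sketch below is a minimal illustration that assumes labels are encoded as 1 for phishing and 0 for legitimate. The function name error_rates and the example labels are hypothetical.

```python
def error_rates(true_labels, predicted_labels):
    """Compute false positive and false negative rates.

    Labels are 1 for phishing and 0 for legitimate (an assumed encoding).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed

    # Guard against division by zero when a class is absent.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"false_positive_rate": fpr, "false_negative_rate": fnr}


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 1, 0, 1, 0, 0, 1]
    print(error_rates(truth, predicted))
```

 Computing the same two rates separately for different groups of websites (for example by region or language, if such metadata is available) is one way to look for the unfairness discussed above.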


 Professional:

 ●	Resource Limitations: 
 The issue: Collaboration and stakeholder engagement often require the allocation of resources, including time, personnel, and budget. Limited availability of these resources can impose constraints on the project's timeline and scope. It may be difficult to meet everyone's expectations since different stakeholders may have conflicting demands on the same resources.

 Steps to mitigate the issue: Stakeholders must be given realistic expectations regarding project schedules, resource constraints, and scope. Any restrictions or trade-offs that may arise during the project should be discussed openly. Stakeholder expectations must be managed by providing frequent updates and openly addressing any problems.