Matching open-source developers with projects that suit their expertise remains difficult. This repository introduces CodeCompass, which uses the GitHub API to personalize project recommendations. CodeCompass analyzes a user's historical contributions, including commit history and issue engagement, and applies Natural Language Processing (NLP) techniques such as stemming, lemmatization, and the BERT model to capture semantic relationships between issues. It then computes cosine similarity and keeps the issues whose similarity to the user's past issues exceeds a 90% threshold. The result is a curated list of open-source project issues, presented through a React-based website, that match both the user's technical skills and their past activity on GitHub. Together, data acquisition, text processing, and machine learning form the recommendation pipeline.
Navigating the vast landscape of open-source projects is a challenge for developers, leading to delays and suboptimal utilization of skills. CodeCompass addresses this by providing a streamlined solution for developers to discover and contribute to projects matching their skills and interests.
CodeCompass caters to a diverse range of developers, from novices to seasoned contributors. Through surveys and analysis, we understand their programming languages, areas of interest, preferred complexities, and community engagement levels. This enables personalized recommendations for a rich contribution experience.
CodeCompass utilizes the GitHub API for real-time project information, including metadata, languages used, commit history, and community engagement metrics. This ensures up-to-date and relevant data for accurate recommendations.
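As a rough illustration of the kind of metadata involved, the sketch below builds the GitHub REST endpoint for a repository and reduces a response payload to a few fields. The field names (full_name, language, stargazers_count, open_issues_count, pushed_at) are part of GitHub's repository object; the helper names repo_url and summarize_repo, the choice of fields, and the sample payload are assumptions made for this example and are not taken from the CodeCompass source.

```python
import json

GITHUB_API = "https://api.github.com"


def repo_url(owner: str, repo: str) -> str:
    """Build the REST endpoint that returns one repository's metadata."""
    return f"{GITHUB_API}/repos/{owner}/{repo}"


def summarize_repo(payload: dict) -> dict:
    """Keep only the fields a recommender is likely to need (hypothetical selection)."""
    return {
        "name": payload["full_name"],
        "language": payload.get("language"),
        "stars": payload.get("stargazers_count", 0),
        "open_issues": payload.get("open_issues_count", 0),
        "last_push": payload.get("pushed_at"),
    }


# Illustrative payload trimmed to the fields used above; it is not real data.
sample = json.loads("""
{
  "full_name": "example-org/example-repo",
  "language": "Python",
  "stargazers_count": 120,
  "open_issues_count": 7,
  "pushed_at": "2024-01-01T00:00:00Z"
}
""")

summary = summarize_repo(sample)
print(repo_url("example-org", "example-repo"))
print(summary)
```

In a real client the payload would come from an authenticated HTTP request to the URL above rather than from an inline string.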
CodeCompass retrieves a user's historical contributions through the GitHub API, extracts the issues they were involved in, and cleans the issue text with NLP techniques such as stemming and lemmatization. The BERT model and cosine similarity are then used to score how similar each candidate issue is to the user's past issues, producing a curated list of recommended open-source project issues aligned with the user's preferences.
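The scoring step can be sketched as follows, assuming each issue has already been turned into a numeric vector (in CodeCompass that would come from BERT; here the vectors are small hand-written lists). The function names, the sample vectors, and the issue labels are hypothetical; only the cosine similarity formula and the 90% threshold come from the description above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user_vec, candidates, threshold=0.9):
    """Return (issue_id, score) pairs above the threshold, best match first."""
    scored = [
        (issue_id, cosine_similarity(user_vec, vec))
        for issue_id, vec in candidates.items()
    ]
    kept = [(issue_id, score) for issue_id, score in scored if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Stand-in embeddings (assumption): real ones would be BERT output vectors.
user_issue = [1.0, 2.0, 3.0]
candidate_issues = {
    "issue-a": [2.0, 4.0, 6.0],   # same direction as the user's issue
    "issue-b": [1.0, 2.0, 2.5],   # close, but not identical
    "issue-c": [3.0, -1.0, 0.0],  # nearly orthogonal, so it is filtered out
}

recommended = recommend(user_issue, candidate_issues)
for issue_id, score in recommended:
    print(issue_id, round(score, 3))
```

Because cosine similarity ignores vector length, issue-a scores as a perfect match even though its values are twice as large; only the direction of the embedding matters.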
Developed with React, CodeCompass ensures an intuitive and dynamic user interface. Responsive design optimizes accessibility, and the platform’s engaging front end provides an aesthetic experience for users across all skill levels.
The machine learning model is seamlessly integrated into the React-based front end, with Flask serving as the backend bridge. This integration facilitates real-time and accurate project suggestions, enhancing CodeCompass’s functionality and responsiveness.
Thorough testing procedures, including unit testing, integration testing, and end-to-end testing, ensure the robustness and reliability of the recommendation system.
npm i
to install the packages.
npm start
to run the web app.
nodemon server\src\app.ts
to start the authentication server (sign-in via GitHub).
python3 src\test.py
to start the Flask app that recommends projects.
All of the above should be running simultaneously.
CodeCompass follows industry-standard practices for deployment on scalable and reliable cloud infrastructure. Continuous integration and automated deployment pipelines minimize downtime and enhance accessibility for users.
A dedicated maintenance strategy post-deployment includes regular updates, security patches, and performance optimizations. Continuous monitoring and feedback loops ensure a resilient and responsive system for sustained user satisfaction.
Lemmatization is a crucial step in CodeCompass’s recommendation system, transforming words into their root forms for effective semantic analysis during the comparison of user issues with suggested GitHub repository issues.
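To show why reducing words to a common form helps comparison, here is a minimal sketch that does not need the WordNet data. It is not the project's preprocess_text: NLTK's WordNetLemmatizer is replaced by a tiny hand-written lookup table (LEMMAS), and the function name toy_preprocess and the sample sentences are invented for this example.

```python
import re

# Hypothetical stand-in for a real lemma dictionary such as WordNet.
LEMMAS = {
    "crashes": "crash",
    "crashed": "crash",
    "crashing": "crash",
    "fails": "fail",
    "failed": "fail",
    "failing": "fail",
    "tests": "test",
}


def toy_preprocess(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, and map each token to its lemma."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens]


first = toy_preprocess("App crashes when tests fail")
second = toy_preprocess("app crashed: failing test")
print(first)
print(second)
print(sorted(set(first) & set(second)))
```

After preprocessing, the two differently worded issue titles share the tokens app, crash, fail, and test, which is what makes a later similarity comparison meaningful.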
The lemmatization code consists of these steps:
Import the lemmatizer (from nltk.stem import WordNetLemmatizer).
Create the lemmatizer instance (lemmatizer = WordNetLemmatizer()).
Define a function (preprocess_text) for lemmatization.
Call it from the calculate_similarity function to preprocess issue text for comparison.
BERT tokenization follows the same pattern:
Import the BERT classes (from transformers import BertTokenizer, BertModel).
Load the tokenizer (tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")).
Define a function (preprocess_and_tokenize) to tokenize and preprocess text using the BERT tokenizer.
Call it from the calculate_similarity function to preprocess issue text for comparison.
The recommendation response uses these names:
recommendedRepo
similarity: the cosine similarity between known and unknown issues.
recommended_repos: the list of serialized recommended repositories.
response_data: returned as the result of the API call.
We optimized the email sending process by switching to a more efficient email service. This reduces the time taken to send emails and improves the overall performance of the application.
Implemented caching for frequently accessed data to reduce server load and improve response times. This includes caching GitHub user data and repository data.
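One common way to cache this kind of data is a small time-to-live (TTL) cache, sketched below. This is an assumption about the approach, not a description of the actual CodeCompass code: the TTLCache class, the 300-second lifetime, and fake_fetch_user (a stand-in for a real GitHub API call) are all hypothetical.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Keep fetched values for ttl_seconds, then fetch them again on next use."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]           # fresh entry: no API call needed
        value = fetch(key)          # missing or expired: call the API
        self._store[key] = (now, value)
        return value


api_calls = []


def fake_fetch_user(login: str) -> dict:
    """Hypothetical stand-in for a GitHub user lookup; records each call."""
    api_calls.append(login)
    return {"login": login}


cache = TTLCache(ttl_seconds=300)
first = cache.get_or_fetch("octocat", fake_fetch_user)
second = cache.get_or_fetch("octocat", fake_fetch_user)
print(len(api_calls))
```

The second lookup is served from memory, so only one call reaches the (fake) API. The lifetime trades freshness against load: a longer TTL saves more calls but can show stale profile or repository data.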
Optimized the GitHub user retrieval process by reducing the number of API calls. This minimizes the load on the GitHub API and speeds up the application.
To identify performance bottlenecks, we used profiling tools such as cProfile for Python and Chrome DevTools for the front end. These tools helped us pinpoint slow functions and areas of the code that needed optimization.
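A minimal sketch of profiling a Python function with cProfile and pstats from the standard library is shown here. The functions slow_sum and handle_request are invented placeholders for real request-handling code; how the project actually wired cProfile into the Flask app is not documented, so this only demonstrates the general technique.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Deliberately simple hot loop so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request() -> int:
    """Placeholder for a request handler that does some expensive work."""
    return slow_sum(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Write the report into a string, sorted by cumulative time, top 5 entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that account for the most total time, including time spent in the functions they call, at the top of the report, which is usually the quickest way to find where a slow request spends its time.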
cProfile was used to profile the Flask application.
Issue: Slow responses from the GitHub API. Solution: Implemented caching to store frequently accessed data and reduce the number of API calls.
Issue: High server load due to frequent data retrieval. Solution: Implemented caching and optimized data retrieval processes to reduce server load.
Issue: Slow email sending process. Solution: Switched to a more efficient email service to speed up the email sending process.
Issue: Slow page load times due to large data sets. Solution: Implemented lazy loading and optimized data fetching to improve page load times.