Projects

I'm always working on new projects.
You can find them on my GitHub.

China’s Electricity Market Outcomes and Reforms & Data Workflow

➤ Collected Chinese electricity usage data, published as PDF and PNG files, from websites covering more than 30 provinces. Used Python libraries such as BeautifulSoup and Selenium to navigate dynamic web pages and retrieve the files, then applied OCR (Optical Character Recognition) to convert scanned documents and images into machine-readable text.
➤ Conducted empirical analysis focusing on energy market dynamics in China. Used R packages such as tidyverse for data manipulation and ggplot2 for data visualization. Applied linear regression models and generalized linear models (GLMs) to examine the relationship between marketization rate and coal prices, with a focus on regional variations between Shandong and Guangdong provinces.
➤ Currently streamlining workflows by designing efficient data organization schemes, automating data processing pipelines, and setting up version control using the San Diego Supercomputer Center (SDSC) server.
➤ Contributed to a meta-analysis of research papers focused on carbon neutrality targets by 2060, with emphasis on gathering and analyzing data from predictive models for gas power in 2050 under diverse scenarios. Developed a comprehensive data template for mid-century models.
➤ Currently building a pipeline that uses large language models (LLMs) to automate the collection and summarization of news articles on Chinese policy, making policy monitoring and analysis more efficient. Conducted simple text analysis and plotted trends over time to identify key insights.

Python
Jupyter
R
SQL
Data Cleaning
Modeling
Analysis

Chinese Academy of Sciences Project

➤ Summarized and reviewed three recent image inpainting papers, discussing innovations in strategy, design, stability, and algorithm. Analyzed each paper's failures to provide a comprehensive understanding of advancements in image inpainting.
➤ Contributed to the publication of a paper titled "De-NeRF: Ultra-High-Definition NeRF with Deformable Net Alignment" (2024). The paper introduces the De-NeRF framework, which addresses the limitations of Neural Radiance Fields (NeRF) in high-fidelity view synthesis of ultra-high-resolution scenes. It employs deformable convolutional networks to resolve misalignment issues, reduce training time, and minimize network parameters.

Next.js
React
TypeScript
Python
Jupyter
AWS

Predictors of Adolescent Substance Use in the Adolescent Brain Cognitive Development (ABCD) Study

➤ Cleaned and preprocessed data for the Brief Problem Monitor (BPM) and multiple datasets from the ABCD study. Delivered a merged dataset to support group efforts in combining data for the final cohort, enabling results generalization across analysts.
➤ Conducted cross-sectional and longitudinal multivariate regression analyses to examine the effects of genetic liability for externalizing behaviors (EXT), along with psychological and environmental factors, on the onset of tobacco and cannabis use among male and female adolescents.
➤ Developed interactive visualizations using packages such as ggplot2 and matplotlib to present survival analysis and regression results. Created hazard plots for coefficients and charts displaying participant numbers, effectively communicating findings to non-technical stakeholders.
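
As an illustration of the survival analysis mentioned above, here is a minimal sketch of a Kaplan-Meier estimator in plain Python. The project description does not say which estimator was used, so this is an assumption chosen for simplicity; the function name `kaplan_meier` and the toy inputs are hypothetical. An "event" would correspond to first reported substance use, and a censored record to a participant who had not started using by their last follow-up.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each participant
    observed:  1 if the event (e.g. first use) occurred at that time,
               0 if the participant was censored
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all participants who share the same time point.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored records leave the risk set after t.
        at_risk -= leaving
    return curve


# Toy data (made up for illustration only).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for time_point, prob in curve:
    print(time_point, round(prob, 3))
```

Censored participants shrink the risk set without lowering the curve, which is the main reason to prefer this estimator over a simple proportion of participants who have not yet had the event.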

Python
Jupyter
PostgreSQL
AWS

Fast Data Discovery

➤ Designed and implemented incremental matrix factorization algorithms to support pre-join aggregate computation in large-scale relational databases.
➤ Applied QR, LU, and Cholesky decompositions to accelerate feature selection and reduce computational cost during data lake exploration.
➤ Built a conditional independence testing (CIT) pipeline for statistical feature augmentation to improve downstream ML model performance.
➤ Conducted performance benchmarking (runtime, memory, precision) across matrix operations; optimized join strategies via an offline simulation framework.
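
To make the idea of aggregate computation plus matrix decomposition concrete, the following is a rough sketch, not the project's actual algorithm. It accumulates the Gram matrix XᵀX and the vector Xᵀy batch by batch, then solves the least-squares normal equations with a Cholesky decomposition on demand. The class `GramAccumulator`, the helper names, and the toy data are all hypothetical, and the sketch assumes XᵀX is positive definite.

```python
import math


def cholesky(a):
    """Return lower-triangular L such that L * L^T equals the SPD matrix a."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def solve_cholesky(a, b):
    """Solve a x = b for symmetric positive definite a."""
    lower = cholesky(a)
    n = len(b)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / lower[i][i]
    return x


class GramAccumulator:
    """Maintains X^T X and X^T y incrementally as batches of rows arrive."""

    def __init__(self, n_features):
        self.xtx = [[0.0] * n_features for _ in range(n_features)]
        self.xty = [0.0] * n_features

    def update(self, rows, targets):
        for x, y in zip(rows, targets):
            for i, xi in enumerate(x):
                self.xty[i] += xi * y
                for j, xj in enumerate(x):
                    self.xtx[i][j] += xi * xj

    def coefficients(self):
        return solve_cholesky(self.xtx, self.xty)


# Toy usage: targets follow y = 2*x1 + 3*x2, fed in two batches.
acc = GramAccumulator(2)
acc.update([[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0])
acc.update([[1.0, 1.0], [2.0, 1.0]], [5.0, 7.0])
print(acc.coefficients())
```

Because only the small aggregates are stored, new rows can be folded in without revisiting earlier data. The sketch refactors the full matrix on each solve rather than updating the factor itself.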

Python
Jupyter
C
Optimization
Data Visualization

Automated Public Equity Research Data Scraping System

➤ Used industry search terms such as "free equity research reports," "analyst revenue estimates," and "future revenue estimates for public companies" to locate target websites. Evaluated each candidate website rigorously to confirm that its data was forward-looking and broad in coverage.
➤ Designed and implemented a Python web scraping script to automate data extraction from the selected websites. The script incorporated error handling and retry mechanisms to keep data retrieval continuous and stable despite network instability and server response delays.
➤ Developed a data cleaning and preprocessing module to standardize the format of data from different sources, ensuring consistency and accuracy. Integrated the processed data into the company's MongoDB cluster, supporting large-scale data storage and rapid querying.
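
The retry behavior described above can be sketched roughly as follows. This is a generic pattern, exponential backoff with jitter, and not the script used in the project; the function `fetch_with_retry`, its parameters, and the choice of exception types are assumptions made for illustration.

```python
import random
import time


def fetch_with_retry(fetch, max_attempts=4, base_delay=0.5,
                     retry_on=(ConnectionError, TimeoutError),
                     sleep=time.sleep):
    """Call fetch() and retry on transient errors with exponential backoff.

    fetch:    zero-argument callable that performs one request (hypothetical)
    retry_on: exception types treated as transient
    sleep:    injectable so the wait can be skipped in tests
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


# Toy usage: a callable that fails twice before succeeding.
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return {"status": "ok"}


print(fetch_with_retry(flaky_request, sleep=lambda seconds: None))
```

Only exceptions listed in `retry_on` trigger another attempt, so programming errors such as a `KeyError` in parsing code still fail immediately.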

Python
Jupyter
MongoDB
JSON

Addressing Voter Turnout Disparities through Data-Driven Resource Allocation

➤ Stricter fairness constraints τ achieve more equitable distribution while maintaining near-optimal turnout impact.
➤ Fairness-constrained allocation better aligns with racial equity objectives.
➤ Unconstrained optimization maximizes turnout but exacerbates existing disparities.
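
The trade-off in the findings above can be illustrated with a toy model. The project's actual formulation of τ is not stated here, so this sketch assumes one possible definition: every group is guaranteed at least a fraction τ of an equal share of the budget, and the remainder goes to the groups with the highest per-unit turnout gain. The function names, gains, caps, and budget are all made up for illustration.

```python
def allocate(gains, caps, budget, tau=0.0):
    """Allocate a budget across groups to maximize a linear turnout gain.

    gains: assumed turnout gain per unit of resource for each group
    caps:  maximum units each group can absorb
    tau:   fairness level in [0, 1]; each group first receives
           tau * (budget / number of groups), an assumed definition
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be between 0 and 1")
    n = len(gains)
    floor = tau * budget / n
    alloc = [min(floor, caps[i]) for i in range(n)]
    remaining = budget - sum(alloc)
    # Spend what is left greedily on the highest-gain groups.
    for i in sorted(range(n), key=lambda idx: gains[idx], reverse=True):
        extra = min(caps[i] - alloc[i], remaining)
        alloc[i] += extra
        remaining -= extra
    return alloc


def total_gain(gains, alloc):
    return sum(g * a for g, a in zip(gains, alloc))


# Made-up numbers: three groups, 90 units of resource.
gains = [0.03, 0.02, 0.01]
caps = [60, 100, 100]
for tau in (0.0, 0.5, 1.0):
    alloc = allocate(gains, caps, budget=90, tau=tau)
    print(tau, alloc, round(total_gain(gains, alloc), 3))
```

With τ = 0 the lowest-gain group receives nothing. Raising τ lifts the minimum allocation at some cost in total gain; how large that cost is depends entirely on the data.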

Python
Jupyter
Data Cleaning
Modeling
Analysis
Leaflet
Next.js
React
TypeScript

Course Projects

CSE151B - Climate Emulation with U-Net

DSC180 - Active Learning

DSC180 - Targeted Interventions to Reduce Inequality

CSE158 - Fake News Detection

CSE151A - Skin Type Classification

DSC106 - COVID Visualization

DSC106 - Yelp Visualization

COGS108 - Predicting Confusion through EEG

DSC80 - LOL Result Model 2023

DSC80 - Side Analysis of League of Legends 2023