
Review: Web Scraping MLB Statistics to Predict Player Salaries Based on Performance
Author: Alexander J. Schoessler
Publication Information: Swarthmore College Senior Theses, Projects, and Awards, Spring 2023
Abstract
This study investigates the relationship between player performance and salary in Major League Baseball (MLB) and uses machine learning models to assess whether salaries are fair. The research uses Python web scraping to collect player performance statistics, personal details, and salary data from ESPN and Spotrac, and builds separate datasets for batters, starting pitchers, and relief pitchers. A linear regression model is then fitted to predict salaries and to analyze how well actual salaries align with performance.
The results indicate that high-salary players (e.g., top batters and starting pitchers) are often overvalued, while seasoned players with strong performance but lower salaries are undervalued. The models achieved moderate accuracy, with R² scores ranging from 0.5 to 0.6.
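To make the regression-and-residual idea concrete, here is a minimal sketch in pure Python. It is not the thesis's code: the five players, their single "performance score", and the salaries (in millions of dollars) are invented for illustration, and the thesis fits on multiple statistics rather than one feature. The sketch fits an ordinary least-squares line, computes R², and labels each player by the sign of the residual (actual minus predicted salary).

```python
# Hypothetical data: (name, performance score, salary in $M). All values are made up.
players = [
    ("A", 1.0, 2.0),
    ("B", 2.0, 3.0),
    ("C", 3.0, 7.0),
    ("D", 4.0, 6.0),
    ("E", 5.0, 12.0),
]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [p[1] for p in players]
ys = [p[2] for p in players]

slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
r2 = r_squared(ys, preds)

# Positive residual: paid more than the model predicts ("overvalued");
# negative residual: paid less than the model predicts ("undervalued").
labels = {}
for (name, _, salary), pred in zip(players, preds):
    labels[name] = "overvalued" if salary - pred > 0 else "undervalued"

print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
print(labels)
```

An R² in the 0.5 to 0.6 range, as reported, means the fitted model explains roughly half of the variance in salary, so individual over- or undervaluation labels derived from residuals should be read with that uncertainty in mind.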
Key Contributions
Data Sources
The data primarily came from ESPN and Spotrac, including player performance statistics, salary, and contract information.
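As an illustration of the scraping step, the sketch below parses an HTML statistics table into typed records using only the Python standard library. Everything specific in it is an assumption: the inline `SAMPLE_HTML`, the column names, and the player rows are invented, the real ESPN and Spotrac pages are structured differently, and the review does not say which scraping library the thesis used.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical page fragment; not the markup of any real site.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>HR</th><th>AVG</th><th>Salary</th></tr>
  <tr><td>Player A</td><td>30</td><td>.285</td><td>$12,500,000</td></tr>
  <tr><td>Player B</td><td>12</td><td>.251</td><td>$3,000,000</td></tr>
</table>
"""

def parse_salary(text):
    """Convert a string such as '$12,500,000' to an integer dollar amount."""
    return int(text.replace("$", "").replace(",", ""))

def scrape_table(html):
    """Return one dict per data row, with numeric columns converted."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    records = [dict(zip(header, row)) for row in body]
    for record in records:
        record["HR"] = int(record["HR"])
        record["AVG"] = float(record["AVG"])
        record["Salary"] = parse_salary(record["Salary"])
    return records

records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

In a real pipeline the HTML would come from an HTTP request rather than an inline string, and performance and salary tables from the two sites would need to be joined on a player identifier before modeling.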
Methodology
Results
Conclusions and Recommendations
The study highlights significant structural issues in MLB salary distribution and recommends that teams adopt data-driven methods for salary decisions, using player performance metrics to evaluate salaries more precisely. Future research could improve prediction accuracy and practical usefulness by incorporating more advanced metrics and exploring other machine learning models.
By C.Y. LU