Web scraping is a technique that uses software tools to extract data or information from a web page. It is used to collect unstructured data and convert it into structured data that can later be processed in databases or spreadsheets. The workshop takes a practical approach to web scraping so that attendees can process useful information in their own projects.
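As a rough illustration of turning unstructured page content into structured data, the sketch below reads an HTML table and writes it out as CSV rows that a spreadsheet could open. It uses only Python's standard library. The sample HTML, its values, and the names TableParser, html_table_to_rows and rows_to_csv are made up for this example and are not workshop material.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up sample page (assumption); a real scraper would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table cells found in `html` as a list of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, ready for a spreadsheet or database import."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = html_table_to_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
```

Real pages are rarely this tidy, so a dedicated parsing library is usually more convenient than a hand-written parser like this one.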
The meeting will open an ongoing line of work on data and data visualization, overseen by the Montera34 group. It follows on from the Maps&Data workshops held at Hirikilabs in 2016 and 2017, one result of which was the Report on the Airbnb effect in Donostia and the Basque Country. This new line of work, made up of meetings and workshops, is intended to feed into the DataCommonsLab, a new open group that will work on data on an ongoing basis and meet periodically at Hirikilabs.
February 6, Tuesday
Introduction: Presentation of the activity, establishment of the context and explanation of the aims of the workshop.
Introduction to scraping: How the web works (HTML, JSON, APIs, ...) and an introduction to ways of storing the information obtained.
Scraper development: Explanation and hands-on use of basic scraping tools (Postman, Python, Beautiful Soup, etc.).
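To give a flavour of the JSON and storage topics in this session, here is a minimal sketch that takes a JSON response body, keeps a few fields, and stores them as CSV. It sticks to Python's standard library rather than the tools listed above, and it makes no network request. The payload, the "results" key, the field names, and the helpers flatten_records and write_csv are all assumptions made for illustration.

```python
import csv
import io
import json

# Made-up API response body (assumption); a real one would come from an HTTP request.
SAMPLE_JSON = (
    '{"results": ['
    '{"id": 1, "title": "first", "tags": ["a", "b"]}, '
    '{"id": 2, "title": "second", "tags": []}'
    ']}'
)

FIELDS = ["id", "title", "tags"]


def flatten_records(payload, fields):
    """Parse the JSON text and keep only `fields`, joining list values with ';'."""
    records = json.loads(payload)["results"]
    rows = []
    for record in records:
        row = {}
        for field in fields:
            value = record.get(field, "")  # missing fields become empty cells
            if isinstance(value, list):
                value = ";".join(str(item) for item in value)
            row[field] = value
        rows.append(row)
    return rows


def write_csv(rows, fields):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten_records(SAMPLE_JSON, FIELDS)
csv_text = write_csv(rows, FIELDS)
```

Flattening nested values into a single cell is only one possible choice. A database with a separate table for tags would keep more structure.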
February 7, Wednesday
Scraper development: Continuation of the previous day's session.
Introduction to advanced scraping techniques: JavaScript execution, use of proxies, and other issues that come up during the workshop.
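As one possible illustration of the proxy topic, the sketch below configures Python's standard urllib so that requests would be routed through a proxy. The proxy address, the User-Agent string, and the helper name build_proxy_opener are placeholders assumed for this example. Nothing is downloaded until opener.open(url) is called.

```python
import urllib.request


def build_proxy_opener(proxy_url, user_agent="workshop-scraper/0.1"):
    """Return a urllib opener that sends HTTP and HTTPS requests via `proxy_url`."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    opener = urllib.request.build_opener(proxy_handler)
    # Identify the scraper on every request made through this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


# Placeholder address (assumption): replace with a proxy you are allowed to use.
opener = build_proxy_opener("http://127.0.0.1:8080")
```

Pages that build their content with JavaScript need a different approach, such as driving a real browser, which this sketch does not cover.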