Instacart Data Science Challenge
Welcome! Instacart runs on data - and our awesome shoppers. One of the important ways we make customers happy is by delivering their groceries on time. To do this, we start by asking “How long will a shopping trip take?” Let's find out.
This data can be useful!
Your goal is to predict the shopping time (the difference between shopping_started_at and shopping_ended_at in seconds) for trips in the test set. The shopping time only includes the time it takes for a shopper to pick the items in the store. It does not include the driving and delivering parts.
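As a minimal sketch of how the target could be derived for the training trips, assuming the data is loaded into a pandas DataFrame: only the column names shopping_started_at, shopping_ended_at and trip_id come from this brief. The sample rows, the timestamp format and the helper name add_shopping_time are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the challenge text.
trips = pd.DataFrame({
    "trip_id": [1, 2],
    "shopping_started_at": ["2015-09-01 07:03:56", "2015-09-01 07:04:33"],
    "shopping_ended_at": ["2015-09-01 07:30:56", "2015-09-01 07:40:33"],
})


def add_shopping_time(df):
    """Return a copy of df with a shopping_time column in seconds."""
    out = df.copy()
    # Parse the timestamp strings (the exact format is an assumption).
    started = pd.to_datetime(out["shopping_started_at"])
    ended = pd.to_datetime(out["shopping_ended_at"])
    # The difference is a Timedelta; convert it to seconds.
    out["shopping_time"] = (ended - started).dt.total_seconds()
    return out


trips = add_shopping_time(trips)
```

Rows with a missing or negative duration would be worth checking during data cleaning before fitting a model.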
- Perform any data cleaning, exploratory analysis and visualizations you may need to understand the data.
- Construct a predictive model and discuss why you chose your approach.
- Assess the performance of your model, and discuss any alternatives you considered or concerns you may have.
- Generate a CSV file called predictions.csv containing the predictions for the test trips in the following format:
trip_id,shopping_time
130622,900
130625,456
...
Note: shopping_time must be in seconds.
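One way to produce a file in this layout is sketched below with pandas. The helper name write_predictions is hypothetical, and rounding predictions to whole seconds is an assumption based on the integer values in the example above. The header and column order follow the format shown.

```python
import pandas as pd


def write_predictions(trip_ids, predicted_seconds, path="predictions.csv"):
    """Write predictions as trip_id,shopping_time with shopping_time in seconds."""
    preds = pd.DataFrame({
        "trip_id": list(trip_ids),
        # Assumption: round to whole seconds, matching the example rows.
        "shopping_time": [int(round(s)) for s in predicted_seconds],
    })
    # index=False keeps the pandas row index out of the file.
    preds.to_csv(path, index=False)
    return preds


# Hypothetical model outputs for the two trip ids used in the example.
write_predictions([130622, 130625], [900.2, 455.7])
```

Opening the resulting file in a text editor is a quick check that the header line and column order match.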
We want you to have the greatest chance of succeeding in this challenge, so please do the following:
- Use either Python or R (these are the primary tools we use) and any open source libraries you'd like - submissions in any other language will not be reviewed
- Make sure your output is formatted exactly as specified above
- Include a written and/or visual summary of your work (such as R Markdown, a Jupyter notebook or even just a text file or Google Doc) in addition to your code
IMPORTANT: Please put predictions.csv, your code files and your written summary in a zip file named <your-name>-instacart-ds-challenge.zip
<your-name>-instacart-ds-challenge.zip
- predictions.csv
- code (.py, .r, .rmd, .ipynb, etc.)
- summary.txt
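The archive can be assembled by hand, or with a short script such as the sketch below, which uses Python's standard zipfile module. The function name build_submission and the example name "jane-doe" are hypothetical, and placing the files at the top level of the archive is an assumption about the layout shown above.

```python
import zipfile
from pathlib import Path


def build_submission(your_name, files, out_dir="."):
    """Bundle the given files into <your-name>-instacart-ds-challenge.zip."""
    archive = Path(out_dir) / f"{your_name}-instacart-ds-challenge.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Assumption: store each file at the top level of the archive.
            zf.write(f, arcname=Path(f).name)
    return archive


# Example call (file names other than predictions.csv and summary.txt are hypothetical):
# build_submission("jane-doe", ["predictions.csv", "model.py", "summary.txt"])
```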
If you have any questions, do not hesitate to contact us at ds-challenge@instacart.com
Have fun, and good luck!