Using Amazon Athena to query data inside S3
- Create a Database
- Open the Athena console.
- If this is your first time visiting the Athena console, you’ll go to a Getting Started page. Choose Get Started to open the Query Editor. If it isn’t your first time, the Athena Query Editor opens.
- Choose the link to set up a query result location in Amazon S3.
- In the Settings dialog box, enter the path to the bucket that you created in Amazon S3 for your query results. Prefix the path with
s3://
and add a forward slash to the end of the path.
- Click Save.
- In the Athena Query Editor, you see a query pane. You can type queries and statements here.
- To create a database named
mydatabase
, enter the following CREATE DATABASE statement.CREATE DATABASE mydatabase
- Choose Run Query or press
Ctrl+ENTER
. - Confirm that the catalog display refreshes and
mydatabase
appears in the Database list in the navigation pane on the left.
- Create a Table
Now that you have a database, you’re ready to run a statement to create a table. The table will be based on Athena sample data in the locations3://athena-examples-
. The statement that creates the table defines columns that map to the data, specifies how the data is delimited, and specifies the Amazon S3 location that contains the sample data.aws-region
/cloudfront/plaintext/- For Database, choose
mydatabase
. - Choose the plus (+) sign in the Query Editor to create a tab with a new query. You can have up to ten query tabs open at once.
- In the query pane, enter the following
CREATE TABLE
statement. In theLOCATION
statement at the end of the query, replacemyregion
with the AWS Region that you are currently using (for example,us-west-1
).CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs ( `Date` DATE, Time STRING, Location STRING, Bytes INT, RequestIP STRING, Method STRING, Host STRING, Uri STRING, Status INT, Referrer STRING, os STRING, Browser STRING, BrowserVersion STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$" ) LOCATION 's3://athena-examples-
myregion
/cloudfront/plaintext/';NoteReplace
myregion
ins3://athena-examples-
with the region identifier where you run Athena, for example,myregion
/path/to/data/s3://athena-examples-us-west-1/path/to/data/
. - Choose Run Query.The table
cloudfront_logs
is created and appears under the list of Tables for themydatabase
database.
- For Database, choose
- Query Data
- Open a new query tab and enter the following SQL statement in the query pane.
SELECT os, COUNT(*) count
FROM cloudfront_logs WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05' GROUP BY os;
- Choose Run Query.
The results look like the following:
- You can save the results of the query to a
.csv
file by choosing the download icon on the Results pane.
- Choose the History tab to view your previous queries.
- Choose Download results to download the results of a previous query. Query history is retained for 45 days.
- Open a new query tab and enter the following SQL statement in the query pane.