Commit 8efca395 authored by hinoka@google.com

Added gsutil/gslib to depot_tools/third_party

This is needed for https://chromiumcodereview.appspot.com/12042069/,
which uses gsutil to download objects from Google Storage based on SHA1 sums.

Continuation of: https://chromiumcodereview.appspot.com/12317103/
Rietveld didn't like a giant CL with all of gsutil (it kept crashing on upload),
so the CL is being split into three parts.

Related:
https://chromiumcodereview.appspot.com/12755026 (gsutil/boto)
https://codereview.chromium.org/12685009/ (gsutil/)

BUG=

Review URL: https://codereview.chromium.org/12685010

git-svn-id: svn://svn.chromium.org/chrome/trunk/tools/depot_tools@188842 0039d316-1c4b-4281-b951-d872f2087c98
This directory contains library code used by gsutil. Users are cautioned not
to write programs that call the internal interfaces defined in here; these
interfaces were defined only for use by gsutil, and are subject to change
without notice. Moreover, Google supports this library only when used by
gsutil, not when the library interfaces are called directly by other programs.
# Copyright 2010 Google Inc. All Rights Reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish, dis-
# tribute, sublicense, and/or sell copies of the Software, and to permit
# persons to whom the Software is furnished to do so, subject to the fol-
# lowing conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABIL-
# ITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
# SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
"""Package marker file."""
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Package marker file."""
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>OVERVIEW</B>
Access Control Lists (ACLs) allow you to control who can read and write
your data, and who can read and write the ACLs themselves.
If an ACL is not specified at the time an object is uploaded (e.g., via the
gsutil cp -a option), the object is created with the default object ACL set on
the bucket (see "gsutil help setdefacl"). You can replace the ACL on an object
or bucket using the gsutil setacl command (see "gsutil help setacl"), or
modify the existing ACL using the gsutil chacl command (see "gsutil help
chacl").
<B>BUCKET VS OBJECT ACLS</B>
In Google Cloud Storage, the bucket ACL works as follows:
- Users granted READ access are allowed to list the bucket contents.
- Users granted WRITE access are allowed READ access and also are
allowed to write and delete objects in that bucket -- including
overwriting previously written objects.
- Users granted FULL_CONTROL access are allowed WRITE access and also
are allowed to read and write the bucket's ACL.
The object ACL works as follows:
- Users granted READ access are allowed to read the object's data and
metadata.
- Users granted FULL_CONTROL access are allowed READ access and also
are allowed to read and write the object's ACL.
A couple of points that sometimes surprise users are worth noting:
1. There is no WRITE access for objects; attempting to set an ACL with WRITE
permission for an object will result in an error.
2. The bucket ACL plays no role in determining who can read objects; only the
object ACL matters for that purpose. This is different from how things
work in Linux file systems, where both the file and directory permission
control file read access. It also means, for example, that someone with
FULL_CONTROL over the bucket may not have read access to objects in
the bucket. This is by design, and supports useful cases. For example,
you might want to set up bucket ownership so that a small group of
administrators have FULL_CONTROL on the bucket (with the ability to
delete data to control storage costs), but not grant those users read
access to the object data (which might be sensitive data that should
only be accessed by a different specific group of users).
<B>CANNED ACLS</B>
The simplest way to set an ACL on a bucket or object is using a "canned
ACL". The available canned ACLs are:
project-private Gives permission to the project team based on their
roles. Anyone who is part of the team has READ
permission, and project owners and project editors
have FULL_CONTROL permission. This is the default
ACL for newly created buckets. This is also the
default ACL for newly created objects unless the
default object ACL for that bucket has been
changed. For more details see
"gsutil help projects".
private Gives the requester (and only the requester)
FULL_CONTROL permission for a bucket or object.
public-read Gives the requester FULL_CONTROL permission and
gives all users READ permission. When you apply
this to an object, anyone on the Internet can
read the object without authenticating.
public-read-write Gives the requester FULL_CONTROL permission and
gives all users READ and WRITE permission. This
ACL applies only to buckets.
authenticated-read Gives the requester FULL_CONTROL permission and
gives all authenticated Google account holders
READ permission.
bucket-owner-read Gives the requester FULL_CONTROL permission and
gives the bucket owner READ permission. This is
used only with objects.
bucket-owner-full-control Gives the requester FULL_CONTROL permission and
gives the bucket owner FULL_CONTROL
permission. This is used only with objects.
<B>ACL XML</B>
When you use a canned ACL, it is translated into an XML representation
that can later be retrieved and edited to specify more fine-grained
detail about who can read and write buckets and objects. By running
the gsutil getacl command you can retrieve the ACL XML, and edit it to
customize the permissions.
As an example, if you create an object in a bucket that has no default
object ACL set and then retrieve the ACL on the object, it will look
something like this:
<AccessControlList>
<Owner>
<ID>
00b4903a9740e42c29800f53bd5a9a62a2f96eb3f64a4313a115df3f3a776bf7
</ID>
</Owner>
<Entries>
<Entry>
<Scope type="GroupById">
<ID>
00b4903a9740e42c29800f53bd5a9a62a2f96eb3f64a4313a115df3f3a776bf7
</ID>
</Scope>
<Permission>
FULL_CONTROL
</Permission>
</Entry>
<Entry>
<Scope type="GroupById">
<ID>
00b4903a977fd817e9da167bc81306489181a110456bb635f466d71cf90a0d51
</ID>
</Scope>
<Permission>
FULL_CONTROL
</Permission>
</Entry>
<Entry>
<Scope type="GroupById">
<ID>
00b4903a974898cc8fc309f2f2835308ba3d3df1b889d3fc7e33e187d52d8e71
</ID>
</Scope>
<Permission>
READ
</Permission>
</Entry>
</Entries>
</AccessControlList>
The ACL consists of an Owner element and a collection of Entry elements,
each of which specifies a Scope and a Permission. Scopes are the way you
specify an individual or group of individuals, and Permissions specify what
access they're permitted.
This particular ACL grants FULL_CONTROL to two groups (which means members
of those groups are allowed to read the object and read and write the ACL),
and READ permission to a third group. The project groups are (in order)
the owners group, editors group, and viewers group.
The 64-digit hex identifiers used in this ACL are called canonical IDs,
and are used to identify predefined groups associated with the project that
owns the bucket. For more information about project groups, see "gsutil
help projects".
Here's an example of an ACL specified using the GroupByEmail and GroupByDomain
scopes:
<AccessControlList>
<Entries>
<Entry>
<Permission>
FULL_CONTROL
</Permission>
<Scope type="GroupByEmail">
<EmailAddress>travel-companion-owners@googlegroups.com</EmailAddress>
</Scope>
</Entry>
<Entry>
<Permission>
READ
</Permission>
<Scope type="GroupByDomain">
<Domain>example.com</Domain>
</Scope>
</Entry>
</Entries>
</AccessControlList>
This ACL grants members of an email group FULL_CONTROL, and grants READ
access to any user in a domain (which must be a Google Apps for Business
domain). By applying email group grants to a collection of objects
you can edit access control for large numbers of objects at once via
http://groups.google.com. That way, for example, you can easily and quickly
change access to a group of company objects when employees join and leave
your company (i.e., without having to individually change ACLs across
potentially millions of objects).
<B>SHARING SCENARIOS</B>
For more detailed examples of how to achieve various useful sharing use
cases, see https://developers.google.com/storage/docs/collaboration
""")
class CommandOptions(HelpProvider):
"""Additional help about Access Control Lists."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'acls',
# List of help name aliases.
HELP_NAME_ALIASES : ['acl', 'ACL', 'access control', 'access control list',
'authorization', 'canned', 'canned acl'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Working with Access Control Lists',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>OVERVIEW</B>
gsutil users can access publicly readable data without obtaining
credentials. For example, the gs://uspto-pair bucket contains a number
of publicly readable objects, so any user can run the following command
without first obtaining credentials:
gsutil ls gs://uspto-pair/applications/0800401*
Users can similarly download objects they find via the above gsutil ls
command.
If a user without credentials attempts to access protected data using gsutil,
they will be prompted to run "gsutil config" to obtain credentials.
See "gsutil help acls" for more details about data protection.
""")
class CommandOptions(HelpProvider):
"""Additional help about Access Control Lists."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'anon',
# List of help name aliases.
HELP_NAME_ALIASES : ['anonymous', 'public'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY :
'Accessing public data without obtaining credentials',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>SYNOPSIS</B>
Top-level gsutil Options
<B>DESCRIPTION</B>
gsutil supports separate options for the top-level gsutil command and
the individual sub-commands (like cp, rm, etc.). The top-level options
control gsutil behavior that applies across commands. For example, in
the command:
gsutil -m cp -p file gs://bucket/obj
the -m option applies to gsutil, while the -p option applies to the cp
sub-command.
<B>OPTIONS</B>
-d Shows HTTP requests/headers.
-D Shows HTTP requests/headers plus additional debug info needed when
posting support requests.
-DD Shows HTTP requests/headers plus additional debug info plus HTTP
upstream payload.
-h Allows you to specify additional HTTP headers, for example:
gsutil -h "Cache-Control:public,max-age=3600" \\
-h "Content-Type:text/html" cp ...
Note that you need to quote the headers/values that
contain spaces (such as "Content-Disposition: attachment;
filename=filename.ext"), to avoid having the shell split them
into separate arguments.
Note that because the -h option allows you to specify any HTTP
header, it is both powerful and potentially dangerous:
- It is powerful because it allows you to specify headers that
gsutil doesn't currently know about (e.g., to request
service features from a different storage service provider
than Google); or to override the values gsutil would normally
send with different values.
- It is potentially dangerous because you can specify headers
that cause gsutil to send invalid requests, or that in
other ways change the behavior of requests.
Thus, you should be sure you understand the underlying storage
service HTTP API (and what impact the headers you specify will
have) before using the gsutil -h option.
See also "gsutil help setmeta" for the ability to set metadata
fields on objects after they have been uploaded.
-m Causes supported operations (cp, mv, rm, setacl, setmeta) to run
in parallel. This can significantly improve performance if you are
uploading, downloading, moving, removing, or changing ACLs on
a large number of files over a fast network connection.
gsutil performs the specified operation using a combination of
multi-threading and multi-processing, using a number of threads
and processors determined by the parallel_thread_count and
parallel_process_count values set in the boto configuration
file. You might want to experiment with these values, as the
best value can vary based on a number of factors, including
network speed, number of CPUs, and available memory.
Using the -m option may make your performance worse if you
are using a slower network, such as the typical network speeds
offered by non-business home network plans.
If a download or upload operation using parallel transfer fails
before the entire transfer is complete (e.g. failing after 300 of
1000 files have been transferred), you will need to restart the
entire transfer.
-s Tells gsutil to use a simulated storage provider (for testing).
""")
class CommandOptions(HelpProvider):
"""Additional help about gsutil command-level options."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'options',
# List of help name aliases.
HELP_NAME_ALIASES : ['arg', 'args', 'cli', 'opt', 'opts'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'gsutil-level command line options',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>OVERVIEW</B>
We're open to incorporating gsutil code changes authored by users. Here
are some guidelines:
1. Before we can accept code submissions, we have to jump a couple of legal
hurdles. Please fill out either the individual or corporate Contributor
License Agreement:
- If you are an individual writing original source code and you're
sure you own the intellectual property,
then you'll need to sign an individual CLA
(http://code.google.com/legal/individual-cla-v1.0.html).
- If you work for a company that wants to allow you to contribute your
work to gsutil, then you'll need to sign a corporate CLA
(http://code.google.com/legal/corporate-cla-v1.0.html)
Follow either of the two links above to access the appropriate CLA and
instructions for how to sign and return it. Once we receive it, we'll
add you to the official list of contributors and be able to accept
your patches.
2. If you found a bug or have an idea for a feature enhancement, we suggest
you check http://code.google.com/p/gsutil/issues/list to see if it has
already been reported by another user. From there you can also add yourself
to the Cc list for an issue, so you will find out about any developments.
3. It's usually worthwhile to send email to gs-team@google.com about your
idea before sending actual code. Often we can discuss the idea and help
propose things that could save you later revision work.
4. We tend to avoid adding command line options that are of use to only
a very small fraction of users, especially if there's some other way
to accommodate such needs. Adding such options complicates the code and
also adds overhead to users having to read through an "alphabet soup"
list of option documentation.
5. While gsutil has a number of features specific to Google Cloud Storage,
it can also be used with other cloud storage providers. We're open to
including changes for making gsutil support features specific to other
providers, as long as those changes don't make gsutil work worse for Google
Cloud Storage. If you do make such changes we recommend including someone
with knowledge of the specific provider as a code reviewer (see below).
6. You can check out the gsutil code from svn - see
http://code.google.com/p/gsutil/source/checkout. Then change directories
into gsutil/src, and check out the boto code from github:
git clone git://github.com/boto/boto.git
7. Please make sure to run all tests against your modified code. To
do this, change directories into the gsutil top-level directory and run:
./gsutil test
The above tests take a long time to run because they send many requests to
the production service. The gsutil test command has a -u argument that will
only run unit tests. These run quickly, as they are executed with an
in-memory mock storage service implementation. To run only the unit tests,
run:
./gsutil test -u
If you made mods to boto, please run the boto tests. For these tests you
need to use HMAC credentials (from gsutil config -a), because the current
boto test suite doesn't import the OAuth2 handler. You'll also need to
install some python modules: change directories into the top-level gsutil
directory and run:
pip install -qr boto/requirements.txt
(You probably need to run this command using sudo.)
Make sure each of the individual installations succeeded. If they don't
you may need to run individual ones again, e.g.,
pip install unittest2
Then ensure your .boto file has HMAC credentials defined (the boto tests
don't load the OAUTH2 plugin), and then change directories into boto/tests
and run:
python test.py unit
python test.py -t s3 -t gs -t ssl
8. Please consider contributing test code for your change, especially if the
change impacts any of the core gsutil code (like the gsutil cp command).
9. When it's time to send us code, please use the Rietveld code review tool
rather than simply sending us a code patch. Do this as follows:
- check out the gsutil code at
http://code.google.com/p/gsutil/source/checkout and apply your changes
in the checked out directory.
- download the "upload.py" script from
http://code.google.com/p/rietveld/wiki/UploadPyUsage
- run upload.py from the above gsutil svn directory.
- click the codereview.appspot.com link it generates, click "Edit Issue",
and add mfschwartz@google.com as a reviewer, and Cc gs-team@google.com.
- click Publish+Mail Comments.
""")
class CommandOptions(HelpProvider):
"""Additional help about Access Control Lists."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'dev',
# List of help name aliases.
HELP_NAME_ALIASES : ['development', 'developer', 'code', 'mods',
'software'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Making modifications to gsutil',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>OVERVIEW OF METADATA</B>
Objects can have associated metadata, which control aspects of how
GET requests are handled, including Content-Type, Cache-Control,
Content-Disposition, and Content-Encoding (discussed in more detail in
the subsections below). In addition, you can set custom metadata that
can be used by applications (e.g., tagging that particular objects possess
some property).
There are two ways to set metadata on objects:
- at upload time you can specify one or more headers to associate with
objects, using the gsutil -h option. For example, the following command
would cause gsutil to set the Content-Type and Cache-Control for each
of the files being uploaded:
gsutil -h "Content-Type:text/html" -h "Cache-Control:public, max-age=3600" cp -r images gs://bucket/images
Note that -h is an option on the gsutil command, not the cp sub-command.
- You can set or remove metadata fields from already uploaded objects using
the gsutil setmeta command. See "gsutil help setmeta".
More details about specific pieces of metadata are discussed below.
<B>CONTENT TYPE</B>
The most commonly set metadata is Content-Type (also known as MIME type),
which allows browsers to render the object properly.
gsutil sets the Content-Type
automatically at upload time, based on each filename extension. For
example, uploading files with names ending in .txt will set Content-Type
to text/plain. If you're running gsutil on Linux or MacOS and would prefer
to have content type set based on naming plus content examination, see the
use_magicfile configuration variable in the gsutil/boto configuration file
(See also "gsutil help config"). In general, using use_magicfile is more
robust and configurable, but is not available on Windows.
If you specify a -h header when uploading content (like the example gsutil
command given in the previous section), it overrides the Content-Type that
would have been set based on filename extension or content. This can be
useful if the Content-Type detection algorithm doesn't work as desired
for some of your files.
You can also completely suppress content type detection in gsutil, by
specifying an empty string on the Content-Type header:
gsutil -h 'Content-Type:' cp -r images gs://bucket/images
In this case, the Google Cloud Storage service will attempt to detect
the content type. In general this approach will work better than using
filename extension-based content detection in gsutil, because the list of
filename extensions is kept more current in the server-side content detection
system than in the Python library upon which gsutil content type detection
depends. (For example, at the time of writing this, the filename extension
".webp" was recognized by the server-side content detection system, but
not by gsutil.)
<B>CACHE-CONTROL</B>
Another commonly set piece of metadata is Cache-Control, which allows
you to control whether and for how long browser and Internet caches are
allowed to cache your objects. Cache-Control only applies to objects with
a public-read ACL. Non-public data are not cacheable.
Here's an example of uploading an object set to allow caching:
gsutil -h "Cache-Control:public,max-age=3600" cp -a public-read -r html gs://bucket/html
This command would upload all files in the html directory (and subdirectories)
and make them publicly readable and cacheable, with cache expiration of
one hour.
Note that if you allow caching, at download time you may see older versions
of objects after uploading a newer replacement object. Note also that because
objects can be cached at various places on the Internet there is no way to
force a cached object to expire globally (unlike the way you can force your
browser to refresh its cache).
<B>CONTENT-ENCODING</B>
You could specify Content-Encoding to indicate that an object is compressed,
using a command like:
gsutil -h "Content-Encoding:gzip" cp *.gz gs://bucket/compressed
Note that Google Cloud Storage does not compress or decompress objects. If
you use this header to specify a compression type or compression algorithm
(for example, deflate), Google Cloud Storage preserves the header but does
not compress or decompress the object. Instead, you need to ensure that
the files have been compressed using the specified Content-Encoding before
using gsutil to upload them.
For compressible content, using Content-Encoding:gzip saves network and
storage costs, and improves content serving performance (since most browsers
are able to decompress objects served this way).
Note also that gsutil provides an easy way to cause content to be compressed
and stored with Content-Encoding:gzip: see the -z option in "gsutil help cp".
<B>CONTENT-DISPOSITION</B>
You can set Content-Disposition on your objects, to specify presentation
information about the data being transmitted. Here's an example:
gsutil -h 'Content-Disposition:attachment; filename=filename.ext' \\
cp -r attachments gs://bucket/attachments
Setting the Content-Disposition allows you to control presentation style
of the content, for example determining whether an attachment should be
automatically displayed vs should require some form of action from the user to
open it. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1
for more details about the meaning of Content-Disposition.
<B>CUSTOM METADATA</B>
You can add your own custom metadata (e.g., for use by your application)
to an object by setting a header that starts with "x-goog-meta", for example:
gsutil -h x-goog-meta-reviewer:jane cp mycode.java gs://bucket/reviews
You can add multiple differently named custom metadata fields to each object.
<B>SETTABLE FIELDS; FIELD VALUES</B>
You can't set some metadata fields, such as ETag and Content-Length. The
fields you can set are:
- Cache-Control
- Content-Disposition
- Content-Encoding
- Content-Language
- Content-MD5
- Content-Type
- Any field starting with X-GOOG-META- (i.e., custom metadata).
Header names are case-insensitive.
X-GOOG-META- fields can have data set to arbitrary Unicode values. All
other fields must have ASCII values.
<B>VIEWING CURRENTLY SET METADATA</B>
You can see what metadata is currently set on an object by using:
gsutil ls -L gs://the_bucket/the_object
""")
class CommandOptions(HelpProvider):
"""Additional help about object metadata."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'metadata',
# List of help name aliases.
HELP_NAME_ALIASES : ['cache-control', 'caching', 'content type',
'mime type', 'mime', 'type'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Working with object metadata',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>BUCKET NAME REQUIREMENTS</B>
Google Cloud Storage has a single namespace, so you will not be allowed
to create a bucket with a name already in use by another user. You can,
however, carve out parts of the bucket name space corresponding to your
company's domain name (see "DOMAIN NAMED BUCKETS").
Bucket names must conform to standard DNS naming conventions. This is
because a bucket name can appear in a DNS record as part of a CNAME
redirect. In addition to meeting DNS naming requirements, Google Cloud
Storage imposes other requirements on bucket naming. At a minimum, your
bucket names must meet the following requirements:
- Bucket names must contain only lowercase letters, numbers, dashes (-), and
dots (.).
- Bucket names must start and end with a number or letter.
- Bucket names must contain 3 to 63 characters. Names containing dots can
contain up to 222 characters, but each dot-separated component can be
no longer than 63 characters.
- Bucket names cannot be represented as an IPv4 address in dotted-decimal
notation (for example, 192.168.5.4).
- Bucket names cannot begin with the "goog" prefix.
- For DNS compliance, you should not have a period adjacent to another
period or dash. For example, ".." or "-." or ".-" are not acceptable.
<B>OBJECT NAME REQUIREMENTS</B>
Object names can contain any sequence of Unicode characters, of length 1-1024
bytes when UTF-8 encoded. Object names must not contain CarriageReturn,
CarriageReturnLineFeed, or the XML-disallowed surrogate blocks (xFFFE
or xFFFF).
We highly recommend that you avoid using control characters that are illegal
in XML 1.0 in your object names. These characters will cause XML listing
issues when you try to list your objects.
<B>DOMAIN NAMED BUCKETS</B>
You can carve out parts of the Google Cloud Storage bucket name space
by creating buckets with domain names (like "example.com").
Before you can create a bucket name containing one or more '.' characters,
the following rules apply:
- If the name is a syntactically valid DNS name ending with a
currently-recognized top-level domain (such as .com), you will be required
to verify domain ownership.
- Otherwise you will be disallowed from creating the bucket.
If your project needs to use a domain-named bucket, you need to have
a team member both verify the domain and create the bucket. This is
because Google Cloud Storage checks for domain ownership against the
user who creates the bucket, so the user who creates the bucket must
also be verified as an owner or manager of the domain.
To verify as the owner or manager of a domain, use the Google Webmaster
Tools verification process. The Webmaster Tools verification process
provides three methods for verifying an owner or manager of a domain:
1. Adding a special Meta tag to a site's homepage.
2. Uploading a special HTML file to a site.
3. Adding a DNS TXT record to a domain's DNS configuration.
Meta tag verification and HTML file verification are easier to perform and
are probably adequate for most situations. DNS TXT record verification is
a domain-based verification method that is useful in situations where a
site wants to tightly control who can create domain-named buckets. Once
a site creates a DNS TXT record to verify ownership of a domain, it takes
precedence over meta tag and HTML file verification. For example, you might
have two IT staff members who are responsible for managing your site, called
"example.com." If they complete the DNS TXT record verification, only they
would be able to create buckets called "example.com", "reports.example.com",
"downloads.example.com", and other domain-named buckets.
Site-Based Verification
If you have administrative control over the HTML files that make up a site,
you can use one of the site-based verification methods to verify that you
control or own a site. When you do this, Google Cloud Storage lets you
create buckets representing the verified site and any sub-sites - provided
nobody has used the DNS TXT record method to verify domain ownership of a
parent of the site.
As an example, assume that nobody has used the DNS TXT record method to verify
ownership of the following domains: abc.def.example.com, def.example.com,
and example.com. In this case, Google Cloud Storage lets you create a bucket
named abc.def.example.com if you verify that you own or control any of the
following sites:
http://abc.def.example.com
http://def.example.com
http://example.com
Domain-Based Verification
If you have administrative control over a domain's DNS configuration, you can
use the DNS TXT record verification method to verify that you own or control a
domain. When you use the domain-based verification method to verify that you
own or control a domain, Google Cloud Storage lets you create buckets that
represent any subdomain under the verified domain. Furthermore, Google Cloud
Storage prevents anybody else from creating buckets under that domain unless
you add their name to the list of verified domain owners or they have verified
their domain ownership by using the DNS TXT record verification method.
For example, if you use the DNS TXT record verification method to verify your
ownership of the domain example.com, Google Cloud Storage will let you create
bucket names that represent any subdomain under the example.com domain, such
as abc.def.example.com, example.com/music/jazz, or abc.example.com/music/jazz.
Using the DNS TXT record method to verify domain ownership supersedes
verification by site-based verification methods. For example, if you
use the Meta tag method or HTML file method to verify domain ownership
of http://example.com, but someone else uses the DNS TXT record method
to verify ownership of the example.com domain, Google Cloud Storage will
not allow you to create a bucket named example.com. To create the bucket
example.com, the domain owner who used the DNS TXT method to verify domain
ownership must add you to the list of verified domain owners for example.com.
The DNS TXT record verification method is particularly useful if you manage
a domain for a large organization that has numerous subdomains because it
lets you control who can create buckets representing those domain names.
Note: If you use the DNS TXT record verification method to verify ownership of
a domain, you cannot create a CNAME record for that domain. RFC 1034 disallows
inclusion of any other resource records if there is a CNAME resource record
present. If you want to create a CNAME resource record for a domain, you must
use the Meta tag verification method or the HTML file verification method.
""")
class CommandOptions(HelpProvider):
"""Additional help about gsutil object and bucket naming."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'naming',
# List of help name aliases.
HELP_NAME_ALIASES : ['domain', 'limits', 'name', 'names'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Object and bucket naming',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>OVERVIEW</B>
If you use gsutil in large production tasks (such as uploading or
downloading many GB of data each night), there are a number of things
you can do to help ensure success. Specifically, this section discusses
how to script large production tasks around gsutil's resumable transfer
mechanism.
<B>BACKGROUND ON RESUMABLE TRANSFERS</B>
First, it's helpful to understand gsutil's resumable transfer mechanism,
and how your script needs to be implemented around this mechanism to work
reliably. gsutil uses the resumable transfer support in the boto library
when you attempt to upload or download a file larger than a configurable
threshold (by default, this threshold is 1MB). When a transfer fails
partway through (e.g., because of an intermittent network problem),
boto uses a randomized binary exponential backoff-and-retry strategy:
wait a random period between [0..1] seconds and retry; if that fails,
wait a random period between [0..2] seconds and retry; and if that
fails, wait a random period between [0..4] seconds, and so on, up to a
configurable number of times (the default is 6 times). Thus, the retry
actually spans a randomized period up to 1+2+4+8+16+32=63 seconds.
If the transfer fails each of these attempts with no intervening
progress, gsutil gives up on the transfer, but keeps a "tracker" file
for it in a configurable location (the default location is ~/.gsutil/,
in a file named by a combination of the SHA1 hash of the name of the
bucket and object being transferred and the last 16 characters of the
file name). When transfers fail in this fashion, you can rerun gsutil
at some later time (e.g., after the networking problem has been
resolved), and the resumable transfer picks up where it left off.
<B>SCRIPTING DATA TRANSFER TASKS</B>
To script large production data transfer tasks around this mechanism,
you can implement a script that runs periodically, determines which file
transfers have not yet succeeded, and runs gsutil to copy them. Below,
we offer a number of suggestions about how this type of scripting should
be implemented:
1. When resumable transfers fail without any progress 6 times in a row
over the course of up to 63 seconds, it probably won't work to simply
retry the transfer immediately. A more successful strategy would be to
have a cron job that runs every 30 minutes, determines which transfers
need to be run, and runs them. If the network experiences intermittent
problems, the script picks up where it left off and will eventually
succeed (once the network problem has been resolved).
2. If your business depends on timely data transfer, you should consider
implementing some network monitoring. For example, you can implement
a task that attempts a small download every few minutes and raises an
alert if the attempt fails for several attempts in a row (or more or less
frequently depending on your requirements), so that your IT staff can
investigate problems promptly. As usual with monitoring implementations,
you should experiment with the alerting thresholds, to avoid false
positive alerts that cause your staff to begin ignoring the alerts.
3. There are a variety of ways you can determine what files remain to be
transferred. We recommend that you avoid attempting to get a complete
listing of a bucket containing many objects (e.g., tens of thousands
or more). One strategy is to structure your object names in a way that
represents your transfer process, and use gsutil prefix wildcards to
request partial bucket listings. For example, if your periodic process
involves downloading the current day's objects, you could name objects
using a year-month-day-object-ID format and then find today's objects by
using a command like gsutil ls gs://bucket/2011-09-27-*. Note that it
is more efficient to have a non-wildcard prefix like this than to use
something like gsutil ls gs://bucket/*-2011-09-27. The latter command
actually requests a complete bucket listing and then filters in gsutil,
while the former asks Google Storage to return the subset of objects
whose names start with everything up to the *.
For data uploads, another technique would be to move local files from a "to
be processed" area to a "done" area as your script successfully copies files
to the cloud. You can do this in parallel batches by using a command like:
gsutil -m cp -R to_upload/subdir_$i gs://bucket/subdir_$i
where i is a shell loop variable. Make sure to check the shell $status
variable is 0 after each gsutil cp command, to detect if some of the copies
failed, and rerun the affected copies.
With this strategy, the file system keeps track of all remaining work to
be done.
4. If you have really large numbers of objects in a single bucket
(say hundreds of thousands or more), you should consider tracking your
objects in a database instead of using bucket listings to enumerate
the objects. For example this database could track the state of your
downloads, so you can determine what objects need to be downloaded by
your periodic download script by querying the database locally instead
of performing a bucket listing.
5. Make sure you don't delete partially downloaded files after a transfer
fails: gsutil picks up where it left off (and performs an MD5 check of
the final downloaded content to ensure data integrity), so deleting
partially transferred files will cause you to lose progress and make
more wasteful use of your network. You should also make sure whatever
process is waiting to consume the downloaded data doesn't get pointed
at the partially downloaded files. One way to do this is to download
into a staging directory and then move successfully downloaded files to
a directory where consumer processes will read them.
6. If you have a fast network connection, you can speed up the transfer of
large numbers of files by using the gsutil -m (multi-threading /
multi-processing) option. Be aware, however, that gsutil doesn't attempt to
keep track of which files were downloaded successfully in cases where some
files failed to download. For example, if you use multi-threaded transfers
to download 100 files and 3 failed to download, it is up to your scripting
process to determine which transfers didn't succeed, and retry them. A
periodic check-and-run approach like the one outlined earlier would handle this case.
If you use parallel transfers (gsutil -m) you might want to experiment with
the number of threads being used (via the parallel_thread_count setting
in the .boto config file). By default, gsutil uses 24 threads. Depending
on your network speed, available memory, CPU load, and other conditions,
this may or may not be optimal. Try experimenting with higher or lower
numbers of threads, to find the best number of threads for your environment.
""")
class CommandOptions(HelpProvider):
"""Additional help about using gsutil for production tasks."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'prod',
# List of help name aliases.
HELP_NAME_ALIASES : ['production', 'resumable', 'resumable upload',
'resumable transfer', 'resumable download',
'scripts', 'scripting'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Scripting production data transfers with gsutil',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>OVERVIEW</B>
This section discusses how to work with projects in Google Cloud Storage.
For more information about using the Google APIs Console to administer
project memberships (which are automatically included in ACLs for buckets
you create) see https://code.google.com/apis/console#:storage:access.
<B>PROJECT MEMBERS AND PERMISSIONS</B>
There are three groups of users associated with each project:
- Project Owners are allowed to list, create, and delete buckets,
and can also perform administrative tasks like adding and removing team
members and changing billing. The project owners group is the owner
of all buckets within a project, regardless of who may be the original
bucket creator.
- Project Editors are allowed to list, create, and delete buckets.
- All Project Team Members are allowed to list buckets within a project.
These projects make it easy to set up a bucket and start uploading objects
with access control appropriate for a project at your company, as the three
group memberships can be configured by your administrative staff. Control
over projects and their associated memberships is provided by the Google
APIs Console (https://code.google.com/apis/console).
<B>HOW PROJECT MEMBERSHIP IS REFLECTED IN BUCKET ACLS</B>
When you create a bucket without specifying an ACL the bucket is given a
"project-private" ACL, which grants the permissions described in the previous
section. Here's an example of such an ACL:
<AccessControlList>
<Owner>
<ID>
00b4903a9740e42c29800f53bd5a9a62a2f96eb3f64a4313a115df3f3a776bf7
</ID>
</Owner>
<Entries>
<Entry>
<Scope type="GroupById">
<ID>
00b4903a9740e42c29800f53bd5a9a62a2f96eb3f64a4313a115df3f3a776bf7
</ID>
</Scope>
<Permission>
FULL_CONTROL
</Permission>
</Entry>
<Entry>
<Scope type="GroupById">
<ID>
00b4903a977fd817e9da167bc81306489181a110456bb635f466d71cf90a0d51
</ID>
</Scope>
<Permission>
FULL_CONTROL
</Permission>
</Entry>
<Entry>
<Scope type="GroupById">
<ID>
00b4903a974898cc8fc309f2f2835308ba3d3df1b889d3fc7e33e187d52d8e71
</ID>
</Scope>
<Permission>
READ
</Permission>
</Entry>
</Entries>
</AccessControlList>
The three "GroupById" scopes are the canonical IDs for the Project Owners,
Project Editors, and All Project Team Members groups.
You can edit the bucket ACL if you want to (see "gsutil help setacl"),
but for many cases you'll never need to, and instead can change group
membership via the APIs console.
<B>IDENTIFYING PROJECTS WHEN CREATING AND LISTING BUCKETS</B>
When you create a bucket or list your buckets, you need to provide the ID
of the project in which you want to create or list buckets (using the gsutil
mb -p option or
the gsutil ls -p option, respectively). The project's name shown in the
Google APIs Console is a user-friendly name that you can choose; this is
not the project ID required by the gsutil mb and ls commands. To find the
project ID, go to the Storage Access pane in the Google APIs Console. Your
project ID is listed under Identifying your project.
""")
class CommandOptions(HelpProvider):
"""Additional help about Access Control Lists."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'projects',
# List of help name aliases.
HELP_NAME_ALIASES : ['apis console', 'console', 'dev console', 'project',
'proj', 'project-id'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Working with projects',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>OVERVIEW</B>
This section provides details about how subdirectories work in gsutil.
Most users probably don't need to know these details, and can simply use
the commands (like cp -R) that work with subdirectories. We provide this
additional documentation to help users understand how gsutil handles
subdirectories differently than most GUI / web-based tools (e.g., why
those other tools create "dir_$folder$" objects), and also to explain cost and
performance implications of the gsutil approach, for those interested in such
details.
gsutil provides the illusion of a hierarchical file tree atop the "flat"
name space supported by the Google Cloud Storage service. To the service,
the object gs://bucket/abc/def/ghi.txt is just an object that happens to have
"/" characters in its name. There are no "abc" or "abc/def" directories;
just a single object with the given name.
gsutil achieves the hierarchical file tree illusion by applying a variety of
rules, to try to make naming work the way users would expect. For example, in
order to determine whether to treat a destination URI as an object name or the
root of a directory under which objects should be copied, gsutil uses these
rules:
1. If the destination object ends with a "/", gsutil treats it as a directory.
For example, if you run the command:
gsutil cp file gs://bucket/abc/
gsutil will create the object gs://bucket/abc/file.
2. If you attempt to copy multiple source files to a destination URI, gsutil
treats the destination URI as a directory. For example, if you run
the command:
gsutil cp -R dir gs://bucket/abc
gsutil will create objects like gs://bucket/abc/dir/file1, etc. (assuming
file1 is a file under the source dir).
3. If neither of the above rules applies, gsutil performs a bucket listing to
determine if the target of the operation is a prefix match to the
specified string. For example, if you run the command:
gsutil cp file gs://bucket/abc
gsutil will make a bucket listing request for the named bucket, using
delimiter="/" and prefix="abc". It will then examine the bucket listing
results and determine whether there are objects in the bucket whose path
starts with gs://bucket/abc/, to determine whether to treat the target as
an object name or a directory name. In turn this impacts the name of the
object you create: If the above check indicates there is an "abc" directory
you will end up with the object gs://bucket/abc/file; otherwise you will
end up with the object gs://bucket/abc. (See "HOW NAMES ARE CONSTRUCTED"
under "gsutil help cp" for more details.)
This rule-based approach stands in contrast to the way many tools work, which
create objects to mark the existence of folders (such as "dir_$folder$").
gsutil understands several conventions used by such tools but does not
require such marker objects to implement naming behavior consistent with
UNIX commands.
A downside of the gsutil approach is that it requires an extra bucket listing
before performing the needed cp or mv command. However those listings are
relatively inexpensive, because they use delimiter and prefix parameters to
limit result data. Moreover, gsutil makes only one bucket listing request
per cp/mv command, and thus amortizes the bucket listing cost across all
transferred objects (e.g., when performing a recursive copy of a directory
to the cloud).
""")
class CommandOptions(HelpProvider):
"""Additional help about subdirectory handling in gsutil."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'subdirs',
# List of help name aliases.
HELP_NAME_ALIASES : ['dirs', 'directory', 'directories', 'folder',
'folders', 'hierarchy', 'subdir', 'subdirectory',
'subdirectories'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'How subdirectories work in gsutil',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>TECHNICAL SUPPORT</B>
If you have any questions or encounter any problems with Google Cloud Storage,
please first read the FAQ at https://developers.google.com/storage/docs/faq.
If you still need help, please post your question to the gs-discussion forum
(https://developers.google.com/storage/forum) or to Stack Overflow with the
Google Cloud Storage tag
(http://stackoverflow.com/questions/tagged/google-cloud-storage). Our support
team actively monitors these forums and we'll do our best to respond. To help
us diagnose any issues you encounter, please provide these details in addition
to the description of your problem:
- The resource you are attempting to access (bucket name, object name)
- The operation you attempted (GET, PUT, etc.)
- The time and date (including timezone) at which you encountered the problem
- The tool or library you use to interact with Google Cloud Storage
- If you can use gsutil to reproduce your issue, specify the -D option to
display your request's HTTP details. Provide these details with your post
to the forum as they can help us further troubleshoot your issue.
Warning: The gsutil -D, -d, and -DD options will also print the authentication
header with authentication credentials for your Google Cloud Storage account.
Make sure to remove any "Authorization:" headers before you post HTTP details
to the forum.
If you make any local modifications to gsutil, please make sure to use
a released copy of gsutil (instead of your locally modified copy) when
providing the gsutil -D output noted above. We cannot support versions
of gsutil that include local modifications. (However, we're open to user
contributions; see "gsutil help dev".)
As an alternative to posting to the gs-discussion forum, we also
actively monitor http://stackoverflow.com for questions tagged with
"google-cloud-storage".
<B>BILLING AND ACCOUNT QUESTIONS</B>
For questions about billing or account issues, please visit
http://code.google.com/apis/console-help/#billing. If you want to cancel
billing, you can do so on the Billing pane of the Google APIs Console. For
more information, see
http://code.google.com/apis/console-help/#BillingCancelled. Caution: When you
disable billing, you also disable the Google Cloud Storage service. Make sure
you want to disable the Google Cloud Storage service before you disable
billing.
""")
class CommandOptions(HelpProvider):
"""Additional help about tech and billing support."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'support',
# List of help name aliases.
HELP_NAME_ALIASES : ['techsupport', 'tech support', 'technical support',
'billing', 'faq', 'questions'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'How to get Google Cloud Storage support',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>OVERVIEW</B>
Versioning-enabled buckets maintain an archive of objects, providing a way to
un-delete data that you accidentally deleted, or to retrieve older versions of
your data. You can turn versioning on or off for a bucket at any time. Turning
versioning off leaves existing object versions in place, and simply causes the
bucket to stop accumulating new object versions. In this case, if you upload
to an existing object the current version is overwritten instead of creating
a new version.
Regardless of whether you have enabled versioning on a bucket, every object
has two associated positive integer fields:
- the generation, which is updated when the content of an object is
overwritten.
- the meta-generation, which identifies the metadata generation. It starts
at 1; is updated every time the metadata (e.g., ACL or Content-Type) for a
given content generation is updated; and gets reset when the generation
number changes.
Of these two integers, only the generation is used when working with versioned
data. Both generation and meta-generation can be used with concurrency control
(discussed in a later section).
To work with object versioning in gsutil, you can use a flavor of storage URIs
that embed the object generation, which we refer to as version-specific URIs.
For example, the version-less object URI:
gs://bucket/object
might have two versions, with these version-specific URIs:
gs://bucket/object#1360383693690000
gs://bucket/object#1360383802725000
The following sections discuss how to work with versioning and concurrency
control.
<B>OBJECT VERSIONING</B>
You can view, enable, and disable object versioning on a bucket using
the getversioning and setversioning commands. For example:
gsutil setversioning on gs://bucket
will enable versioning for the named bucket. See 'gsutil help getversioning'
and 'gsutil help setversioning' for additional details.
To see all object versions in a versioning-enabled bucket along with
their generation.meta-generation information, use gsutil ls -a:
gsutil ls -a gs://bucket
You can also specify particular objects for which you want to find the
version-specific URI(s), or you can use wildcards:
gsutil ls -a gs://bucket/object1 gs://bucket/images/*.jpg
The generation values form a monotonically increasing sequence as you create
additional object versions. Because of this, the latest object version is
always the last one listed in the gsutil ls output for a particular object.
For example, if a bucket contains these three versions of gs://bucket/object:
gs://bucket/object#1360035307075000
gs://bucket/object#1360101007329000
gs://bucket/object#1360102216114000
then gs://bucket/object#1360102216114000 is the latest version and
gs://bucket/object#1360035307075000 is the oldest available version.
If you specify version-less URIs with gsutil, you will operate on the
latest not-deleted version of an object, for example:
gsutil cp gs://bucket/object ./dir
or
gsutil rm gs://bucket/object
To operate on a specific object version, use a version-specific URI.
For example, suppose the output of the above gsutil ls -a command is:
gs://bucket/object#1360035307075000
gs://bucket/object#1360101007329000
In this case, the command:
gsutil cp gs://bucket/object#1360035307075000 ./dir
will retrieve the second most recent version of the object.
Note that version-specific URIs cannot be the target of the gsutil cp
command (trying to do so will result in an error), because writing to a
versioned object always creates a new version.
If an object has been deleted, it will not show up in a normal gsutil ls
listing (i.e., ls without the -a option). You can restore a deleted object by
running gsutil ls -a to find the available versions, and then copying one of
the version-specific URIs to the version-less URI, for example:
gsutil cp gs://bucket/object#1360101007329000 gs://bucket/object
Note that when you do this it creates a new object version, which will incur
additional charges. You can get rid of the extra copy by deleting the older
version-specific object:
gsutil rm gs://bucket/object#1360101007329000
Or you can combine the two steps by using the gsutil mv command:
gsutil mv gs://bucket/object#1360101007329000 gs://bucket/object
If you want to remove all versions of an object use the gsutil rm -a option:
gsutil rm -a gs://bucket/object
Note that there is no limit to the number of older versions of an object you
will create if you continue to upload to the same object in a versioning-
enabled bucket. It is your responsibility to delete versions beyond the ones
you want to retain.
<B>CONCURRENCY CONTROL</B>
If you are building an application using Google Cloud Storage, you may need to
be careful about concurrency control. Normally gsutil itself isn't used for
this purpose, but it's possible to write scripts around gsutil that perform
concurrency control.
For example, suppose you want to implement a "rolling update" system using
gsutil, where a periodic job computes some data and uploads it to the cloud.
On each run, the job starts with the data that it computed on the last run, and
computes a new value. To make this system robust, you need to have multiple
machines on which the job can run, which raises the possibility that two
simultaneous runs could attempt to update an object at the same time. This
leads to the following potential race condition:
- job 1 computes the new value to be written
- job 2 computes the new value to be written
- job 2 writes the new value
- job 1 writes the new value
In this case, the value that job 1 read is no longer current by the time
it goes to write the updated object, and writing at this point would result
in stale (or, depending on the application, corrupt) data.
To prevent this, you can find the version-specific name of the object that was
created, and then use the information contained in that URI to specify an
x-goog-if-generation-match header on a subsequent gsutil cp command. You can
do this in two steps. First, use the gsutil cp -v option at upload time to get
the version-specific name of the object that was created, for example:
gsutil cp -v file gs://bucket/object
might output:
Created: gs://bucket/object#1360432179236000
You can extract the generation value from this URI and then construct a
subsequent gsutil command like this:
gsutil -h x-goog-if-generation-match:1360432179236000 cp newfile \\
gs://bucket/object
This command requests Google Cloud Storage to attempt to upload newfile,
but to fail the request if the generation of gs://bucket/object that is live
at the time of the upload does not match the one specified.
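As an illustrative sketch (not part of gsutil itself), a wrapper script could
issue the guarded upload and let the caller retry when the precondition fails.
The helper name below is hypothetical; it only assumes gsutil is on your PATH:

  import subprocess

  def guarded_upload(local_file, dest_uri, expected_generation):
    # Ask the service to reject the write unless the live generation of
    # dest_uri still equals expected_generation.
    cmd = ['gsutil',
           '-h', 'x-goog-if-generation-match:%s' % expected_generation,
           'cp', local_file, dest_uri]
    # gsutil exits non-zero if the precondition (or anything else) fails;
    # the caller can then re-read the current generation and retry.
    return subprocess.call(cmd) == 0

A real script would also need a way to obtain the current generation before
each retry, for example by parsing the version-specific URIs shown by
gsutil ls -a.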
If the command you use updates object metadata, you will need to find the
current meta_generation for an object. To do this, use the gsutil ls -a and
-l options. For example, the command:
gsutil ls -l -a gs://bucket/object
will output something like:
64 2013-02-12T19:59:13 gs://bucket/object#1360699153986000 meta_generation=3
1521 2013-02-13T02:04:08 gs://bucket/object#1360721048778000 meta_generation=2
Given this information, you could use the following command to request setting
the ACL on the older version of the object, such that the command will fail
unless that is the current version of the data+metadata:
gsutil -h x-goog-if-generation-match:1360699153986000 -h \\
x-goog-if-metageneration-match:3 setacl public-read \\
gs://bucket/object#1360699153986000
Without adding these headers, the update would simply overwrite the existing
ACL. Note that in contrast, the gsutil chacl command uses these headers
automatically, because it performs a read-modify-write cycle in order to edit
ACLs.
If you want to experiment with how generations and metagenerations work, try
the following. First, upload an object; then use gsutil ls -l -a to list all
versions of the object, along with each version's meta_generation; then re-
upload the object and repeat the gsutil ls -l -a. You should see two object
versions, each with meta_generation=1. Now try setting the ACL, and rerun the
gsutil ls -l -a. You should see the most recent object generation now has
meta_generation=2.
<B>FOR MORE INFORMATION</B>
For more details on how to use versioning and preconditions, see
https://developers.google.com/storage/docs/object-versioning
""")
class CommandOptions(HelpProvider):
"""Additional help about object versioning."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'versioning',
# List of help name aliases.
HELP_NAME_ALIASES : ['concurrency', 'concurrency control', 'versioning',
'versions'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Working with object versions; concurrency control',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>DESCRIPTION</B>
gsutil supports URI wildcards. For example, the command:
gsutil cp gs://bucket/data/abc* .
will copy all objects that start with gs://bucket/data/abc followed by any
number of characters within that subdirectory.
<B>DIRECTORY BY DIRECTORY VS RECURSIVE WILDCARDS</B>
The "*" wildcard only matches up to the end of a path within
a subdirectory. For example, if bucket contains objects
named gs://bucket/data/abcd, gs://bucket/data/abcdef,
and gs://bucket/data/abcxyx, as well as an object in a sub-directory
(gs://bucket/data/abc/def) the above gsutil cp command would match the
first 3 object names but not the last one.
If you want matches to span directory boundaries, use a '**' wildcard:
gsutil cp gs://bucket/data/abc** .
will match all four objects above.
Note that gsutil supports the same wildcards for both objects and file names.
Thus, for example:
gsutil cp data/abc* gs://bucket
will match all names in the local file system that start with data/abc. Most
command shells also support wildcarding, so if you run the above command, your
shell is probably expanding the matches before running gsutil. However, most
shells do not support recursive wildcards ('**'); you can make gsutil's
wildcarding work with such shells by single-quoting the arguments so they are
not interpreted by the shell before being passed to gsutil:
gsutil cp 'data/abc**' gs://bucket
<B>BUCKET WILDCARDS</B>
You can specify wildcards for bucket names. For example:
gsutil ls gs://data*.example.com
will list the contents of all buckets whose name starts with "data" and
ends with ".example.com".
You can also combine bucket and object name wildcards. For example this
command will remove all ".txt" files in any of your Google Cloud Storage
buckets:
gsutil rm gs://*/**.txt
<B>OTHER WILDCARD CHARACTERS</B>
In addition to '*', you can use these wildcards:
? Matches a single character. For example "gs://bucket/??.txt"
only matches objects with two characters followed by .txt.
[chars] Match any of the specified characters. For example
"gs://bucket/[aeiou].txt" matches objects that contain a single vowel
character followed by .txt
[char range] Match any of the range of characters. For example
"gs://bucket/[a-m].txt" matches objects that contain letters
a, b, c, ... or m, and end with .txt.
You can combine wildcards to provide more powerful matches, for example:
gs://bucket/[a-m]??.j*g
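For example, this pattern would match object names like gs://bucket/b23.jpg or
gs://bucket/m01.jpeg (a letter from a through m, two more characters, then
".j", any characters, and a final "g"), assuming such objects exist.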
<B>EFFICIENCY CONSIDERATION: USING WILDCARDS OVER MANY OBJECTS</B>
It is more efficient, faster, and less network traffic-intensive
to use wildcards that have a non-wildcard object-name prefix, like:
gs://bucket/abc*.txt
than it is to use wildcards as the first part of the object name, like:
gs://bucket/*abc.txt
This is because the request for "gs://bucket/abc*.txt" asks the server
to send back the subset of results whose object names start with "abc",
and then gsutil filters the result list for objects whose name ends with
".txt". In contrast, "gs://bucket/*abc.txt" asks the server for the complete
list of objects in the bucket and then filters for those objects whose name
ends with "abc.txt". This efficiency consideration becomes increasingly
noticeable when you use buckets containing thousands or more objects. It is
sometimes possible to set up the names of your objects to fit with expected
wildcard matching patterns, to take advantage of the efficiency of doing
server-side prefix requests. See, for example "gsutil help prod" for a
concrete use case example.
<B>EFFICIENCY CONSIDERATION: USING MID-PATH WILDCARDS</B>
Suppose you have a bucket with these objects:
gs://bucket/obj1
gs://bucket/obj2
gs://bucket/obj3
gs://bucket/obj4
gs://bucket/dir1/obj5
gs://bucket/dir2/obj6
If you run the command:
gsutil ls gs://bucket/*/obj5
gsutil will perform a /-delimited top-level bucket listing and then one bucket
listing for each subdirectory, for a total of 3 bucket listings:
GET /bucket/?delimiter=/
GET /bucket/?prefix=dir1/obj5&delimiter=/
GET /bucket/?prefix=dir2/obj5&delimiter=/
The more bucket listings your wildcard requires, the slower and more expensive
it will be. The number of bucket listings required grows as:
- the number of wildcard components (e.g., "gs://bucket/a??b/c*/*/d"
has 3 wildcard components);
- the number of subdirectories that match each component; and
- the number of results (pagination is implemented using one GET
request per 1000 results, specifying markers for each).
If you want to use a mid-path wildcard, you might try instead using a
recursive wildcard, for example:
gsutil ls gs://bucket/**/obj5
This will match more objects than gs://bucket/*/obj5 (since it spans
directories), but is implemented using a delimiter-less bucket listing
request (which means fewer bucket requests, though it will list the entire
bucket and filter locally, so that could require a non-trivial amount of
network traffic).
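For example, the gs://bucket/**/obj5 listing above can be satisfied with
requests of the form:
GET /bucket/
(one such request per 1000 results), rather than one request per matching
subdirectory.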
""")
class CommandOptions(HelpProvider):
"""Additional help about wildcards."""
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'wildcards',
# List of help name aliases.
HELP_NAME_ALIASES : ['wildcard', '*', '**'],
# Type of help:
HELP_TYPE : HelpType.ADDITIONAL_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Wildcard support',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish, dis-
# tribute, sublicense, and/or sell copies of the Software, and to permit
# persons to whom the Software is furnished to do so, subject to the fol-
# lowing conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABIL-
# ITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
# SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
import time
class BucketListingRef(object):
"""
Container that holds a reference to one result from a bucket listing, allowing
polymorphic iteration over wildcard-iterated URIs, Keys, or Prefixes. At a
minimum, every reference contains a StorageUri. If the reference came from a
bucket listing (as opposed to a manually instantiated ref that might populate
only the StorageUri), it will additionally contain either a Key or a Prefix,
depending on whether it was a reference to an object or was just a prefix of a
path (i.e., bucket subdirectory). The latter happens when the bucket was
listed using delimiter='/'.
Note that Keys are shallow-populated, based on the contents extracted from
parsing a bucket listing. This includes name, length, and other fields
(basically, the info listed by gsutil ls -l), but does not include information
like ACL and location (which require separate server requests, which is why
there's a separate gsutil ls -L option to get this more detailed info).
"""
def __init__(self, uri, key=None, prefix=None, headers=None):
"""Instantiate BucketListingRef from uri and (if available) key or prefix.
Args:
uri: StorageUri for the object (required).
key: Key for the object, or None if not available.
prefix: Prefix for the subdir, or None if not available.
headers: Dictionary containing optional HTTP headers to pass to boto
(which happens when GetKey() is called on a BucketListingRef which
has no constructor-populated Key), or None if not available.
At most one of key and prefix can be populated.
"""
assert key is None or prefix is None
self.uri = uri
self.key = key
self.prefix = prefix
self.headers = headers or {}
def GetUri(self):
"""Get URI form of listed URI.
Returns:
StorageUri.
"""
return self.uri
def GetUriString(self):
"""Get string URI form of listed URI.
Returns:
String.
"""
return self.uri.uri
def NamesBucket(self):
"""Determines if this BucketListingRef names a bucket.
Returns:
bool indicator.
"""
return self.key is None and self.prefix is None and self.uri.names_bucket()
def IsLatest(self):
"""Determines if this BucketListingRef names the latest version of an
object.
Returns:
bool indicator.
"""
return hasattr(self.uri, 'is_latest') and self.uri.is_latest
def GetRStrippedUriString(self):
"""Get string URI form of listed URI, stripped of any right trailing
delims, and without version string.
Returns:
String.
"""
return self.uri.versionless_uri.rstrip('/')
def HasKey(self):
"""Return bool indicator of whether this BucketListingRef has a Key."""
return bool(self.key)
def HasPrefix(self):
"""Return bool indicator of whether this BucketListingRef has a Prefix."""
return bool(self.prefix)
def GetKey(self):
"""Get Key form of listed URI.
Returns:
Subclass of boto.s3.key.Key.
Raises:
BucketListingRefException: for bucket-only uri.
"""
# For gsutil ls -l gs://bucket self.key will be populated from (boto)
# parsing the bucket listing. But as noted and handled below there are
# cases where self.key isn't populated.
if not self.key:
if not self.uri.names_object():
raise BucketListingRefException(
'Attempt to call GetKey() on Key-less BucketListingRef (uri=%s) ' %
self.uri)
# This case happens when we do gsutil ls -l on an object name-ful
# StorageUri with no object-name wildcard. Since the ls command
# implementation only reads bucket info we need to read the object
# for this case.
self.key = self.uri.get_key(validate=False, headers=self.headers)
# When we retrieve the object this way its last_modified timestamp
# is formatted in RFC 1123 format, which is different from when we
# retrieve from the bucket listing (which uses ISO 8601 format), so
# convert so we consistently return ISO 8601 format.
tuple_time = (time.strptime(self.key.last_modified,
'%a, %d %b %Y %H:%M:%S %Z'))
self.key.last_modified = time.strftime('%Y-%m-%dT%H:%M:%S', tuple_time)
return self.key
def GetPrefix(self):
"""Get Prefix form of listed URI.
Returns:
boto.s3.prefix.Prefix.
Raises:
BucketListingRefException: if this object has no Prefix.
"""
if not self.prefix:
raise BucketListingRefException(
'Attempt to call GetPrefix() on Prefix-less BucketListingRef '
'(uri=%s)' % self.uri)
return self.prefix
def __repr__(self):
"""Returns string representation of BucketListingRef."""
return 'BucketListingRef(%s, HasKey=%s, HasPrefix=%s)' % (
self.uri, self.HasKey(), self.HasPrefix())
class BucketListingRefException(StandardError):
"""Exception thrown for invalid BucketListingRef requests."""
def __init__(self, reason):
StandardError.__init__(self)
self.reason = reason
def __repr__(self):
return 'BucketListingRefException: %s' % self.reason
def __str__(self):
return 'BucketListingRefException: %s' % self.reason
# Copyright 2010 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Base class for gsutil commands.
In addition to base class code, this file contains helpers that depend on base
class state (such as GetAclCommandHelper, which depends on self.gsutil_bin_dir,
self.bucket_storage_uri_class, etc.) In general, functions that depend on class
state and that are used by multiple commands belong in this file. Functions that
don't depend on class state belong in util.py, and non-shared helpers belong in
individual subclasses.
"""
import boto
import getopt
import gslib
import logging
import multiprocessing
import os
import platform
import re
import sys
import wildcard_iterator
import xml.dom.minidom
from boto import handler
from boto.storage_uri import StorageUri
from getopt import GetoptError
from gslib import util
from gslib.exception import CommandException
from gslib.help_provider import HelpProvider
from gslib.name_expansion import NameExpansionIterator
from gslib.name_expansion import NameExpansionIteratorQueue
from gslib.project_id import ProjectIdHandler
from gslib.storage_uri_builder import StorageUriBuilder
from gslib.thread_pool import ThreadPool
from gslib.util import HAVE_OAUTH2
from gslib.util import NO_MAX
from gslib.wildcard_iterator import ContainsWildcard
def _ThreadedLogger():
"""Creates a logger that resembles 'print' output, but is thread safe.
The logger will display all messages logged with level INFO or above. Log
propagation is disabled.
Returns:
A logger object.
"""
log = logging.getLogger('threaded-logging')
log.propagate = False
log.setLevel(logging.INFO)
log_handler = logging.StreamHandler()
log_handler.setFormatter(logging.Formatter('%(message)s'))
log.addHandler(log_handler)
return log
# command_spec key constants.
COMMAND_NAME = 'command_name'
COMMAND_NAME_ALIASES = 'command_name_aliases'
MIN_ARGS = 'min_args'
MAX_ARGS = 'max_args'
SUPPORTED_SUB_ARGS = 'supported_sub_args'
FILE_URIS_OK = 'file_uri_ok'
PROVIDER_URIS_OK = 'provider_uri_ok'
URIS_START_ARG = 'uris_start_arg'
CONFIG_REQUIRED = 'config_required'
_EOF_NAME_EXPANSION_RESULT = ("EOF")
class Command(object):
# Global instance of a threaded logger object.
THREADED_LOGGER = _ThreadedLogger()
REQUIRED_SPEC_KEYS = [COMMAND_NAME]
# Each subclass must define the following map, minimally including the
# keys in REQUIRED_SPEC_KEYS; other values below will be used as defaults,
# although for readability subclasses should specify the complete map.
command_spec = {
# Name of command.
COMMAND_NAME : None,
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 0,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs are acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs are acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
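# For a concrete example of a subclass map that explicitly sets every key,
# see CatCommand.command_spec in the cat command module.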
_default_command_spec = command_spec
help_spec = HelpProvider.help_spec
"""Define an empty test specification, which derived classes must populate.
This is a list of tuples containing the following values:
step_name - mnemonic name for test, displayed when test is run
cmd_line - shell command line to run test
expect_ret or None - expected return code from test (None means ignore)
(result_file, expect_file) or None - tuple of result file and expected
file to diff for additional test
verification beyond the return code
(None means no diff requested)
Notes:
- Setting expect_ret to None means there is no expectation and,
hence, any returned value will pass.
- Any occurrences of the string 'gsutil' in the cmd_line parameter
are expanded to the full path to the gsutil command under test.
- The cmd_line, result_file and expect_file parameters may
contain the following special substrings:
$Bn - converted to one of 10 unique-for-testing bucket names (n=0..9)
$On - converted to one of 10 unique-for-testing object names (n=0..9)
$Fn - converted to one of 10 unique-for-testing file names (n=0..9)
$G - converted to the directory where gsutil is installed. Useful for
referencing test data.
- The generated file names are full pathnames, whereas the generated
bucket and object names are simple relative names.
- Tests with a non-None result_file and expect_file automatically
trigger an implicit diff of the two files.
- These test specifications, in combination with the conversion strings
allow tests to be constructed parametrically. For example, here's an
annotated subset of a test_steps for the cp command:
# Copy local file to object, verify 0 return code.
('simple cp', 'gsutil cp $F1 gs://$B1/$O1', 0, None, None),
# Copy uploaded object back to local file and diff vs. orig file.
('verify cp', 'gsutil cp gs://$B1/$O1 $F2', 0, '$F2', '$F1'),
- After pattern substitution, the specs are run sequentially, in the
order in which they appear in the test_steps list.
"""
test_steps = []
# Define a convenience property for command name, since it's used many places.
def _GetDefaultCommandName(self):
return self.command_spec[COMMAND_NAME]
command_name = property(_GetDefaultCommandName)
def __init__(self, command_runner, args, headers, debug, parallel_operations,
gsutil_bin_dir, boto_lib_dir, config_file_list, gsutil_ver,
bucket_storage_uri_class, test_method=None):
"""
Args:
command_runner: CommandRunner (for commands built atop other commands).
args: Command-line args (arg0 = actual arg, not command name ala bash).
headers: Dictionary containing optional HTTP headers to pass to boto.
debug: Debug level to pass in to boto connection (range 0..3).
parallel_operations: Should command operations be executed in parallel?
gsutil_bin_dir: Bin dir from which gsutil is running.
boto_lib_dir: Lib dir where boto runs.
config_file_list: Config file list returned by _GetBotoConfigFileList().
gsutil_ver: Version string of currently running gsutil command.
bucket_storage_uri_class: Class to instantiate for cloud StorageUris.
Settable for testing/mocking.
test_method: Optional general purpose method for testing purposes.
Application and semantics of this method will vary by
command and test type.
Implementation note: subclasses shouldn't need to define an __init__
method, and instead depend on the shared initialization that happens
here. If you do define an __init__ method in a subclass you'll need to
explicitly call super().__init__(). But you're encouraged not to do this,
because it will make changing the __init__ interface more painful.
"""
# Save class values from constructor params.
self.command_runner = command_runner
self.args = args
self.unparsed_args = args
self.headers = headers
self.debug = debug
self.parallel_operations = parallel_operations
self.gsutil_bin_dir = gsutil_bin_dir
self.boto_lib_dir = boto_lib_dir
self.config_file_list = config_file_list
self.gsutil_ver = gsutil_ver
self.bucket_storage_uri_class = bucket_storage_uri_class
self.test_method = test_method
self.exclude_symlinks = False
self.recursion_requested = False
self.all_versions = False
# Process sub-command instance specifications.
# First, ensure subclass implementation sets all required keys.
for k in self.REQUIRED_SPEC_KEYS:
if k not in self.command_spec or self.command_spec[k] is None:
raise CommandException('"%s" command implementation is missing %s '
'specification' % (self.command_name, k))
# Now override default command_spec with subclass-specified values.
tmp = self._default_command_spec
tmp.update(self.command_spec)
self.command_spec = tmp
del tmp
# Make sure command provides a test specification.
if not self.test_steps:
# TODO: Uncomment following lines when test feature is ready.
#raise CommandException('"%s" command implementation is missing test '
#'specification' % self.command_name)
pass
# Parse and validate args.
try:
(self.sub_opts, self.args) = getopt.getopt(
args, self.command_spec[SUPPORTED_SUB_ARGS])
except GetoptError, e:
raise CommandException('%s for "%s" command.' % (e.msg,
self.command_name))
if (len(self.args) < self.command_spec[MIN_ARGS]
or len(self.args) > self.command_spec[MAX_ARGS]):
raise CommandException('Wrong number of arguments for "%s" command.' %
self.command_name)
if (not self.command_spec[FILE_URIS_OK]
and self.HaveFileUris(self.args[self.command_spec[URIS_START_ARG]:])):
raise CommandException('"%s" command does not support "file://" URIs. '
'Did you mean to use a gs:// URI?' %
self.command_name)
if (not self.command_spec[PROVIDER_URIS_OK]
and self._HaveProviderUris(
self.args[self.command_spec[URIS_START_ARG]:])):
raise CommandException('"%s" command does not support provider-only '
'URIs.' % self.command_name)
if self.command_spec[CONFIG_REQUIRED]:
self._ConfigureNoOpAuthIfNeeded()
self.proj_id_handler = ProjectIdHandler()
self.suri_builder = StorageUriBuilder(debug, bucket_storage_uri_class)
# Cross-platform path to run gsutil binary.
self.gsutil_cmd = ''
# Cross-platform list containing gsutil path for use with subprocess.
self.gsutil_exec_list = []
# If running on Windows, invoke python interpreter explicitly.
if platform.system() == "Windows":
self.gsutil_cmd += 'python '
self.gsutil_exec_list += ['python']
# Add full path to gsutil to make sure we test the correct version.
self.gsutil_path = os.path.join(self.gsutil_bin_dir, 'gsutil')
self.gsutil_cmd += self.gsutil_path
self.gsutil_exec_list += [self.gsutil_path]
# We're treating recursion_requested like it's used by all commands, but
# only some of the commands accept the -R option.
if self.sub_opts:
for o, unused_a in self.sub_opts:
if o == '-r' or o == '-R':
self.recursion_requested = True
break
def WildcardIterator(self, uri_or_str, all_versions=False):
"""
Helper to instantiate gslib.WildcardIterator. Args are same as
gslib.WildcardIterator interface, but this method fills in most of the
values from instance state.
Args:
uri_or_str: StorageUri or URI string naming wildcard objects to iterate.
"""
return wildcard_iterator.wildcard_iterator(
uri_or_str, self.proj_id_handler,
bucket_storage_uri_class=self.bucket_storage_uri_class,
all_versions=all_versions,
headers=self.headers, debug=self.debug)
def RunCommand(self):
"""Abstract function in base class. Subclasses must implement this. The
return value of this function will be used as the exit status of the
process, so subclass commands should return an integer exit code (0 for
success, a value in [1,255] for failure).
"""
raise CommandException('Command %s is missing its RunCommand() '
'implementation' % self.command_name)
############################################################
# Shared helper functions that depend on base class state. #
############################################################
def UrisAreForSingleProvider(self, uri_args):
"""Tests whether the uris are all for a single provider.
Returns: a StorageUri for one of the uris on success, None on failure.
"""
provider = None
uri = None
for uri_str in uri_args:
# validate=False because we allow wildcard uris.
uri = boto.storage_uri(
uri_str, debug=self.debug, validate=False,
bucket_storage_uri_class=self.bucket_storage_uri_class)
if not provider:
provider = uri.scheme
elif uri.scheme != provider:
return None
return uri
def SetAclCommandHelper(self):
"""
Common logic for setting ACLs. Sets the standard ACL or the default
object ACL depending on self.command_name.
"""
acl_arg = self.args[0]
uri_args = self.args[1:]
# Disallow multi-provider setacl requests, because there are differences in
# the ACL models.
storage_uri = self.UrisAreForSingleProvider(uri_args)
if not storage_uri:
raise CommandException('"%s" command spanning providers not allowed.' %
self.command_name)
# Determine whether acl_arg names a file containing XML ACL text vs. the
# string name of a canned ACL.
if os.path.isfile(acl_arg):
acl_file = open(acl_arg, 'r')
acl_arg = acl_file.read()
# TODO: Remove this workaround when GCS allows
# whitespace in the Permission element on the server-side
acl_arg = re.sub(r'<Permission>\s*(\S+)\s*</Permission>',
r'<Permission>\1</Permission>', acl_arg)
acl_file.close()
self.canned = False
else:
# No file exists, so expect a canned ACL string.
canned_acls = storage_uri.canned_acls()
if acl_arg not in canned_acls:
raise CommandException('Invalid canned ACL "%s".' % acl_arg)
self.canned = True
# Used to track if any ACLs failed to be set.
self.everything_set_okay = True
def _SetAclExceptionHandler(e):
"""Simple exception handler to allow post-completion status."""
self.THREADED_LOGGER.error(str(e))
self.everything_set_okay = False
def _SetAclFunc(name_expansion_result):
exp_src_uri = self.suri_builder.StorageUri(
name_expansion_result.GetExpandedUriStr())
# We don't do bucket operations multi-threaded (see comment below).
assert self.command_name != 'setdefacl'
self.THREADED_LOGGER.info('Setting ACL on %s...' %
name_expansion_result.expanded_uri_str)
if self.canned:
exp_src_uri.set_acl(acl_arg, exp_src_uri.object_name, False,
self.headers)
else:
exp_src_uri.set_xml_acl(acl_arg, exp_src_uri.object_name, False,
self.headers)
# If user specified -R option, convert any bucket args to bucket wildcards
# (e.g., gs://bucket/*), to prevent the operation from being applied to
# the buckets themselves.
if self.recursion_requested:
for i in range(len(uri_args)):
uri = self.suri_builder.StorageUri(uri_args[i])
if uri.names_bucket():
uri_args[i] = uri.clone_replace_name('*').uri
else:
# Handle bucket ACL setting operations single-threaded, because
# our threading machinery currently assumes it's working with objects
# (name_expansion_iterator), and normally we wouldn't expect users to need
# to set ACLs on huge numbers of buckets at once anyway.
for i in range(len(uri_args)):
uri_str = uri_args[i]
if self.suri_builder.StorageUri(uri_str).names_bucket():
self._RunSingleThreadedSetAcl(acl_arg, uri_args)
return
name_expansion_iterator = NameExpansionIterator(
self.command_name, self.proj_id_handler, self.headers, self.debug,
self.bucket_storage_uri_class, uri_args, self.recursion_requested,
self.recursion_requested, all_versions=self.all_versions)
# Perform requests in parallel (-m) mode, if requested, using
# configured number of parallel processes and threads. Otherwise,
# perform requests with sequential function calls in current process.
self.Apply(_SetAclFunc, name_expansion_iterator, _SetAclExceptionHandler)
if not self.everything_set_okay:
raise CommandException('ACLs for some objects could not be set.')
def _RunSingleThreadedSetAcl(self, acl_arg, uri_args):
some_matched = False
for uri_str in uri_args:
for blr in self.WildcardIterator(uri_str):
if blr.HasPrefix():
continue
some_matched = True
uri = blr.GetUri()
if self.command_name == 'setdefacl':
print 'Setting default object ACL on %s...' % uri
if self.canned:
uri.set_def_acl(acl_arg, uri.object_name, False, self.headers)
else:
uri.set_def_xml_acl(acl_arg, False, self.headers)
else:
print 'Setting ACL on %s...' % uri
if self.canned:
uri.set_acl(acl_arg, uri.object_name, False, self.headers)
else:
uri.set_xml_acl(acl_arg, uri.object_name, False, self.headers)
if not some_matched:
raise CommandException('No URIs matched')
def GetAclCommandHelper(self):
"""Common logic for getting ACLs. Gets the standard ACL or the default
object ACL depending on self.command_name."""
# Resolve to just one object.
# Handle wildcard-less URI specially in case this is a version-specific
# URI, because WildcardIterator().IterUris() would lose the versioning info.
if not ContainsWildcard(self.args[0]):
uri = self.suri_builder.StorageUri(self.args[0])
else:
uris = list(self.WildcardIterator(self.args[0]).IterUris())
if len(uris) == 0:
raise CommandException('No URIs matched')
if len(uris) != 1:
raise CommandException('%s matched more than one URI, which is not '
'allowed by the %s command' % (self.args[0], self.command_name))
uri = uris[0]
if not uri.names_bucket() and not uri.names_object():
raise CommandException('"%s" command must specify a bucket or '
'object.' % self.command_name)
if self.command_name == 'getdefacl':
acl = uri.get_def_acl(False, self.headers)
else:
acl = uri.get_acl(False, self.headers)
# Pretty-print the XML to make it more easily human editable.
parsed_xml = xml.dom.minidom.parseString(acl.to_xml().encode('utf-8'))
print parsed_xml.toprettyxml(indent=' ')
def GetXmlSubresource(self, subresource, uri_arg):
"""Print an xml subresource, e.g. logging, for a bucket/object.
Args:
subresource: The subresource name.
uri_arg: URI for the bucket/object. Wildcards will be expanded.
Raises:
CommandException: if errors encountered.
"""
# Wildcarding is allowed but must resolve to just one bucket.
uris = list(self.WildcardIterator(uri_arg).IterUris())
if len(uris) != 1:
raise CommandException('Wildcards must resolve to exactly one item for '
'get %s' % subresource)
uri = uris[0]
xml_str = uri.get_subresource(subresource, False, self.headers)
# Pretty-print the XML to make it more easily human editable.
parsed_xml = xml.dom.minidom.parseString(xml_str.encode('utf-8'))
print parsed_xml.toprettyxml(indent=' ')
def Apply(self, func, name_expansion_iterator, thr_exc_handler,
shared_attrs=None):
"""Dispatch input URI assignments across a pool of parallel OS
processes and/or Python threads, based on options (-m or not)
and settings in the user's config file. If non-parallel mode
or only one OS process requested, execute requests sequentially
in the current OS process.
Args:
func: Function to call to process each URI.
name_expansion_iterator: Iterator of NameExpansionResult.
thr_exc_handler: Exception handler for ThreadPool class.
shared_attrs: List of attributes to manage across sub-processes.
Raises:
CommandException if invalid config encountered.
"""
# Set OS process and python thread count as a function of options
# and config.
if self.parallel_operations:
process_count = boto.config.getint(
'GSUtil', 'parallel_process_count',
gslib.commands.config.DEFAULT_PARALLEL_PROCESS_COUNT)
if process_count < 1:
raise CommandException('Invalid parallel_process_count "%d".' %
process_count)
thread_count = boto.config.getint(
'GSUtil', 'parallel_thread_count',
gslib.commands.config.DEFAULT_PARALLEL_THREAD_COUNT)
if thread_count < 1:
raise CommandException('Invalid parallel_thread_count "%d".' %
thread_count)
else:
# If -m not specified, then assume 1 OS process and 1 Python thread.
process_count = 1
thread_count = 1
if self.debug:
self.THREADED_LOGGER.info('process count: %d', process_count)
self.THREADED_LOGGER.info('thread count: %d', thread_count)
if self.parallel_operations and process_count > 1:
procs = []
# If any shared attributes passed by caller, create a dictionary of
# shared memory variables for every element in the list of shared
# attributes.
shared_vars = None
if shared_attrs:
for name in shared_attrs:
if not shared_vars:
shared_vars = {}
shared_vars[name] = multiprocessing.Value('i', 0)
# Construct work queue for parceling out work to multiprocessing workers,
# setting the max queue length of 50k so we will block if workers don't
# empty the queue as fast as we can continue iterating over the bucket
# listing. This number may need tuning; it should be large enough to
# keep workers busy (overlapping bucket list next-page retrieval with
# operations being fed from the queue) but small enough that we don't
# overfill memory when running across a slow network link.
work_queue = multiprocessing.Queue(50000)
for shard in range(process_count):
# Spawn a separate OS process for each shard.
if self.debug:
self.THREADED_LOGGER.info('spawning process for shard %d', shard)
p = multiprocessing.Process(target=self._ApplyThreads,
args=(func, work_queue, shard,
thread_count, thr_exc_handler,
shared_vars))
procs.append(p)
p.start()
last_name_expansion_result = None
try:
# Feed all work into the queue being emptied by the workers.
for name_expansion_result in name_expansion_iterator:
last_name_expansion_result = name_expansion_result
work_queue.put(name_expansion_result)
except:
sys.stderr.write('Failed URI iteration. Last result (prior to '
'exception) was: %s\n'
% repr(last_name_expansion_result))
finally:
# We do all of the process cleanup in a finally clause in case the name
# expansion iterator throws an exception. This will send EOF to all the
# child processes and join them back into the parent process.
# Send an EOF per worker.
for shard in range(process_count):
work_queue.put(_EOF_NAME_EXPANSION_RESULT)
# Wait for all spawned OS processes to finish.
failed_process_count = 0
for p in procs:
p.join()
# Count number of procs that returned non-zero exit code.
if p.exitcode != 0:
failed_process_count += 1
# Propagate shared variables back to caller's attributes.
if shared_vars:
for (name, var) in shared_vars.items():
setattr(self, name, var.value)
# Abort main process if one or more sub-processes failed. Note that this
# is outside the finally clause, because we only want to raise a new
# exception if an exception wasn't already raised in the try clause above.
if failed_process_count:
plural_str = ''
if failed_process_count > 1:
plural_str = 'es'
raise Exception('unexpected failure in %d sub-process%s, '
'aborting...' % (failed_process_count, plural_str))
else:
# Using just 1 process, so funnel results to _ApplyThreads using a facade
# that makes NameExpansionIterator look like a multiprocessing.Queue
# that sends one EOF once the iterator empties.
work_queue = NameExpansionIteratorQueue(name_expansion_iterator,
_EOF_NAME_EXPANSION_RESULT)
self._ApplyThreads(func, work_queue, 0, thread_count, thr_exc_handler,
None)
def HaveFileUris(self, args_to_check):
"""Checks whether args_to_check contain any file URIs.
Args:
args_to_check: Command-line argument subset to check.
Returns:
True if args_to_check contains any file URIs.
"""
for uri_str in args_to_check:
if uri_str.lower().startswith('file://') or uri_str.find(':') == -1:
return True
return False
######################
# Private functions. #
######################
def _HaveProviderUris(self, args_to_check):
"""Checks whether args_to_check contains any provider URIs (like 'gs://').
Args:
args_to_check: Command-line argument subset to check.
Returns:
True if args_to_check contains any provider URIs.
"""
for uri_str in args_to_check:
if re.match('^[a-z]+://$', uri_str):
return True
return False
def _ConfigureNoOpAuthIfNeeded(self):
"""Sets up no-op auth handler if no boto credentials are configured."""
config = boto.config
if not util.HasConfiguredCredentials():
if self.config_file_list:
if (config.has_option('Credentials', 'gs_oauth2_refresh_token')
and not HAVE_OAUTH2):
raise CommandException(
'Your gsutil is configured with OAuth2 authentication '
'credentials.\nHowever, OAuth2 is only supported when running '
'under Python 2.6 or later\n(unless additional dependencies are '
'installed, see README for details); you are running Python %s.' %
sys.version)
raise CommandException('You have no storage service credentials in any '
'of the following boto config\nfiles. Please '
'add your credentials as described in the '
'gsutil README file, or else\nre-run '
'"gsutil config" to re-create a config '
'file:\n%s' % self.config_file_list)
else:
# With no boto config file the user can still access publicly readable
# buckets and objects.
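# The import below is needed only for its side effect: it makes a no-op
# auth handler available to boto so unauthenticated requests to publicly
# readable buckets and objects can proceed.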
from gslib import no_op_auth_plugin
def _ApplyThreads(self, func, work_queue, shard, num_threads,
thr_exc_handler=None, shared_vars=None):
"""
Perform a subset of the required requests across a caller-specified
number of parallel Python threads, which may be one, in which
case the requests are processed in the current thread.
Args:
func: Function to call for each request.
work_queue: shared queue of NameExpansionResult to process.
shard: Assigned subset (shard number) for this function.
num_threads: Number of Python threads to spawn to process this shard.
thr_exc_handler: Exception handler for ThreadPool class.
shared_vars: Dict of shared memory variables to be managed.
(only relevant, and non-None, if this function is
run in a separate OS process).
"""
# Each OS process needs to establish its own set of connections to
# the server to avoid writes from different OS processes interleaving
# onto the same socket (and garbling the underlying SSL session).
# We ensure each process gets its own set of connections here by
# closing all connections in the storage provider connection pool.
connection_pool = StorageUri.provider_pool
if connection_pool:
for i in connection_pool:
connection_pool[i].connection.close()
if num_threads > 1:
thread_pool = ThreadPool(num_threads, thr_exc_handler)
try:
while True: # Loop until we hit EOF marker.
name_expansion_result = work_queue.get()
if name_expansion_result == _EOF_NAME_EXPANSION_RESULT:
break
exp_src_uri = self.suri_builder.StorageUri(
name_expansion_result.GetExpandedUriStr())
if self.debug:
self.THREADED_LOGGER.info('process %d shard %d is handling uri %s',
os.getpid(), shard, exp_src_uri)
if (self.exclude_symlinks and exp_src_uri.is_file_uri()
and os.path.islink(exp_src_uri.object_name)):
self.THREADED_LOGGER.info('Skipping symbolic link %s...', exp_src_uri)
elif num_threads > 1:
thread_pool.AddTask(func, name_expansion_result)
else:
func(name_expansion_result)
# If any Python threads created, wait here for them to finish.
if num_threads > 1:
thread_pool.WaitCompletion()
finally:
if num_threads > 1:
thread_pool.Shutdown()
# If any shared variables (which means we are running in a separate OS
# process), increment value for each shared variable.
if shared_vars:
for (name, var) in shared_vars.items():
var.value += getattr(self, name)
#!/usr/bin/env python
# coding=utf8
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Class that runs a named gsutil command."""
import boto
import os
from boto.storage_uri import BucketStorageUri
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.exception import CommandException
class CommandRunner(object):
def __init__(self, gsutil_bin_dir, boto_lib_dir, config_file_list,
gsutil_ver, bucket_storage_uri_class=BucketStorageUri):
"""
Args:
gsutil_bin_dir: Bin dir from which gsutil is running.
boto_lib_dir: Lib dir where boto runs.
config_file_list: Config file list returned by _GetBotoConfigFileList().
gsutil_ver: Version string of currently running gsutil command.
bucket_storage_uri_class: Class to instantiate for cloud StorageUris.
Settable for testing/mocking.
"""
self.gsutil_bin_dir = gsutil_bin_dir
self.boto_lib_dir = boto_lib_dir
self.config_file_list = config_file_list
self.gsutil_ver = gsutil_ver
self.bucket_storage_uri_class = bucket_storage_uri_class
self.command_map = self._LoadCommandMap()
def _LoadCommandMap(self):
"""Returns dict mapping each command_name to implementing class."""
# Walk gslib/commands and find all commands.
commands_dir = os.path.join(self.gsutil_bin_dir, 'gslib', 'commands')
for f in os.listdir(commands_dir):
# Handles no-extension files, etc.
(module_name, ext) = os.path.splitext(f)
if ext == '.py':
__import__('gslib.commands.%s' % module_name)
command_map = {}
# Only include Command subclasses in the dict.
for command in Command.__subclasses__():
command_map[command.command_spec[COMMAND_NAME]] = command
for command_name_aliases in command.command_spec[COMMAND_NAME_ALIASES]:
command_map[command_name_aliases] = command
return command_map
def RunNamedCommand(self, command_name, args=None, headers=None, debug=0,
parallel_operations=False, test_method=None):
"""Runs the named command. Used by gsutil main, commands built atop
other commands, and tests.
Args:
command_name: The name of the command being run.
args: Command-line args (arg0 = actual arg, not command name ala bash).
headers: Dictionary containing optional HTTP headers to pass to boto.
debug: Debug level to pass in to boto connection (range 0..3).
parallel_operations: Should command operations be executed in parallel?
test_method: Optional general purpose method for testing purposes.
Application and semantics of this method will vary by
command and test type.
Raises:
CommandException: if errors encountered.
"""
if not args:
args = []
# Include api_version header in all commands.
api_version = boto.config.get_value('GSUtil', 'default_api_version', '1')
if not headers:
headers = {}
headers['x-goog-api-version'] = api_version
if command_name not in self.command_map:
raise CommandException('Invalid command "%s".' % command_name)
command_class = self.command_map[command_name]
command_inst = command_class(self, args, headers, debug,
parallel_operations, self.gsutil_bin_dir,
self.boto_lib_dir, self.config_file_list,
self.gsutil_ver, self.bucket_storage_uri_class,
test_method)
return command_inst.RunCommand()
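# Illustrative usage (hypothetical argument values, not from the original
# source):
#   runner = CommandRunner(gsutil_bin_dir, boto_lib_dir, config_file_list,
#                          gsutil_ver)
#   exit_code = runner.RunNamedCommand('ls', ['gs://bucket'])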
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Package marker file."""
# Copyright 2011 Google Inc. All Rights Reserved.
# Copyright 2011, Nexenta Systems Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
from gslib.wildcard_iterator import ContainsWildcard
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil cat [-h] uri...
<B>DESCRIPTION</B>
The cat command outputs the contents of one or more URIs to stdout.
It is equivalent to doing:
gsutil cp uri... -
(The final '-' causes gsutil to stream the output to stdout.)
<B>OPTIONS</B>
-h Prints short header for each object. For example:
gsutil cat -h gs://bucket/meeting_notes/2012_Feb/*.txt
""")
class CatCommand(Command):
"""Implementation of gsutil cat command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'cat',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 0,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'hv',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'cat',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Concatenate object content to stdout',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
show_header = False
if self.sub_opts:
for o, unused_a in self.sub_opts:
if o == '-h':
show_header = True
elif o == '-v':
self.THREADED_LOGGER.info('WARNING: The %s -v option is no longer'
' needed, and will eventually be removed.\n'
% self.command_name)
printed_one = False
# We redirect stdout so that all data other than the object contents
# goes to stderr.
cat_outfd = sys.stdout
sys.stdout = sys.stderr
did_some_work = False
for uri_str in self.args:
for uri in self.WildcardIterator(uri_str).IterUris():
if not uri.names_object():
raise CommandException('"%s" command must specify objects.' %
self.command_name)
did_some_work = True
if show_header:
if printed_one:
print
print '==> %s <==' % uri.__str__()
printed_one = True
key = uri.get_key(False, self.headers)
key.get_file(cat_outfd, self.headers)
sys.stdout = cat_outfd
if not did_some_work:
raise CommandException('No URIs matched')
return 0
# Copyright 2013 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module provides the chacl command to gsutil.
This command allows users to easily specify changes to access control lists.
"""
import random
import re
import time
from xml.dom import minidom
from boto.exception import GSResponseError
from boto.gs import acl
from gslib import name_expansion
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HELP_TYPE
from gslib.help_provider import HelpType
from gslib.util import NO_MAX
from gslib.util import Retry
class ChangeType(object):
USER = 'User'
GROUP = 'Group'
class AclChange(object):
"""Represents a logical change to an access control list."""
public_scopes = ['AllAuthenticatedUsers', 'AllUsers']
id_scopes = ['UserById', 'GroupById']
email_scopes = ['UserByEmail', 'GroupByEmail']
domain_scopes = ['GroupByDomain']
scope_types = public_scopes + id_scopes + email_scopes + domain_scopes
permission_shorthand_mapping = {
'R': 'READ',
'W': 'WRITE',
'FC': 'FULL_CONTROL',
}
def __init__(self, acl_change_descriptor, scope_type, logger):
"""Creates an AclChange object.
acl_change_descriptor: An acl change as described in chacl help.
scope_type: Either ChangeType.USER or ChangeType.GROUP, specifying the
extent of the scope.
logger: An instance of ThreadedLogger.
"""
self.logger = logger
self.identifier = ''
self.raw_descriptor = acl_change_descriptor
self._Parse(acl_change_descriptor, scope_type)
self._Validate()
def __str__(self):
return 'AclChange<{0}|{1}|{2}>'.format(self.scope_type, self.perm,
self.identifier)
def _Parse(self, change_descriptor, scope_type):
"""Parses an ACL Change descriptor."""
def _ClassifyScopeIdentifier(text):
re_map = {
'AllAuthenticatedUsers': r'^(AllAuthenticatedUsers|AllAuth)$',
'AllUsers': '^(AllUsers|All)$',
'Email': r'^.+@.+\..+$',
'Id': r'^[0-9A-Fa-f]{64}$',
'Domain': r'^[^@]+\..+$',
}
for type_string, regex in re_map.items():
if re.match(regex, text, re.IGNORECASE):
return type_string
if change_descriptor.count(':') != 1:
raise CommandException('{0} is an invalid change description.'
.format(change_descriptor))
scope_string, perm_token = change_descriptor.split(':')
perm_token = perm_token.upper()
if perm_token in self.permission_shorthand_mapping:
self.perm = self.permission_shorthand_mapping[perm_token]
else:
self.perm = perm_token
scope_class = _ClassifyScopeIdentifier(scope_string)
if scope_class == 'Domain':
# This may produce an invalid UserByDomain scope,
# which is good because then validate can complain.
self.scope_type = '{0}ByDomain'.format(scope_type)
self.identifier = scope_string
elif scope_class in ['Email', 'Id']:
self.scope_type = '{0}By{1}'.format(scope_type, scope_class)
self.identifier = scope_string
elif scope_class == 'AllAuthenticatedUsers':
self.scope_type = 'AllAuthenticatedUsers'
elif scope_class == 'AllUsers':
self.scope_type = 'AllUsers'
else:
# This is just a fallback, so we set it to something
# and the validate step has something to go on.
self.scope_type = scope_string
def _Validate(self):
"""Validates a parsed AclChange object."""
def _ThrowError(msg):
raise CommandException('{0} is not a valid ACL change\n{1}'
.format(self.raw_descriptor, msg))
if self.scope_type not in self.scope_types:
_ThrowError('{0} is not a valid scope type'.format(self.scope_type))
if self.scope_type in self.public_scopes and self.identifier:
_ThrowError('{0} requires no arguments'.format(self.scope_type))
if self.scope_type in self.id_scopes and not self.identifier:
_ThrowError('{0} requires an id'.format(self.scope_type))
if self.scope_type in self.email_scopes and not self.identifier:
_ThrowError('{0} requires an email address'.format(self.scope_type))
if self.scope_type in self.domain_scopes and not self.identifier:
_ThrowError('{0} requires a domain'.format(self.scope_type))
if self.perm not in self.permission_shorthand_mapping.values():
perms = ', '.join(self.permission_shorthand_mapping.values())
_ThrowError('Allowed permissions are {0}'.format(perms))
def _YieldMatchingEntries(self, current_acl):
"""Generator that yields entries that match the change descriptor.
current_acl: An instance of boto.gs.acl.ACL which will be searched
for matching entries.
"""
for entry in current_acl.entries.entry_list:
if entry.scope.type == self.scope_type:
if self.scope_type in ['UserById', 'GroupById']:
if self.identifier == entry.scope.id:
yield entry
elif self.scope_type in ['UserByEmail', 'GroupByEmail']:
if self.identifier == entry.scope.email_address:
yield entry
elif self.scope_type == 'GroupByDomain':
if self.identifier == entry.scope.domain:
yield entry
elif self.scope_type in ['AllUsers', 'AllAuthenticatedUsers']:
yield entry
else:
raise CommandException('Found an unrecognized ACL '
'entry type, aborting.')
def _AddEntry(self, current_acl):
"""Adds an entry to an ACL."""
if self.scope_type in ['UserById', 'GroupById']:
entry = acl.Entry(type=self.scope_type, permission=self.perm,
id=self.identifier)
elif self.scope_type in ['UserByEmail', 'GroupByEmail']:
entry = acl.Entry(type=self.scope_type, permission=self.perm,
email_address=self.identifier)
elif self.scope_type == 'GroupByDomain':
entry = acl.Entry(type=self.scope_type, permission=self.perm,
domain=self.identifier)
else:
entry = acl.Entry(type=self.scope_type, permission=self.perm)
current_acl.entries.entry_list.append(entry)
def Execute(self, uri, current_acl):
"""Executes the described change on an ACL.
uri: The URI object to change.
current_acl: An instance of boto.gs.acl.ACL to permute.
"""
self.logger.debug('Executing {0} on {1}'
.format(self.raw_descriptor, uri))
if self.perm == 'WRITE' and uri.names_object():
self.logger.warn(
'Skipping {0} on {1}, as WRITE does not apply to objects'
.format(self.raw_descriptor, uri))
return 0
matching_entries = list(self._YieldMatchingEntries(current_acl))
change_count = 0
if matching_entries:
for entry in matching_entries:
if entry.permission != self.perm:
entry.permission = self.perm
change_count += 1
else:
self._AddEntry(current_acl)
change_count = 1
parsed_acl = minidom.parseString(current_acl.to_xml())
self.logger.debug('New Acl:\n{0}'.format(parsed_acl.toprettyxml()))
return change_count
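# Illustrative sketch (assumed usage; 'some_logger' stands in for the
# command's ThreadedLogger, and 'uri'/'current_acl' come from the command
# machinery): parsing a grant descriptor into an AclChange and applying it.
#
#   change = AclChange('john.doe@example.com:FC',
#                      scope_type=ChangeType.USER, logger=some_logger)
#   # change.scope_type == 'UserByEmail', change.perm == 'FULL_CONTROL'
#   count = change.Execute(uri, current_acl)  # number of entries added/changed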
class AclDel(AclChange):
"""Represents a logical change from an access control list."""
scope_regexes = {
r'All(Users)?': 'AllUsers',
r'AllAuth(enticatedUsers)?': 'AllAuthenticatedUsers',
}
def __init__(self, identifier, logger):
self.raw_descriptor = '-d {0}'.format(identifier)
self.logger = logger
self.identifier = identifier
for regex, scope in self.scope_regexes.items():
if re.match(regex, self.identifier, re.IGNORECASE):
self.identifier = scope
self.scope_type = 'Any'
self.perm = 'NONE'
def _YieldMatchingEntries(self, current_acl):
for entry in current_acl.entries.entry_list:
if self.identifier == entry.scope.id:
yield entry
elif self.identifier == entry.scope.email_address:
yield entry
elif self.identifier == entry.scope.domain:
yield entry
elif self.identifier == 'AllUsers' and entry.scope.type == 'AllUsers':
yield entry
elif (self.identifier == 'AllAuthenticatedUsers'
and entry.scope.type == 'AllAuthenticatedUsers'):
yield entry
def Execute(self, uri, current_acl):
self.logger.debug('Executing {0} on {1}'
.format(self.raw_descriptor, uri))
matching_entries = list(self._YieldMatchingEntries(current_acl))
for entry in matching_entries:
current_acl.entries.entry_list.remove(entry)
parsed_acl = minidom.parseString(current_acl.to_xml())
self.logger.debug('New Acl:\n{0}'.format(parsed_acl.toprettyxml()))
return len(matching_entries)
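# Illustrative sketch (assumed usage, mirroring the AclChange example above):
# an AclDel removes every grant matching the identifier and reports the count.
#
#   deletion = AclDel('AllUsers', logger=some_logger)  # shorthand 'all' also matches
#   removed = deletion.Execute(uri, current_acl)       # number of entries removed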
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil chacl [-R] -u|-g|-d <grant>... uri...
where each <grant> is one of the following forms:
-u <id|email>:<perm>
-g <id|email|domain|All|AllAuth>:<perm>
-d <id|email|domain|All|AllAuth>
<B>DESCRIPTION</B>
The chacl command updates access control lists, similar in spirit to the Linux
chmod command. You can specify multiple access grant additions and deletions
in a single command run; all changes will be made atomically to each object in
turn. For example, if the command requests deleting one grant and adding a
different grant, the ACLs being updated will never be left in an intermediate
state where one grant has been deleted but the second grant not yet added.
Each change specifies a user or group grant to add or delete, and for grant
additions, one of R, W, FC (for the permission to be granted). A more formal
description is provided in a later section; below we provide examples.
Note: If you want to set a simple "canned" ACL on each object (such as
project-private or public), or if you prefer to edit the XML representation
for ACLs, you can do that with the setacl command (see 'gsutil help setacl').
<B>EXAMPLES</B>
Grant the user john.doe@example.com WRITE access to the bucket
example-bucket:
gsutil chacl -u john.doe@example.com:WRITE gs://example-bucket
Grant the group admins@example.com FULL_CONTROL access to all jpg files in
the top level of example-bucket:
gsutil chacl -g admins@example.com:FC gs://example-bucket/*.jpg
Grant the user with the specified canonical ID READ access to all objects in
example-bucket that begin with folder/:
gsutil chacl -R \\
-u 84fac329bceSAMPLE777d5d22b8SAMPLE77d85ac2SAMPLE2dfcf7c4adf34da46:R \\
gs://example-bucket/folder/
Grant all users from my-domain.org READ access to the bucket
gcs.my-domain.org:
gsutil chacl -g my-domain.org:R gs://gcs.my-domain.org
Remove any current access by john.doe@example.com from the bucket
example-bucket:
gsutil chacl -d john.doe@example.com gs://example-bucket
If you have a large number of objects to update, enabling multi-threading with
the gsutil -m flag can significantly improve performance. The following
command adds FULL_CONTROL for admin@example.org using multi-threading:
gsutil -m chacl -R -u admin@example.org:FC gs://example-bucket
Grant READ access to everyone from my-domain.org and to all authenticated
users, and grant FULL_CONTROL to admin@mydomain.org, for the buckets
my-bucket and my-other-bucket, with multi-threading enabled:
gsutil -m chacl -R -g my-domain.org:R -g AllAuth:R \\
-u admin@mydomain.org:FC gs://my-bucket/ gs://my-other-bucket
<B>SCOPES</B>
There are four different scopes: Users, Groups, All Authenticated Users, and
All Users.
Users are added with -u and a plain ID or email address, as in
"-u john-doe@gmail.com:r"
Groups are like users, but specified with the -g flag, as in
"-g power-users@example.com:fc". Groups may also be specified as a full
domain, as in "-g my-company.com:r".
AllAuthenticatedUsers and AllUsers are specified directly, as
in "-g AllUsers:R" or "-g AllAuthenticatedUsers:FC". These are case
insensitive, and may be shortened to "all" and "allauth", respectively.
Removing permissions is specified with the -d flag and an ID, email
address, domain, or one of AllUsers or AllAuthenticatedUsers.
Many scopes can be specified on the same command line, allowing bundled
changes to be executed in a single run. This will reduce the number of
requests made to the server.
<B>PERMISSIONS</B>
You may specify the following permissions with either their shorthand or
their full name:
R: READ
W: WRITE
FC: FULL_CONTROL
<B>OPTIONS</B>
-R, -r Performs chacl request recursively, to all objects under the
specified URI.
-u Add or modify a user permission as specified in the SCOPES
and PERMISSIONS sections.
-g Add or modify a group permission as specified in the SCOPES
and PERMISSIONS sections.
-d Remove all permissions associated with the matching argument, as
specified in the SCOPES and PERMISSIONS sections.
""")
class ChAclCommand(Command):
"""Implementation of gsutil chacl command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'chacl',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'Rrfg:u:d:',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 1,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'chacl',
# List of help name aliases.
HELP_NAME_ALIASES : ['chmod'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Add / remove entries on bucket and/or object ACLs',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
"""This is the point of entry for the chacl command."""
self.parse_versions = True
self.changes = []
if self.sub_opts:
for o, a in self.sub_opts:
if o == '-g':
self.changes.append(AclChange(a, scope_type=ChangeType.GROUP,
logger=self.THREADED_LOGGER))
if o == '-u':
self.changes.append(AclChange(a, scope_type=ChangeType.USER,
logger=self.THREADED_LOGGER))
if o == '-d':
self.changes.append(AclDel(a, logger=self.THREADED_LOGGER))
if not self.changes:
raise CommandException(
'Please specify at least one access change '
'with the -g, -u, or -d flags')
storage_uri = self.UrisAreForSingleProvider(self.args)
if not (storage_uri and storage_uri.get_provider().name == 'google'):
raise CommandException('The "{0}" command can only be used with gs:// URIs'
.format(self.command_name))
bulk_uris = set()
for uri_arg in self.args:
for result in self.WildcardIterator(uri_arg):
uri = result.uri
if uri.names_bucket():
if self.recursion_requested:
bulk_uris.add(uri.clone_replace_name('*').uri)
else:
# If applying to a bucket directly, the threading machinery will
# break, so we have to apply now, in the main thread.
self.ApplyAclChanges(uri)
else:
bulk_uris.add(uri_arg)
try:
name_expansion_iterator = name_expansion.NameExpansionIterator(
self.command_name, self.proj_id_handler, self.headers, self.debug,
self.bucket_storage_uri_class, bulk_uris, self.recursion_requested)
except CommandException as e:
# NameExpansionIterator will complain if there are no URIs, but we don't
# want to throw an error if we handled bucket URIs.
if e.reason == 'No URIs matched':
return 0
else:
raise e
self.everything_set_okay = True
self.Apply(self.ApplyAclChanges,
name_expansion_iterator,
self._ApplyExceptionHandler)
if not self.everything_set_okay:
raise CommandException('ACLs for some objects could not be set.')
return 0
def _ApplyExceptionHandler(self, exception):
self.THREADED_LOGGER.error('Encountered a problem: {0}'.format(exception))
self.everything_set_okay = False
@Retry(GSResponseError, tries=3, delay=1, backoff=2)
def ApplyAclChanges(self, uri_or_expansion_result):
"""Applies the changes in self.changes to the provided URI."""
if isinstance(uri_or_expansion_result, name_expansion.NameExpansionResult):
uri = self.suri_builder.StorageUri(
uri_or_expansion_result.expanded_uri_str)
else:
uri = uri_or_expansion_result
try:
current_acl = uri.get_acl()
except GSResponseError as e:
self.THREADED_LOGGER.warning('Failed to set acl for {0}: {1}'
.format(uri, e.reason))
return
modification_count = 0
for change in self.changes:
modification_count += change.Execute(uri, current_acl)
if modification_count == 0:
self.THREADED_LOGGER.info('No changes to {0}'.format(uri))
return
# TODO: Remove the concept of forcing when boto provides access to
# bucket generation and meta_generation.
headers = dict(self.headers)
force = uri.names_bucket()
if not force:
key = uri.get_key()
headers['x-goog-if-generation-match'] = key.generation
headers['x-goog-if-metageneration-match'] = key.meta_generation
# If this fails because of a precondition, it will raise a
# GSResponseError for @Retry to handle.
uri.set_acl(current_acl, uri.object_name, False, headers)
self.THREADED_LOGGER.info('Updated ACL on {0}'.format(uri))
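# Descriptive note (based on the code above, not additional behavior): for
# object URIs the ACL update is a guarded read-modify-write. The request
# carries preconditions roughly equivalent to:
#   x-goog-if-generation-match: <key.generation>
#   x-goog-if-metageneration-match: <key.meta_generation>
# so a concurrent modification surfaces as a GSResponseError, which the
# @Retry decorator re-drives, re-reading the ACL on the next attempt.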
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import boto
import datetime
import multiprocessing
import platform
import os
import signal
import sys
import time
import webbrowser
from boto.provider import Provider
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import AbortException
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import HAVE_OAUTH2
from gslib.util import TWO_MB
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil [-D] config [-a] [-b] [-f] [-o <file>] [-r] [-s <scope>] [-w]
<B>DESCRIPTION</B>
The gsutil config command obtains access credentials for Google Cloud
Storage and writes a boto/gsutil configuration file containing the obtained
credentials along with a number of other configuration-controllable values.
Unless specified otherwise (see OPTIONS), the configuration file is written
to ~/.boto (i.e., the file .boto under the user's home directory). If the
default file already exists, an attempt is made to rename the existing file
to ~/.boto.bak; if that attempt fails the command will exit. A different
destination file can be specified with the -o option (see OPTIONS).
Because the boto configuration file contains your credentials you should
keep its file permissions set so no one but you has read access. (The file
is created read-only when you run gsutil config.)
<B>CREDENTIALS</B>
By default gsutil config obtains OAuth2 credentials, and writes them
to the [Credentials] section of the configuration file. The -r, -w,
-f options (see OPTIONS below) cause gsutil config to request a token
with restricted scope; the resulting token will be restricted to read-only
operations, read-write operations, or all operations (including getacl/setacl/
getdefacl/setdefacl/disablelogging/enablelogging/getlogging operations). In
addition, -s <scope> can be used to request additional (non-Google-Storage)
scopes.
If you want to use credentials based on access key and secret (the older
authentication method before OAuth2 was supported) instead of OAuth2,
see help about the -a option in the OPTIONS section.
If you wish to use gsutil with other providers (or to copy data back and
forth between multiple providers) you can edit their credentials into the
[Credentials] section after creating the initial configuration file.
<B>CONFIGURATION FILE SELECTION PROCEDURE</B>
By default, gsutil will look for the configuration file in /etc/boto.cfg and
~/.boto. You can override this choice by setting the BOTO_CONFIG environment
variable. This is also useful if you have several different identities or
cloud storage environments: By setting up the credentials and any additional
configuration in separate files for each, you can switch environments by
changing environment variables.
You can also set up a path of configuration files, by setting the BOTO_PATH
environment variable to contain a ":" delimited path. For example setting
the BOTO_PATH environment variable to:
/etc/projects/my_group_project.boto.cfg:/home/mylogin/.boto
will cause gsutil to load each configuration file found in the path in
order. This is useful if you want to set up some shared configuration
state among many users: The shared state can go in the central shared file
(/etc/projects/my_group_project.boto.cfg) and each user's individual
credentials can be placed in the configuration file in each of their home
directories. (For security reasons users should never share credentials
via a shared configuration file.)
<B>CONFIGURATION FILE STRUCTURE</B>
The configuration file contains a number of sections: [Credentials],
[Boto], [GSUtil], and [OAuth2]. If you edit the file make sure to edit the
appropriate section (discussed below), and to be careful not to mis-edit
any of the setting names (like "gs_access_key_id") and not to remove the
section delimiters (like "[Credentials]").
<B>ADDITIONAL CONFIGURATION-CONTROLLABLE FEATURES</B>
With the exception of setting up gsutil to work through a proxy (see
below), most users won't need to edit values in the boto configuration file;
values found in there tend to be of more specialized use than command line
option-controllable features.
The following are the currently defined configuration settings, broken
down by section. Their use is documented in comments preceding each, in
the configuration file. If you see a setting you want to change that's not
listed in your current file, see the section below on Updating to the Latest
Configuration File.
The currently supported settings are, by section:
[Boto]
proxy
proxy_port
proxy_user
proxy_pass
is_secure
https_validate_certificates
send_crlf_after_proxy_auth_headers
debug
num_retries
[GSUtil]
resumable_threshold
resumable_tracker_dir
parallel_process_count
parallel_thread_count
default_api_version
default_project_id
use_magicfile
[OAuth2]
token_cache
token_cache_path_pattern
client_id
client_secret
provider_label
provider_authorization_uri
provider_token_uri
<B>UPDATING TO THE LATEST CONFIGURATION FILE</B>
We add new configuration controllable features to the boto configuration file
over time, but most gsutil users create a configuration file once and then
keep it for a long time, so new features aren't apparent when you update
to a newer version of gsutil. If you want to get the latest configuration
file (which includes all the latest settings and documentation about each)
you can rename your current file (e.g., to '.boto_old'), run gsutil config,
and then edit any configuration settings you wanted from your old file
into the newly created file. Note, however, that if you're using OAuth2
credentials and you go back through the OAuth2 configuration dialog it will
invalidate your previous OAuth2 credentials.
If no explicit scope option is given, -f (full control) is assumed by default.
<B>OPTIONS</B>
-a Prompt for Google Cloud Storage access key and secret (the older
authentication method before OAuth2 was supported) instead of
obtaining an OAuth2 token.
-b Causes gsutil config to launch a browser to obtain OAuth2 approval
and the project ID instead of showing the URL for each and asking
the user to open the browser. This will probably not work as
expected if you are running gsutil from an ssh window, or using
gsutil on Windows.
-f Request token with full-control access (default).
-o <file> Write the configuration to <file> instead of ~/.boto.
Use '-' for stdout.
-r Request token restricted to read-only access.
-s <scope> Request additional OAuth2 <scope>.
-w Request token restricted to read-write access.
""")
try:
from oauth2_plugin import oauth2_helper
except ImportError:
pass
GOOG_API_CONSOLE_URI = 'http://code.google.com/apis/console'
SCOPE_FULL_CONTROL = 'https://www.googleapis.com/auth/devstorage.full_control'
SCOPE_READ_WRITE = 'https://www.googleapis.com/auth/devstorage.read_write'
SCOPE_READ_ONLY = 'https://www.googleapis.com/auth/devstorage.read_only'
CONFIG_PRELUDE_CONTENT = """
# This file contains credentials and other configuration information needed
# by the boto library, used by gsutil. You can edit this file (e.g., to add
# credentials) but be careful not to mis-edit any of the variable names (like
# "gs_access_key_id") or remove important markers (like the "[Credentials]" and
# "[Boto]" section delimiters).
#
"""
# Default number of OS processes and Python threads for parallel operations.
# On Linux systems we automatically scale the number of processes to match
# the underlying CPU/core count. Given we'll be running multiple concurrent
# processes on a typical multi-core Linux computer, to avoid being too
# aggressive with resources, the default number of threads is reduced from
# the previous value of 24 to 10.
# On Windows and Mac systems parallel multi-processing and multi-threading
# in Python presents various challenges so we retain compatibility with
# the established parallel mode operation, i.e. one process and 24 threads.
if platform.system() == 'Linux':
DEFAULT_PARALLEL_PROCESS_COUNT = multiprocessing.cpu_count()
DEFAULT_PARALLEL_THREAD_COUNT = 10
else:
DEFAULT_PARALLEL_PROCESS_COUNT = 1
DEFAULT_PARALLEL_THREAD_COUNT = 24
CONFIG_BOTO_SECTION_CONTENT = """
[Boto]
# To use a proxy, edit and uncomment the proxy and proxy_port lines. If you
# need a user/password with this proxy, edit and uncomment those lines as well.
#proxy = <proxy host>
#proxy_port = <proxy port>
#proxy_user = <your proxy user name>
#proxy_pass = <your proxy password>
# The following two options control the use of a secure transport for requests
# to S3 and Google Cloud Storage. It is highly recommended to set both options
# to True in production environments, especially when using OAuth2 bearer token
# authentication with Google Cloud Storage.
# Set 'is_secure' to False to cause boto to connect using HTTP instead of the
# default HTTPS. This is useful if you want to capture/analyze traffic
# (e.g., with tcpdump). This option should always be set to True in production
# environments.
#is_secure = False
# Set 'https_validate_certificates' to False to disable server certificate
# checking. The default for this option in the boto library is currently
# 'False' (to avoid breaking apps that depend on invalid certificates); it is
# therefore strongly recommended to always set this option explicitly to True
# in configuration files, to protect against "man-in-the-middle" attacks.
https_validate_certificates = True
# Set 'send_crlf_after_proxy_auth_headers' to True if you encounter problems
# tunneling HTTPS through a proxy. Users who don't have a proxy in the path
# to Google Cloud Storage don't need to touch this. Users who use a proxy will
# probably find that the default behavior (flag value False) works. If
# you encounter an error like "EOF occurred in violation of protocol" while
# trying to use gsutil through your proxy, try setting this flag to True. We
# (gs-team@google.com) would be interested to hear from you if you need to set
# this, including the make and version of the proxy server you are using.
#send_crlf_after_proxy_auth_headers = False
# 'debug' controls the level of debug messages printed: 0 for none, 1
# for basic boto debug, 2 for all boto debug plus HTTP requests/responses.
# Note: 'gsutil -d' sets debug to 2 for that one command run.
#debug = <0, 1, or 2>
# 'num_retries' controls the number of retry attempts made when errors occur.
# The default is 6. Note: don't set this value to 0, as it will cause boto to
# fail when reusing HTTP connections.
#num_retries = <integer value>
"""
CONFIG_INPUTLESS_GSUTIL_SECTION_CONTENT = """
[GSUtil]
# 'resumable_threshold' specifies the smallest file size [bytes] for which
# resumable Google Cloud Storage transfers are attempted. The default is 2097152
# (2 MiB).
#resumable_threshold = %(resumable_threshold)d
# 'resumable_tracker_dir' specifies the base location where resumable
# transfer tracker files are saved. By default they're in ~/.gsutil
#resumable_tracker_dir = <file path>
# 'parallel_process_count' and 'parallel_thread_count' specify the number
# of OS processes and Python threads, respectively, to use when executing
# operations in parallel. The default settings should work well as configured;
# however, to enhance performance for transfers involving large numbers of
# files, you may experiment with hand tuning these values to optimize
# performance for your particular system configuration.
# MacOS and Windows users should see
# http://code.google.com/p/gsutil/issues/detail?id=78 before attempting
# to experiment with these values.
#parallel_process_count = %(parallel_process_count)d
#parallel_thread_count = %(parallel_thread_count)d
# 'use_magicfile' specifies if the 'file --mime-type <filename>' command should
# be used to guess content types instead of the default filename extension-based
# mechanism. Available on UNIX and MacOS (and possibly on Windows, if you're
# running Cygwin or some other package that provides implementations of
# UNIX-like commands). When available and enabled use_magicfile should be more
# robust because it analyzes file contents in addition to extensions.
#use_magicfile = False
# 'content_language' specifies the ISO 639-1 language code of the content, to be
# passed in the Content-Language header. By default no Content-Language is sent.
# See the ISO 639-1 column of
# http://www.loc.gov/standards/iso639-2/php/code_list.php for a list of
# language codes.
content_language = en
""" % {'resumable_threshold': TWO_MB,
'parallel_process_count': DEFAULT_PARALLEL_PROCESS_COUNT,
'parallel_thread_count': DEFAULT_PARALLEL_THREAD_COUNT}
CONFIG_OAUTH2_CONFIG_CONTENT = """
[OAuth2]
# This section specifies options used with OAuth2 authentication.
# 'token_cache' specifies how the OAuth2 client should cache access tokens.
# Valid values are:
# 'in_memory': an in-memory cache is used. This is only useful if the boto
# client instance (and with it the OAuth2 plugin instance) persists
# across multiple requests.
# 'file_system': access tokens will be cached in the file system, in files
# whose names include a key derived from the refresh token the access token
# was based on.
# The default is 'file_system'.
#token_cache = file_system
#token_cache = in_memory
# 'token_cache_path_pattern' specifies a path pattern for token cache files.
# This option is only relevant if token_cache = file_system.
# The value of this option should be a path, with place-holders '%(key)s' (which
# will be replaced with a key derived from the refresh token the cached access
# token was based on), and (optionally), %(uid)s (which will be replaced with
# the UID of the current user, if available via os.getuid()).
# Note that the config parser itself interpolates '%' placeholders, and hence
# the above placeholders need to be escaped as '%%(key)s'.
# The default value of this option is
# token_cache_path_pattern = <tmpdir>/oauth2client-tokencache.%%(uid)s.%%(key)s
# where <tmpdir> is the system-dependent default temp directory.
# The following options specify the OAuth2 client identity and secret that is
# used when requesting and using OAuth2 tokens. If not specified, a default
# OAuth2 client for the gsutil tool is used; for uses of the boto library (with
# OAuth2 authentication plugin) in other client software, it is recommended to
# use a tool/client-specific OAuth2 client. For more information on OAuth2, see
# http://code.google.com/apis/accounts/docs/OAuth2.html
#client_id = <OAuth2 client id>
#client_secret = <OAuth2 client secret>
# The following options specify the label and endpoint URIs for the OAuth2
# authorization provider being used. Primarily useful for tool developers.
#provider_label = Google
#provider_authorization_uri = https://accounts.google.com/o/oauth2/auth
#provider_token_uri = https://accounts.google.com/o/oauth2/token
"""
class ConfigCommand(Command):
"""Implementation of gsutil config command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'config',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['cfg', 'conf', 'configure'],
# Min number of args required by this command.
MIN_ARGS : 0,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 0,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'habfwrs:o:',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : False,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'config',
# List of help name aliases.
HELP_NAME_ALIASES : ['cfg', 'conf', 'configure', 'proxy', 'aws', 's3'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Obtain credentials and create configuration file',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
def _OpenConfigFile(self, file_path):
"""Creates and opens a configuration file for writing.
The file is created with mode 0600, and attempts to open existing files will
fail (the latter is important to prevent symlink attacks).
It is the caller's responsibility to close the file.
Args:
file_path: Path of the file to be created.
Returns:
A writable file object for the opened file.
Raises:
CommandException: if an error occurred when opening the file (including
when the file already exists).
"""
flags = os.O_RDWR | os.O_CREAT | os.O_EXCL
# Accommodate Windows; stolen from python2.6/tempfile.py.
if hasattr(os, 'O_NOINHERIT'):
flags |= os.O_NOINHERIT
try:
fd = os.open(file_path, flags, 0600)
except (OSError, IOError), e:
raise CommandException('Failed to open %s for writing: %s' %
(file_path, e))
return os.fdopen(fd, 'w')
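# Hedged usage sketch (illustrative only; the path below is hypothetical):
# the caller is responsible for closing the returned file object.
#
#   cfg = self._OpenConfigFile('/tmp/example.boto')
#   try:
#     cfg.write('[Credentials]\n')
#   finally:
#     cfg.close()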
def _WriteBotoConfigFile(self, config_file, use_oauth2=True,
launch_browser=True, oauth2_scopes=[SCOPE_FULL_CONTROL]):
"""Creates a boto config file interactively.
Needed credentials are obtained interactively, either by asking the user for
access key and secret, or by walking the user through the OAuth2 approval
flow.
Args:
config_file: File object to which the resulting config file will be
written.
use_oauth2: If True, walk user through OAuth2 approval flow and produce a
config with an oauth2_refresh_token credential. If false, ask the
user for access key and secret.
launch_browser: In the OAuth2 approval flow, attempt to open a browser
window and navigate to the approval URL.
oauth2_scopes: A list of OAuth2 scopes to request authorization for, when
using OAuth2.
"""
# Collect credentials
provider_map = {'aws': 'aws', 'google': 'gs'}
uri_map = {'aws': 's3', 'google': 'gs'}
key_ids = {}
sec_keys = {}
if use_oauth2:
oauth2_refresh_token = oauth2_helper.OAuth2ApprovalFlow(
oauth2_helper.OAuth2ClientFromBotoConfig(boto.config),
oauth2_scopes, launch_browser)
else:
got_creds = False
for provider in provider_map:
if provider == 'google':
key_ids[provider] = raw_input('What is your %s access key ID? ' %
provider)
sec_keys[provider] = raw_input('What is your %s secret access key? ' %
provider)
got_creds = True
if not key_ids[provider] or not sec_keys[provider]:
raise CommandException(
'Incomplete credentials provided. Please try again.')
if not got_creds:
raise CommandException('No credentials provided. Please try again.')
# Write the config file prelude.
config_file.write(CONFIG_PRELUDE_CONTENT.lstrip())
config_file.write(
'# This file was created by gsutil version %s at %s.\n'
% (self.gsutil_ver,
datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
config_file.write('#\n# You can create additional configuration files by '
'running\n# gsutil config [options] [-o <config-file>]\n\n\n')
# Write the config file Credentials section.
config_file.write('[Credentials]\n\n')
if use_oauth2:
config_file.write('# Google OAuth2 credentials (for "gs://" URIs):\n')
config_file.write('# The following OAuth2 token is authorized for '
'scope(s):\n')
for scope in oauth2_scopes:
config_file.write('# %s\n' % scope)
config_file.write('gs_oauth2_refresh_token = %s\n\n' %
oauth2_refresh_token.refresh_token)
else:
config_file.write('# To add Google OAuth2 credentials ("gs://" URIs), '
'edit and uncomment the\n# following line:\n'
'#gs_oauth2_refresh_token = <your OAuth2 refresh token>\n\n')
for provider in provider_map:
key_prefix = provider_map[provider]
uri_scheme = uri_map[provider]
if provider in key_ids and provider in sec_keys:
config_file.write('# %s credentials ("%s://" URIs):\n' %
(provider, uri_scheme))
config_file.write('%s_access_key_id = %s\n' %
(key_prefix, key_ids[provider]))
config_file.write('%s_secret_access_key = %s\n' %
(key_prefix, sec_keys[provider]))
else:
config_file.write('# To add %s credentials ("%s://" URIs), edit and '
'uncomment the\n# following two lines:\n'
'#%s_access_key_id = <your %s access key ID>\n'
'#%s_secret_access_key = <your %s secret access key>\n' %
(provider, uri_scheme, key_prefix, provider, key_prefix,
provider))
host_key = Provider.HostKeyMap[provider]
config_file.write('# The ability to specify an alternate storage host '
'is primarily for cloud\n# storage service developers.\n'
'#%s_host = <alternate storage host address>\n\n' % host_key)
# Write the config file Boto section.
config_file.write('%s\n' % CONFIG_BOTO_SECTION_CONTENT)
# Write the config file GSUtil section that doesn't depend on user input.
config_file.write(CONFIG_INPUTLESS_GSUTIL_SECTION_CONTENT)
# Write the default API version.
config_file.write("""
# 'default_api_version' specifies the default Google Cloud Storage API
# version to use. If not set below gsutil defaults to API version 1.
""")
api_version = 2
if not use_oauth2: api_version = 1
config_file.write('default_api_version = %d\n' % api_version)
# Write the config file GSUtil section that includes the default
# project ID input from the user.
if launch_browser:
sys.stdout.write(
'Attempting to launch a browser to open the Google API console at '
'URL: %s\n\n'
'[Note: due to a Python bug, you may see a spurious error message '
'"object is not\n callable [...] in [...] Popen.__del__" which can '
'be ignored.]\n\n' % GOOG_API_CONSOLE_URI)
sys.stdout.write(
'In your browser you should see the API Console. Click "Storage" and '
'look for the value under "Identifying your project\n\n')
if not webbrowser.open(GOOG_API_CONSOLE_URI, new=1, autoraise=True):
sys.stdout.write(
'Launching browser appears to have failed; please navigate a '
'browser to the following URL:\n%s\n' % GOOG_API_CONSOLE_URI)
# Short delay; webbrowser.open on linux insists on printing out a message
# which we don't want to run into the prompt for the auth code.
time.sleep(2)
else:
sys.stdout.write(
'\nPlease navigate your browser to %s,\nthen click "Services" on the '
'left side panel and ensure you have Google Cloud\nStorage '
'activated, then click "Google Cloud Storage" on the left side '
'panel and\nfind the "x-goog-project-id" on that page.\n' %
GOOG_API_CONSOLE_URI)
default_project_id = raw_input('What is your project-id? ')
project_id_section_prelude = """
# 'default_project_id' specifies the default Google Cloud Storage project ID to
# use with the 'mb' and 'ls' commands. If defined it overrides the default value
# you set in the API Console. Either of these defaults can be overridden
# by specifying the -p option to the 'mb' and 'ls' commands.
"""
if default_project_id:
config_file.write('%sdefault_project_id = %s\n\n\n' %
(project_id_section_prelude, default_project_id))
else:
sys.stderr.write('No default project ID entered. You will need to edit '
'the default_project_id value\nin your boto config file '
'before using "gsutil ls gs://" or "mb" commands'
'with the\ndefault API version (2).\n')
config_file.write('%s#default_project_id = <value>\n\n\n' %
project_id_section_prelude)
# Write the config file OAuth2 section.
config_file.write(CONFIG_OAUTH2_CONFIG_CONTENT)
# Command entry point.
def RunCommand(self):
scopes = []
use_oauth2 = True
launch_browser = False
output_file_name = None
for opt, opt_arg in self.sub_opts:
if opt == '-a':
use_oauth2 = False
elif opt == '-b':
launch_browser = True
elif opt == '-f':
scopes.append(SCOPE_FULL_CONTROL)
elif opt == '-o':
output_file_name = opt_arg
elif opt == '-r':
scopes.append(SCOPE_READ_ONLY)
elif opt == '-s':
scopes.append(opt_arg)
elif opt == '-w':
scopes.append(SCOPE_READ_WRITE)
if use_oauth2 and not HAVE_OAUTH2:
raise CommandException(
'OAuth2 is only supported when running under Python 2.6 or later\n'
'(unless additional dependencies are installed, '
'see README for details);\n'
'you are running Python %s.\nUse "gsutil config -a" to create a '
'config with Developer Key authentication credentials.' % sys.version)
if not scopes:
scopes.append(SCOPE_FULL_CONTROL)
if output_file_name is None:
# Check to see if a default config file name is requested via
# environment variable; if so, use it, otherwise use the hard-coded
# default path. Then write to that file if it doesn't already exist,
# or if the existing file can be moved out of the way without
# clobbering an existing backup file.
boto_config_from_env = os.environ.get('BOTO_CONFIG', None)
if boto_config_from_env:
default_config_path = boto_config_from_env
else:
default_config_path = os.path.expanduser(os.path.join('~', '.boto'))
if not os.path.exists(default_config_path):
output_file_name = default_config_path
default_config_path_bak = None
else:
default_config_path_bak = default_config_path + '.bak'
if os.path.exists(default_config_path_bak):
raise CommandException('Cannot back up existing config '
'file "%s": backup file exists ("%s").'
% (default_config_path, default_config_path_bak))
else:
try:
sys.stderr.write(
'Backing up existing config file "%s" to "%s"...\n'
% (default_config_path, default_config_path_bak))
os.rename(default_config_path, default_config_path_bak)
except Exception, e:
raise CommandException('Failed to back up existing config '
'file ("%s" -> "%s"): %s.'
% (default_config_path, default_config_path_bak, e))
output_file_name = default_config_path
if output_file_name == '-':
output_file = sys.stdout
else:
output_file = self._OpenConfigFile(output_file_name)
sys.stderr.write(
'This script will create a boto config file at\n%s\ncontaining your '
'credentials, based on your responses to the following questions.\n\n'
% output_file_name)
# Catch ^C so we can restore the backup.
signal.signal(signal.SIGINT, cleanup_handler)
try:
self._WriteBotoConfigFile(output_file, use_oauth2=use_oauth2,
launch_browser=launch_browser, oauth2_scopes=scopes)
except Exception, e:
user_aborted = isinstance(e, AbortException)
if user_aborted:
sys.stderr.write('\nCaught ^C; cleaning up\n')
# If an error occurred during config file creation, remove the invalid
# config file and restore the backup file.
if output_file_name != '-':
output_file.close()
os.unlink(output_file_name)
if default_config_path_bak:
sys.stderr.write('Restoring previous backed up file (%s)\n' %
default_config_path_bak)
os.rename(default_config_path_bak, output_file_name)
raise
if output_file_name != '-':
output_file.close()
sys.stderr.write(
'\nBoto config file "%s" created.\nIf you need to use a proxy to '
'access the Internet please see the instructions in '
'that file.\n' % output_file_name)
return 0
def cleanup_handler(signalnum, handler):
raise AbortException('User interrupted config command')
# Copyright 2011 Google Inc. All Rights Reserved.
# Copyright 2011, Nexenta Systems Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import boto
import errno
import gzip
import hashlib
import mimetypes
import os
import platform
import re
import subprocess
import stat
import sys
import tempfile
import threading
import time
from boto import config
from boto.exception import GSResponseError
from boto.exception import ResumableUploadException
from boto.gs.resumable_upload_handler import ResumableUploadHandler
from boto.s3.keyfile import KeyFile
from boto.s3.resumable_download_handler import ResumableDownloadHandler
from boto.storage_uri import BucketStorageUri
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import Command
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HELP_TYPE
from gslib.help_provider import HelpType
from gslib.name_expansion import NameExpansionIterator
from gslib.util import ExtractErrorDetail
from gslib.util import IS_WINDOWS
from gslib.util import MakeHumanReadable
from gslib.util import NO_MAX
from gslib.util import TWO_MB
from gslib.wildcard_iterator import ContainsWildcard
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil cp [OPTION]... src_uri dst_uri
- or -
gsutil cp [OPTION]... src_uri... dst_uri
- or -
gsutil cp [OPTION]... -I dst_uri
<B>DESCRIPTION</B>
The gsutil cp command allows you to copy data between your local file
system and the cloud, copy data within the cloud, and copy data between
cloud storage providers. For example, to copy all text files from the
local directory to a bucket you could do:
gsutil cp *.txt gs://my_bucket
Similarly, you can download text files from a bucket by doing:
gsutil cp gs://my_bucket/*.txt .
If you want to copy an entire directory tree you need to use the -R option:
gsutil cp -R dir gs://my_bucket
If you have a large number of files to upload you might want to use the
gsutil -m option, to perform a parallel (multi-threaded/multi-processing)
copy:
gsutil -m cp -R dir gs://my_bucket
You can pass a list of URIs to copy on STDIN instead of as command line
arguments by using the -I option. This allows you to use gsutil in a
pipeline to copy files and objects as generated by a program, such as:
some_program | gsutil -m cp -I gs://my_bucket
The contents of STDIN can name files, cloud URIs, and wildcards of files
and cloud URIs.
<B>HOW NAMES ARE CONSTRUCTED</B>
The gsutil cp command strives to name objects in a way consistent with how
Linux cp works, which causes names to be constructed in varying ways depending
on whether you're performing a recursive directory copy or copying
individually named objects; and whether you're copying to an existing or
non-existent directory.
When performing recursive directory copies, object names are constructed
that mirror the source directory structure starting at the point of
recursive processing. For example, the command:
gsutil cp -R dir1/dir2 gs://my_bucket
will create objects named like gs://my_bucket/dir2/a/b/c, assuming
dir1/dir2 contains the file a/b/c.
In contrast, copying individually named files will result in objects named
by the final path component of the source files. For example, the command:
gsutil cp dir1/dir2/** gs://my_bucket
will create objects named like gs://my_bucket/c.
The same rules apply for downloads: recursive copies of buckets and
bucket subdirectories produce a mirrored filename structure, while copying
individually (or wildcard) named objects produce flatly named files.
Note that in the above example the '**' wildcard matches all names
anywhere under dir. The wildcard '*' will match names just one level deep. For
more details see 'gsutil help wildcards'.
There's an additional wrinkle when working with subdirectories: the resulting
names depend on whether the destination subdirectory exists. For example,
if gs://my_bucket/subdir exists as a subdirectory, the command:
gsutil cp -R dir1/dir2 gs://my_bucket/subdir
will create objects named like gs://my_bucket/subdir/dir2/a/b/c. In contrast,
if gs://my_bucket/subdir does not exist, this same gsutil cp command will
create objects named like gs://my_bucket/subdir/a/b/c.
<B>COPYING TO/FROM SUBDIRECTORIES; DISTRIBUTING TRANSFERS ACROSS MACHINES</B>
You can use gsutil to copy to and from subdirectories by using a command like:
gsutil cp -R dir gs://my_bucket/data
This will cause dir and all of its files and nested subdirectories to be
copied under the specified destination, resulting in objects with names like
gs://my_bucket/data/dir/a/b/c. Similarly you can download from bucket
subdirectories by using a command like:
gsutil cp -R gs://my_bucket/data dir
This will cause everything nested under gs://my_bucket/data to be downloaded
into dir, resulting in files with names like dir/data/a/b/c.
Copying subdirectories is useful if you want to add data to an existing
bucket directory structure over time. It's also useful if you want
to parallelize uploads and downloads across multiple machines (often
reducing overall transfer time compared with simply running gsutil -m
cp on one machine). For example, if your bucket contains this structure:
gs://my_bucket/data/result_set_01/
gs://my_bucket/data/result_set_02/
...
gs://my_bucket/data/result_set_99/
you could perform concurrent downloads across 3 machines by running these
commands on each machine, respectively:
gsutil -m cp -R gs://my_bucket/data/result_set_[0-3]* dir
gsutil -m cp -R gs://my_bucket/data/result_set_[4-6]* dir
gsutil -m cp -R gs://my_bucket/data/result_set_[7-9]* dir
Note that dir could be a local directory on each machine, or it could
be a directory mounted off of a shared file server; whether the latter
performs acceptably may depend on a number of things, so we recommend
you experiment and find out what works best for you.
<B>COPYING IN THE CLOUD AND METADATA PRESERVATION</B>
If both the source and destination URI are cloud URIs from the same
provider, gsutil copies data "in the cloud" (i.e., without downloading
to and uploading from the machine where you run gsutil). In addition to
the performance and cost advantages of doing this, copying in the cloud
preserves metadata (like Content-Type and Cache-Control). In contrast,
when you download data from the cloud it ends up in a file, which has
no associated metadata. Thus, unless you have some way to hold on to
or re-create that metadata, downloading to a file will not retain the
metadata.
Note that by default, the gsutil cp command does not copy the object
ACL to the new object, and instead will use the default bucket ACL (see
"gsutil help setdefacl"). You can override this behavior with the -p
option (see OPTIONS below).
gsutil does not preserve metadata when copying objects between providers.
<B>RESUMABLE TRANSFERS</B>
gsutil automatically uses the Google Cloud Storage resumable upload
feature whenever you use the cp command to upload an object that is larger
than 2 MB. You do not need to specify any special command line options
to make this happen. If your upload is interrupted you can restart the
upload by running the same cp command that you ran to start the upload.
Similarly, gsutil automatically performs resumable downloads (using HTTP
standard Range GET operations) whenever you use the cp command to download an
object larger than 2 MB.
Resumable uploads and downloads store some state information in a file
in ~/.gsutil named by the destination object or file. If you attempt to
resume a transfer from a machine with a different directory, the transfer
will start over from scratch.
See also "gsutil help prod" for details on using resumable transfers
in production.
<B>STREAMING TRANSFERS</B>
Use '-' in place of src_uri or dst_uri to perform a streaming
transfer. For example:
long_running_computation | gsutil cp - gs://my_bucket/obj
Streaming transfers do not support resumable uploads/downloads.
(The Google resumable transfer protocol has a way to support streaming
transfers, but gsutil doesn't currently implement support for this.)
<B>CHANGING TEMP DIRECTORIES</B>
gsutil writes data to a temporary directory in several cases:
- when compressing data to be uploaded (see the -z option)
- when decompressing data being downloaded (when the data has
Content-Encoding:gzip, e.g., as happens when uploaded using gsutil cp -z)
- when running integration tests (using the gsutil test command)
In these cases it's possible that the temp file location gsutil selects by
default on your system may not have enough space. If you find that
gsutil runs out of space during one of these operations (e.g., raising
"CommandException: Inadequate temp space available to compress <your file>"
during a gsutil cp -z operation), you can change where it writes these
temp files by setting the TMPDIR environment variable. On Linux and MacOS
you can do this either by running gsutil this way:
TMPDIR=/some/directory gsutil cp ...
or by adding this line to your ~/.bashrc file and then restarting the shell
before running gsutil:
export TMPDIR=/some/directory
On Windows 7 you can change the TMPDIR environment variable from Start ->
Computer -> System -> Advanced System Settings -> Environment Variables.
You need to reboot after making this change for it to take effect. (Rebooting
is not necessary after running the export command on Linux and MacOS.)
<B>OPTIONS</B>
-a canned_acl Sets the named canned_acl when uploaded objects are created. See
'gsutil help acls' for further details.
-c If an error occurs, continue to attempt to copy the remaining
files.
-D Copy in "daisy chain" mode, i.e., copying between two buckets by
hooking a download to an upload, via the machine where gsutil is
run. By default, data are copied between two buckets "in the
cloud", i.e., without needing to copy via the machine where
gsutil runs. However, copy-in-the-cloud is not supported when
copying between different locations (like US and EU) or between
different storage classes (like STANDARD and
DURABLE_REDUCED_AVAILABILITY). For these cases, you can use the
-D option to copy data between buckets.
Note: Daisy chain mode is automatically used when copying
between providers (e.g., to copy data from Google Cloud Storage
to another provider).
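For example, to copy an object between buckets in different locations
(bucket names are illustrative):
  gsutil cp -D gs://example-us-bucket/obj gs://example-eu-bucket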
-e Exclude symlinks. When specified, symbolic links will not be
copied.
-n No-clobber. When specified, existing files or objects at the
destination will not be overwritten. Any items that are skipped
by this option will be reported as being skipped. This option
will perform an additional HEAD request to check if an item
exists before attempting to upload the data. This will save
retransmitting data, but the additional HTTP requests may make
small object transfers slower and more expensive.
This option can be combined with the -c option to build a script
that copies a large number of objects, allowing retries when
some failures occur from which gsutil doesn't automatically
recover, using a bash script like the following:
status=1
while [ $status -ne 0 ] ; do
gsutil cp -c -n -R ./dir gs://bucket
status=$?
done
The -c option will cause copying to continue after failures
occur, and the -n option will cause objects already copied to be
skipped on subsequent iterations. The loop will continue running
as long as gsutil exits with a non-zero status (such a status
indicates there was at least one failure during the gsutil run).
-p Causes ACLs to be preserved when copying in the cloud. Note that
this option has performance and cost implications, because it
is essentially performing three requests (getacl, cp, setacl).
(The performance issue can be mitigated to some degree by
using gsutil -m cp to cause parallel copying.)
You can avoid the additional performance and cost of using cp -p
if you want all objects in the destination bucket to end up with
the same ACL by setting a default ACL on that bucket instead of
using cp -p. See "gsutil help setdefacl".
Note that it's not valid to specify both the -a and -p options
together.
-q Causes copies to be performed quietly, i.e., without reporting
progress indicators of files being copied. Errors are still
reported. This option can be useful for running gsutil from a
cron job that logs its output to a file, for which the only
information desired in the log is failures.
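For example, a cron job might run something like the following (the local
path and bucket name are illustrative):
  gsutil cp -q -R /data/export gs://example-backup-bucket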
-R, -r Causes directories, buckets, and bucket subdirectories to be
copied recursively. If you neglect to use this option for
an upload, gsutil will copy any files it finds and skip any
directories. Similarly, neglecting to specify -R for a download
will cause gsutil to copy any objects at the current bucket
directory level, and skip any subdirectories.
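For example, to upload an entire directory tree (names are illustrative):
  gsutil cp -R my_dir gs://example-bucket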
-v Requests that the version-specific URI for each uploaded object
be printed. Given this URI you can make future upload requests
that are safe in the face of concurrent updates, because Google
Cloud Storage will refuse to perform the update if the current
object version doesn't match the version-specific URI. See
'gsutil help versioning' for more details. Note: at present this
option does not work correctly for objects copied "in the cloud"
(e.g., gsutil cp gs://bucket/obj1 gs://bucket/obj2).
-z ext1,... Compresses file uploads with the given extensions. If you are
uploading a large file with compressible content, such as
a .js, .css, or .html file, you can gzip-compress the file
during the upload process by specifying the -z <extensions>
option. Compressing data before upload saves on usage charges
because you are uploading a smaller amount of data.
When you specify the -z option, the data from your files is
compressed before it is uploaded, but your actual files are left
uncompressed on the local disk. The uploaded objects retain the
content type and name of the original files but are
given a Content-Encoding header with the value "gzip" to
indicate that the object data stored are compressed on the
Google Cloud Storage servers.
For example, the following command:
gsutil cp -z html -a public-read cattypes.html gs://mycats
will do all of the following:
- Upload as the object gs://mycats/cattypes.html (cp command)
- Set the Content-Type to text/html (based on file extension)
- Compress the data in the file cattypes.html (-z option)
- Set the Content-Encoding to gzip (-z option)
- Set the ACL to public-read (-a option)
- If a user tries to view cattypes.html in a browser, the
browser will know to uncompress the data based on the
Content-Encoding header, and to render it as HTML based on
the Content-Type header.
""")
class CpCommand(Command):
"""
Implementation of gsutil cp command.
Note that CpCommand is run for both gsutil cp and gsutil mv. The latter
happens by MvCommand calling CpCommand and passing the hidden (undocumented)
-M option. This allows the copy and remove needed for each mv to run
together (rather than first running all the cp's and then all the rm's, as
we originally had implemented), which in turn avoids the following problem
with removing the wrong objects: starting with a bucket containing only
the object gs://bucket/obj, say the user does:
gsutil mv gs://bucket/* gs://bucket/d.txt
If we ran all the cp's and then all the rm's and we didn't expand the wildcard
first, the cp command would first copy gs://bucket/obj to gs://bucket/d.txt,
and the rm command would then remove that object. In the implementation
prior to gsutil release 3.12 we avoided this by building a list of objects
to process and then running the copies and then the removes; but building
the list up front limits scalability (compared with the current approach
of processing the bucket listing iterator on the fly).
"""
# Set default Content-Type.
DEFAULT_CONTENT_TYPE = 'application/octet-stream'
USE_MAGICFILE = boto.config.getbool('GSUtil', 'use_magicfile', False)
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'cp',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['copy'],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
# -t is deprecated but leave intact for now to avoid breakage.
SUPPORTED_SUB_ARGS : 'a:cDeIMNnpqrRtvz:',
# True if file URIs acceptable for this command.
FILE_URIS_OK : True,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'cp',
# List of help name aliases.
HELP_NAME_ALIASES : ['copy'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Copy files and objects',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
def _CheckFinalMd5(self, key, file_name):
"""
Checks that etag from server agrees with md5 computed after the
download completes.
"""
obj_md5 = key.etag.strip('"\'')
file_md5 = None
if hasattr(key, 'md5') and key.md5:
file_md5 = key.md5
else:
print 'Computing MD5 from scratch for resumed download'
# Open file in binary mode to avoid surprises in Windows.
fp = open(file_name, 'rb')
try:
file_md5 = key.compute_md5(fp)[0]
finally:
fp.close()
if self.debug:
print 'Checking file md5 against etag. (%s/%s)' % (file_md5, obj_md5)
if file_md5 != obj_md5:
# Checksums don't match - remove file and raise exception.
os.unlink(file_name)
raise CommandException(
'File changed during download: md5 signature doesn\'t match '
'etag (incorrect downloaded file deleted)')
def _CheckForDirFileConflict(self, exp_src_uri, dst_uri):
"""Checks whether copying exp_src_uri into dst_uri is not possible.
This happens if a directory exists in local file system where a file
needs to go or vice versa. In that case we print an error message and
exits. Example: if the file "./x" exists and you try to do:
gsutil cp gs://mybucket/x/y .
the request can't succeed because it requires a directory where
the file x exists.
Note that we don't enforce any corresponding restrictions for buckets,
because the flat namespace semantics for buckets doesn't prohibit such
cases the way hierarchical file systems do. For example, if a bucket
contains an object called gs://bucket/dir and then you run the command:
gsutil cp file1 file2 gs://bucket/dir
you'll end up with objects gs://bucket/dir, gs://bucket/dir/file1, and
gs://bucket/dir/file2.
Args:
exp_src_uri: Expanded source StorageUri of copy.
dst_uri: Destination URI.
Raises:
CommandException: if errors encountered.
"""
if dst_uri.is_cloud_uri():
# The problem can only happen for file destination URIs.
return
dst_path = dst_uri.object_name
final_dir = os.path.dirname(dst_path)
if os.path.isfile(final_dir):
raise CommandException('Cannot retrieve %s because a file exists '
'where a directory needs to be created (%s).' %
(exp_src_uri, final_dir))
if os.path.isdir(dst_path):
raise CommandException('Cannot retrieve %s because a directory exists '
'(%s) where the file needs to be created.' %
(exp_src_uri, dst_path))
def _InsistDstUriNamesContainer(self, exp_dst_uri,
have_existing_dst_container, command_name):
"""
Raises an exception if URI doesn't name a directory, bucket, or bucket
subdir, with special exception for cp -R (see comments below).
Args:
exp_dst_uri: Wildcard-expanding dst_uri.
have_existing_dst_container: bool indicator of whether exp_dst_uri
names a container (directory, bucket, or existing bucket subdir).
command_name: Name of command making call. May not be the same as
self.command_name in the case of commands implemented atop other
commands (like mv command).
Raises:
CommandException: if the URI being checked does not name a container.
"""
if exp_dst_uri.is_file_uri():
ok = exp_dst_uri.names_directory()
else:
if have_existing_dst_container:
ok = True
else:
# It's ok to specify a non-existing bucket subdir, for example:
# gsutil cp -R dir gs://bucket/abc
# where gs://bucket/abc isn't an existing subdir.
ok = exp_dst_uri.names_object()
if not ok:
raise CommandException('Destination URI must name a directory, bucket, '
'or bucket\nsubdirectory for the multiple '
'source form of the %s command.' % command_name)
class _FileCopyCallbackHandler(object):
"""Outputs progress info for large copy requests."""
def __init__(self, upload):
if upload:
self.announce_text = 'Uploading'
else:
self.announce_text = 'Downloading'
def call(self, total_bytes_transferred, total_size):
sys.stderr.write('%s: %s/%s \r' % (
self.announce_text,
MakeHumanReadable(total_bytes_transferred),
MakeHumanReadable(total_size)))
if total_bytes_transferred == total_size:
sys.stderr.write('\n')
class _StreamCopyCallbackHandler(object):
"""Outputs progress info for Stream copy to cloud.
Total Size of the stream is not known, so we output
only the bytes transferred.
"""
def call(self, total_bytes_transferred, total_size):
sys.stderr.write('Uploading: %s \r' % (
MakeHumanReadable(total_bytes_transferred)))
if total_size and total_bytes_transferred == total_size:
sys.stderr.write('\n')
def _GetTransferHandlers(self, dst_uri, size, upload):
"""
Selects upload/download and callback handlers.
We use a callback handler that shows a simple textual progress indicator
if size is above the configurable threshold.
We use a resumable transfer handler if size is >= the configurable
threshold and resumable transfers are supported by the given provider.
boto supports resumable downloads for all providers, but resumable
uploads are currently only supported by GS.
Args:
dst_uri: the destination URI.
size: size of file (object) being uploaded (downloaded).
upload: bool indication of whether transfer is an upload.
"""
config = boto.config
resumable_threshold = config.getint('GSUtil', 'resumable_threshold', TWO_MB)
transfer_handler = None
cb = None
num_cb = None
# Checks whether the destination file is a "special" file, like /dev/null on
# Linux platforms or null on Windows platforms, so we can disable resumable
# download support since the file size of the destination won't ever be
# correct.
dst_is_special = False
if dst_uri.is_file_uri():
# Check explicitly first because os.stat doesn't work on 'nul' in Windows.
if dst_uri.object_name == os.devnull:
dst_is_special = True
try:
mode = os.stat(dst_uri.object_name).st_mode
if stat.S_ISCHR(mode):
dst_is_special = True
except OSError:
pass
if size >= resumable_threshold and not dst_is_special:
if not self.quiet:
cb = self._FileCopyCallbackHandler(upload).call
num_cb = int(size / TWO_MB)
resumable_tracker_dir = config.get(
'GSUtil', 'resumable_tracker_dir',
os.path.expanduser('~' + os.sep + '.gsutil'))
if not os.path.exists(resumable_tracker_dir):
os.makedirs(resumable_tracker_dir)
if upload:
# Encode the dest bucket and object name into the tracker file name.
res_tracker_file_name = (
re.sub('[/\\\\]', '_', 'resumable_upload__%s__%s.url' %
(dst_uri.bucket_name, dst_uri.object_name)))
else:
# Encode the fully-qualified dest file name into the tracker file name.
res_tracker_file_name = (
re.sub('[/\\\\]', '_', 'resumable_download__%s.etag' %
(os.path.realpath(dst_uri.object_name))))
res_tracker_file_name = _hash_filename(res_tracker_file_name)
tracker_file = '%s%s%s' % (resumable_tracker_dir, os.sep,
res_tracker_file_name)
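# Illustrative example: with the default tracker dir this produces a path
# like ~/.gsutil/TRACKER_<sha1 hex>.<last 16 chars of the encoded name>
# (see _hash_filename below).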
if upload:
if dst_uri.scheme == 'gs':
transfer_handler = ResumableUploadHandler(tracker_file)
else:
transfer_handler = ResumableDownloadHandler(tracker_file)
return (cb, num_cb, transfer_handler)
def _LogCopyOperation(self, src_uri, dst_uri, headers):
"""
Logs copy operation being performed, including Content-Type if appropriate.
"""
if self.quiet:
return
if 'Content-Type' in headers and dst_uri.is_cloud_uri():
content_type_msg = ' [Content-Type=%s]' % headers['Content-Type']
else:
content_type_msg = ''
if src_uri.is_stream():
self.THREADED_LOGGER.info('Copying from <STDIN>%s...', content_type_msg)
else:
self.THREADED_LOGGER.info('Copying %s%s...', src_uri, content_type_msg)
# We pass the headers explicitly to this call instead of using self.headers
# so we can set different metadata (like Content-Type) for each object.
def _CopyObjToObjInTheCloud(self, src_key, src_uri, dst_uri, headers):
"""Performs copy-in-the cloud from specified src to dest object.
Args:
src_key: Source Key.
src_uri: Source StorageUri.
dst_uri: Destination StorageUri.
headers: A copy of the headers dictionary.
Returns:
(elapsed_time, bytes_transferred, dst_uri) excluding overhead like initial
HEAD. Note: At present copy-in-the-cloud doesn't return the generation of
the created object, so the returned URI is actually not version-specific
(unlike other cp cases).
Raises:
CommandException: if errors encountered.
"""
self._SetContentTypeHeader(src_uri, headers)
self._LogCopyOperation(src_uri, dst_uri, headers)
# Do Object -> object copy within same provider (uses
# x-<provider>-copy-source metadata HTTP header to request copying at the
# server).
src_bucket = src_uri.get_bucket(False, headers)
preserve_acl = False
canned_acl = None
if self.sub_opts:
for o, a in self.sub_opts:
if o == '-a':
canned_acls = dst_uri.canned_acls()
if a not in canned_acls:
raise CommandException('Invalid canned ACL "%s".' % a)
canned_acl = a
headers[dst_uri.get_provider().acl_header] = canned_acl
if o == '-p':
preserve_acl = True
if preserve_acl and canned_acl:
raise CommandException(
'Specifying both the -p and -a options together is invalid.')
start_time = time.time()
# Pass headers in headers param not metadata param, so boto will copy
# existing key's metadata and just set the additional headers specified
# in the headers param (rather than using the headers to override existing
# metadata). In particular this allows us to copy the existing key's
# Content-Type and other metadata users need while still being able to
# set headers the API needs (like x-goog-project-id). Note that this means
# you can't do something like:
# gsutil cp -t Content-Type text/html gs://bucket/* gs://bucket2
# to change the Content-Type while copying.
try:
dst_key = dst_uri.copy_key(
src_bucket.name, src_uri.object_name, preserve_acl=False,
headers=headers, src_version_id=src_uri.version_id,
src_generation=src_uri.generation)
except GSResponseError as e:
exc_name, error_detail = ExtractErrorDetail(e)
if (exc_name == 'GSResponseError'
and ('Copy-in-the-cloud disallowed' in error_detail)):
raise CommandException('%s.\nNote: you can copy between locations '
'and between storage classes by using the '
'gsutil cp -D option.' % error_detail)
else:
raise
end_time = time.time()
return (end_time - start_time, src_key.size,
dst_uri.clone_replace_key(dst_key))
def _CheckFreeSpace(self, path):
"""Return path/drive free space (in bytes)."""
if platform.system() == 'Windows':
from ctypes import c_int, c_uint64, c_wchar_p, windll, POINTER, WINFUNCTYPE, WinError
try:
GetDiskFreeSpaceEx = WINFUNCTYPE(c_int, c_wchar_p, POINTER(c_uint64),
POINTER(c_uint64), POINTER(c_uint64))
GetDiskFreeSpaceEx = GetDiskFreeSpaceEx(
('GetDiskFreeSpaceExW', windll.kernel32), (
(1, 'lpszPathName'),
(2, 'lpFreeUserSpace'),
(2, 'lpTotalSpace'),
(2, 'lpFreeSpace'),))
except AttributeError:
GetDiskFreeSpaceEx = WINFUNCTYPE(c_int, c_char_p, POINTER(c_uint64),
POINTER(c_uint64), POINTER(c_uint64))
GetDiskFreeSpaceEx = GetDiskFreeSpaceEx(
('GetDiskFreeSpaceExA', windll.kernel32), (
(1, 'lpszPathName'),
(2, 'lpFreeUserSpace'),
(2, 'lpTotalSpace'),
(2, 'lpFreeSpace'),))
def GetDiskFreeSpaceEx_errcheck(result, func, args):
if not result:
raise WinError()
return args[1].value
GetDiskFreeSpaceEx.errcheck = GetDiskFreeSpaceEx_errcheck
return GetDiskFreeSpaceEx(os.getenv('SystemDrive'))
else:
(_, f_frsize, _, _, f_bavail, _, _, _, _, _) = os.statvfs(path)
return f_frsize * f_bavail
def _PerformResumableUploadIfApplies(self, fp, dst_uri, canned_acl, headers):
"""
Performs resumable upload if supported by provider and file is above
threshold, else performs non-resumable upload.
Returns (elapsed_time, bytes_transferred, version-specific dst_uri).
"""
start_time = time.time()
# Determine the file size differently depending on whether fp is a wrapper
# around a Key or an actual file.
if isinstance(fp, KeyFile):
file_size = fp.getkey().size
else:
file_size = os.path.getsize(fp.name)
(cb, num_cb, res_upload_handler) = self._GetTransferHandlers(
dst_uri, file_size, True)
if dst_uri.scheme == 'gs':
# Resumable upload protocol is Google Cloud Storage-specific.
dst_uri.set_contents_from_file(fp, headers, policy=canned_acl,
cb=cb, num_cb=num_cb,
res_upload_handler=res_upload_handler)
else:
dst_uri.set_contents_from_file(fp, headers, policy=canned_acl,
cb=cb, num_cb=num_cb)
if res_upload_handler:
# ResumableUploadHandler does not update upload_start_point from its
# initial value of -1 if transferring the whole file, so clamp at 0
bytes_transferred = file_size - max(
res_upload_handler.upload_start_point, 0)
else:
bytes_transferred = file_size
end_time = time.time()
return (end_time - start_time, bytes_transferred, dst_uri)
def _PerformStreamingUpload(self, fp, dst_uri, headers, canned_acl=None):
"""
Performs a streaming upload to the cloud.
Args:
fp: The file whose contents to upload.
dst_uri: Destination StorageUri.
headers: A copy of the headers dictionary.
canned_acl: Optional canned ACL to set on the object.
Returns (elapsed_time, bytes_transferred, version-specific dst_uri).
"""
start_time = time.time()
if self.quiet:
cb = None
else:
cb = self._StreamCopyCallbackHandler().call
dst_uri.set_contents_from_stream(
fp, headers, policy=canned_acl, cb=cb)
try:
bytes_transferred = fp.tell()
except:
bytes_transferred = 0
end_time = time.time()
return (end_time - start_time, bytes_transferred, dst_uri)
def _SetContentTypeHeader(self, src_uri, headers):
"""
Sets content type header to value specified in '-h Content-Type' option (if
specified); else sets using Content-Type detection.
"""
if 'Content-Type' in headers:
# If empty string specified (i.e., -h "Content-Type:") set header to None,
# which will inhibit boto from sending the CT header. Otherwise, boto will
# pass through the user specified CT header.
if not headers['Content-Type']:
headers['Content-Type'] = None
# else we'll keep the value passed in via -h option (not performing
# content type detection).
else:
# Only do content type recognition if src_uri is a file. Object-to-object
# copies with no -h Content-Type specified re-use the content type of the
# source object.
if src_uri.is_file_uri():
object_name = src_uri.object_name
content_type = None
# Streams (denoted by '-') are expected to be 'application/octet-stream'
# and 'file' would partially consume them.
if object_name != '-':
if self.USE_MAGICFILE:
p = subprocess.Popen(['file', '--mime-type', object_name],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = p.communicate()
if p.returncode != 0 or error:
raise CommandException(
'Encountered error running "file --mime-type %s" '
'(returncode=%d).\n%s' % (object_name, p.returncode, error))
# Parse output by removing the line delimiter and splitting on the last ': '.
content_type = output.rstrip().rpartition(': ')[2]
else:
content_type = mimetypes.guess_type(object_name)[0]
if not content_type:
content_type = self.DEFAULT_CONTENT_TYPE
headers['Content-Type'] = content_type
def _UploadFileToObject(self, src_key, src_uri, dst_uri, headers,
should_log=True):
"""Uploads a local file to an object.
Args:
src_key: Source StorageUri. Must be a file URI.
src_uri: Source StorageUri.
dst_uri: Destination StorageUri.
headers: The headers dictionary.
should_log: bool indicator whether we should log this operation.
Returns:
(elapsed_time, bytes_transferred, version-specific dst_uri), excluding
overhead like initial HEAD.
Raises:
CommandException: if errors encountered.
"""
gzip_exts = []
canned_acl = None
if self.sub_opts:
for o, a in self.sub_opts:
if o == '-a':
canned_acls = dst_uri.canned_acls()
if a not in canned_acls:
raise CommandException('Invalid canned ACL "%s".' % a)
canned_acl = a
elif o == '-t':
print('Warning: -t is deprecated, and will be removed in the future. '
'Content type\ndetection is '
'now performed by default, unless inhibited by specifying '
'a\nContent-Type header via the -h option.')
elif o == '-z':
gzip_exts = a.split(',')
self._SetContentTypeHeader(src_uri, headers)
if should_log:
self._LogCopyOperation(src_uri, dst_uri, headers)
if 'Content-Language' not in headers:
content_language = config.get_value('GSUtil', 'content_language')
if content_language:
headers['Content-Language'] = content_language
fname_parts = src_uri.object_name.split('.')
if len(fname_parts) > 1 and fname_parts[-1] in gzip_exts:
if self.debug:
print 'Compressing %s (to tmp)...' % src_key
(gzip_fh, gzip_path) = tempfile.mkstemp()
gzip_fp = None
try:
# Check for temp space. Assume the compressed object is at most 2x
# the size of the object (normally should compress to smaller than
# the object)
if (self._CheckFreeSpace(gzip_path)
< 2*int(os.path.getsize(src_key.name))):
raise CommandException('Inadequate temp space available to compress '
'%s' % src_key.name)
gzip_fp = gzip.open(gzip_path, 'wb')
gzip_fp.writelines(src_key.fp)
finally:
if gzip_fp:
gzip_fp.close()
os.close(gzip_fh)
headers['Content-Encoding'] = 'gzip'
gzip_fp = open(gzip_path, 'rb')
try:
(elapsed_time, bytes_transferred, result_uri) = (
self._PerformResumableUploadIfApplies(gzip_fp, dst_uri,
canned_acl, headers))
finally:
gzip_fp.close()
try:
os.unlink(gzip_path)
# Windows sometimes complains the temp file is locked when you try to
# delete it.
except Exception, e:
pass
elif (src_key.is_stream()
and dst_uri.get_provider().supports_chunked_transfer()):
(elapsed_time, bytes_transferred, result_uri) = (
self._PerformStreamingUpload(src_key.fp, dst_uri, headers,
canned_acl))
else:
if src_key.is_stream():
# For providers that don't support chunked transfers, buffer the stream
# to a temporary file and upload that instead.
tmp = tempfile.NamedTemporaryFile()
file_uri = self.suri_builder.StorageUri('file://%s' % tmp.name)
try:
file_uri.new_key(False, headers).set_contents_from_file(
src_key.fp, headers)
src_key = file_uri.get_key()
finally:
file_uri.close()
try:
(elapsed_time, bytes_transferred, result_uri) = (
self._PerformResumableUploadIfApplies(src_key.fp, dst_uri,
canned_acl, headers))
finally:
if src_key.is_stream():
tmp.close()
else:
src_key.close()
return (elapsed_time, bytes_transferred, result_uri)
def _DownloadObjectToFile(self, src_key, src_uri, dst_uri, headers,
should_log=True):
"""Downloads an object to a local file.
Args:
src_key: Source StorageUri. Must be a file URI.
src_uri: Source StorageUri.
dst_uri: Destination StorageUri.
headers: The headers dictionary.
should_log: bool indicator whether we should log this operation.
Returns:
(elapsed_time, bytes_transferred, dst_uri), excluding overhead like
initial HEAD.
Raises:
CommandException: if errors encountered.
"""
if should_log:
self._LogCopyOperation(src_uri, dst_uri, headers)
(cb, num_cb, res_download_handler) = self._GetTransferHandlers(
dst_uri, src_key.size, False)
file_name = dst_uri.object_name
dir_name = os.path.dirname(file_name)
if dir_name and not os.path.exists(dir_name):
# Do dir creation in try block so we can ignore the case where dir already
# exists. This is needed to avoid a race condition when running gsutil
# -m cp.
try:
os.makedirs(dir_name)
except OSError, e:
if e.errno != errno.EEXIST:
raise
# For gzipped objects not named *.gz download to a temp file and unzip.
if (hasattr(src_key, 'content_encoding')
and src_key.content_encoding == 'gzip'
and not file_name.endswith('.gz')):
# We can't use tempfile.mkstemp() here because we need a predictable
# filename for resumable downloads.
download_file_name = '%s_.gztmp' % file_name
need_to_unzip = True
else:
download_file_name = file_name
need_to_unzip = False
fp = None
try:
if res_download_handler:
fp = open(download_file_name, 'ab')
else:
fp = open(download_file_name, 'wb')
start_time = time.time()
src_key.get_contents_to_file(fp, headers, cb=cb, num_cb=num_cb,
res_download_handler=res_download_handler)
# If a custom test method is defined, call it here. For the copy command,
# test methods are expected to take one argument: an open file pointer,
# and are used to perturb the open file during download to exercise
# download error detection.
if self.test_method:
self.test_method(fp)
end_time = time.time()
finally:
if fp:
fp.close()
# Discard the md5 if we are resuming a partial download.
if res_download_handler and res_download_handler.download_start_point:
src_key.md5 = None
# Verify downloaded file checksum matched source object's checksum.
self._CheckFinalMd5(src_key, download_file_name)
if res_download_handler:
bytes_transferred = (
src_key.size - res_download_handler.download_start_point)
else:
bytes_transferred = src_key.size
if need_to_unzip:
# Log that we're uncompressing if the file is big enough that
# decompressing would make it look like the transfer "stalled" at the end.
if not self.quiet and bytes_transferred > 10 * 1024 * 1024:
self.THREADED_LOGGER.info('Uncompressing downloaded tmp file to %s...',
file_name)
# Downloaded gzipped file to a filename w/o .gz extension, so unzip.
f_in = gzip.open(download_file_name, 'rb')
f_out = open(file_name, 'wb')
try:
while True:
data = f_in.read(8192)
if not data:
break
f_out.write(data)
finally:
f_out.close()
f_in.close()
os.unlink(download_file_name)
return (end_time - start_time, bytes_transferred, dst_uri)
def _PerformDownloadToStream(self, src_key, src_uri, str_fp, headers):
(cb, num_cb, res_download_handler) = self._GetTransferHandlers(
src_uri, src_key.size, False)
start_time = time.time()
src_key.get_contents_to_file(str_fp, headers, cb=cb, num_cb=num_cb)
end_time = time.time()
bytes_transferred = src_key.size
return (end_time - start_time, bytes_transferred)
def _CopyFileToFile(self, src_key, src_uri, dst_uri, headers):
"""Copies a local file to a local file.
Args:
src_key: Source StorageUri. Must be a file URI.
src_uri: Source StorageUri.
dst_uri: Destination StorageUri.
headers: The headers dictionary.
Returns:
(elapsed_time, bytes_transferred, dst_uri), excluding
overhead like initial HEAD.
Raises:
CommandException: if errors encountered.
"""
self._LogCopyOperation(src_uri, dst_uri, headers)
dst_key = dst_uri.new_key(False, headers)
start_time = time.time()
dst_key.set_contents_from_file(src_key.fp, headers)
end_time = time.time()
return (end_time - start_time, os.path.getsize(src_key.fp.name), dst_uri)
def _CopyObjToObjDaisyChainMode(self, src_key, src_uri, dst_uri, headers):
"""Copies from src_uri to dst_uri in "daisy chain" mode.
See -D OPTION documentation about what daisy chain mode is.
Args:
src_key: Source Key.
src_uri: Source StorageUri.
dst_uri: Destination StorageUri.
headers: A copy of the headers dictionary.
Returns:
(elapsed_time, bytes_transferred, version-specific dst_uri) excluding
overhead like initial HEAD.
Raises:
CommandException: if errors encountered.
"""
self._SetContentTypeHeader(src_uri, headers)
self._LogCopyOperation(src_uri, dst_uri, headers)
canned_acl = None
if self.sub_opts:
for o, a in self.sub_opts:
if o == '-a':
canned_acls = dst_uri.canned_acls()
if a not in canned_acls:
raise CommandException('Invalid canned ACL "%s".' % a)
canned_acl = a
elif o == '-p':
# We don't attempt to preserve ACLs across providers because
# GCS and S3 support different ACLs and disjoint principals.
raise NotImplementedError('Cross-provider cp -p not supported')
return self._PerformResumableUploadIfApplies(KeyFile(src_key), dst_uri,
canned_acl, headers)
def _PerformCopy(self, src_uri, dst_uri):
"""Performs copy from src_uri to dst_uri, handling various special cases.
Args:
src_uri: Source StorageUri.
dst_uri: Destination StorageUri.
Returns:
(elapsed_time, bytes_transferred, version-specific dst_uri) excluding
overhead like initial HEAD.
Raises:
CommandException: if errors encountered.
"""
# Make a copy of the input headers each time so we can set a different
# content type for each object.
if self.headers:
headers = self.headers.copy()
else:
headers = {}
src_key = src_uri.get_key(False, headers)
if not src_key:
raise CommandException('"%s" does not exist.' % src_uri)
# On Windows, stdin is opened as text mode instead of binary which causes
# problems when piping a binary file, so this switches it to binary mode.
if IS_WINDOWS and src_uri.is_file_uri() and src_key.is_stream():
import msvcrt
msvcrt.setmode(src_key.fp.fileno(), os.O_BINARY)
if self.no_clobber:
# There are two checks to prevent clobbering:
# 1) The first check is to see if the item
# already exists at the destination and prevent the upload/download
# from happening. This is done by the exists() call.
# 2) The second check is only relevant if we are writing to gs. We can
# enforce that the server only writes the object if it doesn't exist
# by specifying the header below. This check only happens at the
# server after the complete file has been uploaded. We specify this
# header to prevent a race condition where a destination file may
# be created after the first check and before the file is fully
# uploaded.
# In order to save on unnecessary uploads/downloads we perform both
# checks. However, this may come at the cost of additional HTTP calls.
if dst_uri.exists(headers):
if not self.quiet:
self.THREADED_LOGGER.info('Skipping existing item: %s' %
dst_uri.uri)
return (0, 0, None)
if dst_uri.is_cloud_uri() and dst_uri.scheme == 'gs':
headers['x-goog-if-generation-match'] = '0'
if src_uri.is_cloud_uri() and dst_uri.is_cloud_uri():
if src_uri.scheme == dst_uri.scheme and not self.daisy_chain:
return self._CopyObjToObjInTheCloud(src_key, src_uri, dst_uri, headers)
else:
return self._CopyObjToObjDaisyChainMode(src_key, src_uri, dst_uri,
headers)
elif src_uri.is_file_uri() and dst_uri.is_cloud_uri():
return self._UploadFileToObject(src_key, src_uri, dst_uri, headers)
elif src_uri.is_cloud_uri() and dst_uri.is_file_uri():
return self._DownloadObjectToFile(src_key, src_uri, dst_uri, headers)
elif src_uri.is_file_uri() and dst_uri.is_file_uri():
return self._CopyFileToFile(src_key, src_uri, dst_uri, headers)
else:
raise CommandException('Unexpected src/dest case')
def _ExpandDstUri(self, dst_uri_str):
"""
Expands wildcard if present in dst_uri_str.
Args:
dst_uri_str: String representation of requested dst_uri.
Returns:
(exp_dst_uri, have_existing_dst_container)
where have_existing_dst_container is a bool indicating whether
exp_dst_uri names an existing directory, bucket, or bucket subdirectory.
Raises:
CommandException: if dst_uri_str matched more than 1 URI.
"""
dst_uri = self.suri_builder.StorageUri(dst_uri_str)
# Handle wildcarded dst_uri case.
if ContainsWildcard(dst_uri):
blr_expansion = list(self.WildcardIterator(dst_uri))
if len(blr_expansion) != 1:
raise CommandException('Destination (%s) must match exactly 1 URI' %
dst_uri_str)
blr = blr_expansion[0]
uri = blr.GetUri()
if uri.is_cloud_uri():
return (uri, uri.names_bucket() or blr.HasPrefix()
or blr.GetKey().endswith('/'))
else:
return (uri, uri.names_directory())
# Handle non-wildcarded dst_uri:
if dst_uri.is_file_uri():
return (dst_uri, dst_uri.names_directory())
if dst_uri.names_bucket():
return (dst_uri, True)
# For object URIs check 3 cases: (a) if the name ends with '/' treat as a
# subdir; else, perform a wildcard expansion with dst_uri + "*" and then
# find if (b) there's a Prefix matching dst_uri, or (c) name is of form
# dir_$folder$ (and in both these cases also treat dir as a subdir).
if dst_uri.is_cloud_uri() and dst_uri_str.endswith('/'):
return (dst_uri, True)
blr_expansion = list(self.WildcardIterator(
'%s*' % dst_uri_str.rstrip(dst_uri.delim)))
for blr in blr_expansion:
if blr.GetRStrippedUriString().endswith('_$folder$'):
return (dst_uri, True)
if blr.GetRStrippedUriString() == dst_uri_str.rstrip(dst_uri.delim):
return (dst_uri, blr.HasPrefix())
return (dst_uri, False)
def _ConstructDstUri(self, src_uri, exp_src_uri,
src_uri_names_container, src_uri_expands_to_multi,
have_multiple_srcs, exp_dst_uri,
have_existing_dest_subdir):
"""
Constructs the destination URI for a given exp_src_uri/exp_dst_uri pair,
using context-dependent naming rules that mimic Linux cp and mv behavior.
Args:
src_uri: src_uri to be copied.
exp_src_uri: Single StorageUri from wildcard expansion of src_uri.
src_uri_names_container: True if src_uri names a container (including the
case of a wildcard-named bucket subdir (like gs://bucket/abc,
where gs://bucket/abc/* matched some objects). Note that this is
additional semantics that src_uri.names_container() doesn't understand
because the latter only understands StorageUris, not wildcards.
src_uri_expands_to_multi: True if src_uri expanded to multiple URIs.
have_multiple_srcs: True if this is a multi-source request. This can be
true if src_uri wildcard-expanded to multiple URIs or if there were
multiple source URIs in the request.
exp_dst_uri: the expanded StorageUri requested for the cp destination.
Final written path is constructed from this plus a context-dependent
variant of src_uri.
have_existing_dest_subdir: bool indicator whether dest is an existing
subdirectory.
Returns:
StorageUri to use for copy.
Raises:
CommandException if destination object name not specified for
source and source is a stream.
"""
if self._ShouldTreatDstUriAsSingleton(
have_multiple_srcs, have_existing_dest_subdir, exp_dst_uri):
# We're copying one file or object to one file or object.
return exp_dst_uri
if exp_src_uri.is_stream():
if exp_dst_uri.names_container():
raise CommandException('Destination object name needed when '
'source is a stream')
return exp_dst_uri
if not self.recursion_requested and not have_multiple_srcs:
# We're copying one file or object to a subdirectory. Append final comp
# of exp_src_uri to exp_dst_uri.
src_final_comp = exp_src_uri.object_name.rpartition(src_uri.delim)[-1]
return self.suri_builder.StorageUri('%s%s%s' % (
exp_dst_uri.uri.rstrip(exp_dst_uri.delim), exp_dst_uri.delim,
src_final_comp))
# Else we're copying multiple sources to a directory, bucket, or a bucket
# "sub-directory".
# Ensure exp_dst_uri ends in delim char if we're doing a multi-src copy or
# a copy to a directory. (The check for copying to a directory needs
# special-case handling so that the command:
# gsutil cp gs://bucket/obj dir
# will turn into file://dir/ instead of file://dir -- the latter would cause
# the file "dirobj" to be created.)
# Note: need to check have_multiple_srcs or src_uri.names_container()
# because src_uri could be a bucket containing a single object, named
# as gs://bucket.
if ((have_multiple_srcs or src_uri.names_container()
or os.path.isdir(exp_dst_uri.object_name))
and not exp_dst_uri.uri.endswith(exp_dst_uri.delim)):
exp_dst_uri = exp_dst_uri.clone_replace_name(
'%s%s' % (exp_dst_uri.object_name, exp_dst_uri.delim)
)
# Making naming behavior match how things work with local Linux cp and mv
# operations depends on many factors, including whether the destination is a
# container, the plurality of the source(s), and whether the mv command is
# being used:
# 1. For the "mv" command that specifies a non-existent destination subdir,
# renaming should occur at the level of the src subdir, vs appending that
# subdir beneath the dst subdir like is done for copying. For example:
# gsutil rm -R gs://bucket
# gsutil cp -R dir1 gs://bucket
# gsutil cp -R dir2 gs://bucket/subdir1
# gsutil mv gs://bucket/subdir1 gs://bucket/subdir2
# would (if using cp naming behavior) end up with paths like:
# gs://bucket/subdir2/subdir1/dir2/.svn/all-wcprops
# whereas mv naming behavior should result in:
# gs://bucket/subdir2/dir2/.svn/all-wcprops
# 2. Copying from directories, buckets, or bucket subdirs should result in
# objects/files mirroring the source directory hierarchy. For example:
# gsutil cp dir1/dir2 gs://bucket
# should create the object gs://bucket/dir2/file2 (assuming dir1/dir2
# contains file2).
# To be consistent with Linux cp behavior, there's one more wrinkle when
# working with subdirs: The resulting object names depend on whether the
# destination subdirectory exists. For example, if gs://bucket/subdir
# exists, the command:
# gsutil cp -R dir1/dir2 gs://bucket/subdir
# should create objects named like gs://bucket/subdir/dir2/a/b/c. In
# contrast, if gs://bucket/subdir does not exist, this same command
# should create objects named like gs://bucket/subdir/a/b/c.
# 3. Copying individual files or objects to dirs, buckets or bucket subdirs
# should result in objects/files named by the final source file name
# component. Example:
# gsutil cp dir1/*.txt gs://bucket
# should create the objects gs://bucket/f1.txt and gs://bucket/f2.txt,
# assuming dir1 contains f1.txt and f2.txt.
if (self.perform_mv and self.recursion_requested
and src_uri_expands_to_multi and not have_existing_dest_subdir):
# Case 1. Handle naming rules for bucket subdir mv. Here we want to
# line up the src_uri against its expansion, to find the base to build
# the new name. For example, running the command:
# gsutil mv gs://bucket/abcd gs://bucket/xyz
# when processing exp_src_uri=gs://bucket/abcd/123
# exp_src_uri_tail should become /123
# Note: mv.py code disallows wildcard specification of source URI.
exp_src_uri_tail = exp_src_uri.uri[len(src_uri.uri):]
dst_key_name = '%s/%s' % (exp_dst_uri.object_name.rstrip('/'),
exp_src_uri_tail.strip('/'))
return exp_dst_uri.clone_replace_name(dst_key_name)
if src_uri_names_container and not exp_dst_uri.names_file():
# Case 2. Build dst_key_name from subpath of exp_src_uri past
# where src_uri ends. For example, for src_uri=gs://bucket/ and
# exp_src_uri=gs://bucket/src_subdir/obj, dst_key_name should be
# src_subdir/obj.
src_uri_path_sans_final_dir = _GetPathBeforeFinalDir(src_uri)
dst_key_name = exp_src_uri.uri[
len(src_uri_path_sans_final_dir):].lstrip(src_uri.delim)
# Handle case where dst_uri is a non-existent subdir.
if not have_existing_dest_subdir:
dst_key_name = dst_key_name.partition(src_uri.delim)[-1]
# Handle special case where src_uri was a directory named with '.' or
# './', so that running a command like:
# gsutil cp -r . gs://dest
# will produce obj names of the form gs://dest/abc instead of
# gs://dest/./abc.
if dst_key_name.startswith('.%s' % os.sep):
dst_key_name = dst_key_name[2:]
else:
# Case 3.
dst_key_name = exp_src_uri.object_name.rpartition(src_uri.delim)[-1]
if (exp_dst_uri.is_file_uri()
or self._ShouldTreatDstUriAsBucketSubDir(
have_multiple_srcs, exp_dst_uri, have_existing_dest_subdir)):
if exp_dst_uri.object_name.endswith(exp_dst_uri.delim):
dst_key_name = '%s%s%s' % (
exp_dst_uri.object_name.rstrip(exp_dst_uri.delim),
exp_dst_uri.delim, dst_key_name)
else:
delim = exp_dst_uri.delim if exp_dst_uri.object_name else ''
dst_key_name = '%s%s%s' % (exp_dst_uri.object_name, delim, dst_key_name)
return exp_dst_uri.clone_replace_name(dst_key_name)
def _FixWindowsNaming(self, src_uri, dst_uri):
"""
Rewrites the destination URI built by _ConstructDstUri() to translate
Windows pathnames to cloud pathnames if needed.
Args:
src_uri: Source URI to be copied.
dst_uri: The destination URI built by _ConstructDstUri().
Returns:
StorageUri to use for copy.
"""
if (src_uri.is_file_uri() and src_uri.delim == '\\'
and dst_uri.is_cloud_uri()):
trans_uri_str = re.sub(r'\\', '/', dst_uri.uri)
dst_uri = self.suri_builder.StorageUri(trans_uri_str)
return dst_uri
# Command entry point.
def RunCommand(self):
# Inner funcs.
def _CopyExceptionHandler(e):
"""Simple exception handler to allow post-completion status."""
self.THREADED_LOGGER.error(str(e))
self.copy_failure_count += 1
def _CopyFunc(name_expansion_result):
"""Worker function for performing the actual copy (and rm, for mv)."""
if self.perform_mv:
cmd_name = 'mv'
else:
cmd_name = self.command_name
src_uri = self.suri_builder.StorageUri(
name_expansion_result.GetSrcUriStr())
exp_src_uri = self.suri_builder.StorageUri(
name_expansion_result.GetExpandedUriStr())
src_uri_names_container = name_expansion_result.NamesContainer()
src_uri_expands_to_multi = name_expansion_result.NamesContainer()
have_multiple_srcs = name_expansion_result.IsMultiSrcRequest()
have_existing_dest_subdir = (
name_expansion_result.HaveExistingDstContainer())
if src_uri.names_provider():
raise CommandException(
'The %s command does not allow provider-only source URIs (%s)' %
(cmd_name, src_uri))
if have_multiple_srcs:
self._InsistDstUriNamesContainer(exp_dst_uri,
have_existing_dst_container,
cmd_name)
if self.perform_mv:
if name_expansion_result.NamesContainer():
# Use recursion_requested when performing name expansion for the
# directory mv case so we can determine if any of the source URIs are
# directories (and then use cp -R and rm -R to perform the move, to
# match the behavior of Linux mv (which when moving a directory moves
# all the contained files).
self.recursion_requested = True
# Disallow wildcard src URIs when moving directories, as supporting it
# would make the name transformation too complex and would also be
# dangerous (e.g., someone could accidentally move many objects to the
# wrong name, or accidentally overwrite many objects).
if ContainsWildcard(src_uri):
raise CommandException('The mv command disallows naming source '
'directories using wildcards')
if (exp_dst_uri.is_file_uri()
and not os.path.exists(exp_dst_uri.object_name)
and have_multiple_srcs):
os.makedirs(exp_dst_uri.object_name)
dst_uri = self._ConstructDstUri(src_uri, exp_src_uri,
src_uri_names_container,
src_uri_expands_to_multi,
have_multiple_srcs, exp_dst_uri,
have_existing_dest_subdir)
dst_uri = self._FixWindowsNaming(src_uri, dst_uri)
self._CheckForDirFileConflict(exp_src_uri, dst_uri)
if self._SrcDstSame(exp_src_uri, dst_uri):
raise CommandException('%s: "%s" and "%s" are the same file - '
'abort.' % (cmd_name, exp_src_uri, dst_uri))
if dst_uri.is_cloud_uri() and dst_uri.is_version_specific:
raise CommandException('%s: a version-specific URI\n(%s)\ncannot be '
'the destination for gsutil cp - abort.'
% (cmd_name, dst_uri))
elapsed_time = bytes_transferred = 0
try:
(elapsed_time, bytes_transferred, result_uri) = (
self._PerformCopy(exp_src_uri, dst_uri))
except Exception, e:
if self._IsNoClobberServerException(e):
if not self.quiet:
self.THREADED_LOGGER.info('Rejected (noclobber): %s' % dst_uri.uri)
elif self.continue_on_error:
if not self.quiet:
self.THREADED_LOGGER.error('Error copying %s: %s' % (src_uri.uri,
str(e)))
self.copy_failure_count += 1
else:
raise
if self.print_ver:
# Some cases don't return a version-specific URI (e.g., if destination
# is a file).
if hasattr(result_uri, 'version_specific_uri'):
self.THREADED_LOGGER.info('Created: %s' %
result_uri.version_specific_uri)
else:
self.THREADED_LOGGER.info('Created: %s' % result_uri.uri)
# TODO: If we ever use -n (noclobber) with -M (move) (not possible today
# since we call copy internally from move and don't specify the -n flag)
# we'll need to only remove the source when we have not skipped the
# destination.
if self.perform_mv:
if not self.quiet:
self.THREADED_LOGGER.info('Removing %s...', exp_src_uri)
exp_src_uri.delete_key(validate=False, headers=self.headers)
stats_lock.acquire()
self.total_elapsed_time += elapsed_time
self.total_bytes_transferred += bytes_transferred
stats_lock.release()
# Start of RunCommand code.
self._ParseArgs()
self.total_elapsed_time = self.total_bytes_transferred = 0
if self.args[-1] == '-' or self.args[-1] == 'file://-':
self._HandleStreamingDownload()
return 0
if self.read_args_from_stdin:
if len(self.args) != 1:
raise CommandException('Source URIs cannot be specified with -I option')
uri_strs = self._StdinIterator()
else:
if len(self.args) < 2:
raise CommandException('Wrong number of arguments for "cp" command.')
uri_strs = self.args[0:len(self.args)-1]
(exp_dst_uri, have_existing_dst_container) = self._ExpandDstUri(
self.args[-1])
name_expansion_iterator = NameExpansionIterator(
self.command_name, self.proj_id_handler, self.headers, self.debug,
self.bucket_storage_uri_class, uri_strs,
self.recursion_requested or self.perform_mv,
have_existing_dst_container)
# Use a lock to ensure accurate statistics in the face of
# multi-threading/multi-processing.
stats_lock = threading.Lock()
# Tracks if any copies failed.
self.copy_failure_count = 0
# Start the clock.
start_time = time.time()
# Tuple of attributes to share/manage across multiple processes in
# parallel (-m) mode.
shared_attrs = ('copy_failure_count', 'total_bytes_transferred')
# Perform copy requests in parallel (-m) mode, if requested, using
# configured number of parallel processes and threads. Otherwise,
# perform requests with sequential function calls in current process.
self.Apply(_CopyFunc, name_expansion_iterator, _CopyExceptionHandler,
shared_attrs)
if self.debug:
print 'total_bytes_transferred:' + str(self.total_bytes_transferred)
end_time = time.time()
self.total_elapsed_time = end_time - start_time
# Sometimes, particularly when running unit tests, the total elapsed time
# is really small. On Windows, the timer resolution is too coarse and
# causes total_elapsed_time to be zero.
try:
float(self.total_bytes_transferred) / float(self.total_elapsed_time)
except ZeroDivisionError:
self.total_elapsed_time = 0.01
self.total_bytes_per_second = (float(self.total_bytes_transferred) /
float(self.total_elapsed_time))
if self.debug == 3:
# Note that this only counts the actual GET and PUT bytes for the copy
# - not any transfers for doing wildcard expansion, the initial HEAD
# request boto performs when doing a bucket.get_key() operation, etc.
if self.total_bytes_transferred != 0:
self.THREADED_LOGGER.info(
'Total bytes copied=%d, total elapsed time=%5.3f secs (%sps)',
self.total_bytes_transferred, self.total_elapsed_time,
MakeHumanReadable(self.total_bytes_per_second))
if self.copy_failure_count:
plural_str = ''
if self.copy_failure_count > 1:
plural_str = 's'
raise CommandException('%d file%s/object%s could not be transferred.' % (
self.copy_failure_count, plural_str, plural_str))
return 0
def _ParseArgs(self):
self.perform_mv = False
self.exclude_symlinks = False
self.quiet = False
self.no_clobber = False
self.continue_on_error = False
self.daisy_chain = False
self.read_args_from_stdin = False
self.print_ver = False
# self.recursion_requested initialized in command.py (so can be checked
# in parent class for all commands).
if self.sub_opts:
for o, unused_a in self.sub_opts:
if o == '-c':
self.continue_on_error = True
elif o == '-D':
self.daisy_chain = True
elif o == '-e':
self.exclude_symlinks = True
elif o == '-I':
self.read_args_from_stdin = True
elif o == '-M':
# Note that we signal to the cp command to perform a move (copy
# followed by remove) and use directory-move naming rules by passing
# the undocumented (for internal use) -M option when running the cp
# command from mv.py.
self.perform_mv = True
elif o == '-n':
self.no_clobber = True
elif o == '-q':
self.quiet = True
elif o == '-r' or o == '-R':
self.recursion_requested = True
elif o == '-v':
self.print_ver = True
def _HandleStreamingDownload(self):
# Destination is <STDOUT>. Manipulate sys.stdout so as to redirect all
# debug messages to <STDERR>.
stdout_fp = sys.stdout
sys.stdout = sys.stderr
did_some_work = False
for uri_str in self.args[0:len(self.args)-1]:
for uri in self.WildcardIterator(uri_str).IterUris():
did_some_work = True
key = uri.get_key(False, self.headers)
(elapsed_time, bytes_transferred) = self._PerformDownloadToStream(
key, uri, stdout_fp, self.headers)
self.total_elapsed_time += elapsed_time
self.total_bytes_transferred += bytes_transferred
if not did_some_work:
raise CommandException('No URIs matched')
if self.debug == 3:
if self.total_bytes_transferred != 0:
self.THREADED_LOGGER.info(
'Total bytes copied=%d, total elapsed time=%5.3f secs (%sps)',
self.total_bytes_transferred, self.total_elapsed_time,
MakeHumanReadable(float(self.total_bytes_transferred) /
float(self.total_elapsed_time)))
def _StdinIterator(self):
"""A generator function that returns lines from stdin."""
for line in sys.stdin:
# Strip CRLF.
yield line.rstrip()
def _SrcDstSame(self, src_uri, dst_uri):
"""Checks if src_uri and dst_uri represent the same object or file.
We don't handle anything about hard or symbolic links.
Args:
src_uri: Source StorageUri.
dst_uri: Destination StorageUri.
Returns:
Bool indicator.
"""
if src_uri.is_file_uri() and dst_uri.is_file_uri():
# Translate a/b/./c to a/b/c, so src=dst comparison below works.
new_src_path = os.path.normpath(src_uri.object_name)
new_dst_path = os.path.normpath(dst_uri.object_name)
return (src_uri.clone_replace_name(new_src_path).uri ==
dst_uri.clone_replace_name(new_dst_path).uri)
else:
return (src_uri.uri == dst_uri.uri and
src_uri.generation == dst_uri.generation and
src_uri.version_id == dst_uri.version_id)
def _ShouldTreatDstUriAsBucketSubDir(self, have_multiple_srcs, dst_uri,
have_existing_dest_subdir):
"""
Checks whether dst_uri should be treated as a bucket "sub-directory". The
decision about whether something constitutes a bucket "sub-directory"
depends on whether there are multiple sources in this request and whether
there is an existing bucket subdirectory. For example, when running the
command:
gsutil cp file gs://bucket/abc
if there's no existing gs://bucket/abc bucket subdirectory we should copy
file to the object gs://bucket/abc. In contrast, if
there's an existing gs://bucket/abc bucket subdirectory we should copy
file to gs://bucket/abc/file. And regardless of whether gs://bucket/abc
exists, when running the command:
gsutil cp file1 file2 gs://bucket/abc
we should copy file1 to gs://bucket/abc/file1 (and similarly for file2).
Note that we don't disallow naming a bucket "sub-directory" where there's
already an object at that URI. For example it's legitimate (albeit
confusing) to have an object called gs://bucket/dir and
then run the command
gsutil cp file1 file2 gs://bucket/dir
Doing so will end up with objects gs://bucket/dir, gs://bucket/dir/file1,
and gs://bucket/dir/file2.
Args:
have_multiple_srcs: Bool indicator of whether this is a multi-source
operation.
dst_uri: StorageUri to check.
have_existing_dest_subdir: bool indicator whether dest is an existing
subdirectory.
Returns:
bool indicator.
"""
return ((have_multiple_srcs and dst_uri.is_cloud_uri())
or (have_existing_dest_subdir))
def _ShouldTreatDstUriAsSingleton(self, have_multiple_srcs,
have_existing_dest_subdir, dst_uri):
"""
Checks that dst_uri names a singleton (file or object) after
dir/wildcard expansion. The decision is more nuanced than simply
dst_uri.names_singleton() because of the possibility that an object path
might name a bucket sub-directory.
Args:
have_multiple_srcs: Bool indicator of whether this is a multi-source
operation.
have_existing_dest_subdir: bool indicator whether dest is an existing
subdirectory.
dst_uri: StorageUri to check.
Returns:
bool indicator.
"""
if have_multiple_srcs:
# Only a file meets the criteria in this case.
return dst_uri.names_file()
return not have_existing_dest_subdir and dst_uri.names_singleton()
def _IsNoClobberServerException(self, e):
"""
Checks to see if the server attempted to clobber a file after we specified
in the header that we didn't want the file clobbered.
Args:
e: The Exception that was generated by a failed copy operation
Returns:
bool indicator - True indicates that the server did attempt to clobber
an existing file.
"""
return self.no_clobber and (
(isinstance(e, GSResponseError) and e.status==412) or
(isinstance(e, ResumableUploadException) and 'code 412' in e.message))
def _GetPathBeforeFinalDir(uri):
"""
Returns the part of the path before the final directory component for the
given URI, handling cases for file system directories, bucket, and bucket
subdirectories. Example: for gs://bucket/dir/ we'll return 'gs://bucket',
and for file://dir we'll return file://
Args:
uri: StorageUri.
Returns:
String name of above-described path, sans final path separator.
"""
sep = uri.delim
assert not uri.names_file()
if uri.names_directory():
past_scheme = uri.uri[len('file://'):]
if past_scheme.find(sep) == -1:
return 'file://'
else:
return 'file://%s' % past_scheme.rstrip(sep).rpartition(sep)[0]
if uri.names_bucket():
return '%s://' % uri.scheme
# Else it names a bucket subdir.
return uri.uri.rstrip(sep).rpartition(sep)[0]
def _hash_filename(filename):
"""
Apply a hash function (SHA1) to shorten the passed file name. The spec
for the hashed file name is as follows:
TRACKER_<hash>.<trailing>
where hash is a SHA1 hash on the original file name and trailing is
the last 16 chars from the original file name. Max file name lengths
vary by operating system so the goal of this function is to ensure
the hashed version takes fewer than 100 characters.
Args:
filename: file name to be hashed.
Returns:
shorter, hashed version of passed file name
"""
if not isinstance(filename, unicode):
filename = unicode(filename, 'utf8').encode('utf-8')
m = hashlib.sha1(filename)
return "TRACKER_" + m.hexdigest() + '.' + filename[-16:]
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil disablelogging uri...
<B>DESCRIPTION</B>
This command will disable access logging of the buckets named by the
specified uris. All URIs must name buckets (e.g., gs://bucket).
No logging data is removed from the log buckets when you disable logging,
but Google Cloud Storage will stop delivering new logs once you have
run this command.
""")
class DisableLoggingCommand(Command):
"""Implementation of disablelogging command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'disablelogging',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'disablelogging',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Disable logging on buckets',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
did_some_work = False
for uri_str in self.args:
for uri in self.WildcardIterator(uri_str).IterUris():
if uri.names_object():
raise CommandException('disablelogging cannot be applied to objects')
did_some_work = True
print 'Disabling logging on %s...' % uri
self.proj_id_handler.FillInProjectHeaderIfNeeded('disablelogging',
uri, self.headers)
uri.disable_logging(False, self.headers)
if not did_some_work:
raise CommandException('No URIs matched')
return 0
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil enablelogging -b logging_bucket [-o log_object_prefix] uri...
<B>DESCRIPTION</B>
Google Cloud Storage offers access logs and storage data in the form of
CSV files that you can download and view. Access logs provide information
for all of the requests made on a specified bucket in the last 24 hours,
while the storage logs provide information about the storage consumption of
that bucket for the last 24 hour period. The logs and storage data files
are automatically created as new objects in a bucket that you specify, in
24 hour intervals.
The gsutil enablelogging command will enable access logging of the
buckets named by the specified uris, outputting log files in the specified
logging_bucket. logging_bucket must already exist, and all URIs must name
buckets (e.g., gs://bucket). For example, the command:
gsutil enablelogging -b gs://my_logging_bucket -o AccessLog \\
gs://my_bucket1 gs://my_bucket2
will cause all read and write activity to objects in gs://my_bucket1 and
gs://my_bucket2 to be logged to objects prefixed with the name "AccessLog",
with those log objects written to the bucket gs://my_logging_bucket.
Note that log data may contain sensitive information, so you should make
sure to set an appropriate default bucket ACL to protect that data. (See
"gsutil help setdefacl".)
You can check logging status using the gsutil getlogging command. For log
format details see "gsutil help getlogging".
<B>OPTIONS</B>
-b bucket Specifies the log bucket.
-o prefix Specifies the prefix for log object names. Default value
is the bucket name.
""")
class EnableLoggingCommand(Command):
"""Implementation of gsutil enablelogging command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'enablelogging',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'b:o:',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'enablelogging',
# List of help name aliases.
HELP_NAME_ALIASES : ['logging', 'logs', 'log'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Enable logging on buckets',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
# Disallow multi-provider enablelogging calls, because the schemas
# differ.
storage_uri = self.UrisAreForSingleProvider(self.args)
if not storage_uri:
raise CommandException('enablelogging command spanning providers not '
'allowed.')
target_bucket_uri = None
target_prefix = None
for opt, opt_arg in self.sub_opts:
if opt == '-b':
target_bucket_uri = self.suri_builder.StorageUri(opt_arg)
if opt == '-o':
target_prefix = opt_arg
if not target_bucket_uri:
raise CommandException('enablelogging requires \'-b <log_bucket>\' '
'option')
if not target_bucket_uri.names_bucket():
raise CommandException('-b option must specify a bucket uri')
did_some_work = False
for uri_str in self.args:
for uri in self.WildcardIterator(uri_str).IterUris():
if uri.names_object():
raise CommandException('enablelogging cannot be applied to objects')
did_some_work = True
print 'Enabling logging on %s...' % uri
self.proj_id_handler.FillInProjectHeaderIfNeeded(
'enablelogging', storage_uri, self.headers)
uri.enable_logging(target_bucket_uri.bucket_name, target_prefix, False,
self.headers)
if not did_some_work:
raise CommandException('No URIs matched')
return 0
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil getacl uri
<B>DESCRIPTION</B>
Gets ACL XML for a bucket or object, which you can save and edit for the
setacl command.
""")
class GetAclCommand(Command):
"""Implementation of gsutil getacl command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'getacl',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 1,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'v',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'getacl',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Get ACL XML for a bucket or object',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
self.GetAclCommandHelper()
return 0
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import xml
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil getcors uri
<B>DESCRIPTION</B>
Gets the Cross-Origin Resource Sharing (CORS) configuration for a given
bucket. This command is supported for buckets only, not objects, and you
can get the CORS settings for only one bucket at a time. The output from
getcors can be redirected into a file, edited, and then updated via the
setcors sub-command. The CORS configuration is expressed by an XML document
with the following structure:
<?xml version="1.0" ?>
<CorsConfig>
<Cors>
<Origins>
<Origin>origin1.example.com</Origin>
</Origins>
<Methods>
<Method>GET</Method>
</Methods>
<ResponseHeaders>
<ResponseHeader>Content-Type</ResponseHeader>
</ResponseHeaders>
</Cors>
</CorsConfig>
For more info about CORS, see http://www.w3.org/TR/cors/.
""")
class GetCorsCommand(Command):
"""Implementation of gsutil getcors command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'getcors',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 1,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'getcors',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Get a bucket\'s CORS XML document',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
# Wildcarding is allowed but must resolve to just one bucket.
uris = list(self.WildcardIterator(self.args[0]).IterUris())
if len(uris) == 0:
raise CommandException('No URIs matched')
if len(uris) != 1:
raise CommandException('%s matched more than one URI, which is not\n'
'allowed by the %s command' % (self.args[0], self.command_name))
uri = uris[0]
if not uri.names_bucket():
raise CommandException('"%s" command must specify a bucket' %
self.command_name)
cors = uri.get_cors(False, self.headers)
# Pretty-print the XML to make it more easily human editable.
parsed_xml = xml.dom.minidom.parseString(cors.to_xml().encode('utf-8'))
sys.stdout.write(parsed_xml.toprettyxml(indent=' '))
return 0
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil getdefacl uri
<B>DESCRIPTION</B>
Gets the default ACL XML for a bucket, which you can save and edit
for use with the setdefacl command.
""")
class GetDefAclCommand(Command):
"""Implementation of gsutil getdefacl command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'getdefacl',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 1,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'getdefacl',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Get default ACL XML for a bucket',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
if not self.suri_builder.StorageUri(self.args[-1]).names_bucket():
raise CommandException('URI must name a bucket for the %s command' %
self.command_name)
self.GetAclCommandHelper()
return 0
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil getlogging uri
<B>DESCRIPTION</B>
If logging is enabled for the specified bucket uri, the server responds
with a <Logging> XML element that looks something like this:
<?xml version="1.0" ?>
<Logging>
<LogBucket>
logs-bucket
</LogBucket>
<LogObjectPrefix>
my-logs-enabled-bucket
</LogObjectPrefix>
</Logging>
If logging is not enabled, an empty <Logging> element is returned.
You can download log data from your log bucket using the gsutil cp command.
<B>ACCESS LOG FIELDS</B>
Field Type Description
time_micros integer The time that the request was completed, in
microseconds since the Unix epoch.
c_ip string The IP address from which the request was made.
The "c" prefix indicates that this is information
about the client.
c_ip_type integer The type of IP in the c_ip field:
A value of 1 indicates an IPV4 address.
A value of 2 indicates an IPV6 address.
c_ip_region string Reserved for future use.
cs_method string The HTTP method of this request. The "cs" prefix
indicates that this information was sent from the
client to the server.
cs_uri string The URI of the request.
sc_status integer The HTTP status code the server sent in response.
The "sc" prefix indicates that this information
was sent from the server to the client.
cs_bytes integer The number of bytes sent in the request.
sc_bytes integer The number of bytes sent in the response.
time_taken_micros integer The time it took to serve the request in
microseconds.
cs_host string The host in the original request.
cs_referrer string The HTTP referrer for the request.
cs_user_agent string The User-Agent of the request.
s_request_id string The request identifier.
cs_operation string The Google Cloud Storage operation e.g.
GET_Object.
cs_bucket string The bucket specified in the request. If this is a
list buckets request, this can be null.
cs_object string The object specified in this request. This can be
null.
<B>STORAGE DATA FIELDS</B>
Field Type Description
bucket string The name of the bucket.
storage_byte_hours integer Average size in byte-hours of that bucket.
""")
class GetLoggingCommand(Command):
"""Implementation of gsutil getlogging command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'getlogging',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 1,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'getlogging',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Get logging configuration for a bucket',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
self.GetXmlSubresource('logging', self.args[0])
return 0
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil getversioning bucket_uri
<B>DESCRIPTION</B>
The Versioning Configuration feature enables you to configure a Google Cloud
Storage bucket to keep old versions of objects.
The gsutil getversioning command gets the versioning configuration for a
bucket, and displays an XML representation of the configuration.
In Google Cloud Storage, this would look like:
<?xml version="1.0" ?>
<VersioningConfiguration>
<Status>
Enabled
</Status>
</VersioningConfiguration>
""")
class GetVersioningCommand(Command):
"""Implementation of gsutil getversioning command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'getversioning',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 1,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 1,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'getversioning',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Get the versioning configuration '
'for one or more buckets',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
uri_args = self.args
# Iterate over URIs, expanding wildcards, and getting the versioning
# configuration on each.
some_matched = False
for uri_str in uri_args:
for blr in self.WildcardIterator(uri_str):
uri = blr.GetUri()
if not uri.names_bucket():
raise CommandException('URI %s must name a bucket for the %s command'
% (str(uri), self.command_name))
some_matched = True
uri_str = '%s://%s' % (uri.scheme, uri.bucket_name)
if uri.get_versioning_config():
print '%s: Enabled' % uri_str
else:
print '%s: Suspended' % uri_str
if not some_matched:
raise CommandException('No URIs matched')
return 0
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from xml.dom.minidom import parseString as XmlParseString
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil getwebcfg bucket_uri
<B>DESCRIPTION</B>
The Website Configuration feature enables you to configure a Google Cloud
Storage bucket to simulate the behavior of a static website. You can define
main pages or directory indices (for example, index.html) for buckets and
"directories". Also, you can define a custom error page in case a requested
resource does not exist.
The gsutil getwebcfg command gets the web semantics configuration for a
bucket, and displays an XML representation of the configuration.
In Google Cloud Storage, this would look like:
<?xml version="1.0" ?>
<WebsiteConfiguration>
<MainPageSuffix>
index.html
</MainPageSuffix>
<NotFoundPage>
404.html
</NotFoundPage>
</WebsiteConfiguration>
""")
class GetWebcfgCommand(Command):
"""Implementation of gsutil getwebcfg command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'getwebcfg',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 1,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 1,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'getwebcfg',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : ('Get the website configuration '
'for one or more buckets'),
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
uri_args = self.args
# Iterate over URIs, expanding wildcards, and getting the website
# configuration on each.
some_matched = False
for uri_str in uri_args:
for blr in self.WildcardIterator(uri_str):
uri = blr.GetUri()
if not uri.names_bucket():
raise CommandException('URI %s must name a bucket for the %s command'
% (str(uri), self.command_name))
some_matched = True
sys.stderr.write('Getting website config on %s...\n' % uri)
_, xml_body = uri.get_website_config()
sys.stdout.write(XmlParseString(xml_body).toprettyxml())
if not some_matched:
raise CommandException('No URIs matched')
return 0
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import gslib
import itertools
import os
import re
import struct
import sys
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import ALL_HELP_TYPES
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HelpProvider
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.help_provider import MAX_HELP_NAME_LEN
from subprocess import PIPE
from subprocess import Popen
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil help [command or topic]
<B>DESCRIPTION</B>
Running:
gsutil help
will provide a summary of all commands and additional topics on which
help is available.
Running:
gsutil help command or topic
will provide help about the specified command or topic.
If you set the PAGER environment variable to the path to a pager program
(such as /bin/less on Linux), long help sections will be piped through
the specified pager.
""")
top_level_usage_string = (
"Usage: gsutil [-d][-D] [-h header]... [-m] [command [opts...] args...]"
)
class HelpCommand(Command):
"""Implementation of gsutil help command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'help',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['?', 'man'],
# Min number of args required by this command.
MIN_ARGS : 0,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 1,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : True,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : False,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'help',
# List of help name aliases.
HELP_NAME_ALIASES : ['?'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Get help about commands and topics',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
(help_type_map, help_name_map) = self._LoadHelpMaps()
output = []
if not len(self.args):
output.append('%s\nAvailable commands:\n' % top_level_usage_string)
format_str = ' %-' + str(MAX_HELP_NAME_LEN) + 's%s\n'
for help_prov in sorted(help_type_map[HelpType.COMMAND_HELP],
key=lambda hp: hp.help_spec[HELP_NAME]):
output.append(format_str % (help_prov.help_spec[HELP_NAME],
help_prov.help_spec[HELP_ONE_LINE_SUMMARY]))
output.append('\nAdditional help topics:\n')
for help_prov in sorted(help_type_map[HelpType.ADDITIONAL_HELP],
key=lambda hp: hp.help_spec[HELP_NAME]):
output.append(format_str % (help_prov.help_spec[HELP_NAME],
help_prov.help_spec[HELP_ONE_LINE_SUMMARY]))
output.append('\nUse gsutil help <command or topic> for detailed help.')
else:
arg = self.args[0]
if arg not in help_name_map:
output.append('No help available for "%s"' % arg)
else:
help_prov = help_name_map[self.args[0]]
output.append('<B>NAME</B>\n')
output.append(' %s - %s\n' % (
help_prov.help_spec[HELP_NAME],
help_prov.help_spec[HELP_ONE_LINE_SUMMARY]))
output.append('\n\n')
output.append(help_prov.help_spec[HELP_TEXT].strip('\n'))
self._OutputHelp(''.join(output))
return 0
def _OutputHelp(self, str):
"""Outputs simply formatted string, paginating if long and PAGER defined"""
# Replace <B> and </B> with terminal formatting strings.
str = re.sub('<B>', '\033[1m', str)
str = re.sub('</B>', '\033[0;0m', str)
num_lines = len(str.split('\n'))
if 'PAGER' in os.environ and num_lines >= self.getTermLines():
# Use -r option for less to make bolding work right.
pager = os.environ['PAGER'].split(' ')
if pager[0].endswith('less'):
pager.append('-r')
try:
Popen(pager, stdin=PIPE).communicate(input=str)
except OSError, e:
raise CommandException('Unable to open pager (%s): %s' %
(' '.join(pager), e))
else:
print str
_DEFAULT_LINES = 25
def getTermLines(self):
"""Returns number of terminal lines"""
# fcntl isn't supported in Windows.
try:
import fcntl
import termios
except ImportError:
return self._DEFAULT_LINES
def ioctl_GWINSZ(fd):
try:
return struct.unpack(
'hh', fcntl.ioctl(fd, termios.TIOCGWINSZ, '1234'))[0]
except:
return 0 # Failure (so will retry on different file descriptor below).
# Try to find a valid number of lines from termio for stdin, stdout,
# or stderr, in that order.
ioc = ioctl_GWINSZ(0) or ioctl_GWINSZ(1) or ioctl_GWINSZ(2)
if not ioc:
try:
fd = os.open(os.ctermid(), os.O_RDONLY)
ioc = ioctl_GWINSZ(fd)
os.close(fd)
except:
pass
if not ioc:
ioc = os.environ.get('LINES', self._DEFAULT_LINES)
return int(ioc)
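# Note on the TIOCGWINSZ call above (comment added for clarity): the kernel's
# winsize struct begins with two unsigned shorts (rows, then columns), so
# unpacking with 'hh' and taking element [0] yields the terminal's row count.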
def _LoadHelpMaps(self):
"""Returns tuple (help type -> [HelpProviders],
help name->HelpProvider dict,
)."""
# Walk gslib/commands and gslib/addlhelp to find all HelpProviders.
for f in os.listdir(os.path.join(self.gsutil_bin_dir, 'gslib', 'commands')):
# Handles no-extension files, etc.
(module_name, ext) = os.path.splitext(f)
if ext == '.py':
__import__('gslib.commands.%s' % module_name)
for f in os.listdir(os.path.join(self.gsutil_bin_dir, 'gslib', 'addlhelp')):
(module_name, ext) = os.path.splitext(f)
if ext == '.py':
__import__('gslib.addlhelp.%s' % module_name)
help_type_map = {}
help_name_map = {}
for s in gslib.help_provider.ALL_HELP_TYPES:
help_type_map[s] = []
# Only include HelpProvider subclasses in the dict.
for help_prov in itertools.chain(
HelpProvider.__subclasses__(), Command.__subclasses__()):
if help_prov is Command:
# Skip the Command base class itself; we just want its subclasses,
# where the help command text lives (in addition to non-Command
# HelpProviders, like naming.py).
continue
gslib.help_provider.SanityCheck(help_prov, help_name_map)
help_name_map[help_prov.help_spec[HELP_NAME]] = help_prov
for help_name_aliases in help_prov.help_spec[HELP_NAME_ALIASES]:
help_name_map[help_name_aliases] = help_prov
help_type_map[help_prov.help_spec[HELP_TYPE]].append(help_prov)
return (help_type_map, help_name_map)
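# Sketch of how the discovery above works (comment added for clarity):
# importing each module under gslib/commands and gslib/addlhelp defines its
# classes, after which Command.__subclasses__() and HelpProvider.__subclasses__()
# expose them so their help_spec entries can be indexed by name and help type.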
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from boto.s3.deletemarker import DeleteMarker
from gslib.bucket_listing_ref import BucketListingRef
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.plurality_checkable_iterator import PluralityCheckableIterator
from gslib.util import ListingStyle
from gslib.util import MakeHumanReadable
from gslib.util import NO_MAX
from gslib.wildcard_iterator import ContainsWildcard
import boto
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil ls [-a] [-b] [-l] [-L] [-R] [-p proj_id] uri...
<B>LISTING PROVIDERS, BUCKETS, SUBDIRECTORIES, AND OBJECTS</B>
If you run gsutil ls without URIs, it lists all of the Google Cloud Storage
buckets under your default project ID:
gsutil ls
(For details about projects, see "gsutil help projects" and also the -p
option in the OPTIONS section below.)
If you specify one or more provider URIs, gsutil ls will list buckets at
each listed provider:
gsutil ls gs://
If you specify bucket URIs, gsutil ls will list objects at the top level of
each bucket, along with the names of each subdirectory. For example:
gsutil ls gs://bucket
might produce output like:
gs://bucket/obj1.htm
gs://bucket/obj2.htm
gs://bucket/images1/
gs://bucket/images2/
The "/" at the end of the last 2 URIs tells you they are subdirectories,
which you can list using:
gsutil ls gs://bucket/images*
If you specify object URIs, gsutil ls will list the specified objects. For
example:
gsutil ls gs://bucket/*.txt
will list all files whose name matches the above wildcard at the top level
of the bucket.
See "gsutil help wildcards" for more details on working with wildcards.
<B>DIRECTORY BY DIRECTORY, FLAT, and RECURSIVE LISTINGS</B>
Listing a bucket or subdirectory (as illustrated near the end of the previous
section) only shows the objects and names of subdirectories it contains. You
can list all objects in a bucket by using the -R option. For example:
gsutil ls -R gs://bucket
will list the top-level objects and subdirectories, then the objects and
subdirectories under gs://bucket/images1, then those under gs://bucket/images2, etc.
If you want to see all objects in the bucket in one "flat" listing use the
recursive ("**") wildcard, like:
gsutil ls -R gs://bucket/**
or, for a flat listing of a subdirectory:
gsutil ls -R gs://bucket/dir/**
<B>LISTING OBJECT DETAILS</B>
If you specify the -l option, gsutil will output additional information
about each matching provider, bucket, subdirectory, or object. For example,
gsutil ls -l gs://bucket/*.txt
will print the object size, creation time stamp, and name of each matching
object, along with the total count and sum of sizes of all matching objects:
2276224 2012-03-02T19:25:17 gs://bucket/obj1
3914624 2012-03-02T19:30:27 gs://bucket/obj2
TOTAL: 2 objects, 6190848 bytes (5.9 MB)
Note that the total listed in parentheses above is in mebibytes (or gibibytes,
tebibytes, etc.), which corresponds to the unit of billing measurement for
Google Cloud Storage.
You can get a listing of all the objects in the top-level bucket directory
(along with the total count and sum of sizes) using a command like:
gsutil ls -l gs://bucket
To print additional detail about objects and buckets use the gsutil ls -L
option. For example:
gsutil ls -L gs://bucket/obj1
will print something like:
gs://bucket/obj1:
Creation Time: Fri, 02 Mar 2012 19:25:17 GMT
Size: 2276224
Cache-Control: private, max-age=0
Content-Type: application/x-executable
ETag: 5ca6796417570a586723b7344afffc81
ACL: <Owner:00b4903a97163d99003117abe64d292561d2b4074fc90ce5c0e35ac45f66ad70, <<UserById: 00b4903a97163d99003117abe64d292561d2b4074fc90ce5c0e35ac45f66ad70>: u'FULL_CONTROL'>>
TOTAL: 1 objects, 2276224 bytes (2.17 MB)
Note that the -L option is slower and more costly to use than the -l option,
because it makes a bucket listing request followed by a HEAD request for
each individual object (rather than just parsing the information it needs
out of a single bucket listing, the way the -l option does).
See also "gsutil help getacl" for getting a more readable version of the ACL.
<B>LISTING BUCKET DETAILS</B>
If you want to see information about the bucket itself, use the -b
option. For example:
gsutil ls -L -b gs://bucket
will print something like:
gs://bucket/ :
24 objects, 29.83 KB
StorageClass: STANDARD
LocationConstraint: US
Versioning enabled: True
ACL: <Owner:00b4903a9740e42c29800f53bd5a9a62a2f96eb3f64a4313a115df3f3a776bf7, <<GroupById: 00b4903a9740e42c29800f53bd5a9a62a2f96eb3f64a4313a115df3f3a776bf7>: u'FULL_CONTROL'>>
Default ACL: <>
TOTAL: 24 objects, 30544 bytes (29.83 KB)
<B>OPTIONS</B>
-l Prints long listing (owner, length).
-L Prints even more detail than -l. This is a separate option because
it makes additional service requests (so, takes longer and adds
requests costs).
-b Prints info about the bucket when used with a bucket URI.
-p proj_id Specifies the project ID to use for listing buckets.
-R, -r Requests a recursive listing.
-a Includes non-current object versions / generations in the listing
(only useful with a versioning-enabled bucket). If combined with
-l option also prints meta-generation for each listed object.
""")
class LsCommand(Command):
"""Implementation of gsutil ls command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'ls',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['dir', 'list'],
# Min number of args required by this command.
MIN_ARGS : 0,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'ablLp:rR',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : True,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'ls',
# List of help name aliases.
HELP_NAME_ALIASES : ['dir', 'list'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'List providers, buckets, or objects',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
def _PrintBucketInfo(self, bucket_uri, listing_style):
"""Print listing info for given bucket.
Args:
bucket_uri: StorageUri being listed.
listing_style: ListingStyle enum describing type of output desired.
Returns:
Tuple (total objects, total bytes) in the bucket.
"""
bucket_objs = 0
bucket_bytes = 0
if listing_style == ListingStyle.SHORT:
print bucket_uri
else:
for obj in self.WildcardIterator(
bucket_uri.clone_replace_name('**')).IterKeys():
bucket_objs += 1
bucket_bytes += obj.size
if listing_style == ListingStyle.LONG:
print '%s : %s objects, %s' % (
bucket_uri, bucket_objs, MakeHumanReadable(bucket_bytes))
else: # listing_style == ListingStyle.LONG_LONG:
location_constraint = bucket_uri.get_location(validate=False,
headers=self.headers)
location_output = ''
if location_constraint:
location_output = '\n\tLocationConstraint: %s' % location_constraint
storage_class = bucket_uri.get_storage_class(validate=False,
headers=self.headers)
self.proj_id_handler.FillInProjectHeaderIfNeeded(
'get_acl', bucket_uri, self.headers)
print('%s :\n\t%d objects, %s\n\tStorageClass: %s%s\n'
'\tVersioning enabled: %s\n\tACL: %s\n'
'\tDefault ACL: %s' % (
bucket_uri, bucket_objs, MakeHumanReadable(bucket_bytes),
storage_class, location_output,
bucket_uri.get_versioning_config(),
bucket_uri.get_acl(False, self.headers),
bucket_uri.get_def_acl(False, self.headers)))
return (bucket_objs, bucket_bytes)
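# Note (added for clarity): the '**' wildcard above enumerates every object in
# the bucket, so the LONG and LONG_LONG bucket summaries walk the full listing
# to compute the object count and byte total.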
def _UriStrForObj(self, uri, obj):
"""Constructs a URI string for the given object.
For example if we were iterating gs://*, obj could be an object in one
of the user's buckets enumerated by the ls command.
Args:
uri: base StorageUri being iterated.
obj: object (Key) being listed.
Returns:
URI string.
"""
version_info = ''
if self.all_versions:
if uri.get_provider().name == 'google' and obj.generation:
version_info = '#%s' % obj.generation
elif uri.get_provider().name == 'aws' and obj.version_id:
if isinstance(obj, DeleteMarker):
version_info = '#<DeleteMarker>' + str(obj.version_id)
else:
version_info = '#' + str(obj.version_id)
else:
version_info = ''
return '%s://%s/%s%s' % (uri.scheme, obj.bucket.name, obj.name,
version_info)
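# Example of the strings produced above (object names and version numbers are
# illustrative, not from the original source): a versioned GCS listing yields
# something like 'gs://bucket/obj#1361234567890123', an S3 listing yields
# 's3://bucket/obj#<version_id>', and a delete marker yields
# 's3://bucket/obj#<DeleteMarker><version_id>'.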
def _PrintInfoAboutBucketListingRef(self, bucket_listing_ref, listing_style):
"""Print listing info for given bucket_listing_ref.
Args:
bucket_listing_ref: BucketListing being listed.
listing_style: ListingStyle enum describing type of output desired.
Returns:
Tuple (number of objects,
object length, if listing_style is one of the long listing formats)
Raises:
Exception: if calling bug encountered.
"""
uri = bucket_listing_ref.GetUri()
obj = bucket_listing_ref.GetKey()
uri_str = self._UriStrForObj(uri, obj)
if listing_style == ListingStyle.SHORT:
print uri_str.encode('utf-8')
return (1, 0)
elif listing_style == ListingStyle.LONG:
# Exclude timestamp fractional secs (example: 2010-08-23T12:46:54.187Z).
timestamp = obj.last_modified[:19].decode('utf8').encode('ascii')
if not isinstance(obj, DeleteMarker):
if self.all_versions:
print '%10s %s %s meta_generation=%s' % (
obj.size, timestamp, uri_str.encode('utf-8'), obj.meta_generation)
else:
print '%10s %s %s' % (obj.size, timestamp, uri_str.encode('utf-8'))
return (1, obj.size)
else:
if self.all_versions:
print '%10s %s %s meta_generation=%s' % (
0, timestamp, uri_str.encode('utf-8'), obj.meta_generation)
else:
print '%10s %s %s' % (0, timestamp, uri_str.encode('utf-8'))
return (0, 1)
elif listing_style == ListingStyle.LONG_LONG:
# Run in a try/except clause so we can continue listings past
# access-denied errors (which can happen because user may have READ
# permission on object and thus see the bucket listing data, but lack
# FULL_CONTROL over individual objects and thus not be able to read
# their ACLs).
try:
print '%s:' % uri_str.encode('utf-8')
suri = self.suri_builder.StorageUri(uri_str)
obj = suri.get_key(False)
print '\tCreation time:\t%s' % obj.last_modified
if obj.cache_control:
print '\tCache-Control:\t%s' % obj.cache_control
if obj.content_disposition:
print '\tContent-Disposition:\t%s' % obj.content_disposition
if obj.content_encoding:
print '\tContent-Encoding:\t%s' % obj.content_encoding
if obj.content_language:
print '\tContent-Language:\t%s' % obj.content_language
print '\tContent-Length:\t%s' % obj.size
print '\tContent-Type:\t%s' % obj.content_type
if obj.metadata:
prefix = uri.get_provider().metadata_prefix
for name in obj.metadata:
print '\t%s%s:\t%s' % (prefix, name, obj.metadata[name])
print '\tETag:\t\t%s' % obj.etag.strip('"\'')
print '\tACL:\t\t%s' % (suri.get_acl(False, self.headers))
return (1, obj.size)
except boto.exception.GSResponseError as e:
if e.status == 403:
print ('\tACL:\t\tACCESS DENIED. Note: you need FULL_CONTROL '
'permission\n\t\t\ton the object to read its ACL.')
return (1, obj.size)
else:
raise e
else:
raise Exception('Unexpected ListingStyle(%s)' % listing_style)
def _ExpandUriAndPrintInfo(self, uri, listing_style, should_recurse=False):
"""
Expands wildcards and directories/buckets for uri as needed, and
calls _PrintInfoAboutBucketListingRef() on each.
Args:
uri: StorageUri being listed.
listing_style: ListingStyle enum describing type of output desired.
should_recurse: bool indicator of whether to expand recursively.
Returns:
Tuple (number of matching objects, number of bytes across these objects).
"""
# We do a two-level loop, with the outer loop iterating level-by-level from
# blrs_to_expand, and the inner loop iterating the matches at the current
# level, printing them, and adding any new subdirs that need expanding to
# blrs_to_expand (to be picked up in the next outer loop iteration).
blrs_to_expand = [BucketListingRef(uri)]
num_objs = 0
num_bytes = 0
expanding_top_level = True
printed_one = False
num_expanded_blrs = 0
while len(blrs_to_expand):
if printed_one:
print
blr = blrs_to_expand.pop(0)
if blr.HasKey():
blr_iterator = iter([blr])
elif blr.HasPrefix():
# Bucket subdir from a previous iteration. Print "header" line only if
# we're listing more than one subdir (or if it's a recursive listing),
# to be consistent with the way UNIX ls works.
if num_expanded_blrs > 1 or should_recurse:
print '%s:' % blr.GetUriString().encode('utf-8')
printed_one = True
blr_iterator = self.WildcardIterator('%s/*' %
blr.GetRStrippedUriString(),
all_versions=self.all_versions)
elif blr.NamesBucket():
blr_iterator = self.WildcardIterator('%s*' % blr.GetUriString(),
all_versions=self.all_versions)
else:
# This BLR didn't come from a bucket listing. This case happens for
# BLR's instantiated from a user-provided URI.
blr_iterator = PluralityCheckableIterator(
_UriOnlyBlrExpansionIterator(
self, blr, all_versions=self.all_versions))
if blr_iterator.is_empty() and not ContainsWildcard(uri):
raise CommandException('No such object %s' % uri)
for cur_blr in blr_iterator:
num_expanded_blrs = num_expanded_blrs + 1
if cur_blr.HasKey():
# Object listing.
(no, nb) = self._PrintInfoAboutBucketListingRef(
cur_blr, listing_style)
num_objs += no
num_bytes += nb
printed_one = True
else:
# Subdir listing. If we're at the top level of a bucket subdir
# listing don't print the list here (corresponding to how UNIX ls
# dir just prints its contents, not the name followed by its
# contents).
if (expanding_top_level and not uri.names_bucket()) or should_recurse:
if cur_blr.GetUriString().endswith('//'):
# Expand gs://bucket// into gs://bucket//* so we don't infinite
# loop. This case happens when user has uploaded an object whose
# name begins with a /.
cur_blr = BucketListingRef(self.suri_builder.StorageUri(
'%s*' % cur_blr.GetUriString()), None, None, cur_blr.headers)
blrs_to_expand.append(cur_blr)
# Don't include the subdir name in the output if we're doing a
# recursive listing, as it will be printed as 'subdir:' when we get
# to the prefix expansion, the next iteration of the main loop.
else:
if listing_style == ListingStyle.LONG:
print '%-33s%s' % (
'', cur_blr.GetUriString().encode('utf-8'))
else:
print cur_blr.GetUriString().encode('utf-8')
expanding_top_level = False
return (num_objs, num_bytes)
# Command entry point.
def RunCommand(self):
got_nomatch_errors = False
listing_style = ListingStyle.SHORT
get_bucket_info = False
self.recursion_requested = False
self.all_versions = False
if self.sub_opts:
for o, a in self.sub_opts:
if o == '-a':
self.all_versions = True
elif o == '-b':
get_bucket_info = True
elif o == '-l':
listing_style = ListingStyle.LONG
elif o == '-L':
listing_style = ListingStyle.LONG_LONG
elif o == '-p':
self.proj_id_handler.SetProjectId(a)
elif o == '-r' or o == '-R':
self.recursion_requested = True
if not self.args:
# default to listing all gs buckets
self.args = ['gs://']
total_objs = 0
total_bytes = 0
for uri_str in self.args:
uri = self.suri_builder.StorageUri(uri_str)
self.proj_id_handler.FillInProjectHeaderIfNeeded('ls', uri, self.headers)
if uri.names_provider():
# Provider URI: use bucket wildcard to list buckets.
for uri in self.WildcardIterator('%s://*' % uri.scheme).IterUris():
(bucket_objs, bucket_bytes) = self._PrintBucketInfo(uri,
listing_style)
total_bytes += bucket_bytes
total_objs += bucket_objs
elif uri.names_bucket():
# Bucket URI -> list the object(s) in that bucket.
if get_bucket_info:
# ls -b bucket listing request: List info about bucket(s).
for uri in self.WildcardIterator(uri).IterUris():
(bucket_objs, bucket_bytes) = self._PrintBucketInfo(uri,
listing_style)
total_bytes += bucket_bytes
total_objs += bucket_objs
else:
# Not -b request: List objects in the bucket(s).
(no, nb) = self._ExpandUriAndPrintInfo(uri, listing_style,
should_recurse=self.recursion_requested)
if no == 0 and ContainsWildcard(uri):
got_nomatch_errors = True
total_objs += no
total_bytes += nb
else:
# URI names an object or object subdir -> list matching object(s) /
# subdirs.
(exp_objs, exp_bytes) = self._ExpandUriAndPrintInfo(uri, listing_style,
should_recurse=self.recursion_requested)
if exp_objs == 0 and ContainsWildcard(uri):
got_nomatch_errors = True
total_bytes += exp_bytes
total_objs += exp_objs
if total_objs and listing_style != ListingStyle.SHORT:
print ('TOTAL: %d objects, %d bytes (%s)' %
(total_objs, total_bytes, MakeHumanReadable(float(total_bytes))))
if got_nomatch_errors:
raise CommandException('One or more URIs matched no objects.')
return 0
class _UriOnlyBlrExpansionIterator:
"""
Iterator that expands a BucketListingRef that contains only a URI (i.e.,
didn't come from a bucket listing), yielding BucketListingRefs to which it
expands. This case happens for BLR's instantiated from a user-provided URI.
Note that we can't use NameExpansionIterator here because it produces an
iteration over the full object names (e.g., expanding "gs://bucket" to
"gs://bucket/dir/o1" and "gs://bucket/dir/o2"), while for the ls command
we need also to see the intermediate directories (like "gs://bucket/dir").
"""
def __init__(self, command_instance, blr, all_versions=False):
self.command_instance = command_instance
self.blr = blr
self.all_versions=all_versions
def __iter__(self):
"""
Args:
command_instance: calling instance of Command class.
blr: BucketListingRef to expand.
Yields:
List of BucketListingRef to which it expands.
"""
# Do a delimited wildcard expansion so we get any matches along with
# whether they are keys or prefixes. That way if bucket contains a key
# 'abcd' and another key 'abce/x.txt' the expansion will return two BLRs,
# the first with HasKey()=True and the second with HasPrefix()=True.
rstripped_versionless_uri_str = self.blr.GetRStrippedUriString()
if ContainsWildcard(rstripped_versionless_uri_str):
for blr in self.command_instance.WildcardIterator(
rstripped_versionless_uri_str, all_versions=self.all_versions):
yield blr
return
# Build a wildcard to expand so CloudWildcardIterator will not just treat it
# as a key and yield the result without doing a bucket listing.
for blr in self.command_instance.WildcardIterator(
rstripped_versionless_uri_str + '*', all_versions=self.all_versions):
# Find the originally specified BucketListingRef in the expanded list (if
# present). Don't just use the expanded list, because it would also
# include objects whose name prefix matches the blr name (because of the
# wildcard match we did above). Note that there can be multiple matches,
# for the case where there's both an object and a subdirectory with the
# same name.
if (blr.GetRStrippedUriString()
== rstripped_versionless_uri_str):
yield blr
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil mb [-c storage_class] [-l location] [-p proj_id] uri...
<B>DESCRIPTION</B>
The mb command creates a new bucket. Google Cloud Storage has a single
namespace, so you will not be allowed to create a bucket with a name already
in use by another user. You can, however, carve out parts of the bucket name
space corresponding to your company's domain name (see "gsutil help naming").
If you don't specify a project ID using the -p option, the bucket
will be created using the default project ID specified in your gsutil
configuration file (see "gsutil help config"). For more details about
projects see "gsutil help projects".
The -c and -l options specify the storage class and location, respectively,
for the bucket. Once a bucket is created in a given location and with a
given storage class, it cannot be moved to a different location, and the
storage class cannot be changed. Instead, you would need to create a new
  bucket, move the data over, and then delete the original bucket.
<B>BUCKET STORAGE CLASSES</B>
If you don't specify a -c option, the bucket will be created with the default
(standard) storage class.
If you specify -c DURABLE_REDUCED_AVAILABILITY (or -c DRA), it causes the data
stored in the bucket to use durable reduced availability storage. Buckets
created with this storage class have lower availability than standard storage
class buckets, but durability equal to that of buckets created with standard
storage class. This option allows users to reduce costs for data for which
lower availability is acceptable. Durable Reduced Availability storage would
not be appropriate for "hot" objects (i.e., objects being accessed frequently)
or for interactive workloads; however, it might be appropriate for other types
of applications. See the online documentation for pricing and SLA details.
<B>BUCKET LOCATIONS</B>
If you don't specify a -l option, the bucket will be created in the default
location (US). Otherwise, you can specify one of the available locations:
US (United States) or EU (Europe).
<B>OPTIONS</B>
-c storage_class Can be DRA (or DURABLE_REDUCED_AVAILABILITY) or S (or
STANDARD). Default is STANDARD.
-l location Can be US or EU. Default is US. Locations are case
insensitive.
-p proj_id Specifies the project ID under which to create the bucket.
""")
class MbCommand(Command):
"""Implementation of gsutil mb command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'mb',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['makebucket', 'createbucket', 'md', 'mkdir'],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'c:l:p:',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'mb',
# List of help name aliases.
HELP_NAME_ALIASES : ['createbucket', 'makebucket', 'md', 'mkdir',
'location', 'dra', 'dras', 'reduced_availability',
'durable_reduced_availability',
'rr', 'reduced_redundancy',
'standard', 'storage class' ],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Make buckets',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
location = ''
storage_class = ''
if self.sub_opts:
for o, a in self.sub_opts:
if o == '-l':
location = a
elif o == '-p':
self.proj_id_handler.SetProjectId(a)
elif o == '-c':
storage_class = self._Normalize_Storage_Class(a)
if not self.headers:
headers = {}
else:
headers = self.headers.copy()
for bucket_uri_str in self.args:
bucket_uri = self.suri_builder.StorageUri(bucket_uri_str)
if not bucket_uri.names_bucket():
raise CommandException('The mb command requires a URI that specifies a '
'bucket.\n"%s" is not valid.' % bucket_uri)
self.proj_id_handler.FillInProjectHeaderIfNeeded('mb', bucket_uri,
headers)
print 'Creating %s...' % bucket_uri
# Pass storage_class param only if this is a GCS bucket. (In S3 the
# storage class is specified on the key object.)
if bucket_uri.scheme == 'gs':
bucket_uri.create_bucket(headers=headers, location=location,
storage_class=storage_class)
else:
bucket_uri.create_bucket(headers=headers, location=location)
return 0
def _Normalize_Storage_Class(self, sc):
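    # Maps user-supplied storage class spellings to canonical values, e.g.
    # (illustrative): 'dra' -> 'DURABLE_REDUCED_AVAILABILITY', 's' or 'std'
    # -> 'STANDARD'; anything else is passed through unchanged (upper-cased).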
sc = sc.upper()
if sc in ('DRA', 'DURABLE_REDUCED_AVAILABILITY'):
return 'DURABLE_REDUCED_AVAILABILITY'
if sc in ('S', 'STD', 'STANDARD'):
return 'STANDARD'
return sc
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil mv [-p] src_uri dst_uri
- or -
gsutil mv [-p] uri... dst_uri
<B>DESCRIPTION</B>
The gsutil mv command allows you to move data between your local file
system and the cloud, move data within the cloud, and move data between
cloud storage providers. For example, to move all objects from a
bucket to a local directory you could use:
gsutil mv gs://my_bucket dir
Similarly, to move all objects from a local directory to a bucket you could
use:
gsutil mv ./dir gs://my_bucket
<B>RENAMING BUCKET SUBDIRECTORIES</B>
You can use the gsutil mv command to rename subdirectories. For example,
the command:
gsutil mv gs://my_bucket/olddir gs://my_bucket/newdir
would rename all objects and subdirectories under gs://my_bucket/olddir to be
under gs://my_bucket/newdir, otherwise preserving the subdirectory structure.
If you do a rename as specified above and you want to preserve ACLs, you
should use the -p option (see OPTIONS).
Note that when using mv to rename bucket subdirectories you cannot specify
the source URI using wildcards. You need to spell out the complete name:
gsutil mv gs://my_bucket/olddir gs://my_bucket/newdir
If you have a large number of files to move you might want to use the
gsutil -m option, to perform a multi-threaded/multi-processing move:
gsutil -m mv gs://my_bucket/olddir gs://my_bucket/newdir
<B>NON-ATOMIC OPERATION</B>
Unlike the case with many file systems, the gsutil mv command does not
perform a single atomic operation. Rather, it performs a copy from source
to destination followed by removing the source for each object.
<B>OPTIONS</B>
-p Causes ACL to be preserved when moving in the cloud. Note that
this option has performance and cost implications, because it
is essentially performing three requests (getacl, cp, setacl).
(The performance issue can be mitigated to some degree by
using gsutil -m cp to cause multi-threaded/multi-processing
copying.)
""")
class MvCommand(Command):
"""Implementation of gsutil mv command.
Note that there is no atomic rename operation - this command is simply
a shorthand for 'cp' followed by 'rm'.
"""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'mv',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['move', 'ren', 'rename'],
# Min number of args required by this command.
MIN_ARGS : 2,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'pv',
# True if file URIs acceptable for this command.
FILE_URIS_OK : True,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'mv',
# List of help name aliases.
HELP_NAME_ALIASES : ['move', 'rename'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Move/rename objects and/or subdirectories',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
    # Check each source arg, refusing to delete a bucket src URI (force users
    # to explicitly do that as a separate operation).
for arg_to_check in self.args[0:-1]:
if self.suri_builder.StorageUri(arg_to_check).names_bucket():
raise CommandException('You cannot move a source bucket using the mv '
'command. If you meant to move\nall objects in '
'the bucket, you can use a command like:\n'
'\tgsutil mv %s/* %s' %
(arg_to_check, self.args[-1]))
# Insert command-line opts in front of args so they'll be picked up by cp
# and rm commands (e.g., for -p option). Use undocumented (internal
# use-only) cp -M option, which causes each original object to be deleted
# after successfully copying to its destination, and also causes naming
# behavior consistent with Unix mv naming behavior (see comments in
# _ConstructDstUri in cp.py).
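    # For example (illustrative URIs), "gsutil mv -p gs://bucket/old
    # gs://bucket/new" is forwarded to the cp command roughly as
    # ['-M', '-p', 'gs://bucket/old', 'gs://bucket/new'].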
unparsed_args = ['-M']
if self.recursion_requested:
unparsed_args.append('-R')
unparsed_args.extend(self.unparsed_args)
self.command_runner.RunNamedCommand('cp', unparsed_args, self.headers,
self.debug, self.parallel_operations)
return 0
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains the perfdiag gsutil command."""
import calendar
from collections import defaultdict
import contextlib
import datetime
import json
import math
import multiprocessing
import os
import re
import socket
import string
import subprocess
import tempfile
import time
import boto.gs.connection
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.commands import config
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HELP_TYPE
from gslib.help_provider import HelpType
from gslib.util import IS_LINUX
from gslib.util import MakeBitsHumanReadable
from gslib.util import MakeHumanReadable
from gslib.util import Percentile
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil perfdiag [-i in.json] [-o out.json]
[-n iterations] [-c concurrency] [-s size] [-t tests] uri...
<B>DESCRIPTION</B>
The perfdiag command runs a suite of diagnostic tests for a given Google
Storage bucket.
The 'uri' parameter must name an existing bucket (e.g. gs://foo) to which
the user has write permission. Several test files will be uploaded to and
downloaded from this bucket. All test files will be deleted at the completion
of the diagnostic if it finishes successfully.
gsutil performance can be impacted by many factors at the client, server,
and in-between, such as: CPU speed; available memory; the access path to the
local disk; network bandwidth; contention and error rates along the path
between gsutil and Google; operating system buffering configuration; and
firewalls and other network elements. The perfdiag command is provided so
that customers can run a known measurement suite when troubleshooting
performance problems.
<B>PROVIDING DIAGNOSTIC OUTPUT TO GOOGLE CLOUD STORAGE TEAM</B>
If the Google Cloud Storage Team asks you to run a performance diagnostic
please use the following command, and email the output file (output.json)
to gs-team@google.com:
gsutil perfdiag -o output.json gs://your-bucket
<B>OPTIONS</B>
-n Sets the number of iterations performed when downloading and
uploading files during latency and throughput tests. Defaults to
5.
-c Sets the level of concurrency to use while running throughput
experiments. The default value of 1 will only run a single read
or write operation concurrently.
-s Sets the size (in bytes) of the test file used to perform read
and write throughput tests. The default is 1 MiB.
-t Sets the list of diagnostic tests to perform. The default is to
run all diagnostic tests. Must be a comma-separated list
containing one or more of the following:
lat: Runs N iterations (set with -n) of writing the file,
retrieving its metadata, reading the file, and deleting
the file. Records the latency of each operation.
rthru: Runs N (set with -n) read operations, with at most C
(set with -c) reads outstanding at any given time.
wthru: Runs N (set with -n) write operations, with at most C
(set with -c) writes outstanding at any given time.
-o Writes the results of the diagnostic to an output file. The output
is a JSON file containing system information and performance
diagnostic results. The file can be read and reported later using
the -i option.
-i Reads the JSON output file created using the -o command and prints
a formatted description of the results.
<B>NOTE</B>
The perfdiag command collects system information. It collects your IP address,
executes DNS queries to Google servers and collects the results, and collects
network statistics information from the output of netstat -s. None of this
information will be sent to Google unless you choose to send it.
""")
class PerfDiagCommand(Command):
"""Implementation of gsutil perfdiag command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME: 'perfdiag',
# List of command name aliases.
COMMAND_NAME_ALIASES: ['diag', 'diagnostic', 'perf', 'performance'],
# Min number of args required by this command.
MIN_ARGS: 0,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS: 1,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS: 'n:c:s:t:i:o:',
# True if file URIs acceptable for this command.
FILE_URIS_OK: False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK: False,
# Index in args of first URI arg.
URIS_START_ARG: 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED: True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME: 'perfdiag',
# List of help name aliases.
HELP_NAME_ALIASES: [],
# Type of help:
HELP_TYPE: HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY: 'Run performance diagnostic',
# The full help text.
HELP_TEXT: _detailed_help_text,
}
# Byte sizes to use for testing files.
# TODO: Consider letting the user specify these sizes with a configuration
# parameter.
test_file_sizes = (
0, # 0 bytes
1024, # 1 KB
102400, # 100 KB
1048576, # 1MB
)
# List of all diagnostic tests.
ALL_DIAG_TESTS = ('rthru', 'wthru', 'lat')
# Google Cloud Storage API endpoint host.
GOOGLE_API_HOST = boto.gs.connection.GSConnection.DefaultHost
def _WindowedExec(self, cmd, n, w, raise_on_error=True):
"""Executes a command n times with a window size of w.
Up to w instances of the command will be executed and left outstanding at a
time until n instances of the command have completed.
Args:
cmd: List containing the command to execute.
n: Number of times the command will be executed.
w: Window size of outstanding commands being executed.
raise_on_error: See _Exec.
Raises:
Exception: If raise_on_error is set to True and any process exits with a
non-zero return code.
"""
if self.debug:
print 'Running command:', cmd
devnull_f = open(os.devnull, 'w')
num_finished = 0
running = []
while len(running) or num_finished < n:
# Fires off new commands that can be executed.
while len(running) < w and num_finished + len(running) < n:
print 'Starting concurrent command: %s' % (' '.join(cmd))
p = subprocess.Popen(cmd, stdout=devnull_f, stderr=devnull_f)
running.append(p)
# Checks for finished commands.
prev_running = running
running = []
for p in prev_running:
retcode = p.poll()
if retcode is None:
running.append(p)
elif raise_on_error and retcode:
raise CommandException("Received non-zero return code (%d) from "
"subprocess '%s'." % (retcode, ' '.join(cmd)))
else:
num_finished += 1
def _Exec(self, cmd, raise_on_error=True, return_output=False,
mute_stderr=False):
"""Executes a command in a subprocess.
Args:
cmd: List containing the command to execute.
raise_on_error: Whether or not to raise an exception when a process exits
with a non-zero return code.
return_output: If set to True, the return value of the function is the
stdout of the process.
mute_stderr: If set to True, the stderr of the process is not printed to
the console.
Returns:
The return code of the process or the stdout if return_output is set.
Raises:
Exception: If raise_on_error is set to True and any process exits with a
non-zero return code.
"""
if self.debug:
print 'Running command:', cmd
stderr = subprocess.PIPE if mute_stderr else None
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=stderr)
(stdoutdata, stderrdata) = p.communicate()
if raise_on_error and p.returncode:
raise CommandException("Received non-zero return code (%d) from "
"subprocess '%s'." % (p.returncode, ' '.join(cmd)))
return stdoutdata if return_output else p.returncode
def _GsUtil(self, cmd, raise_on_error=True, return_output=False,
mute_stderr=False):
"""Executes a gsutil command in a subprocess.
Args:
cmd: A list containing the arguments to the gsutil program, e.g. ['ls',
'gs://foo'].
raise_on_error: see _Exec.
return_output: see _Exec.
mute_stderr: see _Exec.
Returns:
The return code of the process or the stdout if return_output is set.
"""
cmd = self.gsutil_exec_list + cmd
return self._Exec(cmd, raise_on_error=raise_on_error,
return_output=return_output, mute_stderr=mute_stderr)
def _SetUp(self):
"""Performs setup operations needed before diagnostics can be run."""
# Stores test result data.
self.results = {}
# List of test files in a temporary location on disk for latency ops.
self.latency_files = []
# Maps each test file path to its size in bytes.
self.file_sizes = {}
# Maps each test file to its contents as a string.
self.file_contents = {}
def _MakeFile(file_size):
"""Creates a temporary file of the given size and returns its path."""
fd, fpath = tempfile.mkstemp(suffix='.bin', prefix='gsutil_test_file',
text=False)
self.file_sizes[fpath] = file_size
f = os.fdopen(fd, 'wb')
f.write(os.urandom(file_size))
f.close()
f = open(fpath, 'rb')
self.file_contents[fpath] = f.read()
f.close()
return fpath
# Create files for latency tests.
for file_size in self.test_file_sizes:
fpath = _MakeFile(file_size)
self.latency_files.append(fpath)
# Local file on disk for write throughput tests.
self.thru_local_file = _MakeFile(self.thru_filesize)
# Remote file to write/read from during throughput tests.
self.thru_remote_file = (str(self.bucket_uri) +
os.path.basename(self.thru_local_file))
def _TearDown(self):
"""Performs operations to clean things up after performing diagnostics."""
for fpath in self.latency_files + [self.thru_local_file]:
try:
os.remove(fpath)
except OSError:
pass
self._GsUtil(['rm', self.thru_remote_file], raise_on_error=False,
mute_stderr=True)
@contextlib.contextmanager
def _Time(self, key, bucket):
"""A context manager that measures time.
A context manager that prints a status message before and after executing
the inner command and times how long the inner command takes. Keeps track of
the timing, aggregated by the given key.
Args:
key: The key to insert the timing value into a dictionary bucket.
bucket: A dictionary to place the timing value in.
Yields:
For the context manager.
"""
print key, 'starting...'
t0 = time.time()
yield
t1 = time.time()
bucket[key].append(t1 - t0)
print key, 'done.'
def _RunLatencyTests(self):
"""Runs latency tests."""
# Stores timing information for each category of operation.
self.results['latency'] = defaultdict(list)
for i in range(self.num_iterations):
print
print 'Running latency iteration %d...' % (i+1)
for fpath in self.latency_files:
basename = os.path.basename(fpath)
gsbucket = str(self.bucket_uri)
gsuri = gsbucket + basename
file_size = self.file_sizes[fpath]
readable_file_size = MakeHumanReadable(file_size)
print
print ("File of size %(size)s located on disk at '%(fpath)s' being "
"diagnosed in the cloud at '%(gsuri)s'."
% {'size': readable_file_size,
'fpath': fpath,
'gsuri': gsuri})
k = self.bucket.key_class(self.bucket)
k.key = basename
with self._Time('UPLOAD_%d' % file_size, self.results['latency']):
k.set_contents_from_string(self.file_contents[fpath])
with self._Time('METADATA_%d' % file_size, self.results['latency']):
k.exists()
with self._Time('DOWNLOAD_%d' % file_size, self.results['latency']):
k.get_contents_as_string()
with self._Time('DELETE_%d' % file_size, self.results['latency']):
k.delete()
def _RunReadThruTests(self):
"""Runs read throughput tests."""
self.results['read_throughput'] = {'file_size': self.thru_filesize,
'num_times': self.num_iterations,
'concurrency': self.concurrency}
# Copy the file to remote location before reading.
self._GsUtil(['cp', self.thru_local_file, self.thru_remote_file])
if self.concurrency == 1:
k = self.bucket.key_class(self.bucket)
k.key = os.path.basename(self.thru_local_file)
# Warm up the TCP connection by transferring a couple times first.
for i in range(2):
k.get_contents_as_string()
t0 = time.time()
for i in range(self.num_iterations):
k.get_contents_as_string()
t1 = time.time()
else:
cmd = self.gsutil_exec_list + ['cp', self.thru_remote_file, os.devnull]
t0 = time.time()
self._WindowedExec(cmd, self.num_iterations, self.concurrency)
t1 = time.time()
time_took = t1 - t0
total_bytes_copied = self.thru_filesize * self.num_iterations
bytes_per_second = total_bytes_copied / time_took
self.results['read_throughput']['time_took'] = time_took
self.results['read_throughput']['total_bytes_copied'] = total_bytes_copied
self.results['read_throughput']['bytes_per_second'] = bytes_per_second
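    # Worked example (illustrative numbers): 5 iterations of a 1 MiB file
    # completing in 2.0 seconds gives 5 * 1048576 / 2.0 = 2621440 bytes/s,
    # which _DisplayResults reports as a bit rate (2621440 * 8 = 20971520
    # bits/s).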
def _RunWriteThruTests(self):
"""Runs write throughput tests."""
self.results['write_throughput'] = {'file_size': self.thru_filesize,
'num_copies': self.num_iterations,
'concurrency': self.concurrency}
if self.concurrency == 1:
k = self.bucket.key_class(self.bucket)
k.key = os.path.basename(self.thru_local_file)
# Warm up the TCP connection by transferring a couple times first.
for i in range(2):
k.set_contents_from_string(self.file_contents[self.thru_local_file])
t0 = time.time()
for i in range(self.num_iterations):
k.set_contents_from_string(self.file_contents[self.thru_local_file])
t1 = time.time()
else:
cmd = self.gsutil_exec_list + ['cp', self.thru_local_file,
self.thru_remote_file]
t0 = time.time()
self._WindowedExec(cmd, self.num_iterations, self.concurrency)
t1 = time.time()
time_took = t1 - t0
total_bytes_copied = self.thru_filesize * self.num_iterations
bytes_per_second = total_bytes_copied / time_took
self.results['write_throughput']['time_took'] = time_took
self.results['write_throughput']['total_bytes_copied'] = total_bytes_copied
self.results['write_throughput']['bytes_per_second'] = bytes_per_second
def _GetDiskCounters(self):
"""Retrieves disk I/O statistics for all disks.
Adapted from the psutil module's psutil._pslinux.disk_io_counters:
http://code.google.com/p/psutil/source/browse/trunk/psutil/_pslinux.py
    Originally distributed under a BSD license.
Original Copyright (c) 2009, Jay Loden, Dave Daeschler, Giampaolo Rodola.
Returns:
A dictionary containing disk names mapped to the disk counters from
      /proc/diskstats.
"""
    # iostat documentation states that sectors are equivalent to blocks and
# have a size of 512 bytes since 2.4 kernels. This value is needed to
# calculate the amount of disk I/O in bytes.
sector_size = 512
partitions = []
with open('/proc/partitions', 'r') as f:
lines = f.readlines()[2:]
for line in lines:
_, _, _, name = line.split()
if name[-1].isdigit():
partitions.append(name)
retdict = {}
with open('/proc/diskstats', 'r') as f:
for line in f:
values = line.split()[:11]
_, _, name, reads, _, rbytes, rtime, writes, _, wbytes, wtime = values
if name in partitions:
rbytes = int(rbytes) * sector_size
wbytes = int(wbytes) * sector_size
reads = int(reads)
writes = int(writes)
rtime = int(rtime)
wtime = int(wtime)
retdict[name] = (reads, writes, rbytes, wbytes, rtime, wtime)
return retdict
def _GetTcpStats(self):
"""Tries to parse out TCP packet information from netstat output.
Returns:
      A dictionary containing TCP information.
"""
# netstat return code is non-zero for -s on Linux, so don't raise on error.
netstat_output = self._Exec(['netstat', '-s'], return_output=True,
raise_on_error=False)
netstat_output = netstat_output.strip().lower()
found_tcp = False
tcp_retransmit = None
tcp_received = None
tcp_sent = None
for line in netstat_output.split('\n'):
# Header for TCP section is "Tcp:" in Linux/Mac and
# "TCP Statistics for" in Windows.
if 'tcp:' in line or 'tcp statistics' in line:
found_tcp = True
# Linux == "segments retransmited" (sic), Mac == "retransmit timeouts"
# Windows == "segments retransmitted".
if (found_tcp and tcp_retransmit is None and
('segments retransmited' in line or 'retransmit timeouts' in line or
'segments retransmitted' in line)):
tcp_retransmit = ''.join(c for c in line if c in string.digits)
# Linux+Windows == "segments received", Mac == "packets received".
if (found_tcp and tcp_received is None and
('segments received' in line or 'packets received' in line)):
tcp_received = ''.join(c for c in line if c in string.digits)
# Linux == "segments send out" (sic), Mac+Windows == "packets sent".
if (found_tcp and tcp_sent is None and
('segments send out' in line or 'packets sent' in line or
'segments sent' in line)):
tcp_sent = ''.join(c for c in line if c in string.digits)
result = {}
try:
result['tcp_retransmit'] = int(tcp_retransmit)
result['tcp_received'] = int(tcp_received)
result['tcp_sent'] = int(tcp_sent)
except (ValueError, TypeError):
result['tcp_retransmit'] = None
result['tcp_received'] = None
result['tcp_sent'] = None
return result
def _CollectSysInfo(self):
"""Collects system information."""
sysinfo = {}
# Get the local IP address from socket lib.
sysinfo['ip_address'] = socket.gethostbyname(socket.gethostname())
# Record the temporary directory used since it can affect performance, e.g.
# when on a networked filesystem.
sysinfo['tempdir'] = tempfile.gettempdir()
# Produces an RFC 2822 compliant GMT timestamp.
sysinfo['gmt_timestamp'] = time.strftime('%a, %d %b %Y %H:%M:%S +0000',
time.gmtime())
# Execute a CNAME lookup on Google DNS to find what Google server
# it's routing to.
cmd = ['nslookup', '-type=CNAME', self.GOOGLE_API_HOST]
nslookup_cname_output = self._Exec(cmd, return_output=True)
m = re.search(r' = (?P<googserv>[^.]+)\.', nslookup_cname_output)
sysinfo['googserv_route'] = m.group('googserv') if m else None
# Look up IP addresses for Google Server.
(hostname, aliaslist, ipaddrlist) = socket.gethostbyname_ex(
self.GOOGLE_API_HOST)
sysinfo['googserv_ips'] = ipaddrlist
# Reverse lookup the hostnames for the Google Server IPs.
sysinfo['googserv_hostnames'] = []
for googserv_ip in ipaddrlist:
(hostname, aliaslist, ipaddrlist) = socket.gethostbyaddr(googserv_ip)
sysinfo['googserv_hostnames'].append(hostname)
# Query o-o to find out what the Google DNS thinks is the user's IP.
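    # The TXT answer looks roughly like (illustrative IP):
    #   o-o.myaddr.google.com  text = "203.0.113.7"
    # and the regex below extracts the quoted address.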
cmd = ['nslookup', '-type=TXT', 'o-o.myaddr.google.com.']
nslookup_txt_output = self._Exec(cmd, return_output=True)
m = re.search(r'text\s+=\s+"(?P<dnsip>[\.\d]+)"', nslookup_txt_output)
sysinfo['dns_o-o_ip'] = m.group('dnsip') if m else None
# Try and find the number of CPUs in the system if available.
try:
sysinfo['cpu_count'] = multiprocessing.cpu_count()
except NotImplementedError:
sysinfo['cpu_count'] = None
# For *nix platforms, obtain the CPU load.
try:
sysinfo['load_avg'] = list(os.getloadavg())
except (AttributeError, OSError):
sysinfo['load_avg'] = None
# Try and collect memory information from /proc/meminfo if possible.
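    # /proc/meminfo reports sizes in kB; the parsing below multiplies each
    # value by 1000 to get an approximate byte count.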
mem_total = None
mem_free = None
mem_buffers = None
mem_cached = None
try:
with open('/proc/meminfo', 'r') as f:
for line in f:
if line.startswith('MemTotal'):
mem_total = (int(''.join(c for c in line if c in string.digits))
* 1000)
elif line.startswith('MemFree'):
mem_free = (int(''.join(c for c in line if c in string.digits))
* 1000)
elif line.startswith('Buffers'):
mem_buffers = (int(''.join(c for c in line if c in string.digits))
* 1000)
elif line.startswith('Cached'):
mem_cached = (int(''.join(c for c in line if c in string.digits))
* 1000)
except (IOError, ValueError):
pass
sysinfo['meminfo'] = {'mem_total': mem_total,
'mem_free': mem_free,
'mem_buffers': mem_buffers,
'mem_cached': mem_cached}
# Get configuration attributes from config module.
sysinfo['gsutil_config'] = {}
for attr in dir(config):
attr_value = getattr(config, attr)
# Filter out multiline strings that are not useful.
if attr.isupper() and not (isinstance(attr_value, basestring) and
'\n' in attr_value):
sysinfo['gsutil_config'][attr] = attr_value
self.results['sysinfo'] = sysinfo
def _DisplayStats(self, trials):
"""Prints out mean, standard deviation, median, and 90th percentile."""
n = len(trials)
mean = float(sum(trials)) / n
stdev = math.sqrt(sum((x - mean)**2 for x in trials) / n)
print str(n).rjust(6), '',
print ('%.1f' % (mean * 1000)).rjust(9), '',
print ('%.1f' % (stdev * 1000)).rjust(12), '',
print ('%.1f' % (Percentile(trials, 0.5) * 1000)).rjust(11), '',
print ('%.1f' % (Percentile(trials, 0.9) * 1000)).rjust(11), ''
def _DisplayResults(self):
"""Displays results collected from diagnostic run."""
print
print '=' * 78
print 'DIAGNOSTIC RESULTS'.center(78)
print '=' * 78
if 'latency' in self.results:
print
print '-' * 78
print 'Latency'.center(78)
print '-' * 78
print ('Operation Size Trials Mean (ms) Std Dev (ms) '
'Median (ms) 90th % (ms)')
print ('========= ========= ====== ========= ============ '
'=========== ===========')
for key in sorted(self.results['latency']):
trials = sorted(self.results['latency'][key])
op, numbytes = key.split('_')
numbytes = int(numbytes)
if op == 'METADATA':
print 'Metadata'.rjust(9), '',
print MakeHumanReadable(numbytes).rjust(9), '',
self._DisplayStats(trials)
if op == 'DOWNLOAD':
print 'Download'.rjust(9), '',
print MakeHumanReadable(numbytes).rjust(9), '',
self._DisplayStats(trials)
if op == 'UPLOAD':
print 'Upload'.rjust(9), '',
print MakeHumanReadable(numbytes).rjust(9), '',
self._DisplayStats(trials)
if op == 'DELETE':
print 'Delete'.rjust(9), '',
print MakeHumanReadable(numbytes).rjust(9), '',
self._DisplayStats(trials)
if 'write_throughput' in self.results:
print
print '-' * 78
print 'Write Throughput'.center(78)
print '-' * 78
write_thru = self.results['write_throughput']
print 'Copied a %s file %d times for a total transfer size of %s.' % (
MakeHumanReadable(write_thru['file_size']),
write_thru['num_copies'],
MakeHumanReadable(write_thru['total_bytes_copied']))
print 'Write throughput: %s/s.' % (
MakeBitsHumanReadable(write_thru['bytes_per_second'] * 8))
if 'read_throughput' in self.results:
print
print '-' * 78
print 'Read Throughput'.center(78)
print '-' * 78
read_thru = self.results['read_throughput']
print 'Copied a %s file %d times for a total transfer size of %s.' % (
MakeHumanReadable(read_thru['file_size']),
read_thru['num_times'],
MakeHumanReadable(read_thru['total_bytes_copied']))
print 'Read throughput: %s/s.' % (
MakeBitsHumanReadable(read_thru['bytes_per_second'] * 8))
if 'sysinfo' in self.results:
print
print '-' * 78
print 'System Information'.center(78)
print '-' * 78
info = self.results['sysinfo']
print 'IP Address: \n %s' % info['ip_address']
print 'Temporary Directory: \n %s' % info['tempdir']
print 'Bucket URI: \n %s' % self.results['bucket_uri']
if 'gmt_timestamp' in info:
ts_string = info['gmt_timestamp']
timetuple = None
try:
# Convert RFC 2822 string to Linux timestamp.
timetuple = time.strptime(ts_string, '%a, %d %b %Y %H:%M:%S +0000')
except ValueError:
pass
if timetuple:
# Converts the GMT time tuple to local Linux timestamp.
localtime = calendar.timegm(timetuple)
localdt = datetime.datetime.fromtimestamp(localtime)
print 'Measurement time: \n %s' % localdt.strftime(
'%Y-%m-%d %I-%M-%S %p %Z')
print 'Google Server: \n %s' % info['googserv_route']
print ('Google Server IP Addresses: \n %s' %
('\n '.join(info['googserv_ips'])))
print ('Google Server Hostnames: \n %s' %
('\n '.join(info['googserv_hostnames'])))
print 'Google DNS thinks your IP is: \n %s' % info['dns_o-o_ip']
print 'CPU Count: \n %s' % info['cpu_count']
print 'CPU Load Average: \n %s' % info['load_avg']
try:
print ('Total Memory: \n %s' %
MakeHumanReadable(info['meminfo']['mem_total']))
# Free memory is really MemFree + Buffers + Cached.
print 'Free Memory: \n %s' % MakeHumanReadable(
info['meminfo']['mem_free'] +
info['meminfo']['mem_buffers'] +
info['meminfo']['mem_cached'])
except TypeError:
pass
netstat_after = info['netstat_end']
netstat_before = info['netstat_start']
for tcp_type in ('sent', 'received', 'retransmit'):
try:
delta = (netstat_after['tcp_%s' % tcp_type] -
netstat_before['tcp_%s' % tcp_type])
print 'TCP segments %s during test:\n %d' % (tcp_type, delta)
except TypeError:
pass
if 'disk_counters_end' in info and 'disk_counters_start' in info:
print 'Disk Counter Deltas:\n',
disk_after = info['disk_counters_end']
disk_before = info['disk_counters_start']
print '', 'disk'.rjust(6),
for colname in ['reads', 'writes', 'rbytes', 'wbytes', 'rtime',
'wtime']:
print colname.rjust(8),
print
for diskname in sorted(disk_after):
before = disk_before[diskname]
after = disk_after[diskname]
(reads1, writes1, rbytes1, wbytes1, rtime1, wtime1) = before
(reads2, writes2, rbytes2, wbytes2, rtime2, wtime2) = after
print '', diskname.rjust(6),
deltas = [reads2-reads1, writes2-writes1, rbytes2-rbytes1,
wbytes2-wbytes1, rtime2-rtime1, wtime2-wtime1]
for delta in deltas:
print str(delta).rjust(8),
print
if self.output_file:
with open(self.output_file, 'w') as f:
json.dump(self.results, f, indent=2)
print
print "Output file written to '%s'." % self.output_file
print
def _ParsePositiveInteger(self, val, msg):
"""Tries to convert val argument to a positive integer.
Args:
val: The value (as a string) to convert to a positive integer.
msg: The error message to place in the CommandException on an error.
Returns:
A valid positive integer.
Raises:
CommandException: If the supplied value is not a valid positive integer.
"""
try:
val = int(val)
if val < 1:
raise CommandException(msg)
return val
except ValueError:
raise CommandException(msg)
def _ParseArgs(self):
"""Parses arguments for perfdiag command."""
# From -n.
self.num_iterations = 5
# From -c.
self.concurrency = 1
# From -s.
self.thru_filesize = 1048576
# From -t.
self.diag_tests = self.ALL_DIAG_TESTS
# From -o.
self.output_file = None
# From -i.
self.input_file = None
if self.sub_opts:
for o, a in self.sub_opts:
if o == '-n':
self.num_iterations = self._ParsePositiveInteger(
a, 'The -n parameter must be a positive integer.')
if o == '-c':
self.concurrency = self._ParsePositiveInteger(
a, 'The -c parameter must be a positive integer.')
if o == '-s':
self.thru_filesize = self._ParsePositiveInteger(
a, 'The -s parameter must be a positive integer.')
if o == '-t':
self.diag_tests = []
for test_name in a.strip().split(','):
if test_name.lower() not in self.ALL_DIAG_TESTS:
raise CommandException("List of test names (-t) contains invalid "
"test name '%s'." % test_name)
self.diag_tests.append(test_name)
if o == '-o':
self.output_file = os.path.abspath(a)
if o == '-i':
self.input_file = os.path.abspath(a)
if not os.path.isfile(self.input_file):
raise CommandException("Invalid input file (-i): '%s'." % a)
try:
with open(self.input_file, 'r') as f:
self.results = json.load(f)
print "Read input file: '%s'." % self.input_file
except ValueError:
raise CommandException("Could not decode input file (-i): '%s'." %
a)
return
if not self.args:
raise CommandException('Wrong number of arguments for "perfdiag" '
'command.')
self.bucket_uri = self.suri_builder.StorageUri(self.args[0])
if not self.bucket_uri.names_bucket():
raise CommandException('The perfdiag command requires a URI that '
'specifies a bucket.\n"%s" is not '
'valid.' % self.bucket_uri)
self.bucket = self.bucket_uri.get_bucket()
# Command entry point.
def RunCommand(self):
"""Called by gsutil when the command is being invoked."""
self._ParseArgs()
if self.input_file:
self._DisplayResults()
return 0
print 'Number of iterations to run: %d' % self.num_iterations
print 'Base bucket URI: %s' % self.bucket_uri
print 'Concurrency level: %d' % self.concurrency
print 'Throughput file size: %s' % MakeHumanReadable(self.thru_filesize)
print 'Diagnostics to run: %s' % (', '.join(self.diag_tests))
try:
self._SetUp()
# Collect generic system info.
self._CollectSysInfo()
# Collect netstat info and disk counters before tests (and again later).
self.results['sysinfo']['netstat_start'] = self._GetTcpStats()
if IS_LINUX:
self.results['sysinfo']['disk_counters_start'] = self._GetDiskCounters()
# Record bucket URI.
self.results['bucket_uri'] = str(self.bucket_uri)
if 'lat' in self.diag_tests:
self._RunLatencyTests()
if 'rthru' in self.diag_tests:
self._RunReadThruTests()
if 'wthru' in self.diag_tests:
self._RunWriteThruTests()
# Collect netstat info and disk counters after tests.
self.results['sysinfo']['netstat_end'] = self._GetTcpStats()
if IS_LINUX:
self.results['sysinfo']['disk_counters_end'] = self._GetDiskCounters()
self._DisplayResults()
finally:
self._TearDown()
return 0
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from boto.exception import GSResponseError
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil rb uri...
<B>DESCRIPTION</B>
  The rb command deletes a bucket. Buckets must be empty before you can delete
  them.
Be certain you want to delete a bucket before you do so, as once it is
deleted the name becomes available and another user may create a bucket with
that name. (But see also "DOMAIN NAMED BUCKETS" under "gsutil help naming"
for help carving out parts of the bucket name space.)
""")
class RbCommand(Command):
"""Implementation of gsutil rb command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'rb',
# List of command name aliases.
COMMAND_NAME_ALIASES : [
'deletebucket', 'removebucket', 'removebuckets', 'rmdir'],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'rb',
# List of help name aliases.
HELP_NAME_ALIASES :
['deletebucket', 'removebucket', 'removebuckets', 'rmdir'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Remove buckets',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
# Expand bucket name wildcards, if any.
did_some_work = False
for uri_str in self.args:
for uri in self.WildcardIterator(uri_str).IterUris():
if uri.object_name:
raise CommandException('"rb" command requires a URI with no object '
'name')
print 'Removing %s...' % uri
try:
uri.delete_bucket(self.headers)
except GSResponseError as e:
if e.code == 'BucketNotEmpty' and uri.get_versioning_config():
raise CommandException('Bucket is not empty. Note: this is a '
'versioned bucket, so to delete all objects'
'\nyou need to use:\n\tgsutil rm -ra %s'
% uri)
else:
raise
did_some_work = True
if not did_some_work:
raise CommandException('No URIs matched')
return 0
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import boto
from boto.exception import GSResponseError
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.name_expansion import NameExpansionIterator
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil rm [-f] [-R] uri...
<B>DESCRIPTION</B>
The gsutil rm command removes objects.
For example, the command:
gsutil rm gs://bucket/subdir/*
will remove all objects in gs://bucket/subdir, but not in any of its
sub-directories. In contrast:
gsutil rm gs://bucket/subdir/**
will remove all objects under gs://bucket/subdir or any of its
subdirectories.
You can also use the -R option to specify recursive object deletion. Thus, for
example, the following two commands will both remove all objects in a bucket:
gsutil rm gs://bucket/**
gsutil rm -R gs://bucket
If you have a large number of objects to remove you might want to use the
gsutil -m option, to perform a parallel (multi-threaded/multi-processing)
  removal:
gsutil -m rm -R gs://my_bucket/subdir
Note that gsutil rm will refuse to remove files from the local
  file system. For example, this will fail:
gsutil rm *.txt
<B>OPTIONS</B>
-f Continues silently (without printing error messages) despite
errors when removing multiple objects.
-R, -r Causes bucket contents to be removed recursively (i.e., including
all objects and subdirectories). Will not delete the bucket
itself; you need to run the gsutil rb command separately to do
that.
-a Delete all versions of an object.
""")
class RmCommand(Command):
"""Implementation of gsutil rm command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'rm',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['del', 'delete', 'remove'],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'afrRv',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'rm',
# List of help name aliases.
HELP_NAME_ALIASES : ['del', 'delete', 'remove'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Remove objects',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
# self.recursion_requested initialized in command.py (so can be checked
# in parent class for all commands).
self.continue_on_error = False
self.all_versions = False
if self.sub_opts:
for o, unused_a in self.sub_opts:
if o == '-a':
self.all_versions = True
elif o == '-f':
self.continue_on_error = True
elif o == '-r' or o == '-R':
self.recursion_requested = True
elif o == '-v':
self.THREADED_LOGGER.info('WARNING: The %s -v option is no longer'
' needed, and will eventually be removed.\n'
% self.command_name)
# Used to track if any files failed to be removed.
self.everything_removed_okay = True
# Tracks if any URIs matched the given args.
remove_func = self._MkRemoveFunc()
exception_handler = self._MkRemoveExceptionHandler()
try:
# Expand wildcards, dirs, buckets, and bucket subdirs in URIs.
name_expansion_iterator = NameExpansionIterator(
self.command_name, self.proj_id_handler, self.headers, self.debug,
self.bucket_storage_uri_class, self.args, self.recursion_requested,
flat=self.recursion_requested, all_versions=self.all_versions)
# Perform remove requests in parallel (-m) mode, if requested, using
# configured number of parallel processes and threads. Otherwise,
# perform requests with sequential function calls in current process.
self.Apply(remove_func, name_expansion_iterator, exception_handler)
# Assuming the bucket has versioning enabled, uri's that don't map to
# objects should throw an error even with all_versions, since the prior
# round of deletes only sends objects to a history table.
# This assumption that rm -a is only called for versioned buckets should be
# corrected, but the fix is non-trivial.
except CommandException as e:
if not self.continue_on_error:
raise
    except GSResponseError as e:
if not self.continue_on_error:
raise
if not self.everything_removed_okay and not self.continue_on_error:
raise CommandException('Some files could not be removed.')
# If this was a gsutil rm -r command covering any bucket subdirs,
# remove any dir_$folder$ objects (which are created by various web UI
# tools to simulate folders).
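    # For example (illustrative URI), "gsutil rm -r gs://bucket/dir" also
    # removes any objects matching the wildcard "gs://bucket/dir**_$folder$".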
if self.recursion_requested:
folder_object_wildcards = []
for uri_str in self.args:
uri = self.suri_builder.StorageUri(uri_str)
        if uri.names_object():
folder_object_wildcards.append('%s**_$folder$' % uri)
if len(folder_object_wildcards):
self.continue_on_error = True
try:
name_expansion_iterator = NameExpansionIterator(
self.command_name, self.proj_id_handler, self.headers, self.debug,
self.bucket_storage_uri_class, folder_object_wildcards,
self.recursion_requested, flat=True,
all_versions=self.all_versions)
self.Apply(remove_func, name_expansion_iterator, exception_handler)
except CommandException as e:
# Ignore exception from name expansion due to an absent folder file.
if not e.reason.startswith('No URIs matched:'):
raise
return 0
def _MkRemoveExceptionHandler(self):
def RemoveExceptionHandler(e):
"""Simple exception handler to allow post-completion status."""
self.THREADED_LOGGER.error(str(e))
self.everything_removed_okay = False
return RemoveExceptionHandler
def _MkRemoveFunc(self):
def RemoveFunc(name_expansion_result):
exp_src_uri = self.suri_builder.StorageUri(
name_expansion_result.GetExpandedUriStr(),
is_latest=name_expansion_result.is_latest)
if exp_src_uri.names_container():
if exp_src_uri.is_cloud_uri():
# Before offering advice about how to do rm + rb, ensure those
# commands won't fail because of bucket naming problems.
boto.s3.connection.check_lowercase_bucketname(exp_src_uri.bucket_name)
uri_str = exp_src_uri.object_name.rstrip('/')
raise CommandException('"rm" command will not remove buckets. To '
'delete this/these bucket(s) do:\n\tgsutil rm '
'%s/*\n\tgsutil rb %s' % (uri_str, uri_str))
# Perform delete.
self.THREADED_LOGGER.info('Removing %s...',
name_expansion_result.expanded_uri_str)
try:
exp_src_uri.delete_key(validate=False, headers=self.headers)
except:
if self.continue_on_error:
self.everything_removed_okay = False
else:
raise
return RemoveFunc
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil setacl [-R] file-or-canned_acl_name uri...
<B>DESCRIPTION</B>
The setacl command allows you to set an Access Control List on one or
more buckets and objects. The simplest way to use it is to specify one of
the canned ACLs, e.g.,:
gsutil setacl private gs://bucket
or:
gsutil setacl public-read gs://bucket/object
See "gsutil help acls" for a list of all canned ACLs.
If you want to define more fine-grained control over your data, you can
retrieve an ACL using the getacl command (see "gsutil help getacl"),
save the output to a file, edit the file, and then use the gsutil setacl
command to set that ACL on the buckets and/or objects. For example:
gsutil getacl gs://bucket/file.txt > acl.txt
(Make changes to acl.txt such as adding an additional grant.)
    gsutil setacl acl.txt gs://bucket/file.txt
Note that you can set an ACL on multiple buckets or objects at once,
for example:
gsutil setacl acl.txt gs://bucket/*.jpg
If you have a large number of ACLs to update you might want to use the
gsutil -m option, to perform a parallel (multi-threaded/multi-processing)
update:
gsutil -m setacl acl.txt gs://bucket/*.jpg
Note that multi-threading/multi-processing is only done when the named URIs
refer to objects. gsutil -m setacl gs://bucket1 gs://bucket2 will run the
setacl operations sequentially.
One other note: If you want to change a set of ACLs by adding and removing
grants, without the need to manually retrieve and edit the XML representation,
you can do that with the chacl command (see 'gsutil help chacl').
<B>OPTIONS</B>
-R, -r Performs setacl request recursively, to all objects under the
specified URI.
-a Performs setacl request on all object versions.
""")
class SetAclCommand(Command):
"""Implementation of gsutil setacl command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'setacl',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 2,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'aRrv',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 1,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'setacl',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Set bucket and/or object ACLs',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
if self.sub_opts:
for o, unused_a in self.sub_opts:
if o == '-a':
self.all_versions = True
elif o == '-r' or o == '-R':
self.recursion_requested = True
elif o == '-v':
self.THREADED_LOGGER.info('WARNING: The %s -v option is no longer'
' needed, and will eventually be removed.\n'
% self.command_name)
self.SetAclCommandHelper()
return 0
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import xml.sax
from boto import handler
from boto.gs.cors import Cors
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil setcors cors-xml-file uri...
<B>DESCRIPTION</B>
Sets the Cross-Origin Resource Sharing (CORS) configuration on one or more
buckets. This command is supported for buckets only, not objects. The
cors-xml-file specified on the command line should be a path to a local
file containing an XML document with the following structure:
<?xml version="1.0" ?>
<CorsConfig>
<Cors>
<Origins>
<Origin>http://origin1.example.com</Origin>
</Origins>
<Methods>
<Method>GET</Method>
</Methods>
<ResponseHeaders>
<ResponseHeader>Content-Type</ResponseHeader>
</ResponseHeaders>
</Cors>
</CorsConfig>
The above XML document explicitly allows cross-origin GET requests from
http://origin1.example.com and may include the Content-Type response header.
For more info about CORS, see http://www.w3.org/TR/cors/.
""")
class SetCorsCommand(Command):
"""Implementation of gsutil setcors command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'setcors',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 2,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 1,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'setcors',
# List of help name aliases.
HELP_NAME_ALIASES : ['cors', 'cross-origin'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Set a CORS XML document for one or more buckets',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
cors_arg = self.args[0]
uri_args = self.args[1:]
# Disallow multi-provider setcors requests.
storage_uri = self.UrisAreForSingleProvider(uri_args)
if not storage_uri:
raise CommandException('"%s" command spanning providers not allowed.' %
self.command_name)
# Open, read and parse file containing XML document.
cors_file = open(cors_arg, 'r')
cors_txt = cors_file.read()
cors_file.close()
cors_obj = Cors()
# Parse XML document and convert into Cors object.
h = handler.XmlHandler(cors_obj, None)
try:
xml.sax.parseString(cors_txt, h)
except xml.sax._exceptions.SAXParseException, e:
raise CommandException('Requested CORS is invalid: %s at line %s, '
'column %s' % (e.getMessage(), e.getLineNumber(),
e.getColumnNumber()))
# Iterate over URIs, expanding wildcards, and setting the CORS on each.
some_matched = False
for uri_str in uri_args:
for blr in self.WildcardIterator(uri_str):
uri = blr.GetUri()
if not uri.names_bucket():
raise CommandException('URI %s must name a bucket for the %s command'
% (str(uri), self.command_name))
some_matched = True
print 'Setting CORS on %s...' % uri
uri.set_cors(cors_obj, False, self.headers)
if not some_matched:
raise CommandException('No URIs matched')
return 0
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil setdefacl file-or-canned_acl_name uri...
<B>DESCRIPTION</B>
The setdefacl command sets default object ACLs for the specified buckets. If
you specify a default object ACL for a certain bucket, Google Cloud Storage
applies the default object ACL to all new objects uploaded to that bucket.
Similar to the setacl command, the file-or-canned_acl_name names either a
canned ACL or the path to a file that contains ACL XML. (See "gsutil
help setacl" for examples of editing and setting ACLs via the
getacl/setacl commands.)
If you don't set a default object ACL on a bucket, the bucket's default
object ACL will be project-private.
Setting a default object ACL on a bucket provides a convenient way
to ensure newly uploaded objects have a specific ACL, and avoids the
need to go back after the fact and set ACLs on a large number of objects
for which you forgot to set the ACL at object upload time (which can
happen if you don't set a default object ACL on a bucket, and get the
default project-private ACL).
""")
class SetDefAclCommand(Command):
"""Implementation of gsutil setdefacl command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'setdefacl',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 2,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 1,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'setdefacl',
# List of help name aliases.
HELP_NAME_ALIASES : ['default acl'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Set default ACL on buckets',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
if not self.suri_builder.StorageUri(self.args[-1]).names_bucket():
raise CommandException('URI must name a bucket for the %s command' %
self.command_name)
self.SetAclCommandHelper()
return 0
# Copyright 2012 Google Inc. All Rights Reserved.
#coding=utf8
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import boto
import csv
import random
import StringIO
import time
from boto.exception import GSResponseError
from boto.s3.key import Key
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import Command
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HELP_TYPE
from gslib.help_provider import HelpType
from gslib.name_expansion import NameExpansionIterator
from gslib.util import NO_MAX
from gslib.util import Retry
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil setmeta [-n] -h [header:value|header] ... uri...
<B>DESCRIPTION</B>
The gsutil setmeta command allows you to set or remove the metadata on one
or more objects. It takes one or more header arguments followed by one or
more URIs, where each header argument is in one of two forms:
- if you specify header:value, it will set the given header on all
named objects.
- if you specify header (with no value), it will remove the given header
from all named objects.
For example, the following command would set the Content-Type and
Cache-Control and remove the Content-Disposition on the specified objects:
gsutil setmeta -h "Content-Type:text/html" \\
-h "Cache-Control:public, max-age=3600" \\
-h "Content-Disposition" gs://bucket/*.html
If you have a large number of objects to update you might want to use the
gsutil -m option, to perform a parallel (multi-threaded/multi-processing)
update:
gsutil -m setmeta -h "Content-Type:text/html" \\
-h "Cache-Control:public, max-age=3600" \\
-h "Content-Disposition" gs://bucket/*.html
See "gsutil help metadata" for details about how you can set metadata
while uploading objects, what metadata fields can be set and the meaning of
these fields, use of custom metadata, and how to view currently set metadata.
<B>OPERATION COST</B>
This command uses four operations per URI (one to read the ACL, one to read
the current metadata, one to set the new metadata, and one to set the ACL).
For cases where you want all objects to have the same ACL you can avoid half
these operations by setting a default ACL on the bucket(s) containing the
named objects, and using the setmeta -n option. See "gsutil help setdefacl".
<B>OPTIONS</B>
-h Specifies a header:value to be added, or header to be removed,
from each named object.
-n Causes the operations for reading and writing the ACL to be
skipped. This halves the number of operations performed per
request, improving the speed and reducing the cost of performing
the operations. This option makes sense for cases where you want
all objects to have the same ACL, for which you have set a default
ACL on the bucket(s) containing the objects. See "gsutil help
setdefacl".
<B>OLDER SYNTAX (DEPRECATED)</B>
The first version of the setmeta command used more complicated syntax
(described below). gsutil still supports this syntax, to avoid breaking
existing customer uses, but it is now deprecated and will eventually
be removed.
With this older syntax, the setmeta command accepts a single metadata
argument in one of two forms:
gsutil setmeta [-n] header:value uri...
or
gsutil setmeta [-n] '"header:value","-header",...' uri...
The first form allows you to specify a single header name and value to
set. For example, the following command would set the Content-Type on the
specified objects:
gsutil setmeta "Content-Type:text/html" gs://bucket/*.html
This form only works if the header name and value don't contain double
quotes or commas, and only works for setting the header value (not for
removing it).
The more general form of the first argument allows both setting and removing
multiple fields, without any of the content restrictions noted above. For
this variant the first argument is a CSV-formatted list of headers to add
or remove. Getting the CSV-formatted list to be passed correctly into gsutil
requires different syntax on Linux or MacOS than it does on Windows.
On Linux or MacOS you need to surround the entire argument in single quotes
to avoid having the shell interpret/strip out the double-quotes in the CSV
data. For example, the following command would set the Content-Type and
Cache-Control and remove the Content-Disposition on the specified objects:
gsutil setmeta '"Content-Type:text/html","Cache-Control:public, max-age=3600","-Content-Disposition"' gs://bucket/*.html
To pass CSV data on Windows you need two sets of double quotes around
each header/value pair, and one set of double quotes around the entire
expression. For example, the following command would set the Content-Type
and Cache-Control and remove the Content-Disposition on the specified objects:
gsutil setmeta "\""Content-Type:text/html"",""Cache-Control:public, max-age=3600"",""-Content-Disposition""\" gs://bucket/*.html
<B>WARNING ABOUT USING SETMETA WITH VERSIONING ENABLED</B>
Note that if you use the gsutil setmeta command on an object in a bucket
with versioning enabled (see 'gsutil help versioning'), it will create
a new object version (and thus, you will get charged for the space required
for holding the additional version).
""")
class SetMetaCommand(Command):
"""Implementation of gsutil setmeta command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'setmeta',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['setheader'],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'h:n',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 1,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'setmeta',
# List of help name aliases.
HELP_NAME_ALIASES : ['setheader'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Set metadata on already uploaded objects',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
headers = []
preserve_acl = True
if self.sub_opts:
for o, a in self.sub_opts:
if o == '-n':
preserve_acl = False
elif o == '-h':
headers.append(a)
if headers:
(metadata_minus, metadata_plus) = self._ParseMetadataHeaders(headers)
uri_args = self.args
else:
(metadata_minus, metadata_plus) = self._ParseMetadataSpec(self.args[0])
uri_args = self.args[1:]
if (len(uri_args) == 1
and not self.suri_builder.StorageUri(uri_args[0]).names_object()):
raise CommandException('URI (%s) must name an object' % uri_args[0])
# Used to track if any objects' metadata failed to be set.
self.everything_set_okay = True
def _SetMetadataExceptionHandler(e):
"""Simple exception handler to allow post-completion status."""
self.THREADED_LOGGER.error(str(e))
self.everything_set_okay = False
@Retry(GSResponseError, tries=3, delay=1, backoff=2)
def _SetMetadataFunc(name_expansion_result):
exp_src_uri = self.suri_builder.StorageUri(
name_expansion_result.GetExpandedUriStr())
self.THREADED_LOGGER.info('Setting metadata on %s...', exp_src_uri)
key = exp_src_uri.get_key()
meta_generation = key.meta_generation
generation = key.generation
headers = {}
if generation:
headers['x-goog-if-generation-match'] = generation
if meta_generation:
headers['x-goog-if-metageneration-match'] = meta_generation
# If this fails because of a precondition, it will raise a
# GSResponseError for @Retry to handle.
exp_src_uri.set_metadata(metadata_plus, metadata_minus, preserve_acl,
headers=headers)
name_expansion_iterator = NameExpansionIterator(
self.command_name, self.proj_id_handler, self.headers, self.debug,
self.bucket_storage_uri_class, uri_args, self.recursion_requested,
self.recursion_requested)
# Perform requests in parallel (-m) mode, if requested, using
# configured number of parallel processes and threads. Otherwise,
# perform requests with sequential function calls in current process.
self.Apply(_SetMetadataFunc, name_expansion_iterator,
_SetMetadataExceptionHandler)
if not self.everything_set_okay:
raise CommandException('Metadata for some objects could not be set.')
return 0
def _ParseMetadataHeaders(self, headers):
metadata_minus = set()
cust_metadata_minus = set()
metadata_plus = {}
cust_metadata_plus = {}
# Build a count of the keys encountered from each plus and minus arg so we
# can check for dupe field specs.
num_metadata_plus_elems = 0
num_cust_metadata_plus_elems = 0
num_metadata_minus_elems = 0
num_cust_metadata_minus_elems = 0
for md_arg in headers:
parts = md_arg.split(':')
if len(parts) not in (1, 2):
raise CommandException(
'Invalid argument: must be either header or header:value (%s)' %
md_arg)
if len(parts) == 2:
(header, value) = parts
else:
(header, value) = (parts[0], None)
_InsistAsciiHeader(header)
# Translate headers to lowercase to match the casing assumed by our
# sanity-checking operations.
header = header.lower()
if value:
if _IsCustomMeta(header):
# Allow non-ASCII data for custom metadata fields. Don't unicode
# encode other fields because that would perturb their content
# (e.g., adding %2F's into the middle of a Cache-Control value).
value = unicode(value, 'utf-8')
cust_metadata_plus[header] = value
num_cust_metadata_plus_elems += 1
else:
metadata_plus[header] = value
num_metadata_plus_elems += 1
else:
if _IsCustomMeta(header):
cust_metadata_minus.add(header)
num_cust_metadata_minus_elems += 1
else:
metadata_minus.add(header)
num_metadata_minus_elems += 1
if (num_metadata_plus_elems != len(metadata_plus)
or num_cust_metadata_plus_elems != len(cust_metadata_plus)
or num_metadata_minus_elems != len(metadata_minus)
or num_cust_metadata_minus_elems != len(cust_metadata_minus)
or metadata_minus.intersection(set(metadata_plus.keys()))):
raise CommandException('Each header must appear at most once.')
other_than_base_fields = (set(metadata_plus.keys())
.difference(Key.base_user_settable_fields))
other_than_base_fields.update(
metadata_minus.difference(Key.base_user_settable_fields))
for f in other_than_base_fields:
# This check is overly simple; it would be stronger to check, for each
# URI argument, whether f.startswith the
# uri.get_provider().metadata_prefix, but here we just parse the spec
# once, before processing any of the URIs. This means we will not
# detect if the user tries to set an x-goog-meta- field on another
# provider's object, for example.
if not _IsCustomMeta(f):
raise CommandException('Invalid or disallowed header (%s).\n'
'Only these fields (plus x-goog-meta-* fields)'
' can be set or unset:\n%s' % (f,
sorted(list(Key.base_user_settable_fields))))
metadata_plus.update(cust_metadata_plus)
metadata_minus.update(cust_metadata_minus)
return (metadata_minus, metadata_plus)
def _ParseMetadataSpec(self, spec):
self.THREADED_LOGGER.info('WARNING: metadata spec syntax (%s)\nis '
'deprecated and will eventually be removed.\n'
'Please see "gsutil help setmeta" for current '
'syntax' % spec)
metadata_minus = set()
cust_metadata_minus = set()
metadata_plus = {}
cust_metadata_plus = {}
# Build a count of the keys encountered from each plus and minus arg so we
# can check for dupe field specs.
num_metadata_plus_elems = 0
num_cust_metadata_plus_elems = 0
num_metadata_minus_elems = 0
num_cust_metadata_minus_elems = 0
mdf = StringIO.StringIO(spec)
for md_arg in csv.reader(mdf).next():
if not md_arg:
raise CommandException(
'Invalid empty metadata specification component.')
if md_arg[0] == '-':
header = md_arg[1:]
if header.find(':') != -1:
raise CommandException('Removal spec may not contain ":" (%s).' %
header)
_InsistAsciiHeader(header)
# Translate headers to lowercase to match the casing required by
# uri.set_metadata().
header = header.lower()
if _IsCustomMeta(header):
cust_metadata_minus.add(header)
num_cust_metadata_minus_elems += 1
else:
metadata_minus.add(header)
num_metadata_minus_elems += 1
else:
parts = md_arg.split(':', 1)
if len(parts) != 2:
raise CommandException(
'Fields being added must include values (%s).' % md_arg)
(header, value) = parts
_InsistAsciiHeader(header)
header = header.lower()
if _IsCustomMeta(header):
# Allow non-ASCII data for custom metadata fields. Don't unicode
# encode other fields because that would perturb their content
# (e.g., adding %2F's into the middle of a Cache-Control value).
value = unicode(value, 'utf-8')
cust_metadata_plus[header] = value
num_cust_metadata_plus_elems += 1
else:
metadata_plus[header] = value
num_metadata_plus_elems += 1
mdf.close()
if (num_metadata_plus_elems != len(metadata_plus)
or num_cust_metadata_plus_elems != len(cust_metadata_plus)
or num_metadata_minus_elems != len(metadata_minus)
or num_cust_metadata_minus_elems != len(cust_metadata_minus)
or metadata_minus.intersection(set(metadata_plus.keys()))):
raise CommandException('Each header must appear at most once.')
other_than_base_fields = (set(metadata_plus.keys())
.difference(Key.base_user_settable_fields))
other_than_base_fields.update(
metadata_minus.difference(Key.base_user_settable_fields))
for f in other_than_base_fields:
# This check is overly simple; it would be stronger to check, for each
# URI argument, whether f.startswith the
# uri.get_provider().metadata_prefix, but here we just parse the spec
# once, before processing any of the URIs. This means we will not
# detect if the user tries to set an x-goog-meta- field on another
# provider's object, for example.
if not _IsCustomMeta(f):
raise CommandException('Invalid or disallowed header (%s).\n'
'Only these fields (plus x-goog-meta-* fields)'
' can be set or unset:\n%s' % (f,
sorted(list(Key.base_user_settable_fields))))
metadata_plus.update(cust_metadata_plus)
metadata_minus.update(cust_metadata_minus)
return (metadata_minus, metadata_plus)
def _InsistAsciiHeader(header):
if not all(ord(c) < 128 for c in header):
raise CommandException('Invalid non-ASCII header (%s).' % header)
def _IsCustomMeta(header):
return header.startswith('x-goog-meta-') or header.startswith('x-amz-meta-')
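A minimal sketch (illustrative only; the function name is an assumption) of the
precondition headers built in _SetMetadataFunc above: when the object's
generation and metageneration are known, they are sent as if-match
preconditions so a concurrent update causes a retryable failure rather than a
lost write.

  def build_precondition_headers(generation, meta_generation):
    """Returns x-goog-if-*-match headers for the given versions (sketch)."""
    headers = {}
    if generation:
      headers['x-goog-if-generation-match'] = generation
    if meta_generation:
      headers['x-goog-if-metageneration-match'] = meta_generation
    return headers

  # Example (made-up values):
  #   build_precondition_headers('1361234567890123', '3')
  #   -> {'x-goog-if-generation-match': '1361234567890123',
  #       'x-goog-if-metageneration-match': '3'}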
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil setversioning [on|off] bucket_uri...
<B>DESCRIPTION</B>
The Versioning Configuration feature enables you to configure a Google Cloud
Storage bucket to keep old versions of objects.
The gsutil setversioning command allows you to enable or suspend versioning
on one or more buckets.
""")
class SetVersioningCommand(Command):
"""Implementation of gsutil setversioning command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'setversioning',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 1,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'setversioning',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Enable or suspend versioning for one or more '
'buckets',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
versioning_arg = self.args[0].lower()
if not versioning_arg in ('on', 'off'):
raise CommandException('Argument to %s must be either [on|off]'
% (self.command_name))
uri_args = self.args[1:]
# Iterate over URIs, expanding wildcards, and setting the website
# configuration on each.
some_matched = False
for uri_str in uri_args:
for blr in self.WildcardIterator(uri_str):
uri = blr.GetUri()
if not uri.names_bucket():
raise CommandException('URI %s must name a bucket for the %s command'
% (str(uri), self.command_name))
some_matched = True
if versioning_arg == 'on':
print 'Enabling versioning for %s...' % uri
uri.configure_versioning(True)
else:
print 'Suspending versioning for %s...' % uri
uri.configure_versioning(False)
if not some_matched:
raise CommandException('No URIs matched')
return 0
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from gslib.util import NO_MAX
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil setwebcfg [-m main_page_suffix] [-e error_page] bucket_uri...
<B>DESCRIPTION</B>
The Website Configuration feature enables you to configure a Google Cloud
Storage bucket to behave like a static website. This means requests made via a
domain-named bucket aliased using a Domain Name System "CNAME" to
c.storage.googleapis.com will work like any other website, i.e., a GET to the
bucket will serve the configured "main" page instead of the usual bucket
listing and a GET for a non-existent object will serve the configured error
page.
For example, suppose your company's domain name is example.com. You could set
up a website bucket as follows:
1. Create a bucket called example.com (see the "DOMAIN NAMED BUCKETS"
section of "gsutil help naming" for details about creating such buckets).
2. Create index.html and 404.html files and upload them to the bucket.
3. Configure the bucket to have website behavior using the command:
gsutil setwebcfg -m index.html -e 404.html gs://example.com
4. Add a DNS CNAME record for example.com pointing to c.storage.googleapis.com
(ask your DNS administrator for help with this).
Now if you open a browser and navigate to http://example.com, it will display
the main page instead of the default bucket listing. Note: because of DNS
caching it can take time for DNS updates to propagate, so it may take up to
a day for the domain-named bucket website to work after you create the
CNAME DNS record.
Additional notes:
1. Because the main page is only served when a bucket listing request is made
via the CNAME alias, you can continue to use "gsutil ls" to list the bucket
and get the normal bucket listing (rather than the main page).
2. The main_page_suffix applies to each subdirectory of the bucket. For
example, with the main_page_suffix configured to be index.html, a GET
request for http://example.com would retrieve
http://example.com/index.html, and a GET request for
http://example.com/photos would retrieve
http://example.com/photos/index.html.
3. There is just one 404.html page: For example, a GET request for
http://example.com/photos/missing would retrieve
http://example.com/404.html, not http://example.com/photos/404.html.
4. For additional details see
https://developers.google.com/storage/docs/website-configuration.
<B>OPTIONS</B>
-m index.html Specifies the object name to serve when a bucket listing
is requested via the CNAME alias to
c.storage.googleapis.com.
-e 404.html Specifies the error page to serve when a request is made, via
the CNAME alias to c.storage.googleapis.com, for a non-existent
object.
""")
def BuildGSWebConfig(main_page_suffix=None, not_found_page=None):
config_body_l = ['<WebsiteConfiguration>']
if main_page_suffix:
config_body_l.append('<MainPageSuffix>%s</MainPageSuffix>' %
main_page_suffix)
if not_found_page:
config_body_l.append('<NotFoundPage>%s</NotFoundPage>' %
not_found_page)
config_body_l.append('</WebsiteConfiguration>')
return "".join(config_body_l)
def BuildS3WebConfig(main_page_suffix=None, error_page=None):
config_body_l = ['<WebsiteConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">']
if not main_page_suffix:
raise CommandException('S3 requires main page / index document')
config_body_l.append('<IndexDocument><Suffix>%s</Suffix></IndexDocument>' %
main_page_suffix)
if error_page:
config_body_l.append('<ErrorDocument><Key>%s</Key></ErrorDocument>' %
error_page)
config_body_l.append('</WebsiteConfiguration>')
return "".join(config_body_l)
class SetWebcfgCommand(Command):
"""Implementation of gsutil setwebcfg command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'setwebcfg',
# List of command name aliases.
COMMAND_NAME_ALIASES : [],
# Min number of args required by this command.
MIN_ARGS : 1,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : NO_MAX,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'm:e:',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 1,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'setwebcfg',
# List of help name aliases.
HELP_NAME_ALIASES : [],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Set a main page and/or error page for one or more buckets',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
main_page_suffix = None
error_page = None
if self.sub_opts:
for o, a in self.sub_opts:
if o == '-m':
main_page_suffix = a
elif o == '-e':
error_page = a
uri_args = self.args
# Iterate over URIs, expanding wildcards, and setting the website
# configuration on each.
some_matched = False
for uri_str in uri_args:
for blr in self.WildcardIterator(uri_str):
uri = blr.GetUri()
if not uri.names_bucket():
raise CommandException('URI %s must name a bucket for the %s command'
% (str(uri), self.command_name))
some_matched = True
print 'Setting website config on %s...' % uri
uri.set_website_config(main_page_suffix, error_page)
if not some_matched:
raise CommandException('No URIs matched')
return 0
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import platform
import shutil
import signal
import sys
import tarfile
import tempfile
from boto import config
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.exception import CommandException
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil update [-f] [uri]
<B>DESCRIPTION</B>
The gsutil update command downloads the latest gsutil release, checks its
version, and offers to let you update to it if it differs from the version
you're currently running.
Once you say "Y" to the prompt of whether to install the update, the gsutil
update command locates where the running copy of gsutil is installed,
unpacks the new version into an adjacent directory, moves the previous version
aside, moves the new version to where the previous version was installed,
and removes the moved-aside old version. Because of this, users are cautioned
not to store data in the gsutil directory, since that data will be lost
when you update gsutil. (Some users change directories into the gsutil
directory to run the command. We advise against doing that, for this reason.)
By default gsutil update will retrieve the new code from
gs://pub/gsutil.tar.gz, but you can optionally specify a URI to use
instead. This is primarily used for distributing pre-release versions of
the code to a small group of early test users.
<B>OPTIONS</B>
-f Forces the update command to offer to let you update, even if you
have the most current copy already. This can be useful if you have
a corrupted local copy.
""")
class UpdateCommand(Command):
"""Implementation of gsutil update command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'update',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['refresh'],
# Min number of args required by this command.
MIN_ARGS : 0,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 1,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : 'f',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : True,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'update',
# List of help name aliases.
HELP_NAME_ALIASES : ['refresh'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Update to the latest gsutil release',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
def _ExplainIfSudoNeeded(self, tf, dirs_to_remove):
"""Explains what to do if sudo needed to update gsutil software.
Happens if gsutil was previously installed by a different user (typically if
someone originally installed in a shared file system location, using sudo).
Args:
tf: Opened TarFile.
dirs_to_remove: List of directories to remove.
Raises:
CommandException: if errors encountered.
"""
system = platform.system()
# If running under Windows we don't need (or have) sudo.
if system.lower().startswith('windows'):
return
user_id = os.getuid()
if (os.stat(self.gsutil_bin_dir).st_uid == user_id
and os.stat(self.boto_lib_dir).st_uid == user_id):
return
# Won't fail - this command runs after main startup code that insists on
# having a config file.
config_files = ' '.join(self.config_file_list)
self._CleanUpUpdateCommand(tf, dirs_to_remove)
raise CommandException(
('Since it was installed by a different user previously, you will need '
'to update using the following commands.\nYou will be prompted for '
'your password, and the install will run as "root". If you\'re unsure '
'what this means please ask your system administrator for help:'
'\n\tchmod 644 %s\n\tsudo env BOTO_CONFIG=%s gsutil update'
'\n\tchmod 600 %s') % (config_files, config_files, config_files),
informational=True)
# This list is checked during gsutil update by doing a lowercased
# slash-left-stripped check. For example "/Dev" would match the "dev" entry.
unsafe_update_dirs = [
'applications', 'auto', 'bin', 'boot', 'desktop', 'dev',
'documents and settings', 'etc', 'export', 'home', 'kernel', 'lib',
'lib32', 'library', 'lost+found', 'mach_kernel', 'media', 'mnt', 'net',
'null', 'network', 'opt', 'private', 'proc', 'program files', 'python',
'root', 'sbin', 'scripts', 'srv', 'sys', 'system', 'tmp', 'users', 'usr',
'var', 'volumes', 'win', 'win32', 'windows', 'winnt',
]
def _EnsureDirsSafeForUpdate(self, dirs):
"""Throws Exception if any of dirs is known to be unsafe for gsutil update.
This provides a fail-safe check to ensure we don't try to overwrite
or delete any important directories. (That shouldn't happen given the
way we construct tmp dirs, etc., but since the gsutil update cleanup
uses shutil.rmtree() it's prudent to add extra checks.)
Args:
dirs: List of directories to check.
Raises:
CommandException: If unsafe directory encountered.
"""
for d in dirs:
if not d:
d = 'null'
if d.lstrip(os.sep).lower() in self.unsafe_update_dirs:
raise CommandException('EnsureDirsSafeForUpdate: encountered unsafe '
'directory (%s); aborting update' % d)
def _CleanUpUpdateCommand(self, tf, dirs_to_remove):
"""Cleans up temp files etc. from running update command.
Args:
tf: Opened TarFile.
dirs_to_remove: List of directories to remove.
"""
tf.close()
self._EnsureDirsSafeForUpdate(dirs_to_remove)
for directory in dirs_to_remove:
try:
shutil.rmtree(directory)
except OSError as e:
# Ignore errors while attempting to remove old dirs under Windows. They
# happen because of Windows exclusive file locking, and the update
# actually succeeds but just leaves the old versions around in the
# user's temp dir.
if not platform.system().lower().startswith('windows'):
raise
# Command entry point.
def RunCommand(self):
for cfg_var in ('is_secure', 'https_validate_certificates'):
if (config.has_option('Boto', cfg_var)
and not config.getboolean('Boto', cfg_var)):
raise CommandException('Your boto configuration has %s = False. '
'The update command\ncannot be run this way, for '
'security reasons.' % cfg_var)
dirs_to_remove = []
# Retrieve gsutil tarball and check if it's newer than installed code.
# TODO: Store this version info as metadata on the tarball object and
# change this command's implementation to check that metadata instead of
# downloading the tarball to check the version info.
tmp_dir = tempfile.mkdtemp()
dirs_to_remove.append(tmp_dir)
os.chdir(tmp_dir)
print 'Checking for software update...'
if len(self.args):
update_from_uri_str = self.args[0]
if not update_from_uri_str.endswith('.tar.gz'):
raise CommandException(
'The update command only works with tar.gz files.')
else:
update_from_uri_str = 'gs://pub/gsutil.tar.gz'
self.command_runner.RunNamedCommand('cp', [update_from_uri_str,
'file://gsutil.tar.gz'],
self.headers, self.debug)
# Note: tf is closed in _CleanUpUpdateCommand.
tf = tarfile.open('gsutil.tar.gz')
tf.errorlevel = 1 # So fatal tarball unpack errors raise exceptions.
tf.extract('./gsutil/VERSION')
ver_file = open('gsutil/VERSION', 'r')
try:
latest_version_string = ver_file.read().rstrip('\n')
finally:
ver_file.close()
force_update = False
if self.sub_opts:
for o, unused_a in self.sub_opts:
if o == '-f':
force_update = True
if not force_update and self.gsutil_ver == latest_version_string:
self._CleanUpUpdateCommand(tf, dirs_to_remove)
if len(self.args):
raise CommandException('You already have %s installed.' %
update_from_uri_str, informational=True)
else:
raise CommandException('You already have the latest gsutil release '
'installed.', informational=True)
print(('This command will update to the "%s" version of\ngsutil at %s') %
(latest_version_string, self.gsutil_bin_dir))
self._ExplainIfSudoNeeded(tf, dirs_to_remove)
answer = raw_input('Proceed? [y/N] ')
if not answer or answer.lower()[0] != 'y':
self._CleanUpUpdateCommand(tf, dirs_to_remove)
raise CommandException('Not running update.', informational=True)
# Ignore keyboard interrupts during the update to reduce the chance someone
# hitting ^C leaves gsutil in a broken state.
signal.signal(signal.SIGINT, signal.SIG_IGN)
# self.gsutil_bin_dir lists the path where the code should end up (like
# /usr/local/gsutil), which is one level down from the relative path in the
# tarball (since the latter creates files in ./gsutil). So, we need to
# extract at the parent directory level.
gsutil_bin_parent_dir = os.path.dirname(self.gsutil_bin_dir)
# Extract tarball to a temporary directory in a sibling to gsutil_bin_dir.
old_dir = tempfile.mkdtemp(dir=gsutil_bin_parent_dir)
new_dir = tempfile.mkdtemp(dir=gsutil_bin_parent_dir)
dirs_to_remove.append(old_dir)
dirs_to_remove.append(new_dir)
self._EnsureDirsSafeForUpdate(dirs_to_remove)
try:
tf.extractall(path=new_dir)
except Exception, e:
self._CleanUpUpdateCommand(tf, dirs_to_remove)
raise CommandException('Update failed: %s.' % e)
# For enterprise mode (shared/central) installation, users with
# different user/group than the installation user/group must be
# able to run gsutil so we need to do some permissions adjustments
# here. Since enterprise mode is not supported for Windows
# users, we can skip this step when running on Windows, which
# avoids the problem that Windows has no find or xargs command.
system = platform.system()
if not system.lower().startswith('windows'):
# Make all files and dirs in updated area readable by other
# and make all directories executable by other. These steps ensure that
# users other than the installing user can still run gsutil after the update.
os.system('chmod -R o+r ' + new_dir)
os.system('find ' + new_dir + ' -type d | xargs chmod o+x')
# Make main gsutil script readable and executable by other.
os.system('chmod o+rx ' + os.path.join(new_dir, 'gsutil'))
# Move old installation aside and new into place.
os.rename(self.gsutil_bin_dir, old_dir + os.sep + 'old')
os.rename(new_dir + os.sep + 'gsutil', self.gsutil_bin_dir)
self._CleanUpUpdateCommand(tf, dirs_to_remove)
signal.signal(signal.SIGINT, signal.SIG_DFL)
print 'Update complete.'
return 0
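The directory swap performed above can be summarized in a small standalone
sketch (illustrative only; the function name and temporary-directory layout are
assumptions, and none of the permission or safety checks from UpdateCommand are
repeated here):

  import os
  import tempfile

  def swap_install_dir(install_dir, extracted_gsutil_dir):
    """Moves the old install aside and the freshly extracted tree into place."""
    parent_dir = os.path.dirname(install_dir)
    # Create the holding area as a sibling so the renames stay on one filesystem.
    old_dir = tempfile.mkdtemp(dir=parent_dir)
    os.rename(install_dir, os.path.join(old_dir, 'old'))
    os.rename(extracted_gsutil_dir, install_dir)
    return old_dir  # caller removes this after a successful swap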
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import boto
import os
import re
import sys
from boto.pyami.config import BotoConfigLocations
from gslib.command import Command
from gslib.command import COMMAND_NAME
from gslib.command import COMMAND_NAME_ALIASES
from gslib.command import CONFIG_REQUIRED
from gslib.command import FILE_URIS_OK
from gslib.command import MAX_ARGS
from gslib.command import MIN_ARGS
from gslib.command import PROVIDER_URIS_OK
from gslib.command import SUPPORTED_SUB_ARGS
from gslib.command import URIS_START_ARG
from gslib.help_provider import HELP_NAME
from gslib.help_provider import HELP_NAME_ALIASES
from gslib.help_provider import HELP_ONE_LINE_SUMMARY
from gslib.help_provider import HELP_TEXT
from gslib.help_provider import HelpType
from gslib.help_provider import HELP_TYPE
from hashlib import md5
_detailed_help_text = ("""
<B>SYNOPSIS</B>
gsutil version
<B>DESCRIPTION</B>
Prints information about the version of gsutil, boto, and Python being
run on your system.
""")
class VersionCommand(Command):
"""Implementation of gsutil version command."""
# Command specification (processed by parent class).
command_spec = {
# Name of command.
COMMAND_NAME : 'version',
# List of command name aliases.
COMMAND_NAME_ALIASES : ['ver'],
# Min number of args required by this command.
MIN_ARGS : 0,
# Max number of args required by this command, or NO_MAX.
MAX_ARGS : 0,
# Getopt-style string specifying acceptable sub args.
SUPPORTED_SUB_ARGS : '',
# True if file URIs acceptable for this command.
FILE_URIS_OK : False,
# True if provider-only URIs acceptable for this command.
PROVIDER_URIS_OK : False,
# Index in args of first URI arg.
URIS_START_ARG : 0,
# True if must configure gsutil before running command.
CONFIG_REQUIRED : False,
}
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : 'version',
# List of help name aliases.
HELP_NAME_ALIASES : ['ver'],
# Type of help:
HELP_TYPE : HelpType.COMMAND_HELP,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : 'Print version info about gsutil',
# The full help text.
HELP_TEXT : _detailed_help_text,
}
# Command entry point.
def RunCommand(self):
for path in BotoConfigLocations:
f = None
try:
f = open(path, 'r')
break
except IOError:
pass
finally:
if f:
f.close()
else:
path = "no config found"
try:
f = open(os.path.join(self.gsutil_bin_dir, 'CHECKSUM'))
shipped_checksum = f.read().strip()
f.close()
except IOError:
shipped_checksum = 'MISSING'
try:
cur_checksum = self._ComputeCodeChecksum()
except IOError:
cur_checksum = 'MISSING FILES'
if shipped_checksum == cur_checksum:
checksum_ok_str = 'OK'
else:
checksum_ok_str = '!= %s' % shipped_checksum
sys.stderr.write(
'gsutil version %s\nchecksum %s (%s)\n'
'boto version %s\npython version %s\n'
'config path: %s\ngsutil path: %s\n' % (
self.gsutil_ver, cur_checksum, checksum_ok_str,
boto.__version__, sys.version, path, os.path.realpath(sys.argv[0])))
return 0
def _ComputeCodeChecksum(self):
"""
Computes a checksum of gsutil code so we can see if users locally modified
gsutil when requesting support. (It's fine for users to make local mods,
but when users ask for support we ask them to run a stock version of
gsutil so we can reduce possible variables.)
"""
m = md5()
# Checksum gsutil and all .py files under gsutil bin (including bundled
# libs). Although we will eventually make gsutil allow use of a centrally
# installed boto (once boto shifts to more frequent releases), in that case
# the local copies still should not have any user modifications.
files_to_checksum = [os.path.join(self.gsutil_bin_dir, 'gsutil')]
for root, sub_folders, files in os.walk(self.gsutil_bin_dir):
for file in files:
if file[-3:] == '.py':
files_to_checksum.append(os.path.join(root, file))
# Sort to ensure consistent checksum build, no matter how os.walk
# orders the list.
for file in sorted(files_to_checksum):
f = open(file, 'r')
content = f.read()
content = re.sub(r'(\r\n|\r|\n)', '\n', content)
m.update(content)
f.close()
return m.hexdigest()
# Copyright 2010 Google Inc. All Rights Reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish, dis-
# tribute, sublicense, and/or sell copies of the Software, and to permit
# persons to whom the Software is furnished to do so, subject to the fol-
# lowing conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABIL-
# ITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
# SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
"""gsutil exceptions."""
class AbortException(StandardError):
"""Exception raised when a user aborts a command that needs to do cleanup."""
def __init__(self, reason):
StandardError.__init__(self)
self.reason = reason
def __repr__(self):
return 'AbortException: %s' % self.reason
def __str__(self):
return 'AbortException: %s' % self.reason
class CommandException(StandardError):
"""Exception raised when a problem is encountered running a gsutil command.
This exception should be used to signal user errors or system failures
(like timeouts), not bugs (like an incorrect param value). For the
latter you should raise Exception so we can see where/how it happened
via gsutil -D (which will include a stack trace for raised Exceptions).
"""
def __init__(self, reason, informational=False):
"""Instantiate a CommandException.
Args:
reason: Text describing the problem.
informational: Indicates reason should be printed as FYI, not a failure.
"""
StandardError.__init__(self)
self.reason = reason
self.informational = informational
def __repr__(self):
return 'CommandException: %s' % self.reason
def __str__(self):
return 'CommandException: %s' % self.reason
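A brief illustration (mirroring how the update command earlier in this dump
uses this class; the function name is made up) of the informational flag:

  def _example_informational_usage():
    """Illustrative only: shows how callers distinguish FYI exceptions."""
    try:
      raise CommandException('You already have the latest gsutil release '
                             'installed.', informational=True)
    except CommandException, e:
      if e.informational:
        print e.reason  # reported as an FYI rather than as a command failure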
class ProjectIdException(StandardError):
def __init__(self, reason):
StandardError.__init__(self)
self.reason = reason
def __repr__(self):
return 'ProjectIdException: %s' % self.reason
def __str__(self):
return 'ProjectIdException: %s' % self.reason
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from gslib.exception import CommandException
class HelpType(object):
COMMAND_HELP = 'command_help'
ADDITIONAL_HELP = 'additional_help'
ALL_HELP_TYPES = [HelpType.COMMAND_HELP, HelpType.ADDITIONAL_HELP]
# help_spec key constants.
HELP_NAME = 'help_name'
HELP_NAME_ALIASES = 'help_name_aliases'
HELP_TYPE = 'help_type'
HELP_ONE_LINE_SUMMARY = 'help_one_line_summary'
HELP_TEXT = 'help_text'
# Constants enforced by SanityCheck
MAX_HELP_NAME_LEN = 15
MIN_ONE_LINE_SUMMARY_LEN = 10
MAX_ONE_LINE_SUMMARY_LEN = 80 - MAX_HELP_NAME_LEN
REQUIRED_SPEC_KEYS = [HELP_NAME, HELP_NAME_ALIASES, HELP_TYPE,
HELP_ONE_LINE_SUMMARY, HELP_TEXT]
class HelpProvider(object):
"""Interface for providing help."""
# Each subclass must define the following map.
help_spec = {
# Name of command or auxiliary help info for which this help applies.
HELP_NAME : None,
# List of help name aliases.
HELP_NAME_ALIASES : None,
# HelpType.
HELP_TYPE : None,
# One line summary of this help.
HELP_ONE_LINE_SUMMARY : None,
# The full help text.
HELP_TEXT : None,
}
# This is a static helper instead of a class method because the help loader
# (gslib.commands.help._LoadHelpMaps()) operates on classes not instances.
def SanityCheck(help_provider, help_name_map):
"""Helper for checking that a HelpProvider has minimally adequate content."""
for k in REQUIRED_SPEC_KEYS:
if k not in help_provider.help_spec or help_provider.help_spec[k] is None:
raise CommandException('"%s" help implementation is missing %s '
'specification' % (help_provider.help_name, k))
# Sanity check the content.
assert (len(help_provider.help_spec[HELP_NAME]) > 1
and len(help_provider.help_spec[HELP_NAME]) < MAX_HELP_NAME_LEN)
for hna in help_provider.help_spec[HELP_NAME_ALIASES]:
assert len(hna) > 0
one_line_summary_len = len(help_provider.help_spec[HELP_ONE_LINE_SUMMARY])
assert (one_line_summary_len > MIN_ONE_LINE_SUMMARY_LEN
and one_line_summary_len < MAX_ONE_LINE_SUMMARY_LEN)
assert len(help_provider.help_spec[HELP_TEXT]) > 10
# Ensure there are no dupe help names or aliases across commands.
name_check_list = [help_provider.help_spec[HELP_NAME]]
name_check_list.extend(help_provider.help_spec[HELP_NAME_ALIASES])
for name_or_alias in name_check_list:
if help_name_map.has_key(name_or_alias):
raise CommandException(
'Duplicate help name/alias "%s" found while loading help from %s. '
'That name/alias was already taken by %s' % (name_or_alias,
help_provider.__module__, help_name_map[name_or_alias].__module__))
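For reference, a minimal help_spec that satisfies SanityCheck above might look
like the following (an illustrative sketch; the name, aliases, and text are
made up):

  _example_help_spec = {
    # Name must be longer than 1 and shorter than MAX_HELP_NAME_LEN characters.
    HELP_NAME : 'example',
    # Every alias must be non-empty.
    HELP_NAME_ALIASES : ['ex'],
    HELP_TYPE : HelpType.COMMAND_HELP,
    # Summary must be longer than 10 and shorter than 65 characters.
    HELP_ONE_LINE_SUMMARY : 'Illustrative example command summary',
    # Full help text must be longer than 10 characters.
    HELP_TEXT : 'Longer help text, more than ten characters.',
  }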
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import copy
import threading
import wildcard_iterator
from bucket_listing_ref import BucketListingRef
from gslib.exception import CommandException
from gslib.plurality_checkable_iterator import PluralityCheckableIterator
from gslib.storage_uri_builder import StorageUriBuilder
from wildcard_iterator import ContainsWildcard
"""
Name expansion support for the various ways gsutil lets users refer to
collections of data (via explicit wildcarding as well as directory,
bucket, and bucket subdir implicit wildcarding). This class encapsulates
the various rules for determining how these expansions are done.
"""
class NameExpansionResult(object):
"""
Holds one fully expanded result from iterating over NameExpansionIterator.
The member data in this class need to be pickleable because
NameExpansionResult instances are passed through Multiprocessing.Queue. In
particular, don't include any boto state like StorageUri, since that pulls
in a big tree of objects, some of which aren't pickleable (and even if
they were, pickling/unpickling such a large object tree would result in
significant overhead).
The state held in this object is needed for handling the various naming cases
(e.g., copying from a single source URI to a directory generates different
dest URI names than copying multiple URIs to a directory, to be consistent
with naming rules used by the Unix cp command). For more details see comments
in _NameExpansionIterator.
"""
def __init__(self, src_uri_str, is_multi_src_request,
src_uri_expands_to_multi, names_container, expanded_uri_str,
have_existing_dst_container=None, is_latest=False):
"""
Args:
src_uri_str: string representation of StorageUri that was expanded.
is_multi_src_request: bool indicator whether src_uri_str expanded to more
than 1 BucketListingRef.
src_uri_expands_to_multi: bool indicator whether the current src_uri
expanded to more than 1 BucketListingRef.
names_container: Bool indicator whether src_uri names a container.
expanded_uri_str: string representation of StorageUri to which src_uri_str
expands.
have_existing_dst_container: bool indicator whether this is a copy
request to an existing bucket, bucket subdir, or directory. Default
None value should be used in cases where this is not needed (commands
other than cp).
is_latest: Bool indicating that the result represents the object's current
version.
"""
self.src_uri_str = src_uri_str
self.is_multi_src_request = is_multi_src_request
self.src_uri_expands_to_multi = src_uri_expands_to_multi
self.names_container = names_container
self.expanded_uri_str = expanded_uri_str
self.have_existing_dst_container = have_existing_dst_container
self.is_latest = is_latest
def __repr__(self):
return '%s' % self.expanded_uri_str
def IsEmpty(self):
"""Returns True if name expansion yielded no matches."""
return self.expanded_uri_str is None
def GetSrcUriStr(self):
"""Returns the string representation of the StorageUri that was expanded."""
return self.src_uri_str
def IsMultiSrcRequest(self):
"""
Returns bool indicator whether name expansion resulted in more than 1
BucketListingRef.
"""
return self.is_multi_src_request
def SrcUriExpandsToMulti(self):
"""
Returns bool indicator whether the current src_uri expanded to more than
1 BucketListingRef
"""
return self.src_uri_expands_to_multi
def NamesContainer(self):
"""
Returns bool indicator of whether src_uri names a directory, bucket, or
bucket subdir.
"""
return self.names_container
def GetExpandedUriStr(self):
"""
Returns the string representation of StorageUri to which src_uri_str
expands.
"""
return self.expanded_uri_str
def HaveExistingDstContainer(self):
"""Returns bool indicator whether this is a copy request to an
existing bucket, bucket subdir, or directory, or None if not
relevant."""
return self.have_existing_dst_container
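# Illustrative sketch (not part of the original module): for a hypothetical
# bucket containing abcd/o1.txt and abcd/o2.txt, expanding the single source
# URI gs://bucket/abcd could produce a result equivalent to:
#
#   NameExpansionResult('gs://bucket/abcd', True, True, True,
#                       'gs://bucket/abcd/o1.txt')
#
# i.e., the original source string, whether the request names multiple
# sources, whether this source expanded to multiple refs, whether it named a
# container, and one fully expanded URI string.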
class _NameExpansionIterator(object):
"""
Iterates over all src_uris, expanding wildcards, object-less bucket names,
subdir bucket names, and directory names, generating a flat listing of all
the matching objects/files.
You should instantiate this object using the static factory function
NameExpansionIterator, because consumers of this iterator need the
PluralityCheckableIterator wrapper built by that function.
Yields:
gslib.name_expansion.NameExpansionResult.
Raises:
CommandException: if errors encountered.
"""
def __init__(self, command_name, proj_id_handler, headers, debug,
bucket_storage_uri_class, uri_strs, recursion_requested,
have_existing_dst_container=None, flat=True,
all_versions=False, for_all_version_delete=False):
"""
Args:
command_name: name of command being run.
proj_id_handler: ProjectIdHandler to use for current command.
headers: Dictionary containing optional HTTP headers to pass to boto.
debug: Debug level to pass in to boto connection (range 0..3).
bucket_storage_uri_class: Class to instantiate for cloud StorageUris.
Settable for testing/mocking.
uri_strs: PluralityCheckableIterator of URI strings needing expansion.
recursion_requested: True if -R specified on command-line.
have_existing_dst_container: Bool indicator whether this is a copy
request to an existing bucket, bucket subdir, or directory. Default
None value should be used in cases where this is not needed (commands
other than cp).
flat: Bool indicating whether bucket listings should be flattened, i.e.,
so the mapped-to results contain objects spanning subdirectories.
all_versions: Bool indicating whether to iterate over all object versions.
for_all_version_delete: Bool indicating whether this is for an all-version
delete.
Examples of _NameExpansionIterator with flat=True:
- Calling with one of the uri_strs being 'gs://bucket' will enumerate all
top-level objects, as will 'gs://bucket/' and 'gs://bucket/*'.
- 'gs://bucket/**' will enumerate all objects in the bucket.
- 'gs://bucket/abc' will enumerate all next-level objects under directory
abc (i.e., not including subdirectories of abc) if gs://bucket/abc/*
matches any objects; otherwise it will enumerate the single name
gs://bucket/abc
- 'gs://bucket/abc/**' will enumerate all objects under abc or any of its
subdirectories.
- 'file:///tmp' will enumerate all files under /tmp, as will
'file:///tmp/*'
- 'file:///tmp/**' will enumerate all files under /tmp or any of its
subdirectories.
Example if flat=False: calling with gs://bucket/abc/* lists matching objects
or subdirs, but not sub-subdirs or objects beneath subdirs.
Note: In step-by-step comments below we give examples assuming there's a
gs://bucket with object paths:
abcd/o1.txt
abcd/o2.txt
xyz/o1.txt
xyz/o2.txt
and a directory file://dir with file paths:
dir/a.txt
dir/b.txt
dir/c/
"""
self.command_name = command_name
self.proj_id_handler = proj_id_handler
self.headers = headers
self.debug = debug
self.bucket_storage_uri_class = bucket_storage_uri_class
self.suri_builder = StorageUriBuilder(debug, bucket_storage_uri_class)
self.uri_strs = uri_strs
self.recursion_requested = recursion_requested
self.have_existing_dst_container = have_existing_dst_container
self.flat = flat
self.all_versions = all_versions
# Map holding wildcard strings to use for flat vs subdir-by-subdir listings.
# (A flat listing means show all objects expanded all the way down.)
self._flatness_wildcard = {True: '**', False: '*'}
def __iter__(self):
for uri_str in self.uri_strs:
# Step 1: Expand any explicitly specified wildcards. The output from this
# step is an iterator of BucketListingRef.
# Starting with gs://buck*/abc* this step would expand to gs://bucket/abcd
if ContainsWildcard(uri_str):
post_step1_iter = self._WildcardIterator(uri_str)
else:
suri = self.suri_builder.StorageUri(uri_str)
post_step1_iter = iter([BucketListingRef(suri)])
post_step1_iter = PluralityCheckableIterator(post_step1_iter)
# Step 2: Expand bucket subdirs and versions. The output from this
# step is an iterator of (names_container, BucketListingRef).
# Starting with gs://bucket/abcd this step would expand to:
# iter([(True, abcd/o1.txt), (True, abcd/o2.txt)]).
if self.flat and self.recursion_requested:
post_step2_iter = _ImplicitBucketSubdirIterator(self,
post_step1_iter, self.flat)
elif self.all_versions:
post_step2_iter = _AllVersionIterator(self, post_step1_iter,
headers=self.headers)
else:
post_step2_iter = _NonContainerTuplifyIterator(post_step1_iter)
post_step2_iter = PluralityCheckableIterator(post_step2_iter)
# Step 3. Expand directories and buckets. This step yields the iterated
# values. Starting with gs://bucket this step would expand to:
# [abcd/o1.txt, abcd/o2.txt, xyz/o1.txt, xyz/o2.txt]
# Starting with file://dir this step would expand to:
# [dir/a.txt, dir/b.txt, dir/c/]
exp_src_bucket_listing_refs = []
wc = self._flatness_wildcard[self.flat]
src_uri_expands_to_multi = (post_step1_iter.has_plurality()
or post_step2_iter.has_plurality())
is_multi_src_request = (self.uri_strs.has_plurality()
or src_uri_expands_to_multi)
if post_step2_iter.is_empty():
raise CommandException('No URIs matched: %s' % uri_str)
for (names_container, blr) in post_step2_iter:
if (not blr.GetUri().names_container()
and (self.flat or not blr.HasPrefix())):
yield NameExpansionResult(uri_str, is_multi_src_request,
src_uri_expands_to_multi, names_container,
blr.GetUriString(),
self.have_existing_dst_container,
is_latest=blr.IsLatest())
continue
if not self.recursion_requested:
if blr.GetUri().is_file_uri():
desc = 'directory'
else:
desc = 'bucket'
print 'Omitting %s "%s". (Did you mean to do %s -R?)' % (
desc, blr.GetUri(), self.command_name)
continue
if blr.GetUri().is_file_uri():
# Convert dir to implicit recursive wildcard.
uri_to_iterate = '%s/%s' % (blr.GetUriString(), wc)
else:
# Convert bucket to implicit recursive wildcard.
uri_to_iterate = blr.GetUri().clone_replace_name(wc)
wc_iter = PluralityCheckableIterator(
self._WildcardIterator(uri_to_iterate))
src_uri_expands_to_multi = (src_uri_expands_to_multi
or wc_iter.has_plurality())
is_multi_src_request = (self.uri_strs.has_plurality()
or src_uri_expands_to_multi)
for blr in wc_iter:
yield NameExpansionResult(uri_str, is_multi_src_request,
src_uri_expands_to_multi, True,
blr.GetUriString(),
self.have_existing_dst_container,
is_latest=blr.IsLatest())
def _WildcardIterator(self, uri_or_str):
"""
Helper to instantiate gslib.WildcardIterator. Args are same as
gslib.WildcardIterator interface, but this method fills in most of the
values from instance state.
Args:
uri_or_str: StorageUri or URI string naming wildcard objects to iterate.
"""
return wildcard_iterator.wildcard_iterator(
uri_or_str, self.proj_id_handler,
bucket_storage_uri_class=self.bucket_storage_uri_class,
headers=self.headers, debug=self.debug,
all_versions=self.all_versions)
def NameExpansionIterator(command_name, proj_id_handler, headers, debug,
bucket_storage_uri_class, uri_strs,
recursion_requested,
have_existing_dst_container=None, flat=True,
all_versions=False,
for_all_version_delete=False):
"""
Static factory function for instantiating _NameExpansionIterator, which
wraps the resulting iterator in a PluralityCheckableIterator and checks
that it is non-empty. Also allows uri_strs to be either an array or an
iterator.
Args:
command_name: name of command being run.
proj_id_handler: ProjectIdHandler to use for current command.
headers: Dictionary containing optional HTTP headers to pass to boto.
debug: Debug level to pass in to boto connection (range 0..3).
bucket_storage_uri_class: Class to instantiate for cloud StorageUris.
Settable for testing/mocking.
uri_strs: PluralityCheckableIterator of URI strings needing expansion.
recursion_requested: True if -R specified on command-line.
have_existing_dst_container: Bool indicator whether this is a copy
request to an existing bucket, bucket subdir, or directory. Default
None value should be used in cases where this is not needed (commands
other than cp).
flat: Bool indicating whether bucket listings should be flattened, i.e.,
so the mapped-to results contain objects spanning subdirectories.
all_versions: Bool indicating whether to iterate over all object versions.
for_all_version_delete: Bool indicating whether this is for an all-version
delete.
Examples of NameExpansionIterator with flat=True:
- Calling with one of the uri_strs being 'gs://bucket' will enumerate all
top-level objects, as will 'gs://bucket/' and 'gs://bucket/*'.
- 'gs://bucket/**' will enumerate all objects in the bucket.
- 'gs://bucket/abc' will enumerate all next-level objects under directory
abc (i.e., not including subdirectories of abc) if gs://bucket/abc/*
matches any objects; otherwise it will enumerate the single name
gs://bucket/abc
- 'gs://bucket/abc/**' will enumerate all objects under abc or any of its
subdirectories.
- 'file:///tmp' will enumerate all files under /tmp, as will
'file:///tmp/*'
- 'file:///tmp/**' will enumerate all files under /tmp or any of its
subdirectories.
Example if flat=False: calling with gs://bucket/abc/* lists matching objects
or subdirs, but not sub-subdirs or objects beneath subdirs.
Note: In step-by-step comments below we give examples assuming there's a
gs://bucket with object paths:
abcd/o1.txt
abcd/o2.txt
xyz/o1.txt
xyz/o2.txt
and a directory file://dir with file paths:
dir/a.txt
dir/b.txt
dir/c/
"""
uri_strs = PluralityCheckableIterator(uri_strs)
name_expansion_iterator = _NameExpansionIterator(
command_name, proj_id_handler, headers, debug, bucket_storage_uri_class,
uri_strs, recursion_requested, have_existing_dst_container, flat,
all_versions=all_versions, for_all_version_delete=for_all_version_delete)
name_expansion_iterator = PluralityCheckableIterator(name_expansion_iterator)
if name_expansion_iterator.is_empty():
raise CommandException('No URIs matched')
return name_expansion_iterator
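# Illustrative usage sketch (hypothetical values; assumes a ProjectIdHandler
# and boto's BucketStorageUri are available, as they are in gsutil Command
# subclasses):
#
#   name_exp_iter = NameExpansionIterator(
#       'cp', ProjectIdHandler(), headers={}, debug=0,
#       bucket_storage_uri_class=BucketStorageUri,
#       uri_strs=['gs://bucket/abcd'], recursion_requested=True)
#   for name_exp_result in name_exp_iter:
#     print name_exp_result.GetExpandedUriStr()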
class NameExpansionIteratorQueue(object):
"""
Wrapper around NameExpansionIterator that provides a Multiprocessing.Queue
facade.
Only a blocking get() function can be called, and the block and timeout
params on that function are ignored. All other class functions raise
NotImplementedError.
This class is thread safe.
"""
def __init__(self, name_expansion_iterator, final_value):
self.name_expansion_iterator = name_expansion_iterator
self.final_value = final_value
self.lock = threading.Lock()
def qsize(self):
raise NotImplementedError(
"NameExpansionIteratorQueue.qsize() not implemented")
def empty(self):
raise NotImplementedError(
"NameExpansionIteratorQueue.empty() not implemented")
def full(self):
raise NotImplementedError(
"NameExpansionIteratorQueue.full() not implemented")
def put(self, obj=None, block=None, timeout=None):
raise NotImplementedError(
"NameExpansionIteratorQueue.put() not implemented")
def put_nowait(self, obj):
raise NotImplementedError(
"NameExpansionIteratorQueue.put_nowait() not implemented")
def get(self, block=None, timeout=None):
self.lock.acquire()
try:
if self.name_expansion_iterator.is_empty():
return self.final_value
return self.name_expansion_iterator.next()
finally:
self.lock.release()
def get_nowait(self):
raise NotImplementedError(
"NameExpansionIteratorQueue.get_nowait() not implemented")
def get_no_wait(self):
raise NotImplementedError(
"NameExpansionIteratorQueue.get_no_wait() not implemented")
def close(self):
raise NotImplementedError(
"NameExpansionIteratorQueue.close() not implemented")
def join_thread(self):
raise NotImplementedError(
"NameExpansionIteratorQueue.join_thread() not implemented")
def cancel_join_thread(self):
raise NotImplementedError(
"NameExpansionIteratorQueue.cancel_join_thread() not implemented")
class _NonContainerTuplifyIterator(object):
"""
Iterator that produces the tuple (False, blr) for each iteration
of blr_iter. Used for cases where blr_iter iterates over a set of
BucketListingRefs known not to name containers.
"""
def __init__(self, blr_iter):
"""
Args:
blr_iter: iterator of BucketListingRef.
"""
self.blr_iter = blr_iter
def __iter__(self):
for blr in self.blr_iter:
yield (False, blr)
class _ImplicitBucketSubdirIterator(object):
"""
Iterator wrapper that iterates over blr_iter, performing implicit bucket
subdir expansion.
Each iteration yields tuple (names_container, expanded BucketListingRefs)
where names_container is True if the URI names a directory, bucket,
or bucket subdir (unlike StorageUri.names_container(), which doesn't
handle the latter case).
For example, iterating over [BucketListingRef("gs://abc")] would expand to:
[BucketListingRef("gs://abc/o1"), BucketListingRef("gs://abc/o2")]
if those subdir objects exist, and [BucketListingRef("gs://abc")] otherwise.
"""
def __init__(self, name_expansion_instance, blr_iter, flat):
"""
Args:
name_expansion_instance: calling instance of NameExpansion class.
blr_iter: iterator of BucketListingRef.
flat: bool indicating whether bucket listings should be flattened, i.e.,
so the mapped-to results contain objects spanning subdirectories.
"""
self.blr_iter = blr_iter
self.name_expansion_instance = name_expansion_instance
self.flat = flat
def __iter__(self):
for blr in self.blr_iter:
uri = blr.GetUri()
if uri.names_object():
# URI could be a bucket subdir.
implicit_subdir_iterator = PluralityCheckableIterator(
self.name_expansion_instance._WildcardIterator(
self.name_expansion_instance.suri_builder.StorageUri(
'%s/%s' % (uri.uri.rstrip('/'),
self.name_expansion_instance._flatness_wildcard[
self.flat]))))
if not implicit_subdir_iterator.is_empty():
for exp_blr in implicit_subdir_iterator:
yield (True, exp_blr)
else:
yield (False, blr)
else:
yield (False, blr)
class _AllVersionIterator(object):
"""
Iterator wrapper that iterates over blr_iter, performing implicit version
expansion.
Output behavior is identical to that in _ImplicitBucketSubdirIterator above.
For example, iterating over [BucketListingRef("gs://abc/o1")] would expand to:
[BucketListingRef("gs://abc/o1#1234"), BucketListingRef("gs://abc/o1#1235")]
"""
def __init__(self, name_expansion_instance, blr_iter, headers=None):
"""
Args:
name_expansion_instance: calling instance of NameExpansion class.
blr_iter: iterator of BucketListingRef.
headers: Dictionary containing optional HTTP headers to pass to boto.
"""
self.blr_iter = blr_iter
self.name_expansion_instance = name_expansion_instance
self.headers = headers
def __iter__(self):
empty = True
for blr in self.blr_iter:
uri = blr.GetUri()
if not uri.names_object():
empty = False
yield (True, blr)
break
for key in uri.list_bucket(
prefix=uri.object_name, headers=self.headers, all_versions=True):
if key.name != uri.object_name:
# The desired entries will be alphabetically first in this listing.
break
version_blr = BucketListingRef(uri.clone_replace_key(key), key=key)
empty = False
yield (False, version_blr)
# If no version exists, yield the unversioned blr, and let the consuming
# operation fail. This mirrors behavior in _ImplicitBucketSubdirIterator.
if empty:
yield (False, blr)
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This code implements a no-op auth plugin, which allows users to use
# gsutil for accessing publicly readable buckets and objects without first
# signing up for an account.
from boto.auth_handler import AuthHandler
class NoOpAuth(AuthHandler):
capability = ['s3']
def __init__(self, path, config, provider):
pass
def add_auth(self, http_request):
pass
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Iterator wrapper that allows you to check whether the wrapped iterator
is empty and whether it has more than 1 element.
"""
class PluralityCheckableIterator(object):
def __init__(self, it):
self.it = it.__iter__()
self.head = []
# Populate first 2 elems into head so we can check whether iterator has
# more than 1 item.
for i in range(0, 2):
self.__populate_head__()
def __populate_head__(self):
try:
e = self.it.next()
self.underlying_iter_empty = False
self.head.append(e)
except StopIteration:
# Indicates we can no longer call next() on underlying iterator, but
# there could still be elements left to iterate in head.
self.underlying_iter_empty = True
def __iter__(self):
while len(self.head) > 0:
yield self.next()
else:
raise StopIteration()
def next(self):
# Backfill into head each time we pop an element so we can always check
# for emptiness and for has_plurality().
self.__populate_head__()
return self.head.pop(0)
def is_empty(self):
return len(self.head) == 0
def has_plurality(self):
return len(self.head) > 1
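# Example (illustrative, not part of the original module): the wrapper lets
# callers test emptiness and plurality before consuming any elements:
#
#   pci = PluralityCheckableIterator(iter(['a', 'b']))
#   pci.is_empty()          # False
#   pci.has_plurality()     # True
#   [elem for elem in pci]  # ['a', 'b'] - buffered head elements still yielded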
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import boto
from gslib.exception import ProjectIdException
from gslib.wildcard_iterator import WILDCARD_BUCKET_ITERATOR
GOOG_PROJ_ID_HDR = 'x-goog-project-id'
class ProjectIdHandler(object):
"""Google Project ID header handling."""
def __init__(self):
"""Instantiates Project ID handler. Call after boto config file loaded."""
config = boto.config
self.project_id = config.get_value('GSUtil', 'default_project_id')
def SetProjectId(self, project_id):
"""Overrides project ID value from config file default.
Args:
project_id: Project ID to use.
"""
self.project_id = project_id
def FillInProjectHeaderIfNeeded(self, command, uri, headers):
"""Fills project ID header into headers if defined and applicable.
Args:
command: The command being run.
uri: The URI against which this command is being run.
headers: Dictionary containing optional HTTP headers to pass to boto.
Must not be None.
"""
# We only include the project ID header if it's a GS URI and a project_id
# was specified and
# (it's an 'mb', 'disablelogging', or 'enablelogging' command or
# a boto request in integration tests or
# (an 'ls' command that doesn't specify a bucket or wildcarded bucket)).
if (uri.scheme.lower() == 'gs' and self.project_id
and (command == 'mb' or command == 'disablelogging'
or command == 'enablelogging'
or command == 'test'
or (command == 'ls' and not uri.names_bucket())
or (command == WILDCARD_BUCKET_ITERATOR))):
# Note: check for None (as opposed to "not headers") here because
# it's ok to pass empty headers.
if headers is None:
raise ProjectIdException(
'FillInProjectHeaderIfNeeded called with headers=None')
headers[GOOG_PROJ_ID_HDR] = self.project_id
elif headers.has_key(GOOG_PROJ_ID_HDR):
del headers[GOOG_PROJ_ID_HDR]
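# Illustrative usage (hypothetical values; mb_uri stands in for the StorageUri
# the command operates on):
#
#   proj_id_handler = ProjectIdHandler()
#   proj_id_handler.SetProjectId('my-project-id')
#   headers = {}
#   proj_id_handler.FillInProjectHeaderIfNeeded('mb', mb_uri, headers)
#   # For a gs:// URI and an 'mb' command, headers now contains
#   # {'x-goog-project-id': 'my-project-id'}.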
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Class that holds state (bucket_storage_uri_class and debug) needed for
instantiating StorageUri objects. The StorageUri func defined in this class
uses that state plus gsutil default flag values to instantiate this frequently
constructed object with just one param for most cases.
"""
import boto
from gslib.exception import CommandException
class StorageUriBuilder(object):
def __init__(self, debug, bucket_storage_uri_class):
"""
Args:
debug: Debug level to pass in to boto connection (range 0..3).
bucket_storage_uri_class: Class to instantiate for cloud StorageUris.
Settable for testing/mocking.
"""
self.bucket_storage_uri_class = bucket_storage_uri_class
self.debug = debug
def StorageUri(self, uri_str, is_latest=False):
"""
Instantiates StorageUri using class state and gsutil default flag values.
Args:
uri_str: StorageUri naming bucket or object.
is_latest: boolean indicating whether this versioned object represents the
current version.
Returns:
boto.StorageUri for given uri_str.
Raises:
InvalidUriError: if uri_str not valid.
"""
return boto.storage_uri(
uri_str, 'file', debug=self.debug, validate=False,
bucket_storage_uri_class=self.bucket_storage_uri_class,
suppress_consec_slashes=False, is_latest=is_latest)
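# Illustrative usage (hypothetical URI string; BucketStorageUri comes from
# boto.storage_uri in the callers that construct this builder):
#
#   suri_builder = StorageUriBuilder(0, BucketStorageUri)
#   uri = suri_builder.StorageUri('gs://bucket/object')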
# Copyright 2011 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Basic thread pool with exception handler."""
import logging
import Queue
import threading
# Magic values used to cleanly bring down threads.
_THREAD_EXIT_MAGIC = ('Clean', 'Thread', 'Exit')
def _DefaultExceptionHandler(e):
logging.exception(e)
class Worker(threading.Thread):
"""Thread executing tasks from a given task's queue."""
def __init__(self, tasks, exception_handler):
threading.Thread.__init__(self)
self.tasks = tasks
self.daemon = True
self.exception_handler = exception_handler
self.start()
def run(self):
while True:
func, args, kargs = self.tasks.get()
# Listen for magic value indicating thread exit.
if (func, args, kargs) == _THREAD_EXIT_MAGIC:
break
try:
func(*args, **kargs)
except Exception, e:
self.exception_handler(e)
finally:
self.tasks.task_done()
class ThreadPool(object):
"""Pool of threads consuming tasks from a queue."""
def __init__(self, num_threads, exception_handler=_DefaultExceptionHandler):
self.tasks = Queue.Queue(num_threads)
self.threads = []
for _ in range(num_threads):
self.threads.append(Worker(self.tasks, exception_handler))
def AddTask(self, func, *args, **kargs):
"""Add a task to the queue."""
self.tasks.put((func, args, kargs))
def WaitCompletion(self):
"""Wait for completion of all the tasks in the queue."""
self.tasks.join()
def Shutdown(self):
"""Shutdown the thread pool."""
for thread in self.threads:
self.tasks.put(_THREAD_EXIT_MAGIC)
for thread in self.threads:
thread.join()
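# Example usage (illustrative, not part of the original module):
#
#   def _PrintName(name):
#     print 'hello %s' % name
#
#   pool = ThreadPool(3)
#   for name in ('a', 'b', 'c'):
#     pool.AddTask(_PrintName, name)
#   pool.WaitCompletion()  # Block until all queued tasks have run.
#   pool.Shutdown()        # Send exit sentinels and join the worker threads.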
# Copyright 2010 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Static data and helper functions."""
import math
import re
import sys
import time
import boto
from third_party.retry_decorator.decorators import retry
# We don't use the oauth2 authentication plugin directly; importing it here
# ensures that it's loaded and available by default. Note: we made this static
# state instead of Command instance state because the top-level gsutil code
# needs to check it.
HAVE_OAUTH2 = False
try:
from oauth2_plugin import oauth2_helper
HAVE_OAUTH2 = True
except ImportError:
pass
TWO_MB = 2 * 1024 * 1024
NO_MAX = sys.maxint
# Binary exponentiation strings.
_EXP_STRINGS = [
(0, 'B', 'bit'),
(10, 'KB', 'kbit'),
(20, 'MB', 'Mbit'),
(30, 'GB', 'Gbit'),
(40, 'TB', 'Tbit'),
(50, 'PB', 'Pbit'),
]
# Detect platform types.
IS_WINDOWS = 'win32' in str(sys.platform).lower()
IS_LINUX = 'linux' in str(sys.platform).lower()
IS_OSX = 'darwin' in str(sys.platform).lower()
Retry = retry
# Enum class for specifying listing style.
class ListingStyle(object):
SHORT = 'SHORT'
LONG = 'LONG'
LONG_LONG = 'LONG_LONG'
def HasConfiguredCredentials():
"""Determines if boto credential/config file exists."""
config = boto.config
has_goog_creds = (config.has_option('Credentials', 'gs_access_key_id') and
config.has_option('Credentials', 'gs_secret_access_key'))
has_amzn_creds = (config.has_option('Credentials', 'aws_access_key_id') and
config.has_option('Credentials', 'aws_secret_access_key'))
has_oauth_creds = (HAVE_OAUTH2 and
config.has_option('Credentials', 'gs_oauth2_refresh_token'))
has_auth_plugins = config.has_option('Plugin', 'plugin_directory')
return (has_goog_creds or has_amzn_creds or has_oauth_creds
or has_auth_plugins)
def _RoundToNearestExponent(num):
i = 0
while i+1 < len(_EXP_STRINGS) and num >= (2 ** _EXP_STRINGS[i+1][0]):
i += 1
return i, round(float(num) / 2 ** _EXP_STRINGS[i][0], 2)
def MakeHumanReadable(num):
"""Generates human readable string for a number of bytes.
Args:
num: The number, in bytes.
Returns:
A string form of the number using size abbreviations (KB, MB, etc.).
"""
i, rounded_val = _RoundToNearestExponent(num)
return '%s %s' % (rounded_val, _EXP_STRINGS[i][1])
def MakeBitsHumanReadable(num):
"""Generates human readable string for a number of bits.
Args:
num: The number, in bits.
Returns:
A string form of the number using bit size abbreviations (kbit, Mbit, etc.)
"""
i, rounded_val = _RoundToNearestExponent(num)
return '%s %s' % (rounded_val, _EXP_STRINGS[i][2])
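# Examples (illustrative):
#
#   MakeHumanReadable(2048)      # -> '2.0 KB'
#   MakeBitsHumanReadable(2048)  # -> '2.0 kbit'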
def Percentile(values, percent, key=lambda x:x):
"""Find the percentile of a list of values.
Taken from: http://code.activestate.com/recipes/511478/
Args:
values: a list of numeric values. Note that the values MUST BE already
sorted.
percent: a float value from 0.0 to 1.0.
key: optional key function to compute value from each element of the list
of values.
Returns:
The percentile of the values.
"""
if not values:
return None
k = (len(values) - 1) * percent
f = math.floor(k)
c = math.ceil(k)
if f == c:
return key(values[int(k)])
d0 = key(values[int(f)]) * (c-k)
d1 = key(values[int(c)]) * (k-f)
return d0 + d1
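# Worked example (illustrative): for the sorted list [1, 2, 3, 4] and
# percent=0.5, k = 1.5, so the result interpolates between values[1] and
# values[2]:
#
#   Percentile([1, 2, 3, 4], 0.5)  # -> 2.5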
def ExtractErrorDetail(e):
"""Extract <Details> text from XML content.
Args:
e: The GSResponseError that includes XML to be parsed.
Returns:
(exception_name, d), where d is <Details> text or None if not found.
"""
exc_name_parts = re.split("[\.']", str(type(e)))
if len(exc_name_parts) < 2:
# Shouldn't happen, but have fallback in case.
exc_name = str(type(e))
else:
exc_name = exc_name_parts[-2]
if not hasattr(e, 'body'):
return (exc_name, None)
detail_start = e.body.find('<Details>')
detail_end = e.body.find('</Details>')
if detail_start != -1 and detail_end != -1:
return (exc_name, e.body[detail_start+9:detail_end])
return (exc_name, None)
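# Illustrative example (hypothetical error): for a boto GSResponseError whose
# body contains '<Details>Bucket is not empty</Details>', ExtractErrorDetail
# would return ('GSResponseError', 'Bucket is not empty').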
# Copyright 2010 Google Inc. All Rights Reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish, dis-
# tribute, sublicense, and/or sell copies of the Software, and to permit
# persons to whom the Software is furnished to do so, subject to the fol-
# lowing conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABIL-
# ITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
# SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
"""Implementation of wildcarding over StorageUris.
StorageUri is an abstraction that Google introduced in the boto library,
for representing storage provider-independent bucket and object names with
a shorthand URI-like syntax (see boto/boto/storage_uri.py). The current
class provides wildcarding support for StorageUri objects (including both
bucket and file system objects), allowing one to express collections of
objects with syntax like the following:
gs://mybucket/images/*.png
file:///tmp/???abc???
We provide wildcarding support as part of gsutil rather than as part
of boto because wildcarding is really part of shell command-like
functionality.
A comment about wildcard semantics: We support both single path component
wildcards (e.g., using '*') and recursive wildcards (using '**'), for both
file and cloud URIs. For example,
gs://bucket/doc/*/*.html
would enumerate HTML files one directory down from gs://bucket/doc, while
gs://bucket/**/*.html
would enumerate HTML files in all objects contained in the bucket.
Note also that if you use file system wildcards it's likely your shell
interprets the wildcarding before passing the command to gsutil. For example:
% gsutil cp /opt/eclipse/*/*.html gs://bucket/eclipse
would likely be expanded by the shell into the following before running gsutil:
% gsutil cp /opt/eclipse/RUNNING.html gs://bucket/eclipse
Note also that most shells don't support '**' wildcarding (I think only
zsh does). If you want to use '**' wildcarding with such a shell you can
single quote each wildcarded string, so it gets passed uninterpreted by the
shell to gsutil (at which point gsutil will perform the wildcarding expansion):
% gsutil cp '/opt/eclipse/**/*.html' gs://bucket/eclipse
"""
import boto
import fnmatch
import glob
import os
import re
import sys
import urllib
from boto.s3.prefix import Prefix
from boto.storage_uri import BucketStorageUri
from bucket_listing_ref import BucketListingRef
# Regex to determine if a string contains any wildcards.
WILDCARD_REGEX = re.compile('[*?\[\]]')
WILDCARD_OBJECT_ITERATOR = 'wildcard_object_iterator'
WILDCARD_BUCKET_ITERATOR = 'wildcard_bucket_iterator'
class WildcardIterator(object):
"""Base class for wildcarding over StorageUris.
This class implements support for iterating over StorageUris that
contain wildcards.
The base class is abstract; you should instantiate using the
wildcard_iterator() static factory method, which chooses the right
implementation depending on the StorageUri.
"""
def __repr__(self):
"""Returns string representation of WildcardIterator."""
return 'WildcardIterator(%s)' % self.wildcard_uri
class CloudWildcardIterator(WildcardIterator):
"""WildcardIterator subclass for buckets and objects.
Iterates over BucketListingRef matching the StorageUri wildcard. It's
much more efficient to request the Key from the BucketListingRef (via
GetKey()) than to request the StorageUri and then call uri.get_key()
to retrieve the key, for cases where you want to get metadata that's
available in the Bucket (for example to get the name and size of
each object), because that information is available in the bucket GET
results. If you were to iterate over URIs for such cases and then get
the name and size info from each resulting StorageUri, it would cause
an additional object GET request for each of the result URIs.
"""
def __init__(self, wildcard_uri, proj_id_handler,
bucket_storage_uri_class=BucketStorageUri, all_versions=False,
headers=None, debug=0):
"""
Instantiates an iterator over BucketListingRef matching given wildcard URI.
Args:
wildcard_uri: StorageUri that contains the wildcard to iterate.
proj_id_handler: ProjectIdHandler to use for current command.
bucket_storage_uri_class: BucketStorageUri interface.
Settable for testing/mocking.
all_versions: Bool indicating whether to iterate over all object versions.
headers: Dictionary containing optional HTTP headers to pass to boto.
debug: Debug level to pass in to boto connection (range 0..3).
"""
self.wildcard_uri = wildcard_uri
# Make a copy of the headers so any updates we make during wildcard
# expansion aren't left in the input params (specifically, so we don't
# include the x-goog-project-id header needed by a subset of cases, in
# the data returned to caller, which could then be used in other cases
# where that header must not be passed).
if headers is None:
self.headers = {}
else:
self.headers = headers.copy()
self.proj_id_handler = proj_id_handler
self.bucket_storage_uri_class = bucket_storage_uri_class
self.all_versions = all_versions
self.debug = debug
def __iter__(self):
"""Python iterator that gets called when iterating over cloud wildcard.
Yields:
BucketListingRef, or empty iterator if no matches.
"""
# First handle bucket wildcarding, if any.
if ContainsWildcard(self.wildcard_uri.bucket_name):
regex = fnmatch.translate(self.wildcard_uri.bucket_name)
bucket_uris = []
prog = re.compile(regex)
self.proj_id_handler.FillInProjectHeaderIfNeeded(WILDCARD_BUCKET_ITERATOR,
self.wildcard_uri,
self.headers)
for b in self.wildcard_uri.get_all_buckets(headers=self.headers):
if prog.match(b.name):
# Use str(b.name) because get_all_buckets() returns Unicode
# string, which when used to construct x-goog-copy-src metadata
# requests for object-to-object copies causes pathname '/' chars
# to be entity-encoded (bucket%2Fdir instead of bucket/dir),
# which causes the request to fail.
uri_str = '%s://%s' % (self.wildcard_uri.scheme,
urllib.quote_plus(str(b.name)))
bucket_uris.append(
boto.storage_uri(
uri_str, debug=self.debug,
bucket_storage_uri_class=self.bucket_storage_uri_class,
suppress_consec_slashes=False))
else:
bucket_uris = [self.wildcard_uri.clone_replace_name('')]
# Now iterate over bucket(s), and handle object wildcarding, if any.
self.proj_id_handler.FillInProjectHeaderIfNeeded(WILDCARD_OBJECT_ITERATOR,
self.wildcard_uri,
self.headers)
for bucket_uri in bucket_uris:
if self.wildcard_uri.names_bucket():
# Bucket-only URI.
yield BucketListingRef(bucket_uri, key=None, prefix=None,
headers=self.headers)
else:
# URI contains an object name. If there's no wildcard just yield
# the needed URI.
if not ContainsWildcard(self.wildcard_uri.object_name):
uri_to_yield = bucket_uri.clone_replace_name(
self.wildcard_uri.object_name)
yield BucketListingRef(uri_to_yield, key=None, prefix=None,
headers=self.headers)
else:
# URI contains a wildcard. Expand iteratively by building
# prefix/delimiter bucket listing request, filtering the results per
# the current level's wildcard, and continuing with the next component
# of the wildcard. See _BuildBucketFilterStrings() documentation
# for details.
#
# Initialize the iteration with bucket name from bucket_uri but
# object name from self.wildcard_uri. This is needed to handle cases
# where both the bucket and object names contain wildcards.
uris_needing_expansion = [
bucket_uri.clone_replace_name(self.wildcard_uri.object_name)]
while len(uris_needing_expansion) > 0:
uri = uris_needing_expansion.pop(0)
(prefix, delimiter, prefix_wildcard, suffix_wildcard) = (
self._BuildBucketFilterStrings(uri.object_name))
prog = re.compile(fnmatch.translate(prefix_wildcard))
# List bucket for objects matching prefix up to delimiter.
for key in bucket_uri.list_bucket(prefix=prefix,
delimiter=delimiter,
headers=self.headers,
all_versions=self.all_versions):
# Check that the prefix regex matches rstripped key.name (to
# correspond with the rstripped prefix_wildcard from
# _BuildBucketFilterStrings()).
if prog.match(key.name.rstrip('/')):
if suffix_wildcard and key.name.rstrip('/') != suffix_wildcard:
if isinstance(key, Prefix):
# There's more wildcard left to expand.
uris_needing_expansion.append(
uri.clone_replace_name(key.name.rstrip('/') + '/'
+ suffix_wildcard))
else:
# Done expanding.
expanded_uri = uri.clone_replace_key(key)
if isinstance(key, Prefix):
yield BucketListingRef(expanded_uri, key=None, prefix=key,
headers=self.headers)
else:
if self.all_versions:
yield BucketListingRef(expanded_uri, key=key, prefix=None,
headers=self.headers)
else:
# Yield BLR wrapping version-less URI.
yield BucketListingRef(expanded_uri.clone_replace_name(
expanded_uri.object_name), key=key, prefix=None,
headers=self.headers)
def _BuildBucketFilterStrings(self, wildcard):
"""
Builds strings needed for querying a bucket and filtering results to
implement wildcard object name matching.
Args:
wildcard: The wildcard string to match to objects.
Returns:
(prefix, delimiter, prefix_wildcard, suffix_wildcard)
where:
prefix is the prefix to be sent in bucket GET request.
delimiter is the delimiter to be sent in bucket GET request.
prefix_wildcard is the wildcard to be used to filter bucket GET results.
suffix_wildcard is wildcard to be appended to filtered bucket GET
results for next wildcard expansion iteration.
For example, given the wildcard gs://bucket/abc/d*e/f*.txt we
would build prefix=abc/d, delimiter=/, prefix_wildcard=abc/d*e, and
suffix_wildcard=f*.txt. Using this prefix and delimiter for a bucket
listing request will then produce a listing result set that can be
filtered using this prefix_wildcard; and we'd use this suffix_wildcard
to feed into the next call(s) to _BuildBucketFilterStrings(), for the
next iteration of listing/filtering.
Raises:
AssertionError if wildcard doesn't contain any wildcard chars.
"""
# Generate a request prefix if the object name part of the wildcard starts
# with a non-wildcard string (e.g., that's true for 'gs://bucket/abc*xyz').
match = WILDCARD_REGEX.search(wildcard)
if not match:
# Input "wildcard" has no wildcard chars, so just return tuple that will
# cause a bucket listing to match the given input wildcard. Example: if
# previous iteration yielded gs://bucket/dir/ with suffix_wildcard abc,
# the next iteration will call _BuildBucketFilterStrings() with
# gs://bucket/dir/abc, and we will return prefix ='dir/abc',
# delimiter='/', prefix_wildcard='dir/abc', and suffix_wildcard=''.
prefix = wildcard
delimiter = '/'
prefix_wildcard = wildcard
suffix_wildcard = ''
else:
if match.start() > 0:
# Wildcard does not occur at beginning of object name, so construct a
# prefix string to send to server.
prefix = wildcard[:match.start()]
wildcard_part = wildcard[match.start():]
else:
prefix = None
wildcard_part = wildcard
end = wildcard_part.find('/')
if end != -1:
wildcard_part = wildcard_part[:end+1]
# Remove trailing '/' so we will match gs://bucket/abc* as well as
# gs://bucket/abc*/ with the same wildcard regex.
prefix_wildcard = ((prefix or '') + wildcard_part).rstrip('/')
suffix_wildcard = wildcard[match.end():]
end = suffix_wildcard.find('/')
if end == -1:
suffix_wildcard = ''
else:
suffix_wildcard = suffix_wildcard[end+1:]
# To implement recursive (**) wildcarding, if prefix_wildcard contains
# '**' don't send a delimiter, and append suffix_wildcard to the end of
# prefix_wildcard.
if prefix_wildcard.find('**') != -1:
delimiter = None
prefix_wildcard = prefix_wildcard + suffix_wildcard
suffix_wildcard = ''
else:
delimiter = '/'
delim_pos = suffix_wildcard.find(delimiter)
# The following debug output is useful for tracing how the algorithm
# walks through a multi-part wildcard like gs://bucket/abc/d*e/f*.txt
if self.debug > 1:
sys.stderr.write(
'DEBUG: wildcard=%s, prefix=%s, delimiter=%s, '
'prefix_wildcard=%s, suffix_wildcard=%s\n' %
(wildcard, prefix, delimiter, prefix_wildcard, suffix_wildcard))
return (prefix, delimiter, prefix_wildcard, suffix_wildcard)
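# Worked example (illustrative, following the code above): for the
# single-level wildcard 'abc/d*.txt' this method returns
#
#   ('abc/d', '/', 'abc/d*.txt', '')
#
# so one prefix/delimiter bucket listing, filtered with the fnmatch-translated
# 'abc/d*.txt', completes the expansion with no further iterations.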
def IterKeys(self):
"""
Convenience iterator that runs underlying iterator and returns Key for each
iteration.
Yields:
Subclass of boto.s3.key.Key, or empty iterator if no matches.
Raises:
WildcardException: for bucket-only uri.
"""
for bucket_listing_ref in self.__iter__():
if bucket_listing_ref.HasKey():
yield bucket_listing_ref.GetKey()
def IterUris(self):
"""
Convenience iterator that runs underlying iterator and returns StorageUri
for each iteration.
Yields:
StorageUri, or empty iterator if no matches.
"""
for bucket_listing_ref in self.__iter__():
yield bucket_listing_ref.GetUri()
def IterUrisForKeys(self):
"""
Convenience iterator that runs underlying iterator and returns the
StorageUri for each iterated BucketListingRef that has a Key.
Yields:
StorageUri, or empty iterator if no matches.
"""
for bucket_listing_ref in self.__iter__():
if bucket_listing_ref.HasKey():
yield bucket_listing_ref.GetUri()
class FileWildcardIterator(WildcardIterator):
"""WildcardIterator subclass for files and directories.
If you use recursive wildcards ('**') only a single such wildcard is
supported. For example you could use the wildcard '**/*.txt' to list all .txt
files in any subdirectory of the current directory, but you couldn't use a
wildcard like '**/abc/**/*.txt' (which would, if supported, let you find .txt
files in any subdirectory named 'abc').
"""
def __init__(self, wildcard_uri, headers=None, debug=0):
"""
Instantiate an iterator over BucketListingRefs matching given wildcard URI.
Args:
wildcard_uri: StorageUri that contains the wildcard to iterate.
headers: Dictionary containing optional HTTP headers to pass to boto.
debug: Debug level to pass in to boto connection (range 0..3).
"""
self.wildcard_uri = wildcard_uri
self.headers = headers
self.debug = debug
def __iter__(self):
wildcard = self.wildcard_uri.object_name
match = re.search('\*\*', wildcard)
if match:
# Recursive wildcarding request ('.../**/...').
# Example input: wildcard = '/tmp/tmp2pQJAX/**/*'
base_dir = wildcard[:match.start()-1]
remaining_wildcard = wildcard[match.start()+2:]
# At this point for the above example base_dir = '/tmp/tmp2pQJAX' and
# remaining_wildcard = '/*'
if remaining_wildcard.startswith('*'):
raise WildcardException('Invalid wildcard with more than 2 consecutive '
'*s (%s)' % wildcard)
# If there was no remaining wildcard past the recursive wildcard,
# treat it as if it were a '*'. For example, file://tmp/** is equivalent
# to file://tmp/**/*
if not remaining_wildcard:
remaining_wildcard = '*'
# Skip slash(es).
remaining_wildcard = remaining_wildcard.lstrip(os.sep)
filepaths = []
for dirpath, unused_dirnames, filenames in os.walk(base_dir):
filepaths.extend(
os.path.join(dirpath, f) for f in fnmatch.filter(filenames,
remaining_wildcard)
)
else:
# Not a recursive wildcarding request.
filepaths = glob.glob(wildcard)
for filepath in filepaths:
expanded_uri = self.wildcard_uri.clone_replace_name(filepath)
yield BucketListingRef(expanded_uri)
def IterKeys(self):
"""
Placeholder to allow polymorphic use of WildcardIterator.
Raises:
WildcardException: in all cases.
"""
raise WildcardException(
'Iterating over Keys not possible for file wildcards')
def IterUris(self):
"""
Convenience iterator that runs underlying iterator and returns StorageUri
for each iteration.
Yields:
StorageUri, or empty iterator if no matches.
"""
for bucket_listing_ref in self.__iter__():
yield bucket_listing_ref.GetUri()
class WildcardException(StandardError):
"""Exception thrown for invalid wildcard URIs."""
def __init__(self, reason):
StandardError.__init__(self)
self.reason = reason
def __repr__(self):
return 'WildcardException: %s' % self.reason
def __str__(self):
return 'WildcardException: %s' % self.reason
def wildcard_iterator(uri_or_str, proj_id_handler,
bucket_storage_uri_class=BucketStorageUri,
all_versions=False,
headers=None, debug=0):
"""Instantiate a WildCardIterator for the given StorageUri.
Args:
uri_or_str: StorageUri or URI string naming wildcard objects to iterate.
proj_id_handler: ProjectIdHandler to use for current command.
bucket_storage_uri_class: BucketStorageUri interface.
Settable for testing/mocking.
headers: Dictionary containing optional HTTP headers to pass to boto.
debug: Debug level to pass in to boto connection (range 0..3).
Returns:
A WildcardIterator that handles the requested iteration.
"""
if isinstance(uri_or_str, basestring):
# Disable enforce_bucket_naming, to allow bucket names containing wildcard
# chars.
uri = boto.storage_uri(
uri_or_str, debug=debug, validate=False,
bucket_storage_uri_class=bucket_storage_uri_class,
suppress_consec_slashes=False)
else:
uri = uri_or_str
if uri.is_cloud_uri():
return CloudWildcardIterator(
uri, proj_id_handler,
bucket_storage_uri_class=bucket_storage_uri_class,
all_versions=all_versions,
headers=headers,
debug=debug)
elif uri.is_file_uri():
return FileWildcardIterator(uri, headers=headers, debug=debug)
else:
raise WildcardException('Unexpected type of StorageUri (%s)' % uri)
def ContainsWildcard(uri_or_str):
"""Checks whether uri_or_str contains a wildcard.
Args:
uri_or_str: StorageUri or URI string to check.
Returns:
bool indicator.
"""
if isinstance(uri_or_str, basestring):
return bool(WILDCARD_REGEX.search(uri_or_str))
else:
return bool(WILDCARD_REGEX.search(uri_or_str.uri))
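# Illustrative usage (hypothetical URI; proj_id_handler is a ProjectIdHandler
# as defined in gslib/project_id.py):
#
#   if ContainsWildcard('gs://bucket/images/*.png'):  # True
#     for uri in wildcard_iterator('gs://bucket/images/*.png',
#                                  proj_id_handler).IterUris():
#       print uri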