In my bioinformatics work I often stream files between Linux hosts and Amazon S3. This can look like:

$ scp host:/path/to/file /dev/stdout | \
    aws s3 cp - s3://bucket/path/to/file

This recently stopped working after an OpenSSH upgrade:

ftruncate "/dev/stdout": Invalid argument
Couldn't write to "/dev/stdout": Illegal seek

I think I figured out why this is happening: after the upgrade scp defaults to the SFTP protocol for transfers, [1] and the SFTP code truncates and seeks its destination as it writes. That's fine for a regular file, but /dev/stdout in this pipeline is a pipe, which supports neither operation.
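
A quick way to see the underlying issue: when stdout is a pipe, nothing can truncate or seek it, and those are exactly the operations the SFTP-based scp attempts on its destination. For example, something like:

$ truncate -s 0 /dev/stdout | cat
# should fail with "Invalid argument": /dev/stdout here is a pipe, not a
# seekable regular file, so the ftruncate() call is rejected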

With scp I can give the -O flag:

Use the legacy SCP protocol for file transfers instead of the SFTP protocol. Forcing the use of the SCP protocol may be necessary for servers that do not implement SFTP, for backwards-compatibility for particular filename wildcard patterns and for expanding paths with a '~' prefix for older SFTP servers.
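
Concretely, that means rerunning the same pipeline with -O added:

$ scp -O host:/path/to/file /dev/stdout | \
    aws s3 cp - s3://bucket/path/to/file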

This does work, but it doesn't seem ideal: servers will presumably drop support for the legacy SCP protocol at some point. I've filed a bug with OpenSSH.


[1] "man scp" gives me: "Since OpenSSH 8.8 (8.7 in Red Hat/Fedora builds), scp has used the SFTP protocol for transfers by default."

Comment via: facebook, mastodon

4 comments:

Using scp to stdout looks weird to me no matter what. Why not

ssh -n host cat /path/to/file | weird-aws-stuff

... but do you really want to copy everything twice? Why not run weird-aws-stuff on the remote host itself?
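
Spelled out with the S3 destination from the post in place of weird-aws-stuff, that would be something like:

$ ssh -n host cat /path/to/file | \
    aws s3 cp - s3://bucket/path/to/file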

The remote host supports SCP and SFTP, but not running arbitrary commands over SSH.

The bigger problem is that there is no standard data format or standard tool for moving data bigger than 5 GB. (This includes aws s3, which is not an industry standard.)

Whoever builds the industry standard will get decision-making power over your specific issue. 

Is there something special about 5GB?

What's wrong with streaming?