Posts

Sorted by New

Wiki Contributions

Comments

Sorted by

Looks like there are two different problems here:

  • How to use process substitution in Python
  • How to write pipelines in a more compact way like in the shell

For the first one, I would rewrite your shell code as follows:

from subprocess import Popen, PIPE
dl1 = Popen(["aws", "s3", "cp", path1, "-"], stdout=PIPE)
gunzip1 = Popen(["gunzip"], stdin=dl1.stdout, stdout=PIPE)
gunzip1_fd = gunzip1.stdout.fileno()
dl2 = Popen(["aws", "s3", "cp", path2, "-"], stdout=PIPE)
gunzip2 = Popen(["gunzip"], stdin=dl2.stdout, stdout=PIPE)
gunzip2_fd = gunzip2.stdout.fileno()
cmd = Popen(["cmd", "-1", f"/dev/fd/{gunzip1_fd}", "-2", f"/dev/fd/{gunzip2_fd}"],
    stdout=PIPE, pass_fds=[gunzip1_fd, gunzip2_fd])
gzip = Popen(["gzip"], stdin=cmd.stdout)
upload = Popen(["aws", "s3", "cp", "-", pathOut], stdin=gzip.stdout)
for p in dl1, gunzip1, dl2, gunzip2, cmd, gzip:
   p.stdout.close()
outs, errs = upload.communicate()

This assumes /dev/fd support on your platform. Shells fall back to named pipes on platforms where /dev/fd is not present, but I haven't reproduced that behavior here.

For expressing pipelines in a more compact and intuitive way, PyPI has packages like sh or plumbum. While they don't seem to support process substitution out of the box, it looks like it wouldn't be too hard to extend them.