
Data Structures in PG


1 Hash & TapeSet


@startuml
class LogicalTapeSet {
    + BufFile pfile
    + SharedFileSet fileset
    + int worker
    + long nBlocksAllocated
    + long nBlocksWritten
    + long nHoleBlocks
    + bool forgetFreeSpace
    + long freeBlocks
    + long nFreeBlocks
    + Size freeBlocksLen
    + bool enable_prealloc
}

class LogicalTape {
    + LogicalTapeSet tapeSet
    + bool writing
    + bool frozen
    + bool dirty
    + long firstBlockNumber
    + long curBlockNumber
    + long nextBlockNumber
    + long offsetBlockNumber
    + char buffer
    + int buffer_size
    + int max_size
    + int pos
    + int nbytes
    + long prealloc
    + int nprealloc
    + int prealloc_size
}



class BufFile {
    + int numFiles
    + File files
    + bool isInterXact
    + bool dirty
    + bool readOnly
    + FileSet fileset
    + const name
    + ResourceOwner resowner
    + int curFile
    + off_t curOffset
    + int pos
    + int nbytes
    + PGAlignedBlock buffer
}

LogicalTapeSet *-- BufFile

class HashAggSpill {
+ int npartitions
+ LogicalTape partitions
+ int64 ntuples
+ uint32 mask
+ int shift
+ hyperLogLogState hll_card
}


HashAggSpill *-- LogicalTape

LogicalTape - LogicalTapeSet
@enduml
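
The spill side of hash aggregation ties these structures together: when the hash table exceeds its memory budget, tuples are written to one of HashAggSpill's partition tapes, and the partition is chosen from the tuple's hash value using the mask and shift fields. Below is a minimal C sketch of that partition selection, written against the field names in the diagram; HashAggSpillSketch and spill_partition() are illustrative stand-ins, not the actual nodeAgg.c code.

/*
 * Minimal sketch (not the actual nodeAgg.c code): how a HashAggSpill-style
 * structure can route a spilled tuple to one of its partition tapes using
 * the tuple's hash value.  Field names follow the diagram above.
 */
#include <stdint.h>

typedef struct LogicalTape LogicalTape;          /* opaque, see logtape.c */

typedef struct HashAggSpillSketch
{
    int           npartitions;   /* number of spill partitions (a power of 2) */
    LogicalTape **partitions;    /* one logical tape per partition */
    int64_t       ntuples;       /* tuples spilled so far */
    uint32_t      mask;          /* npartitions - 1 */
    int           shift;         /* hash bits already consumed by bucket selection */
} HashAggSpillSketch;

/* Pick the partition tape for a tuple with the given hash value. */
static int
spill_partition(HashAggSpillSketch *spill, uint32_t hash)
{
    /* use high-order bits that were not used to pick the hash bucket */
    return (int) ((hash >> spill->shift) & spill->mask);
}

In the real executor, the selected partition's LogicalTape then receives the serialized minimal tuple through the logical tape write routine, and each partition is later rewound and re-aggregated, recursing if a partition still does not fit in memory.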

2 Slot & tuple


@startuml
class TupleTableSlot {
  + NodeTag type
  + int tts_flags
  + AttrNumber tts_nvalid
  + const tts_ops
  + TupleDesc tts_tupleDescriptor
  + int tts_values
  + int tts_isnull
  + int tts_mcxt
  + ItemPointerData tts_tid
  + int tts_tableOid
}




class MinimalTupleData {
  + int t_len
  + char mt_padding
  + int t_infomask2
  + int t_infomask
  + int t_hoff
  + int t_bits
}


class HeapTupleHeaderData {
  + union t_choice
  + ItemPointerData t_ctid
  + int t_infomask2
  + int t_infomask
  + int t_hoff
  + int t_bits
}

class union {
  + HeapTupleFields t_heap
  + DatumTupleFields t_datum
}


HeapTupleHeaderData *-- union

class MinimalTupleTableSlot {
  + TupleTableSlot base
  + HeapTuple tuple
  + MinimalTuple mintuple
  + HeapTupleData minhdr
  + int off
}

class HeapTupleData {
  + int t_len
  + ItemPointerData t_self
  + int t_tableOid
  + HeapTupleHeader t_data
}


class VirtualTupleTableSlot {
  + TupleTableSlot base
  + char data
}

class HeapTupleTableSlot {
  + TupleTableSlot base
  + HeapTuple tuple
  + int off
  + HeapTupleData tupdata
}

class BufferHeapTupleTableSlot {
  + HeapTupleTableSlot base
  + Buffer buffer
}


TupleTableSlot <|-- MinimalTupleTableSlot
TupleTableSlot <|-- VirtualTupleTableSlot
TupleTableSlot <|-- HeapTupleTableSlot
HeapTupleTableSlot <|-- BufferHeapTupleTableSlot


MinimalTupleTableSlot *-- MinimalTupleData
HeapTupleTableSlot *-- HeapTupleData
HeapTupleData *-- HeapTupleHeaderData
@enduml
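
All of these slot variants are driven through the same executor API; the callbacks behind tts_ops decide how a slot stores its tuple. The sketch below shows the simplest case, a virtual slot whose tts_values / tts_isnull arrays are filled directly. It assumes the standard executor functions MakeSingleTupleTableSlot, ExecClearTuple, ExecStoreVirtualTuple and ExecDropSingleTupleTableSlot, and a tuple descriptor with at least two attributes; exact signatures vary slightly between PostgreSQL versions.

/*
 * Minimal sketch, assuming the standard executor slot API
 * (MakeSingleTupleTableSlot / ExecStoreVirtualTuple / TTSOpsVirtual).
 */
#include "postgres.h"
#include "executor/tuptable.h"

static void
fill_two_column_slot(TupleDesc tupdesc, Datum a, Datum b)
{
    /* virtual slots only reference tts_values/tts_isnull, nothing is materialized */
    TupleTableSlot *slot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsVirtual);

    ExecClearTuple(slot);            /* mark the slot empty before (re)filling it */
    slot->tts_values[0] = a;         /* assumes tupdesc has at least two attributes */
    slot->tts_values[1] = b;
    slot->tts_isnull[0] = false;
    slot->tts_isnull[1] = false;
    ExecStoreVirtualTuple(slot);     /* flag the slot as holding a valid virtual tuple */

    /* ... hand the slot to the executor, or materialize it if needed ... */

    ExecDropSingleTupleTableSlot(slot);
}

A heap or minimal-tuple slot would instead be created with &TTSOpsHeapTuple or &TTSOpsMinimalTuple and filled from an existing tuple, which is where the extra fields of the subclasses in the diagram come into play.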

gp shared snapshot


This post is an excerpt (or repost); it will be removed on request. Original: ../../../Work/pg_gpdb/src/backend/utils/time/sharedsnapshot.c

In Greenplum, many PostgreSQL processes (qExecs, QEs) run on a single segment database as part of the same user SQL statement, each executing one slice of the plan. All qExecs that belong to a particular user on a particular segment database need a consistent view of the data. For this purpose an idea called the "Shared Local Snapshot" is used: the shared-memory data structure SharedSnapshotSlot shares session and transaction information among the gang processes of a session on a particular database instance. These processes are called a SegMate process group.
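
As a purely illustrative aid (not the actual SharedSnapshotSlot definition from sharedsnapshot.c), the sketch below shows the kind of state the writer QE of a SegMate group would have to publish in shared memory so that the reader QEs obtain the same visibility; every field name here is hypothetical.

/*
 * Purely illustrative sketch -- NOT the actual SharedSnapshotSlot.
 * It only shows the kind of state a writer QE must publish so that
 * reader QEs of the same SegMate group share visibility.
 */
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t XidSketch;

typedef struct SharedLocalSnapshotSketch
{
    int        session_id;        /* which user session owns this slot */
    XidSketch  xid;               /* writer's top-level transaction id */
    XidSketch  xmin;              /* oldest xid still considered running */
    XidSketch  xmax;              /* first xid considered not yet started */
    int        num_running;       /* length of running_xids[] */
    XidSketch  running_xids[64];  /* in-progress xids at snapshot time */
    uint32_t   sync_counter;      /* readers wait until this matches the
                                   * command they were dispatched for */
    bool       snapshot_valid;    /* set by the writer, read by readers */
} SharedLocalSnapshotSketch;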

gpcheckcat

1 Overview

GP provides gpcheckcat for checking the system catalogs across the cluster.

Table 1: gpcheckcat

Check                  | Description                                                           | Utility mode | Compound query | Error level
pg_class               | Check pg_class entries that have no corresponding pg_attribute entry  | Y            | N              | NOREPAIR
namespace              | Check for schemas with a missing schema definition                    | Y            | N              | NOREPAIR
unique_index_violation | Check for violated unique indexes                                     | N            | Y              | NOREPAIR
duplicate              | Check for duplicate entries                                           | N            | Y              |
missing_extraneous     | Cross consistency check for missing or extraneous entries             | N            | Y              |
inconsistent           | Cross consistency check for coordinator/segment inconsistency         | N            |                |
foreign_key            | Check foreign keys                                                    | N            |                |

Note:

gpdb cdb

1 Data structures

1.1 Slice Table

@startuml
class SliceTable {
+ NodeTag type
+ int localSlice
+ int numSlices
+ ExecSlice slices
+ bool hasMotions
+ int instrument_options
+ uint32 ic_instance_id
}

note right of SliceTable::localSlice
Index of the slice to execute
end note

note right of SliceTable::slices
Array of slices, indexed by SliceIndex
end note


note right of SliceTable::hasMotions
Are there any Motion nodes anywhere in the plan?
end note


class ExecSlice {
+ int sliceIndex
+ int rootIndex
+ int parentIndex
+ int planNumSegments
+ List children
+ GangType gangType
+ List segments
+ struct primaryGang
+ List primaryProcesses
+ Bitmapset processesMap
}

note right of ExecSlice::primaryProcesses
A list of CDBProcess nodes corresponding to the worker
processes allocated to implement this plan slice.
end note

note right of ExecSlice::processesMap
A bitmap to identify which QE should execute this slice
end note

SliceTable o-- ExecSlice

class Gang {
+ GangType type
+ int size
+ struct db_descriptors
+ bool allocated
}

note right of Gang::db_descriptors
Array of QEs/segDBs that make up this gang.
Sorted by segment index.
end note


ExecSlice *-- Gang

class CdbProcess {
+ NodeTag type
+ char listenerAddr
+ int listenerPort
+ int pid
+ int contentid
+ int dbid
}

ExecSlice o-- CdbProcess



class SegmentDatabaseDescriptor {
+ struct segment_database_info
+ int segindex
+ int conn
+ int motionListener
+ int backendPid
+ char whoami
+ int isWriter
+ int identifier
}

Gang o-- SegmentDatabaseDescriptor



class CdbComponentDatabases {
+ CdbComponentDatabaseInfo segment_db_info
+ int total_segment_dbs
+ CdbComponentDatabaseInfo entry_db_info
+ int total_entry_dbs
+ int total_segments
+ int fts_version
+ int expand_version
+ int numActiveQEs
+ int numIdleQEs
+ int qeCounter
+ List freeCounterList
}

note right of CdbComponentDatabases::segment_db_info
array of CdbComponentDatabaseInfo for segment databases
end note

note right of CdbComponentDatabases::entry_db_info
array of CdbComponentDatabaseInfo for entry databases
end note


class CdbComponentDatabaseInfo {
+ struct config
+ CdbComponentDatabases cdbs
+ int hostSegs
+ List freelist
+ int numIdleQEs
+ int numActiveQEs
}

note right of CdbComponentDatabaseInfo::cdbs
point to owners
end note

CdbComponentDatabases o-- CdbComponentDatabaseInfo



class GpSegConfigEntry {
+ int dbid
+ int segindex
+ char role
+ char preferred_role
+ char mode
+ char status
+ int port
+ char hostname
+ char address
+ char datadir
+ char hostip
+ char hostaddrs
}

CdbComponentDatabaseInfo o-- GpSegConfigEntry

SegmentDatabaseDescriptor o-- CdbComponentDatabaseInfo

@enduml
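
The slice table is effectively a tree: each ExecSlice carries its parentIndex and a children list, and the diagram suggests children holds the slice indexes of its child slices within SliceTable.slices. The C sketch below walks that tree with the standard PostgreSQL List macros; the *Sketch structs are simplified stand-ins for the real types, and reading children as an integer list is an assumption taken from the diagram rather than verified against the GPDB executor.

/*
 * Sketch (under assumptions): walk the slice tree rooted at a given slice,
 * treating ExecSlice.children as a List of child slice indexes into
 * SliceTable.slices.  Uses the standard PostgreSQL List macros; this is
 * not copied from the GPDB executor.
 */
#include "postgres.h"
#include "nodes/pg_list.h"

/* assumed minimal view of the structures in the diagram */
typedef struct ExecSliceSketch
{
    int   sliceIndex;
    int   parentIndex;      /* -1 for the root slice */
    List *children;         /* integer list of child slice indexes */
} ExecSliceSketch;

typedef struct SliceTableSketch
{
    int              numSlices;
    ExecSliceSketch *slices;    /* array indexed by sliceIndex */
} SliceTableSketch;

/* Count the slices in the subtree rooted at sliceIndex. */
static int
count_subtree_slices(SliceTableSketch *st, int sliceIndex)
{
    ExecSliceSketch *slice = &st->slices[sliceIndex];
    int              count = 1;
    ListCell        *lc;

    foreach(lc, slice->children)
        count += count_subtree_slices(st, lfirst_int(lc));

    return count;
}

count_subtree_slices(st, rootIndex) would then give the number of slices (and hence gangs to dispatch) under a given root slice.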

gpdb memory control

TODO 1 VMem

2 Resource Group Control

  • ResourceGroupGetQueryMemoryLimit(void) returns the memory limit as an absolute size (not a percentage)

2.1 Bypass

  • A mode that lets a query bypass resource group limits (see the sketch after this list)

  • Enabled when:

    • the GUC gp_resource_group_bypass is true (defined in guc_gp.c):

      
      /* boolean GUC table entry for gp_resource_group_bypass in guc_gp.c */
      {
          {"gp_resource_group_bypass", PGC_USERSET, RESOURCES,
              gettext_noop("If the value is true, the query in this session will not be limited by resource group."),
              NULL
          },
          &gp_resource_group_bypass,
          false,
          check_gp_resource_group_bypass, NULL, NULL
      }
      
    • Or command is one of:
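
Whichever condition applies, the effect is the same: the query skips resource group admission. The following is an illustrative C sketch of such a guard, not the actual GPDB code; statement_is_bypassed() is a hypothetical helper standing in for the command-type check whose whitelist is elided above.

/*
 * Illustrative sketch only -- not the actual resource group code.  It shows
 * the shape of a bypass guard driven by the gp_resource_group_bypass GUC:
 * when the GUC is on, or the statement is in the bypass whitelist, the
 * query skips resource-group admission entirely.
 */
#include <stdbool.h>

extern bool gp_resource_group_bypass;              /* the GUC shown above */

/* hypothetical helper: is this statement type always allowed to bypass? */
extern bool statement_is_bypassed(const char *commandTag);

static bool
should_bypass_resource_group(const char *commandTag)
{
    if (gp_resource_group_bypass)
        return true;                               /* user opted out for this session */

    return statement_is_bypassed(commandTag);      /* command is on the (elided) whitelist */
}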

GPDB: Configuring Your Systems

This post is an excerpt (or repost); it will be removed on request. Original: https://docs.vmware.com/en/VMware-Tanzu-Greenplum/6/greenplum-database/GUID-install_guide-prep_os.html

1 IP Fragmentation Settings

When the Greenplum Database interconnect uses UDP (the default), the network interface card controls IP packet fragmentation and reassembly.

If the UDP message size is larger than the size of the maximum transmission unit (MTU) of a network, the IP layer fragments the message. (Refer to Networking later in this topic for more information about MTU sizes for Greenplum Database.) The receiver must store the fragments in a buffer before it can reorganize and reassemble the message.

Hash Index of PG


This post is an excerpt (or repost); it will be removed on request. Original: ../../../Work/pg_master/src/backend/access/hash/README

1 Hash Indexing

This directory contains an implementation of hash indexing for Postgres. Most of the core ideas are taken from Margo Seltzer and Ozan Yigit, "A New Hashing Package for UNIX", presented at the Winter USENIX Conference in January 1991. Our in-memory hash table implementation (src/backend/utils/hash/dynahash.c) relies on the same concepts; it is derived from code written by Esmond Pitt and later improved by Margo and others.
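
Since the paragraph above points at dynahash.c, here is a hedged sketch of its public API (hash_create / hash_search) for building an in-memory hash table. The HASHCTL fields and flags shown are the commonly used ones and details differ slightly between PostgreSQL versions; BlockCacheEntry and the helper names are made up for the example.

/*
 * Sketch of the dynahash API mentioned above (hash_create / hash_search).
 */
#include "postgres.h"
#include "utils/hsearch.h"

typedef struct BlockCacheEntry
{
    uint32  blockno;        /* hash key: must be the first field */
    int     hit_count;      /* payload */
} BlockCacheEntry;

static HTAB *
make_block_cache(void)
{
    HASHCTL ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(uint32);
    ctl.entrysize = sizeof(BlockCacheEntry);

    /* HASH_BLOBS: key is a fixed-size binary blob hashed with the built-in hasher */
    return hash_create("block cache sketch", 256, &ctl, HASH_ELEM | HASH_BLOBS);
}

static void
bump_hit(HTAB *cache, uint32 blockno)
{
    bool             found;
    BlockCacheEntry *entry;

    entry = (BlockCacheEntry *) hash_search(cache, &blockno, HASH_ENTER, &found);
    if (!found)
        entry->hit_count = 0;   /* dynahash initializes only the key of a new entry */
    entry->hit_count++;
}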

How To Use Journalctl to View and Manipulate Systemd Logs

This post is an excerpt (or repost); it will be removed on request. Original: https://www.digitalocean.com/community/tutorials/how-to-use-journalctl-to-view-and-manipulate-systemd-logs

1 Basic Log Viewing

To view the logs collected by the journald daemon, use the journalctl command.

When used by itself, every journal entry in the system is shown in a pager (usually less) for you to browse. The oldest entries are at the top:

Hybrid Blockchain Database Systems: Design and Performance


This post is an excerpt (or repost); it will be removed on request. Original: attachments/pdf/e/p1092-loghin.pdf

1 ABSTRACT

Abbrs:

  • CFT: crash fault-tolerant
  • BFT: byzantine fault-tolerant
    Byzantine Fault Tolerance (BFT) is a trait of decentralized, permissionless systems that are capable of successfully identifying and rejecting dishonest or faulty information. Byzantine fault tolerant systems have successfully solved the Byzantine Generals Problem and are robust against Sybil attacks.

2 INTRODUCTION

  • Systems that integrate distributed databases with blockchain features have emerged in academia